Last Friday, I formally resigned from my position as Chief Architect of
the dbXML Group, and so I am now a free agent. I am about to take a job
with a company in the Bay Area, and will be relocating there shortly
after that. This new position may or may not afford me the ability to
continue working on Xindice with the amount of attention I devote to it
now, so we need to start taking steps in order to make sure that the
project continues to evolve if the situation is such that I can't do a
lot of the coding any more.
There are a few things that need to be addressed in future revisions
of Xindice. I'll run through them very quickly, and then I'd like to
hear people's feedback.
Wire Protocol changes
-------------------------------
These have been widely mentioned, but we need to start moving away from
CORBA and supporting a more flexible wire protocol system with Xindice.
I'd propose to use my Labrador framework to provide this functionality,
as I've already experimented with it, and it works rather well.
Schema support
-----------------------
We need to support schemas in an abstracted fashion. If we can
architect a content model API that would allow the system to validate
and operate against a content model without needing to know that the
content model is based on XML Schemas or Relax NG, that would be ideal.
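
To make the idea concrete, here is a minimal sketch of what such an abstraction could look like. Nothing here exists in Xindice today: `ContentModel`, `RequiredChildrenModel`, and the `validate` signature are hypothetical names chosen for illustration, and the toy back end merely stands in for a real XML Schema or Relax NG adapter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical abstraction: callers validate against this interface and
// never learn which schema language sits behind it.
interface ContentModel {
    /** Returns a list of problems; an empty list means the element is valid. */
    List<String> validate(String elementName, Set<String> childNames);
}

// Toy back end standing in for an XML Schema or Relax NG adapter.
// It only knows that one element requires a fixed set of children.
final class RequiredChildrenModel implements ContentModel {
    private final String element;
    private final Set<String> required;

    RequiredChildrenModel(String element, Set<String> required) {
        this.element = element;
        this.required = required;
    }

    @Override
    public List<String> validate(String elementName, Set<String> childNames) {
        List<String> problems = new ArrayList<>();
        if (!element.equals(elementName)) {
            return problems; // this toy model constrains a single element only
        }
        for (String name : required) {
            if (!childNames.contains(name)) {
                problems.add("missing child: " + name);
            }
        }
        return problems;
    }
}
```

The database core would hold only a `ContentModel` reference, so a second implementation backed by a different schema language could be dropped in without touching the callers.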
Context-sensitive indexing
------------------------------------
XML Schemas introduces the idea of context-dependent typing. What
this means is that for any particular schema, that schema may use the
same element name in more than one scope, and assign to that element
name a completely different primitive type for each scope. So in one
scope, it may be an int, while in another it may be a string, or even a
complex structure.
Xindice's indexing system was originally designed when DTDs were the only
standard way of representing an XML schema, and in a DTD, an element name
has a single, global declaration. So we need to rearchitect the indexing
system so that a particular index can be attached to a schema
context. I have some vague ideas of how to do this, but I'd like to get
a user's perspective on how you'd like to see this made available.
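
One way this might look, purely as a sketch: key index declarations by the element's context rather than by its bare name. The `ContextIndexRegistry` class, the `IndexType` enum, and the use of a slash-separated path string as the "schema context" are all assumptions made for illustration; they are not part of Xindice.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value types an index could be declared with.
enum IndexType { INT, STRING }

// Hypothetical registry: an index is attached to a context (here a simple
// path string) instead of a globally unique element name.
final class ContextIndexRegistry {
    private final Map<String, IndexType> byContext = new HashMap<>();

    /** Declares the index type to use for an element in a given context. */
    void declare(String contextPath, IndexType type) {
        byContext.put(contextPath, type);
    }

    /** Returns the declared index type, or null if none was declared. */
    IndexType lookup(String contextPath) {
        return byContext.get(contextPath);
    }
}
```

With this shape, declaring `/order/id` as `INT` and `/customer/id` as `STRING` gives the same element name `id` a different index type in each scope, which is the situation a name-keyed index cannot express.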
Large Documents and Document Versioning
------------------------------------------------------------
Xindice needs to be capable of supporting massive documents in a
scalable fashion and with acceptable performance. Currently, the
document representation architecture is based on a tokenized, lazy DOM
where the bytestream images that feed the DOM are stored and retrieved
in a paged filing system. Every document is treated as an atomic unit.
This has some serious limitations when it comes to massive documents.
In order to support very large documents, the tokenization system needs
to be replaced and geared more toward the simplified representation of
document structure rather than an equal balance of structure and
content. Also, the Filer interfaces need to support the notion of
streaming, and even more importantly, the ability to support random
access streaming.
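
As a rough illustration of random access streaming, the sketch below reads only a byte range of a stored record instead of the whole record. `StreamingFiler`, `openRange`, and `InMemoryFiler` are hypothetical names; this is not the existing Filer interface, and a real implementation would read from the paged file rather than from an in-memory map.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: open a stream over part of a record.
interface StreamingFiler {
    InputStream openRange(String key, long offset, int length) throws IOException;
}

// In-memory stand-in used only to keep the sketch self-contained.
final class InMemoryFiler implements StreamingFiler {
    private final Map<String, byte[]> records = new HashMap<>();

    void write(String key, byte[] data) {
        records.put(key, data.clone());
    }

    @Override
    public InputStream openRange(String key, long offset, int length) throws IOException {
        byte[] data = records.get(key);
        if (data == null) {
            throw new IOException("no such record: " + key);
        }
        if (offset < 0 || length < 0 || offset + length > data.length) {
            throw new IOException("range out of bounds for record: " + key);
        }
        // The stream exposes only the requested slice of the record.
        return new ByteArrayInputStream(data, (int) offset, length);
    }
}
```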
Also, the tokenization system needs to support versioning in one way or
another. For small documents, complete document revision links are
permissible, but for massive documents, there's no way that versioning
of that nature is acceptable. So, the tokenization system needs to
understand the notion of versioned linking.
The DTSM stuff that I started working on will help with the massive
document problem, but we'd need to introduce the versioning concept into
the specification as well.
Paged Files and BTrees
---------------------------------
Nodes that are stored by Paged files are currently materialized in their
entirety, even if all of their content isn't needed. Originally, it was
written like this because I wanted to nail down functionality. In a
language like C++ or C, this is not an issue because you point a struct
pointer to an offset into your buffer, and voila, you're done, but in
Java, it requires a lot of conversion. For Java, it may improve
performance quite a bit if node portions (such as BTree node pointer and
value lists) were materialized only on demand rather than as a whole.
Obviously, this would require some research to determine if my guess is
true or not.
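
A minimal sketch of on-demand materialization, under an assumed page layout (a 4-byte pointer count followed by that many 8-byte pointers). `LazyBTreeNode` and the layout are hypothetical and do not reflect the actual Xindice page format; whether this is faster in practice would still need to be measured.

```java
import java.nio.ByteBuffer;

// Hypothetical node that keeps the raw page image and decodes the pointer
// list only when a pointer is first requested.
final class LazyBTreeNode {
    private final ByteBuffer page; // assumed layout: int count, then count longs
    private long[] pointers;       // stays null until first access

    LazyBTreeNode(ByteBuffer page) {
        this.page = page;
    }

    /** Reads the count straight from the page header without decoding the list. */
    int pointerCount() {
        return page.getInt(0);
    }

    /** True once the pointer list has been decoded. */
    boolean isMaterialized() {
        return pointers != null;
    }

    /** Decodes the pointer list on first use, then serves from the cache. */
    long pointer(int index) {
        if (pointers == null) {
            int count = pointerCount();
            pointers = new long[count];
            for (int i = 0; i < count; i++) {
                pointers[i] = page.getLong(4 + 8 * i);
            }
        }
        return pointers[index];
    }
}
```

A node that is only inspected for its header never pays the conversion cost, which is the closest Java analogue to pointing a struct pointer at an offset in the buffer.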
--
Tom Bradford - http://www.tbradford.org
Developer - Apache Xindice (Native XML Database) - http://xml.apache.org
Creator - Project Labrador (Web Services Framework) -
http://notdotnet.org