Re: [RT] Xindice 2.0

Kevin Ross 27 Nov 2002 14:12:26 -0000

Wow, a much better website! I hardly need to mention what this will do for how the general user community sees us: it shows real progress. Kudos to Vladimir and everyone else who helped out.

If we want to agree on some goals, I believe Vladimir already has (had?) some on one of the web pages. On the Forrest site it was the 'dreams' link. I can't seem to find it now. Vladimir, do you know where it is?

I think 'dreams' and 'todo' are a little different, in that we are committed to delivering the todo items in the immediate or near term.

just my 2 cents...

-Kevin

PS- I need cross-collection XQuery

Gianugo Rabellino wrote:

This is probably a good time to start thinking about Xindice 2.0. The major version bump should come from a major evolution of the current architecture. We now have a fairly solid XML database, but there is still a lot of work to do to make Xindice a viable solution for the use cases our candidate users have anticipated.

This is just a "starting point" to try to set things straight, so that together we can come up with some guidelines for future development. Please feel free to fire at will, and remember that these are just Random Thoughts. :-)

There are some major points I would like to address in the near future. In no particular order, I think we need to work on:

1. XML:DB API
This is not 100% a Xindice issue. Still, dbXML first and Xindice afterwards have been the de facto reference implementation of this API, so I think the XML:DB API should be the primary way to access the database. I still think it's really important to have a vendor-neutral API for accessing XML databases, so I would like to invest more and more in it. We might try pushing on the xapi-dev list and see what happens. If that fails, we can always go our own way and write our own extensions.

I think we need to extend the API to accommodate the needs our users have anticipated. At least these points are crucial to me:

- metadata: we need a neutral way to query metadata for collections and resources. I like David's solution of a MetaData object with three parts: a set of fixed, basic metadata (author, creation date, modification date), a set of "properties", and a custom XML-based system. We don't really need much more than that, but we do need to refine it into a complete solution that covers the most basic needs (I, for one, would like to add the collection and the document ID to the MetaData). Once the MetaData object is carved in stone, we can decide how to get it. I'm all in favor of something like getMetaData() calls on Collection and Resource.

- transaction support: the API should have basic support for atomic operations and for transactions;

- capabilities (is that the right English term?). There should be a way to query the Database (or maybe the Collection?) to find out whether it supports certain features (e.g. transactions). A parallel in JDBC would be the DatabaseMetaData object, although I'm not really sure about its plethora of supports* methods. The alternative is a SAX-feature-like (URI-based) set of capabilities and a single method to query for support, along the lines of:
if (database.supports(Capabilities.TRANSACTIONS)) {
    begin();
    work();
    commit();
} else {
    workAndHopeForTheBest();
}
Again, this is not exactly the right place to discuss this. Before going to xapi-dev, though, I'd like to hear your opinions and put together a draft that covers all our present and (possibly :-)) future needs.
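Here is a minimal Java sketch of the metadata and capabilities proposals, to have something concrete before going to xapi-dev. MetaData, Capabilities, CapabilityAware, MetaDataAware and ToyDatabase are hypothetical names, and the example.org capability URIs are placeholders. None of this exists in the XML:DB API today. The sketch covers the fixed fields, the free-form properties, the custom XML slot, the collection and document ID added to MetaData, and a single supports(URI) method in place of a family of supports* methods.

```java
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: these types are illustrative, not part of the XML:DB API.
class ApiSketch {

    // URI-based capability identifiers, in the spirit of SAX feature URIs.
    // The example.org URIs are placeholders, not registered names.
    static final class Capabilities {
        static final URI TRANSACTIONS =
                URI.create("http://example.org/xmldb/capabilities/transactions");
        static final URI METADATA =
                URI.create("http://example.org/xmldb/capabilities/metadata");

        private Capabilities() {}
    }

    // One query method instead of a family of supportsXxx() methods.
    interface CapabilityAware {
        boolean supports(URI capability);
    }

    // Fixed basic fields, free-form properties and an optional custom XML fragment.
    static final class MetaData {
        final String collection;   // path of the owning collection
        final String documentId;   // null when describing a collection itself
        final String author;
        final long created;        // milliseconds since the epoch
        final long modified;
        final Map<String, String> properties = new HashMap<>();
        String customXml;          // application-defined XML, may be null

        MetaData(String collection, String documentId, String author,
                 long created, long modified) {
            this.collection = collection;
            this.documentId = documentId;
            this.author = author;
            this.created = created;
            this.modified = modified;
        }
    }

    // What both Collection and Resource could implement.
    interface MetaDataAware {
        MetaData getMetaData();
    }

    // Toy database that advertises only the capabilities it is constructed with.
    static final class ToyDatabase implements CapabilityAware {
        private final Set<URI> capabilities;

        ToyDatabase(Set<URI> capabilities) {
            this.capabilities = new HashSet<>(capabilities);
        }

        @Override
        public boolean supports(URI capability) {
            return capabilities.contains(capability);
        }
    }

    public static void main(String[] args) {
        CapabilityAware db = new ToyDatabase(Collections.singleton(Capabilities.METADATA));
        if (db.supports(Capabilities.TRANSACTIONS)) {
            System.out.println("begin/work/commit");
        } else {
            System.out.println("no transactions: work and hope for the best");
        }
        MetaDataAware resource =
                () -> new MetaData("/db/addressbook", "contact1", "alice", 0L, 0L);
        System.out.println(resource.getMetaData().documentId);
    }
}
```

The URI approach keeps the interface stable. A vendor could advertise an extension by publishing a new URI instead of waiting for a new method in the spec.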

2. PERFORMANCE
Face it: we are slow. We are good enough for small jobs, but we cannot handle high loads or huge documents, no matter how accurate your indexes might be. I put a great deal of hope into Tom's work on Xalan DTM (http://xml.apache.org/xalan-j/dtm.html) to improve Xindice performance. As of now, though, I'm afraid Tom is MIA too. Unless he shows up, we have no choice but to do it on our own and work out the best way to improve Xindice storage and retrieval performance. I see some possible directions:

a. Stefano pointed me to the Lore documentation. The people at Stanford did a great deal of work on storing semi-structured data, and we might borrow something from there if it's still up to date (http://www-db.stanford.edu/lore/);

b. DTM (http://xml.apache.org/xalan-j/dtm.html). I had a short chat with Shane Curcuru from Xalan at ApacheCon, and he was cautious about using DTM for persistent storage. Still, it might be worth asking on xalan-dev whether the DTM model is good enough (or could be extended) to accommodate our needs;

c. SAX events. There is little doubt that SAX is the most efficient way to handle XML, in terms of both speed and memory. As of now Xindice is heavily based on DOM (albeit compressed and finely tuned), and it might be worth investigating whether this should change. Cocoon got very good results using SAX even for its internal cache, by compiling SAX events into byte streams and interpreting them later: see http://cvs.apache.org/viewcvs.cgi/xml-cocoon2/src/java/org/apache/cocoon/components/sax/ and look for XMLByteStream[Compiler|Interpreter]. We might borrow that, at least for transporting SAX events over the wire in the XML-RPC protocol. With a Compiler on the server side (or, even better, with documents already stored in compiled form) and an Interpreter on the client side, things might be a whole lot faster, especially with SAX-based applications such as Cocoon.
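The following is a rough idea of what a compiled SAX stream could look like. It is a hypothetical sketch written from scratch, not a copy of Cocoon's XMLByteStreamCompiler/XMLByteStreamInterpreter. EventCompiler is a ContentHandler that writes a one-byte event code followed by the event's strings, and replay() feeds those bytes back to any ContentHandler. It records only document, element and character events. DataOutputStream.writeUTF also caps each string at 65535 encoded bytes, so a real format would need chunked text plus namespace and processing-instruction events.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical minimal sketch (not Cocoon's code): record a subset of SAX events
// as bytes, replay them later. writeUTF limits each string to 65535 encoded bytes.
class SaxByteStream {
    private static final int START_DOC = 0, END_DOC = 1, START_EL = 2, END_EL = 3, CHARS = 4;

    // The "compiler": a ContentHandler that serializes the events it receives.
    static final class EventCompiler extends DefaultHandler {
        private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        private final DataOutputStream out = new DataOutputStream(bytes);
        @Override public void startDocument() throws SAXException { tag(START_DOC); }
        @Override public void endDocument() throws SAXException { tag(END_DOC); }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            tag(START_EL); str(uri); str(local); str(qName);
            try { out.writeInt(atts.getLength()); } catch (IOException e) { throw new SAXException(e); }
            for (int i = 0; i < atts.getLength(); i++) {
                str(atts.getURI(i)); str(atts.getLocalName(i)); str(atts.getQName(i));
                str(atts.getType(i)); str(atts.getValue(i));
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            tag(END_EL); str(uri); str(local); str(qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            tag(CHARS); str(new String(ch, start, length));
        }

        private void tag(int event) throws SAXException {
            try { out.writeByte(event); } catch (IOException e) { throw new SAXException(e); }
        }

        private void str(String s) throws SAXException {
            try { out.writeUTF(s == null ? "" : s); } catch (IOException e) { throw new SAXException(e); }
        }

        byte[] toByteArray() { return bytes.toByteArray(); }
    }

    // The "interpreter": feeds recorded events to any ContentHandler.
    static void replay(byte[] data, ContentHandler h) throws IOException, SAXException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        while (in.available() > 0) {
            int event = in.readByte();
            switch (event) {
                case START_DOC: h.startDocument(); break;
                case END_DOC: h.endDocument(); break;
                case START_EL: {
                    String uri = in.readUTF(), local = in.readUTF(), qName = in.readUTF();
                    AttributesImpl atts = new AttributesImpl();
                    // Arguments are evaluated left to right, matching the write order.
                    for (int n = in.readInt(); n > 0; n--) {
                        atts.addAttribute(in.readUTF(), in.readUTF(), in.readUTF(),
                                in.readUTF(), in.readUTF());
                    }
                    h.startElement(uri, local, qName, atts);
                    break;
                }
                case END_EL: h.endElement(in.readUTF(), in.readUTF(), in.readUTF()); break;
                case CHARS: {
                    char[] text = in.readUTF().toCharArray();
                    h.characters(text, 0, text.length);
                    break;
                }
                default: throw new IOException("unknown event code " + event);
            }
        }
    }

    // Parses an XML string straight into compiled form.
    static byte[] compile(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        EventCompiler compiler = new EventCompiler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), compiler);
        return compiler.toByteArray();
    }
}
```

In the XML-RPC scenario, the server would send the byte array and the client would call replay() with its own handler (for instance, the start of a Cocoon pipeline) without ever building a DOM. Replaying into another EventCompiler reproduces the same bytes, which makes the format easy to test.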

3. AAA
Badly needed, on two fronts:
a. Server side: not that hard to implement, after all, at least in a coarse-grained way. We might go the hard way, with security-oriented markup languages and node-based security, or just rely on URI-based authentication with a Tomcat/Slide/younameit-like role system. I'd go for the latter: collection-based security should be enough for most needs.

b. Transport: if we are going to have usernames and passwords flying over the wire, we need to protect them. XML-RPC over HTTPS? CHAP? Kerberos? Other thoughts?
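Here is a minimal sketch of the collection-based option from point a. It assumes roles are already attached to an authenticated user, which leaves out both authentication and transport protection (point b). CollectionAcl and Permission are hypothetical names, and the /db/... paths are only examples. A grant on a collection is inherited by everything below it. There are no explicit denies, so anything not granted is refused.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of collection-based security: permissions are granted to roles
// on a collection path and inherited by everything below it. There are no explicit
// denies, so anything not granted is refused.
class CollectionAcl {
    enum Permission { READ, WRITE }

    // collection path -> role -> permissions granted on that subtree
    private final Map<String, Map<String, Set<Permission>>> grants = new HashMap<>();

    void grant(String collection, String role, Permission permission) {
        grants.computeIfAbsent(normalize(collection), k -> new HashMap<>())
              .computeIfAbsent(role, k -> new HashSet<>())
              .add(permission);
    }

    // Walks from the requested path up to the root one path segment at a time,
    // so a grant on /db/public does not leak to /db/publicity.
    boolean isAllowed(String path, Set<String> roles, Permission permission) {
        for (String c = normalize(path); c != null; c = parent(c)) {
            Map<String, Set<Permission>> byRole = grants.get(c);
            if (byRole == null) continue;
            for (String role : roles) {
                Set<Permission> granted = byRole.get(role);
                if (granted != null && granted.contains(permission)) return true;
            }
        }
        return false;
    }

    // Ensures a leading slash and strips trailing slashes ("/db/" -> "/db").
    private static String normalize(String path) {
        String p = path.startsWith("/") ? path : "/" + path;
        while (p.length() > 1 && p.endsWith("/")) p = p.substring(0, p.length() - 1);
        return p;
    }

    // "/db/a" -> "/db", "/db" -> "/", "/" -> null (end of the walk).
    private static String parent(String path) {
        if (path.equals("/")) return null;
        int slash = path.lastIndexOf('/');
        return slash == 0 ? "/" : path.substring(0, slash);
    }
}
```

Checking whole path segments rather than raw string prefixes is the one detail that is easy to get wrong with URI-based security. Beyond that, this model is about as simple as the "Tomcat-like role system" option gets.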

4. TRANSACTION
This is needed too. I don't know how JTA might help here. I have no idea of the API and have never worked with it. Any experts around? We would need to know not only whether JTA would do the job, but also whether it would perform well enough without imposing severe penalties on the system.
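Whatever JTA turns out to offer, it may help to pin down the behavior we want first. The sketch below is a hypothetical in-memory illustration of atomic commit only. It is not JTA and not how Xindice stores data. A transaction buffers its writes privately, and commit applies them all under the store's lock, so a reader sees either none or all of them. Rollback simply discards the buffer. BufferedStore and Transaction are made-up names.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not JTA): transactions buffer their writes privately and
// apply them all at once on commit, under the store's lock.
class BufferedStore {
    private final Map<String, String> documents = new HashMap<>();

    synchronized String get(String id) { return documents.get(id); }

    Transaction begin() { return new Transaction(); }

    final class Transaction {
        // document id -> new content; a null value marks a pending delete
        private final Map<String, String> pending = new LinkedHashMap<>();
        private boolean finished;

        void put(String id, String xml) { check(); pending.put(id, xml); }

        void delete(String id) { check(); pending.put(id, null); }

        // A transaction sees its own uncommitted writes; other readers do not.
        String get(String id) {
            check();
            return pending.containsKey(id) ? pending.get(id) : BufferedStore.this.get(id);
        }

        // Readers synchronize on the store too, so they never see half a commit.
        void commit() {
            check();
            synchronized (BufferedStore.this) {
                for (Map.Entry<String, String> e : pending.entrySet()) {
                    if (e.getValue() == null) documents.remove(e.getKey());
                    else documents.put(e.getKey(), e.getValue());
                }
            }
            finished = true;
        }

        void rollback() { check(); pending.clear(); finished = true; }

        private void check() {
            if (finished) throw new IllegalStateException("transaction already finished");
        }
    }
}
```

There is no conflict detection, so concurrent transactions follow last-commit-wins, and nothing survives a crash. Real isolation and durability (locking or versioning, logging, recovery) are exactly where the performance question above comes in.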
======================================================================
OK, this was the first stone in the lake. I hope to spark some discussion and, once we agree on what we want from 2.0, to start writing docs and code. I'm now borrowing the world-famous asbestos underwear from Stefano & Sam, and I'm eagerly awaiting your replies.
Ciao,
