----- Original Message -----
From: "Gianugo Rabellino" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, November 27, 2002 6:15 AM
Subject: [RT] Xindice 2.0
>
> This is probably a good time to start thinking about Xindice 2.0. The
> major number switch should come from a major evolution of the current
> architecture: we now have a quite solid XML database, but there is
> still a lot of work to do in order to make Xindice a viable solution
> for the use cases that have been anticipated by our candidate users.
>
> This is just a "starting point" to try and set things straight, in
> order to come up together with a sort of guideline for future
> developments. Please feel free to fire at will, and remember that
> these are just Random Thoughts. :-)
>
> There are some major points that I would like to address in the near
> future. In no particular order I think we need to work on:
>
> 1. XML:DB API
> This is not 100% a Xindice issue, yet I think that since dbXML before
> and Xindice afterwards are the de facto standards for this API, the
> XML:DB APIs should be the primary way to access the database. I still
> think that it's really important to have a vendor-neutral API for
> accessing XML databases, so I would like to invest more and more in
> this: we might try to push on the xapi-dev list and see what happens;
> if we fail it will always be possible to run wild and do our own
> extensions.
>
> I think that we need to extend the API in order to accommodate the
> needs anticipated by the users. These points at least are crucial to
> me:
>
> - metadata: we need a neutral way to query metadata for collections
> and resources. I like David's solution of having a MetaData object
> with a set of fixed and basic metadata (author, creation,
> modification), a set of "properties" and a custom XML-based system: we
> don't really need much more than that, but we also need to refine it
> in order to come out with a complete solution that addresses the most
> basic needs (I, for one, would like to add to the MetaData the
> collection and the document ID).
> When the MetaData object is carved in stone we can decide how to get
> it: I'm all in favor of something like getMetaData() calls on
> Collection and Resource.
>
> - transaction support: the API should have basic support for atomic
> operations and for transactions;
>
> - capabilities (is that the right English term?). There should be a
> way to query the Database (or maybe the Collection?) to find out
> whether it supports some feature (e.g. transactions). A parallel with
> JDBC would be the DatabaseMetaData object, though I'm not really sure
> about the plethora of supports* methods; the alternative is a
> SAX-feature-like (URI-based) set of capabilities and a single method
> to query for support, with pseudocode like:
>
>     if (database.supports(Capabilities.TRANSACTIONS)) {
>         begin()/work()/commit()
>     } else {
>         workAndHopeForTheBest()
>     }
>
> Again: this is not exactly the right place to discuss this, but before
> going to xapi-dev I'd like to hear your opinion and put together a
> draft that comprises all our present and (possibly :-)) future needs.
>
> 2. PERFORMANCE
> Face it: we are slow. We are fine for small jobs but we cannot stand
> high loads or huge documents, no matter how accurate your indexes
> might be. I put a great deal of hope into Tom's work on Xalan DTM
> (http://xml.apache.org/xalan-j/dtm.html) to improve Xindice's
> performance, but as of now I'm afraid that Tom is MIA too, so unless
> he shows up we have no choice but to do it on our own and decide what
> might be the best way to improve Xindice's storage and retrieval
> performance. I see some possible directions:
>
> a. Stefano pointed me to the Lore documentation. The guys at Stanford
> did a whole lot of work thinking about storage of semi-structured
> data; we might borrow something from there, if it's still up to date
> (http://www-db.stanford.edu/lore/);
>
> b. DTM (http://xml.apache.org/xalan-j/dtm.html).
> I had a small chat with Shane Curcuru from Xalan at ApacheCon and he
> was cautious about using DTM for persistent storage. But it might be
> worth trying (by asking on xalan-dev) to see if the DTM model is good
> enough (or can possibly be extended) to accommodate our needs;
>
> c. SAX events. There is almost no doubt about SAX being the most
> efficient way to deal with XML, speed and memory wise. As of now
> Xindice is heavily based on DOM (albeit compressed and finely tuned);
> it might be worth investigating whether this should change. Cocoon had
> very good results using SAX even for the internal cache, by compiling
> SAX events to byte streams and interpreting them at a later time: see
> http://cvs.apache.org/viewcvs.cgi/xml-cocoon2/src/java/org/apache/cocoon/components/sax/
> and look for XMLByteStream[Compiler|Interpreter]. We might borrow that
> at least for the transport of SAX events over the wire in the XML-RPC
> protocol: if we have a Compiler on the server side (or, even better,
> if the documents are already stored in a compiled format) and an
> Interpreter on the client side, things might be a whole lot faster,
> esp. when dealing with SAX-based applications such as Cocoon.
>
> 3. AAA
> Badly needed, on two sides:
>
> a. Server side: not that hard to implement, after all, at least in a
> not-so-granular way. We might go the hard way with security-oriented
> markup languages and node-based security, or just rely on URI-based
> authentication, with a Tomcat/Slide/younameit-like role system. I'd go
> for the latter: Collection-based security should be enough for most
> needs.
>
> b. transport: if we are going to have usernames and passwords flying
> over the wire, we need to protect them. XML-RPC over HTTPS? CHAP?
> Kerberos? Other thoughts?
>
> 4. TRANSACTION
> This is needed too. I don't know how JTA might help here: I have no
> idea of the API and have never worked with it. Any expert around?
> We would need to know not only if JTA would do the job, but also if,
> performance wise, it will suffice without imposing severe penalties on
> the system.
>
> ======================================================================
>
> OK, this was the first stone in the lake: I hope to spark some
> discussion on it and, once we manage to agree on what we want from
> 2.0, to start writing docs and code. I'm now borrowing the
> world-famous asbestos underwear from Stefano & Sam and I'm eagerly
> waiting for your replies.
>
> Ciao,
>
> --
> Gianugo Rabellino
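
To make the capabilities idea from point 1 a bit more concrete, here is a minimal Java sketch of a single supports() method keyed by URI-style capability names, with a transactional path and an unprotected fallback. Everything in it is an assumption made for illustration: the Database interface, the Capabilities constants, the example.org URIs, ToyDatabase and store() are all hypothetical and are not part of the XML:DB API or of Xindice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CapabilitySketch {

    /** Hypothetical capability names, URI-based in the style of SAX features. */
    static final class Capabilities {
        static final String TRANSACTIONS =
                "http://example.org/xmldb/capabilities/transactions";
        static final String METADATA =
                "http://example.org/xmldb/capabilities/metadata";
        private Capabilities() {}
    }

    /** Hypothetical slice of a database interface (not the real XML:DB API). */
    interface Database {
        boolean supports(String capabilityUri);
        void begin();
        void commit();
        void rollback();
    }

    /** Toy implementation that records calls so the control flow is visible. */
    static final class ToyDatabase implements Database {
        private final Set<String> supported;
        final List<String> log = new ArrayList<>();

        ToyDatabase(String... capabilityUris) {
            this.supported = new HashSet<>(Arrays.asList(capabilityUris));
        }

        public boolean supports(String capabilityUri) {
            return supported.contains(capabilityUri);
        }
        public void begin()    { log.add("begin"); }
        public void commit()   { log.add("commit"); }
        public void rollback() { log.add("rollback"); }
    }

    /** Runs the work in a transaction when supported, unprotected otherwise. */
    static void store(Database db, Runnable work) {
        if (db.supports(Capabilities.TRANSACTIONS)) {
            db.begin();
            try {
                work.run();
                db.commit();
            } catch (RuntimeException e) {
                db.rollback(); // undo the partial work, then report the failure
                throw e;
            }
        } else {
            work.run(); // workAndHopeForTheBest(): no atomicity guarantee
        }
    }

    public static void main(String[] args) {
        ToyDatabase transactional = new ToyDatabase(Capabilities.TRANSACTIONS);
        store(transactional, () -> transactional.log.add("work"));
        System.out.println(transactional.log);

        ToyDatabase plain = new ToyDatabase();
        store(plain, () -> plain.log.add("work"));
        System.out.println(plain.log);
    }
}
```

The point of the single supports(String) method is that new capabilities only need a new URI, not a new method on the interface, which is the trade-off against the JDBC-style supports* methods mentioned in the mail.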