Lixin Meng wrote:
However, if the database returns a meta-information tree such as /db /addressbook / email / phone ... it will open the door to a new breed of applications, such as a GUI tool that supports ad hoc queries.
Do you mean that there might be a use case for metadata that returns the *whole* database content? What would happen on a database with millions of documents? Is this feature available in RDBMS and JDBC? I assume that you want to "clone" something like "SELECT * FROM CAT" or "SHOW TABLES", am I right? If so, those commands will return the tables (in our case, roughly speaking, the Collections) but never ever the whole data. Sorry if I'm not getting the point, but I feel a bit lost...
For an end user who wants to get some weather information from a system, it
is natural to think of '/USA/California/Bayarea/Temperature'. Do they really
care whether '/USA' is a collection or '/USA/California' is a collection?
The user possibly doesn't, but we definitely do. :-) Imagine we have a tree like this ("/" are Collections, "*" are Resources, "<>" are nodes in Resources):
/
|
+--/USA
    |
    +--*Statistics
    |   |
    |   +--<California>
    |       |
    |       +--<BayArea>
    |           |
    |           +--<Temperature>
    |
    +--/California
        |
        +--*BayArea
            |
            +--<Temperature>
How can we decide if Joe user wanted to know the value of the element <Temperature> on resource "Bayarea" contained inside the sub-collection "California" or if he wanted to query the USA collection for documents having an XPath of /California/BayArea/Temperature? Same XPath, but definitely different results...
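To make the ambiguity concrete, here is a minimal Python sketch of the tree above. Everything in it is an assumption made for illustration: the nested-dict model, the names DB, _select and resolve, and the temperature values 21 and 18 are invented, and none of this is the Xindice API. It simply enumerates every way the same path can be read.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory model of the tree above (not the Xindice API).
# The temperature values are made-up placeholders.
DB = {
    "resources": {},
    "collections": {
        "USA": {
            "resources": {
                "Statistics": "<California><BayArea><Temperature>21"
                              "</Temperature></BayArea></California>",
            },
            "collections": {
                "California": {
                    "resources": {"BayArea": "<Temperature>18</Temperature>"},
                    "collections": {},
                },
            },
        },
    },
}


def _select(xml_text, steps):
    """Evaluate an absolute element path (list of tag names) on one document."""
    root = ET.fromstring(xml_text)
    if not steps or root.tag != steps[0]:
        return []
    if len(steps) == 1:
        return [root]
    return root.findall("/".join(steps[1:]))


def resolve(db, path):
    """Return (description, value) for every possible reading of `path`."""
    segments = path.strip("/").split("/")
    readings = []
    coll, prefix = db, ""
    for i, name in enumerate(segments):
        rest = segments[i:]
        xpath = "/" + "/".join(rest)
        # Reading A: the remainder is an XPath over every document in `coll`.
        for res_name, xml_text in coll["resources"].items():
            for node in _select(xml_text, rest):
                where = prefix or "/"
                readings.append((f"{xpath} queried over collection {where} "
                                 f"(matched document {res_name})", node.text))
        # Reading B: `name` is a resource, the remainder is an XPath inside it.
        if name in coll["resources"] and rest[1:]:
            inner = "/" + "/".join(rest[1:])
            for node in _select(coll["resources"][name], rest[1:]):
                readings.append((f"{inner} inside resource {prefix}/{name}",
                                 node.text))
        # Keep descending only while `name` is a sub-collection.
        if name not in coll["collections"]:
            break
        coll, prefix = coll["collections"][name], f"{prefix}/{name}"
    return readings


for description, value in resolve(DB, "/USA/California/BayArea/Temperature"):
    print(value, "<-", description)
```

With this toy data the single path yields two hits, one from each reading, which is exactly the problem: the engine has no way to tell which one Joe User meant.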
On the other hand, if users really want to be specific, they can say /USA/California[system_type='collection']/... where 'system_type' is the meta information.
A bit clumsy but it might work, yet you would need to specify that even USA is a collection, so just in case I'd rather go for something like:
/collection[name='USA']/collection[name='California']/...
but then again, if someone decides to add a Resource and call it "collection", you would be stuck anyway. True, you can add a namespace, but it all feels so far removed from Joe User that it is not really worthwhile. But if we come up with a good syntax that is possibly compliant with the XPath specs (IIRC eXist had to go proprietary for some particular queries, which I would do only as a last resort), then I'm all ears.
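For what it's worth, the explicit syntax is at least trivial to take apart. The sketch below is only an illustration under my own assumptions: the function name split_path and the rule that only leading collection[name='...'] steps denote Collections are hypothetical, not anything Xindice implements.

```python
import re

# One leading step of the form /collection[name='...'] (assumed syntax).
_STEP = re.compile(r"/collection\[name='([^']*)'\]")


def split_path(path):
    """Split a path into (collection names, remaining XPath).

    Only the leading /collection[name='...'] steps are treated as
    Collections; whatever follows is left untouched as the XPath part.
    """
    names = []
    pos = 0
    while True:
        match = _STEP.match(path, pos)
        if match is None:
            break
        names.append(match.group(1))
        pos = match.end()
    return names, path[pos:]


collections, xpath = split_path(
    "/collection[name='USA']/collection[name='California']/BayArea/Temperature"
)
print(collections, xpath)
```

The weak spot is the one already mentioned: a document whose root element really is called "collection" and has a "name" child would be swallowed by the prefix rule, so its content could not be addressed without some extra escape such as a namespace.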
I have my reservations about the idea that we should focus more on document-centric XML files too. At least, there is a 50-50 chance in the real world. As I said, my original motivation for searching XML databases comes from data processing, not content management. More and more Web Services implementations mean more SOAP messages need to be logged and retrieved.
True. And not only that: I foresee a great potential for Xindice (and XML databases in general) to become a great persistence engine. We have all sorts of object serialization to XML, so we would end up with an OODBMS at little or no cost.
But my point, actually, is to try to build an engine that is capable of dealing efficiently with both kinds of XML. After all, in XML, you don't need the "R" in RDBMS, so it is intrinsically overkill to use a relational database: in the end you would end up using at most a handful of tables (while performing horrible and expensive JOINs), not to mention the overhead of serializing XML to SQL and SQL to XML. Add to this the network latency and you're set with a possibly suboptimal setup. On the other hand, once you manage to have a tabular output you can use hashes, arrays and the like, so any DBM would suffice. Don't you think so?
Ciao,
-- Gianugo Rabellino