Re: [RT] Xindice 2.0

Gianugo Rabellino 28 Nov 2002 08:58:29 -0000

Lixin Meng wrote:

>Do you mean that there might be a use case for a metadata that returns
>the *whole* database content? What would happen on a database with
>millions of documents? Is this feature available in RDBMS and JDBC? I
>assume that you want to "clone" something like "SELECT * FROM CAT" or
>"SHOW TABLES", am I right? If so, those commands will return you the
>tables (in our case, roughly speaking, the Collections) but never ever
>the whole data. Sorry if I'm not getting the point, but I feel a bit lost...
Sorry to have confused (maybe even scared?) you. It is definitely not the whole database content, only the *meta* information about the structure. It is more like an RDBMS, where you can get the db schema from its system tables (like 'select table_name from user_tables' in Oracle). For an RDBMS, one may only need to know table and field information. For an XML database, a global meta tree will be much deeper and more expensive to build than that.

I see your point, yet I'm still scared (especially when using Xindice as a persistence engine or with Web Services, like the use cases that you were citing in your previous email) that this might turn into a performance bottleneck, with metadata requests hammering the database too much. We might consider it to some extent, maybe adding something like the WebDAV Depth concept to limit what would be returned.
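To make the Depth idea concrete, here is a minimal sketch of a depth-limited metadata request. It assumes a hypothetical in-memory `Collection` class and a `describe` helper; neither is part of the Xindice API. The depth values mirror WebDAV's `Depth: 0`, `1` and `infinity` (the latter modelled here as `None`).

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical stand-in for a collection; not the Xindice API."""
    name: str
    children: list = field(default_factory=list)   # nested Collection objects
    documents: list = field(default_factory=list)  # document names only


def describe(coll, depth):
    """Return structural metadata for coll, descending at most `depth` levels.

    depth=0    -> the collection itself only
    depth=1    -> plus its direct documents and sub-collections
    depth=None -> no limit (comparable to WebDAV "Depth: infinity")
    """
    node = {"name": coll.name}
    if depth is None or depth > 0:
        child_depth = None if depth is None else depth - 1
        node["documents"] = list(coll.documents)
        node["collections"] = [describe(c, child_depth) for c in coll.children]
    return node
```

With a cap like this, a client that only wants the top-level layout never forces a walk over the whole tree; whether the default should be 0 or 1 is left open here.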

That's the beauty of virtualization. By default, we return both. If you think the XPath actually represents the semantic meaning of the result, there is no difference at the semantic level. Also, why do people want to create or categorize those collections in the first place? Because they want to give some meaning to the content. Isn't that the same idea behind those XML tags? Crazy?

Not crazy, actually it might make sense, but I hear some FS bells ringing and I foresee a dead-end alley somewhere, at least from the performance POV. Think about looking at all possible permutations of such XPaths in the database: we should check whether under USA there is any collection called California OR any document called California containing /BayArea/Temperature OR any document containing /California/BayArea/Temperature. Then we move on to collection California, and we have to check if there is any collection called BayArea OR any document called BayArea containing /Temperature OR any document containing /BayArea/Temperature...

Computationally scary, don't you think? :-) Also, if we have more than one result, we need to return them in an intelligent way so that users can tell where we have collections and where we are talking about documents... all in all it looks at least very difficult and not user friendly to me.
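A rough sketch of that enumeration, just to count the lookups involved. The `interpretations` function and its tuple layout are illustrative assumptions, not anything that exists in Xindice: each candidate is (collection path, document name or "*" for any document, XPath inside the document).

```python
def interpretations(path):
    """Enumerate the ways a virtual path could map onto collections,
    documents and in-document XPaths (illustrative only)."""
    segments = [s for s in path.split("/") if s]
    candidates = []
    for k in range(1, len(segments)):
        collection = "/" + "/".join(segments[:k])
        rest = segments[k:]
        # rest[0] names a document; the remainder is an XPath inside it
        candidates.append((collection, rest[0], "/" + "/".join(rest[1:])))
        # any document in this collection may contain the whole remainder
        candidates.append((collection, "*", "/" + "/".join(rest)))
    # finally, the whole path may simply be a chain of nested collections
    candidates.append(("/" + "/".join(segments), None, None))
    return candidates
```

For a path of n segments this yields 2(n-1)+1 candidates, which sounds harmless, but every "*" candidate stands for a scan over all documents in that collection unless some index can answer it.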

I agree one should avoid JOINs at all costs. If one wants to build a DOM tree in an RDBMS, JOINs will be inevitable (that's why I have some reservations about eXist). The preliminary idea in my previous email is not to build the DOM tree, in order to minimize the JOINs, at the price of preparing those XPaths when inserting the document (kind of like an index).
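One possible reading of "preparing those XPaths when inserting the document", as a minimal in-memory sketch. The `PathIndex` class is hypothetical; it records only simple element paths (no attributes, predicates or namespaces) and says nothing about how such an index would be stored in an RDBMS.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class PathIndex:
    """Hypothetical path index: maps each element path to the ids of
    the documents that contain it. The cost is paid at insert time."""

    def __init__(self):
        self._paths = defaultdict(set)

    def insert(self, doc_id, xml_text):
        # Parse once on insert and record every element path found.
        root = ET.fromstring(xml_text)
        self._walk(root, "", doc_id)

    def _walk(self, element, prefix, doc_id):
        path = f"{prefix}/{element.tag}"
        self._paths[path].add(doc_id)
        for child in element:
            self._walk(child, path, doc_id)

    def lookup(self, path):
        # A single dictionary lookup; no tree is rebuilt at query time.
        return sorted(self._paths.get(path, ()))
```

A lookup such as `lookup("/California/BayArea/Temperature")` then avoids reassembling any tree, which is the trade-off described above: more work per insert in exchange for cheaper path queries.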

OK. So just help me understand why you would want to use a *relational* database if your main target is to avoid *relations*. :-) I still think that, while I see that RDBMSs have been optimized for ages, a plain database would be the best tool for the job. But I'd most probably +1 an RDBMS-based implementation of *indexes* as an alternative. I don't see the need for having it as the storage (it looks really *ugly* to me to just dump a BLOB into the DB...).

If the 'network latency' is referring to cost associated with JDBC connections, I guess it can be ignored at this stage,

I'm not so sure. Remember that if we go to an RDBMS we are adding another level of indirection (client->server->RDBMS), so we need to take even that into account.

I don't want to reinvent the wheel. My point is that if all I have is a car, I need a car wheel, I don't need a truck or a bicycle wheel. :-)

Ciao,

--
Gianugo Rabellino
