Re: Remarks about XML:DB API
On Thursday, February 7, 2002, at 05:38 AM, Arno de Quaasteniet wrote:

> Some general remarks:
> * Resource and Services are perfectly abstract names but it's hard to imagine for a user what they mean. I'm in favor of more specific names, to make it easier for users to imagine what they stand for (I only have to figure out what the right names would be).

A valid criticism, but it's a little late in the game for that.

> * As Dare Obasanjo already mentioned the tying of services to collections is not very practical. I think this is definitely something that should be changed.

I disagree. A Service is a Collection augmentation mechanism and may be exposed with multiple implementations of the same interface depending on which collection it is augmenting, which would especially be important for servers whose underlying data model is aggregated from a variety of sources. To say that it shouldn't be associated with individual collections makes it much more difficult to implement.

> Interface specific remarks:
> Collection interface
> * I think the behavior and interface of the getServices method should be changed, because:
>   - Each instance of a service could possibly take up resources, in which case you would want to instantiate those services lazily whenever getService is called.

This is an implementation issue, and can be addressed specifically by a vendor and their own needs. Just as your concern about where collection associates should be... There's nothing barring someone from having a single instance of the same service, and having the Collection implementations resolve to that same instance... Or to lazily instantiate a service upon request.

> * I'm not quite sure about the use of getResourceCount/getChildCollectionCount, since in the case of X-Hive it involves counting the resources which of course has a bad performance characteristic.

Bad characteristic for X-Hive, but not other DBs. This is not a problem for Xindice, and probably won't be a problem for systems like Tamino and many relationally mapped XML-DBs. You can always throw an exception.

> CollectionManagementService interface
> * I think this interface is overkill, why not add the createCollection and removeCollection methods to the Collection interface? If not, should it then check if the collection it operates on is still open?

This is a sticky issue no matter what, because the way most vendors implement collection management is different, though the way most people access the content of a collection is fairly consistent. For example, how would you propose a generic way to create a collection based on relational mapping? It's not very simple, which is why decoupling the two functionalities and allowing a vendor to write a proprietary collection management service if necessary seemed like the most appropriate solution.

> * getIterator returns a ResourceIterator. I'm more in favor of returning a java.util.Iterator (I don't see the cast that becomes necessary as a problem), and renaming the method to iterator() because that's more like other Java interfaces, though I understand that this is just a matter of taste, and having a separate interface for it could make porting the API to platforms other than Java easier.

The primary goal of the API is to be platform and language independent, which is why I'm sure people who are implementing the API in Python or C++ wouldn't have agreed with you.

> * The ResourceIterator interface
>   If not replaced by java.util.Iterator I would prefer if this interface would have methods named next() and hasNext() instead of nextResource() and hasMoreResources().

Why not write an Adapter that implements the Java Iterator interface and wraps the ResourceIterator?
-- Tom Bradford - http://www.tbradford.org Apache Xindice (Native XML Database) - http://xml.apache.org Project Labrador (Web Services Framework) - http://notdotnet.org -- Post a message: mailto:[EMAIL PROTECTED] Unsubscribe:mailto:[EMAIL PROTECTED] Contact administrator: mailto:[EMAIL PROTECTED] Read archived messages: http://archive.xmldb.org/ --
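The adapter suggested at the end of Tom's reply might look roughly like the following. It is only a sketch under assumptions: the Resource and ResourceIterator interfaces are pared-down stand-ins that borrow nothing from the real API except the method names mentioned in the thread (hasMoreResources, nextResource), the names ResourceIteratorAdapter and ListResourceIterator are hypothetical, and the code is written against a current Java version with generics rather than the Java of 2002.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Stand-in for the API's Resource type; only an id is modeled here (assumption).
interface Resource {
    String getId();
}

// Stand-in that uses the method names discussed in the thread.
interface ResourceIterator {
    boolean hasMoreResources();
    Resource nextResource();
}

// Hypothetical adapter: presents a ResourceIterator as a java.util.Iterator.
final class ResourceIteratorAdapter implements Iterator<Resource> {
    private final ResourceIterator delegate;

    ResourceIteratorAdapter(ResourceIterator delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasMoreResources();
    }

    @Override
    public Resource next() {
        // Honor the java.util.Iterator contract when the delegate is exhausted.
        if (!delegate.hasMoreResources()) {
            throw new NoSuchElementException();
        }
        return delegate.nextResource();
    }
}

// Hypothetical in-memory ResourceIterator, only there to try the adapter out.
final class ListResourceIterator implements ResourceIterator {
    private final Iterator<String> ids;

    ListResourceIterator(List<String> ids) {
        this.ids = ids.iterator();
    }

    @Override
    public boolean hasMoreResources() {
        return ids.hasNext();
    }

    @Override
    public Resource nextResource() {
        final String id = ids.next();
        return () -> id; // Resource has a single method, so a lambda suffices
    }
}
```

With a wrapper of this kind the API itself can keep its language-neutral method names, while Java callers who prefer hasNext()/next() wrap the iterator at the call site.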
Remarks about XML:DB API
Hi,

Inspired by the SiXDML proposal I've been looking some more into the XML:DB API specification (since it's partially based on the XML:DB core API spec) and have a number of remarks about it, though I did not yet have time to read the specification thoroughly, so expect some more. Unfortunately I also didn't have enough time to think of alternatives to the things I have a problem with.

Some general remarks:

* Resource and Services are perfectly abstract names but it's hard to imagine for a user what they mean. I'm in favor of more specific names, to make it easier for users to imagine what they stand for (I only have to figure out what the right names would be).
* As Dare Obasanjo already mentioned the tying of services to collections is not very practical. I think this is definitely something that should be changed.

Interface specific remarks:

Collection interface

* I think the behavior and interface of the getServices method should be changed, because:
  - Each instance of a service could possibly take up resources, in which case you would want to instantiate those services lazily whenever getService is called.
  - It's not likely you need them all at once.
  - If it's meant for checking the types of services supported by the collection (though personally I do not think that services should be coupled to collections at all) then it could return only the names of the services it supports.
* I'm not quite sure about the use of getResourceCount/getChildCollectionCount, since in the case of X-Hive it involves counting the resources which of course has a bad performance characteristic.

CollectionManagementService interface

* I think this interface is overkill, why not add the createCollection and removeCollection methods to the Collection interface? If not, should it then check if the collection it operates on is still open?

ResourceSet interface

* getResource(long item) will only have a good performance if there's a random access list behind the resource set.
* getSize will only have a good performance if there's a list behind the resource set.

  When evaluating queries lazily (not always completely possible: for instance if the end result, or temporary results, need to be sorted), you typically do not want to gather results in a list, but return them one by one using an iterator. What you typically want to prevent is that users use code like this:

      ResourceSet rs = ...;
      for (long i = 0; i < rs.getSize(); i++) {
          Resource r = rs.getResource(i);
      }

  to iterate over the query results when the query is lazily evaluated. This would mean that the result set should first gather all the query results, which would essentially mean that the results are iterated twice (and you may not have enough working memory to get all the results from the database). Of course these methods could still be useful when there's a list behind the resource set (for instance when the end result needed to be sorted); in those cases you can request the size without a performance penalty. So maybe some method should be added to see if the resource set is lazy or not?
* getIterator returns a ResourceIterator. I'm more in favor of returning a java.util.Iterator (I don't see the cast that becomes necessary as a problem), and renaming the method to iterator() because that's more like other Java interfaces, though I understand that this is just a matter of taste, and having a separate interface for it could make porting the API to platforms other than Java easier.
* The ResourceIterator interface
  If not replaced by java.util.Iterator I would prefer if this interface would have methods named next() and hasNext() instead of nextResource() and hasMoreResources().

And finally I have a question: is there a test suite that tests conformance to the API?
Kind regards,
Arno de Quaasteniet
X-Hive Corporation
+31 (0)10 710 86 24
http://www.x-hive.com
[EMAIL PROTECTED]
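The cost Arno describes for lazily evaluated queries can be illustrated with a small sketch. Everything in it is an assumption made for illustration: LazyResourceSet is a hypothetical class, results are plain strings rather than Resource objects, and isLazy() is only one possible shape for the "is this resource set lazy?" method he asks about; none of this comes from the XML:DB API itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical result set that pulls results from the query engine on demand.
// Mixing streaming and indexed access on one instance is not handled here.
final class LazyResourceSet {
    private final Iterator<String> source; // stands in for a lazily evaluated query
    private List<String> buffer;           // filled only when size or random access is requested

    LazyResourceSet(Iterator<String> source) {
        this.source = source;
    }

    // Hypothetical probe: true while no results have been gathered into a list.
    boolean isLazy() {
        return buffer == null;
    }

    // Streaming access: hands results out one by one without buffering them.
    Iterator<String> getIterator() {
        return buffer != null ? buffer.iterator() : source;
    }

    // Needs the total count, so every remaining result is gathered first.
    long getSize() {
        materialize();
        return buffer.size();
    }

    // Random access, which likewise forces the full result to be gathered.
    String getResource(long index) {
        materialize();
        return buffer.get((int) index);
    }

    private void materialize() {
        if (buffer == null) {
            buffer = new ArrayList<>();
            while (source.hasNext()) {
                buffer.add(source.next());
            }
        }
    }
}
```

In this sketch a client that only uses getIterator() never triggers materialize(), whereas the index-based loop from the message calls getSize() first and therefore pulls the whole result into memory before the first getResource(i) returns.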
Re: Remarks about XML:DB API
--- Tom Bradford <[EMAIL PROTECTED]> wrote:
> > * As Dare Obasanjo already mentioned the tying of services to collections is not very practical. I think this is definitely something that should be changed.
>
> I disagree. A Service is a Collection augmentation mechanism and may be exposed with multiple implementations of the same interface depending on which collection it is augmenting, which would especially be important for servers whose underlying data model is aggregated from a variety of sources. To say that it shouldn't be associated with individual collections makes it much more difficult to implement.

What you have described is an implementation detail that should be hidden from the user. Secondly, I'm not even sure I understand what it means. However, I do understand that to start a transaction, perform a query or an update I need to first grab some collection object and then grab a service object from it. So if I grab the "/db/my_collection/xsl/" collection and obtain a query service or transaction service, does this mean that I can't use this object to start a transaction or perform a query if I'll be performing operations on the "/db/schemas/" collection?

If the answer to the above question is Yes, then this is an issue that will cause user confusion and perhaps errors (standardization on getting services from the DB root would help but then defeats the purpose of tying services to collections in the first place). If the answer is No, then there doesn't seem to be any justification in tying services to collections.

=
LAWS OF COMPUTER PROGRAMMING, VIII
Any non-trivial program contains at least one bug.
http://www.25hoursaday.com
Carnage4Life (slashdot/advogato/kuro5hin)
Re: Remarks about XML:DB API
On Thursday, February 7, 2002, at 01:44 PM, Dare Obasanjo wrote:

> What you have described is an implementation detail that should be hidden from the user. Secondly, I'm not even sure I understand what it means. However, I do understand that to start a transaction, perform a query or an update I need to first grab some collection object and then grab a service object from it. So if I grab the "/db/my_collection/xsl/" collection and obtain a query service or transaction service, does this mean that I can't use this object to start a transaction or perform a query if I'll be performing operations on the "/db/schemas/" collection?

Yes... and it shouldn't cause confusion because Services as they're implemented at the moment can't be repointed to other Collections. To a Service, the Collection provides context. It may be a starting context for recursive processing, or it may be a singular context... It depends on the nature of the service and how it is implemented. There's nothing stopping someone from implementing a Service that is tied to the root Collection of the database and operates on the database as a whole, but not allowing the possibility of context would be too restrictive where naming and implementation flexibility are concerned.

--
Tom Bradford - http://www.tbradford.org
Apache Xindice (Native XML Database) - http://xml.apache.org
Project Labrador (Web Services Framework) - http://notdotnet.org
Re: Remarks about XML:DB API
On Thursday, February 7, 2002, at 02:01 PM, Tom Bradford wrote:

> Yes... and it shouldn't cause confusion because Services as they're implemented at the moment can't be repointed to other Collections. To a Service, the Collection provides context. It may be a starting context for recursive processing, or it may be a singular context... Depends on the nature of, and how the service is implemented. There's nothing stopping someone from implementing a Service that is tied to the root Collection of the database and operates on the database as a whole, but not allowing the possibility of context would be too restrictive contextually, where naming and implementation flexibility are concerned.

The problem comes if there is no root collection. For instance, I have an Oracle 9i impl where the collection hierarchy is flat. I had to synthesize a root collection in order to have a starting point to create collections. This isn't intuitive when the database doesn't support a hierarchy of collections.

I actually agree with Dare on this: tying Services to collections is too limiting. We need a cleaner distinction of database level services. I don't think all services should be database level, but the concept needs to exist.

--
Kimbro Staken
XML Database Software, Consulting and Writing
http://www.xmldatabases.org/
Re: Remarks about XML:DB API
On Thursday, February 7, 2002, at 02:09 PM, Kimbro Staken wrote:

> The problem comes if there is no root collection. For instance I have an Oracle 9i impl where the collection hierarchy is flat. I had to synthesize a root collection in order to have a starting point to create collections. This isn't intuitive when the database doesn't support a hierarchy of collections. I actually agree with Dare on this, Services tied to collections is too limiting. We need a cleaner distinction of database level services. I don't think all services should be database level, but the concept needs to exist.

My only argument is that Collection-level services are needed, and shouldn't be eliminated. I have no problem with adding Database level services.

--
Tom Bradford - http://www.tbradford.org
Apache Xindice (Native XML Database) - http://xml.apache.org
Project Labrador (Web Services Framework) - http://notdotnet.org
Re: Remarks about XML:DB API
--- Tom Bradford <[EMAIL PROTECTED]> wrote:
> On Thursday, February 7, 2002, at 02:09 PM, Kimbro Staken wrote:
> > The problem comes if there is no root collection. For instance I have an Oracle 9i impl where the collection hierarchy is flat. I had to synthesize a root collection in order to have a starting point to create collections.
> > This isn't intuitive when the database doesn't support a hierarchy of collections. I actually agree with Dare on this, Services tied to collections is too limiting. We need a cleaner distinction of database level services. I don't think all services should be database level, but the concept needs to exist.
>
> My only argument is that Collection-level services are needed, and shouldn't be eliminated. I have no problem with adding Database level services.

:) This can easily be supported by doing what I did with SiXDML. Just add getService(String, String) to the Database class.

=
LAWS OF COMPUTER PROGRAMMING, VIII
Any non-trivial program contains at least one bug.
http://www.25hoursaday.com
Carnage4Life (slashdot/advogato/kuro5hin)
Re: Remarks about XML:DB API
On Thursday, February 7, 2002, at 02:30 PM, Dare Obasanjo wrote:

> This can easily be supported by doing what I did with SiXDML. Just add getService(String, String) to the Database class.

Here's the problem with that though. Imagine you have a program that performs service requests in a generic fashion against Collections that are passed to it. Now furthermore, say you have two collections: one that is relationally mapped, the other native. Because of this, the Service may have to be implemented completely differently for each. When you request a Service of the same name, you'll be getting back the same interface, but with a different underlying implementation.

It's awkward enough that you'd have to query the Collection for its absolute path and then pass that absolute path to the Database to resolve the Service. On top of that, when you offload Service resolution responsibilities to the Database, you're asking it not only to get a Service, but to get a specific implementation based on the Collection name you're passing to it. That is more responsibility than the Database needs to handle, especially in a system where the collection structure is based on many heterogeneous data sources and implementations.

--
Tom Bradford - http://www.tbradford.org
Apache Xindice (Native XML Database) - http://xml.apache.org
Project Labrador (Web Services Framework) - http://notdotnet.org
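Tom's scenario, generic code that asks whatever Collection it is handed for a service and gets the same interface backed by different implementations, could be sketched as follows. The types are deliberately minimal stand-ins rather than the real API: Collection and QueryService model only what the argument needs, NativeCollection, RelationalCollection and GenericClient are hypothetical names, and the service name and version strings are example values.

```java
// Stand-in for a service interface; query() just returns a description here.
interface QueryService {
    String query(String expression);
}

// Stand-in for the API's Collection; only service lookup is modeled.
interface Collection {
    QueryService getService(String name, String version);
}

// Hypothetical collection stored natively as XML.
final class NativeCollection implements Collection {
    @Override
    public QueryService getService(String name, String version) {
        // The sketch ignores name and version and always returns its own implementation.
        return expression -> "native evaluation of " + expression;
    }
}

// Hypothetical collection mapped onto relational tables.
final class RelationalCollection implements Collection {
    @Override
    public QueryService getService(String name, String version) {
        return expression -> "SQL translation of " + expression;
    }
}

// Generic client: it neither knows nor cares which kind of collection it was given.
final class GenericClient {
    static String run(Collection collection, String expression) {
        QueryService service = collection.getService("XPathQueryService", "1.0");
        return service.query(expression);
    }
}
```

Because each Collection resolves its own service, the client needs no absolute path and no Database lookup; the choice of implementation stays with the object that knows how its data is stored.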
Re: Remarks about XML:DB API
On Thursday, February 7, 2002, at 02:40 PM, Tom Bradford wrote:

> On Thursday, February 7, 2002, at 02:30 PM, Dare Obasanjo wrote:
> > This can easily be supported by doing what I did with SiXDML. Just add getService(String, String) to the Database class.
>
> Here's the problem with that though. Imagine you have a program that performs service requests in a generic fashion against Collections that are passed to it. Now furthermore, say you have two collections, one is a collection that is relationally mapped, the other that is native. Because of this, the Service may have to be implemented completely differently. When you request a Service of the same name, you'll be getting back the same interface, but with a different underlying implementation. It's awkward enough that you'd have to query the Collection for its absolute path, and then pass that absolute path to the Database to resolve the Service, but add to that the fact that when you offload Service resolution responsibilities to the Database, you're asking it not only to get a Service, but to get a specific implementation based on the Collection name you're passing to it, which is more responsibility than the Database needs to handle, especially in a system where the collection structure is based on many heterogeneous data sources and implementations.

I don't think he was suggesting that this should be the only way to access collections, just an addendum. The one problem I do see with it is that it changes the concept of the Database. In the current API you shouldn't be using the database instance for anything beyond the initial setup. If we move logic like getService into it then you'll actually be using the Database instance in other places as well. Not a major problem, but not as simple as just adding one method. We'd probably need a method on Collection to return the Database instance.

Another option would be to change the getService method to enable specification of what scope the service applies to. I almost like that better: Collection.getService(name, version, scope), where scope is one of three values: database, collection, or hierarchy. These could be defined as constants in the Service interface. Hierarchy would apply to the collection and all children of the collection. Either way would work though.

--
Kimbro Staken
XML Database Software, Consulting and Writing
http://www.xmldatabases.org/
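Kimbro's three-argument getService could be sketched roughly as below. This is an illustration under assumptions, not a proposed signature: the constant names and values, the appliesTo method, and the ScopedService and ScopedCollection classes are all hypothetical, and collection paths are assumed to be written without a trailing slash.

```java
// Hypothetical scope constants, placed in the Service interface as the message suggests.
interface Service {
    int SCOPE_DATABASE = 0;   // the service operates on the whole database
    int SCOPE_COLLECTION = 1; // the service operates on the context collection only
    int SCOPE_HIERARCHY = 2;  // the context collection and all of its children

    // Hypothetical helper: does this service cover the given collection path?
    boolean appliesTo(String collectionPath);
}

// Hypothetical service that only remembers its context collection and scope.
final class ScopedService implements Service {
    private final String contextPath;
    private final int scope;

    ScopedService(String contextPath, int scope) {
        this.contextPath = contextPath;
        this.scope = scope;
    }

    @Override
    public boolean appliesTo(String collectionPath) {
        switch (scope) {
            case SCOPE_DATABASE:
                return true;
            case SCOPE_HIERARCHY:
                // Matches the context collection itself or any path below it.
                return collectionPath.equals(contextPath)
                        || collectionPath.startsWith(contextPath + "/");
            default:
                return collectionPath.equals(contextPath);
        }
    }
}

// Hypothetical collection exposing the three-argument lookup from the message.
final class ScopedCollection {
    private final String path;

    ScopedCollection(String path) {
        this.path = path;
    }

    // name and version are accepted but not interpreted in this sketch.
    Service getService(String name, String version, int scope) {
        return new ScopedService(path, scope);
    }
}
```

Under this reading, a service obtained from "/db/my_collection" with hierarchy scope would cover "/db/my_collection/xsl" but not "/db/schemas", while database scope would cover both, which speaks to the question Dare raised earlier in the thread.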
Re: Remarks about XML:DB API
On Thursday, February 7, 2002, at 05:38 AM, Arno de Quaasteniet wrote:

> Hi,
>
> Inspired by the SiXDML proposal I've been looking some more into the XML:DB API specification (since it's partially based on the XML:DB core API spec) and have a number of remarks about it, though I did not yet have time to read the specification thoroughly, so expect some more. Unfortunately I also didn't have enough time to think of alternatives to the things I have a problem with.
>
> Some general remarks:
> * Resource and Services are perfectly abstract names but it's hard to imagine for a user what they mean. I'm in favor of more specific names, to make it easier for users to imagine what they stand for (I only have to figure out what the right names would be).

I'd like to hear some suggestions as this is something we toiled over a fair bit in the beginning. However, I'll also say it hasn't really been a problem. We've had hundreds of people use the API through Xindice and the naming hasn't seemed to cause any confusion. In fact I'm kind of surprised at how easily people picked up on it.

> * As Dare Obasanjo already mentioned the tying of services to collections is not very practical. I think this is definitely something that should be changed.

Yes, we need some changes here.

> Interface specific remarks:
> Collection interface
> * I think the behavior and interface of the getServices method should be changed, because:
>   - Each instance of a service could possibly take up resources, in which case you would want to instantiate those services lazily whenever getService is called.
>   - It's not likely you need them all at once.
>   - If it's meant for checking the types of services supported by the collection (though personally I do not think that services should be coupled to collections at all) then it could return only the names of the services it supports.

We originally had a separate method to check for the existence of a service and it was decided later that it was not really necessary. Your point about the potential for heavy services is a valid one though, so you may be right that the mechanism needs to be refined.

> * I'm not quite sure about the use of getResourceCount/getChildCollectionCount, since in the case of X-Hive it involves counting the resources which of course has a bad performance characteristic.

Unfortunately the functionality is needed to build usable tools.

> CollectionManagementService interface
> * I think this interface is overkill, why not add the createCollection and removeCollection methods to the Collection interface? If not, should it then check if the collection it operates on is still open?

Not all databases can use that interface; it's too simplistic for something like Tamino where schemas are required. I added it just to have something that was usable for simple cases, so it's optional.

> ResourceSet interface
> * getResource(long item) will only have a good performance if there's a random access list behind the resource set.
> * getSize will only have a good performance if there's a list behind the resource set.

Optimize this and that's where you get competitive advantage. :-)

> When evaluating queries lazily (not always completely possible: for instance if the end result, or temporary results, need to be sorted), you typically do not want to gather results in a list, but return them one by one using an iterator. What you typically want to prevent is that users use code like this:
>
>     ResourceSet rs = ...;
>     for (long i = 0; i < rs.getSize(); i++) {
>         Resource r = rs.getResource(i);
>     }
>
> to iterate over the query results when the query is lazily evaluated. This would mean that the result set should first gather all the query results, which would essentially mean that the results are iterated twice (and you may not have enough working memory to get all the results from the database).

Again this is an implementation detail. There is no reason that the getSize operation has to be calculated from the contents of the result set. It could easily be provided by the database. Doing that would allow lazy retrieval of results.

> Of course these methods could still be useful when there's a list behind the resource set (for instance when the end result needed to be sorted); in those cases you can request the size without a performance penalty. So maybe some method should be added to see if the resource set is lazy or not?

What would be the use case for this?

> * getIterator returns a ResourceIterator. I'm more in favor of returning a java.util.Iterator (I don't see the cast that becomes necessary as a problem), and renaming the method to iterator() because that's more like other Java interfaces, though I understand that this is just a matter of taste, and having a separate interface for it could make porting the API to platforms other than Java easier.

As Tom already pointed out the API is intended to be as language independent as possible. This is a big source of compromises, i.e. things like error codes instead of a collection hierarchy