Re: Remarks about XML:DB API

2002-02-07 Thread Tom Bradford
On Thursday, February 7, 2002, at 05:38 AM, Arno de Quaasteniet wrote:
> Some general remarks:
> * Resource and Services are perfectly abstract names but it's hard to
> imagine for a user what they mean. I'm in favor of more specific names,
> to make it easier for users to imagine what they stand for (I only have
> to figure out what the right names would be).
A valid criticism, but it's a little late in the game for that.

> * As Dare Obasanjo already mentioned the tying of services to
> collections is not very practical. I think this is definitely something
> that should be changed.
I disagree.  A Service is a Collection augmentation mechanism and may be 
exposed with multiple implementations of the same interface depending on 
which collection it is augmenting, which would especially be important 
for servers whose underlying data model is aggregated from a variety of 
sources.  To say that it shouldn't be associated with individual 
collections makes it much more difficult to implement.

> Interface specific remarks:
>
> Collection interface
>
> * I think the behavior and interface of the getServices method should be
> changed, because:
> - Each instance of a service could possibly take up resources, in which
> case you would want to instantiate those services lazily whenever
> getService is called.
This is an implementation issue, and can be addressed specifically by
each vendor according to their own needs.  Just as your concern about where collection
associates should be...  There's nothing barring someone from having a 
single instance of the same service, and having the Collection 
implementations resolve to that same instance...  Or to lazily 
instantiate a service upon request.
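
To make the lazy-instantiation option concrete, here is a minimal Java sketch of a Collection implementation that creates a service on first request and hands back the same instance afterwards. The `Service` interface is a simplified stand-in, and `LazyServiceCollection`, `registerService` and `getServiceNames` are hypothetical names that are not part of the XML:DB API. Services are keyed by name only, whereas the real `getService` also takes a version, and the sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Simplified stand-in for the XML:DB Service interface (assumption).
interface Service {
    String getName();
}

// Hypothetical Collection implementation; not taken from any real driver.
class LazyServiceCollection {
    private final Map<String, Supplier<Service>> factories = new HashMap<>();
    private final Map<String, Service> instances = new HashMap<>();

    // Registering a factory costs nothing until the service is requested.
    void registerService(String name, Supplier<Service> factory) {
        factories.put(name, factory);
    }

    // Supported service names can be listed without creating any instance.
    Set<String> getServiceNames() {
        return new HashSet<>(factories.keySet());
    }

    // Creates the service on first request, then reuses that instance.
    // Returns null for an unknown name.
    Service getService(String name) {
        Service existing = instances.get(name);
        if (existing != null) {
            return existing;
        }
        Supplier<Service> factory = factories.get(name);
        if (factory == null) {
            return null;
        }
        Service created = factory.get();
        instances.put(name, created);
        return created;
    }
}
```

Listing names separately from instantiating services also covers the case where a caller only wants to know what a collection supports.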

> * I'm not quite sure about the use of
> getResourceCount/getChildCollectionCount, since in the case of X-Hive it
> involves counting the resources which of course has a bad performance
> characteristic.
Bad characteristic for X-Hive, but not other DBs.  This is not a problem 
for Xindice, and probably won't be a problem for systems like Tamino and 
many relationally mapped XML-DBs.  You can always throw an exception.

> CollectionManagementService interface
>
> * I think this interface is overkill, why not add the createCollection
> and removeCollection methods to the CollectionInterface? If not should
> it then check if the collection it operates on is still open?
This is a sticky issue no matter what, because the way most vendors 
implement collection management is different, though the way most people 
access the content of a collection is fairly consistent.  For example, 
how would you propose a generic way to create a collection based on 
relational mapping?  It's not very simple, which is why decoupling the
two functionalities and allowing a vendor to write a proprietary
collection management service if necessary seemed like the most
appropriate solution.

> * getIterator returns a ResourceIterator. I'm more in favor of returning
> a java.util.Iterator (I don't see the cast that becomes necessary as a
> problem), and renaming the method to iterator() because that's more like
> other Java interfaces, though I understand that this is just a matter of
> taste, and having an own interface for it could make porting the API to
> other platforms than Java easier.
The primary goal of the API is to be platform and language independent, 
which is why I'm sure people who are implementing the API in Python or 
C++ wouldn't have agreed with you.

> * The ResourceIterator interface
> If not replaced by java.util.Iterator I would prefer if this interface
> would have methods named next() and hasNext() instead of nextResource()
> and hasMoreResources().
Why not write an Adapter that implements the Java Iterator interface and 
wraps the ResourceIterator?
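
A minimal sketch of such an adapter might look like the following. `Resource` and `ResourceIterator` are simplified stand-ins declared here so the example compiles on its own; the real Java binding's iterator methods also declare a checked `XMLDBException`, which an adapter would have to wrap in an unchecked exception. `ResourceIteratorAdapter` and `ListResourceIterator` are hypothetical names.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Simplified stand-ins for the XML:DB types (assumption): the real
// ResourceIterator methods also declare a checked XMLDBException.
interface Resource {
    String getId();
}

interface ResourceIterator {
    boolean hasMoreResources();
    Resource nextResource();
}

// Adapter exposing a ResourceIterator through java.util.Iterator.
class ResourceIteratorAdapter implements Iterator<Resource> {
    private final ResourceIterator delegate;

    ResourceIteratorAdapter(ResourceIterator delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasMoreResources();
    }

    @Override
    public Resource next() {
        // java.util.Iterator requires NoSuchElementException when exhausted.
        if (!delegate.hasMoreResources()) {
            throw new NoSuchElementException();
        }
        return delegate.nextResource();
    }
}

// Hypothetical list-backed ResourceIterator, only here to drive the adapter.
class ListResourceIterator implements ResourceIterator {
    private final Iterator<String> ids;

    ListResourceIterator(List<String> ids) {
        this.ids = ids.iterator();
    }

    @Override
    public boolean hasMoreResources() {
        return ids.hasNext();
    }

    @Override
    public Resource nextResource() {
        String id = ids.next();
        return () -> id;
    }
}
```

Client code would then wrap whatever `getIterator()` returns, for example `new ResourceIteratorAdapter(rs.getIterator())`, and use `hasNext()`/`next()` without any change to the API itself.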

--
Tom Bradford - http://www.tbradford.org
Apache Xindice (Native XML Database) - http://xml.apache.org
Project Labrador (Web Services Framework) - http://notdotnet.org
--
Post a message: mailto:[EMAIL PROTECTED]
Unsubscribe:mailto:[EMAIL PROTECTED]
Contact administrator:  mailto:[EMAIL PROTECTED]
Read archived messages: http://archive.xmldb.org/
--


Remarks about XML:DB API

2002-02-07 Thread Arno de Quaasteniet
Hi,

Inspired by the SixDML proposal I've been looking some more into the
XML:DB API specification (since it's partially based on the XML:DB core
API spec) and have a number of remarks about it, though I did not yet have
time to read the specification thoroughly, so expect some more.
Unfortunately I also didn't have enough time to think of alternatives to the
things I have a problem with.

Some general remarks:
* Resource and Services are perfectly abstract names but it's hard to
imagine for a user what they mean. I'm in favor of more specific names,
to make it easier for users to imagine what they stand for (I only have
to figure out what the right names would be).
* As Dare Obasanjo already mentioned the tying of services to
collections is not very practical. I think this is definitely something
that should be changed.

Interface specific remarks:

Collection interface

* I think the behavior and interface of the getServices method should be
changed, because:
- Each instance of a service could possibly take up resources, in which
case you would want to instantiate those services lazily whenever
getService is called.
- It's not likely you need them all at once.
- If it's meant for checking the types of services supported by the
collection (though personally I do not think that services should be
coupled to collections at all) then it could return only the names of
the services it supports.
* I'm not quite sure about the use of
getResourceCount/getChildCollectionCount, since in the case of X-Hive it
involves counting the resources which of course has a bad performance
characteristic.

CollectionManagementService interface

* I think this interface is overkill, why not add the createCollection
and removeCollection methods to the CollectionInterface? If not should
it then check if the collection it operates on is still open?

ResourceSet interface

* getResource(long item) will only have a good performance if there's a
random access list behind the resource set.
* getSize will only have a good performance if there's a list behind the
resource set

When evaluating queries lazily (not always completely possible: for
instance if the end result, or temporary results need to be sorted), you
typically do not want to gather results in a list, but return them one
by one using an iterator.

What you typically want to prevent is that users use code like this:

ResourceSet rs = ...;
for (long i = 0; i < rs.getSize(); i++) {
    Resource r = rs.getResource(i);
}

to iterate over the query results when the query is lazily evaluated.
Because this would mean that the result set should first gather all the
query results which would essentially mean that the results are iterated
twice (and you may not have enough working memory to get all the results
from the database).

Though of course these methods could be useful when there's a list
behind the resource set (for instance when the end result needed to be
sorted); in those cases you can request the size without a performance
penalty.

So maybe some method should be added to see if the resourceset is lazy
or not?
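
The concern above can be sketched in Java as follows. This is only an illustration under stated assumptions: `LazyResourceSet` and `isLazy()` are hypothetical names, plain strings stand in for Resource objects, and a `java.util.Iterator` stands in for a database cursor. It shows that iterating keeps the set lazy, while `getSize()` or `getResource(long)` forces all results into a list first.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical lazily evaluated result set. Plain strings stand in for
// Resource objects and a java.util.Iterator stands in for a database cursor.
// The sketch assumes a set is read either through its iterator or by index,
// not both.
class LazyResourceSet {
    private final Iterator<String> cursor;
    private List<String> materialized; // stays null until a list is forced

    LazyResourceSet(Iterator<String> cursor) {
        this.cursor = cursor;
    }

    // Hypothetical method: true while no backing list has been built.
    boolean isLazy() {
        return materialized == null;
    }

    // Streams results one by one; nothing is gathered in memory.
    Iterator<String> getIterator() {
        return materialized != null ? materialized.iterator() : cursor;
    }

    // Both index-style methods force every result into a list first.
    long getSize() {
        materialize();
        return materialized.size();
    }

    String getResource(long index) {
        materialize();
        return materialized.get((int) index);
    }

    private void materialize() {
        if (materialized == null) {
            materialized = new ArrayList<>();
            cursor.forEachRemaining(materialized::add);
        }
    }
}
```

A caller could check `isLazy()` before choosing between the indexed loop and the iterator.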

* getIterator returns a ResourceIterator. I'm more in favor of returning
a java.util.Iterator (I don't see the cast that becomes necessary as a
problem), and renaming the method to iterator() because that's more like
other Java interfaces, though I understand that this is just a matter of
taste, and having an own interface for it could make porting the API to
other platforms than Java easier.

* The ResourceIterator interface  
If not replaced by java.util.Iterator I would prefer if this interface
would have methods named next() and hasNext() instead of nextResource()
and hasMoreResources().

And finally I have a question, is there a test suite that tests
conformance to the API?

Kind regards,

Arno de Quaasteniet
X-Hive Corporation
+31 (0)10 710 86 24
http://www.x-hive.com
[EMAIL PROTECTED]
 


Re: Remarks about XML:DB API

2002-02-07 Thread Dare Obasanjo

--- Tom Bradford <[EMAIL PROTECTED]> wrote:

> > * As Dare Obasanjo already mentioned the tying of services to
> > collections is not very practical. I think this is definitely something
> > that should be changed.
>
> I disagree.  A Service is a Collection augmentation mechanism and may be
> exposed with multiple implementations of the same interface depending on
> which collection it is augmenting, which would especially be important
> for servers whose underlying data model is aggregated from a variety of
> sources.  To say that it shouldn't be associated with individual
> collections makes it much more difficult to implement.
>

What you have described is an implementation detail
that should be hidden from the user. Secondly I'm not
even sure I understand what it means. However, I do
understand that to start a transaction, perform a
query or an update I need to first grab some
collection object and then grab a service object from
it. 

So if I grab the "/db/my_collection/xsl/" collection
and obtain a query service or transaction service,
does this mean that I can't use this object to start a
transaction or perform a query if I'll be performing
operations on the "/db/schemas/" collection?

If the answer to the above question is Yes, then this
is an issue that will cause user confusion and perhaps
errors (standardization on getting services from the
DB root would help but then defeats the purpose of
tying services to collections in the first place). 

If the answer is No, then there doesn't seem to be any
justification in tying services to collections. 

=
LAWS OF COMPUTER PROGRAMMING, VIII  
Any non-trivial program contains at least one bug. 
http://www.25hoursaday.com   
Carnage4Life (slashdot/advogato/kuro5hin)



Re: Remarks about XML:DB API

2002-02-07 Thread Tom Bradford
On Thursday, February 7, 2002, at 01:44 PM, Dare Obasanjo wrote:
> What you have described is an implementation detail
> that should be hidden from the user. Secondly I'm not
> even sure I understand what it means. However, I do
> understand that to start a transaction, perform a
> query or an update I need to first grab some
> collection object and then grab a service object from
> it.
> So if I grab the "/db/my_collection/xsl/" collection
> and obtain a query service or transaction service,
> does this mean that I can't use this object to start a
> transaction or perform a query if I'll be performing
> operations on the "/db/schemas/" collection?
Yes... and it shouldn't cause confusion because Services as they're 
implemented at the moment can't be repointed to other Collections.  To a 
Service, the Collection provides context.  It may be a starting context 
for recursive processing, or it may be a singular context... Depends on 
the nature of, and how the service is implemented.  There's nothing 
stopping someone from implementing a Service that is tied to the root 
Collection of the database and operates on the database as a whole, but 
not allowing the possibility of context would be too restrictive 
contextually, where naming and implementation flexibility are concerned.

--
Tom Bradford - http://www.tbradford.org
Apache Xindice (Native XML Database) - http://xml.apache.org
Project Labrador (Web Services Framework) - http://notdotnet.org


Re: Remarks about XML:DB API

2002-02-07 Thread Kimbro Staken
On Thursday, February 7, 2002, at 02:01 PM, Tom Bradford wrote:
> Yes... and it shouldn't cause confusion because Services as they're
> implemented at the moment can't be repointed to other Collections.  To a
> Service, the Collection provides context.  It may be a starting context
> for recursive processing, or it may be a singular context... Depends on
> the nature of, and how the service is implemented.  There's nothing
> stopping someone from implementing a Service that is tied to the root
> Collection of the database and operates on the database as a whole, but
> not allowing the possibility of context would be too restrictive
> contextually, where naming and implementation flexibility are concerned.

The problem comes if there is no root collection. For instance I have an
Oracle 9i impl where the collection hierarchy is flat. I had to synthesize
a root collection in order to have a starting point to create collections.
This isn't intuitive when the database doesn't support a hierarchy of
collections. I actually agree with Dare on this: tying Services to
collections is too limiting. We need a cleaner distinction of database
level services. I don't think all services should be database level, but
the concept needs to exist.


Kimbro Staken
XML Database Software, Consulting and Writing
http://www.xmldatabases.org/


Re: Remarks about XML:DB API

2002-02-07 Thread Tom Bradford
On Thursday, February 7, 2002, at 02:09 PM, Kimbro Staken wrote:
> The problem comes if there is no root collection. For instance I have
> an Oracle 9i impl where the collection hierarchy is flat. I had to
> synthesize a root collection in order to have a starting point to
> create collections.
> This isn't intuitive when the database doesn't support a hierarchy of
> collections. I actually agree with Dare on this, Services tied to
> collections is too limiting. We need a cleaner distinction of database
> level services. I don't think all services should be database level,
> but the concept needs to exist.
My only argument is that Collection-level services are needed, and 
shouldn't be eliminated.  I have no problem with adding Database level 
services.

--
Tom Bradford - http://www.tbradford.org
Apache Xindice (Native XML Database) - http://xml.apache.org
Project Labrador (Web Services Framework) - http://notdotnet.org


Re: Remarks about XML:DB API

2002-02-07 Thread Dare Obasanjo

--- Tom Bradford <[EMAIL PROTECTED]> wrote:
> On Thursday, February 7, 2002, at 02:09 PM, Kimbro Staken wrote:
> > The problem comes if there is no root collection. For instance I have
> > an Oracle 9i impl where the collection hierarchy is flat. I had to
> > synthesize a root collection in order to have a starting point to
> > create collections.
> > This isn't intuitive when the database doesn't support a hierarchy of
> > collections. I actually agree with Dare on this, Services tied to
> > collections is too limiting. We need a cleaner distinction of database
> > level services. I don't think all services should be database level,
> > but the concept needs to exist.
>
> My only argument is that Collection-level services are needed, and
> shouldn't be eliminated.  I have no problem with adding Database level
> services.

:) 

This can easily be supported by doing what I did with
SiXDML. Just add getService(String, String) to the
Database class. 
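
As a rough sketch of what that addition could look like, assuming the two strings are a service name and a version (mirroring `Collection.getService(name, version)`, which returns null when no matching service exists). The message does not spell out the parameters, so this reading is an assumption, and `SketchDatabase`, `registerService` and `NamedService` are hypothetical names, not the SiXDML or XML:DB classes.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the XML:DB Service interface (assumption).
interface Service {
    String getName();
    String getVersion();
}

// Hypothetical database-level registry; SketchDatabase is not a real class.
class SketchDatabase {
    private final Map<String, Service> services = new HashMap<>();

    void registerService(Service service) {
        services.put(key(service.getName(), service.getVersion()), service);
    }

    // Assumed to mirror Collection.getService(name, version): returns the
    // matching service, or null when none is registered.
    Service getService(String name, String version) {
        return services.get(key(name, version));
    }

    private static String key(String name, String version) {
        return name + "/" + version;
    }
}

// Minimal Service implementation used to exercise the registry.
class NamedService implements Service {
    private final String name;
    private final String version;

    NamedService(String name, String version) {
        this.name = name;
        this.version = version;
    }

    @Override
    public String getName() {
        return name;
    }

    @Override
    public String getVersion() {
        return version;
    }
}
```

Collection-level `getService` would stay as it is; the database-level lookup only adds a second entry point.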

=
LAWS OF COMPUTER PROGRAMMING, VIII  
Any non-trivial program contains at least one bug. 
http://www.25hoursaday.com   
Carnage4Life (slashdot/advogato/kuro5hin)



Re: Remarks about XML:DB API

2002-02-07 Thread Tom Bradford
On Thursday, February 7, 2002, at 02:30 PM, Dare Obasanjo wrote:
> This can easily be supported by doing what I did with
> SiXDML. Just add getService(String, String) to the
> Database class.
Here's the problem with that though.  Imagine you have a program that 
performs service requests in a generic fashion against Collections that 
are passed to it.  Now furthermore, say you have two collections, one is 
a collection that is relationally mapped, the other that is native.  
Because of this, the Service may have to be implemented completely 
differently.  When you request a Service of the same name, you'll be 
getting back the same interface, but with a different underlying 
implementation.

It's awkward enough that you'd have to query the Collection for its 
absolute path, and then pass that absolute path to the Database to 
resolve the Service, but add to that the fact that when you offload 
Service resolution responsibilities to the Database, you're asking it 
not only to get a Service, but to get a specific implementation based on 
the Collection name you're passing to it, which is more responsibility 
than the Database needs to handle, especially in a system where the 
collection structure is based on many heterogeneous data sources and 
implementations.
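
The scenario described above can be sketched in Java roughly as follows. All types are simplified stand-ins declared for the example: `XmlCollection` stands in for the API's Collection, and `QueryService`, `NativeCollection`, `RelationalCollection` and `GenericClient` are hypothetical. The point is only that the generic caller asks each collection for a service by name and gets the same interface backed by a different implementation.

```java
// Simplified stand-ins for the XML:DB types (assumption).
interface Service {
    String getName();
}

interface QueryService extends Service {
    String query(String xpath);
}

// Stands in for the API's Collection interface.
interface XmlCollection {
    Service getService(String name, String version);
}

// Hypothetical native store: answers the query directly.
class NativeCollection implements XmlCollection {
    @Override
    public Service getService(String name, String version) {
        if (!"XPathQueryService".equals(name)) {
            return null;
        }
        return new QueryService() {
            public String getName() { return name; }
            public String query(String xpath) { return "native:" + xpath; }
        };
    }
}

// Hypothetical relationally mapped store: same interface, different backend.
class RelationalCollection implements XmlCollection {
    @Override
    public Service getService(String name, String version) {
        if (!"XPathQueryService".equals(name)) {
            return null;
        }
        return new QueryService() {
            public String getName() { return name; }
            public String query(String xpath) { return "sql-mapped:" + xpath; }
        };
    }
}

// Generic code that works against whatever collection it is handed.
class GenericClient {
    static String run(XmlCollection collection, String xpath) {
        QueryService qs =
                (QueryService) collection.getService("XPathQueryService", "1.0");
        return qs.query(xpath);
    }
}
```

With resolution on the collection, `GenericClient.run` never needs to know which kind of store it was given or what its absolute path is.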

--
Tom Bradford - http://www.tbradford.org
Apache Xindice (Native XML Database) - http://xml.apache.org
Project Labrador (Web Services Framework) - http://notdotnet.org


Re: Remarks about XML:DB API

2002-02-07 Thread Kimbro Staken
On Thursday, February 7, 2002, at 02:40 PM, Tom Bradford wrote:

> On Thursday, February 7, 2002, at 02:30 PM, Dare Obasanjo wrote:
> > This can easily be supported by doing what I did with
> > SiXDML. Just add getService(String, String) to the
> > Database class.
>
> Here's the problem with that though.  Imagine you have a program that
> performs service requests in a generic fashion against Collections that
> are passed to it.  Now furthermore, say you have two collections, one is
> a collection that is relationally mapped, the other that is native.
> Because of this, the Service may have to be implemented completely
> differently.  When you request a Service of the same name, you'll be
> getting back the same interface, but with a different underlying
> implementation.
>
> It's awkward enough that you'd have to query the Collection for its
> absolute path, and then pass that absolute path to the Database to
> resolve the Service, but add to that the fact that when you offload
> Service resolution responsibilities to the Database, you're asking it not
> only to get a Service, but to get a specific implementation based on the
> Collection name you're passing to it, which is more responsibility than
> the Database needs to handle, especially in a system where the collection
> structure is based on many heterogeneous data sources and implementations.

I don't think he was suggesting that this should be the only way to access
collections, just an addendum.

The one problem I do see with it is that it changes the concept of the
Database. In the current API you shouldn't be using the database instance
for anything beyond the initial setup. If we move logic like getService
into it then you'll actually be using the Database instance in other
places as well. Not a major problem, but not as simple as just adding one
method. We'd probably need a method on Collection to return the Database
instance. Or another option would be to change the getService method to
enable specification of what scope the service applies to. I almost like
that better.

Collection.getService(name, version, scope) where scope is one of three
values, database, collection, or hierarchy. These could be defined as
constants in the Service interface. Hierarchy would apply to the
collection and all children of the collection.

Either way would work though.
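
A minimal Java sketch of the scope idea, under stated assumptions: the constant names `SCOPE_DATABASE`, `SCOPE_COLLECTION` and `SCOPE_HIERARCHY`, the `appliesTo` method, and the `ScopedService`/`ScopedCollection` classes are all hypothetical, and collection paths are assumed to be written without a trailing slash.

```java
// Simplified stand-in for Service carrying the proposed scope constants
// (constant and method names are hypothetical).
interface Service {
    int SCOPE_DATABASE = 0;
    int SCOPE_COLLECTION = 1;
    int SCOPE_HIERARCHY = 2;

    boolean appliesTo(String collectionPath);
}

// A service that remembers the collection it came from and its scope.
class ScopedService implements Service {
    private final String contextPath;
    private final int scope;

    ScopedService(String contextPath, int scope) {
        this.contextPath = contextPath;
        this.scope = scope;
    }

    @Override
    public boolean appliesTo(String collectionPath) {
        switch (scope) {
            case SCOPE_DATABASE:
                return true;
            case SCOPE_COLLECTION:
                return collectionPath.equals(contextPath);
            case SCOPE_HIERARCHY:
                // The collection itself plus everything beneath it.
                return collectionPath.equals(contextPath)
                        || collectionPath.startsWith(contextPath + "/");
            default:
                throw new IllegalStateException("unknown scope: " + scope);
        }
    }
}

// Hypothetical collection offering the three-argument getService.
class ScopedCollection {
    private final String path;

    ScopedCollection(String path) {
        this.path = path;
    }

    // name and version are ignored in this sketch; only scope matters here.
    Service getService(String name, String version, int scope) {
        return new ScopedService(path, scope);
    }
}
```

For example, a hierarchy-scoped service obtained from "/db/my_collection" would cover "/db/my_collection/xsl" but not "/db/schemas", while a database-scoped one would cover both.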


Kimbro Staken
XML Database Software, Consulting and Writing
http://www.xmldatabases.org/


Re: Remarks about XML:DB API

2002-02-07 Thread Kimbro Staken
On Thursday, February 7, 2002, at 05:38 AM, Arno de Quaasteniet wrote:

> Hi,
>
> Inspired by the SixDML proposal I've been looking some more into the
> XML:DB API specification (since it's partially based on the XML:DB core
> API spec) and have a number of remarks about it, though I did not yet have
> time to read the specification thoroughly, so expect some more.
> Unfortunately I also didn't have enough time to think of alternatives to the
> things I have a problem with.
> Some general remarks:
> * Resource and Services are perfectly abstract names but it's hard to
> imagine for a user what they mean. I'm in favor of more specific names,
> to make it easier for users to imagine what they stand for (I only have
> to figure out what the right names would be).
I'd like to hear some suggestions as this is something we toiled over a 
fair bit in the beginning. However, I'll also say it hasn't really been a 
problem. We've had hundreds of people use the API through Xindice and the 
naming hasn't seemed to cause any confusion. In fact I'm kind of surprised 
at how easily people picked up on it.

> * As Dare Obasanjo already mentioned the tying of services to
> collections is not very practical. I think this is definitely something
> that should be changed.
Yes, we need some changes here.

> Interface specific remarks:
>
> Collection interface
>
> * I think the behavior and interface of the getServices method should be
> changed, because:
> - Each instance of a service could possibly take up resources, in which
> case you would want to instantiate those services lazily whenever
> getService is called.
> - It's not likely you need them all at once.
> - If it's meant for checking the types of services supported by the
> collection (though personally I do not think that services should be
> coupled to collections at all) then it could return only the names of
> the services it supports.
We originally had a separate method to check for the existence of a 
service and it was decided later that it was not really necessary. Your 
point about the potential for heavy services is a valid one though so you 
may be right that the mechanism needs to be refined.

> * I'm not quite sure about the use of
> getResourceCount/getChildCollectionCount, since in the case of X-Hive it
> involves counting the resources which of course has a bad performance
> characteristic.
Unfortunately the functionality is needed to build usable tools.

> CollectionManagementService interface
>
> * I think this interface is overkill, why not add the createCollection
> and removeCollection methods to the CollectionInterface? If not should
> it then check if the collection it operates on is still open?
Not all databases can use that interface, it's too simplistic for 
something like Tamino where schemas are required. I added it just to have 
something that was usable for simple cases, so it's optional.

> ResourceSet interface
>
> * getResource(long item) will only have a good performance if there's a
> random access list behind the resource set.
> * getSize will only have a good performance if there's a list behind the
> resource set
Optimize this and that's where you get competitive advantage. :-)

> When evaluating queries lazily (not always completely possible: for
> instance if the end result, or temporary results need to be sorted), you
> typically do not want to gather results in a list, but return them one
> by one using an iterator.
> What you typically want to prevent is that users use code like this:
>
> ResourceSet rs = ...;
> for (long i = 0; i < rs.getSize(); i++) {
>     Resource r = rs.getResource(i);
> }
> to iterate over the query results when the query is lazily evaluated.
> Because this would mean that the result set should first gather all the
> query results which would essentially mean that the results are iterated
> twice (and you may not have enough working memory to get all the results
> from the database).
Again this is an implementation detail. There is no reason that the
getSize operation has to be calculated from the contents of the result set.
It could easily be provided by the database. Doing that would allow lazy
retrieval of results.
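
One way to picture this, as a sketch only: the result set asks the database for the count separately (for instance through a server-side count) and keeps fetching the results themselves lazily. `CountedLazyResultSet` and `fetchedSoFar()` are hypothetical names, strings stand in for Resource objects, a `LongSupplier` stands in for the database's count, and a `java.util.Iterator` stands in for the cursor.

```java
import java.util.Iterator;
import java.util.function.LongSupplier;

// Hypothetical result set whose size comes from the database rather than
// from counting fetched results. Strings stand in for Resource objects.
class CountedLazyResultSet {
    private final LongSupplier countQuery;  // stands in for a server-side count
    private final Iterator<String> cursor;  // stands in for a database cursor
    private long fetched;

    CountedLazyResultSet(LongSupplier countQuery, Iterator<String> cursor) {
        this.countQuery = countQuery;
        this.cursor = cursor;
    }

    // Answers from the database without pulling any result across.
    long getSize() {
        return countQuery.getAsLong();
    }

    boolean hasMoreResources() {
        return cursor.hasNext();
    }

    // Results are still fetched one at a time.
    String nextResource() {
        String next = cursor.next();
        fetched++;
        return next;
    }

    // Exposed only so the sketch can show that getSize() fetched nothing.
    long fetchedSoFar() {
        return fetched;
    }
}
```

Whether the count is actually cheap depends on the database, which is the performance concern raised in the original message.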

> Though of course these methods could be useful when there's a list
> behind the resource set (for instance when the end result needed to be
> sorted); in those cases you can request the size without a performance
> penalty.
> So maybe some method should be added to see if the resourceset is lazy
> or not?
What would be the use case for this?

> * getIterator returns a ResourceIterator. I'm more in favor of returning
> a java.util.Iterator (I don't see the cast that becomes necessary as a
> problem), and renaming the method to iterator() because that's more like
> other Java interfaces, though I understand that this is just a matter of
> taste, and having an own interface for it could make porting the API to
> other platforms than Java easier.
As Tom already pointed out the API is intended to be as language 
independent as possible. This is a big source of compromises, i.e. things 
like error codes instead of a collection hierarchy