Hi Jonathan,

> I don't follow. Suppose you can have at most one data element having a
> particular constraint. You wish to either insert the element or replace the
> current one if it exists. How does this constrain the repository?

A lot of the time, you'll want to issue a query that updates a lot of
records at once.  Say you work at Blockbuster and it's time to issue
free rentals to all of your customers who have been around for 2
years...  With the massive virtual document model, you seek to
where the customers are, iterate over each one of them, and update their
records if they've been around for 2 years; otherwise you skip them.  If
you've got a lot of customers, that's possibly a lot of customers to
skip.  With an RDBMS, you'd issue a query that retrieves only the correct
rows and then updates those rows as a single isolated batch.  That level
of performance can't be achieved if you're virtualizing the data as a
single document in the background and then performing XSLT or some other
transform against it.  Obviously, a query optimizer might do the job of
narrowing the virtualized document behind your back, but I don't think
the standard should depend on implicit optimizations.
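
To make the contrast concrete, here is a rough sketch of the two update
styles.  Everything in it is an assumption made up for illustration: the
customer data, the cutoff date, the element and column names, and the
use of Python's ElementTree and sqlite3 as stand-ins for a virtual
document and an RDBMS.  It shows the shape of the two approaches, not
measured performance.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data: (name, date joined as an ISO string).
CUSTOMERS = [
    ("alice", "1997-03-15"),
    ("bob", "1999-11-02"),
    ("carol", "1998-01-20"),
    ("dave", "2000-02-10"),
]
CUTOFF = "1998-06-01"  # assumed "two years ago"; ISO dates compare as strings


def build_document():
    """Build the customers as one in-memory XML document."""
    root = ET.Element("customers")
    for name, joined in CUSTOMERS:
        ET.SubElement(root, "customer", name=name, joined=joined,
                      free_rentals="0")
    return root


def update_document(root, cutoff):
    """Document style: visit every customer and skip the ones that don't qualify."""
    visited = updated = 0
    for customer in root.iter("customer"):
        visited += 1
        if customer.get("joined") <= cutoff:
            rentals = int(customer.get("free_rentals"))
            customer.set("free_rentals", str(rentals + 1))
            updated += 1
    return visited, updated


def build_table():
    """Load the same customers into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers "
        "(name TEXT, joined TEXT, free_rentals INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO customers (name, joined) VALUES (?, ?)", CUSTOMERS
    )
    conn.commit()
    return conn


def update_table(conn, cutoff):
    """RDBMS style: one statement selects and updates the matching rows together."""
    with conn:  # commits the batch as a single transaction
        cur = conn.execute(
            "UPDATE customers SET free_rentals = free_rentals + 1 "
            "WHERE joined <= ?",
            (cutoff,),
        )
    return cur.rowcount
```

The document version has to touch every customer to find the ones that
qualify, while the table version only states the condition and leaves
the row selection to the engine.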

>     Is this true? Lets generalize beyond XPath into XPointer which is
> defined as a URI fragment identifier, or forget that and use virtual maps,
> given the URI:

The problem with this is the notion that there might actually be a way
to drill directly to a document from any location.  The only way to
access individual rows in an RDBMS table is to perform an isolating query
against the table.  There is no requirement (nor should there be) in an
RDBMS for an exposed ID that would let you directly refer to a single
row in a table.  Primary keys are a convention for retrieving a row by a
unique ID, but they aren't a storage requirement.  I think the same rule
should apply to documents in a repository.

>     what do you mean by 'expose', do we require that the data be exposed as
> XML? what if the data is binary, is this within our scope?

No, what I mean is that if an XML repository manages many databases with
many data sets of different kinds of XML data, especially data that may
not be related, co-located, or even accessible, then we shouldn't force
the in-query location context to be the top level of the repository.
Doing so implies a relationship and accessibility when there may actually
be none.  I just think the best way to access a variety of potential data
sources with a minimal amount of headache and mapping is to treat the
data set and the locations within that data set separately.

> Ok. but then what in particular are we to offer that, for example, is not
> already offered by say Quilt? It was my thought that we might define common
> interfaces for accessing XML data, my first thought is that this might be a
> DOM or SAX interface, and provide common specifications for interfaces to
> contexts, access control etc. From my readings, I see proposals for a number
> of query languages, but not so many for update languages, so we thought this
> might be a fruitful place to start.

The first thing we'd be offering is a language that's not yet another
language that looks like Perl, which is one of Quilt's and XQL's primary
failings.  I think that's incredibly important, especially since most
database developers will be expecting something similar to SQL.  We need
to produce something that is as approachable as possible, and a lot of
the query languages out there don't provide that.

One of the biggest problems with XML standards has been that they're
not very approachable by those who aren't intimately familiar with XML
technologies in general.  One could never expect a junior-level
developer coming from the Microsoft SQL Server/IIS/ASP world to quickly
grasp XQL, XPath, XPointer, Schemas, and especially XSLT.  It's a
barrier for a lot of developers who may not be quite so enlightened, and
even though these technologies are incredibly powerful, that power gets
lost in the confusion of how to actually use them.

As far as interfaces go, I do think an XML database CLI would be of
great benefit to the community.  The biggest problem we face in that
arena is that a lot of companies are recycling legacy data stores into
XML databases.  These may be RDBMSes, Object Databases, Content
Management Systems, you name it.  There's nothing uniform in the way you
access these different kinds of data stores, so creating a standard that
will minimally support each is going to be tough.

I'm not trying to be a pain in the ass here, just trying to play devil's
advocate.  I know that these concerns will be raised at some point in
the future, so we may as well shoot them down now.  The schema that
I wrote is about an hour of work total.  It's nothing I'm proud of; it's
just an idea.  One of my thoughts all along for dbXML has been that the
entire system is accessible from the root level, so in that sense I
really like your ideas, but I don't want to force my notion of an ideal
data store on the community that will be implementing the standard that
we generate here.

--Tom

-- 
<name>Tom Bradford</name>
<title>Chief Software Architect</title>
<company>The dbXML Group</company>
<phone>(480) 421-1233</phone>
