xmldb query language was: Re: welcome

Jonathan Borden Sat, 30 Sep 2000 00:22:15 +0200

Tom

>
> A lot of the time, you'll want to issue a query that updates a lot of
> records at once.  Say you work at Blockbuster and it's time to issue
> free rentals to all of your customers who have been around for 2
> years...  Well, with the massive virtual document model, you seek to
> where the customers are, iterate over each one of them, and update their
> records if they're 2 years old, otherwise you skip them. If you've got a
> lot of customers, that's possibly a lot of customers to skip.   With an
> RDBMS, you'd issue a query that retrieves only the correct rows, and
> then updates those rows as a single isolated batch.  That level of
> performance can't be achieved if you're virtualizing the data as a
> single document in the background and then performing XSLT or some other
> transform against it.  Obviously, a query optimizer might do the job of
> narrowing the virtualized document behind your back, but I don't think
> the standard should depend on implicit optimizations.


    Agreed, but that's the reason why I proposed adding a path-prefix
attribute which explicitly sets the context set of the operation. I think
this is very similar to your source attribute, no? I think we are trying to
say the same thing in different ways.
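
    To make the path-prefix idea concrete, here is a minimal sketch in
Python using the standard xml.etree.ElementTree module. The element names,
the "years" and "free_rental" attributes, and the helper update_in_context
are all made up for illustration; none of this is part of the XUpdate
proposal. It does not claim RDBMS-level performance, it only shows the
prefix narrowing the context set so that unrelated parts of the document
are never visited.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository content: customers and suppliers side by side.
DOC = """<store>
  <customers>
    <customer name="ann" years="3"/>
    <customer name="bob" years="1"/>
  </customers>
  <suppliers>
    <supplier name="acme" years="5"/>
  </suppliers>
</store>"""


def update_in_context(root, path_prefix, select, update):
    """Apply `update` only to nodes found under `path_prefix` (hypothetical helper)."""
    updated = 0
    for context_node in root.findall(path_prefix):
        for node in context_node.findall(select):
            if update(node):
                updated += 1
    return updated


def grant_free_rental(customer):
    """Mark customers of two years or more; report whether the node changed."""
    if int(customer.get("years", "0")) >= 2:
        customer.set("free_rental", "yes")
        return True
    return False


root = ET.fromstring(DOC)
count = update_in_context(root, "customers", "customer", grant_free_rental)
print(count)
```

The operation itself ("customer" plus the update function) never mentions
where the customers live; only the prefix does.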

>
> >     Is this true? Let's generalize beyond XPath into XPointer which is
> > defined as a URI fragment identifier, or forget that and use virtual maps,
> > given the URI:
>
> The problem with this is the notion that there might actually be a way
> to drill directly to a document from any location.  The only way to
> access individual rows in a RDBMS table is to perform an isolating query
> against the table.  There is no requirement (nor should there be) in an
> RDBMS for an exposed ID that would let you directly refer to a single
> row in a table.  Primary Keys are a convention for retrieving a row by a
> unique ID but they aren't a storage requirement.  I think the same rule
> should apply to documents in a repository.

    The point I was trying to make is that typical static web page usage
associates a URI with a specific storage location, but this association is
not required. For example, the QUERY_STRING is essentially a fragment
identifier in the CGI syntax which can be used to build a SQL query. One can
even include a SQL statement directly as the fragment identifier, e.g.
http://example.org/foo#SELECT%20bar%20FROM%20table%20WHERE%20baz='bop'. In
general, an XPointer or XPath need not refer to a specific node, but rather
to a node set.
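
    As a rough illustration, the following Python sketch splits the
example URI above into the resource and the query carried in its fragment,
using only urllib.parse from the standard library. The variable names and
the tiny <table> document are my own assumptions. The second half shows a
path expression selecting a node set rather than a single node.

```python
from urllib.parse import urlsplit, unquote
import xml.etree.ElementTree as ET

uri = "http://example.org/foo#SELECT%20bar%20FROM%20table%20WHERE%20baz='bop'"

# Separate the resource from the fragment, then undo the percent-encoding.
parts = urlsplit(uri)
resource = f"{parts.scheme}://{parts.netloc}{parts.path}"
query_text = unquote(parts.fragment)

print(resource)
print(query_text)

# A path expression need not pick out one node: here it matches two rows.
table = ET.fromstring("<table><row baz='bop'/><row baz='bop'/><row baz='x'/></table>")
node_set = table.findall("row[@baz='bop']")
print(len(node_set))
```

Note that a web browser does not send the fragment to the server, so a
fragment of this kind would have to be interpreted on the client side.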

>
> No, what I mean is that if an XML repository manages many databases with
> many data sets of different kinds of XML data, especially data that may
> not be related, located, or accessible, then we shouldn't force the
> in-query location context to be the top level of the repository. Doing
> so implies a relationship and accessibility when there may actually be
> none.  I just think the best way to access a variety of potential data
> sources with a minimal amount of headache and mapping is to treat the
> data set and locations within those data sets separately.

    This stresses the importance of a "source" or "path-prefix" mechanism to
define the query/update context. This could be defined as a URI reference
whose syntax can be left up to the requirements of the data structure. I do
think it would be a good idea to leverage the URI addressing mechanism to
provide for distributed queries across the web.
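
    A minimal Python sketch of keeping the data set and the location
within it separate, while still writing both as one URI reference. The
SOURCES table, its URIs, and the query function are hypothetical stand-ins
for whatever a real repository would provide; the fragment syntax is
simply whatever the underlying data structure needs.

```python
from urllib.parse import urldefrag
import xml.etree.ElementTree as ET

# Hypothetical, unrelated data sets, each addressed by its own URI.
SOURCES = {
    "http://example.org/db/customers":
        "<customers><customer id='1'/><customer id='2'/></customers>",
    "http://example.org/db/titles":
        "<titles><title id='9'/></titles>",
}


def query(uri_ref):
    """Resolve the source from the URI, then apply the fragment as a path."""
    source, path = urldefrag(uri_ref)
    root = ET.fromstring(SOURCES[source])
    return root.findall(path)


customer_ids = [c.get("id") for c in query("http://example.org/db/customers#customer")]
title_ids = [t.get("id") for t in query("http://example.org/db/titles#title")]
print(customer_ids, title_ids)
```

Nothing here implies a common root above the two data sets; each query
names its own source.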

> 
> > Ok. but then what in particular are we to offer that, for example, is not
> > already offered by say Quilt? It was my thought that we might define common
> > interfaces for accessing XML data, my first thought is that this might be a
> > DOM or SAX interface, and provide common specifications for interfaces to
> > contexts, access control etc. From my readings, I see proposals for a number
> > of query languages, but not so many for update languages, so we thought this
> > might be a fruitful place to start.
>
> The first thing we'd be offering is a language that's not yet another
> language that looks like Perl, which is one of Quilt and XQL's primary
> failings.

:-))

> I think that's incredibly important, especially since most
> database developers will be expecting something similar to SQL.  We need
> to produce something that is as approachable as possible, and a lot of
> the query languages out there don't provide that.

Excellent goal. The current XUpdate proposal leverages XPath as its query
language and hence solves a different, also as yet unspecified, problem: we
need something beyond XSLT or raw DOM access using Java as a way to specify
modifications to XML documents/repositories.

You've identified the need for an XML query language. We might have a goal
that it have the expressive power of say Quilt or SQL as a basis, but use
XML syntax.

>
> As far as interfaces, I do think an XML database CLI would definitely be
> of great benefit to the community.  The biggest problem we face in that
> arena is that a lot of companies are recycling legacy data stores into
> XML databases, these may be RDBMSes, Object Databases, Content
> Management Systems, you name it.  There's nothing uniform in the way you
> access these different kinds of data stores, so creating a standard that
> will minimally support each is going to be tough.

    Perhaps we should take the DOM as a start (because many vendors have
already implemented wrappers for both) and think about what is missing.
Would it be too much of an imposition to require that result sets are
returned as DOM NodeLists as a starter?
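
    For reference, this is roughly what a NodeList result looks like to a
caller, sketched in Python with the standard xml.dom.minidom module. The
document content is invented, and getElementsByTagName merely stands in
for a real query; only the NodeList interface (length and item) is the
point.

```python
from xml.dom.minidom import parseString

# Invented data; a real database would produce this from a query.
doc = parseString(
    "<customers><customer name='ann'/><customer name='bob'/></customers>"
)

# The "result set" is a DOM NodeList.
result = doc.getElementsByTagName("customer")

# Walk it through the NodeList interface only: length and item(i).
names = [result.item(i).getAttribute("name") for i in range(result.length)]
print(names)
```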


> One of my thoughts all along for dbXML would be that the
> entire system is accessible from the root level, so in that sense, I
> really like your ideas, but I don't want to force my notion of an ideal
> data store on the community that will be implementing the standard that
> we generate here.
>

    True, but we need to think about what the minimal requirements for
calling something an XML database are. I would say that the query result
needs to at least be expressed either as a series of SAX events or as a DOM
NodeList. What should the requirements for the CLI look like?
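
    The SAX side of that requirement could look something like the
following Python sketch, which uses the standard xml.sax module. The
EventLog class and the literal result string are assumptions made for
illustration; the string stands in for whatever a database would emit in
answer to a query.

```python
import xml.sax


class EventLog(xml.sax.ContentHandler):
    """Record the SAX events of a query result in order (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))


log = EventLog()
# A literal result standing in for the output of a real query.
xml.sax.parseString(b"<result><customer name='ann'/></result>", log)

for event in log.events:
    print(event)
```

A consumer written against ContentHandler never needs the whole result in
memory, which is the main practical difference from the NodeList form.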

Jonathan Borden
The Open Healthcare Group
http://www.openhealth.org


