Hi Jonathan,

> I don't follow. Suppose you can have at most one data element having a
> particular constraint. You wish to either insert the element or replace the
> current one if it exists. How does this constrain the repository?
A lot of the time, you'll want to issue a query that updates a lot of records at once. Say you work at Blockbuster and it's time to issue free rentals to all of your customers who have been around for 2 years... Well, with the massive virtual document model, you seek to where the customers are, iterate over each one of them, update the records of those who have been around for 2 years, and skip the rest. If you've got a lot of customers, that's possibly a lot of customers to skip.

With an RDBMS, you'd issue a query that retrieves only the correct rows, and then updates those rows as a single isolated batch. That level of performance can't be achieved if you're virtualizing the data as a single document in the background and then performing XSLT or some other transform against it. Obviously, a query optimizer might do the job of narrowing the virtualized document behind your back, but I don't think the standard should depend on implicit optimizations.

> Is this true? Lets generalize beyond XPath into XPointer which is
> defined as a URI fragment identifier, or forget that and use virtual maps,
> given the URI:

The problem with this is the notion that there might actually be a way to drill directly to a document from any location. The only way to access individual rows in an RDBMS table is to perform an isolating query against the table. There is no requirement (nor should there be) in an RDBMS for an exposed ID that would let you directly refer to a single row in a table. Primary keys are a convention for retrieving a row by a unique ID, but they aren't a storage requirement. I think the same rule should apply to documents in a repository.

> what do you mean by 'expose', do we require that the data be exposed as
> XML? what if the data is binary, is this within our scope?
No, what I mean is that if an XML repository manages many databases with many data sets of different kinds of XML data, especially data that may not be related, co-located, or mutually accessible, then we shouldn't force the in-query location context to be the top level of the repository. Doing so implies a relationship and accessibility when there may actually be none. I just think the best way to access a variety of potential data sources with a minimal amount of headache and mapping is to treat the data sets and the locations within those data sets separately.

> Ok. but then what in particular are we to offer that, for example, is not
> already offered by say Quilt? It was my thought that we might define common
> interfaces for accessing XML data, my first thought is that this might be a
> DOM or SAX interface, and provide common specifications for interfaces to
> contexts, access control etc. From my readings, I see proposals for a number
> of query languages, but not so many for update languages, so we thought this
> might be a fruitful place to start.

The first thing we'd be offering is a language that's not yet another language that looks like Perl, which is one of Quilt's and XQL's primary failings. I think that's incredibly important, especially since most database developers will be expecting something similar to SQL. We need to produce something that is as approachable as possible, and a lot of the query languages out there don't provide that.

One of the biggest problems with XML standards has been that they're not very approachable for those who aren't intimately familiar with XML technologies. One could never expect a junior-level developer coming from the Microsoft SQL Server/IIS/ASP world to quickly grasp XQL, XPath, XPointer, Schemas, and especially XSLT. It's a barrier for a lot of developers who may not be quite so enlightened, and even though these technologies are incredibly powerful, that power gets lost in the confusion of how to actually use them.
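As a rough illustration of the Blockbuster example above (not anything dbXML or a proposed standard actually does), here is a minimal Python sketch contrasting the two update styles. The customer data, the `CUTOFF_YEAR` value, and all function names are made up for illustration; `xml.etree.ElementTree` and `sqlite3` simply stand in for a virtualized document and an RDBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data: (name, year the customer joined).
CUSTOMERS = [("alice", 1996), ("bob", 1999), ("carol", 1997)]
CUTOFF_YEAR = 1998  # assumption: joined in or before this year means "2+ years"


def build_document():
    """Build a single in-memory document holding every customer."""
    root = ET.Element("customers")
    for name, joined in CUSTOMERS:
        ET.SubElement(root, "customer", name=name,
                      joined=str(joined), free_rentals="0")
    return root


def build_table():
    """Build an in-memory SQL table holding the same customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers "
                 "(name TEXT, joined INTEGER, free_rentals INTEGER DEFAULT 0)")
    conn.executemany("INSERT INTO customers (name, joined) VALUES (?, ?)",
                     CUSTOMERS)
    return conn


def update_by_iteration(doc):
    """Document style: visit every customer and skip those who don't qualify.

    Returns the number of customers visited, qualifying or not.
    """
    visited = 0
    for cust in doc.iter("customer"):
        visited += 1
        if int(cust.get("joined")) <= CUTOFF_YEAR:
            cust.set("free_rentals", "1")
    return visited


def update_by_query(conn):
    """RDBMS style: one statement selects and updates only the matching rows.

    Returns the number of rows the statement changed.
    """
    cur = conn.execute(
        "UPDATE customers SET free_rentals = 1 WHERE joined <= ?",
        (CUTOFF_YEAR,),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    print("customers visited by iteration:", update_by_iteration(build_document()))
    print("rows changed by query:", update_by_query(build_table()))
```

The calling code in the first style has to touch every customer, while in the second it only states the condition and leaves the narrowing to the engine. Whether the engine actually avoids a full scan depends on indexes and its planner, which this sketch does not try to show.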
As far as interfaces go, I do think an XML database CLI would definitely be of great benefit to the community. The biggest problem we face in that arena is that a lot of companies are recycling legacy data stores into XML databases; these may be RDBMSes, Object Databases, Content Management Systems, you name it. There's nothing uniform in the way you access these different kinds of data stores, so creating a standard that will minimally support each is going to be tough.

I'm not trying to be a pain in the ass here, just trying to play devil's advocate. I know that these concerns will be raised at some point in the future, and so we may as well shoot them down now. The schema that I wrote is about an hour of work total. It's nothing I'm proud of; it's just an idea. One of my thoughts all along for dbXML has been that the entire system is accessible from the root level, so in that sense, I really like your ideas, but I don't want to force my notion of an ideal data store on the community that will be implementing the standard that we generate here.

--Tom

--
<name>Tom Bradford</name>
<title>Chief Software Architect</title>
<company>The dbXML Group</company>
<phone>(480) 421-1233</phone>
