Re: [CODE4LIB] Q: what is the best open source native XML database
BaseX is actively developed (6.0 came out about two weeks ago), but I understand your concern. It seems like they are moving towards building more of a community around it (mailing lists and such), but yes, the core is pretty much the university team.

eXist has more plug-ins and specialty features than BaseX. BaseX has XQuery Full Text search and much, much faster query speed.

As far as putting your eggs in a basket, I'm pretty sure that if you're looking to base your project around XML databases, you're already putting your eggs in a basket of some form...

-Sean

On Jan 19, 2010, at 1:00 PM, Godmar Back wrote:
> What about the relative maturity/functionality of eXist vs BaseX? I'm
> a bit skeptical about putting my eggs in a university project basket not
> backed by a continuous revenue stream (... did I just say that out
> loud?)
>
> - Godmar
Re: [CODE4LIB] Q: what is the best open source native XML database
Godmar,

We're using eXist for a couple of apps here, and like it quite a bit. The full text search extensions in the 1.4 release are backed by Lucene, and it's pretty quick once you've tuned it and set up the indexing properly (try some searches at http://diglib.princeton.edu/ead/, which is running on a beta of 1.4). Performance will not be good until you've configured some indexes and tweaked the JVM settings. There is a bit of a learning curve involved here, but the documentation is decent, and the community and developers are quite active and accessible.

You can GET, PUT, and DELETE documents very easily, or POST XQueries to get fragments. You can also GET fragments or documents by supplying parameters to an XQuery stored in the database; they call this their "REST-style API"[1]. There are a few other ways to get content in and out[2], and Java integration isn't a problem via the XML:DB API[3]. You can also write extension modules in Java.

-Jon

1. http://exist.sourceforge.net/devguide_rest.html
2. http://exist.sourceforge.net/devguide.html
3. http://exist.sourceforge.net/devguide_xmldb.html

On 01/16/2010 11:15 AM, Godmar Back wrote:
> Hi,
>
> we're currently looking for an XML database to store a variety of
> small-to-medium sized XML documents. The XML documents are
> unstructured in the sense that they do not follow a schema or DTD, and
> that their structure will be changing over time. We'll need to do
> efficient searching based on elements, attributes, and full text
> within text content. More importantly, the documents are mutable.
> We'd like to bring documents or fragments into memory in a DOM
> representation, manipulate them, then put them back into the database.
> Ideally, this should be done in a transaction-like manner. We need to
> efficiently serve document fragments over HTTP, ideally in a manner
> that allows for scaling through replication. We would prefer strong
> support for Java integration, but it's not a must.
>
> Have others encountered similar problems, and what have you been using?
>
> So far, we're researching: eXist-DB (http://exist.sourceforge.net/ ),
> Base-X (http://www.basex.org/ ), MonetDB/XQuery
> (http://www.monetdb.nl/XQuery/ ), Sedna
> (http://modis.ispras.ru/sedna/index.html ). Wikipedia lists a few
> others here: http://en.wikipedia.org/wiki/XML_database
> I'm wondering to what extent systems such as Lucene, or even digital
> object repositories such as Fedora could be coaxed into this usage
> scenario.
>
> Thanks for any insight you have or experience you can share.
>
> - Godmar

--
Jon Stroop
Metadata Analyst
C-17-D2 Firestone Library
Princeton University
Princeton, NJ 08544

Email: jstr...@princeton.edu
Phone: (609)258-0059
Fax: (609)258-0441

http://diglib.princeton.edu
http://diglib.princeton.edu/ead
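To make the REST-style access Jon describes concrete, here is a minimal Python sketch (standard library only) of the read-modify-write cycle Godmar asked about: fetch a document with GET, edit it as a DOM, and store it back with PUT. The base URL `http://localhost:8080/exist/rest`, the `/db/test` collection, and every function name here are assumptions for illustration. The `_query` request parameter follows eXist's REST developer guide, but check it against the version you actually run. The functions only build requests; nothing is sent.

```python
# Sketch only: builds eXist REST requests, does not send them.
# ASSUMPTIONS (not from the thread): server at localhost:8080, REST servlet
# mounted at /exist/rest, documents stored under /db/test.
from urllib.parse import quote, urlencode
from urllib.request import Request
from xml.dom import minidom

BASE = "http://localhost:8080/exist/rest"  # assumed mount point


def doc_url(path: str) -> str:
    """URL for a stored document or collection, e.g. '/db/test/item1.xml'."""
    return BASE + quote(path)


def get_request(path: str) -> Request:
    """GET retrieves the stored document."""
    return Request(doc_url(path), method="GET")


def put_request(path: str, xml_text: str) -> Request:
    """PUT stores the document at `path`, replacing any existing copy."""
    return Request(
        doc_url(path),
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="PUT",
    )


def delete_request(path: str) -> Request:
    """DELETE removes the document at `path`."""
    return Request(doc_url(path), method="DELETE")


def query_request(collection: str, xquery: str) -> Request:
    """GET with a _query parameter evaluates an ad hoc XQuery in `collection`."""
    url = doc_url(collection) + "?" + urlencode({"_query": xquery})
    return Request(url, method="GET")


def set_title(xml_text: str, new_title: str) -> str:
    """Parse into a DOM, replace the first <title>'s content, re-serialize."""
    dom = minidom.parseString(xml_text)
    title = dom.getElementsByTagName("title")[0]
    while title.firstChild is not None:  # drop the old text (and any children)
        title.removeChild(title.firstChild)
    title.appendChild(dom.createTextNode(new_title))
    return dom.documentElement.toxml()
```

Against a running server you would pass these requests to `urllib.request.urlopen`. Note that the GET/edit/PUT sequence is not atomic: another client can overwrite the document between your GET and your PUT, so it is only "transaction-like" if something else serializes the writers.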
Re: [CODE4LIB] Q: what is the best open source native XML database
On Tue, Jan 19, 2010 at 10:09 AM, Sean Hannan wrote:
> I've had the best experience (query speed, primarily) with BaseX. This was
> primarily for large XML document processing, so I'm not sure how much it will
> satisfy your transactional needs.
>
> I was initially using eXist, and then switched over to BaseX because the
> speed gains were very noticeable.
>

What about the relative maturity/functionality of eXist vs BaseX? I'm a bit skeptical about putting my eggs in a university project basket not backed by a continuous revenue stream (... did I just say that out loud?)

- Godmar
Re: [CODE4LIB] Q: what is the best open source native XML database
I've had the best experience (query speed, primarily) with BaseX. This was primarily for large XML document processing, so I'm not sure how much it will satisfy your transactional needs.

I was initially using eXist, and then switched over to BaseX because the speed gains were very noticeable.

-Sean
Re: [CODE4LIB] Q: what is the best open source native XML database
Hey Godmar,

I'd definitely consider CouchDB, as Patrick mentioned. It's a "schema-free" JSON document database, and replication is its greatest strength.

It does have Lucene integration:
http://github.com/rnewson/couchdb-lucene

Paul J. Davis of the core CouchDB team has a nice write-up:
http://www.davispj.com/2009/01/18/couchdb-lucene-indexing.html

There's also some Solr integration available:
http://github.com/deguzman/couchdb-solr2

From what you've described, CouchDB would be a great choice for your application.

Hope that's helpful, Godmar.

Benjamin

--
President
BigBlueHat
P: 864.232.9553
W: http://www.bigbluehat.com/
http://www.linkedin.com/in/benjaminyoung
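A similar sketch for CouchDB, since Benjamin's point is that both document access and replication are plain HTTP with JSON bodies. The host, CouchDB's default port 5984, the database and document names, and the replica URL are assumptions for illustration; the requests are built but not sent. The `_rev` handling shows how CouchDB's multi-version concurrency control gives a transaction-like safeguard for single-document updates.

```python
# Sketch only: builds CouchDB HTTP requests, does not send them.
# ASSUMPTIONS (not from the thread): CouchDB on localhost:5984 (its default
# port); database, document, and replica names are made up for illustration.
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

COUCH = "http://localhost:5984"


def _json_request(url: str, body: dict, method: str) -> Request:
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def put_doc(db: str, doc_id: str, doc: dict, rev: Optional[str] = None) -> Request:
    """Create a document, or update it when `rev` (its current _rev) is given.

    CouchDB rejects an update carrying a stale _rev with 409 Conflict
    instead of silently overwriting newer data.
    """
    body = dict(doc)
    if rev is not None:
        body["_rev"] = rev
    url = f"{COUCH}/{quote(db, safe='')}/{quote(doc_id, safe='')}"
    return _json_request(url, body, "PUT")


def replicate(source: str, target: str, continuous: bool = False) -> Request:
    """Ask the server to copy changes from `source` to `target` (names or URLs)."""
    body = {"source": source, "target": target}
    if continuous:
        body["continuous"] = True
    return _json_request(f"{COUCH}/_replicate", body, "POST")
```

On a 409 response, the usual pattern is to GET the document again, reapply the change to the fresh copy, and retry the PUT with the new `_rev`.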
Re: [CODE4LIB] Q: what is the best open source native XML database
Depends on your data model, Godmar. You could also consider databases like CouchDB. It's not XML, but it could work if your data model fits into JSON. Efficient serving of documents over HTTP is their trademark, as is scaling through replication.

As for Lucene: CouchDB has Lucene integration, but I find it somewhat flaky. In my case I ran batch indexing jobs over the database.

In another project we were able (I won't say easily) to fit the data model into MySQL. Our developers could then reuse all the MySQL tools and scripts, and the sysadmin was happy.

So first consider whether XML is really needed throughout the whole codebase. Are you working with textual documents in XML, or database dumps in XML?

Best,
P@

Skype: patrick.hochstenbach
Patrick Hochstenbach
Software Architect
University Library
+32(0)92647980
Ghent University * Rozier 9 * 9000 * Gent

-----Original Message-----
From: Code for Libraries on behalf of Andrew Nagy
Sent: Mon 18-1-2010 1:28
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Q: what is the best open source native XML database

I've had the best luck with eXist and Berkeley DB XML. Both support XQuery and have indexing features based on any XML structure.

Andrew
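Patrick's question of whether the data model "fits into JSON" can be checked quickly on sample records. The Python sketch below uses one ad hoc convention, an assumption rather than any standard: attributes become "@name" keys, element text becomes "#text", and repeated child elements become lists. It deliberately drops mixed content (text interleaved with child elements), which is exactly the kind of XML that does not map cleanly onto a JSON document store.

```python
# Sketch of one possible XML -> JSON mapping (ad hoc convention, not a standard).
# Mixed content (tail text after child elements) is intentionally discarded.
import json
import xml.etree.ElementTree as ET


def element_to_json(elem: ET.Element):
    """Return a str for plain leaf elements, otherwise a dict."""
    node = {f"@{k}": v for k, v in elem.attrib.items()}
    text = (elem.text or "").strip()
    children = list(elem)
    if not children and not node:
        return text  # a leaf without attributes collapses to its text
    if text:
        node["#text"] = text
    for child in children:
        value = element_to_json(child)
        if child.tag in node:
            # a repeated tag turns the entry into a list of values
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node


def xml_to_json(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_json(root)}, sort_keys=True)
```

If a representative sample round-trips through a mapping like this without losing anything you care about, a JSON store such as CouchDB is a realistic option; if the documents are mostly marked-up running text, an XML database is probably the better fit.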
Re: [CODE4LIB] Q: what is the best open source native XML database
I've had the best luck with eXist and Berkeley DB XML. Both support XQuery and have indexing features based on any XML structure.

Andrew