Re: [CODE4LIB] Q: what is the best open source native XML database

2010-01-17 Thread Andrew Nagy
I've had the best luck with eXist and BerkeleyDB XML.

Both support XQuery and have indexing features based on any XML structure.
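
To illustrate the kind of element and attribute lookup meant here, below is a minimal sketch using XPath, the path language that XQuery builds on, via the JDK's built-in javax.xml.xpath package. It is not eXist or BerkeleyDB XML code and it uses no index; the class name, the select helper, and the sample document are assumptions made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathLookup {

    // Evaluate an XPath expression against an XML string and return the
    // text content of every matching node. (Hypothetical helper.)
    public static List<String> select(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data with no schema or DTD.
        String xml = "<books>"
                + "<book lang=\"en\"><title>First</title></book>"
                + "<book lang=\"nl\"><title>Tweede</title></book>"
                + "</books>";
        // Select titles of books whose lang attribute is "nl".
        System.out.println(select(xml, "/books/book[@lang='nl']/title"));
    }
}
```

A native XML database would typically answer the same kind of path expression from its structural indexes rather than by parsing the document on every call.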

Andrew

On 1/16/10, Godmar Back god...@gmail.com wrote:
 Hi,

 we're currently looking for an XML database to store a variety of
 small-to-medium sized XML documents. The XML documents are
 unstructured in the sense that they do not follow a schema or DTD, and
 that their structure will be changing over time. We'll need to do
 efficient searching based on elements, attributes, and full text
 within text content. More importantly, the documents are mutable.
 We'd like to bring documents or fragments into memory in a DOM
 representation, manipulate them, then put them back into the database.
 Ideally, this should be done in a transaction-like manner. We need to
 efficiently serve document fragments over HTTP, ideally in a manner
 that allows for scaling through replication. We would prefer strong
 support for Java integration, but it's not a must.
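
The load, manipulate, and store-back cycle described above can be sketched with the JDK's standard DOM and transformer APIs alone. This is only an illustration under stated assumptions: the database is replaced by a plain string, no transaction handling is shown, and the class name, the addNote helper, and the sample record are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomRoundTrip {

    // Parse an XML string into an in-memory DOM tree.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Serialize a DOM tree back to a string, without the XML declaration.
    public static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Hypothetical edit: append a <note> child to the root element and
    // return the updated document as a string.
    public static String addNote(String xml, String text) throws Exception {
        Document doc = parse(xml);
        Element note = doc.createElement("note");
        note.setTextContent(text);
        doc.getDocumentElement().appendChild(note);
        return serialize(doc);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a document fetched from the database.
        String stored = "<record id=\"42\"><title>Example</title></record>";
        // The returned string is what would be written back to the store.
        System.out.println(addNote(stored, "reviewed"));
    }
}
```

In a real setup the string passed in and the string returned would be read from and written to the database inside whatever transaction mechanism the chosen system offers.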

 Have others encountered similar problems, and what have you been using?

 So far, we're researching: eXist-DB (http://exist.sourceforge.net/ ),
 Base-X (http://www.basex.org/ ), MonetDB/XQuery
 (http://www.monetdb.nl/XQuery/ ), Sedna
 (http://modis.ispras.ru/sedna/index.html ). Wikipedia lists a few
 others here: http://en.wikipedia.org/wiki/XML_database
 I'm wondering to what extent systems such as Lucene, or even digital
 object repositories such as Fedora could be coaxed into this usage
 scenario.

 Thanks for any insight you have or experience you can share.

  - Godmar


-- 
Sent from my mobile device


Re: [CODE4LIB] Q: what is the best open source native XML database

2010-01-17 Thread Patrick Hochstenbach
Depends on your data model, Godmar. You could also consider databases like
CouchDB. It is not XML, but it works if your data model fits into JSON.
Efficient serving of documents over HTTP is its trademark, as is scaling
through replication.

As for Lucene: CouchDB has Lucene integration, but I find it somewhat flaky.
In my case I ran batch index jobs over the database.

In another project we could (I don't say easily) fit the data model into
MySQL. Our developers could then reuse all the MySQL tools and scripts. The
sysadmin was happy.

So first consider whether XML is really needed throughout the whole
codebase. Are you working with textual documents in XML, or database dumps
in XML?

Best,
P@

Skype: patrick.hochstenbach
Patrick Hochstenbach   Software Architect
University Library +32(0)92647980
Ghent University * Rozier 9 * 9000 * Gent

