Re: [CODE4LIB] Q: what is the best open source native XML database

2010-01-20 Thread Sean Hannan

BaseX is actively developed (6.0 came out about two weeks ago), but I 
understand your concern.  It seems like they are moving towards building more 
of a community around it (mailing lists and such), but yes, the core is pretty 
much the university team. 

eXist has more plug-ins and specialty features than BaseX. BaseX has XQuery 
full text search and much, much faster query speeds.

As far as putting your eggs in a basket, I'm pretty sure that if you're looking 
to base your project around XML databases, you're already putting your eggs in 
a basket of some form...

-Sean

On Jan 19, 2010, at 1:00 PM, Godmar Back wrote:

> On Tue, Jan 19, 2010 at 10:09 AM, Sean Hannan  wrote:
>> I've had the best experience (query speed, primarily) with BaseX.  This was 
>> primarily for large XML document processing, so I'm not sure how much it 
>> will satisfy your transactional needs.
>> 
>> I was initially using eXist, and then switched over to BaseX because the 
>> speed gains were very noticeable.
>> 
> 
> What about the relative maturity/functionality of eXist vs BaseX? I'm
> a bit skeptical about putting my eggs in a university project basket
> not backed by a continuous revenue stream (... did I just say that out
> loud?)
> 
> - Godmar

Re: [CODE4LIB] Q: what is the best open source native XML database

2010-01-19 Thread Jon Stroop


Godmar,
We're using eXist for a couple of apps here, and like it quite a bit.

The full text search extensions in the 1.4 release are backed by Lucene, 
and it's pretty quick once you've tuned it (try some searches here: 
http://diglib.princeton.edu/ead/ -- this is running on a beta of 1.4) 
and set up the indexing properly. Performance will not be good until 
you've configured some indexes and tweaked the JVM settings. There is a 
bit of a learning curve involved here, but the documentation is decent, 
and the community and developers are quite active and accessible.


You can GET and PUT and DELETE documents very easily, or POST xqueries 
to get fragments.  You can also GET fragments or documents by supplying 
parameters to an xquery stored in the database--they call this their 
"REST-style API"[1].  There are a few other ways to get content in and 
out[2], and Java integration isn't a problem via the xml:db API[3].  You 
can also write extension modules in Java.
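
A minimal sketch of the REST-style calls described above, in Python. The
base URL, the /db paths, and the helper names are assumptions made for
illustration (a default local install is assumed); the POST body uses the
query wrapper element from the eXist REST documentation. The functions only
build the requests; pass one to urllib.request.urlopen() to actually send it.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import escape

# Assumption: a local eXist instance exposing REST under /exist/rest.
BASE = "http://localhost:8080/exist/rest"


def put_document(path, xml_text):
    """Build a PUT request that stores xml_text at the given database path."""
    return urllib.request.Request(
        BASE + path,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="PUT",
    )


def get_document(path):
    """Build a GET request that fetches the document at path."""
    return urllib.request.Request(BASE + path, method="GET")


def delete_document(path):
    """Build a DELETE request that removes the document at path."""
    return urllib.request.Request(BASE + path, method="DELETE")


def post_xquery(path, xquery):
    """Build a POST request that runs an ad hoc XQuery against path."""
    body = (
        '<query xmlns="http://exist.sourceforge.net/NS/exist">'
        "<text>" + escape(xquery) + "</text></query>"
    )
    return urllib.request.Request(
        BASE + path,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def get_stored_query(path, params):
    """Build a GET request for an XQuery stored in the database,
    passing params as URL query parameters."""
    return urllib.request.Request(
        BASE + path + "?" + urllib.parse.urlencode(params), method="GET"
    )
```

Building the request separately from sending it keeps the sketch easy to
inspect without a running server.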


-Jon

1. http://exist.sourceforge.net/devguide_rest.html
2. http://exist.sourceforge.net/devguide.html
3. http://exist.sourceforge.net/devguide_xmldb.html


On 01/16/2010 11:15 AM, Godmar Back wrote:

Hi,

we're currently looking for an XML database to store a variety of
small-to-medium sized XML documents. The XML documents are
unstructured in the sense that they do not follow a schema or DTD, and
that their structure will be changing over time. We'll need to do
efficient searching based on elements, attributes, and full text
within text content. More importantly, the documents are mutable.
We'd like to bring documents or fragments into memory in a DOM
representation, manipulate them, then put them back into the database.
Ideally, this should be done in a transaction-like manner. We need to
efficiently serve document fragments over HTTP, ideally in a manner
that allows for scaling through replication. We would prefer strong
support for Java integration, but it's not a must.

Have others encountered similar problems, and what have you been using?

So far, we're researching: eXist-DB (http://exist.sourceforge.net/ ),
Base-X (http://www.basex.org/ ), MonetDB/XQuery
(http://www.monetdb.nl/XQuery/ ), Sedna
(http://modis.ispras.ru/sedna/index.html ). Wikipedia lists a few
others here: http://en.wikipedia.org/wiki/XML_database
I'm wondering to what extent systems such as Lucene, or even digital
object repositories such as Fedora could be coaxed into this usage
scenario.

Thanks for any insight you have or experience you can share.

  - Godmar
   


--
Jon Stroop
Metadata Analyst
C-17-D2 Firestone Library
Princeton University
Princeton, NJ 08544

Email: jstr...@princeton.edu
Phone: (609)258-0059
Fax: (609)258-0441

http://diglib.princeton.edu
http://diglib.princeton.edu/ead

Re: [CODE4LIB] Q: what is the best open source native XML database

2010-01-19 Thread Godmar Back

On Tue, Jan 19, 2010 at 10:09 AM, Sean Hannan  wrote:
> I've had the best experience (query speed, primarily) with BaseX.  This was 
> primarily for large XML document processing, so I'm not sure how much it will 
> satisfy your transactional needs.
>
> I was initially using eXist, and then switched over to BaseX because the 
> speed gains were very noticeable.
>

What about the relative maturity/functionality of eXist vs BaseX? I'm
a bit skeptical about putting my eggs in a university project basket
not backed by a continuous revenue stream (... did I just say that out
loud?)

 - Godmar

Re: [CODE4LIB] Q: what is the best open source native XML database

2010-01-19 Thread Sean Hannan

I've had the best experience (query speed, primarily) with BaseX.  This was 
primarily for large XML document processing, so I'm not sure how much it will 
satisfy your transactional needs.

I was initially using eXist, and then switched over to BaseX because the speed 
gains were very noticeable. 

-Sean


On Jan 16, 2010, at 11:15 AM, Godmar Back wrote:

> Hi,
> 
> we're currently looking for an XML database to store a variety of
> small-to-medium sized XML documents. The XML documents are
> unstructured in the sense that they do not follow a schema or DTD, and
> that their structure will be changing over time. We'll need to do
> efficient searching based on elements, attributes, and full text
> within text content. More importantly, the documents are mutable.
> We'd like to bring documents or fragments into memory in a DOM
> representation, manipulate them, then put them back into the database.
> Ideally, this should be done in a transaction-like manner. We need to
> efficiently serve document fragments over HTTP, ideally in a manner
> that allows for scaling through replication. We would prefer strong
> support for Java integration, but it's not a must.
> 
> Have others encountered similar problems, and what have you been using?
> 
> So far, we're researching: eXist-DB (http://exist.sourceforge.net/ ),
> Base-X (http://www.basex.org/ ), MonetDB/XQuery
> (http://www.monetdb.nl/XQuery/ ), Sedna
> (http://modis.ispras.ru/sedna/index.html ). Wikipedia lists a few
> others here: http://en.wikipedia.org/wiki/XML_database
> I'm wondering to what extent systems such as Lucene, or even digital
> object repositories such as Fedora could be coaxed into this usage
> scenario.
> 
> Thanks for any insight you have or experience you can share.
> 
> - Godmar

Re: [CODE4LIB] Q: what is the best open source native XML database

2010-01-18 Thread Benjamin Young


Hey Godmar,

I'd definitely consider CouchDB as Patrick mentioned. It's a 
"schema-free" JSON document database and replication is its greatest 
strength.
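
As a rough illustration of what triggering replication looks like, here is
a Python sketch that builds a request for CouchDB's _replicate endpoint.
The server URL, database name, and helper name are hypothetical; the
function only constructs the request, and urllib.request.urlopen() would
send it.

```python
import json
import urllib.request


def replication_request(server, source_db, target_url):
    """Build a POST to /_replicate asking `server` to copy source_db
    to the database at target_url."""
    body = json.dumps({"source": source_db, "target": target_url})
    return urllib.request.Request(
        server.rstrip("/") + "/_replicate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical hosts and database name, for illustration only.
req = replication_request(
    "http://localhost:5984",
    "records",
    "http://replica.example.org:5984/records",
)
```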


It does have Lucene integration:
http://github.com/rnewson/couchdb-lucene
Paul J. Davis of the core CouchDB team has a nice write-up:
http://www.davispj.com/2009/01/18/couchdb-lucene-indexing.html

There's also some Solr integration available:
http://github.com/deguzman/couchdb-solr2

From what you've described, CouchDB would be a great choice for your 
application.


Hope that's helpful, Godmar,
Benjamin

--
President
BigBlueHat
P: 864.232.9553
W: http://www.bigbluehat.com/
http://www.linkedin.com/in/benjaminyoung 




Godmar Back wrote:

Hi,

we're currently looking for an XML database to store a variety of
small-to-medium sized XML documents. The XML documents are
unstructured in the sense that they do not follow a schema or DTD, and
that their structure will be changing over time. We'll need to do
efficient searching based on elements, attributes, and full text
within text content. More importantly, the documents are mutable.
We'd like to bring documents or fragments into memory in a DOM
representation, manipulate them, then put them back into the database.
Ideally, this should be done in a transaction-like manner. We need to
efficiently serve document fragments over HTTP, ideally in a manner
that allows for scaling through replication. We would prefer strong
support for Java integration, but it's not a must.

Have others encountered similar problems, and what have you been using?

So far, we're researching: eXist-DB (http://exist.sourceforge.net/ ),
Base-X (http://www.basex.org/ ), MonetDB/XQuery
(http://www.monetdb.nl/XQuery/ ), Sedna
(http://modis.ispras.ru/sedna/index.html ). Wikipedia lists a few
others here: http://en.wikipedia.org/wiki/XML_database
I'm wondering to what extent systems such as Lucene, or even digital
object repositories such as Fedora could be coaxed into this usage
scenario.

Thanks for any insight you have or experience you can share.

 - Godmar

Re: [CODE4LIB] Q: what is the best open source native XML database

2010-01-17 Thread Patrick Hochstenbach

Depends on your data model, Godmar. You could also consider databases like 
CouchDB. It's not XML, but it is an option if your data model can fit into 
JSON. Efficient serving of docs over HTTP is their trademark, as is scaling 
through replication.
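
One quick way to test whether a data model fits into JSON is to convert a
few sample records and see what gets lost. The Python sketch below is only
an illustration; the tag/attributes/text/children layout is an assumed
convention, not something CouchDB requires.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element into a plain dict.

    Text that follows a child element (mixed content) is dropped, which
    is exactly the kind of loss to watch for with textual documents.
    """
    node = {"tag": elem.tag}
    if elem.attrib:
        node["attributes"] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in elem]
    if children:
        node["children"] = children
    return node


def xml_to_json(xml_text):
    """Parse an XML string and return its JSON representation."""
    return json.dumps(element_to_dict(ET.fromstring(xml_text)), sort_keys=True)
```

Record-like XML (database dumps) tends to survive this round trip well;
prose with inline markup generally does not.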

As for Lucene: CouchDB has Lucene integration, but I find it somewhat flaky. 
In my case I ran batch index jobs over the database.

In another project we could (I don't say easily) fit the data model into 
MySQL. Our developers could then reuse all the MySQL tools and scripts. The 
sysadmin was happy.

So first consider if XML is really needed throughout the whole codebase. Are 
you working with textual documents in XML, or database dumps in XML?

Best,
P@

Skype: patrick.hochstenbach
Patrick Hochstenbach   Software Architect
University Library +32(0)92647980
Ghent University * Rozier 9 * 9000 * Gent


-Original Message-
From: Code for Libraries on behalf of Andrew Nagy
Sent: Mon 18-1-2010 1:28
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Q: what is the best open source native XML database
 
I've had the best luck with eXist and BerkeleyDB XML.

Both support XQuery and have indexing features based on any XML structure.

Andrew

On 1/16/10, Godmar Back  wrote:
> Hi,
>
> we're currently looking for an XML database to store a variety of
> small-to-medium sized XML documents. The XML documents are
> unstructured in the sense that they do not follow a schema or DTD, and
> that their structure will be changing over time. We'll need to do
> efficient searching based on elements, attributes, and full text
> within text content. More importantly, the documents are mutable.
> We'd like to bring documents or fragments into memory in a DOM
> representation, manipulate them, then put them back into the database.
> Ideally, this should be done in a transaction-like manner. We need to
> efficiently serve document fragments over HTTP, ideally in a manner
> that allows for scaling through replication. We would prefer strong
> support for Java integration, but it's not a must.
>
> Have others encountered similar problems, and what have you been using?
>
> So far, we're researching: eXist-DB (http://exist.sourceforge.net/ ),
> Base-X (http://www.basex.org/ ), MonetDB/XQuery
> (http://www.monetdb.nl/XQuery/ ), Sedna
> (http://modis.ispras.ru/sedna/index.html ). Wikipedia lists a few
> others here: http://en.wikipedia.org/wiki/XML_database
> I'm wondering to what extent systems such as Lucene, or even digital
> object repositories such as Fedora could be coaxed into this usage
> scenario.
>
> Thanks for any insight you have or experience you can share.
>
>  - Godmar
>

-- 
Sent from my mobile device

Re: [CODE4LIB] Q: what is the best open source native XML database

2010-01-17 Thread Andrew Nagy

I've had the best luck with eXist and BerkeleyDB XML.

Both support XQuery and have indexing features based on any XML structure.

Andrew

On 1/16/10, Godmar Back  wrote:
> Hi,
>
> we're currently looking for an XML database to store a variety of
> small-to-medium sized XML documents. The XML documents are
> unstructured in the sense that they do not follow a schema or DTD, and
> that their structure will be changing over time. We'll need to do
> efficient searching based on elements, attributes, and full text
> within text content. More importantly, the documents are mutable.
> We'd like to bring documents or fragments into memory in a DOM
> representation, manipulate them, then put them back into the database.
> Ideally, this should be done in a transaction-like manner. We need to
> efficiently serve document fragments over HTTP, ideally in a manner
> that allows for scaling through replication. We would prefer strong
> support for Java integration, but it's not a must.
>
> Have others encountered similar problems, and what have you been using?
>
> So far, we're researching: eXist-DB (http://exist.sourceforge.net/ ),
> Base-X (http://www.basex.org/ ), MonetDB/XQuery
> (http://www.monetdb.nl/XQuery/ ), Sedna
> (http://modis.ispras.ru/sedna/index.html ). Wikipedia lists a few
> others here: http://en.wikipedia.org/wiki/XML_database
> I'm wondering to what extent systems such as Lucene, or even digital
> object repositories such as Fedora could be coaxed into this usage
> scenario.
>
> Thanks for any insight you have or experience you can share.
>
>  - Godmar
>

-- 
Sent from my mobile device
