Demetrius Nunes wrote:
Hi Nitin,

Great answer. Thanks a lot. One more question...

I am in the Javaland here, so another viable option for my application is
using JCR, such as the Apache Jackrabbit implementation.

Hi Demetrius,

I am a refugee from Javaland so I am familiar with the power and limitations of Java. Yes, I have looked at JCR and Jackrabbit in a previous project. These days I just recoil from the verbosity and conceptual layers you encounter when coding simple things in Java. And then there's XML... So I would have held my nose and used Jackrabbit if CouchDB didn't exist - but in my mind it's a distant second in practice, even if it is conceptually similar and close in theory.

Personally, when I see layer upon layer of abstraction in Java architecture diagrams I wonder how much of my CPU cost is going into converting from strings to TypeA to LayeredClassB to factoryC to ORM D to EJB4 to disk and back again all the way to strings. So I am moving away from Java except when the best-of-breed solution is in Java (Lucene/Solr). I don't hate Java - I just need to justify the overhead that it brings both in coding and in the build/install/deploy process.

CouchDB has minimal overhead in roundtrip datatype translations - it's what I call "WYSIWIS" - "what you see is what you store", i.e. JSON. There are people looking at an alternative to LAMP which they call JS3 - JavaScript in all three layers - browser/helma/couchdb (Helma, helma.org, is a middle-tier layer written in Java; it runs on Jetty and uses JS as the language for UI templates and also ORM). I personally think CouchDB + CouchDB views just makes it JS2 - browser/CouchDB.
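
To make the "what you see is what you store" point concrete, here is a minimal sketch in Python using only the standard library. The record and its field names are made up for illustration, and no CouchDB server is contacted: the JSON string built here is simply what would go in the body of an HTTP PUT to a document URL.

```python
import json

# Hypothetical record; the id and field names are illustrative only.
record = {
    "_id": "example-record-1",
    "type": "article",
    "title": "An example title",
    "authors": ["A. Author", "B. Author"],
    "year": 2009,
}

# The application object is serialized straight to JSON text.
# This text is what would be sent as the document body.
body = json.dumps(record, sort_keys=True)

# Reading the document back yields JSON again; parsing it gives the
# same structure, with no intermediate class or ORM layer involved.
restored = json.loads(body)

print(body)
print(restored == record)
```

The only translation step is between the in-memory structure and its JSON text, which is the contrast with the strings-to-classes-to-ORM chain described above.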

I would suggest you download Rhino (a JS interpreter written in Java) from Mozilla, start playing with both CouchDB and Jackrabbit, and then see.

Did I sound biased ? :-)


Nitin Borwankar,
Project Manager, Bibliographic Knowledge Network.
bibkn.org

Did you happen to take a look at that as well? I think JCR has even more
similarities with CouchDB than RDF.

How would you compare JCR and CouchDB?

Thanks a lot,
Demetrius

On Thu, May 7, 2009 at 5:04 PM, Nitin Borwankar <[email protected]> wrote:

Demetrius Nunes wrote:

Hi there,

We are evaluating new technologies for managing semi-structured data and
documents in one of our applications. We've got tired of wrestling
relational databases for this.

I would like to know why I would prefer CouchDB over an RDF
database such as Sesame or Mulgara.

I know some of the RDF advantages, such as open standards,
interoperability,
rules engines, semantic queries, community and tool support, maturity,
etc.

But I really like the simplicity of the CouchDB model.

Can anyone enlighten me?

Thanks a lot,
Demetrius



Hi Demetrius,

We (bibkn.org) have investigated and used SQL databases, an RDF store
(Virtuoso) and CouchDB for bibliographic metadata management. I am the
project manager and data architect for this project.
Relational databases are often a first choice but have many limitations in
managing loosely typed, messy, string-based data sets. So we are in
agreement on not using that technology.

We, bibkn.org, need both the schemalessness of CouchDB at one end of our
workflow and the strongly-typedness of RDF at the other end of the workflow
when all our data has been cleaned up and "ontologized". So we don't see
this as an either/or between CouchDB and RDF stores.
However we can definitely say one thing - if you need just the flexible
schema aspect and are using RDF to give you that, then that is massive
overkill and the conceptual overhead of RDF (ontology, schemas,
namespaces, completely normalized everything, i.e. URIs for subject,
predicate, object) is simply not worth it. If however you want to do
logical inference and reasoning over your data then clearly the RDF and
semantic machinery gives you a whole lot of goodness that is worth the
overhead.

So CouchDB is not a substitute for an RDF-store, but you may be using an
RDF-store for the lesser things it gives you (flexible schema) and in that
case CouchDB can do a lot more for you at a much lower overhead and much
greater ease of use and integration into existing tools.

Additionally SPARQL (like SQL) is not really meant for text search which
is critical for loosely typed data. So even at our RDF end we have a Solr
instance for rapid text search over the RDF store.
Additionally we have couchdb-lucene as an extension on our CouchDB instance
and this has given us everything we need at the loosely typed data end of
our workflow.

So if semi-structured data and document management is your primary use case
and there is no semantic/ontology/inference component then forget RDF-stores
and just go with CouchDB.

In our project we are developing a format on top of JSON to export
bibliographic metadata for integration into JSON-friendly data consumers. It
also happens to have an easy mapping to RDF.
So even if you go to Couch now you may be able to integrate into an
RDF-store at some later stage if the need arises.
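
As a rough illustration of why a JSON document can map easily onto RDF, here is a minimal Python sketch that flattens one document into subject/predicate/object triples. This is not the format we are developing: the function name, the example.org base URI and the field names are all assumptions made up for the example.

```python
def to_triples(doc, base="http://example.org/"):
    """Flatten a flat JSON-style dict into (subject, predicate, object) triples.

    The base URI and the key-to-predicate naming are hypothetical.
    """
    subject = base + "record/" + doc["_id"]
    triples = []
    for key, value in doc.items():
        if key == "_id":
            continue  # the id becomes the subject, not a property
        predicate = base + "vocab/" + key
        # A list-valued field yields one triple per element.
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, predicate, item))
    return triples


# Hypothetical document, as it might sit in CouchDB.
doc = {
    "_id": "example-record-1",
    "title": "An example title",
    "authors": ["A. Author", "B. Author"],
    "year": 2009,
}

for triple in to_triples(doc):
    print(triple)
```

A real mapping would also need agreed vocabularies and datatypes for the values, which is where the ontology work at the RDF end comes in.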

Hope this helps,

Nitin Borwankar,
Project Manager,  Bibliographic Knowledge Network
bibkn.org







