Demetrius Nunes wrote:
Hi there,

We are evaluating new technologies for managing semi-structured data and
documents in one of our applications. We've grown tired of wrestling with
relational databases for this.

I would like to know why I would prefer CouchDB over an RDF
database such as Sesame or Mulgara.

I know some of the RDF advantages, such as open standards, interoperability,
rules engines, semantic queries, community and tool support, maturity, etc.

But I really like the simplicity of the CouchDB model.

Can anyone enlighten me?

Thanks a lot,
Demetrius

Hi Demetrius,

We (bibkn.org) have investigated and used SQL databases, an RDF store (Virtuoso) and CouchDB for bibliographic metadata management. I am the project manager and data architect for this project. Relational databases are often a first choice, but they have many limitations when managing loosely typed, messy, string-based data sets. So we are in agreement on not using that technology.

We at bibkn.org need both the schemalessness of CouchDB at one end of our workflow and the strong typing of RDF at the other end, once all our data has been cleaned up and "ontologized". So we don't see this as an either/or between CouchDB and RDF stores. However, we can definitely say one thing: if you need just the flexible schema and are using RDF to get it, that is massive overkill, and the conceptual overhead of RDF (ontologies, schemas, namespaces, completely normalized everything, i.e. URIs for subject, predicate and object) is simply not worth it. If, however, you want to do logical inference and reasoning over your data, then the RDF and semantic machinery clearly gives you a whole lot of goodness that is worth the overhead.

So CouchDB is not a substitute for an RDF store, but you may be using an RDF store for the lesser things it gives you (flexible schema), and in that case CouchDB can do a lot more for you at a much lower overhead, with much greater ease of use and integration into existing tools.

Additionally, SPARQL (like SQL) is not really meant for text search, which is critical for loosely typed data. So even at our RDF end we have a Solr instance for rapid text search over the RDF store. We also run couchdb-lucene as an extension on our CouchDB instance, and this has given us everything we need at the loosely typed end of our workflow.

So if semi-structured data and document management is your primary use case and there is no semantic/ontology/inference component, then forget RDF stores and just go with CouchDB.

In our project we are developing a format on top of JSON to export bibliographic metadata for integration into JSON-friendly data consumers. It also happens to map easily to RDF. So even if you go with Couch now, you may be able to integrate with an RDF store at some later stage if the need arises.
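
To give a rough idea of what "easy mapping to RDF" can look like, here is a minimal Python sketch that flattens one JSON document into subject/predicate/object triples. This is not our actual format: the example.org namespace, the field names and the doc_to_triples helper are all made up for illustration.

```python
import json

# Hypothetical namespace; the real vocabulary is not given in this thread.
BASE = "http://example.org/bib/"


def doc_to_triples(doc):
    """Flatten one JSON document into (subject, predicate, object) triples."""
    # The document id becomes the subject URI.
    subject = BASE + "record/" + str(doc["_id"])
    triples = []
    for key, value in doc.items():
        if key == "_id":
            continue
        # Each field name becomes a predicate URI.
        predicate = BASE + "term/" + key
        # A list-valued field yields one triple per element.
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, predicate, v))
    return triples


# A made-up record, shaped like a document you might keep in CouchDB.
record = json.loads(
    '{"_id": "b1", "title": "A Sample Title", '
    '"author": ["A. Author", "B. Author"]}'
)
triples = doc_to_triples(record)
for triple in triples:
    print(triple)
```

The document itself needs no schema or namespaces up front; URIs only appear at the point where you decide to export to an RDF store.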

Hope this helps,

Nitin Borwankar,
Project Manager,  Bibliographic Knowledge Network
bibkn.org



