Demetrius Nunes wrote:
Hi there,
We are evaluating new technologies for managing semi-structured data and
documents in one of our applications. We've got tired of wrestling
relational databases for this.
I would like to know why I would prefer to use CouchDB instead of an RDF
database, such as Sesame or Mulgara.
I know some of the RDF advantages, such as open standards, interoperability,
rules engines, semantic queries, community and tool support, maturity, etc.
But I really like the simplicity of the CouchDB model.
Can anyone enlighten me?
Thanks a lot,
Demetrius
Hi Demetrius,
We (bibkn.org) have investigated and used SQL databases, an RDF store
(Virtuoso), and CouchDB for bibliographic metadata management. I am the
project manager and data architect for this project.
Relational databases are often a first choice but have many limitations in
managing loosely typed, messy, string-based data sets. So we are
in agreement on not using that technology.
We, bibkn.org, need both the schemalessness of CouchDB at one end of
our workflow and the strongly-typedness of RDF at the other end of the
workflow when all our data has been cleaned up and "ontologized". So we
don't see this as an either/or between CouchDB and RDF stores.
However, we can definitely say one thing: if you need just the
flexible schema aspect and are using RDF to give you that, then that
is massive overkill, and the conceptual overhead of RDF
(ontologies, schemas, namespaces, completely normalized everything, i.e.
URIs for subject, predicate, and object) is simply not worth it. If,
however, you want to do logical inference and reasoning over your data,
then clearly the RDF and semantic machinery gives you a whole lot of
goodness that is worth the overhead.
So CouchDB is not a substitute for an RDF-store, but you may be using an
RDF-store for the lesser things it gives you (flexible schema) and in
that case CouchDB can do a lot more for you at a much lower overhead and
much greater ease of use and integration into existing tools.
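To make the "flexible schema" point concrete, here is a minimal sketch in Python. It is not CouchDB code: the records, field names, and the map_by_year helper are all made up for illustration, and the function only imitates the spirit of a CouchDB map function by emitting a key/value pair for documents that happen to have usable fields.

```python
import json

# Hypothetical bibliographic records; the field names are illustrative only.
# Note that no two documents share exactly the same set of fields or types.
records = [
    {"_id": "rec1", "title": "On Computable Numbers",
     "author": "A. M. Turing", "year": 1936},
    {"_id": "rec2", "title": "Untitled manuscript", "authors": ["Anon."]},
    {"_id": "rec3", "title": "Notes on a Talk", "year": "ca. 1950",
     "venue": "unknown"},
]


def map_by_year(doc):
    """Yield (year, title) only for documents with an integer year.

    Documents with a missing or messy year are skipped, not rejected:
    they stay in the data set and can be cleaned up later.
    """
    year = doc.get("year")
    if isinstance(year, int) and "title" in doc:
        yield year, doc["title"]


# Build a sorted "view" over whatever documents qualify.
index = sorted(pair for doc in records for pair in map_by_year(doc))
print(json.dumps(index))
```

The messy records (a list-valued "authors" field, a free-text year) cost nothing up front; nothing has to be declared before they are stored.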
Additionally, SPARQL (like SQL) is not really meant for text search,
which is critical for loosely typed data. So even at our RDF end we have
a Solr instance for rapid text search over the RDF store.
Likewise, we have couchdb-lucene as an extension on our CouchDB
instance, and this has given us everything we need at the loosely typed
data end of our workflow.
So if semi-structured data and document management is your primary use
case and there is no semantic/ontology/inference component then forget
RDF-stores and just go with CouchDB.
In our project we are developing a format on top of JSON to export
bibliographic metadata for integration into JSON-friendly data
consumers; it also happens to have an easy mapping to RDF.
So even if you go to Couch now you may be able to integrate into an
RDF-store at some later stage if the need arises.
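As a rough illustration of why a later move from JSON documents to an RDF store can be straightforward, here is a minimal Python sketch that flattens one document into N-Triples-style lines. This is not our project's export format: the BASE and VOCAB namespaces, the field names, and the to_triples helper are all hypothetical, and every value is emitted as a plain string literal with no datatypes.

```python
import json

# Hypothetical namespaces, chosen only for this example.
BASE = "http://example.org/record/"
VOCAB = "http://example.org/vocab/"


def literal(value):
    """Render a value as a double-quoted, escaped string literal."""
    # json.dumps handles quote and backslash escaping for us.
    return json.dumps(str(value), ensure_ascii=False)


def to_triples(doc):
    """Turn one JSON document into a list of subject/predicate/object lines."""
    subject = f"<{BASE}{doc['_id']}>"
    lines = []
    for key, value in sorted(doc.items()):
        if key == "_id":
            continue  # the id becomes the subject URI, not a property
        # A list-valued field becomes one triple per element.
        values = value if isinstance(value, list) else [value]
        for v in values:
            lines.append(f"{subject} <{VOCAB}{key}> {literal(v)} .")
    return lines


doc = {"_id": "rec1", "title": "On Computable Numbers",
       "author": ["A. M. Turing"], "year": 1936}
triples = to_triples(doc)
print("\n".join(triples))
```

Note how much the mapping has to add (a subject URI, a vocabulary namespace, one statement per value) that the JSON document never needed. That is the overhead discussed above, and it only pays off once you want inference over the data.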
Hope this helps,
Nitin Borwankar,
Project Manager, Bibliographic Knowledge Network
bibkn.org