Re: CouchDB x RDF databases comparison

Demetrius Nunes Fri, 08 May 2009 06:52:41 -0700

Thanks for all the thoughtful responses. They really helped.

On Fri, May 8, 2009 at 4:09 AM, Daniel Friesen <[email protected]> wrote:


> There are also XML databases (XQuery) (I'll just use X for simplicity) to
> compare. I ended up starting to use Sedna over at my work.
>
> CouchDB uses JSON, X uses XML
> CouchDB uses views, X uses XQuery which has some simple indexing and has a
> significantly powerful and understandable query language
> CouchDB has a lucene plugin, Sedna can have an extra fulltext index feature
> enabled.
> Updating data in CouchDB requires an entire document be updated, X
> databases can modify small parts of the document
> CouchDB saves a new document revision on each change, X works on a current document.
> CouchDB handles conflicts using conflict resolution, X makes the
> modification query on the current document in order of queries (transactions
> are also supported).
> CouchDB uses an HTTP REST API, most X databases use a normal binary protocol
> (Sedna seems to have a good set of libraries for most languages)
> CouchDB is distributed and scalable.
> In X databases documents can be grouped into collections. (These can also
> be used in queries)
> It's probably a moot point, but XQuery is w3c standardized and implemented
> by a number of databases.
>
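The whole-document update and conflict behaviour in the comparison above can be illustrated with a small in-memory sketch. This is not CouchDB's API: `ToyDocStore` and `ConflictError` are hypothetical names, and integer revisions stand in for CouchDB's real revision strings. It only shows why two writers touching unrelated fields of the same document can still collide.

```python
import copy


class ConflictError(Exception):
    """Raised when an update carries a stale revision (CouchDB answers 409)."""


class ToyDocStore:
    """Hypothetical in-memory stand-in for a CouchDB-style database."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        # Return a copy so callers cannot mutate stored state directly.
        return copy.deepcopy(self._docs[doc_id])

    def put(self, doc_id, doc):
        # The caller always sends the entire document, never a partial patch.
        current = self._docs.get(doc_id)
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ConflictError(doc_id)
        new_doc = copy.deepcopy(doc)
        new_doc["_rev"] = (current["_rev"] if current else 0) + 1
        self._docs[doc_id] = new_doc
        return new_doc["_rev"]


store = ToyDocStore()
store.put("page", {"title": "Home", "hits": 0})

a = store.get("page")    # client A reads revision 1
b = store.get("page")    # client B reads revision 1

a["hits"] += 1
store.put("page", a)     # A sends the whole document; revision becomes 2

b["title"] = "Start"     # B changes an unrelated field
try:
    store.put("page", b)  # B still holds revision 1, so this is rejected
    conflicted = False
except ConflictError:
    conflicted = True
```

Client B has to re-read the document, reapply its change and resend everything, which is the cost the comparison describes for frequently updated documents.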
> IMHO compiling a comparison of alternative databases and seeing what
> features work best for what data you're working with is the best option.
>
> I went through the semantic databases myself too, because our company had
> "Semantics" in mind. I had issues getting them to work and finding help for
> most of them, and ended up finding that our data better fit the
> document-based database type. For us TQL was the only one with a
> significant improvement (we really needed the walk capabilities); other than
> that, the semantic databases were only a little better than an RDBMS (although
> we were actually using an RDBMS in an ugly semantic-like hack: an atoms table
> with 3 columns).
> Our reason for moving away from RDBMSs was a need to remove the large
> number of queries going between our app and the database. We had a huge
> amount of hierarchical data the entire app was based around (a tree
> structure wasn't even guaranteed, something could have multiple parents
> referencing it and be part of multiple trees).
> We decided on Sedna (XQuery) rather than CouchDB because CouchDB's views
> couldn't handle our hierarchical data in multiple documents, and we couldn't
> put everything in one document because of how we update small pieces of data
> a lot which doesn't work out well with how entire documents need to be
> modified in Couch (Transmitting entire document to modify a single value,
> new document revision saved each time, getting a conflict because an
> unrelated part of the document was modified).
>
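To make the "atoms table with 3 columns" and the "walk" requirement in the paragraph above concrete, here is a minimal sketch. The actual schema is not given in the thread, so the column meaning (subject, predicate, object), the `child_of` predicate and the sample rows are all assumptions. The walk runs in memory; against an RDBMS without recursive queries, each level would typically cost one more round trip.

```python
from collections import defaultdict, deque

# Hypothetical rows of a three-column table: (subject, predicate, object).
triples = [
    ("b", "child_of", "a"),
    ("c", "child_of", "a"),
    ("d", "child_of", "b"),
    ("d", "child_of", "c"),   # "d" has two parents, so this is not a tree
    ("e", "child_of", "d"),
]


def descendants(rows, root, predicate="child_of"):
    """Collect every node reachable below root with a breadth-first walk."""
    children = defaultdict(list)
    for subject, pred, obj in rows:
        if pred == predicate:
            children[obj].append(subject)

    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:   # shared nodes are visited only once
                seen.add(child)
                queue.append(child)
    return seen


below_a = descendants(triples, "a")
```

The `seen` set is what keeps the walk correct when a node sits under several parents, the case the paragraph says a plain tree model could not guarantee.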
> Personally I have an idea for another type of database. The one thing I've
> always wanted was one that is program oriented, i.e. simplifying a database
> down to what it is: centralized data storage. Instead of a query language, it
> would embed an existing programming language into the database environment.
> I wrote a bit of an API draft for it.
>
> ~Daniel Friesen (Dantman, Nadir-Seen-Fire)
>
>
> Nitin Borwankar wrote:
>
>> Demetrius Nunes wrote:
>>
>>> Hi Nitin,
>>>
>>> Great answer. Thanks a lot. One more question...
>>>
>>> I am in the Javaland here, so another viable option for my application is
>>> using JCR, such as the Apache Jackrabbit implementation.
>>>
>>>
>>
>> Hi Demetrius,
>>
>> I am a refugee from Javaland so am familiar with the power and limitations
>> of Java.  Yes, I have looked at JCR and JackRabbit in a previous project.
>> These days I just recoil from the verbosity and conceptual layers you
>> encounter when coding simple things in Java.  And then there's XML.....
>> So I would have held my nose and used JackRabbit if CouchDB didn't exist -
>> but in my mind it's a distant second in practice even if it is conceptually
>> similar and close in theory.
>>
>> Personally when I see layer upon layer of abstraction in Java architecture
>> diagrams I wonder how much of my CPU cost is going in converting from
>> strings, to TypeA to LayeredClassB to factoryC to ORM D to EJB4 to disk and
>> back again all the way to strings.  So I am moving away from Java except
>> when the best of breed solution is in Java ( Lucene/Solr) - so I don't hate
>> Java - I just need to justify the overhead that it brings both in coding and
>> in the build/install/deploy process.
>>
>> CouchDB has minimal overhead in roundtrip datatype translations - it's
>> what I call "WYSIWIS"  - "what you see is what you store" i.e. JSON.
>> There are people looking at an alternative to LAMP which they call JS3 -
>> Javascript in all three layers - browser/helma/couchdb  ( helma,
>> helma.org, is a middle tier layer written in Java, runs on Jetty, uses JS
>> as the language for doing UI templates and also ORM ) - I personally think
>> CouchDB + CouchDBViews just makes it JS2 - browser-CouchDB.
>>
>> I would suggest you download Rhino ( JS interpreter in Java) from Mozilla
>> and start playing with both CouchDB and JackRabbit and then see.
>>
>> Did I sound biased ? :-)
>>
>>
>> Nitin Borwankar,
>> Project Manager, Bibliographic Knowledge Network.
>> bibkn.org
>>
>>> Did you happen to take a look at that as well? I think JCR has even more
>>> similarities with CouchDB than RDF.
>>>
>>> How would you compare JCR and CouchDB ?
>>>
>>> Thanks a lot,
>>> Demetrius
>>>
>>> On Thu, May 7, 2009 at 5:04 PM, Nitin Borwankar <[email protected]>
>>> wrote:
>>>
>>>
>>>
>>>> Demetrius Nunes wrote:
>>>>
>>>>
>>>>
>>>>> Hi there,
>>>>>
>>>>> We are evaluating new technologies for managing semi-structured data and
>>>>> documents in one of our applications. We've grown tired of wrestling with
>>>>> relational databases for this.
>>>>>
>>>>> I would like to know why I would prefer to use CouchDB instead of an RDF
>>>>> database, such as Sesame or Mulgara.
>>>>>
>>>>> I know some of the RDF advantages, such as open standards,
>>>>> interoperability,
>>>>> rules engines, semantic queries, community and tool support, maturity,
>>>>> etc.
>>>>>
>>>>> But I really like the simplicity of the CouchDB model.
>>>>>
>>>>> Can anyone enlighten me?
>>>>>
>>>>> Thanks a lot,
>>>>> Demetrius
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>> Hi Demetrius,
>>>>
>>>> We (bibkn.org) have investigated and used SQL databases, an RDF store
>>>> (Virtuoso) and CouchDB for bibliographic metadata management.  I am the
>>>> project manager and data architect for this project.
>>>> Relational databases are often a first choice but have many limitations in
>>>> management of loosely typed, messy, string based data sets.  So we are in
>>>> agreement on not using that technology.
>>>>
>>>> We, bibkn.org,  need both the schemalessness of CouchDB at one end of
>>>> our
>>>> workflow and the strongly-typedness of RDF at the other end of the
>>>> workflow
>>>> when all our data has been cleaned up and "ontologized". So we don't see
>>>> this as an either/or between CouchDB and RDF stores.
>>>> However we can definitely say one thing: if you need just the flexible
>>>> schema aspect and are using RDF to give you that, then that is massive
>>>> overkill and the conceptual overhead of RDF (ontology, schemas,
>>>> namespaces, completely normalized everything, i.e. URIs for subject,
>>>> predicate, object) is simply not worth it. If however you want to do
>>>> logical inference and reasoning over your data then clearly the RDF and
>>>> semantic machinery gives you a whole lot of goodness that is worth the
>>>> overhead.
>>>>
>>>> So CouchDB is not a substitute for an RDF-store, but you may be using an
>>>> RDF-store for the lesser things it gives you (flexible schema) and in
>>>> that
>>>> case CouchDB can do a lot more for you at a much lower overhead and much
>>>> greater ease of use and integration into existing tools.
>>>>
>>>> Additionally SPARQL  (like SQL) is not really meant for text search
>>>> which
>>>> is critical for loosely typed data. So even at our RDF end we have a
>>>> Solr
>>>> instance for rapid text search over the RDF store.
>>>> Additionally we have couchdb-lucene as an extension on our CouchDB
>>>> instance
>>>> and this has given us everything we need at the loosely typed data end
>>>> of
>>>> our workflow.
>>>>
>>>> So if semi-structured data and document management is your primary use
>>>> case
>>>> and there is no semantic/ontology/inference component then forget
>>>> RDF-stores
>>>> and just go with CouchDB.
>>>>
>>>> In our project we are developing a format on top of JSON to export
>>>> bibliographic metadata for integration into JSON-friendly data consumers;
>>>> it also happens to have an easy mapping to RDF.
>>>> So even if you go to Couch now you may be able to integrate into an
>>>> RDF-store at some later stage if the need arises.
>>>>
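As a rough illustration of why a flat JSON document maps easily to RDF-style triples, here is a minimal sketch. It is not the bibkn.org format, which the thread does not describe: the function name, the `example.org` namespace and the sample record are all made up for the example, and literals are left as plain Python values rather than typed RDF literals.

```python
def doc_to_triples(doc, base="http://example.org/"):
    """Flatten a flat JSON-style document into (subject, predicate, object) triples."""
    subject = base + "record/" + str(doc["_id"])
    triples = []
    for key, value in doc.items():
        if key == "_id":
            continue   # the id becomes the subject, not a property
        predicate = base + "terms/" + key
        # A list-valued field yields one triple per item.
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, predicate, item))
    return triples


# Hypothetical record, for illustration only.
record = {
    "_id": "rec1",
    "title": "An example entry",
    "author": ["A. Writer", "B. Writer"],
}

result = doc_to_triples(record)
```

Going this direction is mechanical; the overhead discussed earlier in this message comes from agreeing on the namespaces and ontology behind the predicates, which the sketch simply invents.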
>>>> Hope this helps,
>>>>
>>>> Nitin Borwankar,
>>>> Project Manager,  Bibliographic Knowledge Network
>>>> bibkn.org
>>>>
>>>
>


-- 
____________________________
http://www.demetriusnunes.com
