Dear wiki user,

You have subscribed to a wiki page "Couchdb Wiki" for change notification.

The page "Why" has been deleted by JoanTouzet:

https://wiki.apache.org/couchdb/Why?action=diff&rev1=2&rev2=3

Comment:
This feels like a personal blogpost, not something representative of the 
CouchDB community. Sorry.

- ''This is a work in progress, I hesitate to even call it a rough draft yet. 
It's put on the wiki to make collaborative edits easier.''
  
- With enough work, you can find a way to 
[[http://en.wikipedia.org/wiki/Database_normalization|express anything 
relationally]]. This is the "brilliant" insight into relational theory that 
[[http://en.wikipedia.org/wiki/Edgar_F._Codd|Codd]] had and every relational 
database zealot has harped on forever since. The worst of these zealot will 
tell you SQL is an abomination because isn't really relational, it adheres only 
to "some" of the relational principles. 
[[http://www.dbdebunk.com/index.html|Some people seem to have made entire 
careers out of zealotry.]] Yet SQL has been very successful for a very long 
time, despite the obvious deficiency of not being "fully relational". Why? Why 
is it successful if it doesn't and can't adhere strictly to these relational 
principles?
- 
- Because no one really gives a shit.
- 
- Why should a normal person care in the least about relational purity? What 
they want is a database that stores their data, tells them interesting things 
about it and gives it back to them reliably.
- 
- ''The reason SQL is successful is not because of how well adheres to 
relational principles, but because it deviates from those principles in ways 
that are convenient in real use.''
- 
- On the other end of the spectrum, the opposite of the pure relational model, 
you have a world of pure objects, where data is treated as unstructured, 
indivisible units and more closely represents how data is actually represented 
and passed around in the real world. The relatively recent object-oriented 
programming paradigm is an extension of this model and in storage we've had the 
concept of files since just about forever.
- 
- This self contained, isolated object model, as it relates to the web, is 
what's also becoming popularly known as the 
[[http://en.wikipedia.org/wiki/REST|REST]] model. And as so many of its 
proponents love to claim, it scales like crazy. In fact, REST, representing and 
manipulating data as address chunks, is a realization and a formalization of 
how most of the internet works and scales.
- 
- But just like very little of our data is completely relational in nature, 
very little of our data is completely distinct or unstructured. Most of our 
data is has some common structure and is completely self contained, but refers 
to external data and concepts.
- 
- Examples of data that has characteristics of both are easy to find, for 
example a restaurant receipt. A receipt is both a self contained document and 
relational piece of data, its purpose is to record data and meta-data about a 
financial transaction. It records definitive facts, things that are always 
true, like the method of payment and the amount paid, but it also records more 
conceptual things, like the name of the restaurant involved and the dishes 
served. Recipes and ingredients change, and restaurants can be renamed, sold 
and moved. The link between receipt and these external concepts, like the 
restaurant, can be broken in a number of ways. Even the value of money changes 
over time, not just with inflation, but also units of currency can be 
completely revalued by governments. The information on that receipt, at any 
point in time, is of unknown accuracy and verifiability. And yet despite all 
these short comings, a receipt is still useful.
- 
- And so the receipt data, despite some relational characteristic, is 
problematic to represent "correctly" in a relational database, because there is 
no absolute answer to what the meaning of each piece of data is and how it 
relates to everything else. In paper form, the receipt preserves the original 
restaurant information regardless what happens in the real world, it’s not just 
a record of relations but of facts at a place in time. And even if the all the 
things that link the receipt back to the originating restaurant change, the 
receipt still has value. This is the "temporal" aspect of data and is something 
the relational model doesn't handle well.
- 
- But like pure relational modeling isn't very productive for many real world 
problems, modeling data as isolated objects isn't terribly useful either. 
Beyond acting as a fancy file server (like WebDAV), the REST model for data 
storage and sharing is too simple to build real applications on, objects alone 
are far too restrictive to use as a general storage model, it doesn't do enough 
of the things we want and expect from our applications. So instead SQL 
databases, with their abilities to slice and dice and tell us interesting 
things about our data, continue to dominate.
- 
- What is needed is a way to make these discrete objects be a little more 
relational. To aggregate, organize and report on data that hasn't already been 
decomposed into tables.
- 
- This was very much the idea behind WinFS, Microsoft's spectacularly brilliant 
and disastrous unifying vision for data modeling in the Windows world. The 
problem was Microsoft couldn't make it work efficiently (not to mention they 
seriously overreached, trying to make WinFS a solution to solve all problems, 
before it ever solved a single problem).
- 
- One of the reasons I think WinFS has failed is it is too rooted in relational 
technology, the reporting engine is based in large part on SQL Server. MS is 
far from the first to fail in this area, building object databases on top of 
relational technology. In Oracle-land, attempts to reproduce the similar models 
even has a name, 
[[http://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-big-design-mistakes.html|Entity
 Attribute Value model]], and is generally considered a very bad idea. To this 
day I've never seen anyone build it on a relational store, make something 
flexible, easy to use and that gives suitable performance.
- 
- Even IBM has been working for years to get DB2 to serve as a persistence 
layer for Notes and Domino, and they still can't get it stable, easy and 
performant enough to be released gold. IBM is working on it because customers 
keep saying how much they love the idea, but the reality is that it's damn hard 
to build on top of a relational database. Essentially they are try to emulate 
one general database and reporting model on top of another general purpose 
database and reporting model, but the models conflict.
- 
- The updates and usage model of CouchDb is a lockless, optimistic model that 
is pretty much the opposite of how the relational model works. To edit an 
object, you load it up, make your changes, then save them back to the database. 
If someone else edits at the same time and they save their changes before you, 
you'll get a conflict error on save. To resolved the conflict, the updated 
document can be opened and edits can be applied to it and try again.
- 
- This storage and access model is such that any number of hosts can be 
replicating while simultaneously any number of users can be reading, editing 
and deleting documents, performing lookups and reads in computed tables which 
are simultaneously being updated for other users. Document writes are 
serialized, except for file attachments and binary blobs, which are written 
concurrently. And the database never needs to “lock” to do this.
- 
- Relational databases, or more specifically SQL databases, simply don't make 
good document databases, the fundamental set of compromises made is all wrong. 
If you want something that performs well and is reliable, you need to start 
from scratch and think carefully about the whole design. Lotus Notes did this 
long ago, and over the years a lot of very talented people have been 
successfully adapting and expanding that design to keep it competitive. And 
Notes and Domino continue to be big money makers despite how many people are 
other annoyed by the software, because it makes it simple to do the things that 
a nightmares to deal with in SQL.
- 
- CouchDB addresses the data storage problem from the non-relational, RESTful 
point of view, and is designed and optimized in multiple, complimentary ways 
for that world view of semi-structured, document oriented data. And by 
providing a specific but flexible document storage, query and replication 
model, it hits the sweet spot storage and reporting for things that are more 
"document oriented" than relational, a huge class of problems in application 
development.
- 
- Just as SQL databases rarely are used for data that is entirely structured 
and relational in nature, Couch databases rarely will be used for data that has 
no structure, but where the interrelations and structure aren't strong and 
might be broken. The CouchDB computed table reporting engine adds back 
structure and organizes semi-structured, document data and while allowing it to 
stay in it's preferred state.
- 
- The CouchDB document and query model aggregates documents into a manageable 
tables. Unlike SQL database, where your data must be carefully decomposed into 
tables ahead of time, the tables on CouchDB are built on the fly and aggregates 
the already existing data.
- 

Reply via email to