On 9 Feb 2009, at 13:49, Mister Donut wrote:
I'll jump right in.
CouchDB won't allow you to "jump to page X", but if you look at
e.g. Google, it doesn't work either. [...]
But surrogate keys are considered harmful and, I'd say (though that
really depends on the application), not very helpful.
I guess I was assuming that CouchDB, due to its different nature, has
a sophisticated solution for this. But apparently pagination is a
problem that is really hard to solve.
CouchDB in its current form is very bare bones. Many of us are not
as experienced with CouchDB as with traditional RDBMSs, simply because
CouchDB hasn't been around that long. We've come a long way in defining
standard patterns for solving common problems in CouchDB, but there's
a lot more to do.
Pagination lives and dies by being able to calculate which row lives on
which page. This works best with a surrogate index that is a sequence
over the rows. You can build a sequence on a distributed system, but
you are introducing a global queue that all requests to your system
would need to go through. You'd need to make that global queue fault
tolerant and able to handle all your load. This is tricky. Or you give
up on that and accept that sequences in a distributed system are not
feasible.
This is where pagination indeed gets hard. But if you work backwards
from the user experience, you can make a decent trade-off; see Google.
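One way to make that trade-off, in the spirit of Google's "next page"
links, is to page by key rather than by row number: ask the view for one
row more than a page holds and use that extra row as the start of the
next page. Below is a minimal sketch, assuming a sorted in-memory list
stands in for the view's rows; `fetch_page` and its parameters are
hypothetical names. Rows are (key, doc_id) pairs so that duplicate keys
still have a unique position, which is the job CouchDB's `startkey_docid`
parameter does alongside `startkey` and `limit` in a real view query.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical stand-in for a view's rows: (key, doc_id) pairs, sorted the
# way a view index is, so equal keys are ordered by doc_id.
Row = Tuple[str, str]


def fetch_page(rows: List[Row], page_size: int,
               start: Optional[Row] = None) -> Tuple[List[Row], Optional[Row]]:
    """Return one page of rows and the row that begins the next page.

    Mirrors a "start at this key, limit = page_size + 1" view query: the
    extra row, if present, is not shown but remembered as the next start.
    No row numbers are involved, so no global sequence is needed.
    """
    first = 0 if start is None else bisect_left(rows, start)
    window = rows[first:first + page_size + 1]
    page = window[:page_size]
    next_start = window[page_size] if len(window) > page_size else None
    return page, next_start


if __name__ == "__main__":
    rows = sorted([("apple", "d1"), ("banana", "d2"), ("banana", "d3"),
                   ("cherry", "d4"), ("date", "d5")])
    start: Optional[Row] = None
    while True:
        page, start = fetch_page(rows, 2, start)
        print(page)
        if start is None:
            break
```

You lose "jump to page X", but each page only needs to know the key it
starts at, so nothing has to hand out a global sequence of row numbers.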
Can you elaborate on that? I don't quite get the "or duplicate data,
basically anything that needs to be the same as something else" bit.
Well. Let's say you have a list of documents. You want to store some
information about the newest document in a separate key (instead of a
view, which might be slow if you have too many).
Too many what? Views or documents? Views are not really slow once
the index is built, and thanks to incremental index updates they are
not slow in production either.
Having many views is no problem either, as they are evaluated on read,
not on document write (unlike traditional RDBMS column indexes).
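To illustrate the "evaluated on read" point: a write only has to record
that a document changed, and each view catches up the next time it is
queried, running its map function over just the documents changed since
its last read. The toy sketch below assumes an in-memory change log and
uses hypothetical `Database` and `View` classes; it is not how CouchDB
stores its indexes internally, only an illustration of the update pattern.

```python
from typing import Any, Callable, Dict, List, Tuple

# A map function takes a document and returns the (key, value) rows it emits.
MapFn = Callable[[Dict[str, Any]], List[Tuple[Any, Any]]]


class Database:
    """Hypothetical document store that logs every write in update order."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, Any]] = {}
        self.changes: List[str] = []  # doc ids, in update-sequence order

    def put(self, doc_id: str, doc: Dict[str, Any]) -> None:
        # A write only stores the doc and logs it; no view is touched here.
        self.docs[doc_id] = doc
        self.changes.append(doc_id)


class View:
    """Toy view index, brought up to date incrementally when it is read."""

    def __init__(self, db: Database, map_fn: MapFn) -> None:
        self.db = db
        self.map_fn = map_fn
        self.seq = 0  # position in db.changes this index has caught up to
        self.rows_by_doc: Dict[str, List[Tuple[Any, Any]]] = {}
        self.map_calls = 0  # counts map work, to show it happens on read

    def _catch_up(self) -> None:
        # Re-map only documents changed since the last read (each id once).
        for doc_id in dict.fromkeys(self.db.changes[self.seq:]):
            self.rows_by_doc[doc_id] = self.map_fn(self.db.docs[doc_id])
            self.map_calls += 1
        self.seq = len(self.db.changes)

    def query(self) -> List[Tuple[Any, Any, str]]:
        self._catch_up()
        rows = [(key, value, doc_id)
                for doc_id, emitted in self.rows_by_doc.items()
                for key, value in emitted]
        # Sorted by key, then doc id, as view rows are.
        return sorted(rows, key=lambda row: (row[0], row[2]))
```

In this toy, adding more views adds no work to `put`; the map work is
deferred to `query` and only covers what changed since the previous read.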
That isn't possible.
Or let's say you have documents, and categories. And many, many, many
of them. Again, the view to show the latest document might be too
slow, so you want to save that information in a separate key. Not
possible.
Once a view is built, it is rather quick to look things up. I suspect
that you assume views are created on demand, based on user input.
That would not work except for very few documents; you're advised to
find a solution with predefined views.
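Here is what the predefined-view approach could look like for the
"latest document per category" case. If the map function emits
`[category, created]` as the key, the question becomes a lookup at the
end of one category's key range in an index that is already sorted. This
is a sketch under assumptions: a sorted in-memory list stands in for the
view index, and the `category` and `created` fields and both helper
functions are hypothetical. Against a real view, the equivalent would be
a descending query with `limit=1` over that category's keys.

```python
import math
from bisect import bisect_left
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical view rows: ((category, created), doc_id), sorted by key, i.e.
# what a predefined map function emitting [category, created] would yield.
Row = Tuple[Tuple[str, float], str]


def build_index(docs: List[Dict[str, Any]]) -> List[Row]:
    # Built once ahead of time (and then kept up to date), not per request.
    return sorted(((d["category"], d["created"]), d["_id"]) for d in docs)


def latest_in_category(index: List[Row], category: str) -> Optional[str]:
    """Return the id of the newest document in `category`, or None."""
    # Find where (category, +infinity) would sort: just past the category's
    # newest row. One binary search, like a descending limit=1 range query.
    end = bisect_left(index, ((category, math.inf), ""))
    if end > 0 and index[end - 1][0][0] == category:
        return index[end - 1][1]
    return None
```

The lookup is a single binary search, so its cost grows only
logarithmically with the number of rows, however many categories and
documents there are.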
A couple of things you can do with CouchDB replication (again, I'm not
saying that you can't do some of those with an RDBMS, but it gets
harder the further you move down the list): [...]
Thank you for that list. I think I, like many other users (judging
from what I have read in blogs), expect something else from CouchDB.
I am not so sure where this is coming from.
Check the Ruby thing a few mails down. How exactly is that
implementation going to work without immediate consistency?
Paul's Stuffing is for people who want to get going with CouchDB quickly
in their Rails environment. It is specifically not designed around all
the concepts of CouchDB.
Everyone seems to be going on about it being schema-free, but you can
just add a "param" field to any database table, transparently
(un)serialize it, and there you have it: schema-free.
Your ALTER TABLE statement locks your table (in MySQL). If you normalize
that out into a separate table, you add a JOIN, which might end up not
being as fast as you'd like. Totally generic object behaviour abstractions
in SQL need something like eight tables; there's no way this flies :)
If you actually have a few nodes (with that implementation), it will
break big, big time.
How? (Assuming you have a use-case in mind, can you explain that?)
I think that, possibly with the "Cloud Hype", I got into believing that
it would "just work", with anything you throw at it, like Amazon
SimpleDB tells you it would.
There is no magic bullet. Distributed programming is hard. :)
Yes, Key/Value pairs are incredibly easy. MapReduce is amazing and
intriguing. But handling the replication, won't it be so difficult
that you end up with a Quasi-Mini-RDBMS anyway?
What is a quasi-mini-RDBMS? Of course, concepts and behaviour will
likely overlap, but there are a number of properties that draw people
to CouchDB. The REST API is one thing, JSON another, replication yet
another, and the Erlang core yet another.
Speaking of which, Erlang is pretty cool for multi-core systems
that are rather hard to program in other languages (yet again,
no silver bullet).
Now I got far away from my original questions, but I guess that
happens often in discussions.
Basically, now: "Is it possible to handle the replication in such a
way that you don't end up with a Mini-RDBMS anyway in the end?"
Again, can you wrap that into a concrete example? I don't quite get what
that mini-RDBMS is and how your understanding of replication ties
into it :)
I would just, really really really, like to see an example that goes
beyond schema-free, one that handles replication. I think that would
show where CouchDB shines, and where you'd fail with an RDBMS.
See the last three items on the list in my last mail. They are
traditionally not easy to build on top of an RDBMS in a practical or
scalable manner.
Cheers
Jan
--