I think what many people are really concerned about is the growth pattern of database size as the number of docs increases (space complexity). (If it grows exponentially, then that's not a good sign.)
So are there any official/unofficial, theoretical/benchmark results showing this characteristic?

2011/6/30 Paul Davis <[email protected]>

> Teslan,
>
> I'm not sure where you got the impression that Erlang was frugal with
> disk space. In general, it's true that Erlang is pretty good at using
> a minimal amount of CPU/RAM while it runs, though as in all things,
> that usage will scale with load.
>
> As to disk usage, that's a direct trade-off in the design of CouchDB.
> The append-only B+tree is going to cause fragmentation in the
> database files. There are of course games we could play to minimize
> that to a certain extent, such as log-structured merge trees with
> more aggressive compaction, but then the issue becomes that we end up
> requiring more active file descriptors per database, which in turn
> hurts people hosting a large number of databases on a single node
> (think hosting, or a db per user account).
>
> My guess is that whoever it was on IRC was just speaking with
> conviction. We don't by any means try to hide the fact that CouchDB
> uses quite a bit more space than people would expect at first.
>
> As to the amount of space that can be cleaned up, it really depends
> on the specific load patterns and how aggressively people keep the
> database files compacted. Obviously I could write a single document
> hundreds of thousands of times without compacting, then compact and
> have a database that is a percent or less of the "uncompacted" size.
>
> I'm also not sure why someone would say that a 2GiB database would
> struggle with less than 2GiB of RAM. RAM usage is more or less tied
> to the number of concurrent clients accessing the database and the
> amount and type of view generation you have running. It's not really
> tied to the physical size of the database, as we don't hold caches to
> anything.
> There used to be a silly benchmark floating around that showed
> CouchDB handling a couple thousand requests for a small doc while
> using only 9M of RAM. Granted, that's a super idealized case, but I'd
> just point out that it's more about access patterns than disk usage.
>
> As to the mobile stuff, my guess would probably be "don't store a lot
> of data on the device". AFAIK the story for mobile developers
> revolves quite a bit around the fact that replicating data in and out
> from The Cloud ™ makes it super easy for them to have bits and pieces
> of a much larger database.
>
> But in the end, the fact that CouchDB has a much larger disk usage
> than some would expect is the trade-off in the grand design. Features
> like database snapshots, append-only storage to simplify consistency
> guarantees (also, hot backups), and hosting a large number of dbs in
> a single Erlang VM intersect in such a way that the price we pay is
> using more bytes.
>
> Also, I'd recommend you keep an eye on development, because this is
> an active area of optimization. Filipe has been doing awesome work
> integrating things like snappy compression deep down at the storage
> layer to improve the situation. We may be frank in saying we use a
> non-trivial amount of extra space, but it's not like we're not
> working on improving that situation. :D
>
> That ended up longer than expected. Let us know if you have any other
> questions.

--
- sleepnova
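To make the append-only trade-off above concrete, here is a toy Python model (my own sketch, not CouchDB's actual B+tree file format): every update appends a complete new revision, so the file grows with the total number of writes rather than the number of live docs, and compaction rewrites only the newest revision of each doc.

```python
import json

class AppendOnlyStore:
    """Toy append-only store: writes append, compaction rewrites live data."""

    def __init__(self):
        self.log = []      # every write is appended here, never overwritten
        self.latest = {}   # doc_id -> index of the newest revision in the log

    def write(self, doc_id, body):
        record = json.dumps({"id": doc_id, "body": body})
        self.latest[doc_id] = len(self.log)
        self.log.append(record)

    def disk_size(self):
        # Total bytes ever appended and still in the file.
        return sum(len(r) for r in self.log)

    def compact(self):
        # Keep only the newest revision of each doc, dropping old ones.
        live = [self.log[i] for i in sorted(self.latest.values())]
        self.log = live
        self.latest = {json.loads(r)["id"]: i for i, r in enumerate(self.log)}

store = AppendOnlyStore()
for i in range(1000):
    store.write("doc-1", {"counter": i})   # same doc, 1000 revisions

before = store.disk_size()
store.compact()
after = store.disk_size()
print(before, after)   # the compacted size is a tiny fraction of "before"
```

This mirrors Paul's point: growth is linear in total writes (not exponential in doc count), and compacting a heavily rewritten database reclaims almost all of it.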

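On the compression point Paul raises: squeezing document bodies before they hit disk directly reduces the bytes written per revision. A rough stdlib illustration, using zlib as a stand-in since snappy is not in Python's standard library (snappy trades a lower compression ratio for much higher speed):

```python
import json
import zlib

# A deliberately repetitive JSON body, as real documents often are.
doc = json.dumps({"type": "user", "tags": ["reader"] * 200}).encode("utf-8")
compressed = zlib.compress(doc)
print(len(doc), len(compressed))   # compressed form is much smaller
```

The same idea applied at the storage layer shrinks every appended revision, which partly offsets the append-only format's extra disk usage.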