Hi Erik! A common practice for any database (SQL or NoSQL) that serves fast-growing data is partitioning[1]: splitting the data into partitions by some date/time period. Depending on how fast the data grows, this period may be a year, a month, or even a day. Applied to CouchDB, this means splitting the data across databases that carry the period in their names, e.g.:

world_logs/2012/10
world_logs/2012/09
world_logs/2012/08
world_logs/2012/07
...

Note the slashes in the names. With this trick CouchDB creates a directory hierarchy for these databases on the filesystem:

+ world_logs/
| ---- + 2012/
| ---- | ---- + 07.couch
| ---- | ---- + 08.couch
| ---- | ---- + 09.couch
| ---- | ---- + 10.couch

So if your data grows by 1M docs per year, splitting it by month creates 12 databases per year, each with under 100K documents. The big difference from a single large database is that the "old" data already has a computed view index. When you add a new view, you don't have to wait until all the data is indexed: you get results much faster, since the index is built only for the small chunk you are currently interested in. You can also keep one big database with all the data alongside, fed from these small databases through replication.

That covers how to organize the data so that views build faster. You could also try switching from the JavaScript query server to the Erlang[2] one. The Erlang query server is native and doesn't suffer from stdio and JSON serialization/deserialization overhead. In my experience it speeds up indexing by about 3-4 times, depending on the complexity of the map function.

P.S. There is good news for you: the 1.3 release will bring a new query server engine (already in the master branch) that feels a bit faster to me than the one in 1.2.

[1]: http://en.wikipedia.org/wiki/Partition_%28database%29
[2]: http://wiki.apache.org/couchdb/EnableErlangViews

--
,,,^..^,,,

On Sat, Oct 20, 2012 at 4:08 AM, Erik Pearson <[email protected]> wrote:
> Hi,
>
> I'm wondering if there are any write performance improvements on the
> horizon? Although day to day read queries are great, and modest updates are
> fine, bulk updates and index rebuilding is pretty painful. I know
> performance tips are a broad enough topic without focusing it down.
> Since I need to deal with multiple databases which will grow at about a
> million documents per year, I'm in a bit of pain even testing the database
> with significant depth of data (e.g. 5 years).
>
> I'd be happy to provide my use case and experience, but thought I'd cut my
> usually verbose missives down to the bare question.
>
> Thanks,
> Erik.
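
For illustration, the monthly naming scheme described in the reply could be sketched in Python roughly like this. The helper names (monthly_db_name, db_url_path) are made up for this example and are not part of CouchDB or of any client library. The one CouchDB-specific assumption is that a slash inside a database name has to be sent percent-encoded as %2F in the request path.

```python
from datetime import date
from urllib.parse import quote


def monthly_db_name(prefix: str, day: date) -> str:
    """Build a per-month database name such as 'world_logs/2012/10'.

    Hypothetical helper: the 'prefix/YYYY/MM' layout follows the
    example in this thread, nothing more.
    """
    return f"{prefix}/{day.year:04d}/{day.month:02d}"


def db_url_path(db_name: str) -> str:
    """Return the HTTP path for a database name.

    Assumption: slashes in the name are percent-encoded (%2F) so the
    server sees a single database name, not nested path segments.
    """
    return "/" + quote(db_name, safe="")


if __name__ == "__main__":
    name = monthly_db_name("world_logs", date(2012, 10, 20))
    print(name)
    print(db_url_path(name))
```

A writer would pick the database from each document's timestamp, so new documents always land in the current month's database while older months stay untouched.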
