Don't forget that database sharding (and therefore view sharding) is coming in the release after 1.3, when we merge BigCouch. View shards build in parallel.
B.

On 20 October 2012 10:06, Gabriel De Oliveira Barbosa <[email protected]> wrote:
> This topic is also interesting for me.
>
> How can I read this data? Do I have to implement this logic in my
> application, or does CouchDB understand what I'm looking for and redirect
> me to the right database? And what if I have to query data across two or
> more databases?
>
> Thanks
>
> Sent from my iPad
>
> On 20/10/2012, at 08:59, Alexander Shorin <[email protected]> wrote:
>
>> Hi Erik!
>>
>> The common practice for all databases (SQL, NoSQL) that serve
>> fast-growing data is partitioning[1]: splitting the data into one
>> partition per datetime period. Depending on how fast the data grows, this
>> period may be a year, a month or even a day. To apply this practice to
>> CouchDB, you split the data across databases with the period in their
>> names, e.g.:
>>
>> world_logs/2012/10
>> world_logs/2012/09
>> world_logs/2012/08
>> world_logs/2012/07
>> ...
>>
>> Note the slashes in the names. With this trick CouchDB will create a
>> directory hierarchy for these databases on the filesystem:
>> + world_logs/
>> | ---- + 2012/
>> | ---- | ---- + 07.couch
>> | ---- | ---- + 08.couch
>> | ---- | ---- + 09.couch
>> | ---- | ---- + 10.couch
>>
>> So if your data grows by 1M docs per year, splitting it by month
>> creates 12 databases with ~100K documents each. The big difference from
>> one big database is that "old" data already has a computed view index;
>> if you add a new view you don't need to wait until all the data is
>> indexed. You'll get results much faster, since the index is built only
>> for the small chunk you are currently interested in.
>>
>> Also, you could still keep one big database with all the data alongside
>> them, which imports data from these small databases through replication.
>>
>> That's how to organize data to make views run faster. You could also
>> try switching from the JavaScript query server to the Erlang[2] one.
>> The Erlang query server is native and doesn't suffer from stdio and JSON
>> serialization/deserialization overhead. In my experience it speeds up
>> indexing by about 3-4 times, depending on the complexity of the map
>> function.
>>
>> P.S. There is good news for you: the 1.3 release will have a new
>> query server engine (already in the master branch) that, to my feeling,
>> is a bit faster than the one in 1.2.
>>
>> [1]: http://en.wikipedia.org/wiki/Partition_%28database%29
>> [2]: http://wiki.apache.org/couchdb/EnableErlangViews
>>
>> --
>> ,,,^..^,,,
>>
>>
>> On Sat, Oct 20, 2012 at 4:08 AM, Erik Pearson <[email protected]> wrote:
>>> Hi,
>>>
>>> I'm wondering if there are any write performance improvements on the
>>> horizon? Although day-to-day read queries are great, and modest updates
>>> are fine, bulk updates and index rebuilding are pretty painful. I know
>>> performance tips are a broad enough topic without focusing it down.
>>> Since I need to deal with multiple databases which will each grow at
>>> about a million documents per year, I'm in a bit of pain even testing
>>> the database with significant depth of data (e.g. 5 years).
>>>
>>> I'd be happy to provide my use case and experience, but thought I'd cut
>>> my usually verbose missives down to the bare question.
>>>
>>> Thanks,
>>> Erik.
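
For the question in the thread about how an application finds the right database, here is a minimal Python sketch of the monthly naming scheme quoted above. It assumes the application itself picks the database for each read or write and merges results when a query spans several months. The function names and the http://localhost:5984 server address are hypothetical, chosen only for illustration. The one CouchDB-specific detail is that slashes in a database name must be sent as %2F in the request URL.

```python
from datetime import date
from typing import List
from urllib.parse import quote


def partition_db_name(prefix: str, day: date) -> str:
    """Return the monthly database name for a date, e.g. world_logs/2012/10."""
    return f"{prefix}/{day.year:04d}/{day.month:02d}"


def partition_db_url(server: str, prefix: str, day: date) -> str:
    """Return the database URL, percent-encoding the slashes in the name."""
    name = partition_db_name(prefix, day)
    # safe="" forces "/" to be encoded as %2F instead of being left as-is.
    return f"{server.rstrip('/')}/{quote(name, safe='')}"


def partitions_between(prefix: str, start: date, end: date) -> List[str]:
    """List the monthly database names covering start through end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}/{year:04d}/{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


if __name__ == "__main__":
    # Hypothetical server address; adjust to your own installation.
    print(partition_db_url("http://localhost:5984", "world_logs", date(2012, 10, 20)))
    print(partitions_between("world_logs", date(2012, 7, 1), date(2012, 10, 31)))
```

A query over a date range would then call the same view on each database returned by partitions_between and combine the rows in the application.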
