Re: view index build performance improvements coming soon?

Alexander Shorin Sat, 20 Oct 2012 04:59:57 -0700

Hi Erik!

The common practice for any database (SQL or NoSQL) that serves
fast-growing data is partitioning[1]: splitting the data into one
partition per time period. Depending on how fast the data grows, this
period may be a year, a month or even a day. To apply this practice to
CouchDB, split the data across databases that carry the period in
their names, e.g.:

world_logs/2012/10
world_logs/2012/09
world_logs/2012/08
world_logs/2012/07
...

Note the slashes in the names. With this trick CouchDB creates a
directory hierarchy for these databases on the filesystem:
+ world_logs/
| ---- + 2012/
| ---- | ---- + 07.couch
| ---- | ---- + 08.couch
| ---- | ---- + 09.couch
| ---- | ---- + 10.couch
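
A minimal sketch of how such names could be generated on the client
side (the helper names and the localhost URL are my own assumptions,
not anything CouchDB ships). The one detail to watch is that the
slashes in a database name have to be sent as %2F in the request URL:

```python
from datetime import date
from urllib.parse import quote


def partition_db_name(prefix: str, day: date) -> str:
    """Return the monthly database name, e.g. 'world_logs/2012/10'."""
    return f"{prefix}/{day.year:04d}/{day.month:02d}"


def db_url(server: str, db_name: str) -> str:
    """Build the HTTP URL of a database, encoding '/' in its name as %2F."""
    return f"{server.rstrip('/')}/{quote(db_name, safe='')}"


# Hypothetical server address, for illustration only.
name = partition_db_name("world_logs", date(2012, 10, 20))
print(name)
print(db_url("http://localhost:5984", name))
```

A writer would then pick the target database from each document's
timestamp and create that database the first time a new month starts.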

So if your data grows by 1M docs per year, splitting it by month
creates 12 databases with ~100K documents each. The big difference
from one big database is that the "old" data already has a computed
view index. If you add a new view, you don't need to wait until all
the data is indexed: you get results much faster, since the index is
built only for the small chunk that you are currently interested in.

Also, you could still keep one big database with all the data at the
same time, fed from these small databases through replication.
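
As a rough sketch, the bodies for CouchDB's POST /_replicate could be
built like this, one per monthly database. The target name
"world_logs_all" is a made-up example, and I assume here that both
source and target live on the same server and that the target already
exists:

```python
import json


def replication_requests(prefix, year, months, target):
    """Build one POST /_replicate body per monthly database."""
    return [
        {"source": f"{prefix}/{year:04d}/{month:02d}", "target": target}
        for month in months
    ]


# "world_logs_all" is a hypothetical name for the combined database.
bodies = replication_requests("world_logs", 2012, range(7, 11), "world_logs_all")
for body in bodies:
    print(json.dumps(body))
```

Each body would be sent as JSON to /_replicate; adding
"continuous": true to a body keeps the big database following that
month as new documents arrive.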

That covers how to organize the data so that views build faster. You
could also try switching from the JavaScript query server to the
Erlang[2] one. The Erlang query server is native and doesn't suffer
from stdio and JSON serialization/deserialization overhead. For me it
speeds up indexing by about 3-4 times, depending on the complexity of
the map function.
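
For illustration, here is a sketch of a design document that uses an
Erlang map function, built as a plain dict ready to be PUT to the
database. The design doc name, the view name and the "level" field are
made up for the example. It also assumes the native query server has
been enabled in the server config as described in [2]; treat the map
function as a starting point and check it against that page:

```python
import json

# Erlang map function: emits the (hypothetical) "level" field of each
# document as the key, with a null value.
ERLANG_MAP = """
fun({Doc}) ->
    Level = proplists:get_value(<<"level">>, Doc, null),
    Emit(Level, null)
end.
""".strip()


def erlang_design_doc(name, view, map_source):
    """Return a design document whose view is run by the Erlang query server."""
    return {
        "_id": f"_design/{name}",
        "language": "erlang",
        "views": {view: {"map": map_source}},
    }


ddoc = erlang_design_doc("logs", "by_level", ERLANG_MAP)
print(json.dumps(ddoc, indent=2))
```

The "language" field is what tells CouchDB to run the view with the
Erlang query server instead of the JavaScript one.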

P.S. There is good news for you: the 1.3 release will have a new
query server engine (already in the master branch) that, by my
impression, is a bit faster than the one in 1.2.

[1]: http://en.wikipedia.org/wiki/Partition_%28database%29
[2]: http://wiki.apache.org/couchdb/EnableErlangViews

--
,,,^..^,,,

On Sat, Oct 20, 2012 at 4:08 AM, Erik Pearson <[email protected]> wrote:
> Hi,
>
> I'm wondering if there are any write performance improvements on the
> horizon? Although day to day read queries are great, and modest updates are
> fine, bulk updates and index rebuilding is pretty painful. I know
> performance tips are a broad enough topic without focusing it down. Since I
> need to deal with multiple databases which will grow at about a million
> documents per year, I'm in a bit of pain even testing the database with
> significant depth of data (e.g. 5 years).
>
> I'd be happy to provide my use case and experience, but thought I'd cut my
> usually verbose missives down to the bare question.
>
> Thanks,
> Erik.
