Re: view index build performance improvements coming soon?

Robert Newson Sat, 20 Oct 2012 07:20:29 -0700

Don't forget that database sharding (and therefore view sharding) is
coming in the release after 1.3, when we merge BigCouch. View shards
build in parallel.
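Why sharding helps index builds: each shard holds a disjoint subset of the documents, so its view index can be built independently of the others. Below is a toy sketch of that idea, not BigCouch code. As I understand it, BigCouch maps a CRC32 of the document ID onto hash ranges. Taking the hash modulo `q` here is a simplification, and `build_shard_index` is a hypothetical stand-in for a real view build.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_for(doc_id: str, q: int) -> int:
    """Assign a document to one of q shards by hashing its ID.

    Simplification: CRC32 modulo q, not BigCouch's actual range scheme.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % q


def build_shard_index(docs):
    """Hypothetical toy 'view build' for one shard: (key, doc_id) pairs sorted by key."""
    return sorted((doc["type"], doc["_id"]) for doc in docs)


def build_sharded_view(docs, q=4):
    """Split documents into q shards, then index every shard concurrently."""
    shards = [[] for _ in range(q)]
    for doc in docs:
        shards[shard_for(doc["_id"], q)].append(doc)
    # Shards share no documents, so their indexes can be built at the same
    # time. Python threads only illustrate the structure; they give no real
    # CPU parallelism here.
    with ThreadPoolExecutor(max_workers=q) as pool:
        return list(pool.map(build_shard_index, shards))


if __name__ == "__main__":
    docs = [{"_id": f"doc-{i}", "type": "log" if i % 2 else "event"} for i in range(10)]
    indexes = build_sharded_view(docs, q=4)
    print([len(ix) for ix in indexes])
```

A query then has to merge the per-shard results. In a cluster, that merge is done for you.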


B.


On 20 October 2012 10:06, Gabriel De Oliveira Barbosa
<[email protected]> wrote:
> This topic interests me too.
>
> How do I read this data? Do I have to implement this logic in my
> application, or does CouchDB understand what I'm looking for and redirect
> me to the right database? And what if I have to query data across two or
> more databases?
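As far as I know, CouchDB 1.x has no cross-database view query. With the per-month layout described below, the application has to work out which databases a request touches, query each one, and merge the results. Here is a minimal sketch. It assumes monthly databases named `<prefix>/YYYY/MM` and a hypothetical application-supplied `query_view(db)` callable that returns a view's rows, for example via an HTTP GET on the view.

```python
from datetime import date
from typing import Callable, Dict, Iterable, List


def monthly_dbs(prefix: str, start: date, end: date) -> List[str]:
    """Names of the per-month databases covering [start, end], oldest first."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}/{year:04d}/{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return names


def fan_out(dbs: Iterable[str], query_view: Callable[[str], List[Dict]]) -> List[Dict]:
    """Run the same view query against every database and merge rows by key.

    Uses Python ordering, not CouchDB's view collation, so keys must be of
    mutually comparable types.
    """
    rows: List[Dict] = []
    for db in dbs:
        rows.extend(query_view(db))
    return sorted(rows, key=lambda r: r["key"])


if __name__ == "__main__":
    fake = {
        "world_logs/2012/09": [{"key": "b", "value": 1}],
        "world_logs/2012/10": [{"key": "a", "value": 2}],
    }
    dbs = monthly_dbs("world_logs", date(2012, 9, 1), date(2012, 10, 20))
    print(fan_out(dbs, lambda db: fake.get(db, [])))
```

Concatenating rows like this only works for map-only views. Reduced results from several databases would have to be re-reduced by the application.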
>
> Thanks
>
> On 20/10/2012, at 08:59, Alexander Shorin <[email protected]> wrote:
>
>> Hi Erik!
>>
>> A common practice for any database (SQL or NoSQL) that serves
>> fast-growing data is partitioning[1]: splitting the data into
>> partitions by some time period. Depending on how fast the data grows,
>> the period may be a year, a month or even a day. To apply this practice
>> to CouchDB, split the data across databases with the period in their
>> names, e.g.:
>>
>> world_logs/2012/10
>> world_logs/2012/09
>> world_logs/2012/08
>> world_logs/2012/07
>> ...
>>
>> Note the slashes in the names. With this trick CouchDB creates a
>> directory hierarchy for these databases on the filesystem:
>> + world_logs/
>> | ---- + 2012/
>> | ---- | ---- + 07.couch
>> | ---- | ---- + 08.couch
>> | ---- | ---- + 09.couch
>> | ---- | ---- + 10.couch
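The naming trick can be shown concretely. One detail is easy to miss: in HTTP requests, the slashes inside a database name must be escaped as `%2F`, otherwise they are read as path separators. This is a sketch. The server URL and the `/var/lib/couchdb` data directory are placeholders (the real directory is whatever `database_dir` is set to), and the helper names are hypothetical.

```python
import re
from datetime import date
from urllib.parse import quote

# CouchDB 1.x database names: a lowercase letter first, then a-z, 0-9 or _$()+-/
VALID_DB_NAME = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")


def monthly_db_name(prefix: str, day: date) -> str:
    """Per-month database name such as world_logs/2012/10."""
    name = f"{prefix}/{day:%Y/%m}"
    if not VALID_DB_NAME.match(name):
        raise ValueError(f"invalid CouchDB database name: {name!r}")
    return name


def db_url(base_url: str, name: str) -> str:
    """Database URL, with the slashes in the name escaped as %2F."""
    return f"{base_url.rstrip('/')}/{quote(name, safe='')}"


def db_file(database_dir: str, name: str) -> str:
    """On-disk file: each slash in the name becomes a directory level."""
    return f"{database_dir.rstrip('/')}/{name}.couch"


if __name__ == "__main__":
    name = monthly_db_name("world_logs", date(2012, 10, 20))
    print(db_url("http://localhost:5984", name))  # http://localhost:5984/world_logs%2F2012%2F10
    print(db_file("/var/lib/couchdb", name))      # /var/lib/couchdb/world_logs/2012/10.couch
```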
>>
>> So if your data grows by 1M docs per year, splitting it by month
>> creates 12 databases of about 83K documents each (1M / 12). The big
>> difference from one big database is that the "old" databases already
>> have their view indexes computed. And if you add a new view, you don't
>> have to wait until all of the data is indexed: you get results much
>> faster, because the index is only built for the small chunk you are
>> currently querying.
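The same arithmetic can be applied to the 5-year test case from the original question. This sketch assumes uniform growth and that a newly added view is first queried on a single monthly database.

```python
def per_partition(docs_per_year: int, partitions_per_year: int) -> float:
    """Average number of documents in each partition, assuming uniform growth."""
    return docs_per_year / partitions_per_year


def docs_indexed_for_new_view(docs_per_year, years, partitions_per_year, partitions_queried=1):
    """Return (docs to index in one big database,
    docs to index when only `partitions_queried` partitions are queried)."""
    whole = docs_per_year * years
    partial = per_partition(docs_per_year, partitions_per_year) * partitions_queried
    return whole, partial


if __name__ == "__main__":
    print(round(per_partition(1_000_000, 12)))  # 83333
    whole, partial = docs_indexed_for_new_view(1_000_000, 5, 12)
    print(whole, round(partial), round(whole / partial))  # 5000000 83333 60
```

Under these assumptions, a new view queried on the current month indexes about 1/60 of what the single-database layout would.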
>>
>> You can also keep one big database with all the data alongside them,
>> fed from these small databases through replication.
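A combined database can be fed from the monthly ones with one replication per source. This is a sketch assuming CouchDB 1.x's `POST /_replicate` endpoint, a placeholder server URL, and a hypothetical target database `world_logs_all`. Credentials are left out, and admin rights may be needed depending on the server setup. Replications started this way with `"continuous": true` do not survive a server restart. Documents in the `_replicator` database are the persistent alternative.

```python
import json
import urllib.request
from typing import Dict, List


def replication_bodies(sources: List[str], target: str, continuous: bool = True) -> List[Dict]:
    """One _replicate request body per monthly source database.

    Local database names are used as-is in the JSON body (no %2F escaping).
    The target database must already exist.
    """
    return [{"source": src, "target": target, "continuous": continuous} for src in sources]


def start_replications(base_url: str, bodies: List[Dict]) -> None:
    """POST each body to <base_url>/_replicate (network call; not run in the example)."""
    for body in bodies:
        req = urllib.request.Request(
            f"{base_url.rstrip('/')}/_replicate",
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()


if __name__ == "__main__":
    bodies = replication_bodies(["world_logs/2012/09", "world_logs/2012/10"], "world_logs_all")
    print(json.dumps(bodies, indent=2))
    # start_replications("http://localhost:5984", bodies)  # requires a running server
```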
>>
>> That covers how to organize the data so views run faster. You could
>> also try switching from the JavaScript query server to the Erlang[2]
>> one. The Erlang query server is native, so it doesn't suffer from stdio
>> and JSON serialization/deserialization overhead. In my experience it
>> speeds up indexing about 3-4 times, depending on the complexity of the
>> map function.
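In CouchDB 1.x, the Erlang query server is enabled in local.ini with a `[native_query_servers]` section containing `erlang = {couch_native_process, start_link, []}`. A design document can then declare `"language": "erlang"`. Below is a sketch of such a design document, assuming documents carry a `type` field. The design document name and the `by_type` view name are hypothetical. Erlang views run inside the server without a sandbox, so only trusted admins should be able to write them.

```python
import json

# Erlang map function, stored as a string in the design document.
# Documents arrive as {Proplist}; this emits each doc's "type" (or null) as the key.
ERLANG_MAP = """fun({Doc}) ->
    Type = proplists:get_value(<<"type">>, Doc, null),
    Emit(Type, 1)
end."""


def erlang_design_doc(name: str) -> dict:
    """Design document whose views are run by the native Erlang query server."""
    return {
        "_id": f"_design/{name}",
        "language": "erlang",
        "views": {"by_type": {"map": ERLANG_MAP}},
    }


if __name__ == "__main__":
    # PUT this JSON to <db-url>/_design/logs to install it.
    print(json.dumps(erlang_design_doc("logs"), indent=2))
```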
>>
>> P.S. Some good news: the 1.3 release will include a new query server
>> engine (already in the master branch) that, in my impression, is a bit
>> faster than the one in 1.2.
>>
>> [1]: http://en.wikipedia.org/wiki/Partition_%28database%29
>> [2]: http://wiki.apache.org/couchdb/EnableErlangViews
>>
>> --
>> ,,,^..^,,,
>>
>>
>> On Sat, Oct 20, 2012 at 4:08 AM, Erik Pearson <[email protected]> wrote:
>>> Hi,
>>>
>>> I'm wondering whether there are any write performance improvements on
>>> the horizon. Day-to-day read queries are great and modest updates are
>>> fine, but bulk updates and index rebuilding are pretty painful. I know
>>> performance tips are a broad topic unless narrowed down. Since I need to
>>> deal with multiple databases that will grow by about a million documents
>>> per year, even testing the database with a significant depth of data
>>> (e.g. 5 years) is painful.
>>>
>>> I'd be happy to provide my use case and experience, but thought I'd cut my
>>> usually verbose missives down to the bare question.
>>>
>>> Thanks,
>>> Erik.
