Don't forget that database sharding (and therefore view sharding) is
coming in the release after 1.3, when we merge BigCouch. View shards
build in parallel.

B.


On 20 October 2012 10:06, Gabriel De Oliveira Barbosa
<[email protected]> wrote:
> This topic is also interesting for me.
>
> How can I read this data? Do I have to implement this logic in my
> application, or does CouchDB understand what I'm looking for and
> redirect me to the right database?
> And what if I have to query data across two or more databases?
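
On the application side, a query that spans several periods has to know which databases to visit, because a CouchDB view is built over a single database. The following is only a minimal sketch, assuming the monthly naming scheme from the quoted message below (world_logs/2012/10 and so on); the function name monthly_db_names is hypothetical and not part of CouchDB. The application would query the same view in each listed database and merge the results itself.

```python
from datetime import date


def monthly_db_names(prefix, start, end):
    """List one database name per month from start to end, inclusive.

    Hypothetical helper: the naming scheme (prefix/YYYY/MM) is an
    assumption taken from the example in this thread.
    """
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}/{year:04d}/{month:02d}")
        month += 1
        if month > 12:
            # Roll over into January of the next year.
            year, month = year + 1, 1
    return names


print(monthly_db_names("world_logs", date(2012, 11, 15), date(2013, 2, 1)))
```
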
>
> Thanks
>
> Sent from my iPad
>
> On 20/10/2012, at 08:59, Alexander Shorin <[email protected]> wrote:
>
>> Hi Erik!
>>
>> The common practice for all databases (SQL, NoSQL) that serve fast
>> growing data is partitioning[1]: splitting the data into one partition
>> per datetime period. Depending on how fast the data grows, this period
>> may be a year, a month, or even a day. To apply this practice to
>> CouchDB, you split the data across databases with the period in their
>> names, e.g.:
>>
>> world_logs/2012/10
>> world_logs/2012/09
>> world_logs/2012/08
>> world_logs/2012/07
>> ...
>>
>> Note the slashes in the names. With this trick CouchDB will create a
>> directory hierarchy for these databases on the filesystem:
>> + world_logs/
>> | ---- + 2012/
>> | ---- | ---- + 07.couch
>> | ---- | ---- + 08.couch
>> | ---- | ---- + 09.couch
>> | ---- | ---- + 10.couch
>>
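
A minimal sketch of how an application might derive these names, assuming the world_logs/YYYY/MM scheme shown above. The helper names monthly_db_name and db_url_path are hypothetical. Because the slashes are part of the database name, they are percent-encoded as %2F when the name is used in an HTTP request path.

```python
from datetime import date
from urllib.parse import quote


def monthly_db_name(prefix: str, day: date) -> str:
    """Return the monthly database name for a date, e.g. 'world_logs/2012/10'."""
    return f"{prefix}/{day.year:04d}/{day.month:02d}"


def db_url_path(db_name: str) -> str:
    """Percent-encode a database name for use in a request path.

    safe="" makes quote() encode '/' as %2F, so the slashes are sent
    as part of the name rather than as path separators.
    """
    return "/" + quote(db_name, safe="")


name = monthly_db_name("world_logs", date(2012, 10, 20))
print(name)               # world_logs/2012/10
print(db_url_path(name))  # /world_logs%2F2012%2F10
```
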
>> So if your data grows by 1M docs per year, splitting it by month
>> creates 12 databases with ~100K documents each. The big difference
>> from one big database is that "old" data already has its view index
>> computed; if you add a new view you don't need to wait for all the
>> data to be indexed. You'll get results much faster, since the index
>> is built only for the small chunk you are currently interested in.
>>
>> Also, you could still keep one big database with all the data
>> alongside them, which imports data from these small databases through
>> replication.
>>
>> That covers how to organize data to make views run faster. You could
>> also try switching from the JavaScript query server to the Erlang[2]
>> one. The Erlang query server is native and doesn't suffer from stdio
>> and JSON serialization/deserialization overhead. For me it speeds up
>> indexing by about 3-4 times, depending on the complexity of the map
>> function.
>>
>> P.S. There is good news for you: the 1.3 release will have a new
>> query server engine (already in the master branch) that feels to me a
>> bit faster than the one in 1.2.
>>
>> [1]: http://en.wikipedia.org/wiki/Partition_%28database%29
>> [2]: http://wiki.apache.org/couchdb/EnableErlangViews
>>
>> --
>> ,,,^..^,,,
>>
>>
>> On Sat, Oct 20, 2012 at 4:08 AM, Erik Pearson <[email protected]> wrote:
>>> Hi,
>>>
>>> I'm wondering if there are any write performance improvements on the
>>> horizon? Although day to day read queries are great, and modest updates are
>>> fine, bulk updates and index rebuilding are pretty painful. I know
>>> performance tips are a broad enough topic without focusing it down. Since I
>>> need to deal with multiple databases which will grow at about a million
>>> documents per year, I'm in a bit of pain even testing the database with
>>> significant depth of data (e.g. 5 years).
>>>
>>> I'd be happy to provide my use case and experience, but thought I'd cut my
>>> usually verbose missives down to the bare question.
>>>
>>> Thanks,
>>> Erik.
