Re: view index build performance improvements coming soon?

Gabriel De Oliveira Barbosa Sat, 20 Oct 2012 07:59:57 -0700

I'm excited for this merge.

Where is the best place to follow the merge progress?


Sent from my iPad

On 20/10/2012, at 11:29, Alexander Shorin <[email protected]> wrote:

> On Sat, Oct 20, 2012 at 6:06 PM, Gabriel De Oliveira Barbosa
> <[email protected]> wrote:
>> This topic is also interesting for me.
>> 
>> How can I read this data? Do I have to implement this logic in my application,
>> or does CouchDB understand what I'm looking for and redirect me to the right
>> database? And what if I have to query data across two or more databases?
> 
> This can be done fairly easily with a proxy to an os_daemon[1], so the only
> thing you have to do is write the logic for sharding and for requesting the
> correct shards; how exactly depends mostly on the problem you're solving.
> You can also have a symlink with a static name that points to the actual
> database - CouchDB is able to follow symlinks, which lets you switch the
> current database shard more or less transparently.
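
For the "requesting correct shards" part, here is a minimal sketch of picking which databases a date-range query has to touch. It assumes the monthly `<prefix>/YYYY/MM` naming suggested further down the thread; the function name `shards_for_range` is made up for illustration and is not part of CouchDB.

```python
from datetime import date


def shards_for_range(prefix: str, start: date, end: date) -> list:
    """List the monthly database names that cover start..end inclusive."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}/{year:04d}/{month:02d}")
        # Advance one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


print(shards_for_range("world_logs", date(2012, 11, 15), date(2013, 2, 1)))
```

The application (or the proxy) would then query the same view in each listed database and merge the rows itself; how to merge depends on the view.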
> 
> But as Robert mentioned, the BigCouch merge should simplify these things
> and many others.
> 
> [1]: http://davispj.com/2010/09/26/new-couchdb-externals-api.html
> 
> --
> ,,,^..^,,,
> 
> 
> On Sat, Oct 20, 2012 at 6:20 PM, Robert Newson <[email protected]> wrote:
>> Don't forget that database sharding (and therefore view sharding) is
>> coming in the release after 1.3 when we merge BigCouch. View shards
>> build in parallel.
>> 
>> B.
>> 
>> 
>> On 20 October 2012 10:06, Gabriel De Oliveira Barbosa
>> <[email protected]> wrote:
>>> This topic is also interesting for me.
>>> 
>>> How can I read this data? Do I have to implement this logic in my application,
>>> or does CouchDB understand what I'm looking for and redirect me to the right
>>> database? And what if I have to query data across two or more databases?
>>> 
>>> Thanks
>>> 
>>> Sent from my iPad
>>> 
>>> On 20/10/2012, at 08:59, Alexander Shorin <[email protected]> wrote:
>>> 
>>>> Hi Erik!
>>>> 
>>>> The common practice for any database (SQL or NoSQL) that serves
>>>> fast-growing data is partitioning[1]: splitting the data into one
>>>> partition per time period. Depending on how fast the data grows, this
>>>> period may be a year, a month or even a day. To apply this practice to
>>>> CouchDB, split the data across databases that carry the period in their
>>>> name, e.g.:
>>>> 
>>>> world_logs/2012/10
>>>> world_logs/2012/09
>>>> world_logs/2012/08
>>>> world_logs/2012/07
>>>> ...
>>>> 
>>>> Note the slashes in the names. With this trick CouchDB will create a
>>>> directory hierarchy for these databases on the filesystem:
>>>> + world_logs/
>>>> | ---- + 2012/
>>>> | ---- | ---- + 07.couch
>>>> | ---- | ---- + 08.couch
>>>> | ---- | ---- + 09.couch
>>>> | ---- | ---- + 10.couch
>>>> 
>>>> So if your data grows by 1M docs per year, splitting it by month
>>>> creates 12 databases per year with ~100K documents each. The big
>>>> difference from one big database is that the "old" data already has a
>>>> computed view index; if you add a new view you don't need to wait until
>>>> all the data is indexed - you'll get results much faster, since the index
>>>> only has to be built for the small chunk you are currently interested in.
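
The naming scheme above can be sketched in a few lines. This is only an illustration: the helper names and the `http://localhost:5984` address are assumptions, not something from the thread. The one detail worth showing is that the slashes belong to the database name, so over HTTP they have to be sent as `%2F`.

```python
from datetime import datetime
from urllib.parse import quote


def partition_db_name(prefix: str, when: datetime) -> str:
    """Return the monthly database name, e.g. 'world_logs/2012/10'."""
    return f"{prefix}/{when.year:04d}/{when.month:02d}"


def partition_db_url(server: str, prefix: str, when: datetime) -> str:
    """Return the HTTP URL of the monthly database.

    The slashes are part of the database name, so they are percent-encoded.
    """
    name = partition_db_name(prefix, when)
    return f"{server.rstrip('/')}/{quote(name, safe='')}"


when = datetime(2012, 10, 20)
print(partition_db_name("world_logs", when))
print(partition_db_url("http://localhost:5984", "world_logs", when))
```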
>>>> 
>>>> You can also keep one big database with all the data alongside them,
>>>> filled from these small databases through replication.
>>>> 
>>>> That covers how to organize the data to make views run faster. You
>>>> could also try switching from the JavaScript query server to the
>>>> Erlang[2] one. The Erlang query server is native and doesn't suffer from
>>>> stdio and JSON serialization/deserialization overhead. In my experience
>>>> it speeds up indexing by about 3-4 times, depending on the complexity of
>>>> the map function.
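
As a rough sketch of what switching means on the document side: a design document sets `"language": "erlang"` and carries the map function as Erlang source. This assumes the native query server has already been enabled in the server config as described on the wiki page [2]; the design document name and the trivial map function below are made up for illustration.

```python
import json

# Assumption: the native Erlang query server is already enabled on the server.
# The map function emits each document's _id with a null value.
ERLANG_MAP = """
fun({Doc}) ->
    Emit(proplists:get_value(<<"_id">>, Doc), null)
end.
"""

design_doc = {
    "_id": "_design/by_id_erlang",  # hypothetical name
    "language": "erlang",
    "views": {"by_id": {"map": ERLANG_MAP.strip()}},
}

# This JSON is what would be PUT to the database as the design document.
print(json.dumps(design_doc, indent=2))
```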
>>>> 
>>>> P.S. There is good news for you: the 1.3 release will have a new
>>>> query server engine (already in the master branch) that feels to me a
>>>> bit faster than the one in 1.2.
>>>> 
>>>> [1]: http://en.wikipedia.org/wiki/Partition_%28database%29
>>>> [2]: http://wiki.apache.org/couchdb/EnableErlangViews
>>>> 
>>>> --
>>>> ,,,^..^,,,
>>>> 
>>>> 
>>>> On Sat, Oct 20, 2012 at 4:08 AM, Erik Pearson <[email protected]> wrote:
>>>>> Hi,
>>>>> 
>>>>> I'm wondering if there are any write performance improvements on the
>>>>> horizon? Although day to day read queries are great, and modest updates
>>>>> are fine, bulk updates and index rebuilding are pretty painful. I know
>>>>> performance tips are a broad enough topic without focusing it down. Since
>>>>> I need to deal with multiple databases which will grow at about a million
>>>>> documents per year, I'm in a bit of pain even testing the database with
>>>>> significant depth of data (e.g. 5 years).
>>>>> 
>>>>> I'd be happy to provide my use case and experience, but thought I'd cut my
>>>>> usually verbose missives down to the bare question.
>>>>> 
>>>>> Thanks,
>>>>> Erik.
