I'm excited about this merge. Where is the best place to follow its progress?
Sent from my iPad

On 20/10/2012, at 11:29, Alexander Shorin <[email protected]> wrote:

> On Sat, Oct 20, 2012 at 6:06 PM, Gabriel De Oliveira Barbosa
> <[email protected]> wrote:
>> This topic is also interesting for me.
>>
>> How can I read this data ? I have to implement this logic in my application
>> or couchdb understand what I'm finding and redirect me to right database ?
>> And what if I have to query data between two or more database?
>
> This could be easily done with proxy to os_daemon[1], so the only
> thing you have is to write logic for sharding and requesting correct
> shards - this mostly depended from that problem you're solving. Also
> you can have symlink to actual database with static name - CouchDB is
> able to follow them and this allows you to switch current database
> shard more less transparently.
>
> But as Robert mentioned, BigCouch merge should simplify these things
> and more others.
>
> [1]: http://davispj.com/2010/09/26/new-couchdb-externals-api.html
>
> --
> ,,,^..^,,,
>
>
> On Sat, Oct 20, 2012 at 6:20 PM, Robert Newson <[email protected]> wrote:
>> don't forget that database sharding (and therefore view sharding) is
>> coming in the release after 1.3 when we merge BigCouch. View shards
>> build in parallel.
>>
>> B.
>>
>>
>> On 20 October 2012 10:06, Gabriel De Oliveira Barbosa
>> <[email protected]> wrote:
>>> This topic is also interesting for me.
>>>
>>> How can I read this data ? I have to implement this logic in my application
>>> or couchdb understand what I'm finding and redirect me to right database ?
>>> And what if I have to query data between two or more database ?
>>>
>>> Thanks
>>>
>>> Sent from my iPad
>>>
>>> On 20/10/2012, at 08:59, Alexander Shorin <[email protected]> wrote:
>>>
>>>> Hi Erik!
>>>>
>>>> The common practice for all databases (SQL, NoSQL) that serves fast
>>>> growing data is partitioning[1] - splitting data into partition per
>>>> some datetime period. Depended upon how fast data grows this period
>>>> may be year, month or even day. Applying to CouchDB this practice you
>>>> have to split data per databases with period in their name e.g.:
>>>>
>>>> world_logs/2012/10
>>>> world_logs/2012/09
>>>> world_logs/2012/08
>>>> world_logs/2012/07
>>>> ...
>>>>
>>>> Note slashes in names. With this trick CouchDB will create directory
>>>> hierarchy for these databases at filesystem:
>>>> + world_logs/
>>>> | ---- + 2012/
>>>> | ---- | ---- + 07.couch
>>>> | ---- | ---- + 08.couch
>>>> | ---- | ---- + 09.couch
>>>> | ---- | ---- + 10.couch
>>>>
>>>> So if your data grows by 1M docs per year splitting him by months will
>>>> creates 12 databases with ~100K documents. The big difference from
>>>> one-big database is that "old" data is already has computed view
>>>> index; if you adding new view you don't need to wait while all data
>>>> will be indexed - you'll get result much faster since index will be
>>>> build for small chunk that you currently interested.
>>>>
>>>> Also, you still could have simultaneously one big database with all
>>>> data which imports data from these small databases though replication.
>>>>
>>>> That's about how to optimize data to make views run faster. Also you
>>>> could try to switch from JavaScript query server to Erlang[2] one.
>>>> Erlang query server is native and doesn't suffers from stdio and json
>>>> serialization/deserialization overhead. As for me it gains indexation
>>>> boost for about 3-4 times depending on complexity of map function.
>>>>
>>>> P.S. There is good news for you: in 1.3 release there will be new
>>>> query server engine(already in master branch) that for my feeling is a
>>>> bit faster than similar in 1.2.
>>>>
>>>> [1]: http://en.wikipedia.org/wiki/Partition_%28database%29
>>>> [2]: http://wiki.apache.org/couchdb/EnableErlangViews
>>>>
>>>> --
>>>> ,,,^..^,,,
>>>>
>>>>
>>>> On Sat, Oct 20, 2012 at 4:08 AM, Erik Pearson <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> I'm wondering if there are any write performance improvements on the
>>>>> horizon? Although day to day read queries are great, and modest updates
>>>>> are
>>>>> fine, bulk updates and index rebuilding is pretty painful. I know
>>>>> performance tips are a broad enough topic without focusing it down. Since
>>>>> I
>>>>> need to deal with multiple databases which will grow at about a million
>>>>> documents per year, I'm in a bit of pain even testing the database with
>>>>> significant depth of data (e.g. 5 years).
>>>>>
>>>>> I'd be happy to provide my use case and experience, but thought I'd cut my
>>>>> usually verbose missives down to the bare question.
>>>>>
>>>>> Thanks,
>>>>> Erik.
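
For anyone who wants to try the per-month naming Alexander describes in the quoted thread, here is a rough Python sketch of the application-side logic. The function names and the server URL are made up for illustration and are not part of CouchDB. The one CouchDB-specific detail is that a slash inside a database name has to be sent as %2F in the request URL. Picking which shards to query, and merging their results, is still left to the application.

```python
from datetime import date
from urllib.parse import quote


def shard_name(prefix, day):
    """Return the monthly database name, e.g. 'world_logs/2012/10'."""
    return f"{prefix}/{day.year:04d}/{day.month:02d}"


def shard_url(server, prefix, day):
    """Build the database URL; slashes in the name are encoded as %2F."""
    return f"{server.rstrip('/')}/{quote(shard_name(prefix, day), safe='')}"


def shards_between(prefix, start, end):
    """List the monthly database names covering start..end inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}/{year:04d}/{month:02d}")
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


if __name__ == "__main__":
    # Hypothetical local server address, used only for this example.
    print(shard_url("http://localhost:5984", "world_logs", date(2012, 10, 20)))
    print(shards_between("world_logs", date(2012, 7, 1), date(2012, 10, 31)))
```

A query that spans several months would call something like shards_between, send the same view request to each database, and combine the rows itself.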
