On Thu, Feb 26, 2009 at 8:30 AM, Chris Anderson <jch...@apache.org> wrote:
> On Thu, Feb 26, 2009 at 2:04 AM, Jan Lehnardt <j...@apache.org> wrote:
>> Hi Scott,
>>
>> thanks for your feedback. As a general note, you can't expect any magic
>> from CouchDB. It is bound by the same constraints all other programs
>> are. To get the most out of CouchDB or SQL Server or MySQL, you need
>> to understand how it works.
>>
>> On 26 Feb 2009, at 05:30, Scott Zhang wrote:
>>
>>> Hi. Thanks for replying.
>>> But what is a database for if it is slow? Every database has the ability
>>> to form a cluster to improve speed and capacity (don't mention "Access" things).
>>
>> The point of CouchDB is allowing high numbers of concurrent requests. This
>> gives you more throughput on a single machine, but not necessarily faster
>> single-query execution speed.
>>
>>> I was expecting CouchDB to be as fast as SQL Server or MySQL. At least I
>>> know mnesia is much faster than SQL Server, though mnesia always throws a
>>> harmless "overload" message.
>>
>> CouchDB is not nearly as old as either of them. Did you really expect
>> software in the alpha stage to be faster than fine-tuned systems that have
>> been used in production for a decade or longer?
>>
>>> I will try bulk insert now. But to be fair, I was also inserting into
>>> SQL Server one record at a time.
>>
>> Insert speed can be sped up in numerous ways:
>>
>> - Use sequential descending document ids on insert.
>
> or ascending...
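
For illustration only, "ascending" ids can be as simple as a zero-padded
counter, so that lexicographic order matches insertion order. This is just a
sketch; the generator, width, and naming are my own assumptions, not anything
CouchDB generates for you:

    # Minimal sketch: client-generated ascending document ids.
    # Zero-padding keeps lexicographic order in line with numeric order,
    # so "000000000010" still sorts after "000000000009".
    def ascending_ids(start=0, width=12):
        n = start
        while True:
            yield "%0*d" % (width, n)
            n += 1

    gen = ascending_ids()
    print(next(gen), next(gen), next(gen))
    # 000000000000 000000000001 000000000002
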
As an aside, why is it that sequential document ids would produce a
significant performance boost? I suspect the answer is something rather
fundamental to CouchDB's design, and I'd like to try to grok it.

Thanks,
Barry

>> - Use bulk insert.
>
> with ascending keys and bulk insert of 1000 docs at a time I was able
> to write 3k docs per second. here is the benchmark script:
> http://friendpaste.com/5g0kOEPonxdXMKibNRzetJ
>
>> - Bypass the HTTP API, insert native Erlang terms, and skip the JSON
>> conversion.
>
> doing this I was able to get 6k docs / sec
>
> In a separate test using attachments of 250k and an Erlang API (no
> HTTP) I was able to write to my disk at 80% of the speed it can accept
> when streaming raw bytes to disk. (Roughly 20 MB/sec)
>
>> The question is what you need your system to look like eventually. If this
>> is an initial data import and after that you get mostly read requests, the
>> longer insertion time will amortize over time.
>>
>> What version is the Windows binary you are using? If it is still 0.8, you
>> should try trunk (which most likely means switching to some UNIXy system).
>>
>> Cheers
>> Jan
>> --
>>
>>> Regards.
>>>
>>> On Thu, Feb 26, 2009 at 12:18 PM, Jens Alfke <j...@mooseyard.com> wrote:
>>>
>>>> On Feb 25, 2009, at 8:02 PM, Scott Zhang wrote:
>>>>
>>>>> But the performance is as bad as I could imagine. After running for
>>>>> several minutes, I had only inserted 120K records. I saw the speed was
>>>>> ~20 records each second.
>>>>
>>>> Use the bulk-insert API to improve speed. The way you're doing it, every
>>>> record being added is a separate transaction, which requires a separate
>>>> HTTP request and flushing the file.
>>>>
>>>> (I'm a CouchDB newbie, but I don't think the point of CouchDB is speed.
>>>> What's exciting about it is the flexibility and the ability to build
>>>> distributed systems. If you're looking for a traditional database with
>>>> speed, have you tried MySQL?)
>>>>
>>>> —Jens
>
> --
> Chris Anderson
> http://jchris.mfdz.com
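
To make the "use bulk insert" tip concrete, here is a rough sketch of posting
documents in batches of 1000 to CouchDB's _bulk_docs endpoint. The database
name, document shape, and helper function are assumptions for illustration;
only the endpoint path and the {"docs": [...]} request body come from
CouchDB's HTTP API:

    # Rough sketch: ascending ids + bulk insert over HTTP (_bulk_docs).
    # Assumes a local CouchDB and an existing database called "benchmark".
    import json
    import urllib.request

    COUCH = "http://127.0.0.1:5984"
    DB = "benchmark"   # assumed database name
    BATCH = 1000       # batch size mentioned in the thread

    def bulk_insert(docs):
        # POST one batch to /<db>/_bulk_docs with a {"docs": [...]} body.
        req = urllib.request.Request(
            "%s/%s/_bulk_docs" % (COUCH, DB),
            data=json.dumps({"docs": docs}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    # Ascending, zero-padded _ids, submitted 1000 docs per request.
    docs = [{"_id": "%012d" % i, "value": i} for i in range(100000)]
    for start in range(0, len(docs), BATCH):
        bulk_insert(docs[start:start + BATCH])

Compared with one request per document, this spreads the HTTP round trip and
the per-request flush across the whole batch, which is essentially the point
Jens makes above about each single insert being its own transaction.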