On Wed, Mar 19, 2014 at 4:15 AM, Andrey Anpilogov <[email protected]> wrote:
> Hi,
>
> I've been playing with Riak 2.0 and found some strange performance drop
> with new Search system.
> Run two Riak nodes on E3 machines with 32GB RAM and SSD drives.
> 1) Use leveldb as backend.
> 2) Enabled Search engine
> 3) Join them into cluster.
> 4) Created "users_t" index with default schema.
> 5) Associate that index with "users_t" bucket type.
> 6) Run simple node.js app that put json object into bucket:
> { name_s: 'Cara Pagac',
> username_s: 'Ines.Crooks',
> email_s: '[email protected]',
> address:
> { street_s: 'Hermann Row',
> suite_s: 'Suite 320',
> city_s: 'Lake Oscarstad',
> zipcode_i: '68010-5144',
> state_s: 'Delaware',
> geo: { lat_i: '-12.3129', lng_i: '-163.9018' } },
> phone_s: '(269)258-3990',
> website_s: 'daija.me',
> company:
> { name_s: 'Padberg-Gorczany',
> catchPhrase_s: 'Face to face dedicated time-frame',
> bs_s: 'streamline open-source portals' },
> weight_i: 49 }
>
> I've been surprised by results:
> 1) Insert into /types/users_t/buckets/users/keys/[key] - 400 ops/s.
> 2) Insert into /buckets/users/keys/[keys] - 40 ops/s
>
According to these numbers, your indexed data is getting an order of
magnitude more throughput than your non-indexed data. To me that would have
nothing to do with Yokozuna, but perhaps with some difference between the
typed and untyped bucket code paths. I'm not really sure. Did you put the
exact same size/type of data in both buckets?
>
> Initially I tested same app with single Riak node and performance for
> data-type'd bucket was ~300ops/s.
> But once I add second node to cluster performance dropped almost 7.5
> times.
> Does it mean that distributed Yokozuna is so slow?
>
Yokozuna queries will probably get slower with each node you add, but
indexing speed should not be affected. Yokozuna currently indexes all KV
objects locally. That is, there is no cross-node traffic for indexing. Did
you let the join finish before writing new data? Perhaps ownership handoff
was occurring?
>
> P/S:
> I run node.js app on separate machine. They all in same 1Gb network.
> In case of one Riak node the bottleneck is storage.
> In case of two Riak nodes there is not bottleneck. I mean there is enough
> free IO (disk and network), CPU and RAM resources.
>
How did you determine that disk was the bottleneck? Remember that tools like
iostat do not necessarily indicate a disk bottleneck, as they can only see
physical I/O and not logical I/O. You can have a 100% busy disk without
harming application latency.
As mentioned by Andrew, Riak, in its default config, is meant for 5 nodes
or more. If you are going to use 2 nodes in production you should set n_val
and target_n to 2.
Another thing worth tweaking is your distribution buffer size.
http://docs.basho.com/riak/2.0.0pre11/ops/advanced/configs/configuration-files/#Erlang-VM
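
For illustration, the settings mentioned above might look roughly like the
fragment below in riak.conf on a 2.0 node. Treat it as a sketch, not a
tested configuration: the buffer size shown is an arbitrary placeholder
rather than a recommendation, and buckets.default.n_val only covers buckets
that take the default properties, so a bucket type such as users_t may need
n_val set in its own properties. The ring-placement setting (which I believe
is spelled target_n_val in riak_core) is, as far as I know, set in
advanced.config rather than riak.conf, so it is left out here.

```ini
## riak.conf fragment (sketch for a two-node cluster; values are illustrative)

## Replicas kept per object; the default of 3 assumes a larger cluster.
buckets.default.n_val = 2

## Erlang distribution buffer size (maps to the VM's +zdbbl flag).
## 64MB is a placeholder, not a tuned value.
erlang.distribution_buffer_size = 64MB
```

Changes to riak.conf take effect after the node is restarted.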