Re: map performance notes and question

Alexander Sicular Tue, 18 Jan 2011 10:50:54 -0800

To your second point re. returning a large string via a js m/r: yes,
this has to do with passing the string from the erlang environment to
the javascript environment. That is why the usual recommendation, as
your needs grow, is to seriously look into writing your m/r functions
in native erlang.
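
On the first point in the quoted message (csv vs. json array as the
source of [bucket, key] pairs for a second map phase), a minimal sketch
of the two map functions might look like the following. The function
names and the 'targets' bucket are hypothetical placeholders; the only
thing assumed from riak is that a js map function receives the object
with its stored data as a string in v.values[0].data.

```javascript
// Sketch only: function names and the 'targets' bucket are hypothetical.
// Assumes the stored value arrives as a string in v.values[0].data.

// Data stored as a comma separated list, e.g. "key1,key2,key3".
function mapCsvToInputs(v) {
  var data = v.values[0].data;
  if (!data) { return []; }          // ''.split(',') would yield ['']
  return data.split(',').map(function (key) {
    return ['targets', key];         // one [bucket, key] pair per entry
  });
}

// Data stored as a json array, e.g. '["key1","key2","key3"]'.
function mapJsonToInputs(v) {
  return JSON.parse(v.values[0].data).map(function (key) {
    return ['targets', key];
  });
}
```

Both return the same list of pairs for equivalent data, so the only
difference is the cost of split(',') versus JSON.parse on the string.
The csv form only works when the keys themselves contain no commas.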


Best, Alexander

On Tue, Jan 18, 2011 at 18:39, Brendan <[email protected]> wrote:
> hi. i'm a new riak user. i scanned the list archives and didn't notice
> either of these things discussed - i apologize in advance if they
> already have been.
>
> i noticed something that is perhaps of use to people. when using the
> data of an object in a map phase to generate a list of [bucket,object]
> pairs for a second map phase i've seen the suggestion to use a json
> array as the data and Riak.mapValues to generate the list. i've been
> playing with this (and some more complex json objects beyond a simple
> array) and i've come to the conclusion that using a comma separated list
> as the data instead of a json array is far more efficient on large
> datasets.
>
> JSON.parse (and the eval() it calls) is relatively slow, but a
> split(',') operation is very quick. on a 100,000 element dataset on my
> test machine the split averages about 25ms to convert the csv data into
> a javascript array object, but using JSON.parse to convert the json data
> into a javascript array object averages around 750ms. for smaller
> datasets there isn't much of a difference, but for large ones the
> difference can become quite significant. hopefully this observation is
> of use to someone besides me. :)
>
> i also have a question regarding map performance with large objects. if
> i create an object with one of these large 100,000 element datasets in
> it (whether a csv or json array doesn't matter) then stop and start
> riak, i can retrieve the object (get, via rest api) in about 0.2
> seconds. subsequent requests take about 0.06 seconds, i assume because
> riak had to load the data from disk the first time and had it cached the
> second. at any rate, riak is very quick in both cases.
>
> if, however, i then do a simple map operation of
> "function(v){return[1]}" where the map doesn't actually do any work at
> all on just that single object, then the time to process the request
> balloons to 2.1 seconds. given that riak can return the data via GET in
> 0.06 seconds, 2.1 seconds simply to load the object into a map phase
> seems excessive. is this a penalty for passing the data into the
> javascript engine, and thus something i could avoid by doing the map
> operation natively in erlang (my erlang skills are pretty weak, but if
> this works around the issue it'll be a good driver to force me to
> improve them)? or is it simply a penalty of the map phase, with no way
> around it?
>
> (everything above is based on tests with riak-0.14-0)
>
> thanks
> -brendan
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
