So far we have listed up to 3.5M keys for the same purpose using 2i search over protocol buffers, and it seems fast enough.

Maybe it is fast because it streams and compresses the key list directly into the protocol buffer I/O stream without leaving a big footprint in memory? I don't know the answer to that question, though.
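To illustrate the general idea (this is just a sketch of the streaming pattern, not the Riak client's actual internals — the key source here is a made-up stand-in), consuming keys lazily from an iterator keeps memory flat even for millions of keys, whereas collecting them into one big list does not:

```java
import java.util.Iterator;

public class StreamingSketch {

    // Hypothetical stand-in for a streaming 2i result: keys are produced
    // lazily, one at a time, never materialized as a full list.
    static Iterator<String> keyStream(final long total) {
        return new Iterator<String>() {
            long i = 0;
            public boolean hasNext() { return i < total; }
            public String next() { return "key-" + (i++); }
            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    // Counting (or processing) keys one at a time: O(1) memory,
    // regardless of how many keys the stream yields.
    static long count(Iterator<String> keys) {
        long n = 0;
        while (keys.hasNext()) {
            keys.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count(keyStream(1_000_000L)));
    }
}
```

The same number of keys collected into an in-memory `List<String>` before counting would cost hundreds of megabytes; the streaming consumer only ever holds one key at a time, which may be why the 2i-over-PB listing behaves so well.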

2i listing has never failed for us, whereas a MapReduce identity job over a 2i query (for counting a few million keys) has had about a 50% chance of failing depending on key size/count, at least in our experience.

We use the Riak Java client, so that is another concern: if you are using a different programming language, you would want to check whether the client runs 2i queries over PB.

Hope that helps,

Guido.

On 04/03/13 13:18, Pavel Kirienko wrote:
Hi everyone,

Is there any way to request a large number of keys through 2i streaming? Say there is an index with 10M entries and I want to extract 1M of them. Obviously a block request (i.e. all data packed into a single response) is not the best idea, since it requires a good amount of memory on both the client and the server.

One could suggest feeding the 2i output into a Map/Reduce job with streaming output, but that approach is not so hot either: it is really slow (our 3-node cluster stumbles on 100k keys for minutes), and sometimes it simply doesn't work (streaming may occasionally stop before all the data has been emitted). Not to mention that with 1M keys the Map/Reduce job never starts at all.

Is it possible to perform 2i queries over a large number of keys, or should I use another store for indexing instead (Redis, maybe)?

Thanks in advance.

Pavel.


_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com