Hi Ryan,

Thanks for your response.

To answer your questions:

#1: MapReduce will not go through the entire key list if you feed it a search
query as input, only the matched objects. [Understood, but the Java client
doesn't have a MapReduce search function. We did a custom implementation of
this, but the response time was still very slow.]
#2: To rule out any funkiness in the Java client, have you tried the queries via
curl as well? [Yes, we have tried this as well. The response is significantly
quicker than the MapReduce approach above, but still not fast enough.]
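For reference, a MapReduce job submitted over HTTP can take Riak Search results as its inputs, so only the matched objects reach the map phase. The sketch below builds such a request body in plain Java. It assumes the legacy Riak Search input spec (module `riak_search`, function `mapred_search`, with the index name and query as arguments) and Riak's built-in `Riak.mapValuesJson` map function, so please check both against your Riak version. `SearchMapRedBody` is a hypothetical name, not part of the Riak Java client.

```java
// Hypothetical helper (not part of the Riak Java client): builds the JSON body
// for an HTTP POST to /mapred whose inputs come from a Riak Search query.
final class SearchMapRedBody {
    private SearchMapRedBody() {}

    // Minimal JSON string encoder: escapes quotes, backslashes and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Assumed legacy input spec: {"module":"riak_search","function":"mapred_search",
    // "arg":[index, query]}. Only documents matching the query are fed to the
    // map phase, so the bucket's full key list is never walked.
    static String build(String index, String query) {
        return "{\"inputs\":{\"module\":\"riak_search\",\"function\":\"mapred_search\","
                + "\"arg\":[" + jsonString(index) + "," + jsonString(query) + "]},"
                + "\"query\":[{\"map\":{\"language\":\"javascript\","
                + "\"name\":\"Riak.mapValuesJson\",\"keep\":true}}]}";
    }
}
```

The resulting string could be POSTed to the `/mapred` endpoint with `Content-Type: application/json`, for instance from curl, which also helps separate server-side latency from client-side overhead.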

# of Objects: 10,000+
Backend Used: bitcask
# of nodes: 5
Machine Specs: Amazon EC2 (Large Instance)


Thanks

Harshal Dhir | Technical Architect

solutionset
P: 510-214-3519  Twitter: @harshaldhir
85 Second Street, San Francisco, CA 94105
Twitter: @harshaldhir MSN: [email protected] Jabber: [email protected]
www.solutionset.com

This message is intended for the addressee(s) only and may contain confidential
or privileged information. Any use of this information by persons other than
addressee(s) is prohibited. If you have received this message in error, please
reply to the sender and delete or destroy all copies.

From: Ryan Zezeski <[email protected]>
Date: Wed, 28 Sep 2011 11:33:56 -0700
To: Harshal Dhir <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: Does Java client support querying large amounts of data from Riak? What is the JSON parse/serialize overhead for such Queries?

Harshal,

If I understand correctly, you are trying to retrieve a set of objects that
match a given search query?

1: MapReduce will not go through the entire key list if you feed it a search
query as input; it processes only the matched objects.

2: To rule out any funkiness in the Java client, have you tried the queries via
curl as well?

The nominal latency will depend heavily on many factors, including network,
number of objects matched, backend used, machine specs, number of nodes, number
of concurrent queries, etc. My first question would be: what is the average
result set size coming back for these 4-second latencies (i.e., how many objects
are matched by the query)?


-Ryan

On Wed, Sep 28, 2011 at 2:15 PM, Harshal Dhir <[email protected]> wrote:
Hi,

We are currently using Riak Java client 0.15.0-SNAPSHOT, but it does not yet
offer a good interface for Riak Search. We are trying to fetch data from a
bucket with a million rows. Here is a summary of the issues we are facing with
the different approaches:

 *   MapReduce:
    *   Using MapReduce, the response is just too slow because it is an O(N) operation over the bucket's key list.
 *   Solr Client:
    *   Using the Solr client, we are able to query, but the response still takes about 4 seconds to return a nominal result, which is too slow. [I heard the normal response time is approximately 200 ms.]
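For comparison, the same search can be issued directly against the Solr-compatible HTTP interface that Riak Search exposes, which is also an easy way to follow the curl suggestion and take the Java client out of the picture. The sketch below only builds the query URL. It assumes the endpoint is `/solr/<index>/select` on Riak's default HTTP port 8098 and that the `start`, `rows` and `wt` parameters are honoured by your Riak Search version; please verify these. Capping `rows` keeps each response small, which matters because latency grows with the number of matched objects. `SolrSelectUrl` is a hypothetical name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a paged query URL for Riak Search's
// Solr-compatible interface (assumed path: /solr/<index>/select).
final class SolrSelectUrl {
    private SolrSelectUrl() {}

    static String build(String host, int port, String index, String query, int start, int rows) {
        if (start < 0 || rows <= 0) {
            throw new IllegalArgumentException("start must be >= 0 and rows > 0");
        }
        try {
            // URL-encode the query so characters such as ':' and spaces survive.
            String q = URLEncoder.encode(query, "UTF-8");
            return "http://" + host + ":" + port + "/solr/" + index + "/select"
                    + "?q=" + q + "&start=" + start + "&rows=" + rows + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting URL can be passed to curl as-is; comparing curl's timing with the Java client's timing for the same query should indicate how much of the 4 seconds is spent outside Riak.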

Is there a way in the current Java client to fetch large amounts of data in
nominal time?

What is the recommended approach for saving / retrieving / querying this data in
the fastest possible way?

Thanks
Harshal


_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

