Thanks, Evan. I tried it in Python like this (having realized that my previous
approach uses MapReduce) and got better results. It finished in 3.5 minutes,
which is still nowhere close to the 15 seconds from the straight HTTP query:

import riak

bucket_name = "mybucket"

# Connect over Protocol Buffers (port 8087) rather than HTTP.
client = riak.RiakClient(port=8087, transport_class=riak.RiakPbcTransport)
bucket = client.bucket(bucket_name)

# Query the secondary index directly; this does not go through MapReduce.
results = bucket.get_index('status_bin', 'PERSISTED')

print len(results)
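
For comparison, here is a rough sketch of running the same HTTP index query
from Python without the riak client, so that the timing can be checked against
the curl result. It assumes the HTTP index endpoint returns a JSON object with
a "keys" list; the function names, the default host/port/bucket values, and
the Python 2/3 import fallback are my own choices for illustration and are not
part of the riak client API.

```python
import json
import time

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def count_index_keys(body):
    """Count keys in an HTTP index response body.

    Assumes the body is JSON of the form {"keys": ["k1", "k2", ...]}.
    """
    return len(json.loads(body)["keys"])


def timed_http_index_count(host="localhost", port=8098, bucket="mybucket",
                           index="status_bin", value="PERSISTED"):
    """Fetch the index over HTTP and return (key_count, elapsed_seconds).

    The defaults mirror the curl command quoted in this thread; adjust them
    for your own cluster.
    """
    url = "http://%s:%d/buckets/%s/index/%s/%s" % (
        host, port, bucket, index, value)
    start = time.time()
    body = urlopen(url).read().decode("utf-8")
    return count_index_keys(body), time.time() - start
```

Calling timed_http_index_count() against a running node should return the key
count and the elapsed time in seconds. Parsing the JSON counts the keys
themselves, whereas counting commas with grep gives one fewer than the number
of keys (assuming no key contains a comma).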


On Apr 10, 2013, at 4:00 PM, Evan Vigil-McClanahan <[email protected]> 
wrote:

> get_index() is the right function there, I think.
> 
> On Wed, Apr 10, 2013 at 2:53 PM, Jeff Peck <[email protected]> wrote:
>> I can grab over 900,000 keys from an index using an HTTP query in about 15
>> seconds, whereas the same operation in Python times out after 5 minutes.
>> Does this indicate that I am using the Python API incorrectly? Should I be
>> relying on an HTTP request initially when I need to grab this many keys?
>> 
>> (Note: This is tied to the question that I asked earlier, but is also a 
>> general question to help understand the proper usage of the python API.)
>> 
>> Thanks! Examples are below.
>> 
>> - Jeff
>> 
>> ---
>> 
>> HTTP:
>> 
>> $ time curl -s \
>>     http://localhost:8098/buckets/mybucket/index/status_bin/PERSISTED \
>>     | grep -o , | wc -l
>> 926047
>> 
>> real    0m14.583s
>> user    0m2.500s
>> sys     0m0.270s
>> 
>> ---
>> 
>> Python:
>> 
>> import riak
>> 
>> bucket = "my bucket"
>> client = riak.RiakClient(port=8098)
>> results = client.index(bucket, 'status_bin', 
>> 'PERSISTED').run(timeout=5*60*1000) # 5 minute timeout
>> print len(results)
>> 
>> (times out after 5 minutes)
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


