Thanks Evan. I tried doing it in Python like this (after realizing that my
previous approach used MapReduce) and got better results. It finished in 3.5
minutes, which is still nowhere close to the 15 seconds of the straight HTTP query:
import riak
from pprint import pprint
bucket_name = "mybucket"
client = riak.RiakClient(port=8087,transport_class=riak.RiakPbcTransport)
bucket = client.bucket(bucket_name)
results = bucket.get_index('status_bin', 'PERSISTED')
print len(results)
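
For a side-by-side timing, the same HTTP query could also be issued from Python
rather than through curl. This is only a minimal sketch, assuming the index
endpoint returns a JSON object with a "keys" list and that Riak listens on
localhost:8098 as in the curl example. The helper names index_url, parse_keys
and fetch_keys are hypothetical and not part of the riak client, and the code
uses the Python 3 standard library, unlike the Python 2 snippet above:

```python
import json
import time
from urllib.parse import quote
from urllib.request import urlopen


def index_url(bucket, index, value, host="localhost", port=8098):
    """Build the secondary-index URL used in the curl example (assumed layout)."""
    # Percent-encode each path segment so names with spaces or slashes stay valid.
    bucket, index, value = (quote(p, safe="") for p in (bucket, index, value))
    return "http://{0}:{1}/buckets/{2}/index/{3}/{4}".format(
        host, port, bucket, index, value)


def parse_keys(body):
    """Extract the key list, assuming a JSON body of the form {"keys": [...]}."""
    return json.loads(body)["keys"]


def fetch_keys(bucket, index, value, timeout=300):
    """Fetch all matching keys over HTTP; return (keys, elapsed_seconds)."""
    start = time.time()
    with urlopen(index_url(bucket, index, value), timeout=timeout) as resp:
        keys = parse_keys(resp.read().decode("utf-8"))
    return keys, time.time() - start


# Example call (needs a running Riak node, so it is left commented out):
# keys, elapsed = fetch_keys("mybucket", "status_bin", "PERSISTED")
# print(len(keys), "keys in", round(elapsed, 1), "seconds")
```

If the response really is a single JSON list, counting commas with grep would
report one fewer than len(keys), provided no key contains a comma itself.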
On Apr 10, 2013, at 4:00 PM, Evan Vigil-McClanahan <[email protected]>
wrote:
> get_index() is the right function there, I think.
>
> On Wed, Apr 10, 2013 at 2:53 PM, Jeff Peck <[email protected]> wrote:
>> I can grab over 900,000 keys from an index using an HTTP query in about 15
>> seconds, whereas the same operation in Python times out after 5 minutes.
>> Does this indicate that I am using the python API incorrectly? Should I be
>> relying on an http request initially when I need to grab this many keys?
>>
>> (Note: This is tied to the question that I asked earlier, but is also a
>> general question to help understand the proper usage of the python API.)
>>
>> Thanks! Examples are below.
>>
>> - Jeff
>>
>> ---
>>
>> HTTP:
>>
>> $ time curl -s http://localhost:8098/buckets/mybucket/index/status_bin/PERSISTED | grep -o , | wc -l
>> 926047
>>
>> real 0m14.583s
>> user 0m2.500s
>> sys 0m0.270s
>>
>> ---
>>
>> Python:
>>
>> import riak
>>
>> bucket = "my bucket"
>> client = riak.RiakClient(port=8098)
>> results = client.index(bucket, 'status_bin', 'PERSISTED').run(timeout=5*60*1000) # 5 minute timeout
>> print len(results)
>>
>> (times out after 5 minutes)
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com