Hi Georgi,

All of Guido's advice (below) is good. If you are just importing unique items, I would set the bucket property LWW (last_write_wins) to true for the import; it will be much faster, since Riak will not do N local reads for vclock data.
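As a rough illustration, the whole import path could look something like the sketch below with the 1.x riak-java-client over PBC. The bucket/key names are made up, and the builder method names (RiakFactory.pbcClient, createBucket, lastWriteWins, withoutFetch) are from my recollection of that client's API, so treat this as a sketch to adapt rather than copy:

  import com.basho.riak.client.IRiakClient;
  import com.basho.riak.client.RiakFactory;
  import com.basho.riak.client.bucket.Bucket;

  public class ImportSketch {
      public static void main(String[] args) throws Exception {
          // Connect over PBC rather than HTTP (Guido's suggestion).
          IRiakClient client = RiakFactory.pbcClient("127.0.0.1", 8087);

          // last_write_wins=true on the import bucket, so Riak skips the
          // local read it would otherwise do for vclock handling.
          Bucket bucket = client.createBucket("imports")
                  .lastWriteWins(true)
                  .execute();

          // Import loop: withoutFetch() avoids the fetch-before-store round
          // trip, which is safe here because each key is written exactly once.
          for (int i = 0; i < 1000; i++) {
              bucket.store("key-" + i, "value-" + i)
                    .withoutFetch()
                    .execute();
          }

          client.shutdown();
      }
  }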
Cheers,
Russell

On 29 Oct 2013, at 15:21, Guido Medina <[email protected]> wrote:

> Your tests are not close to what you are going to have in production, IMHO.
> Here are a few recommendations:
>
> • Build a cluster with at least 5 nodes with N=3 and R=W=2 (you can
>   update your bucket properties via PBC with Java).
> • Use PBC instead of HTTP.
> • If you are only importing data, call
>   .store().withoutFetch().execute() to avoid unnecessary round trips.
>
> If you test using unrealistic scenarios you will find unpleasant surprises
> when you are about to go live, so it is better to set your expectations
> right at the beginning.
>
> HTH,
> Guido.
>
> On 29/10/13 14:59, Georgi Ivanov wrote:
>> Hello,
>> I am importing some big data to Riak: about 10GB per day, and I have to
>> import one year of data. The task is to speed up the initial import. After
>> that I will import on a daily basis, so the speed is not very important.
>>
>> I am using the Java HTTP client. So far my tests show that the fastest
>> setup is to use n_val 1 and import to a single server.
>>
>> I tested importing on 2 servers (with n_val: 2), but it is actually slower.
>> My Java client is multi-threaded.
>>
>> My idea is to use n_val: 1 on a single node, then increase n_val to 2 and
>> add one more node to the cluster. The problem is that I don't see the
>> storage grow when I change n_val to 2.
>> I was looking at Riak's Active Anti-Entropy feature and I am expecting my
>> storage to grow after I increase the n_val. Unfortunately this is not the
>> case, or I don't understand the AAE feature.
>> I can't see any change in storage size at all. I don't want to go in the
>> direction of a forced repair, as it would take forever.
>>
>> Can anyone shed some light on AAE? Or any tips for speeding up the import
>> in general?
>>
>> To summarize the situation:
>> 1. One Riak node with n_val: 1, eLevelDB as back-end.
>> 2. Import data.
>> 3. Change n_val to 2.
>> 4. Join one more node to the cluster.
>>
>> What I expect to happen:
>> All the keys distributed across 2 Riak nodes with n_val: 2.
>> So if I had 1TB of data on node1 with n_val: 1, after changing to n_val 2
>> and joining one more node, I expect to have 1TB of data on each node.

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
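For reference, Guido's first bullet (N=3 with R=W=2, set via PBC) and Georgi's step 3 (bumping n_val on the already-imported bucket) can both be expressed with the same Java client. Again, this is only a sketch against my memory of the 1.x riak-java-client API; the bucket names are made up and the builder methods (nVal, r, w) should be double-checked before use:

  import com.basho.riak.client.IRiakClient;
  import com.basho.riak.client.RiakFactory;

  public class BucketPropsSketch {
      public static void main(String[] args) throws Exception {
          IRiakClient client = RiakFactory.pbcClient("127.0.0.1", 8087);

          // Guido's suggestion for production: N=3 replicas with quorum-style
          // reads and writes (R=W=2), set once on the bucket over PBC.
          client.createBucket("mydata")
                .nVal(3)
                .r(2)
                .w(2)
                .execute();

          // Georgi's step 3: raising n_val on an existing bucket only changes
          // the property. Already-stored keys only gain their extra replica
          // when AAE or read repair later touches them, which is why storage
          // does not jump immediately after the change.
          client.createBucket("imports")
                .nVal(2)
                .execute();

          client.shutdown();
      }
  }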
