Hi Georgi,

All of Guido's advice (below) is good. If you are just importing unique items, I would set the bucket property LWW (last_write_wins) to true for the import; it will be much faster, since Riak will not do N local reads for vclock data.
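As a rough illustration, the whole import path could look something like the sketch below with the 1.x riak-java-client over PBC. The bucket/key names are made up, and the builder method names (RiakFactory.pbcClient, createBucket, lastWriteWins, withoutFetch) are from my recollection of that client's API, so treat this as a sketch to adapt rather than copy:

  import com.basho.riak.client.IRiakClient;
  import com.basho.riak.client.RiakFactory;
  import com.basho.riak.client.bucket.Bucket;

  public class ImportSketch {
      public static void main(String[] args) throws Exception {
          // Connect over PBC rather than HTTP (Guido's suggestion).
          IRiakClient client = RiakFactory.pbcClient("127.0.0.1", 8087);

          // last_write_wins=true on the import bucket, so Riak skips the
          // local read it would otherwise do for vclock handling.
          Bucket bucket = client.createBucket("imports")
                  .lastWriteWins(true)
                  .execute();

          // Import loop: withoutFetch() avoids the fetch-before-store round
          // trip, which is safe here because each key is written exactly once.
          for (int i = 0; i < 1000; i++) {
              bucket.store("key-" + i, "value-" + i)
                    .withoutFetch()
                    .execute();
          }

          client.shutdown();
      }
  }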
Cheers,
Russell

On 29 Oct 2013, at 15:21, Guido Medina <[email protected]> wrote:

> Your tests are not close to what you are going to have in production, IMHO.
> Here are a few recommendations:
>
> • Build a cluster with at least 5 nodes with N=3 and R=W=2 (you can
>   update your bucket properties via PBC with Java).
> • Use PBC instead of HTTP.
> • If you are only importing data, call
>   .store().withoutFetch().execute() to avoid unnecessary round trips.
>
> If you test using unrealistic scenarios you will find unpleasant surprises
> when you are about to go live, so it is better to set your expectations
> right at the beginning.
>
> HTH,
> Guido.
>
> On 29/10/13 14:59, Georgi Ivanov wrote:
>> Hello,
>> I am importing some big data to Riak: about 10GB per day, and I have to
>> import one year of data. The task is to speed up the initial import. After
>> that I will import on a daily basis, so the speed is not very important.
>>
>> I am using the Java HTTP client. So far my tests show that the fastest
>> setup is to use n_val 1 and import to a single server.
>>
>> I tested importing on 2 servers (with n_val: 2), but it is actually slower.
>> My Java client is multi-threaded.
>>
>> My idea is to use n_val: 1 on a single node, then increase n_val to 2 and
>> add one more node to the cluster. The problem is that I don't see the
>> storage grow when I change n_val to 2.
>> I was looking at Riak's Active Anti-Entropy feature and I am expecting my
>> storage to grow after I increase the n_val. Unfortunately this is not the
>> case, or I don't understand the AAE feature.
>> I can't see any change in storage size at all. I don't want to go in the
>> direction of a forced repair, as it would take forever.
>>
>> Can anyone shed some light on AAE? Or any tips for speeding up the import
>> in general?
>>
>> To summarize the situation:
>> 1. One Riak node with n_val: 1, eLevelDB as back-end.
>> 2. Import data.
>> 3. Change n_val to 2.
>> 4. Join one more node to the cluster.
>>
>> What I expect to happen:
>> All the keys distributed across 2 Riak nodes with n_val: 2.
>> So if I had 1TB of data on node1 with n_val: 1, after changing to n_val 2
>> and joining one more node, I expect to have 1TB of data on each node.

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
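For reference, Guido's first bullet (N=3 with R=W=2, set via PBC) and Georgi's step 3 (bumping n_val on the already-imported bucket) can both be expressed with the same Java client. Again, this is only a sketch against my memory of the 1.x riak-java-client API; the bucket names are made up and the builder methods (nVal, r, w) should be double-checked before use:

  import com.basho.riak.client.IRiakClient;
  import com.basho.riak.client.RiakFactory;

  public class BucketPropsSketch {
      public static void main(String[] args) throws Exception {
          IRiakClient client = RiakFactory.pbcClient("127.0.0.1", 8087);

          // Guido's suggestion for production: N=3 replicas with quorum-style
          // reads and writes (R=W=2), set once on the bucket over PBC.
          client.createBucket("mydata")
                .nVal(3)
                .r(2)
                .w(2)
                .execute();

          // Georgi's step 3: raising n_val on an existing bucket only changes
          // the property. Already-stored keys only gain their extra replica
          // when AAE or read repair later touches them, which is why storage
          // does not jump immediately after the change.
          client.createBucket("imports")
                .nVal(2)
                .execute();

          client.shutdown();
      }
  }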
