Use the Erlang client (https://github.com/basho/riak-erlang-client) directly, instead of calling os:cmd and pushing everything through curl. You can also parallelize the load at that point; right now you're making 25 million os:cmd calls, each of which runs a separate curl. Open up a pool of connections (or even just N connections used round-robin) and keep them open.
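As a rough sketch of what that could look like (not tested against your cluster): the module name load_sketch, the NumConns parameter, and the helper names below are made up for illustration, and port 8087 is only the usual protocol buffers default, so check your app.config. It opens N connections once, deals the CSV lines out round-robin, and lets one process per connection do the puts.

```erlang
-module(load_sketch).
-export([load/4, deal/2, to_json/1]).

%% Hypothetical loader: open NumConns protocol buffers connections once,
%% keep them open, and run one worker process per connection.
load(Filename, Host, Port, NumConns) ->
    Conns = [begin
                 {ok, Pid} = riakc_pb_socket:start_link(Host, Port),
                 Pid
             end || _ <- lists:seq(1, NumConns)],
    {ok, Data} = file:read_file(Filename),
    %% tl/1 skips the first (header) line, as the original script does.
    Lines = tl(re:split(Data, "\r?\n", [{return, binary}, trim])),
    Refs = [begin
                {_, Ref} = spawn_monitor(
                             fun() ->
                                 lists:foreach(fun(L) -> insert(Conn, L) end, Share)
                             end),
                Ref
            end || {Conn, Share} <- lists:zip(Conns, deal(Lines, NumConns))],
    %% Wait for every worker; each entry is 'normal' unless a put failed.
    Results = [receive {'DOWN', Ref, process, _, Reason} -> Reason end
               || Ref <- Refs],
    lists:foreach(fun riakc_pb_socket:stop/1, Conns),
    Results.

%% Split Items into N lists, round-robin. Simple rather than fast:
%% it walks the indexed list once per share.
deal(Items, N) ->
    Indexed = lists:zip(lists:seq(0, length(Items) - 1), Items),
    [[Item || {I, Item} <- Indexed, I rem N =:= K] || K <- lists:seq(0, N - 1)].

%% Store one CSV line under its first field, in the same bucket as the script.
insert(Conn, Line) ->
    [Id | _] = Fields = re:split(Line, ",", [{return, binary}]),
    Obj = riakc_obj:new(<<"CustCalls25m">>, Id, to_json(Fields), "application/json"),
    ok = riakc_pb_socket:put(Conn, Obj).

%% Same JSON layout as the original io_lib:format call; expects six fields.
to_json(Fields) ->
    iolist_to_binary(
      io_lib:format("{\"id\":\"~s\",\"phonenumber\":~s,\"callednumber\":~s,"
                    "\"starttime\":~s,\"endtime\":~s,\"status\":~s}", Fields)).
```

With the client on the code path, something like load_sketch:load("CustCalls25m.csv", "10.232.5.169", 8087, 8) would run it with 8 connections; 8 is an arbitrary starting point, not a recommendation.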
With the default N=3, a 2-node cluster will have roughly 1/3 of the set on one machine and 2/3 on the other. You may want to consider moving to N=2 on the bucket, which will put one copy on each machine (i.e., dual-master).

Beyond that, you have not provided enough information to tell where the bottleneck may be, though I'm sure the Basho crew will have some better answers. :)

-mox

On Mon, Aug 27, 2012 at 8:26 PM, <[email protected]> wrote:

> Dear team,
>
> I am trying to load 25 million dataset (1.3 Gb) of sample call data onto
> riak..its a 4-quad core ---1.5 TB storage 2-node raik cluster…takes
> real 5671m12.812s.please suggest the solutions for the betterment of
> the same…5671m12.812s is quite huge…we deal with bigdata and I need to
> store and test 165 GB on the riak..if so I may take years for loading I
> guess with the present scenario…loaded 165 GB on to mongodb and got the
> results..for *comparative performance study of mongodb and riak db* …please
> do assist me with the same .
>
> *using the following code for loading :*
>
> #!/usr/local/bin/escript
>
> main([Filename]) ->
>     {ok, Data} = file:read_file(Filename),
>     Lines = tl(re:split(Data, "\r?\n", [{return, binary},trim])),
>     lists:foreach(fun(L) -> LS = re:split(L, ","), format_and_insert(LS) end, Lines).
>
> format_and_insert(Line) ->
>     JSON = io_lib:format("{\"id\":\"~s\",\"phonenumber\":~s,\"callednumber\":~s,\"starttime\":~s,\"endtime\":~s,\"status\":~s}", Line),
>     Command = io_lib:format("curl -X PUT http://10.232.5.169:8098/riak/CustCalls25m/~s -d '~s' -H 'content-type: application/json'", [hd(Line),JSON]),
>     io:format("Inserting: ~s~n", [hd(Line)]),
>     os:cmd(Command).
>
> *[hadoop@CTSINGMRGTO data]$ time ./load_data25m CustCalls25m.csv >> 25m.txt &*
> [3] 32354
>
> [hadoop@CTSINGMRGTO data]$
> *real 5671m12.812s*
> user 1725m31.862s
> sys 3074m42.135s
> [hadoop@CTSINGMRGTO data]$
>
> [hadoop@CTSINGMRGTO data]$ tail -4 25m.txt
> Inserting: 24999997
> Inserting: 24999998
> Inserting: 24999999
> *Inserting: 25000000*
> [hadoop@CTSINGMRGTO data]$
>
> This e-mail and any files transmitted with it are for the sole use of the
> intended recipient(s) and may contain confidential and privileged
> information. If you are not the intended recipient(s), please reply to the
> sender and destroy all copies of the original message. Any unauthorized
> review, use, disclosure, dissemination, forwarding, printing or copying of
> this email, and/or any action taken in reliance on the contents of this
> e-mail is strictly prohibited and may be unlawful.
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
