We have a 6-node Riak cluster and a simple custom Erlang application that runs
a custom MapReduce job.
We start the MapReduce job through the riak_kv_mrc_pipe module, for example:
Query = [{map, {modfun,Mod,MapFun}, [do_prereduce,{from,1}], false},
         {reduce, {modfun,Mod,ReduceFun}, [{reduce_phase_batch_size, 1000}], true}],
riak_kv_mrc_pipe:mapred({index,Bucket,Field,From,To}, Query, Timeout).
But if one of the nodes is down for a long time, the response is unpredictable:
sometimes it returns {ok,GoodResultList}, but sometimes {ok,[]} (an empty list).
We traced riak_kv and riak_pipe and found two problems:
1. In riak_kv_pipe_index and in riak_kv_pipe_listkeys, the created
#fitting_spec always has nval set to 1.
2. The actual error occurs in riak_pipe_vnode:remaining_preflist, which returns
an empty PrefList for some Hash (#fitting_spec.nval is 1). It uses the
riak_core_apl:get_primary_apl function.
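To illustrate why the two points above can combine into an empty PrefList, here
is a minimal, self-contained sketch. It is a toy model and not riak_core code:
the module name preflist_sketch, the ring representation (a list of
{Partition, Owner} pairs) and both functions are hypothetical, and the fallback
selection is much simpler than what riak_core actually does. It only shows the
idea that a primaries-only lookup with N = 1 has nothing left to return when the
single primary owner is down, while a lookup that allows fallbacks still
returns an entry.

```erlang
-module(preflist_sketch).
-export([primary_apl/4, apl_with_fallbacks/4]).

%% Toy model only (assumption, not the riak_core implementation).
%% Ring    : list of {Partition, OwnerNode} in ring order.
%% Start   : 0-based ring position of the first primary for a key.
%% N       : number of primaries wanted (must be =< length(Ring)).
%% UpNodes : nodes currently reachable.

%% Rotate the ring so that it begins at position Start.
rotate(Ring, Start) ->
    {Front, Back} = lists:split(Start rem length(Ring), Ring),
    Back ++ Front.

%% Primaries only: take the N primary owners and drop the ones that are down.
%% With N = 1 and that one owner down, the result is [].
primary_apl(Ring, Start, N, UpNodes) ->
    {Primaries, _Rest} = lists:split(N, rotate(Ring, Start)),
    [{P, Node} || {P, Node} <- Primaries, lists:member(Node, UpNodes)].

%% With fallbacks: each down primary is replaced by the next up node found
%% further around the ring, and every entry is annotated primary | fallback.
apl_with_fallbacks(Ring, Start, N, UpNodes) ->
    {Primaries, Rest} = lists:split(N, rotate(Ring, Start)),
    Fallbacks = [Node || {_P, Node} <- Rest, lists:member(Node, UpNodes)],
    fill(Primaries, Fallbacks, UpNodes).

fill([], _Fallbacks, _UpNodes) ->
    [];
fill([{P, Node} | T], Fallbacks, UpNodes) ->
    case lists:member(Node, UpNodes) of
        true ->
            [{{P, Node}, primary} | fill(T, Fallbacks, UpNodes)];
        false ->
            case Fallbacks of
                [F | Fs] -> [{{P, F}, fallback} | fill(T, Fs, UpNodes)];
                []       -> fill(T, [], UpNodes)
            end
    end.
```

For a three-partition toy ring [{0,a},{1,b},{2,c}] with node b down, Start = 1
and N = 1, primary_apl/4 returns [] while apl_with_fallbacks/4 returns
[{{1,c},fallback}]. This matches the symptom we see: whether a given Hash gets
an empty PrefList depends on whether its single primary lives on the down node.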
But if we use old-style MapReduce, for example:
{ok,C} = riak:local_client(),
Me = self(),
Query = [{map, {modfun,Mod,MapFun}, [do_prereduce,{from,1}], false},
         {reduce, {modfun,Mod,ReduceFun}, [{reduce_phase_batch_size, 1000}], true}],
{ok, {ReqId,FlowPid}} = C:mapred_stream(Query, Me, Timeout),
{ok,_} = riak_kv_index_fsm_sup:start_index_fsm(zont_riak_connection:riak_node(),
    [{raw, ReqId, FlowPid}, [Bucket, none, {range,Field,From,To}, Timeout, mapred]]),
luke_flow:collect_output(ReqId, Timeout).
The query executes correctly. The problem is that do_prereduce and
{reduce_phase_batch_size, 1000} are ignored, which is why execution is slow.
Can you make a recommendation? Maybe riak_pipe_vnode:remaining_preflist should
use riak_core_apl:get_apl_ann, or #fitting_spec.nval should be set to the nval
from our bucket props?