Hello John,
I'm not really an expert, but looking at your crash log, my first guess is
that the error occurs in the reduce phase of the map/reduce job itself.
More specifically, I think you need to examine what this part of your crash
report means:
> {details,{fitting_details,{fitting,<11882.27640.38>,#Ref<11882.0.6211.103965>,follow,1},{prereduce,0},riak_kv_w_reduce,{rct,#Fun<reduce_inputs.reduceStatsList.2>,none},{fitting,<11882.27639.38>,#Ref<11882.0.6211.103965>,<<103,28,147,16,123,67,248,114,104,204,9,54,33,62,81,41,129,84,203,83>>,1},[{sink,{fitting,<11882.24969.38>,#Ref<11882.0.6211.103965>,sink,undefined}},{log,sink},{trace,{set,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[error],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}}],64}}]}]
But like I said, it's just a hunch.
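
If it helps to test that hunch, one option is to rerun the same job over smaller slices of the index range and see whether the reduce still times out. Here is a minimal Python sketch of that idea. The bucket, index, module and function names come from your job below; the helper names `build_job` and `split_range`, the slice width of 300, and the assumption that index ranges are inclusive at both ends are mine, so treat it as an illustration only.

```python
import json


def build_job(start, end, timeout_ms=3600000):
    """Build a MapReduce job body shaped like the one in the original post."""
    return {
        "inputs": {
            "bucket": "data_bucket",
            "index": "minute_int",
            "start": start,
            "end": end,
        },
        "query": [
            {"map": {"language": "erlang", "module": "maps",
                     "function": "emitStatsFromList", "keep": False}},
            {"reduce": {"language": "erlang", "module": "reduces",
                        "function": "reduceStatsList", "keep": True}},
        ],
        "timeout": timeout_ms,
    }


def split_range(start, end, step):
    """Yield (lo, hi) sub-ranges covering start..end.

    Assumes both ends of an index range are inclusive.
    """
    lo = start
    while lo <= end:
        hi = min(lo + step - 1, end)
        yield lo, hi
        lo = hi + 1


if __name__ == "__main__":
    # Hypothetical slice width of 300; adjust to taste.
    for lo, hi in split_range(0, 900, 300):
        print(json.dumps(build_job(lo, hi)))
```

Each printed line is a job body that can be sent with the same curl command you already use. The partial reduce outputs would still need to be combined on the client side, so this is only meant to show whether the timeout depends on the input size.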
Cheers,
Erik Hoogeveen
On 25 jul 2012, at 20:11, John Roy wrote:
> Hi --
>
> I'm seeing an issue with timeouts for map/reduces. We're running Erlang
> files via a curl command, as part of a Haskell job. In the curl data we
> specify the timeout to be one hour (3,600,000 milliseconds; see the
> example below). However, the job crashes (times out) after much less than
> an hour (generally 450-1000 seconds). See the sample crash below.
>
> Does anyone have an idea or insight into why that might occur? I've done
> some searching through the riak_kv code but haven't been able to trace the
> error through it yet.
>
> A sample mr (similar to this one) has 3398 keys which should map 278012 items
> to be reduced.
>
> We're using the eleveldb backend with Riak 1.1.1.
>
> One other note, we can be running more than one mr over the same data
> simultaneously.
>
> Thanks for your help!
>
>
>
> The input to the curl command is something like this:
> {
>   "inputs" : {
>     "bucket" : "data_bucket",
>     "index" : "minute_int",
>     "start" : 0,
>     "end" : 900
>   },
>   "query" : [
>     { "map" : { "language" : "erlang", "module" : "maps", "function" : "emitStatsFromList", "keep" : false } },
>     { "reduce" : { "language" : "erlang", "module" : "reduces", "function" : "reduceStatsList", "keep" : true } }
>   ],
>   "timeout" : 3600000
> }
>
>
> Here's a (cleansed) crash log:
>
> 2012-07-24 08:01:44 =CRASH REPORT====
> crasher:
> initial call: riak_pipe_vnode_worker:init/1
> pid: <0.28552.254>
> registered_name: []
> exception exit:
> {timeout,{gen_server,call,[{riak_pipe_vnode_master,'[email protected]'},{return_vnode,{riak_vnode_req_v1,593735040165679310520246963290989976735222595584,{raw,#Ref<11882.0.6211.103965>,<0.28552.254>},{cmd_enqueue,{fitting,<11882.27639.38>,#Ref<11882.0.6211.103965>,<<103,28,147,16,123,67,248,114,104,204,9,54,33,62,81,41,129,84,203,83>>,1},{"dummykey",1},infinity,[{593735040165679310520246963290989976735222595584,'[email protected]'}]}}}]}}
> in function gen_fsm:terminate/7
> in call from proc_lib:init_p_do_apply/3
> ancestors: [<0.348.0>,<0.347.0>,riak_core_vnode_sup,riak_core_sup,<0.89.0>]
> messages: []
> links: [<0.348.0>,<0.347.0>]
> dictionary:
> [{eunit,[{module,riak_pipe_vnode_worker},{partition,662242929415565384811044689824565743281594433536},{<0.347.0>,<0.347.0>},{details,{fitting_details,{fitting,<11882.27640.38>,#Ref<11882.0.6211.103965>,follow,1},{prereduce,0},riak_kv_w_reduce,{rct,#Fun<reduce_inputs.reduceStatsList.2>,none},{fitting,<11882.27639.38>,#Ref<11882.0.6211.103965>,<<103,28,147,16,123,67,248,114,104,204,9,54,33,62,81,41,129,84,203,83>>,1},[{sink,{fitting,<11882.24969.38>,#Ref<11882.0.6211.103965>,sink,undefined}},{log,sink},{trace,{set,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[error],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}}],64}}]}]
> trap_exit: false
> status: running
> heap_size: 317811
> stack_size: 24
> reductions: 14770819
> neighbours:
> 2012-07-24 08:01:44 =SUPERVISOR REPORT====
> Supervisor: {<0.348.0>,riak_pipe_vnode_worker_sup}
> Context: child_terminated
> Reason:
> {timeout,{gen_server,call,[{riak_pipe_vnode_master,'[email protected]'},{return_vnode,{riak_vnode_req_v1,593735040165679310520246963290989976735222595584,{raw,#Ref<11882.0.6211.103965>,<0.28552.254>},{cmd_enqueue,{fitting,<11882.27639.38>,#Ref<11882.0.6211.103965>,<<103,28,147,16,123,67,248,114,104,204,9,54,33,62,81,41,129,84,203,83>>,1},{"dummykey",1},infinity,[{593735040165679310520246963290989976735222595584,'[email protected]'}]}}}]}}
> Offender:
> [{pid,<0.28552.254>},{name,undefined},{mfargs,{riak_pipe_vnode_worker,start_link,undefined}},{restart_type,temporary},{shutdown,2000},{child_type,worker}]
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com