Erlang: R13B04
Riak: 0.14.2
I am having the same issue as Jeremy.
I just ran 208 MapReduce jobs using anonymous JavaScript functions in the map
and reduce phases. I am sending all of the MapReduce jobs to a single node, riak01.
Of the 208 jobs, two failed with "mapexec_error" {error,timeout} on riak02.
I read on the Basho wiki that the default timeout is 60 seconds:
http://wiki.basho.com/Loading-Data-and-Running-MapReduce-Queries.html
"Map/Reduce queries have a default timeout of 60000 milliseconds (60 seconds)."
I have found that if a MapReduce job does not complete within 10 seconds,
it is likely having problems. Most MapReduce jobs complete in one to two
seconds. I can try increasing the MapReduce timeout to 120 seconds, but I
doubt that this will help.
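For reference, here is a minimal sketch of how the timeout could be raised per job, assuming the jobs are submitted as JSON over HTTP and that the job body accepts a top-level "timeout" field in milliseconds. The bucket name and the JavaScript sources are hypothetical placeholders, not my real functions.

```python
import json

def build_mapred_job(bucket, map_source, reduce_source, timeout_ms=120000):
    """Build the JSON body for a MapReduce job with an explicit timeout.

    Assumption: the job is POSTed as JSON to the node's MapReduce
    endpoint and "timeout" is given in milliseconds.
    """
    job = {
        "inputs": bucket,
        "query": [
            {"map": {"language": "javascript", "source": map_source}},
            {"reduce": {"language": "javascript", "source": reduce_source}},
        ],
        # 120000 ms = 120 seconds, twice the documented 60 second default
        "timeout": timeout_ms,
    }
    return json.dumps(job)

# Hypothetical bucket and anonymous functions, for illustration only.
body = build_mapred_job(
    "example_bucket",
    "function(value, keyData, arg) { return [1]; }",
    "function(values, arg) { return [values.length]; }",
)
```

If most jobs finish in one to two seconds, a longer timeout would probably only delay the error rather than prevent it, which is why I doubt it will help.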
I have also found that if several timeouts occur, the beam process can
terminate.
Any help would be appreciated.
The following is from sasl-error.log on riak01:
=ERROR REPORT==== 21-Jun-2011::16:29:11 ===
** State machine <0.11130.0> terminating
** Last event in was {mapexec_error,{<<"46">>,'[email protected]'},
{error,timeout}}
** When State == executing
** Data == {state,0,riak_kv_map_phase,
{state,true,
{javascript,
{map,
{jsanon,
.................
=ERROR REPORT==== 21-Jun-2011::16:29:11 ===
** State machine <0.11127.0> terminating
** Last message in was {'EXIT',<0.11130.0>,{error,timeout}}
** When State == executing
** Data == {state,41465578,
[<0.11130.0>,[<0.11129.0>,<0.11128.0>]],
<0.10971.0>,66000,
{1308688220159363,#Ref<0.0.0.198634>},
#Fun<riak_kv_mapred_json.jsonify_not_found.1>,[],[]}
** Reason for termination =
** {error,{phase_error,{error,timeout}}}
=CRASH REPORT==== 21-Jun-2011::16:29:11 ===
crasher:
initial call: luke_flow:init/1
pid: <0.11127.0>
registered_name: []
exception exit: {error,{phase_error,{error,timeout}}}
in function gen_fsm:terminate/7
in call from proc_lib:init_p_do_apply/3
ancestors: [luke_flow_sup,luke_sup,<0.91.0>]
messages: []
links: [<0.11128.0>,<0.11129.0>,<0.93.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 233
stack_size: 24
reductions: 23099
neighbours:
neighbour:
[{pid,<0.11129.0>},{registered_name,[]},{initial_call,{luke_phase,init,[Argument__1]}},{current_function,{gen_fsm,loop,7}},{ancestors,[luke_phase_sup,luke_sup,<0.91.0>]},{messages,[]},{links,[<0.11127.0>,<0.11128.0>,<0.94.0>]},{dictionary,[]},{trap_exit,false},{status,waiting},{heap_size,6765},{stack_size,10},{reductions,4926}]
neighbour:
[{pid,<0.11128.0>},{registered_name,[]},{initial_call,{luke_phase,init,[Argument__1]}},{current_function,{gen_fsm,loop,7}},{ancestors,[luke_phase_sup,luke_sup,<0.91.0>]},{messages,[]},{links,[<0.11127.0>,<0.11129.0>,<0.94.0>]},{dictionary,[]},{trap_exit,false},{status,waiting},{heap_size,4181},{stack_size,10},{reductions,4905}]
The second timeout error:
=ERROR REPORT==== 21-Jun-2011::16:31:10 ===
** State machine <0.15144.0> terminating
** Last message in was flow_timeout
** When State == executing
** Data == {state,78575179,
[<0.15147.0>,[<0.15146.0>,<0.15145.0>]],
<0.15118.0>,66000,
{1308688285727293,#Ref<0.0.1.11874>},
#Fun<riak_kv_mapred_json.jsonify_not_found.1>,[],[]}
** Reason for termination =
** {error,flow_timeout}
=CRASH REPORT==== 21-Jun-2011::16:31:10 ===
crasher:
initial call: luke_flow:init/1
pid: <0.15144.0>
registered_name: []
exception exit: {error,flow_timeout}
in function gen_fsm:terminate/7
in call from proc_lib:init_p_do_apply/3
ancestors: [luke_flow_sup,luke_sup,<0.91.0>]
messages: []
links: [<0.15145.0>,<0.15147.0>,<0.15146.0>,<0.93.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 233
stack_size: 24
reductions: 20791
neighbours:
neighbour:
[{pid,<0.15146.0>},{registered_name,[]},{initial_call,{luke_phase,init,[Argument__1]}},{current_function,{gen_fsm,loop,7}},{ancestors,[luke_phase_sup,luke_sup,<0.91.0>]},{messages,[]},{links,[<0.15144.0>,<0.15145.0>,<0.94.0>]},{dictionary,[]},{trap_exit,false},{status,waiting},{heap_size,4181},{stack_size,10},{reductions,6554}]
neighbour:
[{pid,<0.15145.0>},{registered_name,[]},{initial_call,{luke_phase,init,[Argument__1]}},{current_function,{gen_fsm,loop,7}},{ancestors,[luke_phase_sup,luke_sup,<0.91.0>]},{messages,[]},{links,[<0.15144.0>,<0.15146.0>,<0.94.0>]},{dictionary,[]},{trap_exit,false},{status,waiting},{heap_size,4181},{stack_size,10},{reductions,6274}]
=SUPERVISOR REPORT==== 21-Jun-2011::16:31:10 ===
Supervisor: {local,luke_flow_sup}
Context: child_terminated
Reason: {error,flow_timeout}
Offender:
[{pid,<0.15144.0>},{name,undefined},{mfa,{luke_flow,start_link,[<0.15118.0>,78575179,[{riak_kv_map_phase,[],[{javascript,{map,{jsanon,<<"function(value,keyData,
................
David