Jon,

We don't have any MR jobs running at all :)) Here you go:
1> rpc:multicall(supervisor, count_children, [riak_pipe_builder_sup]).
{[[{specs,1},{active,0},{supervisors,0},{workers,0}],
  [{specs,1},{active,0},{supervisors,0},{workers,0}],
  [{specs,1},{active,0},{supervisors,0},{workers,0}]],
 []}
2> rpc:multicall(supervisor, count_children, [riak_pipe_fitting_sup]).
{[[{specs,1},{active,0},{supervisors,0},{workers,0}],
  [{specs,1},{active,0},{supervisors,0},{workers,0}],
  [{specs,1},{active,0},{supervisors,0},{workers,0}]],
 []}
3> rpc:multicall(riak_kv_js_manager,pool_size,[riak_kv_js_map]).
{[0,0,0],[]}
4> rpc:multicall(riak_kv_js_manager,pool_size,[riak_kv_js_reduce]).
{[0,0,0],[]}

-Matthew

On Tue, Apr 3, 2012 at 14:00, Jon Meredith <jmered...@basho.com> wrote:
> Hi Matthew,
>
> Thanks for the logs. Do you know how many map/reduce jobs you run
> concurrently? Is this a load-based issue, or do you run into problems?
>
> Running this from the Erlang console should give an idea of how many
> pipes, fittings and JS VMs you are running concurrently ('riak attach' to
> start, ^D to exit):
>
> rpc:multicall(supervisor, count_children, [riak_pipe_builder_sup]).
> rpc:multicall(supervisor, count_children, [riak_pipe_fitting_sup]).
> rpc:multicall(riak_kv_js_manager,pool_size,[riak_kv_js_map]).
> rpc:multicall(riak_kv_js_manager,pool_size,[riak_kv_js_reduce]).
>
> For the two supervisors, add up the active counts; for the two managers,
> the results give how many JS VMs are running per node.
>
> If the number of concurrent requests is low, maybe we can find a higher
> bandwidth way of investigating.
>
> Jon
>
> On Mon, Apr 2, 2012 at 4:21 PM, Matthew Tovbin <matt...@tovbin.com> wrote:
>
>> Unfortunately, yes, Jon.
>>
>> Here are my log files after setting 128/128 for 'map_js_vm_count' and
>> 'reduce_js_vm_count' accordingly and restarting all the nodes (see
>> attached app.config).
>>
>> If you need any other information provided or tests performed, please
>> let me know.
>>
>> *-Console.log:*
>> 2012-04-02 15:16:56.848 [notice]
>> <0.22138.162>@riak_kv_js_manager:blocking_dispatch:247 JS call failed: All
>> VMs are busy.
>> 2012-04-02 15:16:56.853 [error] <0.22121.162> gen_fsm <0.22121.162> in
>> state wait_pipeline_shutdown terminated with reason: {sink_died,shutdown}
>> 2012-04-02 15:16:56.860 [error] <0.22121.162> CRASH REPORT Process
>> <0.22121.162> with 0 neighbours crashed with reason: {sink_died,shutdown}
>> 2012-04-02 15:16:56.862 [error] <0.1451.0> Supervisor
>> riak_pipe_builder_sup had child undefined started with
>> {riak_pipe_builder,start_link,undefined} at <0.22121.162> exit with reason
>> {sink_died,shutdown} in context child_terminated
>>
>> *-Error.log:*
>> ** State machine <0.22121.162> terminating
>> ** Last message in was
>> {'DOWN',#Ref<0.0.33.48784>,process,<0.22119.162>,shutdown}
>> ** When State == wait_pipeline_shutdown
>> ** Data ==
>> {state,[{sink,{fitting,<0.22119.162>,#Ref<0.0.33.48768>,{riak_kv_pipe_get,bkey_chash},{riak_kv_pipe_get,bkey_nval}}},{trace,{set,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[error],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}},{log,{sink,{fitting,<0.21554.162>,#Ref<0.0.33.48768>,sink,undefined}}}],{pipe,<0.22121.162>,[{listkeys,{fitting,<0.22122.162>,#Ref<0.0.33.48768>,{chash,key_of},1}}],{fitting,<0.22119.162>,#Ref<0.0.33.48768>,{riak_kv_pipe_get,bkey_chash},{riak_kv_pipe_get,bkey_nval}}},[{{fitting,<0.22122.162>,#Ref<0.0.33.48768>,{chash,key_of},1},#Ref<0.0.33.48787>}],#Ref<0.0.33.48784>}
>> ** Reason for termination =
>> ** {sink_died,shutdown}
>> 2012-04-02 15:16:56 =CRASH REPORT====
>> crasher:
>> initial call: riak_pipe_builder:init/1
>> pid: <0.22121.162>
>> registered_name: []
>> exception exit: {sink_died,shutdown}
>> in function gen_fsm:terminate/7
>> in call from proc_lib:init_p_do_apply/3
>> ancestors: [riak_pipe_builder_sup,riak_pipe_sup,<0.1448.0>]
>> messages: [{'DOWN',#Ref<0.0.33.48787>,process,<0.22122.162>,shutdown}]
>> links: [<0.1451.0>]
>> dictionary:
>> [{eunit,[{module,riak_pipe_builder},{ref,#Ref<0.0.33.48768>},{spec,[{fitting_spec,listkeys,riak_kv_pipe_listkeys,undefined,{chash,key_of},1,64}]},{options,[{sink,{fitting,<0.22119.162>,#Ref<0.0.33.48768>,{riak_kv_pipe_get,bkey_chash},{riak_kv_pipe_get,bkey_nval}}},{trace,{set,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[error],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}},{log,{sink,{fitting,<0.21554.162>,#Ref<0.0.33.48768>,sink,undefined}}}]},{fittings,[{{fitting,<0.22122.162>,#Ref<0.0.33.48768>,{chash,key_of},1},#Ref<0.0.33.48787>}]}]}]
>> trap_exit: false
>> status: running
>> heap_size: 987
>> stack_size: 24
>> reductions: 251
>> neighbours:
>> 2012-04-02 15:16:56 =SUPERVISOR REPORT====
>> Supervisor: {local,riak_pipe_builder_sup}
>> Context: child_terminated
>> Reason: {sink_died,shutdown}
>> Offender:
>> [{pid,<0.22121.162>},{name,undefined},{mfargs,{riak_pipe_builder,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}]
>>
>> -Matthew
>>
>> On Mon, Apr 2, 2012 at 12:27, Jon Meredith <jmered...@basho.com> wrote:
>>
>>> Very strange.
>>>
>>> Are you seeing the
>>>
>>> 16:40:50.037 [notice] JS call failed: All VMs are busy.
>>>
>>> message in your console.log files on any of the nodes in your cluster?
>>>
>>> Jon
>>>
>>> On Mon, Apr 2, 2012 at 11:48 AM, Matthew Tovbin <matt...@tovbin.com> wrote:
>>>
>>>> Jon,
>>>>
>>>> Thanks for the reply.
>>>> I tried increasing the 'map_js_vm_count' and 'reduce_js_vm_count' on
>>>> the cluster to the values you suggested (24/18) and even larger ones
>>>> (32/24 and 128/128), but with no success - I still get the same error
>>>> in the console log of the node.
>>>>
>>>> What are we missing?!
>>>>
>>>> -Matthew
>>>>
>>>> On Fri, Mar 30, 2012 at 16:19, Jon Meredith <jmered...@basho.com> wrote:
>>>>
>>>>> Hi list,
>>>>>
>>>>> I spent some time looking at this issue today. I suspect it is due to
>>>>> all the JavaScript VMs being busy inside Riak due to many parallel
>>>>> MapReduce jobs.
>>>>>
>>>>> If you're seeing this issue, please check for this message in the
>>>>> console.log file:
>>>>>
>>>>> 16:40:50.037 [notice] JS call failed: All VMs are busy.
>>>>>
>>>>> I've filed https://github.com/basho/riak_pipe/issues/44 to track
>>>>> improvements to error reporting.
>>>>>
>>>>> If you are seeing the log message, the number of VMs allocated for map
>>>>> and reduce tasks can be configured in app.config. The defaults are 8 for
>>>>> map and 6 for reduce; increasing them will use more memory but should
>>>>> reduce the incidence of the error message.
>>>>>
>>>>> {riak_kv, [
>>>>>   ...
>>>>>   {map_js_vm_count, 24 },
>>>>>   {reduce_js_vm_count, 18 },
>>>>> ...]
>>>>>
>>>>> If you were affected by it and changing this does not resolve your
>>>>> issue, I'll keep digging.
>>>>>
>>>>> Cheers, Jon.
>>>>>
>>>>> On Thu, Mar 29, 2012 at 10:29 AM, Matthew Tovbin <matt...@tovbin.com> wrote:
>>>>>
>>>>>> Guys,
>>>>>>
>>>>>> Any updates on the issue?!
>>>>>>
>>>>>> -Matthew
>>>>>>
>>>>>> On Tue, Mar 13, 2012 at 18:29, Matthew Tovbin <matt...@tovbin.com> wrote:
>>>>>>
>>>>>>> Here is a log from one of the servers:
>>>>>>>
>>>>>>> ==> /mnt/dataraid/log/crash.log <==
>>>>>>> 2012-03-13 18:24:44 =CRASH REPORT====
>>>>>>> crasher:
>>>>>>> initial call: riak_pipe_builder:init/1
>>>>>>> pid: <0.20949.24>
>>>>>>> registered_name: []
>>>>>>> exception exit: {sink_died,shutdown}
>>>>>>> in function gen_fsm:terminate/7
>>>>>>> in call from proc_lib:init_p_do_apply/3
>>>>>>> ancestors: [riak_pipe_builder_sup,riak_pipe_sup,<0.172.0>]
>>>>>>> messages: [{'DOWN',#Ref<0.0.5.38339>,process,<0.20950.24>,shutdown}]
>>>>>>> links: [<0.175.0>]
>>>>>>> dictionary:
>>>>>>> [{eunit,[{module,riak_pipe_builder},{ref,#Ref<0.0.5.38318>},{spec,[{fitting_spec,listkeys,riak_kv_pipe_listkeys,undefined,{chash,key_of},1,64}]},{options,[{sink,{fitting,<0.20947.24>,#Ref<0.0.5.38318>,{riak_kv_pipe_get,bkey_chash},{riak_kv_pipe_get,bkey_nval}}},{trace,{set,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[error],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}},{log,{sink,{fitting,<0.20060.24>,#Ref<0.0.5.38318>,sink,undefined}}}]},{fittings,[{{fitting,<0.20950.24>,#Ref<0.0.5.38318>,{chash,key_of},1},#Ref<0.0.5.38339>}]}]}]
>>>>>>> trap_exit: false
>>>>>>> status: running
>>>>>>> heap_size: 987
>>>>>>> stack_size: 24
>>>>>>> reductions: 251
>>>>>>> neighbours:
>>>>>>> 2012-03-13 18:24:44 =SUPERVISOR REPORT====
>>>>>>> Supervisor: {local,riak_pipe_builder_sup}
>>>>>>> Context: child_terminated
>>>>>>> Reason: {sink_died,shutdown}
>>>>>>> Offender:
>>>>>>> [{pid,<0.20949.24>},{name,undefined},{mfargs,{riak_pipe_builder,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}]
>>>>>>>
>>>>>>> ==> /mnt/dataraid/log/console.log <==
>>>>>>> 2012-03-13 18:24:44.645 [error] <0.20949.24> CRASH REPORT Process
>>>>>>> <0.20949.24> with 0 neighbours crashed with reason: {sink_died,shutdown}
>>>>>>> 2012-03-13 18:24:44.652 [error] <0.175.0> Supervisor
>>>>>>> riak_pipe_builder_sup had child undefined started with
>>>>>>> {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with
>>>>>>> reason {sink_died,shutdown} in context child_terminated
>>>>>>> 2012-03-13 18:24:56.420 [info]
>>>>>>> <0.71.0>@riak_core_sysmon_handler:handle_event:85 monitor long_gc
>>>>>>> <0.32603.24>
>>>>>>> [{initial_call,{mochiweb_acceptor,init,3}},{almost_current_function,{gen,do_call,4}},{message_queue_len,0}]
>>>>>>> [{timeout,120},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,40},{old_heap_size,0},{heap_size,1955}]
>>>>>>>
>>>>>>> ==> /mnt/dataraid/log/erlang.log.1 <==
>>>>>>> 18:24:44.634 [error] gen_fsm <0.20949.24> in state
>>>>>>> wait_pipeline_shutdown terminated with reason: {sink_died,shutdown}
>>>>>>> (riak@1)1> 18:24:44.641 [error] Pipe worker startup failed:fitting
>>>>>>> was gone before startup
>>>>>>> (riak@1)1>
>>>>>>> 18:24:44.645 [error] CRASH REPORT Process <0.20949.24> with 0
>>>>>>> neighbours crashed with reason: {sink_died,shutdown}
>>>>>>> (riak@1)1> 18:24:44.652 [error] Supervisor riak_pipe_builder_sup
>>>>>>> had child undefined started with {riak_pipe_builder,start_link,undefined}
>>>>>>> at <0.20949.24> exit with reason {sink_died,shutdown} in context
>>>>>>> child_terminated
>>>>>>> (riak@1)1> 18:24:56.420 [info] monitor long_gc <0.32603.24>
>>>>>>> [{initial_call,{mochiweb_acceptor,init,3}},{almost_current_function,{gen,do_call,4}},{message_queue_len,0}]
>>>>>>> [{timeout,120},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,40},{old_heap_size,0},{heap_size,1955}]
>>>>>>> (riak@1)1>
>>>>>>>
>>>>>>> ==> /mnt/dataraid/log/error.log <==
>>>>>>> 2012-03-13 18:24:44.645 [error] <0.20949.24> CRASH REPORT Process
>>>>>>> <0.20949.24> with 0 neighbours crashed with reason: {sink_died,shutdown}
>>>>>>> 2012-03-13 18:24:44.652 [error] <0.175.0> Supervisor
>>>>>>> riak_pipe_builder_sup had child undefined started with
>>>>>>> {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with
>>>>>>> reason {sink_died,shutdown} in context child_terminated
>>>>>>>
>>>>>>> -Matthew
>>>>>>>
>>>>>>> On Tue, Mar 13, 2012 at 18:17, Matthew Tovbin <matt...@tovbin.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I got the same problem today on 1.1.0.
>>>>>>>>
>>>>>>>> As suggested, I updated all the nodes to 1.1.1. The error remains
>>>>>>>> the same:
>>>>>>>>
>>>>>>>> { stack: [Getter/Setter],
>>>>>>>> arguments: undefined,
>>>>>>>> type: undefined,
>>>>>>>> message: 'HTTP error 500:
>>>>>>>> {"phase":0,"error":"[preflist_exhausted]","input":"{ok,{r_object
>>>>>>>> ....."}}}
>>>>>>>>
>>>>>>>> -Matthew
>>>>>>>>
>>>>>>>> On Tue, Mar 13, 2012 at 06:48, Jon Meredith <jmered...@basho.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Arne,
>>>>>>>>>
>>>>>>>>> Could you try with 1.1.1 please. We've fixed a number of bugs
>>>>>>>>> with how MapReduce handles errors that could explain some of what
>>>>>>>>> you are seeing.
>>>>>>>>>
>>>>>>>>> Jon
>>>>>>>>>
>>>>>>>>> On Tue, Mar 13, 2012 at 12:55 AM, Arne Claassen <arne+r...@claassen.net> wrote:
>>>>>>>>>
>>>>>>>>>> I'm running 1.1.0 built from source on Centos 6.
>>>>>>>>>> I set up two nodes on localhost following the instructions on the
>>>>>>>>>> Basho site for testing, and everything was going fine until I needed
>>>>>>>>>> to fetch all items in a bucket.
>>>>>>>>>>
>>>>>>>>>> For this I was using node.js with riak-js and its db.getAll call,
>>>>>>>>>> which from the debug logs looks like it uses /mapred.
>>>>>>>>>>
>>>>>>>>>> The bucket only has about 1200 items in it, each a JSON payload
>>>>>>>>>> of about 3KB. Initially, it kept returning inconsistent results:
>>>>>>>>>> sometimes the data would come back, sometimes the data would be
>>>>>>>>>> undefined. So I kept running it, trying to see if there was a
>>>>>>>>>> pattern, adding more debugging code, etc. But still, it was about
>>>>>>>>>> a request every couple of seconds. Then it just stopped working
>>>>>>>>>> altogether, and now I always get:
>>>>>>>>>>
>>>>>>>>>> { [Error: HTTP error 500:
>>>>>>>>>> {"phase":0,"error":"[preflist_exhausted]","input":"{ok,{r_object,<<\"tweets_15368536\">>,<<\"178528155714330620\">>,[{r_content,{dict,6,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[[<<\"Links\">>]],[],[],[],[],[],[],[],[[<<\"content-type\">>,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110],[<<\"X-Riak-VTag\">>,49,76,115,54,70,75,74,71,119,65,111,70,88,73,117,84,98,88,79,69,52,116]],[[<<\"index\">>]],[],[[<<\"X-Riak-Last-Modified\">>|{1331,487523,218968}]],[],[[<<\"X-Riak-Meta\">>]]}}},<<\"{\"coordinates\":null,\"create...\">>}],...},...}","type":"forward_preflist","stack":"[]"}]
>>>>>>>>>> statusCode: 500 }
>>>>>>>>>>
>>>>>>>>>> in the console. I haven't tried restarting Riak, just because I'd
>>>>>>>>>> rather find out what's happening than have it go away and come back
>>>>>>>>>> later when I don't expect it.
>>>>>>>>>>
>>>>>>>>>> Any suggestions on how to troubleshoot this? I can still access
>>>>>>>>>> items individually, and I can even list the keys via ?keys=true;
>>>>>>>>>> it's just the map reduce that seems to be busted.
>>>>>>>>>>
>>>>>>>>>> cheers,
>>>>>>>>>> arne
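
For anyone running the same diagnostics, the per-node counts reported above can be totalled in one step from 'riak attach', following Jon's "add up the active counts" instructions. This is only a sketch: the SumActive fun and the variable names are illustrative, not anything Riak ships with.

%% Sketch only: total the 'active' entries that supervisor:count_children/1
%% reports on every node (run from the console opened with 'riak attach').
{Builders, _Down1} = rpc:multicall(supervisor, count_children, [riak_pipe_builder_sup]),
{Fittings, _Down2} = rpc:multicall(supervisor, count_children, [riak_pipe_fitting_sup]),
SumActive = fun(PerNode) ->
                lists:sum([proplists:get_value(active, Counts, 0) || Counts <- PerNode])
            end,
io:format("active pipes: ~p, active fittings: ~p~n",
          [SumActive(Builders), SumActive(Fittings)]).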
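
Separately, since the error persisted even after raising the counts to 128/128, it may also be worth confirming that each node actually loaded the new values after the restart. A sketch, assuming the settings are read from the riak_kv application environment as in the app.config snippet quoted above:

%% Sketch: confirm which values each node actually loaded from app.config.
rpc:multicall(application, get_env, [riak_kv, map_js_vm_count]).
rpc:multicall(application, get_env, [riak_kv, reduce_js_vm_count]).

Each node should report {ok,128} once the new app.config has been picked up; undefined or an older value would point at a node still running on the defaults.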
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com