Jon, I even restarted all of the nodes in the cluster and it's still zero.
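For reference, the check I'm running is the same one you gave me earlier ('riak attach', then the rpc calls below), and even straight after the restart, before any JS MapReduce jobs, the output is essentially the same as my earlier paste:

1> rpc:multicall(riak_kv_js_manager,pool_size,[riak_kv_js_map]).
{[0,0,0],[]}
2> rpc:multicall(riak_kv_js_manager,pool_size,[riak_kv_js_reduce]).
{[0,0,0],[]}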
-Matthew On Tue, Apr 3, 2012 at 16:56, Jon Meredith <jmered...@basho.com> wrote: > That's very interesting, you don't have any Javascript VMs running. Here's > an example on a single node system after restart - I've configured it with > 2 map VMs and 3 reduce VMs just to get easy numbers. > > (dev1@127.0.0.1)1> > rpc:multicall(riak_kv_js_manager,pool_size,[riak_kv_js_map]). > {[2],[]} > (dev1@127.0.0.1)2> > (dev1@127.0.0.1)2> > rpc:multicall(riak_kv_js_manager,pool_size,[riak_kv_js_reduce]). > {[3],[]} > > Not having any VMs available would explain your preflist exhausted issue. > > Could you restart one of the nodes and check the count, preferably before > running any JS MapReduce jobs? Then if they are still zero we'll have to > work out why they aren't starting up. > > Cheers, Jon. > > > On Tue, Apr 3, 2012 at 5:10 PM, Matthew Tovbin <matt...@tovbin.com> wrote: > >> Jon, >> >> We dont have any MR jobs running at all :)) Here you go: >> >> 1>rpc:multicall(supervisor, count_children, [riak_pipe_builder_sup]). >> {[[{specs,1},{active,0},{supervisors,0},{workers,0}], >> [{specs,1},{active,0},{supervisors,0},{workers,0}], >> [{specs,1},{active,0},{supervisors,0},{workers,0}]], >> []} >> 2>rpc:multicall(supervisor, count_children, [riak_pipe_fitting_sup]). >> {[[{specs,1},{active,0},{supervisors,0},{workers,0}], >> [{specs,1},{active,0},{supervisors,0},{workers,0}], >> [{specs,1},{active,0},{supervisors,0},{workers,0}]], >> []} >> 3>rpc:multicall(riak_kv_js_manager,pool_size,[riak_kv_js_map]). >> {[0,0,0],[]} >> 4>rpc:multicall(riak_kv_js_manager,pool_size,[riak_kv_js_reduce]). >> {[0,0,0],[]} >> >> -Matthew >> >> >> >> On Tue, Apr 3, 2012 at 14:00, Jon Meredith <jmered...@basho.com> wrote: >> >>> Hi Matthew, >>> >>> Thanks for the logs. Do you know how many map/reduce jobs you run >>> concurrently? Is this a load-based issue or do you run into problems? >>> >>> Running this from the erlang console should give an idea of how many >>> pipes, fittings and JS VMs you are running concurrently ('riak attach' to >>> start, ^D to exit) >>> >>> rpc:multicall(supervisor, count_children, [riak_pipe_builder_sup]). >>> rpc:multicall(supervisor, count_children, [riak_pipe_fitting_sup]). >>> rpc:multicall(riak_kv_js_manager,pool_size,[riak_kv_js_map]). >>> rpc:multicall(riak_kv_js_manager,pool_size,[riak_kv_js_reduce]). >>> >>> For the two supervisors, add up the active counts, for the two managers, >>> that gives how many JS VMs are running per-node. >>> >>> If the number of concurrent requests is low, maybe we can find a higher >>> bandwidth way of investigating. >>> >>> Jon >>> >>> On Mon, Apr 2, 2012 at 4:21 PM, Matthew Tovbin <matt...@tovbin.com>wrote: >>> >>>> Unfortunately, yes, Jon. >>>> >>>> Here are my log files after setting 128/128 for 'map_js_vm_count' and >>>> 'reduce_js_vm_count' accordingly and restarting all the nodes (see >>>> attached app.config). >>>> >>>> If you need any other information provided or tests performed please >>>> let me know. >>>> >>>> *-Console.log:* >>>> 2012-04-02 15:16:56.848 [notice] >>>> <0.22138.162>@riak_kv_js_manager:blocking_dispatch:247 JS call failed: All >>>> VMs are busy. 
>>>> 2012-04-02 15:16:56.853 [error] <0.22121.162> gen_fsm <0.22121.162> in >>>> state wait_pipeline_shutdown terminated with reason: {sink_died,shutdown} >>>> 2012-04-02 15:16:56.860 [error] <0.22121.162> CRASH REPORT Process >>>> <0.22121.162> with 0 neighbours crashed with reason: {sink_died,shutdown} >>>> 2012-04-02 15:16:56.862 [error] <0.1451.0> Supervisor >>>> riak_pipe_builder_sup had child undefined started with >>>> {riak_pipe_builder,start_link,undefined} at <0.22121.162> exit with reason >>>> {sink_died,shutdown} in context child_terminated >>>> >>>> *-Error.log:* >>>> ** State machine <0.22121.162> terminating >>>> ** Last message in was >>>> {'DOWN',#Ref<0.0.33.48784>,process,<0.22119.162>,shutdown} >>>> ** When State == wait_pipeline_shutdown >>>> ** Data == >>>> {state,[{sink,{fitting,<0.22119.162>,#Ref<0.0.33.48768>,{riak_kv_pipe_get,bkey_chash},{riak_kv_pipe_get,bkey_nval}}},{trace,{set,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[error],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}},{log,{sink,{fitting,<0.21554.162>,#Ref<0.0.33.48768>,sink,undefined}}}],{pipe,<0.22121.162>,[{listkeys,{fitting,<0.22122.162>,#Ref<0.0.33.48768>,{chash,key_of},1}}],{fitting,<0.22119.162>,#Ref<0.0.33.48768>,{riak_kv_pipe_get,bkey_chash},{riak_kv_pipe_get,bkey_nval}}},[{{fitting,<0.22122.162>,#Ref<0.0.33.48768>,{chash,key_of},1},#Ref<0.0.33.48787>}],#Ref<0.0.33.48784>} >>>> ** Reason for termination = >>>> ** {sink_died,shutdown} >>>> 2012-04-02 15:16:56 =CRASH REPORT==== >>>> crasher: >>>> initial call: riak_pipe_builder:init/1 >>>> pid: <0.22121.162> >>>> registered_name: [] >>>> exception exit: {sink_died,shutdown} >>>> in function gen_fsm:terminate/7 >>>> in call from proc_lib:init_p_do_apply/3 >>>> ancestors: [riak_pipe_builder_sup,riak_pipe_sup,<0.1448.0>] >>>> messages: >>>> [{'DOWN',#Ref<0.0.33.48787>,process,<0.22122.162>,shutdown}] >>>> links: [<0.1451.0>] >>>> dictionary: >>>> [{eunit,[{module,riak_pipe_builder},{ref,#Ref<0.0.33.48768>},{spec,[{fitting_spec,listkeys,riak_kv_pipe_listkeys,undefined,{chash,key_of},1,64}]},{options,[{sink,{fitting,<0.22119.162>,#Ref<0.0.33.48768>,{riak_kv_pipe_get,bkey_chash},{riak_kv_pipe_get,bkey_nval}}},{trace,{set,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[error],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}},{log,{sink,{fitting,<0.21554.162>,#Ref<0.0.33.48768>,sink,undefined}}}]},{fittings,[{{fitting,<0.22122.162>,#Ref<0.0.33.48768>,{chash,key_of},1},#Ref<0.0.33.48787>}]}]}] >>>> trap_exit: false >>>> status: running >>>> heap_size: 987 >>>> stack_size: 24 >>>> reductions: 251 >>>> neighbours: >>>> 2012-04-02 15:16:56 =SUPERVISOR REPORT==== >>>> Supervisor: {local,riak_pipe_builder_sup} >>>> Context: child_terminated >>>> Reason: {sink_died,shutdown} >>>> Offender: >>>> [{pid,<0.22121.162>},{name,undefined},{mfargs,{riak_pipe_builder,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}] >>>> >>>> >>>> >>>> >>>> >>>> >>>> -Matthew >>>> >>>> >>>> >>>> On Mon, Apr 2, 2012 at 12:27, Jon Meredith <jmered...@basho.com> wrote: >>>> >>>>> Very strange. >>>>> >>>>> Are you seeing the >>>>> >>>>> 16:40:50.037 [notice] JS call failed: All VMs are busy. >>>>> >>>>> message in your console.log files on any of the nodes in your cluster? >>>>> >>>>> Jon >>>>> >>>>> >>>>> On Mon, Apr 2, 2012 at 11:48 AM, Matthew Tovbin <matt...@tovbin.com>wrote: >>>>> >>>>>> Jon, >>>>>> >>>>>> Thanks for the reply. 
I tried increasing the 'map_js_vm_count' and >>>>>> 'reduce_js_vm_count' on the cluster to the values you suggested >>>>>> 24/18 and even larger ones 32/24 and 128/128, but with no success - I >>>>>> still >>>>>> get the same error in the console log of the node. >>>>>> >>>>>> What are we missing?! >>>>>> >>>>>> -Matthew >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Mar 30, 2012 at 16:19, Jon Meredith <jmered...@basho.com>wrote: >>>>>> >>>>>>> Hi list, >>>>>>> >>>>>>> I spend some time looking at this issue today. I suspect it is due >>>>>>> to all the Javascript VMs being busy inside Riak due to many parallel >>>>>>> MapReduce jobs. >>>>>>> >>>>>>> If you're seeing this issue please check for this message in the >>>>>>> console.log file >>>>>>> >>>>>>> 16:40:50.037 [notice] JS call failed: All VMs are busy. >>>>>>> >>>>>>> I've filed https://github.com/basho/riak_pipe/issues/44 to track >>>>>>> improvements to error reporting. >>>>>>> >>>>>>> If you are seeing the log message, the number of VMs allocated for >>>>>>> map and reduce tasks can be configured in the app.config - defaults are >>>>>>> 8 >>>>>>> for map and 6 for reduce, increasing them will use more memory but >>>>>>> should >>>>>>> reduce the incidence of the error message. >>>>>>> >>>>>>> {riak_kv, [ >>>>>>> ... >>>>>>> {map_js_vm_count, 24 }, >>>>>>> {reduce_js_vm_count, 18 }, >>>>>>> ...] >>>>>>> >>>>>>> If you were affected by it and changing this does not resolve your >>>>>>> issue, I'll keep digging. >>>>>>> >>>>>>> Cheers, Jon. >>>>>>> >>>>>>> On Thu, Mar 29, 2012 at 10:29 AM, Matthew Tovbin <matt...@tovbin.com >>>>>>> > wrote: >>>>>>> >>>>>>>> Guys, >>>>>>>> >>>>>>>> Any updates on the issue?! >>>>>>>> >>>>>>>> -Matthew >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Mar 13, 2012 at 18:29, Matthew Tovbin >>>>>>>> <matt...@tovbin.com>wrote: >>>>>>>> >>>>>>>>> Here is a log from one of the servers: >>>>>>>>> >>>>>>>>> ==> /mnt/dataraid/log/crash.log <== >>>>>>>>> 2012-03-13 18:24:44 =CRASH REPORT==== >>>>>>>>> crasher: >>>>>>>>> initial call: riak_pipe_builder:init/1 >>>>>>>>> pid: <0.20949.24> >>>>>>>>> registered_name: [] >>>>>>>>> exception exit: {sink_died,shutdown} >>>>>>>>> in function gen_fsm:terminate/7 >>>>>>>>> in call from proc_lib:init_p_do_apply/3 >>>>>>>>> ancestors: [riak_pipe_builder_sup,riak_pipe_sup,<0.172.0>] >>>>>>>>> messages: >>>>>>>>> [{'DOWN',#Ref<0.0.5.38339>,process,<0.20950.24>,shutdown}] >>>>>>>>> links: [<0.175.0>] >>>>>>>>> dictionary: >>>>>>>>> [{eunit,[{module,riak_pipe_builder},{ref,#Ref<0.0.5.38318>},{spec,[{fitting_spec,listkeys,riak_kv_pipe_listkeys,undefined,{chash,key_of},1,64}]},{options,[{sink,{fitting,<0.20947.24>,#Ref<0.0.5.38318>,{riak_kv_pipe_get,bkey_chash},{riak_kv_pipe_get,bkey_nval}}},{trace,{set,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[error],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}},{log,{sink,{fitting,<0.20060.24>,#Ref<0.0.5.38318>,sink,undefined}}}]},{fittings,[{{fitting,<0.20950.24>,#Ref<0.0.5.38318>,{chash,key_of},1},#Ref<0.0.5.38339>}]}]}] >>>>>>>>> trap_exit: false >>>>>>>>> status: running >>>>>>>>> heap_size: 987 >>>>>>>>> stack_size: 24 >>>>>>>>> reductions: 251 >>>>>>>>> neighbours: >>>>>>>>> 2012-03-13 18:24:44 =SUPERVISOR REPORT==== >>>>>>>>> Supervisor: {local,riak_pipe_builder_sup} >>>>>>>>> Context: child_terminated >>>>>>>>> Reason: {sink_died,shutdown} >>>>>>>>> Offender: >>>>>>>>> [{pid,<0.20949.24>},{name,undefined},{mfargs,{riak_pipe_builder,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}] 
>>>>>>>>> >>>>>>>>> >>>>>>>>> ==> /mnt/dataraid/log/console.log <== >>>>>>>>> 2012-03-13 18:24:44.645 [error] <0.20949.24> CRASH REPORT Process >>>>>>>>> <0.20949.24> with 0 neighbours crashed with reason: >>>>>>>>> {sink_died,shutdown} >>>>>>>>> 2012-03-13 18:24:44.652 [error] <0.175.0> Supervisor >>>>>>>>> riak_pipe_builder_sup had child undefined started with >>>>>>>>> {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with >>>>>>>>> reason >>>>>>>>> {sink_died,shutdown} in context child_terminated >>>>>>>>> 2012-03-13 18:24:56.420 [info] >>>>>>>>> <0.71.0>@riak_core_sysmon_handler:handle_event:85 monitor long_gc >>>>>>>>> <0.32603.24> >>>>>>>>> [{initial_call,{mochiweb_acceptor,init,3}},{almost_current_function,{gen,do_call,4}},{message_queue_len,0}] >>>>>>>>> [{timeout,120},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,40},{old_heap_size,0},{heap_size,1955}] >>>>>>>>> >>>>>>>>> >>>>>>>>> ==> /mnt/dataraid/log/erlang.log.1 <== >>>>>>>>> 18:24:44.634 [error] gen_fsm <0.20949.24> in state >>>>>>>>> wait_pipeline_shutdown terminated with reason: {sink_died,shutdown} >>>>>>>>> (riak@1)1> 18:24:44.641 [error] Pipe worker startup >>>>>>>>> failed:fitting was gone before startup >>>>>>>>> (riak@1)1> >>>>>>>>> 18:24:44.645 [error] CRASH REPORT Process <0.20949.24> with 0 >>>>>>>>> neighbours crashed with reason: {sink_died,shutdown} >>>>>>>>> (riak@1)1> 18:24:44.652 [error] Supervisor riak_pipe_builder_sup >>>>>>>>> had child undefined started with >>>>>>>>> {riak_pipe_builder,start_link,undefined} >>>>>>>>> at <0.20949.24> exit with reason {sink_died,shutdown} in context >>>>>>>>> child_terminated >>>>>>>>> (riak@1)1> 18:24:56.420 [info] monitor long_gc <0.32603.24> >>>>>>>>> [{initial_call,{mochiweb_acceptor,init,3}},{almost_current_function,{gen,do_call,4}},{message_queue_len,0}] >>>>>>>>> [{timeout,120},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,40},{old_heap_size,0},{heap_size,1955}] >>>>>>>>> (riak@1)1> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> ==> /mnt/dataraid/log/error.log <== >>>>>>>>> 2012-03-13 18:24:44.645 [error] <0.20949.24> CRASH REPORT Process >>>>>>>>> <0.20949.24> with 0 neighbours crashed with reason: >>>>>>>>> {sink_died,shutdown} >>>>>>>>> 2012-03-13 18:24:44.652 [error] <0.175.0> Supervisor >>>>>>>>> riak_pipe_builder_sup had child undefined started with >>>>>>>>> {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with >>>>>>>>> reason >>>>>>>>> {sink_died,shutdown} in context child_terminated >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -Matthew >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Mar 13, 2012 at 18:17, Matthew Tovbin >>>>>>>>> <matt...@tovbin.com>wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I got the same problem today on 1.1.0. >>>>>>>>>> >>>>>>>>>> As suggested, I updated all the nodes to 1.1.1. The error remains >>>>>>>>>> the same: >>>>>>>>>> >>>>>>>>>> { stack: [Getter/Setter], >>>>>>>>>> arguments: undefined, >>>>>>>>>> type: undefined, >>>>>>>>>> message: 'HTTP error 500: >>>>>>>>>> {"phase":0,"error":"[preflist_exhausted]","input":"{ok,{r_object >>>>>>>>>> ....."}}} >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -Matthew >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Mar 13, 2012 at 06:48, Jon Meredith >>>>>>>>>> <jmered...@basho.com>wrote: >>>>>>>>>> >>>>>>>>>>> Hi Arne, >>>>>>>>>>> >>>>>>>>>>> Could you try with 1.1.1 please. 
We've fixed a number of bugs >>>>>>>>>>> with how MapReduce handles errors that could explain some of what >>>>>>>>>>> you are >>>>>>>>>>> seeing. >>>>>>>>>>> >>>>>>>>>>> Jon >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Mar 13, 2012 at 12:55 AM, Arne Claassen < >>>>>>>>>>> arne+r...@claassen.net> wrote: >>>>>>>>>>> >>>>>>>>>>>> I'm running 1.1.0 built from source on Centos 6. I set up two >>>>>>>>>>>> nodes on localhost following the instructions on the basho site >>>>>>>>>>>> for testing >>>>>>>>>>>> and everything was going fine until i needed to fetch all items in >>>>>>>>>>>> a bucket. >>>>>>>>>>>> >>>>>>>>>>>> For this i was using node.js with riak-js and it's db.getAll >>>>>>>>>>>> call,which from the debug logs looks like it uses /mapred >>>>>>>>>>>> >>>>>>>>>>>> The bucket only has about 1200 items in it, each a json payload >>>>>>>>>>>> of about 3KB. Initially, it'd keep getting inconsistent results, >>>>>>>>>>>> sometimes >>>>>>>>>>>> the data would come back, sometimes data would be undefined. So i >>>>>>>>>>>> kept >>>>>>>>>>>> running it, trying to see if there was a pattern, adding more >>>>>>>>>>>> debugging >>>>>>>>>>>> code, etc. But still it would be about a request every couple of >>>>>>>>>>>> seconds. >>>>>>>>>>>> Then it just stopped working altogether and now i always get: >>>>>>>>>>>> >>>>>>>>>>>> { [Error: HTTP error 500: >>>>>>>>>>>> {"phase":0,"error":"[preflist_exhausted]","input":"{ok,{r_object,<<\"tweets_15368536\">>,<<\"178528155714330620\">>,[{r_content,{dict,6,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[[<<\"Links\">>]],[],[],[],[],[],[],[],[[<<\"content-type\">>,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110],[<<\"X-Riak-VTag\">>,49,76,115,54,70,75,74,71,119,65,111,70,88,73,117,84,98,88,79,69,52,116]],[[<<\"index\">>]],[],[[<<\"X-Riak-Last-Modified\">>|{1331,487523,218968}]],[],[[<<\"X-Riak-Meta\">>]]}}},<<\"{\"coordinates\":null,\"create...\">>}],...},...}","type":"forward_preflist","stack":"[]"}] >>>>>>>>>>>> statusCode: 500 } >>>>>>>>>>>> >>>>>>>>>>>> in the console. I haven't tried restarting riak, just because >>>>>>>>>>>> i'd rather find out what's happening than it going away and coming >>>>>>>>>>>> back >>>>>>>>>>>> later when i don't expect it. >>>>>>>>>>>> >>>>>>>>>>>> Any suggestions on how to troubleshoot this? I can still access >>>>>>>>>>>> items individually and i can even list the keys via ?keys=true, >>>>>>>>>>>> it's just >>>>>>>>>>>> he map reduce that seems to be busted. >>>>>>>>>>>> >>>>>>>>>>>> cheers, >>>>>>>>>>>> arne >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> riak-users mailing list >>>>>>>>>>>> riak-users@lists.basho.com >>>>>>>>>>>> >>>>>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Jon Meredith >>>>>>>>>>> Platform Engineering Manager >>>>>>>>>>> Basho Technologies, Inc. >>>>>>>>>>> jmered...@basho.com >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> riak-users mailing list >>>>>>>>>>> riak-users@lists.basho.com >>>>>>>>>>> >>>>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Jon Meredith >>>>>>> Platform Engineering Manager >>>>>>> Basho Technologies, Inc. 
>>>>>>> jmered...@basho.com >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Jon Meredith >>>>> Platform Engineering Manager >>>>> Basho Technologies, Inc. >>>>> jmered...@basho.com >>>>> >>>>> >>>> >>> >>> >>> -- >>> Jon Meredith >>> Platform Engineering Manager >>> Basho Technologies, Inc. >>> jmered...@basho.com >>> >>> >> > > > -- > Jon Meredith > Platform Engineering Manager > Basho Technologies, Inc. > jmered...@basho.com > >
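P.S. For completeness, the riak_kv section of the app.config I attached earlier is set roughly like this (VM counts bumped to 128/128 as mentioned above, everything else unchanged):

{riak_kv, [
    ...
    {map_js_vm_count, 128},
    {reduce_js_vm_count, 128},
    ...]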
_______________________________________________ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com