Thanks Christian. We do indeed use MapReduce, but it's a fairly simple function: we retrieve a first object whose value is an array of at most 10 ids, and then we fetch the values for those 10 ids. However, this MapReduce job is quite rare (maybe 10 times a day at most at this point...), so I don't think that's our issue. I'll try running the cluster without any calls to it to see if that helps, but I'd be very surprised. Also, we were already doing this before we enabled multiple values (allow_mult), and the cluster was stable back then. We do not do key listing or anything like that.
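Since the trouble started only after allow_mult was enabled, one thing worth checking is how siblings of the id-array object get merged. A minimal sketch, assuming (the thread does not say so) that this object lives in the allow_mult bucket and that each sibling is a JSON array of string ids, so that the union of all siblings is an acceptable merge. `merge_id_siblings` is a hypothetical helper, not part of any Riak client:

```python
import json
from typing import Iterable, List


def merge_id_siblings(siblings: Iterable[str]) -> List[str]:
    """Merge sibling values into a single list of ids (hypothetical resolver).

    Assumes each sibling value is a JSON-encoded array of string ids and that
    the union of all siblings is an acceptable merged value. Ids keep the order
    in which they are first seen; duplicates are dropped.
    """
    merged: List[str] = []
    seen = set()
    for raw_value in siblings:
        for object_id in json.loads(raw_value):
            if object_id not in seen:
                seen.add(object_id)
                merged.append(object_id)
    return merged


# Two concurrent writes produced two siblings sharing the id "b".
print(merge_id_siblings(['["a", "b"]', '["b", "c"]']))  # ['a', 'b', 'c']
```

Merging alone does not stop siblings from piling up: the merged value has to be written back with the vector clock returned by the fetch that produced the siblings. A write without it is treated as concurrent and adds yet another sibling. Note also that a plain union cannot represent removals, so an id dropped in one sibling comes back if another sibling still holds it.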
I'll try looking at the statistics too.

Thanks,

On Tue, May 14, 2013 at 11:50 AM, Christian Dahlqvist <[email protected]> wrote:

> Hi Julien,
>
> The node appears to have crashed due to an inability to allocate memory. How
> are you accessing your data? Are you running any key listing or large
> MapReduce jobs that could use up a lot of memory?
>
> In order to ensure that you are efficiently resolving siblings, I would
> recommend you monitor the statistics in Riak
> (http://docs.basho.com/riak/latest/cookbooks/Statistics-and-Monitoring/).
> Specifically, look at the node_get_fsm_objsize_* and node_get_fsm_siblings_*
> statistics in order to identify objects that are very large or have lots of
> siblings.
>
> Best regards,
>
> Christian
>
>
> On 13 May 2013, at 16:44, Julien Genestoux <[email protected]> wrote:
>
> Christian, All,
>
> Bad news: my laptop is completely dead. Good news: I have a new one, and
> it's now fully operational (backups FTW!).
>
> The log files have finally been uploaded:
> https://www.dropbox.com/s/j7l3lniu0wogu29/riak-died.tar.gz
>
> I have attached our config to this mail.
>
> The machine is a virtual Xen instance at Linode with 4GB of memory. I know
> it's probably not the very best setup, but 1) we're on a budget and 2) we
> assumed it would fit our needs quite well.
>
> To give more detail: initially we did not use allow_mult, and things worked
> fine for a couple of days. As soon as we enabled allow_mult, we were not
> able to run the cluster for more than 5 hours without seeing failing nodes,
> which is why I'm convinced we must be doing something wrong.
> The question is: what?
>
> Thanks
>
>
> On Sun, May 12, 2013 at 8:07 PM, Christian Dahlqvist <[email protected]> wrote:
>
>> Hi Julien,
>>
>> I was not able to access the logs using the link you provided.
>>
>> Could you please attach a copy of your app.config file so we can get a
>> better understanding of the configuration of your cluster? Also, what is
>> the specification of the machines in the cluster?
>>
>> How much data do you have in the cluster and how are you querying it?
>>
>> Best regards,
>>
>> Christian
>>
>>
>> On 12 May 2013, at 19:11, Julien Genestoux <[email protected]> wrote:
>>
>> Hi,
>>
>> We are running a cluster of 5 servers, or at least trying to, because
>> nodes seem to be dying 'randomly' without us knowing why. We don't have
>> a great Erlang guy aboard, and the error logs are not that verbose.
>> So I've just tar-gzipped the whole log directory, hoping somebody could
>> give us a clue.
>> It's here: https://www.dropbox.com/s/z9ezv0qlxgfhcyq/riak-died.tar.gz
>> (it might not be fully uploaded to Dropbox yet!)
>>
>> I've looked at the archive, and some people said their servers were dying
>> because some object was simply too big to allocate in memory. Maybe that's
>> what we're seeing?
>>
>> As one of our buckets is set with allow_mult, I am tempted to think that
>> some object's size may be exploding.
>> However, we do actually try to resolve conflicts in our code. Any idea
>> how to confirm that we have an issue there, and then debug it?
>>
>>
>> Thanks a lot for your precious help...
>>
>> Julien
>>
>>
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> <app.config>
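The statistics Christian points to can be pulled per node with a short script. A minimal sketch, assuming Python 3 and that each node's HTTP interface listens on Riak's default port 8098: it reads the JSON returned by the `/stats` resource and keeps only the `node_get_fsm_objsize_*` and `node_get_fsm_siblings_*` entries. `fetch_stats` and `select_watched_stats` are hypothetical names; `riak-admin status` on a node should report the same figures.

```python
import json
import urllib.request
from typing import Dict, Mapping

# Prefixes of the object-size and sibling-count statistics to watch.
WATCHED_PREFIXES = ("node_get_fsm_objsize_", "node_get_fsm_siblings_")


def select_watched_stats(stats: Mapping[str, object]) -> Dict[str, object]:
    """Keep only the watched statistics, sorted by name."""
    return {
        name: value
        for name, value in sorted(stats.items())
        if name.startswith(WATCHED_PREFIXES)
    }


def fetch_stats(host: str = "127.0.0.1", port: int = 8098) -> Dict[str, object]:
    """Read a node's statistics from the HTTP /stats resource (JSON body)."""
    url = f"http://{host}:{port}/stats"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `select_watched_stats(fetch_stats())` against each node and watching whether the `_100` (maximum) sibling count and object size keep climbing after allow_mult was enabled would be one way to confirm or rule out the exploding-object theory.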
