Hi, I'm test-driving Riak (1.0.1) using Bitcask with a lot of data (currently ~25 million documents). I've deployed Riak on three machines with an n_val of 3 (actually, I left it at the default).
Soon after I started an import process, Riak began crashing roughly every 6 million documents (sometimes more often), leaving no obvious cause in the log files. I've opened a ticket (Bug 1282 [1]), but it may be better to discuss it here, since I don't have much information on it yet.

Only the node I'm adding the data to crashes; the other two nodes haven't crashed so far. The importer and the crashing Riak node run on the same machine, and I'm currently using the HTTP Java client (before that, I was using the PBC Java client). The crashes seem to follow a long-running GC alert in the log files, though that may be unrelated (memory usage on the machine does not go up).

I'm running Riak on machines with 24 GB of RAM; the bucket name is about 10 characters long and the keys about 20. I expect about 200 million documents of roughly 20 KB each, but so far I've imported only 24 million, and the first crash happened after 6 million. The nofile limit for Riak is 32000, on Debian 6 (Linux) with all updates installed. The capacity planning page tells me I have enough RAM (recommendation: 3 nodes with 14 GB of RAM). The Bitcask directory holds about 180 GB of data in 344 files.

I tried switching to eleveldb, thinking it might be a memory issue, but it used more disk space than I have available. My migration plan was to install Riak on another machine, set it up with eleveldb, and join it to the node running Bitcask; during that process, disk consumption went from 200 GB to over a terabyte.

I've upgraded to Riak 1.0.2, but the changelog doesn't mention anything related to this. What could I do to identify the problem? Are there any debugging switches I could turn on (I've recently activated the sasl_error_logger)? I'm thinking of enabling Heartbeat management in vm.args, but that wouldn't fix the root cause. I've just restarted Riak on 1.0.2 and cleaned out all the log files.
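Because Bitcask keeps every key in memory, a back-of-the-envelope keydir estimate is one way to sanity-check the capacity planning figure. The sketch below assumes per-node RAM is roughly keys × n_val × (per-key overhead + bucket length + key length) / nodes, with replicas spread evenly. The function name is hypothetical, and the 40-byte per-key overhead is a placeholder assumption, not a documented Riak 1.0 value, so it should be checked against the capacity planning page.

```python
def estimate_keydir_bytes_per_node(num_keys, n_val, num_nodes,
                                   bucket_len, key_len, per_key_overhead):
    """Rough per-node RAM estimate for Bitcask's in-memory keydir.

    Assumes replicas are spread evenly across nodes and that each replica
    costs per_key_overhead bytes plus the bucket and key bytes.
    per_key_overhead is a caller-supplied assumption, not a Riak constant.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    replicas = num_keys * n_val                      # total key copies in the cluster
    bytes_per_replica = per_key_overhead + bucket_len + key_len
    return replicas * bytes_per_replica // num_nodes  # evenly divided across nodes


# Figures from the post; the 40-byte overhead is a hypothetical placeholder.
per_node = estimate_keydir_bytes_per_node(
    num_keys=200_000_000, n_val=3, num_nodes=3,
    bucket_len=10, key_len=20, per_key_overhead=40)
print(f"{per_node / 2**30:.1f} GiB per node")
```

Under the same assumptions, the 24 million documents imported so far would need about 1.7 GB of keydir per node.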
Up until now, crashes have been frequent enough that I should be able to provide a set of log files on Monday, but are there any obvious things I might have forgotten?

Cheers,
Michael

[1] https://issues.basho.com/show_bug.cgi?id=1282

_______________________________________________
riak-users mailing list
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
