Arun,
You are running out of RAM for the leveldb AAE. There are several ways to fix
that:
- reduce memory allocated to bitcask
- more memory per server
- more servers of same memory
- reduce the ring size from 64 to 8, and rebuild data within the cluster from
scratch
- lie to leveldb and give it a bigger-than-real memory setting in riak.conf:
leveldb.maximum_memory=8G
The key LOG lines are:
Options.total_leveldb_mem: 2,901,766,963 <-- this is the total memory
                                             assigned to ALL of leveldb, but
                                             only 20% of it goes to AAE vnodes
File cache size: 5833527 <-- the first vnode says: cool, enough memory for me
Block cache size: 7930679 <-- ditto
... but as more vnodes start:
File cache size: 0 <-- things are just not going to work well
Block cache size: 0
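
To check the rest of the LOG files for the same symptom, something like the sketch below could be used. It assumes the cache lines look exactly like the excerpts above ("File cache size: N" and "Block cache size: N"); the function names are made up for illustration and are not part of Riak or leveldb.

```python
import re

# Matches lines such as "File cache size: 5833527" or "Block cache size: 0".
# Assumption: the LOG uses exactly this wording, as in the excerpts above.
CACHE_LINE = re.compile(r"(File|Block) cache size:\s*([\d,]+)")


def find_cache_sizes(log_text):
    """Return a list of (kind, size_in_bytes) for every cache-size line."""
    sizes = []
    for match in CACHE_LINE.finditer(log_text):
        kind = match.group(1)
        size = int(match.group(2).replace(",", ""))
        sizes.append((kind, size))
    return sizes


def starved(log_text):
    """True if any file or block cache was opened with zero bytes."""
    return any(size == 0 for _, size in find_cache_sizes(log_text))


sample = """\
File cache size: 5833527
Block cache size: 7930679
File cache size: 0
Block cache size: 0
"""

print(find_cache_sizes(sample))
print(starved(sample))
```

Run against the text of each vnode's LOG file, any result of True from starved() points at a vnode that opened with no cache memory.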
There are no actual file system error messages in your LOG files, which
suggests that the real problem is a shortage of memory.
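
As a rough back-of-the-envelope check, the sketch below works out what the 20% AAE share of total_leveldb_mem comes to, and what an even split across vnodes would give. The even split and the vnode counts are assumptions for illustration only: leveldb does not necessarily divide memory evenly, and on a multi-node cluster each node hosts only part of the ring.

```python
def aae_share(total_leveldb_mem, aae_percent=20):
    """Bytes of leveldb memory available to AAE vnodes (20% per the LOG note)."""
    return total_leveldb_mem * aae_percent // 100


def per_vnode(total_leveldb_mem, vnodes_on_node):
    """Hypothetical even split of the AAE share across the vnodes on one node."""
    return aae_share(total_leveldb_mem) // vnodes_on_node


total = 2_901_766_963  # Options.total_leveldb_mem from the LOG

print("AAE share:", aae_share(total))
# Assumed vnode counts: the whole ring on one node, at ring size 64 and 8.
for vnodes in (64, 8):
    print(vnodes, "vnodes ->", per_vnode(total, vnodes), "bytes each")
```

Under those assumptions the per-vnode budget grows eightfold when the ring shrinks from 64 to 8, which is why a smaller ring or a larger memory setting both relieve the pressure.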
Matthew
> On Feb 14, 2017, at 3:34 PM, Arun Rajagopalan <[email protected]>
> wrote:
>
> Hi Matthew, Magnus
>
> I have attached the log files for your review
>
> Thanks
> Arun
>
>
> On Tue, Feb 14, 2017 at 11:55 AM, Matthew Von-Maszewski <[email protected]>
> wrote:
> Arun,
>
> The AAE code uses leveldb for its storage of anti-entropy data, no matter
> which backend holds the user data. Therefore the error below suggests
> corruption within leveldb files (which is not impossible, but becoming really
> rare except with bad hardware or full disks).
>
> Before wiping out the AAE directory, you should copy the LOG file within it.
> There are likely more useful error messages within that file ... maybe put
> the file in Dropbox, or zip it and attach it to a reply, for us to review.
>
> Matthew
>
>> On Feb 14, 2017, at 10:42 AM, Magnus Kessler <[email protected]> wrote:
>>
>> On 14 February 2017 at 14:46, Arun Rajagopalan <[email protected]> wrote:
>> Hi Magnus
>>
>> Riak crashes on startup when I have a truncated bitcask file.
>>
>> I think it also crashes when the AAE files are bad. Example below:
>>
>> 2017-02-13 21:18:30 =CRASH REPORT====
>> crasher:
>> initial call: riak_kv_index_hashtree:init/1
>> pid: <0.6037.0>
>> registered_name: []
>> exception exit: {{{badmatch,{error,{db_open,"Corruption: truncated record at end of file"}}},
>>   [{hashtree,new_segment_store,2,[{file,"src/hashtree.erl"},{line,675}]},
>>    {hashtree,new,2,[{file,"src/hashtree.erl"},{line,246}]},
>>    {riak_kv_index_hashtree,do_new_tree,3,[{file,"src/riak_kv_index_hashtree.erl"},{line,610}]},
>>    {lists,foldl,3,[{file,"lists.erl"},{line,1248}]},
>>    {riak_kv_index_hashtree,init_trees,3,[{file,"src/riak_kv_index_hashtree.erl"},{line,474}]},
>>    {riak_kv_index_hashtree,init,1,[{file,"src/riak_kv_index_hashtree.erl"},{line,268}]},
>>    {gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},
>>    {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]},
>>  [{gen_server,init_it,6,[{file,"gen_server.erl"},{line,328}]},
>>   {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
>> ancestors: [<0.715.0>,riak_core_vnode_sup,riak_core_sup,<0.160.0>]
>> messages: []
>> links: []
>> dictionary: []
>> trap_exit: false
>> status: running
>> heap_size: 1598
>> stack_size: 27
>> reductions: 889
>> neighbours:
>>
>>
>> Regards
>> Arun
>>
>>
>> Hi Arun,
>>
>> The crash log you provided shows that there is a corrupted file in the AAE
>> (anti_entropy) backend. Entries in console.log should have more information
>> about which partition is affected. Please post output from the affected node
>> at around 2017-02-13T21:18:30. As this is AAE data, it is safe to remove the
>> directory named after the affected partition from the anti_entropy
>> directory before restarting the node. More than one partition may be
>> affected; each further one will only show up at the next attempted
>> restart. If this is the case, simply identify the next
>> partition in the same way and remove it, too, until the node starts up
>> successfully again.
>>
>> Is there a reason why the nodes aren't shut down in the regular way?
>>
>> Kind Regards,
>>
>> Magnus
>>
>>
>>
>> --
>> Magnus Kessler
>> Client Services Engineer
>> Basho Technologies Limited
>>
>> Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
> <aaeLOG.tar>