Hi Charles
We have a slightly different issue to yours, in that the majority of our
requests succeed and only the odd one fails. Or is that what you are observing
on your cluster too?
I’ve been talking about the issue off-list with Luke and Kelly. Luke took a
look at some of our debug logs for Riak and suspects that we had over-committed
the resources of our cluster. I’ve modified the configuration of our cluster as
per his recommendations:
    {multi_backend, [
        {be_default, riak_kv_eleveldb_backend, [
            {max_open_files, 14},
            {cache_size, 4194304},
            {data_root, "/var/db/riak/leveldb"}
        ]},
        {be_blocks, riak_kv_bitcask_backend, [
            {data_root, "/var/db/riak/bitcask"}
        ]}
    ]},
    {anti_entropy, {off, []}},
Our environment is a 4-node cluster, each node with 4GB of RAM, running on
SmartOS (from Joyent). I applied these changes, and while things improved I
still encountered the odd failure. I then also deactivated
n_val_1_get_requests, and since then I haven’t been able to reproduce the
issues that I was encountering previously.
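
For anyone wanting to reproduce this, the original trigger (a web page
fetching 8 x 10MB images simultaneously) can be approximated with a small
script. This is only a rough sketch in Python, not what I actually ran: the
names fetch_size, fetch_all and count_failures are made up for illustration,
and the URLs passed in would have to be your own signed Riak CS object URLs.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_size(url, timeout=30):
    """Download one object and return the number of bytes received."""
    with urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def fetch_all(urls, fetch=fetch_size):
    """Fetch all URLs concurrently.

    Returns a list of (url, size_or_None, error_or_None) tuples in the
    same order as the input, so a failed request is recorded rather than
    aborting the whole run.
    """
    def attempt(url):
        try:
            return (url, fetch(url), None)
        except Exception as exc:
            return (url, None, repr(exc))

    # One worker per URL so every request is in flight at the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        return list(pool.map(attempt, urls))


def count_failures(results):
    """Count how many of the fetches recorded an error."""
    return sum(1 for _, _, err in results if err is not None)
```

Calling fetch_all repeatedly with the eight signed URLs and passing each
result to count_failures should show whether the odd request still fails.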
Thanks,
Dave
> On 22 Jul 2014, at 9:01 pm, Charles Bijon <[email protected]> wrote:
>
> Hi,
>
> We have the same issue here, but we have 45 riak/riak-cs nodes in
> production. Do you have any idea how to correct it?
>
> Regards,
>
> Charles
>
>
> On 17/07/2014 23:21, Dave Finster wrote:
>> Hi Kelly
>>
>> 1.4.5 - Riak CS
>> 1.4.8 - Riak
>> Anti Entropy is on (all nodes)
>>
>> With n_val_1_get_requests deactivated I can still trigger the issue (though
>> less often), and a different error has cropped up now:
>>
>> 2014-07-17 21:15:38 =ERROR REPORT====
>> webmachine error: path="/buckets/<bucket
>> name>/objects/bf15f98c-eaa1-4ff9-83ff-24c1e7e1380f%2F847c340cfe2f44028d6fd5606f696796%2FAttachment-1.png"
>> {exit,{{{{case_clause,{error,timeout}},[{riak_cs_manifest_fsm,handle_get_manifests,1,[{file,"src/riak_cs_manifest_fsm.erl"},{line,265}]},{riak_cs_manifest_fsm,waiting_command,3,[{file,"src/riak_cs_manifest_fsm.erl"},{line,201}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,494}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]},{gen_fsm,sync_send_event,[<0.1383.0>,get_manifests,infinity]}},{gen_fsm,sync_send_event,[<0.1382.0>,get_manifest,infinity]}},[{gen_fsm,sync_send_event,3,[{file,"gen_fsm.erl"},{line,214}]},{riak_cs_wm_utils,ensure_doc,2,[{file,"src/riak_cs_wm_utils.erl"},{line,236}]},{riak_cs_wm_object,authorize,2,[{file,"src/riak_cs_wm_object.erl"},{line,64}]},{riak_cs_wm_common,authorize,2,[{file,"src/riak_cs_wm_common.erl"},{line,396}]},{riak_cs_wm_common,forbidden,2,[{file,"src/riak_cs_wm_common.erl"},{line,182}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,
>> [{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]}
>> [{gen_fsm,sync_send_event,3,[{file,"gen_fsm.erl"},{line,214}]},{riak_cs_wm_utils,ensure_doc,2,[{file,"src/riak_cs_wm_utils.erl"},{line,236}]},{riak_cs_wm_object,authorize,2,[{file,"src/riak_cs_wm_object.erl"},{line,64}]},{riak_cs_wm_common,authorize,2,[{file,"src/riak_cs_wm_common.erl"},{line,396}]},{riak_cs_wm_common,forbidden,2,[{file,"src/riak_cs_wm_common.erl"},{line,182}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]
>>
>> Thanks,
>> Dave
>>
>>> On 18 Jul 2014, at 2:19 am, Kelly McLaughlin <[email protected]> wrote:
>>>
>>> Dave,
>>>
>>> Can you tell me what versions of Riak and Riak CS you have installed? Do
>>> you have AAE enabled or disabled? It’s tough to come up with an explanation
>>> without more information, but I would try setting n_val_1_get_requests to
>>> false and see if you continue to experience the problem. My guess is that
>>> will resolve the issue, but let me know what happens.
>>>
>>> Kelly
>>>
>>> On July 17, 2014 at 1:00:19 AM, Dave Finster ([email protected]) wrote:
>>>
>>>> Hi Everyone
>>>>
>>>> Spent a bit of time trying to debug this one and not sure where to go from
>>>> here. The use case that appears to cause this breakage is a web page that
>>>> links to 8 x 10MB images and attempts to fetch them simultaneously.
>>>>
>>>> Occasionally, one or two of the images will just fail to load, while other
>>>> times they all work fine. I’ve tracked it down to the crash below. It
>>>> isn’t always the same image. To make the problem more repeatable, I forced
>>>> our load balancer into only using a single Riak-CS node, so it will be
>>>> getting hit with all the requests. We are using HAProxy out the front and
>>>> are running SmartOS 64-bit images across the board.
>>>>
>>>> arekinath helped me look into it and one thought was that I was hit by the
>>>> AAE bug prior to 1.4.8, but even clearing the AAE made no difference. The
>>>> n_val on the buckets is 3 and it's a 4-node cluster. Each of the 4 nodes
>>>> runs both Riak and Riak CS. I also have pb_backlog turned up to 256,
>>>> n_val_1_get_requests set to true and fold_objects_for_list_keys set to
>>>> true. ‘ring-status’ shows that the whole ring is reachable.
>>>>
>>>> Any idea on how to diagnose this one further?
>>>>
>>>> 2014-07-17 06:38:54 =CRASH REPORT====
>>>> crasher:
>>>> initial call: mochiweb_acceptor:init/3
>>>> pid: <0.26119.1>
>>>> registered_name: []
>>>> exception exit:
>>>> {{normal,{gen_fsm,sync_send_event,[<0.27617.1>,get_next_chunk,infinity]}},[{gen_fsm,sync_send_event,3,[{file,"gen_fsm.erl"},{line,214}]},{riak_cs_wm_utils,streaming_get,4,[{file,"src/riak_cs_wm_utils.erl"},{line,272}]},{webmachine_decision_core,'-make_encoder_stream/3-fun-0-',3,[{file,"src/webmachine_decision_core.erl"},{line,667}]},{webmachine_request,send_stream_body_no_chunk,2,[{file,"src/webmachine_request.erl"},{line,334}]},{webmachine_request,send_response,3,[{file,"src/webmachine_request.erl"},{line,398}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,251}]},{webmachine_decision_core,wrcall,1,[{file,"src/webmachine_decision_core.erl"},{line,42}]},{webmachine_decision_core,finish_response,3,[{file,"src/webmachine_decision_core.erl"},{line,92}]}]}
>>>> ancestors: [object_web_mochiweb,riak_cs_sup,<0.143.0>]
>>>> messages: []
>>>> links: [<0.298.0>,#Port<0.12015>]
>>>> dictionary:
>>>> [{reqstate,{wm_reqstate,#Port<0.12015>,[{'content-encoding',"identity"},{'content-type',"application/octet-stream"},{resource_module,riak_cs_wm_object}],undefined,"10.4.242.1",{wm_reqdata,'GET',http,{1,1},"10.4.242.1",undefined,[],"/buckets/<the
>>>> bucket
>>>> name>/objects/bf15f98c-eaa1-4ff9-83ff-24c1e7e1380f%2F847c340cfe2f44028d6fd5606f696796%2FAttachment-1.png","/buckets/<the
>>>> bucket
>>>> name>/objects/bf15f98c-eaa1-4ff9-83ff-24c1e7e1380f%2F847c340cfe2f44028d6fd5606f696796%2FAttachment-1.png?Signature=U0By3mIwaRIVBHNcYhSt6r5QgPk%3D&Expires=1405580057&AWSAccessKeyId=DGTXHHWIEDF4XUBSBYVI",[{bucket,"<the
>>>> bucket
>>>> name>"},{object,"bf15f98c-eaa1-4ff9-83ff-24c1e7e1380f%2F847c340cfe2f44028d6fd5606f696796%2FAttachment-1.png"}],[],"../../../..",{200,undefined},1073741824,67108864,[{"_ga","GA1.3.643660316.1404789703"}],[{"Signature","U0By3mIwaRIVBHNcYhSt6r5QgPk="},{"Expires","1405580057"},{"AWSAccessKeyId","DGTXHHWIEDF4XUBSBYVI"}],{9,{"cookie",{'Cookie',"_ga=GA1.3.643660316.1404789703"},{"accept-language",{'Accept-Language',"en-US,en;q=0.8"},{"accept-encoding",{'Accept-Encoding',"gzip,deflate,sdch"},{"accept",{'Accept',"image/webp,*/*;q=0.8"},nil,nil},nil},{"connection",{'Connection',"keep-alive"},nil,nil}},{"referer",{'Referer',"<the
>>>> referrer>"},{"host",{'Host',"<our riak-cs host
>>>> name>"},nil,nil},{"user-agent",{'User-Agent',"Mozilla/5.0 (Macintosh;
>>>> Intel Mac OS X 10_10_0) AppleWebKit/537.36 (KHTML, like Gecko)
>>>> Chrome/35.0.1916.153
>>>> Safari/537.36"},nil,{"x-rcs-rewrite-path",{"x-rcs-rewrite-path","/<the
>>>> bucket
>>>> name>/bf15f98c-eaa1-4ff9-83ff-24c1e7e1380f/847c340cfe2f44028d6fd5606f696796/Attachment-1.png?AWSAccessKeyId=DGTXHHWIEDF4XUBSBYVI&Expires=1405580057&Signature=U0By3mIwaRIVBHNcYhSt6r5QgPk%3D"},nil,nil}}}}},not_fetched_yet,false,{3,{"content-type",{"Content-Type","application/octet-stream"},nil,{"etag",{"ETag","\"a3a32cf5d8f502d7e8d35fd8412a6878\""},nil,
>>>> trap_exit: false
>>>> status: running
>>>> heap_size: 28657
>>>> stack_size: 24
>>>> reductions: 80773
>>>> neighbours:
>>>>
>>>> Thanks,
>>>> Dave Finster
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> [email protected]
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com