Hi Charles

We have a slightly different issue to yours, in that the majority of our 
requests succeed and only the odd one fails. Or is that what you are observing 
on your cluster as well?

I’ve been talking about the issue off-list with Luke and Kelly. Luke took a 
look at some of our debug logs for Riak and suspects that we had over-committed 
the resources of our cluster. I’ve modified the configuration of our cluster as 
per his recommendations:

{multi_backend, [
   {be_default, riak_kv_eleveldb_backend, [
       {max_open_files, 14},
       {cache_size, 4194304},
       {data_root, "/var/db/riak/leveldb"}
   ]},
   {be_blocks, riak_kv_bitcask_backend, [
       {data_root, "/var/db/riak/bitcask"}
   ]}
]},

{anti_entropy, {off, []}},
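
Both of these go in the riak_kv section of Riak's app.config. For anyone 
comparing configs, here is a rough sketch of where they sit. It is not a copy of 
our file: the storage_backend, multi_backend_prefix_list and 
multi_backend_default lines are the standard Riak CS wiring and are assumed 
here, and all other riak_kv settings are left out.

{riak_kv, [
    %% Sketch only: standard Riak CS backend wiring, assumed rather than
    %% copied from our file. Keys with the "0b:" prefix (CS block data)
    %% go to bitcask; everything else goes to leveldb.
    {storage_backend, riak_cs_kv_multi_backend},
    {multi_backend_prefix_list, [{<<"0b:">>, be_blocks}]},
    {multi_backend_default, be_default},
    {multi_backend, [
        {be_default, riak_kv_eleveldb_backend, [
            %% Values reduced per Luke's recommendations
            {max_open_files, 14},
            {cache_size, 4194304},    %% 4194304 bytes = 4 MiB
            {data_root, "/var/db/riak/leveldb"}
        ]},
        {be_blocks, riak_kv_bitcask_backend, [
            {data_root, "/var/db/riak/bitcask"}
        ]}
    ]},
    %% Active anti-entropy switched off
    {anti_entropy, {off, []}}
    %% ... other riak_kv settings omitted ...
]},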

Our environment is a 4-node cluster running on SmartOS (from Joyent), with 4GB 
of RAM per node. Things improved after I applied these changes, but I still 
encountered the odd failure. I also deactivated n_val_1_get_requests, and since 
then I haven’t been able to reproduce the issues I was encountering previously.
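
n_val_1_get_requests lives in Riak CS's own app.config (the riak_cs section), 
not in Riak's. A minimal sketch of the change, with every other riak_cs setting 
omitted; as far as I know the Riak CS node needs a restart before an app.config 
change takes effect.

{riak_cs, [
    %% ... other riak_cs settings omitted (sketch, not our full file) ...
    %% Deactivated: with this set to false we have not been able to
    %% reproduce the failures so far.
    {n_val_1_get_requests, false}
]},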

Thanks,
Dave

> On 22 Jul 2014, at 9:01 pm, Charles Bijon <[email protected]> wrote:
> 
> Hi,
> 
> We have the same issue here, but we have 45 riak/riak-cs nodes in 
> production. Do you have any idea how to fix it?
> 
> Regards,
> 
> Charles
> 
> 
> On 17/07/2014 23:21, Dave Finster wrote:
>> Hi Kelly
>> 
>> 1.4.5 - Riak CS
>> 1.4.8 - Riak
>> Anti Entropy is on (all nodes)
>> 
>> With n_val_1_get_requests deactivated I can still trigger the issue (though 
>> it occurs less often); however, a different error has now cropped up:
>> 
>> 2014-07-17 21:15:38 =ERROR REPORT====
>> webmachine error: path="/buckets/<bucket name>/objects/bf15f98c-eaa1-4ff9-83ff-24c1e7e1380f%2F847c340cfe2f44028d6fd5606f696796%2FAttachment-1.png"
>> {exit,{{{{case_clause,{error,timeout}},[{riak_cs_manifest_fsm,handle_get_manifests,1,[{file,"src/riak_cs_manifest_fsm.erl"},{line,265}]},{riak_cs_manifest_fsm,waiting_command,3,[{file,"src/riak_cs_manifest_fsm.erl"},{line,201}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,494}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]},{gen_fsm,sync_send_event,[<0.1383.0>,get_manifests,infinity]}},{gen_fsm,sync_send_event,[<0.1382.0>,get_manifest,infinity]}},[{gen_fsm,sync_send_event,3,[{file,"gen_fsm.erl"},{line,214}]},{riak_cs_wm_utils,ensure_doc,2,[{file,"src/riak_cs_wm_utils.erl"},{line,236}]},{riak_cs_wm_object,authorize,2,[{file,"src/riak_cs_wm_object.erl"},{line,64}]},{riak_cs_wm_common,authorize,2,[{file,"src/riak_cs_wm_common.erl"},{line,396}]},{riak_cs_wm_common,forbidden,2,[{file,"src/riak_cs_wm_common.erl"},{line,182}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]}
>> [{gen_fsm,sync_send_event,3,[{file,"gen_fsm.erl"},{line,214}]},{riak_cs_wm_utils,ensure_doc,2,[{file,"src/riak_cs_wm_utils.erl"},{line,236}]},{riak_cs_wm_object,authorize,2,[{file,"src/riak_cs_wm_object.erl"},{line,64}]},{riak_cs_wm_common,authorize,2,[{file,"src/riak_cs_wm_common.erl"},{line,396}]},{riak_cs_wm_common,forbidden,2,[{file,"src/riak_cs_wm_common.erl"},{line,182}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]
>> 
>> Thanks,
>> Dave
>> 
>>> On 18 Jul 2014, at 2:19 am, Kelly McLaughlin <[email protected]> wrote:
>>> 
>>> Dave,
>>> 
>>> Can you tell me what versions of Riak and Riak CS you have installed? Do 
>>> you have AAE enabled or disabled? It’s tough to come up with an explanation 
>>> without more information, but I would try setting n_val_1_get_requests to 
>>> false and see if you continue to experience the problem. My guess is that 
>>> will resolve the issue, but let me know what happens. 
>>> 
>>> Kelly
>>> 
>>> On July 17, 2014 at 1:00:19 AM, Dave Finster ([email protected]) wrote:
>>> 
>>>> Hi Everyone
>>>> 
>>>> I've spent a bit of time trying to debug this one and I'm not sure where to 
>>>> go from here. The use case that appears to cause this breakage is a web page 
>>>> that links to 8 x 10MB images and attempts to fetch them all simultaneously. 
>>>> 
>>>> Occasionally, one or two of the images will just fail to load, while at 
>>>> other times they all work fine. I’ve tracked it down to the crash below. It 
>>>> isn’t always the same image. To make the problem more repeatable, I forced 
>>>> our load balancer to use only a single Riak-CS node, so that node gets hit 
>>>> with all the requests. We are using HAProxy in front and are running 64-bit 
>>>> SmartOS images across the board.
>>>> 
>>>> arekinath helped me look into it; one theory was that I had been hit by the 
>>>> AAE bug present in versions prior to 1.4.8, but even clearing the AAE data 
>>>> made no difference. The n_val on the buckets is 3 and it's a 4-node cluster. 
>>>> Each of the 4 nodes runs both a Riak node and a Riak-CS node. I also have 
>>>> pb_backlog turned up to 256, n_val_1_get_requests set to true and 
>>>> fold_objects_for_list_keys set to true. ‘ring-status’ shows that the whole 
>>>> ring is reachable. 
>>>> 
>>>> Any idea on how to diagnose this one further?
>>>> 
>>>> 2014-07-17 06:38:54 =CRASH REPORT====
>>>> crasher:
>>>> initial call: mochiweb_acceptor:init/3
>>>> pid: <0.26119.1>
>>>> registered_name: []
>>>> exception exit: {{normal,{gen_fsm,sync_send_event,[<0.27617.1>,get_next_chunk,infinity]}},[{gen_fsm,sync_send_event,3,[{file,"gen_fsm.erl"},{line,214}]},{riak_cs_wm_utils,streaming_get,4,[{file,"src/riak_cs_wm_utils.erl"},{line,272}]},{webmachine_decision_core,'-make_encoder_stream/3-fun-0-',3,[{file,"src/webmachine_decision_core.erl"},{line,667}]},{webmachine_request,send_stream_body_no_chunk,2,[{file,"src/webmachine_request.erl"},{line,334}]},{webmachine_request,send_response,3,[{file,"src/webmachine_request.erl"},{line,398}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,251}]},{webmachine_decision_core,wrcall,1,[{file,"src/webmachine_decision_core.erl"},{line,42}]},{webmachine_decision_core,finish_response,3,[{file,"src/webmachine_decision_core.erl"},{line,92}]}]}
>>>> ancestors: [object_web_mochiweb,riak_cs_sup,<0.143.0>]
>>>> messages: []
>>>> links: [<0.298.0>,#Port<0.12015>]
>>>> dictionary: [{reqstate,{wm_reqstate,#Port<0.12015>,[{'content-encoding',"identity"},{'content-type',"application/octet-stream"},{resource_module,riak_cs_wm_object}],undefined,"10.4.242.1",{wm_reqdata,'GET',http,{1,1},"10.4.242.1",undefined,[],"/buckets/<the bucket name>/objects/bf15f98c-eaa1-4ff9-83ff-24c1e7e1380f%2F847c340cfe2f44028d6fd5606f696796%2FAttachment-1.png","/buckets/<the bucket name>/objects/bf15f98c-eaa1-4ff9-83ff-24c1e7e1380f%2F847c340cfe2f44028d6fd5606f696796%2FAttachment-1.png?Signature=U0By3mIwaRIVBHNcYhSt6r5QgPk%3D&Expires=1405580057&AWSAccessKeyId=DGTXHHWIEDF4XUBSBYVI",[{bucket,"<the bucket name>"},{object,"bf15f98c-eaa1-4ff9-83ff-24c1e7e1380f%2F847c340cfe2f44028d6fd5606f696796%2FAttachment-1.png"}],[],"../../../..",{200,undefined},1073741824,67108864,[{"_ga","GA1.3.643660316.1404789703"}],[{"Signature","U0By3mIwaRIVBHNcYhSt6r5QgPk="},{"Expires","1405580057"},{"AWSAccessKeyId","DGTXHHWIEDF4XUBSBYVI"}],{9,{"cookie",{'Cookie',"_ga=GA1.3.643660316.1404789703"},{"accept-language",{'Accept-Language',"en-US,en;q=0.8"},{"accept-encoding",{'Accept-Encoding',"gzip,deflate,sdch"},{"accept",{'Accept',"image/webp,*/*;q=0.8"},nil,nil},nil},{"connection",{'Connection',"keep-alive"},nil,nil}},{"referer",{'Referer',"<the referrer>"},{"host",{'Host',"<our riak-cs host name>"},nil,nil},{"user-agent",{'User-Agent',"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36"},nil,{"x-rcs-rewrite-path",{"x-rcs-rewrite-path","/<the bucket name>/bf15f98c-eaa1-4ff9-83ff-24c1e7e1380f/847c340cfe2f44028d6fd5606f696796/Attachment-1.png?AWSAccessKeyId=DGTXHHWIEDF4XUBSBYVI&Expires=1405580057&Signature=U0By3mIwaRIVBHNcYhSt6r5QgPk%3D"},nil,nil}}}}},not_fetched_yet,false,{3,{"content-type",{"Content-Type","application/octet-stream"},nil,{"etag",{"ETag","\"a3a32cf5d8f502d7e8d35fd8412a6878\""},nil,
>>>> trap_exit: false
>>>> status: running
>>>> heap_size: 28657
>>>> stack_size: 24
>>>> reductions: 80773
>>>> neighbours:
>>>> 
>>>> Thanks,
>>>> Dave Finster
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> [email protected]
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
> 
