Gabriel,

We received information about this incident yesterday and have been
discussing it internally. Thank you for providing such a detailed
diagnosis. There must be something about the default httpchk headers that
triggers a bug.

I'll follow up if we need more information. In the meantime, I'm glad your
custom httpchk string fixed your issue.
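
For anyone who wants to compare the two request shapes without HAProxy in
the path, here is a rough sketch rather than a confirmed reproduction. It
assumes the original check was a bare "GET /ping HTTP/1.0" with no headers;
the thread does not show that configuration, so treat it as a guess. The
HTTP/1.1 variant loosely mirrors the custom httpchk string quoted below,
simplified to a single Host header. The names build_check_request and
send_check, and the host and port in the usage comment, are made up for
illustration.

```python
import socket


def build_check_request(path="/ping", version="1.0", headers=None):
    """Return the raw bytes of a health-check style GET request."""
    lines = [f"GET {path} HTTP/{version}"]
    for name, value in (headers or []):
        lines.append(f"{name}: {value}")
    # A blank line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# Assumed shape of the original check: HTTP/1.0, no headers.
BARE_CHECK = build_check_request()

# Simplified version of the custom check from the quoted message.
CUSTOM_CHECK = build_check_request(
    version="1.1",
    headers=[
        ("Host", "riak:8098"),
        ("User-Agent", "curl/7.22.0"),
        ("Accept", "*/*"),
    ],
)


def send_check(host, port, request, timeout=5.0):
    """Send one raw request and return whatever the server sends back."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        while True:
            try:
                data = sock.recv(4096)
            except socket.timeout:
                # An HTTP/1.1 server may keep the connection open.
                break
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


# Hypothetical usage against one node:
#   print(send_check("riak", 8098, BARE_CHECK))
#   print(send_check("riak", 8098, CUSTOM_CHECK))
```

Sending each variant in a loop while watching riak-admin top might show
whether the request shape alone makes a difference, though nothing in this
thread confirms that it will.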

--
Luke Bakken
CSE
[email protected]


On Thu, Apr 3, 2014 at 3:17 PM, Gabriel Littman <[email protected]> wrote:

> Hi All,
>
> I once again return to the bottomless pool of knowledge that is the
> riak mailing list. :)
>
> Recently we've started work to isolate our Riak cluster.  (We
> currently host other services on our Riak machines.)  Part of that
> work is to put Riak behind HAProxy and have our other services access
> Riak through that.  Then on Tuesday Riak started to eat up all of the
> CPU and memory, causing it to swap (yes, swap is still on; we plan to
> turn it off after we isolate Riak) and the load to shoot up.  It took a
> while to figure out what was going on, but it turns out that what had
> changed was HAProxy.  When we turned off the load balancer and
> restarted Riak it would stay happy, but as soon as we turned it back on
> (with the 'check' option) CPU and memory would slowly creep up until
> the machine was unusable.
>
> It's very confusing since we have already done this work in our
> staging cluster and not had any of these problems.  We are going to do
> a more thorough analysis of differences in hardware and configurations
> but we are pretty good about packaging and deploying important
> settings in a standard way.
>
> Also, we were not able to reproduce this except by using HAProxy:
> when we wrote a script to load up /ping, Riak handled it just fine.
> When we looked deeper and sniffed the network, my coworkers noticed
> that the curl and HAProxy requests were slightly different.  When we
> added some header info and used HTTP 1.1 instead of 1.0 in the
> HAProxy check, it no longer seemed to have the same effect on Riak.
>
> option httpchk GET /ping HTTP/1.1\r\nHost:\ riak\r\nUser-Agent:\ curl/7.22.0\r\nHost:\ riak:8098\r\nAccept:\ */*\r\n
>
> I've attached a bunch of logs and configurations.  Any advice or
> insights would be much appreciated.
>
> Thanks,
>
> Gabe
>
>
>
> More Info:
> riak 1.4.1
> 5 nodes
> nval 2
> 256 partitions
>
> ubuntu 12.04
> 12 core system
> 32g mem
> 32g swap
>
>
> Some suspicious looking riak-admin top entries (don't really know what
> they mean):
> <6201.16207.0>  proc_lib:init_p/5  '-'  3211097921  4455251976  62  riak_kv_index_fsm:update_buffer/3
> <6201.1908.0>   proc_lib:init_p/5  '-'   190425738       88592   0  riak_kv_vnode:'-result_fun_ack/2-fun-0-'
>
> <6201.15965.0>  proc_lib:init_p/5  '-'  4156696872  3564395400  63  sms:'-values/1-lc$^0/1-0-'/1
> <6201.23152.0>  proc_lib:init_p/5  '-'  3400754909  1460972816  99  orddict:update/3
> <6201.1913.0>   proc_lib:init_p/5  '-'   152610162       88448   0  gen_server:loop/6
> <6201.1654.0>   proc_lib:init_p/5  '-'   125490325       55176   0  riak_kv_vnode:'-result_fun_ack/2-fun-0-'
> <6201.1450.0>   proc_lib:init_p/5  '-'   124185894       34504   0  riak_kv_vnode:'-result
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com