Hi Toby,

Can you try raising pb_backlog to 128 in the Riak app.config on each
node? Those disconnect errors are likely left over from the stampede of
connections that the CS connection pool opens on startup. For one reason
or another the resets don't come through, and the hanging disconnected
socket isn't discovered until the next send attempt.
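
For reference, here is a rough sketch of what that change might look like.
It assumes a Riak release where the protocol buffers settings live in the
riak_api section of app.config; on older releases they may sit under
riak_kv instead, so put the line wherever your existing pb listener
entries are. Only the pb_backlog line is the suggested change; the rest is
a placeholder for what you already have.

```erlang
%% app.config (fragment): merge into the existing section, do not replace it.
{riak_api, [
    %% ... existing entries for this section stay as they are,
    %% each followed by a comma ...

    %% Maximum length of the queue of pending protocol buffers connections.
    %% Raised so the burst of pool connections at CS startup is not dropped.
    {pb_backlog, 128}
]},
```

app.config is read when the node starts, so each node needs a restart
after the edit.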

-Andrew


On Wed, Sep 18, 2013 at 10:31 PM, Toby Corkindale <[email protected]> wrote:

> On 19/09/13 11:17, Luke Bakken wrote:
>
>> The "error: disconnected" message is a good clue. If you can provide
>> log files that may point to the cause.
>>
>
> See below for logs from riak and riak cs, for a roughly five minute window
> during which I'd fiddled with the load balancer and DNS to try and isolate
> this server from other requests, and only send mine to it.
>
> Unfortunately I really don't see much in there apart from the disconnected
> messages. I did note warnings about the system hitting the high memory
> watermark at one point, though. That's a bit odd, as the machine wasn't
> near max capacity at any point. It only has 8GB, but 'free' was reporting
> that most of the used memory was just buffers and cache at the time, as
> follows:
>
>              total       used       free     shared    buffers     cached
> Mem:       8176412    3803404    4373008          0      34112    3052340
> -/+ buffers/cache:     716952    7459460
> Swap:      3903484          0    3903484
>
>
> ======= erlang.log (riak cs) =========
>
> ===== Thu Sep 19 11:58:46 EST 2013
> 11:58:46.287 [info] alarm_handler: {set,{system_memory_high_watermark,[]}}
> 12:00:47.864 [info] webmachine_log_handler: closing log file: "/var/log/riak-cs/access.log"
> 12:00:47.865 [info] opening log file: "/var/log/riak-cs/access.log.2013_09_19_02"
> 12:00:50.081 [error] Retrieval of user record for s3 failed. Reason: disconnected
> 12:00:51.013 [error] Retrieval of user record for s3 failed. Reason: disconnected
> 12:00:51.406 [error] Retrieval of user record for s3 failed. Reason: disconnected
> 12:01:49.320 [error] Retrieval of user record for s3 failed. Reason: disconnected
> 12:02:04.662 [error] Retrieval of user record for s3 failed. Reason: disconnected
> 12:03:37.330 [error] Retrieval of user record for s3 failed. Reason: disconnected
> 12:06:24.345 [error] Retrieval of user record for s3 failed. Reason: disconnected
> 12:06:41.670 [error] Retrieval of user record for s3 failed. Reason: disconnected
> 12:06:46.554 [info] Finished garbage collection: 0 seconds, 0 batch_count, 0 batch_skips, 0 manif_count, 0 block_count
> 12:11:46.300 [info] alarm_handler: {clear,system_memory_high_watermark}
> 12:16:46.305 [info] alarm_handler: {set,{system_memory_high_watermark,[]}}
> 12:18:46.307 [info] alarm_handler: {clear,system_memory_high_watermark}
> ========================================
>
>
> ===== console.log (riak cs) ===================
> 2013-09-19 12:00:47.865 [info] <0.94.0> opening log file: "/var/log/riak-cs/access.log.2013_09_19_02"
> 2013-09-19 12:00:50.081 [error] <0.7092.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> 2013-09-19 12:00:51.013 [error] <0.10406.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> 2013-09-19 12:00:51.406 [error] <0.11179.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> 2013-09-19 12:01:49.320 [error] <0.11197.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> 2013-09-19 12:02:04.662 [error] <0.9667.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> 2013-09-19 12:03:37.330 [error] <0.11343.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> 2013-09-19 12:06:24.345 [error] <0.11589.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> 2013-09-19 12:06:41.670 [error] <0.11215.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> 2013-09-19 12:06:46.554 [info] <0.274.0>@riak_cs_gc_d:fetching_next_fileset:162 Finished garbage collection: 0 seconds, 0 batch_count, 0 batch_skips, 0 manif_count, 0 block_count
> 2013-09-19 12:11:46.300 [info] <0.41.0> alarm_handler: {clear,system_memory_high_watermark}
> 2013-09-19 12:16:46.305 [info] <0.41.0> alarm_handler: {set,{system_memory_high_watermark,[]}}
> 2013-09-19 12:18:46.307 [info] <0.41.0> alarm_handler: {clear,system_memory_high_watermark}
> ==============================================
>
>
>
> ======= error.log (riak cs) ================
> 2013-09-19 12:00:50.081 [error] <0.7092.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> 2013-09-19 12:00:51.013 [error] <0.10406.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> 2013-09-19 12:00:51.406 [error] <0.11179.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> 2013-09-19 12:01:49.320 [error] <0.11197.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> 2013-09-19 12:02:04.662 [error] <0.9667.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> 2013-09-19 12:03:37.330 [error] <0.11343.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> 2013-09-19 12:06:24.345 [error] <0.11589.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> 2013-09-19 12:06:41.670 [error] <0.11215.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected
> ====================================
>
>
>
> ========= erlang.log (riak) ==========
> ===== ALIVE Thu Sep 19 12:01:04 EST 2013
>
> ===== ALIVE Thu Sep 19 12:16:04 EST 2013
> ======================================
>
>
> =========== console.log (riak) =========
> 2013-09-19 12:03:05.471 [info] <0.146.0>@riak_core_gossip:log_membership_changes:372 'riak@mel-storage04.strategicdata.internal' joined cluster with status 'joining'
> 2013-09-19 12:03:21.048 [info] <0.146.0>@riak_core_gossip:log_membership_changes:378 'riak@mel-storage04.strategicdata.internal' changed from 'joining' to 'valid'
> 2013-09-19 12:03:26.235 [info] <0.182.0>@riak_core_handoff_manager:handle_info:286 An outbound handoff of partition riak_kv_vnode 274031556999544297163190906134303066185487351808 was terminated for reason: {shutdown,max_concurrency}
>
> ^^ very similar messages to that repeat a few hundred times ^^
>
> ==========================================
>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>