On 19/09/13 11:17, Luke Bakken wrote:
> Hi Toby,
>
> Invalid hint files won't cause Riak to fail requests - there must have
> been something else happening. Hint files are used by Riak to speed up
> start time when loading a large key set.
>
> You mentioned a "load-balancer pool" - are you using something like
> HAProxy to load-balance requests to your Riak CS cluster?
>
> The "error: disconnected" message is a good clue. If you can provide
> log files, that may point to the cause.

Hi Luke,

I'm still seeing quite a few failed requests. I've been chasing the
hint files, but I guess that was a red herring.

We're using nginx to load balance requests to Riak CS.

I tried going directly to each node in turn, and no single node was
reliably failing every request.
Hitting one server just now came up with OK/403/403/OK/OK.
Trying another was OK/OK/OK/OK/403 though.
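A per-node check like the one above can be scripted so that the OK/403
pattern is tallied rather than eyeballed. This is only a sketch under
several assumptions: each command-line argument is an s3cmd config file
pointing at a single Riak CS node (the file names are hypothetical), the
s3://test bucket used elsewhere in this thread exists, and a failure
shows up as the "S3 error: 403" text that s3cmd printed. ATTEMPTS,
classify and probe are made-up names for illustration.

```shell
#!/bin/sh
# Sketch: tally s3cmd results per Riak CS node, bypassing the load balancer.
# Assumption: each argument is an s3cmd config file that points at ONE node
# (e.g. via its proxy_host setting). The file names are hypothetical.
# Usage: sh probe.sh ~/.s3cfg-node1 ~/.s3cfg-node2

ATTEMPTS=${ATTEMPTS:-20}   # requests per node; arbitrary default

# classify EXIT_STATUS OUTPUT -> prints OK, 403 or OTHER
classify() {
    case $2 in
        *"S3 error: 403"*) echo 403 ;;
        *) if [ "$1" -eq 0 ]; then echo OK; else echo OTHER; fi ;;
    esac
}

# probe CONFIG_FILE -> prints a count of each result for that node
probe() {
    i=0
    while [ "$i" -lt "$ATTEMPTS" ]; do
        # Capture stdout and stderr; $? below is s3cmd's exit status.
        out=$(s3cmd -c "$1" ls s3://test 2>&1)
        classify "$?" "$out"
        i=$((i + 1))
    done | sort | uniq -c
}

# With no arguments this loop does nothing.
for cfg in "$@"; do
    echo "== $cfg"
    probe "$cfg"
done
```

If every node shows a similar share of 403s, that points away from a
single misconfigured server and towards something shared by all of them.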
Here are some logs from riak-cs:

error.log:2013-09-19 11:37:02.242 [error]
<0.5105.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user
record for s3 failed. Reason: disconnected

There wasn't anything immediately either side of that. The riak logs for
the same minute on that server likewise show nothing.
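To line the Riak CS errors up against the riak logs minute by minute,
the "disconnected" entries can be counted per minute. A minimal sketch,
assuming each log entry is a single line that starts with a
"YYYY-MM-DD HH:MM:SS.mmm" timestamp as in the excerpt above; the function
name and the log path in the usage comment are assumptions, not taken
from this setup.

```shell
#!/bin/sh
# Sketch: count "Reason: disconnected" errors per minute in Riak CS logs.
# Reads log lines on stdin; tolerates a leading "error.log:" prefix as
# added by grep when it searches several files.
disconnects_per_minute() {
    grep 'Reason: disconnected' |
    sed 's/^[^ ]*\.log[^:]*://' |
    awk '{ n[$1 " " substr($2, 1, 5)]++ }
         END { for (m in n) print m, n[m] }' |
    sort
}

# Usage (the log path is an assumption; adjust to the local install):
#   disconnects_per_minute < /var/log/riak-cs/error.log
```

Each output line is a date, an HH:MM minute and a count, which can then
be compared with the riak logs for the same minutes.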
There's quite a lot of free memory on the servers; they have 32000 file
handles available.

Toby
> On Wed, Sep 18, 2013 at 4:29 PM, Toby Corkindale wrote:
>> I found one Riak server was reporting a lot of errors like
>> [error] <0.808.0> Hintfile
>> '/var/lib/riak/bitcask/68507889249886074290797726533575766546371837952/3.bitcask.hint'
>> invalid
>>
>> And the Riak CS logs contained a lot of messages about being unable to
>> retrieve s3 user details because "error: disconnected"
>>
>> I think I've blown away the bad hintfiles and have had them repaired from
>> other replicas now, and I haven't seen any more errors for a little while.
>>
>> I'm not sure what caused those to become invalid.
>> Just a thought, but it would be good if Riak could automatically repair
>> them rather than failing requests.
>>
>> Cheers,
>> Toby
>>
>> On 19/09/13 08:42, Toby Corkindale wrote:
>>>
>>> Ah, hold on.. have just discovered that rather than it being deletion
>>> calls, it seems to just be every X calls of any sort.. sounds like one
>>> of the servers in the load-balancer pool must be misconfigured somehow,
>>> but the rest are OK.
>>>
>>> On 19/09/13 08:34, Toby Corkindale wrote:
>>>>
>>>> I've just upgraded from Riak CS 1.3.1 to 1.4.1
>>>>
>>>> Using s3cmd to test a few things, I've found some odd behaviour.
>>>> Creating a bucket and putting a file works just fine, eg:
>>>>
>>>> s3cmd mb s3://test
>>>> s3cmd put README s3://test
>>>> s3cmd get s3://test/README
>>>>
>>>> However if I try to delete a file or bucket, it throws an error:
>>>>
>>>> s3cmd del s3://test/README
>>>> ERROR: S3 error: 403 (InvalidAccessKeyId): The AWS Access Key Id you
>>>> provided does not exist in our records.
>>>>
>>>> s3cmd rb s3://test
>>>> ERROR: S3 error: 403 (InvalidAccessKeyId): The AWS Access Key Id you
>>>> provided does not exist in our records.
>>>>
>>>>
>>>> Have I messed something up during the upgrade, or is this a bug in 1.4.1?
_______________________________________________
riak-users mailing list
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com