Re: [ceph-users] Down OSDs blocking read requests.

2016-11-18 Thread John Spray
On Fri, Nov 18, 2016 at 1:04 PM, Iain Buclaw  wrote:
> On 18 November 2016 at 13:14, John Spray  wrote:
>> On Fri, Nov 18, 2016 at 11:53 AM, Iain Buclaw  wrote:
>>> Hi,
>>>
>>> Following up on the suggestion to use any of the following options:
>>>
>>> - client_mount_timeout
>>> - rados_mon_op_timeout
>>> - rados_osd_op_timeout
>>>
>>> These are meant to mitigate the time spent blocked on requests.  Is
>>> there really no other way around this?
>>>
>>> If two OSDs go down that between them have both copies of an
>>> object, it would be nice to have clients fail *immediately*.  I've
>>> tried reducing the rados_osd_op_timeout setting to 0.5, but when
>>> things go wrong, it still results in the collapse of all reads from
>>> the cluster.
>>
>> Can you be more specific about what is happening when you set
>> rados_osd_op_timeout?  You're not seeing timeouts at all; operations
>> are blocking instead?
>>
>
> Certainly, they are timing out, but the problem is a numbers game.
>
> Let's say there are 8 client workers, and between them they are
> handling 250 requests per second.  A DR situation happens and two OSDs
> go down, taking 60 PGs with them, out of a pool with 1024 PGs.  Now
> you have a situation where roughly 1 in every 17 (1024 / 60) requests
> to Ceph will time out.  You eventually end up in a situation where all
> clients are blocked waiting for either a response from the OSD or
> ETIMEDOUT.
>
>> If you can provide a short librados program that demonstrates an op
>> blocking indefinitely even when a timeout is set, that would be
>> useful.
>>
>
> It's not blocking indefinitely, but the fact that it's blocking at all
> is a concern.  If a PG is down, there is no use waiting for it to come
> back up.  Just give up on the read operation and notify the client
> immediately, rather than blocking the client from doing anything else.

OK, so you want a new behaviour where it cancels your requests when
OSDs go down, as opposed to timing out.  Clearly that's not what the
current code does: you would have to modify Ceph yourself to do this.
Look at Objecter::_scan_requests to see how it currently responds to
osdmap updates that affect requests in flight -- it scans through them
to identify which ones need resending to a different OSD.  You would
add an extra behaviour to identify requests that aren't currently
serviceable, and cancel them.
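
A rough, standalone sketch of that pattern (this is not Ceph's actual
Objecter code; inflight_op, pg_is_serviceable and the rest are made-up
names for illustration only):

#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* A stand-in for an op queued against a placement group. */
struct inflight_op {
    int pg;                                   /* PG the op maps to */
    void (*complete)(struct inflight_op *, int result);
    struct inflight_op *next;
};

/* Hypothetical predicate: does the new map leave this PG with any
 * OSD able to serve it? */
static bool pg_is_serviceable(int pg, const bool *pg_up, size_t npgs)
{
    return (size_t)pg < npgs && pg_up[pg];
}

/* On a map change, walk the in-flight ops; anything that can no
 * longer be serviced is completed with an error right away instead
 * of being left to block until a timeout fires. */
static void scan_and_cancel(struct inflight_op **head,
                            const bool *pg_up, size_t npgs)
{
    struct inflight_op **prev = head;
    while (*prev) {
        struct inflight_op *op = *prev;
        if (!pg_is_serviceable(op->pg, pg_up, npgs)) {
            *prev = op->next;              /* unlink from the queue */
            op->complete(op, -ENXIO);      /* fail fast */
        } else {
            prev = &op->next;              /* kept / resent as today */
        }
    }
}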

John

> To clarify another position, it makes no sense to use AIO in my case.
> The clients in question are nginx worker threads, and they manage
> async processing between them.  Where async doesn't happen is when the
> thread is stuck inside a stat() or read() call into librados.


Re: [ceph-users] Down OSDs blocking read requests.

2016-11-18 Thread Iain Buclaw
On 18 November 2016 at 13:14, John Spray  wrote:
> On Fri, Nov 18, 2016 at 11:53 AM, Iain Buclaw  wrote:
>> Hi,
>>
>> Following up on the suggestion to use any of the following options:
>>
>> - client_mount_timeout
>> - rados_mon_op_timeout
>> - rados_osd_op_timeout
>>
>> These are meant to mitigate the time spent blocked on requests.  Is
>> there really no other way around this?
>>
>> If two OSDs go down that between them have both copies of an
>> object, it would be nice to have clients fail *immediately*.  I've
>> tried reducing the rados_osd_op_timeout setting to 0.5, but when
>> things go wrong, it still results in the collapse of all reads from
>> the cluster.
>
> Can you be more specific about what is happening when you set
> rados_osd_op_timeout?  You're not seeing timeouts at all; operations
> are blocking instead?
>

Certainly, they are timing out, but the problem is a numbers game.

Let's say there are 8 client workers, and between them they are
handling 250 requests per second.  A DR situation happens and two OSDs
go down, taking 60 PGs with them, out of a pool with 1024 PGs.  Now
you have a situation where roughly 1 in every 17 (1024 / 60) requests
to Ceph will time out.  You eventually end up in a situation where all
clients are blocked waiting for either a response from the OSD or
ETIMEDOUT.
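
A rough back-of-the-envelope check of those numbers (the 0.5s value is
the rados_osd_op_timeout mentioned earlier; the steady-state figure is
only an estimate):

#include <stdio.h>

int main(void)
{
    double pgs_total   = 1024.0;
    double pgs_down    = 60.0;
    double req_per_sec = 250.0;
    double workers     = 8.0;
    double timeout_s   = 0.5;                         /* rados_osd_op_timeout */

    double p_blocked     = pgs_down / pgs_total;      /* ~0.059 */
    double blocked_per_s = req_per_sec * p_blocked;   /* ~14.6 req/s */
    /* Each blocked request pins a worker for the full timeout, so on
     * average this many workers are stuck at any given moment: */
    double workers_stuck = blocked_per_s * timeout_s; /* ~7.3 of 8 */

    printf("1 in every %.1f requests hits a down PG\n", 1.0 / p_blocked);
    printf("%.1f requests/s block until the timeout\n", blocked_per_s);
    printf("%.1f of %.0f workers stuck on average\n", workers_stuck, workers);
    return 0;
}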

> If you can provide a short librados program that demonstrates an op
> blocking indefinitely even when a timeout is set, that would be
> useful.
>

It's not blocking indefinitely, but the fact that it's blocking at all
is a concern.  If a PG is down, there is no use waiting for it to come
back up.  Just give up on the read operation and notify the client
immediately, rather than blocking the client from doing anything else.

To clarify another position, it makes no sense to use AIO in my case.
The clients in question are nginx worker threads, and they manage
async processing between them.  Where async doesn't happen is when the
thread is stuck inside a stat() or read() call into librados.
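
To make the blocking point concrete, a minimal synchronous read with
the timeout set ("mypool" and "myobject" are placeholders, and whether
a timed-out op surfaces as -ETIMEDOUT should be checked against the
librados version in use):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <rados/librados.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    char buf[4096];
    int r;

    if (rados_create(&cluster, NULL) < 0)
        return 1;
    rados_conf_read_file(cluster, NULL);      /* default ceph.conf */
    rados_conf_set(cluster, "rados_osd_op_timeout", "0.5");
    if (rados_connect(cluster) < 0)
        return 1;
    if (rados_ioctx_create(cluster, "mypool", &io) < 0) {
        rados_shutdown(cluster);
        return 1;
    }

    /* The calling (nginx worker) thread sits inside this call until
     * the OSD answers or the 0.5s timeout expires; nothing else runs
     * on the thread in the meantime. */
    r = rados_read(io, "myobject", buf, sizeof(buf), 0);
    if (r < 0)
        fprintf(stderr, "read failed: %s\n", strerror(-r));
    else
        printf("read %d bytes\n", r);

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return r < 0 ? 1 : 0;
}

(Built with something along the lines of: cc test.c -lrados)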

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: [ceph-users] Down OSDs blocking read requests.

2016-11-18 Thread John Spray
On Fri, Nov 18, 2016 at 11:53 AM, Iain Buclaw  wrote:
> Hi,
>
> Following up on the suggestion to use any of the following options:
>
> - client_mount_timeout
> - rados_mon_op_timeout
> - rados_osd_op_timeout
>
> These are meant to mitigate the time spent blocked on requests.  Is
> there really no other way around this?
>
> If two OSDs go down that between them have both copies of an
> object, it would be nice to have clients fail *immediately*.  I've
> tried reducing the rados_osd_op_timeout setting to 0.5, but when
> things go wrong, it still results in the collapse of all reads from
> the cluster.

Can you be more specific about what is happening when you set
rados_osd_op_timeout?  You're not seeing timeouts at all; operations
are blocking instead?

If you can provide a short librados program that demonstrates an op
blocking indefinitely even when a timeout is set, that would be
useful.

John

>
> Reducing rados_osd_op_timeout down to 0.05 seems like a sure way to
> cause more false positives.  But in reality, if an OSD operation can't
> be served within 150ms, then it's missed the train by over an hour.
>
> --
> Iain Buclaw
>
> *(p < e ? p++ : p) = (c & 0x0f) + '0';


[ceph-users] Down OSDs blocking read requests.

2016-11-18 Thread Iain Buclaw
Hi,

Following up on the suggestion to use any of the following options:

- client_mount_timeout
- rados_mon_op_timeout
- rados_osd_op_timeout

These are meant to mitigate the time spent blocked on requests.  Is
there really no other way around this?

If two OSDs go down that between them have both copies of an
object, it would be nice to have clients fail *immediately*.  I've
tried reducing the rados_osd_op_timeout setting to 0.5, but when
things go wrong, it still results in the collapse of all reads from
the cluster.

Reducing rados_osd_op_timeout down to 0.05 seems like a sure way to
cause more false positives.  But in reality, if an OSD operation can't
be served within 150ms, then it's missed the train by over an hour.
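
For reference, one way the three options above can be set from a
librados client before connecting (the values are only examples; the
same options can equally live in the [client] section of ceph.conf):

#include <rados/librados.h>

/* Create and connect a cluster handle with the timeouts applied. */
int connect_with_timeouts(rados_t *cluster)
{
    int r = rados_create(cluster, NULL);
    if (r < 0)
        return r;
    rados_conf_read_file(*cluster, NULL);     /* default ceph.conf */
    rados_conf_set(*cluster, "client_mount_timeout", "10");
    rados_conf_set(*cluster, "rados_mon_op_timeout", "5");
    rados_conf_set(*cluster, "rados_osd_op_timeout", "0.5");
    return rados_connect(*cluster);
}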

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';