Re: [ceph-users] Down OSDs blocking read requests.
On Fri, Nov 18, 2016 at 1:04 PM, Iain Buclaw wrote:
> On 18 November 2016 at 13:14, John Spray wrote:
>> On Fri, Nov 18, 2016 at 11:53 AM, Iain Buclaw wrote:
>>> Hi,
>>>
>>> Following up on the suggestion to use any of the following options:
>>>
>>> - client_mount_timeout
>>> - rados_mon_op_timeout
>>> - rados_osd_op_timeout
>>>
>>> to mitigate the time spent blocked on requests. Is there really no
>>> other way around this?
>>>
>>> If two OSDs go down that between them hold both copies of an object,
>>> it would be nice to have clients fail *immediately*. I've tried
>>> reducing the rados_osd_op_timeout setting to 0.5, but when things go
>>> wrong, it still results in the collapse of the cluster and all reads
>>> from it.
>>
>> Can you be more specific about what is happening when you set
>> rados_osd_op_timeout? You're not seeing timeouts at all, and
>> operations are blocking instead?
>>
>
> Certainly, they are timing out, but the problem is a numbers game.
>
> Let's say there are 8 client workers, and between them they are
> handling 250 requests per second. A DR situation happens and two OSDs
> go down, taking 60 PGs with them, belonging to a pool with 1024 PGs.
> Now you have a situation where 1 in every (1024 / 60) requests to Ceph
> will time out, eventually ending up with a situation where all clients
> are blocked waiting for either a response from the OSD or ETIMEDOUT.
>
>> If you can provide a short librados program that demonstrates an op
>> blocking indefinitely even when a timeout is set, that would be
>> useful.
>>
>
> It's not blocking indefinitely, but the fact that it's blocking at all
> is a concern. If a PG is down, there is no use waiting for it to come
> back up. Just give up on the read operation and notify the client
> immediately, rather than blocking the client from doing anything else.

OK, so you want a new behaviour where it cancels your requests when
OSDs go down, as opposed to timing out.
Clearly that's not what the current code does: you would have to modify
Ceph yourself to do this. Look at Objecter::_scan_requests to see how
it currently responds to osdmap updates that affect requests in flight
-- it scans through them to identify which ones need resending to a
different OSD. You would add an extra behaviour to identify requests
that are no longer serviceable, and cancel them.

John

> To clarify another position, it makes no sense to use AIO in my case.
> The clients in question are nginx worker threads, and they manage
> async processing between them. Where async doesn't happen is when the
> thread is stuck inside a stat() or read() call into librados.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
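[Editor's note: the point quoted above -- that one synchronous stat() or
read() freezes an nginx worker's whole event loop -- can be illustrated
with a small stand-alone sketch. This is plain Python, not librados:
blocking_read is a hypothetical stand-in for a synchronous librados
call, and the 0.5 s sleep plays the role of rados_osd_op_timeout.]

```python
import time

# Hypothetical stand-in for a synchronous librados stat()/read() that
# blocks until the op timeout (0.5 s here) fires.
def blocking_read(pg_is_down):
    if pg_is_down:
        time.sleep(0.5)      # the worker is stuck; its event loop is frozen
        raise TimeoutError("osd op timed out")
    return b"data"

# One event-loop worker handling a batch of requests in turn. A single
# request that hits a down PG delays every request queued behind it.
def serve_batch(requests):
    start = time.monotonic()
    results = []
    for pg_is_down in requests:
        try:
            results.append(blocking_read(pg_is_down))
        except TimeoutError:
            results.append(None)
    return results, time.monotonic() - start

# Nine fast requests plus one that hits a down PG: the whole batch
# takes at least the full 0.5 s timeout.
results, elapsed = serve_batch([False] * 5 + [True] + [False] * 4)
print(f"served {len(results)} requests in {elapsed:.2f}s")
```

With AIO the down-PG request would park without holding the thread,
which is exactly the behaviour the synchronous calls deny to nginx's
own async machinery.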
Re: [ceph-users] Down OSDs blocking read requests.
On 18 November 2016 at 13:14, John Spray wrote:
> On Fri, Nov 18, 2016 at 11:53 AM, Iain Buclaw wrote:
>> Hi,
>>
>> Following up on the suggestion to use any of the following options:
>>
>> - client_mount_timeout
>> - rados_mon_op_timeout
>> - rados_osd_op_timeout
>>
>> to mitigate the time spent blocked on requests. Is there really no
>> other way around this?
>>
>> If two OSDs go down that between them hold both copies of an object,
>> it would be nice to have clients fail *immediately*. I've tried
>> reducing the rados_osd_op_timeout setting to 0.5, but when things go
>> wrong, it still results in the collapse of the cluster and all reads
>> from it.
>
> Can you be more specific about what is happening when you set
> rados_osd_op_timeout? You're not seeing timeouts at all, and
> operations are blocking instead?
>

Certainly, they are timing out, but the problem is a numbers game.

Let's say there are 8 client workers, and between them they are
handling 250 requests per second. A DR situation happens and two OSDs
go down, taking 60 PGs with them, belonging to a pool with 1024 PGs.
Now you have a situation where 1 in every (1024 / 60) requests to Ceph
will time out, eventually ending up with a situation where all clients
are blocked waiting for either a response from the OSD or ETIMEDOUT.

> If you can provide a short librados program that demonstrates an op
> blocking indefinitely even when a timeout is set, that would be
> useful.
>

It's not blocking indefinitely, but the fact that it's blocking at all
is a concern. If a PG is down, there is no use waiting for it to come
back up. Just give up on the read operation and notify the client
immediately, rather than blocking the client from doing anything else.

To clarify another position, it makes no sense to use AIO in my case.
The clients in question are nginx worker threads, and they manage
async processing between them. Where async doesn't happen is when the
thread is stuck inside a stat() or read() call into librados.
--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
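[Editor's note: the "numbers game" above can be checked with a quick
back-of-envelope model. All figures -- 8 workers, 250 requests/s, 60 of
1024 PGs down, a 0.5 s timeout -- are taken from the example in this
thread.]

```python
# Back-of-envelope model of the failure mode described in the thread.
workers = 8
req_per_sec = 250
pgs_total = 1024
pgs_down = 60
timeout_s = 0.5

# Fraction of uniformly distributed reads that land on a down PG.
p_down = pgs_down / pgs_total            # ~0.059, i.e. 1 in (1024/60)

# Requests per second that will block until the timeout fires.
blocked_per_sec = req_per_sec * p_down   # ~14.6

# Worker-seconds per second consumed just waiting out timeouts.
# As this approaches the worker count, every worker ends up stuck.
wasted_worker_seconds = blocked_per_sec * timeout_s

print(f"p(down PG)           = {p_down:.3f}")
print(f"blocked requests/s   = {blocked_per_sec:.1f}")
print(f"workers busy waiting = {wasted_worker_seconds:.1f} / {workers}")
```

Roughly 7.3 of the 8 worker-seconds available each second go to waiting
out timeouts, which is why the whole client pool stalls even though
each individual request does eventually time out.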
Re: [ceph-users] Down OSDs blocking read requests.
On Fri, Nov 18, 2016 at 11:53 AM, Iain Buclaw wrote:
> Hi,
>
> Following up on the suggestion to use any of the following options:
>
> - client_mount_timeout
> - rados_mon_op_timeout
> - rados_osd_op_timeout
>
> to mitigate the time spent blocked on requests. Is there really no
> other way around this?
>
> If two OSDs go down that between them hold both copies of an object,
> it would be nice to have clients fail *immediately*. I've tried
> reducing the rados_osd_op_timeout setting to 0.5, but when things go
> wrong, it still results in the collapse of the cluster and all reads
> from it.

Can you be more specific about what is happening when you set
rados_osd_op_timeout? You're not seeing timeouts at all, and
operations are blocking instead?

If you can provide a short librados program that demonstrates an op
blocking indefinitely even when a timeout is set, that would be
useful.

John

>
> Reducing rados_osd_op_timeout down to 0.05 seems like a sure way to
> cause more false positives. But in reality, if an OSD operation can't
> be served within 150ms, then it's missed the train by over an hour.
>
> --
> Iain Buclaw
>
> *(p < e ? p++ : p) = (c & 0x0f) + '0';
[ceph-users] Down OSDs blocking read requests.
Hi,

Following up on the suggestion to use any of the following options:

- client_mount_timeout
- rados_mon_op_timeout
- rados_osd_op_timeout

to mitigate the time spent blocked on requests. Is there really no
other way around this?

If two OSDs go down that between them hold both copies of an object,
it would be nice to have clients fail *immediately*. I've tried
reducing the rados_osd_op_timeout setting to 0.5, but when things go
wrong, it still results in the collapse of the cluster and all reads
from it.

Reducing rados_osd_op_timeout down to 0.05 seems like a sure way to
cause more false positives. But in reality, if an OSD operation can't
be served within 150ms, then it's missed the train by over an hour.

--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
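[Editor's note: for reference, a minimal ceph.conf sketch of where the
three options named above live. The values are illustrative only, not
recommendations; timeouts are in seconds, and 0 leaves the timeout
disabled.]

```ini
[client]
# Fail RADOS operations against OSDs after 0.5 s instead of waiting.
rados osd op timeout = 0.5
# Fail monitor operations after 5 s.
rados mon op timeout = 5
# Give up connecting to the cluster after 30 s.
client mount timeout = 30
```

The same options can be set per-connection from a librados client via
its configuration interface rather than cluster-wide in ceph.conf.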