Re: More ondisk_finisher thread?
Sorry for the noise. I have found the cause in our setup and case: we gathered too many logs in our RADOS IO path, and the latency looks reasonable (about 0.026 ms) if we don't gather that many logs...

2015-08-05 20:29 GMT+08:00 Sage Weil <s...@newdream.net>:
> On Wed, 5 Aug 2015, Ding Dinghua wrote:
> > 2015-08-05 0:13 GMT+08:00 Somnath Roy <somnath@sandisk.com>:
> > > [...]
> >
> > My concern is, if the pg lock of pg A has been grabbed, not only is the
> > ondisk callback of pg A delayed: since ondisk_finisher has only one
> > thread, the ondisk callbacks of other pgs will be delayed too.
>
> I wonder if an optimistic approach might help here, by making the
> completion synchronous and doing something like
>
>     if (pg->lock.TryLock()) {
>       pg->_finish_thing(completion->op);
>       delete completion;
>     } else {
>       finisher.queue(completion);
>     }
>
> or whatever. We'd need to ensure that we aren't holding any lock or
> throttle budget that the pg could deadlock against.
>
> sage

--
Ding Dinghua
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: More ondisk_finisher thread?
On Wed, 5 Aug 2015, Ding Dinghua wrote:
> 2015-08-05 0:13 GMT+08:00 Somnath Roy <somnath@sandisk.com>:
> > [...]
>
> My concern is, if the pg lock of pg A has been grabbed, not only is the
> ondisk callback of pg A delayed: since ondisk_finisher has only one
> thread, the ondisk callbacks of other pgs will be delayed too.

I wonder if an optimistic approach might help here, by making the completion synchronous and doing something like

    if (pg->lock.TryLock()) {
      pg->_finish_thing(completion->op);
      delete completion;
    } else {
      finisher.queue(completion);
    }

or whatever. We'd need to ensure that we aren't holding any lock or throttle budget that the pg could deadlock against.

sage
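Sage's fragment is pseudocode against Ceph internals; the idea can be sketched standalone as below. PG, Completion, Finisher, and complete_ondisk here are simplified stand-ins (std::mutex for the pg lock, a bare queue for the finisher thread), not Ceph's actual classes.

```cpp
#include <atomic>       // used only by the usage example below
#include <cassert>      // used only by the usage example below
#include <functional>
#include <mutex>
#include <queue>
#include <thread>       // used only by the usage example below

// Simplified stand-ins for Ceph's types, for illustration only.
struct Completion {
    std::function<void()> fn;           // the ondisk callback
};

struct Finisher {
    std::queue<Completion*> q;          // drained by a separate finisher thread
    void queue(Completion* c) { q.push(c); }
};

struct PG {
    std::mutex lock;                    // the pg lock
    int finished = 0;
    void _finish_thing(Completion* c) { c->fn(); ++finished; }
};

// The optimistic path: run the completion inline if the pg lock happens to
// be free, otherwise fall back to queueing it for the finisher as today.
void complete_ondisk(PG& pg, Finisher& f, Completion* c) {
    if (pg.lock.try_lock()) {           // fast path: no contention
        pg._finish_thing(c);
        pg.lock.unlock();
        delete c;
    } else {                            // slow path: defer, never block here
        f.queue(c);
    }
}
```

The caveat from the mail applies unchanged: the inline path runs the callback in the caller's thread (e.g. the journal completion thread), so that thread must not be holding any lock or throttle budget the pg callback could block against.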
More ondisk_finisher thread?
Hi:

We are doing some Ceph performance tuning work. Our setup has ten Ceph nodes, with SSD as journal and HDD for the filestore; the Ceph version is 0.80.9.

We ran fio in a virtual machine with a random 4 KB write workload, and found that ondisk_finisher took about 1 ms on average, while the journal write took only 0.4 ms, which seems unreasonable.

Since the ondisk callback is called with the pg lock held, if the pg lock has been grabbed by another thread (for example, osd->op_wq), all ondisk callbacks will be delayed, and then all write ops will be delayed. I found that op_commit must be called with the pg lock held, so what about increasing the ondisk_finisher thread count, so that ondisk callbacks are less likely to be delayed?

--
Ding Dinghua
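The proposal above — more ondisk_finisher threads, so one held pg lock does not stall the callbacks of unrelated pgs — could look roughly like the following sharded finisher. This is a hypothetical sketch, not Ceph's actual Finisher class: completions are hashed by pg id onto one of N worker threads, which preserves per-pg completion ordering (a plain thread pool would not) while letting different pgs make progress independently.

```cpp
#include <atomic>             // used only by the usage example below
#include <cassert>            // used only by the usage example below
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ShardedFinisher {
    struct Shard {
        std::mutex m;
        std::condition_variable cv;
        std::queue<std::function<void()>> q;
        bool stopping = false;
        std::thread t;
    };
    std::vector<Shard> shards;

public:
    explicit ShardedFinisher(std::size_t n) : shards(n) {
        for (auto& s : shards)
            s.t = std::thread([&s] {
                std::unique_lock<std::mutex> l(s.m);
                while (!s.stopping || !s.q.empty()) {
                    if (s.q.empty()) { s.cv.wait(l); continue; }
                    auto fn = std::move(s.q.front());
                    s.q.pop();
                    l.unlock();
                    fn();               // run the callback outside the shard lock
                    l.lock();
                }
            });
    }

    // The same pg id always lands on the same shard, so completions for one
    // pg stay ordered; different pgs can complete concurrently.
    void queue(std::uint64_t pgid, std::function<void()> fn) {
        Shard& s = shards[pgid % shards.size()];
        { std::lock_guard<std::mutex> l(s.m); s.q.push(std::move(fn)); }
        s.cv.notify_one();
    }

    ~ShardedFinisher() {                // drain remaining work, then join
        for (auto& s : shards) {
            { std::lock_guard<std::mutex> l(s.m); s.stopping = true; }
            s.cv.notify_one();
        }
        for (auto& s : shards) s.t.join();
    }
};
```

On destruction each worker drains its remaining completions before exiting, so queued callbacks are never dropped.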
Re: More ondisk_finisher thread?
It's interesting that ondisk_finisher takes 1 ms. Could you replay this workload and check with iostat whether there is any read IO? I guess that may help to find the cause.

On Wed, Aug 5, 2015 at 12:13 AM, Somnath Roy <somnath@sandisk.com> wrote:
> Yes, it has to re-acquire pg_lock today. But, between the journal write and
> initiating the ondisk ack, there is one context switch in the code path. So,
> I guess the pg_lock is not the only thing causing this 1 ms delay... Not sure
> increasing the finisher threads will help in the pg_lock case, as it will be
> more or less serialized by this pg_lock. But, increasing finisher threads for
> the other context switches I was talking about (see queue_completion_thru)
> may help...
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Ding Dinghua
> Sent: Tuesday, August 04, 2015 3:00 AM
> To: ceph-devel@vger.kernel.org
> Subject: More ondisk_finisher thread?
>
> [...]
>
> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).

--
Best Regards,
Wheat
Re: More ondisk_finisher thread?
Please see the comments below:

2015-08-05 0:13 GMT+08:00 Somnath Roy <somnath@sandisk.com>:
> Yes, it has to re-acquire pg_lock today. But, between the journal write and
> initiating the ondisk ack, there is one context switch in the code path. So,
> I guess the pg_lock is not the only thing causing this 1 ms delay... Not sure
> increasing the finisher threads will help in the pg_lock case, as it will be
> more or less serialized by this pg_lock.

My concern is, if the pg lock of pg A has been grabbed, not only is the ondisk callback of pg A delayed: since ondisk_finisher has only one thread, the ondisk callbacks of other pgs will be delayed too.

> But, increasing finisher threads for the other context switches I was talking
> about (see queue_completion_thru) may help...

We also counted that latency, and it didn't take much time in our case.

--
Ding Dinghua
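The head-of-line blocking described in this exchange — one held pg lock delaying the ondisk callbacks of every other pg queued behind it — can be demonstrated with a toy single-threaded finisher. The names below (lock_a/lock_b for pg locks, a FIFO for ondisk_finisher) are illustrative mocks, not Ceph code:

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Runs two ondisk callbacks through a single finisher thread while pg A's
// lock is held elsewhere, and returns the completion order. pg B's callback
// cannot run early even though lock_b is free the whole time: it is stuck in
// the FIFO behind pg A's blocked callback.
std::vector<char> single_finisher_demo() {
    std::mutex lock_a, lock_b;              // stand-ins for the two pg locks
    std::vector<char> done;                 // completion order

    lock_a.lock();                          // e.g. osd->op_wq holds pg A's lock

    std::queue<std::function<void()>> fifo; // the single ondisk_finisher queue
    fifo.push([&] { std::lock_guard<std::mutex> l(lock_a); done.push_back('A'); });
    fifo.push([&] { std::lock_guard<std::mutex> l(lock_b); done.push_back('B'); });

    std::thread finisher([&] {              // lone finisher thread, strict FIFO
        while (!fifo.empty()) { fifo.front()(); fifo.pop(); }
    });

    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    assert(done.empty());                   // B is delayed by A's pg lock

    lock_a.unlock();                        // pg A's lock released; both run
    finisher.join();
    return done;
}
```

Nothing can complete until lock_a is released, which is exactly why every write op's ack latency inflates when any one pg lock is contended.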
RE: More ondisk_finisher thread?
Yes, it has to re-acquire pg_lock today. But, between the journal write and initiating the ondisk ack, there is one context switch in the code path. So, I guess the pg_lock is not the only thing causing this 1 ms delay...

Not sure increasing the finisher threads will help in the pg_lock case, as it will be more or less serialized by this pg_lock. But increasing finisher threads for the other context switches I was talking about (see queue_completion_thru) may help...

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Ding Dinghua
Sent: Tuesday, August 04, 2015 3:00 AM
To: ceph-devel@vger.kernel.org
Subject: More ondisk_finisher thread?

[...]