Re: More ondisk_finisher thread?

2015-08-06 Thread Ding Dinghua
Sorry for the noise.
I have found the cause in our setup and case: we gathered too many
logs in our RADOS IO path, and the latency seems to be
reasonable (about 0.026 ms) if we don't gather that many logs...
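A hypothetical sketch of the kind of guard that keeps log-message formatting
and I/O off the write path unless the configured debug level asks for it
(illustrative names only, not Ceph's actual dout/subsys machinery):

    #include <cstdint>
    #include <iostream>
    #include <sstream>
    #include <string>

    // Illustrative only: the point is that string building and the log sink
    // are skipped entirely when the debug level is low.
    static int g_debug_level = 0;   // e.g. 0 in production, 20 while debugging

    static void log_line(const std::string &msg) {
      std::cerr << msg << '\n';     // stand-in for the real log sink
    }

    void on_write_committed(uint64_t seq, double latency_us) {
      if (g_debug_level >= 20) {    // guard: formatting only runs when enabled
        std::ostringstream ss;
        ss << "op seq=" << seq << " commit latency=" << latency_us << " us";
        log_line(ss.str());
      }
      // ... rest of the completion work ...
    }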

2015-08-05 20:29 GMT+08:00 Sage Weil s...@newdream.net:
 On Wed, 5 Aug 2015, Ding Dinghua wrote:
 2015-08-05 0:13 GMT+08:00 Somnath Roy somnath@sandisk.com:
  Yes, it has to re-acquire pg_lock today..
  But, between the journal write and initiating the ondisk ack, there is one
  context switch in the code path. So, I guess the pg_lock is not the only
  one that is causing this 1 ms delay...
  Not sure increasing the finisher threads will help in the pg_lock case, as
  it will be more or less serialized by this pg_lock..
 My concern is that if the pg lock of pg A has been grabbed, not only is
 the ondisk callback of pg A delayed; since ondisk_finisher has only one
 thread, the ondisk callbacks of other pgs will be delayed too.

 I wonder if an optimistic approach might help here by making the
 completion synchronous and doing something like

    if (pg->lock.TryLock()) {
      pg->_finish_thing(completion->op);
      delete completion;
    } else {
      finisher.queue(completion);
    }

 or whatever.  We'd need to ensure that we aren't holding any lock or
 throttle budget that the pg could deadlock against.

 sage



-- 
Ding Dinghua


Re: More ondisk_finisher thread?

2015-08-05 Thread Sage Weil
On Wed, 5 Aug 2015, Ding Dinghua wrote:
 2015-08-05 0:13 GMT+08:00 Somnath Roy somnath@sandisk.com:
  Yes, it has to re-acquire pg_lock today..
  But, between the journal write and initiating the ondisk ack, there is one
  context switch in the code path. So, I guess the pg_lock is not the only
  one that is causing this 1 ms delay...
  Not sure increasing the finisher threads will help in the pg_lock case, as
  it will be more or less serialized by this pg_lock..
 My concern is that if the pg lock of pg A has been grabbed, not only is
 the ondisk callback of pg A delayed; since ondisk_finisher has only one
 thread, the ondisk callbacks of other pgs will be delayed too.

I wonder if an optimistic approach might help here by making the 
completion synchronous and doing something like

   if (pg->lock.TryLock()) {
     pg->_finish_thing(completion->op);
     delete completion;
   } else {
     finisher.queue(completion);
   }

or whatever.  We'd need to ensure that we aren't holding any lock or 
throttle budget that the pg could deadlock against.
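A minimal, self-contained sketch of that try-lock-or-queue shape, using
std::mutex and toy stand-in types rather than the actual Ceph PG, Completion
and Finisher classes (and, as above, the caller must not be holding any lock
or throttle budget the pg path could deadlock against):

    #include <functional>
    #include <mutex>
    #include <queue>

    struct Completion { std::function<void()> fn; };

    struct Finisher {
      std::mutex qlock;
      std::queue<Completion*> q;          // drained by a separate finisher thread
      void queue(Completion *c) {
        std::lock_guard<std::mutex> l(qlock);
        q.push(c);
      }
    };

    struct PG {
      std::mutex lock;                    // the per-PG lock
    };

    void on_journal_commit(PG *pg, Completion *c, Finisher *finisher) {
      if (pg->lock.try_lock()) {
        // Fast path: nobody holds the PG lock, so run the on-disk callback
        // inline and skip the finisher round trip entirely.
        c->fn();
        pg->lock.unlock();
        delete c;
      } else {
        // Slow path: the PG is busy; fall back to the finisher so the journal
        // completion thread is not blocked waiting on the PG lock.
        finisher->queue(c);
      }
    }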

sage


More ondisk_finisher thread?

2015-08-04 Thread Ding Dinghua
Hi:
   Now we are doing some Ceph performance tuning work. Our setup has
ten Ceph nodes, with an SSD for the journal and an HDD for the
filestore, and the Ceph version is 0.80.9.
   We run fio in a virtual machine with a random 4 KB write workload,
and we find that ondisk_finisher takes about 1 ms on average, while
the journal write only takes 0.4 ms, which seems unreasonable.
Since the ondisk callback is called with the pg lock held, if the pg
lock has been grabbed by another thread (for example, osd->op_wq),
all ondisk callbacks will be delayed, and then all write ops will be
delayed.
 I found that op_commit must be called with the pg lock held, so what
about increasing the number of ondisk_finisher threads, so that
ondisk callbacks are less likely to be delayed?
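For illustration only, one way to read "more ondisk_finisher threads" is to
shard completions by pg, so per-pg ordering of acks is kept while a held pg
lock only stalls its own shard; the types and the modulo hash below are
placeholders, not the 0.80.9 code:

    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <vector>

    struct Context { std::function<void()> fn; };

    struct Finisher {                    // imagine one worker thread per instance
      void queue(Context *c) { (void)c; /* hand off to this shard's thread */ }
    };

    struct ShardedOndiskFinisher {
      std::vector<Finisher> shards;
      explicit ShardedOndiskFinisher(size_t n) : shards(n) {}

      void queue(uint64_t pg_id, Context *c) {
        // The same pg always maps to the same shard, so its ondisk callbacks
        // still complete in order; other pgs land on other threads.
        shards[pg_id % shards.size()].queue(c);
      }
    };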

-- 
Ding Dinghua


Re: More ondisk_finisher thread?

2015-08-04 Thread Haomai Wang
It's interesting that ondisk_finisher takes 1 ms. Could you replay this
workload and check with iostat whether there is any read IO? I guess that
may help to find the cause.

On Wed, Aug 5, 2015 at 12:13 AM, Somnath Roy somnath@sandisk.com wrote:
 Yes, it has to re-acquire pg_lock today..
 But, between the journal write and initiating the ondisk ack, there is one
 context switch in the code path. So, I guess the pg_lock is not the only one
 that is causing this 1 ms delay...
 Not sure increasing the finisher threads will help in the pg_lock case, as it
 will be more or less serialized by this pg_lock..
 But, increasing finisher threads for the other context switches I was talking
 about (see queue_completion_thru) may help...

 Thanks & Regards
 Somnath

 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org 
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Ding Dinghua
 Sent: Tuesday, August 04, 2015 3:00 AM
 To: ceph-devel@vger.kernel.org
 Subject: More ondisk_finisher thread?

 Hi:
    Now we are doing some Ceph performance tuning work. Our setup has ten Ceph
 nodes, with an SSD for the journal and an HDD for the filestore, and the Ceph
 version is 0.80.9.
    We run fio in a virtual machine with a random 4 KB write workload, and we
 find that ondisk_finisher takes about 1 ms on average, while the journal write
 only takes 0.4 ms, which seems unreasonable.
 Since the ondisk callback is called with the pg lock held, if the pg lock has
 been grabbed by another thread (for example, osd->op_wq), all ondisk callbacks
 will be delayed, and then all write ops will be delayed.
  I found that op_commit must be called with the pg lock held, so what about
 increasing the number of ondisk_finisher threads, so that ondisk callbacks are
 less likely to be delayed?

 --
 Ding Dinghua




-- 
Best Regards,

Wheat


Re: More ondisk_finisher thread?

2015-08-04 Thread Ding Dinghua
Please see the comment below:

2015-08-05 0:13 GMT+08:00 Somnath Roy somnath@sandisk.com:
 Yes, it has to re-acquire pg_lock today..
 But, between the journal write and initiating the ondisk ack, there is one
 context switch in the code path. So, I guess the pg_lock is not the only one
 that is causing this 1 ms delay...
 Not sure increasing the finisher threads will help in the pg_lock case, as it
 will be more or less serialized by this pg_lock..
My concern is that if the pg lock of pg A has been grabbed, not only is
the ondisk callback of pg A delayed; since ondisk_finisher has only one
thread, the ondisk callbacks of other pgs will be delayed too.
 But, increasing finisher threads for the other context switches I was talking 
 about (see queue_completion_thru) may help...
We also measured that latency, and it doesn't take much time in our case.

-- 
Ding Dinghua


RE: More ondisk_finisher thread?

2015-08-04 Thread Somnath Roy
Yes, it has to re-acquire pg_lock today..
But, between the journal write and initiating the ondisk ack, there is one
context switch in the code path. So, I guess the pg_lock is not the only one
that is causing this 1 ms delay...
Not sure increasing the finisher threads will help in the pg_lock case, as it
will be more or less serialized by this pg_lock..
But, increasing finisher threads for the other context switches I was talking
about (see queue_completion_thru) may help...

Thanks & Regards
Somnath

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Ding Dinghua
Sent: Tuesday, August 04, 2015 3:00 AM
To: ceph-devel@vger.kernel.org
Subject: More ondisk_finisher thread?

Hi:
   Now we are doing some Ceph performance tuning work. Our setup has ten Ceph
nodes, with an SSD for the journal and an HDD for the filestore, and the Ceph
version is 0.80.9.
   We run fio in a virtual machine with a random 4 KB write workload, and we
find that ondisk_finisher takes about 1 ms on average, while the journal write
only takes 0.4 ms, which seems unreasonable.
Since the ondisk callback is called with the pg lock held, if the pg lock has
been grabbed by another thread (for example, osd->op_wq), all ondisk callbacks
will be delayed, and then all write ops will be delayed.
 I found that op_commit must be called with the pg lock held, so what about
increasing the number of ondisk_finisher threads, so that ondisk callbacks are
less likely to be delayed?

--
Ding Dinghua