[PATCH V1] block: Add blk_rq_pos(rq) to sort rq when flushing plug-list.
My workload is a raid5 array with 16 disks, to which our filesystem writes in direct-io mode. Using blktrace, I found these messages:

8,16 0 6647 2.453665504 2579 M W 7493152 + 8 [md0_raid5]
8,16 0 6648 2.453672411 2579 Q W 7493160 + 8 [md0_raid5]
8,16 0 6649 2.453672606 2579 M W 7493160 + 8 [md0_raid5]
8,16 0 6650 2.453679255 2579 Q W 7493168 + 8 [md0_raid5]
8,16 0 6651 2.453679441 2579 M W 7493168 + 8 [md0_raid5]
8,16 0 6652 2.453685948 2579 Q W 7493176 + 8 [md0_raid5]
8,16 0 6653 2.453686149 2579 M W 7493176 + 8 [md0_raid5]
8,16 0 6654 2.453693074 2579 Q W 7493184 + 8 [md0_raid5]
8,16 0 6655 2.453693254 2579 M W 7493184 + 8 [md0_raid5]
8,16 0 6656 2.453704290 2579 Q W 7493192 + 8 [md0_raid5]
8,16 0 6657 2.453704482 2579 M W 7493192 + 8 [md0_raid5]
8,16 0 6658 2.453715016 2579 Q W 7493200 + 8 [md0_raid5]
8,16 0 6659 2.453715247 2579 M W 7493200 + 8 [md0_raid5]
8,16 0 6660 2.453721730 2579 Q W 7493208 + 8 [md0_raid5]
8,16 0 6661 2.453721974 2579 M W 7493208 + 8 [md0_raid5]
8,16 0 6662 2.453728202 2579 Q W 7493216 + 8 [md0_raid5]
8,16 0 6663 2.453728436 2579 M W 7493216 + 8 [md0_raid5]
8,16 0 6664 2.453734782 2579 Q W 7493224 + 8 [md0_raid5]
8,16 0 6665 2.453735019 2579 M W 7493224 + 8 [md0_raid5]
8,16 0      2.453741401 2579 Q W 7493232 + 8 [md0_raid5]
8,16 0 6667 2.453741632 2579 M W 7493232 + 8 [md0_raid5]
8,16 0 6668 2.453748148 2579 Q W 7493240 + 8 [md0_raid5]
8,16 0 6669 2.453748386 2579 M W 7493240 + 8 [md0_raid5]
8,16 0 6670 2.453851843 2579 I W 7493144 + 104 [md0_raid5]
8,16 0 0 2.453853661 0 m N cfq2579 insert_request
8,16 0 6671 2.453854064 2579 I W 7493120 + 24 [md0_raid5]
8,16 0 0 2.453854439 0 m N cfq2579 insert_request
8,16 0 6672 2.453854793 2579 U N [md0_raid5] 2
8,16 0 0 2.453855513 0 m N cfq2579 Not idling. st->count:1
8,16 0 0 2.453855927 0 m N cfq2579 dispatch_insert
8,16 0 0 2.453861771 0 m N cfq2579 dispatched a request
8,16 0 0 2.453862248 0 m N cfq2579 activate rq, drv=1
8,16 0 6673 2.453862332 2579 D W 7493120 + 24 [md0_raid5]
8,16 0 0 2.453865957 0 m N cfq2579 Not idling. st->count:1
8,16 0 0 2.453866269 0 m N cfq2579 dispatch_insert
8,16 0 0 2.453866707 0 m N cfq2579 dispatched a request
8,16 0 0 2.453867061 0 m N cfq2579 activate rq, drv=2
8,16 0 6674 2.453867145 2579 D W 7493144 + 104 [md0_raid5]
8,16 0 6675 2.454147608 0 C W 7493120 + 24 [0]
8,16 0 0 2.454149357 0 m N cfq2579 complete rqnoidle 0
8,16 0 6676 2.454791505 0 C W 7493144 + 104 [0]
8,16 0 0 2.454794803 0 m N cfq2579 complete rqnoidle 0
8,16 0 0 2.454795160 0 m N cfq schedule dispatch

From the messages above we can see that rq [W 7493144 + 104] and rq [W 7493120 + 24] do not merge, because the bio order is:

8,16 0 6638 2.453619407 2579 Q W 7493144 + 8 [md0_raid5]
8,16 0 6639 2.453620460 2579 G W 7493144 + 8 [md0_raid5]
8,16 0 6640 2.453639311 2579 Q W 7493120 + 8 [md0_raid5]
8,16 0 6641 2.453639842 2579 G W 7493120 + 8 [md0_raid5]

bio(7493144) comes first and bio(7493120) later, so the subsequent bios are divided into two parts. When the plug list is flushed, elv_attempt_insert_merge() only supports back merging, not front merging, so rq [7493120 + 24] cannot merge with rq [7493144 + 104]. From my tests, this situation accounts for about 25% on our system. With this patch applied, the situation no longer occurs.

Signed-off-by: Jianpeng Ma
CC: Shaohua Li
---
 block/blk-core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a33870b..3c95c4d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2868,7 +2868,8 @@ static int plug_rq_cmp(void *priv, struct list_head *a, struct list_head *b)
 	struct request *rqa = container_of(a, struct request, queuelist);
 	struct request *rqb = container_of(b, struct request, queuelist);
 
-	return !(rqa->q <= rqb->q);
+	return !(rqa->q < rqb->q ||
+		 (rqa->q == rqb->q && blk_rq_pos(rqa) < blk_rq_pos(rqb)));
 }
 
 /*
--
1.7.9.5
Re: Re: [PATCH] block: Add blk_rq_pos(rq) to sort rq when flushing plug-list.
On 2012-10-16 15:48 Shaohua Li Wrote:
>2012/10/16 Jianpeng Ma :
>> On 2012-10-15 21:18 Shaohua Li Wrote:
>>>2012/10/15 Shaohua Li :
>>>> 2012/10/15 Jianpeng Ma :
>>>>> My workload is a raid5 which had 16 disks. And used our filesystem to
>>>>> write using direct-io mode.
>>>>> I used the blktrace to find those message:
>>>>>
>>>>> 8,16 0 3570 1.083923979 2519 I W 144323176 + 24 [md127_raid5]
>>>>> 8,16 0 0 1.083926214 0 m N cfq2519 insert_request
>>>>> 8,16 0 3571 1.083926586 2519 I W 144323072 + 104 [md127_raid5]
>>>>> 8,16 0 0 1.083926952 0 m N cfq2519 insert_request
>>>>> 8,16 0 3572 1.083927180 2519 U N [md127_raid5] 2
>>>>> 8,16 0 0 1.083927870 0 m N cfq2519 Not idling. st->count:1
>>>>> 8,16 0 0 1.083928320 0 m N cfq2519 dispatch_insert
>>>>> 8,16 0 0 1.083928951 0 m N cfq2519 dispatched a request
>>>>> 8,16 0 0 1.083929443 0 m N cfq2519 activate rq, drv=1
>>>>> 8,16 0 3573 1.083929530 2519 D W 144323176 + 24 [md127_raid5]
>>>>> 8,16 0 0 1.083933883 0 m N cfq2519 Not idling. st->count:1
>>>>> 8,16 0 0 1.083934189 0 m N cfq2519 dispatch_insert
>>>>> 8,16 0 0 1.083934654 0 m N cfq2519 dispatched a request
>>>>> 8,16 0 0 1.083935014 0 m N cfq2519 activate rq, drv=2
>>>>> 8,16 0 3574 1.083935101 2519 D W 144323072 + 104 [md127_raid5]
>>>>> 8,16 0 3575 1.084196179 0 C W 144323176 + 24 [0]
>>>>> 8,16 0 0 1.084197979 0 m N cfq2519 complete rqnoidle 0
>>>>> 8,16 0 3576 1.084769073 0 C W 144323072 + 104 [0]
>>>>> ..
>>>>> 8,16 1 3596 1.091394357 2519 I W 144322544 + 16 [md127_raid5]
>>>>> 8,16 1 0 1.091396181 0 m N cfq2519 insert_request
>>>>> 8,16 1 3597 1.091396571 2519 I W 144322520 + 24 [md127_raid5]
>>>>> 8,16 1 0 1.091396934 0 m N cfq2519 insert_request
>>>>> 8,16 1 3598 1.091397165 2519 I W 144322488 + 32 [md127_raid5]
>>>>> 8,16 1 0 1.091397477 0 m N cfq2519 insert_request
>>>>> 8,16 1 3599 1.091397708 2519 I W 144322432 + 56 [md127_raid5]
>>>>> 8,16 1 0 1.091398023 0 m N cfq2519 insert_request
>>>>> 8,16 1 3600 1.091398284 2519 U N [md127_raid5] 4
>>>>> 8,16 1 0 1.091398986 0 m N cfq2519 Not idling. st->count:1
>>>>> 8,16 1 0 1.091399511 0 m N cfq2519 dispatch_insert
>>>>> 8,16 1 0 1.091400217 0 m N cfq2519 dispatched a request
>>>>> 8,16 1 0 1.091400688 0 m N cfq2519 activate rq, drv=1
>>>>> 8,16 1 3601 1.091400766 2519 D W 144322544 + 16 [md127_raid5]
>>>>> 8,16 1 0 1.091406151 0 m N cfq2519 Not idling. st->count:1
>>>>> 8,16 1 0 1.091406460 0 m N cfq2519 dispatch_insert
>>>>> 8,16 1 0 1.091406931 0 m N cfq2519 dispatched a request
>>>>> 8,16 1 0 1.091407291 0 m N cfq2519 activate rq, drv=2
>>>>> 8,16 1 3602 1.091407378 2519 D W 144322520 + 24 [md127_raid5]
>>>>> 8,16 1 0 1.091414006 0 m N cfq2519 Not idling. st->count:1
>>>>> 8,16 1 0 1.091414297 0 m N cfq2519 dispatch_insert
>>>>> 8,16 1 0 1.091414702 0 m N cfq2519 dispatched a request
>>>>> 8,16 1 0 1.091415047 0 m N cfq2519 activate rq, drv=3
>>>>> 8,16 1 3603 1.091415125 2519 D W 144322488 + 32 [md127_raid5]
>>>>> 8,16 1 0 1.091416469 0 m N cfq2519 Not id
Re: Re: [PATCH] block: Add blk_rq_pos(rq) to sort rq when flushing plug-list.
On 2012-10-15 21:18 Shaohua Li Wrote:
>2012/10/15 Shaohua Li :
>> 2012/10/15 Jianpeng Ma :
>>> My workload is a raid5 which had 16 disks. And used our filesystem to
>>> write using direct-io mode.
>>> I used the blktrace to find those message:
>>>
>>> 8,16 0 3570 1.083923979 2519 I W 144323176 + 24 [md127_raid5]
>>> 8,16 0 0 1.083926214 0 m N cfq2519 insert_request
>>> 8,16 0 3571 1.083926586 2519 I W 144323072 + 104 [md127_raid5]
>>> 8,16 0 0 1.083926952 0 m N cfq2519 insert_request
>>> 8,16 0 3572 1.083927180 2519 U N [md127_raid5] 2
>>> 8,16 0 0 1.083927870 0 m N cfq2519 Not idling. st->count:1
>>> 8,16 0 0 1.083928320 0 m N cfq2519 dispatch_insert
>>> 8,16 0 0 1.083928951 0 m N cfq2519 dispatched a request
>>> 8,16 0 0 1.083929443 0 m N cfq2519 activate rq, drv=1
>>> 8,16 0 3573 1.083929530 2519 D W 144323176 + 24 [md127_raid5]
>>> 8,16 0 0 1.083933883 0 m N cfq2519 Not idling. st->count:1
>>> 8,16 0 0 1.083934189 0 m N cfq2519 dispatch_insert
>>> 8,16 0 0 1.083934654 0 m N cfq2519 dispatched a request
>>> 8,16 0 0 1.083935014 0 m N cfq2519 activate rq, drv=2
>>> 8,16 0 3574 1.083935101 2519 D W 144323072 + 104 [md127_raid5]
>>> 8,16 0 3575 1.084196179 0 C W 144323176 + 24 [0]
>>> 8,16 0 0 1.084197979 0 m N cfq2519 complete rqnoidle 0
>>> 8,16 0 3576 1.084769073 0 C W 144323072 + 104 [0]
>>> ..
>>> 8,16 1 3596 1.091394357 2519 I W 144322544 + 16 [md127_raid5]
>>> 8,16 1 0 1.091396181 0 m N cfq2519 insert_request
>>> 8,16 1 3597 1.091396571 2519 I W 144322520 + 24 [md127_raid5]
>>> 8,16 1 0 1.091396934 0 m N cfq2519 insert_request
>>> 8,16 1 3598 1.091397165 2519 I W 144322488 + 32 [md127_raid5]
>>> 8,16 1 0 1.091397477 0 m N cfq2519 insert_request
>>> 8,16 1 3599 1.091397708 2519 I W 144322432 + 56 [md127_raid5]
>>> 8,16 1 0 1.091398023 0 m N cfq2519 insert_request
>>> 8,16 1 3600 1.091398284 2519 U N [md127_raid5] 4
>>> 8,16 1 0 1.091398986 0 m N cfq2519 Not idling. st->count:1
>>> 8,16 1 0 1.091399511 0 m N cfq2519 dispatch_insert
>>> 8,16 1 0 1.091400217 0 m N cfq2519 dispatched a request
>>> 8,16 1 0 1.091400688 0 m N cfq2519 activate rq, drv=1
>>> 8,16 1 3601 1.091400766 2519 D W 144322544 + 16 [md127_raid5]
>>> 8,16 1 0 1.091406151 0 m N cfq2519 Not idling. st->count:1
>>> 8,16 1 0 1.091406460 0 m N cfq2519 dispatch_insert
>>> 8,16 1 0 1.091406931 0 m N cfq2519 dispatched a request
>>> 8,16 1 0 1.091407291 0 m N cfq2519 activate rq, drv=2
>>> 8,16 1 3602 1.091407378 2519 D W 144322520 + 24 [md127_raid5]
>>> 8,16 1 0 1.091414006 0 m N cfq2519 Not idling. st->count:1
>>> 8,16 1 0 1.091414297 0 m N cfq2519 dispatch_insert
>>> 8,16 1 0 1.091414702 0 m N cfq2519 dispatched a request
>>> 8,16 1 0 1.091415047 0 m N cfq2519 activate rq, drv=3
>>> 8,16 1 3603 1.091415125 2519 D W 144322488 + 32 [md127_raid5]
>>> 8,16 1 0 1.091416469 0 m N cfq2519 Not idling. st->count:1
>>> 8,16 1 0 1.091416754 0 m N cfq2519 dispatch_insert
>>> 8,16 1 0 1.091417186 0 m N cfq2519 dispatched a request
>>> 8,16 1 0 1.091417535 0 m N cfq2519 activate rq, drv=4
>>> 8,16 1 3604 1.091417628 2519 D W 144322432 + 56 [md127_raid5]
>>> 8,16 1 3605 1.091857225 4393 C W 144322544 + 16 [0]
>>> 8,16 1 0 1.091858753 0 m N cfq2519 complete rqnoidle 0
>>> 8,16 1 3606 1.092068456 4393 C W 144322520 + 24 [0]
>>> 8,16 1 0 1.092069851 0 m N cfq2519 complete rqnoidle 0
>>> 8,16 1 3607 1.092350440 4393 C W 144322488 + 32 [0]
>>> 8,16 1 0 1.092351688 0 m N cfq2519 complete rq
[PATCH] block: Add blk_rq_pos(rq) to sort rq when flushing plug-list.
My workload is a raid5 array with 16 disks, to which our filesystem writes in direct-io mode. Using blktrace, I found these messages:

8,16 0 3570 1.083923979 2519 I W 144323176 + 24 [md127_raid5]
8,16 0 0 1.083926214 0 m N cfq2519 insert_request
8,16 0 3571 1.083926586 2519 I W 144323072 + 104 [md127_raid5]
8,16 0 0 1.083926952 0 m N cfq2519 insert_request
8,16 0 3572 1.083927180 2519 U N [md127_raid5] 2
8,16 0 0 1.083927870 0 m N cfq2519 Not idling. st->count:1
8,16 0 0 1.083928320 0 m N cfq2519 dispatch_insert
8,16 0 0 1.083928951 0 m N cfq2519 dispatched a request
8,16 0 0 1.083929443 0 m N cfq2519 activate rq, drv=1
8,16 0 3573 1.083929530 2519 D W 144323176 + 24 [md127_raid5]
8,16 0 0 1.083933883 0 m N cfq2519 Not idling. st->count:1
8,16 0 0 1.083934189 0 m N cfq2519 dispatch_insert
8,16 0 0 1.083934654 0 m N cfq2519 dispatched a request
8,16 0 0 1.083935014 0 m N cfq2519 activate rq, drv=2
8,16 0 3574 1.083935101 2519 D W 144323072 + 104 [md127_raid5]
8,16 0 3575 1.084196179 0 C W 144323176 + 24 [0]
8,16 0 0 1.084197979 0 m N cfq2519 complete rqnoidle 0
8,16 0 3576 1.084769073 0 C W 144323072 + 104 [0]
..
8,16 1 3596 1.091394357 2519 I W 144322544 + 16 [md127_raid5]
8,16 1 0 1.091396181 0 m N cfq2519 insert_request
8,16 1 3597 1.091396571 2519 I W 144322520 + 24 [md127_raid5]
8,16 1 0 1.091396934 0 m N cfq2519 insert_request
8,16 1 3598 1.091397165 2519 I W 144322488 + 32 [md127_raid5]
8,16 1 0 1.091397477 0 m N cfq2519 insert_request
8,16 1 3599 1.091397708 2519 I W 144322432 + 56 [md127_raid5]
8,16 1 0 1.091398023 0 m N cfq2519 insert_request
8,16 1 3600 1.091398284 2519 U N [md127_raid5] 4
8,16 1 0 1.091398986 0 m N cfq2519 Not idling. st->count:1
8,16 1 0 1.091399511 0 m N cfq2519 dispatch_insert
8,16 1 0 1.091400217 0 m N cfq2519 dispatched a request
8,16 1 0 1.091400688 0 m N cfq2519 activate rq, drv=1
8,16 1 3601 1.091400766 2519 D W 144322544 + 16 [md127_raid5]
8,16 1 0 1.091406151 0 m N cfq2519 Not idling. st->count:1
8,16 1 0 1.091406460 0 m N cfq2519 dispatch_insert
8,16 1 0 1.091406931 0 m N cfq2519 dispatched a request
8,16 1 0 1.091407291 0 m N cfq2519 activate rq, drv=2
8,16 1 3602 1.091407378 2519 D W 144322520 + 24 [md127_raid5]
8,16 1 0 1.091414006 0 m N cfq2519 Not idling. st->count:1
8,16 1 0 1.091414297 0 m N cfq2519 dispatch_insert
8,16 1 0 1.091414702 0 m N cfq2519 dispatched a request
8,16 1 0 1.091415047 0 m N cfq2519 activate rq, drv=3
8,16 1 3603 1.091415125 2519 D W 144322488 + 32 [md127_raid5]
8,16 1 0 1.091416469 0 m N cfq2519 Not idling. st->count:1
8,16 1 0 1.091416754 0 m N cfq2519 dispatch_insert
8,16 1 0 1.091417186 0 m N cfq2519 dispatched a request
8,16 1 0 1.091417535 0 m N cfq2519 activate rq, drv=4
8,16 1 3604 1.091417628 2519 D W 144322432 + 56 [md127_raid5]
8,16 1 3605 1.091857225 4393 C W 144322544 + 16 [0]
8,16 1 0 1.091858753 0 m N cfq2519 complete rqnoidle 0
8,16 1 3606 1.092068456 4393 C W 144322520 + 24 [0]
8,16 1 0 1.092069851 0 m N cfq2519 complete rqnoidle 0
8,16 1 3607 1.092350440 4393 C W 144322488 + 32 [0]
8,16 1 0 1.092351688 0 m N cfq2519 complete rqnoidle 0
8,16 1 3608 1.093629323 0 C W 144322432 + 56 [0]
8,16 1 0 1.093631151 0 m N cfq2519 complete rqnoidle 0
8,16 1 0 1.093631574 0 m N cfq2519 will busy wait
8,16 1 0 1.093631829 0 m N cfq schedule dispatch

Because elv_attempt_insert_merge() only tries to back merge, the four requests above cannot merge even in theory. I traced for ten minutes and counted such situations: they account for about 25%. With the patch applied, I tested again and no longer found the situation above.

Signed-off-by: Jianpeng Ma
---
 block/blk-core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a33870b..3c95c4d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2868,7 +2868,8 @@ static int plug_rq_cmp(void *priv, struct list_head *a, struct list_head *b)
 	struct request *rqa = container_of(a, struct request, queuelist);
About reuse of address space when an __init func is removed
Hi all,
Today I found some kernel messages about memory leaking, as follows:

unreferenced object 0x8800b6e6b980 (size 64):
  comm "modprobe", pid 1137, jiffies 4294676166 (age 7326.499s)
  hex dump (first 32 bytes):
    01 04 01 00 00 00 00 00 00 00 98 b5 00 88 ff ff
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  backtrace:
    [816a3f16] kmemleak_alloc+0x56/0xc0
    [8113bd43] __kmalloc+0x173/0x310
    [a009a78a] 0xa009a78a
    [a009ad95] 0xa009ad95
    [81300985] pci_device_probe+0x75/0xa0
    [814078c4] driver_probe_device+0x84/0x380
    [81407c63] __driver_attach+0xa3/0xb0
    [81405a96] bus_for_each_dev+0x56/0x90
    [81407359] driver_attach+0x19/0x20
    [81406e80] bus_add_driver+0x1a0/0x2c0
    [81408195] driver_register+0x75/0x150
    [812ffa1c] __pci_register_driver+0x5c/0x70
    [a00a20a7] nfsd_last_thread+0x47/0x70 [nfsd]
    [810001fa] do_one_initcall+0x3a/0x170
    [810a7d1c] sys_init_module+0x8c/0x200
    [816cc352] system_call_fastpath+0x16/0x1b

But the problem is not the leak itself, it is the stack. I noticed "[a00a20a7] nfsd_last_thread+0x47/0x70 [nfsd]", yet the real module is mvsas. Why does the kernel print nfsd?

I added some debug info in func mvs_init:

diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
index cc59dff..d34ce01 100644
--- a/drivers/scsi/mvsas/mv_init.c
+++ b/drivers/scsi/mvsas/mv_init.c
@@ -821,6 +821,7 @@ static int __init mvs_init(void)
 {
 	int rc;
 
 	mvs_stt = sas_domain_attach_transport(&mvs_transport_ops);
+	printk(KERN_ERR "%s:0x%lx\n", __func__, _THIS_IP_);
 	if (!mvs_stt)
 		return -ENOMEM;

The result is "[3.781487] mvs_init:0xa00a2000".

I think this is because of the __init attribute: once mvs_init has executed, that memory is freed and its address range is released, so a module loaded later (here nfsd) can occupy those addresses, and the stale backtrace address then resolves to nfsd_last_thread. Is it a bug?

Thanks!
[PATCH] block: Add blk_rq_pos(rq) to sort rq when flushing plug-list.
My workload is a raid5 array with 16 disks, written by our filesystem in direct-io mode. Using blktrace I found these messages:

8,16 0 3570 1.083923979 2519 I W 144323176 + 24 [md127_raid5]
8,16 0 0 1.083926214 0 m N cfq2519 insert_request
8,16 0 3571 1.083926586 2519 I W 144323072 + 104 [md127_raid5]
8,16 0 0 1.083926952 0 m N cfq2519 insert_request
8,16 0 3572 1.083927180 2519 U N [md127_raid5] 2
8,16 0 0 1.083927870 0 m N cfq2519 Not idling. st->count:1
8,16 0 0 1.083928320 0 m N cfq2519 dispatch_insert
8,16 0 0 1.083928951 0 m N cfq2519 dispatched a request
8,16 0 0 1.083929443 0 m N cfq2519 activate rq, drv=1
8,16 0 3573 1.083929530 2519 D W 144323176 + 24 [md127_raid5]
8,16 0 0 1.083933883 0 m N cfq2519 Not idling. st->count:1
8,16 0 0 1.083934189 0 m N cfq2519 dispatch_insert
8,16 0 0 1.083934654 0 m N cfq2519 dispatched a request
8,16 0 0 1.083935014 0 m N cfq2519 activate rq, drv=2
8,16 0 3574 1.083935101 2519 D W 144323072 + 104 [md127_raid5]
8,16 0 3575 1.084196179 0 C W 144323176 + 24 [0]
8,16 0 0 1.084197979 0 m N cfq2519 complete rqnoidle 0
8,16 0 3576 1.084769073 0 C W 144323072 + 104 [0]
..
8,16 1 3596 1.091394357 2519 I W 144322544 + 16 [md127_raid5]
8,16 1 0 1.091396181 0 m N cfq2519 insert_request
8,16 1 3597 1.091396571 2519 I W 144322520 + 24 [md127_raid5]
8,16 1 0 1.091396934 0 m N cfq2519 insert_request
8,16 1 3598 1.091397165 2519 I W 144322488 + 32 [md127_raid5]
8,16 1 0 1.091397477 0 m N cfq2519 insert_request
8,16 1 3599 1.091397708 2519 I W 144322432 + 56 [md127_raid5]
8,16 1 0 1.091398023 0 m N cfq2519 insert_request
8,16 1 3600 1.091398284 2519 U N [md127_raid5] 4
8,16 1 0 1.091398986 0 m N cfq2519 Not idling. st->count:1
8,16 1 0 1.091399511 0 m N cfq2519 dispatch_insert
8,16 1 0 1.091400217 0 m N cfq2519 dispatched a request
8,16 1 0 1.091400688 0 m N cfq2519 activate rq, drv=1
8,16 1 3601 1.091400766 2519 D W 144322544 + 16 [md127_raid5]
8,16 1 0 1.091406151 0 m N cfq2519 Not idling. st->count:1
8,16 1 0 1.091406460 0 m N cfq2519 dispatch_insert
8,16 1 0 1.091406931 0 m N cfq2519 dispatched a request
8,16 1 0 1.091407291 0 m N cfq2519 activate rq, drv=2
8,16 1 3602 1.091407378 2519 D W 144322520 + 24 [md127_raid5]
8,16 1 0 1.091414006 0 m N cfq2519 Not idling. st->count:1
8,16 1 0 1.091414297 0 m N cfq2519 dispatch_insert
8,16 1 0 1.091414702 0 m N cfq2519 dispatched a request
8,16 1 0 1.091415047 0 m N cfq2519 activate rq, drv=3
8,16 1 3603 1.091415125 2519 D W 144322488 + 32 [md127_raid5]
8,16 1 0 1.091416469 0 m N cfq2519 Not idling. st->count:1
8,16 1 0 1.091416754 0 m N cfq2519 dispatch_insert
8,16 1 0 1.091417186 0 m N cfq2519 dispatched a request
8,16 1 0 1.091417535 0 m N cfq2519 activate rq, drv=4
8,16 1 3604 1.091417628 2519 D W 144322432 + 56 [md127_raid5]
8,16 1 3605 1.091857225 4393 C W 144322544 + 16 [0]
8,16 1 0 1.091858753 0 m N cfq2519 complete rqnoidle 0
8,16 1 3606 1.092068456 4393 C W 144322520 + 24 [0]
8,16 1 0 1.092069851 0 m N cfq2519 complete rqnoidle 0
8,16 1 3607 1.092350440 4393 C W 144322488 + 32 [0]
8,16 1 0 1.092351688 0 m N cfq2519 complete rqnoidle 0
8,16 1 3608 1.093629323 0 C W 144322432 + 56 [0]
8,16 1 0 1.093631151 0 m N cfq2519 complete rqnoidle 0
8,16 1 0 1.093631574 0 m N cfq2519 will busy wait
8,16 1 0 1.093631829 0 m N cfq schedule dispatch

Because elv_attempt_insert_merge() only tries to back-merge, the four requests above cannot be merged in theory. I traced for ten minutes and counted such cases: they accounted for about 25%. With this patch applied, I tested again and did not see the situation above.
Signed-off-by: Jianpeng Ma majianp...@gmail.com --- block/blk-core.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/block/blk-core.c b/block/blk-core.c index a33870b..3c95c4d 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2868,7 +2868,8 @@ static int plug_rq_cmp(void *priv, struct list_head *a, struct list_head *b) struct request *rqa = container_of(a, struct request, queuelist); struct request
Re: [PATCH 0/3] Fix problems about handling bio to plug when bio merged failed.
On 2012-08-10 19:44 Jianpeng Ma Wrote: >There are some problems about handling bio which merge to plug failed. >Patch1 will avoid unnecessary plug should_sort test,although it's not a bug. >Patch2 correct a bug when handle more devices,it leak some devices to trace >plug-operation. > >Because the patch2,so it's not necessary to sort when flush plug.Although >patch2 has >O(n*n) complexity,it's more than list_sort which has O(nlog(n)) complexity.But >the plug >list is unlikely too long,so i think patch3 can accept. > > >Jianpeng Ma (3): > block: avoid unnecessary plug should_sort test. > block: Fix not tracing all device plug-operation. > block: Remove unnecessary requests sort. > > block/blk-core.c | 35 ++- > 1 file changed, 18 insertions(+), 17 deletions(-) > >-- >1.7.9.5 Hi axboe: Sorry for asking you again, but I found a problem related to this code, so I am asking about this patchset again. If you discard it, I will send my patch against the old code; otherwise I will wait for the patchset to be merged and continue from there. The problem is about blk_plug. My workload is a raid5 array with 16 disks, written by our filesystem in direct-io mode. Using blktrace I found these messages: 8,16 0 3570 1.083923979 2519 I W 144323176 + 24 [md127_raid5] 8,16 00 1.083926214 0 m N cfq2519 insert_request 8,16 0 3571 1.083926586 2519 I W 144323072 + 104 [md127_raid5] 8,16 00 1.083926952 0 m N cfq2519 insert_request 8,16 0 3572 1.083927180 2519 U N [md127_raid5] 2 8,16 00 1.083927870 0 m N cfq2519 Not idling. st->count:1 8,16 00 1.083928320 0 m N cfq2519 dispatch_insert 8,16 00 1.083928951 0 m N cfq2519 dispatched a request 8,16 00 1.083929443 0 m N cfq2519 activate rq, drv=1 8,16 0 3573 1.083929530 2519 D W 144323176 + 24 [md127_raid5] 8,16 00 1.083933883 0 m N cfq2519 Not idling. 
st->count:1 8,16 00 1.083934189 0 m N cfq2519 dispatch_insert 8,16 00 1.083934654 0 m N cfq2519 dispatched a request 8,16 00 1.083935014 0 m N cfq2519 activate rq, drv=2 8,16 0 3574 1.083935101 2519 D W 144323072 + 104 [md127_raid5] 8,16 0 3575 1.084196179 0 C W 144323176 + 24 [0] 8,16 00 1.084197979 0 m N cfq2519 complete rqnoidle 0 8,16 0 3576 1.084769073 0 C W 144323072 + 104 [0] .. 8,16 1 3596 1.091394357 2519 I W 144322544 + 16 [md127_raid5] 8,16 10 1.091396181 0 m N cfq2519 insert_request 8,16 1 3597 1.091396571 2519 I W 144322520 + 24 [md127_raid5] 8,16 10 1.091396934 0 m N cfq2519 insert_request 8,16 1 3598 1.091397165 2519 I W 144322488 + 32 [md127_raid5] 8,16 10 1.091397477 0 m N cfq2519 insert_request 8,16 1 3599 1.091397708 2519 I W 144322432 + 56 [md127_raid5] 8,16 10 1.091398023 0 m N cfq2519 insert_request 8,16 1 3600 1.091398284 2519 U N [md127_raid5] 4 8,16 10 1.091398986 0 m N cfq2519 Not idling. st->count:1 8,16 10 1.091399511 0 m N cfq2519 dispatch_insert 8,16 10 1.091400217 0 m N cfq2519 dispatched a request 8,16 10 1.091400688 0 m N cfq2519 activate rq, drv=1 8,16 1 3601 1.091400766 2519 D W 144322544 + 16 [md127_raid5] 8,16 10 1.091406151 0 m N cfq2519 Not idling. st->count:1 8,16 10 1.091406460 0 m N cfq2519 dispatch_insert 8,16 10 1.091406931 0 m N cfq2519 dispatched a request 8,16 10 1.091407291 0 m N cfq2519 activate rq, drv=2 8,16 1 3602 1.091407378 2519 D W 144322520 + 24 [md127_raid5] 8,16 10 1.091414006 0 m N cfq2519 Not idling. st->count:1 8,16 10 1.091414297 0 m N cfq2519 dispatch_insert 8,16 10 1.091414702 0 m N cfq2519 dispatched a request 8,16 10 1.091415047 0 m N cfq2519 activate rq, drv=3 8,16 1 3603 1.091415125 2519 D W 144322488 + 32 [md127_raid5] 8,16 10 1.091416469 0 m N cfq2519 Not idling. 
st->count:1 8,16 10 1.091416754 0 m N cfq2519 dispatch_insert 8,16 10 1.091417186 0 m N cfq2519 dispatched a request 8,16 10 1.091417535 0 m N cfq2519 activate rq, drv=4 8,16 1 3604 1.091417628 2519 D W 144322432 + 56 [md127_raid5] 8,16 1 3605 1.091857225 4393 C W 144322544 + 16 [0] 8
Re: Re: Why blktrace didn't trace requests merge?
On 2012-09-18 13:49 Jens Axboe Wrote: >On 2012-09-18 02:30, Jianpeng Ma wrote: >> On 2012-09-18 02:27 Jens Axboe Wrote: >>> On 2012-09-17 19:55, Tejun Heo wrote: >>>> (cc'ing Jens) >>>> >>>> On Mon, Sep 17, 2012 at 09:22:28AM -0400, Steven Rostedt wrote: >>>>> On Mon, 2012-09-17 at 19:33 +0800, Jianpeng Ma wrote: >>>>>> Hi all: >>>>>> I used blktrace to trace some io.But i can't find requests merge. I >>>>>> searched the code and did't not find. >>>>>> Why? >>>>>> >>>>> >>>>> No idea. I don't use blktrace much, but I Cc'd those that understand it >>>>> better than I. >>> >>> Works for me: >>> >>> [...] >>> >>> >>> 8,00 26 0.009147735 664 A WS 315226143 + 8 <- (8,7) >>> 19406344 >>> 8,00 27 0.009148677 664 Q WS 315226143 + 8 >>> [btrfs-submit-1] >>> 8,00 28 0.009152967 664 G WS 315226143 + 8 >>> [btrfs-submit-1] >>> 8,00 29 0.009154242 664 P N [btrfs-submit-1] >>> 8,00 30 0.009155538 664 A WS 315226151 + 8 <- (8,7) >>> 19406352 >>> 8,00 31 0.009155743 664 Q WS 315226151 + 8 >>> [btrfs-submit-1] >>> 8,00 32 0.009157086 664 M WS 315226151 + 8 >>> [btrfs-submit-1] >>> 8,00 33 0.009158716 664 I WS 315226143 + 16 >>> [btrfs-submit-1] >>> >>> That's from a quick trace of /dev/sda. I started blktrace, then did: >>> >>> $ dd if=/dev/zero of=foo bs=4k count=128 && sync >>> >>> to ensure that I knew merges would be happening. Output stats at the end: >>> >>> Total (sda): >>> Reads Queued: 7, 44KiB Writes Queued: 447, >>> 7692KiB >>> Read Dispatches:7, 44KiB Write Dispatches: 416, >>> 7692KiB >>> Reads Requeued: 0 Writes Requeued: 0 >>> Reads Completed:7, 44KiB Writes Completed: 435, >>> 5864KiB >>> Read Merges:0,0KiB Write Merges: 23, >>> 428KiB >>> IO unplugs:78 Timer unplugs: 0 >>> >>> -- >>> Jens Axboe >>> >> First, Thanks your time! >> If i understand correctly, the merge of your example is bio with >> request, not request wiht request. Yes or no? > >It is bio to request, correct. Request to request merges are relatively >more rare. 
> >-- >Jens Axboe > Thanks very much, now I understand. Jianpeng
Re: Re: Why blktrace didn't trace requests merge?
On 2012-09-18 02:27 Jens Axboe Wrote: >On 2012-09-17 19:55, Tejun Heo wrote: >> (cc'ing Jens) >> >> On Mon, Sep 17, 2012 at 09:22:28AM -0400, Steven Rostedt wrote: >>> On Mon, 2012-09-17 at 19:33 +0800, Jianpeng Ma wrote: >>>> Hi all: >>>>I used blktrace to trace some io.But i can't find requests merge. I >>>> searched the code and did't not find. >>>>Why? >>>> >>> >>> No idea. I don't use blktrace much, but I Cc'd those that understand it >>> better than I. > >Works for me: > >[...] > > > 8,00 26 0.009147735 664 A WS 315226143 + 8 <- (8,7) > 19406344 > 8,00 27 0.009148677 664 Q WS 315226143 + 8 [btrfs-submit-1] > 8,00 28 0.009152967 664 G WS 315226143 + 8 [btrfs-submit-1] > 8,00 29 0.009154242 664 P N [btrfs-submit-1] > 8,00 30 0.009155538 664 A WS 315226151 + 8 <- (8,7) > 19406352 > 8,00 31 0.009155743 664 Q WS 315226151 + 8 [btrfs-submit-1] > 8,00 32 0.009157086 664 M WS 315226151 + 8 [btrfs-submit-1] > 8,00 33 0.009158716 664 I WS 315226143 + 16 > [btrfs-submit-1] > >That's from a quick trace of /dev/sda. I started blktrace, then did: > >$ dd if=/dev/zero of=foo bs=4k count=128 && sync > >to ensure that I knew merges would be happening. Output stats at the end: > >Total (sda): > Reads Queued: 7, 44KiB Writes Queued: 447, > 7692KiB > Read Dispatches:7, 44KiB Write Dispatches: 416, > 7692KiB > Reads Requeued: 0 Writes Requeued: 0 > Reads Completed:7, 44KiB Writes Completed: 435, > 5864KiB > Read Merges:0,0KiB Write Merges: 23, > 428KiB > IO unplugs:78 Timer unplugs: 0 > >-- >Jens Axboe > First, thanks for your time! If I understand correctly, the merge in your example is bio-to-request, not request-to-request. Yes or no? Thanks! Jianpeng
Why blktrace didn't trace requests merge?
Hi all: I used blktrace to trace some I/O, but I can't find any request merges. I searched the code and did not find them. Why? Thanks! Jianpeng
Re: Re: [PATCH 2/3] block: Fix not tracing all device plug-operation.
On 2012-08-10 21:09 Jens Axboe Wrote: >On 08/10/2012 01:46 PM, Jianpeng Ma wrote: >> If process handled two or more devices,there will not be trace some >> devices plug-operation. >> >> Signed-off-by: Jianpeng Ma >> --- >> block/blk-core.c | 16 +++- >> 1 file changed, 15 insertions(+), 1 deletion(-) >> >> diff --git a/block/blk-core.c b/block/blk-core.c >> index 7a3abc6..034f186 100644 >> --- a/block/blk-core.c >> +++ b/block/blk-core.c >> @@ -1521,11 +1521,25 @@ get_rq: >> struct request *__rq; >> >> __rq = list_entry_rq(plug->list.prev); >> -if (__rq->q != q) >> +if (__rq->q != q) { >> plug->should_sort = 1; >> +trace_block_plug(q); >> +} >> +} else { >> +struct request *__rq; >> +list_for_each_entry_reverse(__rq, &plug->list, >> +queuelist) { >> +if (__rq->q == q) { >> +list_add_tail(&req->queuelist, >> +&__rq->queuelist); >> +goto stat_acct; > >Did you verify this? It doesn't look right to me. You browse the list in >reverse, which means __rq is the first one that has a matching q. Then >you add the new req IN FRONT of that. You would want list_add() here >instead, adding it as the last member of that q string, not in the >middle. > >-- >Jens Axboe > Hi all: How about these patches? Are they OK or wrong? Thanks!
Re: Re: About function __create_file in debugfs
On 2012-09-08 23:25 gregkh Wrote: >On Sat, Sep 08, 2012 at 05:41:05PM +0800, Jianpeng Ma wrote: >> Hi: >> At present,i used blktrace to trace block io.But i always met error, >> the message like: >> >BLKTRACESETUP(2) /dev/sdc failed: 2/No such file or directory >> >Thread 0 failed open /sys/kernel/debug/block/(null)/trace0: 2/No such file >> >or directory >> >Thread 2 failed open /sys/kernel/debug/block/(null)/trace2: 2/No such file >> >or directory >> >Thread 3 failed open /sys/kernel/debug/block/(null)/trace3: 2/No such file >> >or directory >> >> >Thread 1 failed open /sys/kernel/debug/block/(null)/trace1: 2/No such file >> >or directory >> >FAILED to start thread on CPU 0: 1/Operation not permitted >> >FAILED to start thread on CPU 1: 1/Operation not permitted >> >FAILED to start thread on CPU 2: 1/Operation not permitted >> >FAILED to start thread on CPU 3: 1/Operation not permitted >> >> But those isn't important. I add some message in kernel and found the reason >> is inode already existed. >> But the function __create_file dosen't return correctly errno.So blktrace >> tool can't print correctly message. >> I think func __create_file should return correctly message(ERR_PTR(error)) >> not NULL. > >Patches are always welcome :) > >greg k-h Thanks, here is the patch: debugfs_create_symlink/debugfs_create_file/debugfs_create_dir only return NULL when an error is encountered; we should return the actual error info instead. Signed-off-by: Jianpeng Ma --- fs/debugfs/inode.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c index 4733eab..1e94350 100644 --- a/fs/debugfs/inode.c +++ b/fs/debugfs/inode.c @@ -302,8 +302,10 @@ struct dentry *__create_file(const char *name, umode_t mode, error = simple_pin_fs(&debug_fs_type, &debugfs_mount, &debugfs_mount_count); - if (error) + if (error) { + dentry = ERR_PTR(error); goto exit; + } /* If the parent is not specified, we create it in the root. 
* We need the root dentry to do this, which is in the super @@ -337,7 +339,7 @@ struct dentry *__create_file(const char *name, umode_t mode, mutex_unlock(&parent->d_inode->i_mutex); if (error) { - dentry = NULL; + dentry = ERR_PTR(error); simple_release_fs(&debugfs_mount, &debugfs_mount_count); } exit: @@ -442,10 +444,10 @@ struct dentry *debugfs_create_symlink(const char *name, struct dentry *parent, link = kstrdup(target, GFP_KERNEL); if (!link) - return NULL; + return ERR_PTR(-ENOMEM); result = __create_file(name, S_IFLNK | S_IRWXUGO, parent, link, NULL); - if (!result) + if (IS_ERR_OR_NULL(result)) kfree(link); return result; } -- 1.7.9.5 But I searched the kernel code and found at least 100 places that use these functions. What about those callers? Should I fix them as well? Thanks!
About function __create_file in debugfs
Hi:
At present, I use blktrace to trace block I/O, but I always hit errors like:
BLKTRACESETUP(2) /dev/sdc failed: 2/No such file or directory
Thread 0 failed open /sys/kernel/debug/block/(null)/trace0: 2/No such file or directory
Thread 2 failed open /sys/kernel/debug/block/(null)/trace2: 2/No such file or directory
Thread 3 failed open /sys/kernel/debug/block/(null)/trace3: 2/No such file or directory
Thread 1 failed open /sys/kernel/debug/block/(null)/trace1: 2/No such file or directory
FAILED to start thread on CPU 0: 1/Operation not permitted
FAILED to start thread on CPU 1: 1/Operation not permitted
FAILED to start thread on CPU 2: 1/Operation not permitted
FAILED to start thread on CPU 3: 1/Operation not permitted
But those aren't the important part. I added some debug messages in the kernel and found the cause is that the inode already existed. But the function __create_file doesn't return the correct errno, so the blktrace tool can't print the correct message. I think __create_file should return ERR_PTR(error), not NULL.
About multiple queries to control using $DBGMT/dynamic_debug/control
Hi,
I used $DBGMT/dynamic_debug/control to control the printing of debug info. But when I wrote multiple queries of which at least one didn't match, the write still succeeded and returned no error, so I assumed I had written correct queries; only by using dmesg did I find the error. And if the verbose parameter of dynamic_debug is zero, I can't find the result via dmesg at all. So I think it's not convenient. If a query did not match anything, the write should return an error.
Thanks!
Re: Re: [PATCH] block: Don't use static to define "void *p" in show_partition_start().
On 2012-08-12 23:45 Michael Tokarev Wrote:
> On 03.08.2012 12:41, Jens Axboe wrote:
>> On 08/03/2012 07:07 AM, majianpeng wrote:
> []
>>> diff --git a/block/genhd.c b/block/genhd.c
>>> index cac7366..d839723 100644
>>> --- a/block/genhd.c
>>> +++ b/block/genhd.c
>>> @@ -835,7 +835,7 @@ static void disk_seqf_stop(struct seq_file *seqf, void *v)
>>>  
>>>  static void *show_partition_start(struct seq_file *seqf, loff_t *pos)
>>>  {
>>> -	static void *p;
>>> +	void *p;
>>>  
>>>  	p = disk_seqf_start(seqf, pos);
>>>  	if (!IS_ERR_OR_NULL(p) && !*pos)
>>
>> Huh, that looks like a clear bug. I've applied it, thanks.
>
> It also looks like a -stable material, don't you think?
>
> Thanks,
>
> /mjt

Yes, all kernels before this patch have this problem and should apply it.
[PATCH 2/3 V1] block: Fix not tracing all device plug-operation.
If a process handles two or more devices, the plug operations of some devices will not be traced.
V0-->V1:
Fix a bug when inserting a req into the plug list that already holds a request for the same request_queue: it should use list_add, not list_add_tail.

Signed-off-by: Jianpeng Ma <majianp...@gmail.com>
Signed-off-by: Jens Axboe <ax...@kernel.dk>
---
 block/blk-core.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 7a3abc6..034f186 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1521,11 +1521,25 @@ get_rq:
 			struct request *__rq;
 
 			__rq = list_entry_rq(plug->list.prev);
-			if (__rq->q != q)
+			if (__rq->q != q) {
 				plug->should_sort = 1;
+				trace_block_plug(q);
+			}
+		} else {
+			struct request *__rq;
+			list_for_each_entry_reverse(__rq, &plug->list,
+					queuelist) {
+				if (__rq->q == q) {
+					list_add(&req->queuelist,
+						 &__rq->queuelist);
+					goto stat_acct;
+				}
+			}
+			trace_block_plug(q);
 		}
 	}
 	list_add_tail(&req->queuelist, &plug->list);
+stat_acct:
 	drive_stat_acct(req, 1);
 } else {
 	spin_lock_irq(q->queue_lock);
-- 
1.7.9.5
[RFC PATCH] fs/direct-io.c: Add REQ_NOIDLE for last bio.
For the last bio of a dio, no further bio will come, so set REQ_NOIDLE.

Signed-off-by: Jianpeng Ma <majianp...@gmail.com>
---
 fs/direct-io.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 1faf4cb..7c6958f 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -127,6 +127,7 @@ struct dio {
 	int page_errors;	/* errno from get_user_pages() */
 	int is_async;		/* is IO async ? */
 	int io_error;		/* IO error in completion path */
+	sector_t end_sector;	/* the last sector for this dio */
 	unsigned long refcount;	/* direct_io_worker() and bios */
 	struct bio *bio_list;	/* singly linked via bi_private */
 	struct task_struct *waiter;	/* waiting task (NULL if none) */
@@ -369,21 +370,28 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
 {
 	struct bio *bio = sdio->bio;
 	unsigned long flags;
-
+	int rw = dio->rw;
 	bio->bi_private = dio;
 	spin_lock_irqsave(&dio->bio_lock, flags);
 	dio->refcount++;
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
+	/*
+	 * If this bio is the last for the dio, no bio can arrive at the
+	 * low level unless this dio completed.
+	 */
+	if (bio->bi_sector + bio_sectors(bio) >= dio->end_sector)
+		rw |= REQ_NOIDLE;
+
 	if (dio->is_async && dio->rw == READ)
 		bio_set_pages_dirty(bio);
 
 	if (sdio->submit_io)
-		sdio->submit_io(dio->rw, bio, dio->inode,
+		sdio->submit_io(rw, bio, dio->inode,
 				sdio->logical_offset_in_bio);
 	else
-		submit_bio(dio->rw, bio);
+		submit_bio(rw, bio);
 
 	sdio->bio = NULL;
 	sdio->boundary = 0;
@@ -1147,6 +1155,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 
 	dio->inode = inode;
 	dio->rw = rw;
+	dio->end_sector = end >> 9;
 	sdio.blkbits = blkbits;
 	sdio.blkfactor = inode->i_blkbits - blkbits;
 	sdio.block_in_file = offset >> blkbits;
-- 
1.7.9.5
[PATCH 3/3] block: Remove unnecessary requests sort.
When a request is added to the plug list it is already placed in order, so the sort at flush time is unnecessary.

Signed-off-by: Jianpeng Ma <majianp...@gmail.com>
---
 block/blk-core.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 034f186..9dbdef6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2879,13 +2879,6 @@ void blk_start_plug(struct blk_plug *plug)
 }
 EXPORT_SYMBOL(blk_start_plug);
 
-static int plug_rq_cmp(void *priv, struct list_head *a, struct list_head *b)
-{
-	struct request *rqa = container_of(a, struct request, queuelist);
-	struct request *rqb = container_of(b, struct request, queuelist);
-
-	return !(rqa->q <= rqb->q);
-}
 
 /*
  * If 'from_schedule' is true, then postpone the dispatch of requests
@@ -2980,11 +2973,6 @@ void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 
 	list_splice_init(&plug->list, &list);
 
-	if (plug->should_sort) {
-		list_sort(NULL, &list, plug_rq_cmp);
-		plug->should_sort = 0;
-	}
-
 	q = NULL;
 	depth = 0;
-- 
1.7.9.5
[PATCH 2/3] block: Fix not tracing all device plug-operation.
If a process handles two or more devices, the plug operations of some devices will not be traced.

Signed-off-by: Jianpeng Ma <majianp...@gmail.com>
---
 block/blk-core.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 7a3abc6..034f186 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1521,11 +1521,25 @@ get_rq:
 			struct request *__rq;
 
 			__rq = list_entry_rq(plug->list.prev);
-			if (__rq->q != q)
+			if (__rq->q != q) {
 				plug->should_sort = 1;
+				trace_block_plug(q);
+			}
+		} else {
+			struct request *__rq;
+			list_for_each_entry_reverse(__rq, &plug->list,
+					queuelist) {
+				if (__rq->q == q) {
+					list_add_tail(&req->queuelist,
+						      &__rq->queuelist);
+					goto stat_acct;
+				}
+			}
+			trace_block_plug(q);
 		}
 	}
 	list_add_tail(&req->queuelist, &plug->list);
+stat_acct:
 	drive_stat_acct(req, 1);
 } else {
 	spin_lock_irq(q->queue_lock);
-- 
1.7.9.5
[PATCH 1/3] block: avoid unnecessary plug should_sort test.
If request_count >= BLK_MAX_REQUEST_COUNT, blk_flush_plug_list() is executed, which flushes all plugged requests, so there is no need to do the plug->should_sort test.

Signed-off-by: Jianpeng Ma <majianp...@gmail.com>
---
 block/blk-core.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 4b4dbdf..7a3abc6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1514,17 +1514,16 @@ get_rq:
 		if (list_empty(&plug->list))
 			trace_block_plug(q);
 		else {
-			if (!plug->should_sort) {
+			if (request_count >= BLK_MAX_REQUEST_COUNT) {
+				blk_flush_plug_list(plug, false);
+				trace_block_plug(q);
+			} else if (!plug->should_sort) {
 				struct request *__rq;
 
 				__rq = list_entry_rq(plug->list.prev);
 				if (__rq->q != q)
 					plug->should_sort = 1;
 			}
-			if (request_count >= BLK_MAX_REQUEST_COUNT) {
-				blk_flush_plug_list(plug, false);
-				trace_block_plug(q);
-			}
 		}
 		list_add_tail(&req->queuelist, &plug->list);
 		drive_stat_acct(req, 1);
-- 
1.7.9.5
[PATCH 0/3] Fix problems about handling bio to plug when bio merged failed.
There are some problems in handling a bio that fails to merge into the plug list.
Patch 1 avoids an unnecessary plug->should_sort test, although it's not a bug.
Patch 2 corrects a bug when handling multiple devices: the plug operations of some devices escape tracing.
Because of patch 2, it's no longer necessary to sort when flushing the plug. Although patch 2 has O(n*n) complexity, more than list_sort's O(n*log(n)), the plug list is unlikely to be long, so I think patch 3 is acceptable.

Jianpeng Ma (3):
  block: avoid unnecessary plug should_sort test.
  block: Fix not tracing all device plug-operation.
  block: Remove unnecessary requests sort.

 block/blk-core.c | 35 ++++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

-- 
1.7.9.5
Re: Re: [RFC PATCH] block:Fix some problems about handling plug in blk_queue_bio().
On 2012-08-08 11:06 Shaohua Li Wrote:
> 2012/8/8 Jianpeng Ma:
>> I think there are three problems about handling plug in blk_queue_bio():
>> 1: if request_count >= BLK_MAX_REQUEST_COUNT, avoid the unnecessary
>> plug->should_sort judgement.
> this makes sense, though not a big deal, nice to fix it.
Thanks.

>> 2: Only two devices can trace plug.
> I didn't get the point, can you give more details?
>>	if (plug) {
>>		/*
>>		 * If this is the first request added after a plug, fire
>>		 * of a plug trace. If others have been added before, check
>>		 * if we have multiple devices in this plug. If so, make a
>>		 * note to sort the list before dispatch.
>>		 */
>>		if (list_empty(&plug->list))
>>			trace_block_plug(q);
>>		else {
>>			if (!plug->should_sort) {
>>				struct request *__rq;
>>				__rq = list_entry_rq(plug->list.prev);
>>				if (__rq->q != q)
>>					plug->should_sort = 1;
>>			}
>>			if (request_count >= BLK_MAX_REQUEST_COUNT) {
>>				blk_flush_plug_list(plug, false);
>>				trace_block_plug(q);
The code only traces at two points:
A: list_empty(&plug->list)
B: request_count >= BLK_MAX_REQUEST_COUNT, which is like A in that plug->list is empty afterwards.
Suppose:
1: reqA for deviceA comes first; it calls trace_block_plug because list_empty(&plug->list) is true.
2: reqB for deviceB comes; attempt_plug_merge fails because there is no request for deviceB's request_queue, but trace_block_plug is not called for deviceB. blk_flush_plug_list, however, will trace_block_unplug for all request_queues.

>> 3: When blk_flush_plug_list executes, it uses list_sort, which has
>> O(n*log(n)) complexity. When inserting in sorted position, it is only O(n) complexity.
> but now you do the list iteration for every request, so it's O(n*n)?
> The plug list is unlikely too long, so I didn't worry about the time
> spent on list sort.
Sorry, it's my fault.
[RFC PATCH] block: Fix some problems about handling plug in blk_queue_bio().
I think there are three problems about handling plug in blk_queue_bio():
1: if request_count >= BLK_MAX_REQUEST_COUNT, avoid the unnecessary plug->should_sort judgement.
2: Only two devices can trace plug.
3: When blk_flush_plug_list executes, it uses list_sort, which has O(n*log(n)) complexity. When inserting in sorted position, it is only O(n) complexity.

Signed-off-by: Jianpeng Ma <majianp...@gmail.com>
---
 block/blk-core.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 4b4dbdf..e7759f8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1514,20 +1514,31 @@ get_rq:
 		if (list_empty(&plug->list))
 			trace_block_plug(q);
 		else {
-			if (!plug->should_sort) {
+			if (request_count >= BLK_MAX_REQUEST_COUNT) {
+				blk_flush_plug_list(plug, false);
+				trace_block_plug(q);
+			} else if (!plug->should_sort) {
 				struct request *__rq;
 
 				__rq = list_entry_rq(plug->list.prev);
 				if (__rq->q != q)
 					plug->should_sort = 1;
-			}
-			if (request_count >= BLK_MAX_REQUEST_COUNT) {
-				blk_flush_plug_list(plug, false);
+			} else {
+				struct request *rq;
+
+				list_for_each_entry_reverse(rq, &plug->list, queuelist) {
+					if (rq->q == q) {
+						list_add(&req->queuelist, &rq->queuelist);
+						goto stat_acct;
+					}
+				}
 				trace_block_plug(q);
 			}
 		}
 		list_add_tail(&req->queuelist, &plug->list);
+stat_acct:
 		drive_stat_acct(req, 1);
+
 	} else {
 		spin_lock_irq(q->queue_lock);
 		add_acct_request(q, req, where);
@@ -2866,14 +2877,6 @@ void blk_start_plug(struct blk_plug *plug)
 }
 EXPORT_SYMBOL(blk_start_plug);
 
-static int plug_rq_cmp(void *priv, struct list_head *a, struct list_head *b)
-{
-	struct request *rqa = container_of(a, struct request, queuelist);
-	struct request *rqb = container_of(b, struct request, queuelist);
-
-	return !(rqa->q <= rqb->q);
-}
-
 /*
  * If 'from_schedule' is true, then postpone the dispatch of requests
  * until a safe kblockd context. We due this to avoid accidental big
@@ -2967,11 +2970,6 @@ void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 
 	list_splice_init(&plug->list, &list);
 
-	if (plug->should_sort) {
-		list_sort(NULL, &list, plug_rq_cmp);
-		plug->should_sort = 0;
-	}
-
 	q = NULL;
 	depth = 0;
-- 
1.7.9.5