[PATCH] aoe: list new maintainer for aoe driver
Justin Sanders, who has extensive experience with ATA over Ethernet in general and AoE SCSI and block-device drivers in particular, is ready to take on the role of aoe maintainer. The driver needs a more active maintainer. --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 5c38f21..9a0c0838 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2568,7 +2568,7 @@ F: Documentation/devicetree/bindings/eeprom/at24.txt F: drivers/misc/eeprom/at24.c ATA OVER ETHERNET (AOE) DRIVER -M: "Ed L. Cashin" +M: "Justin Sanders" W: http://www.openaoe.org/ S: Supported F: Documentation/aoe/ -- 2.7.4
Re: [PATCH] block: aoe: no need to check return value of debugfs_create functions
On Tue, Jan 22, 2019 at 6:29 PM Omar Sandoval wrote: ... > Now entry is uninitialized here when we assign it to d->debugfs. Thanks for noticing that.
Re: [PATCH v3] aoe: document sysfs interface
On 02/19/2018 01:16 PM, Aishwarya Pant wrote: Documentation has been compiled from git commit logs and descriptions in Documentation/aoe/aoe.txt. This should be useful for scripting and tracking changes in the ABI. I don't see any problems with v3. Thank you. -- Ed
Re: [PATCH v3] aoe: document sysfs interface
On 02/19/2018 01:16 PM, Aishwarya Pant wrote: Documentation has been compiled from git commit logs and descriptions in Documentation/aoe/aoe.txt. This should be useful for scripting and tracking changes in the ABI. I don't see any problems with v3. Thank you. -- Ed
Re: [PATCH] aoe: document sysfs interface
> On Feb 17, 2018, at 9:37 AM, Julia Lawall <julia.law...@lip6.fr> wrote: > > > >> On Fri, 16 Feb 2018, Ed Cashin wrote: >> >>> On 02/16/2018 10:39 AM, Aishwarya Pant wrote: >>> Documentation has been compiled from git commit logs and descriptions in >>> Documentation/aoe/aoe.txt. This should be useful for scripting and >>> tracking changes in the ABI. >> ... >>> +What:/sys/block/etherd*/netif >> ... >>> +Description: >>> +(RO) The name of the network interface on the localhost >>> through >>> +which we are communicating with the remote AoE device. >> >> I'd recommend changing that to, "network interfaces". > > Should it then be "names"? Yes, s’il vous plait. Merci. — Ed
Re: [PATCH] aoe: document sysfs interface
> On Feb 17, 2018, at 9:37 AM, Julia Lawall wrote: > > > >> On Fri, 16 Feb 2018, Ed Cashin wrote: >> >>> On 02/16/2018 10:39 AM, Aishwarya Pant wrote: >>> Documentation has been compiled from git commit logs and descriptions in >>> Documentation/aoe/aoe.txt. This should be useful for scripting and >>> tracking changes in the ABI. >> ... >>> +What:/sys/block/etherd*/netif >> ... >>> +Description: >>> +(RO) The name of the network interface on the localhost >>> through >>> +which we are communicating with the remote AoE device. >> >> I'd recommend changing that to, "network interfaces". > > Should it then be "names"? Yes, s’il vous plait. Merci. — Ed
Re: [PATCH] aoe: document sysfs interface
On 02/16/2018 10:39 AM, Aishwarya Pant wrote: Documentation has been compiled from git commit logs and descriptions in Documentation/aoe/aoe.txt. This should be useful for scripting and tracking changes in the ABI. ... +What: /sys/block/etherd*/netif ... +Description: + (RO) The name of the network interface on the localhost through + which we are communicating with the remote AoE device. I'd recommend changing that to, "network interfaces". Thanks! -- Ed
Re: [PATCH] aoe: document sysfs interface
On 02/16/2018 10:39 AM, Aishwarya Pant wrote: Documentation has been compiled from git commit logs and descriptions in Documentation/aoe/aoe.txt. This should be useful for scripting and tracking changes in the ABI. ... +What: /sys/block/etherd*/netif ... +Description: + (RO) The name of the network interface on the localhost through + which we are communicating with the remote AoE device. I'd recommend changing that to, "network interfaces". Thanks! -- Ed
Re: [PATCH] block: aoenet: Replace GFP_ATOMIC with GFP_KERNEL in aoenet_rcv
Good luck in your efforts, and thanks for your work on static analysis. > On Jan 27, 2018, at 9:12 PM, Jia-Ju Bai <baijiaju1...@gmail.com> wrote: > > > >> On 2018/1/28 1:48, Ed Cashin wrote: >> If the tool cannot tell whether the protected state is manipulated by >> *another* piece of code called in atomic context, then it's insufficient. >> >>> On Jan 26, 2018, at 4:37 AM, Jia-Ju Bai <baijiaju1...@gmail.com> wrote: >>> >>> After checking all possible call chains to aoenet_rcv(), >>> my tool finds that aoenet_rcv() is never called in atomic context, >>> namely never in an interrupt handler or holding a spinlock. >>> Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL. >>> >>> This is found by a static analysis tool named DCNS written by myself. >>> >>> Signed-off-by: Jia-Ju Bai <baijiaju1...@gmail.com> >>> --- >>> drivers/block/aoe/aoenet.c |2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c >>> index 63773a9..d5fff7a 100644 >>> --- a/drivers/block/aoe/aoenet.c >>> +++ b/drivers/block/aoe/aoenet.c >>> @@ -138,7 +138,7 @@ static int __init aoe_iflist_setup(char *str) >>>if (dev_net(ifp) != _net) >>>goto exit; >>> >>> -skb = skb_share_check(skb, GFP_ATOMIC); >>> +skb = skb_share_check(skb, GFP_KERNEL); >>>if (skb == NULL) >>>return 0; >>>if (!is_aoe_netif(ifp)) >>> -- >>> 1.7.9.5 >>> >>> > > Sorry, I find my report is false positive after I manually check the code. > aoenet_rcv() is used as function pointer via "->func", and it is called in > dev_queue_xmit_nit() in net/core/dev.c. > dev_queue_xmit_nit() calls a rcu_read_lock() before it calls pt_prev->func(). > Thus it is right to use GFP_ATOMIC in aoenet_rcv(). > Sorry again for my incorrect report... > > Thanks, > Jia-Ju Bai
Re: [PATCH] block: aoenet: Replace GFP_ATOMIC with GFP_KERNEL in aoenet_rcv
Good luck in your efforts, and thanks for your work on static analysis. > On Jan 27, 2018, at 9:12 PM, Jia-Ju Bai wrote: > > > >> On 2018/1/28 1:48, Ed Cashin wrote: >> If the tool cannot tell whether the protected state is manipulated by >> *another* piece of code called in atomic context, then it's insufficient. >> >>> On Jan 26, 2018, at 4:37 AM, Jia-Ju Bai wrote: >>> >>> After checking all possible call chains to aoenet_rcv(), >>> my tool finds that aoenet_rcv() is never called in atomic context, >>> namely never in an interrupt handler or holding a spinlock. >>> Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL. >>> >>> This is found by a static analysis tool named DCNS written by myself. >>> >>> Signed-off-by: Jia-Ju Bai >>> --- >>> drivers/block/aoe/aoenet.c |2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c >>> index 63773a9..d5fff7a 100644 >>> --- a/drivers/block/aoe/aoenet.c >>> +++ b/drivers/block/aoe/aoenet.c >>> @@ -138,7 +138,7 @@ static int __init aoe_iflist_setup(char *str) >>>if (dev_net(ifp) != _net) >>>goto exit; >>> >>> -skb = skb_share_check(skb, GFP_ATOMIC); >>> +skb = skb_share_check(skb, GFP_KERNEL); >>>if (skb == NULL) >>>return 0; >>>if (!is_aoe_netif(ifp)) >>> -- >>> 1.7.9.5 >>> >>> > > Sorry, I find my report is false positive after I manually check the code. > aoenet_rcv() is used as function pointer via "->func", and it is called in > dev_queue_xmit_nit() in net/core/dev.c. > dev_queue_xmit_nit() calls a rcu_read_lock() before it calls pt_prev->func(). > Thus it is right to use GFP_ATOMIC in aoenet_rcv(). > Sorry again for my incorrect report... > > Thanks, > Jia-Ju Bai
Re: [PATCH] block: aoenet: Replace GFP_ATOMIC with GFP_KERNEL in aoenet_rcv
If the tool cannot tell whether the protected state is manipulated by *another* piece of code called in atomic context, then it's insufficient. > On Jan 26, 2018, at 4:37 AM, Jia-Ju Baiwrote: > > After checking all possible call chains to aoenet_rcv(), > my tool finds that aoenet_rcv() is never called in atomic context, > namely never in an interrupt handler or holding a spinlock. > Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL. > > This is found by a static analysis tool named DCNS written by myself. > > Signed-off-by: Jia-Ju Bai > --- > drivers/block/aoe/aoenet.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c > index 63773a9..d5fff7a 100644 > --- a/drivers/block/aoe/aoenet.c > +++ b/drivers/block/aoe/aoenet.c > @@ -138,7 +138,7 @@ static int __init aoe_iflist_setup(char *str) >if (dev_net(ifp) != _net) >goto exit; > > -skb = skb_share_check(skb, GFP_ATOMIC); > +skb = skb_share_check(skb, GFP_KERNEL); >if (skb == NULL) >return 0; >if (!is_aoe_netif(ifp)) > -- > 1.7.9.5 >
Re: [PATCH] block: aoenet: Replace GFP_ATOMIC with GFP_KERNEL in aoenet_rcv
If the tool cannot tell whether the protected state is manipulated by *another* piece of code called in atomic context, then it's insufficient. > On Jan 26, 2018, at 4:37 AM, Jia-Ju Bai wrote: > > After checking all possible call chains to aoenet_rcv(), > my tool finds that aoenet_rcv() is never called in atomic context, > namely never in an interrupt handler or holding a spinlock. > Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL. > > This is found by a static analysis tool named DCNS written by myself. > > Signed-off-by: Jia-Ju Bai > --- > drivers/block/aoe/aoenet.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c > index 63773a9..d5fff7a 100644 > --- a/drivers/block/aoe/aoenet.c > +++ b/drivers/block/aoe/aoenet.c > @@ -138,7 +138,7 @@ static int __init aoe_iflist_setup(char *str) >if (dev_net(ifp) != _net) >goto exit; > > -skb = skb_share_check(skb, GFP_ATOMIC); > +skb = skb_share_check(skb, GFP_KERNEL); >if (skb == NULL) >return 0; >if (!is_aoe_netif(ifp)) > -- > 1.7.9.5 >
Re: [PATCH] [RESEND] aoe: use ktime_t instead of timeval
On 01/17/2018 10:41 AM, Jens Axboe wrote: ... Applied, thanks Arnd/Tina. Thank you for the CC. Sorry for the lack of response in November. -- Ed
Re: [PATCH] [RESEND] aoe: use ktime_t instead of timeval
On 01/17/2018 10:41 AM, Jens Axboe wrote: ... Applied, thanks Arnd/Tina. Thank you for the CC. Sorry for the lack of response in November. -- Ed
Re: [PATCH] block/aoe: discover_timer: Convert timers to use timer_setup()
On 11/02/2017 07:31 PM, Kees Cook wrote: In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. This refactors the discover_timer to remove the needless locking and state machine used for synchronizing timer death. Using del_timer_sync() will already do the right thing. Looks OK to me, thanks. -- Ed
Re: [PATCH] block/aoe: discover_timer: Convert timers to use timer_setup()
On 11/02/2017 07:31 PM, Kees Cook wrote: In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. This refactors the discover_timer to remove the needless locking and state machine used for synchronizing timer death. Using del_timer_sync() will already do the right thing. Looks OK to me, thanks. -- Ed
Re: [PATCH] aoe: use setup_timer
On 03/24/2017 10:15 AM, Geliang Tang wrote: Use setup_timer() instead of init_timer() to simplify the code. Signed-off-by: Geliang Tang--- drivers/block/aoe/aoemain.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/block/aoe/aoemain.c b/drivers/block/aoe/aoemain.c index 4b987c2..20865d4 100644 --- a/drivers/block/aoe/aoemain.c +++ b/drivers/block/aoe/aoemain.c @@ -28,10 +28,8 @@ discover_timer(ulong vp) switch (vp) { case TINIT: - init_timer(); + setup_timer(, discover_timer, TRUN); spin_lock_init(); - t.data = TRUN; - t.function = discover_timer; die = 0; case TRUN: spin_lock_irqsave(, flags); This change looks OK to me, thanks. -- Ed
Re: [PATCH] aoe: use setup_timer
On 03/24/2017 10:15 AM, Geliang Tang wrote: Use setup_timer() instead of init_timer() to simplify the code. Signed-off-by: Geliang Tang --- drivers/block/aoe/aoemain.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/block/aoe/aoemain.c b/drivers/block/aoe/aoemain.c index 4b987c2..20865d4 100644 --- a/drivers/block/aoe/aoemain.c +++ b/drivers/block/aoe/aoemain.c @@ -28,10 +28,8 @@ discover_timer(ulong vp) switch (vp) { case TINIT: - init_timer(); + setup_timer(, discover_timer, TRUN); spin_lock_init(); - t.data = TRUN; - t.function = discover_timer; die = 0; case TRUN: spin_lock_irqsave(, flags); This change looks OK to me, thanks. -- Ed
Re: [PATCH 06/15] genhd: Add return code to device_add_disk
On 08/17/2016 05:14 AM, Fam Zheng wrote: ... Of course, the plan is to write patches on top. I'm not cleaning up anything here because I'm concerned callers may double free (and I didn't look hard into that). Aside from Huck's concerns, the changes looked OK from aoe's perspective. -- Ed
Re: [PATCH 06/15] genhd: Add return code to device_add_disk
On 08/17/2016 05:14 AM, Fam Zheng wrote: ... Of course, the plan is to write patches on top. I'm not cleaning up anything here because I'm concerned callers may double free (and I didn't look hard into that). Aside from Huck's concerns, the changes looked OK from aoe's perspective. -- Ed
Re: [PATCH v2 05/12] aoeblk: Generate uevent after attribute available
On 06/29/2016 09:59 PM, Fam Zheng wrote: It is documented that KOBJ_ADD should be generated after the object's attributes and children are ready. We can achieve this with the new disk_gen_uevents interface. Looks like an improvement, thanks! -- Ed
Re: [PATCH v2 05/12] aoeblk: Generate uevent after attribute available
On 06/29/2016 09:59 PM, Fam Zheng wrote: It is documented that KOBJ_ADD should be generated after the object's attributes and children are ready. We can achieve this with the new disk_gen_uevents interface. Looks like an improvement, thanks! -- Ed
Re: [PATCH] aoe: remove unnecessary check for failing kthread creation
On 02/01/2016 10:53 AM, Insu Yun wrote: When kthread_run fails, it returns ERR, not NULL. Therefore, NULL checking is redundant. (https://www.kernel.org/doc/htmldocs/device-drivers/API-kthread-run.html) Thanks, the change looks reasonable. ... task = kthread_run(kthread, k, "%s", k->name); - if (task == NULL || IS_ERR(task)) + if (IS_ERR(task)) return -ENOMEM; Interestingly, after this change, it's more clear that returning -ENOMEM is a bit misleading when the underlying kernel/kthread.c function, kthread_create_on_node returns ERR_PTR(-EINTR). But because that would happen when the process we need to have the driver working gets killed, I'm not sure it matters. -- Ed
Re: [PATCH] aoe: remove unnecessary check for failing kthread creation
On 02/01/2016 10:53 AM, Insu Yun wrote: When kthread_run fails, it returns ERR, not NULL. Therefore, NULL checking is redundant. (https://www.kernel.org/doc/htmldocs/device-drivers/API-kthread-run.html) Thanks, the change looks reasonable. ... task = kthread_run(kthread, k, "%s", k->name); - if (task == NULL || IS_ERR(task)) + if (IS_ERR(task)) return -ENOMEM; Interestingly, after this change, it's more clear that returning -ENOMEM is a bit misleading when the underlying kernel/kthread.c function, kthread_create_on_node returns ERR_PTR(-EINTR). But because that would happen when the process we need to have the driver working gets killed, I'm not sure it matters. -- Ed
Re: [PATCH 09/39] aoe: drop null test before destroy functions
ACK. Thanks. On 09/13/2015 08:15 AM, Julia Lawall wrote: Remove unneeded NULL test. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // @@ expression x; @@ -if (x != NULL) \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x); // Signed-off-by: Julia Lawall --- drivers/block/aoe/aoedev.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c index ffd1947..efc8a8a 100644 --- a/drivers/block/aoe/aoedev.c +++ b/drivers/block/aoe/aoedev.c @@ -285,8 +285,7 @@ freedev(struct aoedev *d) e = t + d->ntargets; for (; t < e && *t; t++) freetgt(d, *t); - if (d->bufpool) - mempool_destroy(d->bufpool); + mempool_destroy(d->bufpool); skbpoolfree(d); minor_free(d->sysminor); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 09/39] aoe: drop null test before destroy functions
ACK. Thanks. On 09/13/2015 08:15 AM, Julia Lawall wrote: Remove unneeded NULL test. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // @@ expression x; @@ -if (x != NULL) \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x); // Signed-off-by: Julia Lawall--- drivers/block/aoe/aoedev.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c index ffd1947..efc8a8a 100644 --- a/drivers/block/aoe/aoedev.c +++ b/drivers/block/aoe/aoedev.c @@ -285,8 +285,7 @@ freedev(struct aoedev *d) e = t + d->ntargets; for (; t < e && *t; t++) freetgt(d, *t); - if (d->bufpool) - mempool_destroy(d->bufpool); + mempool_destroy(d->bufpool); skbpoolfree(d); minor_free(d->sysminor); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] Revert "block: remove artifical max_hw_sectors cap"
ACK. On 08/13/2015 02:57 PM, Jeff Moyer wrote: This reverts commit 34b48db66e08ca1c1bc07cf305d672ac940268dc. That commit caused performance regressions for streaming I/O workloads on a number of different storage devices, from SATA disks to external RAID arrays. It also managed to trip up some buggy firmware in at least one drive, causing data corruption. The next patch will bump the default max_sectors_kb value to 1280, which will accommodate a 10-data-disk stripe write with chunk size 128k. In the testing I've done using iozone, fio, and aio-stress, a value of 1280 does not show a big performance difference from 512. This will hopefully still help the software RAID setup that Christoph saw the original performance gains with while still not regressing other storage configurations. Signed-off-by: Jeff Moyer --- block/blk-settings.c | 4 +++- drivers/block/aoe/aoeblk.c | 2 +- include/linux/blkdev.h | 1 + 3 files changed, 5 insertions(+), 2 deletions(-) diff --git a/block/blk-settings.c b/block/blk-settings.c index 12600bf..b160f89 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -257,7 +257,9 @@ void blk_limits_max_hw_sectors(struct queue_limits *limits, unsigned int max_hw_ __func__, max_hw_sectors); } - limits->max_sectors = limits->max_hw_sectors = max_hw_sectors; + limits->max_hw_sectors = max_hw_sectors; + limits->max_sectors = min_t(unsigned int, max_hw_sectors, + BLK_DEF_MAX_SECTORS); } EXPORT_SYMBOL(blk_limits_max_hw_sectors); diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c index 46c282f..dd73e1f 100644 --- a/drivers/block/aoe/aoeblk.c +++ b/drivers/block/aoe/aoeblk.c @@ -395,7 +395,7 @@ aoeblk_gdalloc(void *vp) WARN_ON(d->flags & DEVFL_TKILL); WARN_ON(d->gd); WARN_ON(d->flags & DEVFL_UP); - blk_queue_max_hw_sectors(q, 1024); + blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS); q->backing_dev_info.name = "aoe"; q->backing_dev_info.ra_pages = READ_AHEAD / PAGE_CACHE_SIZE; d->bufpool = mp; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index d4068c1..1fd459e1 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1138,6 +1138,7 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm); enum blk_default_limits { BLK_MAX_SEGMENTS= 128, BLK_SAFE_MAX_SECTORS= 255, + BLK_DEF_MAX_SECTORS = 1024, BLK_MAX_SEGMENT_SIZE= 65536, BLK_SEG_BOUNDARY_MASK = 0xUL, }; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] Revert block: remove artifical max_hw_sectors cap
ACK. On 08/13/2015 02:57 PM, Jeff Moyer wrote: This reverts commit 34b48db66e08ca1c1bc07cf305d672ac940268dc. That commit caused performance regressions for streaming I/O workloads on a number of different storage devices, from SATA disks to external RAID arrays. It also managed to trip up some buggy firmware in at least one drive, causing data corruption. The next patch will bump the default max_sectors_kb value to 1280, which will accommodate a 10-data-disk stripe write with chunk size 128k. In the testing I've done using iozone, fio, and aio-stress, a value of 1280 does not show a big performance difference from 512. This will hopefully still help the software RAID setup that Christoph saw the original performance gains with while still not regressing other storage configurations. Signed-off-by: Jeff Moyer jmo...@redhat.com --- block/blk-settings.c | 4 +++- drivers/block/aoe/aoeblk.c | 2 +- include/linux/blkdev.h | 1 + 3 files changed, 5 insertions(+), 2 deletions(-) diff --git a/block/blk-settings.c b/block/blk-settings.c index 12600bf..b160f89 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -257,7 +257,9 @@ void blk_limits_max_hw_sectors(struct queue_limits *limits, unsigned int max_hw_ __func__, max_hw_sectors); } - limits-max_sectors = limits-max_hw_sectors = max_hw_sectors; + limits-max_hw_sectors = max_hw_sectors; + limits-max_sectors = min_t(unsigned int, max_hw_sectors, + BLK_DEF_MAX_SECTORS); } EXPORT_SYMBOL(blk_limits_max_hw_sectors); diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c index 46c282f..dd73e1f 100644 --- a/drivers/block/aoe/aoeblk.c +++ b/drivers/block/aoe/aoeblk.c @@ -395,7 +395,7 @@ aoeblk_gdalloc(void *vp) WARN_ON(d-flags DEVFL_TKILL); WARN_ON(d-gd); WARN_ON(d-flags DEVFL_UP); - blk_queue_max_hw_sectors(q, 1024); + blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS); q-backing_dev_info.name = aoe; q-backing_dev_info.ra_pages = READ_AHEAD / PAGE_CACHE_SIZE; d-bufpool = mp; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index d4068c1..1fd459e1 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1138,6 +1138,7 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm); enum blk_default_limits { BLK_MAX_SEGMENTS= 128, BLK_SAFE_MAX_SECTORS= 255, + BLK_DEF_MAX_SECTORS = 1024, BLK_MAX_SEGMENT_SIZE= 65536, BLK_SEG_BOUNDARY_MASK = 0xUL, }; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: Convert use of __constant_ to
OK. Thanks. On 06/18/2015 11:15 PM, Vaishali Thakkar wrote: In little endian cases, the macros htons and cpu_to_be16 unfolds to __swab16 which provides special case for constants. In big endian cases, __constant_htons and htons expand directly to the same expression. The same applies for __constant_cpu_to_be16 and cpu_to_be16. So, replace __constant_htons with htons and __constant_cpu_to_be16 with cpu_to_be16 with the goal of getting rid of the definitions __constant_htons and __constant_cpu_to_be16 completely. The semantic patch that performs this transformation is as follows: @@expression x;@@ ( - __constant_htons(x) + htons(x) | - __constant_cpu_to_be16(x) + cpu_to_be16(x) ) Signed-off-by: Vaishali Thakkar --- drivers/block/aoe/aoecmd.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c index 422b7d8..6e891d2 100644 --- a/drivers/block/aoe/aoecmd.c +++ b/drivers/block/aoe/aoecmd.c @@ -69,7 +69,7 @@ new_skb(ulong len) skb_reserve(skb, MAX_HEADER); skb_reset_mac_header(skb); skb_reset_network_header(skb); - skb->protocol = __constant_htons(ETH_P_AOE); + skb->protocol = htons(ETH_P_AOE); skb_checksum_none_assert(skb); } return skb; @@ -132,7 +132,7 @@ aoehdr_atainit(struct aoedev *d, struct aoetgt *t, struct aoe_hdr *h) memcpy(h->src, t->ifp->nd->dev_addr, sizeof h->src); memcpy(h->dst, t->addr, sizeof h->dst); - h->type = __constant_cpu_to_be16(ETH_P_AOE); + h->type = cpu_to_be16(ETH_P_AOE); h->verfl = AOE_HVER; h->major = cpu_to_be16(d->aoemajor); h->minor = d->aoeminor; @@ -437,7 +437,7 @@ aoecmd_cfg_pkts(ushort aoemajor, unsigned char aoeminor, struct sk_buff_head *qu memset(h->dst, 0xff, sizeof h->dst); memcpy(h->src, ifp->dev_addr, sizeof h->src); - h->type = __constant_cpu_to_be16(ETH_P_AOE); + h->type = cpu_to_be16(ETH_P_AOE); h->verfl = AOE_HVER; h->major = cpu_to_be16(aoemajor); h->minor = aoeminor; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: Convert use of __constant_htons to htons
OK. Thanks. On 06/18/2015 11:23 PM, Vaishali Thakkar wrote: In little endian cases, the macro htons unfolds to __swab16 which provides special case for constants. In big endian cases, __constant_htons and htons expand directly to the same expression. So, replace __constant_htons with htons with the goal of getting rid of the definition of __constant_htons completely. The semantic patch that performs this transformation is as follows: @@expression x;@@ - __constant_htons(x) + htons(x) Signed-off-by: Vaishali Thakkar --- drivers/block/aoe/aoenet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c index 63773a9..78ff47a 100644 --- a/drivers/block/aoe/aoenet.c +++ b/drivers/block/aoe/aoenet.c @@ -192,7 +192,7 @@ exit: } static struct packet_type aoe_pt __read_mostly = { - .type = __constant_htons(ETH_P_AOE), + .type = htons(ETH_P_AOE), .func = aoenet_rcv, }; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: Convert use of __constant_foo to foo
OK. Thanks. On 06/18/2015 11:15 PM, Vaishali Thakkar wrote: In little endian cases, the macros htons and cpu_to_be16 unfolds to __swab16 which provides special case for constants. In big endian cases, __constant_htons and htons expand directly to the same expression. The same applies for __constant_cpu_to_be16 and cpu_to_be16. So, replace __constant_htons with htons and __constant_cpu_to_be16 with cpu_to_be16 with the goal of getting rid of the definitions __constant_htons and __constant_cpu_to_be16 completely. The semantic patch that performs this transformation is as follows: @@expression x;@@ ( - __constant_htons(x) + htons(x) | - __constant_cpu_to_be16(x) + cpu_to_be16(x) ) Signed-off-by: Vaishali Thakkar vthakkar1...@gmail.com --- drivers/block/aoe/aoecmd.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c index 422b7d8..6e891d2 100644 --- a/drivers/block/aoe/aoecmd.c +++ b/drivers/block/aoe/aoecmd.c @@ -69,7 +69,7 @@ new_skb(ulong len) skb_reserve(skb, MAX_HEADER); skb_reset_mac_header(skb); skb_reset_network_header(skb); - skb-protocol = __constant_htons(ETH_P_AOE); + skb-protocol = htons(ETH_P_AOE); skb_checksum_none_assert(skb); } return skb; @@ -132,7 +132,7 @@ aoehdr_atainit(struct aoedev *d, struct aoetgt *t, struct aoe_hdr *h) memcpy(h-src, t-ifp-nd-dev_addr, sizeof h-src); memcpy(h-dst, t-addr, sizeof h-dst); - h-type = __constant_cpu_to_be16(ETH_P_AOE); + h-type = cpu_to_be16(ETH_P_AOE); h-verfl = AOE_HVER; h-major = cpu_to_be16(d-aoemajor); h-minor = d-aoeminor; @@ -437,7 +437,7 @@ aoecmd_cfg_pkts(ushort aoemajor, unsigned char aoeminor, struct sk_buff_head *qu memset(h-dst, 0xff, sizeof h-dst); memcpy(h-src, ifp-dev_addr, sizeof h-src); - h-type = __constant_cpu_to_be16(ETH_P_AOE); + h-type = cpu_to_be16(ETH_P_AOE); h-verfl = AOE_HVER; h-major = cpu_to_be16(aoemajor); h-minor = aoeminor; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: Convert use of __constant_htons to htons
OK. Thanks. On 06/18/2015 11:23 PM, Vaishali Thakkar wrote: In little endian cases, the macro htons unfolds to __swab16 which provides special case for constants. In big endian cases, __constant_htons and htons expand directly to the same expression. So, replace __constant_htons with htons with the goal of getting rid of the definition of __constant_htons completely. The semantic patch that performs this transformation is as follows: @@expression x;@@ - __constant_htons(x) + htons(x) Signed-off-by: Vaishali Thakkar vthakkar1...@gmail.com --- drivers/block/aoe/aoenet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c index 63773a9..78ff47a 100644 --- a/drivers/block/aoe/aoenet.c +++ b/drivers/block/aoe/aoenet.c @@ -192,7 +192,7 @@ exit: } static struct packet_type aoe_pt __read_mostly = { - .type = __constant_htons(ETH_P_AOE), + .type = htons(ETH_P_AOE), .func = aoenet_rcv, }; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in Please read the FAQ at http://www.tux.org/lkml/
Re: [Y2038] [PATCH] aoe: Use 64-bit timestamp in frame
On 05/13/2015 04:04 AM, Arnd Bergmann wrote: ... Shall we do the ktime_get_us() approach then? It still requires a 32-bit division like do_gettimeofday(), so it will not be as efficient as the shifted nanoseconds. It's no worse, though, right? So I think it's a good transition. Further optimization could be attempted in an experimental branch at some point for easy testing. As for the aoe_deadsecs computation, converting the aoe_deadsec module parameter into scaled nanoseconds can be done at module load time, and that way you also save the integer division you currently do for each frame in rexmit_timer() to turn the microseconds into seconds. Arnd That's true, but the "secs" in the identifier stands for "seconds". It would be misleading to have something called seconds be scaled nanoseconds. And we could just use another variable if it weren't for the fact that this module parameter is exposed through sysfs and can be changed through that mechanism at any time. -- Ed -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Y2038] [PATCH] aoe: Use 64-bit timestamp in frame
On 05/13/2015 04:04 AM, Arnd Bergmann wrote: ... Shall we do the ktime_get_us() approach then? It still requires a 32-bit division like do_gettimeofday(), so it will not be as efficient as the shifted nanoseconds. It's no worse, though, right? So I think it's a good transition. Further optimization could be attempted in an experimental branch at some point for easy testing. As for the aoe_deadsecs computation, converting the aoe_deadsec module parameter into scaled nanoseconds can be done at module load time, and that way you also save the integer division you currently do for each frame in rexmit_timer() to turn the microseconds into seconds. Arnd That's true, but the secs in the identifier stands for seconds. It would be misleading to have something called seconds be scaled nanoseconds. And we could just use another variable if it weren't for the fact that this module parameter is exposed through sysfs and can be changed through that mechanism at any time. -- Ed -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: Use 64-bit timestamp in frame
On 05/12/2015 05:44 AM, Arnd Bergmann wrote: On Monday 11 May 2015 21:00:25 Ed Cashin wrote: ... In that case, there is still information about the timing embedded in the AoE tag. The send time in jiffies is a rough-grained record of the send time, and it's extracted from the tag. For these "unexpected" responses, this timing information can improve performance significantly without introducing extra overhead or risk. That path is not changed at all by this patch, right? It also looks like the jiffies information from there is only used to print an error message. That's right, thanks, the tag still has the old jiffies embedded in it. The information, though, is used to update the round-trip-time average and the running estimate of the RTT variance. For the unexpected responses that information can help the driver to maintain high performance when there are inconsistencies in the network performance. (calc_rttavg) -- Ed -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Y2038] [PATCH] aoe: Use 64-bit timestamp in frame
On 05/12/2015 07:14 AM, Arnd Bergmann wrote: On Tuesday 12 May 2015 11:44:21 Arnd Bergmann wrote: There are of course multiple ways to do this. One way would be to change the code to work on 32-bit nanoseconds instead of 32-bit microseconds. This requires proving that the we cannot exceed 4.29 seconds of round-trip time in calc_rttavg(). Is that a valid assumption or not? If not, we could replace do_gettimeofday() with ktime_get_ts64(). This will ensure we don't need a 64-bit division when converting the ts64 to a 32-bit microsecond value, and combined with the conversion is still no slower than do_gettimeofday(), and it still avoids the double bookkeeping because it uses a monotonic timebase that is robust against settimeofday. Two other approaches that occurred to me later: - introduce common ktime_get_ms(), ktime_get_us(), ktime_get_real_ms() and ktime_get_real_is() interfaces, to match the other interfaces we already provide. These could be done as efficiently or better than what aoe does manually today. - change the timebase that is used for the computations in aoe to use scaled nanoseconds instead of microseconds. Using u32 time = ktime_get_ns() >> 10; would give you a similar range and precision as microseconds, but completely avoid integer division. You could also use a different shift value to either extend the range beyond 71 minutes, or the extend the precision to something below a microsecond. This would be the most efficient implementation, but also require significant changes to the driver. That is an interesting idea. People do care about aoe_deadsecs being pretty accurate, so there would need to be a way to make that remain accurate. The driver will fail outstanding I/O to the target and mark it as "down" after unsuccessfully retransmitting commands to the target for a number of seconds equal to aoe_deadsecs. As to the efficient ktime_get_us idea, that sounds appealing since you mention that they would be efficient. Thanks for the analysis. -- Ed -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: Use 64-bit timestamp in frame
Thanks for the expanded motivation. I'll return to your ideas tonight, but I wanted to mention that it is possible for round trips to take well over five seconds, partly because the disk on the target might be resetting. If the rough-grained mechanism in the AoE tag still works, though, that's very helpful.
Re: [Y2038] [PATCH] aoe: Use 64-bit timestamp in frame
On 05/12/2015 07:14 AM, Arnd Bergmann wrote: On Tuesday 12 May 2015 11:44:21 Arnd Bergmann wrote: There are of course multiple ways to do this. One way would be to change the code to work on 32-bit nanoseconds instead of 32-bit microseconds. This requires proving that the we cannot exceed 4.29 seconds of round-trip time in calc_rttavg(). Is that a valid assumption or not? If not, we could replace do_gettimeofday() with ktime_get_ts64(). This will ensure we don't need a 64-bit division when converting the ts64 to a 32-bit microsecond value, and combined with the conversion is still no slower than do_gettimeofday(), and it still avoids the double bookkeeping because it uses a monotonic timebase that is robust against settimeofday. Two other approaches that occurred to me later: - introduce common ktime_get_ms(), ktime_get_us(), ktime_get_real_ms() and ktime_get_real_is() interfaces, to match the other interfaces we already provide. These could be done as efficiently or better than what aoe does manually today. - change the timebase that is used for the computations in aoe to use scaled nanoseconds instead of microseconds. Using u32 time = ktime_get_ns() 10; would give you a similar range and precision as microseconds, but completely avoid integer division. You could also use a different shift value to either extend the range beyond 71 minutes, or the extend the precision to something below a microsecond. This would be the most efficient implementation, but also require significant changes to the driver. That is an interesting idea. People do care about aoe_deadsecs being pretty accurate, so there would need to be a way to make that remain accurate. The driver will fail outstanding I/O to the target and mark it as down after unsuccessfully retransmitting commands to the target for a number of seconds equal to aoe_deadsecs. As to the efficient ktime_get_us idea, that sounds appealing since you mention that they would be efficient. Thanks for the analysis. -- Ed -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: Use 64-bit timestamp in frame
On 05/12/2015 05:44 AM, Arnd Bergmann wrote: On Monday 11 May 2015 21:00:25 Ed Cashin wrote: ... In that case, there is still information about the timing embedded in the AoE tag. The send time in jiffies is a rough-grained record of the send time, and it's extracted from the tag. For these unexpected responses, this timing information can improve performance significantly without introducing extra overhead or risk. That path is not changed at all by this patch, right? It also looks like the jiffies information from there is only used to print an error message. That's right, thanks, the tag still has the old jiffies embedded in it. The information, though, is used to update the round-trip-time average and the running estimate of the RTT variance. For the unexpected responses that information can help the driver to maintain high performance when there are inconsistencies in the network performance. (calc_rttavg) -- Ed -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: Use 64-bit timestamp in frame
Thanks for the expanded motivation. I'll return to your ideas tonight, but I wanted to mention that it is possible for round trips to take well over five seconds, partly because the disk on the target might be resetting. If the rough-grained mechanism in the AoE tag still works, though, that's very helpful.
Re: [PATCH] aoe: Use 64-bit timestamp in frame
First, thanks for the patch. I do appreciate the attempt to simplify this part of the driver, but I don't think that this patch is good to merge. I'll make some comments inline below. On 05/10/2015 10:35 PM, Tina Ruchandani wrote: 'struct frame' uses two variables to store the sent timestamp - 'struct timeval' and jiffies. jiffies is used to avoid discrepancies caused by updates to system time. 'struct timeval' uses 32-bit representation for seconds which will overflow in year 2038. The comment in the deleted lines below mentions the fact that the overflow does not matter for calculating rough-grained deltas in time. So there is no problem in 2038 or on systems with the clock set to 2038 accidentally. This patch does the following: - Replace the use of 'struct timeval' and jiffies with ktime_t, which is a 64-bit timestamp and is year 2038 safe. - ktime_t provides both long range (like jiffies) and high resolution (like timeval). Using ktime_get (monotonic time) instead of wall-clock time prevents any discprepancies caused by updates to system time. But the patch only changes the struct frame data. The aoe driver only has the struct frame for an incoming AoE response when that response is "expected". If the response comes in a bit late, the frame may have already been used for a new command. You can see that in aoecmd_ata_rsp when getframe_deferred returns NULL and tsince is called instead of tsince_hr. In that case, there is still information about the timing embedded in the AoE tag. The send time in jiffies is a rough-grained record of the send time, and it's extracted from the tag. For these "unexpected" responses, this timing information can improve performance significantly without introducing extra overhead or risk. I don't think the patch considers this aspect of the way the round trip time is calculated, and I don't think the primary motivation is justified (if that's 2038 safety, which we have already). Simplifying it would be nice, but it would be difficult to thoroughly test all of the performance implications. There are still people using 32-bit systems, for example. Signed-off-by: Tina Ruchandani --- drivers/block/aoe/aoe.h| 3 +-- drivers/block/aoe/aoecmd.c | 36 +++- 2 files changed, 8 insertions(+), 31 deletions(-) diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h index 9220f8e..4582b3c 100644 --- a/drivers/block/aoe/aoe.h +++ b/drivers/block/aoe/aoe.h @@ -112,8 +112,7 @@ enum frame_flags { struct frame { struct list_head head; u32 tag; - struct timeval sent;/* high-res time packet was sent */ - u32 sent_jiffs; /* low-res jiffies-based sent time */ + ktime_t sent; ulong waited; ulong waited_total; struct aoetgt *t; /* parent target I belong to */ diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c index 422b7d8..7f78780 100644 --- a/drivers/block/aoe/aoecmd.c +++ b/drivers/block/aoe/aoecmd.c @@ -398,8 +398,7 @@ aoecmd_ata_rw(struct aoedev *d) skb = skb_clone(f->skb, GFP_ATOMIC); if (skb) { - do_gettimeofday(>sent); - f->sent_jiffs = (u32) jiffies; + f->sent = ktime_get(); __skb_queue_head_init(); __skb_queue_tail(, skb); aoenet_xmit(); @@ -489,8 +488,7 @@ resend(struct aoedev *d, struct frame *f) skb = skb_clone(skb, GFP_ATOMIC); if (skb == NULL) return; - do_gettimeofday(>sent); - f->sent_jiffs = (u32) jiffies; + f->sent = ktime_get(); __skb_queue_head_init(); __skb_queue_tail(, skb); aoenet_xmit(); @@ -499,32 +497,15 @@ resend(struct aoedev *d, struct frame *f) static int tsince_hr(struct frame *f) { - struct timeval now; + ktime_t now; int n; - do_gettimeofday(); - n = now.tv_usec - f->sent.tv_usec; - n += (now.tv_sec - f->sent.tv_sec) * USEC_PER_SEC; + now = ktime_get(); + n = ktime_to_us(ktime_sub(now, f->sent)); if (n < 0) n = -n; - /* For relatively long periods, use jiffies to avoid -* discrepancies caused by updates to the system time. -* -* On system with HZ of 1000, 32-bits is over 49 days -* worth of jiffies, or over 71 minutes worth of usecs. -* -* Jiffies overflow is handled by subtraction of unsigned ints: -* (gdb) print (unsigned) 2 - (unsigned) 0xfffe -* $3 = 4 -* (gdb) -*/ - if (n > USEC_PER_SEC / 4) { - n = ((u32) jiffies) - f->sent_jiffs; - n *= USEC_PER_SEC / HZ; - } - return n; } @@ -589,7 +570,6 @@ reassign_frame(struct frame *f) nf->waited = 0; nf->waited_total = f->waited_total; nf->sent = f->sent; - nf->sent_jiffs = f->sent_jiffs; f->skb = skb; return
Re: [PATCH] aoe: Use 64-bit timestamp in frame
I would like to see some performance measurements for this patch on a system with fast storage and multiple 10 GbE links. If not, at least a good analysis of the expected performance impact the patch will have on major architectures. Tonight I will think about whether the 2038 thing even matters or whether we just need a comment explaining why it's safe. On May 11, 2015 11:38 AM, Arnd Bergmann wrote: > > On Monday 11 May 2015 08:05:05 Tina Ruchandani wrote: > > 'struct frame' uses two variables to store the sent timestamp - 'struct > > timeval' and jiffies. jiffies is used to avoid discrepancies caused by > > updates to system time. 'struct timeval' uses 32-bit representation for > > seconds which will overflow in year 2038. > > This patch does the following: > > - Replace the use of 'struct timeval' and jiffies with ktime_t, which > > is a 64-bit timestamp and is year 2038 safe. > > - ktime_t provides both long range (like jiffies) and high resolution > > (like timeval). Using ktime_get (monotonic time) instead of wall-clock > > time prevents any discprepancies caused by updates to system time. > > > > Signed-off-by: Tina Ruchandani > > Very nice! > > > @@ -499,32 +497,15 @@ resend(struct aoedev *d, struct frame *f) > > static int > > tsince_hr(struct frame *f) > > { > > - struct timeval now; > > + ktime_t now; > > int n; > > > > - do_gettimeofday(); > > - n = now.tv_usec - f->sent.tv_usec; > > - n += (now.tv_sec - f->sent.tv_sec) * USEC_PER_SEC; > > + now = ktime_get(); > > + n = ktime_to_us(ktime_sub(now, f->sent)); > > > > I would cut four extra lines by writing this as > > return ktime_us_delta(ktime_get(), f->sent)); > > but the effect is exactly the same. > > With that change, please add > > Reviewed-by: Arnd Bergmann > > Arnd
Re: [PATCH] aoe: Use 64-bit timestamp in frame
First, thanks for the patch. I do appreciate the attempt to simplify this part of the driver, but I don't think that this patch is good to merge. I'll make some comments inline below. On 05/10/2015 10:35 PM, Tina Ruchandani wrote: 'struct frame' uses two variables to store the sent timestamp - 'struct timeval' and jiffies. jiffies is used to avoid discrepancies caused by updates to system time. 'struct timeval' uses 32-bit representation for seconds which will overflow in year 2038. The comment in the deleted lines below mentions the fact that the overflow does not matter for calculating rough-grained deltas in time. So there is no problem in 2038 or on systems with the clock set to 2038 accidentally. This patch does the following: - Replace the use of 'struct timeval' and jiffies with ktime_t, which is a 64-bit timestamp and is year 2038 safe. - ktime_t provides both long range (like jiffies) and high resolution (like timeval). Using ktime_get (monotonic time) instead of wall-clock time prevents any discprepancies caused by updates to system time. But the patch only changes the struct frame data. The aoe driver only has the struct frame for an incoming AoE response when that response is expected. If the response comes in a bit late, the frame may have already been used for a new command. You can see that in aoecmd_ata_rsp when getframe_deferred returns NULL and tsince is called instead of tsince_hr. In that case, there is still information about the timing embedded in the AoE tag. The send time in jiffies is a rough-grained record of the send time, and it's extracted from the tag. For these unexpected responses, this timing information can improve performance significantly without introducing extra overhead or risk. I don't think the patch considers this aspect of the way the round trip time is calculated, and I don't think the primary motivation is justified (if that's 2038 safety, which we have already). Simplifying it would be nice, but it would be difficult to thoroughly test all of the performance implications. There are still people using 32-bit systems, for example. Signed-off-by: Tina Ruchandani ruchandani.t...@gmail.com --- drivers/block/aoe/aoe.h| 3 +-- drivers/block/aoe/aoecmd.c | 36 +++- 2 files changed, 8 insertions(+), 31 deletions(-) diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h index 9220f8e..4582b3c 100644 --- a/drivers/block/aoe/aoe.h +++ b/drivers/block/aoe/aoe.h @@ -112,8 +112,7 @@ enum frame_flags { struct frame { struct list_head head; u32 tag; - struct timeval sent;/* high-res time packet was sent */ - u32 sent_jiffs; /* low-res jiffies-based sent time */ + ktime_t sent; ulong waited; ulong waited_total; struct aoetgt *t; /* parent target I belong to */ diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c index 422b7d8..7f78780 100644 --- a/drivers/block/aoe/aoecmd.c +++ b/drivers/block/aoe/aoecmd.c @@ -398,8 +398,7 @@ aoecmd_ata_rw(struct aoedev *d) skb = skb_clone(f-skb, GFP_ATOMIC); if (skb) { - do_gettimeofday(f-sent); - f-sent_jiffs = (u32) jiffies; + f-sent = ktime_get(); __skb_queue_head_init(queue); __skb_queue_tail(queue, skb); aoenet_xmit(queue); @@ -489,8 +488,7 @@ resend(struct aoedev *d, struct frame *f) skb = skb_clone(skb, GFP_ATOMIC); if (skb == NULL) return; - do_gettimeofday(f-sent); - f-sent_jiffs = (u32) jiffies; + f-sent = ktime_get(); __skb_queue_head_init(queue); __skb_queue_tail(queue, skb); aoenet_xmit(queue); @@ -499,32 +497,15 @@ resend(struct aoedev *d, struct frame *f) static int tsince_hr(struct frame *f) { - struct timeval now; + ktime_t now; int n; - do_gettimeofday(now); - n = now.tv_usec - f-sent.tv_usec; - n += (now.tv_sec - f-sent.tv_sec) * USEC_PER_SEC; + now = ktime_get(); + n = ktime_to_us(ktime_sub(now, f-sent)); if (n 0) n = -n; - /* For relatively long periods, use jiffies to avoid -* discrepancies caused by updates to the system time. -* -* On system with HZ of 1000, 32-bits is over 49 days -* worth of jiffies, or over 71 minutes worth of usecs. -* -* Jiffies overflow is handled by subtraction of unsigned ints: -* (gdb) print (unsigned) 2 - (unsigned) 0xfffe -* $3 = 4 -* (gdb) -*/ - if (n USEC_PER_SEC / 4) { - n = ((u32) jiffies) - f-sent_jiffs; - n *= USEC_PER_SEC / HZ; - } - return n; } @@ -589,7 +570,6 @@ reassign_frame(struct frame *f) nf-waited = 0; nf-waited_total = f-waited_total; nf-sent = f-sent; - nf-sent_jiffs =
Re: [PATCH] aoe: Use 64-bit timestamp in frame
I would like to see some performance measurements for this patch on a system with fast storage and multiple 10 GbE links. If not, at least a good analysis of the expected performance impact the patch will have on major architectures. Tonight I will think about whether the 2038 thing even matters or whether we just need a comment explaining why it's safe. On May 11, 2015 11:38 AM, Arnd Bergmann a...@arndb.de wrote: On Monday 11 May 2015 08:05:05 Tina Ruchandani wrote: 'struct frame' uses two variables to store the sent timestamp - 'struct timeval' and jiffies. jiffies is used to avoid discrepancies caused by updates to system time. 'struct timeval' uses 32-bit representation for seconds which will overflow in year 2038. This patch does the following: - Replace the use of 'struct timeval' and jiffies with ktime_t, which is a 64-bit timestamp and is year 2038 safe. - ktime_t provides both long range (like jiffies) and high resolution (like timeval). Using ktime_get (monotonic time) instead of wall-clock time prevents any discprepancies caused by updates to system time. Signed-off-by: Tina Ruchandani ruchandani.t...@gmail.com Very nice! @@ -499,32 +497,15 @@ resend(struct aoedev *d, struct frame *f) static int tsince_hr(struct frame *f) { - struct timeval now; + ktime_t now; int n; - do_gettimeofday(now); - n = now.tv_usec - f-sent.tv_usec; - n += (now.tv_sec - f-sent.tv_sec) * USEC_PER_SEC; + now = ktime_get(); + n = ktime_to_us(ktime_sub(now, f-sent)); I would cut four extra lines by writing this as return ktime_us_delta(ktime_get(), f-sent)); but the effect is exactly the same. With that change, please add Reviewed-by: Arnd Bergmann a...@arndb.de Arnd
[PATCH] aoe: Update aoe maintainer information
The coraid.com email address is defunct. The old aoe support area hosted at coraid.com is no longer up. These changes update the email and website to current ones. Signed-off-by: Ed Cashin --- The docs need updating as well, but I want to fix the email before tackling that. MAINTAINERS | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 5d1bddc..7f58068 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1679,8 +1679,8 @@ F:drivers/misc/eeprom/at24.c F: include/linux/platform_data/at24.h ATA OVER ETHERNET (AOE) DRIVER -M: "Ed L. Cashin" -W: http://support.coraid.com/support/linux +M: "Ed L. Cashin" +W: http://www.openaoe.org/ S: Supported F: Documentation/aoe/ F: drivers/block/aoe/ -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] aoe: Update aoe maintainer information
The coraid.com email address is defunct. The old aoe support area hosted at coraid.com is no longer up. These changes update the email and website to current ones. Signed-off-by: Ed Cashin ed.cas...@acm.org --- The docs need updating as well, but I want to fix the email before tackling that. MAINTAINERS | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 5d1bddc..7f58068 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1679,8 +1679,8 @@ F:drivers/misc/eeprom/at24.c F: include/linux/platform_data/at24.h ATA OVER ETHERNET (AOE) DRIVER -M: Ed L. Cashin ecas...@coraid.com -W: http://support.coraid.com/support/linux +M: Ed L. Cashin ed.cas...@acm.org +W: http://www.openaoe.org/ S: Supported F: Documentation/aoe/ F: drivers/block/aoe/ -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] incorrect use of init_completion fixup
On Tue, Dec 23, 2014 at 12:44 PM, Nicholas Mc Guire wrote: > > The second init_completion call should be a reinit_completion here. > > patch is against 3.18.0 linux-next > > Signed-off-by: Nicholas Mc Guire > --- > drivers/block/aoe/aoecmd.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c > index 422b7d8..2239513 100644 > --- a/drivers/block/aoe/aoecmd.c > +++ b/drivers/block/aoe/aoecmd.c > @@ -1331,7 +1331,7 @@ aoe_ktstart(struct ktstate *k) > return -ENOMEM; > k->task = task; > wait_for_completion(>rendez); /* allow kthread to start */ > - init_completion(>rendez);/* for waiting for exit later */ > + reinit_completion(>rendez); /* for waiting for exit later */ > return 0; > } That looks good to me, thanks. -- Ed Cashin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] incorrect use of init_completion fixup
On Tue, Dec 23, 2014 at 12:44 PM, Nicholas Mc Guire der.h...@hofr.at wrote: The second init_completion call should be a reinit_completion here. patch is against 3.18.0 linux-next Signed-off-by: Nicholas Mc Guire der.h...@hofr.at --- drivers/block/aoe/aoecmd.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c index 422b7d8..2239513 100644 --- a/drivers/block/aoe/aoecmd.c +++ b/drivers/block/aoe/aoecmd.c @@ -1331,7 +1331,7 @@ aoe_ktstart(struct ktstate *k) return -ENOMEM; k-task = task; wait_for_completion(k-rendez); /* allow kthread to start */ - init_completion(k-rendez);/* for waiting for exit later */ + reinit_completion(k-rendez); /* for waiting for exit later */ return 0; } That looks good to me, thanks. -- Ed Cashin ecas...@noserose.net -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] aoe: remove do-nothing NAME="%k" term from example udev rules
When the example udev rules in the documentation are used without modification, warnings like the one shown below appear in the system logs: /var/log/messages:Aug 22 11:09:11 kung udevd[445]: NAME="%k" \ is superfluous and breaks kernel supplied names, please remove \ it from /etc/udev/rules.d/60-aoe.rules:26 Removing the term does not cause any problems with the creation of the special character and block device nodes. Signed-off-by: Ed Cashin --- Documentation/aoe/udev.txt |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Documentation/aoe/udev.txt b/Documentation/aoe/udev.txt index 8686e78..1f06daf 100644 --- a/Documentation/aoe/udev.txt +++ b/Documentation/aoe/udev.txt @@ -23,4 +23,4 @@ SUBSYSTEM=="aoe", KERNEL=="revalidate", NAME="etherd/%k", GROUP="disk", MODE="02 SUBSYSTEM=="aoe", KERNEL=="flush", NAME="etherd/%k", GROUP="disk", MODE="0220" # aoe block devices -KERNEL=="etherd*", NAME="%k", GROUP="disk" +KERNEL=="etherd*", GROUP="disk" -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/2] aoe: remove too-strict BUG from aoedev teardown and update udev example
This patch series applies to linux-next/akpm fetched 22 August, commit a187db44dd9b24a9eed2141af962306a36935423. Yesterday's linux-next/akpm has no updates to the aoe files that could conflict, though. Ed L. Cashin (2): aoe: do not BUG if memory pressure prevented debugfs file creation aoe: remove do-nothing NAME="%k" term from example udev rules Documentation/aoe/udev.txt |2 +- drivers/block/aoe/aoeblk.c |1 - 2 files changed, 1 insertions(+), 2 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] aoe: do not BUG if memory pressure prevented debugfs file creation
If the system has trouble allocating memory for the creation of the aoe debugfs directory or of a file inside it, the debugfs member of an aoedev can be NULL. Do not treat a NULL debugfs pointer as a BUG on aoedev shutdown, avoiding the user impact of an unecessary panic. Signed-off-by: Ed Cashin --- drivers/block/aoe/aoeblk.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c index d63dcf0..dd73e1f 100644 --- a/drivers/block/aoe/aoeblk.c +++ b/drivers/block/aoe/aoeblk.c @@ -215,7 +215,6 @@ aoedisk_add_debugfs(struct aoedev *d) void aoedisk_rm_debugfs(struct aoedev *d) { - BUG_ON(d->debugfs == NULL); debugfs_remove(d->debugfs); d->debugfs = NULL; } -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] aoe: do not BUG if memory pressure prevented debugfs file creation
If the system has trouble allocating memory for the creation of the aoe debugfs directory or of a file inside it, the debugfs member of an aoedev can be NULL. Do not treat a NULL debugfs pointer as a BUG on aoedev shutdown, avoiding the user impact of an unecessary panic. Signed-off-by: Ed Cashin ecas...@coraid.com --- drivers/block/aoe/aoeblk.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c index d63dcf0..dd73e1f 100644 --- a/drivers/block/aoe/aoeblk.c +++ b/drivers/block/aoe/aoeblk.c @@ -215,7 +215,6 @@ aoedisk_add_debugfs(struct aoedev *d) void aoedisk_rm_debugfs(struct aoedev *d) { - BUG_ON(d-debugfs == NULL); debugfs_remove(d-debugfs); d-debugfs = NULL; } -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/2] aoe: remove too-strict BUG from aoedev teardown and update udev example
This patch series applies to linux-next/akpm fetched 22 August, commit a187db44dd9b24a9eed2141af962306a36935423. Yesterday's linux-next/akpm has no updates to the aoe files that could conflict, though. Ed L. Cashin (2): aoe: do not BUG if memory pressure prevented debugfs file creation aoe: remove do-nothing NAME=%k term from example udev rules Documentation/aoe/udev.txt |2 +- drivers/block/aoe/aoeblk.c |1 - 2 files changed, 1 insertions(+), 2 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] aoe: remove do-nothing NAME=%k term from example udev rules
When the example udev rules in the documentation are used without modification, warnings like the one shown below appear in the system logs: /var/log/messages:Aug 22 11:09:11 kung udevd[445]: NAME=%k \ is superfluous and breaks kernel supplied names, please remove \ it from /etc/udev/rules.d/60-aoe.rules:26 Removing the term does not cause any problems with the creation of the special character and block device nodes. Signed-off-by: Ed Cashin ecas...@coraid.com --- Documentation/aoe/udev.txt |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Documentation/aoe/udev.txt b/Documentation/aoe/udev.txt index 8686e78..1f06daf 100644 --- a/Documentation/aoe/udev.txt +++ b/Documentation/aoe/udev.txt @@ -23,4 +23,4 @@ SUBSYSTEM==aoe, KERNEL==revalidate, NAME=etherd/%k, GROUP=disk, MODE=02 SUBSYSTEM==aoe, KERNEL==flush, NAME=etherd/%k, GROUP=disk, MODE=0220 # aoe block devices -KERNEL==etherd*, NAME=%k, GROUP=disk +KERNEL==etherd*, GROUP=disk -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: suppress compiler warnings
On Aug 26, 2013, at 5:45 AM, Andy Shevchenko wrote: > This patch fixes following compiler warnings: > > drivers/block/aoe/aoecmd.c: In function ‘aoecmd_ata_rw’: > drivers/block/aoe/aoecmd.c:383:17: warning: variable ‘t’ set but not used > [-Wunused-but-set-variable] > struct aoetgt *t; > ^ Looks good, thanks. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: suppress compiler warnings
On Aug 26, 2013, at 5:45 AM, Andy Shevchenko wrote: This patch fixes following compiler warnings: drivers/block/aoe/aoecmd.c: In function ‘aoecmd_ata_rw’: drivers/block/aoe/aoecmd.c:383:17: warning: variable ‘t’ set but not used [-Wunused-but-set-variable] struct aoetgt *t; ^ Looks good, thanks. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 04/22] block: Abstract out bvec iterator
On Aug 9, 2013, Ed Cashin wrote: > On Aug 8, 2013, at 9:05 PM, Kent Overstreet wrote: > ... > > It's in the for-jens branch now. > > > Just examining the patches, I like the way it cleans up the aoe code. I > had a question about a new BUG added by the for-jens branch the > read-response handling path of the aoe driver. The aoe driver in linux-bcache/for-jens commit 4c36c973a8f45 is passing my tests. Here is a patch against that branch illustrating my suggestion for handling bad target responses gracefully. commit 2c39f50b1ee02e2ac07fd072a883a91713da53cc Author: Ed Cashin Date: Tue Aug 13 10:50:28 2013 -0400 aoe: bad AoE responses fail I/O without BUG Instead of having a BUG when the AoE target does something wrong, just fail the I/O and log the problem with rate limiting. diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c index cacd48e..b9916a6 100644 --- a/drivers/block/aoe/aoecmd.c +++ b/drivers/block/aoe/aoecmd.c @@ -1096,7 +1096,6 @@ bvcpy(struct sk_buff *skb, struct bio *bio, struct bvec_iter iter, long cnt) int soff = 0; struct bio_vec bv; - BUG_ON(cnt > iter.bi_size); iter.bi_size = cnt; __bio_for_each_segment(bv, bio, iter, iter) { @@ -1196,6 +1195,14 @@ noskb: if (buf) clear_bit(BIO_UPTODATE, >bio->bi_flags); break; } + if (n > f->iter.bi_size) { + pr_err_ratelimited("%s e%ld.%d. bytes=%ld need=%u\n", + "aoe: too-large data size in read from", + (long) d->aoemajor, d->aoeminor, + n, f->iter.bi_size); + clear_bit(BIO_UPTODATE, >bio->bi_flags); + break; + } bvcpy(skb, f->buf->bio, f->iter, n); case ATA_CMD_PIO_WRITE: case ATA_CMD_PIO_WRITE_EXT: -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 04/22] block: Abstract out bvec iterator
On Aug 9, 2013, Ed Cashin wrote: On Aug 8, 2013, at 9:05 PM, Kent Overstreet wrote: ... It's in the for-jens branch now. Just examining the patches, I like the way it cleans up the aoe code. I had a question about a new BUG added by the for-jens branch the read-response handling path of the aoe driver. The aoe driver in linux-bcache/for-jens commit 4c36c973a8f45 is passing my tests. Here is a patch against that branch illustrating my suggestion for handling bad target responses gracefully. commit 2c39f50b1ee02e2ac07fd072a883a91713da53cc Author: Ed Cashin ecas...@coraid.com Date: Tue Aug 13 10:50:28 2013 -0400 aoe: bad AoE responses fail I/O without BUG Instead of having a BUG when the AoE target does something wrong, just fail the I/O and log the problem with rate limiting. diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c index cacd48e..b9916a6 100644 --- a/drivers/block/aoe/aoecmd.c +++ b/drivers/block/aoe/aoecmd.c @@ -1096,7 +1096,6 @@ bvcpy(struct sk_buff *skb, struct bio *bio, struct bvec_iter iter, long cnt) int soff = 0; struct bio_vec bv; - BUG_ON(cnt iter.bi_size); iter.bi_size = cnt; __bio_for_each_segment(bv, bio, iter, iter) { @@ -1196,6 +1195,14 @@ noskb: if (buf) clear_bit(BIO_UPTODATE, buf-bio-bi_flags); break; } + if (n f-iter.bi_size) { + pr_err_ratelimited(%s e%ld.%d. bytes=%ld need=%u\n, + aoe: too-large data size in read from, + (long) d-aoemajor, d-aoeminor, + n, f-iter.bi_size); + clear_bit(BIO_UPTODATE, buf-bio-bi_flags); + break; + } bvcpy(skb, f-buf-bio, f-iter, n); case ATA_CMD_PIO_WRITE: case ATA_CMD_PIO_WRITE_EXT: -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 04/22] block: Abstract out bvec iterator
On Aug 8, 2013, at 9:05 PM, Kent Overstreet wrote: ... > It's in the for-jens branch now. Just examining the patches, I like the way it cleans up the aoe code. I had a question about a new BUG added by the for-jens branch the read-response handling path of the aoe driver. It looks like if a misbehaving AoE target has a bad (too high compared to the request) sector count, but sends a packet large enough for that sector count to seem legit in ktiocomplete, then the patched bvcpy will BUG. An example would be the case where 1024 bytes was requested but a (bad but possible) AoE read response comes back with 4096 bytes in a jumbo frame. Here's an excerpt from ktiocomplete: n = ahout->scnt << 9; switch (ahout->cmdstat) { case ATA_CMD_PIO_READ: case ATA_CMD_PIO_READ_EXT: if (skb->len < n) { pr_err("%s e%ld.%d. skb->len=%d need=%ld\n", "aoe: runt data size in read from", (long) d->aoemajor, d->aoeminor, skb->len, n); clear_bit(BIO_UPTODATE, >bio->bi_flags); break; } bvcpy(skb, f->buf->bio, f->iter, n); ... and earlier in linux-bcache/for-jens aoecmd.c there's bvcpy ... static void bvcpy(struct sk_buff *skb, struct bio *bio, struct bvec_iter iter, long cnt) { int soff = 0; struct bio_vec bv; BUG_ON(cnt > iter.bi_size); It seems like it would be better to treat that case as another indication of a problem with the target that gets logged when the AoE response is ignored, just as happens for "runt data size". That way people working on or trying out experimental AoE targets don't panic the initiator system. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 04/22] block: Abstract out bvec iterator
On Aug 8, 2013, at 9:05 PM, Kent Overstreet wrote: ... It's in the for-jens branch now. Just examining the patches, I like the way it cleans up the aoe code. I had a question about a new BUG added by the for-jens branch the read-response handling path of the aoe driver. It looks like if a misbehaving AoE target has a bad (too high compared to the request) sector count, but sends a packet large enough for that sector count to seem legit in ktiocomplete, then the patched bvcpy will BUG. An example would be the case where 1024 bytes was requested but a (bad but possible) AoE read response comes back with 4096 bytes in a jumbo frame. Here's an excerpt from ktiocomplete: n = ahout-scnt 9; switch (ahout-cmdstat) { case ATA_CMD_PIO_READ: case ATA_CMD_PIO_READ_EXT: if (skb-len n) { pr_err(%s e%ld.%d. skb-len=%d need=%ld\n, aoe: runt data size in read from, (long) d-aoemajor, d-aoeminor, skb-len, n); clear_bit(BIO_UPTODATE, buf-bio-bi_flags); break; } bvcpy(skb, f-buf-bio, f-iter, n); ... and earlier in linux-bcache/for-jens aoecmd.c there's bvcpy ... static void bvcpy(struct sk_buff *skb, struct bio *bio, struct bvec_iter iter, long cnt) { int soff = 0; struct bio_vec bv; BUG_ON(cnt iter.bi_size); It seems like it would be better to treat that case as another indication of a problem with the target that gets logged when the AoE response is ignored, just as happens for runt data size. That way people working on or trying out experimental AoE targets don't panic the initiator system. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 04/22] block: Abstract out bvec iterator
On Aug 8, 2013, at 8:09 PM, Kent Overstreet wrote: > On Wed, Aug 07, 2013 at 10:04:36PM -0400, Ed Cashin wrote: >> On Aug 7, 2013, at 5:54 PM, Kent Overstreet wrote: >> >>> Immutable biovecs are going to require an explicit iterator. To >>> implement immutable bvecs, a later patch is going to add a bi_bvec_done >>> member to this struct; for now, this patch effectively just renames >>> things. >> >> Hi, Kent Overstreet. Thanks for Cc'ing me and for the promising work. >> >> Were you able to do sanity tests with aoe this time around? Last time, >> basic I/O was not working with the immutable biovec patches applied. >> >> Here is my 28 June email about my experiences with >> git://evilpiepirate.org/~kent/linux-bcache.git at that time. It also >> includes information about creating an easy software-only aoe test >> environment. >> >> http://thread.gmane.org/gmane.linux.kernel/1505222/focus=1517924 > > Hey, thanks for testing it - sorry, I think I remember seeing that email > last time and got sidetracked before I got around to setting up some > tests. No worries. > I think I've got it working now, it's running the same stress tests I > use for bcache. Here's a fixed patch, I broke the aoe changes out into > their own patch since they were more involved than most of the others: I had added your git tree as a remote and checked out the for-jens branch. This patch conflicts with some of the changes in that patch, so I could use a pointer as to which base this new patch applies to. If it's already in some branch in linux-bcache, I can just use it there, and that's even more convenient. > (I am seeing a bug where it's getting stuck after running stress tests > for awhile, but I can reproduce that without the aoe changes too...) Just so I know what to look for, what does that behavior look like? I/O stops going and procs get stuck in disk sleep, maybe? Thanks. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 8, 2013, at 3:56 PM, Andrew Morton wrote: > On Wed, 7 Aug 2013 20:50:09 -0400 Ed Cashin wrote: ... >> When the workaround was created, it was with the assumption that the >> zero-count pages are not always tail pages, and that seemed to be the case >> in 2007, but as I said, I don't have a mechanism for detecting that now, so >> I cannot say whether it really happens with today's kernel. > > It sounds we should pull out all that code and retest. It shouldn't be > needed - if this results in some failure then I suspect core MM will > need changes. OK. I'll look into that. It sure would be nice to get rid of it. > Why don't you have a "mechanism for detecting that"? It's a matter of > pointing AOE at some hugetlb pages? No, I was testing with hugetlb pages already. But there are many use case combinations, and I meant that aoe has no mechanism that has been in the aoe code all these years to catch specific (possibly rare) use cases that result in zero-count pages. It would be something like this: +static void +check_page_counts(struct bio *bio) +{ + struct bio_vec *bv; + int n; + int i; + + bio_for_each_segment(bv, bio, i) { + n = page_count(bv->bv_page); + WARN_ONCE(n <= 0, + "aoe: page %d in bio has non-positive count %d\n", + i, n); + } +} I'll add that when I try testing without the page count manipulation. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 8, 2013, at 3:56 PM, Andrew Morton wrote: On Wed, 7 Aug 2013 20:50:09 -0400 Ed Cashin ecas...@coraid.com wrote: ... When the workaround was created, it was with the assumption that the zero-count pages are not always tail pages, and that seemed to be the case in 2007, but as I said, I don't have a mechanism for detecting that now, so I cannot say whether it really happens with today's kernel. It sounds we should pull out all that code and retest. It shouldn't be needed - if this results in some failure then I suspect core MM will need changes. OK. I'll look into that. It sure would be nice to get rid of it. Why don't you have a mechanism for detecting that? It's a matter of pointing AOE at some hugetlb pages? No, I was testing with hugetlb pages already. But there are many use case combinations, and I meant that aoe has no mechanism that has been in the aoe code all these years to catch specific (possibly rare) use cases that result in zero-count pages. It would be something like this: +static void +check_page_counts(struct bio *bio) +{ + struct bio_vec *bv; + int n; + int i; + + bio_for_each_segment(bv, bio, i) { + n = page_count(bv-bv_page); + WARN_ONCE(n = 0, + aoe: page %d in bio has non-positive count %d\n, + i, n); + } +} I'll add that when I try testing without the page count manipulation. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 04/22] block: Abstract out bvec iterator
On Aug 8, 2013, at 8:09 PM, Kent Overstreet wrote: On Wed, Aug 07, 2013 at 10:04:36PM -0400, Ed Cashin wrote: On Aug 7, 2013, at 5:54 PM, Kent Overstreet wrote: Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Hi, Kent Overstreet. Thanks for Cc'ing me and for the promising work. Were you able to do sanity tests with aoe this time around? Last time, basic I/O was not working with the immutable biovec patches applied. Here is my 28 June email about my experiences with git://evilpiepirate.org/~kent/linux-bcache.git at that time. It also includes information about creating an easy software-only aoe test environment. http://thread.gmane.org/gmane.linux.kernel/1505222/focus=1517924 Hey, thanks for testing it - sorry, I think I remember seeing that email last time and got sidetracked before I got around to setting up some tests. No worries. I think I've got it working now, it's running the same stress tests I use for bcache. Here's a fixed patch, I broke the aoe changes out into their own patch since they were more involved than most of the others: I had added your git tree as a remote and checked out the for-jens branch. This patch conflicts with some of the changes in that patch, so I could use a pointer as to which base this new patch applies to. If it's already in some branch in linux-bcache, I can just use it there, and that's even more convenient. (I am seeing a bug where it's getting stuck after running stress tests for awhile, but I can reproduce that without the aoe changes too...) Just so I know what to look for, what does that behavior look like? I/O stops going and procs get stuck in disk sleep, maybe? Thanks. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 7, 2013, at 8:05 PM, Andrew Morton wrote: > On Wed, 7 Aug 2013 19:54:45 -0400 Ed Cashin wrote: > >>> +bio_pageinc(struct bio *bio) >>> +{ >>> + struct bio_vec *bv; >>> + struct page *page; >>> + int i; >>> + >>> + bio_for_each_segment(bv, bio, i) { >>> + page = bv->bv_page; >>> + /* Non-zero page count for non-head members of >>> +* compound pages is no longer allowed by the kernel, >>> +* but this has never been seen here. >>> +*/ >>> + if (unlikely(PageCompound(page))) >>> + if (compound_trans_head(page) != page) { >>> + pr_crit("page tail used for block I/O\n"); >>> + BUG(); >>> + } >>> >>> But get_page/put_page against a tail page of a compound page should >>> Just Work. The core MM will hunt down the head page and manipulate its >>> refcount. >>> >>> Perhaps the problem here is that AOE is diving under the covers and is >>> using low-level page refcount alterations: >>> >>> + atomic_inc(>_count); >>> >>> Why does AOE do this? It would be better if it were to use the >>> official published MM interfaces. If those interfaces need >>> alteration/augmentation then we can do that. >> >> The get_page cannot be used when the _count is 0 (which is exactly when we >> need to increment it), because it will VM_BUG_ON. >> >> /* >> * Getting a normal page or the head of a compound page >> * requires to already have an elevated page->_count. >> */ >> VM_BUG_ON(atomic_read(>_count) <= 0); >> > > But we shouldn't get that far: > > static inline void get_page(struct page *page) > { > if (unlikely(PageTail(page))) > if (likely(__get_page_tail(page))) > return; > /* >* Getting a normal page or the head of a compound page >* requires to already have an elevated page->_count. >*/ > VM_BUG_ON(atomic_read(>_count) <= 0); > atomic_inc(>_count); > } > > This is a tail page, so we should be using __get_page_tail(). When the workaround was created, it was with the assumption that the zero-count pages are not always tail pages, and that seemed to be the case in 2007, but as I said, I don't have a mechanism for detecting that now, so I cannot say whether it really happens with today's kernel. If it is never correct for normal pages or compound page heads to have a zero count while they are associated with a bio, then yes, I think get_page is a great solution. The VM_BUG_ON in get_page would identify any parts of the kernel that are supplying bios that have pages without references. Just a note in response to "we shouldn't get that far": I believe the VM_BUG_ON line in get_page does get executed for tail pages when the __get_page_tail returns false because the compound page head has a zero count: get_page -> __get_page_tail -> get_page_unless_zero returns false, so __get_page_tail returns "got", which is still false, so get_page executes the VM_BUG_ON, where the _count will be zero for the tail page. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 7, 2013, at 7:35 PM, Andrew Morton wrote: > On Wed, 7 Aug 2013 19:27:40 -0400 Ed Cashin wrote: > >> On Aug 7, 2013, at 5:26 PM, Andrew Morton wrote: >> >>> On Wed, 7 Aug 2013 17:21:32 -0400 Ed Cashin wrote: >> ... >>>> Sorry. I tried to cover that but maybe should have separated it more >>>> clearly from the running text. >>>> >>>> The patch changes the current behavior encountered with direct I/O and aoe: >>>> >>>> aoe can BUG when direct I/O used >>>> >>>> ... to a new behavior: >>>> >>>> aoe does not BUG when direct I/O is used >>>> >>>> And the user will never know that instead aoe is manipulating the _count >>>> of the head page, even though that's what's happening with this patch >>>> applied. >>>> >>>> (The user-visible effects that motivated the older patch that added the >>>> BUG to aoe when page tails are used for block I/O was a VM_BUG_ON that >>>> occurred when the aoe driver manipulated the _count of a compound page >>>> tail.) >>> >>> OK, I added "Fix a BUG which can trigger when direct-IO is used with >>> AOE". Which kernel versions are affected? >> >> >> That BUG was added to the aoe driver in a commit that first appears in v3.7. >> It persists until this patch adds the code that handles the situation that >> the BUG was intended to detect. >> >> [ecashin@marino linux]$ git describe --contains --match 'v*' >> 69cf2d85de773d998798e47e3335b85e5645d157 >> v3.7-rc1~110^2~22 >> [ecashin@marino linux]$ >> > > ah, is it this? > > +bio_pageinc(struct bio *bio) > +{ > + struct bio_vec *bv; > + struct page *page; > + int i; > + > + bio_for_each_segment(bv, bio, i) { > + page = bv->bv_page; > + /* Non-zero page count for non-head members of > +* compound pages is no longer allowed by the kernel, > +* but this has never been seen here. > +*/ > + if (unlikely(PageCompound(page))) > + if (compound_trans_head(page) != page) { > + pr_crit("page tail used for block I/O\n"); > + BUG(); > + } > > But get_page/put_page against a tail page of a compound page should > Just Work. The core MM will hunt down the head page and manipulate its > refcount. > > Perhaps the problem here is that AOE is diving under the covers and is > using low-level page refcount alterations: > > + atomic_inc(>_count); > > Why does AOE do this? It would be better if it were to use the > official published MM interfaces. If those interfaces need > alteration/augmentation then we can do that. The get_page cannot be used when the _count is 0 (which is exactly when we need to increment it), because it will VM_BUG_ON. /* * Getting a normal page or the head of a compound page * requires to already have an elevated page->_count. */ VM_BUG_ON(atomic_read(>_count) <= 0); (And seeing the code reminds me that I forgot to include atomic_read in my last email's proposal. LKML + dinner isn't a good mix. ;) -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 7, 2013, at 7:41 PM, Ed Cashin wrote: > It sounds like it's wrong to give block pages with a zero count, so why not > just have aoe BUG_ON(compound_trans_head(bv->page->_count) == 0) until we're > sure nobody does that anymore? Ugh. I goofed the parens and such. I meant, BUG_ON(compound_trans_head(bv->bv_page)->_count == 0) ... will catch the cases we think should not be occurring. > If that idea makes sense to you, I will submit a new patch to follow the one > under discussion. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 7, 2013, at 5:27 PM, Andrew Morton wrote: > On Wed, 7 Aug 2013 14:18:35 -0700 Andrew Morton > wrote: > >> On Wed, 7 Aug 2013 17:12:36 -0400 Ed Cashin wrote: >> >>> >>> On Aug 7, 2013, at 4:58 PM, Andrew Morton wrote: >>> >>>> On Thu, 1 Aug 2013 21:29:59 -0400 Ed Cashin wrote: >>>> >>>>> As discussed previously, >>>> >>>> I think I missed that. >>>> >>>>> the fact that some users of the block >>>>> layer provide bios that point to pages with a zero _count means >>>>> that it is not OK for the network layer to do a put_page on the >>>>> skb frags during an skb_linearize, so the aoe driver gets a >>>>> reference to pages in bios and puts the reference before ending >>>>> the bio. And because it cannot use get_page on a page with a >>>>> zero _count, it manipulates the value directly. >>>> >>>> Eh? What code is putting count==0 pages into bios? That sounds very >>>> weird and broken. >>> >>> I thought so in 2007 but couldn't solicit a clear "this is wrong" consensus >>> from the discussion. >>> >>> http://article.gmane.org/gmane.linux.kernel/499197 >>> https://lkml.org/lkml/2007/1/19/56 >>> https://lkml.org/lkml/2006/12/18/230 >>> >>> We were seeing zero-count pages in bios from XFS, but Christoph Hellwig >>> pointed out that kmalloced pages can also come from ext3 when it's doing >>> log recovery, and they'll have zero page counts. >> >> aiiee! >> >> It is (I suppose) reasonable to put kmalloced memory into a BIO's page >> array. And it is perfectly reasonable for a user of that bio to do a >> get_page/put_page against that page. It is utterly unreasonable for >> the damn page to get freed as a result! >> >> I'd claim that slab is broken. The page is in use, so it should have an >> elevated refcount, full stop. >> > > err, no. slab.c uses alloc_pages(), so the underlying page indeed has > a proper refcount. I'm still not understanding how this situation comes > about. It sounds like it's wrong to give block pages with a zero count, so why not just have aoe BUG_ON(compound_trans_head(bv->page->_count) == 0) until we're sure nobody does that anymore? If that idea makes sense to you, I will submit a new patch to follow the one under discussion. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 7, 2013, at 5:26 PM, Andrew Morton wrote: > On Wed, 7 Aug 2013 17:21:32 -0400 Ed Cashin wrote: ... >> Sorry. I tried to cover that but maybe should have separated it more >> clearly from the running text. >> >> The patch changes the current behavior encountered with direct I/O and aoe: >> >> aoe can BUG when direct I/O used >> >> ... to a new behavior: >> >> aoe does not BUG when direct I/O is used >> >> And the user will never know that instead aoe is manipulating the _count of >> the head page, even though that's what's happening with this patch applied. >> >> (The user-visible effects that motivated the older patch that added the BUG >> to aoe when page tails are used for block I/O was a VM_BUG_ON that occurred >> when the aoe driver manipulated the _count of a compound page tail.) > > OK, I added "Fix a BUG which can trigger when direct-IO is used with > AOE". Which kernel versions are affected? That BUG was added to the aoe driver in a commit that first appears in v3.7. It persists until this patch adds the code that handles the situation that the BUG was intended to detect. [ecashin@marino linux]$ git describe --contains --match 'v*' 69cf2d85de773d998798e47e3335b85e5645d157 v3.7-rc1~110^2~22 [ecashin@marino linux]$ -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 7, 2013, at 5:00 PM, Andrew Morton wrote: > On Thu, 1 Aug 2013 21:29:59 -0400 Ed Cashin wrote: > >> As discussed previously, the fact that some users of the block >> layer provide bios that point to pages with a zero _count means >> that it is not OK for the network layer to do a put_page on the >> skb frags during an skb_linearize, so the aoe driver gets a >> reference to pages in bios and puts the reference before ending >> the bio. And because it cannot use get_page on a page with a >> zero _count, it manipulates the value directly. >> >> It is not OK to increment the _count of a compound page tail, >> though, since the VM layer will VM_BUG_ON a non-zero _count. >> Block users that do direct I/O can result in the aoe driver >> seeing compound page tails in bios. In that case, the same logic >> works as long as the head of the compound page is used instead of >> the tails. This patch handles compound pages and does not BUG. >> >> It relies on the block layer user leaving the relationship >> between the page tail and its head alone for the duration between >> the submission of the bio and its completion, whether successful >> or not. > > What are the end-user-visible effects of the problem which is being > fixed? Please always include this info when fixing bugs so that others > can work out what kernel version(s) need the patch. Sorry. I tried to cover that but maybe should have separated it more clearly from the running text. The patch changes the current behavior encountered with direct I/O and aoe: aoe can BUG when direct I/O used ... to a new behavior: aoe does not BUG when direct I/O is used And the user will never know that instead aoe is manipulating the _count of the head page, even though that's what's happening with this patch applied. (The user-visible effects that motivated the older patch that added the BUG to aoe when page tails are used for block I/O was a VM_BUG_ON that occurred when the aoe driver manipulated the _count of a compound page tail.) -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 7, 2013, at 4:58 PM, Andrew Morton wrote: > On Thu, 1 Aug 2013 21:29:59 -0400 Ed Cashin wrote: > >> As discussed previously, > > I think I missed that. > >> the fact that some users of the block >> layer provide bios that point to pages with a zero _count means >> that it is not OK for the network layer to do a put_page on the >> skb frags during an skb_linearize, so the aoe driver gets a >> reference to pages in bios and puts the reference before ending >> the bio. And because it cannot use get_page on a page with a >> zero _count, it manipulates the value directly. > > Eh? What code is putting count==0 pages into bios? That sounds very > weird and broken. I thought so in 2007 but couldn't solicit a clear "this is wrong" consensus from the discussion. http://article.gmane.org/gmane.linux.kernel/499197 https://lkml.org/lkml/2007/1/19/56 https://lkml.org/lkml/2006/12/18/230 We were seeing zero-count pages in bios from XFS, but Christoph Hellwig pointed out that kmalloced pages can also come from ext3 when it's doing log recovery, and they'll have zero page counts. I don't have a counter in the driver to track how often zero-count pages show up in bios, so I don't know how common it is these days. I would have been motivated to look if I had gotten the impression that there was any interest in identifying and eliminating cases where zero-count pages were being used in bios. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 7, 2013, at 4:58 PM, Andrew Morton wrote: On Thu, 1 Aug 2013 21:29:59 -0400 Ed Cashin ecas...@coraid.com wrote: As discussed previously, I think I missed that. the fact that some users of the block layer provide bios that point to pages with a zero _count means that it is not OK for the network layer to do a put_page on the skb frags during an skb_linearize, so the aoe driver gets a reference to pages in bios and puts the reference before ending the bio. And because it cannot use get_page on a page with a zero _count, it manipulates the value directly. Eh? What code is putting count==0 pages into bios? That sounds very weird and broken. I thought so in 2007 but couldn't solicit a clear this is wrong consensus from the discussion. http://article.gmane.org/gmane.linux.kernel/499197 https://lkml.org/lkml/2007/1/19/56 https://lkml.org/lkml/2006/12/18/230 We were seeing zero-count pages in bios from XFS, but Christoph Hellwig pointed out that kmalloced pages can also come from ext3 when it's doing log recovery, and they'll have zero page counts. I don't have a counter in the driver to track how often zero-count pages show up in bios, so I don't know how common it is these days. I would have been motivated to look if I had gotten the impression that there was any interest in identifying and eliminating cases where zero-count pages were being used in bios. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 7, 2013, at 5:00 PM, Andrew Morton wrote: On Thu, 1 Aug 2013 21:29:59 -0400 Ed Cashin ecas...@coraid.com wrote: As discussed previously, the fact that some users of the block layer provide bios that point to pages with a zero _count means that it is not OK for the network layer to do a put_page on the skb frags during an skb_linearize, so the aoe driver gets a reference to pages in bios and puts the reference before ending the bio. And because it cannot use get_page on a page with a zero _count, it manipulates the value directly. It is not OK to increment the _count of a compound page tail, though, since the VM layer will VM_BUG_ON a non-zero _count. Block users that do direct I/O can result in the aoe driver seeing compound page tails in bios. In that case, the same logic works as long as the head of the compound page is used instead of the tails. This patch handles compound pages and does not BUG. It relies on the block layer user leaving the relationship between the page tail and its head alone for the duration between the submission of the bio and its completion, whether successful or not. What are the end-user-visible effects of the problem which is being fixed? Please always include this info when fixing bugs so that others can work out what kernel version(s) need the patch. Sorry. I tried to cover that but maybe should have separated it more clearly from the running text. The patch changes the current behavior encountered with direct I/O and aoe: aoe can BUG when direct I/O used ... to a new behavior: aoe does not BUG when direct I/O is used And the user will never know that instead aoe is manipulating the _count of the head page, even though that's what's happening with this patch applied. (The user-visible effects that motivated the older patch that added the BUG to aoe when page tails are used for block I/O was a VM_BUG_ON that occurred when the aoe driver manipulated the _count of a compound page tail.) -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 7, 2013, at 5:26 PM, Andrew Morton wrote: On Wed, 7 Aug 2013 17:21:32 -0400 Ed Cashin ecas...@coraid.com wrote: ... Sorry. I tried to cover that but maybe should have separated it more clearly from the running text. The patch changes the current behavior encountered with direct I/O and aoe: aoe can BUG when direct I/O used ... to a new behavior: aoe does not BUG when direct I/O is used And the user will never know that instead aoe is manipulating the _count of the head page, even though that's what's happening with this patch applied. (The user-visible effects that motivated the older patch that added the BUG to aoe when page tails are used for block I/O was a VM_BUG_ON that occurred when the aoe driver manipulated the _count of a compound page tail.) OK, I added Fix a BUG which can trigger when direct-IO is used with AOE. Which kernel versions are affected? That BUG was added to the aoe driver in a commit that first appears in v3.7. It persists until this patch adds the code that handles the situation that the BUG was intended to detect. [ecashin@marino linux]$ git describe --contains --match 'v*' 69cf2d85de773d998798e47e3335b85e5645d157 v3.7-rc1~110^2~22 [ecashin@marino linux]$ -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 7, 2013, at 5:27 PM, Andrew Morton wrote: On Wed, 7 Aug 2013 14:18:35 -0700 Andrew Morton a...@linux-foundation.org wrote: On Wed, 7 Aug 2013 17:12:36 -0400 Ed Cashin ecas...@coraid.com wrote: On Aug 7, 2013, at 4:58 PM, Andrew Morton wrote: On Thu, 1 Aug 2013 21:29:59 -0400 Ed Cashin ecas...@coraid.com wrote: As discussed previously, I think I missed that. the fact that some users of the block layer provide bios that point to pages with a zero _count means that it is not OK for the network layer to do a put_page on the skb frags during an skb_linearize, so the aoe driver gets a reference to pages in bios and puts the reference before ending the bio. And because it cannot use get_page on a page with a zero _count, it manipulates the value directly. Eh? What code is putting count==0 pages into bios? That sounds very weird and broken. I thought so in 2007 but couldn't solicit a clear this is wrong consensus from the discussion. http://article.gmane.org/gmane.linux.kernel/499197 https://lkml.org/lkml/2007/1/19/56 https://lkml.org/lkml/2006/12/18/230 We were seeing zero-count pages in bios from XFS, but Christoph Hellwig pointed out that kmalloced pages can also come from ext3 when it's doing log recovery, and they'll have zero page counts. aiiee! It is (I suppose) reasonable to put kmalloced memory into a BIO's page array. And it is perfectly reasonable for a user of that bio to do a get_page/put_page against that page. It is utterly unreasonable for the damn page to get freed as a result! I'd claim that slab is broken. The page is in use, so it should have an elevated refcount, full stop. err, no. slab.c uses alloc_pages(), so the underlying page indeed has a proper refcount. I'm still not understanding how this situation comes about. It sounds like it's wrong to give block pages with a zero count, so why not just have aoe BUG_ON(compound_trans_head(bv-page-_count) == 0) until we're sure nobody does that anymore? If that idea makes sense to you, I will submit a new patch to follow the one under discussion. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 7, 2013, at 7:41 PM, Ed Cashin wrote: It sounds like it's wrong to give block pages with a zero count, so why not just have aoe BUG_ON(compound_trans_head(bv-page-_count) == 0) until we're sure nobody does that anymore? Ugh. I goofed the parens and such. I meant, BUG_ON(compound_trans_head(bv-bv_page)-_count == 0) ... will catch the cases we think should not be occurring. If that idea makes sense to you, I will submit a new patch to follow the one under discussion. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 7, 2013, at 7:35 PM, Andrew Morton wrote: On Wed, 7 Aug 2013 19:27:40 -0400 Ed Cashin ecas...@coraid.com wrote: On Aug 7, 2013, at 5:26 PM, Andrew Morton wrote: On Wed, 7 Aug 2013 17:21:32 -0400 Ed Cashin ecas...@coraid.com wrote: ... Sorry. I tried to cover that but maybe should have separated it more clearly from the running text. The patch changes the current behavior encountered with direct I/O and aoe: aoe can BUG when direct I/O used ... to a new behavior: aoe does not BUG when direct I/O is used And the user will never know that instead aoe is manipulating the _count of the head page, even though that's what's happening with this patch applied. (The user-visible effects that motivated the older patch that added the BUG to aoe when page tails are used for block I/O was a VM_BUG_ON that occurred when the aoe driver manipulated the _count of a compound page tail.) OK, I added Fix a BUG which can trigger when direct-IO is used with AOE. Which kernel versions are affected? That BUG was added to the aoe driver in a commit that first appears in v3.7. It persists until this patch adds the code that handles the situation that the BUG was intended to detect. [ecashin@marino linux]$ git describe --contains --match 'v*' 69cf2d85de773d998798e47e3335b85e5645d157 v3.7-rc1~110^2~22 [ecashin@marino linux]$ ah, is it this? +bio_pageinc(struct bio *bio) +{ + struct bio_vec *bv; + struct page *page; + int i; + + bio_for_each_segment(bv, bio, i) { + page = bv-bv_page; + /* Non-zero page count for non-head members of +* compound pages is no longer allowed by the kernel, +* but this has never been seen here. +*/ + if (unlikely(PageCompound(page))) + if (compound_trans_head(page) != page) { + pr_crit(page tail used for block I/O\n); + BUG(); + } But get_page/put_page against a tail page of a compound page should Just Work. The core MM will hunt down the head page and manipulate its refcount. Perhaps the problem here is that AOE is diving under the covers and is using low-level page refcount alterations: + atomic_inc(page-_count); Why does AOE do this? It would be better if it were to use the official published MM interfaces. If those interfaces need alteration/augmentation then we can do that. The get_page cannot be used when the _count is 0 (which is exactly when we need to increment it), because it will VM_BUG_ON. /* * Getting a normal page or the head of a compound page * requires to already have an elevated page-_count. */ VM_BUG_ON(atomic_read(page-_count) = 0); (And seeing the code reminds me that I forgot to include atomic_read in my last email's proposal. LKML + dinner isn't a good mix. ;) -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aoe: adjust ref of head for compound page tails
On Aug 7, 2013, at 8:05 PM, Andrew Morton wrote: On Wed, 7 Aug 2013 19:54:45 -0400 Ed Cashin ecas...@coraid.com wrote: +bio_pageinc(struct bio *bio) +{ + struct bio_vec *bv; + struct page *page; + int i; + + bio_for_each_segment(bv, bio, i) { + page = bv-bv_page; + /* Non-zero page count for non-head members of +* compound pages is no longer allowed by the kernel, +* but this has never been seen here. +*/ + if (unlikely(PageCompound(page))) + if (compound_trans_head(page) != page) { + pr_crit(page tail used for block I/O\n); + BUG(); + } But get_page/put_page against a tail page of a compound page should Just Work. The core MM will hunt down the head page and manipulate its refcount. Perhaps the problem here is that AOE is diving under the covers and is using low-level page refcount alterations: + atomic_inc(page-_count); Why does AOE do this? It would be better if it were to use the official published MM interfaces. If those interfaces need alteration/augmentation then we can do that. The get_page cannot be used when the _count is 0 (which is exactly when we need to increment it), because it will VM_BUG_ON. /* * Getting a normal page or the head of a compound page * requires to already have an elevated page-_count. */ VM_BUG_ON(atomic_read(page-_count) = 0); But we shouldn't get that far: static inline void get_page(struct page *page) { if (unlikely(PageTail(page))) if (likely(__get_page_tail(page))) return; /* * Getting a normal page or the head of a compound page * requires to already have an elevated page-_count. */ VM_BUG_ON(atomic_read(page-_count) = 0); atomic_inc(page-_count); } This is a tail page, so we should be using __get_page_tail(). When the workaround was created, it was with the assumption that the zero-count pages are not always tail pages, and that seemed to be the case in 2007, but as I said, I don't have a mechanism for detecting that now, so I cannot say whether it really happens with today's kernel. If it is never correct for normal pages or compound page heads to have a zero count while they are associated with a bio, then yes, I think get_page is a great solution. The VM_BUG_ON in get_page would identify any parts of the kernel that are supplying bios that have pages without references. Just a note in response to we shouldn't get that far: I believe the VM_BUG_ON line in get_page does get executed for tail pages when the __get_page_tail returns false because the compound page head has a zero count: get_page - __get_page_tail - get_page_unless_zero returns false, so __get_page_tail returns got, which is still false, so get_page executes the VM_BUG_ON, where the _count will be zero for the tail page. -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] aoe: adjust ref of head for compound page tails
As discussed previously, the fact that some users of the block layer provide bios that point to pages with a zero _count means that it is not OK for the network layer to do a put_page on the skb frags during an skb_linearize, so the aoe driver gets a reference to pages in bios and puts the reference before ending the bio. And because it cannot use get_page on a page with a zero _count, it manipulates the value directly. It is not OK to increment the _count of a compound page tail, though, since the VM layer will VM_BUG_ON a non-zero _count. Block users that do direct I/O can result in the aoe driver seeing compound page tails in bios. In that case, the same logic works as long as the head of the compound page is used instead of the tails. This patch handles compound pages and does not BUG. It relies on the block layer user leaving the relationship between the page tail and its head alone for the duration between the submission of the bio and its completion, whether successful or not. Signed-off-by: Ed Cashin --- drivers/block/aoe/aoecmd.c | 17 +++-- 1 files changed, 7 insertions(+), 10 deletions(-) diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c index 99cb944..4d45dba 100644 --- a/drivers/block/aoe/aoecmd.c +++ b/drivers/block/aoe/aoecmd.c @@ -906,16 +906,10 @@ bio_pageinc(struct bio *bio) int i; bio_for_each_segment(bv, bio, i) { - page = bv->bv_page; /* Non-zero page count for non-head members of -* compound pages is no longer allowed by the kernel, -* but this has never been seen here. +* compound pages is no longer allowed by the kernel. */ - if (unlikely(PageCompound(page))) - if (compound_trans_head(page) != page) { - pr_crit("page tail used for block I/O\n"); - BUG(); - } + page = compound_trans_head(bv->bv_page); atomic_inc(>_count); } } @@ -924,10 +918,13 @@ static void bio_pagedec(struct bio *bio) { struct bio_vec *bv; + struct page *page; int i; - bio_for_each_segment(bv, bio, i) - atomic_dec(>bv_page->_count); + bio_for_each_segment(bv, bio, i) { + page = compound_trans_head(bv->bv_page); + atomic_dec(>_count); + } } static void -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] aoe: respect compound page reference tracking expectations
This patch series applies to today's linux-next/akpm, commit 651e600ff3ecc4ac06747313a1c49d89a9ce988e. Ed L. Cashin (1): aoe: adjust ref of head for compound page tails drivers/block/aoe/aoecmd.c | 17 +++-- 1 files changed, 7 insertions(+), 10 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] aoe: use min() to simplify the code
Thanks. I agree that we could do that. I'm not sure it increases readability to put this much stuff on one line, though. What is the motivation? On Aug 1, 2013, at 8:28 AM, Andy Shevchenko wrote: > min() incorporates condition in it. In our case we could do assignment and > make a choice at once. > > Signed-off-by: Andy Shevchenko > --- > drivers/block/aoe/aoedev.c | 6 ++ > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c > index db35ef6..92fadfa 100644 > --- a/drivers/block/aoe/aoedev.c > +++ b/drivers/block/aoe/aoedev.c > @@ -13,6 +13,7 @@ > #include > #include > #include > +#include > #include "aoe.h" > > static void dummy_timer(ulong); > @@ -248,10 +249,7 @@ user_req(char *s, size_t slen, struct aoedev *d) > if (!d->gd) > return 0; > p = kbasename(d->gd->disk_name); > - lim = sizeof(d->gd->disk_name); > - lim -= p - d->gd->disk_name; > - if (slen < lim) > - lim = slen; > + lim = min(sizeof(d->gd->disk_name) - (p - d->gd->disk_name), slen); > > return !strncmp(s, p, lim); > } > -- > 1.8.3.2 > -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] aoe: remove custom implementation of kbasename()
Looks OK. Thanks. On Aug 1, 2013, at 8:28 AM, Andy Shevchenko wrote: > In the kernel we have a nice helper that may be used here. This patch > substitutes the custom implementation by the native function call. > > Signed-off-by: Andy Shevchenko > --- > drivers/block/aoe/aoedev.c | 9 +++-- > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c > index 784c92e..db35ef6 100644 > --- a/drivers/block/aoe/aoedev.c > +++ b/drivers/block/aoe/aoedev.c > @@ -12,6 +12,7 @@ > #include > #include > #include > +#include > #include "aoe.h" > > static void dummy_timer(ulong); > @@ -241,16 +242,12 @@ aoedev_downdev(struct aoedev *d) > static int > user_req(char *s, size_t slen, struct aoedev *d) > { > - char *p; > + const char *p; > size_t lim; > > if (!d->gd) > return 0; > - p = strrchr(d->gd->disk_name, '/'); > - if (!p) > - p = d->gd->disk_name; > - else > - p += 1; > + p = kbasename(d->gd->disk_name); > lim = sizeof(d->gd->disk_name); > lim -= p - d->gd->disk_name; > if (slen < lim) > -- > 1.8.3.2 > -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] aoe: remove custom implementation of kbasename()
Looks OK. Thanks. On Aug 1, 2013, at 8:28 AM, Andy Shevchenko wrote: In the kernel we have a nice helper that may be used here. This patch substitutes the custom implementation by the native function call. Signed-off-by: Andy Shevchenko andriy.shevche...@linux.intel.com --- drivers/block/aoe/aoedev.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c index 784c92e..db35ef6 100644 --- a/drivers/block/aoe/aoedev.c +++ b/drivers/block/aoe/aoedev.c @@ -12,6 +12,7 @@ #include linux/bitmap.h #include linux/kdev_t.h #include linux/moduleparam.h +#include linux/string.h #include aoe.h static void dummy_timer(ulong); @@ -241,16 +242,12 @@ aoedev_downdev(struct aoedev *d) static int user_req(char *s, size_t slen, struct aoedev *d) { - char *p; + const char *p; size_t lim; if (!d-gd) return 0; - p = strrchr(d-gd-disk_name, '/'); - if (!p) - p = d-gd-disk_name; - else - p += 1; + p = kbasename(d-gd-disk_name); lim = sizeof(d-gd-disk_name); lim -= p - d-gd-disk_name; if (slen lim) -- 1.8.3.2 -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] aoe: use min() to simplify the code
Thanks. I agree that we could do that. I'm not sure it increases readability to put this much stuff on one line, though. What is the motivation? On Aug 1, 2013, at 8:28 AM, Andy Shevchenko wrote: min() incorporates condition in it. In our case we could do assignment and make a choice at once. Signed-off-by: Andy Shevchenko andriy.shevche...@linux.intel.com --- drivers/block/aoe/aoedev.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c index db35ef6..92fadfa 100644 --- a/drivers/block/aoe/aoedev.c +++ b/drivers/block/aoe/aoedev.c @@ -13,6 +13,7 @@ #include linux/kdev_t.h #include linux/moduleparam.h #include linux/string.h +#include linux/kernel.h #include aoe.h static void dummy_timer(ulong); @@ -248,10 +249,7 @@ user_req(char *s, size_t slen, struct aoedev *d) if (!d-gd) return 0; p = kbasename(d-gd-disk_name); - lim = sizeof(d-gd-disk_name); - lim -= p - d-gd-disk_name; - if (slen lim) - lim = slen; + lim = min(sizeof(d-gd-disk_name) - (p - d-gd-disk_name), slen); return !strncmp(s, p, lim); } -- 1.8.3.2 -- Ed Cashin ecas...@coraid.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] aoe: adjust ref of head for compound page tails
As discussed previously, the fact that some users of the block layer provide bios that point to pages with a zero _count means that it is not OK for the network layer to do a put_page on the skb frags during an skb_linearize, so the aoe driver gets a reference to pages in bios and puts the reference before ending the bio. And because it cannot use get_page on a page with a zero _count, it manipulates the value directly. It is not OK to increment the _count of a compound page tail, though, since the VM layer will VM_BUG_ON a non-zero _count. Block users that do direct I/O can result in the aoe driver seeing compound page tails in bios. In that case, the same logic works as long as the head of the compound page is used instead of the tails. This patch handles compound pages and does not BUG. It relies on the block layer user leaving the relationship between the page tail and its head alone for the duration between the submission of the bio and its completion, whether successful or not. Signed-off-by: Ed Cashin ecas...@coraid.com --- drivers/block/aoe/aoecmd.c | 17 +++-- 1 files changed, 7 insertions(+), 10 deletions(-) diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c index 99cb944..4d45dba 100644 --- a/drivers/block/aoe/aoecmd.c +++ b/drivers/block/aoe/aoecmd.c @@ -906,16 +906,10 @@ bio_pageinc(struct bio *bio) int i; bio_for_each_segment(bv, bio, i) { - page = bv-bv_page; /* Non-zero page count for non-head members of -* compound pages is no longer allowed by the kernel, -* but this has never been seen here. +* compound pages is no longer allowed by the kernel. */ - if (unlikely(PageCompound(page))) - if (compound_trans_head(page) != page) { - pr_crit(page tail used for block I/O\n); - BUG(); - } + page = compound_trans_head(bv-bv_page); atomic_inc(page-_count); } } @@ -924,10 +918,13 @@ static void bio_pagedec(struct bio *bio) { struct bio_vec *bv; + struct page *page; int i; - bio_for_each_segment(bv, bio, i) - atomic_dec(bv-bv_page-_count); + bio_for_each_segment(bv, bio, i) { + page = compound_trans_head(bv-bv_page); + atomic_dec(page-_count); + } } static void -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] aoe: respect compound page reference tracking expectations
This patch series applies to today's linux-next/akpm, commit 651e600ff3ecc4ac06747313a1c49d89a9ce988e. Ed L. Cashin (1): aoe: adjust ref of head for compound page tails drivers/block/aoe/aoecmd.c | 17 +++-- 1 files changed, 7 insertions(+), 10 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 6/6] aoe: update internal version number to 85
Signed-off-by: Ed Cashin --- drivers/block/aoe/aoe.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h index b1f24c5..14a9d19 100644 --- a/drivers/block/aoe/aoe.h +++ b/drivers/block/aoe/aoe.h @@ -1,5 +1,5 @@ /* Copyright (c) 2013 Coraid, Inc. See COPYING for GPL terms. */ -#define VERSION "83" +#define VERSION "85" #define AOE_MAJOR 152 #define DEVICE_NAME "aoe" -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 5/6] aoe: update copyright date
Signed-off-by: Ed Cashin --- drivers/block/aoe/aoeblk.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c index b58cbeb..d63dcf0 100644 --- a/drivers/block/aoe/aoeblk.c +++ b/drivers/block/aoe/aoeblk.c @@ -1,4 +1,4 @@ -/* Copyright (c) 2012 Coraid, Inc. See COPYING for GPL terms. */ +/* Copyright (c) 2013 Coraid, Inc. See COPYING for GPL terms. */ /* * aoeblk.c * block device routines -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/6] aoe: fill in per-AoE-target information for debugfs file
This information is presented in a compact format that has evolved for easy routine scanning by expert humans, mostly developers and support technicians helping to troubleshoot or test AoE-based systems. Signed-off-by: Ed Cashin --- drivers/block/aoe/aoeblk.c | 33 - 1 files changed, 32 insertions(+), 1 deletions(-) diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c index 0511d38..b58cbeb 100644 --- a/drivers/block/aoe/aoeblk.c +++ b/drivers/block/aoe/aoeblk.c @@ -113,11 +113,42 @@ static ssize_t aoedisk_show_payload(struct device *dev, static int aoedisk_debugfs_show(struct seq_file *s, void *ignored) { struct aoedev *d; + struct aoetgt **t, **te; + struct aoeif *ifp, *ife; unsigned long flags; + char c; d = s->private; + seq_printf(s, "rttavg: %d rttdev: %d\n", + d->rttavg >> RTTSCALE, + d->rttdev >> RTTDSCALE); + seq_printf(s, "nskbpool: %d\n", skb_queue_len(>skbpool)); + seq_printf(s, "kicked: %ld\n", d->kicked); + seq_printf(s, "maxbcnt: %ld\n", d->maxbcnt); + seq_printf(s, "ref: %ld\n", d->ref); + spin_lock_irqsave(>lock, flags); - seq_printf(s, "%s\n", d->gd->disk_name); /* place holder */ + t = d->targets; + te = t + d->ntargets; + for (; t < te && *t; t++) { + c = '\t'; + seq_printf(s, "falloc: %ld\n", (*t)->falloc); + seq_printf(s, "ffree: %p\n", + list_empty(&(*t)->ffree) ? NULL : (*t)->ffree.next); + seq_printf(s, "%pm:%d:%d:%d\n", (*t)->addr, (*t)->nout, + (*t)->maxout, (*t)->nframes); + seq_printf(s, "\tssthresh:%d\n", (*t)->ssthresh); + seq_printf(s, "\ttaint:%d\n", (*t)->taint); + seq_printf(s, "\tr:%d\n", (*t)->rpkts); + seq_printf(s, "\tw:%d\n", (*t)->wpkts); + ifp = (*t)->ifs; + ife = ifp + ARRAY_SIZE((*t)->ifs); + for (; ifp->nd && ifp < ife; ifp++) { + seq_printf(s, "%c%s", c, ifp->nd->name); + c = ','; + } + seq_puts(s, "\n"); + } spin_unlock_irqrestore(>lock, flags); return 0; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/6] aoe: provide file operations for debugfs files
The place holder in the file contents is filled out in the following patch. Signed-off-by: Ed Cashin --- drivers/block/aoe/aoeblk.c | 25 - 1 files changed, 24 insertions(+), 1 deletions(-) diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c index d76c5cb..0511d38 100644 --- a/drivers/block/aoe/aoeblk.c +++ b/drivers/block/aoe/aoeblk.c @@ -110,6 +110,24 @@ static ssize_t aoedisk_show_payload(struct device *dev, return snprintf(page, PAGE_SIZE, "%lu\n", d->maxbcnt); } +static int aoedisk_debugfs_show(struct seq_file *s, void *ignored) +{ + struct aoedev *d; + unsigned long flags; + + d = s->private; + spin_lock_irqsave(>lock, flags); + seq_printf(s, "%s\n", d->gd->disk_name); /* place holder */ + spin_unlock_irqrestore(>lock, flags); + + return 0; +} + +static int aoe_debugfs_open(struct inode *inode, struct file *file) +{ + return single_open(file, aoedisk_debugfs_show, inode->i_private); +} + static DEVICE_ATTR(state, S_IRUGO, aoedisk_show_state, NULL); static DEVICE_ATTR(mac, S_IRUGO, aoedisk_show_mac, NULL); static DEVICE_ATTR(netif, S_IRUGO, aoedisk_show_netif, NULL); @@ -132,7 +150,12 @@ static const struct attribute_group attr_group = { .attrs = aoe_attrs, }; -static const struct file_operations aoe_debugfs_fops; +static const struct file_operations aoe_debugfs_fops = { + .open = aoe_debugfs_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; static void aoedisk_add_debugfs(struct aoedev *d) -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/6] aoe: add AoE-target files to debugfs
Signed-off-by: Ed Cashin --- drivers/block/aoe/aoe.h|2 ++ drivers/block/aoe/aoeblk.c | 35 +++ drivers/block/aoe/aoedev.c |1 + 3 files changed, 38 insertions(+), 0 deletions(-) diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h index 025c41d..b1f24c5 100644 --- a/drivers/block/aoe/aoe.h +++ b/drivers/block/aoe/aoe.h @@ -169,6 +169,7 @@ struct aoedev { ulong ref; struct work_struct work;/* disk create work struct */ struct gendisk *gd; + struct dentry *debugfs; struct request_queue *blkq; struct hd_geometry geo; sector_t ssize; @@ -206,6 +207,7 @@ struct ktstate { int aoeblk_init(void); void aoeblk_exit(void); void aoeblk_gdalloc(void *); +void aoedisk_rm_debugfs(struct aoedev *d); void aoedisk_rm_sysfs(struct aoedev *d); int aoechr_init(void); diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c index cb508b7..d76c5cb 100644 --- a/drivers/block/aoe/aoeblk.c +++ b/drivers/block/aoe/aoeblk.c @@ -132,6 +132,40 @@ static const struct attribute_group attr_group = { .attrs = aoe_attrs, }; +static const struct file_operations aoe_debugfs_fops; + +static void +aoedisk_add_debugfs(struct aoedev *d) +{ + struct dentry *entry; + char *p; + + if (aoe_debugfs_dir == NULL) + return; + p = strchr(d->gd->disk_name, '/'); + if (p == NULL) + p = d->gd->disk_name; + else + p++; + BUG_ON(*p == '\0'); + entry = debugfs_create_file(p, 0444, aoe_debugfs_dir, d, + _debugfs_fops); + if (IS_ERR_OR_NULL(entry)) { + pr_info("aoe: cannot create debugfs file for %s\n", + d->gd->disk_name); + return; + } + BUG_ON(d->debugfs); + d->debugfs = entry; +} +void +aoedisk_rm_debugfs(struct aoedev *d) +{ + BUG_ON(d->debugfs == NULL); + debugfs_remove(d->debugfs); + d->debugfs = NULL; +} + static int aoedisk_add_sysfs(struct aoedev *d) { @@ -332,6 +366,7 @@ aoeblk_gdalloc(void *vp) add_disk(gd); aoedisk_add_sysfs(d); + aoedisk_add_debugfs(d); spin_lock_irqsave(>lock, flags); WARN_ON(!(d->flags & DEVFL_GD_NOW)); diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c index 784c92e..c904767 100644 --- a/drivers/block/aoe/aoedev.c +++ b/drivers/block/aoe/aoedev.c @@ -278,6 +278,7 @@ freedev(struct aoedev *d) del_timer_sync(>timer); if (d->gd) { + aoedisk_rm_debugfs(d); aoedisk_rm_sysfs(d); del_gendisk(d->gd); put_disk(d->gd); -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/6] aoe: create and destroy debugfs directory for aoe
Signed-off-by: Ed Cashin --- drivers/block/aoe/aoeblk.c | 10 +- 1 files changed, 9 insertions(+), 1 deletions(-) diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c index 916d9ed..cb508b7 100644 --- a/drivers/block/aoe/aoeblk.c +++ b/drivers/block/aoe/aoeblk.c @@ -17,11 +17,13 @@ #include #include #include +#include #include #include "aoe.h" static DEFINE_MUTEX(aoeblk_mutex); static struct kmem_cache *buf_pool_cache; +static struct dentry *aoe_debugfs_dir; /* GPFS needs a larger value than the default. */ static int aoe_maxsectors; @@ -351,6 +353,8 @@ err: void aoeblk_exit(void) { + debugfs_remove_recursive(aoe_debugfs_dir); + aoe_debugfs_dir = NULL; kmem_cache_destroy(buf_pool_cache); } @@ -362,7 +366,11 @@ aoeblk_init(void) 0, 0, NULL); if (buf_pool_cache == NULL) return -ENOMEM; - + aoe_debugfs_dir = debugfs_create_dir("aoe", NULL); + if (IS_ERR_OR_NULL(aoe_debugfs_dir)) { + pr_info("aoe: cannot create debugfs directory\n"); + aoe_debugfs_dir = NULL; + } return 0; } -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/6] aoe: add debugfs-based export of debug counters
This patch series applies to today's linux-next/akpm, commit 0c2f52fa9a7cb139617078568c48026c49927617. It adds the debugging information that the coraid.com-distributed aoe driver exports via sysfs, but instead of sysfs, it uses debugfs. With these patches applied, even without AoE targets on the network, KEDR reports new possible memory leaks, but these are from callers outside the aoe driver that have used aoe_devnode to get the name of the character devices through the aoe_class->devnode callback, and I believe they're responsible for freeing that memory. Ed L. Cashin (6): aoe: create and destroy debugfs directory for aoe aoe: add AoE-target files to debugfs aoe: provide file operations for debugfs files aoe: fill in per-AoE-target information for debugfs file aoe: update copyright date aoe: update internal version number to 85 drivers/block/aoe/aoe.h|4 +- drivers/block/aoe/aoeblk.c | 101 +++- drivers/block/aoe/aoedev.c |1 + 3 files changed, 103 insertions(+), 3 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/6] aoe: add debugfs-based export of debug counters
This patch series applies to today's linux-next/akpm, commit 0c2f52fa9a7cb139617078568c48026c49927617. It adds the debugging information that the coraid.com-distributed aoe driver exports via sysfs, but instead of sysfs, it uses debugfs. With these patches applied, even without AoE targets on the network, KEDR reports new possible memory leaks, but these are from callers outside the aoe driver that have used aoe_devnode to get the name of the character devices through the aoe_class-devnode callback, and I believe they're responsible for freeing that memory. Ed L. Cashin (6): aoe: create and destroy debugfs directory for aoe aoe: add AoE-target files to debugfs aoe: provide file operations for debugfs files aoe: fill in per-AoE-target information for debugfs file aoe: update copyright date aoe: update internal version number to 85 drivers/block/aoe/aoe.h|4 +- drivers/block/aoe/aoeblk.c | 101 +++- drivers/block/aoe/aoedev.c |1 + 3 files changed, 103 insertions(+), 3 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/6] aoe: create and destroy debugfs directory for aoe
Signed-off-by: Ed Cashin ecas...@coraid.com --- drivers/block/aoe/aoeblk.c | 10 +- 1 files changed, 9 insertions(+), 1 deletions(-) diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c index 916d9ed..cb508b7 100644 --- a/drivers/block/aoe/aoeblk.c +++ b/drivers/block/aoe/aoeblk.c @@ -17,11 +17,13 @@ #include linux/mutex.h #include linux/export.h #include linux/moduleparam.h +#include linux/debugfs.h #include scsi/sg.h #include aoe.h static DEFINE_MUTEX(aoeblk_mutex); static struct kmem_cache *buf_pool_cache; +static struct dentry *aoe_debugfs_dir; /* GPFS needs a larger value than the default. */ static int aoe_maxsectors; @@ -351,6 +353,8 @@ err: void aoeblk_exit(void) { + debugfs_remove_recursive(aoe_debugfs_dir); + aoe_debugfs_dir = NULL; kmem_cache_destroy(buf_pool_cache); } @@ -362,7 +366,11 @@ aoeblk_init(void) 0, 0, NULL); if (buf_pool_cache == NULL) return -ENOMEM; - + aoe_debugfs_dir = debugfs_create_dir(aoe, NULL); + if (IS_ERR_OR_NULL(aoe_debugfs_dir)) { + pr_info(aoe: cannot create debugfs directory\n); + aoe_debugfs_dir = NULL; + } return 0; } -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/6] aoe: add AoE-target files to debugfs
Signed-off-by: Ed Cashin ecas...@coraid.com --- drivers/block/aoe/aoe.h|2 ++ drivers/block/aoe/aoeblk.c | 35 +++ drivers/block/aoe/aoedev.c |1 + 3 files changed, 38 insertions(+), 0 deletions(-) diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h index 025c41d..b1f24c5 100644 --- a/drivers/block/aoe/aoe.h +++ b/drivers/block/aoe/aoe.h @@ -169,6 +169,7 @@ struct aoedev { ulong ref; struct work_struct work;/* disk create work struct */ struct gendisk *gd; + struct dentry *debugfs; struct request_queue *blkq; struct hd_geometry geo; sector_t ssize; @@ -206,6 +207,7 @@ struct ktstate { int aoeblk_init(void); void aoeblk_exit(void); void aoeblk_gdalloc(void *); +void aoedisk_rm_debugfs(struct aoedev *d); void aoedisk_rm_sysfs(struct aoedev *d); int aoechr_init(void); diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c index cb508b7..d76c5cb 100644 --- a/drivers/block/aoe/aoeblk.c +++ b/drivers/block/aoe/aoeblk.c @@ -132,6 +132,40 @@ static const struct attribute_group attr_group = { .attrs = aoe_attrs, }; +static const struct file_operations aoe_debugfs_fops; + +static void +aoedisk_add_debugfs(struct aoedev *d) +{ + struct dentry *entry; + char *p; + + if (aoe_debugfs_dir == NULL) + return; + p = strchr(d-gd-disk_name, '/'); + if (p == NULL) + p = d-gd-disk_name; + else + p++; + BUG_ON(*p == '\0'); + entry = debugfs_create_file(p, 0444, aoe_debugfs_dir, d, + aoe_debugfs_fops); + if (IS_ERR_OR_NULL(entry)) { + pr_info(aoe: cannot create debugfs file for %s\n, + d-gd-disk_name); + return; + } + BUG_ON(d-debugfs); + d-debugfs = entry; +} +void +aoedisk_rm_debugfs(struct aoedev *d) +{ + BUG_ON(d-debugfs == NULL); + debugfs_remove(d-debugfs); + d-debugfs = NULL; +} + static int aoedisk_add_sysfs(struct aoedev *d) { @@ -332,6 +366,7 @@ aoeblk_gdalloc(void *vp) add_disk(gd); aoedisk_add_sysfs(d); + aoedisk_add_debugfs(d); spin_lock_irqsave(d-lock, flags); WARN_ON(!(d-flags DEVFL_GD_NOW)); diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c index 784c92e..c904767 100644 --- a/drivers/block/aoe/aoedev.c +++ b/drivers/block/aoe/aoedev.c @@ -278,6 +278,7 @@ freedev(struct aoedev *d) del_timer_sync(d-timer); if (d-gd) { + aoedisk_rm_debugfs(d); aoedisk_rm_sysfs(d); del_gendisk(d-gd); put_disk(d-gd); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/6] aoe: provide file operations for debugfs files
The place holder in the file contents is filled out in the following patch. Signed-off-by: Ed Cashin ecas...@coraid.com --- drivers/block/aoe/aoeblk.c | 25 - 1 files changed, 24 insertions(+), 1 deletions(-) diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c index d76c5cb..0511d38 100644 --- a/drivers/block/aoe/aoeblk.c +++ b/drivers/block/aoe/aoeblk.c @@ -110,6 +110,24 @@ static ssize_t aoedisk_show_payload(struct device *dev, return snprintf(page, PAGE_SIZE, %lu\n, d-maxbcnt); } +static int aoedisk_debugfs_show(struct seq_file *s, void *ignored) +{ + struct aoedev *d; + unsigned long flags; + + d = s-private; + spin_lock_irqsave(d-lock, flags); + seq_printf(s, %s\n, d-gd-disk_name); /* place holder */ + spin_unlock_irqrestore(d-lock, flags); + + return 0; +} + +static int aoe_debugfs_open(struct inode *inode, struct file *file) +{ + return single_open(file, aoedisk_debugfs_show, inode-i_private); +} + static DEVICE_ATTR(state, S_IRUGO, aoedisk_show_state, NULL); static DEVICE_ATTR(mac, S_IRUGO, aoedisk_show_mac, NULL); static DEVICE_ATTR(netif, S_IRUGO, aoedisk_show_netif, NULL); @@ -132,7 +150,12 @@ static const struct attribute_group attr_group = { .attrs = aoe_attrs, }; -static const struct file_operations aoe_debugfs_fops; +static const struct file_operations aoe_debugfs_fops = { + .open = aoe_debugfs_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; static void aoedisk_add_debugfs(struct aoedev *d) -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/6] aoe: fill in per-AoE-target information for debugfs file
This information is presented in a compact format that has evolved for easy routine scanning by expert humans, mostly developers and support technicians helping to troubleshoot or test AoE-based systems. Signed-off-by: Ed Cashin ecas...@coraid.com --- drivers/block/aoe/aoeblk.c | 33 - 1 files changed, 32 insertions(+), 1 deletions(-) diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c index 0511d38..b58cbeb 100644 --- a/drivers/block/aoe/aoeblk.c +++ b/drivers/block/aoe/aoeblk.c @@ -113,11 +113,42 @@ static ssize_t aoedisk_show_payload(struct device *dev, static int aoedisk_debugfs_show(struct seq_file *s, void *ignored) { struct aoedev *d; + struct aoetgt **t, **te; + struct aoeif *ifp, *ife; unsigned long flags; + char c; d = s-private; + seq_printf(s, rttavg: %d rttdev: %d\n, + d-rttavg RTTSCALE, + d-rttdev RTTDSCALE); + seq_printf(s, nskbpool: %d\n, skb_queue_len(d-skbpool)); + seq_printf(s, kicked: %ld\n, d-kicked); + seq_printf(s, maxbcnt: %ld\n, d-maxbcnt); + seq_printf(s, ref: %ld\n, d-ref); + spin_lock_irqsave(d-lock, flags); - seq_printf(s, %s\n, d-gd-disk_name); /* place holder */ + t = d-targets; + te = t + d-ntargets; + for (; t te *t; t++) { + c = '\t'; + seq_printf(s, falloc: %ld\n, (*t)-falloc); + seq_printf(s, ffree: %p\n, + list_empty((*t)-ffree) ? NULL : (*t)-ffree.next); + seq_printf(s, %pm:%d:%d:%d\n, (*t)-addr, (*t)-nout, + (*t)-maxout, (*t)-nframes); + seq_printf(s, \tssthresh:%d\n, (*t)-ssthresh); + seq_printf(s, \ttaint:%d\n, (*t)-taint); + seq_printf(s, \tr:%d\n, (*t)-rpkts); + seq_printf(s, \tw:%d\n, (*t)-wpkts); + ifp = (*t)-ifs; + ife = ifp + ARRAY_SIZE((*t)-ifs); + for (; ifp-nd ifp ife; ifp++) { + seq_printf(s, %c%s, c, ifp-nd-name); + c = ','; + } + seq_puts(s, \n); + } spin_unlock_irqrestore(d-lock, flags); return 0; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 5/6] aoe: update copyright date
Signed-off-by: Ed Cashin ecas...@coraid.com --- drivers/block/aoe/aoeblk.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c index b58cbeb..d63dcf0 100644 --- a/drivers/block/aoe/aoeblk.c +++ b/drivers/block/aoe/aoeblk.c @@ -1,4 +1,4 @@ -/* Copyright (c) 2012 Coraid, Inc. See COPYING for GPL terms. */ +/* Copyright (c) 2013 Coraid, Inc. See COPYING for GPL terms. */ /* * aoeblk.c * block device routines -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 6/6] aoe: update internal version number to 85
Signed-off-by: Ed Cashin ecas...@coraid.com --- drivers/block/aoe/aoe.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h index b1f24c5..14a9d19 100644 --- a/drivers/block/aoe/aoe.h +++ b/drivers/block/aoe/aoe.h @@ -1,5 +1,5 @@ /* Copyright (c) 2013 Coraid, Inc. See COPYING for GPL terms. */ -#define VERSION 83 +#define VERSION 85 #define AOE_MAJOR 152 #define DEVICE_NAME aoe -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/