[PATCH] aoe: list new maintainer for aoe driver

2019-05-17 Thread Ed Cashin
Justin Sanders, who has extensive experience with ATA over Ethernet
in general and AoE SCSI and block-device drivers in particular, is
ready to take on the role of aoe maintainer.  The driver needs a more
active maintainer.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5c38f21..9a0c0838 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2568,7 +2568,7 @@ F:
Documentation/devicetree/bindings/eeprom/at24.txt
 F: drivers/misc/eeprom/at24.c
 
 ATA OVER ETHERNET (AOE) DRIVER
-M: "Ed L. Cashin" 
+M: "Justin Sanders" 
 W: http://www.openaoe.org/
 S: Supported
 F: Documentation/aoe/
-- 
2.7.4



Re: [PATCH] block: aoe: no need to check return value of debugfs_create functions

2019-01-22 Thread Ed Cashin
On Tue, Jan 22, 2019 at 6:29 PM Omar Sandoval  wrote:
...
> Now entry is uninitialized here when we assign it to d->debugfs.

Thanks for noticing that.


Re: [PATCH v3] aoe: document sysfs interface

2018-02-19 Thread Ed Cashin

On 02/19/2018 01:16 PM, Aishwarya Pant wrote:

Documentation has been compiled from git commit logs and descriptions in
Documentation/aoe/aoe.txt. This should be useful for scripting and
tracking changes in the ABI.


I don't see any problems with v3.  Thank you.

--
  Ed


Re: [PATCH v3] aoe: document sysfs interface

2018-02-19 Thread Ed Cashin

On 02/19/2018 01:16 PM, Aishwarya Pant wrote:

Documentation has been compiled from git commit logs and descriptions in
Documentation/aoe/aoe.txt. This should be useful for scripting and
tracking changes in the ABI.


I don't see any problems with v3.  Thank you.

--
  Ed


Re: [PATCH] aoe: document sysfs interface

2018-02-17 Thread Ed Cashin

> On Feb 17, 2018, at 9:37 AM, Julia Lawall <julia.law...@lip6.fr> wrote:
> 
> 
> 
>> On Fri, 16 Feb 2018, Ed Cashin wrote:
>> 
>>> On 02/16/2018 10:39 AM, Aishwarya Pant wrote:
>>> Documentation has been compiled from git commit logs and descriptions in
>>> Documentation/aoe/aoe.txt. This should be useful for scripting and
>>> tracking changes in the ABI.
>> ...
>>> +What:/sys/block/etherd*/netif
>> ...
>>> +Description:
>>> +(RO) The name of the network interface on the localhost
>>> through
>>> +which we are communicating with the remote AoE device.
>> 
>> I'd recommend changing that to, "network interfaces".
> 
> Should it then be "names"?

Yes, s’il vous plait. Merci.

— 
  Ed


Re: [PATCH] aoe: document sysfs interface

2018-02-17 Thread Ed Cashin

> On Feb 17, 2018, at 9:37 AM, Julia Lawall  wrote:
> 
> 
> 
>> On Fri, 16 Feb 2018, Ed Cashin wrote:
>> 
>>> On 02/16/2018 10:39 AM, Aishwarya Pant wrote:
>>> Documentation has been compiled from git commit logs and descriptions in
>>> Documentation/aoe/aoe.txt. This should be useful for scripting and
>>> tracking changes in the ABI.
>> ...
>>> +What:/sys/block/etherd*/netif
>> ...
>>> +Description:
>>> +(RO) The name of the network interface on the localhost
>>> through
>>> +which we are communicating with the remote AoE device.
>> 
>> I'd recommend changing that to, "network interfaces".
> 
> Should it then be "names"?

Yes, s’il vous plait. Merci.

— 
  Ed


Re: [PATCH] aoe: document sysfs interface

2018-02-16 Thread Ed Cashin

On 02/16/2018 10:39 AM, Aishwarya Pant wrote:

Documentation has been compiled from git commit logs and descriptions in
Documentation/aoe/aoe.txt. This should be useful for scripting and
tracking changes in the ABI.

...

+What:  /sys/block/etherd*/netif

...

+Description:
+   (RO) The name of the network interface on the localhost through
+   which we are communicating with the remote AoE device.


I'd recommend changing that to, "network interfaces".

Thanks!

--
  Ed


Re: [PATCH] aoe: document sysfs interface

2018-02-16 Thread Ed Cashin

On 02/16/2018 10:39 AM, Aishwarya Pant wrote:

Documentation has been compiled from git commit logs and descriptions in
Documentation/aoe/aoe.txt. This should be useful for scripting and
tracking changes in the ABI.

...

+What:  /sys/block/etherd*/netif

...

+Description:
+   (RO) The name of the network interface on the localhost through
+   which we are communicating with the remote AoE device.


I'd recommend changing that to, "network interfaces".

Thanks!

--
  Ed


Re: [PATCH] block: aoenet: Replace GFP_ATOMIC with GFP_KERNEL in aoenet_rcv

2018-01-28 Thread Ed Cashin
Good luck in your efforts, and thanks for your work on static analysis.

> On Jan 27, 2018, at 9:12 PM, Jia-Ju Bai <baijiaju1...@gmail.com> wrote:
> 
> 
> 
>> On 2018/1/28 1:48, Ed Cashin wrote:
>> If the tool cannot tell whether the protected state is manipulated by 
>> *another* piece of code called in atomic context, then it's insufficient.
>> 
>>> On Jan 26, 2018, at 4:37 AM, Jia-Ju Bai <baijiaju1...@gmail.com> wrote:
>>> 
>>> After checking all possible call chains to aoenet_rcv(),
>>> my tool finds that aoenet_rcv() is never called in atomic context,
>>> namely never in an interrupt handler or holding a spinlock.
>>> Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL.
>>> 
>>> This is found by a static analysis tool named DCNS written by myself.
>>> 
>>> Signed-off-by: Jia-Ju Bai <baijiaju1...@gmail.com>
>>> ---
>>> drivers/block/aoe/aoenet.c |2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c
>>> index 63773a9..d5fff7a 100644
>>> --- a/drivers/block/aoe/aoenet.c
>>> +++ b/drivers/block/aoe/aoenet.c
>>> @@ -138,7 +138,7 @@ static int __init aoe_iflist_setup(char *str)
>>>if (dev_net(ifp) != _net)
>>>goto exit;
>>> 
>>> -skb = skb_share_check(skb, GFP_ATOMIC);
>>> +skb = skb_share_check(skb, GFP_KERNEL);
>>>if (skb == NULL)
>>>return 0;
>>>if (!is_aoe_netif(ifp))
>>> -- 
>>> 1.7.9.5
>>> 
>>> 
> 
> Sorry, I find my report is false positive after I manually check the code.
> aoenet_rcv() is used as function pointer via "->func", and it is called in 
> dev_queue_xmit_nit() in net/core/dev.c.
> dev_queue_xmit_nit() calls a rcu_read_lock() before it calls pt_prev->func().
> Thus it is right to use GFP_ATOMIC in aoenet_rcv().
> Sorry again for my incorrect report...
> 
> Thanks,
> Jia-Ju Bai



Re: [PATCH] block: aoenet: Replace GFP_ATOMIC with GFP_KERNEL in aoenet_rcv

2018-01-28 Thread Ed Cashin
Good luck in your efforts, and thanks for your work on static analysis.

> On Jan 27, 2018, at 9:12 PM, Jia-Ju Bai  wrote:
> 
> 
> 
>> On 2018/1/28 1:48, Ed Cashin wrote:
>> If the tool cannot tell whether the protected state is manipulated by 
>> *another* piece of code called in atomic context, then it's insufficient.
>> 
>>> On Jan 26, 2018, at 4:37 AM, Jia-Ju Bai  wrote:
>>> 
>>> After checking all possible call chains to aoenet_rcv(),
>>> my tool finds that aoenet_rcv() is never called in atomic context,
>>> namely never in an interrupt handler or holding a spinlock.
>>> Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL.
>>> 
>>> This is found by a static analysis tool named DCNS written by myself.
>>> 
>>> Signed-off-by: Jia-Ju Bai 
>>> ---
>>> drivers/block/aoe/aoenet.c |2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c
>>> index 63773a9..d5fff7a 100644
>>> --- a/drivers/block/aoe/aoenet.c
>>> +++ b/drivers/block/aoe/aoenet.c
>>> @@ -138,7 +138,7 @@ static int __init aoe_iflist_setup(char *str)
>>>if (dev_net(ifp) != _net)
>>>goto exit;
>>> 
>>> -skb = skb_share_check(skb, GFP_ATOMIC);
>>> +skb = skb_share_check(skb, GFP_KERNEL);
>>>if (skb == NULL)
>>>return 0;
>>>if (!is_aoe_netif(ifp))
>>> -- 
>>> 1.7.9.5
>>> 
>>> 
> 
> Sorry, I find my report is false positive after I manually check the code.
> aoenet_rcv() is used as function pointer via "->func", and it is called in 
> dev_queue_xmit_nit() in net/core/dev.c.
> dev_queue_xmit_nit() calls a rcu_read_lock() before it calls pt_prev->func().
> Thus it is right to use GFP_ATOMIC in aoenet_rcv().
> Sorry again for my incorrect report...
> 
> Thanks,
> Jia-Ju Bai



Re: [PATCH] block: aoenet: Replace GFP_ATOMIC with GFP_KERNEL in aoenet_rcv

2018-01-27 Thread Ed Cashin
If the tool cannot tell whether the protected state is manipulated by *another* 
piece of code called in atomic context, then it's insufficient.

> On Jan 26, 2018, at 4:37 AM, Jia-Ju Bai  wrote:
> 
> After checking all possible call chains to aoenet_rcv(),
> my tool finds that aoenet_rcv() is never called in atomic context, 
> namely never in an interrupt handler or holding a spinlock.
> Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL.
> 
> This is found by a static analysis tool named DCNS written by myself.
> 
> Signed-off-by: Jia-Ju Bai 
> ---
> drivers/block/aoe/aoenet.c |2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c
> index 63773a9..d5fff7a 100644
> --- a/drivers/block/aoe/aoenet.c
> +++ b/drivers/block/aoe/aoenet.c
> @@ -138,7 +138,7 @@ static int __init aoe_iflist_setup(char *str)
>if (dev_net(ifp) != _net)
>goto exit;
> 
> -skb = skb_share_check(skb, GFP_ATOMIC);
> +skb = skb_share_check(skb, GFP_KERNEL);
>if (skb == NULL)
>return 0;
>if (!is_aoe_netif(ifp))
> -- 
> 1.7.9.5
> 



Re: [PATCH] block: aoenet: Replace GFP_ATOMIC with GFP_KERNEL in aoenet_rcv

2018-01-27 Thread Ed Cashin
If the tool cannot tell whether the protected state is manipulated by *another* 
piece of code called in atomic context, then it's insufficient.

> On Jan 26, 2018, at 4:37 AM, Jia-Ju Bai  wrote:
> 
> After checking all possible call chains to aoenet_rcv(),
> my tool finds that aoenet_rcv() is never called in atomic context, 
> namely never in an interrupt handler or holding a spinlock.
> Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL.
> 
> This is found by a static analysis tool named DCNS written by myself.
> 
> Signed-off-by: Jia-Ju Bai 
> ---
> drivers/block/aoe/aoenet.c |2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c
> index 63773a9..d5fff7a 100644
> --- a/drivers/block/aoe/aoenet.c
> +++ b/drivers/block/aoe/aoenet.c
> @@ -138,7 +138,7 @@ static int __init aoe_iflist_setup(char *str)
>if (dev_net(ifp) != _net)
>goto exit;
> 
> -skb = skb_share_check(skb, GFP_ATOMIC);
> +skb = skb_share_check(skb, GFP_KERNEL);
>if (skb == NULL)
>return 0;
>if (!is_aoe_netif(ifp))
> -- 
> 1.7.9.5
> 



Re: [PATCH] [RESEND] aoe: use ktime_t instead of timeval

2018-01-17 Thread Ed Cashin

On 01/17/2018 10:41 AM, Jens Axboe wrote:
...

Applied, thanks Arnd/Tina.


Thank you for the CC.  Sorry for the lack of response in November.

--
  Ed



Re: [PATCH] [RESEND] aoe: use ktime_t instead of timeval

2018-01-17 Thread Ed Cashin

On 01/17/2018 10:41 AM, Jens Axboe wrote:
...

Applied, thanks Arnd/Tina.


Thank you for the CC.  Sorry for the lack of response in November.

--
  Ed



Re: [PATCH] block/aoe: discover_timer: Convert timers to use timer_setup()

2017-11-03 Thread Ed Cashin

On 11/02/2017 07:31 PM, Kees Cook wrote:

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

This refactors the discover_timer to remove the needless locking and
state machine used for synchronizing timer death. Using del_timer_sync()
will already do the right thing.


Looks OK to me, thanks.

--
  Ed


Re: [PATCH] block/aoe: discover_timer: Convert timers to use timer_setup()

2017-11-03 Thread Ed Cashin

On 11/02/2017 07:31 PM, Kees Cook wrote:

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

This refactors the discover_timer to remove the needless locking and
state machine used for synchronizing timer death. Using del_timer_sync()
will already do the right thing.


Looks OK to me, thanks.

--
  Ed


Re: [PATCH] aoe: use setup_timer

2017-03-25 Thread Ed Cashin

On 03/24/2017 10:15 AM, Geliang Tang wrote:

Use setup_timer() instead of init_timer() to simplify the code.

Signed-off-by: Geliang Tang 
---
 drivers/block/aoe/aoemain.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/block/aoe/aoemain.c b/drivers/block/aoe/aoemain.c
index 4b987c2..20865d4 100644
--- a/drivers/block/aoe/aoemain.c
+++ b/drivers/block/aoe/aoemain.c
@@ -28,10 +28,8 @@ discover_timer(ulong vp)

switch (vp) {
case TINIT:
-   init_timer();
+   setup_timer(, discover_timer, TRUN);
spin_lock_init();
-   t.data = TRUN;
-   t.function = discover_timer;
die = 0;
case TRUN:
spin_lock_irqsave(, flags);



This change looks OK to me, thanks.

--
  Ed


Re: [PATCH] aoe: use setup_timer

2017-03-25 Thread Ed Cashin

On 03/24/2017 10:15 AM, Geliang Tang wrote:

Use setup_timer() instead of init_timer() to simplify the code.

Signed-off-by: Geliang Tang 
---
 drivers/block/aoe/aoemain.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/block/aoe/aoemain.c b/drivers/block/aoe/aoemain.c
index 4b987c2..20865d4 100644
--- a/drivers/block/aoe/aoemain.c
+++ b/drivers/block/aoe/aoemain.c
@@ -28,10 +28,8 @@ discover_timer(ulong vp)

switch (vp) {
case TINIT:
-   init_timer();
+   setup_timer(, discover_timer, TRUN);
spin_lock_init();
-   t.data = TRUN;
-   t.function = discover_timer;
die = 0;
case TRUN:
spin_lock_irqsave(, flags);



This change looks OK to me, thanks.

--
  Ed


Re: [PATCH 06/15] genhd: Add return code to device_add_disk

2016-08-17 Thread Ed Cashin

On 08/17/2016 05:14 AM, Fam Zheng wrote:
...

Of course, the plan is to write patches on top. I'm not cleaning up anything
here because I'm concerned callers may double free (and I didn't look hard into
that).


Aside from Huck's concerns, the changes looked OK from aoe's perspective.

--
  Ed



Re: [PATCH 06/15] genhd: Add return code to device_add_disk

2016-08-17 Thread Ed Cashin

On 08/17/2016 05:14 AM, Fam Zheng wrote:
...

Of course, the plan is to write patches on top. I'm not cleaning up anything
here because I'm concerned callers may double free (and I didn't look hard into
that).


Aside from Huck's concerns, the changes looked OK from aoe's perspective.

--
  Ed



Re: [PATCH v2 05/12] aoeblk: Generate uevent after attribute available

2016-06-30 Thread Ed Cashin

On 06/29/2016 09:59 PM, Fam Zheng wrote:

It is documented that KOBJ_ADD should be generated after the object's
attributes and children are ready.  We can achieve this with the new
disk_gen_uevents interface.


Looks like an improvement, thanks!

--
  Ed



Re: [PATCH v2 05/12] aoeblk: Generate uevent after attribute available

2016-06-30 Thread Ed Cashin

On 06/29/2016 09:59 PM, Fam Zheng wrote:

It is documented that KOBJ_ADD should be generated after the object's
attributes and children are ready.  We can achieve this with the new
disk_gen_uevents interface.


Looks like an improvement, thanks!

--
  Ed



Re: [PATCH] aoe: remove unnecessary check for failing kthread creation

2016-02-02 Thread Ed Cashin

On 02/01/2016 10:53 AM, Insu Yun wrote:

When kthread_run fails, it returns ERR, not NULL.
Therefore, NULL checking is redundant.
(https://www.kernel.org/doc/htmldocs/device-drivers/API-kthread-run.html)


Thanks, the change looks reasonable.

...

task = kthread_run(kthread, k, "%s", k->name);
-   if (task == NULL || IS_ERR(task))
+   if (IS_ERR(task))
return -ENOMEM;


Interestingly, after this change, it's more clear that returning
-ENOMEM is a bit misleading when the underlying kernel/kthread.c
function, kthread_create_on_node returns ERR_PTR(-EINTR).  But because
that would happen when the process we need to have the driver working
gets killed, I'm not sure it matters.

--
  Ed



Re: [PATCH] aoe: remove unnecessary check for failing kthread creation

2016-02-02 Thread Ed Cashin

On 02/01/2016 10:53 AM, Insu Yun wrote:

When kthread_run fails, it returns ERR, not NULL.
Therefore, NULL checking is redundant.
(https://www.kernel.org/doc/htmldocs/device-drivers/API-kthread-run.html)


Thanks, the change looks reasonable.

...

task = kthread_run(kthread, k, "%s", k->name);
-   if (task == NULL || IS_ERR(task))
+   if (IS_ERR(task))
return -ENOMEM;


Interestingly, after this change, it's more clear that returning
-ENOMEM is a bit misleading when the underlying kernel/kthread.c
function, kthread_create_on_node returns ERR_PTR(-EINTR).  But because
that would happen when the process we need to have the driver working
gets killed, I'm not sure it matters.

--
  Ed



Re: [PATCH 09/39] aoe: drop null test before destroy functions

2015-09-14 Thread Ed Cashin

ACK.  Thanks.

On 09/13/2015 08:15 AM, Julia Lawall wrote:

Remove unneeded NULL test.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// 
@@ expression x; @@
-if (x != NULL)
   \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
// 

Signed-off-by: Julia Lawall 

---
  drivers/block/aoe/aoedev.c |3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index ffd1947..efc8a8a 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -285,8 +285,7 @@ freedev(struct aoedev *d)
e = t + d->ntargets;
for (; t < e && *t; t++)
freetgt(d, *t);
-   if (d->bufpool)
-   mempool_destroy(d->bufpool);
+   mempool_destroy(d->bufpool);
skbpoolfree(d);
minor_free(d->sysminor);
  



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/39] aoe: drop null test before destroy functions

2015-09-14 Thread Ed Cashin

ACK.  Thanks.

On 09/13/2015 08:15 AM, Julia Lawall wrote:

Remove unneeded NULL test.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// 
@@ expression x; @@
-if (x != NULL)
   \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
// 

Signed-off-by: Julia Lawall 

---
  drivers/block/aoe/aoedev.c |3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index ffd1947..efc8a8a 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -285,8 +285,7 @@ freedev(struct aoedev *d)
e = t + d->ntargets;
for (; t < e && *t; t++)
freetgt(d, *t);
-   if (d->bufpool)
-   mempool_destroy(d->bufpool);
+   mempool_destroy(d->bufpool);
skbpoolfree(d);
minor_free(d->sysminor);
  



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] Revert "block: remove artifical max_hw_sectors cap"

2015-08-14 Thread Ed Cashin

ACK.

On 08/13/2015 02:57 PM, Jeff Moyer wrote:

This reverts commit 34b48db66e08ca1c1bc07cf305d672ac940268dc.
That commit caused performance regressions for streaming I/O
workloads on a number of different storage devices, from
SATA disks to external RAID arrays.  It also managed to
trip up some buggy firmware in at least one drive, causing
data corruption.

The next patch will bump the default max_sectors_kb value to
1280, which will accommodate a 10-data-disk stripe write
with chunk size 128k.  In the testing I've done using iozone,
fio, and aio-stress, a value of 1280 does not show a big
performance difference from 512.  This will hopefully still
help the software RAID setup that Christoph saw the original
performance gains with while still not regressing other
storage configurations.

Signed-off-by: Jeff Moyer 
---
  block/blk-settings.c   | 4 +++-
  drivers/block/aoe/aoeblk.c | 2 +-
  include/linux/blkdev.h | 1 +
  3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 12600bf..b160f89 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -257,7 +257,9 @@ void blk_limits_max_hw_sectors(struct queue_limits *limits, 
unsigned int max_hw_
   __func__, max_hw_sectors);
}
  
-	limits->max_sectors = limits->max_hw_sectors = max_hw_sectors;

+   limits->max_hw_sectors = max_hw_sectors;
+   limits->max_sectors = min_t(unsigned int, max_hw_sectors,
+   BLK_DEF_MAX_SECTORS);
  }
  EXPORT_SYMBOL(blk_limits_max_hw_sectors);
  
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c

index 46c282f..dd73e1f 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -395,7 +395,7 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->flags & DEVFL_TKILL);
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
-   blk_queue_max_hw_sectors(q, 1024);
+   blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
q->backing_dev_info.name = "aoe";
q->backing_dev_info.ra_pages = READ_AHEAD / PAGE_CACHE_SIZE;
d->bufpool = mp;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d4068c1..1fd459e1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1138,6 +1138,7 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t 
has_write_perm);
  enum blk_default_limits {
BLK_MAX_SEGMENTS= 128,
BLK_SAFE_MAX_SECTORS= 255,
+   BLK_DEF_MAX_SECTORS = 1024,
BLK_MAX_SEGMENT_SIZE= 65536,
BLK_SEG_BOUNDARY_MASK   = 0xUL,
  };


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] Revert block: remove artifical max_hw_sectors cap

2015-08-14 Thread Ed Cashin

ACK.

On 08/13/2015 02:57 PM, Jeff Moyer wrote:

This reverts commit 34b48db66e08ca1c1bc07cf305d672ac940268dc.
That commit caused performance regressions for streaming I/O
workloads on a number of different storage devices, from
SATA disks to external RAID arrays.  It also managed to
trip up some buggy firmware in at least one drive, causing
data corruption.

The next patch will bump the default max_sectors_kb value to
1280, which will accommodate a 10-data-disk stripe write
with chunk size 128k.  In the testing I've done using iozone,
fio, and aio-stress, a value of 1280 does not show a big
performance difference from 512.  This will hopefully still
help the software RAID setup that Christoph saw the original
performance gains with while still not regressing other
storage configurations.

Signed-off-by: Jeff Moyer jmo...@redhat.com
---
  block/blk-settings.c   | 4 +++-
  drivers/block/aoe/aoeblk.c | 2 +-
  include/linux/blkdev.h | 1 +
  3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 12600bf..b160f89 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -257,7 +257,9 @@ void blk_limits_max_hw_sectors(struct queue_limits *limits, 
unsigned int max_hw_
   __func__, max_hw_sectors);
}
  
-	limits-max_sectors = limits-max_hw_sectors = max_hw_sectors;

+   limits-max_hw_sectors = max_hw_sectors;
+   limits-max_sectors = min_t(unsigned int, max_hw_sectors,
+   BLK_DEF_MAX_SECTORS);
  }
  EXPORT_SYMBOL(blk_limits_max_hw_sectors);
  
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c

index 46c282f..dd73e1f 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -395,7 +395,7 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d-flags  DEVFL_TKILL);
WARN_ON(d-gd);
WARN_ON(d-flags  DEVFL_UP);
-   blk_queue_max_hw_sectors(q, 1024);
+   blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
q-backing_dev_info.name = aoe;
q-backing_dev_info.ra_pages = READ_AHEAD / PAGE_CACHE_SIZE;
d-bufpool = mp;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d4068c1..1fd459e1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1138,6 +1138,7 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t 
has_write_perm);
  enum blk_default_limits {
BLK_MAX_SEGMENTS= 128,
BLK_SAFE_MAX_SECTORS= 255,
+   BLK_DEF_MAX_SECTORS = 1024,
BLK_MAX_SEGMENT_SIZE= 65536,
BLK_SEG_BOUNDARY_MASK   = 0xUL,
  };


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: Convert use of __constant_ to

2015-06-19 Thread Ed Cashin

OK.  Thanks.

On 06/18/2015 11:15 PM, Vaishali Thakkar wrote:

In little endian cases, the macros htons and cpu_to_be16 unfolds
to __swab16 which provides special case for constants. In big
endian cases, __constant_htons and htons expand directly to the
same expression. The same applies for __constant_cpu_to_be16 and
cpu_to_be16.

So, replace __constant_htons with htons and __constant_cpu_to_be16
with cpu_to_be16 with the goal of getting rid of the definitions
__constant_htons and __constant_cpu_to_be16 completely.

The semantic patch that performs this transformation is as follows:

@@expression x;@@

(
- __constant_htons(x)
+ htons(x)
|
- __constant_cpu_to_be16(x)
+ cpu_to_be16(x)
)

Signed-off-by: Vaishali Thakkar 
---
  drivers/block/aoe/aoecmd.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 422b7d8..6e891d2 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -69,7 +69,7 @@ new_skb(ulong len)
skb_reserve(skb, MAX_HEADER);
skb_reset_mac_header(skb);
skb_reset_network_header(skb);
-   skb->protocol = __constant_htons(ETH_P_AOE);
+   skb->protocol = htons(ETH_P_AOE);
skb_checksum_none_assert(skb);
}
return skb;
@@ -132,7 +132,7 @@ aoehdr_atainit(struct aoedev *d, struct aoetgt *t, struct 
aoe_hdr *h)
  
  	memcpy(h->src, t->ifp->nd->dev_addr, sizeof h->src);

memcpy(h->dst, t->addr, sizeof h->dst);
-   h->type = __constant_cpu_to_be16(ETH_P_AOE);
+   h->type = cpu_to_be16(ETH_P_AOE);
h->verfl = AOE_HVER;
h->major = cpu_to_be16(d->aoemajor);
h->minor = d->aoeminor;
@@ -437,7 +437,7 @@ aoecmd_cfg_pkts(ushort aoemajor, unsigned char aoeminor, 
struct sk_buff_head *qu
  
  		memset(h->dst, 0xff, sizeof h->dst);

memcpy(h->src, ifp->dev_addr, sizeof h->src);
-   h->type = __constant_cpu_to_be16(ETH_P_AOE);
+   h->type = cpu_to_be16(ETH_P_AOE);
h->verfl = AOE_HVER;
h->major = cpu_to_be16(aoemajor);
h->minor = aoeminor;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: Convert use of __constant_htons to htons

2015-06-19 Thread Ed Cashin

OK.  Thanks.

On 06/18/2015 11:23 PM, Vaishali Thakkar wrote:

In little endian cases, the macro htons unfolds to __swab16 which
provides special case for constants. In big endian cases,
__constant_htons and htons expand directly to the same expression.
So, replace __constant_htons with htons with the goal of getting
rid of the definition of __constant_htons completely.

The semantic patch that performs this transformation is as follows:

@@expression x;@@

- __constant_htons(x)
+ htons(x)

Signed-off-by: Vaishali Thakkar 
---
  drivers/block/aoe/aoenet.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c
index 63773a9..78ff47a 100644
--- a/drivers/block/aoe/aoenet.c
+++ b/drivers/block/aoe/aoenet.c
@@ -192,7 +192,7 @@ exit:
  }
  
  static struct packet_type aoe_pt __read_mostly = {

-   .type = __constant_htons(ETH_P_AOE),
+   .type = htons(ETH_P_AOE),
.func = aoenet_rcv,
  };
  


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: Convert use of __constant_foo to foo

2015-06-19 Thread Ed Cashin

OK.  Thanks.

On 06/18/2015 11:15 PM, Vaishali Thakkar wrote:

In little endian cases, the macros htons and cpu_to_be16 unfolds
to __swab16 which provides special case for constants. In big
endian cases, __constant_htons and htons expand directly to the
same expression. The same applies for __constant_cpu_to_be16 and
cpu_to_be16.

So, replace __constant_htons with htons and __constant_cpu_to_be16
with cpu_to_be16 with the goal of getting rid of the definitions
__constant_htons and __constant_cpu_to_be16 completely.

The semantic patch that performs this transformation is as follows:

@@expression x;@@

(
- __constant_htons(x)
+ htons(x)
|
- __constant_cpu_to_be16(x)
+ cpu_to_be16(x)
)

Signed-off-by: Vaishali Thakkar vthakkar1...@gmail.com
---
  drivers/block/aoe/aoecmd.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 422b7d8..6e891d2 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -69,7 +69,7 @@ new_skb(ulong len)
skb_reserve(skb, MAX_HEADER);
skb_reset_mac_header(skb);
skb_reset_network_header(skb);
-   skb-protocol = __constant_htons(ETH_P_AOE);
+   skb-protocol = htons(ETH_P_AOE);
skb_checksum_none_assert(skb);
}
return skb;
@@ -132,7 +132,7 @@ aoehdr_atainit(struct aoedev *d, struct aoetgt *t, struct 
aoe_hdr *h)
  
  	memcpy(h-src, t-ifp-nd-dev_addr, sizeof h-src);

memcpy(h-dst, t-addr, sizeof h-dst);
-   h-type = __constant_cpu_to_be16(ETH_P_AOE);
+   h-type = cpu_to_be16(ETH_P_AOE);
h-verfl = AOE_HVER;
h-major = cpu_to_be16(d-aoemajor);
h-minor = d-aoeminor;
@@ -437,7 +437,7 @@ aoecmd_cfg_pkts(ushort aoemajor, unsigned char aoeminor, 
struct sk_buff_head *qu
  
  		memset(h-dst, 0xff, sizeof h-dst);

memcpy(h-src, ifp-dev_addr, sizeof h-src);
-   h-type = __constant_cpu_to_be16(ETH_P_AOE);
+   h-type = cpu_to_be16(ETH_P_AOE);
h-verfl = AOE_HVER;
h-major = cpu_to_be16(aoemajor);
h-minor = aoeminor;


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: Convert use of __constant_htons to htons

2015-06-19 Thread Ed Cashin

OK.  Thanks.

On 06/18/2015 11:23 PM, Vaishali Thakkar wrote:

In little endian cases, the macro htons unfolds to __swab16 which
provides special case for constants. In big endian cases,
__constant_htons and htons expand directly to the same expression.
So, replace __constant_htons with htons with the goal of getting
rid of the definition of __constant_htons completely.

The semantic patch that performs this transformation is as follows:

@@expression x;@@

- __constant_htons(x)
+ htons(x)

Signed-off-by: Vaishali Thakkar vthakkar1...@gmail.com
---
  drivers/block/aoe/aoenet.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c
index 63773a9..78ff47a 100644
--- a/drivers/block/aoe/aoenet.c
+++ b/drivers/block/aoe/aoenet.c
@@ -192,7 +192,7 @@ exit:
  }
  
  static struct packet_type aoe_pt __read_mostly = {

-   .type = __constant_htons(ETH_P_AOE),
+   .type = htons(ETH_P_AOE),
.func = aoenet_rcv,
  };
  


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Y2038] [PATCH] aoe: Use 64-bit timestamp in frame

2015-05-13 Thread Ed Cashin

On 05/13/2015 04:04 AM, Arnd Bergmann wrote:
...
Shall we do the ktime_get_us() approach then? It still requires a 
32-bit division like do_gettimeofday(), so it will not be as efficient 
as the shifted nanoseconds.


It's no worse, though, right?  So I think it's a good transition. Further
optimization could be attempted in an experimental branch at some
point for easy testing.

As for the aoe_deadsecs computation, converting the aoe_deadsec module 
parameter into scaled nanoseconds can be done at module load time, and 
that way you also save the integer division you currently do for each 
frame in rexmit_timer() to turn the microseconds into seconds. Arnd 


That's true, but the "secs" in the identifier stands for "seconds". It would
be misleading to have something called seconds be scaled nanoseconds.
And we could just use another variable if it weren't for the fact that this
module parameter is exposed through sysfs and can be changed through
that mechanism at any time.

--
  Ed
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Y2038] [PATCH] aoe: Use 64-bit timestamp in frame

2015-05-13 Thread Ed Cashin

On 05/13/2015 04:04 AM, Arnd Bergmann wrote:
...
Shall we do the ktime_get_us() approach then? It still requires a 
32-bit division like do_gettimeofday(), so it will not be as efficient 
as the shifted nanoseconds.


It's no worse, though, right?  So I think it's a good transition. Further
optimization could be attempted in an experimental branch at some
point for easy testing.

As for the aoe_deadsecs computation, converting the aoe_deadsec module 
parameter into scaled nanoseconds can be done at module load time, and 
that way you also save the integer division you currently do for each 
frame in rexmit_timer() to turn the microseconds into seconds. Arnd 


That's true, but the secs in the identifier stands for seconds. It would
be misleading to have something called seconds be scaled nanoseconds.
And we could just use another variable if it weren't for the fact that this
module parameter is exposed through sysfs and can be changed through
that mechanism at any time.

--
  Ed
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: Use 64-bit timestamp in frame

2015-05-12 Thread Ed Cashin

On 05/12/2015 05:44 AM, Arnd Bergmann wrote:

On Monday 11 May 2015 21:00:25 Ed Cashin wrote:

...

In that case, there is still information about the timing embedded in
the AoE tag.  The send time in jiffies is a rough-grained record of the
send time, and it's extracted from the tag.  For these "unexpected"
responses, this timing information can improve performance significantly
without introducing extra overhead or risk.
That path is not changed at all by this patch, right? It also looks
like the jiffies information from there is only used to print an
error message.



That's right, thanks, the tag still has the old jiffies embedded
in it.

The information, though, is used to update the round-trip-time
average and the running estimate of the RTT variance.  For the
unexpected responses that information can help the driver to
maintain high performance when there are inconsistencies in the
network performance.  (calc_rttavg)

--
  Ed
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Y2038] [PATCH] aoe: Use 64-bit timestamp in frame

2015-05-12 Thread Ed Cashin

On 05/12/2015 07:14 AM, Arnd Bergmann wrote:

On Tuesday 12 May 2015 11:44:21 Arnd Bergmann wrote:

There are of course multiple ways to do this. One way would be to
change the code to work on 32-bit nanoseconds instead of 32-bit
microseconds. This requires proving that the we cannot exceed
4.29 seconds of round-trip time in calc_rttavg().
Is that a valid assumption or not?

If not, we could replace do_gettimeofday() with ktime_get_ts64().
This will ensure we don't need a 64-bit division when converting
the ts64 to a 32-bit microsecond value, and combined with the
conversion is still no slower than do_gettimeofday(), and it
still avoids the double bookkeeping because it uses a monotonic
timebase that is robust against settimeofday.

Two other approaches that occurred to me later:

- introduce common ktime_get_ms(), ktime_get_us(), ktime_get_real_ms()
   and ktime_get_real_is() interfaces, to match the other interfaces
   we already provide. These could be done as efficiently or better
   than what aoe does manually today.

- change the timebase that is used for the computations in aoe to use
   scaled nanoseconds instead of microseconds. Using

   u32 time = ktime_get_ns() >> 10;

   would give you a similar range and precision as microseconds, but
   completely avoid integer division. You could also use a different
   shift value to either extend the range beyond 71 minutes, or the
   extend the precision to something below a microsecond. This would
   be the most efficient implementation, but also require significant
   changes to the driver.



That is an interesting idea.  People do care about aoe_deadsecs being
pretty accurate, so there would need to be a way to make that remain
accurate.  The driver will fail outstanding I/O to the target and mark it
as "down" after unsuccessfully retransmitting commands to the target
for a number of seconds equal to aoe_deadsecs.

As to the efficient ktime_get_us idea, that sounds appealing since you
mention that they would be efficient.

Thanks for the analysis.

--
  Ed
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: Use 64-bit timestamp in frame

2015-05-12 Thread Ed Cashin
Thanks for the expanded motivation. I'll return to your ideas tonight, but I 
wanted to mention that it is possible for round trips to take well over five 
seconds, partly because the disk on the target might be resetting. 

If the rough-grained mechanism in the AoE tag still works, though, that's very 
helpful. 

Re: [Y2038] [PATCH] aoe: Use 64-bit timestamp in frame

2015-05-12 Thread Ed Cashin

On 05/12/2015 07:14 AM, Arnd Bergmann wrote:

On Tuesday 12 May 2015 11:44:21 Arnd Bergmann wrote:

There are of course multiple ways to do this. One way would be to
change the code to work on 32-bit nanoseconds instead of 32-bit
microseconds. This requires proving that the we cannot exceed
4.29 seconds of round-trip time in calc_rttavg().
Is that a valid assumption or not?

If not, we could replace do_gettimeofday() with ktime_get_ts64().
This will ensure we don't need a 64-bit division when converting
the ts64 to a 32-bit microsecond value, and combined with the
conversion is still no slower than do_gettimeofday(), and it
still avoids the double bookkeeping because it uses a monotonic
timebase that is robust against settimeofday.

Two other approaches that occurred to me later:

- introduce common ktime_get_ms(), ktime_get_us(), ktime_get_real_ms()
   and ktime_get_real_is() interfaces, to match the other interfaces
   we already provide. These could be done as efficiently or better
   than what aoe does manually today.

- change the timebase that is used for the computations in aoe to use
   scaled nanoseconds instead of microseconds. Using

   u32 time = ktime_get_ns()  10;

   would give you a similar range and precision as microseconds, but
   completely avoid integer division. You could also use a different
   shift value to either extend the range beyond 71 minutes, or the
   extend the precision to something below a microsecond. This would
   be the most efficient implementation, but also require significant
   changes to the driver.



That is an interesting idea.  People do care about aoe_deadsecs being
pretty accurate, so there would need to be a way to make that remain
accurate.  The driver will fail outstanding I/O to the target and mark it
as down after unsuccessfully retransmitting commands to the target
for a number of seconds equal to aoe_deadsecs.

As to the efficient ktime_get_us idea, that sounds appealing since you
mention that they would be efficient.

Thanks for the analysis.

--
  Ed
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: Use 64-bit timestamp in frame

2015-05-12 Thread Ed Cashin

On 05/12/2015 05:44 AM, Arnd Bergmann wrote:

On Monday 11 May 2015 21:00:25 Ed Cashin wrote:

...

In that case, there is still information about the timing embedded in
the AoE tag.  The send time in jiffies is a rough-grained record of the
send time, and it's extracted from the tag.  For these unexpected
responses, this timing information can improve performance significantly
without introducing extra overhead or risk.
That path is not changed at all by this patch, right? It also looks
like the jiffies information from there is only used to print an
error message.



That's right, thanks, the tag still has the old jiffies embedded
in it.

The information, though, is used to update the round-trip-time
average and the running estimate of the RTT variance.  For the
unexpected responses that information can help the driver to
maintain high performance when there are inconsistencies in the
network performance.  (calc_rttavg)

--
  Ed
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: Use 64-bit timestamp in frame

2015-05-12 Thread Ed Cashin
Thanks for the expanded motivation. I'll return to your ideas tonight, but I 
wanted to mention that it is possible for round trips to take well over five 
seconds, partly because the disk on the target might be resetting. 

If the rough-grained mechanism in the AoE tag still works, though, that's very 
helpful. 

Re: [PATCH] aoe: Use 64-bit timestamp in frame

2015-05-11 Thread Ed Cashin

First, thanks for the patch.  I do appreciate the attempt to simplify
this part of the driver, but I don't think that this patch is good to merge.

I'll make some comments inline below.

On 05/10/2015 10:35 PM, Tina Ruchandani wrote:

'struct frame' uses two variables to store the sent timestamp - 'struct
timeval' and jiffies. jiffies is used to avoid discrepancies caused by
updates to system time. 'struct timeval' uses 32-bit representation for
seconds which will overflow in year 2038.


The comment in the deleted lines below mentions the fact that the
overflow does not matter for calculating rough-grained deltas in time.
So there is no problem in 2038 or on systems with the clock set to 2038
accidentally.


This patch does the following:
- Replace the use of 'struct timeval' and jiffies with ktime_t, which
is a 64-bit timestamp and is year 2038 safe.
- ktime_t provides both long range (like jiffies) and high resolution
(like timeval). Using ktime_get (monotonic time) instead of wall-clock
time prevents any discprepancies caused by updates to system time.


But the patch only changes the struct frame data.  The aoe driver
only has the struct frame for an incoming AoE response when that
response is "expected".  If the response comes in a bit late, the frame
may have already been used for a new command.

You can see that in aoecmd_ata_rsp when getframe_deferred returns
NULL and tsince is called instead of tsince_hr.

In that case, there is still information about the timing embedded in
the AoE tag.  The send time in jiffies is a rough-grained record of the
send time, and it's extracted from the tag.  For these "unexpected"
responses, this timing information can improve performance significantly
without introducing extra overhead or risk.

I don't think the patch considers this aspect of the way the round trip
time is calculated, and I don't think the primary motivation is justified
(if that's 2038 safety, which we have already).

Simplifying it would be nice, but it would be difficult to thoroughly test
all of the performance implications.  There are still people using 32-bit
systems, for example.



Signed-off-by: Tina Ruchandani 
---
  drivers/block/aoe/aoe.h|  3 +--
  drivers/block/aoe/aoecmd.c | 36 +++-
  2 files changed, 8 insertions(+), 31 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index 9220f8e..4582b3c 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -112,8 +112,7 @@ enum frame_flags {
  struct frame {
struct list_head head;
u32 tag;
-   struct timeval sent;/* high-res time packet was sent */
-   u32 sent_jiffs; /* low-res jiffies-based sent time */
+   ktime_t sent;
ulong waited;
ulong waited_total;
struct aoetgt *t;   /* parent target I belong to */
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 422b7d8..7f78780 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -398,8 +398,7 @@ aoecmd_ata_rw(struct aoedev *d)
  
  	skb = skb_clone(f->skb, GFP_ATOMIC);

if (skb) {
-   do_gettimeofday(>sent);
-   f->sent_jiffs = (u32) jiffies;
+   f->sent = ktime_get();
__skb_queue_head_init();
__skb_queue_tail(, skb);
aoenet_xmit();
@@ -489,8 +488,7 @@ resend(struct aoedev *d, struct frame *f)
skb = skb_clone(skb, GFP_ATOMIC);
if (skb == NULL)
return;
-   do_gettimeofday(>sent);
-   f->sent_jiffs = (u32) jiffies;
+   f->sent = ktime_get();
__skb_queue_head_init();
__skb_queue_tail(, skb);
aoenet_xmit();
@@ -499,32 +497,15 @@ resend(struct aoedev *d, struct frame *f)
  static int
  tsince_hr(struct frame *f)
  {
-   struct timeval now;
+   ktime_t now;
int n;
  
-	do_gettimeofday();

-   n = now.tv_usec - f->sent.tv_usec;
-   n += (now.tv_sec - f->sent.tv_sec) * USEC_PER_SEC;
+   now = ktime_get();
+   n = ktime_to_us(ktime_sub(now, f->sent));
  
  	if (n < 0)

n = -n;
  
-	/* For relatively long periods, use jiffies to avoid

-* discrepancies caused by updates to the system time.
-*
-* On system with HZ of 1000, 32-bits is over 49 days
-* worth of jiffies, or over 71 minutes worth of usecs.
-*
-* Jiffies overflow is handled by subtraction of unsigned ints:
-* (gdb) print (unsigned) 2 - (unsigned) 0xfffe
-* $3 = 4
-* (gdb)
-*/
-   if (n > USEC_PER_SEC / 4) {
-   n = ((u32) jiffies) - f->sent_jiffs;
-   n *= USEC_PER_SEC / HZ;
-   }
-
return n;
  }
  
@@ -589,7 +570,6 @@ reassign_frame(struct frame *f)

nf->waited = 0;
nf->waited_total = f->waited_total;
nf->sent = f->sent;
-   nf->sent_jiffs = f->sent_jiffs;
f->skb = skb;
  
  	return 

Re: [PATCH] aoe: Use 64-bit timestamp in frame

2015-05-11 Thread Ed Cashin
I would like to see some performance measurements for this patch on a system 
with fast storage and multiple 10 GbE links. 

If not, at least a good analysis of the expected performance impact the patch 
will have on major architectures. 

Tonight I will think about whether the 2038 thing even matters or whether we 
just need a comment explaining why it's safe.

On May 11, 2015 11:38 AM, Arnd Bergmann  wrote:
>
> On Monday 11 May 2015 08:05:05 Tina Ruchandani wrote: 
> > 'struct frame' uses two variables to store the sent timestamp - 'struct 
> > timeval' and jiffies. jiffies is used to avoid discrepancies caused by 
> > updates to system time. 'struct timeval' uses 32-bit representation for 
> > seconds which will overflow in year 2038. 
> > This patch does the following: 
> > - Replace the use of 'struct timeval' and jiffies with ktime_t, which 
> > is a 64-bit timestamp and is year 2038 safe. 
> > - ktime_t provides both long range (like jiffies) and high resolution 
> > (like timeval). Using ktime_get (monotonic time) instead of wall-clock 
> > time prevents any discprepancies caused by updates to system time. 
> > 
> > Signed-off-by: Tina Ruchandani  
>
> Very nice! 
>
> > @@ -499,32 +497,15 @@ resend(struct aoedev *d, struct frame *f) 
> >  static int 
> >  tsince_hr(struct frame *f) 
> >  { 
> > - struct timeval now; 
> > + ktime_t now; 
> >  int n; 
> >  
> > - do_gettimeofday(); 
> > - n = now.tv_usec - f->sent.tv_usec; 
> > - n += (now.tv_sec - f->sent.tv_sec) * USEC_PER_SEC; 
> > + now = ktime_get(); 
> > + n = ktime_to_us(ktime_sub(now, f->sent)); 
> >  
>
> I would cut four extra lines by writing this as 
>
> return ktime_us_delta(ktime_get(), f->sent)); 
>
> but the effect is exactly the same. 
>
> With that change, please add 
>
> Reviewed-by: Arnd Bergmann  
>
> Arnd 


Re: [PATCH] aoe: Use 64-bit timestamp in frame

2015-05-11 Thread Ed Cashin

First, thanks for the patch.  I do appreciate the attempt to simplify
this part of the driver, but I don't think that this patch is good to merge.

I'll make some comments inline below.

On 05/10/2015 10:35 PM, Tina Ruchandani wrote:

'struct frame' uses two variables to store the sent timestamp - 'struct
timeval' and jiffies. jiffies is used to avoid discrepancies caused by
updates to system time. 'struct timeval' uses 32-bit representation for
seconds which will overflow in year 2038.


The comment in the deleted lines below mentions the fact that the
overflow does not matter for calculating rough-grained deltas in time.
So there is no problem in 2038 or on systems with the clock set to 2038
accidentally.


This patch does the following:
- Replace the use of 'struct timeval' and jiffies with ktime_t, which
is a 64-bit timestamp and is year 2038 safe.
- ktime_t provides both long range (like jiffies) and high resolution
(like timeval). Using ktime_get (monotonic time) instead of wall-clock
time prevents any discprepancies caused by updates to system time.


But the patch only changes the struct frame data.  The aoe driver
only has the struct frame for an incoming AoE response when that
response is expected.  If the response comes in a bit late, the frame
may have already been used for a new command.

You can see that in aoecmd_ata_rsp when getframe_deferred returns
NULL and tsince is called instead of tsince_hr.

In that case, there is still information about the timing embedded in
the AoE tag.  The send time in jiffies is a rough-grained record of the
send time, and it's extracted from the tag.  For these unexpected
responses, this timing information can improve performance significantly
without introducing extra overhead or risk.

I don't think the patch considers this aspect of the way the round trip
time is calculated, and I don't think the primary motivation is justified
(if that's 2038 safety, which we have already).

Simplifying it would be nice, but it would be difficult to thoroughly test
all of the performance implications.  There are still people using 32-bit
systems, for example.



Signed-off-by: Tina Ruchandani ruchandani.t...@gmail.com
---
  drivers/block/aoe/aoe.h|  3 +--
  drivers/block/aoe/aoecmd.c | 36 +++-
  2 files changed, 8 insertions(+), 31 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index 9220f8e..4582b3c 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -112,8 +112,7 @@ enum frame_flags {
  struct frame {
struct list_head head;
u32 tag;
-   struct timeval sent;/* high-res time packet was sent */
-   u32 sent_jiffs; /* low-res jiffies-based sent time */
+   ktime_t sent;
ulong waited;
ulong waited_total;
struct aoetgt *t;   /* parent target I belong to */
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 422b7d8..7f78780 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -398,8 +398,7 @@ aoecmd_ata_rw(struct aoedev *d)
  
  	skb = skb_clone(f-skb, GFP_ATOMIC);

if (skb) {
-   do_gettimeofday(f-sent);
-   f-sent_jiffs = (u32) jiffies;
+   f-sent = ktime_get();
__skb_queue_head_init(queue);
__skb_queue_tail(queue, skb);
aoenet_xmit(queue);
@@ -489,8 +488,7 @@ resend(struct aoedev *d, struct frame *f)
skb = skb_clone(skb, GFP_ATOMIC);
if (skb == NULL)
return;
-   do_gettimeofday(f-sent);
-   f-sent_jiffs = (u32) jiffies;
+   f-sent = ktime_get();
__skb_queue_head_init(queue);
__skb_queue_tail(queue, skb);
aoenet_xmit(queue);
@@ -499,32 +497,15 @@ resend(struct aoedev *d, struct frame *f)
  static int
  tsince_hr(struct frame *f)
  {
-   struct timeval now;
+   ktime_t now;
int n;
  
-	do_gettimeofday(now);

-   n = now.tv_usec - f-sent.tv_usec;
-   n += (now.tv_sec - f-sent.tv_sec) * USEC_PER_SEC;
+   now = ktime_get();
+   n = ktime_to_us(ktime_sub(now, f-sent));
  
  	if (n  0)

n = -n;
  
-	/* For relatively long periods, use jiffies to avoid

-* discrepancies caused by updates to the system time.
-*
-* On system with HZ of 1000, 32-bits is over 49 days
-* worth of jiffies, or over 71 minutes worth of usecs.
-*
-* Jiffies overflow is handled by subtraction of unsigned ints:
-* (gdb) print (unsigned) 2 - (unsigned) 0xfffe
-* $3 = 4
-* (gdb)
-*/
-   if (n  USEC_PER_SEC / 4) {
-   n = ((u32) jiffies) - f-sent_jiffs;
-   n *= USEC_PER_SEC / HZ;
-   }
-
return n;
  }
  
@@ -589,7 +570,6 @@ reassign_frame(struct frame *f)

nf-waited = 0;
nf-waited_total = f-waited_total;
nf-sent = f-sent;
-   nf-sent_jiffs = 

Re: [PATCH] aoe: Use 64-bit timestamp in frame

2015-05-11 Thread Ed Cashin
I would like to see some performance measurements for this patch on a system 
with fast storage and multiple 10 GbE links. 

If not, at least a good analysis of the expected performance impact the patch 
will have on major architectures. 

Tonight I will think about whether the 2038 thing even matters or whether we 
just need a comment explaining why it's safe.

On May 11, 2015 11:38 AM, Arnd Bergmann a...@arndb.de wrote:

 On Monday 11 May 2015 08:05:05 Tina Ruchandani wrote: 
  'struct frame' uses two variables to store the sent timestamp - 'struct 
  timeval' and jiffies. jiffies is used to avoid discrepancies caused by 
  updates to system time. 'struct timeval' uses 32-bit representation for 
  seconds which will overflow in year 2038. 
  This patch does the following: 
  - Replace the use of 'struct timeval' and jiffies with ktime_t, which 
  is a 64-bit timestamp and is year 2038 safe. 
  - ktime_t provides both long range (like jiffies) and high resolution 
  (like timeval). Using ktime_get (monotonic time) instead of wall-clock 
  time prevents any discprepancies caused by updates to system time. 
  
  Signed-off-by: Tina Ruchandani ruchandani.t...@gmail.com 

 Very nice! 

  @@ -499,32 +497,15 @@ resend(struct aoedev *d, struct frame *f) 
   static int 
   tsince_hr(struct frame *f) 
   { 
  - struct timeval now; 
  + ktime_t now; 
   int n; 
   
  - do_gettimeofday(now); 
  - n = now.tv_usec - f-sent.tv_usec; 
  - n += (now.tv_sec - f-sent.tv_sec) * USEC_PER_SEC; 
  + now = ktime_get(); 
  + n = ktime_to_us(ktime_sub(now, f-sent)); 
   

 I would cut four extra lines by writing this as 

 return ktime_us_delta(ktime_get(), f-sent)); 

 but the effect is exactly the same. 

 With that change, please add 

 Reviewed-by: Arnd Bergmann a...@arndb.de 

 Arnd 


[PATCH] aoe: Update aoe maintainer information

2015-03-14 Thread Ed Cashin
The coraid.com email address is defunct.  The old aoe support area
hosted at coraid.com is no longer up.  These changes update the email
and website to current ones.

Signed-off-by: Ed Cashin 
---

The docs need updating as well, but I want to fix the email before
tackling that.

 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5d1bddc..7f58068 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1679,8 +1679,8 @@ F:drivers/misc/eeprom/at24.c
 F: include/linux/platform_data/at24.h
 
 ATA OVER ETHERNET (AOE) DRIVER
-M: "Ed L. Cashin" 
-W: http://support.coraid.com/support/linux
+M: "Ed L. Cashin" 
+W: http://www.openaoe.org/
 S: Supported
 F: Documentation/aoe/
 F: drivers/block/aoe/
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] aoe: Update aoe maintainer information

2015-03-14 Thread Ed Cashin
The coraid.com email address is defunct.  The old aoe support area
hosted at coraid.com is no longer up.  These changes update the email
and website to current ones.

Signed-off-by: Ed Cashin ed.cas...@acm.org
---

The docs need updating as well, but I want to fix the email before
tackling that.

 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5d1bddc..7f58068 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1679,8 +1679,8 @@ F:drivers/misc/eeprom/at24.c
 F: include/linux/platform_data/at24.h
 
 ATA OVER ETHERNET (AOE) DRIVER
-M: Ed L. Cashin ecas...@coraid.com
-W: http://support.coraid.com/support/linux
+M: Ed L. Cashin ed.cas...@acm.org
+W: http://www.openaoe.org/
 S: Supported
 F: Documentation/aoe/
 F: drivers/block/aoe/
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] incorrect use of init_completion fixup

2014-12-23 Thread Ed Cashin
On Tue, Dec 23, 2014 at 12:44 PM, Nicholas Mc Guire  wrote:
>
> The second init_completion call should be a reinit_completion here.
>
> patch is against 3.18.0 linux-next
>
> Signed-off-by: Nicholas Mc Guire 
> ---
>  drivers/block/aoe/aoecmd.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
> index 422b7d8..2239513 100644
> --- a/drivers/block/aoe/aoecmd.c
> +++ b/drivers/block/aoe/aoecmd.c
> @@ -1331,7 +1331,7 @@ aoe_ktstart(struct ktstate *k)
> return -ENOMEM;
> k->task = task;
> wait_for_completion(>rendez); /* allow kthread to start */
> -   init_completion(>rendez);/* for waiting for exit later */
> +   reinit_completion(>rendez);  /* for waiting for exit later */
> return 0;
>  }

That looks good to me, thanks.

-- 
  Ed Cashin 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] incorrect use of init_completion fixup

2014-12-23 Thread Ed Cashin
On Tue, Dec 23, 2014 at 12:44 PM, Nicholas Mc Guire der.h...@hofr.at wrote:

 The second init_completion call should be a reinit_completion here.

 patch is against 3.18.0 linux-next

 Signed-off-by: Nicholas Mc Guire der.h...@hofr.at
 ---
  drivers/block/aoe/aoecmd.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
 index 422b7d8..2239513 100644
 --- a/drivers/block/aoe/aoecmd.c
 +++ b/drivers/block/aoe/aoecmd.c
 @@ -1331,7 +1331,7 @@ aoe_ktstart(struct ktstate *k)
 return -ENOMEM;
 k-task = task;
 wait_for_completion(k-rendez); /* allow kthread to start */
 -   init_completion(k-rendez);/* for waiting for exit later */
 +   reinit_completion(k-rendez);  /* for waiting for exit later */
 return 0;
  }

That looks good to me, thanks.

-- 
  Ed Cashin ecas...@noserose.net
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] aoe: remove do-nothing NAME="%k" term from example udev rules

2013-08-28 Thread Ed Cashin
When the example udev rules in the documentation are used without
modification, warnings like the one shown below appear in the system
logs:

  /var/log/messages:Aug 22 11:09:11 kung udevd[445]: NAME="%k" \
is superfluous and breaks kernel supplied names, please remove \
it from /etc/udev/rules.d/60-aoe.rules:26

Removing the term does not cause any problems with the creation
of the special character and block device nodes.

Signed-off-by: Ed Cashin 
---
 Documentation/aoe/udev.txt |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/aoe/udev.txt b/Documentation/aoe/udev.txt
index 8686e78..1f06daf 100644
--- a/Documentation/aoe/udev.txt
+++ b/Documentation/aoe/udev.txt
@@ -23,4 +23,4 @@ SUBSYSTEM=="aoe", KERNEL=="revalidate",   
NAME="etherd/%k", GROUP="disk", MODE="02
 SUBSYSTEM=="aoe", KERNEL=="flush", NAME="etherd/%k", GROUP="disk", 
MODE="0220"
 
 # aoe block devices 
-KERNEL=="etherd*",   NAME="%k", GROUP="disk"
+KERNEL=="etherd*",   GROUP="disk"
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/2] aoe: remove too-strict BUG from aoedev teardown and update udev example

2013-08-28 Thread Ed Cashin
This patch series applies to linux-next/akpm fetched 22 August, commit
a187db44dd9b24a9eed2141af962306a36935423.  Yesterday's linux-next/akpm
has no updates to the aoe files that could conflict, though.

Ed L. Cashin (2):
  aoe: do not BUG if memory pressure prevented debugfs file creation
  aoe: remove do-nothing NAME="%k" term from example udev rules

 Documentation/aoe/udev.txt |2 +-
 drivers/block/aoe/aoeblk.c |1 -
 2 files changed, 1 insertions(+), 2 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2] aoe: do not BUG if memory pressure prevented debugfs file creation

2013-08-28 Thread Ed Cashin
If the system has trouble allocating memory for the creation of the
aoe debugfs directory or of a file inside it, the debugfs member of
an aoedev can be NULL.

Do not treat a NULL debugfs pointer as a BUG on aoedev shutdown,
avoiding the user impact of an unecessary panic.

Signed-off-by: Ed Cashin 
---
 drivers/block/aoe/aoeblk.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index d63dcf0..dd73e1f 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -215,7 +215,6 @@ aoedisk_add_debugfs(struct aoedev *d)
 void
 aoedisk_rm_debugfs(struct aoedev *d)
 {
-   BUG_ON(d->debugfs == NULL);
debugfs_remove(d->debugfs);
d->debugfs = NULL;
 }
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2] aoe: do not BUG if memory pressure prevented debugfs file creation

2013-08-28 Thread Ed Cashin
If the system has trouble allocating memory for the creation of the
aoe debugfs directory or of a file inside it, the debugfs member of
an aoedev can be NULL.

Do not treat a NULL debugfs pointer as a BUG on aoedev shutdown,
avoiding the user impact of an unecessary panic.

Signed-off-by: Ed Cashin ecas...@coraid.com
---
 drivers/block/aoe/aoeblk.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index d63dcf0..dd73e1f 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -215,7 +215,6 @@ aoedisk_add_debugfs(struct aoedev *d)
 void
 aoedisk_rm_debugfs(struct aoedev *d)
 {
-   BUG_ON(d-debugfs == NULL);
debugfs_remove(d-debugfs);
d-debugfs = NULL;
 }
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/2] aoe: remove too-strict BUG from aoedev teardown and update udev example

2013-08-28 Thread Ed Cashin
This patch series applies to linux-next/akpm fetched 22 August, commit
a187db44dd9b24a9eed2141af962306a36935423.  Yesterday's linux-next/akpm
has no updates to the aoe files that could conflict, though.

Ed L. Cashin (2):
  aoe: do not BUG if memory pressure prevented debugfs file creation
  aoe: remove do-nothing NAME=%k term from example udev rules

 Documentation/aoe/udev.txt |2 +-
 drivers/block/aoe/aoeblk.c |1 -
 2 files changed, 1 insertions(+), 2 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] aoe: remove do-nothing NAME=%k term from example udev rules

2013-08-28 Thread Ed Cashin
When the example udev rules in the documentation are used without
modification, warnings like the one shown below appear in the system
logs:

  /var/log/messages:Aug 22 11:09:11 kung udevd[445]: NAME=%k \
is superfluous and breaks kernel supplied names, please remove \
it from /etc/udev/rules.d/60-aoe.rules:26

Removing the term does not cause any problems with the creation
of the special character and block device nodes.

Signed-off-by: Ed Cashin ecas...@coraid.com
---
 Documentation/aoe/udev.txt |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/aoe/udev.txt b/Documentation/aoe/udev.txt
index 8686e78..1f06daf 100644
--- a/Documentation/aoe/udev.txt
+++ b/Documentation/aoe/udev.txt
@@ -23,4 +23,4 @@ SUBSYSTEM==aoe, KERNEL==revalidate,   
NAME=etherd/%k, GROUP=disk, MODE=02
 SUBSYSTEM==aoe, KERNEL==flush, NAME=etherd/%k, GROUP=disk, 
MODE=0220
 
 # aoe block devices 
-KERNEL==etherd*,   NAME=%k, GROUP=disk
+KERNEL==etherd*,   GROUP=disk
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: suppress compiler warnings

2013-08-26 Thread Ed Cashin
On Aug 26, 2013, at 5:45 AM, Andy Shevchenko wrote:

> This patch fixes following compiler warnings:
> 
> drivers/block/aoe/aoecmd.c: In function ‘aoecmd_ata_rw’:
> drivers/block/aoe/aoecmd.c:383:17: warning: variable ‘t’ set but not used 
> [-Wunused-but-set-variable]
>  struct aoetgt *t;
> ^

Looks good, thanks.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: suppress compiler warnings

2013-08-26 Thread Ed Cashin
On Aug 26, 2013, at 5:45 AM, Andy Shevchenko wrote:

 This patch fixes following compiler warnings:
 
 drivers/block/aoe/aoecmd.c: In function ‘aoecmd_ata_rw’:
 drivers/block/aoe/aoecmd.c:383:17: warning: variable ‘t’ set but not used 
 [-Wunused-but-set-variable]
  struct aoetgt *t;
 ^

Looks good, thanks.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 04/22] block: Abstract out bvec iterator

2013-08-13 Thread Ed Cashin
On Aug 9, 2013, Ed Cashin wrote:
> On Aug 8, 2013, at 9:05 PM, Kent Overstreet wrote:
> ...
> > It's in the for-jens branch now.
> 
> 
> Just examining the patches, I like the way it cleans up the aoe code.  I
> had a question about a new BUG added by the for-jens branch the
> read-response handling path of the aoe driver.

The aoe driver in linux-bcache/for-jens commit 4c36c973a8f45 is
passing my tests.

Here is a patch against that branch illustrating my suggestion for
handling bad target responses gracefully.


commit 2c39f50b1ee02e2ac07fd072a883a91713da53cc
Author: Ed Cashin 
Date:   Tue Aug 13 10:50:28 2013 -0400

aoe: bad AoE responses fail I/O without BUG

Instead of having a BUG when the AoE target does something wrong,
just fail the I/O and log the problem with rate limiting.

diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index cacd48e..b9916a6 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -1096,7 +1096,6 @@ bvcpy(struct sk_buff *skb, struct bio *bio, struct 
bvec_iter iter, long cnt)
int soff = 0;
struct bio_vec bv;
 
-   BUG_ON(cnt > iter.bi_size);
iter.bi_size = cnt;
 
__bio_for_each_segment(bv, bio, iter, iter) {
@@ -1196,6 +1195,14 @@ noskb:   if (buf)
clear_bit(BIO_UPTODATE, >bio->bi_flags);
break;
}
+   if (n > f->iter.bi_size) {
+   pr_err_ratelimited("%s e%ld.%d.  bytes=%ld need=%u\n",
+   "aoe: too-large data size in read from",
+   (long) d->aoemajor, d->aoeminor,
+   n, f->iter.bi_size);
+   clear_bit(BIO_UPTODATE, >bio->bi_flags);
+   break;
+   }
bvcpy(skb, f->buf->bio, f->iter, n);
case ATA_CMD_PIO_WRITE:
case ATA_CMD_PIO_WRITE_EXT:
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 04/22] block: Abstract out bvec iterator

2013-08-13 Thread Ed Cashin
On Aug 9, 2013, Ed Cashin wrote:
 On Aug 8, 2013, at 9:05 PM, Kent Overstreet wrote:
 ...
  It's in the for-jens branch now.
 
 
 Just examining the patches, I like the way it cleans up the aoe code.  I
 had a question about a new BUG added by the for-jens branch the
 read-response handling path of the aoe driver.

The aoe driver in linux-bcache/for-jens commit 4c36c973a8f45 is
passing my tests.

Here is a patch against that branch illustrating my suggestion for
handling bad target responses gracefully.


commit 2c39f50b1ee02e2ac07fd072a883a91713da53cc
Author: Ed Cashin ecas...@coraid.com
Date:   Tue Aug 13 10:50:28 2013 -0400

aoe: bad AoE responses fail I/O without BUG

Instead of having a BUG when the AoE target does something wrong,
just fail the I/O and log the problem with rate limiting.

diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index cacd48e..b9916a6 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -1096,7 +1096,6 @@ bvcpy(struct sk_buff *skb, struct bio *bio, struct 
bvec_iter iter, long cnt)
int soff = 0;
struct bio_vec bv;
 
-   BUG_ON(cnt  iter.bi_size);
iter.bi_size = cnt;
 
__bio_for_each_segment(bv, bio, iter, iter) {
@@ -1196,6 +1195,14 @@ noskb:   if (buf)
clear_bit(BIO_UPTODATE, buf-bio-bi_flags);
break;
}
+   if (n  f-iter.bi_size) {
+   pr_err_ratelimited(%s e%ld.%d.  bytes=%ld need=%u\n,
+   aoe: too-large data size in read from,
+   (long) d-aoemajor, d-aoeminor,
+   n, f-iter.bi_size);
+   clear_bit(BIO_UPTODATE, buf-bio-bi_flags);
+   break;
+   }
bvcpy(skb, f-buf-bio, f-iter, n);
case ATA_CMD_PIO_WRITE:
case ATA_CMD_PIO_WRITE_EXT:
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 04/22] block: Abstract out bvec iterator

2013-08-09 Thread Ed Cashin
On Aug 8, 2013, at 9:05 PM, Kent Overstreet wrote:
...
> It's in the for-jens branch now.


Just examining the patches, I like the way it cleans up the aoe code.  I had a 
question about a new BUG added by the for-jens branch the read-response 
handling path of the aoe driver.

It looks like if a misbehaving AoE target has a bad (too high compared to the 
request) sector count, but sends a packet large enough for that sector count to 
seem legit in ktiocomplete, then the patched bvcpy will BUG.  An example would 
be the case where 1024 bytes was requested but a (bad but possible) AoE read 
response comes back with 4096 bytes in a jumbo frame.  Here's an excerpt from 
ktiocomplete:

 n = ahout->scnt << 9;
 switch (ahout->cmdstat) {
 case ATA_CMD_PIO_READ:
 case ATA_CMD_PIO_READ_EXT:
  if (skb->len < n) {
   pr_err("%s e%ld.%d.  skb->len=%d need=%ld\n",
"aoe: runt data size in read from",
(long) d->aoemajor, d->aoeminor,
  skb->len, n);
   clear_bit(BIO_UPTODATE, >bio->bi_flags);
   break;
  }
  bvcpy(skb, f->buf->bio, f->iter, n);

... and earlier in linux-bcache/for-jens aoecmd.c there's bvcpy ...

static void
bvcpy(struct sk_buff *skb, struct bio *bio, struct bvec_iter iter, long cnt)
{
 int soff = 0;
 struct bio_vec bv;

 BUG_ON(cnt > iter.bi_size);

It seems like it would be better to treat that case as another indication of a 
problem with the target that gets logged when the AoE response is ignored, just 
as happens for "runt data size".  That way people working on or trying out 
experimental AoE targets don't panic the initiator system.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 04/22] block: Abstract out bvec iterator

2013-08-09 Thread Ed Cashin
On Aug 8, 2013, at 9:05 PM, Kent Overstreet wrote:
...
 It's in the for-jens branch now.


Just examining the patches, I like the way it cleans up the aoe code.  I had a 
question about a new BUG added by the for-jens branch the read-response 
handling path of the aoe driver.

It looks like if a misbehaving AoE target has a bad (too high compared to the 
request) sector count, but sends a packet large enough for that sector count to 
seem legit in ktiocomplete, then the patched bvcpy will BUG.  An example would 
be the case where 1024 bytes was requested but a (bad but possible) AoE read 
response comes back with 4096 bytes in a jumbo frame.  Here's an excerpt from 
ktiocomplete:

 n = ahout-scnt  9;
 switch (ahout-cmdstat) {
 case ATA_CMD_PIO_READ:
 case ATA_CMD_PIO_READ_EXT:
  if (skb-len  n) {
   pr_err(%s e%ld.%d.  skb-len=%d need=%ld\n,
aoe: runt data size in read from,
(long) d-aoemajor, d-aoeminor,
  skb-len, n);
   clear_bit(BIO_UPTODATE, buf-bio-bi_flags);
   break;
  }
  bvcpy(skb, f-buf-bio, f-iter, n);

... and earlier in linux-bcache/for-jens aoecmd.c there's bvcpy ...

static void
bvcpy(struct sk_buff *skb, struct bio *bio, struct bvec_iter iter, long cnt)
{
 int soff = 0;
 struct bio_vec bv;

 BUG_ON(cnt  iter.bi_size);

It seems like it would be better to treat that case as another indication of a 
problem with the target that gets logged when the AoE response is ignored, just 
as happens for runt data size.  That way people working on or trying out 
experimental AoE targets don't panic the initiator system.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 04/22] block: Abstract out bvec iterator

2013-08-08 Thread Ed Cashin
On Aug 8, 2013, at 8:09 PM, Kent Overstreet wrote:

> On Wed, Aug 07, 2013 at 10:04:36PM -0400, Ed Cashin wrote:
>> On Aug 7, 2013, at 5:54 PM, Kent Overstreet wrote:
>> 
>>> Immutable biovecs are going to require an explicit iterator. To
>>> implement immutable bvecs, a later patch is going to add a bi_bvec_done
>>> member to this struct; for now, this patch effectively just renames
>>> things.
>> 
>> Hi, Kent Overstreet.  Thanks for Cc'ing me and for the promising work.
>> 
>> Were you able to do sanity tests with aoe this time around?  Last time, 
>> basic I/O was not working with the immutable biovec patches applied.
>> 
>> Here is my 28 June email about my experiences with 
>> git://evilpiepirate.org/~kent/linux-bcache.git at that time.  It also 
>> includes information about creating an easy software-only aoe test 
>> environment.
>> 
>>  http://thread.gmane.org/gmane.linux.kernel/1505222/focus=1517924
> 
> Hey, thanks for testing it - sorry, I think I remember seeing that email
> last time and got sidetracked before I got around to setting up some
> tests.

No worries.

> I think I've got it working now, it's running the same stress tests I
> use for bcache. Here's a fixed patch, I broke the aoe changes out into
> their own patch since they were more involved than most of the others:

I had added your git tree as a remote and checked out the for-jens branch.  
This patch conflicts with some of the changes in that patch, so I could use a 
pointer as to which base this new patch applies to.

If it's already in some branch in linux-bcache, I can just use it there, and 
that's even more convenient.

> (I am seeing a bug where it's getting stuck after running stress tests
> for awhile, but I can reproduce that without the aoe changes too...)

Just so I know what to look for, what does that behavior look like?  I/O stops 
going and procs get stuck in disk sleep, maybe?

Thanks.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-08 Thread Ed Cashin
On Aug 8, 2013, at 3:56 PM, Andrew Morton wrote:

> On Wed, 7 Aug 2013 20:50:09 -0400 Ed Cashin  wrote:
...
>> When the workaround was created, it was with the assumption that the 
>> zero-count pages are not always tail pages, and that seemed to be the case 
>> in 2007, but as I said, I don't have a mechanism for detecting that now, so 
>> I cannot say whether it really happens with today's kernel.
> 
> It sounds we should pull out all that code and retest.  It shouldn't be
> needed - if this results in some failure then I suspect core MM will
> need changes.

OK.  I'll look into that.  It sure would be nice to get rid of it.

> Why don't you have a "mechanism for detecting that"?  It's a matter of
> pointing AOE at some hugetlb pages?

No, I was testing with hugetlb pages already.  But there are many use case 
combinations, and I meant that aoe has no mechanism that has been in the aoe 
code all these years to catch specific (possibly rare) use cases that result in 
zero-count pages.  It would be something like this:

+static void
+check_page_counts(struct bio *bio)
+{
+   struct bio_vec *bv;
+   int n;
+   int i;
+
+   bio_for_each_segment(bv, bio, i) {
+   n = page_count(bv->bv_page);
+   WARN_ONCE(n <= 0,
+   "aoe: page %d in bio has non-positive count %d\n",
+   i, n);
+   }
+}

I'll add that when I try testing without the page count manipulation.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-08 Thread Ed Cashin
On Aug 8, 2013, at 3:56 PM, Andrew Morton wrote:

 On Wed, 7 Aug 2013 20:50:09 -0400 Ed Cashin ecas...@coraid.com wrote:
...
 When the workaround was created, it was with the assumption that the 
 zero-count pages are not always tail pages, and that seemed to be the case 
 in 2007, but as I said, I don't have a mechanism for detecting that now, so 
 I cannot say whether it really happens with today's kernel.
 
 It sounds we should pull out all that code and retest.  It shouldn't be
 needed - if this results in some failure then I suspect core MM will
 need changes.

OK.  I'll look into that.  It sure would be nice to get rid of it.

 Why don't you have a mechanism for detecting that?  It's a matter of
 pointing AOE at some hugetlb pages?

No, I was testing with hugetlb pages already.  But there are many use case 
combinations, and I meant that aoe has no mechanism that has been in the aoe 
code all these years to catch specific (possibly rare) use cases that result in 
zero-count pages.  It would be something like this:

+static void
+check_page_counts(struct bio *bio)
+{
+   struct bio_vec *bv;
+   int n;
+   int i;
+
+   bio_for_each_segment(bv, bio, i) {
+   n = page_count(bv-bv_page);
+   WARN_ONCE(n = 0,
+   aoe: page %d in bio has non-positive count %d\n,
+   i, n);
+   }
+}

I'll add that when I try testing without the page count manipulation.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 04/22] block: Abstract out bvec iterator

2013-08-08 Thread Ed Cashin
On Aug 8, 2013, at 8:09 PM, Kent Overstreet wrote:

 On Wed, Aug 07, 2013 at 10:04:36PM -0400, Ed Cashin wrote:
 On Aug 7, 2013, at 5:54 PM, Kent Overstreet wrote:
 
 Immutable biovecs are going to require an explicit iterator. To
 implement immutable bvecs, a later patch is going to add a bi_bvec_done
 member to this struct; for now, this patch effectively just renames
 things.
 
 Hi, Kent Overstreet.  Thanks for Cc'ing me and for the promising work.
 
 Were you able to do sanity tests with aoe this time around?  Last time, 
 basic I/O was not working with the immutable biovec patches applied.
 
 Here is my 28 June email about my experiences with 
 git://evilpiepirate.org/~kent/linux-bcache.git at that time.  It also 
 includes information about creating an easy software-only aoe test 
 environment.
 
  http://thread.gmane.org/gmane.linux.kernel/1505222/focus=1517924
 
 Hey, thanks for testing it - sorry, I think I remember seeing that email
 last time and got sidetracked before I got around to setting up some
 tests.

No worries.

 I think I've got it working now, it's running the same stress tests I
 use for bcache. Here's a fixed patch, I broke the aoe changes out into
 their own patch since they were more involved than most of the others:

I had added your git tree as a remote and checked out the for-jens branch.  
This patch conflicts with some of the changes in that patch, so I could use a 
pointer as to which base this new patch applies to.

If it's already in some branch in linux-bcache, I can just use it there, and 
that's even more convenient.

 (I am seeing a bug where it's getting stuck after running stress tests
 for awhile, but I can reproduce that without the aoe changes too...)

Just so I know what to look for, what does that behavior look like?  I/O stops 
going and procs get stuck in disk sleep, maybe?

Thanks.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-07 Thread Ed Cashin
On Aug 7, 2013, at 8:05 PM, Andrew Morton wrote:

> On Wed, 7 Aug 2013 19:54:45 -0400 Ed Cashin  wrote:
> 
>>> +bio_pageinc(struct bio *bio)
>>> +{
>>> +   struct bio_vec *bv;
>>> +   struct page *page;
>>> +   int i;
>>> +
>>> +   bio_for_each_segment(bv, bio, i) {
>>> +   page = bv->bv_page;
>>> +   /* Non-zero page count for non-head members of
>>> +* compound pages is no longer allowed by the kernel,
>>> +* but this has never been seen here.
>>> +*/
>>> +   if (unlikely(PageCompound(page)))
>>> +   if (compound_trans_head(page) != page) {
>>> +   pr_crit("page tail used for block I/O\n");
>>> +   BUG();
>>> +   }
>>> 
>>> But get_page/put_page against a tail page of a compound page should
>>> Just Work.  The core MM will hunt down the head page and manipulate its
>>> refcount.  
>>> 
>>> Perhaps the problem here is that AOE is diving under the covers and is
>>> using low-level page refcount alterations:
>>> 
>>> +   atomic_inc(>_count);
>>> 
>>> Why does AOE do this?  It would be better if it were to use the
>>> official published MM interfaces.  If those interfaces need
>>> alteration/augmentation then we can do that.
>> 
>> The get_page cannot be used when the _count is 0 (which is exactly when we 
>> need to increment it), because it will VM_BUG_ON.
>> 
>>  /*
>>   * Getting a normal page or the head of a compound page
>>   * requires to already have an elevated page->_count.
>>   */
>>  VM_BUG_ON(atomic_read(>_count) <= 0);
>> 
> 
> But we shouldn't get that far:
> 
> static inline void get_page(struct page *page)
> {
>   if (unlikely(PageTail(page)))
>   if (likely(__get_page_tail(page)))
>   return;
>   /*
>* Getting a normal page or the head of a compound page
>* requires to already have an elevated page->_count.
>*/
>   VM_BUG_ON(atomic_read(>_count) <= 0);
>   atomic_inc(>_count);
> }
> 
> This is a tail page, so we should be using __get_page_tail().


When the workaround was created, it was with the assumption that the zero-count 
pages are not always tail pages, and that seemed to be the case in 2007, but as 
I said, I don't have a mechanism for detecting that now, so I cannot say 
whether it really happens with today's kernel.

If it is never correct for normal pages or compound page heads to have a zero 
count while they are associated with a bio, then yes, I think get_page is a 
great solution.  The VM_BUG_ON in get_page would identify any parts of the 
kernel that are supplying bios that have pages without references.

Just a note in response to "we shouldn't get that far":  I believe the 
VM_BUG_ON line in get_page does get executed for tail pages when the 
__get_page_tail returns false because the compound page head has a zero count:

  get_page -> __get_page_tail -> get_page_unless_zero returns false, so
__get_page_tail returns "got", which is still false, so
  get_page executes the VM_BUG_ON, where the _count will be zero for the 
tail page.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-07 Thread Ed Cashin
On Aug 7, 2013, at 7:35 PM, Andrew Morton wrote:

> On Wed, 7 Aug 2013 19:27:40 -0400 Ed Cashin  wrote:
> 
>> On Aug 7, 2013, at 5:26 PM, Andrew Morton wrote:
>> 
>>> On Wed, 7 Aug 2013 17:21:32 -0400 Ed Cashin  wrote:
>> ...
>>>> Sorry.  I tried to cover that but maybe should have separated it more 
>>>> clearly from the running text.
>>>> 
>>>> The patch changes the current behavior encountered with direct I/O and aoe:
>>>> 
>>>> aoe can BUG when direct I/O used
>>>> 
>>>> ... to a new behavior:
>>>> 
>>>> aoe does not BUG when direct I/O is used
>>>> 
>>>> And the user will never know that instead aoe is manipulating the _count 
>>>> of the head page, even though that's what's happening with this patch 
>>>> applied.
>>>> 
>>>> (The user-visible effects that motivated the older patch that added the 
>>>> BUG to aoe when page tails are used for block I/O was a VM_BUG_ON that 
>>>> occurred when the aoe driver manipulated the _count of a compound page 
>>>> tail.)
>>> 
>>> OK, I added "Fix a BUG which can trigger when direct-IO is used with
>>> AOE".  Which kernel versions are affected?
>> 
>> 
>> That BUG was added to the aoe driver in a commit that first appears in v3.7. 
>>  It persists until this patch adds the code that handles the situation that 
>> the BUG was intended to detect.
>> 
>> [ecashin@marino linux]$ git describe --contains --match 'v*' 
>> 69cf2d85de773d998798e47e3335b85e5645d157
>> v3.7-rc1~110^2~22
>> [ecashin@marino linux]$ 
>> 
> 
> ah, is it this?
> 
> +bio_pageinc(struct bio *bio)
> +{
> +   struct bio_vec *bv;
> +   struct page *page;
> +   int i;
> +
> +   bio_for_each_segment(bv, bio, i) {
> +   page = bv->bv_page;
> +   /* Non-zero page count for non-head members of
> +* compound pages is no longer allowed by the kernel,
> +* but this has never been seen here.
> +*/
> +   if (unlikely(PageCompound(page)))
> +   if (compound_trans_head(page) != page) {
> +   pr_crit("page tail used for block I/O\n");
> +   BUG();
> +   }
> 
> But get_page/put_page against a tail page of a compound page should
> Just Work.  The core MM will hunt down the head page and manipulate its
> refcount.  
> 
> Perhaps the problem here is that AOE is diving under the covers and is
> using low-level page refcount alterations:
> 
> +   atomic_inc(>_count);
> 
> Why does AOE do this?  It would be better if it were to use the
> official published MM interfaces.  If those interfaces need
> alteration/augmentation then we can do that.

The get_page cannot be used when the _count is 0 (which is exactly when we need 
to increment it), because it will VM_BUG_ON.

/*
 * Getting a normal page or the head of a compound page
 * requires to already have an elevated page->_count.
 */
VM_BUG_ON(atomic_read(>_count) <= 0);

(And seeing the code reminds me that I forgot to include atomic_read in my last 
email's proposal.  LKML + dinner isn't a good mix. ;)

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-07 Thread Ed Cashin

On Aug 7, 2013, at 7:41 PM, Ed Cashin wrote:

> It sounds like it's wrong to give block pages with a zero count, so why not 
> just have aoe BUG_ON(compound_trans_head(bv->page->_count) == 0) until we're 
> sure nobody does that anymore?

Ugh.  I goofed the parens and such.  I meant,

  BUG_ON(compound_trans_head(bv->bv_page)->_count == 0)

... will catch the cases we think should not be occurring.

> If that idea makes sense to you, I will submit a new patch to follow the one 
> under discussion.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-07 Thread Ed Cashin
On Aug 7, 2013, at 5:27 PM, Andrew Morton wrote:

> On Wed, 7 Aug 2013 14:18:35 -0700 Andrew Morton  
> wrote:
> 
>> On Wed, 7 Aug 2013 17:12:36 -0400 Ed Cashin  wrote:
>> 
>>> 
>>> On Aug 7, 2013, at 4:58 PM, Andrew Morton wrote:
>>> 
>>>> On Thu, 1 Aug 2013 21:29:59 -0400 Ed Cashin  wrote:
>>>> 
>>>>> As discussed previously,
>>>> 
>>>> I think I missed that.
>>>> 
>>>>> the fact that some users of the block
>>>>> layer provide bios that point to pages with a zero _count means
>>>>> that it is not OK for the network layer to do a put_page on the
>>>>> skb frags during an skb_linearize, so the aoe driver gets a
>>>>> reference to pages in bios and puts the reference before ending
>>>>> the bio.  And because it cannot use get_page on a page with a
>>>>> zero _count, it manipulates the value directly.
>>>> 
>>>> Eh?  What code is putting count==0 pages into bios?  That sounds very
>>>> weird and broken.
>>> 
>>> I thought so in 2007 but couldn't solicit a clear "this is wrong" consensus 
>>> from the discussion.
>>> 
>>>  http://article.gmane.org/gmane.linux.kernel/499197
>>>  https://lkml.org/lkml/2007/1/19/56
>>>  https://lkml.org/lkml/2006/12/18/230
>>> 
>>> We were seeing zero-count pages in bios from XFS, but Christoph Hellwig 
>>> pointed out that kmalloced pages can also come from ext3 when it's doing 
>>> log recovery, and they'll have zero page counts.
>> 
>> aiiee!
>> 
>> It is (I suppose) reasonable to put kmalloced memory into a BIO's page
>> array.  And it is perfectly reasonable for a user of that bio to do a
>> get_page/put_page against that page.  It is utterly unreasonable for
>> the damn page to get freed as a result!
>> 
>> I'd claim that slab is broken.  The page is in use, so it should have an
>> elevated refcount, full stop.
>> 
> 
> err, no.  slab.c uses alloc_pages(), so the underlying page indeed has
> a proper refcount.  I'm still not understanding how this situation comes
> about.

It sounds like it's wrong to give block pages with a zero count, so why not 
just have aoe BUG_ON(compound_trans_head(bv->page->_count) == 0) until we're 
sure nobody does that anymore?

If that idea makes sense to you, I will submit a new patch to follow the one 
under discussion.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-07 Thread Ed Cashin
On Aug 7, 2013, at 5:26 PM, Andrew Morton wrote:

> On Wed, 7 Aug 2013 17:21:32 -0400 Ed Cashin  wrote:
...
>> Sorry.  I tried to cover that but maybe should have separated it more 
>> clearly from the running text.
>> 
>> The patch changes the current behavior encountered with direct I/O and aoe:
>> 
>>  aoe can BUG when direct I/O used
>> 
>> ... to a new behavior:
>> 
>>  aoe does not BUG when direct I/O is used
>> 
>> And the user will never know that instead aoe is manipulating the _count of 
>> the head page, even though that's what's happening with this patch applied.
>> 
>> (The user-visible effects that motivated the older patch that added the BUG 
>> to aoe when page tails are used for block I/O was a VM_BUG_ON that occurred 
>> when the aoe driver manipulated the _count of a compound page tail.)
> 
> OK, I added "Fix a BUG which can trigger when direct-IO is used with
> AOE".  Which kernel versions are affected?


That BUG was added to the aoe driver in a commit that first appears in v3.7.  
It persists until this patch adds the code that handles the situation that the 
BUG was intended to detect.

[ecashin@marino linux]$ git describe --contains --match 'v*' 
69cf2d85de773d998798e47e3335b85e5645d157
v3.7-rc1~110^2~22
[ecashin@marino linux]$ 

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-07 Thread Ed Cashin
On Aug 7, 2013, at 5:00 PM, Andrew Morton wrote:

> On Thu, 1 Aug 2013 21:29:59 -0400 Ed Cashin  wrote:
> 
>> As discussed previously, the fact that some users of the block
>> layer provide bios that point to pages with a zero _count means
>> that it is not OK for the network layer to do a put_page on the
>> skb frags during an skb_linearize, so the aoe driver gets a
>> reference to pages in bios and puts the reference before ending
>> the bio.  And because it cannot use get_page on a page with a
>> zero _count, it manipulates the value directly.
>> 
>> It is not OK to increment the _count of a compound page tail,
>> though, since the VM layer will VM_BUG_ON a non-zero _count.
>> Block users that do direct I/O can result in the aoe driver
>> seeing compound page tails in bios.  In that case, the same logic
>> works as long as the head of the compound page is used instead of
>> the tails.  This patch handles compound pages and does not BUG.
>> 
>> It relies on the block layer user leaving the relationship
>> between the page tail and its head alone for the duration between
>> the submission of the bio and its completion, whether successful
>> or not.
> 
> What are the end-user-visible effects of the problem which is being
> fixed?  Please always include this info when fixing bugs so that others
> can work out what kernel version(s) need the patch.


Sorry.  I tried to cover that but maybe should have separated it more clearly 
from the running text.

The patch changes the current behavior encountered with direct I/O and aoe:

  aoe can BUG when direct I/O used

... to a new behavior:

  aoe does not BUG when direct I/O is used

And the user will never know that instead aoe is manipulating the _count of the 
head page, even though that's what's happening with this patch applied.

(The user-visible effects that motivated the older patch that added the BUG to 
aoe when page tails are used for block I/O was a VM_BUG_ON that occurred when 
the aoe driver manipulated the _count of a compound page tail.)

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-07 Thread Ed Cashin

On Aug 7, 2013, at 4:58 PM, Andrew Morton wrote:

> On Thu, 1 Aug 2013 21:29:59 -0400 Ed Cashin  wrote:
> 
>> As discussed previously,
> 
> I think I missed that.
> 
>> the fact that some users of the block
>> layer provide bios that point to pages with a zero _count means
>> that it is not OK for the network layer to do a put_page on the
>> skb frags during an skb_linearize, so the aoe driver gets a
>> reference to pages in bios and puts the reference before ending
>> the bio.  And because it cannot use get_page on a page with a
>> zero _count, it manipulates the value directly.
> 
> Eh?  What code is putting count==0 pages into bios?  That sounds very
> weird and broken.

I thought so in 2007 but couldn't solicit a clear "this is wrong" consensus 
from the discussion.

  http://article.gmane.org/gmane.linux.kernel/499197
  https://lkml.org/lkml/2007/1/19/56
  https://lkml.org/lkml/2006/12/18/230

We were seeing zero-count pages in bios from XFS, but Christoph Hellwig pointed 
out that kmalloced pages can also come from ext3 when it's doing log recovery, 
and they'll have zero page counts.

I don't have a counter in the driver to track how often zero-count pages show 
up in bios, so I don't know how common it is these days.  I would have been 
motivated to look if I had gotten the impression that there was any interest in 
identifying and eliminating cases where zero-count pages were being used in 
bios.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-07 Thread Ed Cashin

On Aug 7, 2013, at 4:58 PM, Andrew Morton wrote:

 On Thu, 1 Aug 2013 21:29:59 -0400 Ed Cashin ecas...@coraid.com wrote:
 
 As discussed previously,
 
 I think I missed that.
 
 the fact that some users of the block
 layer provide bios that point to pages with a zero _count means
 that it is not OK for the network layer to do a put_page on the
 skb frags during an skb_linearize, so the aoe driver gets a
 reference to pages in bios and puts the reference before ending
 the bio.  And because it cannot use get_page on a page with a
 zero _count, it manipulates the value directly.
 
 Eh?  What code is putting count==0 pages into bios?  That sounds very
 weird and broken.

I thought so in 2007 but couldn't solicit a clear this is wrong consensus 
from the discussion.

  http://article.gmane.org/gmane.linux.kernel/499197
  https://lkml.org/lkml/2007/1/19/56
  https://lkml.org/lkml/2006/12/18/230

We were seeing zero-count pages in bios from XFS, but Christoph Hellwig pointed 
out that kmalloced pages can also come from ext3 when it's doing log recovery, 
and they'll have zero page counts.

I don't have a counter in the driver to track how often zero-count pages show 
up in bios, so I don't know how common it is these days.  I would have been 
motivated to look if I had gotten the impression that there was any interest in 
identifying and eliminating cases where zero-count pages were being used in 
bios.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-07 Thread Ed Cashin
On Aug 7, 2013, at 5:00 PM, Andrew Morton wrote:

 On Thu, 1 Aug 2013 21:29:59 -0400 Ed Cashin ecas...@coraid.com wrote:
 
 As discussed previously, the fact that some users of the block
 layer provide bios that point to pages with a zero _count means
 that it is not OK for the network layer to do a put_page on the
 skb frags during an skb_linearize, so the aoe driver gets a
 reference to pages in bios and puts the reference before ending
 the bio.  And because it cannot use get_page on a page with a
 zero _count, it manipulates the value directly.
 
 It is not OK to increment the _count of a compound page tail,
 though, since the VM layer will VM_BUG_ON a non-zero _count.
 Block users that do direct I/O can result in the aoe driver
 seeing compound page tails in bios.  In that case, the same logic
 works as long as the head of the compound page is used instead of
 the tails.  This patch handles compound pages and does not BUG.
 
 It relies on the block layer user leaving the relationship
 between the page tail and its head alone for the duration between
 the submission of the bio and its completion, whether successful
 or not.
 
 What are the end-user-visible effects of the problem which is being
 fixed?  Please always include this info when fixing bugs so that others
 can work out what kernel version(s) need the patch.


Sorry.  I tried to cover that but maybe should have separated it more clearly 
from the running text.

The patch changes the current behavior encountered with direct I/O and aoe:

  aoe can BUG when direct I/O used

... to a new behavior:

  aoe does not BUG when direct I/O is used

And the user will never know that instead aoe is manipulating the _count of the 
head page, even though that's what's happening with this patch applied.

(The user-visible effects that motivated the older patch that added the BUG to 
aoe when page tails are used for block I/O was a VM_BUG_ON that occurred when 
the aoe driver manipulated the _count of a compound page tail.)

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-07 Thread Ed Cashin
On Aug 7, 2013, at 5:26 PM, Andrew Morton wrote:

 On Wed, 7 Aug 2013 17:21:32 -0400 Ed Cashin ecas...@coraid.com wrote:
...
 Sorry.  I tried to cover that but maybe should have separated it more 
 clearly from the running text.
 
 The patch changes the current behavior encountered with direct I/O and aoe:
 
  aoe can BUG when direct I/O used
 
 ... to a new behavior:
 
  aoe does not BUG when direct I/O is used
 
 And the user will never know that instead aoe is manipulating the _count of 
 the head page, even though that's what's happening with this patch applied.
 
 (The user-visible effects that motivated the older patch that added the BUG 
 to aoe when page tails are used for block I/O was a VM_BUG_ON that occurred 
 when the aoe driver manipulated the _count of a compound page tail.)
 
 OK, I added Fix a BUG which can trigger when direct-IO is used with
 AOE.  Which kernel versions are affected?


That BUG was added to the aoe driver in a commit that first appears in v3.7.  
It persists until this patch adds the code that handles the situation that the 
BUG was intended to detect.

[ecashin@marino linux]$ git describe --contains --match 'v*' 
69cf2d85de773d998798e47e3335b85e5645d157
v3.7-rc1~110^2~22
[ecashin@marino linux]$ 

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-07 Thread Ed Cashin
On Aug 7, 2013, at 5:27 PM, Andrew Morton wrote:

 On Wed, 7 Aug 2013 14:18:35 -0700 Andrew Morton a...@linux-foundation.org 
 wrote:
 
 On Wed, 7 Aug 2013 17:12:36 -0400 Ed Cashin ecas...@coraid.com wrote:
 
 
 On Aug 7, 2013, at 4:58 PM, Andrew Morton wrote:
 
 On Thu, 1 Aug 2013 21:29:59 -0400 Ed Cashin ecas...@coraid.com wrote:
 
 As discussed previously,
 
 I think I missed that.
 
 the fact that some users of the block
 layer provide bios that point to pages with a zero _count means
 that it is not OK for the network layer to do a put_page on the
 skb frags during an skb_linearize, so the aoe driver gets a
 reference to pages in bios and puts the reference before ending
 the bio.  And because it cannot use get_page on a page with a
 zero _count, it manipulates the value directly.
 
 Eh?  What code is putting count==0 pages into bios?  That sounds very
 weird and broken.
 
 I thought so in 2007 but couldn't solicit a clear this is wrong consensus 
 from the discussion.
 
  http://article.gmane.org/gmane.linux.kernel/499197
  https://lkml.org/lkml/2007/1/19/56
  https://lkml.org/lkml/2006/12/18/230
 
 We were seeing zero-count pages in bios from XFS, but Christoph Hellwig 
 pointed out that kmalloced pages can also come from ext3 when it's doing 
 log recovery, and they'll have zero page counts.
 
 aiiee!
 
 It is (I suppose) reasonable to put kmalloced memory into a BIO's page
 array.  And it is perfectly reasonable for a user of that bio to do a
 get_page/put_page against that page.  It is utterly unreasonable for
 the damn page to get freed as a result!
 
 I'd claim that slab is broken.  The page is in use, so it should have an
 elevated refcount, full stop.
 
 
 err, no.  slab.c uses alloc_pages(), so the underlying page indeed has
 a proper refcount.  I'm still not understanding how this situation comes
 about.

It sounds like it's wrong to give block pages with a zero count, so why not 
just have aoe BUG_ON(compound_trans_head(bv-page-_count) == 0) until we're 
sure nobody does that anymore?

If that idea makes sense to you, I will submit a new patch to follow the one 
under discussion.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-07 Thread Ed Cashin

On Aug 7, 2013, at 7:41 PM, Ed Cashin wrote:

 It sounds like it's wrong to give block pages with a zero count, so why not 
 just have aoe BUG_ON(compound_trans_head(bv-page-_count) == 0) until we're 
 sure nobody does that anymore?

Ugh.  I goofed the parens and such.  I meant,

  BUG_ON(compound_trans_head(bv-bv_page)-_count == 0)

... will catch the cases we think should not be occurring.

 If that idea makes sense to you, I will submit a new patch to follow the one 
 under discussion.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-07 Thread Ed Cashin
On Aug 7, 2013, at 7:35 PM, Andrew Morton wrote:

 On Wed, 7 Aug 2013 19:27:40 -0400 Ed Cashin ecas...@coraid.com wrote:
 
 On Aug 7, 2013, at 5:26 PM, Andrew Morton wrote:
 
 On Wed, 7 Aug 2013 17:21:32 -0400 Ed Cashin ecas...@coraid.com wrote:
 ...
 Sorry.  I tried to cover that but maybe should have separated it more 
 clearly from the running text.
 
 The patch changes the current behavior encountered with direct I/O and aoe:
 
 aoe can BUG when direct I/O used
 
 ... to a new behavior:
 
 aoe does not BUG when direct I/O is used
 
 And the user will never know that instead aoe is manipulating the _count 
 of the head page, even though that's what's happening with this patch 
 applied.
 
 (The user-visible effects that motivated the older patch that added the 
 BUG to aoe when page tails are used for block I/O was a VM_BUG_ON that 
 occurred when the aoe driver manipulated the _count of a compound page 
 tail.)
 
 OK, I added Fix a BUG which can trigger when direct-IO is used with
 AOE.  Which kernel versions are affected?
 
 
 That BUG was added to the aoe driver in a commit that first appears in v3.7. 
  It persists until this patch adds the code that handles the situation that 
 the BUG was intended to detect.
 
 [ecashin@marino linux]$ git describe --contains --match 'v*' 
 69cf2d85de773d998798e47e3335b85e5645d157
 v3.7-rc1~110^2~22
 [ecashin@marino linux]$ 
 
 
 ah, is it this?
 
 +bio_pageinc(struct bio *bio)
 +{
 +   struct bio_vec *bv;
 +   struct page *page;
 +   int i;
 +
 +   bio_for_each_segment(bv, bio, i) {
 +   page = bv-bv_page;
 +   /* Non-zero page count for non-head members of
 +* compound pages is no longer allowed by the kernel,
 +* but this has never been seen here.
 +*/
 +   if (unlikely(PageCompound(page)))
 +   if (compound_trans_head(page) != page) {
 +   pr_crit(page tail used for block I/O\n);
 +   BUG();
 +   }
 
 But get_page/put_page against a tail page of a compound page should
 Just Work.  The core MM will hunt down the head page and manipulate its
 refcount.  
 
 Perhaps the problem here is that AOE is diving under the covers and is
 using low-level page refcount alterations:
 
 +   atomic_inc(page-_count);
 
 Why does AOE do this?  It would be better if it were to use the
 official published MM interfaces.  If those interfaces need
 alteration/augmentation then we can do that.

The get_page cannot be used when the _count is 0 (which is exactly when we need 
to increment it), because it will VM_BUG_ON.

/*
 * Getting a normal page or the head of a compound page
 * requires to already have an elevated page-_count.
 */
VM_BUG_ON(atomic_read(page-_count) = 0);

(And seeing the code reminds me that I forgot to include atomic_read in my last 
email's proposal.  LKML + dinner isn't a good mix. ;)

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe: adjust ref of head for compound page tails

2013-08-07 Thread Ed Cashin
On Aug 7, 2013, at 8:05 PM, Andrew Morton wrote:

 On Wed, 7 Aug 2013 19:54:45 -0400 Ed Cashin ecas...@coraid.com wrote:
 
 +bio_pageinc(struct bio *bio)
 +{
 +   struct bio_vec *bv;
 +   struct page *page;
 +   int i;
 +
 +   bio_for_each_segment(bv, bio, i) {
 +   page = bv-bv_page;
 +   /* Non-zero page count for non-head members of
 +* compound pages is no longer allowed by the kernel,
 +* but this has never been seen here.
 +*/
 +   if (unlikely(PageCompound(page)))
 +   if (compound_trans_head(page) != page) {
 +   pr_crit(page tail used for block I/O\n);
 +   BUG();
 +   }
 
 But get_page/put_page against a tail page of a compound page should
 Just Work.  The core MM will hunt down the head page and manipulate its
 refcount.  
 
 Perhaps the problem here is that AOE is diving under the covers and is
 using low-level page refcount alterations:
 
 +   atomic_inc(page-_count);
 
 Why does AOE do this?  It would be better if it were to use the
 official published MM interfaces.  If those interfaces need
 alteration/augmentation then we can do that.
 
 The get_page cannot be used when the _count is 0 (which is exactly when we 
 need to increment it), because it will VM_BUG_ON.
 
  /*
   * Getting a normal page or the head of a compound page
   * requires to already have an elevated page-_count.
   */
  VM_BUG_ON(atomic_read(page-_count) = 0);
 
 
 But we shouldn't get that far:
 
 static inline void get_page(struct page *page)
 {
   if (unlikely(PageTail(page)))
   if (likely(__get_page_tail(page)))
   return;
   /*
* Getting a normal page or the head of a compound page
* requires to already have an elevated page-_count.
*/
   VM_BUG_ON(atomic_read(page-_count) = 0);
   atomic_inc(page-_count);
 }
 
 This is a tail page, so we should be using __get_page_tail().


When the workaround was created, it was with the assumption that the zero-count 
pages are not always tail pages, and that seemed to be the case in 2007, but as 
I said, I don't have a mechanism for detecting that now, so I cannot say 
whether it really happens with today's kernel.

If it is never correct for normal pages or compound page heads to have a zero 
count while they are associated with a bio, then yes, I think get_page is a 
great solution.  The VM_BUG_ON in get_page would identify any parts of the 
kernel that are supplying bios that have pages without references.

Just a note in response to we shouldn't get that far:  I believe the 
VM_BUG_ON line in get_page does get executed for tail pages when the 
__get_page_tail returns false because the compound page head has a zero count:

  get_page - __get_page_tail - get_page_unless_zero returns false, so
__get_page_tail returns got, which is still false, so
  get_page executes the VM_BUG_ON, where the _count will be zero for the 
tail page.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] aoe: adjust ref of head for compound page tails

2013-08-01 Thread Ed Cashin
As discussed previously, the fact that some users of the block
layer provide bios that point to pages with a zero _count means
that it is not OK for the network layer to do a put_page on the
skb frags during an skb_linearize, so the aoe driver gets a
reference to pages in bios and puts the reference before ending
the bio.  And because it cannot use get_page on a page with a
zero _count, it manipulates the value directly.

It is not OK to increment the _count of a compound page tail,
though, since the VM layer will VM_BUG_ON a non-zero _count.
Block users that do direct I/O can result in the aoe driver
seeing compound page tails in bios.  In that case, the same logic
works as long as the head of the compound page is used instead of
the tails.  This patch handles compound pages and does not BUG.

It relies on the block layer user leaving the relationship
between the page tail and its head alone for the duration between
the submission of the bio and its completion, whether successful
or not.

Signed-off-by: Ed Cashin 
---
 drivers/block/aoe/aoecmd.c |   17 +++--
 1 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 99cb944..4d45dba 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -906,16 +906,10 @@ bio_pageinc(struct bio *bio)
int i;
 
bio_for_each_segment(bv, bio, i) {
-   page = bv->bv_page;
/* Non-zero page count for non-head members of
-* compound pages is no longer allowed by the kernel,
-* but this has never been seen here.
+* compound pages is no longer allowed by the kernel.
 */
-   if (unlikely(PageCompound(page)))
-   if (compound_trans_head(page) != page) {
-   pr_crit("page tail used for block I/O\n");
-   BUG();
-   }
+   page = compound_trans_head(bv->bv_page);
atomic_inc(>_count);
}
 }
@@ -924,10 +918,13 @@ static void
 bio_pagedec(struct bio *bio)
 {
struct bio_vec *bv;
+   struct page *page;
int i;
 
-   bio_for_each_segment(bv, bio, i)
-   atomic_dec(>bv_page->_count);
+   bio_for_each_segment(bv, bio, i) {
+   page = compound_trans_head(bv->bv_page);
+   atomic_dec(>_count);
+   }
 }
 
 static void
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] aoe: respect compound page reference tracking expectations

2013-08-01 Thread Ed Cashin
This patch series applies to today's linux-next/akpm, commit
651e600ff3ecc4ac06747313a1c49d89a9ce988e.

Ed L. Cashin (1):
  aoe: adjust ref of head for compound page tails

 drivers/block/aoe/aoecmd.c |   17 +++--
 1 files changed, 7 insertions(+), 10 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] aoe: use min() to simplify the code

2013-08-01 Thread Ed Cashin
Thanks.

I agree that we could do that.  I'm not sure it increases readability to put 
this much stuff on one line, though.  What is the motivation?

On Aug 1, 2013, at 8:28 AM, Andy Shevchenko wrote:

> min() incorporates condition in it. In our case we could do assignment and
> make a choice at once.
> 
> Signed-off-by: Andy Shevchenko 
> ---
> drivers/block/aoe/aoedev.c | 6 ++
> 1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
> index db35ef6..92fadfa 100644
> --- a/drivers/block/aoe/aoedev.c
> +++ b/drivers/block/aoe/aoedev.c
> @@ -13,6 +13,7 @@
> #include 
> #include 
> #include 
> +#include 
> #include "aoe.h"
> 
> static void dummy_timer(ulong);
> @@ -248,10 +249,7 @@ user_req(char *s, size_t slen, struct aoedev *d)
>   if (!d->gd)
>   return 0;
>   p = kbasename(d->gd->disk_name);
> - lim = sizeof(d->gd->disk_name);
> - lim -= p - d->gd->disk_name;
> - if (slen < lim)
> - lim = slen;
> + lim = min(sizeof(d->gd->disk_name) - (p - d->gd->disk_name), slen);
> 
>   return !strncmp(s, p, lim);
> }
> -- 
> 1.8.3.2
> 

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] aoe: remove custom implementation of kbasename()

2013-08-01 Thread Ed Cashin
Looks OK.  Thanks.

On Aug 1, 2013, at 8:28 AM, Andy Shevchenko wrote:

> In the kernel we have a nice helper that may be used here. This patch
> substitutes the custom implementation by the native function call.
> 
> Signed-off-by: Andy Shevchenko 
> ---
> drivers/block/aoe/aoedev.c | 9 +++--
> 1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
> index 784c92e..db35ef6 100644
> --- a/drivers/block/aoe/aoedev.c
> +++ b/drivers/block/aoe/aoedev.c
> @@ -12,6 +12,7 @@
> #include 
> #include 
> #include 
> +#include 
> #include "aoe.h"
> 
> static void dummy_timer(ulong);
> @@ -241,16 +242,12 @@ aoedev_downdev(struct aoedev *d)
> static int
> user_req(char *s, size_t slen, struct aoedev *d)
> {
> - char *p;
> + const char *p;
>   size_t lim;
> 
>   if (!d->gd)
>   return 0;
> - p = strrchr(d->gd->disk_name, '/');
> - if (!p)
> - p = d->gd->disk_name;
> - else
> - p += 1;
> + p = kbasename(d->gd->disk_name);
>   lim = sizeof(d->gd->disk_name);
>   lim -= p - d->gd->disk_name;
>   if (slen < lim)
> -- 
> 1.8.3.2
> 

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] aoe: remove custom implementation of kbasename()

2013-08-01 Thread Ed Cashin
Looks OK.  Thanks.

On Aug 1, 2013, at 8:28 AM, Andy Shevchenko wrote:

 In the kernel we have a nice helper that may be used here. This patch
 substitutes the custom implementation by the native function call.
 
 Signed-off-by: Andy Shevchenko andriy.shevche...@linux.intel.com
 ---
 drivers/block/aoe/aoedev.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)
 
 diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
 index 784c92e..db35ef6 100644
 --- a/drivers/block/aoe/aoedev.c
 +++ b/drivers/block/aoe/aoedev.c
 @@ -12,6 +12,7 @@
 #include linux/bitmap.h
 #include linux/kdev_t.h
 #include linux/moduleparam.h
 +#include linux/string.h
 #include aoe.h
 
 static void dummy_timer(ulong);
 @@ -241,16 +242,12 @@ aoedev_downdev(struct aoedev *d)
 static int
 user_req(char *s, size_t slen, struct aoedev *d)
 {
 - char *p;
 + const char *p;
   size_t lim;
 
   if (!d-gd)
   return 0;
 - p = strrchr(d-gd-disk_name, '/');
 - if (!p)
 - p = d-gd-disk_name;
 - else
 - p += 1;
 + p = kbasename(d-gd-disk_name);
   lim = sizeof(d-gd-disk_name);
   lim -= p - d-gd-disk_name;
   if (slen  lim)
 -- 
 1.8.3.2
 

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] aoe: use min() to simplify the code

2013-08-01 Thread Ed Cashin
Thanks.

I agree that we could do that.  I'm not sure it increases readability to put 
this much stuff on one line, though.  What is the motivation?

On Aug 1, 2013, at 8:28 AM, Andy Shevchenko wrote:

 min() incorporates condition in it. In our case we could do assignment and
 make a choice at once.
 
 Signed-off-by: Andy Shevchenko andriy.shevche...@linux.intel.com
 ---
 drivers/block/aoe/aoedev.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
 index db35ef6..92fadfa 100644
 --- a/drivers/block/aoe/aoedev.c
 +++ b/drivers/block/aoe/aoedev.c
 @@ -13,6 +13,7 @@
 #include linux/kdev_t.h
 #include linux/moduleparam.h
 #include linux/string.h
 +#include linux/kernel.h
 #include aoe.h
 
 static void dummy_timer(ulong);
 @@ -248,10 +249,7 @@ user_req(char *s, size_t slen, struct aoedev *d)
   if (!d-gd)
   return 0;
   p = kbasename(d-gd-disk_name);
 - lim = sizeof(d-gd-disk_name);
 - lim -= p - d-gd-disk_name;
 - if (slen  lim)
 - lim = slen;
 + lim = min(sizeof(d-gd-disk_name) - (p - d-gd-disk_name), slen);
 
   return !strncmp(s, p, lim);
 }
 -- 
 1.8.3.2
 

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] aoe: adjust ref of head for compound page tails

2013-08-01 Thread Ed Cashin
As discussed previously, the fact that some users of the block
layer provide bios that point to pages with a zero _count means
that it is not OK for the network layer to do a put_page on the
skb frags during an skb_linearize, so the aoe driver gets a
reference to pages in bios and puts the reference before ending
the bio.  And because it cannot use get_page on a page with a
zero _count, it manipulates the value directly.

It is not OK to increment the _count of a compound page tail,
though, since the VM layer will VM_BUG_ON a non-zero _count.
Block users that do direct I/O can result in the aoe driver
seeing compound page tails in bios.  In that case, the same logic
works as long as the head of the compound page is used instead of
the tails.  This patch handles compound pages and does not BUG.

It relies on the block layer user leaving the relationship
between the page tail and its head alone for the duration between
the submission of the bio and its completion, whether successful
or not.

Signed-off-by: Ed Cashin ecas...@coraid.com
---
 drivers/block/aoe/aoecmd.c |   17 +++--
 1 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 99cb944..4d45dba 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -906,16 +906,10 @@ bio_pageinc(struct bio *bio)
int i;
 
bio_for_each_segment(bv, bio, i) {
-   page = bv-bv_page;
/* Non-zero page count for non-head members of
-* compound pages is no longer allowed by the kernel,
-* but this has never been seen here.
+* compound pages is no longer allowed by the kernel.
 */
-   if (unlikely(PageCompound(page)))
-   if (compound_trans_head(page) != page) {
-   pr_crit(page tail used for block I/O\n);
-   BUG();
-   }
+   page = compound_trans_head(bv-bv_page);
atomic_inc(page-_count);
}
 }
@@ -924,10 +918,13 @@ static void
 bio_pagedec(struct bio *bio)
 {
struct bio_vec *bv;
+   struct page *page;
int i;
 
-   bio_for_each_segment(bv, bio, i)
-   atomic_dec(bv-bv_page-_count);
+   bio_for_each_segment(bv, bio, i) {
+   page = compound_trans_head(bv-bv_page);
+   atomic_dec(page-_count);
+   }
 }
 
 static void
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] aoe: respect compound page reference tracking expectations

2013-08-01 Thread Ed Cashin
This patch series applies to today's linux-next/akpm, commit
651e600ff3ecc4ac06747313a1c49d89a9ce988e.

Ed L. Cashin (1):
  aoe: adjust ref of head for compound page tails

 drivers/block/aoe/aoecmd.c |   17 +++--
 1 files changed, 7 insertions(+), 10 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 6/6] aoe: update internal version number to 85

2013-07-23 Thread Ed Cashin
Signed-off-by: Ed Cashin 
---
 drivers/block/aoe/aoe.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index b1f24c5..14a9d19 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -1,5 +1,5 @@
 /* Copyright (c) 2013 Coraid, Inc.  See COPYING for GPL terms. */
-#define VERSION "83"
+#define VERSION "85"
 #define AOE_MAJOR 152
 #define DEVICE_NAME "aoe"
 
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 5/6] aoe: update copyright date

2013-07-23 Thread Ed Cashin
Signed-off-by: Ed Cashin 
---
 drivers/block/aoe/aoeblk.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index b58cbeb..d63dcf0 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -1,4 +1,4 @@
-/* Copyright (c) 2012 Coraid, Inc.  See COPYING for GPL terms. */
+/* Copyright (c) 2013 Coraid, Inc.  See COPYING for GPL terms. */
 /*
  * aoeblk.c
  * block device routines
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 4/6] aoe: fill in per-AoE-target information for debugfs file

2013-07-23 Thread Ed Cashin
This information is presented in a compact format that has
evolved for easy routine scanning by expert humans, mostly
developers and support technicians helping to troubleshoot
or test AoE-based systems.

Signed-off-by: Ed Cashin 
---
 drivers/block/aoe/aoeblk.c |   33 -
 1 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 0511d38..b58cbeb 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -113,11 +113,42 @@ static ssize_t aoedisk_show_payload(struct device *dev,
 static int aoedisk_debugfs_show(struct seq_file *s, void *ignored)
 {
struct aoedev *d;
+   struct aoetgt **t, **te;
+   struct aoeif *ifp, *ife;
unsigned long flags;
+   char c;
 
d = s->private;
+   seq_printf(s, "rttavg: %d rttdev: %d\n",
+   d->rttavg >> RTTSCALE,
+   d->rttdev >> RTTDSCALE);
+   seq_printf(s, "nskbpool: %d\n", skb_queue_len(>skbpool));
+   seq_printf(s, "kicked: %ld\n", d->kicked);
+   seq_printf(s, "maxbcnt: %ld\n", d->maxbcnt);
+   seq_printf(s, "ref: %ld\n", d->ref);
+
spin_lock_irqsave(>lock, flags);
-   seq_printf(s, "%s\n", d->gd->disk_name); /* place holder */
+   t = d->targets;
+   te = t + d->ntargets;
+   for (; t < te && *t; t++) {
+   c = '\t';
+   seq_printf(s, "falloc: %ld\n", (*t)->falloc);
+   seq_printf(s, "ffree: %p\n",
+   list_empty(&(*t)->ffree) ? NULL : (*t)->ffree.next);
+   seq_printf(s, "%pm:%d:%d:%d\n", (*t)->addr, (*t)->nout,
+   (*t)->maxout, (*t)->nframes);
+   seq_printf(s, "\tssthresh:%d\n", (*t)->ssthresh);
+   seq_printf(s, "\ttaint:%d\n", (*t)->taint);
+   seq_printf(s, "\tr:%d\n", (*t)->rpkts);
+   seq_printf(s, "\tw:%d\n", (*t)->wpkts);
+   ifp = (*t)->ifs;
+   ife = ifp + ARRAY_SIZE((*t)->ifs);
+   for (; ifp->nd && ifp < ife; ifp++) {
+   seq_printf(s, "%c%s", c, ifp->nd->name);
+   c = ',';
+   }
+   seq_puts(s, "\n");
+   }
spin_unlock_irqrestore(>lock, flags);
 
return 0;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/6] aoe: provide file operations for debugfs files

2013-07-23 Thread Ed Cashin
The place holder in the file contents is filled out in the
following patch.

Signed-off-by: Ed Cashin 
---
 drivers/block/aoe/aoeblk.c |   25 -
 1 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index d76c5cb..0511d38 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -110,6 +110,24 @@ static ssize_t aoedisk_show_payload(struct device *dev,
return snprintf(page, PAGE_SIZE, "%lu\n", d->maxbcnt);
 }
 
+static int aoedisk_debugfs_show(struct seq_file *s, void *ignored)
+{
+   struct aoedev *d;
+   unsigned long flags;
+
+   d = s->private;
+   spin_lock_irqsave(>lock, flags);
+   seq_printf(s, "%s\n", d->gd->disk_name); /* place holder */
+   spin_unlock_irqrestore(>lock, flags);
+
+   return 0;
+}
+
+static int aoe_debugfs_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, aoedisk_debugfs_show, inode->i_private);
+}
+
 static DEVICE_ATTR(state, S_IRUGO, aoedisk_show_state, NULL);
 static DEVICE_ATTR(mac, S_IRUGO, aoedisk_show_mac, NULL);
 static DEVICE_ATTR(netif, S_IRUGO, aoedisk_show_netif, NULL);
@@ -132,7 +150,12 @@ static const struct attribute_group attr_group = {
.attrs = aoe_attrs,
 };
 
-static const struct file_operations aoe_debugfs_fops;
+static const struct file_operations aoe_debugfs_fops = {
+   .open = aoe_debugfs_open,
+   .read = seq_read,
+   .llseek = seq_lseek,
+   .release = single_release,
+};
 
 static void
 aoedisk_add_debugfs(struct aoedev *d)
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/6] aoe: add AoE-target files to debugfs

2013-07-23 Thread Ed Cashin
Signed-off-by: Ed Cashin 
---
 drivers/block/aoe/aoe.h|2 ++
 drivers/block/aoe/aoeblk.c |   35 +++
 drivers/block/aoe/aoedev.c |1 +
 3 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index 025c41d..b1f24c5 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -169,6 +169,7 @@ struct aoedev {
ulong ref;
struct work_struct work;/* disk create work struct */
struct gendisk *gd;
+   struct dentry *debugfs;
struct request_queue *blkq;
struct hd_geometry geo;
sector_t ssize;
@@ -206,6 +207,7 @@ struct ktstate {
 int aoeblk_init(void);
 void aoeblk_exit(void);
 void aoeblk_gdalloc(void *);
+void aoedisk_rm_debugfs(struct aoedev *d);
 void aoedisk_rm_sysfs(struct aoedev *d);
 
 int aoechr_init(void);
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index cb508b7..d76c5cb 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -132,6 +132,40 @@ static const struct attribute_group attr_group = {
.attrs = aoe_attrs,
 };
 
+static const struct file_operations aoe_debugfs_fops;
+
+static void
+aoedisk_add_debugfs(struct aoedev *d)
+{
+   struct dentry *entry;
+   char *p;
+
+   if (aoe_debugfs_dir == NULL)
+   return;
+   p = strchr(d->gd->disk_name, '/');
+   if (p == NULL)
+   p = d->gd->disk_name;
+   else
+   p++;
+   BUG_ON(*p == '\0');
+   entry = debugfs_create_file(p, 0444, aoe_debugfs_dir, d,
+   _debugfs_fops);
+   if (IS_ERR_OR_NULL(entry)) {
+   pr_info("aoe: cannot create debugfs file for %s\n",
+   d->gd->disk_name);
+   return;
+   }
+   BUG_ON(d->debugfs);
+   d->debugfs = entry;
+}
+void
+aoedisk_rm_debugfs(struct aoedev *d)
+{
+   BUG_ON(d->debugfs == NULL);
+   debugfs_remove(d->debugfs);
+   d->debugfs = NULL;
+}
+
 static int
 aoedisk_add_sysfs(struct aoedev *d)
 {
@@ -332,6 +366,7 @@ aoeblk_gdalloc(void *vp)
 
add_disk(gd);
aoedisk_add_sysfs(d);
+   aoedisk_add_debugfs(d);
 
spin_lock_irqsave(>lock, flags);
WARN_ON(!(d->flags & DEVFL_GD_NOW));
diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index 784c92e..c904767 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -278,6 +278,7 @@ freedev(struct aoedev *d)
 
del_timer_sync(>timer);
if (d->gd) {
+   aoedisk_rm_debugfs(d);
aoedisk_rm_sysfs(d);
del_gendisk(d->gd);
put_disk(d->gd);
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/6] aoe: create and destroy debugfs directory for aoe

2013-07-23 Thread Ed Cashin
Signed-off-by: Ed Cashin 
---
 drivers/block/aoe/aoeblk.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 916d9ed..cb508b7 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -17,11 +17,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "aoe.h"
 
 static DEFINE_MUTEX(aoeblk_mutex);
 static struct kmem_cache *buf_pool_cache;
+static struct dentry *aoe_debugfs_dir;
 
 /* GPFS needs a larger value than the default. */
 static int aoe_maxsectors;
@@ -351,6 +353,8 @@ err:
 void
 aoeblk_exit(void)
 {
+   debugfs_remove_recursive(aoe_debugfs_dir);
+   aoe_debugfs_dir = NULL;
kmem_cache_destroy(buf_pool_cache);
 }
 
@@ -362,7 +366,11 @@ aoeblk_init(void)
   0, 0, NULL);
if (buf_pool_cache == NULL)
return -ENOMEM;
-
+   aoe_debugfs_dir = debugfs_create_dir("aoe", NULL);
+   if (IS_ERR_OR_NULL(aoe_debugfs_dir)) {
+   pr_info("aoe: cannot create debugfs directory\n");
+   aoe_debugfs_dir = NULL;
+   }
return 0;
 }
 
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/6] aoe: add debugfs-based export of debug counters

2013-07-23 Thread Ed Cashin
This patch series applies to today's linux-next/akpm, commit
0c2f52fa9a7cb139617078568c48026c49927617.

It adds the debugging information that the coraid.com-distributed aoe
driver exports via sysfs, but instead of sysfs, it uses debugfs.

With these patches applied, even without AoE targets on the network,
KEDR reports new possible memory leaks, but these are from callers
outside the aoe driver that have used aoe_devnode to get the name of
the character devices through the aoe_class->devnode callback, and I
believe they're responsible for freeing that memory.

Ed L. Cashin (6):
  aoe: create and destroy debugfs directory for aoe
  aoe: add AoE-target files to debugfs
  aoe: provide file operations for debugfs files
  aoe: fill in per-AoE-target information for debugfs file
  aoe: update copyright date
  aoe: update internal version number to 85

 drivers/block/aoe/aoe.h|4 +-
 drivers/block/aoe/aoeblk.c |  101 +++-
 drivers/block/aoe/aoedev.c |1 +
 3 files changed, 103 insertions(+), 3 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/6] aoe: add debugfs-based export of debug counters

2013-07-23 Thread Ed Cashin
This patch series applies to today's linux-next/akpm, commit
0c2f52fa9a7cb139617078568c48026c49927617.

It adds the debugging information that the coraid.com-distributed aoe
driver exports via sysfs, but instead of sysfs, it uses debugfs.

With these patches applied, even without AoE targets on the network,
KEDR reports new possible memory leaks, but these are from callers
outside the aoe driver that have used aoe_devnode to get the name of
the character devices through the aoe_class-devnode callback, and I
believe they're responsible for freeing that memory.

Ed L. Cashin (6):
  aoe: create and destroy debugfs directory for aoe
  aoe: add AoE-target files to debugfs
  aoe: provide file operations for debugfs files
  aoe: fill in per-AoE-target information for debugfs file
  aoe: update copyright date
  aoe: update internal version number to 85

 drivers/block/aoe/aoe.h|4 +-
 drivers/block/aoe/aoeblk.c |  101 +++-
 drivers/block/aoe/aoedev.c |1 +
 3 files changed, 103 insertions(+), 3 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/6] aoe: create and destroy debugfs directory for aoe

2013-07-23 Thread Ed Cashin
Signed-off-by: Ed Cashin ecas...@coraid.com
---
 drivers/block/aoe/aoeblk.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 916d9ed..cb508b7 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -17,11 +17,13 @@
 #include linux/mutex.h
 #include linux/export.h
 #include linux/moduleparam.h
+#include linux/debugfs.h
 #include scsi/sg.h
 #include aoe.h
 
 static DEFINE_MUTEX(aoeblk_mutex);
 static struct kmem_cache *buf_pool_cache;
+static struct dentry *aoe_debugfs_dir;
 
 /* GPFS needs a larger value than the default. */
 static int aoe_maxsectors;
@@ -351,6 +353,8 @@ err:
 void
 aoeblk_exit(void)
 {
+   debugfs_remove_recursive(aoe_debugfs_dir);
+   aoe_debugfs_dir = NULL;
kmem_cache_destroy(buf_pool_cache);
 }
 
@@ -362,7 +366,11 @@ aoeblk_init(void)
   0, 0, NULL);
if (buf_pool_cache == NULL)
return -ENOMEM;
-
+   aoe_debugfs_dir = debugfs_create_dir(aoe, NULL);
+   if (IS_ERR_OR_NULL(aoe_debugfs_dir)) {
+   pr_info(aoe: cannot create debugfs directory\n);
+   aoe_debugfs_dir = NULL;
+   }
return 0;
 }
 
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/6] aoe: add AoE-target files to debugfs

2013-07-23 Thread Ed Cashin
Signed-off-by: Ed Cashin ecas...@coraid.com
---
 drivers/block/aoe/aoe.h|2 ++
 drivers/block/aoe/aoeblk.c |   35 +++
 drivers/block/aoe/aoedev.c |1 +
 3 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index 025c41d..b1f24c5 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -169,6 +169,7 @@ struct aoedev {
ulong ref;
struct work_struct work;/* disk create work struct */
struct gendisk *gd;
+   struct dentry *debugfs;
struct request_queue *blkq;
struct hd_geometry geo;
sector_t ssize;
@@ -206,6 +207,7 @@ struct ktstate {
 int aoeblk_init(void);
 void aoeblk_exit(void);
 void aoeblk_gdalloc(void *);
+void aoedisk_rm_debugfs(struct aoedev *d);
 void aoedisk_rm_sysfs(struct aoedev *d);
 
 int aoechr_init(void);
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index cb508b7..d76c5cb 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -132,6 +132,40 @@ static const struct attribute_group attr_group = {
.attrs = aoe_attrs,
 };
 
+static const struct file_operations aoe_debugfs_fops;
+
+static void
+aoedisk_add_debugfs(struct aoedev *d)
+{
+   struct dentry *entry;
+   char *p;
+
+   if (aoe_debugfs_dir == NULL)
+   return;
+   p = strchr(d-gd-disk_name, '/');
+   if (p == NULL)
+   p = d-gd-disk_name;
+   else
+   p++;
+   BUG_ON(*p == '\0');
+   entry = debugfs_create_file(p, 0444, aoe_debugfs_dir, d,
+   aoe_debugfs_fops);
+   if (IS_ERR_OR_NULL(entry)) {
+   pr_info(aoe: cannot create debugfs file for %s\n,
+   d-gd-disk_name);
+   return;
+   }
+   BUG_ON(d-debugfs);
+   d-debugfs = entry;
+}
+void
+aoedisk_rm_debugfs(struct aoedev *d)
+{
+   BUG_ON(d-debugfs == NULL);
+   debugfs_remove(d-debugfs);
+   d-debugfs = NULL;
+}
+
 static int
 aoedisk_add_sysfs(struct aoedev *d)
 {
@@ -332,6 +366,7 @@ aoeblk_gdalloc(void *vp)
 
add_disk(gd);
aoedisk_add_sysfs(d);
+   aoedisk_add_debugfs(d);
 
spin_lock_irqsave(d-lock, flags);
WARN_ON(!(d-flags  DEVFL_GD_NOW));
diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index 784c92e..c904767 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -278,6 +278,7 @@ freedev(struct aoedev *d)
 
del_timer_sync(d-timer);
if (d-gd) {
+   aoedisk_rm_debugfs(d);
aoedisk_rm_sysfs(d);
del_gendisk(d-gd);
put_disk(d-gd);
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/6] aoe: provide file operations for debugfs files

2013-07-23 Thread Ed Cashin
The place holder in the file contents is filled out in the
following patch.

Signed-off-by: Ed Cashin ecas...@coraid.com
---
 drivers/block/aoe/aoeblk.c |   25 -
 1 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index d76c5cb..0511d38 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -110,6 +110,24 @@ static ssize_t aoedisk_show_payload(struct device *dev,
return snprintf(page, PAGE_SIZE, %lu\n, d-maxbcnt);
 }
 
+static int aoedisk_debugfs_show(struct seq_file *s, void *ignored)
+{
+   struct aoedev *d;
+   unsigned long flags;
+
+   d = s-private;
+   spin_lock_irqsave(d-lock, flags);
+   seq_printf(s, %s\n, d-gd-disk_name); /* place holder */
+   spin_unlock_irqrestore(d-lock, flags);
+
+   return 0;
+}
+
+static int aoe_debugfs_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, aoedisk_debugfs_show, inode-i_private);
+}
+
 static DEVICE_ATTR(state, S_IRUGO, aoedisk_show_state, NULL);
 static DEVICE_ATTR(mac, S_IRUGO, aoedisk_show_mac, NULL);
 static DEVICE_ATTR(netif, S_IRUGO, aoedisk_show_netif, NULL);
@@ -132,7 +150,12 @@ static const struct attribute_group attr_group = {
.attrs = aoe_attrs,
 };
 
-static const struct file_operations aoe_debugfs_fops;
+static const struct file_operations aoe_debugfs_fops = {
+   .open = aoe_debugfs_open,
+   .read = seq_read,
+   .llseek = seq_lseek,
+   .release = single_release,
+};
 
 static void
 aoedisk_add_debugfs(struct aoedev *d)
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 4/6] aoe: fill in per-AoE-target information for debugfs file

2013-07-23 Thread Ed Cashin
This information is presented in a compact format that has
evolved for easy routine scanning by expert humans, mostly
developers and support technicians helping to troubleshoot
or test AoE-based systems.

Signed-off-by: Ed Cashin ecas...@coraid.com
---
 drivers/block/aoe/aoeblk.c |   33 -
 1 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 0511d38..b58cbeb 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -113,11 +113,42 @@ static ssize_t aoedisk_show_payload(struct device *dev,
 static int aoedisk_debugfs_show(struct seq_file *s, void *ignored)
 {
struct aoedev *d;
+   struct aoetgt **t, **te;
+   struct aoeif *ifp, *ife;
unsigned long flags;
+   char c;
 
d = s-private;
+   seq_printf(s, rttavg: %d rttdev: %d\n,
+   d-rttavg  RTTSCALE,
+   d-rttdev  RTTDSCALE);
+   seq_printf(s, nskbpool: %d\n, skb_queue_len(d-skbpool));
+   seq_printf(s, kicked: %ld\n, d-kicked);
+   seq_printf(s, maxbcnt: %ld\n, d-maxbcnt);
+   seq_printf(s, ref: %ld\n, d-ref);
+
spin_lock_irqsave(d-lock, flags);
-   seq_printf(s, %s\n, d-gd-disk_name); /* place holder */
+   t = d-targets;
+   te = t + d-ntargets;
+   for (; t  te  *t; t++) {
+   c = '\t';
+   seq_printf(s, falloc: %ld\n, (*t)-falloc);
+   seq_printf(s, ffree: %p\n,
+   list_empty((*t)-ffree) ? NULL : (*t)-ffree.next);
+   seq_printf(s, %pm:%d:%d:%d\n, (*t)-addr, (*t)-nout,
+   (*t)-maxout, (*t)-nframes);
+   seq_printf(s, \tssthresh:%d\n, (*t)-ssthresh);
+   seq_printf(s, \ttaint:%d\n, (*t)-taint);
+   seq_printf(s, \tr:%d\n, (*t)-rpkts);
+   seq_printf(s, \tw:%d\n, (*t)-wpkts);
+   ifp = (*t)-ifs;
+   ife = ifp + ARRAY_SIZE((*t)-ifs);
+   for (; ifp-nd  ifp  ife; ifp++) {
+   seq_printf(s, %c%s, c, ifp-nd-name);
+   c = ',';
+   }
+   seq_puts(s, \n);
+   }
spin_unlock_irqrestore(d-lock, flags);
 
return 0;
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 5/6] aoe: update copyright date

2013-07-23 Thread Ed Cashin
Signed-off-by: Ed Cashin ecas...@coraid.com
---
 drivers/block/aoe/aoeblk.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index b58cbeb..d63dcf0 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -1,4 +1,4 @@
-/* Copyright (c) 2012 Coraid, Inc.  See COPYING for GPL terms. */
+/* Copyright (c) 2013 Coraid, Inc.  See COPYING for GPL terms. */
 /*
  * aoeblk.c
  * block device routines
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 6/6] aoe: update internal version number to 85

2013-07-23 Thread Ed Cashin
Signed-off-by: Ed Cashin ecas...@coraid.com
---
 drivers/block/aoe/aoe.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index b1f24c5..14a9d19 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -1,5 +1,5 @@
 /* Copyright (c) 2013 Coraid, Inc.  See COPYING for GPL terms. */
-#define VERSION 83
+#define VERSION 85
 #define AOE_MAJOR 152
 #define DEVICE_NAME aoe
 
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   4   >