Re: [RFC 3/4] nbd: Create helper functions for ioctls

2015-01-28 Thread Paul Clements
Markus,

Comments below...

On Tue, Jan 13, 2015 at 8:44 AM, Markus Pargmann <m...@pengutronix.de> wrote:
> This patch prepares the nbd code for the nbd-root device patch. It
> moves the ioctl code into separate functions so they can be called
> directly. The patch creates nbd_set_total_size(), nbd_set_blksize(),
> nbd_set_sock() and nbd_set_timeout().
>
> Signed-off-by: Markus Pargmann <m...@pengutronix.de>
> ---
>  drivers/block/nbd.c | 79 
> ++---
>  1 file changed, 57 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 0ff6ebbec252..11f7644be111 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -596,6 +596,55 @@ static void do_nbd_request(struct request_queue *q)
> }
>  }
>
> +/*
> + * Call with tx_lock held
> + */
> +static void nbd_set_total_size(struct block_device *bdev,
> +  struct nbd_device *nbd, size_t size)
> +{
> +   nbd->bytesize = size;
> +   bdev->bd_inode->i_size = size;
> +   set_blocksize(bdev, nbd->blksize);
> +   set_capacity(nbd->disk, size >> 9);
> +}
> +
> +/*
> + * Call with tx_lock held
> + */
> +static void nbd_set_blksize(struct block_device *bdev, struct nbd_device 
> *nbd,
> +   size_t blksize)
> +{
> +   u64 new_bytesize;
> +
> +   nbd->blksize = blksize;
> +   bdev->bd_inode->i_size = nbd->bytesize;
> +   set_blocksize(bdev, nbd->blksize);
> +
> +   new_bytesize = nbd->bytesize & ~(nbd->blksize-1);
> +   if (new_bytesize != nbd->bytesize)
> +   nbd_set_total_size(bdev, nbd, new_bytesize);

could just do set_capacity here to save a few cycles, but not a big
deal either way...
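
For illustration, a minimal sketch of that suggestion (a fragment of the
refactored nbd_set_blksize() above, not the posted patch):

	new_bytesize = nbd->bytesize & ~(nbd->blksize - 1);
	if (new_bytesize != nbd->bytesize) {
		/* i_size and the block size were already updated above, so
		 * only the trimmed byte count and the capacity need fixing. */
		nbd->bytesize = new_bytesize;
		bdev->bd_inode->i_size = new_bytesize;
		set_capacity(nbd->disk, new_bytesize >> 9);
	}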


> +}
> +
> +/*
> + * Call with tx_lock held
> + */
> +static void nbd_set_sock(struct block_device *bdev, struct nbd_device *nbd,
> +struct socket *sock)
> +{
> +   nbd->sock = sock;
> +   if (max_part > 0)
> +   bdev->bd_invalidated = 1;
> +   nbd->disconnect = 0; /* we're connected now */
> +}
> +
> +/*
> + * Call with tx_lock held
> + */
> +static void nbd_set_timeout(struct nbd_device *nbd, int timeout)
> +{
> +   nbd->xmit_timeout = timeout * HZ;
> +}
> +
>  static int nbd_connection_start(struct block_device *bdev,
> struct nbd_device *nbd)
>  {
> @@ -733,33 +782,22 @@ static int __nbd_ioctl(struct block_device *bdev, 
> struct nbd_device *nbd,
> if (nbd->sock)
> return -EBUSY;
> sock = sockfd_lookup(arg, &err);
> -   if (sock) {
> -   nbd->sock = sock;
> -   if (max_part > 0)
> -   bdev->bd_invalidated = 1;
> -   nbd->disconnect = 0; /* we're connected now */
> -   return 0;
> -   }
> -   return -EINVAL;
> +   if (!sock)
> +   return -EINVAL;
> +
> +   nbd_set_sock(bdev, nbd, sock);

need a return 0 here
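
A sketch of how the tail of that case would then read (illustrative only):

	sock = sockfd_lookup(arg, &err);
	if (!sock)
		return -EINVAL;

	nbd_set_sock(bdev, nbd, sock);
	return 0;	/* the missing return noted above */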


> }
>
> case NBD_SET_BLKSIZE:
> -   nbd->blksize = arg;
> -   nbd->bytesize &= ~(nbd->blksize-1);
> -   bdev->bd_inode->i_size = nbd->bytesize;
> -   set_blocksize(bdev, nbd->blksize);
> -   set_capacity(nbd->disk, nbd->bytesize >> 9);
> +   nbd_set_blksize(bdev, nbd, arg);
> return 0;
>
> case NBD_SET_SIZE:
> -   nbd->bytesize = arg & ~(nbd->blksize-1);
> -   bdev->bd_inode->i_size = nbd->bytesize;
> -   set_blocksize(bdev, nbd->blksize);
> -   set_capacity(nbd->disk, nbd->bytesize >> 9);
> +   nbd_set_total_size(bdev, nbd, arg & ~(nbd->blksize-1));
> return 0;

Might consider deprecating this one...it's pretty useless these days,
as it has the same function as NBD_SET_SIZE_BLOCKS, but has a smaller
device size limit (at least on some platforms)... nbd-client does not
use it anymore...
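
For context, the limit difference comes from the ioctl argument being an
unsigned long; a rough illustration, assuming a 32-bit unsigned long and the
default 1024-byte block size:

	/* NBD_SET_SIZE: arg is a byte count, so a 32-bit arg caps the
	 * device at 2^32 - 1 bytes, i.e. just under 4 GiB. */
	nbd_set_total_size(bdev, nbd, arg & ~(nbd->blksize - 1));

	/* NBD_SET_SIZE_BLOCKS: arg is a block count widened to u64 before
	 * the multiply, so the same 32-bit arg reaches roughly 4 TiB. */
	nbd_set_total_size(bdev, nbd, ((u64)arg) * nbd->blksize);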


> case NBD_SET_TIMEOUT:
> -   nbd->xmit_timeout = arg * HZ;
> +   nbd_set_timeout(nbd, arg);
> return 0;
>
> case NBD_SET_FLAGS:
> @@ -767,10 +805,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
> nbd_device *nbd,
> return 0;
>
> case NBD_SET_SIZE_BLOCKS:
> -   nbd->bytesize = ((u64) arg) * nbd->blksize;
> -   bdev->bd_inode->i_size = nbd->bytesize;
> -   set_blocksize(bdev, nbd->blksize);
> -   set_capacity(nbd->disk, nbd->bytesize >> 9);
> +   nbd_set_total_size(bdev, nbd, ((u64)arg) * nbd->blksize);
> return 0;
>
> case NBD_DO_IT:

Re: [PATCH] MAINTAINERS: Update NBD maintainer

2015-01-28 Thread Paul Clements
Small typo, but otherwise, I approve.

On Wed, Jan 28, 2015 at 1:35 PM, Markus Pargmann <m...@pengutronix.de> wrote:
> Paul stops maintining NBD and I will take his place from now on.
>
> Signed-off-by: Markus Pargmann <m...@pengutronix.de>
> ---
>  MAINTAINERS | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2ebb056cbe0a..4a83259e7f45 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6583,9 +6583,10 @@ F:   include/uapi/linux/netrom.h
>  F: net/netrom/
>
>  NETWORK BLOCK DEVICE (NBD)
> -M: Paul Clements <paul.cleme...@steeleye.com>
> +M: Markus Pargmann <m...@pengutronix.de

^^ here, missing the trailing >

>  S: Maintained
>  L: nbd-gene...@lists.sourceforge.net
> +T: git git://git.pengutronix.de/git/mpa/linux-nbd.git
>  F: Documentation/blockdev/nbd.txt
>  F: drivers/block/nbd.c
>  F: include/linux/nbd.h
> --
> 2.1.4


Re: [PATCH v2] nbd: fix possible memory leak

2015-01-27 Thread Paul Clements
On Tue, Jan 27, 2015 at 7:38 AM, Sudip Mukherjee <sudipm.mukher...@gmail.com> wrote:
> we have already allocated memory for nbd_dev, but we were not
> releasing that memory and just returning the error value.
>
> Signed-off-by: Sudip Mukherjee <su...@vectorindia.org>

Looks good to me.

Acked-by: Paul Clements <paul.cleme...@steeleye.com>

> ---
>
> v2: moved kcalloc after the returns.
>
>  drivers/block/nbd.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 4bc2a5c..db93c75 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -803,10 +803,6 @@ static int __init nbd_init(void)
> return -EINVAL;
> }
>
> -   nbd_dev = kcalloc(nbds_max, sizeof(*nbd_dev), GFP_KERNEL);
> -   if (!nbd_dev)
> -   return -ENOMEM;
> -
> part_shift = 0;
> if (max_part > 0) {
> part_shift = fls(max_part);
> @@ -827,6 +823,10 @@ static int __init nbd_init(void)
>
> if (nbds_max > 1UL << (MINORBITS - part_shift))
> return -EINVAL;
> +
> +   nbd_dev = kcalloc(nbds_max, sizeof(*nbd_dev), GFP_KERNEL);
> +   if (!nbd_dev)
> +   return -ENOMEM;
>
> for (i = 0; i < nbds_max; i++) {
> struct gendisk *disk = alloc_disk(1 << part_shift);
> --
> 1.8.1.2
>


Re: [PATCH] nbd: fix possible memory leak

2015-01-26 Thread Paul Clements
Sudip,

On Tue, Jan 13, 2015 at 12:19 PM, Sudip Mukherjee <sudipm.mukher...@gmail.com> wrote:
> we have already allocated memory for nbd_dev, but we were not
> releasing that memory and just returning the error value.
>
> Signed-off-by: Sudip Mukherjee <su...@vectorindia.org>
> ---
>
> Hi Paul,
> after seeing your mail about maintainers, just started to see the code.
> this patch is one way, another way can be to bring the kcalloc part below 
> these checks.

Probably better to move the kcalloc below, rather than adding more
branches, wouldn't you agree?

Thanks,
Paul


Re: [RFC 2/4] nbd: Split 'DO_IT' into three functions

2015-01-26 Thread Paul Clements
Markus,

This refactor looks OK with the exception of one thing...

On Tue, Jan 13, 2015 at 8:44 AM, Markus Pargmann <m...@pengutronix.de> wrote:

>  /* Must be called with tx_lock held */
>
>  static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
> @@ -684,61 +773,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
> nbd_device *nbd,
> set_capacity(nbd->disk, nbd->bytesize >> 9);
> return 0;
>
> -   case NBD_DO_IT: {
> -   struct task_struct *thread;
> -   struct socket *sock;
> -   int error;
> -
> -   if (nbd->pid)
> -   return -EBUSY;
> -   if (!nbd->sock)
> -   return -EINVAL;
>

You seem to have done away with these checks. Was that inadvertent or
was there a reason for that? The pid check is necessary to prevent two
instances of NBD_DO_IT from running. Without the sock check you'll get
a null pointer deref in nbd_do_it.
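
For reference, a sketch of the guards that need to stay at the top of the
refactored start path (names follow the RFC's nbd_connection_start(); this is
not the posted code):

	if (nbd->pid)		/* a second NBD_DO_IT is already running */
		return -EBUSY;
	if (!nbd->sock)		/* no socket yet: nbd_do_it() would deref NULL */
		return -EINVAL;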

Thanks,
Paul


Re: NBD Maintainer

2015-01-13 Thread Paul Clements
On Tue, Jan 13, 2015 at 11:14 AM, Andrey Utkin <andrey.krieger.ut...@gmail.com> wrote:
> Hi Paul,
> could you please describe

> - how wide is NBD usage today (any estimation is ok),

It depends somewhat on who you consider to be users. There are two
groups of NBD "users":

1) integrators and admins (who work directly with nbd)
2) users of the systems and software that the first group creates

My guess is that the first group is in the hundreds. The second in the
thousands. NBD usage tends to fall into one of the following classes:

- support of diskless thin clients (mainly at universities and such)
- replication (used in conjunction with some form of mirroring scheme,
this becomes a network-raid device of sorts)
- block device emulation/user-level block device (e.g., qemu-nbd)
(because NBD has a fairly flexible design, including a fully
user-level server component, it has been used by quite a few people as
a user-level block device for various different purposes)

> - what is in a TODO list,

Nothing specific, other than merging in pending patches, of which
there are a few (check the nbd-general archives).

> - what are critical bugs or important issues requiring work, if there are any.

There is a network timeout issue, which is probably one of the more
important fixes to get in:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=770479


NBD Maintainer

2015-01-13 Thread Paul Clements
All,

I've maintained the NBD driver since 2003, but alas, I no longer have
the time and energy to continue maintaining it. If someone would like
to take over, let me know.

Thanks,
Paul


Re: [PATCHv2] nbd: Zero from and len fields in NBD_CMD_DISCONNECT.

2014-05-26 Thread Paul Clements
On Sun, May 25, 2014 at 6:22 PM, Hani Benhabiles <kroo...@gmail.com> wrote:
> Len field is already set to zero, but not the from field which is sent as
> 0xfe00. This makes no sense, and may cause confuse server
> implementations doing sanity checks (qemu-nbd is an example.)
>
> Signed-off-by: Hani Benhabiles <h...@linux.com>
> ---
>
> Compared to v1:
> * Zero the request structure instead of conditionally zeroing specific fields.


Yeah, this one looks good. Sorry for the confusion on the structs...


>  drivers/block/nbd.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 3a70ea2..8e1df52 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -243,14 +243,11 @@ static int nbd_send_req(struct nbd_device *nbd, struct 
> request *req)
> struct nbd_request request;
> unsigned long size = blk_rq_bytes(req);
>
> +   memset(&request, 0, sizeof(request));
> request.magic = htonl(NBD_REQUEST_MAGIC);
> request.type = htonl(nbd_cmd(req));
>
> -   if (nbd_cmd(req) == NBD_CMD_FLUSH) {
> -   /* Other values are reserved for FLUSH requests.  */
> -   request.from = 0;
> -   request.len = 0;
> -   } else {
> +   if (nbd_cmd(req) != NBD_CMD_FLUSH && nbd_cmd(req) != NBD_CMD_DISC) {
> request.from = cpu_to_be64((u64)blk_rq_pos(req) << 9);
> request.len = htonl(size);
> }
> --
> 1.8.3.2
>


Re: [PATCH] nbd: Zero from and len fields in NBD_CMD_DISCONNECT.

2014-05-25 Thread Paul Clements
On Sun, May 25, 2014 at 6:18 AM, Hani Benhabiles <kroo...@gmail.com> wrote:
> On Sun, May 18, 2014 at 10:11:13AM +0100, Hani Benhabiles wrote:
>> On Fri, May 16, 2014 at 10:56:33PM -0400, Paul Clements wrote:
>> > Agreed. But better yet, the request structure should just be zeroed when
>> > it's allocated.
>> >
>>
>> It is already initialized  in __nbd_ioctl() with the blk_rq_init() call which
>> sets the __sector value to -1 (which is 0xfe00 after the left 
>> shifts.)
>>
>> This is the only (non-ugly / non-intrusive) way to do it afaict.
>>
>
> Ping!
>
> Anything blocking this patch ?

It's cleaner to just zero the struct and get rid of the conditional
zeroing of specific fields. I'll prepare a patch in the next few days.

Thanks,
Paul

>> > On Fri, May 16, 2014 at 7:43 PM, Hani Benhabiles  wrote:
>> >
>> > > Len field is already set to zero, but not the from field which is sent as
>> > > 0xfe00. This makes no sense, and may cause confuse server
>> > > implementations doing sanity checks (qemu-nbd is an example.)
>> > >
>> > > Signed-off-by: Hani Benhabiles 
>> > > ---
>> > >  drivers/block/nbd.c | 2 +-
>> > >  1 file changed, 1 insertion(+), 1 deletion(-)
>> > >
>> > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>> > > index 3a70ea2..657bdac 100644
>> > > --- a/drivers/block/nbd.c
>> > > +++ b/drivers/block/nbd.c
>> > > @@ -246,7 +246,7 @@ static int nbd_send_req(struct nbd_device *nbd, 
>> > > struct
>> > > request *req)
>> > > request.magic = htonl(NBD_REQUEST_MAGIC);
>> > > request.type = htonl(nbd_cmd(req));
>> > >
>> > > -   if (nbd_cmd(req) == NBD_CMD_FLUSH) {
>> > > +   if (nbd_cmd(req) == NBD_CMD_FLUSH || nbd_cmd(req) == 
>> > > NBD_CMD_DISC)
>> > > {
>> > > /* Other values are reserved for FLUSH requests.  */
>> > > request.from = 0;
>> > > request.len = 0;
>> > > --
>> > > 1.8.3.2
>> > >
>> > >


Re: [PATCH] nbd: correct disconnect behavior

2013-06-27 Thread Paul Clements
On Thu, Jun 27, 2013 at 6:28 PM, Andrew Morton <a...@linux-foundation.org> wrote:
> On Thu, 27 Jun 2013 18:20:37 -0400 Paul Clements <paul.cleme...@us.sios.com> wrote:
> OK, but.  "Would it be safer to clear ->disconnect in NBD_DO_IT?"

About the same in terms of safety. Both ioctls have to be called to
set up the device and neither can be called again, until the device is
reset.

> If not safer, would it be cleaner?

I don't know, seems like a toss up. NBD_SET_SOCK is the earliest place
that the socket might conceivably be usable, so I wanted to clear
disconnect there (e.g., in case an alternate/new version of NBD_DO_IT,
as has been discussed, is ever implemented).

>> > The cool kids are using bool lately ;)
>> >
>>
>> Hey, maybe I want to be able to compile with gcc 2.7.2 ? :)
>
> Sob, I miss 2.7.2.  It was a good 50% faster than the new improved
> models.  But I don't think this yearning makes us cool.

No, I think it just makes us old :)


Re: [PATCH] nbd: correct disconnect behavior

2013-06-27 Thread Paul Clements
On Wed, Jun 26, 2013 at 7:21 PM, Andrew Morton <a...@linux-foundation.org> wrote:
>
> On Wed, 19 Jun 2013 17:09:18 -0400 (EDT) Paul Clements <paul.cleme...@steeleye.com> wrote:
>
> This sounds like something which users of 3.10 and earlier kernels
> might want, so I added the Cc:stable tag.  Please let me know if
> you disagree.


That is a good idea. It would be useful in stable.


> > --- a/drivers/block/nbd.c
> > +++ b/drivers/block/nbd.c
> > @@ -623,6 +623,8 @@ static int __nbd_ioctl(struct block_device *bdev, 
> > struct nbd_device *nbd,
> >   if (!nbd->sock)
> >   return -EINVAL;
> >
> > + nbd->disconnect = 1;
> > +
> > nbd_send_req(nbd, &sreq);
> >  return 0;
> >   }
> > @@ -654,6 +656,7 @@ static int __nbd_ioctl(struct block_device *bdev, 
> > struct nbd_device *nbd,
> >   nbd->sock = SOCKET_I(inode);
> >   if (max_part > 0)
> >   bdev->bd_invalidated = 1;
> > + nbd->disconnect = 0; /* we're connected now */
> >   return 0;
> >   } else {
> >   fput(file);
> > @@ -742,6 +745,8 @@ static int __nbd_ioctl(struct block_device *bdev, 
> > struct nbd_device *nbd,
> >   set_capacity(nbd->disk, 0);
> >   if (max_part > 0)
> >   ioctl_by_bdev(bdev, BLKRRPART, 0);
> > + if (nbd->disconnect) /* user requested, ignore socket errors 
> > */
> > + return 0;
> >   return nbd->harderror;
> >   }
>
> hm, how does nbd work...  Hard to tell as nothing seems to be documented
> anywhere :(


I think most people look at the nbd-client to see. :)

But I will work on some docs for the ioctls...

Briefly, though, it goes something like this:

nbd-client and nbd-server negotiate parameters, including the server
telling the client how big the export is

Then the nbd-client does a series of ioctls to set up the device:

NBD_SET_SIZE - set the size of the device
NBD_SET_SOCK - tell the kernel which socket to use
NBD_DO_IT - this ioctl lasts for the life of the device and causes
nbd-client to go into a receive loop, handling I/O replies from the
nbd-server

NBD_DISCONNECT, meanwhile, can be called from a separate thread
(usually "nbd-client -d" calls it). This requests a disconnect of the
device.
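
As a rough user-space illustration (not nbd-client itself; connect_to_server()
and export_size stand in for the negotiation step, and the NBD_* constants come
from <linux/nbd.h>):

	int nbd = open("/dev/nbd0", O_RDWR);
	int sock = connect_to_server();		/* negotiated TCP socket */

	ioctl(nbd, NBD_SET_SIZE, export_size);	/* size learned during negotiation */
	ioctl(nbd, NBD_SET_SOCK, sock);		/* hand the socket to the kernel */
	ioctl(nbd, NBD_DO_IT);			/* blocks for the life of the device */

	/* ... while "nbd-client -d" later does, in effect: */
	ioctl(nbd, NBD_DISCONNECT);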


>
> > --- a/include/linux/nbd.h
> > +++ b/include/linux/nbd.h
> > @@ -41,6 +41,7 @@ struct nbd_device {
> >   u64 bytesize;
> >   pid_t pid; /* pid of nbd-client, if attached */
> >   int xmit_timeout;
> > + int disconnect; /* a disconnect has been requested by user */
> >  };
>
> The cool kids are using bool lately ;)


Hey, maybe I want to be able to compile with gcc 2.7.2 ? :)


[PATCH] nbd: correct disconnect behavior

2013-06-19 Thread Paul Clements
Currently, when a disconnect is requested by the user (via NBD_DISCONNECT
ioctl) the return from NBD_DO_IT is undefined (it is usually one of
several error codes). This means that nbd-client does not know if a
manual disconnect was performed or whether a network error occurred.
Because of this, nbd-client's persist mode (which tries to reconnect after
error, but not after manual disconnect) does not always work correctly.

This change fixes this by causing NBD_DO_IT to always return 0 if a user
requests a disconnect. This means that nbd-client can correctly either
persist the connection (if an error occurred) or disconnect (if the user
requested it).

Signed-off-by: Paul Clements <paul.cleme...@steeleye.com>
---

 drivers/block/nbd.c |5 +
 include/linux/nbd.h |1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 7fecc78..8b7664d 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -623,6 +623,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
nbd_device *nbd,
if (!nbd->sock)
return -EINVAL;
 
+   nbd->disconnect = 1;
+
nbd_send_req(nbd, &sreq);
 return 0;
}
@@ -654,6 +656,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
nbd_device *nbd,
nbd->sock = SOCKET_I(inode);
if (max_part > 0)
bdev->bd_invalidated = 1;
+   nbd->disconnect = 0; /* we're connected now */
return 0;
} else {
fput(file);
@@ -742,6 +745,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
nbd_device *nbd,
set_capacity(nbd->disk, 0);
if (max_part > 0)
ioctl_by_bdev(bdev, BLKRRPART, 0);
+   if (nbd->disconnect) /* user requested, ignore socket errors */
+   return 0;
return nbd->harderror;
}
 
diff --git a/include/linux/nbd.h b/include/linux/nbd.h
index 4871170..ae4981e 100644
--- a/include/linux/nbd.h
+++ b/include/linux/nbd.h
@@ -41,6 +41,7 @@ struct nbd_device {
u64 bytesize;
pid_t pid; /* pid of nbd-client, if attached */
int xmit_timeout;
+   int disconnect; /* a disconnect has been requested by user */
 };
 
 #endif


[PATCH] nbd: remove bogus BUG_ON in NBD_CLEAR_QUE

2013-06-18 Thread Paul Clements
From: Michal Belczyk <belc...@bsd.krakow.pl>

The NBD_CLEAR_QUE ioctl has been deprecated for quite some time (its job
is now done by two other ioctls). We should stop trying to make bogus
assertions in it. Also, user-level code should remove calls to
NBD_CLEAR_QUE, ASAP.

Signed-off-by: Michal Belczyk <belc...@bsd.krakow.pl>
Signed-off-by: Paul Clements <paul.cleme...@steeleye.com>
---

 nbd.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 7fecc78..21ba264 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -750,7 +750,6 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
nbd_device *nbd,
 * This is for compatibility only.  The queue is always cleared
 * by NBD_DO_IT or NBD_CLEAR_SOCK.
 */
-   BUG_ON(!nbd->sock && !list_empty(&nbd->queue_head));
return 0;
 
case NBD_PRINT_DEBUG:


[PATCH] nbd: increase default and max request sizes

2013-04-02 Thread Paul Clements
This patch raises the default max request size for nbd to 128KB (from 127KB)
to get it 4KB aligned. This patch also allows the max request size to be
increased (via /sys/block/nbd<x>/queue/max_sectors_kb) to 32MB.

The patch makes nbd network traffic more efficient by:
- reducing request fragmentation (4KB alignment)
- reducing the number of requests (fewer round trips, less network overhead)

Especially in high latency networks, larger request size can make a dramatic
difference in performance.
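
For reference, the two limits in the hunk below work out as follows
(512-byte sectors):

	blk_queue_max_hw_sectors(disk->queue, 65536);	/* 65536 * 512 B = 32 MiB ceiling */
	disk->queue->limits.max_sectors = 256;		/* 256 * 512 B = 128 KiB default */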

From: Michal Belczyk <belc...@bsd.krakow.pl>
Signed-off-by: Paul Clements <paul.cleme...@steeleye.com>
---

 nbd.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 7fecc78..037288e 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -856,6 +856,8 @@ static int __init nbd_init(void)
disk->queue->limits.discard_granularity = 512;
disk->queue->limits.max_discard_sectors = UINT_MAX;
disk->queue->limits.discard_zeroes_data = 0;
+   blk_queue_max_hw_sectors(disk->queue, 65536);
+   disk->queue->limits.max_sectors = 256;
}
 
if (register_blkdev(NBD_MAJOR, "nbd")) {


Re: [PATCH 3/3] nbd: show read-only state in sysfs

2013-02-12 Thread Paul Clements
On Tue, Feb 12, 2013 at 11:06 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> Pass the read-only flag to set_device_ro, so that it will be
> visible to the block layer and in sysfs.

Looks good


> Cc: <nbd-gene...@lists.sf.net>
> Cc: Paul Clements <paul.cleme...@steeleye.com>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
>  drivers/block/nbd.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index a9c5c7a..ef17c2e 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -703,6 +703,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
> nbd_device *nbd,
>
> mutex_unlock(&nbd->tx_lock);
>
> +   if (nbd->flags & NBD_FLAG_READ_ONLY)
> +   set_device_ro(bdev, true);
> if (nbd->flags & NBD_FLAG_SEND_TRIM)
> queue_flag_set_unlocked(QUEUE_FLAG_DISCARD,
> nbd->disk->queue);
> @@ -730,6 +732,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
> nbd_device *nbd,
> dev_warn(disk_to_dev(nbd->disk), "queue cleared\n");
> kill_bdev(bdev);
> queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, 
> nbd->disk->queue);
> +   set_device_ro(bdev, false);
> if (file)
> fput(file);
> nbd->flags = 0;
> --
> 1.7.1
>


Re: [PATCH 2/3] nbd: fsync and kill block device on shutdown

2013-02-12 Thread Paul Clements
On Tue, Feb 12, 2013 at 11:06 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> There are two problems with shutdown in the NBD driver.  The first is
> that receiving the NBD_DISCONNECT ioctl does not sync the filesystem;
> this is useful because BLKFLSBUF is restricted to processes that have
> CAP_SYS_ADMIN, and the NBD client may not possess it (fsync of the
> block device does not sync the filesystem, either).

> The second is that once we clear the socket we have no guarantee that
> later reads will come from the same backing storage.  Thus the page cache
> must be cleaned, lest reads that hit on the page cache will return stale
> data from the previously-accessible disk.

Paolo,

Thanks for this. A problem indeed...

Acked-by: paul.cleme...@steeleye.com


> Cc: <sta...@vger.kernel.org>
> Cc: <nbd-gene...@lists.sf.net>
> Cc: Paul Clements <paul.cleme...@steeleye.com>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
>  drivers/block/nbd.c |   10 ++
>  1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 5603765..a9c5c7a 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -608,12 +608,20 @@ static int __nbd_ioctl(struct block_device *bdev, 
> struct nbd_device *nbd,
> struct request sreq;
>
> dev_info(disk_to_dev(nbd->disk), "NBD_DISCONNECT\n");
> +   if (!nbd->sock)
> +   return -EINVAL;
>
> +   mutex_unlock(&nbd->tx_lock);
> +   fsync_bdev(bdev);
> +   mutex_lock(&nbd->tx_lock);
> blk_rq_init(NULL, &sreq);
> sreq.cmd_type = REQ_TYPE_SPECIAL;
> nbd_cmd(&sreq) = NBD_CMD_DISC;
> +
> +   /* Check again after getting mutex back.  */
> if (!nbd->sock)
> return -EINVAL;
> +
> nbd_send_req(nbd, &sreq);
>  return 0;
> }
> @@ -627,6 +635,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
> nbd_device *nbd,
> nbd_clear_que(nbd);
> BUG_ON(!list_empty(&nbd->queue_head));
> BUG_ON(!list_empty(&nbd->waiting_queue));
> +   kill_bdev(bdev);
> if (file)
> fput(file);
> return 0;
> @@ -719,6 +728,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
> nbd_device *nbd,
> nbd->file = NULL;
> nbd_clear_que(nbd);
> dev_warn(disk_to_dev(nbd->disk), "queue cleared\n");
> +   kill_bdev(bdev);
> queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, 
> nbd->disk->queue);
> if (file)
> fput(file);
> --
> 1.7.1
>
>


Re: [PATCH 1/3] nbd: support FLUSH requests

2013-02-12 Thread Paul Clements
On Tue, Feb 12, 2013 at 11:06 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> From: Alex Bligh <a...@alex.org.uk>
>
> The NBD device does not support writeback caching, thus it is not safe
> against power losses unless the client opens the target with O_DSYNC or
> O_SYNC.
>
> Add support for a new flag that the server can pass.  If the flag is
> enabled, we translate REQ_FLUSH requests into the NBD_CMD_FLUSH
> command.
>
> Cc: <nbd-gene...@lists.sf.net>
> Cc: Paul Clements <paul.cleme...@steeleye.com>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Signed-off-by: Alex Bligh <a...@alex.org.uk>

Looks good.

Acked-by: paul.cleme...@steeleye.com
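
The client-side translation itself is small; conceptually it amounts to no more
than this (a sketch, not the exact hunk):

	if (req->cmd_flags & REQ_FLUSH)
		nbd_cmd(req) = NBD_CMD_FLUSH;	/* only done when the server has
						 * advertised NBD_FLAG_SEND_FLUSH */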
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] nbd: support FLUSH requests

2013-02-12 Thread Paul Clements
On Tue, Feb 12, 2013 at 11:06 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 From: Alex Bligh a...@alex.org.uk

 The NBD device does not support writeback caching, thus it is not safe
 against power losses unless the client opens the target with O_DSYNC or
 O_SYNC.

 Add support for a new flag that the server can pass.  If the flag is
 enabled, we translate REQ_FLUSH requests into the NBD_CMD_FLUSH
 command.

 Cc: nbd-gene...@lists.sf.net
 Cc: Paul Clements paul.cleme...@steeleye.com
 Cc: Andrew Morton a...@linux-foundation.org
 Signed-off-by: Alex Bligh a...@alex.org.uk

Looks good.

Acked-by: paul.cleme...@steeleye.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] nbd: fsync and kill block device on shutdown

2013-02-12 Thread Paul Clements
On Tue, Feb 12, 2013 at 11:06 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 There are two problems with shutdown in the NBD driver.  The first is
 that receiving the NBD_DISCONNECT ioctl does not sync the filesystem;
 this is useful because BLKFLSBUF is restricted to processes that have
 CAP_SYS_ADMIN, and the NBD client may not possess it (fsync of the
 block device does not sync the filesystem, either).

 The second is that once we clear the socket we have no guarantee that
 later reads will come from the same backing storage.  Thus the page cache
 must be cleaned, lest reads that hit on the page cache will return stale
 data from the previously-accessible disk.

Paolo,

Thanks for this. A problem indeed...

Acked-by: paul.cleme...@steeleye.com


 Cc: sta...@vger.kernel.org
 Cc: nbd-gene...@lists.sf.net
 Cc: Paul Clements paul.cleme...@steeleye.com
 Cc: Andrew Morton a...@linux-foundation.org
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  drivers/block/nbd.c |   10 ++
  1 files changed, 10 insertions(+), 0 deletions(-)

 diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
 index 5603765..a9c5c7a 100644
 --- a/drivers/block/nbd.c
 +++ b/drivers/block/nbd.c
 @@ -608,12 +608,20 @@ static int __nbd_ioctl(struct block_device *bdev, 
 struct nbd_device *nbd,
 struct request sreq;

 dev_info(disk_to_dev(nbd-disk), NBD_DISCONNECT\n);
 +   if (!nbd-sock)
 +   return -EINVAL;

 +   mutex_unlock(nbd-tx_lock);
 +   fsync_bdev(bdev);
 +   mutex_lock(nbd-tx_lock);
 blk_rq_init(NULL, sreq);
 sreq.cmd_type = REQ_TYPE_SPECIAL;
 nbd_cmd(sreq) = NBD_CMD_DISC;
 +
 +   /* Check again after getting mutex back.  */
 if (!nbd-sock)
 return -EINVAL;
 +
 nbd_send_req(nbd, sreq);
  return 0;
 }
 @@ -627,6 +635,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
 nbd_device *nbd,
 nbd_clear_que(nbd);
 BUG_ON(!list_empty(nbd-queue_head));
 BUG_ON(!list_empty(nbd-waiting_queue));
 +   kill_bdev(bdev);
 if (file)
 fput(file);
 return 0;
 @@ -719,6 +728,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
 nbd_device *nbd,
 nbd-file = NULL;
 nbd_clear_que(nbd);
 dev_warn(disk_to_dev(nbd-disk), queue cleared\n);
 +   kill_bdev(bdev);
 queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, 
 nbd-disk-queue);
 if (file)
 fput(file);
 --
 1.7.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] nbd: show read-only state in sysfs

2013-02-12 Thread Paul Clements
On Tue, Feb 12, 2013 at 11:06 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 Pass the read-only flag to set_device_ro, so that it will be
 visible to the block layer and in sysfs.

Looks good


 Cc: nbd-gene...@lists.sf.net
 Cc: Paul Clements paul.cleme...@steeleye.com
 Cc: Andrew Morton a...@linux-foundation.org
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  drivers/block/nbd.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)

 diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
 index a9c5c7a..ef17c2e 100644
 --- a/drivers/block/nbd.c
 +++ b/drivers/block/nbd.c
 @@ -703,6 +703,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 
 		mutex_unlock(&nbd->tx_lock);
 
 +		if (nbd->flags & NBD_FLAG_READ_ONLY)
 +			set_device_ro(bdev, true);
 		if (nbd->flags & NBD_FLAG_SEND_TRIM)
 			queue_flag_set_unlocked(QUEUE_FLAG_DISCARD,
 				nbd->disk->queue);
 @@ -730,6 +732,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 	dev_warn(disk_to_dev(nbd->disk), "queue cleared\n");
 	kill_bdev(bdev);
 	queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
 +	set_device_ro(bdev, false);
 	if (file)
 		fput(file);
 	nbd->flags = 0;
 --
 1.7.1

--


[PATCH 1/1] nbd: clear waiting_queue on shutdown

2012-08-31 Thread paul . clements

From: Paul Clements 


Signed-off-by: Paul Clements 
---

 nbd.c |9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index d07c9f7..0c03411 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -449,6 +449,14 @@ static void nbd_clear_que(struct nbd_device *nbd)
req->errors++;
nbd_end_request(req);
}
+
+	while (!list_empty(&nbd->waiting_queue)) {
+		req = list_entry(nbd->waiting_queue.next, struct request,
+				 queuelist);
+		list_del_init(&req->queuelist);
+   req->errors++;
+   nbd_end_request(req);
+   }
 }
 
 
@@ -598,6 +606,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
nbd_device *nbd,
nbd->file = NULL;
nbd_clear_que(nbd);
	BUG_ON(!list_empty(&nbd->queue_head));
+	BUG_ON(!list_empty(&nbd->waiting_queue));
if (file)
fput(file);
return 0;
--


[PATCH 0/1] nbd: fix dangling requests bug

2012-08-31 Thread paul . clements
This patch fixes a serious but uncommon bug in nbd. This should probably
be considered for backport to one or more stable branches.

The bug occurs when there is heavy I/O going to the nbd device while, at the
same time, a failure (server, network) or manual disconnect of the nbd
connection occurs.

There is a small window between the time that the nbd_thread is stopped and the
socket is shut down where requests can continue to be queued to nbd's
internal waiting_queue. When this happens, those requests are never completed
or freed.

The fix is to clear the waiting_queue on shutdown of the nbd device, in
the same way that the nbd request queue (queue_head) is already being cleared.

--
Paul
--


[PATCHv2 2/2] nbd: handle discard requests

2012-08-31 Thread paul . clements
Description: This patch adds discard support to nbd. If the nbd-server
supports discard, it will send NBD_FLAG_SEND_TRIM to the client. The client
will then set the flag in the kernel via NBD_SET_FLAGS, which tells the
kernel to enable discards for the device (QUEUE_FLAG_DISCARD).

If discard support is enabled, then when the nbd client system receives
a discard request, this will be passed along to the nbd-server. When the
discard request is received by the nbd-server, it will perform:

fallocate(.. FALLOC_FL_PUNCH_HOLE ..)

To punch a hole in the backend storage, which is no longer needed.
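
For reference, a minimal user-space sketch of the server-side action
described above (illustrative only, not nbd-server's actual code; the
helper name and error handling here are made up). Note that the kernel
requires FALLOC_FL_PUNCH_HOLE to be combined with FALLOC_FL_KEEP_SIZE:

#define _GNU_SOURCE
#include <fcntl.h>
#include <errno.h>
#include <stdint.h>

/* Punch a hole in the export's backing file for one trim request. */
static int punch_hole(int export_fd, uint64_t offset, uint32_t length)
{
	if (fallocate(export_fd,
		      FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      (off_t)offset, (off_t)length) < 0)
		return -errno;	/* e.g. EOPNOTSUPP if the filesystem
				   cannot punch holes */
	return 0;
}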

Signed-off-by: Paul Clements 
---
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index c544bb4..a014169 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -98,6 +98,7 @@ static const char *nbdcmd_to_ascii(int cmd)
case  NBD_CMD_READ: return "read";
case NBD_CMD_WRITE: return "write";
case  NBD_CMD_DISC: return "disconnect";
+   case  NBD_CMD_TRIM: return "trim/discard";
}
return "invalid";
 }
@@ -461,7 +462,11 @@ static void nbd_handle_req(struct nbd_device *nbd, struct 
request *req)
 
nbd_cmd(req) = NBD_CMD_READ;
if (rq_data_dir(req) == WRITE) {
-   nbd_cmd(req) = NBD_CMD_WRITE;
+   if ((req->cmd_flags & REQ_DISCARD)) {
+   WARN_ON(!(nbd->flags & NBD_FLAG_SEND_TRIM));
+   nbd_cmd(req) = NBD_CMD_TRIM;
+   } else
+   nbd_cmd(req) = NBD_CMD_WRITE;
if (nbd->flags & NBD_FLAG_READ_ONLY) {
dev_err(disk_to_dev(nbd->disk),
"Write on read-only\n");
@@ -667,6 +672,10 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
nbd_device *nbd,
 
	mutex_unlock(&nbd->tx_lock);
 
+   if (nbd->flags & NBD_FLAG_SEND_TRIM)
+   queue_flag_set_unlocked(QUEUE_FLAG_DISCARD,
+   nbd->disk->queue);
+
thread = kthread_create(nbd_thread, nbd, nbd->disk->disk_name);
if (IS_ERR(thread)) {
	mutex_lock(&nbd->tx_lock);
@@ -684,6 +693,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
nbd_device *nbd,
nbd->file = NULL;
nbd_clear_que(nbd);
dev_warn(disk_to_dev(nbd->disk), "queue cleared\n");
+   queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
if (file)
fput(file);
nbd->bytesize = 0;
@@ -802,6 +812,9 @@ static int __init nbd_init(void)
 * Tell the block layer that we are not a rotational device
 */
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, disk->queue);
+   disk->queue->limits.discard_granularity = 512;
+   disk->queue->limits.max_discard_sectors = UINT_MAX;
+   disk->queue->limits.discard_zeroes_data = 0;
}
 
if (register_blkdev(NBD_MAJOR, "nbd")) {
diff --git a/include/linux/nbd.h b/include/linux/nbd.h
index bb349be..3b49a63 100644
--- a/include/linux/nbd.h
+++ b/include/linux/nbd.h
@@ -32,12 +32,16 @@
 enum {
NBD_CMD_READ = 0,
NBD_CMD_WRITE = 1,
-   NBD_CMD_DISC = 2
+   NBD_CMD_DISC = 2,
+   /* there is a gap here to match userspace */
+   NBD_CMD_TRIM = 4
 };
 
 /* values for flags field */
 #define NBD_FLAG_HAS_FLAGS	(1 << 0) /* nbd-server supports flags */
 #define NBD_FLAG_READ_ONLY	(1 << 1) /* device is read-only */
+/* there is a gap here to match userspace */
+#define NBD_FLAG_SEND_TRIM	(1 << 5) /* send trim/discard */
 
 #define nbd_cmd(req) ((req)->cmd[0])
 
--


[PATCHv2 1/2] nbd: add set flags ioctl

2012-08-31 Thread paul . clements
Description: This patch adds a set-flags ioctl, allowing various option
flags to be set on an nbd device. This allows the nbd-client to set
the device flags (to enable read-only mode, or enable discard support, etc.).

Flags are typically specified by the nbd-server. During the negotiation phase
of the nbd connection, the server sends its flags to the client. The
client then uses NBD_SET_FLAGS to inform the kernel of the options.

Also included is a one-line fix to debug output for the set-timeout ioctl.
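
As an illustration of the userspace side (a sketch only, assuming an
already-open /dev/nbdX descriptor; this is not nbd-client's actual
code), the negotiated flags word would be handed to the kernel roughly
like this:

#include <sys/ioctl.h>
#include <linux/nbd.h>
#include <stdio.h>

/* Pass the flags received from the server during negotiation to the
 * kernel.  NBD_SET_FLAGS is the ioctl added by this patch, so the call
 * will fail on kernels without it. */
static int set_nbd_flags(int nbd_fd, unsigned long flags)
{
	if (ioctl(nbd_fd, NBD_SET_FLAGS, flags) < 0) {
		perror("NBD_SET_FLAGS");
		return -1;
	}
	return 0;
}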

Signed-off-by: Paul Clements 
---
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index d07c9f7..c544bb4 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -78,6 +78,8 @@ static const char *ioctl_cmd_to_ascii(int cmd)
case NBD_SET_SOCK: return "set-sock";
case NBD_SET_BLKSIZE: return "set-blksize";
case NBD_SET_SIZE: return "set-size";
+   case NBD_SET_TIMEOUT: return "set-timeout";
+   case NBD_SET_FLAGS: return "set-flags";
case NBD_DO_IT: return "do-it";
case NBD_CLEAR_SOCK: return "clear-sock";
case NBD_CLEAR_QUE: return "clear-que";
@@ -460,7 +462,7 @@ static void nbd_handle_req(struct nbd_device *nbd, struct 
request *req)
nbd_cmd(req) = NBD_CMD_READ;
if (rq_data_dir(req) == WRITE) {
nbd_cmd(req) = NBD_CMD_WRITE;
-   if (nbd->flags & NBD_READ_ONLY) {
+   if (nbd->flags & NBD_FLAG_READ_ONLY) {
dev_err(disk_to_dev(nbd->disk),
"Write on read-only\n");
goto error_out;
@@ -642,6 +644,10 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
nbd_device *nbd,
nbd->xmit_timeout = arg * HZ;
return 0;
 
+   case NBD_SET_FLAGS:
+   nbd->flags = arg;
+   return 0;
+
case NBD_SET_SIZE_BLOCKS:
nbd->bytesize = ((u64) arg) * nbd->blksize;
bdev->bd_inode->i_size = nbd->bytesize;
diff --git a/include/linux/nbd.h b/include/linux/nbd.h
index d146ca1..bb349be 100644
--- a/include/linux/nbd.h
+++ b/include/linux/nbd.h
@@ -27,6 +27,7 @@
 #define NBD_SET_SIZE_BLOCKS	_IO( 0xab, 7 )
 #define NBD_DISCONNECT  _IO( 0xab, 8 )
 #define NBD_SET_TIMEOUT _IO( 0xab, 9 )
+#define NBD_SET_FLAGS   _IO( 0xab, 10)
 
 enum {
NBD_CMD_READ = 0,
@@ -34,6 +35,10 @@ enum {
NBD_CMD_DISC = 2
 };
 
+/* values for flags field */
+#define NBD_FLAG_HAS_FLAGS	(1 << 0) /* nbd-server supports flags */
+#define NBD_FLAG_READ_ONLY	(1 << 1) /* device is read-only */
+
 #define nbd_cmd(req) ((req)->cmd[0])
 
 /* userspace doesn't need the nbd_device structure */
@@ -42,10 +47,6 @@ enum {
 #include <linux/wait.h>
 #include <linux/mutex.h>
 
-/* values for flags field */
-#define NBD_READ_ONLY 0x0001
-#define NBD_WRITE_NOCHK 0x0002
-
 struct request;
 
 struct nbd_device {
--


[PATCHv2 0/2] nbd: add discard support

2012-08-31 Thread paul . clements
This patchset adds discard request support to nbd. This should be good
for inclusion in next.

The first patch adds a set-flags ioctl, allowing various option flags
to be set on an nbd device. One of the new flags (NBD_FLAG_SEND_TRIM)
tells the nbd client to send discard requests to the server.

The second patch adds handling of discard requests to nbd when
NBD_FLAG_SEND_TRIM is set.

Thanks,
Paul
--


Re: [PATCH 2/2] [RESEND] add discard support to nbd

2012-08-31 Thread Paul Clements
On Thu, Aug 30, 2012 at 6:03 PM, Andrew Morton
 wrote:
> On Wed, 29 Aug 2012 08:41:02 -0400
> paul.cleme...@steeleye.com wrote:
>
>> Description: This patch adds discard support to nbd. When the nbd client
>> system receives a discard request, this will be passed along to the nbd
>> server system, where the nbd-server will respond by performing:
>>   fallocate(.. FALLOC_FL_PUNCH_HOLE ..)
>>
>> To punch a hole in the backend storage, which is no longer needed.
>>
>
> What happens if the user is running an older server?

In that case, the older server will not have set NBD_FLAG_SEND_TRIM,
so QUEUE_FLAG_DISCARD doesn't get set, so no discards will be sent to
the device.

> Is it possible that because the old server didn't set
> NBD_FLAG_SEND_TRIM, the user's screen gets filled with WARN_ONs?

No.

> Anyway, please make sure this combination was tested!

It was.

Thanks for the feedback.

I will resend v2 of the patches shortly.

--
Paul
--


[PATCH 2/2] [RESEND] add discard support to nbd

2012-08-29 Thread paul . clements
Description: This patch adds discard support to nbd. When the nbd client
system receives a discard request, this will be passed along to the nbd
server system, where the nbd-server will respond by performing:
fallocate(.. FALLOC_FL_PUNCH_HOLE ..)

To punch a hole in the backend storage, which is no longer needed.

Signed-off-by: Paul Clements 
---
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index c544bb4..a014169 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -98,6 +98,7 @@ static const char *nbdcmd_to_ascii(int cmd)
case  NBD_CMD_READ: return "read";
case NBD_CMD_WRITE: return "write";
case  NBD_CMD_DISC: return "disconnect";
+   case  NBD_CMD_TRIM: return "trim/discard";
}
return "invalid";
 }
@@ -461,7 +462,11 @@ static void nbd_handle_req(struct nbd_device *nbd, struct 
request *req)
 
nbd_cmd(req) = NBD_CMD_READ;
if (rq_data_dir(req) == WRITE) {
-   nbd_cmd(req) = NBD_CMD_WRITE;
+   if ((req->cmd_flags & REQ_DISCARD)) {
+   WARN_ON(!(nbd->flags & NBD_FLAG_SEND_TRIM));
+   nbd_cmd(req) = NBD_CMD_TRIM;
+   } else
+   nbd_cmd(req) = NBD_CMD_WRITE;
if (nbd->flags & NBD_FLAG_READ_ONLY) {
dev_err(disk_to_dev(nbd->disk),
"Write on read-only\n");
@@ -667,6 +672,10 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
nbd_device *nbd,
 
	mutex_unlock(&nbd->tx_lock);
 
+   if (nbd->flags & NBD_FLAG_SEND_TRIM)
+   queue_flag_set_unlocked(QUEUE_FLAG_DISCARD,
+   nbd->disk->queue);
+
thread = kthread_create(nbd_thread, nbd, nbd->disk->disk_name);
if (IS_ERR(thread)) {
	mutex_lock(&nbd->tx_lock);
@@ -684,6 +693,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
nbd_device *nbd,
nbd->file = NULL;
nbd_clear_que(nbd);
dev_warn(disk_to_dev(nbd->disk), "queue cleared\n");
+   queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
if (file)
fput(file);
nbd->bytesize = 0;
@@ -802,6 +812,9 @@ static int __init nbd_init(void)
 * Tell the block layer that we are not a rotational device
 */
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, disk->queue);
+   disk->queue->limits.discard_granularity = 512;
+   disk->queue->limits.max_discard_sectors = UINT_MAX;
+   disk->queue->limits.discard_zeroes_data = 0;
}
 
if (register_blkdev(NBD_MAJOR, "nbd")) {
diff --git a/include/linux/nbd.h b/include/linux/nbd.h
index bb349be..3b49a63 100644
--- a/include/linux/nbd.h
+++ b/include/linux/nbd.h
@@ -32,12 +32,16 @@
 enum {
NBD_CMD_READ = 0,
NBD_CMD_WRITE = 1,
-   NBD_CMD_DISC = 2
+   NBD_CMD_DISC = 2,
+   /* there is a gap here to match userspace */
+   NBD_CMD_TRIM = 4
 };
 
 /* values for flags field */
 #define NBD_FLAG_HAS_FLAGS (1 << 0)
 #define NBD_FLAG_READ_ONLY (1 << 1)
+/* there is a gap here to match userspace */
+#define NBD_FLAG_SEND_TRIM (1 << 5) /* send trim/discard */
 
 #define nbd_cmd(req) ((req)->cmd[0])
 
--


[PATCH 0/2] [RESEND] add discard support to nbd

2012-08-29 Thread paul . clements
[ ... resend, as my mailer garbled the last post ... ]

This patchset adds discard request support to nbd. This should be good
for inclusion in next.

The first patch adds a set-flags ioctl, allowing various option flags
to be set on an nbd device. One of the new flags tells the nbd client
to send discard requests to the server.
The second patch adds handling of discard requests to nbd when
NBD_FLAG_SEND_TRIM is set.

Thanks,
Paul
--


[PATCH 1/2] [RESEND] add discard support to nbd

2012-08-29 Thread paul . clements
Description: This patch adds a set-flags ioctl, allowing various option
flags to be set on an nbd device.

Signed-off-by: Paul Clements 
---
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index d07c9f7..c544bb4 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -78,6 +78,8 @@ static const char *ioctl_cmd_to_ascii(int cmd)
case NBD_SET_SOCK: return "set-sock";
case NBD_SET_BLKSIZE: return "set-blksize";
case NBD_SET_SIZE: return "set-size";
+   case NBD_SET_TIMEOUT: return "set-timeout";
+   case NBD_SET_FLAGS: return "set-flags";
case NBD_DO_IT: return "do-it";
case NBD_CLEAR_SOCK: return "clear-sock";
case NBD_CLEAR_QUE: return "clear-que";
@@ -460,7 +462,7 @@ static void nbd_handle_req(struct nbd_device *nbd, struct 
request *req)
nbd_cmd(req) = NBD_CMD_READ;
if (rq_data_dir(req) == WRITE) {
nbd_cmd(req) = NBD_CMD_WRITE;
-   if (nbd->flags & NBD_READ_ONLY) {
+   if (nbd->flags & NBD_FLAG_READ_ONLY) {
dev_err(disk_to_dev(nbd->disk),
"Write on read-only\n");
goto error_out;
@@ -642,6 +644,10 @@ static int __nbd_ioctl(struct block_device *bdev, struct 
nbd_device *nbd,
nbd->xmit_timeout = arg * HZ;
return 0;
 
+   case NBD_SET_FLAGS:
+   nbd->flags = arg;
+   return 0;
+
case NBD_SET_SIZE_BLOCKS:
nbd->bytesize = ((u64) arg) * nbd->blksize;
bdev->bd_inode->i_size = nbd->bytesize;
diff --git a/include/linux/nbd.h b/include/linux/nbd.h
index d146ca1..bb349be 100644
--- a/include/linux/nbd.h
+++ b/include/linux/nbd.h
@@ -27,6 +27,7 @@
 #define NBD_SET_SIZE_BLOCKS	_IO( 0xab, 7 )
 #define NBD_DISCONNECT  _IO( 0xab, 8 )
 #define NBD_SET_TIMEOUT _IO( 0xab, 9 )
+#define NBD_SET_FLAGS   _IO( 0xab, 10)
 
 enum {
NBD_CMD_READ = 0,
@@ -34,6 +35,10 @@ enum {
NBD_CMD_DISC = 2
 };
 
+/* values for flags field */
+#define NBD_FLAG_HAS_FLAGS (1 << 0)
+#define NBD_FLAG_READ_ONLY (1 << 1)
+
 #define nbd_cmd(req) ((req)->cmd[0])
 
 /* userspace doesn't need the nbd_device structure */
@@ -42,10 +47,6 @@ enum {
 #include <linux/wait.h>
 #include <linux/mutex.h>
 
-/* values for flags field */
-#define NBD_READ_ONLY 0x0001
-#define NBD_WRITE_NOCHK 0x0002
-
 struct request;
 
 struct nbd_device {
--


[PATCH 2/2] add discard support to nbd

2012-08-28 Thread Paul Clements



nbd-trim-discard-support.diff
Description: Binary data


[PATCH 1/2] add discard support to nbd

2012-08-28 Thread Paul Clements



nbd-set-flags-ioctl.diff
Description: Binary data


[PATCH 0/2] add discard support to nbd

2012-08-28 Thread Paul Clements
This patchset adds discard request support to nbd. This should be good
for inclusion in next.

The first patch adds a set-flags ioctl, allowing various option flags
to be set on an nbd device. One of the new flags tells the nbd client
to send discard requests to the server.
The second patch adds handling of discard requests to nbd when
NBD_FLAG_SEND_TRIM is set.

Thanks,
Paul
--


Re: [PATCH 1/1] NBD: make nbd default to deadline I/O scheduler

2008-02-09 Thread Paul Clements


Take 3...this should address all the issues.




NBD doesn't work well with CFQ (or AS) schedulers, so let's default to something
else.

The two problems I have experienced with nbd and cfq are:
 
1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been fixed
 
There's a similar debian bug that has been filed as well:
 
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638

There have been posts to nbd-general mailing list about problems with 
cfq and nbd also.
 
2) nbd performs about 10% better (the last time I tested) with deadline 
vs. cfq (the overhead of cfq doesn't provide much advantage to nbd [not 
being a real disk], and you end up going through the I/O scheduler on 
the nbd server anyway, so it makes sense that deadline is better with nbd)
 

Signed-Off-By: Paul Clements <[EMAIL PROTECTED]>

--- ./drivers/block/nbd.c.max_nbd_killed	2008-02-07 16:46:24.0 -0500
+++ ./drivers/block/nbd.c	2008-02-09 08:14:18.0 -0500
@@ -654,6 +654,7 @@ static int __init nbd_init(void)
 
 	for (i = 0; i < nbds_max; i++) {
 		struct gendisk *disk = alloc_disk(1);
+		elevator_t *old_e;
 		if (!disk)
 			goto out;
 		nbd_dev[i].disk = disk;
@@ -667,6 +668,11 @@ static int __init nbd_init(void)
 			put_disk(disk);
 			goto out;
 		}
+		old_e = disk->queue->elevator;
+		if (elevator_init(disk->queue, "deadline") == 0 ||
+			elevator_init(disk->queue, "noop") == 0) {
+			elevator_exit(old_e);
+		}
 	}
 
 	if (register_blkdev(NBD_MAJOR, "nbd")) {


Re: [PATCH 1/1] NBD: make nbd default to deadline I/O scheduler

2008-02-08 Thread Paul Clements

Jens Axboe wrote:

On Fri, Feb 08 2008, Jens Axboe wrote:

On Fri, Feb 08 2008, Paul Clements wrote:

Andrew Morton wrote:
On Fri, 8 Feb 2008 09:33:41 -0800 Randy Dunlap <[EMAIL PROTECTED]> 
wrote:



On Fri, 08 Feb 2008 11:47:42 -0500 Paul Clements wrote:

There have been numerous reports of problems with nbd and cfq. Deadline 
gives better performance for nbd, anyway, so let's use it by default.

Please define "problems".  If it's just "slowness" then we can live with
that, but I'd hope that Jens is aware and that it's understood.

It it's "hangs" or "oopses" then we panic.

The two problems I have experienced (which may already be fixed):

1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been fixed

There's a similar debian bug that has been filed as well:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638

2) nbd performs about 10% better (the last time I tested) with deadline 
vs. cfq (the overhead of cfq doesn't provide much advantage to nbd [not 
being a real disk], and you end up going through the I/O scheduler on 
the nbd server anyway, so it makes sense that deadline is better with nbd)


There have been posts to nbd-general mailing list about problems with 
cfq and nbd also.

I'm fine with that, it's one of those things we'll do automatically when
we have some sort of disk profile system setup. Devices without seek
penalties should not use AS or CFQ.

Asking for a non-existing elevator is not an issue, but it may trigger
both printks and a switch to another elevator. So if you ask for
"deadline" and it's modular, you'll get cfq again if it's the default.

Your patch looks bad though, you forget to exit the old elevator. And
you don't check the return value of elevator_init().

All in all, your patch definitely needs more work before it can be
included.


Thanks Jens. This one should be better.

--
Paul
NBD doesn't work well with CFQ (or AS) schedulers, so let's default to something
else.

The two problems I have experienced with nbd and cfq are:
 
1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been fixed
 
There's a similar debian bug that has been filed as well:
 
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638
 
2) nbd performs about 10% better (the last time I tested) with deadline 
vs. cfq (the overhead of cfq doesn't provide much advantage to nbd [not 
being a real disk], and you end up going through the I/O scheduler on 
the nbd server anyway, so it makes sense that deadline is better with nbd)
 
There have been posts to nbd-general mailing list about problems with 
cfq and nbd also.

Signed-Off-By: Paul Clements <[EMAIL PROTECTED]>

--- ./drivers/block/nbd.c.max_nbd_killed	2008-02-07 16:46:24.0 -0500
+++ ./drivers/block/nbd.c	2008-02-08 16:13:01.0 -0500
@@ -667,6 +667,12 @@ static int __init nbd_init(void)
 			put_disk(disk);
 			goto out;
 		}
+		if (elevator_init(disk->queue, "deadline") != 0) {
+			if (elevator_init(disk->queue, "noop") != 0) {
+				put_disk(disk);
+				goto out;
+			}
+		}
 	}
 
 	if (register_blkdev(NBD_MAJOR, "nbd")) {


Re: [PATCH 1/1] NBD: make nbd default to deadline I/O scheduler

2008-02-08 Thread Paul Clements

Andrew Morton wrote:

On Fri, 8 Feb 2008 09:33:41 -0800 Randy Dunlap <[EMAIL PROTECTED]> wrote:


On Fri, 08 Feb 2008 11:47:42 -0500 Paul Clements wrote:

There have been numerous reports of problems with nbd and cfq. Deadline 
gives better performance for nbd, anyway, so let's use it by default.


Please define "problems".  If it's just "slowness" then we can live with
that, but I'd hope that Jens is aware and that it's understood.

It it's "hangs" or "oopses" then we panic.


The two problems I have experienced (which may already be fixed):

1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been fixed

There's a similar debian bug that has been filed as well:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638

2) nbd performs about 10% better (the last time I tested) with deadline 
vs. cfq (the overhead of cfq doesn't provide much advantage to nbd [not 
being a real disk], and you end up going through the I/O scheduler on 
the nbd server anyway, so it makes sense that deadline is better with nbd)


There have been posts to nbd-general mailing list about problems with 
cfq and nbd also.


--
Paul

--


[PATCH 1/1] NBD: make nbd default to deadline I/O scheduler

2008-02-08 Thread Paul Clements
There have been numerous reports of problems with nbd and cfq. Deadline 
gives better performance for nbd, anyway, so let's use it by default.


--
Paul
There have been numerous reports of problems with nbd and cfq. Deadline gives better performance for nbd, anyway, so let's use it by default.

Signed-Off-By: Paul Clements <[EMAIL PROTECTED]>

--- ./drivers/block/nbd.c.max_nbd_killed	2008-02-07 16:46:24.0 -0500
+++ ./drivers/block/nbd.c	2008-02-08 11:38:47.0 -0500
@@ -667,6 +667,7 @@ static int __init nbd_init(void)
 			put_disk(disk);
 			goto out;
 		}
+		elevator_init(disk->queue, "deadline");
 	}
 
 	if (register_blkdev(NBD_MAJOR, "nbd")) {


Re: [PATCH 1/1] NBD: raise max number of nbd devices to 1024

2008-01-30 Thread Paul Clements

Andrew Morton wrote:

On Tue, 29 Jan 2008 21:04:05 -0500 Paul Clements <[EMAIL PROTECTED]> wrote:


Andrew Morton wrote:




much nicer?

OK, yes, I was going for quick and easy, but you've got a point...

We do need to move the kcalloc into nbd_init instead of nbd_cleanup, 
though -- that works a little better. Patch is attached. Thanks for the 
suggestion.


--
Paul


[nbd_max_nbd_killed.diff  text/x-patch (1.2KB)]

Signed-Off-By: Paul Clements <[EMAIL PROTECTED]>

--- ./include/linux/nbd.h.TIMEOUT   2007-08-22 13:18:12.0 -0400
+++ ./include/linux/nbd.h   2008-01-29 20:01:33.0 -0500


Could we have a complete changelog please?  As I mentioned in the
earlier email?




[PATCH 1/1] NBD: remove limit on max number of nbd devices

Remove the arbitrary 128 device limit for NBD. nbds_max can now be set to
any number. In certain scenarios where devices are used sparsely we
have run into the 128 device limit.

Signed-Off-By: Paul Clements <[EMAIL PROTECTED]>

--- ./include/linux/nbd.h.TIMEOUT	2007-08-22 13:18:12.0 -0400
+++ ./include/linux/nbd.h	2008-01-29 20:01:33.0 -0500
@@ -35,7 +35,6 @@ enum {
 };
 
 #define nbd_cmd(req) ((req)->cmd[0])
-#define MAX_NBD 128
 
 /* userspace doesn't need the nbd_device structure */
 #ifdef __KERNEL__
--- ./drivers/block/nbd.c.ORIG	2008-01-29 20:01:12.0 -0500
+++ ./drivers/block/nbd.c	2008-01-29 20:08:30.0 -0500
@@ -53,7 +53,7 @@ static unsigned int debugflags;
 #endif /* NDEBUG */
 
 static unsigned int nbds_max = 16;
-static struct nbd_device nbd_dev[MAX_NBD];
+static struct nbd_device *nbd_dev;
 
 /*
  * Use just one lock (or at most 1 per NIC). Two arguments for this:
@@ -659,11 +659,9 @@ static int __init nbd_init(void)
 
 	BUILD_BUG_ON(sizeof(struct nbd_request) != 28);
 
-	if (nbds_max > MAX_NBD) {
-		printk(KERN_CRIT "nbd: cannot allocate more than %u nbds; %u requested.\n", MAX_NBD,
-nbds_max);
-		return -EINVAL;
-	}
+	nbd_dev = kcalloc(nbds_max, sizeof(*nbd_dev), GFP_KERNEL);
+	if (!nbd_dev)
+		return -ENOMEM;
 
 	for (i = 0; i < nbds_max; i++) {
 		struct gendisk *disk = alloc_disk(1);


Re: [PATCH 1/1] NBD: raise max number of nbd devices to 1024

2008-01-29 Thread Paul Clements

Andrew Morton wrote:




much nicer?


OK, yes, I was going for quick and easy, but you've got a point...

We do need to move the kcalloc into nbd_init instead of nbd_cleanup, 
though -- that works a little better. Patch is attached. Thanks for the 
suggestion.


--
Paul

Signed-Off-By: Paul Clements <[EMAIL PROTECTED]>

--- ./include/linux/nbd.h.TIMEOUT	2007-08-22 13:18:12.0 -0400
+++ ./include/linux/nbd.h	2008-01-29 20:01:33.0 -0500
@@ -35,7 +35,6 @@ enum {
 };
 
 #define nbd_cmd(req) ((req)->cmd[0])
-#define MAX_NBD 128
 
 /* userspace doesn't need the nbd_device structure */
 #ifdef __KERNEL__
--- ./drivers/block/nbd.c.ORIG	2008-01-29 20:01:12.0 -0500
+++ ./drivers/block/nbd.c	2008-01-29 20:08:30.0 -0500
@@ -53,7 +53,7 @@ static unsigned int debugflags;
 #endif /* NDEBUG */
 
 static unsigned int nbds_max = 16;
-static struct nbd_device nbd_dev[MAX_NBD];
+static struct nbd_device *nbd_dev;
 
 /*
  * Use just one lock (or at most 1 per NIC). Two arguments for this:
@@ -659,11 +659,9 @@ static int __init nbd_init(void)
 
 	BUILD_BUG_ON(sizeof(struct nbd_request) != 28);
 
-	if (nbds_max > MAX_NBD) {
-		printk(KERN_CRIT "nbd: cannot allocate more than %u nbds; %u requested.\n", MAX_NBD,
-nbds_max);
-		return -EINVAL;
-	}
+	nbd_dev = kcalloc(nbds_max, sizeof(*nbd_dev), GFP_KERNEL);
+	if (!nbd_dev)
+		return -ENOMEM;
 
 	for (i = 0; i < nbds_max; i++) {
 		struct gendisk *disk = alloc_disk(1);


[PATCH 1/1] NBD: raise max number of nbd devices to 1024

2008-01-29 Thread Paul Clements

The limit was 128. This changes it to 1024.

--
Paul

Signed-Off-By: Paul Clements <[EMAIL PROTECTED]>

--- ./include/linux/nbd.h.TIMEOUT	2007-08-22 13:18:12.0 -0400
+++ ./include/linux/nbd.h	2008-01-29 14:35:17.0 -0500
@@ -35,7 +35,7 @@ enum {
 };
 
 #define nbd_cmd(req) ((req)->cmd[0])
-#define MAX_NBD 128
+#define MAX_NBD 1024
 
 /* userspace doesn't need the nbd_device structure */
 #ifdef __KERNEL__


Re: [PATCH 2/2] NBD: allow hung network I/O to be cancelled

2007-08-24 Thread Paul Clements

Mike Snitzer wrote:

On 8/24/07, Paul Clements <[EMAIL PROTECTED]> wrote:

This patch allows NBD I/O to be cancelled when a network outage occurs.
Previously, I/O would just hang, and if enough I/O was hung in nbd, the
system (at least user-level) would completely hang until a TCP timeout
(default, 15 minutes) occurred.

The patch introduces a new ioctl NBD_SET_TIMEOUT that allows a transmit
timeout value (in seconds) to be specified. Any network send that
exceeds the timeout will be cancelled and the nbd connection will be
shut down. I've tested with various timeout values and 6 seconds seems
to be a good choice for the timeout. If the NBD_SET_TIMEOUT ioctl is not
called, you get the old (I/O hang) behavior.


Hi Paul,

Thanks for implementing this!  Do you happen to have an associated
nbd-client patch for userspace?  If not I'd be happy to coordinate
with you and Wouter on a patch.


No, I don't. I just basically hardcoded my nbd-client to do a 6 second 
timeout by default, but Wouter will probably want to do something a 
little less hackish for the official nbd-client.


--
Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] NBD: allow hung network I/O to be cancelled

2007-08-24 Thread Paul Clements
This patch allows NBD I/O to be cancelled when a network outage occurs. 
Previously, I/O would just hang, and if enough I/O was hung in nbd, the 
system (at least user-level) would completely hang until a TCP timeout 
(default, 15 minutes) occurred.


The patch introduces a new ioctl NBD_SET_TIMEOUT that allows a transmit 
timeout value (in seconds) to be specified. Any network send that 
exceeds the timeout will be cancelled and the nbd connection will be 
shut down. I've tested with various timeout values and 6 seconds seems 
to be a good choice for the timeout. If the NBD_SET_TIMEOUT ioctl is not 
called, you get the old (I/O hang) behavior.
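
For reference, a minimal sketch of how a client might arm the new timeout from userspace once the device is otherwise set up (NBD_SET_SOCK etc.); the 6-second value and /dev/nbd0 are only examples, and NBD_SET_TIMEOUT is assumed to be visible via <linux/nbd.h> on a kernel carrying this patch:

/* Illustrative only: arm the transmit timeout on an already-configured
 * nbd device.  The 6-second value and /dev/nbd0 are just examples. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nbd.h>

int main(void)
{
	int fd = open("/dev/nbd0", O_RDWR);

	if (fd < 0) {
		perror("open /dev/nbd0");
		return 1;
	}
	/* Timeout is given in seconds; the driver converts it to jiffies. */
	if (ioctl(fd, NBD_SET_TIMEOUT, 6UL) < 0) {
		perror("NBD_SET_TIMEOUT");
		close(fd);
		return 1;
	}
	close(fd);
	return 0;
}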


Patch applies and was tested against 2.6.23-rc2-mm2.

Thanks,
Paul

Signed-Off-By: Paul Clements <[EMAIL PROTECTED]>

--- ./drivers/block/nbd.c.SIZE_ZERO	2007-08-16 13:08:06.0 -0400
+++ ./drivers/block/nbd.c	2007-08-23 16:42:09.0 -0400
@@ -113,12 +113,42 @@ static void nbd_end_request(struct reque
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 
+static void sock_shutdown(struct nbd_device *lo, int lock)
+{
+	/* Forcibly shutdown the socket causing all listeners
+	 * to error
+	 *
+	 * FIXME: This code is duplicated from sys_shutdown, but
+	 * there should be a more generic interface rather than
+	 * calling socket ops directly here */
+	if (lock)
+		mutex_lock(&lo->tx_lock);
+	if (lo->sock) {
+		printk(KERN_WARNING "%s: shutting down socket\n",
+			lo->disk->disk_name);
+		lo->sock->ops->shutdown(lo->sock, SEND_SHUTDOWN|RCV_SHUTDOWN);
+		lo->sock = NULL;
+	}
+	if (lock)
+		mutex_unlock(&lo->tx_lock);
+}
+
+static void nbd_xmit_timeout(unsigned long arg)
+{
+	struct task_struct *task = (struct task_struct *)arg;
+
+	printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
+		task->comm, task->pid);
+	force_sig(SIGKILL, task);
+}
+
 /*
  *  Send or receive packet.
  */
-static int sock_xmit(struct socket *sock, int send, void *buf, int size,
+static int sock_xmit(struct nbd_device *lo, int send, void *buf, int size,
 		int msg_flags)
 {
+	struct socket *sock = lo->sock;
 	int result;
 	struct msghdr msg;
 	struct kvec iov;
@@ -139,9 +169,20 @@ static int sock_xmit(struct socket *sock
 		msg.msg_controllen = 0;
 		msg.msg_flags = msg_flags | MSG_NOSIGNAL;
 
-		if (send)
+		if (send) {
+			struct timer_list ti;
+
+			if (lo->xmit_timeout) {
+				init_timer(&ti);
+				ti.function = nbd_xmit_timeout;
+				ti.data = (unsigned long)current;
+				ti.expires = jiffies + lo->xmit_timeout;
+				add_timer(&ti);
+			}
 			result = kernel_sendmsg(sock, &msg, &iov, 1, size);
-		else
+			if (lo->xmit_timeout)
+				del_timer_sync(&ti);
+		} else
 			result = kernel_recvmsg(sock, &msg, &iov, 1, size, 0);
 
 		if (signal_pending(current)) {
@@ -150,6 +191,7 @@ static int sock_xmit(struct socket *sock
 current->pid, current->comm,
dequeue_signal_lock(current, &current->blocked, &info));
 			result = -EINTR;
+			sock_shutdown(lo, !send);
 			break;
 		}
 
@@ -167,23 +209,22 @@ static int sock_xmit(struct socket *sock
 	return result;
 }
 
-static inline int sock_send_bvec(struct socket *sock, struct bio_vec *bvec,
+static inline int sock_send_bvec(struct nbd_device *lo, struct bio_vec *bvec,
 		int flags)
 {
 	int result;
 	void *kaddr = kmap(bvec->bv_page);
-	result = sock_xmit(sock, 1, kaddr + bvec->bv_offset, bvec->bv_len,
-			flags);
+	result = sock_xmit(lo, 1, kaddr + bvec->bv_offset, bvec->bv_len, flags);
 	kunmap(bvec->bv_page);
 	return result;
 }
 
+/* always call with the tx_lock held */
 static int nbd_send_req(struct nbd_device *lo, struct request *req)
 {
 	int result, i, flags;
 	struct nbd_request request;
 	unsigned long size = req->nr_sectors << 9;
-	struct socket *sock = lo->sock;
 
 	request.magic = htonl(NBD_REQUEST_MAGIC);
 	request.type = htonl(nbd_cmd(req));
@@ -196,8 +237,8 @@ static int nbd_send_req(struct nbd_devic
 			nbdcmd_to_ascii(nbd_cmd(req)),
 			(unsigned long long)req->sector << 9,
 			req->nr_sectors << 9);
-	result = sock_xmit(sock, 1, &request, sizeof(request),
-			(nbd_cmd(req) == NBD_CMD_WRITE)? MSG_MORE: 0);
+	result = sock_xmit(lo, 1, &request, sizeof(request),
+			(nbd_cmd(req) == NBD_CMD_WRITE) ? MSG_MORE : 0);
 	if (result <= 0) {
 		printk(KERN_ERR "%s: Send control failed (result %d)\n",
 lo->disk->disk_name, result);
@@ -219,7 +260,7 @@ static int nbd_send_req(struct nbd_devic
 dprintk(DBG_TX, "%s: request %p: sending %d bytes data\n",
 		lo->disk->disk_name, req,
 		bvec->bv_len);
-result = sock_send_bvec(sock, bvec, flags);
+result = sock_send_bvec(lo, bvec, flags);
 if (result <= 0) {
 	printk(KERN_ERR "%s: Send data failed (result %d)\n",
 			lo->disk->disk_name,
@@ -261,11 +302,11 @@ out:
 	return ERR_PTR(err);
 }
 
-static inline int sock_recv_bvec(struct socket *sock, struct bio_vec *bvec)
-static inline int sock_recv_bvec(struct socket *sock, struct bio_vec *bvec)
+static inline int sock_recv_bvec(struct nbd_device *lo, struct bio_vec *bvec)
 {
 	int result;
 	void *kaddr = kmap(bvec->bv_page);
-	result = sock_xmit(sock, 0, kaddr + bvec->bv_offset, bvec->bv_len,
+	result = sock_xmit(lo, 0, kaddr + bvec

[PATCH 1/2] NBD: set uninitialized devices to size 0

2007-08-24 Thread Paul Clements
This fixes errors with utilities (such as LVM's vgscan) that try to scan 
all devices. Previously this would generate read errors when 
uninitialized nbd devices were scanned:


# vgscan
  Reading all physical volumes.  This may take a while...
  /dev/nbd0: read failed after 0 of 1024 at 0: Input/output error
  /dev/nbd0: read failed after 0 of 1024 at 509804544: Input/output error
  /dev/nbd0: read failed after 0 of 2048 at 0: Input/output error
  /dev/nbd1: read failed after 0 of 1024 at 509804544: Input/output error
  /dev/nbd1: read failed after 0 of 2048 at 0: Input/output error


From now on, uninitialized nbd devices will have size zero, which 
prevents these errors.


Patch applies and was tested against 2.6.23-rc2-mm2.

Thanks,
Paul

Signed-Off-By: Paul Clements <[EMAIL PROTECTED]>

--- ./drivers/block/nbd.c.ERROR_RETURNS	2007-08-15 15:41:19.0 -0400
+++ ./drivers/block/nbd.c	2007-08-16 13:08:06.0 -0400
@@ -589,6 +589,9 @@ static int nbd_ioctl(struct inode *inode
 		printk(KERN_WARNING "%s: queue cleared\n", lo->disk->disk_name);
 		if (file)
 			fput(file);
+		lo->bytesize = 0;
+		inode->i_bdev->bd_inode->i_size = 0;
+		set_capacity(lo->disk, 0);
 		return lo->harderror;
 	case NBD_CLEAR_QUE:
 		/*
@@ -666,14 +669,14 @@ static int __init nbd_init(void)
 		mutex_init(&nbd_dev[i].tx_lock);
 		init_waitqueue_head(&nbd_dev[i].active_wq);
 		nbd_dev[i].blksize = 1024;
-		nbd_dev[i].bytesize = 0x7ffffc00ULL << 10; /* 2TB */
+		nbd_dev[i].bytesize = 0;
 		disk->major = NBD_MAJOR;
 		disk->first_minor = i;
 		disk->fops = _fops;
 		disk->private_data = _dev[i];
 		disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO;
 		sprintf(disk->disk_name, "nbd%d", i);
-		set_capacity(disk, 0x7ffffc00ULL << 1); /* 2 TB */
+		set_capacity(disk, 0);
 		add_disk(disk);
 	}
 


Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])

2007-08-12 Thread Paul Clements

Iustin Pop wrote:

On Sun, Aug 12, 2007 at 07:03:44PM +0200, Jan Engelhardt wrote:

On Aug 12 2007 09:39, [EMAIL PROTECTED] wrote:

now, I am not an expert on either option, but there are a couple of things that I
would question about the DRBD+MD option

1. when the remote machine is down, how does MD deal with it for reads and
writes?

I suppose it kicks the drive and you'd have to re-add it by hand unless done by
a cronjob.


Yes, and with a bitmap configured on the raid1, you just resync the 
blocks that have been written while the connection was down.




From my tests, since NBD doesn't have a timeout option, MD hangs in the
write to that mirror indefinitely, somewhat like when dealing with a
broken IDE driver/chipset/disk.


Well, if people would like to see a timeout option, I actually coded up 
a patch a couple of years ago to do just that, but I never got it into 
mainline because you can do almost as well by doing a check at 
user-level (I basically ping the nbd connection periodically and if it 
fails, I kill -9 the nbd-client).
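
For what it's worth, a rough sketch of that sort of user-level check; the server address and port, the 10-second interval, and reading the client PID from /sys/block/nbd0/pid are all illustrative assumptions rather than part of any shipped nbd-client:

/* Rough watchdog sketch: every few seconds try to reach the nbd server's
 * TCP port and, if that fails, kill -9 the nbd-client attached to the
 * device.  Address, port, interval and the pid sysfs path are examples. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

static int server_alive(const char *ip, unsigned short port)
{
	struct sockaddr_in sa;
	struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };
	int fd = socket(AF_INET, SOCK_STREAM, 0), ok;

	if (fd < 0)
		return 0;
	/* bound how long the probe connect() may block */
	setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
	setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(port);
	inet_pton(AF_INET, ip, &sa.sin_addr);
	ok = connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0;
	close(fd);
	return ok;
}

static void kill_nbd_client(void)
{
	FILE *f = fopen("/sys/block/nbd0/pid", "r");
	long pid;

	if (f && fscanf(f, "%ld", &pid) == 1 && pid > 0)
		kill((pid_t)pid, SIGKILL);	/* sever the hung connection */
	if (f)
		fclose(f);
}

int main(void)
{
	for (;;) {
		if (!server_alive("192.168.1.2", 2000))
			kill_nbd_client();
		sleep(10);
	}
}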




2. MD over local drive will alternate reads between mirrors (or so I've been
told), doing so over the network is wrong.

Certainly. In which case you set "write_mostly" (or even write_only, not sure
of its name) on the raid component that is nbd.


3. when writing, will MD wait for the network I/O to get the data saved on the
backup before returning from the syscall? or can it sync the data out lazily

Can't answer this one - ask Neil :)


MD has the write-mostly/write-behind options - which help in this case
but only up to a certain amount.


You can configure write_behind (aka, asynchronous writes) to buffer as 
much data as you have RAM to hold. At a certain point, presumably, you'd 
want to just break the mirror and take the hit of doing a resync once 
your network leg falls too far behind.


--
Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] nbd: change a parameter's type to remove a memcpy call

2007-07-19 Thread Paul Clements

Denis Cheng wrote:

this memcpy looks so strange, in fact it's merely a pointer dereference,
so I change the parameter's type to refer it more directly,
this could make the memcpy not needed anymore.

in the function nbd_read_stat where nbd_find_request is only once called,
the parameter served should be transformed accordingly.


This is really a matter of preference. The generated code ends up being 
about the same, I think, while your patch makes the call to 
nbd_find_request kind of obtuse. Also, the memcpy's are balanced between 
send_req and find_request, so you can quickly see how the data is being 
transferred (from req into handle, and then back again). Your patch 
makes this less clear, at least to me.
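
To make the handle round-trip concrete, here is a small self-contained illustration of the pattern being discussed; a dummy struct stands in for struct request, and the 8-byte buffer mirrors the handle field in the nbd wire request:

/* Illustrative only: the request pointer is copied into the 8-byte nbd
 * handle on send and copied back out when the reply arrives.  A dummy
 * struct stands in for struct request here. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct request { int dummy; };

int main(void)
{
	struct request rq = { 42 };
	struct request *req = &rq, *xreq = NULL;
	char handle[8] = { 0 };	/* nbd_request.handle is 8 bytes on the wire */

	/* send side: stash the pointer in the outgoing handle */
	memcpy(handle, &req, sizeof(req));

	/* reply side: recover the pointer from the returned handle */
	memcpy(&xreq, handle, sizeof(xreq));

	assert(xreq == req);
	printf("recovered request %p (dummy=%d)\n", (void *)xreq, xreq->dummy);
	return 0;
}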


--
Paul



Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>
---
 drivers/block/nbd.c |7 ++-
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 86639c0..a4d8508 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -235,14 +235,11 @@ error_out:
return 1;
 }
 
-static struct request *nbd_find_request(struct nbd_device *lo, char *handle)

+static struct request *nbd_find_request(struct nbd_device *lo, struct request 
*xreq)
 {
struct request *req, *n;
-   struct request *xreq;
int err;
 
-	memcpy(&xreq, handle, sizeof(xreq));

-
err = wait_event_interruptible(lo->active_wq, lo->active_req != xreq);
if (unlikely(err))
goto out;
@@ -297,7 +294,7 @@ static struct request *nbd_read_stat(struct nbd_device *lo)
goto harderror;
}
 
-	req = nbd_find_request(lo, reply.handle);

+   req = nbd_find_request(lo, *(struct request **)reply.handle);
if (unlikely(IS_ERR(req))) {
result = PTR_ERR(req);
if (result != -ENOENT)


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] nbd: use list_for_each_entry_safe to make it more consolidated and readable

2007-07-19 Thread Paul Clements

Denis Cheng wrote:

Thus the traverse of the loop may delete nodes, use the safe version.

Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>
---
 drivers/block/nbd.c |6 ++
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index c129510..86639c0 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -237,8 +237,7 @@ error_out:
 
 static struct request *nbd_find_request(struct nbd_device *lo, char *handle)

 {
-   struct request *req;
-   struct list_head *tmp;
+   struct request *req, *n;
struct request *xreq;
int err;
 
@@ -249,8 +248,7 @@ static struct request *nbd_find_request(struct nbd_device *lo, char *handle)

goto out;
 
 	spin_lock(&lo->queue_lock);

-   list_for_each(tmp, &lo->queue_head) {
-   req = list_entry(tmp, struct request, queuelist);
+   list_for_each_entry_safe(req, n, &lo->queue_head, queuelist) {
if (req != xreq)
continue;
list_del_init(&req->queuelist);


Could you name "n" as "tmp" (as in the previous code) so that it's clear 
that's only a temporary variable. Other than that, this looks good.
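
As a userspace analogue of why the _safe variant matters when the loop body may delete the current node (using a hand-rolled list rather than the kernel's list.h):

/* Illustrative only: when the body removes the current node, the next
 * pointer must be cached before the body runs -- which is what
 * list_for_each_entry_safe() does for kernel lists. */
#include <stdio.h>
#include <stdlib.h>

struct node {
	int val;
	struct node *next;
};

int main(void)
{
	struct node *head = NULL, *cur, *tmp;
	int i;

	for (i = 3; i >= 1; i--) {	/* build the list 1 -> 2 -> 3 */
		cur = malloc(sizeof(*cur));
		cur->val = i;
		cur->next = head;
		head = cur;
	}

	/* Deleting while walking: cache next *before* the body, since cur
	 * is gone once it has been unlinked and freed. */
	for (cur = head; cur; cur = tmp) {
		tmp = cur->next;
		printf("removing %d\n", cur->val);
		free(cur);
	}
	head = NULL;
	return 0;
}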


Thanks,
Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: raid1 with nbd member hangs MD on SLES10 and RHEL5

2007-06-14 Thread Paul Clements

Mike Snitzer wrote:

On 6/14/07, Paul Clements <[EMAIL PROTECTED]> wrote:

Mike Snitzer wrote:

> Here are the steps to reproduce reliably on SLES10 SP1:
> 1) establish a raid1 mirror (md0) using one local member (sdc1) and
> one remote member (nbd0)
> 2) power off the remote machine, whereby severing nbd0's connection
> 3) perform IO to the filesystem that is on the md0 device to induce
> the MD layer to mark the nbd device as "faulty"
> 4) cat /proc/mdstat hangs, sysrq trace was collected

That's working as designed. NBD works over TCP. You're going to have to
wait for TCP to time out before an error occurs. Until then I/O will 
hang.


With kernel.org 2.6.15.7 (uni-processor) I've not seen NBD hang in the
kernel like I am with RHEL5 and SLES10.  This hang (tcp timeout) is
indefinite on RHEL5 and ~5min on SLES10.

Should/can I be playing with TCP timeout values?  Why was this not a
concern with kernel.org 2.6.15.7; I was able to "feel" the nbd
connection break immediately; no MD superblock update hangs, no
longwinded (or indefinite) TCP timeout.


I don't know. I've never seen nbd immediately start returning I/O 
errors. Perhaps something was different about the configuration?
If the other machine rebooted quickly, for instance, you'd get a 
connection reset, which would kill the nbd connection.


--
Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: raid1 with nbd member hangs MD on SLES10 and RHEL5

2007-06-14 Thread Paul Clements

Mike Snitzer wrote:


Here are the steps to reproduce reliably on SLES10 SP1:
1) establish a raid1 mirror (md0) using one local member (sdc1) and
one remote member (nbd0)
2) power off the remote machine, whereby severing nbd0's connection
3) perform IO to the filesystem that is on the md0 device to induce
the MD layer to mark the nbd device as "faulty"
4) cat /proc/mdstat hangs, sysrq trace was collected


That's working as designed. NBD works over TCP. You're going to have to 
wait for TCP to time out before an error occurs. Until then I/O will hang.


--
Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: raid1 with nbd member hangs MD on SLES10 and RHEL5

2007-06-14 Thread Paul Clements

Mike Snitzer wrote:


Just a quick update; it is really starting to look like there is
definitely an issue with the nbd kernel driver.  I booted the SLES10
2.6.16.46-0.12-smp kernel with maxcpus=1 to test the theory that the
nbd SMP fix that went into 2.6.16 was in some way causing this MD/NBD
hang.  But it _still_ occurs with the 4-step process I outlined above.

The nbd0 device _should_ feel an NBD_DISCONNECT because the nbd-server
is no longer running (the node it was running on was powered off)...


What do you mean, nbd should _feel_ an NBD_DISCONNECT ?

NBD_DISCONNECT is a manual process, not an automatic one.

--
Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: raid1 with nbd member hangs MD on SLES10 and RHEL5

2007-06-14 Thread Paul Clements

Bill Davidsen wrote:

Second, AFAIK nbd hasn't been working in a while. I haven't tried it in ages, 
but was told it wouldn't work with smp and I kind of lost interest. If 
Neil thinks it should work in 2.6.21 or later I'll test it, since I have 
a machine which wants a fresh install soon, and is both backed up and 
available.


Please stop this. nbd is working perfectly fine, AFAIK. I use it every 
day, and so do 100s of our customers. What exactly is it that's not 
working? If there's a problem, please send the bug report.


Thank You,
Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] nbd.c:sock_xmit: cleanup signal related code

2007-06-14 Thread Paul Clements

Oleg Nesterov wrote:

sock_xmit() re-implements sigprocmask() and dequeue_signal_lock().


Yeah, that code was written before those existed. Thanks for the clean up.


Note: I can't understand this dequeue_signal(), it can dequeue SIGKILL
for the user-space task doing nbd_ioctl() ?


So we can interrupt an nbd transmission without waiting for a TCP 
timeout (when we know the network is down).



Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>


Patch looks good to me.

--
Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] nbd: show nbd client pid in sysfs

2006-12-05 Thread Paul Clements
This simple patch allows nbd to expose the nbd-client daemon's PID in 
/sys/block/nbd<x>/pid. This is helpful for tracking connection status of 
a device and for determining which nbd devices are currently in use.
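
As a usage sketch, something like the following can walk the devices and report which ones have a client attached; the count of 16 devices matches the driver's default nbds_max and is only an assumption here:

/* Illustrative only: report which nbd devices currently have an
 * nbd-client attached, using the new pid attribute.  The device count
 * of 16 is just the driver's default. */
#include <stdio.h>

int main(void)
{
	char path[64];
	int i;

	for (i = 0; i < 16; i++) {
		FILE *f;
		long pid;

		snprintf(path, sizeof(path), "/sys/block/nbd%d/pid", i);
		f = fopen(path, "r");
		if (!f)
			continue;	/* no pid file: device not in use */
		if (fscanf(f, "%ld", &pid) == 1)
			printf("nbd%d attached, nbd-client pid %ld\n", i, pid);
		fclose(f);
	}
	return 0;
}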


Tested against 2.6.19.

Thanks,
Paul
--- ./drivers/block/nbd.c   Wed Nov 29 16:57:37 2006
+++ ./drivers/block/nbd.c   Tue Nov 28 16:09:31 2006
@@ -355,14 +389,30 @@ harderror:
return NULL;
 }
 
+static ssize_t pid_show(struct gendisk *disk, char *page)
+{
+   return sprintf(page, "%ld\n",
+   (long) ((struct nbd_device *)disk->private_data)->pid);
+}
+
+static struct disk_attribute pid_attr = {
+   .attr = { .name = "pid", .mode = S_IRUGO },
+   .show = pid_show,
+};
+   
 static void nbd_do_it(struct nbd_device *lo)
 {
struct request *req;
 
BUG_ON(lo->magic != LO_MAGIC);
 
+   lo->pid = current->pid;
+   sysfs_create_file(&lo->disk->kobj, &pid_attr.attr);
+
while ((req = nbd_read_stat(lo)) != NULL)
nbd_end_request(req);
+
+   sysfs_remove_file(&lo->disk->kobj, &pid_attr.attr);
return;
 }
 
--- ./include/linux/nbd.h   Wed Nov 29 16:57:37 2006
+++ ./include/linux/nbd.h   Mon Dec  4 23:28:30 2006
@@ -64,6 +64,7 @@ struct nbd_device {
struct gendisk *disk;
int blksize;
u64 bytesize;
+   pid_t pid; /* pid of nbd-client, if attached */
 };
 
 #endif


[PATCH 2.4.29] nbd: fix ioctl permissions

2005-01-23 Thread Paul Clements
Hi Marcelo,
Here's a patch for nbd that Rogier recently sent me. It allows non-root 
to do BLKGETSIZE, et al. on nbd devices, which he needs for his data 
recovery applications.

Tested against 2.4.29. Please apply.
Thanks,
Paul
From: Rogier Wolff <[EMAIL PROTECTED]>
Signed-Off-By: Paul Clements <[EMAIL PROTECTED]>

Description: We shouldn't need CAP_SYS_ADMIN to ask for disk capacity and such.
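
For context, a small sketch of the kind of unprivileged query this is meant to allow; /dev/nbd0 is just an example device, and the node must of course still be readable by the caller:

/* Illustrative only: ask an nbd device for its size via BLKGETSIZE64
 * without CAP_SYS_ADMIN. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* BLKGETSIZE64 */
#include <unistd.h>

int main(void)
{
	unsigned long long bytes = 0;
	int fd = open("/dev/nbd0", O_RDONLY);

	if (fd < 0) {
		perror("open /dev/nbd0");
		return 1;
	}
	if (ioctl(fd, BLKGETSIZE64, &bytes) < 0) {
		perror("BLKGETSIZE64");
		close(fd);
		return 1;
	}
	printf("/dev/nbd0: %llu bytes\n", bytes);
	close(fd);
	return 0;
}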

===

diff -ur linux-2.4.28.clean/drivers/block/nbd.c 
linux-2.4.28.nbd-fix/drivers/block/nbd.c
--- linux-2.4.28.clean/drivers/block/nbd.c  Wed Jan 19 18:14:01 2005
+++ linux-2.4.28.nbd-fix/drivers/block/nbd.cWed Jan 19 16:36:59 2005
@@ -408,10 +408,7 @@
int dev, error, temp;
struct request sreq ;
 
-   /* Anyone capable of this syscall can do *real bad* things */
 
-   if (!capable(CAP_SYS_ADMIN))
-   return -EPERM;
if (!inode)
return -EINVAL;
dev = MINOR(inode->i_rdev);
@@ -419,6 +416,20 @@
return -ENODEV;
 
lo = &nbd_dev[dev];
+
+   /* these are innocent, but */
+   switch (cmd) {
+   case BLKGETSIZE:
+   return put_user(nbd_bytesizes[dev] >> 9, (unsigned long *) arg);
+   case BLKGETSIZE64:
+   return put_user((u64)nbd_bytesizes[dev], (u64 *) arg);
+   }
+
+   /* ... anyone capable of any of the below ioctls can do *real bad* 
+  things */
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
switch (cmd) {
case NBD_DISCONNECT:
printk("NBD_DISCONNECT\n");
@@ -524,10 +535,6 @@
   dev, lo->queue_head.next, lo->queue_head.prev, 
requests_in, requests_out);
return 0;
 #endif
-   case BLKGETSIZE:
-   return put_user(nbd_bytesizes[dev] >> 9, (unsigned long *) arg);
-   case BLKGETSIZE64:
-   return put_user((u64)nbd_bytesizes[dev], (u64 *) arg);
}
return -EINVAL;
 }



kernel panic on 2.2.14 in sg driver

2000-11-16 Thread Paul Clements


I am seeing a kernel panic on 2.2.14. It looks like 2.2.16 also has the
same problem.

Details:

I have been able to reproduce a kernel panic several times with kdb
compiled in and some added printk debug messages and I have now
pinpointed the problem. The panic occurs when the following call is made
in scsi_ioctl_send_command() ("scsi_ioctl.c", line 329):

  if(SDpnt->scsi_request_fn)
(*SDpnt->scsi_request_fn)();

I have verified that scsi_request_fn is a function pointer that points
to the do_sd_request() function ("sd.c", line 530). By adding a debug
printk() right before this call I can see that this function pointer
contains: 0x8489ab80. This is the address where the panic occurs,
labelled "?unknown?" in the stack trace below. After a panic, when I
list the instructions at that address, the code that is there is the
middle of a switch statement in the sg_ioctl() function. The function
pointer is pointing to a bogus address.

The problem appears to be due to the following: 

do_sd_request() is in the sd_mod.o module
sg_ioctl() is in sg.o
scsi_ioctl_send_command() is in scsi_mod.o 

Now sg.o and sd_mod.o have a dependency on scsi_mod.o, but sg.o and
sd_mod.o are independent of each other. I can cause the kernel to panic
by simply unloading sd_mod.o and then performing an ioctl
(SCSI_IOCTL_SEND_COMMAND) on an open sg device. 

Since sg does not depend on sd_mod, the kernel loads sg.o when the sg
device is opened, but does not know to load sd_mod.o. The call to
"scsi_request_fn" then of course causes a panic. This problem is
compounded by the fact that most modern Linux distributions (I'm running
Caldera eServer 2.3 on this box), have an /etc/crontab entry that does:
"/sbin/rmmod -a" every five minutes, so sd_mod gets autocleaned
frequently.

So I guess I have a couple questions:

Does anyone know if the SCSI drivers have been redesigned to avoid this
type of problem in the 2.4 kernel?

Does anyone have a solution or workaround to this problem? As a
workaround, I guess I could avoid the autoclean problem by doing "insmod
sd_mod" in some startup script.

Are there other instances of this type of problem elsewhere in the
kernel? 

If you need/want any more information about this, just let me know...

Thanks,
Paul



kernel stack trace:
--
?unknown? (?pointer?, arg, filp, -25, 1)
sg_ioctl (filp->f_dentry->d_inode, filp, SCSI_IOCTL_SEND_COMMAND, arg)
sys_ioctl (fd, SCSI_IOCTL_SEND_COMMAND, arg)
system_call
 
where `fd' is an open file descriptor for the scsi device
 
and `filp' is a "struct file *" corresponding to the open file
descriptor for the scsi device
 
and `arg' is the "Scsi_Ioctl_Command *" (struct scsi_ioctl_command *)
sent in from the original ioctl call, in this case containing the
following:
 
 inlen = 0
 outlen = 0
 command = { 0 (TEST UNIT READY), 6, 0, 70, 98, 0 }
 data = {empty}
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

