Re: [RFC 3/4] nbd: Create helper functions for ioctls
Markus, Comments below... On Tue, Jan 13, 2015 at 8:44 AM, Markus Pargmann wrote: > This patch prepares the nbd code for the nbd-root device patch. It > moves the ioctl code into separate functions so they can be called > directly. The patch creates nbd_set_total_size(), nbd_set_blksize(), > nbd_set_sock() and nbd_set_timeout(). > > Signed-off-by: Markus Pargmann > --- > drivers/block/nbd.c | 79 > ++--- > 1 file changed, 57 insertions(+), 22 deletions(-) > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c > index 0ff6ebbec252..11f7644be111 100644 > --- a/drivers/block/nbd.c > +++ b/drivers/block/nbd.c > @@ -596,6 +596,55 @@ static void do_nbd_request(struct request_queue *q) > } > } > > +/* > + * Call with tx_lock held > + */ > +static void nbd_set_total_size(struct block_device *bdev, > + struct nbd_device *nbd, size_t size) > +{ > + nbd->bytesize = size; > + bdev->bd_inode->i_size = size; > + set_blocksize(bdev, nbd->blksize); > + set_capacity(nbd->disk, size >> 9); > +} > + > +/* > + * Call with tx_lock held > + */ > +static void nbd_set_blksize(struct block_device *bdev, struct nbd_device > *nbd, > + size_t blksize) > +{ > + u64 new_bytesize; > + > + nbd->blksize = blksize; > + bdev->bd_inode->i_size = nbd->bytesize; > + set_blocksize(bdev, nbd->blksize); > + > + new_bytesize = nbd->bytesize & ~(nbd->blksize-1); > + if (new_bytesize != nbd->bytesize) > + nbd_set_total_size(bdev, nbd, new_bytesize); could just do set_capacity here to save a few cycles, but not a big deal either way... > +} > + > +/* > + * Call with tx_lock held > + */ > +static void nbd_set_sock(struct block_device *bdev, struct nbd_device *nbd, > +struct socket *sock) > +{ > + nbd->sock = sock; > + if (max_part > 0) > + bdev->bd_invalidated = 1; > + nbd->disconnect = 0; /* we're connected now */ > +} > + > +/* > + * Call with tx_lock held > + */ > +static void nbd_set_timeout(struct nbd_device *nbd, int timeout) > +{ > + nbd->xmit_timeout = timeout * HZ; > +} > + > static int nbd_connection_start(struct block_device *bdev, > struct nbd_device *nbd) > { > @@ -733,33 +782,22 @@ static int __nbd_ioctl(struct block_device *bdev, > struct nbd_device *nbd, > if (nbd->sock) > return -EBUSY; > sock = sockfd_lookup(arg, ); > - if (sock) { > - nbd->sock = sock; > - if (max_part > 0) > - bdev->bd_invalidated = 1; > - nbd->disconnect = 0; /* we're connected now */ > - return 0; > - } > - return -EINVAL; > + if (!sock) > + return -EINVAL; > + > + nbd_set_sock(bdev, nbd, sock); need a return 0 here > } > > case NBD_SET_BLKSIZE: > - nbd->blksize = arg; > - nbd->bytesize &= ~(nbd->blksize-1); > - bdev->bd_inode->i_size = nbd->bytesize; > - set_blocksize(bdev, nbd->blksize); > - set_capacity(nbd->disk, nbd->bytesize >> 9); > + nbd_set_blksize(bdev, nbd, arg); > return 0; > > case NBD_SET_SIZE: > - nbd->bytesize = arg & ~(nbd->blksize-1); > - bdev->bd_inode->i_size = nbd->bytesize; > - set_blocksize(bdev, nbd->blksize); > - set_capacity(nbd->disk, nbd->bytesize >> 9); > + nbd_set_total_size(bdev, nbd, arg & ~(nbd->blksize-1)); > return 0; Might consider deprecating this one...it's pretty useless these days, as it has the same function as NBD_SET_SIZE_BLOCKS, but has a smaller device size limit (at least on some platforms)... nbd-client does not use it anymore... > case NBD_SET_TIMEOUT: > - nbd->xmit_timeout = arg * HZ; > + nbd_set_timeout(nbd, arg); > return 0; > > case NBD_SET_FLAGS: > @@ -767,10 +805,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct > nbd_device *nbd, > return 0; > > case NBD_SET_SIZE_BLOCKS: > - nbd->bytesize = ((u64) arg) * nbd->blksize; > - bdev->bd_inode->i_size = nbd->bytesize; > - set_blocksize(bdev, nbd->blksize); > - set_capacity(nbd->disk, nbd->bytesize >> 9); > + nbd_set_total_size(bdev, nbd, ((u64)arg) * nbd->blksize); > return 0; > > case NBD_DO_IT: -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at
Re: [PATCH] MAINTAINERS: Update NBD maintainer
Small typo, but otherwise, I approve. On Wed, Jan 28, 2015 at 1:35 PM, Markus Pargmann wrote: > Paul stops maintining NBD and I will take his place from now on. > > Signed-off-by: Markus Pargmann > --- > MAINTAINERS | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 2ebb056cbe0a..4a83259e7f45 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -6583,9 +6583,10 @@ F: include/uapi/linux/netrom.h > F: net/netrom/ > > NETWORK BLOCK DEVICE (NBD) > -M: Paul Clements > +M: Markus Pargmann > S: Maintained > L: nbd-gene...@lists.sourceforge.net > +T: git git://git.pengutronix.de/git/mpa/linux-nbd.git > F: Documentation/blockdev/nbd.txt > F: drivers/block/nbd.c > F: include/linux/nbd.h > -- > 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 3/4] nbd: Create helper functions for ioctls
Markus, Comments below... On Tue, Jan 13, 2015 at 8:44 AM, Markus Pargmann m...@pengutronix.de wrote: This patch prepares the nbd code for the nbd-root device patch. It moves the ioctl code into separate functions so they can be called directly. The patch creates nbd_set_total_size(), nbd_set_blksize(), nbd_set_sock() and nbd_set_timeout(). Signed-off-by: Markus Pargmann m...@pengutronix.de --- drivers/block/nbd.c | 79 ++--- 1 file changed, 57 insertions(+), 22 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 0ff6ebbec252..11f7644be111 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -596,6 +596,55 @@ static void do_nbd_request(struct request_queue *q) } } +/* + * Call with tx_lock held + */ +static void nbd_set_total_size(struct block_device *bdev, + struct nbd_device *nbd, size_t size) +{ + nbd-bytesize = size; + bdev-bd_inode-i_size = size; + set_blocksize(bdev, nbd-blksize); + set_capacity(nbd-disk, size 9); +} + +/* + * Call with tx_lock held + */ +static void nbd_set_blksize(struct block_device *bdev, struct nbd_device *nbd, + size_t blksize) +{ + u64 new_bytesize; + + nbd-blksize = blksize; + bdev-bd_inode-i_size = nbd-bytesize; + set_blocksize(bdev, nbd-blksize); + + new_bytesize = nbd-bytesize ~(nbd-blksize-1); + if (new_bytesize != nbd-bytesize) + nbd_set_total_size(bdev, nbd, new_bytesize); could just do set_capacity here to save a few cycles, but not a big deal either way... +} + +/* + * Call with tx_lock held + */ +static void nbd_set_sock(struct block_device *bdev, struct nbd_device *nbd, +struct socket *sock) +{ + nbd-sock = sock; + if (max_part 0) + bdev-bd_invalidated = 1; + nbd-disconnect = 0; /* we're connected now */ +} + +/* + * Call with tx_lock held + */ +static void nbd_set_timeout(struct nbd_device *nbd, int timeout) +{ + nbd-xmit_timeout = timeout * HZ; +} + static int nbd_connection_start(struct block_device *bdev, struct nbd_device *nbd) { @@ -733,33 +782,22 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, if (nbd-sock) return -EBUSY; sock = sockfd_lookup(arg, err); - if (sock) { - nbd-sock = sock; - if (max_part 0) - bdev-bd_invalidated = 1; - nbd-disconnect = 0; /* we're connected now */ - return 0; - } - return -EINVAL; + if (!sock) + return -EINVAL; + + nbd_set_sock(bdev, nbd, sock); need a return 0 here } case NBD_SET_BLKSIZE: - nbd-blksize = arg; - nbd-bytesize = ~(nbd-blksize-1); - bdev-bd_inode-i_size = nbd-bytesize; - set_blocksize(bdev, nbd-blksize); - set_capacity(nbd-disk, nbd-bytesize 9); + nbd_set_blksize(bdev, nbd, arg); return 0; case NBD_SET_SIZE: - nbd-bytesize = arg ~(nbd-blksize-1); - bdev-bd_inode-i_size = nbd-bytesize; - set_blocksize(bdev, nbd-blksize); - set_capacity(nbd-disk, nbd-bytesize 9); + nbd_set_total_size(bdev, nbd, arg ~(nbd-blksize-1)); return 0; Might consider deprecating this one...it's pretty useless these days, as it has the same function as NBD_SET_SIZE_BLOCKS, but has a smaller device size limit (at least on some platforms)... nbd-client does not use it anymore... case NBD_SET_TIMEOUT: - nbd-xmit_timeout = arg * HZ; + nbd_set_timeout(nbd, arg); return 0; case NBD_SET_FLAGS: @@ -767,10 +805,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, return 0; case NBD_SET_SIZE_BLOCKS: - nbd-bytesize = ((u64) arg) * nbd-blksize; - bdev-bd_inode-i_size = nbd-bytesize; - set_blocksize(bdev, nbd-blksize); - set_capacity(nbd-disk, nbd-bytesize 9); + nbd_set_total_size(bdev, nbd, ((u64)arg) * nbd-blksize); return 0; case NBD_DO_IT: -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] MAINTAINERS: Update NBD maintainer
Small typo, but otherwise, I approve. On Wed, Jan 28, 2015 at 1:35 PM, Markus Pargmann m...@pengutronix.de wrote: Paul stops maintining NBD and I will take his place from now on. Signed-off-by: Markus Pargmann m...@pengutronix.de --- MAINTAINERS | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 2ebb056cbe0a..4a83259e7f45 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6583,9 +6583,10 @@ F: include/uapi/linux/netrom.h F: net/netrom/ NETWORK BLOCK DEVICE (NBD) -M: Paul Clements paul.cleme...@steeleye.com +M: Markus Pargmann m...@pengutronix.de ^^ here, missing the trailing S: Maintained L: nbd-gene...@lists.sourceforge.net +T: git git://git.pengutronix.de/git/mpa/linux-nbd.git F: Documentation/blockdev/nbd.txt F: drivers/block/nbd.c F: include/linux/nbd.h -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] nbd: fix possible memory leak
On Tue, Jan 27, 2015 at 7:38 AM, Sudip Mukherjee wrote: > we have already allocated memory for nbd_dev, but we were not > releasing that memory and just returning the error value. > > Signed-off-by: Sudip Mukherjee Looks good to me. Acked-by: Paul Clements > --- > > v2: moved kcalloc after the returns. > > drivers/block/nbd.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c > index 4bc2a5c..db93c75 100644 > --- a/drivers/block/nbd.c > +++ b/drivers/block/nbd.c > @@ -803,10 +803,6 @@ static int __init nbd_init(void) > return -EINVAL; > } > > - nbd_dev = kcalloc(nbds_max, sizeof(*nbd_dev), GFP_KERNEL); > - if (!nbd_dev) > - return -ENOMEM; > - > part_shift = 0; > if (max_part > 0) { > part_shift = fls(max_part); > @@ -827,6 +823,10 @@ static int __init nbd_init(void) > > if (nbds_max > 1UL << (MINORBITS - part_shift)) > return -EINVAL; > + > + nbd_dev = kcalloc(nbds_max, sizeof(*nbd_dev), GFP_KERNEL); > + if (!nbd_dev) > + return -ENOMEM; > > for (i = 0; i < nbds_max; i++) { > struct gendisk *disk = alloc_disk(1 << part_shift); > -- > 1.8.1.2 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] nbd: fix possible memory leak
On Tue, Jan 27, 2015 at 7:38 AM, Sudip Mukherjee sudipm.mukher...@gmail.com wrote: we have already allocated memory for nbd_dev, but we were not releasing that memory and just returning the error value. Signed-off-by: Sudip Mukherjee su...@vectorindia.org Looks good to me. Acked-by: Paul Clements paul.cleme...@steeleye.com --- v2: moved kcalloc after the returns. drivers/block/nbd.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 4bc2a5c..db93c75 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -803,10 +803,6 @@ static int __init nbd_init(void) return -EINVAL; } - nbd_dev = kcalloc(nbds_max, sizeof(*nbd_dev), GFP_KERNEL); - if (!nbd_dev) - return -ENOMEM; - part_shift = 0; if (max_part 0) { part_shift = fls(max_part); @@ -827,6 +823,10 @@ static int __init nbd_init(void) if (nbds_max 1UL (MINORBITS - part_shift)) return -EINVAL; + + nbd_dev = kcalloc(nbds_max, sizeof(*nbd_dev), GFP_KERNEL); + if (!nbd_dev) + return -ENOMEM; for (i = 0; i nbds_max; i++) { struct gendisk *disk = alloc_disk(1 part_shift); -- 1.8.1.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] nbd: fix possible memory leak
Sudip, On Tue, Jan 13, 2015 at 12:19 PM, Sudip Mukherjee wrote: > we have already allocated memory for nbd_dev, but we were not > releasing that memory and just returning the error value. > > Signed-off-by: Sudip Mukherjee > --- > > Hi Paul, > after seeing your mail about maintainers, just started to see the code. > this patch is one way, another way can be to bring the kcalloc part below > these checks. Probably better to move the kcalloc below, rather than adding more branches, wouldn't you agree? Thanks, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 2/4] nbd: Split 'DO_IT' into three functions
Markus, This refactor looks OK with the exception of one thing... On Tue, Jan 13, 2015 at 8:44 AM, Markus Pargmann wrote: > /* Must be called with tx_lock held */ > > static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, > @@ -684,61 +773,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct > nbd_device *nbd, > set_capacity(nbd->disk, nbd->bytesize >> 9); > return 0; > > - case NBD_DO_IT: { > - struct task_struct *thread; > - struct socket *sock; > - int error; > - > - if (nbd->pid) > - return -EBUSY; > - if (!nbd->sock) > - return -EINVAL; > You seem to have done away with these checks. Was that inadvertent or was there a reason for that? The pid check is necessary to prevent two instances of NBD_DO_IT from running. Without the sock check you'll get a null pointer deref in nbd_do_it. Thanks, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 2/4] nbd: Split 'DO_IT' into three functions
Markus, This refactor looks OK with the exception of one thing... On Tue, Jan 13, 2015 at 8:44 AM, Markus Pargmann m...@pengutronix.de wrote: /* Must be called with tx_lock held */ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, @@ -684,61 +773,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, set_capacity(nbd-disk, nbd-bytesize 9); return 0; - case NBD_DO_IT: { - struct task_struct *thread; - struct socket *sock; - int error; - - if (nbd-pid) - return -EBUSY; - if (!nbd-sock) - return -EINVAL; You seem to have done away with these checks. Was that inadvertent or was there a reason for that? The pid check is necessary to prevent two instances of NBD_DO_IT from running. Without the sock check you'll get a null pointer deref in nbd_do_it. Thanks, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] nbd: fix possible memory leak
Sudip, On Tue, Jan 13, 2015 at 12:19 PM, Sudip Mukherjee sudipm.mukher...@gmail.com wrote: we have already allocated memory for nbd_dev, but we were not releasing that memory and just returning the error value. Signed-off-by: Sudip Mukherjee su...@vectorindia.org --- Hi Paul, after seeing your mail about maintainers, just started to see the code. this patch is one way, another way can be to bring the kcalloc part below these checks. Probably better to move the kcalloc below, rather than adding more branches, wouldn't you agree? Thanks, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NBD Maintainer
On Tue, Jan 13, 2015 at 11:14 AM, Andrey Utkin wrote: > Hi Paul, > could you please describe > - how wide is NBD usage today (any estimation is ok), It depends somewhat on who you consider to be users. There are two groups of NBD "users": 1) integrators and admins (who work directly with nbd) 2) users of the systems and software that the first group creates My guess is that the first group is in the hundreds. The second in the thousands. NBD usage tends to fall into one the following classes: - support of diskless thin clients (mainly at universities and such) - replication (used in conjunction with some form of mirroring scheme, this becomes a network-raid device of sorts) - block device emulation/user-level block device (e.g., qemu-nbd) (because NBD has a fairly flexible design, including a fully user-level server component, it has been used by quite a few people as a user-level block device for various different purposes) > - what is in a TODO list, Nothing specific, other than merging in pending patches, of which there are a few (check the nbd-general archives). > - what are critical bugs or important issues requiring work, if there are any. There is a network timeout issue, which is probably one of the more important fixes to get in: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=770479 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
NBD Maintainer
All, I've maintained the NBD driver since 2003, but alas, I no longer have the time and energy to continue maintaining it. If someone would like to take over, let me know. Thanks, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NBD Maintainer
On Tue, Jan 13, 2015 at 11:14 AM, Andrey Utkin andrey.krieger.ut...@gmail.com wrote: Hi Paul, could you please describe - how wide is NBD usage today (any estimation is ok), It depends somewhat on who you consider to be users. There are two groups of NBD users: 1) integrators and admins (who work directly with nbd) 2) users of the systems and software that the first group creates My guess is that the first group is in the hundreds. The second in the thousands. NBD usage tends to fall into one the following classes: - support of diskless thin clients (mainly at universities and such) - replication (used in conjunction with some form of mirroring scheme, this becomes a network-raid device of sorts) - block device emulation/user-level block device (e.g., qemu-nbd) (because NBD has a fairly flexible design, including a fully user-level server component, it has been used by quite a few people as a user-level block device for various different purposes) - what is in a TODO list, Nothing specific, other than merging in pending patches, of which there are a few (check the nbd-general archives). - what are critical bugs or important issues requiring work, if there are any. There is a network timeout issue, which is probably one of the more important fixes to get in: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=770479 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
NBD Maintainer
All, I've maintained the NBD driver since 2003, but alas, I no longer have the time and energy to continue maintaining it. If someone would like to take over, let me know. Thanks, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv2] nbd: Zero from and len fields in NBD_CMD_DISCONNECT.
On Sun, May 25, 2014 at 6:22 PM, Hani Benhabiles wrote: > Len field is already set to zero, but not the from field which is sent as > 0xfe00. This makes no sense, and may cause confuse server > implementations doing sanity checks (qemu-nbd is an example.) > > Signed-off-by: Hani Benhabiles > --- > > Compared to v1: > * Zero the request structure instead of conditionally zeroing specific fields. Yeah, this one looks good. Sorry for the confusion on the structs... > drivers/block/nbd.c | 7 ++- > 1 file changed, 2 insertions(+), 5 deletions(-) > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c > index 3a70ea2..8e1df52 100644 > --- a/drivers/block/nbd.c > +++ b/drivers/block/nbd.c > @@ -243,14 +243,11 @@ static int nbd_send_req(struct nbd_device *nbd, struct > request *req) > struct nbd_request request; > unsigned long size = blk_rq_bytes(req); > > + memset(, 0, sizeof(request)); > request.magic = htonl(NBD_REQUEST_MAGIC); > request.type = htonl(nbd_cmd(req)); > > - if (nbd_cmd(req) == NBD_CMD_FLUSH) { > - /* Other values are reserved for FLUSH requests. */ > - request.from = 0; > - request.len = 0; > - } else { > + if (nbd_cmd(req) != NBD_CMD_FLUSH && nbd_cmd(req) != NBD_CMD_DISC) { > request.from = cpu_to_be64((u64)blk_rq_pos(req) << 9); > request.len = htonl(size); > } > -- > 1.8.3.2 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv2] nbd: Zero from and len fields in NBD_CMD_DISCONNECT.
On Sun, May 25, 2014 at 6:22 PM, Hani Benhabiles kroo...@gmail.com wrote: Len field is already set to zero, but not the from field which is sent as 0xfe00. This makes no sense, and may cause confuse server implementations doing sanity checks (qemu-nbd is an example.) Signed-off-by: Hani Benhabiles h...@linux.com --- Compared to v1: * Zero the request structure instead of conditionally zeroing specific fields. Yeah, this one looks good. Sorry for the confusion on the structs... drivers/block/nbd.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 3a70ea2..8e1df52 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -243,14 +243,11 @@ static int nbd_send_req(struct nbd_device *nbd, struct request *req) struct nbd_request request; unsigned long size = blk_rq_bytes(req); + memset(request, 0, sizeof(request)); request.magic = htonl(NBD_REQUEST_MAGIC); request.type = htonl(nbd_cmd(req)); - if (nbd_cmd(req) == NBD_CMD_FLUSH) { - /* Other values are reserved for FLUSH requests. */ - request.from = 0; - request.len = 0; - } else { + if (nbd_cmd(req) != NBD_CMD_FLUSH nbd_cmd(req) != NBD_CMD_DISC) { request.from = cpu_to_be64((u64)blk_rq_pos(req) 9); request.len = htonl(size); } -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] nbd: Zero from and len fields in NBD_CMD_DISCONNECT.
On Sun, May 25, 2014 at 6:18 AM, Hani Benhabiles wrote: > On Sun, May 18, 2014 at 10:11:13AM +0100, Hani Benhabiles wrote: >> On Fri, May 16, 2014 at 10:56:33PM -0400, Paul Clements wrote: >> > Agreed. But better yet, the request structure should just be zeroed when >> > it's allocated. >> > >> >> It is already initialized in __nbd_ioctl() with the blk_rq_init() call which >> sets the __sector value to -1 (which is 0xfe00 after the left >> shifts.) >> >> This is the only (non-ugly / non-intrusive) way to do it afaict. >> > > Ping! > > Anything blocking this patch ? It's cleaner to just zero the struct and get rid of the conditional zeroing of specific fields. I'll prepare a patch in the next few days. Thanks, Paul >> > On Fri, May 16, 2014 at 7:43 PM, Hani Benhabiles wrote: >> > >> > > Len field is already set to zero, but not the from field which is sent as >> > > 0xfe00. This makes no sense, and may cause confuse server >> > > implementations doing sanity checks (qemu-nbd is an example.) >> > > >> > > Signed-off-by: Hani Benhabiles >> > > --- >> > > drivers/block/nbd.c | 2 +- >> > > 1 file changed, 1 insertion(+), 1 deletion(-) >> > > >> > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c >> > > index 3a70ea2..657bdac 100644 >> > > --- a/drivers/block/nbd.c >> > > +++ b/drivers/block/nbd.c >> > > @@ -246,7 +246,7 @@ static int nbd_send_req(struct nbd_device *nbd, >> > > struct >> > > request *req) >> > > request.magic = htonl(NBD_REQUEST_MAGIC); >> > > request.type = htonl(nbd_cmd(req)); >> > > >> > > - if (nbd_cmd(req) == NBD_CMD_FLUSH) { >> > > + if (nbd_cmd(req) == NBD_CMD_FLUSH || nbd_cmd(req) == >> > > NBD_CMD_DISC) >> > > { >> > > /* Other values are reserved for FLUSH requests. */ >> > > request.from = 0; >> > > request.len = 0; >> > > -- >> > > 1.8.3.2 >> > > >> > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] nbd: Zero from and len fields in NBD_CMD_DISCONNECT.
On Sun, May 25, 2014 at 6:18 AM, Hani Benhabiles kroo...@gmail.com wrote: On Sun, May 18, 2014 at 10:11:13AM +0100, Hani Benhabiles wrote: On Fri, May 16, 2014 at 10:56:33PM -0400, Paul Clements wrote: Agreed. But better yet, the request structure should just be zeroed when it's allocated. It is already initialized in __nbd_ioctl() with the blk_rq_init() call which sets the __sector value to -1 (which is 0xfe00 after the left shifts.) This is the only (non-ugly / non-intrusive) way to do it afaict. Ping! Anything blocking this patch ? It's cleaner to just zero the struct and get rid of the conditional zeroing of specific fields. I'll prepare a patch in the next few days. Thanks, Paul On Fri, May 16, 2014 at 7:43 PM, Hani Benhabiles kroo...@gmail.com wrote: Len field is already set to zero, but not the from field which is sent as 0xfe00. This makes no sense, and may cause confuse server implementations doing sanity checks (qemu-nbd is an example.) Signed-off-by: Hani Benhabiles h...@linux.com --- drivers/block/nbd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 3a70ea2..657bdac 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -246,7 +246,7 @@ static int nbd_send_req(struct nbd_device *nbd, struct request *req) request.magic = htonl(NBD_REQUEST_MAGIC); request.type = htonl(nbd_cmd(req)); - if (nbd_cmd(req) == NBD_CMD_FLUSH) { + if (nbd_cmd(req) == NBD_CMD_FLUSH || nbd_cmd(req) == NBD_CMD_DISC) { /* Other values are reserved for FLUSH requests. */ request.from = 0; request.len = 0; -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] nbd: correct disconnect behavior
On Thu, Jun 27, 2013 at 6:28 PM, Andrew Morton wrote: > On Thu, 27 Jun 2013 18:20:37 -0400 Paul Clements > wrote: > OK, but. "Would it be safer to clear ->disconnect in NBD_DO_IT?" About the same in terms of safety. Both ioctls have to be called to set up the device and neither can be called again, until the device is reset. > If not safer, would it be cleaner? I don't know, seems like a toss up. NBD_SET_SOCK is the earliest place that the socket might conceivably be usable, so I wanted to clear disconnect there (e.g., in case an alternate/new version of NBD_DO_IT, as has been discussed, is ever implemented). >> > The cool kids are using bool lately ;) >> > >> >> Hey, maybe I want to be able to compile with gcc 2.7.2 ? :) > > Sob, I miss 2.7.2. It was a good 50% faster than the new improved > models. But I don't think this yearning makes us cool. No, I think it just makes us old :) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] nbd: correct disconnect behavior
On Wed, Jun 26, 2013 at 7:21 PM, Andrew Morton wrote: > > On Wed, 19 Jun 2013 17:09:18 -0400 (EDT) Paul Clements > wrote: > > This sounds like something which users of 3.10 and earlier kernels > might want, so I added the Cc:stable tag. Please let me know if > you disagree. That is a good idea. It would be useful in stable. > > --- a/drivers/block/nbd.c > > +++ b/drivers/block/nbd.c > > @@ -623,6 +623,8 @@ static int __nbd_ioctl(struct block_device *bdev, > > struct nbd_device *nbd, > > if (!nbd->sock) > > return -EINVAL; > > > > + nbd->disconnect = 1; > > + > > nbd_send_req(nbd, ); > > return 0; > > } > > @@ -654,6 +656,7 @@ static int __nbd_ioctl(struct block_device *bdev, > > struct nbd_device *nbd, > > nbd->sock = SOCKET_I(inode); > > if (max_part > 0) > > bdev->bd_invalidated = 1; > > + nbd->disconnect = 0; /* we're connected now */ > > return 0; > > } else { > > fput(file); > > @@ -742,6 +745,8 @@ static int __nbd_ioctl(struct block_device *bdev, > > struct nbd_device *nbd, > > set_capacity(nbd->disk, 0); > > if (max_part > 0) > > ioctl_by_bdev(bdev, BLKRRPART, 0); > > + if (nbd->disconnect) /* user requested, ignore socket errors > > */ > > + return 0; > > return nbd->harderror; > > } > > hm, how does nbd work... Hard to tell as nothing seems to be documented > anywhere :( I think most people look at the nbd-client to see. :) But I will work on some docs for the ioctls... Briefly, though, it goes something like this: nbd-client and nbd-server negotiate parameters, including the server telling the client how big the export is Then the nbd-client does a series of ioctls to set up the device: NBD_SET_SIZE - set the size of the device NBD_SET_SOCK - tell the kernel which socket to use NBD_DO_IT - this ioctl lasts for the life of the device and causes nbd-client to go into a receive loop, handling I/O replies from the nbd-server NBD_DISCONNECT, meanwhile, can be called from a separate thread (usually "nbd-client -d" calls it). This requests a disconnect of the device. > > > --- a/include/linux/nbd.h > > +++ b/include/linux/nbd.h > > @@ -41,6 +41,7 @@ struct nbd_device { > > u64 bytesize; > > pid_t pid; /* pid of nbd-client, if attached */ > > int xmit_timeout; > > + int disconnect; /* a disconnect has been requested by user */ > > }; > > The cool kids are using bool lately ;) Hey, maybe I want to be able to compile with gcc 2.7.2 ? :) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] nbd: correct disconnect behavior
On Wed, Jun 26, 2013 at 7:21 PM, Andrew Morton a...@linux-foundation.org wrote: On Wed, 19 Jun 2013 17:09:18 -0400 (EDT) Paul Clements paul.cleme...@steeleye.com wrote: This sounds like something which users of 3.10 and earlier kernels might want, so I added the Cc:stable tag. Please let me know if you disagree. That is a good idea. It would be useful in stable. --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -623,6 +623,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, if (!nbd-sock) return -EINVAL; + nbd-disconnect = 1; + nbd_send_req(nbd, sreq); return 0; } @@ -654,6 +656,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, nbd-sock = SOCKET_I(inode); if (max_part 0) bdev-bd_invalidated = 1; + nbd-disconnect = 0; /* we're connected now */ return 0; } else { fput(file); @@ -742,6 +745,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, set_capacity(nbd-disk, 0); if (max_part 0) ioctl_by_bdev(bdev, BLKRRPART, 0); + if (nbd-disconnect) /* user requested, ignore socket errors */ + return 0; return nbd-harderror; } hm, how does nbd work... Hard to tell as nothing seems to be documented anywhere :( I think most people look at the nbd-client to see. :) But I will work on some docs for the ioctls... Briefly, though, it goes something like this: nbd-client and nbd-server negotiate parameters, including the server telling the client how big the export is Then the nbd-client does a series of ioctls to set up the device: NBD_SET_SIZE - set the size of the device NBD_SET_SOCK - tell the kernel which socket to use NBD_DO_IT - this ioctl lasts for the life of the device and causes nbd-client to go into a receive loop, handling I/O replies from the nbd-server NBD_DISCONNECT, meanwhile, can be called from a separate thread (usually nbd-client -d calls it). This requests a disconnect of the device. --- a/include/linux/nbd.h +++ b/include/linux/nbd.h @@ -41,6 +41,7 @@ struct nbd_device { u64 bytesize; pid_t pid; /* pid of nbd-client, if attached */ int xmit_timeout; + int disconnect; /* a disconnect has been requested by user */ }; The cool kids are using bool lately ;) Hey, maybe I want to be able to compile with gcc 2.7.2 ? :) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] nbd: correct disconnect behavior
On Thu, Jun 27, 2013 at 6:28 PM, Andrew Morton a...@linux-foundation.org wrote: On Thu, 27 Jun 2013 18:20:37 -0400 Paul Clements paul.cleme...@us.sios.com wrote: OK, but. Would it be safer to clear -disconnect in NBD_DO_IT? About the same in terms of safety. Both ioctls have to be called to set up the device and neither can be called again, until the device is reset. If not safer, would it be cleaner? I don't know, seems like a toss up. NBD_SET_SOCK is the earliest place that the socket might conceivably be usable, so I wanted to clear disconnect there (e.g., in case an alternate/new version of NBD_DO_IT, as has been discussed, is ever implemented). The cool kids are using bool lately ;) Hey, maybe I want to be able to compile with gcc 2.7.2 ? :) Sob, I miss 2.7.2. It was a good 50% faster than the new improved models. But I don't think this yearning makes us cool. No, I think it just makes us old :) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] nbd: correct disconnect behavior
Currently, when a disconnect is requested by the user (via NBD_DISCONNECT ioctl) the return from NBD_DO_IT is undefined (it is usually one of several error codes). This means that nbd-client does not know if a manual disconnect was performed or whether a network error occurred. Because of this, nbd-client's persist mode (which tries to reconnect after error, but not after manual disconnect) does not always work correctly. This change fixes this by causing NBD_DO_IT to always return 0 if a user requests a disconnect. This means that nbd-client can correctly either persist the connection (if an error occurred) or disconnect (if the user requested it). Signed-off-by: Paul Clements --- drivers/block/nbd.c |5 + include/linux/nbd.h |1 + 2 files changed, 6 insertions(+) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 7fecc78..8b7664d 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -623,6 +623,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, if (!nbd->sock) return -EINVAL; + nbd->disconnect = 1; + nbd_send_req(nbd, ); return 0; } @@ -654,6 +656,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, nbd->sock = SOCKET_I(inode); if (max_part > 0) bdev->bd_invalidated = 1; + nbd->disconnect = 0; /* we're connected now */ return 0; } else { fput(file); @@ -742,6 +745,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, set_capacity(nbd->disk, 0); if (max_part > 0) ioctl_by_bdev(bdev, BLKRRPART, 0); + if (nbd->disconnect) /* user requested, ignore socket errors */ + return 0; return nbd->harderror; } diff --git a/include/linux/nbd.h b/include/linux/nbd.h index 4871170..ae4981e 100644 --- a/include/linux/nbd.h +++ b/include/linux/nbd.h @@ -41,6 +41,7 @@ struct nbd_device { u64 bytesize; pid_t pid; /* pid of nbd-client, if attached */ int xmit_timeout; + int disconnect; /* a disconnect has been requested by user */ }; #endif -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] nbd: correct disconnect behavior
Currently, when a disconnect is requested by the user (via NBD_DISCONNECT ioctl) the return from NBD_DO_IT is undefined (it is usually one of several error codes). This means that nbd-client does not know if a manual disconnect was performed or whether a network error occurred. Because of this, nbd-client's persist mode (which tries to reconnect after error, but not after manual disconnect) does not always work correctly. This change fixes this by causing NBD_DO_IT to always return 0 if a user requests a disconnect. This means that nbd-client can correctly either persist the connection (if an error occurred) or disconnect (if the user requested it). Signed-off-by: Paul Clements paul.cleme...@steeleye.com --- drivers/block/nbd.c |5 + include/linux/nbd.h |1 + 2 files changed, 6 insertions(+) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 7fecc78..8b7664d 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -623,6 +623,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, if (!nbd-sock) return -EINVAL; + nbd-disconnect = 1; + nbd_send_req(nbd, sreq); return 0; } @@ -654,6 +656,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, nbd-sock = SOCKET_I(inode); if (max_part 0) bdev-bd_invalidated = 1; + nbd-disconnect = 0; /* we're connected now */ return 0; } else { fput(file); @@ -742,6 +745,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, set_capacity(nbd-disk, 0); if (max_part 0) ioctl_by_bdev(bdev, BLKRRPART, 0); + if (nbd-disconnect) /* user requested, ignore socket errors */ + return 0; return nbd-harderror; } diff --git a/include/linux/nbd.h b/include/linux/nbd.h index 4871170..ae4981e 100644 --- a/include/linux/nbd.h +++ b/include/linux/nbd.h @@ -41,6 +41,7 @@ struct nbd_device { u64 bytesize; pid_t pid; /* pid of nbd-client, if attached */ int xmit_timeout; + int disconnect; /* a disconnect has been requested by user */ }; #endif -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] nbd: remove bogus BUG_ON in NBD_CLEAR_QUE
From: Michal Belczyk The NBD_CLEAR_QUE ioctl has been deprecated for quite some time (its job is now done by two other ioctls). We should stop trying to make bogus assertions in it. Also, user-level code should remove calls to NBD_CLEAR_QUE, ASAP. Signed-off-by: Michal Belczyk Signed-off-by: Paul Clements --- nbd.c |1 - 1 file changed, 1 deletion(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 7fecc78..21ba264 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -750,7 +750,6 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, * This is for compatibility only. The queue is always cleared * by NBD_DO_IT or NBD_CLEAR_SOCK. */ - BUG_ON(!nbd->sock && !list_empty(>queue_head)); return 0; case NBD_PRINT_DEBUG: -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] nbd: remove bogus BUG_ON in NBD_CLEAR_QUE
From: Michal Belczyk belc...@bsd.krakow.pl The NBD_CLEAR_QUE ioctl has been deprecated for quite some time (its job is now done by two other ioctls). We should stop trying to make bogus assertions in it. Also, user-level code should remove calls to NBD_CLEAR_QUE, ASAP. Signed-off-by: Michal Belczyk belc...@bsd.krakow.pl Signed-off-by: Paul Clements paul.cleme...@steeleye.com --- nbd.c |1 - 1 file changed, 1 deletion(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 7fecc78..21ba264 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -750,7 +750,6 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, * This is for compatibility only. The queue is always cleared * by NBD_DO_IT or NBD_CLEAR_SOCK. */ - BUG_ON(!nbd-sock !list_empty(nbd-queue_head)); return 0; case NBD_PRINT_DEBUG: -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] nbd: increase default and max request sizes
This patch raises the default max request size for nbd to 128KB (from 127KB) to get it 4KB aligned. This patch also allows the max request size to be increased (via /sys/block/nbd/queue/max_sectors_kb) to 32MB. The patch makes nbd network traffic more efficient by: - reducing request fragmentation (4KB alignment) - reducing the number of requests (fewer round trips, less network overhead) Especially in high latency networks, larger request size can make a dramatic difference in performance. From: Michal Belczyk Signed-off-by: Paul Clements --- nbd.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 7fecc78..037288e 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -856,6 +856,8 @@ static int __init nbd_init(void) disk->queue->limits.discard_granularity = 512; disk->queue->limits.max_discard_sectors = UINT_MAX; disk->queue->limits.discard_zeroes_data = 0; + blk_queue_max_hw_sectors(disk->queue, 65536); + disk->queue->limits.max_sectors = 256; } if (register_blkdev(NBD_MAJOR, "nbd")) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] nbd: increase default and max request sizes
This patch raises the default max request size for nbd to 128KB (from 127KB) to get it 4KB aligned. This patch also allows the max request size to be increased (via /sys/block/nbdx/queue/max_sectors_kb) to 32MB. The patch makes nbd network traffic more efficient by: - reducing request fragmentation (4KB alignment) - reducing the number of requests (fewer round trips, less network overhead) Especially in high latency networks, larger request size can make a dramatic difference in performance. From: Michal Belczyk belc...@bsd.krakow.pl Signed-off-by: Paul Clements paul.cleme...@steeleye.com --- nbd.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 7fecc78..037288e 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -856,6 +856,8 @@ static int __init nbd_init(void) disk-queue-limits.discard_granularity = 512; disk-queue-limits.max_discard_sectors = UINT_MAX; disk-queue-limits.discard_zeroes_data = 0; + blk_queue_max_hw_sectors(disk-queue, 65536); + disk-queue-limits.max_sectors = 256; } if (register_blkdev(NBD_MAJOR, nbd)) { -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/3] nbd: show read-only state in sysfs
On Tue, Feb 12, 2013 at 11:06 AM, Paolo Bonzini wrote: > Pass the read-only flag to set_device_ro, so that it will be > visible to the block layer and in sysfs. Looks good > Cc: > Cc: Paul Clements > Cc: Andrew Morton > Signed-off-by: Paolo Bonzini > --- > drivers/block/nbd.c |3 +++ > 1 files changed, 3 insertions(+), 0 deletions(-) > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c > index a9c5c7a..ef17c2e 100644 > --- a/drivers/block/nbd.c > +++ b/drivers/block/nbd.c > @@ -703,6 +703,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct > nbd_device *nbd, > > mutex_unlock(>tx_lock); > > + if (nbd->flags & NBD_FLAG_READ_ONLY) > + set_device_ro(bdev, true); > if (nbd->flags & NBD_FLAG_SEND_TRIM) > queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, > nbd->disk->queue); > @@ -730,6 +732,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct > nbd_device *nbd, > dev_warn(disk_to_dev(nbd->disk), "queue cleared\n"); > kill_bdev(bdev); > queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, > nbd->disk->queue); > + set_device_ro(bdev, false); > if (file) > fput(file); > nbd->flags = 0; > -- > 1.7.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/3] nbd: fsync and kill block device on shutdown
On Tue, Feb 12, 2013 at 11:06 AM, Paolo Bonzini wrote: > There are two problems with shutdown in the NBD driver. The first is > that receiving the NBD_DISCONNECT ioctl does not sync the filesystem; > this is useful because BLKFLSBUF is restricted to processes that have > CAP_SYS_ADMIN, and the NBD client may not possess it (fsync of the > block device does not sync the filesystem, either). > The second is that once we clear the socket we have no guarantee that > later reads will come from the same backing storage. Thus the page cache > must be cleaned, lest reads that hit on the page cache will return stale > data from the previously-accessible disk. Paolo, Thanks for this. A problem indeed... Acked-by: paul.cleme...@steeleye.com > Cc: > Cc: > Cc: Paul Clements > Cc: Andrew Morton > Signed-off-by: Paolo Bonzini > --- > drivers/block/nbd.c | 10 ++ > 1 files changed, 10 insertions(+), 0 deletions(-) > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c > index 5603765..a9c5c7a 100644 > --- a/drivers/block/nbd.c > +++ b/drivers/block/nbd.c > @@ -608,12 +608,20 @@ static int __nbd_ioctl(struct block_device *bdev, > struct nbd_device *nbd, > struct request sreq; > > dev_info(disk_to_dev(nbd->disk), "NBD_DISCONNECT\n"); > + if (!nbd->sock) > + return -EINVAL; > > + mutex_unlock(>tx_lock); > + fsync_bdev(bdev); > + mutex_lock(>tx_lock); > blk_rq_init(NULL, ); > sreq.cmd_type = REQ_TYPE_SPECIAL; > nbd_cmd() = NBD_CMD_DISC; > + > + /* Check again after getting mutex back. */ > if (!nbd->sock) > return -EINVAL; > + > nbd_send_req(nbd, ); > return 0; > } > @@ -627,6 +635,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct > nbd_device *nbd, > nbd_clear_que(nbd); > BUG_ON(!list_empty(>queue_head)); > BUG_ON(!list_empty(>waiting_queue)); > + kill_bdev(bdev); > if (file) > fput(file); > return 0; > @@ -719,6 +728,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct > nbd_device *nbd, > nbd->file = NULL; > nbd_clear_que(nbd); > dev_warn(disk_to_dev(nbd->disk), "queue cleared\n"); > + kill_bdev(bdev); > queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, > nbd->disk->queue); > if (file) > fput(file); > -- > 1.7.1 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] nbd: support FLUSH requests
On Tue, Feb 12, 2013 at 11:06 AM, Paolo Bonzini wrote: > From: Alex Bligh > > The NBD device does not support writeback caching, thus it is not safe > against power losses unless the client opens the target with O_DSYNC or > O_SYNC. > > Add support for a new flag that the server can pass. If the flag is > enabled, we translate REQ_FLUSH requests into the NBD_CMD_FLUSH > command. > > Cc: > Cc: Paul Clements > Cc: Andrew Morton > Signed-off-by: Alex Bligh Looks good. Acked-by: paul.cleme...@steeleye.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] nbd: support FLUSH requests
On Tue, Feb 12, 2013 at 11:06 AM, Paolo Bonzini pbonz...@redhat.com wrote: From: Alex Bligh a...@alex.org.uk The NBD device does not support writeback caching, thus it is not safe against power losses unless the client opens the target with O_DSYNC or O_SYNC. Add support for a new flag that the server can pass. If the flag is enabled, we translate REQ_FLUSH requests into the NBD_CMD_FLUSH command. Cc: nbd-gene...@lists.sf.net Cc: Paul Clements paul.cleme...@steeleye.com Cc: Andrew Morton a...@linux-foundation.org Signed-off-by: Alex Bligh a...@alex.org.uk Looks good. Acked-by: paul.cleme...@steeleye.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/3] nbd: fsync and kill block device on shutdown
On Tue, Feb 12, 2013 at 11:06 AM, Paolo Bonzini pbonz...@redhat.com wrote: There are two problems with shutdown in the NBD driver. The first is that receiving the NBD_DISCONNECT ioctl does not sync the filesystem; this is useful because BLKFLSBUF is restricted to processes that have CAP_SYS_ADMIN, and the NBD client may not possess it (fsync of the block device does not sync the filesystem, either). The second is that once we clear the socket we have no guarantee that later reads will come from the same backing storage. Thus the page cache must be cleaned, lest reads that hit on the page cache will return stale data from the previously-accessible disk. Paolo, Thanks for this. A problem indeed... Acked-by: paul.cleme...@steeleye.com Cc: sta...@vger.kernel.org Cc: nbd-gene...@lists.sf.net Cc: Paul Clements paul.cleme...@steeleye.com Cc: Andrew Morton a...@linux-foundation.org Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/block/nbd.c | 10 ++ 1 files changed, 10 insertions(+), 0 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 5603765..a9c5c7a 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -608,12 +608,20 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, struct request sreq; dev_info(disk_to_dev(nbd-disk), NBD_DISCONNECT\n); + if (!nbd-sock) + return -EINVAL; + mutex_unlock(nbd-tx_lock); + fsync_bdev(bdev); + mutex_lock(nbd-tx_lock); blk_rq_init(NULL, sreq); sreq.cmd_type = REQ_TYPE_SPECIAL; nbd_cmd(sreq) = NBD_CMD_DISC; + + /* Check again after getting mutex back. */ if (!nbd-sock) return -EINVAL; + nbd_send_req(nbd, sreq); return 0; } @@ -627,6 +635,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, nbd_clear_que(nbd); BUG_ON(!list_empty(nbd-queue_head)); BUG_ON(!list_empty(nbd-waiting_queue)); + kill_bdev(bdev); if (file) fput(file); return 0; @@ -719,6 +728,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, nbd-file = NULL; nbd_clear_que(nbd); dev_warn(disk_to_dev(nbd-disk), queue cleared\n); + kill_bdev(bdev); queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd-disk-queue); if (file) fput(file); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/3] nbd: show read-only state in sysfs
On Tue, Feb 12, 2013 at 11:06 AM, Paolo Bonzini pbonz...@redhat.com wrote: Pass the read-only flag to set_device_ro, so that it will be visible to the block layer and in sysfs. Looks good Cc: nbd-gene...@lists.sf.net Cc: Paul Clements paul.cleme...@steeleye.com Cc: Andrew Morton a...@linux-foundation.org Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/block/nbd.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index a9c5c7a..ef17c2e 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -703,6 +703,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, mutex_unlock(nbd-tx_lock); + if (nbd-flags NBD_FLAG_READ_ONLY) + set_device_ro(bdev, true); if (nbd-flags NBD_FLAG_SEND_TRIM) queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, nbd-disk-queue); @@ -730,6 +732,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, dev_warn(disk_to_dev(nbd-disk), queue cleared\n); kill_bdev(bdev); queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd-disk-queue); + set_device_ro(bdev, false); if (file) fput(file); nbd-flags = 0; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/1] nbd: clear waiting_queue on shutdown
From: Paul Clements Signed-off-by: Paul Clements --- nbd.c |9 + 1 file changed, 9 insertions(+) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index d07c9f7..0c03411 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -449,6 +449,14 @@ static void nbd_clear_que(struct nbd_device *nbd) req->errors++; nbd_end_request(req); } + + while (!list_empty(>waiting_queue)) { + req = list_entry(nbd->waiting_queue.next, struct request, +queuelist); + list_del_init(>queuelist); + req->errors++; + nbd_end_request(req); + } } @@ -598,6 +606,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, nbd->file = NULL; nbd_clear_que(nbd); BUG_ON(!list_empty(>queue_head)); + BUG_ON(!list_empty(>waiting_queue)); if (file) fput(file); return 0; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/1] nbd: fix dangling requests bug
This patch fixes a serious, but uncommon bug in nbd. This should probably be considered for backport to one or more stable branches. The bug occurs when there is heavy I/O going to the nbd device while, at the same time, a failure (server, network) or manual disconnect of the nbd connection occurs. There is a small window between the time that the nbd_thread is stopped and the socket is shutdown where requests can continue to be queued to nbd's internal waiting_queue. When this happens, those requests are never completed or freed. The fix is to clear the waiting_queue on shutdown of the nbd device, in the same way that the nbd request queue (queue_head) is already being cleared. -- Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 2/2] nbd: handle discard requests
Description: This patch adds discard support to nbd. If the nbd-server supports discard, it will send NBD_FLAG_SEND_TRIM to the client. The client will then set the flag in the kernel via NBD_SET_FLAGS, which tells the kernel to enable discards for the device (QUEUE_FLAG_DISCARD). If discard support is enabled, then when the nbd client system receives a discard request, this will be passed along to the nbd-server. When the discard request is received by the nbd-server, it will perform: fallocate(.. FALLOC_FL_PUNCH_HOLE ..) To punch a hole in the backend storage, which is no longer needed. Signed-off-by: Paul Clements --- diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index c544bb4..a014169 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -98,6 +98,7 @@ static const char *nbdcmd_to_ascii(int cmd) case NBD_CMD_READ: return "read"; case NBD_CMD_WRITE: return "write"; case NBD_CMD_DISC: return "disconnect"; + case NBD_CMD_TRIM: return "trim/discard"; } return "invalid"; } @@ -461,7 +462,11 @@ static void nbd_handle_req(struct nbd_device *nbd, struct request *req) nbd_cmd(req) = NBD_CMD_READ; if (rq_data_dir(req) == WRITE) { - nbd_cmd(req) = NBD_CMD_WRITE; + if ((req->cmd_flags & REQ_DISCARD)) { + WARN_ON(!(nbd->flags & NBD_FLAG_SEND_TRIM)); + nbd_cmd(req) = NBD_CMD_TRIM; + } else + nbd_cmd(req) = NBD_CMD_WRITE; if (nbd->flags & NBD_FLAG_READ_ONLY) { dev_err(disk_to_dev(nbd->disk), "Write on read-only\n"); @@ -667,6 +672,10 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, mutex_unlock(>tx_lock); + if (nbd->flags & NBD_FLAG_SEND_TRIM) + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, + nbd->disk->queue); + thread = kthread_create(nbd_thread, nbd, nbd->disk->disk_name); if (IS_ERR(thread)) { mutex_lock(>tx_lock); @@ -684,6 +693,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, nbd->file = NULL; nbd_clear_que(nbd); dev_warn(disk_to_dev(nbd->disk), "queue cleared\n"); + queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue); if (file) fput(file); nbd->bytesize = 0; @@ -802,6 +812,9 @@ static int __init nbd_init(void) * Tell the block layer that we are not a rotational device */ queue_flag_set_unlocked(QUEUE_FLAG_NONROT, disk->queue); + disk->queue->limits.discard_granularity = 512; + disk->queue->limits.max_discard_sectors = UINT_MAX; + disk->queue->limits.discard_zeroes_data = 0; } if (register_blkdev(NBD_MAJOR, "nbd")) { diff --git a/include/linux/nbd.h b/include/linux/nbd.h index bb349be..3b49a63 100644 --- a/include/linux/nbd.h +++ b/include/linux/nbd.h @@ -32,12 +32,16 @@ enum { NBD_CMD_READ = 0, NBD_CMD_WRITE = 1, - NBD_CMD_DISC = 2 + NBD_CMD_DISC = 2, + /* there is a gap here to match userspace */ + NBD_CMD_TRIM = 4 }; /* values for flags field */ #define NBD_FLAG_HAS_FLAGS(1 << 0) /* nbd-server supports flags */ #define NBD_FLAG_READ_ONLY(1 << 1) /* device is read-only */ +/* there is a gap here to match userspace */ +#define NBD_FLAG_SEND_TRIM(1 << 5) /* send trim/discard */ #define nbd_cmd(req) ((req)->cmd[0]) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 1/2] nbd: add set flags ioctl
Description: This patch adds a set-flags ioctl, allowing various option flags to be set on an nbd device. This allows the nbd-client to set the device flags (to enable read-only mode, or enable discard support, etc.). Flags are typically specified by the nbd-server. During the negotiation phase of the nbd connection, the server sends its flags to the client. The client then uses NBD_SET_FLAGS to inform the kernel of the options. Also included is a one-line fix to debug output for the set-timeout ioctl. Signed-off-by: Paul Clements --- diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index d07c9f7..c544bb4 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -78,6 +78,8 @@ static const char *ioctl_cmd_to_ascii(int cmd) case NBD_SET_SOCK: return "set-sock"; case NBD_SET_BLKSIZE: return "set-blksize"; case NBD_SET_SIZE: return "set-size"; + case NBD_SET_TIMEOUT: return "set-timeout"; + case NBD_SET_FLAGS: return "set-flags"; case NBD_DO_IT: return "do-it"; case NBD_CLEAR_SOCK: return "clear-sock"; case NBD_CLEAR_QUE: return "clear-que"; @@ -460,7 +462,7 @@ static void nbd_handle_req(struct nbd_device *nbd, struct request *req) nbd_cmd(req) = NBD_CMD_READ; if (rq_data_dir(req) == WRITE) { nbd_cmd(req) = NBD_CMD_WRITE; - if (nbd->flags & NBD_READ_ONLY) { + if (nbd->flags & NBD_FLAG_READ_ONLY) { dev_err(disk_to_dev(nbd->disk), "Write on read-only\n"); goto error_out; @@ -642,6 +644,10 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, nbd->xmit_timeout = arg * HZ; return 0; + case NBD_SET_FLAGS: + nbd->flags = arg; + return 0; + case NBD_SET_SIZE_BLOCKS: nbd->bytesize = ((u64) arg) * nbd->blksize; bdev->bd_inode->i_size = nbd->bytesize; diff --git a/include/linux/nbd.h b/include/linux/nbd.h index d146ca1..bb349be 100644 --- a/include/linux/nbd.h +++ b/include/linux/nbd.h @@ -27,6 +27,7 @@ #define NBD_SET_SIZE_BLOCKS_IO( 0xab, 7 ) #define NBD_DISCONNECT _IO( 0xab, 8 ) #define NBD_SET_TIMEOUT _IO( 0xab, 9 ) +#define NBD_SET_FLAGS _IO( 0xab, 10) enum { NBD_CMD_READ = 0, @@ -34,6 +35,10 @@ enum { NBD_CMD_DISC = 2 }; +/* values for flags field */ +#define NBD_FLAG_HAS_FLAGS(1 << 0) /* nbd-server supports flags */ +#define NBD_FLAG_READ_ONLY(1 << 1) /* device is read-only */ + #define nbd_cmd(req) ((req)->cmd[0]) /* userspace doesn't need the nbd_device structure */ @@ -42,10 +47,6 @@ enum { #include #include -/* values for flags field */ -#define NBD_READ_ONLY 0x0001 -#define NBD_WRITE_NOCHK 0x0002 - struct request; struct nbd_device { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 0/2] nbd: add discard support
This patchset adds discard request support to nbd. This should be good for inclusion in next. The first patch adds a set-flags ioctl, allowing various option flags to be set on an nbd device. One of the new flags (NBD_FLAG_SEND_TRIM) tells the nbd client to send discard requests to the server. The second patch adds handling of discard requests to nbd when NBD_FLAG_SEND_TRIM is set. Thanks, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] [RESEND] add discard support to nbd
On Thu, Aug 30, 2012 at 6:03 PM, Andrew Morton wrote: > On Wed, 29 Aug 2012 08:41:02 -0400 > paul.cleme...@steeleye.com wrote: > >> Description: This patch adds discard support to nbd. When the nbd client >> system receives a discard request, this will be passed along to the nbd >> server system, where the nbd-server will respond by performing: >> fallocate(.. FALLOC_FL_PUNCH_HOLE ..) >> >> To punch a hole in the backend storage, which is no longer needed. >> > > What happens if the user is running an older server? In that case, the older server will not have set NBD_SEND_FLAG_TRIM, so QUEUE_FLAG_DISCARD doesn't get set, so no discards will be sent to the device. > I is it possible that because the old server didn't set > NBD_FLAG_SEND_TRIM, the user's screen gets filled with WARN_ONs? No. > Anyway, please make sure this combination was tested! It was. Thanks for the feedback. I will resend v2 of the patches shortly. -- Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] [RESEND] add discard support to nbd
On Thu, Aug 30, 2012 at 6:03 PM, Andrew Morton a...@linux-foundation.org wrote: On Wed, 29 Aug 2012 08:41:02 -0400 paul.cleme...@steeleye.com wrote: Description: This patch adds discard support to nbd. When the nbd client system receives a discard request, this will be passed along to the nbd server system, where the nbd-server will respond by performing: fallocate(.. FALLOC_FL_PUNCH_HOLE ..) To punch a hole in the backend storage, which is no longer needed. What happens if the user is running an older server? In that case, the older server will not have set NBD_SEND_FLAG_TRIM, so QUEUE_FLAG_DISCARD doesn't get set, so no discards will be sent to the device. I is it possible that because the old server didn't set NBD_FLAG_SEND_TRIM, the user's screen gets filled with WARN_ONs? No. Anyway, please make sure this combination was tested! It was. Thanks for the feedback. I will resend v2 of the patches shortly. -- Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 0/2] nbd: add discard support
This patchset adds discard request support to nbd. This should be good for inclusion in next. The first patch adds a set-flags ioctl, allowing various option flags to be set on an nbd device. One of the new flags (NBD_FLAG_SEND_TRIM) tells the nbd client to send discard requests to the server. The second patch adds handling of discard requests to nbd when NBD_FLAG_SEND_TRIM is set. Thanks, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 1/2] nbd: add set flags ioctl
Description: This patch adds a set-flags ioctl, allowing various option flags to be set on an nbd device. This allows the nbd-client to set the device flags (to enable read-only mode, or enable discard support, etc.). Flags are typically specified by the nbd-server. During the negotiation phase of the nbd connection, the server sends its flags to the client. The client then uses NBD_SET_FLAGS to inform the kernel of the options. Also included is a one-line fix to debug output for the set-timeout ioctl. Signed-off-by: Paul Clements paul.cleme...@steeleye.com --- diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index d07c9f7..c544bb4 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -78,6 +78,8 @@ static const char *ioctl_cmd_to_ascii(int cmd) case NBD_SET_SOCK: return set-sock; case NBD_SET_BLKSIZE: return set-blksize; case NBD_SET_SIZE: return set-size; + case NBD_SET_TIMEOUT: return set-timeout; + case NBD_SET_FLAGS: return set-flags; case NBD_DO_IT: return do-it; case NBD_CLEAR_SOCK: return clear-sock; case NBD_CLEAR_QUE: return clear-que; @@ -460,7 +462,7 @@ static void nbd_handle_req(struct nbd_device *nbd, struct request *req) nbd_cmd(req) = NBD_CMD_READ; if (rq_data_dir(req) == WRITE) { nbd_cmd(req) = NBD_CMD_WRITE; - if (nbd-flags NBD_READ_ONLY) { + if (nbd-flags NBD_FLAG_READ_ONLY) { dev_err(disk_to_dev(nbd-disk), Write on read-only\n); goto error_out; @@ -642,6 +644,10 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, nbd-xmit_timeout = arg * HZ; return 0; + case NBD_SET_FLAGS: + nbd-flags = arg; + return 0; + case NBD_SET_SIZE_BLOCKS: nbd-bytesize = ((u64) arg) * nbd-blksize; bdev-bd_inode-i_size = nbd-bytesize; diff --git a/include/linux/nbd.h b/include/linux/nbd.h index d146ca1..bb349be 100644 --- a/include/linux/nbd.h +++ b/include/linux/nbd.h @@ -27,6 +27,7 @@ #define NBD_SET_SIZE_BLOCKS_IO( 0xab, 7 ) #define NBD_DISCONNECT _IO( 0xab, 8 ) #define NBD_SET_TIMEOUT _IO( 0xab, 9 ) +#define NBD_SET_FLAGS _IO( 0xab, 10) enum { NBD_CMD_READ = 0, @@ -34,6 +35,10 @@ enum { NBD_CMD_DISC = 2 }; +/* values for flags field */ +#define NBD_FLAG_HAS_FLAGS(1 0) /* nbd-server supports flags */ +#define NBD_FLAG_READ_ONLY(1 1) /* device is read-only */ + #define nbd_cmd(req) ((req)-cmd[0]) /* userspace doesn't need the nbd_device structure */ @@ -42,10 +47,6 @@ enum { #include linux/wait.h #include linux/mutex.h -/* values for flags field */ -#define NBD_READ_ONLY 0x0001 -#define NBD_WRITE_NOCHK 0x0002 - struct request; struct nbd_device { -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 2/2] nbd: handle discard requests
Description: This patch adds discard support to nbd. If the nbd-server supports discard, it will send NBD_FLAG_SEND_TRIM to the client. The client will then set the flag in the kernel via NBD_SET_FLAGS, which tells the kernel to enable discards for the device (QUEUE_FLAG_DISCARD). If discard support is enabled, then when the nbd client system receives a discard request, this will be passed along to the nbd-server. When the discard request is received by the nbd-server, it will perform: fallocate(.. FALLOC_FL_PUNCH_HOLE ..) To punch a hole in the backend storage, which is no longer needed. Signed-off-by: Paul Clements paul.cleme...@steeleye.com --- diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index c544bb4..a014169 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -98,6 +98,7 @@ static const char *nbdcmd_to_ascii(int cmd) case NBD_CMD_READ: return read; case NBD_CMD_WRITE: return write; case NBD_CMD_DISC: return disconnect; + case NBD_CMD_TRIM: return trim/discard; } return invalid; } @@ -461,7 +462,11 @@ static void nbd_handle_req(struct nbd_device *nbd, struct request *req) nbd_cmd(req) = NBD_CMD_READ; if (rq_data_dir(req) == WRITE) { - nbd_cmd(req) = NBD_CMD_WRITE; + if ((req-cmd_flags REQ_DISCARD)) { + WARN_ON(!(nbd-flags NBD_FLAG_SEND_TRIM)); + nbd_cmd(req) = NBD_CMD_TRIM; + } else + nbd_cmd(req) = NBD_CMD_WRITE; if (nbd-flags NBD_FLAG_READ_ONLY) { dev_err(disk_to_dev(nbd-disk), Write on read-only\n); @@ -667,6 +672,10 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, mutex_unlock(nbd-tx_lock); + if (nbd-flags NBD_FLAG_SEND_TRIM) + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, + nbd-disk-queue); + thread = kthread_create(nbd_thread, nbd, nbd-disk-disk_name); if (IS_ERR(thread)) { mutex_lock(nbd-tx_lock); @@ -684,6 +693,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, nbd-file = NULL; nbd_clear_que(nbd); dev_warn(disk_to_dev(nbd-disk), queue cleared\n); + queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd-disk-queue); if (file) fput(file); nbd-bytesize = 0; @@ -802,6 +812,9 @@ static int __init nbd_init(void) * Tell the block layer that we are not a rotational device */ queue_flag_set_unlocked(QUEUE_FLAG_NONROT, disk-queue); + disk-queue-limits.discard_granularity = 512; + disk-queue-limits.max_discard_sectors = UINT_MAX; + disk-queue-limits.discard_zeroes_data = 0; } if (register_blkdev(NBD_MAJOR, nbd)) { diff --git a/include/linux/nbd.h b/include/linux/nbd.h index bb349be..3b49a63 100644 --- a/include/linux/nbd.h +++ b/include/linux/nbd.h @@ -32,12 +32,16 @@ enum { NBD_CMD_READ = 0, NBD_CMD_WRITE = 1, - NBD_CMD_DISC = 2 + NBD_CMD_DISC = 2, + /* there is a gap here to match userspace */ + NBD_CMD_TRIM = 4 }; /* values for flags field */ #define NBD_FLAG_HAS_FLAGS(1 0) /* nbd-server supports flags */ #define NBD_FLAG_READ_ONLY(1 1) /* device is read-only */ +/* there is a gap here to match userspace */ +#define NBD_FLAG_SEND_TRIM(1 5) /* send trim/discard */ #define nbd_cmd(req) ((req)-cmd[0]) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/1] nbd: fix dangling requests bug
This patch fixes a serious, but uncommon bug in nbd. This should probably be considered for backport to one or more stable branches. The bug occurs when there is heavy I/O going to the nbd device while, at the same time, a failure (server, network) or manual disconnect of the nbd connection occurs. There is a small window between the time that the nbd_thread is stopped and the socket is shutdown where requests can continue to be queued to nbd's internal waiting_queue. When this happens, those requests are never completed or freed. The fix is to clear the waiting_queue on shutdown of the nbd device, in the same way that the nbd request queue (queue_head) is already being cleared. -- Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/1] nbd: clear waiting_queue on shutdown
From: Paul Clements paul.cleme...@steeleye.com Signed-off-by: Paul Clements paul.cleme...@steeleye.com --- nbd.c |9 + 1 file changed, 9 insertions(+) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index d07c9f7..0c03411 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -449,6 +449,14 @@ static void nbd_clear_que(struct nbd_device *nbd) req-errors++; nbd_end_request(req); } + + while (!list_empty(nbd-waiting_queue)) { + req = list_entry(nbd-waiting_queue.next, struct request, +queuelist); + list_del_init(req-queuelist); + req-errors++; + nbd_end_request(req); + } } @@ -598,6 +606,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, nbd-file = NULL; nbd_clear_que(nbd); BUG_ON(!list_empty(nbd-queue_head)); + BUG_ON(!list_empty(nbd-waiting_queue)); if (file) fput(file); return 0; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] [RESEND] add discard support to nbd
Description: This patch adds discard support to nbd. When the nbd client system receives a discard request, this will be passed along to the nbd server system, where the nbd-server will respond by performing: fallocate(.. FALLOC_FL_PUNCH_HOLE ..) To punch a hole in the backend storage, which is no longer needed. Signed-off-by: Paul Clements --- diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index c544bb4..a014169 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -98,6 +98,7 @@ static const char *nbdcmd_to_ascii(int cmd) case NBD_CMD_READ: return "read"; case NBD_CMD_WRITE: return "write"; case NBD_CMD_DISC: return "disconnect"; + case NBD_CMD_TRIM: return "trim/discard"; } return "invalid"; } @@ -461,7 +462,11 @@ static void nbd_handle_req(struct nbd_device *nbd, struct request *req) nbd_cmd(req) = NBD_CMD_READ; if (rq_data_dir(req) == WRITE) { - nbd_cmd(req) = NBD_CMD_WRITE; + if ((req->cmd_flags & REQ_DISCARD)) { + WARN_ON(!(nbd->flags & NBD_FLAG_SEND_TRIM)); + nbd_cmd(req) = NBD_CMD_TRIM; + } else + nbd_cmd(req) = NBD_CMD_WRITE; if (nbd->flags & NBD_FLAG_READ_ONLY) { dev_err(disk_to_dev(nbd->disk), "Write on read-only\n"); @@ -667,6 +672,10 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, mutex_unlock(>tx_lock); + if (nbd->flags & NBD_FLAG_SEND_TRIM) + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, + nbd->disk->queue); + thread = kthread_create(nbd_thread, nbd, nbd->disk->disk_name); if (IS_ERR(thread)) { mutex_lock(>tx_lock); @@ -684,6 +693,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, nbd->file = NULL; nbd_clear_que(nbd); dev_warn(disk_to_dev(nbd->disk), "queue cleared\n"); + queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue); if (file) fput(file); nbd->bytesize = 0; @@ -802,6 +812,9 @@ static int __init nbd_init(void) * Tell the block layer that we are not a rotational device */ queue_flag_set_unlocked(QUEUE_FLAG_NONROT, disk->queue); + disk->queue->limits.discard_granularity = 512; + disk->queue->limits.max_discard_sectors = UINT_MAX; + disk->queue->limits.discard_zeroes_data = 0; } if (register_blkdev(NBD_MAJOR, "nbd")) { diff --git a/include/linux/nbd.h b/include/linux/nbd.h index bb349be..3b49a63 100644 --- a/include/linux/nbd.h +++ b/include/linux/nbd.h @@ -32,12 +32,16 @@ enum { NBD_CMD_READ = 0, NBD_CMD_WRITE = 1, - NBD_CMD_DISC = 2 + NBD_CMD_DISC = 2, + /* there is a gap here to match userspace */ + NBD_CMD_TRIM = 4 }; /* values for flags field */ #define NBD_FLAG_HAS_FLAGS (1 << 0) #define NBD_FLAG_READ_ONLY (1 << 1) +/* there is a gap here to match userspace */ +#define NBD_FLAG_SEND_TRIM (1 << 5) /* send trim/discard */ #define nbd_cmd(req) ((req)->cmd[0]) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/2] [RESEND] add discard support to nbd
[ ... resend, as my mailer garbled the last post ... ] This patchset adds discard request support to nbd. This should be good for inclusion in next. The first patch adds a set-flags ioctl, allowing various option flags to be set on an nbd device. One of the new flags tells the nbd client to send discard requests to the server. The second patch adds handling of discard requests to nbd when NBD_FLAG_SEND_TRIM is set. Thanks, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] [RESEND] add discard support to nbd
Description: This patch adds a set-flags ioctl, allowing various option flags to be set on an nbd device. Signed-off-by: Paul Clements --- diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index d07c9f7..c544bb4 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -78,6 +78,8 @@ static const char *ioctl_cmd_to_ascii(int cmd) case NBD_SET_SOCK: return "set-sock"; case NBD_SET_BLKSIZE: return "set-blksize"; case NBD_SET_SIZE: return "set-size"; + case NBD_SET_TIMEOUT: return "set-timeout"; + case NBD_SET_FLAGS: return "set-flags"; case NBD_DO_IT: return "do-it"; case NBD_CLEAR_SOCK: return "clear-sock"; case NBD_CLEAR_QUE: return "clear-que"; @@ -460,7 +462,7 @@ static void nbd_handle_req(struct nbd_device *nbd, struct request *req) nbd_cmd(req) = NBD_CMD_READ; if (rq_data_dir(req) == WRITE) { nbd_cmd(req) = NBD_CMD_WRITE; - if (nbd->flags & NBD_READ_ONLY) { + if (nbd->flags & NBD_FLAG_READ_ONLY) { dev_err(disk_to_dev(nbd->disk), "Write on read-only\n"); goto error_out; @@ -642,6 +644,10 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, nbd->xmit_timeout = arg * HZ; return 0; + case NBD_SET_FLAGS: + nbd->flags = arg; + return 0; + case NBD_SET_SIZE_BLOCKS: nbd->bytesize = ((u64) arg) * nbd->blksize; bdev->bd_inode->i_size = nbd->bytesize; diff --git a/include/linux/nbd.h b/include/linux/nbd.h index d146ca1..bb349be 100644 --- a/include/linux/nbd.h +++ b/include/linux/nbd.h @@ -27,6 +27,7 @@ #define NBD_SET_SIZE_BLOCKS_IO( 0xab, 7 ) #define NBD_DISCONNECT _IO( 0xab, 8 ) #define NBD_SET_TIMEOUT _IO( 0xab, 9 ) +#define NBD_SET_FLAGS _IO( 0xab, 10) enum { NBD_CMD_READ = 0, @@ -34,6 +35,10 @@ enum { NBD_CMD_DISC = 2 }; +/* values for flags field */ +#define NBD_FLAG_HAS_FLAGS (1 << 0) +#define NBD_FLAG_READ_ONLY (1 << 1) + #define nbd_cmd(req) ((req)->cmd[0]) /* userspace doesn't need the nbd_device structure */ @@ -42,10 +47,6 @@ enum { #include #include -/* values for flags field */ -#define NBD_READ_ONLY 0x0001 -#define NBD_WRITE_NOCHK 0x0002 - struct request; struct nbd_device { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] [RESEND] add discard support to nbd
Description: This patch adds a set-flags ioctl, allowing various option flags to be set on an nbd device. Signed-off-by: Paul Clements paul.cleme...@steeleye.com --- diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index d07c9f7..c544bb4 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -78,6 +78,8 @@ static const char *ioctl_cmd_to_ascii(int cmd) case NBD_SET_SOCK: return set-sock; case NBD_SET_BLKSIZE: return set-blksize; case NBD_SET_SIZE: return set-size; + case NBD_SET_TIMEOUT: return set-timeout; + case NBD_SET_FLAGS: return set-flags; case NBD_DO_IT: return do-it; case NBD_CLEAR_SOCK: return clear-sock; case NBD_CLEAR_QUE: return clear-que; @@ -460,7 +462,7 @@ static void nbd_handle_req(struct nbd_device *nbd, struct request *req) nbd_cmd(req) = NBD_CMD_READ; if (rq_data_dir(req) == WRITE) { nbd_cmd(req) = NBD_CMD_WRITE; - if (nbd-flags NBD_READ_ONLY) { + if (nbd-flags NBD_FLAG_READ_ONLY) { dev_err(disk_to_dev(nbd-disk), Write on read-only\n); goto error_out; @@ -642,6 +644,10 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, nbd-xmit_timeout = arg * HZ; return 0; + case NBD_SET_FLAGS: + nbd-flags = arg; + return 0; + case NBD_SET_SIZE_BLOCKS: nbd-bytesize = ((u64) arg) * nbd-blksize; bdev-bd_inode-i_size = nbd-bytesize; diff --git a/include/linux/nbd.h b/include/linux/nbd.h index d146ca1..bb349be 100644 --- a/include/linux/nbd.h +++ b/include/linux/nbd.h @@ -27,6 +27,7 @@ #define NBD_SET_SIZE_BLOCKS_IO( 0xab, 7 ) #define NBD_DISCONNECT _IO( 0xab, 8 ) #define NBD_SET_TIMEOUT _IO( 0xab, 9 ) +#define NBD_SET_FLAGS _IO( 0xab, 10) enum { NBD_CMD_READ = 0, @@ -34,6 +35,10 @@ enum { NBD_CMD_DISC = 2 }; +/* values for flags field */ +#define NBD_FLAG_HAS_FLAGS (1 0) +#define NBD_FLAG_READ_ONLY (1 1) + #define nbd_cmd(req) ((req)-cmd[0]) /* userspace doesn't need the nbd_device structure */ @@ -42,10 +47,6 @@ enum { #include linux/wait.h #include linux/mutex.h -/* values for flags field */ -#define NBD_READ_ONLY 0x0001 -#define NBD_WRITE_NOCHK 0x0002 - struct request; struct nbd_device { -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/2] [RESEND] add discard support to nbd
[ ... resend, as my mailer garbled the last post ... ] This patchset adds discard request support to nbd. This should be good for inclusion in next. The first patch adds a set-flags ioctl, allowing various option flags to be set on an nbd device. One of the new flags tells the nbd client to send discard requests to the server. The second patch adds handling of discard requests to nbd when NBD_FLAG_SEND_TRIM is set. Thanks, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] [RESEND] add discard support to nbd
Description: This patch adds discard support to nbd. When the nbd client system receives a discard request, this will be passed along to the nbd server system, where the nbd-server will respond by performing: fallocate(.. FALLOC_FL_PUNCH_HOLE ..) To punch a hole in the backend storage, which is no longer needed. Signed-off-by: Paul Clements paul.cleme...@steeleye.com --- diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index c544bb4..a014169 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -98,6 +98,7 @@ static const char *nbdcmd_to_ascii(int cmd) case NBD_CMD_READ: return read; case NBD_CMD_WRITE: return write; case NBD_CMD_DISC: return disconnect; + case NBD_CMD_TRIM: return trim/discard; } return invalid; } @@ -461,7 +462,11 @@ static void nbd_handle_req(struct nbd_device *nbd, struct request *req) nbd_cmd(req) = NBD_CMD_READ; if (rq_data_dir(req) == WRITE) { - nbd_cmd(req) = NBD_CMD_WRITE; + if ((req-cmd_flags REQ_DISCARD)) { + WARN_ON(!(nbd-flags NBD_FLAG_SEND_TRIM)); + nbd_cmd(req) = NBD_CMD_TRIM; + } else + nbd_cmd(req) = NBD_CMD_WRITE; if (nbd-flags NBD_FLAG_READ_ONLY) { dev_err(disk_to_dev(nbd-disk), Write on read-only\n); @@ -667,6 +672,10 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, mutex_unlock(nbd-tx_lock); + if (nbd-flags NBD_FLAG_SEND_TRIM) + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, + nbd-disk-queue); + thread = kthread_create(nbd_thread, nbd, nbd-disk-disk_name); if (IS_ERR(thread)) { mutex_lock(nbd-tx_lock); @@ -684,6 +693,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, nbd-file = NULL; nbd_clear_que(nbd); dev_warn(disk_to_dev(nbd-disk), queue cleared\n); + queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd-disk-queue); if (file) fput(file); nbd-bytesize = 0; @@ -802,6 +812,9 @@ static int __init nbd_init(void) * Tell the block layer that we are not a rotational device */ queue_flag_set_unlocked(QUEUE_FLAG_NONROT, disk-queue); + disk-queue-limits.discard_granularity = 512; + disk-queue-limits.max_discard_sectors = UINT_MAX; + disk-queue-limits.discard_zeroes_data = 0; } if (register_blkdev(NBD_MAJOR, nbd)) { diff --git a/include/linux/nbd.h b/include/linux/nbd.h index bb349be..3b49a63 100644 --- a/include/linux/nbd.h +++ b/include/linux/nbd.h @@ -32,12 +32,16 @@ enum { NBD_CMD_READ = 0, NBD_CMD_WRITE = 1, - NBD_CMD_DISC = 2 + NBD_CMD_DISC = 2, + /* there is a gap here to match userspace */ + NBD_CMD_TRIM = 4 }; /* values for flags field */ #define NBD_FLAG_HAS_FLAGS (1 0) #define NBD_FLAG_READ_ONLY (1 1) +/* there is a gap here to match userspace */ +#define NBD_FLAG_SEND_TRIM (1 5) /* send trim/discard */ #define nbd_cmd(req) ((req)-cmd[0]) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] add discard support to nbd
nbd-trim-discard-support.diff Description: Binary data
[PATCH 1/2] add discard support to nbd
nbd-set-flags-ioctl.diff Description: Binary data
[PATCH 0/2] add discard support to nbd
This patchset adds discard request support to nbd. This should be good for inclusion in next. The first patch adds a set-flags ioctl, allowing various option flags to be set on an nbd device. One of the new flags tells the nbd client to send discard requests to the server. The second patch adds handling of discard requests to nbd when NBD_FLAG_SEND_TRIM is set. Thanks, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/2] add discard support to nbd
This patchset adds discard request support to nbd. This should be good for inclusion in next. The first patch adds a set-flags ioctl, allowing various option flags to be set on an nbd device. One of the new flags tells the nbd client to send discard requests to the server. The second patch adds handling of discard requests to nbd when NBD_FLAG_SEND_TRIM is set. Thanks, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] add discard support to nbd
nbd-set-flags-ioctl.diff Description: Binary data
[PATCH 2/2] add discard support to nbd
nbd-trim-discard-support.diff Description: Binary data
Re: [PATCH 1/1] NBD: make nbd default to deadline I/O scheduler
Take 3...this should address all the issues. NBD doesn't work well with CFQ (or AS) schedulers, so let's default to something else. The two problems I have experienced with nbd and cfq are: 1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been fixed There's a similar debian bug that has been filed as well: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638 There have been posts to nbd-general mailing list about problems with cfq and nbd also. 2) nbd performs about 10% better (the last time I tested) with deadline vs. cfq (the overhead of cfq doesn't provide much advantage to nbd [not being a real disk], and you end up going through the I/O scheduler on the nbd server anyway, so it makes sense that deadline is better with nbd) Signed-Off-By: Paul Clements <[EMAIL PROTECTED]> --- ./drivers/block/nbd.c.max_nbd_killed 2008-02-07 16:46:24.0 -0500 +++ ./drivers/block/nbd.c 2008-02-09 08:14:18.0 -0500 @@ -654,6 +654,7 @@ static int __init nbd_init(void) for (i = 0; i < nbds_max; i++) { struct gendisk *disk = alloc_disk(1); + elevator_t *old_e; if (!disk) goto out; nbd_dev[i].disk = disk; @@ -667,6 +668,11 @@ static int __init nbd_init(void) put_disk(disk); goto out; } + old_e = disk->queue->elevator; + if (elevator_init(disk->queue, "deadline") == 0 || + elevator_init(disk->queue, "noop") == 0) { +elevator_exit(old_e); + } } if (register_blkdev(NBD_MAJOR, "nbd")) {
Re: [PATCH 1/1] NBD: make nbd default to deadline I/O scheduler
Take 3...this should address all the issues. NBD doesn't work well with CFQ (or AS) schedulers, so let's default to something else. The two problems I have experienced with nbd and cfq are: 1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been fixed There's a similar debian bug that has been filed as well: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638 There have been posts to nbd-general mailing list about problems with cfq and nbd also. 2) nbd performs about 10% better (the last time I tested) with deadline vs. cfq (the overhead of cfq doesn't provide much advantage to nbd [not being a real disk], and you end up going through the I/O scheduler on the nbd server anyway, so it makes sense that deadline is better with nbd) Signed-Off-By: Paul Clements [EMAIL PROTECTED] --- ./drivers/block/nbd.c.max_nbd_killed 2008-02-07 16:46:24.0 -0500 +++ ./drivers/block/nbd.c 2008-02-09 08:14:18.0 -0500 @@ -654,6 +654,7 @@ static int __init nbd_init(void) for (i = 0; i nbds_max; i++) { struct gendisk *disk = alloc_disk(1); + elevator_t *old_e; if (!disk) goto out; nbd_dev[i].disk = disk; @@ -667,6 +668,11 @@ static int __init nbd_init(void) put_disk(disk); goto out; } + old_e = disk-queue-elevator; + if (elevator_init(disk-queue, deadline) == 0 || + elevator_init(disk-queue, noop) == 0) { +elevator_exit(old_e); + } } if (register_blkdev(NBD_MAJOR, nbd)) {
Re: [PATCH 1/1] NBD: make nbd default to deadline I/O scheduler
Jens Axboe wrote: On Fri, Feb 08 2008, Jens Axboe wrote: On Fri, Feb 08 2008, Paul Clements wrote: Andrew Morton wrote: On Fri, 8 Feb 2008 09:33:41 -0800 Randy Dunlap <[EMAIL PROTECTED]> wrote: On Fri, 08 Feb 2008 11:47:42 -0500 Paul Clements wrote: There have been numerous reports of problems with nbd and cfq. Deadline gives better performance for nbd, anyway, so let's use it by default. Please define "problems". If it's just "slowness" then we can live with that, but I'd hope that Jens is aware and that it's understood. It it's "hangs" or "oopses" then we panic. The two problems I have experienced (which may already be fixed): 1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been fixed There's a similar debian bug that has been filed as well: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638 2) nbd performs about 10% better (the last time I tested) with deadline vs. cfq (the overhead of cfq doesn't provide much advantage to nbd [not being a real disk], and you end up going through the I/O scheduler on the nbd server anyway, so it makes sense that deadline is better with nbd) There have been posts to nbd-general mailing list about problems with cfq and nbd also. I'm fine with that, it's one of those things we'll do automatically when we have some sort of disk profile system setup. Devices without seek penalties should not use AS or CFQ. Asking for a non-existing elevator is not an issue, but it may trigger both printks and a switch to another elevator. So if you ask for "deadline" and it's modular, you'll get cfq again if it's the default. Your patch looks bad though, you forget to exit the old elevator. And you don't check the return value of elevator_init(). All in all, your patch definitely needs more work before it can be included. Thanks Jens. This one should be better. -- Paul NBD doesn't work well with CFQ (or AS) schedulers, so let's default to something else. The two problems I have experienced with nbd and cfq are: 1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been fixed There's a similar debian bug that has been filed as well: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638 2) nbd performs about 10% better (the last time I tested) with deadline vs. cfq (the overhead of cfq doesn't provide much advantage to nbd [not being a real disk], and you end up going through the I/O scheduler on the nbd server anyway, so it makes sense that deadline is better with nbd) There have been posts to nbd-general mailing list about problems with cfq and nbd also. Signed-Off-By: Paul Clements <[EMAIL PROTECTED]> --- ./drivers/block/nbd.c.max_nbd_killed 2008-02-07 16:46:24.0 -0500 +++ ./drivers/block/nbd.c 2008-02-08 16:13:01.0 -0500 @@ -667,6 +667,12 @@ static int __init nbd_init(void) put_disk(disk); goto out; } + if (elevator_init(disk->queue, "deadline") != 0) { + if (elevator_init(disk->queue, "noop") != 0) { +put_disk(disk); +goto out; + } + } } if (register_blkdev(NBD_MAJOR, "nbd")) {
Re: [PATCH 1/1] NBD: make nbd default to deadline I/O scheduler
Andrew Morton wrote: On Fri, 8 Feb 2008 09:33:41 -0800 Randy Dunlap <[EMAIL PROTECTED]> wrote: On Fri, 08 Feb 2008 11:47:42 -0500 Paul Clements wrote: There have been numerous reports of problems with nbd and cfq. Deadline gives better performance for nbd, anyway, so let's use it by default. Please define "problems". If it's just "slowness" then we can live with that, but I'd hope that Jens is aware and that it's understood. It it's "hangs" or "oopses" then we panic. The two problems I have experienced (which may already be fixed): 1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been fixed There's a similar debian bug that has been filed as well: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638 2) nbd performs about 10% better (the last time I tested) with deadline vs. cfq (the overhead of cfq doesn't provide much advantage to nbd [not being a real disk], and you end up going through the I/O scheduler on the nbd server anyway, so it makes sense that deadline is better with nbd) There have been posts to nbd-general mailing list about problems with cfq and nbd also. -- Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/1] NBD: make nbd default to deadline I/O scheduler
There have been numerous reports of problems with nbd and cfq. Deadline gives better performance for nbd, anyway, so let's use it by default. -- Paul There have been numerous reports of problems with nbd and cfq. Deadline gives better performance for nbd, anyway, so let's use it by default. Signed-Off-By: Paul Clements <[EMAIL PROTECTED]> --- ./drivers/block/nbd.c.max_nbd_killed 2008-02-07 16:46:24.0 -0500 +++ ./drivers/block/nbd.c 2008-02-08 11:38:47.0 -0500 @@ -667,6 +667,7 @@ static int __init nbd_init(void) put_disk(disk); goto out; } + elevator_init(disk->queue, "deadline"); } if (register_blkdev(NBD_MAJOR, "nbd")) {
[PATCH 1/1] NBD: make nbd default to deadline I/O scheduler
There have been numerous reports of problems with nbd and cfq. Deadline gives better performance for nbd, anyway, so let's use it by default. -- Paul There have been numerous reports of problems with nbd and cfq. Deadline gives better performance for nbd, anyway, so let's use it by default. Signed-Off-By: Paul Clements [EMAIL PROTECTED] --- ./drivers/block/nbd.c.max_nbd_killed 2008-02-07 16:46:24.0 -0500 +++ ./drivers/block/nbd.c 2008-02-08 11:38:47.0 -0500 @@ -667,6 +667,7 @@ static int __init nbd_init(void) put_disk(disk); goto out; } + elevator_init(disk-queue, deadline); } if (register_blkdev(NBD_MAJOR, nbd)) {
Re: [PATCH 1/1] NBD: make nbd default to deadline I/O scheduler
Andrew Morton wrote: On Fri, 8 Feb 2008 09:33:41 -0800 Randy Dunlap [EMAIL PROTECTED] wrote: On Fri, 08 Feb 2008 11:47:42 -0500 Paul Clements wrote: There have been numerous reports of problems with nbd and cfq. Deadline gives better performance for nbd, anyway, so let's use it by default. Please define problems. If it's just slowness then we can live with that, but I'd hope that Jens is aware and that it's understood. It it's hangs or oopses then we panic. The two problems I have experienced (which may already be fixed): 1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been fixed There's a similar debian bug that has been filed as well: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638 2) nbd performs about 10% better (the last time I tested) with deadline vs. cfq (the overhead of cfq doesn't provide much advantage to nbd [not being a real disk], and you end up going through the I/O scheduler on the nbd server anyway, so it makes sense that deadline is better with nbd) There have been posts to nbd-general mailing list about problems with cfq and nbd also. -- Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] NBD: make nbd default to deadline I/O scheduler
Jens Axboe wrote: On Fri, Feb 08 2008, Jens Axboe wrote: On Fri, Feb 08 2008, Paul Clements wrote: Andrew Morton wrote: On Fri, 8 Feb 2008 09:33:41 -0800 Randy Dunlap [EMAIL PROTECTED] wrote: On Fri, 08 Feb 2008 11:47:42 -0500 Paul Clements wrote: There have been numerous reports of problems with nbd and cfq. Deadline gives better performance for nbd, anyway, so let's use it by default. Please define problems. If it's just slowness then we can live with that, but I'd hope that Jens is aware and that it's understood. It it's hangs or oopses then we panic. The two problems I have experienced (which may already be fixed): 1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been fixed There's a similar debian bug that has been filed as well: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638 2) nbd performs about 10% better (the last time I tested) with deadline vs. cfq (the overhead of cfq doesn't provide much advantage to nbd [not being a real disk], and you end up going through the I/O scheduler on the nbd server anyway, so it makes sense that deadline is better with nbd) There have been posts to nbd-general mailing list about problems with cfq and nbd also. I'm fine with that, it's one of those things we'll do automatically when we have some sort of disk profile system setup. Devices without seek penalties should not use AS or CFQ. Asking for a non-existing elevator is not an issue, but it may trigger both printks and a switch to another elevator. So if you ask for deadline and it's modular, you'll get cfq again if it's the default. Your patch looks bad though, you forget to exit the old elevator. And you don't check the return value of elevator_init(). All in all, your patch definitely needs more work before it can be included. Thanks Jens. This one should be better. -- Paul NBD doesn't work well with CFQ (or AS) schedulers, so let's default to something else. The two problems I have experienced with nbd and cfq are: 1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been fixed There's a similar debian bug that has been filed as well: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638 2) nbd performs about 10% better (the last time I tested) with deadline vs. cfq (the overhead of cfq doesn't provide much advantage to nbd [not being a real disk], and you end up going through the I/O scheduler on the nbd server anyway, so it makes sense that deadline is better with nbd) There have been posts to nbd-general mailing list about problems with cfq and nbd also. Signed-Off-By: Paul Clements [EMAIL PROTECTED] --- ./drivers/block/nbd.c.max_nbd_killed 2008-02-07 16:46:24.0 -0500 +++ ./drivers/block/nbd.c 2008-02-08 16:13:01.0 -0500 @@ -667,6 +667,12 @@ static int __init nbd_init(void) put_disk(disk); goto out; } + if (elevator_init(disk-queue, deadline) != 0) { + if (elevator_init(disk-queue, noop) != 0) { +put_disk(disk); +goto out; + } + } } if (register_blkdev(NBD_MAJOR, nbd)) {
Re: [PATCH 1/1] NBD: raise max number of nbd devices to 1024
Andrew Morton wrote: On Tue, 29 Jan 2008 21:04:05 -0500 Paul Clements <[EMAIL PROTECTED]> wrote: Andrew Morton wrote: much nicer? OK, yes, I was going for quick and easy, but you've got a point... We do need to move the kcalloc into nbd_init instead of nbd_cleanup, though -- that works a little better. Patch is attached. Thanks for the suggestion. -- Paul [nbd_max_nbd_killed.diff text/x-patch (1.2KB)] Signed-Off-By: Paul Clements <[EMAIL PROTECTED]> --- ./include/linux/nbd.h.TIMEOUT 2007-08-22 13:18:12.0 -0400 +++ ./include/linux/nbd.h 2008-01-29 20:01:33.0 -0500 Could we have a complete changelog please? As I mentioned in the earlier email? [PATCH 1/1] NBD: remove limit on max number of nbd devices Remove the arbitrary 128 device limit for NBD. nbds_max can now be set to any number. In certain scenarios where devices are used sparsely we have run into the 128 device limit. Signed-Off-By: Paul Clements <[EMAIL PROTECTED]> --- ./include/linux/nbd.h.TIMEOUT 2007-08-22 13:18:12.0 -0400 +++ ./include/linux/nbd.h 2008-01-29 20:01:33.0 -0500 @@ -35,7 +35,6 @@ enum { }; #define nbd_cmd(req) ((req)->cmd[0]) -#define MAX_NBD 128 /* userspace doesn't need the nbd_device structure */ #ifdef __KERNEL__ --- ./drivers/block/nbd.c.ORIG 2008-01-29 20:01:12.0 -0500 +++ ./drivers/block/nbd.c 2008-01-29 20:08:30.0 -0500 @@ -53,7 +53,7 @@ static unsigned int debugflags; #endif /* NDEBUG */ static unsigned int nbds_max = 16; -static struct nbd_device nbd_dev[MAX_NBD]; +static struct nbd_device *nbd_dev; /* * Use just one lock (or at most 1 per NIC). Two arguments for this: @@ -659,11 +659,9 @@ static int __init nbd_init(void) BUILD_BUG_ON(sizeof(struct nbd_request) != 28); - if (nbds_max > MAX_NBD) { - printk(KERN_CRIT "nbd: cannot allocate more than %u nbds; %u requested.\n", MAX_NBD, -nbds_max); - return -EINVAL; - } + nbd_dev = kcalloc(nbds_max, sizeof(*nbd_dev), GFP_KERNEL); + if (!nbd_dev) + return -ENOMEM; for (i = 0; i < nbds_max; i++) { struct gendisk *disk = alloc_disk(1);
Re: [PATCH 1/1] NBD: raise max number of nbd devices to 1024
Andrew Morton wrote: On Tue, 29 Jan 2008 21:04:05 -0500 Paul Clements [EMAIL PROTECTED] wrote: Andrew Morton wrote: snip much nicer? OK, yes, I was going for quick and easy, but you've got a point... We do need to move the kcalloc into nbd_init instead of nbd_cleanup, though -- that works a little better. Patch is attached. Thanks for the suggestion. -- Paul [nbd_max_nbd_killed.diff text/x-patch (1.2KB)] Signed-Off-By: Paul Clements [EMAIL PROTECTED] --- ./include/linux/nbd.h.TIMEOUT 2007-08-22 13:18:12.0 -0400 +++ ./include/linux/nbd.h 2008-01-29 20:01:33.0 -0500 Could we have a complete changelog please? As I mentioned in the earlier email? [PATCH 1/1] NBD: remove limit on max number of nbd devices Remove the arbitrary 128 device limit for NBD. nbds_max can now be set to any number. In certain scenarios where devices are used sparsely we have run into the 128 device limit. Signed-Off-By: Paul Clements [EMAIL PROTECTED] --- ./include/linux/nbd.h.TIMEOUT 2007-08-22 13:18:12.0 -0400 +++ ./include/linux/nbd.h 2008-01-29 20:01:33.0 -0500 @@ -35,7 +35,6 @@ enum { }; #define nbd_cmd(req) ((req)-cmd[0]) -#define MAX_NBD 128 /* userspace doesn't need the nbd_device structure */ #ifdef __KERNEL__ --- ./drivers/block/nbd.c.ORIG 2008-01-29 20:01:12.0 -0500 +++ ./drivers/block/nbd.c 2008-01-29 20:08:30.0 -0500 @@ -53,7 +53,7 @@ static unsigned int debugflags; #endif /* NDEBUG */ static unsigned int nbds_max = 16; -static struct nbd_device nbd_dev[MAX_NBD]; +static struct nbd_device *nbd_dev; /* * Use just one lock (or at most 1 per NIC). Two arguments for this: @@ -659,11 +659,9 @@ static int __init nbd_init(void) BUILD_BUG_ON(sizeof(struct nbd_request) != 28); - if (nbds_max MAX_NBD) { - printk(KERN_CRIT nbd: cannot allocate more than %u nbds; %u requested.\n, MAX_NBD, -nbds_max); - return -EINVAL; - } + nbd_dev = kcalloc(nbds_max, sizeof(*nbd_dev), GFP_KERNEL); + if (!nbd_dev) + return -ENOMEM; for (i = 0; i nbds_max; i++) { struct gendisk *disk = alloc_disk(1);
Re: [PATCH 1/1] NBD: raise max number of nbd devices to 1024
Andrew Morton wrote: much nicer? OK, yes, I was going for quick and easy, but you've got a point... We do need to move the kcalloc into nbd_init instead of nbd_cleanup, though -- that works a little better. Patch is attached. Thanks for the suggestion. -- Paul Signed-Off-By: Paul Clements <[EMAIL PROTECTED]> --- ./include/linux/nbd.h.TIMEOUT 2007-08-22 13:18:12.0 -0400 +++ ./include/linux/nbd.h 2008-01-29 20:01:33.0 -0500 @@ -35,7 +35,6 @@ enum { }; #define nbd_cmd(req) ((req)->cmd[0]) -#define MAX_NBD 128 /* userspace doesn't need the nbd_device structure */ #ifdef __KERNEL__ --- ./drivers/block/nbd.c.ORIG 2008-01-29 20:01:12.0 -0500 +++ ./drivers/block/nbd.c 2008-01-29 20:08:30.0 -0500 @@ -53,7 +53,7 @@ static unsigned int debugflags; #endif /* NDEBUG */ static unsigned int nbds_max = 16; -static struct nbd_device nbd_dev[MAX_NBD]; +static struct nbd_device *nbd_dev; /* * Use just one lock (or at most 1 per NIC). Two arguments for this: @@ -659,11 +659,9 @@ static int __init nbd_init(void) BUILD_BUG_ON(sizeof(struct nbd_request) != 28); - if (nbds_max > MAX_NBD) { - printk(KERN_CRIT "nbd: cannot allocate more than %u nbds; %u requested.\n", MAX_NBD, -nbds_max); - return -EINVAL; - } + nbd_dev = kcalloc(nbds_max, sizeof(*nbd_dev), GFP_KERNEL); + if (!nbd_dev) + return -ENOMEM; for (i = 0; i < nbds_max; i++) { struct gendisk *disk = alloc_disk(1);
[PATCH 1/1] NBD: raise max number of nbd devices to 1024
The limit was 128. This changes it to 1024. -- Paul Signed-Off-By: Paul Clements <[EMAIL PROTECTED]> --- ./include/linux/nbd.h.TIMEOUT 2007-08-22 13:18:12.0 -0400 +++ ./include/linux/nbd.h 2008-01-29 14:35:17.0 -0500 @@ -35,7 +35,7 @@ enum { }; #define nbd_cmd(req) ((req)->cmd[0]) -#define MAX_NBD 128 +#define MAX_NBD 1024 /* userspace doesn't need the nbd_device structure */ #ifdef __KERNEL__
[PATCH 1/1] NBD: raise max number of nbd devices to 1024
The limit was 128. This changes it to 1024. -- Paul Signed-Off-By: Paul Clements [EMAIL PROTECTED] --- ./include/linux/nbd.h.TIMEOUT 2007-08-22 13:18:12.0 -0400 +++ ./include/linux/nbd.h 2008-01-29 14:35:17.0 -0500 @@ -35,7 +35,7 @@ enum { }; #define nbd_cmd(req) ((req)-cmd[0]) -#define MAX_NBD 128 +#define MAX_NBD 1024 /* userspace doesn't need the nbd_device structure */ #ifdef __KERNEL__
Re: [PATCH 1/1] NBD: raise max number of nbd devices to 1024
Andrew Morton wrote: snip much nicer? OK, yes, I was going for quick and easy, but you've got a point... We do need to move the kcalloc into nbd_init instead of nbd_cleanup, though -- that works a little better. Patch is attached. Thanks for the suggestion. -- Paul Signed-Off-By: Paul Clements [EMAIL PROTECTED] --- ./include/linux/nbd.h.TIMEOUT 2007-08-22 13:18:12.0 -0400 +++ ./include/linux/nbd.h 2008-01-29 20:01:33.0 -0500 @@ -35,7 +35,6 @@ enum { }; #define nbd_cmd(req) ((req)-cmd[0]) -#define MAX_NBD 128 /* userspace doesn't need the nbd_device structure */ #ifdef __KERNEL__ --- ./drivers/block/nbd.c.ORIG 2008-01-29 20:01:12.0 -0500 +++ ./drivers/block/nbd.c 2008-01-29 20:08:30.0 -0500 @@ -53,7 +53,7 @@ static unsigned int debugflags; #endif /* NDEBUG */ static unsigned int nbds_max = 16; -static struct nbd_device nbd_dev[MAX_NBD]; +static struct nbd_device *nbd_dev; /* * Use just one lock (or at most 1 per NIC). Two arguments for this: @@ -659,11 +659,9 @@ static int __init nbd_init(void) BUILD_BUG_ON(sizeof(struct nbd_request) != 28); - if (nbds_max MAX_NBD) { - printk(KERN_CRIT nbd: cannot allocate more than %u nbds; %u requested.\n, MAX_NBD, -nbds_max); - return -EINVAL; - } + nbd_dev = kcalloc(nbds_max, sizeof(*nbd_dev), GFP_KERNEL); + if (!nbd_dev) + return -ENOMEM; for (i = 0; i nbds_max; i++) { struct gendisk *disk = alloc_disk(1);
Re: [PATCH 2/2] NBD: allow hung network I/O to be cancelled
Mike Snitzer wrote: On 8/24/07, Paul Clements <[EMAIL PROTECTED]> wrote: This patch allows NBD I/O to be cancelled when a network outage occurs. Previously, I/O would just hang, and if enough I/O was hung in nbd, the system (at least user-level) would completely hang until a TCP timeout (default, 15 minutes) occurred. The patch introduces a new ioctl NBD_SET_TIMEOUT that allows a transmit timeout value (in seconds) to be specified. Any network send that exceeds the timeout will be cancelled and the nbd connection will be shut down. I've tested with various timeout values and 6 seconds seems to be a good choice for the timeout. If the NBD_SET_TIMEOUT ioctl is not called, you get the old (I/O hang) behavior. Hi Paul, Thanks for implementing this! Do you happen to have an associated nbd-client patch for userspace? If not I'd be happy to coordinate with you and Wouter on a patch. No, I don't. I just basically hardcoded my nbd-client to do a 6 second timeout by default, but Wouter will probably want to do something a little less hackish for the official nbd-client. -- Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] NBD: allow hung network I/O to be cancelled
This patch allows NBD I/O to be cancelled when a network outage occurs. Previously, I/O would just hang, and if enough I/O was hung in nbd, the system (at least user-level) would completely hang until a TCP timeout (default, 15 minutes) occurred. The patch introduces a new ioctl NBD_SET_TIMEOUT that allows a transmit timeout value (in seconds) to be specified. Any network send that exceeds the timeout will be cancelled and the nbd connection will be shut down. I've tested with various timeout values and 6 seconds seems to be a good choice for the timeout. If the NBD_SET_TIMEOUT ioctl is not called, you get the old (I/O hang) behavior. Patch applies and was tested against 2.6.23-rc2-mm2. Thanks, Paul Signed-Off-By: Paul Clements <[EMAIL PROTECTED]> --- ./drivers/block/nbd.c.SIZE_ZERO 2007-08-16 13:08:06.0 -0400 +++ ./drivers/block/nbd.c 2007-08-23 16:42:09.0 -0400 @@ -113,12 +113,42 @@ static void nbd_end_request(struct reque spin_unlock_irqrestore(q->queue_lock, flags); } +static void sock_shutdown(struct nbd_device *lo, int lock) +{ + /* Forcibly shutdown the socket causing all listeners + * to error + * + * FIXME: This code is duplicated from sys_shutdown, but + * there should be a more generic interface rather than + * calling socket ops directly here */ + if (lock) + mutex_lock(>tx_lock); + if (lo->sock) { + printk(KERN_WARNING "%s: shutting down socket\n", + lo->disk->disk_name); + lo->sock->ops->shutdown(lo->sock, SEND_SHUTDOWN|RCV_SHUTDOWN); + lo->sock = NULL; + } + if (lock) + mutex_unlock(>tx_lock); +} + +static void nbd_xmit_timeout(unsigned long arg) +{ + struct task_struct *task = (struct task_struct *)arg; + + printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n", + task->comm, task->pid); + force_sig(SIGKILL, task); +} + /* * Send or receive packet. */ -static int sock_xmit(struct socket *sock, int send, void *buf, int size, +static int sock_xmit(struct nbd_device *lo, int send, void *buf, int size, int msg_flags) { + struct socket *sock = lo->sock; int result; struct msghdr msg; struct kvec iov; @@ -139,9 +169,20 @@ static int sock_xmit(struct socket *sock msg.msg_controllen = 0; msg.msg_flags = msg_flags | MSG_NOSIGNAL; - if (send) + if (send) { + struct timer_list ti; + + if (lo->xmit_timeout) { +init_timer(); +ti.function = nbd_xmit_timeout; +ti.data = (unsigned long)current; +ti.expires = jiffies + lo->xmit_timeout; +add_timer(); + } result = kernel_sendmsg(sock, , , 1, size); - else + if (lo->xmit_timeout) +del_timer_sync(); + } else result = kernel_recvmsg(sock, , , 1, size, 0); if (signal_pending(current)) { @@ -150,6 +191,7 @@ static int sock_xmit(struct socket *sock current->pid, current->comm, dequeue_signal_lock(current, >blocked, )); result = -EINTR; + sock_shutdown(lo, !send); break; } @@ -167,23 +209,22 @@ static int sock_xmit(struct socket *sock return result; } -static inline int sock_send_bvec(struct socket *sock, struct bio_vec *bvec, +static inline int sock_send_bvec(struct nbd_device *lo, struct bio_vec *bvec, int flags) { int result; void *kaddr = kmap(bvec->bv_page); - result = sock_xmit(sock, 1, kaddr + bvec->bv_offset, bvec->bv_len, - flags); + result = sock_xmit(lo, 1, kaddr + bvec->bv_offset, bvec->bv_len, flags); kunmap(bvec->bv_page); return result; } +/* always call with the tx_lock held */ static int nbd_send_req(struct nbd_device *lo, struct request *req) { int result, i, flags; struct nbd_request request; unsigned long size = req->nr_sectors << 9; - struct socket *sock = lo->sock; request.magic = htonl(NBD_REQUEST_MAGIC); request.type = htonl(nbd_cmd(req)); @@ -196,8 +237,8 @@ static int nbd_send_req(struct nbd_devic nbdcmd_to_ascii(nbd_cmd(req)), (unsigned long long)req->sector << 9, req->nr_sectors << 9); - result = sock_xmit(sock, 1, , sizeof(request), - (nbd_cmd(req) == NBD_CMD_WRITE)? MSG_MORE: 0); + result = sock_xmit(lo, 1, , sizeof(request), + (nbd_cmd(req) == NBD_CMD_WRITE) ? MSG_MORE : 0); if (result <= 0) { printk(KERN_ERR "%s: Send control failed (result %d)\n", lo->disk->disk_name, result); @@ -219,7 +260,7 @@ static int nbd_send_req(struct nbd_devic dprintk(DBG_TX, "%s: request %p: sending %d bytes data\n", lo->disk->disk_name, req, bvec->bv_len); -result = sock_send_bvec(sock, bvec, flags); +result = sock_send_bvec(lo, bvec, flags); if (result <= 0) { printk(KERN_ERR "%s: Send data failed (result %d)\n", lo->disk->disk_name, @@ -261,11 +302,11 @@ out: return ERR_PTR(err); } -static inline int sock_recv_bvec(struct socket *sock, struct bio_vec *bvec) +static inline int sock_recv_bvec(struct nbd
[PATCH 1/2] NBD: set uninitialized devices to size 0
This fixes errors with utilities (such as LVM's vgscan) that try to scan all devices. Previously this would generate read errors when uninitialized nbd devices were scanned: # vgscan Reading all physical volumes. This may take a while... /dev/nbd0: read failed after 0 of 1024 at 0: Input/output error /dev/nbd0: read failed after 0 of 1024 at 509804544: Input/output error /dev/nbd0: read failed after 0 of 2048 at 0: Input/output error /dev/nbd1: read failed after 0 of 1024 at 509804544: Input/output error /dev/nbd1: read failed after 0 of 2048 at 0: Input/output error From now on, uninitialized nbd devices will have size zero, which prevents these errors. Patch applies and was tested against 2.6.23-rc2-mm2. Thanks, Paul Signed-Off-By: Paul Clements <[EMAIL PROTECTED]> --- ./drivers/block/nbd.c.ERROR_RETURNS 2007-08-15 15:41:19.0 -0400 +++ ./drivers/block/nbd.c 2007-08-16 13:08:06.0 -0400 @@ -589,6 +589,9 @@ static int nbd_ioctl(struct inode *inode printk(KERN_WARNING "%s: queue cleared\n", lo->disk->disk_name); if (file) fput(file); + lo->bytesize = 0; + inode->i_bdev->bd_inode->i_size = 0; + set_capacity(lo->disk, 0); return lo->harderror; case NBD_CLEAR_QUE: /* @@ -666,14 +669,14 @@ static int __init nbd_init(void) mutex_init(_dev[i].tx_lock); init_waitqueue_head(_dev[i].active_wq); nbd_dev[i].blksize = 1024; - nbd_dev[i].bytesize = 0x7c00ULL << 10; /* 2TB */ + nbd_dev[i].bytesize = 0; disk->major = NBD_MAJOR; disk->first_minor = i; disk->fops = _fops; disk->private_data = _dev[i]; disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO; sprintf(disk->disk_name, "nbd%d", i); - set_capacity(disk, 0x7c00ULL << 1); /* 2 TB */ + set_capacity(disk, 0); add_disk(disk); }
[PATCH 1/2] NBD: set uninitialized devices to size 0
This fixes errors with utilities (such as LVM's vgscan) that try to scan all devices. Previously this would generate read errors when uninitialized nbd devices were scanned: # vgscan Reading all physical volumes. This may take a while... /dev/nbd0: read failed after 0 of 1024 at 0: Input/output error /dev/nbd0: read failed after 0 of 1024 at 509804544: Input/output error /dev/nbd0: read failed after 0 of 2048 at 0: Input/output error /dev/nbd1: read failed after 0 of 1024 at 509804544: Input/output error /dev/nbd1: read failed after 0 of 2048 at 0: Input/output error From now on, uninitialized nbd devices will have size zero, which prevents these errors. Patch applies and was tested against 2.6.23-rc2-mm2. Thanks, Paul Signed-Off-By: Paul Clements [EMAIL PROTECTED] --- ./drivers/block/nbd.c.ERROR_RETURNS 2007-08-15 15:41:19.0 -0400 +++ ./drivers/block/nbd.c 2007-08-16 13:08:06.0 -0400 @@ -589,6 +589,9 @@ static int nbd_ioctl(struct inode *inode printk(KERN_WARNING %s: queue cleared\n, lo-disk-disk_name); if (file) fput(file); + lo-bytesize = 0; + inode-i_bdev-bd_inode-i_size = 0; + set_capacity(lo-disk, 0); return lo-harderror; case NBD_CLEAR_QUE: /* @@ -666,14 +669,14 @@ static int __init nbd_init(void) mutex_init(nbd_dev[i].tx_lock); init_waitqueue_head(nbd_dev[i].active_wq); nbd_dev[i].blksize = 1024; - nbd_dev[i].bytesize = 0x7c00ULL 10; /* 2TB */ + nbd_dev[i].bytesize = 0; disk-major = NBD_MAJOR; disk-first_minor = i; disk-fops = nbd_fops; disk-private_data = nbd_dev[i]; disk-flags |= GENHD_FL_SUPPRESS_PARTITION_INFO; sprintf(disk-disk_name, nbd%d, i); - set_capacity(disk, 0x7c00ULL 1); /* 2 TB */ + set_capacity(disk, 0); add_disk(disk); }
[PATCH 2/2] NBD: allow hung network I/O to be cancelled
This patch allows NBD I/O to be cancelled when a network outage occurs. Previously, I/O would just hang, and if enough I/O was hung in nbd, the system (at least user-level) would completely hang until a TCP timeout (default, 15 minutes) occurred. The patch introduces a new ioctl NBD_SET_TIMEOUT that allows a transmit timeout value (in seconds) to be specified. Any network send that exceeds the timeout will be cancelled and the nbd connection will be shut down. I've tested with various timeout values and 6 seconds seems to be a good choice for the timeout. If the NBD_SET_TIMEOUT ioctl is not called, you get the old (I/O hang) behavior. Patch applies and was tested against 2.6.23-rc2-mm2. Thanks, Paul Signed-Off-By: Paul Clements [EMAIL PROTECTED] --- ./drivers/block/nbd.c.SIZE_ZERO 2007-08-16 13:08:06.0 -0400 +++ ./drivers/block/nbd.c 2007-08-23 16:42:09.0 -0400 @@ -113,12 +113,42 @@ static void nbd_end_request(struct reque spin_unlock_irqrestore(q-queue_lock, flags); } +static void sock_shutdown(struct nbd_device *lo, int lock) +{ + /* Forcibly shutdown the socket causing all listeners + * to error + * + * FIXME: This code is duplicated from sys_shutdown, but + * there should be a more generic interface rather than + * calling socket ops directly here */ + if (lock) + mutex_lock(lo-tx_lock); + if (lo-sock) { + printk(KERN_WARNING %s: shutting down socket\n, + lo-disk-disk_name); + lo-sock-ops-shutdown(lo-sock, SEND_SHUTDOWN|RCV_SHUTDOWN); + lo-sock = NULL; + } + if (lock) + mutex_unlock(lo-tx_lock); +} + +static void nbd_xmit_timeout(unsigned long arg) +{ + struct task_struct *task = (struct task_struct *)arg; + + printk(KERN_WARNING nbd: killing hung xmit (%s, pid: %d)\n, + task-comm, task-pid); + force_sig(SIGKILL, task); +} + /* * Send or receive packet. */ -static int sock_xmit(struct socket *sock, int send, void *buf, int size, +static int sock_xmit(struct nbd_device *lo, int send, void *buf, int size, int msg_flags) { + struct socket *sock = lo-sock; int result; struct msghdr msg; struct kvec iov; @@ -139,9 +169,20 @@ static int sock_xmit(struct socket *sock msg.msg_controllen = 0; msg.msg_flags = msg_flags | MSG_NOSIGNAL; - if (send) + if (send) { + struct timer_list ti; + + if (lo-xmit_timeout) { +init_timer(ti); +ti.function = nbd_xmit_timeout; +ti.data = (unsigned long)current; +ti.expires = jiffies + lo-xmit_timeout; +add_timer(ti); + } result = kernel_sendmsg(sock, msg, iov, 1, size); - else + if (lo-xmit_timeout) +del_timer_sync(ti); + } else result = kernel_recvmsg(sock, msg, iov, 1, size, 0); if (signal_pending(current)) { @@ -150,6 +191,7 @@ static int sock_xmit(struct socket *sock current-pid, current-comm, dequeue_signal_lock(current, current-blocked, info)); result = -EINTR; + sock_shutdown(lo, !send); break; } @@ -167,23 +209,22 @@ static int sock_xmit(struct socket *sock return result; } -static inline int sock_send_bvec(struct socket *sock, struct bio_vec *bvec, +static inline int sock_send_bvec(struct nbd_device *lo, struct bio_vec *bvec, int flags) { int result; void *kaddr = kmap(bvec-bv_page); - result = sock_xmit(sock, 1, kaddr + bvec-bv_offset, bvec-bv_len, - flags); + result = sock_xmit(lo, 1, kaddr + bvec-bv_offset, bvec-bv_len, flags); kunmap(bvec-bv_page); return result; } +/* always call with the tx_lock held */ static int nbd_send_req(struct nbd_device *lo, struct request *req) { int result, i, flags; struct nbd_request request; unsigned long size = req-nr_sectors 9; - struct socket *sock = lo-sock; request.magic = htonl(NBD_REQUEST_MAGIC); request.type = htonl(nbd_cmd(req)); @@ -196,8 +237,8 @@ static int nbd_send_req(struct nbd_devic nbdcmd_to_ascii(nbd_cmd(req)), (unsigned long long)req-sector 9, req-nr_sectors 9); - result = sock_xmit(sock, 1, request, sizeof(request), - (nbd_cmd(req) == NBD_CMD_WRITE)? MSG_MORE: 0); + result = sock_xmit(lo, 1, request, sizeof(request), + (nbd_cmd(req) == NBD_CMD_WRITE) ? MSG_MORE : 0); if (result = 0) { printk(KERN_ERR %s: Send control failed (result %d)\n, lo-disk-disk_name, result); @@ -219,7 +260,7 @@ static int nbd_send_req(struct nbd_devic dprintk(DBG_TX, %s: request %p: sending %d bytes data\n, lo-disk-disk_name, req, bvec-bv_len); -result = sock_send_bvec(sock, bvec, flags); +result = sock_send_bvec(lo, bvec, flags); if (result = 0) { printk(KERN_ERR %s: Send data failed (result %d)\n, lo-disk-disk_name, @@ -261,11 +302,11 @@ out: return ERR_PTR(err); } -static inline int sock_recv_bvec(struct socket *sock, struct bio_vec *bvec) +static inline int sock_recv_bvec(struct nbd_device *lo, struct bio_vec *bvec) { int result; void *kaddr = kmap(bvec-bv_page); - result = sock_xmit(sock, 0, kaddr + bvec-bv_offset, bvec-bv_len, + result = sock_xmit(lo, 0, kaddr + bvec
Re: [PATCH 2/2] NBD: allow hung network I/O to be cancelled
Mike Snitzer wrote: On 8/24/07, Paul Clements [EMAIL PROTECTED] wrote: This patch allows NBD I/O to be cancelled when a network outage occurs. Previously, I/O would just hang, and if enough I/O was hung in nbd, the system (at least user-level) would completely hang until a TCP timeout (default, 15 minutes) occurred. The patch introduces a new ioctl NBD_SET_TIMEOUT that allows a transmit timeout value (in seconds) to be specified. Any network send that exceeds the timeout will be cancelled and the nbd connection will be shut down. I've tested with various timeout values and 6 seconds seems to be a good choice for the timeout. If the NBD_SET_TIMEOUT ioctl is not called, you get the old (I/O hang) behavior. Hi Paul, Thanks for implementing this! Do you happen to have an associated nbd-client patch for userspace? If not I'd be happy to coordinate with you and Wouter on a patch. No, I don't. I just basically hardcoded my nbd-client to do a 6 second timeout by default, but Wouter will probably want to do something a little less hackish for the official nbd-client. -- Paul - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
Iustin Pop wrote: On Sun, Aug 12, 2007 at 07:03:44PM +0200, Jan Engelhardt wrote: On Aug 12 2007 09:39, [EMAIL PROTECTED] wrote: now, I am not an expert on either option, but three are a couple things that I would question about the DRDB+MD option 1. when the remote machine is down, how does MD deal with it for reads and writes? I suppose it kicks the drive and you'd have to re-add it by hand unless done by a cronjob. Yes, and with a bitmap configured on the raid1, you just resync the blocks that have been written while the connection was down. From my tests, since NBD doesn't have a timeout option, MD hangs in the write to that mirror indefinitely, somewhat like when dealing with a broken IDE driver/chipset/disk. Well, if people would like to see a timeout option, I actually coded up a patch a couple of years ago to do just that, but I never got it into mainline because you can do almost as well by doing a check at user-level (I basically ping the nbd connection periodically and if it fails, I kill -9 the nbd-client). 2. MD over local drive will alternate reads between mirrors (or so I've been told), doing so over the network is wrong. Certainly. In which case you set "write_mostly" (or even write_only, not sure of its name) on the raid component that is nbd. 3. when writing, will MD wait for the network I/O to get the data saved on the backup before returning from the syscall? or can it sync the data out lazily Can't answer this one - ask Neil :) MD has the write-mostly/write-behind options - which help in this case but only up to a certain amount. You can configure write_behind (aka, asynchronous writes) to buffer as much data as you have RAM to hold. At a certain point, presumably, you'd want to just break the mirror and take the hit of doing a resync once your network leg falls too far behind. -- Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
Iustin Pop wrote: On Sun, Aug 12, 2007 at 07:03:44PM +0200, Jan Engelhardt wrote: On Aug 12 2007 09:39, [EMAIL PROTECTED] wrote: now, I am not an expert on either option, but three are a couple things that I would question about the DRDB+MD option 1. when the remote machine is down, how does MD deal with it for reads and writes? I suppose it kicks the drive and you'd have to re-add it by hand unless done by a cronjob. Yes, and with a bitmap configured on the raid1, you just resync the blocks that have been written while the connection was down. From my tests, since NBD doesn't have a timeout option, MD hangs in the write to that mirror indefinitely, somewhat like when dealing with a broken IDE driver/chipset/disk. Well, if people would like to see a timeout option, I actually coded up a patch a couple of years ago to do just that, but I never got it into mainline because you can do almost as well by doing a check at user-level (I basically ping the nbd connection periodically and if it fails, I kill -9 the nbd-client). 2. MD over local drive will alternate reads between mirrors (or so I've been told), doing so over the network is wrong. Certainly. In which case you set write_mostly (or even write_only, not sure of its name) on the raid component that is nbd. 3. when writing, will MD wait for the network I/O to get the data saved on the backup before returning from the syscall? or can it sync the data out lazily Can't answer this one - ask Neil :) MD has the write-mostly/write-behind options - which help in this case but only up to a certain amount. You can configure write_behind (aka, asynchronous writes) to buffer as much data as you have RAM to hold. At a certain point, presumably, you'd want to just break the mirror and take the hit of doing a resync once your network leg falls too far behind. -- Paul - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] nbd: change a parameter's type to remove a memcpy call
Denis Cheng wrote: this memcpy looks so strange, in fact it's merely a pointer dereference, so I change the parameter's type to refer it more directly, this could make the memcpy not needed anymore. in the function nbd_read_stat where nbd_find_request is only once called, the parameter served should be transformed accordingly. This is really a matter of preference. The generated code ends up being about the same, I think, while your patch makes the call to nbd_find_request kind of obtuse. Also, the memcpy's are balanced between send_req and find_request, so you can quickly see how the data is being transferred (from req into handle, and then back again). Your patch makes this less clear, at least to me. -- Paul Signed-off-by: Denis Cheng <[EMAIL PROTECTED]> --- drivers/block/nbd.c |7 ++- 1 files changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 86639c0..a4d8508 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -235,14 +235,11 @@ error_out: return 1; } -static struct request *nbd_find_request(struct nbd_device *lo, char *handle) +static struct request *nbd_find_request(struct nbd_device *lo, struct request *xreq) { struct request *req, *n; - struct request *xreq; int err; - memcpy(, handle, sizeof(xreq)); - err = wait_event_interruptible(lo->active_wq, lo->active_req != xreq); if (unlikely(err)) goto out; @@ -297,7 +294,7 @@ static struct request *nbd_read_stat(struct nbd_device *lo) goto harderror; } - req = nbd_find_request(lo, reply.handle); + req = nbd_find_request(lo, *(struct request **)reply.handle); if (unlikely(IS_ERR(req))) { result = PTR_ERR(req); if (result != -ENOENT) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] nbd: use list_for_each_entry_safe to make it more consolidated and readable
Denis Cheng wrote: Thus the traverse of the loop may delete nodes, use the safe version. Signed-off-by: Denis Cheng <[EMAIL PROTECTED]> --- drivers/block/nbd.c |6 ++ 1 files changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index c129510..86639c0 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -237,8 +237,7 @@ error_out: static struct request *nbd_find_request(struct nbd_device *lo, char *handle) { - struct request *req; - struct list_head *tmp; + struct request *req, *n; struct request *xreq; int err; @@ -249,8 +248,7 @@ static struct request *nbd_find_request(struct nbd_device *lo, char *handle) goto out; spin_lock(>queue_lock); - list_for_each(tmp, >queue_head) { - req = list_entry(tmp, struct request, queuelist); + list_for_each_entry_safe(req, n, >queue_head, queuelist) { if (req != xreq) continue; list_del_init(>queuelist); Could you name "n" as "tmp" (as in the previous code) so that it's clear that's only a temporary variable. Other than that, this looks good. Thanks, Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] nbd: use list_for_each_entry_safe to make it more consolidated and readable
Denis Cheng wrote: Thus the traverse of the loop may delete nodes, use the safe version. Signed-off-by: Denis Cheng [EMAIL PROTECTED] --- drivers/block/nbd.c |6 ++ 1 files changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index c129510..86639c0 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -237,8 +237,7 @@ error_out: static struct request *nbd_find_request(struct nbd_device *lo, char *handle) { - struct request *req; - struct list_head *tmp; + struct request *req, *n; struct request *xreq; int err; @@ -249,8 +248,7 @@ static struct request *nbd_find_request(struct nbd_device *lo, char *handle) goto out; spin_lock(lo-queue_lock); - list_for_each(tmp, lo-queue_head) { - req = list_entry(tmp, struct request, queuelist); + list_for_each_entry_safe(req, n, lo-queue_head, queuelist) { if (req != xreq) continue; list_del_init(req-queuelist); Could you name n as tmp (as in the previous code) so that it's clear that's only a temporary variable. Other than that, this looks good. Thanks, Paul - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] nbd: change a parameter's type to remove a memcpy call
Denis Cheng wrote: this memcpy looks so strange, in fact it's merely a pointer dereference, so I change the parameter's type to refer it more directly, this could make the memcpy not needed anymore. in the function nbd_read_stat where nbd_find_request is only once called, the parameter served should be transformed accordingly. This is really a matter of preference. The generated code ends up being about the same, I think, while your patch makes the call to nbd_find_request kind of obtuse. Also, the memcpy's are balanced between send_req and find_request, so you can quickly see how the data is being transferred (from req into handle, and then back again). Your patch makes this less clear, at least to me. -- Paul Signed-off-by: Denis Cheng [EMAIL PROTECTED] --- drivers/block/nbd.c |7 ++- 1 files changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 86639c0..a4d8508 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -235,14 +235,11 @@ error_out: return 1; } -static struct request *nbd_find_request(struct nbd_device *lo, char *handle) +static struct request *nbd_find_request(struct nbd_device *lo, struct request *xreq) { struct request *req, *n; - struct request *xreq; int err; - memcpy(xreq, handle, sizeof(xreq)); - err = wait_event_interruptible(lo-active_wq, lo-active_req != xreq); if (unlikely(err)) goto out; @@ -297,7 +294,7 @@ static struct request *nbd_read_stat(struct nbd_device *lo) goto harderror; } - req = nbd_find_request(lo, reply.handle); + req = nbd_find_request(lo, *(struct request **)reply.handle); if (unlikely(IS_ERR(req))) { result = PTR_ERR(req); if (result != -ENOENT) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: raid1 with nbd member hangs MD on SLES10 and RHEL5
Mike Snitzer wrote: On 6/14/07, Paul Clements <[EMAIL PROTECTED]> wrote: Mike Snitzer wrote: > Here are the steps to reproduce reliably on SLES10 SP1: > 1) establish a raid1 mirror (md0) using one local member (sdc1) and > one remote member (nbd0) > 2) power off the remote machine, whereby severing nbd0's connection > 3) perform IO to the filesystem that is on the md0 device to enduce > the MD layer to mark the nbd device as "faulty" > 4) cat /proc/mdstat hangs, sysrq trace was collected That's working as designed. NBD works over TCP. You're going to have to wait for TCP to time out before an error occurs. Until then I/O will hang. With kernel.org 2.6.15.7 (uni-processor) I've not seen NBD hang in the kernel like I am with RHEL5 and SLES10. This hang (tcp timeout) is indefinite oh RHEL5 and ~5min on SLES10. Should/can I be playing with TCP timeout values? Why was this not a concern with kernel.org 2.6.15.7; I was able to "feel" the nbd connection break immediately; no MD superblock update hangs, no longwinded (or indefinite) TCP timeout. I don't know. I've never seen nbd immediately start returning I/O errors. Perhaps something was different about the configuration? If the other other machine rebooted quickly, for instance, you'd get a connection reset, which would kill the nbd connection. -- Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: raid1 with nbd member hangs MD on SLES10 and RHEL5
Mike Snitzer wrote: Here are the steps to reproduce reliably on SLES10 SP1: 1) establish a raid1 mirror (md0) using one local member (sdc1) and one remote member (nbd0) 2) power off the remote machine, whereby severing nbd0's connection 3) perform IO to the filesystem that is on the md0 device to enduce the MD layer to mark the nbd device as "faulty" 4) cat /proc/mdstat hangs, sysrq trace was collected That's working as designed. NBD works over TCP. You're going to have to wait for TCP to time out before an error occurs. Until then I/O will hang. -- Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: raid1 with nbd member hangs MD on SLES10 and RHEL5
Mike Snitzer wrote: Just a quick update; it is really starting to look like there is definitely an issue with the nbd kernel driver. I booted the SLES10 2.6.16.46-0.12-smp kernel with maxcpus=1 to test the theory that the nbd SMP fix that went into 2.6.16 was in some way causing this MD/NBD hang. But it _still_ occurs with the 4-step process I outlined above. The nbd0 device _should_ feel an NBD_DISCONNECT because the nbd-server is no longer running (the node it was running on was powered off)... What do you mean, nbd should _feel_ an NBD_DISCONNECT ? NBD_DISCONNECT is a manual process, not an automatic one. -- Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: raid1 with nbd member hangs MD on SLES10 and RHEL5
Bill Davidsen wrote: Second, AFAIK nbd hasn't working in a while. I haven't tried it in ages, but was told it wouldn't work with smp and I kind of lost interest. If Neil thinks it should work in 2.6.21 or later I'll test it, since I have a machine which wants a fresh install soon, and is both backed up and available. Please stop this. nbd is working perfectly fine, AFAIK. I use it every day, and so do 100s of our customers. What exactly is it that not's working? If there's a problem, please send the bug report. Thank You, Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] nbd.c:sock_xmit: cleanup signal related code
Oleg Nesterov wrote: sock_xmit() re-implements sigprocmask() and dequeue_signal_lock(). Yeah, that code was written before those existed. Thanks for the clean up. Note: I can't understand this dequeue_signal(), it can dequeue SIGKILL for the user-space task doing nbd_ioctl() ? So we can interrupt an nbd transmission without waiting for a TCP timeout (when we know the network is down). Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]> Patch looks good to me. -- Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] nbd.c:sock_xmit: cleanup signal related code
Oleg Nesterov wrote: sock_xmit() re-implements sigprocmask() and dequeue_signal_lock(). Yeah, that code was written before those existed. Thanks for the clean up. Note: I can't understand this dequeue_signal(), it can dequeue SIGKILL for the user-space task doing nbd_ioctl() ? So we can interrupt an nbd transmission without waiting for a TCP timeout (when we know the network is down). Signed-off-by: Oleg Nesterov [EMAIL PROTECTED] Patch looks good to me. -- Paul - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: raid1 with nbd member hangs MD on SLES10 and RHEL5
Bill Davidsen wrote: Second, AFAIK nbd hasn't working in a while. I haven't tried it in ages, but was told it wouldn't work with smp and I kind of lost interest. If Neil thinks it should work in 2.6.21 or later I'll test it, since I have a machine which wants a fresh install soon, and is both backed up and available. Please stop this. nbd is working perfectly fine, AFAIK. I use it every day, and so do 100s of our customers. What exactly is it that not's working? If there's a problem, please send the bug report. Thank You, Paul - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: raid1 with nbd member hangs MD on SLES10 and RHEL5
Mike Snitzer wrote: Just a quick update; it is really starting to look like there is definitely an issue with the nbd kernel driver. I booted the SLES10 2.6.16.46-0.12-smp kernel with maxcpus=1 to test the theory that the nbd SMP fix that went into 2.6.16 was in some way causing this MD/NBD hang. But it _still_ occurs with the 4-step process I outlined above. The nbd0 device _should_ feel an NBD_DISCONNECT because the nbd-server is no longer running (the node it was running on was powered off)... What do you mean, nbd should _feel_ an NBD_DISCONNECT ? NBD_DISCONNECT is a manual process, not an automatic one. -- Paul - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: raid1 with nbd member hangs MD on SLES10 and RHEL5
Mike Snitzer wrote: Here are the steps to reproduce reliably on SLES10 SP1: 1) establish a raid1 mirror (md0) using one local member (sdc1) and one remote member (nbd0) 2) power off the remote machine, whereby severing nbd0's connection 3) perform IO to the filesystem that is on the md0 device to enduce the MD layer to mark the nbd device as faulty 4) cat /proc/mdstat hangs, sysrq trace was collected That's working as designed. NBD works over TCP. You're going to have to wait for TCP to time out before an error occurs. Until then I/O will hang. -- Paul - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: raid1 with nbd member hangs MD on SLES10 and RHEL5
Mike Snitzer wrote: On 6/14/07, Paul Clements [EMAIL PROTECTED] wrote: Mike Snitzer wrote: Here are the steps to reproduce reliably on SLES10 SP1: 1) establish a raid1 mirror (md0) using one local member (sdc1) and one remote member (nbd0) 2) power off the remote machine, whereby severing nbd0's connection 3) perform IO to the filesystem that is on the md0 device to enduce the MD layer to mark the nbd device as faulty 4) cat /proc/mdstat hangs, sysrq trace was collected That's working as designed. NBD works over TCP. You're going to have to wait for TCP to time out before an error occurs. Until then I/O will hang. With kernel.org 2.6.15.7 (uni-processor) I've not seen NBD hang in the kernel like I am with RHEL5 and SLES10. This hang (tcp timeout) is indefinite oh RHEL5 and ~5min on SLES10. Should/can I be playing with TCP timeout values? Why was this not a concern with kernel.org 2.6.15.7; I was able to feel the nbd connection break immediately; no MD superblock update hangs, no longwinded (or indefinite) TCP timeout. I don't know. I've never seen nbd immediately start returning I/O errors. Perhaps something was different about the configuration? If the other other machine rebooted quickly, for instance, you'd get a connection reset, which would kill the nbd connection. -- Paul - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] nbd: show nbd client pid in sysfs
This simple patch allows nbd to expose the nbd-client daemon's PID in /sys/block/nbd/pid. This is helpful for tracking connection status of a device and for determining which nbd devices are currently in use. Tested against 2.6.19. Thanks, Paul --- ./drivers/block/nbd.c Wed Nov 29 16:57:37 2006 +++ ./drivers/block/nbd.c Tue Nov 28 16:09:31 2006 @@ -355,14 +389,30 @@ harderror: return NULL; } +static ssize_t pid_show(struct gendisk *disk, char *page) +{ + return sprintf(page, "%ld\n", + (long) ((struct nbd_device *)disk->private_data)->pid); +} + +static struct disk_attribute pid_attr = { + .attr = { .name = "pid", .mode = S_IRUGO }, + .show = pid_show, +}; + static void nbd_do_it(struct nbd_device *lo) { struct request *req; BUG_ON(lo->magic != LO_MAGIC); + lo->pid = current->pid; + sysfs_create_file(>disk->kobj, _attr.attr); + while ((req = nbd_read_stat(lo)) != NULL) nbd_end_request(req); + + sysfs_remove_file(>disk->kobj, _attr.attr); return; } --- ./include/linux/nbd.h Wed Nov 29 16:57:37 2006 +++ ./include/linux/nbd.h Mon Dec 4 23:28:30 2006 @@ -64,6 +64,7 @@ struct nbd_device { struct gendisk *disk; int blksize; u64 bytesize; + pid_t pid; /* pid of nbd-client, if attached */ }; #endif
[PATCH] nbd: show nbd client pid in sysfs
This simple patch allows nbd to expose the nbd-client daemon's PID in /sys/block/nbdx/pid. This is helpful for tracking connection status of a device and for determining which nbd devices are currently in use. Tested against 2.6.19. Thanks, Paul --- ./drivers/block/nbd.c Wed Nov 29 16:57:37 2006 +++ ./drivers/block/nbd.c Tue Nov 28 16:09:31 2006 @@ -355,14 +389,30 @@ harderror: return NULL; } +static ssize_t pid_show(struct gendisk *disk, char *page) +{ + return sprintf(page, %ld\n, + (long) ((struct nbd_device *)disk-private_data)-pid); +} + +static struct disk_attribute pid_attr = { + .attr = { .name = pid, .mode = S_IRUGO }, + .show = pid_show, +}; + static void nbd_do_it(struct nbd_device *lo) { struct request *req; BUG_ON(lo-magic != LO_MAGIC); + lo-pid = current-pid; + sysfs_create_file(lo-disk-kobj, pid_attr.attr); + while ((req = nbd_read_stat(lo)) != NULL) nbd_end_request(req); + + sysfs_remove_file(lo-disk-kobj, pid_attr.attr); return; } --- ./include/linux/nbd.h Wed Nov 29 16:57:37 2006 +++ ./include/linux/nbd.h Mon Dec 4 23:28:30 2006 @@ -64,6 +64,7 @@ struct nbd_device { struct gendisk *disk; int blksize; u64 bytesize; + pid_t pid; /* pid of nbd-client, if attached */ }; #endif
[PATCH 2.4.29] nbd: fix ioctl permissions
Hi Marcelo, Here's a patch for nbd that Rogier recently sent me. It allows non-root to do BLKGETSIZE, et al. on nbd devices, which he needs for his data recovery applications. Tested against 2.4.29. Please apply. Thanks, Paul From: Rogier Wolff <[EMAIL PROTECTED]> Signed-Off-By: Paul Clements <[EMAIL PROTECTED]> Description: We shouldn't need CAP_SYS_ADMIN to ask for disk capacity and such. === diff -ur linux-2.4.28.clean/drivers/block/nbd.c linux-2.4.28.nbd-fix/drivers/block/nbd.c --- linux-2.4.28.clean/drivers/block/nbd.c Wed Jan 19 18:14:01 2005 +++ linux-2.4.28.nbd-fix/drivers/block/nbd.cWed Jan 19 16:36:59 2005 @@ -408,10 +408,7 @@ int dev, error, temp; struct request sreq ; - /* Anyone capable of this syscall can do *real bad* things */ - if (!capable(CAP_SYS_ADMIN)) - return -EPERM; if (!inode) return -EINVAL; dev = MINOR(inode->i_rdev); @@ -419,6 +416,20 @@ return -ENODEV; lo = _dev[dev]; + + /* these are innocent, but */ + switch (cmd) { + case BLKGETSIZE: + return put_user(nbd_bytesizes[dev] >> 9, (unsigned long *) arg); + case BLKGETSIZE64: + return put_user((u64)nbd_bytesizes[dev], (u64 *) arg); + } + + /* ... anyone capable of any of the below ioctls can do *real bad* + things */ + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + switch (cmd) { case NBD_DISCONNECT: printk("NBD_DISCONNECT\n"); @@ -524,10 +535,6 @@ dev, lo->queue_head.next, lo->queue_head.prev, requests_in, requests_out); return 0; #endif - case BLKGETSIZE: - return put_user(nbd_bytesizes[dev] >> 9, (unsigned long *) arg); - case BLKGETSIZE64: - return put_user((u64)nbd_bytesizes[dev], (u64 *) arg); } return -EINVAL; }
[PATCH 2.4.29] nbd: fix ioctl permissions
Hi Marcelo, Here's a patch for nbd that Rogier recently sent me. It allows non-root to do BLKGETSIZE, et al. on nbd devices, which he needs for his data recovery applications. Tested against 2.4.29. Please apply. Thanks, Paul From: Rogier Wolff [EMAIL PROTECTED] Signed-Off-By: Paul Clements [EMAIL PROTECTED] Description: We shouldn't need CAP_SYS_ADMIN to ask for disk capacity and such. === diff -ur linux-2.4.28.clean/drivers/block/nbd.c linux-2.4.28.nbd-fix/drivers/block/nbd.c --- linux-2.4.28.clean/drivers/block/nbd.c Wed Jan 19 18:14:01 2005 +++ linux-2.4.28.nbd-fix/drivers/block/nbd.cWed Jan 19 16:36:59 2005 @@ -408,10 +408,7 @@ int dev, error, temp; struct request sreq ; - /* Anyone capable of this syscall can do *real bad* things */ - if (!capable(CAP_SYS_ADMIN)) - return -EPERM; if (!inode) return -EINVAL; dev = MINOR(inode-i_rdev); @@ -419,6 +416,20 @@ return -ENODEV; lo = nbd_dev[dev]; + + /* these are innocent, but */ + switch (cmd) { + case BLKGETSIZE: + return put_user(nbd_bytesizes[dev] 9, (unsigned long *) arg); + case BLKGETSIZE64: + return put_user((u64)nbd_bytesizes[dev], (u64 *) arg); + } + + /* ... anyone capable of any of the below ioctls can do *real bad* + things */ + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + switch (cmd) { case NBD_DISCONNECT: printk(NBD_DISCONNECT\n); @@ -524,10 +535,6 @@ dev, lo-queue_head.next, lo-queue_head.prev, requests_in, requests_out); return 0; #endif - case BLKGETSIZE: - return put_user(nbd_bytesizes[dev] 9, (unsigned long *) arg); - case BLKGETSIZE64: - return put_user((u64)nbd_bytesizes[dev], (u64 *) arg); } return -EINVAL; }
kernel panic on 2.2.14 in sg driver
I am seeing a kernel panic on 2.2.14. It looks like 2.2.16 also has the same problem. Details: I have been able to reproduce a kernel panic several times with kdb compiled in and some added printk debug messages and I have now pinpointed the problem. The panic occurs when the following call is made in scsi_ioctl_send_command() ("scsi_ioctl.c", line 329): if(SDpnt->scsi_request_fn) (*SDpnt->scsi_request_fn)(); I have verified that scsi_request_fn is a function pointer that points to the do_sd_request() function ("sd.c", line 530). By adding a debug printk() right before this call I can see that this function pointer contains: 0x8489ab80. This is the address where the panic occurs, labelled "?unknown?" in the stack trace below. After a panic, when I list the instructions at that address, the code that is there is the middle of a switch statement in the sg_ioctl() function. The function pointer is pointing to a bogus address. The problem appears to be due to the following: do_sd_request() is in the sd_mod.o module sg_ioctl() is in sg.o scsi_ioctl_send_command() is in scsi_mod.o Now sg.o and sd_mod.o have a dependency on scsi_mod.o, but sg.o and sd_mod.o are independent of each other. I can cause the kernel to panic by simply unloading sd_mod.o and then performing an ioctl (SCSI_IOCTL_SEND_COMMAND) on an open sg device. Since sg does not depend on sd_mod, the kernel loads sg.o when the sg device is opened, but does not know to load sd_mod.o. The call to "scsi_request_fn" then of course causes a panic. This problem is compounded by the fact that most modern Linux distributions (I'm running Caldera eServer 2.3 on this box), have an /etc/crontab entry that does: "/sbin/rmmod -a" every five minutes, so sd_mod gets autocleaned frequently. So I guess I have a couple questions: Does anyone know if the SCSI drivers have been redesigned to avoid this type of problem in the 2.4 kernel? Does anyone have a solution or workaround to this problem? As a workaround, I guess I could avoid the autoclean problem by doing "insmod sd_mod" in some startup script. Are there other instances of this type of problem elsewhere in the kernel? If you need/want any more information about this, just let me know... Thanks, Paul kernel stack trace: -- ?unknown? (?pointer?, arg, filp, -25, 1) sg_ioctl (filp->f_dentry->d_inode, filp, SCSI_IOCTL_SEND_COMMAND, arg) sys_ioctl (fd, SCSI_IOCTL_SEND_COMMAND, arg) system_call where `fd' is an open file descriptor for the scsi device and `filp' is a "struct file *" corresponding to the open file descriptor for the scsi device and `arg' is the "Scsi_Ioctl_Command *" (struct scsi_ioctl_command *) sent in from the original ioctl call, in this case containing the following: inlen = 0 outlen = 0 command = { 0 (TEST UNIT READY), 6, 0, 70, 98, 0 } data = {empty} - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
kernel panic on 2.2.14 in sg driver
I am seeing a kernel panic on 2.2.14. It looks like 2.2.16 also has the same problem. Details: I have been able to reproduce a kernel panic several times with kdb compiled in and some added printk debug messages and I have now pinpointed the problem. The panic occurs when the following call is made in scsi_ioctl_send_command() ("scsi_ioctl.c", line 329): if(SDpnt-scsi_request_fn) (*SDpnt-scsi_request_fn)(); I have verified that scsi_request_fn is a function pointer that points to the do_sd_request() function ("sd.c", line 530). By adding a debug printk() right before this call I can see that this function pointer contains: 0x8489ab80. This is the address where the panic occurs, labelled "?unknown?" in the stack trace below. After a panic, when I list the instructions at that address, the code that is there is the middle of a switch statement in the sg_ioctl() function. The function pointer is pointing to a bogus address. The problem appears to be due to the following: do_sd_request() is in the sd_mod.o module sg_ioctl() is in sg.o scsi_ioctl_send_command() is in scsi_mod.o Now sg.o and sd_mod.o have a dependency on scsi_mod.o, but sg.o and sd_mod.o are independent of each other. I can cause the kernel to panic by simply unloading sd_mod.o and then performing an ioctl (SCSI_IOCTL_SEND_COMMAND) on an open sg device. Since sg does not depend on sd_mod, the kernel loads sg.o when the sg device is opened, but does not know to load sd_mod.o. The call to "scsi_request_fn" then of course causes a panic. This problem is compounded by the fact that most modern Linux distributions (I'm running Caldera eServer 2.3 on this box), have an /etc/crontab entry that does: "/sbin/rmmod -a" every five minutes, so sd_mod gets autocleaned frequently. So I guess I have a couple questions: Does anyone know if the SCSI drivers have been redesigned to avoid this type of problem in the 2.4 kernel? Does anyone have a solution or workaround to this problem? As a workaround, I guess I could avoid the autoclean problem by doing "insmod sd_mod" in some startup script. Are there other instances of this type of problem elsewhere in the kernel? If you need/want any more information about this, just let me know... Thanks, Paul kernel stack trace: -- ?unknown? (?pointer?, arg, filp, -25, 1) sg_ioctl (filp-f_dentry-d_inode, filp, SCSI_IOCTL_SEND_COMMAND, arg) sys_ioctl (fd, SCSI_IOCTL_SEND_COMMAND, arg) system_call where `fd' is an open file descriptor for the scsi device and `filp' is a "struct file *" corresponding to the open file descriptor for the scsi device and `arg' is the "Scsi_Ioctl_Command *" (struct scsi_ioctl_command *) sent in from the original ioctl call, in this case containing the following: inlen = 0 outlen = 0 command = { 0 (TEST UNIT READY), 6, 0, 70, 98, 0 } data = {empty} - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/