Re: linux-next: Signed-off-by missing for commits in the net-next tree
On 8/16/19 3:06 PM, Gerd Rausch wrote: > Hi, > > Just added the e-mail addresses I found using a simple "google search", > in order to reach out to the original authors of these commits: > Chris Mason and Andy Grover. > > I'm hoping they still remember their work from 7-8 years ago. Yes looks like what I was working on. What did you need from me? It's too late to amend the commitlogs... -- Andy > > Thanks, > > Gerd > > On 15/08/2019 14.53, Stephen Rothwell wrote: >> Hi all, >> >> Commits >> >> 11740ef44829 ("rds: check for excessive looping in rds_send_xmit") >> 65dedd7fe1f2 ("RDS: limit the number of times we loop in rds_send_xmit") >> >> are missing a Signed-off-by from their authors. >>
how to unmap pages in an anonymous mmap?
On 02/26/2017 09:59 PM, Xiubo Li wrote: >> But, We likely don't want to release memory from the data area anyways >> while active, in any case. How about if we set a timer when active >> commands go to zero, and then reduce data area to some minimum if no new >> cmds come in before timer expires? > > If I understand correctly: for example, we have 1G(as the minimum) > data area and all blocks have been allocated and mapped to runner's > vma, then we extern it to 1G + 256M as needed. When there have no > active cmds and after the timer expires, will it reduce the data area > back to 1G ? And then should it release the reduced 256M data area's > memories ? > > If so, after kfree()ed the blocks' memories, it should also try to remove > all the ptes which are mapping this page(like using the try_to_umap()), > but something like try_to_umap() doesn't export for the modules. > > Without ummaping the kfree()ed pages' ptes mentioned above, then > the reduced 256M vma space couldn't be reused again for the runner > process, because the runner has already do the mapping for the reduced > vma space to some old physical pages(won't trigger new page fault > again). Then there will be a hole, and the hole will be bigger and bigger. > > Without ummaping the kfree()ed pages' ptes mentioned above, the > pages' reference count (page_ref_dec(), which _inc()ed in page fault) > couldn't be reduced back too. Let's ask people who will know... Hi linux-mm, TCM-User (drivers/target/target_core_user.c) currently uses vmalloc()ed memory to back a ring buffer that is mmaped by userspace. We want to move to dynamically mapping pages into this region, and also we'd like to unmap/free pages when idle. What's the right way to unmap? I see unmap_mapping_range() but that mentions an underlying file, which TCMU doesn't have. Or maybe zap_page_range()? But it's not exported. Any advice? Thanks in advance -- Regards -- Andy
how to unmap pages in an anonymous mmap?
On 02/26/2017 09:59 PM, Xiubo Li wrote: >> But, We likely don't want to release memory from the data area anyways >> while active, in any case. How about if we set a timer when active >> commands go to zero, and then reduce data area to some minimum if no new >> cmds come in before timer expires? > > If I understand correctly: for example, we have 1G(as the minimum) > data area and all blocks have been allocated and mapped to runner's > vma, then we extern it to 1G + 256M as needed. When there have no > active cmds and after the timer expires, will it reduce the data area > back to 1G ? And then should it release the reduced 256M data area's > memories ? > > If so, after kfree()ed the blocks' memories, it should also try to remove > all the ptes which are mapping this page(like using the try_to_umap()), > but something like try_to_umap() doesn't export for the modules. > > Without ummaping the kfree()ed pages' ptes mentioned above, then > the reduced 256M vma space couldn't be reused again for the runner > process, because the runner has already do the mapping for the reduced > vma space to some old physical pages(won't trigger new page fault > again). Then there will be a hole, and the hole will be bigger and bigger. > > Without ummaping the kfree()ed pages' ptes mentioned above, the > pages' reference count (page_ref_dec(), which _inc()ed in page fault) > couldn't be reduced back too. Let's ask people who will know... Hi linux-mm, TCM-User (drivers/target/target_core_user.c) currently uses vmalloc()ed memory to back a ring buffer that is mmaped by userspace. We want to move to dynamically mapping pages into this region, and also we'd like to unmap/free pages when idle. What's the right way to unmap? I see unmap_mapping_range() but that mentions an underlying file, which TCMU doesn't have. Or maybe zap_page_range()? But it's not exported. Any advice? Thanks in advance -- Regards -- Andy
Re: [PATCH] target/user: Add daynmic growing data area featuresupport
On 02/23/2017 06:07 PM, Xiubo Li wrote: >> Cool. This is a good approach for an initial patch but this raises >> concerns about efficiently managing kernel memory usage -- the data area >> grows but never shrinks, and total possible usage increases per >> backstore. (What if there are 1000?) Any ideas how we could also improve >> these aspects of the design? (Global TCMU data area usage limit?) > Two ways in my mind: > > The first: > How about by setting a threshold cmd(SHRINK cmd), something likes > the PAD cmd, to tell the userspace runner try to shrink the memories? Why should userspace need to know if the kernel is shrinking memory allocated to the data area? Userspace knows about memory described in iovecs for in-flight cmds, we shouldn't need its cooperation to free other allocated parts of the data area. But, We likely don't want to release memory from the data area anyways while active, in any case. How about if we set a timer when active commands go to zero, and then reduce data area to some minimum if no new cmds come in before timer expires? > When the runner get the SHRINK cmd, it will try to remmap uio0's ring > buffer(?). Then the kernel will get chance to shrink the memories > > The second: > Try to extern the data area by using /dev/uio1, we could remmap the > uio1 device when need, so it will be easy to get a chance to shrink the > memories in uio1. Userspace should not need to remap the region in order for the kernel to free and unmap pages from the region. The only thing we need to watch out for is if blocks are referenced by in-flight cmds, we can't free them or userspace will segfault. So, if we wait until there are no in-flight cmds, then it follows that the kernel can free whatever it wants and userspace will not segfault. -- Andy
Re: [PATCH] target/user: Add daynmic growing data area featuresupport
On 02/23/2017 06:07 PM, Xiubo Li wrote: >> Cool. This is a good approach for an initial patch but this raises >> concerns about efficiently managing kernel memory usage -- the data area >> grows but never shrinks, and total possible usage increases per >> backstore. (What if there are 1000?) Any ideas how we could also improve >> these aspects of the design? (Global TCMU data area usage limit?) > Two ways in my mind: > > The first: > How about by setting a threshold cmd(SHRINK cmd), something likes > the PAD cmd, to tell the userspace runner try to shrink the memories? Why should userspace need to know if the kernel is shrinking memory allocated to the data area? Userspace knows about memory described in iovecs for in-flight cmds, we shouldn't need its cooperation to free other allocated parts of the data area. But, We likely don't want to release memory from the data area anyways while active, in any case. How about if we set a timer when active commands go to zero, and then reduce data area to some minimum if no new cmds come in before timer expires? > When the runner get the SHRINK cmd, it will try to remmap uio0's ring > buffer(?). Then the kernel will get chance to shrink the memories > > The second: > Try to extern the data area by using /dev/uio1, we could remmap the > uio1 device when need, so it will be easy to get a chance to shrink the > memories in uio1. Userspace should not need to remap the region in order for the kernel to free and unmap pages from the region. The only thing we need to watch out for is if blocks are referenced by in-flight cmds, we can't free them or userspace will segfault. So, if we wait until there are no in-flight cmds, then it follows that the kernel can free whatever it wants and userspace will not segfault. -- Andy
Re: [PATCH] target/user: Add daynmic growing data area feature support
On 02/17/2017 01:24 AM, lixi...@cmss.chinamobile.com wrote: > From: Xiubo Li> > Currently for the TCMU, the ring buffer size is fixed to 64K cmd > area + 1M data area, and this will be bottlenecks for high iops. Hi Xiubo, thanks for your work. daynmic -> dynamic Have you benchmarked this patch and determined what kind of iops improvement it allows? Do you see the data area reaching its fully-allocated size? > > The struct tcmu_cmd_entry {} size is fixed about 44 bytes without > iovec[N], and the size of struct iovec[N] is about (16 * N) bytes. > > The sizeof(cmd entry) will be [44B, N *16 + 44B], and corresponding > iovec[N] size will be [0, N * 4096], so the ratio of > sizeof(cmd entry) : sizeof(entry datas) == > (N * 16 + 44)Bytes : (N * 4096)Bytes == (N * 16)/(N * 4096) + > 44/(N*4096) == 4/1024 + 11/(N * 1024). > > When N is bigger, the ratio will be smaller. If N >= 1, the ratio > will be [15/1024, 4/1024), for this the ratio 15 : 1024 will be > enough. But maybe some iscsi cmds has no datas, N == 0. So the ratio > should be bigger. > > For now we will increase the data area size to 1G, and the cmd area > size to 128M. The tcmu-runner should mmap() about (128M + 1G) when > running and the TCMU will dynamically grows the data area from 0 to > max 1G size. Cool. This is a good approach for an initial patch but this raises concerns about efficiently managing kernel memory usage -- the data area grows but never shrinks, and total possible usage increases per backstore. (What if there are 1000?) Any ideas how we could also improve these aspects of the design? (Global TCMU data area usage limit?) > > The cmd area memory will be allocated through vmalloc(), and the data > area's blocks will be allocated individually later when needed. > > The allocated data area block memory will be managed via radix tree. > For now the bitmap still be the most efficient way to search and > manage the block index, this could be update later. > > Signed-off-by: Xiubo Li > Signed-off-by: Jianfei Hu > --- > drivers/target/target_core_user.c | 241 > ++ > 1 file changed, 167 insertions(+), 74 deletions(-) > > diff --git a/drivers/target/target_core_user.c > b/drivers/target/target_core_user.c > index 2e33100..f2e3769 100644 > --- a/drivers/target/target_core_user.c > +++ b/drivers/target/target_core_user.c > @@ -2,6 +2,7 @@ > * Copyright (C) 2013 Shaohua Li > * Copyright (C) 2014 Red Hat, Inc. > * Copyright (C) 2015 Arrikto, Inc. > + * Copyright (C) 2017 Chinamobile, Inc. > * > * This program is free software; you can redistribute it and/or modify it > * under the terms and conditions of the GNU General Public License, > @@ -25,6 +26,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -65,12 +67,15 @@ > > #define TCMU_TIME_OUT (30 * MSEC_PER_SEC) > > -#define DATA_BLOCK_BITS 256 > -#define DATA_BLOCK_SIZE 4096 > +/* For cmd area, the size is fixed 64M */ > +#define CMDR_SIZE (64 * 1024 * 1024) Commitmsg mistakenly says 128M..? > -#define CMDR_SIZE (16 * 4096) > +/* For data area, the size is fixed 1G */ > +#define DATA_BLOCK_BITS (256 * 1024) > +#define DATA_BLOCK_SIZE 4096 > #define DATA_SIZE (DATA_BLOCK_BITS * DATA_BLOCK_SIZE) > > +/* The ring buffer size is 64M + 1G */ > #define TCMU_RING_SIZE (CMDR_SIZE + DATA_SIZE) > > static struct device *tcmu_root_device; > @@ -103,6 +108,7 @@ struct tcmu_dev { > size_t data_size; > > DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS); > + struct radix_tree_root data_blocks; > > wait_queue_head_t wait_cmdr; > /* TODO should this be a mutex? */ > @@ -120,15 +126,22 @@ struct tcmu_dev { > > #define CMDR_OFF sizeof(struct tcmu_mailbox) > > +struct tcmu_data_block_index { > + struct list_head list; > + uint32_t index; > +}; is this used anywhere? leftover from an earlier version? > + > struct tcmu_cmd { > struct se_cmd *se_cmd; > struct tcmu_dev *tcmu_dev; > > - uint16_t cmd_id; > + uint32_t cmd_id; why this? tcmu_cmd_entry_hdr has a u16 cmd_id, so making this bigger doesn't really help anything. (natural alignment? But, compiler should be inserting padding as needed.) > > /* Can't use se_cmd when cleaning up expired cmds, because if > cmd has been completed then accessing se_cmd is off limits */ > - DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS); > + uint32_t dbi_cnt; > + uint32_t dbi_cur; > + uint32_t *dbi; Ah, cool. Saves tcmu_cmd from being huge, now that data_bitmap is 32MiB. > @@ -919,7 +999,7 @@ static int tcmu_configure_device(struct se_device *dev) > > info->name = str; > > - udev->mb_addr = vzalloc(TCMU_RING_SIZE); > + udev->mb_addr = vzalloc(CMDR_SIZE); I'm still concerned about this -- 64M vmalloc'd for cmd ring seems like
Re: [PATCH] target/user: Add daynmic growing data area feature support
On 02/17/2017 01:24 AM, lixi...@cmss.chinamobile.com wrote: > From: Xiubo Li > > Currently for the TCMU, the ring buffer size is fixed to 64K cmd > area + 1M data area, and this will be bottlenecks for high iops. Hi Xiubo, thanks for your work. daynmic -> dynamic Have you benchmarked this patch and determined what kind of iops improvement it allows? Do you see the data area reaching its fully-allocated size? > > The struct tcmu_cmd_entry {} size is fixed about 44 bytes without > iovec[N], and the size of struct iovec[N] is about (16 * N) bytes. > > The sizeof(cmd entry) will be [44B, N *16 + 44B], and corresponding > iovec[N] size will be [0, N * 4096], so the ratio of > sizeof(cmd entry) : sizeof(entry datas) == > (N * 16 + 44)Bytes : (N * 4096)Bytes == (N * 16)/(N * 4096) + > 44/(N*4096) == 4/1024 + 11/(N * 1024). > > When N is bigger, the ratio will be smaller. If N >= 1, the ratio > will be [15/1024, 4/1024), for this the ratio 15 : 1024 will be > enough. But maybe some iscsi cmds has no datas, N == 0. So the ratio > should be bigger. > > For now we will increase the data area size to 1G, and the cmd area > size to 128M. The tcmu-runner should mmap() about (128M + 1G) when > running and the TCMU will dynamically grows the data area from 0 to > max 1G size. Cool. This is a good approach for an initial patch but this raises concerns about efficiently managing kernel memory usage -- the data area grows but never shrinks, and total possible usage increases per backstore. (What if there are 1000?) Any ideas how we could also improve these aspects of the design? (Global TCMU data area usage limit?) > > The cmd area memory will be allocated through vmalloc(), and the data > area's blocks will be allocated individually later when needed. > > The allocated data area block memory will be managed via radix tree. > For now the bitmap still be the most efficient way to search and > manage the block index, this could be update later. > > Signed-off-by: Xiubo Li > Signed-off-by: Jianfei Hu > --- > drivers/target/target_core_user.c | 241 > ++ > 1 file changed, 167 insertions(+), 74 deletions(-) > > diff --git a/drivers/target/target_core_user.c > b/drivers/target/target_core_user.c > index 2e33100..f2e3769 100644 > --- a/drivers/target/target_core_user.c > +++ b/drivers/target/target_core_user.c > @@ -2,6 +2,7 @@ > * Copyright (C) 2013 Shaohua Li > * Copyright (C) 2014 Red Hat, Inc. > * Copyright (C) 2015 Arrikto, Inc. > + * Copyright (C) 2017 Chinamobile, Inc. > * > * This program is free software; you can redistribute it and/or modify it > * under the terms and conditions of the GNU General Public License, > @@ -25,6 +26,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -65,12 +67,15 @@ > > #define TCMU_TIME_OUT (30 * MSEC_PER_SEC) > > -#define DATA_BLOCK_BITS 256 > -#define DATA_BLOCK_SIZE 4096 > +/* For cmd area, the size is fixed 64M */ > +#define CMDR_SIZE (64 * 1024 * 1024) Commitmsg mistakenly says 128M..? > -#define CMDR_SIZE (16 * 4096) > +/* For data area, the size is fixed 1G */ > +#define DATA_BLOCK_BITS (256 * 1024) > +#define DATA_BLOCK_SIZE 4096 > #define DATA_SIZE (DATA_BLOCK_BITS * DATA_BLOCK_SIZE) > > +/* The ring buffer size is 64M + 1G */ > #define TCMU_RING_SIZE (CMDR_SIZE + DATA_SIZE) > > static struct device *tcmu_root_device; > @@ -103,6 +108,7 @@ struct tcmu_dev { > size_t data_size; > > DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS); > + struct radix_tree_root data_blocks; > > wait_queue_head_t wait_cmdr; > /* TODO should this be a mutex? */ > @@ -120,15 +126,22 @@ struct tcmu_dev { > > #define CMDR_OFF sizeof(struct tcmu_mailbox) > > +struct tcmu_data_block_index { > + struct list_head list; > + uint32_t index; > +}; is this used anywhere? leftover from an earlier version? > + > struct tcmu_cmd { > struct se_cmd *se_cmd; > struct tcmu_dev *tcmu_dev; > > - uint16_t cmd_id; > + uint32_t cmd_id; why this? tcmu_cmd_entry_hdr has a u16 cmd_id, so making this bigger doesn't really help anything. (natural alignment? But, compiler should be inserting padding as needed.) > > /* Can't use se_cmd when cleaning up expired cmds, because if > cmd has been completed then accessing se_cmd is off limits */ > - DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS); > + uint32_t dbi_cnt; > + uint32_t dbi_cur; > + uint32_t *dbi; Ah, cool. Saves tcmu_cmd from being huge, now that data_bitmap is 32MiB. > @@ -919,7 +999,7 @@ static int tcmu_configure_device(struct se_device *dev) > > info->name = str; > > - udev->mb_addr = vzalloc(TCMU_RING_SIZE); > + udev->mb_addr = vzalloc(CMDR_SIZE); I'm still concerned about this -- 64M vmalloc'd for cmd ring seems like too much. (again: what if there are 1000 tcmu backstores?) and is only needed if data area grows to max
Re: [PATCH] uio: add UIO_MEM_CUSTOM support
On 02/15/2017 05:34 PM, Xiubo Li wrote: >>> --- a/drivers/uio/uio.c >>> +++ b/drivers/uio/uio.c >>> @@ -708,6 +708,8 @@ static int uio_mmap(struct file *filep, struct >>> vm_area_struct *vma) >>> case UIO_MEM_LOGICAL: >>> case UIO_MEM_VIRTUAL: >>> return uio_mmap_logical(vma); >>> +case UIO_MEM_CUSTOM: >>> +return 0; >> How does this help? > For example, the TCMU will use the map area as ISCSI commands & data ring > buffer(uio0 --> map0). Currently the TCMU will using the fixed small > size map > area as the ring buffer, but this will be the bottleneck for high iops. > > Without knowing how large it is enough, so the new scheme will use the > fixed > small ring buffer area(about 64M ~ 128M) + dynamically "growing" ring > buffer > area(about 1.5G). The following code is in uio_mmap() in uio.c: if (idev->info->mmap) { ret = idev->info->mmap(idev->info, vma); return ret; } switch (idev->info->mem[mi].memtype) { case UIO_MEM_PHYS: return uio_mmap_physical(vma); case UIO_MEM_LOGICAL: case UIO_MEM_VIRTUAL: return uio_mmap_logical(vma); default: return -EINVAL; } We already have the equivalent of a CUSTOM memtype because TCMU sets the info->mmap fn, overriding uio's default handling choices in favor of its own. HTH -- Regards -- Andy
Re: [PATCH] uio: add UIO_MEM_CUSTOM support
On 02/15/2017 05:34 PM, Xiubo Li wrote: >>> --- a/drivers/uio/uio.c >>> +++ b/drivers/uio/uio.c >>> @@ -708,6 +708,8 @@ static int uio_mmap(struct file *filep, struct >>> vm_area_struct *vma) >>> case UIO_MEM_LOGICAL: >>> case UIO_MEM_VIRTUAL: >>> return uio_mmap_logical(vma); >>> +case UIO_MEM_CUSTOM: >>> +return 0; >> How does this help? > For example, the TCMU will use the map area as ISCSI commands & data ring > buffer(uio0 --> map0). Currently the TCMU will using the fixed small > size map > area as the ring buffer, but this will be the bottleneck for high iops. > > Without knowing how large it is enough, so the new scheme will use the > fixed > small ring buffer area(about 64M ~ 128M) + dynamically "growing" ring > buffer > area(about 1.5G). The following code is in uio_mmap() in uio.c: if (idev->info->mmap) { ret = idev->info->mmap(idev->info, vma); return ret; } switch (idev->info->mem[mi].memtype) { case UIO_MEM_PHYS: return uio_mmap_physical(vma); case UIO_MEM_LOGICAL: case UIO_MEM_VIRTUAL: return uio_mmap_logical(vma); default: return -EINVAL; } We already have the equivalent of a CUSTOM memtype because TCMU sets the info->mmap fn, overriding uio's default handling choices in favor of its own. HTH -- Regards -- Andy
Re: [dm-devel] [PATCH 0/9] Generate uevents for all DM events
On 10/05/2016 10:43 AM, Alasdair G Kergon wrote: On Wed, Oct 05, 2016 at 10:06:41AM -0700, Andy Grover wrote: My project *would* like this added sooner, so I'll work on a revised patchset that uses netlink instead of uevents, and will also work on a revision to uevents.txt that talks about KOBJ_CHANGE and when it should and should not be used, as described by agk, to help out the next person. netlink might still be overkill for this at this stage, if it's only making one thread able to monitor multiple devices efficiently. Let's find out your requirements and see if there's anything different from the ones we've debated before. Yeah just like Hannes said, I just want to get dm events for multiple dm devices without burning a thread for each device. After your clarification, netlink seems like the best option and not overkill. Give me a few days and I can have something for you to review. Thanks -- Regards -- Andy
Re: [dm-devel] [PATCH 0/9] Generate uevents for all DM events
On 10/05/2016 10:43 AM, Alasdair G Kergon wrote: On Wed, Oct 05, 2016 at 10:06:41AM -0700, Andy Grover wrote: My project *would* like this added sooner, so I'll work on a revised patchset that uses netlink instead of uevents, and will also work on a revision to uevents.txt that talks about KOBJ_CHANGE and when it should and should not be used, as described by agk, to help out the next person. netlink might still be overkill for this at this stage, if it's only making one thread able to monitor multiple devices efficiently. Let's find out your requirements and see if there's anything different from the ones we've debated before. Yeah just like Hannes said, I just want to get dm events for multiple dm devices without burning a thread for each device. After your clarification, netlink seems like the best option and not overkill. Give me a few days and I can have something for you to review. Thanks -- Regards -- Andy
Re: [dm-devel] [PATCH 0/9] Generate uevents for all DM events
On 10/04/2016 11:51 PM, Greg KH wrote: On Wed, Oct 05, 2016 at 01:40:05AM +0100, Alasdair G Kergon wrote: We see these as two different categories of notifications, and prefer the greater flexibility a mechanism independent of uevents would provide. The team has discussed several alternatives over the years but didn't make a decision as we've not yet reached a point where we're straining the existing mechanism too far. So, no changes need to be made? I'm confused here, who is wanting this changed? Hehe, Alasdair and I are both RH but working on different projects. We're not the Borg :) My project *would* like this added sooner, so I'll work on a revised patchset that uses netlink instead of uevents, and will also work on a revision to uevents.txt that talks about KOBJ_CHANGE and when it should and should not be used, as described by agk, to help out the next person. Thanks -- Regards -- Andy
Re: [dm-devel] [PATCH 0/9] Generate uevents for all DM events
On 10/04/2016 11:51 PM, Greg KH wrote: On Wed, Oct 05, 2016 at 01:40:05AM +0100, Alasdair G Kergon wrote: We see these as two different categories of notifications, and prefer the greater flexibility a mechanism independent of uevents would provide. The team has discussed several alternatives over the years but didn't make a decision as we've not yet reached a point where we're straining the existing mechanism too far. So, no changes need to be made? I'm confused here, who is wanting this changed? Hehe, Alasdair and I are both RH but working on different projects. We're not the Borg :) My project *would* like this added sooner, so I'll work on a revised patchset that uses netlink instead of uevents, and will also work on a revision to uevents.txt that talks about KOBJ_CHANGE and when it should and should not be used, as described by agk, to help out the next person. Thanks -- Regards -- Andy
Re: [PATCH 0/9] Generate uevents for all DM events
On 10/04/2016 12:20 AM, Greg KH wrote: On Mon, Oct 03, 2016 at 12:22:51PM -0700, Andy Grover wrote: Hi Mike and GregKH, I want a way to get devicemapper events without using the DM ioctl, because that requires creating a thread to sleep in the ioctl for each dm device I want events from. It would seem like using uevents and KOBJ_CHANGE would be a good way to do this, but Mike said that the uevent maintainers (Greg that's you?) did not think this was a good idea? KBOJ_CHANGE is a tricky one. It has been used for a variety of different things, but usually it is used to show that a major change has happened with a device like a docking station plug in or out. How often do DM events happen? What is triggering them? How do you want to send them to userspace? Through a sysfs file? Why not just use your own netlink connection? devicemapper is using uevents for: a. dm-verity detected corruption b. dm-multipath: path failed or reinstated c. dm device renamed d. there's also some use in md and bcache. devicemapper uses DM_EVENT ioctl (yuck) for: 1. dm-thin pool data/metadata filling up (hit a threshold) 2. dm-cache is now clean 3. dm-log flushed or log failed 4. dm-raid error detected or sync complete Instead of using uevent for everything, we could go to a separate genetlink for 1-4 instead of making them use uevent like a-d, but we'd end up with two different userspace notification techniques. At least to me, there doesn't seem to be much technical differentiation between the two lists. (Would we want to then at some point transition any of a-d off of uevent and onto genetlink for consistency's sake?) Thanks -- Regards -- Andy
Re: [PATCH 0/9] Generate uevents for all DM events
On 10/04/2016 12:20 AM, Greg KH wrote: On Mon, Oct 03, 2016 at 12:22:51PM -0700, Andy Grover wrote: Hi Mike and GregKH, I want a way to get devicemapper events without using the DM ioctl, because that requires creating a thread to sleep in the ioctl for each dm device I want events from. It would seem like using uevents and KOBJ_CHANGE would be a good way to do this, but Mike said that the uevent maintainers (Greg that's you?) did not think this was a good idea? KBOJ_CHANGE is a tricky one. It has been used for a variety of different things, but usually it is used to show that a major change has happened with a device like a docking station plug in or out. How often do DM events happen? What is triggering them? How do you want to send them to userspace? Through a sysfs file? Why not just use your own netlink connection? devicemapper is using uevents for: a. dm-verity detected corruption b. dm-multipath: path failed or reinstated c. dm device renamed d. there's also some use in md and bcache. devicemapper uses DM_EVENT ioctl (yuck) for: 1. dm-thin pool data/metadata filling up (hit a threshold) 2. dm-cache is now clean 3. dm-log flushed or log failed 4. dm-raid error detected or sync complete Instead of using uevent for everything, we could go to a separate genetlink for 1-4 instead of making them use uevent like a-d, but we'd end up with two different userspace notification techniques. At least to me, there doesn't seem to be much technical differentiation between the two lists. (Would we want to then at some point transition any of a-d off of uevent and onto genetlink for consistency's sake?) Thanks -- Regards -- Andy
[PATCH 2/9] dm: Move multipath-specific stuff out of dm-uevent.c
There's a little bit of mpath-specific stuff that is in dm-uevent.c, because all current uevents want to attach DM_PATH and DM_NR_VALID_PATHS variables to the uevent. Makes sense since all currently defined DM uevents are for dm-mpath, but is not true for future uevents. Move the addition of these to dm-mpath.c and expose a few lower-level functions, dm_build_uevent, dm_uevent_add and dm_uevent_free, for other dm targets to build their own uevents. Signed-off-by: Andy Grover <agro...@redhat.com> --- drivers/md/dm-mpath.c | 71 +++ drivers/md/dm-uevent.c | 75 +- drivers/md/dm-uevent.h | 30 drivers/md/dm.c| 1 + 4 files changed, 103 insertions(+), 74 deletions(-) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index ac734e5..c563b6d 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -30,6 +30,15 @@ #define DM_PG_INIT_DELAY_MSECS 2000 #define DM_PG_INIT_DELAY_DEFAULT ((unsigned) -1) +static const struct { + enum dm_uevent_type type; + enum kobject_action action; + char *name; +} _dm_mpath_uevent_type_names[] = { + {DM_UEVENT_PATH_FAILED, KOBJ_CHANGE, "PATH_FAILED"}, + {DM_UEVENT_PATH_REINSTATED, KOBJ_CHANGE, "PATH_REINSTATED"}, +}; + /* Path properties */ struct pgpath { struct list_head list; @@ -1232,6 +1241,68 @@ static void multipath_dtr(struct dm_target *ti) free_multipath(m); } +static struct dm_uevent *dm_build_path_uevent(struct mapped_device *md, + struct dm_target *ti, + enum kobject_action action, + const char *dm_action, + const char *path, + unsigned nr_valid_paths) +{ + struct dm_uevent *event; + + event = dm_build_uevent(md, ti, action, dm_action); + if (IS_ERR(event)) + return event; + + if (add_uevent_var(>ku_env, "DM_PATH=%s", path)) { + DMERR("%s: add_uevent_var() for DM_PATH failed", __func__); + goto err_add; + } + + if (add_uevent_var(>ku_env, "DM_NR_VALID_PATHS=%d", + nr_valid_paths)) { + DMERR("%s: add_uevent_var() for DM_NR_VALID_PATHS failed", + __func__); + goto err_add; + } + + return event; + +err_add: + dm_uevent_free(event); + return ERR_PTR(-ENOMEM); +} + +/** + * dm_path_uevent - called to create a new path event and queue it + * + * @event_type:path event type enum + * @ti:pointer to a dm_target + * @path: string containing pathname + * @nr_valid_paths:number of valid paths remaining + * + */ +static void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti, + const char *path, unsigned nr_valid_paths) +{ + struct mapped_device *md = dm_table_get_md(ti->table); + struct dm_uevent *event; + + if (event_type >= ARRAY_SIZE(_dm_mpath_uevent_type_names)) { + DMERR("%s: Invalid event_type %d", __func__, event_type); + return; + } + + event = dm_build_path_uevent(md, ti, + _dm_mpath_uevent_type_names[event_type].action, + _dm_mpath_uevent_type_names[event_type].name, +path, nr_valid_paths); + if (IS_ERR(event)) + return; + + dm_uevent_add(md, >elist); +} + /* * Take a path out of use. */ diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c index 2d5f858..f7089dd 100644 --- a/drivers/md/dm-uevent.c +++ b/drivers/md/dm-uevent.c @@ -29,30 +29,13 @@ #define DM_MSG_PREFIX "uevent" -static const struct { - enum dm_uevent_type type; - enum kobject_action action; - char *name; -} _dm_uevent_type_names[] = { - {DM_UEVENT_PATH_FAILED, KOBJ_CHANGE, "PATH_FAILED"}, - {DM_UEVENT_PATH_REINSTATED, KOBJ_CHANGE, "PATH_REINSTATED"}, -}; - static struct kmem_cache *_dm_event_cache; -struct dm_uevent { - struct mapped_device *md; - enum kobject_action action; - struct kobj_uevent_env ku_env; - struct list_head elist; - char name[DM_NAME_LEN]; - char uuid[DM_UUID_LEN]; -}; - -static void dm_uevent_free(struct dm_uevent *event) +void dm_uevent_free(struct dm_uevent *event) { kmem_cache_free(_dm_event_cache, event); } +EXPORT_SYMBOL_GPL(dm_uevent_free); static struct dm_uevent *dm_uevent_alloc(struct mapped_device *md) { @@ -68,12 +51,10 @@ static struct dm_uevent *dm_uevent_alloc(stru
[PATCH 4/9] dm: Update dm-uevent.txt
Document the current dm uevent API, as modified by this patchset. Signed-off-by: Andy Grover <agro...@redhat.com> --- Documentation/device-mapper/dm-uevent.txt | 49 ++- 1 file changed, 42 insertions(+), 7 deletions(-) diff --git a/Documentation/device-mapper/dm-uevent.txt b/Documentation/device-mapper/dm-uevent.txt index 07edbd85..0a5d909 100644 --- a/Documentation/device-mapper/dm-uevent.txt +++ b/Documentation/device-mapper/dm-uevent.txt @@ -1,18 +1,53 @@ The device-mapper uevent code adds the capability to device-mapper to create and send kobject uevents (uevents). Previously device-mapper events were only -available through the ioctl interface. The advantage of the uevents interface -is the event contains environment attributes providing increased context for +available through the ioctl interface. The advantages of the uevents interface +are the event contains environment attributes providing increased context for the event avoiding the need to query the state of the device-mapper device after -the event is received. +the event is received, and uevents are poll()-able in userspace, whereas the +ioctl event interface is not. -There are two functions currently for device-mapper events. The first function -listed creates the event and the second function sends the event(s). +There are five functions DM targets should use. -void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti, -const char *path, unsigned nr_valid_paths) +struct dm_uevent *dm_build_uevent(struct mapped_device *md, +struct dm_target *ti, +enum kobject_action action, +const char *dm_action) + +Construct a dm uevent with default DM variables attached(DM_TARGET, +DM_ACTION, DM_SEQNUM). + +int add_uevent_var(struct kobj_uevent_env *env, const char *format, ...) + +Optionally add event-specific data to the generated uevent. e.g. dm-mpath's +PATH_FAILED and PATH_REINSTATED uevents add path and number-of-remaining-paths +vars. + +void dm_uevent_add(struct mapped_device *md, struct list_head *elist) + +Hand off the uevent to the device's list of pending events. + +void dm_uevent_free(struct dm_uevent *event) + +Something went wrong, free the uevent instead of queueing it. + +void dm_table_event(struct dm_table *t) + +Triggers sending of queued uevents as well as waking up processes waiting on +the ioctl. + + +The device-mapper uevent code also supplies three functions for device-mapper core +to call: + +int dm_uevent_init(void) +void dm_uevent_exit(void) + +Setup and teardown. void dm_send_uevents(struct list_head *events, struct kobject *kobj) +Actually send the uevents (called indirectly from dm_table_event, above). + The variables added to the uevent environment are: -- 2.7.4
[PATCH 2/9] dm: Move multipath-specific stuff out of dm-uevent.c
There's a little bit of mpath-specific stuff that is in dm-uevent.c, because all current uevents want to attach DM_PATH and DM_NR_VALID_PATHS variables to the uevent. Makes sense since all currently defined DM uevents are for dm-mpath, but is not true for future uevents. Move the addition of these to dm-mpath.c and expose a few lower-level functions, dm_build_uevent, dm_uevent_add and dm_uevent_free, for other dm targets to build their own uevents. Signed-off-by: Andy Grover --- drivers/md/dm-mpath.c | 71 +++ drivers/md/dm-uevent.c | 75 +- drivers/md/dm-uevent.h | 30 drivers/md/dm.c| 1 + 4 files changed, 103 insertions(+), 74 deletions(-) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index ac734e5..c563b6d 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -30,6 +30,15 @@ #define DM_PG_INIT_DELAY_MSECS 2000 #define DM_PG_INIT_DELAY_DEFAULT ((unsigned) -1) +static const struct { + enum dm_uevent_type type; + enum kobject_action action; + char *name; +} _dm_mpath_uevent_type_names[] = { + {DM_UEVENT_PATH_FAILED, KOBJ_CHANGE, "PATH_FAILED"}, + {DM_UEVENT_PATH_REINSTATED, KOBJ_CHANGE, "PATH_REINSTATED"}, +}; + /* Path properties */ struct pgpath { struct list_head list; @@ -1232,6 +1241,68 @@ static void multipath_dtr(struct dm_target *ti) free_multipath(m); } +static struct dm_uevent *dm_build_path_uevent(struct mapped_device *md, + struct dm_target *ti, + enum kobject_action action, + const char *dm_action, + const char *path, + unsigned nr_valid_paths) +{ + struct dm_uevent *event; + + event = dm_build_uevent(md, ti, action, dm_action); + if (IS_ERR(event)) + return event; + + if (add_uevent_var(>ku_env, "DM_PATH=%s", path)) { + DMERR("%s: add_uevent_var() for DM_PATH failed", __func__); + goto err_add; + } + + if (add_uevent_var(>ku_env, "DM_NR_VALID_PATHS=%d", + nr_valid_paths)) { + DMERR("%s: add_uevent_var() for DM_NR_VALID_PATHS failed", + __func__); + goto err_add; + } + + return event; + +err_add: + dm_uevent_free(event); + return ERR_PTR(-ENOMEM); +} + +/** + * dm_path_uevent - called to create a new path event and queue it + * + * @event_type:path event type enum + * @ti:pointer to a dm_target + * @path: string containing pathname + * @nr_valid_paths:number of valid paths remaining + * + */ +static void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti, + const char *path, unsigned nr_valid_paths) +{ + struct mapped_device *md = dm_table_get_md(ti->table); + struct dm_uevent *event; + + if (event_type >= ARRAY_SIZE(_dm_mpath_uevent_type_names)) { + DMERR("%s: Invalid event_type %d", __func__, event_type); + return; + } + + event = dm_build_path_uevent(md, ti, + _dm_mpath_uevent_type_names[event_type].action, + _dm_mpath_uevent_type_names[event_type].name, +path, nr_valid_paths); + if (IS_ERR(event)) + return; + + dm_uevent_add(md, >elist); +} + /* * Take a path out of use. */ diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c index 2d5f858..f7089dd 100644 --- a/drivers/md/dm-uevent.c +++ b/drivers/md/dm-uevent.c @@ -29,30 +29,13 @@ #define DM_MSG_PREFIX "uevent" -static const struct { - enum dm_uevent_type type; - enum kobject_action action; - char *name; -} _dm_uevent_type_names[] = { - {DM_UEVENT_PATH_FAILED, KOBJ_CHANGE, "PATH_FAILED"}, - {DM_UEVENT_PATH_REINSTATED, KOBJ_CHANGE, "PATH_REINSTATED"}, -}; - static struct kmem_cache *_dm_event_cache; -struct dm_uevent { - struct mapped_device *md; - enum kobject_action action; - struct kobj_uevent_env ku_env; - struct list_head elist; - char name[DM_NAME_LEN]; - char uuid[DM_UUID_LEN]; -}; - -static void dm_uevent_free(struct dm_uevent *event) +void dm_uevent_free(struct dm_uevent *event) { kmem_cache_free(_dm_event_cache, event); } +EXPORT_SYMBOL_GPL(dm_uevent_free); static struct dm_uevent *dm_uevent_alloc(struct mapped_device *md) { @@ -68,12 +51,10 @@ static struct dm_uevent *dm_uevent_alloc(struct mapped_device *md) return event; } -static st
[PATCH 4/9] dm: Update dm-uevent.txt
Document the current dm uevent API, as modified by this patchset. Signed-off-by: Andy Grover --- Documentation/device-mapper/dm-uevent.txt | 49 ++- 1 file changed, 42 insertions(+), 7 deletions(-) diff --git a/Documentation/device-mapper/dm-uevent.txt b/Documentation/device-mapper/dm-uevent.txt index 07edbd85..0a5d909 100644 --- a/Documentation/device-mapper/dm-uevent.txt +++ b/Documentation/device-mapper/dm-uevent.txt @@ -1,18 +1,53 @@ The device-mapper uevent code adds the capability to device-mapper to create and send kobject uevents (uevents). Previously device-mapper events were only -available through the ioctl interface. The advantage of the uevents interface -is the event contains environment attributes providing increased context for +available through the ioctl interface. The advantages of the uevents interface +are the event contains environment attributes providing increased context for the event avoiding the need to query the state of the device-mapper device after -the event is received. +the event is received, and uevents are poll()-able in userspace, whereas the +ioctl event interface is not. -There are two functions currently for device-mapper events. The first function -listed creates the event and the second function sends the event(s). +There are five functions DM targets should use. -void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti, -const char *path, unsigned nr_valid_paths) +struct dm_uevent *dm_build_uevent(struct mapped_device *md, +struct dm_target *ti, +enum kobject_action action, +const char *dm_action) + +Construct a dm uevent with default DM variables attached(DM_TARGET, +DM_ACTION, DM_SEQNUM). + +int add_uevent_var(struct kobj_uevent_env *env, const char *format, ...) + +Optionally add event-specific data to the generated uevent. e.g. dm-mpath's +PATH_FAILED and PATH_REINSTATED uevents add path and number-of-remaining-paths +vars. + +void dm_uevent_add(struct mapped_device *md, struct list_head *elist) + +Hand off the uevent to the device's list of pending events. + +void dm_uevent_free(struct dm_uevent *event) + +Something went wrong, free the uevent instead of queueing it. + +void dm_table_event(struct dm_table *t) + +Triggers sending of queued uevents as well as waking up processes waiting on +the ioctl. + + +The device-mapper uevent code also supplies three functions for device-mapper core +to call: + +int dm_uevent_init(void) +void dm_uevent_exit(void) + +Setup and teardown. void dm_send_uevents(struct list_head *events, struct kobject *kobj) +Actually send the uevents (called indirectly from dm_table_event, above). + The variables added to the uevent environment are: -- 2.7.4
[PATCH 5/9] dm: Rename dm_build_uevent to dm_uevent_build
For consistency with other function names that start with 'dm_uevent'. Signed-off-by: Andy Grover <agro...@redhat.com> --- Documentation/device-mapper/dm-uevent.txt | 2 +- drivers/md/dm-mpath.c | 4 ++-- drivers/md/dm-uevent.c| 4 ++-- drivers/md/dm-uevent.h| 4 ++-- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/Documentation/device-mapper/dm-uevent.txt b/Documentation/device-mapper/dm-uevent.txt index 0a5d909..5780e76 100644 --- a/Documentation/device-mapper/dm-uevent.txt +++ b/Documentation/device-mapper/dm-uevent.txt @@ -8,7 +8,7 @@ ioctl event interface is not. There are five functions DM targets should use. -struct dm_uevent *dm_build_uevent(struct mapped_device *md, +struct dm_uevent *dm_uevent_build(struct mapped_device *md, struct dm_target *ti, enum kobject_action action, const char *dm_action) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 75fe28a..26f99d0 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1261,12 +1261,12 @@ static void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti, return; } - event = dm_build_uevent(md, + event = dm_uevent_build(md, ti, _dm_mpath_uevent_type_names[event_type].action, _dm_mpath_uevent_type_names[event_type].name); if (IS_ERR(event)) { - DMERR("%s: dm_build_uevent() failed", __func__); + DMERR("%s: dm_uevent_build() failed", __func__); return; } diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c index f7089dd..6a72de7 100644 --- a/drivers/md/dm-uevent.c +++ b/drivers/md/dm-uevent.c @@ -51,7 +51,7 @@ static struct dm_uevent *dm_uevent_alloc(struct mapped_device *md) return event; } -struct dm_uevent *dm_build_uevent(struct mapped_device *md, +struct dm_uevent *dm_uevent_build(struct mapped_device *md, struct dm_target *ti, enum kobject_action action, const char *dm_action) @@ -92,7 +92,7 @@ err_add: err_nomem: return ERR_PTR(-ENOMEM); } -EXPORT_SYMBOL_GPL(dm_build_uevent); +EXPORT_SYMBOL_GPL(dm_uevent_build); /** * dm_send_uevents - send uevents for given list diff --git a/drivers/md/dm-uevent.h b/drivers/md/dm-uevent.h index 4ff2ad1..4dc99c2 100644 --- a/drivers/md/dm-uevent.h +++ b/drivers/md/dm-uevent.h @@ -42,7 +42,7 @@ enum dm_uevent_type { extern int dm_uevent_init(void); extern void dm_uevent_exit(void); extern void dm_send_uevents(struct list_head *events, struct kobject *kobj); -struct dm_uevent *dm_build_uevent(struct mapped_device *md, +struct dm_uevent *dm_uevent_build(struct mapped_device *md, struct dm_target *ti, enum kobject_action action, const char *dm_action); @@ -61,7 +61,7 @@ static inline void dm_send_uevents(struct list_head *events, struct kobject *kobj) { } -static inline struct dm_uevent *dm_build_uevent(struct mapped_device *md, +static inline struct dm_uevent *dm_uevent_build(struct mapped_device *md, struct dm_target *ti, enum kobject_action action, const char *dm_action) -- 2.7.4
[PATCH 9/9] dm: Generate uevents for other targets
Generate uevents for other targets: cache, log, raid1, and snap. Signed-off-by: Andy Grover <agro...@redhat.com> --- drivers/md/dm-cache-target.c | 5 - drivers/md/dm-log-userspace-base.c | 8 ++-- drivers/md/dm-log.c| 1 + drivers/md/dm-raid1.c | 1 + drivers/md/dm-snap.c | 1 + 5 files changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c index 59b2c50..29f0bc9 100644 --- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -586,8 +586,10 @@ static void clear_dirty(struct cache *cache, dm_oblock_t oblock, dm_cblock_t cbl { if (test_and_clear_bit(from_cblock(cblock), cache->dirty_bitset)) { policy_clear_dirty(cache->policy, oblock); - if (atomic_dec_return(>nr_dirty) == 0) + if (atomic_dec_return(>nr_dirty) == 0) { + dm_uevent_add(cache->ti, KOBJ_CHANGE, "CACHE_CLEAN"); dm_table_event(cache->ti->table); + } } } @@ -978,6 +980,7 @@ static void notify_mode_switch(struct cache *cache, enum cache_metadata_mode mod "fail" }; + dm_uevent_add(cache->ti, KOBJ_CHANGE, "CACHE_MODE_SWITCH"); dm_table_event(cache->ti->table); DMINFO("%s: switching cache to %s mode", cache_device_name(cache), descs[(int)mode]); diff --git a/drivers/md/dm-log-userspace-base.c b/drivers/md/dm-log-userspace-base.c index 53b7b06d..cb0b6cb 100644 --- a/drivers/md/dm-log-userspace-base.c +++ b/drivers/md/dm-log-userspace-base.c @@ -162,8 +162,10 @@ static void do_flush(struct work_struct *work) r = userspace_do_request(lc, lc->uuid, DM_ULOG_FLUSH, NULL, 0, NULL, NULL); - if (r) + if (r) { + dm_uevent_add(lc->ti->table, KOBJ_CHANGE, "LOG_FLUSHED"); dm_table_event(lc->ti->table); + } } /* @@ -634,8 +636,10 @@ out: mempool_free(fe, flush_entry_pool); } - if (r) + if (r) { + dm_uevent_add(lc->ti->table, KOBJ_CHANGE, "LOG_FLUSHED"); dm_table_event(lc->ti->table); + } return r; } diff --git a/drivers/md/dm-log.c b/drivers/md/dm-log.c index 07fc1ad..8d144d2 100644 --- a/drivers/md/dm-log.c +++ b/drivers/md/dm-log.c @@ -579,6 +579,7 @@ static void fail_log_device(struct log_c *lc) return; lc->log_dev_failed = 1; + dm_uevent_add(lc->ti, KOBJ_CHANGE, "LOG_FAILED"); dm_table_event(lc->ti->table); } diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c index bdf1606..daffbab 100644 --- a/drivers/md/dm-raid1.c +++ b/drivers/md/dm-raid1.c @@ -410,6 +410,7 @@ static void do_recovery(struct mirror_set *ms) if (!ms->in_sync && (log->type->get_sync_count(log) == ms->nr_regions)) { /* the sync is complete */ + dm_uevent_add(ms->ti, KOBJ_CHANGE, "TABLE_SYNCED"); dm_table_event(ms->ti->table); ms->in_sync = 1; reset_ms_flags(ms); diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c index c65feea..b47e28f6 100644 --- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -1443,6 +1443,7 @@ static void __invalidate_snapshot(struct dm_snapshot *s, int err) s->valid = 0; + dm_uevent_add(s->ti, KOBJ_CHANGE, "SNAPSHOT_INVALIDATED"); dm_table_event(s->ti->table); } -- 2.7.4
[PATCH 7/9] dm: Implement dm_uevent_add()
This helper function builds and enqueues the uevent. Targets can use this when they do not need to customize the uevent beyond its name. Signed-off-by: Andy Grover <agro...@redhat.com> --- Documentation/device-mapper/dm-uevent.txt | 5 + drivers/md/dm-uevent.c| 26 ++ include/linux/device-mapper.h | 1 + 3 files changed, 32 insertions(+) diff --git a/Documentation/device-mapper/dm-uevent.txt b/Documentation/device-mapper/dm-uevent.txt index 4fcb71d..097e7f4 100644 --- a/Documentation/device-mapper/dm-uevent.txt +++ b/Documentation/device-mapper/dm-uevent.txt @@ -30,6 +30,11 @@ void dm_uevent_free(struct dm_uevent *event) Something went wrong, free the uevent instead of queueing it. +void dm_uevent_add(struct dm_target *ti, enum kobject_action action, char *name) + +Helper function that builds and queues a uevent if no additional +variables need to be added. + void dm_table_event(struct dm_table *t) Triggers sending of queued uevents as well as waking up processes waiting on diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c index 6a72de7..d139135 100644 --- a/drivers/md/dm-uevent.c +++ b/drivers/md/dm-uevent.c @@ -95,6 +95,32 @@ err_nomem: EXPORT_SYMBOL_GPL(dm_uevent_build); /** + * dm_uevent_add - Create and queue a basic uevent + * + * @ti:The target to add a uevent for + * @action:The kobject action + * @name: The name of the uevent to add and queue + * + * If targets wish to create and queue a uevent but don't need to add + * extra vars, they can do so more easily by calling this. + * This is usually followed by a call to dm_table_event(). + */ +void dm_uevent_add(struct dm_target *ti, enum kobject_action action, char *name) +{ + struct mapped_device *md = dm_table_get_md(ti->table); + struct dm_uevent *event; + + event = dm_uevent_build(md, ti, action, name); + if (IS_ERR(event)) { + DMERR("%s: dm_uevent_build() failed", __func__); + return; + } + + dm_uevent_queue(md, >elist); +} +EXPORT_SYMBOL_GPL(dm_uevent_add); + +/** * dm_send_uevents - send uevents for given list * * @events:list of events to send diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index 024207c..c102e29 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -420,6 +420,7 @@ uint32_t dm_get_event_nr(struct mapped_device *md); int dm_wait_event(struct mapped_device *md, int event_nr); uint32_t dm_next_uevent_seq(struct mapped_device *md); void dm_uevent_queue(struct mapped_device *md, struct list_head *elist); +void dm_uevent_add(struct dm_target *ti, enum kobject_action action, char *name); /* * Info functions. -- 2.7.4
[PATCH 9/9] dm: Generate uevents for other targets
Generate uevents for other targets: cache, log, raid1, and snap. Signed-off-by: Andy Grover --- drivers/md/dm-cache-target.c | 5 - drivers/md/dm-log-userspace-base.c | 8 ++-- drivers/md/dm-log.c| 1 + drivers/md/dm-raid1.c | 1 + drivers/md/dm-snap.c | 1 + 5 files changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c index 59b2c50..29f0bc9 100644 --- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -586,8 +586,10 @@ static void clear_dirty(struct cache *cache, dm_oblock_t oblock, dm_cblock_t cbl { if (test_and_clear_bit(from_cblock(cblock), cache->dirty_bitset)) { policy_clear_dirty(cache->policy, oblock); - if (atomic_dec_return(>nr_dirty) == 0) + if (atomic_dec_return(>nr_dirty) == 0) { + dm_uevent_add(cache->ti, KOBJ_CHANGE, "CACHE_CLEAN"); dm_table_event(cache->ti->table); + } } } @@ -978,6 +980,7 @@ static void notify_mode_switch(struct cache *cache, enum cache_metadata_mode mod "fail" }; + dm_uevent_add(cache->ti, KOBJ_CHANGE, "CACHE_MODE_SWITCH"); dm_table_event(cache->ti->table); DMINFO("%s: switching cache to %s mode", cache_device_name(cache), descs[(int)mode]); diff --git a/drivers/md/dm-log-userspace-base.c b/drivers/md/dm-log-userspace-base.c index 53b7b06d..cb0b6cb 100644 --- a/drivers/md/dm-log-userspace-base.c +++ b/drivers/md/dm-log-userspace-base.c @@ -162,8 +162,10 @@ static void do_flush(struct work_struct *work) r = userspace_do_request(lc, lc->uuid, DM_ULOG_FLUSH, NULL, 0, NULL, NULL); - if (r) + if (r) { + dm_uevent_add(lc->ti->table, KOBJ_CHANGE, "LOG_FLUSHED"); dm_table_event(lc->ti->table); + } } /* @@ -634,8 +636,10 @@ out: mempool_free(fe, flush_entry_pool); } - if (r) + if (r) { + dm_uevent_add(lc->ti->table, KOBJ_CHANGE, "LOG_FLUSHED"); dm_table_event(lc->ti->table); + } return r; } diff --git a/drivers/md/dm-log.c b/drivers/md/dm-log.c index 07fc1ad..8d144d2 100644 --- a/drivers/md/dm-log.c +++ b/drivers/md/dm-log.c @@ -579,6 +579,7 @@ static void fail_log_device(struct log_c *lc) return; lc->log_dev_failed = 1; + dm_uevent_add(lc->ti, KOBJ_CHANGE, "LOG_FAILED"); dm_table_event(lc->ti->table); } diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c index bdf1606..daffbab 100644 --- a/drivers/md/dm-raid1.c +++ b/drivers/md/dm-raid1.c @@ -410,6 +410,7 @@ static void do_recovery(struct mirror_set *ms) if (!ms->in_sync && (log->type->get_sync_count(log) == ms->nr_regions)) { /* the sync is complete */ + dm_uevent_add(ms->ti, KOBJ_CHANGE, "TABLE_SYNCED"); dm_table_event(ms->ti->table); ms->in_sync = 1; reset_ms_flags(ms); diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c index c65feea..b47e28f6 100644 --- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -1443,6 +1443,7 @@ static void __invalidate_snapshot(struct dm_snapshot *s, int err) s->valid = 0; + dm_uevent_add(s->ti, KOBJ_CHANGE, "SNAPSHOT_INVALIDATED"); dm_table_event(s->ti->table); } -- 2.7.4
[PATCH 7/9] dm: Implement dm_uevent_add()
This helper function builds and enqueues the uevent. Targets can use this when they do not need to customize the uevent beyond its name. Signed-off-by: Andy Grover --- Documentation/device-mapper/dm-uevent.txt | 5 + drivers/md/dm-uevent.c| 26 ++ include/linux/device-mapper.h | 1 + 3 files changed, 32 insertions(+) diff --git a/Documentation/device-mapper/dm-uevent.txt b/Documentation/device-mapper/dm-uevent.txt index 4fcb71d..097e7f4 100644 --- a/Documentation/device-mapper/dm-uevent.txt +++ b/Documentation/device-mapper/dm-uevent.txt @@ -30,6 +30,11 @@ void dm_uevent_free(struct dm_uevent *event) Something went wrong, free the uevent instead of queueing it. +void dm_uevent_add(struct dm_target *ti, enum kobject_action action, char *name) + +Helper function that builds and queues a uevent if no additional +variables need to be added. + void dm_table_event(struct dm_table *t) Triggers sending of queued uevents as well as waking up processes waiting on diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c index 6a72de7..d139135 100644 --- a/drivers/md/dm-uevent.c +++ b/drivers/md/dm-uevent.c @@ -95,6 +95,32 @@ err_nomem: EXPORT_SYMBOL_GPL(dm_uevent_build); /** + * dm_uevent_add - Create and queue a basic uevent + * + * @ti:The target to add a uevent for + * @action:The kobject action + * @name: The name of the uevent to add and queue + * + * If targets wish to create and queue a uevent but don't need to add + * extra vars, they can do so more easily by calling this. + * This is usually followed by a call to dm_table_event(). + */ +void dm_uevent_add(struct dm_target *ti, enum kobject_action action, char *name) +{ + struct mapped_device *md = dm_table_get_md(ti->table); + struct dm_uevent *event; + + event = dm_uevent_build(md, ti, action, name); + if (IS_ERR(event)) { + DMERR("%s: dm_uevent_build() failed", __func__); + return; + } + + dm_uevent_queue(md, >elist); +} +EXPORT_SYMBOL_GPL(dm_uevent_add); + +/** * dm_send_uevents - send uevents for given list * * @events:list of events to send diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index 024207c..c102e29 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -420,6 +420,7 @@ uint32_t dm_get_event_nr(struct mapped_device *md); int dm_wait_event(struct mapped_device *md, int event_nr); uint32_t dm_next_uevent_seq(struct mapped_device *md); void dm_uevent_queue(struct mapped_device *md, struct list_head *elist); +void dm_uevent_add(struct dm_target *ti, enum kobject_action action, char *name); /* * Info functions. -- 2.7.4
[PATCH 5/9] dm: Rename dm_build_uevent to dm_uevent_build
For consistency with other function names that start with 'dm_uevent'. Signed-off-by: Andy Grover --- Documentation/device-mapper/dm-uevent.txt | 2 +- drivers/md/dm-mpath.c | 4 ++-- drivers/md/dm-uevent.c| 4 ++-- drivers/md/dm-uevent.h| 4 ++-- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/Documentation/device-mapper/dm-uevent.txt b/Documentation/device-mapper/dm-uevent.txt index 0a5d909..5780e76 100644 --- a/Documentation/device-mapper/dm-uevent.txt +++ b/Documentation/device-mapper/dm-uevent.txt @@ -8,7 +8,7 @@ ioctl event interface is not. There are five functions DM targets should use. -struct dm_uevent *dm_build_uevent(struct mapped_device *md, +struct dm_uevent *dm_uevent_build(struct mapped_device *md, struct dm_target *ti, enum kobject_action action, const char *dm_action) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 75fe28a..26f99d0 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1261,12 +1261,12 @@ static void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti, return; } - event = dm_build_uevent(md, + event = dm_uevent_build(md, ti, _dm_mpath_uevent_type_names[event_type].action, _dm_mpath_uevent_type_names[event_type].name); if (IS_ERR(event)) { - DMERR("%s: dm_build_uevent() failed", __func__); + DMERR("%s: dm_uevent_build() failed", __func__); return; } diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c index f7089dd..6a72de7 100644 --- a/drivers/md/dm-uevent.c +++ b/drivers/md/dm-uevent.c @@ -51,7 +51,7 @@ static struct dm_uevent *dm_uevent_alloc(struct mapped_device *md) return event; } -struct dm_uevent *dm_build_uevent(struct mapped_device *md, +struct dm_uevent *dm_uevent_build(struct mapped_device *md, struct dm_target *ti, enum kobject_action action, const char *dm_action) @@ -92,7 +92,7 @@ err_add: err_nomem: return ERR_PTR(-ENOMEM); } -EXPORT_SYMBOL_GPL(dm_build_uevent); +EXPORT_SYMBOL_GPL(dm_uevent_build); /** * dm_send_uevents - send uevents for given list diff --git a/drivers/md/dm-uevent.h b/drivers/md/dm-uevent.h index 4ff2ad1..4dc99c2 100644 --- a/drivers/md/dm-uevent.h +++ b/drivers/md/dm-uevent.h @@ -42,7 +42,7 @@ enum dm_uevent_type { extern int dm_uevent_init(void); extern void dm_uevent_exit(void); extern void dm_send_uevents(struct list_head *events, struct kobject *kobj); -struct dm_uevent *dm_build_uevent(struct mapped_device *md, +struct dm_uevent *dm_uevent_build(struct mapped_device *md, struct dm_target *ti, enum kobject_action action, const char *dm_action); @@ -61,7 +61,7 @@ static inline void dm_send_uevents(struct list_head *events, struct kobject *kobj) { } -static inline struct dm_uevent *dm_build_uevent(struct mapped_device *md, +static inline struct dm_uevent *dm_uevent_build(struct mapped_device *md, struct dm_target *ti, enum kobject_action action, const char *dm_action) -- 2.7.4
[PATCH 8/9] dm: Generate uevents for thin targets
Generate uevents when thin pool devices hit data or metadata low water marks, and when pool mode changes. Signed-off-by: Andy Grover <agro...@redhat.com> --- drivers/md/dm-thin.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c index d1c05c1..0778a2a 100644 --- a/drivers/md/dm-thin.c +++ b/drivers/md/dm-thin.c @@ -1429,6 +1429,7 @@ static void check_low_water_mark(struct pool *pool, dm_block_t free_blocks) spin_lock_irqsave(>lock, flags); pool->low_water_triggered = true; spin_unlock_irqrestore(>lock, flags); + dm_uevent_add(pool->ti, KOBJ_CHANGE, "THIN_LOW_WATER_DATA"); dm_table_event(pool->ti->table); } } @@ -2397,6 +2398,7 @@ static enum pool_mode get_pool_mode(struct pool *pool) static void notify_of_pool_mode_change(struct pool *pool, const char *new_mode) { + dm_uevent_add(pool->ti, KOBJ_CHANGE, "THIN_POOL_MODE"); dm_table_event(pool->ti->table); DMINFO("%s: switching pool to %s mode", dm_device_name(pool->pool_md), new_mode); @@ -3095,6 +3097,7 @@ static void metadata_low_callback(void *context) DMWARN("%s: reached low water mark for metadata device: sending event.", dm_device_name(pool->pool_md)); + dm_uevent_add(pool->ti, KOBJ_CHANGE, "THIN_LOW_WATER_METADATA"); dm_table_event(pool->ti->table); } -- 2.7.4
[PATCH 6/9] dm: Rename dm_event_add to dm_event_queue
For clarity. This function queues an event to be sent, which is not totally clear from the previous name. And, we want to reuse the name for something else. Signed-off-by: Andy Grover <agro...@redhat.com> --- Documentation/device-mapper/dm-uevent.txt | 2 +- drivers/md/dm-mpath.c | 2 +- drivers/md/dm.c | 4 ++-- include/linux/device-mapper.h | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/Documentation/device-mapper/dm-uevent.txt b/Documentation/device-mapper/dm-uevent.txt index 5780e76..4fcb71d 100644 --- a/Documentation/device-mapper/dm-uevent.txt +++ b/Documentation/device-mapper/dm-uevent.txt @@ -22,7 +22,7 @@ Optionally add event-specific data to the generated uevent. e.g. dm-mpath's PATH_FAILED and PATH_REINSTATED uevents add path and number-of-remaining-paths vars. -void dm_uevent_add(struct mapped_device *md, struct list_head *elist) +void dm_uevent_queue(struct mapped_device *md, struct list_head *elist) Hand off the uevent to the device's list of pending events. diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 26f99d0..467724c 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1282,7 +1282,7 @@ static void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti, goto err; } - dm_uevent_add(md, >elist); + dm_uevent_queue(md, >elist); return; err: dm_uevent_free(event); diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 701c75f..37b3a82 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -2419,7 +2419,7 @@ int dm_wait_event(struct mapped_device *md, int event_nr) (event_nr != atomic_read(>event_nr))); } -void dm_uevent_add(struct mapped_device *md, struct list_head *elist) +void dm_uevent_queue(struct mapped_device *md, struct list_head *elist) { unsigned long flags; @@ -2427,7 +2427,7 @@ void dm_uevent_add(struct mapped_device *md, struct list_head *elist) list_add(elist, >uevent_list); spin_unlock_irqrestore(>uevent_lock, flags); } -EXPORT_SYMBOL_GPL(dm_uevent_add); +EXPORT_SYMBOL_GPL(dm_uevent_queue); /* * The gendisk is only valid as long as you have a reference diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index 91acfce..024207c 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -419,7 +419,7 @@ int dm_resume(struct mapped_device *md); uint32_t dm_get_event_nr(struct mapped_device *md); int dm_wait_event(struct mapped_device *md, int event_nr); uint32_t dm_next_uevent_seq(struct mapped_device *md); -void dm_uevent_add(struct mapped_device *md, struct list_head *elist); +void dm_uevent_queue(struct mapped_device *md, struct list_head *elist); /* * Info functions. -- 2.7.4
[PATCH 8/9] dm: Generate uevents for thin targets
Generate uevents when thin pool devices hit data or metadata low water marks, and when pool mode changes. Signed-off-by: Andy Grover --- drivers/md/dm-thin.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c index d1c05c1..0778a2a 100644 --- a/drivers/md/dm-thin.c +++ b/drivers/md/dm-thin.c @@ -1429,6 +1429,7 @@ static void check_low_water_mark(struct pool *pool, dm_block_t free_blocks) spin_lock_irqsave(>lock, flags); pool->low_water_triggered = true; spin_unlock_irqrestore(>lock, flags); + dm_uevent_add(pool->ti, KOBJ_CHANGE, "THIN_LOW_WATER_DATA"); dm_table_event(pool->ti->table); } } @@ -2397,6 +2398,7 @@ static enum pool_mode get_pool_mode(struct pool *pool) static void notify_of_pool_mode_change(struct pool *pool, const char *new_mode) { + dm_uevent_add(pool->ti, KOBJ_CHANGE, "THIN_POOL_MODE"); dm_table_event(pool->ti->table); DMINFO("%s: switching pool to %s mode", dm_device_name(pool->pool_md), new_mode); @@ -3095,6 +3097,7 @@ static void metadata_low_callback(void *context) DMWARN("%s: reached low water mark for metadata device: sending event.", dm_device_name(pool->pool_md)); + dm_uevent_add(pool->ti, KOBJ_CHANGE, "THIN_LOW_WATER_METADATA"); dm_table_event(pool->ti->table); } -- 2.7.4
[PATCH 6/9] dm: Rename dm_event_add to dm_event_queue
For clarity. This function queues an event to be sent, which is not totally clear from the previous name. And, we want to reuse the name for something else. Signed-off-by: Andy Grover --- Documentation/device-mapper/dm-uevent.txt | 2 +- drivers/md/dm-mpath.c | 2 +- drivers/md/dm.c | 4 ++-- include/linux/device-mapper.h | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/Documentation/device-mapper/dm-uevent.txt b/Documentation/device-mapper/dm-uevent.txt index 5780e76..4fcb71d 100644 --- a/Documentation/device-mapper/dm-uevent.txt +++ b/Documentation/device-mapper/dm-uevent.txt @@ -22,7 +22,7 @@ Optionally add event-specific data to the generated uevent. e.g. dm-mpath's PATH_FAILED and PATH_REINSTATED uevents add path and number-of-remaining-paths vars. -void dm_uevent_add(struct mapped_device *md, struct list_head *elist) +void dm_uevent_queue(struct mapped_device *md, struct list_head *elist) Hand off the uevent to the device's list of pending events. diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 26f99d0..467724c 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1282,7 +1282,7 @@ static void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti, goto err; } - dm_uevent_add(md, >elist); + dm_uevent_queue(md, >elist); return; err: dm_uevent_free(event); diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 701c75f..37b3a82 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -2419,7 +2419,7 @@ int dm_wait_event(struct mapped_device *md, int event_nr) (event_nr != atomic_read(>event_nr))); } -void dm_uevent_add(struct mapped_device *md, struct list_head *elist) +void dm_uevent_queue(struct mapped_device *md, struct list_head *elist) { unsigned long flags; @@ -2427,7 +2427,7 @@ void dm_uevent_add(struct mapped_device *md, struct list_head *elist) list_add(elist, >uevent_list); spin_unlock_irqrestore(>uevent_lock, flags); } -EXPORT_SYMBOL_GPL(dm_uevent_add); +EXPORT_SYMBOL_GPL(dm_uevent_queue); /* * The gendisk is only valid as long as you have a reference diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index 91acfce..024207c 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -419,7 +419,7 @@ int dm_resume(struct mapped_device *md); uint32_t dm_get_event_nr(struct mapped_device *md); int dm_wait_event(struct mapped_device *md, int event_nr); uint32_t dm_next_uevent_seq(struct mapped_device *md); -void dm_uevent_add(struct mapped_device *md, struct list_head *elist); +void dm_uevent_queue(struct mapped_device *md, struct list_head *elist); /* * Info functions. -- 2.7.4
[PATCH 3/9] dm: Inline dm_build_path_uevent into dm_path_uevent
Since it's no longer an API boundary we can consolidate these two functions. Signed-off-by: Andy Grover <agro...@redhat.com> --- drivers/md/dm-mpath.c | 59 +++ 1 file changed, 22 insertions(+), 37 deletions(-) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index c563b6d..75fe28a 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1241,38 +1241,6 @@ static void multipath_dtr(struct dm_target *ti) free_multipath(m); } -static struct dm_uevent *dm_build_path_uevent(struct mapped_device *md, - struct dm_target *ti, - enum kobject_action action, - const char *dm_action, - const char *path, - unsigned nr_valid_paths) -{ - struct dm_uevent *event; - - event = dm_build_uevent(md, ti, action, dm_action); - if (IS_ERR(event)) - return event; - - if (add_uevent_var(>ku_env, "DM_PATH=%s", path)) { - DMERR("%s: add_uevent_var() for DM_PATH failed", __func__); - goto err_add; - } - - if (add_uevent_var(>ku_env, "DM_NR_VALID_PATHS=%d", - nr_valid_paths)) { - DMERR("%s: add_uevent_var() for DM_NR_VALID_PATHS failed", - __func__); - goto err_add; - } - - return event; - -err_add: - dm_uevent_free(event); - return ERR_PTR(-ENOMEM); -} - /** * dm_path_uevent - called to create a new path event and queue it * @@ -1293,14 +1261,31 @@ static void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti, return; } - event = dm_build_path_uevent(md, ti, - _dm_mpath_uevent_type_names[event_type].action, - _dm_mpath_uevent_type_names[event_type].name, -path, nr_valid_paths); - if (IS_ERR(event)) + event = dm_build_uevent(md, + ti, + _dm_mpath_uevent_type_names[event_type].action, + _dm_mpath_uevent_type_names[event_type].name); + if (IS_ERR(event)) { + DMERR("%s: dm_build_uevent() failed", __func__); return; + } + + if (add_uevent_var(>ku_env, "DM_PATH=%s", path)) { + DMERR("%s: add_uevent_var() for DM_PATH failed", __func__); + goto err; + } + + if (add_uevent_var(>ku_env, "DM_NR_VALID_PATHS=%d", + nr_valid_paths)) { + DMERR("%s: add_uevent_var() for DM_NR_VALID_PATHS failed", + __func__); + goto err; + } dm_uevent_add(md, >elist); + return; +err: + dm_uevent_free(event); } /* -- 2.7.4
[PATCH 3/9] dm: Inline dm_build_path_uevent into dm_path_uevent
Since it's no longer an API boundary we can consolidate these two functions. Signed-off-by: Andy Grover --- drivers/md/dm-mpath.c | 59 +++ 1 file changed, 22 insertions(+), 37 deletions(-) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index c563b6d..75fe28a 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1241,38 +1241,6 @@ static void multipath_dtr(struct dm_target *ti) free_multipath(m); } -static struct dm_uevent *dm_build_path_uevent(struct mapped_device *md, - struct dm_target *ti, - enum kobject_action action, - const char *dm_action, - const char *path, - unsigned nr_valid_paths) -{ - struct dm_uevent *event; - - event = dm_build_uevent(md, ti, action, dm_action); - if (IS_ERR(event)) - return event; - - if (add_uevent_var(>ku_env, "DM_PATH=%s", path)) { - DMERR("%s: add_uevent_var() for DM_PATH failed", __func__); - goto err_add; - } - - if (add_uevent_var(>ku_env, "DM_NR_VALID_PATHS=%d", - nr_valid_paths)) { - DMERR("%s: add_uevent_var() for DM_NR_VALID_PATHS failed", - __func__); - goto err_add; - } - - return event; - -err_add: - dm_uevent_free(event); - return ERR_PTR(-ENOMEM); -} - /** * dm_path_uevent - called to create a new path event and queue it * @@ -1293,14 +1261,31 @@ static void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti, return; } - event = dm_build_path_uevent(md, ti, - _dm_mpath_uevent_type_names[event_type].action, - _dm_mpath_uevent_type_names[event_type].name, -path, nr_valid_paths); - if (IS_ERR(event)) + event = dm_build_uevent(md, + ti, + _dm_mpath_uevent_type_names[event_type].action, + _dm_mpath_uevent_type_names[event_type].name); + if (IS_ERR(event)) { + DMERR("%s: dm_build_uevent() failed", __func__); return; + } + + if (add_uevent_var(>ku_env, "DM_PATH=%s", path)) { + DMERR("%s: add_uevent_var() for DM_PATH failed", __func__); + goto err; + } + + if (add_uevent_var(>ku_env, "DM_NR_VALID_PATHS=%d", + nr_valid_paths)) { + DMERR("%s: add_uevent_var() for DM_NR_VALID_PATHS failed", + __func__); + goto err; + } dm_uevent_add(md, >elist); + return; +err: + dm_uevent_free(event); } /* -- 2.7.4
[PATCH 0/9] Generate uevents for all DM events
Hi Mike and GregKH, I want a way to get devicemapper events without using the DM ioctl, because that requires creating a thread to sleep in the ioctl for each dm device I want events from. It would seem like using uevents and KOBJ_CHANGE would be a good way to do this, but Mike said that the uevent maintainers (Greg that's you?) did not think this was a good idea? If so, I was hoping you could talk a little more about why -- grep shows KOBJ_CHANGE used all over, but its usage is not documented in kobject.txt. In any case I can update kobject.txt with some more guidelines. The following patchset may be appliable if you're actually ok with using KOBJ_CHANGE for dm events, or if not, then I'll look to rework it to use a dm-specific genetlink approach. Thanks -- Regards -- Andy Andy Grover (9): dm: Do not export dm_send_uevents dm: Move multipath-specific stuff out of dm-uevent.c dm: Inline dm_build_path_uevent into dm_path_uevent dm: Update dm-uevent.txt dm: Rename dm_build_uevent to dm_uevent_build dm: Rename dm_event_add to dm_event_queue dm: Implement dm_uevent_add() dm: Generate uevents for thin targets dm: Generate uevents for other targets Documentation/device-mapper/dm-uevent.txt | 54 ++-- drivers/md/dm-cache-target.c | 5 +- drivers/md/dm-log-userspace-base.c| 8 ++- drivers/md/dm-log.c | 1 + drivers/md/dm-mpath.c | 56 drivers/md/dm-raid1.c | 1 + drivers/md/dm-snap.c | 1 + drivers/md/dm-thin.c | 3 + drivers/md/dm-uevent.c| 102 ++ drivers/md/dm-uevent.h| 30 +++-- drivers/md/dm.c | 3 +- include/linux/device-mapper.h | 3 +- 12 files changed, 180 insertions(+), 87 deletions(-) -- 2.7.4
[PATCH 0/9] Generate uevents for all DM events
Hi Mike and GregKH, I want a way to get devicemapper events without using the DM ioctl, because that requires creating a thread to sleep in the ioctl for each dm device I want events from. It would seem like using uevents and KOBJ_CHANGE would be a good way to do this, but Mike said that the uevent maintainers (Greg that's you?) did not think this was a good idea? If so, I was hoping you could talk a little more about why -- grep shows KOBJ_CHANGE used all over, but its usage is not documented in kobject.txt. In any case I can update kobject.txt with some more guidelines. The following patchset may be appliable if you're actually ok with using KOBJ_CHANGE for dm events, or if not, then I'll look to rework it to use a dm-specific genetlink approach. Thanks -- Regards -- Andy Andy Grover (9): dm: Do not export dm_send_uevents dm: Move multipath-specific stuff out of dm-uevent.c dm: Inline dm_build_path_uevent into dm_path_uevent dm: Update dm-uevent.txt dm: Rename dm_build_uevent to dm_uevent_build dm: Rename dm_event_add to dm_event_queue dm: Implement dm_uevent_add() dm: Generate uevents for thin targets dm: Generate uevents for other targets Documentation/device-mapper/dm-uevent.txt | 54 ++-- drivers/md/dm-cache-target.c | 5 +- drivers/md/dm-log-userspace-base.c| 8 ++- drivers/md/dm-log.c | 1 + drivers/md/dm-mpath.c | 56 drivers/md/dm-raid1.c | 1 + drivers/md/dm-snap.c | 1 + drivers/md/dm-thin.c | 3 + drivers/md/dm-uevent.c| 102 ++ drivers/md/dm-uevent.h| 30 +++-- drivers/md/dm.c | 3 +- include/linux/device-mapper.h | 3 +- 12 files changed, 180 insertions(+), 87 deletions(-) -- 2.7.4
[PATCH 1/9] dm: Do not export dm_send_uevents
Since dm-uevent.c (if CONFIG_DM_UEVENT) is part of the same module as where dm_sent_uevents is called, it does not need to be exported. Signed-off-by: Andy Grover <agro...@redhat.com> --- drivers/md/dm-uevent.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c index 8efe033..2d5f858 100644 --- a/drivers/md/dm-uevent.c +++ b/drivers/md/dm-uevent.c @@ -169,7 +169,6 @@ uevent_free: dm_uevent_free(event); } } -EXPORT_SYMBOL_GPL(dm_send_uevents); /** * dm_path_uevent - called to create a new path event and queue it -- 2.7.4
[PATCH 1/9] dm: Do not export dm_send_uevents
Since dm-uevent.c (if CONFIG_DM_UEVENT) is part of the same module as where dm_sent_uevents is called, it does not need to be exported. Signed-off-by: Andy Grover --- drivers/md/dm-uevent.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c index 8efe033..2d5f858 100644 --- a/drivers/md/dm-uevent.c +++ b/drivers/md/dm-uevent.c @@ -169,7 +169,6 @@ uevent_free: dm_uevent_free(event); } } -EXPORT_SYMBOL_GPL(dm_send_uevents); /** * dm_path_uevent - called to create a new path event and queue it -- 2.7.4
Re: NVMe over Fabrics target implementation
On 06/06/2016 11:23 PM, Nicholas A. Bellinger wrote: Hi HCH & Co, On Mon, 2016-06-06 at 23:22 +0200, Christoph Hellwig wrote: This patch set adds a generic NVMe over Fabrics target. The implementation conforms to the NVMe 1.2b specification (which includes Fabrics) and provides the NVMe over Fabrics access to Linux block devices. Thanks for all of the development work by the fabric_linux_driver team (HCH, Sagi, Ming, James F., James S., and Dave M.) over the last year. Very excited to see this code get a public release now that NVMf specification is out. Now that it's in the wild, it's a good opportunity to discuss some of the more interesting implementation details, beyond the new NVMf wire-protocol itself. I'm sorry but I missed it, can you repeat the link to the NVMe spec(s)? Thanks -- Andy
Re: NVMe over Fabrics target implementation
On 06/06/2016 11:23 PM, Nicholas A. Bellinger wrote: Hi HCH & Co, On Mon, 2016-06-06 at 23:22 +0200, Christoph Hellwig wrote: This patch set adds a generic NVMe over Fabrics target. The implementation conforms to the NVMe 1.2b specification (which includes Fabrics) and provides the NVMe over Fabrics access to Linux block devices. Thanks for all of the development work by the fabric_linux_driver team (HCH, Sagi, Ming, James F., James S., and Dave M.) over the last year. Very excited to see this code get a public release now that NVMf specification is out. Now that it's in the wild, it's a good opportunity to discuss some of the more interesting implementation details, beyond the new NVMf wire-protocol itself. I'm sorry but I missed it, can you repeat the link to the NVMe spec(s)? Thanks -- Andy
Re: __might_sleep in uio_read()?
On 09/03/2015 12:12 PM, Greg KH wrote: On Thu, Sep 03, 2015 at 11:47:34AM -0700, Andy Grover wrote: On 09/03/2015 05:26 AM, Michal Hocko wrote: On Wed 02-09-15 15:45:10, Andy Grover wrote: Hi Hans and Greg, Is this an issue with uio? I swear it didn't used to throw this warning... Thanks -- Andy [ 5174.883261] [ cut here ] [ 5174.883617] WARNING: CPU: 0 PID: 1532 at /home/agrover/git/kernel/kernel/sched/core.c:7389 __might_sleep+0x7d/0x90() [ 5174.884407] do not call blocking ops when !TASK_RUNNING; state=1 set at [] uio_read+0x91/0x170 [uio] The warning says that the driver is calling copy_to_user with TASK_INTERRUPTIBLE which is wrong in general because this context can sleep and a schedule would destroy the state. It doesn't matter here because the code would break out from the loop regardless of the copy_to_user return value. I assume that TASK_INTERRUPTIBLE is necessary before the event_count check to prevent from wake up races. If that is the case then you can simply do: diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index 3257d4220d01..7d8959e3833b 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -524,6 +524,7 @@ static ssize_t uio_read(struct file *filep, char __user *buf, event_count = atomic_read(>event); if (event_count != listener->event_count) { + __set_current_state(TASK_RUNNING); if (copy_to_user(buf, _count, count)) retval = -EFAULT; else { This certainly makes the warning go away. If this looks good to everyone else can we get this change in? What changed to require this? Why is this suddenly showing up now? I'm working on drivers/target/target_core_user.c, a SCSI userspace passthrough that was added in 3.18, aka TCMU, which uses uio. The checks for !TASK_RUNNING were added in 3.19 (8eb23b9f3 and 00845eb96) ...and I'm just getting back to TCMU development after a bit, so maybe I'm the first one to call uio_read since those checks were added in 3.19? -- Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: __might_sleep in uio_read()?
On 09/03/2015 05:26 AM, Michal Hocko wrote: On Wed 02-09-15 15:45:10, Andy Grover wrote: Hi Hans and Greg, Is this an issue with uio? I swear it didn't used to throw this warning... Thanks -- Andy [ 5174.883261] [ cut here ] [ 5174.883617] WARNING: CPU: 0 PID: 1532 at /home/agrover/git/kernel/kernel/sched/core.c:7389 __might_sleep+0x7d/0x90() [ 5174.884407] do not call blocking ops when !TASK_RUNNING; state=1 set at [] uio_read+0x91/0x170 [uio] The warning says that the driver is calling copy_to_user with TASK_INTERRUPTIBLE which is wrong in general because this context can sleep and a schedule would destroy the state. It doesn't matter here because the code would break out from the loop regardless of the copy_to_user return value. I assume that TASK_INTERRUPTIBLE is necessary before the event_count check to prevent from wake up races. If that is the case then you can simply do: diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index 3257d4220d01..7d8959e3833b 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -524,6 +524,7 @@ static ssize_t uio_read(struct file *filep, char __user *buf, event_count = atomic_read(>event); if (event_count != listener->event_count) { + __set_current_state(TASK_RUNNING); if (copy_to_user(buf, _count, count)) retval = -EFAULT; else { This certainly makes the warning go away. If this looks good to everyone else can we get this change in? Thanks -- Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: __might_sleep in uio_read()?
On 09/03/2015 05:26 AM, Michal Hocko wrote: On Wed 02-09-15 15:45:10, Andy Grover wrote: Hi Hans and Greg, Is this an issue with uio? I swear it didn't used to throw this warning... Thanks -- Andy [ 5174.883261] [ cut here ] [ 5174.883617] WARNING: CPU: 0 PID: 1532 at /home/agrover/git/kernel/kernel/sched/core.c:7389 __might_sleep+0x7d/0x90() [ 5174.884407] do not call blocking ops when !TASK_RUNNING; state=1 set at [] uio_read+0x91/0x170 [uio] The warning says that the driver is calling copy_to_user with TASK_INTERRUPTIBLE which is wrong in general because this context can sleep and a schedule would destroy the state. It doesn't matter here because the code would break out from the loop regardless of the copy_to_user return value. I assume that TASK_INTERRUPTIBLE is necessary before the event_count check to prevent from wake up races. If that is the case then you can simply do: diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index 3257d4220d01..7d8959e3833b 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -524,6 +524,7 @@ static ssize_t uio_read(struct file *filep, char __user *buf, event_count = atomic_read(>event); if (event_count != listener->event_count) { + __set_current_state(TASK_RUNNING); if (copy_to_user(buf, _count, count)) retval = -EFAULT; else { This certainly makes the warning go away. If this looks good to everyone else can we get this change in? Thanks -- Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: __might_sleep in uio_read()?
On 09/03/2015 12:12 PM, Greg KH wrote: On Thu, Sep 03, 2015 at 11:47:34AM -0700, Andy Grover wrote: On 09/03/2015 05:26 AM, Michal Hocko wrote: On Wed 02-09-15 15:45:10, Andy Grover wrote: Hi Hans and Greg, Is this an issue with uio? I swear it didn't used to throw this warning... Thanks -- Andy [ 5174.883261] [ cut here ] [ 5174.883617] WARNING: CPU: 0 PID: 1532 at /home/agrover/git/kernel/kernel/sched/core.c:7389 __might_sleep+0x7d/0x90() [ 5174.884407] do not call blocking ops when !TASK_RUNNING; state=1 set at [] uio_read+0x91/0x170 [uio] The warning says that the driver is calling copy_to_user with TASK_INTERRUPTIBLE which is wrong in general because this context can sleep and a schedule would destroy the state. It doesn't matter here because the code would break out from the loop regardless of the copy_to_user return value. I assume that TASK_INTERRUPTIBLE is necessary before the event_count check to prevent from wake up races. If that is the case then you can simply do: diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index 3257d4220d01..7d8959e3833b 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -524,6 +524,7 @@ static ssize_t uio_read(struct file *filep, char __user *buf, event_count = atomic_read(>event); if (event_count != listener->event_count) { + __set_current_state(TASK_RUNNING); if (copy_to_user(buf, _count, count)) retval = -EFAULT; else { This certainly makes the warning go away. If this looks good to everyone else can we get this change in? What changed to require this? Why is this suddenly showing up now? I'm working on drivers/target/target_core_user.c, a SCSI userspace passthrough that was added in 3.18, aka TCMU, which uses uio. The checks for !TASK_RUNNING were added in 3.19 (8eb23b9f3 and 00845eb96) ...and I'm just getting back to TCMU development after a bit, so maybe I'm the first one to call uio_read since those checks were added in 3.19? -- Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
__might_sleep in uio_read()?
Hi Hans and Greg, Is this an issue with uio? I swear it didn't used to throw this warning... Thanks -- Andy [ 5174.883261] [ cut here ] [ 5174.883617] WARNING: CPU: 0 PID: 1532 at /home/agrover/git/kernel/kernel/sched/core.c:7389 __might_sleep+0x7d/0x90() [ 5174.884407] do not call blocking ops when !TASK_RUNNING; state=1 set at [] uio_read+0x91/0x170 [uio] [ 5174.885198] Modules linked in: tcm_loop target_core_user uio target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod uinput fuse nfsv3 nfs_acl nfs lockd grace fscache sunrpc microcode i2c_piix4 virtio_balloon i2c_core xfs libcrc32c crc32c_intel virtio_net virtio_blk [ 5174.887351] CPU: 0 PID: 1532 Comm: tcmu-runner Not tainted 4.2.0-rc7+ #178 [ 5174.887853] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 [ 5174.888633] 81a3b870 880045393ca8 817afaae [ 5174.889224] 880045393cf8 880045393ce8 8109a846 880045393cd8 [ 5174.889793] a02a7150 02dc 880045008000 [ 5174.890375] Call Trace: [ 5174.890562] [] dump_stack+0x4c/0x65 [ 5174.890938] [] warn_slowpath_common+0x86/0xc0 [ 5174.891388] [] warn_slowpath_fmt+0x46/0x50 [ 5174.891808] [] ? uio_read+0x91/0x170 [uio] [ 5174.892237] [] ? uio_read+0x91/0x170 [uio] [ 5174.892653] [] __might_sleep+0x7d/0x90 [ 5174.893055] [] __might_fault+0x43/0xa0 [ 5174.893448] [] ? schedule+0x3e/0x90 [ 5174.893820] [] uio_read+0x132/0x170 [uio] [ 5174.894240] [] ? wake_up_q+0x70/0x70 [ 5174.894620] [] __vfs_read+0x28/0xe0 [ 5174.894993] [] ? security_file_permission+0xa3/0xc0 [ 5174.895541] [] ? rw_verify_area+0x4f/0xf0 [ 5174.896006] [] vfs_read+0x8a/0x140 [ 5174.896391] [] ? __schedule+0x425/0xcc0 [ 5174.896788] [] SyS_read+0x49/0xb0 [ 5174.897160] [] entry_SYSCALL_64_fastpath+0x12/0x76 [ 5174.897625] ---[ end trace 61765cb39fe48dd5 ]--- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
__might_sleep in uio_read()?
Hi Hans and Greg, Is this an issue with uio? I swear it didn't used to throw this warning... Thanks -- Andy [ 5174.883261] [ cut here ] [ 5174.883617] WARNING: CPU: 0 PID: 1532 at /home/agrover/git/kernel/kernel/sched/core.c:7389 __might_sleep+0x7d/0x90() [ 5174.884407] do not call blocking ops when !TASK_RUNNING; state=1 set at [] uio_read+0x91/0x170 [uio] [ 5174.885198] Modules linked in: tcm_loop target_core_user uio target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod uinput fuse nfsv3 nfs_acl nfs lockd grace fscache sunrpc microcode i2c_piix4 virtio_balloon i2c_core xfs libcrc32c crc32c_intel virtio_net virtio_blk [ 5174.887351] CPU: 0 PID: 1532 Comm: tcmu-runner Not tainted 4.2.0-rc7+ #178 [ 5174.887853] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 [ 5174.888633] 81a3b870 880045393ca8 817afaae [ 5174.889224] 880045393cf8 880045393ce8 8109a846 880045393cd8 [ 5174.889793] a02a7150 02dc 880045008000 [ 5174.890375] Call Trace: [ 5174.890562] [] dump_stack+0x4c/0x65 [ 5174.890938] [] warn_slowpath_common+0x86/0xc0 [ 5174.891388] [] warn_slowpath_fmt+0x46/0x50 [ 5174.891808] [] ? uio_read+0x91/0x170 [uio] [ 5174.892237] [] ? uio_read+0x91/0x170 [uio] [ 5174.892653] [] __might_sleep+0x7d/0x90 [ 5174.893055] [] __might_fault+0x43/0xa0 [ 5174.893448] [] ? schedule+0x3e/0x90 [ 5174.893820] [] uio_read+0x132/0x170 [uio] [ 5174.894240] [] ? wake_up_q+0x70/0x70 [ 5174.894620] [] __vfs_read+0x28/0xe0 [ 5174.894993] [] ? security_file_permission+0xa3/0xc0 [ 5174.895541] [] ? rw_verify_area+0x4f/0xf0 [ 5174.896006] [] vfs_read+0x8a/0x140 [ 5174.896391] [] ? __schedule+0x425/0xcc0 [ 5174.896788] [] SyS_read+0x49/0xb0 [ 5174.897160] [] entry_SYSCALL_64_fastpath+0x12/0x76 [ 5174.897625] ---[ end trace 61765cb39fe48dd5 ]--- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 03/12] target/configfs: Convert mappedlun + SCSI MIBs to RCU reader
On 05/12/2015 02:25 AM, Nicholas A. Bellinger wrote: From: Nicholas Bellinger This patch converts fabric independent configfs link/unlink to use RCU read path macros for se_node_acl->lun_entry_hlist access. It also converts SCSI MIB configfs show attribute code to use RCU read path macros for se_node_acl->lun_entry_hlist access. Cc: Hannes Reinecke Cc: Christoph Hellwig Cc: Sagi Grimberg Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_fabric_configfs.c | 35 +++--- drivers/target/target_core_stat.c| 180 +-- 2 files changed, 110 insertions(+), 105 deletions(-) diff --git a/drivers/target/target_core_fabric_configfs.c b/drivers/target/target_core_fabric_configfs.c index 6cb4828..0dab6d5 100644 --- a/drivers/target/target_core_fabric_configfs.c +++ b/drivers/target/target_core_fabric_configfs.c @@ -123,16 +123,16 @@ static int target_fabric_mappedlun_link( * which be will write protected (READ-ONLY) when * tpg_1/attrib/demo_mode_write_protect=1 */ - spin_lock_irq(>se_lun_nacl->device_list_lock); - deve = lacl->se_lun_nacl->device_list[lacl->mapped_lun]; - if (deve->lun_flags & TRANSPORT_LUNFLAGS_INITIATOR_ACCESS) + rcu_read_lock(); + deve = target_nacl_find_deve(lacl->se_lun_nacl, lacl->mapped_lun); + if (deve && deve->lun_flags & TRANSPORT_LUNFLAGS_INITIATOR_ACCESS) Why do we still need TRANSPORT_LUNFLAGS_INITIATOR_ACCESS? Isn't deve being not-NULL enough? -- Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/12] target: Convert se_node_acl->device_list[] to RCU hlist
On 05/12/2015 02:25 AM, Nicholas A. Bellinger wrote: From: Nicholas Bellinger This patch converts se_node_acl->device_list[] table for mappedluns to modern RCU hlist_head usage in order to support an arbitrary number of node_acl lun mappings. This includes changes to core_[enable,disable]_device_list_for_node() rcu_assign_pointer() and invokes call_rcu() for releasing memory, along with a number of RCU read path conversions in target_core_device.c code. Required for subsequent conversion of transport_lookup_cmd() to lock-less RCU read path. Cc: Hannes Reinecke Cc: Christoph Hellwig Cc: Sagi Grimberg Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_device.c | 218 +++--- drivers/target/target_core_internal.h | 1 + drivers/target/target_core_tpg.c | 23 ++-- include/target/target_core_base.h | 7 +- 4 files changed, 137 insertions(+), 112 deletions(-) diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c index 6e58976..1df14ce 100644 --- a/drivers/target/target_core_device.c +++ b/drivers/target/target_core_device.c @@ -198,12 +198,9 @@ struct se_dev_entry *core_get_se_deve_from_rtpi( struct se_lun *lun; struct se_port *port; struct se_portal_group *tpg = nacl->se_tpg; - u32 i; - - spin_lock_irq(>device_list_lock); - for (i = 0; i < TRANSPORT_MAX_LUNS_PER_TPG; i++) { - deve = nacl->device_list[i]; + rcu_read_lock(); + hlist_for_each_entry_rcu(deve, >lun_entry_hlist, link) { if (!(deve->lun_flags & TRANSPORT_LUNFLAGS_INITIATOR_ACCESS)) continue; @@ -225,11 +222,11 @@ struct se_dev_entry *core_get_se_deve_from_rtpi( continue; atomic_inc_mb(>pr_ref_count); - spin_unlock_irq(>device_list_lock); + rcu_read_unlock(); return deve; } - spin_unlock_irq(>device_list_lock); + rcu_read_unlock(); return NULL; } @@ -240,18 +237,12 @@ int core_free_device_list_for_node( { struct se_dev_entry *deve; struct se_lun *lun; - u32 i; - - if (!nacl->device_list) - return 0; - - spin_lock_irq(>device_list_lock); - for (i = 0; i < TRANSPORT_MAX_LUNS_PER_TPG; i++) { - deve = nacl->device_list[i]; + u32 mapped_lun; + rcu_read_lock(); + hlist_for_each_entry_rcu(deve, >lun_entry_hlist, link) { if (!(deve->lun_flags & TRANSPORT_LUNFLAGS_INITIATOR_ACCESS)) continue; - if (!deve->se_lun) { pr_err("%s device entries device pointer is" " NULL, but Initiator has access.\n", @@ -259,16 +250,14 @@ int core_free_device_list_for_node( continue; } lun = deve->se_lun; + mapped_lun = deve->mapped_lun; + rcu_read_unlock(); - spin_unlock_irq(>device_list_lock); - core_disable_device_list_for_node(lun, NULL, deve->mapped_lun, - TRANSPORT_LUNFLAGS_NO_ACCESS, nacl, tpg); - spin_lock_irq(>device_list_lock); + core_disable_device_list_for_node(lun, NULL, mapped_lun, + TRANSPORT_LUNFLAGS_NO_ACCESS, nacl, tpg); + rcu_read_lock(); } - spin_unlock_irq(>device_list_lock); - - array_free(nacl->device_list, TRANSPORT_MAX_LUNS_PER_TPG); - nacl->device_list = NULL; + rcu_read_unlock(); return 0; } @@ -280,18 +269,44 @@ void core_update_device_list_access( { struct se_dev_entry *deve; - spin_lock_irq(>device_list_lock); - deve = nacl->device_list[mapped_lun]; - if (lun_access & TRANSPORT_LUNFLAGS_READ_WRITE) { - deve->lun_flags &= ~TRANSPORT_LUNFLAGS_READ_ONLY; - deve->lun_flags |= TRANSPORT_LUNFLAGS_READ_WRITE; - } else { - deve->lun_flags &= ~TRANSPORT_LUNFLAGS_READ_WRITE; - deve->lun_flags |= TRANSPORT_LUNFLAGS_READ_ONLY; + spin_lock_irq(>lun_entry_lock); + deve = target_nacl_find_deve(nacl, mapped_lun); + if (deve) { + if (lun_access & TRANSPORT_LUNFLAGS_READ_WRITE) { + deve->lun_flags &= ~TRANSPORT_LUNFLAGS_READ_ONLY; + deve->lun_flags |= TRANSPORT_LUNFLAGS_READ_WRITE; + } else { + deve->lun_flags &= ~TRANSPORT_LUNFLAGS_READ_WRITE; + deve->lun_flags |= TRANSPORT_LUNFLAGS_READ_ONLY; + } } - spin_unlock_irq(>device_list_lock); + spin_unlock_irq(>lun_entry_lock); + + synchronize_rcu(); +} + +static void target_nacl_deve_callrcu(struct rcu_head *head) +{ + struct se_dev_entry *deve = container_of(head, struct
Re: [PATCH 01/12] target: Convert se_node_acl-device_list[] to RCU hlist
On 05/12/2015 02:25 AM, Nicholas A. Bellinger wrote: From: Nicholas Bellinger n...@linux-iscsi.org This patch converts se_node_acl-device_list[] table for mappedluns to modern RCU hlist_head usage in order to support an arbitrary number of node_acl lun mappings. This includes changes to core_[enable,disable]_device_list_for_node() rcu_assign_pointer() and invokes call_rcu() for releasing memory, along with a number of RCU read path conversions in target_core_device.c code. Required for subsequent conversion of transport_lookup_cmd() to lock-less RCU read path. Cc: Hannes Reinecke h...@suse.de Cc: Christoph Hellwig h...@lst.de Cc: Sagi Grimberg sa...@mellanox.com Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org --- drivers/target/target_core_device.c | 218 +++--- drivers/target/target_core_internal.h | 1 + drivers/target/target_core_tpg.c | 23 ++-- include/target/target_core_base.h | 7 +- 4 files changed, 137 insertions(+), 112 deletions(-) diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c index 6e58976..1df14ce 100644 --- a/drivers/target/target_core_device.c +++ b/drivers/target/target_core_device.c @@ -198,12 +198,9 @@ struct se_dev_entry *core_get_se_deve_from_rtpi( struct se_lun *lun; struct se_port *port; struct se_portal_group *tpg = nacl-se_tpg; - u32 i; - - spin_lock_irq(nacl-device_list_lock); - for (i = 0; i TRANSPORT_MAX_LUNS_PER_TPG; i++) { - deve = nacl-device_list[i]; + rcu_read_lock(); + hlist_for_each_entry_rcu(deve, nacl-lun_entry_hlist, link) { if (!(deve-lun_flags TRANSPORT_LUNFLAGS_INITIATOR_ACCESS)) continue; @@ -225,11 +222,11 @@ struct se_dev_entry *core_get_se_deve_from_rtpi( continue; atomic_inc_mb(deve-pr_ref_count); - spin_unlock_irq(nacl-device_list_lock); + rcu_read_unlock(); return deve; } - spin_unlock_irq(nacl-device_list_lock); + rcu_read_unlock(); return NULL; } @@ -240,18 +237,12 @@ int core_free_device_list_for_node( { struct se_dev_entry *deve; struct se_lun *lun; - u32 i; - - if (!nacl-device_list) - return 0; - - spin_lock_irq(nacl-device_list_lock); - for (i = 0; i TRANSPORT_MAX_LUNS_PER_TPG; i++) { - deve = nacl-device_list[i]; + u32 mapped_lun; + rcu_read_lock(); + hlist_for_each_entry_rcu(deve, nacl-lun_entry_hlist, link) { if (!(deve-lun_flags TRANSPORT_LUNFLAGS_INITIATOR_ACCESS)) continue; - if (!deve-se_lun) { pr_err(%s device entries device pointer is NULL, but Initiator has access.\n, @@ -259,16 +250,14 @@ int core_free_device_list_for_node( continue; } lun = deve-se_lun; + mapped_lun = deve-mapped_lun; + rcu_read_unlock(); - spin_unlock_irq(nacl-device_list_lock); - core_disable_device_list_for_node(lun, NULL, deve-mapped_lun, - TRANSPORT_LUNFLAGS_NO_ACCESS, nacl, tpg); - spin_lock_irq(nacl-device_list_lock); + core_disable_device_list_for_node(lun, NULL, mapped_lun, + TRANSPORT_LUNFLAGS_NO_ACCESS, nacl, tpg); + rcu_read_lock(); } - spin_unlock_irq(nacl-device_list_lock); - - array_free(nacl-device_list, TRANSPORT_MAX_LUNS_PER_TPG); - nacl-device_list = NULL; + rcu_read_unlock(); return 0; } @@ -280,18 +269,44 @@ void core_update_device_list_access( { struct se_dev_entry *deve; - spin_lock_irq(nacl-device_list_lock); - deve = nacl-device_list[mapped_lun]; - if (lun_access TRANSPORT_LUNFLAGS_READ_WRITE) { - deve-lun_flags = ~TRANSPORT_LUNFLAGS_READ_ONLY; - deve-lun_flags |= TRANSPORT_LUNFLAGS_READ_WRITE; - } else { - deve-lun_flags = ~TRANSPORT_LUNFLAGS_READ_WRITE; - deve-lun_flags |= TRANSPORT_LUNFLAGS_READ_ONLY; + spin_lock_irq(nacl-lun_entry_lock); + deve = target_nacl_find_deve(nacl, mapped_lun); + if (deve) { + if (lun_access TRANSPORT_LUNFLAGS_READ_WRITE) { + deve-lun_flags = ~TRANSPORT_LUNFLAGS_READ_ONLY; + deve-lun_flags |= TRANSPORT_LUNFLAGS_READ_WRITE; + } else { + deve-lun_flags = ~TRANSPORT_LUNFLAGS_READ_WRITE; + deve-lun_flags |= TRANSPORT_LUNFLAGS_READ_ONLY; + } } - spin_unlock_irq(nacl-device_list_lock); + spin_unlock_irq(nacl-lun_entry_lock); + + synchronize_rcu(); +} + +static void
Re: [PATCH 03/12] target/configfs: Convert mappedlun + SCSI MIBs to RCU reader
On 05/12/2015 02:25 AM, Nicholas A. Bellinger wrote: From: Nicholas Bellinger n...@linux-iscsi.org This patch converts fabric independent configfs link/unlink to use RCU read path macros for se_node_acl-lun_entry_hlist access. It also converts SCSI MIB configfs show attribute code to use RCU read path macros for se_node_acl-lun_entry_hlist access. Cc: Hannes Reinecke h...@suse.de Cc: Christoph Hellwig h...@lst.de Cc: Sagi Grimberg sa...@mellanox.com Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org --- drivers/target/target_core_fabric_configfs.c | 35 +++--- drivers/target/target_core_stat.c| 180 +-- 2 files changed, 110 insertions(+), 105 deletions(-) diff --git a/drivers/target/target_core_fabric_configfs.c b/drivers/target/target_core_fabric_configfs.c index 6cb4828..0dab6d5 100644 --- a/drivers/target/target_core_fabric_configfs.c +++ b/drivers/target/target_core_fabric_configfs.c @@ -123,16 +123,16 @@ static int target_fabric_mappedlun_link( * which be will write protected (READ-ONLY) when * tpg_1/attrib/demo_mode_write_protect=1 */ - spin_lock_irq(lacl-se_lun_nacl-device_list_lock); - deve = lacl-se_lun_nacl-device_list[lacl-mapped_lun]; - if (deve-lun_flags TRANSPORT_LUNFLAGS_INITIATOR_ACCESS) + rcu_read_lock(); + deve = target_nacl_find_deve(lacl-se_lun_nacl, lacl-mapped_lun); + if (deve deve-lun_flags TRANSPORT_LUNFLAGS_INITIATOR_ACCESS) Why do we still need TRANSPORT_LUNFLAGS_INITIATOR_ACCESS? Isn't deve being not-NULL enough? -- Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH/RFC] configfs: Update docs to indicate sysfs & configfs usage is ok
The docs state that sysfs and configfs are peers that meet different needs, but doesn't explicitly make it clear that if a subsystem needs functionality that spans both, it is preferable to use both instead of trying to cram everything into one or the other. Signed-off-by: Andy Grover --- Documentation/filesystems/configfs/configfs.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/filesystems/configfs/configfs.txt b/Documentation/filesystems/configfs/configfs.txt index b40fec9..a94c9c3 100644 --- a/Documentation/filesystems/configfs/configfs.txt +++ b/Documentation/filesystems/configfs/configfs.txt @@ -33,7 +33,8 @@ lifetime of the representation is completely driven by userspace. The kernel modules backing the items must respond to this. Both sysfs and configfs can and should exist together on the same -system. One is not a replacement for the other. +system. One is not a replacement for the other. They are +complementary, and a client module may have need to support both. [Using configfs] -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH/RFC] configfs: Update docs to indicate sysfs configfs usage is ok
The docs state that sysfs and configfs are peers that meet different needs, but doesn't explicitly make it clear that if a subsystem needs functionality that spans both, it is preferable to use both instead of trying to cram everything into one or the other. Signed-off-by: Andy Grover agro...@redhat.com --- Documentation/filesystems/configfs/configfs.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/filesystems/configfs/configfs.txt b/Documentation/filesystems/configfs/configfs.txt index b40fec9..a94c9c3 100644 --- a/Documentation/filesystems/configfs/configfs.txt +++ b/Documentation/filesystems/configfs/configfs.txt @@ -33,7 +33,8 @@ lifetime of the representation is completely driven by userspace. The kernel modules backing the items must respond to this. Both sysfs and configfs can and should exist together on the same -system. One is not a replacement for the other. +system. One is not a replacement for the other. They are +complementary, and a client module may have need to support both. [Using configfs] -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] uapi/linux/target_core_user.h: fix headers_install.sh badness
On 12/18/2014 09:57 AM, Kyle McMartin wrote: scripts/headers_install.sh will transform __packed to __attribute__((packed)), so the #ifndef is not necessary. (and, in fact, it's problematic, because we'll end up with the header containing: #ifndef __attribute__((packed)) #define __attribu... and so forth.) Cc: sta...@vger.kernel.org # 3.18 Signed-off-by: Kyle McMartin --- cc-ing stable@ so this headers fix gets picked up by distros. Acked-by: Andy Grover -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] uapi/linux/target_core_user.h: fix headers_install.sh badness
On 12/18/2014 09:57 AM, Kyle McMartin wrote: scripts/headers_install.sh will transform __packed to __attribute__((packed)), so the #ifndef is not necessary. (and, in fact, it's problematic, because we'll end up with the header containing: #ifndef __attribute__((packed)) #define __attribu... and so forth.) Cc: sta...@vger.kernel.org # 3.18 Signed-off-by: Kyle McMartin k...@redhat.com --- cc-ing stable@ so this headers fix gets picked up by distros. Acked-by: Andy Grover agro...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 10/25] virtio: add API to enable VQs early
On 10/13/2014 12:50 AM, Michael S. Tsirkin wrote: virtio spec 0.9.X requires DRIVER_OK to be set before VQs are used, but some drivers use VQs before probe function returns. Since DRIVER_OK is set after probe, this violates the spec. Even though under virtio 1.0 transitional devices support this behaviour, we want to make it possible for those early callers to become spec compliant and eventually support non-transitional devices. Add API for drivers to call before using VQs. Sets DRIVER_OK internally. Signed-off-by: Michael S. Tsirkin Reviewed-by: Cornelia Huck --- include/linux/virtio_config.h | 17 + 1 file changed, 17 insertions(+) diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index e8f8f71..e36403b 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -109,6 +109,23 @@ struct virtqueue *virtio_find_single_vq(struct virtio_device *vdev, return vq; } +/** + * virtio_device_ready - enable vq use in probe function + * @vdev: the device + * + * Driver must call this to use vqs in the probe function. + * + * Note: vqs are enabled automatically after probe returns. + */ +static inline +void virtio_device_ready(struct virtio_device *dev) +{ + unsigned status = dev->config->get_status(dev); + + BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK); + dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK); +} Getting a BUG when booting via KVM, host Fedora 20, guest Fedora 20. my config is at: https://fedorapeople.org/~grover/config-20141110 [0.828494] [ cut here ] [0.829039] kernel BUG at /home/agrover/git/kernel/include/linux/virtio_config.h:125! [0.831266] invalid opcode: [#1] SMP DEBUG_PAGEALLOC [0.831266] Modules linked in: [0.831266] CPU: 1 PID: 30 Comm: kworker/1:1 Not tainted 3.18.0-rc4 #120 [0.831266] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [0.831266] Workqueue: events control_work_handler [0.831266] task: 88003cd98000 ti: 88003cd94000 task.ti: 88003cd94000 [0.831266] RIP: 0010:[] [] add_port+0x264/0x410 [0.831266] RSP: :88003cd97c78 EFLAGS: 00010202 [0.831266] RAX: 0007 RBX: 88003c58c400 RCX: 0001 [0.831266] RDX: c132 RSI: 81a955e9 RDI: 0001c132 [0.831266] RBP: 88003cd97cc8 R08: R09: [0.831266] R10: 0001 R11: R12: 88003c58be00 [0.831266] R13: 0001 R14: 8800395ca800 R15: 88003c58c420 [0.831266] FS: () GS:88003fa0() knlGS: [0.831266] CS: 0010 DS: ES: CR0: 8005003b [0.831266] CR2: CR3: 01c11000 CR4: 06e0 [0.831266] DR0: DR1: DR2: [0.831266] DR3: DR6: fffe0ff0 DR7: 0400 [0.831266] Stack: [0.831266] 8801 0292 0001 [0.831266] 88003cd97cc8 88003dfa8a20 88003c58beb8 88003c58be10 [0.831266] 8800395a2000 88003cd97d38 8144531a [0.831266] Call Trace: [0.831266] [] control_work_handler+0x16a/0x3c0 [0.831266] [] ? process_one_work+0x208/0x500 [0.831266] [] process_one_work+0x2ac/0x500 [0.831266] [] ? process_one_work+0x208/0x500 [0.831266] [] worker_thread+0x2ce/0x4e0 [0.831266] [] ? process_one_work+0x500/0x500 [0.831266] [] kthread+0xf8/0x100 [0.831266] [] ? trace_hardirqs_on+0xd/0x10 [0.831266] [] ? kthread_stop+0x140/0x140 [0.831266] [] ret_from_fork+0x7c/0xb0 [0.831266] [] ? kthread_stop+0x140/0x140 [0.831266] Code: c7 c2 48 31 01 83 48 c7 c6 e9 55 a9 81 e8 55 b4 c6 ff 4d 8b b4 24 58 01 00 00 49 8b 86 e8 04 00 00 4c 89 f7 ff 50 10 a8 04 74 0c <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 49 8b 96 e8 04 00 00 83 c8 [0.831266] RIP [] add_port+0x264/0x410 [0.831266] RSP [0.878202] ---[ end trace f98fbb172cc7bbf4 ]--- Thanks -- Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 10/25] virtio: add API to enable VQs early
On 10/13/2014 12:50 AM, Michael S. Tsirkin wrote: virtio spec 0.9.X requires DRIVER_OK to be set before VQs are used, but some drivers use VQs before probe function returns. Since DRIVER_OK is set after probe, this violates the spec. Even though under virtio 1.0 transitional devices support this behaviour, we want to make it possible for those early callers to become spec compliant and eventually support non-transitional devices. Add API for drivers to call before using VQs. Sets DRIVER_OK internally. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com --- include/linux/virtio_config.h | 17 + 1 file changed, 17 insertions(+) diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index e8f8f71..e36403b 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -109,6 +109,23 @@ struct virtqueue *virtio_find_single_vq(struct virtio_device *vdev, return vq; } +/** + * virtio_device_ready - enable vq use in probe function + * @vdev: the device + * + * Driver must call this to use vqs in the probe function. + * + * Note: vqs are enabled automatically after probe returns. + */ +static inline +void virtio_device_ready(struct virtio_device *dev) +{ + unsigned status = dev-config-get_status(dev); + + BUG_ON(status VIRTIO_CONFIG_S_DRIVER_OK); + dev-config-set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK); +} Getting a BUG when booting via KVM, host Fedora 20, guest Fedora 20. my config is at: https://fedorapeople.org/~grover/config-20141110 [0.828494] [ cut here ] [0.829039] kernel BUG at /home/agrover/git/kernel/include/linux/virtio_config.h:125! [0.831266] invalid opcode: [#1] SMP DEBUG_PAGEALLOC [0.831266] Modules linked in: [0.831266] CPU: 1 PID: 30 Comm: kworker/1:1 Not tainted 3.18.0-rc4 #120 [0.831266] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [0.831266] Workqueue: events control_work_handler [0.831266] task: 88003cd98000 ti: 88003cd94000 task.ti: 88003cd94000 [0.831266] RIP: 0010:[81445004] [81445004] add_port+0x264/0x410 [0.831266] RSP: :88003cd97c78 EFLAGS: 00010202 [0.831266] RAX: 0007 RBX: 88003c58c400 RCX: 0001 [0.831266] RDX: c132 RSI: 81a955e9 RDI: 0001c132 [0.831266] RBP: 88003cd97cc8 R08: R09: [0.831266] R10: 0001 R11: R12: 88003c58be00 [0.831266] R13: 0001 R14: 8800395ca800 R15: 88003c58c420 [0.831266] FS: () GS:88003fa0() knlGS: [0.831266] CS: 0010 DS: ES: CR0: 8005003b [0.831266] CR2: CR3: 01c11000 CR4: 06e0 [0.831266] DR0: DR1: DR2: [0.831266] DR3: DR6: fffe0ff0 DR7: 0400 [0.831266] Stack: [0.831266] 8801 0292 0001 [0.831266] 88003cd97cc8 88003dfa8a20 88003c58beb8 88003c58be10 [0.831266] 8800395a2000 88003cd97d38 8144531a [0.831266] Call Trace: [0.831266] [8144531a] control_work_handler+0x16a/0x3c0 [0.831266] [8108b0c8] ? process_one_work+0x208/0x500 [0.831266] [8108b16c] process_one_work+0x2ac/0x500 [0.831266] [8108b0c8] ? process_one_work+0x208/0x500 [0.831266] [8108b68e] worker_thread+0x2ce/0x4e0 [0.831266] [8108b3c0] ? process_one_work+0x500/0x500 [0.831266] [81090b28] kthread+0xf8/0x100 [0.831266] [810bad7d] ? trace_hardirqs_on+0xd/0x10 [0.831266] [81090a30] ? kthread_stop+0x140/0x140 [0.831266] [816ea92c] ret_from_fork+0x7c/0xb0 [0.831266] [81090a30] ? kthread_stop+0x140/0x140 [0.831266] Code: c7 c2 48 31 01 83 48 c7 c6 e9 55 a9 81 e8 55 b4 c6 ff 4d 8b b4 24 58 01 00 00 49 8b 86 e8 04 00 00 4c 89 f7 ff 50 10 a8 04 74 0c 0f 0b 66 2e 0f 1f 84 00 00 00 00 00 49 8b 96 e8 04 00 00 83 c8 [0.831266] RIP [81445004] add_port+0x264/0x410 [0.831266] RSP 88003cd97c78 [0.878202] ---[ end trace f98fbb172cc7bbf4 ]--- Thanks -- Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 1/4] target: Remove unneeded check in sbc_parse_cdb
The check of SCF_SCSI_DATA_CDB seems to be a remnant from before hch's refactoring of this function. There are no places where that flag is set that cmd->execute_cmd isn't also set. Signed-off-by: Andy Grover --- drivers/target/target_core_sbc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c index bd78d92..ebe62af 100644 --- a/drivers/target/target_core_sbc.c +++ b/drivers/target/target_core_sbc.c @@ -948,7 +948,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) } /* reject any command that we don't have a handler for */ - if (!(cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) && !cmd->execute_cmd) + if (!cmd->execute_cmd) return TCM_UNSUPPORTED_SCSI_OPCODE; if (cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) { -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 0/4] Userspace Passthrough backstore for LIO
Hi Nick and all, Here's version 2, against Nick's target-pending/for-next. Note that since that tree is still on 3.17-rc5, in testing I ran into an issue fixed by: genhd: fix leftover might_sleep() in blk_free_devt() which first appears in 3.17-rc7. Incorporates feedback from Nick. Doc fixes * add table of contents * add example code for the user-side * document specific opcodes that IO mode passes through Code fixes * add back support for PASS_ALL and PASS_IO pass_level option. * Removed retry-in-kernel mechanism * fix issues from kbuild test robot * remove kref from tcmu_dev and call kfree(udev) from tcmu_free_device instead * set hw_block_size to 512 instead of 4096 * Don't copy DMA_FROM_DEVICE data if there was a CHECK_CONDITION * use goto-style error handling in tcmu_configure_device * Check retval from genlmsg_multicast and ignore -ESRCH * properly handle if device is unconfigured in tcmu_free_device Thanks -- Andy Andy Grover (4): target: Remove unneeded check in sbc_parse_cdb uio: Export definition of struct uio_device target: Add documentation on the target userspace pass-through driver target: Add a user-passthrough backstore Documentation/target/tcmu-design.txt | 378 +++ drivers/target/Kconfig |5 + drivers/target/Makefile|1 + drivers/target/target_core_sbc.c |2 +- drivers/target/target_core_transport.c |4 + drivers/target/target_core_user.c | 1163 drivers/uio/uio.c | 12 - include/linux/uio_driver.h | 12 +- include/uapi/linux/Kbuild |1 + include/uapi/linux/target_core_user.h | 142 10 files changed, 1706 insertions(+), 14 deletions(-) create mode 100644 Documentation/target/tcmu-design.txt create mode 100644 drivers/target/target_core_user.c create mode 100644 include/uapi/linux/target_core_user.h -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 4/4] target: Add a user-passthrough backstore
Add a LIO storage engine that presents commands to userspace for execution. This would allow more complex backstores to be implemented out-of-kernel, and also make experimentation a-la FUSE (but at the SCSI level -- "SUSE"?) possible. It uses a mmap()able UIO device per LUN to share a command ring and data area. The commands are raw SCSI CDBs and iovs for in/out data. The command ring is also reused for returning scsi command status and optional sense data. This implementation is based on Shaohua Li's earlier version but heavily modified. Differences include: * Shared memory allocated by kernel, not locked-down user pages * Single ring for command request and response * Offsets instead of embedded pointers * Generic SCSI CDB passthrough instead of per-cmd specialization in ring format. * Uses UIO device instead of anon_file passed in mailbox. * Optional in-kernel handling of some commands. The main reason for these differences is to permit greater resiliency if the user process dies or hangs. Things not yet implemented (on purpose): * Zero copy. The data area is flexible enough to allow page flipping or backend-allocated pages to be used by fabrics, but it's not clear these are performance wins. Can come later. * Out-of-order command completion by userspace. Possible to add by just allowing userspace to change cmd_id in rsp cmd entries, but currently not supported. * No locks between kernel cmd submission and completion routines. Sounds like it's possible, but this can come later. * Sparse allocation of mmaped area. Current code vmallocs the whole thing. If the mapped area was larger and not fully mapped then the driver would have more freedom to change cmd and data area sizes based on demand. Current code open issues: * The use of idrs may be overkill -- we maybe can replace them with a simple counter to generate cmd_ids, and a hash table to get a cmd_id's associated pointer. * Use of a free-running counter for cmd ring instead of explicit modulo math. This would require power-of-2 cmd ring size. Signed-off-by: Andy Grover --- drivers/target/Kconfig |5 + drivers/target/Makefile|1 + drivers/target/target_core_transport.c |4 + drivers/target/target_core_user.c | 1163 include/uapi/linux/Kbuild |1 + include/uapi/linux/target_core_user.h | 142 6 files changed, 1316 insertions(+) create mode 100644 drivers/target/target_core_user.c create mode 100644 include/uapi/linux/target_core_user.h diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig index dc2d84a..b03a845 100644 --- a/drivers/target/Kconfig +++ b/drivers/target/Kconfig @@ -31,6 +31,11 @@ config TCM_PSCSI Say Y here to enable the TCM/pSCSI subsystem plugin for non-buffered passthrough access to Linux/SCSI device +config TCM_USER + tristate "TCM/USER Subsystem Plugin for Linux" + help + Say Y here to enable the TCM/USER subsystem plugin + source "drivers/target/loopback/Kconfig" source "drivers/target/tcm_fc/Kconfig" source "drivers/target/iscsi/Kconfig" diff --git a/drivers/target/Makefile b/drivers/target/Makefile index 85b012d..bbb4a7d 100644 --- a/drivers/target/Makefile +++ b/drivers/target/Makefile @@ -22,6 +22,7 @@ obj-$(CONFIG_TARGET_CORE) += target_core_mod.o obj-$(CONFIG_TCM_IBLOCK) += target_core_iblock.o obj-$(CONFIG_TCM_FILEIO) += target_core_file.o obj-$(CONFIG_TCM_PSCSI)+= target_core_pscsi.o +obj-$(CONFIG_TCM_USER) += target_core_user.o # Fabric modules obj-$(CONFIG_LOOPBACK_TARGET) += loopback/ diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 9700ea1..9ea0d5f 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -232,6 +232,10 @@ void transport_subsystem_check_init(void) if (ret != 0) pr_err("Unable to load target_core_pscsi\n"); + ret = request_module("target_core_user"); + if (ret != 0) + pr_err("Unable to load target_core_user\n"); + sub_api_initialized = 1; } diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c new file mode 100644 index 000..6608ecf --- /dev/null +++ b/drivers/target/target_core_user.c @@ -0,0 +1,1163 @@ +/* + * Copyright (C) 2013 Shaohua Li + * Copyright (C) 2014 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
[PATCHv2 3/4] target: Add documentation on the target userspace pass-through driver
Describes the driver and its interface to make it possible for user programs to back a LIO-exported LUN. Thanks to Richard W. M. Jones for review, and supplementing this doc with the first two paragraphs. Signed-off-by: Andy Grover --- Documentation/target/tcmu-design.txt | 378 +++ 1 file changed, 378 insertions(+) create mode 100644 Documentation/target/tcmu-design.txt diff --git a/Documentation/target/tcmu-design.txt b/Documentation/target/tcmu-design.txt new file mode 100644 index 000..5518465 --- /dev/null +++ b/Documentation/target/tcmu-design.txt @@ -0,0 +1,378 @@ +Contents: + +1) TCM Userspace Design + a) Background + b) Benefits + c) Design constraints + d) Implementation overview + i. Mailbox + ii. Command ring + iii. Data Area + e) Device discovery + f) Device events + g) Other contingencies +2) Writing a user pass-through handler + a) Discovering and configuring TCMU uio devices + b) Waiting for events on the device(s) + c) Managing the command ring +3) Command filtering and pass_level +4) A final note + + +TCM Userspace Design + + +TCM is another name for LIO, an in-kernel iSCSI target (server). +Existing TCM targets run in the kernel. TCMU (TCM in Userspace) +allows userspace programs to be written which act as iSCSI targets. +This document describes the design. + +The existing kernel provides modules for different SCSI transport +protocols. TCM also modularizes the data storage. There are existing +modules for file, block device, RAM or using another SCSI device as +storage. These are called "backstores" or "storage engines". These +built-in modules are implemented entirely as kernel code. + +Background: + +In addition to modularizing the transport protocol used for carrying +SCSI commands ("fabrics"), the Linux kernel target, LIO, also modularizes +the actual data storage as well. These are referred to as "backstores" +or "storage engines". The target comes with backstores that allow a +file, a block device, RAM, or another SCSI device to be used for the +local storage needed for the exported SCSI LUN. Like the rest of LIO, +these are implemented entirely as kernel code. + +These backstores cover the most common use cases, but not all. One new +use case that other non-kernel target solutions, such as tgt, are able +to support is using Gluster's GLFS or Ceph's RBD as a backstore. The +target then serves as a translator, allowing initiators to store data +in these non-traditional networked storage systems, while still only +using standard protocols themselves. + +If the target is a userspace process, supporting these is easy. tgt, +for example, needs only a small adapter module for each, because the +modules just use the available userspace libraries for RBD and GLFS. + +Adding support for these backstores in LIO is considerably more +difficult, because LIO is entirely kernel code. Instead of undertaking +the significant work to port the GLFS or RBD APIs and protocols to the +kernel, another approach is to create a userspace pass-through +backstore for LIO, "TCMU". + + +Benefits: + +In addition to allowing relatively easy support for RBD and GLFS, TCMU +will also allow easier development of new backstores. TCMU combines +with the LIO loopback fabric to become something similar to FUSE +(Filesystem in Userspace), but at the SCSI layer instead of the +filesystem layer. A SUSE, if you will. + +The disadvantage is there are more distinct components to configure, and +potentially to malfunction. This is unavoidable, but hopefully not +fatal if we're careful to keep things as simple as possible. + +Design constraints: + +- Good performance: high throughput, low latency +- Cleanly handle if userspace: + 1) never attaches + 2) hangs + 3) dies + 4) misbehaves +- Allow future flexibility in user & kernel implementations +- Be reasonably memory-efficient +- Simple to configure & run +- Simple to write a userspace backend + + +Implementation overview: + +The core of the TCMU interface is a memory region that is shared +between kernel and userspace. Within this region is: a control area +(mailbox); a lockless producer/consumer circular buffer for commands +to be passed up, and status returned; and an in/out data buffer area. + +TCMU uses the pre-existing UIO subsystem. UIO allows device driver +development in userspace, and this is conceptually very close to the +TCMU use case, except instead of a physical device, TCMU implements a +memory-mapped layout designed for SCSI commands. Using UIO also +benefits TCMU by handling device introspection (e.g. a way for +userspace to determine how large the shared region is) and signaling +mechanisms in both directions. + +There are no embedded pointers in the memory region. Everything is +expressed as an offset from the region's starting address. This allows +the ring to still work if the user process
[PATCHv2 2/4] uio: Export definition of struct uio_device
In order to prevent a O(n) search of the filesystem to link up its uio node with its target configuration, TCMU needs to know the minor number that UIO assigned. Expose the definition of this struct so TCMU can access this field. Signed-off-by: Andy Grover --- drivers/uio/uio.c | 12 include/linux/uio_driver.h | 12 +++- 2 files changed, 11 insertions(+), 13 deletions(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index a673e5b..60fa627 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -28,18 +28,6 @@ #define UIO_MAX_DEVICES(1U << MINORBITS) -struct uio_device { - struct module *owner; - struct device *dev; - int minor; - atomic_tevent; - struct fasync_struct*async_queue; - wait_queue_head_t wait; - struct uio_info *info; - struct kobject *map_dir; - struct kobject *portio_dir; -}; - static int uio_major; static struct cdev *uio_cdev; static DEFINE_IDR(uio_idr); diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h index 1ad4724..baa8171 100644 --- a/include/linux/uio_driver.h +++ b/include/linux/uio_driver.h @@ -63,7 +63,17 @@ struct uio_port { #define MAX_UIO_PORT_REGIONS 5 -struct uio_device; +struct uio_device { +struct module *owner; +struct device *dev; +int minor; +atomic_tevent; +struct fasync_struct*async_queue; +wait_queue_head_t wait; +struct uio_info *info; +struct kobject *map_dir; +struct kobject *portio_dir; +}; /** * struct uio_info - UIO device capabilities -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 2/4] uio: Export definition of struct uio_device
In order to prevent a O(n) search of the filesystem to link up its uio node with its target configuration, TCMU needs to know the minor number that UIO assigned. Expose the definition of this struct so TCMU can access this field. Signed-off-by: Andy Grover agro...@redhat.com --- drivers/uio/uio.c | 12 include/linux/uio_driver.h | 12 +++- 2 files changed, 11 insertions(+), 13 deletions(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index a673e5b..60fa627 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -28,18 +28,6 @@ #define UIO_MAX_DEVICES(1U MINORBITS) -struct uio_device { - struct module *owner; - struct device *dev; - int minor; - atomic_tevent; - struct fasync_struct*async_queue; - wait_queue_head_t wait; - struct uio_info *info; - struct kobject *map_dir; - struct kobject *portio_dir; -}; - static int uio_major; static struct cdev *uio_cdev; static DEFINE_IDR(uio_idr); diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h index 1ad4724..baa8171 100644 --- a/include/linux/uio_driver.h +++ b/include/linux/uio_driver.h @@ -63,7 +63,17 @@ struct uio_port { #define MAX_UIO_PORT_REGIONS 5 -struct uio_device; +struct uio_device { +struct module *owner; +struct device *dev; +int minor; +atomic_tevent; +struct fasync_struct*async_queue; +wait_queue_head_t wait; +struct uio_info *info; +struct kobject *map_dir; +struct kobject *portio_dir; +}; /** * struct uio_info - UIO device capabilities -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 3/4] target: Add documentation on the target userspace pass-through driver
Describes the driver and its interface to make it possible for user programs to back a LIO-exported LUN. Thanks to Richard W. M. Jones for review, and supplementing this doc with the first two paragraphs. Signed-off-by: Andy Grover agro...@redhat.com --- Documentation/target/tcmu-design.txt | 378 +++ 1 file changed, 378 insertions(+) create mode 100644 Documentation/target/tcmu-design.txt diff --git a/Documentation/target/tcmu-design.txt b/Documentation/target/tcmu-design.txt new file mode 100644 index 000..5518465 --- /dev/null +++ b/Documentation/target/tcmu-design.txt @@ -0,0 +1,378 @@ +Contents: + +1) TCM Userspace Design + a) Background + b) Benefits + c) Design constraints + d) Implementation overview + i. Mailbox + ii. Command ring + iii. Data Area + e) Device discovery + f) Device events + g) Other contingencies +2) Writing a user pass-through handler + a) Discovering and configuring TCMU uio devices + b) Waiting for events on the device(s) + c) Managing the command ring +3) Command filtering and pass_level +4) A final note + + +TCM Userspace Design + + +TCM is another name for LIO, an in-kernel iSCSI target (server). +Existing TCM targets run in the kernel. TCMU (TCM in Userspace) +allows userspace programs to be written which act as iSCSI targets. +This document describes the design. + +The existing kernel provides modules for different SCSI transport +protocols. TCM also modularizes the data storage. There are existing +modules for file, block device, RAM or using another SCSI device as +storage. These are called backstores or storage engines. These +built-in modules are implemented entirely as kernel code. + +Background: + +In addition to modularizing the transport protocol used for carrying +SCSI commands (fabrics), the Linux kernel target, LIO, also modularizes +the actual data storage as well. These are referred to as backstores +or storage engines. The target comes with backstores that allow a +file, a block device, RAM, or another SCSI device to be used for the +local storage needed for the exported SCSI LUN. Like the rest of LIO, +these are implemented entirely as kernel code. + +These backstores cover the most common use cases, but not all. One new +use case that other non-kernel target solutions, such as tgt, are able +to support is using Gluster's GLFS or Ceph's RBD as a backstore. The +target then serves as a translator, allowing initiators to store data +in these non-traditional networked storage systems, while still only +using standard protocols themselves. + +If the target is a userspace process, supporting these is easy. tgt, +for example, needs only a small adapter module for each, because the +modules just use the available userspace libraries for RBD and GLFS. + +Adding support for these backstores in LIO is considerably more +difficult, because LIO is entirely kernel code. Instead of undertaking +the significant work to port the GLFS or RBD APIs and protocols to the +kernel, another approach is to create a userspace pass-through +backstore for LIO, TCMU. + + +Benefits: + +In addition to allowing relatively easy support for RBD and GLFS, TCMU +will also allow easier development of new backstores. TCMU combines +with the LIO loopback fabric to become something similar to FUSE +(Filesystem in Userspace), but at the SCSI layer instead of the +filesystem layer. A SUSE, if you will. + +The disadvantage is there are more distinct components to configure, and +potentially to malfunction. This is unavoidable, but hopefully not +fatal if we're careful to keep things as simple as possible. + +Design constraints: + +- Good performance: high throughput, low latency +- Cleanly handle if userspace: + 1) never attaches + 2) hangs + 3) dies + 4) misbehaves +- Allow future flexibility in user kernel implementations +- Be reasonably memory-efficient +- Simple to configure run +- Simple to write a userspace backend + + +Implementation overview: + +The core of the TCMU interface is a memory region that is shared +between kernel and userspace. Within this region is: a control area +(mailbox); a lockless producer/consumer circular buffer for commands +to be passed up, and status returned; and an in/out data buffer area. + +TCMU uses the pre-existing UIO subsystem. UIO allows device driver +development in userspace, and this is conceptually very close to the +TCMU use case, except instead of a physical device, TCMU implements a +memory-mapped layout designed for SCSI commands. Using UIO also +benefits TCMU by handling device introspection (e.g. a way for +userspace to determine how large the shared region is) and signaling +mechanisms in both directions. + +There are no embedded pointers in the memory region. Everything is +expressed as an offset from the region's starting address. This allows +the ring to still work if the user process dies and is restarted with +the region mapped at a different
[PATCHv2 0/4] Userspace Passthrough backstore for LIO
Hi Nick and all, Here's version 2, against Nick's target-pending/for-next. Note that since that tree is still on 3.17-rc5, in testing I ran into an issue fixed by: genhd: fix leftover might_sleep() in blk_free_devt() which first appears in 3.17-rc7. Incorporates feedback from Nick. Doc fixes * add table of contents * add example code for the user-side * document specific opcodes that IO mode passes through Code fixes * add back support for PASS_ALL and PASS_IO pass_level option. * Removed retry-in-kernel mechanism * fix issues from kbuild test robot * remove kref from tcmu_dev and call kfree(udev) from tcmu_free_device instead * set hw_block_size to 512 instead of 4096 * Don't copy DMA_FROM_DEVICE data if there was a CHECK_CONDITION * use goto-style error handling in tcmu_configure_device * Check retval from genlmsg_multicast and ignore -ESRCH * properly handle if device is unconfigured in tcmu_free_device Thanks -- Andy Andy Grover (4): target: Remove unneeded check in sbc_parse_cdb uio: Export definition of struct uio_device target: Add documentation on the target userspace pass-through driver target: Add a user-passthrough backstore Documentation/target/tcmu-design.txt | 378 +++ drivers/target/Kconfig |5 + drivers/target/Makefile|1 + drivers/target/target_core_sbc.c |2 +- drivers/target/target_core_transport.c |4 + drivers/target/target_core_user.c | 1163 drivers/uio/uio.c | 12 - include/linux/uio_driver.h | 12 +- include/uapi/linux/Kbuild |1 + include/uapi/linux/target_core_user.h | 142 10 files changed, 1706 insertions(+), 14 deletions(-) create mode 100644 Documentation/target/tcmu-design.txt create mode 100644 drivers/target/target_core_user.c create mode 100644 include/uapi/linux/target_core_user.h -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 4/4] target: Add a user-passthrough backstore
Add a LIO storage engine that presents commands to userspace for execution. This would allow more complex backstores to be implemented out-of-kernel, and also make experimentation a-la FUSE (but at the SCSI level -- SUSE?) possible. It uses a mmap()able UIO device per LUN to share a command ring and data area. The commands are raw SCSI CDBs and iovs for in/out data. The command ring is also reused for returning scsi command status and optional sense data. This implementation is based on Shaohua Li's earlier version but heavily modified. Differences include: * Shared memory allocated by kernel, not locked-down user pages * Single ring for command request and response * Offsets instead of embedded pointers * Generic SCSI CDB passthrough instead of per-cmd specialization in ring format. * Uses UIO device instead of anon_file passed in mailbox. * Optional in-kernel handling of some commands. The main reason for these differences is to permit greater resiliency if the user process dies or hangs. Things not yet implemented (on purpose): * Zero copy. The data area is flexible enough to allow page flipping or backend-allocated pages to be used by fabrics, but it's not clear these are performance wins. Can come later. * Out-of-order command completion by userspace. Possible to add by just allowing userspace to change cmd_id in rsp cmd entries, but currently not supported. * No locks between kernel cmd submission and completion routines. Sounds like it's possible, but this can come later. * Sparse allocation of mmaped area. Current code vmallocs the whole thing. If the mapped area was larger and not fully mapped then the driver would have more freedom to change cmd and data area sizes based on demand. Current code open issues: * The use of idrs may be overkill -- we maybe can replace them with a simple counter to generate cmd_ids, and a hash table to get a cmd_id's associated pointer. * Use of a free-running counter for cmd ring instead of explicit modulo math. This would require power-of-2 cmd ring size. Signed-off-by: Andy Grover agro...@redhat.com --- drivers/target/Kconfig |5 + drivers/target/Makefile|1 + drivers/target/target_core_transport.c |4 + drivers/target/target_core_user.c | 1163 include/uapi/linux/Kbuild |1 + include/uapi/linux/target_core_user.h | 142 6 files changed, 1316 insertions(+) create mode 100644 drivers/target/target_core_user.c create mode 100644 include/uapi/linux/target_core_user.h diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig index dc2d84a..b03a845 100644 --- a/drivers/target/Kconfig +++ b/drivers/target/Kconfig @@ -31,6 +31,11 @@ config TCM_PSCSI Say Y here to enable the TCM/pSCSI subsystem plugin for non-buffered passthrough access to Linux/SCSI device +config TCM_USER + tristate TCM/USER Subsystem Plugin for Linux + help + Say Y here to enable the TCM/USER subsystem plugin + source drivers/target/loopback/Kconfig source drivers/target/tcm_fc/Kconfig source drivers/target/iscsi/Kconfig diff --git a/drivers/target/Makefile b/drivers/target/Makefile index 85b012d..bbb4a7d 100644 --- a/drivers/target/Makefile +++ b/drivers/target/Makefile @@ -22,6 +22,7 @@ obj-$(CONFIG_TARGET_CORE) += target_core_mod.o obj-$(CONFIG_TCM_IBLOCK) += target_core_iblock.o obj-$(CONFIG_TCM_FILEIO) += target_core_file.o obj-$(CONFIG_TCM_PSCSI)+= target_core_pscsi.o +obj-$(CONFIG_TCM_USER) += target_core_user.o # Fabric modules obj-$(CONFIG_LOOPBACK_TARGET) += loopback/ diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 9700ea1..9ea0d5f 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -232,6 +232,10 @@ void transport_subsystem_check_init(void) if (ret != 0) pr_err(Unable to load target_core_pscsi\n); + ret = request_module(target_core_user); + if (ret != 0) + pr_err(Unable to load target_core_user\n); + sub_api_initialized = 1; } diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c new file mode 100644 index 000..6608ecf --- /dev/null +++ b/drivers/target/target_core_user.c @@ -0,0 +1,1163 @@ +/* + * Copyright (C) 2013 Shaohua Li s...@kernel.org + * Copyright (C) 2014 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy
[PATCHv2 1/4] target: Remove unneeded check in sbc_parse_cdb
The check of SCF_SCSI_DATA_CDB seems to be a remnant from before hch's refactoring of this function. There are no places where that flag is set that cmd-execute_cmd isn't also set. Signed-off-by: Andy Grover agro...@redhat.com --- drivers/target/target_core_sbc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c index bd78d92..ebe62af 100644 --- a/drivers/target/target_core_sbc.c +++ b/drivers/target/target_core_sbc.c @@ -948,7 +948,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) } /* reject any command that we don't have a handler for */ - if (!(cmd-se_cmd_flags SCF_SCSI_DATA_CDB) !cmd-execute_cmd) + if (!cmd-execute_cmd) return TCM_UNSUPPORTED_SCSI_OPCODE; if (cmd-se_cmd_flags SCF_SCSI_DATA_CDB) { -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv3 0/8] target: Save memory on unused se_dev_entrys and se_luns
On 09/18/2014 04:17 PM, Nicholas A. Bellinger wrote: I'm currently reviewing #2, #4, #5 and #7 and will consider merging these. These cleanups account for most of the LOC reduction, and avoid most of the larger concerns. Also for patches like this, they really need testing on your end before I'll consider them merging. Compile testing alone for these types of locking changes is not enough, given the history of regressions with these types of cleanups. OK. I'll look for those to show up in your tree, then? And work on better dividing up of the remaining changes, and also minimize their risk of causing regressions. Regards -- Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv3 0/8] target: Save memory on unused se_dev_entrys and se_luns
On 09/18/2014 04:17 PM, Nicholas A. Bellinger wrote: I'm currently reviewing #2, #4, #5 and #7 and will consider merging these. These cleanups account for most of the LOC reduction, and avoid most of the larger concerns. Also for patches like this, they really need testing on your end before I'll consider them merging. Compile testing alone for these types of locking changes is not enough, given the history of regressions with these types of cleanups. OK. I'll look for those to show up in your tree, then? And work on better dividing up of the remaining changes, and also minimize their risk of causing regressions. Regards -- Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] target: Add a user-passthrough backstore
On 09/22/2014 01:58 PM, Nicholas A. Bellinger wrote: So I'd still like to start for an initial merge with the two different modes mentioned earlier. The pure-passthrough mode where everything is handled by user-space, and an I/O passthrough mode where only SCF_SCSI_DATA_CDB is passed along to userspace, but all other control CDBs are handled by existing in-kernel emulation. Andy, can you re-spin a series with these two approaches in mind..? Will do. This is actually close to what the initial RFC version I posted did. -- Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] target: Add a user-passthrough backstore
On 09/22/2014 01:58 PM, Nicholas A. Bellinger wrote: So I'd still like to start for an initial merge with the two different modes mentioned earlier. The pure-passthrough mode where everything is handled by user-space, and an I/O passthrough mode where only SCF_SCSI_DATA_CDB is passed along to userspace, but all other control CDBs are handled by existing in-kernel emulation. Andy, can you re-spin a series with these two approaches in mind..? Will do. This is actually close to what the initial RFC version I posted did. -- Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] target: Add a user-passthrough backstore
On 09/19/2014 04:51 PM, Alex Elsayed wrote: Not sure I follow.. How does the proposed passthrough mode prevent someone from emulating OSDs, media changers, optical disks or anything else in userspace with TCMU..? The main thing that the above comments highlight is why attempting to combine the existing in-kernel emulation with a userspace backend providing it's own emulation can open up a number of problems with mismatched state between the two. It doesn't prevent it, but it _does_ put it in the exact same place as PSCSI regarding the warnings on the wiki. It means that if someone wants to implement (say) the optical disc or OSD CDBs, they then lose out on ALUA unless they implement it themselves - which seems unnecessary and painful, since those should really be disjoint. In particular, an OSD backed by RADOS objects could be a very nice thing indeed, _and_ could really benefit from ALUA. Some possible solutions: 1) The first time TCMU sees an opcode it passes it up and notes whether it is handled or not. If it was handled then future cmds with that opcode are passed up but not retried in the kernel. If it was not handled then it and all future commands with that opcode are handled by the kernel and not passed up. 2) Same as #1 but TCMU operates on SCSI spec granularity - e.g. handling any SSC opcode means userspace must handle all SSC commands. 3) Like #2 but define opcode groupings that must all be implemented together, independent of the specifications. 4) Have passthrough mode set at creation, but with more than two modes, either grouped by SCSI spec or our own groupings. 5) Never pass up certain opcodes, like the ALUA ones or whatever. Have a good weekend -- Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] target: Add a user-passthrough backstore
On 09/19/2014 04:51 PM, Alex Elsayed wrote: Not sure I follow.. How does the proposed passthrough mode prevent someone from emulating OSDs, media changers, optical disks or anything else in userspace with TCMU..? The main thing that the above comments highlight is why attempting to combine the existing in-kernel emulation with a userspace backend providing it's own emulation can open up a number of problems with mismatched state between the two. It doesn't prevent it, but it _does_ put it in the exact same place as PSCSI regarding the warnings on the wiki. It means that if someone wants to implement (say) the optical disc or OSD CDBs, they then lose out on ALUA co unless they implement it themselves - which seems unnecessary and painful, since those should really be disjoint. In particular, an OSD backed by RADOS objects could be a very nice thing indeed, _and_ could really benefit from ALUA. Some possible solutions: 1) The first time TCMU sees an opcode it passes it up and notes whether it is handled or not. If it was handled then future cmds with that opcode are passed up but not retried in the kernel. If it was not handled then it and all future commands with that opcode are handled by the kernel and not passed up. 2) Same as #1 but TCMU operates on SCSI spec granularity - e.g. handling any SSC opcode means userspace must handle all SSC commands. 3) Like #2 but define opcode groupings that must all be implemented together, independent of the specifications. 4) Have passthrough mode set at creation, but with more than two modes, either grouped by SCSI spec or our own groupings. 5) Never pass up certain opcodes, like the ALUA ones or whatever. Have a good weekend -- Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv3 0/8] target: Save memory on unused se_dev_entrys and se_luns
On 09/18/2014 12:38 AM, Nicholas A. Bellinger wrote: On Sat, 2014-09-13 at 21:55 +0200, Christoph Hellwig wrote: ping again. We're getting closer to the end of the 3.18 merge window and there still hasn't been a response. Should Andy just send the patches directly to Linus once 3.18 opens given that they have been out on the list since Jun 23? (with a positive review from me and no negative one) Removing unused per WWPN endpoint LUN + per NodeACL MappedLUN memory is a nice optimization to have, but I'm not yet convinced that extending existing control path spinlocks to support an array of pointers is ultimately worth the complexity it adds here. 9 files changed, 250 insertions(+), 367 deletions(-). This patchset removes 100+ lines of code. Furthermore, I wouldn't characterize it as extending locks, so much as putting locks where they should've always been. The fact that device_list[foo] is never null means we've avoided crashes but not potentially incorrect accesses. Another concern is how these changes effect active session + device I/O shutdown, which is an area of regressions I'd rather avoid I assume this set would spend time in your tree, followed by Linus' tree before making it into a release. Also, any logic errors are likely to result in a fault, so they should not remain hidden for long. if the primary benefit of this series is only reducing memory footprint for unused LUNs + MappedLUNs. Yes it does reduce wasted memory, that should be reason enough I'd say. But this patchset is also a building block for further improvements that are more significant. This set transitions all lun and mappedlun checks from checking a flag to checking for NULL. This is necessary before we can improve from a fixed-size array to more size-scalable data structures like a radix tree, or lockless, with RCU. Lowering the TRANSPORT_MAX_LUNS_PER_TPG value at compile time today is the simple way for reducing overall memory footprint for folks who need to scale up the number of targets using smaller individual LUN mappings. This is only an option for embedded. We should scale the amount of memory we use with the number of allocated LUNs and mapped LUNs. As for something smarter, given the mostly read-only nature of LUN + MappedLUN pointers to individual TPGT + NodeACLs context, I'd rather see something based on RCU arrays + percpu_ref counting to avoid this type of complexity to existing code, and move in the direction of dropping fast-path I_T ->device_list_lock access all together. See above about pointers vs flags, this is a first step toward more performant *and* space-efficient data structures. Beyond these objections, there are some useful fixes + cleanups from this series that I'm OK with merging soon.. I've pushed this patchset to git://git.kernel.org/pub/scm/linux/kernel/git/grover/linux.git on two branches against your and Linus' repos: against-linus against-target-pending-for-next (looked-over and compile-tested) For your convenience. Regards -- Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv3 0/8] target: Save memory on unused se_dev_entrys and se_luns
On 09/18/2014 12:38 AM, Nicholas A. Bellinger wrote: On Sat, 2014-09-13 at 21:55 +0200, Christoph Hellwig wrote: ping again. We're getting closer to the end of the 3.18 merge window and there still hasn't been a response. Should Andy just send the patches directly to Linus once 3.18 opens given that they have been out on the list since Jun 23? (with a positive review from me and no negative one) Removing unused per WWPN endpoint LUN + per NodeACL MappedLUN memory is a nice optimization to have, but I'm not yet convinced that extending existing control path spinlocks to support an array of pointers is ultimately worth the complexity it adds here. 9 files changed, 250 insertions(+), 367 deletions(-). This patchset removes 100+ lines of code. Furthermore, I wouldn't characterize it as extending locks, so much as putting locks where they should've always been. The fact that device_list[foo] is never null means we've avoided crashes but not potentially incorrect accesses. Another concern is how these changes effect active session + device I/O shutdown, which is an area of regressions I'd rather avoid I assume this set would spend time in your tree, followed by Linus' tree before making it into a release. Also, any logic errors are likely to result in a fault, so they should not remain hidden for long. if the primary benefit of this series is only reducing memory footprint for unused LUNs + MappedLUNs. Yes it does reduce wasted memory, that should be reason enough I'd say. But this patchset is also a building block for further improvements that are more significant. This set transitions all lun and mappedlun checks from checking a flag to checking for NULL. This is necessary before we can improve from a fixed-size array to more size-scalable data structures like a radix tree, or lockless, with RCU. Lowering the TRANSPORT_MAX_LUNS_PER_TPG value at compile time today is the simple way for reducing overall memory footprint for folks who need to scale up the number of targets using smaller individual LUN mappings. This is only an option for embedded. We should scale the amount of memory we use with the number of allocated LUNs and mapped LUNs. As for something smarter, given the mostly read-only nature of LUN + MappedLUN pointers to individual TPGT + NodeACLs context, I'd rather see something based on RCU arrays + percpu_ref counting to avoid this type of complexity to existing code, and move in the direction of dropping fast-path I_T -device_list_lock access all together. See above about pointers vs flags, this is a first step toward more performant *and* space-efficient data structures. Beyond these objections, there are some useful fixes + cleanups from this series that I'm OK with merging soon.. I've pushed this patchset to git://git.kernel.org/pub/scm/linux/kernel/git/grover/linux.git on two branches against your and Linus' repos: against-linus against-target-pending-for-next (looked-over and compile-tested) For your convenience. Regards -- Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/4] Userspace Passthrough backstore for LIO
Hi all, Here is version three of the userspace passthrough backstore code I've been working on. I think it's ready to be considered for upstream. If you are wondering what this is, patch 3 is documentation that should explain, at both a high- and low-level. There are other pieces needed for this to work properly -- changes to targetcli and rtslib to easily configure the new backstore type, plus the user code that hooks up to the passthrough. I've written some code to do this, called tcmu-runner (https://github.com/agrover/tcmu-runner) but the userspace development naturally is hindered until the kernel code is accepted :) Changes since version 2: * Incorporate doc improvements from Richard W. M. Jones * Correct off-by-1 error in tcmu_get_blocks * execute_rw must be set in the sbc_ops passed to sbc_parse_cdb Thanks -- Andy Andy Grover (4): target: Remove unneeded check in sbc_parse_cdb uio: Export definition of struct uio_device target: Add documentation on the target userspace pass-through driver target: Add a user-passthrough backstore Documentation/target/tcmu-design.txt | 239 +++ drivers/target/Kconfig |5 + drivers/target/Makefile|1 + drivers/target/target_core_sbc.c |2 +- drivers/target/target_core_transport.c |4 + drivers/target/target_core_user.c | 1169 drivers/uio/uio.c | 12 - include/linux/uio_driver.h | 12 +- include/uapi/linux/Kbuild |1 + include/uapi/linux/target_core_user.h | 142 10 files changed, 1573 insertions(+), 14 deletions(-) create mode 100644 Documentation/target/tcmu-design.txt create mode 100644 drivers/target/target_core_user.c create mode 100644 include/uapi/linux/target_core_user.h -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/4] target: Add documentation on the target userspace pass-through driver
Describes the driver and its interface to make it possible for user programs to back a LIO-exported LUN. Thanks to Richard W. M. Jones for review, and supplementing this doc with the first two paragraphs. Signed-off-by: Andy Grover --- Documentation/target/tcmu-design.txt | 239 +++ 1 file changed, 239 insertions(+) create mode 100644 Documentation/target/tcmu-design.txt diff --git a/Documentation/target/tcmu-design.txt b/Documentation/target/tcmu-design.txt new file mode 100644 index 000..92662be --- /dev/null +++ b/Documentation/target/tcmu-design.txt @@ -0,0 +1,239 @@ +TCM Userspace Design + + +TCM is another name for LIO, an in-kernel iSCSI target (server). +Existing TCM targets run in the kernel. TCMU (TCM in Userspace) +allows userspace programs to be written which act as iSCSI targets. +This document describes the design. + +The existing kernel provides modules for different SCSI transport +protocols. TCM also modularizes the data storage. There are existing +modules for file, block device, RAM or using another SCSI device as +storage. These are called "backstores" or "storage engines". These +built-in modules are implemented entirely as kernel code. + +Background: + +In addition to modularizing the transport protocol used for carrying +SCSI commands ("fabrics"), the Linux kernel target, LIO, also modularizes +the actual data storage as well. These are referred to as "backstores" +or "storage engines". The target comes with backstores that allow a +file, a block device, RAM, or another SCSI device to be used for the +local storage needed for the exported SCSI LUN. Like the rest of LIO, +these are implemented entirely as kernel code. + +These backstores cover the most common use cases, but not all. One new +use case that other non-kernel target solutions, such as tgt, are able +to support is using Gluster's GLFS or Ceph's RBD as a backstore. The +target then serves as a translator, allowing initiators to store data +in these non-traditional networked storage systems, while still only +using standard protocols themselves. + +If the target is a userspace process, supporting these is easy. tgt, +for example, needs only a small adapter module for each, because the +modules just use the available userspace libraries for RBD and GLFS. + +Adding support for these backstores in LIO is considerably more +difficult, because LIO is entirely kernel code. Instead of undertaking +the significant work to port the GLFS or RBD APIs and protocols to the +kernel, another approach is to create a userspace pass-through +backstore for LIO, "TCMU". + + +Benefits: + +In addition to allowing relatively easy support for RBD and GLFS, TCMU +will also allow easier development of new backstores. TCMU combines +with the LIO loopback fabric to become something similar to FUSE +(Filesystem in Userspace), but at the SCSI layer instead of the +filesystem layer. A SUSE, if you will. + +The disadvantage is there are more distinct components to configure, and +potentially to malfunction. This is unavoidable, but hopefully not +fatal if we're careful to keep things as simple as possible. + +Design constraints: + +- Good performance: high throughput, low latency +- Cleanly handle if userspace: + 1) never attaches + 2) hangs + 3) dies + 4) misbehaves +- Allow future flexibility in user & kernel implementations +- Be reasonably memory-efficient +- Simple to configure & run +- Simple to write a userspace backend + + +Implementation overview: + +The core of the TCMU interface is a memory region that is shared +between kernel and userspace. Within this region is: a control area +(mailbox); a lockless producer/consumer circular buffer for commands +to be passed up, and status returned; and an in/out data buffer area. + +TCMU uses the pre-existing UIO subsystem. UIO allows device driver +development in userspace, and this is conceptually very close to the +TCMU use case, except instead of a physical device, TCMU implements a +memory-mapped layout designed for SCSI commands. Using UIO also +benefits TCMU by handling device introspection (e.g. a way for +userspace to determine how large the shared region is) and signaling +mechanisms in both directions. + +There are no embedded pointers in the memory region. Everything is +expressed as an offset from the region's starting address. This allows +the ring to still work if the user process dies and is restarted with +the region mapped at a different virtual address. + +See target_core_user.h for the struct definitions. + +The Mailbox: + +The mailbox is always at the start of the shared memory region, and +contains a version, details about the starting offset and size of the +command ring, and head and tail pointers to be used by the kernel and +userspace (respectively) to put commands on the ring, and indicate +when the commands are completed. + +version
[PATCH 2/4] uio: Export definition of struct uio_device
In order to prevent a O(n) search of the filesystem to link up its uio node with its target configuration, TCMU needs to know the minor number that UIO assigned. Expose the definition of this struct so TCMU can access this field. Signed-off-by: Andy Grover --- drivers/uio/uio.c | 12 include/linux/uio_driver.h | 12 +++- 2 files changed, 11 insertions(+), 13 deletions(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index a673e5b..60fa627 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -28,18 +28,6 @@ #define UIO_MAX_DEVICES(1U << MINORBITS) -struct uio_device { - struct module *owner; - struct device *dev; - int minor; - atomic_tevent; - struct fasync_struct*async_queue; - wait_queue_head_t wait; - struct uio_info *info; - struct kobject *map_dir; - struct kobject *portio_dir; -}; - static int uio_major; static struct cdev *uio_cdev; static DEFINE_IDR(uio_idr); diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h index 1ad4724..baa8171 100644 --- a/include/linux/uio_driver.h +++ b/include/linux/uio_driver.h @@ -63,7 +63,17 @@ struct uio_port { #define MAX_UIO_PORT_REGIONS 5 -struct uio_device; +struct uio_device { +struct module *owner; +struct device *dev; +int minor; +atomic_tevent; +struct fasync_struct*async_queue; +wait_queue_head_t wait; +struct uio_info *info; +struct kobject *map_dir; +struct kobject *portio_dir; +}; /** * struct uio_info - UIO device capabilities -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/4] target: Add a user-passthrough backstore
Add a LIO storage engine that presents commands to userspace for execution. This would allow more complex backstores to be implemented out-of-kernel, and also make experimentation a-la FUSE (but at the SCSI level -- "SUSE"?) possible. It uses a mmap()able UIO device per LUN to share a command ring and data area. The commands are raw SCSI CDBs and iovs for in/out data. The command ring is also reused for returning scsi command status and optional sense data. This implementation is based on Shaohua Li's earlier version but heavily modified. Differences include: * Shared memory allocated by kernel, not locked-down user pages * Single ring for command request and response * Offsets instead of embedded pointers * Generic SCSI CDB passthrough instead of per-cmd specialization in ring format. * Uses UIO device instead of anon_file passed in mailbox. * Optional in-kernel handling of some commands. The main reason for these differences is to permit greater resiliency if the user process dies or hangs. Things not yet implemented (on purpose): * Zero copy. The data area is flexible enough to allow page flipping or backend-allocated pages to be used by fabrics, but it's not clear these are performance wins. Can come later. * Out-of-order command completion by userspace. Possible to add by just allowing userspace to change cmd_id in rsp cmd entries, but currently not supported. * No locks between kernel cmd submission and completion routines. Sounds like it's possible, but this can come later. * Sparse allocation of mmaped area. Current code vmallocs the whole thing. If the mapped area was larger and not fully mapped then the driver would have more freedom to change cmd and data area sizes based on demand. Current code open issues: * The use of idrs may be overkill -- we maybe can replace them with a simple counter to generate cmd_ids, and a hash table to get a cmd_id's associated pointer. * Use of a free-running counter for cmd ring instead of explicit modulo math. This would require power-of-2 cmd ring size. Signed-off-by: Andy Grover --- drivers/target/Kconfig |5 + drivers/target/Makefile|1 + drivers/target/target_core_transport.c |4 + drivers/target/target_core_user.c | 1169 include/uapi/linux/Kbuild |1 + include/uapi/linux/target_core_user.h | 142 6 files changed, 1322 insertions(+) create mode 100644 drivers/target/target_core_user.c create mode 100644 include/uapi/linux/target_core_user.h diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig index dc2d84a..b03a845 100644 --- a/drivers/target/Kconfig +++ b/drivers/target/Kconfig @@ -31,6 +31,11 @@ config TCM_PSCSI Say Y here to enable the TCM/pSCSI subsystem plugin for non-buffered passthrough access to Linux/SCSI device +config TCM_USER + tristate "TCM/USER Subsystem Plugin for Linux" + help + Say Y here to enable the TCM/USER subsystem plugin + source "drivers/target/loopback/Kconfig" source "drivers/target/tcm_fc/Kconfig" source "drivers/target/iscsi/Kconfig" diff --git a/drivers/target/Makefile b/drivers/target/Makefile index 85b012d..bbb4a7d 100644 --- a/drivers/target/Makefile +++ b/drivers/target/Makefile @@ -22,6 +22,7 @@ obj-$(CONFIG_TARGET_CORE) += target_core_mod.o obj-$(CONFIG_TCM_IBLOCK) += target_core_iblock.o obj-$(CONFIG_TCM_FILEIO) += target_core_file.o obj-$(CONFIG_TCM_PSCSI)+= target_core_pscsi.o +obj-$(CONFIG_TCM_USER) += target_core_user.o # Fabric modules obj-$(CONFIG_LOOPBACK_TARGET) += loopback/ diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 7fa62fc..f018a8c 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -232,6 +232,10 @@ void transport_subsystem_check_init(void) if (ret != 0) pr_err("Unable to load target_core_pscsi\n"); + ret = request_module("target_core_user"); + if (ret != 0) + pr_err("Unable to load target_core_user\n"); + sub_api_initialized = 1; } diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c new file mode 100644 index 000..33b56a8 --- /dev/null +++ b/drivers/target/target_core_user.c @@ -0,0 +1,1169 @@ +/* + * Copyright (C) 2013 Shaohua Li + * Copyright (C) 2014 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
[PATCH 1/4] target: Remove unneeded check in sbc_parse_cdb
The check of SCF_SCSI_DATA_CDB seems to be a remnant from before hch's refactoring of this function. There are no places where that flag is set that cmd->execute_cmd isn't also set. Signed-off-by: Andy Grover --- drivers/target/target_core_sbc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c index bd78d92..ebe62af 100644 --- a/drivers/target/target_core_sbc.c +++ b/drivers/target/target_core_sbc.c @@ -948,7 +948,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) } /* reject any command that we don't have a handler for */ - if (!(cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) && !cmd->execute_cmd) + if (!cmd->execute_cmd) return TCM_UNSUPPORTED_SCSI_OPCODE; if (cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) { -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/4] target: Remove unneeded check in sbc_parse_cdb
The check of SCF_SCSI_DATA_CDB seems to be a remnant from before hch's refactoring of this function. There are no places where that flag is set that cmd-execute_cmd isn't also set. Signed-off-by: Andy Grover agro...@redhat.com --- drivers/target/target_core_sbc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c index bd78d92..ebe62af 100644 --- a/drivers/target/target_core_sbc.c +++ b/drivers/target/target_core_sbc.c @@ -948,7 +948,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) } /* reject any command that we don't have a handler for */ - if (!(cmd-se_cmd_flags SCF_SCSI_DATA_CDB) !cmd-execute_cmd) + if (!cmd-execute_cmd) return TCM_UNSUPPORTED_SCSI_OPCODE; if (cmd-se_cmd_flags SCF_SCSI_DATA_CDB) { -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/4] uio: Export definition of struct uio_device
In order to prevent a O(n) search of the filesystem to link up its uio node with its target configuration, TCMU needs to know the minor number that UIO assigned. Expose the definition of this struct so TCMU can access this field. Signed-off-by: Andy Grover agro...@redhat.com --- drivers/uio/uio.c | 12 include/linux/uio_driver.h | 12 +++- 2 files changed, 11 insertions(+), 13 deletions(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index a673e5b..60fa627 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -28,18 +28,6 @@ #define UIO_MAX_DEVICES(1U MINORBITS) -struct uio_device { - struct module *owner; - struct device *dev; - int minor; - atomic_tevent; - struct fasync_struct*async_queue; - wait_queue_head_t wait; - struct uio_info *info; - struct kobject *map_dir; - struct kobject *portio_dir; -}; - static int uio_major; static struct cdev *uio_cdev; static DEFINE_IDR(uio_idr); diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h index 1ad4724..baa8171 100644 --- a/include/linux/uio_driver.h +++ b/include/linux/uio_driver.h @@ -63,7 +63,17 @@ struct uio_port { #define MAX_UIO_PORT_REGIONS 5 -struct uio_device; +struct uio_device { +struct module *owner; +struct device *dev; +int minor; +atomic_tevent; +struct fasync_struct*async_queue; +wait_queue_head_t wait; +struct uio_info *info; +struct kobject *map_dir; +struct kobject *portio_dir; +}; /** * struct uio_info - UIO device capabilities -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/4] target: Add a user-passthrough backstore
Add a LIO storage engine that presents commands to userspace for execution. This would allow more complex backstores to be implemented out-of-kernel, and also make experimentation a-la FUSE (but at the SCSI level -- SUSE?) possible. It uses a mmap()able UIO device per LUN to share a command ring and data area. The commands are raw SCSI CDBs and iovs for in/out data. The command ring is also reused for returning scsi command status and optional sense data. This implementation is based on Shaohua Li's earlier version but heavily modified. Differences include: * Shared memory allocated by kernel, not locked-down user pages * Single ring for command request and response * Offsets instead of embedded pointers * Generic SCSI CDB passthrough instead of per-cmd specialization in ring format. * Uses UIO device instead of anon_file passed in mailbox. * Optional in-kernel handling of some commands. The main reason for these differences is to permit greater resiliency if the user process dies or hangs. Things not yet implemented (on purpose): * Zero copy. The data area is flexible enough to allow page flipping or backend-allocated pages to be used by fabrics, but it's not clear these are performance wins. Can come later. * Out-of-order command completion by userspace. Possible to add by just allowing userspace to change cmd_id in rsp cmd entries, but currently not supported. * No locks between kernel cmd submission and completion routines. Sounds like it's possible, but this can come later. * Sparse allocation of mmaped area. Current code vmallocs the whole thing. If the mapped area was larger and not fully mapped then the driver would have more freedom to change cmd and data area sizes based on demand. Current code open issues: * The use of idrs may be overkill -- we maybe can replace them with a simple counter to generate cmd_ids, and a hash table to get a cmd_id's associated pointer. * Use of a free-running counter for cmd ring instead of explicit modulo math. This would require power-of-2 cmd ring size. Signed-off-by: Andy Grover agro...@redhat.com --- drivers/target/Kconfig |5 + drivers/target/Makefile|1 + drivers/target/target_core_transport.c |4 + drivers/target/target_core_user.c | 1169 include/uapi/linux/Kbuild |1 + include/uapi/linux/target_core_user.h | 142 6 files changed, 1322 insertions(+) create mode 100644 drivers/target/target_core_user.c create mode 100644 include/uapi/linux/target_core_user.h diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig index dc2d84a..b03a845 100644 --- a/drivers/target/Kconfig +++ b/drivers/target/Kconfig @@ -31,6 +31,11 @@ config TCM_PSCSI Say Y here to enable the TCM/pSCSI subsystem plugin for non-buffered passthrough access to Linux/SCSI device +config TCM_USER + tristate TCM/USER Subsystem Plugin for Linux + help + Say Y here to enable the TCM/USER subsystem plugin + source drivers/target/loopback/Kconfig source drivers/target/tcm_fc/Kconfig source drivers/target/iscsi/Kconfig diff --git a/drivers/target/Makefile b/drivers/target/Makefile index 85b012d..bbb4a7d 100644 --- a/drivers/target/Makefile +++ b/drivers/target/Makefile @@ -22,6 +22,7 @@ obj-$(CONFIG_TARGET_CORE) += target_core_mod.o obj-$(CONFIG_TCM_IBLOCK) += target_core_iblock.o obj-$(CONFIG_TCM_FILEIO) += target_core_file.o obj-$(CONFIG_TCM_PSCSI)+= target_core_pscsi.o +obj-$(CONFIG_TCM_USER) += target_core_user.o # Fabric modules obj-$(CONFIG_LOOPBACK_TARGET) += loopback/ diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 7fa62fc..f018a8c 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -232,6 +232,10 @@ void transport_subsystem_check_init(void) if (ret != 0) pr_err(Unable to load target_core_pscsi\n); + ret = request_module(target_core_user); + if (ret != 0) + pr_err(Unable to load target_core_user\n); + sub_api_initialized = 1; } diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c new file mode 100644 index 000..33b56a8 --- /dev/null +++ b/drivers/target/target_core_user.c @@ -0,0 +1,1169 @@ +/* + * Copyright (C) 2013 Shaohua Li s...@kernel.org + * Copyright (C) 2014 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy
[PATCH 3/4] target: Add documentation on the target userspace pass-through driver
Describes the driver and its interface to make it possible for user programs to back a LIO-exported LUN. Thanks to Richard W. M. Jones for review, and supplementing this doc with the first two paragraphs. Signed-off-by: Andy Grover agro...@redhat.com --- Documentation/target/tcmu-design.txt | 239 +++ 1 file changed, 239 insertions(+) create mode 100644 Documentation/target/tcmu-design.txt diff --git a/Documentation/target/tcmu-design.txt b/Documentation/target/tcmu-design.txt new file mode 100644 index 000..92662be --- /dev/null +++ b/Documentation/target/tcmu-design.txt @@ -0,0 +1,239 @@ +TCM Userspace Design + + +TCM is another name for LIO, an in-kernel iSCSI target (server). +Existing TCM targets run in the kernel. TCMU (TCM in Userspace) +allows userspace programs to be written which act as iSCSI targets. +This document describes the design. + +The existing kernel provides modules for different SCSI transport +protocols. TCM also modularizes the data storage. There are existing +modules for file, block device, RAM or using another SCSI device as +storage. These are called backstores or storage engines. These +built-in modules are implemented entirely as kernel code. + +Background: + +In addition to modularizing the transport protocol used for carrying +SCSI commands (fabrics), the Linux kernel target, LIO, also modularizes +the actual data storage as well. These are referred to as backstores +or storage engines. The target comes with backstores that allow a +file, a block device, RAM, or another SCSI device to be used for the +local storage needed for the exported SCSI LUN. Like the rest of LIO, +these are implemented entirely as kernel code. + +These backstores cover the most common use cases, but not all. One new +use case that other non-kernel target solutions, such as tgt, are able +to support is using Gluster's GLFS or Ceph's RBD as a backstore. The +target then serves as a translator, allowing initiators to store data +in these non-traditional networked storage systems, while still only +using standard protocols themselves. + +If the target is a userspace process, supporting these is easy. tgt, +for example, needs only a small adapter module for each, because the +modules just use the available userspace libraries for RBD and GLFS. + +Adding support for these backstores in LIO is considerably more +difficult, because LIO is entirely kernel code. Instead of undertaking +the significant work to port the GLFS or RBD APIs and protocols to the +kernel, another approach is to create a userspace pass-through +backstore for LIO, TCMU. + + +Benefits: + +In addition to allowing relatively easy support for RBD and GLFS, TCMU +will also allow easier development of new backstores. TCMU combines +with the LIO loopback fabric to become something similar to FUSE +(Filesystem in Userspace), but at the SCSI layer instead of the +filesystem layer. A SUSE, if you will. + +The disadvantage is there are more distinct components to configure, and +potentially to malfunction. This is unavoidable, but hopefully not +fatal if we're careful to keep things as simple as possible. + +Design constraints: + +- Good performance: high throughput, low latency +- Cleanly handle if userspace: + 1) never attaches + 2) hangs + 3) dies + 4) misbehaves +- Allow future flexibility in user kernel implementations +- Be reasonably memory-efficient +- Simple to configure run +- Simple to write a userspace backend + + +Implementation overview: + +The core of the TCMU interface is a memory region that is shared +between kernel and userspace. Within this region is: a control area +(mailbox); a lockless producer/consumer circular buffer for commands +to be passed up, and status returned; and an in/out data buffer area. + +TCMU uses the pre-existing UIO subsystem. UIO allows device driver +development in userspace, and this is conceptually very close to the +TCMU use case, except instead of a physical device, TCMU implements a +memory-mapped layout designed for SCSI commands. Using UIO also +benefits TCMU by handling device introspection (e.g. a way for +userspace to determine how large the shared region is) and signaling +mechanisms in both directions. + +There are no embedded pointers in the memory region. Everything is +expressed as an offset from the region's starting address. This allows +the ring to still work if the user process dies and is restarted with +the region mapped at a different virtual address. + +See target_core_user.h for the struct definitions. + +The Mailbox: + +The mailbox is always at the start of the shared memory region, and +contains a version, details about the starting offset and size of the +command ring, and head and tail pointers to be used by the kernel and +userspace (respectively) to put commands on the ring, and indicate +when the commands are completed. + +version - 1 (userspace should abort if otherwise) +flags - none yet defined
[PATCH 0/4] Userspace Passthrough backstore for LIO
Hi all, Here is version three of the userspace passthrough backstore code I've been working on. I think it's ready to be considered for upstream. If you are wondering what this is, patch 3 is documentation that should explain, at both a high- and low-level. There are other pieces needed for this to work properly -- changes to targetcli and rtslib to easily configure the new backstore type, plus the user code that hooks up to the passthrough. I've written some code to do this, called tcmu-runner (https://github.com/agrover/tcmu-runner) but the userspace development naturally is hindered until the kernel code is accepted :) Changes since version 2: * Incorporate doc improvements from Richard W. M. Jones * Correct off-by-1 error in tcmu_get_blocks * execute_rw must be set in the sbc_ops passed to sbc_parse_cdb Thanks -- Andy Andy Grover (4): target: Remove unneeded check in sbc_parse_cdb uio: Export definition of struct uio_device target: Add documentation on the target userspace pass-through driver target: Add a user-passthrough backstore Documentation/target/tcmu-design.txt | 239 +++ drivers/target/Kconfig |5 + drivers/target/Makefile|1 + drivers/target/target_core_sbc.c |2 +- drivers/target/target_core_transport.c |4 + drivers/target/target_core_user.c | 1169 drivers/uio/uio.c | 12 - include/linux/uio_driver.h | 12 +- include/uapi/linux/Kbuild |1 + include/uapi/linux/target_core_user.h | 142 10 files changed, 1573 insertions(+), 14 deletions(-) create mode 100644 Documentation/target/tcmu-design.txt create mode 100644 drivers/target/target_core_user.c create mode 100644 include/uapi/linux/target_core_user.h -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver
On 08/31/2014 02:22 PM, Richard W.M. Jones wrote: Reading this several times, I now think I get what it's trying to say, but I think it needs to introduces the terms (as the Economist style does). Something like this: "TCM is the new name for LIO, an in-kernel iSCSI target (server). Existing TCM targets run in the kernel. TCMU (TCM in Userspace) allows userspace programs to be written which act as iSCSI targets. This document describes the design. The existing kernel provides modules for different SCSI transport protocols. TCM also modularizes the data storage. There are existing modules for file, block device, RAM or using another SCSI device as storage. These are called "backstores" or "storage engines". These built-in modules are implemented entirely as kernel code." And hopefully having defined a bit of background, the rest of the document just flows nicely: Thanks much! I've put this in the doc, and will hopefully send out a final patchset for inclusion with the new text in the next week or so. The only change I made was "another" instead of "the new name" -- because to be honest I don't know if there was actually an attempt at a name change, and if so if it was successful or not :) LIO seems to have stuck, but TCM seems to refer just to the "backend" part of LIO. Thanks again -- Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver
On 08/31/2014 02:22 PM, Richard W.M. Jones wrote: Reading this several times, I now think I get what it's trying to say, but I think it needs to introduces the terms (as the Economist style does). Something like this: TCM is the new name for LIO, an in-kernel iSCSI target (server). Existing TCM targets run in the kernel. TCMU (TCM in Userspace) allows userspace programs to be written which act as iSCSI targets. This document describes the design. The existing kernel provides modules for different SCSI transport protocols. TCM also modularizes the data storage. There are existing modules for file, block device, RAM or using another SCSI device as storage. These are called backstores or storage engines. These built-in modules are implemented entirely as kernel code. And hopefully having defined a bit of background, the rest of the document just flows nicely: Thanks much! I've put this in the doc, and will hopefully send out a final patchset for inclusion with the new text in the next week or so. begin naming potential-bikeshed wasteoftime The only change I made was another instead of the new name -- because to be honest I don't know if there was actually an attempt at a name change, and if so if it was successful or not :) LIO seems to have stuck, but TCM seems to refer just to the backend part of LIO. end bikeshed Thanks again -- Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver
On 08/30/2014 10:35 AM, Richard W.M. Jones wrote: On Tue, Jul 01, 2014 at 12:11:14PM -0700, Andy Grover wrote: Describes the driver and its interface to make it possible for user programs to back a LIO-exported LUN. Signed-off-by: Andy Grover --- Documentation/target/tcmu-design.txt | 210 +++ 1 file changed, 210 insertions(+) create mode 100644 Documentation/target/tcmu-design.txt diff --git a/Documentation/target/tcmu-design.txt b/Documentation/target/tcmu-design.txt new file mode 100644 index 000..200ff3e --- /dev/null +++ b/Documentation/target/tcmu-design.txt @@ -0,0 +1,210 @@ +TCM Userspace Design + + + +Background: + +In addition to modularizing the transport protocol used for carrying +SCSI commands ("fabrics"), the Linux kernel target, LIO, also modularizes +the actual data storage as well. These are referred to as "backstores" +or "storage engines". The target comes with backstores that allow a +file, a block device, RAM, or another SCSI device to be used for the +local storage needed for the exported SCSI LUN. Like the rest of LIO, +these are implemented entirely as kernel code. + +These backstores cover the most common use cases, but not all. One new +use case that other non-kernel target solutions, such as tgt, are able +to support is using Gluster's GLFS or Ceph's RBD as a backstore. The +target then serves as a translator, allowing initiators to store data +in these non-traditional networked storage systems, while still only +using standard protocols themselves. + +If the target is a userspace process, supporting these is easy. tgt, +for example, needs only a small adapter module for each, because the +modules just use the available userspace libraries for RBD and GLFS. + +Adding support for these backstores in LIO is considerably more +difficult, because LIO is entirely kernel code. Instead of undertaking +the significant work to port the GLFS or RBD APIs and protocols to the +kernel, another approach is to create a userspace pass-through +backstore for LIO, "TCMU". It has to be said that this documentation is terrible. Jumping "in medias res"[1] is great for fiction, awful for technical documentation. I would recommend the Economist Style Guide[2]. They always say "Barak Obama, President of the United States" the first time he is mentioned in an article, even though almost everyone knows who Barak Obama is. In this case you're leaping into something .. fabrics, LIO, backstores, target solutions, ... aargh. Explain what you mean by each term and how it all fits together. Thanks for the feedback. I am undoubtedly too close to the details, because I thought I *was* explaining things :) This doc is for people like you -- tech-savvy but unfamiliar with this specific area. Would you be so kind as to point out exactly the terms this document should explain? Should it explain SCSI and SCSI commands? What a SCSI target is? Say "target implementations" rather than "target solutions"? Do I need some ASCII art? Or, if in finding these gaps you've actually picked up the jargon and wanted to just take a pass a rewriting it, that would be fine too :) I've read some of your libguestfs docs and they were very understandable. Regards -- Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver
On 08/30/2014 10:35 AM, Richard W.M. Jones wrote: On Tue, Jul 01, 2014 at 12:11:14PM -0700, Andy Grover wrote: Describes the driver and its interface to make it possible for user programs to back a LIO-exported LUN. Signed-off-by: Andy Grover agro...@redhat.com --- Documentation/target/tcmu-design.txt | 210 +++ 1 file changed, 210 insertions(+) create mode 100644 Documentation/target/tcmu-design.txt diff --git a/Documentation/target/tcmu-design.txt b/Documentation/target/tcmu-design.txt new file mode 100644 index 000..200ff3e --- /dev/null +++ b/Documentation/target/tcmu-design.txt @@ -0,0 +1,210 @@ +TCM Userspace Design + + + +Background: + +In addition to modularizing the transport protocol used for carrying +SCSI commands (fabrics), the Linux kernel target, LIO, also modularizes +the actual data storage as well. These are referred to as backstores +or storage engines. The target comes with backstores that allow a +file, a block device, RAM, or another SCSI device to be used for the +local storage needed for the exported SCSI LUN. Like the rest of LIO, +these are implemented entirely as kernel code. + +These backstores cover the most common use cases, but not all. One new +use case that other non-kernel target solutions, such as tgt, are able +to support is using Gluster's GLFS or Ceph's RBD as a backstore. The +target then serves as a translator, allowing initiators to store data +in these non-traditional networked storage systems, while still only +using standard protocols themselves. + +If the target is a userspace process, supporting these is easy. tgt, +for example, needs only a small adapter module for each, because the +modules just use the available userspace libraries for RBD and GLFS. + +Adding support for these backstores in LIO is considerably more +difficult, because LIO is entirely kernel code. Instead of undertaking +the significant work to port the GLFS or RBD APIs and protocols to the +kernel, another approach is to create a userspace pass-through +backstore for LIO, TCMU. It has to be said that this documentation is terrible. Jumping in medias res[1] is great for fiction, awful for technical documentation. I would recommend the Economist Style Guide[2]. They always say Barak Obama, President of the United States the first time he is mentioned in an article, even though almost everyone knows who Barak Obama is. In this case you're leaping into something .. fabrics, LIO, backstores, target solutions, ... aargh. Explain what you mean by each term and how it all fits together. Thanks for the feedback. I am undoubtedly too close to the details, because I thought I *was* explaining things :) This doc is for people like you -- tech-savvy but unfamiliar with this specific area. Would you be so kind as to point out exactly the terms this document should explain? Should it explain SCSI and SCSI commands? What a SCSI target is? Say target implementations rather than target solutions? Do I need some ASCII art? Or, if in finding these gaps you've actually picked up the jargon and wanted to just take a pass a rewriting it, that would be fine too :) I've read some of your libguestfs docs and they were very understandable. Regards -- Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFCv2 3/4] target: Add a user-passthrough backstore
Add a LIO storage engine that presents commands to userspace for execution. This would allow more complex backstores to be implemented out-of-kernel, and also make experimentation a-la FUSE (but at the SCSI level -- "SUSE"?) possible. It uses a mmap()able UIO device per LUN to share a command ring and data area. The commands are raw SCSI CDBs and iovs for in/out data. The command ring is also reused for scsi command status and sense data, if present. This implementation is based on Shaohua Li's earlier version but heavily modified. Differences include: * Shared memory allocated by kernel, not locked-down user pages * Single ring for command request and response * Offsets instead of embedded pointers * Generic SCSI CDB passthrough instead of per-cmd specialization in ring format. * Uses UIO device instead of anon_file passed in mailbox. * Optional in-kernel handling of some commands. The main reason for these differences is to permit greater resiliency if the user process dies or hangs. Things not yet implemented (on purpose): * Zero copy. The data area is flexible enough to allow page flipping or backend-allocated pages to be used by fabrics, but it's not clear these are performance wins. Can come later. * Out-of-order command completion by userspace. Possible to add by just allowing userspace to change cmd_id in rsp cmd entries, but currently not supported. * No locks between kernel cmd submission and completion routines. Sounds like it's possible, but this can come later. * Sparse allocation of mmaped area. Current code vmallocs the whole thing. If the mapped area was larger and not fully mapped then the driver would have more freedom to change cmd and data area sizes based on demand. Current code open issues: * The use of idrs may be overkill -- we maybe can replace them with a simple counter to generate cmd_ids, and a hash table to get a cmd_id's associated pointer. * Use of a free-running counter for cmd ring instead of explicit modulo math. This would require power-of-2 cmd ring size. * Random printks in code, still. Signed-off-by: Andy Grover --- drivers/target/Kconfig |5 + drivers/target/Makefile|1 + drivers/target/target_core_transport.c |4 + drivers/target/target_core_user.c | 1158 include/uapi/linux/Kbuild |1 + include/uapi/linux/target_core_user.h | 142 6 files changed, 1311 insertions(+) create mode 100644 drivers/target/target_core_user.c create mode 100644 include/uapi/linux/target_core_user.h diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig index dc2d84a..b03a845 100644 --- a/drivers/target/Kconfig +++ b/drivers/target/Kconfig @@ -31,6 +31,11 @@ config TCM_PSCSI Say Y here to enable the TCM/pSCSI subsystem plugin for non-buffered passthrough access to Linux/SCSI device +config TCM_USER + tristate "TCM/USER Subsystem Plugin for Linux" + help + Say Y here to enable the TCM/USER subsystem plugin + source "drivers/target/loopback/Kconfig" source "drivers/target/tcm_fc/Kconfig" source "drivers/target/iscsi/Kconfig" diff --git a/drivers/target/Makefile b/drivers/target/Makefile index 85b012d..bbb4a7d 100644 --- a/drivers/target/Makefile +++ b/drivers/target/Makefile @@ -22,6 +22,7 @@ obj-$(CONFIG_TARGET_CORE) += target_core_mod.o obj-$(CONFIG_TCM_IBLOCK) += target_core_iblock.o obj-$(CONFIG_TCM_FILEIO) += target_core_file.o obj-$(CONFIG_TCM_PSCSI)+= target_core_pscsi.o +obj-$(CONFIG_TCM_USER) += target_core_user.o # Fabric modules obj-$(CONFIG_LOOPBACK_TARGET) += loopback/ diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 7fa62fc..f018a8c 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -232,6 +232,10 @@ void transport_subsystem_check_init(void) if (ret != 0) pr_err("Unable to load target_core_pscsi\n"); + ret = request_module("target_core_user"); + if (ret != 0) + pr_err("Unable to load target_core_user\n"); + sub_api_initialized = 1; } diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c new file mode 100644 index 000..d959e04 --- /dev/null +++ b/drivers/target/target_core_user.c @@ -0,0 +1,1158 @@ +/* + * Copyright (C) 2013 Shaohua Li + * Copyright (C) 2014 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the G
[RFCv2 4/4] target: Add documentation on the target userspace pass-through driver
Describes the driver and its interface to make it possible for user programs to back a LIO-exported LUN. Signed-off-by: Andy Grover --- Documentation/target/tcmu-design.txt | 229 +++ 1 file changed, 229 insertions(+) create mode 100644 Documentation/target/tcmu-design.txt diff --git a/Documentation/target/tcmu-design.txt b/Documentation/target/tcmu-design.txt new file mode 100644 index 000..1012fe4 --- /dev/null +++ b/Documentation/target/tcmu-design.txt @@ -0,0 +1,229 @@ +TCM Userspace Design + + + +Background: + +In addition to modularizing the transport protocol used for carrying +SCSI commands ("fabrics"), the Linux kernel target, LIO, also modularizes +the actual data storage as well. These are referred to as "backstores" +or "storage engines". The target comes with backstores that allow a +file, a block device, RAM, or another SCSI device to be used for the +local storage needed for the exported SCSI LUN. Like the rest of LIO, +these are implemented entirely as kernel code. + +These backstores cover the most common use cases, but not all. One new +use case that other non-kernel target solutions, such as tgt, are able +to support is using Gluster's GLFS or Ceph's RBD as a backstore. The +target then serves as a translator, allowing initiators to store data +in these non-traditional networked storage systems, while still only +using standard protocols themselves. + +If the target is a userspace process, supporting these is easy. tgt, +for example, needs only a small adapter module for each, because the +modules just use the available userspace libraries for RBD and GLFS. + +Adding support for these backstores in LIO is considerably more +difficult, because LIO is entirely kernel code. Instead of undertaking +the significant work to port the GLFS or RBD APIs and protocols to the +kernel, another approach is to create a userspace pass-through +backstore for LIO, "TCMU". + + +Benefits: + +In addition to allowing relatively easy support for RBD and GLFS, TCMU +will also allow easier development of new backstores. TCMU combines +with the LIO loopback fabric to become something similar to FUSE +(Filesystem in Userspace), but at the SCSI layer instead of the +filesystem layer. A SUSE, if you will. + +The disadvantage is there are more distinct components to configure, and +potentially to malfunction. This is unavoidable, but hopefully not +fatal if we're careful to keep things as simple as possible. + +Design constraints: + +- Good performance: high throughput, low latency +- Cleanly handle if userspace: + 1) never attaches + 2) hangs + 3) dies + 4) misbehaves +- Allow future flexibility in user & kernel implementations +- Be reasonably memory-efficient +- Simple to configure & run +- Simple to write a userspace backend + + +Implementation overview: + +The core of the TCMU interface is a memory region that is shared +between kernel and userspace. Within this region is: a control area +(mailbox); a lockless producer/consumer circular buffer for commands +to be passed up, and status returned; and an in/out data buffer area. + +TCMU uses the pre-existing UIO subsystem. UIO allows device driver +development in userspace, and this is conceptually very close to the +TCMU use case, except instead of a physical device, TCMU implements a +memory-mapped layout designed for SCSI commands. Using UIO also +benefits TCMU by handling device introspection (e.g. a way for +userspace to determine how large the shared region is) and signaling +mechanisms in both directions. + +There are no embedded pointers in the memory region. Everything is +expressed as an offset from the region's starting address. This allows +the ring to still work if the user process dies and is restarted with +the region mapped at a different virtual address. + +See target_core_user.h for the struct definitions. + +The Mailbox: + +The mailbox is always at the start of the shared memory region, and +contains a version, details about the starting offset and size of the +command ring, and head and tail pointers to be used by the kernel and +userspace (respectively) to put commands on the ring, and indicate +when the commands are completed. + +version - 1 (userspace should abort if otherwise) +flags - none yet defined. +cmdr_off - The offset of the start of the command ring from the start +of the memory region, to account for the mailbox size. +cmdr_size - The size of the command ring. This does *not* need to be a +power of two. +cmd_head - Modified by the kernel to indicate when a command has been +placed on the ring. +cmd_tail - Modified by userspace to indicate when it has completed +processing of a command. + +The Command Ring: + +Commands are placed on the ring by the kernel incrementing +mailbox.cmd_head by the size of the command, modulo cmdr_size, and +then signaling userspace via uio_event_notify(). Once the command is +completed
[RFCv2 0/4] Userspace pass-through storage engine (backend)
Hi all, Here is version two of the userspace passthrough backstore code I've been working on. You can find some documentation on the design in last patch. Please let me know what you think about the design and the code. Thanks -- Andy Changes since version 1: * Use netlink for reporting device add/removes to userspace * Remove pass_level in favor of automatic emulation of commands when userspace doesn't support them * Implemented 'tcmu-runner' daemon to handle the user side of the interface and make writing handlers easier (not in this patchset, see https://github.com/agrover/tcmu-runner). * Many bug fixes. Can now successfully partition, format, and use a TCMU-backed volume. * Two small patches added for other changes we need. Operation: * Using https://github.com/agrover/targetcli-fb/commits/userback and https://github.com/agrover/rtslib-fb/commits/userback, create a user storage object and give it a size, and 'file/foo.img' as config parameter * Start tcmu-runner (manually for now) * Create a loopback fabric and link the above storage object to it Andy Grover (4): target: Remove unneeded check in sbc_parse_cdb uio: Export definition of struct uio_device target: Add a user-passthrough backstore target: Add documentation on the target userspace pass-through driver Documentation/target/tcmu-design.txt | 229 +++ drivers/target/Kconfig |5 + drivers/target/Makefile|1 + drivers/target/target_core_sbc.c |2 +- drivers/target/target_core_transport.c |4 + drivers/target/target_core_user.c | 1158 drivers/uio/uio.c | 12 - include/linux/uio_driver.h | 12 +- include/uapi/linux/Kbuild |1 + include/uapi/linux/target_core_user.h | 142 10 files changed, 1552 insertions(+), 14 deletions(-) create mode 100644 Documentation/target/tcmu-design.txt create mode 100644 drivers/target/target_core_user.c create mode 100644 include/uapi/linux/target_core_user.h -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFCv2 1/4] target: Remove unneeded check in sbc_parse_cdb
The check of SCF_SCSI_DATA_CDB seems to be a remnant from before hch's refactoring of this function. There are no places where that flag is set that cmd->execute_cmd isn't also set. Signed-off-by: Andy Grover --- drivers/target/target_core_sbc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c index bd78d92..ebe62af 100644 --- a/drivers/target/target_core_sbc.c +++ b/drivers/target/target_core_sbc.c @@ -948,7 +948,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) } /* reject any command that we don't have a handler for */ - if (!(cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) && !cmd->execute_cmd) + if (!cmd->execute_cmd) return TCM_UNSUPPORTED_SCSI_OPCODE; if (cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) { -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFCv2 2/4] uio: Export definition of struct uio_device
In order to prevent a O(n) search of the filesystem to link up its uio node with its target configuration, TCMU needs to know the minor number that UIO assigned. Expose the definition of this struct so TCMU can access this field. Signed-off-by: Andy Grover --- drivers/uio/uio.c | 12 include/linux/uio_driver.h | 12 +++- 2 files changed, 11 insertions(+), 13 deletions(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index a673e5b..60fa627 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -28,18 +28,6 @@ #define UIO_MAX_DEVICES(1U << MINORBITS) -struct uio_device { - struct module *owner; - struct device *dev; - int minor; - atomic_tevent; - struct fasync_struct*async_queue; - wait_queue_head_t wait; - struct uio_info *info; - struct kobject *map_dir; - struct kobject *portio_dir; -}; - static int uio_major; static struct cdev *uio_cdev; static DEFINE_IDR(uio_idr); diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h index 1ad4724..baa8171 100644 --- a/include/linux/uio_driver.h +++ b/include/linux/uio_driver.h @@ -63,7 +63,17 @@ struct uio_port { #define MAX_UIO_PORT_REGIONS 5 -struct uio_device; +struct uio_device { +struct module *owner; +struct device *dev; +int minor; +atomic_tevent; +struct fasync_struct*async_queue; +wait_queue_head_t wait; +struct uio_info *info; +struct kobject *map_dir; +struct kobject *portio_dir; +}; /** * struct uio_info - UIO device capabilities -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFCv2 1/4] target: Remove unneeded check in sbc_parse_cdb
The check of SCF_SCSI_DATA_CDB seems to be a remnant from before hch's refactoring of this function. There are no places where that flag is set that cmd-execute_cmd isn't also set. Signed-off-by: Andy Grover agro...@redhat.com --- drivers/target/target_core_sbc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c index bd78d92..ebe62af 100644 --- a/drivers/target/target_core_sbc.c +++ b/drivers/target/target_core_sbc.c @@ -948,7 +948,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) } /* reject any command that we don't have a handler for */ - if (!(cmd-se_cmd_flags SCF_SCSI_DATA_CDB) !cmd-execute_cmd) + if (!cmd-execute_cmd) return TCM_UNSUPPORTED_SCSI_OPCODE; if (cmd-se_cmd_flags SCF_SCSI_DATA_CDB) { -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFCv2 0/4] Userspace pass-through storage engine (backend)
Hi all, Here is version two of the userspace passthrough backstore code I've been working on. You can find some documentation on the design in last patch. Please let me know what you think about the design and the code. Thanks -- Andy Changes since version 1: * Use netlink for reporting device add/removes to userspace * Remove pass_level in favor of automatic emulation of commands when userspace doesn't support them * Implemented 'tcmu-runner' daemon to handle the user side of the interface and make writing handlers easier (not in this patchset, see https://github.com/agrover/tcmu-runner). * Many bug fixes. Can now successfully partition, format, and use a TCMU-backed volume. * Two small patches added for other changes we need. Operation: * Using https://github.com/agrover/targetcli-fb/commits/userback and https://github.com/agrover/rtslib-fb/commits/userback, create a user storage object and give it a size, and 'file/foo.img' as config parameter * Start tcmu-runner (manually for now) * Create a loopback fabric and link the above storage object to it Andy Grover (4): target: Remove unneeded check in sbc_parse_cdb uio: Export definition of struct uio_device target: Add a user-passthrough backstore target: Add documentation on the target userspace pass-through driver Documentation/target/tcmu-design.txt | 229 +++ drivers/target/Kconfig |5 + drivers/target/Makefile|1 + drivers/target/target_core_sbc.c |2 +- drivers/target/target_core_transport.c |4 + drivers/target/target_core_user.c | 1158 drivers/uio/uio.c | 12 - include/linux/uio_driver.h | 12 +- include/uapi/linux/Kbuild |1 + include/uapi/linux/target_core_user.h | 142 10 files changed, 1552 insertions(+), 14 deletions(-) create mode 100644 Documentation/target/tcmu-design.txt create mode 100644 drivers/target/target_core_user.c create mode 100644 include/uapi/linux/target_core_user.h -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFCv2 2/4] uio: Export definition of struct uio_device
In order to prevent a O(n) search of the filesystem to link up its uio node with its target configuration, TCMU needs to know the minor number that UIO assigned. Expose the definition of this struct so TCMU can access this field. Signed-off-by: Andy Grover agro...@redhat.com --- drivers/uio/uio.c | 12 include/linux/uio_driver.h | 12 +++- 2 files changed, 11 insertions(+), 13 deletions(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index a673e5b..60fa627 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -28,18 +28,6 @@ #define UIO_MAX_DEVICES(1U MINORBITS) -struct uio_device { - struct module *owner; - struct device *dev; - int minor; - atomic_tevent; - struct fasync_struct*async_queue; - wait_queue_head_t wait; - struct uio_info *info; - struct kobject *map_dir; - struct kobject *portio_dir; -}; - static int uio_major; static struct cdev *uio_cdev; static DEFINE_IDR(uio_idr); diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h index 1ad4724..baa8171 100644 --- a/include/linux/uio_driver.h +++ b/include/linux/uio_driver.h @@ -63,7 +63,17 @@ struct uio_port { #define MAX_UIO_PORT_REGIONS 5 -struct uio_device; +struct uio_device { +struct module *owner; +struct device *dev; +int minor; +atomic_tevent; +struct fasync_struct*async_queue; +wait_queue_head_t wait; +struct uio_info *info; +struct kobject *map_dir; +struct kobject *portio_dir; +}; /** * struct uio_info - UIO device capabilities -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFCv2 4/4] target: Add documentation on the target userspace pass-through driver
Describes the driver and its interface to make it possible for user programs to back a LIO-exported LUN. Signed-off-by: Andy Grover agro...@redhat.com --- Documentation/target/tcmu-design.txt | 229 +++ 1 file changed, 229 insertions(+) create mode 100644 Documentation/target/tcmu-design.txt diff --git a/Documentation/target/tcmu-design.txt b/Documentation/target/tcmu-design.txt new file mode 100644 index 000..1012fe4 --- /dev/null +++ b/Documentation/target/tcmu-design.txt @@ -0,0 +1,229 @@ +TCM Userspace Design + + + +Background: + +In addition to modularizing the transport protocol used for carrying +SCSI commands (fabrics), the Linux kernel target, LIO, also modularizes +the actual data storage as well. These are referred to as backstores +or storage engines. The target comes with backstores that allow a +file, a block device, RAM, or another SCSI device to be used for the +local storage needed for the exported SCSI LUN. Like the rest of LIO, +these are implemented entirely as kernel code. + +These backstores cover the most common use cases, but not all. One new +use case that other non-kernel target solutions, such as tgt, are able +to support is using Gluster's GLFS or Ceph's RBD as a backstore. The +target then serves as a translator, allowing initiators to store data +in these non-traditional networked storage systems, while still only +using standard protocols themselves. + +If the target is a userspace process, supporting these is easy. tgt, +for example, needs only a small adapter module for each, because the +modules just use the available userspace libraries for RBD and GLFS. + +Adding support for these backstores in LIO is considerably more +difficult, because LIO is entirely kernel code. Instead of undertaking +the significant work to port the GLFS or RBD APIs and protocols to the +kernel, another approach is to create a userspace pass-through +backstore for LIO, TCMU. + + +Benefits: + +In addition to allowing relatively easy support for RBD and GLFS, TCMU +will also allow easier development of new backstores. TCMU combines +with the LIO loopback fabric to become something similar to FUSE +(Filesystem in Userspace), but at the SCSI layer instead of the +filesystem layer. A SUSE, if you will. + +The disadvantage is there are more distinct components to configure, and +potentially to malfunction. This is unavoidable, but hopefully not +fatal if we're careful to keep things as simple as possible. + +Design constraints: + +- Good performance: high throughput, low latency +- Cleanly handle if userspace: + 1) never attaches + 2) hangs + 3) dies + 4) misbehaves +- Allow future flexibility in user kernel implementations +- Be reasonably memory-efficient +- Simple to configure run +- Simple to write a userspace backend + + +Implementation overview: + +The core of the TCMU interface is a memory region that is shared +between kernel and userspace. Within this region is: a control area +(mailbox); a lockless producer/consumer circular buffer for commands +to be passed up, and status returned; and an in/out data buffer area. + +TCMU uses the pre-existing UIO subsystem. UIO allows device driver +development in userspace, and this is conceptually very close to the +TCMU use case, except instead of a physical device, TCMU implements a +memory-mapped layout designed for SCSI commands. Using UIO also +benefits TCMU by handling device introspection (e.g. a way for +userspace to determine how large the shared region is) and signaling +mechanisms in both directions. + +There are no embedded pointers in the memory region. Everything is +expressed as an offset from the region's starting address. This allows +the ring to still work if the user process dies and is restarted with +the region mapped at a different virtual address. + +See target_core_user.h for the struct definitions. + +The Mailbox: + +The mailbox is always at the start of the shared memory region, and +contains a version, details about the starting offset and size of the +command ring, and head and tail pointers to be used by the kernel and +userspace (respectively) to put commands on the ring, and indicate +when the commands are completed. + +version - 1 (userspace should abort if otherwise) +flags - none yet defined. +cmdr_off - The offset of the start of the command ring from the start +of the memory region, to account for the mailbox size. +cmdr_size - The size of the command ring. This does *not* need to be a +power of two. +cmd_head - Modified by the kernel to indicate when a command has been +placed on the ring. +cmd_tail - Modified by userspace to indicate when it has completed +processing of a command. + +The Command Ring: + +Commands are placed on the ring by the kernel incrementing +mailbox.cmd_head by the size of the command, modulo cmdr_size, and +then signaling userspace via uio_event_notify(). Once the command is +completed, userspace updates mailbox.cmd_tail
[RFCv2 3/4] target: Add a user-passthrough backstore
Add a LIO storage engine that presents commands to userspace for execution. This would allow more complex backstores to be implemented out-of-kernel, and also make experimentation a-la FUSE (but at the SCSI level -- SUSE?) possible. It uses a mmap()able UIO device per LUN to share a command ring and data area. The commands are raw SCSI CDBs and iovs for in/out data. The command ring is also reused for scsi command status and sense data, if present. This implementation is based on Shaohua Li's earlier version but heavily modified. Differences include: * Shared memory allocated by kernel, not locked-down user pages * Single ring for command request and response * Offsets instead of embedded pointers * Generic SCSI CDB passthrough instead of per-cmd specialization in ring format. * Uses UIO device instead of anon_file passed in mailbox. * Optional in-kernel handling of some commands. The main reason for these differences is to permit greater resiliency if the user process dies or hangs. Things not yet implemented (on purpose): * Zero copy. The data area is flexible enough to allow page flipping or backend-allocated pages to be used by fabrics, but it's not clear these are performance wins. Can come later. * Out-of-order command completion by userspace. Possible to add by just allowing userspace to change cmd_id in rsp cmd entries, but currently not supported. * No locks between kernel cmd submission and completion routines. Sounds like it's possible, but this can come later. * Sparse allocation of mmaped area. Current code vmallocs the whole thing. If the mapped area was larger and not fully mapped then the driver would have more freedom to change cmd and data area sizes based on demand. Current code open issues: * The use of idrs may be overkill -- we maybe can replace them with a simple counter to generate cmd_ids, and a hash table to get a cmd_id's associated pointer. * Use of a free-running counter for cmd ring instead of explicit modulo math. This would require power-of-2 cmd ring size. * Random printks in code, still. Signed-off-by: Andy Grover agro...@redhat.com --- drivers/target/Kconfig |5 + drivers/target/Makefile|1 + drivers/target/target_core_transport.c |4 + drivers/target/target_core_user.c | 1158 include/uapi/linux/Kbuild |1 + include/uapi/linux/target_core_user.h | 142 6 files changed, 1311 insertions(+) create mode 100644 drivers/target/target_core_user.c create mode 100644 include/uapi/linux/target_core_user.h diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig index dc2d84a..b03a845 100644 --- a/drivers/target/Kconfig +++ b/drivers/target/Kconfig @@ -31,6 +31,11 @@ config TCM_PSCSI Say Y here to enable the TCM/pSCSI subsystem plugin for non-buffered passthrough access to Linux/SCSI device +config TCM_USER + tristate TCM/USER Subsystem Plugin for Linux + help + Say Y here to enable the TCM/USER subsystem plugin + source drivers/target/loopback/Kconfig source drivers/target/tcm_fc/Kconfig source drivers/target/iscsi/Kconfig diff --git a/drivers/target/Makefile b/drivers/target/Makefile index 85b012d..bbb4a7d 100644 --- a/drivers/target/Makefile +++ b/drivers/target/Makefile @@ -22,6 +22,7 @@ obj-$(CONFIG_TARGET_CORE) += target_core_mod.o obj-$(CONFIG_TCM_IBLOCK) += target_core_iblock.o obj-$(CONFIG_TCM_FILEIO) += target_core_file.o obj-$(CONFIG_TCM_PSCSI)+= target_core_pscsi.o +obj-$(CONFIG_TCM_USER) += target_core_user.o # Fabric modules obj-$(CONFIG_LOOPBACK_TARGET) += loopback/ diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 7fa62fc..f018a8c 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -232,6 +232,10 @@ void transport_subsystem_check_init(void) if (ret != 0) pr_err(Unable to load target_core_pscsi\n); + ret = request_module(target_core_user); + if (ret != 0) + pr_err(Unable to load target_core_user\n); + sub_api_initialized = 1; } diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c new file mode 100644 index 000..d959e04 --- /dev/null +++ b/drivers/target/target_core_user.c @@ -0,0 +1,1158 @@ +/* + * Copyright (C) 2013 Shaohua Li s...@kernel.org + * Copyright (C) 2014 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You
Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver
[re-adding individual CCs that were dropped] On 07/05/2014 04:29 AM, Alex Elsayed wrote: +Device Discovery: + +Other devices may be using UIO besides TCMU. Unrelated user processes +may also be handling different sets of TCMU devices. TCMU userspace +processes must find their devices by scanning sysfs +class/uio/uio*/name. For TCMU devices, these names will be of the +format: + +tcm-user// + +where "tcm-user" is common for all TCMU-backed UIO devices. +will be a userspace-process-unique string to identify the TCMU device +as expecting to be backed by a certain handler, and will be an +additional handler-specific string for the user process to configure +the device, if needed. Neither or can contain ':', +due to LIO limitations. It might be good to change this somewhat; in the vast majority of cases it'd be saner for userspace programs to figure this information out via udev etc. rather than parsing sysfs themselves. This information is still worth documenting, but saying things like "must find their devices by scanning sysfs" is likely to lead to users of this interface making suboptimal choices. I agree. There's no getting around a certain degree of work required by the backing user program. I'm planning on writing a "tcmu-runner" program with a plugin interface, that will handle the event loop, device notifications, enumeration, and possibly thread pools, to minimize the amount of boilerplate code each implementation must contain. +Device Events: + +If a new device is added or removed, user processes will recieve a HUP +signal, and should re-scan sysfs. File descriptors for devices no +longer in sysfs should be closed, and new devices should be opened and +handled. Is there a cleaner way to do this? In particular, re-scanning sysfs may cause race conditions (device removed, one of the same name re-added but a different UIO device node; probably more to be found). Perhaps recommend netlink uevents, so that remove+add is noticeable? Also, is the SIGHUP itself the best option? Could we simply require the user process to listen for add/remove uevents to get such change notifications, and thus enforce good behavior? Yes this sounds better, let's do it this way. One use case I'm actually interested in is having userspace provide something other than just SPC - for instance, tgt can provide a virtual tape library or an OSD, and CDemu can provide emulated optical discs from various image formats. Currently, CDemu uses its own out-of-tree driver called VHBA (Virtual Host Bus Adapter) to do pretty much exactly what TCMU+Loopback would accomplish... and in the process misses out on all of the other fabrics, unless you're willing to _re-import_ those devices using PSCSI, which has its own quirks. Perhaps there could be a level 0 (or 4, or whatever) which means "explicitly enabled list of commands" - maybe as a bitmap that could be passed to the kernel somehow? Hopefully, that could also avoid some of the quirks of PSCSI regarding ALUA and such - if it's not implemented, leave the relevant bits at zero, and LIO handles it. I'm beginning to sour on pass_level and having configurable cmd filtering in the kernel interface. I think a less-clever but simpler approach might be to eliminate filtering, and the user process can return CHECK_CONDITION, INVALID COMMAND OPERATION CODE for commands it doesn't wish to support. TCMU checks for this, and the pending command thus returned can still be emulated by LIO (it looks like we could just re-call sbc_parse_cdb and target_execute_cmd). This does look really nice, thanks for writing it! Thanks for your helpful feedback! :) -- Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver
[re-adding individual CCs that were dropped] On 07/05/2014 04:29 AM, Alex Elsayed wrote: +Device Discovery: + +Other devices may be using UIO besides TCMU. Unrelated user processes +may also be handling different sets of TCMU devices. TCMU userspace +processes must find their devices by scanning sysfs +class/uio/uio*/name. For TCMU devices, these names will be of the +format: + +tcm-user/subtype/path + +where tcm-user is common for all TCMU-backed UIO devices. subtype +will be a userspace-process-unique string to identify the TCMU device +as expecting to be backed by a certain handler, and path will be an +additional handler-specific string for the user process to configure +the device, if needed. Neither subtype or path can contain ':', +due to LIO limitations. It might be good to change this somewhat; in the vast majority of cases it'd be saner for userspace programs to figure this information out via udev etc. rather than parsing sysfs themselves. This information is still worth documenting, but saying things like must find their devices by scanning sysfs is likely to lead to users of this interface making suboptimal choices. I agree. There's no getting around a certain degree of work required by the backing user program. I'm planning on writing a tcmu-runner program with a plugin interface, that will handle the event loop, device notifications, enumeration, and possibly thread pools, to minimize the amount of boilerplate code each implementation must contain. +Device Events: + +If a new device is added or removed, user processes will recieve a HUP +signal, and should re-scan sysfs. File descriptors for devices no +longer in sysfs should be closed, and new devices should be opened and +handled. Is there a cleaner way to do this? In particular, re-scanning sysfs may cause race conditions (device removed, one of the same name re-added but a different UIO device node; probably more to be found). Perhaps recommend netlink uevents, so that remove+add is noticeable? Also, is the SIGHUP itself the best option? Could we simply require the user process to listen for add/remove uevents to get such change notifications, and thus enforce good behavior? Yes this sounds better, let's do it this way. One use case I'm actually interested in is having userspace provide something other than just SPC - for instance, tgt can provide a virtual tape library or an OSD, and CDemu can provide emulated optical discs from various image formats. Currently, CDemu uses its own out-of-tree driver called VHBA (Virtual Host Bus Adapter) to do pretty much exactly what TCMU+Loopback would accomplish... and in the process misses out on all of the other fabrics, unless you're willing to _re-import_ those devices using PSCSI, which has its own quirks. Perhaps there could be a level 0 (or 4, or whatever) which means explicitly enabled list of commands - maybe as a bitmap that could be passed to the kernel somehow? Hopefully, that could also avoid some of the quirks of PSCSI regarding ALUA and such - if it's not implemented, leave the relevant bits at zero, and LIO handles it. I'm beginning to sour on pass_level and having configurable cmd filtering in the kernel interface. I think a less-clever but simpler approach might be to eliminate filtering, and the user process can return CHECK_CONDITION, INVALID COMMAND OPERATION CODE for commands it doesn't wish to support. TCMU checks for this, and the pending command thus returned can still be emulated by LIO (it looks like we could just re-call sbc_parse_cdb and target_execute_cmd). This does look really nice, thanks for writing it! Thanks for your helpful feedback! :) -- Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 0/2] target: userspace pass-through backend
Hi, here's some additional beginning bits for the configuration and use of this code: Changes to targetcli-fb and rtslib-fb: https://github.com/agrover/targetcli-fb/tree/userback https://github.com/agrover/rtslib-fb/tree/userback An incredibly-crappy stand-in for the user process that will handle requests: https://github.com/agrover/test-userback I look forward to your feedback :) Thanks -- Regards -- Andy p.s. kernel changes are also available here: https://github.com/agrover/linux/tree/userback -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 0/2] target: userspace pass-through backend
Hi, here's some additional beginning bits for the configuration and use of this code: Changes to targetcli-fb and rtslib-fb: https://github.com/agrover/targetcli-fb/tree/userback https://github.com/agrover/rtslib-fb/tree/userback An incredibly-crappy stand-in for the user process that will handle requests: https://github.com/agrover/test-userback I look forward to your feedback :) Thanks -- Regards -- Andy p.s. kernel changes are also available here: https://github.com/agrover/linux/tree/userback -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC 2/2] target: Add a user-passthrough backstore
Add a LIO storage engine that presents commands to userspace for execution. This would allow more complex backstores to be implemented out-of-kernel, and also make experimentation a-la FUSE (but at the SCSI level -- "SUSE"?) possible. It uses a mmap()able UIO device per LUN to share a command ring and data area. The commands are raw SCSI CDBs and iovs for in/out data. The command ring is also reused for scsi command status and sense data, if present. This implementation is based on Shaohua Li's earlier version but heavily modified. Differences include: * Shared memory allocated by kernel, not locked-down user pages * Single ring for command request and response * Offsets instead of embedded pointers * Generic SCSI CDB passthrough instead of per-cmd specialization in ring format. * Uses UIO device instead of anon_file passed in mailbox. * Optional in-kernel handling of some commands. The main reason for these differences is to permit greater resiliency if the user process dies or hangs. Things not yet implemented (on purpose): * Zero copy. The data area is flexible enough to allow page flipping or backend-allocated pages to be used by fabrics, but it's not clear these are performance wins. Can come later. * Out-of-order command completion by userspace. Possible to add by just allowing userspace to change cmd_id in rsp cmd entries, but currently not supported. * No locks between kernel cmd submission and completion routines. Sounds like it's possible, but this can come later. * Sparse allocation of mmaped area. Current code vmallocs the whole thing. If the mapped area was larger and not fully mapped then the driver would have more freedom to change cmd and data area sizes based on demand. Current code open issues: * The use of idrs may be overkill -- we maybe can replace them with a simple counter to generate cmd_ids, and a hash table to get a cmd_id's associated pointer. * Use of a free-running counter for cmd ring instead of explicit modulo math. This would require power-of-2 cmd ring size. * Random printks in code, still. Signed-off-by: Andy Grover --- drivers/target/Kconfig |5 + drivers/target/Makefile|1 + drivers/target/target_core_transport.c |4 + drivers/target/target_core_user.c | 1078 drivers/target/target_core_user.h | 126 5 files changed, 1214 insertions(+) create mode 100644 drivers/target/target_core_user.c create mode 100644 drivers/target/target_core_user.h diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig index dc2d84a..b03a845 100644 --- a/drivers/target/Kconfig +++ b/drivers/target/Kconfig @@ -31,6 +31,11 @@ config TCM_PSCSI Say Y here to enable the TCM/pSCSI subsystem plugin for non-buffered passthrough access to Linux/SCSI device +config TCM_USER + tristate "TCM/USER Subsystem Plugin for Linux" + help + Say Y here to enable the TCM/USER subsystem plugin + source "drivers/target/loopback/Kconfig" source "drivers/target/tcm_fc/Kconfig" source "drivers/target/iscsi/Kconfig" diff --git a/drivers/target/Makefile b/drivers/target/Makefile index 85b012d..bbb4a7d 100644 --- a/drivers/target/Makefile +++ b/drivers/target/Makefile @@ -22,6 +22,7 @@ obj-$(CONFIG_TARGET_CORE) += target_core_mod.o obj-$(CONFIG_TCM_IBLOCK) += target_core_iblock.o obj-$(CONFIG_TCM_FILEIO) += target_core_file.o obj-$(CONFIG_TCM_PSCSI)+= target_core_pscsi.o +obj-$(CONFIG_TCM_USER) += target_core_user.o # Fabric modules obj-$(CONFIG_LOOPBACK_TARGET) += loopback/ diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 7fa62fc..f018a8c 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -232,6 +232,10 @@ void transport_subsystem_check_init(void) if (ret != 0) pr_err("Unable to load target_core_pscsi\n"); + ret = request_module("target_core_user"); + if (ret != 0) + pr_err("Unable to load target_core_user\n"); + sub_api_initialized = 1; } diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c new file mode 100644 index 000..544f3c5 --- /dev/null +++ b/drivers/target/target_core_user.c @@ -0,0 +1,1078 @@ +/* + * Copyright (C) 2013 Shaohua Li + * Copyright (C) 2014 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details.