from:"Andy Grover"

Re: linux-next: Signed-off-by missing for commits in the net-next tree

2019-08-16 Thread Andy Grover

On 8/16/19 3:06 PM, Gerd Rausch wrote:
> Hi,
> 
> Just added the e-mail addresses I found using a simple "google search",
> in order to reach out to the original authors of these commits:
> Chris Mason and Andy Grover.
> 
> I'm hoping they still remember their work from 7-8 years ago.

Yes looks like what I was working on. What did you need from me? It's
too late to amend the commitlogs...

-- Andy

> 
> Thanks,
> 
>   Gerd
> 
> On 15/08/2019 14.53, Stephen Rothwell wrote:
>> Hi all,
>>
>> Commits
>>
>>   11740ef44829 ("rds: check for excessive looping in rds_send_xmit")
>>   65dedd7fe1f2 ("RDS: limit the number of times we loop in rds_send_xmit")
>>
>> are missing a Signed-off-by from their authors.
>>

how to unmap pages in an anonymous mmap?

2017-02-27 Thread Andy Grover

On 02/26/2017 09:59 PM, Xiubo Li wrote:
>> But, We likely don't want to release memory from the data area anyways
>> while active, in any case. How about if we set a timer when active
>> commands go to zero, and then reduce data area to some minimum if no new
>> cmds come in before timer expires?
> 
> If I understand correctly: for example, we have 1G(as the minimum)
> data area and all blocks have been allocated and mapped to runner's
> vma, then we extern it to 1G + 256M as needed. When there have no
> active cmds and after the timer expires, will it reduce the data area
> back to 1G ? And then should it release the reduced 256M data area's
> memories ?
> 
> If so, after kfree()ed the blocks' memories, it should also try to remove
> all the ptes which are mapping this page(like using the try_to_umap()),
> but something like try_to_umap() doesn't export for the modules.
> 
> Without ummaping the kfree()ed pages' ptes mentioned above, then
> the reduced 256M vma space couldn't be reused again for the runner
> process, because the runner has already do the mapping for the reduced
> vma space to some old physical pages(won't trigger new page fault
> again). Then there will be a hole, and the hole will be bigger and bigger.
> 
> Without ummaping the kfree()ed pages' ptes mentioned above, the
> pages' reference count (page_ref_dec(), which _inc()ed in page fault)
> couldn't be reduced back too.

Let's ask people who will know...

Hi linux-mm,

TCM-User (drivers/target/target_core_user.c) currently uses vmalloc()ed
memory to back a ring buffer that is mmaped by userspace.

We want to move to dynamically mapping pages into this region, and also
we'd like to unmap/free pages when idle. What's the right way to unmap?
I see unmap_mapping_range() but that mentions an underlying file, which
TCMU doesn't have. Or maybe zap_page_range()? But it's not exported.

Any advice?

Thanks in advance -- Regards -- Andy

how to unmap pages in an anonymous mmap?

2017-02-27 Thread Andy Grover

On 02/26/2017 09:59 PM, Xiubo Li wrote:
>> But, We likely don't want to release memory from the data area anyways
>> while active, in any case. How about if we set a timer when active
>> commands go to zero, and then reduce data area to some minimum if no new
>> cmds come in before timer expires?
> 
> If I understand correctly: for example, we have 1G(as the minimum)
> data area and all blocks have been allocated and mapped to runner's
> vma, then we extern it to 1G + 256M as needed. When there have no
> active cmds and after the timer expires, will it reduce the data area
> back to 1G ? And then should it release the reduced 256M data area's
> memories ?
> 
> If so, after kfree()ed the blocks' memories, it should also try to remove
> all the ptes which are mapping this page(like using the try_to_umap()),
> but something like try_to_umap() doesn't export for the modules.
> 
> Without ummaping the kfree()ed pages' ptes mentioned above, then
> the reduced 256M vma space couldn't be reused again for the runner
> process, because the runner has already do the mapping for the reduced
> vma space to some old physical pages(won't trigger new page fault
> again). Then there will be a hole, and the hole will be bigger and bigger.
> 
> Without ummaping the kfree()ed pages' ptes mentioned above, the
> pages' reference count (page_ref_dec(), which _inc()ed in page fault)
> couldn't be reduced back too.

Let's ask people who will know...

Hi linux-mm,

TCM-User (drivers/target/target_core_user.c) currently uses vmalloc()ed
memory to back a ring buffer that is mmaped by userspace.

We want to move to dynamically mapping pages into this region, and also
we'd like to unmap/free pages when idle. What's the right way to unmap?
I see unmap_mapping_range() but that mentions an underlying file, which
TCMU doesn't have. Or maybe zap_page_range()? But it's not exported.

Any advice?

Thanks in advance -- Regards -- Andy

Re: [PATCH] target/user: Add daynmic growing data area featuresupport

2017-02-24 Thread Andy Grover

On 02/23/2017 06:07 PM, Xiubo Li wrote:
>> Cool. This is a good approach for an initial patch but this raises
>> concerns about efficiently managing kernel memory usage -- the data area
>> grows but never shrinks, and total possible usage increases per
>> backstore. (What if there are 1000?) Any ideas how we could also improve
>> these aspects of the design? (Global TCMU data area usage limit?)
> Two ways in my mind:
> 
> The first:
> How about by setting a threshold cmd(SHRINK cmd), something likes
> the PAD cmd, to tell the userspace runner try to shrink the memories?

Why should userspace need to know if the kernel is shrinking memory
allocated to the data area? Userspace knows about memory described in
iovecs for in-flight cmds, we shouldn't need its cooperation to free
other allocated parts of the data area.

But, We likely don't want to release memory from the data area anyways
while active, in any case. How about if we set a timer when active
commands go to zero, and then reduce data area to some minimum if no new
cmds come in before timer expires?

> When the runner get the SHRINK cmd, it will try to remmap uio0's ring
> buffer(?). Then the kernel will get chance to shrink the memories
> 
> The second:
> Try to extern the data area by using /dev/uio1, we could remmap the
> uio1 device when need, so it will be easy to get a chance to shrink the
> memories in uio1.

Userspace should not need to remap the region in order for the kernel to
free and unmap pages from the region.

The only thing we need to watch out for is if blocks are referenced by
in-flight cmds, we can't free them or userspace will segfault. So, if we
wait until there are no in-flight cmds, then it follows that the kernel
can free whatever it wants and userspace will not segfault.

-- Andy

Re: [PATCH] target/user: Add daynmic growing data area featuresupport

2017-02-24 Thread Andy Grover

On 02/23/2017 06:07 PM, Xiubo Li wrote:
>> Cool. This is a good approach for an initial patch but this raises
>> concerns about efficiently managing kernel memory usage -- the data area
>> grows but never shrinks, and total possible usage increases per
>> backstore. (What if there are 1000?) Any ideas how we could also improve
>> these aspects of the design? (Global TCMU data area usage limit?)
> Two ways in my mind:
> 
> The first:
> How about by setting a threshold cmd(SHRINK cmd), something likes
> the PAD cmd, to tell the userspace runner try to shrink the memories?

Why should userspace need to know if the kernel is shrinking memory
allocated to the data area? Userspace knows about memory described in
iovecs for in-flight cmds, we shouldn't need its cooperation to free
other allocated parts of the data area.

But, We likely don't want to release memory from the data area anyways
while active, in any case. How about if we set a timer when active
commands go to zero, and then reduce data area to some minimum if no new
cmds come in before timer expires?

> When the runner get the SHRINK cmd, it will try to remmap uio0's ring
> buffer(?). Then the kernel will get chance to shrink the memories
> 
> The second:
> Try to extern the data area by using /dev/uio1, we could remmap the
> uio1 device when need, so it will be easy to get a chance to shrink the
> memories in uio1.

Userspace should not need to remap the region in order for the kernel to
free and unmap pages from the region.

The only thing we need to watch out for is if blocks are referenced by
in-flight cmds, we can't free them or userspace will segfault. So, if we
wait until there are no in-flight cmds, then it follows that the kernel
can free whatever it wants and userspace will not segfault.

-- Andy

Re: [PATCH] target/user: Add daynmic growing data area feature support

2017-02-22 Thread Andy Grover

On 02/17/2017 01:24 AM, lixi...@cmss.chinamobile.com wrote:
> From: Xiubo Li 
> 
> Currently for the TCMU, the ring buffer size is fixed to 64K cmd
> area + 1M data area, and this will be bottlenecks for high iops.

Hi Xiubo, thanks for your work.

daynmic -> dynamic

Have you benchmarked this patch and determined what kind of iops
improvement it allows? Do you see the data area reaching its
fully-allocated size?

> 
> The struct tcmu_cmd_entry {} size is fixed about 44 bytes without
> iovec[N], and the size of struct iovec[N] is about (16 * N) bytes.
> 
> The sizeof(cmd entry) will be [44B, N *16 + 44B], and corresponding
> iovec[N] size will be [0, N * 4096], so the ratio of
> sizeof(cmd entry) : sizeof(entry datas) ==
> (N * 16 + 44)Bytes : (N * 4096)Bytes == (N * 16)/(N * 4096) +
> 44/(N*4096) == 4/1024 + 11/(N * 1024).
> 
> When N is bigger, the ratio will be smaller. If N >= 1, the ratio
> will be [15/1024, 4/1024), for this the ratio 15 : 1024 will be
> enough. But maybe some iscsi cmds has no datas, N == 0. So the ratio
> should be bigger.
> 
> For now we will increase the data area size to 1G, and the cmd area
> size to 128M. The tcmu-runner should mmap() about (128M + 1G) when
> running and the TCMU will dynamically grows the data area from 0 to
> max 1G size.

Cool. This is a good approach for an initial patch but this raises
concerns about efficiently managing kernel memory usage -- the data area
grows but never shrinks, and total possible usage increases per
backstore. (What if there are 1000?) Any ideas how we could also improve
these aspects of the design? (Global TCMU data area usage limit?)

> 
> The cmd area memory will be allocated through vmalloc(), and the data
> area's blocks will be allocated individually later when needed.
> 
> The allocated data area block memory will be managed via radix tree.
> For now the bitmap still be the most efficient way to search and
> manage the block index, this could be update later.
> 
> Signed-off-by: Xiubo Li 
> Signed-off-by: Jianfei Hu 
> ---
>  drivers/target/target_core_user.c | 241 
> ++
>  1 file changed, 167 insertions(+), 74 deletions(-)
> 
> diff --git a/drivers/target/target_core_user.c 
> b/drivers/target/target_core_user.c
> index 2e33100..f2e3769 100644
> --- a/drivers/target/target_core_user.c
> +++ b/drivers/target/target_core_user.c
> @@ -2,6 +2,7 @@
>   * Copyright (C) 2013 Shaohua Li 
>   * Copyright (C) 2014 Red Hat, Inc.
>   * Copyright (C) 2015 Arrikto, Inc.
> + * Copyright (C) 2017 Chinamobile, Inc.
>   *
>   * This program is free software; you can redistribute it and/or modify it
>   * under the terms and conditions of the GNU General Public License,
> @@ -25,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -65,12 +67,15 @@
>  
>  #define TCMU_TIME_OUT (30 * MSEC_PER_SEC)
>  
> -#define DATA_BLOCK_BITS 256
> -#define DATA_BLOCK_SIZE 4096
> +/* For cmd area, the size is fixed 64M */
> +#define CMDR_SIZE (64 * 1024 * 1024)

Commitmsg mistakenly says 128M..?

> -#define CMDR_SIZE (16 * 4096)
> +/* For data area, the size is fixed 1G */
> +#define DATA_BLOCK_BITS (256 * 1024)
> +#define DATA_BLOCK_SIZE 4096
>  #define DATA_SIZE (DATA_BLOCK_BITS * DATA_BLOCK_SIZE)
>  
> +/* The ring buffer size is 64M + 1G */
>  #define TCMU_RING_SIZE (CMDR_SIZE + DATA_SIZE)
>  
>  static struct device *tcmu_root_device;
> @@ -103,6 +108,7 @@ struct tcmu_dev {
>   size_t data_size;
>  
>   DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
> + struct radix_tree_root data_blocks;
>  
>   wait_queue_head_t wait_cmdr;
>   /* TODO should this be a mutex? */
> @@ -120,15 +126,22 @@ struct tcmu_dev {
>  
>  #define CMDR_OFF sizeof(struct tcmu_mailbox)
>  
> +struct tcmu_data_block_index {
> + struct list_head list;
> + uint32_t index;
> +};

is this used anywhere? leftover from an earlier version?

> +
>  struct tcmu_cmd {
>   struct se_cmd *se_cmd;
>   struct tcmu_dev *tcmu_dev;
>  
> - uint16_t cmd_id;
> + uint32_t cmd_id;

why this? tcmu_cmd_entry_hdr has a u16 cmd_id, so making this bigger
doesn't really help anything. (natural alignment? But, compiler should
be inserting padding as needed.)

>  
>   /* Can't use se_cmd when cleaning up expired cmds, because if
>  cmd has been completed then accessing se_cmd is off limits */
> - DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
> + uint32_t dbi_cnt;
> + uint32_t dbi_cur;
> + uint32_t *dbi;

Ah, cool. Saves tcmu_cmd from being huge, now that data_bitmap is 32MiB.



> @@ -919,7 +999,7 @@ static int tcmu_configure_device(struct se_device *dev)
>  
>   info->name = str;
>  
> - udev->mb_addr = vzalloc(TCMU_RING_SIZE);
> + udev->mb_addr = vzalloc(CMDR_SIZE);

I'm still concerned about this -- 64M vmalloc'd for cmd ring seems like

Re: [PATCH] target/user: Add daynmic growing data area feature support

2017-02-22 Thread Andy Grover

On 02/17/2017 01:24 AM, lixi...@cmss.chinamobile.com wrote:
> From: Xiubo Li 
> 
> Currently for the TCMU, the ring buffer size is fixed to 64K cmd
> area + 1M data area, and this will be bottlenecks for high iops.

Hi Xiubo, thanks for your work.

daynmic -> dynamic

Have you benchmarked this patch and determined what kind of iops
improvement it allows? Do you see the data area reaching its
fully-allocated size?

> 
> The struct tcmu_cmd_entry {} size is fixed about 44 bytes without
> iovec[N], and the size of struct iovec[N] is about (16 * N) bytes.
> 
> The sizeof(cmd entry) will be [44B, N *16 + 44B], and corresponding
> iovec[N] size will be [0, N * 4096], so the ratio of
> sizeof(cmd entry) : sizeof(entry datas) ==
> (N * 16 + 44)Bytes : (N * 4096)Bytes == (N * 16)/(N * 4096) +
> 44/(N*4096) == 4/1024 + 11/(N * 1024).
> 
> When N is bigger, the ratio will be smaller. If N >= 1, the ratio
> will be [15/1024, 4/1024), for this the ratio 15 : 1024 will be
> enough. But maybe some iscsi cmds has no datas, N == 0. So the ratio
> should be bigger.
> 
> For now we will increase the data area size to 1G, and the cmd area
> size to 128M. The tcmu-runner should mmap() about (128M + 1G) when
> running and the TCMU will dynamically grows the data area from 0 to
> max 1G size.

Cool. This is a good approach for an initial patch but this raises
concerns about efficiently managing kernel memory usage -- the data area
grows but never shrinks, and total possible usage increases per
backstore. (What if there are 1000?) Any ideas how we could also improve
these aspects of the design? (Global TCMU data area usage limit?)

> 
> The cmd area memory will be allocated through vmalloc(), and the data
> area's blocks will be allocated individually later when needed.
> 
> The allocated data area block memory will be managed via radix tree.
> For now the bitmap still be the most efficient way to search and
> manage the block index, this could be update later.
> 
> Signed-off-by: Xiubo Li 
> Signed-off-by: Jianfei Hu 
> ---
>  drivers/target/target_core_user.c | 241 
> ++
>  1 file changed, 167 insertions(+), 74 deletions(-)
> 
> diff --git a/drivers/target/target_core_user.c 
> b/drivers/target/target_core_user.c
> index 2e33100..f2e3769 100644
> --- a/drivers/target/target_core_user.c
> +++ b/drivers/target/target_core_user.c
> @@ -2,6 +2,7 @@
>   * Copyright (C) 2013 Shaohua Li 
>   * Copyright (C) 2014 Red Hat, Inc.
>   * Copyright (C) 2015 Arrikto, Inc.
> + * Copyright (C) 2017 Chinamobile, Inc.
>   *
>   * This program is free software; you can redistribute it and/or modify it
>   * under the terms and conditions of the GNU General Public License,
> @@ -25,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -65,12 +67,15 @@
>  
>  #define TCMU_TIME_OUT (30 * MSEC_PER_SEC)
>  
> -#define DATA_BLOCK_BITS 256
> -#define DATA_BLOCK_SIZE 4096
> +/* For cmd area, the size is fixed 64M */
> +#define CMDR_SIZE (64 * 1024 * 1024)

Commitmsg mistakenly says 128M..?

> -#define CMDR_SIZE (16 * 4096)
> +/* For data area, the size is fixed 1G */
> +#define DATA_BLOCK_BITS (256 * 1024)
> +#define DATA_BLOCK_SIZE 4096
>  #define DATA_SIZE (DATA_BLOCK_BITS * DATA_BLOCK_SIZE)
>  
> +/* The ring buffer size is 64M + 1G */
>  #define TCMU_RING_SIZE (CMDR_SIZE + DATA_SIZE)
>  
>  static struct device *tcmu_root_device;
> @@ -103,6 +108,7 @@ struct tcmu_dev {
>   size_t data_size;
>  
>   DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
> + struct radix_tree_root data_blocks;
>  
>   wait_queue_head_t wait_cmdr;
>   /* TODO should this be a mutex? */
> @@ -120,15 +126,22 @@ struct tcmu_dev {
>  
>  #define CMDR_OFF sizeof(struct tcmu_mailbox)
>  
> +struct tcmu_data_block_index {
> + struct list_head list;
> + uint32_t index;
> +};

is this used anywhere? leftover from an earlier version?

> +
>  struct tcmu_cmd {
>   struct se_cmd *se_cmd;
>   struct tcmu_dev *tcmu_dev;
>  
> - uint16_t cmd_id;
> + uint32_t cmd_id;

why this? tcmu_cmd_entry_hdr has a u16 cmd_id, so making this bigger
doesn't really help anything. (natural alignment? But, compiler should
be inserting padding as needed.)

>  
>   /* Can't use se_cmd when cleaning up expired cmds, because if
>  cmd has been completed then accessing se_cmd is off limits */
> - DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
> + uint32_t dbi_cnt;
> + uint32_t dbi_cur;
> + uint32_t *dbi;

Ah, cool. Saves tcmu_cmd from being huge, now that data_bitmap is 32MiB.



> @@ -919,7 +999,7 @@ static int tcmu_configure_device(struct se_device *dev)
>  
>   info->name = str;
>  
> - udev->mb_addr = vzalloc(TCMU_RING_SIZE);
> + udev->mb_addr = vzalloc(CMDR_SIZE);

I'm still concerned about this -- 64M vmalloc'd for cmd ring seems like
too much. (again: what if there are 1000 tcmu backstores?) and is only
needed if data area grows to max

Re: [PATCH] uio: add UIO_MEM_CUSTOM support

2017-02-15 Thread Andy Grover

On 02/15/2017 05:34 PM, Xiubo Li wrote:
>>> --- a/drivers/uio/uio.c
>>> +++ b/drivers/uio/uio.c
>>> @@ -708,6 +708,8 @@ static int uio_mmap(struct file *filep, struct
>>> vm_area_struct *vma)
>>>   case UIO_MEM_LOGICAL:
>>>   case UIO_MEM_VIRTUAL:
>>>   return uio_mmap_logical(vma);
>>> +case UIO_MEM_CUSTOM:
>>> +return 0;
>> How does this help?
> For example, the TCMU will use the map area as ISCSI commands & data ring
> buffer(uio0 --> map0). Currently the TCMU will using the fixed small
> size map
> area as the ring buffer, but this will be the bottleneck for high iops.
> 
> Without knowing how large it is enough, so the new scheme will use the
> fixed
> small ring buffer area(about 64M ~ 128M) + dynamically "growing" ring
> buffer
> area(about 1.5G).

The following code is in uio_mmap() in uio.c:

if (idev->info->mmap) {
ret = idev->info->mmap(idev->info, vma);
return ret;
}

switch (idev->info->mem[mi].memtype) {
case UIO_MEM_PHYS:
return uio_mmap_physical(vma);
case UIO_MEM_LOGICAL:
case UIO_MEM_VIRTUAL:
return uio_mmap_logical(vma);
default:
return -EINVAL;
}

We already have the equivalent of a CUSTOM memtype because TCMU sets the
info->mmap fn, overriding uio's default handling choices in favor of its
own.

HTH -- Regards -- Andy

Re: [PATCH] uio: add UIO_MEM_CUSTOM support

2017-02-15 Thread Andy Grover

On 02/15/2017 05:34 PM, Xiubo Li wrote:
>>> --- a/drivers/uio/uio.c
>>> +++ b/drivers/uio/uio.c
>>> @@ -708,6 +708,8 @@ static int uio_mmap(struct file *filep, struct
>>> vm_area_struct *vma)
>>>   case UIO_MEM_LOGICAL:
>>>   case UIO_MEM_VIRTUAL:
>>>   return uio_mmap_logical(vma);
>>> +case UIO_MEM_CUSTOM:
>>> +return 0;
>> How does this help?
> For example, the TCMU will use the map area as ISCSI commands & data ring
> buffer(uio0 --> map0). Currently the TCMU will using the fixed small
> size map
> area as the ring buffer, but this will be the bottleneck for high iops.
> 
> Without knowing how large it is enough, so the new scheme will use the
> fixed
> small ring buffer area(about 64M ~ 128M) + dynamically "growing" ring
> buffer
> area(about 1.5G).

The following code is in uio_mmap() in uio.c:

if (idev->info->mmap) {
ret = idev->info->mmap(idev->info, vma);
return ret;
}

switch (idev->info->mem[mi].memtype) {
case UIO_MEM_PHYS:
return uio_mmap_physical(vma);
case UIO_MEM_LOGICAL:
case UIO_MEM_VIRTUAL:
return uio_mmap_logical(vma);
default:
return -EINVAL;
}

We already have the equivalent of a CUSTOM memtype because TCMU sets the
info->mmap fn, overriding uio's default handling choices in favor of its
own.

HTH -- Regards -- Andy

Re: [dm-devel] [PATCH 0/9] Generate uevents for all DM events

2016-10-05 Thread Andy Grover


On 10/05/2016 10:43 AM, Alasdair G Kergon wrote:

On Wed, Oct 05, 2016 at 10:06:41AM -0700, Andy Grover wrote:

My project *would* like this added sooner, so I'll work on a revised
patchset that uses netlink instead of uevents, and will also work on a
revision to uevents.txt that talks about KOBJ_CHANGE and when it should
and should not be used, as described by agk, to help out the next person.


netlink might still be overkill for this at this stage, if it's only
making one thread able to monitor multiple devices efficiently.

Let's find out your requirements and see if there's anything different from
the ones we've debated before.


Yeah just like Hannes said, I just want to get dm events for multiple dm 
devices without burning a thread for each device. After your 
clarification, netlink seems like the best option and not overkill.


Give me a few days and I can have something for you to review.

Thanks -- Regards -- Andy

Re: [dm-devel] [PATCH 0/9] Generate uevents for all DM events

2016-10-05 Thread Andy Grover


On 10/05/2016 10:43 AM, Alasdair G Kergon wrote:

On Wed, Oct 05, 2016 at 10:06:41AM -0700, Andy Grover wrote:

My project *would* like this added sooner, so I'll work on a revised
patchset that uses netlink instead of uevents, and will also work on a
revision to uevents.txt that talks about KOBJ_CHANGE and when it should
and should not be used, as described by agk, to help out the next person.


netlink might still be overkill for this at this stage, if it's only
making one thread able to monitor multiple devices efficiently.

Let's find out your requirements and see if there's anything different from
the ones we've debated before.


Yeah just like Hannes said, I just want to get dm events for multiple dm 
devices without burning a thread for each device. After your 
clarification, netlink seems like the best option and not overkill.


Give me a few days and I can have something for you to review.

Thanks -- Regards -- Andy

Re: [dm-devel] [PATCH 0/9] Generate uevents for all DM events

2016-10-05 Thread Andy Grover


On 10/04/2016 11:51 PM, Greg KH wrote:

On Wed, Oct 05, 2016 at 01:40:05AM +0100, Alasdair G Kergon wrote:



We see these as two different categories of notifications, and prefer
the greater flexibility a mechanism independent of uevents would
provide.  The team has discussed several alternatives over the years but
didn't make a decision as we've not yet reached a point where we're
straining the existing mechanism too far.


So, no changes need to be made?  I'm confused here, who is wanting this
changed?


Hehe, Alasdair and I are both RH but working on different projects. 
We're not the Borg :)


My project *would* like this added sooner, so I'll work on a revised 
patchset that uses netlink instead of uevents, and will also work on a 
revision to uevents.txt that talks about KOBJ_CHANGE and when it should 
and should not be used, as described by agk, to help out the next person.


Thanks -- Regards -- Andy

Re: [dm-devel] [PATCH 0/9] Generate uevents for all DM events

2016-10-05 Thread Andy Grover


On 10/04/2016 11:51 PM, Greg KH wrote:

On Wed, Oct 05, 2016 at 01:40:05AM +0100, Alasdair G Kergon wrote:



We see these as two different categories of notifications, and prefer
the greater flexibility a mechanism independent of uevents would
provide.  The team has discussed several alternatives over the years but
didn't make a decision as we've not yet reached a point where we're
straining the existing mechanism too far.


So, no changes need to be made?  I'm confused here, who is wanting this
changed?


Hehe, Alasdair and I are both RH but working on different projects. 
We're not the Borg :)


My project *would* like this added sooner, so I'll work on a revised 
patchset that uses netlink instead of uevents, and will also work on a 
revision to uevents.txt that talks about KOBJ_CHANGE and when it should 
and should not be used, as described by agk, to help out the next person.


Thanks -- Regards -- Andy

Re: [PATCH 0/9] Generate uevents for all DM events

2016-10-04 Thread Andy Grover


On 10/04/2016 12:20 AM, Greg KH wrote:

On Mon, Oct 03, 2016 at 12:22:51PM -0700, Andy Grover wrote:

Hi Mike and GregKH,

I want a way to get devicemapper events without using the DM ioctl,
because that requires creating a thread to sleep in the ioctl for each
dm device I want events from.

It would seem like using uevents and KOBJ_CHANGE would be a good way
to do this, but Mike said that the uevent maintainers (Greg that's
you?) did not think this was a good idea?


KBOJ_CHANGE is a tricky one.  It has been used for a variety of
different things, but usually it is used to show that a major change has
happened with a device like a docking station plug in or out.

How often do DM events happen?  What is triggering them?  How do you
want to send them to userspace?  Through a sysfs file?  Why not just use
your own netlink connection?


devicemapper is using uevents for:

a. dm-verity detected corruption
b. dm-multipath: path failed or reinstated
c. dm device renamed
d. there's also some use in md and bcache.

devicemapper uses DM_EVENT ioctl (yuck) for:

1. dm-thin pool data/metadata filling up (hit a threshold)
2. dm-cache is now clean
3. dm-log flushed or log failed
4. dm-raid error detected or sync complete

Instead of using uevent for everything, we could go to a separate 
genetlink for 1-4 instead of making them use uevent like a-d, but we'd 
end up with two different userspace notification techniques. At least to 
me, there doesn't seem to be much technical differentiation between the 
two lists.


(Would we want to then at some point transition any of a-d off of uevent 
and onto genetlink for consistency's sake?)


Thanks -- Regards -- Andy

Re: [PATCH 0/9] Generate uevents for all DM events

2016-10-04 Thread Andy Grover


On 10/04/2016 12:20 AM, Greg KH wrote:

On Mon, Oct 03, 2016 at 12:22:51PM -0700, Andy Grover wrote:

Hi Mike and GregKH,

I want a way to get devicemapper events without using the DM ioctl,
because that requires creating a thread to sleep in the ioctl for each
dm device I want events from.

It would seem like using uevents and KOBJ_CHANGE would be a good way
to do this, but Mike said that the uevent maintainers (Greg that's
you?) did not think this was a good idea?


KBOJ_CHANGE is a tricky one.  It has been used for a variety of
different things, but usually it is used to show that a major change has
happened with a device like a docking station plug in or out.

How often do DM events happen?  What is triggering them?  How do you
want to send them to userspace?  Through a sysfs file?  Why not just use
your own netlink connection?


devicemapper is using uevents for:

a. dm-verity detected corruption
b. dm-multipath: path failed or reinstated
c. dm device renamed
d. there's also some use in md and bcache.

devicemapper uses DM_EVENT ioctl (yuck) for:

1. dm-thin pool data/metadata filling up (hit a threshold)
2. dm-cache is now clean
3. dm-log flushed or log failed
4. dm-raid error detected or sync complete

Instead of using uevent for everything, we could go to a separate 
genetlink for 1-4 instead of making them use uevent like a-d, but we'd 
end up with two different userspace notification techniques. At least to 
me, there doesn't seem to be much technical differentiation between the 
two lists.


(Would we want to then at some point transition any of a-d off of uevent 
and onto genetlink for consistency's sake?)


Thanks -- Regards -- Andy

[PATCH 2/9] dm: Move multipath-specific stuff out of dm-uevent.c

2016-10-03 Thread Andy Grover

There's a little bit of mpath-specific stuff that is in dm-uevent.c,
because all current uevents want to attach DM_PATH and DM_NR_VALID_PATHS
variables to the uevent. Makes sense since all currently defined DM
uevents are for dm-mpath, but is not true for future uevents. Move the
addition of these to dm-mpath.c and expose a few lower-level functions,
dm_build_uevent, dm_uevent_add and dm_uevent_free, for other dm targets to
build their own uevents.

Signed-off-by: Andy Grover <agro...@redhat.com>
---
 drivers/md/dm-mpath.c  | 71 +++
 drivers/md/dm-uevent.c | 75 +-
 drivers/md/dm-uevent.h | 30 
 drivers/md/dm.c|  1 +
 4 files changed, 103 insertions(+), 74 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index ac734e5..c563b6d 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -30,6 +30,15 @@
 #define DM_PG_INIT_DELAY_MSECS 2000
 #define DM_PG_INIT_DELAY_DEFAULT ((unsigned) -1)
 
+static const struct {
+   enum dm_uevent_type type;
+   enum kobject_action action;
+   char *name;
+} _dm_mpath_uevent_type_names[] = {
+   {DM_UEVENT_PATH_FAILED, KOBJ_CHANGE, "PATH_FAILED"},
+   {DM_UEVENT_PATH_REINSTATED, KOBJ_CHANGE, "PATH_REINSTATED"},
+};
+
 /* Path properties */
 struct pgpath {
struct list_head list;
@@ -1232,6 +1241,68 @@ static void multipath_dtr(struct dm_target *ti)
free_multipath(m);
 }
 
+static struct dm_uevent *dm_build_path_uevent(struct mapped_device *md,
+ struct dm_target *ti,
+ enum kobject_action action,
+ const char *dm_action,
+ const char *path,
+ unsigned nr_valid_paths)
+{
+   struct dm_uevent *event;
+
+   event = dm_build_uevent(md, ti, action, dm_action);
+   if (IS_ERR(event))
+   return event;
+
+   if (add_uevent_var(>ku_env, "DM_PATH=%s", path)) {
+   DMERR("%s: add_uevent_var() for DM_PATH failed", __func__);
+   goto err_add;
+   }
+
+   if (add_uevent_var(>ku_env, "DM_NR_VALID_PATHS=%d",
+  nr_valid_paths)) {
+   DMERR("%s: add_uevent_var() for DM_NR_VALID_PATHS failed",
+ __func__);
+   goto err_add;
+   }
+
+   return event;
+
+err_add:
+   dm_uevent_free(event);
+   return ERR_PTR(-ENOMEM);
+}
+
+/**
+ * dm_path_uevent - called to create a new path event and queue it
+ *
+ * @event_type:path event type enum
+ * @ti:pointer to a dm_target
+ * @path:  string containing pathname
+ * @nr_valid_paths:number of valid paths remaining
+ *
+ */
+static void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target 
*ti,
+  const char *path, unsigned nr_valid_paths)
+{
+   struct mapped_device *md = dm_table_get_md(ti->table);
+   struct dm_uevent *event;
+
+   if (event_type >= ARRAY_SIZE(_dm_mpath_uevent_type_names)) {
+   DMERR("%s: Invalid event_type %d", __func__, event_type);
+   return;
+   }
+
+   event = dm_build_path_uevent(md, ti,
+
_dm_mpath_uevent_type_names[event_type].action,
+
_dm_mpath_uevent_type_names[event_type].name,
+path, nr_valid_paths);
+   if (IS_ERR(event))
+   return;
+
+   dm_uevent_add(md, >elist);
+}
+
 /*
  * Take a path out of use.
  */
diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c
index 2d5f858..f7089dd 100644
--- a/drivers/md/dm-uevent.c
+++ b/drivers/md/dm-uevent.c
@@ -29,30 +29,13 @@
 
 #define DM_MSG_PREFIX "uevent"
 
-static const struct {
-   enum dm_uevent_type type;
-   enum kobject_action action;
-   char *name;
-} _dm_uevent_type_names[] = {
-   {DM_UEVENT_PATH_FAILED, KOBJ_CHANGE, "PATH_FAILED"},
-   {DM_UEVENT_PATH_REINSTATED, KOBJ_CHANGE, "PATH_REINSTATED"},
-};
-
 static struct kmem_cache *_dm_event_cache;
 
-struct dm_uevent {
-   struct mapped_device *md;
-   enum kobject_action action;
-   struct kobj_uevent_env ku_env;
-   struct list_head elist;
-   char name[DM_NAME_LEN];
-   char uuid[DM_UUID_LEN];
-};
-
-static void dm_uevent_free(struct dm_uevent *event)
+void dm_uevent_free(struct dm_uevent *event)
 {
kmem_cache_free(_dm_event_cache, event);
 }
+EXPORT_SYMBOL_GPL(dm_uevent_free);
 
 static struct dm_uevent *dm_uevent_alloc(struct mapped_device *md)
 {
@@ -68,12 +51,10 @@ static struct dm_uevent *dm_uevent_alloc(stru

[PATCH 4/9] dm: Update dm-uevent.txt

2016-10-03 Thread Andy Grover

Document the current dm uevent API, as modified by this patchset.

Signed-off-by: Andy Grover <agro...@redhat.com>
---
 Documentation/device-mapper/dm-uevent.txt | 49 ++-
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/Documentation/device-mapper/dm-uevent.txt 
b/Documentation/device-mapper/dm-uevent.txt
index 07edbd85..0a5d909 100644
--- a/Documentation/device-mapper/dm-uevent.txt
+++ b/Documentation/device-mapper/dm-uevent.txt
@@ -1,18 +1,53 @@
 The device-mapper uevent code adds the capability to device-mapper to create
 and send kobject uevents (uevents).  Previously device-mapper events were only
-available through the ioctl interface.  The advantage of the uevents interface
-is the event contains environment attributes providing increased context for
+available through the ioctl interface.  The advantages of the uevents interface
+are the event contains environment attributes providing increased context for
 the event avoiding the need to query the state of the device-mapper device 
after
-the event is received.
+the event is received, and uevents are poll()-able in userspace, whereas the
+ioctl event interface is not.
 
-There are two functions currently for device-mapper events.  The first function
-listed creates the event and the second function sends the event(s).
+There are five functions DM targets should use.
 
-void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti,
-const char *path, unsigned nr_valid_paths)
+struct dm_uevent *dm_build_uevent(struct mapped_device *md,
+struct dm_target *ti,
+enum kobject_action action,
+const char *dm_action)
+
+Construct a dm uevent with default DM variables attached(DM_TARGET,
+DM_ACTION, DM_SEQNUM).
+
+int add_uevent_var(struct kobj_uevent_env *env, const char *format, ...)
+
+Optionally add event-specific data to the generated uevent. e.g. dm-mpath's
+PATH_FAILED and PATH_REINSTATED uevents add path and number-of-remaining-paths
+vars.
+
+void dm_uevent_add(struct mapped_device *md, struct list_head *elist)
+
+Hand off the uevent to the device's list of pending events.
+
+void dm_uevent_free(struct dm_uevent *event)
+
+Something went wrong, free the uevent instead of queueing it.
+
+void dm_table_event(struct dm_table *t)
+
+Triggers sending of queued uevents as well as waking up processes waiting on
+the ioctl.
+
+
+The device-mapper uevent code also supplies three functions for device-mapper 
core
+to call:
+
+int dm_uevent_init(void)
+void dm_uevent_exit(void)
+
+Setup and teardown.
 
 void dm_send_uevents(struct list_head *events, struct kobject *kobj)
 
+Actually send the uevents (called indirectly from dm_table_event, above).
+
 
 The variables added to the uevent environment are:
 
-- 
2.7.4

[PATCH 2/9] dm: Move multipath-specific stuff out of dm-uevent.c

2016-10-03 Thread Andy Grover

There's a little bit of mpath-specific stuff that is in dm-uevent.c,
because all current uevents want to attach DM_PATH and DM_NR_VALID_PATHS
variables to the uevent. Makes sense since all currently defined DM
uevents are for dm-mpath, but is not true for future uevents. Move the
addition of these to dm-mpath.c and expose a few lower-level functions,
dm_build_uevent, dm_uevent_add and dm_uevent_free, for other dm targets to
build their own uevents.

Signed-off-by: Andy Grover 
---
 drivers/md/dm-mpath.c  | 71 +++
 drivers/md/dm-uevent.c | 75 +-
 drivers/md/dm-uevent.h | 30 
 drivers/md/dm.c|  1 +
 4 files changed, 103 insertions(+), 74 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index ac734e5..c563b6d 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -30,6 +30,15 @@
 #define DM_PG_INIT_DELAY_MSECS 2000
 #define DM_PG_INIT_DELAY_DEFAULT ((unsigned) -1)
 
+static const struct {
+   enum dm_uevent_type type;
+   enum kobject_action action;
+   char *name;
+} _dm_mpath_uevent_type_names[] = {
+   {DM_UEVENT_PATH_FAILED, KOBJ_CHANGE, "PATH_FAILED"},
+   {DM_UEVENT_PATH_REINSTATED, KOBJ_CHANGE, "PATH_REINSTATED"},
+};
+
 /* Path properties */
 struct pgpath {
struct list_head list;
@@ -1232,6 +1241,68 @@ static void multipath_dtr(struct dm_target *ti)
free_multipath(m);
 }
 
+static struct dm_uevent *dm_build_path_uevent(struct mapped_device *md,
+ struct dm_target *ti,
+ enum kobject_action action,
+ const char *dm_action,
+ const char *path,
+ unsigned nr_valid_paths)
+{
+   struct dm_uevent *event;
+
+   event = dm_build_uevent(md, ti, action, dm_action);
+   if (IS_ERR(event))
+   return event;
+
+   if (add_uevent_var(>ku_env, "DM_PATH=%s", path)) {
+   DMERR("%s: add_uevent_var() for DM_PATH failed", __func__);
+   goto err_add;
+   }
+
+   if (add_uevent_var(>ku_env, "DM_NR_VALID_PATHS=%d",
+  nr_valid_paths)) {
+   DMERR("%s: add_uevent_var() for DM_NR_VALID_PATHS failed",
+ __func__);
+   goto err_add;
+   }
+
+   return event;
+
+err_add:
+   dm_uevent_free(event);
+   return ERR_PTR(-ENOMEM);
+}
+
+/**
+ * dm_path_uevent - called to create a new path event and queue it
+ *
+ * @event_type:path event type enum
+ * @ti:pointer to a dm_target
+ * @path:  string containing pathname
+ * @nr_valid_paths:number of valid paths remaining
+ *
+ */
+static void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target 
*ti,
+  const char *path, unsigned nr_valid_paths)
+{
+   struct mapped_device *md = dm_table_get_md(ti->table);
+   struct dm_uevent *event;
+
+   if (event_type >= ARRAY_SIZE(_dm_mpath_uevent_type_names)) {
+   DMERR("%s: Invalid event_type %d", __func__, event_type);
+   return;
+   }
+
+   event = dm_build_path_uevent(md, ti,
+
_dm_mpath_uevent_type_names[event_type].action,
+
_dm_mpath_uevent_type_names[event_type].name,
+path, nr_valid_paths);
+   if (IS_ERR(event))
+   return;
+
+   dm_uevent_add(md, >elist);
+}
+
 /*
  * Take a path out of use.
  */
diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c
index 2d5f858..f7089dd 100644
--- a/drivers/md/dm-uevent.c
+++ b/drivers/md/dm-uevent.c
@@ -29,30 +29,13 @@
 
 #define DM_MSG_PREFIX "uevent"
 
-static const struct {
-   enum dm_uevent_type type;
-   enum kobject_action action;
-   char *name;
-} _dm_uevent_type_names[] = {
-   {DM_UEVENT_PATH_FAILED, KOBJ_CHANGE, "PATH_FAILED"},
-   {DM_UEVENT_PATH_REINSTATED, KOBJ_CHANGE, "PATH_REINSTATED"},
-};
-
 static struct kmem_cache *_dm_event_cache;
 
-struct dm_uevent {
-   struct mapped_device *md;
-   enum kobject_action action;
-   struct kobj_uevent_env ku_env;
-   struct list_head elist;
-   char name[DM_NAME_LEN];
-   char uuid[DM_UUID_LEN];
-};
-
-static void dm_uevent_free(struct dm_uevent *event)
+void dm_uevent_free(struct dm_uevent *event)
 {
kmem_cache_free(_dm_event_cache, event);
 }
+EXPORT_SYMBOL_GPL(dm_uevent_free);
 
 static struct dm_uevent *dm_uevent_alloc(struct mapped_device *md)
 {
@@ -68,12 +51,10 @@ static struct dm_uevent *dm_uevent_alloc(struct 
mapped_device *md)
return event;
 }
 
-static st

[PATCH 4/9] dm: Update dm-uevent.txt

2016-10-03 Thread Andy Grover

Document the current dm uevent API, as modified by this patchset.

Signed-off-by: Andy Grover 
---
 Documentation/device-mapper/dm-uevent.txt | 49 ++-
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/Documentation/device-mapper/dm-uevent.txt 
b/Documentation/device-mapper/dm-uevent.txt
index 07edbd85..0a5d909 100644
--- a/Documentation/device-mapper/dm-uevent.txt
+++ b/Documentation/device-mapper/dm-uevent.txt
@@ -1,18 +1,53 @@
 The device-mapper uevent code adds the capability to device-mapper to create
 and send kobject uevents (uevents).  Previously device-mapper events were only
-available through the ioctl interface.  The advantage of the uevents interface
-is the event contains environment attributes providing increased context for
+available through the ioctl interface.  The advantages of the uevents interface
+are the event contains environment attributes providing increased context for
 the event avoiding the need to query the state of the device-mapper device 
after
-the event is received.
+the event is received, and uevents are poll()-able in userspace, whereas the
+ioctl event interface is not.
 
-There are two functions currently for device-mapper events.  The first function
-listed creates the event and the second function sends the event(s).
+There are five functions DM targets should use.
 
-void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti,
-const char *path, unsigned nr_valid_paths)
+struct dm_uevent *dm_build_uevent(struct mapped_device *md,
+struct dm_target *ti,
+enum kobject_action action,
+const char *dm_action)
+
+Construct a dm uevent with default DM variables attached(DM_TARGET,
+DM_ACTION, DM_SEQNUM).
+
+int add_uevent_var(struct kobj_uevent_env *env, const char *format, ...)
+
+Optionally add event-specific data to the generated uevent. e.g. dm-mpath's
+PATH_FAILED and PATH_REINSTATED uevents add path and number-of-remaining-paths
+vars.
+
+void dm_uevent_add(struct mapped_device *md, struct list_head *elist)
+
+Hand off the uevent to the device's list of pending events.
+
+void dm_uevent_free(struct dm_uevent *event)
+
+Something went wrong, free the uevent instead of queueing it.
+
+void dm_table_event(struct dm_table *t)
+
+Triggers sending of queued uevents as well as waking up processes waiting on
+the ioctl.
+
+
+The device-mapper uevent code also supplies three functions for device-mapper 
core
+to call:
+
+int dm_uevent_init(void)
+void dm_uevent_exit(void)
+
+Setup and teardown.
 
 void dm_send_uevents(struct list_head *events, struct kobject *kobj)
 
+Actually send the uevents (called indirectly from dm_table_event, above).
+
 
 The variables added to the uevent environment are:
 
-- 
2.7.4

[PATCH 5/9] dm: Rename dm_build_uevent to dm_uevent_build

2016-10-03 Thread Andy Grover

For consistency with other function names that start with 'dm_uevent'.

Signed-off-by: Andy Grover <agro...@redhat.com>
---
 Documentation/device-mapper/dm-uevent.txt | 2 +-
 drivers/md/dm-mpath.c | 4 ++--
 drivers/md/dm-uevent.c| 4 ++--
 drivers/md/dm-uevent.h| 4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Documentation/device-mapper/dm-uevent.txt 
b/Documentation/device-mapper/dm-uevent.txt
index 0a5d909..5780e76 100644
--- a/Documentation/device-mapper/dm-uevent.txt
+++ b/Documentation/device-mapper/dm-uevent.txt
@@ -8,7 +8,7 @@ ioctl event interface is not.
 
 There are five functions DM targets should use.
 
-struct dm_uevent *dm_build_uevent(struct mapped_device *md,
+struct dm_uevent *dm_uevent_build(struct mapped_device *md,
 struct dm_target *ti,
 enum kobject_action action,
 const char *dm_action)
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 75fe28a..26f99d0 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1261,12 +1261,12 @@ static void dm_path_uevent(enum dm_uevent_type 
event_type, struct dm_target *ti,
return;
}
 
-   event = dm_build_uevent(md,
+   event = dm_uevent_build(md,
ti,
_dm_mpath_uevent_type_names[event_type].action,
_dm_mpath_uevent_type_names[event_type].name);
if (IS_ERR(event)) {
-   DMERR("%s: dm_build_uevent() failed", __func__);
+   DMERR("%s: dm_uevent_build() failed", __func__);
return;
}
 
diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c
index f7089dd..6a72de7 100644
--- a/drivers/md/dm-uevent.c
+++ b/drivers/md/dm-uevent.c
@@ -51,7 +51,7 @@ static struct dm_uevent *dm_uevent_alloc(struct mapped_device 
*md)
return event;
 }
 
-struct dm_uevent *dm_build_uevent(struct mapped_device *md,
+struct dm_uevent *dm_uevent_build(struct mapped_device *md,
 struct dm_target *ti,
 enum kobject_action action,
 const char *dm_action)
@@ -92,7 +92,7 @@ err_add:
 err_nomem:
return ERR_PTR(-ENOMEM);
 }
-EXPORT_SYMBOL_GPL(dm_build_uevent);
+EXPORT_SYMBOL_GPL(dm_uevent_build);
 
 /**
  * dm_send_uevents - send uevents for given list
diff --git a/drivers/md/dm-uevent.h b/drivers/md/dm-uevent.h
index 4ff2ad1..4dc99c2 100644
--- a/drivers/md/dm-uevent.h
+++ b/drivers/md/dm-uevent.h
@@ -42,7 +42,7 @@ enum dm_uevent_type {
 extern int dm_uevent_init(void);
 extern void dm_uevent_exit(void);
 extern void dm_send_uevents(struct list_head *events, struct kobject *kobj);
-struct dm_uevent *dm_build_uevent(struct mapped_device *md,
+struct dm_uevent *dm_uevent_build(struct mapped_device *md,
 struct dm_target *ti,
 enum kobject_action action,
 const char *dm_action);
@@ -61,7 +61,7 @@ static inline void dm_send_uevents(struct list_head *events,
   struct kobject *kobj)
 {
 }
-static inline struct dm_uevent *dm_build_uevent(struct mapped_device *md,
+static inline struct dm_uevent *dm_uevent_build(struct mapped_device *md,
struct dm_target *ti,
enum kobject_action action,
const char *dm_action)
-- 
2.7.4

[PATCH 9/9] dm: Generate uevents for other targets

2016-10-03 Thread Andy Grover

Generate uevents for other targets: cache, log, raid1, and snap.

Signed-off-by: Andy Grover <agro...@redhat.com>
---
 drivers/md/dm-cache-target.c   | 5 -
 drivers/md/dm-log-userspace-base.c | 8 ++--
 drivers/md/dm-log.c| 1 +
 drivers/md/dm-raid1.c  | 1 +
 drivers/md/dm-snap.c   | 1 +
 5 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 59b2c50..29f0bc9 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -586,8 +586,10 @@ static void clear_dirty(struct cache *cache, dm_oblock_t 
oblock, dm_cblock_t cbl
 {
if (test_and_clear_bit(from_cblock(cblock), cache->dirty_bitset)) {
policy_clear_dirty(cache->policy, oblock);
-   if (atomic_dec_return(>nr_dirty) == 0)
+   if (atomic_dec_return(>nr_dirty) == 0) {
+   dm_uevent_add(cache->ti, KOBJ_CHANGE, "CACHE_CLEAN");
dm_table_event(cache->ti->table);
+   }
}
 }
 
@@ -978,6 +980,7 @@ static void notify_mode_switch(struct cache *cache, enum 
cache_metadata_mode mod
"fail"
};
 
+   dm_uevent_add(cache->ti, KOBJ_CHANGE, "CACHE_MODE_SWITCH");
dm_table_event(cache->ti->table);
DMINFO("%s: switching cache to %s mode",
   cache_device_name(cache), descs[(int)mode]);
diff --git a/drivers/md/dm-log-userspace-base.c 
b/drivers/md/dm-log-userspace-base.c
index 53b7b06d..cb0b6cb 100644
--- a/drivers/md/dm-log-userspace-base.c
+++ b/drivers/md/dm-log-userspace-base.c
@@ -162,8 +162,10 @@ static void do_flush(struct work_struct *work)
 
r = userspace_do_request(lc, lc->uuid, DM_ULOG_FLUSH, NULL, 0, NULL, 
NULL);
 
-   if (r)
+   if (r) {
+   dm_uevent_add(lc->ti->table, KOBJ_CHANGE, "LOG_FLUSHED");
dm_table_event(lc->ti->table);
+   }
 }
 
 /*
@@ -634,8 +636,10 @@ out:
mempool_free(fe, flush_entry_pool);
}
 
-   if (r)
+   if (r) {
+   dm_uevent_add(lc->ti->table, KOBJ_CHANGE, "LOG_FLUSHED");
dm_table_event(lc->ti->table);
+   }
 
return r;
 }
diff --git a/drivers/md/dm-log.c b/drivers/md/dm-log.c
index 07fc1ad..8d144d2 100644
--- a/drivers/md/dm-log.c
+++ b/drivers/md/dm-log.c
@@ -579,6 +579,7 @@ static void fail_log_device(struct log_c *lc)
return;
 
lc->log_dev_failed = 1;
+   dm_uevent_add(lc->ti, KOBJ_CHANGE, "LOG_FAILED");
dm_table_event(lc->ti->table);
 }
 
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index bdf1606..daffbab 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -410,6 +410,7 @@ static void do_recovery(struct mirror_set *ms)
if (!ms->in_sync &&
(log->type->get_sync_count(log) == ms->nr_regions)) {
/* the sync is complete */
+   dm_uevent_add(ms->ti, KOBJ_CHANGE, "TABLE_SYNCED");
dm_table_event(ms->ti->table);
ms->in_sync = 1;
reset_ms_flags(ms);
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index c65feea..b47e28f6 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1443,6 +1443,7 @@ static void __invalidate_snapshot(struct dm_snapshot *s, 
int err)
 
s->valid = 0;
 
+   dm_uevent_add(s->ti, KOBJ_CHANGE, "SNAPSHOT_INVALIDATED");
dm_table_event(s->ti->table);
 }
 
-- 
2.7.4

[PATCH 7/9] dm: Implement dm_uevent_add()

2016-10-03 Thread Andy Grover

This helper function builds and enqueues the uevent. Targets can use this
when they do not need to customize the uevent beyond its name.

Signed-off-by: Andy Grover <agro...@redhat.com>
---
 Documentation/device-mapper/dm-uevent.txt |  5 +
 drivers/md/dm-uevent.c| 26 ++
 include/linux/device-mapper.h |  1 +
 3 files changed, 32 insertions(+)

diff --git a/Documentation/device-mapper/dm-uevent.txt 
b/Documentation/device-mapper/dm-uevent.txt
index 4fcb71d..097e7f4 100644
--- a/Documentation/device-mapper/dm-uevent.txt
+++ b/Documentation/device-mapper/dm-uevent.txt
@@ -30,6 +30,11 @@ void dm_uevent_free(struct dm_uevent *event)
 
 Something went wrong, free the uevent instead of queueing it.
 
+void dm_uevent_add(struct dm_target *ti, enum kobject_action action, char 
*name)
+
+Helper function that builds and queues a uevent if no additional
+variables need to be added.
+
 void dm_table_event(struct dm_table *t)
 
 Triggers sending of queued uevents as well as waking up processes waiting on
diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c
index 6a72de7..d139135 100644
--- a/drivers/md/dm-uevent.c
+++ b/drivers/md/dm-uevent.c
@@ -95,6 +95,32 @@ err_nomem:
 EXPORT_SYMBOL_GPL(dm_uevent_build);
 
 /**
+ * dm_uevent_add - Create and queue a basic uevent
+ *
+ * @ti:The target to add a uevent for
+ * @action:The kobject action
+ * @name:  The name of the uevent to add and queue
+ *
+ * If targets wish to create and queue a uevent but don't need to add
+ * extra vars, they can do so more easily by calling this.
+ * This is usually followed by a call to dm_table_event().
+ */
+void dm_uevent_add(struct dm_target *ti, enum kobject_action action, char 
*name)
+{
+   struct mapped_device *md = dm_table_get_md(ti->table);
+   struct dm_uevent *event;
+
+   event = dm_uevent_build(md, ti, action, name);
+   if (IS_ERR(event)) {
+   DMERR("%s: dm_uevent_build() failed", __func__);
+   return;
+   }
+
+   dm_uevent_queue(md, >elist);
+}
+EXPORT_SYMBOL_GPL(dm_uevent_add);
+
+/**
  * dm_send_uevents - send uevents for given list
  *
  * @events:list of events to send
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 024207c..c102e29 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -420,6 +420,7 @@ uint32_t dm_get_event_nr(struct mapped_device *md);
 int dm_wait_event(struct mapped_device *md, int event_nr);
 uint32_t dm_next_uevent_seq(struct mapped_device *md);
 void dm_uevent_queue(struct mapped_device *md, struct list_head *elist);
+void dm_uevent_add(struct dm_target *ti, enum kobject_action action, char 
*name);
 
 /*
  * Info functions.
-- 
2.7.4

[PATCH 9/9] dm: Generate uevents for other targets

2016-10-03 Thread Andy Grover

Generate uevents for other targets: cache, log, raid1, and snap.

Signed-off-by: Andy Grover 
---
 drivers/md/dm-cache-target.c   | 5 -
 drivers/md/dm-log-userspace-base.c | 8 ++--
 drivers/md/dm-log.c| 1 +
 drivers/md/dm-raid1.c  | 1 +
 drivers/md/dm-snap.c   | 1 +
 5 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 59b2c50..29f0bc9 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -586,8 +586,10 @@ static void clear_dirty(struct cache *cache, dm_oblock_t 
oblock, dm_cblock_t cbl
 {
if (test_and_clear_bit(from_cblock(cblock), cache->dirty_bitset)) {
policy_clear_dirty(cache->policy, oblock);
-   if (atomic_dec_return(>nr_dirty) == 0)
+   if (atomic_dec_return(>nr_dirty) == 0) {
+   dm_uevent_add(cache->ti, KOBJ_CHANGE, "CACHE_CLEAN");
dm_table_event(cache->ti->table);
+   }
}
 }
 
@@ -978,6 +980,7 @@ static void notify_mode_switch(struct cache *cache, enum 
cache_metadata_mode mod
"fail"
};
 
+   dm_uevent_add(cache->ti, KOBJ_CHANGE, "CACHE_MODE_SWITCH");
dm_table_event(cache->ti->table);
DMINFO("%s: switching cache to %s mode",
   cache_device_name(cache), descs[(int)mode]);
diff --git a/drivers/md/dm-log-userspace-base.c 
b/drivers/md/dm-log-userspace-base.c
index 53b7b06d..cb0b6cb 100644
--- a/drivers/md/dm-log-userspace-base.c
+++ b/drivers/md/dm-log-userspace-base.c
@@ -162,8 +162,10 @@ static void do_flush(struct work_struct *work)
 
r = userspace_do_request(lc, lc->uuid, DM_ULOG_FLUSH, NULL, 0, NULL, 
NULL);
 
-   if (r)
+   if (r) {
+   dm_uevent_add(lc->ti->table, KOBJ_CHANGE, "LOG_FLUSHED");
dm_table_event(lc->ti->table);
+   }
 }
 
 /*
@@ -634,8 +636,10 @@ out:
mempool_free(fe, flush_entry_pool);
}
 
-   if (r)
+   if (r) {
+   dm_uevent_add(lc->ti->table, KOBJ_CHANGE, "LOG_FLUSHED");
dm_table_event(lc->ti->table);
+   }
 
return r;
 }
diff --git a/drivers/md/dm-log.c b/drivers/md/dm-log.c
index 07fc1ad..8d144d2 100644
--- a/drivers/md/dm-log.c
+++ b/drivers/md/dm-log.c
@@ -579,6 +579,7 @@ static void fail_log_device(struct log_c *lc)
return;
 
lc->log_dev_failed = 1;
+   dm_uevent_add(lc->ti, KOBJ_CHANGE, "LOG_FAILED");
dm_table_event(lc->ti->table);
 }
 
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index bdf1606..daffbab 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -410,6 +410,7 @@ static void do_recovery(struct mirror_set *ms)
if (!ms->in_sync &&
(log->type->get_sync_count(log) == ms->nr_regions)) {
/* the sync is complete */
+   dm_uevent_add(ms->ti, KOBJ_CHANGE, "TABLE_SYNCED");
dm_table_event(ms->ti->table);
ms->in_sync = 1;
reset_ms_flags(ms);
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index c65feea..b47e28f6 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1443,6 +1443,7 @@ static void __invalidate_snapshot(struct dm_snapshot *s, 
int err)
 
s->valid = 0;
 
+   dm_uevent_add(s->ti, KOBJ_CHANGE, "SNAPSHOT_INVALIDATED");
dm_table_event(s->ti->table);
 }
 
-- 
2.7.4

[PATCH 7/9] dm: Implement dm_uevent_add()

2016-10-03 Thread Andy Grover

This helper function builds and enqueues the uevent. Targets can use this
when they do not need to customize the uevent beyond its name.

Signed-off-by: Andy Grover 
---
 Documentation/device-mapper/dm-uevent.txt |  5 +
 drivers/md/dm-uevent.c| 26 ++
 include/linux/device-mapper.h |  1 +
 3 files changed, 32 insertions(+)

diff --git a/Documentation/device-mapper/dm-uevent.txt 
b/Documentation/device-mapper/dm-uevent.txt
index 4fcb71d..097e7f4 100644
--- a/Documentation/device-mapper/dm-uevent.txt
+++ b/Documentation/device-mapper/dm-uevent.txt
@@ -30,6 +30,11 @@ void dm_uevent_free(struct dm_uevent *event)
 
 Something went wrong, free the uevent instead of queueing it.
 
+void dm_uevent_add(struct dm_target *ti, enum kobject_action action, char 
*name)
+
+Helper function that builds and queues a uevent if no additional
+variables need to be added.
+
 void dm_table_event(struct dm_table *t)
 
 Triggers sending of queued uevents as well as waking up processes waiting on
diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c
index 6a72de7..d139135 100644
--- a/drivers/md/dm-uevent.c
+++ b/drivers/md/dm-uevent.c
@@ -95,6 +95,32 @@ err_nomem:
 EXPORT_SYMBOL_GPL(dm_uevent_build);
 
 /**
+ * dm_uevent_add - Create and queue a basic uevent
+ *
+ * @ti:The target to add a uevent for
+ * @action:The kobject action
+ * @name:  The name of the uevent to add and queue
+ *
+ * If targets wish to create and queue a uevent but don't need to add
+ * extra vars, they can do so more easily by calling this.
+ * This is usually followed by a call to dm_table_event().
+ */
+void dm_uevent_add(struct dm_target *ti, enum kobject_action action, char 
*name)
+{
+   struct mapped_device *md = dm_table_get_md(ti->table);
+   struct dm_uevent *event;
+
+   event = dm_uevent_build(md, ti, action, name);
+   if (IS_ERR(event)) {
+   DMERR("%s: dm_uevent_build() failed", __func__);
+   return;
+   }
+
+   dm_uevent_queue(md, >elist);
+}
+EXPORT_SYMBOL_GPL(dm_uevent_add);
+
+/**
  * dm_send_uevents - send uevents for given list
  *
  * @events:list of events to send
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 024207c..c102e29 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -420,6 +420,7 @@ uint32_t dm_get_event_nr(struct mapped_device *md);
 int dm_wait_event(struct mapped_device *md, int event_nr);
 uint32_t dm_next_uevent_seq(struct mapped_device *md);
 void dm_uevent_queue(struct mapped_device *md, struct list_head *elist);
+void dm_uevent_add(struct dm_target *ti, enum kobject_action action, char 
*name);
 
 /*
  * Info functions.
-- 
2.7.4

[PATCH 5/9] dm: Rename dm_build_uevent to dm_uevent_build

2016-10-03 Thread Andy Grover

For consistency with other function names that start with 'dm_uevent'.

Signed-off-by: Andy Grover 
---
 Documentation/device-mapper/dm-uevent.txt | 2 +-
 drivers/md/dm-mpath.c | 4 ++--
 drivers/md/dm-uevent.c| 4 ++--
 drivers/md/dm-uevent.h| 4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Documentation/device-mapper/dm-uevent.txt 
b/Documentation/device-mapper/dm-uevent.txt
index 0a5d909..5780e76 100644
--- a/Documentation/device-mapper/dm-uevent.txt
+++ b/Documentation/device-mapper/dm-uevent.txt
@@ -8,7 +8,7 @@ ioctl event interface is not.
 
 There are five functions DM targets should use.
 
-struct dm_uevent *dm_build_uevent(struct mapped_device *md,
+struct dm_uevent *dm_uevent_build(struct mapped_device *md,
 struct dm_target *ti,
 enum kobject_action action,
 const char *dm_action)
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 75fe28a..26f99d0 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1261,12 +1261,12 @@ static void dm_path_uevent(enum dm_uevent_type 
event_type, struct dm_target *ti,
return;
}
 
-   event = dm_build_uevent(md,
+   event = dm_uevent_build(md,
ti,
_dm_mpath_uevent_type_names[event_type].action,
_dm_mpath_uevent_type_names[event_type].name);
if (IS_ERR(event)) {
-   DMERR("%s: dm_build_uevent() failed", __func__);
+   DMERR("%s: dm_uevent_build() failed", __func__);
return;
}
 
diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c
index f7089dd..6a72de7 100644
--- a/drivers/md/dm-uevent.c
+++ b/drivers/md/dm-uevent.c
@@ -51,7 +51,7 @@ static struct dm_uevent *dm_uevent_alloc(struct mapped_device 
*md)
return event;
 }
 
-struct dm_uevent *dm_build_uevent(struct mapped_device *md,
+struct dm_uevent *dm_uevent_build(struct mapped_device *md,
 struct dm_target *ti,
 enum kobject_action action,
 const char *dm_action)
@@ -92,7 +92,7 @@ err_add:
 err_nomem:
return ERR_PTR(-ENOMEM);
 }
-EXPORT_SYMBOL_GPL(dm_build_uevent);
+EXPORT_SYMBOL_GPL(dm_uevent_build);
 
 /**
  * dm_send_uevents - send uevents for given list
diff --git a/drivers/md/dm-uevent.h b/drivers/md/dm-uevent.h
index 4ff2ad1..4dc99c2 100644
--- a/drivers/md/dm-uevent.h
+++ b/drivers/md/dm-uevent.h
@@ -42,7 +42,7 @@ enum dm_uevent_type {
 extern int dm_uevent_init(void);
 extern void dm_uevent_exit(void);
 extern void dm_send_uevents(struct list_head *events, struct kobject *kobj);
-struct dm_uevent *dm_build_uevent(struct mapped_device *md,
+struct dm_uevent *dm_uevent_build(struct mapped_device *md,
 struct dm_target *ti,
 enum kobject_action action,
 const char *dm_action);
@@ -61,7 +61,7 @@ static inline void dm_send_uevents(struct list_head *events,
   struct kobject *kobj)
 {
 }
-static inline struct dm_uevent *dm_build_uevent(struct mapped_device *md,
+static inline struct dm_uevent *dm_uevent_build(struct mapped_device *md,
struct dm_target *ti,
enum kobject_action action,
const char *dm_action)
-- 
2.7.4

[PATCH 8/9] dm: Generate uevents for thin targets

2016-10-03 Thread Andy Grover

Generate uevents when thin pool devices hit data or metadata low water
marks, and when pool mode changes.

Signed-off-by: Andy Grover <agro...@redhat.com>
---
 drivers/md/dm-thin.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index d1c05c1..0778a2a 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1429,6 +1429,7 @@ static void check_low_water_mark(struct pool *pool, 
dm_block_t free_blocks)
spin_lock_irqsave(>lock, flags);
pool->low_water_triggered = true;
spin_unlock_irqrestore(>lock, flags);
+   dm_uevent_add(pool->ti, KOBJ_CHANGE, "THIN_LOW_WATER_DATA");
dm_table_event(pool->ti->table);
}
 }
@@ -2397,6 +2398,7 @@ static enum pool_mode get_pool_mode(struct pool *pool)
 
 static void notify_of_pool_mode_change(struct pool *pool, const char *new_mode)
 {
+   dm_uevent_add(pool->ti, KOBJ_CHANGE, "THIN_POOL_MODE");
dm_table_event(pool->ti->table);
DMINFO("%s: switching pool to %s mode",
   dm_device_name(pool->pool_md), new_mode);
@@ -3095,6 +3097,7 @@ static void metadata_low_callback(void *context)
DMWARN("%s: reached low water mark for metadata device: sending event.",
   dm_device_name(pool->pool_md));
 
+   dm_uevent_add(pool->ti, KOBJ_CHANGE, "THIN_LOW_WATER_METADATA");
dm_table_event(pool->ti->table);
 }
 
-- 
2.7.4

[PATCH 6/9] dm: Rename dm_event_add to dm_event_queue

2016-10-03 Thread Andy Grover

For clarity. This function queues an event to be sent, which is not
totally clear from the previous name. And, we want to reuse the name
for something else.

Signed-off-by: Andy Grover <agro...@redhat.com>
---
 Documentation/device-mapper/dm-uevent.txt | 2 +-
 drivers/md/dm-mpath.c | 2 +-
 drivers/md/dm.c   | 4 ++--
 include/linux/device-mapper.h | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/device-mapper/dm-uevent.txt 
b/Documentation/device-mapper/dm-uevent.txt
index 5780e76..4fcb71d 100644
--- a/Documentation/device-mapper/dm-uevent.txt
+++ b/Documentation/device-mapper/dm-uevent.txt
@@ -22,7 +22,7 @@ Optionally add event-specific data to the generated uevent. 
e.g. dm-mpath's
 PATH_FAILED and PATH_REINSTATED uevents add path and number-of-remaining-paths
 vars.
 
-void dm_uevent_add(struct mapped_device *md, struct list_head *elist)
+void dm_uevent_queue(struct mapped_device *md, struct list_head *elist)
 
 Hand off the uevent to the device's list of pending events.
 
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 26f99d0..467724c 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1282,7 +1282,7 @@ static void dm_path_uevent(enum dm_uevent_type 
event_type, struct dm_target *ti,
goto err;
}
 
-   dm_uevent_add(md, >elist);
+   dm_uevent_queue(md, >elist);
return;
 err:
dm_uevent_free(event);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 701c75f..37b3a82 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2419,7 +2419,7 @@ int dm_wait_event(struct mapped_device *md, int event_nr)
(event_nr != atomic_read(>event_nr)));
 }
 
-void dm_uevent_add(struct mapped_device *md, struct list_head *elist)
+void dm_uevent_queue(struct mapped_device *md, struct list_head *elist)
 {
unsigned long flags;
 
@@ -2427,7 +2427,7 @@ void dm_uevent_add(struct mapped_device *md, struct 
list_head *elist)
list_add(elist, >uevent_list);
spin_unlock_irqrestore(>uevent_lock, flags);
 }
-EXPORT_SYMBOL_GPL(dm_uevent_add);
+EXPORT_SYMBOL_GPL(dm_uevent_queue);
 
 /*
  * The gendisk is only valid as long as you have a reference
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 91acfce..024207c 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -419,7 +419,7 @@ int dm_resume(struct mapped_device *md);
 uint32_t dm_get_event_nr(struct mapped_device *md);
 int dm_wait_event(struct mapped_device *md, int event_nr);
 uint32_t dm_next_uevent_seq(struct mapped_device *md);
-void dm_uevent_add(struct mapped_device *md, struct list_head *elist);
+void dm_uevent_queue(struct mapped_device *md, struct list_head *elist);
 
 /*
  * Info functions.
-- 
2.7.4

[PATCH 8/9] dm: Generate uevents for thin targets

2016-10-03 Thread Andy Grover

Generate uevents when thin pool devices hit data or metadata low water
marks, and when pool mode changes.

Signed-off-by: Andy Grover 
---
 drivers/md/dm-thin.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index d1c05c1..0778a2a 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1429,6 +1429,7 @@ static void check_low_water_mark(struct pool *pool, 
dm_block_t free_blocks)
spin_lock_irqsave(>lock, flags);
pool->low_water_triggered = true;
spin_unlock_irqrestore(>lock, flags);
+   dm_uevent_add(pool->ti, KOBJ_CHANGE, "THIN_LOW_WATER_DATA");
dm_table_event(pool->ti->table);
}
 }
@@ -2397,6 +2398,7 @@ static enum pool_mode get_pool_mode(struct pool *pool)
 
 static void notify_of_pool_mode_change(struct pool *pool, const char *new_mode)
 {
+   dm_uevent_add(pool->ti, KOBJ_CHANGE, "THIN_POOL_MODE");
dm_table_event(pool->ti->table);
DMINFO("%s: switching pool to %s mode",
   dm_device_name(pool->pool_md), new_mode);
@@ -3095,6 +3097,7 @@ static void metadata_low_callback(void *context)
DMWARN("%s: reached low water mark for metadata device: sending event.",
   dm_device_name(pool->pool_md));
 
+   dm_uevent_add(pool->ti, KOBJ_CHANGE, "THIN_LOW_WATER_METADATA");
dm_table_event(pool->ti->table);
 }
 
-- 
2.7.4

[PATCH 6/9] dm: Rename dm_event_add to dm_event_queue

2016-10-03 Thread Andy Grover

For clarity. This function queues an event to be sent, which is not
totally clear from the previous name. And, we want to reuse the name
for something else.

Signed-off-by: Andy Grover 
---
 Documentation/device-mapper/dm-uevent.txt | 2 +-
 drivers/md/dm-mpath.c | 2 +-
 drivers/md/dm.c   | 4 ++--
 include/linux/device-mapper.h | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/device-mapper/dm-uevent.txt 
b/Documentation/device-mapper/dm-uevent.txt
index 5780e76..4fcb71d 100644
--- a/Documentation/device-mapper/dm-uevent.txt
+++ b/Documentation/device-mapper/dm-uevent.txt
@@ -22,7 +22,7 @@ Optionally add event-specific data to the generated uevent. 
e.g. dm-mpath's
 PATH_FAILED and PATH_REINSTATED uevents add path and number-of-remaining-paths
 vars.
 
-void dm_uevent_add(struct mapped_device *md, struct list_head *elist)
+void dm_uevent_queue(struct mapped_device *md, struct list_head *elist)
 
 Hand off the uevent to the device's list of pending events.
 
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 26f99d0..467724c 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1282,7 +1282,7 @@ static void dm_path_uevent(enum dm_uevent_type 
event_type, struct dm_target *ti,
goto err;
}
 
-   dm_uevent_add(md, >elist);
+   dm_uevent_queue(md, >elist);
return;
 err:
dm_uevent_free(event);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 701c75f..37b3a82 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2419,7 +2419,7 @@ int dm_wait_event(struct mapped_device *md, int event_nr)
(event_nr != atomic_read(>event_nr)));
 }
 
-void dm_uevent_add(struct mapped_device *md, struct list_head *elist)
+void dm_uevent_queue(struct mapped_device *md, struct list_head *elist)
 {
unsigned long flags;
 
@@ -2427,7 +2427,7 @@ void dm_uevent_add(struct mapped_device *md, struct 
list_head *elist)
list_add(elist, >uevent_list);
spin_unlock_irqrestore(>uevent_lock, flags);
 }
-EXPORT_SYMBOL_GPL(dm_uevent_add);
+EXPORT_SYMBOL_GPL(dm_uevent_queue);
 
 /*
  * The gendisk is only valid as long as you have a reference
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 91acfce..024207c 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -419,7 +419,7 @@ int dm_resume(struct mapped_device *md);
 uint32_t dm_get_event_nr(struct mapped_device *md);
 int dm_wait_event(struct mapped_device *md, int event_nr);
 uint32_t dm_next_uevent_seq(struct mapped_device *md);
-void dm_uevent_add(struct mapped_device *md, struct list_head *elist);
+void dm_uevent_queue(struct mapped_device *md, struct list_head *elist);
 
 /*
  * Info functions.
-- 
2.7.4

[PATCH 3/9] dm: Inline dm_build_path_uevent into dm_path_uevent

2016-10-03 Thread Andy Grover

Since it's no longer an API boundary we can consolidate these two
functions.

Signed-off-by: Andy Grover <agro...@redhat.com>
---
 drivers/md/dm-mpath.c | 59 +++
 1 file changed, 22 insertions(+), 37 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index c563b6d..75fe28a 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1241,38 +1241,6 @@ static void multipath_dtr(struct dm_target *ti)
free_multipath(m);
 }
 
-static struct dm_uevent *dm_build_path_uevent(struct mapped_device *md,
- struct dm_target *ti,
- enum kobject_action action,
- const char *dm_action,
- const char *path,
- unsigned nr_valid_paths)
-{
-   struct dm_uevent *event;
-
-   event = dm_build_uevent(md, ti, action, dm_action);
-   if (IS_ERR(event))
-   return event;
-
-   if (add_uevent_var(>ku_env, "DM_PATH=%s", path)) {
-   DMERR("%s: add_uevent_var() for DM_PATH failed", __func__);
-   goto err_add;
-   }
-
-   if (add_uevent_var(>ku_env, "DM_NR_VALID_PATHS=%d",
-  nr_valid_paths)) {
-   DMERR("%s: add_uevent_var() for DM_NR_VALID_PATHS failed",
- __func__);
-   goto err_add;
-   }
-
-   return event;
-
-err_add:
-   dm_uevent_free(event);
-   return ERR_PTR(-ENOMEM);
-}
-
 /**
  * dm_path_uevent - called to create a new path event and queue it
  *
@@ -1293,14 +1261,31 @@ static void dm_path_uevent(enum dm_uevent_type 
event_type, struct dm_target *ti,
return;
}
 
-   event = dm_build_path_uevent(md, ti,
-
_dm_mpath_uevent_type_names[event_type].action,
-
_dm_mpath_uevent_type_names[event_type].name,
-path, nr_valid_paths);
-   if (IS_ERR(event))
+   event = dm_build_uevent(md,
+   ti,
+   _dm_mpath_uevent_type_names[event_type].action,
+   _dm_mpath_uevent_type_names[event_type].name);
+   if (IS_ERR(event)) {
+   DMERR("%s: dm_build_uevent() failed", __func__);
return;
+   }
+
+   if (add_uevent_var(>ku_env, "DM_PATH=%s", path)) {
+   DMERR("%s: add_uevent_var() for DM_PATH failed", __func__);
+   goto err;
+   }
+
+   if (add_uevent_var(>ku_env, "DM_NR_VALID_PATHS=%d",
+  nr_valid_paths)) {
+   DMERR("%s: add_uevent_var() for DM_NR_VALID_PATHS failed",
+ __func__);
+   goto err;
+   }
 
dm_uevent_add(md, >elist);
+   return;
+err:
+   dm_uevent_free(event);
 }
 
 /*
-- 
2.7.4

[PATCH 3/9] dm: Inline dm_build_path_uevent into dm_path_uevent

2016-10-03 Thread Andy Grover

Since it's no longer an API boundary we can consolidate these two
functions.

Signed-off-by: Andy Grover 
---
 drivers/md/dm-mpath.c | 59 +++
 1 file changed, 22 insertions(+), 37 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index c563b6d..75fe28a 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1241,38 +1241,6 @@ static void multipath_dtr(struct dm_target *ti)
free_multipath(m);
 }
 
-static struct dm_uevent *dm_build_path_uevent(struct mapped_device *md,
- struct dm_target *ti,
- enum kobject_action action,
- const char *dm_action,
- const char *path,
- unsigned nr_valid_paths)
-{
-   struct dm_uevent *event;
-
-   event = dm_build_uevent(md, ti, action, dm_action);
-   if (IS_ERR(event))
-   return event;
-
-   if (add_uevent_var(>ku_env, "DM_PATH=%s", path)) {
-   DMERR("%s: add_uevent_var() for DM_PATH failed", __func__);
-   goto err_add;
-   }
-
-   if (add_uevent_var(>ku_env, "DM_NR_VALID_PATHS=%d",
-  nr_valid_paths)) {
-   DMERR("%s: add_uevent_var() for DM_NR_VALID_PATHS failed",
- __func__);
-   goto err_add;
-   }
-
-   return event;
-
-err_add:
-   dm_uevent_free(event);
-   return ERR_PTR(-ENOMEM);
-}
-
 /**
  * dm_path_uevent - called to create a new path event and queue it
  *
@@ -1293,14 +1261,31 @@ static void dm_path_uevent(enum dm_uevent_type 
event_type, struct dm_target *ti,
return;
}
 
-   event = dm_build_path_uevent(md, ti,
-
_dm_mpath_uevent_type_names[event_type].action,
-
_dm_mpath_uevent_type_names[event_type].name,
-path, nr_valid_paths);
-   if (IS_ERR(event))
+   event = dm_build_uevent(md,
+   ti,
+   _dm_mpath_uevent_type_names[event_type].action,
+   _dm_mpath_uevent_type_names[event_type].name);
+   if (IS_ERR(event)) {
+   DMERR("%s: dm_build_uevent() failed", __func__);
return;
+   }
+
+   if (add_uevent_var(>ku_env, "DM_PATH=%s", path)) {
+   DMERR("%s: add_uevent_var() for DM_PATH failed", __func__);
+   goto err;
+   }
+
+   if (add_uevent_var(>ku_env, "DM_NR_VALID_PATHS=%d",
+  nr_valid_paths)) {
+   DMERR("%s: add_uevent_var() for DM_NR_VALID_PATHS failed",
+ __func__);
+   goto err;
+   }
 
dm_uevent_add(md, >elist);
+   return;
+err:
+   dm_uevent_free(event);
 }
 
 /*
-- 
2.7.4

[PATCH 0/9] Generate uevents for all DM events

2016-10-03 Thread Andy Grover

Hi Mike and GregKH,

I want a way to get devicemapper events without using the DM ioctl,
because that requires creating a thread to sleep in the ioctl for each
dm device I want events from.

It would seem like using uevents and KOBJ_CHANGE would be a good way
to do this, but Mike said that the uevent maintainers (Greg that's
you?) did not think this was a good idea?

If so, I was hoping you could talk a little more about why -- grep
shows KOBJ_CHANGE used all over, but its usage is not documented in
kobject.txt. In any case I can update kobject.txt with some more
guidelines.

The following patchset may be appliable if you're actually ok with
using KOBJ_CHANGE for dm events, or if not, then I'll look to rework
it to use a dm-specific genetlink approach.

Thanks -- Regards -- Andy

Andy Grover (9):
  dm: Do not export dm_send_uevents
  dm: Move multipath-specific stuff out of dm-uevent.c
  dm: Inline dm_build_path_uevent into dm_path_uevent
  dm: Update dm-uevent.txt
  dm: Rename dm_build_uevent to dm_uevent_build
  dm: Rename dm_event_add to dm_event_queue
  dm: Implement dm_uevent_add()
  dm: Generate uevents for thin targets
  dm: Generate uevents for other targets

 Documentation/device-mapper/dm-uevent.txt |  54 ++--
 drivers/md/dm-cache-target.c  |   5 +-
 drivers/md/dm-log-userspace-base.c|   8 ++-
 drivers/md/dm-log.c   |   1 +
 drivers/md/dm-mpath.c |  56 
 drivers/md/dm-raid1.c |   1 +
 drivers/md/dm-snap.c  |   1 +
 drivers/md/dm-thin.c  |   3 +
 drivers/md/dm-uevent.c| 102 ++
 drivers/md/dm-uevent.h|  30 +++--
 drivers/md/dm.c   |   3 +-
 include/linux/device-mapper.h |   3 +-
 12 files changed, 180 insertions(+), 87 deletions(-)

-- 
2.7.4

[PATCH 0/9] Generate uevents for all DM events

2016-10-03 Thread Andy Grover

Hi Mike and GregKH,

I want a way to get devicemapper events without using the DM ioctl,
because that requires creating a thread to sleep in the ioctl for each
dm device I want events from.

It would seem like using uevents and KOBJ_CHANGE would be a good way
to do this, but Mike said that the uevent maintainers (Greg that's
you?) did not think this was a good idea?

If so, I was hoping you could talk a little more about why -- grep
shows KOBJ_CHANGE used all over, but its usage is not documented in
kobject.txt. In any case I can update kobject.txt with some more
guidelines.

The following patchset may be appliable if you're actually ok with
using KOBJ_CHANGE for dm events, or if not, then I'll look to rework
it to use a dm-specific genetlink approach.

Thanks -- Regards -- Andy

Andy Grover (9):
  dm: Do not export dm_send_uevents
  dm: Move multipath-specific stuff out of dm-uevent.c
  dm: Inline dm_build_path_uevent into dm_path_uevent
  dm: Update dm-uevent.txt
  dm: Rename dm_build_uevent to dm_uevent_build
  dm: Rename dm_event_add to dm_event_queue
  dm: Implement dm_uevent_add()
  dm: Generate uevents for thin targets
  dm: Generate uevents for other targets

 Documentation/device-mapper/dm-uevent.txt |  54 ++--
 drivers/md/dm-cache-target.c  |   5 +-
 drivers/md/dm-log-userspace-base.c|   8 ++-
 drivers/md/dm-log.c   |   1 +
 drivers/md/dm-mpath.c |  56 
 drivers/md/dm-raid1.c |   1 +
 drivers/md/dm-snap.c  |   1 +
 drivers/md/dm-thin.c  |   3 +
 drivers/md/dm-uevent.c| 102 ++
 drivers/md/dm-uevent.h|  30 +++--
 drivers/md/dm.c   |   3 +-
 include/linux/device-mapper.h |   3 +-
 12 files changed, 180 insertions(+), 87 deletions(-)

-- 
2.7.4

[PATCH 1/9] dm: Do not export dm_send_uevents

2016-10-03 Thread Andy Grover

Since dm-uevent.c (if CONFIG_DM_UEVENT) is part of the same module as where
dm_sent_uevents is called, it does not need to be exported.

Signed-off-by: Andy Grover <agro...@redhat.com>
---
 drivers/md/dm-uevent.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c
index 8efe033..2d5f858 100644
--- a/drivers/md/dm-uevent.c
+++ b/drivers/md/dm-uevent.c
@@ -169,7 +169,6 @@ uevent_free:
dm_uevent_free(event);
}
 }
-EXPORT_SYMBOL_GPL(dm_send_uevents);
 
 /**
  * dm_path_uevent - called to create a new path event and queue it
-- 
2.7.4

[PATCH 1/9] dm: Do not export dm_send_uevents

2016-10-03 Thread Andy Grover

Since dm-uevent.c (if CONFIG_DM_UEVENT) is part of the same module as where
dm_sent_uevents is called, it does not need to be exported.

Signed-off-by: Andy Grover 
---
 drivers/md/dm-uevent.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/md/dm-uevent.c b/drivers/md/dm-uevent.c
index 8efe033..2d5f858 100644
--- a/drivers/md/dm-uevent.c
+++ b/drivers/md/dm-uevent.c
@@ -169,7 +169,6 @@ uevent_free:
dm_uevent_free(event);
}
 }
-EXPORT_SYMBOL_GPL(dm_send_uevents);
 
 /**
  * dm_path_uevent - called to create a new path event and queue it
-- 
2.7.4

Re: NVMe over Fabrics target implementation

2016-06-07 Thread Andy Grover


On 06/06/2016 11:23 PM, Nicholas A. Bellinger wrote:

Hi HCH & Co,

On Mon, 2016-06-06 at 23:22 +0200, Christoph Hellwig wrote:

This patch set adds a generic NVMe over Fabrics target. The
implementation conforms to the NVMe 1.2b specification (which
includes Fabrics) and provides the NVMe over Fabrics access
to Linux block devices.



Thanks for all of the development work by the fabric_linux_driver team
(HCH, Sagi, Ming, James F., James S., and Dave M.) over the last year.

Very excited to see this code get a public release now that NVMf
specification is out.  Now that it's in the wild, it's a good
opportunity to discuss some of the more interesting implementation
details, beyond the new NVMf wire-protocol itself.


I'm sorry but I missed it, can you repeat the link to the NVMe spec(s)?

Thanks -- Andy

Re: NVMe over Fabrics target implementation

2016-06-07 Thread Andy Grover


On 06/06/2016 11:23 PM, Nicholas A. Bellinger wrote:

Hi HCH & Co,

On Mon, 2016-06-06 at 23:22 +0200, Christoph Hellwig wrote:

This patch set adds a generic NVMe over Fabrics target. The
implementation conforms to the NVMe 1.2b specification (which
includes Fabrics) and provides the NVMe over Fabrics access
to Linux block devices.



Thanks for all of the development work by the fabric_linux_driver team
(HCH, Sagi, Ming, James F., James S., and Dave M.) over the last year.

Very excited to see this code get a public release now that NVMf
specification is out.  Now that it's in the wild, it's a good
opportunity to discuss some of the more interesting implementation
details, beyond the new NVMf wire-protocol itself.


I'm sorry but I missed it, can you repeat the link to the NVMe spec(s)?

Thanks -- Andy

Re: __might_sleep in uio_read()?

2015-09-03 Thread Andy Grover


On 09/03/2015 12:12 PM, Greg KH wrote:

On Thu, Sep 03, 2015 at 11:47:34AM -0700, Andy Grover wrote:

On 09/03/2015 05:26 AM, Michal Hocko wrote:

On Wed 02-09-15 15:45:10, Andy Grover wrote:

Hi Hans and Greg,

Is this an issue with uio? I swear it didn't used to throw this warning...

Thanks -- Andy

[ 5174.883261] [ cut here ]
[ 5174.883617] WARNING: CPU: 0 PID: 1532 at
/home/agrover/git/kernel/kernel/sched/core.c:7389 __might_sleep+0x7d/0x90()
[ 5174.884407] do not call blocking ops when !TASK_RUNNING; state=1 set at
[] uio_read+0x91/0x170 [uio]


The warning says that the driver is calling copy_to_user with
TASK_INTERRUPTIBLE which is wrong in general because this context can
sleep and a schedule would destroy the state. It doesn't matter here
because the code would break out from the loop regardless of the
copy_to_user return value.

I assume that TASK_INTERRUPTIBLE is necessary before the event_count
check to prevent from wake up races. If that is the case then you can
simply do:
diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 3257d4220d01..7d8959e3833b 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -524,6 +524,7 @@ static ssize_t uio_read(struct file *filep, char __user 
*buf,

event_count = atomic_read(>event);
if (event_count != listener->event_count) {
+   __set_current_state(TASK_RUNNING);
if (copy_to_user(buf, _count, count))
retval = -EFAULT;
else {



This certainly makes the warning go away. If this looks good to everyone
else can we get this change in?


What changed to require this?  Why is this suddenly showing up now?


I'm working on drivers/target/target_core_user.c, a SCSI userspace 
passthrough that was added in 3.18, aka TCMU, which uses uio.


The checks for !TASK_RUNNING were added in 3.19 (8eb23b9f3 and 00845eb96)

...and I'm just getting back to TCMU development after a bit, so maybe 
I'm the first one to call uio_read since those checks were added in 3.19?


-- Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: __might_sleep in uio_read()?

2015-09-03 Thread Andy Grover


On 09/03/2015 05:26 AM, Michal Hocko wrote:

On Wed 02-09-15 15:45:10, Andy Grover wrote:

Hi Hans and Greg,

Is this an issue with uio? I swear it didn't used to throw this warning...

Thanks -- Andy

[ 5174.883261] [ cut here ]
[ 5174.883617] WARNING: CPU: 0 PID: 1532 at
/home/agrover/git/kernel/kernel/sched/core.c:7389 __might_sleep+0x7d/0x90()
[ 5174.884407] do not call blocking ops when !TASK_RUNNING; state=1 set at
[] uio_read+0x91/0x170 [uio]


The warning says that the driver is calling copy_to_user with
TASK_INTERRUPTIBLE which is wrong in general because this context can
sleep and a schedule would destroy the state. It doesn't matter here
because the code would break out from the loop regardless of the
copy_to_user return value.

I assume that TASK_INTERRUPTIBLE is necessary before the event_count
check to prevent from wake up races. If that is the case then you can
simply do:
diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 3257d4220d01..7d8959e3833b 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -524,6 +524,7 @@ static ssize_t uio_read(struct file *filep, char __user 
*buf,

event_count = atomic_read(>event);
if (event_count != listener->event_count) {
+   __set_current_state(TASK_RUNNING);
if (copy_to_user(buf, _count, count))
retval = -EFAULT;
else {



This certainly makes the warning go away. If this looks good to everyone 
else can we get this change in?


Thanks -- Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: __might_sleep in uio_read()?

2015-09-03 Thread Andy Grover


On 09/03/2015 05:26 AM, Michal Hocko wrote:

On Wed 02-09-15 15:45:10, Andy Grover wrote:

Hi Hans and Greg,

Is this an issue with uio? I swear it didn't used to throw this warning...

Thanks -- Andy

[ 5174.883261] [ cut here ]
[ 5174.883617] WARNING: CPU: 0 PID: 1532 at
/home/agrover/git/kernel/kernel/sched/core.c:7389 __might_sleep+0x7d/0x90()
[ 5174.884407] do not call blocking ops when !TASK_RUNNING; state=1 set at
[] uio_read+0x91/0x170 [uio]


The warning says that the driver is calling copy_to_user with
TASK_INTERRUPTIBLE which is wrong in general because this context can
sleep and a schedule would destroy the state. It doesn't matter here
because the code would break out from the loop regardless of the
copy_to_user return value.

I assume that TASK_INTERRUPTIBLE is necessary before the event_count
check to prevent from wake up races. If that is the case then you can
simply do:
diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 3257d4220d01..7d8959e3833b 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -524,6 +524,7 @@ static ssize_t uio_read(struct file *filep, char __user 
*buf,

event_count = atomic_read(>event);
if (event_count != listener->event_count) {
+   __set_current_state(TASK_RUNNING);
if (copy_to_user(buf, _count, count))
retval = -EFAULT;
else {



This certainly makes the warning go away. If this looks good to everyone 
else can we get this change in?


Thanks -- Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: __might_sleep in uio_read()?

2015-09-03 Thread Andy Grover


On 09/03/2015 12:12 PM, Greg KH wrote:

On Thu, Sep 03, 2015 at 11:47:34AM -0700, Andy Grover wrote:

On 09/03/2015 05:26 AM, Michal Hocko wrote:

On Wed 02-09-15 15:45:10, Andy Grover wrote:

Hi Hans and Greg,

Is this an issue with uio? I swear it didn't used to throw this warning...

Thanks -- Andy

[ 5174.883261] [ cut here ]
[ 5174.883617] WARNING: CPU: 0 PID: 1532 at
/home/agrover/git/kernel/kernel/sched/core.c:7389 __might_sleep+0x7d/0x90()
[ 5174.884407] do not call blocking ops when !TASK_RUNNING; state=1 set at
[] uio_read+0x91/0x170 [uio]


The warning says that the driver is calling copy_to_user with
TASK_INTERRUPTIBLE which is wrong in general because this context can
sleep and a schedule would destroy the state. It doesn't matter here
because the code would break out from the loop regardless of the
copy_to_user return value.

I assume that TASK_INTERRUPTIBLE is necessary before the event_count
check to prevent from wake up races. If that is the case then you can
simply do:
diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 3257d4220d01..7d8959e3833b 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -524,6 +524,7 @@ static ssize_t uio_read(struct file *filep, char __user 
*buf,

event_count = atomic_read(>event);
if (event_count != listener->event_count) {
+   __set_current_state(TASK_RUNNING);
if (copy_to_user(buf, _count, count))
retval = -EFAULT;
else {



This certainly makes the warning go away. If this looks good to everyone
else can we get this change in?


What changed to require this?  Why is this suddenly showing up now?


I'm working on drivers/target/target_core_user.c, a SCSI userspace 
passthrough that was added in 3.18, aka TCMU, which uses uio.


The checks for !TASK_RUNNING were added in 3.19 (8eb23b9f3 and 00845eb96)

...and I'm just getting back to TCMU development after a bit, so maybe 
I'm the first one to call uio_read since those checks were added in 3.19?


-- Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

__might_sleep in uio_read()?

2015-09-02 Thread Andy Grover


Hi Hans and Greg,

Is this an issue with uio? I swear it didn't used to throw this warning...

Thanks -- Andy

[ 5174.883261] [ cut here ]
[ 5174.883617] WARNING: CPU: 0 PID: 1532 at 
/home/agrover/git/kernel/kernel/sched/core.c:7389 __might_sleep+0x7d/0x90()
[ 5174.884407] do not call blocking ops when !TASK_RUNNING; state=1 set 
at [] uio_read+0x91/0x170 [uio]
[ 5174.885198] Modules linked in: tcm_loop target_core_user uio 
target_core_pscsi target_core_file target_core_iblock iscsi_target_mod 
target_core_mod uinput fuse nfsv3 nfs_acl nfs lockd grace fscache sunrpc 
microcode i2c_piix4 virtio_balloon i2c_core xfs libcrc32c crc32c_intel 
virtio_net virtio_blk
[ 5174.887351] CPU: 0 PID: 1532 Comm: tcmu-runner Not tainted 4.2.0-rc7+ 
#178
[ 5174.887853] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.8.1-20150318_183358- 04/01/2014
[ 5174.888633]  81a3b870 880045393ca8 817afaae 

[ 5174.889224]  880045393cf8 880045393ce8 8109a846 
880045393cd8
[ 5174.889793]  a02a7150 02dc  
880045008000

[ 5174.890375] Call Trace:
[ 5174.890562]  [] dump_stack+0x4c/0x65
[ 5174.890938]  [] warn_slowpath_common+0x86/0xc0
[ 5174.891388]  [] warn_slowpath_fmt+0x46/0x50
[ 5174.891808]  [] ? uio_read+0x91/0x170 [uio]
[ 5174.892237]  [] ? uio_read+0x91/0x170 [uio]
[ 5174.892653]  [] __might_sleep+0x7d/0x90
[ 5174.893055]  [] __might_fault+0x43/0xa0
[ 5174.893448]  [] ? schedule+0x3e/0x90
[ 5174.893820]  [] uio_read+0x132/0x170 [uio]
[ 5174.894240]  [] ? wake_up_q+0x70/0x70
[ 5174.894620]  [] __vfs_read+0x28/0xe0
[ 5174.894993]  [] ? security_file_permission+0xa3/0xc0
[ 5174.895541]  [] ? rw_verify_area+0x4f/0xf0
[ 5174.896006]  [] vfs_read+0x8a/0x140
[ 5174.896391]  [] ? __schedule+0x425/0xcc0
[ 5174.896788]  [] SyS_read+0x49/0xb0
[ 5174.897160]  [] entry_SYSCALL_64_fastpath+0x12/0x76
[ 5174.897625] ---[ end trace 61765cb39fe48dd5 ]---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

__might_sleep in uio_read()?

2015-09-02 Thread Andy Grover


Hi Hans and Greg,

Is this an issue with uio? I swear it didn't used to throw this warning...

Thanks -- Andy

[ 5174.883261] [ cut here ]
[ 5174.883617] WARNING: CPU: 0 PID: 1532 at 
/home/agrover/git/kernel/kernel/sched/core.c:7389 __might_sleep+0x7d/0x90()
[ 5174.884407] do not call blocking ops when !TASK_RUNNING; state=1 set 
at [] uio_read+0x91/0x170 [uio]
[ 5174.885198] Modules linked in: tcm_loop target_core_user uio 
target_core_pscsi target_core_file target_core_iblock iscsi_target_mod 
target_core_mod uinput fuse nfsv3 nfs_acl nfs lockd grace fscache sunrpc 
microcode i2c_piix4 virtio_balloon i2c_core xfs libcrc32c crc32c_intel 
virtio_net virtio_blk
[ 5174.887351] CPU: 0 PID: 1532 Comm: tcmu-runner Not tainted 4.2.0-rc7+ 
#178
[ 5174.887853] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.8.1-20150318_183358- 04/01/2014
[ 5174.888633]  81a3b870 880045393ca8 817afaae 

[ 5174.889224]  880045393cf8 880045393ce8 8109a846 
880045393cd8
[ 5174.889793]  a02a7150 02dc  
880045008000

[ 5174.890375] Call Trace:
[ 5174.890562]  [] dump_stack+0x4c/0x65
[ 5174.890938]  [] warn_slowpath_common+0x86/0xc0
[ 5174.891388]  [] warn_slowpath_fmt+0x46/0x50
[ 5174.891808]  [] ? uio_read+0x91/0x170 [uio]
[ 5174.892237]  [] ? uio_read+0x91/0x170 [uio]
[ 5174.892653]  [] __might_sleep+0x7d/0x90
[ 5174.893055]  [] __might_fault+0x43/0xa0
[ 5174.893448]  [] ? schedule+0x3e/0x90
[ 5174.893820]  [] uio_read+0x132/0x170 [uio]
[ 5174.894240]  [] ? wake_up_q+0x70/0x70
[ 5174.894620]  [] __vfs_read+0x28/0xe0
[ 5174.894993]  [] ? security_file_permission+0xa3/0xc0
[ 5174.895541]  [] ? rw_verify_area+0x4f/0xf0
[ 5174.896006]  [] vfs_read+0x8a/0x140
[ 5174.896391]  [] ? __schedule+0x425/0xcc0
[ 5174.896788]  [] SyS_read+0x49/0xb0
[ 5174.897160]  [] entry_SYSCALL_64_fastpath+0x12/0x76
[ 5174.897625] ---[ end trace 61765cb39fe48dd5 ]---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 03/12] target/configfs: Convert mappedlun + SCSI MIBs to RCU reader

2015-05-12 Thread Andy Grover


On 05/12/2015 02:25 AM, Nicholas A. Bellinger wrote:

From: Nicholas Bellinger 

This patch converts fabric independent configfs link/unlink to use
RCU read path macros for se_node_acl->lun_entry_hlist access.

It also converts SCSI MIB configfs show attribute code to use
RCU read path macros for se_node_acl->lun_entry_hlist access.

Cc: Hannes Reinecke 
Cc: Christoph Hellwig 
Cc: Sagi Grimberg 
Signed-off-by: Nicholas Bellinger 
---
  drivers/target/target_core_fabric_configfs.c |  35 +++---
  drivers/target/target_core_stat.c| 180 +--
  2 files changed, 110 insertions(+), 105 deletions(-)

diff --git a/drivers/target/target_core_fabric_configfs.c 
b/drivers/target/target_core_fabric_configfs.c
index 6cb4828..0dab6d5 100644
--- a/drivers/target/target_core_fabric_configfs.c
+++ b/drivers/target/target_core_fabric_configfs.c
@@ -123,16 +123,16 @@ static int target_fabric_mappedlun_link(
 * which be will write protected (READ-ONLY) when
 * tpg_1/attrib/demo_mode_write_protect=1
 */
-   spin_lock_irq(>se_lun_nacl->device_list_lock);
-   deve = lacl->se_lun_nacl->device_list[lacl->mapped_lun];
-   if (deve->lun_flags & TRANSPORT_LUNFLAGS_INITIATOR_ACCESS)
+   rcu_read_lock();
+   deve = target_nacl_find_deve(lacl->se_lun_nacl, lacl->mapped_lun);
+   if (deve && deve->lun_flags & TRANSPORT_LUNFLAGS_INITIATOR_ACCESS)


Why do we still need TRANSPORT_LUNFLAGS_INITIATOR_ACCESS? Isn't deve 
being not-NULL enough?


-- Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 01/12] target: Convert se_node_acl->device_list[] to RCU hlist

2015-05-12 Thread Andy Grover


On 05/12/2015 02:25 AM, Nicholas A. Bellinger wrote:

From: Nicholas Bellinger 

This patch converts se_node_acl->device_list[] table for mappedluns
to modern RCU hlist_head usage in order to support an arbitrary number
of node_acl lun mappings.

This includes changes to core_[enable,disable]_device_list_for_node()
rcu_assign_pointer() and invokes call_rcu() for releasing memory, along
with a number of RCU read path conversions in target_core_device.c code.

Required for subsequent conversion of transport_lookup_cmd() to lock-less
RCU read path.

Cc: Hannes Reinecke 
Cc: Christoph Hellwig 
Cc: Sagi Grimberg 
Signed-off-by: Nicholas Bellinger 
---
  drivers/target/target_core_device.c   | 218 +++---
  drivers/target/target_core_internal.h |   1 +
  drivers/target/target_core_tpg.c  |  23 ++--
  include/target/target_core_base.h |   7 +-
  4 files changed, 137 insertions(+), 112 deletions(-)

diff --git a/drivers/target/target_core_device.c 
b/drivers/target/target_core_device.c
index 6e58976..1df14ce 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -198,12 +198,9 @@ struct se_dev_entry *core_get_se_deve_from_rtpi(
struct se_lun *lun;
struct se_port *port;
struct se_portal_group *tpg = nacl->se_tpg;
-   u32 i;
-
-   spin_lock_irq(>device_list_lock);
-   for (i = 0; i < TRANSPORT_MAX_LUNS_PER_TPG; i++) {
-   deve = nacl->device_list[i];

+   rcu_read_lock();
+   hlist_for_each_entry_rcu(deve, >lun_entry_hlist, link) {
if (!(deve->lun_flags & TRANSPORT_LUNFLAGS_INITIATOR_ACCESS))
continue;

@@ -225,11 +222,11 @@ struct se_dev_entry *core_get_se_deve_from_rtpi(
continue;

atomic_inc_mb(>pr_ref_count);
-   spin_unlock_irq(>device_list_lock);
+   rcu_read_unlock();

return deve;
}
-   spin_unlock_irq(>device_list_lock);
+   rcu_read_unlock();

return NULL;
  }
@@ -240,18 +237,12 @@ int core_free_device_list_for_node(
  {
struct se_dev_entry *deve;
struct se_lun *lun;
-   u32 i;
-
-   if (!nacl->device_list)
-   return 0;
-
-   spin_lock_irq(>device_list_lock);
-   for (i = 0; i < TRANSPORT_MAX_LUNS_PER_TPG; i++) {
-   deve = nacl->device_list[i];
+   u32 mapped_lun;

+   rcu_read_lock();
+   hlist_for_each_entry_rcu(deve, >lun_entry_hlist, link) {
if (!(deve->lun_flags & TRANSPORT_LUNFLAGS_INITIATOR_ACCESS))
continue;
-
if (!deve->se_lun) {
pr_err("%s device entries device pointer is"
" NULL, but Initiator has access.\n",
@@ -259,16 +250,14 @@ int core_free_device_list_for_node(
continue;
}
lun = deve->se_lun;
+   mapped_lun = deve->mapped_lun;
+   rcu_read_unlock();

-   spin_unlock_irq(>device_list_lock);
-   core_disable_device_list_for_node(lun, NULL, deve->mapped_lun,
-   TRANSPORT_LUNFLAGS_NO_ACCESS, nacl, tpg);
-   spin_lock_irq(>device_list_lock);
+   core_disable_device_list_for_node(lun, NULL, mapped_lun,
+   TRANSPORT_LUNFLAGS_NO_ACCESS, nacl, 
tpg);
+   rcu_read_lock();
}
-   spin_unlock_irq(>device_list_lock);
-
-   array_free(nacl->device_list, TRANSPORT_MAX_LUNS_PER_TPG);
-   nacl->device_list = NULL;
+   rcu_read_unlock();

return 0;
  }
@@ -280,18 +269,44 @@ void core_update_device_list_access(
  {
struct se_dev_entry *deve;

-   spin_lock_irq(>device_list_lock);
-   deve = nacl->device_list[mapped_lun];
-   if (lun_access & TRANSPORT_LUNFLAGS_READ_WRITE) {
-   deve->lun_flags &= ~TRANSPORT_LUNFLAGS_READ_ONLY;
-   deve->lun_flags |= TRANSPORT_LUNFLAGS_READ_WRITE;
-   } else {
-   deve->lun_flags &= ~TRANSPORT_LUNFLAGS_READ_WRITE;
-   deve->lun_flags |= TRANSPORT_LUNFLAGS_READ_ONLY;
+   spin_lock_irq(>lun_entry_lock);
+   deve = target_nacl_find_deve(nacl, mapped_lun);
+   if (deve) {
+   if (lun_access & TRANSPORT_LUNFLAGS_READ_WRITE) {
+   deve->lun_flags &= ~TRANSPORT_LUNFLAGS_READ_ONLY;
+   deve->lun_flags |= TRANSPORT_LUNFLAGS_READ_WRITE;
+   } else {
+   deve->lun_flags &= ~TRANSPORT_LUNFLAGS_READ_WRITE;
+   deve->lun_flags |= TRANSPORT_LUNFLAGS_READ_ONLY;
+   }
}
-   spin_unlock_irq(>device_list_lock);
+   spin_unlock_irq(>lun_entry_lock);
+
+   synchronize_rcu();
+}
+
+static void target_nacl_deve_callrcu(struct rcu_head *head)
+{
+   struct se_dev_entry *deve = container_of(head, struct

Re: [PATCH 01/12] target: Convert se_node_acl-device_list[] to RCU hlist

2015-05-12 Thread Andy Grover


On 05/12/2015 02:25 AM, Nicholas A. Bellinger wrote:

From: Nicholas Bellinger n...@linux-iscsi.org

This patch converts se_node_acl-device_list[] table for mappedluns
to modern RCU hlist_head usage in order to support an arbitrary number
of node_acl lun mappings.

This includes changes to core_[enable,disable]_device_list_for_node()
rcu_assign_pointer() and invokes call_rcu() for releasing memory, along
with a number of RCU read path conversions in target_core_device.c code.

Required for subsequent conversion of transport_lookup_cmd() to lock-less
RCU read path.

Cc: Hannes Reinecke h...@suse.de
Cc: Christoph Hellwig h...@lst.de
Cc: Sagi Grimberg sa...@mellanox.com
Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org
---
  drivers/target/target_core_device.c   | 218 +++---
  drivers/target/target_core_internal.h |   1 +
  drivers/target/target_core_tpg.c  |  23 ++--
  include/target/target_core_base.h |   7 +-
  4 files changed, 137 insertions(+), 112 deletions(-)

diff --git a/drivers/target/target_core_device.c 
b/drivers/target/target_core_device.c
index 6e58976..1df14ce 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -198,12 +198,9 @@ struct se_dev_entry *core_get_se_deve_from_rtpi(
struct se_lun *lun;
struct se_port *port;
struct se_portal_group *tpg = nacl-se_tpg;
-   u32 i;
-
-   spin_lock_irq(nacl-device_list_lock);
-   for (i = 0; i  TRANSPORT_MAX_LUNS_PER_TPG; i++) {
-   deve = nacl-device_list[i];

+   rcu_read_lock();
+   hlist_for_each_entry_rcu(deve, nacl-lun_entry_hlist, link) {
if (!(deve-lun_flags  TRANSPORT_LUNFLAGS_INITIATOR_ACCESS))
continue;

@@ -225,11 +222,11 @@ struct se_dev_entry *core_get_se_deve_from_rtpi(
continue;

atomic_inc_mb(deve-pr_ref_count);
-   spin_unlock_irq(nacl-device_list_lock);
+   rcu_read_unlock();

return deve;
}
-   spin_unlock_irq(nacl-device_list_lock);
+   rcu_read_unlock();

return NULL;
  }
@@ -240,18 +237,12 @@ int core_free_device_list_for_node(
  {
struct se_dev_entry *deve;
struct se_lun *lun;
-   u32 i;
-
-   if (!nacl-device_list)
-   return 0;
-
-   spin_lock_irq(nacl-device_list_lock);
-   for (i = 0; i  TRANSPORT_MAX_LUNS_PER_TPG; i++) {
-   deve = nacl-device_list[i];
+   u32 mapped_lun;

+   rcu_read_lock();
+   hlist_for_each_entry_rcu(deve, nacl-lun_entry_hlist, link) {
if (!(deve-lun_flags  TRANSPORT_LUNFLAGS_INITIATOR_ACCESS))
continue;
-
if (!deve-se_lun) {
pr_err(%s device entries device pointer is
 NULL, but Initiator has access.\n,
@@ -259,16 +250,14 @@ int core_free_device_list_for_node(
continue;
}
lun = deve-se_lun;
+   mapped_lun = deve-mapped_lun;
+   rcu_read_unlock();

-   spin_unlock_irq(nacl-device_list_lock);
-   core_disable_device_list_for_node(lun, NULL, deve-mapped_lun,
-   TRANSPORT_LUNFLAGS_NO_ACCESS, nacl, tpg);
-   spin_lock_irq(nacl-device_list_lock);
+   core_disable_device_list_for_node(lun, NULL, mapped_lun,
+   TRANSPORT_LUNFLAGS_NO_ACCESS, nacl, 
tpg);
+   rcu_read_lock();
}
-   spin_unlock_irq(nacl-device_list_lock);
-
-   array_free(nacl-device_list, TRANSPORT_MAX_LUNS_PER_TPG);
-   nacl-device_list = NULL;
+   rcu_read_unlock();

return 0;
  }
@@ -280,18 +269,44 @@ void core_update_device_list_access(
  {
struct se_dev_entry *deve;

-   spin_lock_irq(nacl-device_list_lock);
-   deve = nacl-device_list[mapped_lun];
-   if (lun_access  TRANSPORT_LUNFLAGS_READ_WRITE) {
-   deve-lun_flags = ~TRANSPORT_LUNFLAGS_READ_ONLY;
-   deve-lun_flags |= TRANSPORT_LUNFLAGS_READ_WRITE;
-   } else {
-   deve-lun_flags = ~TRANSPORT_LUNFLAGS_READ_WRITE;
-   deve-lun_flags |= TRANSPORT_LUNFLAGS_READ_ONLY;
+   spin_lock_irq(nacl-lun_entry_lock);
+   deve = target_nacl_find_deve(nacl, mapped_lun);
+   if (deve) {
+   if (lun_access  TRANSPORT_LUNFLAGS_READ_WRITE) {
+   deve-lun_flags = ~TRANSPORT_LUNFLAGS_READ_ONLY;
+   deve-lun_flags |= TRANSPORT_LUNFLAGS_READ_WRITE;
+   } else {
+   deve-lun_flags = ~TRANSPORT_LUNFLAGS_READ_WRITE;
+   deve-lun_flags |= TRANSPORT_LUNFLAGS_READ_ONLY;
+   }
}
-   spin_unlock_irq(nacl-device_list_lock);
+   spin_unlock_irq(nacl-lun_entry_lock);
+
+   synchronize_rcu();
+}
+
+static void

Re: [PATCH 03/12] target/configfs: Convert mappedlun + SCSI MIBs to RCU reader

2015-05-12 Thread Andy Grover


On 05/12/2015 02:25 AM, Nicholas A. Bellinger wrote:

From: Nicholas Bellinger n...@linux-iscsi.org

This patch converts fabric independent configfs link/unlink to use
RCU read path macros for se_node_acl-lun_entry_hlist access.

It also converts SCSI MIB configfs show attribute code to use
RCU read path macros for se_node_acl-lun_entry_hlist access.

Cc: Hannes Reinecke h...@suse.de
Cc: Christoph Hellwig h...@lst.de
Cc: Sagi Grimberg sa...@mellanox.com
Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org
---
  drivers/target/target_core_fabric_configfs.c |  35 +++---
  drivers/target/target_core_stat.c| 180 +--
  2 files changed, 110 insertions(+), 105 deletions(-)

diff --git a/drivers/target/target_core_fabric_configfs.c 
b/drivers/target/target_core_fabric_configfs.c
index 6cb4828..0dab6d5 100644
--- a/drivers/target/target_core_fabric_configfs.c
+++ b/drivers/target/target_core_fabric_configfs.c
@@ -123,16 +123,16 @@ static int target_fabric_mappedlun_link(
 * which be will write protected (READ-ONLY) when
 * tpg_1/attrib/demo_mode_write_protect=1
 */
-   spin_lock_irq(lacl-se_lun_nacl-device_list_lock);
-   deve = lacl-se_lun_nacl-device_list[lacl-mapped_lun];
-   if (deve-lun_flags  TRANSPORT_LUNFLAGS_INITIATOR_ACCESS)
+   rcu_read_lock();
+   deve = target_nacl_find_deve(lacl-se_lun_nacl, lacl-mapped_lun);
+   if (deve  deve-lun_flags  TRANSPORT_LUNFLAGS_INITIATOR_ACCESS)


Why do we still need TRANSPORT_LUNFLAGS_INITIATOR_ACCESS? Isn't deve 
being not-NULL enough?


-- Andy

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH/RFC] configfs: Update docs to indicate sysfs & configfs usage is ok

2015-03-09 Thread Andy Grover

The docs state that sysfs and configfs are peers that meet different needs,
but doesn't explicitly make it clear that if a subsystem needs
functionality that spans both, it is preferable to use both instead of
trying to cram everything into one or the other.

Signed-off-by: Andy Grover 
---
 Documentation/filesystems/configfs/configfs.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/configfs/configfs.txt 
b/Documentation/filesystems/configfs/configfs.txt
index b40fec9..a94c9c3 100644
--- a/Documentation/filesystems/configfs/configfs.txt
+++ b/Documentation/filesystems/configfs/configfs.txt
@@ -33,7 +33,8 @@ lifetime of the representation is completely driven by 
userspace.  The
 kernel modules backing the items must respond to this.
 
 Both sysfs and configfs can and should exist together on the same
-system.  One is not a replacement for the other.
+system.  One is not a replacement for the other. They are
+complementary, and a client module may have need to support both.
 
 [Using configfs]
 
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH/RFC] configfs: Update docs to indicate sysfs configfs usage is ok

2015-03-09 Thread Andy Grover

The docs state that sysfs and configfs are peers that meet different needs,
but doesn't explicitly make it clear that if a subsystem needs
functionality that spans both, it is preferable to use both instead of
trying to cram everything into one or the other.

Signed-off-by: Andy Grover agro...@redhat.com
---
 Documentation/filesystems/configfs/configfs.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/configfs/configfs.txt 
b/Documentation/filesystems/configfs/configfs.txt
index b40fec9..a94c9c3 100644
--- a/Documentation/filesystems/configfs/configfs.txt
+++ b/Documentation/filesystems/configfs/configfs.txt
@@ -33,7 +33,8 @@ lifetime of the representation is completely driven by 
userspace.  The
 kernel modules backing the items must respond to this.
 
 Both sysfs and configfs can and should exist together on the same
-system.  One is not a replacement for the other.
+system.  One is not a replacement for the other. They are
+complementary, and a client module may have need to support both.
 
 [Using configfs]
 
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] uapi/linux/target_core_user.h: fix headers_install.sh badness

2014-12-18 Thread Andy Grover


On 12/18/2014 09:57 AM, Kyle McMartin wrote:

scripts/headers_install.sh will transform __packed to
__attribute__((packed)), so the #ifndef is not necessary.
(and, in fact, it's problematic, because we'll end up with the header
  containing:
#ifndef __attribute__((packed))
#define __attribu...
and so forth.)

Cc: sta...@vger.kernel.org # 3.18
Signed-off-by: Kyle McMartin 

---
cc-ing stable@ so this headers fix gets picked up by distros.


Acked-by: Andy Grover 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] uapi/linux/target_core_user.h: fix headers_install.sh badness

2014-12-18 Thread Andy Grover


On 12/18/2014 09:57 AM, Kyle McMartin wrote:

scripts/headers_install.sh will transform __packed to
__attribute__((packed)), so the #ifndef is not necessary.
(and, in fact, it's problematic, because we'll end up with the header
  containing:
#ifndef __attribute__((packed))
#define __attribu...
and so forth.)

Cc: sta...@vger.kernel.org # 3.18
Signed-off-by: Kyle McMartin k...@redhat.com

---
cc-ing stable@ so this headers fix gets picked up by distros.


Acked-by: Andy Grover agro...@redhat.com

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 10/25] virtio: add API to enable VQs early

2014-11-10 Thread Andy Grover


On 10/13/2014 12:50 AM, Michael S. Tsirkin wrote:

virtio spec 0.9.X requires DRIVER_OK to be set before
VQs are used, but some drivers use VQs before probe
function returns.
Since DRIVER_OK is set after probe, this violates the spec.

Even though under virtio 1.0 transitional devices support this
behaviour, we want to make it possible for those early callers to become
spec compliant and eventually support non-transitional devices.

Add API for drivers to call before using VQs.

Sets DRIVER_OK internally.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Cornelia Huck 
---
  include/linux/virtio_config.h | 17 +
  1 file changed, 17 insertions(+)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index e8f8f71..e36403b 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -109,6 +109,23 @@ struct virtqueue *virtio_find_single_vq(struct 
virtio_device *vdev,
return vq;
  }

+/**
+ * virtio_device_ready - enable vq use in probe function
+ * @vdev: the device
+ *
+ * Driver must call this to use vqs in the probe function.
+ *
+ * Note: vqs are enabled automatically after probe returns.
+ */
+static inline
+void virtio_device_ready(struct virtio_device *dev)
+{
+   unsigned status = dev->config->get_status(dev);
+
+   BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
+   dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
+}


Getting a BUG when booting via KVM, host Fedora 20, guest Fedora 20.

my config is at:

https://fedorapeople.org/~grover/config-20141110



[0.828494] [ cut here ]
[0.829039] kernel BUG at 
/home/agrover/git/kernel/include/linux/virtio_config.h:125!

[0.831266] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
[0.831266] Modules linked in:
[0.831266] CPU: 1 PID: 30 Comm: kworker/1:1 Not tainted 3.18.0-rc4 #120
[0.831266] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[0.831266] Workqueue: events control_work_handler
[0.831266] task: 88003cd98000 ti: 88003cd94000 task.ti: 
88003cd94000
[0.831266] RIP: 0010:[]  [] 
add_port+0x264/0x410

[0.831266] RSP: :88003cd97c78  EFLAGS: 00010202
[0.831266] RAX: 0007 RBX: 88003c58c400 RCX: 
0001
[0.831266] RDX: c132 RSI: 81a955e9 RDI: 
0001c132
[0.831266] RBP: 88003cd97cc8 R08:  R09: 

[0.831266] R10: 0001 R11:  R12: 
88003c58be00
[0.831266] R13: 0001 R14: 8800395ca800 R15: 
88003c58c420
[0.831266] FS:  () GS:88003fa0() 
knlGS:

[0.831266] CS:  0010 DS:  ES:  CR0: 8005003b
[0.831266] CR2:  CR3: 01c11000 CR4: 
06e0
[0.831266] DR0:  DR1:  DR2: 

[0.831266] DR3:  DR6: fffe0ff0 DR7: 
0400

[0.831266] Stack:
[0.831266]  8801 0292  
0001
[0.831266]  88003cd97cc8 88003dfa8a20 88003c58beb8 
88003c58be10
[0.831266]  8800395a2000  88003cd97d38 
8144531a

[0.831266] Call Trace:
[0.831266]  [] control_work_handler+0x16a/0x3c0
[0.831266]  [] ? process_one_work+0x208/0x500
[0.831266]  [] process_one_work+0x2ac/0x500
[0.831266]  [] ? process_one_work+0x208/0x500
[0.831266]  [] worker_thread+0x2ce/0x4e0
[0.831266]  [] ? process_one_work+0x500/0x500
[0.831266]  [] kthread+0xf8/0x100
[0.831266]  [] ? trace_hardirqs_on+0xd/0x10
[0.831266]  [] ? kthread_stop+0x140/0x140
[0.831266]  [] ret_from_fork+0x7c/0xb0
[0.831266]  [] ? kthread_stop+0x140/0x140
[0.831266] Code: c7 c2 48 31 01 83 48 c7 c6 e9 55 a9 81 e8 55 b4 c6 
ff 4d 8b b4 24 58 01 00 00 49 8b 86 e8 04 00 00 4c 89 f7 ff 50 10 a8 04 
74 0c <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 49 8b 96 e8 04 00 00 83 c8

[0.831266] RIP  [] add_port+0x264/0x410
[0.831266]  RSP 
[0.878202] ---[ end trace f98fbb172cc7bbf4 ]---

Thanks -- Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 10/25] virtio: add API to enable VQs early

2014-11-10 Thread Andy Grover


On 10/13/2014 12:50 AM, Michael S. Tsirkin wrote:

virtio spec 0.9.X requires DRIVER_OK to be set before
VQs are used, but some drivers use VQs before probe
function returns.
Since DRIVER_OK is set after probe, this violates the spec.

Even though under virtio 1.0 transitional devices support this
behaviour, we want to make it possible for those early callers to become
spec compliant and eventually support non-transitional devices.

Add API for drivers to call before using VQs.

Sets DRIVER_OK internally.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
---
  include/linux/virtio_config.h | 17 +
  1 file changed, 17 insertions(+)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index e8f8f71..e36403b 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -109,6 +109,23 @@ struct virtqueue *virtio_find_single_vq(struct 
virtio_device *vdev,
return vq;
  }

+/**
+ * virtio_device_ready - enable vq use in probe function
+ * @vdev: the device
+ *
+ * Driver must call this to use vqs in the probe function.
+ *
+ * Note: vqs are enabled automatically after probe returns.
+ */
+static inline
+void virtio_device_ready(struct virtio_device *dev)
+{
+   unsigned status = dev-config-get_status(dev);
+
+   BUG_ON(status  VIRTIO_CONFIG_S_DRIVER_OK);
+   dev-config-set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
+}


Getting a BUG when booting via KVM, host Fedora 20, guest Fedora 20.

my config is at:

https://fedorapeople.org/~grover/config-20141110



[0.828494] [ cut here ]
[0.829039] kernel BUG at 
/home/agrover/git/kernel/include/linux/virtio_config.h:125!

[0.831266] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
[0.831266] Modules linked in:
[0.831266] CPU: 1 PID: 30 Comm: kworker/1:1 Not tainted 3.18.0-rc4 #120
[0.831266] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[0.831266] Workqueue: events control_work_handler
[0.831266] task: 88003cd98000 ti: 88003cd94000 task.ti: 
88003cd94000
[0.831266] RIP: 0010:[81445004]  [81445004] 
add_port+0x264/0x410

[0.831266] RSP: :88003cd97c78  EFLAGS: 00010202
[0.831266] RAX: 0007 RBX: 88003c58c400 RCX: 
0001
[0.831266] RDX: c132 RSI: 81a955e9 RDI: 
0001c132
[0.831266] RBP: 88003cd97cc8 R08:  R09: 

[0.831266] R10: 0001 R11:  R12: 
88003c58be00
[0.831266] R13: 0001 R14: 8800395ca800 R15: 
88003c58c420
[0.831266] FS:  () GS:88003fa0() 
knlGS:

[0.831266] CS:  0010 DS:  ES:  CR0: 8005003b
[0.831266] CR2:  CR3: 01c11000 CR4: 
06e0
[0.831266] DR0:  DR1:  DR2: 

[0.831266] DR3:  DR6: fffe0ff0 DR7: 
0400

[0.831266] Stack:
[0.831266]  8801 0292  
0001
[0.831266]  88003cd97cc8 88003dfa8a20 88003c58beb8 
88003c58be10
[0.831266]  8800395a2000  88003cd97d38 
8144531a

[0.831266] Call Trace:
[0.831266]  [8144531a] control_work_handler+0x16a/0x3c0
[0.831266]  [8108b0c8] ? process_one_work+0x208/0x500
[0.831266]  [8108b16c] process_one_work+0x2ac/0x500
[0.831266]  [8108b0c8] ? process_one_work+0x208/0x500
[0.831266]  [8108b68e] worker_thread+0x2ce/0x4e0
[0.831266]  [8108b3c0] ? process_one_work+0x500/0x500
[0.831266]  [81090b28] kthread+0xf8/0x100
[0.831266]  [810bad7d] ? trace_hardirqs_on+0xd/0x10
[0.831266]  [81090a30] ? kthread_stop+0x140/0x140
[0.831266]  [816ea92c] ret_from_fork+0x7c/0xb0
[0.831266]  [81090a30] ? kthread_stop+0x140/0x140
[0.831266] Code: c7 c2 48 31 01 83 48 c7 c6 e9 55 a9 81 e8 55 b4 c6 
ff 4d 8b b4 24 58 01 00 00 49 8b 86 e8 04 00 00 4c 89 f7 ff 50 10 a8 04 
74 0c 0f 0b 66 2e 0f 1f 84 00 00 00 00 00 49 8b 96 e8 04 00 00 83 c8

[0.831266] RIP  [81445004] add_port+0x264/0x410
[0.831266]  RSP 88003cd97c78
[0.878202] ---[ end trace f98fbb172cc7bbf4 ]---

Thanks -- Andy

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv2 1/4] target: Remove unneeded check in sbc_parse_cdb

2014-10-01 Thread Andy Grover

The check of SCF_SCSI_DATA_CDB seems to be a remnant from before hch's
refactoring of this function. There are no places where that flag is set
that cmd->execute_cmd isn't also set.

Signed-off-by: Andy Grover 
---
 drivers/target/target_core_sbc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
index bd78d92..ebe62af 100644
--- a/drivers/target/target_core_sbc.c
+++ b/drivers/target/target_core_sbc.c
@@ -948,7 +948,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops)
}
 
/* reject any command that we don't have a handler for */
-   if (!(cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) && !cmd->execute_cmd)
+   if (!cmd->execute_cmd)
return TCM_UNSUPPORTED_SCSI_OPCODE;
 
if (cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) {
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv2 0/4] Userspace Passthrough backstore for LIO

2014-10-01 Thread Andy Grover

Hi Nick and all,

Here's version 2, against Nick's target-pending/for-next. Note that
since that tree is still on 3.17-rc5, in testing I ran into an issue
fixed by:

genhd: fix leftover might_sleep() in blk_free_devt()

which first appears in 3.17-rc7.

Incorporates feedback from Nick.

Doc fixes
* add table of contents
* add example code for the user-side
* document specific opcodes that IO mode passes through

Code fixes
* add back support for PASS_ALL and PASS_IO pass_level option.
* Removed retry-in-kernel mechanism
* fix issues from kbuild test robot
* remove kref from tcmu_dev and call kfree(udev) from tcmu_free_device
  instead
* set hw_block_size to 512 instead of 4096
* Don't copy DMA_FROM_DEVICE data if there was a CHECK_CONDITION
* use goto-style error handling in tcmu_configure_device
* Check retval from genlmsg_multicast and ignore -ESRCH
* properly handle if device is unconfigured in tcmu_free_device

Thanks -- Andy

Andy Grover (4):
  target: Remove unneeded check in sbc_parse_cdb
  uio: Export definition of struct uio_device
  target: Add documentation on the target userspace pass-through driver
  target: Add a user-passthrough backstore

 Documentation/target/tcmu-design.txt   |  378 +++
 drivers/target/Kconfig |5 +
 drivers/target/Makefile|1 +
 drivers/target/target_core_sbc.c   |2 +-
 drivers/target/target_core_transport.c |4 +
 drivers/target/target_core_user.c  | 1163 
 drivers/uio/uio.c  |   12 -
 include/linux/uio_driver.h |   12 +-
 include/uapi/linux/Kbuild  |1 +
 include/uapi/linux/target_core_user.h  |  142 
 10 files changed, 1706 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/target/tcmu-design.txt
 create mode 100644 drivers/target/target_core_user.c
 create mode 100644 include/uapi/linux/target_core_user.h

-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv2 4/4] target: Add a user-passthrough backstore

2014-10-01 Thread Andy Grover

Add a LIO storage engine that presents commands to userspace for execution.
This would allow more complex backstores to be implemented out-of-kernel,
and also make experimentation a-la FUSE (but at the SCSI level -- "SUSE"?)
possible.

It uses a mmap()able UIO device per LUN to share a command ring and data
area. The commands are raw SCSI CDBs and iovs for in/out data. The command
ring is also reused for returning scsi command status and optional sense
data.

This implementation is based on Shaohua Li's earlier version but heavily
modified. Differences include:

* Shared memory allocated by kernel, not locked-down user pages
* Single ring for command request and response
* Offsets instead of embedded pointers
* Generic SCSI CDB passthrough instead of per-cmd specialization in ring
  format.
* Uses UIO device instead of anon_file passed in mailbox.
* Optional in-kernel handling of some commands.

The main reason for these differences is to permit greater resiliency
if the user process dies or hangs.

Things not yet implemented (on purpose):

* Zero copy. The data area is flexible enough to allow page flipping or
  backend-allocated pages to be used by fabrics, but it's not clear these
  are performance wins. Can come later.
* Out-of-order command completion by userspace. Possible to add by just
  allowing userspace to change cmd_id in rsp cmd entries, but currently
  not supported.
* No locks between kernel cmd submission and completion routines. Sounds
  like it's possible, but this can come later.
* Sparse allocation of mmaped area. Current code vmallocs the whole thing.
  If the mapped area was larger and not fully mapped then the driver would
  have more freedom to change cmd and data area sizes based on demand.

Current code open issues:

* The use of idrs may be overkill -- we maybe can replace them with a
  simple counter to generate cmd_ids, and a hash table to get a cmd_id's
  associated pointer.
* Use of a free-running counter for cmd ring instead of explicit modulo
  math. This would require power-of-2 cmd ring size.

Signed-off-by: Andy Grover 
---
 drivers/target/Kconfig |5 +
 drivers/target/Makefile|1 +
 drivers/target/target_core_transport.c |4 +
 drivers/target/target_core_user.c  | 1163 
 include/uapi/linux/Kbuild  |1 +
 include/uapi/linux/target_core_user.h  |  142 
 6 files changed, 1316 insertions(+)
 create mode 100644 drivers/target/target_core_user.c
 create mode 100644 include/uapi/linux/target_core_user.h

diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig
index dc2d84a..b03a845 100644
--- a/drivers/target/Kconfig
+++ b/drivers/target/Kconfig
@@ -31,6 +31,11 @@ config TCM_PSCSI
Say Y here to enable the TCM/pSCSI subsystem plugin for non-buffered
passthrough access to Linux/SCSI device
 
+config TCM_USER
+   tristate "TCM/USER Subsystem Plugin for Linux"
+   help
+   Say Y here to enable the TCM/USER subsystem plugin
+
 source "drivers/target/loopback/Kconfig"
 source "drivers/target/tcm_fc/Kconfig"
 source "drivers/target/iscsi/Kconfig"
diff --git a/drivers/target/Makefile b/drivers/target/Makefile
index 85b012d..bbb4a7d 100644
--- a/drivers/target/Makefile
+++ b/drivers/target/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_TARGET_CORE) += target_core_mod.o
 obj-$(CONFIG_TCM_IBLOCK)   += target_core_iblock.o
 obj-$(CONFIG_TCM_FILEIO)   += target_core_file.o
 obj-$(CONFIG_TCM_PSCSI)+= target_core_pscsi.o
+obj-$(CONFIG_TCM_USER) += target_core_user.o
 
 # Fabric modules
 obj-$(CONFIG_LOOPBACK_TARGET)  += loopback/
diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 9700ea1..9ea0d5f 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -232,6 +232,10 @@ void transport_subsystem_check_init(void)
if (ret != 0)
pr_err("Unable to load target_core_pscsi\n");
 
+   ret = request_module("target_core_user");
+   if (ret != 0)
+   pr_err("Unable to load target_core_user\n");
+
sub_api_initialized = 1;
 }
 
diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
new file mode 100644
index 000..6608ecf
--- /dev/null
+++ b/drivers/target/target_core_user.c
@@ -0,0 +1,1163 @@
+/*
+ * Copyright (C) 2013 Shaohua Li 
+ * Copyright (C) 2014 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License

[PATCHv2 3/4] target: Add documentation on the target userspace pass-through driver

2014-10-01 Thread Andy Grover

Describes the driver and its interface to make it possible for user
programs to back a LIO-exported LUN.

Thanks to Richard W. M. Jones for review, and supplementing this doc
with the first two paragraphs.

Signed-off-by: Andy Grover 
---
 Documentation/target/tcmu-design.txt | 378 +++
 1 file changed, 378 insertions(+)
 create mode 100644 Documentation/target/tcmu-design.txt

diff --git a/Documentation/target/tcmu-design.txt 
b/Documentation/target/tcmu-design.txt
new file mode 100644
index 000..5518465
--- /dev/null
+++ b/Documentation/target/tcmu-design.txt
@@ -0,0 +1,378 @@
+Contents:
+
+1) TCM Userspace Design
+  a) Background
+  b) Benefits
+  c) Design constraints
+  d) Implementation overview
+ i. Mailbox
+ ii. Command ring
+ iii. Data Area
+  e) Device discovery
+  f) Device events
+  g) Other contingencies
+2) Writing a user pass-through handler
+  a) Discovering and configuring TCMU uio devices
+  b) Waiting for events on the device(s)
+  c) Managing the command ring
+3) Command filtering and pass_level
+4) A final note
+
+
+TCM Userspace Design
+
+
+TCM is another name for LIO, an in-kernel iSCSI target (server).
+Existing TCM targets run in the kernel.  TCMU (TCM in Userspace)
+allows userspace programs to be written which act as iSCSI targets.
+This document describes the design.
+
+The existing kernel provides modules for different SCSI transport
+protocols.  TCM also modularizes the data storage.  There are existing
+modules for file, block device, RAM or using another SCSI device as
+storage.  These are called "backstores" or "storage engines".  These
+built-in modules are implemented entirely as kernel code.
+
+Background:
+
+In addition to modularizing the transport protocol used for carrying
+SCSI commands ("fabrics"), the Linux kernel target, LIO, also modularizes
+the actual data storage as well. These are referred to as "backstores"
+or "storage engines". The target comes with backstores that allow a
+file, a block device, RAM, or another SCSI device to be used for the
+local storage needed for the exported SCSI LUN. Like the rest of LIO,
+these are implemented entirely as kernel code.
+
+These backstores cover the most common use cases, but not all. One new
+use case that other non-kernel target solutions, such as tgt, are able
+to support is using Gluster's GLFS or Ceph's RBD as a backstore. The
+target then serves as a translator, allowing initiators to store data
+in these non-traditional networked storage systems, while still only
+using standard protocols themselves.
+
+If the target is a userspace process, supporting these is easy. tgt,
+for example, needs only a small adapter module for each, because the
+modules just use the available userspace libraries for RBD and GLFS.
+
+Adding support for these backstores in LIO is considerably more
+difficult, because LIO is entirely kernel code. Instead of undertaking
+the significant work to port the GLFS or RBD APIs and protocols to the
+kernel, another approach is to create a userspace pass-through
+backstore for LIO, "TCMU".
+
+
+Benefits:
+
+In addition to allowing relatively easy support for RBD and GLFS, TCMU
+will also allow easier development of new backstores. TCMU combines
+with the LIO loopback fabric to become something similar to FUSE
+(Filesystem in Userspace), but at the SCSI layer instead of the
+filesystem layer. A SUSE, if you will.
+
+The disadvantage is there are more distinct components to configure, and
+potentially to malfunction. This is unavoidable, but hopefully not
+fatal if we're careful to keep things as simple as possible.
+
+Design constraints:
+
+- Good performance: high throughput, low latency
+- Cleanly handle if userspace:
+   1) never attaches
+   2) hangs
+   3) dies
+   4) misbehaves
+- Allow future flexibility in user & kernel implementations
+- Be reasonably memory-efficient
+- Simple to configure & run
+- Simple to write a userspace backend
+
+
+Implementation overview:
+
+The core of the TCMU interface is a memory region that is shared
+between kernel and userspace. Within this region is: a control area
+(mailbox); a lockless producer/consumer circular buffer for commands
+to be passed up, and status returned; and an in/out data buffer area.
+
+TCMU uses the pre-existing UIO subsystem. UIO allows device driver
+development in userspace, and this is conceptually very close to the
+TCMU use case, except instead of a physical device, TCMU implements a
+memory-mapped layout designed for SCSI commands. Using UIO also
+benefits TCMU by handling device introspection (e.g. a way for
+userspace to determine how large the shared region is) and signaling
+mechanisms in both directions.
+
+There are no embedded pointers in the memory region. Everything is
+expressed as an offset from the region's starting address. This allows
+the ring to still work if the user process

[PATCHv2 2/4] uio: Export definition of struct uio_device

2014-10-01 Thread Andy Grover

In order to prevent a O(n) search of the filesystem to link up its uio
node with its target configuration, TCMU needs to know the minor number
that UIO assigned. Expose the definition of this struct so TCMU can
access this field.

Signed-off-by: Andy Grover 
---
 drivers/uio/uio.c  | 12 
 include/linux/uio_driver.h | 12 +++-
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index a673e5b..60fa627 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -28,18 +28,6 @@
 
 #define UIO_MAX_DEVICES(1U << MINORBITS)
 
-struct uio_device {
-   struct module   *owner;
-   struct device   *dev;
-   int minor;
-   atomic_tevent;
-   struct fasync_struct*async_queue;
-   wait_queue_head_t   wait;
-   struct uio_info *info;
-   struct kobject  *map_dir;
-   struct kobject  *portio_dir;
-};
-
 static int uio_major;
 static struct cdev *uio_cdev;
 static DEFINE_IDR(uio_idr);
diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h
index 1ad4724..baa8171 100644
--- a/include/linux/uio_driver.h
+++ b/include/linux/uio_driver.h
@@ -63,7 +63,17 @@ struct uio_port {
 
 #define MAX_UIO_PORT_REGIONS   5
 
-struct uio_device;
+struct uio_device {
+struct module   *owner;
+struct device   *dev;
+int minor;
+atomic_tevent;
+struct fasync_struct*async_queue;
+wait_queue_head_t   wait;
+struct uio_info *info;
+struct kobject  *map_dir;
+struct kobject  *portio_dir;
+};
 
 /**
  * struct uio_info - UIO device capabilities
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv2 2/4] uio: Export definition of struct uio_device

2014-10-01 Thread Andy Grover

In order to prevent a O(n) search of the filesystem to link up its uio
node with its target configuration, TCMU needs to know the minor number
that UIO assigned. Expose the definition of this struct so TCMU can
access this field.

Signed-off-by: Andy Grover agro...@redhat.com
---
 drivers/uio/uio.c  | 12 
 include/linux/uio_driver.h | 12 +++-
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index a673e5b..60fa627 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -28,18 +28,6 @@
 
 #define UIO_MAX_DEVICES(1U  MINORBITS)
 
-struct uio_device {
-   struct module   *owner;
-   struct device   *dev;
-   int minor;
-   atomic_tevent;
-   struct fasync_struct*async_queue;
-   wait_queue_head_t   wait;
-   struct uio_info *info;
-   struct kobject  *map_dir;
-   struct kobject  *portio_dir;
-};
-
 static int uio_major;
 static struct cdev *uio_cdev;
 static DEFINE_IDR(uio_idr);
diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h
index 1ad4724..baa8171 100644
--- a/include/linux/uio_driver.h
+++ b/include/linux/uio_driver.h
@@ -63,7 +63,17 @@ struct uio_port {
 
 #define MAX_UIO_PORT_REGIONS   5
 
-struct uio_device;
+struct uio_device {
+struct module   *owner;
+struct device   *dev;
+int minor;
+atomic_tevent;
+struct fasync_struct*async_queue;
+wait_queue_head_t   wait;
+struct uio_info *info;
+struct kobject  *map_dir;
+struct kobject  *portio_dir;
+};
 
 /**
  * struct uio_info - UIO device capabilities
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv2 3/4] target: Add documentation on the target userspace pass-through driver

2014-10-01 Thread Andy Grover

Describes the driver and its interface to make it possible for user
programs to back a LIO-exported LUN.

Thanks to Richard W. M. Jones for review, and supplementing this doc
with the first two paragraphs.

Signed-off-by: Andy Grover agro...@redhat.com
---
 Documentation/target/tcmu-design.txt | 378 +++
 1 file changed, 378 insertions(+)
 create mode 100644 Documentation/target/tcmu-design.txt

diff --git a/Documentation/target/tcmu-design.txt 
b/Documentation/target/tcmu-design.txt
new file mode 100644
index 000..5518465
--- /dev/null
+++ b/Documentation/target/tcmu-design.txt
@@ -0,0 +1,378 @@
+Contents:
+
+1) TCM Userspace Design
+  a) Background
+  b) Benefits
+  c) Design constraints
+  d) Implementation overview
+ i. Mailbox
+ ii. Command ring
+ iii. Data Area
+  e) Device discovery
+  f) Device events
+  g) Other contingencies
+2) Writing a user pass-through handler
+  a) Discovering and configuring TCMU uio devices
+  b) Waiting for events on the device(s)
+  c) Managing the command ring
+3) Command filtering and pass_level
+4) A final note
+
+
+TCM Userspace Design
+
+
+TCM is another name for LIO, an in-kernel iSCSI target (server).
+Existing TCM targets run in the kernel.  TCMU (TCM in Userspace)
+allows userspace programs to be written which act as iSCSI targets.
+This document describes the design.
+
+The existing kernel provides modules for different SCSI transport
+protocols.  TCM also modularizes the data storage.  There are existing
+modules for file, block device, RAM or using another SCSI device as
+storage.  These are called backstores or storage engines.  These
+built-in modules are implemented entirely as kernel code.
+
+Background:
+
+In addition to modularizing the transport protocol used for carrying
+SCSI commands (fabrics), the Linux kernel target, LIO, also modularizes
+the actual data storage as well. These are referred to as backstores
+or storage engines. The target comes with backstores that allow a
+file, a block device, RAM, or another SCSI device to be used for the
+local storage needed for the exported SCSI LUN. Like the rest of LIO,
+these are implemented entirely as kernel code.
+
+These backstores cover the most common use cases, but not all. One new
+use case that other non-kernel target solutions, such as tgt, are able
+to support is using Gluster's GLFS or Ceph's RBD as a backstore. The
+target then serves as a translator, allowing initiators to store data
+in these non-traditional networked storage systems, while still only
+using standard protocols themselves.
+
+If the target is a userspace process, supporting these is easy. tgt,
+for example, needs only a small adapter module for each, because the
+modules just use the available userspace libraries for RBD and GLFS.
+
+Adding support for these backstores in LIO is considerably more
+difficult, because LIO is entirely kernel code. Instead of undertaking
+the significant work to port the GLFS or RBD APIs and protocols to the
+kernel, another approach is to create a userspace pass-through
+backstore for LIO, TCMU.
+
+
+Benefits:
+
+In addition to allowing relatively easy support for RBD and GLFS, TCMU
+will also allow easier development of new backstores. TCMU combines
+with the LIO loopback fabric to become something similar to FUSE
+(Filesystem in Userspace), but at the SCSI layer instead of the
+filesystem layer. A SUSE, if you will.
+
+The disadvantage is there are more distinct components to configure, and
+potentially to malfunction. This is unavoidable, but hopefully not
+fatal if we're careful to keep things as simple as possible.
+
+Design constraints:
+
+- Good performance: high throughput, low latency
+- Cleanly handle if userspace:
+   1) never attaches
+   2) hangs
+   3) dies
+   4) misbehaves
+- Allow future flexibility in user  kernel implementations
+- Be reasonably memory-efficient
+- Simple to configure  run
+- Simple to write a userspace backend
+
+
+Implementation overview:
+
+The core of the TCMU interface is a memory region that is shared
+between kernel and userspace. Within this region is: a control area
+(mailbox); a lockless producer/consumer circular buffer for commands
+to be passed up, and status returned; and an in/out data buffer area.
+
+TCMU uses the pre-existing UIO subsystem. UIO allows device driver
+development in userspace, and this is conceptually very close to the
+TCMU use case, except instead of a physical device, TCMU implements a
+memory-mapped layout designed for SCSI commands. Using UIO also
+benefits TCMU by handling device introspection (e.g. a way for
+userspace to determine how large the shared region is) and signaling
+mechanisms in both directions.
+
+There are no embedded pointers in the memory region. Everything is
+expressed as an offset from the region's starting address. This allows
+the ring to still work if the user process dies and is restarted with
+the region mapped at a different

[PATCHv2 0/4] Userspace Passthrough backstore for LIO

2014-10-01 Thread Andy Grover

Hi Nick and all,

Here's version 2, against Nick's target-pending/for-next. Note that
since that tree is still on 3.17-rc5, in testing I ran into an issue
fixed by:

genhd: fix leftover might_sleep() in blk_free_devt()

which first appears in 3.17-rc7.

Incorporates feedback from Nick.

Doc fixes
* add table of contents
* add example code for the user-side
* document specific opcodes that IO mode passes through

Code fixes
* add back support for PASS_ALL and PASS_IO pass_level option.
* Removed retry-in-kernel mechanism
* fix issues from kbuild test robot
* remove kref from tcmu_dev and call kfree(udev) from tcmu_free_device
  instead
* set hw_block_size to 512 instead of 4096
* Don't copy DMA_FROM_DEVICE data if there was a CHECK_CONDITION
* use goto-style error handling in tcmu_configure_device
* Check retval from genlmsg_multicast and ignore -ESRCH
* properly handle if device is unconfigured in tcmu_free_device

Thanks -- Andy

Andy Grover (4):
  target: Remove unneeded check in sbc_parse_cdb
  uio: Export definition of struct uio_device
  target: Add documentation on the target userspace pass-through driver
  target: Add a user-passthrough backstore

 Documentation/target/tcmu-design.txt   |  378 +++
 drivers/target/Kconfig |5 +
 drivers/target/Makefile|1 +
 drivers/target/target_core_sbc.c   |2 +-
 drivers/target/target_core_transport.c |4 +
 drivers/target/target_core_user.c  | 1163 
 drivers/uio/uio.c  |   12 -
 include/linux/uio_driver.h |   12 +-
 include/uapi/linux/Kbuild  |1 +
 include/uapi/linux/target_core_user.h  |  142 
 10 files changed, 1706 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/target/tcmu-design.txt
 create mode 100644 drivers/target/target_core_user.c
 create mode 100644 include/uapi/linux/target_core_user.h

-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv2 4/4] target: Add a user-passthrough backstore

2014-10-01 Thread Andy Grover

Add a LIO storage engine that presents commands to userspace for execution.
This would allow more complex backstores to be implemented out-of-kernel,
and also make experimentation a-la FUSE (but at the SCSI level -- SUSE?)
possible.

It uses a mmap()able UIO device per LUN to share a command ring and data
area. The commands are raw SCSI CDBs and iovs for in/out data. The command
ring is also reused for returning scsi command status and optional sense
data.

This implementation is based on Shaohua Li's earlier version but heavily
modified. Differences include:

* Shared memory allocated by kernel, not locked-down user pages
* Single ring for command request and response
* Offsets instead of embedded pointers
* Generic SCSI CDB passthrough instead of per-cmd specialization in ring
  format.
* Uses UIO device instead of anon_file passed in mailbox.
* Optional in-kernel handling of some commands.

The main reason for these differences is to permit greater resiliency
if the user process dies or hangs.

Things not yet implemented (on purpose):

* Zero copy. The data area is flexible enough to allow page flipping or
  backend-allocated pages to be used by fabrics, but it's not clear these
  are performance wins. Can come later.
* Out-of-order command completion by userspace. Possible to add by just
  allowing userspace to change cmd_id in rsp cmd entries, but currently
  not supported.
* No locks between kernel cmd submission and completion routines. Sounds
  like it's possible, but this can come later.
* Sparse allocation of mmaped area. Current code vmallocs the whole thing.
  If the mapped area was larger and not fully mapped then the driver would
  have more freedom to change cmd and data area sizes based on demand.

Current code open issues:

* The use of idrs may be overkill -- we maybe can replace them with a
  simple counter to generate cmd_ids, and a hash table to get a cmd_id's
  associated pointer.
* Use of a free-running counter for cmd ring instead of explicit modulo
  math. This would require power-of-2 cmd ring size.

Signed-off-by: Andy Grover agro...@redhat.com
---
 drivers/target/Kconfig |5 +
 drivers/target/Makefile|1 +
 drivers/target/target_core_transport.c |4 +
 drivers/target/target_core_user.c  | 1163 
 include/uapi/linux/Kbuild  |1 +
 include/uapi/linux/target_core_user.h  |  142 
 6 files changed, 1316 insertions(+)
 create mode 100644 drivers/target/target_core_user.c
 create mode 100644 include/uapi/linux/target_core_user.h

diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig
index dc2d84a..b03a845 100644
--- a/drivers/target/Kconfig
+++ b/drivers/target/Kconfig
@@ -31,6 +31,11 @@ config TCM_PSCSI
Say Y here to enable the TCM/pSCSI subsystem plugin for non-buffered
passthrough access to Linux/SCSI device
 
+config TCM_USER
+   tristate TCM/USER Subsystem Plugin for Linux
+   help
+   Say Y here to enable the TCM/USER subsystem plugin
+
 source drivers/target/loopback/Kconfig
 source drivers/target/tcm_fc/Kconfig
 source drivers/target/iscsi/Kconfig
diff --git a/drivers/target/Makefile b/drivers/target/Makefile
index 85b012d..bbb4a7d 100644
--- a/drivers/target/Makefile
+++ b/drivers/target/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_TARGET_CORE) += target_core_mod.o
 obj-$(CONFIG_TCM_IBLOCK)   += target_core_iblock.o
 obj-$(CONFIG_TCM_FILEIO)   += target_core_file.o
 obj-$(CONFIG_TCM_PSCSI)+= target_core_pscsi.o
+obj-$(CONFIG_TCM_USER) += target_core_user.o
 
 # Fabric modules
 obj-$(CONFIG_LOOPBACK_TARGET)  += loopback/
diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 9700ea1..9ea0d5f 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -232,6 +232,10 @@ void transport_subsystem_check_init(void)
if (ret != 0)
pr_err(Unable to load target_core_pscsi\n);
 
+   ret = request_module(target_core_user);
+   if (ret != 0)
+   pr_err(Unable to load target_core_user\n);
+
sub_api_initialized = 1;
 }
 
diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
new file mode 100644
index 000..6608ecf
--- /dev/null
+++ b/drivers/target/target_core_user.c
@@ -0,0 +1,1163 @@
+/*
+ * Copyright (C) 2013 Shaohua Li s...@kernel.org
+ * Copyright (C) 2014 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy

[PATCHv2 1/4] target: Remove unneeded check in sbc_parse_cdb

2014-10-01 Thread Andy Grover

The check of SCF_SCSI_DATA_CDB seems to be a remnant from before hch's
refactoring of this function. There are no places where that flag is set
that cmd-execute_cmd isn't also set.

Signed-off-by: Andy Grover agro...@redhat.com
---
 drivers/target/target_core_sbc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
index bd78d92..ebe62af 100644
--- a/drivers/target/target_core_sbc.c
+++ b/drivers/target/target_core_sbc.c
@@ -948,7 +948,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops)
}
 
/* reject any command that we don't have a handler for */
-   if (!(cmd-se_cmd_flags  SCF_SCSI_DATA_CDB)  !cmd-execute_cmd)
+   if (!cmd-execute_cmd)
return TCM_UNSUPPORTED_SCSI_OPCODE;
 
if (cmd-se_cmd_flags  SCF_SCSI_DATA_CDB) {
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCHv3 0/8] target: Save memory on unused se_dev_entrys and se_luns

2014-09-23 Thread Andy Grover


On 09/18/2014 04:17 PM, Nicholas A. Bellinger wrote:


I'm currently reviewing #2, #4, #5 and #7 and will consider merging
these.  These cleanups account for most of the LOC reduction, and avoid
most of the larger concerns.

Also for patches like this, they really need testing on your end before
I'll consider them merging.  Compile testing alone for these types of
locking changes is not enough, given the history of regressions with
these types of cleanups.


OK. I'll look for those to show up in your tree, then? And work on 
better dividing up of the remaining changes, and also minimize their 
risk of causing regressions.


Regards -- Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCHv3 0/8] target: Save memory on unused se_dev_entrys and se_luns

2014-09-23 Thread Andy Grover


On 09/18/2014 04:17 PM, Nicholas A. Bellinger wrote:


I'm currently reviewing #2, #4, #5 and #7 and will consider merging
these.  These cleanups account for most of the LOC reduction, and avoid
most of the larger concerns.

Also for patches like this, they really need testing on your end before
I'll consider them merging.  Compile testing alone for these types of
locking changes is not enough, given the history of regressions with
these types of cleanups.


OK. I'll look for those to show up in your tree, then? And work on 
better dividing up of the remaining changes, and also minimize their 
risk of causing regressions.


Regards -- Andy

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/4] target: Add a user-passthrough backstore

2014-09-22 Thread Andy Grover


On 09/22/2014 01:58 PM, Nicholas A. Bellinger wrote:

So I'd still like to start for an initial merge with the two different
modes mentioned earlier.  The pure-passthrough mode where everything is
handled by user-space, and an I/O passthrough mode where only
SCF_SCSI_DATA_CDB is passed along to userspace, but all other control
CDBs are handled by existing in-kernel emulation.

Andy, can you re-spin a series with these two approaches in mind..?


Will do. This is actually close to what the initial RFC version I posted 
did.


-- Andy


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/4] target: Add a user-passthrough backstore

2014-09-22 Thread Andy Grover


On 09/22/2014 01:58 PM, Nicholas A. Bellinger wrote:

So I'd still like to start for an initial merge with the two different
modes mentioned earlier.  The pure-passthrough mode where everything is
handled by user-space, and an I/O passthrough mode where only
SCF_SCSI_DATA_CDB is passed along to userspace, but all other control
CDBs are handled by existing in-kernel emulation.

Andy, can you re-spin a series with these two approaches in mind..?


Will do. This is actually close to what the initial RFC version I posted 
did.


-- Andy


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/4] target: Add a user-passthrough backstore

2014-09-19 Thread Andy Grover


On 09/19/2014 04:51 PM, Alex Elsayed wrote:


Not sure I follow..  How does the proposed passthrough mode prevent
someone from emulating OSDs, media changers, optical disks or anything
else in userspace with TCMU..?

The main thing that the above comments highlight is why attempting to
combine the existing in-kernel emulation with a userspace backend
providing it's own emulation can open up a number of problems with
mismatched state between the two.


It doesn't prevent it, but it _does_ put it in the exact same place as PSCSI
regarding the warnings on the wiki. It means that if someone wants to
implement (say) the optical disc or OSD CDBs, they then lose out on ALUA 
unless they implement it themselves - which seems unnecessary and painful,
since those should really be disjoint. In particular, an OSD backed by RADOS
objects could be a very nice thing indeed, _and_ could really benefit from
ALUA.


Some possible solutions:

1) The first time TCMU sees an opcode it passes it up and notes whether 
it is handled or not. If it was handled then future cmds with that 
opcode are passed up but not retried in the kernel. If it was not 
handled then it and all future commands with that opcode are handled by 
the kernel and not passed up.


2) Same as #1 but TCMU operates on SCSI spec granularity - e.g. handling 
any SSC opcode means userspace must handle all SSC commands.


3) Like #2 but define opcode groupings that must all be implemented 
together, independent of the specifications.


4) Have passthrough mode set at creation, but with more than two modes, 
either grouped by SCSI spec or our own groupings.


5) Never pass up certain opcodes, like the ALUA ones or whatever.


Have a good weekend -- Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/4] target: Add a user-passthrough backstore

2014-09-19 Thread Andy Grover


On 09/19/2014 04:51 PM, Alex Elsayed wrote:


Not sure I follow..  How does the proposed passthrough mode prevent
someone from emulating OSDs, media changers, optical disks or anything
else in userspace with TCMU..?

The main thing that the above comments highlight is why attempting to
combine the existing in-kernel emulation with a userspace backend
providing it's own emulation can open up a number of problems with
mismatched state between the two.


It doesn't prevent it, but it _does_ put it in the exact same place as PSCSI
regarding the warnings on the wiki. It means that if someone wants to
implement (say) the optical disc or OSD CDBs, they then lose out on ALUA co
unless they implement it themselves - which seems unnecessary and painful,
since those should really be disjoint. In particular, an OSD backed by RADOS
objects could be a very nice thing indeed, _and_ could really benefit from
ALUA.


Some possible solutions:

1) The first time TCMU sees an opcode it passes it up and notes whether 
it is handled or not. If it was handled then future cmds with that 
opcode are passed up but not retried in the kernel. If it was not 
handled then it and all future commands with that opcode are handled by 
the kernel and not passed up.


2) Same as #1 but TCMU operates on SCSI spec granularity - e.g. handling 
any SSC opcode means userspace must handle all SSC commands.


3) Like #2 but define opcode groupings that must all be implemented 
together, independent of the specifications.


4) Have passthrough mode set at creation, but with more than two modes, 
either grouped by SCSI spec or our own groupings.


5) Never pass up certain opcodes, like the ALUA ones or whatever.


Have a good weekend -- Andy

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCHv3 0/8] target: Save memory on unused se_dev_entrys and se_luns

2014-09-18 Thread Andy Grover


On 09/18/2014 12:38 AM, Nicholas A. Bellinger wrote:

On Sat, 2014-09-13 at 21:55 +0200, Christoph Hellwig wrote:

ping again.  We're getting closer to the end of the 3.18 merge window
and there still hasn't been a response.  Should Andy just send the patches
directly to Linus once 3.18 opens given that they have been out on the list
since Jun 23? (with a positive review from me and no negative one)



Removing unused per WWPN endpoint LUN + per NodeACL MappedLUN memory is
a nice optimization to have, but I'm not yet convinced that extending
existing control path spinlocks to support an array of pointers is
ultimately worth the complexity it adds here.


9 files changed, 250 insertions(+), 367 deletions(-).

This patchset removes 100+ lines of code. Furthermore, I wouldn't 
characterize it as extending locks, so much as putting locks where they 
should've always been. The fact that device_list[foo] is never null 
means we've avoided crashes but not potentially incorrect accesses.



Another concern is how these changes effect active session + device I/O
shutdown, which is an area of regressions I'd rather avoid


I assume this set would spend time in your tree, followed by Linus' tree 
before making it into a release. Also, any logic errors are likely to 
result in a fault, so they should not remain hidden for long.



if the
primary benefit of this series is only reducing memory footprint for
unused LUNs + MappedLUNs.


Yes it does reduce wasted memory, that should be reason enough I'd say. 
But this patchset is also a building block for further improvements that 
are more significant. This set transitions all lun and mappedlun checks 
from checking a flag to checking for NULL. This is necessary before we 
can improve from a fixed-size array to more size-scalable data 
structures like a radix tree, or lockless, with RCU.



 Lowering the TRANSPORT_MAX_LUNS_PER_TPG value
at compile time today is the simple way for reducing overall memory
footprint for folks who need to scale up the number of targets using
smaller individual LUN mappings.


This is only an option for embedded. We should scale the amount of 
memory we use with the number of allocated LUNs and mapped LUNs.



As for something smarter, given the mostly read-only nature of LUN +
MappedLUN pointers to individual TPGT + NodeACLs context, I'd rather see
something based on RCU arrays + percpu_ref counting to avoid this type
of complexity to existing code, and move in the direction of dropping
fast-path I_T ->device_list_lock access all together.


See above about pointers vs flags, this is a first step toward more 
performant *and* space-efficient data structures.



Beyond these objections, there are some useful fixes + cleanups from
this series that I'm OK with merging soon..


I've pushed this patchset to

git://git.kernel.org/pub/scm/linux/kernel/git/grover/linux.git

on two branches against your and Linus' repos:
against-linus
against-target-pending-for-next

(looked-over and compile-tested)

For your convenience.

Regards -- Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCHv3 0/8] target: Save memory on unused se_dev_entrys and se_luns

2014-09-18 Thread Andy Grover


On 09/18/2014 12:38 AM, Nicholas A. Bellinger wrote:

On Sat, 2014-09-13 at 21:55 +0200, Christoph Hellwig wrote:

ping again.  We're getting closer to the end of the 3.18 merge window
and there still hasn't been a response.  Should Andy just send the patches
directly to Linus once 3.18 opens given that they have been out on the list
since Jun 23? (with a positive review from me and no negative one)



Removing unused per WWPN endpoint LUN + per NodeACL MappedLUN memory is
a nice optimization to have, but I'm not yet convinced that extending
existing control path spinlocks to support an array of pointers is
ultimately worth the complexity it adds here.


9 files changed, 250 insertions(+), 367 deletions(-).

This patchset removes 100+ lines of code. Furthermore, I wouldn't 
characterize it as extending locks, so much as putting locks where they 
should've always been. The fact that device_list[foo] is never null 
means we've avoided crashes but not potentially incorrect accesses.



Another concern is how these changes effect active session + device I/O
shutdown, which is an area of regressions I'd rather avoid


I assume this set would spend time in your tree, followed by Linus' tree 
before making it into a release. Also, any logic errors are likely to 
result in a fault, so they should not remain hidden for long.



if the
primary benefit of this series is only reducing memory footprint for
unused LUNs + MappedLUNs.


Yes it does reduce wasted memory, that should be reason enough I'd say. 
But this patchset is also a building block for further improvements that 
are more significant. This set transitions all lun and mappedlun checks 
from checking a flag to checking for NULL. This is necessary before we 
can improve from a fixed-size array to more size-scalable data 
structures like a radix tree, or lockless, with RCU.



 Lowering the TRANSPORT_MAX_LUNS_PER_TPG value
at compile time today is the simple way for reducing overall memory
footprint for folks who need to scale up the number of targets using
smaller individual LUN mappings.


This is only an option for embedded. We should scale the amount of 
memory we use with the number of allocated LUNs and mapped LUNs.



As for something smarter, given the mostly read-only nature of LUN +
MappedLUN pointers to individual TPGT + NodeACLs context, I'd rather see
something based on RCU arrays + percpu_ref counting to avoid this type
of complexity to existing code, and move in the direction of dropping
fast-path I_T -device_list_lock access all together.


See above about pointers vs flags, this is a first step toward more 
performant *and* space-efficient data structures.



Beyond these objections, there are some useful fixes + cleanups from
this series that I'm OK with merging soon..


I've pushed this patchset to

git://git.kernel.org/pub/scm/linux/kernel/git/grover/linux.git

on two branches against your and Linus' repos:
against-linus
against-target-pending-for-next

(looked-over and compile-tested)

For your convenience.

Regards -- Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 0/4] Userspace Passthrough backstore for LIO

2014-09-15 Thread Andy Grover

Hi all,

Here is version three of the userspace passthrough backstore code I've
been working on. I think it's ready to be considered for upstream.

If you are wondering what this is, patch 3 is documentation that
should explain, at both a high- and low-level.

There are other pieces needed for this to work properly -- changes to
targetcli and rtslib to easily configure the new backstore type, plus
the user code that hooks up to the passthrough. I've written some code
to do this, called tcmu-runner
(https://github.com/agrover/tcmu-runner) but the userspace development
naturally is hindered until the kernel code is accepted :)

Changes since version 2:
* Incorporate doc improvements from Richard W. M. Jones
* Correct off-by-1 error in tcmu_get_blocks
* execute_rw must be set in the sbc_ops passed to sbc_parse_cdb

Thanks -- Andy

Andy Grover (4):
  target: Remove unneeded check in sbc_parse_cdb
  uio: Export definition of struct uio_device
  target: Add documentation on the target userspace pass-through driver
  target: Add a user-passthrough backstore

 Documentation/target/tcmu-design.txt   |  239 +++
 drivers/target/Kconfig |5 +
 drivers/target/Makefile|1 +
 drivers/target/target_core_sbc.c   |2 +-
 drivers/target/target_core_transport.c |4 +
 drivers/target/target_core_user.c  | 1169 
 drivers/uio/uio.c  |   12 -
 include/linux/uio_driver.h |   12 +-
 include/uapi/linux/Kbuild  |1 +
 include/uapi/linux/target_core_user.h  |  142 
 10 files changed, 1573 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/target/tcmu-design.txt
 create mode 100644 drivers/target/target_core_user.c
 create mode 100644 include/uapi/linux/target_core_user.h

-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/4] target: Add documentation on the target userspace pass-through driver

2014-09-15 Thread Andy Grover

Describes the driver and its interface to make it possible for user
programs to back a LIO-exported LUN.

Thanks to Richard W. M. Jones for review, and supplementing this doc
with the first two paragraphs.

Signed-off-by: Andy Grover 
---
 Documentation/target/tcmu-design.txt | 239 +++
 1 file changed, 239 insertions(+)
 create mode 100644 Documentation/target/tcmu-design.txt

diff --git a/Documentation/target/tcmu-design.txt 
b/Documentation/target/tcmu-design.txt
new file mode 100644
index 000..92662be
--- /dev/null
+++ b/Documentation/target/tcmu-design.txt
@@ -0,0 +1,239 @@
+TCM Userspace Design
+
+
+TCM is another name for LIO, an in-kernel iSCSI target (server).
+Existing TCM targets run in the kernel.  TCMU (TCM in Userspace)
+allows userspace programs to be written which act as iSCSI targets.
+This document describes the design.
+
+The existing kernel provides modules for different SCSI transport
+protocols.  TCM also modularizes the data storage.  There are existing
+modules for file, block device, RAM or using another SCSI device as
+storage.  These are called "backstores" or "storage engines".  These
+built-in modules are implemented entirely as kernel code.
+
+Background:
+
+In addition to modularizing the transport protocol used for carrying
+SCSI commands ("fabrics"), the Linux kernel target, LIO, also modularizes
+the actual data storage as well. These are referred to as "backstores"
+or "storage engines". The target comes with backstores that allow a
+file, a block device, RAM, or another SCSI device to be used for the
+local storage needed for the exported SCSI LUN. Like the rest of LIO,
+these are implemented entirely as kernel code.
+
+These backstores cover the most common use cases, but not all. One new
+use case that other non-kernel target solutions, such as tgt, are able
+to support is using Gluster's GLFS or Ceph's RBD as a backstore. The
+target then serves as a translator, allowing initiators to store data
+in these non-traditional networked storage systems, while still only
+using standard protocols themselves.
+
+If the target is a userspace process, supporting these is easy. tgt,
+for example, needs only a small adapter module for each, because the
+modules just use the available userspace libraries for RBD and GLFS.
+
+Adding support for these backstores in LIO is considerably more
+difficult, because LIO is entirely kernel code. Instead of undertaking
+the significant work to port the GLFS or RBD APIs and protocols to the
+kernel, another approach is to create a userspace pass-through
+backstore for LIO, "TCMU".
+
+
+Benefits:
+
+In addition to allowing relatively easy support for RBD and GLFS, TCMU
+will also allow easier development of new backstores. TCMU combines
+with the LIO loopback fabric to become something similar to FUSE
+(Filesystem in Userspace), but at the SCSI layer instead of the
+filesystem layer. A SUSE, if you will.
+
+The disadvantage is there are more distinct components to configure, and
+potentially to malfunction. This is unavoidable, but hopefully not
+fatal if we're careful to keep things as simple as possible.
+
+Design constraints:
+
+- Good performance: high throughput, low latency
+- Cleanly handle if userspace:
+   1) never attaches
+   2) hangs
+   3) dies
+   4) misbehaves
+- Allow future flexibility in user & kernel implementations
+- Be reasonably memory-efficient
+- Simple to configure & run
+- Simple to write a userspace backend
+
+
+Implementation overview:
+
+The core of the TCMU interface is a memory region that is shared
+between kernel and userspace. Within this region is: a control area
+(mailbox); a lockless producer/consumer circular buffer for commands
+to be passed up, and status returned; and an in/out data buffer area.
+
+TCMU uses the pre-existing UIO subsystem. UIO allows device driver
+development in userspace, and this is conceptually very close to the
+TCMU use case, except instead of a physical device, TCMU implements a
+memory-mapped layout designed for SCSI commands. Using UIO also
+benefits TCMU by handling device introspection (e.g. a way for
+userspace to determine how large the shared region is) and signaling
+mechanisms in both directions.
+
+There are no embedded pointers in the memory region. Everything is
+expressed as an offset from the region's starting address. This allows
+the ring to still work if the user process dies and is restarted with
+the region mapped at a different virtual address.
+
+See target_core_user.h for the struct definitions.
+
+The Mailbox:
+
+The mailbox is always at the start of the shared memory region, and
+contains a version, details about the starting offset and size of the
+command ring, and head and tail pointers to be used by the kernel and
+userspace (respectively) to put commands on the ring, and indicate
+when the commands are completed.
+
+version

[PATCH 2/4] uio: Export definition of struct uio_device

2014-09-15 Thread Andy Grover

In order to prevent a O(n) search of the filesystem to link up its uio
node with its target configuration, TCMU needs to know the minor number
that UIO assigned. Expose the definition of this struct so TCMU can
access this field.

Signed-off-by: Andy Grover 
---
 drivers/uio/uio.c  | 12 
 include/linux/uio_driver.h | 12 +++-
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index a673e5b..60fa627 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -28,18 +28,6 @@
 
 #define UIO_MAX_DEVICES(1U << MINORBITS)
 
-struct uio_device {
-   struct module   *owner;
-   struct device   *dev;
-   int minor;
-   atomic_tevent;
-   struct fasync_struct*async_queue;
-   wait_queue_head_t   wait;
-   struct uio_info *info;
-   struct kobject  *map_dir;
-   struct kobject  *portio_dir;
-};
-
 static int uio_major;
 static struct cdev *uio_cdev;
 static DEFINE_IDR(uio_idr);
diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h
index 1ad4724..baa8171 100644
--- a/include/linux/uio_driver.h
+++ b/include/linux/uio_driver.h
@@ -63,7 +63,17 @@ struct uio_port {
 
 #define MAX_UIO_PORT_REGIONS   5
 
-struct uio_device;
+struct uio_device {
+struct module   *owner;
+struct device   *dev;
+int minor;
+atomic_tevent;
+struct fasync_struct*async_queue;
+wait_queue_head_t   wait;
+struct uio_info *info;
+struct kobject  *map_dir;
+struct kobject  *portio_dir;
+};
 
 /**
  * struct uio_info - UIO device capabilities
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 4/4] target: Add a user-passthrough backstore

2014-09-15 Thread Andy Grover

Add a LIO storage engine that presents commands to userspace for execution.
This would allow more complex backstores to be implemented out-of-kernel,
and also make experimentation a-la FUSE (but at the SCSI level -- "SUSE"?)
possible.

It uses a mmap()able UIO device per LUN to share a command ring and data
area. The commands are raw SCSI CDBs and iovs for in/out data. The command
ring is also reused for returning scsi command status and optional sense
data.

This implementation is based on Shaohua Li's earlier version but heavily
modified. Differences include:

* Shared memory allocated by kernel, not locked-down user pages
* Single ring for command request and response
* Offsets instead of embedded pointers
* Generic SCSI CDB passthrough instead of per-cmd specialization in ring
  format.
* Uses UIO device instead of anon_file passed in mailbox.
* Optional in-kernel handling of some commands.

The main reason for these differences is to permit greater resiliency
if the user process dies or hangs.

Things not yet implemented (on purpose):

* Zero copy. The data area is flexible enough to allow page flipping or
  backend-allocated pages to be used by fabrics, but it's not clear these
  are performance wins. Can come later.
* Out-of-order command completion by userspace. Possible to add by just
  allowing userspace to change cmd_id in rsp cmd entries, but currently
  not supported.
* No locks between kernel cmd submission and completion routines. Sounds
  like it's possible, but this can come later.
* Sparse allocation of mmaped area. Current code vmallocs the whole thing.
  If the mapped area was larger and not fully mapped then the driver would
  have more freedom to change cmd and data area sizes based on demand.

Current code open issues:

* The use of idrs may be overkill -- we maybe can replace them with a
  simple counter to generate cmd_ids, and a hash table to get a cmd_id's
  associated pointer.
* Use of a free-running counter for cmd ring instead of explicit modulo
  math. This would require power-of-2 cmd ring size.

Signed-off-by: Andy Grover 
---
 drivers/target/Kconfig |5 +
 drivers/target/Makefile|1 +
 drivers/target/target_core_transport.c |4 +
 drivers/target/target_core_user.c  | 1169 
 include/uapi/linux/Kbuild  |1 +
 include/uapi/linux/target_core_user.h  |  142 
 6 files changed, 1322 insertions(+)
 create mode 100644 drivers/target/target_core_user.c
 create mode 100644 include/uapi/linux/target_core_user.h

diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig
index dc2d84a..b03a845 100644
--- a/drivers/target/Kconfig
+++ b/drivers/target/Kconfig
@@ -31,6 +31,11 @@ config TCM_PSCSI
Say Y here to enable the TCM/pSCSI subsystem plugin for non-buffered
passthrough access to Linux/SCSI device
 
+config TCM_USER
+   tristate "TCM/USER Subsystem Plugin for Linux"
+   help
+   Say Y here to enable the TCM/USER subsystem plugin
+
 source "drivers/target/loopback/Kconfig"
 source "drivers/target/tcm_fc/Kconfig"
 source "drivers/target/iscsi/Kconfig"
diff --git a/drivers/target/Makefile b/drivers/target/Makefile
index 85b012d..bbb4a7d 100644
--- a/drivers/target/Makefile
+++ b/drivers/target/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_TARGET_CORE) += target_core_mod.o
 obj-$(CONFIG_TCM_IBLOCK)   += target_core_iblock.o
 obj-$(CONFIG_TCM_FILEIO)   += target_core_file.o
 obj-$(CONFIG_TCM_PSCSI)+= target_core_pscsi.o
+obj-$(CONFIG_TCM_USER) += target_core_user.o
 
 # Fabric modules
 obj-$(CONFIG_LOOPBACK_TARGET)  += loopback/
diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 7fa62fc..f018a8c 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -232,6 +232,10 @@ void transport_subsystem_check_init(void)
if (ret != 0)
pr_err("Unable to load target_core_pscsi\n");
 
+   ret = request_module("target_core_user");
+   if (ret != 0)
+   pr_err("Unable to load target_core_user\n");
+
sub_api_initialized = 1;
 }
 
diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
new file mode 100644
index 000..33b56a8
--- /dev/null
+++ b/drivers/target/target_core_user.c
@@ -0,0 +1,1169 @@
+/*
+ * Copyright (C) 2013 Shaohua Li 
+ * Copyright (C) 2014 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License

[PATCH 1/4] target: Remove unneeded check in sbc_parse_cdb

2014-09-15 Thread Andy Grover

The check of SCF_SCSI_DATA_CDB seems to be a remnant from before hch's
refactoring of this function. There are no places where that flag is set
that cmd->execute_cmd isn't also set.

Signed-off-by: Andy Grover 
---
 drivers/target/target_core_sbc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
index bd78d92..ebe62af 100644
--- a/drivers/target/target_core_sbc.c
+++ b/drivers/target/target_core_sbc.c
@@ -948,7 +948,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops)
}
 
/* reject any command that we don't have a handler for */
-   if (!(cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) && !cmd->execute_cmd)
+   if (!cmd->execute_cmd)
return TCM_UNSUPPORTED_SCSI_OPCODE;
 
if (cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) {
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/4] target: Remove unneeded check in sbc_parse_cdb

2014-09-15 Thread Andy Grover

The check of SCF_SCSI_DATA_CDB seems to be a remnant from before hch's
refactoring of this function. There are no places where that flag is set
that cmd-execute_cmd isn't also set.

Signed-off-by: Andy Grover agro...@redhat.com
---
 drivers/target/target_core_sbc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
index bd78d92..ebe62af 100644
--- a/drivers/target/target_core_sbc.c
+++ b/drivers/target/target_core_sbc.c
@@ -948,7 +948,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops)
}
 
/* reject any command that we don't have a handler for */
-   if (!(cmd-se_cmd_flags  SCF_SCSI_DATA_CDB)  !cmd-execute_cmd)
+   if (!cmd-execute_cmd)
return TCM_UNSUPPORTED_SCSI_OPCODE;
 
if (cmd-se_cmd_flags  SCF_SCSI_DATA_CDB) {
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/4] uio: Export definition of struct uio_device

2014-09-15 Thread Andy Grover

In order to prevent a O(n) search of the filesystem to link up its uio
node with its target configuration, TCMU needs to know the minor number
that UIO assigned. Expose the definition of this struct so TCMU can
access this field.

Signed-off-by: Andy Grover agro...@redhat.com
---
 drivers/uio/uio.c  | 12 
 include/linux/uio_driver.h | 12 +++-
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index a673e5b..60fa627 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -28,18 +28,6 @@
 
 #define UIO_MAX_DEVICES(1U  MINORBITS)
 
-struct uio_device {
-   struct module   *owner;
-   struct device   *dev;
-   int minor;
-   atomic_tevent;
-   struct fasync_struct*async_queue;
-   wait_queue_head_t   wait;
-   struct uio_info *info;
-   struct kobject  *map_dir;
-   struct kobject  *portio_dir;
-};
-
 static int uio_major;
 static struct cdev *uio_cdev;
 static DEFINE_IDR(uio_idr);
diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h
index 1ad4724..baa8171 100644
--- a/include/linux/uio_driver.h
+++ b/include/linux/uio_driver.h
@@ -63,7 +63,17 @@ struct uio_port {
 
 #define MAX_UIO_PORT_REGIONS   5
 
-struct uio_device;
+struct uio_device {
+struct module   *owner;
+struct device   *dev;
+int minor;
+atomic_tevent;
+struct fasync_struct*async_queue;
+wait_queue_head_t   wait;
+struct uio_info *info;
+struct kobject  *map_dir;
+struct kobject  *portio_dir;
+};
 
 /**
  * struct uio_info - UIO device capabilities
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 4/4] target: Add a user-passthrough backstore

2014-09-15 Thread Andy Grover

Add a LIO storage engine that presents commands to userspace for execution.
This would allow more complex backstores to be implemented out-of-kernel,
and also make experimentation a-la FUSE (but at the SCSI level -- SUSE?)
possible.

It uses a mmap()able UIO device per LUN to share a command ring and data
area. The commands are raw SCSI CDBs and iovs for in/out data. The command
ring is also reused for returning scsi command status and optional sense
data.

This implementation is based on Shaohua Li's earlier version but heavily
modified. Differences include:

* Shared memory allocated by kernel, not locked-down user pages
* Single ring for command request and response
* Offsets instead of embedded pointers
* Generic SCSI CDB passthrough instead of per-cmd specialization in ring
  format.
* Uses UIO device instead of anon_file passed in mailbox.
* Optional in-kernel handling of some commands.

The main reason for these differences is to permit greater resiliency
if the user process dies or hangs.

Things not yet implemented (on purpose):

* Zero copy. The data area is flexible enough to allow page flipping or
  backend-allocated pages to be used by fabrics, but it's not clear these
  are performance wins. Can come later.
* Out-of-order command completion by userspace. Possible to add by just
  allowing userspace to change cmd_id in rsp cmd entries, but currently
  not supported.
* No locks between kernel cmd submission and completion routines. Sounds
  like it's possible, but this can come later.
* Sparse allocation of mmaped area. Current code vmallocs the whole thing.
  If the mapped area was larger and not fully mapped then the driver would
  have more freedom to change cmd and data area sizes based on demand.

Current code open issues:

* The use of idrs may be overkill -- we maybe can replace them with a
  simple counter to generate cmd_ids, and a hash table to get a cmd_id's
  associated pointer.
* Use of a free-running counter for cmd ring instead of explicit modulo
  math. This would require power-of-2 cmd ring size.

Signed-off-by: Andy Grover agro...@redhat.com
---
 drivers/target/Kconfig |5 +
 drivers/target/Makefile|1 +
 drivers/target/target_core_transport.c |4 +
 drivers/target/target_core_user.c  | 1169 
 include/uapi/linux/Kbuild  |1 +
 include/uapi/linux/target_core_user.h  |  142 
 6 files changed, 1322 insertions(+)
 create mode 100644 drivers/target/target_core_user.c
 create mode 100644 include/uapi/linux/target_core_user.h

diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig
index dc2d84a..b03a845 100644
--- a/drivers/target/Kconfig
+++ b/drivers/target/Kconfig
@@ -31,6 +31,11 @@ config TCM_PSCSI
Say Y here to enable the TCM/pSCSI subsystem plugin for non-buffered
passthrough access to Linux/SCSI device
 
+config TCM_USER
+   tristate TCM/USER Subsystem Plugin for Linux
+   help
+   Say Y here to enable the TCM/USER subsystem plugin
+
 source drivers/target/loopback/Kconfig
 source drivers/target/tcm_fc/Kconfig
 source drivers/target/iscsi/Kconfig
diff --git a/drivers/target/Makefile b/drivers/target/Makefile
index 85b012d..bbb4a7d 100644
--- a/drivers/target/Makefile
+++ b/drivers/target/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_TARGET_CORE) += target_core_mod.o
 obj-$(CONFIG_TCM_IBLOCK)   += target_core_iblock.o
 obj-$(CONFIG_TCM_FILEIO)   += target_core_file.o
 obj-$(CONFIG_TCM_PSCSI)+= target_core_pscsi.o
+obj-$(CONFIG_TCM_USER) += target_core_user.o
 
 # Fabric modules
 obj-$(CONFIG_LOOPBACK_TARGET)  += loopback/
diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 7fa62fc..f018a8c 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -232,6 +232,10 @@ void transport_subsystem_check_init(void)
if (ret != 0)
pr_err(Unable to load target_core_pscsi\n);
 
+   ret = request_module(target_core_user);
+   if (ret != 0)
+   pr_err(Unable to load target_core_user\n);
+
sub_api_initialized = 1;
 }
 
diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
new file mode 100644
index 000..33b56a8
--- /dev/null
+++ b/drivers/target/target_core_user.c
@@ -0,0 +1,1169 @@
+/*
+ * Copyright (C) 2013 Shaohua Li s...@kernel.org
+ * Copyright (C) 2014 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy

[PATCH 3/4] target: Add documentation on the target userspace pass-through driver

2014-09-15 Thread Andy Grover

Describes the driver and its interface to make it possible for user
programs to back a LIO-exported LUN.

Thanks to Richard W. M. Jones for review, and supplementing this doc
with the first two paragraphs.

Signed-off-by: Andy Grover agro...@redhat.com
---
 Documentation/target/tcmu-design.txt | 239 +++
 1 file changed, 239 insertions(+)
 create mode 100644 Documentation/target/tcmu-design.txt

diff --git a/Documentation/target/tcmu-design.txt 
b/Documentation/target/tcmu-design.txt
new file mode 100644
index 000..92662be
--- /dev/null
+++ b/Documentation/target/tcmu-design.txt
@@ -0,0 +1,239 @@
+TCM Userspace Design
+
+
+TCM is another name for LIO, an in-kernel iSCSI target (server).
+Existing TCM targets run in the kernel.  TCMU (TCM in Userspace)
+allows userspace programs to be written which act as iSCSI targets.
+This document describes the design.
+
+The existing kernel provides modules for different SCSI transport
+protocols.  TCM also modularizes the data storage.  There are existing
+modules for file, block device, RAM or using another SCSI device as
+storage.  These are called backstores or storage engines.  These
+built-in modules are implemented entirely as kernel code.
+
+Background:
+
+In addition to modularizing the transport protocol used for carrying
+SCSI commands (fabrics), the Linux kernel target, LIO, also modularizes
+the actual data storage as well. These are referred to as backstores
+or storage engines. The target comes with backstores that allow a
+file, a block device, RAM, or another SCSI device to be used for the
+local storage needed for the exported SCSI LUN. Like the rest of LIO,
+these are implemented entirely as kernel code.
+
+These backstores cover the most common use cases, but not all. One new
+use case that other non-kernel target solutions, such as tgt, are able
+to support is using Gluster's GLFS or Ceph's RBD as a backstore. The
+target then serves as a translator, allowing initiators to store data
+in these non-traditional networked storage systems, while still only
+using standard protocols themselves.
+
+If the target is a userspace process, supporting these is easy. tgt,
+for example, needs only a small adapter module for each, because the
+modules just use the available userspace libraries for RBD and GLFS.
+
+Adding support for these backstores in LIO is considerably more
+difficult, because LIO is entirely kernel code. Instead of undertaking
+the significant work to port the GLFS or RBD APIs and protocols to the
+kernel, another approach is to create a userspace pass-through
+backstore for LIO, TCMU.
+
+
+Benefits:
+
+In addition to allowing relatively easy support for RBD and GLFS, TCMU
+will also allow easier development of new backstores. TCMU combines
+with the LIO loopback fabric to become something similar to FUSE
+(Filesystem in Userspace), but at the SCSI layer instead of the
+filesystem layer. A SUSE, if you will.
+
+The disadvantage is there are more distinct components to configure, and
+potentially to malfunction. This is unavoidable, but hopefully not
+fatal if we're careful to keep things as simple as possible.
+
+Design constraints:
+
+- Good performance: high throughput, low latency
+- Cleanly handle if userspace:
+   1) never attaches
+   2) hangs
+   3) dies
+   4) misbehaves
+- Allow future flexibility in user  kernel implementations
+- Be reasonably memory-efficient
+- Simple to configure  run
+- Simple to write a userspace backend
+
+
+Implementation overview:
+
+The core of the TCMU interface is a memory region that is shared
+between kernel and userspace. Within this region is: a control area
+(mailbox); a lockless producer/consumer circular buffer for commands
+to be passed up, and status returned; and an in/out data buffer area.
+
+TCMU uses the pre-existing UIO subsystem. UIO allows device driver
+development in userspace, and this is conceptually very close to the
+TCMU use case, except instead of a physical device, TCMU implements a
+memory-mapped layout designed for SCSI commands. Using UIO also
+benefits TCMU by handling device introspection (e.g. a way for
+userspace to determine how large the shared region is) and signaling
+mechanisms in both directions.
+
+There are no embedded pointers in the memory region. Everything is
+expressed as an offset from the region's starting address. This allows
+the ring to still work if the user process dies and is restarted with
+the region mapped at a different virtual address.
+
+See target_core_user.h for the struct definitions.
+
+The Mailbox:
+
+The mailbox is always at the start of the shared memory region, and
+contains a version, details about the starting offset and size of the
+command ring, and head and tail pointers to be used by the kernel and
+userspace (respectively) to put commands on the ring, and indicate
+when the commands are completed.
+
+version - 1 (userspace should abort if otherwise)
+flags - none yet defined

[PATCH 0/4] Userspace Passthrough backstore for LIO

2014-09-15 Thread Andy Grover

Hi all,

Here is version three of the userspace passthrough backstore code I've
been working on. I think it's ready to be considered for upstream.

If you are wondering what this is, patch 3 is documentation that
should explain, at both a high- and low-level.

There are other pieces needed for this to work properly -- changes to
targetcli and rtslib to easily configure the new backstore type, plus
the user code that hooks up to the passthrough. I've written some code
to do this, called tcmu-runner
(https://github.com/agrover/tcmu-runner) but the userspace development
naturally is hindered until the kernel code is accepted :)

Changes since version 2:
* Incorporate doc improvements from Richard W. M. Jones
* Correct off-by-1 error in tcmu_get_blocks
* execute_rw must be set in the sbc_ops passed to sbc_parse_cdb

Thanks -- Andy

Andy Grover (4):
  target: Remove unneeded check in sbc_parse_cdb
  uio: Export definition of struct uio_device
  target: Add documentation on the target userspace pass-through driver
  target: Add a user-passthrough backstore

 Documentation/target/tcmu-design.txt   |  239 +++
 drivers/target/Kconfig |5 +
 drivers/target/Makefile|1 +
 drivers/target/target_core_sbc.c   |2 +-
 drivers/target/target_core_transport.c |4 +
 drivers/target/target_core_user.c  | 1169 
 drivers/uio/uio.c  |   12 -
 include/linux/uio_driver.h |   12 +-
 include/uapi/linux/Kbuild  |1 +
 include/uapi/linux/target_core_user.h  |  142 
 10 files changed, 1573 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/target/tcmu-design.txt
 create mode 100644 drivers/target/target_core_user.c
 create mode 100644 include/uapi/linux/target_core_user.h

-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver

2014-09-02 Thread Andy Grover


On 08/31/2014 02:22 PM, Richard W.M. Jones wrote:

Reading this several times, I now think I get what it's trying to say,
but I think it needs to introduces the terms (as the Economist style
does).  Something like this:

   "TCM is the new name for LIO, an in-kernel iSCSI target (server).
   Existing TCM targets run in the kernel.  TCMU (TCM in Userspace)
   allows userspace programs to be written which act as iSCSI targets.
   This document describes the design.

   The existing kernel provides modules for different SCSI transport
   protocols.  TCM also modularizes the data storage.  There are
   existing modules for file, block device, RAM or using another SCSI
   device as storage.  These are called "backstores" or "storage
   engines".  These built-in modules are implemented entirely as kernel
   code."

And hopefully having defined a bit of background, the rest of the
document just flows nicely:


Thanks much! I've put this in the doc, and will hopefully send out a 
final patchset for inclusion with the new text in the next week or so.




The only change I made was "another" instead of "the new name" -- 
because to be honest I don't know if there was actually an attempt at a 
name change, and if so if it was successful or not :) LIO seems to have 
stuck, but TCM seems to refer just to the "backend" part of LIO.




Thanks again -- Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver

2014-09-02 Thread Andy Grover


On 08/31/2014 02:22 PM, Richard W.M. Jones wrote:

Reading this several times, I now think I get what it's trying to say,
but I think it needs to introduces the terms (as the Economist style
does).  Something like this:

   TCM is the new name for LIO, an in-kernel iSCSI target (server).
   Existing TCM targets run in the kernel.  TCMU (TCM in Userspace)
   allows userspace programs to be written which act as iSCSI targets.
   This document describes the design.

   The existing kernel provides modules for different SCSI transport
   protocols.  TCM also modularizes the data storage.  There are
   existing modules for file, block device, RAM or using another SCSI
   device as storage.  These are called backstores or storage
   engines.  These built-in modules are implemented entirely as kernel
   code.

And hopefully having defined a bit of background, the rest of the
document just flows nicely:


Thanks much! I've put this in the doc, and will hopefully send out a 
final patchset for inclusion with the new text in the next week or so.


begin naming potential-bikeshed wasteoftime

The only change I made was another instead of the new name -- 
because to be honest I don't know if there was actually an attempt at a 
name change, and if so if it was successful or not :) LIO seems to have 
stuck, but TCM seems to refer just to the backend part of LIO.


end bikeshed

Thanks again -- Andy

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver

2014-08-31 Thread Andy Grover


On 08/30/2014 10:35 AM, Richard W.M. Jones wrote:

On Tue, Jul 01, 2014 at 12:11:14PM -0700, Andy Grover wrote:

Describes the driver and its interface to make it possible for user
programs to back a LIO-exported LUN.

Signed-off-by: Andy Grover 
---
  Documentation/target/tcmu-design.txt | 210 +++
  1 file changed, 210 insertions(+)
  create mode 100644 Documentation/target/tcmu-design.txt

diff --git a/Documentation/target/tcmu-design.txt 
b/Documentation/target/tcmu-design.txt
new file mode 100644
index 000..200ff3e
--- /dev/null
+++ b/Documentation/target/tcmu-design.txt
@@ -0,0 +1,210 @@
+TCM Userspace Design
+
+
+
+Background:
+
+In addition to modularizing the transport protocol used for carrying
+SCSI commands ("fabrics"), the Linux kernel target, LIO, also modularizes
+the actual data storage as well. These are referred to as "backstores"
+or "storage engines". The target comes with backstores that allow a
+file, a block device, RAM, or another SCSI device to be used for the
+local storage needed for the exported SCSI LUN. Like the rest of LIO,
+these are implemented entirely as kernel code.
+
+These backstores cover the most common use cases, but not all. One new
+use case that other non-kernel target solutions, such as tgt, are able
+to support is using Gluster's GLFS or Ceph's RBD as a backstore. The
+target then serves as a translator, allowing initiators to store data
+in these non-traditional networked storage systems, while still only
+using standard protocols themselves.
+
+If the target is a userspace process, supporting these is easy. tgt,
+for example, needs only a small adapter module for each, because the
+modules just use the available userspace libraries for RBD and GLFS.
+
+Adding support for these backstores in LIO is considerably more
+difficult, because LIO is entirely kernel code. Instead of undertaking
+the significant work to port the GLFS or RBD APIs and protocols to the
+kernel, another approach is to create a userspace pass-through
+backstore for LIO, "TCMU".


It has to be said that this documentation is terrible.

Jumping "in medias res"[1] is great for fiction, awful for technical
documentation.

I would recommend the Economist Style Guide[2].  They always say
"Barak Obama, President of the United States" the first time he is
mentioned in an article, even though almost everyone knows who Barak
Obama is.

In this case you're leaping into something .. fabrics, LIO,
backstores, target solutions, ... aargh.  Explain what you mean by
each term and how it all fits together.


Thanks for the feedback. I am undoubtedly too close to the details, 
because I thought I *was* explaining things :)


This doc is for people like you -- tech-savvy but unfamiliar with this 
specific area. Would you be so kind as to point out exactly the terms 
this document should explain? Should it explain SCSI and SCSI commands? 
What a SCSI target is? Say "target implementations" rather than "target 
solutions"? Do I need some ASCII art?


Or, if in finding these gaps you've actually picked up the jargon and 
wanted to just take a pass a rewriting it, that would be fine too :) 
I've read some of your libguestfs docs and they were very understandable.


Regards -- Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver

2014-08-31 Thread Andy Grover


On 08/30/2014 10:35 AM, Richard W.M. Jones wrote:

On Tue, Jul 01, 2014 at 12:11:14PM -0700, Andy Grover wrote:

Describes the driver and its interface to make it possible for user
programs to back a LIO-exported LUN.

Signed-off-by: Andy Grover agro...@redhat.com
---
  Documentation/target/tcmu-design.txt | 210 +++
  1 file changed, 210 insertions(+)
  create mode 100644 Documentation/target/tcmu-design.txt

diff --git a/Documentation/target/tcmu-design.txt 
b/Documentation/target/tcmu-design.txt
new file mode 100644
index 000..200ff3e
--- /dev/null
+++ b/Documentation/target/tcmu-design.txt
@@ -0,0 +1,210 @@
+TCM Userspace Design
+
+
+
+Background:
+
+In addition to modularizing the transport protocol used for carrying
+SCSI commands (fabrics), the Linux kernel target, LIO, also modularizes
+the actual data storage as well. These are referred to as backstores
+or storage engines. The target comes with backstores that allow a
+file, a block device, RAM, or another SCSI device to be used for the
+local storage needed for the exported SCSI LUN. Like the rest of LIO,
+these are implemented entirely as kernel code.
+
+These backstores cover the most common use cases, but not all. One new
+use case that other non-kernel target solutions, such as tgt, are able
+to support is using Gluster's GLFS or Ceph's RBD as a backstore. The
+target then serves as a translator, allowing initiators to store data
+in these non-traditional networked storage systems, while still only
+using standard protocols themselves.
+
+If the target is a userspace process, supporting these is easy. tgt,
+for example, needs only a small adapter module for each, because the
+modules just use the available userspace libraries for RBD and GLFS.
+
+Adding support for these backstores in LIO is considerably more
+difficult, because LIO is entirely kernel code. Instead of undertaking
+the significant work to port the GLFS or RBD APIs and protocols to the
+kernel, another approach is to create a userspace pass-through
+backstore for LIO, TCMU.


It has to be said that this documentation is terrible.

Jumping in medias res[1] is great for fiction, awful for technical
documentation.

I would recommend the Economist Style Guide[2].  They always say
Barak Obama, President of the United States the first time he is
mentioned in an article, even though almost everyone knows who Barak
Obama is.

In this case you're leaping into something .. fabrics, LIO,
backstores, target solutions, ... aargh.  Explain what you mean by
each term and how it all fits together.


Thanks for the feedback. I am undoubtedly too close to the details, 
because I thought I *was* explaining things :)


This doc is for people like you -- tech-savvy but unfamiliar with this 
specific area. Would you be so kind as to point out exactly the terms 
this document should explain? Should it explain SCSI and SCSI commands? 
What a SCSI target is? Say target implementations rather than target 
solutions? Do I need some ASCII art?


Or, if in finding these gaps you've actually picked up the jargon and 
wanted to just take a pass a rewriting it, that would be fine too :) 
I've read some of your libguestfs docs and they were very understandable.


Regards -- Andy

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFCv2 3/4] target: Add a user-passthrough backstore

2014-08-20 Thread Andy Grover

Add a LIO storage engine that presents commands to userspace for execution.
This would allow more complex backstores to be implemented out-of-kernel,
and also make experimentation a-la FUSE (but at the SCSI level -- "SUSE"?)
possible.

It uses a mmap()able UIO device per LUN to share a command ring and data
area. The commands are raw SCSI CDBs and iovs for in/out data. The command
ring is also reused for scsi command status and sense data, if present.

This implementation is based on Shaohua Li's earlier version but heavily
modified. Differences include:

* Shared memory allocated by kernel, not locked-down user pages
* Single ring for command request and response
* Offsets instead of embedded pointers
* Generic SCSI CDB passthrough instead of per-cmd specialization in ring
  format.
* Uses UIO device instead of anon_file passed in mailbox.
* Optional in-kernel handling of some commands.

The main reason for these differences is to permit greater resiliency
if the user process dies or hangs.

Things not yet implemented (on purpose):

* Zero copy. The data area is flexible enough to allow page flipping or
  backend-allocated pages to be used by fabrics, but it's not clear these
  are performance wins. Can come later.
* Out-of-order command completion by userspace. Possible to add by just
  allowing userspace to change cmd_id in rsp cmd entries, but currently
  not supported.
* No locks between kernel cmd submission and completion routines. Sounds
  like it's possible, but this can come later.
* Sparse allocation of mmaped area. Current code vmallocs the whole thing.
  If the mapped area was larger and not fully mapped then the driver would
  have more freedom to change cmd and data area sizes based on demand.

Current code open issues:

* The use of idrs may be overkill -- we maybe can replace them with a
  simple counter to generate cmd_ids, and a hash table to get a cmd_id's
  associated pointer.
* Use of a free-running counter for cmd ring instead of explicit modulo
  math. This would require power-of-2 cmd ring size.
* Random printks in code, still.

Signed-off-by: Andy Grover 
---
 drivers/target/Kconfig |5 +
 drivers/target/Makefile|1 +
 drivers/target/target_core_transport.c |4 +
 drivers/target/target_core_user.c  | 1158 
 include/uapi/linux/Kbuild  |1 +
 include/uapi/linux/target_core_user.h  |  142 
 6 files changed, 1311 insertions(+)
 create mode 100644 drivers/target/target_core_user.c
 create mode 100644 include/uapi/linux/target_core_user.h

diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig
index dc2d84a..b03a845 100644
--- a/drivers/target/Kconfig
+++ b/drivers/target/Kconfig
@@ -31,6 +31,11 @@ config TCM_PSCSI
Say Y here to enable the TCM/pSCSI subsystem plugin for non-buffered
passthrough access to Linux/SCSI device
 
+config TCM_USER
+   tristate "TCM/USER Subsystem Plugin for Linux"
+   help
+   Say Y here to enable the TCM/USER subsystem plugin
+
 source "drivers/target/loopback/Kconfig"
 source "drivers/target/tcm_fc/Kconfig"
 source "drivers/target/iscsi/Kconfig"
diff --git a/drivers/target/Makefile b/drivers/target/Makefile
index 85b012d..bbb4a7d 100644
--- a/drivers/target/Makefile
+++ b/drivers/target/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_TARGET_CORE) += target_core_mod.o
 obj-$(CONFIG_TCM_IBLOCK)   += target_core_iblock.o
 obj-$(CONFIG_TCM_FILEIO)   += target_core_file.o
 obj-$(CONFIG_TCM_PSCSI)+= target_core_pscsi.o
+obj-$(CONFIG_TCM_USER) += target_core_user.o
 
 # Fabric modules
 obj-$(CONFIG_LOOPBACK_TARGET)  += loopback/
diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 7fa62fc..f018a8c 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -232,6 +232,10 @@ void transport_subsystem_check_init(void)
if (ret != 0)
pr_err("Unable to load target_core_pscsi\n");
 
+   ret = request_module("target_core_user");
+   if (ret != 0)
+   pr_err("Unable to load target_core_user\n");
+
sub_api_initialized = 1;
 }
 
diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
new file mode 100644
index 000..d959e04
--- /dev/null
+++ b/drivers/target/target_core_user.c
@@ -0,0 +1,1158 @@
+/*
+ * Copyright (C) 2013 Shaohua Li 
+ * Copyright (C) 2014 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the G

[RFCv2 4/4] target: Add documentation on the target userspace pass-through driver

2014-08-20 Thread Andy Grover

Describes the driver and its interface to make it possible for user
programs to back a LIO-exported LUN.

Signed-off-by: Andy Grover 
---
 Documentation/target/tcmu-design.txt | 229 +++
 1 file changed, 229 insertions(+)
 create mode 100644 Documentation/target/tcmu-design.txt

diff --git a/Documentation/target/tcmu-design.txt 
b/Documentation/target/tcmu-design.txt
new file mode 100644
index 000..1012fe4
--- /dev/null
+++ b/Documentation/target/tcmu-design.txt
@@ -0,0 +1,229 @@
+TCM Userspace Design
+
+
+
+Background:
+
+In addition to modularizing the transport protocol used for carrying
+SCSI commands ("fabrics"), the Linux kernel target, LIO, also modularizes
+the actual data storage as well. These are referred to as "backstores"
+or "storage engines". The target comes with backstores that allow a
+file, a block device, RAM, or another SCSI device to be used for the
+local storage needed for the exported SCSI LUN. Like the rest of LIO,
+these are implemented entirely as kernel code.
+
+These backstores cover the most common use cases, but not all. One new
+use case that other non-kernel target solutions, such as tgt, are able
+to support is using Gluster's GLFS or Ceph's RBD as a backstore. The
+target then serves as a translator, allowing initiators to store data
+in these non-traditional networked storage systems, while still only
+using standard protocols themselves.
+
+If the target is a userspace process, supporting these is easy. tgt,
+for example, needs only a small adapter module for each, because the
+modules just use the available userspace libraries for RBD and GLFS.
+
+Adding support for these backstores in LIO is considerably more
+difficult, because LIO is entirely kernel code. Instead of undertaking
+the significant work to port the GLFS or RBD APIs and protocols to the
+kernel, another approach is to create a userspace pass-through
+backstore for LIO, "TCMU".
+
+
+Benefits:
+
+In addition to allowing relatively easy support for RBD and GLFS, TCMU
+will also allow easier development of new backstores. TCMU combines
+with the LIO loopback fabric to become something similar to FUSE
+(Filesystem in Userspace), but at the SCSI layer instead of the
+filesystem layer. A SUSE, if you will.
+
+The disadvantage is there are more distinct components to configure, and
+potentially to malfunction. This is unavoidable, but hopefully not
+fatal if we're careful to keep things as simple as possible.
+
+Design constraints:
+
+- Good performance: high throughput, low latency
+- Cleanly handle if userspace:
+   1) never attaches
+   2) hangs
+   3) dies
+   4) misbehaves
+- Allow future flexibility in user & kernel implementations
+- Be reasonably memory-efficient
+- Simple to configure & run
+- Simple to write a userspace backend
+
+
+Implementation overview:
+
+The core of the TCMU interface is a memory region that is shared
+between kernel and userspace. Within this region is: a control area
+(mailbox); a lockless producer/consumer circular buffer for commands
+to be passed up, and status returned; and an in/out data buffer area.
+
+TCMU uses the pre-existing UIO subsystem. UIO allows device driver
+development in userspace, and this is conceptually very close to the
+TCMU use case, except instead of a physical device, TCMU implements a
+memory-mapped layout designed for SCSI commands. Using UIO also
+benefits TCMU by handling device introspection (e.g. a way for
+userspace to determine how large the shared region is) and signaling
+mechanisms in both directions.
+
+There are no embedded pointers in the memory region. Everything is
+expressed as an offset from the region's starting address. This allows
+the ring to still work if the user process dies and is restarted with
+the region mapped at a different virtual address.
+
+See target_core_user.h for the struct definitions.
+
+The Mailbox:
+
+The mailbox is always at the start of the shared memory region, and
+contains a version, details about the starting offset and size of the
+command ring, and head and tail pointers to be used by the kernel and
+userspace (respectively) to put commands on the ring, and indicate
+when the commands are completed.
+
+version - 1 (userspace should abort if otherwise)
+flags - none yet defined.
+cmdr_off - The offset of the start of the command ring from the start
+of the memory region, to account for the mailbox size.
+cmdr_size - The size of the command ring. This does *not* need to be a
+power of two.
+cmd_head - Modified by the kernel to indicate when a command has been
+placed on the ring.
+cmd_tail - Modified by userspace to indicate when it has completed
+processing of a command.
+
+The Command Ring:
+
+Commands are placed on the ring by the kernel incrementing
+mailbox.cmd_head by the size of the command, modulo cmdr_size, and
+then signaling userspace via uio_event_notify(). Once the command is
+completed

[RFCv2 0/4] Userspace pass-through storage engine (backend)

2014-08-20 Thread Andy Grover

Hi all,

Here is version two of the userspace passthrough backstore code I've
been working on.

You can find some documentation on the design in last patch. Please
let me know what you think about the design and the code.

Thanks -- Andy

Changes since version 1:
* Use netlink for reporting device add/removes to userspace
* Remove pass_level in favor of automatic emulation of commands when
  userspace doesn't support them
* Implemented 'tcmu-runner' daemon to handle the user side of the
  interface and make writing handlers easier (not in this patchset,
  see https://github.com/agrover/tcmu-runner).
* Many bug fixes. Can now successfully partition, format, and use a
  TCMU-backed volume.
* Two small patches added for other changes we need.

Operation:
* Using https://github.com/agrover/targetcli-fb/commits/userback and
  https://github.com/agrover/rtslib-fb/commits/userback, create a user
  storage object and give it a size, and 'file/foo.img' as config
  parameter
* Start tcmu-runner (manually for now)
* Create a loopback fabric and link the above storage object to it

Andy Grover (4):
  target: Remove unneeded check in sbc_parse_cdb
  uio: Export definition of struct uio_device
  target: Add a user-passthrough backstore
  target: Add documentation on the target userspace pass-through driver

 Documentation/target/tcmu-design.txt   |  229 +++
 drivers/target/Kconfig |5 +
 drivers/target/Makefile|1 +
 drivers/target/target_core_sbc.c   |2 +-
 drivers/target/target_core_transport.c |4 +
 drivers/target/target_core_user.c  | 1158 
 drivers/uio/uio.c  |   12 -
 include/linux/uio_driver.h |   12 +-
 include/uapi/linux/Kbuild  |1 +
 include/uapi/linux/target_core_user.h  |  142 
 10 files changed, 1552 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/target/tcmu-design.txt
 create mode 100644 drivers/target/target_core_user.c
 create mode 100644 include/uapi/linux/target_core_user.h

-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFCv2 1/4] target: Remove unneeded check in sbc_parse_cdb

2014-08-20 Thread Andy Grover

The check of SCF_SCSI_DATA_CDB seems to be a remnant from before hch's
refactoring of this function. There are no places where that flag is set
that cmd->execute_cmd isn't also set.

Signed-off-by: Andy Grover 
---
 drivers/target/target_core_sbc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
index bd78d92..ebe62af 100644
--- a/drivers/target/target_core_sbc.c
+++ b/drivers/target/target_core_sbc.c
@@ -948,7 +948,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops)
}
 
/* reject any command that we don't have a handler for */
-   if (!(cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) && !cmd->execute_cmd)
+   if (!cmd->execute_cmd)
return TCM_UNSUPPORTED_SCSI_OPCODE;
 
if (cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) {
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFCv2 2/4] uio: Export definition of struct uio_device

2014-08-20 Thread Andy Grover

In order to prevent a O(n) search of the filesystem to link up its uio
node with its target configuration, TCMU needs to know the minor number
that UIO assigned. Expose the definition of this struct so TCMU can
access this field.

Signed-off-by: Andy Grover 
---
 drivers/uio/uio.c  | 12 
 include/linux/uio_driver.h | 12 +++-
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index a673e5b..60fa627 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -28,18 +28,6 @@
 
 #define UIO_MAX_DEVICES(1U << MINORBITS)
 
-struct uio_device {
-   struct module   *owner;
-   struct device   *dev;
-   int minor;
-   atomic_tevent;
-   struct fasync_struct*async_queue;
-   wait_queue_head_t   wait;
-   struct uio_info *info;
-   struct kobject  *map_dir;
-   struct kobject  *portio_dir;
-};
-
 static int uio_major;
 static struct cdev *uio_cdev;
 static DEFINE_IDR(uio_idr);
diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h
index 1ad4724..baa8171 100644
--- a/include/linux/uio_driver.h
+++ b/include/linux/uio_driver.h
@@ -63,7 +63,17 @@ struct uio_port {
 
 #define MAX_UIO_PORT_REGIONS   5
 
-struct uio_device;
+struct uio_device {
+struct module   *owner;
+struct device   *dev;
+int minor;
+atomic_tevent;
+struct fasync_struct*async_queue;
+wait_queue_head_t   wait;
+struct uio_info *info;
+struct kobject  *map_dir;
+struct kobject  *portio_dir;
+};
 
 /**
  * struct uio_info - UIO device capabilities
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFCv2 1/4] target: Remove unneeded check in sbc_parse_cdb

2014-08-20 Thread Andy Grover

The check of SCF_SCSI_DATA_CDB seems to be a remnant from before hch's
refactoring of this function. There are no places where that flag is set
that cmd-execute_cmd isn't also set.

Signed-off-by: Andy Grover agro...@redhat.com
---
 drivers/target/target_core_sbc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
index bd78d92..ebe62af 100644
--- a/drivers/target/target_core_sbc.c
+++ b/drivers/target/target_core_sbc.c
@@ -948,7 +948,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops)
}
 
/* reject any command that we don't have a handler for */
-   if (!(cmd-se_cmd_flags  SCF_SCSI_DATA_CDB)  !cmd-execute_cmd)
+   if (!cmd-execute_cmd)
return TCM_UNSUPPORTED_SCSI_OPCODE;
 
if (cmd-se_cmd_flags  SCF_SCSI_DATA_CDB) {
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFCv2 0/4] Userspace pass-through storage engine (backend)

2014-08-20 Thread Andy Grover

Hi all,

Here is version two of the userspace passthrough backstore code I've
been working on.

You can find some documentation on the design in last patch. Please
let me know what you think about the design and the code.

Thanks -- Andy

Changes since version 1:
* Use netlink for reporting device add/removes to userspace
* Remove pass_level in favor of automatic emulation of commands when
  userspace doesn't support them
* Implemented 'tcmu-runner' daemon to handle the user side of the
  interface and make writing handlers easier (not in this patchset,
  see https://github.com/agrover/tcmu-runner).
* Many bug fixes. Can now successfully partition, format, and use a
  TCMU-backed volume.
* Two small patches added for other changes we need.

Operation:
* Using https://github.com/agrover/targetcli-fb/commits/userback and
  https://github.com/agrover/rtslib-fb/commits/userback, create a user
  storage object and give it a size, and 'file/foo.img' as config
  parameter
* Start tcmu-runner (manually for now)
* Create a loopback fabric and link the above storage object to it

Andy Grover (4):
  target: Remove unneeded check in sbc_parse_cdb
  uio: Export definition of struct uio_device
  target: Add a user-passthrough backstore
  target: Add documentation on the target userspace pass-through driver

 Documentation/target/tcmu-design.txt   |  229 +++
 drivers/target/Kconfig |5 +
 drivers/target/Makefile|1 +
 drivers/target/target_core_sbc.c   |2 +-
 drivers/target/target_core_transport.c |4 +
 drivers/target/target_core_user.c  | 1158 
 drivers/uio/uio.c  |   12 -
 include/linux/uio_driver.h |   12 +-
 include/uapi/linux/Kbuild  |1 +
 include/uapi/linux/target_core_user.h  |  142 
 10 files changed, 1552 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/target/tcmu-design.txt
 create mode 100644 drivers/target/target_core_user.c
 create mode 100644 include/uapi/linux/target_core_user.h

-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFCv2 2/4] uio: Export definition of struct uio_device

2014-08-20 Thread Andy Grover

In order to prevent a O(n) search of the filesystem to link up its uio
node with its target configuration, TCMU needs to know the minor number
that UIO assigned. Expose the definition of this struct so TCMU can
access this field.

Signed-off-by: Andy Grover agro...@redhat.com
---
 drivers/uio/uio.c  | 12 
 include/linux/uio_driver.h | 12 +++-
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index a673e5b..60fa627 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -28,18 +28,6 @@
 
 #define UIO_MAX_DEVICES(1U  MINORBITS)
 
-struct uio_device {
-   struct module   *owner;
-   struct device   *dev;
-   int minor;
-   atomic_tevent;
-   struct fasync_struct*async_queue;
-   wait_queue_head_t   wait;
-   struct uio_info *info;
-   struct kobject  *map_dir;
-   struct kobject  *portio_dir;
-};
-
 static int uio_major;
 static struct cdev *uio_cdev;
 static DEFINE_IDR(uio_idr);
diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h
index 1ad4724..baa8171 100644
--- a/include/linux/uio_driver.h
+++ b/include/linux/uio_driver.h
@@ -63,7 +63,17 @@ struct uio_port {
 
 #define MAX_UIO_PORT_REGIONS   5
 
-struct uio_device;
+struct uio_device {
+struct module   *owner;
+struct device   *dev;
+int minor;
+atomic_tevent;
+struct fasync_struct*async_queue;
+wait_queue_head_t   wait;
+struct uio_info *info;
+struct kobject  *map_dir;
+struct kobject  *portio_dir;
+};
 
 /**
  * struct uio_info - UIO device capabilities
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFCv2 4/4] target: Add documentation on the target userspace pass-through driver

2014-08-20 Thread Andy Grover

Describes the driver and its interface to make it possible for user
programs to back a LIO-exported LUN.

Signed-off-by: Andy Grover agro...@redhat.com
---
 Documentation/target/tcmu-design.txt | 229 +++
 1 file changed, 229 insertions(+)
 create mode 100644 Documentation/target/tcmu-design.txt

diff --git a/Documentation/target/tcmu-design.txt 
b/Documentation/target/tcmu-design.txt
new file mode 100644
index 000..1012fe4
--- /dev/null
+++ b/Documentation/target/tcmu-design.txt
@@ -0,0 +1,229 @@
+TCM Userspace Design
+
+
+
+Background:
+
+In addition to modularizing the transport protocol used for carrying
+SCSI commands (fabrics), the Linux kernel target, LIO, also modularizes
+the actual data storage as well. These are referred to as backstores
+or storage engines. The target comes with backstores that allow a
+file, a block device, RAM, or another SCSI device to be used for the
+local storage needed for the exported SCSI LUN. Like the rest of LIO,
+these are implemented entirely as kernel code.
+
+These backstores cover the most common use cases, but not all. One new
+use case that other non-kernel target solutions, such as tgt, are able
+to support is using Gluster's GLFS or Ceph's RBD as a backstore. The
+target then serves as a translator, allowing initiators to store data
+in these non-traditional networked storage systems, while still only
+using standard protocols themselves.
+
+If the target is a userspace process, supporting these is easy. tgt,
+for example, needs only a small adapter module for each, because the
+modules just use the available userspace libraries for RBD and GLFS.
+
+Adding support for these backstores in LIO is considerably more
+difficult, because LIO is entirely kernel code. Instead of undertaking
+the significant work to port the GLFS or RBD APIs and protocols to the
+kernel, another approach is to create a userspace pass-through
+backstore for LIO, TCMU.
+
+
+Benefits:
+
+In addition to allowing relatively easy support for RBD and GLFS, TCMU
+will also allow easier development of new backstores. TCMU combines
+with the LIO loopback fabric to become something similar to FUSE
+(Filesystem in Userspace), but at the SCSI layer instead of the
+filesystem layer. A SUSE, if you will.
+
+The disadvantage is there are more distinct components to configure, and
+potentially to malfunction. This is unavoidable, but hopefully not
+fatal if we're careful to keep things as simple as possible.
+
+Design constraints:
+
+- Good performance: high throughput, low latency
+- Cleanly handle if userspace:
+   1) never attaches
+   2) hangs
+   3) dies
+   4) misbehaves
+- Allow future flexibility in user  kernel implementations
+- Be reasonably memory-efficient
+- Simple to configure  run
+- Simple to write a userspace backend
+
+
+Implementation overview:
+
+The core of the TCMU interface is a memory region that is shared
+between kernel and userspace. Within this region is: a control area
+(mailbox); a lockless producer/consumer circular buffer for commands
+to be passed up, and status returned; and an in/out data buffer area.
+
+TCMU uses the pre-existing UIO subsystem. UIO allows device driver
+development in userspace, and this is conceptually very close to the
+TCMU use case, except instead of a physical device, TCMU implements a
+memory-mapped layout designed for SCSI commands. Using UIO also
+benefits TCMU by handling device introspection (e.g. a way for
+userspace to determine how large the shared region is) and signaling
+mechanisms in both directions.
+
+There are no embedded pointers in the memory region. Everything is
+expressed as an offset from the region's starting address. This allows
+the ring to still work if the user process dies and is restarted with
+the region mapped at a different virtual address.
+
+See target_core_user.h for the struct definitions.
+
+The Mailbox:
+
+The mailbox is always at the start of the shared memory region, and
+contains a version, details about the starting offset and size of the
+command ring, and head and tail pointers to be used by the kernel and
+userspace (respectively) to put commands on the ring, and indicate
+when the commands are completed.
+
+version - 1 (userspace should abort if otherwise)
+flags - none yet defined.
+cmdr_off - The offset of the start of the command ring from the start
+of the memory region, to account for the mailbox size.
+cmdr_size - The size of the command ring. This does *not* need to be a
+power of two.
+cmd_head - Modified by the kernel to indicate when a command has been
+placed on the ring.
+cmd_tail - Modified by userspace to indicate when it has completed
+processing of a command.
+
+The Command Ring:
+
+Commands are placed on the ring by the kernel incrementing
+mailbox.cmd_head by the size of the command, modulo cmdr_size, and
+then signaling userspace via uio_event_notify(). Once the command is
+completed, userspace updates mailbox.cmd_tail

[RFCv2 3/4] target: Add a user-passthrough backstore

2014-08-20 Thread Andy Grover

Add a LIO storage engine that presents commands to userspace for execution.
This would allow more complex backstores to be implemented out-of-kernel,
and also make experimentation a-la FUSE (but at the SCSI level -- SUSE?)
possible.

It uses a mmap()able UIO device per LUN to share a command ring and data
area. The commands are raw SCSI CDBs and iovs for in/out data. The command
ring is also reused for scsi command status and sense data, if present.

This implementation is based on Shaohua Li's earlier version but heavily
modified. Differences include:

* Shared memory allocated by kernel, not locked-down user pages
* Single ring for command request and response
* Offsets instead of embedded pointers
* Generic SCSI CDB passthrough instead of per-cmd specialization in ring
  format.
* Uses UIO device instead of anon_file passed in mailbox.
* Optional in-kernel handling of some commands.

The main reason for these differences is to permit greater resiliency
if the user process dies or hangs.

Things not yet implemented (on purpose):

* Zero copy. The data area is flexible enough to allow page flipping or
  backend-allocated pages to be used by fabrics, but it's not clear these
  are performance wins. Can come later.
* Out-of-order command completion by userspace. Possible to add by just
  allowing userspace to change cmd_id in rsp cmd entries, but currently
  not supported.
* No locks between kernel cmd submission and completion routines. Sounds
  like it's possible, but this can come later.
* Sparse allocation of mmaped area. Current code vmallocs the whole thing.
  If the mapped area was larger and not fully mapped then the driver would
  have more freedom to change cmd and data area sizes based on demand.

Current code open issues:

* The use of idrs may be overkill -- we maybe can replace them with a
  simple counter to generate cmd_ids, and a hash table to get a cmd_id's
  associated pointer.
* Use of a free-running counter for cmd ring instead of explicit modulo
  math. This would require power-of-2 cmd ring size.
* Random printks in code, still.

Signed-off-by: Andy Grover agro...@redhat.com
---
 drivers/target/Kconfig |5 +
 drivers/target/Makefile|1 +
 drivers/target/target_core_transport.c |4 +
 drivers/target/target_core_user.c  | 1158 
 include/uapi/linux/Kbuild  |1 +
 include/uapi/linux/target_core_user.h  |  142 
 6 files changed, 1311 insertions(+)
 create mode 100644 drivers/target/target_core_user.c
 create mode 100644 include/uapi/linux/target_core_user.h

diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig
index dc2d84a..b03a845 100644
--- a/drivers/target/Kconfig
+++ b/drivers/target/Kconfig
@@ -31,6 +31,11 @@ config TCM_PSCSI
Say Y here to enable the TCM/pSCSI subsystem plugin for non-buffered
passthrough access to Linux/SCSI device
 
+config TCM_USER
+   tristate TCM/USER Subsystem Plugin for Linux
+   help
+   Say Y here to enable the TCM/USER subsystem plugin
+
 source drivers/target/loopback/Kconfig
 source drivers/target/tcm_fc/Kconfig
 source drivers/target/iscsi/Kconfig
diff --git a/drivers/target/Makefile b/drivers/target/Makefile
index 85b012d..bbb4a7d 100644
--- a/drivers/target/Makefile
+++ b/drivers/target/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_TARGET_CORE) += target_core_mod.o
 obj-$(CONFIG_TCM_IBLOCK)   += target_core_iblock.o
 obj-$(CONFIG_TCM_FILEIO)   += target_core_file.o
 obj-$(CONFIG_TCM_PSCSI)+= target_core_pscsi.o
+obj-$(CONFIG_TCM_USER) += target_core_user.o
 
 # Fabric modules
 obj-$(CONFIG_LOOPBACK_TARGET)  += loopback/
diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 7fa62fc..f018a8c 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -232,6 +232,10 @@ void transport_subsystem_check_init(void)
if (ret != 0)
pr_err(Unable to load target_core_pscsi\n);
 
+   ret = request_module(target_core_user);
+   if (ret != 0)
+   pr_err(Unable to load target_core_user\n);
+
sub_api_initialized = 1;
 }
 
diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
new file mode 100644
index 000..d959e04
--- /dev/null
+++ b/drivers/target/target_core_user.c
@@ -0,0 +1,1158 @@
+/*
+ * Copyright (C) 2013 Shaohua Li s...@kernel.org
+ * Copyright (C) 2014 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You

Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver

2014-07-08 Thread Andy Grover


[re-adding individual CCs that were dropped]

On 07/05/2014 04:29 AM, Alex Elsayed wrote:

+Device Discovery:
+
+Other devices may be using UIO besides TCMU. Unrelated user processes
+may also be handling different sets of TCMU devices. TCMU userspace
+processes must find their devices by scanning sysfs
+class/uio/uio*/name. For TCMU devices, these names will be of the
+format:
+
+tcm-user//
+
+where "tcm-user" is common for all TCMU-backed UIO devices. 
+will be a userspace-process-unique string to identify the TCMU device
+as expecting to be backed by a certain handler, and  will be an
+additional handler-specific string for the user process to configure
+the device, if needed. Neither  or  can contain ':',
+due to LIO limitations.


It might be good to change this somewhat; in the vast majority of cases it'd
be saner for userspace programs to figure this information out via udev etc.
rather than parsing sysfs themselves. This information is still worth
documenting, but saying things like "must find their devices by scanning
sysfs" is likely to lead to users of this interface making suboptimal
choices.


I agree. There's no getting around a certain degree of work required by 
the backing user program. I'm planning on writing a "tcmu-runner" 
program with a plugin interface, that will handle the event loop, device 
notifications, enumeration, and possibly thread pools, to minimize the 
amount of boilerplate code each implementation must contain.



+Device Events:
+
+If a new device is added or removed, user processes will recieve a HUP
+signal, and should re-scan sysfs. File descriptors for devices no
+longer in sysfs should be closed, and new devices should be opened and
+handled.


Is there a cleaner way to do this? In particular, re-scanning sysfs may
cause race conditions (device removed, one of the same name re-added but a
different UIO device node; probably more to be found). Perhaps recommend
netlink uevents, so that remove+add is noticeable? Also, is the SIGHUP
itself the best option? Could we simply require the user process to listen
for add/remove uevents to get such change notifications, and thus enforce
good behavior?


Yes this sounds better, let's do it this way.


One use case I'm actually interested in is having userspace provide
something other than just SPC - for instance, tgt can provide a virtual tape
library or an OSD, and CDemu can provide emulated optical discs from various
image formats.

Currently, CDemu uses its own out-of-tree driver called VHBA (Virtual Host
Bus Adapter) to do pretty much exactly what TCMU+Loopback would
accomplish... and in the process misses out on all of the other fabrics,
unless you're willing to _re-import_ those devices using PSCSI, which has
its own quirks.

Perhaps there could be a level 0 (or 4, or whatever) which means "explicitly
enabled list of commands" - maybe as a bitmap that could be passed to the
kernel somehow? Hopefully, that could also avoid some of the quirks of PSCSI
regarding ALUA and such - if it's not implemented, leave the relevant bits
at zero, and LIO handles it.


I'm beginning to sour on pass_level and having configurable cmd 
filtering in the kernel interface.


I think a less-clever but simpler approach might be to eliminate 
filtering, and the user process can return CHECK_CONDITION, INVALID 
COMMAND OPERATION CODE for commands it doesn't wish to support. TCMU 
checks for this, and the pending command thus returned can still be 
emulated by LIO (it looks like we could just re-call sbc_parse_cdb and 
target_execute_cmd).



This does look really nice, thanks for writing it!


Thanks for your helpful feedback! :)

-- Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver

2014-07-08 Thread Andy Grover


[re-adding individual CCs that were dropped]

On 07/05/2014 04:29 AM, Alex Elsayed wrote:

+Device Discovery:
+
+Other devices may be using UIO besides TCMU. Unrelated user processes
+may also be handling different sets of TCMU devices. TCMU userspace
+processes must find their devices by scanning sysfs
+class/uio/uio*/name. For TCMU devices, these names will be of the
+format:
+
+tcm-user/subtype/path
+
+where tcm-user is common for all TCMU-backed UIO devices. subtype
+will be a userspace-process-unique string to identify the TCMU device
+as expecting to be backed by a certain handler, and path will be an
+additional handler-specific string for the user process to configure
+the device, if needed. Neither subtype or path can contain ':',
+due to LIO limitations.


It might be good to change this somewhat; in the vast majority of cases it'd
be saner for userspace programs to figure this information out via udev etc.
rather than parsing sysfs themselves. This information is still worth
documenting, but saying things like must find their devices by scanning
sysfs is likely to lead to users of this interface making suboptimal
choices.


I agree. There's no getting around a certain degree of work required by 
the backing user program. I'm planning on writing a tcmu-runner 
program with a plugin interface, that will handle the event loop, device 
notifications, enumeration, and possibly thread pools, to minimize the 
amount of boilerplate code each implementation must contain.



+Device Events:
+
+If a new device is added or removed, user processes will recieve a HUP
+signal, and should re-scan sysfs. File descriptors for devices no
+longer in sysfs should be closed, and new devices should be opened and
+handled.


Is there a cleaner way to do this? In particular, re-scanning sysfs may
cause race conditions (device removed, one of the same name re-added but a
different UIO device node; probably more to be found). Perhaps recommend
netlink uevents, so that remove+add is noticeable? Also, is the SIGHUP
itself the best option? Could we simply require the user process to listen
for add/remove uevents to get such change notifications, and thus enforce
good behavior?


Yes this sounds better, let's do it this way.


One use case I'm actually interested in is having userspace provide
something other than just SPC - for instance, tgt can provide a virtual tape
library or an OSD, and CDemu can provide emulated optical discs from various
image formats.

Currently, CDemu uses its own out-of-tree driver called VHBA (Virtual Host
Bus Adapter) to do pretty much exactly what TCMU+Loopback would
accomplish... and in the process misses out on all of the other fabrics,
unless you're willing to _re-import_ those devices using PSCSI, which has
its own quirks.

Perhaps there could be a level 0 (or 4, or whatever) which means explicitly
enabled list of commands - maybe as a bitmap that could be passed to the
kernel somehow? Hopefully, that could also avoid some of the quirks of PSCSI
regarding ALUA and such - if it's not implemented, leave the relevant bits
at zero, and LIO handles it.


I'm beginning to sour on pass_level and having configurable cmd 
filtering in the kernel interface.


I think a less-clever but simpler approach might be to eliminate 
filtering, and the user process can return CHECK_CONDITION, INVALID 
COMMAND OPERATION CODE for commands it doesn't wish to support. TCMU 
checks for this, and the pending command thus returned can still be 
emulated by LIO (it looks like we could just re-call sbc_parse_cdb and 
target_execute_cmd).



This does look really nice, thanks for writing it!


Thanks for your helpful feedback! :)

-- Andy

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC 0/2] target: userspace pass-through backend

2014-07-07 Thread Andy Grover

Hi, here's some additional beginning bits for the configuration and use 
of this code:


Changes to targetcli-fb and rtslib-fb:

https://github.com/agrover/targetcli-fb/tree/userback
https://github.com/agrover/rtslib-fb/tree/userback

An incredibly-crappy stand-in for the user process that will handle 
requests:


https://github.com/agrover/test-userback

I look forward to your feedback :)

Thanks -- Regards -- Andy

p.s. kernel changes are also available here:
https://github.com/agrover/linux/tree/userback
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC 0/2] target: userspace pass-through backend

2014-07-07 Thread Andy Grover

Hi, here's some additional beginning bits for the configuration and use 
of this code:


Changes to targetcli-fb and rtslib-fb:

https://github.com/agrover/targetcli-fb/tree/userback
https://github.com/agrover/rtslib-fb/tree/userback

An incredibly-crappy stand-in for the user process that will handle 
requests:


https://github.com/agrover/test-userback

I look forward to your feedback :)

Thanks -- Regards -- Andy

p.s. kernel changes are also available here:
https://github.com/agrover/linux/tree/userback
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC 2/2] target: Add a user-passthrough backstore

2014-07-01 Thread Andy Grover

Add a LIO storage engine that presents commands to userspace for execution.
This would allow more complex backstores to be implemented out-of-kernel,
and also make experimentation a-la FUSE (but at the SCSI level -- "SUSE"?)
possible.

It uses a mmap()able UIO device per LUN to share a command ring and data
area. The commands are raw SCSI CDBs and iovs for in/out data. The command
ring is also reused for scsi command status and sense data, if present.

This implementation is based on Shaohua Li's earlier version but heavily
modified. Differences include:

* Shared memory allocated by kernel, not locked-down user pages
* Single ring for command request and response
* Offsets instead of embedded pointers
* Generic SCSI CDB passthrough instead of per-cmd specialization in ring
  format.
* Uses UIO device instead of anon_file passed in mailbox.
* Optional in-kernel handling of some commands.

The main reason for these differences is to permit greater resiliency
if the user process dies or hangs.

Things not yet implemented (on purpose):

* Zero copy. The data area is flexible enough to allow page flipping or
  backend-allocated pages to be used by fabrics, but it's not clear these
  are performance wins. Can come later.
* Out-of-order command completion by userspace. Possible to add by just
  allowing userspace to change cmd_id in rsp cmd entries, but currently
  not supported.
* No locks between kernel cmd submission and completion routines. Sounds
  like it's possible, but this can come later.
* Sparse allocation of mmaped area. Current code vmallocs the whole thing.
  If the mapped area was larger and not fully mapped then the driver would
  have more freedom to change cmd and data area sizes based on demand.

Current code open issues:

* The use of idrs may be overkill -- we maybe can replace them with a
  simple counter to generate cmd_ids, and a hash table to get a cmd_id's
  associated pointer.
* Use of a free-running counter for cmd ring instead of explicit modulo
  math. This would require power-of-2 cmd ring size.
* Random printks in code, still.

Signed-off-by: Andy Grover 
---
 drivers/target/Kconfig |5 +
 drivers/target/Makefile|1 +
 drivers/target/target_core_transport.c |4 +
 drivers/target/target_core_user.c  | 1078 
 drivers/target/target_core_user.h  |  126 
 5 files changed, 1214 insertions(+)
 create mode 100644 drivers/target/target_core_user.c
 create mode 100644 drivers/target/target_core_user.h

diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig
index dc2d84a..b03a845 100644
--- a/drivers/target/Kconfig
+++ b/drivers/target/Kconfig
@@ -31,6 +31,11 @@ config TCM_PSCSI
Say Y here to enable the TCM/pSCSI subsystem plugin for non-buffered
passthrough access to Linux/SCSI device
 
+config TCM_USER
+   tristate "TCM/USER Subsystem Plugin for Linux"
+   help
+   Say Y here to enable the TCM/USER subsystem plugin
+
 source "drivers/target/loopback/Kconfig"
 source "drivers/target/tcm_fc/Kconfig"
 source "drivers/target/iscsi/Kconfig"
diff --git a/drivers/target/Makefile b/drivers/target/Makefile
index 85b012d..bbb4a7d 100644
--- a/drivers/target/Makefile
+++ b/drivers/target/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_TARGET_CORE) += target_core_mod.o
 obj-$(CONFIG_TCM_IBLOCK)   += target_core_iblock.o
 obj-$(CONFIG_TCM_FILEIO)   += target_core_file.o
 obj-$(CONFIG_TCM_PSCSI)+= target_core_pscsi.o
+obj-$(CONFIG_TCM_USER) += target_core_user.o
 
 # Fabric modules
 obj-$(CONFIG_LOOPBACK_TARGET)  += loopback/
diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 7fa62fc..f018a8c 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -232,6 +232,10 @@ void transport_subsystem_check_init(void)
if (ret != 0)
pr_err("Unable to load target_core_pscsi\n");
 
+   ret = request_module("target_core_user");
+   if (ret != 0)
+   pr_err("Unable to load target_core_user\n");
+
sub_api_initialized = 1;
 }
 
diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
new file mode 100644
index 000..544f3c5
--- /dev/null
+++ b/drivers/target/target_core_user.c
@@ -0,0 +1,1078 @@
+/*
+ * Copyright (C) 2013 Shaohua Li 
+ * Copyright (C) 2014 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.

1 2 >

1 - 100 of 131 matches

Mail list logo