Re: [LSF/MM TOPIC] Page Cache Flexibility for NVM

2019-02-21 Thread Adam Manzanares
On Thu, 2019-02-21 at 19:27 -0500, Jerome Glisse wrote:
> On Thu, Feb 21, 2019 at 11:11:51PM +0000, Adam Manzanares wrote:
> > Hello,
> > 
> > I would like to attend the LSF/MM Summit 2019. I'm interested in
> > several MM topics that are mentioned below as well as Zoned Block
> > Devices and any io determinism topics that come up in the storage
> > track. 
> > 
> > I have been working on a caching layer, hmmap (heterogeneous memory
> > map) [1], for emerging NVM and it is in spirit close to the page
> > cache. The key difference being that the backend device and caching
> > layer of hmmap is pluggable. In addition, hmmap supports DAX and
> > write
> > protection, which I believe are key features for emerging NVMs that
> > may
> > have write/read asymmetry as well as write endurance constraints.
> > Lastly we can leverage hardware, such as a DMA engine, when moving
> > pages between the cache while also allowing direct access if the
> > device
> > is capable.
> > 
> > I am proposing that as an alternative to using NVMs as a NUMA node
> > we expose the NVM through the page cache or a viable alternative
> > and
> > have userspace applications mmap the NVM and hand out memory with
> > their favorite userspace memory allocator.
> > 
> > This would isolate the NVMs to only applications that are well
> > aware
> > of the performance implications of accessing NVM. I believe that
> > all
> > of this work could be solved with the NUMA node approach, but the
> > two
> > approaches are seeming to blur together.
> > 
> > The main points I would like to discuss are:
> > 
> > * Is the page cache model a viable alternative to NVM as a NUMA
> > NODE?
> > * Can we add more flexibility to the page cache?
> > * Should we force separation of NVM through an explicit mmap?
> > 
> > I believe this discussion could be merged with NUMA, memory
> > hierarchy
> > and device memory, Use NVDIMM as NUMA node and NUMA API, or memory
> > reclaim with NUMA balancing.
> 
> What about cache coherency and atomic ? If device block are expose
> through PCIE then there is no cache coherency or atomic and thus
> direct mmap will not have the expected memory model which would
> break program expectation of a mmap.

For the PCIE cache coherency case I would envision that you would map
the memory as read only into the process address space. Once a write
occurs I would then remap the PCIE memory to a page in the proposed
caching mechanism.

I have to think more about what this means for atomic operations.

> 
> This is also one of the reasons i do not see a way forward with NUMA
> and device memory. It can depart from the usual memory too much to
> be drop in like that to unaware application.

I have similar concerns and am trying to segregate the device memory to
aware applications.

> 
> In any case yes this kind of memory falls into the device memory i
> wish to discuss during LSF/MM.
> 
> Cheers,
> Jérôme
> 


Re: [LSF/MM TOPIC] Page Cache Flexibility for NVM

2019-02-21 Thread Adam Manzanares
Forgot the link.

[1] https://github.com/westerndigitalcorporation/hmmap

Take care,
Adam


On Thu, 2019-02-21 at 15:11 -0800, Adam Manzanares wrote:
> Hello,
> 
> I would like to attend the LSF/MM Summit 2019. I'm interested in
> several MM topics that are mentioned below as well as Zoned Block
> Devices and any io determinism topics that come up in the storage
> track. 
> 
> I have been working on a caching layer, hmmap (heterogeneous memory
> map) [1], for emerging NVM and it is in spirit close to the page
> cache. The key difference being that the backend device and caching
> layer of hmmap is pluggable. In addition, hmmap supports DAX and
> write
> protection, which I believe are key features for emerging NVMs that
> may
> have write/read asymmetry as well as write endurance constraints.
> Lastly we can leverage hardware, such as a DMA engine, when moving
> pages between the cache while also allowing direct access if the
> device
> is capable.
> 
> I am proposing that as an alternative to using NVMs as a NUMA node
> we expose the NVM through the page cache or a viable alternative and
> have userspace applications mmap the NVM and hand out memory with
> their favorite userspace memory allocator.
> 
> This would isolate the NVMs to only applications that are well aware
> of the performance implications of accessing NVM. I believe that all
> of this work could be solved with the NUMA node approach, but the two
> approaches are seeming to blur together.
> 
> The main points I would like to discuss are:
> 
> * Is the page cache model a viable alternative to NVM as a NUMA NODE?
> * Can we add more flexibility to the page cache?
> * Should we force separation of NVM through an explicit mmap?
> 
> I believe this discussion could be merged with NUMA, memory hierarchy
> and device memory, Use NVDIMM as NUMA node and NUMA API, or memory
> reclaim with NUMA balancing.
> 
> Here are some performance numbers of hmmap (in development):
> 
> All numbers are collected on a 4GiB hmmap device with a 128MiB cache.
> For the mmap tests I used cgroups to limit the page cache usage to
> 128MiB. All results are an average of 10 runs. W and R access the
> entire device with all threads segregated in the address space. RR
> reads the entire device randomly 8 bytes at a time and is limited to
> 8MiB of data accessed.
> 
> hmmap brd vs. mmap of brd
> 
>   hmmap   mmap
> 
> Threads W R RR  W R RR 
> 
> 1 7.21  5.39  5.04  6.80  5.63  5.23  
> 2 5.19  3.87  3.74  4.66  3.33  3.20
> 4 3.65  2.95  3.07  3.53  2.26  2.18
> 8 4.52  3.43  3.59  4.30  1.98  1.88
> 165.00  3.85  3.98  4.92  2.00  1.99
> 
> 
> 
> Memory Backend Test (Dax capable)
> 
>   hmmap hmmap-dax hmmap-wrprotect
> 
> Threads   W R RRW R RRW R RR 
> 
> 1 6.29  4.94  4.37  2.54  1.36  0.16  7.12  2.13  0.73 
> 2 4.62  3.63  3.57  1.41  0.69  0.08  5.06  1.14  0.41
> 4 3.45  2.97  3.11  0.77  0.36  0.04  3.66  0.63  0.25
> 8 4.10  3.53  3.71  0.44  0.19  0.02  4.03  0.35  0.17
> 164.60  3.98  4.04  0.34  0.16  0.02  4.52  0.27  0.14
> 
> 
> Thanks,
> Adam
> 
> 
> 
> 


Re: [LSF/MM TOPIC] Page Cache Flexibility for NVM

2019-02-21 Thread Adam Manzanares
On Thu, 2019-02-21 at 15:14 -0800, Dave Hansen wrote:
> On 2/21/19 3:11 PM, Adam Manzanares wrote:
> > I am proposing that as an alternative to using NVMs as a NUMA node
> > we expose the NVM through the page cache or a viable alternative
> > and
> > have userspace applications mmap the NVM and hand out memory with
> > their favorite userspace memory allocator.
> 
> Are you proposing that the kernel manage this memory (it's managed in
> the buddy lists, for instance) or that something else manage the
> memory,
> like we do for device-dax or HMM?
> 

I am proposing we use a device-dax or HMM like model.


[LSF/MM TOPIC] Page Cache Flexibility for NVM

2019-02-21 Thread Adam Manzanares
Hello,

I would like to attend the LSF/MM Summit 2019. I'm interested in
several MM topics that are mentioned below as well as Zoned Block
Devices and any io determinism topics that come up in the storage
track. 

I have been working on a caching layer, hmmap (heterogeneous memory
map) [1], for emerging NVM and it is in spirit close to the page
cache. The key difference being that the backend device and caching
layer of hmmap is pluggable. In addition, hmmap supports DAX and write
protection, which I believe are key features for emerging NVMs that may
have write/read asymmetry as well as write endurance constraints.
Lastly we can leverage hardware, such as a DMA engine, when moving
pages between the cache while also allowing direct access if the device
is capable.

I am proposing that as an alternative to using NVMs as a NUMA node
we expose the NVM through the page cache or a viable alternative and
have userspace applications mmap the NVM and hand out memory with
their favorite userspace memory allocator.

This would isolate the NVMs to only applications that are well aware
of the performance implications of accessing NVM. I believe that all
of this work could be solved with the NUMA node approach, but the two
approaches are seeming to blur together.

The main points I would like to discuss are:

* Is the page cache model a viable alternative to NVM as a NUMA NODE?
* Can we add more flexibility to the page cache?
* Should we force separation of NVM through an explicit mmap?

I believe this discussion could be merged with NUMA, memory hierarchy
and device memory, Use NVDIMM as NUMA node and NUMA API, or memory
reclaim with NUMA balancing.

Here are some performance numbers of hmmap (in development):

All numbers are collected on a 4GiB hmmap device with a 128MiB cache.
For the mmap tests I used cgroups to limit the page cache usage to
128MiB. All results are an average of 10 runs. W and R access the
entire device with all threads segregated in the address space. RR
reads the entire device randomly 8 bytes at a time and is limited to
8MiB of data accessed.

hmmap brd vs. mmap of brd

hmmap   mmap

Threads W R RRW R RR 

1   7.21  5.39  5.04  6.80  5.63  5.23  
2   5.19  3.87  3.74  4.66  3.33  3.20
4   3.65  2.95  3.07  3.53  2.26  2.18
8   4.52  3.43  3.59  4.30  1.98  1.88
16  5.00  3.85  3.98  4.92  2.00  1.99



Memory Backend Test (Dax capable)

hmmap hmmap-dax hmmap-wrprotect

Threads W R RRW R RRW R RR 

1   6.29  4.94  4.37  2.54  1.36  0.16  7.12  2.13  0.73 
2   4.62  3.63  3.57  1.41  0.69  0.08  5.06  1.14  0.41
4   3.45  2.97  3.11  0.77  0.36  0.04  3.66  0.63  0.25
8   4.10  3.53  3.71  0.44  0.19  0.02  4.03  0.35  0.17
16  4.60  3.98  4.04  0.34  0.16  0.02  4.52  0.27  0.14


Thanks,
Adam






[PATCH] io_submit.2: Add IOCB_FLAG_IOPRIO

2018-07-13 Thread adam . manzanares
From: Adam Manzanares 

The newly added IOCB_FLAG_IOPRIO aio_flag introduces
new behaviors and return values.

The details of this new feature are posted here:
https://lkml.org/lkml/2018/5/22/809

Signed-off-by: Adam Manzanares 
---
 man2/io_submit.2 | 34 +++---
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/man2/io_submit.2 b/man2/io_submit.2
index d17e3122a..15e1ecdea 100644
--- a/man2/io_submit.2
+++ b/man2/io_submit.2
@@ -164,14 +164,26 @@ This is the size of the buffer pointed to by
 This is the file offset at which the I/O operation is to be performed.
 .TP
 .I aio_flags
-This is the flag to be passed iocb structure.
-The only valid value is
-.BR IOCB_FLAG_RESFD ,
-which indicates that the asynchronous I/O control must signal the file
+This is the set of flags associated with the iocb structure.
+The valid values are:
+.RS
+.TP
+.BR IOCB_FLAG_RESFD
+Asynchronous I/O control must signal the file
 descriptor mentioned in
 .I aio_resfd
 upon completion.
 .TP
+.BR IOCB_FLAG_IOPRIO " (since Linux 4.18)"
+.\" commit d9a08a9e616beeccdbd0e7262b7225ffdfa49e92
+Interpret the
+.I aio_reqprio
+field as an
+.B IOPRIO_VALUE
+as defined by
+.IR linux/ioprio.h.
+.RE
+.TP
 .I aio_resfd
 The file descriptor to signal in the event of asynchronous I/O completion.
 .SH RETURN VALUE
@@ -196,13 +208,21 @@ The AIO context specified by \fIctx_id\fP is invalid.
 \fInr\fP is less than 0.
 The \fIiocb\fP at
 .I *iocbpp[0]
-is not properly initialized,
-or the operation specified is invalid for the file descriptor
-in the \fIiocb\fP.
+is not properly initialized, the operation specified is invalid for the file
+descriptor in the \fIiocb\fP, or the value in the
+.I aio_reqprio
+field is invalid.
 .TP
 .B ENOSYS
 .BR io_submit ()
 is not implemented on this architecture.
+.TP
+.B EPERM
+The aio_reqprio field is set with the class
+.B IOPRIO_CLASS_RT
+, but the submitting context does not have the
+.B CAP_SYS_ADMIN
+privilege.
 .SH VERSIONS
 .PP
 The asynchronous I/O system calls first appeared in Linux 2.5.
-- 
2.17.1



[PATCH] io_submit.2: Add IOCB_FLAG_IOPRIO

2018-07-13 Thread adam . manzanares
From: Adam Manzanares 

The newly added IOCB_FLAG_IOPRIO aio_flag introduces
new behaviors and return values.

The details of this new feature are posted here:
https://lkml.org/lkml/2018/5/22/809

Signed-off-by: Adam Manzanares 
---
 man2/io_submit.2 | 34 +++---
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/man2/io_submit.2 b/man2/io_submit.2
index d17e3122a..15e1ecdea 100644
--- a/man2/io_submit.2
+++ b/man2/io_submit.2
@@ -164,14 +164,26 @@ This is the size of the buffer pointed to by
 This is the file offset at which the I/O operation is to be performed.
 .TP
 .I aio_flags
-This is the flag to be passed iocb structure.
-The only valid value is
-.BR IOCB_FLAG_RESFD ,
-which indicates that the asynchronous I/O control must signal the file
+This is the set of flags associated with the iocb structure.
+The valid values are:
+.RS
+.TP
+.BR IOCB_FLAG_RESFD
+Asynchronous I/O control must signal the file
 descriptor mentioned in
 .I aio_resfd
 upon completion.
 .TP
+.BR IOCB_FLAG_IOPRIO " (since Linux 4.18)"
+.\" commit d9a08a9e616beeccdbd0e7262b7225ffdfa49e92
+Interpret the
+.I aio_reqprio
+field as an
+.B IOPRIO_VALUE
+as defined by
+.IR linux/ioprio.h.
+.RE
+.TP
 .I aio_resfd
 The file descriptor to signal in the event of asynchronous I/O completion.
 .SH RETURN VALUE
@@ -196,13 +208,21 @@ The AIO context specified by \fIctx_id\fP is invalid.
 \fInr\fP is less than 0.
 The \fIiocb\fP at
 .I *iocbpp[0]
-is not properly initialized,
-or the operation specified is invalid for the file descriptor
-in the \fIiocb\fP.
+is not properly initialized, the operation specified is invalid for the file
+descriptor in the \fIiocb\fP, or the value in the
+.I aio_reqprio
+field is invalid.
 .TP
 .B ENOSYS
 .BR io_submit ()
 is not implemented on this architecture.
+.TP
+.B EPERM
+The aio_reqprio field is set with the class
+.B IOPRIO_CLASS_RT
+, but the submitting context does not have the
+.B CAP_SYS_ADMIN
+privilege.
 .SH VERSIONS
 .PP
 The asynchronous I/O system calls first appeared in Linux 2.5.
-- 
2.17.1



Re: [PATCH v7 0/5] AIO add per-command iopriority

2018-05-22 Thread Adam Manzanares


On 5/22/18 11:30 AM, Jens Axboe wrote:
> On 5/22/18 12:30 PM, Al Viro wrote:
>> On Tue, May 22, 2018 at 11:55:04AM -0600, Jens Axboe wrote:
>>> On 5/22/18 11:52 AM, adam.manzana...@wdc.com wrote:
>>>> From: Adam Manzanares <adam.manzana...@wdc.com>
>>>>
>>>> This is the per-I/O equivalent of the ioprio_set system call.
>>>> See the following link for performance implications on a SATA HDD:
>>>> https://lkml.org/lkml/2016/12/6/495
>>>>
>>>> First patch factors ioprio_check_cap function out of ioprio_set system 
>>>> call to
>>>> also be used by the aio ioprio interface.
>>>>
>>>> Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.
>>>>
>>>> Third patch passes ioprio hint from aio iocb to kiocb and initializes kiocb
>>>> ioprio value appropriately when it is not explicitly set.
>>>>
>>>> Fourth patch enables the feature for blkdev.
>>>>
>>>> Fifth patch enables the feature for iomap direct IO
>>>
>>> LGTM, you can add:
>>>
>>> Reviewed-by: Jens Axboe <ax...@kernel.dk>
>>>
>>> Al, are you picking this series up, or should I?
>>
>> Probably better if I do, once I finish reviewing Christoph's patchset -
>> we already have a bunch of stuff around fs/aio.c in this cycle...
> 
> Alright, sounds good, thanks Al.
> 

I was working on the man page update for this feature and noticed I 
could be bit more informative on error if I return the error value 
returned by ioprio_check_cap in fs/aio.c.

Should I resend the whole series?

Re: [PATCH v7 0/5] AIO add per-command iopriority

2018-05-22 Thread Adam Manzanares


On 5/22/18 11:30 AM, Jens Axboe wrote:
> On 5/22/18 12:30 PM, Al Viro wrote:
>> On Tue, May 22, 2018 at 11:55:04AM -0600, Jens Axboe wrote:
>>> On 5/22/18 11:52 AM, adam.manzana...@wdc.com wrote:
>>>> From: Adam Manzanares 
>>>>
>>>> This is the per-I/O equivalent of the ioprio_set system call.
>>>> See the following link for performance implications on a SATA HDD:
>>>> https://lkml.org/lkml/2016/12/6/495
>>>>
>>>> First patch factors ioprio_check_cap function out of ioprio_set system 
>>>> call to
>>>> also be used by the aio ioprio interface.
>>>>
>>>> Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.
>>>>
>>>> Third patch passes ioprio hint from aio iocb to kiocb and initializes kiocb
>>>> ioprio value appropriately when it is not explicitly set.
>>>>
>>>> Fourth patch enables the feature for blkdev.
>>>>
>>>> Fifth patch enables the feature for iomap direct IO
>>>
>>> LGTM, you can add:
>>>
>>> Reviewed-by: Jens Axboe 
>>>
>>> Al, are you picking this series up, or should I?
>>
>> Probably better if I do, once I finish reviewing Christoph's patchset -
>> we already have a bunch of stuff around fs/aio.c in this cycle...
> 
> Alright, sounds good, thanks Al.
> 

I was working on the man page update for this feature and noticed I 
could be bit more informative on error if I return the error value 
returned by ioprio_check_cap in fs/aio.c.

Should I resend the whole series?

[PATCH v7 5/5] fs: iomap dio set bio prio from kiocb prio

2018-05-22 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

Now that kiocb has an ioprio field copy this over to the bio when it is
created from the kiocb during direct IO.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
Reviewed-by: Jeff Moyer <jmo...@redhat.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
---
 fs/iomap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index afd163586aa0..65aae194aeca 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -919,6 +919,7 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t 
length,
bio->bi_iter.bi_sector =
(iomap->addr + pos - iomap->offset) >> 9;
bio->bi_write_hint = dio->iocb->ki_hint;
+   bio->bi_ioprio = dio->iocb->ki_ioprio;
bio->bi_private = dio;
bio->bi_end_io = iomap_dio_bio_end_io;
 
-- 
2.15.1



[PATCH v7 5/5] fs: iomap dio set bio prio from kiocb prio

2018-05-22 Thread adam . manzanares
From: Adam Manzanares 

Now that kiocb has an ioprio field copy this over to the bio when it is
created from the kiocb during direct IO.

Signed-off-by: Adam Manzanares 
Reviewed-by: Jeff Moyer 
Reviewed-by: Christoph Hellwig 
---
 fs/iomap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index afd163586aa0..65aae194aeca 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -919,6 +919,7 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t 
length,
bio->bi_iter.bi_sector =
(iomap->addr + pos - iomap->offset) >> 9;
bio->bi_write_hint = dio->iocb->ki_hint;
+   bio->bi_ioprio = dio->iocb->ki_ioprio;
bio->bi_private = dio;
bio->bi_end_io = iomap_dio_bio_end_io;
 
-- 
2.15.1



[PATCH v7 0/5] AIO add per-command iopriority

2018-05-22 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

This is the per-I/O equivalent of the ioprio_set system call.
See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495

First patch factors ioprio_check_cap function out of ioprio_set system call to
also be used by the aio ioprio interface.

Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.

Third patch passes ioprio hint from aio iocb to kiocb and initializes kiocb
ioprio value appropriately when it is not explicitly set.

Fourth patch enables the feature for blkdev.

Fifth patch enables the feature for iomap direct IO

Note: this work is based on top of linux-vfs/for-next

v2: merge patches
use IOCB_FLAG_IOPRIO
validate intended use with IOCB_IOPRIO
add linux-api and linux-block to cc

v3: add ioprio_check_cap function
convert kiocb ki_hint to u16
use ioprio_check_cap when adding ioprio to kiocb in aio.c

v4: handle IOCB_IOPRIO in aio_prep_rw
note patch 3 depends on patch 1 in commit msg

v5: rename ki_hint_valid -> ki_hint_validate
remove ki_hint_validate comment and whitespace
remove IOCB_IOPRIO flag
initialize kiocb to have no priority

v6: add __blkdev_direct_IO_simple ioprio support

v7: Tie ki_hint_validate to kiocb ki_hint type
Add additional ki_hint_validate check

Adam Manzanares (5):
  block: add ioprio_check_cap function
  fs: Convert kiocb rw_hint from enum to u16
  fs: Add aio iopriority support
  fs: blkdev set bio prio from kiocb prio
  fs: iomap dio set bio prio from kiocb prio

 block/ioprio.c   | 22 --
 drivers/block/loop.c |  3 +++
 fs/aio.c | 18 +-
 fs/block_dev.c   |  2 ++
 fs/iomap.c   |  1 +
 include/linux/fs.h   | 16 ++--
 include/linux/ioprio.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 8 files changed, 56 insertions(+), 9 deletions(-)

-- 
2.15.1



[PATCH v7 0/5] AIO add per-command iopriority

2018-05-22 Thread adam . manzanares
From: Adam Manzanares 

This is the per-I/O equivalent of the ioprio_set system call.
See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495

First patch factors ioprio_check_cap function out of ioprio_set system call to
also be used by the aio ioprio interface.

Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.

Third patch passes ioprio hint from aio iocb to kiocb and initializes kiocb
ioprio value appropriately when it is not explicitly set.

Fourth patch enables the feature for blkdev.

Fifth patch enables the feature for iomap direct IO

Note: this work is based on top of linux-vfs/for-next

v2: merge patches
use IOCB_FLAG_IOPRIO
validate intended use with IOCB_IOPRIO
add linux-api and linux-block to cc

v3: add ioprio_check_cap function
convert kiocb ki_hint to u16
use ioprio_check_cap when adding ioprio to kiocb in aio.c

v4: handle IOCB_IOPRIO in aio_prep_rw
note patch 3 depends on patch 1 in commit msg

v5: rename ki_hint_valid -> ki_hint_validate
remove ki_hint_validate comment and whitespace
remove IOCB_IOPRIO flag
initialize kiocb to have no priority

v6: add __blkdev_direct_IO_simple ioprio support

v7: Tie ki_hint_validate to kiocb ki_hint type
Add additional ki_hint_validate check

Adam Manzanares (5):
  block: add ioprio_check_cap function
  fs: Convert kiocb rw_hint from enum to u16
  fs: Add aio iopriority support
  fs: blkdev set bio prio from kiocb prio
  fs: iomap dio set bio prio from kiocb prio

 block/ioprio.c   | 22 --
 drivers/block/loop.c |  3 +++
 fs/aio.c | 18 +-
 fs/block_dev.c   |  2 ++
 fs/iomap.c   |  1 +
 include/linux/fs.h   | 16 ++--
 include/linux/ioprio.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 8 files changed, 56 insertions(+), 9 deletions(-)

-- 
2.15.1



[PATCH v7 1/5] block: add ioprio_check_cap function

2018-05-22 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

Aio per command iopriority support introduces a second interface between
userland and the kernel capable of passing iopriority. The aio interface also
needs the ability to verify that the submitting context has sufficient
privileges to submit IOPRIO_RT commands. This patch creates the
ioprio_check_cap function to be used by the ioprio_set system call and also by
the aio interface.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Jeff Moyer <jmo...@redhat.com>
---
 block/ioprio.c | 22 --
 include/linux/ioprio.h |  2 ++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/ioprio.c b/block/ioprio.c
index 6f5d0b6625e3..f9821080c92c 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -61,15 +61,10 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
 }
 EXPORT_SYMBOL_GPL(set_task_ioprio);
 
-SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+int ioprio_check_cap(int ioprio)
 {
int class = IOPRIO_PRIO_CLASS(ioprio);
int data = IOPRIO_PRIO_DATA(ioprio);
-   struct task_struct *p, *g;
-   struct user_struct *user;
-   struct pid *pgrp;
-   kuid_t uid;
-   int ret;
 
switch (class) {
case IOPRIO_CLASS_RT:
@@ -92,6 +87,21 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, 
ioprio)
return -EINVAL;
}
 
+   return 0;
+}
+
+SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+{
+   struct task_struct *p, *g;
+   struct user_struct *user;
+   struct pid *pgrp;
+   kuid_t uid;
+   int ret;
+
+   ret = ioprio_check_cap(ioprio);
+   if (ret)
+   return ret;
+
ret = -ESRCH;
rcu_read_lock();
switch (which) {
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 627efac73e6d..4a28cec49ec3 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -77,4 +77,6 @@ extern int ioprio_best(unsigned short aprio, unsigned short 
bprio);
 
 extern int set_task_ioprio(struct task_struct *task, int ioprio);
 
+extern int ioprio_check_cap(int ioprio);
+
 #endif
-- 
2.15.1



[PATCH v7 1/5] block: add ioprio_check_cap function

2018-05-22 Thread adam . manzanares
From: Adam Manzanares 

Aio per command iopriority support introduces a second interface between
userland and the kernel capable of passing iopriority. The aio interface also
needs the ability to verify that the submitting context has sufficient
privileges to submit IOPRIO_RT commands. This patch creates the
ioprio_check_cap function to be used by the ioprio_set system call and also by
the aio interface.

Signed-off-by: Adam Manzanares 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jeff Moyer 
---
 block/ioprio.c | 22 --
 include/linux/ioprio.h |  2 ++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/ioprio.c b/block/ioprio.c
index 6f5d0b6625e3..f9821080c92c 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -61,15 +61,10 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
 }
 EXPORT_SYMBOL_GPL(set_task_ioprio);
 
-SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+int ioprio_check_cap(int ioprio)
 {
int class = IOPRIO_PRIO_CLASS(ioprio);
int data = IOPRIO_PRIO_DATA(ioprio);
-   struct task_struct *p, *g;
-   struct user_struct *user;
-   struct pid *pgrp;
-   kuid_t uid;
-   int ret;
 
switch (class) {
case IOPRIO_CLASS_RT:
@@ -92,6 +87,21 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, 
ioprio)
return -EINVAL;
}
 
+   return 0;
+}
+
+SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+{
+   struct task_struct *p, *g;
+   struct user_struct *user;
+   struct pid *pgrp;
+   kuid_t uid;
+   int ret;
+
+   ret = ioprio_check_cap(ioprio);
+   if (ret)
+   return ret;
+
ret = -ESRCH;
rcu_read_lock();
switch (which) {
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 627efac73e6d..4a28cec49ec3 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -77,4 +77,6 @@ extern int ioprio_best(unsigned short aprio, unsigned short 
bprio);
 
 extern int set_task_ioprio(struct task_struct *task, int ioprio);
 
+extern int ioprio_check_cap(int ioprio);
+
 #endif
-- 
2.15.1



[PATCH v7 4/5] fs: blkdev set bio prio from kiocb prio

2018-05-22 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

Now that kiocb has an ioprio field copy this over to the bio when it is
created from the kiocb.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
Reviewed-by: Jeff Moyer <jmo...@redhat.com>
---
 fs/block_dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..11ba99e79d2a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -216,6 +216,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct 
iov_iter *iter,
bio.bi_write_hint = iocb->ki_hint;
bio.bi_private = current;
bio.bi_end_io = blkdev_bio_end_io_simple;
+   bio.bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(, iter);
if (unlikely(ret))
@@ -355,6 +356,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
+   bio->bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
-- 
2.15.1



[PATCH v7 2/5] fs: Convert kiocb rw_hint from enum to u16

2018-05-22 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

In order to avoid kiocb bloat for per command iopriority support, rw_hint
is converted from enum to a u16. Added a guard around ki_hint assignment.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 fs/aio.c   |  2 +-
 include/linux/fs.h | 13 +++--
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 755d3f57bcc8..87d8939bb1e4 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1450,7 +1450,7 @@ static int aio_prep_rw(struct kiocb *req, struct iocb 
*iocb)
req->ki_flags = iocb_flags(req->ki_filp);
if (iocb->aio_flags & IOCB_FLAG_RESFD)
req->ki_flags |= IOCB_EVENTFD;
-   req->ki_hint = file_write_hint(req->ki_filp);
+   req->ki_hint = ki_hint_validate(file_write_hint(req->ki_filp));
ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
if (unlikely(ret))
fput(req->ki_filp);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7f07977bdfd7..9b76ee73af14 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -299,7 +299,7 @@ struct kiocb {
void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
void*private;
int ki_flags;
-   enum rw_hintki_hint;
+   u16 ki_hint;
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1927,12 +1927,21 @@ static inline enum rw_hint file_write_hint(struct file 
*file)
 
 static inline int iocb_flags(struct file *file);
 
+static inline u16 ki_hint_validate(enum rw_hint hint)
+{
+   typeof(((struct kiocb *)0)->ki_hint) max_hint = -1;
+
+   if (hint <= max_hint)
+   return hint;
+   return 0;
+}
+
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
*kiocb = (struct kiocb) {
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
-   .ki_hint = file_write_hint(filp),
+   .ki_hint = ki_hint_validate(file_write_hint(filp)),
};
 }
 
-- 
2.15.1



[PATCH v7 2/5] fs: Convert kiocb rw_hint from enum to u16

2018-05-22 Thread adam . manzanares
From: Adam Manzanares 

In order to avoid kiocb bloat for per command iopriority support, rw_hint
is converted from enum to a u16. Added a guard around ki_hint assignment.

Signed-off-by: Adam Manzanares 
---
 fs/aio.c   |  2 +-
 include/linux/fs.h | 13 +++--
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 755d3f57bcc8..87d8939bb1e4 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1450,7 +1450,7 @@ static int aio_prep_rw(struct kiocb *req, struct iocb 
*iocb)
req->ki_flags = iocb_flags(req->ki_filp);
if (iocb->aio_flags & IOCB_FLAG_RESFD)
req->ki_flags |= IOCB_EVENTFD;
-   req->ki_hint = file_write_hint(req->ki_filp);
+   req->ki_hint = ki_hint_validate(file_write_hint(req->ki_filp));
ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
if (unlikely(ret))
fput(req->ki_filp);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7f07977bdfd7..9b76ee73af14 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -299,7 +299,7 @@ struct kiocb {
void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
void*private;
int ki_flags;
-   enum rw_hintki_hint;
+   u16 ki_hint;
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1927,12 +1927,21 @@ static inline enum rw_hint file_write_hint(struct file 
*file)
 
 static inline int iocb_flags(struct file *file);
 
+static inline u16 ki_hint_validate(enum rw_hint hint)
+{
+   typeof(((struct kiocb *)0)->ki_hint) max_hint = -1;
+
+   if (hint <= max_hint)
+   return hint;
+   return 0;
+}
+
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
*kiocb = (struct kiocb) {
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
-   .ki_hint = file_write_hint(filp),
+   .ki_hint = ki_hint_validate(file_write_hint(filp)),
};
 }
 
-- 
2.15.1



[PATCH v7 4/5] fs: blkdev set bio prio from kiocb prio

2018-05-22 Thread adam . manzanares
From: Adam Manzanares 

Now that kiocb has an ioprio field copy this over to the bio when it is
created from the kiocb.

Signed-off-by: Adam Manzanares 
Reviewed-by: Jeff Moyer 
---
 fs/block_dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..11ba99e79d2a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -216,6 +216,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct 
iov_iter *iter,
bio.bi_write_hint = iocb->ki_hint;
bio.bi_private = current;
bio.bi_end_io = blkdev_bio_end_io_simple;
+   bio.bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(, iter);
if (unlikely(ret))
@@ -355,6 +356,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
+   bio->bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
-- 
2.15.1



[PATCH v7 3/5] fs: Add aio iopriority support

2018-05-22 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

This is the per-I/O equivalent of the ioprio_set system call.

When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.

This patch depends on block: add ioprio_check_cap function.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
Reviewed-by: Jeff Moyer <jmo...@redhat.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
---
 drivers/block/loop.c |  3 +++
 fs/aio.c | 16 
 include/linux/fs.h   |  3 +++
 include/uapi/linux/aio_abi.h |  1 +
 4 files changed, 23 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 5d4e31655d96..dd98dfd97f5e 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -76,6 +76,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include "loop.h"
 
 #include 
@@ -559,6 +561,7 @@ static int lo_rw_aio(struct loop_device *lo, struct 
loop_cmd *cmd,
cmd->iocb.ki_filp = file;
cmd->iocb.ki_complete = lo_rw_aio_complete;
cmd->iocb.ki_flags = IOCB_DIRECT;
+   cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
if (cmd->css)
kthread_associate_blkcg(cmd->css);
 
diff --git a/fs/aio.c b/fs/aio.c
index 87d8939bb1e4..76a9d4c14e55 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1451,6 +1451,22 @@ static int aio_prep_rw(struct kiocb *req, struct iocb 
*iocb)
if (iocb->aio_flags & IOCB_FLAG_RESFD)
req->ki_flags |= IOCB_EVENTFD;
req->ki_hint = ki_hint_validate(file_write_hint(req->ki_filp));
+   if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
+   /*
+* If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
+* aio_reqprio is interpreted as an I/O scheduling
+* class and priority.
+*/
+   ret = ioprio_check_cap(iocb->aio_reqprio);
+   if (ret) {
+   pr_debug("aio ioprio check cap error\n");
+   return -EINVAL;
+   }
+
+   req->ki_ioprio = iocb->aio_reqprio;
+   } else
+   req->ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
+
ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
if (unlikely(ret))
fput(req->ki_filp);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9b76ee73af14..0c61d5987879 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -300,6 +301,7 @@ struct kiocb {
void*private;
int ki_flags;
u16 ki_hint;
+   u16 ki_ioprio; /* See linux/ioprio.h */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1942,6 +1944,7 @@ static inline void init_sync_kiocb(struct kiocb *kiocb, 
struct file *filp)
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
.ki_hint = ki_hint_validate(file_write_hint(filp)),
+   .ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0),
};
 }
 
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 2c0a3415beee..d4e768d55d14 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -55,6 +55,7 @@ enum {
  *   is valid.
  */
 #define IOCB_FLAG_RESFD(1 << 0)
+#define IOCB_FLAG_IOPRIO   (1 << 1)
 
 /* read() from /dev/aio returns these structures. */
 struct io_event {
-- 
2.15.1



[PATCH v7 3/5] fs: Add aio iopriority support

2018-05-22 Thread adam . manzanares
From: Adam Manzanares 

This is the per-I/O equivalent of the ioprio_set system call.

When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.

This patch depends on block: add ioprio_check_cap function.

Signed-off-by: Adam Manzanares 
Reviewed-by: Jeff Moyer 
Reviewed-by: Christoph Hellwig 
---
 drivers/block/loop.c |  3 +++
 fs/aio.c | 16 
 include/linux/fs.h   |  3 +++
 include/uapi/linux/aio_abi.h |  1 +
 4 files changed, 23 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 5d4e31655d96..dd98dfd97f5e 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -76,6 +76,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include "loop.h"
 
 #include 
@@ -559,6 +561,7 @@ static int lo_rw_aio(struct loop_device *lo, struct 
loop_cmd *cmd,
cmd->iocb.ki_filp = file;
cmd->iocb.ki_complete = lo_rw_aio_complete;
cmd->iocb.ki_flags = IOCB_DIRECT;
+   cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
if (cmd->css)
kthread_associate_blkcg(cmd->css);
 
diff --git a/fs/aio.c b/fs/aio.c
index 87d8939bb1e4..76a9d4c14e55 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1451,6 +1451,22 @@ static int aio_prep_rw(struct kiocb *req, struct iocb 
*iocb)
if (iocb->aio_flags & IOCB_FLAG_RESFD)
req->ki_flags |= IOCB_EVENTFD;
req->ki_hint = ki_hint_validate(file_write_hint(req->ki_filp));
+   if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
+   /*
+* If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
+* aio_reqprio is interpreted as an I/O scheduling
+* class and priority.
+*/
+   ret = ioprio_check_cap(iocb->aio_reqprio);
+   if (ret) {
+   pr_debug("aio ioprio check cap error\n");
+   return -EINVAL;
+   }
+
+   req->ki_ioprio = iocb->aio_reqprio;
+   } else
+   req->ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
+
ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
if (unlikely(ret))
fput(req->ki_filp);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9b76ee73af14..0c61d5987879 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -300,6 +301,7 @@ struct kiocb {
void*private;
int ki_flags;
u16 ki_hint;
+   u16 ki_ioprio; /* See linux/ioprio.h */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1942,6 +1944,7 @@ static inline void init_sync_kiocb(struct kiocb *kiocb, 
struct file *filp)
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
.ki_hint = ki_hint_validate(file_write_hint(filp)),
+   .ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0),
};
 }
 
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 2c0a3415beee..d4e768d55d14 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -55,6 +55,7 @@ enum {
  *   is valid.
  */
 #define IOCB_FLAG_RESFD(1 << 0)
+#define IOCB_FLAG_IOPRIO   (1 << 1)
 
 /* read() from /dev/aio returns these structures. */
 struct io_event {
-- 
2.15.1



Re: [PATCH v6 2/5] fs: Convert kiocb rw_hint from enum to u16

2018-05-22 Thread Adam Manzanares


On 5/22/18 9:30 AM, Jens Axboe wrote:
> On 5/22/18 10:24 AM, Goldwyn Rodrigues wrote:
>>
>>
>> On 05/22/2018 10:32 AM, Jens Axboe wrote:
>>> On 5/22/18 9:07 AM, adam.manzana...@wdc.com wrote:
>>>> From: Adam Manzanares <adam.manzana...@wdc.com>
>>>>
>>>> In order to avoid kiocb bloat for per command iopriority support, rw_hint
>>>> is converted from enum to a u16. Added a guard around ki_hint assignment.
>>>>
>>>> Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
>>>> Reviewed-by: Christoph Hellwig <h...@lst.de>
>>>> ---
>>>>   include/linux/fs.h | 13 +++--
>>>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>>> index 7f07977bdfd7..50de40dbbb85 100644
>>>> --- a/include/linux/fs.h
>>>> +++ b/include/linux/fs.h
>>>> @@ -284,6 +284,8 @@ enum rw_hint {
>>>>WRITE_LIFE_EXTREME  = RWH_WRITE_LIFE_EXTREME,
>>>>   };
>>>>   
>>>> +#define MAX_KI_HINT   ((1 << 16) - 1) /* ki_hint type is u16 
>>>> */
>>>
>>> Instead of having to do this and now rely on those now being synced,
>>> how about something like the below.
>>>
>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>> index 760d8da1b6c7..070438d0b62d 100644
>>> --- a/include/linux/fs.h
>>> +++ b/include/linux/fs.h
>>> @@ -299,7 +299,7 @@ struct kiocb {
>>> void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
>>> void*private;
>>> int ki_flags;
>>> -   enum rw_hintki_hint;
>>> +   u16 ki_hint;
>>>   } __randomize_layout;
>>>   
>>>   static inline bool is_sync_kiocb(struct kiocb *kiocb)
>>> @@ -1927,12 +1927,22 @@ static inline enum rw_hint file_write_hint(struct 
>>> file *file)
>>>   
>>>   static inline int iocb_flags(struct file *file);
>>>   
>>> +static inline u16 ki_hint_validate(enum rw_hint hint)
>>> +{
>>> +   typeof(((struct kiocb *)0)->ki_hint) max_hint = -1;
>>
>> This looks complex to me. Would force a reader to lookback at what
>> datatype ki_hint is. I'd prefer to declare it as u16 max_hint = -1, or
>> even the previous #define MAX_KI_HINT format is easier to read. Just a
>> program reading style you are comfortable with though.
> 
> How is it complex? It's defining a type that'll be the same as ki_hint
> in the kiocb, which is _exactly_ what we care about. Any sort of other
> definition will rely on those two locations now being synced. The
> above will never break.
> 
> So I strongly disagree. The above will _never_ require the reader to
> figure out what the type is. Any other variant will _always_ require
> the reader to check if they are the same.
> 

I do think the previous code was a bit easier to parse at first glance, 
but that is outweighed by the fact that the validate function is now 
directly tied to the kiocb ki_hint type.

I also missed one spot where I should have used ki_hint_validate. Will 
resend soon.

Re: [PATCH v6 2/5] fs: Convert kiocb rw_hint from enum to u16

2018-05-22 Thread Adam Manzanares


On 5/22/18 9:30 AM, Jens Axboe wrote:
> On 5/22/18 10:24 AM, Goldwyn Rodrigues wrote:
>>
>>
>> On 05/22/2018 10:32 AM, Jens Axboe wrote:
>>> On 5/22/18 9:07 AM, adam.manzana...@wdc.com wrote:
>>>> From: Adam Manzanares 
>>>>
>>>> In order to avoid kiocb bloat for per command iopriority support, rw_hint
>>>> is converted from enum to a u16. Added a guard around ki_hint assignment.
>>>>
>>>> Signed-off-by: Adam Manzanares 
>>>> Reviewed-by: Christoph Hellwig 
>>>> ---
>>>>   include/linux/fs.h | 13 +++--
>>>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>>> index 7f07977bdfd7..50de40dbbb85 100644
>>>> --- a/include/linux/fs.h
>>>> +++ b/include/linux/fs.h
>>>> @@ -284,6 +284,8 @@ enum rw_hint {
>>>>WRITE_LIFE_EXTREME  = RWH_WRITE_LIFE_EXTREME,
>>>>   };
>>>>   
>>>> +#define MAX_KI_HINT   ((1 << 16) - 1) /* ki_hint type is u16 
>>>> */
>>>
>>> Instead of having to do this and now rely on those now being synced,
>>> how about something like the below.
>>>
>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>> index 760d8da1b6c7..070438d0b62d 100644
>>> --- a/include/linux/fs.h
>>> +++ b/include/linux/fs.h
>>> @@ -299,7 +299,7 @@ struct kiocb {
>>> void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
>>> void*private;
>>> int ki_flags;
>>> -   enum rw_hintki_hint;
>>> +   u16 ki_hint;
>>>   } __randomize_layout;
>>>   
>>>   static inline bool is_sync_kiocb(struct kiocb *kiocb)
>>> @@ -1927,12 +1927,22 @@ static inline enum rw_hint file_write_hint(struct 
>>> file *file)
>>>   
>>>   static inline int iocb_flags(struct file *file);
>>>   
>>> +static inline u16 ki_hint_validate(enum rw_hint hint)
>>> +{
>>> +   typeof(((struct kiocb *)0)->ki_hint) max_hint = -1;
>>
>> This looks complex to me. Would force a reader to lookback at what
>> datatype ki_hint is. I'd prefer to declare it as u16 max_hint = -1, or
>> even the previous #define MAX_KI_HINT format is easier to read. Just a
>> program reading style you are comfortable with though.
> 
> How is it complex? It's defining a type that'll be the same as ki_hint
> in the kiocb, which is _exactly_ what we care about. Any sort of other
> definition will rely on those two locations now being synced. The
> above will never break.
> 
> So I strongly disagree. The above will _never_ require the reader to
> figure out what the type is. Any other variant will _always_ require
> the reader to check if they are the same.
> 

I do think the previous code was a bit easier to parse at first glance, 
but that is outweighed by the fact that the validate function is now 
directly tied to the kiocb ki_hint type.

I also missed one spot where I should have used ki_hint_validate. Will 
resend soon.

Re: [PATCH v6 2/5] fs: Convert kiocb rw_hint from enum to u16

2018-05-22 Thread Adam Manzanares


On 5/22/18 8:32 AM, Jens Axboe wrote:
> On 5/22/18 9:07 AM, adam.manzana...@wdc.com wrote:
>> From: Adam Manzanares <adam.manzana...@wdc.com>
>>
>> In order to avoid kiocb bloat for per command iopriority support, rw_hint
>> is converted from enum to a u16. Added a guard around ki_hint assignment.
>>
>> Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
>> Reviewed-by: Christoph Hellwig <h...@lst.de>
>> ---
>>   include/linux/fs.h | 13 +++--
>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 7f07977bdfd7..50de40dbbb85 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -284,6 +284,8 @@ enum rw_hint {
>>  WRITE_LIFE_EXTREME  = RWH_WRITE_LIFE_EXTREME,
>>   };
>>   
>> +#define MAX_KI_HINT ((1 << 16) - 1) /* ki_hint type is u16 */
> 
> Instead of having to do this and now rely on those now being synced,
> how about something like the below.
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 760d8da1b6c7..070438d0b62d 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -299,7 +299,7 @@ struct kiocb {
>   void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
>   void*private;
>   int ki_flags;
> - enum rw_hintki_hint;
> + u16 ki_hint;
>   } __randomize_layout;
>   
>   static inline bool is_sync_kiocb(struct kiocb *kiocb)
> @@ -1927,12 +1927,22 @@ static inline enum rw_hint file_write_hint(struct 
> file *file)
>   
>   static inline int iocb_flags(struct file *file);
>   
> +static inline u16 ki_hint_validate(enum rw_hint hint)
> +{
> + typeof(((struct kiocb *)0)->ki_hint) max_hint = -1;
> +
> + if (hint <= max_hint)
> + return hint;
> +
> + return 0;
> +}
> +
>   static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
>   {
>   *kiocb = (struct kiocb) {
>   .ki_filp = filp,
>   .ki_flags = iocb_flags(filp),
> - .ki_hint = file_write_hint(filp),
> + .ki_hint = ki_hint_validate(file_write_hint(filp)),
>   };
>   }

Looks good. I'll resubmit.

>   
> 

Re: [PATCH v6 2/5] fs: Convert kiocb rw_hint from enum to u16

2018-05-22 Thread Adam Manzanares


On 5/22/18 8:32 AM, Jens Axboe wrote:
> On 5/22/18 9:07 AM, adam.manzana...@wdc.com wrote:
>> From: Adam Manzanares 
>>
>> In order to avoid kiocb bloat for per command iopriority support, rw_hint
>> is converted from enum to a u16. Added a guard around ki_hint assignment.
>>
>> Signed-off-by: Adam Manzanares 
>> Reviewed-by: Christoph Hellwig 
>> ---
>>   include/linux/fs.h | 13 +++--
>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 7f07977bdfd7..50de40dbbb85 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -284,6 +284,8 @@ enum rw_hint {
>>  WRITE_LIFE_EXTREME  = RWH_WRITE_LIFE_EXTREME,
>>   };
>>   
>> +#define MAX_KI_HINT ((1 << 16) - 1) /* ki_hint type is u16 */
> 
> Instead of having to do this and now rely on those now being synced,
> how about something like the below.
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 760d8da1b6c7..070438d0b62d 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -299,7 +299,7 @@ struct kiocb {
>   void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
>   void*private;
>   int ki_flags;
> - enum rw_hintki_hint;
> + u16 ki_hint;
>   } __randomize_layout;
>   
>   static inline bool is_sync_kiocb(struct kiocb *kiocb)
> @@ -1927,12 +1927,22 @@ static inline enum rw_hint file_write_hint(struct 
> file *file)
>   
>   static inline int iocb_flags(struct file *file);
>   
> +static inline u16 ki_hint_validate(enum rw_hint hint)
> +{
> + typeof(((struct kiocb *)0)->ki_hint) max_hint = -1;
> +
> + if (hint <= max_hint)
> + return hint;
> +
> + return 0;
> +}
> +
>   static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
>   {
>   *kiocb = (struct kiocb) {
>   .ki_filp = filp,
>   .ki_flags = iocb_flags(filp),
> - .ki_hint = file_write_hint(filp),
> + .ki_hint = ki_hint_validate(file_write_hint(filp)),
>   };
>   }

Looks good. I'll resubmit.

>   
> 

[PATCH v6 3/5] fs: Add aio iopriority support

2018-05-22 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

This is the per-I/O equivalent of the ioprio_set system call.

When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.

This patch depends on block: add ioprio_check_cap function.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
Reviewed-by: Jeff Moyer <jmo...@redhat.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
---
 drivers/block/loop.c |  3 +++
 fs/aio.c | 16 
 include/linux/fs.h   |  3 +++
 include/uapi/linux/aio_abi.h |  1 +
 4 files changed, 23 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 5d4e31655d96..dd98dfd97f5e 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -76,6 +76,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include "loop.h"
 
 #include 
@@ -559,6 +561,7 @@ static int lo_rw_aio(struct loop_device *lo, struct 
loop_cmd *cmd,
cmd->iocb.ki_filp = file;
cmd->iocb.ki_complete = lo_rw_aio_complete;
cmd->iocb.ki_flags = IOCB_DIRECT;
+   cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
if (cmd->css)
kthread_associate_blkcg(cmd->css);
 
diff --git a/fs/aio.c b/fs/aio.c
index 755d3f57bcc8..8225013f70f0 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1451,6 +1451,22 @@ static int aio_prep_rw(struct kiocb *req, struct iocb 
*iocb)
if (iocb->aio_flags & IOCB_FLAG_RESFD)
req->ki_flags |= IOCB_EVENTFD;
req->ki_hint = file_write_hint(req->ki_filp);
+   if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
+   /*
+* If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
+* aio_reqprio is interpreted as an I/O scheduling
+* class and priority.
+*/
+   ret = ioprio_check_cap(iocb->aio_reqprio);
+   if (ret) {
+   pr_debug("aio ioprio check cap error\n");
+   return -EINVAL;
+   }
+
+   req->ki_ioprio = iocb->aio_reqprio;
+   } else
+   req->ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
+
ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
if (unlikely(ret))
fput(req->ki_filp);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 50de40dbbb85..73b749ed3ea1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -302,6 +303,7 @@ struct kiocb {
void*private;
int ki_flags;
u16 ki_hint;
+   u16 ki_ioprio; /* See linux/ioprio.h */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1942,6 +1944,7 @@ static inline void init_sync_kiocb(struct kiocb *kiocb, 
struct file *filp)
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
.ki_hint = ki_hint_validate(file_write_hint(filp)),
+   .ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0),
};
 }
 
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 2c0a3415beee..d4e768d55d14 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -55,6 +55,7 @@ enum {
  *   is valid.
  */
 #define IOCB_FLAG_RESFD(1 << 0)
+#define IOCB_FLAG_IOPRIO   (1 << 1)
 
 /* read() from /dev/aio returns these structures. */
 struct io_event {
-- 
2.15.1



[PATCH v6 3/5] fs: Add aio iopriority support

2018-05-22 Thread adam . manzanares
From: Adam Manzanares 

This is the per-I/O equivalent of the ioprio_set system call.

When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.

This patch depends on block: add ioprio_check_cap function.

Signed-off-by: Adam Manzanares 
Reviewed-by: Jeff Moyer 
Reviewed-by: Christoph Hellwig 
---
 drivers/block/loop.c |  3 +++
 fs/aio.c | 16 
 include/linux/fs.h   |  3 +++
 include/uapi/linux/aio_abi.h |  1 +
 4 files changed, 23 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 5d4e31655d96..dd98dfd97f5e 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -76,6 +76,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include "loop.h"
 
 #include 
@@ -559,6 +561,7 @@ static int lo_rw_aio(struct loop_device *lo, struct 
loop_cmd *cmd,
cmd->iocb.ki_filp = file;
cmd->iocb.ki_complete = lo_rw_aio_complete;
cmd->iocb.ki_flags = IOCB_DIRECT;
+   cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
if (cmd->css)
kthread_associate_blkcg(cmd->css);
 
diff --git a/fs/aio.c b/fs/aio.c
index 755d3f57bcc8..8225013f70f0 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1451,6 +1451,22 @@ static int aio_prep_rw(struct kiocb *req, struct iocb 
*iocb)
if (iocb->aio_flags & IOCB_FLAG_RESFD)
req->ki_flags |= IOCB_EVENTFD;
req->ki_hint = file_write_hint(req->ki_filp);
+   if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
+   /*
+* If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
+* aio_reqprio is interpreted as an I/O scheduling
+* class and priority.
+*/
+   ret = ioprio_check_cap(iocb->aio_reqprio);
+   if (ret) {
+   pr_debug("aio ioprio check cap error\n");
+   return -EINVAL;
+   }
+
+   req->ki_ioprio = iocb->aio_reqprio;
+   } else
+   req->ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
+
ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
if (unlikely(ret))
fput(req->ki_filp);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 50de40dbbb85..73b749ed3ea1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -302,6 +303,7 @@ struct kiocb {
void*private;
int ki_flags;
u16 ki_hint;
+   u16 ki_ioprio; /* See linux/ioprio.h */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1942,6 +1944,7 @@ static inline void init_sync_kiocb(struct kiocb *kiocb, 
struct file *filp)
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
.ki_hint = ki_hint_validate(file_write_hint(filp)),
+   .ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0),
};
 }
 
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 2c0a3415beee..d4e768d55d14 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -55,6 +55,7 @@ enum {
  *   is valid.
  */
 #define IOCB_FLAG_RESFD(1 << 0)
+#define IOCB_FLAG_IOPRIO   (1 << 1)
 
 /* read() from /dev/aio returns these structures. */
 struct io_event {
-- 
2.15.1



[PATCH v6 4/5] fs: blkdev set bio prio from kiocb prio

2018-05-22 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

Now that kiocb has an ioprio field copy this over to the bio when it is
created from the kiocb.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 fs/block_dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..11ba99e79d2a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -216,6 +216,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct 
iov_iter *iter,
bio.bi_write_hint = iocb->ki_hint;
bio.bi_private = current;
bio.bi_end_io = blkdev_bio_end_io_simple;
+   bio.bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(, iter);
if (unlikely(ret))
@@ -355,6 +356,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
+   bio->bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
-- 
2.15.1



[PATCH v6 4/5] fs: blkdev set bio prio from kiocb prio

2018-05-22 Thread adam . manzanares
From: Adam Manzanares 

Now that kiocb has an ioprio field copy this over to the bio when it is
created from the kiocb.

Signed-off-by: Adam Manzanares 
---
 fs/block_dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..11ba99e79d2a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -216,6 +216,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct 
iov_iter *iter,
bio.bi_write_hint = iocb->ki_hint;
bio.bi_private = current;
bio.bi_end_io = blkdev_bio_end_io_simple;
+   bio.bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(, iter);
if (unlikely(ret))
@@ -355,6 +356,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
+   bio->bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
-- 
2.15.1



[PATCH v6 1/5] block: add ioprio_check_cap function

2018-05-22 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

Aio per command iopriority support introduces a second interface between
userland and the kernel capable of passing iopriority. The aio interface also
needs the ability to verify that the submitting context has sufficient
privileges to submit IOPRIO_RT commands. This patch creates the
ioprio_check_cap function to be used by the ioprio_set system call and also by
the aio interface.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Jeff Moyer <jmo...@redhat.com>
---
 block/ioprio.c | 22 --
 include/linux/ioprio.h |  2 ++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/ioprio.c b/block/ioprio.c
index 6f5d0b6625e3..f9821080c92c 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -61,15 +61,10 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
 }
 EXPORT_SYMBOL_GPL(set_task_ioprio);
 
-SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+int ioprio_check_cap(int ioprio)
 {
int class = IOPRIO_PRIO_CLASS(ioprio);
int data = IOPRIO_PRIO_DATA(ioprio);
-   struct task_struct *p, *g;
-   struct user_struct *user;
-   struct pid *pgrp;
-   kuid_t uid;
-   int ret;
 
switch (class) {
case IOPRIO_CLASS_RT:
@@ -92,6 +87,21 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, 
ioprio)
return -EINVAL;
}
 
+   return 0;
+}
+
+SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+{
+   struct task_struct *p, *g;
+   struct user_struct *user;
+   struct pid *pgrp;
+   kuid_t uid;
+   int ret;
+
+   ret = ioprio_check_cap(ioprio);
+   if (ret)
+   return ret;
+
ret = -ESRCH;
rcu_read_lock();
switch (which) {
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 627efac73e6d..4a28cec49ec3 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -77,4 +77,6 @@ extern int ioprio_best(unsigned short aprio, unsigned short 
bprio);
 
 extern int set_task_ioprio(struct task_struct *task, int ioprio);
 
+extern int ioprio_check_cap(int ioprio);
+
 #endif
-- 
2.15.1



[PATCH v6 1/5] block: add ioprio_check_cap function

2018-05-22 Thread adam . manzanares
From: Adam Manzanares 

Aio per command iopriority support introduces a second interface between
userland and the kernel capable of passing iopriority. The aio interface also
needs the ability to verify that the submitting context has sufficient
privileges to submit IOPRIO_RT commands. This patch creates the
ioprio_check_cap function to be used by the ioprio_set system call and also by
the aio interface.

Signed-off-by: Adam Manzanares 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jeff Moyer 
---
 block/ioprio.c | 22 --
 include/linux/ioprio.h |  2 ++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/ioprio.c b/block/ioprio.c
index 6f5d0b6625e3..f9821080c92c 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -61,15 +61,10 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
 }
 EXPORT_SYMBOL_GPL(set_task_ioprio);
 
-SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+int ioprio_check_cap(int ioprio)
 {
int class = IOPRIO_PRIO_CLASS(ioprio);
int data = IOPRIO_PRIO_DATA(ioprio);
-   struct task_struct *p, *g;
-   struct user_struct *user;
-   struct pid *pgrp;
-   kuid_t uid;
-   int ret;
 
switch (class) {
case IOPRIO_CLASS_RT:
@@ -92,6 +87,21 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, 
ioprio)
return -EINVAL;
}
 
+   return 0;
+}
+
+SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+{
+   struct task_struct *p, *g;
+   struct user_struct *user;
+   struct pid *pgrp;
+   kuid_t uid;
+   int ret;
+
+   ret = ioprio_check_cap(ioprio);
+   if (ret)
+   return ret;
+
ret = -ESRCH;
rcu_read_lock();
switch (which) {
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 627efac73e6d..4a28cec49ec3 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -77,4 +77,6 @@ extern int ioprio_best(unsigned short aprio, unsigned short 
bprio);
 
 extern int set_task_ioprio(struct task_struct *task, int ioprio);
 
+extern int ioprio_check_cap(int ioprio);
+
 #endif
-- 
2.15.1



[PATCH v6 5/5] fs: iomap dio set bio prio from kiocb prio

2018-05-22 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

Now that kiocb has an ioprio field copy this over to the bio when it is
created from the kiocb during direct IO.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
Reviewed-by: Jeff Moyer <jmo...@redhat.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
---
 fs/iomap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index afd163586aa0..65aae194aeca 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -919,6 +919,7 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t 
length,
bio->bi_iter.bi_sector =
(iomap->addr + pos - iomap->offset) >> 9;
bio->bi_write_hint = dio->iocb->ki_hint;
+   bio->bi_ioprio = dio->iocb->ki_ioprio;
bio->bi_private = dio;
bio->bi_end_io = iomap_dio_bio_end_io;
 
-- 
2.15.1



[PATCH v6 5/5] fs: iomap dio set bio prio from kiocb prio

2018-05-22 Thread adam . manzanares
From: Adam Manzanares 

Now that kiocb has an ioprio field copy this over to the bio when it is
created from the kiocb during direct IO.

Signed-off-by: Adam Manzanares 
Reviewed-by: Jeff Moyer 
Reviewed-by: Christoph Hellwig 
---
 fs/iomap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index afd163586aa0..65aae194aeca 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -919,6 +919,7 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t 
length,
bio->bi_iter.bi_sector =
(iomap->addr + pos - iomap->offset) >> 9;
bio->bi_write_hint = dio->iocb->ki_hint;
+   bio->bi_ioprio = dio->iocb->ki_ioprio;
bio->bi_private = dio;
bio->bi_end_io = iomap_dio_bio_end_io;
 
-- 
2.15.1



[PATCH v6 0/5] AIO add per-command iopriority

2018-05-22 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

This is the per-I/O equivalent of the ioprio_set system call.
See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495

First patch factors ioprio_check_cap function out of ioprio_set system call to
also be used by the aio ioprio interface.

Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.

Third patch passes ioprio hint from aio iocb to kiocb and initializes kiocb
ioprio value appropriately when it is not explicitly set.

Fourth patch enables the feature for blkdev.

Fifth patch enables the feature for iomap direct IO

Note: this work is based on top of linux-vfs/for-next

v2: merge patches
use IOCB_FLAG_IOPRIO
validate intended use with IOCB_IOPRIO
add linux-api and linux-block to cc

v3: add ioprio_check_cap function
convert kiocb ki_hint to u16
use ioprio_check_cap when adding ioprio to kiocb in aio.c

v4: handle IOCB_IOPRIO in aio_prep_rw
note patch 3 depends on patch 1 in commit msg

v5: rename ki_hint_valid -> ki_hint_validate
remove ki_hint_validate comment and whitespace
remove IOCB_IOPRIO flag
initialize kiocb to have no priority

v6: add __blkdev_direct_IO_simple ioprio support


Adam Manzanares (5):
  block: add ioprio_check_cap function
  fs: Convert kiocb rw_hint from enum to u16
  fs: Add aio iopriority support
  fs: blkdev set bio prio from kiocb prio
  fs: iomap dio set bio prio from kiocb prio

 block/ioprio.c   | 22 --
 drivers/block/loop.c |  3 +++
 fs/aio.c | 16 
 fs/block_dev.c   |  2 ++
 fs/iomap.c   |  1 +
 include/linux/fs.h   | 16 ++--
 include/linux/ioprio.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 8 files changed, 55 insertions(+), 8 deletions(-)

-- 
2.15.1



[PATCH v6 0/5] AIO add per-command iopriority

2018-05-22 Thread adam . manzanares
From: Adam Manzanares 

This is the per-I/O equivalent of the ioprio_set system call.
See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495

First patch factors ioprio_check_cap function out of ioprio_set system call to
also be used by the aio ioprio interface.

Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.

Third patch passes ioprio hint from aio iocb to kiocb and initializes kiocb
ioprio value appropriately when it is not explicitly set.

Fourth patch enables the feature for blkdev.

Fifth patch enables the feature for iomap direct IO

Note: this work is based on top of linux-vfs/for-next

v2: merge patches
use IOCB_FLAG_IOPRIO
validate intended use with IOCB_IOPRIO
add linux-api and linux-block to cc

v3: add ioprio_check_cap function
convert kiocb ki_hint to u16
use ioprio_check_cap when adding ioprio to kiocb in aio.c

v4: handle IOCB_IOPRIO in aio_prep_rw
note patch 3 depends on patch 1 in commit msg

v5: rename ki_hint_valid -> ki_hint_validate
remove ki_hint_validate comment and whitespace
remove IOCB_IOPRIO flag
initialize kiocb to have no priority

v6: add __blkdev_direct_IO_simple ioprio support


Adam Manzanares (5):
  block: add ioprio_check_cap function
  fs: Convert kiocb rw_hint from enum to u16
  fs: Add aio iopriority support
  fs: blkdev set bio prio from kiocb prio
  fs: iomap dio set bio prio from kiocb prio

 block/ioprio.c   | 22 --
 drivers/block/loop.c |  3 +++
 fs/aio.c | 16 
 fs/block_dev.c   |  2 ++
 fs/iomap.c   |  1 +
 include/linux/fs.h   | 16 ++--
 include/linux/ioprio.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 8 files changed, 55 insertions(+), 8 deletions(-)

-- 
2.15.1



[PATCH v6 2/5] fs: Convert kiocb rw_hint from enum to u16

2018-05-22 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

In order to avoid kiocb bloat for per command iopriority support, rw_hint
is converted from enum to a u16. Added a guard around ki_hint assignment.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
---
 include/linux/fs.h | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7f07977bdfd7..50de40dbbb85 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -284,6 +284,8 @@ enum rw_hint {
WRITE_LIFE_EXTREME  = RWH_WRITE_LIFE_EXTREME,
 };
 
+#define MAX_KI_HINT((1 << 16) - 1) /* ki_hint type is u16 */
+
 #define IOCB_EVENTFD   (1 << 0)
 #define IOCB_APPEND(1 << 1)
 #define IOCB_DIRECT(1 << 2)
@@ -299,7 +301,7 @@ struct kiocb {
void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
void*private;
int ki_flags;
-   enum rw_hintki_hint;
+   u16 ki_hint;
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1927,12 +1929,19 @@ static inline enum rw_hint file_write_hint(struct file 
*file)
 
 static inline int iocb_flags(struct file *file);
 
+static inline u16 ki_hint_validate(enum rw_hint hint)
+{
+   if (hint > MAX_KI_HINT)
+   return 0;
+   return hint;
+}
+
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
*kiocb = (struct kiocb) {
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
-   .ki_hint = file_write_hint(filp),
+   .ki_hint = ki_hint_validate(file_write_hint(filp)),
};
 }
 
-- 
2.15.1



[PATCH v6 2/5] fs: Convert kiocb rw_hint from enum to u16

2018-05-22 Thread adam . manzanares
From: Adam Manzanares 

In order to avoid kiocb bloat for per command iopriority support, rw_hint
is converted from enum to a u16. Added a guard around ki_hint assignment.

Signed-off-by: Adam Manzanares 
Reviewed-by: Christoph Hellwig 
---
 include/linux/fs.h | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7f07977bdfd7..50de40dbbb85 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -284,6 +284,8 @@ enum rw_hint {
WRITE_LIFE_EXTREME  = RWH_WRITE_LIFE_EXTREME,
 };
 
+#define MAX_KI_HINT((1 << 16) - 1) /* ki_hint type is u16 */
+
 #define IOCB_EVENTFD   (1 << 0)
 #define IOCB_APPEND(1 << 1)
 #define IOCB_DIRECT(1 << 2)
@@ -299,7 +301,7 @@ struct kiocb {
void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
void*private;
int ki_flags;
-   enum rw_hintki_hint;
+   u16 ki_hint;
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1927,12 +1929,19 @@ static inline enum rw_hint file_write_hint(struct file 
*file)
 
 static inline int iocb_flags(struct file *file);
 
+static inline u16 ki_hint_validate(enum rw_hint hint)
+{
+   if (hint > MAX_KI_HINT)
+   return 0;
+   return hint;
+}
+
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
*kiocb = (struct kiocb) {
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
-   .ki_hint = file_write_hint(filp),
+   .ki_hint = ki_hint_validate(file_write_hint(filp)),
};
 }
 
-- 
2.15.1



Re: [PATCH v5 4/5] fs: blkdev set bio prio from kiocb prio

2018-05-21 Thread Adam Manzanares


On 5/21/18 2:04 PM, Jeff Moyer wrote:
> adam.manzana...@wdc.com writes:
> 
>> From: Adam Manzanares <adam.manzana...@wdc.com>
>>
>> Now that kiocb has an ioprio field copy this over to the bio when it is
>> created from the kiocb.
>>
>> Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
>> ---
>>   fs/block_dev.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 7ec920e27065..da1e94d2bb75 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -355,6 +355,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
>> *iter, int nr_pages)
>>  bio->bi_write_hint = iocb->ki_hint;
>>  bio->bi_private = dio;
>>  bio->bi_end_io = blkdev_bio_end_io;
>> +bio->bi_ioprio = iocb->ki_ioprio;
>>   
>>  ret = bio_iov_iter_get_pages(bio, iter);
>>  if (unlikely(ret)) {
> 
> Forgot to mention, you should also convert __blkdev_direct_IO_simple.

NP.

> 
> -Jeff
> 

Re: [PATCH v5 4/5] fs: blkdev set bio prio from kiocb prio

2018-05-21 Thread Adam Manzanares


On 5/21/18 2:04 PM, Jeff Moyer wrote:
> adam.manzana...@wdc.com writes:
> 
>> From: Adam Manzanares 
>>
>> Now that kiocb has an ioprio field copy this over to the bio when it is
>> created from the kiocb.
>>
>> Signed-off-by: Adam Manzanares 
>> ---
>>   fs/block_dev.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 7ec920e27065..da1e94d2bb75 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -355,6 +355,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
>> *iter, int nr_pages)
>>  bio->bi_write_hint = iocb->ki_hint;
>>  bio->bi_private = dio;
>>  bio->bi_end_io = blkdev_bio_end_io;
>> +bio->bi_ioprio = iocb->ki_ioprio;
>>   
>>  ret = bio_iov_iter_get_pages(bio, iter);
>>  if (unlikely(ret)) {
> 
> Forgot to mention, you should also convert __blkdev_direct_IO_simple.

NP.

> 
> -Jeff
> 

Re: [PATCH v5 0/5] AIO add per-command iopriority

2018-05-21 Thread Adam Manzanares


On 5/21/18 1:57 PM, Jeff Moyer wrote:
> Hi, Adam,
> 
> adam.manzana...@wdc.com writes:
> 
>> From: Adam Manzanares <adam.manzana...@wdc.com>
>>
>> This is the per-I/O equivalent of the ioprio_set system call.
>> See the following link for performance implications on a SATA HDD:
>> https://lkml.org/lkml/2016/12/6/495
>>
>> First patch factors ioprio_check_cap function out of ioprio_set system call 
>> to
>> also be used by the aio ioprio interface.
>>
>> Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.
>>
>> Third patch passes ioprio hint from aio iocb to kiocb and inititalizes kiocb
>> ioprio value appropriately when it is not explicitly set.
>>
>> Fourth patch enables the feature for blkdev.
>>
>> Fifth patch enables the feature for iomap direct IO
>>
>> Note: this work is based on top of linux-vfs/for-next
> 
> I'll cook up a libaio test case.  Can you put together a man-pages
> update for this?

Hello Jeff,

I'll send out a man-pages update soon.

Thanks,
Adam

> 
> Thanks!
> Jeff
> 

Re: [PATCH v5 0/5] AIO add per-command iopriority

2018-05-21 Thread Adam Manzanares


On 5/21/18 1:57 PM, Jeff Moyer wrote:
> Hi, Adam,
> 
> adam.manzana...@wdc.com writes:
> 
>> From: Adam Manzanares 
>>
>> This is the per-I/O equivalent of the ioprio_set system call.
>> See the following link for performance implications on a SATA HDD:
>> https://lkml.org/lkml/2016/12/6/495
>>
>> First patch factors ioprio_check_cap function out of ioprio_set system call 
>> to
>> also be used by the aio ioprio interface.
>>
>> Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.
>>
>> Third patch passes ioprio hint from aio iocb to kiocb and inititalizes kiocb
>> ioprio value appropriately when it is not explicitly set.
>>
>> Fourth patch enables the feature for blkdev.
>>
>> Fifth patch enables the feature for iomap direct IO
>>
>> Note: this work is based on top of linux-vfs/for-next
> 
> I'll cook up a libaio test case.  Can you put together a man-pages
> update for this?

Hello Jeff,

I'll send out a man-pages update soon.

Thanks,
Adam

> 
> Thanks!
> Jeff
> 

[PATCH v5 4/5] fs: blkdev set bio prio from kiocb prio

2018-05-21 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

Now that kiocb has an ioprio field copy this over to the bio when it is
created from the kiocb.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 fs/block_dev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..da1e94d2bb75 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -355,6 +355,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
+   bio->bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
-- 
2.15.1



[PATCH v5 4/5] fs: blkdev set bio prio from kiocb prio

2018-05-21 Thread adam . manzanares
From: Adam Manzanares 

Now that kiocb has an ioprio field copy this over to the bio when it is
created from the kiocb.

Signed-off-by: Adam Manzanares 
---
 fs/block_dev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..da1e94d2bb75 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -355,6 +355,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
+   bio->bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
-- 
2.15.1



[PATCH v5 5/5] fs: iomap dio set bio prio from kiocb prio

2018-05-21 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

Now that kiocb has an ioprio field copy this over to the bio when it is
created from the kiocb during direct IO.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 fs/iomap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index afd163586aa0..65aae194aeca 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -919,6 +919,7 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t 
length,
bio->bi_iter.bi_sector =
(iomap->addr + pos - iomap->offset) >> 9;
bio->bi_write_hint = dio->iocb->ki_hint;
+   bio->bi_ioprio = dio->iocb->ki_ioprio;
bio->bi_private = dio;
bio->bi_end_io = iomap_dio_bio_end_io;
 
-- 
2.15.1



[PATCH v5 5/5] fs: iomap dio set bio prio from kiocb prio

2018-05-21 Thread adam . manzanares
From: Adam Manzanares 

Now that kiocb has an ioprio field copy this over to the bio when it is
created from the kiocb during direct IO.

Signed-off-by: Adam Manzanares 
---
 fs/iomap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index afd163586aa0..65aae194aeca 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -919,6 +919,7 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t 
length,
bio->bi_iter.bi_sector =
(iomap->addr + pos - iomap->offset) >> 9;
bio->bi_write_hint = dio->iocb->ki_hint;
+   bio->bi_ioprio = dio->iocb->ki_ioprio;
bio->bi_private = dio;
bio->bi_end_io = iomap_dio_bio_end_io;
 
-- 
2.15.1



[PATCH v5 1/5] block: add ioprio_check_cap function

2018-05-21 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

Aio per command iopriority support introduces a second interface between
userland and the kernel capable of passing iopriority. The aio interface also
needs the ability to verify that the submitting context has sufficient
priviledges to submit IOPRIO_RT commands. This patch creates the
ioprio_check_cap function to be used by the ioprio_set system call and also by
the aio interface.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
---
 block/ioprio.c | 22 --
 include/linux/ioprio.h |  2 ++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/ioprio.c b/block/ioprio.c
index 6f5d0b6625e3..f9821080c92c 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -61,15 +61,10 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
 }
 EXPORT_SYMBOL_GPL(set_task_ioprio);
 
-SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+int ioprio_check_cap(int ioprio)
 {
int class = IOPRIO_PRIO_CLASS(ioprio);
int data = IOPRIO_PRIO_DATA(ioprio);
-   struct task_struct *p, *g;
-   struct user_struct *user;
-   struct pid *pgrp;
-   kuid_t uid;
-   int ret;
 
switch (class) {
case IOPRIO_CLASS_RT:
@@ -92,6 +87,21 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, 
ioprio)
return -EINVAL;
}
 
+   return 0;
+}
+
+SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+{
+   struct task_struct *p, *g;
+   struct user_struct *user;
+   struct pid *pgrp;
+   kuid_t uid;
+   int ret;
+
+   ret = ioprio_check_cap(ioprio);
+   if (ret)
+   return ret;
+
ret = -ESRCH;
rcu_read_lock();
switch (which) {
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 627efac73e6d..4a28cec49ec3 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -77,4 +77,6 @@ extern int ioprio_best(unsigned short aprio, unsigned short 
bprio);
 
 extern int set_task_ioprio(struct task_struct *task, int ioprio);
 
+extern int ioprio_check_cap(int ioprio);
+
 #endif
-- 
2.15.1



[PATCH v5 0/5] AIO add per-command iopriority

2018-05-21 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

This is the per-I/O equivalent of the ioprio_set system call.
See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495

First patch factors ioprio_check_cap function out of ioprio_set system call to
also be used by the aio ioprio interface.

Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.

Third patch passes ioprio hint from aio iocb to kiocb and inititalizes kiocb
ioprio value appropriately when it is not explicitly set.

Fourth patch enables the feature for blkdev.

Fifth patch enables the feature for iomap direct IO

Note: this work is based on top of linux-vfs/for-next

v2: merge patches
use IOCB_FLAG_IOPRIO
validate intended use with IOCB_IOPRIO
add linux-api and linux-block to cc

v3: add ioprio_check_cap function
convert kiocb ki_hint to u16
use ioprio_check_cap when adding ioprio to kiocb in aio.c

v4: handle IOCB_IOPRIO in aio_prep_rw
note patch 3 depends on patch 1 in commit msg

v5: rename ki_hint_valid -> ki_hint_validate 
remove ki_hint_validate comment and whitespace
remove IOCB_IOPRIO flag
initialize kiocb to have no priority 

Adam Manzanares (5):
  block: add ioprio_check_cap function
  fs: Convert kiocb rw_hint from enum to u16
  fs: Add aio iopriority support
  fs: blkdev set bio prio from kiocb prio
  fs: iomap dio set bio prio from kiocb prio

 block/ioprio.c   | 22 --
 drivers/block/loop.c |  3 +++
 fs/aio.c | 16 
 fs/block_dev.c   |  1 +
 fs/iomap.c   |  1 +
 include/linux/fs.h   | 16 ++--
 include/linux/ioprio.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 8 files changed, 54 insertions(+), 8 deletions(-)

-- 
2.15.1



[PATCH v5 1/5] block: add ioprio_check_cap function

2018-05-21 Thread adam . manzanares
From: Adam Manzanares 

Aio per command iopriority support introduces a second interface between
userland and the kernel capable of passing iopriority. The aio interface also
needs the ability to verify that the submitting context has sufficient
priviledges to submit IOPRIO_RT commands. This patch creates the
ioprio_check_cap function to be used by the ioprio_set system call and also by
the aio interface.

Signed-off-by: Adam Manzanares 
Reviewed-by: Christoph Hellwig 
---
 block/ioprio.c | 22 --
 include/linux/ioprio.h |  2 ++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/ioprio.c b/block/ioprio.c
index 6f5d0b6625e3..f9821080c92c 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -61,15 +61,10 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
 }
 EXPORT_SYMBOL_GPL(set_task_ioprio);
 
-SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+int ioprio_check_cap(int ioprio)
 {
int class = IOPRIO_PRIO_CLASS(ioprio);
int data = IOPRIO_PRIO_DATA(ioprio);
-   struct task_struct *p, *g;
-   struct user_struct *user;
-   struct pid *pgrp;
-   kuid_t uid;
-   int ret;
 
switch (class) {
case IOPRIO_CLASS_RT:
@@ -92,6 +87,21 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, 
ioprio)
return -EINVAL;
}
 
+   return 0;
+}
+
+SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+{
+   struct task_struct *p, *g;
+   struct user_struct *user;
+   struct pid *pgrp;
+   kuid_t uid;
+   int ret;
+
+   ret = ioprio_check_cap(ioprio);
+   if (ret)
+   return ret;
+
ret = -ESRCH;
rcu_read_lock();
switch (which) {
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 627efac73e6d..4a28cec49ec3 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -77,4 +77,6 @@ extern int ioprio_best(unsigned short aprio, unsigned short 
bprio);
 
 extern int set_task_ioprio(struct task_struct *task, int ioprio);
 
+extern int ioprio_check_cap(int ioprio);
+
 #endif
-- 
2.15.1



[PATCH v5 0/5] AIO add per-command iopriority

2018-05-21 Thread adam . manzanares
From: Adam Manzanares 

This is the per-I/O equivalent of the ioprio_set system call.
See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495

First patch factors ioprio_check_cap function out of ioprio_set system call to
also be used by the aio ioprio interface.

Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.

Third patch passes ioprio hint from aio iocb to kiocb and inititalizes kiocb
ioprio value appropriately when it is not explicitly set.

Fourth patch enables the feature for blkdev.

Fifth patch enables the feature for iomap direct IO

Note: this work is based on top of linux-vfs/for-next

v2: merge patches
use IOCB_FLAG_IOPRIO
validate intended use with IOCB_IOPRIO
add linux-api and linux-block to cc

v3: add ioprio_check_cap function
convert kiocb ki_hint to u16
use ioprio_check_cap when adding ioprio to kiocb in aio.c

v4: handle IOCB_IOPRIO in aio_prep_rw
note patch 3 depends on patch 1 in commit msg

v5: rename ki_hint_valid -> ki_hint_validate 
remove ki_hint_validate comment and whitespace
remove IOCB_IOPRIO flag
initialize kiocb to have no priority 

Adam Manzanares (5):
  block: add ioprio_check_cap function
  fs: Convert kiocb rw_hint from enum to u16
  fs: Add aio iopriority support
  fs: blkdev set bio prio from kiocb prio
  fs: iomap dio set bio prio from kiocb prio

 block/ioprio.c   | 22 --
 drivers/block/loop.c |  3 +++
 fs/aio.c | 16 
 fs/block_dev.c   |  1 +
 fs/iomap.c   |  1 +
 include/linux/fs.h   | 16 ++--
 include/linux/ioprio.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 8 files changed, 54 insertions(+), 8 deletions(-)

-- 
2.15.1



[PATCH v5 2/5] fs: Convert kiocb rw_hint from enum to u16

2018-05-21 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

In order to avoid kiocb bloat for per command iopriority support, rw_hint
is converted from enum to a u16. Added a guard around ki_hint assigment.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 include/linux/fs.h | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7f07977bdfd7..50de40dbbb85 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -284,6 +284,8 @@ enum rw_hint {
WRITE_LIFE_EXTREME  = RWH_WRITE_LIFE_EXTREME,
 };
 
+#define MAX_KI_HINT((1 << 16) - 1) /* ki_hint type is u16 */
+
 #define IOCB_EVENTFD   (1 << 0)
 #define IOCB_APPEND(1 << 1)
 #define IOCB_DIRECT(1 << 2)
@@ -299,7 +301,7 @@ struct kiocb {
void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
void*private;
int ki_flags;
-   enum rw_hintki_hint;
+   u16 ki_hint;
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1927,12 +1929,19 @@ static inline enum rw_hint file_write_hint(struct file 
*file)
 
 static inline int iocb_flags(struct file *file);
 
+static inline u16 ki_hint_validate(enum rw_hint hint)
+{
+   if (hint > MAX_KI_HINT)
+   return 0;
+   return hint;
+}
+
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
*kiocb = (struct kiocb) {
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
-   .ki_hint = file_write_hint(filp),
+   .ki_hint = ki_hint_validate(file_write_hint(filp)),
};
 }
 
-- 
2.15.1



[PATCH v5 2/5] fs: Convert kiocb rw_hint from enum to u16

2018-05-21 Thread adam . manzanares
From: Adam Manzanares 

In order to avoid kiocb bloat for per command iopriority support, rw_hint
is converted from enum to a u16. Added a guard around ki_hint assigment.

Signed-off-by: Adam Manzanares 
---
 include/linux/fs.h | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7f07977bdfd7..50de40dbbb85 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -284,6 +284,8 @@ enum rw_hint {
WRITE_LIFE_EXTREME  = RWH_WRITE_LIFE_EXTREME,
 };
 
+#define MAX_KI_HINT((1 << 16) - 1) /* ki_hint type is u16 */
+
 #define IOCB_EVENTFD   (1 << 0)
 #define IOCB_APPEND(1 << 1)
 #define IOCB_DIRECT(1 << 2)
@@ -299,7 +301,7 @@ struct kiocb {
void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
void*private;
int ki_flags;
-   enum rw_hintki_hint;
+   u16 ki_hint;
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1927,12 +1929,19 @@ static inline enum rw_hint file_write_hint(struct file 
*file)
 
 static inline int iocb_flags(struct file *file);
 
+static inline u16 ki_hint_validate(enum rw_hint hint)
+{
+   if (hint > MAX_KI_HINT)
+   return 0;
+   return hint;
+}
+
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
*kiocb = (struct kiocb) {
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
-   .ki_hint = file_write_hint(filp),
+   .ki_hint = ki_hint_validate(file_write_hint(filp)),
};
 }
 
-- 
2.15.1



[PATCH v5 3/5] fs: Add aio iopriority support

2018-05-21 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

This is the per-I/O equivalent of the ioprio_set system call.

When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.

We set the blkdev bio iopriority unconditionally, so we need to guarantee the 
kiocb is initialized properly. Added changes to the loopback driver and 
init_sync_kiocb to achieve this.

This patch depends on block: add ioprio_check_cap function.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 drivers/block/loop.c |  3 +++
 fs/aio.c | 16 
 include/linux/fs.h   |  3 +++
 include/uapi/linux/aio_abi.h |  1 +
 4 files changed, 23 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 5d4e31655d96..dd98dfd97f5e 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -76,6 +76,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include "loop.h"
 
 #include 
@@ -559,6 +561,7 @@ static int lo_rw_aio(struct loop_device *lo, struct 
loop_cmd *cmd,
cmd->iocb.ki_filp = file;
cmd->iocb.ki_complete = lo_rw_aio_complete;
cmd->iocb.ki_flags = IOCB_DIRECT;
+   cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
if (cmd->css)
kthread_associate_blkcg(cmd->css);
 
diff --git a/fs/aio.c b/fs/aio.c
index f3eae5d5771b..44b4572be524 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1451,6 +1451,22 @@ static int aio_prep_rw(struct kiocb *req, struct iocb 
*iocb)
if (iocb->aio_flags & IOCB_FLAG_RESFD)
req->ki_flags |= IOCB_EVENTFD;
req->ki_hint = file_write_hint(req->ki_filp);
+   if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
+   /*
+* If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
+* aio_reqprio is interpreted as an I/O scheduling
+* class and priority.
+*/
+   ret = ioprio_check_cap(iocb->aio_reqprio);
+   if (ret) {
+   pr_debug("aio ioprio check cap error\n");
+   return -EINVAL;
+   }
+
+   req->ki_ioprio = iocb->aio_reqprio;
+   } else
+   req->ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
+
ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
if (unlikely(ret))
fput(req->ki_filp);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 50de40dbbb85..73b749ed3ea1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -302,6 +303,7 @@ struct kiocb {
void*private;
int ki_flags;
u16 ki_hint;
+   u16 ki_ioprio; /* See linux/ioprio.h */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1942,6 +1944,7 @@ static inline void init_sync_kiocb(struct kiocb *kiocb, 
struct file *filp)
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
.ki_hint = ki_hint_validate(file_write_hint(filp)),
+   .ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0),
};
 }
 
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 2c0a3415beee..d4e768d55d14 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -55,6 +55,7 @@ enum {
  *   is valid.
  */
 #define IOCB_FLAG_RESFD(1 << 0)
+#define IOCB_FLAG_IOPRIO   (1 << 1)
 
 /* read() from /dev/aio returns these structures. */
 struct io_event {
-- 
2.15.1



[PATCH v5 3/5] fs: Add aio iopriority support

2018-05-21 Thread adam . manzanares
From: Adam Manzanares 

This is the per-I/O equivalent of the ioprio_set system call.

When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.

We set the blkdev bio iopriority unconditionally, so we need to guarantee the 
kiocb is initialized properly. Added changes to the loopback driver and 
init_sync_kiocb to achieve this.

This patch depends on block: add ioprio_check_cap function.

Signed-off-by: Adam Manzanares 
---
 drivers/block/loop.c |  3 +++
 fs/aio.c | 16 
 include/linux/fs.h   |  3 +++
 include/uapi/linux/aio_abi.h |  1 +
 4 files changed, 23 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 5d4e31655d96..dd98dfd97f5e 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -76,6 +76,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include "loop.h"
 
 #include 
@@ -559,6 +561,7 @@ static int lo_rw_aio(struct loop_device *lo, struct 
loop_cmd *cmd,
cmd->iocb.ki_filp = file;
cmd->iocb.ki_complete = lo_rw_aio_complete;
cmd->iocb.ki_flags = IOCB_DIRECT;
+   cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
if (cmd->css)
kthread_associate_blkcg(cmd->css);
 
diff --git a/fs/aio.c b/fs/aio.c
index f3eae5d5771b..44b4572be524 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1451,6 +1451,22 @@ static int aio_prep_rw(struct kiocb *req, struct iocb 
*iocb)
if (iocb->aio_flags & IOCB_FLAG_RESFD)
req->ki_flags |= IOCB_EVENTFD;
req->ki_hint = file_write_hint(req->ki_filp);
+   if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
+   /*
+* If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
+* aio_reqprio is interpreted as an I/O scheduling
+* class and priority.
+*/
+   ret = ioprio_check_cap(iocb->aio_reqprio);
+   if (ret) {
+   pr_debug("aio ioprio check cap error\n");
+   return -EINVAL;
+   }
+
+   req->ki_ioprio = iocb->aio_reqprio;
+   } else
+   req->ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
+
ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
if (unlikely(ret))
fput(req->ki_filp);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 50de40dbbb85..73b749ed3ea1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -302,6 +303,7 @@ struct kiocb {
void*private;
int ki_flags;
u16 ki_hint;
+   u16 ki_ioprio; /* See linux/ioprio.h */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1942,6 +1944,7 @@ static inline void init_sync_kiocb(struct kiocb *kiocb, 
struct file *filp)
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
.ki_hint = ki_hint_validate(file_write_hint(filp)),
+   .ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0),
};
 }
 
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 2c0a3415beee..d4e768d55d14 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -55,6 +55,7 @@ enum {
  *   is valid.
  */
 #define IOCB_FLAG_RESFD(1 << 0)
+#define IOCB_FLAG_IOPRIO   (1 << 1)
 
 /* read() from /dev/aio returns these structures. */
 struct io_event {
-- 
2.15.1



Re: [PATCH v4 3/3] fs: Add aio iopriority support for block_dev

2018-05-18 Thread Adam Manzanares


On 5/18/18 9:06 AM, Christoph Hellwig wrote:
> Looks fine, although I'd split it into a aio and block_dev patch.
> 
> Also please wire this up for the fs/iomap.c direct I/O code, it should
> be essentially the same sniplet as in the block_dev.c code.
> 

Will do.

Re: [PATCH v4 3/3] fs: Add aio iopriority support for block_dev

2018-05-18 Thread Adam Manzanares


On 5/18/18 9:06 AM, Christoph Hellwig wrote:
> Looks fine, although I'd split it into a aio and block_dev patch.
> 
> Also please wire this up for the fs/iomap.c direct I/O code, it should
> be essentially the same sniplet as in the block_dev.c code.
> 

Will do.

Re: [PATCH v4 2/3] fs: Convert kiocb rw_hint from enum to u16

2018-05-18 Thread Adam Manzanares


On 5/18/18 9:05 AM, Christoph Hellwig wrote:
>> +/* ki_hint changed from enum to u16, make sure rw_hint fits into u16 */
> 
> I don't think this comment is very useful.
> 
>> +static inline u16 ki_hint_valid(enum rw_hint hint)
> 
> I'd call this ki_hint_validate.
> 
>> +{
>> +if (hint > MAX_KI_HINT)
>> +return 0;
>> +
>> +return hint;
> 
> Nit: kill the empty line.
> 

I'll clean this up in the next revision.

Re: [PATCH v4 2/3] fs: Convert kiocb rw_hint from enum to u16

2018-05-18 Thread Adam Manzanares


On 5/18/18 9:05 AM, Christoph Hellwig wrote:
>> +/* ki_hint changed from enum to u16, make sure rw_hint fits into u16 */
> 
> I don't think this comment is very useful.
> 
>> +static inline u16 ki_hint_valid(enum rw_hint hint)
> 
> I'd call this ki_hint_validate.
> 
>> +{
>> +if (hint > MAX_KI_HINT)
>> +return 0;
>> +
>> +return hint;
> 
> Nit: kill the empty line.
> 

I'll clean this up in the next revision.

Re: [PATCH v4 3/3] fs: Add aio iopriority support for block_dev

2018-05-18 Thread Adam Manzanares


On 5/18/18 8:14 AM, Jens Axboe wrote:
> On 5/17/18 2:38 PM, adam.manzana...@wdc.com wrote:
>> From: Adam Manzanares <adam.manzana...@wdc.com>
>>
>> This is the per-I/O equivalent of the ioprio_set system call.
>>
>> When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
>> newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.
>>
>> When a bio is created for an aio request by the block dev we set the priority
>> value of the bio to the user supplied value.
>>
>> This patch depends on block: add ioprio_check_cap function
> 
> Actually, one comment on this one:
> 
>> diff --git a/fs/aio.c b/fs/aio.c
>> index f3eae5d5771b..ff3107aa82d5 100644
>> --- a/fs/aio.c
>> +++ b/fs/aio.c
>> @@ -1451,6 +1451,22 @@ static int aio_prep_rw(struct kiocb *req, struct iocb 
>> *iocb)
>>  if (iocb->aio_flags & IOCB_FLAG_RESFD)
>>  req->ki_flags |= IOCB_EVENTFD;
>>  req->ki_hint = file_write_hint(req->ki_filp);
>> +if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
>> +/*
>> + * If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
>> + * aio_reqprio is interpreted as an I/O scheduling
>> + * class and priority.
>> + */
>> +ret = ioprio_check_cap(iocb->aio_reqprio);
>> +if (ret) {
>> +pr_debug("aio ioprio check cap error\n");
>> +return -EINVAL;
>> +}
>> +
>> +req->ki_ioprio = iocb->aio_reqprio;
>> +req->ki_flags |= IOCB_IOPRIO;
>> +}
> 
> Do we really need IOCB_IOPRIO? All zeroes is no priority set anyway,
> so we should be able to get by with just setting ->ki_ioprio to either
> the priority, or 0.
> 
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 7ec920e27065..970bef79caa6 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -355,6 +355,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
>> *iter, int nr_pages)
>>  bio->bi_write_hint = iocb->ki_hint;
>>  bio->bi_private = dio;
>>  bio->bi_end_io = blkdev_bio_end_io;
>> +if (iocb->ki_flags & IOCB_IOPRIO)
>> +bio->bi_ioprio = iocb->ki_ioprio;
> 
> And then this assignment can just happen unconditionally.

That is a cleaner way of guaranteeing the ioprio set on the kiocb is 
only set when the user intends to use the ioprio from the iocb.

I'll resend the series.


> 

Re: [PATCH v4 3/3] fs: Add aio iopriority support for block_dev

2018-05-18 Thread Adam Manzanares


On 5/18/18 8:14 AM, Jens Axboe wrote:
> On 5/17/18 2:38 PM, adam.manzana...@wdc.com wrote:
>> From: Adam Manzanares 
>>
>> This is the per-I/O equivalent of the ioprio_set system call.
>>
>> When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
>> newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.
>>
>> When a bio is created for an aio request by the block dev we set the priority
>> value of the bio to the user supplied value.
>>
>> This patch depends on block: add ioprio_check_cap function
> 
> Actually, one comment on this one:
> 
>> diff --git a/fs/aio.c b/fs/aio.c
>> index f3eae5d5771b..ff3107aa82d5 100644
>> --- a/fs/aio.c
>> +++ b/fs/aio.c
>> @@ -1451,6 +1451,22 @@ static int aio_prep_rw(struct kiocb *req, struct iocb 
>> *iocb)
>>  if (iocb->aio_flags & IOCB_FLAG_RESFD)
>>  req->ki_flags |= IOCB_EVENTFD;
>>  req->ki_hint = file_write_hint(req->ki_filp);
>> +if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
>> +/*
>> + * If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
>> + * aio_reqprio is interpreted as an I/O scheduling
>> + * class and priority.
>> + */
>> +ret = ioprio_check_cap(iocb->aio_reqprio);
>> +if (ret) {
>> +pr_debug("aio ioprio check cap error\n");
>> +return -EINVAL;
>> +}
>> +
>> +req->ki_ioprio = iocb->aio_reqprio;
>> +req->ki_flags |= IOCB_IOPRIO;
>> +}
> 
> Do we really need IOCB_IOPRIO? All zeroes is no priority set anyway,
> so we should be able to get by with just setting ->ki_ioprio to either
> the priority, or 0.
> 
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 7ec920e27065..970bef79caa6 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -355,6 +355,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
>> *iter, int nr_pages)
>>  bio->bi_write_hint = iocb->ki_hint;
>>  bio->bi_private = dio;
>>  bio->bi_end_io = blkdev_bio_end_io;
>> +if (iocb->ki_flags & IOCB_IOPRIO)
>> +bio->bi_ioprio = iocb->ki_ioprio;
> 
> And then this assignment can just happen unconditionally.

That is a cleaner way of guaranteeing the ioprio set on the kiocb is 
only set when the user intends to use the ioprio from the iocb.

I'll resend the series.


> 

Re: [PATCH v4 0/3] AIO add per-command iopriority

2018-05-18 Thread Adam Manzanares


On 5/17/18 7:41 PM, Jens Axboe wrote:
> On 5/17/18 2:38 PM, adam.manzana...@wdc.com wrote:
>> From: Adam Manzanares <adam.manzana...@wdc.com>
>>
>> This is the per-I/O equivalent of the ioprio_set system call.
>> See the following link for performance implications on a SATA HDD:
>> https://lkml.org/lkml/2016/12/6/495
>>
>> First patch factors ioprio_check_cap function out of ioprio_set system call 
>> to
>> also be used by the aio ioprio interface.
>>
>> Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.
>>
>> Third patch passes ioprio hint from aio iocb to kiocb and enables block_dev
>> usage of the per I/O ioprio feature.
>>
>> v2: merge patches
>>  use IOCB_FLAG_IOPRIO
>>  validate intended use with IOCB_IOPRIO
>>  add linux-api and linux-block to cc
>>
>> v3: add ioprio_check_cap function
>>  convert kiocb ki_hint to u16
>>  use ioprio_check_cap when adding ioprio to kiocb in aio.c
>>
>> v4: handle IOCB_IOPRIO in aio_prep_rw
>>  note patch 3 depends on patch 1 in commit msg
>>
>> Adam Manzanares (3):
>>block: add ioprio_check_cap function
>>fs: Convert kiocb rw_hint from enum to u16
>>fs: Add aio iopriority support for block_dev
>>
>>   block/ioprio.c   | 22 --
>>   fs/aio.c | 16 
>>   fs/block_dev.c   |  2 ++
>>   include/linux/fs.h   | 17 +++--
>>   include/linux/ioprio.h   |  2 ++
>>   include/uapi/linux/aio_abi.h |  1 +
>>   6 files changed, 52 insertions(+), 8 deletions(-)
> 
> This looks fine to me now. I can pick up #1 for 4.18 - and 2+3 as well,
> unless someone else wants to take them.

Great, thanks Jens.

> 

Re: [PATCH v4 0/3] AIO add per-command iopriority

2018-05-18 Thread Adam Manzanares


On 5/17/18 7:41 PM, Jens Axboe wrote:
> On 5/17/18 2:38 PM, adam.manzana...@wdc.com wrote:
>> From: Adam Manzanares 
>>
>> This is the per-I/O equivalent of the ioprio_set system call.
>> See the following link for performance implications on a SATA HDD:
>> https://lkml.org/lkml/2016/12/6/495
>>
>> First patch factors ioprio_check_cap function out of ioprio_set system call 
>> to
>> also be used by the aio ioprio interface.
>>
>> Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.
>>
>> Third patch passes ioprio hint from aio iocb to kiocb and enables block_dev
>> usage of the per I/O ioprio feature.
>>
>> v2: merge patches
>>  use IOCB_FLAG_IOPRIO
>>  validate intended use with IOCB_IOPRIO
>>  add linux-api and linux-block to cc
>>
>> v3: add ioprio_check_cap function
>>  convert kiocb ki_hint to u16
>>  use ioprio_check_cap when adding ioprio to kiocb in aio.c
>>
>> v4: handle IOCB_IOPRIO in aio_prep_rw
>>  note patch 3 depends on patch 1 in commit msg
>>
>> Adam Manzanares (3):
>>block: add ioprio_check_cap function
>>fs: Convert kiocb rw_hint from enum to u16
>>fs: Add aio iopriority support for block_dev
>>
>>   block/ioprio.c   | 22 --
>>   fs/aio.c | 16 
>>   fs/block_dev.c   |  2 ++
>>   include/linux/fs.h   | 17 +++--
>>   include/linux/ioprio.h   |  2 ++
>>   include/uapi/linux/aio_abi.h |  1 +
>>   6 files changed, 52 insertions(+), 8 deletions(-)
> 
> This looks fine to me now. I can pick up #1 for 4.18 - and 2+3 as well,
> unless someone else wants to take them.

Great, thanks Jens.

> 

[PATCH v4 3/3] fs: Add aio iopriority support for block_dev

2018-05-17 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

This is the per-I/O equivalent of the ioprio_set system call.

When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.

When a bio is created for an aio request by the block dev we set the priority
value of the bio to the user supplied value.

This patch depends on block: add ioprio_check_cap function

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 fs/aio.c | 16 
 fs/block_dev.c   |  2 ++
 include/linux/fs.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 4 files changed, 21 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index f3eae5d5771b..ff3107aa82d5 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1451,6 +1451,22 @@ static int aio_prep_rw(struct kiocb *req, struct iocb 
*iocb)
if (iocb->aio_flags & IOCB_FLAG_RESFD)
req->ki_flags |= IOCB_EVENTFD;
req->ki_hint = file_write_hint(req->ki_filp);
+   if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
+   /*
+* If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
+* aio_reqprio is interpreted as an I/O scheduling
+* class and priority.
+*/
+   ret = ioprio_check_cap(iocb->aio_reqprio);
+   if (ret) {
+   pr_debug("aio ioprio check cap error\n");
+   return -EINVAL;
+   }
+
+   req->ki_ioprio = iocb->aio_reqprio;
+   req->ki_flags |= IOCB_IOPRIO;
+   }
+
ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
if (unlikely(ret))
fput(req->ki_filp);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..970bef79caa6 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -355,6 +355,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
+   if (iocb->ki_flags & IOCB_IOPRIO)
+   bio->bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7a90ce387e00..3415e83f6350 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -294,6 +294,7 @@ enum rw_hint {
 #define IOCB_SYNC  (1 << 5)
 #define IOCB_WRITE (1 << 6)
 #define IOCB_NOWAIT(1 << 7)
+#define IOCB_IOPRIO(1 << 8)
 
 struct kiocb {
struct file *ki_filp;
@@ -302,6 +303,7 @@ struct kiocb {
void*private;
int ki_flags;
u16 ki_hint;
+   u16 ki_ioprio; /* See linux/ioprio.h */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 2c0a3415beee..d4e768d55d14 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -55,6 +55,7 @@ enum {
  *   is valid.
  */
 #define IOCB_FLAG_RESFD(1 << 0)
+#define IOCB_FLAG_IOPRIO   (1 << 1)
 
 /* read() from /dev/aio returns these structures. */
 struct io_event {
-- 
2.15.1



[PATCH v4 3/3] fs: Add aio iopriority support for block_dev

2018-05-17 Thread adam . manzanares
From: Adam Manzanares 

This is the per-I/O equivalent of the ioprio_set system call.

When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.

When a bio is created for an aio request by the block dev we set the priority
value of the bio to the user supplied value.

This patch depends on block: add ioprio_check_cap function

Signed-off-by: Adam Manzanares 
---
 fs/aio.c | 16 
 fs/block_dev.c   |  2 ++
 include/linux/fs.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 4 files changed, 21 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index f3eae5d5771b..ff3107aa82d5 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1451,6 +1451,22 @@ static int aio_prep_rw(struct kiocb *req, struct iocb 
*iocb)
if (iocb->aio_flags & IOCB_FLAG_RESFD)
req->ki_flags |= IOCB_EVENTFD;
req->ki_hint = file_write_hint(req->ki_filp);
+   if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
+   /*
+* If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
+* aio_reqprio is interpreted as an I/O scheduling
+* class and priority.
+*/
+   ret = ioprio_check_cap(iocb->aio_reqprio);
+   if (ret) {
+   pr_debug("aio ioprio check cap error\n");
+   return -EINVAL;
+   }
+
+   req->ki_ioprio = iocb->aio_reqprio;
+   req->ki_flags |= IOCB_IOPRIO;
+   }
+
ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
if (unlikely(ret))
fput(req->ki_filp);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..970bef79caa6 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -355,6 +355,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
+   if (iocb->ki_flags & IOCB_IOPRIO)
+   bio->bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7a90ce387e00..3415e83f6350 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -294,6 +294,7 @@ enum rw_hint {
 #define IOCB_SYNC  (1 << 5)
 #define IOCB_WRITE (1 << 6)
 #define IOCB_NOWAIT(1 << 7)
+#define IOCB_IOPRIO(1 << 8)
 
 struct kiocb {
struct file *ki_filp;
@@ -302,6 +303,7 @@ struct kiocb {
void*private;
int ki_flags;
u16 ki_hint;
+   u16 ki_ioprio; /* See linux/ioprio.h */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 2c0a3415beee..d4e768d55d14 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -55,6 +55,7 @@ enum {
  *   is valid.
  */
 #define IOCB_FLAG_RESFD(1 << 0)
+#define IOCB_FLAG_IOPRIO   (1 << 1)
 
 /* read() from /dev/aio returns these structures. */
 struct io_event {
-- 
2.15.1



[PATCH v4 0/3] AIO add per-command iopriority

2018-05-17 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

This is the per-I/O equivalent of the ioprio_set system call.
See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495

First patch factors ioprio_check_cap function out of ioprio_set system call to
also be used by the aio ioprio interface.

Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.

Third patch passes ioprio hint from aio iocb to kiocb and enables block_dev
usage of the per I/O ioprio feature.

v2: merge patches
use IOCB_FLAG_IOPRIO
validate intended use with IOCB_IOPRIO
add linux-api and linux-block to cc

v3: add ioprio_check_cap function
convert kiocb ki_hint to u16
use ioprio_check_cap when adding ioprio to kiocb in aio.c

v4: handle IOCB_IOPRIO in aio_prep_rw
note patch 3 depends on patch 1 in commit msg

Adam Manzanares (3):
  block: add ioprio_check_cap function
  fs: Convert kiocb rw_hint from enum to u16
  fs: Add aio iopriority support for block_dev

 block/ioprio.c   | 22 --
 fs/aio.c | 16 
 fs/block_dev.c   |  2 ++
 include/linux/fs.h   | 17 +++--
 include/linux/ioprio.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 6 files changed, 52 insertions(+), 8 deletions(-)

-- 
2.15.1



[PATCH v4 0/3] AIO add per-command iopriority

2018-05-17 Thread adam . manzanares
From: Adam Manzanares 

This is the per-I/O equivalent of the ioprio_set system call.
See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495

First patch factors ioprio_check_cap function out of ioprio_set system call to
also be used by the aio ioprio interface.

Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.

Third patch passes ioprio hint from aio iocb to kiocb and enables block_dev
usage of the per I/O ioprio feature.

v2: merge patches
use IOCB_FLAG_IOPRIO
validate intended use with IOCB_IOPRIO
add linux-api and linux-block to cc

v3: add ioprio_check_cap function
convert kiocb ki_hint to u16
use ioprio_check_cap when adding ioprio to kiocb in aio.c

v4: handle IOCB_IOPRIO in aio_prep_rw
note patch 3 depends on patch 1 in commit msg

Adam Manzanares (3):
  block: add ioprio_check_cap function
  fs: Convert kiocb rw_hint from enum to u16
  fs: Add aio iopriority support for block_dev

 block/ioprio.c   | 22 --
 fs/aio.c | 16 
 fs/block_dev.c   |  2 ++
 include/linux/fs.h   | 17 +++--
 include/linux/ioprio.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 6 files changed, 52 insertions(+), 8 deletions(-)

-- 
2.15.1



[PATCH v4 2/3] fs: Convert kiocb rw_hint from enum to u16

2018-05-17 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

In order to avoid kiocb bloat for per command iopriority support, rw_hint
is converted from enum to a u16. Added a guard around ki_hint assigment.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 include/linux/fs.h | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 760d8da1b6c7..7a90ce387e00 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -284,6 +284,8 @@ enum rw_hint {
WRITE_LIFE_EXTREME  = RWH_WRITE_LIFE_EXTREME,
 };
 
+#define MAX_KI_HINT((1 << 16) - 1) /* ki_hint type is u16 */
+
 #define IOCB_EVENTFD   (1 << 0)
 #define IOCB_APPEND(1 << 1)
 #define IOCB_DIRECT(1 << 2)
@@ -299,7 +301,7 @@ struct kiocb {
void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
void*private;
int ki_flags;
-   enum rw_hintki_hint;
+   u16 ki_hint;
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1927,12 +1929,21 @@ static inline enum rw_hint file_write_hint(struct file 
*file)
 
 static inline int iocb_flags(struct file *file);
 
+/* ki_hint changed from enum to u16, make sure rw_hint fits into u16 */
+static inline u16 ki_hint_valid(enum rw_hint hint)
+{
+   if (hint > MAX_KI_HINT)
+   return 0;
+
+   return hint;
+}
+
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
*kiocb = (struct kiocb) {
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
-   .ki_hint = file_write_hint(filp),
+   .ki_hint = ki_hint_valid(file_write_hint(filp)),
};
 }
 
-- 
2.15.1



[PATCH v4 2/3] fs: Convert kiocb rw_hint from enum to u16

2018-05-17 Thread adam . manzanares
From: Adam Manzanares 

In order to avoid kiocb bloat for per command iopriority support, rw_hint
is converted from enum to a u16. Added a guard around ki_hint assigment.

Signed-off-by: Adam Manzanares 
---
 include/linux/fs.h | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 760d8da1b6c7..7a90ce387e00 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -284,6 +284,8 @@ enum rw_hint {
WRITE_LIFE_EXTREME  = RWH_WRITE_LIFE_EXTREME,
 };
 
+#define MAX_KI_HINT((1 << 16) - 1) /* ki_hint type is u16 */
+
 #define IOCB_EVENTFD   (1 << 0)
 #define IOCB_APPEND(1 << 1)
 #define IOCB_DIRECT(1 << 2)
@@ -299,7 +301,7 @@ struct kiocb {
void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
void*private;
int ki_flags;
-   enum rw_hintki_hint;
+   u16 ki_hint;
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1927,12 +1929,21 @@ static inline enum rw_hint file_write_hint(struct file 
*file)
 
 static inline int iocb_flags(struct file *file);
 
+/* ki_hint changed from enum to u16, make sure rw_hint fits into u16 */
+static inline u16 ki_hint_valid(enum rw_hint hint)
+{
+   if (hint > MAX_KI_HINT)
+   return 0;
+
+   return hint;
+}
+
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
*kiocb = (struct kiocb) {
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
-   .ki_hint = file_write_hint(filp),
+   .ki_hint = ki_hint_valid(file_write_hint(filp)),
};
 }
 
-- 
2.15.1



[PATCH v4 1/3] block: add ioprio_check_cap function

2018-05-17 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

Aio per command iopriority support introduces a second interface between
userland and the kernel capable of passing iopriority. The aio interface also
needs the ability to verify that the submitting context has sufficient
priviledges to submit IOPRIO_RT commands. This patch creates the
ioprio_check_cap function to be used by the ioprio_set system call and also by
the aio interface.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 block/ioprio.c | 22 --
 include/linux/ioprio.h |  2 ++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/ioprio.c b/block/ioprio.c
index 6f5d0b6625e3..f9821080c92c 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -61,15 +61,10 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
 }
 EXPORT_SYMBOL_GPL(set_task_ioprio);
 
-SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+int ioprio_check_cap(int ioprio)
 {
int class = IOPRIO_PRIO_CLASS(ioprio);
int data = IOPRIO_PRIO_DATA(ioprio);
-   struct task_struct *p, *g;
-   struct user_struct *user;
-   struct pid *pgrp;
-   kuid_t uid;
-   int ret;
 
switch (class) {
case IOPRIO_CLASS_RT:
@@ -92,6 +87,21 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, 
ioprio)
return -EINVAL;
}
 
+   return 0;
+}
+
+SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+{
+   struct task_struct *p, *g;
+   struct user_struct *user;
+   struct pid *pgrp;
+   kuid_t uid;
+   int ret;
+
+   ret = ioprio_check_cap(ioprio);
+   if (ret)
+   return ret;
+
ret = -ESRCH;
rcu_read_lock();
switch (which) {
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 627efac73e6d..4a28cec49ec3 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -77,4 +77,6 @@ extern int ioprio_best(unsigned short aprio, unsigned short 
bprio);
 
 extern int set_task_ioprio(struct task_struct *task, int ioprio);
 
+extern int ioprio_check_cap(int ioprio);
+
 #endif
-- 
2.15.1



[PATCH v4 1/3] block: add ioprio_check_cap function

2018-05-17 Thread adam . manzanares
From: Adam Manzanares 

Aio per command iopriority support introduces a second interface between
userland and the kernel capable of passing iopriority. The aio interface also
needs the ability to verify that the submitting context has sufficient
priviledges to submit IOPRIO_RT commands. This patch creates the
ioprio_check_cap function to be used by the ioprio_set system call and also by
the aio interface.

Signed-off-by: Adam Manzanares 
---
 block/ioprio.c | 22 --
 include/linux/ioprio.h |  2 ++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/ioprio.c b/block/ioprio.c
index 6f5d0b6625e3..f9821080c92c 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -61,15 +61,10 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
 }
 EXPORT_SYMBOL_GPL(set_task_ioprio);
 
-SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+int ioprio_check_cap(int ioprio)
 {
int class = IOPRIO_PRIO_CLASS(ioprio);
int data = IOPRIO_PRIO_DATA(ioprio);
-   struct task_struct *p, *g;
-   struct user_struct *user;
-   struct pid *pgrp;
-   kuid_t uid;
-   int ret;
 
switch (class) {
case IOPRIO_CLASS_RT:
@@ -92,6 +87,21 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, 
ioprio)
return -EINVAL;
}
 
+   return 0;
+}
+
+SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+{
+   struct task_struct *p, *g;
+   struct user_struct *user;
+   struct pid *pgrp;
+   kuid_t uid;
+   int ret;
+
+   ret = ioprio_check_cap(ioprio);
+   if (ret)
+   return ret;
+
ret = -ESRCH;
rcu_read_lock();
switch (which) {
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 627efac73e6d..4a28cec49ec3 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -77,4 +77,6 @@ extern int ioprio_best(unsigned short aprio, unsigned short 
bprio);
 
 extern int set_task_ioprio(struct task_struct *task, int ioprio);
 
+extern int ioprio_check_cap(int ioprio);
+
 #endif
-- 
2.15.1



Re: [PATCH v3 2/3] fs: Convert kiocb rw_hint from enum to u16

2018-05-09 Thread Adam Manzanares


On 5/9/18 11:21 AM, Theodore Y. Ts'o wrote:
> On Wed, May 09, 2018 at 08:23:00AM -0600, Jens Axboe wrote:
>> Streams is essentially the only thing ki_hint is currently used for,
>> with the write life time hints mapping to a stream. The idea for the
>> user side API was to have other things than just write life time hints.
>>
>> Since Adam wants to do priorities, he'd either need to pack into the
>> existing ki_hint, or do this patch does, which is make it smaller and
>> add a new member. I think the latter is cleaner.
> 
> Fair enough; but maybe we can use a u8 instead of a u16?  65,535
> priorities still seem like way more than would ever make sense.  I
> think 256 priorities is still way to many, but it's simpler while
> still reserving number of bits for future se.

The intention was to mimic the ioprio_set system call, which uses 3 bits 
for a prio class and 13 bits for a prio_value.

IDK what is the right amount of bits to use, but the existing use of 
bits seemed flexible enough to support many types of applications and 
devices.
> 
>  - Ted
> 

Re: [PATCH v3 2/3] fs: Convert kiocb rw_hint from enum to u16

2018-05-09 Thread Adam Manzanares


On 5/9/18 11:21 AM, Theodore Y. Ts'o wrote:
> On Wed, May 09, 2018 at 08:23:00AM -0600, Jens Axboe wrote:
>> Streams is essentially the only thing ki_hint is currently used for,
>> with the write life time hints mapping to a stream. The idea for the
>> user side API was to have other things than just write life time hints.
>>
>> Since Adam wants to do priorities, he'd either need to pack into the
>> existing ki_hint, or do this patch does, which is make it smaller and
>> add a new member. I think the latter is cleaner.
> 
> Fair enough; but maybe we can use a u8 instead of a u16?  65,535
> priorities still seem like way more than would ever make sense.  I
> think 256 priorities is still way to many, but it's simpler while
> still reserving number of bits for future se.

The intention was to mimic the ioprio_set system call, which uses 3 bits 
for a prio class and 13 bits for a prio_value.

IDK what is the right amount of bits to use, but the existing use of 
bits seemed flexible enough to support many types of applications and 
devices.
> 
>  - Ted
> 

[PATCH v3 0/3] AIO add per-command iopriority

2018-05-08 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

This is the per-I/O equivalent of the ioprio_set system call.
See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495

First patch factors ioprio_check_cap function out of ioprio_set system call to 
also be used by the aio ioprio interface.

Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.

Third patch passes ioprio hint from aio iocb to kiocb and enables block_dev 
usage of the per I/O ioprio feature.

v2: merge patches
use IOCB_FLAG_IOPRIO
validate intended use with IOCB_IOPRIO
add linux-api and linux-block to cc

v3: add ioprio_check_cap function
convert kiocb ki_hint to u16
use ioprio_check_cap when adding ioprio to kiocb in aio.c

Adam Manzanares (3):
  block: add ioprio_check_cap function
  fs: Convert kiocb rw_hint from enum to u16
  fs: Add aio iopriority support for block_dev

 block/ioprio.c   | 22 --
 fs/aio.c | 16 
 fs/block_dev.c   |  2 ++
 include/linux/fs.h   | 17 +++--
 include/linux/ioprio.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 6 files changed, 52 insertions(+), 8 deletions(-)

-- 
2.15.1



[PATCH v3 0/3] AIO add per-command iopriority

2018-05-08 Thread adam . manzanares
From: Adam Manzanares 

This is the per-I/O equivalent of the ioprio_set system call.
See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495

First patch factors ioprio_check_cap function out of ioprio_set system call to 
also be used by the aio ioprio interface.

Second patch converts kiocb ki_hint field to a u16 to avoid kiocb bloat.

Third patch passes ioprio hint from aio iocb to kiocb and enables block_dev 
usage of the per I/O ioprio feature.

v2: merge patches
use IOCB_FLAG_IOPRIO
validate intended use with IOCB_IOPRIO
add linux-api and linux-block to cc

v3: add ioprio_check_cap function
convert kiocb ki_hint to u16
use ioprio_check_cap when adding ioprio to kiocb in aio.c

Adam Manzanares (3):
  block: add ioprio_check_cap function
  fs: Convert kiocb rw_hint from enum to u16
  fs: Add aio iopriority support for block_dev

 block/ioprio.c   | 22 --
 fs/aio.c | 16 
 fs/block_dev.c   |  2 ++
 include/linux/fs.h   | 17 +++--
 include/linux/ioprio.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 6 files changed, 52 insertions(+), 8 deletions(-)

-- 
2.15.1



[PATCH v3 3/3] fs: Add aio iopriority support for block_dev

2018-05-08 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.

When a bio is created for an aio request by the block dev we set the priority
value of the bio to the user supplied value.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 fs/aio.c | 16 
 fs/block_dev.c   |  2 ++
 include/linux/fs.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 4 files changed, 21 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index 88d7927ffbc6..f43d1d3a2e39 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1597,6 +1597,22 @@ static int io_submit_one(struct kioctx *ctx, struct iocb 
__user *user_iocb,
req->common.ki_flags |= IOCB_EVENTFD;
}
 
+   if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
+   /*
+* If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
+* aio_reqprio is interpreted as an I/O scheduling
+* class and priority.
+*/
+   ret = ioprio_check_cap(iocb->aio_reqprio);
+   if (ret) {
+   pr_debug("aio ioprio check cap error\n");
+   goto out_put_req;
+   }
+
+   req->common.ki_ioprio = iocb->aio_reqprio;
+   req->common.ki_flags |= IOCB_IOPRIO;
+   }
+
ret = kiocb_set_rw_flags(>common, iocb->aio_rw_flags);
if (unlikely(ret)) {
pr_debug("EINVAL: aio_rw_flags\n");
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..970bef79caa6 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -355,6 +355,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
+   if (iocb->ki_flags & IOCB_IOPRIO)
+   bio->bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7a90ce387e00..3415e83f6350 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -294,6 +294,7 @@ enum rw_hint {
 #define IOCB_SYNC  (1 << 5)
 #define IOCB_WRITE (1 << 6)
 #define IOCB_NOWAIT(1 << 7)
+#define IOCB_IOPRIO(1 << 8)
 
 struct kiocb {
struct file *ki_filp;
@@ -302,6 +303,7 @@ struct kiocb {
void*private;
int ki_flags;
u16 ki_hint;
+   u16 ki_ioprio; /* See linux/ioprio.h */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index a04adbc70ddf..d4593a6062ef 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -54,6 +54,7 @@ enum {
  *   is valid.
  */
 #define IOCB_FLAG_RESFD(1 << 0)
+#define IOCB_FLAG_IOPRIO   (1 << 1)
 
 /* read() from /dev/aio returns these structures. */
 struct io_event {
-- 
2.15.1



[PATCH v3 3/3] fs: Add aio iopriority support for block_dev

2018-05-08 Thread adam . manzanares
From: Adam Manzanares 

When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.

When a bio is created for an aio request by the block dev we set the priority
value of the bio to the user supplied value.

Signed-off-by: Adam Manzanares 
---
 fs/aio.c | 16 
 fs/block_dev.c   |  2 ++
 include/linux/fs.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 4 files changed, 21 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index 88d7927ffbc6..f43d1d3a2e39 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1597,6 +1597,22 @@ static int io_submit_one(struct kioctx *ctx, struct iocb 
__user *user_iocb,
req->common.ki_flags |= IOCB_EVENTFD;
}
 
+   if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
+   /*
+* If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
+* aio_reqprio is interpreted as an I/O scheduling
+* class and priority.
+*/
+   ret = ioprio_check_cap(iocb->aio_reqprio);
+   if (ret) {
+   pr_debug("aio ioprio check cap error\n");
+   goto out_put_req;
+   }
+
+   req->common.ki_ioprio = iocb->aio_reqprio;
+   req->common.ki_flags |= IOCB_IOPRIO;
+   }
+
ret = kiocb_set_rw_flags(>common, iocb->aio_rw_flags);
if (unlikely(ret)) {
pr_debug("EINVAL: aio_rw_flags\n");
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..970bef79caa6 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -355,6 +355,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
+   if (iocb->ki_flags & IOCB_IOPRIO)
+   bio->bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7a90ce387e00..3415e83f6350 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -294,6 +294,7 @@ enum rw_hint {
 #define IOCB_SYNC  (1 << 5)
 #define IOCB_WRITE (1 << 6)
 #define IOCB_NOWAIT(1 << 7)
+#define IOCB_IOPRIO(1 << 8)
 
 struct kiocb {
struct file *ki_filp;
@@ -302,6 +303,7 @@ struct kiocb {
void*private;
int ki_flags;
u16 ki_hint;
+   u16 ki_ioprio; /* See linux/ioprio.h */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index a04adbc70ddf..d4593a6062ef 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -54,6 +54,7 @@ enum {
  *   is valid.
  */
 #define IOCB_FLAG_RESFD(1 << 0)
+#define IOCB_FLAG_IOPRIO   (1 << 1)
 
 /* read() from /dev/aio returns these structures. */
 struct io_event {
-- 
2.15.1



[PATCH v3 1/3] block: add ioprio_check_cap function

2018-05-08 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

Aio per command iopriority support introduces a second interface between
userland and the kernel capable of passing iopriority. The aio interface also
needs the ability to verify that the submitting context has sufficient
priviledges to submit IOPRIO_RT commands. This patch creates the
ioprio_check_cap function to be used by the ioprio_set system call and also by
the aio interface.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 block/ioprio.c | 22 --
 include/linux/ioprio.h |  2 ++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/ioprio.c b/block/ioprio.c
index 6f5d0b6625e3..f9821080c92c 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -61,15 +61,10 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
 }
 EXPORT_SYMBOL_GPL(set_task_ioprio);
 
-SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+int ioprio_check_cap(int ioprio)
 {
int class = IOPRIO_PRIO_CLASS(ioprio);
int data = IOPRIO_PRIO_DATA(ioprio);
-   struct task_struct *p, *g;
-   struct user_struct *user;
-   struct pid *pgrp;
-   kuid_t uid;
-   int ret;
 
switch (class) {
case IOPRIO_CLASS_RT:
@@ -92,6 +87,21 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, 
ioprio)
return -EINVAL;
}
 
+   return 0;
+}
+
+SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+{
+   struct task_struct *p, *g;
+   struct user_struct *user;
+   struct pid *pgrp;
+   kuid_t uid;
+   int ret;
+
+   ret = ioprio_check_cap(ioprio);
+   if (ret)
+   return ret;
+
ret = -ESRCH;
rcu_read_lock();
switch (which) {
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 627efac73e6d..4a28cec49ec3 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -77,4 +77,6 @@ extern int ioprio_best(unsigned short aprio, unsigned short 
bprio);
 
 extern int set_task_ioprio(struct task_struct *task, int ioprio);
 
+extern int ioprio_check_cap(int ioprio);
+
 #endif
-- 
2.15.1



[PATCH v3 1/3] block: add ioprio_check_cap function

2018-05-08 Thread adam . manzanares
From: Adam Manzanares 

Aio per command iopriority support introduces a second interface between
userland and the kernel capable of passing iopriority. The aio interface also
needs the ability to verify that the submitting context has sufficient
priviledges to submit IOPRIO_RT commands. This patch creates the
ioprio_check_cap function to be used by the ioprio_set system call and also by
the aio interface.

Signed-off-by: Adam Manzanares 
---
 block/ioprio.c | 22 --
 include/linux/ioprio.h |  2 ++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/ioprio.c b/block/ioprio.c
index 6f5d0b6625e3..f9821080c92c 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -61,15 +61,10 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
 }
 EXPORT_SYMBOL_GPL(set_task_ioprio);
 
-SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+int ioprio_check_cap(int ioprio)
 {
int class = IOPRIO_PRIO_CLASS(ioprio);
int data = IOPRIO_PRIO_DATA(ioprio);
-   struct task_struct *p, *g;
-   struct user_struct *user;
-   struct pid *pgrp;
-   kuid_t uid;
-   int ret;
 
switch (class) {
case IOPRIO_CLASS_RT:
@@ -92,6 +87,21 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, 
ioprio)
return -EINVAL;
}
 
+   return 0;
+}
+
+SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
+{
+   struct task_struct *p, *g;
+   struct user_struct *user;
+   struct pid *pgrp;
+   kuid_t uid;
+   int ret;
+
+   ret = ioprio_check_cap(ioprio);
+   if (ret)
+   return ret;
+
ret = -ESRCH;
rcu_read_lock();
switch (which) {
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 627efac73e6d..4a28cec49ec3 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -77,4 +77,6 @@ extern int ioprio_best(unsigned short aprio, unsigned short 
bprio);
 
 extern int set_task_ioprio(struct task_struct *task, int ioprio);
 
+extern int ioprio_check_cap(int ioprio);
+
 #endif
-- 
2.15.1



[PATCH v3 2/3] fs: Convert kiocb rw_hint from enum to u16

2018-05-08 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

In order to avoid kiocb bloat for per command iopriority support, rw_hint
is converted from enum to a u16. Added a guard around ki_hint assigment.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 include/linux/fs.h | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 760d8da1b6c7..7a90ce387e00 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -284,6 +284,8 @@ enum rw_hint {
WRITE_LIFE_EXTREME  = RWH_WRITE_LIFE_EXTREME,
 };
 
+#define MAX_KI_HINT((1 << 16) - 1) /* ki_hint type is u16 */
+
 #define IOCB_EVENTFD   (1 << 0)
 #define IOCB_APPEND(1 << 1)
 #define IOCB_DIRECT(1 << 2)
@@ -299,7 +301,7 @@ struct kiocb {
void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
void*private;
int ki_flags;
-   enum rw_hintki_hint;
+   u16 ki_hint;
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1927,12 +1929,21 @@ static inline enum rw_hint file_write_hint(struct file 
*file)
 
 static inline int iocb_flags(struct file *file);
 
+/* ki_hint changed from enum to u16, make sure rw_hint fits into u16 */
+static inline u16 ki_hint_valid(enum rw_hint hint)
+{
+   if (hint > MAX_KI_HINT)
+   return 0;
+
+   return hint;
+}
+
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
*kiocb = (struct kiocb) {
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
-   .ki_hint = file_write_hint(filp),
+   .ki_hint = ki_hint_valid(file_write_hint(filp)),
};
 }
 
-- 
2.15.1



[PATCH v3 2/3] fs: Convert kiocb rw_hint from enum to u16

2018-05-08 Thread adam . manzanares
From: Adam Manzanares 

In order to avoid kiocb bloat for per command iopriority support, rw_hint
is converted from enum to a u16. Added a guard around ki_hint assigment.

Signed-off-by: Adam Manzanares 
---
 include/linux/fs.h | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 760d8da1b6c7..7a90ce387e00 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -284,6 +284,8 @@ enum rw_hint {
WRITE_LIFE_EXTREME  = RWH_WRITE_LIFE_EXTREME,
 };
 
+#define MAX_KI_HINT((1 << 16) - 1) /* ki_hint type is u16 */
+
 #define IOCB_EVENTFD   (1 << 0)
 #define IOCB_APPEND(1 << 1)
 #define IOCB_DIRECT(1 << 2)
@@ -299,7 +301,7 @@ struct kiocb {
void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
void*private;
int ki_flags;
-   enum rw_hintki_hint;
+   u16 ki_hint;
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1927,12 +1929,21 @@ static inline enum rw_hint file_write_hint(struct file 
*file)
 
 static inline int iocb_flags(struct file *file);
 
+/* ki_hint changed from enum to u16, make sure rw_hint fits into u16 */
+static inline u16 ki_hint_valid(enum rw_hint hint)
+{
+   if (hint > MAX_KI_HINT)
+   return 0;
+
+   return hint;
+}
+
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
*kiocb = (struct kiocb) {
.ki_filp = filp,
.ki_flags = iocb_flags(filp),
-   .ki_hint = file_write_hint(filp),
+   .ki_hint = ki_hint_valid(file_write_hint(filp)),
};
 }
 
-- 
2.15.1



Re: [PATCH v2] fs: Add aio iopriority support for block_dev

2018-05-03 Thread Adam Manzanares


On 5/3/18 1:24 PM, Jens Axboe wrote:
> On 5/3/18 2:15 PM, Adam Manzanares wrote:
>>
>>
>> On 5/3/18 11:33 AM, Matthew Wilcox wrote:
>>> On Thu, May 03, 2018 at 11:21:14AM -0700, adam.manzana...@wdc.com wrote:
>>>> If we want to avoid bloating struct kiocb, I suggest we turn the private 
>>>> field
>>>> into a union of the private and ki_ioprio field. It seems like the users of
>>>> the private field all use it at a point where we can yank the priority from
>>>> the kiocb before the private field is used. Comments and suggestions 
>>>> welcome.
>>>
>>> Or we could just make ki_hint a u8 or u16 ... seems unlikely we'll need
>>> 32 bits of ki_hint.  (currently defined values are 1-5)
>>
>> I like the approach of using a u16 for the ki_hint. I'll update and
>> resubmit.
> 
> It's intended to be a mask. If you do shrink it for now, then we need some
> guard code to ensure it can always carry what it needs to.
> 

Got it, I'll add the guard to rw_hint_valid along with a comment about 
being limited by the size of ki_hint in case we get to a situation where 
16 bits is not enough.


Re: [PATCH v2] fs: Add aio iopriority support for block_dev

2018-05-03 Thread Adam Manzanares


On 5/3/18 1:24 PM, Jens Axboe wrote:
> On 5/3/18 2:15 PM, Adam Manzanares wrote:
>>
>>
>> On 5/3/18 11:33 AM, Matthew Wilcox wrote:
>>> On Thu, May 03, 2018 at 11:21:14AM -0700, adam.manzana...@wdc.com wrote:
>>>> If we want to avoid bloating struct kiocb, I suggest we turn the private 
>>>> field
>>>> into a union of the private and ki_ioprio field. It seems like the users of
>>>> the private field all use it at a point where we can yank the priority from
>>>> the kiocb before the private field is used. Comments and suggestions 
>>>> welcome.
>>>
>>> Or we could just make ki_hint a u8 or u16 ... seems unlikely we'll need
>>> 32 bits of ki_hint.  (currently defined values are 1-5)
>>
>> I like the approach of using a u16 for the ki_hint. I'll update and
>> resubmit.
> 
> It's intended to be a mask. If you do shrink it for now, then we need some
> guard code to ensure it can always carry what it needs to.
> 

Got it, I'll add the guard to rw_hint_valid along with a comment about 
being limited by the size of ki_hint in case we get to a situation where 
16 bits is not enough.


Re: [PATCH v2] fs: Add aio iopriority support for block_dev

2018-05-03 Thread Adam Manzanares


On 5/3/18 11:36 AM, Jeff Moyer wrote:
> Hi, Adam,

Hello Jeff,

> 
> adam.manzana...@wdc.com writes:
> 
>> From: Adam Manzanares <adam.manzana...@wdc.com>
>>
>> This is the per-I/O equivalent of the ioprio_set system call.
>>
>> When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
>> newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.
>>
>> When a bio is created for an aio request by the block dev we set the priority
>> value of the bio to the user supplied value.
>>
>> See the following link for performance implications on a SATA HDD:
>> https://lkml.org/lkml/2016/12/6/495
>>
>> Given that WRR support for NVME devices has patches floating around and it 
>> was
>> discussed at LSFMM, we may soon have a lower latency storage device that can
>> make use of iopriorities. A per command iopriority interface seems timely
>> given these developments.
>>
>> If we want to avoid bloating struct kiocb, I suggest we turn the private 
>> field
>> into a union of the private and ki_ioprio field. It seems like the users of
>> the private field all use it at a point where we can yank the priority from
>> the kiocb before the private field is used. Comments and suggestions welcome.
> 
> The ioprio_set system call requires CAP_SYS_ADMIN for setting
> IOPRIO_CLASS_RT.  I think we need similar checks here.

I forgot how dangerous IOPRIO_CLASS_RT can be :). I will make a new 
function based on the checks in ioprio.c and reuse.

Thanks,
Adam

> 
> -Jeff
> 
>>
>> v2: merge patches
>>  use IOCB_FLAG_IOPRIO
>>  validate intended use with IOCB_IOPRIO
>>  add linux-api and linux-block to cc
>>
>> Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
>> ---
>>   fs/aio.c | 10 ++
>>   fs/block_dev.c   |  2 ++
>>   include/linux/fs.h   |  2 ++
>>   include/uapi/linux/aio_abi.h |  1 +
>>   4 files changed, 15 insertions(+)
>>
>> diff --git a/fs/aio.c b/fs/aio.c
>> index 88d7927ffbc6..f36636d8ff2c 100644
>> --- a/fs/aio.c
>> +++ b/fs/aio.c
>> @@ -1597,6 +1597,16 @@ static int io_submit_one(struct kioctx *ctx, struct 
>> iocb __user *user_iocb,
>>  req->common.ki_flags |= IOCB_EVENTFD;
>>  }
>>   
>> +if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
>> +/*
>> + * If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
>> + * aio_reqprio is interpreted as an I/O scheduling
>> + * class and priority.
>> + */
>> +req->common.ki_ioprio = iocb->aio_reqprio;
>> +req->common.ki_flags |= IOCB_IOPRIO;
>> +}
>> +
>>  ret = kiocb_set_rw_flags(>common, iocb->aio_rw_flags);
>>  if (unlikely(ret)) {
>>  pr_debug("EINVAL: aio_rw_flags\n");
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 7ec920e27065..970bef79caa6 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -355,6 +355,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
>> *iter, int nr_pages)
>>  bio->bi_write_hint = iocb->ki_hint;
>>  bio->bi_private = dio;
>>  bio->bi_end_io = blkdev_bio_end_io;
>> +if (iocb->ki_flags & IOCB_IOPRIO)
>> +bio->bi_ioprio = iocb->ki_ioprio;
>>   
>>  ret = bio_iov_iter_get_pages(bio, iter);
>>  if (unlikely(ret)) {
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 760d8da1b6c7..ab63ce720305 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -292,6 +292,7 @@ enum rw_hint {
>>   #define IOCB_SYNC  (1 << 5)
>>   #define IOCB_WRITE (1 << 6)
>>   #define IOCB_NOWAIT(1 << 7)
>> +#define IOCB_IOPRIO (1 << 8)
>>   
>>   struct kiocb {
>>  struct file *ki_filp;
>> @@ -300,6 +301,7 @@ struct kiocb {
>>  void*private;
>>  int ki_flags;
>>  enum rw_hintki_hint;
>> +u16 ki_ioprio; /* See linux/ioprio.h */
>>   } __randomize_layout;
>>   
>>   static inline bool is_sync_kiocb(struct kiocb *kiocb)
>> diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
>> index a04adbc70ddf..d4593a6062ef 100644
>> --- a/include/uapi/linux/aio_abi.h
>> +++ b/include/uapi/linux/aio_abi.h
>> @@ -54,6 +54,7 @@ enum {
>>*   is valid.
>>*/
>>   #define IOCB_FLAG_RESFD(1 << 0)
>> +#define IOCB_FLAG_IOPRIO(1 << 1)
>>   
>>   /* read() from /dev/aio returns these structures. */
>>   struct io_event {

Re: [PATCH v2] fs: Add aio iopriority support for block_dev

2018-05-03 Thread Adam Manzanares


On 5/3/18 11:36 AM, Jeff Moyer wrote:
> Hi, Adam,

Hello Jeff,

> 
> adam.manzana...@wdc.com writes:
> 
>> From: Adam Manzanares 
>>
>> This is the per-I/O equivalent of the ioprio_set system call.
>>
>> When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
>> newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.
>>
>> When a bio is created for an aio request by the block dev we set the priority
>> value of the bio to the user supplied value.
>>
>> See the following link for performance implications on a SATA HDD:
>> https://lkml.org/lkml/2016/12/6/495
>>
>> Given that WRR support for NVME devices has patches floating around and it 
>> was
>> discussed at LSFMM, we may soon have a lower latency storage device that can
>> make use of iopriorities. A per command iopriority interface seems timely
>> given these developments.
>>
>> If we want to avoid bloating struct kiocb, I suggest we turn the private 
>> field
>> into a union of the private and ki_ioprio field. It seems like the users of
>> the private field all use it at a point where we can yank the priority from
>> the kiocb before the private field is used. Comments and suggestions welcome.
> 
> The ioprio_set system call requires CAP_SYS_ADMIN for setting
> IOPRIO_CLASS_RT.  I think we need similar checks here.

I forgot how dangerous IOPRIO_CLASS_RT can be :). I will make a new 
function based on the checks in ioprio.c and reuse.

Thanks,
Adam

> 
> -Jeff
> 
>>
>> v2: merge patches
>>  use IOCB_FLAG_IOPRIO
>>  validate intended use with IOCB_IOPRIO
>>  add linux-api and linux-block to cc
>>
>> Signed-off-by: Adam Manzanares 
>> ---
>>   fs/aio.c | 10 ++
>>   fs/block_dev.c   |  2 ++
>>   include/linux/fs.h   |  2 ++
>>   include/uapi/linux/aio_abi.h |  1 +
>>   4 files changed, 15 insertions(+)
>>
>> diff --git a/fs/aio.c b/fs/aio.c
>> index 88d7927ffbc6..f36636d8ff2c 100644
>> --- a/fs/aio.c
>> +++ b/fs/aio.c
>> @@ -1597,6 +1597,16 @@ static int io_submit_one(struct kioctx *ctx, struct 
>> iocb __user *user_iocb,
>>  req->common.ki_flags |= IOCB_EVENTFD;
>>  }
>>   
>> +if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
>> +/*
>> + * If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
>> + * aio_reqprio is interpreted as an I/O scheduling
>> + * class and priority.
>> + */
>> +req->common.ki_ioprio = iocb->aio_reqprio;
>> +req->common.ki_flags |= IOCB_IOPRIO;
>> +}
>> +
>>  ret = kiocb_set_rw_flags(>common, iocb->aio_rw_flags);
>>  if (unlikely(ret)) {
>>  pr_debug("EINVAL: aio_rw_flags\n");
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 7ec920e27065..970bef79caa6 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -355,6 +355,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
>> *iter, int nr_pages)
>>  bio->bi_write_hint = iocb->ki_hint;
>>  bio->bi_private = dio;
>>  bio->bi_end_io = blkdev_bio_end_io;
>> +if (iocb->ki_flags & IOCB_IOPRIO)
>> +bio->bi_ioprio = iocb->ki_ioprio;
>>   
>>  ret = bio_iov_iter_get_pages(bio, iter);
>>  if (unlikely(ret)) {
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 760d8da1b6c7..ab63ce720305 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -292,6 +292,7 @@ enum rw_hint {
>>   #define IOCB_SYNC  (1 << 5)
>>   #define IOCB_WRITE (1 << 6)
>>   #define IOCB_NOWAIT(1 << 7)
>> +#define IOCB_IOPRIO (1 << 8)
>>   
>>   struct kiocb {
>>  struct file *ki_filp;
>> @@ -300,6 +301,7 @@ struct kiocb {
>>  void*private;
>>  int ki_flags;
>>  enum rw_hintki_hint;
>> +u16 ki_ioprio; /* See linux/ioprio.h */
>>   } __randomize_layout;
>>   
>>   static inline bool is_sync_kiocb(struct kiocb *kiocb)
>> diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
>> index a04adbc70ddf..d4593a6062ef 100644
>> --- a/include/uapi/linux/aio_abi.h
>> +++ b/include/uapi/linux/aio_abi.h
>> @@ -54,6 +54,7 @@ enum {
>>*   is valid.
>>*/
>>   #define IOCB_FLAG_RESFD(1 << 0)
>> +#define IOCB_FLAG_IOPRIO(1 << 1)
>>   
>>   /* read() from /dev/aio returns these structures. */
>>   struct io_event {

Re: [PATCH v2] fs: Add aio iopriority support for block_dev

2018-05-03 Thread Adam Manzanares


On 5/3/18 11:33 AM, Matthew Wilcox wrote:
> On Thu, May 03, 2018 at 11:21:14AM -0700, adam.manzana...@wdc.com wrote:
>> If we want to avoid bloating struct kiocb, I suggest we turn the private 
>> field
>> into a union of the private and ki_ioprio field. It seems like the users of
>> the private field all use it at a point where we can yank the priority from
>> the kiocb before the private field is used. Comments and suggestions welcome.
> 
> Or we could just make ki_hint a u8 or u16 ... seems unlikely we'll need
> 32 bits of ki_hint.  (currently defined values are 1-5)

I like the approach of using a u16 for the ki_hint. I'll update and 
resubmit.

>> @@ -300,6 +301,7 @@ struct kiocb {
>>  void*private;
>>  int ki_flags;
>>  enum rw_hintki_hint;
>> +u16 ki_ioprio; /* See linux/ioprio.h */
>>   } __randomize_layout;
>>   
>>   static inline bool is_sync_kiocb(struct kiocb *kiocb)

Re: [PATCH v2] fs: Add aio iopriority support for block_dev

2018-05-03 Thread Adam Manzanares


On 5/3/18 11:33 AM, Matthew Wilcox wrote:
> On Thu, May 03, 2018 at 11:21:14AM -0700, adam.manzana...@wdc.com wrote:
>> If we want to avoid bloating struct kiocb, I suggest we turn the private 
>> field
>> into a union of the private and ki_ioprio field. It seems like the users of
>> the private field all use it at a point where we can yank the priority from
>> the kiocb before the private field is used. Comments and suggestions welcome.
> 
> Or we could just make ki_hint a u8 or u16 ... seems unlikely we'll need
> 32 bits of ki_hint.  (currently defined values are 1-5)

I like the approach of using a u16 for the ki_hint. I'll update and 
resubmit.

>> @@ -300,6 +301,7 @@ struct kiocb {
>>  void*private;
>>  int ki_flags;
>>  enum rw_hintki_hint;
>> +u16 ki_ioprio; /* See linux/ioprio.h */
>>   } __randomize_layout;
>>   
>>   static inline bool is_sync_kiocb(struct kiocb *kiocb)

[PATCH v2] fs: Add aio iopriority support for block_dev

2018-05-03 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

This is the per-I/O equivalent of the ioprio_set system call.

When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.

When a bio is created for an aio request by the block dev we set the priority
value of the bio to the user supplied value.

See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495

Given that WRR support for NVME devices has patches floating around and it was
discussed at LSFMM, we may soon have a lower latency storage device that can 
make use of iopriorities. A per command iopriority interface seems timely 
given these developments. 

If we want to avoid bloating struct kiocb, I suggest we turn the private field 
into a union of the private and ki_ioprio field. It seems like the users of 
the private field all use it at a point where we can yank the priority from 
the kiocb before the private field is used. Comments and suggestions welcome.

v2: merge patches
use IOCB_FLAG_IOPRIO
validate intended use with IOCB_IOPRIO
add linux-api and linux-block to cc

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 fs/aio.c | 10 ++
 fs/block_dev.c   |  2 ++
 include/linux/fs.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 4 files changed, 15 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index 88d7927ffbc6..f36636d8ff2c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1597,6 +1597,16 @@ static int io_submit_one(struct kioctx *ctx, struct iocb 
__user *user_iocb,
req->common.ki_flags |= IOCB_EVENTFD;
}
 
+   if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
+   /*
+* If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
+* aio_reqprio is interpreted as an I/O scheduling
+* class and priority.
+*/
+   req->common.ki_ioprio = iocb->aio_reqprio;
+   req->common.ki_flags |= IOCB_IOPRIO;
+   }
+
ret = kiocb_set_rw_flags(>common, iocb->aio_rw_flags);
if (unlikely(ret)) {
pr_debug("EINVAL: aio_rw_flags\n");
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..970bef79caa6 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -355,6 +355,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
+   if (iocb->ki_flags & IOCB_IOPRIO)
+   bio->bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 760d8da1b6c7..ab63ce720305 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -292,6 +292,7 @@ enum rw_hint {
 #define IOCB_SYNC  (1 << 5)
 #define IOCB_WRITE (1 << 6)
 #define IOCB_NOWAIT(1 << 7)
+#define IOCB_IOPRIO(1 << 8)
 
 struct kiocb {
struct file *ki_filp;
@@ -300,6 +301,7 @@ struct kiocb {
void*private;
int ki_flags;
enum rw_hintki_hint;
+   u16 ki_ioprio; /* See linux/ioprio.h */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index a04adbc70ddf..d4593a6062ef 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -54,6 +54,7 @@ enum {
  *   is valid.
  */
 #define IOCB_FLAG_RESFD(1 << 0)
+#define IOCB_FLAG_IOPRIO   (1 << 1)
 
 /* read() from /dev/aio returns these structures. */
 struct io_event {
-- 
2.15.1



[PATCH v2] fs: Add aio iopriority support for block_dev

2018-05-03 Thread adam . manzanares
From: Adam Manzanares 

This is the per-I/O equivalent of the ioprio_set system call.

When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.

When a bio is created for an aio request by the block dev we set the priority
value of the bio to the user supplied value.

See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495

Given that WRR support for NVME devices has patches floating around and it was
discussed at LSFMM, we may soon have a lower latency storage device that can 
make use of iopriorities. A per command iopriority interface seems timely 
given these developments. 

If we want to avoid bloating struct kiocb, I suggest we turn the private field 
into a union of the private and ki_ioprio field. It seems like the users of 
the private field all use it at a point where we can yank the priority from 
the kiocb before the private field is used. Comments and suggestions welcome.

v2: merge patches
use IOCB_FLAG_IOPRIO
validate intended use with IOCB_IOPRIO
add linux-api and linux-block to cc

Signed-off-by: Adam Manzanares 
---
 fs/aio.c | 10 ++
 fs/block_dev.c   |  2 ++
 include/linux/fs.h   |  2 ++
 include/uapi/linux/aio_abi.h |  1 +
 4 files changed, 15 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index 88d7927ffbc6..f36636d8ff2c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1597,6 +1597,16 @@ static int io_submit_one(struct kioctx *ctx, struct iocb 
__user *user_iocb,
req->common.ki_flags |= IOCB_EVENTFD;
}
 
+   if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
+   /*
+* If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
+* aio_reqprio is interpreted as an I/O scheduling
+* class and priority.
+*/
+   req->common.ki_ioprio = iocb->aio_reqprio;
+   req->common.ki_flags |= IOCB_IOPRIO;
+   }
+
ret = kiocb_set_rw_flags(>common, iocb->aio_rw_flags);
if (unlikely(ret)) {
pr_debug("EINVAL: aio_rw_flags\n");
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..970bef79caa6 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -355,6 +355,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
+   if (iocb->ki_flags & IOCB_IOPRIO)
+   bio->bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 760d8da1b6c7..ab63ce720305 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -292,6 +292,7 @@ enum rw_hint {
 #define IOCB_SYNC  (1 << 5)
 #define IOCB_WRITE (1 << 6)
 #define IOCB_NOWAIT(1 << 7)
+#define IOCB_IOPRIO(1 << 8)
 
 struct kiocb {
struct file *ki_filp;
@@ -300,6 +301,7 @@ struct kiocb {
void*private;
int ki_flags;
enum rw_hintki_hint;
+   u16 ki_ioprio; /* See linux/ioprio.h */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index a04adbc70ddf..d4593a6062ef 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -54,6 +54,7 @@ enum {
  *   is valid.
  */
 #define IOCB_FLAG_RESFD(1 << 0)
+#define IOCB_FLAG_IOPRIO   (1 << 1)
 
 /* read() from /dev/aio returns these structures. */
 struct io_event {
-- 
2.15.1



Re: [PATCH 2/2] fs: Add aio priority support for block_dev

2018-05-02 Thread Adam Manzanares


On 5/2/18 10:33 AM, Christoph Hellwig wrote:
>> --- a/fs/aio.c
>> +++ b/fs/aio.c
>> @@ -1603,6 +1603,15 @@ static int io_submit_one(struct kioctx *ctx, struct 
>> iocb __user *user_iocb,
>>  goto out_put_req;
>>  }
>>   
>> +if (req->common.ki_flags & IOCB_IOPRIO)
>> +/*
>> + * The IOCB_IOPRIO flag is set when the user supplied iocb
>> + * aio_rw_flag field has the RWF_IOPRIO flag set. If so,
>> + * aio_reqprio is interpreted as a I/O scheduling class and
>> + * priority.
>> + */
>> +req->common.ki_ioprio = iocb->aio_reqprio;
> 
> Do we need any validation of the field here?

With this patch the only consumer of the ki_ioprio field is a block 
device, which will eventually hit blk_init_request_from_bio where a
validation of the ioprio is hit.

I was hoping to get a simple patch in for the block device case and then
go back and look at other consumers of the kiocb and add ioprio support.
At that point it may be worth it to pull a check into this code.

> 
> The only other thing I am a bit worried about is bloating struct kiocb
> with a field for a relatively uncommon feature, but I can't really
> see any much better way to pass it.
> 

I'll look more closely at reusing existing fields for the next patch 
submission. I am hoping that the feature will be used more often given 
that WRR for NVME should be coming soon.

Thanks,
Adam

Re: [PATCH 2/2] fs: Add aio priority support for block_dev

2018-05-02 Thread Adam Manzanares


On 5/2/18 10:33 AM, Christoph Hellwig wrote:
>> --- a/fs/aio.c
>> +++ b/fs/aio.c
>> @@ -1603,6 +1603,15 @@ static int io_submit_one(struct kioctx *ctx, struct 
>> iocb __user *user_iocb,
>>  goto out_put_req;
>>  }
>>   
>> +if (req->common.ki_flags & IOCB_IOPRIO)
>> +/*
>> + * The IOCB_IOPRIO flag is set when the user supplied iocb
>> + * aio_rw_flag field has the RWF_IOPRIO flag set. If so,
>> + * aio_reqprio is interpreted as a I/O scheduling class and
>> + * priority.
>> + */
>> +req->common.ki_ioprio = iocb->aio_reqprio;
> 
> Do we need any validation of the field here?

With this patch the only consumer of the ki_ioprio field is a block 
device, which will eventually hit blk_init_request_from_bio where a
validation of the ioprio is hit.

I was hoping to get a simple patch in for the block device case and then
go back and look at other consumers of the kiocb and add ioprio support.
At that point it may be worth it to pull a check into this code.

> 
> The only other thing I am a bit worried about is bloating struct kiocb
> with a field for a relatively uncommon feature, but I can't really
> see any much better way to pass it.
> 

I'll look more closely at reusing existing fields for the next patch 
submission. I am hoping that the feature will be used more often given 
that WRR for NVME should be coming soon.

Thanks,
Adam

Re: [PATCH 1/2] fs: add RWF_IOPRIO

2018-05-02 Thread Adam Manzanares


On 5/2/18 10:32 AM, Christoph Hellwig wrote:
> On Mon, Apr 30, 2018 at 09:57:39AM -0700, adam.manzana...@wdc.com wrote:
>> From: Adam Manzanares <adam.manzana...@wdc.com>
>>
>> This is the per-I/O equivalent of the ioprio_set system call.
>>
>> When the RWF_IOPRIO flag is set then the aio_reqprio field of the iocb
>> is interpreted as an I/O scheduling class and priority.
> 
> I think this belongs into the IOCB_FLAG_* flags namespace for aio_flags
> field as it isn't a field valid for plain read/write.
> 
> Also you probably want to merge both patches as they only really
> make sense together.
> 

I will make the changes and send out a v2.

Re: [PATCH 1/2] fs: add RWF_IOPRIO

2018-05-02 Thread Adam Manzanares


On 5/2/18 10:32 AM, Christoph Hellwig wrote:
> On Mon, Apr 30, 2018 at 09:57:39AM -0700, adam.manzana...@wdc.com wrote:
>> From: Adam Manzanares 
>>
>> This is the per-I/O equivalent of the ioprio_set system call.
>>
>> When the RWF_IOPRIO flag is set then the aio_reqprio field of the iocb
>> is interpreted as an I/O scheduling class and priority.
> 
> I think this belongs into the IOCB_FLAG_* flags namespace for aio_flags
> field as it isn't a field valid for plain read/write.
> 
> Also you probably want to merge both patches as they only really
> make sense together.
> 

I will make the changes and send out a v2.

[PATCH 0/2] AIO add per-command iopriority

2018-04-30 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

This patchset interprets the aio_reqprio field of an iocb as a 
per-command value iff the RWF_IOPRIO flag is set on the iocb. 
This feature is implemented for a block device, but could also be
leveraged by any consumers of the iocb.

See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495

Adam Manzanares (2):
  fs: add RWF_IOPRIO
  fs: Add aio priority support for block_dev

 fs/aio.c| 9 +
 fs/block_dev.c  | 1 +
 include/linux/fs.h  | 4 
 include/uapi/linux/fs.h | 5 -
 4 files changed, 18 insertions(+), 1 deletion(-)

-- 
2.15.1



[PATCH 1/2] fs: add RWF_IOPRIO

2018-04-30 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

This is the per-I/O equivalent of the ioprio_set system call.

When the RWF_IOPRIO flag is set then the aio_reqprio field of the iocb
is interpreted as an I/O scheduling class and priority.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 include/linux/fs.h  | 4 
 include/uapi/linux/fs.h | 5 -
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 760d8da1b6c7..32614eb72a4a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -292,6 +292,7 @@ enum rw_hint {
 #define IOCB_SYNC  (1 << 5)
 #define IOCB_WRITE (1 << 6)
 #define IOCB_NOWAIT(1 << 7)
+#define IOCB_IOPRIO(1 << 8)
 
 struct kiocb {
struct file *ki_filp;
@@ -300,6 +301,7 @@ struct kiocb {
void*private;
int ki_flags;
enum rw_hintki_hint;
+   u16 ki_ioprio; /* See linux/ioprio.h */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -3243,6 +3245,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, 
rwf_t flags)
ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
if (flags & RWF_APPEND)
ki->ki_flags |= IOCB_APPEND;
+   if (flags & RWF_IOPRIO)
+   ki->ki_flags |= IOCB_IOPRIO;
return 0;
 }
 
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index d2a8313fabd7..c8f69a00b566 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -380,8 +380,11 @@ typedef int __bitwise __kernel_rwf_t;
 /* per-IO O_APPEND */
 #define RWF_APPEND ((__force __kernel_rwf_t)0x0010)
 
+/* per-IO IOPRIO */
+#define RWF_IOPRIO ((__force __kernel_rwf_t)0x0020)
+
 /* mask of flags supported by the kernel */
 #define RWF_SUPPORTED  (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
-RWF_APPEND)
+RWF_APPEND | RWF_IOPRIO)
 
 #endif /* _UAPI_LINUX_FS_H */
-- 
2.15.1



[PATCH 0/2] AIO add per-command iopriority

2018-04-30 Thread adam . manzanares
From: Adam Manzanares 

This patchset interprets the aio_reqprio field of an iocb as a 
per-command value iff the RWF_IOPRIO flag is set on the iocb. 
This feature is implemented for a block device, but could also be
leveraged by any consumers of the iocb.

See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495

Adam Manzanares (2):
  fs: add RWF_IOPRIO
  fs: Add aio priority support for block_dev

 fs/aio.c| 9 +
 fs/block_dev.c  | 1 +
 include/linux/fs.h  | 4 
 include/uapi/linux/fs.h | 5 -
 4 files changed, 18 insertions(+), 1 deletion(-)

-- 
2.15.1



[PATCH 1/2] fs: add RWF_IOPRIO

2018-04-30 Thread adam . manzanares
From: Adam Manzanares 

This is the per-I/O equivalent of the ioprio_set system call.

When the RWF_IOPRIO flag is set then the aio_reqprio field of the iocb
is interpreted as an I/O scheduling class and priority.

Signed-off-by: Adam Manzanares 
---
 include/linux/fs.h  | 4 
 include/uapi/linux/fs.h | 5 -
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 760d8da1b6c7..32614eb72a4a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -292,6 +292,7 @@ enum rw_hint {
 #define IOCB_SYNC  (1 << 5)
 #define IOCB_WRITE (1 << 6)
 #define IOCB_NOWAIT(1 << 7)
+#define IOCB_IOPRIO(1 << 8)
 
 struct kiocb {
struct file *ki_filp;
@@ -300,6 +301,7 @@ struct kiocb {
void*private;
int ki_flags;
enum rw_hintki_hint;
+   u16 ki_ioprio; /* See linux/ioprio.h */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -3243,6 +3245,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, 
rwf_t flags)
ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
if (flags & RWF_APPEND)
ki->ki_flags |= IOCB_APPEND;
+   if (flags & RWF_IOPRIO)
+   ki->ki_flags |= IOCB_IOPRIO;
return 0;
 }
 
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index d2a8313fabd7..c8f69a00b566 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -380,8 +380,11 @@ typedef int __bitwise __kernel_rwf_t;
 /* per-IO O_APPEND */
 #define RWF_APPEND ((__force __kernel_rwf_t)0x0010)
 
+/* per-IO IOPRIO */
+#define RWF_IOPRIO ((__force __kernel_rwf_t)0x0020)
+
 /* mask of flags supported by the kernel */
 #define RWF_SUPPORTED  (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
-RWF_APPEND)
+RWF_APPEND | RWF_IOPRIO)
 
 #endif /* _UAPI_LINUX_FS_H */
-- 
2.15.1



[PATCH 2/2] fs: Add aio priority support for block_dev

2018-04-30 Thread adam . manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

When the IOCB_IOPRIO flag is set because the user supplied iocb
has the RWF_IOPRIO flag is set then we set the priority value of
the kiocb from the iocb.

When a bio is created for an aio request by the block dev we set the
priority value of the bio to the user supplied value.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 fs/aio.c   | 9 +
 fs/block_dev.c | 1 +
 2 files changed, 10 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index 88d7927ffbc6..47591f9e7ba3 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1603,6 +1603,15 @@ static int io_submit_one(struct kioctx *ctx, struct iocb 
__user *user_iocb,
goto out_put_req;
}
 
+   if (req->common.ki_flags & IOCB_IOPRIO)
+   /*
+* The IOCB_IOPRIO flag is set when the user supplied iocb
+* aio_rw_flag field has the RWF_IOPRIO flag set. If so,
+* aio_reqprio is interpreted as a I/O scheduling class and
+* priority.
+*/
+   req->common.ki_ioprio = iocb->aio_reqprio;
+
ret = put_user(KIOCB_KEY, _iocb->aio_key);
if (unlikely(ret)) {
pr_debug("EFAULT: aio_key\n");
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..da1e94d2bb75 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -355,6 +355,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
+   bio->bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
-- 
2.15.1



[PATCH 2/2] fs: Add aio priority support for block_dev

2018-04-30 Thread adam . manzanares
From: Adam Manzanares 

When the IOCB_IOPRIO flag is set because the user supplied iocb
has the RWF_IOPRIO flag is set then we set the priority value of
the kiocb from the iocb.

When a bio is created for an aio request by the block dev we set the
priority value of the bio to the user supplied value.

Signed-off-by: Adam Manzanares 
---
 fs/aio.c   | 9 +
 fs/block_dev.c | 1 +
 2 files changed, 10 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index 88d7927ffbc6..47591f9e7ba3 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1603,6 +1603,15 @@ static int io_submit_one(struct kioctx *ctx, struct iocb 
__user *user_iocb,
goto out_put_req;
}
 
+   if (req->common.ki_flags & IOCB_IOPRIO)
+   /*
+* The IOCB_IOPRIO flag is set when the user supplied iocb
+* aio_rw_flag field has the RWF_IOPRIO flag set. If so,
+* aio_reqprio is interpreted as a I/O scheduling class and
+* priority.
+*/
+   req->common.ki_ioprio = iocb->aio_reqprio;
+
ret = put_user(KIOCB_KEY, _iocb->aio_key);
if (unlikely(ret)) {
pr_debug("EFAULT: aio_key\n");
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..da1e94d2bb75 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -355,6 +355,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
+   bio->bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
-- 
2.15.1



Re: [RFC PATCH] fs: block dev aio request priority support

2017-05-23 Thread Adam Manzanares
The 05/23/2017 10:46, Jan Kara wrote:
> On Mon 22-05-17 10:19:33, adam.manzana...@wdc.com wrote:
> > From: Adam Manzanares <adam.manzana...@wdc.com>
> > 
> > Map the aio_reqprio to the bio priority field at
> > the point the bio is created from the aio iocb.
> > 
> > The aio_reqprio field of iocb is used as a kernel IO class and priority
> > iff the IOCB_FLAG_IOPRIO flag is set on the iocb.
> > 
> > Late last year device IO priority support was introduced to reduce 
> > application
> > tail latency when iopriority information was set on the process [1]. This 
> > patch mapped iopriority information to block io requests. This information 
> > could be leveraged by device drivers to build device specific prioritized 
> > commands.
> > 
> > The iopriority is set on the iocontext which is a structure associated with 
> > a process. There exists a system call to set this iopriority information on 
> > a process, but I believe it would be useful to also have a mechanism to set 
> > priority on a per io command basis. 
> > 
> > The aio iocb has a field for the request priority which is currently not 
> > used 
> > within the kernel. This patch leverages this field to pass a per command 
> > iopriority value to devices. This work leverages the work in the previously 
> > referenced patch [1]. When the bio is generated from the iocb we copy the 
> > iocb iopriority information into the bio, which is eventually turned into a 
> > request which also gets a copy of the iopriority information. 
> > 
> > To demonstrate how to use this feature I modified fio to use the new aio 
> > feature. The modification to fio can be found at [2] and the new options 
> > are cmndprioclass and cmndprio.
> > 
> > [1] https://lkml.org/lkml/2016/12/6/495
> > [2] https://github.com/nmtadam/fio/tree/cmnd-prio.v2
> > 
> > Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
> 
> Using aio_flags is problematic because we never checked this for containing
> only expected flags. So userspace may be leaving this flag set
> unintentionally and currently it doesn't have any adverse effects. So it
> was decided to use a reserved word in struct iocb for new flags. And
> Goldwyn already did this as a part of his series [1] together with other IO
> flags. If you want, you can lobby for merging this particular patch earlier
> :).

Thanks for pointing this patch out, I missed it. I will base my work off of 
this patch given that adding new flags to aio_flags is problematic. 

> 
>   Honza
> 
> [1] https://patchwork.kernel.org/patch/9722865/
> 
> 
> > ---
> >  fs/aio.c | 9 +
> >  fs/block_dev.c   | 1 +
> >  include/linux/fs.h   | 1 +
> >  include/uapi/linux/aio_abi.h | 6 ++
> >  4 files changed, 17 insertions(+)
> > 
> > diff --git a/fs/aio.c b/fs/aio.c
> > index f52d925..a75a279 100644
> > --- a/fs/aio.c
> > +++ b/fs/aio.c
> > @@ -1568,6 +1568,15 @@ static int io_submit_one(struct kioctx *ctx, struct 
> > iocb __user *user_iocb,
> > req->common.ki_pos = iocb->aio_offset;
> > req->common.ki_complete = aio_complete;
> > req->common.ki_flags = iocb_flags(req->common.ki_filp);
> > +   if (iocb->aio_flags & IOCB_FLAG_IOPRIO)
> > +   /*
> > +* If the IOCB_FLAG_IOPRIO flag of aio_flags is set,
> > +* then the aio_reqprio is interpreted as a I/O
> > +* scheduling class and priority. This is then set
> > +* on the bio that is created from this request, which
> > +* enables the priority to be passed to device drivers.
> > +*/
> > +   req->common.ki_ioprio = iocb->aio_reqprio;
> >  
> > if (iocb->aio_flags & IOCB_FLAG_RESFD) {
> > /*
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index 2eca00e..20d18db 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -360,6 +360,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
> > *iter, int nr_pages)
> > bio->bi_iter.bi_sector = pos >> 9;
> > bio->bi_private = dio;
> > bio->bi_end_io = blkdev_bio_end_io;
> > +   bio->bi_ioprio = iocb->ki_ioprio;
> >  
> > ret = bio_iov_iter_get_pages(bio, iter);
> > if (unlikely(ret)) {
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 866c955..83135f0 100644
> > ---

Re: [RFC PATCH] fs: block dev aio request priority support

2017-05-23 Thread Adam Manzanares
The 05/23/2017 10:46, Jan Kara wrote:
> On Mon 22-05-17 10:19:33, adam.manzana...@wdc.com wrote:
> > From: Adam Manzanares 
> > 
> > Map the aio_reqprio to the bio priority field at
> > the point the bio is created from the aio iocb.
> > 
> > The aio_reqprio field of iocb is used as a kernel IO class and priority
> > iff the IOCB_FLAG_IOPRIO flag is set on the iocb.
> > 
> > Late last year device IO priority support was introduced to reduce 
> > application
> > tail latency when iopriority information was set on the process [1]. This 
> > patch mapped iopriority information to block io requests. This information 
> > could be leveraged by device drivers to build device specific prioritized 
> > commands.
> > 
> > The iopriority is set on the iocontext which is a structure associated with 
> > a process. There exists a system call to set this iopriority information on 
> > a process, but I believe it would be useful to also have a mechanism to set 
> > priority on a per io command basis. 
> > 
> > The aio iocb has a field for the request priority which is currently not 
> > used 
> > within the kernel. This patch leverages this field to pass a per command 
> > iopriority value to devices. This work leverages the work in the previously 
> > referenced patch [1]. When the bio is generated from the iocb we copy the 
> > iocb iopriority information into the bio, which is eventually turned into a 
> > request which also gets a copy of the iopriority information. 
> > 
> > To demonstrate how to use this feature I modified fio to use the new aio 
> > feature. The modification to fio can be found at [2] and the new options 
> > are cmndprioclass and cmndprio.
> > 
> > [1] https://lkml.org/lkml/2016/12/6/495
> > [2] https://github.com/nmtadam/fio/tree/cmnd-prio.v2
> > 
> > Signed-off-by: Adam Manzanares 
> 
> Using aio_flags is problematic because we never checked this for containing
> only expected flags. So userspace may be leaving this flag set
> unintentionally and currently it doesn't have any adverse effects. So it
> was decided to use a reserved word in struct iocb for new flags. And
> Goldwyn already did this as a part of his series [1] together with other IO
> flags. If you want, you can lobby for merging this particular patch earlier
> :).

Thanks for pointing this patch out, I missed it. I will base my work off of 
this patch given that adding new flags to aio_flags is problematic. 

> 
>   Honza
> 
> [1] https://patchwork.kernel.org/patch/9722865/
> 
> 
> > ---
> >  fs/aio.c | 9 +
> >  fs/block_dev.c   | 1 +
> >  include/linux/fs.h   | 1 +
> >  include/uapi/linux/aio_abi.h | 6 ++
> >  4 files changed, 17 insertions(+)
> > 
> > diff --git a/fs/aio.c b/fs/aio.c
> > index f52d925..a75a279 100644
> > --- a/fs/aio.c
> > +++ b/fs/aio.c
> > @@ -1568,6 +1568,15 @@ static int io_submit_one(struct kioctx *ctx, struct 
> > iocb __user *user_iocb,
> > req->common.ki_pos = iocb->aio_offset;
> > req->common.ki_complete = aio_complete;
> > req->common.ki_flags = iocb_flags(req->common.ki_filp);
> > +   if (iocb->aio_flags & IOCB_FLAG_IOPRIO)
> > +   /*
> > +* If the IOCB_FLAG_IOPRIO flag of aio_flags is set,
> > +* then the aio_reqprio is interpreted as a I/O
> > +* scheduling class and priority. This is then set
> > +* on the bio that is created from this request, which
> > +* enables the priority to be passed to device drivers.
> > +*/
> > +   req->common.ki_ioprio = iocb->aio_reqprio;
> >  
> > if (iocb->aio_flags & IOCB_FLAG_RESFD) {
> > /*
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index 2eca00e..20d18db 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -360,6 +360,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
> > *iter, int nr_pages)
> > bio->bi_iter.bi_sector = pos >> 9;
> > bio->bi_private = dio;
> > bio->bi_end_io = blkdev_bio_end_io;
> > +   bio->bi_ioprio = iocb->ki_ioprio;
> >  
> > ret = bio_iov_iter_get_pages(bio, iter);
> > if (unlikely(ret)) {
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 866c955..83135f0 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
>

[PATCH v2] mpt3sas: Recognize and act on iopriority info

2016-12-12 Thread Adam Manzanares
From: Adam Manzanares <adam.manzana...@wdc.com>

This patch adds support for request iopriority handling in the
mpt3sas layer. This works only when a ATA device is behind the
SATL. The ATA device also has to indicate that it supports
command priorities in the identify information that is pulled from
the SATL.

This patch depends on block: Add iocontext priority to request

v2:
 - Get iopriority class only if sysfs variable is set.

Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
---
 drivers/scsi/mpt3sas/mpt3sas_base.h  |  6 +
 drivers/scsi/mpt3sas/mpt3sas_ctl.c   | 43 ++--
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 34 +++-
 3 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h 
b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 3e71bc1..354cdc7 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -402,6 +402,9 @@ struct MPT3SAS_DEVICE {
u8  block;
u8  tlr_snoop_check;
u8  ignore_delay_remove;
+   /* Iopriority Command Handling */
+   u8  ncq_prio_enable;
+
 };
 
 #define MPT3_CMD_NOT_USED  0x8000  /* free */
@@ -1449,4 +1452,7 @@ mpt3sas_setup_direct_io(struct MPT3SAS_ADAPTER *ioc, 
struct scsi_cmnd *scmd,
struct _raid_device *raid_device, Mpi2SCSIIORequest_t *mpi_request,
u16 smid);
 
+/* NCQ Prio Handling Check */
+bool scsih_ncq_prio_supp(struct scsi_device *sdev);
+
 #endif /* MPT3SAS_BASE_H_INCLUDED */
diff --git a/drivers/scsi/mpt3sas/mpt3sas_ctl.c 
b/drivers/scsi/mpt3sas/mpt3sas_ctl.c
index 26cdc12..3ad8339 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_ctl.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_ctl.c
@@ -3290,8 +3290,6 @@ static DEVICE_ATTR(diag_trigger_mpi, S_IRUGO | S_IWUSR,
 
 /*** diagnostic trigger suppport *** END /
 
-
-
 /*/
 
 struct device_attribute *mpt3sas_host_attrs[] = {
@@ -3367,9 +3365,50 @@ _ctl_device_handle_show(struct device *dev, struct 
device_attribute *attr,
 }
 static DEVICE_ATTR(sas_device_handle, S_IRUGO, _ctl_device_handle_show, NULL);
 
+/**
+ * _ctl_device_ncq_io_prio_show - send prioritized io commands to device
+ * @dev - pointer to embedded device
+ * @buf - the buffer returned
+ *
+ * A sysfs 'read/write' sdev attribute, only works with SATA
+ */
+static ssize_t
+_ctl_device_ncq_prio_enable_show(struct device *dev,
+struct device_attribute *attr, char *buf)
+{
+   struct scsi_device *sdev = to_scsi_device(dev);
+   struct MPT3SAS_DEVICE *sas_device_priv_data = sdev->hostdata;
+
+   return snprintf(buf, PAGE_SIZE, "%d\n",
+   sas_device_priv_data->ncq_prio_enable);
+}
+
+static ssize_t
+_ctl_device_ncq_prio_enable_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+   struct scsi_device *sdev = to_scsi_device(dev);
+   struct MPT3SAS_DEVICE *sas_device_priv_data = sdev->hostdata;
+   bool ncq_prio_enable = 0;
+
+   if (kstrtobool(buf, _prio_enable))
+   return -EINVAL;
+
+   if (!scsih_ncq_prio_supp(sdev))
+   return -EINVAL;
+
+   sas_device_priv_data->ncq_prio_enable = ncq_prio_enable;
+   return strlen(buf);
+}
+static DEVICE_ATTR(sas_ncq_prio_enable, S_IRUGO | S_IWUSR,
+  _ctl_device_ncq_prio_enable_show,
+  _ctl_device_ncq_prio_enable_store);
+
 struct device_attribute *mpt3sas_dev_attrs[] = {
_attr_sas_address,
_attr_sas_device_handle,
+   _attr_sas_ncq_prio_enable,
NULL,
 };
 
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 209a969..a6d1045 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -4030,6 +4030,8 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd 
*scmd)
struct MPT3SAS_DEVICE *sas_device_priv_data;
struct MPT3SAS_TARGET *sas_target_priv_data;
struct _raid_device *raid_device;
+   struct request *rq = scmd->request;
+   int class;
Mpi2SCSIIORequest_t *mpi_request;
u32 mpi_control;
u16 smid;
@@ -4085,7 +4087,12 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd 
*scmd)
 
/* set tags */
mpi_control |= MPI2_SCSIIO_CONTROL_SIMPLEQ;
-
+   /* NCQ Prio supported, make sure control indicated high priority */
+   if (sas_device_priv_data->ncq_prio_enable) {
+   class = IOPRIO_PRIO_CLASS(req_get_ioprio(rq));
+   if (class == IOPRIO_CLASS_RT)
+   mpi_control |= 1 << MPI2_SCSIIO_CONTROL_CMDPRI_SHIFT;
+   }
/* Make sure Device is not raid volume.
 * We do not expose raid functionality to upper layer 

[PATCH v2] mpt3sas: Recognize and act on iopriority info

2016-12-12 Thread Adam Manzanares
From: Adam Manzanares 

This patch adds support for request iopriority handling in the
mpt3sas layer. This works only when a ATA device is behind the
SATL. The ATA device also has to indicate that it supports
command priorities in the identify information that is pulled from
the SATL.

This patch depends on block: Add iocontext priority to request

v2:
 - Get iopriority class only if sysfs variable is set.

Signed-off-by: Adam Manzanares 
---
 drivers/scsi/mpt3sas/mpt3sas_base.h  |  6 +
 drivers/scsi/mpt3sas/mpt3sas_ctl.c   | 43 ++--
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 34 +++-
 3 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h 
b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 3e71bc1..354cdc7 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -402,6 +402,9 @@ struct MPT3SAS_DEVICE {
u8  block;
u8  tlr_snoop_check;
u8  ignore_delay_remove;
+   /* Iopriority Command Handling */
+   u8  ncq_prio_enable;
+
 };
 
 #define MPT3_CMD_NOT_USED  0x8000  /* free */
@@ -1449,4 +1452,7 @@ mpt3sas_setup_direct_io(struct MPT3SAS_ADAPTER *ioc, 
struct scsi_cmnd *scmd,
struct _raid_device *raid_device, Mpi2SCSIIORequest_t *mpi_request,
u16 smid);
 
+/* NCQ Prio Handling Check */
+bool scsih_ncq_prio_supp(struct scsi_device *sdev);
+
 #endif /* MPT3SAS_BASE_H_INCLUDED */
diff --git a/drivers/scsi/mpt3sas/mpt3sas_ctl.c 
b/drivers/scsi/mpt3sas/mpt3sas_ctl.c
index 26cdc12..3ad8339 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_ctl.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_ctl.c
@@ -3290,8 +3290,6 @@ static DEVICE_ATTR(diag_trigger_mpi, S_IRUGO | S_IWUSR,
 
 /*** diagnostic trigger suppport *** END /
 
-
-
 /*/
 
 struct device_attribute *mpt3sas_host_attrs[] = {
@@ -3367,9 +3365,50 @@ _ctl_device_handle_show(struct device *dev, struct 
device_attribute *attr,
 }
 static DEVICE_ATTR(sas_device_handle, S_IRUGO, _ctl_device_handle_show, NULL);
 
+/**
+ * _ctl_device_ncq_io_prio_show - send prioritized io commands to device
+ * @dev - pointer to embedded device
+ * @buf - the buffer returned
+ *
+ * A sysfs 'read/write' sdev attribute, only works with SATA
+ */
+static ssize_t
+_ctl_device_ncq_prio_enable_show(struct device *dev,
+struct device_attribute *attr, char *buf)
+{
+   struct scsi_device *sdev = to_scsi_device(dev);
+   struct MPT3SAS_DEVICE *sas_device_priv_data = sdev->hostdata;
+
+   return snprintf(buf, PAGE_SIZE, "%d\n",
+   sas_device_priv_data->ncq_prio_enable);
+}
+
+static ssize_t
+_ctl_device_ncq_prio_enable_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+   struct scsi_device *sdev = to_scsi_device(dev);
+   struct MPT3SAS_DEVICE *sas_device_priv_data = sdev->hostdata;
+   bool ncq_prio_enable = 0;
+
+   if (kstrtobool(buf, _prio_enable))
+   return -EINVAL;
+
+   if (!scsih_ncq_prio_supp(sdev))
+   return -EINVAL;
+
+   sas_device_priv_data->ncq_prio_enable = ncq_prio_enable;
+   return strlen(buf);
+}
+static DEVICE_ATTR(sas_ncq_prio_enable, S_IRUGO | S_IWUSR,
+  _ctl_device_ncq_prio_enable_show,
+  _ctl_device_ncq_prio_enable_store);
+
 struct device_attribute *mpt3sas_dev_attrs[] = {
_attr_sas_address,
_attr_sas_device_handle,
+   _attr_sas_ncq_prio_enable,
NULL,
 };
 
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 209a969..a6d1045 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -4030,6 +4030,8 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd 
*scmd)
struct MPT3SAS_DEVICE *sas_device_priv_data;
struct MPT3SAS_TARGET *sas_target_priv_data;
struct _raid_device *raid_device;
+   struct request *rq = scmd->request;
+   int class;
Mpi2SCSIIORequest_t *mpi_request;
u32 mpi_control;
u16 smid;
@@ -4085,7 +4087,12 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd 
*scmd)
 
/* set tags */
mpi_control |= MPI2_SCSIIO_CONTROL_SIMPLEQ;
-
+   /* NCQ Prio supported, make sure control indicated high priority */
+   if (sas_device_priv_data->ncq_prio_enable) {
+   class = IOPRIO_PRIO_CLASS(req_get_ioprio(rq));
+   if (class == IOPRIO_CLASS_RT)
+   mpi_control |= 1 << MPI2_SCSIIO_CONTROL_CMDPRI_SHIFT;
+   }
/* Make sure Device is not raid volume.
 * We do not expose raid functionality to upper layer for warpdrive.
 */
@@ -9031,6 +9038,31 @@ scsih_pci_mmio_enabled(st

  1   2   >