On Wed, Jun 14, 2017 at 09:45:04PM -0600, Jens Axboe wrote:
> No functional changes in this patch, just in preparation for
> allowing applications to pass in hints about data life times
> for writes.
>
> Pack the i_write_hint field into a 2-byte hole, so we don't grow
> the size of the inode.
A
On 06/15/2017 01:40 AM, Michael Halcrow wrote:
> Several file systems either have already implemented encryption or are
> in the process of doing so. This addresses usability and storage
> isolation requirements on mobile devices and in multi-tenant
> environments.
>
> While distinct keys locked
On Wed, Jun 14, 2017 at 09:45:03PM -0600, Jens Axboe wrote:
> Useful to verify that things are working the way they should.
> Reading the file will return number of kb written to each
> stream. Writing the file will reset the statistics. No care
> is taken to ensure that we don't race on updates.
On Wed, Jun 14, 2017 at 01:24:43PM -0400, Jeff Layton wrote:
> In this smaller set, it's only really used for DAX.
DAX is implemented by only three filesystems, please just fix them
up in one go.
> sync_file_range: ->fsync isn't called directly there, and I think we
> probably want similar
On Wed, Jun 14, 2017 at 09:45:01PM -0600, Jens Axboe wrote:
> A new iteration of this patchset, previously known as write streams.
> As before, this patchset aims at enabling applications to split up
> writes into separate streams, based on the perceived life time
> of the data written. This is
I think Darrick has a very valid concern here - using RWF_* flags
to affect inode or fd-wide state is extremely counter productive.
Combined with the fact that the streams need a special setup in NVMe
I'm tempted to say that the interface really should be fadvise or
similar, which would keep the
The RPMB partition on the eMMC devices is a special area used
for storing cryptographically safe information signed by a
special secret key. To write and read records from this special
area, authentication is needed.
The RPMB area is *only* and *exclusively* accessed using
ioctl():s from
Instead of passing a struct mmc_blk_data * to mmc_blk_part_switch()
let's pass the actual partition type we want to switch to. This
is necessary in order not to have a block device with a backing
mmc_blk_data and request queue and all for every hardware partition,
such as RPMB.
Signed-off-by:
Instead of passing a block device to
mmc_blk_ioctl[_multi]_cmd(), let's pass struct mmc_blk_data()
so we operate ioctl()s on the MMC block device representation
rather than the vanilla block device.
This saves a little duplicated code and makes it possible to
issue ioctl()s not targeted for a
This function is used by the block layer queue to bail out of
requests if the current request is an RPMB request.
However this makes no sense: RPMB is only used from ioctl():s,
there are no RPMB accesses coming from the block layer.
An RPMB ioctl() always switches to the RPMB partition and
then
mmc_blk_ioctl() calls either mmc_blk_ioctl_cmd() or
mmc_blk_ioctl_multi_cmd() and each of these make the same
check. Factor it into a new helper function, call it on
both branches of the switch() statement and save a chunk
of duplicate code.
Cc: Shawn Lin
Signed-off-by:
Looking for ways to get rid of the RPMB "block device" and the
extra block queue. This is one approach, I don't know if it will
stick, let's discuss it, especially the RFC patch.
Patches 1,2,3 can be applied as cleanups unless they collide with
something else.
Patch 5 is a consequence of the
On Thu, 2017-06-15 at 01:22 -0700, Christoph Hellwig wrote:
> On Wed, Jun 14, 2017 at 01:24:43PM -0400, Jeff Layton wrote:
> > In this smaller set, it's only really used for DAX.
>
> DAX is implemented by only three filesystems, please just fix them
> up in one go.
>
Ok.
> > sync_file_range:
On Wed, Jun 14, 2017 at 09:15:03PM -0700, Darrick J. Wong wrote:
> > + */
> > +#define RWF_WRITE_LIFE_SHIFT 4
> > +#define RWF_WRITE_LIFE_MASK 0x00f0 /* 4 bits of stream ID */
> > +#define RWF_WRITE_LIFE_SHORT (1 << RWF_WRITE_LIFE_SHIFT)
> >
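The quoted definitions reserve four bits of the RWF_* flag word for a
lifetime hint. A minimal userspace sketch of how such a field could be
packed and unpacked is shown here. The shift, mask and
RWF_WRITE_LIFE_SHORT values are copied from the quoted hunk; the helper
names rwf_pack_life() and rwf_unpack_life() are hypothetical and are not
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values as they appear in the quoted hunk. */
#define RWF_WRITE_LIFE_SHIFT 4
#define RWF_WRITE_LIFE_MASK  0x00f0u /* 4 bits of stream ID */
#define RWF_WRITE_LIFE_SHORT (1u << RWF_WRITE_LIFE_SHIFT)

/*
 * Hypothetical helper: store a 4-bit hint in the flag word without
 * disturbing the other flag bits. Hint bits that do not fit in the
 * field are dropped by the mask.
 */
static inline uint32_t rwf_pack_life(uint32_t flags, uint32_t hint)
{
	flags &= ~RWF_WRITE_LIFE_MASK;
	flags |= (hint << RWF_WRITE_LIFE_SHIFT) & RWF_WRITE_LIFE_MASK;
	return flags;
}

/* Hypothetical helper: extract the 4-bit hint from the flag word. */
static inline uint32_t rwf_unpack_life(uint32_t flags)
{
	return (flags & RWF_WRITE_LIFE_MASK) >> RWF_WRITE_LIFE_SHIFT;
}
```

Keeping the hint in a masked field means the remaining RWF_* bits can
still be tested individually, which is the property the shift/mask pair
in the hunk relies on.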
On Mon, Jun 12, 2017 at 11:14:31PM -0700, Christoph Hellwig wrote:
> On Mon, Jun 12, 2017 at 05:38:13PM -0500, Goldwyn Rodrigues wrote:
> > We had FS_NOWAIT in filesystem type flags (in v3), but retracted it
> > later in v4.
>
> A per-fs flag is wrong as file_operation may have different
>
Hi Shaohua,
[auto build test ERROR on linus/master]
[also build test ERROR on v4.12-rc5]
[cannot apply to driver-core/driver-core-testing block/for-next next-20170615]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system]
url:
https://github.com
On 06/15/2017 02:12 AM, Christoph Hellwig wrote:
> On Wed, Jun 14, 2017 at 09:45:01PM -0600, Jens Axboe wrote:
>> A new iteration of this patchset, previously known as write streams.
>> As before, this patchset aims at enabling applications to split up
>> writes into separate streams, based on the
On 06/15/2017 02:17 AM, Christoph Hellwig wrote:
> On Wed, Jun 14, 2017 at 09:45:04PM -0600, Jens Axboe wrote:
>> No functional changes in this patch, just in preparation for
>> allowing applications to pass in hints about data life times
>> for writes.
>>
>> Pack the i_write_hint field into a
On 06/15/2017 02:16 AM, Christoph Hellwig wrote:
> On Wed, Jun 14, 2017 at 09:45:03PM -0600, Jens Axboe wrote:
>> Useful to verify that things are working the way they should.
>> Reading the file will return number of kb written to each
>> stream. Writing the file will reset the statistics. No
On 06/15/2017 02:19 AM, Christoph Hellwig wrote:
> I think Darrick has a very valid concern here - using RWF_* flags
> to affect inode or fd-wide state is extremely counter productive.
>
> Combined with the fact that the streams need a special setup in NVMe
> I'm tempted to say that the interface
From: Goldwyn Rodrigues
Reviewed-by: Christoph Hellwig
Reviewed-by: Jan Kara
Signed-off-by: Goldwyn Rodrigues
---
fs/read_write.c    | 12 +++-
include/linux/fs.h | 14 ++
2 files changed, 17 insertions(+),
From: Goldwyn Rodrigues
A new bio operation flag, REQ_NOWAIT, is introduced to identify bios
originating from an iocb with IOCB_NOWAIT. This flag indicates that
submission should return immediately if a request cannot be made,
instead of retrying.
Stacked devices such as md (the ones with
From: Goldwyn Rodrigues
If IOCB_NOWAIT is set, bail if the i_rwsem is not lockable
immediately.
If IOMAP_NOWAIT is set, return EAGAIN in xfs_file_iomap_begin
if it needs allocation due to file extension, writing to a hole,
or COW, or if it would wait for other DIOs to finish.
From: Goldwyn Rodrigues
RWF_NOWAIT tells the kernel to bail out if an AIO request would block
for reasons such as file allocation, a triggered writeback,
or request allocation while performing
direct I/O.
RWF_NOWAIT is translated to IOCB_NOWAIT for
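A minimal sketch of the kind of translation described above, in
userspace C. All SK_* constants and the function sk_set_rw_flags() are
hypothetical and chosen only for illustration; they are not the kernel's
definitions. The sketch assumes that unknown per-call flags are rejected
before any state is changed, and that a rejected call returns -EINVAL
(the error code is an assumption as well).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-call flags, as passed in by an application. */
#define SK_RWF_HIPRI     0x1u
#define SK_RWF_NOWAIT    0x8u
#define SK_RWF_SUPPORTED (SK_RWF_HIPRI | SK_RWF_NOWAIT)

/* Hypothetical internal iocb flags. */
#define SK_IOCB_HIPRI    0x10u
#define SK_IOCB_NOWAIT   0x80u

/*
 * Validate the per-call flags and translate them to internal iocb
 * flags. Returns 0 on success, or -EINVAL if an unsupported bit is
 * set, in which case *iocb_flags is left untouched.
 */
static inline int sk_set_rw_flags(unsigned int *iocb_flags, unsigned int rwf)
{
	if (rwf & ~SK_RWF_SUPPORTED)
		return -EINVAL;

	if (rwf & SK_RWF_NOWAIT)
		*iocb_flags |= SK_IOCB_NOWAIT;
	if (rwf & SK_RWF_HIPRI)
		*iocb_flags |= SK_IOCB_HIPRI;
	return 0;
}
```

Rejecting unknown bits up front is what allows new flags to be added
later without silently changing behavior for existing callers.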
From: Goldwyn Rodrigues
filemap_range_has_page() returns true if the file's mapping has
a page within the range mentioned. This function will be used
to check if a write() call will cause a writeback of previous
writes.
Reviewed-by: Christoph Hellwig
Reviewed-by:
From: Goldwyn Rodrigues
aio_rw_flags is introduced in struct iocb (using aio_reserved1) to
carry the RWF_* flags. We cannot use aio_flags because it is not
checked for validity, so reusing it may break existing applications.
Note, the only place RWF_HIPRI comes into effect is
From: Goldwyn Rodrigues
Find out if the write will trigger a wait due to writeback. If yes,
return -EAGAIN.
Return -EINVAL for buffered AIO: there are multiple causes of
delay such as page locks, dirty throttling logic, page loading
from disk etc. which cannot be taken care
This series adds a nonblocking feature to asynchronous I/O writes.
io_submit() can be delayed for a number of reasons:
- Block allocation for files
- Data writebacks for direct I/O
- Sleeping because of waiting to acquire i_rwsem
- Congested block device
The goal of the patch series is to
From: Goldwyn Rodrigues
IOCB_NOWAIT translates to IOMAP_NOWAIT for iomaps.
This is used by XFS in the XFS patch.
Reviewed-by: Christoph Hellwig
Reviewed-by: Jan Kara
Signed-off-by: Goldwyn Rodrigues
---
fs/iomap.c |
From: Goldwyn Rodrigues
Return EAGAIN if any of the following is true for direct I/O:
+ i_rwsem is not lockable
+ Writing beyond end of file (will trigger allocation)
+ Blocks are not allocated at the write location
Signed-off-by: Goldwyn Rodrigues
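The conditions listed above can be sketched as a single check in
userspace C. This is an illustration only, not the filesystem code: the
struct nowait_write and the function nowait_dio_check() are
hypothetical, and the sketch assumes the caller has already determined
whether the lock could be taken without sleeping and whether the target
blocks are allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state a nonblocking write depends on. */
struct nowait_write {
	bool lock_available;    /* trylock on i_rwsem succeeded */
	long long pos;          /* write offset */
	long long len;          /* write length */
	long long i_size;       /* current file size */
	bool blocks_allocated;  /* blocks exist at [pos, pos + len) */
};

/*
 * Return 0 if the write can proceed without blocking, or -EAGAIN if
 * any of the listed conditions would force it to wait.
 */
static inline int nowait_dio_check(const struct nowait_write *w)
{
	if (!w->lock_available)
		return -EAGAIN;         /* would sleep on i_rwsem */
	if (w->pos + w->len > w->i_size)
		return -EAGAIN;         /* extends the file: allocation */
	if (!w->blocks_allocated)
		return -EAGAIN;         /* hole: allocation */
	return 0;
}
```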
From: Goldwyn Rodrigues
Return EAGAIN if any of the following is true:
+ i_rwsem is not lockable
+ NODATACOW or PREALLOC is not set
+ Cannot nocow at the desired location
+ Writing beyond end of file which is not allocated
Acked-by: David Sterba
On 06/15/2017 07:24 PM, Michael Halcrow wrote:
...
>> If this is accepted, we basically allow attacker to trick system to
>> write plaintext to media just by setting this flag. This must never
>> ever happen with FDE - BY DESIGN.
>
> That's an important point. This expands the attack surface to
Hi Linus,
Just a single fix this week, fixing a regression introduced in this
series. When we put the final reference to the queue, we may need to
block. Ensure that we can safely do so. From Bart.
Please pull!
git://git.kernel.dk/linux-block.git for-linus
On Thu, 15 Jun 2017 10:59:52 -0500 Goldwyn Rodrigues wrote:
> This series adds a nonblocking feature to asynchronous I/O writes.
> io_submit() can be delayed for a number of reasons:
> - Block allocation for files
> - Data writebacks for direct I/O
> - Sleeping because
On Thu, 15 Jun 2017 10:59:54 -0500 Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues
>
> filemap_range_has_page() returns true if the file's mapping has
> a page within the range mentioned. This function will be used
> to check if a write() call will cause
Hi Shaohua,
[auto build test ERROR on linus/master]
[also build test ERROR on v4.12-rc5 next-20170615]
[cannot apply to driver-core/driver-core-testing block/for-next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system]
url:
https://github.com
From: Shaohua Li
Hi,
Currently blktrace isn't cgroup aware. blktrace prints out the task name of the
current context, but that task isn't always in the cgroup where the
BIO comes from, so we can't use the task name to find the IO cgroup. For example,
writeback BIOs always come
From: Shaohua Li
kernfs uses ida to manage inode numbers. The problem is that we can't get
a kernfs_node from an inode number with ida. Switch to idr; the next patch
will add an API to get a kernfs_node from an inode number.
Signed-off-by: Shaohua Li
---
fs/kernfs/dir.c |
From: Shaohua Li
Currently blktrace isn't cgroup aware. blktrace prints out the task name of
the current context, but that task isn't always in the
cgroup where the BIO comes from, so we can't use the task name to find the IO
cgroup. For example, writeback BIOs always come from
From: Shaohua Li
bio_free isn't a good place to free cgroup/integrity info. There are a
lot of cases where a bio is allocated in a special way (for example, on the
stack) and bio_put, and hence bio_free, is never called for it, so we leak
memory. This patch moves the free to bio endio, which should
From: Shaohua Li
Now we have the facilities to implement exportfs operations. The idea is
cgroup can export the fhandle info to userspace, then userspace uses
fhandle to find the cgroup name. Another example is userspace can get
fhandle for a cgroup and BPF uses the fhandle to
On Thu, Jun 15, 2017 at 09:33:39AM +0200, Milan Broz wrote:
> On 06/15/2017 01:40 AM, Michael Halcrow wrote:
> > Several file systems either have already implemented encryption or are
> > in the process of doing so. This addresses usability and storage
> > isolation requirements on mobile devices
From: Shaohua Li
Add an API to get kernfs node from inode number. We will need this to
implement exportfs operations.
To make the API lock free, kernfs node is freed in RCU context. And we
depend on kernfs_node count/ino number to filter stale kernfs nodes.
Signed-off-by: Shaohua
From: Shaohua Li
Currently cfq/bfq/blk-throttle output cgroup info in trace in their own
way. Now we have standard blktrace API for this, so convert them to use
it.
Note, this changes the behavior a little bit. cgroup info isn't output
by default, we only do this with 'blk_cgroup'
From: Shaohua Li
When working on adding exportfs operations in kernfs, I found it's hard
to initialize dentry->d_fsdata in the exportfs operations. It looks like
there is no way to do it without a race condition. Looking at the kernfs
code closely, there is no point in setting dentry->d_fsdata.
From: Shaohua Li
inode number and generation can identify a kernfs node. We are going to
export the identification by exportfs operations, so put ino and
generation into a separate structure. It's convenient when later patches
use the identification.
Signed-off-by: Shaohua Li
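A minimal userspace sketch of identifying a node by inode number plus
generation. The struct node_id, its packing into a 64-bit value
(generation in the high 32 bits) and all helper names are hypothetical;
the patch itself is not shown here, so this only illustrates why the
pair is kept together: a handle with a reused inode number but a
different generation can be recognised as stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identification: inode number plus generation. */
struct node_id {
	uint32_t ino;
	uint32_t generation;
};

/* Pack the pair into one 64-bit value (layout is an assumption). */
static inline uint64_t node_id_pack(struct node_id id)
{
	return ((uint64_t)id.generation << 32) | id.ino;
}

/* Inverse of node_id_pack(). */
static inline struct node_id node_id_unpack(uint64_t v)
{
	struct node_id id = {
		.ino = (uint32_t)v,
		.generation = (uint32_t)(v >> 32),
	};
	return id;
}

/*
 * A handle refers to the live node only if both fields match; a
 * matching inode number with a different generation is stale.
 */
static inline bool node_id_matches(struct node_id handle, struct node_id live)
{
	return handle.ino == live.ino && handle.generation == live.generation;
}
```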
From: Shaohua Li
By default we output cgroup id in blktrace. This adds an option to
display cgroup path. Since getting the cgroup path is a relatively heavy
operation, we don't enable it by default.
With the option enabled, blktrace will output something like this:
dd-1353 [007] d..2
From: Shaohua Li
Add an API to export cgroup fhandle info. We don't export a full 'struct
file_handle', since it contains info we don't need. Specifically, a cgroup is
always a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fhandle,
we only need to export the inode number and
From: Shaohua Li
blkcg_bio_issue_check() already gets blkcg for a BIO.
bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap
operation. There is no reason not to attach the cgroup info to the bio in
blkcg_bio_issue_check. This also makes blktrace output the correct cgroup
On Thu, 15 Jun 2017 16:51:41 -0500 Goldwyn Rodrigues wrote:
> > I have only minor quibbles - I'll grab the patch series for some -next
> > testing (at least).
> >
>
> I agree to the quibbles you have on patch 02/10. Should I send the
> entire fixed series, just the 02/10
After the issue with the LO_HI_LONG definition on x86_64-linux-gnu, I planned to add
this patch to verify that the above patch correctly checks for invalid flags (which
would also have shown this issue with LO_HI_LONG being used on p{read,write}v2).
However, it seems to trigger what I think is a kernel
Hi Zanella,
On 06/15/2017 04:10 PM, Adhemerval Zanella wrote:
> After the issue with LO_HI_LONG definition on x86_64-linux-gnu, I planned to
> add
> this patch to check the above patch for correct check for invalid flags (which
> would also have shown this issue with LO_HI_LONG being used on
>
On 06/15/2017 01:25 PM, Andrew Morton wrote:
> On Thu, 15 Jun 2017 10:59:52 -0500 Goldwyn Rodrigues wrote:
>
>> This series adds a nonblocking feature to asynchronous I/O writes.
>> io_submit() can be delayed for a number of reasons:
>> - Block allocation for files
>> -
Hi,
On 2017/6/15 20:12, Linus Walleij wrote:
mmc_blk_ioctl() calls either mmc_blk_ioctl_cmd() or
mmc_blk_ioctl_multi_cmd() and each of these make the same
check. Factor it into a new helper function, call it on
both branches of the switch() statement and save a chunk
of duplicate code.
Cc:
On 06/15/2017 05:01 PM, Andrew Morton wrote:
> On Thu, 15 Jun 2017 16:51:41 -0500 Goldwyn Rodrigues wrote:
>
>>> I have only minor quibbles - I'll grab the patch series for some -next
>>> testing (at least).
>>>
>>
>> I agree to the quibbles you have on patch 02/10. Should I
When a loop device is being shut down the backing file is
closed with fput(). This is different from how close(2)
closes files - it uses filp_close().
The difference is important for filesystems which provide a ->flush
file operation such as NFS. NFS assumes a flush will always
be called on last
Hi Jens,
one of these is a resend of a patch I sent a while back.
The other is new - loop closes files differently from close()
and in a way that can confuse NFS.
Thanks,
NeilBrown
---
NeilBrown (2):
loop: use filp_close() rather than fput()
loop: Add PF_LESS_THROTTLE to
When a filesystem is mounted from a loop device, writes are
throttled by balance_dirty_pages() twice: once when writing
to the filesystem and once when the loop_handle_cmd() writes
to the backing file. This double-throttling can trigger
positive feedback loops that create significant delays. The
On Thu, May 11 2017, NeilBrown wrote:
> On Tue, May 02 2017, NeilBrown wrote:
>
>> This is a revision of my series of patches working
>> towards removing the bioset work queues.
>
> Hi Jens,
> could I get some feed-back about your thoughts on this series?
> Will you apply it? When? Do I need
On Thu, Jun 15, 2017 at 06:42:12AM -0400, Jeff Layton wrote:
> Correct.
>
> But if there is a data writeback error, should we report an error on all
> open fds at that time (like we will for fsync)?
We should in theory, but I don't see how to properly do it. In addition
sync_file_range just
On 06/15/2017 08:21 AM, Jens Axboe wrote:
> On 06/15/2017 02:19 AM, Christoph Hellwig wrote:
>> I think Darrick has a very valid concern here - using RWF_* flags
>> to affect inode or fd-wide state is extremely counter productive.
>>
>> Combined with the fact that the streams need a special setup
On Thu, Jun 15, 2017 at 12:11:58PM +0100, Al Viro wrote:
> Which flags are you talking about? aio ones? AFAICS, it's the same
> kind of thing as "can we lseek?" or "can we read/pread?", etc.
> What would that field look like? Note that some of those might depend
> upon the flags passed to
On Thu, 2017-06-15 at 07:57 -0700, Christoph Hellwig wrote:
> On Thu, Jun 15, 2017 at 06:42:12AM -0400, Jeff Layton wrote:
> > Correct.
> >
> > But if there is a data writeback error, should we report an error on all
> > open fds at that time (like we will for fsync)?
>
> We should in theory,
Useful to verify that things are working the way they should.
Reading the file will return number of kb written to each
stream. Writing the file will reset the statistics. No care
is taken to ensure that we don't race on updates.
Drivers will write to q->stream_writes[] if they handle a stream.
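The behaviour described above (per-stream counters, read reports
kilobytes, write resets, no locking) can be sketched in userspace C as
follows. Only the stream_writes[] array name comes from the message;
struct fake_queue, NR_STREAMS, the helper names and the "streamN: kb"
output format are assumptions made for this illustration. As in the
description, nothing here guards against concurrent updates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NR_STREAMS 4 /* assumption: number of streams tracked */

/* Hypothetical stand-in for the request queue. */
struct fake_queue {
	unsigned long long stream_writes[NR_STREAMS]; /* bytes per stream */
};

/* What a driver would do when it handles a write for a stream. */
static inline void stream_account(struct fake_queue *q, unsigned int stream,
				  unsigned long long bytes)
{
	if (stream < NR_STREAMS)
		q->stream_writes[stream] += bytes; /* unlocked, may race */
}

/*
 * "Reading the file": one line per stream with the kb written.
 * Returns the number of characters stored, or -1 if buf is too small.
 */
static inline int stream_show(const struct fake_queue *q, char *buf, size_t len)
{
	size_t off = 0;

	for (unsigned int i = 0; i < NR_STREAMS; i++) {
		int n = snprintf(buf + off, len - off, "stream%u: %llu\n",
				 i, q->stream_writes[i] >> 10);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}

/* "Writing the file": reset all statistics. */
static inline void stream_reset(struct fake_queue *q)
{
	memset(q->stream_writes, 0, sizeof(q->stream_writes));
}
```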
A new iteration of this patchset, previously known as write streams.
As before, this patchset aims at enabling applications to split up
writes into separate streams, based on the perceived life time
of the data written. This is useful for a variety of reasons:
- For NVMe, this feature is ratified
We map the WRITE_HINT_* life time hints to the internal flags.
Drivers can then, in turn, map those flags to a suitable stream
type.
Signed-off-by: Jens Axboe
---
block/bio.c | 16
include/linux/bio.h | 1 +
include/linux/blk_types.h | 5
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/ext4/page-io.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 1a82138ba739..764bf0ddecd4 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/buffer.c | 14 +-
fs/mpage.c | 1 +
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 161be58c5cb0..3faf73a71d4b 100644
--- a/fs/buffer.c
Reviewed-by: Andreas Dilger
Signed-off-by: Chris Mason
Signed-off-by: Jens Axboe
---
fs/btrfs/extent_io.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d3619e010005..2bc2dfca87c2 100644
---
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/xfs/xfs_aops.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 09af0f7cd55e..fe11fe47d235 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@
No functional changes in this patch, just in preparation for
allowing applications to pass in hints about data life times
for writes. Set aside 3 bits for carrying hint information
in the inode flags.
Adds the public hints as well, which are:
WRITE_HINT_NONE No hints about write life
We have a pwritev2(2) interface based on passing in flags. Add an
fcntl interface for querying these flags, and also for setting them
as well:
F_GET_WRITE_LIFE Returns one of the valid type of write hints,
like WRITE_HINT_MEDIUM.
F_SET_WRITE_LIFE Pass in a
This adds support for Directives in NVMe, particularly for the Streams
directive. Support for Directives is a new feature in NVMe 1.3. It
allows a user to pass in information about where to store the data,
so that the device can do so most efficiently. If an application is
managing and writing data
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/block_dev.c | 2 ++
fs/direct-io.c | 2 ++
fs/iomap.c | 1 +
3 files changed, 5 insertions(+)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 51959936..de4301168710 100644
---
Add four flags for the pwritev2(2) system call, allowing an application
to give the kernel a hint about what on-media life times can be
expected from a given write.
The intent is for these values to be relative to each other, no
absolute meaning should be attached to these flag names.
Set aside
On 06/15/2017 09:59 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues
>
> A new bio operation flag REQ_NOWAIT is introduced to identify bio's
> originating from iocb with IOCB_NOWAIT. This flag indicates
> to return immediately if a request cannot be made instead
> of