On 06/14/2017 09:57 PM, Jens Axboe wrote:
> On 06/14/2017 09:53 PM, Andreas Dilger wrote:
>> On Jun 14, 2017, at 9:26 PM, Jens Axboe wrote:
>>> On Wed, Jun 14, 2017 at 5:39 PM, Andreas Dilger wrote:
On Jun 14, 2017, at 10:04 AM, Martin K. Petersen
File systems can encrypt some of their data blocks with their own
encryption keys, and for those blocks another round of encryption at
the dm-crypt layer may be redundant, depending on the keys being used.
This patch enables dm-crypt to observe the REQ_NOENCRYPT flag as an
indicator that a bio
Having lower layers such as dm-crypt observe the REQ_NOENCRYPT flag
helps the I/O stack avoid redundant encryption, improving performance
and power utilization.
Note that lower layers must be consistent in their observation of this
flag in order to avoid the possibility of data corruption.
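The consistency requirement can be illustrated with a toy userspace model. This is a minimal sketch under stated assumptions, not dm-crypt code: the flag value `TOY_REQ_NOENCRYPT`, the helpers `toy_write()`/`toy_read()`, and the single-byte XOR "cipher" are all hypothetical stand-ins. If the write path honours the flag but the read path ignores it, the data read back is garbled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit; not the kernel's REQ_NOENCRYPT value. */
#define TOY_REQ_NOENCRYPT (1u << 0)

/* Toy "cipher": XOR with a one-byte key, so it is its own inverse.
 * Illustration only, NOT real cryptography. */
static void toy_xor(unsigned char *buf, size_t len, unsigned char key)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= key;
}

/* Lower-layer write path: encrypt unless the upper layer says it already did. */
static void toy_write(unsigned char *media, const unsigned char *data,
		      size_t len, unsigned int flags, unsigned char key)
{
	memcpy(media, data, len);
	if (!(flags & TOY_REQ_NOENCRYPT))
		toy_xor(media, len, key);
}

/* Read path: must make the same decision the write path made. */
static void toy_read(unsigned char *out, const unsigned char *media,
		     size_t len, unsigned int flags, unsigned char key)
{
	memcpy(out, media, len);
	if (!(flags & TOY_REQ_NOENCRYPT))
		toy_xor(out, len, key);
}
```

Reading a block that was written with `TOY_REQ_NOENCRYPT` but passing `flags = 0` on the read applies the XOR to data that never received it, which is the corruption case the patch description warns about.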
On Wed, Jun 14, 2017 at 09:45:05PM -0600, Jens Axboe wrote:
> Add four flags for the pwritev2(2) system call, allowing an application
> to give the kernel a hint about what on-media life times can be
> expected from a given write.
>
> The intent is for these values to be relative to each other,
On Wed, Jun 14, 2017 at 07:55:17AM -0400, Jeff Layton wrote:
> On Tue, 2017-06-13 at 16:40 +0800, Eryu Guan wrote:
> > On Mon, Jun 12, 2017 at 08:42:13AM -0400, Jeff Layton wrote:
> > > Make a new btrfs/999 test that works the way Chris Mason suggested:
> > >
> > > Build a filesystem with 2
On Jun 14, 2017, at 9:26 PM, Jens Axboe wrote:
> On Wed, Jun 14, 2017 at 5:39 PM, Andreas Dilger wrote:
>> On Jun 14, 2017, at 10:04 AM, Martin K. Petersen
>> wrote:
>>> Christoph,
>>>
I think what Martin wants (or at least
On 06/14/2017 09:53 PM, Andreas Dilger wrote:
> On Jun 14, 2017, at 9:26 PM, Jens Axboe wrote:
>> On Wed, Jun 14, 2017 at 5:39 PM, Andreas Dilger wrote:
>>> On Jun 14, 2017, at 10:04 AM, Martin K. Petersen
>>> wrote:
On 06/14/2017 10:15 PM, Darrick J. Wong wrote:
>> diff --git a/fs/read_write.c b/fs/read_write.c
>> index 47c1d4484df9..9cb2314efca3 100644
>> --- a/fs/read_write.c
>> +++ b/fs/read_write.c
>> @@ -678,7 +678,7 @@ static ssize_t do_iter_readv_writev(struct file *filp,
>> struct iov_iter *iter,
>>
When both the file system and a lower layer such as dm-crypt encrypt
the same file contents, the double encryption costs performance and
power. Depending on how the operating environment manages the
encryption keys, there is often no significant security benefit to
encrypting the same data twice.
File systems
On Wed, Jun 14, 2017 at 5:39 PM, Andreas Dilger wrote:
> On Jun 14, 2017, at 10:04 AM, Martin K. Petersen
> wrote:
>> Christoph,
>>
>>> I think what Martin wants (or at least what I'd want him to want) is
>>> to define a few REQ_* bits that mirror
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/buffer.c | 14 +-
fs/mpage.c | 1 +
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 161be58c5cb0..3faf73a71d4b 100644
--- a/fs/buffer.c
We map the RWF_WRITE_* life time flags to the internal flags.
Drivers can then, in turn, map those flags to a suitable stream
type.
Signed-off-by: Jens Axboe
---
block/bio.c | 16
include/linux/bio.h | 1 +
include/linux/blk_types.h | 5
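A lookup table is one simple way to do this mapping; the quoted patch further down uses a table of the same shape (`rwf_write_to_opf_flag[]`). The sketch below is a userspace model only: every `TOY_*` name and bit value is hypothetical, not the kernel's definition.

```c
#include <assert.h>

/* Hypothetical userspace hint values (0 = no hint); the real RWF_* encoding may differ. */
enum toy_rwf_life {
	TOY_RWF_LIFE_NONE = 0,
	TOY_RWF_LIFE_SHORT,
	TOY_RWF_LIFE_MEDIUM,
	TOY_RWF_LIFE_LONG,
	TOY_RWF_LIFE_EXTREME,
};

/* Hypothetical internal request flag bits. */
#define TOY_REQ_WRITE_SHORT   (1u << 8)
#define TOY_REQ_WRITE_MEDIUM  (1u << 9)
#define TOY_REQ_WRITE_LONG    (1u << 10)
#define TOY_REQ_WRITE_EXTREME (1u << 11)

/* Index = userspace hint, value = internal flag. */
static const unsigned int toy_rwf_to_req[] = {
	0, TOY_REQ_WRITE_SHORT, TOY_REQ_WRITE_MEDIUM,
	TOY_REQ_WRITE_LONG, TOY_REQ_WRITE_EXTREME,
};

/* Return the internal flag for a hint; 0 for "no hint" or out-of-range input. */
static unsigned int toy_hint_to_req_flag(unsigned int hint)
{
	if (hint >= sizeof(toy_rwf_to_req) / sizeof(toy_rwf_to_req[0]))
		return 0;
	return toy_rwf_to_req[hint];
}
```

Bounds-checking the index keeps an unknown hint from reading past the table; a driver then only has to look at the internal flag.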
No functional changes in this patch, we just add four flags
that will be used to denote a stream type, and ensure that we
don't merge across different stream types.
Signed-off-by: Jens Axboe
---
block/blk-merge.c | 16
include/linux/blk_types.h | 11
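One way to keep requests with different hints apart is to compare the hint as part of the merge check. This sketch assumes, hypothetically, that the hint is packed as a small value into a 3-bit field of `cmd_flags` (five states: no hint plus four life times); the field position and all names are illustrative, not taken from blk-merge.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout: the life-time hint occupies a 3-bit field in cmd_flags. */
#define TOY_HINT_SHIFT 8
#define TOY_HINT_MASK  (0x7u << TOY_HINT_SHIFT)

struct toy_req {
	unsigned long long sector;  /* start sector */
	unsigned int nr_sectors;    /* length in sectors */
	unsigned int cmd_flags;     /* op flags, including the packed hint */
};

static unsigned int toy_req_hint(const struct toy_req *rq)
{
	return (rq->cmd_flags & TOY_HINT_MASK) >> TOY_HINT_SHIFT;
}

/* Back-merge 'next' into 'rq' only if contiguous AND the stream type matches. */
static bool toy_can_back_merge(const struct toy_req *rq, const struct toy_req *next)
{
	if (rq->sector + rq->nr_sectors != next->sector)
		return false;
	return toy_req_hint(rq) == toy_req_hint(next);
}
```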
Useful to verify that things are working the way they should.
Reading the file will return the number of KB written to each
stream. Writing the file will reset the statistics. No care
is taken to ensure that we don't race on updates.
Drivers will write to q->stream_writes[] if they handle a stream.
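A minimal userspace model of those statistics, assuming a hypothetical `struct toy_queue` with one byte counter per stream (the patch's `q->stream_writes[]` is only named here, not shown, so its element type, the stream count and the output format are guesses). Like the patch, the counters are updated without any synchronization.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_NR_STREAMS 5 /* hypothetical: "no hint" plus four life-time classes */

struct toy_queue {
	unsigned long long stream_writes[TOY_NR_STREAMS]; /* bytes per stream */
};

/* Driver side: account a write of 'bytes' to 'stream'. Unsynchronized, racy by design. */
static void toy_account_write(struct toy_queue *q, unsigned int stream,
			      unsigned long long bytes)
{
	if (stream < TOY_NR_STREAMS)
		q->stream_writes[stream] += bytes;
}

/* "Read" the stats file: KB written per stream, space separated.
 * Returns the string length, or -1 if 'buf' is too small. */
static int toy_show_stats(const struct toy_queue *q, char *buf, size_t len)
{
	int off = 0;

	for (unsigned int i = 0; i < TOY_NR_STREAMS; i++) {
		int n = snprintf(buf + off, len - off, "%s%llu", i ? " " : "",
				 q->stream_writes[i] >> 10);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += n;
	}
	return off;
}

/* "Write" the stats file: reset all counters. */
static void toy_reset_stats(struct toy_queue *q)
{
	memset(q->stream_writes, 0, sizeof(q->stream_writes));
}
```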
Add four flags for the pwritev2(2) system call, allowing an application
to give the kernel a hint about what on-media life times can be
expected from a given write.
The intent is for these values to be relative to each other; no
absolute meaning should be attached to these flag names.
Define
A new iteration of this patchset, previously known as write streams.
As before, this patchset aims at enabling applications to split up
writes into separate streams, based on the perceived life time
of the data written. This is useful for a variety of reasons:
- With NVMe 1.3 compliant devices, the
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/block_dev.c | 2 ++
fs/direct-io.c | 2 ++
fs/iomap.c | 1 +
3 files changed, 5 insertions(+)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 51959936..de4301168710 100644
---
No functional changes in this patch, just in preparation for
allowing applications to pass in hints about data life times
for writes.
Pack the i_write_hint field into a 2-byte hole, so we don't grow
the size of the inode.
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
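The padding argument can be checked with `sizeof` and `offsetof`. The struct below is a hypothetical miniature, not `struct inode`; it assumes a common ABI where `unsigned int` is 4 bytes, `unsigned short` is 2 bytes and pointers are at least 4-byte aligned, so the compiler leaves a 2-byte hole after `i_bytes`. A tool such as `pahole` reports such holes in real structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature "inode" with a 2-byte hole before the pointer. */
struct toy_inode_before {
	unsigned int   i_flags;   /* 4 bytes */
	unsigned short i_bytes;   /* 2 bytes, then 2 bytes of padding */
	void          *i_private;
};

/* Same layout with a 2-byte hint placed in the former padding. */
struct toy_inode_after {
	unsigned int   i_flags;
	unsigned short i_bytes;
	unsigned short i_write_hint; /* occupies the hole, struct does not grow */
	void          *i_private;
};
```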
Reviewed-by: Andreas Dilger
Signed-off-by: Chris Mason
Signed-off-by: Jens Axboe
---
fs/btrfs/extent_io.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d3619e010005..2bc2dfca87c2 100644
---
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/ext4/page-io.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 1a82138ba739..764bf0ddecd4 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/xfs/xfs_aops.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 09af0f7cd55e..fe11fe47d235 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@
On Wed, Jun 14, 2017 at 01:05:33PM -0600, Jens Axboe wrote:
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
Thanks Jens!
Signed-off-by: Chris Mason
---
fs/btrfs/extent_io.c | 1 +
1 file changed, 1 insertion(+)
diff --git
On Jun 14, 2017, at 10:04 AM, Martin K. Petersen
wrote:
> Christoph,
>
>> I think what Martin wants (or at least what I'd want him to want) is
>> to define a few REQ_* bits that mirror the RWF bits, use that to
>> transfer the information down the stack, and then
Jens,
> A new iteration of this patchset, previously known as write streams.
> As before, this patchset aims at enabling applications to split up
> writes into separate streams, based on the perceived life time
> of the data written. This is useful for a variety of reasons:
>
> - With NVMe 1.3
From: Shaohua Li
Currently blktrace isn't cgroup aware. blktrace prints out the task name
of the current context, but the task of the current context isn't always
in the cgroup the BIO comes from, so we can't use the task name to find
out the IO cgroup. For example, writeback BIOs always come from
From: Shaohua Li
Now we have the facilities to implement exportfs operations. The idea is
that cgroup can export the fhandle info to userspace, and userspace then
uses the fhandle to find the cgroup name. Another example is that
userspace can get the fhandle for a cgroup and BPF uses the fhandle to
From: Shaohua Li
bio_free isn't a good place to free cgroup/integrity info. There are
many cases where a bio is allocated in a special way (for example, on
the stack) and bio_put, and hence bio_free, is never called for it, so
we leak memory. This patch moves the free to bio endio, which should
From: Shaohua Li
Set i_generation for kernfs inode. This is required to implement exportfs
operations.
Note that the generation is 32-bit, so it's possible for the generation
to wrap around and for us to find stale files. The possibility is low,
since an fhandle must match both the inode number and the generation. In
From: Shaohua Li
The inode number and generation together identify a kernfs node. We are
going to export this identification via exportfs operations, so put the
ino and generation into a separate structure. This is convenient when
later patches use the identification.
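A sketch of what such a separate identification structure might look like, assuming 32-bit inode numbers and generations; `union toy_kernfs_node_id` and its helpers are hypothetical names. Overlaying the pair with a 64-bit value lets both halves be compared at once, matching the rule that a handle refers to a node only when inode number and generation both agree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical id: inode number + generation, also viewable as one 64-bit value.
 * Uses a C11 anonymous struct; the two fields fill all 8 bytes, so 'id' covers both. */
union toy_kernfs_node_id {
	struct {
		uint32_t ino;
		uint32_t generation;
	};
	uint64_t id;
};

static union toy_kernfs_node_id toy_make_id(uint32_t ino, uint32_t gen)
{
	union toy_kernfs_node_id nid;

	nid.id = 0;
	nid.ino = ino;
	nid.generation = gen;
	return nid;
}

/* Same node only if BOTH ino and generation match (a reused ino gets a new generation). */
static int toy_same_node(union toy_kernfs_node_id a, union toy_kernfs_node_id b)
{
	return a.id == b.id;
}
```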
Please note, I extend inode
From: Shaohua Li
Currently cfq/bfq/blk-throttle output cgroup info in their traces in
their own way. Now that we have a standard blktrace API for this, convert
them to use it.
Note that this changes the behavior slightly: cgroup info isn't output
by default; we only do this with 'blk_cgroup'
From: Shaohua Li
By default we output the cgroup id in blktrace. This adds an option to
display the cgroup path. Since getting the cgroup path is a relatively
heavy operation, we don't enable it by default.
With the option enabled, blktrace will output something like this:
dd-1353 [007] d..2
On 06/13/17 10:54, Ross Zwisler wrote:
> This commit is causing the following kernel BUG for me when I shut
> down my systems:
>
> BUG: sleeping function called from invalid context at
> kernel/workqueue.c:2790
> in_atomic(): 1, irqs_disabled(): 0, pid: 41, name: rcuop/3
Thanks Ross for the
On 06/14/2017 09:45 AM, Martin K. Petersen wrote:
>
> Jens,
>
>> A new iteration of this patchset, previously known as write streams.
>> As before, this patchset aims at enabling applications to split up
>> writes into separate streams, based on the perceived life time
>> of the data written. This
On Wed, Jun 14, 2017 at 09:53:05AM -0600, Jens Axboe wrote:
> So how about we just call it "write_hint"? It sounds mostly like a
> naming issue to me, as you would then map that to some specific stream
> in your driver. You're free to do that right now. They are all flags,
> it's just packed as a
Jens,
> So how about we just call it "write_hint"? It sounds mostly like a
> naming issue to me, as you would then map that to some specific stream
> in your driver. You're free to do that right now. They are all flags,
> it's just packed as a value to not waste too much space.
Sure, that's
From: Shaohua Li
Hi,
Currently blktrace isn't cgroup aware. blktrace prints out the task name of the
current context, but the task of the current context isn't always in the cgroup
the BIO comes from, so we can't use the task name to find out the IO cgroup.
For example, writeback BIOs always come
From: Shaohua Li
When working on adding exportfs operations to kernfs, I found it's hard
to initialize dentry->d_fsdata in the exportfs operations. It looks like
there is no way to do it without a race condition. Looking at the kernfs
code closely, there is no point in setting dentry->d_fsdata.
On 06/14/2017 10:04 AM, Martin K. Petersen wrote:
>
> Christoph,
>
>> I think what Martin wants (or at least what I'd want him to want) is
>> to define a few REQ_* bits that mirror the RWF bits, use that to
>> transfer the information down the stack, and then only translate it
>> to stream ids
On 06/14/2017 10:00 AM, Martin K. Petersen wrote:
>
> Jens,
>
>> So how about we just call it "write_hint"? It sounds mostly like a
>> naming issue to me, as you would then map that to some specific stream
>> in your driver. You're free to do that right now. They are all flags,
>> it's just
On 06/14/2017 10:01 AM, Christoph Hellwig wrote:
> On Wed, Jun 14, 2017 at 09:53:05AM -0600, Jens Axboe wrote:
>> So how about we just call it "write_hint"? It sounds mostly like a
>> naming issue to me, as you would then map that to some specific stream
>> in your driver. You're free to do that
Christoph,
> I think what Martin wants (or at least what I'd want him to want) is
> to define a few REQ_* bits that mirror the RWF bits, use that to
> transfer the information down the stack, and then only translate it
> to stream ids in the driver.
Yup. If we have enough space in the existing
From: Shaohua Li
kernfs uses an ida to manage inode numbers. The problem is that with an
ida we can't get the kernfs_node from an inode number. Switch to an idr;
the next patch will add an API to get a kernfs_node from an inode number.
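The difference between the two allocators, sketched in userspace: an ida only records which numbers are taken, while an idr also stores a pointer per number, so a number can be turned back into its object. `struct toy_idr` is a hypothetical fixed-size stand-in, not the kernel's idr API. Note how a freed number is handed out again, which is why a generation number is needed to tell incarnations apart.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_IDS 64 /* hypothetical fixed capacity; the kernel idr grows dynamically */

/* idr-like map: each allocated id remembers its object, enabling id -> object lookup. */
struct toy_idr {
	void *slots[TOY_MAX_IDS];
};

/* Allocate the lowest free id >= start and bind 'ptr' to it; -1 if full. */
static int toy_idr_alloc(struct toy_idr *idr, void *ptr, int start)
{
	assert(ptr != NULL);
	if (start < 0)
		start = 0;
	for (int i = start; i < TOY_MAX_IDS; i++) {
		if (!idr->slots[i]) {
			idr->slots[i] = ptr;
			return i;
		}
	}
	return -1;
}

static void *toy_idr_find(const struct toy_idr *idr, int id)
{
	if (id < 0 || id >= TOY_MAX_IDS)
		return NULL;
	return idr->slots[id];
}

static void toy_idr_remove(struct toy_idr *idr, int id)
{
	if (id >= 0 && id < TOY_MAX_IDS)
		idr->slots[id] = NULL;
}
```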
Signed-off-by: Shaohua Li
---
fs/kernfs/dir.c|
From: Shaohua Li
Add an API to export cgroup fhandle info. We don't export a full 'struct
file_handle', since it contains info we don't need. Specifically, a cgroup
is always a directory, so we don't need a 'FILEID_KERNFS_WITH_PARENT' type
fhandle; we only need to export the inode number and
From: Shaohua Li
blkcg_bio_issue_check() already gets the blkcg for a BIO, and
bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap
operation. There is no reason not to attach the cgroup info to the bio in
blkcg_bio_issue_check(). This also makes blktrace output the correct cgroup
From: Shaohua Li
Add an API to get a kernfs node from its inode number. We will need this
to implement exportfs operations.
To make the API lock-free, kernfs nodes are freed in RCU context, and we
rely on the kernfs_node count and inode number to filter out stale kernfs
nodes.
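The stale-node filter can be sketched with C11 atomics. This is a hypothetical model: `struct toy_node` and `toy_find_node()` are illustrative, the plain array stands in for the idr, and the RCU-deferred freeing that makes the lockless read safe in the kernel is not modelled. A reference is taken only if the count is still non-zero (the pattern of the kernel's `atomic_inc_not_zero()`), and the node is rejected if its inode number or generation doesn't match.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_node {
	atomic_int count;     /* 0 means the node is on its way to being freed */
	uint32_t ino;
	uint32_t generation;
};

/* Take a reference only if the count is non-zero. */
static bool toy_get_unless_zero(struct toy_node *kn)
{
	int c = atomic_load(&kn->count);

	while (c != 0) {
		/* On failure, 'c' is reloaded with the current value and we retry. */
		if (atomic_compare_exchange_weak(&kn->count, &c, c + 1))
			return true;
	}
	return false;
}

/* Look up 'ino' in a hypothetical table and reject stale entries. */
static struct toy_node *toy_find_node(struct toy_node **table, size_t n,
				      uint32_t ino, uint32_t generation)
{
	if (ino >= n || !table[ino])
		return NULL;
	struct toy_node *kn = table[ino];
	if (!toy_get_unless_zero(kn))
		return NULL;                      /* being freed */
	if (kn->ino != ino || kn->generation != generation) {
		atomic_fetch_sub(&kn->count, 1);  /* wrong incarnation: drop our ref */
		return NULL;
	}
	return kn;                                /* caller now holds a reference */
}
```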
Signed-off-by: Shaohua
On 06/14/2017 10:04 AM, Martin K. Petersen wrote:
>
> Christoph,
>
>> I think what Martin wants (or at least what I'd want him to want) is
>> to define a few REQ_* bits that mirror the RWF bits, use that to
>> transfer the information down the stack, and then only translate it
>> to stream ids
On Wed, Jun 14, 2017 at 9:19 AM, Bart Van Assche
wrote:
> On 06/13/17 10:54, Ross Zwisler wrote:
>> This commit is causing the following kernel BUG for me when I shut
>> down my systems:
>>
>> BUG: sleeping function called from invalid context at
>>
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/buffer.c | 14 +-
fs/mpage.c | 1 +
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 161be58c5cb0..8324c24751ca 100644
--- a/fs/buffer.c
This adds support for Directives in NVMe, in particular for the Streams
directive. Support for Directives is a new feature in NVMe 1.3. It
allows a user to pass in information about where to store the data,
so that the device can do so most efficiently. If an application is
managing and writing data
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/block_dev.c | 2 ++
fs/direct-io.c | 2 ++
fs/iomap.c | 1 +
3 files changed, 5 insertions(+)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 51959936..31ba4a8f0a28 100644
---
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/btrfs/extent_io.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d3619e010005..b245085e8f10 100644
--- a/fs/btrfs/extent_io.c
+++
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/ext4/page-io.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 1a82138ba739..033b5bfa4e0b 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@
On 06/14/2017 09:19 AM, Bart Van Assche wrote:
> Subject: [PATCH] block: Fix a blk_exit_rl() regression
>
> Avoid that the following complaint is reported:
>
> BUG: sleeping function called from invalid context at kernel/workqueue.c:2790
> in_atomic(): 1, irqs_disabled(): 0, pid: 41, name:
No functional changes in this patch, just in preparation for
allowing applications to pass in hints about data life times
for writes.
Pack the i_stream field into a 2-byte hole, so we don't grow
the size of the inode.
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/xfs/xfs_aops.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 09af0f7cd55e..9770be0140ad 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@
On 06/14/17 12:28, Jens Axboe wrote:
> I added this, but the above is really a horrible changelog. It doesn't
> say how the problem is fixed. I added some verbiage to that effect.
Hello Jens,
Thanks for having fixed up the changelog and for already having picked
up this patch. I was going to
On Tue, Jun 13, 2017 at 06:24:32AM -0400, Jeff Layton wrote:
> That's definitely what I want for the endgame here. My plan was to add
> this flag for now, and then eventually reverse it (or drop it) once all
> or most filesystems are converted.
>
> We can do it that way from the get-go if you
On Wed, Jun 14, 2017 at 01:05:27PM -0600, Jens Axboe wrote:
> Add four flags for the pwritev2(2) system call, allowing an application
> to give the kernel a hint about what on-media life times can be
> expected from a given write.
>
> The intent is for these values to be relative to each other,
On Wed, Jun 14, 2017 at 01:05:26PM -0600, Jens Axboe wrote:
> No functional changes in this patch, just in preparation for
> allowing applications to pass in hints about data life times
> for writes.
>
> Pack the i_stream field into a 2-byte hole, so we don't grow
> the size of the inode.
Can't
> +static const unsigned int rwf_write_to_opf_flag[] = {
> + 0, REQ_WRITE_SHORT, REQ_WRITE_MEDIUM, REQ_WRITE_LONG, REQ_WRITE_EXTREME
> +};
> +
> +/*
> + * 'rwf_flags' is one of RWF_WRITE_LIFE_* values
> + */
> +void bio_set_streamid(struct bio *bio, unsigned int rwf_flags)
> +{
> +
Btw, I think these could also easily map to the DSM field in the NVMe
write command, except that these unfortunately mix in read information
as well.
> + __REQ_WRITE_SHORT, /* short life time write */
-> Frequent writes and infrequent reads to the LBA range indicated.
or
-> Frequent
On 06/14/2017 02:26 PM, Christoph Hellwig wrote:
> On Wed, Jun 14, 2017 at 01:05:27PM -0600, Jens Axboe wrote:
>> Add four flags for the pwritev2(2) system call, allowing an application
>> to give the kernel a hint about what on-media life times can be
>> expected from a given write.
>>
>> The
On 06/14/2017 02:25 PM, Christoph Hellwig wrote:
> On Wed, Jun 14, 2017 at 01:05:26PM -0600, Jens Axboe wrote:
>> No functional changes in this patch, just in preparation for
>> allowing applications to pass in hints about data life times
>> for writes.
>>
>> Pack the i_stream field into a 2-byte
On 06/14/2017 02:32 PM, Christoph Hellwig wrote:
>> +static unsigned int nvme_get_write_stream(struct nvme_ns *ns,
>> + struct request *req)
>> +{
>> + unsigned int streamid = 0;
>> +
>> + if (req_op(req) != REQ_OP_WRITE ||
> +static unsigned int nvme_get_write_stream(struct nvme_ns *ns,
> + struct request *req)
> +{
> + unsigned int streamid = 0;
> +
> + if (req_op(req) != REQ_OP_WRITE || !blk_stream_valid(req->cmd_flags) ||
> + !ns->nr_streams)
> +
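Following the structure of the quoted `nvme_get_write_stream()` (writes only, a valid hint, and a namespace that has streams), a userspace sketch of the driver-side mapping could look like this. The types and the clamping policy for more hint classes than granted streams are hypothetical; the posted patch's actual mapping is not shown in the excerpt.

```c
#include <assert.h>

enum toy_op { TOY_OP_READ, TOY_OP_WRITE };

struct toy_ns { unsigned int nr_streams; };                  /* streams the device granted */
struct toy_request { enum toy_op op; unsigned int hint; };   /* hint: 0 = none, 1..4 */

/* Pick an NVMe stream id for a request; 0 means "no stream". */
static unsigned int toy_get_write_stream(const struct toy_ns *ns,
					 const struct toy_request *req)
{
	if (req->op != TOY_OP_WRITE || req->hint == 0 || ns->nr_streams == 0)
		return 0;
	/* Hypothetical policy: fold hints beyond the granted streams onto the highest one. */
	return req->hint <= ns->nr_streams ? req->hint : ns->nr_streams;
}
```

Keeping the RWF-derived hint generic in the block layer and translating it to a stream id only here is the split Christoph and Martin describe earlier in the thread.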
On 06/14/2017 02:37 PM, Christoph Hellwig wrote:
> Btw, I think these could also easily map to the DSM field in the NVMe
> write command, except that these unfortunately mix in read information
> as well.
But that's the problem, they are read/write mixed flags. I'd much
rather keep them separate. If
On Tue, 2017-06-13 at 16:40 +0800, Eryu Guan wrote:
> On Mon, Jun 12, 2017 at 08:42:13AM -0400, Jeff Layton wrote:
> > Make a new btrfs/999 test that works the way Chris Mason suggested:
> >
> > Build a filesystem with 2 devices that stripes the data across
> > both devices, but mirrors metadata