Use nproc to get the number of CPUs for fio jobs and introduce a
_run_fio_rand_io helper for parallel I/O where we do not care about
the details and just want some I/O.
Signed-off-by: Johannes Thumshirn
---
common/fio | 7 +++
tests/block/005 | 4 +---
tests/block/006
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups work with const attribute_group, so mark
the non-const structs as const.
File size before:
text data bss dec hex filename
11622 912 2076 14610 3912
Bypass if: bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND)
Writeback if: op_is_sync(bio->bi_opf) || bio->bi_opf & (REQ_META|REQ_PRIO)
Signed-off-by: Eric Wheeler
---
drivers/md/bcache/request.c | 3 +++
drivers/md/bcache/writeback.h | 3 ++-
2 files changed, 5
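The two hint rules above can be modeled as a small pure function. This is an illustrative userspace sketch, not the kernel code: the REQ_* bit values, the precedence of writeback over bypass, and the helper name classify_bio are all assumptions made for the example.

```c
#include <stdbool.h>

/* Illustrative flag bits; the real REQ_* values live in linux/blk_types.h. */
#define REQ_RAHEAD     (1u << 0)
#define REQ_BACKGROUND (1u << 1)
#define REQ_SYNC       (1u << 2)
#define REQ_META       (1u << 3)
#define REQ_PRIO       (1u << 4)

enum cache_hint { HINT_NONE, HINT_BYPASS, HINT_WRITEBACK };

/* Assumed precedence: sync/meta/prio IO is pinned to writeback even if
 * it also carries a read-ahead or background flag; otherwise plain
 * read-ahead or background IO bypasses the cache. */
static enum cache_hint classify_bio(unsigned int opf)
{
    if ((opf & REQ_SYNC) || (opf & (REQ_META | REQ_PRIO)))
        return HINT_WRITEBACK;
    if (opf & (REQ_RAHEAD | REQ_BACKGROUND))
        return HINT_BYPASS;
    return HINT_NONE;
}
```

A bio with only REQ_RAHEAD set would bypass, while one carrying REQ_META would be written back.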
Signed-off-by: Eric Wheeler
---
Documentation/bcache.txt | 80
1 file changed, 80 insertions(+)
diff --git a/Documentation/bcache.txt b/Documentation/bcache.txt
index a9259b5..c78c012 100644
---
Add sysfs entries to support hinting bypass/writeback by the ioprio
assigned to the bio. If the bio is unassigned, use current's io-context
ioprio for cache writeback or bypass (configured per-process with
`ionice`).
Having idle IOs bypass the cache can increase performance elsewhere
since
continue_at() doesn't have a return statement anymore.
Signed-off-by: Dan Carpenter
---
drivers/md/bcache/closure.h | 4
1 file changed, 4 deletions(-)
diff --git a/drivers/md/bcache/closure.h b/drivers/md/bcache/closure.h
index 1ec84ca..295b7e4 100644
---
Since bypassed IOs use no buckets, do not subtract them from
sectors_to_gc, which triggers the gc thread.
Signed-off-by: tang.junhui
Reviewed-by: Eric Wheeler
Cc: sta...@vger.kernel.org
---
drivers/md/bcache/request.c | 6 +++---
1 file changed, 3
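The accounting rule above can be sketched as a toy model. The struct and helper name below are hypothetical; only the idea (bypassed IOs must not drain the gc budget) comes from the patch description.

```c
#include <stdbool.h>

/* Toy model: sectors_to_gc counts down toward the next gc wakeup.
 * Bypassed IOs allocate no bucket, so they must not drain the budget. */
struct cache_set { long sectors_to_gc; };

static bool account_io(struct cache_set *c, long sectors, bool bypass)
{
    if (bypass)
        return false;            /* no bucket used: no gc pressure  */
    c->sectors_to_gc -= sectors;
    return c->sectors_to_gc < 0; /* true: time to wake the gc thread */
}
```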
When there is not enough dirty data in the writeback cache, the
writeback rate stays at a minimum of 1 key per second
until all dirty data is cleaned. This is inefficient
and also wastes energy.
In this patch, when there is not enough dirty data,
let the writeback rate drop to 0, and writeback
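The rate clamp described above can be sketched as follows. This is a simplified stand-in for bcache's proportional rate controller; the function name, the gain parameter, and the 1-key floor placement are assumptions for illustration.

```c
/* Sketch: clamp the writeback rate to zero when dirty data is already
 * at or below target, instead of trickling at a 1-key/sec floor. */
static long writeback_rate(long dirty, long target, long proportional_gain)
{
    long error = dirty - target;
    if (error <= 0)
        return 0;                    /* nothing to do: stop, save power */
    long rate = error / proportional_gain;
    return rate > 0 ? rate : 1;      /* keep a floor only when behind   */
}
```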
In olden times, closure_return() used to have a hidden return built in.
We removed the hidden return but forgot to add a new return here. If
"c" were NULL we would oops on the next line, but fortunately "c" is
never NULL. Let's just remove the if statement.
Signed-off-by: Dan Carpenter
bcache called ida_simple_remove() with a minor that had already been
multiplied by BCACHE_MINORS, causing the wrong minor to be released and
the id to leak.
In addition, when adding partition support to bcache, the name assignment
was not updated, resulting in numbers jumping (bcache0, bcache16,
bcache32...). This has
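The minor/index confusion above comes down to a scaling factor that must be applied symmetrically. A minimal sketch, assuming BCACHE_MINORS is 16 (the real value and helper names may differ):

```c
/* Each bcache device reserves BCACHE_MINORS minor numbers for its
 * partitions, so the ida index and the first minor differ by that
 * factor.  Freeing must convert back, or the wrong id is released. */
#define BCACHE_MINORS 16  /* assumed value for this sketch */

static int idx_to_first_minor(int idx)   { return idx * BCACHE_MINORS; }
static int first_minor_to_idx(int minor) { return minor / BCACHE_MINORS; }
```

Passing the scaled first_minor straight to the ida release (instead of converting it back with first_minor_to_idx) frees the wrong id, which matches the leak described above.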
From: Tony Asleson
If bch_cached_dev_attach encounters any error it will return a negative
error code. The variable 'v' which stores the result is unsigned, thus user
space sees a very large value returned for bytes written, which can cause
incorrect user space behavior. Utilize one signed variable to use
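The sign-loss bug above is easy to reproduce in isolation. A hedged userspace demonstration (the function names are made up; only the unsigned-vs-signed pattern mirrors the patch):

```c
#include <errno.h>

/* Buggy pattern: a negative errno stored in an unsigned type turns
 * into a huge positive "bytes written" count. */
static unsigned long attach_unsigned(int err)
{
    unsigned long v = (unsigned long)err; /* sign is lost here */
    return v;
}

/* Fixed pattern: one signed variable preserves the error all the way
 * back to user space. */
static long attach_signed(int err)
{
    long v = err;
    return v;
}
```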
bucket_in_use is updated in the gc thread, which is triggered by
invalidating or writing sectors_to_gc of dirty data. That interval is
too long, so when we compare bucket_in_use with the threshold the value
is often stale, which leads to inaccurate judgment and often results in
bucket depletion.
Signed-off-by: Tang Junhui
Sequential write IOs were tested with bs=1M by FIO in writeback cache
mode. These IOs were expected to be bypassed, but actually they were not.
We debugged the code and found in check_should_bypass():
if (!congested &&
mode == CACHE_MODE_WRITEBACK &&
op_is_write(bio_op(bio)) &&
The thin flash device does not initialize stripe_sectors_dirty
correctly; this patch fixes the issue.
Signed-off-by: Tang Junhui
Cc: sta...@vger.kernel.org
---
drivers/md/bcache/super.c | 3 ++-
drivers/md/bcache/writeback.c | 8
drivers/md/bcache/writeback.h |
If blkdev_get_by_path() in register_bcache() fails, we try to lookup the
block device using lookup_bdev() to detect which situation we are in to
properly report error. However we never drop the reference returned to
us from lookup_bdev(). Fix that.
Signed-off-by: Jan Kara
Cc:
Currently we only allocate 6 open buckets for each cache set, but we
usually attach about 10 backend devices to each cache set, and each
bcache device is accessed by about 10 threads in the top application
layer. So 6 open buckets are too few; it has led to
gc and write-back race with each other (see the email "bcache get
stucked" I sent before):
gc thread               write-back thread
|                       |bch_writeback_thread()
|bch_gc_thread()        |
|
Since dirty sectors of the thin flash device cannot be used to cache
data for the backend device, we should subtract them when calculating
the writeback rate.
Signed-off-by: Tang Junhui
Cc: sta...@vger.kernel.org
---
drivers/md/bcache/writeback.c | 2 +-
drivers/md/bcache/writeback.h | 19
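The subtraction described above can be sketched as a toy error term for the rate controller. The struct and field names are hypothetical; only the "exclude flash-only dirty sectors" rule comes from the patch.

```c
/* Toy proportional error: the dirty count used for the backing
 * device's writeback rate should exclude sectors dirtied by thin
 * flash volumes, since those never write back to the backing device. */
struct cache_set_stats {
    long dirty_sectors;        /* all dirty sectors in the cache set  */
    long flash_dirty_sectors;  /* dirty sectors owned by flash devices */
    long target;               /* writeback dirty-data target          */
};

static long writeback_error(const struct cache_set_stats *s)
{
    long cached_dirty = s->dirty_sectors - s->flash_dirty_sectors;
    return cached_dirty - s->target;  /* >0 means write back faster */
}
```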
set_gc_sectors() has already been called in bch_gc_thread(), and it was
called again in bch_btree_gc_finish(). The latter call is unnecessary,
so delete it.
Signed-off-by: Tang Junhui
---
drivers/md/bcache/btree.c | 1 -
1 file changed, 1 deletion(-)
diff --git
Some missed IOs are not counted in cache_misses; this patch fixes the
issue.
Signed-off-by: tang.junhui
Reviewed-by: Eric Wheeler
Cc: sta...@vger.kernel.org
---
drivers/md/bcache/request.c | 6 +-
1 file changed, 5 insertions(+), 1
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups work with const attribute_group, so mark
the non-const structs as const.
File size before:
text data bss dec hex filename
5302 544 0 5846 16d6
If we don't have the block layer enabled, we do not present card
status and extcsd in the debugfs.
Debugfs is not ABI, and maintaining files of no relevance for
non-block devices comes at a high maintenance cost if we shall
support it with the block layer compiled out.
The debugfs entries suffer
We have a data pointer for the ioctl() data, but we need to
pass other data along with the DRV_OP:s, so make this a
void * so it can be reused.
Signed-off-by: Linus Walleij
---
ChangeLog v3->v4:
- No changes just resending
ChangeLog v2->v3:
- No changes just resending
On 28.06.2017 16:58, Javier Gonzalez wrote:
>> On 28 Jun 2017, at 16.33, Carl-Daniel Hailfinger
>> wrote:
>>
>> thanks for the pointer to the github reporting page.
>> I'll answer your questions here (to make them indexable by search
>> engines in case someone
On 06/30/2017 07:05 AM, Brian King wrote:
> On 06/29/2017 09:17 PM, Jens Axboe wrote:
>> On 06/29/2017 07:20 PM, Ming Lei wrote:
>>> On Fri, Jun 30, 2017 at 2:42 AM, Jens Axboe wrote:
On 06/29/2017 10:00 AM, Jens Axboe wrote:
> On 06/29/2017 09:58 AM, Jens Axboe wrote:
Remove unused variable.
Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
---
drivers/lightnvm/pblk-sysfs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/lightnvm/pblk-sysfs.c
Prevent pblk->lines being double freed in case of an error during pblk
initialization.
Fixes: dd2a43437337 ("lightnvm: pblk: sched. metadata on write thread")
Reported-by: Dan Carpenter
Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
When a read is directed to the cache, we risk that the lba has been
updated between the time we made the L2P table lookup and the time we
actually read from the cache. We intentionally do not hold the L2P lock
so as not to block other threads.
While strict ordering is not a guarantee at this level
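The lookup-copy-recheck pattern described above can be sketched in plain C. This is a simplified model, not pblk's code: the l2p struct, the flat cache buffer, and the helper name are all invented for the example.

```c
#include <stdbool.h>
#include <string.h>

/* Sketch: look up the cache position for an lba, copy the data without
 * holding the L2P lock, then verify the mapping is unchanged.  If it
 * moved in the meantime, the caller must retry the lookup. */
struct l2p { unsigned long addr[16]; };

static bool read_from_cache(const struct l2p *map, int lba,
                            const char *cache, char *dst, int len)
{
    unsigned long before = map->addr[lba];   /* snapshot the mapping   */
    memcpy(dst, cache + before * len, len);  /* copy outside the lock  */
    return map->addr[lba] == before;         /* false: lba was updated */
}
```

A false return means the copied data may be stale and the read has to be redone from a fresh L2P lookup.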
On 06/30/2017 09:56 AM, Javier González wrote:
> Hi Jens,
>
> Here you have a second round of fixes for pblk. They are in essence bug
> fixes including a double-free reported by Dan.
>
> There is also regression fix for pblk removal, which was introduced with
> the new metadata scheduler. This
On Fri, Jun 30, 2017 at 12:45:54PM -0400, Jeff Layton wrote:
> Should I aim to do that with an individual patch for each fs, or is it
> better to do a swath of them all at once in a single patch here?
I'd be perfectly happy with one big patch for all the trivial
conversions.
Fix bad metadata buffer assignments introduced when refactoring the
metadata write path.
Fixes: dd2a43437337 ("lightnvm: pblk: sched. metadata on write thread")
Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
---
On Thu, 2017-06-29 at 07:12 -0700, Christoph Hellwig wrote:
> Nice and simple, this looks great!
>
> Reviewed-by: Christoph Hellwig
Thanks! I think this turned out to be a lot cleaner too.
For filesystems that use filemap_write_and_wait_range today this now
becomes a pretty
Do bitmap checks only when debug mode is enabled. The line bitmap used
for mapping to physical addresses is fairly large (~512KB) and it is
expensive to do these checks on the fast path.
Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
---
When user threads place data into the write buffer, they reserve space
and do the memory copy out of the lock. As a consequence, when the write
thread starts persisting data, there is a chance that it is not copied
yet. In this case, avoid polling, and schedule before retrying.
Signed-off-by:
When removing a pblk instance, pad the current line using asynchronous
I/O. This reduces the removal time from ~1 minute in the worst case to a
couple of seconds.
Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
---
Use the right types and conversions on le64 variables. Reported by
sparse.
Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
---
drivers/lightnvm/pblk-core.c | 2 +-
drivers/lightnvm/pblk-gc.c | 5 -
Add a sanity check to the pblk initialization sequence in order to
ensure that enough LUNs have been allocated to store the line metadata.
Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
---
drivers/lightnvm/pblk-init.c | 6 ++
1 file
For now, we allocate a per I/O buffer for GC data. Since the potential
size of the buffer is 256KB and GC is not in the fast path, do this
allocation with vmalloc. This puts less pressure on the memory
allocator at no performance cost.
Signed-off-by: Javier González
Hi Jens,
Here you have a second round of fixes for pblk. They are in essence bug
fixes including a double-free reported by Dan.
There is also a regression fix for pblk removal, which was introduced
with the new metadata scheduler. This fix makes removing a pblk instance
again take at most 2
Hi Max,
I remembered you reporting this. I think this is a regression introduced
with the scheduling, since ->rqs[] isn't static anymore. ->static_rqs[]
is, but that's not indexable by the tag we find. So I think we need to
guard those with a NULL check. The actual requests themselves are
static,
This is based on the old idea and code from Milosz Tanski. With the
aio nowait code it becomes mostly trivial now.
Signed-off-by: Christoph Hellwig
---
fs/aio.c          |  6 --
fs/btrfs/file.c   |  9 ++---
fs/ext4/file.c    |  6 +++---
fs/xfs/xfs_file.c | 11
This series resurrects the old patches from Milosz to implement
non-blocking buffered reads. Thanks to the non-blocking AIO code from
Goldwyn the implementation becomes pretty much trivial. As that
implementation is in the block tree I would suggest that we merge
these patches through the block
And rename it to the more descriptive generic_file_buffered_read while
at it.
Signed-off-by: Christoph Hellwig
---
mm/filemap.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 742034e56100..3df0a57cd48e 100644
From: Milosz Tanski
Allow generic_file_buffered_read to bail out early instead of waiting for
the page lock or reading a page if IOCB_NOWAIT is specified.
Signed-off-by: Milosz Tanski
Reviewed-by: Christoph Hellwig
Reviewed-by: Jeff Moyer
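The early-bail rule above can be modeled with a tiny page-state check. This is a userspace sketch under stated assumptions: the IOCB_NOWAIT bit value, the page_state struct, and the helper name are illustrative, not the kernel's.

```c
#include <stdbool.h>
#include <errno.h>

#define IOCB_NOWAIT (1 << 0)  /* illustrative flag bit */

struct page_state { bool uptodate; bool locked; };

/* With IOCB_NOWAIT set, a buffered read must not sleep on a page lock
 * or wait for a page to be read; it returns -EAGAIN so the caller can
 * retry from a context where blocking is allowed. */
static int buffered_read_page(const struct page_state *pg, int iocb_flags)
{
    if (pg->uptodate && !pg->locked)
        return 0;                 /* data already cached: proceed  */
    if (iocb_flags & IOCB_NOWAIT)
        return -EAGAIN;           /* would block: bail out early   */
    /* ... otherwise block: wait for the lock / issue the read ... */
    return 0;
}
```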
On 06/30/2017 09:08 AM, Jens Axboe wrote:
Compared with the totally percpu approach, this way might help 1:M or
N:M mapping, but won't help 1:1 map(NVMe), when hctx is mapped to
each CPU(especially there are huge hw queues on a big system), :-(
>>>
>>> Not disagreeing with that,
On 06/30/2017 01:15 PM, Christoph Hellwig wrote:
> This is based on the old idea and code from Milosz Tanski. With the
> aio nowait code it becomes mostly trivial now.
>
Looks Good.
Reviewed-by: Goldwyn Rodrigues
--
Goldwyn
From: Liang Chen
mutex_destroy() does nothing most of the time, but it is better to call
it to make the code future-proof, and it also has some value for things
like mutex debugging.
Signed-off-by: Liang Chen
Reviewed-by: Eric Wheeler
On 06/30/2017 06:26 PM, Jens Axboe wrote:
> On 06/30/2017 05:23 PM, Ming Lei wrote:
>> Hi Bian,
>>
>> On Sat, Jul 1, 2017 at 2:33 AM, Brian King wrote:
>>> On 06/30/2017 09:08 AM, Jens Axboe wrote:
>>> Compared with the totally percpu approach, this way might help
Hi Bian,
On Sat, Jul 1, 2017 at 2:33 AM, Brian King wrote:
> On 06/30/2017 09:08 AM, Jens Axboe wrote:
> Compared with the totally percpu approach, this way might help 1:M or
> N:M mapping, but won't help 1:1 map(NVMe), when hctx is mapped to
> each
This patch removes the PCI device from the kernel's topology tree
if the device is no longer present.
Commit ddf097ec1d44c9648c4738d7cf2819411b44253a (NVMe: Unbind driver on
failure) left the PCI device in the kernel's topology upon device failure.
However, this does not work well for the slot
On 06/30/2017 08:08 AM, Jens Axboe wrote:
> On 06/30/2017 07:05 AM, Brian King wrote:
>> On 06/29/2017 09:17 PM, Jens Axboe wrote:
>>> On 06/29/2017 07:20 PM, Ming Lei wrote:
On Fri, Jun 30, 2017 at 2:42 AM, Jens Axboe wrote:
> On 06/29/2017 10:00 AM, Jens Axboe wrote:
On 06/30/2017 10:17 PM, Jens Axboe wrote:
> On 06/30/2017 08:08 AM, Jens Axboe wrote:
>> On 06/30/2017 07:05 AM, Brian King wrote:
>>> On 06/29/2017 09:17 PM, Jens Axboe wrote:
On 06/29/2017 07:20 PM, Ming Lei wrote:
> On Fri, Jun 30, 2017 at 2:42 AM, Jens Axboe wrote:
On 06/30/2017 05:23 PM, Ming Lei wrote:
> Hi Bian,
>
> On Sat, Jul 1, 2017 at 2:33 AM, Brian King wrote:
>> On 06/30/2017 09:08 AM, Jens Axboe wrote:
>> Compared with the totally percpu approach, this way might help 1:M or
>> N:M mapping, but won't help 1:1
On 06/29/2017 09:17 PM, Jens Axboe wrote:
> On 06/29/2017 07:20 PM, Ming Lei wrote:
>> On Fri, Jun 30, 2017 at 2:42 AM, Jens Axboe wrote:
>>> On 06/29/2017 10:00 AM, Jens Axboe wrote:
On 06/29/2017 09:58 AM, Jens Axboe wrote:
> On 06/29/2017 02:40 AM, Ming Lei wrote: