[PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO

2017-06-30 Thread Tang Junhui
Bypassed IOs use no bucket, so do not subtract them from sectors_to_gc to trigger the gc thread. Signed-off-by: tang.junhui Reviewed-by: Eric Wheeler Cc: sta...@vger.kernel.org --- drivers/md/bcache/request.c | 6 +++--- 1 file changed, 3

[PATCH 15/19] bcache: fix issue of writeback rate at minimum 1 key per second

2017-06-30 Thread Tang Junhui
ver, then go to step 2). 2) Loop in bch_writeback_thread() to check if there is enough dirty data for writeback. If there is not enough dirty data for writing, then sleep 10 seconds; otherwise, write dirty data to the backing device. Signed-off-by: Tang Junhui <tang.jun...@zte.com.cn> --- drivers/md/bcac

[PATCH 05/19] bcache: fix calling ida_simple_remove() with incorrect minor

2017-06-30 Thread Tang Junhui
bcache called ida_simple_remove() with a minor that had already been multiplied by BCACHE_MINORS, which caused the wrong minor to be released and leaked. In addition, when adding partition support to bcache, the name assignment was not updated, resulting in numbers jumping (bcache0, bcache16, bcache32...). This has

[PATCH 12/19] bcache: update bucket_in_use periodically

2017-06-30 Thread Tang Junhui
-by: Tang Junhui <tang.jun...@zte.com.cn> --- drivers/md/bcache/btree.c | 29 +++-- 1 file changed, 27 insertions(+), 2 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 866dcf7..77aa20b 100644 --- a/drivers/md/bcache/btree.c +++ b/driv

[PATCH 02/19] bcache: fix sequential large write IO bypass

2017-06-30 Thread Tang Junhui
Sequential write IOs were tested with bs=1M by FIO in writeback cache mode; these IOs were expected to be bypassed, but actually they were not. We debugged the code and found in check_should_bypass(): if (!congested && mode == CACHE_MODE_WRITEBACK && op_is_write(bio_op(bio)) &&

[PATCH 10/19] bcache: initialize stripe_sectors_dirty correctly for thin flash device

2017-06-30 Thread Tang Junhui
A thin flash device does not initialize stripe_sectors_dirty correctly; this patch fixes the issue. Signed-off-by: Tang Junhui <tang.jun...@zte.com.cn> Cc: sta...@vger.kernel.org --- drivers/md/bcache/super.c | 3 ++- drivers/md/bcache/writeback.c | 8 drivers/md/bcache/write

[PATCH 16/19] bcache: increase the number of open buckets

2017-06-30 Thread Tang Junhui
ead the same backend device, so it is good for write-back and also promotes the usage efficiency of buckets. Signed-off-by: Tang Junhui <tang.jun...@zte.com.cn> --- drivers/md/bcache/alloc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/md/bcache/alloc.c b/drivers/

[PATCH 17/19] bcache: fix for gc and write-back race

2017-06-30 Thread Tang Junhui
rite locker This patch allocates a separate work-queue for the write-back thread to avoid such a race. Signed-off-by: Tang Junhui <tang.jun...@zte.com.cn> Cc: sta...@vger.kernel.org --- drivers/md/bcache/bcache.h| 1 + drivers/md/bcache/super.c | 2 ++ drivers/md/bcache/writeback.c | 8 ++-- 3

[PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate

2017-06-30 Thread Tang Junhui
Since dirty sectors of thin flash cannot be used to cache data for the backing device, we should subtract them when calculating the writeback rate. Signed-off-by: Tang Junhui <tang.jun...@zte.com.cn> Cc: sta...@vger.kernel.org --- drivers/md/bcache/writeback.c | 2 +- drivers/md/bcache/writeback.

[PATCH 13/19] bcache: delete redundant calling set_gc_sectors()

2017-06-30 Thread Tang Junhui
set_gc_sectors() has been called in bch_gc_thread(), and it is called again in bch_btree_gc_finish(). The second call is unnecessary, so delete it. Signed-off-by: Tang Junhui <tang.jun...@zte.com.cn> --- drivers/md/bcache/btree.c | 1 - 1 file changed, 1 deletion(-) diff --git a/driv

[PATCH 04/19] bcache: fix wrong cache_misses statistics

2017-06-30 Thread Tang Junhui
Some missed IOs are not counted in cache_misses; this patch fixes the issue. Signed-off-by: tang.junhui Reviewed-by: Eric Wheeler Cc: sta...@vger.kernel.org --- drivers/md/bcache/request.c | 6 +- 1 file changed, 5 insertions(+), 1

Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

2017-09-28 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Mike: > + if (KEY_INODE(>key) != KEY_INODE(>key)) > + return false; Please remove this redundant code; all the keys in dc->writeback_keys have the same KEY_INODE, which is guaranteed by refill_dirty(). Regards, Tang

Re: [PATCH] bcache: fix a comments typo in bch_alloc_sectors()

2017-09-26 Thread tang . junhui
efcount bch_bucket_alloc_set() > took: > */ > if (KEY_PTRS()) Yes, it's useful for code reading. Thanks. Reviewed-by: Tang Junhui <tang.jun...@zte.com.cn>

Re: [PATCH] bcache: fix race in setting bdev state

2017-10-09 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Mike: > How did you find this? Did the race trigger at detach or was it through > code inspection? I found this through code inspection. > I need to analyze this more. It looks correct on its own, but there are > a lot o

[PATCH] [PATCH v3] bcache: gc does not work when triggered manually

2017-10-09 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> I tried to execute the following command to trigger the gc thread: [root@localhost internal]# echo 1 > trigger_gc But it does not work. I debugged the code in gc_should_run(); it works only when invalidating or when sectors_to_gc < 0. So set sectors

[PATCH] bcache: fix race in setting bdev state

2017-10-09 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> cached_dev_put() is called before setting and writing bdev to the BDEV_STATE_CLEAN state, but after calling cached_dev_put(), the detach work queue runs and bdev is also set to the BDEV_STATE_NONE state in cached_dev_detach_finish(); it may cause race con

Re: [PATCH] bcache: fix race in setting bdev state

2017-10-10 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Mike: > One race I think I see: we unset the dirty bit before setting ourselves > interruptible. Can bch_writeback_add/queue wake writeback before then > (and then writeback sets itself interruptible and never wakes up)? >

Re: [PATCH 1/5] bcache: don't write back data if reading it failed

2017-09-26 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> It looks good to me; I had noticed this issue before. Thanks. --- Tang Junhui > If an IO operation fails, and we didn't successfully read data from the > cache, don't writeback invalid/partial data to the backing disk. > > Sign

Re: [PATCH 1/5] bcache: don't write back data if reading it failed

2017-09-27 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Lyle: Two questions: 1) In keys_contiguous(), you judge I/O contiguity on the cache device, but not on the backing device. I think you should judge it by the backing device (remove PTR_CACHE() and use KEY_OFFSET() instead of PTR_OFFSET()?). 2) I did n

Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

2017-09-27 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Mike: For the second question, I think this modification is somewhat complex; can't we do something simpler to resolve it? I remember there were some patches trying to avoid a too-small writeback rate. Coly, is there any progre

[PATCH] update bucket_in_use in real time

2017-10-24 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> bucket_in_use is updated in the gc thread, which is triggered by invalidating or writing sectors_to_gc dirty data; that is a long interval. Therefore, when we use it to compare with the threshold, it is often not timely, which leads to inaccurate judgment and

Question of compatibility of bcache

2017-11-24 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hi, everyone: I created a bcache device in a 3.10 kernel with the bcache of the 3.10 kernel, and we had used it for a while; then we backported upstream code to the bcache of the 3.10 kernel and built a new kernel module for use in the 3.10 kernel. Is there any compatible

Re: Re: Re: [PATCH] bcache: stop writeback thread after detaching

2017-11-21 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hi, Mike Thanks for your reminder. I'll checkpatch carefully next time. Thanks, Tang

[PATCH v2] bcache: stop writeback thread after detaching

2017-11-21 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Currently, when a cached device is detaching from the cache, the writeback thread is not stopped, and the writeback_rate_update work is not canceled. For example, after the command below: echo 1 >/sys/block/sdb/bcache/detach you can still see the writeba

[bug report] bcache stuck when writing journal

2017-11-22 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hi, everyone: bcache got stuck when rebooting the system after high load. root 1704 3.7 0.0 4164 360 ? D 14:07 0:09 /usr/lib/udev/bcache-register /dev/sdc [] closure_sync+0x25/0x90 [bcache] [] bch_btree_set_root+0x1f1/0x250 [

Re: Re: [bug report] bcache stuck when writing journal

2017-11-22 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hi, RuiHua: > I have met a similar problem once. > It looks like a deadlock between the cache device register thread and > the bcache_allocator thread. > > The trace info tells us the journal is full, probably the all

Re: Re: [PATCH] bcache: add a separate open bucket list for flash devices

2017-11-21 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> On Tue, Nov 21, 2017 at 06:50:32PM +0800, tang.jun...@zte.com.cn wrote: > > From: Tang Junhui <tang.jun...@zte.com.cn> > > > > Currently in pick_data_bucket(), though we keep multiple buckets open > > for writes,

[PATCH] bcache: add a separate open bucket list for flash devices

2017-11-21 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Currently in pick_data_bucket(), though we keep multiple buckets open for writes, and try to segregate different write streams for better cache utilization: first we look for a bucket where the last write to it was sequential with the current

[PATCH] [PATCH v2] bcache: segregate flash only volume write streams from cached devices

2017-11-21 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> In a scenario where there are some flash-only volumes and some cached devices, when many tasks issue requests to these devices in writeback mode, the write IOs may fall into the same bucket, as below: | cached data | flash data | cached data | cached data|

Re: Re: [PATCH] bcache: add a separate open bucket list for flash devices

2017-11-21 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Coly, Kent > Correct me if I am wrong. I guess the reason why you care about flash > only volume is because ceph users use flash only volume to store some > metadata only on SSD ? Yes, we store ceph metadata in flash only volume

Re: Re: [PATCH] [PATCH v2] bcache: segregate flash only volume write streams from cached devices

2017-11-21 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Mike > Thanks, this looks much better. Can you please fix the whitespace > issues so it gets through checkpatch cleanly? OK, I'll resend a patch later. Thanks, Tang

Re: Re: Re: [PATCH] bcache: stop writeback thread after detaching

2017-11-21 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Coly, Mike > > If the change can be inside bch_register_lock, it would (just) be more > > comfortable. The code is correct, because attach/detach sysfs is created > > after writeback_thread created and writeback_rate_updat

[PATCH] bcache: correct journal bucket reading

2017-11-01 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> There are 3 steps to read out all journal buckets. 1) Try to get a valid journal bucket by golden ratio hash, falling back to linear search. For example, NNNYYYNN: each character represents a bucket, Y represents a valid journal bucket,

Re: [PATCH] bcache: correct journal bucket reading

2017-11-01 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Mike, Coly I found this issue by code reading. It looks serious, though I may be wrong. Please have a review. Thanks, Tang

Re: Re: [PATCH] bcache: correct journal bucket reading

2017-11-01 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Mike > I think the scenario you listed can't happen, because the first bucket > we try in the hash-search is 0. If the circular buffer has wrapped, > that will be detected immediately and we'll leave the loop with l=0. > We sh

Re: Re: [PATCH] bcache: correct journal bucket reading

2017-11-01 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Mike > > that will be detected immediately and we'll leave the loop with l=0. > > We should add a comment that we need to try the first index first for > > correctness so that we don't inadvertently change this beha

[PATCH] bcache: add a comment in journal bucket reading

2017-11-01 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> The journal buckets form a circular buffer; the buckets can look like YYYNNNYY, which means the first valid journal is in the 7th bucket and the latest valid journal is in the third bucket. In this case, if we do not try the zero index first, we may get a valid j

Re: [PATCH] [PATCH v2] bcache: segregate flash only volume write streams from cached devices

2017-12-03 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Mike & Coly Could you please have a review of this patch? > From: Tang Junhui <tang.jun...@zte.com.cn> > > In such scenario that there are some flash only volumes > , and some cached devices, when many tasks request

[PATCH] [PATCH V2] bcache: update bucket_in_use in real time

2017-10-24 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> bucket_in_use is updated in the gc thread, which is triggered by invalidating or writing sectors_to_gc dirty data; that is a long interval. Therefore, when we use it to compare with the threshold, it is often not timely, which leads to inaccurate judgment and

Re: Re: [PATCH] update bucket_in_use in real time

2017-10-24 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Thanks to Mike and Coly's comments. >> + if(ca->set->avail_nbuckets > 0) { >> + ca->set->avail_nbuckets--; >> + bch_update_bucket_in_use(ca-&

[PATCH] bcache: stop writeback thread after detaching

2017-10-31 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Currently, when a cached device is detaching from the cache, the writeback thread is not stopped, and the writeback_rate_update work is not canceled. For example, after the command below: echo 1 >/sys/block/sdb/bcache/detach you can still see the writeba

Re: Re: [PATCH 04/19] bcache: fix wrong cache_misses statistics

2017-10-27 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Eric > > I'm waiting to queue this patch pending your response to Coly. Can you > > update the message and send a v2? > > Hi Tang, > > Can you do an updated message and send this in so we can get the cache miss

Re: Re: [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate

2017-10-27 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Eric > > I discussed with Tang offline, this patch is correct. But the patch > > commit log should be improved. Now I help to work on it, should be done > > quite soon. > > Has an updated commit log been made? I'

[PATCH] [PATCH V2] bcache: fix wrong cache_misses statistics

2017-10-27 Thread tang . junhui
From: "tang.junhui" Currently, cache-missed IOs are identified by s->cache_miss, but in fact there are many situations in which missed IOs are not assigned a value for s->cache_miss in cached_dev_cache_miss(), for example, a bypassed IO (s->iop.bypass = 1), or the cache_bio

Re: [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping

2018-01-04 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Coly, >When the cache set is stopping, calculating the writeback rate is a waste of time. >This is the purpose of the first check, to avoid unnecessary delay >from bcache_flash_devs_sectors_dirty() inside __update_writeback_rate().

Re: [PATCH v3] bcache: fix writeback target calc on large devices

2018-01-05 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Mike, I thought about it more, and I feel this patch is a little complex and still not very accurate for small backing devices. I think we can resolve it like this: uint64_t cache_dirty_target = div_u64(cache_sectors * dc->writebac

[PATCH] bcache: fix wrong return value in bch_debug_init()

2017-12-25 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> In bch_debug_init(), ret is always 0, so the return value is useless. Change it to return 0 on success after calling debugfs_create_dir(), else return a non-zero value. Signed-off-by: Tang Junhui <tang.jun...@zte.com.cn> --- drive

Re: [PATCH v3] bcache: fix writeback target calc on large devices

2018-01-07 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> >On Fri, Jan 5, 2018 at 11:29 PM, <tang.jun...@zte.com.cn> wrote: >> From: Tang Junhui <tang.jun...@zte.com.cn> >> >> Hello Mike, >> >> I thought twice, and feel this patch is a little complex and still

Re: [PATCH v1 03/10] bcache: reduce cache_set devices iteration by devices_max_used

2018-01-04 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> LGTM. Reviewed-by: Tang Junhui <tang.jun...@zte.com.cn> >Member devices of struct cache_set is used to reference all attached > >bcache devices to this cache set. If it is treated

Re: [PATCH v1 02/10] bcache: set task properly in allocator_wait()

2018-01-04 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> LGTM. Reviewed-by: Tang Junhui <tang.jun...@zte.com.cn> >Kernel thread routine bch_allocator_thread() references macro >allocator_wait() to wait for a condition or quit to do_exit() >when kthread_should_stop() is true. > >M

Re: [PATCH v1 01/10] bcache: exit bch_writeback_thread() with proper task state

2018-01-04 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> LGTM. Reviewed-by: Tang Junhui <tang.jun...@zte.com.cn> >Kernel thread routine bch_writeback_thread() has the following code block, > >452 set_current_state(TASK_INTE

[PATCH] bcache: fix inaccurate io state for detached bcache devices

2018-01-08 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> When we run IO on a detached device and run iostat to show the IO status, it normally shows something like below (some fields omitted): Device: ... avgrq-sz avgqu-sz await r_await w_await svctm %util sdd... 15.89 0.53 1.82 0.20

[PATCH 1/2] bcache: add journal statistic

2018-01-26 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Sometimes the journal takes up a lot of CPU, and we need statistics to know what the journal is doing. So this patch provides some journal statistics: 1) reclaim: how many times the journal tried to reclaim resources, usually the journal bucket or/and t

Re: [PATCH v2 06/12] bcache: set error_limit correctly

2018-01-16 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Coly: Then in bch_count_io_errors(), why do we still keep this code: > 92 unsigned errors = atomic_add_return(1 << IO_ERROR_SHIFT, > 93 >io_errors); &g

Re: [PATCH v2 06/12] bcache: set error_limit correctly

2018-01-17 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Coly: >It is because of ca->set->error_decay. When error_decay is set, bcache >tries to do an exponential decay for error count. That is, error numbers >is decaying against the quantity of io count, this is to avoid long tim

Re: [PATCH] bcache: lock in btree_flush_write() to avoid races

2018-01-24 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Kent && Nix > >neither of those locks are needed - rcu_read_lock() isn't needed because we >never >free struct btree (except at shutdown), and we're not derefing journal there __bch_btree_node_write(

Re: [PATCH] bcache: lock in btree_flush_write() to avoid races

2018-01-24 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Kent >The only purpose of rcu_read_lock() would be to ensure the object >isn't freed out from under you. That's not an issue here. > I do not think so. In for_each_cached_btree(), we traverse all btrees by hlist_for

[PATCH] [PATCH v2] bcache: fix for allocator and register thread race

2018-01-25 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> After a long time running random small IO writes, I rebooted the machine, and after the machine powered on, I found bcache got stuck; the stack is: [root@ceph153 ~]# cat /proc/2510/task/*/stack [] closure_sync+0x25/0x90 [bcache] [] bch_journal+0x118

[PATCH] bcache: fix for allocator and register thread race

2018-01-10 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> After a long run of random small IO writes, I rebooted the machine, and after the machine powered on, bcache got stuck; the stack is: [root@ceph153 ~]# cat /proc/2510/task/*/stack [] closure_sync+0x25/0x90 [bcache] [] bch_journal+0x118/0x2b0 [

how to enlarge value of max_sectors_kb

2018-01-12 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> There is a machine with a very small max_sectors_kb: [root@ceph151 queue]# pwd /sys/block/sdd/queue [root@ceph151 queue]# cat max_hw_sectors_kb 256 [root@ceph151 queue]# cat max_sectors_kb 256 The performance is very low when I run big I/Os.

[PATCH] bcache: fix error return value in memory shrink

2018-01-30 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> In bch_mca_scan(), the return value should not be the number of freed btree nodes, but the number of pages of freed btree nodes. Signed-off-by: Tang Junhui <tang.jun...@zte.com.cn> --- drivers/md/bcache/btree.c | 2 +- 1 file changed, 1 ins

Re: [PATCH v4 05/13] bcache: stop dc->writeback_rate_update properly

2018-01-29 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Coly: OK, I got your point now. Thanks for your patience. And there is a small issue I hope can be fixed: +#define BCACHE_DEV_WB_RUNNING 4 +#define BCACHE_DEV_RATE_DW_RUNNING 8 Would be OK just as: +#define BCACHE_DEV_WB_R

Re: [PATCH v4 05/13] bcache: stop dc->writeback_rate_update properly

2018-01-28 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Coly: This patch is somewhat difficult for me; I think we can resolve it in a simpler way. We can take the schedule_delayed_work() under the protection of dc->writeback_lock and judge whether we need to re-arm this work on the queue. st

[PATCH] bcache: finish incremental GC

2018-01-29 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> In the GC thread, we record the latest GC key in gc_done, which is expected to be used for incremental GC, but the current code does not realize it. When GC runs, front-side IO is blocked until GC is over, which can be a long time if there is

Re: [PATCH v4 05/13] bcache: stop dc->writeback_rate_update properly

2018-01-29 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Coly: There are some differences. Using a variable of atomic_t type cannot guarantee the atomicity of the transaction. For example: a thread runs in update_writeback_rate() update_writeback_rate(){ + if (te

[PATCH] bcache: return attach error when no cache set exist

2018-02-01 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> I attached a backing device to a cache set while the cache set was not registered yet; the backing device did not attach successfully, and no error was returned: [root]# echo 87859280-fec6-4bcc-20df7ca8f86b > /sys/block/sde/bcache/att

[PATCH 2/2] bcache: fix for data collapse after re-attaching an attached device

2018-02-05 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Backing device sdm had already attached to a cache_set with ID f67ebe1f-f8bc-4d73-bfe5-9dc88607f119; then I tried to attach it to another cache set, and it returned an error: [root]# cd /sys/block/sdm/bcache [root]# echo 5ccd0a63-148e-48b8-afa2-aca9cb

[PATCH v2 2/2] bcache: fix high CPU occupancy during journal

2018-01-31 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> After running small-write I/O for a long time, we found the CPU occupancy was very high and I/O performance had been reduced by about half: [root@ceph151 internal]# top top - 15:51:05 up 1 day, 2:43, 4 users, load average: 16.89, 15.15, 16.53 Tasks

Re: [PATCH 2/2] bcache: fix high CPU occupancy during journal

2018-01-31 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> > Unfortunately, this doesn't build because of nonexistent call heap_empty > (I assume some changes to util.h got left out). I really need clean > patches that build and are formatted properly. > > Mike Oh, I am so sorry for that

[PATCH] bcache: calculate the number of incremental GC nodes according to the total of btree nodes

2018-02-01 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> This patch is based on "[PATCH] bcache: finish incremental GC". Since incremental GC stops for 100ms when front-side I/O arrives, when there are many btree nodes, if GC only processes a constant (100) nodes each time, GC would

[PATCH] bcache: fix incorrect sysfs output value of strip size

2018-02-10 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Stripe size is shown as zero when there is no stripe in the backing device: [root@ceph132 ~]# cat /sys/block/sdd/bcache/stripe_size 0.0k Actually it should be 1T bytes (1 << 31 sectors), but in the sysfs interface, stripe_size was changed from sectors to byt

Re: [for-416 PATCH v2] bcache: writeback: collapse contiguous IO better

2017-12-28 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> LGTM; I would like it even more if MAX_WRITEBACKS_IN_PASS (5) were defined a little bigger. Reviewed-by: Tang Junhui <tang.jun...@zte.com.cn> >Previously, there was some logic that attempted to immediately issue >writeback of backing

Re: [for-416 PATCH] bcache: fix writeback target calc on large devices

2018-01-01 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> This patch is useful for preventing overflow of the expression (cache_dirty_target * bdev_sectors(dc->bdev)), but it also leads to a calculation error; for example, when there are one 1G and 100 164G cached devices, it would cause the "

Re: [for-416 PATCH 3/3] bcache: allow quick writeback when backing idle

2018-01-01 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> I noticed you added "closure_sync()" before assigning delay to zero in your patch. I think we should add it before: delay = writeback_delay(dc, size) otherwise we would always get a wrong value of delay after calling writeback_delay(),

Re: [for-416 PATCH 3/3] bcache: allow quick writeback when backing idle

2018-01-02 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> >I don't think so. The thing that is controlled (in current code, and >this patch set) is the rate of issuance, not of completion (though >issuance rate is guaranteed not to exceed completion rate, because of >the semaphore for the m

Re: Re: [for-416 PATCH] bcache: fix writeback target calc on large devices

2018-01-02 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> >Thank you for the feedback. > >On Mon, Jan 1, 2018 at 10:33 PM, <tang.jun...@zte.com.cn> wrote: >> From: Tang Junhui <tang.jun...@zte.com.cn> >> >> This patch is useful for preventing the over

Re: Re: Re: [for-416 PATCH 3/3] bcache: allow quick writeback when backing idle

2018-01-02 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> >On 01/02/2018 12:53 AM, tang.jun...@zte.com.cn wrote: >> If no front-end I/O coming, would this cause write-back IOs one by one >> (one write-back IO issued must after the completion of the previous IO)? >> though with ze

Re: [PATCH] bcache: fix unmatched generic_end_io_acct() _start_io_acct()

2018-01-02 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> >The function cached_dev_make_request() and flash_dev_make_request() call >generic_start_io_acct() with (struct bcache_device)->disk when they start a >closure. Then the function bio_complete() calls generic_end_io_acct() with >(str

Re: [for-416 PATCH 2/3] bcache: writeback: properly order backing device IO

2017-12-27 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> LGTM, and I tested it; it improves the write-back performance. [Sorry for the wrong content in the previous email] Reviewed-by: Tang Junhui <tang.jun...@zte.com.cn> Tested-by: Tang Junhui <tang.jun...@zte.com.cn> > Writeback keys a

Re: Re: Re: [for-416 PATCH 1/3] bcache: writeback: collapse contiguous IO better

2017-12-27 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> >> >> More importantly, >>> +while (!kthread_should_stop() && next) { >>> ... >>> +if (nk != 0 && !keys_contiguous(dc, keys[nk-1], next)) >>> +break; &

Re: [for-416 PATCH 1/3] bcache: writeback: collapse contiguous IO better

2017-12-27 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> I remember I reviewed this patch before; still, there is a bug in keys_contiguous(): since KEY_OFFSET(key) stores the end address of the request IO, I think we should judge the contiguity of keys as below: if (bkey_cmp(>key,
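The point being made can be illustrated with a toy key type rather than bcache's real bkey (the struct and function here are hypothetical; only the convention that the offset field holds the *end* sector of an extent is taken from the mail):

```c
#include <stdint.h>

/* Toy stand-in for a bcache extent key: offset is the END sector of
 * the extent, size its length in sectors (hypothetical names). */
struct toy_key {
	uint64_t offset;	/* end sector of the extent */
	uint64_t size;		/* extent length in sectors */
};

/* Two extents are contiguous when the first key's end equals the
 * second key's start, i.e. its end minus its size. */
static int toy_keys_contiguous(const struct toy_key *prev,
			       const struct toy_key *next)
{
	return prev->offset == next->offset - next->size;
}
```

Comparing the two end offsets directly, without subtracting the second key's size, is the kind of bug the review is pointing at.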

Re: Re: [for-416 PATCH 1/3] bcache: writeback: collapse contiguous IO better

2017-12-27 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> More importantly, > +while (!kthread_should_stop() && next) { > ... > +if (nk != 0 && !keys_contiguous(dc, keys[nk-1], next)) > +break; > + > +size += KEY_SIZE

Re: [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping

2018-01-03 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Coly, Thanks for this series. >struct delayed_work writeback_rate_update in struct cache_dev is a delayed >worker to call function update_writeback_rate() in period (the interval is >defined by dc->writeback_rate_update

Re: [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping

2018-01-03 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Coly, >dc->writeback_rate_update is a special delayed worker, it re-arms itself >to run after several seconds by, >>> schedule_delayed_work(>writeback_rate_update, >>> dc->writebac

Re: [PATCH v1 06/10] bcache: stop dc->writeback_rate_update, dc->writeback_thread earlier

2018-01-03 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Coly, Thanks for your work! Actually, stopping the write-back thread and the writeback_rate_update work in bcache_device_detach() has already been done in: https://github.com/mlyle/linux/commit/397d02e162b8ee11940a4e9f45e16fee0650d64e Is it necessary

Re: [PATCH v1 09/10] bcache: add io_disable to struct cache_set

2018-01-03 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hello Coly, This patch is great! One tip: could you replace c->io_disable with the already existing c->flags? Then we would just need to add a new macro such as CACHE_SET_IO_DISABLE. >When too many I/Os failed on cache device, bch_
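The suggestion above can be sketched as follows. This is a hedged userspace illustration, not the bcache implementation: the bit number and helper names are hypothetical, and only the idea of reusing an existing flags word with a new CACHE_SET_IO_DISABLE bit comes from the mail.

```c
/* Hypothetical bit number within an existing flags word; in the real
 * code the value would be chosen to avoid the bits already in use. */
#define CACHE_SET_IO_DISABLE_BIT 3

/* Mark the cache set's I/O as disabled by setting the flag bit. */
static inline void cache_set_io_disable(unsigned long *flags)
{
	*flags |= 1UL << CACHE_SET_IO_DISABLE_BIT;
}

/* Query the flag bit instead of a dedicated io_disable field. */
static inline int cache_set_io_disabled(unsigned long flags)
{
	return !!(flags & (1UL << CACHE_SET_IO_DISABLE_BIT));
}
```

The appeal of the approach is that a single existing word carries the new state, so no struct layout change and no extra memory are needed.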

Re: Re: [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping

2018-01-04 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> Hi Coly, >It is about an implicit and interesting ordering, a simple patch with a >lot detail behind. Let me explain why it's safe, > >- cancel_delayed_work_sync() is called in bcache_device_detach() when >dc->count is 0. But

[PATCH] bcache: lock in btree_flush_write() to avoid races

2018-01-23 Thread tang . junhui
From: Tang Junhui <tang.jun...@zte.com.cn> In btree_flush_write(), two places need to take a lock to avoid races: first, we need to take the RCU read lock to protect the bucket_hash traversal, since hlist_for_each_entry_rcu() must be called under RCU read-lock protection. Second
