Re: [PATCH V10 11/19] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()
On Fri, Nov 16, 2018 at 02:46:45PM +0100, Christoph Hellwig wrote:
> > -	bio_for_each_segment_all(bv, bio, i) {
> > +	for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++) {
>
> This really needs a comment.  Otherwise it looks fine to me.

OK, will do it in next version.

Thanks,
Ming
Re: [PATCH V10 11/19] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()
On Thu, Nov 15, 2018 at 04:44:02PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:52:58PM +0800, Ming Lei wrote:
> > bch_bio_alloc_pages() is always called on one new bio, so it is safe
> > to access the bvec table directly. Given it is the only kind of this
> > case, open code the bvec table access since bio_for_each_segment_all()
> > will be changed to support for iterating over multipage bvec.
> >
> > Cc: Dave Chinner
> > Cc: Kent Overstreet
> > Acked-by: Coly Li
> > Cc: Mike Snitzer
> > Cc: dm-de...@redhat.com
> > Cc: Alexander Viro
> > Cc: linux-fsde...@vger.kernel.org
> > Cc: Shaohua Li
> > Cc: linux-r...@vger.kernel.org
> > Cc: linux-er...@lists.ozlabs.org
> > Cc: David Sterba
> > Cc: linux-btrfs@vger.kernel.org
> > Cc: Darrick J. Wong
> > Cc: linux-...@vger.kernel.org
> > Cc: Gao Xiang
> > Cc: Christoph Hellwig
> > Cc: Theodore Ts'o
> > Cc: linux-e...@vger.kernel.org
> > Cc: Coly Li
> > Cc: linux-bca...@vger.kernel.org
> > Cc: Boaz Harrosh
> > Cc: Bob Peterson
> > Cc: cluster-de...@redhat.com
> > Signed-off-by: Ming Lei
> > ---
> >  drivers/md/bcache/util.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
> > index 20eddeac1531..8517aebcda2d 100644
> > --- a/drivers/md/bcache/util.c
> > +++ b/drivers/md/bcache/util.c
> > @@ -270,7 +270,7 @@ int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
> >  	int i;
> >  	struct bio_vec *bv;
> >
> > -	bio_for_each_segment_all(bv, bio, i) {
> > +	for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++) {
>
> This is missing an i++.

Good catch, will fix it in next version.

thanks,
Ming
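[For readers following along, here is a sketch of what the loop might look like with both review comments folded in: the explanatory comment Christoph asked for and the i++ Omar spotted. The error-unwind body shown is an assumption about the surrounding function, not a quote of the next version of the patch.]

	int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
	{
		int i;
		struct bio_vec *bv;

		/*
		 * This function is only ever called on a freshly allocated bio,
		 * so it is safe to walk the bvec table directly instead of
		 * going through bio_for_each_segment_all().
		 */
		for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++, i++) {
			bv->bv_page = alloc_page(gfp_mask);
			if (!bv->bv_page) {
				/* Unwind the pages allocated so far. */
				while (--bv >= bio->bi_io_vec)
					__free_page(bv->bv_page);
				return -ENOMEM;
			}
		}

		return 0;
	}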
Re: [PATCH 05/12] bcache: convert to bioset_init()/mempool_init()
On 2018/5/21 6:25 AM, Kent Overstreet wrote:
> Signed-off-by: Kent Overstreet <kent.overstr...@gmail.com>

Hi Kent,

This change looks good to me,

Reviewed-by: Coly Li <col...@suse.de>

Thanks.

Coly Li

> [...]
[PATCH 05/12] bcache: convert to bioset_init()/mempool_init()
Signed-off-by: Kent Overstreet <kent.overstr...@gmail.com>
---
 drivers/md/bcache/bcache.h  | 10 +-
 drivers/md/bcache/bset.c    | 13 -
 drivers/md/bcache/bset.h    |  2 +-
 drivers/md/bcache/btree.c   |  4 ++--
 drivers/md/bcache/io.c      |  4 ++--
 drivers/md/bcache/request.c | 18 +-
 drivers/md/bcache/super.c   | 38 ++---
 7 files changed, 37 insertions(+), 52 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 3a0cfb237a..3050438761 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -269,7 +269,7 @@ struct bcache_device {
 	atomic_t		*stripe_sectors_dirty;
 	unsigned long		*full_dirty_stripes;
 
-	struct bio_set		*bio_split;
+	struct bio_set		bio_split;
 
 	unsigned		data_csum:1;
 
@@ -528,9 +528,9 @@ struct cache_set {
 	struct closure		sb_write;
 	struct semaphore	sb_write_mutex;
 
-	mempool_t		*search;
-	mempool_t		*bio_meta;
-	struct bio_set		*bio_split;
+	mempool_t		search;
+	mempool_t		bio_meta;
+	struct bio_set		bio_split;
 
 	/* For the btree cache */
 	struct shrinker		shrink;
@@ -655,7 +655,7 @@ struct cache_set {
 	 * A btree node on disk could have too many bsets for an iterator to fit
 	 * on the stack - have to dynamically allocate them
 	 */
-	mempool_t		*fill_iter;
+	mempool_t		fill_iter;
 
 	struct bset_sort_state	sort;
 
diff --git a/drivers/md/bcache/bset.c b/drivers/md/bcache/bset.c
index 579c696a5f..f3403b45bc 100644
--- a/drivers/md/bcache/bset.c
+++ b/drivers/md/bcache/bset.c
@@ -1118,8 +1118,7 @@ struct bkey *bch_btree_iter_next_filter(struct btree_iter *iter,
 
 void bch_bset_sort_state_free(struct bset_sort_state *state)
 {
-	if (state->pool)
-		mempool_destroy(state->pool);
+	mempool_exit(&state->pool);
 }
 
 int bch_bset_sort_state_init(struct bset_sort_state *state, unsigned page_order)
@@ -1129,11 +1128,7 @@ int bch_bset_sort_state_init(struct bset_sort_state *state, unsigned page_order)
 	state->page_order = page_order;
 	state->crit_factor = int_sqrt(1 << page_order);
 
-	state->pool = mempool_create_page_pool(1, page_order);
-	if (!state->pool)
-		return -ENOMEM;
-
-	return 0;
+	return mempool_init_page_pool(&state->pool, 1, page_order);
 }
 EXPORT_SYMBOL(bch_bset_sort_state_init);
 
@@ -1191,7 +1186,7 @@ static void __btree_sort(struct btree_keys *b, struct btree_iter *iter,
 
 		BUG_ON(order > state->page_order);
 
-		outp = mempool_alloc(state->pool, GFP_NOIO);
+		outp = mempool_alloc(&state->pool, GFP_NOIO);
 		out = page_address(outp);
 		used_mempool = true;
 		order = state->page_order;
@@ -1220,7 +1215,7 @@ static void __btree_sort(struct btree_keys *b, struct btree_iter *iter,
 	}
 
 	if (used_mempool)
-		mempool_free(virt_to_page(out), state->pool);
+		mempool_free(virt_to_page(out), &state->pool);
 	else
 		free_pages((unsigned long) out, order);
 
diff --git a/drivers/md/bcache/bset.h b/drivers/md/bcache/bset.h
index 0c24280f3b..b867f22004 100644
--- a/drivers/md/bcache/bset.h
+++ b/drivers/md/bcache/bset.h
@@ -347,7 +347,7 @@ static inline struct bkey *bch_bset_search(struct btree_keys *b,
 /* Sorting */
 
 struct bset_sort_state {
-	mempool_t		*pool;
+	mempool_t		pool;
 
 	unsigned		page_order;
 	unsigned		crit_factor;
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 17936b2dc7..2a0968c04e 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -204,7 +204,7 @@ void bch_btree_node_read_done(struct btree *b)
 	struct bset *i = btree_bset_first(b);
 	struct btree_iter *iter;
 
-	iter = mempool_alloc(b->c->fill_iter, GFP_NOIO);
+	iter = mempool_alloc(&b->c->fill_iter, GFP_NOIO);
 	iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
 	iter->used = 0;
 
@@ -271,7 +271,7 @@ void bch_btree_node_read_done(struct btree *b)
 	bch_bset_init_next(&b->keys, write_block(b), bset_magic(&b->c->sb));
 out:
-	mempool_free(iter, b->c->fill_iter);
+	mempool_free(iter, &b->c->fill_iter);
 	return;
 err:
 	set_btree_node_io_error(b);
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 2ddf8515e6..9612873afe 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -17,12 +17,12 @@ void bch_bbio_free(struct bio *bio, str
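[As background to the conversion above: mempool_init()/mempool_exit() and bioset_init()/bioset_exit() operate on caller-embedded objects, while the older mempool_create_*()/mempool_destroy() pair heap-allocates the pool itself, adding a pointer to manage and an extra allocation failure case. A minimal sketch of the two styles follows; struct old_ctx/new_ctx and the init/free helpers are hypothetical names, not code from the patch.]

	#include <linux/mempool.h>

	struct old_ctx { mempool_t *pool; };	/* pool is a separate allocation */
	struct new_ctx { mempool_t pool; };	/* pool lives inside its owner */

	static int old_ctx_init(struct old_ctx *c)
	{
		c->pool = mempool_create_page_pool(1, 0);
		return c->pool ? 0 : -ENOMEM;	/* extra failure case to handle */
	}

	static void old_ctx_free(struct old_ctx *c)
	{
		if (c->pool)			/* NULL check needed before destroy */
			mempool_destroy(c->pool);
	}

	static int new_ctx_init(struct new_ctx *c)
	{
		/* Initializes the embedded pool in place; no pointer to manage. */
		return mempool_init_page_pool(&c->pool, 1, 0);
	}

	static void new_ctx_free(struct new_ctx *c)
	{
		/* mempool_exit() is documented as safe on a zeroed,
		 * never-initialized pool as well. */
		mempool_exit(&c->pool);
	}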
Re: [PATCH 08/10] bcache: move closures to lib/
On Fri, May 18, 2018 at 03:49:13AM -0400, Kent Overstreet wrote:
> Prep work for bcachefs - being a fork of bcache it also uses closures

Hell no.  This code needs to go away and not actually be promoted to lib/.
[PATCH 07/10] bcache: optimize continue_at_nobarrier()
Signed-off-by: Kent Overstreet <kent.overstr...@gmail.com>
---
 drivers/md/bcache/closure.h | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/closure.h b/drivers/md/bcache/closure.h
index 3b9dfc9962..2392a46bcd 100644
--- a/drivers/md/bcache/closure.h
+++ b/drivers/md/bcache/closure.h
@@ -244,7 +244,7 @@ static inline void closure_queue(struct closure *cl)
 		     != offsetof(struct work_struct, func));
 	if (wq) {
 		INIT_WORK(&cl->work, cl->work.func);
-		BUG_ON(!queue_work(wq, &cl->work));
+		queue_work(wq, &cl->work);
 	} else
 		cl->fn(cl);
 }
@@ -337,8 +337,13 @@ do {									\
  */
 #define continue_at_nobarrier(_cl, _fn, _wq)				\
 do {									\
-	set_closure_fn(_cl, _fn, _wq);					\
-	closure_queue(_cl);						\
+	closure_set_ip(_cl);						\
+	if (_wq) {							\
+		INIT_WORK(&(_cl)->work, (void *) _fn);			\
+		queue_work((_wq), &(_cl)->work);			\
+	} else {							\
+		(_fn)(_cl);						\
+	}								\
 } while (0)
 
 /**
-- 
2.17.0
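[For context, as I read the patch: continue_at_nobarrier() previously went through set_closure_fn(), which ends with a memory barrier to order the fn/wq stores against refcount drops from other threads; the open-coded version skips that barrier for the case where nobody else can touch the closure. A hedged usage sketch follows; struct my_op, my_wq, and the two callbacks are illustrative names, not code from bcache.]

	#include <linux/workqueue.h>
	#include "closure.h"	/* still in drivers/md/bcache at this point */

	static struct workqueue_struct *my_wq;

	struct my_op {
		struct closure	cl;
		/* ... per-operation state ... */
	};

	static void my_op_finish(struct closure *cl)
	{
		/* ... finish up via container_of(cl, struct my_op, cl) ... */
		closure_return(cl);	/* hand back to the parent, set up elsewhere */
	}

	static void my_op_step(struct closure *cl)
	{
		/*
		 * Nobody else can observe this closure between here and
		 * my_op_finish() running, so the ordering barrier implied by
		 * set_closure_fn()/continue_at() is unnecessary.
		 */
		continue_at_nobarrier(cl, my_op_finish, my_wq);
	}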
[PATCH 08/10] bcache: move closures to lib/
Prep work for bcachefs - being a fork of bcache it also uses closures

Signed-off-by: Kent Overstreet <kent.overstr...@gmail.com>
---
 drivers/md/bcache/Kconfig                      | 10 +-
 drivers/md/bcache/Makefile                     |  6 +++---
 drivers/md/bcache/bcache.h                     |  2 +-
 drivers/md/bcache/super.c                      |  1 -
 drivers/md/bcache/util.h                       |  3 +--
 {drivers/md/bcache => include/linux}/closure.h | 17 -
 lib/Kconfig                                    |  3 +++
 lib/Kconfig.debug                              |  9 +
 lib/Makefile                                   |  2 ++
 {drivers/md/bcache => lib}/closure.c           | 17 -
 10 files changed, 36 insertions(+), 34 deletions(-)
 rename {drivers/md/bcache => include/linux}/closure.h (97%)
 rename {drivers/md/bcache => lib}/closure.c (95%)

diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig
index 4d200883c5..45f1094c08 100644
--- a/drivers/md/bcache/Kconfig
+++ b/drivers/md/bcache/Kconfig
@@ -1,6 +1,7 @@
 config BCACHE
 	tristate "Block device as cache"
+	select CLOSURES
 	---help---
 	Allows a block device to be used as cache for other devices; uses
 	a btree for indexing and the layout is optimized for SSDs.
@@ -15,12 +16,3 @@ config BCACHE_DEBUG
 	Enables extra debugging tools, allows expensive runtime checks to be
 	turned on.
-
-config BCACHE_CLOSURES_DEBUG
-	bool "Debug closures"
-	depends on BCACHE
-	select DEBUG_FS
-	---help---
-	Keeps all active closures in a linked list and provides a debugfs
-	interface to list them, which makes it possible to see asynchronous
-	operations that get stuck.

diff --git a/drivers/md/bcache/Makefile b/drivers/md/bcache/Makefile
index d26b351958..2b790fb813 100644
--- a/drivers/md/bcache/Makefile
+++ b/drivers/md/bcache/Makefile
@@ -2,8 +2,8 @@
 
 obj-$(CONFIG_BCACHE)	+= bcache.o
 
-bcache-y		:= alloc.o bset.o btree.o closure.o debug.o extents.o\
-	io.o journal.o movinggc.o request.o stats.o super.o sysfs.o trace.o\
-	util.o writeback.o
+bcache-y		:= alloc.o bset.o btree.o debug.o extents.o io.o\
+	journal.o movinggc.o request.o stats.o super.o sysfs.o trace.o util.o\
+	writeback.o
 
 CFLAGS_request.o	+= -Iblock

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 12e5197f18..d954dc44dd 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -180,6 +180,7 @@
 
 #include <linux/bcache.h>
 #include <linux/bio.h>
+#include <linux/closure.h>
 #include <linux/kobject.h>
 #include <linux/list.h>
 #include <linux/mutex.h>
@@ -191,7 +192,6 @@
 
 #include "bset.h"
 #include "util.h"
-#include "closure.h"
 
 struct bucket {
 	atomic_t	pin;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index f2273143b3..5f1ac8e0a3 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2148,7 +2148,6 @@ static int __init bcache_init(void)
 	mutex_init(&bch_register_lock);
 	init_waitqueue_head(&unregister_wait);
 	register_reboot_notifier(&reboot);
-	closure_debug_init();
 
 	bcache_major = register_blkdev(0, "bcache");
 	if (bcache_major < 0) {
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index a6763db7f0..a75523ed0d 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -4,6 +4,7 @@
 #define _BCACHE_UTIL_H
 
 #include <linux/blkdev.h>
+#include <linux/closure.h>
 #include <linux/errno.h>
 #include <linux/kernel.h>
 #include <linux/sched/clock.h>
@@ -12,8 +13,6 @@
 #include <linux/vmalloc.h>
 #include <linux/workqueue.h>
 
-#include "closure.h"
-
 #define PAGE_SECTORS		(PAGE_SIZE / 512)
 
 struct closure;
diff --git a/drivers/md/bcache/closure.h b/include/linux/closure.h
similarity index 97%
rename from drivers/md/bcache/closure.h
rename to include/linux/closure.h
index 2392a46bcd..1072bf2c13 100644
--- a/drivers/md/bcache/closure.h
+++ b/include/linux/closure.h
@@ -154,7 +154,7 @@ struct closure {
 
 	atomic_t		remaining;
 
-#ifdef CONFIG_BCACHE_CLOSURES_DEBUG
+#ifdef CONFIG_DEBUG_CLOSURES
 #define CLOSURE_MAGIC_DEAD	0xc054dead
 #define CLOSURE_MAGIC_ALIVE	0xc054a11e
 
@@ -183,15 +183,13 @@ static inline void closure_sync(struct closure *cl)
 		__closure_sync(cl);
 }
 
-#ifdef CONFIG_BCACHE_CLOSURES_DEBUG
+#ifdef CONFIG_DEBUG_CLOSURES
 
-void closure_debug_init(void);
 void closure_debug_create(struct closure *cl);
 void closure_debug_destroy(struct closure *cl);
 
 #else
 
-static inline void closure_debug_init(void) {}
 static inline void closure_debug_create(struct closure *cl) {}
 static inline void closure_debug_destroy(struct closure *cl) {}
 
@@ -199,21 +197,21 @@ static inline void closure_debug_destroy(struct closure *cl) {}
 
 static inline void closure_set_ip(struct closure *cl)
 {
-#ifdef CONFIG_BCACHE_CLOSURES_DEBUG
+#ifdef CONFIG_DEBUG_CLOSURES
 	cl->ip =
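[For readers who have not used them: a closure, as moved to lib/ here, is essentially a refcount with a continuation attached; when the count hits zero, the stored function runs, optionally on a workqueue. A rough lifecycle sketch under that model follows; struct my_io, the callbacks, and the submit step are invented for illustration and simplified relative to real bcache users, which typically end with closure_return() to a parent or closure_return_with_destructor().]

	#include <linux/closure.h>
	#include <linux/slab.h>
	#include <linux/workqueue.h>

	struct my_io {
		struct closure	cl;
		/* ... request state ... */
	};

	static void my_io_done(struct closure *cl)
	{
		struct my_io *io = container_of(cl, struct my_io, cl);

		/* Last reference dropped; tear down the standalone closure. */
		closure_debug_destroy(cl);
		kfree(io);
	}

	static void my_io_submit(struct my_io *io)
	{
		closure_init(&io->cl, NULL);	/* no parent closure */

		/* One ref per async sub-operation; each completion handler
		 * calls closure_put(&io->cl). */
		closure_get(&io->cl);
		/* ... submit a bio here ... */

		/* Drop our own ref; my_io_done() runs on system_wq once the
		 * last sub-operation completes. */
		continue_at(&io->cl, my_io_done, system_wq);
	}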
Re: Kernel 4.14 RAID5 multi disk array on bcache not mounting
On 11/21/17 23:22, Lionel Bouton wrote:
> On 21/11/2017 at 23:04, Andy Leadbetter wrote:
>> I have a 4 disk array on top of 120GB bcache setup, arranged as follows
> [...]
>> Upgraded today to 4.14.1 from their PPA and the
>
> 4.14 and 4.14.1 have a nasty bug affecting bcache users. See for example:
> https://www.reddit.com/r/linux/comments/7eh2oz/serious_regression_in_linux_414_using_bcache_can/

4.14.2 (just out as rc1) will have the fix.

-h
Re: Kernel 4.14 RAID5 multi disk array on bcache not mounting
On 21/11/2017 at 23:04, Andy Leadbetter wrote:
> I have a 4 disk array on top of 120GB bcache setup, arranged as follows
[...]
> Upgraded today to 4.14.1 from their PPA and the

4.14 and 4.14.1 have a nasty bug affecting bcache users. See for example:
https://www.reddit.com/r/linux/comments/7eh2oz/serious_regression_in_linux_414_using_bcache_can/

Lionel
Kernel 4.14 RAID5 multi disk array on bcache not mounting
I have a 4 disk array on top of 120GB bcache setup, arranged as follows:

/dev/sda1: UUID="42AE-12E3" TYPE="vfat" PARTLABEL="EFI System" PARTUUID="d337c56a-fb0f-4e87-8d5f-a89122c81167"
/dev/sda2: UUID="06e3ce52-f34a-409a-a143-3c04f1d334ff" TYPE="ext4" PARTLABEL="Linux filesystem" PARTUUID="d2d3fa93-eebf-41ab-8162-d81722bf47ec"
/dev/sda4: UUID="b729c490-81f0-461f-baa2-977af9a7b6d9" TYPE="bcache" PARTLABEL="Linux filesystem" PARTUUID="84548857-f504-440a-857f-c0838c1eb83d"
/dev/sdb1: UUID="6016277c-143d-46b4-ae4e-8565ffc8158f" TYPE="swap" PARTLABEL="Linux swap" PARTUUID="8692bf67-7271-4bf6-a623-b79d74093f2c"
/dev/sdb2: UUID="bc93c5e2-705a-4cbe-bcd9-7be1181163b2" TYPE="bcache" PARTLABEL="Linux filesystem" PARTUUID="662a450b-3592-4929-9647-8e8a1dedae69"
/dev/sdc1: UUID="9df21d4e-de02-4000-b684-5fb95d4d0492" TYPE="swap" PARTLABEL="Linux swap" PARTUUID="ed9d7b8e-5480-4e70-b983-1a350ecae38a"
/dev/sdc2: UUID="7d8feaf6-aa6a-4b13-af49-0ad1bd1efb64" TYPE="bcache" PARTLABEL="Linux filesystem" PARTUUID="d343e23a-39ed-4061-80a2-55b66e20ecc1"
/dev/sdd1: UUID="18defba2-594b-402e-b3b2-8e38035c624d" TYPE="swap" PARTLABEL="Linux swap" PARTUUID="fed9ffd6-0480-4496-8e6d-02d263d719b7"
/dev/sdd2: UUID="be0f0381-0d7e-46c9-ad04-01415bfc6f61" TYPE="bcache" PARTLABEL="Linux filesystem" PARTUUID="8f56de8a-105f-4d56-b699-59e1215b3c6b"
/dev/bcache32: UUID="38d5de43-28fb-40a9-a535-dbf17ff52e75" UUID_SUB="731c31f1-51dd-477a-9bd1-fac73d0e6f69" TYPE="btrfs"
/dev/sde: UUID="05514ad3-d90a-4e90-aa11-7c6d34515ca2" TYPE="bcache"
/dev/bcache16: UUID="38d5de43-28fb-40a9-a535-dbf17ff52e75" UUID_SUB="79cbcaf1-40b9-4954-a977-537ed3310e76" TYPE="btrfs"
/dev/bcache0: UUID="38d5de43-28fb-40a9-a535-dbf17ff52e75" UUID_SUB="42d3a0dd-fbec-4318-9a5b-6d96aa1f6328" TYPE="btrfs"
/dev/bcache48: UUID="38d5de43-28fb-40a9-a535-dbf17ff52e75" UUID_SUB="cb7018d6-a27d-493e-b41f-e45c64f6873a" TYPE="btrfs"
/dev/sda3: PARTUUID="d9fa3100-5044-4e10-9f2f-f8037786a43f"

ubuntu 17.10 with PPA Kernels up to 4.13.x all mount this array perfectly,
and the performance of the cache is as expected. Upgraded today to 4.14.1
from their PPA, and now running btrfs dev scan finds the btrfs filesystem
devices bcache16 and bcache32; bcache0 and bcache48 are not recognised, and
thus the file system will not mount. According to bcache, all devices are
present and attached to the cache device correctly.

btrfs fi show on Kernel 4.13 gives:

Label: none  uuid: 38d5de43-28fb-40a9-a535-dbf17ff52e75
	Total devices 4 FS bytes used 2.03TiB
	devid 1 size 1.82TiB used 1.07TiB path /dev/bcache16
	devid 2 size 1.82TiB used 1.07TiB path /dev/bcache32
	devid 3 size 1.82TiB used 1.07TiB path /dev/bcache0
	devid 4 size 1.82TiB used 1.07TiB path /dev/bcache48

Where do I start in debugging this?

btrfs-progs v4.12

btrfs fi df /
Data, RAID5: total=3.20TiB, used=2.02TiB
System, RAID5: total=192.00MiB, used=288.00KiB
Metadata, RAID5: total=6.09GiB, used=3.69GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

There are no errors in the dmesg that I can see from btrfs scan; simply
the two devices are not found.
Re: Give up on bcache?
On 2017-09-26 18:46, Ferry Toth wrote:
> On Tue, 26 Sep 2017 15:52:44 -0400, Austin S. Hemmelgarn wrote:
>> On 2017-09-26 12:50, Ferry Toth wrote:
>>> Looking at the Phoronix benchmark here:
>>>
>>> https://www.phoronix.com/scan.php?page=article&item=linux414-bcache-raid&num=2
>>>
>>> I think it might be idle hopes to think bcache can be used as a ssd
>>> cache for btrfs to significantly improve performance.. True, the
>>> benchmark is using ext.
>> It's a benchmark. They're inherently synthetic and workload specific,
>> and therefore should not be trusted to represent things accurately for
>> arbitrary use cases.
> So what. A decent benchmark tries to measure a specific aspect of the fs.
Yes, and it usually measures it using a ridiculously unrealistic workload.
Some of the benchmarks in iozone are a good example of this, like the
backwards read one (there is nearly nothing that it provides any useful
data for). For a benchmark to be meaningful, you have to test what you
actually intend to use, and from a practical perspective, that article is
primarily testing throughput, which is not something you should be using
SSD caching for.
> I think you agree that applications doing lots of fsyncs (databases,
> dpkg) are slow on btrfs especially on hdd's, whatever way you measure
> that (it feels slow, it measures slow, it really is slow).
Yes, but they're also slow on _everything_. fsync() is slow. Period. It's
just more of an issue on BTRFS because it's a CoW filesystem _and_ it's
slower than ext4 even with that CoW layer bypassed.
> On a ssd the problem is less.
And most of that is a result of the significantly higher bulk throughput
on the SSD, which is not something that SSD caching replicates.
> So if you can fix that by using a ssd cache or a hybrid solution, how
> would you like to compare that? It _feels_ faster?
That depends. If it's on a desktop, then that actually is one of the best
ways to test it, since user perception is your primary quality metric (you
can make the fastest system in the world, but if the user can't tell,
you've gained nothing). If you're on anything else, you test the actual
workload if possible, and a benchmark that tries to replicate the workload
if not. Put another way, if you're building a PGSQL server, you should be
bench-marking things with a PGSQL bench-marking tool, not some arbitrary
benchmark that likely won't replicate a PGSQL workload.
>>> But the most important one (where btrfs always shows to be a little
>>> slow) would be the SQLite test. And with ext at least performance
>>> _degrades_ except for the Writeback mode, and even there is nowhere
>>> near what the SSD is capable of.
>> And what makes you think it will be? You're using it as a hot-data
>> cache, not a dedicated write-back cache, and you have the overhead from
>> bcache itself too. Just some simple math based on examining the bcache
>> code suggests you can't get better than about 98% of the SSD's
>> performance if you're lucky, and I'd guess it's more like 80% most of
>> the time.
>>> I think with btrfs it will be even worse and that it is a fundamental
>>> problem: caching is complex and the cache can not know how the data on
>>> the fs is used.
>> Actually, the improvement from using bcache with BTRFS is higher
>> proportionate to the baseline of not using it by a small margin than it
>> is when used with ext4. BTRFS does a lot more with the disk, so you
>> have a lot more time spent accessing the disk, and thus more time that
>> can be reduced by improving disk performance. While the CoW nature of
>> BTRFS does somewhat mitigate the performance improvement from using
>> bcache, it does not completely negate it.
> I would like to reverse this, how much degradation do you suffer from
> btrfs on a ssd as baseline compared to btrfs on a mixed ssd/hdd system.
Performance-wise? It's workload dependent, but in most cases it's a hit
regardless of whether you're using BTRFS or some other filesystem. If
instead you're asking what the difference in device longevity is, you can
probably expect the SSD to wear out faster in the second case. Unless you
have a reasonably big SSD and are using write-around caching, every write
will hit the SSD too, and you'll end up with lots of rewrites on the SSD.
> IMHO you are hoping to get ssd performance at hdd cost.
Then you're looking at the wrong tool. The primary use cases for SSD
caching are smoothing latency and improving interactivity by reducing head
movement. Any other measure of performance is pretty much guaranteed to be
worse with SSD caching than just using an SSD, and bulk throughput is often
just as bad as, if not worse than, using a regular HDD by itself. If you
are that desperate for performance like an SSD, quit whining about cost and
just buy an SSD. Decent ones are down to less than 0.40 USD per GB
depending on the brand (search 'Crucial MX300' on Amazon if you want an
example), so the cost isn't nearly as bad as people make it out to be,
especially considering that most of the time a normal person who isn't
doing multimedia work
Re: Give up on bcache?
On Tue, 26 Sep 2017 15:52:44 -0400, Austin S. Hemmelgarn wrote:
> On 2017-09-26 12:50, Ferry Toth wrote:
>> Looking at the Phoronix benchmark here:
>>
>> https://www.phoronix.com/scan.php?page=article&item=linux414-bcache-raid&num=2
>>
>> I think it might be idle hopes to think bcache can be used as a ssd
>> cache for btrfs to significantly improve performance.. True, the
>> benchmark is using ext.
> It's a benchmark. They're inherently synthetic and workload specific,
> and therefore should not be trusted to represent things accurately for
> arbitrary use cases.

So what. A decent benchmark tries to measure a specific aspect of the fs.

I think you agree that applications doing lots of fsyncs (databases, dpkg)
are slow on btrfs especially on hdd's, whatever way you measure that (it
feels slow, it measures slow, it really is slow).

On a ssd the problem is less.

So if you can fix that by using a ssd cache or a hybrid solution, how
would you like to compare that? It _feels_ faster?

>> But the most important one (where btrfs always shows to be a little
>> slow) would be the SQLite test. And with ext at least performance
>> _degrades_ except for the Writeback mode, and even there is nowhere
>> near what the SSD is capable of.
> And what makes you think it will be? You're using it as a hot-data
> cache, not a dedicated write-back cache, and you have the overhead from
> bcache itself too. Just some simple math based on examining the bcache
> code suggests you can't get better than about 98% of the SSD's
> performance if you're lucky, and I'd guess it's more like 80% most of
> the time.
>>
>> I think with btrfs it will be even worse and that it is a fundamental
>> problem: caching is complex and the cache can not know how the data on
>> the fs is used.
> Actually, the improvement from using bcache with BTRFS is higher
> proportionate to the baseline of not using it by a small margin than it
> is when used with ext4. BTRFS does a lot more with the disk, so you
> have a lot more time spent accessing the disk, and thus more time that
> can be reduced by improving disk performance. While the CoW nature of
> BTRFS does somewhat mitigate the performance improvement from using
> bcache, it does not completely negate it.

I would like to reverse this, how much degradation do you suffer from
btrfs on a ssd as baseline compared to btrfs on a mixed ssd/hdd system.

IMHO you are hoping to get ssd performance at hdd cost.

>> I think the original idea of hot data tracking has a much better chance
>> to significantly improve performance. This of course as the SSD's and
>> HDD's then will be equal citizens and btrfs itself gets to decide on
>> which drive the data is best stored.
> First, the user needs to decide, not BTRFS (at least, by default, BTRFS
> should not be involved in the decision). Second, tiered storage (that's
> what that's properly called) is mostly orthogonal to caching (though
> bcache and dm-cache behave like tiered storage once the cache is
> warmed).

So, on your desktop you really are going to search for all sqlite, mysql
and psql files, dpkg files etc. and move them to the ssd? You can already
do that. Go ahead! The big win would be if the file system does that
automatically for you.

>> With this implemented right, it would also finally silence the never
>> ending discussion why not btrfs and why zfs, ext, xfs etc. Which would
>> be a plus by its own right.
> Even with this, there would still be plenty of reasons to pick one of
> those filesystems over BTRFS. There would however be one more reason to
> pick BTRFS over ext or XFS (but not necessarily ZFS, it already has
> caching built in).

Exactly, one more advantage of btrfs and one less of zfs.
Re: Give up on bcache?
On Tue, Sep 26, 2017 at 11:33:19PM +0500, Roman Mamedov wrote:
> On Tue, 26 Sep 2017 16:50:00 +0000 (UTC)
> Ferry Toth <ft...@telfort.nl> wrote:
>
> > https://www.phoronix.com/scan.php?page=article&item=linux414-bcache-raid&num=2
> >
> > I think it might be idle hopes to think bcache can be used as a ssd cache
> > for btrfs to significantly improve performance..
>
> My personal real-world experience shows that SSD caching -- with lvmcache --
> does indeed significantly improve performance of a large Btrfs filesystem
> with slowish base storage.
>
> And that article, sadly, only demonstrates once again the general mediocre
> quality of Phoronix content: it is an astonishing oversight to not check out
> lvmcache in the same setup, to at least try to draw some useful conclusion:
> is it Bcache that is strangely deficient, or does SSD caching as a general
> concept not work well in the hardware setup utilized?

Also, it looks as if Phoronix' tests don't stress metadata at all. Btrfs is
all about metadata, and speeding it up greatly helps most workloads.

A pipe-dream wishlist would be:
* store and access the master copy of metadata on SSD only
* pin all data blocks referenced by generations not yet mirrored
* slowly copy over metadata to HDD

-- 
⢀⣴⠾⠻⢶⣦⠀ We domesticated dogs 36000 years ago; together we chased
⣾⠁⢰⠒⠀⣿⡁ animals, hung out and licked or scratched our private parts.
⢿⡄⠘⠷⠚⠋⠀ Cats domesticated us 9500 years ago, and immediately we got
⠈⠳⣄ agriculture, towns then cities.  -- whitroth on /.
Re: Give up on bcache?
On 2017-09-26 12:50, Ferry Toth wrote:
> Looking at the Phoronix benchmark here:
>
> https://www.phoronix.com/scan.php?page=article&item=linux414-bcache-raid&num=2
>
> I think it might be idle hopes to think bcache can be used as a ssd cache
> for btrfs to significantly improve performance.. True, the benchmark is
> using ext.
It's a benchmark. They're inherently synthetic and workload specific, and
therefore should not be trusted to represent things accurately for
arbitrary use cases.
> But the most important one (where btrfs always shows to be a little slow)
> would be the SQLite test. And with ext at least performance _degrades_
> except for the Writeback mode, and even there is nowhere near what the
> SSD is capable of.
And what makes you think it will be? You're using it as a hot-data cache,
not a dedicated write-back cache, and you have the overhead from bcache
itself too. Just some simple math based on examining the bcache code
suggests you can't get better than about 98% of the SSD's performance if
you're lucky, and I'd guess it's more like 80% most of the time.
> I think with btrfs it will be even worse and that it is a fundamental
> problem: caching is complex and the cache can not know how the data on
> the fs is used.
Actually, the improvement from using bcache with BTRFS is higher
proportionate to the baseline of not using it by a small margin than it is
when used with ext4. BTRFS does a lot more with the disk, so you have a lot
more time spent accessing the disk, and thus more time that can be reduced
by improving disk performance. While the CoW nature of BTRFS does somewhat
mitigate the performance improvement from using bcache, it does not
completely negate it.
> I think the original idea of hot data tracking has a much better chance
> to significantly improve performance. This of course as the SSD's and
> HDD's then will be equal citizens and btrfs itself gets to decide on
> which drive the data is best stored.
First, the user needs to decide, not BTRFS (at least, by default, BTRFS
should not be involved in the decision). Second, tiered storage (that's
what that's properly called) is mostly orthogonal to caching (though bcache
and dm-cache behave like tiered storage once the cache is warmed).
> With this implemented right, it would also finally silence the never
> ending discussion why not btrfs and why zfs, ext, xfs etc. Which would
> be a plus by its own right.
Even with this, there would still be plenty of reasons to pick one of those
filesystems over BTRFS. There would however be one more reason to pick
BTRFS over ext or XFS (but not necessarily ZFS, it already has caching
built in).
Re: Give up on bcache?
On Tue, 26 Sep 2017 23:33:19 +0500, Roman Mamedov <r...@romanrm.net> wrote:

> On Tue, 26 Sep 2017 16:50:00 +0000 (UTC)
> Ferry Toth <ft...@telfort.nl> wrote:
>
> > https://www.phoronix.com/scan.php?page=article&item=linux414-bcache-raid&num=2
> >
> > I think it might be idle hopes to think bcache can be used as a ssd
> > cache for btrfs to significantly improve performance..
>
> My personal real-world experience shows that SSD caching -- with
> lvmcache -- does indeed significantly improve performance of a large
> Btrfs filesystem with slowish base storage.
>
> And that article, sadly, only demonstrates once again the general
> mediocre quality of Phoronix content: it is an astonishing oversight
> to not check out lvmcache in the same setup, to at least try to draw
> some useful conclusion, is it Bcache that is strangely deficient, or
> SSD caching as a general concept does not work well in the hardware
> setup utilized.

Bcache is actually not meant to increase benchmark performance except for
very few corner cases. It is designed to improve interactivity and
perceived performance, reducing head movements. On the bcache homepage
there are actually tips on how to benchmark bcache correctly, including a
warm-up phase and turning on sequential caching. Phoronix doesn't do that;
they test default settings, which is imho a good thing, but you should
know the consequences and research how to turn the knobs.

Depending on the caching mode and cache size, the SQLite test may not show
real-world numbers.

Also, you should optimize some btrfs options to work correctly with
bcache, e.g. force it to mount "nossd" as it detects the bcache device as
SSD - which is wrong for some workloads, I think especially desktop
workloads and most server workloads.

Also, you may want to tune udev to correct some attributes so other
applications can do their detection and behavior correctly, too:

$ cat /etc/udev/rules.d/00-ssd-scheduler.rules
ACTION=="add|change", KERNEL=="bcache*", ATTR{queue/rotational}="1"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/iosched/slice_idle}="0"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="kyber"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"

Take note: on a non-mq system you may want to use noop/deadline/cfq
instead of kyber/bfq.

I've been running bcache for over two years now and the performance
improvement is very, very high, with boot times going down to 30-40s from
3+ minutes previously, faster app startup times (almost instant, like on
SSD), reduced noise from fewer head movements, etc. Also, it has easy
setup (no split metadata/data cache, you can attach more than one device
to a single cache), and it is rock-solid even when crashing the system.

Bcache learns by using LRU for caching: what you don't need will be pushed
out of cache over time; what you use stays. This is actually a lot like
"hot data caching". Given a big enough cache, everything of your daily
needs would stay in cache, easily achieving hit ratios around 90%. Since
sequential access is bypassed, you don't have to worry about flushing the
cache with large copy operations.

My system uses a 512G SSD with 400G dedicated to bcache, attached to
3x 1TB HDD draid0 mraid1 btrfs, filled with 2TB of net data and daily
backups using borgbackup. Bcache runs in writeback mode; the backup takes
around 15 minutes each night to dig through all data and stores it to an
internal intermediate backup, also on bcache (xfs, write-around mode).
Currently not implemented, this intermediate backup will later be mirrored
to an external, off-site location. Some of the rest of the SSD is EFI-ESP,
some swap space, and an over-provisioned area to keep bcache performance
high.

$ uptime && bcache-status
21:28:44 up 3 days, 20:38, 3 users, load average: 1,18, 1,44, 2,14
--- bcache ---
UUID                 aacfbcd9-dae5-4377-92d1-6808831a4885
Block Size           4.00 KiB
Bucket Size          512.00 KiB
Congested?           False
Read Congestion      2.0ms
Write Congestion     20.0ms
Total Cache Size     400 GiB
Total Cache Used     400 GiB (100%)
Total Cache Unused   0 B (0%)
Evictable Cache      396 GiB (99%)
Replacement Policy   [lru] fifo random
Cache Mode           (Various)
Total Hits           2364518 (89%)
Total Misses         290764
Total Bypass Hits    4284468 (100%)
Total Bypass Misses  0
Total Bypassed       215 GiB

The bucket size and block size were chosen to best fit with the Samsung
TLC arrangement. But this is pure theory, I
Re: Give up on bcache?
On Tue, 26 Sep 2017 16:50:00 +0000 (UTC)
Ferry Toth <ft...@telfort.nl> wrote:

> https://www.phoronix.com/scan.php?page=article&item=linux414-bcache-raid&num=2
>
> I think it might be idle hopes to think bcache can be used as a ssd cache
> for btrfs to significantly improve performance..

My personal real-world experience shows that SSD caching -- with lvmcache --
does indeed significantly improve performance of a large Btrfs filesystem
with slowish base storage.

And that article, sadly, only demonstrates once again the general mediocre
quality of Phoronix content: it is an astonishing oversight to not check out
lvmcache in the same setup, to at least try to draw some useful conclusion:
is it Bcache that is strangely deficient, or does SSD caching as a general
concept not work well in the hardware setup utilized?

-- 
With respect,
Roman
Give up on bcache?
Looking at the Phoronix benchmark here:

https://www.phoronix.com/scan.php?page=article&item=linux414-bcache-raid&num=2

I think it might be idle hopes to think bcache can be used as a ssd cache
for btrfs to significantly improve performance.. True, the benchmark is
using ext.

But the most important one (where btrfs always shows to be a little slow)
would be the SQLite test. And with ext at least performance _degrades_
except for the Writeback mode, and even there is nowhere near what the SSD
is capable of.

I think with btrfs it will be even worse and that it is a fundamental
problem: caching is complex and the cache can not know how the data on the
fs is used.

I think the original idea of hot data tracking has a much better chance to
significantly improve performance. This of course as the SSD's and HDD's
then will be equal citizens and btrfs itself gets to decide on which drive
the data is best stored.

With this implemented right, it would also finally silence the never
ending discussion why not btrfs and why zfs, ext, xfs etc. Which would be
a plus by its own right.
Re: 4.8.8, bcache deadlock and hard lockup
On Wed, 30 Nov 2016, Marc MERLIN wrote:
> On Wed, Nov 30, 2016 at 03:57:28PM -0800, Eric Wheeler wrote:
> > > I'll start another separate thread with the btrfs folks on how much
> > > pressure is put on the system, but on your side it would be good to help
> > > ensure that bcache doesn't crash the system altogether if too many
> > > requests are allowed to pile up.
> >
> > Try BFQ. It is AWESOME and helps reduce the congestion problem with bulk
> > writes at the request queue on its way to the spinning disk or SSD:
> > http://algo.ing.unimo.it/people/paolo/disk_sched/
> >
> > use the latest BFQ git here, merge it into v4.8.y:
> > https://github.com/linusw/linux-bfq/commits/bfq-v8
> >
> > This doesn't completely fix the dirty_ratio problem, but it is far better
> > than CFQ or deadline in my opinion (and experience).
>
> That's good to know thanks.
> But for my uninformed opinion, is there anything bcache can do to throttle
> incoming requests if they are piling up, or they're coming from producers
> upstream and bcache has no choice but try and process them as quickly as
> possible without a way to block the sender if too many are coming?

Not really. The congestion isn't in bcache, it's at the disk queue beyond
bcache, but userspace processes are blocked by the (huge) pagecache dirty
writeback which happens before bcache gets it and must complete before
userspace may proceed:

	fs -> pagecache -> bcache -> {ssd,disk}

The real issue is that the dirty page cache gets really big, flushes, waits
for downstream devices (bcache->ssd,disk) to finish, and then returns to
userspace. The only way to limit dirty cache are those options that Linus
mentioned. BFQ can help for processes not tied to the flush because it may
re-order other process requests ahead of the big flush---so even though a
big flush is happening and that process is stalled, others might proceed
without delay.

See this thread, too:
https://groups.google.com/forum/#!msg/bfq-iosched/M2M_UhbC05A/hf6Ni9JbAQAJ

--
Eric Wheeler
Re: btrfs flooding the I/O subsystem and hanging the machine, with bcache cache turned off
Is there any fundamental reason not to support huge writeback caches? (I
mean, besides working around bugs and/or questionably poor design choices
which no one wishes to fix.) The obvious drawback is the increased risk of
data loss upon hardware failure or kernel panic, but why couldn't the user
be allowed to draw the line between probability of data loss and potential
performance gains?

The last time I changed hardware, I put double the amount of RAM into my
little home server for the sole reason of using a relatively huge cache,
especially a huge writeback cache. I realized soon enough, though, that
writeback ratios like 20/45 will make the system unstable (OOM reaping)
even if ~90% of the memory is theoretically free = used as some form of
cache, read or write, depending on this ratio parameter, and I ended up
below the default to get rid of The Reaper. My plan was to try and
decrease the fragmentation of files which are created by dumping several
parallel real-time video streams into separate files (and also minimize
the HDD head seeks due to that). (The computer in question is on a UPS.)

On Thu, Dec 1, 2016 at 4:49 PM, Michal Hocko <mho...@kernel.org> wrote:
> On Wed 30-11-16 10:16:53, Marc MERLIN wrote:
> [...]
>
> As much as the dirty_*ratio defaults are a major PITA this is not something
> that would be _easy_ to change without high risks of regressions. [...]
>
> That being said this is something more for IO people than MM IMHO.
>
> --
> Michal Hocko
> SUSE Labs
Re: btrfs flooding the I/O subsystem and hanging the machine, with bcache cache turned off
On Wed 30-11-16 10:16:53, Marc MERLIN wrote:
> +folks from linux-mm thread for your suggestion
>
> On Wed, Nov 30, 2016 at 01:00:45PM -0500, Austin S. Hemmelgarn wrote:
> > > swraid5 < bcache < dmcrypt < btrfs
> > >
> > > Copying with btrfs send/receive causes massive hangs on the system.
> > > Please see this explanation from Linus on why the workaround was
> > > suggested:
> > > https://lkml.org/lkml/2016/11/29/667
> > And Linus' assessment is absolutely correct (at least, the general
> > assessment is, I have no idea about btrfs_start_shared_extent, but I'm
> > more than willing to bet he's correct that that's the culprit).
> > > All of this mostly went away with Linus' suggestion:
> > > echo 2 > /proc/sys/vm/dirty_ratio
> > > echo 1 > /proc/sys/vm/dirty_background_ratio
> > >
> > > But that's hiding the symptom which I think is that btrfs is piling
> > > up too many I/O requests during btrfs send/receive and btrfs scrub
> > > (probably balance too) and not looking at resulting impact to system
> > > health.
> > I see pretty much identical behavior using any number of other storage
> > configurations on a USB 2.0 flash drive connected to a system with 16GB
> > of RAM with the default dirty ratios because it's trying to cache up to
> > 3.2GB of data for writeback. While BTRFS is doing highly sub-optimal
> > things here, the ancient default writeback ratios are just as much a
> > culprit. I would suggest that get changed to 200MB or 20% of RAM,
> > whichever is smaller, which would give overall almost identical behavior
> > to x86-32, which in turn works reasonably well for most cases. I sadly
> > don't have the time, patience, or expertise to write up such a patch
> > myself though.
>
> Dear linux-mm folks, is that something you could consider (changing the
> dirty_ratio defaults) given that it affects at least bcache and btrfs
> (with or without bcache)?

As much as the dirty_*ratio defaults are a major PITA, this is not
something that would be _easy_ to change without high risks of regressions.
This topic has been discussed many times with many good ideas, nothing
really materialized from them though :/

To be honest I really do hate dirty_*ratio and have seen many issues on
very large machines and always suggested to use dirty_bytes instead, but a
particular value has always been a challenge to get right. It has always
been very workload specific.

That being said this is something more for IO people than MM IMHO.

-- 
Michal Hocko
SUSE Labs
Re: 4.8.8, bcache deadlock and hard lockup
On 2016-11-30 19:48, Chris Murphy wrote:
> On Wed, Nov 30, 2016 at 4:57 PM, Eric Wheeler <bca...@lists.ewheeler.net> wrote:
> [...]
>
> There are several threads over the past year with users having problems
> no one else had previously reported, and they were using BFQ. But there's
> no evidence whether BFQ was the cause, or exposing some existing bug that
> another scheduler doesn't. Anyway, I'd say using an out of tree scheduler
> means higher burden of testing and skepticism.

Normally I'd agree on this, but BFQ is a bit of a different situation from
usual because:
1. 90% of the reason that BFQ isn't in mainline is that the block
maintainers have declared the legacy (non blk-mq) code deprecated and
refuse to take anything new there despite having absolutely zero scheduling
in blk-mq.
2. It's been around for years with hundreds of thousands of users over the
years who have had no issues with it.
Re: 4.8.8, bcache deadlock and hard lockup
On Wed, Nov 30, 2016 at 4:57 PM, Eric Wheeler <bca...@lists.ewheeler.net> wrote: > On Wed, 30 Nov 2016, Marc MERLIN wrote: >> +btrfs mailing list, see below why >> >> On Tue, Nov 29, 2016 at 12:59:44PM -0800, Eric Wheeler wrote: >> > On Mon, 27 Nov 2016, Coly Li wrote: >> > > >> > > Yes, too many work queues... I guess the locking might be caused by some >> > > very obscure reference of closure code. I cannot have any clue if I >> > > cannot find a stable procedure to reproduce this issue. >> > > >> > > Hmm, if there is a tool to clone all the meta data of the back end cache >> > > and whole cached device, there might be a method to replay the oops much >> > > easier. >> > > >> > > Eric, do you have any hint ? >> > >> > Note that the backing device doesn't have any metadata, just a superblock. >> > You can easily dd that off onto some other volume without transferring the >> > data. By default, data starts at 8k, or whatever you used in `make-bcache >> > -w`. >> >> Ok, Linus helped me find a workaround for this problem: >> https://lkml.org/lkml/2016/11/29/667 >> namely: >>echo 2 > /proc/sys/vm/dirty_ratio >>echo 1 > /proc/sys/vm/dirty_background_ratio >> (it's a 24GB system, so the defaults of 20 and 10 were creating too many >> requests in th buffers) >> >> Note that this is only a workaround, not a fix. >> >> When I did this and re tried my big copy again, I still got 100+ kernel >> work queues, but apparently the underlying swraid5 was able to unblock >> and satisfy the write requests before too many accumulated and crashed >> the kernel. >> >> I'm not a kernel coder, but seems to me that bcache needs a way to >> throttle incoming requests if there are too many so that it does not end >> up in a state where things blow up due to too many piled up requests. >> >> You should be able to reproduce this by taking 5 spinning rust drives, >> put raid5 on top, dmcrypt, bcache and hopefully any filesystem (although >> I used btrfs) and send lots of requests. >> Actually to be honest, the problems have mostly been happening when I do >> btrfs scrub and btrfs send/receive which both generate I/O from within >> the kernel instead of user space. >> So here, btrfs may be a contributor to the problem too, but while btrfs >> still trashes my system if I remove the caching device on bcache (and >> with the default dirty ratio values), it doesn't crash the kernel. >> >> I'll start another separate thread with the btrfs folks on how much >> pressure is put on the system, but on your side it would be good to help >> ensure that bcache doesn't crash the system altogether if too many >> requests are allowed to pile up. > > > Try BFQ. It is AWESOME and helps reduce the congestion problem with bulk > writes at the request queue on its way to the spinning disk or SSD: > http://algo.ing.unimo.it/people/paolo/disk_sched/ > > use the latest BFQ git here, merge it into v4.8.y: > https://github.com/linusw/linux-bfq/commits/bfq-v8 > > This doesn't completely fix the dirty_ration problem, but it is far better > than CFQ or deadline in my opinion (and experience). There are several threads over the past year with users having problems no one else had previously reported, and they were using BFQ. But there's no evidence whether BFQ was the cause, or exposing some existing bug that another scheduler doesn't. Anyway, I'd say using an out of tree scheduler means higher burden of testing and skepticism. 
-- Chris Murphy
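For readers who want to try the BFQ suggestion from this thread: the I/O scheduler is selected per block device through sysfs. A minimal sketch, assuming a kernel where BFQ is available (on v4.8 that means the out-of-tree branch Eric linked; mainline gained BFQ later) and sdb as an example backing disk:

    # list the schedulers this queue supports; the active one is shown in brackets
    cat /sys/block/sdb/queue/scheduler
    # switch this queue to bfq (takes effect immediately, not persistent across reboots)
    echo bfq > /sys/block/sdb/queue/scheduler

Note this only changes request ordering at the bottom of the stack; it does not address the dirty-page pileup discussed in this thread.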
Re: 4.8.8, bcache deadlock and hard lockup
On Wed, Nov 30, 2016 at 03:57:28PM -0800, Eric Wheeler wrote: > > I'll start another separate thread with the btrfs folks on how much > > pressure is put on the system, but on your side it would be good to help > > ensure that bcache doesn't crash the system altogether if too many > > requests are allowed to pile up. > > Try BFQ. It is AWESOME and helps reduce the congestion problem with bulk > writes at the request queue on its way to the spinning disk or SSD: > http://algo.ing.unimo.it/people/paolo/disk_sched/ > > use the latest BFQ git here, merge it into v4.8.y: > https://github.com/linusw/linux-bfq/commits/bfq-v8 > > This doesn't completely fix the dirty_ration problem, but it is far better > than CFQ or deadline in my opinion (and experience). That's good to know, thanks. But in my uninformed opinion: is there anything bcache can do to throttle incoming requests when they pile up? Or do they come from producers upstream, leaving bcache no choice but to try and process them as quickly as possible, with no way to block the sender if too many are coming? Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/
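On Marc's throttling question: bcache cannot block upstream producers, but it does expose per-device sysfs knobs that bound how much write traffic the cache absorbs, which is the closest thing to back-pressure it offers. A minimal sketch, assuming the device is bcache0 (knob names as in mainline bcache of this era):

    # drain dirty data to the backing device sooner by lowering the dirty target
    echo 5 > /sys/block/bcache0/bcache/writeback_percent
    # or stop absorbing writes on the SSD entirely and pass them through
    echo writethrough > /sys/block/bcache0/bcache/cache_mode

Neither is a real throttle, but both shrink the pile of deferred work that has to land on the spinning rust later.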
Re: 4.8.8, bcache deadlock and hard lockup
On Wed, 30 Nov 2016, Marc MERLIN wrote: > +btrfs mailing list, see below why > > On Tue, Nov 29, 2016 at 12:59:44PM -0800, Eric Wheeler wrote: > > On Mon, 27 Nov 2016, Coly Li wrote: > > > > > > Yes, too many work queues... I guess the locking might be caused by some > > > very obscure reference of closure code. I cannot have any clue if I > > > cannot find a stable procedure to reproduce this issue. > > > > > > Hmm, if there is a tool to clone all the meta data of the back end cache > > > and whole cached device, there might be a method to replay the oops much > > > easier. > > > > > > Eric, do you have any hint ? > > > > Note that the backing device doesn't have any metadata, just a superblock. > > You can easily dd that off onto some other volume without transferring the > > data. By default, data starts at 8k, or whatever you used in `make-bcache > > -w`. > > Ok, Linus helped me find a workaround for this problem: > https://lkml.org/lkml/2016/11/29/667 > namely: >echo 2 > /proc/sys/vm/dirty_ratio >echo 1 > /proc/sys/vm/dirty_background_ratio > (it's a 24GB system, so the defaults of 20 and 10 were creating too many > requests in th buffers) > > Note that this is only a workaround, not a fix. > > When I did this and re tried my big copy again, I still got 100+ kernel > work queues, but apparently the underlying swraid5 was able to unblock > and satisfy the write requests before too many accumulated and crashed > the kernel. > > I'm not a kernel coder, but seems to me that bcache needs a way to > throttle incoming requests if there are too many so that it does not end > up in a state where things blow up due to too many piled up requests. > > You should be able to reproduce this by taking 5 spinning rust drives, > put raid5 on top, dmcrypt, bcache and hopefully any filesystem (although > I used btrfs) and send lots of requests. > Actually to be honest, the problems have mostly been happening when I do > btrfs scrub and btrfs send/receive which both generate I/O from within > the kernel instead of user space. > So here, btrfs may be a contributor to the problem too, but while btrfs > still trashes my system if I remove the caching device on bcache (and > with the default dirty ratio values), it doesn't crash the kernel. > > I'll start another separate thread with the btrfs folks on how much > pressure is put on the system, but on your side it would be good to help > ensure that bcache doesn't crash the system altogether if too many > requests are allowed to pile up. Try BFQ. It is AWESOME and helps reduce the congestion problem with bulk writes at the request queue on its way to the spinning disk or SSD: http://algo.ing.unimo.it/people/paolo/disk_sched/ use the latest BFQ git here, merge it into v4.8.y: https://github.com/linusw/linux-bfq/commits/bfq-v8 This doesn't completely fix the dirty_ration problem, but it is far better than CFQ or deadline in my opinion (and experience). -Eric -- Eric Wheeler > > Thanks, > Marc > -- > "A mouse is a device used to point at the xterm you want to type in" - A.S.R. 
> Microsoft is to operating systems what McDonalds is to gourmet cooking > Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: btrfs flooding the I/O subsystem and hanging the machine, with bcache cache turned off
+folks from the linux-mm thread, for your suggestion On Wed, Nov 30, 2016 at 01:00:45PM -0500, Austin S. Hemmelgarn wrote: > > swraid5 < bcache < dmcrypt < btrfs > > > > Copying with btrfs send/receive causes massive hangs on the system. > > Please see this explanation from Linus on why the workaround was > > suggested: > > https://lkml.org/lkml/2016/11/29/667 > And Linus' assessment is absolutely correct (at least, the general > assessment is, I have no idea about btrfs_start_shared_extent, but I'm more > than willing to bet he's correct that that's the culprit). > > All of this mostly went away with Linus' suggestion: > > echo 2 > /proc/sys/vm/dirty_ratio > > echo 1 > /proc/sys/vm/dirty_background_ratio > > > > But that's hiding the symptom which I think is that btrfs is piling up too > > many I/O requests during btrfs send/receive and btrfs scrub (probably balance too) > > and not looking at resulting impact to system health. > I see pretty much identical behavior using any number of other storage > configurations on a USB 2.0 flash drive connected to a system with 16GB of > RAM with the default dirty ratios, because it's trying to cache up to 3.2GB > of data for writeback. While BTRFS is doing highly sub-optimal things here, > the ancient default writeback ratios are just as much a culprit. I would > suggest they be changed to 200MB or 20% of RAM, whichever is smaller, which > would give overall almost identical behavior to x86-32, which in turn works > reasonably well for most cases. I sadly don't have the time, patience, or > expertise to write up such a patch myself though. Dear linux-mm folks, is that something you could consider (changing the dirty_ratio defaults), given that it affects at least bcache and btrfs (with or without bcache)? By the way, on the 200MB max suggestion: when I had 2 and 1% (or 480MB and 240MB on my 24GB system), this was enough to make btrfs behave sanely, but only if I had bcache turned off. With bcache enabled, those values were just enough that bcache didn't crash my system, but not enough to prevent undesirable behaviour (things hanging, 100+ bcache kworkers piled up, and more). However, the copy did succeed, despite the relative impact on the system, so it's better than nothing :) But the impact from bcache probably goes beyond what btrfs is responsible for, so I have a separate thread on the bcache list: http://marc.info/?l=linux-bcache&m=148052441423532&w=2 http://marc.info/?l=linux-bcache&m=148052620524162&w=2 On the plus side, btrfs did ok, with no visible impact to my system, with those 480 and 240MB dirty ratio values. Thanks for your reply, Austin. Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
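Austin's "200MB or 20% of RAM, whichever is smaller" proposal can be applied by hand today through the byte-granular sysctls, which override their ratio-based counterparts when set. A sketch of that policy (the numbers are the ones proposed in this thread, not a tested recommendation):

    # cap dirty page cache at min(200MB, 20% of RAM); start background writeback at half that
    mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
    cap=$(( mem_kb * 1024 / 5 ))                 # 20% of RAM, in bytes
    max=$(( 200 * 1024 * 1024 ))                 # 200MB ceiling
    [ "$cap" -gt "$max" ] && cap=$max
    echo "$cap" > /proc/sys/vm/dirty_bytes
    echo $(( cap / 2 )) > /proc/sys/vm/dirty_background_bytes

Setting vm.dirty_bytes zeroes vm.dirty_ratio (and likewise for the background pair), so this replaces rather than stacks with the percentage knobs.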
Re: btrfs flooding the I/O subsystem and hanging the machine, with bcache cache turned off
On 2016-11-30 12:18, Marc MERLIN wrote: On Wed, Nov 30, 2016 at 08:46:46AM -0800, Marc MERLIN wrote: +btrfs mailing list, see below why Ok, Linus helped me find a workaround for this problem: https://lkml.org/lkml/2016/11/29/667 namely: echo 2 > /proc/sys/vm/dirty_ratio echo 1 > /proc/sys/vm/dirty_background_ratio (it's a 24GB system, so the defaults of 20 and 10 were creating too many requests in the buffers) I'll remove the bcache list on this followup since I want to concentrate here on the fact that btrfs does behave badly with the default dirty_ratio values. I will comment that on big systems, almost everything behaves badly with the default dirty ratios; they're leftovers from when 1GB was a huge amount of RAM. As usual though, BTRFS has pathological behavior compared to other options. As a reminder, it's a btrfs send/receive copy between 2 swraid5 arrays on spinning rust. swraid5 < bcache < dmcrypt < btrfs Copying with btrfs send/receive causes massive hangs on the system. Please see this explanation from Linus on why the workaround was suggested: https://lkml.org/lkml/2016/11/29/667 And Linus' assessment is absolutely correct (at least, the general assessment is; I have no idea about btrfs_start_shared_extent, but I'm more than willing to bet he's correct that that's the culprit). The hangs that I'm getting with the bcache cache turned off (i.e. passthrough) are now very likely only due to btrfs, and they mess up anything doing file IO that ends up timing out, break USB even as reads time out in the middle of USB requests, lose interrupts, and so forth. All of this mostly went away with Linus' suggestion: echo 2 > /proc/sys/vm/dirty_ratio echo 1 > /proc/sys/vm/dirty_background_ratio But that's hiding the symptom, which I think is that btrfs is piling up too many I/O requests during btrfs send/receive and btrfs scrub (probably balance too) and not looking at the resulting impact to system health. I see pretty much identical behavior using any number of other storage configurations on a USB 2.0 flash drive connected to a system with 16GB of RAM with the default dirty ratios, because it's trying to cache up to 3.2GB of data for writeback. While BTRFS is doing highly sub-optimal things here, the ancient default writeback ratios are just as much a culprit. I would suggest they be changed to 200MB or 20% of RAM, whichever is smaller, which would give overall almost identical behavior to x86-32, which in turn works reasonably well for most cases. I sadly don't have the time, patience, or expertise to write up such a patch myself though. Is there a way to stop flooding the entire system with I/O and causing so much strain on it?
(I realize that if there is a caching layer underneath that just takes requests and says thank you without giving other clues that underneath bad things are happening, it may be hard, but I'm asking anyway :) [10338.968912] perf: interrupt took too long (3927 > 3917), lowering kernel.perf_event_max_sample_rate to 50750 [12971.047705] ftdi_sio ttyUSB15: usb_serial_generic_read_bulk_callback - urb stopped: -32 [17761.122238] usb 4-1.4: USB disconnect, device number 39 [17761.141063] usb 4-1.4: usbfs: USBDEVFS_CONTROL failed cmd hub-ctrl rqt 160 rq 6 len 1024 ret -108 [17761.263252] usb 4-1: reset SuperSpeed USB device number 2 using xhci_hcd [17761.938575] usb 4-1.4: new SuperSpeed USB device number 40 using xhci_hcd [24130.574425] hpet1: lost 2306 rtc interrupts [24156.034950] hpet1: lost 1628 rtc interrupts [24173.314738] hpet1: lost 1104 rtc interrupts [24180.129950] hpet1: lost 436 rtc interrupts [24257.557955] hpet1: lost 4954 rtc interrupts [24267.522656] hpet1: lost 637 rtc interrupts [28034.954435] INFO: task btrfs:5618 blocked for more than 120 seconds. [28034.975471] Tainted: G U 4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12 [28035.000964] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [28035.025429] btrfs D 91154d33fc70 0 5618 5372 0x0080 [28035.047717] 91154d33fc70 00200246 911842f880c0 9115a4cf01c0 [28035.071020] 91154d33fc58 91154d34 91165493bca0 9115623773f0 [28035.094252] 1000 0001 91154d33fc88 b86cf1a6 [28035.117538] Call Trace: [28035.125791] [] schedule+0x8b/0xa3 [28035.141550] [] btrfs_start_ordered_extent+0xce/0x122 [28035.162457] [] ? wake_up_atomic_t+0x2c/0x2c [28035.180891] [] btrfs_wait_ordered_range+0xa9/0x10d [28035.201723] [] btrfs_truncate+0x40/0x24b [28035.219269] [] btrfs_setattr+0x1da/0x2d7 [28035.237032] [] notify_change+0x252/0x39c [28035.254566] [] do_truncate+0x81/0xb4 [28035.271057] [] vfs_truncate+0xd9/0xf9 [28035.287782] [] do_sys_truncate+0x63/0xa7 [28155.781987] INFO: task btrfs:5618 blocked for more than 120 seconds. [28155.802229] Tainted: G U 4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #
Re: 4.8.8, bcache deadlock and hard lockup
On Wed, Nov 30, 2016 at 08:46:46AM -0800, Marc MERLIN wrote: > +btrfs mailing list, see below why > > On Tue, Nov 29, 2016 at 12:59:44PM -0800, Eric Wheeler wrote: > > On Mon, 27 Nov 2016, Coly Li wrote: > > > > > > Yes, too many work queues... I guess the locking might be caused by some > > > very obscure reference of closure code. I cannot have any clue if I > > > cannot find a stable procedure to reproduce this issue. > > > > > > Hmm, if there is a tool to clone all the meta data of the back end cache > > > and whole cached device, there might be a method to replay the oops much > > > easier. > > > > > > Eric, do you have any hint ? > > > > Note that the backing device doesn't have any metadata, just a superblock. > > You can easily dd that off onto some other volume without transferring the > > data. By default, data starts at 8k, or whatever you used in `make-bcache > > -w`. > > Ok, Linus helped me find a workaround for this problem: > https://lkml.org/lkml/2016/11/29/667 > namely: >echo 2 > /proc/sys/vm/dirty_ratio >echo 1 > /proc/sys/vm/dirty_background_ratio > (it's a 24GB system, so the defaults of 20 and 10 were creating too many > requests in th buffers) > > Note that this is only a workaround, not a fix. Actually, I'm even more worried about the general bcache situation when caching is enabled. In the message above, Linus wrote: "One situation where I've seen something like this happen is (a) lots and lots of dirty data queued up (b) horribly slow storage (c) filesystem that ends up serializing on writeback under certain circumstances The usual case for (b) in the modern world is big SSD's that have bad worst-case behavior (ie they may do gbps speeds when doing well, and then they come to a screeching halt when their buffers fill up and they have to do rewrites, and their gbps throughput drops to mbps or lower). Generally you only find that kind of really nasty SSD in the USB stick world these days." Well, come to think of it, this is _exactly_ what bcache will create, by design. It'll swallow up a lot of IO cached to the SSD, until the SSD buffers fill up and then things will hang while bcache struggles to write it all to slower spinning rust storage. Looks to me like bcache and dirty_ratio need to be synced somehow, or things will fall over reliably. What do you think? Thanks, Marc > When I did this and re tried my big copy again, I still got 100+ kernel > work queues, but apparently the underlying swraid5 was able to unblock > and satisfy the write requests before too many accumulated and crashed > the kernel. > > I'm not a kernel coder, but seems to me that bcache needs a way to > throttle incoming requests if there are too many so that it does not end > up in a state where things blow up due to too many piled up requests. > > You should be able to reproduce this by taking 5 spinning rust drives, > put raid5 on top, dmcrypt, bcache and hopefully any filesystem (although > I used btrfs) and send lots of requests. > Actually to be honest, the problems have mostly been happening when I do > btrfs scrub and btrfs send/receive which both generate I/O from within > the kernel instead of user space. > So here, btrfs may be a contributor to the problem too, but while btrfs > still trashes my system if I remove the caching device on bcache (and > with the default dirty ratio values), it doesn't crash the kernel. 
> > I'll start another separate thread with the btrfs folks on how much > pressure is put on the system, but on your side it would be good to help > ensure that bcache doesn't crash the system altogether if too many > requests are allowed to pile up. > > Thanks, > Marc > -- > "A mouse is a device used to point at the xterm you want to type in" - A.S.R. > Microsoft is to operating systems what McDonalds is to gourmet cooking > Home page: http://marc.merlins.org/ | PGP 1024R/763BE901 -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: btrfs flooding the I/O subsystem and hanging the machine, with bcache cache turned off
On Wed, Nov 30, 2016 at 08:46:46AM -0800, Marc MERLIN wrote: > +btrfs mailing list, see below why > > Ok, Linus helped me find a workaround for this problem: > https://lkml.org/lkml/2016/11/29/667 > namely: >echo 2 > /proc/sys/vm/dirty_ratio >echo 1 > /proc/sys/vm/dirty_background_ratio > (it's a 24GB system, so the defaults of 20 and 10 were creating too many > requests in th buffers) I'll remove the bcache list on this followup since I want to concentrate here on the fact that btrfs does behave badly with the default dirty_ratio values. As a reminder, it's a btrfs send/receive copy between 2 swraid5 arrays on spinning rust. swraid5 < bcache < dmcrypt < btrfs Copying with btrfs send/receive causes massive hangs on the system. Please see this explanation from Linus on why the workaround was suggested: https://lkml.org/lkml/2016/11/29/667 The hangs that I'm getting with bcache cache turned off (i.e. passthrough) are now very likely only due to btrfs and mess up anything doing file IO that ends up timing out, break USB even as reads time out in the middle of USB requests, interrupts lost, and so forth. All of this mostly went away with Linus' suggestion: echo 2 > /proc/sys/vm/dirty_ratio echo 1 > /proc/sys/vm/dirty_background_ratio But that's hiding the symptom which I think is that btrfs is piling up too many I/O requests during btrfs send/receive and btrfs scrub (probably balance too) and not looking at resulting impact to system health. Is there a way to stop flodding the entire system with I/O and causing so much strain on it? (I realize that if there is a caching layer underneath that just takes requests and says thank you without giving other clues that underneath bad things are happening, it may be hard, but I'm asking anyway :) [10338.968912] perf: interrupt took too long (3927 > 3917), lowering kernel.perf_event_max_sample_rate to 50750 [12971.047705] ftdi_sio ttyUSB15: usb_serial_generic_read_bulk_callback - urb stopped: -32 [17761.122238] usb 4-1.4: USB disconnect, device number 39 [17761.141063] usb 4-1.4: usbfs: USBDEVFS_CONTROL failed cmd hub-ctrl rqt 160 rq 6 len 1024 ret -108 [17761.263252] usb 4-1: reset SuperSpeed USB device number 2 using xhci_hcd [17761.938575] usb 4-1.4: new SuperSpeed USB device number 40 using xhci_hcd [24130.574425] hpet1: lost 2306 rtc interrupts [24156.034950] hpet1: lost 1628 rtc interrupts [24173.314738] hpet1: lost 1104 rtc interrupts [24180.129950] hpet1: lost 436 rtc interrupts [24257.557955] hpet1: lost 4954 rtc interrupts [24267.522656] hpet1: lost 637 rtc interrupts [28034.954435] INFO: task btrfs:5618 blocked for more than 120 seconds. [28034.975471] Tainted: G U 4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12 [28035.000964] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [28035.025429] btrfs D 91154d33fc70 0 5618 5372 0x0080 [28035.047717] 91154d33fc70 00200246 911842f880c0 9115a4cf01c0 [28035.071020] 91154d33fc58 91154d34 91165493bca0 9115623773f0 [28035.094252] 1000 0001 91154d33fc88 b86cf1a6 [28035.117538] Call Trace: [28035.125791] [] schedule+0x8b/0xa3 [28035.141550] [] btrfs_start_ordered_extent+0xce/0x122 [28035.162457] [] ? 
wake_up_atomic_t+0x2c/0x2c [28035.180891] [] btrfs_wait_ordered_range+0xa9/0x10d [28035.201723] [] btrfs_truncate+0x40/0x24b [28035.219269] [] btrfs_setattr+0x1da/0x2d7 [28035.237032] [] notify_change+0x252/0x39c [28035.254566] [] do_truncate+0x81/0xb4 [28035.271057] [] vfs_truncate+0xd9/0xf9 [28035.287782] [] do_sys_truncate+0x63/0xa7 [28155.781987] INFO: task btrfs:5618 blocked for more than 120 seconds. [28155.802229] Tainted: G U 4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12 [28155.827894] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [28155.852479] btrfs D 91154d33fc70 0 5618 5372 0x0080 [28155.874761] 91154d33fc70 00200246 911842f880c0 9115a4cf01c0 [28155.898059] 91154d33fc58 91154d34 91165493bca0 9115623773f0 [28155.921464] 1000 0001 91154d33fc88 b86cf1a6 [28155.944720] Call Trace: [28155.953176] [] schedule+0x8b/0xa3 [28155.968945] [] btrfs_start_ordered_extent+0xce/0x122 [28155.989811] [] ? wake_up_atomic_t+0x2c/0x2c [28156.008195] [] btrfs_wait_ordered_range+0xa9/0x10d [28156.028498] [] btrfs_truncate+0x40/0x24b [28156.046081] [] btrfs_setattr+0x1da/0x2d7 [28156.063621] [] notify_change+0x252/0x39c [28156.081667] [] do_truncate+0x81/0xb4 [28156.098732] [] vfs_truncate+0xd9/0xf9 [28156.115489] [] do_sys_truncate+0x63/0xa7 [28156.133389] [] SyS_truncate+0xe/0x10 [28156.149831] [] do_syscall_64+0x61/0x72 [28156.167179] [] entry_SYSCALL64_slow_path+0x25/0x25 [28397.436986] INFO: task btrfs:
Re: 4.8.8, bcache deadlock and hard lockup
+btrfs mailing list, see below why On Tue, Nov 29, 2016 at 12:59:44PM -0800, Eric Wheeler wrote: > On Mon, 27 Nov 2016, Coly Li wrote: > > > > Yes, too many work queues... I guess the locking might be caused by some > > very obscure reference of closure code. I cannot have any clue if I > > cannot find a stable procedure to reproduce this issue. > > > > Hmm, if there is a tool to clone all the meta data of the back end cache > > and whole cached device, there might be a method to replay the oops much > > easier. > > > > Eric, do you have any hint ? > > Note that the backing device doesn't have any metadata, just a superblock. > You can easily dd that off onto some other volume without transferring the > data. By default, data starts at 8k, or whatever you used in `make-bcache > -w`. Ok, Linus helped me find a workaround for this problem: https://lkml.org/lkml/2016/11/29/667 namely: echo 2 > /proc/sys/vm/dirty_ratio echo 1 > /proc/sys/vm/dirty_background_ratio (it's a 24GB system, so the defaults of 20 and 10 were creating too many requests in the buffers) Note that this is only a workaround, not a fix. When I did this and retried my big copy again, I still got 100+ kernel work queues, but apparently the underlying swraid5 was able to unblock and satisfy the write requests before too many accumulated and crashed the kernel. I'm not a kernel coder, but it seems to me that bcache needs a way to throttle incoming requests if there are too many, so that it does not end up in a state where things blow up due to too many piled up requests. You should be able to reproduce this by taking 5 spinning rust drives, putting raid5 on top, then dmcrypt, then bcache, and hopefully any filesystem (although I used btrfs), and sending lots of requests; a sketch of that stack follows below. Actually, to be honest, the problems have mostly been happening when I do btrfs scrub and btrfs send/receive, which both generate I/O from within the kernel instead of user space. So here, btrfs may be a contributor to the problem too, but while btrfs still trashes my system if I remove the caching device on bcache (and with the default dirty ratio values), it doesn't crash the kernel. I'll start another separate thread with the btrfs folks on how much pressure is put on the system, but on your side it would be good to help ensure that bcache doesn't crash the system altogether if too many requests are allowed to pile up. Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
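The reproduction stack Marc describes, spelled out as commands (a sketch only; device names and the mount point are examples, and the dmcrypt-below-bcache ordering is deliberately his, later identified in this thread as the wrong order):

    mdadm --create /dev/md5 --level=5 --raid-devices=5 /dev/sd[b-f]
    cryptsetup luksFormat /dev/md5
    cryptsetup luksOpen /dev/md5 cryptmd
    make-bcache -B /dev/mapper/cryptmd      # backing device -> /dev/bcache0
    make-bcache -C /dev/sdg                 # SSD cache set; the crash was seen with a cache attached
    echo "$CSET_UUID" > /sys/block/bcache0/bcache/attach   # UUID printed by make-bcache -C
    mkfs.btrfs /dev/bcache0
    mount /dev/bcache0 /mnt/test            # then drive heavy writes, e.g. btrfs send/receive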
BCache
Hello, I run bcache under a btrfs RAID with 4 hard drives. Normal use seems to work well; I have no problems. I have the btrfs RAID mounted through /dev/bcache3. But when I remove a disk (simulating a failure), btrfs doesn't report a missing disk:

# btrfs fi show
Label: 'RAID'  uuid: d0e2e2eb-2df7-454f-8446-5213cec2de3c
        Total devices 4 FS bytes used 12.55GiB
        devid 1 size 465.76GiB used 6.00GiB path /dev/bcache3
        devid 2 size 931.51GiB used 6.00GiB path /dev/bcache1
        devid 3 size 596.17GiB used 6.00GiB path /dev/bcache2
        devid 4 size 465.76GiB used 6.00GiB path /dev/bcache0

One of these (actually /dev/bcache3, i.e. /dev/sde) should show as broken or missing. I can't run "btrfs device delete missing" on the raw device either: it replies "ERROR: not a btrfs filesystem:", no matter whether I use /dev/bcacheX or /dev/sdX. And if I run "btrfs device delete missing /mnt/raid", it replies "ERROR: error removing device 'missing': no missing devices found to remove". It looks to me as if bcache hides this information from btrfs. Is this possible? What do you think? Thanks
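What btrfs sees here is the bcache layer, not the raw disks, and the /dev/bcacheN node can stay present even when its backing disk has gone away, so btrfs has nothing to report on its own. One way to ask bcache itself about each device is through sysfs and the superblock dump tool; a sketch, assuming bcache-tools is installed and sde is the suspect member (names are examples):

    # per-device bcache status ("clean", "dirty", "no cache", ...)
    for d in /sys/block/bcache*/bcache; do
        echo "$d: $(cat "$d"/state)"
    done
    # dump the bcache superblock of a raw member to confirm it is still readable
    bcache-super-show /dev/sde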
Re: [PATCH 25/45] bcache: use bio op accessors
On 06/05/2016 09:32 PM, mchri...@redhat.com wrote: > From: Mike Christie <mchri...@redhat.com> > > Separate the op from the rq_flag_bits and have bcache > set/get the bio using bio_set_op_attrs/bio_op. > > Signed-off-by: Mike Christie <mchri...@redhat.com> > --- > drivers/md/bcache/btree.c | 4 ++-- > drivers/md/bcache/debug.c | 4 ++-- > drivers/md/bcache/journal.c | 7 --- > drivers/md/bcache/movinggc.c | 2 +- > drivers/md/bcache/request.c | 14 +++--- > drivers/md/bcache/super.c | 24 +--- > drivers/md/bcache/writeback.c | 4 ++-- > 7 files changed, 31 insertions(+), 28 deletions(-) > Reviewed-by: Hannes Reinecke <h...@suse.com> Cheers, Hannes -- Dr. Hannes Reinecke  Teamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH 07/45] bcache: use op_is_write instead of checking for REQ_WRITE
On 06/05/2016 09:31 PM, mchri...@redhat.com wrote: > From: Mike Christie <mchri...@redhat.com> > > We currently set REQ_WRITE/WRITE for all non READ IOs > like discard, flush, writesame, etc. In the next patches where we > no longer set up the op as a bitmap, we will not be able to > detect an operation direction like writesame by testing if REQ_WRITE is > set. > > This has bcache use the op_is_write helper which will do the right > thing. > > Signed-off-by: Mike Christie <mchri...@redhat.com> > --- > drivers/md/bcache/io.c | 2 +- > drivers/md/bcache/request.c | 6 +++--- > 2 files changed, 4 insertions(+), 4 deletions(-) (Could probably be folded together with the two previous patches.) Reviewed-by: Hannes Reinecke <h...@suse.com> Cheers, Hannes -- Dr. Hannes Reinecke  Teamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
[PATCH 25/45] bcache: use bio op accessors
From: Mike Christie <mchri...@redhat.com> Separate the op from the rq_flag_bits and have bcache set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchri...@redhat.com> --- drivers/md/bcache/btree.c | 4 ++-- drivers/md/bcache/debug.c | 4 ++-- drivers/md/bcache/journal.c | 7 --- drivers/md/bcache/movinggc.c | 2 +- drivers/md/bcache/request.c | 14 +++--- drivers/md/bcache/super.c | 24 +--- drivers/md/bcache/writeback.c | 4 ++-- 7 files changed, 31 insertions(+), 28 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index eab505e..76f7534 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -294,10 +294,10 @@ static void bch_btree_node_read(struct btree *b) closure_init_stack(); bio = bch_bbio_alloc(b->c); - bio->bi_rw = REQ_META|READ_SYNC; bio->bi_iter.bi_size = KEY_SIZE(>key) << 9; bio->bi_end_io = btree_node_read_endio; bio->bi_private = + bio_set_op_attrs(bio, REQ_OP_READ, REQ_META|READ_SYNC); bch_bio_map(bio, b->keys.set[0].data); @@ -396,8 +396,8 @@ static void do_btree_node_write(struct btree *b) b->bio->bi_end_io = btree_node_write_endio; b->bio->bi_private = cl; - b->bio->bi_rw = REQ_META|WRITE_SYNC|REQ_FUA; b->bio->bi_iter.bi_size = roundup(set_bytes(i), block_bytes(b->c)); + bio_set_op_attrs(b->bio, REQ_OP_WRITE, REQ_META|WRITE_SYNC|REQ_FUA); bch_bio_map(b->bio, i); /* diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c index 52b6bcf..c28df164 100644 --- a/drivers/md/bcache/debug.c +++ b/drivers/md/bcache/debug.c @@ -52,7 +52,7 @@ void bch_btree_verify(struct btree *b) bio->bi_bdev= PTR_CACHE(b->c, >key, 0)->bdev; bio->bi_iter.bi_sector = PTR_OFFSET(>key, 0); bio->bi_iter.bi_size= KEY_SIZE(>key) << 9; - bio->bi_rw = REQ_META|READ_SYNC; + bio_set_op_attrs(bio, REQ_OP_READ, REQ_META|READ_SYNC); bch_bio_map(bio, sorted); submit_bio_wait(bio); @@ -114,7 +114,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio) check = bio_clone(bio, GFP_NOIO); if (!check) return; - check->bi_rw |= READ_SYNC; + bio_set_op_attrs(check, REQ_OP_READ, READ_SYNC); if (bio_alloc_pages(check, GFP_NOIO)) goto out_put; diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index af3f9f7..a3c3b30 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -54,11 +54,11 @@ reread: left = ca->sb.bucket_size - offset; bio_reset(bio); bio->bi_iter.bi_sector = bucket + offset; bio->bi_bdev= ca->bdev; - bio->bi_rw = READ; bio->bi_iter.bi_size= len << 9; bio->bi_end_io = journal_read_endio; bio->bi_private = + bio_set_op_attrs(bio, REQ_OP_READ, 0); bch_bio_map(bio, data); closure_bio_submit(bio, ); @@ -449,10 +449,10 @@ static void do_journal_discard(struct cache *ca) atomic_set(>discard_in_flight, DISCARD_IN_FLIGHT); bio_init(bio); + bio_set_op_attrs(bio, REQ_OP_DISCARD, 0); bio->bi_iter.bi_sector = bucket_to_sector(ca->set, ca->sb.d[ja->discard_idx]); bio->bi_bdev= ca->bdev; - bio->bi_rw = REQ_WRITE|REQ_DISCARD; bio->bi_max_vecs= 1; bio->bi_io_vec = bio->bi_inline_vecs; bio->bi_iter.bi_size= bucket_bytes(ca); @@ -626,11 +626,12 @@ static void journal_write_unlocked(struct closure *cl) bio_reset(bio); bio->bi_iter.bi_sector = PTR_OFFSET(k, i); bio->bi_bdev= ca->bdev; - bio->bi_rw = REQ_WRITE|REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA; bio->bi_iter.bi_size = sectors << 9; bio->bi_end_io = journal_write_endio; bio->bi_private = w; + bio_set_op_attrs(bio, REQ_OP_WRITE, + REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA); bch_bio_map(bio, w->data); trace_bcache_journal_write(bio); diff --git 
a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c index b929fc9..1881319 100644 --- a/drivers/md/bcache/movinggc.c +++ b/drivers/md/bcache/movinggc.c @@ -163,7 +163,7 @@ static void read_moving(struct cache_set *c)
[PATCH 07/45] bcache: use op_is_write instead of checking for REQ_WRITE
From: Mike Christie <mchri...@redhat.com> We currently set REQ_WRITE/WRITE for all non READ IOs like discard, flush, writesame, etc. In the next patches where we no longer set up the op as a bitmap, we will not be able to detect a operation direction like writesame by testing if REQ_WRITE is set. This has bcache use the op_is_write helper which will do the right thing. Signed-off-by: Mike Christie <mchri...@redhat.com> --- drivers/md/bcache/io.c | 2 +- drivers/md/bcache/request.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c index 86a0bb8..fd885cc 100644 --- a/drivers/md/bcache/io.c +++ b/drivers/md/bcache/io.c @@ -111,7 +111,7 @@ void bch_bbio_count_io_errors(struct cache_set *c, struct bio *bio, struct bbio *b = container_of(bio, struct bbio, bio); struct cache *ca = PTR_CACHE(c, >key, 0); - unsigned threshold = bio->bi_rw & REQ_WRITE + unsigned threshold = op_is_write(bio_op(bio)) ? c->congested_write_threshold_us : c->congested_read_threshold_us; diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index 25fa844..6b85a23 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -383,7 +383,7 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio) if (mode == CACHE_MODE_NONE || (mode == CACHE_MODE_WRITEAROUND && -(bio->bi_rw & REQ_WRITE))) +op_is_write(bio_op(bio goto skip; if (bio->bi_iter.bi_sector & (c->sb.block_size - 1) || @@ -404,7 +404,7 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio) if (!congested && mode == CACHE_MODE_WRITEBACK && - (bio->bi_rw & REQ_WRITE) && + op_is_write(bio_op(bio)) && (bio->bi_rw & REQ_SYNC)) goto rescale; @@ -657,7 +657,7 @@ static inline struct search *search_alloc(struct bio *bio, s->cache_miss = NULL; s->d= d; s->recoverable = 1; - s->write= (bio->bi_rw & REQ_WRITE) != 0; + s->write= op_is_write(bio_op(bio)); s->read_dirty_data = 0; s->start_time = jiffies; -- 2.7.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 21/42] bcache: set bi_op to REQ_OP
From: Mike Christie <mchri...@redhat.com> This patch has bcache use bio->bi_op for REQ_OPs and rq_flag_bits to bio->bi_rw. Signed-off-by: Mike Christie <mchri...@redhat.com> Reviewed-by: Christoph Hellwig <h...@lst.de> Reviewed-by: Hannes Reinecke <h...@suse.com> --- drivers/md/bcache/btree.c | 2 ++ drivers/md/bcache/debug.c | 2 ++ drivers/md/bcache/io.c| 2 +- drivers/md/bcache/journal.c | 7 --- drivers/md/bcache/movinggc.c | 2 +- drivers/md/bcache/request.c | 9 + drivers/md/bcache/super.c | 26 +++--- drivers/md/bcache/writeback.c | 4 ++-- 8 files changed, 32 insertions(+), 22 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 22b9e34..752a44f 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -295,6 +295,7 @@ static void bch_btree_node_read(struct btree *b) closure_init_stack(); bio = bch_bbio_alloc(b->c); + bio->bi_op = REQ_OP_READ; bio->bi_rw = REQ_META|READ_SYNC; bio->bi_iter.bi_size = KEY_SIZE(>key) << 9; bio->bi_end_io = btree_node_read_endio; @@ -397,6 +398,7 @@ static void do_btree_node_write(struct btree *b) b->bio->bi_end_io = btree_node_write_endio; b->bio->bi_private = cl; + b->bio->bi_op = REQ_OP_WRITE; b->bio->bi_rw = REQ_META|WRITE_SYNC|REQ_FUA; b->bio->bi_iter.bi_size = roundup(set_bytes(i), block_bytes(b->c)); bch_bio_map(b->bio, i); diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c index 52b6bcf..8df9e66 100644 --- a/drivers/md/bcache/debug.c +++ b/drivers/md/bcache/debug.c @@ -52,6 +52,7 @@ void bch_btree_verify(struct btree *b) bio->bi_bdev= PTR_CACHE(b->c, >key, 0)->bdev; bio->bi_iter.bi_sector = PTR_OFFSET(>key, 0); bio->bi_iter.bi_size= KEY_SIZE(>key) << 9; + bio->bi_op = REQ_OP_READ; bio->bi_rw = REQ_META|READ_SYNC; bch_bio_map(bio, sorted); @@ -114,6 +115,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio) check = bio_clone(bio, GFP_NOIO); if (!check) return; + check->bi_op = REQ_OP_READ; check->bi_rw |= READ_SYNC; if (bio_alloc_pages(check, GFP_NOIO)) diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c index 86a0bb8..f10a9a0 100644 --- a/drivers/md/bcache/io.c +++ b/drivers/md/bcache/io.c @@ -111,7 +111,7 @@ void bch_bbio_count_io_errors(struct cache_set *c, struct bio *bio, struct bbio *b = container_of(bio, struct bbio, bio); struct cache *ca = PTR_CACHE(c, >key, 0); - unsigned threshold = bio->bi_rw & REQ_WRITE + unsigned threshold = op_is_write(bio->bi_op) ? 
c->congested_write_threshold_us : c->congested_read_threshold_us; diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index af3f9f7..68fa0f0 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -54,7 +54,7 @@ reread: left = ca->sb.bucket_size - offset; bio_reset(bio); bio->bi_iter.bi_sector = bucket + offset; bio->bi_bdev= ca->bdev; - bio->bi_rw = READ; + bio->bi_op = REQ_OP_READ; bio->bi_iter.bi_size= len << 9; bio->bi_end_io = journal_read_endio; @@ -452,7 +452,7 @@ static void do_journal_discard(struct cache *ca) bio->bi_iter.bi_sector = bucket_to_sector(ca->set, ca->sb.d[ja->discard_idx]); bio->bi_bdev= ca->bdev; - bio->bi_rw = REQ_WRITE|REQ_DISCARD; + bio->bi_op = REQ_OP_DISCARD; bio->bi_max_vecs= 1; bio->bi_io_vec = bio->bi_inline_vecs; bio->bi_iter.bi_size= bucket_bytes(ca); @@ -626,7 +626,8 @@ static void journal_write_unlocked(struct closure *cl) bio_reset(bio); bio->bi_iter.bi_sector = PTR_OFFSET(k, i); bio->bi_bdev= ca->bdev; - bio->bi_rw = REQ_WRITE|REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA; + bio->bi_op = REQ_OP_WRITE; + bio->bi_rw = REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA; bio->bi_iter.bi_size = sectors << 9; bio->bi_end_io = journal_write_endio; diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c index b929fc9..f33860a 100644 --- a/drivers/md/bcache/movinggc.c +++ b/drivers/md/bcache/movinggc.c @@
[PATCH 21/42] bcache: set bi_op to REQ_OP
From: Mike Christie <mchri...@redhat.com> This patch has bcache use bio->bi_op for REQ_OPs and rq_flag_bits to bio->bi_rw. Signed-off-by: Mike Christie <mchri...@redhat.com> Reviewed-by: Christoph Hellwig <h...@lst.de> --- drivers/md/bcache/btree.c | 2 ++ drivers/md/bcache/debug.c | 2 ++ drivers/md/bcache/io.c| 2 +- drivers/md/bcache/journal.c | 7 ++++--- drivers/md/bcache/movinggc.c | 2 +- drivers/md/bcache/request.c | 9 + drivers/md/bcache/super.c | 26 +++--- drivers/md/bcache/writeback.c | 4 ++-- 8 files changed, 32 insertions(+), 22 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 22b9e34..752a44f 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -295,6 +295,7 @@ static void bch_btree_node_read(struct btree *b) closure_init_stack(); bio = bch_bbio_alloc(b->c); + bio->bi_op = REQ_OP_READ; bio->bi_rw = REQ_META|READ_SYNC; bio->bi_iter.bi_size = KEY_SIZE(>key) << 9; bio->bi_end_io = btree_node_read_endio; @@ -397,6 +398,7 @@ static void do_btree_node_write(struct btree *b) b->bio->bi_end_io = btree_node_write_endio; b->bio->bi_private = cl; + b->bio->bi_op = REQ_OP_WRITE; b->bio->bi_rw = REQ_META|WRITE_SYNC|REQ_FUA; b->bio->bi_iter.bi_size = roundup(set_bytes(i), block_bytes(b->c)); bch_bio_map(b->bio, i); diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c index 52b6bcf..8df9e66 100644 --- a/drivers/md/bcache/debug.c +++ b/drivers/md/bcache/debug.c @@ -52,6 +52,7 @@ void bch_btree_verify(struct btree *b) bio->bi_bdev= PTR_CACHE(b->c, >key, 0)->bdev; bio->bi_iter.bi_sector = PTR_OFFSET(>key, 0); bio->bi_iter.bi_size= KEY_SIZE(>key) << 9; + bio->bi_op = REQ_OP_READ; bio->bi_rw = REQ_META|READ_SYNC; bch_bio_map(bio, sorted); @@ -114,6 +115,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio) check = bio_clone(bio, GFP_NOIO); if (!check) return; + check->bi_op = REQ_OP_READ; check->bi_rw |= READ_SYNC; if (bio_alloc_pages(check, GFP_NOIO)) diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c index 86a0bb8..f10a9a0 100644 --- a/drivers/md/bcache/io.c +++ b/drivers/md/bcache/io.c @@ -111,7 +111,7 @@ void bch_bbio_count_io_errors(struct cache_set *c, struct bio *bio, struct bbio *b = container_of(bio, struct bbio, bio); struct cache *ca = PTR_CACHE(c, >key, 0); - unsigned threshold = bio->bi_rw & REQ_WRITE + unsigned threshold = op_is_write(bio->bi_op) ? 
c->congested_write_threshold_us : c->congested_read_threshold_us; diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index af3f9f7..68fa0f0 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -54,7 +54,7 @@ reread: left = ca->sb.bucket_size - offset; bio_reset(bio); bio->bi_iter.bi_sector = bucket + offset; bio->bi_bdev= ca->bdev; - bio->bi_rw = READ; + bio->bi_op = REQ_OP_READ; bio->bi_iter.bi_size= len << 9; bio->bi_end_io = journal_read_endio; @@ -452,7 +452,7 @@ static void do_journal_discard(struct cache *ca) bio->bi_iter.bi_sector = bucket_to_sector(ca->set, ca->sb.d[ja->discard_idx]); bio->bi_bdev= ca->bdev; - bio->bi_rw = REQ_WRITE|REQ_DISCARD; + bio->bi_op = REQ_OP_DISCARD; bio->bi_max_vecs= 1; bio->bi_io_vec = bio->bi_inline_vecs; bio->bi_iter.bi_size= bucket_bytes(ca); @@ -626,7 +626,8 @@ static void journal_write_unlocked(struct closure *cl) bio_reset(bio); bio->bi_iter.bi_sector = PTR_OFFSET(k, i); bio->bi_bdev= ca->bdev; - bio->bi_rw = REQ_WRITE|REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA; + bio->bi_op = REQ_OP_WRITE; + bio->bi_rw = REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA; bio->bi_iter.bi_size = sectors << 9; bio->bi_end_io = journal_write_endio; diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c index b929fc9..f33860a 100644 --- a/drivers/md/bcache/movinggc.c +++ b/drivers/md/bcache/movinggc.c @@ -163,7 +163,7 @@ static vo
[PATCH 21/35] bcache: set bi_op to REQ_OP
From: Mike Christie <mchri...@redhat.com> This patch has bcache set the bio bi_op to a REQ_OP, and rq_flag_bits to bi_rw. This patch is compile tested only Signed-off-by: Mike Christie <mchri...@redhat.com> --- drivers/md/bcache/btree.c | 2 ++ drivers/md/bcache/debug.c | 2 ++ drivers/md/bcache/io.c| 2 +- drivers/md/bcache/journal.c | 7 --- drivers/md/bcache/movinggc.c | 2 +- drivers/md/bcache/request.c | 9 + drivers/md/bcache/super.c | 26 +++--- drivers/md/bcache/writeback.c | 4 ++-- 8 files changed, 32 insertions(+), 22 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 22b9e34..752a44f 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -295,6 +295,7 @@ static void bch_btree_node_read(struct btree *b) closure_init_stack(); bio = bch_bbio_alloc(b->c); + bio->bi_op = REQ_OP_READ; bio->bi_rw = REQ_META|READ_SYNC; bio->bi_iter.bi_size = KEY_SIZE(>key) << 9; bio->bi_end_io = btree_node_read_endio; @@ -397,6 +398,7 @@ static void do_btree_node_write(struct btree *b) b->bio->bi_end_io = btree_node_write_endio; b->bio->bi_private = cl; + b->bio->bi_op = REQ_OP_WRITE; b->bio->bi_rw = REQ_META|WRITE_SYNC|REQ_FUA; b->bio->bi_iter.bi_size = roundup(set_bytes(i), block_bytes(b->c)); bch_bio_map(b->bio, i); diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c index 52b6bcf..8df9e66 100644 --- a/drivers/md/bcache/debug.c +++ b/drivers/md/bcache/debug.c @@ -52,6 +52,7 @@ void bch_btree_verify(struct btree *b) bio->bi_bdev= PTR_CACHE(b->c, >key, 0)->bdev; bio->bi_iter.bi_sector = PTR_OFFSET(>key, 0); bio->bi_iter.bi_size= KEY_SIZE(>key) << 9; + bio->bi_op = REQ_OP_READ; bio->bi_rw = REQ_META|READ_SYNC; bch_bio_map(bio, sorted); @@ -114,6 +115,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio) check = bio_clone(bio, GFP_NOIO); if (!check) return; + check->bi_op = REQ_OP_READ; check->bi_rw |= READ_SYNC; if (bio_alloc_pages(check, GFP_NOIO)) diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c index 86a0bb8..f10a9a0 100644 --- a/drivers/md/bcache/io.c +++ b/drivers/md/bcache/io.c @@ -111,7 +111,7 @@ void bch_bbio_count_io_errors(struct cache_set *c, struct bio *bio, struct bbio *b = container_of(bio, struct bbio, bio); struct cache *ca = PTR_CACHE(c, >key, 0); - unsigned threshold = bio->bi_rw & REQ_WRITE + unsigned threshold = op_is_write(bio->bi_op) ? 
c->congested_write_threshold_us : c->congested_read_threshold_us; diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index af3f9f7..68fa0f0 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -54,7 +54,7 @@ reread: left = ca->sb.bucket_size - offset; bio_reset(bio); bio->bi_iter.bi_sector = bucket + offset; bio->bi_bdev= ca->bdev; - bio->bi_rw = READ; + bio->bi_op = REQ_OP_READ; bio->bi_iter.bi_size= len << 9; bio->bi_end_io = journal_read_endio; @@ -452,7 +452,7 @@ static void do_journal_discard(struct cache *ca) bio->bi_iter.bi_sector = bucket_to_sector(ca->set, ca->sb.d[ja->discard_idx]); bio->bi_bdev= ca->bdev; - bio->bi_rw = REQ_WRITE|REQ_DISCARD; + bio->bi_op = REQ_OP_DISCARD; bio->bi_max_vecs= 1; bio->bi_io_vec = bio->bi_inline_vecs; bio->bi_iter.bi_size= bucket_bytes(ca); @@ -626,7 +626,8 @@ static void journal_write_unlocked(struct closure *cl) bio_reset(bio); bio->bi_iter.bi_sector = PTR_OFFSET(k, i); bio->bi_bdev= ca->bdev; - bio->bi_rw = REQ_WRITE|REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA; + bio->bi_op = REQ_OP_WRITE; + bio->bi_rw = REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA; bio->bi_iter.bi_size = sectors << 9; bio->bi_end_io = journal_write_endio; diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c index b929fc9..f33860a 100644 --- a/drivers/md/bcache/movinggc.c +++ b/drivers/md/bcache/movinggc.c @@ -163,7 +163,7 @@ static void read_moving(struct cache_set *c)
Re: btrfs on top of bcache on top of dmcrypt on top of md raid5
Use all defaults for everything. Anything reasonably new should do the right thing, including 4096-byte alignment. gargamel:~# cryptsetup luksDump /dev/md8 [snip] Payload offset: 3072 This is a bit weird because the default is 4096. But since the LUKS offset (header + payload + extra unused space) is 2MiB, it doesn't affect alignment. There may be unpatched bugs (fixes not backported) in the tools of current long term supported distros that can cause misalignment. Probably the top concern would be parted/libparted, which would start partition 1 at LBA 63, which is not aligned. The upstream tools have for a long time now set partition 1 to LBA 2048, but these crusty old unpatched versions just seem to persist like a booger you can't flick off. It's really annoying - this idea of "stable bugs" that go on and on for a decade. Chris Murphy
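The LBA-63-versus-2048 concern can be checked directly with parted rather than by eyeballing offsets; a minimal sketch against an example disk:

    # print partition start sectors
    parted /dev/sda unit s print
    # ask parted whether partition 1 meets the disk's optimal alignment
    parted /dev/sda align-check opt 1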
Re: btrfs on top of bcache on top of dmcrypt on top of md raid5
On Sun, Feb 14, 2016 at 01:43:05PM -0700, Chris Murphy wrote: > Use all defaults for everything. Anything reasonably new should do the > right thing, including 4096-byte alignment. > > gargamel:~# cryptsetup luksDump /dev/md8 > [snip] > Payload offset: 3072 > > This is a bit weird because the default is 4096. But since the LUKS > offset (header + payload + extra unused space) is 2MiB, it doesn't > affect alignment. There may be unpatched bugs (fixes not backported) in the > tools of current long term supported distros that can cause > misalignment. Probably the top concern would be parted/libparted, which > would start partition 1 at LBA 63, which is not aligned. The upstream > tools have for a long time now set partition 1 to LBA 2048, but these > crusty old unpatched versions just seem to persist like a booger you > can't flick off. It's really annoying - this idea of "stable bugs" > that go on and on for a decade. Indeed. Thankfully my partitions now start at 2048 like you say. The only thing I did wrong last time (when using bcache) was the stacking order: md5 - dmcrypt - bcache - btrfs, with ssd - dmcrypt for the cache device. This was stupid; I needed to do this instead: md5 - bcache - dmcrypt - btrfs, with the ssd used directly. So I think at this point, just to be future-proof, I'm going to add bcache on top of all block devices I have, before putting dmcrypt on top, even if I don't have a cache device; a sketch of that layout follows below. That way I can later add a cache device without problems. Without doing that, adding bcache later is a full re-install :( Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/
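The future-proofing idea from this message, sketched end to end with bcache-tools and cryptsetup (untested; device and mapping names are examples):

    make-bcache -B /dev/md5                 # register the array as a backing device -> /dev/bcache0
    cryptsetup luksFormat /dev/bcache0      # dmcrypt goes on top of bcache this time
    cryptsetup luksOpen /dev/bcache0 cryptdata
    mkfs.btrfs /dev/mapper/cryptdata
    # later, when an SSD is available, create a cache set and attach it live:
    make-bcache -C /dev/sdf                 # prints the cache set UUID
    echo "$CSET_UUID" > /sys/block/bcache0/bcache/attach

A backing device with no cache attached behaves as passthrough, which is what makes registering bcache early so cheap.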
btrfs on top of bcache on top of dmcrypt on top of md raid5
I have a 5 drive md array with dmcrypt on top, and btrfs on top of that. Kernel: 4.4, but the filesystem was created 2 years ago with an older version of btrfs. It's littered with files and hardlinks (it's a backup server). Mostly it gets btrfs receive data, and rsyncs of filesystem trees that are occasionally hardlinked to keep history (for data that wasn't on btrfs to start with). Basically the filesystem works, but it's slow: I can see that my system feels sluggish when backups are running, and cronjobs that are somewhat time critical also fail to run in time when rsyncs/backups to that filesystem are running. It's time to re-create it, but this time I'm looking at adding bcache in the middle (backed by an encrypted ssd) to hopefully help with the random I/O bits that won't be as fast on disk-backed raid5. Are there best practises in doing this? Are there issues with the default filesystem options in btrfs? Do I want -m dup considering it's ultimately backed by raid5/hdd and not ssd? (I would think yes, but I've noticed -m dup gets disabled when bcache is in the middle, probably because the detection gets foiled.) Do I want to mess with --nodesize or --sectorsize and adjust for ssd write block size? (with ext4, I use -b 4096 -E stride=128,stripe-width=128) Any specific configuration I ought to do with bcache or mdadm chunk sizes? Does align-payload look ok? cryptsetup luksFormat --align-payload=8192 -s 256 -c aes-xts-plain Thanks, Marc

PS: for reference: As discussed in the past, there seems to be a general agreement that dmcrypt on top of mdadm is better than mdadm on top of dmcrypt now that dmcrypt is multithreaded. My current array and encryption look like this. Currently, I have:

gargamel:~# mdadm --detail /dev/md8
/dev/md8:
        Version : 1.2
  Creation Time : Sat Apr 19 23:03:59 2014
     Raid Level : raid5
     Array Size : 7813523456 (7451.56 GiB 8001.05 GB)
  Used Dev Size : 1953380864 (1862.89 GiB 2000.26 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Thu Feb 11 08:26:45 2016
          State : active
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 256K

gargamel:~# cryptsetup luksDump /dev/md8
LUKS header information for /dev/md8
Version:        1
Cipher name:    aes
Cipher mode:    xts-plain64
Hash spec:      sha1
Payload offset: 3072
MK bits:        256

Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/
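On the alignment questions, each layer's offset can be verified rather than guessed. Offsets below are in 512-byte sectors, so any multiple of 8 is 4KiB-aligned; "cryptdata" is an example mapping name:

    mdadm --detail /dev/md8 | grep 'Chunk Size'            # 256K here, 4KiB-aligned
    cryptsetup luksDump /dev/md8 | grep 'Payload offset'   # 3072 sectors = 1.5MiB, aligned
    blockdev --getalignoff /dev/mapper/cryptdata           # 0 means the kernel sees it aligned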
[PATCH 21/35] bcache: set bi_op to REQ_OP
From: Mike Christie <mchri...@redhat.com>

This patch has bcache set the bio bi_op to a REQ_OP, and rq_flag_bits to bi_rw.

This patch is compile tested only.

Signed-off-by: Mike Christie <mchri...@redhat.com>
---
 drivers/md/bcache/btree.c     |  2 ++
 drivers/md/bcache/debug.c     |  2 ++
 drivers/md/bcache/io.c        |  2 +-
 drivers/md/bcache/journal.c   |  7 ---
 drivers/md/bcache/movinggc.c  |  2 +-
 drivers/md/bcache/request.c   |  9 +
 drivers/md/bcache/super.c     | 26 +++---
 drivers/md/bcache/writeback.c |  4 ++--
 8 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 22b9e34..752a44f 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -295,6 +295,7 @@ static void bch_btree_node_read(struct btree *b)
 	closure_init_stack(&cl);
 	bio = bch_bbio_alloc(b->c);
+	bio->bi_op = REQ_OP_READ;
 	bio->bi_rw = REQ_META|READ_SYNC;
 	bio->bi_iter.bi_size = KEY_SIZE(&b->key) << 9;
 	bio->bi_end_io = btree_node_read_endio;
@@ -397,6 +398,7 @@ static void do_btree_node_write(struct btree *b)
 	b->bio->bi_end_io = btree_node_write_endio;
 	b->bio->bi_private = cl;
+	b->bio->bi_op = REQ_OP_WRITE;
 	b->bio->bi_rw = REQ_META|WRITE_SYNC|REQ_FUA;
 	b->bio->bi_iter.bi_size = roundup(set_bytes(i), block_bytes(b->c));
 	bch_bio_map(b->bio, i);
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index db68562..4c48783 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -52,6 +52,7 @@ void bch_btree_verify(struct btree *b)
 	bio->bi_bdev		= PTR_CACHE(b->c, &b->key, 0)->bdev;
 	bio->bi_iter.bi_sector	= PTR_OFFSET(&b->key, 0);
 	bio->bi_iter.bi_size	= KEY_SIZE(&v->key) << 9;
+	bio->bi_op		= REQ_OP_READ;
 	bio->bi_rw		|= REQ_META|READ_SYNC;
 	bch_bio_map(bio, sorted);
@@ -114,6 +115,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 	check = bio_clone(bio, GFP_NOIO);
 	if (!check)
 		return;
+	check->bi_op = REQ_OP_READ;
 	check->bi_rw |= READ_SYNC;
 	if (bio_alloc_pages(check, GFP_NOIO))
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 86a0bb8..f10a9a0 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -111,7 +111,7 @@ void bch_bbio_count_io_errors(struct cache_set *c, struct bio *bio,
 	struct bbio *b = container_of(bio, struct bbio, bio);
 	struct cache *ca = PTR_CACHE(c, &b->key, 0);
-	unsigned threshold = bio->bi_rw & REQ_WRITE
+	unsigned threshold = op_is_write(bio->bi_op)
 		? c->congested_write_threshold_us
 		: c->congested_read_threshold_us;
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index af3f9f7..68fa0f0 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -54,7 +54,7 @@ reread:		left = ca->sb.bucket_size - offset;
 		bio_reset(bio);
 		bio->bi_iter.bi_sector	= bucket + offset;
 		bio->bi_bdev		= ca->bdev;
-		bio->bi_rw		= READ;
+		bio->bi_op		= REQ_OP_READ;
 		bio->bi_iter.bi_size	= len << 9;
 		bio->bi_end_io		= journal_read_endio;
@@ -452,7 +452,7 @@ static void do_journal_discard(struct cache *ca)
 		bio->bi_iter.bi_sector	= bucket_to_sector(ca->set,
						   ca->sb.d[ja->discard_idx]);
 		bio->bi_bdev		= ca->bdev;
-		bio->bi_rw		= REQ_WRITE|REQ_DISCARD;
+		bio->bi_op		= REQ_OP_DISCARD;
 		bio->bi_max_vecs	= 1;
 		bio->bi_io_vec		= bio->bi_inline_vecs;
 		bio->bi_iter.bi_size	= bucket_bytes(ca);
@@ -626,7 +626,8 @@ static void journal_write_unlocked(struct closure *cl)
 		bio_reset(bio);
 		bio->bi_iter.bi_sector	= PTR_OFFSET(k, i);
 		bio->bi_bdev	= ca->bdev;
-		bio->bi_rw	= REQ_WRITE|REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA;
+		bio->bi_op	= REQ_OP_WRITE;
+		bio->bi_rw	= REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA;
 		bio->bi_iter.bi_size = sectors << 9;
 		bio->bi_end_io	= journal_write_endio;
diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c
index b929fc9..f33860a 100644
--- a/drivers/md/bcache/movinggc.c
+++ b/drivers/md/bcache/movinggc.c
@@ -163,7 +163,7 @@ static void read_movi
Is btrfs on top of bcache stable now?
On Mon, Apr 20, 2015 at 10:27:05AM +0000, Hugo Mills wrote: See the first issue here: https://btrfs.wiki.kernel.org/index.php/Gotchas Hi Hugo, looking at the page again, I see "bcache + btrfs does not seem to be stable yet" linking to a thread more than 2 years old and btrfs kernels that wouldn't be stable without bcache anyway. I've seen others mention they switched to bcache recently and not seen new "it's broken" reports. So, is it ok 1) to assume bcache and btrfs play ok together now? 2) remove the warning from that gotchas page? Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: Is btrfs on top of bcache stable now?
I'm one of those that used to have problems with btrfs on top of bcache. After some corruptions, I gave up this setup. Recently (from February, I think) I gave it another shot, and I have had no problems since. I use bcache in writeback mode, with very good performance. I'm feeling btrfs very stable in this setup. Best Regards, Fabio Pfeifer 2015-04-20 11:49 GMT-03:00 Marc MERLIN m...@merlins.org: [Marc's message quoted in full; trimmed, see above]
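[For reference, and not from the thread itself: the writeback mode Fabio mentions is a per-backing-device sysfs setting, so switching an existing bcache device looks something like this; the device name is an example:]

# echo writeback > /sys/block/bcache0/bcache/cache_mode
# cat /sys/block/bcache0/bcache/state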
Re: Recovering BTRFS from bcache failure.
On Tue, Apr 7, 2015 at 11:40 PM, Dan Merillat dan.meril...@gmail.com wrote: Bcache failures are nasty, because they leave a mix of old and new data on the disk. In this case, there was very little dirty data, but of course the tree roots were dirty and out-of-sync. [full dmesg and btrfs-progs output quoted; trimmed, it is reproduced in the original post at the end of this thread] CCing some more people on this one, while this filesystem isn't important I'd like to know that restore from backup isn't the only option for BTRFS corruption. All of the tools simply throw up their hands and bail when confronted with this filesystem, even btrfs-image.
Re: Recovering BTRFS from bcache failure.
It's a known bug with bcache and enabling discard: it was discarding sections containing data it wanted. After a reboot bcache refused to accept the cache data, and of course it was dirty because I'm frankly too stupid to breathe sometimes. So yes, it's a bcache issue, but that's unresolvable. I'm trying to rescue the btrfs data that it trashed. On Wed, Apr 8, 2015 at 2:27 PM, Cameron Berkenpas c...@neo-zeon.de wrote: Hello, I had some luck in the past with btrfs restore using the -r option. I don't recall how I determined the roots... Maybe I tried random numbers? I was able to recover nearly all of my data from a bcache related crash from over a year ago. What kind of bcache failure did you see? I've been doing some testing recently and ran into 2 bcache failures. With both of these failures, I had a "bad btree header at bucket" error message (which is entirely different from the crash I had over a year back). I'm currently trying a different SSD to see if that alleviates the issue. The error makes me think that it's a bcache specific issue that's unrelated to btrfs or possibly (in my case) an issue with the previous SSD. Did you encounter this same error? With my 2 most recent crashes, I didn't try to recover very hard (or even try "btrfs recover" at all) as I've been taking daily backups. I did try btrfsck, and not only would it fail, it would segfault. -Cameron [earlier messages in the thread quoted in full; trimmed, see the original post below]
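[As an aside, not from the thread: the discard behaviour Dan describes is a per-cache-device switch in sysfs, and bcache defaults to it being off. On an affected setup it can be inspected and turned off with something like the following; the cache-set UUID path is an example:]

# cat /sys/fs/bcache/<cache-set-uuid>/cache0/discard
# echo 0 > /sys/fs/bcache/<cache-set-uuid>/cache0/discard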
Re: Recovering BTRFS from bcache failure.
Sorry, I pressed send before I finished my thoughts. btrfs restore gets nowhere with any options. btrfs-recover says the superblocks are fine, and chunk recover does nothing after a few hours of reading. Everything else bails out with the errors I listed above. On Wed, Apr 8, 2015 at 2:36 PM, Dan Merillat dan.meril...@gmail.com wrote: [the two preceding messages in this thread quoted in full; trimmed, see above]
Re: Recovering BTRFS from bcache failure.
Dan Merillat dan.meril...@gmail.com schrieb: [Dan's report quoted in full; trimmed, see the original post below] There's always the last resort (LAST RESORT!) btrfs-zero-log. It may destroy some of your data, however, and can make things even worse if other repairs could've helped before. So here's some pointers:
* btrfs-find-root: find a working tree-root (no idea how to set it, tho)
* mount -o recovery: mount in recovery mode (tries to mount with a working superblock backup)
* btrfs restore: command to get files off a broken fs (at least what is readable, no guarantees for sane file contents, tho, I guess)
It's a bit hard to follow the discussion here because posts from Cameron are missing in my NNTP reader (I'm using the gmane gateway to read here). So I'm answering
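[Purely as an illustration of the pointers above, not from the thread; device, mount point, and the bytenr are placeholders. btrfs-find-root proposes candidate tree roots, a recovery mount can be attempted first, and btrfs restore's -t option takes a tree root bytenr to pull files off with:]

# btrfs-find-root /dev/bcache0
# mount -o recovery /dev/bcache0 /mnt
# btrfs restore -t <bytenr> /dev/bcache0 /mnt/recovered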
Re: Recovering BTRFS from bcache failure.
Any ideas on where to start with this? I did flush the cache out to disk before I made changes to the bcache configuration, so there shouldn't be anything completely missing, just some bits of stale metadata. If I can get the tools to take the closest match and run with it, it would probably recover nearly everything. At worst, is there a way to scan the metadata blocks and rebuild from found extent-trees? On Tue, Apr 7, 2015 at 11:40 PM, Dan Merillat dan.meril...@gmail.com wrote: [the original report quoted in full; trimmed, see the original post below]
Recovering BTRFS from bcache failure.
Bcache failures are nasty, because they leave a mix of old and new data on the disk. In this case, there was very little dirty data, but of course the tree roots were dirty and out-of-sync.

fileserver:/usr/src/btrfs-progs# ./btrfs --version
Btrfs v3.18.2
kernel version 3.18

[  572.573566] BTRFS info (device bcache0): enabling auto recovery
[  572.573619] BTRFS info (device bcache0): disk space caching is enabled
[  574.266055] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.276952] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.277008] BTRFS: failed to read tree root on bcache0
[  574.277187] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.277356] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.277398] BTRFS: failed to read tree root on bcache0
[  574.285955] BTRFS (device bcache0): parent transid verify failed on 7567965720576 wanted 613689 found 613694
[  574.298741] BTRFS (device bcache0): parent transid verify failed on 7567965720576 wanted 613689 found 610499
[  574.298804] BTRFS: failed to read tree root on bcache0
[  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
[  575.111495] BTRFS (device bcache0): parent transid verify failed on 7567954464768 wanted 613688 found 613685
[  575.111559] BTRFS: failed to read tree root on bcache0
[  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
[  575.131803] BTRFS (device bcache0): parent transid verify failed on 7567954214912 wanted 613687 found 613680
[  575.131866] BTRFS: failed to read tree root on bcache0
[  575.180101] BTRFS: open_ctree failed

all the btrfs tools throw up their hands with similar errors:

fileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
[the same transid failures and errors repeat for both backup supers]

fileserver:/usr/src/btrfs-progs# ./btrfsck --repair /dev/bcache0 --init-extent-tree
enabling repair mode
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Couldn't open file system

Annoyingly:

# ./btrfs-image -c9 -t4 -s -w /dev/bcache0 /tmp/test.out
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Open ctree failed
create failed (Success)

So I can't even send an image for people to look at.
Suggestion on reducing short kernel hangs from my btrfs filesystems: bcache?
I have a server which runs zoneminder (video recording which is CPU and disk IO intensive) while also doing a bunch of I/O over serial ports. I have a dual core Intel(R) Core(TM) i3-2100T CPU @ 2.50GHz (4 virtual CPUs in /proc/cpuinfo). It's pretty clear that when zoneminder is doing more work, my programs that talk to serial ports start failing due to delays on the kernel side and desynchronization, causing serial port protocol errors (I'm using USB serial adapters, and use 12 of them). I'm pretty sure it's because of delays in the kernel more than user space, but can't prove that easily. I have a preempt kernel, kernel 3.16.3:

CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_DEBUG_PREEMPT=y

From what I can tell, things did get worse after I upgraded from ext4 to btrfs (not counting times where I resync the software raid5 underneath or run a btrfs scrub). I may try to see if PREEMPT_VOLUNTARY might work better, but I'm thinking putting an SSD in front of that mdadm RAID5 array will help by relieving the IO load and hopefully giving more time for the CPU to handle serial port requests. I'm actually not sure if my issue is btrfs interrupting serial port connections due to PREEMPT, or if serial port connections aren't being serviced quickly enough because the kernel is busy with btrfs and PREEMPT hasn't kicked in yet. From reading the list, bcache may work with btrfs, but before I try that, I was curious if there are other or better ways to use an SSD to make btrfs less impacting on my server? Thanks, Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
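[Not from the thread, but one cheap thing to try before adding hardware: lowering the I/O priority of zoneminder's processes so they yield to everything else. This only has a real effect under the CFQ I/O scheduler, and "zmc" (ZoneMinder's capture daemon) is an assumed process name here:]

# put the oldest running capture process into the idle I/O class
ionice -c3 -p "$(pgrep -o zmc)"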
Re: btrfs on bcache
Hi, has this issue been resolved? I would like to use the bcache + btrfs combo. Thanks
Re: btrfs on bcache
After completely losing my filesystem twice because of this bug, I gave up using btrfs on top of bcache (also writeback). In my case, I used to have some subvolumes and some snapshots of these subvolumes, but not many of them. The btrfs mantra "backup, backup and backup" saved me. Best regards, Fábio Pfeifer 2014-07-30 20:01 GMT-03:00 Larkin Lowrey llow...@nuclearwinter.com: [Larkin's message quoted in full; trimmed, see below]
btrfs on bcache
Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018, does this bug still exist? Kernel 3.14 B: 2x HDD 1 TB C: 1x SSD 256 GB # make-bcache -B /dev/sda /dev/sdb -C /dev/sdc --cache_replacement_policy=lru # mkfs.btrfs -d raid1 -m raid1 -L BTRFS_RAID /dev/bcache0 /dev/bcache1 I still have no "incomplete page write" messages in dmesg | grep btrfs, and the checksums of some manually reviewed files are okay. Who has more experience with this? Thanks, - dp
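[Two checks that go a bit further than grepping dmesg and hand-verifying checksums; illustrative, not part of dp's mail, and the mount point is an example:]

# read every block and verify it against its checksums, in the foreground
btrfs scrub start -B /mnt/btrfs_raid
# show per-device I/O and checksum error counters
btrfs device stats /mnt/btrfs_raid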
Re: btrfs on bcache
dptrash posted on Thu, 31 Jul 2014 17:35:44 +0200 as excerpted: [the question quoted in full; trimmed, see above] See the reply (not mine) to your earlier post of the question: http://permalink.gmane.org/gmane.linux.kernel.bcache.devel/2602 -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Re: btrfs on bcache
I've been running two backup servers, with 25T and 20T of data, using btrfs on bcache (writeback) for about 7 months. I periodically run btrfs scrubs and backup verifies (SHA1 hashes) and have never had a corruption issue. My use of btrfs is simple, though, with no subvolumes and no btrfs level raid. My bcache backing devices are LVM volumes that span multiple md raid6 arrays. So, either the bug has been fixed or my configuration is not susceptible. I'm running kernel 3.15.5-200.fc20.x86_64. --Larkin On 7/30/2014 5:04 PM, dptr...@arcor.de wrote: [the question quoted in full; trimmed, see above]
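[The "backup verifies (SHA1 hashes)" Larkin mentions can be done with stock tools; a minimal sketch, with paths as examples: build a manifest on the source and check it against the backup copy:]

# on the source
find /data -type f -print0 | xargs -0 sha1sum > /tmp/manifest.sha1
# on the backup, from the corresponding directory
sha1sum -c /tmp/manifest.sha1 | grep -v ': OK$'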
Re: btrfs on bcache
On 2014-04-30 14:16, Felix Homann wrote: [the question quoted in full; trimmed, see the original post below] In all practicality, I don't think anyone who frequents the list knows. I do know that there are a number of people (myself included) who avoid bcache in general because of having issues with seemingly random kernel OOPSes when it is linked in (either as a module or compiled in), even when it isn't being used. My advice would be to just test it with some non-essential data (maybe set up a virtual machine?).
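[A throwaway test does not even need a virtual machine; a sketch using sparse files on loop devices, with sizes and paths being arbitrary:]

truncate -s 10G /var/tmp/backing.img
truncate -s 1G /var/tmp/cache.img
backing=$(losetup -f --show /var/tmp/backing.img)
cache=$(losetup -f --show /var/tmp/cache.img)
make-bcache -B "$backing" -C "$cache"
mkfs.btrfs /dev/bcache0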
btrfs on bcache
Hi, a couple of months ago there was some discussion about issues when using btrfs on bcache: http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018 From looking at the mailing list archives I cannot tell whether or not this issue has been resolved in current kernels from either bcache's or btrfs' side. Can anyone tell me what's the current state of this issue? Should it be safe to use btrfs on bcache by now? Thanks and kind regards, Felix
Re: btrfs on bcache
On Mon, 2014-01-06 at 15:37 -0800, Kent Overstreet wrote: On Fri, Dec 20, 2013 at 03:46:30PM +0000, Chris Mason wrote: On Fri, 2013-12-20 at 10:42 -0200, Fábio Pfeifer wrote: Hello, I put the WARN_ON(1); after the printk lines (incomplete page read and incomplete page write) in extent_io.c. Here some call traces:

[ 19.509497] incomplete page read in btrfs with offset 2560 and length 1536
[ 19.509500] [ cut here ]
[ 19.509528] WARNING: CPU: 2 PID: 220 at fs/btrfs/extent_io.c:2441 end_bio_extent_readpage+0x788/0xc20 [btrfs]()
[ 19.509530] Modules linked in: cdc_acm fuse iTCO_wdt iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1 ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache
[ 19.509578] CPU: 2 PID: 220 Comm: btrfs-endio-met Tainted: P W O 3.12.5-1-ARCH #1
[ 19.509580] Hardware name: System manufacturer System Product Name/P5WDG2 WS Pro, BIOS 090503/06/2008
[ 19.509581] 0009 880231a63cb0 814ee37b
[ 19.509585] 880231a63ce8 81062bcd ea00085eaec0
[ 19.509587] 8802320cc9c0 880233b0e000 880231a63cf8
[ 19.509590] Call Trace:
[ 19.509596] [814ee37b] dump_stack+0x54/0x8d
[ 19.509601] [81062bcd] warn_slowpath_common+0x7d/0xa0
[ 19.509603] [81062caa] warn_slowpath_null+0x1a/0x20
[ 19.509614] [a00b7ba8] end_bio_extent_readpage+0x788/0xc20 [btrfs]

This should mean that bcache is either failing to read some blocks properly or is fiddling with the bv_len/bv_offset fields. Could someone from bcache comment?

Oh man, I found this and then threw up my hands in despair. Bcache isn't doing anything with the bv_len/bv_offset fields; it may clone the biovec so it can retry a bio on error, if the biovecs weren't all whole pages, otherwise it just passes the biovec down with the next bio to the underlying cache/backing device.

What btrfs appears to be doing though - I couldn't believe that code actually _worked_. Jens, please jump in here, but AFAIK bv_len/bv_offset are in practice undefined after a bio's completed; they might have been updated if the driver was using blk_update_request, but many drivers that just process the entire bio all at once just won't touch those fields - and that includes anything that clones the bio (md/dm). This is probably relevant to immutable biovecs here...

Ok, I looked again at the relevant btrfs code, I guess I can see how this printk isn't normally triggered. But Chris, _what on earth_ is btrfs trying to check for here? And why is it using bv_offset and bv_len further down in end_bio_extent_readpage()?

After the IO is done, we're recording the specific logical byte range that covered the IO. In practice it's always the full page, we can switch to just trusting PAGE_CACHE_SIZE. -chris
Re: btrfs on bcache
On Wed, Jan 08, 2014 at 07:35:32PM +0000, Chris Mason wrote: On Mon, 2014-01-06 at 15:37 -0800, Kent Overstreet wrote: Ok, I looked again at the relevant btrfs code, I guess I can see how this printk isn't normally triggered. But Chris, _what on earth_ is btrfs trying to check for here? And why is it using bv_offset and bv_len further down in end_bio_extent_readpage()? After the IO is done, we're recording the specific logical byte range that covered the IO. In practice it's always the full page, we can switch to just trusting PAGE_CACHE_SIZE. Yeah, the code already assumes it was doing PAGE_CACHE_SIZE reads; what you're effectively checking is that the driver did the bvec all at once, and that it didn't process half a bvec, update it, then process the rest - which is a completely fine thing to do. So for now - yeah, the correct thing to do is to just ignore bv_offset/bv_len and go by PAGE_CACHE_SIZE. But - after immutable biovecs is in, _then_ you'll be able to depend on bv_offset/bv_len remaining unchanged (and you can get rid of your dependency on PAGE_CACHE_SIZE bvecs).
Re: Migrate to bcache: A few questions
Austin S Hemmelgarn posted on Wed, 01 Jan 2014 15:12:40 -0500 as excerpted: [Austin's layout suggestion and portage-tree note quoted in full; trimmed, see below] Interesting observation. I had not seen it here (with the gentoo tree and overlays on btrfs), but that's very likely because all my btrfs are on SSD, as I upgraded to both at the same time, because my previous default filesystem choice, reiserfs, isn't well suited to SSD due to excessive writing due to the journaling. I do know slow syncs and portage dep-calculations were one of the reasons I switched to SSD (and thus btrfs), however. That was getting pretty painful on spinning rust, at least with reiserfs. And I imagine btrfs on single-device spinning rust would if anything be worse, at least for syncs, due to the default dup metadata, meaning at least three writes (and three seeks) for each file, once for the data, twice for the metadata. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Re: Migrate to bcache: A few questions
Austin S Hemmelgarn posted on Mon, 30 Dec 2013 11:02:21 -0500 as excerpted: [Austin's experience report quoted; trimmed, see below] Basically what I posted, but now with added real experience! (TM) =:^) [Austin's suggested disk layout quoted; trimmed, see below] Again, very similar to my own recommendation. Nice to see others saying the same thing. =:^) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Re: Migrate to bcache: A few questions
On 12/30/2013 11:02 AM, Austin S Hemmelgarn wrote: [the suggested disk layout quoted in full; trimmed, see below] On this specific note, I would actually suggest against putting the portage tree on btrfs, it makes syncing go ridiculously slow, and it also seems to slow down emerge as well.
Re: Migrate to bcache: A few questions
On Mon, Dec 30, 2013 at 02:22:55AM +0100, Kai Krakow wrote: These thoughts are actually quite interesting. So you are saying that data may not be fully written to SSD although the kernel thinks so? This is That, and worse. Incidentally, I have just posted on my G+ about this: https://plus.google.com/106981743284611658289/posts/Us8yjK9SPs6 which is mostly links to http://lkcl.net/reports/ssd_analysis.html https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault After you read those, you'll never think twice about SSDs and data loss anymore :-/ (I kind of found that out myself over time too, but these have much more data than I got myself empirically on a couple of SSDs) Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: Migrate to bcache: A few questions
On 12/29/2013 04:11 PM, Kai Krakow wrote: Hello list! I'm planning to buy a small SSD (around 60GB) and use it for bcache in front of my 3x 1TB HDD btrfs setup (mraid1+draid0) using write-back caching. Btrfs is my root device, thus the system must be able to boot from bcache using init ramdisk. My /boot is a separate filesystem outside of btrfs and will be outside of bcache. I am using Gentoo as my system. I have a few questions:

* How stable is it? I've read about some csum errors lately...
* I want to migrate my current storage to bcache without replaying a backup. Is it possible?
* Did others already use it? What is the perceived performance for desktop workloads in comparison to not using bcache?
* How well does bcache handle power outages? Btrfs does handle them very well since many months.
* How well does it play with dracut as initrd? Is it as simple as telling it the new device nodes or is there something complicated to configure?
* How does bcache handle a failing SSD when it starts to wear out in a few years?
* Is it worth waiting for hot-relocation support in btrfs to natively use a SSD as cache?
* Would you recommend going with a bigger/smaller SSD? I'm planning to use only 75% of it for bcache so wear-leveling can work better, maybe use another part of it for hibernation (suspend to disk).

I've actually tried a similar configuration myself a couple of times (also using Gentoo, in fact), and I can tell you from experience that unless things have changed greatly since kernel 3.12.1, it really isn't worth the headaches. Setting it up on an already installed system is a serious pain because the backing device has to be reformatted with a bcache super-block. In addition, every kernel that I have tried that had bcache compiled in or loaded as a module had issues: I would see a kernel OOPS on average once a day from the bcache code, usually followed shortly by a panic from some other unrelated subsystem. I didn't get any actual data corruption, but I wasn't using btrfs at the time for any of my filesystems. As an alternative to using bcache, you might try something similar to the following:

64G SSD with /boot, /, and /usr
Other HDD with /var, /usr/portage, /usr/src, and /home
tmpfs or ramdisk for /tmp and /var/tmp

This is essentially what I use now, and I have found that it significantly improves system performance.
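[Not from the thread: expressed as an /etc/fstab sketch, the split Austin describes would look roughly like this; device names, filesystem types, and sizes are all illustrative:]

/dev/sda1  /boot     ext2   noauto,noatime        1 2
/dev/sda2  /         ext4   noatime               0 1
/dev/sda3  /usr      ext4   noatime               0 2
/dev/sdb1  /var      ext4   noatime               0 2
/dev/sdb2  /home     ext4   noatime               0 2
tmpfs      /tmp      tmpfs  nosuid,nodev,size=4G  0 0
tmpfs      /var/tmp  tmpfs  nosuid,nodev,size=8G  0 0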
Re: Migrate to bcache: A few questions
Marc MERLIN m...@merlins.org schrieb:
> On Mon, Dec 30, 2013 at 02:22:55AM +0100, Kai Krakow wrote:
>> These thoughts are actually quite interesting. So you are saying that
>> data may not be fully written to SSD although the kernel thinks so?
>> This is
>
> That, and worse. Incidentally, I have just posted on my G+ about this:
> https://plus.google.com/106981743284611658289/posts/Us8yjK9SPs6
> which is mostly links to
> http://lkcl.net/reports/ssd_analysis.html
> https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault
>
> After you read those, you'll think twice about SSDs and data loss from
> now on :-/
> (I kind of found that out myself over time too, but these have much
> more data than I got myself empirically on a couple of SSDs)

The bad thing here is: even battery-backed RAID controllers won't help you here. I start to understand why I still don't trust this new technology entirely.

Thanks,
Kai
Re: Migrate to bcache: A few questions
Duncan 1i5t5.dun...@cox.net schrieb:

[ spoiler: tldr ;-) ]

>> * How stable is it? I've read about some csum errors lately...
>
> FWIW, both bcache and btrfs are new and still developing technology.
> While I'm using btrfs here, I have tested usable (which for root either
> means directly bootable, or that you have tested booting to a recovery
> image and restoring from there; I do the former here) backups, as
> STRONGLY recommended for btrfs in its current state, but haven't had to
> use them. And I considered bcache previously and might otherwise be
> using it, but at least personally, I'm not willing to try BOTH of them
> at once, since neither one is mature yet, and if there are problems, as
> there very well might be, I'd have the additional issue of figuring out
> which one was the problem, and I'm personally not prepared to deal with
> that.

I mostly trust btrfs by now. Don't get me wrong: I still have my nightly backup job syncing the complete system to an external drive - nothing beats a good backup. But btrfs has reliably survived multiple power losses, kernel panics/freezes, unreliable USB connections, ... It looks very stable from that view. Yes, it may have bugs that could introduce errors fatal to the filesystem structure. But generally, under usual workloads, it has proven stable for me. At least for desktop workloads.

> Instead, at this point I'd recommend choosing /either/ bcache /or/
> btrfs, and using bcache with a more mature filesystem like ext4 or
> (what I used for years previous and still use for spinning rust)
> reiserfs.

I used reiserfs for several years, a long time ago. But it scales very poorly for parallel/threaded workloads, which is a show stopper for server use. Still, it always survived even the worst failure scenarios (like the SCSI bus going offline for some RAID members), and the tools distributed with it were able to recover all data even when the FS was damaged beyond the usual measures you would try when it no longer mounts. I had been on Ext3 before, and more than once a simple power loss during high server workload destroyed the filesystem beyond repair, with fsck only making it worse. Since reiserfs did not scale well and the ext* filesystems had annoyed me more than once, we decided to go with XFS. While it tends to wipe some data after a power loss and leave you with zero-filled files, it has proven extremely reliable even in the situations mentioned above, like a dying SCSI bus. Not to the extent reiserfs managed, but still very satisfying. The big plus: it scales extremely well with parallel workloads and can be optimized for the stripe configuration of the underlying RAID layer. So I made it my default filesystem for the desktop, too - with the above-mentioned annoying habit of zeroing out recently touched files when the system crashes. But well, we all have proven backups, right? Yep, I also learned that lesson... *sigh*

But btrfs, first announced while I was already jealously looking at ZFS, seemed to be the FS of my choice, giving me flexible RAID setups, snapshots... I'm quite happy with it, although it feels slow sometimes. I simply threw more RAM at it - now it is okay.

> And as I said, keep your backups as current as you're willing to deal
> with losing what's not backed up, and tested usable and (for root)
> either bootable or restorable from alternate boot, because while at
> least btrfs is /reasonably/ stable for /ordinary/ daily use, there
> remain corner-cases and you never know when your case is going to BE a
> corner-case!

I've got a small rescue system I can boot which has btrfs-tools and a recent kernel, so I can flexibly repair, restore, or do whatever I want with my backup. The backup itself is not bootable (although it probably could be, if I changed some configuration files).

>> * I want to migrate my current storage to bcache without replaying a
>> backup. Is it possible?
>
> Since I've not actually used bcache, I won't try to answer some of
> these, but will answer based on what I've seen on the list where I
> can... I don't know on this one.

I remember someone created some python scripts to make it possible - wrt btrfs especially. Can't remember the link; maybe I'm able to dig it up. But at least I read it as: there's no migration path provided by bcache itself. I had hoped otherwise...

>> * Did others already use it? What is the perceived performance for
>> desktop workloads in comparison to not using bcache?
>
> Others are indeed already using it. I've seen some btrfs/bcache
> problems reported on this list, but as mentioned above, when both are
> in use that means figuring out which is the problem, and at least from
> the btrfs side I've not seen a lot of resolution in that regard. From
> here it /looks/ like that's simply being punted at this time, as there
> are still more easily traceable problems to work on first, without the
> additional bcache variable. But it's quite possible
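The python scripts Kai half-remembers are plausibly g2p's "blocks" conversion tool (https://github.com/g2p/blocks), which rewrites a partition in place so a bcache superblock sits in front of the existing filesystem. Treat the following one-liner as a hypothetical sketch to be verified against that project's documentation before any use:

  # in-place conversion via the third-party 'blocks' tool (not part of bcache-tools)
  blocks to-bcache /dev/sdb1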
Migrate to bcache: A few questions
Hello list!

I'm planning to buy a small SSD (around 60GB) and use it for bcache in front of my 3x 1TB HDD btrfs setup (mraid1+draid0) using write-back caching. Btrfs is my root device, thus the system must be able to boot from bcache using an init ramdisk. My /boot is a separate filesystem outside of btrfs and will be outside of bcache. I am using Gentoo as my system.

I have a few questions:

* How stable is it? I've read about some csum errors lately...
* I want to migrate my current storage to bcache without replaying a backup. Is it possible?
* Did others already use it? What is the perceived performance for desktop workloads in comparison to not using bcache?
* How well does bcache handle power outages? Btrfs has handled them very well for many months now.
* How well does it play with dracut as initrd? Is it as simple as telling it the new device nodes or is there something complicated to configure?
* How does bcache handle a failing SSD when it starts to wear out in a few years?
* Is it worth waiting for hot-relocation support in btrfs to natively use an SSD as cache?
* Would you recommend going with a bigger/smaller SSD? I'm planning to use only 75% of it for bcache so wear-leveling can work better, and maybe use another part of it for hibernation (suspend to disk).

Regards,
Kai
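For context on these questions, the standard provisioning steps for a fresh bcache setup look like this (a sketch using bcache-tools; device names and the UUID are placeholders). The fact that make-bcache reformats the backing device is exactly what makes the in-place migration question above the hard one:

  # format backing and cache devices (DESTROYS existing data on them)
  make-bcache -B /dev/sdb
  make-bcache -C /dev/sda2
  # attach the cache set to the new bcache device, then put btrfs on top
  echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
  mkfs.btrfs /dev/bcache0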
Re: Migrate to bcache: A few questions
On Dec 29, 2013, at 2:11 PM, Kai Krakow hurikhan77+bt...@gmail.com wrote:
> * How stable is it? I've read about some csum errors lately…

Seems like bcache devs are still looking into the recent btrfs csum issues.

> * I want to migrate my current storage to bcache without replaying a
>   backup. Is it possible?
> * Did others already use it? What is the perceived performance for
>   desktop workloads in comparison to not using bcache?
> * How well does bcache handle power outages? Btrfs does handle them
>   very well since many months.
> * How well does it play with dracut as initrd? Is it as simple as
>   telling it the new device nodes or is there something complicated to
>   configure?
> * How does bcache handle a failing SSD when it starts to wear out in a
>   few years?

I think most of these questions are better suited for the bcache list.

I think there are still many uncertainties about the behavior of SSDs during power failures when they aren't explicitly designed with power-failure protection in mind. At best I'd hope for a rollback involving data loss, but hopefully not a corrupt file system. I'd rather lose the last minute of data supposedly written to the drive than have to do a full restore from backup.

> * Is it worth waiting for hot-relocation support in btrfs to natively
>   use a SSD as cache?

I haven't read anything about it. I don't see it listed in the project ideas.

> * Would you recommend going with a bigger/smaller SSD? I'm planning to
>   use only 75% of it for bcache so wear-leveling can work better, maybe
>   use another part of it for hibernation (suspend to disk).

I think that depends greatly on workload. If you're writing or reading a lot of disparate files, or doing a lot of small random file writes (mail server), I'd go bigger. By default sequential IO isn't cached, so I think you can get a big boost in responsiveness with a relatively small bcache size.

Chris Murphy
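The "sequential IO isn't cached" behavior Chris mentions is a tunable heuristic rather than a fixed rule; a sketch of inspecting and adjusting it through the standard bcache sysfs interface (values illustrative):

  # show the sequential bypass threshold (bcache's default is 4.0M)
  cat /sys/block/bcache0/bcache/sequential_cutoff
  # lower it, or set 0 to cache everything regardless of access pattern
  echo 1M > /sys/block/bcache0/bcache/sequential_cutoff
  echo 0 > /sys/block/bcache0/bcache/sequential_cutoff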
Re: Migrate to bcache: A few questions
Chris Murphy li...@colorremedies.com schrieb:
> I think most of these questions are better suited for the bcache list.

Ah yes, you are right. I will repost the non-btrfs related questions to the bcache list. But actually I am most interested in using bcache together with btrfs, so getting a general picture of its current state in this combination would be nice - and so these questions may be partially appropriate here.

> I think there are still many uncertainties about the behavior of SSDs
> during power failures when they aren't explicitly designed with power
> failure protection in mind. At best I'd hope for a rollback involving
> data loss, but hopefully not a corrupt file system. I'd rather lose the
> last minute of data supposedly written to the drive, than have to do a
> full restore from backup.

These thoughts are actually quite interesting. So you are saying that data may not be fully written to the SSD although the kernel thinks so? This is probably very dangerous. The bcache module could not ensure coherence between its backing devices and its own contents - data loss will occur and probably destroy important filesystem structures.

I understand your words as: data may only be partially written. This, of course, may happen to HDDs as well. But usually a filesystem works with transactions, so the last incomplete transaction can simply be thrown away. I hope bcache implements the same architecture. But what does it mean for the stacked write-back architecture? As I understand it, bcache may use write-through for sequential writes but write-back for random writes. In this case, part of the data may have hit the backing device while other data exists only in the bcache. If that last transaction is not closed due to power loss, and then thrown away, we have part of the transaction already written to the backing device that the filesystem does not know of after resume.

I'd appreciate some thoughts about this, but the topic is probably also best moved over to the bcache list.

Thanks,
Kai
Re: Migrate to bcache: A few questions
On Dec 29, 2013, at 6:22 PM, Kai Krakow hurikhan77+bt...@gmail.com wrote:
> So you are saying that data may not be fully written to SSD although
> the kernel thinks so?

Drives shouldn't lie when asked to flush to disk, but they do. This older article at LWN is a decent primer on the subject of write barriers:
http://lwn.net/Articles/283161/

> This is probably very dangerous. The bcache module could not ensure
> coherence between its backing devices and its own contents - and data
> loss will occur and probably destroy important file system structures.

I don't know the details; there's more detail on lkml.org and the bcache lists. My impression is that, short of bugs, it should be much safer than you describe. It's not like a linear/concat md or LVM device-failure scenario. There's good info in the bcache.h file:
http://lxr.free-electrons.com/source/drivers/md/bcache/bcache.h

If anything, once the kinks are worked out, under heavy random write IO I'd expect bcache to improve the likelihood that data isn't lost: the faster speed of the SSD means we get a faster commit of the data to stable media. Also, bcache assumes the cache is always dirty on startup, no matter whether the shutdown was clean or dirty, so the code is explicitly designed to resolve the state of the cache relative to the backing device. It's actually pretty fascinating work.

It may not be required, but I'd expect we'd want the write cache on the backing device disabled. It should still honor write barriers, but it kinda seems unnecessary and riskier to have it enabled (which is the default with consumer drives).

> As I understand, bcache may use write-through for sequential writes,
> but write-back for random writes. In this case, part of the data may
> have hit the backing device, other data does only exist in the bcache.
> If that last transaction is not closed due to power-loss, and then
> thrown away, we have part of the transaction already written to the
> backing device that the filesystem does not know of after resume.

In the write-through case we should be no worse off than the bare drive in a power loss. In the write-back case the SSD should have committed more data than the HDD could have in the same situation. I don't understand the details of how partially successful writes to the backing media are handled when the system comes back up. Since bcache is also COW, SSD blocks aren't reused until data is committed to the backing device.

Chris Murphy
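Both of the knobs discussed here map to one-liners; a sketch with hypothetical device names (the sysfs paths are the standard bcache layout):

  # disable the volatile write cache on the backing HDD, as suggested above
  hdparm -W0 /dev/sdc
  # inspect and switch bcache's caching policy at runtime
  cat /sys/block/bcache0/bcache/cache_mode
  echo writethrough > /sys/block/bcache0/bcache/cache_mode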
Re: Migrate to bcache: A few questions
Kai Krakow posted on Sun, 29 Dec 2013 22:11:16 +0100 as excerpted:

> Hello list!
>
> I'm planning to buy a small SSD (around 60GB) and use it for bcache in
> front of my 3x 1TB HDD btrfs setup (mraid1+draid0) using write-back
> caching. Btrfs is my root device, thus the system must be able to boot
> from bcache using init ramdisk. My /boot is a separate filesystem
> outside of btrfs and will be outside of bcache. I am using Gentoo as my
> system.

Gentooer here too. =:^)

> I have a few questions:
>
> * How stable is it? I've read about some csum errors lately...

FWIW, both bcache and btrfs are new and still developing technology. While I'm using btrfs here, I have tested usable (which for root either means directly bootable, or that you have tested booting to a recovery image and restoring from there; I do the former here) backups, as STRONGLY recommended for btrfs in its current state, but haven't had to use them.

And I considered bcache previously and might otherwise be using it, but at least personally, I'm not willing to try BOTH of them at once, since neither one is mature yet, and if there are problems, as there very well might be, I'd have the additional issue of figuring out which one was the problem, and I'm personally not prepared to deal with that.

Instead, at this point I'd recommend choosing /either/ bcache /or/ btrfs, and using bcache with a more mature filesystem like ext4 or (what I used for years previous and still use for spinning rust) reiserfs.

And as I said, keep your backups as current as you're willing to deal with losing what's not backed up, and tested usable and (for root) either bootable or restorable from alternate boot, because while at least btrfs is /reasonably/ stable for /ordinary/ daily use, there remain corner-cases and you never know when your case is going to BE a corner-case!

> * I want to migrate my current storage to bcache without replaying a
>   backup. Is it possible?

Since I've not actually used bcache, I won't try to answer some of these, but will answer based on what I've seen on the list where I can... I don't know on this one.

> * Did others already use it? What is the perceived performance for
>   desktop workloads in comparison to not using bcache?

Others are indeed already using it. I've seen some btrfs/bcache problems reported on this list, but as mentioned above, when both are in use that means figuring out which is the problem, and at least from the btrfs side I've not seen a lot of resolution in that regard. From here it /looks/ like that's simply being punted at this time, as there are still more easily traceable problems to work on first, without the additional bcache variable. But it's quite possible the bcache list is actively tackling btrfs/bcache combination problems, as I'm not subscribed there.

So I can't answer the desktop performance comparison question directly, but given that I /am/ running btrfs on SSD, I /can/ say I'm quite happy with that. =:^)

Keep in mind... We're talking storage cache here. Given the cost of memory and common system configurations these days, 4-16 gig of memory on a desktop isn't unusual or cost prohibitive, and a common desktop working set should well fit. I suspect my desktop setup, 16 gigs memory backing a 6-core AMD fx6100 (bulldozer-1) @ 3.6 GHz, is probably a bit toward the high side even for a gentooer, but not inordinately so.

Based on my usage... Typical app memory usage runs 1-2 GiB (that's with KDE 4.12.49 from the gentoo/kde overlay, but USE=-semantic-desktop, etc). Buffer memory runs a few MiB but isn't normally significant, so it can fold into that same 1-2 GiB too. That leaves a full 14 GiB for cache. But at least with /my/ usage, normal non-update cache memory usage tends to stay below ~6 GiB too, so total apps/buffer/cache memory usage tends to be below 8 GiB as well.

When I'm doing multi-job builds or working with big media files, I'll sometimes go above 8 gig usage, and that occasional cache-spill was why I upgraded to 16 gig. But in practice, 10 gig would take care of that most of the time, and were it not for the accident of powers-of-two meaning 16 gig is the notch above 8 gig, 10 or 12 gig would be plenty. Truth be told, I so seldom use that last 4 gig that it's almost embarrassing.*

* Tho if I ran multi-GiB VMs, that'd use up that extra memory real fast! But while that /is/ becoming more common, I'm not exactly sure I'd classify 4 gigs plus of VM usage as desktop usage just yet. Workstation, yes, and definitely server, but not really desktop.

All that as background to this...

* Cache works only after first access. If you only access something occasionally, it may not be worth caching at all.

* Similarly, if access isn't time critical - think of playing a huge video file, where only a few meg in memory at once is plenty, and where storage access is several times faster than play-speed - cache isn't particularly useful.

* Bcache
Re: btrfs on bcache
(resend in text only)

Some more information about this issue. I installed my system last November (Arch x86_64) with kernel 3.11. At that time I didn't see any csum errors or incomplete page read errors. Some time later these errors started to show up. I don't know exactly if it was in the 3.11 -> 3.12 upgrade or somewhere in the 3.12 cycle. I've been using bcache in writeback mode from the beginning.

I did some more testing:
- tried bcache in writethrough, writearound and none modes;
- tried Linux kernel 3.13-rc5

The errors didn't go away (maybe because my filesystem is already corrupted). I didn't have time to test with kernel 3.11 again. But lately the errors increased, and they started to make my system unstable, and then unusable. I had to reformat everything and recover my backups. I don't have my / and /home on btrfs over bcache anymore, but I can run some tests on a spare HD and SSD I have here. I'll report back after Christmas.

thanks,
Fabio

2013/12/20 Chris Mason c...@fb.com:
> [Fábio's report and call traces quoted in full; snipped - see the
> messages below]
>
> This should mean that bcache is either failing to read some blocks
> properly or is fiddling with the bv_len/bv_offset fields. Could someone
> from bcache comment?
>
> -chris
Re: btrfs on bcache
On Thu, Dec 19, 2013 at 8:59 PM, Chris Mason c...@fb.com wrote:
> On Wed, 2013-12-18 at 18:17 +0100, eb wrote:
> Btrfs shouldn't be setting the offset on the bios. Are you able to add
> a WARN_ON to the message that prints this so we can see the stack
> trace?

If you send me a patch - my experience hacking on the kernel is exactly 0 - I'll try to see if I can compile a custom kernel and get it running.

> Could you please cc the bcache and btrfs list together?

Done.

I did some more testing - I copied an image of a 128GB drive over the network (via netcat) onto the bcache/btrfs system and verified the results twice using sha1sum. They're identical on the source system (which is *not* using bcache) and on the bcache/btrfs setup. I've gotten a lot of the incomplete write errors and a few csum errors in dmesg, but apparently they haven't done any harm? Not sure how remarkable this is, as large sequential writes like these are supposed to bypass the cache anyway, but I assume they still have to go through the subsystem.
Re: btrfs on bcache
Hello,

I put the WARN_ON(1); after the printk lines (incomplete page read and incomplete page write) in extent_io.c. Here are some call traces:

[ 19.509497] incomplete page read in btrfs with offset 2560 and length 1536
[ 19.509500] ------------[ cut here ]------------
[ 19.509528] WARNING: CPU: 2 PID: 220 at fs/btrfs/extent_io.c:2441 end_bio_extent_readpage+0x788/0xc20 [btrfs]()
[ 19.509530] Modules linked in: cdc_acm fuse iTCO_wdt iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1 ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache
[ 19.509578] CPU: 2 PID: 220 Comm: btrfs-endio-met Tainted: P W O 3.12.5-1-ARCH #1
[ 19.509580] Hardware name: System manufacturer System Product Name/P5WDG2 WS Pro, BIOS 090503/06/2008
[ 19.509581] 0009 880231a63cb0 814ee37b
[ 19.509585] 880231a63ce8 81062bcd ea00085eaec0
[ 19.509587] 8802320cc9c0 880233b0e000 880231a63cf8
[ 19.509590] Call Trace:
[ 19.509596] [814ee37b] dump_stack+0x54/0x8d
[ 19.509601] [81062bcd] warn_slowpath_common+0x7d/0xa0
[ 19.509603] [81062caa] warn_slowpath_null+0x1a/0x20
[ 19.509614] [a00b7ba8] end_bio_extent_readpage+0x788/0xc20 [btrfs]
[ 19.509617] [8107010b] ? lock_timer_base.isra.35+0x2b/0x50
[ 19.509619] [8106f660] ? detach_if_pending+0x120/0x120
[ 19.509623] [811d98dd] bio_endio+0x1d/0x30
[ 19.509632] [a0090227] end_workqueue_fn+0x37/0x40 [btrfs]
[ 19.509642] [a00c6b1e] worker_loop+0x14e/0x560 [btrfs]
[ 19.509646] [810952b2] ? default_wake_function+0x12/0x20
[ 19.509656] [a00c69d0] ? btrfs_queue_worker+0x330/0x330 [btrfs]
[ 19.509672] [81084fe0] kthread+0xc0/0xd0
[ 19.509677] [81084f20] ? kthread_create_on_node+0x120/0x120
[ 19.509680] [814fce7c] ret_from_fork+0x7c/0xb0
[ 19.509683] [81084f20] ? kthread_create_on_node+0x120/0x120
[ 19.509687] ---[ end trace bbc8d0d088375446 ]---
[ 25.592100] incomplete page read in btrfs with offset 2560 and length 1536
[ 25.592105] ------------[ cut here ]------------
[ 25.592141] WARNING: CPU: 0 PID: 442 at fs/btrfs/extent_io.c:2441 end_bio_extent_readpage+0x788/0xc20 [btrfs]()
[ 25.592143] Modules linked in: cdc_acm fuse iTCO_wdt iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1 ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache
[ 25.592205] CPU: 0 PID: 442 Comm: btrfs-endio-met Tainted: P W O 3.12.5-1-ARCH #1
[ 25.592208] Hardware name: System manufacturer System Product Name/P5WDG2 WS Pro, BIOS 090503/06/2008
[ 25.592211] 0009 880229773cb0 814ee37b
[ 25.592216] 880229773ce8 81062bcd ea0002a20a80
[ 25.592220] 88022d3ab180 88022d326000 880229773cf8
[ 25.592225] Call Trace:
[ 25.592234] [814ee37b] dump_stack+0x54/0x8d
[ 25.592240] [81062bcd] warn_slowpath_common+0x7d/0xa0
[ 25.592245] [81062caa] warn_slowpath_null+0x1a/0x20
[ 25.592262] [a00b7ba8] end_bio_extent_readpage+0x788/0xc20 [btrfs]
[ 25.592267] [810701ef] ? try_to_del_timer_sync+0x4f/0x70
[ 25.592271] [81070262] ? del_timer_sync+0x52/0x60
[ 25.592275] [8106f660] ? detach_if_pending+0x120/0x120
[ 25.592280] [811d98dd] bio_endio+0x1d/0x30
[ 25.592296] [a0090227] end_workqueue_fn+0x37/0x40 [btrfs]
[ 25.592312] [a00c6b1e] worker_loop+0x14e/0x560 [btrfs]
[ 25.592318] [810952b2] ? default_wake_function+0x12/0x20
[ 25.592335] [a00c69d0] ? btrfs_queue_worker+0x330/0x330 [btrfs]
[ 25.592350] [81084fe0] kthread+0xc0/0xd0
[ 25.592353] [81084f20] ? kthread_create_on_node+0x120/0x120
[ 25.592356] [814fce7c] ret_from_fork+0x7c/0xb0
[ 25.592359] [81084f20
Re: btrfs on bcache
On Fri, 2013-12-20 at 10:42 -0200, Fábio Pfeifer wrote:
> Hello, I put the WARN_ON(1); after the printk lines (incomplete page
> read and incomplete page write) in extent_io.c. Here are some call
> traces:
>
> [call traces quoted in full; snipped - see Fábio's message above]

This should mean that bcache is either failing to read some blocks properly or is fiddling with the bv_len/bv_offset fields. Could someone from bcache comment?

-chris
Re: btrfs on bcache
On Thu, Dec 19, 2013 at 2:04 PM, Fábio Pfeifer fmpfei...@gmail.com wrote:
> Any update on this? I have here exactly the same issue. Kernel
> 3.12.5-1-ARCH, backing device 500 GB IDE, cache 24 GB SSD =
> /dev/bcache0. On /dev/bcache0 I also have 2 subvolumes, / and /home.
> I get lots of messages in dmesg:

I also have this issue. Also, this afternoon I experienced data corruption on my btrfs device (checksum errors), which might or might not be related. I don't really know how to determine the cause, but if anyone has suggestions they'd be appreciated.

Cheers,
Henry de Valence
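For corruption like this, the standard btrfs tooling can at least quantify the damage before deciding whether bcache is implicated (the mount point is a placeholder, and the stats subcommand requires a reasonably recent btrfs-progs):

  # read and verify every extent, reporting checksum failures
  btrfs scrub start -B /mnt
  btrfs scrub status /mnt
  # per-device error counters
  btrfs device stats /mnt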
Re: btrfs on bcache
Any update on this? I have here exactly the same issue.

Kernel 3.12.5-1-ARCH, backing device 500 GB IDE, cache 24 GB SSD = /dev/bcache0. On /dev/bcache0 I also have 2 subvolumes, / and /home. I get lots of messages in dmesg:

(...)
[ 22.282469] BTRFS info (device bcache0): csum failed ino 56193 off 212992 csum 519977505 expected csum 3166125439
[ 22.282656] incomplete page read in btrfs with offset 1024 and length 3072
[ 23.370872] incomplete page read in btrfs with offset 1024 and length 3072
[ 23.370890] BTRFS info (device bcache0): csum failed ino 57765 off 106496 csum 3553846164 expected csum 1299185721
[ 23.505238] incomplete page read in btrfs with offset 2560 and length 1536
[ 23.505256] BTRFS info (device bcache0): csum failed ino 75922 off 172032 csum 1883678196 expected csum 1337496676
[ 23.508535] incomplete page read in btrfs with offset 2560 and length 1536
[ 23.508547] BTRFS info (device bcache0): csum failed ino 74368 off 237568 csum 2863587994 expected csum 2693116460
[ 25.683059] incomplete page read in btrfs with offset 2560 and length 1536
[ 25.683078] BTRFS info (device bcache0): csum failed ino 123709 off 57344 csum 1528117893 expected csum 2239543273
[ 25.684339] incomplete page read in btrfs with offset 1024 and length 3072
[ 26.622384] incomplete page read in btrfs with offset 1024 and length 3072
[ 26.906718] incomplete page read in btrfs with offset 2560 and length 1536
[ 27.823247] incomplete page read in btrfs with offset 1024 and length 3072
[ 27.823265] btrfs_readpage_end_io_hook: 2 callbacks suppressed
[ 27.823271] BTRFS info (device bcache0): csum failed ino 34587 off 16384 csum 1180114025 expected csum 474262911
[ 28.490066] incomplete page read in btrfs with offset 2560 and length 1536
[ 28.490085] BTRFS info (device bcache0): csum failed ino 65817 off 327680 csum 3065880108 expected csum 2663659117
[ 29.413824] incomplete page read in btrfs with offset 1024 and length 3072
[ 41.913857] incomplete page read in btrfs with offset 2560 and length 1536
[ 55.761753] incomplete page read in btrfs with offset 1024 and length 3072
[ 55.761771] BTRFS info (device bcache0): csum failed ino 72835 off 81920 csum 1511792656 expected csum 3733709121
[ 69.636498] incomplete page read in btrfs with offset 2560 and length 1536
(...)

Should I be worried?

thanks,
Fabio Pfeifer

2013/12/18 eb e...@gmx.ch:
> [eb's original report quoted in full; snipped - see the start of this
> thread below]
Re: btrfs on bcache
Forgot to mention: bcache is in writeback mode.

2013/12/19 Fábio Pfeifer fmpfei...@gmail.com:
> [previous message, including the dmesg excerpts and eb's original
> report, quoted in full; snipped - see above]
Re: btrfs on bcache
On Wed, 2013-12-18 at 18:17 +0100, eb wrote:
> I've recently set up a system (Kernel 3.12.5-1-ARCH) which is layered
> as follows:
>
> /dev/sdb3 -> cache0 (80 GB Intel SSD)
> /dev/sdc1 -> backing device (2 TB WD HDD)
> sdb3 + sdc1 = /dev/bcache0
>
> On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted
> as / and /home. What's been bothering me are the following entries in
> my kernel log:
>
> [13811.845540] incomplete page write in btrfs with offset 1536 and length 2560
> [13870.326639] incomplete page write in btrfs with offset 3072 and length 1024
>
> The offset/length values are always either 1536/2560 or 3072/1024;
> they sum up nicely to 4K. There are 607 of those in there as I am
> writing this; the machine has been up 18 hours and been under no
> particular I/O strain (it's a desktop).

Btrfs shouldn't be setting the offset on the bios. Are you able to add a WARN_ON to the message that prints this so we can see the stack trace?

Could you please cc the bcache and btrfs lists together?

-chris
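The debugging change Chris is asking for, and which Fábio confirms applying in his follow-up above, amounts to a one-line addition in fs/btrfs/extent_io.c. A sketch against the 3.12-era code; the printk wording is taken from the log output in this thread, and the exact surrounding context is abbreviated:

  --- a/fs/btrfs/extent_io.c
  +++ b/fs/btrfs/extent_io.c
  @@ (in the bio completion paths that print the warning) @@
   		printk("incomplete page read in btrfs with offset %u and length %u\n",
   		       bvec->bv_offset, bvec->bv_len);
  +		WARN_ON(1);	/* dump the call chain that built this bio */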
btrfs on bcache
I've recently set up a system (Kernel 3.12.5-1-ARCH) which is layered as follows:

/dev/sdb3 -> cache0 (80 GB Intel SSD)
/dev/sdc1 -> backing device (2 TB WD HDD)
sdb3 + sdc1 = /dev/bcache0

On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted as / and /home. What's been bothering me are the following entries in my kernel log:

[13811.845540] incomplete page write in btrfs with offset 1536 and length 2560
[13870.326639] incomplete page write in btrfs with offset 3072 and length 1024

The offset/length values are always either 1536/2560 or 3072/1024; they sum up nicely to 4K. There are 607 of those in there as I am writing this; the machine has been up 18 hours and been under no particular I/O strain (it's a desktop).

Trying to fix this, I detached the cache (still using /dev/bcache0, but without /dev/sdb3 attached), causing these errors to disappear. As soon as I re-attached /dev/sdb3 they started again, so I am fairly sure it's an unfavorable interaction between bcache and btrfs.

Is this something I should be worried about (they're only emitted with KERN_INFO?) or just an alignment problem? The underlying HDD is using 4K sectors, while the block size of bcache seems to be 512; could that be the issue here? I've also encountered incomplete reads and a few csum errors, but I have not been able to trigger these regularly. I have a feeling that the error is more likely to be on the bcache end (I've mailed to that list as well), but any insight into the matter would be much appreciated.

Thanks,
- eb
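The detach/re-attach cycle eb describes, plus the sector-size comparison his alignment question calls for, look like this (device names hypothetical; the sysfs paths are the standard bcache layout):

  # run without the SSD for a while, then re-attach it
  echo 1 > /sys/block/bcache0/bcache/detach
  echo <cset-uuid> > /sys/block/bcache0/bcache/attach   # UUID from bcache-super-show
  # compare logical/physical sector sizes with bcache's recorded block size
  blockdev --getss --getpbsz /dev/sdc1
  bcache-super-show /dev/sdb3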
[PATCH 8/9] bcache: generic_make_request() handles large bios now
So we get to delete our hacky workaround.

Signed-off-by: Kent Overstreet k...@daterainc.com
---
 drivers/md/bcache/bcache.h    |  18
 drivers/md/bcache/io.c        | 100 +-
 drivers/md/bcache/journal.c   |   4 +-
 drivers/md/bcache/request.c   |  16 +++
 drivers/md/bcache/super.c     |  33 ++
 drivers/md/bcache/util.h      |   5 ++-
 drivers/md/bcache/writeback.c |   4 +-
 include/linux/bio.h           |  12 -
 8 files changed, 19 insertions(+), 173 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 964353c..8f65331 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -241,19 +241,6 @@ struct keybuf {
 	DECLARE_ARRAY_ALLOCATOR(struct keybuf_key, freelist, KEYBUF_NR);
 };
 
-struct bio_split_pool {
-	struct bio_set		*bio_split;
-	mempool_t		*bio_split_hook;
-};
-
-struct bio_split_hook {
-	struct closure		cl;
-	struct bio_split_pool	*p;
-	struct bio		*bio;
-	bio_end_io_t		*bi_end_io;
-	void			*bi_private;
-};
-
 struct bcache_device {
 	struct closure		cl;
 
@@ -286,8 +273,6 @@ struct bcache_device {
 	int (*cache_miss)(struct btree *, struct search *,
 			  struct bio *, unsigned);
 	int (*ioctl) (struct bcache_device *, fmode_t, unsigned, unsigned long);
-
-	struct bio_split_pool	bio_split_hook;
 };
 
 struct io {
@@ -465,8 +450,6 @@ struct cache {
 	atomic_long_t		meta_sectors_written;
 	atomic_long_t		btree_sectors_written;
 	atomic_long_t		sectors_written;
-
-	struct bio_split_pool	bio_split_hook;
 };
 
 struct gc_stat {
@@ -901,7 +884,6 @@ void bch_bbio_endio(struct cache_set *, struct bio *, int, const char *);
 void bch_bbio_free(struct bio *, struct cache_set *);
 struct bio *bch_bbio_alloc(struct cache_set *);
 
-void bch_generic_make_request(struct bio *, struct bio_split_pool *);
 void __bch_submit_bbio(struct bio *, struct cache_set *);
 void bch_submit_bbio(struct bio *, struct cache_set *, struct bkey *, unsigned);
 
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index fa028fa..86a0bb8 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -11,104 +11,6 @@
 
 #include <linux/blkdev.h>
 
-static unsigned bch_bio_max_sectors(struct bio *bio)
-{
-	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-	struct bio_vec bv;
-	struct bvec_iter iter;
-	unsigned ret = 0, seg = 0;
-
-	if (bio->bi_rw & REQ_DISCARD)
-		return min(bio_sectors(bio), q->limits.max_discard_sectors);
-
-	bio_for_each_segment(bv, bio, iter) {
-		struct bvec_merge_data bvm = {
-			.bi_bdev	= bio->bi_bdev,
-			.bi_sector	= bio->bi_iter.bi_sector,
-			.bi_size	= ret << 9,
-			.bi_rw		= bio->bi_rw,
-		};
-
-		if (seg == min_t(unsigned, BIO_MAX_PAGES,
-				 queue_max_segments(q)))
-			break;
-
-		if (q->merge_bvec_fn &&
-		    q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
-			break;
-
-		seg++;
-		ret += bv.bv_len >> 9;
-	}
-
-	ret = min(ret, queue_max_sectors(q));
-
-	WARN_ON(!ret);
-	ret = max_t(int, ret, bio_iovec(bio).bv_len >> 9);
-
-	return ret;
-}
-
-static void bch_bio_submit_split_done(struct closure *cl)
-{
-	struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-	s->bio->bi_end_io	= s->bi_end_io;
-	s->bio->bi_private	= s->bi_private;
-	bio_endio_nodec(s->bio, 0);
-
-	closure_debug_destroy(&s->cl);
-	mempool_free(s, s->p->bio_split_hook);
-}
-
-static void bch_bio_submit_split_endio(struct bio *bio, int error)
-{
-	struct closure *cl = bio->bi_private;
-	struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-	if (error)
-		clear_bit(BIO_UPTODATE, &s->bio->bi_flags);
-
-	bio_put(bio);
-	closure_put(cl);
-}
-
-void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
-{
-	struct bio_split_hook *s;
-	struct bio *n;
-
-	if (!bio_has_data(bio) && !(bio->bi_rw & REQ_DISCARD))
-		goto submit;
-
-	if (bio_sectors(bio) <= bch_bio_max_sectors(bio))
-		goto submit;
-
-	s = mempool_alloc(p->bio_split_hook, GFP_NOIO);
-	closure_init(&s->cl, NULL);
-
-	s->bio		= bio;
-	s->p		= p;
-	s->bi_end_io	= bio->bi_end_io;
-	s->bi_private	= bio->bi_private;
-	bio_get(bio);
-
-	do {
-		n = bio_next_split(bio, bch_bio_max_sectors(bio),
-				   GFP_NOIO, s->p->bio_split
Re: [RFC PATCH 0/7] bcache: md conversion
Dan Williams wrote:
> The consensus from LSF was that bcache need not invent a new interface
> when md and dm can both do the job. As mentioned in patch 7 this series
> aims to be a minimal conversion. Other refactoring items like
> deprecating register_lock for mddev->reconfig_mutex are deferred.
>
> This supports assembly of an already established cache array:
>
>   mdadm -A /dev/md/bcache /dev/sd[ab]
>
> ...will create the /dev/md/bcache container and a subarray representing
> the cache volume. Flash-only or backing-device-only volumes were not
> tested. Create support and hot-add/hot-remove come later.
>
> Note:
> * When attempting to test with small loopback devices (100MB), assembly
>   soft locks in bcache_journal_read(). That hang went away with larger
>   devices, so there seems to be a minimum component device size that
>   needs to be considered in the tooling.

Is there any plan to separate the on-disk layout (per-device headers, etc) from the logic, for the purpose of reuse? I can think of at least one case where this would be extremely useful: integration in BtrFS.

BtrFS already has its own methods for making sure a group of devices are all present when the filesystem is mounted, so it doesn't really need the formatting of the backing device that bcache does to prevent it from being mounted solo. Putting bcache under BtrFS would be silly in the same way as putting it under a raid array, but bcache can't be put on top of BtrFS. Logically, looking at BtrFS' architecture, a cache would likely fit best at the 'block group' level, which IIUC would be roughly equivalent to the recommended 'over raid, under lvm' method of using bcache.
Re: bcache with SSD instead of battery powered raid cards
On 03/13/2012 04:06 AM, Kiran Patil wrote:
> Hi,
>
> Is anybody using bcache with SSD instead of battery-powered RAID cards
> with Btrfs?
>
> Hard drives are cheap and big, SSDs are fast but small and expensive.
> Wouldn't it be nice if you could transparently get the advantages of
> both? With Bcache, you can have your cake and eat it too. Bcache is a
> patch for the Linux kernel to use SSDs to cache other block devices.
> It's analogous to L2Arc for ZFS, but Bcache also does writeback
> caching, and it's filesystem agnostic. It's designed to be switched on
> with a minimum of effort, and to work well without configuration on
> any setup. By default it won't cache sequential IO, just the random
> reads and writes that SSDs excel at. It's meant to be suitable for
> desktops, servers, high end storage arrays, and perhaps even embedded.
>
> http://bcache.evilpiepirate.org/

Did you ever experiment with this? What results did you find?

There is also something similar called flashcache, developed by some Facebook engineers, that I'm interested in trying. They are supposedly using it to speed up mysql+innodb. It is out-of-mainline code though, and I don't think there is much of an effort to get it in. It supports writeback, writethrough and writearound (blocks are never cached on write, only on read) caching. It uses device-mapper to combine your cache block device with your slow spinning block device, and then you put your filesystem on top of that dm device.

https://github.com/facebook/flashcache

Regards,
--Justin
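For comparison with the bcache provisioning discussed elsewhere in this archive, creating a flashcache device is a single device-mapper call; a sketch following the project's documented usage (cache name and device paths are placeholders):

  # build a writeback cache device from an SSD partition and an HDD partition
  flashcache_create -p back fc_root /dev/sdb1 /dev/sdc1
  mkfs.btrfs /dev/mapper/fc_root
  mount /dev/mapper/fc_root /mnt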
bcache with SSD instead of battery powered raid cards
Hi,

Is anybody using bcache with SSD instead of battery-powered RAID cards with Btrfs?

Hard drives are cheap and big, SSDs are fast but small and expensive. Wouldn't it be nice if you could transparently get the advantages of both? With Bcache, you can have your cake and eat it too. Bcache is a patch for the Linux kernel to use SSDs to cache other block devices. It's analogous to L2Arc for ZFS, but Bcache also does writeback caching, and it's filesystem agnostic. It's designed to be switched on with a minimum of effort, and to work well without configuration on any setup. By default it won't cache sequential IO, just the random reads and writes that SSDs excel at. It's meant to be suitable for desktops, servers, high end storage arrays, and perhaps even embedded.

http://bcache.evilpiepirate.org/
http://news.gmane.org/gmane.linux.kernel.bcache.devel

Thanks,
Kiran.