[PATCH RFC] btrfs-progs: Show backtrace on BUGs

2014-08-21 Thread Naohiro Aota
btrfs check is still under heavy development and so there are some
BUGs beging hit. btrfs check can be run on limited environment which
lacks gdb to debug the abort in detail. If we could see backtrace, it
will be easier to find a root cause of the BUG.

Following is my btrfs check output with and without the patch.

without patch:

ref mismatch on [1437411180544 16384] extent item 3, found 2
btrfs: extent_io.c:612: free_extent_buffer: Assertion `!(eb-flags  1)' failed.
enabling repair mode
Checking filesystem on /dev/sdb2
UUID: a53121ee-679f-4241-bb44-ceb5a1a7beb7

with patch:

ref mismatch on [1437411180544 16384] extent item 3, found 2
btrfs check(free_extent_buffer+0xa3)[0x808ff89]
btrfs check[0x8090222]
btrfs check(alloc_extent_buffer+0xa7)[0x80903d0]
btrfs check(btrfs_find_create_tree_block+0x2e)[0x807edef]
btrfs check(btrfs_alloc_free_block+0x299)[0x808a3c4]
btrfs check(__btrfs_cow_block+0x1d4)[0x8078896]
btrfs check(btrfs_cow_block+0x150)[0x807934d]
btrfs check(btrfs_search_slot+0x136)[0x807c041]
btrfs check[0x8065a6b]
btrfs check[0x806bc0f]
btrfs check(cmd_check+0xc8f)[0x806d03a]
btrfs check(main+0x167)[0x804f5ec]
/lib/libc.so.6(__libc_start_main+0xe6)[0xf754c4f6]
btrfs check[0x804f061]
btrfsck: extent_io.c:612: free_extent_buffer: Assertion `!(eb-flags  1)' 
failed.
enabling repair mode
Checking filesystem on /dev/sdb2
UUID: a53121ee-679f-4241-bb44-ceb5a1a7beb7

Now it's much clear that there's something wrong around
alloc_extent_buffer.

Signed-off-by: Naohiro Aota na...@elisp.net
---
 Makefile |  2 +-
 kerncompat.h | 29 ++---
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index 76565e8..9db5441 100644
--- a/Makefile
+++ b/Makefile
@@ -5,7 +5,7 @@ CC = gcc
 LN = ln
 AR = ar
 AM_CFLAGS = -Wall -D_FILE_OFFSET_BITS=64 -DBTRFS_FLAT_INCLUDES 
-fno-strict-aliasing -fPIC
-CFLAGS = -g -O1 -fno-strict-aliasing
+CFLAGS = -g -O1 -fno-strict-aliasing -rdynamic
 objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \
  root-tree.o dir-item.o file-item.o inode-item.o inode-map.o \
  extent-cache.o extent_io.o volumes.o utils.o repair.o \
diff --git a/kerncompat.h b/kerncompat.h
index f370cd8..94fadc8 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -27,6 +27,7 @@
 #include byteswap.h
 #include assert.h
 #include stddef.h
+#include execinfo.h
 #include linux/types.h
 
 #ifndef READ
@@ -50,7 +51,18 @@
 #define ULONG_MAX   (~0UL)
 #endif
 
-#define BUG() assert(0)
+#define MAX_BACKTRACE 128
+#define show_trace()   \
+   do {\
+   void *trace[MAX_BACKTRACE]; \
+   int n = backtrace(trace, MAX_BACKTRACE);\
+   backtrace_symbols_fd(trace, n, 2);  \
+   } while(0)
+#define BUG()  \
+   do {\
+   show_trace();   \
+   assert(0);  \
+   } while(0)
 #ifdef __CHECKER__
 #define __force__attribute__((force))
 #define __bitwise__ __attribute__((bitwise))
@@ -233,8 +245,19 @@ static inline long IS_ERR(const void *ptr)
 #define kstrdup(x, y) strdup(x)
 #define kfree(x) free(x)
 
-#define BUG_ON(c) assert(!(c))
-#define WARN_ON(c) assert(!(c))
+#define BUG_ON(c)  \
+   do {\
+   if((c)) {   \
+   show_trace();   \
+   assert(!(c));   \
+   }   \
+   } while(0)
+#define WARN_ON(c) do {\
+   if((c)) {   \
+   show_trace();   \
+   assert(!(c));   \
+   }   \
+   } while(0)
 
 
 #define container_of(ptr, type, member) ({  \
-- 
2.0.4
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: Do not free dirty extent buffer

2014-08-21 Thread Naohiro Aota
free_some_buffer() should not free dirty extent buffers. They should be
left for later commit.

Signed-off-by: Naohiro Aota na...@elisp.net
---
 extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extent_io.c b/extent_io.c
index a127e54..8a668be 100644
--- a/extent_io.c
+++ b/extent_io.c
@@ -552,7 +552,7 @@ static int free_some_buffers(struct extent_io_tree *tree)
 
list_for_each_safe(node, next, tree-lru) {
eb = list_entry(node, struct extent_buffer, lru);
-   if (eb-refs == 1) {
+   if (eb-refs == 1  !(eb-flags  EXTENT_DIRTY)) {
free_extent_buffer(eb);
if (tree-cache_size  cache_hard_max)
break;
-- 
2.0.4
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Do not free dirty extent buffer

2014-08-24 Thread Naohiro Aota
Hi,

Filipe David Manana fdman...@gmail.com writes:

 On Thu, Aug 21, 2014 at 1:07 PM, Naohiro Aota na...@elisp.net wrote:
 free_some_buffer() should not free dirty extent buffers. They should be
 left for later commit.

 Signed-off-by: Naohiro Aota na...@elisp.net
 ---
  extent_io.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/extent_io.c b/extent_io.c
 index a127e54..8a668be 100644
 --- a/extent_io.c
 +++ b/extent_io.c
 @@ -552,7 +552,7 @@ static int free_some_buffers(struct extent_io_tree *tree)

 list_for_each_safe(node, next, tree-lru) {
 eb = list_entry(node, struct extent_buffer, lru);
 -   if (eb-refs == 1) {
 +   if (eb-refs == 1  !(eb-flags  EXTENT_DIRTY)) {

 Hi,

 Did you meant bitwise and () and not logical and () right?

Oops, you're right. Following is the new patch.

From 0e1a49216d40b44909cdfacd5cd9a13aa796db41 Mon Sep 17 00:00:00 2001
From: Naohiro Aota na...@elisp.net
Date: Mon, 25 Aug 2014 14:03:59 +0900
Subject: [PATCH] btrfs-progs: Do not free dirty extent buffer

free_some_buffer() should not free dirty extent buffers. They are left
to be committed.

Signed-off-by: Naohiro Aota na...@elisp.net
---
 extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extent_io.c b/extent_io.c
index a127e54..1df377d 100644
--- a/extent_io.c
+++ b/extent_io.c
@@ -552,7 +552,7 @@ static int free_some_buffers(struct extent_io_tree *tree)
 
list_for_each_safe(node, next, tree-lru) {
eb = list_entry(node, struct extent_buffer, lru);
-   if (eb-refs == 1) {
+   if (eb-refs == 1  !(eb-flags  EXTENT_DIRTY)) {
free_extent_buffer(eb);
if (tree-cache_size  cache_hard_max)
break;
-- 
2.0.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs-progs: initial reference count of extent buffer is correct?

2014-08-24 Thread Naohiro Aota
Hi, list

I'm having trouble with my btrfs FS recently and running btrfs check to
try to fix the FS. Unfortunately, it aborted with:

btrfsck: root-tree.c:81: btrfs_update_root: Assertion `!(ret != 0)' failed.

It means that extent tree root is not found in tree root tree! Then
I added btrfs_print_leaf() there to see what is happening there. There
were (... METADATA_ITEM 0) keys listed. Well, I found tree root
tree's root extent buffer is somewhat replaced by a extent buffer from
the extent tree.

Reading the code, it seems that free_some_buffers() reclaim extent
buffers allocated to root trees because they are not
extent_buffer_get()ed (i.e. @refs == 1). 

To reproduce this problem, try running this code. This program first
print root tree node's bytenr, and scan some trees. If your FS is large
enough to run free_some_buffers(), tree root node's bytenr after the
scan would be different.

#include stdio.h
#include ctree.h
#include disk-io.h

void scan_tree(struct btrfs_root *root, struct extent_buffer *eb)
{
  u32 i;
  u32 nr;
  nr = btrfs_header_nritems(eb);
  if (btrfs_is_leaf(eb)) return;
  u32 size = btrfs_level_size(root, btrfs_header_level(eb) - 1);
  for (i = 0; i  nr; i++) {
if (btrfs_is_leaf(eb)) return;
u64 bytenr = btrfs_node_blockptr(eb, i);
struct extent_buffer *next = read_tree_block(root, bytenr, size,
 btrfs_node_ptr_generation(eb, 
i));
if (!next) continue;
scan_tree(root, next);
  }
}

int main(int ac, char **av)
{
  struct btrfs_fs_info *info;
  struct btrfs_root *root;
  info = open_ctree_fs_info(av[1], 0, 0, OPEN_CTREE_PARTIAL);
  root = info-fs_root;
  printf(tree root %lld\n, info-tree_root-node-start);
  scan_tree(info-fs_root, info-extent_root-node);
  scan_tree(info-fs_root, info-csum_root-node);
  scan_tree(info-fs_root, info-fs_root-node);
  printf(tree root %lld\n, info-tree_root-node-start);
  return close_ctree(root);
}

On my environment, the above code print the following result. Tree root
tree variable is eventually pointing to another extent!

$ ./btrfs-reproduce /dev/sda3
tree root 91393835008
tree root 49102848

I found commit 53ee1bccf99cd5b474fe1aa857b7dd176e3a1407 changed the
initial @refs to 1, stating that we don't give enough
free_extent_buffer() to reduce the eb's references to zero so that the
eb can finally be freed, but I don't think this is correct. Even if
initial @refs == 2, one free_extent_buffer() would make the @refs to 1
and so let it reclaimed by free_some_buffer(), so it does not seems to
be a problem for me...

I think there are some collides how to use extent buffer: should
__alloc_extent_buffer set @refs = 2 for the caller or should the code
call extent_buffer_get() by themselves everywhere you allocate eb before
any other eb allocation not to let the first eb reclaimed? How to fix
this problem? revert 53ee1bccf99cd5b474fe1aa857b7dd176e3a1407 is the
collect way? or add missing extent_buffer_get() everywhere allocating
is done?

Naohiro
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: do not reclaim extent buffer

2014-09-30 Thread Naohiro Aota
We should kill free_some_buffers() to stop reclaiming extent buffers or
we will hit a problem described below.

As of commit 53ee1bccf99cd5b474fe1aa857b7dd176e3a1407, we are not
counting a reference for tree-lru anymore. However free_some_buffers()
is still left and is reclaiming extent buffers whose @refs == 1. This
cause extent buffers to be reclaimed unintentionally. Thus the following
steps could happen:

1. A buffer at address A is reclaimed by free_some_buffers()
   (address A is also free()ed)
2. Some code call alloc_extent_buffer()
3. Address A is assigned to newly allocated buffer
4. You see a buffer pointed by A suddenly changed its content

This problem is also pointed out here and it has a reproducer:
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36703.html

This commit drop free_some_buffers() and related variables, and also it
modify extent_io_tree_cleanup() to catch non-free'ed buffers properly.

Signed-off-by: Naohiro Aota na...@elisp.net
---
 extent_io.c | 37 +++--
 1 file changed, 3 insertions(+), 34 deletions(-)

diff --git a/extent_io.c b/extent_io.c
index 1df377d..425af8a 100644
--- a/extent_io.c
+++ b/extent_io.c
@@ -30,9 +30,6 @@
 #include ctree.h
 #include volumes.h
 
-static u64 cache_soft_max = 1024 * 1024 * 256;
-static u64 cache_hard_max = 1 * 1024 * 1024 * 1024;
-
 void extent_io_tree_init(struct extent_io_tree *tree)
 {
cache_tree_init(tree-state);
@@ -77,12 +74,9 @@ void extent_io_tree_cleanup(struct extent_io_tree *tree)
 
while(!list_empty(tree-lru)) {
eb = list_entry(tree-lru.next, struct extent_buffer, lru);
-   if (eb-refs != 1) {
-   fprintf(stderr, extent buffer leak: 
-   start %llu len %u\n,
-   (unsigned long long)eb-start, eb-len);
-   eb-refs = 1;
-   }
+   fprintf(stderr, extent buffer leak: 
+   start %llu len %u\n,
+   (unsigned long long)eb-start, eb-len);
free_extent_buffer(eb);
}
 
@@ -541,30 +535,6 @@ out:
return ret;
 }
 
-static int free_some_buffers(struct extent_io_tree *tree)
-{
-   u32 nrscan = 0;
-   struct extent_buffer *eb;
-   struct list_head *node, *next;
-
-   if (tree-cache_size  cache_soft_max)
-   return 0;
-
-   list_for_each_safe(node, next, tree-lru) {
-   eb = list_entry(node, struct extent_buffer, lru);
-   if (eb-refs == 1  !(eb-flags  EXTENT_DIRTY)) {
-   free_extent_buffer(eb);
-   if (tree-cache_size  cache_hard_max)
-   break;
-   } else {
-   list_move_tail(eb-lru, tree-lru);
-   }
-   if (nrscan++  64  tree-cache_size  cache_hard_max)
-   break;
-   }
-   return 0;
-}
-
 static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree,
   u64 bytenr, u32 blocksize)
 {
@@ -589,7 +559,6 @@ static struct extent_buffer *__alloc_extent_buffer(struct 
extent_io_tree *tree,
eb-cache_node.size = blocksize;
INIT_LIST_HEAD(eb-recow);
 
-   free_some_buffers(tree);
ret = insert_cache_extent(tree-cache, eb-cache_node);
if (ret) {
free(eb);
-- 
2.1.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: clear bio reference after submit_one_bio()

2015-01-05 Thread Naohiro Aota
After submit_one_bio(), `bio' can go away. However submit_extent_page()
leave `bio' referable if submit_one_bio() failed (e.g. -ENOMEM on OOM).
It will cause invalid paging request when submit_extent_page() is called
next time.

I reproduced ENOMEM case with the following script (need
CONFIG_FAIL_PAGE_ALLOC, and CONFIG_FAULT_INJECTION_DEBUG_FS).

  #!/bin/bash

  dmesgout=dmesg.txt
  start=10
  end=30
  step=1000

  # btrfs options
  device=/dev/vdb1
  directory=/mnt/btrfs

  # fault-injection options
  percent=100
  times=3

  mkdir -p $directory || exit 1
  mount -o compress $device $directory || exit 1

  rm -f $directory/file || exit 1
  dd if=/dev/zero of=$directory/file bs=1M count=512 || exit 1

  for interval in `seq $start $step $end`; do
  dmesg -C
  echo 1  /proc/sys/vm/drop_caches
  sync
  export FAILCMD_TYPE=fail_page_alloc
  ./failcmd.sh -p $percent -t $times -i $interval \
  --ignore-gfp-highmem=N --ignore-gfp-wait=N --min-order=0 \
  -- \
  cat $directory/file  /dev/null
  dmesg  ${dmesgout}
  if grep -q BUG: ${dmesgout}; then
  cat ${dmesgout}
  exit 1
  fi
  done

  umount $directory
  exit 0

Signed-off-by: Naohiro Aota na...@elisp.net
---
 fs/btrfs/extent_io.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4ebabd2..4421161 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2816,8 +2816,10 @@ static int submit_extent_page(int rw, struct 
extent_io_tree *tree,
bio_add_page(bio, page, page_size, offset)  page_size) {
ret = submit_one_bio(rw, bio, mirror_num,
 prev_bio_flags);
-   if (ret  0)
+   if (ret  0) {
+   *bio_ret = NULL;
return ret;
+   }
bio = NULL;
} else {
return 0;
-- 
2.2.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: tweak key advancing condition

2015-06-03 Thread Naohiro Aota
The key advancing condition used in copy_to_sk() is loose. It can
advance the key even if it reaches sk-max_*: e.g. when the max key = (512,
1024, -1) and the current key = (512, 1025, 10), it increments the
offset by 1, continues hopeless search from (512, 1025, 11). This issue
make ioctl() to take a lot of time scanning all the leaf a blocks one by
one.

This commit fix the problem using standard way of key comparison:
btrfs_comp_cpu_keys()

Signed-off-by: Naohiro Aota na...@elisp.net
---
 fs/btrfs/ioctl.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 1c22c65..07dc01d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1932,6 +1932,7 @@ static noinline int copy_to_sk(struct btrfs_root *root,
u64 found_transid;
struct extent_buffer *leaf;
struct btrfs_ioctl_search_header sh;
+   struct btrfs_key test;
unsigned long item_off;
unsigned long item_len;
int nritems;
@@ -2015,12 +2016,17 @@ static noinline int copy_to_sk(struct btrfs_root *root,
}
 advance_key:
ret = 0;
-   if (key-offset  (u64)-1  key-offset  sk-max_offset)
+   test.objectid = sk-max_objectid;
+   test.type = sk-max_type;
+   test.offset = sk-max_offset;
+   if (btrfs_comp_cpu_keys(key, test) = 0)
+   ret = 1;
+   else if (key-offset  (u64)-1)
key-offset++;
-   else if (key-type  (u8)-1  key-type  sk-max_type) {
+   else if (key-type  (u8)-1) {
key-offset = 0;
key-type++;
-   } else if (key-objectid  (u64)-1  key-objectid  sk-max_objectid) 
{
+   } else if (key-objectid  (u64)-1) {
key-offset = 0;
key-type = 0;
key-objectid++;
-- 
2.4.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH][RESEND] btrfs: fix search key advancing condition

2015-06-29 Thread Naohiro Aota
The search key advancing condition used in copy_to_sk() is loose. It can
advance the key even if it reaches sk-max_*: e.g. when the max key = (512,
1024, -1) and the current key = (512, 1025, 10), it increments the
offset by 1, continues hopeless search from (512, 1025, 11). This issue
make ioctl() to take unexpectedly long time scanning all the leaf a blocks
one by one.

This commit fix the problem using standard way of key comparison:
btrfs_comp_cpu_keys()

Signed-off-by: Naohiro Aota na...@elisp.net
---
 fs/btrfs/ioctl.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 1c22c65..07dc01d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1932,6 +1932,7 @@ static noinline int copy_to_sk(struct btrfs_root *root,
u64 found_transid;
struct extent_buffer *leaf;
struct btrfs_ioctl_search_header sh;
+   struct btrfs_key test;
unsigned long item_off;
unsigned long item_len;
int nritems;
@@ -2015,12 +2016,17 @@ static noinline int copy_to_sk(struct btrfs_root *root,
}
 advance_key:
ret = 0;
-   if (key-offset  (u64)-1  key-offset  sk-max_offset)
+   test.objectid = sk-max_objectid;
+   test.type = sk-max_type;
+   test.offset = sk-max_offset;
+   if (btrfs_comp_cpu_keys(key, test) = 0)
+   ret = 1;
+   else if (key-offset  (u64)-1)
key-offset++;
-   else if (key-type  (u8)-1  key-type  sk-max_type) {
+   else if (key-type  (u8)-1) {
key-offset = 0;
key-type++;
-   } else if (key-objectid  (u64)-1  key-objectid  sk-max_objectid) 
{
+   } else if (key-objectid  (u64)-1) {
key-offset = 0;
key-type = 0;
key-objectid++;
-- 
2.4.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RESEND] btrfs: fix search key advancing condition

2015-07-29 Thread Naohiro Aota
Hello, list.

Could any one take a look at on this? I believe this is a issue slowing
down ioctl(BTRFS_IOC_TREE_SEARCH) if the target key is missing.

On Tue, Jun 30, 2015 at 11:25 AM, Naohiro Aota na...@elisp.net wrote:
 The search key advancing condition used in copy_to_sk() is loose. It can
 advance the key even if it reaches sk-max_*: e.g. when the max key = (512,
 1024, -1) and the current key = (512, 1025, 10), it increments the
 offset by 1, continues hopeless search from (512, 1025, 11). This issue
 make ioctl() to take unexpectedly long time scanning all the leaf a blocks
 one by one.

 This commit fix the problem using standard way of key comparison:
 btrfs_comp_cpu_keys()

 Signed-off-by: Naohiro Aota na...@elisp.net
 ---
  fs/btrfs/ioctl.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index 1c22c65..07dc01d 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -1932,6 +1932,7 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 u64 found_transid;
 struct extent_buffer *leaf;
 struct btrfs_ioctl_search_header sh;
 +   struct btrfs_key test;
 unsigned long item_off;
 unsigned long item_len;
 int nritems;
 @@ -2015,12 +2016,17 @@ static noinline int copy_to_sk(struct btrfs_root 
 *root,
 }
  advance_key:
 ret = 0;
 -   if (key-offset  (u64)-1  key-offset  sk-max_offset)
 +   test.objectid = sk-max_objectid;
 +   test.type = sk-max_type;
 +   test.offset = sk-max_offset;
 +   if (btrfs_comp_cpu_keys(key, test) = 0)
 +   ret = 1;
 +   else if (key-offset  (u64)-1)
 key-offset++;
 -   else if (key-type  (u8)-1  key-type  sk-max_type) {
 +   else if (key-type  (u8)-1) {
 key-offset = 0;
 key-type++;
 -   } else if (key-objectid  (u64)-1  key-objectid  
 sk-max_objectid) {
 +   } else if (key-objectid  (u64)-1) {
 key-offset = 0;
 key-type = 0;
 key-objectid++;
 --
 2.4.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] btrfs-progs: test: umount if confirmation failed

2015-12-04 Thread Naohiro Aota
When a check in check_inode() failed, the test should umount test target
file system. This commit add clean up umount line in failure path.

Signed-off-by: Naohiro Aota <na...@elisp.net>
---
 tests/fsck-tests/012-leaf-corruption/test.sh | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/fsck-tests/012-leaf-corruption/test.sh 
b/tests/fsck-tests/012-leaf-corruption/test.sh
index 6e23145..bfdd0ea 100755
--- a/tests/fsck-tests/012-leaf-corruption/test.sh
+++ b/tests/fsck-tests/012-leaf-corruption/test.sh
@@ -57,6 +57,7 @@ check_inode()
# Check whether the inode exists
exists=$($SUDO_HELPER find $path -inum $ino)
if [ -z "$exists" ]; then
+   $SUDO_HELPER umount $TEST_MNT
_fail "inode $ino not recovered correctly"
fi
 
@@ -64,17 +65,20 @@ check_inode()
found_mode=$(printf "%o" 0x$($SUDO_HELPER stat $exists -c %f))
if [ $found_mode -ne $mode ]; then
echo "$found_mode"
+   $SUDO_HELPER umount $TEST_MNT
_fail "inode $ino modes not recovered"
fi
 
# Check inode size
found_size=$($SUDO_HELPER stat $exists -c %s)
if [ $mode -ne 41700 -a $found_size -ne $size ]; then
+   $SUDO_HELPER umount $TEST_MNT
_fail "inode $ino size not recovered correctly"
fi
 
# Check inode name
if [ "$(basename $exists)" != "$name" ]; then
+   $SUDO_HELPER umount $TEST_MNT
_fail "inode $ino name not recovered correctly"
else
return 0
-- 
2.6.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs-progs: properly reset nlink of multi-linked file

2015-12-04 Thread Naohiro Aota
If a file is linked from more than one directory and only one
of the links is corrupted, btrfs check dose not reset the nlink
properly. Actually it can go into infinite loop to link the broken file
into lost+found.

This patch fix two part of the code. The first one delay the freeing
valid (no error, found inode ref, directory index, and directory
item) backrefs. Freeing valid backrefs earier prevent reset_nlink() to
add back all valid links.

The second fix is obvious: passing `ref_type' to btrfs_add_link() is just
wrong. It should be `filetype' instead. The current code can break all valid
file links.

Signed-off-by: Naohiro Aota <na...@elisp.net>
---
 cmds-check.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 6a0b50a..11ff3fe 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -810,7 +810,8 @@ static void maybe_free_inode_rec(struct cache_tree 
*inode_cache,
if (backref->found_dir_item && backref->found_dir_index) {
if (backref->filetype != filetype)
backref->errors |= REF_ERR_FILETYPE_UNMATCH;
-   if (!backref->errors && backref->found_inode_ref) {
+   if (!backref->errors && backref->found_inode_ref &&
+   rec->nlink == rec->found_link) {
list_del(>list);
free(backref);
}
@@ -2392,7 +2393,7 @@ static int reset_nlink(struct btrfs_trans_handle *trans,
list_for_each_entry(backref, >backrefs, list) {
ret = btrfs_add_link(trans, root, rec->ino, backref->dir,
 backref->name, backref->namelen,
-backref->ref_type, >index, 1);
+backref->filetype, >index, 1);
if (ret < 0)
goto out;
}
-- 
2.6.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] btrfs-progs: fix file restore to lost+found bug

2015-12-04 Thread Naohiro Aota
This series address an issue of btrfsck to restore infinite number of
same file into `lost+found' directory. The issue occur on a file which
is linked from two different directory A and B. If links from dir A is
corrupted and links from dir B is kept valid, btrfsck won't stop
creating a file in lost+found like this:

-
Moving file 'file.del.51' to 'lost+found' dir since it has no valid backref
Fixed the nlink of inode 1876
Trying to rebuild inode:1877
Moving file 'del' to 'lost+found' dir since it has no valid backref
Fixed the nlink of inode 1877
Can't get file name for inode 1876, using '1876' as fallback
Moving file '1876' to 'lost+found' dir since it has no valid backref
Fixed the nlink of inode 1876
Can't get file name for inode 1876, using '1876' as fallback
Moving file '1876.1876' to 'lost+found' dir since it has no valid backref
Fixed the nlink of inode 1876
(snip)
Moving file 
'1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876'
 to 'lost+found' dir since it has no valid backref
Fixed the nlink of inode 1876
Can't get file name for inode 1876, using '1876' as fallback
Can't get file name for inode 1876, using '1876' as fallback
Can't get file name for inode 1876, using '1876' as fallback
-

The problem is early release of inode backrefs. The release prevents
`reset_nlink()' to add back valid backrefs to an inode. In the result,
the following results occur:

0. btrfsck scan a FS tree
1. It finds valid links and invalid links (some links are lost)
2. All valid links are released
3. btrfsck detects found_links != nlink
4. reset_nlink() reset nlink to 0
5. No valid links are restored (thus still nlink = 0)
6. The file is restored to lost+found since nlink == 0 (now, nlink = 1)
7. btrfsck rescan the FS tree
8. It finds `found_links' = #valid_links+1 (in lost+found) and nlink = 1
9. again all valid links are lost, and restore to lost+found

The first patch add clean up code to the test. It umount test
directory on failure path. The second patch fix the above problem. And
the last patch extend the test to check a case of multiple-linked file
corruption.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] btrfs-progs: fix file restore to lost+found bug

2015-12-06 Thread Naohiro Aota
On Sat, Dec 5, 2015 at 10:35 AM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>
>
> On 12/04/2015 01:37 PM, Naohiro Aota wrote:
>>
>> This series address an issue of btrfsck to restore infinite number of
>> same file into `lost+found' directory. The issue occur on a file which
>> is linked from two different directory A and B. If links from dir A is
>> corrupted and links from dir B is kept valid, btrfsck won't stop
>> creating a file in lost+found like this:
>>
>> -
>> Moving file 'file.del.51' to 'lost+found' dir since it has no valid
>> backref
>> Fixed the nlink of inode 1876
>> Trying to rebuild inode:1877
>> Moving file 'del' to 'lost+found' dir since it has no valid backref
>> Fixed the nlink of inode 1877
>> Can't get file name for inode 1876, using '1876' as fallback
>> Moving file '1876' to 'lost+found' dir since it has no valid backref
>> Fixed the nlink of inode 1876
>> Can't get file name for inode 1876, using '1876' as fallback
>> Moving file '1876.1876' to 'lost+found' dir since it has no valid backref
>> Fixed the nlink of inode 1876
>> (snip)
>> Moving file
>> '1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876'
>> to 'lost+found' dir since it has no valid backref
>> Fixed the nlink of inode 1876
>> Can't get file name for inode 1876, using '1876' as fallback
>> Can't get file name for inode 1876, using '1876' as fallback
>> Can't get file name for inode 1876, using '1876' as fallback
>> -
>>
>> The problem is early release of inode backrefs. The release prevents
>> `reset_nlink()' to add back valid backrefs to an inode. In the result,
>> the following results occur:
>>
>> 0. btrfsck scan a FS tree
>> 1. It finds valid links and invalid links (some links are lost)
>> 2. All valid links are released
>> 3. btrfsck detects found_links != nlink
>> 4. reset_nlink() reset nlink to 0
>> 5. No valid links are restored (thus still nlink = 0)
>> 6. The file is restored to lost+found since nlink == 0 (now, nlink = 1)
>> 7. btrfsck rescan the FS tree
>> 8. It finds `found_links' = #valid_links+1 (in lost+found) and nlink = 1
>> 9. again all valid links are lost, and restore to lost+found
>
>
> Right, that's one case I missed in the repair code.
>
> Thanks for the fix.

Thanks for the review.

>
>>
>> The first patch add clean up code to the test. It umount test
>> directory on failure path. The second patch fix the above problem. And
>> the last patch extend the test to check a case of multiple-linked file
>> corruption.
>
>
> But I only see the first 2 patches in maillist...
> The last test case seems missing?

Maybe, the last patch is too large to post to the list? Even it get
smaller, 130260 bytes seems to be a bit large.

How should I handle this? Put my repo somewhere and wait a maintainer
to pull it?

>
> Thanks,
> Qu
>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs-progs: test: umount if confirmation failed

2015-12-07 Thread Naohiro Aota
On Tue, Dec 8, 2015 at 12:33 AM, David Sterba <dste...@suse.cz> wrote:
> On Fri, Dec 04, 2015 at 02:37:25PM +0900, Naohiro Aota wrote:
>> When a check in check_inode() failed, the test should umount test target
>> file system. This commit add clean up umount line in failure path.
>
> The lack of cleanup after failed tests is intentional (exceptions
> possible). The tests are supposed to do cleanup if they pass but if they
> fail the test environment could be examined. In case of more complex
> setups (loop devices, dm devices) it's some work to set it up again.
>
> We could add an optional per-test script cleanup.sh that does the
> cleanup on demand (and is also run from tests/clean-tests.sh).

I see.
Well, I overlooked the README.md stating about clean-tests.sh...
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] btrfs-progs: fix file restore to lost+found bug

2015-12-07 Thread Naohiro Aota
On Tue, Dec 8, 2015 at 12:35 AM, David Sterba <dste...@suse.cz> wrote:
> On Mon, Dec 07, 2015 at 11:59:19AM +0900, Naohiro Aota wrote:
>> > But I only see the first 2 patches in maillist...
>> > The last test case seems missing?
>>
>> Maybe, the last patch is too large to post to the list? Even it get
>> smaller, 130260 bytes seems to be a bit large.
>>
>> How should I handle this? Put my repo somewhere and wait a maintainer
>> to pull it?
>
> Please send it to me directly. The image will be available in
> btrfs-progs git and we don't necessarily need the copy in the
> mailinglist.

Sure. I'll send it to you.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs: Show a warning message if one of objectid reaches its highest value

2016-03-08 Thread Naohiro Aota
2016-03-07 12:05 GMT+09:00 Satoru Takeuchi :
> - It's better to show a warning message for the exceptional case
>   that one of objectid (in most case, inode number) reaches its
>   highest value. Show this message only once to avoid filling
>   dmesg with it.
> - EOVERFLOW is more proper return value for this case.
>   ENOSPC is for "No space left on device" case and objectid isn't
>   related to any device.

I have concern about EOVERFLOW. The value returned here will
go through to the user space via btrfs_find_free_ino() and btrfs_create().
It means that "creat" and "mkdir" can now return EOVERFLOW when it
failed to assign new inode number. Such behavior would disagree with
other file systems, which result in user space programs to be
confused.

Also, I don't think EOVERFLOW described in "creat(2)" (or open(2))
suits for this case. As far as I read the following man page from
creat(2),
giving ENOSPC is better option here.

> ENOSPC: pathname was to be created but the device containing pathname has  no 
>  room  for  the  new file.
> EOVERFLOW:
>  pathname refers to a regular file that is too large to be opened.
> (snip)

> Signed-off-by: Satoru Takeuchi 
> ---
> This patch can be applied to 4.5-rc7
> ---
>  fs/btrfs/inode-map.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index e50316c..f5e3228 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -556,7 +556,15 @@ int btrfs_find_free_objectid(struct btrfs_root *root, 
> u64 *objectid)
> mutex_lock(>objectid_mutex);
>
> if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
> -   ret = -ENOSPC;
> +   static bool __warned = false;
> +
> +   if (unlikely(!__warned)) {
> +   btrfs_warn(root->fs_info,
> +  "The objectid of root %llu reaches its 
> highest value.\n",
> +  root->root_key.objectid);
> +   __warned = true;
> +   }
> +   ret = -EOVERFLOW;
> goto out;
> }
>
> --
> 2.5.0
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: let btrfs_delete_unused_bgs() to clean relocated bgs

2016-09-02 Thread Naohiro Aota
Currently, btrfs_relocate_chunk() is removing relocated BG by itself. But
the work can be done by btrfs_delete_unused_bgs() (and it's better since it
trim the BG). Let's dedupe the code.

While btrfs_delete_unused_bgs() is already hitting the relocated BG, it
skip the BG since the BG has "ro" flag set (to keep balancing BG intact).
On the other hand, btrfs cannot drop "ro" flag here to prevent additional
writes. So this patch make use of "removed" flag.
btrfs_delete_unused_bgs() now detect the flag to distinguish whether a
read-only BG is relocating or not.

Signed-off-by: Naohiro Aota <naohiro.a...@hgst.com>
---
 fs/btrfs/extent-tree.c |  2 +-
 fs/btrfs/volumes.c | 24 ++--
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 843ed27..d382735 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10971,7 +10971,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info 
*fs_info)
spin_lock(_group->lock);
if (block_group->reserved ||
btrfs_block_group_used(_group->item) ||
-   block_group->ro ||
+   (block_group->ro && !block_group->removed) ||
list_is_singular(_group->list)) {
/*
 * We want to bail if we made new allocations or have
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7e6399f..1a6789d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2931,8 +2931,8 @@ out:
 static int btrfs_relocate_chunk(struct btrfs_root *root, u64 chunk_offset)
 {
struct btrfs_root *extent_root;
-   struct btrfs_trans_handle *trans;
int ret;
+   struct btrfs_block_group_cache *block_group;
 
root = root->fs_info->chunk_root;
extent_root = root->fs_info->extent_root;
@@ -2962,21 +2962,17 @@ static int btrfs_relocate_chunk(struct btrfs_root 
*root, u64 chunk_offset)
if (ret)
return ret;
 
-   trans = btrfs_start_trans_remove_block_group(root->fs_info,
-chunk_offset);
-   if (IS_ERR(trans)) {
-   ret = PTR_ERR(trans);
-   btrfs_handle_fs_error(root->fs_info, ret, NULL);
-   return ret;
-   }
-
/*
-* step two, delete the device extents and the
-* chunk tree entries
+* step two, flag the chunk as removed and let
+* btrfs_delete_unused_bgs() remove it.
 */
-   ret = btrfs_remove_chunk(trans, root, chunk_offset);
-   btrfs_end_transaction(trans, extent_root);
-   return ret;
+   block_group = btrfs_lookup_block_group(root->fs_info, chunk_offset);
+   spin_lock(_group->lock);
+   block_group->removed = 1;
+   spin_unlock(_group->lock);
+   btrfs_put_block_group(block_group);
+
+   return 0;
 }
 
 static int btrfs_relocate_sys_chunks(struct btrfs_root *root)
-- 
2.7.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: let btrfs_delete_unused_bgs() to clean relocated bgs

2016-09-04 Thread Naohiro Aota
2016-09-02 (金) の 09:35 -0400 に Josef Bacik さんは書きました:
> On 09/02/2016 03:46 AM, Naohiro Aota wrote:
> > 
> > Currently, btrfs_relocate_chunk() is removing relocated BG by
> > itself. But
> > the work can be done by btrfs_delete_unused_bgs() (and it's better
> > since it
> > trim the BG). Let's dedupe the code.
> > 
> > While btrfs_delete_unused_bgs() is already hitting the relocated
> > BG, it
> > skip the BG since the BG has "ro" flag set (to keep balancing BG
> > intact).
> > On the other hand, btrfs cannot drop "ro" flag here to prevent
> > additional
> > writes. So this patch make use of "removed" flag.
> > btrfs_delete_unused_bgs() now detect the flag to distinguish
> > whether a
> > read-only BG is relocating or not.
> > 
> 
> This seems racey to me.  We remove the last part of the block group,
> it ends up 
> on the unused_bgs_list, we process this list, see that removed isn't
> set and we 
> skip it, then later we set removed, but it's too late.  I think the
> right way is 
> to actually do a transaction, set ->removed, manually add it to the 
> unused_bgs_list if it's not already, then end the transaction.  This
> way we are 
> guaranteed to have the bg on the list when it is ready to be
> removed.  This is 
> my analysis after looking at it for 10 seconds after being awake for
> like 30 
> minutes so if I'm missing something let me know.  Thanks,

I don't think a race will happen. Since we are holding
delete_unused_bgs_mutex here, btrfs_delte_unused_bgs() checks ->removed
flag after we unlock the mutex i.e. we setup the flag properly. For a
case btrfs_delete_usused_bgs() checks the BG before we hold
delte_unused_bgs_mutex, then that BG is removed by it (if it's empty)
and btrfs_relocate_chunk() should never see it.

Regards,
Naohiro


[PATCH 1/2] btrfs-progs: build: generate all dependency files

2017-09-14 Thread Naohiro Aota
We're missing several dependency files like:

$ diff -u <(find -name '*.o'|cut -d. -f2|sort) <(find -name '*.o.d'|cut -d. 
-f2|sort)
--- /proc/self/fd/112017-09-14 18:17:44.460564620 +0900
+++ /proc/self/fd/122017-09-14 18:17:44.460564620 +0900
@@ -3,7 +3,6 @@
 /btrfs-corrupt-block
 /btrfs-debug-tree
 /btrfs-find-root
-/btrfs-list
 /btrfs-map-logical
 /btrfs-select-super
 /btrfstune
@@ -29,11 +28,6 @@
 /cmds-scrub
 /cmds-send
 /cmds-subvolume
-/convert/common
-/convert/main
-/convert/source-ext2
-/convert/source-fs
-/convert/source-reiserfs
 /ctree
 /dir-item
 /disk-io


This is due to moving things out of objects and cmds_objects variables. Such
missing dependency files cause mis-building of some source files (try touch
utils.h; make mkfs/main.o).

This patch introduce a new variable "all_objects" to keep all the objects and
use the variable to generate proper dependency file building rules.

Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com>
---
 Makefile |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index a114eca..c00dff6 100644
--- a/Makefile
+++ b/Makefile
@@ -121,6 +121,9 @@ libbtrfs_headers = send-stream.h send-utils.h send.h 
kernel-lib/rbtree.h btrfs-l
 convert_objects = convert/main.o convert/common.o convert/source-fs.o \
  convert/source-ext2.o convert/source-reiserfs.o
 mkfs_objects = mkfs/main.o mkfs/common.o
+image_objects = image/main.o
+all_objects = $(objects) $(cmds_objects) $(libbtrfs_objects) 
$(convert_objects) \
+ $(mkfs_objects) $(image_objects)
 
 TESTS = fsck-tests.sh convert-tests.sh
 
@@ -591,5 +594,5 @@ uninstall:
cd $(DESTDIR)$(bindir); $(RM) -f -- btrfsck fsck.btrfs $(progs_install)
 
 ifneq ($(MAKECMDGOALS),clean)
--include $(objects:.o=.o.d) $(cmds_objects:.o=.o.d) $(subst .btrfs,, 
$(filter-out btrfsck.o.d, $(progs:=.o.d)))
+-include $(all_objects:.o=.o.d) $(subst .btrfs,, $(filter-out btrfsck.o.d, 
$(progs:=.o.d)))
 endif

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] btrfs: build: omit unnecessary -MD flag

2017-09-14 Thread Naohiro Aota
According to gcc(1), "-MD is equivalent to -M -MF file, except that -E is not
implied." Since the rule in the Makefile is just generating dependency file
and not building object file, it is no use to have "-MD" here. Also, it's
overridden and conflicting with the following "-MM" flag. I guess we can drop
it.

Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com>
---
 Makefile |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index c00dff6..60c802a 100644
--- a/Makefile
+++ b/Makefile
@@ -264,7 +264,7 @@ else
 endif
 
 %.o.d: %.c
-   $(Q)$(CC) -MD -MM -MG -MF $@ -MT $(@:.o.d=.o) -MT $(@:.o.d=.static.o) 
-MT $@ $(CFLAGS) $<
+   $(Q)$(CC) -MM -MG -MF $@ -MT $(@:.o.d=.o) -MT $(@:.o.d=.static.o) -MT 
$@ $(CFLAGS) $<
 
 #
 # Pick from per-file variables, btrfs_*_cflags

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs-progs: build: generate all dependency files

2017-09-14 Thread Naohiro Aota
On 2017年09月14日 21:41, David Sterba wrote:
> On Thu, Sep 14, 2017 at 07:10:46PM +0900, Naohiro Aota wrote:
>> We're missing several dependency files like:
>>
>> $ diff -u <(find -name '*.o'|cut -d. -f2|sort) <(find -name '*.o.d'|cut -d. 
>> -f2|sort)
>>--- /proc/self/fd/112017-09-14 18:17:44.460564620 +0900
>>+++ /proc/self/fd/122017-09-14 18:17:44.460564620 +0900
> 
> Please note that an actual diff in the changelog is understood as start
> of the patch by git-am, indenting the --- or +++ lines makes it work
> again.

Oops, I forgot about that limitation. Thank you for the fix.

> 
>> @@ -3,7 +3,6 @@
>>  /btrfs-corrupt-block
>>  /btrfs-debug-tree
>>  /btrfs-find-root
>> -/btrfs-list
>>  /btrfs-map-logical
>>  /btrfs-select-super
>>  /btrfstune
>> @@ -29,11 +28,6 @@
>>  /cmds-scrub
>>  /cmds-send
>>  /cmds-subvolume
>> -/convert/common
>> -/convert/main
>> -/convert/source-ext2
>> -/convert/source-fs
>> -/convert/source-reiserfs
>>  /ctree
>>  /dir-item
>>  /disk-io
>> 
>>
>> This is due to moving things out of objects and cmds_objects variables. Such
>> missing dependency files cause mis-building of some source files (try touch
>> utils.h; make mkfs/main.o).
>>
>> This patch introduce a new variable "all_objects" to keep all the objects and
>> use the variable to generate proper dependency file building rules.
>>
>> Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com>
> 
> Applied, thanks.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: fix max chunk size on dup

2017-10-11 Thread Naohiro Aota
Hi,

You may notice my @wdc.com address is bounced. As my internship job at
wdc is finished, the address is already expired.
Please contact me on this @elisp.net address.

2017-10-11 2:42 GMT+09:00 Hans van Kranenburg <hans.van.kranenb...@mendix.com>:
> Sorry for the mail spam, it's an interesting code puzzle... :)
>
> On 10/10/2017 07:22 PM, Hans van Kranenburg wrote:
>> On 10/10/2017 07:07 PM, Hans van Kranenburg wrote:
>>> On 10/10/2017 01:31 PM, David Sterba wrote:
>>>> Hi,
>>>>
>>>> On Fri, Sep 29, 2017 at 04:20:51PM +0900, Naohiro Aota wrote:
>>>>> Balancing a fresh METADATA=dup btrfs file system (with size < 50G)
>>>>> generates a 128MB sized block group. While we set max_stripe_size =
>>>>> max_chunk_size = 256MB, we get this half sized block group:
>>>>>
>>>>> $ btrfs ins dump-t -t CHUNK_TREE btrfs.img|grep length
>>>>> length 8388608 owner 2 stripe_len 65536 type DATA
>>>>> length 33554432 owner 2 stripe_len 65536 type SYSTEM|DUP
>>>>> length 134217728 owner 2 stripe_len 65536 type 
>>>>> METADATA|DUP
>>>>>
>>>>> Before commit 86db25785a6e ("Btrfs: fix max chunk size on raid5/6"), we
>>>>> used "stripe_size * ndevs > max_chunk_size * ncopies" to check the max
>>>>> chunk size. Since stripe_size = 256MB * dev_stripes (= 2) = 512MB, ndevs
>>>>> = 1, max_chunk_size = 256MB, and ncopies = 2, we allowed 256MB
>>>>> METADATA|DUP block group.
>>>>>
>>>>> But now, we use "stripe_size * data_stripes > max_chunk_size". Since
>>>>> data_stripes = 1 on DUP, it disallows the block group to have > 128MB.
>>>>> What missing here is "dev_stripes". Proper logical space used by the block
>>>>> group is "stripe_size * data_stripes / dev_stripes". Tweak the equations 
>>>>> to
>>>>> use the right value.
>>>>
>>>> I started looking into it and still don't fully understand it. Change
>>>> deep in the allocator can easily break some blockgroup combinations, so
>>>> I'm rather conservative here.
>>>
>>> I think that the added usage of data_stripes in 86db25785a6e is the
>>> problematic change. data_stripes is something that was introduced as
>>> part of RAID56 in 53b381b3a and clearly only has a meaning that's
>>> properly thought of for RAID56. The RAID56 commit already adds "this
>>> will have to be fixed for RAID1 and RAID10 over more drives", only the
>>> author doesn't catch the DUP case, which already breaks at that point.
>>>

Thank you for explaining in detail :) Yes, the allocator was already
broken by using the data_stripes.
(and it will also "break" existing file systems, in terms of making a
DUP|META block group smaller)
I believe this patch fix the things.

>>> At the beginning it says:
>>>
>>> int data_stripes; /* number of stripes that count for block group size */
>>>
>>> For the example:
>>>
>>> This is DUP:
>>>
>>> .sub_stripes= 1,
>>> .dev_stripes= 2,
>>> .devs_max   = 1,
>>> .devs_min   = 1,
>>> .tolerated_failures = 0,
>>> .devs_increment = 1,
>>> .ncopies= 2,
>>>
>>> In the code:
>>>
>>> max_stripe_size = SZ_256M
>>> max_chunk_size = max_stripe_size-> SZ_256M
>>>
>>> Then we have find_free_dev_extent:
>>> max_stripe_size * dev_stripes   -> SZ_256M * 2 -> 512M
>>>
>>> So we like to find 512M on a disk, to stuff 2 stripes of 256M inside for
>>> the DUP. (remember: The two parts of DUP *never* end up on a different
>>> disk, even if you have multiple)
>>>
>
> Another better fix would be to make a change here...
>
>>> If we find one:
>>> stripe_size = devices_info[ndevs-1].max_avail   -> 512M, yay
>
> ...because this is not yay. The 512M is max_avail, which needs to holds
> *2* stripes, not 1. So stripe_size is suddenly changed to twice the
> stripe size for DUP.
>
> So, an additional division again by dev_stripes would translate the
> max_avail on device ndevs-1 to the stripe size to use.

This catch the point. Notice stripe_size is divided by dev_stripes
after the lines in the patch.
It means that at this region (from "stripe_size =
devices_info[ndevs-1].max_avail;" to "stripe_size =
div_u64(stripe_size, dev_stri

Re: [PATCH] btrfs: set include path relatively

2017-10-13 Thread Naohiro Aota
2017-10-12 21:38 GMT+09:00 David Sterba <dste...@suse.cz>:
> On Thu, Oct 12, 2017 at 11:22:24AM +0900, Naohiro Aota wrote:
>> Currently, gcc is passed the include directory with full path. As a result,
>> dependency files (*.o.d) also record the full path at the build time. Such
>> full path dependency is annoying for sharing the source between multiple
>> machines, containers, or anything the path differ.
>>
>> And this is the same way what other program using autotools e.g. e2fsprogs
>> is doing:
>>
>> $ grep top_builddir Makefile
>> top_builddir = .
>> CPPFLAGS = -I. -I$(top_builddir)/lib -I$(top_srcdir)/lib
>> BUILD_CFLAGS = -g -O2  -I. -I$(top_builddir)/lib -I$(top_srcdir)/lib 
>> -DHAVE_CONFIG_H
>> 
>
> Makes sense for the header files. TOPDIR is also used for linking, can
> it cause similar problems, where the library search path is set as 
> -L$(TOPDIR) ?

Since both "TOPDIR := $(shell pwd)" and -L$(TOPDIR) is evaluated at
the build time,
it never cause problem on linking itself. What the problem with
dependency files is persisting
the full path into the dep files. We generate the dep files only for
objects (%.o.d rule)

Well, it's ok to use "-L." here, and it's the same way with autotools does.

>
> I wonder if we should do the same as in the example above and set
> default value of TOPDIR to '.', instead of `pwd`.

TOPDIR is also used by library-test{,.static}. They run gcc in a
temporary directory.
Thus, TOPDIR here should be absolute.

>
>
>> Signed-off-by: Naohiro Aota <na...@elisp.net>
>> ---
>>  Makefile | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/Makefile b/Makefile
>> index b1f3388..69b94fb 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -70,8 +70,8 @@ CFLAGS = $(SUBST_CFLAGS) \
>>-D_XOPEN_SOURCE=700  \
>>-fno-strict-aliasing \
>>-fPIC \
>> -  -I$(TOPDIR) \
>> -  -I$(TOPDIR)/kernel-lib \
>> +  -I. \
>> +  -I./kernel-lib \
>>$(EXTRAWARN_CFLAGS) \
>>$(DEBUG_CFLAGS_INTERNAL) \
>>$(EXTRA_CFLAGS)
>> --
>> 2.14.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: propagate error to btrfs_cmp_data_prepare caller

2017-09-08 Thread Naohiro Aota
btrfs_cmp_data_prepare() (almost) always returns 0 i.e. ignoring errors
from gather_extent_pages(). While the pages are freed by
btrfs_cmp_data_free(), cmp->num_pages still has > 0. Then,
btrfs_extent_same() try to access the already freed pages causing faults
(or violates PageLocked assertion).

This patch just return the error as is so that the caller stop the process.

Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com>
Fixes: f441460202cb ("btrfs: fix deadlock with extent-same and readpage")
Cc: <sta...@vger.kernel.org> # 4.2
---
 fs/btrfs/ioctl.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ae8fbf9d3de2..b5971923f15f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3028,7 +3028,7 @@ static int btrfs_cmp_data_prepare(struct inode *src, u64 
loff,
 out:
if (ret)
btrfs_cmp_data_free(cmp);
-   return 0;
+   return ret;
 }
 
 static int btrfs_cmp_data(u64 len, struct cmp_pages *cmp)

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs: clear ordered flag on cleaning up ordered extents

2017-09-08 Thread Naohiro Aota
On 2017年09月08日 03:25, David Sterba wrote:
> On Fri, Sep 01, 2017 at 06:59:49PM +0800, Qu Wenruo wrote:
>> On 2017年09月01日 16:58, Naohiro Aota wrote:
>>> commit 524272607e88 ("btrfs: Handle delalloc error correctly to avoid
>>> ordered extent hang") introduced btrfs_cleanup_ordered_extents() to cleanup
>>> submitted ordered extents. However, it does not clear the ordered bit
>>> (Private2) of coresponding pages. Thus, the following BUG occurs from
>>> free_pages_check_bad() (on btrfs/125 with nospace_cache).
>>>
>>> BUG: Bad page state in process btrfs  pfn:3fa787
>>> page:df2acfe9e1c0 count:0 mapcount:0 mapping:  (null) index:0xd
>>> flags: 0x80002008(uptodate|private_2)
>>> raw: 80002008  000d 
>>> raw: df2acf5c1b20 b443802238b0  
>>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>>> bad because of flags: 0x2000(private_2)
>>>
>>> This patch clear the flag as same as other places calling
>>> btrfs_dec_test_ordered_pending() for every page in the specified range.
>>>
>>> Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com>
>>> Fixes: 524272607e88 ("btrfs: Handle delalloc error correctly to avoid 
>>> ordered extent hang")
>>> Cc: <sta...@vger.kernel.org> # 4.12
>>> ---
>>>   fs/btrfs/inode.c |   12 
>>>   1 file changed, 12 insertions(+)
>>>
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index 24bcd5cd9cf2..ae4c0a1bef38 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -135,6 +135,18 @@ static inline void 
>>> btrfs_cleanup_ordered_extents(struct inode *inode,
>>>  const u64 offset,
>>>  const u64 bytes)
>>>   {
>>> +   unsigned long index = offset >> PAGE_SHIFT;
>>> +   unsigned long end_index = (offset + bytes - 1) >> PAGE_SHIFT;
>>> +   struct page *page;
>>> +
>>> +   while (index <= end_index) {
>>> +   page = find_get_page(inode->i_mapping, index);
>>> +   index++;
>>> +   if (!page)
>>> +   continue;
>>> +   ClearPagePrivate2(page);
>>> +   put_page(page);
>>> +   }
>>
>> At first glance, explicitly clearing Private2 flag here seems a little 
>> strange to me.
>> However btrfs_invalidatepage() is also doing the same thing, I think 
>> it's fine.
>>
>> Reviewed-by: Qu Wenruo <quwenruo.bt...@gmx.com>
>>
>> BTW, Private2 flag is set by extent_clear_unlock_delalloc() with 
>> page_ops |= PAGE_SET_PRIVATE2, but we're clearing the page flag without 
>> any encapsulation, it may be better to use similar function to clear 
>> Private2 flag.
> 
> I agree, the Private2 flag is given another meaning in btrfs, ie. the
> writeback status, so this would be better wrapped in helpers that
> reflect what is the private2 flag used for. The helpers might be
> trivial, but their name will be a better documentation than the random
> comments that we can be found next to its use.
> 

Thanks for the reviews. I'll send a new patch to add the wrapping
function. (or should I squash the patch with this change?)

Regards,
Naohiro
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: fix max chunk size on dup

2017-09-29 Thread Naohiro Aota
Balancing a fresh METADATA=dup btrfs file system (with size < 50G)
generates a 128MB sized block group. While we set max_stripe_size =
max_chunk_size = 256MB, we get this half sized block group:

$ btrfs ins dump-t -t CHUNK_TREE btrfs.img|grep length
length 8388608 owner 2 stripe_len 65536 type DATA
length 33554432 owner 2 stripe_len 65536 type SYSTEM|DUP
length 134217728 owner 2 stripe_len 65536 type METADATA|DUP

Before commit 86db25785a6e ("Btrfs: fix max chunk size on raid5/6"), we
used "stripe_size * ndevs > max_chunk_size * ncopies" to check the max
chunk size. Since stripe_size = 256MB * dev_stripes (= 2) = 512MB, ndevs
= 1, max_chunk_size = 256MB, and ncopies = 2, we allowed 256MB
METADATA|DUP block group.

But now, we use "stripe_size * data_stripes > max_chunk_size". Since
data_stripes = 1 on DUP, it disallows the block group to have > 128MB.
What missing here is "dev_stripes". Proper logical space used by the block
group is "stripe_size * data_stripes / dev_stripes". Tweak the equations to
use the right value.

Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com>
---
 fs/btrfs/volumes.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0e8f16c305df..d1ac3e1fb753 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4751,10 +4751,11 @@ static int __btrfs_alloc_chunk(struct 
btrfs_trans_handle *trans,
 * is really going to be in terms of logical address space,
 * and compare that answer with the max chunk size
 */
-   if (stripe_size * data_stripes > max_chunk_size) {
+   if (stripe_size * data_stripes > max_chunk_size * dev_stripes) {
u64 mask = (1ULL << 24) - 1;
 
-   stripe_size = div_u64(max_chunk_size, data_stripes);
+   stripe_size = div_u64(max_chunk_size * dev_stripes,
+ data_stripes);
 
/* bump the answer up to a 16MB boundary */
stripe_size = (stripe_size + mask) & ~mask;

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: fix NULL pointer dereference from free_reloc_roots()

2017-08-24 Thread Naohiro Aota
__del_reloc_root should be called before freeing up reloc_root->node.
If not, calling __del_reloc_root() dereference reloc_root->node, causing
the system BUG.

Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com>
---
 fs/btrfs/relocation.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 65661d1aae4e..6445de8e9ece 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2393,11 +2393,11 @@ void free_reloc_roots(struct list_head *list)
while (!list_empty(list)) {
reloc_root = list_entry(list->next, struct btrfs_root,
root_list);
+   __del_reloc_root(reloc_root);
free_extent_buffer(reloc_root->node);
free_extent_buffer(reloc_root->commit_root);
reloc_root->node = NULL;
reloc_root->commit_root = NULL;
-   __del_reloc_root(reloc_root);
}
 }
 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] btrfs: finish ordered extent cleaning if no progress is found

2017-09-01 Thread Naohiro Aota
__endio_write_update_ordered() repeats the search until it reaches the end
of the specified range. This works well with direct IO path, because before
the function is called, it's ensured that there are ordered extents filling
whole the range. It's not the case, however, when it's called from
run_delalloc_range(): it is possible to have error in the midle of the loop
in e.g. run_delalloc_nocow(), so that there exisits the range not covered
by any ordered extents. By cleaning such "uncomplete" range,
__endio_write_update_ordered() stucks at offset where there're no ordered
extents.

Since the ordered extents are created from head to tail, we can stop the
search if there are no offset progress.

Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com>
Fixes: 524272607e88 ("btrfs: Handle delalloc error correctly to avoid ordered 
extent hang")
Cc: <sta...@vger.kernel.org> # 4.12
---
 fs/btrfs/inode.c |4 
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ae4c0a1bef38..fd5934121b4b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8309,6 +8309,7 @@ static void __endio_write_update_ordered(struct inode 
*inode,
btrfs_work_func_t func;
u64 ordered_offset = offset;
u64 ordered_bytes = bytes;
+   u64 last_offset;
int ret;
 
if (btrfs_is_free_space_inode(BTRFS_I(inode))) {
@@ -8320,6 +8321,7 @@ static void __endio_write_update_ordered(struct inode 
*inode,
}
 
 again:
+   last_offset = ordered_offset;
ret = btrfs_dec_test_first_ordered_pending(inode, ,
   _offset,
   ordered_bytes,
@@ -8330,6 +8332,8 @@ static void __endio_write_update_ordered(struct inode 
*inode,
btrfs_init_work(>work, func, finish_ordered_fn, NULL, NULL);
btrfs_queue_work(wq, >work);
 out_test:
+   if (ordered_offset == last_offset)
+   return;
/*
 * our bio might span multiple ordered extents.  If we haven't
 * completed the accounting for the whole dio, go back and try again

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] btrfs: clear ordered flag on cleaning up ordered extents

2017-09-01 Thread Naohiro Aota
commit 524272607e88 ("btrfs: Handle delalloc error correctly to avoid
ordered extent hang") introduced btrfs_cleanup_ordered_extents() to cleanup
submitted ordered extents. However, it does not clear the ordered bit
(Private2) of coresponding pages. Thus, the following BUG occurs from
free_pages_check_bad() (on btrfs/125 with nospace_cache).

BUG: Bad page state in process btrfs  pfn:3fa787
page:df2acfe9e1c0 count:0 mapcount:0 mapping:  (null) index:0xd
flags: 0x80002008(uptodate|private_2)
raw: 80002008  000d 
raw: df2acf5c1b20 b443802238b0  
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags: 0x2000(private_2)

This patch clear the flag as same as other places calling
btrfs_dec_test_ordered_pending() for every page in the specified range.

Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com>
Fixes: 524272607e88 ("btrfs: Handle delalloc error correctly to avoid ordered 
extent hang")
Cc: <sta...@vger.kernel.org> # 4.12
---
 fs/btrfs/inode.c |   12 
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 24bcd5cd9cf2..ae4c0a1bef38 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -135,6 +135,18 @@ static inline void btrfs_cleanup_ordered_extents(struct 
inode *inode,
 const u64 offset,
 const u64 bytes)
 {
+   unsigned long index = offset >> PAGE_SHIFT;
+   unsigned long end_index = (offset + bytes - 1) >> PAGE_SHIFT;
+   struct page *page;
+
+   while (index <= end_index) {
+   page = find_get_page(inode->i_mapping, index);
+   index++;
+   if (!page)
+   continue;
+   ClearPagePrivate2(page);
+   put_page(page);
+   }
return __endio_write_update_ordered(inode, offset + PAGE_SIZE,
bytes - PAGE_SIZE, false);
 }

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: set include path relatively

2017-10-11 Thread Naohiro Aota
Currently, gcc is passed the include directory with full path. As a result,
dependency files (*.o.d) also record the full path at the build time. Such
full path dependency is annoying for sharing the source between multiple
machines, containers, or anything the path differ.

And this is the same way what other program using autotools e.g. e2fsprogs
is doing:

$ grep top_builddir Makefile
top_builddir = .
CPPFLAGS = -I. -I$(top_builddir)/lib -I$(top_srcdir)/lib
BUILD_CFLAGS = -g -O2  -I. -I$(top_builddir)/lib -I$(top_srcdir)/lib 
-DHAVE_CONFIG_H


Signed-off-by: Naohiro Aota <na...@elisp.net>
---
 Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index b1f3388..69b94fb 100644
--- a/Makefile
+++ b/Makefile
@@ -70,8 +70,8 @@ CFLAGS = $(SUBST_CFLAGS) \
 -D_XOPEN_SOURCE=700  \
 -fno-strict-aliasing \
 -fPIC \
--I$(TOPDIR) \
--I$(TOPDIR)/kernel-lib \
+-I. \
+-I./kernel-lib \
 $(EXTRAWARN_CFLAGS) \
 $(DEBUG_CFLAGS_INTERNAL) \
 $(EXTRA_CFLAGS)
-- 
2.14.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: revert fs_devices state on error of btrfs_init_new_device()

2018-07-26 Thread Naohiro Aota
When btrfs hits error after modifying fs_devices in
btrfs_init_new_device() (such as btrfs_add_dev_item() returns error), it
leaves everything as is, but frees allocated btrfs_device. As a result,
fs_devices->devices and fs_devices->alloc_list contain already freed
btrfs_device, leading to later use-after-free bug.

Error path also messes the things like ->num_devices. While they go backs
to the original value by unscanning btrfs devices, it is safe to revert
them here.

Fixes: 79787eaab461 ("btrfs: replace many BUG_ONs with proper error handling")
Signed-off-by: Naohiro Aota 
---
 fs/btrfs/volumes.c | 28 +++-
 1 file changed, 23 insertions(+), 5 deletions(-)

 This patch applies on master, but not on kdave/for-next because of
 74b9f4e186eb ("btrfs: declare fs_devices in btrfs_init_new_device()")

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1da162928d1a..5f0512fffa52 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2410,7 +2410,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
struct list_head *devices;
struct super_block *sb = fs_info->sb;
struct rcu_string *name;
-   u64 tmp;
+   u64 orig_super_total_bytes, orig_super_num_devices;
int seeding_dev = 0;
int ret = 0;
bool unlocked = false;
@@ -2509,12 +2509,14 @@ int btrfs_init_new_device(struct btrfs_fs_info 
*fs_info, const char *device_path
if (!blk_queue_nonrot(q))
fs_info->fs_devices->rotating = 1;
 
-   tmp = btrfs_super_total_bytes(fs_info->super_copy);
+   orig_super_total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
btrfs_set_super_total_bytes(fs_info->super_copy,
-   round_down(tmp + device->total_bytes, fs_info->sectorsize));
+   round_down(orig_super_total_bytes + device->total_bytes,
+  fs_info->sectorsize));
 
-   tmp = btrfs_super_num_devices(fs_info->super_copy);
-   btrfs_set_super_num_devices(fs_info->super_copy, tmp + 1);
+   orig_super_num_devices = btrfs_super_num_devices(fs_info->super_copy);
+   btrfs_set_super_num_devices(fs_info->super_copy,
+   orig_super_num_devices + 1);
 
/* add sysfs device entry */
btrfs_sysfs_add_device_link(fs_info->fs_devices, device);
@@ -2594,6 +2596,22 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
 
 error_sysfs:
btrfs_sysfs_rm_device_link(fs_info->fs_devices, device);
+   mutex_lock(_info->fs_devices->device_list_mutex);
+   mutex_lock(_info->chunk_mutex);
+   list_del_rcu(>dev_list);
+   list_del(>dev_alloc_list);
+   fs_info->fs_devices->num_devices--;
+   fs_info->fs_devices->open_devices--;
+   fs_info->fs_devices->rw_devices--;
+   fs_info->fs_devices->total_devices--;
+   fs_info->fs_devices->total_rw_bytes -= device->total_bytes;
+   atomic64_sub(device->total_bytes, _info->free_chunk_space);
+   btrfs_set_super_total_bytes(fs_info->super_copy,
+   orig_super_total_bytes);
+   btrfs_set_super_num_devices(fs_info->super_copy,
+   orig_super_num_devices);
+   mutex_unlock(_info->chunk_mutex);
+   mutex_unlock(_info->fs_devices->device_list_mutex);
 error_trans:
if (seeding_dev)
sb->s_flags |= SB_RDONLY;
-- 
2.18.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: fix use-after-free of cmp workspace pages

2018-07-13 Thread Naohiro Aota
btrfs_cmp_data_free() puts cmp's src_pages and dst_pages, but leaves
their page address intact. Now, if you hit "goto again" in
btrfs_extent_same_range() and hit some error in
btrfs_cmp_data_prepare(), you'll try to unlock/put already put pages.

This is simple fix to reset the address to avoid use-after-free.

Fixes: 67b07bd4bec5 ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl")
Signed-off-by: Naohiro Aota 
---
 fs/btrfs/ioctl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 43ecbe620dea..b077544b5232 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3327,11 +3327,13 @@ static void btrfs_cmp_data_free(struct cmp_pages *cmp)
if (pg) {
unlock_page(pg);
put_page(pg);
+   cmp->src_pages[i] = NULL;
}
pg = cmp->dst_pages[i];
if (pg) {
unlock_page(pg);
put_page(pg);
+   cmp->dst_pages[i] = NULL;
}
}
 }
-- 
2.18.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html