[PATCH v2 4/6] btrfs-progs: mkfs: Error out gracefully for --rootdir

2017-10-18 Thread Qu Wenruo
The --rootdir option starts a transaction to fill the fs. However, if
something goes wrong, from ENOSPC to lack of permission, we won't commit
the transaction, and the uncommitted transaction triggers a BUG_ON:

--
extent buffer leak: start 29392896 len 16384
extent_io.c:579: free_extent_buffer: BUG_ON `eb->flags & EXTENT_DIRTY` 
triggered, value 1
--

The root fix is to introduce btrfs_abort_transaction() in btrfs-progs,
however in this particular case we can work around it by force
committing the transaction.

Since the btrfs magic is set to an invalid value during mkfs, the fs can
never be mounted unless fs_info->finalize_on_close is set.
So even if we force a commit of a bad transaction, we won't make things
worse.

Signed-off-by: Qu Wenruo 
Reviewed-by: Nikolay Borisov 
---
 mkfs/main.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/mkfs/main.c b/mkfs/main.c
index 60250c011ac3..358a046f1cf2 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1073,6 +1073,19 @@ static int make_image(const char *source_dir, struct btrfs_root *root)
printf("Making image is completed.\n");
return 0;
 fail:
+   /*
+* Since we don't have btrfs_abort_transaction() yet, uncommitted trans
+* will trigger a BUG_ON().
+*
+* However before mkfs is fully finished, the magic number is invalid,
+* so even we commit transaction here, the fs still can't be mounted.
+*
+* To do a graceful error out, here we commit transaction as a
+* workaround.
+* Since we have already hit some problem, the return value doesn't
+* matter now.
+*/
+   btrfs_commit_transaction(trans, root);
while (!list_empty(&dir_head.list)) {
dir_entry = list_entry(dir_head.list.next,
   struct directory_name_entry, list);
-- 
2.14.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
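
Note on 4/6: the workaround can be pictured with a toy model. This is
only a sketch under stated assumptions, not btrfs-progs code: toy_buf,
toy_commit(), toy_release() and toy_fill() are hypothetical names, and
toy_release() returns -1 where the real free_extent_buffer() would hit
the BUG_ON on a dirty buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an extent buffer touched by a transaction. */
struct toy_buf {
	int dirty;
};

/* Stand-in for a commit: writes the buffer out and clears the dirty flag. */
static int toy_commit(struct toy_buf *b)
{
	assert(b != NULL);
	b->dirty = 0;
	return 0;
}

/* Stand-in for releasing the buffer: -1 marks the "would BUG_ON" case. */
static int toy_release(const struct toy_buf *b)
{
	return b->dirty ? -1 : 0;
}

/*
 * Fill step that may fail. With commit_on_fail set, the error path
 * commits anyway so that no dirty buffer is left behind; the commit
 * result is ignored because an error is already being returned.
 */
static int toy_fill(struct toy_buf *b, int fail, int commit_on_fail)
{
	int ret;

	b->dirty = 1;			/* transaction modified the buffer */
	ret = fail ? -ENOSPC : 0;
	if (ret < 0) {
		if (commit_on_fail)
			(void)toy_commit(b);
		return ret;
	}
	return toy_commit(b);
}
```

With commit_on_fail == 0 a failed toy_fill() leaves the buffer dirty and
toy_release() reports the problem; with commit_on_fail == 1 the original
error is still returned but the release is clean.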


[PATCH v2 3/6] btrfs-progs: mkfs: Fix overwritten return value for mkfs

2017-10-18 Thread Qu Wenruo
On mkfs failure, especially --rootdir errors like EPERM/ENOSPC, the out
branch overwrites the return value, causing a wrong exit status.

Signed-off-by: Qu Wenruo 
Reviewed-by: Nikolay Borisov 
---
 mkfs/main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mkfs/main.c b/mkfs/main.c
index 9d53c6632b45..60250c011ac3 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1426,6 +1426,7 @@ int main(int argc, char **argv)
int zero_end = 1;
int fd = -1;
int ret;
+   int close_ret;
int i;
int mixed = 0;
int nodesize_forced = 0;
@@ -1944,9 +1945,9 @@ raid_groups:
 */
fs_info->finalize_on_close = 1;
 out:
-   ret = close_ctree(root);
+   close_ret = close_ctree(root);
 
-   if (!ret) {
+   if (!close_ret) {
optind = saved_optind;
dev_cnt = argc - optind;
while (dev_cnt-- > 0) {
-- 
2.14.2
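
Note on 3/6: a minimal sketch of the pattern, assuming nothing beyond
what the diff shows. fake_close() and run_mkfs() are hypothetical
stand-ins for close_ctree() and main(). Returning the close status when
no earlier error occurred is a choice made for this sketch, not
something the patch states.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for close_ctree(); returns 0 on success. */
static int fake_close(int close_fails)
{
	return close_fails ? -EIO : 0;
}

/*
 * work_ret is the status of the earlier work, for example -ENOSPC
 * from filling the fs. Statuses are 0 or a negative errno.
 */
static int run_mkfs(int work_ret, int close_fails)
{
	int ret = work_ret;
	int close_ret;

	assert(work_ret <= 0);

	/* Keep the cleanup status in its own variable so ret survives. */
	close_ret = fake_close(close_fails);
	if (!close_ret) {
		/* post-close steps would go here */
	}

	/* Report the first error seen; otherwise the close status. */
	return ret ? ret : close_ret;
}
```

Before the fix, the equivalent of "ret = fake_close(...)" would turn
run_mkfs(-ENOSPC, 0) into a success.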



[PATCH v2 2/6] btrfs-progs: mkfs: Avoid positive return value from cleanup_temp_chunks

2017-10-18 Thread Qu Wenruo
Since we're calling btrfs_search_slot(), the return value can be
positive.
However, we just pass that value out, so cleanup_temp_chunks() returns
a value with no defined meaning to its caller.

This can make mkfs return 1, which indicates that something went wrong.

Fix it.

Signed-off-by: Qu Wenruo 
---
 mkfs/main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mkfs/main.c b/mkfs/main.c
index 80e6089c37a1..9d53c6632b45 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1350,6 +1350,9 @@ static int cleanup_temp_chunks(struct btrfs_fs_info *fs_info,
ret = btrfs_search_slot(trans, root, , , 0, 0);
if (ret < 0)
goto out;
+   /* Don't pollute ret for >0 case */
+   if (ret > 0)
+   ret = 0;
 
btrfs_item_key_to_cpu(path.nodes[0], _key,
  path.slots[0]);
-- 
2.14.2
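
Note on 2/6: btrfs_search_slot() returns <0 on error, 0 on an exact
match and >0 when the key is not present. A small sketch of why the
positive case must be normalized; toy_search() and toy_cleanup() are
hypothetical and only mimic that return convention over a sorted array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical lookup: <0 on error, 0 on exact match, 1 when absent. */
static int toy_search(const int *sorted, size_t n, int key)
{
	if (!sorted)
		return -EINVAL;
	for (size_t i = 0; i < n; i++) {
		if (sorted[i] == key)
			return 0;
		if (sorted[i] > key)
			break;
	}
	return 1;
}

/* Caller whose contract is "0 or a negative errno". */
static int toy_cleanup(const int *sorted, size_t n, int key)
{
	int ret = toy_search(sorted, n, key);

	if (ret < 0)
		return ret;
	/* "Not found" is not a failure here, so don't leak the 1. */
	if (ret > 0)
		ret = 0;

	assert(ret <= 0);
	return ret;
}
```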



[PATCH v2 6/6] btrfs-progs: mkfs: Move source dir size calculation to its own files

2017-10-18 Thread Qu Wenruo
Also rename the function from size_sourcedir() to btrfs_mkfs_size_dir().

Signed-off-by: Qu Wenruo 
Reviewed-by: Nikolay Borisov 
---
 mkfs/main.c| 66 ++
 mkfs/rootdir.c | 63 +++
 mkfs/rootdir.h |  2 ++
 3 files changed, 67 insertions(+), 64 deletions(-)

diff --git a/mkfs/main.c b/mkfs/main.c
index 7861e3075d6b..423b35579722 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -31,7 +31,6 @@
 #include 
 #include 
 #include 
-#include 
 #include "ctree.h"
 #include "disk-io.h"
 #include "volumes.h"
@@ -448,67 +447,6 @@ static int create_chunks(struct btrfs_trans_handle *trans,
return ret;
 }
 
-/*
- * This ignores symlinks with unreadable targets and subdirs that can't
- * be read.  It's a best-effort to give a rough estimate of the size of
- * a subdir.  It doesn't guarantee that prepopulating btrfs from this
- * tree won't still run out of space.
- */
-static u64 global_total_size;
-static u64 fs_block_size;
-static int ftw_add_entry_size(const char *fpath, const struct stat *st,
- int type)
-{
-   if (type == FTW_F || type == FTW_D)
-   global_total_size += round_up(st->st_size, fs_block_size);
-
-   return 0;
-}
-
-static u64 size_sourcedir(const char *dir_name, u64 sectorsize,
- u64 *num_of_meta_chunks_ret, u64 *size_of_data_ret)
-{
-   u64 dir_size = 0;
-   u64 total_size = 0;
-   int ret;
-   u64 default_chunk_size = SZ_8M;
-   u64 allocated_meta_size = SZ_8M;
-   u64 allocated_total_size = 20 * SZ_1M;  /* 20MB */
-   u64 num_of_meta_chunks = 0;
-   u64 num_of_data_chunks = 0;
-   u64 num_of_allocated_meta_chunks =
-   allocated_meta_size / default_chunk_size;
-
-   global_total_size = 0;
-   fs_block_size = sectorsize;
-   ret = ftw(dir_name, ftw_add_entry_size, 10);
-   dir_size = global_total_size;
-   if (ret < 0) {
-   error("ftw subdir walk of %s failed: %s", dir_name,
-   strerror(errno));
-   exit(1);
-   }
-
-   num_of_data_chunks = (dir_size + default_chunk_size - 1) /
-   default_chunk_size;
-
-   num_of_meta_chunks = (dir_size / 2) / default_chunk_size;
-   if (((dir_size / 2) % default_chunk_size) != 0)
-   num_of_meta_chunks++;
-   if (num_of_meta_chunks <= num_of_allocated_meta_chunks)
-   num_of_meta_chunks = 0;
-   else
-   num_of_meta_chunks -= num_of_allocated_meta_chunks;
-
-   total_size = allocated_total_size +
-(num_of_data_chunks * default_chunk_size) +
-(num_of_meta_chunks * default_chunk_size);
-
-   *num_of_meta_chunks_ret = num_of_meta_chunks;
-   *size_of_data_ret = num_of_data_chunks * default_chunk_size;
-   return total_size;
-}
-
 static int zero_output_file(int out_fd, u64 size)
 {
int loop_num;
@@ -1085,8 +1023,8 @@ int main(int argc, char **argv)
goto error;
}
 
-   source_dir_size = size_sourcedir(source_dir, sectorsize,
-_of_meta_chunks, 
_of_data);
+   source_dir_size = btrfs_mkfs_size_dir(source_dir, sectorsize,
+   _of_meta_chunks, _of_data);
if(block_count < source_dir_size)
block_count = source_dir_size;
ret = zero_output_file(fd, block_count);
diff --git a/mkfs/rootdir.c b/mkfs/rootdir.c
index 2cc8a3ac06d8..83a3191d2bd7 100644
--- a/mkfs/rootdir.c
+++ b/mkfs/rootdir.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "internal.h"
 #include "disk-io.h"
@@ -33,6 +34,15 @@
 #include "mkfs/rootdir.h"
 #include "send-utils.h"
 
+/*
+ * This ignores symlinks with unreadable targets and subdirs that can't
+ * be read.  It's a best-effort to give a rough estimate of the size of
+ * a subdir.  It doesn't guarantee that prepopulating btrfs from this
+ * tree won't still run out of space.
+ */
+static u64 global_total_size;
+static u64 fs_block_size;
+
 static u64 index_cnt = 2;
 
 static int add_directory_items(struct btrfs_trans_handle *trans,
@@ -670,3 +680,56 @@ fail:
 out:
return ret;
 }
+
+static int ftw_add_entry_size(const char *fpath, const struct stat *st,
+ int type)
+{
+   if (type == FTW_F || type == FTW_D)
+   global_total_size += round_up(st->st_size, fs_block_size);
+
+   return 0;
+}
+
+u64 btrfs_mkfs_size_dir(const char *dir_name, u64 sectorsize,
+   u64 *num_of_meta_chunks_ret, u64 *size_of_data_ret)
+{
+   u64 dir_size = 0;
+   u64 total_size = 0;
+   int ret;
+   u64 default_chunk_size = SZ_8M;
+   u64 allocated_meta_size = SZ_8M;
+   u64 
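
Note on 6/6: the size estimate in the removed size_sourcedir() can be
followed with a standalone sketch. It leaves out the ftw() walk and
takes the already summed, sector-rounded directory size as input.
estimate_image_size() is a hypothetical name; the constants mirror the
ones visible in the diff above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1M	(1024ULL * 1024ULL)
#define SZ_8M	(8ULL * SZ_1M)

/* dir_size: total that the directory walk would have produced. */
static uint64_t estimate_image_size(uint64_t dir_size,
				    uint64_t *num_of_meta_chunks_ret,
				    uint64_t *size_of_data_ret)
{
	const uint64_t chunk = SZ_8M;
	const uint64_t allocated_total = 20 * SZ_1M;
	const uint64_t allocated_meta_chunks = SZ_8M / chunk;	/* 1 */
	uint64_t data_chunks;
	uint64_t meta_chunks;

	assert(num_of_meta_chunks_ret != NULL && size_of_data_ret != NULL);

	/* Data: round dir_size up to whole chunks. */
	data_chunks = (dir_size + chunk - 1) / chunk;

	/* Metadata: half of dir_size, rounded up to whole chunks ... */
	meta_chunks = (dir_size / 2) / chunk;
	if ((dir_size / 2) % chunk != 0)
		meta_chunks++;
	/* ... minus what is already allocated. */
	if (meta_chunks <= allocated_meta_chunks)
		meta_chunks = 0;
	else
		meta_chunks -= allocated_meta_chunks;

	*num_of_meta_chunks_ret = meta_chunks;
	*size_of_data_ret = data_chunks * chunk;
	return allocated_total + (data_chunks + meta_chunks) * chunk;
}
```

For a 100 MiB tree this gives 13 data chunks (104 MiB), 7 - 1 = 6 extra
metadata chunks (48 MiB) and a total of 20 + 104 + 48 = 172 MiB.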

[PATCH v2 5/6] btrfs-progs: mkfs: Move image creation of rootdir to its own files

2017-10-18 Thread Qu Wenruo
In fact, the --rootdir option is becoming more and more independent from
the normal mkfs code.

So move the image creation function, make_image(), and its related code to
mkfs/rootdir.[ch], and rename the function to btrfs_mkfs_fill_dir().

Signed-off-by: Qu Wenruo 
Reviewed-by: Nikolay Borisov 
---
 Makefile   |   4 +-
 mkfs/main.c| 652 +--
 mkfs/rootdir.c | 672 +
 mkfs/rootdir.h |  30 +++
 4 files changed, 706 insertions(+), 652 deletions(-)
 create mode 100644 mkfs/rootdir.c
 create mode 100644 mkfs/rootdir.h

diff --git a/Makefile b/Makefile
index d0657aaea0f5..12747547766f 100644
--- a/Makefile
+++ b/Makefile
@@ -113,7 +113,7 @@ cmds_objects = cmds-subvolume.o cmds-filesystem.o 
cmds-device.o cmds-scrub.o \
   cmds-restore.o cmds-rescue.o chunk-recover.o super-recover.o \
   cmds-property.o cmds-fi-usage.o cmds-inspect-dump-tree.o \
   cmds-inspect-dump-super.o cmds-inspect-tree-stats.o cmds-fi-du.o 
\
-  mkfs/common.o
+  mkfs/common.o mkfs/rootdir.o
 libbtrfs_objects = send-stream.o send-utils.o kernel-lib/rbtree.o btrfs-list.o 
\
   kernel-lib/crc32c.o messages.o \
   uuid-tree.o utils-lib.o rbtree-utils.o
@@ -123,7 +123,7 @@ libbtrfs_headers = send-stream.h send-utils.h send.h 
kernel-lib/rbtree.h btrfs-l
   extent-cache.h extent_io.h ioctl.h ctree.h btrfsck.h version.h
 convert_objects = convert/main.o convert/common.o convert/source-fs.o \
  convert/source-ext2.o convert/source-reiserfs.o
-mkfs_objects = mkfs/main.o mkfs/common.o
+mkfs_objects = mkfs/main.o mkfs/common.o mkfs/rootdir.o
 image_objects = image/main.o
 all_objects = $(objects) $(cmds_objects) $(libbtrfs_objects) 
$(convert_objects) \
  $(mkfs_objects) $(image_objects)
diff --git a/mkfs/main.c b/mkfs/main.c
index 358a046f1cf2..7861e3075d6b 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -24,17 +24,12 @@
 #include "ioctl.h"
 #include 
 #include 
-#include 
-#include 
 /* #include  included via androidcompat.h */
 #include 
 #include 
 #include 
 #include 
 #include 
-#include 
-#include 
-#include 
 #include 
 #include 
 #include "ctree.h"
@@ -45,20 +40,11 @@
 #include "list_sort.h"
 #include "help.h"
 #include "mkfs/common.h"
+#include "mkfs/rootdir.h"
 #include "fsfeatures.h"
 
-int path_cat_out(char *out, const char *p1, const char *p2);
-
-static u64 index_cnt = 2;
 static int verbose = 1;
 
-struct directory_name_entry {
-   const char *dir_name;
-   const char *path;
-   ino_t inum;
-   struct list_head list;
-};
-
 struct mkfs_allocation {
u64 data;
u64 metadata;
@@ -415,583 +401,6 @@ static char *parse_label(const char *input)
return strdup(input);
 }
 
-static int add_directory_items(struct btrfs_trans_handle *trans,
-  struct btrfs_root *root, u64 objectid,
-  ino_t parent_inum, const char *name,
-  struct stat *st, int *dir_index_cnt)
-{
-   int ret;
-   int name_len;
-   struct btrfs_key location;
-   u8 filetype = 0;
-
-   name_len = strlen(name);
-
-   location.objectid = objectid;
-   location.offset = 0;
-   location.type = BTRFS_INODE_ITEM_KEY;
-
-   if (S_ISDIR(st->st_mode))
-   filetype = BTRFS_FT_DIR;
-   if (S_ISREG(st->st_mode))
-   filetype = BTRFS_FT_REG_FILE;
-   if (S_ISLNK(st->st_mode))
-   filetype = BTRFS_FT_SYMLINK;
-   if (S_ISSOCK(st->st_mode))
-   filetype = BTRFS_FT_SOCK;
-   if (S_ISCHR(st->st_mode))
-   filetype = BTRFS_FT_CHRDEV;
-   if (S_ISBLK(st->st_mode))
-   filetype = BTRFS_FT_BLKDEV;
-   if (S_ISFIFO(st->st_mode))
-   filetype = BTRFS_FT_FIFO;
-
-   ret = btrfs_insert_dir_item(trans, root, name, name_len,
-   parent_inum, ,
-   filetype, index_cnt);
-   if (ret)
-   return ret;
-   ret = btrfs_insert_inode_ref(trans, root, name, name_len,
-objectid, parent_inum, index_cnt);
-   *dir_index_cnt = index_cnt;
-   index_cnt++;
-
-   return ret;
-}
-
-static int fill_inode_item(struct btrfs_trans_handle *trans,
-  struct btrfs_root *root,
-  struct btrfs_inode_item *dst, struct stat *src)
-{
-   u64 blocks = 0;
-   u64 sectorsize = root->fs_info->sectorsize;
-
-   /*
-* btrfs_inode_item has some reserved fields
-* and represents on-disk inode entry, so
-* zero everything to prevent information leak
-*/
-   memset(dst, 0, sizeof (*dst));
-
-   btrfs_set_stack_inode_generation(dst, trans->transid);
-   btrfs_set_stack_inode_size(dst, 

[PATCH v2 1/6] btrfs-progs: Avoid BUG_ON for chunk allocation when ENOSPC happens

2017-10-18 Thread Qu Wenruo
When passing a directory larger than the block device using the --rootdir
parameter, we get the following backtrace:

--
extent-tree.c:2693: btrfs_reserve_extent: BUG_ON `ret` triggered, value -28
./mkfs.btrfs(+0x1a05d)[0x557939e6b05d]
./mkfs.btrfs(btrfs_reserve_extent+0xb5a)[0x557939e710c8]
./mkfs.btrfs(+0xb0b6)[0x557939e5c0b6]
./mkfs.btrfs(main+0x15d5)[0x557939e5de04]
/usr/lib/libc.so.6(__libc_start_main+0xea)[0x7f83b101af6a]
./mkfs.btrfs(_start+0x2a)[0x557939e5af5a]
--

Nothing special, just BUG_ON() abuse from ancient code.

Fix them by returning the error code instead.

Signed-off-by: Qu Wenruo 
Reviewed-by: Nikolay Borisov 
---
 extent-tree.c |  3 ++-
 volumes.c | 18 ++
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index 525a237e5923..055582c36da6 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2690,7 +2690,8 @@ int btrfs_reserve_extent(struct btrfs_trans_handle *trans,
   search_start, search_end, hint_byte, ins,
   trans->alloc_exclude_start,
   trans->alloc_exclude_nr, data);
-   BUG_ON(ret);
+   if (ret < 0)
+   return ret;
clear_extent_dirty(>free_space_cache,
   ins->objectid, ins->objectid + ins->offset - 1);
return ret;
diff --git a/volumes.c b/volumes.c
index 2209e5a9100b..e1ee27d5f3ce 100644
--- a/volumes.c
+++ b/volumes.c
@@ -1032,11 +1032,13 @@ again:
 info->chunk_root->root_key.objectid,
 BTRFS_FIRST_CHUNK_TREE_OBJECTID, key.offset,
 calc_size, _offset, 0);
-   BUG_ON(ret);
+   if (ret < 0)
+   goto out_chunk_map;
 
device->bytes_used += calc_size;
ret = btrfs_update_device(trans, device);
-   BUG_ON(ret);
+   if (ret < 0)
+   goto out_chunk_map;
 
map->stripes[index].dev = device;
map->stripes[index].physical = dev_offset;
@@ -1075,16 +1077,24 @@ again:
map->ce.size = *num_bytes;
 
ret = insert_cache_extent(>mapping_tree.cache_tree, >ce);
-   BUG_ON(ret);
+   if (ret < 0)
+   goto out_chunk_map;
 
if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
ret = btrfs_add_system_chunk(info, ,
chunk, btrfs_chunk_item_size(num_stripes));
-   BUG_ON(ret);
+   if (ret < 0)
+   goto out_chunk;
}
 
kfree(chunk);
return ret;
+
+out_chunk_map:
+   kfree(map);
+out_chunk:
+   kfree(chunk);
+   return ret;
 }
 
 /*
-- 
2.14.2
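
Note on 1/6: the two labels added in volumes.c follow the usual goto
unwinding pattern. Below is a toy sketch of it. All names (toy_chunk,
toy_map, toy_step(), toy_alloc_chunk(), toy_cache_drop()) are
hypothetical, and the sketch assumes that once the map has been inserted
into the cache the cache owns it, which is how the out_chunk label in
the patch reads.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_chunk { int id; };
struct toy_map { int stripes; };

static struct toy_map *cache;	/* owns the map once inserted */
static int live_allocs;		/* outstanding allocations in this sketch */

/* Hypothetical step that fails on demand. */
static int toy_step(int fail)
{
	return fail ? -ENOSPC : 0;
}

static void toy_cache_drop(void)
{
	if (cache) {
		free(cache);
		cache = NULL;
		live_allocs--;
	}
}

static int toy_alloc_chunk(int fail_before_insert, int fail_after_insert)
{
	struct toy_chunk *chunk;
	struct toy_map *map;
	int ret;

	chunk = malloc(sizeof(*chunk));
	if (!chunk)
		return -ENOMEM;
	live_allocs++;

	map = malloc(sizeof(*map));
	if (!map) {
		ret = -ENOMEM;
		goto out_chunk;
	}
	live_allocs++;

	ret = toy_step(fail_before_insert);
	if (ret < 0)
		goto out_chunk_map;	/* map is still ours: free both */

	assert(cache == NULL);
	cache = map;			/* ownership moves to the cache */

	ret = toy_step(fail_after_insert);
	if (ret < 0)
		goto out_chunk;		/* only the chunk is ours now */

	free(chunk);
	live_allocs--;
	return 0;

out_chunk_map:
	free(map);
	live_allocs--;
out_chunk:
	free(chunk);
	live_allocs--;
	return ret;
}
```

Every exit leaves live_allocs at 0, or at 1 when the cache holds the
map, so neither failure point leaks or double frees.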



[PATCH v2 0/6] Rootdir refactor and small bug fixes

2017-10-18 Thread Qu Wenruo
Sorry for the v2 patchset; it just adds one new 3-line patch.
But since inserting it alone could break bisect, I'm re-sending the whole
patchset so that the new patch sits right before the mkfs return value
fix and bisect keeps working as before.


First 4 patches are small bug fixes which can be applied even if we don't
touch the functionality of --rootdir.

The last two patches refactor the --rootdir related functions, mainly
size_sourcedir() and make_image(), into mkfs/rootdir.[ch],
and rename them to btrfs_mkfs_size_dir() and btrfs_mkfs_fill_dir()
respectively.
Functionality is not changed at all, so it will still shrink the device
or use the first 1M reserved space.

This moves about 700 lines, which shrinks the original mkfs/main.c by
about 1/3.

And by moving this ancient code to its own files, I also fixed several
small nits exposed by checkpatch script.

This provides a clean environment for later rootdir rework.

changelog:
v2:
  Add a new fix to avoid mkfs returning 1. The rest is unchanged.
  Add reviewed-by tag.

Qu Wenruo (6):
  btrfs-progs: Avoid BUG_ON for chunk allocation when ENOSPC happens
  btrfs-progs: mkfs: Avoid positive return value from
cleanup_temp_chunks
  btrfs-progs: mkfs: Fix overwritten return value for mkfs
  btrfs-progs: mkfs: Error out gracefully for --rootdir
  btrfs-progs: mkfs: Move image creation of rootdir to its own files
  btrfs-progs: mkfs: Move source dir size calculation to its own files

 Makefile   |   4 +-
 extent-tree.c  |   3 +-
 mkfs/main.c| 713 +--
 mkfs/rootdir.c | 735 +
 mkfs/rootdir.h |  32 +++
 volumes.c  |  18 +-
 6 files changed, 795 insertions(+), 710 deletions(-)
 create mode 100644 mkfs/rootdir.c
 create mode 100644 mkfs/rootdir.h

-- 
2.14.2



Re: [PATCH 01/21] Btrfs: rework outstanding_extents

2017-10-18 Thread Edmund Nadolski
just a few quick things for the changelog:

On 09/29/2017 01:43 PM, Josef Bacik wrote:
> Right now we do a lot of weird hoops around outstanding_extents in order
> to keep the extent count consistent.  This is because we logically
> transfer the outstanding_extent count from the initial reservation
> through the set_delalloc_bits.  This makes it pretty difficult to get a
> handle on how and when we need to mess with outstanding_extents.
> 
> Fix this by revamping the rules of how we deal with outstanding_extents.
> Now instead everybody that is holding on to a delalloc extent is
> required to increase the outstanding extents count for itself.  This
> means we'll have something like this
> 
> btrfs_dealloc_reserve_metadata- outstanding_extents = 1

s/dealloc/delalloc/


>  btrfs_set_delalloc   - outstanding_extents = 2

should be btrfs_set_extent_delalloc?


> btrfs_release_delalloc_extents- outstanding_extents = 1
> 
> for an initial file write.  Now take the append write where we extend an
> existing delalloc range but still under the maximum extent size
> 
> btrfs_delalloc_reserve_metadata - outstanding_extents = 2
>   btrfs_set_delalloc

btrfs_set_extent_delalloc?


> btrfs_set_bit_hook- outstanding_extents = 3
> btrfs_merge_bit_hook  - outstanding_extents = 2

should be btrfs_clear_bit_hook? (or btrfs_merge_extent_hook?)


> btrfs_release_delalloc_extents- outstanding_extnets = 1

btrfs_delalloc_release_metadata?


> 
> In order to make the ordered extent transition we of course must now
> make ordered extents carry their own outstanding_extent reservation, so
> for cow_file_range we end up with
> 
> btrfs_add_ordered_extent  - outstanding_extents = 2
> clear_extent_bit  - outstanding_extents = 1
> btrfs_remove_ordered_extent   - outstanding_extents = 0
> 
> This makes all manipulations of outstanding_extents much more explicit.
> Every successful call to btrfs_reserve_delalloc_metadata _must_ now be
   ^
btrfs_delalloc_reserve_metadata?


Thanks,
Ed



[PATCH v2] btrfs: Fix bug for misused dev_t when lookup in dev state hash table.

2017-10-18 Thread Gu Jinxiang
From: Gu JinXiang 

Fix a bug introduced by commit 74d46992e0d9
("block: replace bi_bdev with a gendisk pointer and partitions index").

That commit uses bio_dev(bio) to look up the dev state in
__btrfsic_submit_bio(). But when a dev_state is added to the hashtable,
the dev_t of the block_device is used as the key.

bio_dev(bio) returns the dev_t of part0, which is different from the dev_t
in block_device (bd_dev). bd_dev in block_device represents the exact
partition.
block_device.bd_dev = 
bio->bi_partno (same as block_device.bd_partno) + bio_dev(bio).

A dev_state is added to the hashtable using the exact partition's dev_t,
so the lookup should also use the exact partition's dev_t.

To reproduce:
Run btrfs/001 in xfstests with MOUNT_OPTIONS="-o check_int".
A WARNING like the one below is then printed.
WARNING:
btrfs: attempt to write superblock which references block M @29523968 (sda7 /654400/2) which is never written!

changelog:
v1->v2: Explain how bio_dev(bio) differs from block_device.bd_dev.

Signed-off-by: Gu JinXiang 
---
 fs/btrfs/check-integrity.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index fb07e3c22b9a..02f9eb83173f 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2803,7 +2803,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
mutex_lock(_mutex);
/* since btrfsic_submit_bio() is also called before
 * btrfsic_mount(), this might return NULL */
-   dev_state = btrfsic_dev_state_lookup(bio_dev(bio));
+   dev_state = btrfsic_dev_state_lookup(bio_dev(bio) + bio->bi_partno);
if (NULL != dev_state &&
(bio_op(bio) == REQ_OP_WRITE) && bio_has_data(bio)) {
unsigned int i = 0;
-- 
2.13.5
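
Note on the lookup mismatch: a userspace sketch of why a lookup keyed by
the whole-disk dev_t misses an entry stored under the partition's dev_t.
Everything here is hypothetical (toy_dev_t, TOY_MKDEV, a flat table
instead of the real hash table). It assumes a dev_t layout with the
minor number in the low bits, so that adding the partition number to
the part0 dev_t yields the partition's dev_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_dev_t;

/* Major number above a 20-bit minor, in the spirit of the kernel's MKDEV(). */
#define TOY_MINORBITS	20
#define TOY_MKDEV(ma, mi) \
	(((toy_dev_t)(ma) << TOY_MINORBITS) | (toy_dev_t)(mi))

#define TOY_TABLE_SIZE	16

struct toy_dev_state {
	toy_dev_t dev;
	int in_use;
};

static struct toy_dev_state table[TOY_TABLE_SIZE];

/* Exact-match lookup, like a hash lookup keyed by dev_t. */
static struct toy_dev_state *toy_lookup(toy_dev_t dev)
{
	for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
		if (table[i].in_use && table[i].dev == dev)
			return &table[i];
	}
	return NULL;
}

/* Register a device under the dev_t of the exact partition. */
static int toy_add(toy_dev_t dev)
{
	assert(toy_lookup(dev) == NULL);	/* no duplicates in this sketch */
	for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
		if (!table[i].in_use) {
			table[i].dev = dev;
			table[i].in_use = 1;
			return 0;
		}
	}
	return -1;
}
```

After toy_add(TOY_MKDEV(8, 7)), a lookup with TOY_MKDEV(8, 0) returns
NULL, while TOY_MKDEV(8, 0) + 7 finds the entry.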





Re: SOLVED - 32-bit kernel 4.13 bug - Mount failing - unable to find logical

2017-10-18 Thread Cameron Kelley



On 10-17-2017 10:10 PM, Roman Mamedov wrote:

On Wed, 18 Oct 2017 09:24:01 +0800
Qu Wenruo  wrote:




On 2017-10-18 04:43, Cameron Kelley wrote:

Hey btrfs gurus,

I have a 4 disk btrfs filesystem that has suddenly stopped mounting
after a recent reboot. The data is in an odd configuration due to
originally being in a 3 disk RAID1 before adding a 4th disk and running
a balance to convert to RAID10. There wasn't enough free space to
completely convert, so about half the data is still in RAID1 while the
other half is in RAID10. Both metadata and system are RAID10. It has
been in this configuration for 6 months or so now since adding the 4th
disk. It just holds archived media and hasn't had any data added or
modified in quite some time. I feel pretty stupid now for not correcting
that sooner though.

I have tried mounting with different mount options for recovery, ro,
degraded, etc. Log shows errors about "unable to find logical
3746892939264 length 4096"

When I do a btrfs check, it doesn't find any issues. Running
btrfs-find-root comes up with a message about a block that the
generation doesn't match. If I specify that block on the btrfs check, I
get transid verify failures.

I ran a dry run of a recovery of the entire filesystem which runs
through every file with no errors. I would just restore the data and
start fresh, but unfortunately I don't have the free space at the moment
for the ~4.5TB of data.

I also ran full smart self tests on all 4 disks with no errors.

root@nas2:~# uname -a
Linux nas2 4.13.7-041307-generic #201710141430 SMP Sat Oct 14 14:39:06
UTC 2017 i686 i686 i686 GNU/Linux


I don't think i686 kernel will cause any difference, but considering
most of us are using x86_64 to develop/test, maybe it will be a good
idea to upgrade to x86_64 kernel?


Indeed a problem with mounting on 32-bit in 4.13 has been reported recently:
https://www.spinics.net/lists/linux-btrfs/msg69734.html
with the same error message.

I believe it's this patchset that is supposed to fix that.
https://www.spinics.net/lists/linux-btrfs/msg70001.html

@Cameron maybe you didn't just reboot, but also upgraded your kernel at the
same time? In any case, try a 4.9 series kernel, or a 64-bit machine if you
want to stay with 4.13.



Just for reference to anyone else having this issue, it is indeed a bug in 
the 32-bit release of the 4.13 kernel. The x64 kernel had no issues 
mounting it.


An interesting thing to note is that I still had all the exact same mount 
issues and errors when I booted the latest PartedMagic live image with 
kernel 4.12.9 in 32-bit mode. The same PartedMagic image in 64-bit mode had 
no issues, which is how I confirmed your suspicions.


Now for the part where I feel more stupid than I have in a long time.

1. Apparently I had updated the kernel on this NAS without realizing it, 
since I was doing updates on multiple appliances at once a little while 
ago and just hadn't rebooted it since. When I ran into issues, I updated 
the kernel to the latest without looking at the kernel I was on just to 
see if that solved it.


2. And here's the real kicker. The processor in this NAS (Pentium E5200) 
is actually x64 capable. I must have skimmed information too quickly when 
I first built this years ago and thought it wasn't x64 capable.


I have rebuilt the NAS and I'm now running a scrub just to make sure steps 
I was taking to recover didn't cause any issues.


Anything else you would recommend to make sure there aren't any other 
issues that could have been caused by my tinkering?


Thank you very much for your help as I was banging my head against a wall. 
This NAS does so little that I tend to get careless with it. Lesson 
learned and embarrassment felt. The only solace is that this might help 
someone else who runs into this with kernel 4.13 on a 32-bit system.



Re: Is it safe to use btrfs on top of different types of devices?

2017-10-18 Thread Austin S. Hemmelgarn

On 2017-10-18 07:59, Adam Borowski wrote:

On Wed, Oct 18, 2017 at 07:30:55AM -0400, Austin S. Hemmelgarn wrote:

On 2017-10-17 16:21, Adam Borowski wrote:

It's a single-device filesystem, thus disconnects are obviously fatal.  But,
they never caused even a single bit of damage (as scrub goes), thus proving
btrfs handles this kind of disconnects well.  Unlike times past, the kernel
doesn't get confused thus no reboot is needed, merely an unmount, "service
nbd-client restart", mount, restart the rebuild jobs.

That's expected behavior though.  _Single_ device BTRFS has nothing to get
out of sync most of the time, the only time there's any possibility of an
issue is when you die after writing the first copy of a block that's in a
dup profile chunk, but even that is not very likely to cause problems
(you'll just lose at most the last  worth of data).


How come?  In a DUP profile, the writes are: chunk 1, chunk 2, barrier,
superblock.  The two prior writes may be arbitrarily reordered -- both
between each other or even individual sectors inside the chunks, but unless
the disk lies about barriers, there's no way to have any corruption, thus
running scrub is not needed.

If the device dies after writing chunk 1 but before the barrier, you end up
needing scrub.  How much of a failure window is present is largely a
function of how fast the device is, but there is a failure window there.


CoW is there to ensure there is _no_ failure window.  The new content
doesn't matter until there are live pointers to it -- from the filesystem's
point of view we merely scribbled something on an unused part of the block
device.  Only after all pieces are in place (as ensured by the barrier), the
superblock is updated with a reference to the new metadata->data chain.
Even with CoW there _IS_ a failure window.  At a bare minimum, when 
updating the root of the tree which has multiple copies, you have a 
failure window.  This window could admittedly be significantly reduced 
for multi-device setups if we actually parallelized writes properly, but 
it would still be there.


Thus, no matter when a disconnect happens, after a crash you get either
uncorrupted old version or uncorrupted new version.

No scrub is ever needed for this reason on single device or on RAID1 that
didn't run degraded.
The whole conversation started regarding a RAID1 array that's 
functionally guaranteed to run degraded on a regular basis.
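
For readers following the ordering argument quoted above (copy 1,
copy 2, barrier, superblock), here is a toy model of just that
sequence. It is a sketch under strong assumptions: one device, barriers
honored, a single superblock. It deliberately leaves out the multiple
superblock copies and multi-device writes that the reply is about, so
it does not settle the disagreement. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { OLD = 1, NEW = 2 };

struct toy_disk {
	int area[2][2];	/* area[slot][copy]: two slots, two DUP copies each */
	int super;	/* index of the slot the superblock points at */
};

/*
 * Run the update, stopping after `steps` writes to simulate a
 * disconnect: write copy 0, write copy 1, (barrier), write superblock.
 */
static void toy_update(struct toy_disk *d, int steps)
{
	int spare;

	assert(d != NULL);
	spare = !d->super;

	if (steps < 1)
		return;
	d->area[spare][0] = NEW;
	if (steps < 2)
		return;
	d->area[spare][1] = NEW;
	if (steps < 3)
		return;
	/* barrier: both copies are durable before the pointer moves */
	d->super = spare;
}

/* What a later mount sees: OLD, NEW, or 0 if the live copies disagree. */
static int toy_read(const struct toy_disk *d)
{
	const int *copies = d->area[d->super];

	return copies[0] == copies[1] ? copies[0] : 0;
}
```

In this model a stop after 0, 1 or 2 writes still reads OLD and a
complete run reads NEW; the half-written copies only ever live in the
slot that nothing points at yet.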



Re: Is it safe to use btrfs on top of different types of devices?

2017-10-18 Thread Austin S. Hemmelgarn

On 2017-10-18 09:53, Peter Grandi wrote:

I forget sometimes that people insist on storing large
volumes of data on unreliable storage...


Here obviously "unreliable" is used in the sense of storage that
can work incorrectly, not in the sense of storage that can fail.
Um, in what world is a device randomly dropping off the bus (this is the 
primary issue with USB for storage) not a failure?  Yes, it's not a 
catastrophic failure (for BTRFS at least), and it's transient (the 
kernel will re-enumerate the device when it resets the bus), but that 
doesn't change the fact that the service that is supposed to be provided 
by the device failed.


To clarify more concretely, when I say 'unreliable' in reference to 
computer technology (and for that matter, almost anything else), I mean 
something that encounters non-trivial error states, either correctable 
or uncorrectable, at a frequency above that which is deemed reasonable 
for the designed function of the device.



In my opinion the unreliability of the storage is the exact
reason for wanting to use raid1. And I think any problem one
encounters with an unreliable disk can likely happen with more
reliable ones as well, only less frequently, so if I don't
feel comfortable using raid1 on an unreliable medium then I
wouldn't trust it on a more reliable one either.


Oh please, please a bit less silliness would be welcome here.
In a previous comment on this tedious thread I had written:

   > If the block device abstraction layer and lower layers work
   > correctly, Btrfs does not have problems of that sort when
   > adding new devices; conversely if the block device layer and
   > lower layers do not work correctly, no mainline Linux
   > filesystem I know can cope with that.

   > Note: "work correctly" does not mean "work error-free".

The last line is very important and I added it advisedly.

You seem to be using "unreliable" in two completely different
meanings, without realizing it, as both "working incorrectly"
and "reporting a failure". They are really very different.
And you seem to be using the term 'failure' to only mean 'catastrophic 
failure'.  Strictly speaking, even that is 'working incorrectly', albeit 
in a much more specific and permanent manner than just returning errors.


Even looking at things that way though, Zoltan's assessment that 
reliability is essentially a measure of error rate is correct.  Internal 
SATA devices absolutely can randomly drop off the bus just like many USB 
storage devices do, but it almost never happens (it's a statistical 
impossibility if there are no hardware or firmware issues), so they are 
more reliable in that respect.


The "working incorrectly" general case is the so-called
"Byzantine generals problem" and (depending on assumptions) it
is insoluble.

Btrfs has some limited ability to detect (and sometimes recover
from) "working incorrectly" storage layers, but don't expect too
much from that.



Unmountable fs. No root for superblock generation

2017-10-18 Thread Larkin Lowrey
I am unable to mount one of my filesystems.  The superblock thinks the 
latest generation is 2220927 but I can't seem to find a root with that 
number. I can find 2220926 and 2220928 but not 2220927. Is there 
anything that I can do to recover this FS?


# btrfs check /dev/Cached/Backups
checksum verify failed on 159057884594176 found 15284E33 wanted C8C5B54E
checksum verify failed on 159057884594176 found 15284E33 wanted C8C5B54E
checksum verify failed on 159057884594176 found 472037C9 wanted 9ACDCCB4
checksum verify failed on 159057884594176 found 472037C9 wanted 9ACDCCB4
Csum didn't match
Couldn't setup extent tree
Couldn't open file system

# btrfs-find-root -g 2220927 /dev/Cached/Backups
Couldn't setup extent tree
Couldn't setup device tree
Superblock thinks the generation is 2220927
Superblock thinks the level is 2

Found tree root at 159057884577792 gen 2220927 level 2
Well block 101489031790592(gen: 2220928 level: 2) seems good, but 
generation/level doesn't match, want gen: 2220927 level: 2


# btrfs check --tree-root 159057884577792  /dev/Cached/Backups
checksum verify failed on 159057884594176 found 15284E33 wanted C8C5B54E
checksum verify failed on 159057884594176 found 15284E33 wanted C8C5B54E
checksum verify failed on 159057884594176 found 472037C9 wanted 9ACDCCB4
checksum verify failed on 159057884594176 found 472037C9 wanted 9ACDCCB4
Csum didn't match
Couldn't setup extent tree
Couldn't open file system

# btrfs check --tree-root 101489031790592 /dev/Cached/Backups
parent transid verify failed on 101489031790592 wanted 2220927 found 2220928
parent transid verify failed on 101489031790592 wanted 2220927 found 2220928
parent transid verify failed on 101489031790592 wanted 2220927 found 2220928
parent transid verify failed on 101489031790592 wanted 2220927 found 2220928
Ignoring transid failure
parent transid verify failed on 159057595138048 wanted 2220927 found 2220920
parent transid verify failed on 159057595138048 wanted 2220927 found 2220920
parent transid verify failed on 159057595138048 wanted 2220927 found 2220920
parent transid verify failed on 159057595138048 wanted 2220927 found 2220920
Ignoring transid failure
parent transid verify failed on 158652658122752 wanted 2220927 found 2220911
parent transid verify failed on 158652658122752 wanted 2220927 found 2220911
parent transid verify failed on 158652658122752 wanted 2220927 found 2220911
parent transid verify failed on 158652658122752 wanted 2220927 found 2220911
Ignoring transid failure
Checking filesystem on /dev/Cached/Backups
UUID: 1b213dfd-6486-47d8-8459-bc5825882023
checking extents
parent transid verify failed on 116329711550464 wanted 2220928 found 2220921
parent transid verify failed on 116329711550464 wanted 2220928 found 2220921
parent transid verify failed on 116329711550464 wanted 2220928 found 2220921
parent transid verify failed on 116329711550464 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116325928206336 wanted 2220928 found 2220921
parent transid verify failed on 116325928206336 wanted 2220928 found 2220921
parent transid verify failed on 116325928206336 wanted 2220928 found 2220921
parent transid verify failed on 116325928206336 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116329892970496 wanted 2220928 found 2220921
parent transid verify failed on 116329892970496 wanted 2220928 found 2220921
parent transid verify failed on 116329892970496 wanted 2220928 found 2220921
parent transid verify failed on 116329892970496 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116325929943040 wanted 2220928 found 2220921
parent transid verify failed on 116325929943040 wanted 2220928 found 2220921
parent transid verify failed on 116325929943040 wanted 2220928 found 2220921
parent transid verify failed on 116325929943040 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116325932679168 wanted 2220928 found 2220921
parent transid verify failed on 116325932679168 wanted 2220928 found 2220921
parent transid verify failed on 116325932679168 wanted 2220928 found 2220921
parent transid verify failed on 116325932679168 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116010673373184 wanted 2220928 found 2220921
parent transid verify failed on 116010673373184 wanted 2220928 found 2220921
parent transid verify failed on 116010673373184 wanted 2220928 found 2220921
parent transid verify failed on 116010673373184 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116329479405568 wanted 2220928 found 2220921
parent transid verify failed on 116329479405568 wanted 2220928 found 2220921
parent transid verify failed on 116329479405568 wanted 2220928 found 2220921
parent transid verify failed on 116329479405568 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116480660914176 wanted 

Re: Is it safe to use btrfs on top of different types of devices?

2017-10-18 Thread Peter Grandi
> [ ... ] After all, btrfs would just have to discard one copy
> of each chunk. [ ... ]  One more thing that is not clear to me
> is the replication profile of a volume. I see that balance can
> convert chunks between profiles, for example from single to
> raid1, but I don't see how the default profile for new chunks
> can be set or queried. [ ... ]

My impression is that the design rationale and aims for Btrfs
two-level allocation (in other fields known as a "BIBOP" scheme)
were not fully shared among Btrfs developers, that perhaps it
could have benefited from some further reflection on its
implications, and that its behaviour may have evolved
"opportunistically", maybe without much worrying as to
conceptual integrity. (I am trying to be euphemistic)

So while I am happy with the "Rodeh" core of Btrfs (COW,
subvolumes, checksums), the RAID-profile functionality and
especially the multi-device layer is not something I find
particularly to my taste. (I am trying to be euphemistic)

So when it comes to allocation, RAID-profiles, multiple devices,
I usually expect some random "surprising functionality". (I am
trying to be euphemistic)


Re: Is it safe to use btrfs on top of different types of devices?

2017-10-18 Thread Peter Grandi
>> I forget sometimes that people insist on storing large
>> volumes of data on unreliable storage...

Here obviously "unreliable" is used in the sense of storage that
can work incorrectly, not in the sense of storage that can fail.

> In my opinion the unreliability of the storage is the exact
> reason for wanting to use raid1. And I think any problem one
> encounters with an unreliable disk can likely happen with more
> reliable ones as well, only less frequently, so if I don't
> feel comfortable using raid1 on an unreliable medium then I
> wouldn't trust it on a more reliable one either.

Oh please, please a bit less silliness would be welcome here.
In a previous comment on this tedious thread I had written:

  > If the block device abstraction layer and lower layers work
  > correctly, Btrfs does not have problems of that sort when
  > adding new devices; conversely if the block device layer and
  > lower layers do not work correctly, no mainline Linux
  > filesystem I know can cope with that.

  > Note: "work correctly" does not mean "work error-free".

The last line is very important and I added it advisedly.

You seem to be using "unreliable" in two completely different
meanings, without realizing it, as both "working incorrectly"
and "reporting a failure". They are really very different.

The "working incorrectly" general case is the so-called
"Byzantine generals problem" and (depending on assumptions) it
is insoluble.

Btrfs has some limited ability to detect (and sometimes recover
from) "working incorrectly" storage layers, but don't expect too
much from that.


Re: [PATCH] btrfs-progs: test: add new cli-test for subvol get/set-default

2017-10-18 Thread David Sterba
On Wed, Oct 18, 2017 at 11:00:43AM +0900, Misono, Tomohiro wrote:
> Add new test to check functionality of subvol get/set-default.
> 
> Signed-off-by: Tomohiro Misono 

Thanks, applied with the following diff to fix style and failures
when the test is not run as root initially:

- no command shortcuts
- the subvolume id for set-default should be read from rootid
- add missing SUDO_HELPER
- prepare_test_dev without the device size (unless justified)

--- a/tests/cli-tests/008-subvolume-get-set-default/test.sh
+++ b/tests/cli-tests/008-subvolume-get-set-default/test.sh
@@ -3,7 +3,7 @@

 check_default_id()
 {
-   id=$(run_check_stdout $SUDO_HELPER "$TOP/btrfs" sub get-def .) \
+   id=$(run_check_stdout $SUDO_HELPER "$TOP/btrfs" subvolume get-default .) \
|| { echo "$id"; exit 1; }
if $(echo "$id" | grep -vq "ID $1"); then
_fail "subvolume get-default: default id is not $1, but $id"
@@ -16,7 +16,7 @@ check_prereq mkfs.btrfs
 check_prereq btrfs

 setup_root_helper
-prepare_test_dev 2g
+prepare_test_dev

 run_check "$TOP/mkfs.btrfs" -f "$TEST_DEV"
 run_check_mount_test_dev
@@ -25,21 +25,23 @@ cd "$TEST_MNT"
 check_default_id 5

 # check "subvol set-default  "
-run_check "$TOP/btrfs" subvol create sub
-run_check $SUDO_HELPER "$TOP/btrfs" subvol set-default 257 .
-check_default_id 257
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume create sub
+id=$(run_check_stdout "$TOP/btrfs" inspect-internal rootid sub)
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume set-default "$id" .
+check_default_id "$id"

 run_mustfail "set-default to non existent id" \
-   $SUDO_HELPER "$TOP/btrfs" subvol set-default 100 .
+   $SUDO_HELPER "$TOP/btrfs" subvolume set-default 100 .

 # check "subvol set-default "
-run_check "$TOP/btrfs" subvol create sub2
-run_check $SUDO_HELPER "$TOP/btrfs" subvol set-default ./sub2
-check_default_id 258
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume create sub2
+id=$(run_check_stdout "$TOP/btrfs" inspect-internal rootid sub2)
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume set-default ./sub2
+check_default_id "$id"

-run_check mkdir sub2/dir
+run_check $SUDO_HELPER mkdir sub2/dir
 run_mustfail "set-default to normal directory" \
-   $SUDO_HELPER "$TOP/btrfs" subvol set-default ./sub2/dir
+   $SUDO_HELPER "$TOP/btrfs" subvolume set-default ./sub2/dir

 cd ..
 run_check_umount_test_dev


Re: [PATCH] Btrfs: ref-verify: Fix NULL vs IS_ERR() check in walk_down_tree()

2017-10-18 Thread David Sterba
On Wed, Oct 18, 2017 at 10:36:35AM +0300, Dan Carpenter wrote:
> read_tree_block() returns error pointers, and never NULL and so I have
> updated the error handling.
> 
> Fixes: 74739121b4c7 ("Btrfs: add a extent ref verify tool")
> Signed-off-by: Dan Carpenter 

Thanks, I've folded the fix into the original commit and added credits.


Re: [PATCH 0/5] Rootdir refactor and small bug fixes

2017-10-18 Thread Nikolay Borisov


On 18.10.2017 11:00, Qu Wenruo wrote:
> First 3 patches are small bug fixes which can be applied even we don't
> touch the functionality of --rootdir.
> 
> The last two patches will refactor --rootdir related functions (mainly
> size_sourcedir and make_image) to mkfs/rootdir.[ch].
> And rename them to btrfs_mkfs_size_dir() and btrfs_mkfs_fill_dir()
> respectively.
> Functionality is not changed at all, so it will still shrink the device
> or using the first 1M reserved space.
> 
> This moved about 700 lines, which reduced about 1/3 of original mkfs.c.
> 
> And by moving this ancient code to its own files, I also fixed several
> small nits exposed by checkpatch script.
> 
> This provides a clean environment for later rootdir rework.
> 
> Qu Wenruo (5):
>   btrfs-progs: Avoid BUG_ON for chunk allocation when ENOSPC happens
>   btrfs-progs: mkfs: Fix overwritten return value for mkfs
>   btrfs-progs: mkfs: Error out gracefully for --rootdir
>   btrfs-progs: mkfs: Move image creation of rootdir to its own files
>   btrfs-progs: mkfs: Move source dir size calculation to its own files


Reviewed-by: Nikolay Borisov 

> 
>  Makefile   |   4 +-
>  extent-tree.c  |   3 +-
>  mkfs/main.c| 710 +--
>  mkfs/rootdir.c | 735 
> +
>  mkfs/rootdir.h |  32 +++
>  volumes.c  |  18 +-
>  6 files changed, 792 insertions(+), 710 deletions(-)
>  create mode 100644 mkfs/rootdir.c
>  create mode 100644 mkfs/rootdir.h
> 


Re: Is it safe to use btrfs on top of different types of devices?

2017-10-18 Thread Adam Borowski
On Wed, Oct 18, 2017 at 07:30:55AM -0400, Austin S. Hemmelgarn wrote:
> On 2017-10-17 16:21, Adam Borowski wrote:
> > > > It's a single-device filesystem, thus disconnects are obviously fatal.  But,
> > > > they never caused even a single bit of damage (as scrub goes), thus proving
> > > > btrfs handles this kind of disconnects well.  Unlike times past, the kernel
> > > > doesn't get confused thus no reboot is needed, merely an unmount, "service
> > > > nbd-client restart", mount, restart the rebuild jobs.
> > > That's expected behavior though.  _Single_ device BTRFS has nothing to get
> > > out of sync most of the time, the only time there's any possibility of an
> > > issue is when you die after writing the first copy of a block that's in a
> > > dup profile chunk, but even that is not very likely to cause problems
> > > (you'll just lose at most the last  worth of data).
> > 
> > How come?  In a DUP profile, the writes are: chunk 1, chunk2, barrier,
> > superblock.  The two prior writes may be arbitrarily reordered -- both
> > between each other or even individual sectors inside the chunks, but unless
> > the disk lies about barriers, there's no way to have any corruption, thus
> > running scrub is not needed.
> If the device dies after writing chunk 1 but before the barrier, you end up
> needing scrub.  How much of a failure window is present is largely a
> function of how fast the device is, but there is a failure window there.

CoW is there to ensure there is _no_ failure window.  The new content
doesn't matter until there are live pointers to it -- from the filesystem's
point of view we merely scribbled something on an unused part of the block
device.  Only after all pieces are in place (as ensured by the barrier), the
superblock is updated with a reference to the new metadata->data chain.

Thus, no matter when a disconnect happens, after a crash you get either
uncorrupted old version or uncorrupted new version.

No scrub is ever needed for this reason on single device or on RAID1 that
didn't run degraded.
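
The write ordering described above can be illustrated with a toy model. This
is only a sketch under stated assumptions: struct toy_disk, enum crash_point
and the toy_* functions are hypothetical names made up for the illustration,
not btrfs code, and the barrier is represented by a comment rather than a real
flush. It shows that a disconnect at any point before the superblock pointer
is updated leaves the old version readable and intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy "disk": a few blocks plus a superblock that names the live block. */
#define NBLOCKS 4
#define BLKSZ   16

struct toy_disk {
	char blocks[NBLOCKS][BLKSZ];
	int super_root;		/* index of the block the superblock points at */
};

/* Points at which the simulated device may disconnect (hypothetical). */
enum crash_point {
	CRASH_NEVER,
	CRASH_DURING_DATA,	/* new block only partially written */
	CRASH_BEFORE_SUPER,	/* new block complete, superblock not updated */
};

/* Start with one committed version stored in block 0. */
void toy_disk_init(struct toy_disk *d, const char *content)
{
	size_t len = strlen(content);

	memset(d, 0, sizeof(*d));
	if (len >= BLKSZ)
		len = BLKSZ - 1;
	memcpy(d->blocks[0], content, len);
	d->super_root = 0;
}

/*
 * CoW update: write the new content into a block that is not live, then
 * (after the barrier) flip the superblock pointer.
 * Returns true only if the whole commit completed.
 */
bool toy_cow_commit(struct toy_disk *d, const char *content,
		    enum crash_point crash)
{
	int target = (d->super_root + 1) % NBLOCKS;	/* never the live block */
	size_t len = strlen(content);

	if (len >= BLKSZ)
		return false;

	if (crash == CRASH_DURING_DATA) {
		/* Torn write: only half of the payload reaches the medium. */
		memcpy(d->blocks[target], content, len / 2);
		return false;
	}
	memset(d->blocks[target], 0, BLKSZ);
	memcpy(d->blocks[target], content, len);

	/* A real implementation would issue a flush/barrier here. */
	if (crash == CRASH_BEFORE_SUPER)
		return false;

	d->super_root = target;	/* the single update that makes it live */
	return true;
}

/* What a reader sees after remount: whatever the superblock points at. */
const char *toy_read_live(const struct toy_disk *d)
{
	return d->blocks[d->super_root];
}
```

In this model, calling toy_cow_commit() with either crash point leaves
toy_read_live() returning the old content; only a completed commit switches
readers to the new block. The model assumes the device honours the barrier,
which is the same condition stated above.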


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢰⠒⠀⣿⡁ Imagine there are bandits in your house, your kid is bleeding out,
⢿⡄⠘⠷⠚⠋⠀ the house is on fire, and seven big-ass trumpets are playing in the
⠈⠳⣄ sky.  Your cat demands food.  The priority should be obvious...


Re: Best strategie to remove devices from pool

2017-10-18 Thread Austin S. Hemmelgarn

On 2017-10-17 13:58, Cloud Admin wrote:

Hi,
I want to remove two devices from a BTRFS RAID 1 pool. There should be
enough free space to do it, but what is the best strategy? Remove both
devices in one call 'btrfs dev rem /dev/sda1 /dev/sdb1' (for example), or
would it be better to use two separate calls? Which is faster? Are there
other constraints to think about?
Ideally, delete them all with a single operation.  Internally the delete 
command uses some numeric trickery together with a balance operation to 
migrate the data off of the devices being removed.  If you do them one 
at a time, you will end up moving at least some data twice, and thus 
wasting time.  Other than that, there's not much to worry about, though 
keep in mind that deleting devices from an array can take a long time, 
especially if the array is mostly full.



Re: Is it safe to use btrfs on top of different types of devices?

2017-10-18 Thread Austin S. Hemmelgarn

On 2017-10-17 16:21, Adam Borowski wrote:

On Tue, Oct 17, 2017 at 03:19:09PM -0400, Austin S. Hemmelgarn wrote:

On 2017-10-17 13:06, Adam Borowski wrote:

The thing is, reliability guarantees required vary WILDLY depending on your
particular use cases.  On one hand, there's "even a one-minute downtime
would cost us mucho $$$s, can't have that!" -- on the other, "it died?
Okay, we got backups, lemme restore it after the weekend".

Yes, but if you are in the second case, you arguably don't need replication,
and would be better served by improving the reliability of your underlying
storage stack than trying to work around its problems. Even in that case,
your overall reliability is still constrained by the least reliable
component (in more idiomatic terms 'a chain is only as strong as its
weakest link').


MD can handle this case well, there's no reason btrfs shouldn't do that too.
A RAID is not akin to serially connected chain, it's a parallel connected
chain: while pieces of the broken second chain hanging down from the first
don't make it strictly more resilient than having just a single chain, in
general case it _is_ more reliable even if the other chain is weaker.
My chain analogy is supposed to relate to the storage stack as a 
whole: RAID is a single link in the chain, with whatever filesystem 
above it, and whatever storage drivers and hardware below.


Don't we have a patchset that deals with marking a device as failed at
runtime floating on the mailing list?  I did not look at those patches yet,
but they are a step in this direction.
There were some disagreements on whether the device should be released 
(that is, the node closed) immediately when we know it's failed, or 
should be held open until remount.



Using replication with a reliable device and a questionable device is
essentially the same as trying to add redundancy to a machine by adding an
extra linkage that doesn't always work and can get in the way of the main
linkage it's supposed to be protecting from failure.  Yes, it will work most
of the time, but the system is going to be less reliable than it is without
the 'redundancy'.


That's the current state of btrfs, but the design is sound, and reaching
more than parity with MD is a matter of implementation.
Indeed, however MD is still not perfectly reliable in this situation 
(though they are exponentially better than BTRFS at the moment).



Thus, I switched the machine to NBD (albeit it sucks on 100Mbit eth).  Alas,
the network driver allocates memory with GFP_NOIO which causes NBD
disconnects (somehow, this doesn't ever happen on swap where GFP_NOIO would
be obvious but on regular filesystem where throwing out userspace memory is
safe).  The disconnects happen around once per week.

Somewhat off-topic, but you might try looking at ATAoE as an alternative,
it's more reliable in my experience (if you've got a reliable network),
gives better performance (there's less protocol overhead than NBD, and it
runs on top of layer 2 instead of layer 4)


I've tested it -- not on the Odroid-U2 but on Pine64 (fully working GbE).
NBD delivers 108MB/sec in a linear transfer, ATAoE is lucky to break
40MB/sec, same target (Qnap-253a, spinning rust), both in default
configuration without further tuning.  NBD is over IPv6 for that extra 20
bytes per packet overhead.

Interesting, I've seen the exact opposite in terms of performance.


Also, NBD can be encrypted or arbitrarily routed.

Yes, though if you're on a local network, neither should matter :).



It's a single-device filesystem, thus disconnects are obviously fatal.  But,
they never caused even a single bit of damage (as scrub goes), thus proving
btrfs handles this kind of disconnects well.  Unlike times past, the kernel
doesn't get confused thus no reboot is needed, merely an unmount, "service
nbd-client restart", mount, restart the rebuild jobs.

That's expected behavior though.  _Single_ device BTRFS has nothing to get
out of sync most of the time, the only time there's any possibility of an
issue is when you die after writing the first copy of a block that's in a
dup profile chunk, but even that is not very likely to cause problems
(you'll just lose at most the last  worth of data).


How come?  In a DUP profile, the writes are: chunk 1, chunk2, barrier,
superblock.  The two prior writes may be arbitrarily reordered -- both
between each other or even individual sectors inside the chunks, but unless
the disk lies about barriers, there's no way to have any corruption, thus
running scrub is not needed.
If the device dies after writing chunk 1 but before the barrier, you end 
up needing scrub.  How much of a failure window is present is largely a 
function of how fast the device is, but there is a failure window there.



The moment you add another device though, that simplicity goes out the
window.


RAID1 doesn't seem less simple to me: if the new superblock has been
successfully written on at least one disk, barriers imply 

[PATCH 0/5] Rootdir refactor and small bug fixes

2017-10-18 Thread Qu Wenruo
The first 3 patches are small bug fixes which can be applied even if we
don't touch the functionality of --rootdir.

The last two patches will refactor --rootdir related functions (mainly
size_sourcedir and make_image) to mkfs/rootdir.[ch].
And rename them to btrfs_mkfs_size_dir() and btrfs_mkfs_fill_dir()
respectively.
Functionality is not changed at all, so it will still shrink the device
or use the first 1M of reserved space.

This moves about 700 lines, shrinking the original mkfs.c by about 1/3.

And by moving this ancient code to its own files, I also fixed several
small nits exposed by checkpatch script.

This provides a clean environment for later rootdir rework.

Qu Wenruo (5):
  btrfs-progs: Avoid BUG_ON for chunk allocation when ENOSPC happens
  btrfs-progs: mkfs: Fix overwritten return value for mkfs
  btrfs-progs: mkfs: Error out gracefully for --rootdir
  btrfs-progs: mkfs: Move image creation of rootdir to its own files
  btrfs-progs: mkfs: Move source dir size calculation to its own files

 Makefile   |   4 +-
 extent-tree.c  |   3 +-
 mkfs/main.c| 710 +--
 mkfs/rootdir.c | 735 +
 mkfs/rootdir.h |  32 +++
 volumes.c  |  18 +-
 6 files changed, 792 insertions(+), 710 deletions(-)
 create mode 100644 mkfs/rootdir.c
 create mode 100644 mkfs/rootdir.h

-- 
2.14.2



[PATCH 1/5] btrfs-progs: Avoid BUG_ON for chunk allocation when ENOSPC happens

2017-10-18 Thread Qu Wenruo
When passing a directory larger than the block device via the --rootdir
parameter, we get the following backtrace:

--
extent-tree.c:2693: btrfs_reserve_extent: BUG_ON `ret` triggered, value -28
./mkfs.btrfs(+0x1a05d)[0x557939e6b05d]
./mkfs.btrfs(btrfs_reserve_extent+0xb5a)[0x557939e710c8]
./mkfs.btrfs(+0xb0b6)[0x557939e5c0b6]
./mkfs.btrfs(main+0x15d5)[0x557939e5de04]
/usr/lib/libc.so.6(__libc_start_main+0xea)[0x7f83b101af6a]
./mkfs.btrfs(_start+0x2a)[0x557939e5af5a]
--

Nothing special, just BUG_ON() abuse in ancient code.

Fix it by returning the error code to the caller instead.

Signed-off-by: Qu Wenruo 
---
 extent-tree.c |  3 ++-
 volumes.c | 18 ++
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index 525a237e5923..055582c36da6 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2690,7 +2690,8 @@ int btrfs_reserve_extent(struct btrfs_trans_handle *trans,
   search_start, search_end, hint_byte, ins,
   trans->alloc_exclude_start,
   trans->alloc_exclude_nr, data);
-   BUG_ON(ret);
+   if (ret < 0)
+   return ret;
clear_extent_dirty(>free_space_cache,
   ins->objectid, ins->objectid + ins->offset - 1);
return ret;
diff --git a/volumes.c b/volumes.c
index 2209e5a9100b..e1ee27d5f3ce 100644
--- a/volumes.c
+++ b/volumes.c
@@ -1032,11 +1032,13 @@ again:
 info->chunk_root->root_key.objectid,
 BTRFS_FIRST_CHUNK_TREE_OBJECTID, key.offset,
 calc_size, _offset, 0);
-   BUG_ON(ret);
+   if (ret < 0)
+   goto out_chunk_map;
 
device->bytes_used += calc_size;
ret = btrfs_update_device(trans, device);
-   BUG_ON(ret);
+   if (ret < 0)
+   goto out_chunk_map;
 
map->stripes[index].dev = device;
map->stripes[index].physical = dev_offset;
@@ -1075,16 +1077,24 @@ again:
map->ce.size = *num_bytes;
 
ret = insert_cache_extent(>mapping_tree.cache_tree, >ce);
-   BUG_ON(ret);
+   if (ret < 0)
+   goto out_chunk_map;
 
if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
ret = btrfs_add_system_chunk(info, ,
chunk, btrfs_chunk_item_size(num_stripes));
-   BUG_ON(ret);
+   if (ret < 0)
+   goto out_chunk;
}
 
kfree(chunk);
return ret;
+
+out_chunk_map:
+   kfree(map);
+out_chunk:
+   kfree(chunk);
+   return ret;
 }
 
 /*
-- 
2.14.2
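
The pattern the patch applies (return the error and unwind through goto
labels instead of calling BUG_ON) can be sketched in isolation. This is a
minimal illustration, not the btrfs-progs code: toy_reserve() and
toy_alloc_chunk() are hypothetical names, and the 64-byte allocations merely
stand in for the map and chunk objects freed in the diff above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a step that can run out of space. */
int toy_reserve(long *free_bytes, long want)
{
	if (want > *free_bytes)
		return -ENOSPC;
	*free_bytes -= want;
	return 0;
}

/*
 * Allocate two bookkeeping objects, then reserve space. On any failure,
 * unwind through the labels in reverse order of allocation and hand the
 * error back to the caller rather than aborting the process.
 */
int toy_alloc_chunk(long *free_bytes, long want)
{
	char *map;
	char *chunk;
	int ret;

	map = malloc(64);
	if (!map)
		return -ENOMEM;

	chunk = malloc(64);
	if (!chunk) {
		ret = -ENOMEM;
		goto out_map;
	}

	ret = toy_reserve(free_bytes, want);
	if (ret < 0)
		goto out_chunk;

	/*
	 * Success. Real code would hand the map over to a cache here; this
	 * toy frees both objects on every path to stay leak-free.
	 */
	ret = 0;
out_chunk:
	free(chunk);
out_map:
	free(map);
	return ret;
}
```

With this shape, a request larger than the free space comes back as -ENOSPC
(-28 on Linux, the value seen in the backtrace), and the caller can decide how
to report it.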



[PATCH 3/5] btrfs-progs: mkfs: Error out gracefully for --rootdir

2017-10-18 Thread Qu Wenruo
The --rootdir option starts a transaction to fill the fs. However, if
something goes wrong, from ENOSPC to lack of permission, we don't commit
the transaction, and the uncommitted transaction triggers a BUG_ON:

--
extent buffer leak: start 29392896 len 16384
extent_io.c:579: free_extent_buffer: BUG_ON `eb->flags & EXTENT_DIRTY` triggered, value 1
--

The root fix is to introduce btrfs_abort_transaction() in btrfs-progs;
however, in this particular case we can work around it by force
committing the transaction.

During mkfs the btrfs magic is set to an invalid value, so unless
fs_info->finalize_on_close is set the fs can never be mounted.
So even if we force a commit of a bad transaction, we won't make things
any worse.

Signed-off-by: Qu Wenruo 
---
 mkfs/main.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/mkfs/main.c b/mkfs/main.c
index 5817f114c1a1..8c332aa1e12a 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1073,6 +1073,19 @@ static int make_image(const char *source_dir, struct btrfs_root *root)
printf("Making image is completed.\n");
return 0;
 fail:
+   /*
+* Since we don't have btrfs_abort_transaction() yet, uncommitted trans
+* will trigger a BUG_ON().
+*
+* However before mkfs is fully finished, the magic number is invalid,
+* so even we commit transaction here, the fs still can't be mounted.
+*
+* To do a graceful error out, here we commit transaction as a
+* workaround.
+* Since we have already hit some problem, the return value doesn't
+* matter now.
+*/
+   btrfs_commit_transaction(trans, root);
while (!list_empty(_head.list)) {
dir_entry = list_entry(dir_head.list.next,
   struct directory_name_entry, list);
-- 
2.14.2



[PATCH 2/5] btrfs-progs: mkfs: Fix overwritten return value for mkfs

2017-10-18 Thread Qu Wenruo
For mkfs failures, especially --rootdir errors like EPERM/ENOSPC, the out
branch overwrites the return value, causing a wrong exit status.

Signed-off-by: Qu Wenruo 
---
 mkfs/main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mkfs/main.c b/mkfs/main.c
index 1b4cabc1ef90..5817f114c1a1 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1423,6 +1423,7 @@ int main(int argc, char **argv)
int zero_end = 1;
int fd = -1;
int ret;
+   int close_ret;
int i;
int mixed = 0;
int nodesize_forced = 0;
@@ -1938,9 +1939,9 @@ raid_groups:
 */
fs_info->finalize_on_close = 1;
 out:
-   ret = close_ctree(root);
+   close_ret = close_ctree(root);
 
-   if (!ret) {
+   if (!close_ret) {
optind = saved_optind;
dev_cnt = argc - optind;
while (dev_cnt-- > 0) {
-- 
2.14.2
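
The problem this patch addresses can be reproduced in a few lines. The sketch
below is an illustration only: toy_fill(), toy_close() and the two toy_mkfs_*
functions are hypothetical stand-ins, not the mkfs code. It contrasts a
cleanup path that reuses ret with one that keeps the close result in a
separate variable, as the diff does with close_ret.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins: the fill step may fail, the close step succeeds. */
int toy_fill(int fail)
{
	return fail ? -EPERM : 0;
}

int toy_close(void)
{
	return 0;
}

/* Buggy shape: the close result replaces the earlier error. */
int toy_mkfs_overwrite(int fail)
{
	int ret;

	ret = toy_fill(fail);
	/* Control reaches the cleanup path whether or not the fill failed. */
	ret = toy_close();
	return ret;
}

/* Fixed shape: the close result lives in its own variable. */
int toy_mkfs_preserve(int fail)
{
	int ret;
	int close_ret;

	ret = toy_fill(fail);
	close_ret = toy_close();
	if (!close_ret) {
		/* Work that only makes sense after a clean close goes here. */
	}
	return ret;
}
```

Whether a failing close should also be reported when the earlier steps
succeeded is a separate decision that this sketch leaves out.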



[PATCH 4/5] btrfs-progs: mkfs: Move image creation of rootdir to its own files

2017-10-18 Thread Qu Wenruo
In fact, --rootdir option is getting more and more independent from
normal mkfs code.

So move image creation function, make_image() and its related code to
mkfs/rootdir.[ch], and rename the function to btrfs_mkfs_fill_dir().

Signed-off-by: Qu Wenruo 
---
 Makefile   |   4 +-
 mkfs/main.c| 652 +--
 mkfs/rootdir.c | 672 +
 mkfs/rootdir.h |  30 +++
 4 files changed, 706 insertions(+), 652 deletions(-)
 create mode 100644 mkfs/rootdir.c
 create mode 100644 mkfs/rootdir.h

diff --git a/Makefile b/Makefile
index d0657aaea0f5..12747547766f 100644
--- a/Makefile
+++ b/Makefile
@@ -113,7 +113,7 @@ cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
   cmds-restore.o cmds-rescue.o chunk-recover.o super-recover.o \
   cmds-property.o cmds-fi-usage.o cmds-inspect-dump-tree.o \
   cmds-inspect-dump-super.o cmds-inspect-tree-stats.o cmds-fi-du.o \
-  mkfs/common.o
+  mkfs/common.o mkfs/rootdir.o
 libbtrfs_objects = send-stream.o send-utils.o kernel-lib/rbtree.o btrfs-list.o \
   kernel-lib/crc32c.o messages.o \
   uuid-tree.o utils-lib.o rbtree-utils.o
@@ -123,7 +123,7 @@ libbtrfs_headers = send-stream.h send-utils.h send.h kernel-lib/rbtree.h btrfs-l
   extent-cache.h extent_io.h ioctl.h ctree.h btrfsck.h version.h
 convert_objects = convert/main.o convert/common.o convert/source-fs.o \
  convert/source-ext2.o convert/source-reiserfs.o
-mkfs_objects = mkfs/main.o mkfs/common.o
+mkfs_objects = mkfs/main.o mkfs/common.o mkfs/rootdir.o
 image_objects = image/main.o
 all_objects = $(objects) $(cmds_objects) $(libbtrfs_objects) $(convert_objects) \
  $(mkfs_objects) $(image_objects)
diff --git a/mkfs/main.c b/mkfs/main.c
index 8c332aa1e12a..693a9d85f6b6 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -24,17 +24,12 @@
 #include "ioctl.h"
 #include 
 #include 
-#include 
-#include 
 /* #include  included via androidcompat.h */
 #include 
 #include 
 #include 
 #include 
 #include 
-#include 
-#include 
-#include 
 #include 
 #include 
 #include "ctree.h"
@@ -45,20 +40,11 @@
 #include "list_sort.h"
 #include "help.h"
 #include "mkfs/common.h"
+#include "mkfs/rootdir.h"
 #include "fsfeatures.h"
 
-int path_cat_out(char *out, const char *p1, const char *p2);
-
-static u64 index_cnt = 2;
 static int verbose = 1;
 
-struct directory_name_entry {
-   const char *dir_name;
-   const char *path;
-   ino_t inum;
-   struct list_head list;
-};
-
 struct mkfs_allocation {
u64 data;
u64 metadata;
@@ -415,583 +401,6 @@ static char *parse_label(const char *input)
return strdup(input);
 }
 
-static int add_directory_items(struct btrfs_trans_handle *trans,
-  struct btrfs_root *root, u64 objectid,
-  ino_t parent_inum, const char *name,
-  struct stat *st, int *dir_index_cnt)
-{
-   int ret;
-   int name_len;
-   struct btrfs_key location;
-   u8 filetype = 0;
-
-   name_len = strlen(name);
-
-   location.objectid = objectid;
-   location.offset = 0;
-   location.type = BTRFS_INODE_ITEM_KEY;
-
-   if (S_ISDIR(st->st_mode))
-   filetype = BTRFS_FT_DIR;
-   if (S_ISREG(st->st_mode))
-   filetype = BTRFS_FT_REG_FILE;
-   if (S_ISLNK(st->st_mode))
-   filetype = BTRFS_FT_SYMLINK;
-   if (S_ISSOCK(st->st_mode))
-   filetype = BTRFS_FT_SOCK;
-   if (S_ISCHR(st->st_mode))
-   filetype = BTRFS_FT_CHRDEV;
-   if (S_ISBLK(st->st_mode))
-   filetype = BTRFS_FT_BLKDEV;
-   if (S_ISFIFO(st->st_mode))
-   filetype = BTRFS_FT_FIFO;
-
-   ret = btrfs_insert_dir_item(trans, root, name, name_len,
-   parent_inum, ,
-   filetype, index_cnt);
-   if (ret)
-   return ret;
-   ret = btrfs_insert_inode_ref(trans, root, name, name_len,
-objectid, parent_inum, index_cnt);
-   *dir_index_cnt = index_cnt;
-   index_cnt++;
-
-   return ret;
-}
-
-static int fill_inode_item(struct btrfs_trans_handle *trans,
-  struct btrfs_root *root,
-  struct btrfs_inode_item *dst, struct stat *src)
-{
-   u64 blocks = 0;
-   u64 sectorsize = root->fs_info->sectorsize;
-
-   /*
-* btrfs_inode_item has some reserved fields
-* and represents on-disk inode entry, so
-* zero everything to prevent information leak
-*/
-   memset(dst, 0, sizeof (*dst));
-
-   btrfs_set_stack_inode_generation(dst, trans->transid);
-   btrfs_set_stack_inode_size(dst, src->st_size);
-   

[PATCH 5/5] btrfs-progs: mkfs: Move source dir size calculation to its own files

2017-10-18 Thread Qu Wenruo
Also rename the function from size_sourcedir() to btrfs_mkfs_size_dir().

Signed-off-by: Qu Wenruo 
---
 mkfs/main.c| 66 ++
 mkfs/rootdir.c | 63 +++
 mkfs/rootdir.h |  2 ++
 3 files changed, 67 insertions(+), 64 deletions(-)
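
For orientation while reviewing the move, the chunk arithmetic in size_sourcedir() can be sketched in plain C as below. This is only an illustration, not part of the patch: estimate_total_size() is a hypothetical name, the SZ_* macros are redefined locally, and dir_size is assumed to be the block-rounded total already collected by the ftw() walk.

```c
#include <stdint.h>

#define SZ_1M (1024ULL * 1024ULL)
#define SZ_8M (8ULL * SZ_1M)

/*
 * Hypothetical stand-in for the arithmetic in size_sourcedir().
 * dir_size: block-rounded total size of the source tree in bytes.
 * Returns the estimated image size and fills in the two out parameters.
 */
static uint64_t estimate_total_size(uint64_t dir_size,
				    uint64_t *num_of_meta_chunks_ret,
				    uint64_t *size_of_data_ret)
{
	const uint64_t chunk = SZ_8M;
	/* One 8M metadata chunk is treated as already allocated. */
	const uint64_t allocated_meta_chunks = SZ_8M / chunk;
	const uint64_t allocated_total_size = 20 * SZ_1M;

	/* Data: round the tree size up to whole chunks. */
	uint64_t data_chunks = (dir_size + chunk - 1) / chunk;
	/* Metadata: budget half the tree size, rounded up to whole chunks. */
	uint64_t meta_chunks = (dir_size / 2 + chunk - 1) / chunk;

	/* Only count metadata chunks beyond the preallocated one. */
	if (meta_chunks <= allocated_meta_chunks)
		meta_chunks = 0;
	else
		meta_chunks -= allocated_meta_chunks;

	*num_of_meta_chunks_ret = meta_chunks;
	*size_of_data_ret = data_chunks * chunk;
	return allocated_total_size + data_chunks * chunk +
	       meta_chunks * chunk;
}
```

With a 100 MiB source tree this gives 13 data chunks (104 MiB) and 6 additional metadata chunks (48 MiB), so the estimate is 20 + 104 + 48 = 172 MiB.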

diff --git a/mkfs/main.c b/mkfs/main.c
index 693a9d85f6b6..e2ebe3ce069f 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -31,7 +31,6 @@
 #include 
 #include 
 #include 
-#include 
 #include "ctree.h"
 #include "disk-io.h"
 #include "volumes.h"
@@ -448,67 +447,6 @@ static int create_chunks(struct btrfs_trans_handle *trans,
return ret;
 }
 
-/*
- * This ignores symlinks with unreadable targets and subdirs that can't
- * be read.  It's a best-effort to give a rough estimate of the size of
- * a subdir.  It doesn't guarantee that prepopulating btrfs from this
- * tree won't still run out of space.
- */
-static u64 global_total_size;
-static u64 fs_block_size;
-static int ftw_add_entry_size(const char *fpath, const struct stat *st,
- int type)
-{
-   if (type == FTW_F || type == FTW_D)
-   global_total_size += round_up(st->st_size, fs_block_size);
-
-   return 0;
-}
-
-static u64 size_sourcedir(const char *dir_name, u64 sectorsize,
- u64 *num_of_meta_chunks_ret, u64 *size_of_data_ret)
-{
-   u64 dir_size = 0;
-   u64 total_size = 0;
-   int ret;
-   u64 default_chunk_size = SZ_8M;
-   u64 allocated_meta_size = SZ_8M;
-   u64 allocated_total_size = 20 * SZ_1M;  /* 20MB */
-   u64 num_of_meta_chunks = 0;
-   u64 num_of_data_chunks = 0;
-   u64 num_of_allocated_meta_chunks =
-   allocated_meta_size / default_chunk_size;
-
-   global_total_size = 0;
-   fs_block_size = sectorsize;
-   ret = ftw(dir_name, ftw_add_entry_size, 10);
-   dir_size = global_total_size;
-   if (ret < 0) {
-   error("ftw subdir walk of %s failed: %s", dir_name,
-   strerror(errno));
-   exit(1);
-   }
-
-   num_of_data_chunks = (dir_size + default_chunk_size - 1) /
-   default_chunk_size;
-
-   num_of_meta_chunks = (dir_size / 2) / default_chunk_size;
-   if (((dir_size / 2) % default_chunk_size) != 0)
-   num_of_meta_chunks++;
-   if (num_of_meta_chunks <= num_of_allocated_meta_chunks)
-   num_of_meta_chunks = 0;
-   else
-   num_of_meta_chunks -= num_of_allocated_meta_chunks;
-
-   total_size = allocated_total_size +
-(num_of_data_chunks * default_chunk_size) +
-(num_of_meta_chunks * default_chunk_size);
-
-   *num_of_meta_chunks_ret = num_of_meta_chunks;
-   *size_of_data_ret = num_of_data_chunks * default_chunk_size;
-   return total_size;
-}
-
 static int zero_output_file(int out_fd, u64 size)
 {
int loop_num;
@@ -1079,8 +1017,8 @@ int main(int argc, char **argv)
goto error;
}
 
-   source_dir_size = size_sourcedir(source_dir, sectorsize,
-                                    &num_of_meta_chunks, &size_of_data);
+   source_dir_size = btrfs_mkfs_size_dir(source_dir, sectorsize,
+                   &num_of_meta_chunks, &size_of_data);
if(block_count < source_dir_size)
block_count = source_dir_size;
ret = zero_output_file(fd, block_count);
diff --git a/mkfs/rootdir.c b/mkfs/rootdir.c
index 2cc8a3ac06d8..83a3191d2bd7 100644
--- a/mkfs/rootdir.c
+++ b/mkfs/rootdir.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "internal.h"
 #include "disk-io.h"
@@ -33,6 +34,15 @@
 #include "mkfs/rootdir.h"
 #include "send-utils.h"
 
+/*
+ * This ignores symlinks with unreadable targets and subdirs that can't
+ * be read.  It's a best-effort to give a rough estimate of the size of
+ * a subdir.  It doesn't guarantee that prepopulating btrfs from this
+ * tree won't still run out of space.
+ */
+static u64 global_total_size;
+static u64 fs_block_size;
+
 static u64 index_cnt = 2;
 
 static int add_directory_items(struct btrfs_trans_handle *trans,
@@ -670,3 +680,56 @@ fail:
 out:
return ret;
 }
+
+static int ftw_add_entry_size(const char *fpath, const struct stat *st,
+ int type)
+{
+   if (type == FTW_F || type == FTW_D)
+   global_total_size += round_up(st->st_size, fs_block_size);
+
+   return 0;
+}
+
+u64 btrfs_mkfs_size_dir(const char *dir_name, u64 sectorsize,
+   u64 *num_of_meta_chunks_ret, u64 *size_of_data_ret)
+{
+   u64 dir_size = 0;
+   u64 total_size = 0;
+   int ret;
+   u64 default_chunk_size = SZ_8M;
+   u64 allocated_meta_size = SZ_8M;
+   u64 allocated_total_size = 20 * SZ_1M;  /* 20MB */
+   u64 

[PATCH] Btrfs: ref-verify: Fix NULL vs IS_ERR() check in walk_down_tree()

2017-10-18 Thread Dan Carpenter
read_tree_block() returns error pointers and never NULL, so I have
updated the error handling to check IS_ERR() instead.

Fixes: 74739121b4c7 ("Btrfs: add a extent ref verify tool")
Signed-off-by: Dan Carpenter 

diff --git a/fs/btrfs/ref-verify.c b/fs/btrfs/ref-verify.c
index f65d78cf3c7e..34878699d363 100644
--- a/fs/btrfs/ref-verify.c
+++ b/fs/btrfs/ref-verify.c
@@ -584,7 +584,9 @@ static int walk_down_tree(struct btrfs_root *root, struct btrfs_path *path,
gen = btrfs_node_ptr_generation(path->nodes[level],
path->slots[level]);
eb = read_tree_block(fs_info, block_bytenr, gen);
-   if (!eb || !extent_buffer_uptodate(eb)) {
+   if (IS_ERR(eb))
+   return PTR_ERR(eb);
+   if (!extent_buffer_uptodate(eb)) {
free_extent_buffer(eb);
return -EIO;
}
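
The difference between the two checks can be sketched in userspace C as below. This is an illustration only: ERR_PTR(), PTR_ERR() and IS_ERR() are local imitations of the kernel helpers, and fake_read_block() and check_block() are hypothetical stand-ins, not kernel functions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical reader: returns an error pointer on failure, never NULL. */
static void *fake_read_block(int fail)
{
	static int block;

	return fail ? ERR_PTR(-EIO) : (void *)&block;
}

/* Returns 0 on success or a negative errno taken from the error pointer. */
static int check_block(int fail)
{
	void *eb = fake_read_block(fail);

	/* A "!eb" test would not catch this: an error pointer is non-NULL. */
	if (IS_ERR(eb))
		return (int)PTR_ERR(eb);
	return 0;
}
```

In this sketch a failed read yields a non-NULL pointer, so a NULL test lets it through and the next dereference would touch an invalid address.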


Re: [PATCH] btrfs: Fix bug for misused dev_t when lookup in dev state hash table.

2017-10-18 Thread Nikolay Borisov


On 18.10.2017 06:43, Gu, Jinxiang wrote:
> Hi,
> 
>> -Original Message-
>> From: Nikolay Borisov [mailto:nbori...@suse.com]
>> Sent: Tuesday, October 17, 2017 9:36 PM
>> To: Gu, Jinxiang/顾 金香 ; linux-btrfs@vger.kernel.org; 
>> h...@lst.de
>> Subject: Re: [PATCH] btrfs: Fix bug for misused dev_t when lookup in dev 
>> state hash table.
>>
>>
>>
>> On 17.10.2017 14:34, Gu Jinxiang wrote:
>>> From: Gu JinXiang 
>>>
>>> Fix a bug introduced by commit 74d46992e0d9
>>> ("block: replace bi_bdev with a gendisk pointer and partitions index").
>>>
>>> That commit uses bio_dev(bio) to look up the dev state in
>>> __btrfsic_submit_bio(). But when a dev_state is added to the hashtable,
>>> the dev_t of the block_device is used.
>>
>> This is rather incomprehensible. So bio_dev(bio) actually returns the dev_t
>> of the device to which this bio is submitted, and the same dev_t should be
>> used when btrfsic_dev_state_hashtable_add is called? What am I missing here?
>>
> 
> bio_dev(bio) returns the dev_t of part0, which is different from the dev_t
> in block_device (bd_dev).
> bd_dev in block_device represents the exact partition:
> block_device.bd_dev = bio_dev(bio) + bio->bi_partno
> (bi_partno is the same as block_device.bd_partno).
> 
> When a dev_state is added to the hashtable, the exact partition's dev_t is used.
> So the lookup should also use the exact partition's dev_t.

Right, ok. Can you please put this explanation into the changelog of the
patch and resend?
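
A small userspace sketch of the relationship described above, for illustration only. MKDEV() and MINORBITS are redefined locally in the kernel's style, HASHTABLE_SIZE is an assumed value standing in for BTRFSIC_DEV2STATE_HASHTABLE_SIZE, and dev_hash() and partition_dev() are hypothetical helpers.

```c
#include <stdint.h>

/* Kernel-style dev_t encoding, redefined here for the sketch. */
#define MINORBITS 20
#define MKDEV(ma, mi) (((uint32_t)(ma) << MINORBITS) | (uint32_t)(mi))

/* Assumed table size; must be a power of two for the mask to work. */
#define HASHTABLE_SIZE 0x100u

/* Same masking scheme as btrfsic_dev_state_hashtable_add(). */
static unsigned int dev_hash(uint32_t dev)
{
	return dev & (HASHTABLE_SIZE - 1);
}

/* dev_t of a partition, given the whole-disk (part0) dev_t. */
static uint32_t partition_dev(uint32_t disk_dev, unsigned int partno)
{
	return disk_dev + partno;
}
```

Taking 8:0 as the whole-disk dev_t and partition 7 as an example, partition_dev() gives 8:7. The two values hash to different buckets, so a lookup keyed on the part0 dev_t cannot find an entry that was added under the partition's dev_t.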

> 
>>
>>>
>>> To reproduce this bug:
>>> Run btrfs/001 in xfstests with MOUNT_OPTIONS="-o check_int".
>>> A WARNING like the one below is then printed.
>>> WARNING:
>>> btrfs: attempt to write superblock which references block M @29523968 (sda7/654400/2) which is never written!
>>>
>>> Signed-off-by: Gu JinXiang 
>>> ---
>>>  fs/btrfs/check-integrity.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
>>> index fb07e3c22b9a..02f9eb83173f 100644
>>> --- a/fs/btrfs/check-integrity.c
>>> +++ b/fs/btrfs/check-integrity.c
>>> @@ -2803,7 +2803,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
>>> mutex_lock(&btrfsic_mutex);
>>> /* since btrfsic_submit_bio() is also called before
>>>  * btrfsic_mount(), this might return NULL */
>>> -   dev_state = btrfsic_dev_state_lookup(bio_dev(bio));
>>> +   dev_state = btrfsic_dev_state_lookup(bio_dev(bio) + bio->bi_partno);
>>
>> So this function looks up in btrfsic_dev_state_hashtable. And stuff in this
>> hashtable is added via the btrfsic_dev_state_hashtable_add function, which
>> seems to be using only the dev_t (after your other patch is applied):
>>
>> static void btrfsic_dev_state_hashtable_add(
>>         struct btrfsic_dev_state *ds,
>>         struct btrfsic_dev_state_hashtable *h)
>> {
>>     const unsigned int hashval =
>>         (((unsigned int)((uintptr_t)ds->bdev->bd_dev)) &
>>          (BTRFSIC_DEV2STATE_HASHTABLE_SIZE - 1));
>>
>>     list_add(&ds->collision_resolving_node, h->table + hashval);
>> }
>>
>>
>> So how come your change is correct since you are passing the dev_t + 
>> partition number?
>>
>>> if (NULL != dev_state &&
>>> (bio_op(bio) == REQ_OP_WRITE) && bio_has_data(bio)) {
>>> unsigned int i = 0;
>>>
>>
> 
> 
> 