Removing bad hdd from btrfs volume

2015-08-06 Thread Peter Foley
Hi,

I have a btrfs volume that spans multiple disks (no raid, just
single), and earlier this morning I hit some hardware problems with
one of the disks.
I tried btrfs dev del /dev/sda1 /, but btrfs was unable to migrate the
1 GB that appears to be causing the read errors.
See http://sprunge.us/aeZC
Is there some way to figure out which file(s) are affected, and if
they are files I don't care about, is there some way to force btrfs to
drop the 1 GB it can't copy off of the failing hdd?

Thanks,

Peter Foley
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/11] Btrfs: fallocate: Work with sectorsized blocks

2015-08-06 Thread Chandan Rajendra
While at it, this commit changes btrfs_truncate_page() to truncate sectorsized
blocks instead of pages. Hence the function has been renamed to
btrfs_truncate_block().
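
(Illustration only, not part of the patch.) The block-granular "same block" test that replaces the page-granular one can be sketched in user space roughly as follows. The helper name range_in_same_block() is made up for this example, and it assumes len > 0 and a power-of-two block size of (1 << blocksize_bits).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace sketch (hypothetical helper, not kernel code): true when the
 * byte range [offset, offset + len) lies entirely within one block of
 * size (1 << blocksize_bits). Assumes len > 0.
 */
static inline bool range_in_same_block(uint64_t offset, uint64_t len,
				       unsigned int blocksize_bits)
{
	return (offset >> blocksize_bits) ==
	       ((offset + len - 1) >> blocksize_bits);
}
```

For example, the range starting at offset 512 with length 1024 sits inside a single 4K block (blocksize_bits = 12) but straddles two 1K blocks (blocksize_bits = 10), which is why the check has to use the filesystem block size rather than the page size.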

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/ctree.h |  2 +-
 fs/btrfs/file.c  | 47 +--
 fs/btrfs/inode.c | 52 +++-
 3 files changed, 53 insertions(+), 48 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index aac314e..fec5fa9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3897,7 +3897,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct inode *dir, u64 objectid,
const char *name, int name_len);
-int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len,
+int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
int front);
 int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e3b2b3c..f69e030 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2278,23 +2278,26 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
u64 tail_len;
u64 orig_start = offset;
u64 cur_offset;
+   unsigned char blocksize_bits;
u64 min_size = btrfs_calc_trunc_metadata_size(root, 1);
u64 drop_end;
int ret = 0;
int err = 0;
int rsv_count;
-   bool same_page;
+   bool same_block;
bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES);
u64 ino_size;
-   bool truncated_page = false;
+   bool truncated_block = false;
bool updated_inode = false;
 
+   blocksize_bits = inode->i_blkbits;
+
ret = btrfs_wait_ordered_range(inode, offset, len);
if (ret)
return ret;
 
mutex_lock(&inode->i_mutex);
-   ino_size = round_up(inode->i_size, PAGE_CACHE_SIZE);
+   ino_size = round_up(inode->i_size, root->sectorsize);
ret = find_first_non_hole(inode, &offset, &len);
if (ret < 0)
goto out_only_mutex;
@@ -2307,31 +2310,30 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
lockstart = round_up(offset, BTRFS_I(inode)->root->sectorsize);
lockend = round_down(offset + len,
 BTRFS_I(inode)->root->sectorsize) - 1;
-   same_page = ((offset >> PAGE_CACHE_SHIFT) ==
-   ((offset + len - 1) >> PAGE_CACHE_SHIFT));
-
+   same_block = ((offset >> blocksize_bits)
+   == ((offset + len - 1) >> blocksize_bits));
/*
-* We needn't truncate any page which is beyond the end of the file
+* We needn't truncate any block which is beyond the end of the file
 * because we are sure there is no data there.
 */
/*
-* Only do this if we are in the same page and we aren't doing the
-* entire page.
+* Only do this if we are in the same block and we aren't doing the
+* entire block.
 */
-   if (same_page && len < PAGE_CACHE_SIZE) {
+   if (same_block && len < root->sectorsize) {
if (offset < ino_size) {
-   truncated_page = true;
-   ret = btrfs_truncate_page(inode, offset, len, 0);
+   truncated_block = true;
+   ret = btrfs_truncate_block(inode, offset, len, 0);
} else {
ret = 0;
}
goto out_only_mutex;
}
 
-   /* zero back part of the first page */
+   /* zero back part of the first block */
if (offset < ino_size) {
-   truncated_page = true;
-   ret = btrfs_truncate_page(inode, offset, 0, 0);
+   truncated_block = true;
+   ret = btrfs_truncate_block(inode, offset, 0, 0);
if (ret) {
mutex_unlock(&inode->i_mutex);
return ret;
@@ -2366,9 +2368,10 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
if (!ret) {
/* zero the front end of the last page */
if (tail_start + tail_len < ino_size) {
-   truncated_page = true;
-   ret = btrfs_truncate_page(inode,
-   tail_start + tail_len, 0, 1);
+   truncated_block = true;
+   ret = btrfs_truncate_block(inode,
+   tail_start + tail_len,
+   0, 1);
if (ret)
goto 

[PATCH 08/11] Btrfs: btrfs_submit_direct_hook: Handle map_length < bio vector length

2015-08-06 Thread Chandan Rajendra
In subpagesize-blocksize scenario, map_length can be less than the length of a
bio vector. Such a condition may cause btrfs_submit_direct_hook() to submit a
zero length bio. Fix this by comparing map_length against block size rather
than with bv_len.
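
(Illustration only, not part of the patch.) A simplified userspace model of the per-block loop, to show why comparing against the block size avoids submitting an empty bio. The helper count_bios() is hypothetical, and it assumes map_length stays constant, whereas the kernel code remaps it for every new bio.

```c
#include <stdint.h>

/*
 * Userspace sketch (hypothetical helper, not kernel code): count how many
 * bios are needed to carry one bio vector of bv_len bytes when blocks are
 * added one at a time and a bio may hold at most map_length bytes.
 * Simplification: map_length is constant here. Returns 0 if not even one
 * block fits into a bio.
 */
static inline unsigned int count_bios(uint32_t bv_len, uint32_t blocksize,
				      uint64_t map_length)
{
	uint64_t submit_len = 0;
	uint32_t nr_sectors;
	unsigned int bios = 1;

	if (blocksize == 0 || map_length < blocksize)
		return 0;

	for (nr_sectors = bv_len / blocksize; nr_sectors; nr_sectors--) {
		if (map_length < submit_len + blocksize) {
			/* Current bio is full: submit it and start a new one. */
			bios++;
			submit_len = 0;
		}
		submit_len += blocksize;
	}
	return bios;
}
```

With a 4096-byte bio vector, 1024-byte blocks and a map_length of 2048, this model fills two bios with two blocks each. A check against the whole bv_len (4096) would instead find that nothing fits and would submit the first bio while it is still empty.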

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/inode.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index dad76ef..1acee74 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8110,9 +8110,11 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
u64 file_offset = dip->logical_offset;
u64 submit_len = 0;
u64 map_length;
-   int nr_pages = 0;
-   int ret;
+   u32 blocksize = root->sectorsize;
int async_submit = 0;
+   int nr_sectors;
+   int ret;
+   int i;
 
map_length = orig_bio->bi_iter.bi_size;
ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
@@ -8142,9 +8144,12 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
atomic_inc(&dip->pending_bios);
 
while (bvec <= (orig_bio->bi_io_vec + orig_bio->bi_vcnt - 1)) {
-   if (map_length < submit_len + bvec->bv_len ||
-   bio_add_page(bio, bvec->bv_page, bvec->bv_len,
-bvec->bv_offset) < bvec->bv_len) {
+   nr_sectors = bvec->bv_len >> inode->i_blkbits;
+   i = 0;
+next_block:
+   if (unlikely(map_length < submit_len + blocksize ||
+   bio_add_page(bio, bvec->bv_page, blocksize,
+   bvec->bv_offset + (i * blocksize)) < blocksize)) {
/*
 * inc the count before we submit the bio so
 * we know the end IO handler won't happen before
@@ -8165,7 +8170,6 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
file_offset += submit_len;
 
submit_len = 0;
-   nr_pages = 0;
 
bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev,
  start_sector, GFP_NOFS);
@@ -8183,9 +8187,14 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
bio_put(bio);
goto out_err;
}
+
+   goto next_block;
} else {
-   submit_len += bvec->bv_len;
-   nr_pages++;
+   submit_len += blocksize;
+   if (--nr_sectors) {
+   i++;
+   goto next_block;
+   }
bvec++;
}
}
-- 
2.1.0



FW: btrfs-progs: android build

2015-08-06 Thread 강상우
Hi, I made a btrfs-progs Android build script and tested it.
I need some help with btrfs_wipe_existing_sb().
In my testing it appears to work well,
but I'm not sure it's OK.


0001-btrfs-progs-Add-Android-build-mk-file.patch
Description: Binary data


[PATCH 09/11] Btrfs: Limit inline extents to root->sectorsize

2015-08-06 Thread Chandan Rajendra
cow_file_range_inline() limits the size of an inline extent to
PAGE_CACHE_SIZE. This breaks in subpagesize-blocksize scenarios. Fix this by
comparing against root->sectorsize.
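
(Illustration only, not part of the patch.) The rejection conditions touched by this change can be modelled in user space roughly as below. Only the conditions visible in the hunk are reproduced; the real function has further checks. The helper name and the max_inline_data parameter (standing in for BTRFS_MAX_INLINE_DATA_SIZE(root)) are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace sketch (hypothetical helper, not kernel code): returns true
 * when the data must NOT be stored as an inline extent, based only on the
 * conditions shown in the hunk. sectorsize must be a power of two.
 */
static inline bool inline_extent_rejected(uint64_t start, uint64_t actual_end,
					  uint64_t data_len,
					  uint64_t compressed_size,
					  uint32_t sectorsize,
					  uint64_t max_inline_data)
{
	return start > 0 ||
	       actual_end > sectorsize ||
	       data_len > max_inline_data ||
	       (!compressed_size && (actual_end & (sectorsize - 1)) == 0);
}
```

On a machine with 64K pages and a 4K sectorsize, a compressed file ending at byte 5000 passes a page-size limit but is rejected once the limit is the sectorsize, which is the case the patch is concerned with.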

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1acee74..daf2462 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -257,7 +257,7 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
data_len = compressed_size;
 
if (start > 0 ||
-   actual_end > PAGE_CACHE_SIZE ||
+   actual_end > root->sectorsize ||
data_len > BTRFS_MAX_INLINE_DATA_SIZE(root) ||
(!compressed_size &&
(actual_end & (root->sectorsize - 1)) == 0) ||
-- 
2.1.0



[PATCH 02/11] Btrfs: Compute and look up csums based on sectorsized blocks

2015-08-06 Thread Chandan Rajendra
Checksums are applicable to sectorsize units. The current code uses
bio->bv_len units to compute and look up checksums. This works on machines
where sectorsize == PAGE_SIZE. This patch makes the checksum computation and
lookup code work with sectorsize units.
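
(Illustration only, not part of the patch.) The number of checksummed blocks covered by one bio vector follows from rounding its length up to the sectorsize. A userspace sketch, with a made-up helper name and the assumption that sectorsize == (1 << blkbits):

```c
#include <stdint.h>

/*
 * Userspace sketch (hypothetical helper, not kernel code): number of
 * sectorsize blocks, and therefore checksums, covered by a bio vector of
 * bv_len bytes. Assumes sectorsize == (1 << blkbits).
 */
static inline uint32_t csum_blocks_in_bvec(uint32_t bv_len, uint32_t sectorsize,
					   unsigned int blkbits)
{
	return (bv_len + sectorsize - 1) >> blkbits;
}
```

A 64K bio vector on a filesystem with 4K blocks therefore needs 16 checksums rather than one, which is the case that per-bv_len accounting gets wrong.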

Reviewed-by: Liu Bo bo.li@oracle.com
Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/file-item.c | 90 +---
 1 file changed, 57 insertions(+), 33 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 58ece65..d752051 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -172,6 +172,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
u64 item_start_offset = 0;
u64 item_last_offset = 0;
u64 disk_bytenr;
+   u64 page_bytes_left;
u32 diff;
int nblocks;
int bio_index = 0;
@@ -220,6 +221,8 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
if (dio)
offset = logical_offset;
+
+   page_bytes_left = bvec->bv_len;
while (bio_index < bio->bi_vcnt) {
if (!dio)
offset = page_offset(bvec->bv_page) + bvec->bv_offset;
@@ -243,7 +246,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
if (BTRFS_I(inode)->root->root_key.objectid ==
BTRFS_DATA_RELOC_TREE_OBJECTID) {
set_extent_bits(io_tree, offset,
-   offset + bvec->bv_len - 1,
+   offset + root->sectorsize - 1,
EXTENT_NODATASUM, GFP_NOFS);
} else {

btrfs_info(BTRFS_I(inode)->root->fs_info,
@@ -281,11 +284,17 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 found:
csum += count * csum_size;
nblocks -= count;
-   bio_index += count;
+
while (count--) {
-   disk_bytenr += bvec->bv_len;
-   offset += bvec->bv_len;
-   bvec++;
+   disk_bytenr += root->sectorsize;
+   offset += root->sectorsize;
+   page_bytes_left -= root->sectorsize;
+   if (!page_bytes_left) {
+   bio_index++;
+   bvec++;
+   page_bytes_left = bvec->bv_len;
+   }
+
}
}
btrfs_free_path(path);
@@ -432,6 +441,8 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
struct bio_vec *bvec = bio->bi_io_vec;
int bio_index = 0;
int index;
+   int nr_sectors;
+   int i;
unsigned long total_bytes = 0;
unsigned long this_sum_bytes = 0;
u64 offset;
@@ -459,41 +470,54 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
if (!contig)
offset = page_offset(bvec->bv_page) + bvec->bv_offset;
 
-   if (offset >= ordered->file_offset + ordered->len ||
-   offset < ordered->file_offset) {
-   unsigned long bytes_left;
-   sums->len = this_sum_bytes;
-   this_sum_bytes = 0;
-   btrfs_add_ordered_sum(inode, ordered, sums);
-   btrfs_put_ordered_extent(ordered);
+   data = kmap_atomic(bvec->bv_page);
 
-   bytes_left = bio->bi_iter.bi_size - total_bytes;
 
-   sums = kzalloc(btrfs_ordered_sum_size(root, bytes_left),
-  GFP_NOFS);
-   BUG_ON(!sums); /* -ENOMEM */
-   sums->len = bytes_left;
-   ordered = btrfs_lookup_ordered_extent(inode, offset);
-   BUG_ON(!ordered); /* Logic error */
-   sums->bytenr = ((u64)bio->bi_iter.bi_sector << 9) +
-  total_bytes;
-   index = 0;
+   nr_sectors = (bvec->bv_len + root->sectorsize - 1)
+   >> inode->i_blkbits;
+
+
+   for (i = 0; i < nr_sectors; i++) {
+   if (offset >= ordered->file_offset + ordered->len ||
+   offset < ordered->file_offset) {
+   unsigned long bytes_left;
+
+   sums->len = this_sum_bytes;
+   this_sum_bytes = 0;
+   btrfs_add_ordered_sum(inode, ordered, sums);
+   btrfs_put_ordered_extent(ordered);
+
+   bytes_left = 

[PATCH 03/11] Btrfs: Direct I/O read: Work on sectorsized blocks

2015-08-06 Thread Chandan Rajendra
The direct I/O read's endio and corresponding repair functions work on
page sized blocks. This commit adds the ability for direct I/O read to work on
subpagesized blocks.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/inode.c | 96 ++--
 1 file changed, 73 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e33dff3..ff8b699 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7630,9 +7630,9 @@ static int btrfs_check_dio_repairable(struct inode *inode,
 }
 
 static int dio_read_error(struct inode *inode, struct bio *failed_bio,
- struct page *page, u64 start, u64 end,
- int failed_mirror, bio_end_io_t *repair_endio,
- void *repair_arg)
+   struct page *page, unsigned int pgoff,
+   u64 start, u64 end, int failed_mirror,
+   bio_end_io_t *repair_endio, void *repair_arg)
 {
struct io_failure_record *failrec;
struct bio *bio;
@@ -7653,7 +7653,9 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
return -EIO;
}
 
-   if (failed_bio->bi_vcnt > 1)
+   if ((failed_bio->bi_vcnt > 1)
+   || (failed_bio->bi_io_vec->bv_len
+   > BTRFS_I(inode)->root->sectorsize))
read_mode = READ_SYNC | REQ_FAILFAST_DEV;
else
read_mode = READ_SYNC;
@@ -7661,7 +7663,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
isector = start - btrfs_io_bio(failed_bio)->logical;
isector >>= inode->i_sb->s_blocksize_bits;
bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page,
- 0, isector, repair_endio, repair_arg);
+   pgoff, isector, repair_endio, repair_arg);
if (!bio) {
free_io_failure(inode, failrec);
return -EIO;
@@ -7691,12 +7693,17 @@ struct btrfs_retry_complete {
 static void btrfs_retry_endio_nocsum(struct bio *bio, int err)
 {
struct btrfs_retry_complete *done = bio->bi_private;
+   struct inode *inode;
struct bio_vec *bvec;
int i;
 
if (err)
goto end;
 
+   BUG_ON(bio->bi_vcnt != 1);
+   inode = bio->bi_io_vec->bv_page->mapping->host;
+   BUG_ON(bio->bi_io_vec->bv_len != BTRFS_I(inode)->root->sectorsize);
+
done->uptodate = 1;
bio_for_each_segment_all(bvec, bio, i)
clean_io_failure(done->inode, done->start, bvec->bv_page, 0);
@@ -7711,22 +7718,30 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
struct bio_vec *bvec;
struct btrfs_retry_complete done;
u64 start;
+   unsigned int pgoff;
+   u32 sectorsize;
+   int nr_sectors;
int i;
int ret;
 
+   sectorsize = BTRFS_I(inode)->root->sectorsize;
+
start = io_bio->logical;
done.inode = inode;
 
bio_for_each_segment_all(bvec, &io_bio->bio, i) {
-try_again:
+   nr_sectors = bvec->bv_len >> inode->i_blkbits;
+   pgoff = bvec->bv_offset;
+
+next_block_or_try_again:
done.uptodate = 0;
done.start = start;
init_completion(&done.done);
 
-   ret = dio_read_error(inode, &io_bio->bio, bvec->bv_page, start,
-start + bvec->bv_len - 1,
-io_bio->mirror_num,
-btrfs_retry_endio_nocsum, &done);
+   ret = dio_read_error(inode, &io_bio->bio, bvec->bv_page,
+   pgoff, start, start + sectorsize - 1,
+   io_bio->mirror_num,
+   btrfs_retry_endio_nocsum, &done);
if (ret)
return ret;
 
@@ -7734,10 +7749,15 @@ try_again:
 
if (!done.uptodate) {
/* We might have another mirror, so try again */
-   goto try_again;
+   goto next_block_or_try_again;
}
 
-   start += bvec->bv_len;
+   start += sectorsize;
+
+   if (nr_sectors--) {
+   pgoff += sectorsize;
+   goto next_block_or_try_again;
+   }
}
 
return 0;
@@ -7747,7 +7767,9 @@ static void btrfs_retry_endio(struct bio *bio, int err)
 {
struct btrfs_retry_complete *done = bio->bi_private;
struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
+   struct inode *inode;
struct bio_vec *bvec;
+   u64 start;
int uptodate;
int ret;
int i;
@@ -7756,13 +7778,20 @@ static void btrfs_retry_endio(struct bio *bio, int err)
goto end;
 
uptodate = 1;
+
+   start = done->start;
+
+   BUG_ON(bio-bi_vcnt != 

[PATCH 05/11] Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units

2015-08-06 Thread Chandan Rajendra
In subpagesize-blocksize scenario, if i_size occurs in a block which is not
the last block in the page, then the space to be reserved should be calculated
appropriately.
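
(Illustration only, not part of the patch.) The space calculation described above can be sketched in user space as follows. The helper names are made up, and the sketch assumes i_size > 0, a page-aligned page_start below i_size, and power-of-two page and sector sizes.

```c
#include <stdint.h>

/* Round x up to a multiple of align; align must be a power of two. */
static inline uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Userspace sketch (hypothetical helper, not kernel code): bytes to keep
 * reserved for a page fault on the page starting at page_start. Only the
 * page containing the last byte of the file is trimmed to the blocks that
 * are actually within i_size; every other page reserves a full page.
 */
static inline uint64_t mkwrite_reserved_space(uint64_t i_size,
					      uint64_t page_start,
					      uint64_t page_size,
					      uint64_t sectorsize)
{
	uint64_t reserved = page_size;

	if (page_start / page_size == (i_size - 1) / page_size)
		reserved = round_up_pow2(i_size - page_start, sectorsize);
	return reserved;
}
```

With 64K pages and 4K blocks, a 5000-byte file needs 8192 bytes reserved for its only page instead of 65536; the difference is what the patch hands back via btrfs_delalloc_release_space().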

Reviewed-by: Liu Bo bo.li@oracle.com
Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/inode.c | 36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index afb8d2b..b39273b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8626,11 +8626,24 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
loff_t size;
int ret;
int reserved = 0;
+   u64 reserved_space;
u64 page_start;
u64 page_end;
+   u64 end;
+
+   reserved_space = PAGE_CACHE_SIZE;
 
sb_start_pagefault(inode->i_sb);
-   ret  = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
+
+   /*
+    * Reserving delalloc space after obtaining the page lock can lead to
+    * deadlock. For example, if a dirty page is locked by this function
+    * and the call to btrfs_delalloc_reserve_space() ends up triggering
+    * dirty page write out, then the btrfs_writepage() function could
+    * end up waiting indefinitely to get a lock on the page currently
+    * being processed by btrfs_page_mkwrite() function.
+    */
+   ret  = btrfs_delalloc_reserve_space(inode, reserved_space);
if (!ret) {
ret = file_update_time(vma->vm_file);
reserved = 1;
@@ -8651,6 +8664,7 @@ again:
size = i_size_read(inode);
page_start = page_offset(page);
page_end = page_start + PAGE_CACHE_SIZE - 1;
+   end = page_end;
 
if ((page->mapping != inode->i_mapping) ||
(page_start >= size)) {
@@ -8666,7 +8680,7 @@ again:
 * we can't set the delalloc bits if there are pending ordered
 * extents.  Drop our locks and wait for them to finish
 */
-   ordered = btrfs_lookup_ordered_extent(inode, page_start);
+   ordered = btrfs_lookup_ordered_range(inode, page_start, page_end);
if (ordered) {
unlock_extent_cached(io_tree, page_start, page_end,
 &cached_state, GFP_NOFS);
@@ -8676,6 +8690,18 @@ again:
goto again;
}
 
+   if (page->index == ((size - 1) >> PAGE_CACHE_SHIFT)) {
+   reserved_space = round_up(size - page_start, root->sectorsize);
+   if (reserved_space < PAGE_CACHE_SIZE) {
+   end = page_start + reserved_space - 1;
+   spin_lock(&BTRFS_I(inode)->lock);
+   BTRFS_I(inode)->outstanding_extents++;
+   spin_unlock(&BTRFS_I(inode)->lock);
+   btrfs_delalloc_release_space(inode,
+   PAGE_CACHE_SIZE - reserved_space);
+   }
+   }
+
/*
 * XXX - page_mkwrite gets called every time the page is dirtied, even
 * if it was already dirty, so for space accounting reasons we need to
@@ -8683,12 +8709,12 @@ again:
 * is probably a better way to do this, but for now keep consistent with
 * prepare_pages in the normal write path.
 */
-   clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, page_end,
+   clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, end,
  EXTENT_DIRTY | EXTENT_DELALLOC |
  EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
  0, 0, &cached_state, GFP_NOFS);
 
-   ret = btrfs_set_extent_delalloc(inode, page_start, page_end,
+   ret = btrfs_set_extent_delalloc(inode, page_start, end,
&cached_state);
if (ret) {
unlock_extent_cached(io_tree, page_start, page_end,
@@ -8727,7 +8753,7 @@ out_unlock:
}
unlock_page(page);
 out:
-   btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
+   btrfs_delalloc_release_space(inode, reserved_space);
 out_noreserve:
sb_end_pagefault(inode->i_sb);
return ret;
-- 
2.1.0



[PATCH 11/11] Btrfs: Clean pte corresponding to page straddling i_size

2015-08-06 Thread Chandan Rajendra
When extending a file by either truncate up or by writing beyond i_size, the
page which had i_size needs to be marked read-only so that future writes to
the page via the mmap interface cause btrfs_page_mkwrite() to be invoked. If
not, a write performed after extending the file via the mmap interface will
find the page to be writable and continue writing to the page without invoking
btrfs_page_mkwrite(), i.e. we end up writing to a file without reserving disk
space.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/file.c  | 12 ++--
 fs/btrfs/inode.c |  2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f69e030..aba215c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1755,6 +1755,8 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
ssize_t err;
loff_t pos;
size_t count;
+   loff_t oldsize;
+   int clean_page = 0;
 
mutex_lock(&inode->i_mutex);
err = generic_write_checks(iocb, from);
@@ -1793,14 +1795,17 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
pos = iocb->ki_pos;
count = iov_iter_count(from);
start_pos = round_down(pos, root->sectorsize);
-   if (start_pos > i_size_read(inode)) {
+   oldsize = i_size_read(inode);
+   if (start_pos > oldsize) {
/* Expand hole size to cover write data, preventing empty gap */
end_pos = round_up(pos + count, root->sectorsize);
-   err = btrfs_cont_expand(inode, i_size_read(inode), end_pos);
+   err = btrfs_cont_expand(inode, oldsize, end_pos);
if (err) {
mutex_unlock(&inode->i_mutex);
goto out;
}
+   if (start_pos > round_up(oldsize, root->sectorsize))
+   clean_page = 1;
}
 
if (sync)
@@ -1812,6 +1817,9 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
num_written = __btrfs_buffered_write(file, from, pos);
if (num_written > 0)
iocb->ki_pos = pos + num_written;
+   if (clean_page)
+   pagecache_isize_extended(inode, oldsize,
+   i_size_read(inode));
}
 
mutex_unlock(&inode->i_mutex);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ea7d9f1..0a8a5ff 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4824,7 +4824,6 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
}
 
if (newsize > oldsize) {
-   truncate_pagecache(inode, newsize);
/*
 * Don't do an expanding truncate while snapshoting is ongoing.
 * This is to ensure the snapshot captures a fully consistent
@@ -4847,6 +4846,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 
i_size_write(inode, newsize);
btrfs_ordered_update_i_size(inode, i_size_read(inode), NULL);
+   pagecache_isize_extended(inode, oldsize, newsize);
ret = btrfs_update_inode(trans, root, inode);
btrfs_end_write_no_snapshoting(root);
btrfs_end_transaction(trans, root);
-- 
2.1.0



[PATCH 10/11] Btrfs: Fix block size returned to user space

2015-08-06 Thread Chandan Rajendra
btrfs_getattr() returns PAGE_CACHE_SIZE as the block size. Since
generic_fillattr() already does the right thing (by obtaining block size
from inode->i_blkbits), just remove the statement from btrfs_getattr().
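
(Illustration only, not part of the patch.) User space sees this value as st_blksize in the result of stat(2). A small sketch for reading it back; the helper name is made up for this example.

```c
#include <sys/stat.h>

/*
 * Userspace sketch (hypothetical helper): returns the st_blksize reported
 * for path, or -1 if stat() fails.
 */
static inline long preferred_io_blocksize(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return (long)st.st_blksize;
}
```

Calling this on a file in a btrfs mount before and after the change is one way to observe the difference on a machine whose page size differs from the filesystem block size.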

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/inode.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index daf2462..ea7d9f1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9164,7 +9164,6 @@ static int btrfs_getattr(struct vfsmount *mnt,
 
generic_fillattr(inode, stat);
stat->dev = BTRFS_I(inode)->root->anon_dev;
-   stat->blksize = PAGE_CACHE_SIZE;
 
spin_lock(&BTRFS_I(inode)->lock);
delalloc_bytes = BTRFS_I(inode)->delalloc_bytes;
-- 
2.1.0



[PATCH 00/11] Btrfs: Pre subpagesize-blocksize cleanups

2015-08-06 Thread Chandan Rajendra
Hello all,

The patches posted along with this cover letter are cleanups made
during the development of the subpagesize-blocksize patchset. I believe
that they can be integrated with the mainline kernel. Hence I have
posted them separately from the subpagesize-blocksize patchset.

I have tested the patchset by running xfstests on ppc64 and
x86_64. On ppc64, some of the Btrfs specific tests and generic/255
fail because they assume 4K as the filesystem's block size. I have
fixed some of the test cases. I will fix the rest and mail them to the
fstests mailing list in the near future.

Chandan Rajendra (11):
  Btrfs: __btrfs_buffered_write: Reserve/release extents aligned to
block size
  Btrfs: Compute and look up csums based on sectorsized blocks
  Btrfs: Direct I/O read: Work on sectorsized blocks
  Btrfs: fallocate: Work with sectorsized blocks
  Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units
  Btrfs: Search for all ordered extents that could span across a page
  Btrfs: Use (eb->start, seq) as search key for tree modification log
  Btrfs: btrfs_submit_direct_hook: Handle map_length < bio vector length
  Btrfs: Limit inline extents to root->sectorsize
  Btrfs: Fix block size returned to user space
  Btrfs: Clean pte corresponding to page straddling i_size

 fs/btrfs/ctree.c |  34 
 fs/btrfs/ctree.h |   2 +-
 fs/btrfs/extent_io.c |   3 +-
 fs/btrfs/file-item.c |  90 ---
 fs/btrfs/file.c  |  99 +
 fs/btrfs/inode.c | 239 ---
 6 files changed, 308 insertions(+), 159 deletions(-)

-- 
2.1.0



[PATCH 06/11] Btrfs: Search for all ordered extents that could span across a page

2015-08-06 Thread Chandan Rajendra
In subpagesize-blocksize scenario it is not sufficient to search using the
first byte of the page to make sure that there are no ordered extents
present across the page. Fix this.
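
(Illustration only, not part of the patch.) The difference between looking up by the first byte of the page and looking up by the whole page range can be modelled in user space like this. struct ordered_range and both helpers are made up for the example and only mimic the file_offset/len fields of an ordered extent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the file range covered by an ordered extent. */
struct ordered_range {
	uint64_t file_offset;
	uint64_t len;
};

/* Point lookup: does the ordered range contain byte pos? */
static inline bool covers_byte(const struct ordered_range *o, uint64_t pos)
{
	return pos >= o->file_offset && pos < o->file_offset + o->len;
}

/* Range lookup: does the ordered range intersect [start, start + len)? */
static inline bool overlaps_range(const struct ordered_range *o,
				  uint64_t start, uint64_t len)
{
	return o->file_offset < start + len &&
	       start < o->file_offset + o->len;
}
```

With 64K pages and 4K blocks, an ordered extent covering bytes 8192..12287 is invisible to a point lookup at the page start (offset 0) but is found by a range lookup over the 64K page.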

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/extent_io.c |  3 ++-
 fs/btrfs/inode.c | 25 ++---
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a3ec2c8..65691a0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3164,7 +3164,8 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
 
while (1) {
lock_extent(tree, start, end);
-   ordered = btrfs_lookup_ordered_extent(inode, start);
+   ordered = btrfs_lookup_ordered_range(inode, start,
+   PAGE_CACHE_SIZE);
if (!ordered)
break;
unlock_extent(tree, start, end);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b39273b..dad76ef 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1975,7 +1975,8 @@ again:
if (PagePrivate2(page))
goto out;
 
-   ordered = btrfs_lookup_ordered_extent(inode, page_start);
+   ordered = btrfs_lookup_ordered_range(inode, page_start,
+   PAGE_CACHE_SIZE);
if (ordered) {
unlock_extent_cached(&BTRFS_I(inode)->io_tree, page_start,
 page_end, &cached_state, GFP_NOFS);
@@ -8519,6 +8520,8 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
struct extent_state *cached_state = NULL;
u64 page_start = page_offset(page);
u64 page_end = page_start + PAGE_CACHE_SIZE - 1;
+   u64 start;
+   u64 end;
int inode_evicting = inode->i_state & I_FREEING;
 
/*
@@ -8538,14 +8541,18 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 
if (!inode_evicting)
lock_extent_bits(tree, page_start, page_end, 0, &cached_state);
-   ordered = btrfs_lookup_ordered_extent(inode, page_start);
+again:
+   start = page_start;
+   ordered = btrfs_lookup_ordered_range(inode, start,
+   page_end - start + 1);
if (ordered) {
+   end = min(page_end, ordered->file_offset + ordered->len - 1);
/*
 * IO on this page will never be started, so we need
 * to account for any ordered extents now
 */
if (!inode_evicting)
-   clear_extent_bit(tree, page_start, page_end,
+   clear_extent_bit(tree, start, end,
 EXTENT_DIRTY | EXTENT_DELALLOC |
 EXTENT_LOCKED | EXTENT_DO_ACCOUNTING |
 EXTENT_DEFRAG, 1, 0, &cached_state,
@@ -8562,22 +8569,26 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 
spin_lock_irq(&tree->lock);
set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
-   new_len = page_start - ordered->file_offset;
+   new_len = start - ordered->file_offset;
if (new_len < ordered->truncated_len)
ordered->truncated_len = new_len;
spin_unlock_irq(&tree->lock);
 
if (btrfs_dec_test_ordered_pending(inode, ordered,
-  page_start,
-  PAGE_CACHE_SIZE, 1))
+  start,
+  end - start + 1, 1))
btrfs_finish_ordered_io(ordered);
}
btrfs_put_ordered_extent(ordered);
if (!inode_evicting) {
cached_state = NULL;
-   lock_extent_bits(tree, page_start, page_end, 0,
+   lock_extent_bits(tree, start, end, 0,
 &cached_state);
}
+
+   start = end + 1;
+   if (start < page_end)
+   goto again;
}
 
if (!inode_evicting) {
-- 
2.1.0



[PATCH] Btrfs: fix stale dir entries after removing a link and fsync

2015-08-06 Thread fdmanana
From: Filipe Manana fdman...@suse.com

We have one more case where after a log tree is replayed we get
inconsistent metadata leading to stale directory entries, due to
some directories having entries pointing to some inode while the
inode does not have a matching BTRFS_INODE_[REF|EXTREF]_KEY item.

To trigger the problem we need to have a file with multiple hard links
belonging to different parent directories. Then if one of those hard
links is removed and we fsync the file using one of its other links
that belongs to a different parent directory, we end up not logging
the fact that the removed hard link doesn't exist anymore in the
parent directory.

Simple reproducer:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1  # failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
  _cleanup_flakey
  rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter
  . ./common/dmflakey

  # real QA test starts here
  _need_to_be_root
  _supported_fs generic
  _supported_os Linux
  _require_scratch
  _require_dm_flakey
  _require_metadata_journaling $SCRATCH_DEV

  rm -f $seqres.full

  _scratch_mkfs >>$seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create our test directory and file.
  mkdir $SCRATCH_MNT/testdir
  touch $SCRATCH_MNT/foo
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/testdir/foo2
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/testdir/foo3

  # Make sure everything done so far is durably persisted.
  sync

  # Now we remove one of our file's hardlinks in the directory testdir.
  unlink $SCRATCH_MNT/testdir/foo3

  # We now fsync our file using the foo link, which has a parent that
  # is not the directory testdir.
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo

  # Silently drop all writes and unmount to simulate a crash/power
  # failure.
  _load_flakey_table $FLAKEY_DROP_WRITES
  _unmount_flakey

  # Allow writes again, mount to trigger journal/log replay.
  _load_flakey_table $FLAKEY_ALLOW_WRITES
  _mount_flakey

  # After the journal/log is replayed we expect to not see the foo3
  # link anymore and we should be able to remove all names in the
  # directory testdir and then remove it (no stale directory entries
  # left after the journal/log replay).
  echo "Entries in testdir:"
  ls -1 $SCRATCH_MNT/testdir

  rm -f $SCRATCH_MNT/testdir/*
  rmdir $SCRATCH_MNT/testdir

  _unmount_flakey

  status=0
  exit

The test fails with:

  $ ./check generic/107
  FSTYP -- btrfs
  PLATFORM  -- Linux/x86_64 debian3 4.1.0-rc6-btrfs-next-11+
  MKFS_OPTIONS  -- /dev/sdc
  MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1

  generic/107 3s ... - output mismatch (see .../results/generic/107.out.bad)
--- tests/generic/107.out   2015-08-01 01:39:45.807462161 +0100
+++ /home/fdmanana/git/hub/xfstests/results//generic/107.out.bad
@@ -1,3 +1,5 @@
 QA output created by 107
 Entries in testdir:
 foo2
+foo3
+rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty
...
_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent \
  (see /home/fdmanana/git/hub/xfstests/results//generic/107.full)
_check_dmesg: something found in dmesg (see .../results/generic/107.dmesg)
  Ran: generic/107
  Failures: generic/107
  Failed 1 of 1 tests

  $ cat /home/fdmanana/git/hub/xfstests/results//generic/107.full
  (...)
  checking fs roots
  root 5 inode 257 errors 200, dir isize wrong
      unresolved ref dir 257 index 3 namelen 4 name foo3 filetype 1 errors 5, no dir item, no inode ref
  (...)

And produces the following warning in dmesg:

  [127298.759064] BTRFS info (device dm-0): failed to delete reference to foo3, inode 258 parent 257
  [127298.762081] ------------[ cut here ]------------
  [127298.763311] WARNING: CPU: 10 PID: 7891 at fs/btrfs/inode.c:3956 __btrfs_unlink_inode+0x182/0x35a [btrfs]()
  [127298.767327] BTRFS: Transaction aborted (error -2)
  (...)
  [127298.788611] Call Trace:
  [127298.789137]  [8145f077] dump_stack+0x4f/0x7b
  [127298.790090]  [81095de5] ? console_unlock+0x356/0x3a2
  [127298.791157]  [8104b3b0] warn_slowpath_common+0xa1/0xbb
  [127298.792323]  [a065ad09] ? __btrfs_unlink_inode+0x182/0x35a [btrfs]
  [127298.793633]  [8104b410] warn_slowpath_fmt+0x46/0x48
  [127298.794699]  [a065ad09] __btrfs_unlink_inode+0x182/0x35a [btrfs]
  [127298.797640]  [a065be8f] btrfs_unlink_inode+0x1e/0x40 [btrfs]
  [127298.798876]  [a065bf11] btrfs_unlink+0x60/0x9b [btrfs]
  [127298.800154]  [8116fb48] vfs_unlink+0x9c/0xed
  [127298.801303]  [81173481] do_unlinkat+0x12b/0x1fb
  [127298.802450]  [81253855] ? lockdep_sys_exit_thunk+0x12/0x14
  [127298.803797]  [81174056] SyS_unlinkat+0x29/0x2b
  [127298.805017]  [81465197] system_call_fastpath+0x12/0x6f
  [127298.806310] ---[ end trace 

[PATCH] fstests: generic test for fsync of file with multiple links

2015-08-06 Thread fdmanana
From: Filipe Manana fdman...@suse.com

Test that when we have a file with multiple hard links belonging to
different parent directories, if we remove one of those links, fsync the
file using one of its other links (that has a parent directory different
from the one we removed a link from), power fail and then replay the
fsync log/journal, the hard link we removed is not available anymore and
all the filesystem metadata is in a consistent state.
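
For reference, the no-crash baseline that the test's golden output encodes can be reproduced on any filesystem. The sketch below is not part of the test: it uses a temporary directory from mktemp instead of $SCRATCH_MNT, skips dm-flakey and fsync entirely, and the function name link_demo is made up for illustration. After log replay, a correct filesystem should show the same result.

```shell
#!/bin/sh
# Baseline only: no crash or log replay is simulated here.
link_demo() {
    dir=$(mktemp -d) || return 1
    mkdir "$dir/testdir"
    touch "$dir/foo"
    # Two extra hard links to foo, both living in testdir.
    ln "$dir/foo" "$dir/testdir/foo2"
    ln "$dir/foo" "$dir/testdir/foo3"
    # Remove one link; foo and testdir/foo2 must survive.
    rm "$dir/testdir/foo3"
    ls -1 "$dir/testdir"
    # With no stale entries left, the directory can be emptied and removed.
    rm -f "$dir"/testdir/*
    rmdir "$dir/testdir" && echo "rmdir ok"
    rm -rf "$dir"
}
link_demo
```

On the buggy btrfs kernels described here, the equivalent sequence after a power failure and log replay lists foo3 as well and the rmdir fails.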

This test is motivated by an issue found in btrfs, where the test fails
with:

  generic/107 2s ... - output mismatch (see .../results/generic/107.out.bad)
--- tests/generic/107.out   2015-08-04 09:47:46.922131256 +0100
+++ /home/fdmanana/git/hub/xfstests/results//generic/107.out.bad
@@ -1,3 +1,5 @@
 QA output created by 107
 Entries in testdir:
 foo2
+foo3
+rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty
...
(Run 'diff -u tests/generic/107.out .../generic/107.out.bad' to see the entire diff)
  _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent (see .../generic/107.full)
  _check_dmesg: something found in dmesg (see .../generic/107.dmesg)

  $ cat /home/fdmanana/git/hub/xfstests/results//generic/107.full
  _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
  *** fsck.btrfs output ***
  checking extents
  checking free space cache
  checking fs roots
  root 5 inode 257 errors 200, dir isize wrong
unresolved ref dir 257 index 3 namelen 4 name foo3 filetype 1 \
  errors 5, no dir item, no inode ref

  $ cat /home/fdmanana/git/hub/xfstests/results//generic/107.dmesg
  (...)
  [188897.707311] BTRFS info (device dm-0): failed to delete reference to \
foo3, inode 258 parent 257
  [188897.711345] ------------[ cut here ]------------
  [188897.713369] WARNING: CPU: 10 PID: 19452 at fs/btrfs/inode.c:3956 \
__btrfs_unlink_inode+0x182/0x35a [btrfs]()
  [188897.717661] BTRFS: Transaction aborted (error -2)
  (...)
  [188897.747898] Call Trace:
  [188897.748519]  [8145f077] dump_stack+0x4f/0x7b
  [188897.749602]  [81095de5] ? console_unlock+0x356/0x3a2
  [188897.750682]  [8104b3b0] warn_slowpath_common+0xa1/0xbb
  [188897.751936]  [a04c5d09] ? __btrfs_unlink_inode+0x182/0x35a [btrfs]
  [188897.753485]  [8104b410] warn_slowpath_fmt+0x46/0x48
  [188897.754781]  [a04c5d09] __btrfs_unlink_inode+0x182/0x35a [btrfs]
  [188897.756295]  [a04c6e8f] btrfs_unlink_inode+0x1e/0x40 [btrfs]
  [188897.757692]  [a04c6f11] btrfs_unlink+0x60/0x9b [btrfs]
  [188897.758978]  [8116fb48] vfs_unlink+0x9c/0xed
  [188897.760151]  [81173481] do_unlinkat+0x12b/0x1fb
  [188897.761354]  [81253855] ? lockdep_sys_exit_thunk+0x12/0x14
  [188897.762692]  [81174056] SyS_unlinkat+0x29/0x2b
  [188897.763741]  [81465197] system_call_fastpath+0x12/0x6f
  [188897.764894] ---[ end trace bbfddacb7aaada8c ]---
  [188897.765801] BTRFS warning (device dm-0): __btrfs_unlink_inode:3956: \
Aborting unused transaction(No such entry).

Tested against ext3/4, xfs, reiserfs and f2fs too, and all these
filesystems currently pass this test (on a 4.1 linux kernel at least).

The btrfs issue is fixed by the linux kernel patch titled:
Btrfs: fix stale dir entries after removing a link and fsync.

Signed-off-by: Filipe Manana fdman...@suse.com
---
 tests/generic/107 | 99 +++
 tests/generic/107.out |  3 ++
 tests/generic/group   |  1 +
 3 files changed, 103 insertions(+)
 create mode 100755 tests/generic/107
 create mode 100644 tests/generic/107.out

diff --git a/tests/generic/107 b/tests/generic/107
new file mode 100755
index 000..7d107d7
--- /dev/null
+++ b/tests/generic/107
@@ -0,0 +1,99 @@
+#! /bin/bash
+# FSQA Test No. 107
+#
+# Test that when we have a file with multiple hard links belonging to different
+# parent directories, if we remove one of those links, fsync the file using one
+# of its other links (that has a parent directory different from the one we
+# removed a link from), power fail and then replay the fsync log/journal, the
+# hard link we removed is not available anymore and all the filesystem metadata
+# is in a consistent state.
+#
+#---
+#
+# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana fdman...@suse.com
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# 

Re: Data single *and* raid?

2015-08-06 Thread Hendrik Friedel

Hello Qu,

thanks for your reply.

>> But then:
>>
>> root@homeserver:/mnt/__Complete_Disk# btrfs fi df /mnt/__Complete_Disk/
>> Data, RAID5: total=3.83TiB, used=3.78TiB
>> System, RAID5: total=32.00MiB, used=576.00KiB
>> Metadata, RAID5: total=6.46GiB, used=4.84GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> GlobalReserve is not a chunk type, it just means a range of metadata
> reserved for overcommitting.
> And it's always single.
>
> Personally, I don't think it should be output by the fi df command, as
> it's at a higher level than chunks.
>
> At least in your case, there is nothing to worry about.



But this seems to be a RAID5 now, right?
Well, that's what I want, but the command was:
btrfs balance start -dprofiles=single -mprofiles=raid1
/mnt/__Complete_Disk/

So, we would expect raid1 here, no?

Greetings,
Hendrik





On 01.08.2015 22:44, Chris Murphy wrote:

On Sat, Aug 1, 2015 at 2:32 PM, Hugo Mills h...@carfax.org.uk wrote:

On Sat, Aug 01, 2015 at 10:09:35PM +0200, Hendrik Friedel wrote:

Hello,

I converted an array to raid5 by
btrfs device add /dev/sdd /mnt/new_storage
btrfs device add /dev/sdc /mnt/new_storage
btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt/new_storage/

The Balance went through. But now:
Label: none  uuid: a8af3832-48c7-4568-861f-e80380dd7e0b
 Total devices 3 FS bytes used 5.28TiB
 devid1 size 2.73TiB used 2.57TiB path /dev/sde
 devid2 size 2.73TiB used 2.73TiB path /dev/sdc
 devid3 size 2.73TiB used 2.73TiB path /dev/sdd
btrfs-progs v4.1.1

Already the 2.57TiB is a bit surprising:
root@homeserver:/mnt# btrfs fi df /mnt/new_storage/
Data, single: total=2.55TiB, used=2.55TiB
Data, RAID5: total=2.73TiB, used=2.72TiB
System, RAID5: total=32.00MiB, used=736.00KiB
Metadata, RAID1: total=6.00GiB, used=5.33GiB
Metadata, RAID5: total=3.00GiB, used=2.99GiB


Looking at the btrfs fi show output, you've probably run out of
space during the conversion, probably due to an uneven distribution of
the original single chunks.
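
A quick way to spot an interrupted conversion like this is to look for chunk types that show up with more than one profile in the `btrfs fi df` output. The sketch below is only an illustration: mixed_profiles is a made-up helper name, it works purely on the text format shown above, and it ignores the GlobalReserve line if one is present.

```shell
#!/bin/sh
# Reads `btrfs fi df` text on stdin and prints every chunk type
# (Data, Metadata, System) that is listed on more than one line,
# i.e. with more than one profile.
mixed_profiles() {
    awk -F'[,:]' '
        /^[A-Za-z]+, / {
            type = $1
            if (type == "GlobalReserve") next
            count[type]++
        }
        END { for (t in count) if (count[t] > 1) print t }
    ' | sort
}

# Sample input copied from the output quoted in this thread.
mixed_profiles <<'EOF'
Data, single: total=2.55TiB, used=2.55TiB
Data, RAID5: total=2.73TiB, used=2.72TiB
System, RAID5: total=32.00MiB, used=736.00KiB
Metadata, RAID1: total=6.00GiB, used=5.33GiB
Metadata, RAID5: total=3.00GiB, used=2.99GiB
EOF
```

On a live system the input would come from `btrfs fi df <mountpoint>` instead of the here-document. An empty result means every chunk type has a single profile.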

I think I would suggest balancing the single chunks, and trying the
conversion (of the unconverted parts) again:

# btrfs balance start -dprofiles=single -mprofiles=raid1
/mnt/new_storage/
# btrfs balance start -dconvert=raid5,soft -mconvert=raid5,soft
/mnt/new_storage/



Yep I bet that's it also. btrfs fi usage might be better at exposing
this case.








--
Hendrik Friedel
Auf dem Brink 12
28844 Weyhe
Tel. 04203 8394854
Mobil 0178 1874363



Re: Lockup in BTRFS_IOC_CLONE/Kernel 4.2.0-rc5

2015-08-06 Thread Liu Bo
Hi,

On Wed, Aug 05, 2015 at 10:28:05AM +0200, Elias Probst wrote:
 I can reproduce a hard btrfs lockup (process issuing the ioctl() is in
 D-state, same goes for btrfs-transacti process) on Kernel 4.2.0-rc5.
 
 I had the same issue on 4.1, so it's unlikely a regression introduced in
 4.2.
 
 ## With the following steps, I can reproduce the problem:
 
 1. Create a new clean btrfs volume for /var/lib/machines
 machinectl set-limit 6G
 
 2. Paste this to /tmp/yum.conf
 [main]
 reposdir=/dev/null
 gpgcheck=0
 logfile=/var/log/yum.log
 installroot=/var/lib/machines/centos7.1-base
 assumeyes=1
 
 [base]
 name=CentOS 7.1.1503 - x86_64
 baseurl=http://mirror.centos.org/centos/7.1.1503/os/x86_64/
 enabled=1
 
 3. Bootstrap a CentOS 7.1 base image
 /usr/bin/yum -c /tmp/yum.conf groupinstall Base
 
 4. Start an ephemeral systemd-nspawn container based on 'centos7.1-base'
 strace -o /tmp/systemd-nspawn.out -s 500 -f systemd-nspawn -xbD
 /var/lib/machines/centos7.1-base/
 
 
 `systemd-nspawn` will now just hang forever.
 I couldn't come up yet with a shorter/more low-level way to reproduce this as 
 I lack quite a bit of btrfs experience.

Thank you for reporting this.

Could you do 'echo w > /proc/sysrq-trigger' to gather the whole hang call stack?

Here's a quick patch that may address your problem, can you give it a shot after
getting sysrq-w output?

Thanks,

-liubo

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0770c91..b52bd66 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3478,6 +3478,22 @@ process_slot:
drop_start = new_key.offset;
 
/*
+* We need to look up the roots that point at
+* this bytenr and see if the new root does.  If
+* it does not we need to make sure we update
+* quotas appropriately.
+*/
+   if (disko && root != BTRFS_I(src)->root &&
+   disko != last_disko) {
+   no_quota = check_ref(trans, root,
+disko);
+   if (no_quota < 0) {
+   ret = no_quota;
+   goto out;
+   }
+   }
+
+   /*
 * 1 - adjusting old extent (we may have to
 * split it)
 * 1 - add new extent
 * 1 - inode update
@@ -3544,27 +3560,6 @@ process_slot:
btrfs_set_file_extent_num_bytes(leaf, extent,
datal);
 
-   /*
-* We need to look up the roots that point at
-* this bytenr and see if the new root does.  If
-* it does not we need to make sure we update
-* quotas appropriately.
-*/
-   if (disko && root != BTRFS_I(src)->root &&
-   disko != last_disko) {
-   no_quota = check_ref(trans, root,
-disko);
-   if (no_quota < 0) {
-   btrfs_abort_transaction(trans,
-   root,
-   ret);
-   btrfs_end_transaction(trans,
- root);
-   ret = no_quota;
-   goto out;
-   }
-   }
-
if (disko) {
inode_add_bytes(inode, datal);
ret = btrfs_inc_extent_ref(trans, root,


 
 ## Results:
 
 - Last 'strace' lines
 6095  fchown(16, 0, 0)  = 0
 6095  fchmod(16, 0755)  = 0
 6095  utimensat(16, NULL, {{1402362275, 0}, {1438761285, 819041906}}, 0) = 0
 6095  flistxattr(15, , 100)   = 0
 6095  getdents(15, /* 3 entries */, 32768) = 80
 6095  newfstatat(15, "coreutils.mo", {st_mode=S_IFREG|0644, st_size=357263, ...}, AT_SYMLINK_NOFOLLOW) = 0
 6095  openat(15, "coreutils.mo", O_RDONLY|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC) = 17
 6095  openat(16, "coreutils.mo", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC, 0644) = 18
 6095  fstat(18, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
 6095  ioctl(18, BTRFS_IOC_CLONE
 
 - call trace in Kernel journal:
 Aug 05 10:10:03 moria kernel: 

Re: [PATCH 01/11] Btrfs: __btrfs_buffered_write: Reserve/release extents aligned to block size

2015-08-06 Thread Qu Wenruo

Hi Chandan,

Thanks for your effort to implement sub-pagesize block size support.

These cleanups look quite good, but I have a few small readability 
recommendations inlined below.


Chandan Rajendra wrote on 2015/08/06 15:40 +0530:

Currently, the code reserves/releases extents in multiples of PAGE_CACHE_SIZE
units. Fix this by doing reservation/releases in block size units.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
  fs/btrfs/file.c | 40 
  1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 795d754..e3b2b3c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1362,16 +1362,19 @@ fail:
  static noinline int
  lock_and_cleanup_extent_if_need(struct inode *inode, struct page **pages,
size_t num_pages, loff_t pos,
+   size_t write_bytes,
u64 *lockstart, u64 *lockend,
struct extent_state **cached_state)
  {
+   struct btrfs_root *root = BTRFS_I(inode)->root;
u64 start_pos;
u64 last_pos;
int i;
int ret = 0;

-   start_pos = pos & ~((u64)PAGE_CACHE_SIZE - 1);
-   last_pos = start_pos + ((u64)num_pages << PAGE_CACHE_SHIFT) - 1;
+   start_pos = pos & ~((u64)root->sectorsize - 1);

Why not just round_down(pos, root->sectorsize)?
A hard-coded alignment mask is never that easy to read.


+   last_pos = start_pos
+   + ALIGN(pos + write_bytes - start_pos, root->sectorsize) - 1;

Maybe it's just a matter of preference, but I'd prefer round_up over
ALIGN, as with ALIGN I sometimes still need to work out whether it 
rounds up or rounds down.
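
As a sanity check on the two alignment idioms under discussion, the arithmetic can be tried out in plain shell. This is only a sketch: the shell functions round_down and round_up below are stand-ins that mimic the kernel macros for power-of-two alignments, and the values of pos, write_bytes and sectorsize are made up for the example.

```shell
#!/bin/sh
# Both helpers assume the alignment ($2) is a power of two.

# Same result as the open-coded mask: pos & ~(sectorsize - 1)
round_down() { echo $(( $1 & ~($2 - 1) )); }

# What ALIGN(x, a) computes: round x up to the next multiple of a.
round_up() { echo $(( ($1 + $2 - 1) & ~($2 - 1) )); }

# Hypothetical write: 6000 bytes starting at offset 5000, 4K sectors.
pos=5000
write_bytes=6000
sectorsize=4096

start_pos=$(round_down "$pos" "$sectorsize")
span=$(( pos + write_bytes - start_pos ))
aligned=$(round_up "$span" "$sectorsize")
last_pos=$(( start_pos + aligned - 1 ))

echo "start_pos=$start_pos last_pos=$last_pos"
```

With these numbers the write touches bytes 5000 to 10999, so the locked range should cover the two sectors 4096-8191 and 8192-12287, which is what the computation gives.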


if (start_pos < inode->i_size) {
struct btrfs_ordered_extent *ordered;
@@ -1489,6 +1492,7 @@ static noinline ssize_t __btrfs_buffered_write(struct 
file *file,

while (iov_iter_count(i) > 0) {
size_t offset = pos & (PAGE_CACHE_SIZE - 1);
+   size_t sector_offset;
size_t write_bytes = min(iov_iter_count(i),
 nrptrs * (size_t)PAGE_CACHE_SIZE -
 offset);
@@ -1497,6 +1501,8 @@ static noinline ssize_t __btrfs_buffered_write(struct 
file *file,
size_t reserve_bytes;
size_t dirty_pages;
size_t copied;
+   size_t dirty_sectors;
+   size_t num_sectors;

WARN_ON(num_pages > nrptrs);

@@ -1509,8 +1515,12 @@ static noinline ssize_t __btrfs_buffered_write(struct 
file *file,
break;
}

-   reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
+   sector_offset = pos & (root->sectorsize - 1);

Same here.

Thanks,
Qu

+   reserve_bytes = ALIGN(write_bytes + sector_offset,
+   root->sectorsize);
+
ret = btrfs_check_data_free_space(inode, reserve_bytes, write_bytes);
+
if (ret == -ENOSPC &&
(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
  BTRFS_INODE_PREALLOC))) {
@@ -1523,7 +1533,9 @@ static noinline ssize_t __btrfs_buffered_write(struct 
file *file,
 */
num_pages = DIV_ROUND_UP(write_bytes + offset,
 PAGE_CACHE_SIZE);
-   reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
+   reserve_bytes = ALIGN(write_bytes + sector_offset,
+   root->sectorsize);
+
ret = 0;
} else {
ret = -ENOSPC;
@@ -1558,8 +1570,8 @@ again:
break;

ret = lock_and_cleanup_extent_if_need(inode, pages, num_pages,
- pos, lockstart, lockend,
- cached_state);
+   pos, write_bytes, lockstart,
+   lockend, cached_state);
if (ret < 0) {
if (ret == -EAGAIN)
goto again;
@@ -1595,9 +1607,14 @@ again:
 * we still have an outstanding extent for the chunk we actually
 * managed to copy.
 */
-   if (num_pages > dirty_pages) {
-   release_bytes = (num_pages - dirty_pages) <<
-   PAGE_CACHE_SHIFT;
+   num_sectors = reserve_bytes >> inode->i_blkbits;
+   dirty_sectors = round_up(copied + sector_offset,
+   root->sectorsize);
+   dirty_sectors >>= inode->i_blkbits;
+
+   if (num_sectors  

Re: [PATCH 01/11] Btrfs: __btrfs_buffered_write: Reserve/release extents aligned to block size

2015-08-06 Thread Chandan Rajendra
On Friday 07 Aug 2015 11:08:30 Qu Wenruo wrote:
 Hi Chandan,
 
 Thanks for your effort to implement sub-pagesize block size support.
 
 These cleanups look quite good, but I have a few small readability
 recommendations inlined below.
 
 Chandan Rajendra wrote on 2015/08/06 15:40 +0530:
  Currently, the code reserves/releases extents in multiples of
  PAGE_CACHE_SIZE units. Fix this by doing reservation/releases in block
  size units.
  
  Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
  ---
  
fs/btrfs/file.c | 40 
1 file changed, 28 insertions(+), 12 deletions(-)
  
  diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
  index 795d754..e3b2b3c 100644
  --- a/fs/btrfs/file.c
  +++ b/fs/btrfs/file.c
  
  @@ -1362,16 +1362,19 @@ fail:
static noinline int
lock_and_cleanup_extent_if_need(struct inode *inode, struct page
**pages,

  size_t num_pages, loff_t pos,
  
  +   size_t write_bytes,
  
  u64 *lockstart, u64 *lockend,
  struct extent_state **cached_state)

{
  
  +   struct btrfs_root *root = BTRFS_I(inode)->root;
  
  u64 start_pos;
  u64 last_pos;
  int i;
  int ret = 0;
  
  -   start_pos = pos & ~((u64)PAGE_CACHE_SIZE - 1);
  -   last_pos = start_pos + ((u64)num_pages << PAGE_CACHE_SHIFT) - 1;
  +   start_pos = pos & ~((u64)root->sectorsize - 1);
 
 Why not just round_down(pos, root->sectorsize)?
 A hard-coded alignment mask is never that easy to read.

Qu Wenruo, Thanks for pointing it out. I will replace them with
round_[down,up] calls and post V2. 

 
  +   last_pos = start_pos
  +   + ALIGN(pos + write_bytes - start_pos, root->sectorsize) - 1;
 
 Maybe it's just a matter of preference, but I'd prefer round_up over
 ALIGN, as with ALIGN I sometimes still need to work out whether it
 rounds up or rounds down.
 
  if (start_pos < inode->i_size) {
  
  struct btrfs_ordered_extent *ordered;
  
  @@ -1489,6 +1492,7 @@ static noinline ssize_t
  __btrfs_buffered_write(struct file *file, 
  while (iov_iter_count(i) > 0) {
  
  size_t offset = pos & (PAGE_CACHE_SIZE - 1);
  
  +   size_t sector_offset;
  
  size_t write_bytes = min(iov_iter_count(i),
  
   nrptrs * (size_t)PAGE_CACHE_SIZE -
   offset);
  
  @@ -1497,6 +1501,8 @@ static noinline ssize_t
  __btrfs_buffered_write(struct file *file, 
  size_t reserve_bytes;
  size_t dirty_pages;
  size_t copied;
  
  +   size_t dirty_sectors;
  +   size_t num_sectors;
  
  WARN_ON(num_pages > nrptrs);
  
  @@ -1509,8 +1515,12 @@ static noinline ssize_t
  __btrfs_buffered_write(struct file *file, 
  break;
  
  }
  
  -   reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
  +   sector_offset = pos & (root->sectorsize - 1);
 
 Same here.
 
 Thanks,
 Qu
 
  +   reserve_bytes = ALIGN(write_bytes + sector_offset,
  +   root->sectorsize);
  +
  
  ret = btrfs_check_data_free_space(inode, reserve_bytes, write_bytes);
  
  +
  
  if (ret == -ENOSPC &&
  
  (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
  
BTRFS_INODE_PREALLOC))) {
  
  @@ -1523,7 +1533,9 @@ static noinline ssize_t
  __btrfs_buffered_write(struct file *file, 
   */
  
  num_pages = DIV_ROUND_UP(write_bytes + offset,
  
   PAGE_CACHE_SIZE);
  
  -   reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
  +   reserve_bytes = ALIGN(write_bytes + sector_offset,
  +   root->sectorsize);
  +
  
  ret = 0;
  
  } else {
  
  ret = -ENOSPC;
  
  @@ -1558,8 +1570,8 @@ again:
  break;
  
  ret = lock_and_cleanup_extent_if_need(inode, pages, num_pages,
  
  - pos, lockstart, 
lockend,
  - cached_state);
  +   pos, write_bytes, lockstart,
  +   lockend, cached_state);
  
  if (ret < 0) {
  
  if (ret == -EAGAIN)
  
  goto again;
  
  @@ -1595,9 +1607,14 @@ again:
   * we still have an outstanding extent for the chunk we 
actually
   * managed to copy.
   */
  
  - 

RE: [PATCH V2] btrfs-progs: add newline to some error messages

2015-08-06 Thread Zhao Lei
Reviewed-by: Zhao Lei zhao...@cn.fujitsu.com

Thanks
Zhaolei

 -Original Message-
 From: Tsutomu Itoh [mailto:t-i...@jp.fujitsu.com]
 Sent: Friday, August 07, 2015 8:20 AM
 To: linux-btrfs@vger.kernel.org
 Cc: Zhao Lei
 Subject: [PATCH V2] btrfs-progs: add newline to some error messages
 
 Added a missing newline to some error messages.
 Also printf() was changed to fprintf(stderr) for error message.
 
 Signed-off-by: Tsutomu Itoh t-i...@jp.fujitsu.com
 ---
  btrfs-corrupt-block.c |  2 +-
  cmds-check.c  |  4 ++--
  cmds-send.c   |  4 ++--
  dir-item.c|  6 +++---
  free-space-cache.c| 24 +++-
  mkfs.c|  2 +-
  6 files changed, 24 insertions(+), 18 deletions(-)
 
 diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c index 
 1a2aa23..ea871f4
 100644
 --- a/btrfs-corrupt-block.c
 +++ b/btrfs-corrupt-block.c
 @@ -1010,7 +1010,7 @@ int find_chunk_offset(struct btrfs_root *root,
   goto out;
   }
   if (ret < 0) {
 - fprintf(stderr, "Error searching chunk");
 + fprintf(stderr, "Error searching chunk\n");
   goto out;
   }
  out:
 diff --git a/cmds-check.c b/cmds-check.c index 50bb6f3..d0ffc94 100644
 --- a/cmds-check.c
 +++ b/cmds-check.c
 @@ -2399,7 +2399,7 @@ static int repair_inode_nlinks(struct
 btrfs_trans_handle *trans,
 BTRFS_FIRST_FREE_OBJECTID, lost_found_ino,
 mode);
   if (ret < 0) {
 - fprintf(stderr, "Failed to create '%s' dir: %s",
 + fprintf(stderr, "Failed to create '%s' dir: %s\n",
   dir_name, strerror(-ret));
   goto out;
   }
 @@ -2427,7 +2427,7 @@ static int repair_inode_nlinks(struct
 btrfs_trans_handle *trans,
   }
   if (ret < 0) {
   fprintf(stderr,
 - "Failed to link the inode %llu to %s dir: %s",
 + "Failed to link the inode %llu to %s dir: %s\n",
   rec->ino, dir_name, strerror(-ret));
   goto out;
   }
 diff --git a/cmds-send.c b/cmds-send.c
 index a0b7f95..6f2f340 100644
 --- a/cmds-send.c
 +++ b/cmds-send.c
 @@ -193,13 +193,13 @@ static int write_buf(int fd, const void *buf, int size)
   ret = write(fd, (char*)buf + pos, size - pos);
   if (ret < 0) {
   ret = -errno;
 - fprintf(stderr, "ERROR: failed to dump stream. %s",
 + fprintf(stderr, "ERROR: failed to dump stream. %s\n",
   strerror(-ret));
   goto out;
   }
   if (!ret) {
   ret = -EIO;
 - fprintf(stderr, "ERROR: failed to dump stream. %s",
 + fprintf(stderr, "ERROR: failed to dump stream. %s\n",
   strerror(-ret));
   goto out;
   }
 diff --git a/dir-item.c b/dir-item.c
 index a5bf861..f3ad98f 100644
 --- a/dir-item.c
 +++ b/dir-item.c
 @@ -285,7 +285,7 @@ int verify_dir_item(struct btrfs_root *root,
   u8 type = btrfs_dir_type(leaf, dir_item);
 
   if (type >= BTRFS_FT_MAX) {
 - fprintf(stderr, "invalid dir item type: %d",
 + fprintf(stderr, "invalid dir item type: %d\n",
  (int)type);
   return 1;
   }
 @@ -294,7 +294,7 @@ int verify_dir_item(struct btrfs_root *root,
   namelen = XATTR_NAME_MAX;
 
   if (btrfs_dir_name_len(leaf, dir_item) > namelen) {
 - fprintf(stderr, "invalid dir item name len: %u",
 + fprintf(stderr, "invalid dir item name len: %u\n",
  (unsigned)btrfs_dir_data_len(leaf, dir_item));
   return 1;
   }
 @@ -302,7 +302,7 @@ int verify_dir_item(struct btrfs_root *root,
   /* BTRFS_MAX_XATTR_SIZE is the same for all dir items */
   if ((btrfs_dir_data_len(leaf, dir_item) +
    btrfs_dir_name_len(leaf, dir_item)) > BTRFS_MAX_XATTR_SIZE(root)) {
 - fprintf(stderr, "invalid dir item name + data len: %u + %u",
 + fprintf(stderr, "invalid dir item name + data len: %u + %u\n",
  (unsigned)btrfs_dir_name_len(leaf, dir_item),
  (unsigned)btrfs_dir_data_len(leaf, dir_item));
   return 1;
 diff --git a/free-space-cache.c b/free-space-cache.c index 67f00fd..19ab0c9
 100644
 --- a/free-space-cache.c
 +++ b/free-space-cache.c
 @@ -107,7 +107,8 @@ static int io_ctl_prepare_pages(struct io_ctl *io_ctl,
 struct btrfs_root *root,
 
   ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
   if (ret) {
 - printf("Couldn't find file extent item for free space inode"
 + fprintf(stderr,
 +"Couldn't find file extent item for free 

Re: Removing bad hdd from btrfs volume

2015-08-06 Thread Duncan
Peter Foley posted on Thu, 06 Aug 2015 15:17:04 -0700 as excerpted:

 I have an btrfs volume that spans multiple disks (no raid, just single),
 and earlier this morning I hit some hardware problems with one of the
 disks.
 I tried btrfs dev del /dev/sda1 /, but btrfs was unable to migrate the
 1gb that appears to be causing the read errors.
 See http://sprunge.us/aeZC Is there some way to figure out which file(s)
 are affected, and if they are stuff I don't care about, is there some
 way to force btrfs to lose the 1gb it can't copy off of the failing
 hdd?

Of course that's the classic raid0 trap (with btrfs multi-device single 
being effectively a raid0 with really big stripes).  Raid0 is (ideally) 
never supposed to be used for data that isn't throw-away, either because 
it's literally no-care data, or because there's backups kept 
appropriately updated, as it's generally considered as good as dead the 
moment one device fails or even really starts to go bad.

So ideally, with one device starting to go bad, you scrap the entire 
filesystem, remove the bad device (or trigger sector remap and reuse, but 
that's dangerous as once sectors start to go, generally the badness 
spreads so the entire device can't be considered trustworthy again), and 
mkfs a new filesystem on the remaining devices, with a replacement device 
thrown in as well if desired.

But sometimes the world isn't ideal; on the arguably more practical 
side... Most of my btrfs are raid1, both data/metadata, with the 
remainder being mixed-bg dup, so I've never tried this on single, 
personally, but...

First, you didn't mention versions so be sure you're current, btrfs-progs 
v4.1.2 is current on the user side, kernel 4.1.x (which you appear to 
have, based on the dmesg, BTW, gentoo here too =:^), or 4.2-rc5+ since 
4.2 is close to release now, is current on the kernel side.

Try btrfs scrub.  Assuming a current btrfs-progs, that should correct 
errors in the metadata, which should be raid1 and thus have a second 
hopefully valid copy to read from.  It should detect but not be able to 
correct errors in the single mode data, but should tell you what files 
the errors are in (I believe very old btrfs-progs scrub did not).

Armed with a list of the files with errors, you should be able to delete 
them.  Once all such files are deleted, the 1 GiB chunk that they were in 
should be empty, and a btrfs balance -dusage=0 should eliminate it.

At that point a btrfs dev del should work.
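
Put together, the sequence above could be scripted roughly as follows. This is a dry-run sketch only: run and recover_plan are illustrative names, / and /dev/sda1 are taken from the original question, and the script just prints the commands rather than executing them. Where scrub reports the damaged paths is assumed here to be the kernel log, which is worth checking against the btrfs-progs and kernel versions in use.

```shell
#!/bin/sh
# Dry run: every command is printed, nothing is executed.
MNT=${MNT:-/}
BAD_DEV=${BAD_DEV:-/dev/sda1}

run() { echo "+ $*"; }

recover_plan() {
    # 1. Scrub in the foreground (-B) so it only returns once finished.
    run btrfs scrub start -B "$MNT"
    # 2. Look for the affected paths (assumed to be reported in the
    #    kernel log), then delete those files by hand.
    run dmesg
    # 3. Drop data chunks that are now completely empty.
    run btrfs balance start -dusage=0 "$MNT"
    # 4. Retry removing the failing device.
    run btrfs device delete "$BAD_DEV" "$MNT"
}

recover_plan
```

Step 2 is deliberately manual: which files to delete is a judgment call, and anything still needed should be copied off first if it can be read at all.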

That's the theory, anyway.  As I said, I've not tried it myself.  But 
it's what I'd try if I did have single-mode data on anything and found 
myself in that situation.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



[PATCH V2] btrfs-progs: add newline to some error messages

2015-08-06 Thread Tsutomu Itoh
Added a missing newline to some error messages.
Also printf() was changed to fprintf(stderr) for error message.

Signed-off-by: Tsutomu Itoh t-i...@jp.fujitsu.com
---
 btrfs-corrupt-block.c |  2 +-
 cmds-check.c  |  4 ++--
 cmds-send.c   |  4 ++--
 dir-item.c|  6 +++---
 free-space-cache.c| 24 +++-
 mkfs.c|  2 +-
 6 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index 1a2aa23..ea871f4 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -1010,7 +1010,7 @@ int find_chunk_offset(struct btrfs_root *root,
goto out;
}
if (ret < 0) {
-   fprintf(stderr, "Error searching chunk");
+   fprintf(stderr, "Error searching chunk\n");
goto out;
}
 out:
diff --git a/cmds-check.c b/cmds-check.c
index 50bb6f3..d0ffc94 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2399,7 +2399,7 @@ static int repair_inode_nlinks(struct btrfs_trans_handle 
*trans,
  BTRFS_FIRST_FREE_OBJECTID, lost_found_ino,
  mode);
if (ret < 0) {
-   fprintf(stderr, "Failed to create '%s' dir: %s",
+   fprintf(stderr, "Failed to create '%s' dir: %s\n",
dir_name, strerror(-ret));
goto out;
}
@@ -2427,7 +2427,7 @@ static int repair_inode_nlinks(struct btrfs_trans_handle 
*trans,
}
if (ret < 0) {
fprintf(stderr,
-   "Failed to link the inode %llu to %s dir: %s",
+   "Failed to link the inode %llu to %s dir: %s\n",
rec->ino, dir_name, strerror(-ret));
goto out;
}
diff --git a/cmds-send.c b/cmds-send.c
index a0b7f95..6f2f340 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -193,13 +193,13 @@ static int write_buf(int fd, const void *buf, int size)
ret = write(fd, (char*)buf + pos, size - pos);
if (ret < 0) {
ret = -errno;
-   fprintf(stderr, "ERROR: failed to dump stream. %s",
+   fprintf(stderr, "ERROR: failed to dump stream. %s\n",
strerror(-ret));
goto out;
}
if (!ret) {
ret = -EIO;
-   fprintf(stderr, "ERROR: failed to dump stream. %s",
+   fprintf(stderr, "ERROR: failed to dump stream. %s\n",
strerror(-ret));
goto out;
}
diff --git a/dir-item.c b/dir-item.c
index a5bf861..f3ad98f 100644
--- a/dir-item.c
+++ b/dir-item.c
@@ -285,7 +285,7 @@ int verify_dir_item(struct btrfs_root *root,
u8 type = btrfs_dir_type(leaf, dir_item);
 
if (type >= BTRFS_FT_MAX) {
-   fprintf(stderr, "invalid dir item type: %d",
+   fprintf(stderr, "invalid dir item type: %d\n",
   (int)type);
return 1;
}
@@ -294,7 +294,7 @@ int verify_dir_item(struct btrfs_root *root,
namelen = XATTR_NAME_MAX;
 
if (btrfs_dir_name_len(leaf, dir_item) > namelen) {
-   fprintf(stderr, "invalid dir item name len: %u",
+   fprintf(stderr, "invalid dir item name len: %u\n",
   (unsigned)btrfs_dir_data_len(leaf, dir_item));
return 1;
}
@@ -302,7 +302,7 @@ int verify_dir_item(struct btrfs_root *root,
/* BTRFS_MAX_XATTR_SIZE is the same for all dir items */
if ((btrfs_dir_data_len(leaf, dir_item) +
 btrfs_dir_name_len(leaf, dir_item)) > BTRFS_MAX_XATTR_SIZE(root)) {
-   fprintf(stderr, "invalid dir item name + data len: %u + %u",
+   fprintf(stderr, "invalid dir item name + data len: %u + %u\n",
   (unsigned)btrfs_dir_name_len(leaf, dir_item),
   (unsigned)btrfs_dir_data_len(leaf, dir_item));
return 1;
diff --git a/free-space-cache.c b/free-space-cache.c
index 67f00fd..19ab0c9 100644
--- a/free-space-cache.c
+++ b/free-space-cache.c
@@ -107,7 +107,8 @@ static int io_ctl_prepare_pages(struct io_ctl *io_ctl, 
struct btrfs_root *root,
 
ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
if (ret) {
-   printf("Couldn't find file extent item for free space inode"
+   fprintf(stderr,
+  "Couldn't find file extent item for free space inode"
" %Lu\n", ino);
btrfs_release_path(path);
return -EINVAL;
@@ -138,7 +139,7 @@ static int io_ctl_prepare_pages(struct io_ctl *io_ctl, 
struct btrfs_root *root,
struct 

Re: Data single *and* raid?

2015-08-06 Thread Qu Wenruo



Hendrik Friedel wrote on 2015/08/06 20:57 +0200:

Hello Hugo,
hello Chris,

thanks for your advice. Now I am here:
btrfs balance start -dprofiles=single -mprofiles=raid1
/mnt/__Complete_Disk/
Done, had to relocate 0 out of 3939 chunks


root@homeserver:/mnt/__Complete_Disk# btrfs fi show
Label: none  uuid: a8af3832-48c7-4568-861f-e80380dd7e0b
 Total devices 3 FS bytes used 3.78TiB
 devid1 size 2.73TiB used 2.72TiB path /dev/sde
 devid2 size 2.73TiB used 2.23TiB path /dev/sdc
 devid3 size 2.73TiB used 2.73TiB path /dev/sdd

btrfs-progs v4.1.1


So, that looks good.

But then:
root@homeserver:/mnt/__Complete_Disk# btrfs fi df /mnt/__Complete_Disk/
Data, RAID5: total=3.83TiB, used=3.78TiB
System, RAID5: total=32.00MiB, used=576.00KiB
Metadata, RAID5: total=6.46GiB, used=4.84GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

GlobalReserve is not a chunk type, it just means a range of metadata 
reserved for overcommitting.

And it's always single.

Personally, I don't think it should be output by the fi df command, as 
it's at a higher level than chunks.


At least in your case, there is nothing to worry about.

Thanks,
Qu



Is the RAID5 expected here?
I did not yet run:
btrfs balance start -dconvert=raid5,soft -mconvert=raid5,soft
/mnt/new_storage/

Regards,
Hendrik


On 01.08.2015 22:44, Chris Murphy wrote:

On Sat, Aug 1, 2015 at 2:32 PM, Hugo Mills h...@carfax.org.uk wrote:

On Sat, Aug 01, 2015 at 10:09:35PM +0200, Hendrik Friedel wrote:

Hello,

I converted an array to raid5 by
btrfs device add /dev/sdd /mnt/new_storage
btrfs device add /dev/sdc /mnt/new_storage
btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt/new_storage/

The Balance went through. But now:
Label: none  uuid: a8af3832-48c7-4568-861f-e80380dd7e0b
 Total devices 3 FS bytes used 5.28TiB
 devid1 size 2.73TiB used 2.57TiB path /dev/sde
 devid2 size 2.73TiB used 2.73TiB path /dev/sdc
 devid3 size 2.73TiB used 2.73TiB path /dev/sdd
btrfs-progs v4.1.1

Already the 2.57TiB is a bit surprising:
root@homeserver:/mnt# btrfs fi df /mnt/new_storage/
Data, single: total=2.55TiB, used=2.55TiB
Data, RAID5: total=2.73TiB, used=2.72TiB
System, RAID5: total=32.00MiB, used=736.00KiB
Metadata, RAID1: total=6.00GiB, used=5.33GiB
Metadata, RAID5: total=3.00GiB, used=2.99GiB


Looking at the btrfs fi show output, you've probably run out of
space during the conversion, probably due to an uneven distribution of
the original single chunks.

I think I would suggest balancing the single chunks, and trying the
conversion (of the unconverted parts) again:

# btrfs balance start -dprofiles=single -mprofiles=raid1
/mnt/new_storage/
# btrfs balance start -dconvert=raid5,soft -mconvert=raid5,soft
/mnt/new_storage/
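
Whether a conversion left chunks of more than one profile behind can be read off the btrfs fi df output. A minimal sketch, assuming that output is piped in on stdin; mixed_profiles is a hypothetical helper name, not a btrfs-progs command:

```shell
# Hypothetical helper (not part of btrfs-progs): reads `btrfs fi df` output
# on stdin and prints every block group type that is listed with more than
# one profile, e.g. "Data" when both "Data, single" and "Data, RAID5" appear.
mixed_profiles() {
    # Split each line on ',' or ':' so that field 1 is the type name,
    # count occurrences per type, and report types seen more than once.
    awk -F'[,:]' '{ n[$1]++ } END { for (t in n) if (n[t] > 1) print t }' | sort
}
```

Usage would be something like `btrfs fi df /mnt/new_storage/ | mixed_profiles`. For the fi df output quoted earlier in this message it prints Data and Metadata, which are the two types that still need the second balance.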



Yep I bet that's it also. btrfs fi usage might be better at exposing
this case.








Re: [PATCH] btrfs-progs: add newline to some error messages

2015-08-06 Thread Tsutomu Itoh

On 2015/08/06 15:07, Zhao Lei wrote:

Hi, Itoh-san


-Original Message-
From: Tsutomu Itoh [mailto:t-i...@jp.fujitsu.com]
Sent: Thursday, August 06, 2015 12:01 PM
To: Zhao Lei; linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs-progs: add newline to some error messages

On 2015/08/06 12:51, Zhao Lei wrote:

Hi, Itoh


-Original Message-
From: linux-btrfs-ow...@vger.kernel.org
[mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Tsutomu Itoh
Sent: Thursday, August 06, 2015 11:06 AM
To: linux-btrfs@vger.kernel.org
Subject: [PATCH] btrfs-progs: add newline to some error messages

Added a missing newline to some error messages.


Good find!

It seems more code needs to be fixed, for example:

# cat mkfs.c | tr -d '\n' | grep -o -w 'f\?printf([^(]*);' |
  sed 's/f\?printf[[:blank:]]*(\(stderr,\|\)[[:blank:]]*\(.*\)[,)].*/\2/g' |
  grep -v '\\n'
symlink too long for %s

Incompat features:  %s
#


It's OK.

printf("Incompat features:  %s", features_buf);
printf("\n");



# cat utils.c | tr -d '\n' | grep -o -w 'f\?printf([^(]*);' |
  sed 's/f\?printf[[:blank:]]*(\(stderr,\|\)[[:blank:]]*\(.*\)[,)].*/\2/g' |
  grep -v '\\n'

ERROR: DUP for data is allowed only in mixed mode %s [y/N]: *1 #
*1: This is not a problem and should be ignored


Already fixed by David in devel branch.


Got it.

I ran the above script on all .c files; nearly all cases are fixed by this patch,
except these:

free-space-cache.c
   Duplicate entries in free space cache, dumping
   Duplicate entries in free space cache, dumping
   block group %llu has wrong amount of free space

The above messages seem to have these problems:
1: they lack a '\n'
2: it would be better to use fprintf(stderr,
3: the message says "dumping", but I haven't seen any
   dump code in the source.
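
A simpler variant of the check described above can be sketched as a shell function. This is an assumption-laden sketch, not the script from the thread: find_missing_newline is a hypothetical name, and it only looks at calls whose format string starts on the same line as printf( or fprintf(, so multi-line calls are missed and intentional cases (a prompt, or a "\n" printed by a following call) still show up and need manual review.

```shell
# Hypothetical helper: list printf()/fprintf() calls in the C file "$1" that
# have a string literal on the same line which does not end in \n.
find_missing_newline() {
    # First grep: lines with printf( or fprintf( followed by a double quote,
    # prefixed with their line number.
    # Second grep: drop lines containing the literal characters \n" (a format
    # string that ends with a newline escape); -F treats the pattern as text.
    grep -nE 'f?printf\(.*"' "$1" | grep -vF '\n"'
}
```

Something like `for f in *.c; do echo "$f"; find_missing_newline "$f"; done` would then give a per-file list to go through by hand.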


I will send a V2 patch soon.

Thanks,
Tsutomu



Thanks
Zhaolei


Thanks,
Tsutomu



Thanks
Zhaolei


Signed-off-by: Tsutomu Itoh t-i...@jp.fujitsu.com
---
   btrfs-corrupt-block.c | 2 +-
   cmds-check.c  | 4 ++--
   cmds-send.c   | 4 ++--
   dir-item.c| 6 +++---
   mkfs.c| 2 +-
   5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c index
1a2aa23..ea871f4
100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -1010,7 +1010,7 @@ int find_chunk_offset(struct btrfs_root *root,
goto out;
}
if (ret < 0) {
-   fprintf(stderr, "Error searching chunk");
+   fprintf(stderr, "Error searching chunk\n");
goto out;
}
   out:
diff --git a/cmds-check.c b/cmds-check.c index dd2fce3..0ddf57c
100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2398,7 +2398,7 @@ static int repair_inode_nlinks(struct
btrfs_trans_handle *trans,
  BTRFS_FIRST_FREE_OBJECTID, lost_found_ino,
  mode);
if (ret < 0) {
-   fprintf(stderr, "Failed to create '%s' dir: %s",
+   fprintf(stderr, "Failed to create '%s' dir: %s\n",
dir_name, strerror(-ret));
goto out;
}
@@ -2426,7 +2426,7 @@ static int repair_inode_nlinks(struct
btrfs_trans_handle *trans,
}
if (ret < 0) {
fprintf(stderr,
-   "Failed to link the inode %llu to %s dir: %s",
+   "Failed to link the inode %llu to %s dir: %s\n",
rec->ino, dir_name, strerror(-ret));
goto out;
}
diff --git a/cmds-send.c b/cmds-send.c index 20bba18..78ee54c 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -192,13 +192,13 @@ static int write_buf(int fd, const void *buf, int

size)

ret = write(fd, (char*)buf + pos, size - pos);
if (ret < 0) {
ret = -errno;
-   fprintf(stderr, "ERROR: failed to dump stream. %s",
+   fprintf(stderr, "ERROR: failed to dump stream. %s\n",
strerror(-ret));
goto out;
}
if (!ret) {
ret = -EIO;
-   fprintf(stderr, "ERROR: failed to dump stream. %s",
+   fprintf(stderr, "ERROR: failed to dump stream. %s\n",
strerror(-ret));
goto out;
}
diff --git a/dir-item.c b/dir-item.c
index a5bf861..f3ad98f 100644
--- a/dir-item.c
+++ b/dir-item.c
@@ -285,7 +285,7 @@ int verify_dir_item(struct btrfs_root *root,
u8 type = btrfs_dir_type(leaf, dir_item);

if (type >= BTRFS_FT_MAX) {
-   fprintf(stderr, "invalid dir item type: %d",
+   fprintf(stderr, "invalid dir item type: %d\n",
   (int)type);
return 1;
}
@@ -294,7 +294,7 @@ int verify_dir_item(struct btrfs_root 

[PATCH] btrfs: Remove unnecessary variants in relocation.c

2015-08-06 Thread Zhao Lei
These arguments are not used in the functions; remove them as a cleanup
and to reduce kernel stack usage.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/ctree.h   |  3 +--
 fs/btrfs/relocation.c  | 13 +
 fs/btrfs/transaction.c |  2 +-
 3 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f57e6ca..f335c18 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -4185,8 +4185,7 @@ int btrfs_reloc_clone_csums(struct inode *inode, u64 
file_pos, u64 len);
 int btrfs_reloc_cow_block(struct btrfs_trans_handle *trans,
  struct btrfs_root *root, struct extent_buffer *buf,
  struct extent_buffer *cow);
-void btrfs_reloc_pre_snapshot(struct btrfs_trans_handle *trans,
- struct btrfs_pending_snapshot *pending,
+void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
  u64 *bytes_to_reserve);
 int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
  struct btrfs_pending_snapshot *pending);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 4698928..303babe 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2523,8 +2523,7 @@ struct btrfs_root *select_reloc_root(struct 
btrfs_trans_handle *trans,
  * counted. return -ENOENT if the block is root of reloc tree.
  */
 static noinline_for_stack
-struct btrfs_root *select_one_root(struct btrfs_trans_handle *trans,
-  struct backref_node *node)
+struct btrfs_root *select_one_root(struct backref_node *node)
 {
struct backref_node *next;
struct btrfs_root *root;
@@ -2912,7 +2911,7 @@ static int relocate_tree_block(struct btrfs_trans_handle 
*trans,
return 0;
 
BUG_ON(node->processed);
-   root = select_one_root(trans, node);
+   root = select_one_root(node);
if (root == ERR_PTR(-ENOENT)) {
update_processed_blocks(rc, node);
goto out;
@@ -3755,8 +3754,7 @@ out:
  * helper to find next unprocessed extent
  */
 static noinline_for_stack
-int find_next_extent(struct btrfs_trans_handle *trans,
-struct reloc_control *rc, struct btrfs_path *path,
+int find_next_extent(struct reloc_control *rc, struct btrfs_path *path,
 struct btrfs_key *extent_key)
 {
struct btrfs_key key;
@@ -3951,7 +3949,7 @@ restart:
continue;
}
 
-   ret = find_next_extent(trans, rc, path, &key);
+   ret = find_next_extent(rc, path, &key);
if (ret < 0)
err = ret;
if (ret != 0)
@@ -4596,8 +4594,7 @@ int btrfs_reloc_cow_block(struct btrfs_trans_handle 
*trans,
  * called before creating snapshot. it calculates metadata reservation
  * requried for relocating tree blocks in the snapshot
  */
-void btrfs_reloc_pre_snapshot(struct btrfs_trans_handle *trans,
- struct btrfs_pending_snapshot *pending,
+void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
  u64 *bytes_to_reserve)
 {
struct btrfs_root *root;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c0f18e7..049613c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1301,7 +1301,7 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
 */
btrfs_set_skip_qgroup(trans, objectid);
 
-   btrfs_reloc_pre_snapshot(trans, pending, &to_reserve);
+   btrfs_reloc_pre_snapshot(pending, &to_reserve);
 
if (to_reserve > 0) {
pending->error = btrfs_block_rsv_add(root,
-- 
1.8.5.1



Re: bedup --defrag freezing

2015-08-06 Thread Austin S Hemmelgarn

On 2015-08-05 17:45, Konstantin Svist wrote:

Hi,

I've been running btrfs on Fedora for a while now, with bedup --defrag
running in a night-time cronjob.
Last few runs seem to have gotten stuck, without possibility of even
killing the process (kill -9 doesn't work) -- all I could do is hard
power cycle.

Did something change recently? Is bedup simply too out of date? What
should I use to de-duplicate across snapshots instead? Etc.?

AFAIK, bedup hasn't been actively developed for quite a while (I'm 
actually kind of surprised it runs with the newest btrfs-progs). 
Personally, I'd suggest using duperemove 
(https://github.com/markfasheh/duperemove).







Re: [PATCH] fstests: generic test for fsync of file with multiple links

2015-08-06 Thread Eryu Guan
On Thu, Aug 06, 2015 at 05:11:30AM +0100, fdman...@kernel.org wrote:
 From: Filipe Manana fdman...@suse.com
 
 Test that when we have a file with multiple hard links belonging to
 different parent directories, if we remove one of those links, fsync the
 file using one of its other links (that has a parent directory different
 from the one we removed a link from), power fail and then replay the
 fsync log/journal, the hard link we removed is not available anymore and
 all the filesystem metadata is in a consistent state.

Looks good to me, just one minor question below

 
 This test is motivated by an issue found in btrfs, where the test fails
 with:
 
   generic/107 2s ... - output mismatch (see .../results/generic/107.out.bad)
 --- tests/generic/107.out 2015-08-04 09:47:46.922131256 +0100
 +++ /home/fdmanana/git/hub/xfstests/results//generic/107.out.bad
 @@ -1,3 +1,5 @@
  QA output created by 107
  Entries in testdir:
  foo2
 +foo3
 +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': 
 Directory not empty
 ...
 (Run 'diff -u tests/generic/107.out .../generic/107.out.bad'  to see the 
 entire diff)
   _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent (see 
 .../generic/107.full)
   _check_dmesg: something found in dmesg (see .../generic/107.dmesg)
 
   $ cat /home/fdmanana/git/hub/xfstests/results//generic/107.full
   _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
   *** fsck.btrfs output ***
   checking extents
   checking free space cache
   checking fs roots
   root 5 inode 257 errors 200, dir isize wrong
   unresolved ref dir 257 index 3 namelen 4 name foo3 filetype 1 \
   errors 5, no dir item, no inode ref
 
   $ cat /home/fdmanana/git/hub/xfstests/results//generic/107.dmesg
   (...)
   [188897.707311] BTRFS info (device dm-0): failed to delete reference to \
 foo3, inode 258 parent 257
   [188897.711345] [ cut here ]
   [188897.713369] WARNING: CPU: 10 PID: 19452 at fs/btrfs/inode.c:3956 \
 __btrfs_unlink_inode+0x182/0x35a [btrfs]()
   [188897.717661] BTRFS: Transaction aborted (error -2)
   (...)
   [188897.747898] Call Trace:
   [188897.748519]  [8145f077] dump_stack+0x4f/0x7b
   [188897.749602]  [81095de5] ? console_unlock+0x356/0x3a2
   [188897.750682]  [8104b3b0] warn_slowpath_common+0xa1/0xbb
   [188897.751936]  [a04c5d09] ? __btrfs_unlink_inode+0x182/0x35a 
 [btrfs]
   [188897.753485]  [8104b410] warn_slowpath_fmt+0x46/0x48
   [188897.754781]  [a04c5d09] __btrfs_unlink_inode+0x182/0x35a 
 [btrfs]
   [188897.756295]  [a04c6e8f] btrfs_unlink_inode+0x1e/0x40 [btrfs]
   [188897.757692]  [a04c6f11] btrfs_unlink+0x60/0x9b [btrfs]
   [188897.758978]  [8116fb48] vfs_unlink+0x9c/0xed
   [188897.760151]  [81173481] do_unlinkat+0x12b/0x1fb
   [188897.761354]  [81253855] ? lockdep_sys_exit_thunk+0x12/0x14
   [188897.762692]  [81174056] SyS_unlinkat+0x29/0x2b
   [188897.763741]  [81465197] system_call_fastpath+0x12/0x6f
   [188897.764894] ---[ end trace bbfddacb7aaada8c ]---
   [188897.765801] BTRFS warning (device dm-0): __btrfs_unlink_inode:3956: \
 Aborting unused transaction(No such entry).
 
 Tested against ext3/4, xfs, reiserfs and f2fs too, and all these
 filesystems currently pass this test (on a 4.1 linux kernel at least).
 
 The btrfs issue is fixed by the linux kernel patch titled:
 Btrfs: fix stale dir entries after removing a link and fsync.
 
 Signed-off-by: Filipe Manana fdman...@suse.com
 ---
  tests/generic/107 | 99 
 +++
  tests/generic/107.out |  3 ++
  tests/generic/group   |  1 +
  3 files changed, 103 insertions(+)
  create mode 100755 tests/generic/107
  create mode 100644 tests/generic/107.out
 
 diff --git a/tests/generic/107 b/tests/generic/107
 new file mode 100755
 index 000..7d107d7
 --- /dev/null
 +++ b/tests/generic/107
 @@ -0,0 +1,99 @@
 +#! /bin/bash
 +# FSQA Test No. 107
 +#
 +# Test that when we have a file with multiple hard links belonging to 
 different
 +# parent directories, if we remove one of those links, fsync the file using 
 one
 +# of its other links (that has a parent directory different from the one we
 +# removed a link from), power fail and then replay the fsync log/journal, the
 +# hard link we removed is not available anymore and all the filesystem 
 metadata
 +# is in a consistent state.
 +#
 +#---
 +#
 +# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
 +# Author: Filipe Manana fdman...@suse.com
 +#
 +# This program is free software; you can redistribute it and/or
 +# modify it under the terms of the GNU General Public License as
 +# published by the Free Software Foundation.
 +#
 +# This program is distributed in the hope that it would be useful,
 +# but WITHOUT ANY 

Re: [PATCH] fstests: generic test for fsync of file with multiple links

2015-08-06 Thread Filipe Manana
On Thu, Aug 6, 2015 at 12:46 PM, Eryu Guan eg...@redhat.com wrote:
 On Thu, Aug 06, 2015 at 05:11:30AM +0100, fdman...@kernel.org wrote:
 From: Filipe Manana fdman...@suse.com

 Test that when we have a file with multiple hard links belonging to
 different parent directories, if we remove one of those links, fsync the
 file using one of its other links (that has a parent directory different
 from the one we removed a link from), power fail and then replay the
 fsync log/journal, the hard link we removed is not available anymore and
 all the filesystem metadata is in a consistent state.

 Looks good to me, just one minor question below


 This test is motivated by an issue found in btrfs, where the test fails
 with:

   generic/107 2s ... - output mismatch (see .../results/generic/107.out.bad)
 --- tests/generic/107.out 2015-08-04 09:47:46.922131256 +0100
 +++ /home/fdmanana/git/hub/xfstests/results//generic/107.out.bad
 @@ -1,3 +1,5 @@
  QA output created by 107
  Entries in testdir:
  foo2
 +foo3
 +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': 
 Directory not empty
 ...
 (Run 'diff -u tests/generic/107.out .../generic/107.out.bad'  to see the 
 entire diff)
   _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent (see 
 .../generic/107.full)
   _check_dmesg: something found in dmesg (see .../generic/107.dmesg)

   $ cat /home/fdmanana/git/hub/xfstests/results//generic/107.full
   _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
   *** fsck.btrfs output ***
   checking extents
   checking free space cache
   checking fs roots
   root 5 inode 257 errors 200, dir isize wrong
   unresolved ref dir 257 index 3 namelen 4 name foo3 filetype 1 \
   errors 5, no dir item, no inode ref

   $ cat /home/fdmanana/git/hub/xfstests/results//generic/107.dmesg
   (...)
   [188897.707311] BTRFS info (device dm-0): failed to delete reference to \
 foo3, inode 258 parent 257
   [188897.711345] [ cut here ]
   [188897.713369] WARNING: CPU: 10 PID: 19452 at fs/btrfs/inode.c:3956 \
 __btrfs_unlink_inode+0x182/0x35a [btrfs]()
   [188897.717661] BTRFS: Transaction aborted (error -2)
   (...)
   [188897.747898] Call Trace:
   [188897.748519]  [8145f077] dump_stack+0x4f/0x7b
   [188897.749602]  [81095de5] ? console_unlock+0x356/0x3a2
   [188897.750682]  [8104b3b0] warn_slowpath_common+0xa1/0xbb
   [188897.751936]  [a04c5d09] ? __btrfs_unlink_inode+0x182/0x35a 
 [btrfs]
   [188897.753485]  [8104b410] warn_slowpath_fmt+0x46/0x48
   [188897.754781]  [a04c5d09] __btrfs_unlink_inode+0x182/0x35a 
 [btrfs]
   [188897.756295]  [a04c6e8f] btrfs_unlink_inode+0x1e/0x40 [btrfs]
   [188897.757692]  [a04c6f11] btrfs_unlink+0x60/0x9b [btrfs]
   [188897.758978]  [8116fb48] vfs_unlink+0x9c/0xed
   [188897.760151]  [81173481] do_unlinkat+0x12b/0x1fb
   [188897.761354]  [81253855] ? lockdep_sys_exit_thunk+0x12/0x14
   [188897.762692]  [81174056] SyS_unlinkat+0x29/0x2b
   [188897.763741]  [81465197] system_call_fastpath+0x12/0x6f
   [188897.764894] ---[ end trace bbfddacb7aaada8c ]---
   [188897.765801] BTRFS warning (device dm-0): __btrfs_unlink_inode:3956: \
 Aborting unused transaction(No such entry).

 Tested against ext3/4, xfs, reiserfs and f2fs too, and all these
 filesystems currently pass this test (on a 4.1 linux kernel at least).

 The btrfs issue is fixed by the linux kernel patch titled:
 Btrfs: fix stale dir entries after removing a link and fsync.

 Signed-off-by: Filipe Manana fdman...@suse.com
 ---
  tests/generic/107 | 99 
 +++
  tests/generic/107.out |  3 ++
  tests/generic/group   |  1 +
  3 files changed, 103 insertions(+)
  create mode 100755 tests/generic/107
  create mode 100644 tests/generic/107.out

 diff --git a/tests/generic/107 b/tests/generic/107
 new file mode 100755
 index 000..7d107d7
 --- /dev/null
 +++ b/tests/generic/107
 @@ -0,0 +1,99 @@
 +#! /bin/bash
 +# FSQA Test No. 107
 +#
 +# Test that when we have a file with multiple hard links belonging to 
 different
 +# parent directories, if we remove one of those links, fsync the file using 
 one
 +# of its other links (that has a parent directory different from the one we
 +# removed a link from), power fail and then replay the fsync log/journal, 
 the
 +# hard link we removed is not available anymore and all the filesystem 
 metadata
 +# is in a consistent state.
 +#
 +#---
 +#
 +# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
 +# Author: Filipe Manana fdman...@suse.com
 +#
 +# This program is free software; you can redistribute it and/or
 +# modify it under the terms of the GNU General Public License as
 +# published by the Free Software Foundation.
 +#
 +# This program is distributed 

Re: BTRFS disaster (of my own making). Is this recoverable?

2015-08-06 Thread Austin S Hemmelgarn

On 2015-08-05 22:13, Chris Murphy wrote:

On Wed, Aug 5, 2015 at 6:45 PM, Paul Jones p...@pauljones.id.au wrote:

Would it be possible to store this type of critical information twice on each 
disk, at the beginning and end? I thought BTRFS already did that, but I might 
be thinking of some other filesystem. I've had my share of these types of oops! 
moments as well.


That option is metadata profile raid1. To do an automatic
-mconvert=raid1 when the user does 'btrfs device add' breaks any use
case where you want to temporarily add a small device, maybe a USB
stick, and now hundreds of MiBs possibly GiBs of metadata have to be
copied over to this device without warning. It could be made smart,
autoconvert to raid1 when the added device is at least 4x the size of
metadata allocation, but then that makes it inconsistent. OK so it
could be made interactive, but now that breaks scripts. So... where do
you draw the line?

Maybe this would work if the system chunk only was raid1? I don't know
what the minimum necessary information is for such a case.

Possibly it makes more sense if 'btrfs device add' always does
-dconvert=raid1 unless a --quick option is passed?


Perhaps we could print out a big noisy warning that could be silenced?





Re: Why subvolume and not just volume?

2015-08-06 Thread Austin S Hemmelgarn

On 2015-08-06 03:23, Duncan wrote:

Martin posted on Wed, 05 Aug 2015 09:06:40 +0200 as excerpted:


[W]hat is the penalty of a subvolume compared to a directory? From a
design perspective, couldn't all directories just be subvolumes?


In addition to the performance issues mentioned by others, there's at
least one further practical reason as well.

Snapshots stop at subvolume boundaries.  It's thus quite useful to use
subvolumes to delineate the limits of the snapshot, saying, in effect,
snapshot this dir (which happens to be a subvol not just a normal dir)
recursively, but don't snapshot the subtree starting with this nested
subdir (which again is a (different) subvol).

And for some people, this is very useful functionality.  I use it to 
specifically exclude subsets of trivially reproducible data from backups 
(for example, I always clone public git repositories into individual 
subvolumes, and keep my local copy of the Portage tree on a separate one 
(when it isn't on a separate filesystem that is)).






Re: [PATCH 0/6] sysfs-part2 Add seed device representation on the sysfs

2015-08-06 Thread David Sterba
On Thu, Aug 06, 2015 at 05:51:16AM +0800, Anand Jain wrote:
...
   these can go in.
 
 Btrfs: sysfs: support seed devices in the sysfs layout
 
  Sorry for late reply, the patches look good. I'm going to prepare a
  branch for pull into 4.3. Thanks.
 
 I would suggest that this wait.
 On second thought, I am preparing a survey to find out the most
 preferred sysfs layout for btrfs,
   mainly between
   one, less invasive overlays on the existing layout (current method), and
   the other, separating FS and Volume attributes (old method).
 
   Sorry that I am going back on this a bit, but I think it's worth it, as these APIs
 are forever.

Understood.


Re: Data single *and* raid?

2015-08-06 Thread Hendrik Friedel

Hello Hugo,
hello Chris,

thanks for your advice. Now I am here:
btrfs balance start -dprofiles=single -mprofiles=raid1 /mnt/__Complete_Disk/
Done, had to relocate 0 out of 3939 chunks


root@homeserver:/mnt/__Complete_Disk# btrfs fi show
Label: none  uuid: a8af3832-48c7-4568-861f-e80380dd7e0b
Total devices 3 FS bytes used 3.78TiB
devid1 size 2.73TiB used 2.72TiB path /dev/sde
devid2 size 2.73TiB used 2.23TiB path /dev/sdc
devid3 size 2.73TiB used 2.73TiB path /dev/sdd

btrfs-progs v4.1.1


So, that looks good.

But then:
root@homeserver:/mnt/__Complete_Disk# btrfs fi df /mnt/__Complete_Disk/
Data, RAID5: total=3.83TiB, used=3.78TiB
System, RAID5: total=32.00MiB, used=576.00KiB
Metadata, RAID5: total=6.46GiB, used=4.84GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Is the RAID5 expected here?
I did not yet run:
btrfs balance start -dconvert=raid5,soft -mconvert=raid5,soft 
/mnt/new_storage/


Regards,
Hendrik


On 01.08.2015 22:44, Chris Murphy wrote:

On Sat, Aug 1, 2015 at 2:32 PM, Hugo Mills h...@carfax.org.uk wrote:

On Sat, Aug 01, 2015 at 10:09:35PM +0200, Hendrik Friedel wrote:

Hello,

I converted an array to raid5 by
btrfs device add /dev/sdd /mnt/new_storage
btrfs device add /dev/sdc /mnt/new_storage
btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt/new_storage/

The Balance went through. But now:
Label: none  uuid: a8af3832-48c7-4568-861f-e80380dd7e0b
 Total devices 3 FS bytes used 5.28TiB
 devid1 size 2.73TiB used 2.57TiB path /dev/sde
 devid2 size 2.73TiB used 2.73TiB path /dev/sdc
 devid3 size 2.73TiB used 2.73TiB path /dev/sdd
btrfs-progs v4.1.1

Already the 2.57TiB is a bit surprising:
root@homeserver:/mnt# btrfs fi df /mnt/new_storage/
Data, single: total=2.55TiB, used=2.55TiB
Data, RAID5: total=2.73TiB, used=2.72TiB
System, RAID5: total=32.00MiB, used=736.00KiB
Metadata, RAID1: total=6.00GiB, used=5.33GiB
Metadata, RAID5: total=3.00GiB, used=2.99GiB


Looking at the btrfs fi show output, you've probably run out of
space during the conversion, probably due to an uneven distribution of
the original single chunks.

I think I would suggest balancing the single chunks, and trying the
conversion (of the unconverted parts) again:

# btrfs balance start -dprofiles=single -mprofiles=raid1 /mnt/new_storage/
# btrfs balance start -dconvert=raid5,soft -mconvert=raid5,soft 
/mnt/new_storage/



Yep I bet that's it also. btrfs fi usage might be better at exposing this case.





--
Hendrik Friedel
Auf dem Brink 12
28844 Weyhe
Tel. 04203 8394854
Mobil 0178 1874363




[PATCH 1/2] btrfs: Fix wrong comment of btrfs_alloc_tree_block()

2015-08-06 Thread Zhao Lei
The wrong part of this comment was copied from another function (since
removed) in the initial code; this patch removes it.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a436bd5..792247f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7536,9 +7536,6 @@ static void unuse_block_rsv(struct btrfs_fs_info *fs_info,
 
 /*
  * finds a free extent and does all the dirty work required for allocation
- * returns the key for the extent through ins, and a tree buffer for
- * the first block of the extent through buf.
- *
  * returns the tree buffer or an ERR_PTR on error.
  */
 struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
-- 
1.8.5.1



[PATCH] btrfs: abort transaction on btrfs_reloc_cow_block()

2015-08-06 Thread Zhao Lei
When btrfs_reloc_cow_block() fails in __btrfs_cow_block(), the current
code just returns an error value to the caller, but leaves the newly
created extent buffer in existence and locked.

Subsequent code (in relocation) then tries to lock that eb again,
which causes a deadlock without any dmesg output.
(The eb lock uses wait_event(), so there is no lockdep message.)

It is hard to do recovery work in __btrfs_cow_block() at this error
point, but we can abort the transaction to avoid the deadlock and to
avoid operating on an unstable state.

It also helps developers find the faulty place quickly.
(That is better than a frozen fs without any dmesg output, as before the patch.)

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/ctree.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 54114b4..5f745ea 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1159,8 +1159,10 @@ static noinline int __btrfs_cow_block(struct 
btrfs_trans_handle *trans,
 
if (test_bit(BTRFS_ROOT_REF_COWS, &root->state)) {
ret = btrfs_reloc_cow_block(trans, root, buf, cow);
-   if (ret)
+   if (ret) {
+   btrfs_abort_transaction(trans, root, ret);
return ret;
+   }
}
 
if (buf == root->node) {
-- 
1.8.5.1



[PATCH 2/2] btrfs: Remove root argument in extent_data_ref_count()

2015-08-06 Thread Zhao Lei
Because it is never used.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 792247f..5f7cbd7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1316,8 +1316,7 @@ static noinline int remove_extent_data_ref(struct 
btrfs_trans_handle *trans,
return ret;
 }
 
-static noinline u32 extent_data_ref_count(struct btrfs_root *root,
- struct btrfs_path *path,
+static noinline u32 extent_data_ref_count(struct btrfs_path *path,
  struct btrfs_extent_inline_ref *iref)
 {
struct btrfs_key key;
@@ -6318,7 +6317,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle 
*trans,
} else {
if (found_extent) {
BUG_ON(is_data && refs_to_drop !=
-  extent_data_ref_count(root, path, iref));
+  extent_data_ref_count(path, iref));
if (iref) {
BUG_ON(path->slots[0] != extent_slot);
} else {
-- 
1.8.5.1



Re: Why subvolume and not just volume?

2015-08-06 Thread Duncan
Martin posted on Wed, 05 Aug 2015 09:06:40 +0200 as excerpted:

 [W]hat is the penalty of a subvolume compared to a directory? From a
 design perspective, couldn't all directories just be subvolumes?

In addition to the performance issues mentioned by others, there's at 
least one further practical reason as well.

Snapshots stop at subvolume boundaries.  It's thus quite useful to use 
subvolumes to delineate the limits of the snapshot, saying, in effect, 
snapshot this dir (which happens to be a subvol not just a normal dir) 
recursively, but don't snapshot the subtree starting with this nested 
subdir (which again is a (different) subvol).

Subvols act very much like directories, it is true.  But they have a few 
additional properties and different behaviors, and it is the distinction 
between directories and subvols that makes them valuable /as/ subvols.  
Without a distinction, the whole reason to have subvols as a separate 
feature vanishes.

(FWIW, the first systemd release, v219, to use btrfs subvolumes in place 
of directories found out some of the behavior differences the hard way.  
Where it was previously doing mkdir, which returns success if the 
directory is already there, critical for a root filesystem kept read-only 
mounted by default, but with the required directories already created, on 
btrfs it tried to create a subvolume instead, which fails if there's a 
directory already there, particularly if it's a read-only mount.  So the 
behavior creating a subvol differs from that of creating a subdir, and 
systemd's tmpfiles service was failing on read-only btrfs mounts as a 
result, while it previously succeeded, when it was only trying to create 
directories, which already existed.  Oops!  The bug was fixed in v221, 
but the experience does illustrate that while subvolumes behave in /many/ 
ways like subdirs, there are indeed small differences in behavior that 
can leap up and bite the unwary.)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: [RFC 0/8] Allow GFP_NOFS allocation to fail

2015-08-06 Thread Michal Hocko
On Wed 05-08-15 20:58:25, Andreas Dilger wrote:
> On Aug 5, 2015, at 3:51 AM, mho...@kernel.org wrote:
[...]
> > The rest are the FS specific patches to fortify allocations
> > requests which are really needed to finish transactions without RO
> > remounts. There might be more needed but my test case survives with
> > these in place.
>
> Wouldn't it make more sense to order the fs-specific patches _before_
> the GFP_NOFS can fail patch (#3), so that once that patch is applied
> all known failures have already been fixed?  Otherwise it could show
> test failures during bisection that would be confusing.

As I write below. If maintainers consider them useful even when GFP_NOFS
doesn't fail I will reword them and resend. But you cannot fix the world
without breaking it first in this case ;)
>
> > They would obviously need some rewording if they are going to be
> > applied even without Patch3 and I will do that if respective
> > maintainers will take them. Ext3 and JBD are going away soon so they
> > might be dropped but they have been in the tree while I was testing
> > so I've kept them.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] btrfs: qgroup: Fix a regression in qgroup reserved space.

2015-08-06 Thread Chris Mason
On Thu, Aug 06, 2015 at 04:42:41PM +0800, Qu Wenruo wrote:
> Hi Chris,
>
> Would you please consider including this patch into v4.2 if it is still
> possible?
>
> Although the fix is still not perfect and just a hotfix, as qgroup reserve
> parts still have a lot of problems from design and a lot of operations can
> still cause reserve space leak.
>
> But considering how easy it is to trigger, I still hope it to be merged asap
> before 4.2.

Thanks Qu, I'll get this in.

-chris


Re: FW: btrfs-progs: android build

2015-08-06 Thread David Sterba
Hi,

On Thu, Aug 06, 2015 at 06:45:11PM +0900, �� wrote:
> Hi, I made btrfs-progs android build script and test it.

Thanks. The changes as they stand are too intrusive to be added but give
me an idea what's needed.

The makefile part shares some variables with current make and adds some
specific variables and includes. Ideally there's only one makefile but
I think that we can live with a separate makefile for android as there
seem to be specific quirks that would complicate the common makefile.

This means we'd have to keep the shared part in sync manually but it's
not that hard. New files or new libs, this always requires more care.

> And need some help on btrfs_wipe_existing_sb().
> On the test it looks work well.

The code changes should be hidden in wrappers and pulled via a separate
header file. If this is not possible, the ifdefs should be in the
function implementations (eg. is_ssd, check_overwrite), not at the call
sites.
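
A minimal sketch of that layout, under stated assumptions: 
PLATFORM_ANDROID_BUILD is a made-up macro name, and device_is_ssd() and 
describe_device() are hypothetical stand-ins, not the btrfs-progs 
functions.  The point is only that the ifdef sits inside the 
implementation while the call site stays free of it.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical wrapper, loosely modelled on the is_ssd example.
 * PLATFORM_ANDROID_BUILD is an assumed macro name, not one defined by
 * btrfs-progs or by the Android toolchain.
 */
static int device_is_ssd(const char *sysfs_rotational_path)
{
#ifdef PLATFORM_ANDROID_BUILD
	/* Assumed fallback: no probing available, report "not an SSD". */
	(void)sysfs_rotational_path;
	return 0;
#else
	char buf[8] = "";
	FILE *f;

	assert(sysfs_rotational_path != NULL);
	f = fopen(sysfs_rotational_path, "r");
	if (!f)
		return 0;	/* unknown: treat as rotational */
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);
	/* The sysfs "rotational" attribute reads "0" for non-rotational. */
	return buf[0] == '0';
#endif
}

/* Call site: identical on every platform, no ifdef needed here. */
static const char *describe_device(const char *sysfs_rotational_path)
{
	return device_is_ssd(sysfs_rotational_path) ? "ssd" : "rotational";
}
```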

Other than that, I don't mind adding support for android builds.  I don't
have an android build environment at hand and cannot verify that it's
always working, so the same holds as for musl libc, fixes are up to you.


[PATCH] btrfs: Add WARN_ON() for double lock in btrfs_tree_lock()

2015-08-06 Thread Zhao Lei
When a task tries to double lock an extent buffer, there is no
lockdep warning about it because the lock may be in the blocking_lock
state, which makes this hard to debug.

This patch adds a WARN_ON() for the above condition. It cannot report
all deadlock cases (such as deadlocks between tasks), but it at least
helps with some.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/locking.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index f8229ef..d7e6baf 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -241,6 +241,7 @@ void btrfs_tree_read_unlock_blocking(struct extent_buffer *eb)
  */
 void btrfs_tree_lock(struct extent_buffer *eb)
 {
+	WARN_ON(eb->lock_owner == current->pid);
 again:
 	wait_event(eb->read_lock_wq, atomic_read(&eb->blocking_readers) == 0);
 	wait_event(eb->write_lock_wq, atomic_read(&eb->blocking_writers) == 0);
-- 
1.8.5.1
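
For readers less familiar with the idea, a rough user-space analogue 
(not the kernel code, and not part of the patch) is a POSIX 
error-checking mutex, which reports EDEADLK when its current owner tries 
to lock it again.  checked_mutex_init() and checked_mutex_lock() below 
are hypothetical helper names used only for this sketch.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Initialise a mutex that detects relocking by its current owner. */
static int checked_mutex_init(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!ret)
		ret = pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
	return ret;
}

/*
 * Lock wrapper: warn (rather than hang) when the caller already owns
 * the mutex.  Returns 0 on success or an errno value such as EDEADLK.
 */
static int checked_mutex_lock(pthread_mutex_t *m)
{
	int ret;

	assert(m != NULL);
	ret = pthread_mutex_lock(m);
	if (ret == EDEADLK)
		fprintf(stderr, "WARNING: double lock by the same thread\n");
	return ret;
}
```

As with the WARN_ON() in the patch, this only catches a task relocking 
its own lock; deadlocks between different tasks go undetected.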
