Re: [PATCH v2 3/3] fstests: generic: Check the fs after each FUA writes

2018-03-27 Thread Qu Wenruo


On 2018年03月28日 13:04, Amir Goldstein wrote:
> On Wed, Mar 28, 2018 at 7:40 AM, Qu Wenruo  wrote:
>> Basic test case which triggers fsstress with dm-log-writes, and then
>> check the fs after each FUA writes.
>> With needed infrastructure and special handlers for journal based fs.
>>
>> Signed-off-by: Qu Wenruo 
>> ---
>> changelog:
>> v2:
>>   Use proper "SUSE Linux Products GmbH" instead of "SuSE"
>>   Get rid of dm-snapshot which is pretty slow if we're creating and
>>   deleting snapshots repeatedly.
>>   (Maybe LVM thin provision would be much better, but current replay
>>solution is good so far, and no slower than dm-snapshot)
>>   Add back run_check so that we can get the seed.
>> ---
>> Unfortunately, neither xfs nor ext4 survies this test for even single
>> success, while btrfs survives.
>> (Although not what I want, I'm just trying my luck
>> to reproduce a serious btrfs corruption situation)
>>
>> Although btrfs may be the fastest fs for the test, since it has fixed
>> small amount of write in mkfs and almost nothing to replay, it still
>> takes about 240s~300s to finish (the same level using snapshot).
>>
>> It would take longer time for ext4 for its large amount of write during
>> mkfs, if it can survive the test in the first space.
> 
> Hmm, how much time would the total test would take if you don't fail
> it on fsck? I am asking because it may be possible to improve this with
> only a single snapshot after mkfs.

The only fs which can pass the test right now is btrfs, so other
estimation is mostly based on guess.

> 
> Anyway, if total test run time is expected to be under 10min I wouldn't
> bother with this optimization, at least not right now. IMO it is more
> important to get the test out there to get the corruption bugs floating.

I'd say from current status, if XFS doesn't fail, it would definitely
finish in 10min.
For EXT4 I'm not pretty sure.

I'd like to  keep current test case as simple as possible right now, and
for later enhancement, I have several different ideas:

1) Reflink fs + loopback
   Yep, use btrfs/xfs as base filesystem and create copy using reflink,
   then use such files as loopback device.
   The good thing is, AFAIK btrfs/xfs reflink is really fast.
   Much much faster than dm-snapshot or even LVM thin.

   The much much smaller block size (4K default) makes CoW overhead
   (LVM thin is 64K, not sure about dm-snapshot though).

   The problem is, such setup needs extra mount point and can be a
   little hard to setup, and we're introducing another layer of fs,
   if the fs itself has some hidden bug, it would screw up the test
   case.

2) LVM thin provision
   LVM thin provision looks much like btrfs/xfs for block level, and
   smaller default block size (64K vs original 2M) makes CoW overhead
   smaller.

   I'm currently testing this method, the good thing is it's a little
   easier to setup and we can use single mount point.

Anyway, with proper and efficient snapshot ability implemented, I will
definitely convert this test case, and add Flush test case.

Thanks for your review too,
Qu

> 
> Thanks for working on this!
> You can add
> Reviewed-by: Amir Goldstein 
> 
> 
>>
>> As a comparison, btrfs takes about 5 seconds to replay log, mount,
>> unmount and run fsck at the end of the test.
>> But for ext4, it already takes about 5 seconds to do the same thing
>> before triggering fsck error.
>>
>> Fsck fail for ext4:
>> _check_generic_filesystem: filesystem on /dev/mapper/test-scratch1 is 
>> inconsistent
>> *** fsck.ext4 output ***
>> fsck from util-linux 2.31.1
>> e2fsck 1.43.8 (1-Jan-2018)
>> Pass 1: Checking inodes, blocks, and sizes
>> Inode 131076 extent tree (at level 1) could be shorter.  Fix? no
>>
>> Inode 131262, i_size is 0, should be 258048.  Fix? no
>>
>> Pass 2: Checking directory structure
>> Pass 3: Checking directory connectivity
>> Pass 4: Checking reference counts
>> Pass 5: Checking group summary information
>>
>> For xfs:
>> _check_xfs_filesystem: filesystem on /dev/mapper/test-scratch1 is 
>> inconsistent (r)
>> *** xfs_repair -n output ***
>> Phase 1 - find and verify superblock...
>> Phase 2 - using internal log
>> - zero log...
>> - scan filesystem freespace and inode maps...
>> - found root inode chunk
>> Phase 3 - for each AG...
>> - scan (but don't clear) agi unlinked lists...
>> - process known inodes and perform inode discovery...
>> - agno = 0
>> - agno = 1
>> - agno = 2
>> bad symlink header ino 8409190, file block 0, disk block 1051147
>> problem with symbolic link in inode 8409190
>> would have cleared inode 8409190
>> - agno = 3
>> - process newly discovered inodes...
>> Phase 4 - check for duplicate blocks...
>> - setting up duplicate extent list...
>> - check for inodes claiming duplicate blocks...
>> - agno = 0
>> - agno = 1
>> - agno = 3
>> - 

Re: [PATCH v2 3/3] fstests: generic: Check the fs after each FUA writes

2018-03-27 Thread Amir Goldstein
On Wed, Mar 28, 2018 at 7:40 AM, Qu Wenruo  wrote:
> Basic test case which triggers fsstress with dm-log-writes, and then
> check the fs after each FUA writes.
> With needed infrastructure and special handlers for journal based fs.
>
> Signed-off-by: Qu Wenruo 
> ---
> changelog:
> v2:
>   Use proper "SUSE Linux Products GmbH" instead of "SuSE"
>   Get rid of dm-snapshot which is pretty slow if we're creating and
>   deleting snapshots repeatedly.
>   (Maybe LVM thin provision would be much better, but current replay
>solution is good so far, and no slower than dm-snapshot)
>   Add back run_check so that we can get the seed.
> ---
> Unfortunately, neither xfs nor ext4 survies this test for even single
> success, while btrfs survives.
> (Although not what I want, I'm just trying my luck
> to reproduce a serious btrfs corruption situation)
>
> Although btrfs may be the fastest fs for the test, since it has fixed
> small amount of write in mkfs and almost nothing to replay, it still
> takes about 240s~300s to finish (the same level using snapshot).
>
> It would take longer time for ext4 for its large amount of write during
> mkfs, if it can survive the test in the first space.

Hmm, how much time would the total test would take if you don't fail
it on fsck? I am asking because it may be possible to improve this with
only a single snapshot after mkfs.

Anyway, if total test run time is expected to be under 10min I wouldn't
bother with this optimization, at least not right now. IMO it is more
important to get the test out there to get the corruption bugs floating.

Thanks for working on this!
You can add
Reviewed-by: Amir Goldstein 


>
> As a comparison, btrfs takes about 5 seconds to replay log, mount,
> unmount and run fsck at the end of the test.
> But for ext4, it already takes about 5 seconds to do the same thing
> before triggering fsck error.
>
> Fsck fail for ext4:
> _check_generic_filesystem: filesystem on /dev/mapper/test-scratch1 is 
> inconsistent
> *** fsck.ext4 output ***
> fsck from util-linux 2.31.1
> e2fsck 1.43.8 (1-Jan-2018)
> Pass 1: Checking inodes, blocks, and sizes
> Inode 131076 extent tree (at level 1) could be shorter.  Fix? no
>
> Inode 131262, i_size is 0, should be 258048.  Fix? no
>
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
>
> For xfs:
> _check_xfs_filesystem: filesystem on /dev/mapper/test-scratch1 is 
> inconsistent (r)
> *** xfs_repair -n output ***
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> - scan filesystem freespace and inode maps...
> - found root inode chunk
> Phase 3 - for each AG...
> - scan (but don't clear) agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> - agno = 1
> - agno = 2
> bad symlink header ino 8409190, file block 0, disk block 1051147
> problem with symbolic link in inode 8409190
> would have cleared inode 8409190
> - agno = 3
> - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
> - setting up duplicate extent list...
> - check for inodes claiming duplicate blocks...
> - agno = 0
> - agno = 1
> - agno = 3
> - agno = 2
> entry "lb" in shortform directory 8409188 references free inode 8409190
> would have junked entry "lb" in directory inode 8409188
> bad symlink header ino 8409190, file block 0, disk block 1051147
> problem with symbolic link in inode 8409190
> would have cleared inode 8409190
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
> - traversing filesystem ...
> entry "lb" in shortform directory inode 8409188 points to free inode 8409190
> would junk entry
> - traversal finished ...
> - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> Maximum metadata LSN (1:396) is ahead of log (1:63).
> Would format log to cycle 4.
> No modify flag set, skipping filesystem flush and exiting.
>
> And special note for XFS guys, I also hit an XFS internal metadata
> warning during journal replay:
> [ 7901.423659] XFS (dm-4): Starting recovery (logdev: internal)
> [ 7901.577511] XFS (dm-4): Metadata corruption detected at 
> xfs_dinode_verify+0x467/0x570 [xfs], inode 0x805067 dinode
> [ 7901.580529] XFS (dm-4): Unmount and run xfs_repair
> [ 7901.581901] XFS (dm-4): First 128 bytes of corrupted metadata buffer:
> [ 7901.583205] b8963f41: 49 4e a1 ff 03 02 00 00 00 00 00 00 00 00 00 
> 00  IN..
> [ 7901.584659] f35a50e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00  
> [ 7901.586075] 386eea9e: 5a b2 0e 69 0a f3 43 27 5a b2 0e 69 0a f3 43 
> 27  Z..i..C'Z..i..C'
> [ 7901.587561] ac636661: 5a b2 0e 69 0d 

[PATCH v2 2/3] fstests: log-writes: Add support for METADATA flag

2018-03-27 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
Reviewed-by: Amir Goldstein 
---
changelog:
v2:
  Add Amir's reviewed-by tag.
---
 src/log-writes/log-writes.c | 3 ++-
 src/log-writes/log-writes.h | 9 +
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/log-writes/log-writes.c b/src/log-writes/log-writes.c
index a872429d3f41..5dc22c244cea 100644
--- a/src/log-writes/log-writes.c
+++ b/src/log-writes/log-writes.c
@@ -130,7 +130,8 @@ struct flags_to_str_entry {
DEFINE_LOG_FLAGS_STR_ENTRY(FLUSH),
DEFINE_LOG_FLAGS_STR_ENTRY(FUA),
DEFINE_LOG_FLAGS_STR_ENTRY(DISCARD),
-   DEFINE_LOG_FLAGS_STR_ENTRY(MARK)
+   DEFINE_LOG_FLAGS_STR_ENTRY(MARK),
+   DEFINE_LOG_FLAGS_STR_ENTRY(METADATA)
 };
 
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/src/log-writes/log-writes.h b/src/log-writes/log-writes.h
index 35ca35838aac..75fb8ac0bf79 100644
--- a/src/log-writes/log-writes.h
+++ b/src/log-writes/log-writes.h
@@ -20,10 +20,11 @@ typedef __u32 u32;
 /*
  * Constants copied from kernel file drivers/md/dm-log-writes.c
  */
-#define LOG_FLUSH_FLAG (1 << 0)
-#define LOG_FUA_FLAG (1 << 1)
-#define LOG_DISCARD_FLAG (1 << 2)
-#define LOG_MARK_FLAG (1 << 3)
+#define LOG_FLUSH_FLAG (1 << 0)
+#define LOG_FUA_FLAG   (1 << 1)
+#define LOG_DISCARD_FLAG   (1 << 2)
+#define LOG_MARK_FLAG  (1 << 3)
+#define LOG_METADATA_FLAG  (1 << 4)
 
 #define WRITE_LOG_VERSION 1
 #define WRITE_LOG_MAGIC 0x6a736677736872
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/3] fstests: log-writes: Add support to output human readable flags

2018-03-27 Thread Qu Wenruo
Also change the flag numeric output to hex.

Signed-off-by: Qu Wenruo 
Reviewed-by: Amir Goldstein 
---
changelog:
v2:
  Add Amir's reviewed-by tag.
---
 src/log-writes/log-writes.c | 70 -
 1 file changed, 63 insertions(+), 7 deletions(-)

diff --git a/src/log-writes/log-writes.c b/src/log-writes/log-writes.c
index 09391574c4d2..a872429d3f41 100644
--- a/src/log-writes/log-writes.c
+++ b/src/log-writes/log-writes.c
@@ -120,6 +120,58 @@ int log_discard(struct log *log, struct log_write_entry 
*entry)
return 0;
 }
 
+#define DEFINE_LOG_FLAGS_STR_ENTRY(x)  \
+   {LOG_##x##_FLAG, #x}
+
+struct flags_to_str_entry {
+   u64 flags;
+   const char *str;
+} log_flags_table[] = {
+   DEFINE_LOG_FLAGS_STR_ENTRY(FLUSH),
+   DEFINE_LOG_FLAGS_STR_ENTRY(FUA),
+   DEFINE_LOG_FLAGS_STR_ENTRY(DISCARD),
+   DEFINE_LOG_FLAGS_STR_ENTRY(MARK)
+};
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#define LOG_FLAGS_BUF_SIZE 128
+/*
+ * Convert numeric flags to human readable flags.
+ * @flags: numeric flags
+ * @buf:   output buffer for human readable string.
+ * must have enough space (LOG_FLAGS_BUF_SIZE) to contain all
+ * the string
+ */
+static void entry_flags_to_str(u64 flags, char *buf)
+{
+   int empty = 1;
+   int left_len;
+   int i;
+
+   buf[0] = '\0';
+   for (i = 0; i < ARRAY_SIZE(log_flags_table); i++) {
+   if (flags & log_flags_table[i].flags) {
+   if (!empty)
+   strncat(buf, "|", LOG_FLAGS_BUF_SIZE);
+   empty = 0;
+   strncat(buf, log_flags_table[i].str, 
LOG_FLAGS_BUF_SIZE);
+   flags &= ~log_flags_table[i].flags;
+   }
+   }
+   if (flags) {
+   if (!empty)
+   strncat(buf, "|", LOG_FLAGS_BUF_SIZE);
+   empty = 0;
+   left_len = LOG_FLAGS_BUF_SIZE - strnlen(buf,
+   LOG_FLAGS_BUF_SIZE);
+   if (left_len > 0)
+   snprintf(buf + strnlen(buf, LOG_FLAGS_BUF_SIZE),
+left_len, "UNKNOWN.0x%llx", flags);
+   }
+   if (empty)
+   strncpy(buf, "NONE", LOG_FLAGS_BUF_SIZE);
+}
+
 /*
  * @log: the log we are replaying.
  * @entry: entry to be replayed.
@@ -179,6 +231,7 @@ int log_replay_next_entry(struct log *log, struct 
log_write_entry *entry,
size_t read_size = read_data ? log->sectorsize :
sizeof(struct log_write_entry);
char *buf;
+   char flags_buf[LOG_FLAGS_BUF_SIZE];
ssize_t ret;
off_t offset;
int skip = 0;
@@ -210,19 +263,20 @@ int log_replay_next_entry(struct log *log, struct 
log_write_entry *entry,
log->cur_pos += read_size;
}
 
+   flags = le64_to_cpu(entry->flags);
+   entry_flags_to_str(flags, flags_buf);
skip = log_should_skip(log, entry);
if (log_writes_verbose > 1 || (log_writes_verbose && !skip)) {
-   printf("%s %d@%llu: sector %llu, size %llu, flags %llu\n",
+   printf("%s %d@%llu: sector %llu, size %llu, flags 0x%llx(%s)\n",
   skip ? "skipping" : "replaying",
   (int)log->cur_entry - 1, log->cur_pos / log->sectorsize,
   (unsigned long long)le64_to_cpu(entry->sector),
   (unsigned long long)size,
-  (unsigned long long)le64_to_cpu(entry->flags));
+  (unsigned long long)flags, flags_buf);
}
if (!size)
return 0;
 
-   flags = le64_to_cpu(entry->flags);
if (flags & LOG_DISCARD_FLAG)
return log_discard(log, entry);
 
@@ -301,7 +355,7 @@ int log_seek_entry(struct log *log, u64 entry_num)
return -1;
}
if (log_writes_verbose > 1)
-   printf("seek entry %d@%llu: %llu, size %llu, flags 
%llu\n",
+   printf("seek entry %d@%llu: %llu, size %llu, flags 
0x%llx\n",
   (int)i, log->cur_pos / log->sectorsize,
   (unsigned long long)le64_to_cpu(entry.sector),
   (unsigned long 
long)le64_to_cpu(entry.nr_sectors),
@@ -339,6 +393,7 @@ int log_seek_next_entry(struct log *log, struct 
log_write_entry *entry,
size_t read_size = read_data ? log->sectorsize :
sizeof(struct log_write_entry);
u64 flags;
+   char flags_buf[LOG_FLAGS_BUF_SIZE];
ssize_t ret;
 
if (log->cur_entry >= log->nr_entries)
@@ -366,14 +421,15 @@ int log_seek_next_entry(struct log *log, struct 
log_write_entry *entry,
} else {
log->cur_pos += read_size;
}
+   flags = 

[PATCH v2 3/3] fstests: generic: Check the fs after each FUA writes

2018-03-27 Thread Qu Wenruo
Basic test case which triggers fsstress with dm-log-writes, and then
check the fs after each FUA writes.
With needed infrastructure and special handlers for journal based fs.

Signed-off-by: Qu Wenruo 
---
changelog:
v2:
  Use proper "SUSE Linux Products GmbH" instead of "SuSE"
  Get rid of dm-snapshot which is pretty slow if we're creating and
  deleting snapshots repeatedly.
  (Maybe LVM thin provision would be much better, but current replay
   solution is good so far, and no slower than dm-snapshot)
  Add back run_check so that we can get the seed.
---
Unfortunately, neither xfs nor ext4 survies this test for even single
success, while btrfs survives.
(Although not what I want, I'm just trying my luck
to reproduce a serious btrfs corruption situation)

Although btrfs may be the fastest fs for the test, since it has fixed
small amount of write in mkfs and almost nothing to replay, it still
takes about 240s~300s to finish (the same level using snapshot).

It would take longer time for ext4 for its large amount of write during
mkfs, if it can survive the test in the first space.

As a comparison, btrfs takes about 5 seconds to replay log, mount,
unmount and run fsck at the end of the test.
But for ext4, it already takes about 5 seconds to do the same thing
before triggering fsck error.

Fsck fail for ext4:
_check_generic_filesystem: filesystem on /dev/mapper/test-scratch1 is 
inconsistent
*** fsck.ext4 output ***
fsck from util-linux 2.31.1
e2fsck 1.43.8 (1-Jan-2018)
Pass 1: Checking inodes, blocks, and sizes
Inode 131076 extent tree (at level 1) could be shorter.  Fix? no

Inode 131262, i_size is 0, should be 258048.  Fix? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

For xfs:
_check_xfs_filesystem: filesystem on /dev/mapper/test-scratch1 is inconsistent 
(r)
*** xfs_repair -n output ***
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
bad symlink header ino 8409190, file block 0, disk block 1051147
problem with symbolic link in inode 8409190
would have cleared inode 8409190
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 3
- agno = 2
entry "lb" in shortform directory 8409188 references free inode 8409190
would have junked entry "lb" in directory inode 8409188
bad symlink header ino 8409190, file block 0, disk block 1051147
problem with symbolic link in inode 8409190
would have cleared inode 8409190
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
entry "lb" in shortform directory inode 8409188 points to free inode 8409190
would junk entry
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (1:396) is ahead of log (1:63).
Would format log to cycle 4.
No modify flag set, skipping filesystem flush and exiting.

And special note for XFS guys, I also hit an XFS internal metadata
warning during journal replay:
[ 7901.423659] XFS (dm-4): Starting recovery (logdev: internal)
[ 7901.577511] XFS (dm-4): Metadata corruption detected at 
xfs_dinode_verify+0x467/0x570 [xfs], inode 0x805067 dinode
[ 7901.580529] XFS (dm-4): Unmount and run xfs_repair
[ 7901.581901] XFS (dm-4): First 128 bytes of corrupted metadata buffer:
[ 7901.583205] b8963f41: 49 4e a1 ff 03 02 00 00 00 00 00 00 00 00 00 
00  IN..
[ 7901.584659] f35a50e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00  
[ 7901.586075] 386eea9e: 5a b2 0e 69 0a f3 43 27 5a b2 0e 69 0a f3 43 
27  Z..i..C'Z..i..C'
[ 7901.587561] ac636661: 5a b2 0e 69 0d 92 bc 00 00 00 00 00 00 00 00 
00  Z..i
[ 7901.588969] d75f9093: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00  
[ 7901.590475] d2af5688: 00 00 00 02 00 00 00 00 00 00 00 00 84 d7 6a 
e9  ..j.
[ 7901.591907] e8dd8211: ff ff ff ff 34 93 a9 3a 00 00 00 00 00 00 00 
04  4..:
[ 7901.593319] b7610e4e: 00 00 00 01 00 00 00 45 00 00 00 00 00 00 00 
00  ...E
---

 common/dmlogwrites|  56 +++
 tests/generic/481 | 104 ++
 tests/generic/481.out |   2 +
 tests/generic/group   |   1 +
 4 files changed, 163 insertions(+)
 create mode 100755 tests/generic/481
 create mode 100644 

Re: [PATCH] fstests: test btrfs fsync after hole punching with no-holes mode

2018-03-27 Thread Eryu Guan
On Mon, Mar 26, 2018 at 11:59:21PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> Test that when we have the no-holes mode enabled and a specific metadata
> layout, if we punch a hole and fsync the file, at replay time the whole
> hole was preserved.
> 
> This issue is fixed by the following btrfs patch for the linux kernel:
> 
>   "Btrfs: fix fsync after hole punching when using no-holes feature"

I'd expect a test failure with 4.16-rc6 kernel, as the mentioned fix
above is not there. But test always passes for me. Did I miss anything?
btrfs-progs version is btrfs-progs-4.11.1-3.fc27.

> 
> Signed-off-by: Filipe Manana 
> ---
>  tests/btrfs/159 | 100 
> 
>  tests/btrfs/159.out |   5 +++
>  tests/btrfs/group   |   1 +
>  3 files changed, 106 insertions(+)
>  create mode 100755 tests/btrfs/159
>  create mode 100644 tests/btrfs/159.out
> 
> diff --git a/tests/btrfs/159 b/tests/btrfs/159
> new file mode 100755
> index ..6083975a
> --- /dev/null
> +++ b/tests/btrfs/159
> @@ -0,0 +1,100 @@
> +#! /bin/bash
> +# FSQA Test No. 159
> +#
> +# Test that when we have the no-holes mode enabled and a specific metadata
> +# layout, if we punch a hole and fsync the file, at replay time the whole
> +# hole was preserved.
> +#
> +#---
> +#
> +# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved.
> +# Author: Filipe Manana 
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + _cleanup_flakey
> + cd /
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/dmflakey
> +
> +# real QA test starts here
> +_supported_fs generic
 ^^^ btrfs
> +_supported_os Linux
> +_require_scratch
> +_require_dm_target flakey
> +_require_xfs_io_command "fpunch"
> +
> +rm -f $seqres.full
> +
> +# We create the filesystem with a node size of 64Kb because we need to 
> create a
> +# specific metadata layout in order to trigger the bug we are testing. At the
> +# moment the node size can not be smaller then the system's page size, so 
> given
> +# that the largest possible page size is 64Kb and by default the node size is
> +# set to the system's page size value, we explictly create a filesystem with 
> a
> +# 64Kb node size.
> +_scratch_mkfs -O no-holes -n $((64 * 1024)) >>$seqres.full 2>&1
> +_require_metadata_journaling $SCRATCH_DEV
> +_init_flakey
> +_mount_flakey
> +
> +# Create our test file with 831 extents of 256Kb each. Before each extent, 
> there

I think it's 832 extents in total? As the index starts with 0.

Otherwise looks good to me.

Thanks,
Eryu

> +# is a 256Kb hole (except for the first extent, which starts at offset 0). 
> This
> +# creates two leafs in the filesystem tree, with the extent at offset 
> 216530944
> +# being the last item in the first leaf and the extent at offset 217055232 
> being
> +# the first item in the second leaf.
> +for ((i = 0; i <= 831; i++)); do
> + offset=$((i * 2 * 256 * 1024))
> + $XFS_IO_PROG -f -c "pwrite -S 0xab -b 256K $offset 256K" \
> + $SCRATCH_MNT/foobar >/dev/null
> +done
> +
> +# Now persist everything done so far.
> +sync
> +
> +# Now punch a hole that covers part of the extent at offset 216530944.
> +$XFS_IO_PROG -c "fpunch $((216530944 + 128 * 1024 - 4000)) 256K" \
> +  -c "fsync" \
> +  $SCRATCH_MNT/foobar
> +
> +echo "File digest before power failure:"
> +md5sum $SCRATCH_MNT/foobar | _filter_scratch
> +
> +# Simulate a power failure and mount the filesystem to check that replaying 
> the
> +# fsync log/journal succeeds and our test file has the expected content.
> +_flakey_drop_and_remount
> +
> +echo "File digest after power failure and log replay:"
> +md5sum $SCRATCH_MNT/foobar | _filter_scratch
> +
> +_unmount_flakey
> +_cleanup_flakey
> +
> +status=0
> +exit
> diff --git a/tests/btrfs/159.out b/tests/btrfs/159.out
> new file mode 100644

Re: [PATCH v2 2/3] btrfs: Allow rmdir(2) to delete a subvolume

2018-03-27 Thread Misono Tomohiro
On 2018/03/27 21:39, Nikolay Borisov wrote:
> 
> 
> On 26.03.2018 11:30, Misono Tomohiro wrote:
>> This patch changes the behavior of rmdir(2) to allow it to delete
>> an empty subvolume by default, unless it is not a default subvolume
>> and send is not in progress.
>>
>> New function btrfs_delete_subvolume() is almost equal to the second half
>> of btrfs_ioctl_snap_destroy(). This function requires inode_lock for both
>> @dir and inode of @dentry. For rmdir(2) it is already acquired in vfs
>> layer before calling btrfs_rmdir().
>>
>> Note that while a non-privileged user cannot delete a read-only subvolume
>> by "btrfs subvolume delete" when user_subvol_rm_allowd mount option is
>> enabled, rmdir(2) can delete an empty read-only subvolume.
>> (However, rm -r cannot use for read-only subvolume containing files.)
>>
>> Signed-off-by: Tomohiro Misono 
>> ---
>>  fs/btrfs/inode.c | 143 
>> ++-
>>  1 file changed, 142 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index db66fa4fede6..b778776eee8e 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -4387,6 +4387,147 @@ noinline int may_destroy_subvol(struct btrfs_root 
>> *root)
>>  return ret;
>>  }
>>  
>> +static int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry)
>> +{
>> +struct btrfs_fs_info *fs_info = btrfs_sb(dentry->d_sb);
>> +struct btrfs_root *root = BTRFS_I(dir)->root;
>> +struct inode *inode = d_inode(dentry);
>> +struct btrfs_root *dest = BTRFS_I(inode)->root;
>> +struct btrfs_trans_handle *trans;
>> +struct btrfs_block_rsv block_rsv;
>> +u64 root_flags;
>> +u64 qgroup_reserved;
>> +int ret;
>> +int err;
>> +
>> +/*
>> + * Don't allow to delete a subvolume with send in progress. This is
>> + * inside the i_mutex so the error handling that has to drop the bit
>> + * again is not run concurrently.
>> + */
>> +spin_lock(>root_item_lock);
>> +root_flags = btrfs_root_flags(>root_item);
>> +if (dest->send_in_progress == 0) {
>> +btrfs_set_root_flags(>root_item,
>> +root_flags | BTRFS_ROOT_SUBVOL_DEAD);
>> +spin_unlock(>root_item_lock);
>> +} else {
>> +spin_unlock(>root_item_lock);
>> +btrfs_warn(fs_info,
>> +   "Attempt to delete subvolume %llu during send",
>> +   dest->root_key.objectid);
>> +err = -EPERM;
>> +return err;
>> +}
>> +
>> +down_write(_info->subvol_sem);
>> +
>> +err = may_destroy_subvol(dest);
>> +if (err)
>> +goto out_up_write;
>> +
>> +btrfs_init_block_rsv(_rsv, BTRFS_BLOCK_RSV_TEMP);
>> +/*
>> + * One for dir inode, two for dir entries, two for root
>> + * ref/backref.
>> + */
>> +err = btrfs_subvolume_reserve_metadata(root, _rsv,
>> +   5, _reserved, true);
>> +if (err)
>> +goto out_up_write;
>> +
>> +trans = btrfs_start_transaction(root, 0);
>> +if (IS_ERR(trans)) {
>> +err = PTR_ERR(trans);
>> +goto out_release;
>> +}
>> +trans->block_rsv = _rsv;
>> +trans->bytes_reserved = block_rsv.size;
>> +
>> +btrfs_record_snapshot_destroy(trans, BTRFS_I(dir));
>> +
>> +ret = btrfs_unlink_subvol(trans, root, dir,
>> +dest->root_key.objectid,
>> +dentry->d_name.name,
>> +dentry->d_name.len);
>> +if (ret) {
>> +err = ret;
>> +btrfs_abort_transaction(trans, ret);
>> +goto out_end_trans;
>> +}
>> +
>> +btrfs_record_root_in_trans(trans, dest);
>> +
>> +memset(>root_item.drop_progress, 0,
>> +sizeof(dest->root_item.drop_progress));
>> +dest->root_item.drop_level = 0;
>> +btrfs_set_root_refs(>root_item, 0);
>> +
>> +if (!test_and_set_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, >state)) {
>> +ret = btrfs_insert_orphan_item(trans,
>> +fs_info->tree_root,
>> +dest->root_key.objectid);
>> +if (ret) {
>> +btrfs_abort_transaction(trans, ret);
>> +err = ret;
>> +goto out_end_trans;
>> +}
>> +}
>> +
>> +ret = btrfs_uuid_tree_rem(trans, fs_info, dest->root_item.uuid,
>> +  BTRFS_UUID_KEY_SUBVOL,
>> +  dest->root_key.objectid);
>> +if (ret && ret != -ENOENT) {
>> +btrfs_abort_transaction(trans, ret);
>> +err = ret;
>> +goto out_end_trans;
>> +}
>> +if (!btrfs_is_empty_uuid(dest->root_item.received_uuid)) {
>> +ret = btrfs_uuid_tree_rem(trans, fs_info,
>> +  

Re: [PATCH v2 1/3] btrfs: Move may_destroy_subvol() from ioctl.c to inode.c

2018-03-27 Thread Misono Tomohiro
On 2018/03/26 17:51, Nikolay Borisov wrote:
> 
> 
> On 26.03.2018 11:28, Misono Tomohiro wrote:
>> This is a preparation work to allow rmdir(2) to delete a subvolume.
>>
>> Signed-off-by: Tomohiro Misono 
>> ---
>>  fs/btrfs/ctree.h |  1 +
>>  fs/btrfs/inode.c | 54 ++
>>  fs/btrfs/ioctl.c | 54 --
>>  3 files changed, 55 insertions(+), 54 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index da308774b8a4..2dbead106aab 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -3166,6 +3166,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle 
>> *trans,
>>  struct btrfs_root *root,
>>  struct inode *dir, u64 objectid,
>>  const char *name, int name_len);
>> +noinline int may_destroy_subvol(struct btrfs_root *root);
> 
> Why the explicit noinline? If it's for "documentation"/appearing in
> stack traces purposes then you should use noinline_for_stack. But I
> don't think it's really needed.

Just to keep the original declaration.
Also, the third patch returns this function into static again (in ioctl.c).

> 
>>  int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>  int front);
>>  int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index f53470112670..db66fa4fede6 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -4333,6 +4333,60 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle 
>> *trans,
>>  return ret;
>>  }
>>  
>> +/*
>> + * helper to check if the subvolume references other subvolumes
>> + */
>> +noinline int may_destroy_subvol(struct btrfs_root *root)
>> +{
>> +struct btrfs_fs_info *fs_info = root->fs_info;
>> +struct btrfs_path *path;
>> +struct btrfs_dir_item *di;
>> +struct btrfs_key key;
>> +u64 dir_id;
>> +int ret;
>> +
>> +path = btrfs_alloc_path();
>> +if (!path)
>> +return -ENOMEM;
>> +
>> +/* Make sure this root isn't set as the default subvol */
>> +dir_id = btrfs_super_root_dir(fs_info->super_copy);
>> +di = btrfs_lookup_dir_item(NULL, fs_info->tree_root, path,
>> +   dir_id, "default", 7, 0);
>> +if (di && !IS_ERR(di)) {
>> +btrfs_dir_item_key_to_cpu(path->nodes[0], di, );
>> +if (key.objectid == root->root_key.objectid) {
>> +ret = -EPERM;
>> +btrfs_err(fs_info,
>> +  "deleting default subvolume %llu is not 
>> allowed",
>> +  key.objectid);
>> +goto out;
>> +}
>> +btrfs_release_path(path);
>> +}
>> +
>> +key.objectid = root->root_key.objectid;
>> +key.type = BTRFS_ROOT_REF_KEY;
>> +key.offset = (u64)-1;
>> +
>> +ret = btrfs_search_slot(NULL, fs_info->tree_root, , path, 0, 0);
>> +if (ret < 0)
>> +goto out;
>> +BUG_ON(ret == 0);
>> +
>> +ret = 0;
>> +if (path->slots[0] > 0) {
>> +path->slots[0]--;
>> +btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]);
>> +if (key.objectid == root->root_key.objectid &&
>> +key.type == BTRFS_ROOT_REF_KEY)
>> +ret = -ENOTEMPTY;
>> +}
>> +out:
>> +btrfs_free_path(path);
>> +return ret;
>> +}
>> +
>>  static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
>>  {
>>  struct inode *inode = d_inode(dentry);
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 111ee282b777..11af9eb449ef 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -1843,60 +1843,6 @@ static noinline int 
>> btrfs_ioctl_subvol_setflags(struct file *file,
>>  return ret;
>>  }
>>  
>> -/*
>> - * helper to check if the subvolume references other subvolumes
>> - */
>> -static noinline int may_destroy_subvol(struct btrfs_root *root)
>> -{
>> -struct btrfs_fs_info *fs_info = root->fs_info;
>> -struct btrfs_path *path;
>> -struct btrfs_dir_item *di;
>> -struct btrfs_key key;
>> -u64 dir_id;
>> -int ret;
>> -
>> -path = btrfs_alloc_path();
>> -if (!path)
>> -return -ENOMEM;
>> -
>> -/* Make sure this root isn't set as the default subvol */
>> -dir_id = btrfs_super_root_dir(fs_info->super_copy);
>> -di = btrfs_lookup_dir_item(NULL, fs_info->tree_root, path,
>> -   dir_id, "default", 7, 0);
>> -if (di && !IS_ERR(di)) {
>> -btrfs_dir_item_key_to_cpu(path->nodes[0], di, );
>> -if (key.objectid == root->root_key.objectid) {
>> -ret = -EPERM;
>> -btrfs_err(fs_info,
>> -  "deleting default subvolume %llu is not 
>> allowed",
>> -  

error report: misc-test 006 sometimes fails in current devel branch

2018-03-27 Thread Misono Tomohiro
current devel branch of btrfs-progs (github) occasionally fails at misc-test 
006:
(kernel is 4.16.0-rc7)

==
$ sudo make test-misc TEST=006\*
[LD] fssum
[TEST]   misc-tests.sh
[TEST/misc]   006-image-on-missing-device
/usr/data/src/btrfs-progs/tests//common: line 177: 10819 Aborted
 (core dumped) $INSTRUMENT "$@" >> "$RESULTS" 2>&1
mayfail: returned code 134 (SIGABRT), not ignored
test failed for case 006-image-on-missing-device
make: *** [Makefile:329: test-misc] Error 1
==

git bisect points the commit fa754dddc204 "btrfs-progs: mkfs: precreate the 
uuid tree"
and after reverting this, it seems that the test always runs correctly.

Full log of misc-tests-resut.txt when the test fails is below:
--
=== Entering 
/usr/data/src/btrfs-progs/tests//misc-tests/006-image-on-missing-device
### losetup --find --show img1
/dev/loop0
### losetup --find --show img2
/dev/loop1
### /usr/data/src/btrfs-progs/mkfs.btrfs -f -d raid1 -m raid1 
/dev/loop0 /dev/loop1
btrfs-progs v4.15.1
See http://btrfs.wiki.kernel.org for more information.

Performing full device TRIM /dev/loop0 (2.00GiB) ...
Performing full device TRIM /dev/loop1 (2.00GiB) ...
Label:  (null)
UUID:   220272d0-2454-429c-a198-83eaf954aceb
Node size:  16384
Sector size:4096
Filesystem size:4.00GiB
Block group profiles:
  Data: RAID1   204.75MiB
  Metadata: RAID1   204.75MiB
  System:   RAID1 8.00MiB
SSD detected:   no
Incompat features:  extref, skinny-metadata
Number of devices:  2
Devices:
   IDSIZE  PATH
1 2.00GiB  /dev/loop0
2 2.00GiB  /dev/loop1

### mount /dev/loop0 /usr/data/src/btrfs-progs/tests//mnt
### dd if=/dev/zero of=/usr/data/src/btrfs-progs/tests//mnt/a bs=1M 
count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.00643606 s, 1.6 GB/s
### dd if=/dev/zero of=/usr/data/src/btrfs-progs/tests//mnt/b bs=4k 
count=1000 conv=sync
1000+0 records in
1000+0 records out
4096000 bytes (4.1 MB, 3.9 MiB) copied, 0.0056388 s, 726 MB/s
### umount /usr/data/src/btrfs-progs/tests//mnt
### /usr/data/src/btrfs-progs/btrfs check /dev/loop0
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/loop0
UUID: 220272d0-2454-429c-a198-83eaf954aceb
found 14843904 bytes used, no error found
total csum bytes: 14240
total tree bytes: 131072
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 108171
file data blocks allocated: 14712832
 referenced 14712832
### /usr/data/src/btrfs-progs/btrfs-image /dev/loop0 
/tmp/test-img.dump
### /usr/data/src/btrfs-progs/btrfs filesystem show /dev/loop0
Label: none  uuid: 220272d0-2454-429c-a198-83eaf954aceb
Total devices 2 FS bytes used 14.16MiB
devid1 size 2.00GiB used 417.50MiB path /dev/loop0
devid2 size 2.00GiB used 417.50MiB path /dev/loop1

### wipefs -a /dev/loop1
/dev/loop1: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 
5f 4d
### losetup -d /dev/loop1
### /usr/data/src/btrfs-progs/btrfs filesystem show /dev/loop0
warning, device 2 is missing
Label: none  uuid: 220272d0-2454-429c-a198-83eaf954aceb
Total devices 2 FS bytes used 14.16MiB
devid1 size 2.00GiB used 417.50MiB path /dev/loop0
*** Some devices missing

### /usr/data/src/btrfs-progs/btrfs check /dev/loop0
checking extents
checking free space cache
failed to load free space cache for block group 30408704
failed to load free space cache for block group 245104640
checking fs roots
checking csums
checking root refs
warning, device 2 is missing
Checking filesystem on /dev/loop0
UUID: 220272d0-2454-429c-a198-83eaf954aceb
found 14843904 bytes used, no error found
total csum bytes: 14240
total tree bytes: 131072
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 108171
file data blocks allocated: 14712832
 referenced 14712832
### /usr/data/src/btrfs-progs/btrfs-image /dev/loop0 
/tmp/test-img.dump
Couldn't map the block 459800576
free(): invalid pointer
failed (ignored, ret=134): /usr/data/src/btrfs-progs/btrfs-image /dev/loop0 
/tmp/test-img.dump
mayfail: returned code 134 (SIGABRT), not ignored
test failed for case 006-image-on-missing-device
--

So, the last btrfs-image failed.
Also, it seems that even when the test succeeds, the last btrfs-image actually 
fails:
--
### /usr/data/src/btrfs-progs/btrfs-image /dev/loop0 
/tmp/test-img.dump
Couldn't map the block 459800576
ERROR: failed to flush pending data: -5
ERROR: create failed: Bad address
warning, device 2 is missing
failed (ignored, ret=1): /usr/data/src/btrfs-progs/btrfs-image /dev/loop0 
/tmp/test-img.dump

[PATCH v2 5/8] btrfs: check if the fsid in the primary sb and copy sb are same

2018-03-27 Thread Anand Jain
During the btrfs dev scan make sure that other copies of superblock
contain the same fsid as the primary SB. So that we bring to the
user notice if the superblock has been overwritten.

 mkfs.btrfs -fq /dev/sdc
 mkfs.btrfs -fq /dev/sdb
 dd if=/dev/sdb of=/dev/sdc count=4K skip=64K seek=64K obs=1 ibs=1
 mount /dev/sdc /btrfs

Caveat: Pls note that older btrfs-progs do not wipe the non-overwriting
stale superblock like copy2 if a smaller mkfs.btrfs -b   is created.
Thus this patch in the kernel will report error. The workaround is to wipe
the superblock manually, like
  dd if=/dev/zero of= seek=274877906944 ibs=1 obs=1 count4K
OR apply the btrfs-progs patch
  btrfs-progs: wipe copies of the stale superblock beyond -b size
which shall find and wipe the non overwriting superblock during mkfs.

Signed-off-by: Anand Jain 
---
v1->v2:
  Do an explicit read for primary superblock. Drop kzalloc().
  Fix split pr_err().

 fs/btrfs/volumes.c | 47 ---
 1 file changed, 36 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e63723f23227..b099823f60d1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1198,40 +1198,65 @@ static int btrfs_read_disk_super(struct block_device 
*bdev, u64 bytenr,
 int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
  struct btrfs_fs_devices **fs_devices_ret)
 {
+   struct btrfs_super_block *disk_super_primary;
struct btrfs_super_block *disk_super;
struct btrfs_device *device;
struct block_device *bdev;
+   struct page *page_primary;
struct page *page;
int ret = 0;
u64 bytenr;
+   int i;
 
-   /*
-* we would like to check all the supers, but that would make
-* a btrfs mount succeed after a mkfs from a different FS.
-* So, we need to add a special mount option to scan for
-* later supers, using BTRFS_SUPER_MIRROR_MAX instead
-*/
-   bytenr = btrfs_sb_offset(0);
flags |= FMODE_EXCL;
 
bdev = blkdev_get_by_path(path, flags, holder);
if (IS_ERR(bdev))
return PTR_ERR(bdev);
 
-   ret = btrfs_read_disk_super(bdev, bytenr, , _super);
+   /*
+* We would like to check all the supers and use one good copy,
+* but that would make a btrfs mount succeed after a mkfs from
+* a different FS.
+* So, we need to add a special mount option to scan for
+* later supers, using BTRFS_SUPER_MIRROR_MAX instead.
+* So, just validate if all copies of the superblocks are ok
+* and have the same fsid.
+*/
+   bytenr = btrfs_sb_offset(0);
+   ret = btrfs_read_disk_super(bdev, bytenr, _primary,
+   _super_primary);
if (ret < 0)
goto error_bdev_put;
 
+   for (i = 1; i < BTRFS_SUPER_MIRROR_MAX; i++) {
+   bytenr = btrfs_sb_offset(i);
+   ret = btrfs_read_disk_super(bdev, bytenr, , _super);
+   if (ret < 0) {
+   ret = 0;
+   continue;
+   }
+
+   if (memcmp(disk_super_primary->fsid, disk_super->fsid,
+  BTRFS_FSID_SIZE)) {
+   pr_err("BTRFS (device %pg): superblock fsid mismatch, 
primary %pU copy%d %pU",
+   bdev, disk_super_primary->fsid, i,
+   disk_super->fsid);
+   ret = -EINVAL;
+   btrfs_release_disk_super(page);
+   goto error_bdev_put;
+   }
+   btrfs_release_disk_super(page);
+   }
+
mutex_lock(_mutex);
-   device = device_list_add(path, disk_super);
+   device = device_list_add(path, disk_super_primary);
if (IS_ERR(device))
ret = PTR_ERR(device);
else
*fs_devices_ret = device->fs_devices;
mutex_unlock(_mutex);
 
-   btrfs_release_disk_super(page);
-
 error_bdev_put:
blkdev_put(bdev, flags);
 
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 8/8] btrfs: drop the redundant invalidate_bdev()

2018-03-27 Thread Anand Jain
During the mount context btrfs_open_devices() calls invalidate_bdev()
for all the devices. So drop the invalidate_bdev() in open_ctree()
which is only for the btrfs_fs_devices::latest_bdev.

The call trace is as shown below.

btrfs_mount_root()
 |
 |_btrfs_parse_early_options (-o device only)
 | |_btrfs_scan_one_device
 |   |_btrfs_read_disk_super()
 | |_read_cache_page_gfp()
 |
 |_btrfs_scan_one_device(mount-arg-dev only)
 | |_btrfs_read_disk_super()
 |   |_read_cache_page_gfp()
 |
 |
 |_btrfs_open_devices(fsid:all)
 |  |_btrfs_open_one_device()
 ||_btrfs_get_bdev_and_sb()  <--- invalidate_bdev(fsid:all)
 |  |_btrfs_read_dev_super()
 ||_btrfs_read_dev_one_super()
 |  |___bread()
 |
 |_btrfs_fill_super()
   |_btrfs_open_ctree()   <-- invalidate_bdev(latest_bdev) <-- redundant
 |_btrfs_read_dev_super(latest_bdev only)
   |_btrfs_read_dev_one_super(latest_bdev only)
 |___bread(latest_bdev)

Signed-off-by: Anand Jain 
---
 fs/btrfs/disk-io.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 110465bec775..628c2b1d1349 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2553,8 +2553,6 @@ int open_ctree(struct super_block *sb,
 
__setup_root(tree_root, fs_info, BTRFS_ROOT_TREE_OBJECTID);
 
-   invalidate_bdev(fs_devices->latest_bdev);
-
/*
 * Read super block and check the signature bytes only
 */
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/8] btrfs: cleanup btrfs_read_disk_super() to return std error

2018-03-27 Thread Anand Jain
The only caller btrfs_scan_one_device() sets -EINVAL for error from
btrfs_read_disk_super(), so this patch returns -EINVAL from the latter
function.

A preparatory patch to add csum check in btrfs_read_disk_super().

Signed-off-by: Anand Jain 
---
v1->v2:
 Check btrfs_read_disk_super() to be more explicit ret < 0

 fs/btrfs/volumes.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 87d183accdb2..e63723f23227 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1154,23 +1154,23 @@ static int btrfs_read_disk_super(struct block_device 
*bdev, u64 bytenr,
 
/* make sure our super fits in the device */
if (bytenr + PAGE_SIZE >= i_size_read(bdev->bd_inode))
-   return 1;
+   return -EINVAL;
 
/* make sure our super fits in the page */
if (sizeof(**disk_super) > PAGE_SIZE)
-   return 1;
+   return -EINVAL;
 
/* make sure our super doesn't straddle pages on disk */
index = bytenr >> PAGE_SHIFT;
if ((bytenr + sizeof(**disk_super) - 1) >> PAGE_SHIFT != index)
-   return 1;
+   return -EINVAL;
 
/* pull in the page with our super */
*page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
   index, GFP_KERNEL);
 
if (IS_ERR_OR_NULL(*page))
-   return 1;
+   return -EINVAL;
 
p = kmap(*page);
 
@@ -1180,7 +1180,7 @@ static int btrfs_read_disk_super(struct block_device 
*bdev, u64 bytenr,
if (btrfs_super_bytenr(*disk_super) != bytenr ||
btrfs_super_magic(*disk_super) != BTRFS_MAGIC) {
btrfs_release_disk_super(*page);
-   return 1;
+   return -EINVAL;
}
 
if ((*disk_super)->label[0] &&
@@ -1218,10 +1218,9 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
if (IS_ERR(bdev))
return PTR_ERR(bdev);
 
-   if (btrfs_read_disk_super(bdev, bytenr, , _super)) {
-   ret = -EINVAL;
+   ret = btrfs_read_disk_super(bdev, bytenr, , _super);
+   if (ret < 0)
goto error_bdev_put;
-   }
 
mutex_lock(_mutex);
device = device_list_add(path, disk_super);
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: wipe copies of the stale superblock beyond -b size

2018-03-27 Thread Anand Jain
During the mkfs.btrfs -b  btrfs_prepare_device() zeros all
the superblock bytenr locations only if the bytenr is below the
blockcount. The problem with this is that if the BTRFS is recreated
with a smaller size then we will leave the stale superblock in the disk
which shall confuse the recovery. As shown in the test case below.

 mkfs.btrfs -qf /dev/mapper/vg-lv
 mkfs.btrfs -qf -b1G /dev/mapper/vg-lv
 btrfs in dump-super -a /dev/mapper/vg-lv | grep '.fsid|superblock:'

 superblock: bytenr=65536, device=/dev/mapper/vg-lv
 dev_item.fsid  ebc67d01-7fc5-43f0-90b4-d1925002551e [match]
 superblock: bytenr=67108864, device=/dev/mapper/vg-lv
 dev_item.fsid  ebc67d01-7fc5-43f0-90b4-d1925002551e [match]
 superblock: bytenr=274877906944, device=/dev/mapper/vg-lv
 dev_item.fsid  b97a9206-593b-4933-a424-c6a6ee23fe7c [match]

So if we find a valid superblock zero it even if it's beyond the
blockcount.

Signed-off-by: Anand Jain 
---
 utils.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/utils.c b/utils.c
index e9cb3a82fda6..6a9408b06e73 100644
--- a/utils.c
+++ b/utils.c
@@ -365,6 +365,41 @@ int btrfs_prepare_device(int fd, const char *file, u64 
*block_count_ret,
return 1;
}
 
+   /*
+* Check for the BTRFS SB copies up until btrfs_device_size() and zero
+* it. So that kernel (or user for the manual recovery) don't have to
+* confuse with the stale SB copy during recovery.
+*/
+   if (block_count != btrfs_device_size(fd, )) {
+   for (i = 1; i < BTRFS_SUPER_MIRROR_MAX; i++) {
+   struct btrfs_super_block *disk_super;
+   char buf[BTRFS_SUPER_INFO_SIZE];
+   disk_super = (struct btrfs_super_block *)buf;
+
+   /* Already zeroed above */
+   if (btrfs_sb_offset(i) < block_count)
+   continue;
+
+   /* Beyond actual disk size */
+   if (btrfs_sb_offset(i) >= btrfs_device_size(fd, ))
+   continue;
+
+   /* Does not contain any stale SB */
+   if (btrfs_read_dev_super(fd, disk_super,
+btrfs_sb_offset(i), 0))
+   continue;
+
+   ret = zero_dev_clamped(fd, btrfs_sb_offset(i),
+   BTRFS_SUPER_INFO_SIZE,
+   btrfs_device_size(fd, ));
+   if (ret < 0) {
+   error("failed to zero device '%s' bytenr %llu: 
%s",
+   file, btrfs_sb_offset(i), 
strerror(-ret));
+   return 1;
+   }
+   }
+   }
+
*block_count_ret = block_count;
return 0;
 }
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 6/8] btrfs: verify superblock checksum during scan

2018-03-27 Thread Anand Jain
During the scan context, we aren't verifying the superblock-
checksum when read.
This patch fixes it by adding the checksum verification function
btrfs_check_super_csum() in the function btrfs_read_disk_super().
And makes device scan to error fail if the primary superblock csum
is wrong, whereas if the copy-superblock csum is wrong it will just
just report mismatch and continue mount/scan as usual. When the
mount is successful We anyway overwrite all superblocks upon unmount.

The context in which this will be called is - device scan, device ready,
and mount -o device option.

Test script:

 Corrupt primary superblock and check if device scan and mount
 fails:
  mkfs.btrfs -fq /dev/sdc
  dd if=/dev/urandom of=/dev/sdc ibs=1 obs=1 count=1 seek=64K
  btrfs dev scan /dev/sdc
  mount /dev/sdc /btrfs

 Corrupt secondary superblock and check if device scan and mount
 is succcessful, check for the dmesg for errors.
  mkfs.btrfs -fq /dev/sdc
  dd if=/dev/urandom of=/dev/sdc ibs=1 obs=1 count=1 seek=67108864
  btrfs dev scan /dev/sdc
  mount /dev/sdc /btrfs

Signed-off-by: Anand Jain 
---
v1->v2:
 changed title.
 use explicit (< 0) check for %errr.
 Un-split pr_err() string.
 Fix typo in the git commit log.
 Move the csum check after bytenr and btrfs magic verified.

 fs/btrfs/volumes.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b099823f60d1..eda86ba258fc 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1149,6 +1149,7 @@ static int btrfs_read_disk_super(struct block_device 
*bdev, u64 bytenr,
 struct page **page,
 struct btrfs_super_block **disk_super)
 {
+   int err;
void *p;
pgoff_t index;
 
@@ -1183,6 +1184,18 @@ static int btrfs_read_disk_super(struct block_device 
*bdev, u64 bytenr,
return -EINVAL;
}
 
+   err = btrfs_check_super_csum((char *) *disk_super);
+   if (err < 0) {
+   if (err == -EINVAL)
+   pr_err("BTRFS error (device %pg): unsupported checksum 
type, bytenr=%llu",
+   bdev, bytenr);
+   else
+   pr_err("BTRFS error (device %pg): superblock checksum 
failed, bytenr=%llu",
+   bdev, bytenr);
+   btrfs_release_disk_super(*page);
+   return err;
+   }
+
if ((*disk_super)->label[0] &&
(*disk_super)->label[BTRFS_LABEL_SIZE - 1])
(*disk_super)->label[BTRFS_LABEL_SIZE - 1] = '\0';
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/8] btrfs: return required error from btrfs_check_super_csum

2018-03-27 Thread Anand Jain
Return the required -EINVAL and -EUCLEAN from the function
btrfs_check_super_csum(). And more the error log into the
parent function.

Signed-off-by: Anand Jain 
---
v1->v2:
 Fix commit indent reported by checkpatch.pl
 Fix pr_err split string

 fs/btrfs/disk-io.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b9b435579527..4c573602480c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -392,9 +392,10 @@ static int verify_parent_transid(struct extent_io_tree 
*io_tree,
 /*
  * Return 0 if the superblock checksum type matches the checksum value of that
  * algorithm. Pass the raw disk superblock data.
+ * Otherwise:  -EINVAL  if csum type is not found
+ * -EUCLEAN if csum does not match
  */
-static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info,
- char *raw_disk_sb)
+static int btrfs_check_super_csum(char *raw_disk_sb)
 {
struct btrfs_super_block *disk_sb =
(struct btrfs_super_block *)raw_disk_sb;
@@ -402,11 +403,8 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
u32 crc = ~(u32)0;
char result[sizeof(crc)];
 
-   if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) {
-   btrfs_err(fs_info, "unsupported checksum algorithm %u",
- csum_type);
-   return 1;
-   }
+   if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes))
+   return -EINVAL;
 
/*
 * The super_block structure does not span the whole
@@ -418,7 +416,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
btrfs_csum_final(crc, result);
 
if (memcmp(raw_disk_sb, result, sizeof(result)))
-   return 1;
+   return -EUCLEAN;
 
return 0;
 }
@@ -2570,9 +2568,14 @@ int open_ctree(struct super_block *sb,
 * We want to check superblock checksum, the type is stored inside.
 * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
 */
-   if (btrfs_check_super_csum(fs_info, bh->b_data)) {
-   btrfs_err(fs_info, "superblock checksum mismatch");
-   err = -EINVAL;
+   err = btrfs_check_super_csum(bh->b_data);
+   if (err) {
+   if (err == -EINVAL)
+   pr_err("BTRFS error (device %pg): unsupported checksum 
algorithm",
+   fs_devices->latest_bdev);
+   else
+   pr_err("BTRFS error (device %pg): superblock checksum 
mismatch",
+   fs_devices->latest_bdev);
brelse(bh);
goto fail_alloc;
}
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 7/8] btrfs: verify checksum for all devices in mount context

2018-03-27 Thread Anand Jain
During mount context, we aren't verifying the superblock checksum
for all the devices, instead, we verify it only for the
struct btrfs_fs_device::latest_bdev. This patch fixes it by moving
the checksum verification code from the function open_ctree() into
the function btrfs_read_dev_one_super().

By doing this now we are verifying the superblock checksum in the
mount-context, device-replace and, device-delete context. The
device-replace and device-delete call-chain is as show below after
this patch.

delete-device/replace:
btrfs_rm_device() || btrfs_dev_replace_by_ioctl()
|_btrfs_find_device_by_devspec()
  |_btrfs_find_device_missing_or_by_path()
|_btrfs_find_device_by_path()
  |_btrfs_get_bdev_and_sb()
|_btrfs_read_dev_super()
  |_btrfs_read_dev_one_super()
|_btrfs_check_super_csum()

Test case:
 Before:
   mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc
   dd if=/dev/urandom of=/dev/sdc ibs=1 obs=1 count=1 seek=64K
   mount /dev/sdb /btrfs <-- success as kernel does not check csum
 for non-btrfs_fs_devices::latest_bdev.
 After:
   mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc
   dd if=/dev/urandom of=/dev/sdc ibs=1 obs=1 count=1 seek=64K
   mount /dev/sdb /btrfs
 mount: mount /dev/sdc on /btrfs failed: Structure needs cleaning
   mount -o degraded /dev/sdc /btrfs
 mount: mount /dev/sdc on /btrfs failed: Structure needs cleaning

 So the current recovery step is to fix the primary superblock, by
 using the btrfs-progs cli.

Signed-off-by: Anand Jain 
Reviewed-by: Nikolay Borisov 
---
v1->v2:
 git commit log update. With the call-chain (which I believe will
  go away in the long term, we need the uuid from the userland not
  the disk-path). And add test case.
 Check err < 0 explicitly and drop check for "else if (err == -EUCLEAN)"
  v1:
err = btrfs_check_super_csum(bh->b_data);
if (err) {
if (err == -EINVAL)
pr_err("BTRFS error (device %pg): unsupported checksum 
algorithm",
bdev);
else if (err == -EUCLEAN)
pr_err("BTRFS error (device %pg): superblock checksum 
mismatch",
bdev);
brelse(bh);
return err;
}

 fs/btrfs/disk-io.c | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 35dbbdc613cd..110465bec775 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2565,22 +2565,6 @@ int open_ctree(struct super_block *sb,
}
 
/*
-* We want to check superblock checksum, the type is stored inside.
-* Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
-*/
-   err = btrfs_check_super_csum(bh->b_data);
-   if (err) {
-   if (err == -EINVAL)
-   pr_err("BTRFS error (device %pg): unsupported checksum 
algorithm",
-   fs_devices->latest_bdev);
-   else
-   pr_err("BTRFS error (device %pg): superblock checksum 
mismatch",
-   fs_devices->latest_bdev);
-   brelse(bh);
-   goto fail_alloc;
-   }
-
-   /*
 * super_copy is zeroed at allocation time and we never touch the
 * following bytes up to INFO_SIZE, the checksum is calculated from
 * the whole block of INFO_SIZE
@@ -3126,6 +3110,7 @@ int btrfs_read_dev_one_super(struct block_device *bdev, 
int copy_num,
struct buffer_head *bh;
struct btrfs_super_block *super;
u64 bytenr;
+   int err;
 
bytenr = btrfs_sb_offset(copy_num);
if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
@@ -3146,6 +3131,22 @@ int btrfs_read_dev_one_super(struct block_device *bdev, 
int copy_num,
return -EINVAL;
}
 
+   /*
+* Check the superblock checksum, the type is stored inside.
+* Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
+*/
+   err = btrfs_check_super_csum(bh->b_data);
+   if (err < 0) {
+   if (err == -EINVAL)
+   pr_err("BTRFS error (device %pg): unsupported checksum 
algorithm",
+   bdev);
+   else
+   pr_err("BTRFS error (device %pg): superblock checksum 
mismatch",
+   bdev);
+   brelse(bh);
+   return err;
+   }
+
*bh_ret = bh;
return 0;
 }
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/9] Superblock read and verify cleanups

2018-03-27 Thread Anand Jain
v1->v2:
 Various changes suggested by Nicokay. Thanks. For specific changes
 pls ref to the patch.

Patch 1-4/8 are preparatory patches adds cleanups and nonstatic requisites.

Patch 5/8 makes sure that all copies of the superblock have the same fsid
when we scan the device.

Patch 6/8 verifies superblock csum when we read it in the scan context.

Patch 7/8 fixes a bug that we weren't verifying the superblock csum for
the non-latest_bdev.

And 8/8 patch drops the redundant invalidate_bdev() call during mount.

There is a btrfs-progs patch which is a kind of related, as its found that
we weren't wiping the non-overwritten superblock, so it could cause
confusion during the superblock recovery process. So the patch btrfs-progs
1/1 adds code to wipe superblock if we aren't overwriting it.

Now since kernel patch 5/8 checks if all the superblock copies are
pointing to the same fsid on the disk, so the scan will fail if without
the above 1/1 btrfs-progs, as in the example below [1]. However the simple
workaround is to wipe the superblock manually [2] or apply the btrfs-progs
patch below.

[1]
 mkfs.btrfs -qf /dev/sdb <-- 1T disk
 mkfs.btrfs -b 256M  /dev/sdb
 ERROR: device scan failed on /dev/sdb

[2]
 dd if=/dev/zero of= seek=274877906944 ibs=1 obs=1 count4K

Unfortunately, the error messages should have been failed to register
[3] device into the kernel to be more appropriate to the error.

[3]
ret = ioctl(fd, BTRFS_IOC_SCAN_DEV, );
if (ret < 0) {
error("device scan failed on '%s': %m", fname);
ret = -errno;
}


Patches 1-7/8 were sent independently before. And  I found few more things
to fix alongs the line, and since they are related, so I am sending these
all together. Also, as there are minor changes, like in pr_err strings,
and splitting the unrelated changes into a separate patch, so though I am
thankful for the received reviewed-by, I couldn't include them here. Sorry.

Finally, here I am including the function relations [4] so that it will help
to review the code. And this flow is before these patches were applied.

[4]
In the long term, I suggest deprecating ioctl args which pass device path
(where possible), like in delete-device/replace. And
btrfs_read_dev_one_super() should replace btrfs_read_disk_super()

delete-device/replace:
btrfs_rm_device() || btrfs_dev_replace_by_ioctl()
|_btrfs_find_device_by_devspec()
  |_btrfs_find_device_missing_or_by_path()
|_btrfs_find_device_by_path()
  |_btrfs_get_bdev_and_sb()
|_btrfs_read_dev_super()
  |_btrfs_read_dev_one_super()
|___bread()

btrfs_mount_root()
 |
 |_btrfs_parse_early_options (-o device only)
 | |_btrfs_scan_one_device
 |   |_btrfs_read_disk_super()
 | |_read_cache_page_gfp()
 |
 |_btrfs_scan_one_device(mount-arg-dev only)
 | |_btrfs_read_disk_super()
 |   |_read_cache_page_gfp()
 |
 |
 |_btrfs_open_devices(fsid:all)
 |  |_btrfs_open_one_device()
 ||_btrfs_get_bdev_and_sb()  <--- invalidate_bdev(fsid:all)
 |  |_btrfs_read_dev_super()
 ||_btrfs_read_dev_one_super()
 |  |___bread()
 |
 |_btrfs_fill_super()
   |_btrfs_open_ctree()   <-- invalidate_bdev(latest_bdev) <-- redundant
 |_btrfs_read_dev_super(latest_bdev only)
 | |_btrfs_read_dev_one_super(latest_bdev only)
 |   |___bread(latest_bdev)
 |
 |_btrfs_check_super_csum(latest_bdev only) [*]
 |
 |_btrfs_read_chunk_tree
 | |_read_one_dev()
 |   |_open_seed_devices()
 | |_btrfs_open_devices(fs_devices->seed only)

scan/ready
|
|_btrfs_scan_one_device(ioctl-arg-dev only)
   |_btrfs_read_disk_super()
 |_read_cache_page_gfp()


Anand Jain (8):
  btrfs: cleanup btrfs_check_super_csum() for better code flow
  btrfs: return required error from btrfs_check_super_csum
  btrfs: cleanup btrfs_read_disk_super() to return std error
  btrfs: make btrfs_check_super_csum() non static
  btrfs: check if the fsid in the primary sb and copy sb are same
  btrfs: verify checksum when superblock is read for scan
  btrfs: verify checksum for all devices in mount context
  btrfs: drop the redundant invalidate_bdev()

 fs/btrfs/disk-io.c | 72 +++---
 fs/btrfs/disk-io.h |  1 +
 fs/btrfs/volumes.c | 84 ++
 3 files changed, 102 insertions(+), 55 deletions(-)

Anand Jain (1):
  btrfs-progs: wipe copies of the stale superblock beyond -b size

 utils.c | 35 +++
 1 file changed, 35 insertions(+)

-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/8] btrfs: make btrfs_check_super_csum() non-static

2018-03-27 Thread Anand Jain
In preparation to add the superblock csum verification for the
scan context, make btrfs_check_super_csum() non-static.

Signed-off-by: Anand Jain 
Reviewed-by: Nikolay Borisov 
---
 fs/btrfs/disk-io.c | 2 +-
 fs/btrfs/disk-io.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 4c573602480c..35dbbdc613cd 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -395,7 +395,7 @@ static int verify_parent_transid(struct extent_io_tree 
*io_tree,
  * Otherwise:  -EINVAL  if csum type is not found
  * -EUCLEAN if csum does not match
  */
-static int btrfs_check_super_csum(char *raw_disk_sb)
+int btrfs_check_super_csum(char *raw_disk_sb)
 {
struct btrfs_super_block *disk_sb =
(struct btrfs_super_block *)raw_disk_sb;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 70a88d61b547..c400cc68f913 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -69,6 +69,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
max_mirrors);
 struct buffer_head *btrfs_read_dev_super(struct block_device *bdev);
 int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
struct buffer_head **bh_ret);
+int btrfs_check_super_csum(char *raw_disk_sb);
 int btrfs_commit_super(struct btrfs_fs_info *fs_info);
 struct btrfs_root *btrfs_read_fs_root(struct btrfs_root *tree_root,
  struct btrfs_key *location);
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/8] btrfs: cleanup btrfs_check_super_csum() for better code flow

2018-03-27 Thread Anand Jain
We check the %csum_type twice. Drop one. Check for the csum_type and
then for the csum. Which also matches with the progs which have better
logic. This is preparatory patch to get proper error code from
btrfs_check_super_csum().

Signed-off-by: Anand Jain 
Reviewed-by: Nikolay Borisov 
---
 fs/btrfs/disk-io.c | 38 +-
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1657d6aa4fa6..b9b435579527 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -399,32 +399,28 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
struct btrfs_super_block *disk_sb =
(struct btrfs_super_block *)raw_disk_sb;
u16 csum_type = btrfs_super_csum_type(disk_sb);
-   int ret = 0;
-
-   if (csum_type == BTRFS_CSUM_TYPE_CRC32) {
-   u32 crc = ~(u32)0;
-   char result[sizeof(crc)];
-
-   /*
-* The super_block structure does not span the whole
-* BTRFS_SUPER_INFO_SIZE range, we expect that the unused space
-* is filled with zeros and is included in the checksum.
-*/
-   crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE,
-   crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
-   btrfs_csum_final(crc, result);
-
-   if (memcmp(raw_disk_sb, result, sizeof(result)))
-   ret = 1;
-   }
+   u32 crc = ~(u32)0;
+   char result[sizeof(crc)];
 
if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) {
btrfs_err(fs_info, "unsupported checksum algorithm %u",
-   csum_type);
-   ret = 1;
+ csum_type);
+   return 1;
}
 
-   return ret;
+   /*
+* The super_block structure does not span the whole
+* BTRFS_SUPER_INFO_SIZE range, we expect that the unused space
+* is filled with zeros and is included in the checksum.
+*/
+   crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE,
+ crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
+   btrfs_csum_final(crc, result);
+
+   if (memcmp(raw_disk_sb, result, sizeof(result)))
+   return 1;
+
+   return 0;
 }
 
 /*
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: wipe copies of the stale superblock beyond -b size

2018-03-27 Thread Anand Jain



On 03/27/2018 02:19 PM, Nikolay Borisov wrote:



On 27.03.2018 02:50, Anand Jain wrote:




I told you this code can be made a lot simpler, simply by modifying the
last argument passed to zero_dev_clamped. I even posted the resulting
diff which was just 3 lines changed.

I agree that it's a good idea to wipe all available superblock when we
use -b. However I don't agree with your approach - it adds a loop, it
adds a bunch of checks and makes the complexity orders of magnitude
higher than it could be. So I'm asking again - is there any inherent
benefit which I'm missing in your newly added 35 lines of code against
just passing the block device to zero_dev_clamped and letting the
existing logic take care of everything?


I had that idea for v1 as well, but I didn't do it because it would
zero bytenr_copy#2 even when there is no btrfs superblock, (which is
fine only with in block_count).

Some might view it as corrupting the usr data (for which they have
specified -b option?)? I am just discussing I don't have any usecase
to prove it though. Do you have any idea?
If you think it should be ok, I shall go ahead and zero without
checking for the btrfs superblock beyond block_count.


Right, so the pertinent question is - is anyone expecting to do mkfs on
a volume irrespective of the options and expect to be able to recover
data prior to the mkfs? I think the answer is 'no', but let's see what
the community says.


 Writing zero even if there is no superblock is not a question of
 recovering btrfs, its a question whether there was any other good
 data that is intersting to the user, for which they have specified
 -b option.

 Asking community is a good idea, lets take a conservative approach
 to zero stale superblock only if we find a valid superblock, atleast
 for now, we can add cleanups later.

Thanks, Aannd



Thanks, Anand


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/8] btrfs: verify checksum when superblock is read for scan

2018-03-27 Thread Anand Jain




Test script:

  Corrupt primary superblock and check if device scan and mount
  fails:
  mkfs.btrfs -fq /dev/sdc
  dd if=/dev/urandom of=/dev/sdc ibs=1 obs=1 count=1 seek=64K
  btrfs dev scan
  mount /dev/sdb /btrfs

  Corrupt secondary superblock and check if device scan and mount
  is succcessful, check for the dmesg for errors.
  mkfs.btrfs -fq /dev/sdc
  dd if=/dev/urandom of=/dev/sdc ibs=1 obs=1 count=1 seek=64K
  btrfs dev scan
  mount /dev/sdb /btrfs


Have you considered adding fstests, it will be very easy to test for
this behavior?


 This is one off kind of bug, not sure if it would value add
 for checking it all the time in xfstests?



Same comment as before regarding string literals splitting across lines.


accepted.



+   else
+   pr_err("BTRFS error (device %pg): "\
+   "superblock checksum failed, bytenr=%llu",
+   bdev, bytenr);
+   btrfs_release_disk_super(*page);
+   return err;
+   }


Also it will be better to have the checksum check after we have verified
some basic invariants - that bytenr and magic have sane values.


 accepted.

Thanks,
Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] btrfs: verify checksum for all devices in mount context

2018-03-27 Thread Anand Jain



On 03/27/2018 08:21 PM, Nikolay Borisov wrote:



On 26.03.2018 11:27, Anand Jain wrote:

During mount context, we aren't verifying the superblock checksum
for all the devices, instead, we verify it only for the
struct btrfs_fs_device::latest_bdev. This patch fixes it
by moving the checksum verification code from the function open_ctree()
into the function btrfs_read_dev_one_super().

By doing this now we are verifying the superblock checksum
in the mount-context, device-replace and, device-delete context.


Which call chain provides the device-replace and device-delete contexts?
I think it is worth it documenting them in the changelog.


 Will copy it here from the cover-letter.



Signed-off-by: Anand Jain 


Reviewed-by: Nikolay Borisov 





---
  fs/btrfs/disk-io.c | 35 +--
  1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6299ab18da5f..3cc50041c0b9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2565,24 +2565,6 @@ int open_ctree(struct super_block *sb,
}
  
  	/*

-* We want to check superblock checksum, the type is stored inside.
-* Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
-*/
-   err = btrfs_check_super_csum(bh->b_data);
-   if (err) {
-   if (err == -EINVAL)
-   pr_err("BTRFS error (device %pg): "\
-   "unsupported checksum algorithm",
-   fs_devices->latest_bdev);
-   else
-   pr_err("BTRFS error (device %pg): "\
-   "superblock checksum mismatch",
-   fs_devices->latest_bdev);
-   brelse(bh);
-   goto fail_alloc;
-   }
-
-   /*
 * super_copy is zeroed at allocation time and we never touch the
 * following bytes up to INFO_SIZE, the checksum is calculated from
 * the whole block of INFO_SIZE
@@ -3128,6 +3110,7 @@ int btrfs_read_dev_one_super(struct block_device *bdev, 
int copy_num,
struct buffer_head *bh;
struct btrfs_super_block *super;
u64 bytenr;
+   int err;
  
  	bytenr = btrfs_sb_offset(copy_num);

if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
@@ -3148,6 +3131,22 @@ int btrfs_read_dev_one_super(struct block_device *bdev, 
int copy_num,
return -EINVAL;
}
  
+	/*

+* Check the superblock checksum, the type is stored inside.
+* Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
+*/
+   err = btrfs_check_super_csum(bh->b_data);
+   if (err) {


I am fixing this to if (err < 0) {


+   if (err == -EINVAL)
+   pr_err("BTRFS error (device %pg): unsupported checksum 
algorithm",
+   bdev);
+   else if (err == -EUCLEAN)
+   pr_err("BTRFS error (device %pg): superblock checksum 
mismatch",
+   bdev);


 I am also dropping else if to else in v2.

Thanks, Anand



+   brelse(bh);
+   return err;
+   }
+
*bh_ret = bh;
return 0;
  }


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] btrfs: check if the fsid in the primary sb and copy sb are same

2018-03-27 Thread Anand Jain



On 03/27/2018 04:49 PM, Nikolay Borisov wrote:



On 26.03.2018 11:27, Anand Jain wrote:

During the btrfs dev scan make sure that other copies of superblock
contain the same fsid as the primary SB. So that we bring to the
user notice if the superblock has been overwritten.

  mkfs.btrfs -fq /dev/sdc
  mkfs.btrfs -fq /dev/sdb
  dd if=/dev/sdb of=/dev/sdc count=4K skip=64K seek=64K obs=1 ibs=1
  mount /dev/sdc /btrfs

Caveat: Pls note that older btrfs-progs do not wipe the non-overwriting
stale superblock like copy2 if a smaller mkfs.btrfs -bis created.
So this patch in the kernel will report error. The workaround is to wipe
the superblock manually, like
  dd if=/dev/zero of= seek=274877906944 ibs=1 obs=1 count4K
OR apply btrfs-progs patch
   btrfs-progs: wipe copies of the stale superblock beyond -b size
which shall find and wipe the non overwriting superblock.

Signed-off-by: Anand Jain 
---
  fs/btrfs/volumes.c | 60 ++
  1 file changed, 47 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ed22f0a3d239..45dd0674571b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1198,40 +1198,74 @@ static int btrfs_read_disk_super(struct block_device 
*bdev, u64 bytenr,
  int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
  struct btrfs_fs_devices **fs_devices_ret)
  {
+   struct btrfs_super_block *disk_super_primary;
struct btrfs_super_block *disk_super;
struct btrfs_device *device;
struct block_device *bdev;
struct page *page;
int ret = 0;
-   u64 bytenr;
+   int i;
  
-	/*

-* we would like to check all the supers, but that would make
-* a btrfs mount succeed after a mkfs from a different FS.
-* So, we need to add a special mount option to scan for
-* later supers, using BTRFS_SUPER_MIRROR_MAX instead
-*/
-   bytenr = btrfs_sb_offset(0);
flags |= FMODE_EXCL;
  
  	bdev = blkdev_get_by_path(path, flags, holder);

if (IS_ERR(bdev))
return PTR_ERR(bdev);
  
-	ret = btrfs_read_disk_super(bdev, bytenr, , _super);

-   if (ret)
+   disk_super_primary = kzalloc(sizeof(*disk_super_primary), GFP_KERNEL);
+   if (!disk_super_primary) {
+   ret = -ENOMEM;
goto error_bdev_put;
+   }
+
+   /*
+* We would like to check all the supers and use one good copy,
+* but that would make a btrfs mount succeed after a mkfs from
+* a different FS.
+* So, we need to add a special mount option to scan for
+* later supers, using BTRFS_SUPER_MIRROR_MAX instead.
+* So, just validate if all copies of the superblocks are ok
+* and have the same fsid.
+*/
+   for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
+   u64 bytenr = btrfs_sb_offset(i);
+
+   ret = btrfs_read_disk_super(bdev, bytenr, , _super);
+   if (ret) {
+   if (i == 0)
+   goto error_kfree;
+   /* copy2 is optional */
+   ret = 0;
+   continue;
+   }
+
+   if (i == 0) {
+   memcpy(disk_super_primary, disk_super,
+  sizeof(*disk_super_primary));
+   btrfs_release_disk_super(page);
+   continue;


Doing the memcpy is enough here, the bottom of the loop already releases
the disk page and continues on the next iteration.


 The page map happens inside btrfs_read_disk_super(), we
 need unmap before going for the next superblock.



+   } else if (memcmp(disk_super_primary->fsid, disk_super->fsid,
+  BTRFS_FSID_SIZE)) {
+   pr_err("BTRFS (device %pg): superblock fsid missmatch "\
+   "primary %pU copy%d %pU", bdev,
+   disk_super_primary->fsid, i, disk_super->fsid);
+   ret = -EINVAL;
+   btrfs_release_disk_super(page);
+   goto error_kfree;
+   }
+   btrfs_release_disk_super(page);


I'd say split the "read first sb" from the loop, because alway want to
read it and return an error if it fails. And then have the loop begin at
i = 1 and handle only the possible mirrors of the sb. That would clean
up the nested 'if' in handling the ret.

Also you could introduce another struct *page primary_page where you
read the first super block. That way you save a memcpy + kzalloc but
you'd have to always free it on function exit so I am not sure how much
value it brings in terms of readability.


 Right. Also with this approach I don't have to kzallo() anymore. Will fix.

Thanks, Anand


+   }
  
  	mutex_lock(_mutex);

-   device = 

Re: [PATCH 2/8] btrfs: return required error from btrfs_check_super_csum

2018-03-27 Thread Anand Jain



On 03/28/2018 03:16 AM, David Sterba wrote:

On Tue, Mar 27, 2018 at 11:05:53AM +0300, Nikolay Borisov wrote:

+   if (err) {
+   if (err == -EINVAL)
+   pr_err("BTRFS error (device %pg): "\
+   "unsupported checksum algorithm",
+   fs_devices->latest_bdev);


I don't think strings should be idented. I.e the correct thing here
should be to have only fs_devices->latest_bdev on a new line, even if
the string goes above the char limit of 80. In any case the \ is not
needed due to gcc's support for string literal concatenation:

pr_err("BTRFS error (device %pg): "
"unsupported checksum algorithm",
fs_devices->latest_bdev)

should work equally well. But as I said I don't think this is even
needed. Let's wait and see what David says.


The strings should not be split, because we want to be able to grep the
sources for the error messages. If the string does not fit 80 chars
per line, then it can be moved to the next line and un-indented to the
left. Plenty of examples.


 I understand reason behind string should not be split, but I
 thought here will be an exception, as if you want to search
 for the string you will still use "unsupported checksum algorithm"
 which is still in one line.
 About the grep not able to find the string which are split,
 I look at it as fixing the wrong end of the problem. That is Grep
 end problem being fixes at the c code end. Anyway as its kind of
 linux general guidlines, I shall fix my patch.

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


HI

2018-03-27 Thread Lucy Boston
-- 
Greeting, once again is me Lucy Boston this is twice am contacting you
please is very urgent respond to me for more details through my.
Email:

dr.lucybos...@gmail.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] btrfs: use RCU in btrfs_show_devname for device list traversal

2018-03-27 Thread Anand Jain



On 03/28/2018 02:48 AM, David Sterba wrote:

The show_devname callback is used to print device name in
/proc/self/mounts, we need to traverse the device list consistently and
read the name that's copied to a seq buffer so we don't need further
locking.

If the first device is being deleted at the same time, the RCU will
allow us to read the device name, though it will become stale right
after the RCU protection ends. This is unavoidable and the user can
expect that the device will disappear from the filesystem's list at some
point.

The device_list_mutex was pretty heavy as it is used eg. for writing
superblock and a few other IO related contexts. This can stall any
application that reads the proc file for no reason.

Signed-off-by: David Sterba 


Reviewed-by: Anand Jain 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/8] btrfs: return required error from btrfs_check_super_csum

2018-03-27 Thread David Sterba
On Tue, Mar 27, 2018 at 11:05:53AM +0300, Nikolay Borisov wrote:
> > +   if (err) {
> > +   if (err == -EINVAL)
> > +   pr_err("BTRFS error (device %pg): "\
> > +   "unsupported checksum algorithm",
> > +   fs_devices->latest_bdev);
> 
> I don't think strings should be idented. I.e the correct thing here
> should be to have only fs_devices->latest_bdev on a new line, even if
> the string goes above the char limit of 80. In any case the \ is not
> needed due to gcc's support for string literal concatenation:
> 
> pr_err("BTRFS error (device %pg): "
>"unsupported checksum algorithm",
>fs_devices->latest_bdev)
> 
> should work equally well. But as I said I don't think this is even
> needed. Let's wait and see what David says.

The strings should not be split, because we want to be able to grep the
sources for the error messages. If the string does not fit 80 chars
per line, then it can be moved to the next line and un-indented to the
left. Plenty of examples.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] Delayed refs cleanups/streamlining

2018-03-27 Thread David Sterba
On Wed, Mar 21, 2018 at 10:45:04AM +0200, Nikolay Borisov wrote:
> This patchkit aims to simplify and streamline the logic involved in 
> initialising and adding delayed refs/delayed head structures which deal with 
> modification of the extent tree. Currently the logic for init and add was 
> contained in one function for each type of structure. This resulted in very 
> awkward interface with shitloads of arguments, this in turn made it really 
> hard
> to understand the gist of the code. This series rectifies the situation by 
> extracting common parts and using those rather than open coding duplicated
> logic for every type of delayed ref (tree or data ref). 
> 
> The first 5 patches deal with the delayed_ref structs. Each patch is 
> incremental and makes the code bisectable. The last tree factor out the init 
> code for delayed_ref_head into a separate function and begin to use it. The 
> final completely splits the two. 
> 
> The net result is both a cleaner interface as well as somewhat reduced 
> critical section under delayed_refs->lock spinlock. 
> 
> Nikolay Borisov (8):
>   btrfs: Factor out common delayed refs init code
>   btrfs: Use init_delayed_ref_common in add_delayed_tree_ref
>   btrfs: Use init_delayed_ref_common in add_delayed_data_ref
>   btrfs: Open-code add_delayed_tree_ref
>   btrfs: Open-code add_delayed_data_ref
>   btrfs: Introduce init_delayed_ref_head
>   btrfs: Use init_delayed_ref_head in add_delayed_ref_head
>   btrfs: split delayed ref head initialization and addition

I'll add the patchset to for-next testing branch so it can get picked by
the git repo scrapers, but I haven't reviewed nor tested it. There were
some merge conflict with current misc-next (fs_info exists for
add_delayed_data_ref) so I had a chance to look at some patches closely.
The overall impression is good, so the review sholud make sure there are
no typos or similar issues that can arise when moving code around.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/6] btrfs: use RCU in btrfs_show_devname for device list traversal

2018-03-27 Thread David Sterba
The show_devname callback is used to print device name in
/proc/self/mounts, we need to traverse the device list consistently and
read the name that's copied to a seq buffer so we don't need further
locking.

If the first device is being deleted at the same time, the RCU will
allow us to read the device name, though it will become stale right
after the RCU protection ends. This is unavoidable and the user can
expect that the device will disappear from the filesystem's list at some
point.

The device_list_mutex was pretty heavy as it is used eg. for writing
superblock and a few other IO related contexts. This can stall any
application that reads the proc file for no reason.

Signed-off-by: David Sterba 
---
 fs/btrfs/super.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 4b817947e00f..9c18f05b954d 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2290,11 +2290,18 @@ static int btrfs_show_devname(struct seq_file *m, 
struct dentry *root)
struct list_head *head;
struct rcu_string *name;
 
-   mutex_lock(_info->fs_devices->device_list_mutex);
+   /*
+* Lightweight locking of the devices. We should not need
+* device_list_mutex here as we only read the device data and the list
+* is protected by RCU.  Even if a device is deleted during the list
+* traversals, we'll get valid data, the freeing callback will wait at
+* least until until the rcu_read_unlock.
+*/
+   rcu_read_lock();
cur_devices = fs_info->fs_devices;
while (cur_devices) {
head = _devices->devices;
-   list_for_each_entry(dev, head, dev_list) {
+   list_for_each_entry_rcu(dev, head, dev_list) {
if (test_bit(BTRFS_DEV_STATE_MISSING, >dev_state))
continue;
if (!dev->name)
@@ -2306,14 +2313,12 @@ static int btrfs_show_devname(struct seq_file *m, 
struct dentry *root)
}
 
if (first_dev) {
-   rcu_read_lock();
name = rcu_dereference(first_dev->name);
seq_escape(m, name->str, " \t\n\\");
-   rcu_read_unlock();
} else {
WARN_ON(1);
}
-   mutex_unlock(_info->fs_devices->device_list_mutex);
+   rcu_read_unlock();
return 0;
 }
 
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/6] btrfs: split dev-replace locking helpers for read and write

2018-03-27 Thread David Sterba
The current calls are unclear in what way btrfs_dev_replace_lock takes
the locks, so drop the argument, split the helpers and use similar
naming as for read and write locks.

Signed-off-by: David Sterba 
---
 fs/btrfs/dev-replace.c | 98 +-
 fs/btrfs/dev-replace.h |  6 ++--
 fs/btrfs/reada.c   | 10 +++---
 fs/btrfs/scrub.c   | 14 
 fs/btrfs/volumes.c | 18 +-
 5 files changed, 74 insertions(+), 72 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 7efbc4d1128b..91d0815dd514 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -200,13 +200,13 @@ int btrfs_run_dev_replace(struct btrfs_trans_handle 
*trans,
struct btrfs_dev_replace_item *ptr;
struct btrfs_dev_replace *dev_replace = _info->dev_replace;
 
-   btrfs_dev_replace_lock(dev_replace, 0);
+   btrfs_dev_replace_read_lock(dev_replace);
if (!dev_replace->is_valid ||
!dev_replace->item_needs_writeback) {
-   btrfs_dev_replace_unlock(dev_replace, 0);
+   btrfs_dev_replace_read_unlock(dev_replace);
return 0;
}
-   btrfs_dev_replace_unlock(dev_replace, 0);
+   btrfs_dev_replace_read_unlock(dev_replace);
 
key.objectid = 0;
key.type = BTRFS_DEV_REPLACE_KEY;
@@ -264,7 +264,7 @@ int btrfs_run_dev_replace(struct btrfs_trans_handle *trans,
ptr = btrfs_item_ptr(eb, path->slots[0],
 struct btrfs_dev_replace_item);
 
-   btrfs_dev_replace_lock(dev_replace, 1);
+   btrfs_dev_replace_write_lock(dev_replace);
if (dev_replace->srcdev)
btrfs_set_dev_replace_src_devid(eb, ptr,
dev_replace->srcdev->devid);
@@ -287,7 +287,7 @@ int btrfs_run_dev_replace(struct btrfs_trans_handle *trans,
btrfs_set_dev_replace_cursor_right(eb, ptr,
dev_replace->cursor_right);
dev_replace->item_needs_writeback = 0;
-   btrfs_dev_replace_unlock(dev_replace, 1);
+   btrfs_dev_replace_write_unlock(dev_replace);
 
btrfs_mark_buffer_dirty(eb);
 
@@ -352,7 +352,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
return PTR_ERR(trans);
}
 
-   btrfs_dev_replace_lock(dev_replace, 1);
+   btrfs_dev_replace_write_lock(dev_replace);
switch (dev_replace->replace_state) {
case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED:
@@ -390,7 +390,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
dev_replace->item_needs_writeback = 1;
atomic64_set(_replace->num_write_errors, 0);
atomic64_set(_replace->num_uncorrectable_read_errors, 0);
-   btrfs_dev_replace_unlock(dev_replace, 1);
+   btrfs_dev_replace_write_unlock(dev_replace);
 
ret = btrfs_sysfs_add_device_link(tgt_device->fs_devices, tgt_device);
if (ret)
@@ -402,7 +402,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
trans = btrfs_start_transaction(root, 0);
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
-   btrfs_dev_replace_lock(dev_replace, 1);
+   btrfs_dev_replace_write_lock(dev_replace);
goto leave;
}
 
@@ -426,7 +426,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 leave:
dev_replace->srcdev = NULL;
dev_replace->tgtdev = NULL;
-   btrfs_dev_replace_unlock(dev_replace, 1);
+   btrfs_dev_replace_write_unlock(dev_replace);
btrfs_destroy_dev_replace_tgtdev(fs_info, tgt_device);
return ret;
 }
@@ -493,18 +493,18 @@ static int btrfs_dev_replace_finishing(struct 
btrfs_fs_info *fs_info,
/* don't allow cancel or unmount to disturb the finishing procedure */
mutex_lock(_replace->lock_finishing_cancel_unmount);
 
-   btrfs_dev_replace_lock(dev_replace, 0);
+   btrfs_dev_replace_read_lock(dev_replace);
/* was the operation canceled, or is it finished? */
if (dev_replace->replace_state !=
BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED) {
-   btrfs_dev_replace_unlock(dev_replace, 0);
+   btrfs_dev_replace_read_unlock(dev_replace);
mutex_unlock(_replace->lock_finishing_cancel_unmount);
return 0;
}
 
tgt_device = dev_replace->tgtdev;
src_device = dev_replace->srcdev;
-   btrfs_dev_replace_unlock(dev_replace, 0);
+   btrfs_dev_replace_read_unlock(dev_replace);
 
/*
 * flush all outstanding I/O and inode extent mappings before the
@@ -529,7 +529,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info 
*fs_info,
/* keep away write_all_supers() during the finishing procedure */
mutex_lock(_info->fs_devices->device_list_mutex);
mutex_lock(_info->chunk_mutex);
-   

[PATCH 5/6] btrfs: remove stale comments about fs_mutex

2018-03-27 Thread David Sterba
The fs_mutex has been killed in 2008, a213501153fd66e2 ("Btrfs: Replace
the big fs_mutex with a collection of other locks"), still remembered in
some comments.

We don't have any extra needs for locking in the ACL handlers.

Signed-off-by: David Sterba 
---
 fs/btrfs/acl.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index 1ba49ebe67da..700a3dfd3129 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -65,9 +65,6 @@ struct posix_acl *btrfs_get_acl(struct inode *inode, int type)
return acl;
 }
 
-/*
- * Needs to be called with fs_mutex held
- */
 static int __btrfs_set_acl(struct btrfs_trans_handle *trans,
 struct inode *inode, struct posix_acl *acl, int type)
 {
@@ -127,11 +124,6 @@ int btrfs_set_acl(struct inode *inode, struct posix_acl 
*acl, int type)
return ret;
 }
 
-/*
- * btrfs_init_acl is already generally called under fs_mutex, so the locking
- * stuff has been fixed to work with that.  If the locking stuff changes, we
- * need to re-evaluate the acl locking stuff.
- */
 int btrfs_init_acl(struct btrfs_trans_handle *trans,
   struct inode *inode, struct inode *dir)
 {
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/6] btrfs: update barrier in should_cow_block

2018-03-27 Thread David Sterba
Once there was a simple int force_cow that was used with the plain
barriers, and then converted to a bit, so we should use the appropriate
barrier helper.

Other variables in the complex if condition do not depend on a barrier,
so we should be fine in case the atomic barrier becomes a no-op.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index b88a79e69ddf..4c75df2eea2e 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1502,8 +1502,8 @@ static inline int should_cow_block(struct 
btrfs_trans_handle *trans,
if (btrfs_is_testing(root->fs_info))
return 0;
 
-   /* ensure we can see the force_cow */
-   smp_rmb();
+   /* Ensure we can see the FORCE_COW bit */
+   smp_mb__before_atomic();
 
/*
 * We do not need to cow a block if
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/6] btrfs: use lockdep_assert_held for mutexes

2018-03-27 Thread David Sterba
Using lockdep_assert_held is preferred, replace mutex_is_locked.

Signed-off-by: David Sterba 
---
 fs/btrfs/extent-tree.c |  2 +-
 fs/btrfs/scrub.c   |  4 ++--
 fs/btrfs/volumes.c | 10 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index c1618ab9fecf..4116fb20b1af 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4542,7 +4542,7 @@ void check_system_chunk(struct btrfs_trans_handle *trans,
 * Needed because we can end up allocating a system chunk and for an
 * atomic and race free space reservation in the chunk block reserve.
 */
-   ASSERT(mutex_is_locked(_info->chunk_mutex));
+   lockdep_assert_held(_info->chunk_mutex);
 
info = __find_space_info(fs_info, BTRFS_BLOCK_GROUP_SYSTEM);
spin_lock(>lock);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ec56f33feea9..9cfbccdeacf3 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -371,7 +371,7 @@ static struct full_stripe_lock *insert_full_stripe_lock(
struct full_stripe_lock *entry;
struct full_stripe_lock *ret;
 
-   WARN_ON(!mutex_is_locked(_root->lock));
+   lockdep_assert_held(_root->lock);
 
p = _root->root.rb_node;
while (*p) {
@@ -413,7 +413,7 @@ static struct full_stripe_lock *search_full_stripe_lock(
struct rb_node *node;
struct full_stripe_lock *entry;
 
-   WARN_ON(!mutex_is_locked(_root->lock));
+   lockdep_assert_held(_root->lock);
 
node = locks_root->root.rb_node;
while (node) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b2d05c6b1c56..350c12f1f19e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2047,7 +2047,7 @@ void btrfs_rm_dev_replace_remove_srcdev(struct 
btrfs_fs_info *fs_info,
 {
struct btrfs_fs_devices *fs_devices;
 
-   WARN_ON(!mutex_is_locked(_info->fs_devices->device_list_mutex));
+   lockdep_assert_held(_info->fs_devices->device_list_mutex);
 
/*
 * in case of fs with no seed, srcdev->fs_devices will point
@@ -2237,7 +2237,7 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info 
*fs_info)
struct btrfs_device *device;
u64 super_flags;
 
-   BUG_ON(!mutex_is_locked(_mutex));
+   lockdep_assert_held(_mutex);
if (!fs_devices->seeding)
return -EINVAL;
 
@@ -2984,7 +2984,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info 
*fs_info, u64 chunk_offset)
 * we release the path used to search the chunk/dev tree and before
 * the current task acquires this mutex and calls us.
 */
-   ASSERT(mutex_is_locked(_info->delete_unused_bgs_mutex));
+   lockdep_assert_held(_info->delete_unused_bgs_mutex);
 
ret = btrfs_can_relocate(fs_info, chunk_offset);
if (ret)
@@ -5068,7 +5068,7 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 {
u64 chunk_offset;
 
-   ASSERT(mutex_is_locked(_info->chunk_mutex));
+   lockdep_assert_held(_info->chunk_mutex);
chunk_offset = find_next_chunk(fs_info);
return __btrfs_alloc_chunk(trans, chunk_offset, type);
 }
@@ -6618,7 +6618,7 @@ static struct btrfs_fs_devices *open_seed_devices(struct 
btrfs_fs_info *fs_info,
struct btrfs_fs_devices *fs_devices;
int ret;
 
-   BUG_ON(!mutex_is_locked(_mutex));
+   lockdep_assert_held(_mutex);
ASSERT(fsid);
 
fs_devices = fs_info->fs_devices->seed;
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/6] btrfs: use lockdep_assert_held for spinlocks

2018-03-27 Thread David Sterba
Using lockdep_assert_held is preferred, replace assert_spin_locked.

Signed-off-by: David Sterba 
---
 fs/btrfs/delayed-ref.c | 6 +++---
 fs/btrfs/qgroup.c  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 7ab5e0128f0c..6dcb0128ea7f 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -216,7 +216,7 @@ int btrfs_delayed_ref_lock(struct btrfs_trans_handle *trans,
struct btrfs_delayed_ref_root *delayed_refs;
 
delayed_refs = >transaction->delayed_refs;
-   assert_spin_locked(_refs->lock);
+   lockdep_assert_held(_refs->lock);
if (mutex_trylock(>mutex))
return 0;
 
@@ -239,7 +239,7 @@ static inline void drop_delayed_ref(struct 
btrfs_trans_handle *trans,
struct btrfs_delayed_ref_head *head,
struct btrfs_delayed_ref_node *ref)
 {
-   assert_spin_locked(>lock);
+   lockdep_assert_held(>lock);
rb_erase(>ref_node, >ref_tree);
RB_CLEAR_NODE(>ref_node);
if (!list_empty(>add_list))
@@ -307,7 +307,7 @@ void btrfs_merge_delayed_refs(struct btrfs_trans_handle 
*trans,
struct rb_node *node;
u64 seq = 0;
 
-   assert_spin_locked(>lock);
+   lockdep_assert_held(>lock);
 
if (RB_EMPTY_ROOT(>ref_tree))
return;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index aa259d6986e1..ad09845b618c 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1414,7 +1414,7 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info 
*fs_info,
struct btrfs_qgroup_extent_record *entry;
u64 bytenr = record->bytenr;
 
-   assert_spin_locked(_refs->lock);
+   lockdep_assert_held(_refs->lock);
trace_btrfs_qgroup_trace_extent(fs_info, record);
 
while (*p) {
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/6] Btrfs lockdep and locking cleanups

2018-03-27 Thread David Sterba
A few cleanups related to lockdep and locking. The btrfs_show_devname is
a functional change, uses the same mechanism as btrfs_ioctl_dev_info or
btrfs_ioctl_fs_info so that's not a new thing.

David Sterba (6):
  btrfs: use lockdep_assert_held for spinlocks
  btrfs: use lockdep_assert_held for mutexes
  btrfs: update barrier in should_cow_block
  btrfs: use RCU in btrfs_show_devname for device list traversal
  btrfs: remove stale comments about fs_mutex
  btrfs: split dev-replace locking helpers for read and write

 fs/btrfs/acl.c |  8 -
 fs/btrfs/ctree.c   |  4 +--
 fs/btrfs/delayed-ref.c |  6 ++--
 fs/btrfs/dev-replace.c | 98 +-
 fs/btrfs/dev-replace.h |  6 ++--
 fs/btrfs/extent-tree.c |  2 +-
 fs/btrfs/qgroup.c  |  2 +-
 fs/btrfs/reada.c   | 10 +++---
 fs/btrfs/scrub.c   | 18 +-
 fs/btrfs/super.c   | 15 +---
 fs/btrfs/volumes.c | 28 +++
 11 files changed, 98 insertions(+), 99 deletions(-)

-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/14] Qgroup metadata reservation rework

2018-03-27 Thread Filipe Manana
On Tue, Mar 27, 2018 at 4:23 PM, David Sterba  wrote:
> On Tue, Mar 27, 2018 at 07:49:02AM +0800, Qu Wenruo wrote:
>> > Merging plan for this patchset from last week was to postpone until
>> > 4.18 due to lack of final testing here. I've tried to run this with
>> > quotas enabled an fstests that led to warnings in the power failure
>> > simulation tests.
>>
>> Which test case?
>
> eg shared/298, the log is then flooded by this message
>
> [153085.406444] [ cut here ]
> [153085.411426] qgroup 5 data reserved space underflow, have 32661504 to free 
> 268435456
> [153085.411584] WARNING: CPU: 7 PID: 22475 at fs/btrfs/qgroup.c:102 
> qgroup_rsv_release+0x1fa/0x210 [btrfs]
> [153085.499574] CPU: 7 PID: 22475 Comm: fio Tainted: GW
> 4.16.0-rc6-1.ge195904-vanilla+ #192
> [153085.509186] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
> [153085.516008] RIP: 0010:qgroup_rsv_release+0x1fa/0x210 [btrfs]
> [153085.521899] RSP: 0018:b1de01bf7c18 EFLAGS: 00010296
> [153085.527367] RAX: 0047 RBX: 928ce4950c00 RCX: 
> 0002
> [153085.534799] RDX: 0007 RSI: 0001 RDI: 
> 0202
> [153085.542236] RBP:  R08:  R09: 
> 
> [153085.549670] R10:  R11: b1de01bf7aa0 R12: 
> 928ce4950c00
> [153085.557106] R13:  R14: 1000 R15: 
> 928c3bb54000
> [153085.564545] FS:  7f90afdc5740() GS:928ce740() 
> knlGS:
> [153085.572945] CS:  0010 DS:  ES:  CR0: 80050033
> [153085.578925] CR2: 0076daa8 CR3: 0001f86e CR4: 
> 06e0
> [153085.586367] Call Trace:
> [153085.589112]  btrfs_qgroup_free_refroot+0x174/0x2c0 [btrfs]
> [153085.594873]  __btrfs_run_delayed_refs+0xb9e/0x17e0 [btrfs]
> [153085.600671]  btrfs_run_delayed_refs+0x10d/0x1c0 [btrfs]
> [153085.606179]  btrfs_commit_transaction+0x30/0xad0 [btrfs]
> [153085.611782]  btrfs_sync_file+0x393/0x4f0 [btrfs]
> [153085.616652]  do_fsync+0x38/0x60
> [153085.620021]  SyS_fsync+0xc/0x10
> [153085.623402]  do_syscall_64+0x71/0x1a0
> [153085.627296]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> [153085.632590] RIP: 0033:0x7f90af0942cd
> [153085.636379] RSP: 002b:7ffcb59a4c20 EFLAGS: 0293 ORIG_RAX: 
> 004a
> [153085.644249] RAX: ffda RBX:  RCX: 
> 7f90af0942cd
> [153085.651675] RDX: 1000 RSI: 013e5960 RDI: 
> 0004
> [153085.659111] RBP:  R08: 0008 R09: 
> 9e37fffc0001
> [153085.666549] R10: 0f75e468 R11: 0293 R12: 
> 
> [153085.673985] R13: 0640 R14: 7f90968acfb8 R15: 
> 7f90adad4db0
> [153085.700848] ---[ end trace 48a650f3d217fbac ]---
>
>
> or generic/475

This one is totally unrelated with this patchset, as it happens
without it and without qgroups enabled.
It happens very often actually, it's just some proper cleanup missing
after some fatal error that lead to a transaction abort.

>
> [69595.290236] WARNING: CPU: 1 PID: 3659 at fs/btrfs/extent-tree.c:124 
> btrfs_put_block_group+0x52/0x60 [btrfs]
> [69595.300499] CPU: 1 PID: 3659 Comm: umount Tainted: GW
> 4.16.0-rc6-1.ge195904-vanilla+ #192
> [69595.300502] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
> [69595.300555] RIP: 0010:btrfs_put_block_group+0x52/0x60 [btrfs]
> [69595.300569] RSP: 0018:b1de02d67dd0 EFLAGS: 00010286
> [69595.393015] RAX: 0001 RBX: 928ca4b7dc00 RCX: 
> 0002
> [69595.393018] RDX: 928ca4b7ddd8 RSI: 0001 RDI: 
> 928ca4b7dc00
> [69595.393024] RBP: 928ca4b7dc00 R08: 928cd6c70e10 R09: 
> 
> [69595.420315] R10:  R11: 928cd6c70e10 R12: 
> 928c9c7640d0
> [69595.420318] R13: 928c9c764000 R14: 928c9c764120 R15: 
> 
> [69595.420322] FS:  7ff5e66a2840() GS:928ce680() 
> knlGS:
> [69595.420328] CS:  0010 DS:  ES:  CR0: 80050033
> [69595.449136] CR2: 7efdd6e6b594 CR3: 000211943000 CR4: 
> 06e0
> [69595.449140] Call Trace:
> [69595.449207]  btrfs_free_block_groups+0x164/0x420 [btrfs]
> [69595.464610]  close_ctree+0x127/0x2f0 [btrfs]
> [69595.469046]  generic_shutdown_super+0x69/0x110
> [69595.469054]  kill_anon_super+0xe/0x20
> [69595.469102]  btrfs_kill_super+0x13/0x100 [btrfs]
> [69595.469114]  deactivate_locked_super+0x39/0x70
> [69595.469122]  cleanup_mnt+0x3b/0x70
> [69595.469128]  task_work_run+0x80/0xb0
> [69595.469139]  exit_to_usermode_loop+0x71/0xa0
> [69595.469148]  do_syscall_64+0x157/0x1a0
> [69595.469158]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> [69595.469164] RIP: 0033:0x7ff5e5f99d37
> [69595.469167] RSP: 002b:7ffeb054adf8 EFLAGS: 0246 ORIG_RAX: 
> 00a6
> [69595.469172] RAX:  RBX: 

[PATCH] btrfs: lift errors from add_extent_changeset to the callers

2018-03-27 Thread David Sterba
The missing error handling in add_extent_changeset was hidden, so make
it at least visible in the callers.

Signed-off-by: David Sterba 
---
 fs/btrfs/extent_io.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f27bad003f8e..f141d0fd7a65 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -119,23 +119,22 @@ struct extent_page_data {
unsigned int sync_io:1;
 };
 
-static void add_extent_changeset(struct extent_state *state, unsigned bits,
+static int add_extent_changeset(struct extent_state *state, unsigned bits,
 struct extent_changeset *changeset,
 int set)
 {
int ret;
 
if (!changeset)
-   return;
+   return 0;
if (set && (state->state & bits) == bits)
-   return;
+   return 0;
if (!set && (state->state & bits) == 0)
-   return;
+   return 0;
changeset->bytes_changed += state->end - state->start + 1;
ret = ulist_add(>range_changed, state->start, state->end,
GFP_ATOMIC);
-   /* ENOMEM */
-   BUG_ON(ret < 0);
+   return ret;
 }
 
 static void flush_write_bio(struct extent_page_data *epd);
@@ -527,6 +526,7 @@ static struct extent_state *clear_state_bit(struct 
extent_io_tree *tree,
 {
struct extent_state *next;
unsigned bits_to_clear = *bits & ~EXTENT_CTLBITS;
+   int ret;
 
if ((bits_to_clear & EXTENT_DIRTY) && (state->state & EXTENT_DIRTY)) {
u64 range = state->end - state->start + 1;
@@ -534,7 +534,8 @@ static struct extent_state *clear_state_bit(struct 
extent_io_tree *tree,
tree->dirty_bytes -= range;
}
clear_state_cb(tree, state, bits);
-   add_extent_changeset(state, bits_to_clear, changeset, 0);
+   ret = add_extent_changeset(state, bits_to_clear, changeset, 0);
+   BUG_ON(ret);
state->state &= ~bits_to_clear;
if (wake)
wake_up(>wq);
@@ -805,13 +806,15 @@ static void set_state_bits(struct extent_io_tree *tree,
   unsigned *bits, struct extent_changeset *changeset)
 {
unsigned bits_to_set = *bits & ~EXTENT_CTLBITS;
+   int ret;
 
set_state_cb(tree, state, bits);
if ((bits_to_set & EXTENT_DIRTY) && !(state->state & EXTENT_DIRTY)) {
u64 range = state->end - state->start + 1;
tree->dirty_bytes += range;
}
-   add_extent_changeset(state, bits_to_set, changeset, 1);
+   ret = add_extent_changeset(state, bits_to_set, changeset, 1);
+   BUG_ON(ret);
state->state |= bits_to_set;
 }
 
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] btrfs fiemap related BUG fix.

2018-03-27 Thread David Sterba
On Wed, Mar 07, 2018 at 04:20:17PM +0800, robbieko wrote:
> From: Robbie Ko 
> 
> This patchset intends to fix btrfs fiemap related bug.
> 
> The fiemap has the following problems:
> 
> 1) Wrong extent count when fm_extent_count is zero.
> 
> 
> 2) SHARED bit is not correct
> I have two ideas, but I do not know which one is the best.
> 
> Like:
>  # dd if=/dev/zero bs=16K count=2 oflag=dsync of=/mnt/btrfs/file
>  # xfs_io -c "fiemap -v" /mnt/btrfs/file
>  /mnt/btrfs/file:
>  EXT: FILE-OFFSET  BLOCK-RANGE  TOTAL FLAGS
>0: [0..63]: 4241424..424148764   0x1
>  # cloner -s $((16*1024)) /mnt/btrfs/file /mnt/btrfs/file_clone
> 
> 1. When any extent is shared in extent map, the entire extent map is shared
>  # xfs_io -c "fiemap -v" /mnt/btrfs/file
>  /mnt/btrfs/file:
>  EXT: FILE-OFFSET  BLOCK-RANGE  TOTAL FLAGS
>0: [0..63]: 4241424..4241487   64  0x2001
> 
> 2. Split into different extent
>  # xfs_io -c "fiemap -v" /mnt/btrfs/file
>  /mnt/btrfs/file:
>  EXT: FILE-OFFSET  BLOCK-RANGE  TOTAL FLAGS
>0: [0..31]: 4241424..4241455   32  0x0
>1: [32..63]:4241456..4241487   32  0x2001
> 
> Robbie Ko (2):
>   Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero
>   Btrfs: fix fiemap extent SHARED flag error with range clone.

There were a lot of comments, I'm not sure I have a clear picture of
what and how is addressed. Can you please update the patches and resend?
Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/14] Qgroup metadata reservation rework

2018-03-27 Thread David Sterba
On Tue, Mar 27, 2018 at 07:49:02AM +0800, Qu Wenruo wrote:
> > Merging plan for this patchset from last week was to postpone until
> > 4.18 due to lack of final testing here. I've tried to run this with
> > quotas enabled an fstests that led to warnings in the power failure
> > simulation tests.
> 
> Which test case?

eg shared/298, the log is then flooded by this message 

[153085.406444] [ cut here ]
[153085.411426] qgroup 5 data reserved space underflow, have 32661504 to free 
268435456
[153085.411584] WARNING: CPU: 7 PID: 22475 at fs/btrfs/qgroup.c:102 
qgroup_rsv_release+0x1fa/0x210 [btrfs]
[153085.499574] CPU: 7 PID: 22475 Comm: fio Tainted: GW
4.16.0-rc6-1.ge195904-vanilla+ #192
[153085.509186] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
[153085.516008] RIP: 0010:qgroup_rsv_release+0x1fa/0x210 [btrfs]
[153085.521899] RSP: 0018:b1de01bf7c18 EFLAGS: 00010296
[153085.527367] RAX: 0047 RBX: 928ce4950c00 RCX: 
0002
[153085.534799] RDX: 0007 RSI: 0001 RDI: 
0202
[153085.542236] RBP:  R08:  R09: 

[153085.549670] R10:  R11: b1de01bf7aa0 R12: 
928ce4950c00
[153085.557106] R13:  R14: 1000 R15: 
928c3bb54000
[153085.564545] FS:  7f90afdc5740() GS:928ce740() 
knlGS:
[153085.572945] CS:  0010 DS:  ES:  CR0: 80050033
[153085.578925] CR2: 0076daa8 CR3: 0001f86e CR4: 
06e0
[153085.586367] Call Trace:
[153085.589112]  btrfs_qgroup_free_refroot+0x174/0x2c0 [btrfs]
[153085.594873]  __btrfs_run_delayed_refs+0xb9e/0x17e0 [btrfs]
[153085.600671]  btrfs_run_delayed_refs+0x10d/0x1c0 [btrfs]
[153085.606179]  btrfs_commit_transaction+0x30/0xad0 [btrfs]
[153085.611782]  btrfs_sync_file+0x393/0x4f0 [btrfs]
[153085.616652]  do_fsync+0x38/0x60
[153085.620021]  SyS_fsync+0xc/0x10
[153085.623402]  do_syscall_64+0x71/0x1a0
[153085.627296]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[153085.632590] RIP: 0033:0x7f90af0942cd
[153085.636379] RSP: 002b:7ffcb59a4c20 EFLAGS: 0293 ORIG_RAX: 
004a
[153085.644249] RAX: ffda RBX:  RCX: 
7f90af0942cd
[153085.651675] RDX: 1000 RSI: 013e5960 RDI: 
0004
[153085.659111] RBP:  R08: 0008 R09: 
9e37fffc0001
[153085.666549] R10: 0f75e468 R11: 0293 R12: 

[153085.673985] R13: 0640 R14: 7f90968acfb8 R15: 
7f90adad4db0
[153085.700848] ---[ end trace 48a650f3d217fbac ]---


or generic/475

[69595.290236] WARNING: CPU: 1 PID: 3659 at fs/btrfs/extent-tree.c:124 
btrfs_put_block_group+0x52/0x60 [btrfs]
[69595.300499] CPU: 1 PID: 3659 Comm: umount Tainted: GW
4.16.0-rc6-1.ge195904-vanilla+ #192
[69595.300502] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
[69595.300555] RIP: 0010:btrfs_put_block_group+0x52/0x60 [btrfs]
[69595.300569] RSP: 0018:b1de02d67dd0 EFLAGS: 00010286
[69595.393015] RAX: 0001 RBX: 928ca4b7dc00 RCX: 0002
[69595.393018] RDX: 928ca4b7ddd8 RSI: 0001 RDI: 928ca4b7dc00
[69595.393024] RBP: 928ca4b7dc00 R08: 928cd6c70e10 R09: 
[69595.420315] R10:  R11: 928cd6c70e10 R12: 928c9c7640d0
[69595.420318] R13: 928c9c764000 R14: 928c9c764120 R15: 
[69595.420322] FS:  7ff5e66a2840() GS:928ce680() 
knlGS:
[69595.420328] CS:  0010 DS:  ES:  CR0: 80050033
[69595.449136] CR2: 7efdd6e6b594 CR3: 000211943000 CR4: 06e0
[69595.449140] Call Trace:
[69595.449207]  btrfs_free_block_groups+0x164/0x420 [btrfs]
[69595.464610]  close_ctree+0x127/0x2f0 [btrfs]
[69595.469046]  generic_shutdown_super+0x69/0x110
[69595.469054]  kill_anon_super+0xe/0x20
[69595.469102]  btrfs_kill_super+0x13/0x100 [btrfs]
[69595.469114]  deactivate_locked_super+0x39/0x70
[69595.469122]  cleanup_mnt+0x3b/0x70
[69595.469128]  task_work_run+0x80/0xb0
[69595.469139]  exit_to_usermode_loop+0x71/0xa0
[69595.469148]  do_syscall_64+0x157/0x1a0
[69595.469158]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[69595.469164] RIP: 0033:0x7ff5e5f99d37
[69595.469167] RSP: 002b:7ffeb054adf8 EFLAGS: 0246 ORIG_RAX: 
00a6
[69595.469172] RAX:  RBX: 563360af5060 RCX: 7ff5e5f99d37
[69595.469175] RDX: 0001 RSI:  RDI: 563360af52c0
[69595.469178] RBP: 563360af52c0 R08: 7ff5e628b97d R09: 0005
[69595.469181] R10: 563360af5280 R11: 0246 R12: 7ff5e649a1a4
[69595.469183] R13:  R14:  R15: 
[69595.469339] ---[ end trace 48a650f3d217fba6 ]---

[69595.530603] WARNING: CPU: 1 PID: 3659 at fs/btrfs/extent-tree.c:9945 

[PATCH v2 2/2] btrfs: Validate child tree block's level and first key

2018-03-27 Thread Qu Wenruo
We have several reports about node pointer points to incorrect child
tree blocks, which could have even wrong owner and level but still with
valid generation and checksum.

Although btrfs check could handle it and print error message like:
leaf parent key incorrect 60670574592

Kernel doesn't have enough check on this type of corruption correctly.
At least add such check to read_tree_block() and btrfs_read_buffer(),
where we need two new parameters @level and @first_key to verify the
child tree block.

The new @level check is mandatory and all call sites are already
modified to extract expected level from its call chain.

While @first_key is optional, the following call sites are skipping such
check:
1) Root node/leaf
   As ROOT_ITEM doesn't contain the first key, skip @first_key check.
2) Direct backref
   Only parent bytenr and level is known and we need to resolve the key
   all by ourselves, skip @first_key check.

Another note of this verification is, it needs extra info from nodeptr
or ROOT_ITEM, so it can't fit into current tree-checker framework, which
is limited to node/leaf boundary.

Signed-off-by: Qu Wenruo 
---
changelog:
v2:
  Make @level check mandatory, suggesed by Jeff and Nikolay.
  Change parameter order as @level is now mandatory, put it in front of
  @first_key.
  Change verify_parent_level() to verify_key_level() to avoid confusion
  on the @level parameter.
  Add btrfs_error() output for CONFIG_BTRFS_DEBUG to help debugging.
---
 fs/btrfs/backref.c |  6 ++--
 fs/btrfs/ctree.c   | 28 +++
 fs/btrfs/disk-io.c | 95 +++---
 fs/btrfs/disk-io.h |  8 +++--
 fs/btrfs/extent-tree.c |  6 +++-
 fs/btrfs/print-tree.c  | 10 --
 fs/btrfs/qgroup.c  |  7 ++--
 fs/btrfs/ref-verify.c  |  7 +++-
 fs/btrfs/relocation.c  | 21 ---
 fs/btrfs/tree-log.c| 28 +--
 10 files changed, 170 insertions(+), 46 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 26484648d090..ec71c7e66cfa 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -738,7 +738,8 @@ static int add_missing_keys(struct btrfs_fs_info *fs_info,
BUG_ON(ref->key_for_search.type);
BUG_ON(!ref->wanted_disk_byte);
 
-   eb = read_tree_block(fs_info, ref->wanted_disk_byte, 0);
+   eb = read_tree_block(fs_info, ref->wanted_disk_byte, 0,
+ref->level - 1, NULL);
if (IS_ERR(eb)) {
free_pref(ref);
return PTR_ERR(eb);
@@ -1291,7 +1292,8 @@ static int find_parent_nodes(struct btrfs_trans_handle 
*trans,
ref->level == 0) {
struct extent_buffer *eb;
 
-   eb = read_tree_block(fs_info, ref->parent, 0);
+   eb = read_tree_block(fs_info, ref->parent, 0,
+ref->level, NULL);
if (IS_ERR(eb)) {
ret = PTR_ERR(eb);
goto out;
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index b88a79e69ddf..4f1aefccd34a 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1418,6 +1418,7 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
struct tree_mod_root *old_root = NULL;
u64 old_generation = 0;
u64 logical;
+   int level;
 
eb_root = btrfs_read_lock_root_node(root);
tm = __tree_mod_log_oldest_root(fs_info, eb_root, time_seq);
@@ -1428,15 +1429,17 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
old_root = >old_root;
old_generation = tm->generation;
logical = old_root->logical;
+   level = old_root->level;
} else {
logical = eb_root->start;
+   level = btrfs_header_level(eb_root);
}
 
tm = tree_mod_log_search(fs_info, logical, time_seq);
if (old_root && tm && tm->op != MOD_LOG_KEY_REMOVE_WHILE_FREEING) {
btrfs_tree_read_unlock(eb_root);
free_extent_buffer(eb_root);
-   old = read_tree_block(fs_info, logical, 0);
+   old = read_tree_block(fs_info, logical, 0, level, NULL);
if (WARN_ON(IS_ERR(old) || !extent_buffer_uptodate(old))) {
if (!IS_ERR(old))
free_extent_buffer(old);
@@ -1656,6 +1659,7 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
btrfs_set_lock_blocking(parent);
 
for (i = start_slot; i <= end_slot; i++) {
+   struct btrfs_key first_key;
int close = 1;
 
btrfs_node_key(parent, _key, i);
@@ -1665,6 +1669,7 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
progress_passed = 1;
blocknr = 

[PATCH 1/2] btrfs: tests/qgroup: Fix wrong tree backref level

2018-03-27 Thread Qu Wenruo
The extent tree of the test fs is like the following:

 BTRFS info (device (null)): leaf 16327509003777336587 total ptrs 1 free space 
3919
  item 0 key (4096 168 4096) itemoff 3944 itemsize 51
  extent refs 1 gen 1 flags 2
  tree block key (68719476736 0 0) level 1
   ^^^
  ref#0: tree block backref root 5

And it's using an empty tree for fs tree, so there is no way that its
level can be 1.

For REAL (created by mkfs) fs tree backref with no skinny metadata, the
result should look like:

 item 3 key (30408704 EXTENT_ITEM 4096) itemoff 3845 itemsize 51
 refs 1 gen 4 flags TREE_BLOCK
 tree block key (256 INODE_ITEM 0) level 0
   ^^^
 tree block backref root 5

Fix the level to 0, so it won't break later tree level checker.

Fixes: faa2dbf004e8 ("Btrfs: add sanity tests for new qgroup accounting code")
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/tests/qgroup-tests.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/tests/qgroup-tests.c b/fs/btrfs/tests/qgroup-tests.c
index 90204b166643..160eb2fba726 100644
--- a/fs/btrfs/tests/qgroup-tests.c
+++ b/fs/btrfs/tests/qgroup-tests.c
@@ -63,7 +63,7 @@ static int insert_normal_tree_ref(struct btrfs_root *root, 
u64 bytenr,
btrfs_set_extent_generation(leaf, item, 1);
btrfs_set_extent_flags(leaf, item, BTRFS_EXTENT_FLAG_TREE_BLOCK);
block_info = (struct btrfs_tree_block_info *)(item + 1);
-   btrfs_set_tree_block_level(leaf, block_info, 1);
+   btrfs_set_tree_block_level(leaf, block_info, 0);
iref = (struct btrfs_extent_inline_ref *)(block_info + 1);
if (parent > 0) {
btrfs_set_extent_inline_ref_type(leaf, iref,
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/3] btrfs: Allow rmdir(2) to delete a subvolume

2018-03-27 Thread Nikolay Borisov


On 26.03.2018 11:30, Misono Tomohiro wrote:
> This patch changes the behavior of rmdir(2) to allow it to delete
> an empty subvolume by default, unless it is not a default subvolume
> and send is not in progress.
> 
> New function btrfs_delete_subvolume() is almost equal to the second half
> of btrfs_ioctl_snap_destroy(). This function requires inode_lock for both
> @dir and inode of @dentry. For rmdir(2) it is already acquired in vfs
> layer before calling btrfs_rmdir().
> 
> Note that while a non-privileged user cannot delete a read-only subvolume
> by "btrfs subvolume delete" when user_subvol_rm_allowd mount option is
> enabled, rmdir(2) can delete an empty read-only subvolume.
> (However, rm -r cannot use for read-only subvolume containing files.)
> 
> Signed-off-by: Tomohiro Misono 
> ---
>  fs/btrfs/inode.c | 143 
> ++-
>  1 file changed, 142 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index db66fa4fede6..b778776eee8e 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4387,6 +4387,147 @@ noinline int may_destroy_subvol(struct btrfs_root 
> *root)
>   return ret;
>  }
>  
> +static int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry)
> +{
> + struct btrfs_fs_info *fs_info = btrfs_sb(dentry->d_sb);
> + struct btrfs_root *root = BTRFS_I(dir)->root;
> + struct inode *inode = d_inode(dentry);
> + struct btrfs_root *dest = BTRFS_I(inode)->root;
> + struct btrfs_trans_handle *trans;
> + struct btrfs_block_rsv block_rsv;
> + u64 root_flags;
> + u64 qgroup_reserved;
> + int ret;
> + int err;
> +
> + /*
> +  * Don't allow to delete a subvolume with send in progress. This is
> +  * inside the i_mutex so the error handling that has to drop the bit
> +  * again is not run concurrently.
> +  */
> + spin_lock(>root_item_lock);
> + root_flags = btrfs_root_flags(>root_item);
> + if (dest->send_in_progress == 0) {
> + btrfs_set_root_flags(>root_item,
> + root_flags | BTRFS_ROOT_SUBVOL_DEAD);
> + spin_unlock(>root_item_lock);
> + } else {
> + spin_unlock(>root_item_lock);
> + btrfs_warn(fs_info,
> +"Attempt to delete subvolume %llu during send",
> +dest->root_key.objectid);
> + err = -EPERM;
> + return err;
> + }
> +
> + down_write(_info->subvol_sem);
> +
> + err = may_destroy_subvol(dest);
> + if (err)
> + goto out_up_write;
> +
> + btrfs_init_block_rsv(_rsv, BTRFS_BLOCK_RSV_TEMP);
> + /*
> +  * One for dir inode, two for dir entries, two for root
> +  * ref/backref.
> +  */
> + err = btrfs_subvolume_reserve_metadata(root, _rsv,
> +5, _reserved, true);
> + if (err)
> + goto out_up_write;
> +
> + trans = btrfs_start_transaction(root, 0);
> + if (IS_ERR(trans)) {
> + err = PTR_ERR(trans);
> + goto out_release;
> + }
> + trans->block_rsv = _rsv;
> + trans->bytes_reserved = block_rsv.size;
> +
> + btrfs_record_snapshot_destroy(trans, BTRFS_I(dir));
> +
> + ret = btrfs_unlink_subvol(trans, root, dir,
> + dest->root_key.objectid,
> + dentry->d_name.name,
> + dentry->d_name.len);
> + if (ret) {
> + err = ret;
> + btrfs_abort_transaction(trans, ret);
> + goto out_end_trans;
> + }
> +
> + btrfs_record_root_in_trans(trans, dest);
> +
> + memset(>root_item.drop_progress, 0,
> + sizeof(dest->root_item.drop_progress));
> + dest->root_item.drop_level = 0;
> + btrfs_set_root_refs(>root_item, 0);
> +
> + if (!test_and_set_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, >state)) {
> + ret = btrfs_insert_orphan_item(trans,
> + fs_info->tree_root,
> + dest->root_key.objectid);
> + if (ret) {
> + btrfs_abort_transaction(trans, ret);
> + err = ret;
> + goto out_end_trans;
> + }
> + }
> +
> + ret = btrfs_uuid_tree_rem(trans, fs_info, dest->root_item.uuid,
> +   BTRFS_UUID_KEY_SUBVOL,
> +   dest->root_key.objectid);
> + if (ret && ret != -ENOENT) {
> + btrfs_abort_transaction(trans, ret);
> + err = ret;
> + goto out_end_trans;
> + }
> + if (!btrfs_is_empty_uuid(dest->root_item.received_uuid)) {
> + ret = btrfs_uuid_tree_rem(trans, fs_info,
> +   dest->root_item.received_uuid,
> +   

Re: [PATCH 7/8] btrfs: verify checksum for all devices in mount context

2018-03-27 Thread Nikolay Borisov


On 26.03.2018 11:27, Anand Jain wrote:
> During mount context, we aren't verifying the superblock checksum
> for all the devices, instead, we verify it only for the
> struct btrfs_fs_device::latest_bdev. This patch fixes it
> by moving the checksum verification code from the function open_ctree()
> into the function btrfs_read_dev_one_super().
> 
> By doing this now we are verifying the superblock checksum
> in the mount-context, device-replace and, device-delete context.

Which call chain provides the device-replace and device-delete contexts?
I think it is worth it documenting them in the changelog.

> 
> Signed-off-by: Anand Jain 

Reviewed-by: Nikolay Borisov 

> ---
>  fs/btrfs/disk-io.c | 35 +--
>  1 file changed, 17 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 6299ab18da5f..3cc50041c0b9 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2565,24 +2565,6 @@ int open_ctree(struct super_block *sb,
>   }
>  
>   /*
> -  * We want to check superblock checksum, the type is stored inside.
> -  * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
> -  */
> - err = btrfs_check_super_csum(bh->b_data);
> - if (err) {
> - if (err == -EINVAL)
> - pr_err("BTRFS error (device %pg): "\
> - "unsupported checksum algorithm",
> - fs_devices->latest_bdev);
> - else
> - pr_err("BTRFS error (device %pg): "\
> - "superblock checksum mismatch",
> - fs_devices->latest_bdev);
> - brelse(bh);
> - goto fail_alloc;
> - }
> -
> - /*
>* super_copy is zeroed at allocation time and we never touch the
>* following bytes up to INFO_SIZE, the checksum is calculated from
>* the whole block of INFO_SIZE
> @@ -3128,6 +3110,7 @@ int btrfs_read_dev_one_super(struct block_device *bdev, 
> int copy_num,
>   struct buffer_head *bh;
>   struct btrfs_super_block *super;
>   u64 bytenr;
> + int err;
>  
>   bytenr = btrfs_sb_offset(copy_num);
>   if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
> @@ -3148,6 +3131,22 @@ int btrfs_read_dev_one_super(struct block_device 
> *bdev, int copy_num,
>   return -EINVAL;
>   }
>  
> + /*
> +  * Check the superblock checksum, the type is stored inside.
> +  * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
> +  */
> + err = btrfs_check_super_csum(bh->b_data);
> + if (err) {
> + if (err == -EINVAL)
> + pr_err("BTRFS error (device %pg): unsupported checksum 
> algorithm",
> + bdev);
> + else if (err == -EUCLEAN)
> + pr_err("BTRFS error (device %pg): superblock checksum 
> mismatch",
> + bdev);
> + brelse(bh);
> + return err;
> + }
> +
>   *bh_ret = bh;
>   return 0;
>  }
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Corruption on Big Endian System

2018-03-27 Thread David Sterba
On Mon, Mar 26, 2018 at 10:00:04AM -0500, Ashu Tiwary wrote:
> It appears my system may have hit the issue reverted here (
> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg74621.html
> ) ( [PATCH] Revert "btrfs: use proper endianness accessors for
> super_copy" ); system is an IBM OpenPower 720 (Big Endian) running
> Fedora 27; kernel was at 4.15.9; attempting to reboot after updating
> system (including kernel to 4.15.10) caused system to not be able to
> boot:
> 
> =
>  Mounting /sysroot...
> [   34.644721] BTRFS warning (device dm-3): suspicious: generation <
> chunk_root_generation: 15959351903540740096 < 17261735521269317632
> [   34.644761] BTRFS info (device dm-3): disk space caching is enabled
> [   34.644771] BTRFS info (device dm-3): has skinny extents
> [   34.645925] BTRFS critical (device dm-3): unable to find logical
> 71472550772736 length 65536
> [   34.645941] BTRFS critical (device dm-3): unable to find logical
> 71472550772736 length 65536
> [   34.645952] BTRFS error (device dm-3): failed to read chunk root
> [   34.807156] BTRFS error (device dm-3): open_ctree failed
> [FAILED] Failed to mount /sysroot.
> =
> 
> It appears there is a manual method available to repair the corrupted
> superblock: can that be made available?

Yes, details below.

Quick check from your logs:

chunk_root_generation: 15959351903540740096 = 0xdd7b0200
should be: 162781 = 0x0x27bdd

tree root block pointer: 71472550772736 = 0x4101
should be: 21037056 = 0x141

The tool is available in the branch devel in my repos.  Setup repository
unless you already have one:

$ git clone git://github.com/kdave/btrfs-progs
$ cd btrfs-progs
$ git checkout devel
$ ./autogen.sh
$ ./configure

Build the tool:

$ make btrfs-sb-mod

Use it (replace the path with the real one) and save the output in case
something goes unexpectedly wrong:

device=/path/to/the/device
./btrfs-sb-mod $device root @0
./btrfs-sb-mod $device generation @0
./btrfs-sb-mod $device chunk_root @0
./btrfs-sb-mod $device chunk_root_generation @0
./btrfs-sb-mod $device cache_generation @0
./btrfs-sb-mod $device uuid_tree_generation @0

It prints the current and new values so it's reversible, besides that
the byteswap can be run twice to get back to the starting point.

Then the filesystem should be mountable again.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/8] btrfs: make btrfs_check_super_csum() non-static

2018-03-27 Thread Nikolay Borisov


On 26.03.2018 11:27, Anand Jain wrote:
> In preparation to add the superblock csum verification for the
> scan context, make btrfs_check_super_csum() non-static.
> 
> Signed-off-by: Anand Jain 

Reviewed-by: Nikolay Borisov 

> ---
>  fs/btrfs/disk-io.c | 2 +-
>  fs/btrfs/disk-io.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 8b4e602ed60a..6299ab18da5f 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -395,7 +395,7 @@ static int verify_parent_transid(struct extent_io_tree 
> *io_tree,
>   * Otherwise: -EINVAL  if csum type is not found
>   * -EUCLEAN if csum does not match
>   */
> -static int btrfs_check_super_csum(char *raw_disk_sb)
> +int btrfs_check_super_csum(char *raw_disk_sb)
>  {
>   struct btrfs_super_block *disk_sb =
>   (struct btrfs_super_block *)raw_disk_sb;
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index 70a88d61b547..c400cc68f913 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -69,6 +69,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
> max_mirrors);
>  struct buffer_head *btrfs_read_dev_super(struct block_device *bdev);
>  int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
>   struct buffer_head **bh_ret);
> +int btrfs_check_super_csum(char *raw_disk_sb);
>  int btrfs_commit_super(struct btrfs_fs_info *fs_info);
>  struct btrfs_root *btrfs_read_fs_root(struct btrfs_root *tree_root,
> struct btrfs_key *location);
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/8] btrfs: verify checksum when superblock is read for scan

2018-03-27 Thread Nikolay Borisov


On 27.03.2018 14:30, Nikolay Borisov wrote:
> 
> 
> On 26.03.2018 11:27, Anand Jain wrote:
>> During the scan context, we aren't verifying the superblock-
>> checksum when read.
>> This patch fixes it by adding the checksum verification function
>> btrfs_check_super_csum() in the function btrfs_read_disk_super().
>> And makes device scan to error fail if the primary superblock csum
>> is wrong, whereas if the copy-superblock csum is wrong it will just
>> just report missmatch and continue mount/scan as usual. When the
>> mount is successful We anyway overwrite all superblocks upon unmount.
>>
>> The context in which this will be called is - device scan, device ready,
>> and mount -o device option.
>>
>> Test script:
>>
>>  Corrupt primary superblock and check if device scan and mount
>>  fails:
>>  mkfs.btrfs -fq /dev/sdc
>>  dd if=/dev/urandom of=/dev/sdc ibs=1 obs=1 count=1 seek=64K
>>  btrfs dev scan
>>  mount /dev/sdb /btrfs
>>
>>  Corrupt secondary superblock and check if device scan and mount
>>  is succcessful, check for the dmesg for errors.
>>  mkfs.btrfs -fq /dev/sdc
>>  dd if=/dev/urandom of=/dev/sdc ibs=1 obs=1 count=1 seek=64K
>>  btrfs dev scan
>>  mount /dev/sdb /btrfs
> 
> Have you considered adding fstests, it will be very easy to test for
> this behavior?
> 
>>
>> Signed-off-by: Anand Jain 
>> ---
>>  fs/btrfs/volumes.c | 15 +++
>>  1 file changed, 15 insertions(+)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 45dd0674571b..384e503c83ff 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1149,6 +1149,7 @@ static int btrfs_read_disk_super(struct block_device 
>> *bdev, u64 bytenr,
>>   struct page **page,
>>   struct btrfs_super_block **disk_super)
>>  {
>> +int err;
>>  void *p;
>>  pgoff_t index;
>>  
>> @@ -1177,6 +1178,20 @@ static int btrfs_read_disk_super(struct block_device 
>> *bdev, u64 bytenr,
>>  /* align our pointer to the offset of the super block */
>>  *disk_super = p + (bytenr & ~PAGE_MASK);
>>  
>> +err = btrfs_check_super_csum((char *) *disk_super);
>> +if (err) {
> if (err < 0)
>> +if (err == -EINVAL)
>> +pr_err("BTRFS error (device %pg): "\
>> +"unsupported checksum type, bytenr=%llu",
>> +bdev, bytenr);
> 
> Same comment as before regarding string literals splitting across lines.
> 
>> +else
>> +pr_err("BTRFS error (device %pg): "\
>> +"superblock checksum failed, bytenr=%llu",
>> +bdev, bytenr);
>> +btrfs_release_disk_super(*page);
>> +return err;
>> +}

Also it will be better to have the checksum check after we have verified
some basic invariants - that bytenr and magic have sane values.

>> +
>>  if (btrfs_super_bytenr(*disk_super) != bytenr ||
>>  btrfs_super_magic(*disk_super) != BTRFS_MAGIC) {
>>  btrfs_release_disk_super(*page);
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/8] btrfs: cleanup btrfs_check_super_csum() for better code flow

2018-03-27 Thread Nikolay Borisov


On 26.03.2018 11:27, Anand Jain wrote:
> We check the %csum_type twice. Drop one. Check for the csum_type and
> then for the csum. Which also matches with the progs which have better
> logic. This is preparatory patch to get proper error code from
> btrfs_check_super_csum().
> 
> Signed-off-by: Anand Jain 
Reviewed-by: Nikolay Borisov 
> ---
>  fs/btrfs/disk-io.c | 38 +-
>  1 file changed, 17 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 1657d6aa4fa6..b9b435579527 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -399,32 +399,28 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
> *fs_info,
>   struct btrfs_super_block *disk_sb =
>   (struct btrfs_super_block *)raw_disk_sb;
>   u16 csum_type = btrfs_super_csum_type(disk_sb);
> - int ret = 0;
> -
> - if (csum_type == BTRFS_CSUM_TYPE_CRC32) {
> - u32 crc = ~(u32)0;
> - char result[sizeof(crc)];
> -
> - /*
> -  * The super_block structure does not span the whole
> -  * BTRFS_SUPER_INFO_SIZE range, we expect that the unused space
> -  * is filled with zeros and is included in the checksum.
> -  */
> - crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE,
> - crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
> - btrfs_csum_final(crc, result);
> -
> - if (memcmp(raw_disk_sb, result, sizeof(result)))
> - ret = 1;
> - }
> + u32 crc = ~(u32)0;
> + char result[sizeof(crc)];
>  
>   if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) {
>   btrfs_err(fs_info, "unsupported checksum algorithm %u",
> - csum_type);
> - ret = 1;
> +   csum_type);
> + return 1;
>   }
>  
> - return ret;
> + /*
> +  * The super_block structure does not span the whole
> +  * BTRFS_SUPER_INFO_SIZE range, we expect that the unused space
> +  * is filled with zeros and is included in the checksum.
> +  */
> + crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE,
> +   crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
> + btrfs_csum_final(crc, result);
> +
> + if (memcmp(raw_disk_sb, result, sizeof(result)))
> + return 1;
> +
> + return 0;
>  }
>  
>  /*
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/8] btrfs: verify checksum when superblock is read for scan

2018-03-27 Thread Nikolay Borisov


On 26.03.2018 11:27, Anand Jain wrote:
> During the scan context, we aren't verifying the superblock-
> checksum when read.
> This patch fixes it by adding the checksum verification function
> btrfs_check_super_csum() in the function btrfs_read_disk_super().
> And makes device scan to error fail if the primary superblock csum
> is wrong, whereas if the copy-superblock csum is wrong it will just
> just report missmatch and continue mount/scan as usual. When the
> mount is successful We anyway overwrite all superblocks upon unmount.
> 
> The context in which this will be called is - device scan, device ready,
> and mount -o device option.
> 
> Test script:
> 
>  Corrupt primary superblock and check if device scan and mount
>  fails:
>  mkfs.btrfs -fq /dev/sdc
>  dd if=/dev/urandom of=/dev/sdc ibs=1 obs=1 count=1 seek=64K
>  btrfs dev scan
>  mount /dev/sdb /btrfs
> 
>  Corrupt secondary superblock and check if device scan and mount
>  is succcessful, check for the dmesg for errors.
>  mkfs.btrfs -fq /dev/sdc
>  dd if=/dev/urandom of=/dev/sdc ibs=1 obs=1 count=1 seek=64K
>  btrfs dev scan
>  mount /dev/sdb /btrfs

Have you considered adding fstests, it will be very easy to test for
this behavior?

> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/volumes.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 45dd0674571b..384e503c83ff 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1149,6 +1149,7 @@ static int btrfs_read_disk_super(struct block_device 
> *bdev, u64 bytenr,
>struct page **page,
>struct btrfs_super_block **disk_super)
>  {
> + int err;
>   void *p;
>   pgoff_t index;
>  
> @@ -1177,6 +1178,20 @@ static int btrfs_read_disk_super(struct block_device 
> *bdev, u64 bytenr,
>   /* align our pointer to the offset of the super block */
>   *disk_super = p + (bytenr & ~PAGE_MASK);
>  
> + err = btrfs_check_super_csum((char *) *disk_super);
> + if (err) {
if (err < 0)
> + if (err == -EINVAL)
> + pr_err("BTRFS error (device %pg): "\
> + "unsupported checksum type, bytenr=%llu",
> + bdev, bytenr);

Same comment as before regarding string literals splitting across lines.

> + else
> + pr_err("BTRFS error (device %pg): "\
> + "superblock checksum failed, bytenr=%llu",
> + bdev, bytenr);
> + btrfs_release_disk_super(*page);
> + return err;
> + }
> +
>   if (btrfs_super_bytenr(*disk_super) != bytenr ||
>   btrfs_super_magic(*disk_super) != BTRFS_MAGIC) {
>   btrfs_release_disk_super(*page);
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] btrfs: check if the fsid in the primary sb and copy sb are same

2018-03-27 Thread Nikolay Borisov


On 26.03.2018 11:27, Anand Jain wrote:
> During the btrfs dev scan make sure that other copies of superblock
> contain the same fsid as the primary SB. So that we bring to the
> user notice if the superblock has been overwritten.
> 
>  mkfs.btrfs -fq /dev/sdc
>  mkfs.btrfs -fq /dev/sdb
>  dd if=/dev/sdb of=/dev/sdc count=4K skip=64K seek=64K obs=1 ibs=1
>  mount /dev/sdc /btrfs
> 
> Caveat: Pls note that older btrfs-progs do not wipe the non-overwriting
> stale superblock like copy2 if a smaller mkfs.btrfs -bis created.
> So this patch in the kernel will report error. The workaround is to wipe
> the superblock manually, like
>  dd if=/dev/zero of= seek=274877906944 ibs=1 obs=1 count4K
> OR apply btrfs-progs patch
>   btrfs-progs: wipe copies of the stale superblock beyond -b size
> which shall find and wipe the non overwriting superblock.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/volumes.c | 60 
> ++
>  1 file changed, 47 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index ed22f0a3d239..45dd0674571b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1198,40 +1198,74 @@ static int btrfs_read_disk_super(struct block_device 
> *bdev, u64 bytenr,
>  int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
> struct btrfs_fs_devices **fs_devices_ret)
>  {
> + struct btrfs_super_block *disk_super_primary;
>   struct btrfs_super_block *disk_super;
>   struct btrfs_device *device;
>   struct block_device *bdev;
>   struct page *page;
>   int ret = 0;
> - u64 bytenr;
> + int i;
>  
> - /*
> -  * we would like to check all the supers, but that would make
> -  * a btrfs mount succeed after a mkfs from a different FS.
> -  * So, we need to add a special mount option to scan for
> -  * later supers, using BTRFS_SUPER_MIRROR_MAX instead
> -  */
> - bytenr = btrfs_sb_offset(0);
>   flags |= FMODE_EXCL;
>  
>   bdev = blkdev_get_by_path(path, flags, holder);
>   if (IS_ERR(bdev))
>   return PTR_ERR(bdev);
>  
> - ret = btrfs_read_disk_super(bdev, bytenr, , _super);
> - if (ret)
> + disk_super_primary = kzalloc(sizeof(*disk_super_primary), GFP_KERNEL);
> + if (!disk_super_primary) {
> + ret = -ENOMEM;
>   goto error_bdev_put;
> + }
> +
> + /*
> +  * We would like to check all the supers and use one good copy,
> +  * but that would make a btrfs mount succeed after a mkfs from
> +  * a different FS.
> +  * So, we need to add a special mount option to scan for
> +  * later supers, using BTRFS_SUPER_MIRROR_MAX instead.
> +  * So, just validate if all copies of the superblocks are ok
> +  * and have the same fsid.
> +  */
> + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
> + u64 bytenr = btrfs_sb_offset(i);
> +
> + ret = btrfs_read_disk_super(bdev, bytenr, , _super);
> + if (ret) {
> + if (i == 0)
> + goto error_kfree;
> + /* copy2 is optional */
> + ret = 0;
> + continue;
> + }
> +
> + if (i == 0) {
> + memcpy(disk_super_primary, disk_super,
> +sizeof(*disk_super_primary));
> + btrfs_release_disk_super(page);
> + continue;

Doing the memcpy is enough here, the bottom of the loop already releases
the disk page and continues on the next iteration.

> + } else if (memcmp(disk_super_primary->fsid, disk_super->fsid,
> +BTRFS_FSID_SIZE)) {
> + pr_err("BTRFS (device %pg): superblock fsid missmatch "\
> + "primary %pU copy%d %pU", bdev,
> + disk_super_primary->fsid, i, disk_super->fsid);
> + ret = -EINVAL;
> + btrfs_release_disk_super(page);
> + goto error_kfree;
> + }
> + btrfs_release_disk_super(page);

I'd say split the "read first sb" from the loop, because alway want to
read it and return an error if it fails. And then have the loop begin at
i = 1 and handle only the possible mirrors of the sb. That would clean
up the nested 'if' in handling the ret.

Also you could introduce another struct *page primary_page where you
read the first super block. That way you save a memcpy + kzalloc but
you'd have to always free it on function exit so I am not sure how much
value it brings in terms of readability.

> + }
>  
>   mutex_lock(_mutex);
> - device = device_list_add(path, disk_super);
> + device = device_list_add(path, disk_super_primary);
>   if (IS_ERR(device))
>   ret = PTR_ERR(device);
>   else
>  

[PATCH] fstests: test btrfs fsync after hole punching with no-holes mode

2018-03-27 Thread fdmanana
From: Filipe Manana 

Test that when we have the no-holes mode enabled and a specific metadata
layout, if we punch a hole and fsync the file, at replay time the whole
hole was preserved.

This issue is fixed by the following btrfs patch for the linux kernel:

  "Btrfs: fix fsync after hole punching when using no-holes feature"

Signed-off-by: Filipe Manana 
---
 tests/btrfs/159 | 100 
 tests/btrfs/159.out |   5 +++
 tests/btrfs/group   |   1 +
 3 files changed, 106 insertions(+)
 create mode 100755 tests/btrfs/159
 create mode 100644 tests/btrfs/159.out

diff --git a/tests/btrfs/159 b/tests/btrfs/159
new file mode 100755
index ..6083975a
--- /dev/null
+++ b/tests/btrfs/159
@@ -0,0 +1,100 @@
+#! /bin/bash
+# FSQA Test No. 159
+#
+# Test that when we have the no-holes mode enabled and a specific metadata
+# layout, if we punch a hole and fsync the file, at replay time the whole
+# hole was preserved.
+#
+#---
+#
+# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   _cleanup_flakey
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/dmflakey
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_dm_target flakey
+_require_xfs_io_command "fpunch"
+
+rm -f $seqres.full
+
+# We create the filesystem with a node size of 64Kb because we need to create a
+# specific metadata layout in order to trigger the bug we are testing. At the
+# moment the node size can not be smaller then the system's page size, so given
+# that the largest possible page size is 64Kb and by default the node size is
+# set to the system's page size value, we explictly create a filesystem with a
+# 64Kb node size.
+_scratch_mkfs -O no-holes -n $((64 * 1024)) >>$seqres.full 2>&1
+_require_metadata_journaling $SCRATCH_DEV
+_init_flakey
+_mount_flakey
+
+# Create our test file with 831 extents of 256Kb each. Before each extent, 
there
+# is a 256Kb hole (except for the first extent, which starts at offset 0). This
+# creates two leafs in the filesystem tree, with the extent at offset 216530944
+# being the last item in the first leaf and the extent at offset 217055232 
being
+# the first item in the second leaf.
+for ((i = 0; i <= 831; i++)); do
+   offset=$((i * 2 * 256 * 1024))
+   $XFS_IO_PROG -f -c "pwrite -S 0xab -b 256K $offset 256K" \
+   $SCRATCH_MNT/foobar >/dev/null
+done
+
+# Now persist everything done so far.
+sync
+
+# Now punch a hole that covers part of the extent at offset 216530944.
+$XFS_IO_PROG -c "fpunch $((216530944 + 128 * 1024 - 4000)) 256K" \
+-c "fsync" \
+$SCRATCH_MNT/foobar
+
+echo "File digest before power failure:"
+md5sum $SCRATCH_MNT/foobar | _filter_scratch
+
+# Simulate a power failure and mount the filesystem to check that replaying the
+# fsync log/journal succeeds and our test file has the expected content.
+_flakey_drop_and_remount
+
+echo "File digest after power failure and log replay:"
+md5sum $SCRATCH_MNT/foobar | _filter_scratch
+
+_unmount_flakey
+_cleanup_flakey
+
+status=0
+exit
diff --git a/tests/btrfs/159.out b/tests/btrfs/159.out
new file mode 100644
index ..3317e516
--- /dev/null
+++ b/tests/btrfs/159.out
@@ -0,0 +1,5 @@
+QA output created by 159
+File digest before power failure:
+c5c0a13588a639529c979c57c336f441  SCRATCH_MNT/foobar
+File digest after power failure and log replay:
+c5c0a13588a639529c979c57c336f441  SCRATCH_MNT/foobar
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 8007e07e..ba766f6b 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -161,3 +161,4 @@
 156 auto quick trim
 157 auto quick raid
 158 auto quick raid scrub
+159 auto quick
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to 

[PATCH 2/2] Btrfs: fix copy_items() return value when logging an inode

2018-03-27 Thread fdmanana
From: Filipe Manana 

When logging an inode, at tree-log.c:copy_items(), if we call
btrfs_next_leaf() at the loop which checks for the need to log holes, we
need to make sure copy_items() returns the value 1 to its caller and
not 0 (on success). This is because the path the caller passed was
released and is now different from what is was before, and the caller
expects a return value of 0 to mean both success and that the path
has not changed, while a return value of 1 means both success and
signals the caller that it can not reuse the path, it has to perform
another tree search.

Even though this is a case that should not be triggered on normal
circumstances or very rare at least, its consequences can be very
unpredictable (especially when replaying a log tree).

Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole extents")
Signed-off-by: Filipe Manana 
---
 fs/btrfs/tree-log.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 1d738ff5c41b..abd9fd81cd1c 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3960,6 +3960,7 @@ static noinline int copy_items(struct btrfs_trans_handle 
*trans,
ASSERT(ret == 0);
src = src_path->nodes[0];
i = 0;
+   need_find_last_extent = true;
}
 
btrfs_item_key_to_cpu(src, , i);
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Btrfs: fix fsync after hole punching when using no-holes feature

2018-03-27 Thread fdmanana
From: Filipe Manana 

When we have the no-holes mode enabled and fsync a file after punching a
hole in it, we can end up not logging the whole hole range in the log tree.
This happens if the file has extent items that span more than one leaf and
we punch a hole that covers a range that starts in a leaf but does not go
beyond the offset of the first extent in the next leaf.

Example:

  $ mkfs.btrfs -f -O no-holes -n 65536 /dev/sdb
  $ mount /dev/sdb /mnt
  $ for ((i = 0; i <= 831; i++)); do
offset=$((i * 2 * 256 * 1024))
xfs_io -f -c "pwrite -S 0xab -b 256K $offset 256K" \
/mnt/foobar >/dev/null
done
  $ sync

  # We now have 2 leafs in our filesystem fs tree, the first leaf has an
  # item corresponding the extent at file offset 216530944 and the second
  # leaf has a first item corresponding to the extent at offset 217055232.
  # Now we punch a hole that partially covers the range of the extent at
  # offset 216530944 but does go beyond the offset 217055232.

  $ xfs_io -c "fpunch $((216530944 + 128 * 1024 - 4000)) 256K" /mnt/foobar
  $ xfs_io -c "fsync" /mnt/foobar

  

  # mount to replay the log
  $ mount /dev/sdb /mnt

  # Before this patch, only the subrange [216658016, 216662016[ (length of
  # 4000 bytes) was logged, leaving an incorrect file layout after log
  # replay.

Fix this by checking if there is a hole between the last extent item that
we processed and the first extent item in the next leaf, and if there is
one, log an explicit hole extent item.

Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole extents")
Signed-off-by: Filipe Manana 
---
 fs/btrfs/tree-log.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index fd573816f461..1d738ff5c41b 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3994,6 +3994,36 @@ static noinline int copy_items(struct btrfs_trans_handle 
*trans,
break;
*last_extent = extent_end;
}
+
+   /*
+* Check if there is a hole between the last extent found in our leaf
+* and the first extent in the next leaf. If there is one, we need to
+* log an explicit hole so that at replay time we can punch the hole.
+*/
+   if (ret == 0 &&
+   key.objectid == btrfs_ino(inode) &&
+   key.type == BTRFS_EXTENT_DATA_KEY &&
+   i == btrfs_header_nritems(src_path->nodes[0])) {
+   ret = btrfs_next_leaf(inode->root, src_path);
+   need_find_last_extent = true;
+   if (ret > 0) {
+   ret = 0;
+   } else if (ret == 0) {
+   btrfs_item_key_to_cpu(src_path->nodes[0], ,
+ src_path->slots[0]);
+   if (key.objectid == btrfs_ino(inode) &&
+   key.type == BTRFS_EXTENT_DATA_KEY &&
+   *last_extent < key.offset) {
+   const u64 len = key.offset - *last_extent;
+
+   ret = btrfs_insert_file_extent(trans, log,
+  btrfs_ino(inode),
+  *last_extent, 0,
+  0, len, 0, len,
+  0, 0, 0);
+   }
+   }
+   }
/*
 * Need to let the callers know we dropped the path so they should
 * re-search.
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] btrfs: cleanup btrfs_read_disk_super() to return std error

2018-03-27 Thread Nikolay Borisov


On 26.03.2018 11:27, Anand Jain wrote:
> The only caller btrfs_scan_one_device() sets -EINVAL for error from
> btrfs_read_disk_super(), so this patch returns -EINVAL from the latter
> function.
> 
> A preparatory patch to add csum check in btrfs_read_disk_super().
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/volumes.c | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 87d183accdb2..ed22f0a3d239 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1154,23 +1154,23 @@ static int btrfs_read_disk_super(struct block_device 
> *bdev, u64 bytenr,
>  
>   /* make sure our super fits in the device */
>   if (bytenr + PAGE_SIZE >= i_size_read(bdev->bd_inode))
> - return 1;
> + return -EINVAL;
>  
>   /* make sure our super fits in the page */
>   if (sizeof(**disk_super) > PAGE_SIZE)
> - return 1;
> + return -EINVAL;
>  
>   /* make sure our super doesn't straddle pages on disk */
>   index = bytenr >> PAGE_SHIFT;
>   if ((bytenr + sizeof(**disk_super) - 1) >> PAGE_SHIFT != index)
> - return 1;
> + return -EINVAL;
>  
>   /* pull in the page with our super */
>   *page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
>  index, GFP_KERNEL);
>  
>   if (IS_ERR_OR_NULL(*page))
> - return 1;
> + return -EINVAL;
>  
>   p = kmap(*page);
>  
> @@ -1180,7 +1180,7 @@ static int btrfs_read_disk_super(struct block_device 
> *bdev, u64 bytenr,
>   if (btrfs_super_bytenr(*disk_super) != bytenr ||
>   btrfs_super_magic(*disk_super) != BTRFS_MAGIC) {
>   btrfs_release_disk_super(*page);
> - return 1;
> + return -EINVAL;
>   }
>  
>   if ((*disk_super)->label[0] &&
> @@ -1218,10 +1218,9 @@ int btrfs_scan_one_device(const char *path, fmode_t 
> flags, void *holder,
>   if (IS_ERR(bdev))
>   return PTR_ERR(bdev);
>  
> - if (btrfs_read_disk_super(bdev, bytenr, , _super)) {
> - ret = -EINVAL;
> + ret = btrfs_read_disk_super(bdev, bytenr, , _super);
> + if (ret)

I'd rather have explicit ret < 0 check.

>   goto error_bdev_put;
> - }
>  
>   mutex_lock(_mutex);
>   device = device_list_add(path, disk_super);
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/8] btrfs: return required error from btrfs_check_super_csum

2018-03-27 Thread Nikolay Borisov


On 26.03.2018 11:27, Anand Jain wrote:
> Return the required -EINVAL and -EUCLEAN from the function
> btrfs_check_super_csum(). And more the error log into the
> parent function.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/disk-io.c | 27 ---
>  1 file changed, 16 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index b9b435579527..8b4e602ed60a 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -392,9 +392,10 @@ static int verify_parent_transid(struct extent_io_tree 
> *io_tree,
>  /*
>   * Return 0 if the superblock checksum type matches the checksum value of 
> that
>   * algorithm. Pass the raw disk superblock data.
> + * Otherwise: -EINVAL  if csum type is not found
> + * -EUCLEAN if csum does not match
>   */
> -static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info,
> -   char *raw_disk_sb)
> +static int btrfs_check_super_csum(char *raw_disk_sb)
>  {
>   struct btrfs_super_block *disk_sb =
>   (struct btrfs_super_block *)raw_disk_sb;
> @@ -402,11 +403,8 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
> *fs_info,
>   u32 crc = ~(u32)0;
>   char result[sizeof(crc)];
>  
> - if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) {
> - btrfs_err(fs_info, "unsupported checksum algorithm %u",
> -   csum_type);
> - return 1;
> - }
> + if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes))
> + return -EINVAL;
>  
>   /*
>* The super_block structure does not span the whole
> @@ -418,7 +416,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
> *fs_info,
>   btrfs_csum_final(crc, result);
>  
>   if (memcmp(raw_disk_sb, result, sizeof(result)))
> - return 1;
> + return -EUCLEAN;
>  
>   return 0;
>  }
> @@ -2570,9 +2568,16 @@ int open_ctree(struct super_block *sb,
>* We want to check superblock checksum, the type is stored inside.
>* Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
>*/
> - if (btrfs_check_super_csum(fs_info, bh->b_data)) {
> - btrfs_err(fs_info, "superblock checksum mismatch");
> - err = -EINVAL;
> + err = btrfs_check_super_csum(bh->b_data);
> + if (err) {
> + if (err == -EINVAL)
> + pr_err("BTRFS error (device %pg): "\
> + "unsupported checksum algorithm",
> + fs_devices->latest_bdev);

I don't think strings should be idented. I.e the correct thing here
should be to have only fs_devices->latest_bdev on a new line, even if
the string goes above the char limit of 80. In any case the \ is not
needed due to gcc's support for string literal concatenation:

pr_err("BTRFS error (device %pg): "
   "unsupported checksum algorithm",
   fs_devices->latest_bdev)

should work equally well. But as I said I don't think this is even
needed. Let's wait and see what David says.

> + else
> + pr_err("BTRFS error (device %pg): "\
> + "superblock checksum mismatch",
> + fs_devices->latest_bdev);
>   brelse(bh);
>   goto fail_alloc;
>   }
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] btrfs-progs: print-tree: Use macro to replace immediate number for readable flags string buffer length

2018-03-27 Thread Nikolay Borisov


On 27.03.2018 09:04, Qu Wenruo wrote:
> In print-tree, we have a lot of parsers to convert numeric flags to
> human readable string.
> 
> For the buffer size we're using immediate numbers for all their callers.
> Change this to macro so it will be much easier for us to expand the
> buffer size.
> 
> Signed-off-by: Qu Wenruo 

Reviewed-by: Nikolay Borisov 

> ---
> changelog:
> v2:
>   Move all buffer length macros to the header part.
>   Change comment to refer to the buffer length marco for later
>   modication.
> ---
>  print-tree.c | 50 +++---
>  1 file changed, 39 insertions(+), 11 deletions(-)
> 
> diff --git a/print-tree.c b/print-tree.c
> index a2f6bfc027c9..064f1bb9ff8c 100644
> --- a/print-tree.c
> +++ b/print-tree.c
> @@ -26,6 +26,12 @@
>  #include "print-tree.h"
>  #include "utils.h"
>  
> +#define BG_FLAGS_BUF_LEN 64  /* for bg_flags_to_str() */
> +#define QGROUP_FLAGS_BUF_LEN 32  /* for qgroup_flags_to_str() */
> +#define EXTENT_FLAGS_BUF_LEN 32  /* for extent_flags_to_str() */
> +#define ROOT_FLAGS_BUF_LEN   16  /* for root_flags_to_str() */
> +#define INODE_FLAGS_BUF_LEN  128 /* for inode_flags_to_str() */
> +#define HEADER_FLAGS_BUF_LEN 32  /* for header_flags_to_str() */
>  
>  static void print_dir_item_type(struct extent_buffer *eb,
>  struct btrfs_dir_item *di)
> @@ -137,7 +143,13 @@ static void print_inode_ref_item(struct extent_buffer 
> *eb, u32 size,
>   }
>  }
>  
> -/* Caller should ensure sizeof(*ret)>=21 "DATA|METADATA|RAID10" */
> +/*
> + * Caller should ensure sizeof(*ret)>=21 "DATA|METADATA|RAID10"
> + * Here we bump up to 64 bytes to handle the worst (and invalid) case:
> + * "DATA|METADATA|SYSTEM|RAID0|RAID1|DUP|RAID10|RAID5|RAID6"
> + *
> + * BG_FLAGS_BUF_LEN is ensured to have enough space.
> + */
>  static void bg_flags_to_str(u64 flags, char *ret)
>  {
>   int empty = 1;
> @@ -180,7 +192,11 @@ static void bg_flags_to_str(u64 flags, char *ret)
>   }
>  }
>  
> -/* Caller should ensure sizeof(*ret)>= 26 "OFF|SCANNING|INCONSISTENT" */
> +/*
> + * Caller should ensure sizeof(*ret)>= 26 "OFF|SCANNING|INCONSISTENT"
> + *
> + * QGROUP_FLAGS_BUF_LEN is ensured to have enough space.
> + */
>  static void qgroup_flags_to_str(u64 flags, char *ret)
>  {
>   if (flags & BTRFS_QGROUP_STATUS_FLAG_ON)
> @@ -199,7 +215,7 @@ void print_chunk_item(struct extent_buffer *eb, struct 
> btrfs_chunk *chunk)
>   u16 num_stripes = btrfs_chunk_num_stripes(eb, chunk);
>   int i;
>   u32 chunk_item_size;
> - char chunk_flags_str[32] = {0};
> + char chunk_flags_str[BG_FLAGS_BUF_LEN] = {0};
>  
>   /* The chunk must contain at least one stripe */
>   if (num_stripes < 1) {
> @@ -385,7 +401,11 @@ static void print_file_extent_item(struct extent_buffer 
> *eb,
>   compress_str);
>  }
>  
> -/* Caller should ensure sizeof(*ret) >= 16("DATA|TREE_BLOCK") */
> +/*
> + * Caller should ensure sizeof(*ret) > strlen("DATA|TREE_BLOCK|FULL_BACKREF")
> + *
> + * EXTENT_FLAGS_BUF_LEN is ensured to have enough space.
> + */
>  static void extent_flags_to_str(u64 flags, char *ret)
>  {
>   int empty = 1;
> @@ -420,7 +440,7 @@ void print_extent_item(struct extent_buffer *eb, int 
> slot, int metadata)
>   u32 item_size = btrfs_item_size_nr(eb, slot);
>   u64 flags;
>   u64 offset;
> - char flags_str[32] = {0};
> + char flags_str[EXTENT_FLAGS_BUF_LEN] = {0};
>  
>   if (item_size < sizeof(*ei)) {
>  #ifdef BTRFS_COMPAT_EXTENT_TREE_V0
> @@ -542,6 +562,8 @@ static int empty_uuid(const u8 *uuid)
>  
>  /*
>   * Caller must ensure sizeof(*ret) >= 7 "RDONLY"
> + *
> + * ROOT_FLAGS_BUF_LEN is ensured to have enough space
>   */
>  static void root_flags_to_str(u64 flags, char *ret)
>  {
> @@ -577,7 +599,7 @@ static void print_root_item(struct extent_buffer *leaf, 
> int slot)
>   struct btrfs_root_item root_item;
>   int len;
>   char uuid_str[BTRFS_UUID_UNPARSED_SIZE];
> - char flags_str[32] = {0};
> + char flags_str[ROOT_FLAGS_BUF_LEN] = {0};
>   struct btrfs_key drop_key;
>  
>   ri = btrfs_item_ptr(leaf, slot, struct btrfs_root_item);
> @@ -887,6 +909,8 @@ static void print_uuid_item(struct extent_buffer *l, 
> unsigned long offset,
>  /*
>   * Caller should ensure sizeof(*ret) >= 102: all charactors plus '|' of
>   * BTRFS_INODE_* flags
> + *
> + * INODE_FLAGS_BUF_LEN is ensured to have enough space
>   */
>  static void inode_flags_to_str(u64 flags, char *ret)
>  {
> @@ -911,7 +935,7 @@ static void inode_flags_to_str(u64 flags, char *ret)
>  static void print_inode_item(struct extent_buffer *eb,
>   struct btrfs_inode_item *ii)
>  {
> - char flags_str[256];
> + char flags_str[INODE_FLAGS_BUF_LEN];
>  
>   memset(flags_str, 0, sizeof(flags_str));
>   inode_flags_to_str(btrfs_inode_flags(eb, ii), flags_str);
> @@ -1002,7 +1026,7 @@ 

[PATCH v2 1/4] btrfs-progs: Remove btrfs-debug-tree command

2018-03-27 Thread Nikolay Borisov
There is already a replacement in the face of btrfs inspect-internal
dump-tree. And this command is just a simple wrapper around it. Just
remove it and adjust the show-blocks script to call the main btrfs
binary to achieve the same effect.

Signed-off-by: Nikolay Borisov 
---

V2 (analogical changes to other patches so changelog is present only in the 
first patch): 
 * Remove tool reference from .gitignore and Documentation/Makefile.in
 * Document that the standalone tools are removed in the STANDALONE section in 
 the documentation.

 .gitignore   |  1 -
 Documentation/Makefile.in|  1 -
 Documentation/btrfs.asciidoc |  3 ++-
 Makefile |  4 +---
 btrfs-debug-tree.c   | 41 -
 show-blocks  |  2 +-
 6 files changed, 4 insertions(+), 48 deletions(-)
 delete mode 100644 btrfs-debug-tree.c

diff --git a/.gitignore b/.gitignore
index 4ec04cb6dcf6..237a45182e8e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -10,7 +10,6 @@ Documentation/*.gz
 Documentation/*.html
 btrfs
 btrfs.static
-btrfs-debug-tree
 btrfs-map-logical
 btrfs-fragments
 btrfsck
diff --git a/Documentation/Makefile.in b/Documentation/Makefile.in
index 64947afb3f6f..0a90f803c020 100644
--- a/Documentation/Makefile.in
+++ b/Documentation/Makefile.in
@@ -103,7 +103,6 @@ install-man: man
$(INSTALL) -m 644 $(GZ_MAN8) $(DESTDIR)$(man8dir)
$(LN_S) -f btrfs-check.8.gz $(DESTDIR)$(man8dir)/btrfsck.8.gz
$(LN_S) -f btrfs-rescue.8.gz $(DESTDIR)$(man8dir)/btrfs-zero-log.8.gz
-   $(LN_S) -f btrfs-inspect-internal.8.gz 
$(DESTDIR)$(man8dir)/btrfs-debug-tree.8.gz
$(LN_S) -f btrfs-inspect-internal.8.gz 
$(DESTDIR)$(man8dir)/btrfs-show-super.8.gz
 
 uninstall:
diff --git a/Documentation/btrfs.asciidoc b/Documentation/btrfs.asciidoc
index 8253023444b6..5b328cd18dbf 100644
--- a/Documentation/btrfs.asciidoc
+++ b/Documentation/btrfs.asciidoc
@@ -120,7 +120,8 @@ long (years) depreciation period.
 
 Deprecated and obsolete tools:
 
-*btrfs-debug-tree*:: moved to *btrfs inspect-internal dump-tree*
+*btrfs-debug-tree*:: moved to *btrfs inspect-internal dump-tree*. Removed from 
+source distribution. 
 *btrfs-show-super*:: moved to *btrfs inspect-internal dump-super*
 *btrfs-zero-log*:: moved to *btrfs rescue zero-log*
 
diff --git a/Makefile b/Makefile
index 6065522a615f..a7726e98947d 100644
--- a/Makefile
+++ b/Makefile
@@ -210,8 +210,7 @@ MAKEOPTS = --no-print-directory Q=$(Q)
 progs = $(progs_install) btrfsck btrfs-corrupt-block
 
 # install only selected
-progs_install = btrfs mkfs.btrfs btrfs-debug-tree \
-   btrfs-map-logical btrfs-image btrfs-zero-log \
+progs_install = btrfs mkfs.btrfs btrfs-map-logical btrfs-image btrfs-zero-log \
btrfs-find-root btrfstune \
btrfs-select-super
 
@@ -229,7 +228,6 @@ endif
 btrfs_convert_cflags = -DBTRFSCONVERT_EXT2=$(BTRFSCONVERT_EXT2)
 btrfs_convert_cflags += -DBTRFSCONVERT_REISERFS=$(BTRFSCONVERT_REISERFS)
 btrfs_fragments_libs = -lgd -lpng -ljpeg -lfreetype
-btrfs_debug_tree_objects = cmds-inspect-dump-tree.o
 btrfs_show_super_objects = cmds-inspect-dump-super.o
 btrfs_calc_size_objects = cmds-inspect-tree-stats.o
 cmds_restore_cflags = -DBTRFSRESTORE_ZSTD=$(BTRFSRESTORE_ZSTD)
diff --git a/btrfs-debug-tree.c b/btrfs-debug-tree.c
deleted file mode 100644
index 7bee018f604e..
--- a/btrfs-debug-tree.c
+++ /dev/null
@@ -1,41 +0,0 @@
-/*
- * Copyright (C) 2007 Oracle.  All rights reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public
- * License v2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public
- * License along with this program; if not, write to the
- * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
- * Boston, MA 021110-1307, USA.
- */
-
-#include "disk-io.h"
-#include "volumes.h"
-#include "utils.h"
-#include "commands.h"
-#include "help.h"
-
-int main(int argc, char **argv)
-{
-   int ret;
-
-   set_argv0(argv);
-
-   if (argc > 1 && !strcmp(argv[1], "--help"))
-   usage(cmd_inspect_dump_tree_usage);
-
-   radix_tree_init();
-
-   ret = cmd_inspect_dump_tree(argc, argv);
-
-   btrfs_close_all_devices();
-
-   return ret;
-}
diff --git a/show-blocks b/show-blocks
index 49e1770e5bd0..3876b2acfa74 100755
--- a/show-blocks
+++ b/show-blocks
@@ -108,7 +108,7 @@ from pylab import *
 return X
 
 def run_debug_tree(device):
-p = os.popen('btrfs-debug-tree -e ' + device)
+p = os.popen('btrfs inspect-internal dump-tree -e ' + device)
 data = loaddata(p)
 return data
 
-- 
2.7.4

--
To 

[PATCH v2 3/4] btrfs-progs: Remove deprecated btrfs-show-super

2018-03-27 Thread Nikolay Borisov
Its function has been superseded by btrfs inspect-internal show-super.
Furthermore the tools is currently not built by default. Just remove
it.

Signed-off-by: Nikolay Borisov 
---
 .gitignore|  1 -
 Documentation/Makefile.in |  1 -
 Documentation/btrfs-select-super.asciidoc |  3 +--
 Documentation/btrfs.asciidoc  |  3 ++-
 Makefile  |  2 +-
 btrfs-show-super.c| 39 ---
 6 files changed, 4 insertions(+), 45 deletions(-)
 delete mode 100644 btrfs-show-super.c

diff --git a/.gitignore b/.gitignore
index c0f85fc448c8..06db51f39512 100644
--- a/.gitignore
+++ b/.gitignore
@@ -27,7 +27,6 @@ btrfs-convert
 btrfs-find-root
 btrfs-find-root.static
 btrfs-image
-btrfs-show-super
 btrfs-corrupt-block
 btrfs-select-super
 btrfs-calc-size
diff --git a/Documentation/Makefile.in b/Documentation/Makefile.in
index ee32a50dc068..184647c41940 100644
--- a/Documentation/Makefile.in
+++ b/Documentation/Makefile.in
@@ -102,7 +102,6 @@ install-man: man
$(MV) $(DESTDIR)$(man5dir)/btrfs-man5.5.gz 
$(DESTDIR)$(man5dir)/btrfs.5.gz
$(INSTALL) -m 644 $(GZ_MAN8) $(DESTDIR)$(man8dir)
$(LN_S) -f btrfs-check.8.gz $(DESTDIR)$(man8dir)/btrfsck.8.gz
-   $(LN_S) -f btrfs-inspect-internal.8.gz 
$(DESTDIR)$(man8dir)/btrfs-show-super.8.gz
 
 uninstall:
cd $(DESTDIR)$(man8dir); rm -f btrfs-check.8.gz $(GZ_MAN8)
diff --git a/Documentation/btrfs-select-super.asciidoc 
b/Documentation/btrfs-select-super.asciidoc
index 6e94a034f482..e3bca98b4f6e 100644
--- a/Documentation/btrfs-select-super.asciidoc
+++ b/Documentation/btrfs-select-super.asciidoc
@@ -21,8 +21,7 @@ The filesystem specified by 'device' must not be mounted.
 NOTE: *Prior to overwriting the primary superblock, please make sure that the 
backup
 copies are valid!*
 
-To dump a superblock use the *btrfs inspect-internal
-dump-super* command, or the obsolete command *btrfs-show-super*.
+To dump a superblock use the *btrfs inspect-internal dump-super* command.
 
 Then run the check (in the non-repair mode) using the command *btrfs check -s*
 where '-s' specifies the superblock copy to use.
diff --git a/Documentation/btrfs.asciidoc b/Documentation/btrfs.asciidoc
index e89400a2e5b5..0c08950b38fd 100644
--- a/Documentation/btrfs.asciidoc
+++ b/Documentation/btrfs.asciidoc
@@ -122,7 +122,8 @@ long (years) depreciation period.
 
 *btrfs-debug-tree*:: moved to *btrfs inspect-internal dump-tree*. Removed from 
 source distribution. 
-*btrfs-show-super*:: moved to *btrfs inspect-internal dump-super*
+*btrfs-show-super*:: moved to *btrfs inspect-internal dump-super*, standalone
+removed.
 *btrfs-zero-log*:: moved to *btrfs rescue zero-log*, standalone removed.
 
 EXIT STATUS
diff --git a/Makefile b/Makefile
index 2a2a76855a26..087289b756ef 100644
--- a/Makefile
+++ b/Makefile
@@ -215,7 +215,7 @@ progs_install = btrfs mkfs.btrfs btrfs-map-logical 
btrfs-image \
btrfs-select-super
 
 # other tools, not built by default
-progs_extra = btrfs-fragments btrfs-calc-size btrfs-show-super
+progs_extra = btrfs-fragments btrfs-calc-size
 
 progs_static = $(foreach p,$(progs),$(p).static)
 
diff --git a/btrfs-show-super.c b/btrfs-show-super.c
deleted file mode 100644
index 4273e42dca3d..
--- a/btrfs-show-super.c
+++ /dev/null
@@ -1,39 +0,0 @@
-/*
- * Copyright (C) 2012 STRATO AG.  All rights reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public
- * License v2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public
- * License along with this program; if not, write to the
- * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
- * Boston, MA 021110-1307, USA.
- */
-
-#include "utils.h"
-#include "commands.h"
-#include "help.h"
-
-int main(int argc, char **argv)
-{
-
-   int ret;
-
-   set_argv0(argv);
-
-   warning(
-"\nthe tool has been deprecated, please use 'btrfs inspect-internal 
dump-super' instead\n");
-
-   if (argc > 1 && !strcmp(argv[1], "--help"))
-   usage(cmd_inspect_dump_super_usage);
-
-   ret = cmd_inspect_dump_super(argc, argv);
-
-   return ret;
-}
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/4] btrfs-progs: Remove deprecated btrfs-calc-size tool

2018-03-27 Thread Nikolay Borisov
Its function has been superseded by btrfs inspect-internal tree-stats.
Just remove it.

Signed-off-by: Nikolay Borisov 
---
 .gitignore|  1 -
 Makefile  |  2 +-
 btrfs-calc-size.c | 39 ---
 3 files changed, 1 insertion(+), 41 deletions(-)
 delete mode 100644 btrfs-calc-size.c

diff --git a/.gitignore b/.gitignore
index 06db51f39512..9e4b7520e709 100644
--- a/.gitignore
+++ b/.gitignore
@@ -29,7 +29,6 @@ btrfs-find-root.static
 btrfs-image
 btrfs-corrupt-block
 btrfs-select-super
-btrfs-calc-size
 btrfs-crc
 btrfstune
 mktables
diff --git a/Makefile b/Makefile
index 087289b756ef..051739b8cbc9 100644
--- a/Makefile
+++ b/Makefile
@@ -215,7 +215,7 @@ progs_install = btrfs mkfs.btrfs btrfs-map-logical 
btrfs-image \
btrfs-select-super
 
 # other tools, not built by default
-progs_extra = btrfs-fragments btrfs-calc-size
+progs_extra = btrfs-fragments
 
 progs_static = $(foreach p,$(progs),$(p).static)
 
diff --git a/btrfs-calc-size.c b/btrfs-calc-size.c
deleted file mode 100644
index d2d68ab259eb..
--- a/btrfs-calc-size.c
+++ /dev/null
@@ -1,39 +0,0 @@
-/*
- * Copyright (C) 2011 Red Hat.  All rights reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public
- * License v2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public
- * License along with this program; if not, write to the
- * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
- * Boston, MA 021110-1307, USA.
- */
-
-#include "volumes.h"
-#include "utils.h"
-#include "commands.h"
-#include "help.h"
-
-int main(int argc, char **argv)
-{
-   int ret;
-
-   warning(
-"\nthe tool has been deprecated, please use 'btrfs inspect-internal 
tree-stats' instead\n");
-
-   if (argc > 1 && !strcmp(argv[1], "--help"))
-   usage(cmd_inspect_tree_stats_usage);
-
-   ret = cmd_inspect_tree_stats(argc, argv);
-
-   btrfs_close_all_devices();
-
-   return ret;
-}
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/4] btrfs-progs: Remove deprecated btrfs-zero-log standalone tool

2018-03-27 Thread Nikolay Borisov
Its function has been subsumed by "btrfs rescue zero-log". Remove its
source file and adjust make/tests soruces accordingly.

Signed-off-by: Nikolay Borisov 
---
 .gitignore|  1 -
 Documentation/Makefile.in |  1 -
 Documentation/btrfs.asciidoc  |  2 +-
 Makefile  |  4 +-
 btrfs-zero-log.c  | 78 ---
 tests/misc-tests.sh   |  1 -
 tests/misc-tests/003-zero-log/test.sh |  6 +--
 7 files changed, 4 insertions(+), 89 deletions(-)
 delete mode 100644 btrfs-zero-log.c

diff --git a/.gitignore b/.gitignore
index 237a45182e8e..c0f85fc448c8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -28,7 +28,6 @@ btrfs-find-root
 btrfs-find-root.static
 btrfs-image
 btrfs-show-super
-btrfs-zero-log
 btrfs-corrupt-block
 btrfs-select-super
 btrfs-calc-size
diff --git a/Documentation/Makefile.in b/Documentation/Makefile.in
index 0a90f803c020..ee32a50dc068 100644
--- a/Documentation/Makefile.in
+++ b/Documentation/Makefile.in
@@ -102,7 +102,6 @@ install-man: man
$(MV) $(DESTDIR)$(man5dir)/btrfs-man5.5.gz 
$(DESTDIR)$(man5dir)/btrfs.5.gz
$(INSTALL) -m 644 $(GZ_MAN8) $(DESTDIR)$(man8dir)
$(LN_S) -f btrfs-check.8.gz $(DESTDIR)$(man8dir)/btrfsck.8.gz
-   $(LN_S) -f btrfs-rescue.8.gz $(DESTDIR)$(man8dir)/btrfs-zero-log.8.gz
$(LN_S) -f btrfs-inspect-internal.8.gz 
$(DESTDIR)$(man8dir)/btrfs-show-super.8.gz
 
 uninstall:
diff --git a/Documentation/btrfs.asciidoc b/Documentation/btrfs.asciidoc
index 5b328cd18dbf..e89400a2e5b5 100644
--- a/Documentation/btrfs.asciidoc
+++ b/Documentation/btrfs.asciidoc
@@ -123,7 +123,7 @@ long (years) depreciation period.
 *btrfs-debug-tree*:: moved to *btrfs inspect-internal dump-tree*. Removed from 
 source distribution. 
 *btrfs-show-super*:: moved to *btrfs inspect-internal dump-super*
-*btrfs-zero-log*:: moved to *btrfs rescue zero-log*
+*btrfs-zero-log*:: moved to *btrfs rescue zero-log*, standalone removed.
 
 EXIT STATUS
 ---
diff --git a/Makefile b/Makefile
index a7726e98947d..2a2a76855a26 100644
--- a/Makefile
+++ b/Makefile
@@ -210,7 +210,7 @@ MAKEOPTS = --no-print-directory Q=$(Q)
 progs = $(progs_install) btrfsck btrfs-corrupt-block
 
 # install only selected
-progs_install = btrfs mkfs.btrfs btrfs-map-logical btrfs-image btrfs-zero-log \
+progs_install = btrfs mkfs.btrfs btrfs-map-logical btrfs-image \
btrfs-find-root btrfstune \
btrfs-select-super
 
@@ -322,7 +322,7 @@ test-fsck: btrfs btrfs-image btrfs-corrupt-block mkfs.btrfs 
btrfstune
$(Q)bash tests/fsck-tests.sh
 
 test-misc: btrfs btrfs-image btrfs-corrupt-block mkfs.btrfs btrfstune fssum \
-   btrfs-zero-log btrfs-find-root btrfs-select-super btrfs-convert
+   btrfs-find-root btrfs-select-super btrfs-convert
@echo "[TEST]   misc-tests.sh"
$(Q)bash tests/misc-tests.sh
 
diff --git a/btrfs-zero-log.c b/btrfs-zero-log.c
deleted file mode 100644
index 2fce59e98d6e..
--- a/btrfs-zero-log.c
+++ /dev/null
@@ -1,78 +0,0 @@
-/*
- * Copyright (C) 2007 Oracle.  All rights reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public
- * License v2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public
- * License along with this program; if not, write to the
- * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
- * Boston, MA 021110-1307, USA.
- */
-
-#include "kerncompat.h"
-
-#include 
-#include 
-#include "ctree.h"
-#include "disk-io.h"
-#include "transaction.h"
-#include "utils.h"
-#include "help.h"
-
-__attribute__((noreturn)) static void print_usage(void)
-{
-   printf("usage: btrfs-zero-log dev\n");
-   exit(1);
-}
-
-int main(int argc, char **argv)
-{
-   struct btrfs_root *root;
-   struct btrfs_trans_handle *trans;
-   struct btrfs_super_block *sb;
-   int ret;
-
-   set_argv0(argv);
-   if (check_argc_exact(argc - optind, 1))
-   print_usage();
-
-   radix_tree_init();
-
-   printf("WARNING: this utility is deprecated, please use 'btrfs rescue 
zero-log'\n\n");
-
-   if ((ret = check_mounted(argv[optind])) < 0) {
-   fprintf(stderr, "ERROR: could not check mount status: %s\n", 
strerror(-ret));
-   goto out;
-   } else if (ret) {
-   fprintf(stderr, "ERROR: %s is currently mounted\n", 
argv[optind]);
-   ret = -EBUSY;
-   goto out;
-   }
-
-   root = open_ctree(argv[optind], 0, OPEN_CTREE_WRITES | 
OPEN_CTREE_PARTIAL);
-   if (!root) {
- 

Re: [PATCH 00/10] Simplify function interfaces

2018-03-27 Thread Su Yue



On 03/27/2018 03:19 PM, Nikolay Borisov wrote:

A bunch of functions in lowmem mode take an 'ext_ref' parameter only to pass it
down the call chain where it eventually is consumed. Turns out the functions
which actually check the parameter are find_inode_ref and check_inode_item, the
are only passing it down to them. At the same time those functions can get a
reference to fs_info and do the check in them at the appropriate time.

This patchset cleanups the interface of various function by dropping the
ext_ref argument and moving the actual query of the EXTREF feature closer to
where it's being used.

The final patch just changes signature of __btrfs_fs_incompat to better match
the logic of the code and be identical to its kernel counterpart. All in all
this series doesn't introduce any functional changes per-se.

Nikolay Borisov (10):
   btrfs-progs: Drop ext_ref parameter from find_inode_ref
   btrfs-progs: Drop ext_ref param from check_dir_item
   btrfs-progs: Drop ext_ref argument from check_inode_item
   btrfs-progs: Drop unused ext_ref parameter from process_one_leaf
   btrfs-progs: Remove ext_ref param from check_fs_first_inode
   btrfs-progs: Remove ext_ref param from walk_down_tree
   btrfs-progs: Drop ext_ref param from check_fs_first_inode
   btrfs-progs: Drop ext_ref arument from check_fs_root
   btrfs-progs: Remove ext_ref local variable from check_fs_roots_lowmem
   btrfs-progs: Make __btrfs_fs_incompat return bool



Nice work.
For all,
Reviewed-by: Su Yue 


  check/mode-lowmem.c | 63 +++--
  ctree.h |  2 +-
  2 files changed, 28 insertions(+), 37 deletions(-)




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/10] btrfs-progs: Drop ext_ref param from check_dir_item

2018-03-27 Thread Nikolay Borisov
This parameter is no longer used in this function, so drop it

Signed-off-by: Nikolay Borisov 
---
 check/mode-lowmem.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index a8a2f76549a2..2c7b270a3ffa 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -1188,14 +1188,12 @@ static void print_dir_item_err(struct btrfs_root *root, 
struct btrfs_key *key,
  * @key:   the key of the INODE_REF/INODE_EXTREF
  * @path:   the path
  * @size:  the st_size of the INODE_ITEM
- * @ext_ref:   the EXTENDED_IREF feature
  *
  * Return 0 if no error occurred.
  * Return DIR_COUNT_AGAIN if the isize of the inode should be recalculated.
  */
 static int check_dir_item(struct btrfs_root *root, struct btrfs_key *di_key,
- struct btrfs_path *path, u64 *size,
- unsigned int ext_ref)
+ struct btrfs_path *path, u64 *size)
 {
struct btrfs_dir_item *di;
struct btrfs_inode_item *ii;
@@ -2011,7 +2009,7 @@ static int check_inode_item(struct btrfs_root *root, 
struct btrfs_path *path,
imode_to_type(mode), key.objectid,
key.offset);
}
-   ret = check_dir_item(root, , path, , ext_ref);
+   ret = check_dir_item(root, , path, );
err |= ret;
break;
case BTRFS_EXTENT_DATA_KEY:
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/10] btrfs-progs: Drop ext_ref argument from check_inode_item

2018-03-27 Thread Nikolay Borisov
We can derive this argument from root->fs_info and also make it local
to the sole switch case it's used.

Signed-off-by: Nikolay Borisov 
---
 check/mode-lowmem.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 2c7b270a3ffa..e26e9bdf3bb3 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -1918,13 +1918,10 @@ static bool has_orphan_item(struct btrfs_root *root, 
u64 ino)
  * 2. check inode ref/extref
  * 3. check dir item/index
  *
- * @ext_ref:   the EXTENDED_IREF feature
- *
  * Return 0 if no error occurred.
  * Return >0 for error or hit the traversal is done(by error bitmap)
  */
-static int check_inode_item(struct btrfs_root *root, struct btrfs_path *path,
-   unsigned int ext_ref)
+static int check_inode_item(struct btrfs_root *root, struct btrfs_path *path)
 {
struct extent_buffer *node;
struct btrfs_inode_item *ii;
@@ -1993,6 +1990,9 @@ static int check_inode_item(struct btrfs_root *root, 
struct btrfs_path *path,
err |= ret;
break;
case BTRFS_INODE_EXTREF_KEY:
+   {
+   bool ext_ref = btrfs_fs_incompat(root->fs_info,
+EXTENDED_IREF);
if (key.type == BTRFS_INODE_EXTREF_KEY && !ext_ref)
warning("root %llu EXTREF[%llu %llu] isn't 
supported",
root->objectid, key.objectid,
@@ -2001,6 +2001,7 @@ static int check_inode_item(struct btrfs_root *root, 
struct btrfs_path *path,
 mode);
err |= ret;
break;
+   }
case BTRFS_DIR_ITEM_KEY:
case BTRFS_DIR_INDEX_KEY:
if (!dir) {
@@ -2171,7 +2172,7 @@ static int process_one_leaf(struct btrfs_root *root, 
struct btrfs_path *path,
path->slots[0] = i;
 
 again:
-   err |= check_inode_item(root, path, ext_ref);
+   err |= check_inode_item(root, path);
 
/* modify cur since check_inode_item may change path */
cur = path->nodes[0];
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 09/10] btrfs-progs: Remove ext_ref local variable from check_fs_roots_lowmem

2018-03-27 Thread Nikolay Borisov
All real consumers of that variable have inlined the checks since they
are simple enough. So just remove it.

Signed-off-by: Nikolay Borisov 
---
 check/mode-lowmem.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index eb6ca6453dc5..4f463d4f726f 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4458,13 +4458,10 @@ int check_fs_roots_lowmem(struct btrfs_fs_info *fs_info)
struct btrfs_path path;
struct btrfs_key key;
struct extent_buffer *node;
-   unsigned int ext_ref;
int slot;
int ret;
int err = 0;
 
-   ext_ref = btrfs_fs_incompat(fs_info, EXTENDED_IREF);
-
btrfs_init_path();
key.objectid = BTRFS_FS_TREE_OBJECTID;
key.offset = 0;
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/10] btrfs-progs: Drop unused ext_ref parameter from process_one_leaf

2018-03-27 Thread Nikolay Borisov
It's no longer used in the function so just remove it

Signed-off-by: Nikolay Borisov 
---
 check/mode-lowmem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index e26e9bdf3bb3..34519f30a4aa 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -2141,7 +2141,7 @@ static int check_inode_item(struct btrfs_root *root, 
struct btrfs_path *path)
  * Returns 0   No errors found
  */
 static int process_one_leaf(struct btrfs_root *root, struct btrfs_path *path,
-   struct node_refs *nrefs, int *level, int ext_ref)
+   struct node_refs *nrefs, int *level)
 {
struct extent_buffer *cur = path->nodes[0];
struct btrfs_key key;
@@ -4031,7 +4031,7 @@ static int walk_down_tree(struct btrfs_trans_handle 
*trans,
ret = 0;
if (!check_all)
ret = process_one_leaf(root, path, nrefs,
-  level, ext_ref);
+  level);
else
ret = check_leaf_items(trans, root, path,
   nrefs, account_file_data);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 01/10] btrfs-progs: Drop ext_ref parameter from find_inode_ref

2018-03-27 Thread Nikolay Borisov
This is a boolean parameter which signals whether the fs has the
EXTENDED_IREF feature flag toggled or not. Since a reference to fs_info
can be obtained there is no need to pollute the interface.

Signed-off-by: Nikolay Borisov 
---
 check/mode-lowmem.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index dac3201b7d99..a8a2f76549a2 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -921,14 +921,13 @@ static int check_inode_extref(struct btrfs_root *root,
  * @namelen:   the length of name in the INODE_REF/INODE_EXTREF
  * @index_ret: the index in the INODE_REF/INODE_EXTREF,
  *  value (64)-1 means do not check index
- * @ext_ref:   the EXTENDED_IREF feature
  *
  * Return 0 if no error occurred.
  * Return >0 for error bitmap
  */
 static int find_inode_ref(struct btrfs_root *root, struct btrfs_key *key,
- char *name, int namelen, u64 *index_ret,
- unsigned int ext_ref)
+ char *name, int namelen, u64 *index_ret)
+
 {
struct btrfs_path path;
struct btrfs_inode_ref *ref;
@@ -1001,8 +1000,9 @@ static int find_inode_ref(struct btrfs_root *root, struct 
btrfs_key *key,
}
 
 extref:
+
/* Skip if not support EXTENDED_IREF feature */
-   if (!ext_ref)
+   if (!btrfs_fs_incompat(root->fs_info, EXTENDED_IREF))
goto out;
 
btrfs_release_path();
@@ -1311,8 +1311,7 @@ static int check_dir_item(struct btrfs_root *root, struct 
btrfs_key *di_key,
key.objectid = location.objectid;
key.type = BTRFS_INODE_REF_KEY;
key.offset = di_key->objectid;
-   tmp_err |= find_inode_ref(root, , namebuf, len,
- , ext_ref);
+   tmp_err |= find_inode_ref(root, , namebuf, len, );
 
/* check relative INDEX/ITEM */
key.objectid = di_key->objectid;
@@ -4241,7 +4240,7 @@ static int check_fs_first_inode(struct btrfs_root *root, 
unsigned int ext_ref)
/* special index value */
index = 0;
 
-   ret = find_inode_ref(root, , "..", strlen(".."), , ext_ref);
+   ret = find_inode_ref(root, , "..", strlen(".."), );
if (ret < 0)
goto out;
err |= ret;
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/10] btrfs-progs: Drop ext_ref param from check_fs_first_inode

2018-03-27 Thread Nikolay Borisov
It's no longer used in that function so can be dropped altogether.

Signed-off-by: Nikolay Borisov 
---
 check/mode-lowmem.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 76365a214e34..daa088c056b6 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4263,7 +4263,6 @@ static int check_fs_first_inode(struct btrfs_root *root)
  * blocks and integrity of fs tree items.
  *
  * @root: the root of the tree to be checked.
- * @ext_ref   feature EXTENDED_IREF is enable or not.
  * @account   if NOT 0 means check the tree (including tree)'s treeblocks.
  *otherwise means check fs tree(s) items relationship and
  *   @root MUST be a fs tree root.
@@ -4271,8 +4270,7 @@ static int check_fs_first_inode(struct btrfs_root *root)
  * Returns not 0  represents error.
  */
 static int check_btrfs_root(struct btrfs_trans_handle *trans,
-   struct btrfs_root *root, unsigned int ext_ref,
-   int check_all)
+   struct btrfs_root *root, int check_all)
 {
struct btrfs_path path;
struct node_refs nrefs;
@@ -4352,7 +4350,7 @@ static int check_btrfs_root(struct btrfs_trans_handle 
*trans,
 static int check_fs_root(struct btrfs_root *root, unsigned int ext_ref)
 {
reset_cached_block_groups(root->fs_info);
-   return check_btrfs_root(NULL, root, ext_ref, 0);
+   return check_btrfs_root(NULL, root, 0);
 }
 
 /*
@@ -4556,11 +4554,11 @@ int check_chunks_and_extents_lowmem(struct 
btrfs_fs_info *fs_info)
}
 
root1 = root->fs_info->chunk_root;
-   ret = check_btrfs_root(trans, root1, 0, 1);
+   ret = check_btrfs_root(trans, root1, 1);
err |= ret;
 
root1 = root->fs_info->tree_root;
-   ret = check_btrfs_root(trans, root1, 0, 1);
+   ret = check_btrfs_root(trans, root1, 1);
err |= ret;
 
btrfs_init_path();
@@ -4591,7 +4589,7 @@ int check_chunks_and_extents_lowmem(struct btrfs_fs_info 
*fs_info)
goto next;
}
 
-   ret = check_btrfs_root(trans, cur_root, 0, 1);
+   ret = check_btrfs_root(trans, cur_root, 1);
err |= ret;
 
if (key.objectid == BTRFS_TREE_RELOC_OBJECTID)
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/10] btrfs-progs: Remove ext_ref param from check_fs_first_inode

2018-03-27 Thread Nikolay Borisov
It's no longer use and can be dropped.

Signed-off-by: Nikolay Borisov 
---
 check/mode-lowmem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 34519f30a4aa..29c55e202f2b 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4198,7 +4198,7 @@ static int repair_fs_first_inode(struct btrfs_root *root, 
int err)
  * returns >0 means error
  * returns <0 means fatal error
  */
-static int check_fs_first_inode(struct btrfs_root *root, unsigned int ext_ref)
+static int check_fs_first_inode(struct btrfs_root *root)
 {
struct btrfs_path path;
struct btrfs_key key;
@@ -4290,7 +4290,7 @@ static int check_btrfs_root(struct btrfs_trans_handle 
*trans,
 * the first inode item in the leaf, if inode item (256) is
 * missing we will skip it forever.
 */
-   ret = check_fs_first_inode(root, ext_ref);
+   ret = check_fs_first_inode(root);
if (ret < 0)
return ret;
}
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 10/10] btrfs-progs: Make __btrfs_fs_incompat return bool

2018-03-27 Thread Nikolay Borisov
First this function does the '!!' trick to turn its value into a bool,
yet the return type is int. Second, the kernel counterpart also returns
bool.

Signed-off-by: Nikolay Borisov 
---
 ctree.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ctree.h b/ctree.h
index fa861ba0b4c3..1fafe7f90837 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2460,7 +2460,7 @@ static inline u32 btrfs_file_extent_inline_len(struct 
extent_buffer *eb,
 #define btrfs_fs_incompat(fs_info, opt) \
__btrfs_fs_incompat((fs_info), BTRFS_FEATURE_INCOMPAT_##opt)
 
-static inline int __btrfs_fs_incompat(struct btrfs_fs_info *fs_info, u64 flag)
+static inline bool __btrfs_fs_incompat(struct btrfs_fs_info *fs_info, u64 flag)
 {
struct btrfs_super_block *disk_super;
disk_super = fs_info->super_copy;
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/10] btrfs-progs: Remove ext_ref param from walk_down_tree

2018-03-27 Thread Nikolay Borisov
It's no longer used so just remove it.

Signed-off-by: Nikolay Borisov 
---
 check/mode-lowmem.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 29c55e202f2b..76365a214e34 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -3959,8 +3959,7 @@ static int check_leaf_items(struct btrfs_trans_handle 
*trans,
  */
 static int walk_down_tree(struct btrfs_trans_handle *trans,
  struct btrfs_root *root, struct btrfs_path *path,
- int *level, struct node_refs *nrefs, int ext_ref,
- int check_all)
+ int *level, struct node_refs *nrefs, int check_all)
 {
enum btrfs_tree_block_status status;
u64 bytenr;
@@ -4318,7 +4317,7 @@ static int check_btrfs_root(struct btrfs_trans_handle 
*trans,
 
while (1) {
ret = walk_down_tree(trans, root, , , ,
-ext_ref, check_all);
+check_all);
 
err |= !!ret;
 
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 09/10] btrfs-progs: tests: add testcase for undelete-subvol

2018-03-27 Thread Lu Fengqi
The testcase checks the functionality of "btrfs rescue undelete-subvol",
including recovering an intact subvolume, and handling correctly
incomplete subvolume.

Signed-off-by: Lu Fengqi 
---
v2: add shell quoting to test script.

 .../030-undelete-subvol/deleted_subvolume.img  | Bin 0 -> 4096 bytes
 .../030-undelete-subvol/drop_progress.raw.xz   | Bin 0 -> 23452 bytes
 tests/misc-tests/030-undelete-subvol/test.sh   |  34 +
 3 files changed, 34 insertions(+)
 create mode 100644 tests/misc-tests/030-undelete-subvol/deleted_subvolume.img
 create mode 100644 tests/misc-tests/030-undelete-subvol/drop_progress.raw.xz
 create mode 100755 tests/misc-tests/030-undelete-subvol/test.sh

diff --git a/tests/misc-tests/030-undelete-subvol/deleted_subvolume.img 
b/tests/misc-tests/030-undelete-subvol/deleted_subvolume.img
new file mode 100644
index 
..9168a8e33fbbb14f9ea78514721a52191b94d1a0
GIT binary patch
literal 4096
zcmeH|do+|=AIB%x8p^0djl0}JQb@$O%TY$;lEx)vlF+ymV;WO45{D?%IqnhB$S}Fi
zjN6EihKkC4NQhyKaT(V$2k-mO={@KF_pD{Fy}r-)`R-@^*504B_p@W+PlCQF!fF8j
zZGs!m9b0|F@NPJOvGJw?_&4=#{gw~i3;@(EFS(@+TiUf5-~Erz<=ODYja#|MmTnGi
zw`~I31pbc*g!Km3O1HeZD~KRV`{Pj~uzt3r2|UgTH}a
zWDkSP-%;<0K6oq%LMk0=5)g}2I{;`k(RwtnGuI_hB01WRSAongy!#b`_p*d$38#+;
zcYI;9gzxul-@Q0?=XZCRp1a_O(t&`~#!SFI<>VfKQp$%5D%q@qRO^IYiZQ11
z*LJ*O@akHeln2DGHANbna7)texSTKS8w%9X2EBg2OP_*;ZbHjccM=V#lO%}
zATc9#siZ*izH<@v{m4ksm!$X!TLEok)EiGD1DIiBhF{t3@g30ydV%CQ75thpDR_Sz
zkNpn7>SYJJu-@rDqHlw-Wb-Wdq8i)|MhOAQWLvnU+THRaIvD@>=47X$XKjKl#3c>c
z+gVL~u-1qB(5Zq5RlSf2!q|mGkUWzvayXPkcZ+}`@>>iAb(MsWDCeg9_g?z@
z+UzN}@~(J8&7SGm+x+QKwvX*meP^nxg%)4Y+tC!c%#Li1uw)!=uIZLYL%h({>6
zI?6W8VFdIKTctfwulDqceyAL)1|1Zhpij6oC^|RGxbla6`S|q7kxq3u8hQMPc;I2Y
zCWZ`c;*ciOjl@3Y^DqdF9BWsR_*&1bIYVLtMwmWM2N+tNJ-7j(O%B-L35XizYGSd1
zB~$y~k-c18T=w!oFaLH~OTx)ncn*m3#tpvpB({MHrvM%M=n2J!-i}LGsv{?
z<~5bwAY@lJ)i%#aoOp^!R$6CJL^-x?kT(roP@)*Q8XnwB2j3pd-#ynd#jp7WCx#4WSfc|{y
z#5H*01m+|Z94ps6)UYtvKCre(ZF$dTAxE?$+j2>nxqYpZ_Rkl)A)IQAwvH~^(SzZi
z0p9u3gpBilS%_h>xu4tFT+N#=O{4QmFL|iRrJJjnU7WSMuKUyHtOvks7;`zGw0BVH
zJQ}jPdQ!xy(s!Us?OIYAl7nvL$7V~c{(G8ho_TID+~?_UeBFyGcNvvctBnsrrf
zR>%@OeHtPSWh}(@*~6Mo%wR1Xc{kY<0UPWj-p+;|;pI=V<}0+sd36pSJ9HL1GOmQHYWmh`5IF7C{XLOE6~i$Hfktz9xZp$KS@J7FBer
zU6XsQh2^|0*VWg^M;!+bUUhGGt|)c?CClbV{)#jL~#x7@Hgi$!??u)-rbQ^;oY
zY!$-MsH;KQ%fzKtwafY8t$(=9mvBoO{a>gHD^*bc*g7T=1Gzb*{j2yz>ok&
z7M6gGH4Q*~983Vsv#8?bD~SiDobt+CdWEqjraBk8ESoH7vV6%oI3Ejb*$
zuNVEyMF_j+Q>(9d(j#nI_W+HW?IqPn%^pKydSPdh>>Y3s5VIg;-RjrS*7CMI7{u~HVW_{Y*$se=Z}$CDIdvvKrhnca(1M-26(9HuWE?Ohji
zuBpmHt(QTgQ-Yq{m73~S!#xX-w!hB|`k?xhx6>;m!B6zHz=2>h)6=4D_a-K3kJ6t;
zWMpX8X0U`fZB*msQ+^4vT8hol-6nS`h35;Ds!prmWNZ?5xhs3K9QX}xIY9op0XE<2
zd-c9BbluK;C2_kTap|WlT=|N;-RvRtrenZQU
zPiVGB&=zwO#labtRhCASg?G_h$wLNDK8)U6ui@n~0zl=OVap45dP71-m(j?er4a{gMzE>X8BhF^HR&+yN
z``L#iGzIqve?b}mDKJ3R?!Sj~hEkClMj*aQPHs-je`Hw)G=a^_UGlEFuf?-JGn
zrt3Oz*O1f$_uqL2)~`vD^hS#j3mRtrVnrHk6fM)Az8Q5M9FVc;SjPt^dWinLHEw
z(Yd%niJ8}8Uk@IFxs`OJKq)!4{LOe!xf}C
zCCb1~=m9Lg$9ZHABR*mWv80B!dG3fdJeMV!XUBJIiz)c*3_^%D@4@C+zdY>o!34x-
zf=~s)a%m<7<=N$mEk3g{K0aS)5Q~o%4(9R>QL7W2uspH*KopFWYyiKEB0WdA*o`T(uUGBdW_*E
zzTw9jeaVG?gtI%=*AAzOt@~-PogS@pCRs>(P<0-IA0-=um{T^q#LWS2+;ZNGGoXTU+*0WG?kjb!RY9Rd@zE!
zprWN%HWIB*2GqTt(OQjR79XdXGC|eojjqRj@Ntje7V=+!*iK3
ze=gkgvO#(Wb3wy1yH#b!yCt$!YIsJE^$1(}*0xh|G%dc{X8vwN4zHQ}bVmsRRrWRKCJQwJZ4r;|fIzaNqWP$xtDSB^E`4R#+kNfF#ZlImmLpgm-*(wek`j)@7COOkP^)r&;tzG%wF$uJ>jkzy^dj
z*q1CnCUcRGf<3#$dF~&$F~=`u6%?zfjrjSgZ#=WTN%JruO@f4|;_Itq@hv``
znjt?-*%=Zcor6q8k0n~uoI_Sjmizo#?Qx)NGtnCjRYcZ6yKf~=j+^wl)tqQ;;UoIe
z6dYQm`k}lBct=xCfMBv`>_%x+lk+Tb7FuRhD^!2Cr(cdZ=3Hg+ski3Y-op
zFEWTlcBk^eKNL|WnH>4v24}=}Oi<&5mq8M6PBQuKPF}pW3#$g67be*!O|OI5jBm@GHp1l$d3pM0hxL^?iZNjay1ELxtIk(9q%;7k{=WOq
zVcqtGQXM9N6owoF$=%Ja6A{~D@{W^FvI{VZ9!L(Uc`zt6i4mv(BPmaC9yTV{>?PC1dsJG
z4AQ!27bo-Ofq5a~mLqp4hPsA?n;ise0$mkdvDwg^$n=EO81vd*1PIXPHN*)(Tyl%q
zOt)_gg-@hrLba(Mc9JL}fC;bOb?$U|cn^Ofk)*8B

[PATCH v2 04/10] btrfs-progs: undelete-subvol: introduce is_subvol_intact

2018-03-27 Thread Lu Fengqi
The function is used to determine whether the subvolume is intact.

Signed-off-by: Lu Fengqi 
---
 Makefile  |  3 ++-
 undelete-subvol.c | 53 +
 undelete-subvol.h | 17 +
 3 files changed, 72 insertions(+), 1 deletion(-)
 create mode 100644 undelete-subvol.c
 create mode 100644 undelete-subvol.h

diff --git a/Makefile b/Makefile
index 6065522a615f..cb38984c7386 100644
--- a/Makefile
+++ b/Makefile
@@ -116,7 +116,8 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o 
extent-tree.o print-tree.o \
  qgroup.o free-space-cache.o kernel-lib/list_sort.o props.o \
  kernel-shared/ulist.o qgroup-verify.o backref.o string-table.o 
task-utils.o \
  inode.o file.o find-root.o free-space-tree.o help.o send-dump.o \
- fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o
+ fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o \
+ undelete-subvol.o
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
   cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
   cmds-quota.o cmds-qgroup.o cmds-replace.o check/main.o \
diff --git a/undelete-subvol.c b/undelete-subvol.c
new file mode 100644
index ..00fcc4895778
--- /dev/null
+++ b/undelete-subvol.c
@@ -0,0 +1,53 @@
+/*
+ * Copyright (C) 2018 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include "ctree.h"
+
+/*
+ * Determines whether the subvolume is intact, according to the drop_progress
+ * of the root item specified by subvol_id.
+ *
+ * Return true if the subvolume is intact, otherwise return false.
+ */
+static bool is_subvol_intact(struct btrfs_root *root, u64 subvol_id)
+{
+   struct btrfs_path path;
+   struct btrfs_key key;
+   struct extent_buffer *leaf;
+   struct btrfs_root_item root_item;
+   u64 offset;
+   int ret;
+
+   key.objectid = subvol_id;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = 0;
+
+   btrfs_init_path();
+   ret = btrfs_search_slot(NULL, root, , , 0, 0);
+   if (ret) {
+   ret = false;
+   goto out;
+   }
+
+   leaf = path.nodes[0];
+   offset = btrfs_item_ptr_offset(leaf, path.slots[0]);
+
+   read_extent_buffer(leaf, _item, offset, sizeof(root_item));
+
+   /* the subvolume is intact if the objectid of drop_progress == 0 */
+   ret = btrfs_disk_key_objectid(_item.drop_progress) ? false : true;
+
+out:
+   btrfs_release_path();
+   return ret;
+}
diff --git a/undelete-subvol.h b/undelete-subvol.h
new file mode 100644
index ..7cfd100cce37
--- /dev/null
+++ b/undelete-subvol.h
@@ -0,0 +1,17 @@
+/*
+ * Copyright (C) 2018 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#ifndef __BTRFS_UNDELETE_SUBVOLUME_H__
+#define __BTRFS_UNDELETE_SUBVOLUME_H__
+
+#endif
-- 
2.16.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 03/10] btrfs-progs: use btrfs_find_free_dir_index to find free inode index

2018-03-27 Thread Lu Fengqi
Since we have an existing function to find free inode index, we can
reuse it here.

Signed-off-by: Lu Fengqi 
---
 inode.c | 27 +++
 1 file changed, 7 insertions(+), 20 deletions(-)

diff --git a/inode.c b/inode.c
index 478036562652..86905365dfd8 100644
--- a/inode.c
+++ b/inode.c
@@ -628,7 +628,7 @@ int btrfs_link_subvol(struct btrfs_trans_handle *trans, 
struct btrfs_root *root,
struct btrfs_inode_item *inode_item;
struct extent_buffer *leaf;
struct btrfs_key key;
-   u64 index = 2;
+   u64 index;
char buf[BTRFS_NAME_LEN + 1]; /* for snprintf null */
int len;
int i;
@@ -638,28 +638,15 @@ int btrfs_link_subvol(struct btrfs_trans_handle *trans, 
struct btrfs_root *root,
if (len == 0 || len > BTRFS_NAME_LEN)
return -EINVAL;
 
-   /* find the free dir_index */
-   btrfs_init_path();
-   key.objectid = dirid;
-   key.type = BTRFS_DIR_INDEX_KEY;
-   key.offset = (u64)-1;
-
-   ret = btrfs_search_slot(NULL, root, , , 0, 0);
-   if (ret <= 0) {
-   error("search for DIR_INDEX dirid %llu failed: %d",
-   (unsigned long long)dirid, ret);
+   /* add the dir_item/dir_index */
+   ret = btrfs_find_free_dir_index(root, dirid, );
+   if (ret < 0) {
+   error("unable to find free dir index dirid %llu failed: %d",
+ (unsigned long long)dirid, ret);
goto fail;
}
 
-   if (path.slots[0] > 0) {
-   path.slots[0]--;
-   btrfs_item_key_to_cpu(path.nodes[0], , path.slots[0]);
-   if (key.objectid == dirid && key.type == BTRFS_DIR_INDEX_KEY)
-   index = key.offset + 1;
-   }
-   btrfs_release_path();
-
-   /* add the dir_item/dir_index */
+   btrfs_init_path();
key.objectid = dirid;
key.offset = 0;
key.type =  BTRFS_INODE_ITEM_KEY;
-- 
2.16.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 01/10] btrfs-progs: copy btrfs_del_orphan_item from kernel

2018-03-27 Thread Lu Fengqi
Add btrfs_del_orphan_item for the later subvolume undelete.

Signed-off-by: Lu Fengqi 
---
 ctree.h |  2 ++
 inode.c | 30 ++
 2 files changed, 32 insertions(+)

diff --git a/ctree.h b/ctree.h
index fa861ba0b4c3..0fc31dd8d998 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2793,6 +2793,8 @@ int btrfs_unlink(struct btrfs_trans_handle *trans, struct 
btrfs_root *root,
 int btrfs_add_orphan_item(struct btrfs_trans_handle *trans,
  struct btrfs_root *root, struct btrfs_path *path,
  u64 ino);
+int btrfs_del_orphan_item(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root, u64 offset);
 int btrfs_mkdir(struct btrfs_trans_handle *trans, struct btrfs_root *root,
char *name, int namelen, u64 parent_ino, u64 *ino, int mode);
 struct btrfs_root *btrfs_mksubvol(struct btrfs_root *root, const char *base,
diff --git a/inode.c b/inode.c
index 2398bca4a109..8d0812c7cf50 100644
--- a/inode.c
+++ b/inode.c
@@ -262,6 +262,36 @@ int btrfs_add_orphan_item(struct btrfs_trans_handle *trans,
return btrfs_insert_empty_item(trans, root, path, , 0);
 }
 
+int btrfs_del_orphan_item(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root, u64 offset)
+{
+   struct btrfs_path *path;
+   struct btrfs_key key;
+   int ret = 0;
+
+   key.objectid = BTRFS_ORPHAN_OBJECTID;
+   key.type = BTRFS_ORPHAN_ITEM_KEY;
+   key.offset = offset;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   ret = btrfs_search_slot(trans, root, , path, -1, 1);
+   if (ret < 0)
+   goto out;
+   if (ret) {
+   ret = -ENOENT;
+   goto out;
+   }
+
+   ret = btrfs_del_item(trans, root, path);
+
+out:
+   btrfs_free_path(path);
+   return ret;
+}
+
 /*
  * Unlink an inode, which will remove its backref and corresponding dir_index/
  * dir_item if any of them exists.
-- 
2.16.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 08/10] btrfs-progs: undelete-subvol: add undelete-subvol subcommand

2018-03-27 Thread Lu Fengqi
Add the undelete-subvol subcommand for btrfs rescue. This subcommand is
used to recover deleted subvolume left intact on the device.

Signed-off-by: Lu Fengqi 
---
v2: add -s option to specify subvol_id.

 cmds-rescue.c | 70 +++
 1 file changed, 70 insertions(+)

diff --git a/cmds-rescue.c b/cmds-rescue.c
index c40088ad374e..c5132126e922 100644
--- a/cmds-rescue.c
+++ b/cmds-rescue.c
@@ -25,6 +25,7 @@
 #include "disk-io.h"
 #include "commands.h"
 #include "utils.h"
+#include "undelete-subvol.h"
 #include "help.h"
 
 static const char * const rescue_cmd_group_usage[] = {
@@ -248,6 +249,73 @@ out:
return !!ret;
 }
 
+static const char * const cmd_rescue_undelete_subvol_usage[] = {
+   "btrfs rescue undelete-subvol [-s ] ",
+   "Undelete deleted subvolume",
+   "All deleted subvolume that still left intact on the device will be",
+   "recovered. If -s  option is given, then just recover the",
+   "subvolume which specified by .",
+   "",
+   "-s   specify the subvolume which will be recovered.",
+   NULL
+};
+
+static int cmd_rescue_undelete_subvol(int argc, char **argv)
+{
+   struct btrfs_fs_info *fs_info;
+   char *devname;
+   u64 subvol_id = 0;
+   int ret;
+
+   while (1) {
+   int c = getopt(argc, argv, "s:");
+
+   if (c < 0)
+   break;
+   switch (c) {
+   case 's':
+   subvol_id = arg_strtou64(optarg);
+   if (!is_fstree(subvol_id)) {
+   error("%llu is not a valid subvolume id",
+   subvol_id);
+   ret = -EINVAL;
+   goto out;
+   }
+   break;
+   default:
+   usage(cmd_rescue_undelete_subvol_usage);
+   }
+   }
+
+   if (check_argc_exact(argc - optind, 1))
+   usage(cmd_rescue_undelete_subvol_usage);
+
+   devname = argv[optind];
+   ret = check_mounted(devname);
+   if (ret < 0) {
+   error("could not check mount status: %s", strerror(-ret));
+   goto out;
+   } else if (ret) {
+   error("%s is currently mounted", devname);
+   ret = -EBUSY;
+   goto out;
+   }
+
+   fs_info = open_ctree_fs_info(devname, 0, 0, 0, OPEN_CTREE_WRITES |
+OPEN_CTREE_PARTIAL);
+   if (!fs_info) {
+   error("could not open btrfs");
+   ret = -EIO;
+   goto out;
+   }
+
+   ret = btrfs_undelete_intact_subvols(fs_info->tree_root, subvol_id);
+
+   close_ctree(fs_info->tree_root);
+out:
+   return ret;
+}
+
 static const char rescue_cmd_group_info[] =
 "toolbox for specific rescue operations";
 
@@ -260,6 +328,8 @@ const struct cmd_group rescue_cmd_group = {
{ "zero-log", cmd_rescue_zero_log, cmd_rescue_zero_log_usage, 
NULL, 0},
{ "fix-device-size", cmd_rescue_fix_device_size,
cmd_rescue_fix_device_size_usage, NULL, 0},
+   { "undelete-subvol", cmd_rescue_undelete_subvol,
+   cmd_rescue_undelete_subvol_usage, NULL, 0},
NULL_CMD_STRUCT
}
 };
-- 
2.16.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 06/10] btrfs-progs: undelete-subvol: introduce link_subvol_to_lostfound

2018-03-27 Thread Lu Fengqi
The function will create lost+found directory, link the deleted
subvolume specified by the subvol_id to the directory, update the
information of root_item and cleanup the associated orphan item.

Signed-off-by: Lu Fengqi 
---
 undelete-subvol.c | 76 +++
 1 file changed, 76 insertions(+)

diff --git a/undelete-subvol.c b/undelete-subvol.c
index 781057df2b84..9243e35545c5 100644
--- a/undelete-subvol.c
+++ b/undelete-subvol.c
@@ -106,3 +106,79 @@ out:
btrfs_release_path();
return ret;
 }
+
+/*
+ * Recover a subvolume specified by subvol_id, and link it to the lost+found
+ * directory.
+ *
+ * @root: the root of the root tree.
+ * @subvol_id: specify the subvolume which will be linked, and also be the part
+ * of the subvolume name.
+ *
+ * Return 0 if no error occurred.
+ */
+static int link_subvol_to_lostfound(struct btrfs_root *root, u64 subvol_id)
+{
+   struct btrfs_trans_handle *trans;
+   struct btrfs_fs_info *fs_info = root->fs_info;
+   struct btrfs_root *fs_root = fs_info->fs_root;
+   char buf[BTRFS_NAME_LEN + 1] = {0}; /* 1 for snprintf null */
+   char *dir_name = "lost+found";
+   u64 lost_found_ino = 0;
+   u32 mode = 0700;
+   int ret;
+
+   /*
+* For link subvolume to lost+found,
+* 2 for parent(256)'s dir_index and dir_item
+* 2 for lost+found dir's inode_item and inode_ref
+* 2 for lost+found dir's dir_index and dir_item for the subvolume
+* 2 for the subvolume's root_ref and root_backref
+*/
+   trans = btrfs_start_transaction(fs_root, 8);
+   if (IS_ERR(trans)) {
+   error("unable to start transaction");
+   ret = PTR_ERR(trans);
+   goto out;
+   }
+
+   /* Create lost+found directory */
+   ret = btrfs_mkdir(trans, fs_root, dir_name, strlen(dir_name),
+ BTRFS_FIRST_FREE_OBJECTID, _found_ino,
+ mode);
+   if (ret < 0) {
+   error("failed to create '%s' dir: %d", dir_name, ret);
+   goto out;
+   }
+
+   /* Link the subvolume to lost+found directory */
+   snprintf(buf, BTRFS_NAME_LEN + 1, "sub%llu", subvol_id);
+   ret = btrfs_link_subvol(trans, fs_root, buf, subvol_id, lost_found_ino,
+   false);
+   if (ret) {
+   error("failed to link the subvol %llu: %d", subvol_id, ret);
+   goto out;
+   }
+
+   /* Clear root flags BTRFS_ROOT_SUBVOL_DEAD and increase root refs */
+   ret = recover_dead_root(trans, root, subvol_id);
+   if (ret)
+   goto out;
+
+   /* Delete the orphan item after undeletion is completed. */
+   ret = btrfs_del_orphan_item(trans, root, subvol_id);
+   if (ret) {
+   error("failed to delete the orphan_item for %llu: %d",
+   subvol_id, ret);
+   goto out;
+   }
+
+   ret = btrfs_commit_transaction(trans, fs_root);
+   if (ret) {
+   error("transaction commit failed: %d", ret);
+   goto out;
+   }
+
+out:
+   return ret;
+}
-- 
2.16.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 07/10] btrfs-progs: undelete-subvol: introduce btrfs_undelete_intact_subvols

2018-03-27 Thread Lu Fengqi
The function default will traverse the all orphan items on the tree root,
and recover the all intact subvolumes. If subvol_id is specified, then only
the corresponding subvolume will be recovered.

Signed-off-by: Lu Fengqi 
---
v2: add subvol_id argumenta to specify subvol_id instead of recovering
all subvolumes.

 undelete-subvol.c | 70 +++
 undelete-subvol.h |  2 ++
 2 files changed, 72 insertions(+)

diff --git a/undelete-subvol.c b/undelete-subvol.c
index 9243e35545c5..5b494ca086ab 100644
--- a/undelete-subvol.c
+++ b/undelete-subvol.c
@@ -15,6 +15,7 @@
 #include "transaction.h"
 #include "disk-io.h"
 #include "messages.h"
+#include "undelete-subvol.h"
 
 /*
  * Determines whether the subvolume is intact, according to the drop_progress
@@ -182,3 +183,72 @@ static int link_subvol_to_lostfound(struct btrfs_root 
*root, u64 subvol_id)
 out:
return ret;
 }
+
+/*
+ * Traverse all orphan items on the root tree, restore them to the lost+found
+ * directory if the corresponding subvolumes are still intact left on the disk.
+ *
+ * @root   the root of the root tree.
+ * @subvol_id  if not set to 0, skip other subvolumes and only recover the
+ * subvolume specified by @subvol_id.
+ *
+ * Return 0 if no error occurred even if no subvolume was recovered.
+ */
+int btrfs_undelete_intact_subvols(struct btrfs_root *root, u64 subvol_id)
+{
+   struct btrfs_key key;
+   struct btrfs_path path;
+   u64 found_count = 0;
+   u64 recovered_count = 0;
+   int ret = 0;
+
+   key.objectid = BTRFS_ORPHAN_OBJECTID;
+   key.type = BTRFS_ORPHAN_ITEM_KEY;
+   key.offset = subvol_id ? subvol_id + 1 : (u64)-1;
+
+   btrfs_init_path();
+   while (subvol_id != key.offset) {
+   ret = btrfs_search_slot(NULL, root, , , 0, 0);
+   if (ret < 0) {
+   error("search ORPHAN_ITEM for %llu failed.\n",
+ key.offset);
+   break;
+   }
+
+   path.slots[0]--;
+   btrfs_item_key_to_cpu(path.nodes[0], , path.slots[0]);
+
+   btrfs_release_path();
+
+   /* No more BTRFS_ORPHAN_ITEM, so we don't need to continue. */
+   if (key.type != BTRFS_ORPHAN_ITEM_KEY) {
+   ret = 0;
+   break;
+   }
+
+   /* If subvol_id is non-zero, skip other deleted subvolume. */
+   if (subvol_id && subvol_id != key.offset) {
+   ret = -ENOENT;
+   break;
+   }
+
+   if (!is_subvol_intact(root, key.offset))
+   continue;
+
+   /* Here we can confirm there is an intact subvolume. */
+   found_count++;
+   ret = link_subvol_to_lostfound(root, key.offset);
+   if (ret == 0) {
+   recovered_count++;
+   printf(
+   "Recovered subvolume %llu to lost+found successfully.\n",
+   key.offset);
+   }
+
+   }
+
+   printf("Found %llu subvols left intact\n", found_count);
+   printf("Recovered %llu subvols\n", found_count);
+
+   return ret;
+}
diff --git a/undelete-subvol.h b/undelete-subvol.h
index 7cfd100cce37..f773210c46fe 100644
--- a/undelete-subvol.h
+++ b/undelete-subvol.h
@@ -14,4 +14,6 @@
 #ifndef __BTRFS_UNDELETE_SUBVOLUME_H__
 #define __BTRFS_UNDELETE_SUBVOLUME_H__
 
+int btrfs_undelete_intact_subvols(struct btrfs_root *root, u64 subvol_id);
+
 #endif
-- 
2.16.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 02/10] btrfs-progs: extract btrfs_link_subvol from btrfs_mksubvol

2018-03-27 Thread Lu Fengqi
The original btrfs_mksubvol is too specific to specify the directory that
the subvolume will link to. Furthermore, in this transaction, we don't only
need to create root_ref/dir-item, but also update the refs or flags of
root_item. Extract a generic btrfs_link_subvol that allow the caller pass a
trans argument for later subvolume undelete.

No functional changes for the btrfs_mksubvol.

Signed-off-by: Lu Fengqi 
---
 convert/main.c | 57 +
 ctree.h|  5 +++--
 inode.c| 46 +-
 3 files changed, 81 insertions(+), 27 deletions(-)

diff --git a/convert/main.c b/convert/main.c
index 6bdfab40d0b0..dd6b42a1f2b1 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -1002,6 +1002,63 @@ err:
return ret;
 }
 
+/*
+ * Link the subvolume specified by @root_objectid to the root_dir of @root.
+ *
+ * @root   the root of the file tree which the subvolume will
+ * be linked to.
+ * @subvol_namethe name of the subvolume which will be linked.
+ * @root_objectid  specify the subvolume which will be linked.
+ * @convertthe flag to determine whether to try to resolve
+ * the name conflict.
+ *
+ * Return the root of the subvolume if success, otherwise return NULL.
+ */
+static struct btrfs_root *btrfs_mksubvol(struct btrfs_root *root,
+const char *subvol_name,
+u64 root_objectid,
+bool convert)
+{
+   struct btrfs_trans_handle *trans;
+   struct btrfs_root *subvol_root = NULL;
+   struct btrfs_key key;
+   u64 dirid = btrfs_root_dirid(>root_item);
+   int ret;
+
+
+   trans = btrfs_start_transaction(root, 1);
+   if (IS_ERR(trans)) {
+   error("unable to start transaction");
+   goto fail;
+   }
+
+   ret = btrfs_link_subvol(trans, root, subvol_name, root_objectid, dirid,
+   true);
+   if (ret) {
+   error("unable to link subvolume %s", subvol_name);
+   goto fail;
+   }
+
+   ret = btrfs_commit_transaction(trans, root);
+   if (ret) {
+   error("transaction commit failed: %d", ret);
+   goto fail;
+   }
+
+   key.objectid = root_objectid;
+   key.offset = (u64)-1;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+
+   subvol_root = btrfs_read_fs_root(root->fs_info, );
+   if (!subvol_root) {
+   error("unable to link subvolume %s", subvol_name);
+   goto fail;
+   }
+
+fail:
+   return subvol_root;
+}
+
 /*
  * Migrate super block to its default position and zero 0 ~ 16k
  */
diff --git a/ctree.h b/ctree.h
index 0fc31dd8d998..4bff0b821472 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2797,8 +2797,9 @@ int btrfs_del_orphan_item(struct btrfs_trans_handle 
*trans,
  struct btrfs_root *root, u64 offset);
 int btrfs_mkdir(struct btrfs_trans_handle *trans, struct btrfs_root *root,
char *name, int namelen, u64 parent_ino, u64 *ino, int mode);
-struct btrfs_root *btrfs_mksubvol(struct btrfs_root *root, const char *base,
- u64 root_objectid, bool convert);
+int btrfs_link_subvol(struct btrfs_trans_handle *trans, struct btrfs_root 
*root,
+ const char *base, u64 root_objectid, u64 dirid,
+ bool convert);
 
 /* file.c */
 int btrfs_get_extent(struct btrfs_trans_handle *trans,
diff --git a/inode.c b/inode.c
index 8d0812c7cf50..478036562652 100644
--- a/inode.c
+++ b/inode.c
@@ -606,19 +606,28 @@ out:
return ret;
 }
 
-struct btrfs_root *btrfs_mksubvol(struct btrfs_root *root,
- const char *base, u64 root_objectid,
- bool convert)
+/*
+ * Link the subvolume specified by @root_objectid to the directory specified by
+ * @dirid on the file tree specified by @root.
+ *
+ * @root   the root of the file tree where the directory on.
+ * @base   the name of the subvolume which will be linked.
+ * @root_objectid  specify the subvolume which will be linked.
+ * @dirid  specify the directory which the subvolume will be
+ * linked to.
+ * @convertthe flag to determine whether to try to resolve
+ * the name conflict.
+ */
+int btrfs_link_subvol(struct btrfs_trans_handle *trans, struct btrfs_root 
*root,
+ const char *base, u64 root_objectid, u64 dirid,
+ bool convert)
 {
-   struct btrfs_trans_handle *trans;
struct btrfs_fs_info *fs_info = root->fs_info;
struct btrfs_root *tree_root = fs_info->tree_root;
-   struct btrfs_root *new_root = NULL;
struct btrfs_path path;
struct 

[PATCH v2 05/10] btrfs-progs: undelete-subvol: introduce recover_dead_root

2018-03-27 Thread Lu Fengqi
The function will find the root_item specified by the subvol_id,
clear the BTRFS_ROOT_SUBVOL_DEAD flag and set root_refs to one.

Signed-off-by: Lu Fengqi 
---
 ctree.h   |  1 +
 undelete-subvol.c | 55 +++
 2 files changed, 56 insertions(+)

diff --git a/ctree.h b/ctree.h
index 4bff0b821472..7c0b8150bc4e 100644
--- a/ctree.h
+++ b/ctree.h
@@ -184,6 +184,7 @@ static int btrfs_csum_sizes[] = { 4 };
 #define BTRFS_FT_MAX   9
 
 #define BTRFS_ROOT_SUBVOL_RDONLY   (1ULL << 0)
+#define BTRFS_ROOT_SUBVOL_DEAD (1ULL << 48)
 
 /*
  * the key defines the order in the tree, and so it also defines (optimal)
diff --git a/undelete-subvol.c b/undelete-subvol.c
index 00fcc4895778..781057df2b84 100644
--- a/undelete-subvol.c
+++ b/undelete-subvol.c
@@ -12,6 +12,9 @@
  */
 
 #include "ctree.h"
+#include "transaction.h"
+#include "disk-io.h"
+#include "messages.h"
 
 /*
  * Determines whether the subvolume is intact, according to the drop_progress
@@ -51,3 +54,55 @@ out:
btrfs_release_path();
return ret;
 }
+
+/*
+ * Clear BTRFS_ROOT_SUBVOL_DEAD flag and set the root_refs to one.
+ *
+ * @root   the root of root tree.
+ * @subvol_id  specify the root_item which will be modified.
+ *
+ * Return 0 if no error occurred.
+ */
+static int recover_dead_root(struct btrfs_trans_handle *trans,
+struct btrfs_root *root, u64 subvol_id)
+{
+   struct btrfs_key key;
+   struct btrfs_path path;
+   struct extent_buffer *leaf;
+   struct btrfs_root_item root_item;
+   u64 root_flags;
+   u64 offset;
+   int ret;
+
+   key.objectid = subvol_id;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = 0;
+
+   btrfs_init_path();
+   ret = btrfs_search_slot(trans, root, , , 0, 0);
+   if (ret) {
+   error("couldn't find ROOT_ITEM for %llu failed: %d",
+   subvol_id, ret);
+   goto out;
+   }
+
+   leaf = path.nodes[0];
+
+   offset = btrfs_item_ptr_offset(leaf, path.slots[0]);
+   read_extent_buffer(leaf, _item, offset, sizeof(root_item));
+
+   /* Clear BTRFS_ROOT_SUBVOL_DEAD */
+   root_flags = btrfs_root_flags(_item);
+   btrfs_set_root_flags(_item,
+root_flags & ~BTRFS_ROOT_SUBVOL_DEAD);
+
+   /* Increase the refs */
+   btrfs_set_root_refs(_item, 1);
+
+   write_extent_buffer(leaf, _item, offset, sizeof(root_item));
+   btrfs_mark_buffer_dirty(leaf);
+
+out:
+   btrfs_release_path();
+   return ret;
+}
-- 
2.16.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 00/10] undelete subvolume offline version

2018-03-27 Thread Lu Fengqi
This patchset will add undelete-subvol subcommand for btrfs rescue. This
enhancement allows undeleting a subvolume on an unmounted filesystem.

Patchset can be fetched from github:
https://github.com/littleroad/btrfs-progs.git undelete

v1->v2: add -s option to allow user specify the subvolume which will be
recovered.

The first six patches are not modified.
7th-8th add the -s option to specify subvol_id instead of recovering all
subvolumes.
9th add shell quoting to test script.
10th just add the -s option documentation.

Lu Fengqi (10):
  btrfs-progs: copy btrfs_del_orphan_item from kernel
  btrfs-progs: extract btrfs_link_subvol from btrfs_mksubvol
  btrfs-progs: use btrfs_find_free_dir_index to find free inode index
  btrfs-progs: undelete-subvol: introduce is_subvol_intact
  btrfs-progs: undelete-subvol: introduce recover_dead_root
  btrfs-progs: undelete-subvol: introduce link_subvol_to_lostfound
  btrfs-progs: undelete-subvol: introduce btrfs_undelete_intact_subvols
  btrfs-progs: undelete-subvol: add undelete-subvol subcommand
  btrfs-progs: tests: add testcase for undelete-subvol
  btrfs-progs: undelete-subvol: update completion and documentation

 Documentation/btrfs-rescue.asciidoc|  12 +
 Makefile   |   3 +-
 btrfs-completion   |   2 +-
 cmds-rescue.c  |  70 ++
 convert/main.c |  57 +
 ctree.h|   8 +-
 inode.c|  99 
 .../030-undelete-subvol/deleted_subvolume.img  | Bin 0 -> 4096 bytes
 .../030-undelete-subvol/drop_progress.raw.xz   | Bin 0 -> 23452 bytes
 tests/misc-tests/030-undelete-subvol/test.sh   |  34 +++
 undelete-subvol.c  | 254 +
 undelete-subvol.h  |  19 ++
 12 files changed, 511 insertions(+), 47 deletions(-)
 create mode 100644 tests/misc-tests/030-undelete-subvol/deleted_subvolume.img
 create mode 100644 tests/misc-tests/030-undelete-subvol/drop_progress.raw.xz
 create mode 100755 tests/misc-tests/030-undelete-subvol/test.sh
 create mode 100644 undelete-subvol.c
 create mode 100644 undelete-subvol.h

-- 
2.16.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Persoonlijke en beleggingslening.

2018-03-27 Thread Funding Trusts Finance


-- 
Goede dag,
  
We zijn Funding Trusts Finance verstrekt leningen per postadvertentie. Wij 
bieden verschillende soorten leningen of projectleningen (korte en lange 
termijnleningen, persoonlijke leningen, leningen aan bedrijven enz.) Met een 
rentetarief van 3%. We verstrekken leningen aan mensen in nood, ongeacht hun 
locatie, geslacht, burgerlijke staat, opleiding of baan, maar moeten een legale 
manier van terugbetaling hebben. Onze leningen variren tussen 5.000,00 
tot 20.000.000,00 US Dollar of Euro of Pond met een maximale duur van 15 jaar. 
Als u genteresseerd bent in meer informatie, hebben we investeerders die 
genteresseerd zijn in het financieren van projecten van groot volume. De 
procedures zijn als volgt: -

1-De klant moet een korte samenvatting van het project verzenden. Dit moet het 
totale bedrag omvatten dat vereist is voor het project, geschat rendement op 
investering, terugbetalingsperiode van de lening, dit mag niet meer dan 20 jaar 
zijn

Neem contact met ons op: i...@fundingtrustsfinance.com

INFORMATIE NODIG

Jullie namen:
Adres: ...
Telefoon: ...
Benodigde hoeveelheid: ...
Looptijd: ...
Beroep: ...
Maandelijks inkomensniveau: ..
Geslacht: ..
Geboortedatum: ...
Staat: ...
Land: .
Doel: .

"Miljoenen mensen in de wereld hebben een kredietprobleem van een of andere 
vorm. Je bent niet de enige. We hebben een hele reeks leningopties die kunnen 
helpen. Ontdek nu je opties!"

Met vriendelijke groet,
Ronny Hens,
i...@fundingtrustsfinance.com
WEBSITE: www.fundingtrustfinance.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Persoonlijke en beleggingslening.

2018-03-27 Thread Funding Trusts Finance


-- 
Goede dag,
  
We zijn Funding Trusts Finance verstrekt leningen per postadvertentie. Wij 
bieden verschillende soorten leningen of projectleningen (korte en lange 
termijnleningen, persoonlijke leningen, leningen aan bedrijven enz.) Met een 
rentetarief van 3%. We verstrekken leningen aan mensen in nood, ongeacht hun 
locatie, geslacht, burgerlijke staat, opleiding of baan, maar moeten een legale 
manier van terugbetaling hebben. Onze leningen variren tussen 5.000,00 
tot 20.000.000,00 US Dollar of Euro of Pond met een maximale duur van 15 jaar. 
Als u genteresseerd bent in meer informatie, hebben we investeerders die 
genteresseerd zijn in het financieren van projecten van groot volume. De 
procedures zijn als volgt: -

1-De klant moet een korte samenvatting van het project verzenden. Dit moet het 
totale bedrag omvatten dat vereist is voor het project, geschat rendement op 
investering, terugbetalingsperiode van de lening, dit mag niet meer dan 20 jaar 
zijn

Neem contact met ons op: i...@fundingtrustsfinance.com

INFORMATIE NODIG

Jullie namen:
Adres: ...
Telefoon: ...
Benodigde hoeveelheid: ...
Looptijd: ...
Beroep: ...
Maandelijks inkomensniveau: ..
Geslacht: ..
Geboortedatum: ...
Staat: ...
Land: .
Doel: .

"Miljoenen mensen in de wereld hebben een kredietprobleem van een of andere 
vorm. Je bent niet de enige. We hebben een hele reeks leningopties die kunnen 
helpen. Ontdek nu je opties!"

Met vriendelijke groet,
Ronny Hens,
i...@fundingtrustsfinance.com
WEBSITE: www.fundingtrustfinance.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Persoonlijke en beleggingslening.

2018-03-27 Thread Funding Trusts Finance


-- 
Goede dag,
  
We zijn Funding Trusts Finance verstrekt leningen per postadvertentie. Wij 
bieden verschillende soorten leningen of projectleningen (korte en lange 
termijnleningen, persoonlijke leningen, leningen aan bedrijven enz.) Met een 
rentetarief van 3%. We verstrekken leningen aan mensen in nood, ongeacht hun 
locatie, geslacht, burgerlijke staat, opleiding of baan, maar moeten een legale 
manier van terugbetaling hebben. Onze leningen variren tussen 5.000,00 
tot 20.000.000,00 US Dollar of Euro of Pond met een maximale duur van 15 jaar. 
Als u genteresseerd bent in meer informatie, hebben we investeerders die 
genteresseerd zijn in het financieren van projecten van groot volume. De 
procedures zijn als volgt: -

1-De klant moet een korte samenvatting van het project verzenden. Dit moet het 
totale bedrag omvatten dat vereist is voor het project, geschat rendement op 
investering, terugbetalingsperiode van de lening, dit mag niet meer dan 20 jaar 
zijn

Neem contact met ons op: i...@fundingtrustsfinance.com

INFORMATIE NODIG

Jullie namen:
Adres: ...
Telefoon: ...
Benodigde hoeveelheid: ...
Looptijd: ...
Beroep: ...
Maandelijks inkomensniveau: ..
Geslacht: ..
Geboortedatum: ...
Staat: ...
Land: .
Doel: .

"Miljoenen mensen in de wereld hebben een kredietprobleem van een of andere 
vorm. Je bent niet de enige. We hebben een hele reeks leningopties die kunnen 
helpen. Ontdek nu je opties!"

Met vriendelijke groet,
Ronny Hens,
i...@fundingtrustsfinance.com
WEBSITE: www.fundingtrustfinance.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: wipe copies of the stale superblock beyond -b size

2018-03-27 Thread Nikolay Borisov


On 27.03.2018 02:50, Anand Jain wrote:
> 
> 
>> I told you this code can be made a lot simpler, simply by modifying the
>> last argument passed to zero_dev_clamped. I even posted the resulting
>> diff which was just 3 lines changed.
>>
>> I agree that it's a good idea to wipe all available superblock when we
>> use -b. However I don't agree with your approach - it adds a loop, it
>> adds a bunch of checks and makes the complexity orders of magnitude
>> higher than it could be. So I'm asking again - is there any inherent
>> benefit which I'm missing in your newly added 35 lines of code against
>> just passing the block device to zero_dev_clamped and letting the
>> existing logic take care of everything?
> 
> I had that idea for v1 as well, but I didn't do it because it would
> zero bytenr_copy#2 even when there is no btrfs superblock, (which is
> fine only with in block_count).
> 
> Some might view it as corrupting the usr data (for which they have
> specified -b option?)? I am just discussing I don't have any usecase
> to prove it though. Do you have any idea?
> If you think it should be ok, I shall go ahead and zero without
> checking for the btrfs superblock beyond block_count.

Right, so the pertinent question is - is anyone expecting to do mkfs on
a volume irrespective of the options and expect to be able to recover
data prior to the mkfs? I think the answer is 'no', but let's see what
the community says.

> 
> Thanks, Anand
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/2] btrfs-progs: print-tree: Use macro to replace immediate number for readable flags string buffer length

2018-03-27 Thread Qu Wenruo
In print-tree, we have a lot of parsers to convert numeric flags to
human readable string.

For the buffer size we're using immediate numbers for all their callers.
Change this to macro so it will be much easier for us to expand the
buffer size.

Signed-off-by: Qu Wenruo 
---
changelog:
v2:
  Move all buffer length macros to the header part.
  Change comment to refer to the buffer length marco for later
  modication.
---
 print-tree.c | 50 +++---
 1 file changed, 39 insertions(+), 11 deletions(-)

diff --git a/print-tree.c b/print-tree.c
index a2f6bfc027c9..064f1bb9ff8c 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -26,6 +26,12 @@
 #include "print-tree.h"
 #include "utils.h"
 
+#define BG_FLAGS_BUF_LEN   64  /* for bg_flags_to_str() */
+#define QGROUP_FLAGS_BUF_LEN   32  /* for qgroup_flags_to_str() */
+#define EXTENT_FLAGS_BUF_LEN   32  /* for extent_flags_to_str() */
+#define ROOT_FLAGS_BUF_LEN 16  /* for root_flags_to_str() */
+#define INODE_FLAGS_BUF_LEN128 /* for inode_flags_to_str() */
+#define HEADER_FLAGS_BUF_LEN   32  /* for header_flags_to_str() */
 
 static void print_dir_item_type(struct extent_buffer *eb,
 struct btrfs_dir_item *di)
@@ -137,7 +143,13 @@ static void print_inode_ref_item(struct extent_buffer *eb, 
u32 size,
}
 }
 
-/* Caller should ensure sizeof(*ret)>=21 "DATA|METADATA|RAID10" */
+/*
+ * Caller should ensure sizeof(*ret)>=21 "DATA|METADATA|RAID10"
+ * Here we bump up to 64 bytes to handle the worst (and invalid) case:
+ * "DATA|METADATA|SYSTEM|RAID0|RAID1|DUP|RAID10|RAID5|RAID6"
+ *
+ * BG_FLAGS_BUF_LEN is ensured to have enough space.
+ */
 static void bg_flags_to_str(u64 flags, char *ret)
 {
int empty = 1;
@@ -180,7 +192,11 @@ static void bg_flags_to_str(u64 flags, char *ret)
}
 }
 
-/* Caller should ensure sizeof(*ret)>= 26 "OFF|SCANNING|INCONSISTENT" */
+/*
+ * Caller should ensure sizeof(*ret)>= 26 "OFF|SCANNING|INCONSISTENT"
+ *
+ * QGROUP_FLAGS_BUF_LEN is ensured to have enough space.
+ */
 static void qgroup_flags_to_str(u64 flags, char *ret)
 {
if (flags & BTRFS_QGROUP_STATUS_FLAG_ON)
@@ -199,7 +215,7 @@ void print_chunk_item(struct extent_buffer *eb, struct 
btrfs_chunk *chunk)
u16 num_stripes = btrfs_chunk_num_stripes(eb, chunk);
int i;
u32 chunk_item_size;
-   char chunk_flags_str[32] = {0};
+   char chunk_flags_str[BG_FLAGS_BUF_LEN] = {0};
 
/* The chunk must contain at least one stripe */
if (num_stripes < 1) {
@@ -385,7 +401,11 @@ static void print_file_extent_item(struct extent_buffer 
*eb,
compress_str);
 }
 
-/* Caller should ensure sizeof(*ret) >= 16("DATA|TREE_BLOCK") */
+/*
+ * Caller should ensure sizeof(*ret) > strlen("DATA|TREE_BLOCK|FULL_BACKREF")
+ *
+ * EXTENT_FLAGS_BUF_LEN is ensured to have enough space.
+ */
 static void extent_flags_to_str(u64 flags, char *ret)
 {
int empty = 1;
@@ -420,7 +440,7 @@ void print_extent_item(struct extent_buffer *eb, int slot, 
int metadata)
u32 item_size = btrfs_item_size_nr(eb, slot);
u64 flags;
u64 offset;
-   char flags_str[32] = {0};
+   char flags_str[EXTENT_FLAGS_BUF_LEN] = {0};
 
if (item_size < sizeof(*ei)) {
 #ifdef BTRFS_COMPAT_EXTENT_TREE_V0
@@ -542,6 +562,8 @@ static int empty_uuid(const u8 *uuid)
 
 /*
  * Caller must ensure sizeof(*ret) >= 7 "RDONLY"
+ *
+ * ROOT_FLAGS_BUF_LEN is ensured to have enough space
  */
 static void root_flags_to_str(u64 flags, char *ret)
 {
@@ -577,7 +599,7 @@ static void print_root_item(struct extent_buffer *leaf, int 
slot)
struct btrfs_root_item root_item;
int len;
char uuid_str[BTRFS_UUID_UNPARSED_SIZE];
-   char flags_str[32] = {0};
+   char flags_str[ROOT_FLAGS_BUF_LEN] = {0};
struct btrfs_key drop_key;
 
ri = btrfs_item_ptr(leaf, slot, struct btrfs_root_item);
@@ -887,6 +909,8 @@ static void print_uuid_item(struct extent_buffer *l, 
unsigned long offset,
 /*
  * Caller should ensure sizeof(*ret) >= 102: all charactors plus '|' of
  * BTRFS_INODE_* flags
+ *
+ * INODE_FLAGS_BUF_LEN is ensured to have enough space
  */
 static void inode_flags_to_str(u64 flags, char *ret)
 {
@@ -911,7 +935,7 @@ static void inode_flags_to_str(u64 flags, char *ret)
 static void print_inode_item(struct extent_buffer *eb,
struct btrfs_inode_item *ii)
 {
-   char flags_str[256];
+   char flags_str[INODE_FLAGS_BUF_LEN];
 
memset(flags_str, 0, sizeof(flags_str));
inode_flags_to_str(btrfs_inode_flags(eb, ii), flags_str);
@@ -1002,7 +1026,7 @@ static void print_block_group_item(struct extent_buffer 
*eb,
struct btrfs_block_group_item *bgi)
 {
struct btrfs_block_group_item bg_item;
-   char flags_str[256];
+   char flags_str[BG_FLAGS_BUF_LEN];
 
read_extent_buffer(eb, _item, (unsigned long)bgi,