Re: Moving btrfs fs disk between computers

2017-08-18 Thread Duncan
Adam Borowski posted on Sat, 19 Aug 2017 04:35:26 +0200 as excerpted:

> On Fri, Aug 18, 2017 at 11:09:22PM -0300, Hérikz Nawarro wrote:
>> Hello everyone,
>> 
>> Can i create a btrfs fs in a machine and move the disk to another
>> machine like a ext4 or ntfs?
> 
> Yeah, no problem whatsoever, even for multi-device filesystems.
> Btrfs doesn't care about what devices it is on.
> 
> There has been no new incompat features in quite a while, so old kernels
> shouldn't be an issue either.
> 
> I think the only way for valid btrfs to be unmountable on another
> machine with a modern kernel is to mess with page size, which is AFAIK
> not even possible on x86.

IOW, as long as the two machines are the same arch, or more precisely, 
have the same page size if different archs, it should be fine as long as 
you're not running a really old kernel on one of them.

But don't try mounting a filesystem created on x86_64 on something exotic 
like sparc or ppc or something.  If the two archs' page sizes are the 
same it should work, but I'd still be sure I had an extra set of backups 
before I tried something like that.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS error (device sda4): failed to read chunk tree: -5

2017-08-18 Thread Zirconium Hacker
THANK YOU ALL!  I just had to truncate the first 1.5 KiB of the image
file to get the offsets right (God knows why), but I could then get
the btrfs driver to recognize it, and I could MOUNT THE FILESYSTEM!
I'm going to run btrfs check on it, free some space on it, PROPERLY
remove the extra device, and then boot the system and set up both more
cloud backups and probably throw in a 500GB disk I have laying around
so I can set up snapshots.
Everyone has been very helpful, and sorry for the real issue being my
own stupidity.  :)

On Fri, Aug 18, 2017 at 10:28 PM, Zirconium Hacker  wrote:
> The image doesn't have a valid superblock.  I'm really confused as to
> how that could've happened.
>
> On Fri, Aug 18, 2017 at 7:21 PM, Qu Wenruo  wrote:
>>
>>
>> On 2017年08月19日 05:52, Zirconium Hacker wrote:
>>>
>>> Ok, so since it's clear now that I need that 5 GB device to be
>>> present... I found the image file.  But how do I get BTRFS to
>>> recognize the image as a device?
>>
>>
>> # losetup -f
>> Remember the loop*, here use /dev/loop1 as example.
>>
>> # losetup /dev/loop1 
>> # partprobe /dev/loop1
>> Then you should have /dev/loop1p1
>>
>> # btrfs dev rescan
>> If nothing wrong happened, you should be good to go.
>>
>> Thanks,
>> Qu
>>
>>
>>>  I have zero experience with
>>> multi-device systems.  Setting it up as a loop device doesn't fix
>>> mounting, and wipefs doesn't detect the BTRFS magic number, but
>>> printing some of it to console shows it does have real data.  Writing
>>> the magic number onto it (it's a copy of the original to be safe)
>>> shows in dump-super, but all other values are zero.
>>>
>>> I tried sending the above on my phone earlier but it was detected as a
>>> "virus" because it contained HTML.  Whoops.
>>>
>>> On Fri, Aug 18, 2017 at 11:00 AM, Chris Murphy 
>>> wrote:

 On Fri, Aug 18, 2017 at 2:47 AM, Zirconium Hacker 
 wrote:

> I vaguely remember following this guide at some point:
>
> http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
> -- specifically the "Balance cannot run because the filesystem is
> full" part.  This may have broken some things?
>

 If you don't do 'btrfs device delete /dev/loop0' or if that command
 does not complete, then it's possible to get into the situation you're
 in.

 Have you ever mounted this file system with -o degraded?

 I'm going to guess the history is something like:
 1. enospc
 2. btrfs dev add
 3. some kind of filtered balance, which only causes data block groups
 to be moved to the 2nd device
 4. 2nd device is physically removed without first 'btrfs dev del'

 Zirco's superblock very clearly says num_devices  2, so I'd expect
 normal mount to always fail unless both devices are present. Is there
 some weird edge case where Btrfs might permit non-degraded mount when
 only data bg's are on a 2nd device? And then trouble only happens
 later when a balance is done and it goes looking for these bg's? And
 then, boom!

 --
 Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Moving btrfs fs disk between computers

2017-08-18 Thread Adam Borowski
On Fri, Aug 18, 2017 at 11:09:22PM -0300, Hérikz Nawarro wrote:
> Hello everyone,
> 
> Can i create a btrfs fs in a machine and move the disk to another
> machine like a ext4 or ntfs?

Yeah, no problem whatsoever, even for multi-device filesystems.
Btrfs doesn't care about what devices it is on.

There has been no new incompat features in quite a while, so old kernels
shouldn't be an issue either.

I think the only way for valid btrfs to be unmountable on another machine
with a modern kernel is to mess with page size, which is AFAIK not even
possible on x86.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀-- Genghis Ht'rok'din
⠈⠳⣄ 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS error (device sda4): failed to read chunk tree: -5

2017-08-18 Thread Zirconium Hacker
The image doesn't have a valid superblock.  I'm really confused as to
how that could've happened.

On Fri, Aug 18, 2017 at 7:21 PM, Qu Wenruo  wrote:
>
>
> On 2017年08月19日 05:52, Zirconium Hacker wrote:
>>
>> Ok, so since it's clear now that I need that 5 GB device to be
>> present... I found the image file.  But how do I get BTRFS to
>> recognize the image as a device?
>
>
> # losetup -f
> Remember the loop*, here use /dev/loop1 as example.
>
> # losetup /dev/loop1 
> # partprobe /dev/loop1
> Then you should have /dev/loop1p1
>
> # btrfs dev rescan
> If nothing wrong happened, you should be good to go.
>
> Thanks,
> Qu
>
>
>>  I have zero experience with
>> multi-device systems.  Setting it up as a loop device doesn't fix
>> mounting, and wipefs doesn't detect the BTRFS magic number, but
>> printing some of it to console shows it does have real data.  Writing
>> the magic number onto it (it's a copy of the original to be safe)
>> shows in dump-super, but all other values are zero.
>>
>> I tried sending the above on my phone earlier but it was detected as a
>> "virus" because it contained HTML.  Whoops.
>>
>> On Fri, Aug 18, 2017 at 11:00 AM, Chris Murphy 
>> wrote:
>>>
>>> On Fri, Aug 18, 2017 at 2:47 AM, Zirconium Hacker 
>>> wrote:
>>>
 I vaguely remember following this guide at some point:

 http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
 -- specifically the "Balance cannot run because the filesystem is
 full" part.  This may have broken some things?

>>>
>>> If you don't do 'btrfs device delete /dev/loop0' or if that command
>>> does not complete, then it's possible to get into the situation you're
>>> in.
>>>
>>> Have you ever mounted this file system with -o degraded?
>>>
>>> I'm going to guess the history is something like:
>>> 1. enospc
>>> 2. btrfs dev add
>>> 3. some kind of filtered balance, which only causes data block groups
>>> to be moved to the 2nd device
>>> 4. 2nd device is physically removed without first 'btrfs dev del'
>>>
>>> Zirco's superblock very clearly says num_devices  2, so I'd expect
>>> normal mount to always fail unless both devices are present. Is there
>>> some weird edge case where Btrfs might permit non-degraded mount when
>>> only data bg's are on a 2nd device? And then trouble only happens
>>> later when a balance is done and it goes looking for these bg's? And
>>> then, boom!
>>>
>>> --
>>> Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Moving btrfs fs disk between computers

2017-08-18 Thread Hérikz Nawarro
Hello everyone,

Can i create a btrfs fs in a machine and move the disk to another
machine like a ext4 or ntfs?

Thanks in advance.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qcow2 images make scrub believe the filesystem is corrupted.

2017-08-18 Thread Qu Wenruo



On 2017年08月19日 01:43, Josef Bacik wrote:

On Fri, Aug 18, 2017 at 06:23:18PM +0200, Goffredo Baroncelli wrote:

On 08/18/2017 01:39 AM, Josef Bacik wrote:
[...]

This is happening because the app (the guest OS in this case, we saw this a lot
with windows guests) is changing the pages while they are in flight.  We
calculate the checksum of the page before it's written, so if it changes while
in flight we'll end up with a csum mismatch.

To fix this change kvm to not use O_DIRECT or set NODATASUM on your qcow2 image.
You'll have to re-create the image because NODATASUM won't apply to the already
invalid checksums.  Thanks,


Hi Josef,

could you elaborate: do you are saying that using O_DIRECT is incompatible with 
DATASUM ?



No, I'm saying using O_DIRECT with applications that don't protect in-flight
memory are incompatible with DATASUM.  We have no way of making sure nobody
touches the page while we're writing it out, so after we calculate the checksum
any changes to the page are going to cause a checksum mismatch.  O_DIRECT are
user space pages, there's nothing we can do to stop user space from doing stupid
things.

The options I looked into before were things like detecting the page had changed
since we calculated the checksum, and re-submitting the write.  This punishes
applications that do the right thing (databases for example) by forcing us to
calculate checksums twice.


Just curious about this.

Why not just scrubbing data/metadata in commit roots?
And don't use any page cache, but always read them out from disk?

For datacsum case, it's always cowed, so it won't change in-flight.

Although the cost is obvious, such method can only check data/metadata 
in previous trans and doesn't use page cache means tons of IO.


Thanks,
Qu


This is a shit situation because users aren't going to understand this
limitation, and it bites them in the ass with all these weird errors.  I think
maybe we need to go back to the double-checksum thing by default, and have a
flag or something for users to set if they know their application behaves
properly.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS error (device sda4): failed to read chunk tree: -5

2017-08-18 Thread Qu Wenruo



On 2017年08月19日 05:52, Zirconium Hacker wrote:

Ok, so since it's clear now that I need that 5 GB device to be
present... I found the image file.  But how do I get BTRFS to
recognize the image as a device?


# losetup -f
Remember the loop*, here use /dev/loop1 as example.

# losetup /dev/loop1 
# partprobe /dev/loop1
Then you should have /dev/loop1p1

# btrfs dev rescan
If nothing wrong happened, you should be good to go.

Thanks,
Qu


 I have zero experience with
multi-device systems.  Setting it up as a loop device doesn't fix
mounting, and wipefs doesn't detect the BTRFS magic number, but
printing some of it to console shows it does have real data.  Writing
the magic number onto it (it's a copy of the original to be safe)
shows in dump-super, but all other values are zero.

I tried sending the above on my phone earlier but it was detected as a
"virus" because it contained HTML.  Whoops.

On Fri, Aug 18, 2017 at 11:00 AM, Chris Murphy  wrote:

On Fri, Aug 18, 2017 at 2:47 AM, Zirconium Hacker  wrote:


I vaguely remember following this guide at some point:
http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
-- specifically the "Balance cannot run because the filesystem is
full" part.  This may have broken some things?



If you don't do 'btrfs device delete /dev/loop0' or if that command
does not complete, then it's possible to get into the situation you're
in.

Have you ever mounted this file system with -o degraded?

I'm going to guess the history is something like:
1. enospc
2. btrfs dev add
3. some kind of filtered balance, which only causes data block groups
to be moved to the 2nd device
4. 2nd device is physically removed without first 'btrfs dev del'

Zirco's superblock very clearly says num_devices  2, so I'd expect
normal mount to always fail unless both devices are present. Is there
some weird edge case where Btrfs might permit non-degraded mount when
only data bg's are on a 2nd device? And then trouble only happens
later when a balance is done and it goes looking for these bg's? And
then, boom!

--
Chris Murphy

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qcow2 images make scrub believe the filesystem is corrupted.

2017-08-18 Thread Goffredo Baroncelli
On 08/18/2017 07:43 PM, Josef Bacik wrote:
> On Fri, Aug 18, 2017 at 06:23:18PM +0200, Goffredo Baroncelli wrote:
>> On 08/18/2017 01:39 AM, Josef Bacik wrote:
>> [...]
>>> This is happening because the app (the guest OS in this case, we saw this a 
>>> lot
>>> with windows guests) is changing the pages while they are in flight.  We
>>> calculate the checksum of the page before it's written, so if it changes 
>>> while
>>> in flight we'll end up with a csum mismatch.
>>>
>>> To fix this change kvm to not use O_DIRECT or set NODATASUM on your qcow2 
>>> image.
>>> You'll have to re-create the image because NODATASUM won't apply to the 
>>> already
>>> invalid checksums.  Thanks,
>>
>> Hi Josef,
>>
>> could you elaborate: do you are saying that using O_DIRECT is incompatible 
>> with DATASUM ?
>>
> 
> No, I'm saying using O_DIRECT with applications that don't protect in-flight
> memory are incompatible with DATASUM.  

This is what I call an 'incompatibility'. Even is a "corner" case, it is still 
an incompatibility. And to be honest, it is still difficult to say that a "VM" 
is a "corner" case.

> We have no way of making sure nobody
> touches the page while we're writing it out, so after we calculate the 
> checksum
> any changes to the page are going to cause a checksum mismatch.  O_DIRECT are
> user space pages, there's nothing we can do to stop user space from doing 
> stupid
> things.

I understand the technical difficulties; however I can't agree about "user 
space [...] doing *stupid* things". If it is not explicitly forbidden, it is 
legal; not "stupid"

How the application know that the page aren't in-flight anymore ? It is 
sufficient to wait the end of the write() syscall ? Or it has to wait the end 
of a fsync() ?
 
> The options I looked into before were things like detecting the page had 
> changed
> since we calculated the checksum, and re-submitting the write.  This punishes
> applications that do the right thing (databases for example) by forcing us to
> calculate checksums twice.

There are other "cases" where it is possible to have the same problem ? It is 
the same for mmap() ?

> 
> This is a shit situation because users aren't going to understand this
> limitation, and it bites them in the ass with all these weird errors.  I think
> maybe we need to go back to the double-checksum thing by default, and have a
> flag or something for users to set if they know their application behaves
> properly.  

Or... disable checksum for the "O_DIRECT" writings... If you can't trust the 
checksums at 100%, these don't make sense.

> 
> Josef
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 3/7] Btrfs: remove BUG() in btrfs_extent_inline_ref_size

2017-08-18 Thread Liu Bo
Now that btrfs_get_extent_inline_ref_type() can report if type is a
valid one and all callers can gracefully deal with that, we don't need
to crash here.

Signed-off-by: Liu Bo 
---
 fs/btrfs/ctree.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 714e779..38b0735 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1799,7 +1799,6 @@ static inline u32 btrfs_extent_inline_ref_size(int type)
if (type == BTRFS_EXTENT_DATA_REF_KEY)
return sizeof(struct btrfs_extent_data_ref) +
   offsetof(struct btrfs_extent_inline_ref, offset);
-   BUG();
return 0;
 }
 
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 7/7] Btrfs: add one more sanity check for shared ref type

2017-08-18 Thread Liu Bo
Every shared ref has a parent tree block, which can be get from
btrfs_extent_inline_ref_offset().  And the tree block must be aligned
to the nodesize, so we'd know this inline ref is not valid if this
block's bytenr is not aligned to the nodesize, in which case, most
likely the ref type has been misused.

This adds the above mentioned check and also updates
print_extent_item() called by btrfs_print_leaf() to point out the
invalid ref while printing the tree structure.

Signed-off-by: Liu Bo 
---
 fs/btrfs/extent-tree.c | 29 +
 fs/btrfs/print-tree.c  | 27 +--
 2 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d75129a..13264cd 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1158,19 +1158,40 @@ int btrfs_get_extent_inline_ref_type(struct 
extent_buffer *eb,
 enum btrfs_inline_ref_type is_data)
 {
int type = btrfs_extent_inline_ref_type(eb, iref);
+   u64 offset = btrfs_extent_inline_ref_offset(eb, iref);
 
if (type == BTRFS_TREE_BLOCK_REF_KEY ||
type == BTRFS_SHARED_BLOCK_REF_KEY ||
type == BTRFS_SHARED_DATA_REF_KEY ||
type == BTRFS_EXTENT_DATA_REF_KEY) {
if (is_data == BTRFS_REF_TYPE_BLOCK) {
-   if (type == BTRFS_TREE_BLOCK_REF_KEY ||
-   type == BTRFS_SHARED_BLOCK_REF_KEY)
+   if (type == BTRFS_TREE_BLOCK_REF_KEY)
return type;
+   if (type == BTRFS_SHARED_BLOCK_REF_KEY) {
+   ASSERT(eb->fs_info);
+   /*
+* Every shared one has parent tree
+* block, which must be aligned to
+* nodesize.
+*/
+   if (offset &&
+   IS_ALIGNED(offset, eb->fs_info->nodesize))
+   return type;
+   }
} else if (is_data == BTRFS_REF_TYPE_DATA) {
-   if (type == BTRFS_EXTENT_DATA_REF_KEY ||
-   type == BTRFS_SHARED_DATA_REF_KEY)
+   if (type == BTRFS_EXTENT_DATA_REF_KEY)
return type;
+   if (type == BTRFS_SHARED_DATA_REF_KEY) {
+   ASSERT(eb->fs_info);
+   /*
+* Every shared one has parent tree
+* block, which must be aligned to
+* nodesize.
+*/
+   if (offset &&
+   IS_ALIGNED(offset, eb->fs_info->nodesize))
+   return type;
+   }
} else {
ASSERT(is_data == BTRFS_REF_TYPE_ANY);
return type;
diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index 0e52e47..9f8c5ee 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -44,7 +44,7 @@ static void print_dev_item(struct extent_buffer *eb,
 static void print_extent_data_ref(struct extent_buffer *eb,
  struct btrfs_extent_data_ref *ref)
 {
-   pr_info("\t\textent data backref root %llu objectid %llu offset %llu 
count %u\n",
+   pr_cont("extent data backref root %llu objectid %llu offset %llu count 
%u\n",
   btrfs_extent_data_ref_root(eb, ref),
   btrfs_extent_data_ref_objectid(eb, ref),
   btrfs_extent_data_ref_offset(eb, ref),
@@ -63,6 +63,7 @@ static void print_extent_item(struct extent_buffer *eb, int 
slot, int type)
u32 item_size = btrfs_item_size_nr(eb, slot);
u64 flags;
u64 offset;
+   int ref_index = 0;
 
if (item_size < sizeof(*ei)) {
 #ifdef BTRFS_COMPAT_EXTENT_TREE_V0
@@ -104,12 +105,20 @@ static void print_extent_item(struct extent_buffer *eb, 
int slot, int type)
iref = (struct btrfs_extent_inline_ref *)ptr;
type = btrfs_extent_inline_ref_type(eb, iref);
offset = btrfs_extent_inline_ref_offset(eb, iref);
+   pr_info("\t\tref#%d: ", ref_index++);
switch (type) {
case BTRFS_TREE_BLOCK_REF_KEY:
-   pr_info("\t\ttree block backref root %llu\n", offset);
+   pr_cont("tree block backref root %llu\n", offset);
break;
case BTRFS_SHARED_BLOCK_REF_KEY:
-   pr_info("\t\tshared block backref parent %llu\n", 
offset);
+   pr_cont("shared block backref parent %llu\n", offset);
+

[PATCH v3 0/7] add sanity check for extent inline ref type

2017-08-18 Thread Liu Bo
An invalid extent inline ref type could be read from a btrfs image and
it ends up with a panic[1], this set is to deal with the insane value
gracefully in patch 1-2 and clean up BUG() in the code in patch 3-6.

Patch 7 adds one more check to see if the ref is a valid shared one.

I'm not sure in the real world what may result in this corruption, but
I've seen several reports on the ML about __btrfs_free_extent saying
something was missing (or simply wrong), while testing this set with
btrfs-corrupt-block, I found that switching ref type could end up that
situation as well, eg. a data extent's ref type
(BTRFS_EXTENT_DATA_REF_KEY) is switched to (BTRFS_TREE_BLOCK_REF_KEY).
Hopefully this can give people more sights next time when that
happens.

[1]:https://www.spinics.net/lists/linux-btrfs/msg65646.html

v3:
- btrfs_inline_ref_types -> btrfs_inline_ref_type
- convert WARN(1) to btrfs_err so that we know which btrfs has that error.

v2:
- add enum type and return BTRFS_REF_TYPE_INVALID instead of -EINVAL.
- remove one more BUG_ON which is in __add_tree_block.
- add validation check for shared refs.
- improve btrfs_print_leaf to show which refs has something wrong.

Liu Bo (7):
  Btrfs: add a helper to retrive extent inline ref type
  Btrfs: convert to use btrfs_get_extent_inline_ref_type
  Btrfs: remove BUG() in btrfs_extent_inline_ref_size
  Btrfs: remove BUG() in print_extent_item
  Btrfs: remove BUG() in add_data_reference
  Btrfs: remove BUG_ON in __add_tree_block
  Btrfs: add one more sanity check for shared ref type

 fs/btrfs/backref.c | 11 --
 fs/btrfs/ctree.h   | 12 ++-
 fs/btrfs/extent-tree.c | 94 ++
 fs/btrfs/print-tree.c  | 28 ---
 fs/btrfs/relocation.c  | 30 +---
 5 files changed, 157 insertions(+), 18 deletions(-)

-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/7] Btrfs: convert to use btrfs_get_extent_inline_ref_type

2017-08-18 Thread Liu Bo
Since we have a helper which can do sanity check, this converts all
btrfs_extent_inline_ref_type to it.

Signed-off-by: Liu Bo 
---
 fs/btrfs/backref.c | 11 +--
 fs/btrfs/extent-tree.c | 36 ++--
 fs/btrfs/relocation.c  | 13 +++--
 3 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index f723c11..1b3d9df 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -1012,7 +1012,11 @@ static int __add_inline_refs(struct btrfs_path *path, 
u64 bytenr,
int type;
 
iref = (struct btrfs_extent_inline_ref *)ptr;
-   type = btrfs_extent_inline_ref_type(leaf, iref);
+   type = btrfs_get_extent_inline_ref_type(leaf, iref,
+   BTRFS_REF_TYPE_ANY);
+   if (type == BTRFS_REF_TYPE_INVALID)
+   return -EINVAL;
+
offset = btrfs_extent_inline_ref_offset(leaf, iref);
 
switch (type) {
@@ -1908,7 +1912,10 @@ static int __get_extent_inline_ref(unsigned long *ptr, 
struct extent_buffer *eb,
 
end = (unsigned long)ei + item_size;
*out_eiref = (struct btrfs_extent_inline_ref *)(*ptr);
-   *out_type = btrfs_extent_inline_ref_type(eb, *out_eiref);
+   *out_type = btrfs_get_extent_inline_ref_type(eb, *out_eiref,
+BTRFS_REF_TYPE_ANY);
+   if (*out_type == BTRFS_REF_TYPE_INVALID)
+   return -EINVAL;
 
*ptr += btrfs_extent_inline_ref_size(*out_type);
WARN_ON(*ptr > end);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index c540578..d75129a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1454,12 +1454,18 @@ static noinline u32 extent_data_ref_count(struct 
btrfs_path *path,
struct btrfs_extent_data_ref *ref1;
struct btrfs_shared_data_ref *ref2;
u32 num_refs = 0;
+   int type;
 
leaf = path->nodes[0];
btrfs_item_key_to_cpu(leaf, , path->slots[0]);
if (iref) {
-   if (btrfs_extent_inline_ref_type(leaf, iref) ==
-   BTRFS_EXTENT_DATA_REF_KEY) {
+   /*
+* If type is invalid, we should have bailed out earlier than
+* this call.
+*/
+   type = btrfs_get_extent_inline_ref_type(leaf, iref, 
BTRFS_REF_TYPE_DATA);
+   ASSERT(type != BTRFS_REF_TYPE_INVALID);
+   if (type == BTRFS_EXTENT_DATA_REF_KEY) {
ref1 = (struct btrfs_extent_data_ref *)(>offset);
num_refs = btrfs_extent_data_ref_count(leaf, ref1);
} else {
@@ -1620,6 +1626,7 @@ int lookup_inline_extent_backref(struct 
btrfs_trans_handle *trans,
int ret;
int err = 0;
bool skinny_metadata = btrfs_fs_incompat(fs_info, SKINNY_METADATA);
+   int needed;
 
key.objectid = bytenr;
key.type = BTRFS_EXTENT_ITEM_KEY;
@@ -1711,6 +1718,11 @@ int lookup_inline_extent_backref(struct 
btrfs_trans_handle *trans,
BUG_ON(ptr > end);
}
 
+   if (owner >= BTRFS_FIRST_FREE_OBJECTID)
+   needed = BTRFS_REF_TYPE_DATA;
+   else
+   needed = BTRFS_REF_TYPE_BLOCK;
+
err = -ENOENT;
while (1) {
if (ptr >= end) {
@@ -1718,7 +1730,12 @@ int lookup_inline_extent_backref(struct 
btrfs_trans_handle *trans,
break;
}
iref = (struct btrfs_extent_inline_ref *)ptr;
-   type = btrfs_extent_inline_ref_type(leaf, iref);
+   type = btrfs_get_extent_inline_ref_type(leaf, iref, needed);
+   if (type == BTRFS_REF_TYPE_INVALID) {
+   err = -EINVAL;
+   goto out;
+   }
+
if (want < type)
break;
if (want > type) {
@@ -1910,7 +1927,12 @@ void update_inline_extent_backref(struct btrfs_fs_info 
*fs_info,
if (extent_op)
__run_delayed_extent_op(extent_op, leaf, ei);
 
-   type = btrfs_extent_inline_ref_type(leaf, iref);
+   /*
+* If type is invalid, we should have bailed out after
+* lookup_inline_extent_backref().
+*/
+   type = btrfs_get_extent_inline_ref_type(leaf, iref, BTRFS_REF_TYPE_ANY);
+   ASSERT(type != BTRFS_REF_TYPE_INVALID);
 
if (type == BTRFS_EXTENT_DATA_REF_KEY) {
dref = (struct btrfs_extent_data_ref *)(>offset);
@@ -3195,6 +3217,7 @@ static noinline int check_committed_ref(struct btrfs_root 
*root,
struct btrfs_extent_item *ei;
struct btrfs_key key;
u32 item_size;
+   int type;
int ret;
 
key.objectid = bytenr;
@@ -3236,8 +3259,9 @@ static noinline int check_committed_ref(struct btrfs_root 
*root,
  

[PATCH v3 6/7] Btrfs: remove BUG_ON in __add_tree_block

2017-08-18 Thread Liu Bo
The BUG_ON() can be triggered when the caller is processing an invalid
extent inline ref, e.g.

a shared data ref is offered instead of a extent data ref, such that
it tries to find a non-exist tree block and then btrfs_search_slot
returns 1 for no such item.

This replaces the BUG_ON() with a WARN() followed by calling
btrfs_print_leaf() to show more details about what's going on and
returning -EINVAL to upper callers.

Signed-off-by: Liu Bo 
---
 fs/btrfs/relocation.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 53fc798..71d38dd 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -32,6 +32,7 @@
 #include "free-space-cache.h"
 #include "inode-map.h"
 #include "qgroup.h"
+#include "print-tree.h"
 
 /*
  * backref_node, mapping_node and tree_block start with this
@@ -3485,7 +3486,16 @@ static int __add_tree_block(struct reloc_control *rc,
goto again;
}
}
-   BUG_ON(ret);
+   if (ret) {
+   ASSERT(ret == 1);
+   btrfs_print_leaf(rc->extent_root->fs_info, path->nodes[0]);
+   btrfs_err(fs_info,
+"tree block extent item (%llu) is not found in extent tree",
+bytenr);
+   WARN_ON(1);
+   ret = -EINVAL;
+   goto out;
+   }
 
ret = add_tree_block(rc, , path, blocks);
 out:
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 4/7] Btrfs: remove BUG() in print_extent_item

2017-08-18 Thread Liu Bo
btrfs_print_leaf() is used in btrfs_get_extent_inline_ref_type, so
here we really want to print the invalid value of ref type instead of
causing a kernel panic.

Signed-off-by: Liu Bo 
---
 fs/btrfs/print-tree.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index fcae61e..0e52e47 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -121,7 +121,10 @@ static void print_extent_item(struct extent_buffer *eb, 
int slot, int type)
   offset, btrfs_shared_data_ref_count(eb, sref));
break;
default:
-   BUG();
+   btrfs_err(eb->fs_info,
+ "extent %llu has invalid ref type %d\n",
+ eb->start, type);
+   return;
}
ptr += btrfs_extent_inline_ref_size(type);
}
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 5/7] Btrfs: remove BUG() in add_data_reference

2017-08-18 Thread Liu Bo
Now that we have a helper to report invalid value of extent inline ref
type, we need to quit gracefully instead of throwing out a kernel panic.

Signed-off-by: Liu Bo 
---
 fs/btrfs/relocation.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index f0bef3c..53fc798 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3774,7 +3774,10 @@ int add_data_references(struct reloc_control *rc,
ret = find_data_references(rc, extent_key,
   eb, dref, blocks);
} else {
-   BUG();
+   ret = -EINVAL;
+   btrfs_err(rc->extent_root->fs_info,
+"extent %llu slot %d has an invalid inline ref type",
+eb->start, path->slots[0]);
}
if (ret) {
err = ret;
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/7] Btrfs: add a helper to retrive extent inline ref type

2017-08-18 Thread Liu Bo
An invalid value of extent inline ref type may be read from a
malicious image which may force btrfs to crash.

This adds a helper which does sanity check for the ref type, so we can
know if it's sane, return type if so, otherwise return an error.

Signed-off-by: Liu Bo 
---
 fs/btrfs/ctree.h   | 11 +++
 fs/btrfs/extent-tree.c | 37 +
 2 files changed, 48 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3f3eb7b..714e779 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2561,6 +2561,17 @@ static inline gfp_t btrfs_alloc_write_mask(struct 
address_space *mapping)
 
 /* extent-tree.c */
 
+enum btrfs_inline_ref_type {
+   BTRFS_REF_TYPE_INVALID = 0,
+   BTRFS_REF_TYPE_BLOCK =   1,
+   BTRFS_REF_TYPE_DATA =2,
+   BTRFS_REF_TYPE_ANY = 3,
+};
+
+int btrfs_get_extent_inline_ref_type(struct extent_buffer *eb,
+struct btrfs_extent_inline_ref *iref,
+enum btrfs_inline_ref_type is_data);
+
 u64 btrfs_csum_bytes_to_leaves(struct btrfs_fs_info *fs_info, u64 csum_bytes);
 
 static inline u64 btrfs_calc_trans_metadata_size(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e3b0b41..c540578 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1148,6 +1148,43 @@ static int convert_extent_item_v0(struct 
btrfs_trans_handle *trans,
 }
 #endif
 
+/*
+ * is_data == BTRFS_REF_TYPE_BLOCK, tree block type is required,
+ * is_data == BTRFS_REF_TYPE_DATA, data type is requried,
+ * is_data == BTRFS_REF_TYPE_ANY, either type is OK.
+ */
+int btrfs_get_extent_inline_ref_type(struct extent_buffer *eb,
+struct btrfs_extent_inline_ref *iref,
+enum btrfs_inline_ref_type is_data)
+{
+   int type = btrfs_extent_inline_ref_type(eb, iref);
+
+   if (type == BTRFS_TREE_BLOCK_REF_KEY ||
+   type == BTRFS_SHARED_BLOCK_REF_KEY ||
+   type == BTRFS_SHARED_DATA_REF_KEY ||
+   type == BTRFS_EXTENT_DATA_REF_KEY) {
+   if (is_data == BTRFS_REF_TYPE_BLOCK) {
+   if (type == BTRFS_TREE_BLOCK_REF_KEY ||
+   type == BTRFS_SHARED_BLOCK_REF_KEY)
+   return type;
+   } else if (is_data == BTRFS_REF_TYPE_DATA) {
+   if (type == BTRFS_EXTENT_DATA_REF_KEY ||
+   type == BTRFS_SHARED_DATA_REF_KEY)
+   return type;
+   } else {
+   ASSERT(is_data == BTRFS_REF_TYPE_ANY);
+   return type;
+   }
+   }
+
+   btrfs_print_leaf(eb->fs_info, eb);
+   btrfs_err(eb->fs_info, "eb %llu invalid extent inline ref type %d",
+ eb->start, type);
+   WARN_ON(1);
+
+   return BTRFS_REF_TYPE_INVALID;
+}
+
 static u64 hash_extent_data_ref(u64 root_objectid, u64 owner, u64 offset)
 {
u32 high_crc = ~(u32)0;
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS error (device sda4): failed to read chunk tree: -5

2017-08-18 Thread Zirconium Hacker
Ok, so since it's clear now that I need that 5 GB device to be
present... I found the image file.  But how do I get BTRFS to
recognize the image as a device?  I have zero experience with
multi-device systems.  Setting it up as a loop device doesn't fix
mounting, and wipefs doesn't detect the BTRFS magic number, but
printing some of it to console shows it does have real data.  Writing
the magic number onto it (it's a copy of the original to be safe)
shows in dump-super, but all other values are zero.

I tried sending the above on my phone earlier but it was detected as a
"virus" because it contained HTML.  Whoops.

On Fri, Aug 18, 2017 at 11:00 AM, Chris Murphy  wrote:
> On Fri, Aug 18, 2017 at 2:47 AM, Zirconium Hacker  
> wrote:
>
>> I vaguely remember following this guide at some point:
>> http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
>> -- specifically the "Balance cannot run because the filesystem is
>> full" part.  This may have broken some things?
>>
>
> If you don't do 'btrfs device delete /dev/loop0' or if that command
> does not complete, then it's possible to get into the situation you're
> in.
>
> Have you ever mounted this file system with -o degraded?
>
> I'm going to guess the history is something like:
> 1. enospc
> 2. btrfs dev add
> 3. some kind of filtered balance, which only causes data block groups
> to be moved to the 2nd device
> 4. 2nd device is physically removed without first 'btrfs dev del'
>
> Zirco's superblock very clearly says num_devices  2, so I'd expect
> normal mount to always fail unless both devices are present. Is there
> some weird edge case where Btrfs might permit non-degraded mount when
> only data bg's are on a 2nd device? And then trouble only happens
> later when a balance is done and it goes looking for these bg's? And
> then, boom!
>
> --
> Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: change how we decide to commit transactions during flushing

2017-08-18 Thread josef
From: Josef Bacik 

Nikolay reported that generic/273 was failing currently with ENOSPC.
Turns out this is because we get to the point where the outstanding
reservations are greater than the pinned space on the fs.  This is a
mistake, previously we used the current reservation amount in
may_commit_transaction, not the entire outstanding reservation amount.
Fix this to find the minimum byte size needed to make progress in
flushing, and pass that into may_commit_transaction.  From there we can
make a smarter decision on whether to commit the transaction or not.
This fixes the failure in generic/273.

Reported-by: Nikolay Borisov 
Signed-off-by: Josef Bacik 
---
 fs/btrfs/extent-tree.c | 40 +---
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fac42b8..4509e47 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4872,8 +4872,12 @@ static int may_commit_transaction(struct btrfs_fs_info 
*fs_info,
return -ENOSPC;
 
spin_lock(_rsv->lock);
+   if (delayed_rsv->size > bytes)
+   bytes = 0;
+   else
+   bytes -= delayed_rsv->size;
if (percpu_counter_compare(_info->total_bytes_pinned,
-  bytes - delayed_rsv->size) < 0) {
+  bytes) < 0) {
spin_unlock(_rsv->lock);
return -ENOSPC;
}
@@ -4901,7 +4905,7 @@ struct reserve_ticket {
  */
 static void flush_space(struct btrfs_fs_info *fs_info,
   struct btrfs_space_info *space_info, u64 num_bytes,
-  int state)
+  u64 min_bytes, int state)
 {
struct btrfs_root *root = fs_info->extent_root;
struct btrfs_trans_handle *trans;
@@ -4944,7 +4948,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
break;
case COMMIT_TRANS:
ret = may_commit_transaction(fs_info, space_info,
-num_bytes, 0);
+min_bytes, 0);
break;
default:
ret = -ENOSPC;
@@ -4959,17 +4963,23 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 static inline u64
 btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info,
 struct btrfs_space_info *space_info,
-bool system_chunk)
+u64 *min_reclaim, bool system_chunk)
 {
struct reserve_ticket *ticket;
u64 used;
u64 expected;
u64 to_reclaim = 0;
 
-   list_for_each_entry(ticket, _info->tickets, list)
+   list_for_each_entry(ticket, _info->tickets, list) {
+   if (min_reclaim && *min_reclaim == 0)
+   *min_reclaim = ticket->bytes;
to_reclaim += ticket->bytes;
-   list_for_each_entry(ticket, _info->priority_tickets, list)
+   }
+   list_for_each_entry(ticket, _info->priority_tickets, list) {
+   if (min_reclaim && *min_reclaim == 0)
+   *min_reclaim = ticket->bytes;
to_reclaim += ticket->bytes;
+   }
if (to_reclaim)
return to_reclaim;
 
@@ -4992,6 +5002,8 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info 
*fs_info,
to_reclaim = 0;
to_reclaim = min(to_reclaim, space_info->bytes_may_use +
 space_info->bytes_reserved);
+   if (min_reclaim)
+   *min_reclaim = to_reclaim;
return to_reclaim;
 }
 
@@ -5005,7 +5017,7 @@ static inline int need_do_async_reclaim(struct 
btrfs_fs_info *fs_info,
if ((space_info->bytes_used + space_info->bytes_reserved) >= thresh)
return 0;
 
-   if (!btrfs_calc_reclaim_metadata_size(fs_info, space_info,
+   if (!btrfs_calc_reclaim_metadata_size(fs_info, space_info, NULL,
  system_chunk))
return 0;
 
@@ -5038,13 +5050,14 @@ static void btrfs_async_reclaim_metadata_space(struct 
work_struct *work)
int flush_state;
int commit_cycles = 0;
u64 last_tickets_id;
+   u64 min_reclaim = 0;
 
fs_info = container_of(work, struct btrfs_fs_info, async_reclaim_work);
space_info = __find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
 
spin_lock(_info->lock);
to_reclaim = btrfs_calc_reclaim_metadata_size(fs_info, space_info,
- false);
+ _reclaim, false);
if (!to_reclaim) {
space_info->flush = 0;
spin_unlock(_info->lock);
@@ -5055,15 +5068,18 @@ static void btrfs_async_reclaim_metadata_space(struct 
work_struct *work)
 

Re: [PATCH] btrfs: submit superblock io with REQ_META

2017-08-18 Thread Liu Bo
On Fri, Aug 18, 2017 at 06:21:53PM +0200, David Sterba wrote:
> The superblock is also metadata of the filesystem so the relevant IO
> should be tagged as such.
> 

Reviewed-by: Liu Bo 

-liubo
> Signed-off-by: David Sterba 
> ---
>  fs/btrfs/disk-io.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 90b967ae46d0..21b55e8caf33 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3433,9 +3433,11 @@ static int write_dev_supers(struct btrfs_device 
> *device,
>*/
>   if (i == 0) {
>   ret = btrfsic_submit_bh(REQ_OP_WRITE,
> - REQ_SYNC | REQ_FUA, bh);
> + REQ_SYNC | REQ_FUA | REQ_META,
> + bh);
>   } else {
> - ret = btrfsic_submit_bh(REQ_OP_WRITE, REQ_SYNC, bh);
> + ret = btrfsic_submit_bh(REQ_OP_WRITE,
> + REQ_SYNC | REQ_META, bh);
>   }
>   if (ret)
>   errors++;
> -- 
> 2.14.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: do not async submit for nodatacsum inodes

2017-08-18 Thread Liu Bo
While we submit direct writes, if the inode is flagged with nodatasum,
there's no benefit to submit asynchronously, because

a) we don't have to calculate checksum across processors,

b) and direct IO has started a plug, but async submit makes us queue
IO on each device's scheduled IO list instead of DIO's plug list, so
that IOs get much less merges in general.

Lets use sync submit for nodatasum inodes.

Signed-off-by: Liu Bo 
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 95c2120..f4a48d8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8467,7 +8467,7 @@ static inline int __btrfs_submit_dio_bio(struct bio *bio, 
struct inode *inode,
goto err;
}
 map:
-   ret = btrfs_map_bio(fs_info, bio, 0, async_submit);
+   ret = btrfs_map_bio(fs_info, bio, 0, 0);
 err:
bio_put(bio);
return ret;
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: remove unnecessary skip_sum

2017-08-18 Thread Liu Bo
Now that we can get inode from dip->inode, it's not necessary to pass
skip_sum from callers.

Signed-off-by: Liu Bo 
---
 fs/btrfs/inode.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f4a48d8..91e701c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8424,8 +8424,7 @@ static inline blk_status_t 
btrfs_lookup_and_bind_dio_csum(struct inode *inode,
 }
 
 static inline int __btrfs_submit_dio_bio(struct bio *bio, struct inode *inode,
-u64 file_offset, int skip_sum,
-int async_submit)
+u64 file_offset, int async_submit)
 {
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct btrfs_dio_private *dip = bio->bi_private;
@@ -8443,7 +8442,7 @@ static inline int __btrfs_submit_dio_bio(struct bio *bio, 
struct inode *inode,
goto err;
}
 
-   if (skip_sum)
+   if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
goto map;
 
if (write && async_submit) {
@@ -8473,8 +8472,7 @@ static inline int __btrfs_submit_dio_bio(struct bio *bio, 
struct inode *inode,
return ret;
 }
 
-static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip,
-   int skip_sum)
+static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
 {
struct inode *inode = dip->inode;
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -8537,7 +8535,7 @@ static int btrfs_submit_direct_hook(struct 
btrfs_dio_private *dip,
 */
atomic_inc(>pending_bios);
 
-   ret = __btrfs_submit_dio_bio(bio, inode, file_offset, skip_sum,
+   ret = __btrfs_submit_dio_bio(bio, inode, file_offset,
 async_submit);
if (ret) {
bio_put(bio);
@@ -8557,8 +8555,7 @@ static int btrfs_submit_direct_hook(struct 
btrfs_dio_private *dip,
} while (submit_len > 0);
 
 submit:
-   ret = __btrfs_submit_dio_bio(bio, inode, file_offset, skip_sum,
-async_submit);
+   ret = __btrfs_submit_dio_bio(bio, inode, file_offset, async_submit);
if (!ret)
return 0;
 
@@ -8583,12 +8580,9 @@ static void btrfs_submit_direct(struct bio *dio_bio, 
struct inode *inode,
struct btrfs_dio_private *dip = NULL;
struct bio *bio = NULL;
struct btrfs_io_bio *io_bio;
-   int skip_sum;
bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
int ret = 0;
 
-   skip_sum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
-
bio = btrfs_bio_clone(dio_bio);
 
dip = kzalloc(sizeof(*dip), GFP_NOFS);
@@ -8631,7 +8625,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, 
struct inode *inode,
dio_data->unsubmitted_oe_range_end;
}
 
-   ret = btrfs_submit_direct_hook(dip, skip_sum);
+   ret = btrfs_submit_direct_hook(dip);
if (!ret)
return;
 
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: make plug in writing meta blocks really work

2017-08-18 Thread Liu Bo
We have started plug in btrfs_write_and_wait_marked_extents() but the
generated IOs actually go to device's schedule IO list where the work
is doing in another task, thus the started plug doesn't make any
sense.

And since we wait for IOs immediately after writing meta blocks, it's
the same case as writing log tree, doing sync submit can merge more
IOs.

Signed-off-by: Liu Bo 
---
 fs/btrfs/disk-io.c | 6 --
 fs/btrfs/transaction.c | 2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 080e2eb..8d097ba 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1005,8 +1005,10 @@ static blk_status_t __btree_submit_bio_done(void 
*private_data, struct bio *bio,
return ret;
 }
 
-static int check_async_write(unsigned long bio_flags)
+static int check_async_write(struct btrfs_inode *bi, unsigned long bio_flags)
 {
+   if (atomic_read(>sync_writers))
+   return 0;
if (bio_flags & EXTENT_BIO_TREE_LOG)
return 0;
 #ifdef CONFIG_X86
@@ -1022,7 +1024,7 @@ static blk_status_t btree_submit_bio_hook(void 
*private_data, struct bio *bio,
 {
struct inode *inode = private_data;
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-   int async = check_async_write(bio_flags);
+   int async = check_async_write(BTRFS_I(inode), bio_flags);
blk_status_t ret;
 
if (bio_op(bio) != REQ_OP_WRITE) {
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index f615d59..9c5f126 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -950,6 +950,7 @@ int btrfs_write_marked_extents(struct btrfs_fs_info 
*fs_info,
u64 start = 0;
u64 end;
 
+   atomic_inc(_I(fs_info->btree_inode)->sync_writers);
while (!find_first_extent_bit(dirty_pages, start, , ,
  mark, _state)) {
bool wait_writeback = false;
@@ -985,6 +986,7 @@ int btrfs_write_marked_extents(struct btrfs_fs_info 
*fs_info,
cond_resched();
start = end + 1;
}
+   atomic_dec(_I(fs_info->btree_inode)->sync_writers);
return werr;
 }
 
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qcow2 images make scrub believe the filesystem is corrupted.

2017-08-18 Thread Paulo Dias
HI all..

So this sucks, having to check every single disk image so it wont get
corrupted stuff, while it works on other filesystems isn't exactly
stellar.

i changed the qcow2 settings per this page:
https://pve.proxmox.com/wiki/Performance_Tweaks

i went with cache=writeback since it neither uses O_DSYNC nor O_DIRECT
semantics.

I understand the reasons checksumming might fail and that isn't
exactly btrfs fault but i scoured the entire btrfs wiki and couldnt
find anything warning about btrfs + qcow2 (or other image types) in
the wiki pages. Maybe adding some warning would help unlucky ppl like
myself?

Also, is is safe to enable compress=lzo , or is it also a no no, im
also starting to suspect that discard wasnt the culprit since i have
it enabled for more then a year and the only problem i got with
corruption was precisely this images?
| Paulo Dias
| paulo.miguel.d...@gmail.com

Tempora mutantur, nos et mutamur in illis.


On Fri, Aug 18, 2017 at 2:59 PM, Liu Bo  wrote:
> On Fri, Aug 18, 2017 at 06:23:18PM +0200, Goffredo Baroncelli wrote:
>> On 08/18/2017 01:39 AM, Josef Bacik wrote:
>> [...]
>> > This is happening because the app (the guest OS in this case, we saw this 
>> > a lot
>> > with windows guests) is changing the pages while they are in flight.  We
>> > calculate the checksum of the page before it's written, so if it changes 
>> > while
>> > in flight we'll end up with a csum mismatch.
>> >
>> > To fix this change kvm to not use O_DIRECT or set NODATASUM on your qcow2 
>> > image.
>> > You'll have to re-create the image because NODATASUM won't apply to the 
>> > already
>> > invalid checksums.  Thanks,
>>
>> Hi Josef,
>>
>> could you elaborate: do you are saying that using O_DIRECT is incompatible 
>> with DATASUM ?
>>
>
> They're compatible, but applications need to be careful.
>
> O_DIRECT takes userspace page, and it works like
> DIO write
>   p = get_user_page();
>   add p to bio
>   #btrfs submits this bio
>   calc_checksum(bio);
>   submit_bio();
>
> There's a chance that page p got changed between calc_checksum() and
> submit_bio(), which then causes the mismatch.
>
> For buffered IO, dirty page cache pages is synchronized with page
> fault by page lock and page writeback bit.
>
> thanks,
> -liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qcow2 images make scrub believe the filesystem is corrupted.

2017-08-18 Thread Liu Bo
On Fri, Aug 18, 2017 at 06:23:18PM +0200, Goffredo Baroncelli wrote:
> On 08/18/2017 01:39 AM, Josef Bacik wrote:
> [...]
> > This is happening because the app (the guest OS in this case, we saw this a 
> > lot
> > with windows guests) is changing the pages while they are in flight.  We
> > calculate the checksum of the page before it's written, so if it changes 
> > while
> > in flight we'll end up with a csum mismatch.
> > 
> > To fix this change kvm to not use O_DIRECT or set NODATASUM on your qcow2 
> > image.
> > You'll have to re-create the image because NODATASUM won't apply to the 
> > already
> > invalid checksums.  Thanks,
> 
> Hi Josef,
> 
> could you elaborate: do you are saying that using O_DIRECT is incompatible 
> with DATASUM ?
>

They're compatible, but applications need to be careful.

O_DIRECT takes userspace page, and it works like
DIO write
  p = get_user_page();
  add p to bio
  #btrfs submits this bio
  calc_checksum(bio);
  submit_bio();

There's a chance that page p got changed between calc_checksum() and
submit_bio(), which then causes the mismatch.

For buffered IO, dirty page cache pages is synchronized with page
fault by page lock and page writeback bit.

thanks,
-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qcow2 images make scrub believe the filesystem is corrupted.

2017-08-18 Thread Josef Bacik
On Fri, Aug 18, 2017 at 06:23:18PM +0200, Goffredo Baroncelli wrote:
> On 08/18/2017 01:39 AM, Josef Bacik wrote:
> [...]
> > This is happening because the app (the guest OS in this case, we saw this a 
> > lot
> > with windows guests) is changing the pages while they are in flight.  We
> > calculate the checksum of the page before it's written, so if it changes 
> > while
> > in flight we'll end up with a csum mismatch.
> > 
> > To fix this change kvm to not use O_DIRECT or set NODATASUM on your qcow2 
> > image.
> > You'll have to re-create the image because NODATASUM won't apply to the 
> > already
> > invalid checksums.  Thanks,
> 
> Hi Josef,
> 
> could you elaborate: do you are saying that using O_DIRECT is incompatible 
> with DATASUM ?
> 

No, I'm saying using O_DIRECT with applications that don't protect in-flight
memory are incompatible with DATASUM.  We have no way of making sure nobody
touches the page while we're writing it out, so after we calculate the checksum
any changes to the page are going to cause a checksum mismatch.  O_DIRECT are
user space pages, there's nothing we can do to stop user space from doing stupid
things.

The options I looked into before were things like detecting the page had changed
since we calculated the checksum, and re-submitting the write.  This punishes
applications that do the right thing (databases for example) by forcing us to
calculate checksums twice.

This is a shit situation because users aren't going to understand this
limitation, and it bites them in the ass with all these weird errors.  I think
maybe we need to go back to the double-checksum thing by default, and have a
flag or something for users to set if they know their application behaves
properly.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qcow2 images make scrub believe the filesystem is corrupted.

2017-08-18 Thread Goffredo Baroncelli
On 08/18/2017 01:39 AM, Josef Bacik wrote:
[...]
> This is happening because the app (the guest OS in this case, we saw this a 
> lot
> with windows guests) is changing the pages while they are in flight.  We
> calculate the checksum of the page before it's written, so if it changes while
> in flight we'll end up with a csum mismatch.
> 
> To fix this change kvm to not use O_DIRECT or set NODATASUM on your qcow2 
> image.
> You'll have to re-create the image because NODATASUM won't apply to the 
> already
> invalid checksums.  Thanks,

Hi Josef,

could you elaborate: do you are saying that using O_DIRECT is incompatible with 
DATASUM ?

Let me know

BR
G.Baroncelli

> 
> Josef 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: submit superblock io with REQ_META

2017-08-18 Thread David Sterba
The superblock is also metadata of the filesystem so the relevant IO
should be tagged as such.

Signed-off-by: David Sterba 
---
 fs/btrfs/disk-io.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 90b967ae46d0..21b55e8caf33 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3433,9 +3433,11 @@ static int write_dev_supers(struct btrfs_device *device,
 */
if (i == 0) {
ret = btrfsic_submit_bh(REQ_OP_WRITE,
-   REQ_SYNC | REQ_FUA, bh);
+   REQ_SYNC | REQ_FUA | REQ_META,
+   bh);
} else {
-   ret = btrfsic_submit_bh(REQ_OP_WRITE, REQ_SYNC, bh);
+   ret = btrfsic_submit_bh(REQ_OP_WRITE,
+   REQ_SYNC | REQ_META, bh);
}
if (ret)
errors++;
-- 
2.14.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH preview] btrfs: allow to set compression level for zlib

2017-08-18 Thread David Sterba
On Sat, Aug 05, 2017 at 02:15:42AM +, Nick Terrell wrote:
> Looking at the memory usage of BtrFS zstd, the 128 KB window size keeps the
> memory usage very reasonable up to level 19. The zstd compression levels
> are computed using a tool that selects the parameters that give the best
> compression ratio for a given compression speed target. Since BtrFS has a
> fixed window size, the default compression levels might not be optimal. We
> could compute our own compression levels for a 128 KB window size.
> 
> | Level | Memory |
> |---||
> | 1 | 0.8 MB |
> | 2 | 1.0 MB |
> | 3 | 1.3 MB |
> | 4 | 0.9 MB |
> | 5 | 1.4 MB |
> | 6 | 1.5 MB |
> | 7 | 1.4 MB |
> | 8 | 1.8 MB |
> | 9 | 1.8 MB |
> | 10| 1.8 MB |
> | 11| 1.8 MB |
> | 12| 1.8 MB |
> | 13| 2.4 MB |
> | 14| 2.6 MB |
> | 15| 2.6 MB |
> | 16| 3.1 MB |
> | 17| 3.1 MB |
> | 18| 3.1 MB |
> | 19| 3.1 MB |
> 
> The workspace memory usage for each compression level.

That's quite a lot, in kernel. IIRC zlib and lzo use less than 200kb,
zstd wants 800kb for level 1. And this needs to be contiguous memory, so
if we're lucky and get the memory at the mount time, fine. In general
the memory can be fragmented (in the worst case, there are only 4k
chunks available), so we'd have to vmalloc and consume the virtual
mappings in great numbers.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH preview] btrfs: allow to set compression level for zlib

2017-08-18 Thread David Sterba
On Fri, Aug 04, 2017 at 09:51:44PM +, Nick Terrell wrote:
> > + * @type_level is encoded algorithm and level, where level 0 means whatever
> > + * default the algorithm chooses and is opaque here;
> > + * - compression algo are 0-3
> > + * - the level are bits 4-7
> 
> zstd has 19 levels, but we can either only allow the first 15 + default, or
> provide a mapping from zstd-level to BtrFS zstd-level.

19 levels sounds too much to me, I hoped that 15 should be enough for
everybody. When I teste various zlib level, there were only small
compression gains at a high runtime cost from levels 6-9. So some kind
of mapping would be desirable if the levels 16+ prove to be better than
< 15 under the btrfs contraints.

> >   * @out_pages is an in/out parameter, holds maximum number of pages to 
> > allocate
> >   * and returns number of actually allocated pages
> >   *
> > @@ -880,7 +885,7 @@ static void free_workspaces(void)
> >   * @max_out tells us the max number of bytes that we're allowed to
> >   * stuff into pages
> >   */
> > -int btrfs_compress_pages(int type, struct address_space *mapping,
> > +int btrfs_compress_pages(unsigned int type_level, struct address_space 
> > *mapping,
> >  u64 start, struct page **pages,
> >  unsigned long *out_pages,
> >  unsigned long *total_in,
> > @@ -888,9 +893,11 @@ int btrfs_compress_pages(int type, struct 
> > address_space *mapping,
> >  {
> > struct list_head *workspace;
> > int ret;
> > +   int type = type_level & 0xF;
> >  
> > workspace = find_workspace(type);
> >  
> > +   btrfs_compress_op[type - 1]->set_level(workspace, type_level);
> 
> zlib uses the same amount of memory independently of the compression level,
> but zstd uses a different amount of memory for each level.

We could extend the code to provide 2 types of workspaces, one to cover
the "fast" levels and one for the "high compression" levels. And the
larger one can be used for 'fast' so it does not just idle around.

> zstd will have
> to allocate memory here if it doesn't have enough (or has way to much),
> will that be okay?

This would be a problem. Imagine that the system is short on free
memory, starts to flush dirty data and now some filesystem starts asking
for hundreds of kilobytes of new memory just to write the data (and free
resources and the memory in turn). That's the case we need to
preallocate when there's enough memory, at least one workspace, so
there's a guarantee of forward progress.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/4] btrfs: Add zstd support

2017-08-18 Thread David Sterba
On Wed, Jul 26, 2017 at 12:19:29AM +0100, Giovanni Cabiddu wrote:
> Hi Nick,
> 
> On Thu, Jul 20, 2017 at 10:27:42PM +0100, Nick Terrell wrote:
> > Add zstd compression and decompression support to BtrFS. zstd at its
> > fastest level compresses almost as well as zlib, while offering much
> > faster compression and decompression, approaching lzo speeds.
> Can we look at integrating the zstd implementation below the acomp API
> available in the crypto subsystem?
> (https://github.com/torvalds/linux/blob/master/crypto/acompress.c)
> Acomp was designed to provide a generic and uniform API for compression
> in the kernel which hides algorithm specific details to frameworks.
> In future it would be nice to see btrfs using exclusively acomp
> for compression. This way when a new compression algorithm is exposed
> through acomp, it will be available immediately in btrfs.

The compression algorithm is considered part of the filesystem format
and must be covered by an incompatibility bit. Regarding the acomp API,
I don't see much point adding the extra abstraction layer for the two
currently supported algorithms. We only call 2 functions, for
compression and decompression, no need to allocate the acomp helper
structures and call the functions indirectly.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RESEND PATCH] btrfs-progs: Specify C standard to gnu90 explicitly

2017-08-18 Thread David Sterba
On Tue, Aug 15, 2017 at 09:17:12AM +0900, Qu Wenruo wrote:
> Different C compilers have different default language standard.
> This sometimes causes problem on different system.
> 
> For distribution like CentOS/RHEL7, its gcc is still 4.8 and will report
> error for c90 style declaration, while most developers are using newer
> gcc which will just ignore it.
> This makes us hard to detect such language standard problem.
> 
> This patch will specify standard to gnu90 explicitly to avoid such problem.
> Gnu90 is a good mix of c90 while provide a lot of useful gnu extension,
> and is supported by all modern gcc and clang.
> 
> Reported-by: Marco Lorenzo Crociani 
> Signed-off-by: Qu Wenruo 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] btrfs: remove superfluous chunk_tree argument from btrfs_alloc_dev_extent

2017-08-18 Thread Nikolay Borisov
Currently this function is always called with the object id of the root key of
the chunk_tree, which is always BTRFS_CHUNK_TREE_OBJECTID. So let's subsume it
straight into the function itself. No functional change.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/volumes.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 63608c5f4487..d024f1b07282 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1571,8 +1571,7 @@ static int btrfs_free_dev_extent(struct 
btrfs_trans_handle *trans,
 
 static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
  struct btrfs_device *device,
- u64 chunk_tree, u64 chunk_offset, u64 start,
- u64 num_bytes)
+ u64 chunk_offset, u64 start, u64 num_bytes)
 {
int ret;
struct btrfs_path *path;
@@ -1599,7 +1598,8 @@ static int btrfs_alloc_dev_extent(struct 
btrfs_trans_handle *trans,
leaf = path->nodes[0];
extent = btrfs_item_ptr(leaf, path->slots[0],
struct btrfs_dev_extent);
-   btrfs_set_dev_extent_chunk_tree(leaf, extent, chunk_tree);
+   btrfs_set_dev_extent_chunk_tree(leaf, extent,
+   BTRFS_CHUNK_TREE_OBJECTID);
btrfs_set_dev_extent_chunk_objectid(leaf, extent,
BTRFS_FIRST_CHUNK_TREE_OBJECTID);
btrfs_set_dev_extent_chunk_offset(leaf, extent, chunk_offset);
@@ -4903,10 +4903,8 @@ int btrfs_finish_chunk_alloc(struct btrfs_trans_handle 
*trans,
ret = btrfs_update_device(trans, device);
if (ret)
break;
-   ret = btrfs_alloc_dev_extent(trans, device,
-chunk_root->root_key.objectid,
-chunk_offset, dev_offset,
-stripe_size);
+   ret = btrfs_alloc_dev_extent(trans, device, chunk_offset,
+dev_offset, stripe_size);
if (ret)
break;
}
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS error (device sda4): failed to read chunk tree: -5

2017-08-18 Thread Chris Murphy
On Fri, Aug 18, 2017 at 2:47 AM, Zirconium Hacker  wrote:

> I vaguely remember following this guide at some point:
> http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
> -- specifically the "Balance cannot run because the filesystem is
> full" part.  This may have broken some things?
>

If you don't do 'btrfs device delete /dev/loop0' or if that command
does not complete, then it's possible to get into the situation you're
in.

Have you ever mounted this file system with -o degraded?

I'm going to guess the history is something like:
1. enospc
2. btrfs dev add
3. some kind of filtered balance, which only causes data block groups
to be moved to the 2nd device
4. 2nd device is physically removed without first 'btrfs dev del'

Zirco's superblock very clearly says num_devices  2, so I'd expect
normal mount to always fail unless both devices are present. Is there
some weird edge case where Btrfs might permit non-degraded mount when
only data bg's are on a 2nd device? And then trouble only happens
later when a balance is done and it goes looking for these bg's? And
then, boom!

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] btrfs: Remove chunk_objectid parameter of btrfs_alloc_dev_extent

2017-08-18 Thread Nikolay Borisov
THe function is always called with chunk_objectid set to
BTRFS_FIRST_CHUNK_TREE_OBJECTID. Let's collapse the parameter in the function
itself. No functional changes

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/volumes.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a37a31ba6843..63608c5f4487 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1571,8 +1571,8 @@ static int btrfs_free_dev_extent(struct 
btrfs_trans_handle *trans,
 
 static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
  struct btrfs_device *device,
- u64 chunk_tree, u64 chunk_objectid,
- u64 chunk_offset, u64 start, u64 num_bytes)
+ u64 chunk_tree, u64 chunk_offset, u64 start,
+ u64 num_bytes)
 {
int ret;
struct btrfs_path *path;
@@ -1600,7 +1600,8 @@ static int btrfs_alloc_dev_extent(struct 
btrfs_trans_handle *trans,
extent = btrfs_item_ptr(leaf, path->slots[0],
struct btrfs_dev_extent);
btrfs_set_dev_extent_chunk_tree(leaf, extent, chunk_tree);
-   btrfs_set_dev_extent_chunk_objectid(leaf, extent, chunk_objectid);
+   btrfs_set_dev_extent_chunk_objectid(leaf, extent,
+   BTRFS_FIRST_CHUNK_TREE_OBJECTID);
btrfs_set_dev_extent_chunk_offset(leaf, extent, chunk_offset);
 
btrfs_set_dev_extent_length(leaf, extent, num_bytes);
@@ -4904,7 +4905,6 @@ int btrfs_finish_chunk_alloc(struct btrfs_trans_handle 
*trans,
break;
ret = btrfs_alloc_dev_extent(trans, device,
 chunk_root->root_key.objectid,
-BTRFS_FIRST_CHUNK_TREE_OBJECTID,
 chunk_offset, dev_offset,
 stripe_size);
if (ret)
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] btrfs: Simplify btrfs_alloc_dev_extent

2017-08-18 Thread David Sterba
On Fri, Jul 28, 2017 at 08:59:12AM +0300, Nikolay Borisov wrote:
> 
> 
> On 27.07.2017 20:57, Filipe Manana wrote:
> > On Thu, Jul 27, 2017 at 6:36 PM, Nikolay Borisov  wrote:
> >> Currently btrfs_alloc_dev_extent essentially open codes btrfs_insert_item. 
> >> So
> >> let's remove the superfluous code, leaving only the important bits, namely
> >> initialising the device extent and just calling btrfs_insert_item. So 
> >> first add
> >> definition for the stack-based set/get function. And then use them.
> >> Additionally, remove the code which sets the uuid of the block header, 
> >> since
> >> this is something which is already handled by the core btree code.
> > 
> > Quite honestly, I don't see the value of this change at all.
> > It doesn't make things simpler nor more readable nor nothing.
> > We have many, really many places using btrfs_insert_empty_item()
> > instead of calling btrfs_insert_item(), are you planning on sending a
> > patch to do the replacement for each of them? What's the point?
> 
> I beg you to differ. Some of the code in btrfs is a mess, it's working
> but it's messy. There is constant violation of abstractions (as is the
> case in this function, heck the uuid setting of the block header
> function doesn't even belong here). All of this hampers reading
> comprehension of the code for newcomers. You are experienced in the code
> and likely this doesn't apply to you but since I'm someone relatively
> new to the code this has been my experience. And I believe any effort to
> actually simplify the code, make it more coherent and succinct is a win
> long-term.
> 
> I will wait for other feedback, if people feel patches like that are
> just bikeshedding then I will refrain from such cleanups in the future.

What I don't like about this patch, also pointed out by Filipe, is the
additional memory allocation. Just for the sake of code beauty, this is
not the direction the cleanups should go. The function is IMHO readable
and below my current threshold of cleanup need.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs: remove superfluous chunk_tree argument from btrfs_alloc_dev_extent

2017-08-18 Thread David Sterba
On Thu, Jul 27, 2017 at 08:36:36PM +0300, Nikolay Borisov wrote:
> Currently this function is always called with the object id of the root key of
> the chunk_tree, which is always BTRFS_CHUNK_TREE_OBJECTID. So let's subsume it
> straight into the function itself. No functional change.
> 
> Signed-off-by: Nikolay Borisov 
> ---
>  fs/btrfs/volumes.c | 12 +---
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 1c0a5fff0e2f..5a1913956f20 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1578,8 +1578,7 @@ static int btrfs_free_dev_extent(struct 
> btrfs_trans_handle *trans,
>  
>  static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
> struct btrfs_device *device,
> -   u64 chunk_tree, u64 chunk_offset, u64 start,
> -   u64 num_bytes)
> +   u64 chunk_offset, u64 start, u64 num_bytes)

This hunk does not apply, the context in git contains u64 chunk_objectid
and I haven't found any patch that would remove or rename it. Please
refresh and resend.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Remove redundant setting of uuid in btrfs_block_header.

2017-08-18 Thread David Sterba
On Fri, Jul 28, 2017 at 10:50:14AM +0300, Nikolay Borisov wrote:
> btrfs_alloc_dev_extent currently unconditionally sets the uuid in the leaf 
> block
> header the function is working with. This is unnecessary since this operation
> is peformed by the core btree handling code (splitting a node, allocating a 
> new
> btree block etc). So let's remove it.
> 
> Signed-off-by: Nikolay Borisov 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Next btrfs development cycle open - 4.15

2017-08-18 Thread David Sterba
Hi,

a friendly reminder of the timetable and what's expected at this phase.

4.12 - current
4.13 - upcoming, urgent regression fixes only
4.14 - development closed, pull request in prep, fixes or regressions only
4.15 - development open, until 4.14-rc5

(https://btrfs.wiki.kernel.org/index.php/Developer%27s_FAQ#Development_schedule)

Besides the the usual cleanups and fixes, you can now start sending any patches
that could be more intrusive and would benefit from a longer period of
testing, or development revisions.

The base of the patches should be the last announced pull request, which is
going to be named 'for-4.14-part1' in my k.org tree.  Reviewed patches will be
collected in a branch that's usually named 'misc-next' in my devel git repos
and is part of the for-next at k.org git repo.

k.org: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
devel1: http://repo.or.cz/linux-2.6/btrfs-unstable.git
devel2: https://github.com/kdave/btrfs-devel

d.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: use btrfsic_submit_bio instead of submit_bio in write_dev_flush

2017-08-18 Thread David Sterba
On Fri, Aug 18, 2017 at 04:38:07PM +0800, Lu Fengqi wrote:
> Although this bio has no data attached, it will reach this condition
> (bio->bi_opf & REQ_PREFLUSH) and then update the flush_gen of dev_state
> in __btrfsic_submit_bio. So we should still submit it through integrity
> checker. Otherwise, the integrity checker will throw the following warning
> when I mount a newly created btrfs filesystem.

Right, thanks.

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS error (device sda4): failed to read chunk tree: -5

2017-08-18 Thread Qu Wenruo



On 2017年08月18日 18:20, Zirconium Hacker wrote:

# ./btrfs-debug-tree -b 131072 /dev/sda4
https://pastebin.com/TDa0GuqB


At least this output explains everything.
( although the result may not make you happy )

Check this:

item 59 key (FIRST_CHUNK_TREE CHUNK_ITEM 62351474688) itemoff 11447 
itemsize 80

length 1073741824 owner 2 stripe_len 65536 type DATA
io_align 65536 io_width 65536 sector_size 4096
num_stripes 1 sub_stripes 1
stripe 0 devid 2 offset 1074790400
dev_uuid 039224e8-dd3a-4e95-af1d-660a2021ac55

This means that, for logical address in range [62351474688, 
62351474688+1073741824), all these very important metadata are in your 
*2nd* device!!

( And 4G data is also on that device)

That's why all tree backup roots and tree roots read fails.
Because there is no such device for btrfs to read, and that's why all we 
get is 0.


And considering all your tree root and backup roots are in this range, 
that's to say, without finding your 2nd device (whose dev uuid is 
039224e8-dd3a-4e95-af1d-660a2021ac55) you lost all your possibility to 
recovery the fs.

(This reminds me to enhance btrfs kernel message about missing device)

Since both super block and dump-tree result point to a missing device, I 
really recommend you to double check the history of the fs.
( Maybe you added a usb device to do balance but forget to run "btrfs 
device remove"? )


BTW, to check the dev uuid, you could use the following command to get 
dev_uuid:

# btrfs inspect dump-super  | grep dev_item.uuid

Good luck.

Thanks,
Qu


# ./btrfs-debug-tree -b 61809344512 /dev/sda4
btrfs-progs v4.12-dirty
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
bytenr mismatch, want=61809344512, have=0
ERROR: failed to read 61809344512
# ./btrfs-debug-tree -b 61807755264 /dev/sda4
btrfs-progs v4.12-dirty
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
bytenr mismatch, want=61807755264, have=0
ERROR: failed to read 61807755264

And that last one you wanted me to run debug-tree on was a duplicate.

Bonus:
# ./btrfs-debug-tree -b 108544 /dev/sda4
btrfs-progs v4.12-dirty
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
node 108544 level 1 items 2 free 491 generation 325709 owner 1
fs uuid 29889b3a-1c10-48e4-ad6d-21d03d06e90b
chunk uuid 33f664ec-d0bc-42f9-87f1-d2c05046
key (EXTENT_TREE ROOT_ITEM 0) block 1085456384 (66251) gen 325709
key (286 INODE_ITEM 0) block 1085505536 (66254) gen 325709

BTW, thank you for your quick responses and help so far.

On Fri, Aug 18, 2017 at 5:46 AM, Qu Wenruo  wrote:

Would you please try this patch?
https://patchwork.kernel.org/patch/9908173/

This should allow btrfs-debug-tree to output tree block even tree root is
corrupted.
You could apply it on lasted master branch (tagged as v4.12).

Then re-execute the following command (with patched btrfs-progs):
# btrfs-debug-tree -b 131072 /dev/sda4

And some new output:
# btrfs-debug-tree -b 61809344512 /dev/sda4
# btrfs-debug-tree -b 61807755264 /dev/sda4
# btrfs-debug-tree -b 61809344512 /dev/sda4

Thanks,
Qu


On 2017年08月18日 17:29, Zirconium Hacker wrote:


$ sudo btrfs check -r 108544 /dev/sda4
parent transid verify failed on 108544 wanted 325966 found 325709
parent transid verify failed on 108544 wanted 325966 found 325709
Ignoring transid failure
bytenr mismatch, want=61352312832, have=0
Couldn't setup device tree
ERROR: cannot open file system

On Fri, Aug 18, 2017 at 5:19 AM, Qu Wenruo  wrote:




On 2017年08月18日 17:08, Zirconium Hacker wrote:



I already ran that earlier, here's the pastebin:
https://pastebin.com/KGB8nVRA

Running debug-tree on all 1084 of them (I guess that was unnecessary)
gave the same errors every time:
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
ERROR: unable to open /dev/sda4



Then try using btrfs check with new root:

# btrfs check -r 108544 /dev/sda4

Please note that, the generation in superblock differs quite a lot with
find-root result.
So I'm afraid it will cause quite a lot of problems.

But least, it should help btrfs check to get over "Couldn't read tree
root"
error message.

And for btrfs-debug-tree error, I'll submit a patch soon to allow it to
be
run on such heavily damaged fs.


Thanks,
Qu


On Fri, Aug 18, 2017 at 5:03 AM, Qu Wenruo 
wrote:





On 2017年08月18日 16:47, Zirconium Hacker wrote:




$ sudo btrfs-debug-tree -b 131072 /dev/sda4
btrfs-progs v4.12
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
ERROR: unable to open /dev/sda4





I think this can be improved for case like this.
I'll try to submit a patch to enhance btrfs-debug-tree.

Would you please try "btrfs-find-root /dev/sda4"?
This will try to locate on-disk old tree root, and if we're lucky, old
tree
root can allow us to mount the fs.



Mounting with degraded,ro does not fix the multi-device issue.  The
system was 

Re: BTRFS error (device sda4): failed to read chunk tree: -5

2017-08-18 Thread Zirconium Hacker
# ./btrfs-debug-tree -b 131072 /dev/sda4
https://pastebin.com/TDa0GuqB
# ./btrfs-debug-tree -b 61809344512 /dev/sda4
btrfs-progs v4.12-dirty
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
bytenr mismatch, want=61809344512, have=0
ERROR: failed to read 61809344512
# ./btrfs-debug-tree -b 61807755264 /dev/sda4
btrfs-progs v4.12-dirty
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
bytenr mismatch, want=61807755264, have=0
ERROR: failed to read 61807755264

And that last one you wanted me to run debug-tree on was a duplicate.

Bonus:
# ./btrfs-debug-tree -b 108544 /dev/sda4
btrfs-progs v4.12-dirty
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
node 108544 level 1 items 2 free 491 generation 325709 owner 1
fs uuid 29889b3a-1c10-48e4-ad6d-21d03d06e90b
chunk uuid 33f664ec-d0bc-42f9-87f1-d2c05046
key (EXTENT_TREE ROOT_ITEM 0) block 1085456384 (66251) gen 325709
key (286 INODE_ITEM 0) block 1085505536 (66254) gen 325709

BTW, thank you for your quick responses and help so far.

On Fri, Aug 18, 2017 at 5:46 AM, Qu Wenruo  wrote:
> Would you please try this patch?
> https://patchwork.kernel.org/patch/9908173/
>
> This should allow btrfs-debug-tree to output tree block even tree root is
> corrupted.
> You could apply it on lasted master branch (tagged as v4.12).
>
> Then re-execute the following command (with patched btrfs-progs):
> # btrfs-debug-tree -b 131072 /dev/sda4
>
> And some new output:
> # btrfs-debug-tree -b 61809344512 /dev/sda4
> # btrfs-debug-tree -b 61807755264 /dev/sda4
> # btrfs-debug-tree -b 61809344512 /dev/sda4
>
> Thanks,
> Qu
>
>
> On 2017年08月18日 17:29, Zirconium Hacker wrote:
>>
>> $ sudo btrfs check -r 108544 /dev/sda4
>> parent transid verify failed on 108544 wanted 325966 found 325709
>> parent transid verify failed on 108544 wanted 325966 found 325709
>> Ignoring transid failure
>> bytenr mismatch, want=61352312832, have=0
>> Couldn't setup device tree
>> ERROR: cannot open file system
>>
>> On Fri, Aug 18, 2017 at 5:19 AM, Qu Wenruo  wrote:
>>>
>>>
>>>
>>> On 2017年08月18日 17:08, Zirconium Hacker wrote:


 I already ran that earlier, here's the pastebin:
 https://pastebin.com/KGB8nVRA

 Running debug-tree on all 1084 of them (I guess that was unnecessary)
 gave the same errors every time:
 bytenr mismatch, want=61809344512, have=0
 Couldn't read tree root
 ERROR: unable to open /dev/sda4

>>>
>>> Then try using btrfs check with new root:
>>>
>>> # btrfs check -r 108544 /dev/sda4
>>>
>>> Please note that, the generation in superblock differs quite a lot with
>>> find-root result.
>>> So I'm afraid it will cause quite a lot of problems.
>>>
>>> But least, it should help btrfs check to get over "Couldn't read tree
>>> root"
>>> error message.
>>>
>>> And for btrfs-debug-tree error, I'll submit a patch soon to allow it to
>>> be
>>> run on such heavily damaged fs.
>>>
>>>
>>> Thanks,
>>> Qu
>>>
 On Fri, Aug 18, 2017 at 5:03 AM, Qu Wenruo 
 wrote:
>
>
>
>
> On 2017年08月18日 16:47, Zirconium Hacker wrote:
>>
>>
>>
>> $ sudo btrfs-debug-tree -b 131072 /dev/sda4
>> btrfs-progs v4.12
>> bytenr mismatch, want=61809344512, have=0
>> Couldn't read tree root
>> ERROR: unable to open /dev/sda4
>
>
>
>
> I think this can be improved for case like this.
> I'll try to submit a patch to enhance btrfs-debug-tree.
>
> Would you please try "btrfs-find-root /dev/sda4"?
> This will try to locate on-disk old tree root, and if we're lucky, old
> tree
> root can allow us to mount the fs.
>
>>
>> Mounting with degraded,ro does not fix the multi-device issue.  The
>> system was never really intended to have a second device, though:
>
>
>
>
> Wait for a minute, did you mean this btrfs doesn't ever have a second
> device?
> This seems quite weird now.
>
>>
>> $ sudo btrfs fi show /dev/sda4
>> bytenr mismatch, want=61809344512, have=0
>> Couldn't read tree root
>> Label: none  uuid: 29889b3a-1c10-48e4-ad6d-21d03d06e90b
>> Total devices 2 FS bytes used 49.52GiB
>> devid1 size 54.07GiB used 54.07GiB path /dev/sda4
>> *** Some devices missing
>>
>> I vaguely remember following this guide at some point:
>>
>>
>>
>> http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
>> -- specifically the "Balance cannot run because the filesystem is
>> full" part.  This may have broken some things?
>
>
>
>
> Not sure, at least from your superblock, too many things are in doubt.
>   From the number of devices, to strange system chunk.
>
>
> Thanks,
> Qu
>>
>>
>>
>>
>> On Fri, Aug 18, 2017 at 4:15 AM, Qu Wenruo 

Re: BTRFS error (device sda4): failed to read chunk tree: -5

2017-08-18 Thread Qu Wenruo

Would you please try this patch?
https://patchwork.kernel.org/patch/9908173/

This should allow btrfs-debug-tree to output tree block even tree root 
is corrupted.

You could apply it on lasted master branch (tagged as v4.12).

Then re-execute the following command (with patched btrfs-progs):
# btrfs-debug-tree -b 131072 /dev/sda4

And some new output:
# btrfs-debug-tree -b 61809344512 /dev/sda4
# btrfs-debug-tree -b 61807755264 /dev/sda4
# btrfs-debug-tree -b 61809344512 /dev/sda4

Thanks,
Qu

On 2017年08月18日 17:29, Zirconium Hacker wrote:

$ sudo btrfs check -r 108544 /dev/sda4
parent transid verify failed on 108544 wanted 325966 found 325709
parent transid verify failed on 108544 wanted 325966 found 325709
Ignoring transid failure
bytenr mismatch, want=61352312832, have=0
Couldn't setup device tree
ERROR: cannot open file system

On Fri, Aug 18, 2017 at 5:19 AM, Qu Wenruo  wrote:



On 2017年08月18日 17:08, Zirconium Hacker wrote:


I already ran that earlier, here's the pastebin:
https://pastebin.com/KGB8nVRA

Running debug-tree on all 1084 of them (I guess that was unnecessary)
gave the same errors every time:
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
ERROR: unable to open /dev/sda4



Then try using btrfs check with new root:

# btrfs check -r 108544 /dev/sda4

Please note that, the generation in superblock differs quite a lot with
find-root result.
So I'm afraid it will cause quite a lot of problems.

But least, it should help btrfs check to get over "Couldn't read tree root"
error message.

And for btrfs-debug-tree error, I'll submit a patch soon to allow it to be
run on such heavily damaged fs.


Thanks,
Qu


On Fri, Aug 18, 2017 at 5:03 AM, Qu Wenruo  wrote:




On 2017年08月18日 16:47, Zirconium Hacker wrote:



$ sudo btrfs-debug-tree -b 131072 /dev/sda4
btrfs-progs v4.12
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
ERROR: unable to open /dev/sda4




I think this can be improved for case like this.
I'll try to submit a patch to enhance btrfs-debug-tree.

Would you please try "btrfs-find-root /dev/sda4"?
This will try to locate on-disk old tree root, and if we're lucky, old
tree
root can allow us to mount the fs.



Mounting with degraded,ro does not fix the multi-device issue.  The
system was never really intended to have a second device, though:




Wait for a minute, did you mean this btrfs doesn't ever have a second
device?
This seems quite weird now.



$ sudo btrfs fi show /dev/sda4
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
Label: none  uuid: 29889b3a-1c10-48e4-ad6d-21d03d06e90b
Total devices 2 FS bytes used 49.52GiB
devid1 size 54.07GiB used 54.07GiB path /dev/sda4
*** Some devices missing

I vaguely remember following this guide at some point:


http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
-- specifically the "Balance cannot run because the filesystem is
full" part.  This may have broken some things?




Not sure, at least from your superblock, too many things are in doubt.
  From the number of devices, to strange system chunk.


Thanks,
Qu




On Fri, Aug 18, 2017 at 4:15 AM, Qu Wenruo 
wrote:





On 2017年08月18日 15:17, Zirconium Hacker wrote:




I checked my fstab, and my mount options for that partition are:
nodev,nosuid (so no discard).
As far as I remember, I had some issues converting from ext4 with
existing tools (I think that was on Debian so the tools were likely
older) so I did a manual conversion backup, wipe, copy files back).

$ sudo btrfs-find-root -o 3 /dev/sda4
Couldn't read tree root
Superblock thinks the generation is 311252
Superblock thinks the level is 0
ERROR: tree block bytenr 0 is not aligned to sectorsize 4096
Found tree root at 131072 gen 311252 level 0





So chunk root (and since it's level 0, the whole chunk tree) seems
good.

Could you please try the following command?
# btrfs-debug-tree -b 131072 /dev/sda4

I assume it may fail due to the fact that root tree is corrupted.
But maybe we are lucky?


And further investigating your super dump and the code, it's shows some
clue, mostly related to your multi-device setup.

Your find-root output shows that, the only chunk leaf in /dev/sda4
seems
good.
And in btrfs_read_chunk_tree(), which returned -EIO and caused the
error
message, will first search chunk root.

Since your chunk leaf is good, such search itself should not cause too
much
problem.

Then btrfs_read_chunk_tree() will try to read out each device, by
calling
read_one_dev().
Which can return -EIO if any device is missing and you're not using
degraded
mount option.

Is your 2nd device missing? If so, would you please try to mount with
"degraded,ro" mount option?

BTW, if you didn't manually convert chunk profiles, did you first
create
btrfs on single device, and then added a new device to the btrfs?

Thanks,
Qu



On Fri, Aug 18, 2017 at 12:10 AM, 

[PATCH] btrfs-progs: Allow inspect dump-tree to show specified tree block even some tree roots are corrupted

2017-08-18 Thread Qu Wenruo
For btrfs inspect-internal dump-tree, if we use "-b" parameter to show
specified tree block, then we don't really need extra tree roots.

Only chunk root is needed to build up the whole chunk mapping so we can
read tree blocks.

This patch will add __OPEN_CTREE_RETURN_CHUNK_ROOT flag when show
speicifed tree block.
So even root tree is corrupted, we can still use inspect-internal
dump-tree to do some debugging.

Reported-by: Zirconium Hacker 
Signed-off-by: Qu Wenruo 
---
 cmds-inspect-dump-tree.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
index 93dff086..876ddcb5 100644
--- a/cmds-inspect-dump-tree.c
+++ b/cmds-inspect-dump-tree.c
@@ -222,6 +222,7 @@ int cmd_inspect_dump_tree(int argc, char **argv)
int uuid_tree_only = 0;
int roots_only = 0;
int root_backups = 0;
+   unsigned open_ctree_flags = OPEN_CTREE_FS_PARTIAL;
u64 block_only = 0;
struct btrfs_root *tree_root_scan;
u64 tree_id = 0;
@@ -260,6 +261,11 @@ int cmd_inspect_dump_tree(int argc, char **argv)
root_backups = 1;
break;
case 'b':
+   /*
+* If only showing one block, no need to fill roots
+* other than chunk root
+*/
+   open_ctree_flags |= __OPEN_CTREE_RETURN_CHUNK_ROOT;
block_only = arg_strtou64(optarg);
break;
case 't': {
@@ -299,19 +305,14 @@ int cmd_inspect_dump_tree(int argc, char **argv)
 
printf("%s\n", PACKAGE_STRING);
 
-   info = open_ctree_fs_info(argv[optind], 0, 0, 0, OPEN_CTREE_PARTIAL);
+   info = open_ctree_fs_info(argv[optind], 0, 0, 0, open_ctree_flags);
if (!info) {
error("unable to open %s", argv[optind]);
goto out;
}
 
-   root = info->fs_root;
-   if (!root) {
-   error("unable to open %s", argv[optind]);
-   goto out;
-   }
-
if (block_only) {
+   root = info->chunk_root;
leaf = read_tree_block(info,
  block_only,
  info->nodesize, 0);
@@ -337,6 +338,12 @@ int cmd_inspect_dump_tree(int argc, char **argv)
goto close_root;
}
 
+   root = info->fs_root;
+   if (!root) {
+   error("unable to open %s", argv[optind]);
+   goto out;
+   }
+
if (!(extent_only || uuid_tree_only || tree_id)) {
if (roots_only) {
printf("root tree: %llu level %d\n",
-- 
2.14.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS error (device sda4): failed to read chunk tree: -5

2017-08-18 Thread Zirconium Hacker
$ sudo btrfs check -r 108544 /dev/sda4
parent transid verify failed on 108544 wanted 325966 found 325709
parent transid verify failed on 108544 wanted 325966 found 325709
Ignoring transid failure
bytenr mismatch, want=61352312832, have=0
Couldn't setup device tree
ERROR: cannot open file system

On Fri, Aug 18, 2017 at 5:19 AM, Qu Wenruo  wrote:
>
>
> On 2017年08月18日 17:08, Zirconium Hacker wrote:
>>
>> I already ran that earlier, here's the pastebin:
>> https://pastebin.com/KGB8nVRA
>>
>> Running debug-tree on all 1084 of them (I guess that was unnecessary)
>> gave the same errors every time:
>> bytenr mismatch, want=61809344512, have=0
>> Couldn't read tree root
>> ERROR: unable to open /dev/sda4
>>
>
> Then try using btrfs check with new root:
>
> # btrfs check -r 108544 /dev/sda4
>
> Please note that, the generation in superblock differs quite a lot with
> find-root result.
> So I'm afraid it will cause quite a lot of problems.
>
> But least, it should help btrfs check to get over "Couldn't read tree root"
> error message.
>
> And for btrfs-debug-tree error, I'll submit a patch soon to allow it to be
> run on such heavily damaged fs.
>
>
> Thanks,
> Qu
>
>> On Fri, Aug 18, 2017 at 5:03 AM, Qu Wenruo  wrote:
>>>
>>>
>>>
>>> On 2017年08月18日 16:47, Zirconium Hacker wrote:


 $ sudo btrfs-debug-tree -b 131072 /dev/sda4
 btrfs-progs v4.12
 bytenr mismatch, want=61809344512, have=0
 Couldn't read tree root
 ERROR: unable to open /dev/sda4
>>>
>>>
>>>
>>> I think this can be improved for case like this.
>>> I'll try to submit a patch to enhance btrfs-debug-tree.
>>>
>>> Would you please try "btrfs-find-root /dev/sda4"?
>>> This will try to locate on-disk old tree root, and if we're lucky, old
>>> tree
>>> root can allow us to mount the fs.
>>>

 Mounting with degraded,ro does not fix the multi-device issue.  The
 system was never really intended to have a second device, though:
>>>
>>>
>>>
>>> Wait for a minute, did you mean this btrfs doesn't ever have a second
>>> device?
>>> This seems quite weird now.
>>>

 $ sudo btrfs fi show /dev/sda4
 bytenr mismatch, want=61809344512, have=0
 Couldn't read tree root
 Label: none  uuid: 29889b3a-1c10-48e4-ad6d-21d03d06e90b
 Total devices 2 FS bytes used 49.52GiB
 devid1 size 54.07GiB used 54.07GiB path /dev/sda4
 *** Some devices missing

 I vaguely remember following this guide at some point:


 http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
 -- specifically the "Balance cannot run because the filesystem is
 full" part.  This may have broken some things?
>>>
>>>
>>>
>>> Not sure, at least from your superblock, too many things are in doubt.
>>>  From the number of devices, to strange system chunk.
>>>
>>>
>>> Thanks,
>>> Qu



 On Fri, Aug 18, 2017 at 4:15 AM, Qu Wenruo 
 wrote:
>
>
>
>
> On 2017年08月18日 15:17, Zirconium Hacker wrote:
>>
>>
>>
>> I checked my fstab, and my mount options for that partition are:
>> nodev,nosuid (so no discard).
>> As far as I remember, I had some issues converting from ext4 with
>> existing tools (I think that was on Debian so the tools were likely
>> older) so I did a manual conversion backup, wipe, copy files back).
>>
>> $ sudo btrfs-find-root -o 3 /dev/sda4
>> Couldn't read tree root
>> Superblock thinks the generation is 311252
>> Superblock thinks the level is 0
>> ERROR: tree block bytenr 0 is not aligned to sectorsize 4096
>> Found tree root at 131072 gen 311252 level 0
>
>
>
>
> So chunk root (and since it's level 0, the whole chunk tree) seems
> good.
>
> Could you please try the following command?
> # btrfs-debug-tree -b 131072 /dev/sda4
>
> I assume it may fail due to the fact that root tree is corrupted.
> But maybe we are lucky?
>
>
> And further investigating your super dump and the code, it's shows some
> clue, mostly related to your multi-device setup.
>
> Your find-root output shows that, the only chunk leaf in /dev/sda4
> seems
> good.
> And in btrfs_read_chunk_tree(), which returned -EIO and caused the
> error
> message, will first search chunk root.
>
> Since your chunk leaf is good, such search itself should not cause too
> much
> problem.
>
> Then btrfs_read_chunk_tree() will try to read out each device, by
> calling
> read_one_dev().
> Which can return -EIO if any device is missing and you're not using
> degraded
> mount option.
>
> Is your 2nd device missing? If so, would you please try to mount with
> "degraded,ro" mount option?
>
> BTW, if you didn't manually convert chunk profiles, did you first
> create
> btrfs on 

Re: BTRFS error (device sda4): failed to read chunk tree: -5

2017-08-18 Thread Qu Wenruo



On 2017年08月18日 17:08, Zirconium Hacker wrote:

I already ran that earlier, here's the pastebin: https://pastebin.com/KGB8nVRA

Running debug-tree on all 1084 of them (I guess that was unnecessary)
gave the same errors every time:
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
ERROR: unable to open /dev/sda4



Then try using btrfs check with new root:

# btrfs check -r 108544 /dev/sda4

Please note that, the generation in superblock differs quite a lot with 
find-root result.

So I'm afraid it will cause quite a lot of problems.

But least, it should help btrfs check to get over "Couldn't read tree 
root" error message.


And for btrfs-debug-tree error, I'll submit a patch soon to allow it to 
be run on such heavily damaged fs.


Thanks,
Qu


On Fri, Aug 18, 2017 at 5:03 AM, Qu Wenruo  wrote:



On 2017年08月18日 16:47, Zirconium Hacker wrote:


$ sudo btrfs-debug-tree -b 131072 /dev/sda4
btrfs-progs v4.12
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
ERROR: unable to open /dev/sda4



I think this can be improved for case like this.
I'll try to submit a patch to enhance btrfs-debug-tree.

Would you please try "btrfs-find-root /dev/sda4"?
This will try to locate on-disk old tree root, and if we're lucky, old tree
root can allow us to mount the fs.



Mounting with degraded,ro does not fix the multi-device issue.  The
system was never really intended to have a second device, though:



Wait for a minute, did you mean this btrfs doesn't ever have a second
device?
This seems quite weird now.



$ sudo btrfs fi show /dev/sda4
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
Label: none  uuid: 29889b3a-1c10-48e4-ad6d-21d03d06e90b
Total devices 2 FS bytes used 49.52GiB
devid1 size 54.07GiB used 54.07GiB path /dev/sda4
*** Some devices missing

I vaguely remember following this guide at some point:

http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
-- specifically the "Balance cannot run because the filesystem is
full" part.  This may have broken some things?



Not sure, at least from your superblock, too many things are in doubt.
 From the number of devices, to strange system chunk.


Thanks,
Qu



On Fri, Aug 18, 2017 at 4:15 AM, Qu Wenruo  wrote:




On 2017年08月18日 15:17, Zirconium Hacker wrote:



I checked my fstab, and my mount options for that partition are:
nodev,nosuid (so no discard).
As far as I remember, I had some issues converting from ext4 with
existing tools (I think that was on Debian so the tools were likely
older) so I did a manual conversion backup, wipe, copy files back).

$ sudo btrfs-find-root -o 3 /dev/sda4
Couldn't read tree root
Superblock thinks the generation is 311252
Superblock thinks the level is 0
ERROR: tree block bytenr 0 is not aligned to sectorsize 4096
Found tree root at 131072 gen 311252 level 0




So chunk root (and since it's level 0, the whole chunk tree) seems good.

Could you please try the following command?
# btrfs-debug-tree -b 131072 /dev/sda4

I assume it may fail due to the fact that root tree is corrupted.
But maybe we are lucky?


And further investigating your super dump and the code, it's shows some
clue, mostly related to your multi-device setup.

Your find-root output shows that, the only chunk leaf in /dev/sda4 seems
good.
And in btrfs_read_chunk_tree(), which returned -EIO and caused the error
message, will first search chunk root.

Since your chunk leaf is good, such search itself should not cause too
much
problem.

Then btrfs_read_chunk_tree() will try to read out each device, by calling
read_one_dev().
Which can return -EIO if any device is missing and you're not using
degraded
mount option.

Is your 2nd device missing? If so, would you please try to mount with
"degraded,ro" mount option?

BTW, if you didn't manually convert chunk profiles, did you first create
btrfs on single device, and then added a new device to the btrfs?

Thanks,
Qu



On Fri, Aug 18, 2017 at 12:10 AM, Chris Murphy 
wrote:



On Thu, Aug 17, 2017 at 4:42 PM, Qu Wenruo 
wrote:


BTW are you using discard mount option? Sometimes it can cause
problem.




OP did not say if it was using discard mount option; but did say some
time before this (I'm not sure how recent) he had used fstrim. The
firmware for this SSD model is current.


--
Chris Murphy



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html






--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS error (device sda4): failed to read chunk tree: -5

2017-08-18 Thread Zirconium Hacker
I already ran that earlier, here's the pastebin: https://pastebin.com/KGB8nVRA

Running debug-tree on all 1084 of them (I guess that was unnecessary)
gave the same errors every time:
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
ERROR: unable to open /dev/sda4

On Fri, Aug 18, 2017 at 5:03 AM, Qu Wenruo  wrote:
>
>
> On 2017年08月18日 16:47, Zirconium Hacker wrote:
>>
>> $ sudo btrfs-debug-tree -b 131072 /dev/sda4
>> btrfs-progs v4.12
>> bytenr mismatch, want=61809344512, have=0
>> Couldn't read tree root
>> ERROR: unable to open /dev/sda4
>
>
> I think this can be improved for case like this.
> I'll try to submit a patch to enhance btrfs-debug-tree.
>
> Would you please try "btrfs-find-root /dev/sda4"?
> This will try to locate on-disk old tree root, and if we're lucky, old tree
> root can allow us to mount the fs.
>
>>
>> Mounting with degraded,ro does not fix the multi-device issue.  The
>> system was never really intended to have a second device, though:
>
>
> Wait for a minute, did you mean this btrfs doesn't ever have a second
> device?
> This seems quite weird now.
>
>>
>> $ sudo btrfs fi show /dev/sda4
>> bytenr mismatch, want=61809344512, have=0
>> Couldn't read tree root
>> Label: none  uuid: 29889b3a-1c10-48e4-ad6d-21d03d06e90b
>> Total devices 2 FS bytes used 49.52GiB
>> devid1 size 54.07GiB used 54.07GiB path /dev/sda4
>> *** Some devices missing
>>
>> I vaguely remember following this guide at some point:
>>
>> http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
>> -- specifically the "Balance cannot run because the filesystem is
>> full" part.  This may have broken some things?
>
>
> Not sure, at least from your superblock, too many things are in doubt.
> From the number of devices, to strange system chunk.
>
>
> Thanks,
> Qu
>>
>>
>> On Fri, Aug 18, 2017 at 4:15 AM, Qu Wenruo  wrote:
>>>
>>>
>>>
>>> On 2017年08月18日 15:17, Zirconium Hacker wrote:


 I checked my fstab, and my mount options for that partition are:
 nodev,nosuid (so no discard).
 As far as I remember, I had some issues converting from ext4 with
 existing tools (I think that was on Debian so the tools were likely
 older) so I did a manual conversion backup, wipe, copy files back).

 $ sudo btrfs-find-root -o 3 /dev/sda4
 Couldn't read tree root
 Superblock thinks the generation is 311252
 Superblock thinks the level is 0
 ERROR: tree block bytenr 0 is not aligned to sectorsize 4096
 Found tree root at 131072 gen 311252 level 0
>>>
>>>
>>>
>>> So chunk root (and since it's level 0, the whole chunk tree) seems good.
>>>
>>> Could you please try the following command?
>>> # btrfs-debug-tree -b 131072 /dev/sda4
>>>
>>> I assume it may fail due to the fact that root tree is corrupted.
>>> But maybe we are lucky?
>>>
>>>
>>> And further investigating your super dump and the code, it's shows some
>>> clue, mostly related to your multi-device setup.
>>>
>>> Your find-root output shows that, the only chunk leaf in /dev/sda4 seems
>>> good.
>>> And in btrfs_read_chunk_tree(), which returned -EIO and caused the error
>>> message, will first search chunk root.
>>>
>>> Since your chunk leaf is good, such search itself should not cause too
>>> much
>>> problem.
>>>
>>> Then btrfs_read_chunk_tree() will try to read out each device, by calling
>>> read_one_dev().
>>> Which can return -EIO if any device is missing and you're not using
>>> degraded
>>> mount option.
>>>
>>> Is your 2nd device missing? If so, would you please try to mount with
>>> "degraded,ro" mount option?
>>>
>>> BTW, if you didn't manually convert chunk profiles, did you first create
>>> btrfs on single device, and then added a new device to the btrfs?
>>>
>>> Thanks,
>>> Qu
>>>

 On Fri, Aug 18, 2017 at 12:10 AM, Chris Murphy 
 wrote:
>
>
> On Thu, Aug 17, 2017 at 4:42 PM, Qu Wenruo 
> wrote:
>
>> BTW are you using discard mount option? Sometimes it can cause
>> problem.
>
>
>
> OP did not say if it was using discard mount option; but did say some
> time before this (I'm not sure how recent) he had used fstrim. The
> firmware for this SSD model is current.
>
>
> --
> Chris Murphy


 --
 To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
 in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

>>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS error (device sda4): failed to read chunk tree: -5

2017-08-18 Thread Qu Wenruo



On 2017年08月18日 16:47, Zirconium Hacker wrote:

$ sudo btrfs-debug-tree -b 131072 /dev/sda4
btrfs-progs v4.12
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
ERROR: unable to open /dev/sda4


I think this can be improved for case like this.
I'll try to submit a patch to enhance btrfs-debug-tree.

Would you please try "btrfs-find-root /dev/sda4"?
This will try to locate on-disk old tree root, and if we're lucky, old 
tree root can allow us to mount the fs.




Mounting with degraded,ro does not fix the multi-device issue.  The
system was never really intended to have a second device, though:


Wait for a minute, did you mean this btrfs doesn't ever have a second 
device?

This seems quite weird now.



$ sudo btrfs fi show /dev/sda4
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
Label: none  uuid: 29889b3a-1c10-48e4-ad6d-21d03d06e90b
Total devices 2 FS bytes used 49.52GiB
devid1 size 54.07GiB used 54.07GiB path /dev/sda4
*** Some devices missing

I vaguely remember following this guide at some point:
http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
-- specifically the "Balance cannot run because the filesystem is
full" part.  This may have broken some things?


Not sure, at least from your superblock, too many things are in doubt.
From the number of devices, to strange system chunk.

Thanks,
Qu


On Fri, Aug 18, 2017 at 4:15 AM, Qu Wenruo  wrote:



On 2017年08月18日 15:17, Zirconium Hacker wrote:


I checked my fstab, and my mount options for that partition are:
nodev,nosuid (so no discard).
As far as I remember, I had some issues converting from ext4 with
existing tools (I think that was on Debian so the tools were likely
older) so I did a manual conversion backup, wipe, copy files back).

$ sudo btrfs-find-root -o 3 /dev/sda4
Couldn't read tree root
Superblock thinks the generation is 311252
Superblock thinks the level is 0
ERROR: tree block bytenr 0 is not aligned to sectorsize 4096
Found tree root at 131072 gen 311252 level 0



So chunk root (and since it's level 0, the whole chunk tree) seems good.

Could you please try the following command?
# btrfs-debug-tree -b 131072 /dev/sda4

I assume it may fail due to the fact that root tree is corrupted.
But maybe we are lucky?


And further investigating your super dump and the code, it's shows some
clue, mostly related to your multi-device setup.

Your find-root output shows that, the only chunk leaf in /dev/sda4 seems
good.
And in btrfs_read_chunk_tree(), which returned -EIO and caused the error
message, will first search chunk root.

Since your chunk leaf is good, such search itself should not cause too much
problem.

Then btrfs_read_chunk_tree() will try to read out each device, by calling
read_one_dev().
Which can return -EIO if any device is missing and you're not using degraded
mount option.

Is your 2nd device missing? If so, would you please try to mount with
"degraded,ro" mount option?

BTW, if you didn't manually convert chunk profiles, did you first create
btrfs on single device, and then added a new device to the btrfs?

Thanks,
Qu



On Fri, Aug 18, 2017 at 12:10 AM, Chris Murphy 
wrote:


On Thu, Aug 17, 2017 at 4:42 PM, Qu Wenruo 
wrote:


BTW are you using discard mount option? Sometimes it can cause problem.



OP did not say if it was using discard mount option; but did say some
time before this (I'm not sure how recent) he had used fstrim. The
firmware for this SSD model is current.


--
Chris Murphy


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS error (device sda4): failed to read chunk tree: -5

2017-08-18 Thread Zirconium Hacker
$ sudo btrfs-debug-tree -b 131072 /dev/sda4
btrfs-progs v4.12
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
ERROR: unable to open /dev/sda4

Mounting with degraded,ro does not fix the multi-device issue.  The
system was never really intended to have a second device, though:

$ sudo btrfs fi show /dev/sda4
bytenr mismatch, want=61809344512, have=0
Couldn't read tree root
Label: none  uuid: 29889b3a-1c10-48e4-ad6d-21d03d06e90b
Total devices 2 FS bytes used 49.52GiB
devid1 size 54.07GiB used 54.07GiB path /dev/sda4
*** Some devices missing

I vaguely remember following this guide at some point:
http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
-- specifically the "Balance cannot run because the filesystem is
full" part.  This may have broken some things?

On Fri, Aug 18, 2017 at 4:15 AM, Qu Wenruo  wrote:
>
>
> On 2017年08月18日 15:17, Zirconium Hacker wrote:
>>
>> I checked my fstab, and my mount options for that partition are:
>> nodev,nosuid (so no discard).
>> As far as I remember, I had some issues converting from ext4 with
>> existing tools (I think that was on Debian so the tools were likely
>> older) so I did a manual conversion backup, wipe, copy files back).
>>
>> $ sudo btrfs-find-root -o 3 /dev/sda4
>> Couldn't read tree root
>> Superblock thinks the generation is 311252
>> Superblock thinks the level is 0
>> ERROR: tree block bytenr 0 is not aligned to sectorsize 4096
>> Found tree root at 131072 gen 311252 level 0
>
>
> So chunk root (and since it's level 0, the whole chunk tree) seems good.
>
> Could you please try the following command?
> # btrfs-debug-tree -b 131072 /dev/sda4
>
> I assume it may fail due to the fact that root tree is corrupted.
> But maybe we are lucky?
>
>
> And further investigating your super dump and the code, it's shows some
> clue, mostly related to your multi-device setup.
>
> Your find-root output shows that, the only chunk leaf in /dev/sda4 seems
> good.
> And in btrfs_read_chunk_tree(), which returned -EIO and caused the error
> message, will first search chunk root.
>
> Since your chunk leaf is good, such search itself should not cause too much
> problem.
>
> Then btrfs_read_chunk_tree() will try to read out each device, by calling
> read_one_dev().
> Which can return -EIO if any device is missing and you're not using degraded
> mount option.
>
> Is your 2nd device missing? If so, would you please try to mount with
> "degraded,ro" mount option?
>
> BTW, if you didn't manually convert chunk profiles, did you first create
> btrfs on single device, and then added a new device to the btrfs?
>
> Thanks,
> Qu
>
>>
>> On Fri, Aug 18, 2017 at 12:10 AM, Chris Murphy 
>> wrote:
>>>
>>> On Thu, Aug 17, 2017 at 4:42 PM, Qu Wenruo 
>>> wrote:
>>>
 BTW are you using discard mount option? Sometimes it can cause problem.
>>>
>>>
>>> OP did not say if it was using discard mount option; but did say some
>>> time before this (I'm not sure how recent) he had used fstrim. The
>>> firmware for this SSD model is current.
>>>
>>>
>>> --
>>> Chris Murphy
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs-progs: add necessary close(fd) in mkfs

2017-08-18 Thread Nikolay Borisov


On 18.08.2017 11:32, Gu Jinxiang wrote:
> Add some close(fd) when error occures in mkfs.
> And add close(fd) when end use it.

When a process exits all of its fds (and all other runtime resources
tracked by the kernel) are freed, so we don't leak fds if that's your
worry. Admittedly, it's good practice to manage the lifetime of fds
correctly in a program, so I don't have any strong preferences either ways.


> 
> Signed-off-by: Gu Jinxiang 
> ---
>  mkfs/main.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mkfs/main.c b/mkfs/main.c
> index 2b109a5..ec82565 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -1694,6 +1694,7 @@ int main(int argc, char **argv)
>   file,
>   (unsigned long long)block_count,
>   (unsigned long long)dev_block_count);
> + close(fd);
>   exit(1);
>   }
>   } else {
> @@ -1711,6 +1712,7 @@ int main(int argc, char **argv)
>   ret = zero_output_file(fd, block_count);
>   if (ret) {
>   error("unable to zero the output file");
> + close(fd);
>   exit(1);
>   }
>   /* our "device" is the new image file */
> @@ -1721,6 +1723,7 @@ int main(int argc, char **argv)
>   if (dev_block_count < BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
>   error("device is too small to make filesystem, must be at least 
> %llu",
>   (unsigned long 
> long)BTRFS_MKFS_SYSTEM_GROUP_SIZE);
> + close(fd);
>   exit(1);
>   }
>  
> @@ -1740,6 +1743,7 @@ int main(int argc, char **argv)
>   ret = make_btrfs(fd, _cfg);
>   if (ret) {
>   error("error during mkfs: %s", strerror(-ret));
> + close(fd);
>   exit(1);
>   }
>  
> @@ -1750,6 +1754,7 @@ int main(int argc, char **argv)
>   close(fd);
>   exit(1);
>   }
> + close(fd);
>   root = fs_info->fs_root;
>   fs_info->alloc_start = alloc_start;
>  
> @@ -1827,6 +1832,7 @@ int main(int argc, char **argv)
>   sectorsize, sectorsize, sectorsize);
>   if (ret) {
>   error("unable to add %s to filesystem: %d", file, ret);
> + close(fd);
>   goto out;
>   }
>   if (verbose >= 2) {
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: use btrfsic_submit_bio instead of submit_bio in write_dev_flush

2017-08-18 Thread Lu Fengqi
Although this bio has no data attached, it will reach this condition
(bio->bi_opf & REQ_PREFLUSH) and then update the flush_gen of dev_state
in __btrfsic_submit_bio. So we should still submit it through integrity
checker. Otherwise, the integrity checker will throw the following warning
when I mount a newly created btrfs filesystem.

[10264.755497] btrfs: attempt to write superblock which references block M 
@29523968 (sdb1/654400/0) which is not flushed out of disk's write cache 
(block flush_gen=1, dev->flush_gen=0)!
[10264.755498] btrfs: attempt to write superblock which references block M 
@29523968 (sdb1/37912576/0) which is not flushed out of disk's write cache 
(block flush_gen=1, dev->flush_gen=0)!

Signed-off-by: Lu Fengqi 
---
 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9e187df3c106..0dac37439bb9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3515,7 +3515,7 @@ static void write_dev_flush(struct btrfs_device *device)
init_completion(>flush_wait);
bio->bi_private = >flush_wait;
 
-   submit_bio(bio);
+   btrfsic_submit_bio(bio);
device->flush_bio_sent = 1;
 }
 
-- 
2.14.1



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs fi du -s gives Inappropriate ioctl for device

2017-08-18 Thread Piotr Szymaniak
On Thu, Aug 17, 2017 at 10:59:33PM +0200, Goffredo Baroncelli wrote:
> Patches will follow shortly

With your patches applied:

~ # btrfs fi du -s /mnt/red/\@backup/ 
/mnt/red/\@backup/.snapshot/monthly_2017-08-01_05\:30\:01/ /mnt/red/\@svn/ 
/mnt/red/\@svn/.snapshot/weekly_2017-08-05_04\:20\:02/
 Total   Exclusive  Set shared  Filename
   1.86TiB75.73MiB   312.77GiB  /mnt/red/@backup/
 320.29GiB   0.00B   312.19GiB  
/mnt/red/@backup/.snapshot/monthly_2017-08-01_05:30:01/
  52.24GiB10.19MiB 4.13GiB  /mnt/red/@svn/
   4.35GiB 1.05MiB 4.12GiB  
/mnt/red/@svn/.snapshot/weekly_2017-08-05_04:20:02/


Best regards,
Piotr Szymaniak.
-- 
Mysle,  ze gdyby  diabel  nie istnial i mialby go stworzyc czlowiek, to
stworzylby go na swoj obraz i podobienstwo.
  -- Fiodor Dostojewski


signature.asc
Description: Digital signature


[PATCH 1/2] btrfs-progs: add necessary close(fd) in mkfs

2017-08-18 Thread Gu Jinxiang
Add some close(fd) when error occures in mkfs.
And add close(fd) when end use it.

Signed-off-by: Gu Jinxiang 
---
 mkfs/main.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/mkfs/main.c b/mkfs/main.c
index 2b109a5..ec82565 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1694,6 +1694,7 @@ int main(int argc, char **argv)
file,
(unsigned long long)block_count,
(unsigned long long)dev_block_count);
+   close(fd);
exit(1);
}
} else {
@@ -1711,6 +1712,7 @@ int main(int argc, char **argv)
ret = zero_output_file(fd, block_count);
if (ret) {
error("unable to zero the output file");
+   close(fd);
exit(1);
}
/* our "device" is the new image file */
@@ -1721,6 +1723,7 @@ int main(int argc, char **argv)
if (dev_block_count < BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
error("device is too small to make filesystem, must be at least 
%llu",
(unsigned long 
long)BTRFS_MKFS_SYSTEM_GROUP_SIZE);
+   close(fd);
exit(1);
}
 
@@ -1740,6 +1743,7 @@ int main(int argc, char **argv)
ret = make_btrfs(fd, _cfg);
if (ret) {
error("error during mkfs: %s", strerror(-ret));
+   close(fd);
exit(1);
}
 
@@ -1750,6 +1754,7 @@ int main(int argc, char **argv)
close(fd);
exit(1);
}
+   close(fd);
root = fs_info->fs_root;
fs_info->alloc_start = alloc_start;
 
@@ -1827,6 +1832,7 @@ int main(int argc, char **argv)
sectorsize, sectorsize, sectorsize);
if (ret) {
error("unable to add %s to filesystem: %d", file, ret);
+   close(fd);
goto out;
}
if (verbose >= 2) {
-- 
1.9.1



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] btrfs-progs: delete not-used parameter fd

2017-08-18 Thread Gu Jinxiang
Parameter fd is not used in function make_image and
traverse_directory of mkfs.
Delete it.

Signed-off-by: Gu Jinxiang 
---
 mkfs/main.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/mkfs/main.c b/mkfs/main.c
index ec82565..3e9c1b8 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -801,7 +801,7 @@ static char *make_path(const char *dir, const char *name)
 
 static int traverse_directory(struct btrfs_trans_handle *trans,
  struct btrfs_root *root, const char *dir_name,
- struct directory_name_entry *dir_head, int out_fd)
+ struct directory_name_entry *dir_head)
 {
int ret = 0;
 
@@ -1029,8 +1029,7 @@ static int create_chunks(struct btrfs_trans_handle *trans,
return ret;
 }
 
-static int make_image(const char *source_dir, struct btrfs_root *root,
-   int out_fd)
+static int make_image(const char *source_dir, struct btrfs_root *root)
 {
int ret;
struct btrfs_trans_handle *trans;
@@ -1048,7 +1047,7 @@ static int make_image(const char *source_dir, struct 
btrfs_root *root,
INIT_LIST_HEAD(_head.list);
 
trans = btrfs_start_transaction(root, 1);
-   ret = traverse_directory(trans, root, source_dir, _head, out_fd);
+   ret = traverse_directory(trans, root, source_dir, _head);
if (ret) {
error("unable to traverse directory %s: %d", source_dir, ret);
goto fail;
@@ -1882,7 +1881,7 @@ raid_groups:
goto out;
}
 
-   ret = make_image(source_dir, root, fd);
+   ret = make_image(source_dir, root);
if (ret) {
error("error wihle filling filesystem: %d", ret);
goto out;
-- 
1.9.1



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS error (device sda4): failed to read chunk tree: -5

2017-08-18 Thread Qu Wenruo



On 2017年08月18日 15:17, Zirconium Hacker wrote:

I checked my fstab, and my mount options for that partition are:
nodev,nosuid (so no discard).
As far as I remember, I had some issues converting from ext4 with
existing tools (I think that was on Debian so the tools were likely
older) so I did a manual conversion backup, wipe, copy files back).

$ sudo btrfs-find-root -o 3 /dev/sda4
Couldn't read tree root
Superblock thinks the generation is 311252
Superblock thinks the level is 0
ERROR: tree block bytenr 0 is not aligned to sectorsize 4096
Found tree root at 131072 gen 311252 level 0


So chunk root (and since it's level 0, the whole chunk tree) seems good.

Could you please try the following command?
# btrfs-debug-tree -b 131072 /dev/sda4

I assume it may fail due to the fact that root tree is corrupted.
But maybe we are lucky?


And further investigating your super dump and the code, it's shows some 
clue, mostly related to your multi-device setup.


Your find-root output shows that, the only chunk leaf in /dev/sda4 seems 
good.
And in btrfs_read_chunk_tree(), which returned -EIO and caused the error 
message, will first search chunk root.


Since your chunk leaf is good, such search itself should not cause too 
much problem.


Then btrfs_read_chunk_tree() will try to read out each device, by 
calling read_one_dev().
Which can return -EIO if any device is missing and you're not using 
degraded mount option.


Is your 2nd device missing? If so, would you please try to mount with 
"degraded,ro" mount option?


BTW, if you didn't manually convert chunk profiles, did you first create 
btrfs on single device, and then added a new device to the btrfs?


Thanks,
Qu



On Fri, Aug 18, 2017 at 12:10 AM, Chris Murphy  wrote:

On Thu, Aug 17, 2017 at 4:42 PM, Qu Wenruo  wrote:


BTW are you using discard mount option? Sometimes it can cause problem.


OP did not say if it was using discard mount option; but did say some
time before this (I'm not sure how recent) he had used fstrim. The
firmware for this SSD model is current.


--
Chris Murphy

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs fi du -s gives Inappropriate ioctl for device

2017-08-18 Thread Goffredo Baroncelli
On 08/18/2017 09:12 AM, Nikolay Borisov wrote:
> 
> 
> On 18.08.2017 10:09, Goffredo Baroncelli wrote:
>> On 08/18/2017 08:34 AM, Nikolay Borisov wrote:
[...]
>>> It would be awesome if you manage to introduce xfstests for this case
>>
>> I am not sure if this is the right thing to do: the bugs are related to 
>> a) an incorrect error handling of the function lookup_path_rootid()
>> b) the strange behavior of the BTRFS_EMPTY_SUBVOL_DIR_OBJECTID directory
>>
>> For a) it is impossible to check for each lookup_path_rootid() call from a 
>> xfstest; for b) I am quite sure that there are a lot of corner case not 
>> properly handled in btrfs progs (and still btrfs fi-du handle it wrongly)
> 
> So the bug is actually in the btrfsprogs not in the fs, am I correct?
> 
Yes, definitively


BR
G.Baroncelli
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS error (device sda4): failed to read chunk tree: -5

2017-08-18 Thread Zirconium Hacker
I checked my fstab, and my mount options for that partition are:
nodev,nosuid (so no discard).
As far as I remember, I had some issues converting from ext4 with
existing tools (I think that was on Debian so the tools were likely
older) so I did a manual conversion backup, wipe, copy files back).

$ sudo btrfs-find-root -o 3 /dev/sda4
Couldn't read tree root
Superblock thinks the generation is 311252
Superblock thinks the level is 0
ERROR: tree block bytenr 0 is not aligned to sectorsize 4096
Found tree root at 131072 gen 311252 level 0

On Fri, Aug 18, 2017 at 12:10 AM, Chris Murphy  wrote:
> On Thu, Aug 17, 2017 at 4:42 PM, Qu Wenruo  wrote:
>
>> BTW are you using discard mount option? Sometimes it can cause problem.
>
> OP did not say if it was using discard mount option; but did say some
> time before this (I'm not sure how recent) he had used fstrim. The
> firmware for this SSD model is current.
>
>
> --
> Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs fi du -s gives Inappropriate ioctl for device

2017-08-18 Thread Nikolay Borisov


On 18.08.2017 10:09, Goffredo Baroncelli wrote:
> On 08/18/2017 08:34 AM, Nikolay Borisov wrote:
>>
>>
>> On 17.08.2017 23:59, Goffredo Baroncelli wrote:
>>> Hi,
>>>
>>> On 08/17/2017 08:43 PM, Chris Murphy wrote:
 # btrfs sub create test1
 Create subvolume './test1'
 # btrfs sub create test1/test2
 Create subvolume 'test1/test2'
 # btrfs sub snap test1 test1.snap
 Create a snapshot of 'test1' in './test1.snap'
 # btrfs fi du -s test1
  Total   Exclusive  Set shared  Filename
  0.00B   0.00B   0.00B  test1
 # btrfs fi du -s test1.snap
  Total   Exclusive  Set shared  Filename
 ERROR: cannot check space of 'test1.snap': Inappropriate ioctl for device
 #

>>>
>>> tanks for the test case. Now I was able to reproduce the problem. The 
>>> bug(s) are two:
>>>
>>> 1) to get the treeid of the files/directory, the function 
>>> lookup_path_rootid() is used, which behaves strangely when it is called on 
>>> a directory that is BTRFS_EMPTY_SUBVOL_DIR_OBJECTID. In fact this function 
>>> is commented as following:
>>> []
>>> /*
>>>  * For a given:
>>>  * - file or directory return the containing tree root id
>>>  * - subvolume return its own tree id
>>>  * - BTRFS_EMPTY_SUBVOL_DIR_OBJECTID (directory with ino == 2) the result is
>>>  *   undefined and function returns -1
>>>  */
>>> int lookup_path_rootid(int fd, u64 *rootid)
>>> {
>>> []
>>>
>>> The caller (du_add_file()) doesn't consider this case.
>>>
>>> 2) in the function du_walk_dir(), an error returned by du_add_file() is 
>>> ignored, but the return value is not reset. So if the last entry has ino 
>>> BTRFS_EMPTY_SUBVOL_DIR_OBJECTID (directory with ino == 2) an error is 
>>> returned. But if there is another directory item no error is returned !
>>>
>>> See the following tests cases:
>>>
>>> # btrfs sub create test1
>>> # btrfs sub create test1/test2
>>> # btrfs sub snap test1 test1.snap
>>> # btrfs fi du -s test1
>>>  Total   Exclusive  Set shared  Filename
>>>  0.00B   0.00B   0.00B  test1
>>> # btrfs fi du -s test1.snap
>>>  Total   Exclusive  Set shared  Filename
>>> ERROR: cannot check space of 'test1.snap': Inappropriate ioctl for device
>>>
>>> But if you add *another* file/dir in test1.snap you got:
>>>
>>> # mkdir test1.snap/dir
>>> # btrfs fi du -s test1.snap
>>>  Total   Exclusive  Set shared  Filename
>>>  0.00B   0.00B   0.00B  test1.snap
>>>
>>> The error disappea> Patches will follow shortly
>>
>> It would be awesome if you manage to introduce xfstests for this case
> 
> I am not sure if this is the right thing to do: the bugs are related to 
> a) an incorrect error handling of the function lookup_path_rootid()
> b) the strange behavior of the BTRFS_EMPTY_SUBVOL_DIR_OBJECTID directory
> 
> For a) it is impossible to check for each lookup_path_rootid() call from a 
> xfstest; for b) I am quite sure that there are a lot of corner case not 
> properly handled in btrfs progs (and still btrfs fi-du handle it wrongly)

So the bug is actually in the btrfsprogs not in the fs, am I correct?

> 
> BR
> G.Baroncelli
> 
>>
>>>

 # uname -r
 4.13.0-0.rc4.git1.1.fc27.x86_64
 # rpm -q btrfs-progs
 btrfs-progs-4.12-1.fc27.x86_64


 
 Chris Murphy

>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs fi du -s gives Inappropriate ioctl for device

2017-08-18 Thread Goffredo Baroncelli
On 08/18/2017 08:34 AM, Nikolay Borisov wrote:
> 
> 
> On 17.08.2017 23:59, Goffredo Baroncelli wrote:
>> Hi,
>>
>> On 08/17/2017 08:43 PM, Chris Murphy wrote:
>>> # btrfs sub create test1
>>> Create subvolume './test1'
>>> # btrfs sub create test1/test2
>>> Create subvolume 'test1/test2'
>>> # btrfs sub snap test1 test1.snap
>>> Create a snapshot of 'test1' in './test1.snap'
>>> # btrfs fi du -s test1
>>>  Total   Exclusive  Set shared  Filename
>>>  0.00B   0.00B   0.00B  test1
>>> # btrfs fi du -s test1.snap
>>>  Total   Exclusive  Set shared  Filename
>>> ERROR: cannot check space of 'test1.snap': Inappropriate ioctl for device
>>> #
>>>
>>
>> tanks for the test case. Now I was able to reproduce the problem. The bug(s) 
>> are two:
>>
>> 1) to get the treeid of the files/directory, the function 
>> lookup_path_rootid() is used, which behaves strangely when it is called on a 
>> directory that is BTRFS_EMPTY_SUBVOL_DIR_OBJECTID. In fact this function is 
>> commented as following:
>> []
>> /*
>>  * For a given:
>>  * - file or directory return the containing tree root id
>>  * - subvolume return its own tree id
>>  * - BTRFS_EMPTY_SUBVOL_DIR_OBJECTID (directory with ino == 2) the result is
>>  *   undefined and function returns -1
>>  */
>> int lookup_path_rootid(int fd, u64 *rootid)
>> {
>> []
>>
>> The caller (du_add_file()) doesn't consider this case.
>>
>> 2) in the function du_walk_dir(), an error returned by du_add_file() is 
>> ignored, but the return value is not reset. So if the last entry has ino 
>> BTRFS_EMPTY_SUBVOL_DIR_OBJECTID (directory with ino == 2) an error is 
>> returned. But if there is another directory item no error is returned !
>>
>> See the following tests cases:
>>
>> # btrfs sub create test1
>> # btrfs sub create test1/test2
>> # btrfs sub snap test1 test1.snap
>> # btrfs fi du -s test1
>>  Total   Exclusive  Set shared  Filename
>>  0.00B   0.00B   0.00B  test1
>> # btrfs fi du -s test1.snap
>>  Total   Exclusive  Set shared  Filename
>> ERROR: cannot check space of 'test1.snap': Inappropriate ioctl for device
>>
>> But if you add *another* file/dir in test1.snap you got:
>>
>> # mkdir test1.snap/dir
>> # btrfs fi du -s test1.snap
>>  Total   Exclusive  Set shared  Filename
>>  0.00B   0.00B   0.00B  test1.snap
>>
>> The error disappea> Patches will follow shortly
> 
> It would be awesome if you manage to introduce xfstests for this case

I am not sure if this is the right thing to do: the bugs are related to 
a) an incorrect error handling of the function lookup_path_rootid()
b) the strange behavior of the BTRFS_EMPTY_SUBVOL_DIR_OBJECTID directory

For a) it is impossible to check for each lookup_path_rootid() call from a 
xfstest; for b) I am quite sure that there are a lot of corner case not 
properly handled in btrfs progs (and still btrfs fi-du handle it wrongly)

BR
G.Baroncelli

> 
>>
>>>
>>> # uname -r
>>> 4.13.0-0.rc4.git1.1.fc27.x86_64
>>> # rpm -q btrfs-progs
>>> btrfs-progs-4.12-1.fc27.x86_64
>>>
>>>
>>> 
>>> Chris Murphy
>>>
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Don't call lookup_path_rootid() when ino=BTRFS_EMPTY_SUBVOL_DIR_OBJECTID

2017-08-18 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

When ino = BTRFS_EMPTY_SUBVOL_DIR_OBJECTID, the item is not referred
to any file-tree. So lookup_path_rootid() doesn't return any sensate
value.

Signed-off-by: Goffredo Baroncelli 
---
 cmds-fi-du.c | 25 -
 ctree.h  |  2 ++
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/cmds-fi-du.c b/cmds-fi-du.c
index 4bf6af3e..013f8dfd 100644
--- a/cmds-fi-du.c
+++ b/cmds-fi-du.c
@@ -433,7 +433,6 @@ static int du_add_file(const char *filename, int dirfd,
u64 file_total = 0;
u64 file_shared = 0;
u64 dir_set_shared = 0;
-   u64 subvol;
int fd;
DIR *dirstream = NULL;
 
@@ -462,16 +461,24 @@ static int du_add_file(const char *filename, int dirfd,
goto out;
}
 
-   ret = lookup_path_rootid(fd, );
-   if (ret)
-   goto out_close;
+   /*
+* st.st_ino == BTRFS_EMPTY_SUBVOL_DIR_OBJECTID (==2)
+* there is no any related tree
+*/
+   if (st.st_ino != BTRFS_EMPTY_SUBVOL_DIR_OBJECTID) {
+   u64 subvol;
 
-   if (inode_seen(st.st_ino, subvol))
-   goto out_close;
+   ret = lookup_path_rootid(fd, );
+   if (ret)
+   goto out_close;
 
-   ret = mark_inode_seen(st.st_ino, subvol);
-   if (ret)
-   goto out_close;
+   if (inode_seen(st.st_ino, subvol))
+   goto out_close;
+
+   ret = mark_inode_seen(st.st_ino, subvol);
+   if (ret)
+   goto out_close;
+   }
 
if (S_ISREG(st.st_mode)) {
ret = du_calc_file_space(fd, shared_extents, _total,
diff --git a/ctree.h b/ctree.h
index 48ae8903..2fc67b06 100644
--- a/ctree.h
+++ b/ctree.h
@@ -138,6 +138,8 @@ struct btrfs_free_space_ctl;
  */
 #define BTRFS_DEV_ITEMS_OBJECTID 1ULL
 
+#define BTRFS_EMPTY_SUBVOL_DIR_OBJECTID 2
+
 /*
  * the max metadata block size.  This limit is somewhat artificial,
  * but the memmove costs go through the roof for larger blocks.
-- 
2.14.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: incorrect invalid lookup_path_rootid() error handling

2017-08-18 Thread Goffredo Baroncelli
Hi All,

Piotr and Chris, pointed me at these bugs which could be triggered by this test 
case:

# btrfs sub create test1
# btrfs sub create test1/test2
# btrfs sub snap test1 test1.snap
# btrfs fi du -s test1
 Total   Exclusive  Set shared  Filename
 0.00B   0.00B   0.00B  test1
# btrfs fi du -s test1.snap
 Total   Exclusive  Set shared  Filename
ERROR: cannot check space of 'test1.snap': Inappropriate ioctl for device

These are two bugs:
1) in the function du_walk_dir() the error returned by du_add_file() normally
is ignored. But if du_walk_dir() "walks" the last items, the error is returned
to the caller
2) in the function du_add_file() it doesn't make sense to call
lookup_path_rootid() when the inode is BTRFS_EMPTY_SUBVOL_DIR_OBJECTID: in this
case the function doesn't return a valid value.

BR
G.Baroncelli
--
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Reset the ret value when ignore an error from du_add_file()

2017-08-18 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

In du_walk_dir(), when du_add_file() returns an error it is
usually ignored. However if the error is returned querying
the last item, the error is returned to the caller.

Signed-off-by: Goffredo Baroncelli 
---
 cmds-fi-du.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/cmds-fi-du.c b/cmds-fi-du.c
index f106f45b..4bf6af3e 100644
--- a/cmds-fi-du.c
+++ b/cmds-fi-du.c
@@ -403,6 +403,7 @@ static int du_walk_dir(struct du_dir_ctxt *ctxt, struct 
rb_root *shared_extents)
  shared_extents, , ,
  0);
if (ret == -ENOTTY) {
+   ret = 0;
continue;
} else if (ret) {
fprintf(stderr,
-- 
2.14.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs fi du -s gives Inappropriate ioctl for device

2017-08-18 Thread Nikolay Borisov


On 17.08.2017 23:59, Goffredo Baroncelli wrote:
> Hi,
> 
> On 08/17/2017 08:43 PM, Chris Murphy wrote:
>> # btrfs sub create test1
>> Create subvolume './test1'
>> # btrfs sub create test1/test2
>> Create subvolume 'test1/test2'
>> # btrfs sub snap test1 test1.snap
>> Create a snapshot of 'test1' in './test1.snap'
>> # btrfs fi du -s test1
>>  Total   Exclusive  Set shared  Filename
>>  0.00B   0.00B   0.00B  test1
>> # btrfs fi du -s test1.snap
>>  Total   Exclusive  Set shared  Filename
>> ERROR: cannot check space of 'test1.snap': Inappropriate ioctl for device
>> #
>>
> 
> tanks for the test case. Now I was able to reproduce the problem. The bug(s) 
> are two:
> 
> 1) to get the treeid of the files/directory, the function 
> lookup_path_rootid() is used, which behaves strangely when it is called on a 
> directory that is BTRFS_EMPTY_SUBVOL_DIR_OBJECTID. In fact this function is 
> commented as following:
> []
> /*
>  * For a given:
>  * - file or directory return the containing tree root id
>  * - subvolume return its own tree id
>  * - BTRFS_EMPTY_SUBVOL_DIR_OBJECTID (directory with ino == 2) the result is
>  *   undefined and function returns -1
>  */
> int lookup_path_rootid(int fd, u64 *rootid)
> {
> []
> 
> The caller (du_add_file()) doesn't consider this case.
> 
> 2) in the function du_walk_dir(), an error returned by du_add_file() is 
> ignored, but the return value is not reset. So if the last entry has ino 
> BTRFS_EMPTY_SUBVOL_DIR_OBJECTID (directory with ino == 2) an error is 
> returned. But if there is another directory item no error is returned !
> 
> See the following tests cases:
> 
> # btrfs sub create test1
> # btrfs sub create test1/test2
> # btrfs sub snap test1 test1.snap
> # btrfs fi du -s test1
>  Total   Exclusive  Set shared  Filename
>  0.00B   0.00B   0.00B  test1
> # btrfs fi du -s test1.snap
>  Total   Exclusive  Set shared  Filename
> ERROR: cannot check space of 'test1.snap': Inappropriate ioctl for device
> 
> But if you add *another* file/dir in test1.snap you got:
> 
> # mkdir test1.snap/dir
> # btrfs fi du -s test1.snap
>  Total   Exclusive  Set shared  Filename
>  0.00B   0.00B   0.00B  test1.snap
> 
> The error disappea> Patches will follow shortly

It would be awesome if you manage to introduce xfstests for this case

> 
>>
>> # uname -r
>> 4.13.0-0.rc4.git1.1.fc27.x86_64
>> # rpm -q btrfs-progs
>> btrfs-progs-4.12-1.fc27.x86_64
>>
>>
>> 
>> Chris Murphy
>>
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html