Re: help!!! error when mount a btrfs file system
At 03/16/2017 08:23 PM, 李云甫 wrote:
> hi, buddy
>
> I have a file server with a btrfs file system. It worked well for
> several months, but after the last system reboot /dev/sdb is no longer
> mountable. Below are the details. Is there any advice?
>
> ##Version info
> Fedora 25 Server
> Kernel 4.9.13-201.fc25.x86_64
> btrfs-progs v4.6.1
>
> ##error messages when mounting
> mount: wrong fs type, bad option, bad superblock on /dev/sdb,
>        missing codepage or helper program, or other error
>
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.
>
> ##dmesg | tail
> [79570.756871] BTRFS error (device sdb): parent transid verify failed on 21413888 wanted 755660 found 623605
> [79570.762307] BTRFS error (device sdb): bad tree block start 0 21413888

Chunk tree corrupted.

> [79570.762345] BTRFS error (device sdb): failed to read chunk tree: -5
> [79570.778129] BTRFS error (device sdb): open_ctree failed
> [79589.743772] BTRFS error (device sdb): support for check_integrity* not compiled in!
> [79589.803176] BTRFS error (device sdb): open_ctree failed
>
> ##btrfsck
> parent transid verify failed on 21413888 wanted 755660 found 623605
> parent transid verify failed on 21413888 wanted 755660 found 623605
> checksum verify failed on 21413888 found E4E3BDB6 wanted

E4E3BDB6 is the crc32 of a leaf filled with all zero data. And the
wanted csum is also 0, which means the whole leaf is all zero. Either
something went wrong related to discard, or your chunk tree got
completely corrupted.

> parent transid verify failed on 21413888 wanted 755660 found 623605
> Ignoring transid failure
> checksum verify failed on 21331968 found E4E3BDB6 wanted
> checksum verify failed on 21331968 found E4E3BDB6 wanted
> checksum verify failed on 21692416 found E4E3BDB6 wanted
> checksum verify failed on 21692416 found E4E3BDB6 wanted
> checksum verify failed on 22888448 found E4E3BDB6 wanted
> checksum verify failed on 22888448 found E4E3BDB6 wanted
> checksum verify failed on 22888448 found E4E3BDB6 wanted
> checksum verify failed on 22888448 found E4E3BDB6 wanted

At least 4 leaves/nodes of the chunk tree are corrupted. I assume that's
all of your chunk tree. I would say the chance to recover is very low.

Thanks,
Qu

> bytenr mismatch, want=22888448, have=0
> Couldn't read chunk tree
> Couldn't open file system
>
> ##btrfs-find-root
> parent transid verify failed on 21413888 wanted 755660 found 623605
> parent transid verify failed on 21413888 wanted 755660 found 623605
> parent transid verify failed on 21413888 wanted 755660 found 623605
> Ignoring transid failure
> Couldn't read chunk tree
> ERROR: open ctree failed
>
> ##btrfs-show-super -a /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> csum                    0xb6f3ccb1 [match]
> bytenr                  65536
> flags                   0x1 ( WRITTEN )
> magic                   _BHRfS_M [match]
> fsid                    7f5aecd6-40fe-45ce-90c3-e86bacc4bf6f
> label                   samba_fs
> generation              770740
> root                    16187774615552
> sys_array_size          355
> chunk_root_generation   755799
> root_level              1
> chunk_root              24331161698304
> chunk_root_level        1
> log_root                0
> log_root_transid        0
> log_root_level          0
> total_bytes             2396231680
> bytes_used              22205028102144
> sectorsize              4096
> nodesize                16384
> leafsize                16384
> stripesize              4096
> root_dir                6
> num_devices             1
> compat_flags            0x0
> compat_ro_flags         0x0
> incompat_flags          0x169 ( MIXED_BACKREF | COMPRESS_LZO | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA )
> csum_type               0
> csum_size               4
> cache_generation        770740
> uuid_tree_generation    770740
> dev_item.uuid           dd8c8d66-b6f5-48d8-9d5e-6f56b2ad4751
> dev_item.fsid           7f5aecd6-40fe-45ce-90c3-e86bacc4bf6f [match]
> dev_item.type           0
> dev_item.total_bytes    2396231680
> dev_item.bytes_used     23274943676416
> dev_item.io_align       4096
> dev_item.io_width       4096
> dev_item.sector_size    4096
> dev_item.devid          1
> dev_item.dev_group      0
> dev_item.seek_speed     0
> dev_item.bandwidth      0
> dev_item.generation     0
>
> superblock: bytenr=67108864, device=/dev/sdb
> csum                    0x1692e47f [match]
> bytenr                  67108864
> flags                   0x1 ( WRITTEN )
> magic                   _BHRfS_M [match]
> fsid                    7f5aecd6-40fe-45ce-90c3-e86bacc4bf6f
> label                   samba_fs
> generation              770740
> root                    16187774615552
> sys_array_size          355
> chunk_root_generation   755799
> root_level              1
> chunk_root              24331161698304
> chunk_root_level        1
> log_root                0
> log_root_transid        0
> log_root_level          0
> total_bytes
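The "parent transid verify failed" lines above are the key symptom: every tree node records the generation (transid) it expects each child block to carry, and here the block read from disk carries a much older generation than its parent expects. A minimal sketch of that check (illustrative only, not btrfs's actual code; the struct and names are simplified):

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: a parent node remembers the generation (transid)
 * it expects a child block to have been written with.  On read, the
 * child's own header generation must match; otherwise the block on
 * disk is stale (e.g. an old or zeroed copy left behind by lost
 * writes). */
struct block_header {
	uint64_t bytenr;	/* logical address of the block */
	uint64_t generation;	/* transid when the block was written */
};

static int verify_parent_transid(const struct block_header *child,
				 uint64_t wanted_gen)
{
	if (child->generation == wanted_gen)
		return 0;
	fprintf(stderr,
		"parent transid verify failed on %llu wanted %llu found %llu\n",
		(unsigned long long)child->bytenr,
		(unsigned long long)wanted_gen,
		(unsigned long long)child->generation);
	return -1;
}
```

With the values from the report, wanted 755660 vs. found 623605 fails the check; a found value this far behind wanted usually means old or all-zero blocks were read back, which matches Qu's diagnosis above.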
Re: help converting btrfs to new writeback error tracking?
On Mon, 2017-05-08 at 11:39 -0700, Liu Bo wrote:
> Hi Jeff,
>
> On Fri, May 05, 2017 at 04:11:18PM -0400, Jeff Layton wrote:
> > On Fri, 2017-05-05 at 12:21 -0700, Liu Bo wrote:
> > > Hi Jeff,
> > >
> > > On Thu, May 04, 2017 at 07:26:17AM -0400, Jeff Layton wrote:
> > > > I've been working on a set of patches to clean up how writeback
> > > > errors are tracked and handled in the kernel:
> > > >
> > > > http://marc.info/?l=linux-fsdevel&m=149304074111261&w=2
> > > >
> > > > The basic idea is that rather than having a set of flags that are
> > > > cleared whenever they are checked, we have a sequence counter and
> > > > error that are tracked on a per-mapping basis, and can then use
> > > > that sequence counter to tell whether the error should be
> > > > reported.
> > > >
> > > > This changes the way that things like filemap_write_and_wait
> > > > work. Rather than having to ensure that AS_EIO/AS_ENOSPC are not
> > > > cleared inappropriately (and thus losing errors that should be
> > > > reported), you can now tell whether there has been a writeback
> > > > error since a certain point in time, irrespective of whether
> > > > anyone else is checking for errors.
> > > >
> > > > I've been doing some conversions of the existing code to the new
> > > > scheme, but btrfs has _really_ complicated error handling. I
> > > > think it could probably be simplified with this new scheme, but I
> > > > could use some help here.
> > > >
> > > > What I think we probably want to do is to sample the error
> > > > sequence in the mapping at well-defined points in time (probably
> > > > when starting a transaction?) and then use that to determine
> > > > whether writeback errors have occurred since then. Is there
> > > > anyone in the btrfs community who could help me here?
> > > >
> > >
> > > I went through the patch set and reviewed the btrfs part, in
> > > particular [PATCH v3 14/20] fs: retrofit old error reporting API
> > > onto new infrastructure
> > >
> > > It looks good to me.
> > >
> > > In btrfs ->writepage(), it sets PG_error whenever an error
> > > (-EIO/-ENOSPC/-ENOMEM) occurs, and it sets the mapping's error as
> > > well in end_extent_writepage(). The special case is the compression
> > > code, where it only sets the mapping's error when there is any
> > > error while processing compressed bytes.
> > >
> > > Similar to ext4, btrfs tracks the IO error by setting the mapping's
> > > error in writepage_endio and other places (eg. the compression
> > > code), and around tree-log.c it's checking BTRFS_ORDERED_IOERR from
> > > ordered_extent->flags, which is usually set in writepage_endio and
> > > sometimes in some error handling code where it couldn't call endio.
> > >
> > > So the conversion in btrfs's fsync() seems to be good enough; did I
> > > miss anything?
> > >
> >
> > Many thanks for taking a look:
> >
> > There are a number of calls in btrfs to filemap_fdatawait_range that
> > check the return code. That function will wait for writeback on all
> > of the pages in the mapping range and return an error if there has
> > been one. Note too that there are also some places that ignore the
> > return code.
> >
> > These patches change how filemap_fdatawait_range (and some similar
> > functions) work. Before this set, you'd get an error if one had
> > occurred since anyone last checked it. Now, you only get an error
> > there if one occurred since you started waiting. If the failed
> > writeback occurred before that function was called, you won't get an
> > error back.
> >
>
> Since all filemap_fdatawait_range() calls in btrfs checked the return
> value, it is supposed to catch any errors that occurred from
> filemap_fdatawrite_range(), which is called twice by
> btrfs_fdatawrite_range()[1], so with this set, it's possible to fail
> to detect errors if only calling filemap_fdatawait_range().
>
> [1]: filemap_fdatawrite_range() needs to be called twice to make sure
> compressed data is flushed.
>
> > For fsync, it shouldn't matter. You'll get an error back there if
> > one occurred since the last fsync since you're setting it in the
> > mapping. The bigger question is whether other callers in this code
> > do anything with that error return.
Re: File system is oddly full after kernel upgrade, balance doesn't help
On 2017-01-28 13:15, MegaBrutal wrote:
> Hello,
>
> Of course I can't retrieve the data from before the balance, but here
> is the data from now:
>
> root@vmhost:~# btrfs fi show /tmp/mnt/curlybrace
> Label: 'curlybrace'  uuid: f471bfca-51c4-4e44-ac72-c6cd9ccaf535
>         Total devices 1 FS bytes used 752.38MiB
>         devid    1 size 2.00GiB used 1.90GiB path /dev/mapper/vmdata--vg-lxc--curlybrace
>
> root@vmhost:~# btrfs fi df /tmp/mnt/curlybrace
> Data, single: total=773.62MiB, used=714.82MiB
> System, DUP: total=8.00MiB, used=16.00KiB
> Metadata, DUP: total=577.50MiB, used=37.55MiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> root@vmhost:~# btrfs fi usage /tmp/mnt/curlybrace
> Overall:
>     Device size:                   2.00GiB
>     Device allocated:              1.90GiB
>     Device unallocated:          103.38MiB
>     Device missing:                  0.00B
>     Used:                        789.94MiB
>     Free (estimated):            162.18MiB      (min: 110.50MiB)
>     Data ratio:                       1.00
>     Metadata ratio:                   2.00
>     Global reserve:              512.00MiB      (used: 0.00B)
>
> Data,single: Size:773.62MiB, Used:714.82MiB
>    /dev/mapper/vmdata--vg-lxc--curlybrace        773.62MiB
>
> Metadata,DUP: Size:577.50MiB, Used:37.55MiB
>    /dev/mapper/vmdata--vg-lxc--curlybrace          1.13GiB
>
> System,DUP: Size:8.00MiB, Used:16.00KiB
>    /dev/mapper/vmdata--vg-lxc--curlybrace         16.00MiB
>
> Unallocated:
>    /dev/mapper/vmdata--vg-lxc--curlybrace        103.38MiB
>
> So... if I sum the data, metadata, and the global reserve, I see why
> only ~170 MB is left. I have no idea, however, why the global reserve
> sneaked up to 512 MB for such a small file system, or how I could
> resolve this situation.
>
> Any ideas?
>
> MegaBrutal

Total amateur here just jumping in, so feel free to ignore me, but what
caught my eye was the small device size. I've had issues with BTRFS on
small devices (4GiB & 8GiB), forcing me to use other filesystems on them
(like EXT4, which has a smaller allocation size). The issues were both
ENOSPC and miscellaneous other strange errors (which may have been fixed
by now). My theory is that the 1GiB data and 256MiB metadata chunk sizes
are significant on such small devices.

I don't know if there is an official recommended minimum device size,
but keeping 4GiB or more free seems to work most of the time for my
usage patterns.

~~AEM
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
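The arithmetic behind the numbers above can be checked with a toy model of chunk-level accounting (an illustration of the reported figures, not btrfs's real allocator): space is handed out in whole chunks, and DUP metadata costs twice its logical size in raw device space, which is why a 2GiB device is almost fully allocated while only ~750MiB of file data exists.

```c
#include <math.h>

/* Toy model of btrfs chunk-level accounting, in MiB.  'single' data
 * costs its logical size in raw space; DUP metadata/system chunks cost
 * twice their logical size.  Illustrative only. */
struct space {
	double device_size;	/* raw device size */
	double data;		/* Data, single (logical == raw) */
	double metadata;	/* Metadata, DUP (raw = 2x logical) */
	double system;		/* System, DUP (raw = 2x logical) */
};

/* raw bytes consumed on the device by all allocated chunks */
static double raw_allocated(const struct space *s)
{
	return s->data + 2.0 * s->metadata + 2.0 * s->system;
}

/* raw bytes not yet assigned to any chunk */
static double unallocated(const struct space *s)
{
	return s->device_size - raw_allocated(s);
}
```

Plugging in the figures from the report (2048MiB device, 773.62MiB data, 577.50MiB metadata, 8MiB system) gives 1944.62MiB allocated and 103.38MiB unallocated, matching the `btrfs fi usage` output above; the 512MiB global reserve is then accounted out of metadata space on top of that.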
Re: help converting btrfs to new writeback error tracking?
Hi Jeff,

On Fri, May 05, 2017 at 04:11:18PM -0400, Jeff Layton wrote:
> On Fri, 2017-05-05 at 12:21 -0700, Liu Bo wrote:
> > Hi Jeff,
> >
> > On Thu, May 04, 2017 at 07:26:17AM -0400, Jeff Layton wrote:
> > > I've been working on a set of patches to clean up how writeback
> > > errors are tracked and handled in the kernel:
> > >
> > > http://marc.info/?l=linux-fsdevel&m=149304074111261&w=2
> > >
> > > The basic idea is that rather than having a set of flags that are
> > > cleared whenever they are checked, we have a sequence counter and
> > > error that are tracked on a per-mapping basis, and can then use
> > > that sequence counter to tell whether the error should be reported.
> > >
> > > This changes the way that things like filemap_write_and_wait work.
> > > Rather than having to ensure that AS_EIO/AS_ENOSPC are not cleared
> > > inappropriately (and thus losing errors that should be reported),
> > > you can now tell whether there has been a writeback error since a
> > > certain point in time, irrespective of whether anyone else is
> > > checking for errors.
> > >
> > > I've been doing some conversions of the existing code to the new
> > > scheme, but btrfs has _really_ complicated error handling. I think
> > > it could probably be simplified with this new scheme, but I could
> > > use some help here.
> > >
> > > What I think we probably want to do is to sample the error sequence
> > > in the mapping at well-defined points in time (probably when
> > > starting a transaction?) and then use that to determine whether
> > > writeback errors have occurred since then. Is there anyone in the
> > > btrfs community who could help me here?
> > >
> >
> > I went through the patch set and reviewed the btrfs part, in
> > particular [PATCH v3 14/20] fs: retrofit old error reporting API onto
> > new infrastructure
> >
> > It looks good to me.
> >
> > In btrfs ->writepage(), it sets PG_error whenever an error
> > (-EIO/-ENOSPC/-ENOMEM) occurs, and it sets the mapping's error as
> > well in end_extent_writepage(). The special case is the compression
> > code, where it only sets the mapping's error when there is any error
> > while processing compressed bytes.
> >
> > Similar to ext4, btrfs tracks the IO error by setting the mapping's
> > error in writepage_endio and other places (eg. the compression code),
> > and around tree-log.c it's checking BTRFS_ORDERED_IOERR from
> > ordered_extent->flags, which is usually set in writepage_endio and
> > sometimes in some error handling code where it couldn't call endio.
> >
> > So the conversion in btrfs's fsync() seems to be good enough; did I
> > miss anything?
> >
>
> Many thanks for taking a look:
>
> There are a number of calls in btrfs to filemap_fdatawait_range that
> check the return code. That function will wait for writeback on all of
> the pages in the mapping range and return an error if there has been
> one. Note too that there are also some places that ignore the return
> code.
>
> These patches change how filemap_fdatawait_range (and some similar
> functions) work. Before this set, you'd get an error if one had
> occurred since anyone last checked it. Now, you only get an error
> there if one occurred since you started waiting. If the failed
> writeback occurred before that function was called, you won't get an
> error back.
>

Since all filemap_fdatawait_range() calls in btrfs checked the return
value, it is supposed to catch any errors that occurred from
filemap_fdatawrite_range(), which is called twice by
btrfs_fdatawrite_range()[1], so with this set, it's possible to fail to
detect errors if only calling filemap_fdatawait_range().

[1]: filemap_fdatawrite_range() needs to be called twice to make sure
compressed data is flushed.

> For fsync, it shouldn't matter. You'll get an error back there if one
> occurred since the last fsync since you're setting it in the mapping.
> The bigger question is whether other callers in this code do anything
> with that error return.
>
> If they do, then the next question is: from what point do you want to
> detect errors that have occurred?
>
> What sort of makes sense to me (in a handwavy way) would be to sample
> the errseq_t in the mapping when you start a transaction, and then
> check vs. that for errors. Then, even if you have parallel
> transactions going on the same inode (is that even possible?) then you
> can tell whether they all succeeded or not.
>
> Thoughts?
Re: help converting btrfs to new writeback error tracking?
On Fri, 2017-05-05 at 12:21 -0700, Liu Bo wrote:
> Hi Jeff,
>
> On Thu, May 04, 2017 at 07:26:17AM -0400, Jeff Layton wrote:
> > I've been working on a set of patches to clean up how writeback
> > errors are tracked and handled in the kernel:
> >
> > http://marc.info/?l=linux-fsdevel&m=149304074111261&w=2
> >
> > The basic idea is that rather than having a set of flags that are
> > cleared whenever they are checked, we have a sequence counter and
> > error that are tracked on a per-mapping basis, and can then use that
> > sequence counter to tell whether the error should be reported.
> >
> > This changes the way that things like filemap_write_and_wait work.
> > Rather than having to ensure that AS_EIO/AS_ENOSPC are not cleared
> > inappropriately (and thus losing errors that should be reported), you
> > can now tell whether there has been a writeback error since a certain
> > point in time, irrespective of whether anyone else is checking for
> > errors.
> >
> > I've been doing some conversions of the existing code to the new
> > scheme, but btrfs has _really_ complicated error handling. I think it
> > could probably be simplified with this new scheme, but I could use
> > some help here.
> >
> > What I think we probably want to do is to sample the error sequence
> > in the mapping at well-defined points in time (probably when starting
> > a transaction?) and then use that to determine whether writeback
> > errors have occurred since then. Is there anyone in the btrfs
> > community who could help me here?
> >
>
> I went through the patch set and reviewed the btrfs part, in particular
> [PATCH v3 14/20] fs: retrofit old error reporting API onto new
> infrastructure
>
> It looks good to me.
>
> In btrfs ->writepage(), it sets PG_error whenever an error
> (-EIO/-ENOSPC/-ENOMEM) occurs, and it sets the mapping's error as well
> in end_extent_writepage(). The special case is the compression code,
> where it only sets the mapping's error when there is any error while
> processing compressed bytes.
>
> Similar to ext4, btrfs tracks the IO error by setting the mapping's
> error in writepage_endio and other places (eg. the compression code),
> and around tree-log.c it's checking BTRFS_ORDERED_IOERR from
> ordered_extent->flags, which is usually set in writepage_endio and
> sometimes in some error handling code where it couldn't call endio.
>
> So the conversion in btrfs's fsync() seems to be good enough; did I
> miss anything?

Many thanks for taking a look:

There are a number of calls in btrfs to filemap_fdatawait_range that
check the return code. That function will wait for writeback on all of
the pages in the mapping range and return an error if there has been
one. Note too that there are also some places that ignore the return
code.

These patches change how filemap_fdatawait_range (and some similar
functions) work. Before this set, you'd get an error if one had occurred
since anyone last checked it. Now, you only get an error there if one
occurred since you started waiting. If the failed writeback occurred
before that function was called, you won't get an error back.

For fsync, it shouldn't matter. You'll get an error back there if one
occurred since the last fsync since you're setting it in the mapping.
The bigger question is whether other callers in this code do anything
with that error return.

If they do, then the next question is: from what point do you want to
detect errors that have occurred?

What sort of makes sense to me (in a handwavy way) would be to sample
the errseq_t in the mapping when you start a transaction, and then check
vs. that for errors. Then, even if you have parallel transactions going
on the same inode (is that even possible?) then you can tell whether
they all succeeded or not.

Thoughts?
--
Jeff Layton <jlay...@redhat.com>
Re: help converting btrfs to new writeback error tracking?
Hi Jeff,

On Thu, May 04, 2017 at 07:26:17AM -0400, Jeff Layton wrote:
> I've been working on a set of patches to clean up how writeback errors
> are tracked and handled in the kernel:
>
> http://marc.info/?l=linux-fsdevel&m=149304074111261&w=2
>
> The basic idea is that rather than having a set of flags that are
> cleared whenever they are checked, we have a sequence counter and error
> that are tracked on a per-mapping basis, and can then use that sequence
> counter to tell whether the error should be reported.
>
> This changes the way that things like filemap_write_and_wait work.
> Rather than having to ensure that AS_EIO/AS_ENOSPC are not cleared
> inappropriately (and thus losing errors that should be reported), you
> can now tell whether there has been a writeback error since a certain
> point in time, irrespective of whether anyone else is checking for
> errors.
>
> I've been doing some conversions of the existing code to the new
> scheme, but btrfs has _really_ complicated error handling. I think it
> could probably be simplified with this new scheme, but I could use some
> help here.
>
> What I think we probably want to do is to sample the error sequence in
> the mapping at well-defined points in time (probably when starting a
> transaction?) and then use that to determine whether writeback errors
> have occurred since then. Is there anyone in the btrfs community who
> could help me here?
>

I went through the patch set and reviewed the btrfs part, in particular
[PATCH v3 14/20] fs: retrofit old error reporting API onto new
infrastructure

It looks good to me.

In btrfs ->writepage(), it sets PG_error whenever an error
(-EIO/-ENOSPC/-ENOMEM) occurs, and it sets the mapping's error as well
in end_extent_writepage(). The special case is the compression code,
where it only sets the mapping's error when there is any error while
processing compressed bytes.

Similar to ext4, btrfs tracks the IO error by setting the mapping's
error in writepage_endio and other places (eg. the compression code),
and around tree-log.c it's checking BTRFS_ORDERED_IOERR from
ordered_extent->flags, which is usually set in writepage_endio and
sometimes in some error handling code where it couldn't call endio.

So the conversion in btrfs's fsync() seems to be good enough; did I miss
anything?

Thanks,

-liubo
help converting btrfs to new writeback error tracking?
I've been working on a set of patches to clean up how writeback errors
are tracked and handled in the kernel:

http://marc.info/?l=linux-fsdevel&m=149304074111261&w=2

The basic idea is that rather than having a set of flags that are
cleared whenever they are checked, we have a sequence counter and error
that are tracked on a per-mapping basis, and can then use that sequence
counter to tell whether the error should be reported.

This changes the way that things like filemap_write_and_wait work.
Rather than having to ensure that AS_EIO/AS_ENOSPC are not cleared
inappropriately (and thus losing errors that should be reported), you
can now tell whether there has been a writeback error since a certain
point in time, irrespective of whether anyone else is checking for
errors.

I've been doing some conversions of the existing code to the new scheme,
but btrfs has _really_ complicated error handling. I think it could
probably be simplified with this new scheme, but I could use some help
here.

What I think we probably want to do is to sample the error sequence in
the mapping at well-defined points in time (probably when starting a
transaction?) and then use that to determine whether writeback errors
have occurred since then. Is there anyone in the btrfs community who
could help me here?

Thanks,
--
Jeff Layton <jlay...@redhat.com>
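The scheme described above can be modelled in a few lines. This is a simplified sketch of the idea, not the kernel's actual errseq_t (which packs the error, counter, and a "seen" flag into one atomically-updated word): the mapping keeps the latest error plus a sequence number that advances whenever an error is recorded; callers sample the sequence at a point of interest and later ask whether anything was recorded since, and checking never clears state, so independent checkers can't hide errors from each other.

```c
/* Simplified model of per-mapping writeback error tracking.
 * Illustrative only; the real errseq_t also tracks whether an error
 * has been seen, to avoid advancing the counter unnecessarily. */
struct errseq_model {
	int err;		/* most recent error, e.g. -EIO */
	unsigned int seq;	/* bumped every time an error is recorded */
};

/* record a writeback error against the mapping */
static void errseq_model_set(struct errseq_model *e, int err)
{
	e->err = err;
	e->seq++;
}

/* sample the current sequence at a well-defined point in time */
static unsigned int errseq_model_sample(const struct errseq_model *e)
{
	return e->seq;
}

/* report the error iff one was recorded after 'since' was sampled */
static int errseq_model_check(const struct errseq_model *e,
			      unsigned int since)
{
	return e->seq != since ? e->err : 0;
}
```

A caller that samples at transaction start, as suggested above, would later check against its own sample; two parallel samplers each get errors reported relative to their own starting point, and neither "consumes" the error for the other.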
[PATCH 4/9] btrfs-progs: help: Unbind short help description from btrfs
usage_command_group_short() always binds its description to 'btrfs',
making us unable to use this function in other progs.

This patch makes the short description independent, so callers need to
pass the short description by themselves.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
 btrfs.c | 12 +++++++++++-
 help.c  | 14 +++++++-------
 help.h  |  3 ++-
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/btrfs.c b/btrfs.c
index f096e780..b3686c4b 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -24,6 +24,15 @@
 #include "utils.h"
 #include "help.h"
 
+static const char * const btrfs_short_desc[] = {
+	"For an overview of a given command use 'btrfs command --help'",
+	"or 'btrfs [command...] --help --full' to print all available options.",
+	"Any command name can be shortened as far as it stays unambiguous,",
+	"however it is recommended to use full command names in scripts.",
+	"All command groups have their manual page named 'btrfs-'.",
+	NULL
+};
+
 static const char * const btrfs_cmd_group_usage[] = {
 	"btrfs [--help] [--version] [...] []",
 	NULL
@@ -126,7 +135,8 @@ int main(int argc, char **argv)
 		if (!prefixcmp(argv[0], "--"))
 			argv[0] += 2;
 	} else {
-		usage_command_group_short(&btrfs_cmd_group);
+		usage_command_group_short(&btrfs_cmd_group,
+					  btrfs_short_desc);
 		exit(1);
 	}
diff --git a/help.c b/help.c
index 19b0d357..13c45ffd 100644
--- a/help.c
+++ b/help.c
@@ -262,7 +262,8 @@ static void usage_command_group_internal(const struct cmd_group *grp, int full,
 	}
 }
 
-void usage_command_group_short(const struct cmd_group *grp)
+void usage_command_group_short(const struct cmd_group *grp,
+			       const char * const *short_desc)
 {
 	const char * const *usagestr = grp->usagestr;
 	FILE *outf = stdout;
@@ -298,12 +299,11 @@ void usage_command_group_short(const struct cmd_group *grp)
 		fprintf(outf, "	%-16s %s\n", cmd->token, cmd->usagestr[1]);
 	}
 
-	fputc('\n', outf);
-	fprintf(stderr, "For an overview of a given command use 'btrfs command --help'\n");
-	fprintf(stderr, "or 'btrfs [command...] --help --full' to print all available options.\n");
-	fprintf(stderr, "Any command name can be shortened as far as it stays unambiguous,\n");
-	fprintf(stderr, "however it is recommended to use full command names in scripts.\n");
-	fprintf(stderr, "All command groups have their manual page named 'btrfs-'.\n");
+	if (short_desc) {
+		fputc('\n', outf);
+		while (*short_desc && **short_desc)
+			fprintf(outf, "%s\n", *short_desc++);
+	}
 }
 
 void usage_command_group(const struct cmd_group *grp, int full, int err)
diff --git a/help.h b/help.h
index 7458e745..9b190fb1 100644
--- a/help.h
+++ b/help.h
@@ -58,7 +58,8 @@ struct cmd_group;
 void usage(const char * const *usagestr) __attribute__((noreturn));
 void usage_command(const struct cmd_struct *cmd, int full, int err);
 void usage_command_group(const struct cmd_group *grp, int all, int err);
-void usage_command_group_short(const struct cmd_group *grp);
+void usage_command_group_short(const struct cmd_group *grp,
+			       const char * const *short_desc);
 void help_unknown_token(const char *arg, const struct cmd_group *grp) __attribute__((noreturn));
 void help_ambiguous_token(const char *arg, const struct cmd_group *grp) __attribute__((noreturn));
-- 
2.12.2
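The loop the patch introduces is easy to exercise in isolation. Below is a standalone sketch of the same logic (the name format_short_desc is hypothetical, not part of the patch): emit a leading blank line, then each entry of a NULL-terminated string array, stopping at NULL or an empty string, just as the new usage_command_group_short() body does; it writes into a buffer instead of a FILE so it can be tested.

```c
#include <stdio.h>
#include <string.h>

/* Standalone sketch of the patch's printing loop.  A caller outside
 * btrfs would build its own NULL-terminated array of lines and pass it
 * where btrfs passes btrfs_short_desc, or pass NULL for no trailer. */
static int format_short_desc(char *buf, size_t bufsz,
			     const char * const *short_desc)
{
	int n = 0;

	if (!short_desc)
		return 0;
	/* leading blank line, as fputc('\n', outf) in the patch */
	n += snprintf(buf + n, bufsz - n, "\n");
	/* stop at NULL or at an empty string, like the patch's loop */
	while (*short_desc && **short_desc)
		n += snprintf(buf + n, bufsz - n, "%s\n", *short_desc++);
	return n;
}
```

The design point of the patch is the same as here: the generic helper no longer hard-codes btrfs-specific text, so any program linking help.c can supply its own trailer lines or suppress them entirely.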
Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"
On 04.04.2017 18:55, Chris Murphy wrote:
> On Tue, Apr 4, 2017 at 10:52 AM, Chris Murphy <li...@colorremedies.com> wrote:
>
>> Mounting -o ro,degraded is probably permitted by the file system, but
>> chunks of the file system, and certainly your data, will be missing.
>> So it's just a matter of time before copying data off will fail.
>
> ** Context here is, more than 1 device missing.

Thank you guys for all your help and input. I've ordered two new drives
to back up all my data. I have a cloud backup in place, but 13TB takes a
while to upload :-)

I think I'm gonna abandon btrfs as the main fs for my home server. I'm
just gonna set up a separate LVM volume for storing snapshots and
backups, since I use btrfs on all my single disk machines.

Thanks again everyone.
Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"
On Mon, Apr 3, 2017 at 10:02 PM, Robert Krig wrote:
>
> On 03.04.2017 16:25, Robert Krig wrote:
>>
>> I'm gonna run an extensive memory check once I get home, since you
>> mentioned corrupt memory might be an issue here.
>
> I ran a memtest over a couple of hours with no errors. Ram seems to be
> fine so far.

Inconclusive. A memtest can take days to expose a problem, and even
that's not conclusive. The list archive has some examples of where
memory testers gave RAM a pass, but doing things like compiling the
kernel would fail.

> I've looked at the link you provided. Frankly it looks very scary. (At
> least to me it does.)
> But I've just thought of something else.
>
> My storage array is BTRFS Raid1 with 4x8TB Drives.
> Wouldn't it be possible to simply disconnect two of those drives,
> mount with -o degraded and still have access (even if read-only) to
> all my data?

man mkfs.btrfs

Btrfs raid1 supports only one device missing, no matter how many drives.

Mounting -o ro,degraded is probably permitted by the file system, but
chunks of the file system, and certainly your data, will be missing. So
it's just a matter of time before copying data off will fail.

I suggest trying -o ro with all drives, not a degraded mount, and
copying data off. Any failures should be logged. Metadata errors are
logged without paths, whereas data corruption includes the path to the
affected file. This is easier than scraping the file system with btrfs
restore.

If you can't mount ro with all drives, or ro,degraded with just one
device missing, you'll need to use btrfs restore, which is more tolerant
of missing metadata.

--
Chris Murphy
Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"
On 2017-04-04 09:29, Brian B wrote:
> On 04/04/2017 12:02 AM, Robert Krig wrote:
>> My storage array is BTRFS Raid1 with 4x8TB Drives.
>> Wouldn't it be possible to simply disconnect two of those drives,
>> mount with -o degraded and still have access (even if read-only) to
>> all my data?
>
> Just jumping on this point: my understanding of BTRFS "RAID1" is that
> each file (block?) is randomly assigned to two disks of the array (no
> matter how many disks are in the array). So if you remove two disks,
> you will probably have files that were "assigned" to both of those
> disks, and will be missing.
>
> In short, you can't remove more than one disk of a BTRFS RAID1 and
> still have all of your data.

That understanding is correct. From a functional perspective, BTRFS
raid1 is currently a RAID10 implementation with striping happening at a
very large granularity.
Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"
On Tue, Apr 04, 2017 at 09:29:11AM -0400, Brian B wrote:
> On 04/04/2017 12:02 AM, Robert Krig wrote:
> > My storage array is BTRFS Raid1 with 4x8TB Drives.
> > Wouldn't it be possible to simply disconnect two of those drives,
> > mount with -o degraded and still have access (even if read-only) to
> > all my data?
>
> Just jumping on this point: my understanding of BTRFS "RAID1" is that
> each file (block?) is randomly assigned to two disks of the array (no

Arbitrarily assigned, rather than randomly assigned (there is a
deterministic algorithm for it, but it's wise not to rely on the exact
behaviour of that algorithm, because there are a number of factors that
can alter its behaviour).

> matter how many disks are in the array). So if you remove two disks,
> you will probably have files that were "assigned" to both of those
> disks, and will be missing.
>
> In short, you can't remove more than one disk of a BTRFS RAID1 and
> still have all of your data.

Indeed.

   Hugo.

--
Hugo Mills             | Some days, it's just not worth gnawing through the
hugo@... carfax.org.uk | straps
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
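The consequence of "arbitrarily assigned" pairs can be demonstrated with a toy placement model (illustrative only, not the real allocator, which prefers the devices with the most free space): once chunks have landed on every possible device pair of a 4-device array, losing any one device is survivable, but losing any two destroys both copies of at least one chunk.

```c
/* Toy model of raid1 chunk placement on a 4-device array.
 * Illustrative only: each chunk gets two copies on two distinct
 * devices, and over time chunks land on every device pair. */
#define NDEV 4

struct chunk {
	int copy_on[2];		/* the two devices holding this chunk */
};

/* cycle through all device pairs (a, b) with a < b */
static void place_chunks(struct chunk *c, int nchunks)
{
	int p = 0;

	while (p < nchunks)
		for (int a = 0; a < NDEV && p < nchunks; a++)
			for (int b = a + 1; b < NDEV && p < nchunks; b++, p++) {
				c[p].copy_on[0] = a;
				c[p].copy_on[1] = b;
			}
}

/* 1 if every chunk keeps at least one copy with the given devices gone
 * (pass -1 for gone_b to model a single missing device) */
static int readable_with_missing(const struct chunk *c, int nchunks,
				 int gone_a, int gone_b)
{
	for (int i = 0; i < nchunks; i++) {
		int lost0 = c[i].copy_on[0] == gone_a || c[i].copy_on[0] == gone_b;
		int lost1 = c[i].copy_on[1] == gone_a || c[i].copy_on[1] == gone_b;

		if (lost0 && lost1)
			return 0;	/* both copies of this chunk gone */
	}
	return 1;
}
```

With 6 chunks covering all C(4,2) = 6 device pairs, removing any single device leaves every chunk with a surviving copy, while removing any two devices loses the chunk stored exactly on that pair, which is why btrfs raid1 tolerates only one missing device regardless of array size.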
Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"
On 04/04/2017 12:02 AM, Robert Krig wrote: > My storage array is BTRFS Raid1 with 4x8TB Drives. > Wouldn't it be possible to simply disconnect two of those drives, mount > with -o degraded and still have access (even if read-only) to all my data? Just jumping on this point: my understanding of BTRFS "RAID1" is that each file (block?) is randomly assigned to two disks of the array (no matter how many disks are in the array). So if you remove two disks, you will probably have files that were "assigned" to both of those disks, and will be missing. In short, you can't remove more than one disk of a BTRFS RAID1 and still have all of your data.
Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"
On 03.04.2017 16:25, Robert Krig wrote: > > I'm gonna run an extensive memory check once I get home, since you > mentioned corrupt memory might be an issue here. I ran a memtest over a couple of hours with no errors. RAM seems to be fine so far. I've looked at the link you provided. Frankly it looks very scary. (At least to me it does.) But I've just thought of something else. My storage array is BTRFS Raid1 with 4x8TB Drives. Wouldn't it be possible to simply disconnect two of those drives, mount with -o degraded and still have access (even if read-only) to all my data? E.g. I could use the two removed drives as a backup and rebuild my array from there, since I'm kind of playing with the idea of turning it into a MD RAID5 and only using btrfs on specific lvm volumes which need it. The one thing that slightly worries me with this idea is that I don't know if there is a way to tell which data blocks are on which drives. If I've understood btrfs raid1 correctly, it simply ensures that there is at least a copy of each block on a different device. Would my idea work? Or could it be that I can only safely remove one drive, since the other drives might contain blocks from any of the other drives?
Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"
On 04/03/2017 04:20 PM, Robert Krig wrote: > > > On 03.04.2017 16:08, Hans van Kranenburg wrote: >> On 04/03/2017 12:11 PM, Robert Krig wrote: >> The corruption is at item 157. Can you attach all of the output, or >> pastebin it? >> > > I've attached the entire log of btrfs-debug-tree. This was generated > with btrfs-progs 4.7.3 Hmm, item 156 key (23416298414080 EXTENT_ITEM 4096) itemoff 8643 itemsize 53 item 157 key (23416298418176 EXTENT_ITEM 4096) itemoff 8590 itemsize 53 8590 + 53 = 8643. I don't get what's invalid about that. "incorrect offsets 8590 1258314415" if (btrfs_item_offset_nr(buf, i) != btrfs_item_end_nr(buf, i + 1)) { ret = BTRFS_TREE_BLOCK_INVALID_OFFSETS; fprintf(stderr, "incorrect offsets %u %u\n", btrfs_item_offset_nr(buf, i), btrfs_item_end_nr(buf, i + 1)); goto fail; } Ah, ok, so the corruption is in item 158, but it's reported as corruption in item 157. There's no really simple tool right now to fix this manually. We can also try to dd 16KiB of metadata from disk, fix it, and write it back. We've been doing that before; it's a bit of work, but it can succeed. Here are more instructions: https://www.spinics.net/lists/linux-btrfs/msg62459.html So, if you're the adventurous type... But then again, if this is really memory failure, there might be other errors all around the fs which you didn't hit while reading back the data yet. Also note that btrfs does not protect you against this, not even for file data that gets corrupted in memory before it's written out (checksumming happens during write-out, so the checksum is computed over the already-corrupted data). > If it makes a difference, I can try it again with the newest version of > btrfs-progs? No, that code hasn't been touched in over 5 years. -- Hans van Kranenburg
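The quoted check can be sketched outside of btrfs-progs like this (a minimal model, not the real code; the garbage itemoff value in the last slot is hypothetical, chosen only to reproduce the numbers reported in this thread):

```python
# In a btrfs leaf, item data is packed from the end of the block toward
# the front, so item i's data must start exactly where item i+1's data
# ends: offset[i] == offset[i+1] + size[i+1].

def check_leaf(items):
    """items: list of (itemoff, itemsize) tuples, in slot order.
    Returns the slot at which the check fails, or None if consistent."""
    for i in range(len(items) - 1):
        next_end = items[i + 1][0] + items[i + 1][1]
        if items[i][0] != next_end:
            # Note: the *reported* slot is i, even when the bogus
            # numbers actually live in slot i+1 -- exactly the
            # item-157-vs-item-158 confusion above.
            print(f"incorrect offsets {items[i][0]} {next_end}")
            return i
    return None

# Items 156 and 157 from the dump are fine: 8590 + 53 == 8643.
# A corrupted itemoff in the *next* slot makes slot 157 look bad.
leaf = [(8643, 53), (8590, 53), (1258314362, 53)]  # last tuple: hypothetical garbage
assert check_leaf(leaf) == 1  # prints "incorrect offsets 8590 1258314415"
```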
Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"
On 03.04.2017 16:20, Robert Krig wrote: > > On 03.04.2017 16:08, Hans van Kranenburg wrote: >> On 04/03/2017 12:11 PM, Robert Krig wrote: >> The corruption is at item 157. Can you attach all of the output, or >> pastebin it? >> > > I've attached the entire log of btrfs-debug-tree. This was generated > with btrfs-progs 4.7.3 > > If it makes a difference, I can try it again with the newest version of > btrfs-progs? I forgot to mention that btrfs-debug-tree also segfaults with a "memory access error". I'm gonna run an extensive memory check once I get home, since you mentioned corrupt memory might be an issue here.
Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"
On 03.04.2017 16:08, Hans van Kranenburg wrote: > On 04/03/2017 12:11 PM, Robert Krig wrote: > The corruption is at item 157. Can you attach all of the output, or > pastebin it? > I've attached the entire log of btrfs-debug-tree. This was generated with btrfs-progs 4.7.3 If it makes a difference, I can try it again with the newest version of btrfs-progs? btrfs-progs v4.7.3 leaf 38666170826752 items 199 free space 1506 generation 1248226 owner 2 fs uuid 8c4f8e26-3442-463f-ad8a-668dfef02593 chunk uuid 1f04f64e-0ec8-4b39-83d9-a2df75179d3e item 0 key (23416295448576 EXTENT_ITEM 36864) itemoff 16230 itemsize 53 extent refs 1 gen 671397 flags DATA extent data backref root 5 objectid 4959957 offset 0 count 1 item 1 key (23416295485440 EXTENT_ITEM 8192) itemoff 16177 itemsize 53 extent refs 1 gen 972749 flags DATA extent data backref root 5 objectid 7328099 offset 0 count 1 item 2 key (23416295493632 EXTENT_ITEM 12288) itemoff 16124 itemsize 53 extent refs 1 gen 797708 flags DATA extent data backref root 5 objectid 5842103 offset 1966080 count 1 item 3 key (23416295505920 EXTENT_ITEM 8192) itemoff 16071 itemsize 53 extent refs 1 gen 1244513 flags DATA extent data backref root 44107 objectid 28528 offset 974848 count 1 item 4 key (23416295514112 EXTENT_ITEM 8192) itemoff 16034 itemsize 37 extent refs 1 gen 625327 flags DATA shared data backref parent 38666872045568 count 1 item 5 key (23416295522304 EXTENT_ITEM 16384) itemoff 15997 itemsize 37 extent refs 1 gen 625327 flags DATA shared data backref parent 38666872045568 count 1 item 6 key (23416295538688 EXTENT_ITEM 49152) itemoff 15944 itemsize 53 extent refs 1 gen 585321 flags DATA extent data backref root 5 objectid 4742401 offset 393216 count 1 item 7 key (23416295587840 EXTENT_ITEM 8192) itemoff 15907 itemsize 37 extent refs 1 gen 625327 flags DATA shared data backref parent 38666872045568 count 1 item 8 key (23416295596032 EXTENT_ITEM 4096) itemoff 15854 itemsize 53 extent refs 1 gen 625327 flags DATA extent data 
backref root 5 objectid 1123021 offset 6029312 count 1 item 9 key (23416295600128 EXTENT_ITEM 4096) itemoff 15801 itemsize 53 extent refs 1 gen 975337 flags DATA extent data backref root 5 objectid 7334929 offset 0 count 1 item 10 key (23416295604224 EXTENT_ITEM 57344) itemoff 15748 itemsize 53 extent refs 1 gen 572974 flags DATA extent data backref root 5 objectid 4430156 offset 0 count 1 item 11 key (23416295661568 EXTENT_ITEM 106496) itemoff 15695 itemsize 53 extent refs 1 gen 585319 flags DATA extent data backref root 5 objectid 4742398 offset 2490368 count 1 item 12 key (23416295768064 EXTENT_ITEM 4096) itemoff 15642 itemsize 53 extent refs 1 gen 795227 flags DATA extent data backref root 5 objectid 5769382 offset 12288 count 1 item 13 key (23416295772160 EXTENT_ITEM 4096) itemoff 15589 itemsize 53 extent refs 1 gen 795227 flags DATA extent data backref root 5 objectid 5769383 offset 4096 count 1 item 14 key (23416295776256 EXTENT_ITEM 4096) itemoff 15536 itemsize 53 extent refs 1 gen 585370 flags DATA extent data backref root 5 objectid 4742594 offset 1310720 count 1 item 15 key (23416295780352 EXTENT_ITEM 8192) itemoff 15499 itemsize 37 extent refs 1 gen 625327 flags DATA shared data backref parent 32477101621248 count 1 item 16 key (23416295788544 EXTENT_ITEM 151552) itemoff 15446 itemsize 53 extent refs 1 gen 992062 flags DATA extent data backref root 5 objectid 7458028 offset 0 count 1 item 17 key (23416295940096 EXTENT_ITEM 4096) itemoff 15393 itemsize 53 extent refs 1 gen 1027477 flags DATA extent data backref root 5 objectid 7508879 offset 4096 count 1 item 18 key (23416295944192 EXTENT_ITEM 4096) itemoff 15340 itemsize 53 extent refs 1 gen 1023977 flags DATA extent data backref root 5 objectid 7496365 offset 20480 count 1 item 19 key (23416295948288 EXTENT_ITEM 36864) itemoff 15287 itemsize 53 extent refs 1 gen 516177 flags DATA extent data backref root 5 objectid 3897818 offset 12976128 count 1 item 20 key (23416295985152 EXTENT_ITEM 45056) itemoff 
15234 itemsize 53 extent refs 1 gen 444976 flags DATA extent data backref root 5 objectid 3591929 offset 12320768 count 1 item 21 key
Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"
On 04/03/2017 03:50 PM, Robert Krig wrote: > > > On 03.04.2017 12:11, Robert Krig wrote: >> Hi guys, I seem to have run into a spot of trouble with my btrfs partition. >> >> I've got 4 x 8TB in a RAID1 BTRFS configuration. >> >> I'm running Debian Jessie 64 Bit, 4.9.0-0.bpo.2-amd64 kernel. Btrfs >> progs version v4.7.3 >> >> Server has 8GB of Ram. >> >> >> I was running duperemove using a hashfile, which seemed to have run out >> space and aborted. Then I tried a balance operation, with -dusage >> progressively set to 0 1 5 15 30 50, which then aborted, I presume that >> this caused the fs to mount readonly. I only noticed it somewhat later. >> >> I've since rebooted, and I can mount the filesystem OK, but after some >> time (I presume caused by reads or writes) it once again switches to >> readonly. >> >> I tried unmounting/remounting again and running a scrub, but the scrub >> aborts after some time. >> >> > > > I've compiled the newest btrfs-tools version 4.10.2 > > This is what I get when running a btrfsck -p /dev/sda > > Checking filesystem on > /dev/sda > > > UUID: > 8c4f8e26-3442-463f-ad8a-668dfef02593 > > > incorrect offsets 8590 > 1258314415 > > > bad block > 38666170826752 > > > > > > ERROR: errors found in extent allocation tree or chunk > allocation > Speicherzugriffsfehler > > For the non-German speakers: Speicherzugriffsfehler = Memory Access Error > > Dmesg shows this: > > Apr 03 15:47:05 atlas kernel: btrfs[9140]: segfault at 9476b99e ip > 0044c459 sp 7fff556b4b10 error 4 in > btrfs[40+9d000] That's probably because the tool does not verify whether the numbers in the fields make sense before using them. -- Hans van Kranenburg
Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"
On 04/03/2017 12:11 PM, Robert Krig wrote: > Hi guys, I seem to have run into a spot of trouble with my btrfs partition. > > I've got 4 x 8TB in a RAID1 BTRFS configuration. > > I'm running Debian Jessie 64 Bit, 4.9.0-0.bpo.2-amd64 kernel. Btrfs > progs version v4.7.3 > > Server has 8GB of Ram. > > > I was running duperemove using a hashfile, which seemed to have run out > space and aborted. Then I tried a balance operation, with -dusage > progressively set to 0 1 5 15 30 50, which then aborted, I presume that > this caused the fs to mount readonly. I only noticed it somewhat later. The balance probably did not cause the issue, but it ran across the invalid metadata page while digging around in the filesystem, and then choked on it. > I've since rebooted, and I can mount the filesystem OK, but after some > time (I presume caused by reads or writes) it once again switches to > readonly. > > I tried unmounting/remounting again and running a scrub, but the scrub > aborts after some time. > > > Here is the output from the kernel when the partition crashes: > > Apr 03 11:32:57 atlas kernel: BTRFS info (device sda): The free space > cache file (37732863967232) is invalid. skip it > Apr 03 11:33:46 atlas kernel: BTRFS critical (device sda): corrupt leaf, > slot offset bad: block=38666170826752, root=1, slot=157 > [...] Note: The root=1 is a lie? Looking at the output of btrfs-debug-tree below, this is definitely a tree block of tree 2, not 1. I have seen this more often, but not looked at the code yet. Maybe some bug in assembling the error message?
> I tried running a btrfs-debug-tree -b 38666170826752 /dev/sda > > btrfs-progs > v4.7.3 > > > leaf 38666170826752 items 199 free space 1506 generation 1248226 owner > 2 > > > fs uuid > 8c4f8e26-3442-463f-ad8a-668dfef02593 > > > chunk uuid > 1f04f64e-0ec8-4b39-83d9-a2df75179d3e > > > item 0 key (23416295448576 EXTENT_ITEM 36864) itemoff 16230 > itemsize > 53 > > extent refs 1 gen 671397 flags > DATA > > > extent data backref root 5 objectid 4959957 offset 0 > count > 1 > > > [...] The corruption is at item 157. Can you attach all of the output, or pastebin it? > this goes on and on. I can provide the entire output if that's helpful. Yes. The corruption is in item 157, starting at the itemoff value. This is the offset of the item's data within the metadata page. See https://btrfs.wiki.kernel.org/index.php/On-disk_Format#Leaf_Node > Any ideas on what I could do to fix the partition? Is it fixable, or is > it a lost cause? Memory corruption, not on-disk corruption. So: either a bitflip, garbage that ended up in this memory location for whatever reason, or a bug in some part of the kernel (a pointer in another module gone wonky, etc.), which we might learn more about after seeing more of the output. -- Hans van Kranenburg
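The itemoff arithmetic referenced above can be sketched numerically (assuming the standard on-disk layout linked above: a 101-byte leaf header, an item table growing forward, and item data packed backward from the end of the node; the helper is illustrative, not btrfs-progs code):

```python
# itemoff is relative to the end of the 101-byte header, so the absolute
# position of an item's data inside the node is HEADER_SIZE + itemoff.

NODESIZE = 16384      # nodesize of this filesystem
HEADER_SIZE = 101     # on-disk size of struct btrfs_header

def item_data_range(itemoff, itemsize):
    """Absolute byte range [start, end) of an item's data in the node."""
    start = HEADER_SIZE + itemoff
    return start, start + itemsize

# item 0 from the dump: itemoff 16230, itemsize 53.
# Its data ends flush against the end of the 16 KiB node:
start, end = item_data_range(16230, 53)
assert end == NODESIZE  # 101 + 16230 + 53 == 16384
```

This is also why a corrupted itemoff is so destructive: it makes the kernel compute a data pointer far outside the node.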
Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"
On 03.04.2017 12:11, Robert Krig wrote: > Hi guys, I seem to have run into a spot of trouble with my btrfs partition. > > I've got 4 x 8TB in a RAID1 BTRFS configuration. > > I'm running Debian Jessie 64 Bit, 4.9.0-0.bpo.2-amd64 kernel. Btrfs > progs version v4.7.3 > > Server has 8GB of Ram. > > > I was running duperemove using a hashfile, which seemed to have run out > space and aborted. Then I tried a balance operation, with -dusage > progressively set to 0 1 5 15 30 50, which then aborted, I presume that > this caused the fs to mount readonly. I only noticed it somewhat later. > > I've since rebooted, and I can mount the filesystem OK, but after some > time (I presume caused by reads or writes) it once again switches to > readonly. > > I tried unmounting/remounting again and running a scrub, but the scrub > aborts after some time. > > I've compiled the newest btrfs-tools version 4.10.2 This is what I get when running a btrfsck -p /dev/sda Checking filesystem on /dev/sda UUID: 8c4f8e26-3442-463f-ad8a-668dfef02593 incorrect offsets 8590 1258314415 bad block 38666170826752 ERROR: errors found in extent allocation tree or chunk allocation Speicherzugriffsfehler For the non-German speakers: Speicherzugriffsfehler = Memory Access Error Dmesg shows this: Apr 03 15:47:05 atlas kernel: btrfs[9140]: segfault at 9476b99e ip 0044c459 sp 7fff556b4b10 error 4 in btrfs[40+9d000]
Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"
Hi guys, I seem to have run into a spot of trouble with my btrfs partition. I've got 4 x 8TB in a RAID1 BTRFS configuration. I'm running Debian Jessie 64 Bit, 4.9.0-0.bpo.2-amd64 kernel. Btrfs progs version v4.7.3 Server has 8GB of Ram. I was running duperemove using a hashfile, which seemed to have run out space and aborted. Then I tried a balance operation, with -dusage progressively set to 0 1 5 15 30 50, which then aborted, I presume that this caused the fs to mount readonly. I only noticed it somewhat later. I've since rebooted, and I can mount the filesystem OK, but after some time (I presume caused by reads or writes) it once again switches to readonly. I tried unmounting/remounting again and running a scrub, but the scrub aborts after some time. Here is the output from the kernel when the partition crashes: Apr 03 11:32:57 atlas kernel: BTRFS info (device sda): The free space cache file (37732863967232) is invalid. skip it Apr 03 11:33:46 atlas kernel: BTRFS critical (device sda): corrupt leaf, slot offset bad: block=38666170826752, root=1, slot=157 Apr 03 11:33:46 atlas kernel: [ cut here ] Apr 03 11:33:46 atlas kernel: WARNING: CPU: 0 PID: 17810 at /home/zumbi/linux-4.9.13/fs/btrfs/extent-tree.c:6961 __btrfs_free_extent.isra.69+0x152/0xd60 [b Apr 03 11:33:46 atlas kernel: BTRFS: Transaction aborted (error -5) Apr 03 11:33:46 atlas kernel: Modules linked in: xt_multiport iptable_filter ip_tables x_tables binfmt_misc cpufreq_userspace cpufreq_conservative cpufreq_ Apr 03 11:33:46 atlas kernel: ppdev lp parport autofs4 btrfs xor raid6_pq dm_mod md_mod fuse sg sd_mod ahci libahci libata crc32c_intel scsi_mod fan therm Apr 03 11:33:46 atlas kernel: CPU: 0 PID: 17810 Comm: mc Not tainted 4.9.0-0.bpo.2-amd64 #1 Debian 4.9.13-1~bpo8+1 Apr 03 11:33:46 atlas kernel: Hardware name: ASUS All Series/H87M-E, BIOS 0703 10/30/2013 Apr 03 11:33:46 atlas kernel: 97d29cd5 b8ab4bb53a50 Apr 03 11:33:46 atlas kernel: 97a778a4 154c080b2000 b8ab4bb53aa8 8908ad438b40 Apr 03 
11:33:46 atlas kernel: 890951b96000 89086c3d4000 97a7791f Apr 03 11:33:46 atlas kernel: Call Trace: Apr 03 11:33:46 atlas kernel: [] ? dump_stack+0x5c/0x77 Apr 03 11:33:46 atlas kernel: [] ? __warn+0xc4/0xe0 Apr 03 11:33:46 atlas kernel: [] ? warn_slowpath_fmt+0x5f/0x80 Apr 03 11:33:46 atlas kernel: [] ? __btrfs_free_extent.isra.69+0x152/0xd60 [btrfs] Apr 03 11:33:46 atlas kernel: [] ? __btrfs_run_delayed_refs+0x466/0x1360 [btrfs] Apr 03 11:33:46 atlas kernel: [] ? set_extent_buffer_dirty+0x64/0xb0 [btrfs] Apr 03 11:33:46 atlas kernel: [] ? btrfs_run_delayed_refs+0x8f/0x2b0 [btrfs] Apr 03 11:33:46 atlas kernel: [] ? btrfs_should_end_transaction+0x3f/0x60 [btrfs] Apr 03 11:33:46 atlas kernel: [] ? btrfs_truncate_inode_items+0x63a/0xde0 [btrfs] Apr 03 11:33:46 atlas kernel: [] ? btrfs_evict_inode+0x4a2/0x5f0 [btrfs] Apr 03 11:33:46 atlas kernel: [] ? evict+0xb6/0x180 Apr 03 11:33:46 atlas kernel: [] ? do_unlinkat+0x148/0x300 Apr 03 11:33:46 atlas kernel: [] ? system_call_fast_compare_end+0xc/0x9b Apr 03 11:33:46 atlas kernel: ---[ end trace 2a45c2819ff7b785 ]--- Apr 03 11:33:46 atlas kernel: BTRFS: error (device sda) in __btrfs_free_extent:6961: errno=-5 IO failure Apr 03 11:33:46 atlas kernel: BTRFS info (device sda): forced readonly Apr 03 11:33:46 atlas kernel: BTRFS: error (device sda) in btrfs_run_delayed_refs:2967: errno=-5 IO failure Apr 03 11:33:50 atlas kernel: BTRFS warning (device sda): failed setting block group ro, ret=-30 Apr 03 11:33:50 atlas kernel: BTRFS warning (device sda): failed setting block group ro, ret=-30 Apr 03 11:33:52 atlas kernel: BTRFS warning (device sda): failed setting block group ro, ret=-30 Apr 03 11:33:53 atlas kernel: BTRFS warning (device sda): Skipping commit of aborted transaction. 
Apr 03 11:33:53 atlas kernel: BTRFS: error (device sda) in cleanup_transaction:1850: errno=-5 IO failure Apr 03 11:33:53 atlas kernel: BTRFS info (device sda): delayed_refs has NO entry Apr 03 11:33:54 atlas kernel: BTRFS warning (device sda): failed setting block group ro, ret=-30 I tried running a btrfs-debug-tree -b 38666170826752 /dev/sda btrfs-progs v4.7.3 leaf 38666170826752 items 199 free space 1506 generation 1248226 owner 2 fs uuid 8c4f8e26-3442-463f-ad8a-668dfef02593 chunk uuid 1f04f64e-0ec8-4b39-83d9-a2df75179d3e item 0 key (23416295448576 EXTENT_ITEM 36864) itemoff 16230 itemsize 53
Re: help : "bad tree block start" -> btrfs forced readonly
Hi, some news from the coal mine... On 17/03/2017 at 11:03, Lionel Bouton wrote: > [...] > I'm considering trying to use a 4 week old snapshot of the device to > find out if it was corrupted or not instead. It will still be a pain if > it works but rsync for less than a month of data is at least an order of > magnitude faster than a full restore. btrfs check -p /dev/sdb is running on this 4 week old snapshot. The extents check passed without any error; it is currently checking the free space (and it's just done while I was writing this and is doing fs roots). I'm not sure of the list of checks it performs. I assume the free space^H... fs roots check can't take much longer than the rest (on a ~13TB of 20TB used filesystem with ~10 million files and half a dozen subvolumes). It took less than an hour to check extents. I'll give it another hour and stop it if it's not done: it's already passing stages that the live data couldn't get to. I may be wrong, but I suspect Ceph is innocent of any wrongdoing here: I think there's a high probability that if Ceph could corrupt its data in our configuration, the snapshot would have been corrupted too (most of its data is shared with the live data). I wonder if QEMU or the VM kernel managed to transform IO timeouts (which clearly happened below Ceph and were passed to the VM in many instances) into garbage reads which ended in garbage writes. If it isn't in QEMU and happened in the kernel, this was with 4.1.15, so it might be a since-corrected kernel bug in either the block or fs layers. I'm not especially ecstatic at the prospect of testing this behavior again, but I will automate more Ceph snapshots in the future (and the VM is now on 4.9.6). Best regards, Lionel
Re: help : "bad tree block start" -> btrfs forced readonly
On 17/03/2017 at 10:51, Roman Mamedov wrote: > On Fri, 17 Mar 2017 10:27:11 +0100 > Lionel Bouton <lionel-subscript...@bouton.name> wrote: > >> Hi, >> >> On 17/03/2017 at 09:43, Hans van Kranenburg wrote: >>> btrfs-debug-tree -b 3415463870464 >> Here is what it gives me back: >> >> btrfs-debug-tree -b 3415463870464 /dev/sdb >> btrfs-progs v4.6.1 >> checksum verify failed on 3415463870464 found A85405B7 wanted 01010101 >> checksum verify failed on 3415463870464 found A85405B7 wanted 01010101 >> bytenr mismatch, want=3415463870464, have=72340172838076673 >> ERROR: failed to read 3415463870464 >> >> Is there a way to remove part of the tree and keep the rest? It could >> help minimize the time needed to restore data. > If you are able to experiment with writable snapshots, you could try using > "btrfs-corrupt-block" to kill the bad block, and see what btrfsck makes out of > the rest. In a similar case I got little to no damage to the overall FS. > http://www.spinics.net/lists/linux-btrfs/msg53061.html > I've launched btrfs check in read-only mode: btrfs check -p /dev/sdb Checking filesystem on /dev/sdb UUID: dbbde1f0-d8a0-4c7c-a7b8-17237e98e525 checksum verify failed on 3415463755776 found A85405B7 wanted 01010101 checksum verify failed on 3415463755776 found A85405B7 wanted 01010101 bytenr mismatch, want=3415463755776, have=72340172838076673 checksum verify failed on 3415464001536 found A85405B7 wanted 01010101 checksum verify failed on 3415464001536 found A85405B7 wanted 01010101 bytenr mismatch, want=3415464001536, have=72340172838076673 checksum verify failed on 3415464640512 found A85405B7 wanted 01010101 checksum verify failed on 3415464640512 found A85405B7 wanted 01010101 bytenr mismatch, want=3415464640512, have=72340172838076673 This goes on for pages...
I probably missed some output, and then there are lots of errors like this one: ref mismatch on [3415470456832 16384] extent item 1, found 0 Backref 3415470456832 root 3420 not referenced back 0x268013d0 Incorrect global backref count on 3415470456832 found 1 wanted 0 backpointer mismatch on [3415470456832 16384] owner ref check failed [3415470456832 16384] ... Followed by lots of this: ref mismatch on [11010388205568 278528] extent item 1, found 0 checksum verify failed on 3415464869888 found A85405B7 wanted 01010101 checksum verify failed on 3415464869888 found A85405B7 wanted 01010101 bytenr mismatch, want=3415464869888, have=72340172838076673 Incorrect local backref count on 11010388205568 root 257 owner 7487206 offset 0 found 0 wanted 1 back 0x72335670 Backref disk bytenr does not match extent record, bytenr=11010388205568, ref bytenr=0 backpointer mismatch on [11010388205568 278528] owner ref check failed [11010388205568 278528] ... I stopped there: am I correct in thinking that it will take ages to try to salvage this, without any guarantee that I'll get a substantial amount of the 10 million files on this filesystem? I'm considering trying to use a 4 week old snapshot of the device to find out if it was corrupted or not instead. It will still be a pain if it works, but rsync for less than a month of data is at least an order of magnitude faster than a full restore. Lionel
Re: help : "bad tree block start" -> btrfs forced readonly
On Fri, 17 Mar 2017 10:27:11 +0100 Lionel Bouton <lionel-subscript...@bouton.name> wrote: > Hi, > > On 17/03/2017 at 09:43, Hans van Kranenburg wrote: > > btrfs-debug-tree -b 3415463870464 > > Here is what it gives me back: > > btrfs-debug-tree -b 3415463870464 /dev/sdb > btrfs-progs v4.6.1 > checksum verify failed on 3415463870464 found A85405B7 wanted 01010101 > checksum verify failed on 3415463870464 found A85405B7 wanted 01010101 > bytenr mismatch, want=3415463870464, have=72340172838076673 > ERROR: failed to read 3415463870464 > > Is there a way to remove part of the tree and keep the rest? It could > help minimize the time needed to restore data. If you are able to experiment with writable snapshots, you could try using "btrfs-corrupt-block" to kill the bad block, and see what btrfsck makes out of the rest. In a similar case I got little to no damage to the overall FS. http://www.spinics.net/lists/linux-btrfs/msg53061.html -- With respect, Roman
Re: help : "bad tree block start" -> btrfs forced readonly
On 03/17/2017 10:27 AM, Lionel Bouton wrote: > Hi, > > On 17/03/2017 at 09:43, Hans van Kranenburg wrote: >> btrfs-debug-tree -b 3415463870464 > > Here is what it gives me back: > > btrfs-debug-tree -b 3415463870464 /dev/sdb > btrfs-progs v4.6.1 > checksum verify failed on 3415463870464 found A85405B7 wanted 01010101 > checksum verify failed on 3415463870464 found A85405B7 wanted 01010101 > bytenr mismatch, want=3415463870464, have=72340172838076673 > ERROR: failed to read 3415463870464 So in the place where the checksum is supposed to be stored, it has 01010101, and recomputing the checksum of the garbage results in A85405B7. "Found"/"wanted" is also confusing here, since 01010101 is what it found, but A85405B7 is what it 'found out'. > Is there a way to remove part of the tree and keep the rest? It could > help minimize the time needed to restore data. No, that's not how it works. Those trees are not file/directory structure trees. You can try btrfs-debug-tree and see how far it gets dumping everything it can find, and then search for 3415463870464 in the output. Somewhere, there has to be another object (one level higher) which points to this address. If you find it, you can find out in which tree the block lives. -- Hans van Kranenburg
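For reference, btrfs with csum_type 0 (as in the superblock dump in this thread) uses CRC-32C (the Castagnoli polynomial) for these metadata checksums. A minimal table-driven implementation, for illustration only (real code uses an optimized or hardware-accelerated crc32c, and btrfs checksums the block contents after the csum field):

```python
# Pure-Python CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
# This is the checksum family btrfs stores in the first bytes of each
# metadata block header (csum_type 0, csum_size 4).

def _make_table():
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0x82F63B78 if c & 1 else c >> 1
        table.append(c)
    return table

_TABLE = _make_table()

def crc32c(data, crc=0):
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

# Standard CRC-32C check value:
assert crc32c(b"123456789") == 0xE3069283
```

So "found A85405B7 wanted 01010101" means: crc32c over the block's contents came out as A85405B7, while the four bytes sitting in the csum field were 01 01 01 01 — consistent with the whole block having been overwritten with 0x01 bytes.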
Re: help : "bad tree block start" -> btrfs forced readonly
Hi, On 17/03/2017 at 09:43, Hans van Kranenburg wrote: > btrfs-debug-tree -b 3415463870464 Here is what it gives me back: btrfs-debug-tree -b 3415463870464 /dev/sdb btrfs-progs v4.6.1 checksum verify failed on 3415463870464 found A85405B7 wanted 01010101 checksum verify failed on 3415463870464 found A85405B7 wanted 01010101 bytenr mismatch, want=3415463870464, have=72340172838076673 ERROR: failed to read 3415463870464 Is there a way to remove part of the tree and keep the rest? It could help minimize the time needed to restore data. Lionel
Re: help : "bad tree block start" -> btrfs forced readonly
On 03/17/2017 09:11 AM, Lionel Bouton wrote: > On 17/03/2017 at 05:32, Lionel Bouton wrote: >> Hi, >> >> [...] >> I'll catch some sleep right now (it's 5:28 AM here) but I'll be able to >> work on this in 3 or 4 hours. > > I woke up to this: > > Mar 17 06:56:30 fileserver kernel: btree_readpage_end_io_hook: 104476 > callbacks suppressed > Mar 17 06:56:30 fileserver kernel: BTRFS error (device sdb): bad tree > block start 72340172838076673 3415463870464 > Mar 17 06:56:30 fileserver kernel: BTRFS error (device sdb): bad tree > block start 72340172838076673 3415463870464 > Mar 17 06:56:30 fileserver kernel: BTRFS error (device sdb): bad tree > block start 72340172838076673 3415463870464 > Mar 17 06:56:30 fileserver kernel: BTRFS error (device sdb): bad tree > block start 72340172838076673 3415463870464 > Mar 17 06:56:30 fileserver kernel: BTRFS error (device sdb): bad tree > block start 72340172838076673 3415463870464 > Mar 17 06:56:30 fileserver kernel: BTRFS error (device sdb): bad tree > block start 72340172838076673 3415463870464 > Mar 17 06:56:30 fileserver kernel: BTRFS error (device sdb): bad tree > block start 72340172838076673 3415463870464 > Mar 17 06:56:30 fileserver kernel: BTRFS error (device sdb): bad tree > block start 72340172838076673 3415463870464 > Mar 17 06:56:30 fileserver kernel: BTRFS error (device sdb): bad tree > block start 72340172838076673 3415463870464 > Mar 17 06:56:30 fileserver kernel: BTRFS error (device sdb): bad tree > block start 72340172838076673 3415463870464 The error is about a page of metadata (tree block) that is damaged or has been lost. Your btrfs is reading the metadata page at location 3415463870464 (virtual address space). Inside the page, the address is stored again as a method of verification.
The error means that it expected to see metadata items that live in a block at position 3415463870464 in your filesystem virtual address space, but instead it encountered some data, from which the bytes in the location where that address should be translate back to 72340172838076673. I needed to look at the kernel source code to figure this out; the error message is not very descriptive. found_start = btrfs_header_bytenr(eb); if (found_start != eb->start) { btrfs_err_rl(fs_info, "bad tree block start %llu %llu", found_start, eb->start); ret = -EIO; goto err; } > and the server was unusable. The impact depends heavily on what part of the metadata it is, which tree it's from, how much tree is hidden behind it, etc. You can try btrfs-debug-tree -b 3415463870464 to see if it outputs any readable information. If this was a metadata page, it would have at least a corrupted bytenr field; otherwise it's likely not something in the btrfs metadata format. > I just moved the client to a read-only backup server and we are trying > to find out if we can salvage this or if we start the full restore > procedure. -- Hans van Kranenburg
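The "have" value in these messages is itself telling: interpreted as the little-endian u64 bytenr field of the header, 72340172838076673 decodes to eight 0x01 bytes, i.e. the block read from disk is filled with the 0x01 pattern rather than metadata (matching the "wanted 01010101" checksum output elsewhere in the thread):

```python
# Decode the bogus bytenr reported by "bad tree block start".
# btrfs on-disk integers are little-endian; the bytenr is a u64.
import struct

have = 72340172838076673
assert have == 0x0101010101010101
assert struct.pack("<Q", have) == b"\x01" * 8  # eight 0x01 bytes
```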
Re: help : "bad tree block start" -> btrfs forced readonly
Le 17/03/2017 à 05:32, Lionel Bouton a écrit :
> Hi,
>
> [...]
> I'll catch some sleep right now (it's 5:28 AM here) but I'll be able to
> work on this in 3 or 4 hours.

I woke up to this :

Mar 17 06:56:30 fileserver kernel: btree_readpage_end_io_hook: 104476 callbacks suppressed
Mar 17 06:56:30 fileserver kernel: BTRFS error (device sdb): bad tree block start 72340172838076673 3415463870464
[the same "bad tree block start" line repeated 10 times in total]

and the server was unusable. I just moved the client to a read-only backup server and we are trying to find out if we can salvage this or if we start the full restore procedure.

Help ?

Lionel
help : "bad tree block start" -> btrfs forced readonly
fileserver kernel: [] ? kthread+0xbc/0xe0
Mar 16 23:30:20 fileserver kernel: [] ? kthread_create_on_node+0x180/0x180
Mar 16 23:30:20 fileserver kernel: [] ? ret_from_fork+0x42/0x70
Mar 16 23:30:20 fileserver kernel: [] ? kthread_create_on_node+0x180/0x180
Mar 16 23:30:20 fileserver kernel: ---[ end trace f03445c45d440372 ]---
Mar 16 23:30:20 fileserver kernel: BTRFS: error (device sdb) in __btrfs_run_delayed_items:1188: errno=-5 IO failure
Mar 16 23:30:20 fileserver kernel: BTRFS info (device sdb): forced readonly
Mar 16 23:30:20 fileserver kernel: BTRFS warning (device sdb): Skipping commit of aborted transaction.
Mar 16 23:30:20 fileserver kernel: BTRFS: error (device sdb) in cleanup_transaction:1692: errno=-5 IO failure
Mar 16 23:30:22 fileserver kernel: BTRFS (device sdb): bad tree block start 72340172838076673 3415463870464
Mar 16 23:30:22 fileserver kernel: BTRFS (device sdb): bad tree block start 72340172838076673 3415463870464

I removed the failing disk from the cluster and rebooted the server. The filesystem mounted fine, but some time later I got these:

Mar 17 03:49:48 fileserver kernel: BTRFS (device sdb): bad tree block start 72340172838076673 3415464230912
Mar 17 03:49:48 fileserver kernel: BTRFS (device sdb): bad tree block start 72340172838076673 3415464230912
Mar 17 03:49:48 fileserver kernel: BTRFS (device sdb): bad tree block start 72340172838076673 3415464230912

The filesystem didn't remount readonly this time, but I installed a new kernel (4.9.6 with the -r1 Gentoo patchset instead of 4.1.15-r1) and rebooted again.

I have a snapshot of the full device at the time of each reboot (I can relatively easily make rw copies and work on them without affecting the ro snapshots) and an earlier one from 4 weeks ago, if that helps.

Can someone please help me determine whether I can save this filesystem, and how? I suspect there isn't much damage in quantity (there were only a handful of damaged sectors before the disk was removed).
I'm just not sure how I can check whether the internal BTRFS structures are still sound and won't create a snowball effect destroying much more.

The filesystem is still used in this state in production, and I'm trying to avoid a painful switch to a remote, slow snapshot from yesterday while beginning a very long recovery from scratch (this is at least a two-week procedure, maybe more).

I'll catch some sleep right now (it's 5:28 AM here) but I'll be able to work on this in 3 or 4 hours.

Best regards,

Lionel
Re: help!!! error when mount a btrfs file system
At 03/17/2017 01:36 AM, Liu Bo wrote:
> On Thu, Mar 16, 2017 at 08:23:05PM +0800, 李云甫 wrote:
>> hi, buddy
>>
>> I have a file server with btrfs file system, it's work well for several months.
>> but after last system reboot, the /dev/sdb become not mountable.
>> below is the details. is there any advise?
>>
>> ##Version info
>> Fedora 25 Server
>> Kernel 4.9.13-201.fc25.x86_64
>> btrfs-progs v4.6.1
>>
>> #error messages when mount
>> mount: wrong fs type, bad option, bad superblock on /dev/sdb,
>> missing codepage or helper program, or other error
>>
>> In some cases useful info is found in syslog - try
>> dmesg | tail or so.
>>
>> ##dmesg |tail
>> [79570.756871] BTRFS error (device sdb): parent transid verify failed on 21413888 wanted 755660 found 623605
>> [79570.762307] BTRFS error (device sdb): bad tree block start 0 21413888
>> [79570.762345] BTRFS error (device sdb): failed to read chunk tree: -5
>> [79570.778129] BTRFS error (device sdb): open_ctree failed
>> [79589.743772] BTRFS error (device sdb): support for check_integrity* not compiled in!
>> [79589.803176] BTRFS error (device sdb): open_ctree failed
>
> Looks like one node of the chunk tree was zero'd by something. Did you
> use -o discard, or run fstrim, before the reboot?
>
> Thanks,
> -liubo

>> ##btrfsck
>> parent transid verify failed on 21413888 wanted 755660 found 623605
>> parent transid verify failed on 21413888 wanted 755660 found 623605
>> checksum verify failed on 21413888 found E4E3BDB6 wanted
>> parent transid verify failed on 21413888 wanted 755660 found 623605
>> Ignoring transid failure
>> checksum verify failed on 21331968 found E4E3BDB6 wanted
>> checksum verify failed on 21331968 found E4E3BDB6 wanted
>> checksum verify failed on 21692416 found E4E3BDB6 wanted
>> checksum verify failed on 21692416 found E4E3BDB6 wanted
>> checksum verify failed on 22888448 found E4E3BDB6 wanted
>> checksum verify failed on 22888448 found E4E3BDB6 wanted
>> checksum verify failed on 22888448 found E4E3BDB6 wanted
>> checksum verify failed on 22888448 found E4E3BDB6 wanted

I'm afraid not only one, but two chunk tree blocks are zeroed.
But you still have a small chance to recover the chunk tree by using the backup chunk roots. Would you please paste the output of "btrfs-show-super -f /dev/sdb"?

Thanks,
Qu

>> bytenr mismatch, want=22888448, have=0
>> Couldn't read chunk tree
>> Couldn't open file system
>>
>> ##btrfs-find-root
>> parent transid verify failed on 21413888 wanted 755660 found 623605
>> parent transid verify failed on 21413888 wanted 755660 found 623605
>> parent transid verify failed on 21413888 wanted 755660 found 623605
>> Ignoring transid failure
>> Couldn't read chunk tree
>> ERROR: open ctree failed
>>
>> ##btrfs-show-super -a /dev/sdb
>> superblock: bytenr=65536, device=/dev/sdb
>> csum 0xb6f3ccb1 [match] bytenr 65536 flags 0x1 ( WRITTEN )
>> magic _BHRfS_M [match] fsid 7f5aecd6-40fe-45ce-90c3-e86bacc4bf6f
>> label samba_fs generation 770740 root 16187774615552 sys_array_size 355
>> chunk_root_generation 755799 root_level 1 chunk_root 24331161698304
>> chunk_root_level 1 log_root 0 log_root_transid 0 log_root_level 0
>> total_bytes 2396231680 bytes_used 22205028102144 sectorsize 4096
>> nodesize 16384 leafsize 16384 stripesize 4096 root_dir 6 num_devices 1
>> compat_flags 0x0 compat_ro_flags 0x0 incompat_flags 0x169
>> ( MIXED_BACKREF | COMPRESS_LZO | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA )
>> csum_type 0 csum_size 4 cache_generation 770740 uuid_tree_generation 770740
>> dev_item.uuid dd8c8d66-b6f5-48d8-9d5e-6f56b2ad4751
>> dev_item.fsid 7f5aecd6-40fe-45ce-90c3-e86bacc4bf6f [match]
>> dev_item.type 0 dev_item.total_bytes 2396231680 dev_item.bytes_used 23274943676416
>> dev_item.io_align 4096 dev_item.io_width 4096 dev_item.sector_size 4096
>> dev_item.devid 1 dev_item.dev_group 0 dev_item.seek_speed 0
>> dev_item.bandwidth 0 dev_item.generation 0
>>
>> superblock: bytenr=67108864, device=/dev/sdb
>> csum 0x1692e47f [match] bytenr 67108864 flags 0x1 ( WRITTEN )
>> magic _BHRfS_M [match] fsid 7f5aecd6-40fe-45ce-90c3-e86bacc4bf6f
>> label samba_fs generation 770740 root 16187774615552 sys_array_size 355
>> chunk_root_generation 755799 root_level 1 chunk_root 24331161698304
>> chunk_root_level 1 log_root 0 log_root_transid 0 log_root_level
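Qu asks for the superblock dump because btrfs keeps mirror copies of the superblock at fixed device offsets (64 KiB, 64 MiB, and 256 GiB), each carrying its own generation counter. A rough sketch of the idea (Python; field offsets follow the on-disk format as I understand it, with the magic at byte 64 and generation/root/chunk_root as the little-endian u64s right after it; real tools also verify the checksum, which this sketch skips):

```python
import struct

# Superblock mirror locations; a device only carries the mirrors that
# fit inside it, which is why the dump above shows three superblocks.
SUPERBLOCK_OFFSETS = [64 * 1024, 64 * 1024**2, 256 * 1024**3]
MAGIC = b"_BHRfS_M"

def parse_super(buf: bytes) -> dict:
    """Pull a few fields out of a raw superblock buffer.
    Offsets: magic at byte 64, then generation, root, chunk_root
    as consecutive little-endian u64s."""
    magic = buf[64:72]
    generation, root, chunk_root = struct.unpack_from("<QQQ", buf, 72)
    return {"valid": magic == MAGIC, "generation": generation,
            "root": root, "chunk_root": chunk_root}

def best_mirror(read_at):
    """read_at(offset) -> raw buffer or None. Returns the parsed mirror
    with the highest generation, roughly what mount prefers."""
    supers = []
    for off in SUPERBLOCK_OFFSETS:
        buf = read_at(off)
        if buf:
            sb = parse_super(buf)
            if sb["valid"]:
                supers.append(sb)
    return max(supers, key=lambda s: s["generation"], default=None)
```

In this thread all three mirrors agree (generation 770740, chunk_root 24331161698304), so the superblocks themselves are fine; it is the chunk tree blocks they point at that were zeroed, which is why Qu turns to backup chunk roots instead.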
Re: help!!! error when mount a btrfs file system
On Thu, Mar 16, 2017 at 08:23:05PM +0800, 李云甫 wrote:
> hi, buddy
>
> I have a file server with btrfs file system, it's work well for several
> months.
>
> but after last system reboot, the /dev/sdb become not mountable.
>
> below is the details. is there any advise?
>
> ##Version info
> Fedora 25 Server
> Kernel 4.9.13-201.fc25.x86_64
> btrfs-progs v4.6.1
>
> #error messages when mount
> mount: wrong fs type, bad option, bad superblock on /dev/sdb,
> missing codepage or helper program, or other error
>
> In some cases useful info is found in syslog - try
> dmesg | tail or so.
>
> ##dmesg |tail
> [79570.756871] BTRFS error (device sdb): parent transid verify failed on
> 21413888 wanted 755660 found 623605
> [79570.762307] BTRFS error (device sdb): bad tree block start 0 21413888
> [79570.762345] BTRFS error (device sdb): failed to read chunk tree: -5
> [79570.778129] BTRFS error (device sdb): open_ctree failed
> [79589.743772] BTRFS error (device sdb): support for check_integrity* not
> compiled in!
> [79589.803176] BTRFS error (device sdb): open_ctree failed

Looks like one node of the chunk tree was zero'd by something. Did you use -o discard, or run fstrim, before the reboot?
Thanks, -liubo > ##btrfsck > parent transid verify failed on 21413888 wanted 755660 found 623605 > parent transid verify failed on 21413888 wanted 755660 found 623605 > checksum verify failed on 21413888 found E4E3BDB6 wanted > parent transid verify failed on 21413888 wanted 755660 found 623605 > Ignoring transid failure > checksum verify failed on 21331968 found E4E3BDB6 wanted > checksum verify failed on 21331968 found E4E3BDB6 wanted > checksum verify failed on 21692416 found E4E3BDB6 wanted > checksum verify failed on 21692416 found E4E3BDB6 wanted > checksum verify failed on 22888448 found E4E3BDB6 wanted > checksum verify failed on 22888448 found E4E3BDB6 wanted > checksum verify failed on 22888448 found E4E3BDB6 wanted > checksum verify failed on 22888448 found E4E3BDB6 wanted > bytenr mismatch, want=22888448, have=0 > Couldn't read chunk tree > Couldn't open file system > > ##btrfs-find-root > parent transid verify failed on 21413888 wanted 755660 found 623605 > parent transid verify failed on 21413888 wanted 755660 found 623605 > parent transid verify failed on 21413888 wanted 755660 found 623605 > Ignoring transid failure > Couldn't read chunk tree > ERROR: open ctree failed > > ##btrfs-show-super -a /dev/sdb > superblock: bytenr=65536, device=/dev/sdb > - > csum 0xb6f3ccb1 [match] > bytenr65536 > flags 0x1 > ( WRITTEN ) > magic _BHRfS_M [match] > fsid 7f5aecd6-40fe-45ce-90c3-e86bacc4bf6f > label samba_fs > generation770740 > root 16187774615552 > sys_array_size355 > chunk_root_generation 755799 > root_level1 > chunk_root24331161698304 > chunk_root_level 1 > log_root 0 > log_root_transid 0 > log_root_level0 > total_bytes 2396231680 > bytes_used22205028102144 > sectorsize4096 > nodesize 16384 > leafsize 16384 > stripesize4096 > root_dir 6 > num_devices 1 > compat_flags 0x0 > compat_ro_flags 0x0 > incompat_flags0x169 > ( MIXED_BACKREF | > COMPRESS_LZO | > BIG_METADATA | > EXTENDED_IREF | > SKINNY_METADATA ) > csum_type 0 > csum_size 4 > cache_generation 
770740 > uuid_tree_generation 770740 > dev_item.uuid dd8c8d66-b6f5-48d8-9d5e-6f56b2ad4751 > dev_item.fsid 7f5aecd6-40fe-45ce-90c3-e86bacc4bf6f [match] > dev_item.type 0 > dev_item.total_bytes 2396231680 > dev_item.bytes_used 23274943676416 > dev_item.io_align 4096 > dev_item.io_width 4096 > dev_item.sector_size 4096 > dev_item.devid1 > dev_item.dev_group0 > dev_item.seek_speed 0 > dev_item.bandwidth0 > dev_item.generation 0 > > superblock: bytenr=67108864, device=/dev/sdb > - > csum 0x1692e47f [match] > bytenr67108864 > flags 0x1 > ( WRITTEN ) > magic _BHRfS_M [match] > fsid 7f5aecd6-40fe-45ce-90c3-e86bacc4bf6f > label samba_fs > generation770740 > root 16187774615552 > sys_array_size355 > chunk_root_generation 755799 > root_level1 > chunk_root24331161698304 > chunk_root_level 1 > log_root 0 > log_root_transid 0 > log_root_level0 > total_bytes 2396231680 > bytes_used
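Liu Bo's question about discard can be answered mechanically by checking the mount options of the device. A small illustrative sketch (Python; the helper name and sample line are my own, not from the thread) that scans /proc/mounts-style entries:

```python
def mounted_with_discard(mounts_text: str, device: str) -> bool:
    """Scan /proc/mounts-style lines and report whether `device` is
    mounted with the `discard` option. Fields per line are:
    device, mountpoint, fstype, comma-separated options, dump, pass."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == device:
            if "discard" in fields[3].split(","):
                return True
    return False

# Example against a fabricated mounts entry (not from the thread):
sample = "/dev/sdb /srv/samba btrfs rw,relatime,compress=lzo,discard,space_cache 0 0"
print(mounted_with_discard(sample, "/dev/sdb"))  # True
```

In practice you would pass it the contents of /proc/mounts (or just run `findmnt -o OPTIONS /dev/sdb`); periodic trims via fstrim or a systemd fstrim.timer would not show up here and have to be checked separately.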
help!!! error when mount a btrfs file system
hi, buddy

I have a file server with a btrfs file system; it has worked well for several months, but after the last system reboot /dev/sdb became unmountable. Below are the details. Is there any advice?

##Version info
Fedora 25 Server
Kernel 4.9.13-201.fc25.x86_64
btrfs-progs v4.6.1

#error messages when mount
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
missing codepage or helper program, or other error

In some cases useful info is found in syslog - try dmesg | tail or so.

##dmesg |tail
[79570.756871] BTRFS error (device sdb): parent transid verify failed on 21413888 wanted 755660 found 623605
[79570.762307] BTRFS error (device sdb): bad tree block start 0 21413888
[79570.762345] BTRFS error (device sdb): failed to read chunk tree: -5
[79570.778129] BTRFS error (device sdb): open_ctree failed
[79589.743772] BTRFS error (device sdb): support for check_integrity* not compiled in!
[79589.803176] BTRFS error (device sdb): open_ctree failed

##btrfsck
parent transid verify failed on 21413888 wanted 755660 found 623605
parent transid verify failed on 21413888 wanted 755660 found 623605
checksum verify failed on 21413888 found E4E3BDB6 wanted
parent transid verify failed on 21413888 wanted 755660 found 623605
Ignoring transid failure
checksum verify failed on 21331968 found E4E3BDB6 wanted
checksum verify failed on 21331968 found E4E3BDB6 wanted
checksum verify failed on 21692416 found E4E3BDB6 wanted
checksum verify failed on 21692416 found E4E3BDB6 wanted
checksum verify failed on 22888448 found E4E3BDB6 wanted
checksum verify failed on 22888448 found E4E3BDB6 wanted
checksum verify failed on 22888448 found E4E3BDB6 wanted
checksum verify failed on 22888448 found E4E3BDB6 wanted
bytenr mismatch, want=22888448, have=0
Couldn't read chunk tree
Couldn't open file system

##btrfs-find-root
parent transid verify failed on 21413888 wanted 755660 found 623605
parent transid verify failed on 21413888 wanted 755660 found 623605
parent transid verify failed on 21413888
wanted 755660 found 623605 Ignoring transid failure Couldn't read chunk tree ERROR: open ctree failed ##btrfs-show-super -a /dev/sdb superblock: bytenr=65536, device=/dev/sdb - csum 0xb6f3ccb1 [match] bytenr 65536 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] fsid 7f5aecd6-40fe-45ce-90c3-e86bacc4bf6f label samba_fs generation 770740 root 16187774615552 sys_array_size 355 chunk_root_generation 755799 root_level 1 chunk_root 24331161698304 chunk_root_level 1 log_root 0 log_root_transid 0 log_root_level 0 total_bytes 2396231680 bytes_used 22205028102144 sectorsize 4096 nodesize 16384 leafsize 16384 stripesize 4096 root_dir 6 num_devices 1 compat_flags 0x0 compat_ro_flags 0x0 incompat_flags 0x169 ( MIXED_BACKREF | COMPRESS_LZO | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA ) csum_type 0 csum_size 4 cache_generation 770740 uuid_tree_generation 770740 dev_item.uuid dd8c8d66-b6f5-48d8-9d5e-6f56b2ad4751 dev_item.fsid 7f5aecd6-40fe-45ce-90c3-e86bacc4bf6f [match] dev_item.type 0 dev_item.total_bytes 2396231680 dev_item.bytes_used 23274943676416 dev_item.io_align 4096 dev_item.io_width 4096 dev_item.sector_size 4096 dev_item.devid 1 dev_item.dev_group 0 dev_item.seek_speed 0 dev_item.bandwidth 0 dev_item.generation 0 superblock: bytenr=67108864, device=/dev/sdb - csum 0x1692e47f [match] bytenr 67108864 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] fsid 7f5aecd6-40fe-45ce-90c3-e86bacc4bf6f label samba_fs generation 770740 root 16187774615552 sys_array_size 355 chunk_root_generation 755799 root_level 1 chunk_root 24331161698304 chunk_root_level 1 log_root 0 log_root_transid 0 log_root_level 0 total_bytes 2396231680 bytes_used 22205028102144 sectorsize 4096 nodesize 16384 leafsize 16384 stripesize 4096 root_dir 6 num_devices 1 compat_flags 0x0 compat_ro_flags 0x0 incompat_flags 0x169 ( MIXED_BACKREF | COMPRESS_LZO | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA ) csum_type 0 csum_size 4 cache_generation 770740 uuid_tree_generation 770740 dev_item.uuid 
dd8c8d66-b6f5-48d8-9d5e-6f56b2ad4751 dev_item.fsid 7f5aecd6-40fe-45ce-90c3-e86bacc4bf6f [match] dev_item.type 0 dev_item.total_bytes 2396231680 dev_item.bytes_used 23274943676416 dev_item.io_align 4096 dev_item.io_width 4096 dev_item.sector_size 4096 dev_item.devid 1 dev_item.dev_group 0 dev_item.seek_speed 0 dev_item.bandwidth 0 dev_item.generation 0 superblock: bytenr=274877906944, device=/dev/sdb - csum 0xeb15b24e [match] bytenr 274877906944 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] fsid 7f5aecd6-40fe-45ce-90c3-e86bacc4bf6f label samba_fs generation 770740 root 16187774615552 sys_array_size 355 chunk_root_generation 755799 root_level 1 chunk_root 24331161698304 chunk_root_level 1 log_root 0
Re: Help understanding autodefrag details
On 2017-02-10 09:21, Peter Zaitsev wrote:
> Hi,
>
> As I have been reading the btrfs whitepaper it speaks about autodefrag
> in very generic terms - once a random write in a file is detected, it is
> put in the queue to be defragmented. Yet I could not find any specifics
> about this process described anywhere.
>
> My use case is databases and as such large files (100GB+), so my
> questions are:
>
> - Is my understanding correct that the defrag queue is based on files,
>   not on the parts of files which got fragmented?

Autodefrag is location based within the file, not for the whole file. I forget the exact size of the area around the write it will try to defrag, and the maximum size the write can be to trigger it, but the selection amounts to the following:

1. Is this write not likely to be followed by a write to the next logical address in the file? (I'm not certain exactly what heuristic is used to determine this.)
2. Is this write small enough to likely cause fragmentation? (This one is a simple threshold test, but I forget the threshold.)
3. If both 1 and 2 are true, schedule the area containing the write to be defragmented.

> - Is a single random write enough to schedule a file for defrag, or is
>   there some more elaborate math to consider a file fragmented and
>   needing optimization?

I'm not sure. It depends on whether or not the random write detection heuristic that is used has some handling for the first few writes, or needs some data from their position to determine the 'randomness' of future writes.

> - Is this queue FIFO, or is it a priority queue where files in more need
>   of defragmentation jump in front (or is there some other mechanic)?

I think it's a FIFO queue, but there may be multiple threads servicing it, and I think it's smart enough to merge areas that overlap into a single operation.

> - Will the file be defragmented completely, or does defrag focus on the
>   most fragmented areas of the file first?

AFAIK, autodefrag only defrags the region around where the write happened.
> - Is there any way to view this defrag queue?

Not that I know of, but in most cases it should be mostly empty, since the areas being handled are usually small enough that items get processed pretty quickly.

> - How are resources allocated to background autodefrag vs. resources
>   serving foreground user load controlled?

AFAIK, there is no way to manually control this. It would be kind of nice though if autodefrag ran as its own thread.

> - What are the space requirements for defrag? Is space for a complete
>   file copy required, or not?

Pretty minimal space requirements. Even regular defrag technically doesn't need enough space for the whole file. Both work with whatever amount of space they have, but you obviously get better results with more free space.

> - Can defrag handle a file which is being constantly written to, or is
>   it based on the concept that a file should be idle for some time
>   before it is defragmented?

In my experience, it handles files seeing constant writes just fine, even if you're saturating the disk bandwidth (it will just reduce your effective bandwidth a small amount).
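The selection steps and FIFO-with-merging behaviour described above can be sketched as follows (Python; this is an illustration of the described heuristic only, not kernel code - the thresholds are placeholder values I made up, and the real kernel logic in fs/btrfs differs in detail):

```python
from collections import deque

# Hypothetical thresholds; the real kernel values differ.
SMALL_WRITE_MAX = 64 * 1024   # "small enough to likely cause fragmentation"
DEFRAG_RANGE = 256 * 1024     # region around the write to defragment

class AutodefragQueue:
    """FIFO of (inode, start, length) ranges, merging overlapping areas."""
    def __init__(self):
        self.queue = deque()
        self.last_write_end = {}   # inode -> end offset of previous write

    def record_write(self, inode, offset, length):
        # Step 1: is this write a sequential continuation of the last one?
        sequential = self.last_write_end.get(inode) == offset
        # Step 2: is it small enough to likely cause fragmentation?
        small = length <= SMALL_WRITE_MAX
        self.last_write_end[inode] = offset + length
        # Step 3: only small, non-sequential ("random") writes are queued.
        if small and not sequential:
            self._schedule(inode, max(0, offset - DEFRAG_RANGE // 2),
                           length + DEFRAG_RANGE)

    def _schedule(self, inode, start, length):
        # Merge with an already-queued overlapping range for this inode.
        for i, (ino, s, l) in enumerate(self.queue):
            if ino == inode and s <= start + length and start <= s + l:
                ns = min(s, start)
                self.queue[i] = (ino, ns, max(s + l, start + length) - ns)
                return
        self.queue.append((inode, start, length))
```

Note how this matches the answers above: only the region around a small random write is queued (never the whole 100GB+ file), a sequential follow-up write is not queued, and overlapping queued areas collapse into one operation.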
Help understanding autodefrag details
Hi,

As I have been reading the btrfs whitepaper it speaks about autodefrag in very generic terms - once a random write in a file is detected, it is put in the queue to be defragmented. Yet I could not find any specifics about this process described anywhere.

My use case is databases and as such large files (100GB+), so my questions are:

- Is my understanding correct that the defrag queue is based on files, not on the parts of files which got fragmented?
- Is a single random write enough to schedule a file for defrag, or is there some more elaborate math to consider a file fragmented and needing optimization?
- Is this queue FIFO, or is it a priority queue where files in more need of defragmentation jump in front (or is there some other mechanic)?
- Will the file be defragmented completely, or does defrag focus on the most fragmented areas of the file first?
- Is there any way to view this defrag queue?
- How are resources allocated to background autodefrag vs. resources serving foreground user load controlled?
- What are the space requirements for defrag? Is space for a complete file copy required, or not?
- Can defrag handle a file which is being constantly written to, or is it based on the concept that a file should be idle for some time before it is defragmented?

Let me know if you have any information on these.

-- 
Peter Zaitsev, CEO, Percona
Tel: +1 888 401 3401 ext 7360 Skype: peter_zaitsev
Re: File system is oddly full after kernel upgrade, balance doesn't help
lable and will error out, without using the global reserve. So if at any time btrfs reports more than 0 global reserve used, it means btrfs thinks it's in pretty serious straits and is in quite a pickle, making non-zero global reserve usage a primary indicator of a filesystem in trouble, no matter what else is reported.

So with all that said, you can see that on that 8-gig-per-device, pair-device raid1, btrfs has allocated only 512 MiB of metadata on each device, of which 232 MiB on each is used, *nominally* leaving 280 MiB metadata unused on each device, tho global reserve comes from that. But there's only 16 MiB of global reserve, counted only once. If we assume it'd be used equally from each device, that's 8 MiB of global reserve on each device subtracted from that 280 MiB nominally free, leaving 272 MiB of metadata free: a reasonably healthy filesystem state, considering that's more metadata than actually used, plus there's nearly 4.5 GiB entirely unallocated on each device, that can be allocated to data or metadata as needed.

That's quite a contrast compared to yours, a quarter the size, 2 GiB instead of 8, and as you have only the single device, the metadata defaulted to dup, so it uses twice as much space on the single device. But the *real* contrast is, as you said, your global reserve: an entirely unrealistic half a GiB, on a 2 GiB filesystem! Of course global reserve being accounted single, while your metadata is dup, half should come from each side of that dup, so your real metadata usage vs. free can be calculated as 577.5 size (per side of the dup) - 37.5 (normal used) - 256 (half of the global reserve), basically 284 MiB of usable metadata space (per side of the dup, but each side should be used equally). Add to that the ~100 MiB unallocated, tho if used for dup metadata you'd only have half that usable, and you're not in /horrible/ shape. But that 512 MiB global reserve, a quarter of the total filesystem size, is just killing you.
And unless it has something to do with snapshots/subvolumes, I don't have a clue why, or what to do about it. But here's what I'd try, based on the answer to the question of whether you use snapshots/subvolumes (or any of the btrfs reflink-based dedup tools, as they have many of the same implications as snapshots, tho the scope is of course a bit different), and how many you have if so:

* Snapshots and reflinks are great, but unfortunately have limited scaling ability at this time. While on a normal-sized btrfs the limit before scaling becomes an issue seems to be a few hundred (under 1000, and for most under 500), it /may/ be that on a btrfs as small as your two-GiB, more than say 10 may be an issue. As I said, I don't /know/ if it'll help, but if you're over this, I'd certainly try reducing the number of snapshots/reflinks to under 10 per subvolume/file and see if it helps at all.

* You /may/ be able to try btrfs bal start -musage=, starting with a relatively low value (you tried 0; it's a percentage, so try 2, 5, 10... up toward 100%, until you see some results or you get ENOSPC errors), and see some results. However, typical metadata chunks are 256 MiB in size, tho they should be smaller on a 2 GiB btrfs (I'm not sure by how much), and it's relatively likely you'll run into ENOSPC errors due to metadata chunks larger than half your unallocated space (dup, so it'll take two chunks of the same size) before you get anywhere, even if balancing would otherwise help -- which again I'm not even sure it will, as I don't know whether it helps with a bloated global reserve or not.

* If the balance ENOSPCs, you may of course try (temporarily) increasing the size of the filesystem, possibly by adding a device. There's discussion of that on the wiki. But I honestly don't know how global reserve will behave, because something's clearly going on with it and I have no idea what.
For all I know, it'll eat most of the new space again, and you'll be in an even worse position, as it won't then let you remove the device you added to try to fix the problem.

* Similarly, but perhaps less risky with regard to global reserve size, tho definitely more risky in terms of data safety in case something goes wrong (but the data's backed up, right?), you could try doing a btrfs balance start -mconvert=single, to reduce the metadata usage from dup to single mode. Tho personally, I'd probably not bother with the risk, simply double-checking my backups, then going ahead with the next one instead of this one.

* Since in data admin terms, data without a backup is, by definition of that lack of a backup, considered worth less than the time and trouble necessary to make it, and that applies even more strongly to a still-under-heavy-development and not yet fully stable filesystem such as btrfs, it's relatively safe to assume you either have a backup, or don't really care about the possibility of losing the
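Duncan's per-side metadata arithmetic can be checked mechanically. A quick sketch, using the exact figures from the btrfs fi df output posted elsewhere in this thread (577.50 MiB DUP metadata total, 37.55 MiB used, 512 MiB single-accounted global reserve):

```python
# Figures from the 2 GiB filesystem discussed in the thread (MiB).
metadata_total_per_side = 577.50  # Metadata, DUP: total=577.50MiB
metadata_used_per_side = 37.55    # Metadata, DUP: used=37.55MiB
global_reserve = 512.00           # GlobalReserve, single: total=512.00MiB

# The reserve is accounted once but carved out of DUP metadata, so half
# of it comes from each side of the dup.
usable_per_side = (metadata_total_per_side
                   - metadata_used_per_side
                   - global_reserve / 2)
print(round(usable_per_side, 2))  # ~283.95 MiB, Duncan's "basically 284 MiB"
```

The same sketch makes the problem obvious: with a sanely sized reserve (say 16 MiB, as on the healthy 8 GiB example), usable_per_side would be over 500 MiB instead of ~284 MiB.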
Re: File system is oddly full after kernel upgrade, balance doesn't help
Hello,

Of course I can't retrieve the data from before the balance, but here is the data from now:

root@vmhost:~# btrfs fi show /tmp/mnt/curlybrace
Label: 'curlybrace'  uuid: f471bfca-51c4-4e44-ac72-c6cd9ccaf535
        Total devices 1 FS bytes used 752.38MiB
        devid 1 size 2.00GiB used 1.90GiB path /dev/mapper/vmdata--vg-lxc--curlybrace

root@vmhost:~# btrfs fi df /tmp/mnt/curlybrace
Data, single: total=773.62MiB, used=714.82MiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=577.50MiB, used=37.55MiB
GlobalReserve, single: total=512.00MiB, used=0.00B

root@vmhost:~# btrfs fi usage /tmp/mnt/curlybrace
Overall:
    Device size:          2.00GiB
    Device allocated:     1.90GiB
    Device unallocated: 103.38MiB
    Device missing:         0.00B
    Used:               789.94MiB
    Free (estimated):   162.18MiB  (min: 110.50MiB)
    Data ratio:              1.00
    Metadata ratio:          2.00
    Global reserve:     512.00MiB  (used: 0.00B)

Data,single: Size:773.62MiB, Used:714.82MiB
   /dev/mapper/vmdata--vg-lxc--curlybrace  773.62MiB
Metadata,DUP: Size:577.50MiB, Used:37.55MiB
   /dev/mapper/vmdata--vg-lxc--curlybrace  1.13GiB
System,DUP: Size:8.00MiB, Used:16.00KiB
   /dev/mapper/vmdata--vg-lxc--curlybrace  16.00MiB
Unallocated:
   /dev/mapper/vmdata--vg-lxc--curlybrace  103.38MiB

So... if I sum the data, metadata, and the global reserve, I see why only ~170 MB is left. I have no idea, however, how the global reserve sneaked up to 512 MB for such a small file system, or how I could resolve this situation. Any ideas?

MegaBrutal

2017-01-28 7:46 GMT+01:00 Duncan <1i5t5.dun...@cox.net>:
> MegaBrutal posted on Fri, 27 Jan 2017 19:45:00 +0100 as excerpted:
>
>> Hi,
>>
>> Not sure if it was caused by the upgrade, but I only encountered this
>> problem after I upgraded to Ubuntu Yakkety, which comes with a 4.8
>> kernel.
>> Linux vmhost 4.8.0-34-generic #36-Ubuntu SMP Wed Dec 21 17:24:18 UTC
>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>
>> This is the 2nd file system which showed these symptoms, so I thought
>> it's more than happenstance.
I don't remember what I did with the first >> one, but I somehow managed to fix it with balance, if I remember >> correctly, but it doesn't help with this one. >> >> FS state before any attempts to fix: >> Filesystem 1M-blocks Used Available Use% Mounted on >> [...]curlybrace 1024 1024 0 100% /tmp/mnt/curlybrace >> >> Resized LV, run „btrfs filesystem resize max /tmp/mnt/curlybrace”: >> [...]curlybrace 2048 1303 0 100% /tmp/mnt/curlybrace >> >> Notice how the usage magically jumped up to 1303 MB, and despite the FS >> size is 2048 MB, the usage is still displayed as 100%. >> >> Tried full balance (other options with -dusage had no result): >> root@vmhost:~# btrfs balance start -v /tmp/mnt/curlybrace > >> Starting balance without any filters. >> ERROR: error during balancing '/tmp/mnt/curlybrace': >> No space left on device > >> No space left on device? How? >> >> But it changed the situation: >> [...]curlybrace 2048 1302 190 88% /tmp/mnt/curlybrace >> >> This is still not acceptable. I need to recover at least 50% free space >> (since I increased the FS to the double). >> >> A 2nd balance attempt resulted in this: >> [...]curlybrace 2048 1302 162 89% /tmp/mnt/curlybrace >> >> So... it became slightly worse. >> >> What's going on? How can I fix the file system to show real data? > > Something seems off, yes, but... > > https://btrfs.wiki.kernel.org/index.php/FAQ > > Reading the whole thing will likely be useful, but especially 1.3/1.4 and > 4.6-4.9 discussing the problem of space usage, reporting, and (primarily > in some of the other space related FAQs beyond the specific ones above) > how to try and fix it when space runes out, on btrfs. > > If you read them before, read them again, because you didn't post the > btrfs free-space reports covered in 4.7, instead posting what appears to > be the standard (non-btrfs) df report, which for all the reasons > explained in the FAQ, is at best only an estimate on btrfs. 
That > estimate is obviously behaving unexpectedly in your case, but without the > btrfs specific reports, it's nigh impossible to even guess with any > chance at accuracy what's going on, or how to fix it. > > A WAG would be that part of the problem might be that you were into > global reserve before the resize, so after the filesystem got more space > to use, the first thing it did was unload that global reserve usage, > thereby immediately upping apparent usage. That might explain that
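The arithmetic behind the "Free (estimated)" line in the report above can be checked by hand. A rough sketch (the helper name is made up; inputs are the MiB figures as printed, which are already rounded, so the results only match the report to within rounding error):

```shell
# Re-derive the 'btrfs fi usage' free-space estimates: slack inside the
# existing data chunks, plus unallocated space divided by the profile
# ratio (1.0 for single data, 2.0 for DUP metadata).
free_estimate() {
    data_total=$1 data_used=$2 unalloc=$3 data_ratio=$4 meta_ratio=$5
    awk -v dt="$data_total" -v du="$data_used" -v un="$unalloc" \
        -v dr="$data_ratio" -v mr="$meta_ratio" 'BEGIN {
        slack = dt - du                       # unused space inside data chunks
        printf "est=%.2f min=%.2f\n", slack + un / dr, slack + un / mr
    }'
}

free_estimate 773.62 714.82 103.38 1.0 2.0
# est matches the reported 162.18MiB; min comes out 110.49 rather than the
# printed 110.50MiB only because the inputs here are pre-rounded
```

This reproduces the report's numbers, which suggests the "estimated" and "min" lines themselves are consistent with the allocation figures; the oddity is how much of the device is tied up in DUP metadata chunks and the 512 MiB global reserve.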
Re: File system is oddly full after kernel upgrade, balance doesn't help
MegaBrutal posted on Fri, 27 Jan 2017 19:45:00 +0100 as excerpted:

> Hi,
>
> Not sure if it was caused by the upgrade, but I only encountered this problem after I upgraded to Ubuntu Yakkety, which comes with a 4.8 kernel.
> Linux vmhost 4.8.0-34-generic #36-Ubuntu SMP Wed Dec 21 17:24:18 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> This is the 2nd file system which showed these symptoms, so I thought it's more than happenstance. I don't remember what I did with the first one, but I somehow managed to fix it with balance, if I remember correctly, but it doesn't help with this one.
>
> FS state before any attempts to fix:
> Filesystem 1M-blocks Used Available Use% Mounted on
> [...]curlybrace 1024 1024 0 100% /tmp/mnt/curlybrace
>
> Resized the LV, ran „btrfs filesystem resize max /tmp/mnt/curlybrace”:
> [...]curlybrace 2048 1303 0 100% /tmp/mnt/curlybrace
>
> Notice how the usage magically jumped up to 1303 MB, and even though the FS size is 2048 MB, the usage is still displayed as 100%.
>
> Tried a full balance (other options with -dusage had no result):
> root@vmhost:~# btrfs balance start -v /tmp/mnt/curlybrace
> Starting balance without any filters.
> ERROR: error during balancing '/tmp/mnt/curlybrace': No space left on device
>
> No space left on device? How?
>
> But it changed the situation:
> [...]curlybrace 2048 1302 190 88% /tmp/mnt/curlybrace
>
> This is still not acceptable. I need to recover at least 50% free space (since I doubled the FS size).
>
> A 2nd balance attempt resulted in this:
> [...]curlybrace 2048 1302 162 89% /tmp/mnt/curlybrace
>
> So... it became slightly worse.
>
> What's going on? How can I fix the file system to show real data?

Something seems off, yes, but...

https://btrfs.wiki.kernel.org/index.php/FAQ

Reading the whole thing will likely be useful, but especially 1.3/1.4 and 4.6-4.9, which discuss the problem of space usage, reporting, and (primarily in some of the other space-related FAQs beyond the specific ones above) how to try and fix it when space runs out, on btrfs.

If you read them before, read them again, because you didn't post the btrfs free-space reports covered in 4.7, instead posting what appears to be the standard (non-btrfs) df report, which for all the reasons explained in the FAQ, is at best only an estimate on btrfs. That estimate is obviously behaving unexpectedly in your case, but without the btrfs-specific reports, it's nigh impossible to even guess with any chance at accuracy what's going on, or how to fix it.

A WAG would be that part of the problem might be that you were into global reserve before the resize, so after the filesystem got more space to use, the first thing it did was unload that global reserve usage, thereby immediately upping apparent usage. That might explain that initial jump in usage after the resize. But that's just a WAG. Without at least btrfs filesystem usage, or btrfs filesystem df plus btrfs filesystem show, from before the resize, after, and before and after the balances, a WAG is what it remains.

And again, without those reports, there's no way to say whether balance can be expected to help, or not.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
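The three btrfs-specific reports requested here can be collected in one go. A minimal sketch (the function name and mount-point argument are placeholders; 'btrfs filesystem usage' needs a reasonably recent btrfs-progs, while 'show' and 'df' are much older):

```shell
# Gather the btrfs-specific space reports (FAQ 4.7) for one mount point,
# instead of the plain df output, which on btrfs is only an estimate.
btrfs_space_reports() {
    mnt="${1:?usage: btrfs_space_reports <mountpoint>}"
    btrfs filesystem show "$mnt"
    btrfs filesystem df "$mnt"
    btrfs filesystem usage "$mnt"   # newer progs only; omit on old versions
}
```

Running this before and after a resize or balance gives the before/after comparison Duncan asks for.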
File system is oddly full after kernel upgrade, balance doesn't help
Hi,

Not sure if it was caused by the upgrade, but I only encountered this problem after I upgraded to Ubuntu Yakkety, which comes with a 4.8 kernel.
Linux vmhost 4.8.0-34-generic #36-Ubuntu SMP Wed Dec 21 17:24:18 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

This is the 2nd file system which showed these symptoms, so I thought it's more than happenstance. I don't remember what I did with the first one, but I somehow managed to fix it with balance, if I remember correctly, but it doesn't help with this one.

FS state before any attempts to fix:
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/mapper/vmdata--vg-lxc--curlybrace 1024 1024 0 100% /tmp/mnt/curlybrace

Resized the LV, ran „btrfs filesystem resize max /tmp/mnt/curlybrace”:
/dev/mapper/vmdata--vg-lxc--curlybrace 2048 1303 0 100% /tmp/mnt/curlybrace

Notice how the usage magically jumped up to 1303 MB, and even though the FS size is 2048 MB, the usage is still displayed as 100%.

Tried a full balance (other options with -dusage had no result):

root@vmhost:~# btrfs balance start -v /tmp/mnt/curlybrace
Dumping filters: flags 0x7, state 0x0, force is off
DATA (flags 0x0): balancing
METADATA (flags 0x0): balancing
SYSTEM (flags 0x0): balancing
WARNING:
Full balance without filters requested. This operation is very intense and takes potentially very long. It is recommended to use the balance filters to narrow down the balanced data. Use 'btrfs balance start --full-balance' option to skip this warning. The operation will start in 10 seconds. Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
ERROR: error during balancing '/tmp/mnt/curlybrace': No space left on device
There may be more info in syslog - try dmesg | tail

No space left on device? How?

But it changed the situation:
/dev/mapper/vmdata--vg-lxc--curlybrace 2048 1302 190 88% /tmp/mnt/curlybrace

This is still not acceptable. I need to recover at least 50% free space (since I doubled the FS size).

A 2nd balance attempt resulted in this:
/dev/mapper/vmdata--vg-lxc--curlybrace 2048 1302 162 89% /tmp/mnt/curlybrace

So... it became slightly worse.

What's going on? How can I fix the file system to show real data?

Regards,
MegaBrutal
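When a full balance dies with ENOSPC, the usual workaround is an incremental sweep with usage filters, so each pass relocates only mostly-empty chunks and needs very little free space to succeed. A hedged sketch (the function name and cutoff steps are illustrative, not from this thread; -musage needs a reasonably recent btrfs-progs):

```shell
# Incremental balance: start with nearly-empty chunks and raise the usage
# cutoff step by step. Each pass frees chunk space that makes the next,
# more aggressive pass possible.
balance_usage_sweep() {
    mnt="${1:?usage: balance_usage_sweep <mountpoint>}"
    for pct in 0 10 25 50 75; do
        echo "== balancing chunks <= ${pct}% full =="
        btrfs balance start -dusage="$pct" -musage="$pct" "$mnt" || return 1
    done
}
```

On a filesystem this close to full, the pct=0 pass (reclaiming completely empty chunks) is often the only one that succeeds at first, which is exactly why it goes first.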
Re: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures
csum_size 4
cache_generation 75012
uuid_tree_generation 75012
dev_item.uuid 108c02c0-9812-428e-8f90-23bdf88e11bf
dev_item.fsid 82651f91-4989-415b-bd83-ae830f12608c [match]
dev_item.type 0
dev_item.total_bytes 536869842944
dev_item.bytes_used 440259313664
dev_item.io_align 0
dev_item.io_width 0
dev_item.sector_size 0
dev_item.devid 1
dev_item.dev_group 0
dev_item.seek_speed 0
dev_item.bandwidth 0
dev_item.generation 0

Regards,
Jari

> Regards,
> Xin
>
> Sent: Monday, December 19, 2016 at 2:32 AM
> From: "Jari Seppälä" <lihamakaroonilaati...@gmail.com>
> To: linux-btrfs@vger.kernel.org
> Cc: "Xin Zhou" <xin.z...@gmx.com>
> Subject: Re: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures
>
> Xin Zhou <xin.z...@gmx.com> wrote on 17.12.2016 at 22.27:
>>
>> Hi Jari,
>>
>> Similar to other file systems, btrfs keeps copies of its super blocks.
>> Try to run "man btrfs check", "man btrfs rescue" and the related commands for more details.
>> Regards,
>> Xin
>
> Hi Xin,
>
> I did follow all the recovery procedures from the man and wiki pages. The tools do not help, as they think there is no BTRFS fs anymore. However, if I try to reformat the device I get:
>
> btrfs-progs v4.4
> See http://btrfs.wiki.kernel.org for more information.
> /dev/sdb1 appears to contain an existing filesystem (btrfs).
>
> So, the recovery tools seem to think there is no btrfs filesystem. Mkfs seems to think there is.
>
> What I have tried:
> btrfsck /dev/sdb1
> mount -t btrfs -o ro /dev/sdb1 /mnt/share/
> mount -t btrfs -o ro,recovery /dev/sdb1 /mnt/share/
> mount -t btrfs -o roootflags=recovery,nospace_cache /dev/sdb1 /mnt/share/
> mount -t btrfs -o rootflags=recovery,nospace_cache /dev/sdb1 /mnt/share/
> mount -t btrfs -o rootflags=recovery,nospace_cache,clear_cache /dev/sdb1 /mnt/share/
> mount -t btrfs -o ro,rootflags=recovery,nospace_cache,clear_cache /dev/sdb1 /mnt/share/
> btrfs restore /dev/sdb1 /target/device
> btrfs rescue zero-log /dev/sdb1
> btrfsck --init-csum-tree /dev/sdb1
> btrfsck --fix-crc /dev/sdb1
> btrfsck --check-data-csum /dev/sdb1
> btrfs rescue chunk-recover /dev/sdb1
> btrfs rescue super-recover /dev/sdb1
> btrfs rescue zero-log /dev/sdb1
>
> No help whatsoever.
>
> Jari
>
>> Sent: Saturday, December 17, 2016 at 2:06 AM
>> From: "Jari Seppälä" <lihamakaroonilaati...@gmail.com>
>> To: linux-btrfs@vger.kernel.org
>> Subject: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures
>>
>> Syslog tells:
>> [ 135.446222] BTRFS error (device sdb1): system chunk array too small 0 < 97
>> [ 135.446260] BTRFS error (device sdb1): superblock contains fatal errors
>> [ 135.462544] BTRFS error (device sdb1): open_ctree failed
>>
>> What has been done:
>> * All "btrfs rescue" options
>>
>> Info on system:
>> * fs on external SSD via USB
>> * kernel 4.9.0 (tried with 4.8.13)
>> * btrfs-tools 4.4
>> * Mythbuntu (Ubuntu) 16.04.1 LTS with latest fixes 2016-12-16
>>
>> Any help appreciated. Around 300G of TV recordings on the drive, which of course will eventually come as replays.
>>
>> Jari
>> --
>> *** Jari Seppälä
>
> --
> *** Jari Seppälä
Re: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures
Hi Jari,

The message shows:

> [ 135.446260] BTRFS error (device sdb1): superblock contains fatal errors

So according to this info, before trying to run a repair / rescue procedure, could you show the status of superblock copies 0, 1 and 2?
Regards,
Xin

Sent: Monday, December 19, 2016 at 2:32 AM
From: "Jari Seppälä" <lihamakaroonilaati...@gmail.com>
To: linux-btrfs@vger.kernel.org
Cc: "Xin Zhou" <xin.z...@gmx.com>
Subject: Re: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures

Xin Zhou <xin.z...@gmx.com> wrote on 17.12.2016 at 22.27:
>
> Hi Jari,
>
> Similar to other file systems, btrfs keeps copies of its super blocks.
> Try to run "man btrfs check", "man btrfs rescue" and the related commands for more details.
> Regards,
> Xin

Hi Xin,

I did follow all the recovery procedures from the man and wiki pages. The tools do not help, as they think there is no BTRFS fs anymore. However, if I try to reformat the device I get:

btrfs-progs v4.4
See http://btrfs.wiki.kernel.org for more information.
/dev/sdb1 appears to contain an existing filesystem (btrfs).

So, the recovery tools seem to think there is no btrfs filesystem. Mkfs seems to think there is.

What I have tried:
btrfsck /dev/sdb1
mount -t btrfs -o ro /dev/sdb1 /mnt/share/
mount -t btrfs -o ro,recovery /dev/sdb1 /mnt/share/
mount -t btrfs -o roootflags=recovery,nospace_cache /dev/sdb1 /mnt/share/
mount -t btrfs -o rootflags=recovery,nospace_cache /dev/sdb1 /mnt/share/
mount -t btrfs -o rootflags=recovery,nospace_cache,clear_cache /dev/sdb1 /mnt/share/
mount -t btrfs -o ro,rootflags=recovery,nospace_cache,clear_cache /dev/sdb1 /mnt/share/
btrfs restore /dev/sdb1 /target/device
btrfs rescue zero-log /dev/sdb1
btrfsck --init-csum-tree /dev/sdb1
btrfsck --fix-crc /dev/sdb1
btrfsck --check-data-csum /dev/sdb1
btrfs rescue chunk-recover /dev/sdb1
btrfs rescue super-recover /dev/sdb1
btrfs rescue zero-log /dev/sdb1

No help whatsoever.

Jari

> Sent: Saturday, December 17, 2016 at 2:06 AM
> From: "Jari Seppälä" <lihamakaroonilaati...@gmail.com>
> To: linux-btrfs@vger.kernel.org
> Subject: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures
>
> Syslog tells:
> [ 135.446222] BTRFS error (device sdb1): system chunk array too small 0 < 97
> [ 135.446260] BTRFS error (device sdb1): superblock contains fatal errors
> [ 135.462544] BTRFS error (device sdb1): open_ctree failed
>
> What has been done:
> * All "btrfs rescue" options
>
> Info on system:
> * fs on external SSD via USB
> * kernel 4.9.0 (tried with 4.8.13)
> * btrfs-tools 4.4
> * Mythbuntu (Ubuntu) 16.04.1 LTS with latest fixes 2016-12-16
>
> Any help appreciated. Around 300G of TV recordings on the drive, which of course will eventually come as replays.
>
> Jari
> --
> *** Jari Seppälä
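The superblock status Xin asks for can be read per copy: the mirrors live at fixed byte offsets on the device, and both the old standalone tool and newer progs can select a copy with -s. A sketch (the helper names are made up; btrfs-progs 4.4 ships the standalone btrfs-show-super, while newer releases use 'btrfs inspect-internal dump-super'):

```shell
# The btrfs superblock mirrors sit at fixed offsets:
#   copy 0 at 64KiB, copy 1 at 64MiB, copy 2 at 256GiB (if the device
#   is large enough to hold it).
sb_offsets() {
    echo $((64 * 1024))
    echo $((64 * 1024 * 1024))
    echo $((256 * 1024 * 1024 * 1024))
}

show_supers() {
    dev="${1:?usage: show_supers <device>}"
    for i in 0 1 2; do
        btrfs-show-super -s "$i" "$dev"                      # progs <= ~4.x
        # btrfs inspect-internal dump-super -s "$i" "$dev"   # newer progs
    done
}
```

Comparing generation, fsid and the "system chunk array" size across the three copies shows whether any intact mirror survived the unplug.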
Re: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures
Xin Zhou <xin.z...@gmx.com> wrote on 17.12.2016 at 22.27:
>
> Hi Jari,
>
> Similar to other file systems, btrfs keeps copies of its super blocks.
> Try to run "man btrfs check", "man btrfs rescue" and the related commands for more details.
> Regards,
> Xin

Hi Xin,

I did follow all the recovery procedures from the man and wiki pages. The tools do not help, as they think there is no BTRFS fs anymore. However, if I try to reformat the device I get:

btrfs-progs v4.4
See http://btrfs.wiki.kernel.org for more information.
/dev/sdb1 appears to contain an existing filesystem (btrfs).

So, the recovery tools seem to think there is no btrfs filesystem. Mkfs seems to think there is.

What I have tried:
btrfsck /dev/sdb1
mount -t btrfs -o ro /dev/sdb1 /mnt/share/
mount -t btrfs -o ro,recovery /dev/sdb1 /mnt/share/
mount -t btrfs -o roootflags=recovery,nospace_cache /dev/sdb1 /mnt/share/
mount -t btrfs -o rootflags=recovery,nospace_cache /dev/sdb1 /mnt/share/
mount -t btrfs -o rootflags=recovery,nospace_cache,clear_cache /dev/sdb1 /mnt/share/
mount -t btrfs -o ro,rootflags=recovery,nospace_cache,clear_cache /dev/sdb1 /mnt/share/
btrfs restore /dev/sdb1 /target/device
btrfs rescue zero-log /dev/sdb1
btrfsck --init-csum-tree /dev/sdb1
btrfsck --fix-crc /dev/sdb1
btrfsck --check-data-csum /dev/sdb1
btrfs rescue chunk-recover /dev/sdb1
btrfs rescue super-recover /dev/sdb1
btrfs rescue zero-log /dev/sdb1

No help whatsoever.

Jari

> Sent: Saturday, December 17, 2016 at 2:06 AM
> From: "Jari Seppälä" <lihamakaroonilaati...@gmail.com>
> To: linux-btrfs@vger.kernel.org
> Subject: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures
>
> Syslog tells:
> [ 135.446222] BTRFS error (device sdb1): system chunk array too small 0 < 97
> [ 135.446260] BTRFS error (device sdb1): superblock contains fatal errors
> [ 135.462544] BTRFS error (device sdb1): open_ctree failed
>
> What has been done:
> * All "btrfs rescue" options
>
> Info on system:
> * fs on external SSD via USB
> * kernel 4.9.0 (tried with 4.8.13)
> * btrfs-tools 4.4
> * Mythbuntu (Ubuntu) 16.04.1 LTS with latest fixes 2016-12-16
>
> Any help appreciated. Around 300G of TV recordings on the drive, which of course will eventually come as replays.
>
> Jari
> --
> *** Jari Seppälä

--
*** Jari Seppälä
Re: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures
Hi Jari,

Similar to other file systems, btrfs keeps copies of its super blocks. Try to run "man btrfs check", "man btrfs rescue" and the related commands for more details.
Regards,
Xin

Sent: Saturday, December 17, 2016 at 2:06 AM
From: "Jari Seppälä" <lihamakaroonilaati...@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures

Syslog tells:
[ 135.446222] BTRFS error (device sdb1): system chunk array too small 0 < 97
[ 135.446260] BTRFS error (device sdb1): superblock contains fatal errors
[ 135.462544] BTRFS error (device sdb1): open_ctree failed

What has been done:
* All "btrfs rescue" options

Info on system:
* fs on external SSD via USB
* kernel 4.9.0 (tried with 4.8.13)
* btrfs-tools 4.4
* Mythbuntu (Ubuntu) 16.04.1 LTS with latest fixes 2016-12-16

Any help appreciated. Around 300G of TV recordings on the drive, which of course will eventually come as replays.

Jari
--
*** Jari Seppälä
Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures
Syslog tells:
[ 135.446222] BTRFS error (device sdb1): system chunk array too small 0 < 97
[ 135.446260] BTRFS error (device sdb1): superblock contains fatal errors
[ 135.462544] BTRFS error (device sdb1): open_ctree failed

What has been done:
* All "btrfs rescue" options

Info on system:
* fs on external SSD via USB
* kernel 4.9.0 (tried with 4.8.13)
* btrfs-tools 4.4
* Mythbuntu (Ubuntu) 16.04.1 LTS with latest fixes 2016-12-16

Any help appreciated. Around 300G of TV recordings on the drive, which of course will eventually come as replays.

Jari
--
*** Jari Seppälä
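The replies elsewhere in this thread boil down to: inspect the superblock mirrors read-only before any destructive step. A hedged sketch of that order of operations (the helper name is illustrative; 'btrfs check --super N' makes check read an alternate superblock copy, and super-recover is the step that actually overwrites the primary):

```shell
# When the primary superblock is bad ("system chunk array too small"),
# a backup copy may still be intact. Read-only checks first, repair last.
try_backup_supers() {
    dev="${1:?usage: try_backup_supers <device>}"
    # 1. Read-only: run check against each backup superblock copy.
    for i in 1 2; do
        echo "== checking against superblock copy $i =="
        btrfs check --super "$i" "$dev"
    done
    # 2. Destructive: overwrite the primary superblock from a good mirror.
    #    Uncomment only after the read-only checks above look sane.
    # btrfs rescue super-recover -v "$dev"
}
```

If even the backup copies fail the read-only check, restoring files with 'btrfs restore' to another device is usually the more realistic goal than repairing in place.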
Re: Help with stack trace
So it's a btrfs problem: I caught the hang again with 4.8.7, and I can't reproduce it if the ES data is stored on ext4.

Trace from 4.8.7:

Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: INFO: task btrfs-transacti:4143 blocked for more than 120 seconds.
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: Not tainted 4.8.0-1-amd64 #1
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: btrfs-transacti D 9dd15e0d8180 0 4143 2 0x
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: 9dd954a97100 9dd15a7b80c0 920e5e15 9dd956ffbe08
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: 9dd956ffc000 9dd9553091f0 9dd955309000 9dd9553091f0
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: 9dd954a97100 925eb4d1 9dca41f6e550
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: Call Trace:
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? try_to_del_timer_sync+0x55/0x80
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? schedule+0x31/0x80
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? wait_current_trans.isra.21+0xcd/0x110 [btrfs]
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? wake_atomic_t_function+0x60/0x60
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? start_transaction+0x273/0x4b0 [btrfs]
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? transaction_kthread+0x77/0x200 [btrfs]
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? btrfs_cleanup_transaction+0x590/0x590 [btrfs]
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? kthread+0xcd/0xf0
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? ret_from_fork+0x1f/0x40
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? kthread_create_on_node+0x190/0x190
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: INFO: task htop:12776 blocked for more than 120 seconds.
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: Not tainted 4.8.0-1-amd64 #1
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: htop D 9dd95d898180 0 12776 1 0x0004
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: 9dd84d02d0c0 9dd959c9a0c0 9dd183ed3e00 0041
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: 9dd84d05 9dd84d04fdf8 9dca4319cc68 9dca4319cc80
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: 9dd84d04fd90 925eb4d1 9dd84d02d0c0
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: Call Trace:
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? schedule+0x31/0x80
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? rwsem_down_read_failed+0xf8/0x150
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? call_rwsem_down_read_failed+0x14/0x30
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? down_read+0x1c/0x30
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? proc_pid_cmdline_read+0xae/0x540
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? vfs_read+0x90/0x130
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? SyS_read+0x52/0xc0
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? system_call_fast_compare_end+0xc/0x96
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: INFO: task iotop:12785 blocked for more than 120 seconds.
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: Not tainted 4.8.0-1-amd64 #1
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: iotop D 9dd15e158180 0 12785 1 0x0004
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: 9dd855a92100 9dd15a7ba140 9dd9546e1c00 7ff86ac74000
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: 9dd8549cc000 9dd8549cbdf8 9dca4319cc68 9dca4319cc80
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: 9dd8549cbd90 925eb4d1 9dd855a92100
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: Call Trace:
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? schedule+0x31/0x80
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? rwsem_down_read_failed+0xf8/0x150
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? call_rwsem_down_read_failed+0x14/0x30
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? down_read+0x1c/0x30
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? proc_pid_cmdline_read+0xae/0x540
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? vfs_read+0x90/0x130
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? SyS_read+0x52/0xc0
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: [] ? system_call_fast_compare_end+0xc/0x96
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: INFO: task java:18198 blocked for more than 120 seconds.
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: Not tainted 4.8.0-1-amd64 #1
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: java D 9dd95d898180 0 18198 1 0x0100
Nov 25 14:09:30 msq-k1-srv-ids-01 kernel: 9dd660e1a140 9dd959c9a0c0 9dd660e1a140
Help with stack trace
Hi,

I use btrfs as storage for root and data on ElasticSearch servers, and I'm hitting a strange bug where the servers hang. I only get this stack trace once Elastic is started.

Debian 8 x64
Linux msq-k1-srv-ids-02 4.8.0-1-amd64 #1 SMP Debian 4.8.5-1 (2016-10-28) x86_64 GNU/Linux
Also caught it on Debian Linux 4.7.6
btrfs-progs v4.7.3

btrfs check doesn't find any errors, so I think this may be some kind of race condition?

Stack trace:
[ 365.619814] INFO: task kworker/u480:1:205 blocked for more than 120 seconds.
[ 365.619891] Not tainted 4.8.0-1-amd64 #1
[ 365.619926] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 365.619984] kworker/u480:1 D 888d7bb18180 0 205 2 0x
[ 365.620103] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 365.620158] 888d7235a000 888d74b820c0 c03c68fb 888d6b9d3a58
[ 365.620227] 888d6b9d4000 887e12709508 ff00 888d7235a000
[ 365.620292] 888d7235a000 888d6b9d3a70 aabeb4e1 887e127094a0
[ 365.620358] Call Trace:
[ 365.620416] [] ? btrfs_get_token_32+0x6b/0x130 [btrfs]
[ 365.620475] [] ? schedule+0x31/0x80
[ 365.620542] [] ? btrfs_tree_read_lock+0xd5/0x120 [btrfs]
[ 365.620597] [] ? wake_atomic_t_function+0x60/0x60
[ 365.620666] [] ? btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
[ 365.620742] [] ? btrfs_search_slot+0x756/0x9f0 [btrfs]
[ 365.620817] [] ? btrfs_buffer_uptodate+0x4b/0x70 [btrfs]
[ 365.620889] [] ? generic_bin_search.constprop.37+0x9b/0x210 [btrfs]
[ 365.620971] [] ? btrfs_lookup_file_extent+0x4a/0x70 [btrfs]
[ 365.621049] [] ? __btrfs_drop_extents+0x164/0xdd0 [btrfs]
[ 365.621105] [] ? kmem_cache_alloc+0xbc/0x530
[ 365.621176] [] ? insert_reserved_file_extent.constprop.64+0xb4/0x330 [btrfs]
[ 365.621263] [] ? start_transaction+0x95/0x4b0 [btrfs]
[ 365.621336] [] ? btrfs_finish_ordered_io+0x307/0x680 [btrfs]
[ 365.621394] [] ? check_preempt_curr+0x50/0x90
[ 365.621467] [] ? btrfs_scrubparity_helper+0xd1/0x2d0 [btrfs]
[ 365.621524] [] ? process_one_work+0x160/0x410
[ 365.621570] [] ? worker_thread+0x4d/0x480
[ 365.621614] [] ? process_one_work+0x410/0x410
[ 365.621662] [] ? kthread+0xcd/0xf0
[ 365.621704] [] ? ret_from_fork+0x1f/0x40
[ 365.621748] [] ? kthread_create_on_node+0x190/0x190
[ 365.621799] INFO: task kworker/u480:2:1467 blocked for more than 120 seconds.
[ 365.621852] Not tainted 4.8.0-1-amd64 #1
[ 365.621886] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 365.621942] kworker/u480:2 D 888d7bd98180 0 1467 2 0x
[ 365.622032] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 365.622085] 888d6bab50c0 888d74b8d040 c03c68fb 888d6a1f7a58
[ 365.622151] 888d6a1f8000 887e12709508 ff00 888d6bab50c0
[ 365.622217] 888d6bab50c0 888d6a1f7a70 aabeb4e1 887e127094a0
[ 365.622283] Call Trace:
[ 365.622334] [] ? btrfs_get_token_32+0x6b/0x130 [btrfs]
[ 365.622387] [] ? schedule+0x31/0x80
[ 365.622453] [] ? btrfs_tree_read_lock+0xd5/0x120 [btrfs]
[ 365.622506] [] ? wake_atomic_t_function+0x60/0x60
[ 365.622575] [] ? btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
[ 365.624240] [] ? btrfs_search_slot+0x756/0x9f0 [btrfs]
[ 365.625855] [] ? swiotlb_map_sg_attrs+0x6a/0x130
[ 365.627491] [] ? btrfs_lookup_file_extent+0x4a/0x70 [btrfs]
[ 365.629124] [] ? __btrfs_drop_extents+0x164/0xdd0 [btrfs]
[ 365.630692] [] ? kmem_cache_alloc+0xbc/0x530
[ 365.632279] [] ? insert_reserved_file_extent.constprop.64+0xb4/0x330 [btrfs]
[ 365.633759] [] ? start_transaction+0x95/0x4b0 [btrfs]
[ 365.635211] [] ? btrfs_finish_ordered_io+0x307/0x680 [btrfs]
[ 365.636647] [] ? check_preempt_curr+0x50/0x90
[ 365.638095] [] ? btrfs_scrubparity_helper+0xd1/0x2d0 [btrfs]
[ 365.639526] [] ? process_one_work+0x160/0x410
[ 365.640959] [] ? worker_thread+0x4d/0x480
[ 365.642361] [] ? process_one_work+0x410/0x410
[ 365.643769] [] ? kthread+0xcd/0xf0
[ 365.645072] [] ? ret_from_fork+0x1f/0x40
[ 365.646349] [] ? kthread_create_on_node+0x190/0x190
[ 365.647669] INFO: task btrfs-transacti:4130 blocked for more than 120 seconds.
[ 365.648981] Not tainted 4.8.0-1-amd64 #1
[ 365.650270] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 365.651585] btrfs-transacti D 888d7bc18180 0 4130 2 0x
[ 365.652928] 888d6a269000 888d71ec aa6e6f39 888d6d38be08
[ 365.654270] 888d6d38c000 888d720eb9f0 888d720eb800 888d720eb9f0
[ 365.655612] 888d6a269000 aabeb4e1 888d6b57e3a0
[ 365.656951] Call Trace:
[ 365.658183] [] ? try_to_del_timer_sync+0x59/0x80
[ 365.659411] [] ? schedule+0x31/0x80
[ 365.660680] [] ? wait_current_trans.isra.21+0xcd/0x110 [btrfs]
[ 365.662001] [] ? wake_atomic_t_function+0x60/0x60
[ 365.663249] [] ?
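For hangs like these, a report is more useful with a full on-demand dump of blocked tasks rather than only what the 120-second hung-task detector happens to print. A sketch of the usual knobs (needs root, and assumes the SysRq interface is enabled via the kernel.sysrq sysctl; the helper names are made up):

```shell
# Dump backtraces of all uninterruptible (D-state) tasks to the kernel
# log on demand, instead of waiting for the hung-task detector to fire.
dump_blocked_tasks() {
    echo w > /proc/sysrq-trigger   # SysRq 'w': show blocked tasks
    dmesg | tail -n 200
}

# Optionally lower the detector threshold while reproducing the hang
# (default is 120 seconds; 0 disables the warnings entirely).
set_hung_task_timeout() {
    sysctl -w kernel.hung_task_timeout_secs="${1:-60}"
}
```

Capturing the dump while ElasticSearch is mid-hang shows which task actually holds the transaction open, not just the waiters.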
Re: Help repairing a partition
On Fri, Oct 21, 2016 at 12:36 AM, Suvayu Ali <fatkasuvayu+li...@gmail.com> wrote:
> I had upgraded to 4.7.3 to test this issue:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1372910
>
> It hadn't helped, but I didn't have time to debug it any further. Since the Fedora 23 repos have 4.4.1, I guess downgrading is easier for me.

Better is to go to http://koji.fedoraproject.org/, type in btrfs-progs for the package, and find the most recent x.y-1.z version - right now that's 4.7.3, although 4.8.1 is probably OK also - it has no new features, mainly just a pile of bug fixes, which might be useful. So that'd be either:

btrfs-progs-4.8.1-2.fc26
or
btrfs-progs-4.7.3-1.fc26

And rpmbuild --rebuild them for F23, then install. I would not downgrade to 4.4.1 - it's not that it's bad, it's just a waste of time if it can't help fix the problem, which is very likely due to the older progs you have.

> Thanks for the pointer to the changelog; under 4.7.2 it mentions not to repair with 4.7.1, so I'll try `btrfs check --repair` after the downgrade.

No. The older the progs, the less safe the repair is. And this particular problem you have probably needs a newer progs to fix it anyway. So you need to go newer, not older. That's pretty much always the case with Btrfs.
>>> followed by this summary:
>>>
>>> checking csums
>>> checking root refs
>>> checking quota groups
>>> Counts for qgroup id: 0/257 are different
>>> our: referenced 7746465792 referenced compressed 7746465792
>>> disk: referenced 7746461696 referenced compressed 7746461696
>>> diff: referenced 4096 referenced compressed 4096
>>> our: exclusive 7746465792 exclusive compressed 7746465792
>>> disk: exclusive 7746461696 exclusive compressed 7746461696
>>> diff: exclusive 4096 exclusive compressed 4096
>>> Counts for qgroup id: 0/259 are different
>>> our: referenced 135641784320 referenced compressed 135641784320
>>> disk: referenced 135633862656 referenced compressed 135633862656
>>> diff: referenced 7921664 referenced compressed 7921664
>>> our: exclusive 135641784320 exclusive compressed 135641784320
>>> disk: exclusive 135633862656 exclusive compressed 135633862656
>>> diff: exclusive 7921664 exclusive compressed 7921664
>>> found 167864082432 bytes used err is 0
>>> total csum bytes: 161187492
>>> total tree bytes: 2021015552
>>> total fs tree bytes: 1725759488
>>> total extent tree bytes: 86228992
>>> btree space waste bytes: 386160897
>>> file data blocks allocated: 1269363683328
>>> referenced 164438126592
>>>
>>> How do I repair this?
>>
>> Yeah, good question. I can't tell from the message whether different counts is a bad thing, or if it's just a notification, or what. Yet again btrfs-progs does not help the user make informed decisions, it's really frustrating. I think that part can be ignored though for now, and see if btrfs check --repair can fix the problem now that you have a backup.
>
> Indeed, I have never been this confused about a file system before.
>
> I tried repairing after the downgrade to 4.4.1, it says "Couldn't open file system"! Mounting now works without errors, I can also r/w files as normal; go figure!

Oh shit. That's hilarious. I'm not even going to edit what I wrote above.
Anyway, it looks like you have quotas enabled. There are a number of quota-related bug fixes in progs newer than 4.4, so you really ought to use something newer, and if it breaks then it's a bug and needs a good bug report write-up so it can get fixed. In the meantime I would be wary of this file system if it's the only backup copy. (Actually I feel that way no matter the file system.) I'd make sure btrfs check with progs 4.7.3 or 4.8.1 comes up clean (i.e. "err is 0" is generally a good sign), and that a scrub also comes up clean with no errors: either 'btrfs scrub start ' and then later check with 'btrfs scrub status', or use the -BR flags to not background and to show stats after completion. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
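[As a concrete sketch of the scrub invocations mentioned above; the mount point /mnt is a placeholder for wherever the filesystem is mounted.]

```shell
# Background scrub, then poll for progress and error counts:
btrfs scrub start /mnt
btrfs scrub status /mnt

# Or run in the foreground (-B) and print raw stats on completion (-R):
btrfs scrub start -BR /mnt
```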
Re: Help repairing a partition
Hi Chris, Thanks for your response :). On 21 October 2016 at 05:18, Chris Murphy <li...@colorremedies.com> wrote: > On Thu, Oct 20, 2016 at 3:20 PM, Suvayu Ali <fatkasuvayu+li...@gmail.com> > wrote: >> Hi, >> >> (please CC me in replies, I'm not subscribed) >> >> I'm using kernel 4.7.7-100.fc23 with btrfs-progs v4.7.1. >> >> I had my /home, /var, and /opt as subvolumes in a btrfs partition. >> Last night btrfs failed, and I was unable to mount it normally >> (leading to boot failures). The journal had messages like this: >> >> BTRFS: open_ctree failed >> BTRFS error: super_total_bytes ... mismatch with fs_devices total_rw_bytes >> BTRFS error: failed to read chunk tree: -22 >> >> Finally I managed to mount it manually like this (after making a dd >> image of the partition): >> >> # mount -t btrfs -o ro,recovery,nospace_cache /dev/sdb2 /mnt >> >> and managed to recover my data. Initially "btrfs check" yielded a few >> >> parent transid verify failed on 101679726592 wanted 822619 found 822617 >> >> and >> >> checksum verify failed on 101756387328 found 78C8A0BC wanted B7C59D79 >> >> however after backing up my data, I mounted without the "-o ro" (I got >> a transid related message, but it did mount). "btrfs check" now spits >> out a whole bunch of: >> >> Incorrect local backref count on 202118008832 root 259 owner 178928 >> offset 41181184 found 2 wanted 7 back 0x55713fbbf150 >> Incorrect global backref count on 202118008832 found 2 wanted 7 >> backpointer mismatch on [202118008832 376832] > > > This is a known problem with btrfs-progs 4.7.1 it should not be used. > https://btrfs.wiki.kernel.org/index.php/Changelog#btrfs-progs_4.7.1_.28Aug_2016.29 > > Upgrade to 4.7.3 or 4.8.1 is advised. I had upgraded to 4.7.3 to test this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1372910 It hadn't helped, but I didn't have time to debug it any further. Since the Fedora 23 repos have 4.4.1, I guess downgrading is easier for me. 
Thanks for the pointer to the changelog; under 4.7.2 it mentions not to repair with 4.7.1, so I'll try `btrfs check --repair` after the downgrade. >> followed by this summary: >> >> checking csums >> checking root refs >> checking quota groups >> Counts for qgroup id: 0/257 are different >> our:referenced 7746465792 referenced compressed 7746465792 >> disk: referenced 7746461696 referenced compressed 7746461696 >> diff: referenced 4096 referenced compressed 4096 >> our:exclusive 7746465792 exclusive compressed 7746465792 >> disk: exclusive 7746461696 exclusive compressed 7746461696 >> diff: exclusive 4096 exclusive compressed 4096 >> Counts for qgroup id: 0/259 are different >> our:referenced 135641784320 referenced compressed 135641784320 >> disk: referenced 135633862656 referenced compressed 135633862656 >> diff: referenced 7921664 referenced compressed 7921664 >> our:exclusive 135641784320 exclusive compressed 135641784320 >> disk: exclusive 135633862656 exclusive compressed 135633862656 >> diff: exclusive 7921664 exclusive compressed 7921664 >> found 167864082432 bytes used err is 0 >> total csum bytes: 161187492 >> total tree bytes: 2021015552 >> total fs tree bytes: 1725759488 >> total extent tree bytes: 86228992 >> btree space waste bytes: 386160897 >> file data blocks allocated: 1269363683328 >> referenced 164438126592 >> >> How do I repair this? > > Yeah good question. I can't tell from the message whether different > counts is a bad thing, or if it's just a notification, or what. Yet > again btrfs-progs does not help the user make informed decisions, it's > really frustrating. I think that part can be ignored though for now, > and see if btrfs check --repair can fix the problem now that you have > a backup. Indeed, I have never been this confused about a file system before. I tried repairing after the downgrade to 4.4.1, it says "Couldn't open file system"! Mounting now works without errors, I can also r/w files as normal; go figure! 
Cheers, -- Suvayu Open source is the future. It sets us free.
Re: Help repairing a partition
On Thu, Oct 20, 2016 at 3:20 PM, Suvayu Ali <fatkasuvayu+li...@gmail.com> wrote: > Hi, > > (please CC me in replies, I'm not subscribed) > > I'm using kernel 4.7.7-100.fc23 with btrfs-progs v4.7.1. > > I had my /home, /var, and /opt as subvolumes in a btrfs partition. > Last night btrfs failed, and I was unable to mount it normally > (leading to boot failures). The journal had messages like this: > > BTRFS: open_ctree failed > BTRFS error: super_total_bytes ... mismatch with fs_devices total_rw_bytes > BTRFS error: failed to read chunk tree: -22 > > Finally I managed to mount it manually like this (after making a dd > image of the partition): > > # mount -t btrfs -o ro,recovery,nospace_cache /dev/sdb2 /mnt > > and managed to recover my data. Initially "btrfs check" yielded a few > > parent transid verify failed on 101679726592 wanted 822619 found 822617 > > and > > checksum verify failed on 101756387328 found 78C8A0BC wanted B7C59D79 > > however after backing up my data, I mounted without the "-o ro" (I got > a transid related message, but it did mount). "btrfs check" now spits > out a whole bunch of: > > Incorrect local backref count on 202118008832 root 259 owner 178928 > offset 41181184 found 2 wanted 7 back 0x55713fbbf150 > Incorrect global backref count on 202118008832 found 2 wanted 7 > backpointer mismatch on [202118008832 376832] This is a known problem with btrfs-progs 4.7.1; it should not be used. https://btrfs.wiki.kernel.org/index.php/Changelog#btrfs-progs_4.7.1_.28Aug_2016.29 Upgrading to 4.7.3 or 4.8.1 is advised. 
> > followed by this summary: > > checking csums > checking root refs > checking quota groups > Counts for qgroup id: 0/257 are different > our:referenced 7746465792 referenced compressed 7746465792 > disk: referenced 7746461696 referenced compressed 7746461696 > diff: referenced 4096 referenced compressed 4096 > our:exclusive 7746465792 exclusive compressed 7746465792 > disk: exclusive 7746461696 exclusive compressed 7746461696 > diff: exclusive 4096 exclusive compressed 4096 > Counts for qgroup id: 0/259 are different > our:referenced 135641784320 referenced compressed 135641784320 > disk: referenced 135633862656 referenced compressed 135633862656 > diff: referenced 7921664 referenced compressed 7921664 > our:exclusive 135641784320 exclusive compressed 135641784320 > disk: exclusive 135633862656 exclusive compressed 135633862656 > diff: exclusive 7921664 exclusive compressed 7921664 > found 167864082432 bytes used err is 0 > total csum bytes: 161187492 > total tree bytes: 2021015552 > total fs tree bytes: 1725759488 > total extent tree bytes: 86228992 > btree space waste bytes: 386160897 > file data blocks allocated: 1269363683328 > referenced 164438126592 > > How do I repair this? Yeah good question. I can't tell from the message whether different counts is a bad thing, or if it's just a notification, or what. Yet again btrfs-progs does not help the user make informed decisions, it's really frustrating. I think that part can be ignored though for now, and see if btrfs check --repair can fix the problem now that you have a backup. -- Chris Murphy
Help repairing a partition
Hi, (please CC me in replies, I'm not subscribed) I'm using kernel 4.7.7-100.fc23 with btrfs-progs v4.7.1. I had my /home, /var, and /opt as subvolumes in a btrfs partition. Last night btrfs failed, and I was unable to mount it normally (leading to boot failures). The journal had messages like this: BTRFS: open_ctree failed BTRFS error: super_total_bytes ... mismatch with fs_devices total_rw_bytes BTRFS error: failed to read chunk tree: -22 Finally I managed to mount it manually like this (after making a dd image of the partition): # mount -t btrfs -o ro,recovery,nospace_cache /dev/sdb2 /mnt and managed to recover my data. Initially "btrfs check" yielded a few parent transid verify failed on 101679726592 wanted 822619 found 822617 and checksum verify failed on 101756387328 found 78C8A0BC wanted B7C59D79 however after backing up my data, I mounted without the "-o ro" (I got a transid related message, but it did mount). "btrfs check" now spits out a whole bunch of: Incorrect local backref count on 202118008832 root 259 owner 178928 offset 41181184 found 2 wanted 7 back 0x55713fbbf150 Incorrect global backref count on 202118008832 found 2 wanted 7 backpointer mismatch on [202118008832 376832] followed by this summary: checking csums checking root refs checking quota groups Counts for qgroup id: 0/257 are different our:referenced 7746465792 referenced compressed 7746465792 disk: referenced 7746461696 referenced compressed 7746461696 diff: referenced 4096 referenced compressed 4096 our:exclusive 7746465792 exclusive compressed 7746465792 disk: exclusive 7746461696 exclusive compressed 7746461696 diff: exclusive 4096 exclusive compressed 4096 Counts for qgroup id: 0/259 are different our:referenced 135641784320 referenced compressed 135641784320 disk: referenced 135633862656 referenced compressed 135633862656 diff: referenced 7921664 referenced compressed 7921664 our:exclusive 135641784320 exclusive compressed 135641784320 disk: exclusive 135633862656 exclusive 
compressed 135633862656 diff: exclusive 7921664 exclusive compressed 7921664 found 167864082432 bytes used err is 0 total csum bytes: 161187492 total tree bytes: 2021015552 total fs tree bytes: 1725759488 total extent tree bytes: 86228992 btree space waste bytes: 386160897 file data blocks allocated: 1269363683328 referenced 164438126592 How do I repair this? Any thoughts and guidance would be greatly appreciated. I am not well versed with all the btrfs commands and utilities, so I hope I have managed to provide all the right information. Thanks, PS: I see that it now mounts normally as well! As in, with default fstab options, so I guess I can boot. I would still like to repair the errors. -- Suvayu Open source is the future. It sets us free.
Re: Some help with the code.
On Tue, Sep 06, 2016 at 04:22:25PM +0100, Tomasz Kusmierz wrote: > This is predominantly for maintainers: > > I've noticed that there is a lot of code for btrfs ... and after few > glimpses I've noticed that there are occurrences which beg for some > refactoring to make it less of a pain to maintain. > > I'm speaking of occurrences where: > - within a function there are multiple checks for null pointer and > then whenever there is anything hanging on the end of that pointer to > finally call the function, pass the pointer to it and watch it perform > same checks to finally deallocate stuff on the end of a pointer. Can you please point me to an example? If it's a bad pattern it would be worth cleaning up. > - single line functions ... called only in two places That might not always be useless, as the function name tells us what it does, not how, so it's a form of self-documenting code. If the function body is some common code construct, it would be harder to grep for it. But I understand what you mean. This could also be a leftover from some broader changes that removed calls and reduced the function size to one line. > and so on. > > I know that you guys are busy, but maintaining code that is only > growing must be a pain. Depends. Standalone features bring a lot of new code, but it's separated. A random sample of patches from recent releases tells me that net line growth is spread across many patches that add just a few lines (e.g. enhanced tests, more helpers). https://btrfs.wiki.kernel.org/index.php/Contributors#Statistics Doing broader cleanups is good when done from time to time, as it tends to interfere with other patches, so it's more a matter of scheduling when to do it. The beginning or end of a particular development cycle are good candidates. Reducing size should be done in a way that does not make the code less readable, which is a somewhat subjective metric but should be sorted out when patches (or samples) are posted. 
That said, cleanups and refactoring patches are welcome.
Some help with the code.
This is predominantly for maintainers: I've noticed that there is a lot of code for btrfs ... and after few glimpses I've noticed that there are occurrences which beg for some refactoring to make it less of a pain to maintain. I'm speaking of occurrences where: - within a function there are multiple checks for null pointer and then whenever there is anything hanging on the end of that pointer to finally call the function, pass the pointer to it and watch it perform same checks to finally deallocate stuff on the end of a pointer. - single line functions ... called only in two places and so on. I know that you guys are busy, but maintaining code that is only growing must be a pain.
[PATCH 12/13] btrfs-progs: mkfs: help and usage now go to stdout
Signed-off-by: David Sterba
---
 mkfs.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index f063323903dc..ef0b099a58d7 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -344,31 +344,31 @@ static int create_data_reloc_tree(struct btrfs_trans_handle *trans,
 
 static void print_usage(int ret)
 {
-	fprintf(stderr, "usage: mkfs.btrfs [options] dev [ dev ... ]\n");
-	fprintf(stderr, "options:\n");
-	fprintf(stderr, "\t-A|--alloc-start START  the offset to start the FS\n");
-	fprintf(stderr, "\t-b|--byte-count SIZE    total number of bytes in the FS\n");
-	fprintf(stderr, "\t-d|--data PROFILE       data profile, raid0, raid1, raid5, raid6, raid10, dup or single\n");
-	fprintf(stderr, "\t-f|--force              force overwrite of existing filesystem\n");
-	fprintf(stderr, "\t-l|--leafsize SIZE      deprecated, alias for nodesize\n");
-	fprintf(stderr, "\t-L|--label LABEL        set a label\n");
-	fprintf(stderr, "\t-m|--metadata PROFILE   metadata profile, values like data profile\n");
-	fprintf(stderr, "\t-M|--mixed              mix metadata and data together\n");
-	fprintf(stderr, "\t-n|--nodesize SIZE      size of btree nodes\n");
-	fprintf(stderr, "\t-s|--sectorsize SIZE    min block allocation (may not mountable by current kernel)\n");
-	fprintf(stderr, "\t-r|--rootdir DIR        the source directory\n");
-	fprintf(stderr, "\t-K|--nodiscard          do not perform whole device TRIM\n");
-	fprintf(stderr, "\t-O|--features LIST      comma separated list of filesystem features, use '-O list-all' to list features\n");
-	fprintf(stderr, "\t-U|--uuid UUID          specify the filesystem UUID\n");
-	fprintf(stderr, "\t-q|--quiet              no messages except errors\n");
-	fprintf(stderr, "\t-V|--version            print the mkfs.btrfs version and exit\n");
+	printf("usage: mkfs.btrfs [options] dev [ dev ... ]\n");
+	printf("options:\n");
+	printf("\t-A|--alloc-start START  the offset to start the FS\n");
+	printf("\t-b|--byte-count SIZE    total number of bytes in the FS\n");
+	printf("\t-d|--data PROFILE       data profile, raid0, raid1, raid5, raid6, raid10, dup or single\n");
+	printf("\t-f|--force              force overwrite of existing filesystem\n");
+	printf("\t-l|--leafsize SIZE      deprecated, alias for nodesize\n");
+	printf("\t-L|--label LABEL        set a label\n");
+	printf("\t-m|--metadata PROFILE   metadata profile, values like data profile\n");
+	printf("\t-M|--mixed              mix metadata and data together\n");
+	printf("\t-n|--nodesize SIZE      size of btree nodes\n");
+	printf("\t-s|--sectorsize SIZE    min block allocation (may not mountable by current kernel)\n");
+	printf("\t-r|--rootdir DIR        the source directory\n");
+	printf("\t-K|--nodiscard          do not perform whole device TRIM\n");
+	printf("\t-O|--features LIST      comma separated list of filesystem features, use '-O list-all' to list features\n");
+	printf("\t-U|--uuid UUID          specify the filesystem UUID\n");
+	printf("\t-q|--quiet              no messages except errors\n");
+	printf("\t-V|--version            print the mkfs.btrfs version and exit\n");
 	exit(ret);
 }
 
 static void print_version(void) __attribute__((noreturn));
 static void print_version(void)
 {
-	fprintf(stderr, "mkfs.btrfs, part of %s\n", PACKAGE_STRING);
+	printf("mkfs.btrfs, part of %s\n", PACKAGE_STRING);
 	exit(0);
 }
-- 
2.7.1
Re: Pointers to mirroring partitions (w/ encryption?) help?
04.06.2016 20:31, B. S. wrote: >>> >>> Yeah, when it comes to FDE, you either have to make your peace with >>> trusting the manufacturer, or you can't. If you are going to boot >>> your system with a traditional boot loader, an unencrypted partition >>> is mandatory. >> >> No, it is not with grub2 that supports LUKS (and geli in *BSD world). Of >> course initial grub image must be written outside of encrypted area and >> readable by firmware. > > Good to know. Do you have a link to a how to on such? > As long as you use grub-install and grub-mkconfig, this "just works", in the sense that both detect the encrypted container and add the necessary drivers and other steps to access it. The only manual setup is to add GRUB_ENABLE_CRYPTODISK=y to /etc/default/grub. You will need to enter the LUKS password twice - once in GRUB, once in the kernel (there is no interface for passing the passphrase from the bootloader to the Linux kernel). Some suggest including the passphrase in the initrd (on the assumption that it is encrypted already anyway); there are also patches to support the use of an external keyfile in grub.
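[The steps above, sketched as commands. The device path and grub.cfg location are placeholders; distros vary in where grub.cfg lives and whether the tools are named grub2-install/grub2-mkconfig.]

```shell
# The one manual step: enable cryptodisk support.
echo 'GRUB_ENABLE_CRYPTODISK=y' >> /etc/default/grub

# grub-install and grub-mkconfig then detect the LUKS container
# and pull in the needed crypto modules automatically.
grub-install /dev/sda
grub-mkconfig -o /boot/grub/grub.cfg
```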
Re: Pointers to mirroring partitions (w/ encryption?) help?
04.06.2016 22:05, Chris Murphy wrote: ... >> >> Yeah, when it comes to FDE, you either have to make your peace with >> trusting the manufacturer, or you can't. If you are going to boot your >> system with a traditional boot loader, an unencrypted partition is >> mandatory. > > /boot can be encrypted, GRUB supports this, but I'm unaware of any > installer that does. openSUSE supports installation on LUKS encrypted /boot. The installer has some historical limitations regarding how the encrypted container can be set up, but the bootloader part should be OK (including secure boot support). > The ESP can't be encrypted. > It should be possible if you use hardware encryption (SED). > http://dustymabe.com/2015/07/06/encrypting-more-boot-joins-the-party/ > > It's vaguely possible for the SED variety of drive to support fully > encrypted everything, including the ESP. The problem is we don't have > OPAL support on Linux at all anywhere. And for some inexplicable > reason, the TCG hasn't commissioned a free UEFI application for > managing the keys and unlocking the drive in the preboot environment. > For now, it seems, such support has to already be in the firmware. >
Re: Pointers to mirroring partitions (w/ encryption?) help?
On Fri, Jun 3, 2016 at 7:39 PM, Justin Brown wrote: > Here's some thoughts: > >> Assume a CD sized (680MB) /boot > > Some distros carry patches for grub that allow booting from Btrfs Upstream GRUB has had Btrfs support for a long time. There's been no need for distros to carry separate patches for years. The exception is openSUSE, where they have a healthy set of patches for supporting the discovery of and boot of read only snapshots created by snapper. Those patches are not merged upstream; I'm not sure if they will be. >, so > no separate /boot file system is required. (Fedora does not; Ubuntu -- > and therefore probably all Debians -- does.) The problem on Fedora is that they depend on grubby to modify the grub.cfg. And grubby gets confused when the kernel/initramfs are located on a Btrfs subvolume other than the top level. And Fedora's installer only installs the system onto a subvolume (specifically, every mount point defined in the installer becomes a subvolume if you use Btrfs). So it's stuck being unable to support /boot if it's on Btrfs. > >> perhaps a 200MB (?) sized EFI partition > > Way bigger than necessary. It should only be 1-2MiB, and IIRC 2MiB > might be the max UEFI allows. You're confusing the ESP with BIOSBoot. The minimum size for 512 byte sector drives per Microsoft's technotes is 100MiB. Most OEMs use something between 100MiB and 300MiB. Apple creates a 200MB ESP even though they don't use it for booting, rather just to stage firmware updates. The UEFI spec itself doesn't say how big the ESP should be. 200MiB is sane for 512 byte sector drives. It needs to be 260MiB minimum for 4Kn drives, because the minimum number of FAT32 allocation units at 4096 bytes each requires a 260MiB minimum volume. > >> The additional problem is most articles reference FDE (Full Disk Encryption) >> - but that doesn't seem to be prudent. e.g. Unencrypted /boot. So having >> problems finding concise links on the topics, -FDE -"Full Disk Encryption". 
> > Yeah, when it comes to FDE, you either have to make your peace with > trusting the manufacturer, or you can't. If you are going to boot your > system with a traditional boot loader, an unencrypted partition is > mandatory. /boot can be encrypted, GRUB supports this, but I'm unaware of any installer that does. The ESP can't be encrypted. http://dustymabe.com/2015/07/06/encrypting-more-boot-joins-the-party/ It's vaguely possible for the SED variety of drive to support fully encrypted everything, including the ESP. The problem is we don't have OPAL support on Linux at all anywhere. And for some inexplicable reason, the TCG hasn't commissioned a free UEFI application for managing the keys and unlocking the drive in the preboot environment. For now, it seems, such support has to already be in the firmware. -- Chris Murphy
Re: Pointers to mirroring partitions (w/ encryption?) help?
On 06/04/2016 03:46 AM, Andrei Borzenkov wrote: 04.06.2016 04:39, Justin Brown wrote: Here's some thoughts: Assume a CD sized (680MB) /boot Some distros carry patches for grub that allow booting from Btrfs, so no separate /boot file system is required. (Fedora does not; Ubuntu -- and therefore probably all Debians -- does.) Which grub (or which Fedora) do you mean? btrfs support is upstream since 2010. There are restrictions, in particular RAID levels support (RAID5/6 are not implemented). Good to know / be reminded of (such specifics) - thanks. perhaps a 200MB (?) sized EFI partition Way bigger than necessary. It should only be 1-2MiB, and IIRC 2MiB might be the max UEFI allows. You may want to review recent discussion on systemd regarding systemd boot (a.k.a. gummiboot) which wants to have ESP mounted as /boot. UEFI mandates support for FAT32 on ESP so max size should be whatever max size FAT32 has. ... The additional problem is most articles reference FDE (Full Disk Encryption) - but that doesn't seem to be prudent. e.g. Unencrypted /boot. So having problems finding concise links on the topics, -FDE -"Full Disk Encryption". Yeah, when it comes to FDE, you either have to make your peace with trusting the manufacturer, or you can't. If you are going to boot your system with a traditional boot loader, an unencrypted partition is mandatory. No, it is not with grub2 that supports LUKS (and geli in *BSD world). Of course initial grub image must be written outside of encrypted area and readable by firmware. Good to know. Do you have a link to a how to on such? That being said, we live in a world with UEFI Secure Boot. While your EFI parition must be unencrypted vfat, you can sign the kernels (or shims), and the UEFI can be configured to only boot signed executables, including only those signed by your own key. Some distros already provide this feature, including using keys probably already trusted by the default keystore. 
UEFI Secure Boot is rather orthogonal to the question of disk encryption. Perhaps, but not orthogonal to the OP question. In the end, the OP is about all this 'stuff' landing at once, the majority btrfs centric, and a call for help finding the end of the string to pull on in a linear way. e.g., as pointed out, most articles premise FDE, which is not in play per the OP. The OP is requesting pointers to good, concise how-to links.
Re: Pointers to mirroring partitions (w/ encryption?) help?
04.06.2016 04:39, Justin Brown wrote: > Here's some thoughts: > >> Assume a CD sized (680MB) /boot > > Some distros carry patches for grub that allow booting from Btrfs, > so no separate /boot file system is required. (Fedora does not; > Ubuntu -- and therefore probably all Debians -- does.) > Which grub (or which Fedora) do you mean? btrfs support is upstream since 2010. There are restrictions, in particular RAID levels support (RAID5/6 are not implemented). >> perhaps a 200MB (?) sized EFI partition > > Way bigger than necessary. It should only be 1-2MiB, and IIRC 2MiB > might be the max UEFI allows. > You may want to review recent discussion on systemd regarding systemd boot (a.k.a. gummiboot) which wants to have ESP mounted as /boot. UEFI mandates support for FAT32 on ESP so max size should be whatever max size FAT32 has. ... > >> The additional problem is most articles reference FDE (Full Disk >> Encryption) - but that doesn't seem to be prudent. e.g. Unencrypted >> /boot. So having problems finding concise links on the topics, -FDE >> -"Full Disk Encryption". > > Yeah, when it comes to FDE, you either have to make your peace with > trusting the manufacturer, or you can't. If you are going to boot > your system with a traditional boot loader, an unencrypted partition > is mandatory. No, it is not with grub2 that supports LUKS (and geli in *BSD world). Of course initial grub image must be written outside of encrypted area and readable by firmware. > That being said, we live in a world with UEFI Secure > Boot. While your EFI parition must be unencrypted vfat, you can sign > the kernels (or shims), and the UEFI can be configured to only boot > signed executables, including only those signed by your own key. Some > distros already provide this feature, including using keys probably > already trusted by the default keystore. > UEFI Secure Boot is rather orthogonal to the question of disk encryption. 
Re: Pointers to mirroring partitions (w/ encryption?) help?
r, an unencrypted partition is mandatory. That being said, we live in a world with UEFI Secure Boot. Another learning curve (UEFI) to swallow at the same time as all the other here. Current install is the first time it has occurred to me to try to incorporate SecureBoot, UEFI, crypt, and all such 'goodness' on a fresh (raw) install. Debian is bringing apt-secure along for the ride on me, too. While your EFI parition must be unencrypted vfat, you can sign the kernels (or shims), and the UEFI can be configured to only boot signed executables, including only those signed by your own key. Some distros already provide this feature, including using keys probably already trusted by the default keystore. mirror subvolumes (or it inherently comes along for the ride?) Yes, that is correct. Just to give some more background: the data and metadata profiles control "mirroring," and they are set at the file system level. Subvolumes live entirely within one file system, so whatever profile is set in the FS applies to subvolumes. Gotcha, thus your dup observation. However ... the question was aimed at a crypto sda3, thus containing @, and probably @home, sda4 created ... how might one kick in to (btrfs) mirror sda3 in sda4, including @ and @home. I would guess, from your comment, once one adds sda4 to the sda3 set, all that (sda3) profile applies, gets applied to sda4, and all the btrfs magic goodness ... just happens. Particularly after running balance to force all that goodness to happen at once / now, rather than upon next write. So, I could take an HD, create partitions as above (how? e.g. Set up encryption / btrfs mirror volumes), then clonezilla (?) partitions from a current machine in. Are you currently using Btrfs? If so, use Btrfs' `send` and `receive` commands. Yeah. Ick. :-) Have had better luck in the past just cloning or mounting and cp -a. Likely, my lack of experience was the issue. In any case, here, the question was pointed at a new install. 
> That should be a lot friendlier to your SSD. (I'll take this opportunity to say that you need to consider the `discard` mount *and* `/etc/crypttab` options. Discard -- or scheduling `fstrim` -- is extremely important to maintain optimal performance of a SSD, but there are some privacy trade-offs on encrypted systems.) If not, then `cp -a` or similar will work. SSD not yet in play here, but I do take your point. I had to work through all that on the SSD I do have, so I do know to peek at such whenever an SSD comes into play. Didn't know about the /etc/crypttab options, thanks for that. Heck, hadn't gotten as far as knowing there was an /etc/crypttab. - thus, I think part of my OP question is what all am I attempting to swallow in one go on a fresh install here? I get dmcrypt and uefi is involved, so I can start to break down the googling into component pieces. Thus, I think, the request for an appropriate link. Something that does it all on a fresh install in one go would be good, particularly if it identifies the major sub-topics, and has 'links to more info'. Obviously, you'll have to get your boot mechanism and file system identifiers updated in addition to `/etc/crypttab` described above. Lastly, strongly consider `autodefrag` and possibly setting some highly volatile -- but *unimportant* -- directories to `nodatacow` via purging and `chattr +C`. (I do this for ~/.cache and /var/cache.) Yep, autodefrag is in the mount options. I have a number of home systems running btrfs for some years now. Started with Kubuntu 12.04 LTS (since running hwe kernels to get later btrfs tools), and a couple of 14.04's. GB rsyncs and mondoarchives fly all about the house in cascading archives, nightly. A recent 4TB HD failure is part of the reason for the OP questions. A scrub at the time revealed many failures, and dealing with that and figuring out which files to fetch from secondary archives was a challenge. 
BUT, FANTASTICALLY, for the first time (pre-btrfs days), at least btrfs / something specifically identified WHICH files were botched. I wasn't left wondering what botched file will reveal itself months from now ... after the botched file had cascaded to all backups! Having been bitten, and facing a new install, thought I'd better OP. Yet not looking to put in a 2nd HD If you change your mind and decide on a backup device, or even if you just want local backup snapshots, one of the best snapshot managers is btrfs-sxbackup (no association with the FS project). Thank you for that! Thus far, keeping only the OS on / and mondoarchiving it nightly, and rsync'ing /everythingelse seems to be doing the job. Perhaps even keeping the 'after the failure' complexity level down. On Fri, Jun 3, 2016 at 3:30 PM, B. S. <bs27...@gmail.com> wrote: Hallo. I'm continuing on sinking in to btrfs, so pointers to concise help articles appreciated. I've got a couple new home systems, so perhaps it's time
Re: Pointers to mirroring partitions (w/ encryption?) help?
Here are some thoughts: > Assume a CD sized (680MB) /boot Some distros carry patches for grub that allow booting from Btrfs, so no separate /boot file system is required. (Fedora does not; Ubuntu -- and therefore probably all Debians -- does.) > perhaps a 200MB (?) sized EFI partition Way bigger than necessary. It should only be 1-2MiB, and IIRC 2MiB might be the max UEFI allows. > then creates another partition for mirroring, later. IIUC, btrfs add device > /dev/sda4 / is appropriate, then. Then running a balance seems recommended. Don't do this. It's not going to provide any additional protection that you can't get in a smarter way. If you only have one device and want data duplication, just use the `dup` data profile (settable via `balance`). In fact, by default Btrfs uses the `dup` profile for metadata (and `single` for data). You'll get all the data integrity benefits with `dup`. One of the best features and initially confusing things about Btrfs is how much is done "within" a file system. (There is a certain "the Btrfs way" to it.) > Confusing, however, is having those (both) partitions encrypted. Seems some > work is needed beforehand. But I've never done encryption. (This is moot if you go with `dup`.) It's actually quite easy with every major distro. If we're talking about a fresh install, the distro installer probably has full support for passphrase-based dm-crypt LUKS encryption, including multiple volumes sharing a passphrase. An existing install should be convertible without much trouble. It's usually just a matter of setting up the container with `cryptsetup`, populating `/etc/crypttab`, possibly adding crypto modules to your initrd and/or updating settings, and rebuilding the initrd. (I have first-hand experience doing this on a Fedora install recently; it took about half an hour, and I knew nothing about Fedora's `dracut` initrd generator tool.)
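A rough sketch of the profile conversion via `balance` suggested above. One caveat: converting *data* (not just metadata) to `dup` on a single device requires a sufficiently recent kernel and btrfs-progs, so treat this as illustrative rather than guaranteed for the versions in this thread. `run` only prints each command.

```shell
#!/bin/sh
# Dry-run sketch: `run` prints commands instead of executing them.
run() { printf '+ %s\n' "$*"; }

# Metadata is usually dup already on a single rotational device;
# -dconvert=dup duplicates data extents as well.
run btrfs balance start -mconvert=dup -dconvert=dup /
# Verify the resulting profiles afterwards:
run btrfs filesystem df /
```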
If you do need multiple encrypted file systems, simply use the same passphrase for all volumes (but never do this by cloning the LUKS headers). You'll only need to enter it once at boot. > The additional problem is most articles reference FDE (Full Disk Encryption) > - but that doesn't seem to be prudent. e.g. Unencrypted /boot. So having > problems finding concise links on the topics, -FDE -"Full Disk Encryption". Yeah, when it comes to FDE, you either have to make your peace with trusting the manufacturer, or you can't. If you are going to boot your system with a traditional boot loader, an unencrypted partition is mandatory. That being said, we live in a world with UEFI Secure Boot. While your EFI partition must be unencrypted vfat, you can sign the kernels (or shims), and the UEFI can be configured to only boot signed executables, including only those signed by your own key. Some distros already provide this feature, including using keys probably already trusted by the default keystore. > mirror subvolumes (or it inherently comes along for the ride?) Yes, that is correct. Just to give some more background: the data and metadata profiles control "mirroring," and they are set at the file system level. Subvolumes live entirely within one file system, so whatever profile is set in the FS applies to subvolumes. > So, I could take an HD, create partitions as above (how? e.g. Set up > encryption / btrfs mirror volumes), then clonezilla (?) partitions from a > current machine in. Are you currently using Btrfs? If so, use Btrfs' `send` and `receive` commands. That should be a lot friendlier to your SSD. (I'll take this opportunity to say that you need to consider the `discard` mount *and* `/etc/crypttab` options. Discard -- or scheduling `fstrim` -- is extremely important to maintain optimal performance of a SSD, but there are some privacy trade-offs on encrypted systems.) If not, then `cp -a` or similar will work.
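The cryptsetup/crypttab/initrd conversion flow described earlier in this message can be sketched roughly as follows. Device and mapper names are hypothetical, and note that luksFormat destroys the partition contents, so existing data must be backed up first and restored into the opened container (or use cryptsetup-reencrypt for in-place conversion). `run` only prints each command.

```shell
#!/bin/sh
# Dry-run sketch: `run` prints commands instead of executing them.
# DESTRUCTIVE if run for real: luksFormat wipes the partition.
run() { printf '+ %s\n' "$*"; }

run cryptsetup luksFormat /dev/sda3
run cryptsetup open /dev/sda3 cryptroot
run mkfs.btrfs /dev/mapper/cryptroot
# Hypothetical /etc/crypttab entry (name  device  keyfile  options):
printf 'cryptroot  /dev/sda3  none  luks\n'
# Rebuild the initrd so it can unlock at boot
# (dracut -f on Fedora, update-initramfs -u on Debian/Ubuntu):
run dracut -f
```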
Obviously, you'll have to get your boot mechanism and file system identifiers updated in addition to `/etc/crypttab` described above. Lastly, strongly consider `autodefrag` and possibly setting some highly volatile -- but *unimportant* -- directories to `nodatacow` via purging and `chattr +C`. (I do this for ~/.cache and /var/cache.) > Yet not looking to put in a 2nd HD If you change your mind and decide on a backup device, or even if you just want local backup snapshots, one of the best snapshot managers is btrfs-sxbackup (no association with the FS project). On Fri, Jun 3, 2016 at 3:30 PM, B. S. <bs27...@gmail.com> wrote: > Hallo. I'm continuing on sinking in to btrfs, so pointers to concise help > articles appreciated. I've got a couple new home systems, so perhaps it's > time to investigate encryption, and given the bit rot I've seen here, > perhaps time to mirror volumes so the wonderful btrfs self-healing > facilities can be taken advantage of. > > Problem with today's hard drives, a quick look at C
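The purge-then-`chattr +C` dance mentioned above deserves spelling out, because +C only takes effect on files created after the flag is set on the directory. A dry-run sketch using the /var/cache example from the message (`run` only prints each command):

```shell
#!/bin/sh
# Dry-run sketch: `run` prints commands instead of executing them.
run() { printf '+ %s\n' "$*"; }

run mv /var/cache /var/cache.old
run mkdir /var/cache
run chattr +C /var/cache           # files created here are now nodatacow
run cp -a /var/cache.old/. /var/cache/
run rm -rf /var/cache.old
```

The same sequence applies to ~/.cache; nodatacow also disables checksumming and compression for those files, which is why it should only be used on unimportant data.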
Pointers to mirroring partitions (w/ encryption?) help?
Hallo. I'm continuing on sinking in to btrfs, so pointers to concise help articles appreciated. I've got a couple new home systems, so perhaps it's time to investigate encryption, and given the bit rot I've seen here, perhaps time to mirror volumes so the wonderful btrfs self-healing facilities can be taken advantage of. Problem with today's hard drives, a quick look at Canada Computer shows the smallest drives 500GB, 120GB SSDs, far more than the 20GB or so an OS needs. Yet not looking to put in a 2nd HD, either. It feels like mirroring volumes makes sense. (EFI [partitions] also seem to be sticking their fingers in here.) Assume a CD sized (680MB) /boot, and perhaps a 200MB (?) sized EFI partition, it seems to me one sets up / as usual (less complex install), then creates another partition for mirroring, later. IIUC, btrfs add device /dev/sda4 / is appropriate, then. Then running a balance seems recommended. Confusing, however, is having those (both) partitions encrypted. Seems some work is needed beforehand. But I've never done encryption. I have come across https://github.com/gebi/keyctl_keyscript, so I understand there will be gotchas to deal with - later. But not there yet, and not real sure how to start. The additional problem is most articles reference FDE (Full Disk Encryption) - but that doesn't seem to be prudent. e.g. Unencrypted /boot. So having problems finding concise links on the topics, -FDE -"Full Disk Encryption". Any good links to concise instructions on building / establishing encrypted btrfs mirror volumes? dm_crypt seems to be the basis, and not looking to add LVM, seems an unnecessary extra layer of complexity. It also feels like I could mkfs.btrfs /dev/sda3 /dev/sda4, then mirror subvolumes (or it inherently comes along for the ride?) - so my confusion level increases. Especially if encryption is added to the mix. So, I could take an HD, create partitions as above (how? e.g. Set up encryption / btrfs mirror volumes), then clonezilla (?)
partitions from a current machine in. I assume mounting a live cd then cp -a from old disk partition to new disk partition won't 'just work'. (?) Article suggestions? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
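Assembling the pieces the OP asks about, here is a hedged, dry-run sketch of a two-partition encrypted btrfs raid1. Partition numbers follow the message's sda3/sda4 example; `run` only prints each command, and luksFormat would wipe the partitions if executed for real.

```shell
#!/bin/sh
# Dry-run sketch: `run` prints commands instead of executing them.
run() { printf '+ %s\n' "$*"; }

run cryptsetup luksFormat /dev/sda3
run cryptsetup luksFormat /dev/sda4
run cryptsetup open /dev/sda3 crypt3
run cryptsetup open /dev/sda4 crypt4
# raid1 for both data and metadata across the two mappings; subvolumes
# created later inherit these profiles automatically.
run mkfs.btrfs -d raid1 -m raid1 /dev/mapper/crypt3 /dev/mapper/crypt4
```

Note that with both partitions on one physical disk this guards against bad sectors but not drive failure, which is the reasoning behind the reply recommending the `dup` profile instead.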
Re: Help ! "btrfs check" looping recursive
Hi there, Thanks for your reply Duncan! On 15/04/2016 02:24, Duncan wrote: > Swâmi Petaramesh posted on Thu, 14 Apr 2016 18:56:29 +0200 as excerpted: > >> It seems that i have a "btrfs check" process that’s stuck in an infinite >> recursive loop… > Given the prompt above, you're running from parted-magic, but that > doesn't tell us the btrfs-progs or kernel versions unless we look it up. True, I forgot to specify this. This FS is from a machine that currently runs : 4.5.0-1-ARCH As I had a couple of "dead" files (KDE session files in ~/.config/session) that showed "" for all their attributes and couldn’t be accessed nor deleted, I ran "btrfs check" from a reasonably recent live Parted Magic, which has : - Kernel : 4.3.2 - BTRFS tools : 4.1.2 > So kernel and btrfs-progs version? Also, btrfs filesystem show output > might be useful. Taken from the currently running machine (as in the end I chose to abort the "btrfs check" using ^C) : # btrfs fi sh Label: 'LINUX' uuid: 13c87f57-3a85-4daf-a4bf-ba777407c169 Total devices 1 FS bytes used 268.07GiB devid1 size 334.50GiB used 294.54GiB path /dev/mapper/VGZ-LINUX # btrfs fi df / Data, single: total=289.46GiB, used=264.17GiB System, DUP: total=32.00MiB, used=56.00KiB Metadata, single: total=5.01GiB, used=3.90GiB GlobalReserve, single: total=512.00MiB, used=0.00B # btrfs fi us / Overall: Device size: 334.50GiB Device allocated:294.54GiB Device unallocated: 39.96GiB Device missing: 0.00B Used:268.07GiB Free (estimated): 65.26GiB (min: 45.28GiB) Data ratio: 1.00 Metadata ratio: 1.00 Global reserve: 512.00MiB (used: 0.00B) Data,single: Size:289.46GiB, Used:264.17GiB /dev/mapper/VGZ-LINUX 289.46GiB Metadata,single: Size:5.01GiB, Used:3.90GiB /dev/mapper/VGZ-LINUX 5.01GiB System,DUP: Size:32.00MiB, Used:56.00KiB /dev/mapper/VGZ-LINUX 64.00MiB Unallocated: /dev/mapper/VGZ-LINUX 39.96GiB # df -h / Sys. de fichiers Taille Utilisé Dispo Uti% Monté sur /dev/mapper/VGZ-LINUX 335G269G 66G 81% / > Btrfs-progs version in particular, since the recursive nature of this > loop is very obviously a bug. I hope I gave all the necessary information now. TIA and best regards. ॐ -- Swâmi Petaramesh PGP 9076E32E
Re: Help ! "btrfs check" looping recursive
Swâmi Petaramesh posted on Thu, 14 Apr 2016 18:56:29 +0200 as excerpted: > It seems that i have a "btrfs check" process that’s stuck in an infinite > recursive loop… > > How could I end this without breaking my filesystem ? ... > root@PartedMagic:~# btrfs check --repair /dev/VGZ/LINUX > enabling repair mode [...] [Just a btrfs user and list regular myself, not a btrfs dev and not at a level to specifically answer the question.] Given the prompt above, you're running from parted-magic, but that doesn't tell us the btrfs-progs or kernel versions unless we look it up. So kernel and btrfs-progs version? Also, btrfs filesystem show output might be useful. (Tho in this specific context, kernel version isn't as useful as normal, since unlike many btrfs commands that simply call kernel code to do the real work, check code is all userspace. But it can't hurt to post it. Similarly, btrfs fi df to complement btrfs fi show, or btrfs fi usage to output the same information as both, would in other contexts be useful, but they require a mounted filesystem, not something you can really even try with check running.) Btrfs-progs version in particular, since the recursive nature of this loop is very obviously a bug. If it's a current progs version, the bug may have been recently introduced. If it's a dated version, the bug may have already been fixed. (Either way, it may be that someone else will recognize the bug and tell you to try a later/earlier version, or if not, you very well may prompt a new patch, possibly after some further debugging.) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Help ! "btrfs check" looping recursive
Hi folks, It seems that i have a "btrfs check" process that’s stuck in an infinite recursive loop… How could I end this without breaking my filesystem ? Help much needed & appreciated… TIA. Kind regards. root@PartedMagic:~# btrfs check --repair /dev/VGZ/LINUX enabling repair mode Checking filesystem on /dev/VGZ/LINUX UUID: 13c87f57-3a85-4daf-a4bf-ba777407c169 checking extents Fixed 0 roots. checking free space cache cache and super generation don't match, space cache will be invalidated checking fs roots Deleting bad dir index [2188127,96,2152] root 267 Deleting bad dir index [2188127,96,2155] root 267 Deleting bad dir index [2188127,96,2152] root 40298 Deleting bad dir index [2188127,96,2155] root 40298 Deleting bad dir index [2188127,96,2152] root 40761 Deleting bad dir index [2188127,96,2155] root 40761 reset isize for dir 2188127 root 40815 Trying to rebuild inode:8089093 Can't determint the filetype for inode 8089093, assume it is a normal file Can't get file name for inode 8089093, using '8089093' as fallback Can't get file type for inode 8089093, using FILE as fallback Moving file '8089093' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089093 Trying to rebuild inode:8089098 Can't determint the filetype for inode 8089098, assume it is a normal file Can't get file name for inode 8089098, using '8089098' as fallback Can't get file type for inode 8089098, using FILE as fallback Moving file '8089098' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089098 Can't get file name for inode 8089093, using '8089093' as fallback Moving file '8089093.8089093' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089093 root 40815 inode 8089093 errors 10, odd dir item Can't get file name for inode 8089098, using '8089098' as fallback Moving file '8089098.8089098' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089098 root 40815 inode 8089098 errors 10, odd dir item 
Can't get file name for inode 8089093, using '8089093' as fallback Moving file '8089093.8089093.8089093' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089093 root 40815 inode 8089093 errors 10, odd dir item Can't get file name for inode 8089098, using '8089098' as fallback Moving file '8089098.8089098.8089098' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089098 root 40815 inode 8089098 errors 10, odd dir item Deleting bad dir index [2188127,96,2152] root 40815 Deleting bad dir index [2188127,96,2155] root 40815 reset isize for dir 2188127 root 40815 Can't get file name for inode 8089093, using '8089093' as fallback Moving file '8089093.8089093.8089093.8089093' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089093 Can't get file name for inode 8089098, using '8089098' as fallback Moving file '8089098.8089098.8089098.8089098' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089098 Can't get file name for inode 8089093, using '8089093' as fallback Moving file '8089093.8089093.8089093.8089093.8089093' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089093 Can't get file name for inode 8089098, using '8089098' as fallback Moving file '8089098.8089098.8089098.8089098.8089098' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089098 Deleting bad dir index [2188127,96,2152] root 40869 Deleting bad dir index [2188127,96,2155] root 40869 Can't get file name for inode 8089093, using '8089093' as fallback Moving file '8089093.8089093.8089093.8089093.8089093.8089093' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089093 Can't get file name for inode 8089098, using '8089098' as fallback Moving file '8089098.8089098.8089098.8089098.8089098.8089098' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089098 Can't get file name for inode 8089093, using '8089093' 
as fallback Moving file '8089093.8089093.8089093.8089093.8089093.8089093.8089093' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089093 Can't get file name for inode 8089098, using '8089098' as fallback Moving file '8089098.8089098.8089098.8089098.8089098.8089098.8089098' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089098 Can't get file name for inode 8089093, using '8089093' as fallback Moving file '8089093.8089093.8089093.8089093.8089093.8089093.8089093.8089093' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089093 Can't get file name for inode 8089098, using '8089098' as fallback Moving file '8089098.8089098.8089098.8089098.8089098.8089098.8089098.8089098' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 8089098 Deleting bad dir index [2188127,96,2152] root 40905 Deleting bad dir index [2188127,96,2155] root 40905 Can't get file name for inode
Re: Re: unable to mount btrfs partition, please help :(
On Sun, Mar 20, 2016 at 1:31 PM, Patrick Tschackert <killing-t...@gmx.de> wrote: > My raid is done with the scrub now, this is what i get: > > $ cat /sys/block/md0/md/mismatch_cnt > 311936608 I think this is an assembly problem. Read errors don't result in mismatch counts. An md mismatch count happens when there's a mismatch between data strip and parity strip(s). So this is a lot of mismatches. I think you need to take this problem to the linux-raid@ list; I don't think anyone on this list is going to be able to help with this portion of the problem. I'm only semi-literate with this, and you need to find out why there are so many mismatches and confirm whether the array is being assembled correctly. In your writeup for the list you can include the URL for the first post to this list. I wouldn't repeat any of the VM crashing stuff because it's not really relevant. You'll need to include the kernel you were using at the time of the problem, the kernel you're using for the scrub, the version of mdadm, and all the device metadata (-E for each device) and the array (-D), and smartctl -A for each device (you could put smartctl -x for each drive into a file and then put the file up somewhere like dropbox or google drive, or individually pastebin them if you can keep it all separate; -x is really verbose but sometimes contains read error information) to show bad sectors. The summary line is basically: this was working, after a VM crash followed by shutdown -r now, the Btrfs filesystem won't mount. A drive was faulty and rebuilt with a spare. You just did a check scrub and have all these errors in mismatch_cnt. The question is: how to confirm the array is properly assembled? Because that's too many errors, and the file system on that array will not mount. Further complicating matters, even after the rebuild you have another drive that has some read errors.
Those weren't being fixed this whole time (during rebuild for example) likely because of the timeout vs SCT ERC misconfiguration; otherwise they would have been fixed. > > I also attached my dmesg output to this mail. Here's an excerpt: > [12235.372901] sd 7:0:0:0: [sdh] tag#15 FAILED Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > [12235.372906] sd 7:0:0:0: [sdh] tag#15 Sense Key : Medium Error [current] > [descriptor] > [12235.372909] sd 7:0:0:0: [sdh] tag#15 Add. Sense: Unrecovered read error - > auto reallocate failed > [12235.372913] sd 7:0:0:0: [sdh] tag#15 CDB: Read(16) 88 00 00 00 00 00 af b2 > bb 48 00 00 05 40 00 00 > [12235.372916] blk_update_request: I/O error, dev sdh, sector 2947727304 > [12235.372941] ata8: EH complete > [12266.856747] ata8.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action > 0x0 > [12266.856753] ata8.00: irq_stat 0x4008 > [12266.856756] ata8.00: failed command: READ FPDMA QUEUED > [12266.856762] ata8.00: cmd 60/40:d8:08:17:b5/05:00:af:00:00/40 tag 27 ncq > 688128 in > res 41/40:00:18:1b:b5/00:00:af:00:00/40 Emask 0x409 (media error) > [12266.856765] ata8.00: status: { DRDY ERR } > [12266.856767] ata8.00: error: { UNC } > [12266.858112] ata8.00: configured for UDMA/133 What do you get for smartctl -x /dev/sdh? I see this too: [11440.088441] ata8.00: status: { DRDY } [11440.088443] ata8.00: failed command: READ FPDMA QUEUED [11440.088447] ata8.00: cmd 60/40:c8:e8:bc:15/05:00:ab:00:00/40 tag 25 ncq 688128 in res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error) That's weird. You have several other identical model drives, so I doubt this is some sort of NCQ incompatibility with this model drive; no other drive is complaining like this. So I wonder if there's just something wrong with this drive aside from the bad sectors (?) I can't really tell but it's suspicious. > If I understand correctly, my /dev/sdh drive is having trouble. > Could this be the problem? Should I set the drive to failed and rebuild on a > spare disk?
You need to really slow down and understand the problem first. Every data loss case I've ever come across with md/mdadm raid6 was user induced because they changed too much stuff too fast without consulting people who know better. They got impatient. So I suggest going to the linux-raid@ list and asking there what's going on. The less you change the better because most of the changes md/mdadm does are irreversible. -- Chris Murphy
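The device and array state Chris asks Patrick to collect can be gathered with a loop along these lines. The device names are taken from the mdstat output earlier in the thread, and `run` only prints each command (real collection needs root and would redirect output into files).

```shell
#!/bin/sh
# Dry-run sketch: `run` prints commands instead of executing them.
run() { printf '+ %s\n' "$*"; }

run mdadm -D /dev/md0                  # array detail
for d in /dev/sd[a-k]; do
    run mdadm -E "$d"                  # per-device md superblock metadata
    run smartctl -x "$d"               # verbose SMART data and error logs
done
run cat /sys/block/md0/md/mismatch_cnt # should ideally be 0
```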
Re: unable to mount btrfs partition, please help :(
On Sun, Mar 20, 2016 at 6:19 AM, Martin Steigerwald wrote: > On Sonntag, 20. März 2016 10:18:26 CET Patrick Tschackert wrote: >> > I think in retrospect the safe way to do these kinds of Virtual Box >> > updates, which require kernel module updates, would have been to >> > shutdown the VM and stop the array. *shrug* >> >> >> After this, I think I'll just do away with the virtual machine on this host, >> as the app contained in that vm can also run on the host. I tried to be >> fancy, and it seems to needlessly complicate things. > > I am not completely sure and I have no exact reference anymore, but I think I > read more than once about fs benchmarks running faster in Virtualbox than on > the physical system, which may point at an at least incomplete fsync() > implementation for writing into Virtualbox image files. > > I never found any proof of this nor did I specifically seek to research it. > So it may be true or not. Sure, but that would only affect the guest's file system, the one inside the VDI. It's the host-managed filesystem that's busted. -- Chris Murphy
Re: unable to mount btrfs partition, please help :(
On Sun, Mar 20, 2016 at 3:18 AM, Patrick Tschackert wrote: > Thanks for answering again! > So, first of all I installed a newer kernel from the backports as per > Nicholas D Steeves suggestion: > > $ apt-get install -t jessie-backports linux-image-4.3.0-0.bpo.1-amd64 > > After rebooting: > $ uname -a > Linux vmhost 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.5-1~bpo8+1 (2016-02-23) > x86_64 GNU/Linux > > But the problem with mounting the filesystem persists :( > >> OK I went back and read this again: host is managing the md raid5, the >> guest is writing Btrfs to an "encrypted container" but what is that? A >> LUKS encrypted LVM LV that's directly used by Virtual Box as a raw >> device? It's hard to say what layer broke this. But the VM crashing is >> in effect like a power failure, and it's an open question (for me) how >> this setup deals with barriers. A shutdown -r now should still cleanly >> stop the array so I wouldn't expect there to be an array problem but >> then you also report a device failure. Bad luck. > > The host is managing an md raid 6 (/dev/md0), and I had an encrypted volume > (via cryptsetup) on top of that device. > The host mounted the btrfs filesystem contained in that volume, and the VM > wrote to the filesystem as well using a virtualbox shared folder. OK, well, to me the VM doesn't seem related offhand. Ultimately it's only the host writing to the filesystem, even for the shared folder. The guest VM has no direct access to do Btrfs writes; it's something like a network-like shared folder. > After this, I think I'll just do away with the virtual machine on this host, > as the app contained in that vm can also run on the host. > I tried to be fancy, and it seems to needlessly complicate things. virt-manager or gnome-boxes work better, although you lose the shared folder; you'll have to come up with a workaround, like using NFS.
> $ for i in /sys/class/scsi_generic/*/device/timeout; do echo 120 > "$i"; done > (I know this isn't persistent across reboots...) Correct. -- Chris Murphy
Re: unable to mount btrfs partition, please help :(
On Sonntag, 20. März 2016 10:18:26 CET Patrick Tschackert wrote: > > I think in retrospect the safe way to do these kinds of Virtual Box > > updates, which require kernel module updates, would have been to > > shutdown the VM and stop the array. *shrug* > > > After this, I think I'll just do away with the virtual machine on this host, > as the app contained in that vm can also run on the host. I tried to be > fancy, and it seems to needlessly complicate things. I am not completely sure and I have no exact reference anymore, but I think I read more than once about fs benchmarks running faster in Virtualbox than on the physical system, which may point at an at least incomplete fsync() implementation for writing into Virtualbox image files. I never found any proof of this nor did I specifically seek to research it. So it may be true or not. Thanks, -- Martin
Re: unable to mount btrfs partition, please help :(
Thanks for answering, I already upgraded to a backports kernel as mentioned here: https://mail-archive.com/linux-btrfs@vger.kernel.org/msg51748.html I now have $ uname -a Linux vmhost 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.5-1~bpo8+1 (2016-02-23) x86_64 GNU/Linux As I wrote here https://mail-archive.com/linux-btrfs@vger.kernel.org/msg51748.html the problem still persists :( Cheers, Patrick Sent: Sunday, 20 March 2016 at 13:11 From: "Martin Steigerwald" <mar...@lichtvoll.de> To: "Chris Murphy" <li...@colorremedies.com> Cc: "Patrick Tschackert" <killing-t...@gmx.de>, "Btrfs BTRFS" <linux-btrfs@vger.kernel.org> Subject: Re: unable to mount btrfs partition, please help :( On Saturday, 19 March 2016 19:34:55 CET Chris Murphy wrote: > >>> $ uname -a > >>> Linux vmhost 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4 > >>> (2016-02-29) x86_64 GNU/Linux > >> > >>This is old. You should upgrade to something newer, ideally 4.5 but > >>4.4.6 is good also, and then oldest I'd suggest is 4.1.20. > >> > > Shouldn't I be able to get the newest kernel by executing "apt-get update > > && apt-get dist-upgrade"? That's what I ran just now, and it doesn't > > install a newer kernel. Do I really have to manually upgrade to a newer > > one? > I'm not sure. You might do a list search for debian, as I know debian > users are using newer kernels that they didn't build themselves. Try a backport¹ kernel. Add backports and do apt-cache search linux-image I use 4.3 backport kernel successfully on two server VMs which use BTRFS. [1] http://backports.debian.org/ Thx, -- Martin
Re: unable to mount btrfs partition, please help :(
On Saturday, 19 March 2016 19:34:55 CET Chris Murphy wrote: > >>> $ uname -a > >>> Linux vmhost 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4 > >>> (2016-02-29) x86_64 GNU/Linux > >> > >>This is old. You should upgrade to something newer, ideally 4.5 but > >>4.4.6 is good also, and then oldest I'd suggest is 4.1.20. > >> > > Shouldn't I be able to get the newest kernel by executing "apt-get update > > && apt-get dist-upgrade"? That's what I ran just now, and it doesn't > > install a newer kernel. Do I really have to manually upgrade to a newer > > one? > I'm not sure. You might do a list search for debian, as I know debian > users are using newer kernels that they didn't build themselves. Try a backport¹ kernel. Add backports and do apt-cache search linux-image I use 4.3 backport kernel successfully on two server VMs which use BTRFS. [1] http://backports.debian.org/ Thx, -- Martin
Re: unable to mount btrfs partition, please help :(
Thanks for answering again! So, first of all I installed a newer kernel from the backports as per Nicholas D Steeves suggestion: $ apt-get install -t jessie-backports linux-image-4.3.0-0.bpo.1-amd64 After rebooting: $ uname -a Linux vmhost 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.5-1~bpo8+1 (2016-02-23) x86_64 GNU/Linux But the problem with mounting the filesystem persists :( > OK I went back and read this again: host is managing the md raid5, the > guest is writing Btrfs to an "encrypted container" but what is that? A > LUKS encrypted LVM LV that's directly used by Virtual Box as a raw > device? It's hard to say what layer broke this. But the VM crashing is > in effect like a power failure, and it's an open question (for me) how > this setup deals with barriers. A shutdown -r now should still cleanly > stop the array so I wouldn't expect there to be an array problem but > then you also report a device failure. Bad luck. The host is managing an md raid 6 (/dev/md0), and I had an encrypted volume (via cryptsetup) on top of that device. The host mounted the btrfs filesystem contained in that volume, and the VM wrote to the filesystem as well using a virtualbox shared folder. The vm then crashed, but I shut down the host with "shutdown -r now". After the reboot, one disk of the array was no longer present, but I managed to rebuild/restore using a spare disk. The RAID now seems to be healthy. > I think in retrospect the safe way to do these kinds of Virtual Box > updates, which require kernel module updates, would have been to > shutdown the VM and stop the array. *shrug* After this, I think I'll just do away with the virtual machine on this host, as the app contained in that vm can also run on the host. I tried to be fancy, and it seems to needlessly complicate things. > These drives are technically not suitable for use in any kind of raid > except linear and raid 0 (which have no redundancy so they aren't > really raid). 
You'd have to dig up drive specs, assuming they're > published, to see what the recovery times are for the drive models > when a bad sector is encountered. But it's typical for such drives to > exceed 30 seconds for recovery, with some drives reported to have 2+ > minute recoveries. To properly configure them, you'll have to increase > the kernel's SCSI command timer to at least 120 to make sure there's > sufficient time to wait for the drive to explicitly spit back a read > error to the kernel. Otherwise, the kernel gives up after 30 seconds, > and resets the link to the drive, and any possibility of fixing up the > bad sector via the raid read error fixup mechanism is thwarted. It's > really common, the linux-raid@ list has many of these kinds of threads > with this misconfiguration as the source problem. > For the first listing of drives yes. And 120 second delays might be > too long for your use case, but that's the reality. > You should change the command timer for the drives that do not support > configurable SCT ERC. And then do a scrub check. And then check both > cat /sys/block/mdX/md/mismatch_cnt, which ideally should be 0, and > also check kernel messages for libata read errors. So I did this: $ cat /sys/block/md0/md/mismatch_cnt 0 $ for i in /sys/class/scsi_generic/*/device/timeout; do echo 120 > "$i"; done (I know this isn't persistent across reboots...) $ echo check > /sys/block/md0/md/sync_action $ cat /proc/mdstat Personalities : [raid6] [raid5] [raid4] md0 : active raid6 sda[0] sdf[12](S) sdg[11](S) sdj[9] sdh[7] sdi[6] sdk[10] sde[4] sdd[3] sdc[2] sdb[1] 20510948416 blocks super 1.2 level 6, 64k chunk, algorithm 2 [9/9] [U] [>] check = 1.0% (30812476/2930135488) finish=340.6min speed=141864K/sec unused devices: So the raid is currently doing a scrub, which will take a few hours. > Hmm not good. See this similar thread.
> http://www.spinics.net/lists/linux-btrfs/msg51711.html
> backups in all superblocks have the same chunk_root, no alternative
> chunk root to try.
> So at the moment I think it's worth trying a newer kernel version and
> mounting normally; then mounting with -o recovery; then -o recovery,ro.
> If that doesn't work, you're best off waiting for a developer to give
> advice on the next step; 'btrfs rescue chunk-recover' seems most
> appropriate but again someone else a while back had success with
> zero-log, but it's hard to say if the two cases are really similar and
> maybe that person just got lucky. Both of those change the file system
> in irreversible ways, that's why I suggest waiting or asking on IRC.

Thanks again for taking the time to answer. I'll wait while my RAID is doing the scrub; maybe a dev will answer (like you said). The friendly people on IRC couldn't help and sent me here.
Re: unable to mount btrfs partition, please help :(
Patrick Tschackert posted on Sat, 19 Mar 2016 23:15:33 +0100 as excerpted: > I'm growing increasingly desperate, can anyone help me? No need to be desperate. As the sysadmin's rule of backups states, simple form, you either have at least one level of backup, or you are by your (in)action defining the data not backed up as worth less than the time, hassle and resources necessary to do that backup. Therefore, there are only two possibilities: 1) You have a backup. No sweat. You can use it if you need to, so no desperation needed. 2) You don't have a backup. No sweat. By not having a backup, your actions defined the data at risk as worth less than the time, hassle and resources necessary for that backup, so if you lose the data, you can still be happy, because you saved what you defined as of most importance, the time, resources and hassle of doing that backup. Since you saved what you yourself defined by your own actions as of most value to you, either way, you have what was most valuable to you and can thus be happy to have the valuable stuff, even if you lost what was therefore much more trivial. There are no other possibilities. Your words might lie. Your actions don't. Either way, you saved the valuable stuff and thus have no reason to be desperate. And of course, btrfs, while stabilizing, is not yet fully stable and mature, and while stable enough to be potentially suitable for those who have tested backups or are only using it with trivial data they can afford to lose anyway, if they don't have backups, it's certainly not to the level of stability of the more mature filesystems the above sysadmin's rule of backups was designed for. So that rule applies even MORE strongly to btrfs than it does to more mature and stable filesystems. 
(FWIW, there's a more complex version of the rule that takes relative risk into account and covers multiple levels of backup where either the risk is high enough or the data valuable enough to warrant it, but the simple form just says if you don't have at least one backup, you are by that lack of backup defining the data at risk as not worth the time and trouble to do it.) And there's no way that not knowing the btrfs status changes that either, because if you didn't know the status, it can only be because you didn't care enough about the reliability of the filesystem you were entrusting your data to, to care about researching it. After all, both the btrfs wiki and the kernel btrfs option stress the need for backups if you're choosing btrfs, as does this list, repeatedly. So the only way someone couldn't know is if they didn't care enough to /bother/ to know, which again defines the data stored on the filesystem as of only trivial value, worth so little it's not worth researching a new filesystem you plan on storing it on. So there's no reason to be desperate. It'll only stress you out and increase your blood pressure. Either you considered the data valuable enough to have a backup, or you didn't. There is no third option. And either way, it's not worth stressing out over, because you either have that backup and thus don't need to stress, or you yourself defined the data as trivial by not having it. > $ uname -a Linux vmhost 3.16.0-4-amd64 #1 SMP Debian > 3.16.7-ckt20-1+deb8u4 (2016-02-29) x86_64 GNU/Linux > > $ btrfs --version btrfs-progs v4.4 As CMurphy says, that's an old kernel, not really supported by the list. With btrfs still stabilizing, the code is still changing pretty fast, and old kernels are known buggy kernels. The list focuses on the mainline kernel and its two primary tracks, LTS kernel series and current kernel series. On the current kernel track, the last two kernels are best supported. With 4.5 just out, that's 4.5 and 4.4. 
On the LTS track, the two latest LTS kernel series are recommended, with 4.4 being the latest LTS kernel, and 4.1 being the one previous to that. However, 3.18 was the one previous to that and has been reasonably stable, so while the two latest LTS series remain recommended, we're still trying to support 3.18 too, for those who need that far back. But 3.16 is previous to that and is really too far back to be practically supported well by the list, as btrfs really is still stabilizing and our focus is forward, not backward. That doesn't mean we won't try to support it, it simply means that when there's a problem, the first recommendation, as you've seen, is likely to be try a newer kernel. Of course various distros do offer support for btrfs on older kernels and we recognize that. However, our focus is on mainline, and we don't track what patches the various distros have backported and what patches they haven't, so we're not in a particularly good position to provide support for them, at least back further than the mainline kernels we support. If you wish to use btrfs on such old kernels, then, our recommendation is to g
Re: unable to mount btrfs partition, please help :(
On 19 March 2016 at 21:34, Chris Murphy wrote:
> On Sat, Mar 19, 2016 at 5:35 PM, Patrick Tschackert wrote:
>
> $ uname -a
> Linux vmhost 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29) x86_64 GNU/Linux
>
>>> This is old. You should upgrade to something newer, ideally 4.5 but
>>> 4.4.6 is good also, and then oldest I'd suggest is 4.1.20.
>>
>> Shouldn't I be able to get the newest kernel by executing "apt-get update &&
>> apt-get dist-upgrade"?
>> That's what I ran just now, and it doesn't install a newer kernel. Do I
>> really have to manually upgrade to a newer one?
>
> I'm not sure. You might do a list search for debian, as I know debian
> users are using newer kernels that they didn't build themselves.
>
>> On top of the sticky situation i'm already in, i'm not sure if I trust
>> myself manually building a new kernel. Should I?

If you enable Debian backports, which I assume you have since you're running the version of btrfs-progs that was backported without a warning not to use it with old kernels...well, if backports are enabled then you can try:

apt-get install -t jessie-backports linux-image-4.3.0-0.bpo.1-amd64

linux-4.3.x was a complete mess for my laptop (Thinkpad X220, quite well supported), and I'm not sure if it was driver-related or btrfs-related. I actually started tracking linux-4.4 at rc1, it was so bad. If you don't want to try building your own kernel, I'd file a bug report against linux-image-amd64 asking for a backport of linux-4.4, which is in Stretch/testing; I'm surprised it hasn't been backported yet... The only issue I remember is an error message when booting, I think because the microcode interface changed between 4.3.x and 4.4.x. Installing microcode-related packages from backports is how I think I worked around this.
Alternatively, if you want to build your own kernel, you might be able to install linux-image from backports, download and untar linux-4.1.x somewhere, and then copy the config from /boot/config-4.3* to somedir/linux-4.1.x/.config. I uploaded two scripts to github that I've been using for ages to track the upstream LTS kernel branch that Debian didn't choose. You can find them here:

https://github.com/sten0/lts-convenience

All those syncs and btrfs sub sync lines are there because I always seem to run into strange issues with adding and removing snapshots.

Cheers,
Nicholas
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: unable to mount btrfs partition, please help :(
On Sat, Mar 19, 2016 at 5:35 PM, Patrick Tschackert wrote:
> Hi Chris,
>
> thank you for answering so quickly!
>
>> Try 'btrfs check' without any options first.
>
> $ btrfs check /dev/mapper/storage
> checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
> checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
> bytenr mismatch, want=36340960788480, have=4530277753793296986
> Couldn't read chunk tree
> Couldn't open file system
>
>> To me it seems the problem is instigated by lower layers either not
>> completing critical writes at the time of the power failure, or didn't
>> rebuild correctly.
>
> There wasn't a power failure, a VM crashed whilst writing to the btrfs
> filesystem.

OK I went back and read this again: host is managing the md raid5, the guest is writing Btrfs to an "encrypted container" but what is that? A LUKS encrypted LVM LV that's directly used by Virtual Box as a raw device? It's hard to say what layer broke this. But the VM crashing is in effect like a power failure, and it's an open question (for me) how this setup deals with barriers. A shutdown -r now should still cleanly stop the array, so I wouldn't expect there to be an array problem, but then you also report a device failure. Bad luck.

I think in retrospect the safe way to do these kinds of Virtual Box updates, which require kernel module updates, would have been to shut down the VM and stop the array. *shrug*

>> You should check the SCT ERC setting on each drive with 'smartctl -l
>> scterc /dev/sdX' and also the kernel command timer setting with 'cat
>> /sys/block/sdX/device/timeout' also for each device. The SCT ERC value
>> must be less than the command timer. It's a common misconfiguration
>> with raid setups.
>
> $ smartctl -l scterc /dev/sda (sdb, sdc, sde, sdg)
> gives me
>
> smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
> Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
>
> SCT Error Recovery Control command not supported

These drives are technically not suitable for use in any kind of raid except linear and raid 0 (which have no redundancy so they aren't really raid). You'd have to dig up drive specs, assuming they're published, to see what the recovery times are for the drive models when a bad sector is encountered. But it's typical for such drives to exceed 30 seconds for recovery, with some drives reported to have 2+ minute recoveries. To properly configure them, you'll have to increase the kernel's SCSI command timer to at least 120 to make sure there's sufficient time to wait for the drive to explicitly spit back a read error to the kernel. Otherwise, the kernel gives up after 30 seconds, resets the link to the drive, and any possibility of fixing up the bad sector via the raid read error fixup mechanism is thwarted. It's really common; the linux-raid@ list has many of these kinds of threads with this misconfiguration as the source problem.

> while
> $ smartctl -l scterc /dev/sdf (sdh, sdi, sdj, sdk)
> gives me
>
> smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
> Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
>
> SCT Error Recovery Control:
>   Read: 70 (7.0 seconds)
>   Write: 70 (7.0 seconds)

These drives are suitable for raid out of the box.

> $ cat /sys/block/sdX/device/timeout
> gives me "30" for every device
>
> Does that mean my settings for the device timeouts are wrong?

For the first listing of drives, yes. And 120 second delays might be too long for your use case, but that's the reality. You should change the command timer for the drives that do not support configurable SCT ERC. And then do a scrub check.
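The raise-the-command-timer step can be sketched as a small POSIX-shell function. Everything except the 120-second value and the sd* device naming is an assumption here; the function is parameterized on the sysfs root so you can exercise the logic against a scratch directory before pointing it at the real /sys/block (which requires root).

```shell
# Sketch: raise the kernel's SCSI command timer for every sd* disk found
# under a given sysfs root, but only if the current value is lower.
raise_cmd_timer() {
    sys_root="$1"          # e.g. /sys/block, or a scratch tree for testing
    want="${2:-120}"       # target timeout in seconds
    for f in "$sys_root"/sd*/device/timeout; do
        [ -e "$f" ] || continue          # glob matched nothing
        cur=$(cat "$f")
        if [ "$cur" -lt "$want" ]; then
            echo "$want" > "$f"
        fi
    done
}
# Real invocation (as root):  raise_cmd_timer /sys/block
```

As Patrick notes later in the thread, writes to sysfs don't persist across reboots, so this would need to run from a boot script or udev rule.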
And then check both cat /sys/block/mdX/md/mismatch_cnt, which ideally should be 0, and also check kernel messages for libata read errors.

>> After that's fixed you should do a scrub, and I'm thinking it's best
>> to do only a check, which means 'echo check >
>> /sys/block/mdX/md/sync_action' rather than issuing repair which
>> assumes data strips are correct and parity strips are wrong and
>> rebuilds all parity strips.
>
> I don't quite understand, I thought a scrub could only be done on a mounted
> filesystem?

You have two scrubs. There's a Btrfs scrub. And an md scrub. I'm referring to the latter.

> Do you really mean executing the command "echo check >
> /sys/block/md0/md/sync_action"? At the moment it says "idle" in that file.
> Also, the btrfs filesystem sits in an encrypted container, so the setup looks
> like this:
>
> /dev/md0 (this is the Raid device)
> /dev/mapper/storage (after cryptsetup luksOpen, this is where the filesystem should
> be mounted from)
> /media/storage (i always mounted the filesystem into this folder by executing
> "mount /dev/mapper/storage /media/storage")
>
> Apologies if I didn't make that clear enough in my initial email

Ok so the host is writing Btrfs to
Re: unable to mount btrfs partition, please help :(
8 level: 3
backup_fs_root:     24022070902784  gen: 1322968  level: 3
backup_dev_root:    24014655901696  gen: 1275381  level: 2
backup_csum_root:   24022070956032  gen: 1322968  level: 4
backup_total_bytes: 21003208163328
backup_bytes_used:  17670808895488
backup_num_devices: 1

backup 1:
backup_tree_root:   24022114037760  gen: 1322968  level: 2
backup_chunk_root:  36340959809536  gen: 1275381  level: 2
backup_extent_root: 24022186385408  gen: 1322969  level: 3
backup_fs_root:     24022186381312  gen: 1322969  level: 3
backup_dev_root:    24014655901696  gen: 1275381  level: 2
backup_csum_root:   24022186536960  gen: 1322969  level: 4
backup_total_bytes: 21003208163328
backup_bytes_used:  17670826078208
backup_num_devices: 1

backup 2:
backup_tree_root:   24022309593088  gen: 1322969  level: 2
backup_chunk_root:  36340959809536  gen: 1275381  level: 2
backup_extent_root: 24022337949696  gen: 1322970  level: 3
backup_fs_root:     24022337937408  gen: 1322970  level: 3
backup_dev_root:    24014655901696  gen: 1275381  level: 2
backup_csum_root:   24022337990656  gen: 1322970  level: 4
backup_total_bytes: 21003208163328
backup_bytes_used:  17670866358272
backup_num_devices: 1

backup 3:
backup_tree_root:   24021840482304  gen: 1322966  level: 2
backup_chunk_root:  36340959809536  gen: 1275381  level: 2
backup_extent_root: 24021883957248  gen: 1322967  level: 3
backup_fs_root:     24021883949056  gen: 1322967  level: 3
backup_dev_root:    24014655901696  gen: 1275381  level: 2
backup_csum_root:   24021884100608  gen: 1322967  level: 4
backup_total_bytes: 21003208163328
backup_bytes_used:  17670630260736
backup_num_devices: 1

On Sun, Mar 20, 2016 at 12:02 AM, Chris Murphy <li...@colorremedies.com> wrote:
> On Sat, Mar 19, 2016 at 4:15 PM, Patrick Tschackert <killing-t...@gmx.de> wrote:
>
>> I'm growing increasingly desperate, can anyone help me?
I'm thinking >> of trying one or more of the following, but would like an informed >> opinion: >> 1) btrfs check --fix-crc >> 2) btrfs-check --init-csum-tree >> 3) btrfs rescue chunk-recover >> 4) btrfs-check --repair >> 5) btrfs rescue zero-log > > None of the above. Try 'btrfs check' without any options first. > > To me it seems the problem is instigated by lower layers either not > completing critical writes at the time of the power failure, or didn't > rebuild correctly. > > You should check the SCT ERC setting on each drive with 'smartctl -l > scterc /dev/sdX' and also the kernel command timer setting with 'cat > /sys/block/sdX/device/timeout' also for each device. The SCT ERC value > must be less than the command timer. It's a common misconfiguration > with raid setups. > > After that's fixed you should do a scrub, and I'm thinking it's best > to do only a check, which means 'echo check > > /sys/block/mdX/md/sync_action' rather than issuing repair which > assumes data strips are correct and parity strips are wrong and > rebuilds all parity strips. > > >> >> $ uname -a >> Linux vmhost 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4 >> (2016-02-29) x86_64 GNU/Linux > > This is old. You should upgrade to something newer, ideally 4.5 but > 4.4.6 is good also, and then oldest I'd suggest is 4.1.20. > >> >> $ btrfs --version >> btrfs-progs v4.4 > > Good. 
>
>> $ btrfs fi show
>> Label: none  uuid: 9868d803-78d1-40c3-b1ee-a4ce3363df87
>> Total devices 1 FS bytes used 16.07TiB
>> devid 1 size 19.10TiB used 16.27TiB path /dev/mapper/storage
>>
>> excerpt from DMESG:
>> [ 151.970916] BTRFS: device fsid 9868d803-78d1-40c3-b1ee-a4ce3363df87 devid 1 transid 1322969 /dev/dm-0
>> [ 163.105784] BTRFS info (device dm-0): disk space caching is enabled
>> [ 165.304968] BTRFS: bad tree block start 4530277753793296986 36340960788480
>> [ 165.305233] BTRFS: bad tree block start 4530277753793296986 36340960788480
>> [ 165.305281] BTRFS: failed to read chunk tree on dm-0
>> [ 165.331407] BTRFS: open_ctree failed
>
> Yeah this isn't a good message typically. There's one surprising (to me) case where someone had luck getting this fixed with btrfs-zero-log, which is unexpected.
Re: unable to mount btrfs partition, please help :(
On Sat, Mar 19, 2016 at 4:15 PM, Patrick Tschackert <killing-t...@gmx.de> wrote: > I'm growing increasingly desperate, can anyone help me? I'm thinking > of trying one or more of the following, but would like an informed > opinion: > 1) btrfs check --fix-crc > 2) btrfs-check --init-csum-tree > 3) btrfs rescue chunk-recover > 4) btrfs-check --repair > 5) btrfs rescue zero-log None of the above. Try 'btrfs check' without any options first. To me it seems the problem is instigated by lower layers either not completing critical writes at the time of the power failure, or didn't rebuild correctly. You should check the SCT ERC setting on each drive with 'smartctl -l scterc /dev/sdX' and also the kernel command timer setting with 'cat /sys/block/sdX/device/timeout' also for each device. The SCT ERC value must be less than the command timer. It's a common misconfiguration with raid setups. After that's fixed you should do a scrub, and I'm thinking it's best to do only a check, which means 'echo check > /sys/block/mdX/md/sync_action' rather than issuing repair which assumes data strips are correct and parity strips are wrong and rebuilds all parity strips. > > $ uname -a > Linux vmhost 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4 > (2016-02-29) x86_64 GNU/Linux This is old. You should upgrade to something newer, ideally 4.5 but 4.4.6 is good also, and then oldest I'd suggest is 4.1.20. > > $ btrfs --version > btrfs-progs v4.4 Good. 
> $ btrfs fi show
> Label: none  uuid: 9868d803-78d1-40c3-b1ee-a4ce3363df87
> Total devices 1 FS bytes used 16.07TiB
> devid 1 size 19.10TiB used 16.27TiB path /dev/mapper/storage
>
> excerpt from DMESG:
> [ 151.970916] BTRFS: device fsid 9868d803-78d1-40c3-b1ee-a4ce3363df87 devid 1 transid 1322969 /dev/dm-0
> [ 163.105784] BTRFS info (device dm-0): disk space caching is enabled
> [ 165.304968] BTRFS: bad tree block start 4530277753793296986 36340960788480
> [ 165.305233] BTRFS: bad tree block start 4530277753793296986 36340960788480
> [ 165.305281] BTRFS: failed to read chunk tree on dm-0
> [ 165.331407] BTRFS: open_ctree failed

Yeah this isn't a good message typically. There's one surprising (to me) case where someone had luck getting this fixed with btrfs-zero-log which is unexpected. But I think it's very premature to make changes to the file system until you have more information.

What do you get for

btrfs-find-root /dev/mdX
btrfs-show-super -fa /dev/mdX

--
Chris Murphy
unable to mount btrfs partition, please help :(
Hi,

Apologies if this email reaches the mailing list multiple times; I can't seem to get through to the mailing list, so I'm sending it through a different account now...

I'm having problems mounting my BTRFS filesystem. Here's what happened: My BTRFS filesystem sits in an encrypted container on a linux software RAID 6. A VirtualBox crash occurred (while writing to the filesystem, I presume), and I rebooted the machine with "shutdown -r now", because a reboot was necessary due to upgraded VirtualBox drivers. When the system was running again, I couldn't mount the filesystem. This is what I did:

$ cryptsetup luksOpen /dev/md0 storage
(this worked fine)

$ mount /dev/mapper/storage /media/storage
mount: wrong fs type, bad option, bad superblock on /dev/mapper/storage,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try dmesg | tail or so.

I then saw that one of the RAID's disks was no longer present in the array, so I started a rebuild/recover by executing mdadm --run. The RAID rebuilt itself using one of the spare disks. After the rebuild, the problem persists: I cannot mount my file system. Mounting with options "ro" and/or "recovery" makes no difference. I am unable to do a backup of the metadata:

$ btrfs-image -c9 /dev/mapper/storage ~/btrfs_img
checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
bytenr mismatch, want=36340960788480, have=4530277753793296986
Couldn't read chunk tree
Open ctree failed
create failed (Success)

I'm growing increasingly desperate, can anyone help me?
I'm thinking of trying one or more of the following, but would like an informed opinion:
1) btrfs check --fix-crc
2) btrfs-check --init-csum-tree
3) btrfs rescue chunk-recover
4) btrfs-check --repair
5) btrfs rescue zero-log

Here is various info about my system as it is now, including the info requested on https://btrfs.wiki.kernel.org/index.php/Btrfs_mailing_list. The full DMESG is attached to this email.

$ btrfs restore -D /dev/mapper/storage /media/rest
checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
bytenr mismatch, want=36340960788480, have=4530277753793296986
Couldn't read chunk tree
Could not open root, trying backup super
checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
bytenr mismatch, want=36340960788480, have=4530277753793296986
Couldn't read chunk tree
Could not open root, trying backup super
checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
bytenr mismatch, want=36340960788480, have=4530277753793296986
Couldn't read chunk tree
Could not open root, trying backup super

$ btrfs check --readonly /dev/mapper/storage
checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
checksum verify failed on 36340960788480 found 8F8E1006 wanted 4AA1BC89
bytenr mismatch, want=36340960788480, have=4530277753793296986
Couldn't read chunk tree
Couldn't open file system

$ btrfs-show-super /dev/mapper/storage
superblock: bytenr=65536, device=/dev/mapper/storage
csum                    0xf3887f83 [match]
bytenr                  65536
flags                   0x1 ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    9868d803-78d1-40c3-b1ee-a4ce3363df87
label
generation              1322969
root                    24022309593088
sys_array_size          97
chunk_root_generation   1275381
root_level              2
chunk_root              36340959809536
chunk_root_level        2
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             21003208163328
bytes_used              17670843191296
sectorsize              4096
nodesize                4096
leafsize                4096
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x1 ( MIXED_BACKREF )
csum_type               0
csum_size               4
cache_generation        1322969
uuid_tree_generation    1322969
dev_item.uuid           c1123f55-46ce-4931-8722-7387fee07608
dev_item.fsid           9868d803-78d1-40c3-b1ee-a4ce3363df87 [match]
dev_item.type           0
dev_item.total_bytes    21003208163328
dev_item.bytes_used     17886424858624
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active (auto-read-only) raid6 sda[0] sdg[11](S) sdf[12](S) sdj[9] sdh[7] sdi[6] sdk[10] sde[4] sdd[3] sdc[2] sdb[1]
      20510948416 blocks super 1.2 level 6, 64k chunk, algorithm 2 [9/9] [U]
unused devices:

$ mdadm -D /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Sat Jun 14 18:47:44 2014
     Raid Level : raid6
     Array Size : 20510948416 (19560.77 GiB 21003.21 GB)
  Used Dev Size : 2930135488 (2794.40 GiB 3000.46 GB)
   Raid Devices : 9
  Total Devices : 11
    Persistence : Superblock is persistent
    Update Time : Sat Mar 19 13:5
Re: [PATCH v2 RESEND] btrfs: maintain consistency in logging to help debugging
On Thu, Mar 10, 2016 at 12:22:58PM +0800, Anand Jain wrote:
> Optional Label may or may not be set, or it might be set at some time
> later. However while debugging to search through the kernel logs the
> scripts would need the logs to be consistent, so logs search key words
> shouldn't depend on the optional variables, instead fsid is better.

I think the label is a useful information, as it's set by the user. So if I'm looking to the log, I'll recognize the labels, not the device or fsid. It would be better to show all of them, ie. label, fsid, device and transid. The line will get longer, but I hope it's ok.

Proposed order of the fields:
- device PATH
- devid ID
- fsid UUID
- transid TID

> -	if (disk_super->label[0]) {
> -		printk(KERN_INFO "BTRFS: device label %s ", disk_super->label);
> -	} else {
> -		printk(KERN_INFO "BTRFS: device fsid %pU ", disk_super->fsid);
> -	}
> -
> -	printk(KERN_CONT "devid %llu transid %llu %s\n", devid, transid, path);
> +	printk(KERN_INFO "BTRFS: device fsid %pU devid %llu transid %llu %s\n",
> +		disk_super->fsid, devid, transid, path);
Re: [PATCH v2 RESEND] btrfs: maintain consistency in logging to help debugging
On 03/17/2016 12:18 AM, David Sterba wrote:
> On Thu, Mar 10, 2016 at 12:22:58PM +0800, Anand Jain wrote:
>> Optional Label may or may not be set, or it might be set at some time
>> later. However while debugging to search through the kernel logs the
>> scripts would need the logs to be consistent, so logs search key words
>> shouldn't depend on the optional variables, instead fsid is better.
>
> I think the label is a useful information, as it's set by the user. So
> if I'm looking to the log, I'll recognize the labels, not the device or
> fsid. It would be better to show all of them, ie. label, fsid, device
> and transid. The line will get longer, but I hope it's ok. Proposed
> order of the fields:
> - device PATH
> - devid ID
> - fsid UUID
> - transid TID

(I am not too particular about the below, but it's just my opinion.)

The patch titled in the ML "Btrfs: fix fs logging for multi device" would prefix "BTRFS: <fsid>:" to most of the logs in dmesg. So I guess if we have the following, then "BTRFS: <fsid>:" (*) is better.

For end users, I hope we provide all those requisites through the btrfs-progs CLI, and they wouldn't have to review dmesg. Further, 'btrfs fi show' provides the FSID-to-label mapping. So I hope the next set of targeted community, the troubleshooters, will be familiar with the FSID, and they could do

  dmesg | grep "BTRFS: <fsid>:"

to filter the logs of the one btrfs which they want to troubleshoot (as there may be more than one btrfs in the system).

[*] Maybe in the future. (There is a bug that we might fail to know / assemble the right set of devices as per the last assembled volume, and to fix this it's better to create a new device UUID for the replace target device instead of copying the device UUID of the source device (a bit vague as of now). If this is successful, then the device UUID will be useful to printk here.)
Thanks, Anand

> -	if (disk_super->label[0]) {
> -		printk(KERN_INFO "BTRFS: device label %s ", disk_super->label);
> -	} else {
> -		printk(KERN_INFO "BTRFS: device fsid %pU ", disk_super->fsid);
> -	}
> -
> -	printk(KERN_CONT "devid %llu transid %llu %s\n", devid, transid, path);
> +	printk(KERN_INFO "BTRFS: device fsid %pU devid %llu transid %llu %s\n",
> +		disk_super->fsid, devid, transid, path);
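The fsid-keyed filtering discussed above can be sketched in a couple of lines of shell. The fsid below is one that actually appears in these threads; the sample log lines are constructed for illustration (in real use you would pipe dmesg instead of a file).

```shell
# Sketch: filter kernel messages for one filesystem by its (always-present)
# fsid, per the patched one-line message format. Sample log lines are made up.
FSID="9868d803-78d1-40c3-b1ee-a4ce3363df87"
printf '%s\n' \
  "BTRFS: device fsid $FSID devid 1 transid 1322969 /dev/dm-0" \
  'BTRFS: device fsid 7f994e11-e146-4dee-80f0-c16ac3073e91 devid 2 transid 42 /dev/sdd' \
  > sample_dmesg.txt
# Fixed-string match on the fsid keeps only the filesystem we care about:
grep -F "fsid $FSID" sample_dmesg.txt
```

With the old two-printk format, a single grep like this could miss lines, since the fsid was only printed when no label was set — which is the consistency point the patch makes.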
[PATCH v2 RESEND] btrfs: maintain consistency in logging to help debugging
Optional Label may or may not be set, or it might be set at some time later. However while debugging to search through the kernel logs the scripts would need the logs to be consistent, so logs search key words shouldn't depend on the optional variables, instead fsid is better.

Signed-off-by: Anand Jain
---
v2: fix commit log

 fs/btrfs/volumes.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index dc2db98..af176d6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -998,13 +998,8 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
 	ret = device_list_add(path, disk_super, devid, fs_devices_ret);
 	if (ret > 0) {
-		if (disk_super->label[0]) {
-			printk(KERN_INFO "BTRFS: device label %s ", disk_super->label);
-		} else {
-			printk(KERN_INFO "BTRFS: device fsid %pU ", disk_super->fsid);
-		}
-
-		printk(KERN_CONT "devid %llu transid %llu %s\n", devid, transid, path);
+		printk(KERN_INFO "BTRFS: device fsid %pU devid %llu transid %llu %s\n",
+			disk_super->fsid, devid, transid, path);
 		ret = 0;
 	}
 	if (!ret && fs_devices_ret)
--
2.4.1
Re: BTRFS Raid 6 corruption - please help with restore
On Wed, Mar 2, 2016 at 11:42 AM, Stuart Gittings <gitting...@gmail.com> wrote:
> All devices are present. btrfs fi show output is listed below and shows they are
> all there. I'm afraid btrfs dev scan does not help

What do you get for 'btrfs check' (do not use --repair yet)

--
Chris Murphy
Re: BTRFS Raid 6 corruption - please help with restore
On Wed, Mar 2, 2016 at 3:47 AM, Stuart Gittings <gitting...@gmail.com> wrote:
> Hi - I have some corruption on a 12 drive Raid 6 volume. Here's the
> basics - if someone could help with restore it would save me a ton of
> time (and some data loss - I have critical data backed up, but not
> all).
>
> stuart@debian:~$ uname -a
> Linux debian 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-7~bpo8+1 (2016-01-19) x86_64 GNU/Linux
>
> stuart@debian:~$ sudo btrfs --version
> btrfs-progs v4.4
>
> sudo btrfs fi sh
> Label: none  uuid: 7f994e11-e146-4dee-80f0-c16ac3073e91
> Total devices 12 FS bytes used 14.25TiB
> devid  1 size 2.73TiB used 167.14GiB path /dev/sdc
> devid  2 size 5.46TiB used 1.75TiB path /dev/sdd
> devid  3 size 5.46TiB used 1.75TiB path /dev/sde
> devid  4 size 2.73TiB used 167.14GiB path /dev/sdn
> devid  5 size 5.46TiB used 1.75TiB path /dev/sdf
> devid  6 size 2.73TiB used 1.75TiB path /dev/sdm
> devid  9 size 2.73TiB used 1.75TiB path /dev/sdj
> devid 10 size 2.73TiB used 1.75TiB path /dev/sdi
> devid 11 size 2.73TiB used 1.75TiB path /dev/sdg
> devid 13 size 2.73TiB used 1.75TiB path /dev/sdl
> devid 14 size 2.73TiB used 1.75TiB path /dev/sdk
> devid 15 size 2.73TiB used 1.75TiB path /dev/sdh
>
> sudo mount -t btrfs -oro,recover /dev/sdc /data
> mount: wrong fs type, bad option, bad superblock on /dev/sdc,
> missing codepage or helper program, or other error
>
> In some cases useful info is found in syslog - try
> dmesg | tail or so.
> > dmesg: > > [ 5642.118303] BTRFS info (device sdc): enabling auto recovery > [ 5642.118313] BTRFS info (device sdc): disk space caching is enabled > [ 5642.118316] BTRFS: has skinny extents > [ 5642.130145] btree_readpage_end_io_hook: 39 callbacks suppressed > [ 5642.130148] BTRFS (device sdc): bad tree block start > 13629298965300190098 47255853072384 > [ 5642.130759] BTRFS (device sdc): bad tree block start > 10584834564968318131 47255853105152 > [ 5642.131289] BTRFS (device sdc): bad tree block start > 2775635947161390306 47255853121536 > [ 5644.730012] BTRFS: bdev /dev/sdc errs: wr 1664846, rd 210656, flush > 18054, corrupt 0, gen 0 > [ 5644.801291] BTRFS (device sdc): bad tree block start > 8578409561856120450 47254279438336 > [ 5644.801304] BTRFS (device sdc): bad tree block start > 18087369170870825197 47254279454720 > [ 5644.831199] BTRFS (device sdc): bad tree block start > 9721403008164124267 47254277718016 > [ 5644.842763] BTRFS (device sdc): bad tree block start > 18087369170870825197 47254279454720 > [ 5644.891992] BTRFS (device sdc): bad tree block start > 17582844917171188859 47254194176000 > [ 5644.951366] BTRFS (device sdc): bad tree block start > 3962496226683925584 47254278586368 > [ 5645.097168] BTRFS (device sdc): bad tree block start > 17049293152820168762 47255619846144 > [ 5646.159819] BTRFS: Failed to read block groups: -5 > [ 5646.215905] BTRFS: open_ctree failed > stuart@debian:~$ > > Finally: > sudo btrfs restore /dev/sdc /backup > checksum verify failed on 47255853072384 found 70F58CCA wanted AE18D5BC > checksum verify failed on 47255853072384 found 70F58CCA wanted AE18D5BC > checksum verify failed on 47255853072384 found 805B1FF7 wanted B76A652F > checksum verify failed on 47255853072384 found 70F58CCA wanted AE18D5BC > bytenr mismatch, want=47255853072384, have=13629298965300190098 > Couldn't read chunk tree > Could not open root, trying backup super > warning, device 3 is missing > warning, device 2 is missing > warning, device 5 
is missing
> warning, device 4 is missing
> bytenr mismatch, want=47255851761664, have=47255851958272
> Couldn't read chunk root
> Could not open root, trying backup super
> warning, device 3 is missing
> warning, device 2 is missing
> warning, device 5 is missing
> warning, device 4 is missing
> bytenr mismatch, want=47255851761664, have=47255851958272
> Couldn't read chunk root
> Could not open root, trying backup super

Well, there appear to be too many devices missing; I count four. What does 'btrfs fi show' look like? If there are missing devices, try 'btrfs dev scan' and then 'btrfs fi show' again and see if it changes. I don't think much can be done if there really are four missing devices.

--
Chris Murphy
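For readers wondering why four missing devices is fatal here: RAID6 parity can reconstruct a stripe with at most two absent members, regardless of how many devices the array has. A minimal sketch of that bound (plain arithmetic, not btrfs code):

```python
# RAID6 keeps 2 parity blocks per stripe, so a stripe stays reconstructable
# with up to 2 member devices absent; a 3rd missing device means data loss.
RAID6_PARITY_DEVICES = 2

def stripe_recoverable(missing_devices):
    """True while no more than 2 stripe members are unreadable."""
    return missing_devices <= RAID6_PARITY_DEVICES

print(stripe_recoverable(2))  # True: degraded but readable
print(stripe_recoverable(4))  # False: the four 'device ... is missing' warnings above
```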
BTRFS Raid 6 corruption - please help with restore
Hi - I have some corruption on a 12 drive Raid 6 volume. Here's the basics - if someone could help with restore it would save me a ton of time (and some data loss - I have critical data backed up, but not all).

stuart@debian:~$ uname -a
Linux debian 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-7~bpo8+1 (2016-01-19) x86_64 GNU/Linux

stuart@debian:~$ sudo btrfs --version
btrfs-progs v4.4

sudo btrfs fi sh
Label: none  uuid: 7f994e11-e146-4dee-80f0-c16ac3073e91
	Total devices 12 FS bytes used 14.25TiB
	devid  1 size 2.73TiB used 167.14GiB path /dev/sdc
	devid  2 size 5.46TiB used 1.75TiB path /dev/sdd
	devid  3 size 5.46TiB used 1.75TiB path /dev/sde
	devid  4 size 2.73TiB used 167.14GiB path /dev/sdn
	devid  5 size 5.46TiB used 1.75TiB path /dev/sdf
	devid  6 size 2.73TiB used 1.75TiB path /dev/sdm
	devid  9 size 2.73TiB used 1.75TiB path /dev/sdj
	devid 10 size 2.73TiB used 1.75TiB path /dev/sdi
	devid 11 size 2.73TiB used 1.75TiB path /dev/sdg
	devid 13 size 2.73TiB used 1.75TiB path /dev/sdl
	devid 14 size 2.73TiB used 1.75TiB path /dev/sdk
	devid 15 size 2.73TiB used 1.75TiB path /dev/sdh

sudo mount -t btrfs -oro,recover /dev/sdc /data
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so.
dmesg: [ 5642.118303] BTRFS info (device sdc): enabling auto recovery [ 5642.118313] BTRFS info (device sdc): disk space caching is enabled [ 5642.118316] BTRFS: has skinny extents [ 5642.130145] btree_readpage_end_io_hook: 39 callbacks suppressed [ 5642.130148] BTRFS (device sdc): bad tree block start 13629298965300190098 47255853072384 [ 5642.130759] BTRFS (device sdc): bad tree block start 10584834564968318131 47255853105152 [ 5642.131289] BTRFS (device sdc): bad tree block start 2775635947161390306 47255853121536 [ 5644.730012] BTRFS: bdev /dev/sdc errs: wr 1664846, rd 210656, flush 18054, corrupt 0, gen 0 [ 5644.801291] BTRFS (device sdc): bad tree block start 8578409561856120450 47254279438336 [ 5644.801304] BTRFS (device sdc): bad tree block start 18087369170870825197 47254279454720 [ 5644.831199] BTRFS (device sdc): bad tree block start 9721403008164124267 47254277718016 [ 5644.842763] BTRFS (device sdc): bad tree block start 18087369170870825197 47254279454720 [ 5644.891992] BTRFS (device sdc): bad tree block start 17582844917171188859 47254194176000 [ 5644.951366] BTRFS (device sdc): bad tree block start 3962496226683925584 47254278586368 [ 5645.097168] BTRFS (device sdc): bad tree block start 17049293152820168762 47255619846144 [ 5646.159819] BTRFS: Failed to read block groups: -5 [ 5646.215905] BTRFS: open_ctree failed stuart@debian:~$ Finally: sudo btrfs restore /dev/sdc /backup checksum verify failed on 47255853072384 found 70F58CCA wanted AE18D5BC checksum verify failed on 47255853072384 found 70F58CCA wanted AE18D5BC checksum verify failed on 47255853072384 found 805B1FF7 wanted B76A652F checksum verify failed on 47255853072384 found 70F58CCA wanted AE18D5BC bytenr mismatch, want=47255853072384, have=13629298965300190098 Couldn't read chunk tree Could not open root, trying backup super warning, device 3 is missing warning, device 2 is missing warning, device 5 is missing warning, device 4 is missing bytenr mismatch, want=47255851761664, 
have=47255851958272
Couldn't read chunk root
Could not open root, trying backup super
warning, device 3 is missing
warning, device 2 is missing
warning, device 5 is missing
warning, device 4 is missing
bytenr mismatch, want=47255851761664, have=47255851958272
Couldn't read chunk root
Could not open root, trying backup super

Thanks in advance to anyone who might be able to suggest ideas.

Stuart
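The repeated 'Could not open root, trying backup super' lines above refer to the superblock mirrors that btrfs keeps at fixed byte offsets on every member device (offset values per the btrfs on-disk format; the helper name below is invented for the sketch):

```python
# btrfs writes superblock copies at fixed offsets: 64 KiB, 64 MiB and 256 GiB.
# "trying backup super" means the primary copy at 64 KiB was unusable and
# the remaining mirrors are probed next.
SUPERBLOCK_OFFSETS = (64 * 1024, 64 * 1024**2, 256 * 1024**3)
SUPERBLOCK_SIZE = 4096  # reserved on-disk area per copy

def present_mirrors(device_bytes):
    """Offsets of the superblock copies a device of this size can hold."""
    return [off for off in SUPERBLOCK_OFFSETS if off + SUPERBLOCK_SIZE <= device_bytes]

print(present_mirrors(3 * 1024**4))  # a multi-TiB disk holds all three copies
```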
Re: Can you help explain these OOM crashes?
On Thu, Feb 25, 2016 at 02:42:51PM -0500, Chris Mason wrote:
> Al, any ideas why get_anon_bdev is doing an atomic allocation here?
>
> if (ida_pre_get(&unnamed_dev_ida, GFP_ATOMIC) == 0)

Because the set() callback of sget() runs under sb_lock - it must be atomic with respect to scanning the list of superblocks in search of a match. And get_anon_bdev() is called from such callbacks... In principle, we could change the locking rules for the case when the test callback is NULL, except that get_anon_bdev() is also called from ns_set_super(), which *does* come along with a non-NULL test() (see mount_ns()), so that really doesn't help...
Re: Can you help explain these OOM crashes?
On Thu, Feb 25, 2016 at 11:20:29AM -0800, Marc MERLIN wrote: > Which kind of RAM am I missing? :) > > Thanks, > Marc > > [46320.200703] btrfs: page allocation failure: order:1, mode:0x2204020 > [46320.221174] CPU: 7 PID: 12576 Comm: btrfs Not tainted > 4.4.2-amd64-i915-volpreempt-20160213bc1 #3 > [46320.249161] Hardware name: System manufacturer System Product Name/P8H67-M > PRO, BIOS 3904 04/27/2013 > [46320.277878] 8801cdb636f0 8134ae0a > > [46320.301563] 8801cdb63788 81124ab6 0086 > 0086 > [46320.325248] 88021f5f5e00 fffe 8801cdb63750 > 8108c770 > [46320.348911] Call Trace: > [46320.357508] [] dump_stack+0x44/0x55 > [46320.374222] [] warn_alloc_failed+0x114/0x12c > [46320.393259] [] ? __wake_up+0x44/0x4b > [46320.410229] [] __alloc_pages_nodemask+0x7cb/0x84c > [46320.430671] [] kmem_getpages+0x5c/0x137 > [46320.448328] [] fallback_alloc+0x109/0x1b1 > [46320.466472] [] cache_alloc_node+0x123/0x130 > [46320.486219] [] kmem_cache_alloc+0xa4/0x14f > [46320.504600] [] ida_pre_get+0x32/0xb6 > [46320.521395] [] get_anon_bdev+0x1f/0xc8 Al, any ideas why get_anon_bdev is doing an atomic allocation here? if (ida_pre_get(_dev_ida, GFP_ATOMIC) == 0) [ rest of the oom below for reference ] -chris > [46320.538780] [] btrfs_init_fs_root+0x104/0x14e > [46320.557889] [] btrfs_get_fs_root+0xb7/0x1bf > [46320.576480] [] create_pending_snapshot+0x65e/0xb09 > [46320.596850] [] create_pending_snapshots+0x72/0x8e > [46320.616946] [] ? create_pending_snapshots+0x72/0x8e > [46320.637533] [] btrfs_commit_transaction+0x3a5/0x921 > [46320.658117] [] btrfs_mksubvol+0x2f4/0x408 > [46320.676044] [] ? wake_up_atomic_t+0x2c/0x2c > [46320.694626] [] > btrfs_ioctl_snap_create_transid+0x148/0x17a > [46320.716984] [] btrfs_ioctl_snap_create_v2+0xc7/0x110 > [46320.737714] [] btrfs_ioctl+0x545/0x2630 > [46320.755071] [] ? > mem_cgroup_charge_statistics.isra.23+0x33/0x69 > [46320.778689] [] ? __lru_cache_add+0x23/0x44 > [46320.796851] [] ? 
> lru_cache_add_active_or_unevictable+0x2d/0x6b > [46320.820156] [] ? set_pte_at+0x9/0xd > [46320.836407] [] ? handle_mm_fault+0x4f0/0xf06 > [46320.854969] [] ? do_mmap+0x2de/0x327 > [46320.871422] [] do_vfs_ioctl+0x3a1/0x414 > [46320.888953] [] ? __audit_syscall_entry+0xc0/0xe4 > [46320.908531] [] ? do_audit_syscall_entry+0x60/0x62 > [46320.928487] [] SyS_ioctl+0x57/0x79 > [46320.944553] [] entry_SYSCALL_64_fastpath+0x16/0x75 > [46320.964603] Mem-Info: > [46320.972173] active_anon:40431 inactive_anon:129104 isolated_anon:0 > [46320.972173] active_file:414564 inactive_file:956231 isolated_file:0 > [46320.972173] unevictable:1220 dirty:227385 writeback:4016 unstable:0 > [46320.972173] slab_reclaimable:46015 slab_unreclaimable:67059 > [46320.972173] mapped:11769 shmem:2295 pagetables:2380 bounce:0 > [46320.972173] free:13404 free_pcp:1790 free_cma:0 > [46321.081552] Node 0 DMA free:15888kB min:20kB low:24kB high:28kB > active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB > unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB > managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB > slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB > unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes > [46321.209469] lowmem_reserve[]: 0 3201 7672 7672 > [46321.223638] Node 0 DMA32 free:27816kB min:4640kB low:5800kB high:6960kB > active_anon:72844kB inactive_anon:205672kB active_file:725732kB > inactive_file:1484744kB unevictable:1524kB isolated(anon):0kB > isolated(file):0kB present:3362068kB managed:3283032kB mlocked:1524kB > dirty:206660kB writeback:189832kB mapped:18112kB shmem:3456kB > slab_reclaimable:75980kB slab_unreclaimable:103932kB kernel_stack:4800kB > pagetables:3932kB unstable:0kB bounce:0kB free_pcp:2160kB local_pcp:380kB > free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no > [46321.370242] lowmem_reserve[]: 0 0 4471 4471 > [46321.383748] Node 0 Normal free:14396kB min:6480kB low:8100kB high:9720kB > active_anon:88268kB inactive_anon:310756kB active_file:1001176kB > inactive_file:1915152kB unevictable:3356kB isolated(anon):0kB > isolated(file):0kB present:4708352kB managed:4578512kB mlocked:120259087644kB > dirty:274088kB writeback:255988kB mapped:28380kB shmem:5736kB > slab_reclaimable:108288kB slab_unreclaimable:163908kB kernel_stack:7312kB > pagetables:5540kB unstable:0kB bounce:0kB free_pcp:2468kB local_pcp:688kB > free_cma:2628kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no > [46321.534455] lowmem_reserve[]: 0 0 0 0 > [46321.546480] Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB > (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB > [46321.588226] Node 0 DMA32: 6462*4kB (UME) 1397*8kB (UM) 9*16kB (U)
Can you help explain these OOM crashes?
Which kind of RAM am I missing? :) Thanks, Marc [46320.200703] btrfs: page allocation failure: order:1, mode:0x2204020 [46320.221174] CPU: 7 PID: 12576 Comm: btrfs Not tainted 4.4.2-amd64-i915-volpreempt-20160213bc1 #3 [46320.249161] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013 [46320.277878] 8801cdb636f0 8134ae0a [46320.301563] 8801cdb63788 81124ab6 0086 0086 [46320.325248] 88021f5f5e00 fffe 8801cdb63750 8108c770 [46320.348911] Call Trace: [46320.357508] [] dump_stack+0x44/0x55 [46320.374222] [] warn_alloc_failed+0x114/0x12c [46320.393259] [] ? __wake_up+0x44/0x4b [46320.410229] [] __alloc_pages_nodemask+0x7cb/0x84c [46320.430671] [] kmem_getpages+0x5c/0x137 [46320.448328] [] fallback_alloc+0x109/0x1b1 [46320.466472] [] cache_alloc_node+0x123/0x130 [46320.486219] [] kmem_cache_alloc+0xa4/0x14f [46320.504600] [] ida_pre_get+0x32/0xb6 [46320.521395] [] get_anon_bdev+0x1f/0xc8 [46320.538780] [] btrfs_init_fs_root+0x104/0x14e [46320.557889] [] btrfs_get_fs_root+0xb7/0x1bf [46320.576480] [] create_pending_snapshot+0x65e/0xb09 [46320.596850] [] create_pending_snapshots+0x72/0x8e [46320.616946] [] ? create_pending_snapshots+0x72/0x8e [46320.637533] [] btrfs_commit_transaction+0x3a5/0x921 [46320.658117] [] btrfs_mksubvol+0x2f4/0x408 [46320.676044] [] ? wake_up_atomic_t+0x2c/0x2c [46320.694626] [] btrfs_ioctl_snap_create_transid+0x148/0x17a [46320.716984] [] btrfs_ioctl_snap_create_v2+0xc7/0x110 [46320.737714] [] btrfs_ioctl+0x545/0x2630 [46320.755071] [] ? mem_cgroup_charge_statistics.isra.23+0x33/0x69 [46320.778689] [] ? __lru_cache_add+0x23/0x44 [46320.796851] [] ? lru_cache_add_active_or_unevictable+0x2d/0x6b [46320.820156] [] ? set_pte_at+0x9/0xd [46320.836407] [] ? handle_mm_fault+0x4f0/0xf06 [46320.854969] [] ? do_mmap+0x2de/0x327 [46320.871422] [] do_vfs_ioctl+0x3a1/0x414 [46320.888953] [] ? __audit_syscall_entry+0xc0/0xe4 [46320.908531] [] ? 
do_audit_syscall_entry+0x60/0x62 [46320.928487] [] SyS_ioctl+0x57/0x79 [46320.944553] [] entry_SYSCALL_64_fastpath+0x16/0x75 [46320.964603] Mem-Info: [46320.972173] active_anon:40431 inactive_anon:129104 isolated_anon:0 [46320.972173] active_file:414564 inactive_file:956231 isolated_file:0 [46320.972173] unevictable:1220 dirty:227385 writeback:4016 unstable:0 [46320.972173] slab_reclaimable:46015 slab_unreclaimable:67059 [46320.972173] mapped:11769 shmem:2295 pagetables:2380 bounce:0 [46320.972173] free:13404 free_pcp:1790 free_cma:0 [46321.081552] Node 0 DMA free:15888kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [46321.209469] lowmem_reserve[]: 0 3201 7672 7672 [46321.223638] Node 0 DMA32 free:27816kB min:4640kB low:5800kB high:6960kB active_anon:72844kB inactive_anon:205672kB active_file:725732kB inactive_file:1484744kB unevictable:1524kB isolated(anon):0kB isolated(file):0kB present:3362068kB managed:3283032kB mlocked:1524kB dirty:206660kB writeback:189832kB mapped:18112kB shmem:3456kB slab_reclaimable:75980kB slab_unreclaimable:103932kB kernel_stack:4800kB pagetables:3932kB unstable:0kB bounce:0kB free_pcp:2160kB local_pcp:380kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no [46321.370242] lowmem_reserve[]: 0 0 4471 4471 [46321.383748] Node 0 Normal free:14396kB min:6480kB low:8100kB high:9720kB active_anon:88268kB inactive_anon:310756kB active_file:1001176kB inactive_file:1915152kB unevictable:3356kB isolated(anon):0kB isolated(file):0kB present:4708352kB managed:4578512kB mlocked:120259087644kB dirty:274088kB writeback:255988kB mapped:28380kB shmem:5736kB slab_reclaimable:108288kB slab_unreclaimable:163908kB kernel_stack:7312kB pagetables:5540kB unstable:0kB bounce:0kB free_pcp:2468kB local_pcp:688kB free_cma:2628kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [46321.534455] lowmem_reserve[]: 0 0 0 0 [46321.546480] Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB [46321.588226] Node 0 DMA32: 6462*4kB (UME) 1397*8kB (UM) 9*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 37168kB [46321.628535] Node 0 Normal: 4296*4kB (UMEC) 880*8kB (UMEC) 130*16kB (UMC) 9*32kB (C) 1*64kB (C) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 26656kB [46321.672956] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [46321.699161] 1267593 total pagecache pages [46321.712115] 933 pages in swap
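A note on reading the trace: 'order:1' in the first line means the allocator needed 2^1 physically contiguous pages, which is why the per-zone free-list dumps near the end count blocks per size (4kB, 8kB, ...); plenty of RAM was nominally free, just not in contiguous runs. The arithmetic, as a sketch:

```python
PAGE_SIZE = 4096  # base page size on x86-64

def alloc_request_bytes(order):
    """An order-n page allocation asks for 2**n physically contiguous pages."""
    return (2 ** order) * PAGE_SIZE

# The failing request in the trace above: order:1, i.e. two contiguous pages.
print(alloc_request_bytes(1))
```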
[PATCH v2 03/13] btrfs: maintain consistency in logging to help debugging
The optional label may or may not be set, or it might be set at some later time. However, scripts that search the kernel logs while debugging need the logs to be consistent, so the log search keywords shouldn't depend on optional variables; the fsid is a better key.

Signed-off-by: Anand Jain
---
v2: fix commit log

 fs/btrfs/volumes.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9860b10..36108e9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1080,13 +1080,8 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
 	ret = device_list_add(path, disk_super, devid, fs_devices_ret);
 	if (ret > 0) {
-		if (disk_super->label[0]) {
-			printk(KERN_INFO "BTRFS: device label %s ", disk_super->label);
-		} else {
-			printk(KERN_INFO "BTRFS: device fsid %pU ", disk_super->fsid);
-		}
-
-		printk(KERN_CONT "devid %llu transid %llu %s\n", devid, transid, path);
+		printk(KERN_INFO "BTRFS: device fsid %pU devid %llu transid %llu %s\n",
+			disk_super->fsid, devid, transid, path);
 		ret = 0;
 	}
 	if (!ret && fs_devices_ret)
--
2.7.0
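To see why the single format helps: a log-scraping script keyed on 'device fsid' misses every scan line from a labelled filesystem under the old code. A userspace sketch of the before/after behaviour (Python stand-in for the printk calls; function names invented for the illustration):

```python
def old_scan_line(label, fsid, devid, transid, path):
    # Pre-patch behaviour: the message prefix depends on whether a label is set.
    head = ("BTRFS: device label %s " % label) if label else ("BTRFS: device fsid %s " % fsid)
    return head + "devid %d transid %d %s" % (devid, transid, path)

def new_scan_line(fsid, devid, transid, path):
    # Post-patch behaviour: one fixed, fsid-keyed format, label or not.
    return "BTRFS: device fsid %s devid %d transid %d %s" % (fsid, devid, transid, path)

fsid = "7f994e11-e146-4dee-80f0-c16ac3073e91"
print("device fsid" in old_scan_line("media", fsid, 1, 42, "/dev/sdc"))  # False: label wins
print("device fsid" in new_scan_line(fsid, 1, 42, "/dev/sdc"))           # True: stable key
```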
Re: btrfs check help
Vincent Olivier wrote on 2015/11/27 06:25 -0500:
On Nov 26, 2015, at 10:03 PM, Vincent Olivier <vinc...@up4.com> wrote:
On Nov 25, 2015, at 8:44 PM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:

Vincent Olivier wrote on 2015/11/25 11:51 -0500:
> I should probably point out that there is 64GB of RAM on this machine and it's a dual Xeon processor (LGA2011-3) system. Also, there is only Btrfs served via Samba, and the kernel panic was caused by Btrfs (as per what I remember from the log on the screen just before I rebooted) and happened in the middle of the night when zero (0) clients were connected.
>
> You will find below the full "btrfs check" log for each device in the order it is listed by "btrfs fi show".

There is really no need to do such a thing: btrfs is able to manage multiple devices, so calling btrfsck on any one of them is enough as long as it's not hugely damaged.

> Can I get a strong confirmation that I should run with the "--repair" option on each device? Thanks.

YES.

The inode nbytes fix is *VERY* safe as long as it's the only error. Although that's not all that convincing, since the inode nbytes fix code was written by myself and authors always tend to believe their code is good. But at least some other users with more complicated problems (involving inode nbytes errors) fixed them with it.

The last decision is still yours anyway.

> I will do it on the first device from the "fi show" output and report.

ok this doesn't look good. i ran --repair and check again and it looks even worse. please help.

[root@3dcpc5 ~]# btrfs check --repair /dev/sdk
enabling repair mode
Checking filesystem on /dev/sdk
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
reset nbytes for ino 1341670 root 5
reset nbytes for ino 1341670 root 11406

As mentioned by other guys, inode nbytes seems to be fixed. But to make sure: is the inode a directory or a normal file?
warning line 3653 Seems to be a unexpected warning. The subvolume root seems to be shared by other subvolume. It may be one corner case for inode nbytes repair code. But it seems no harm yet. checking csums checking root refs found 19343374874998 bytes used err is 0 total csum bytes: 18863243900 total tree bytes: 27413118976 total fs tree bytes: 4455694336 total extent tree bytes: 3077373952 btree space waste bytes: 2882193883 file data blocks allocated: 19461564538880 referenced 20155355832320 root@3dcpc5 ~]# btrfs check /dev/sdk Checking filesystem on /dev/sdk UUID: 6a742786-070d-4557-9e67-c73b84967bf5 checking extents checking free space cache block group 53328591454208 has wrong amount of free space failed to load free space cache for block group 53328591454208 block group 53329665196032 has wrong amount of free space failed to load free space cache for block group 53329665196032 Wanted offset 58836887044096, found 58836887011328 Wanted offset 58836887044096, found 58836887011328 cache appears valid but isnt 58836887011328 Wanted offset 60505481887744, found 60505481805824 Wanted offset 60505481887744, found 60505481805824 cache appears valid but isnt 60505481805824 Wanted bytes 16384, found 81920 for off 60979001966592 Wanted bytes 1073725440, found 81920 for off 60979001966592 cache appears valid but isnt 60979001950208 Wanted offset 61297908056064, found 61297908006912 Wanted offset 61297908056064, found 61297908006912 cache appears valid but isnt 61297903271936 Wanted bytes 32768, found 16384 for off 61711301296128 Wanted bytes 1066319872, found 16384 for off 61711301296128 cache appears valid but isnt 61711293874176 There is no free space entry for 62691824041984-62691824058368 There is no free space entry for 62691824041984-62692693901312 cache appears valid but isnt 62691620159488 There is no free space entry for 63723505205248-63723505221632 There is no free space entry for 63723505205248-63724559794176 cache appears valid but isnt 63723486052352 Wanted 
bytes 32768, found 16384 for off 64746920902656 Wanted bytes 914849792, found 16384 for off 64746920902656 cache appears valid but isnt 64746762010624 There is no free space entry for 65770368401408-65770368434176 There is no free space entry for 65770368401408-6577710720 cache appears valid but isnt 65770037968896 Wanted offset 66758954270720, found 66758954221568 Wanted offset 66758954270720, found 66758954221568 cache appears valid but isnt 66758954188800 block group 70204591702016 has wrong amount of free space failed to load free space cache for block group 70204591702016 block group 70205665443840 has wrong amount of free space failed to load free space cache for block group 70205665443840 block group 70206739185664 has wrong amount of free space failed to load free space cache for block group 70206739185664 Wanted offset 70216543715328, found 70216543698944 Wanted offset 70216543715328, found 70216543698944 cache appears va
Re: btrfs check help
My experience/interpretation of the 2 checks is that it is OK; see some more comments inserted below. Hopefully a developer of btrfs-progs can comment in more detail.

> [root@3dcpc5 ~]# btrfs check --repair /dev/sdk
> enabling repair mode
> Checking filesystem on /dev/sdk
> UUID: 6a742786-070d-4557-9e67-c73b84967bf5
> checking extents
> Fixed 0 roots.
> checking free space cache
> cache and super generation don't match, space cache will be invalidated

This might be there because of the crash earlier, but a cache invalidation should not be a problem.

> checking fs roots
> reset nbytes for ino 1341670 root 5
> reset nbytes for ino 1341670 root 11406

At least the nbytes error seems to be fixed.

> warning line 3653
> checking csums
> checking root refs
> found 19343374874998 bytes used err is 0
> total csum bytes: 18863243900
> total tree bytes: 27413118976
> total fs tree bytes: 4455694336
> total extent tree bytes: 3077373952
> btree space waste bytes: 2882193883
> file data blocks allocated: 19461564538880
> referenced 20155355832320

I think the second, read-only check partly can't deal with the just-invalidated space cache (I assume you haven't mounted and/or used the filesystem read-write in between). But even if the space cache weren't touched by the --repair check, my experience is that those errors, like this one in dmesg on my system:

[38018.645187] BTRFS info (device sdi): The free space cache file (6258971115520) is invalid. skip it

will disappear over time as the filesystem is filled/used. This particular error is from a backup fs where one disk had gone bad. A btrfs replace still worked, and just after that I saw many of those errors, but now, after a few weeks, they are mostly gone. I did not explicitly unmount or check --repair the fs; I just had to reboot the system for another reason.
Your kernel+tools is new enough, you probably want to have a look at the 'Space cache control' options on the wiki: https://btrfs.wiki.kernel.org/index.php/Mount_options before you decide what to do. > root@3dcpc5 ~]# btrfs check /dev/sdk > Checking filesystem on /dev/sdk > UUID: 6a742786-070d-4557-9e67-c73b84967bf5 > checking extents > checking free space cache > block group 53328591454208 has wrong amount of free space > failed to load free space cache for block group 53328591454208 > block group 53329665196032 has wrong amount of free space > failed to load free space cache for block group 53329665196032 > Wanted offset 58836887044096, found 58836887011328 > Wanted offset 58836887044096, found 58836887011328 > cache appears valid but isnt 58836887011328 > Wanted offset 60505481887744, found 60505481805824 > Wanted offset 60505481887744, found 60505481805824 > cache appears valid but isnt 60505481805824 > Wanted bytes 16384, found 81920 for off 60979001966592 > Wanted bytes 1073725440, found 81920 for off 60979001966592 > cache appears valid but isnt 60979001950208 > Wanted offset 61297908056064, found 61297908006912 > Wanted offset 61297908056064, found 61297908006912 > cache appears valid but isnt 61297903271936 > Wanted bytes 32768, found 16384 for off 61711301296128 > Wanted bytes 1066319872, found 16384 for off 61711301296128 > cache appears valid but isnt 61711293874176 > There is no free space entry for 62691824041984-62691824058368 > There is no free space entry for 62691824041984-62692693901312 > cache appears valid but isnt 62691620159488 > There is no free space entry for 63723505205248-63723505221632 > There is no free space entry for 63723505205248-63724559794176 > cache appears valid but isnt 63723486052352 > Wanted bytes 32768, found 16384 for off 64746920902656 > Wanted bytes 914849792, found 16384 for off 64746920902656 > cache appears valid but isnt 64746762010624 > There is no free space entry for 65770368401408-65770368434176 > There is no 
free space entry for 65770368401408-6577710720 > cache appears valid but isnt 65770037968896 > Wanted offset 66758954270720, found 66758954221568 > Wanted offset 66758954270720, found 66758954221568 > cache appears valid but isnt 66758954188800 > block group 70204591702016 has wrong amount of free space > failed to load free space cache for block group 70204591702016 > block group 70205665443840 has wrong amount of free space > failed to load free space cache for block group 70205665443840 > block group 70206739185664 has wrong amount of free space > failed to load free space cache for block group 70206739185664 > Wanted offset 70216543715328, found 70216543698944 > Wanted offset 70216543715328, found 70216543698944 > cache appears valid but isnt 70216537079808 > Wanted offset 71025067474944, found 71025067409408 > Wanted offset 71025067474944, found 71025067409408 > cache appears valid but isnt 71025064673280 > Wanted offset 71455641354240, found 71455641337856 > Wanted offset 71455641354240, found 71455641337856 > cache appears valid but isnt 71455635144704 > block group 71662867316736 has wrong amount of free space > failed to load free space
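If, after reading the wiki page mentioned above, you decide the stale v1 space cache should simply be rebuilt, the usual route is a one-time mount option. This is a sketch only; the device and mountpoint are taken from this thread's example and must be adjusted to your setup:

```shell
# Rebuild the free-space cache at the next mount; the option only needs
# to be passed once, after which normal mounts can resume.
mount -o clear_cache /dev/sdk /mnt

# Alternatively, disable the space cache altogether (slower allocation,
# but it removes the cache as a variable while debugging):
mount -o nospace_cache /dev/sdk /mnt
```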
Re: btrfs check help
> On Nov 26, 2015, at 10:03 PM, Vincent Olivier <vinc...@up4.com> wrote:
>
>> On Nov 25, 2015, at 8:44 PM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
>>
>> Vincent Olivier wrote on 2015/11/25 11:51 -0500:
>>> I should probably point out that there is 64GB of RAM on this machine and
>>> it's a dual Xeon processor (LGA2011-3) system. Also, there is only Btrfs
>>> served via Samba and the kernel panic was caused by Btrfs (as per what I
>>> remember from the log on the screen just before I rebooted) and happened in
>>> the middle of the night when zero (0) clients were connected.
>>>
>>> You will find below the full "btrfs check" log for each device in the order
>>> it is listed by "btrfs fi show".
>>
>> There is really no need to do such a thing, as btrfs is able to manage
>> multiple devices; calling btrfsck on any of them is enough as long as it's
>> not hugely damaged.
>>
>>> Can I get a strong confirmation that I should run with the "--repair" option
>>> on each device? Thanks.
>>
>> YES.
>>
>> Inode nbytes fix is *VERY* safe as long as it's the only error.
>>
>> Although it's not that convincing, since the inode nbytes fix code was written
>> by myself and authors always tend to believe their code is good.
>> But at least, some other users with more complicated problems (with inode
>> nbytes errors) fixed them.
>>
>> The last decision is still on you anyway.
>
> I will do it on the first device from the "fi show" output and report.

ok this doesn't look good. i ran --repair and check again and it looks even worse. please help.

[root@3dcpc5 ~]# btrfs check --repair /dev/sdk
enabling repair mode
Checking filesystem on /dev/sdk
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
reset nbytes for ino 1341670 root 5
reset nbytes for ino 1341670 root 11406
warning line 3653
checking csums
checking root refs
found 19343374874998 bytes used err is 0
total csum bytes: 18863243900
total tree bytes: 27413118976
total fs tree bytes: 4455694336
total extent tree bytes: 3077373952
btree space waste bytes: 2882193883
file data blocks allocated: 19461564538880
 referenced 20155355832320

[root@3dcpc5 ~]# btrfs check /dev/sdk
Checking filesystem on /dev/sdk
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents
checking free space cache
block group 53328591454208 has wrong amount of free space
failed to load free space cache for block group 53328591454208
block group 53329665196032 has wrong amount of free space
failed to load free space cache for block group 53329665196032
Wanted offset 58836887044096, found 58836887011328
Wanted offset 58836887044096, found 58836887011328
cache appears valid but isnt 58836887011328
Wanted offset 60505481887744, found 60505481805824
Wanted offset 60505481887744, found 60505481805824
cache appears valid but isnt 60505481805824
Wanted bytes 16384, found 81920 for off 60979001966592
Wanted bytes 1073725440, found 81920 for off 60979001966592
cache appears valid but isnt 60979001950208
Wanted offset 61297908056064, found 61297908006912
Wanted offset 61297908056064, found 61297908006912
cache appears valid but isnt 61297903271936
Wanted bytes 32768, found 16384 for off 61711301296128
Wanted bytes 1066319872, found 16384 for off 61711301296128
cache appears valid but isnt 61711293874176
There is no free space entry for 62691824041984-62691824058368
There is no free space entry for 62691824041984-62692693901312
cache appears valid but isnt 62691620159488
There is no free space entry for 63723505205248-63723505221632
There is no free space entry for 63723505205248-63724559794176
cache appears valid but isnt 63723486052352
Wanted bytes 32768, found 16384 for off 64746920902656
Wanted bytes 914849792, found 16384 for off 64746920902656
cache appears valid but isnt 64746762010624
There is no free space entry for 65770368401408-65770368434176
There is no free space entry for 65770368401408-6577710720
cache appears valid but isnt 65770037968896
Wanted offset 66758954270720, found 66758954221568
Wanted offset 66758954270720, found 66758954221568
cache appears valid but isnt 66758954188800
block group 70204591702016 has wrong amount of free space
failed to load free space cache for block group 70204591702016
block group 70205665443840 has wrong amount of free space
failed to load free space cache for block group 70205665443840
block group 70206739185664 has wrong amount of free space
failed to load free space cache for block group 70206739185664
Wanted offset 70216543715328, found 70216543698944
Wanted offset 70216543715328, found 70216543698944
cache appears valid b
Re: btrfs check help
On Fri, Nov 27, 2015 at 4:25 AM, Vincent Olivier wrote:
> [root@3dcpc5 ~]# btrfs check --repair /dev/sdk
> enabling repair mode
> Checking filesystem on /dev/sdk
> UUID: 6a742786-070d-4557-9e67-c73b84967bf5
> checking extents
> Fixed 0 roots.
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> reset nbytes for ino 1341670 root 5
> reset nbytes for ino 1341670 root 11406
> warning line 3653

I'm not sure what that last line means.

> [root@3dcpc5 ~]# btrfs check /dev/sdk
> Checking filesystem on /dev/sdk
> UUID: 6a742786-070d-4557-9e67-c73b84967bf5
> checking extents
> checking free space cache
> block group 53328591454208 has wrong amount of free space
> failed to load free space cache for block group 53328591454208
> block group 53329665196032 has wrong amount of free space
> failed to load free space cache for block group 53329665196032
> Wanted offset 58836887044096, found 58836887011328
> Wanted offset 58836887044096, found 58836887011328
> cache appears valid but isnt 58836887011328
> Wanted offset 60505481887744, found 60505481805824
> Wanted offset 60505481887744, found 60505481805824
> cache appears valid but isnt 60505481805824
> Wanted bytes 16384, found 81920 for off 60979001966592
> Wanted bytes 1073725440, found 81920 for off 60979001966592
> cache appears valid but isnt 60979001950208
> Wanted offset 61297908056064, found 61297908006912
> Wanted offset 61297908056064, found 61297908006912
> cache appears valid but isnt 61297903271936
> Wanted bytes 32768, found 16384 for off 61711301296128
> Wanted bytes 1066319872, found 16384 for off 61711301296128
> cache appears valid but isnt 61711293874176
> There is no free space entry for 62691824041984-62691824058368
> There is no free space entry for 62691824041984-62692693901312
> cache appears valid but isnt 62691620159488
> There is no free space entry for 63723505205248-63723505221632
> There is no free space entry for 63723505205248-63724559794176
> cache appears valid but isnt 63723486052352
> Wanted bytes 32768, found 16384 for off 64746920902656
> Wanted bytes 914849792, found 16384 for off 64746920902656
> cache appears valid but isnt 64746762010624
> There is no free space entry for 65770368401408-65770368434176
> There is no free space entry for 65770368401408-6577710720
> cache appears valid but isnt 65770037968896
> Wanted offset 66758954270720, found 66758954221568
> Wanted offset 66758954270720, found 66758954221568
> cache appears valid but isnt 66758954188800
> block group 70204591702016 has wrong amount of free space
> failed to load free space cache for block group 70204591702016
> block group 70205665443840 has wrong amount of free space
> failed to load free space cache for block group 70205665443840
> block group 70206739185664 has wrong amount of free space
> failed to load free space cache for block group 70206739185664
> Wanted offset 70216543715328, found 70216543698944
> Wanted offset 70216543715328, found 70216543698944
> cache appears valid but isnt 70216537079808
> Wanted offset 71025067474944, found 71025067409408
> Wanted offset 71025067474944, found 71025067409408
> cache appears valid but isnt 71025064673280
> Wanted offset 71455641354240, found 71455641337856
> Wanted offset 71455641354240, found 71455641337856
> cache appears valid but isnt 71455635144704
> block group 71662867316736 has wrong amount of free space
> failed to load free space cache for block group 71662867316736
> block group 71663941058560 has wrong amount of free space
> failed to load free space cache for block group 71663941058560
> There is no free space entry for 72725872967680-72725872984064
> There is no free space entry for 72725872967680-72726945464320
> cache appears valid but isnt 72725871722496
> block group 73207981801472 has wrong amount of free space
> failed to load free space cache for block group 73207981801472
> found 19343374940534 bytes used err is -22
> total csum bytes: 18863243900
> total tree bytes: 27413184512
> total fs tree bytes: 4455727104
> total extent tree bytes: 3077406720
> btree space waste bytes: 2882234096
> file data blocks allocated: 19461573357568
> referenced 20155367563264

Except for the "bytes used err is -22", I think this is just acknowledging that the space caches are invalid, i.e. not a surprise. They should get rebuilt at mount time; depending on the size of the file system, that might take a while (?).

--
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
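[Editor's note] Chris's point that an invalidated space cache gets rebuilt at mount time can also be triggered explicitly with the `clear_cache` btrfs mount option. A dry-run sketch, assuming a hypothetical device and mount point (it only builds and prints the commands rather than executing them):

```shell
#!/bin/sh
# Dry-run sketch: print the commands that would force a free-space-cache
# rebuild. /dev/sdk and /mnt/pool are placeholders for the real device
# and mount point in this thread.
DEV=/dev/sdk
MNT=/mnt/pool

# clear_cache invalidates the on-disk space cache so the kernel
# regenerates it during the mount
clear_cmd="mount -o clear_cache $DEV $MNT"
remount_cmd="mount $DEV $MNT"

echo "$clear_cmd"
echo "umount $MNT"
echo "$remount_cmd"
```

A subsequent ordinary mount then uses the freshly built cache, which is consistent with Chris's reading that the cache warnings themselves are not alarming.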
Re: btrfs check help
> On Nov 25, 2015, at 8:44 PM, Qu Wenruo wrote:
>
> Vincent Olivier wrote on 2015/11/25 11:51 -0500:
>> I should probably point out that there is 64GB of RAM on this machine and
>> it’s a dual Xeon processor (LGA2011-3) system. Also, there is only Btrfs
>> served via Samba, the kernel panic was caused by Btrfs (as per what I
>> remember from the log on the screen just before I rebooted), and it
>> happened in the middle of the night when zero (0) clients were connected.
>>
>> You will find below the full “btrfs check” log for each device in the
>> order it is listed by “btrfs fi show”.
>
> There is really no need to do such a thing: since btrfs manages multiple
> devices itself, calling btrfsck on any one of them is enough, as long as
> the filesystem is not hugely damaged.
>
>> Can I get a strong confirmation that I should run with the “--repair”
>> option on each device? Thanks.
>
> YES.
>
> The inode nbytes fix is *VERY* safe as long as it's the only error.
>
> Although that's not all that convincing, since the inode nbytes fix code
> was written by myself, and authors always tend to believe their code is
> good. But at least some other users with more complicated problems
> (involving inode nbytes errors) have fixed them with it.
>
> The final decision is still yours anyway.

I will do it on the first device from the “fi show” output and report.

Thanks,

Vincent
Re: btrfs check help
I should probably point out that there is 64GB of RAM on this machine and it’s a dual Xeon processor (LGA2011-3) system. Also, there is only Btrfs served via Samba, the kernel panic was caused by Btrfs (as per what I remember from the log on the screen just before I rebooted), and it happened in the middle of the night when zero (0) clients were connected.

You will find below the full “btrfs check” log for each device in the order it is listed by “btrfs fi show”.

Can I get a strong confirmation that I should run with the “--repair” option on each device? Thanks.

Vincent

Checking filesystem on /dev/sdk
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdp
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [O]
checking free space cache [o]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdi
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [.]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdq
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [o]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdh
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdm
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [O]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdj
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [.]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdo
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [O]
checking free space cache [.]
checking fs roots [o]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
 referenced 20138885959680

Checking filesystem on /dev/sdg
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [o]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes:
Re: btrfs check help
[...]
> Can I get a strong confirmation that I should run with the “--repair”
> option on each device? Thanks.
>
> Vincent
>
> Checking filesystem on /dev/sdk
> UUID: 6a742786-070d-4557-9e67-c73b84967bf5
> checking extents [o]
> checking free space cache [.]
> root 5 inode 1341670 errors 400, nbytes wrong
> root 11406 inode 1341670 errors 400, nbytes wrong
[...]

I just remembered that I have seen this kind of error before; luckily, I found the btrfs check output (August 2015) on a backup of an old snapshot. In my case it was on a raid5 fs from November 2013: 7 small txt files (all several hundred bytes in size), and the 7 errors were repeated across about 10 snapshots.

I did

# find . -inum 

to find the files. 2 of the 7 were still in the latest/current subvol, and I just recreated them. The errors from the older snapshots are still there, as far as I remember from the last btrfs check I did (with kernel 4.3.0, tools 4.3.x). The fs was converted to raid10 three months ago.

As I have also seen other spurious errors (as in https://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg48325.html ), I won't run a repair until I see proof that this 'errors 400, nbytes wrong' is a risk to file-server stability.

I also see that on an archive clone fs with these 10 old snapshots (created via send|receive), there is no error.

In your case, it is likely just 1 small file in the root subvol (5) and the same allocation in the other subvol (11406), so maybe you can fix this the way I did and not run '--repair'.
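[Editor's note] The `find . -inum` trick above maps an inode number from btrfs check output back to a path. A self-contained demonstration (the temp directory and file name are made up for the example; on the real filesystem you would run find from the mounted subvolume root with the reported inode, e.g. 1341670):

```shell
#!/bin/sh
# Create a throwaway file, then recover its path purely from its inode
# number, the same way a path is recovered from "root 5 inode 1341670".
dir=$(mktemp -d)
touch "$dir/example.txt"

# stat -c %i prints the inode number of a file (GNU coreutils)
ino=$(stat -c %i "$dir/example.txt")

# -xdev keeps find on one filesystem: inode numbers are only unique
# per filesystem, so crossing mount points could give false matches
found=$(find "$dir" -xdev -inum "$ino")
echo "$found"

rm -r "$dir"
```

Each snapshot is its own subtree, so running this once per snapshot root shows which snapshots still reference the flagged inode.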
Re: btrfs check help
On 2015-11-24 12:06, Vincent Olivier wrote:
> Hi,
>
> Woke up this morning with a kernel panic (for which I do not have details).
> Please find below the output for btrfs check. Is this normal? What should I
> do? Arch Linux 4.2.5. Btrfs-utils 4.3.1. 17x4TB RAID10.

You get bonus points for being on a reasonably up-to-date kernel and userspace :)

This is actually a pretty tame check result for a filesystem that's been through a kernel panic. I think everything listed here is safe for check to fix, but I would suggest waiting until the devs provide opinions before actually running with --repair. I would also suggest comparing results between the different devices in the FS; if things are drastically different, you may have issues that check can't fix on its own.

> [root@3dcpc5 ~]# btrfs check /dev/sdk
> Checking filesystem on /dev/sdk
> UUID: 6a742786-070d-4557-9e67-c73b84967bf5
> checking extents
> checking free space cache
> checking fs roots

These next two lines are errors, but I'm not 100% certain whether it's safe to have check fix them:

> root 5 inode 1341670 errors 400, nbytes wrong
> root 11406 inode 1341670 errors 400, nbytes wrong

This next one is also an error, and I am fairly certain that it's safe to have check fix it as long as the number at the end is not too big:

> found 19328809638262 bytes used err is 1

The rest is just reference info:

> total csum bytes: 18849042724
> total tree bytes: 27389886464
> total fs tree bytes: 4449746944
> total extent tree bytes: 3075457024
> btree space waste bytes: 2880474254

The only other thing worth mentioning is that if the numbers on these next two lines don't match, you may be missing some writes from right before the crash:

> file data blocks allocated: 19430708535296
> referenced 20123773407232
Re: btrfs check help
On Tue, Nov 24, 2015 at 03:28:28PM -0500, Austin S Hemmelgarn wrote:
> On 2015-11-24 12:06, Vincent Olivier wrote:
> > Hi,
> >
> > Woke up this morning with a kernel panic (for which I do not have details).
> > Please find below the output for btrfs check. Is this normal ? What should
> > I do ? Arch Linux 4.2.5. Btrfs-utils 4.3.1. 17x4TB RAID10.
>
> You get bonus points for being on a reasonably up-to-date kernel and
> userspace :)
>
> This is actually a pretty tame check result for a filesystem that's
> been through kernel panic. I think everything listed here is safe
> for check to fix, but I would suggest waiting until the devs provide
> opinions before actually running with --repair. I would also
> suggest comparing results between the different devices in the FS,
> if things are drastically different, you may have issues that check
> can't fix on it's own.
>
> > [root@3dcpc5 ~]# btrfs check /dev/sdk
> > Checking filesystem on /dev/sdk
> > UUID: 6a742786-070d-4557-9e67-c73b84967bf5
> > checking extents
> > checking free space cache
> > checking fs roots
>
> These next two lines are errors, but I'm not 100% certain if it's
> safe to have check fix them:
>
> > root 5 inode 1341670 errors 400, nbytes wrong
> > root 11406 inode 1341670 errors 400, nbytes wrong

I think so, yes.

> This next one is also an error, and I am fairly certain that it's
> safe to have check fix as long as the number at the end is not too
> big.
>
> > found 19328809638262 bytes used err is 1

Agreed.

Hugo.

> The rest is just reference info
>
> > total csum bytes: 18849042724
> > total tree bytes: 27389886464
> > total fs tree bytes: 4449746944
> > total extent tree bytes: 3075457024
> > btree space waste bytes: 2880474254
>
> The only other thing I know that's worth mentioning is that if the
> numbers on these next two lines don't match, you may be missing some
> writes from right before the crash.
>
> > file data blocks allocated: 19430708535296
> > referenced 20123773407232

--
Hugo Mills | Great films about cricket: Umpire of the Rising Sun
hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: E2AB1DE4
btrfs check help
Hi,

Woke up this morning with a kernel panic (for which I do not have details). Please find below the output for btrfs check. Is this normal? What should I do? Arch Linux 4.2.5. Btrfs-utils 4.3.1. 17x4TB RAID10.

Regards,

Vincent

[root@3dcpc5 ~]# btrfs check /dev/sdk
Checking filesystem on /dev/sdk
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents
checking free space cache
checking fs roots
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328809638262 bytes used err is 1
total csum bytes: 18849042724
total tree bytes: 27389886464
total fs tree bytes: 4449746944
total extent tree bytes: 3075457024
btree space waste bytes: 2880474254
file data blocks allocated: 19430708535296
 referenced 20123773407232
Re: corrupted RAID1: unsuccessful recovery / help needed
Lukas Pirl posted on Fri, 30 Oct 2015 10:43:41 +1300 as excerpted:

> If there is one subvolume that contains all other (read only) snapshots
> and there is insufficient storage to copy them all separately:
> Is there an elegant way to preserve those when moving the data across
> disks?

AFAIK, no elegant way without a writable mount.

Tho I'm not sure, btrfs send to a btrfs elsewhere using receive may work, since you did specify read-only snapshots, which is what send normally works with in order to avoid changes to the snapshot while it's sending it. My own use-case doesn't involve either snapshots or send/receive, however, so I'm not sure if send can work with a read-only filesystem or not, but I think its normal method of operation is to create those read-only snapshots itself, which of course would require a writable filesystem, so I'm guessing it won't work unless you can convince it to use the read-only mounts as-is.

The less elegant way would involve manual deduplication. Copy one snapshot, then another, and dedup what hasn't changed between the two, then add a third and dedup again. ... Depending on the level of dedup (file vs block level) and the level of change in your filesystem, this should ultimately take about the same amount of space as a full backup plus a series of incrementals.

Meanwhile, this does reinforce the point that snapshots don't replace full backups, that being the reason I don't use them here, since if the filesystem goes bad, it'll very likely take all the snapshots with it.

Snapshots do tend to be pretty convenient, arguably /too/ convenient and near-zero-cost to make, as people then tend to just do scheduled snapshots, without thinking about their overhead and maintenance costs on the filesystem, until they already have problems. I'm not sure if you are a regular list reader and have thus seen my normal spiel on btrfs snapshot scaling and recommended limits to avoid problems or not, so if not, here's a slightly condensed version...
Btrfs has scaling issues that appear when trying to manage too many snapshots. These tend to appear first in tools like balance and check, where time to process a filesystem goes up dramatically as the number of snapshots increases, to the point where it can become entirely impractical to manage at all somewhere near the 100k snapshots range, and is already dramatically affecting runtime at 10k snapshots.

As a result, I recommend keeping per-subvol snapshots to 250-ish, which will allow snapshotting four subvolumes while still keeping total filesystem snapshots to 1000, or eight subvolumes at a filesystem total of 2000 snapshots, levels where the scaling issues should remain well within control. And 250-ish snapshots per subvolume is actually very reasonable even with half-hour scheduled snapshotting, provided a reasonable scheduled snapshot thinning program is also implemented, cutting say to hourly after six hours, six-hourly after a day, 12-hourly after 2 days, daily after a week, and weekly after four weeks to a quarter (13 weeks). Out beyond a quarter or two, certainly within a year, longer term backups to other media should be done, and snapshots beyond that can be removed entirely, freeing up the space the old snapshots kept locked down and helping to keep the btrfs healthy and functioning well within its practical scalability limits.

Because a balance that takes a month to complete because it's dealing with a few hundred k snapshots is in practice (for most people) not worthwhile to do at all, and also in practice, a year or even six months out, are you really going to care about the precise half-hour snapshot, or is the next daily or weekly snapshot going to be just as good, and a whole lot easier to find among a couple hundred snapshots than hundreds of thousands?
If you have far too many snapshots, perhaps this sort of thinning strategy will as well allow you to copy and dedup only key snapshots, say weekly plus daily for the last week, doing the backup thing manually, as well, modifying the thinning strategy accordingly if necessary to get it to fit. Tho using the copy and dedup strategy above will still require at least double the full space of a single copy, plus the space necessary for each deduped snapshot copy you keep, since the dedup occurs after the copy.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
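[Editor's note] Duncan's 250-per-subvolume ceiling implies some automated pruning. A dry-run sketch of the "keep only the newest N" part (the snapshot directory and naming scheme are hypothetical, the demo fabricates plain directories instead of real subvolumes, and it only prints what it would delete; a full thinning schedule with hourly/daily/weekly tiers would need date math on top of this):

```shell
#!/bin/sh
# Dry-run pruning: sort snapshot dirs named home-YYYYMM-NNN (a made-up
# scheme) so the newest sort last, then print delete commands for
# everything older than the newest KEEP snapshots.
KEEP=250
dir=$(mktemp -d)   # stands in for something like /mnt/pool/.snapshots

# fabricate 253 snapshot names for the demonstration
for i in $(seq -w 1 253); do mkdir "$dir/home-201511-$i"; done

# GNU head -n -K prints all but the last K lines, i.e. the oldest ones
doomed=$(ls -1d "$dir"/home-* | sort | head -n -"$KEEP")
for snap in $doomed; do
    # a real run would instead do: btrfs subvolume delete "$snap"
    echo "would delete: $snap"
done

rm -r "$dir"
```

With 253 fabricated snapshots and KEEP=250, the three oldest are selected; swapping the echo for `btrfs subvolume delete` turns the sketch into an actual pruner.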
Re: corrupted RAID1: unsuccessful recovery / help needed
Lukas Pirl posted on Fri, 30 Oct 2015 10:43:41 +1300 as excerpted:

> Is e.g. "balance" also influenced by the userspace tools, or does
> the kernel do the actual work?

btrfs balance is done "online", that is, on the (writable-)mounted filesystem, and the kernel does the real work.

It's the tools that work on the unmounted filesystem, btrfs check, btrfs restore, btrfs rescue, etc, where the userspace code does the real work, and thus where being current and having all the latest userspace fixes is vital.

If you can't mount writable, you can't balance.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: corrupted RAID1: unsuccessful recovery / help needed
On Fri, Oct 30, 2015 at 10:58:47AM +, Duncan wrote:
> Lukas Pirl posted on Fri, 30 Oct 2015 10:43:41 +1300 as excerpted:
>
> > If there is one subvolume that contains all other (read only) snapshots
> > and there is insufficient storage to copy them all separately:
> > Is there an elegant way to preserve those when moving the data across
> > disks?

If they're read-only snapshots already, then yes:

    sent=
    for sub in *; do
        btrfs send $sent $sub | btrfs receive /where/ever
        sent="$sent -c$sub"
    done

That will preserve the shared extents between the subvols on the receiving FS.

If they're not read-only, then snapshotting each one again as RO before sending would be the approach, but if your FS is itself RO, that's not going to be possible, and you need to look at Duncan's email.

Hugo.

> AFAIK, no elegant way without a writable mount.
>
> Tho I'm not sure, btrfs send, to a btrfs elsewhere using receive, may
> work, since you did specify read-only snapshots, which is what send
> normally works with in ordered to avoid changes to the snapshot while
> it's sending it. My own use-case doesn't involve either snapshots or
> send/receive, however, so I'm not sure if send can work with a read-only
> filesystem or not, but I think its normal method of operation is to
> create those read-only snapshots itself, which of course would require a
> writable filesystem, so I'm guessing it won't work unless you can
> convince it to use the read-only mounts as-is.
>
> The less elegant way would involve manual deduplication. Copy one
> snapshot, then another, and dedup what hasn't changed between the two,
> then add a third and dedup again. ... Depending on the level of dedup
> (file vs block level) and the level of change in your filesystem, this
> should ultimately take about the same level of space as a full backup
> plus a series of incrementals.
>
> Meanwhile, this does reinforce the point that snapshots don't replace
> full backups, that being the reason I don't use them here, since if the
> filesystem goes bad, it'll very likely take all the snapshots with it.
>
> Snapshots do tend to be pretty convenient, arguably /too/ convenient and
> near-zero-cost to make, as people then tend to just do scheduled
> snapshots, without thinking about their overhead and maintenance costs on
> the filesystem, until they already have problems. I'm not sure if you
> are a regular list reader and have thus seen my normal spiel on btrfs
> snapshot scaling and recommended limits to avoid problems or not, so if
> not, here's a slightly condensed version...
>
> Btrfs has scaling issues that appear when trying to manage too many
> snapshots. These tend to appear first in tools like balance and check,
> where time to process a filesystem goes up dramatically as the number of
> snapshots increases, to the point where it can become entirely
> impractical to manage at all somewhere near the 100k snapshots range, and
> is already dramatically affecting runtime at 10k snapshots.
>
> As a result, I recommend keeping per-subvol snapshots to 250-ish, which
> will allow snapshotting four subvolumes while still keeping total
> filesystem snapshots to 1000, or eight subvolumes at a filesystem total
> of 2000 snapshots, levels where the scaling issues should remain well
> within control. And 250-ish snapshots per subvolume is actually very
> reasonable even with half-hour scheduled snapshotting, provided a
> reasonable scheduled snapshot thinning program is also implemented,
> cutting say to hourly after six hours, six-hourly after a day, 12 hourly
> after 2 days, daily after a week, and weekly after four weeks to a
> quarter (13 weeks). Out beyond a quarter or two, certainly within a
> year, longer term backups to other media should be done, and snapshots
> beyond that can be removed entirely, freeing up the space the old
> snapshots kept locked down and helping to keep the btrfs healthy and
> functioning well within its practical scalability limits.
>
> Because a balance that takes a month to complete because it's dealing
> with a few hundred k snapshots is in practice (for most people) not
> worthwhile to do at all, and also in practice, a year or even six months
> out, are you really going to care about the precise half-hour snapshot,
> or is the next daily or weekly snapshot going to be just as good, and a
> whole lot easier to find among a couple hundred snapshots than hundreds
> of thousands?
>
> If you have far too many snapshots, perhaps this sort of thinning
> strategy will as well allow you to copy and dedup only key snapshots, say
> weekly plus daily for the last week, doing the backup thing manually, as
> well, modifying the thinning strategy accordingly if necessary to get it
> to fit. Tho using the copy and dedup strategy above will still require
> at least double the full space of a single copy, plus the space necessary
> for each deduped snapshot