Re: BTRFS disaster (of my own making). Is this recoverable?

2015-08-03 Thread Hugo Mills
On Mon, Aug 03, 2015 at 11:05:46AM -0400, Sonic wrote:
 On Mon, Aug 3, 2015 at 10:43 AM, Sonic sonicsm...@gmail.com wrote:
  Is btrfs rescue super-recover safe to run? IOW, will it ask before
  doing anything possibly destructive (assuming I don't give it a -y)?
 
 Seemed safe enough, so I went for it:
 
 sartre ~ # btrfs rescue super-recover /dev/sdc
 All supers are valid, no need to recover
 sartre ~ # btrfs rescue super-recover /dev/sde
 All supers are valid, no need to recover
 
 So it may not be a superblock issue.
 
 From the dmesg earlier:
 [ 3421.193734] BTRFS (device sde): bad tree block start 8330001001141004672 20971520
 [ 3421.193738] BTRFS: failed to read chunk root on sde
 [ 3421.203221] BTRFS: open_ctree failed
 
 I may need a chunk-recover and also wonder if zero-log is advisable.
 Any ideas in those directions?

   Very unlikely and definitely not, respectively. There's nothing at
all here to indicate that you've got a broken log, so dropping it
would be at best pointless. The chunk tree is also most likely
undamaged on both copies.

   Hugo.

-- 
Hugo Mills | The early bird gets the worm, but the second mouse
hugo@... carfax.org.uk | gets the cheese.
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: BTRFS disaster (of my own making). Is this recoverable?

2015-08-03 Thread Sonic
On Mon, Aug 3, 2015 at 1:17 AM, Duncan 1i5t5.dun...@cox.net wrote:
 The first thing you need to do in terms of trying to recover, is restore
 the superblock on the damaged device.  Since btrfs keeps multiple copies
 (up to three, once the filesystem is large enough, as yours is) per
 device, that's actually relatively easy.  Use...

 btrfs rescue super-recover

Not sure how to tell if there is a superblock issue:
===
btrfs-show-super -f /dev/sdc
superblock: bytenr=65536, device=/dev/sdc
-
dev_item.type   0
dev_item.total_bytes    4000787030016
dev_item.bytes_used 3527267057664
dev_item.io_align   4096
dev_item.io_width   4096
dev_item.sector_size    4096
dev_item.devid  1
dev_item.dev_group  0
dev_item.seek_speed 0
dev_item.bandwidth  0
dev_item.generation 0
sys_chunk_array[2048]:
item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520)
chunk length 16777216 owner 2 type SYSTEM|RAID0 num_stripes 2
stripe 0 devid 2 offset 1048576
stripe 1 devid 1 offset 20971520
backup_roots[4]:
backup 0:
backup_tree_root:   1517037699072   gen: 9025   level: 1
backup_chunk_root:  20971520        gen: 8957   level: 1
backup_extent_root: 576585728   gen: 9025   level: 2
backup_fs_root: 2056568832  gen: 1106   level: 0
backup_dev_root:    52576256        gen: 9021   level: 1
backup_csum_root:   1517028753408   gen: 9025   level: 3
backup_total_bytes: 8001574060032
backup_bytes_used:  7038625824768
backup_num_devices: 2

backup 1:
backup_tree_root:   1517167755264   gen: 9026   level: 1
backup_chunk_root:  20971520        gen: 8957   level: 1
backup_extent_root: 1517167771648   gen: 9026   level: 2
backup_fs_root: 2056568832  gen: 1106   level: 0
backup_dev_root:    52576256        gen: 9021   level: 1
backup_csum_root:   2503711637504   gen: 9026   level: 3
backup_total_bytes: 8001574060032
backup_bytes_used:  7038625824768
backup_num_devices: 2

backup 2:
backup_tree_root:   980877312   gen: 9023   level: 1
backup_chunk_root:  20971520        gen: 8957   level: 1
backup_extent_root: 1026768896  gen: 9023   level: 2
backup_fs_root: 2056568832  gen: 1106   level: 0
backup_dev_root:    52576256        gen: 9021   level: 1
backup_csum_root:   1790377984  gen: 9023   level: 3
backup_total_bytes: 8001574060032
backup_bytes_used:  7038617616384
backup_num_devices: 2

backup 3:
backup_tree_root:   1960509440  gen: 9024   level: 1
backup_chunk_root:  20971520        gen: 8957   level: 1
backup_extent_root: 1960525824  gen: 9024   level: 2
backup_fs_root: 2056568832  gen: 1106   level: 0
backup_dev_root:    52576256        gen: 9021   level: 1
backup_csum_root:   2106736640  gen: 9024   level: 3
backup_total_bytes: 8001574060032
backup_bytes_used:  7038617616384
backup_num_devices: 2
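The sys_chunk_array above already pins down where the chunk root lives physically. A small sketch (Python; RAID0 striping with the btrfs default 64KiB stripe length, purely illustrative of the mapping, not btrfs code) using the stripe entries from the dump:

```python
# SYSTEM chunk from the dump: logical start 20971520, RAID0, 2 stripes
CHUNK_LOGICAL = 20971520
STRIPE_LEN = 64 * 1024  # btrfs default stripe length (assumed here)
STRIPES = [(2, 1048576), (1, 20971520)]  # (devid, physical offset) pairs

def raid0_map(logical):
    """Map a logical byte offset inside the chunk to (devid, physical offset)."""
    off = logical - CHUNK_LOGICAL
    stripe_nr, within = divmod(off, STRIPE_LEN)
    devid, phys = STRIPES[stripe_nr % len(STRIPES)]
    return devid, phys + (stripe_nr // len(STRIPES)) * STRIPE_LEN + within

# The chunk root bytenr equals the chunk's logical start, so it sits in the
# very first stripe: devid 2, physical offset 1048576, i.e. inside the
# first 32MB of that device.
print(raid0_map(20971520))  # (2, 1048576)
```

This is consistent with the kernel complaining about the chunk root specifically on sde: the sdc dump shows devid 1, so devid 2 is the other device, and the block sits well inside the overwritten region.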

btrfs-show-super -f /dev/sde
superblock: bytenr=65536, device=/dev/sde
-
csum            0x9634c164 [match]
bytenr  65536
flags   0x1
( WRITTEN )
magic   _BHRfS_M [match]
fsid            09024c28-7932-4ddb-960d-becc1ea839e5
label   terrafirm
generation  9026
root            1517167755264
sys_array_size  129
chunk_root_generation   8957
root_level  1
chunk_root  20971520
chunk_root_level    1
log_root            0
log_root_transid    0
log_root_level  0
total_bytes 8001574060032
bytes_used  7038625824768
sectorsize  4096
nodesize    16384
leafsize    16384
stripesize  4096
root_dir    6
num_devices 2
compat_flags0x0
compat_ro_flags 0x0
incompat_flags  0x21
( MIXED_BACKREF |
  BIG_METADATA )
csum_type   0
csum_size   4
cache_generation    9026
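Duncan's point about "up to three copies, once the filesystem is large enough" can be checked against the dump. A sketch (Python; the offsets are the standard btrfs superblock mirror locations of 64KiB, 64MiB, and 256GiB, everything else is illustrative):

```python
# Standard btrfs superblock mirror offsets: 64 KiB, 64 MiB, 256 GiB.
SUPER_OFFSETS = [64 * 1024, 64 * 1024**2, 256 * 1024**3]
SUPER_SIZE = 4096  # a superblock occupies one 4 KiB block

def super_copies(dev_bytes):
    """Return the byte offsets of the superblock mirrors that fit on a device."""
    return [off for off in SUPER_OFFSETS if off + SUPER_SIZE <= dev_bytes]

# dev_item.total_bytes from the dump above: a 4 TB drive
print(len(super_copies(4000787030016)))  # all three mirrors fit
```

All three mirrors fit on these 4TB devices, which is why super-recover had three consistent copies per device to compare and reported "All supers are valid".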

Re: BTRFS disaster (of my own making). Is this recoverable?

2015-08-03 Thread Sonic
On Mon, Aug 3, 2015 at 11:28 AM, Hugo Mills h...@carfax.org.uk wrote:
Very unlikely and definitely not, respectively. There's nothing at
 all here to indicate that you've got a broken log, so dropping it
 would be at best pointless. The chunk tree is also most likely
 undamaged on both copies.

Will it hurt to try a chunk-recover? I'm guessing it won't do anything
unless I answer Y to some prompt.

Is there anything else you would suggest?

Thanks,

Chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS disaster (of my own making). Is this recoverable?

2015-08-03 Thread Sonic
Also tried:
mount -o recovery,ro /mnt/butter/

And dmesg gives:
[88228.756622] BTRFS info (device sde): enabling auto recovery
[88228.756635] BTRFS info (device sde): disk space caching is enabled
[88228.757244] BTRFS (device sde): bad tree block start 8330001001141004672 20971520
[88228.757248] BTRFS: failed to read chunk root on sde
[88228.769877] BTRFS: open_ctree failed
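For reference, the recovery mount option works by falling back to the backup roots recorded in the superblock, preferring the highest generation; here it cannot help, because the mount fails while reading the chunk root, before backup tree roots even come into play. A toy sketch of the selection (Python, using the generations from the backup_roots dump earlier in the thread; illustrative only, not the kernel's actual logic):

```python
# (tree_root bytenr, generation) pairs from the backup_roots dump
backups = [
    (1517037699072, 9025),
    (1517167755264, 9026),
    (980877312, 9023),
    (1960509440, 9024),
]

def pick_backup(backups):
    """Prefer the backup root with the highest generation, as recovery does."""
    return max(backups, key=lambda b: b[1])

print(pick_backup(backups))  # (1517167755264, 9026)
```

The winner matches the superblock's root/generation fields (1517167755264, gen 9026), i.e. the backups are in good shape; the problem is strictly the chunk root.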


On Mon, Aug 3, 2015 at 12:03 PM, Sonic sonicsm...@gmail.com wrote:
 On Mon, Aug 3, 2015 at 11:28 AM, Hugo Mills h...@carfax.org.uk wrote:
Very unlikely and definitely not, respectively. There's nothing at
 all here to indicate that you've got a broken log, so dropping it
 would be at best pointless. The chunk tree is also most likely
 undamaged on both copies.

 Will it hurt to try a chunk-recover? I'm guessing it won't do anything
 unless I answer Y to some prompt.

 Is there anything else you would suggest?

 Thanks,

 Chris


Re: BTRFS disaster (of my own making). Is this recoverable?

2015-08-03 Thread Sonic
Output of btrfs check:

btrfs check /dev/sdc
warning, device 2 is missing
bytenr mismatch, want=20971520, have=0
Couldn't read chunk root
Couldn't open file system

btrfs check /dev/sde
checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238
checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238
bytenr mismatch, want=20971520, have=8330001001141004672
Couldn't read chunk root
Couldn't open file system

In case the above helps.
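The "checksum verify failed ... found X wanted Y" lines mean the CRC32C btrfs computed over the block at bytenr 20971520 no longer matches the stored value (the superblock dump shows csum_type 0 / csum_size 4, i.e. a 4-byte CRC32C). A minimal table-driven CRC32C (Castagnoli) sketch in Python, for illustration only:

```python
# CRC32C (Castagnoli), the checksum btrfs uses for metadata blocks here.
# Table-driven, reflected form with polynomial 0x82F63B78.
_POLY = 0x82F63B78
_TABLE = []
for i in range(256):
    crc = i
    for _ in range(8):
        crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
    _TABLE.append(crc)

def crc32c(data: bytes, crc: int = 0) -> int:
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF

# Standard check value for CRC32C:
print(hex(crc32c(b"123456789")))  # 0xe3069283
```

Both copies of the chunk-root block fail this check with the same wrong value, which fits overwritten (not bit-rotted) data.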


Re: BTRFS disaster (of my own making). Is this recoverable?

2015-08-03 Thread Sonic
On Mon, Aug 3, 2015 at 12:32 PM, Sonic sonicsm...@gmail.com wrote:
 btrfs rescue chunk-recover

Ran this:
btrfs rescue chunk-recover -v /dev/sde |tee brcr.txt

Got this (very end of output):
==
Unrecoverable Chunks:

Total Chunks:   3292
  Recoverable:  3292
  Unrecoverable:0

Orphan Block Groups:

Orphan Device Extents:

Fail to recover the chunk tree.
==
If earlier output of this process might be helpful I can provide it
(used tee to create a file).

Didn't work - seems all chunks found were recoverable - it's
apparently the missing chunks that are the problem.

I'm thinking this is a lost cause. Also thinking I need to provide for
some redundancy in the future :-)

Any other suggestions before I give up the ghost on this?

Thanks,

Chris


Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2015-08-03 Thread Qu Wenruo



John Ettedgui wrote on 2015/07/31 21:35 -0700:

On Thu, Jul 30, 2015 at 10:45 PM, John Ettedgui john.etted...@gmail.com wrote:

On Thu, Jul 30, 2015 at 10:40 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:


It seems that you're using Chromium while doing the dump. :)
If you have no CD drive, I'd recommend using the Arch Linux installation ISO
to make a bootable USB stick and doing the dump from there
(just downloading and dd'ing it would do the trick),
as its kernel and tools are much newer than most distributions'.

So I did not have any USB sticks large enough for this task (only 4GB),
so I restarted into the emergency runlevel with only / mounted read-only;
I hope that'll do.


It's better to provide two traces:
one from the function tracer, with btrfs:* as set_event,
and one from function_graph, with btrfs_mount as
set_graph_function.

So I got 2 new traces, and I am hoping that these are what you meant,
but I am still not sure.
Here are the commands I used in case...:

trace-cmd record -o trace-function_graph.dat -p function_graph -g btrfs_mount mount MountPoint

and

trace-cmd record -o trace-function_graph.dat -p function -l 'btrfs_*' mount MountPoint
(using -e btrfs only led to a crash but -l 'btrfs_*' passed, though I
am sure they have different purposes.. I hope that's the correct one)

The first one was so big, 2Gb, I had to use xz to compress it and host
it somewhere else, the ML would most likely not take it.
The other one is quite small but I hosted it in the same place
Here are the links:
https://mega.nz/#!8tgTjKyK!XJnWH05bsv9sJ3nANIxKsdkL20RePPS4cKgWSxit0eQ
https://mega.nz/#!xopkVA6L!z9xjo3us1Nv6wdOs05jNZdhNbiAP5yeLdneEp0huUzI

I hope that was it this time!

Oh, you were using trace-cmd, that's why the data is so huge.

I was originally hoping you just copy the trace file, which is human 
readable and not so huge.


But that's OK anyway.

I'll try to analyse it to find a clue if possible.

Thanks,
Qu

Thanks,
John




Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2015-08-03 Thread John Ettedgui
On Mon, Aug 3, 2015 at 6:55 PM, John Ettedgui john.etted...@gmail.com wrote:
 On Mon, Aug 3, 2015 at 6:39 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

 Oh, you were using trace-cmd, that's why the data is so huge.
 Oh, I thought it was just automating the work for me, but without any
 sort of impact.

 I was originally hoping you just copy the trace file, which is human
 readable and not so huge.
 If you mean something like the output of trace-cmd report, it was
 actually bigger than the dat files (about twice the size) that's why I
 shared the dats instead.
 If you want the reports instead I'll gladly share them.
In case it helps here are the reports instead of the dats:
https://mega.co.nz/#!FwpwHQyL!m0dQHSfQSNGzw9yUwJ6l0eb7Mzta0pOSAf1JHDZ1zfo
https://mega.co.nz/#!B1JgXLxZ!oI1bm0RyhqFbkCWnT95GNKohGozmvqxgJDSUtVdo77s

I guess once compressed the size difference is meaningless

Thanks,
John


Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2015-08-03 Thread John Ettedgui
On Mon, Aug 3, 2015 at 6:39 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

 Oh, you were using trace-cmd, that's why the data is so huge.
Oh, I thought it was just automating the work for me, but without any
sort of impact.

 I was originally hoping you just copy the trace file, which is human
 readable and not so huge.
If you mean something like the output of trace-cmd report, it was
actually bigger than the dat files (about twice the size) that's why I
shared the dats instead.
If you want the reports instead I'll gladly share them.

 But that's OK anyway.

 I'll try to analyse it to find a clue if possible.

 Thanks,
 Qu
Great thank you!

By the way, I just thought of a few things to mention.
This btrfs partition is an ext4 converted partition, and I hit the
same behavior as these guys under heavy load:
http://www.spinics.net/lists/linux-btrfs/msg44660.html
http://www.spinics.net/lists/linux-btrfs/msg44191.html
I don't think it's related to the crash, but maybe to the conversion?

Thanks Qu!
John


Re: BTRFS disaster (of my own making). Is this recoverable?

2015-08-03 Thread Sonic
Out of frustration and vodka I tried:

btrfs check --repair /dev/sde

To instantly be met with:

enabling repair mode
checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238
checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238
bytenr mismatch, want=20971520, have=8330001001141004672
Couldn't read chunk root
Couldn't open file system



On Mon, Aug 3, 2015 at 10:20 PM, Sonic sonicsm...@gmail.com wrote:
 On Mon, Aug 3, 2015 at 12:32 PM, Sonic sonicsm...@gmail.com wrote:
 btrfs rescue chunk-recover

 Ran this:
 btrfs rescue chunk-recover -v /dev/sde |tee brcr.txt

 Got this (very end of output):
 ==
 Unrecoverable Chunks:

 Total Chunks:   3292
   Recoverable:  3292
   Unrecoverable:0

 Orphan Block Groups:

 Orphan Device Extents:

 Fail to recover the chunk tree.
 ==
 If earlier output of this process might be helpful I can provide it
 (used tee to create a file).

 Didn't work - seems all chunks found were recoverable - it's
 apparently the missing chunks that are the problem.

 I'm thinking this is a lost cause. Also thinking I need to provide for
 some redundancy in the future :-)

 Any other suggestions before I give up the ghost on this?

 Thanks,

 Chris


Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2015-08-03 Thread John Ettedgui
On Mon, Aug 3, 2015 at 8:01 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
 Oh, converted...
 That's too bad. :(

 [[What's wrong with convert]]
 Although btrfs is in theory flexible enough to fit itself into the free space
 of ext* and work fine, in practice ext* is too fragmented by btrfs standards,
 not to mention that conversion also enables mixed block groups.

Oh oh :/

 [[Recommendations]]
 I'd recommend deleting the ext*_img subvolume and rebalancing all chunks in
 the fs if you're sticking with the converted filesystem.

Already done (well, the rebalance crashed towards the end both times
with the read-only error, but someone on #btrfs looked at my partition
stats and said it was probably good enough).
 Although the best practice is staying away from such a converted fs, either
 using a pure, newly created btrfs, or converting back to ext* before any balance.

Unfortunately I don't have enough hard drive space to do a clean
btrfs, so my only way to use btrfs for that partition was a
conversion.
 [[But before that, just try something]]
 But you have already provided some interesting facts. As the filesystem is
 highly fragmented, I'd like to recommend a little test:
 (BTW I assume you don't use any special mount options)
Current mount options in fstab:
btrfs   defaults,noatime,compress=lzo,space_cache,autodefrag0   0
It's the same as my other btrfs partitions, apart from the fact that
they are on a SSD and way smaller.
 To test if it's the space cache causing the mount speed drop.

 1) clear page cache
# echo 3 > /proc/sys/vm/drop_caches
 2) Do a normal mount
Just as what you do as usual, with your normal mount options
Record the mount time
0.01s user 0.42s system 0% cpu 1:01.70 total
 3) umount it.
not asked but might as well:
0.00s user 0.65s system 1% cpu 35.536 total
 4) clear page cache
# echo 3 > /proc/sys/vm/drop_caches
 5) mount it with clear_cache mount option
 It may take some time to clear the existing cache.
It's just used to clear space cache.
Don't compare mount time!
Yes I know it's supposed to be slower :)
although... it was pretty much the same actually:
0.01s user 0.44s system 0% cpu 1:02.07 total
 6) umount it

 7) clear page cache
# echo 3 > /proc/sys/vm/drop_caches
Is it ok if that value never changed since 1) ?
 8) mount with nospace_cache mount option
To see if there is obvious mount time change.

0.00s user 0.44s system 0% cpu 1:01.86 total
 Hopefully it's the space cache causing the slow mount.
 But don't expect it too much anyway, it's just one personal guess.

Unfortunately it is about the same :/
 After the test, I'd recommend to follow the [[Recommendations]] if you just
 want a stable filesystem.

I am already within these recommendations I think.

Thanks!


[PATCH] btrfs-progs: Doc: Add extra notes for qgroup

2015-08-03 Thread Qu Wenruo
Add extra explanation on btrfs qgroup for the following two things:
1) Need sync to show accurate qgroup numbers
2) Cow may cause extra quota usage

Btrfs qgroup is still buggy, especially its limit behavior, but
some of the strange behavior is not really a bug.
Just add these explanations for end users if they really want to try it.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 Documentation/btrfs-qgroup.asciidoc | 40 +
 1 file changed, 40 insertions(+)

diff --git a/Documentation/btrfs-qgroup.asciidoc b/Documentation/btrfs-qgroup.asciidoc
index 57cf012..f2fc514 100644
--- a/Documentation/btrfs-qgroup.asciidoc
+++ b/Documentation/btrfs-qgroup.asciidoc
@@ -127,6 +127,46 @@ If no prefix is given, use ascending order by default.
 +
 If multiple attrs is given, use comma to separate.
 
+EXTRA NOTES
+---
+1. Need sync before *btrfs qgroup show* command
++
+Sync is needed to output correct numbers, especially after kernel v4.2.
+
+2. Copy-on-write(CoW) may cause extra space usage
++
+CoW will cause extra space usage compared to an overwriting filesystem like
+ext4 or xfs.
++
+Here is an example, for a file with 12K of contents:
++
+--
+File1:
+0  4K  8K  12K
+|---Extent A--|
+--
++
+It takes 12K space.
++
+But after a rewrite of the 4K in the middle: +
++
+--
+File1:
+0  4K  8K  12K
+|---Extent A--|
+   |--B--|
+--
++
+New extent B will record the new data, while the original extent A still
+exists until no part of it is referenced. +
+So in this case, the file will take 12 + 4 = 16K, rather than the 12K a
+traditional filesystem would use.
++
+The overhead can be reduced either by using the *nodatacow* mount option to
+disable CoW where possible, or by defragmenting. +
+But that won't completely remove the extra space usage when snapshots or
+subvolumes are involved.
+
 EXIT STATUS
 ---
 *btrfs qgroup* returns a zero exit status if it succeeds. Non zero is
-- 
2.5.0
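The 12K → 16K accounting in the patch's example can be sketched numerically (Python; a toy model of extent-level CoW accounting for illustration, not btrfs code):

```python
def cow_usage(file_size, rewrites):
    """Toy model: each rewritten range gets a new extent, while the old
    extent stays fully allocated until no part of it is referenced."""
    extents = [file_size]          # extent A covers the whole file
    for _start, length in rewrites:
        extents.append(length)     # extent B records only the new data
    return sum(extents)

# 12K file, then a 4K rewrite in the middle: 12K + 4K = 16K on a CoW fs
print(cow_usage(12 * 1024, [(4 * 1024, 4 * 1024)]) // 1024)  # 16
```

An overwriting filesystem would stay at 12K for the same workload, which is exactly the gap the new documentation section explains.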



Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2015-08-03 Thread Qu Wenruo



John Ettedgui wrote on 2015/08/03 18:55 -0700:

On Mon, Aug 3, 2015 at 6:39 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:


Oh, you were using trace-cmd, that's why the data is so huge.

Oh, I thought it was just automating the work for me, but without any
sort of impact.


I was originally hoping you just copy the trace file, which is human
readable and not so huge.

If you mean something like the output of trace-cmd report, it was
actually bigger than the dat files (about twice the size) that's why I
shared the dats instead.
If you want the reports instead I'll gladly share them.

No, not the report, but /sys/kernel/debug/tracing/trace.

But that needs some manual operation, like set event and graph functions.


But that's OK anyway.

I'll try to analyse it to find a clue if possible.

Thanks,
Qu

Great thank you!

By the way, I just thought of a few things to mention.
This btrfs partition is an ext4 converted partition, and I hit the
same behavior as these guys under heavy load:
http://www.spinics.net/lists/linux-btrfs/msg44660.html
http://www.spinics.net/lists/linux-btrfs/msg44191.html
I don't think it's related to the crash, but maybe to the conversion?


Oh, converted...
That's too bad. :(

[[What's wrong with convert]]
Although btrfs is in theory flexible enough to fit itself into the free
space of ext* and work fine, in practice ext* is too fragmented by btrfs
standards, not to mention that conversion also enables mixed block groups.



[[Recommendations]]
I'd recommend deleting the ext*_img subvolume and rebalancing all chunks
in the fs if you're sticking with the converted filesystem.


Although the best practice is staying away from such a converted fs,
either using a pure, newly created btrfs, or converting back to ext*
before any balance.


[[But before that, just try something]]
But you have already provided some interesting facts. As the filesystem
is highly fragmented, I'd like to recommend a little test:

(BTW I assume you don't use any special mount options)
To test if it's the space cache causing the mount speed drop.

1) clear page cache
   # echo 3 > /proc/sys/vm/drop_caches
2) Do a normal mount
   Just as what you do as usual, with your normal mount options
   Record the mount time
3) umount it.
4) clear page cache
   # echo 3 > /proc/sys/vm/drop_caches
5) mount it with clear_cache mount option
   It may take some time to clear the existing cache.
   It's just used to clear space cache.
   Don't compare mount time!
6) umount it
7) clear page cache
   # echo 3 > /proc/sys/vm/drop_caches
8) mount with nospace_cache mount option
   To see if there is obvious mount time change.

Hopefully it's the space cache causing the slow mount.
But don't expect it too much anyway, it's just one personal guess.

After the test, I'd recommend to follow the [[Recommendations]] if you 
just want a stable filesystem.


Thanks,
Qu



Thanks Qu!
John




Re: BTRFS disaster (of my own making). Is this recoverable?

2015-08-03 Thread Duncan
Sonic posted on Mon, 03 Aug 2015 22:20:38 -0400 as excerpted:

 I'm thinking this is a lost cause. Also thinking I need to provide for
 some redundancy in the future :-)
 
 Any other suggestions before I give up the ghost on this?

I don't remember you saying anything about trying btrfs restore.  I'm not 
sure it can do anything for this case, and it definitely requires enough 
additional space on some other filesystem to store the data it restores, 
but if you had stuff on there you'd really like to get back, it's worth a 
shot, and if you're going to be backing up in the future, you'll need the 
additional devices for the backup in any case, so might as well get 'em 
now, mkfs them to whatever (doesn't have to be btrfs since all you're 
doing is using it as a place to store the files, it's the one you're 
restoring /from/ that's btrfs), mount 'em, and try restoring to them.

Worst-case, it doesn't work and you have your set of backup devices 
already prepped and ready to go.  Best-case, it recovers pretty much 
everything.

But I am guessing restore won't be able to do anything on its own, 
you'll need to try btrfs-find-root, and feed the bytenr from roots it 
finds into restore until you get something useful.  Take a look at the 
link and brief instructions I posted in my first reply.

Of course, if btrfs-find-root doesn't show you any roots (these are the 
filesystem master roots), or if fed those roots using -t bytenr, btrfs 
restore --list-roots (these are the various subtree roots) doesn't give 
you anything worthwhile, then it /would/ appear you're looking at a lost
cause... unless anyone else has a bright idea at this point.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120

2015-08-03 Thread Robert Munteanu
On Mon, Aug 3, 2015 at 4:22 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
 Yes, you're right, that's a dead loop.

 But for better debugging, would you please upload the following info?
 1) output of command btrfs-debug-tree -t 5 DEV.
 The only important thing is the info about that inode,
 whose objectid (first item in a key) is 14214570 and whose type is one of the
 following:
 INODE_ITEM, INODE_REF, EXTENT_DATA
 So you may only need to cut the things like below:
 ==
 item 4 key (14214570 INODE_ITEM 0) itemoff 15881 itemsize 160
 inode generation 6 transid 6 size 1073741824 nbytes
 1073741824
 block group 0 mode 100644 links 1 uid 0 gid 0
 rdev 0 flags 0x0
 item 5 key (14214570 INODE_REF XXX) itemoff 15866 itemsize 15
 inode ref index 2 namelen 5 name: file1
 item 6 key (14214570 EXTENT_DATA 0) itemoff 15813 itemsize 53
 extent data disk byte 2176843776 nr 134217728
 extent data offset 0 nr 134217728 ram 134217728
 extent compression 0
 item 7 key (14214570 EXTENT_DATA XXX) itemoff 15760 itemsize 53
 extent data disk byte 2311061504 nr 134217728
 extent data offset 0 nr 134217728 ram 134217728
 extent compression 0
 (All items with 14214570 objectid is needed to debug)
 ==

 And it's highly recommended to only cut that part and paste it.
 Not only to reduce the output, but also to protect your privacy.
 As you can see, INODE_REF contains the file name, which can sometimes leak
 your personal information.

item 46 key (14214570 INODE_ITEM 0) itemoff 11902 itemsize 160
inode generation 2285 transid 2308 size 32768 nbytes 0
block group 0 mode 100644 links 1 uid 1000 gid 100
rdev 0 flags 0x10
item 47 key (14214570 INODE_REF 5506079) itemoff 11875 itemsize 27
inode ref index 300 namelen 17 name: root-0bc95412.log


I double-checked and there is no EXTENT_DATA entry.


 2) output of command btrfs-debug-tree -t 2 DEV
 Just in case your extent tree mismatch with fs tree.

The gzipped log is 13MB, so I've uploaded it to
https://dl.dropboxusercontent.com/u/3160732/btrfs-debug-tree-t-2.log.gz
; sha1sum is fb4c671bb90b97aa64f6d3938948100c2175e6a5 .


 If you don't like to execute 2 commands and are OK with leaking file/dir
 names, you can also use btrfs-debug-tree DEV to dump every metadata
 info.

If the above aren't enough I will provide the more comprehensive output.


 Alternatively, if btrfs-image -c9 DEV works without problems, it will
 also help a lot with debugging.

This one is also quite large ( 332MB ) -
https://dl.dropboxusercontent.com/u/3160732/sda1-btrfs-image-c9.img ;
sha1sum is c243e127a317f69faa5548993914a678f6f79524.

Thanks,

Robert


Re: [PATCH RFC 00/14] Yet Another In-band(online) deduplication implement

2015-08-03 Thread Qu Wenruo



David Sterba wrote on 2015/07/28 16:50 +0200:

On Tue, Jul 28, 2015 at 04:30:36PM +0800, Qu Wenruo wrote:

Although Liu Bo has already submitted a V10 version of his deduplication
implement, here is another implement for it.


What's the reason to start another implementation?


[[CORE FEATURES]]
The main design concept is the following:
1) Controllable memory usage
2) No guarantee to dedup every duplication.
3) No on-disk format change or new format
4) Page size level deduplication


1 and 2) are good goals, allow usability tradeoffs

3) so the dedup hash is stored only for the mount lifetime. Though it
avoids on-disk format changes, it also reduces the effectiveness. It
is possible to seed the in-memory tree by reading all files that
contain potentially duplicate blocks but one would have to do that after
each mount.

4) page-sized dedup chunk is IMHO way too small. Although it can achieve
high dedup rate, the metadata can potentially explode and cause more
fragmentation.


Implement details includes the following:
1) LRU hash maps to limit the memory usage
The hash -> extent mapping is controlled by an LRU (or unlimited), to
get controllable memory usage (can be tuned by mount option)
along with controllable read/write overhead for hash searching.


In Liu Bo's series, I rejected the mount options as an interface and
will do that here as well. His patches added a dedup ioctl to (at least)
enable/disable the dedup.
BTW, would you please give me some reason why it's not a good idea to
use a mount option to trigger/change dedup options?


Thanks,
Qu



2) Reuse existing ordered_extent infrastructure
For a duplicated page, it will still submit an ordered_extent (only one
page long), to make full use of all existing infrastructure,
but will not submit a bio.
This reduces the number of code lines.



3) Mount option to control dedup behavior
Deduplication and its memory usage can be tuned by mount option.
No need for a dedicated ioctl interface.


I'd say the other way around.


And further more, it can easily support BTRFS_INODE flag like
compression, to allow further per file dedup fine tunning.

[[TODO]]
3. Add support for per file dedup flags
Much easier, just like compression flags.


How is that supposed to work? You mean add per-file flags/attributes to
mark a file so it fills the dedup hash tree and is actively going to be
deduped against other files?


Any early review or advice/question on the design is welcomed.


The implementation looks simpler than Liu Bo's, but (IMHO) at the
cost of reduced functionality.

Ideally, we merge one patchset with all desired functionality. Some kind
of control interface is needed not only to enable/disable the whole
feature but to affect the trade-offs (memory consumption vs dedup
efficiency vs speed), and that in a way that's flexible according to
immediate needs.

The persistent dedup hash storage is not mandatory in theory, so we
could implement an in-memory tree only mode, ie. what you're
proposing, on top of Liu Bo's patchset.
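The LRU-bounded hash -> extent map at the core of the proposal can be sketched like this (Python; a toy model of the design to show the eviction behavior, not the kernel code, and the capacity/hash names are illustrative):

```python
from collections import OrderedDict

class DedupMap:
    """Toy in-memory dedup index: block hash -> extent location,
    evicting the least recently used entry once over capacity."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = OrderedDict()

    def lookup(self, digest):
        ext = self.map.get(digest)
        if ext is not None:
            self.map.move_to_end(digest)   # refresh LRU position on a hit
        return ext

    def insert(self, digest, extent):
        self.map[digest] = extent
        self.map.move_to_end(digest)
        if len(self.map) > self.capacity:
            self.map.popitem(last=False)   # evict least recently used

m = DedupMap(capacity=2)
m.insert("h1", 4096)
m.insert("h2", 8192)
m.lookup("h1")          # touch h1 so h2 becomes the LRU entry
m.insert("h3", 12288)   # evicts h2
print(m.lookup("h2"))   # None
```

This captures the trade-off under discussion: memory stays bounded by the capacity, but an evicted hash means a later duplicate of that block goes undetected, i.e. "no guarantee to dedup every duplication".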




Re: BTRFS disaster (of my own making). Is this recoverable?

2015-08-03 Thread Hugo Mills
On Sun, Aug 02, 2015 at 11:42:25PM -0600, Chris Murphy wrote:
 I can't tell what the data and metadata profiles are. That it won't
 mount degraded makes me think the metadata is not explicitly raid1;
 it's either raid0, or accidentally single or dup, which can happen at
 mkfs time on a single device when just doing btrfs dev add to add
 another device.
 
 I think recovery is difficult but it depends on what sort of critical
 information is in those first 32MB other than the superblock. There
 are copies of the superblock so that can probably be reconstructed.

   It's probably that the small empty single chunks left behind by
mkfs are still there. I don't think we have a good solution to this
yet (other than fixing mkfs so it doesn't happen in the first place).

   Hugo.

-- 
Hugo Mills | Our so-called leaders speak
hugo@... carfax.org.uk | with words they try to jail ya
http://carfax.org.uk/  | They subjugate the meek
PGP: E2AB1DE4  | but it's the rhetoric of failure.  The Police




[PATCH] btrfs: qgroup: Fix a regression in qgroup reserved space.

2015-08-03 Thread Qu Wenruo
During the change to the new extent-oriented btrfs qgroup implementation,
because it no longer uses the old __qgroup_excl_accounting() for exclusive
extents, it didn't free the reserved bytes.

The bug makes the limit function go haywire: since the reserved space is
never freed, increasing the limit has no effect and writes still fail
with EDQUOT.

The fix is easy: just free the reserved bytes for a newly created exclusive
extent, as the code did before.

Reported-by: Tsutomu Itoh t-i...@jp.fujitsu.com
Signed-off-by: Yang Dongsheng yangds.f...@cn.fujitsu.com
Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
To Chris:

Would you please consider merging this patch in the late v4.2 merge
window?
It's a big regression, and the fix is small; it just does what the
__qgroup_excl_accounting() function did before.

The corresponding test case will follow soon.

And sorry for the regression I introduced.

Thanks,
Qu
---
 fs/btrfs/qgroup.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index d5f1f03..1667567 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1618,6 +1618,11 @@ static int qgroup_update_counters(struct btrfs_fs_info *fs_info,
 			/* Exclusive -> exclusive, nothing changed */
 		}
 	}
+
+	/* For exclusive extent, free its reserved bytes too */
+	if (nr_old_roots == 0 && nr_new_roots == 1 &&
+	    cur_new_count == nr_new_roots)
+		qg->reserved -= num_bytes;
 	if (dirty)
 		qgroup_dirty(fs_info, qg);
 	}
-- 
2.5.0
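The effect of the regression is easy to model outside the kernel. The sketch below is plain Python, not btrfs code: if reserved bytes are never released when an extent becomes accounted, repeated write-and-delete cycles pin ever more "reserved" space until every new reservation fails with EDQUOT, no matter how little space is actually used — exactly the "increasing limit has no effect" symptom.

```python
class Qgroup:
    """Toy model of qgroup reserved-space accounting (not btrfs code)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0        # bytes accounted to committed extents
        self.reserved = 0    # bytes reserved for in-flight writes

    def reserve(self, n):
        if self.used + self.reserved + n > self.limit:
            raise OSError("EDQUOT")
        self.reserved += n

    def account(self, n, buggy=False):
        self.used += n
        if not buggy:
            self.reserved -= n   # the release the regression skipped

    def delete(self, n):
        self.used -= n

def churn(qg, cycles, buggy):
    """Write and then delete a 10-byte file, `cycles` times."""
    for _ in range(cycles):
        qg.reserve(10)
        qg.account(10, buggy=buggy)
        qg.delete(10)

ok = Qgroup(limit=100)
churn(ok, 50, buggy=False)       # fine: reservations are released

leaky = Qgroup(limit=100)
try:
    churn(leaky, 50, buggy=True)
    hit_quota = False
except OSError:
    hit_quota = True

# The quota fills up with leaked reservations while real usage is zero.
assert hit_quota and leaky.used == 0
```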



Re: ext4 convert bugs, wiki warning?

2015-08-03 Thread Marc MERLIN
On Sat, Aug 01, 2015 at 02:48:41PM -0600, Chris Murphy wrote:
 On Sat, Aug 1, 2015 at 2:42 PM, Hugo Mills h...@carfax.org.uk wrote:
  On Sat, Aug 01, 2015 at 11:29:40AM -0600, Chris Murphy wrote:
  Does someone with wiki edit capability want to put up a warning about
  btrfs-convert problems? I don't think it needs to be a lengthy write
  up since the scope of the problem is not clear yet. But since there's
  definitely reproduced problems that have been going on for some time
  now maybe it'd be a good idea to put up a warning until this is more
  stable?
 
  I'm thinking of just a yellow warning sidebar-like thing that maybe
  just says there's been a regression and it's breaking file systems,
  sometimes irreparably?
 
 You mean a warning like the very first sentence, in bold, that's
  already on the wiki page?
 
  https://btrfs.wiki.kernel.org/index.php/Conversion_from_Ext3
 
 Doh!

I added that after the last thread.
You're welcome :)

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: Bug report - btrfs hanging

2015-08-03 Thread Chris Murphy
On Mon, Aug 3, 2015 at 12:09 PM, Alex alexinbeij...@gmail.com wrote:

 20:03 ~ % uname -a
 Linux alex-ThinkPad-L530 4.1.0-rc3+ #2 SMP Sun Jul 5 22:24:05 CAT 2015
 x86_64 x86_64 x86_64 GNU/Linux

4.1.0 is not mainline anymore, so I don't see a mid-cycle release
candidate of it being relevant, because those rc's become obsolete
within a couple of weeks. I suggest using 4.1.4, which is the current
stable version of that series, and it has quite a few Btrfs patches.

4.2-rc5 is the current mainline, and you could test against that too if you wish.


-- 
Chris Murphy


Re: BTRFS disaster (of my own making). Is this recoverable?

2015-08-03 Thread Sonic
Are either one of these called for?

btrfs check --repair
or
btrfs check --repair --init-csum-tree

Seems like they might be a last-ditch attempt. Is one preferred over the other?

Is:
btrfs rescue chunk-recover
a much less dangerous attempt (IOW, it won't hurt to try it first)?

Thanks,

Chris


On Mon, Aug 3, 2015 at 12:22 PM, Sonic sonicsm...@gmail.com wrote:
 Output of btrfs check:

 btrfs check /dev/sdc
 warning, device 2 is missing
 bytenr mismatch, want=20971520, have=0
 Couldn't read chunk root
 Couldn't open file system

 btrfs check /dev/sde
 checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238
 checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238
 bytenr mismatch, want=20971520, have=8330001001141004672
 Couldn't read chunk root
 Couldn't open file system

 In case the above helps.
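[An aside for readers decoding output like the above: the "checksum verify failed ... found X wanted Y" lines are CRC-32C mismatches — btrfs checksums each metadata block with the Castagnoli CRC. A minimal bit-at-a-time sketch of that checksum, for illustration only; real implementations are table-driven or use the SSE4.2 crc32 instruction:]

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bit-at-a-time CRC-32C (Castagnoli), the checksum btrfs stores
    in each metadata block header. 0x82F63B78 is the reflected form
    of the Castagnoli polynomial."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# Standard check value for the CRC-32C polynomial:
assert crc32c(b"123456789") == 0xE3069283
```

A "found X wanted Y" mismatch means the block's contents hash to X while the stored header says Y — i.e. the block on disk is not what was written, which is why check refuses to trust the chunk root.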


Fwd: Bug report - btrfs hanging

2015-08-03 Thread Alex
Dear Btrfs devs,

I have an external HD formatted with btrfs, and have noticed that
various operations (copying files, deleting files, etc) hang from time
to time. Here's debug output from the latest hang:

dmesg output:

[496960.834080] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[496960.834192] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[496960.834261] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[496960.834334] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[496960.834357] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[496960.834658] pci_bus :01: Allocating resources
[496960.834719] pci_bus :06: Allocating resources
[496960.834777] pci_bus :07: Allocating resources
[496960.834807] pci_bus :0c: Allocating resources
[496960.834833] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[496960.834976] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[496960.835140] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[501706.637861] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[501706.638005] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[501706.638114] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[501706.638224] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[501706.638261] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[501706.638650] pci_bus :01: Allocating resources
[501706.638760] pci_bus :06: Allocating resources
[501706.638821] pci_bus :07: Allocating resources
[501706.638851] pci_bus :0c: Allocating resources
[501706.638878] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[501706.639022] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[501706.639226] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment
[503148.505218] usb 4-1.5: new high-speed USB device number 4 using ehci-pci
[503148.659324] usb 4-1.5: New USB device found, idVendor=0480, idProduct=a200
[503148.659332] usb 4-1.5: New USB device strings: Mfr=2, Product=3,
SerialNumber=1
[503148.659336] usb 4-1.5: Product: External USB 3.0
[503148.659340] usb 4-1.5: Manufacturer: TOSHIBA
[503148.659343] usb 4-1.5: SerialNumber: 20140919001100F
[503148.659996] usb-storage 4-1.5:1.0: USB Mass Storage device detected
[503148.660322] scsi host10: usb-storage 4-1.5:1.0
[503149.659297] scsi 10:0:0:0: Direct-Access TOSHIBA  External USB
3.0 0PQ: 0 ANSI: 6
[503149.660108] sd 10:0:0:0: Attached scsi generic sg2 type 0
[503151.230437] sd 10:0:0:0: [sdb] 1953525168 512-byte logical blocks:
(1.00 TB/931 GiB)
[503151.231640] sd 10:0:0:0: [sdb] Write Protect is off
[503151.231658] sd 10:0:0:0: [sdb] Mode Sense: 43 00 00 00
[503151.232815] sd 10:0:0:0: [sdb] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[503151.255437] sd 10:0:0:0: [sdb] Attached SCSI disk
[503151.491833] BTRFS: device label AlexEHD devid 1 transid 889 /dev/sdb
[503151.876617] BTRFS info (device sdb): disk space caching is enabled
[503371.659804] INFO: task cinnamon-settin:2211 blocked for more than
120 seconds.
[503371.659817]   Not tainted 4.1.0-rc3+ #2
echo 0 > /proc/sys/kernel/hung_task_timeout_secs
disables this message.
[503371.659835] cinnamon-settin D 880213cdbca8 0  2211   2117 0x
[503371.659848]  880213cdbca8 880215e56480 8802032e1920
88015afb6054
[503371.659856]  880213cdc000 880207f3be4c 8802032e1920

[503371.659863]  880207f3be50 880213cdbcc8 8179bd27
8802030bc000
[503371.659868] Call Trace:
[503371.659882]  [8179bd27] schedule+0x37/0x90
[503371.659898]  [8179c05e] schedule_preempt_disabled+0xe/0x10
[503371.659903]  [8179dc45] __mutex_lock_slowpath+0x95/0x110
[503371.659908]  [8179dce3] mutex_lock+0x23/0x37
[503371.659946]  [a027efcb] btrfs_show_devname+0x2b/0xe0 [btrfs]
[503371.659953]  [81229ccf] show_vfsmnt+0x3f/0x150
[503371.659958]  [81209846] m_show+0x16/0x20
[503371.659962]  [8120ee18] seq_read+0x218/0x370
[503371.659968]  [811ea568] __vfs_read+0x28/0xe0
[503371.659973]  [8130cc74] ? security_file_permission+0x84/0xa0
[503371.659977]  [811eaac6] ? rw_verify_area+0x56/0xe0
[503371.659982]  [811eabd6] vfs_read+0x86/0x140
[503371.659986]  [811eba56] SyS_read+0x46/0xb0
[503371.659992]  [81177903] ? context_tracking_user_enter+0x13/0x20
[503371.659997]  [81024bb5] ? syscall_trace_leave+0xa5/0x120
[503371.660001]  [8179ffb2] system_call_fastpath+0x16/0x75
[503371.660064] INFO: task btrfs-transacti:32320 blocked for more than
120 seconds.
[503371.660069]   Not tainted 

Re: BTRFS disaster (of my own making). Is this recoverable?

2015-08-03 Thread Duncan
Sonic posted on Mon, 03 Aug 2015 12:32:21 -0400 as excerpted:

 Are either one of these called for?
 
 btrfs check --repair or btrfs check --repair --init-csum-tree
 
 Seems like they might be a last ditch attempt. Is one preferred over the
 other?

The read-only check (without --repair) couldn't read the chunk tree, so 
check --repair will probably run into the same issue and not be able to 
do anything either.

check --init-csum-tree reinitializes the checksums, but to do that you 
have to have something to reinitialize on, so that's unlikely to do 
anything, either.  (I did have that in mind as a possibility after the 
superblock recovery and chunk-recover operations, tho, if necessary.  
That's why I mentioned check at the end.  But the message was long enough 
already and that would have been getting ahead of things, so...)

 Is:
 btrfs rescue chunk-recover a much less dangerous attempt (IOW it wont
 hurt to try it first)?

I'd not call it less dangerous, but given that the chunk tree can't be 
read, you're already dealing with dangerous.  That's what I'd try here 
(tho Hugo's suggestion that it won't help has me doubting, but that's 
where things point from what I see).

Which is why I mentioned doing a raw device backup, before attempting 
it.  Chunk-recover is pretty invasive, and if it doesn't work, it could 
make the filesystem unrecoverable.  It's basically a one-shot deal.  A 
raw device backup can be used to avoid it being one-shot, but of course 
that does require that you have enough devices of sufficient size to 
backup to, which certainly in the terabyte range, is a potential problem 
unless you're a big-budget corporate user with a stack of spare devices 
on hand.
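The raw device backup Duncan describes is typically just `dd if=/dev/sdX of=backup.img`. The same idea sketched in Python, chunked and with a digest so the image can be verified before risking the one-shot chunk-recover (the device paths here are hypothetical; point `src` at the real device node):

```python
import hashlib
import os
import tempfile

def raw_backup(src, dst, chunk=1 << 20):
    """Chunked raw copy of src to dst; returns the SHA-256 of the
    data read, so the image can be verified afterwards. A stand-in
    for dd plus a checksum pass -- not a btrfs tool."""
    h = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            h.update(buf)
            fout.write(buf)
    return h.hexdigest()

# Exercising the sketch on a scratch file standing in for a device:
src = tempfile.NamedTemporaryFile(delete=False)
src.write(os.urandom(3 * 1024 * 1024 + 123))
src.close()
dst = src.name + ".img"
digest = raw_backup(src.name, dst)
with open(dst, "rb") as f:
    assert hashlib.sha256(f.read()).hexdigest() == digest
```

If chunk-recover then makes things worse, the image can be written back to the device the same way, restoring the one-shot.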

I believe Hugo's right on zero-log, tho.  That's a fix for a very 
specific situation, seen most often due to a specific bug that was short 
lived and has been fixed for quite some time, now, tho in certain very 
narrow post-crash situations the same fix has been known to work too.  
There's no evidence you're anywhere near that very specific situation, so 
zero-log's just not the tool for this job.  And it could make things 
slightly worse, too, tho in theory at least, all you're doing is cutting 
off the last 30 seconds or so of activity, so the chances of it doing 
major harm are small, unless you were actively rebalancing or something 
when the filesystem was last unmounted (gracefully or not).

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman
