Re: shall distros run btrfsck on boot?

2015-11-24 Thread Hugo Mills
On Tue, Nov 24, 2015 at 04:26:47PM -0600, Eric Sandeen wrote:
> On 11/24/15 2:38 PM, Austin S Hemmelgarn wrote:
> 
> > if the system was
> > shut down cleanly, you're fine barring software bugs, but if it
> > crashed, you should be running a check on the FS.
> 
> Um, no...
> 
> The *entire point* of having a journaling filesystem is that after a
> crash or power loss, a journal replay on next mount will bring the
> metadata into a consistent state.

   Not an actual argument within the discussion, but an interesting
observation on a fine distinction:

   It's interesting to note that there's a difference here between
journalling and CoW filesystems. A journalling FS needs a journal
replay to become consistent. A CoW FS is _always_ consistent, by
design. Now, btrfs has a log that should be replayed after an unclean
shutdown, but that's all about the data that got written within the
current transaction that wasn't committed, rather than about FS
metadata consistency. This means that a read-only mount of btrfs can
_actually_ be read-only, not modifying any of the data on the disk,
whereas a read-only mount of a journalling FS _must_ modify the disk
data after an unclean shutdown, in order to be useful (because the FS
isn't consistent without the journal replay).
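
   To make the distinction concrete, here is a minimal toy sketch in C.
It is not btrfs code: the names toy_disk, toy_cow_update and toy_read
are made up for illustration, and the "disk" is just a few integer
slots plus a superblock index. The only point is that a CoW update
writes beside the old data and commits by flipping one pointer, so a
crash before the flip leaves the old, consistent state in place and a
reader never has to write anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical, not btrfs code): a "disk" with a few value
 * slots and a superblock that names the slot holding the live data. */
#define TOY_SLOTS 4

struct toy_disk {
	int slot[TOY_SLOTS];
	int super_root;	/* index of the slot the superblock points to */
};

static void toy_init(struct toy_disk *d, int value)
{
	assert(d != NULL);
	d->slot[0] = value;
	d->super_root = 0;
}

/* Copy-on-write update: write the new value to a slot that is not
 * referenced by the superblock, then flip the superblock pointer as the
 * single commit step. If we "crash" before the flip, the old root is
 * untouched and still describes a consistent state. */
static void toy_cow_update(struct toy_disk *d, int value,
			   bool crash_before_commit)
{
	int target;

	assert(d != NULL);
	target = (d->super_root + 1) % TOY_SLOTS;
	d->slot[target] = value;	/* new data lands beside the old */
	if (crash_before_commit)
		return;			/* superblock never updated */
	d->super_root = target;		/* commit point */
}

/* A read-only "mount" only follows the superblock; it writes nothing. */
static int toy_read(const struct toy_disk *d)
{
	assert(d != NULL);
	return d->slot[d->super_root];
}
```

   After toy_init(&d, 1), an update with crash_before_commit set to
true still leaves toy_read(&d) at 1, while a completed update makes the
new value visible. A journalling design would instead need a replay
step (a write) before the state could be trusted.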

   Hugo.

-- 
Hugo Mills | I'll take your bet, but make it ten thousand francs.
hugo@... carfax.org.uk | I'm only a _poor_ corrupt official.
http://carfax.org.uk/  |
PGP: E2AB1DE4  |  Capt. Renaud, Casablanca




Re: shall distros run btrfsck on boot?

2015-11-24 Thread Christoph Anton Mitterer
On Tue, 2015-11-24 at 22:33 +, Hugo Mills wrote:
> whereas a read-only mount of a journalling FS _must_ modify the disk
> data after an unclean shutdown, in order to be useful (because the FS
> isn't consistent without the journal replay).
I've always considered that rather a bug,... or at least a very
annoying handling in ext*
If I specify "read-only" then nothing should ever be written.
If that's not possible because of an unclean shutdown and a journal
that needs to be replayed, the mount should (without any further
special option) rather fail than mount it pseudo-read-only.

Cheers,
Chris



Re: subvols and parents - how?

2015-11-24 Thread Christoph Anton Mitterer
On Tue, 2015-11-24 at 21:55 +, Hugo Mills wrote:
>    In practice, new content is checked by a number of people when
> it's
> put in, so the case of someone putting random poorly-thought-out crap
> in the wiki isn't particularly likely to happen.
Well... it may work in 99% of cases... but something could still slip
through, which doesn't happen as easily with manpages; those also tend to
be less messy than the huge pile of wiki pages where similar/related
things are described on different pages.

Imagine a case where an inexperienced user updates the wiki saying that
--repair should be used; he may not even be doing it in bad faith, perhaps
he had success with it and now writes a recipe.
It may take a while until one of the more experienced guys notices
that and corrects it.
But if I, in the meantime, had some fs corruption,... I may already
experience severe problems by following that suggestion... (and while I do
have many backups of all my data, others may not, and if their life's
data is concerned, they'd be screwed).

So even if it takes you just a few hours to correct such rubbish, you
know that Murphy's law may still hit n people during that time ;-)


> Please feel free to add the things you'd like to see. As I said
> above, we do check the docs on the wiki as they're changed, so if
> you're wrong on some details, it won't be a major issue for very
> long. If you want to discuss details before you write something,
> start
> a conversation -- either on here, or on IRC (or even on the Talk
> pages
> of the wiki).
Well I can put together a list of things which I think should be part
of some more general documentation (i.e. less documentation about
options of the tools)... questions that a complete newcomer to btrfs,
who however needs more than "just a filesystem", may have.


>    Note that the "parent" of send -p and of snapshots is not the same
> relationship as the "parent" (containing subvol) of the tree
> structure. This is an awkward nomenclature problem, and I'm not sure
> how to fix it.
Yeah, that was clear... :-)
Maybe call the "parent" from send -p "base" or something like that...
IMHO that would fit better, as the parent there is more like a
"foundation".

Anyway, it's still not as bad as the usage of "RAID1" ;-)


> because
> you can't rename a subvol across another subvol boundary.
That's not quite clear to me... I had subvols like that:
/top/root/below-root
/top/below-top
and was able to move that to:
/top/root/below-top
/top/below-root

i.e. not just changing names but swapping as in:
mv /top/root/below-root /top/tmp
mv /top/below-top /top/root/below-top
mv /top/tmp /top/below-root

with top, root, below-top and below-root all being subvols


Thanks a lot for your explanations :)

Chris.



Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2015-11-24 Thread Qu Wenruo



Christoph Anton Mitterer wrote on 2015/11/24 19:25 +0100:

> On Tue, 2015-11-24 at 13:35 +0800, Qu Wenruo wrote:
> > Hope you didn't wait too long.
> No worries, didn't hold my breath ;)
>
> > The fixing patch is CCed to you, or you can get it from patchwork:
> > https://patchwork.kernel.org/patch/7687611/
> Unfortunately that doesn't make the error messages go away.
> :(
>
> Shall I start debugging again?
>
> Cheers,
> Chris.

Quite strange...

I succeeded in reproducing the bug: just disable skinny metadata and
fill a btrfs with fsstress.

Btrfsck will then report a lot of such false alerts.
But with my patch applied, all the warnings just disappeared...

Did you use the compiled btrfsck? Or did you use the system btrfsck by
mistake?

Thanks,
Qu

--
This message has been scanned for viruses and
dangerous content by Fujitsu, and is believed to be clean.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: shall distros run btrfsck on boot?(Off topic, btrfs per-inode tree idea)

2015-11-24 Thread Qu Wenruo



Hugo Mills wrote on 2015/11/24 22:33 +:

> On Tue, Nov 24, 2015 at 04:26:47PM -0600, Eric Sandeen wrote:
> > On 11/24/15 2:38 PM, Austin S Hemmelgarn wrote:
> >
> > > if the system was
> > > shut down cleanly, you're fine barring software bugs, but if it
> > > crashed, you should be running a check on the FS.
> >
> > Um, no...
> >
> > The *entire point* of having a journaling filesystem is that after a
> > crash or power loss, a journal replay on next mount will bring the
> > metadata into a consistent state.
>
> Not an actual argument within the discussion, but an interesting
> observation on a fine distinction:
>
> It's interesting to note that there's a difference here between
> journalling and CoW filesystems. A journalling FS needs a journal
> replay to become consistent. A CoW FS is _always_ consistent, by
> design. Now, btrfs has a log that should be replayed after an unclean
> shutdown, but that's all about the data that got written within the
> current transaction that wasn't committed,


In fact, the log tree of btrfs is only used to speed up fsync. There is a
"notreelog" mount option to disable the log tree; if one uses it, fsync
performance will just drop to the level of sync.

So it's just an optimization. Although it's already quite far from the
original topic, I think the best way for btrfs to improve fsync
performance is to introduce something like what ext* has:

Per-file extent map tree.

The reason btrfs is slow on fsync is that file extents and inode info are
all stored in the same tree (fs tree or subvolume tree).

To fsync a single inode, it's impossible to sync only its file extents;
the whole tree has to be synced, which may be just as slow as a full sync.

That's why the log tree was introduced: it only writes back the file
extents of an inode and records its metadata changes in the log tree.

Performance test results also support this.

But other filesystems, at least ext*, use a better solution: each inode
(no matter whether regular file or dir) has its own tree to record its
file extents or dir entries.

That makes fsync quite easy and straightforward.

If btrfs followed the same design, at least random RW performance might
get a boost, and the fsync code would be simpler.


Thanks,
Qu



> rather than about FS
> metadata consistency. This means that a read-only mount of btrfs can
> _actually_ be read-only, not modifying any of the data on the disk,
> whereas a read-only mount of a journalling FS _must_ modify the disk
> data after an unclean shutdown, in order to be useful (because the FS
> isn't consistent without the journal replay).
>
> Hugo.






btrfs-mount improvement suggestions

2015-11-24 Thread Christoph Anton Mitterer
Hey.

I thought that, rather than just being around here and complaining all
the time about documentation, I might help to improve it a bit.

The btrfs-mount manpage could be a good start, and I'd propose a more
structured format, as I've done for the first few options:

*alloc_start='bytes'*::
(default: *1M*) +
Sets the start of block allocations on each of the filesystem’s devices
to happen above 'bytes' bytes (optionally followed by one of the
case-insensitive suffixes *K* (for Ki), *M* (for Mi) and *G* (for Gi)). +
+
Mainly used for debugging purposes.

*autodefrag*::
*noautodefrag*::
(default: *noautodefrag*; since: 3.0) +
Control whether auto-defragmentation is enabled (with *autodefrag*) or
disabled (with *noautodefrag*). +
+
Auto-defragmentation detects small random writes into files and queues
them up for defragmentation. Works best for small files and is not well
suited for large database or virtual machine image workloads.

*check_int*::
*check_int_data*::
*check_int_print_mask='bitmask'*::
(default: unset; since: 3.0) +
Control whether the integrity checking module (which requires the kernel
option *BTRFS_FS_CHECK_INTEGRITY* to be enabled) is enabled (with
*check_int*, *check_int_data* or *check_int_print_mask*) or disabled
(when unset) as well as its operation mode. +
+
With *check_int* the integrity checking module examines all block write
requests in order to ensure on-disk consistency, at a large memory and
CPU cost. +
With *check_int_data*, which implies the option *check_int*, it further
includes extent data in the integrity checks. +
With *check_int_print_mask*, its operation mode can be controlled by
a bitmask 'bitmask' of *BTRFSIC_PRINT_MASK_** values as defined in
`fs/btrfs/check-integrity.c`. +
+
See the comments at the top of `fs/btrfs/check-integrity.c` for more
information.

*commit='seconds'*::
(default: *30*; since: 3.12) +
Set periodic commit interval to 'seconds' seconds. +
+
This is the interval at which data is synchronised to the block device.
Higher values may improve performance but at the expense of losing data
from a longer period in case of system crashes, et cetera. +
A warning is given for values of 'seconds' greater than *300*.

*compress[='type']*::
*compress-force[='type']*::
(default: unset) +
Control whether data compression is enabled (with *compress* or
*compress-force*) or disabled (when unset). +
+
With *compress* only data that compresses well is going to be
compressed, while with *compress-force* data is compressed whether it
compresses well or not. +
The compression type can be set via 'type', with valid values being: +
*no* (disables compression, useful when re-mounting), *zlib* (the
default if no 'type' is set), *lzo*. +
+
Enabling compression implies the options *datacow* and *datasum*.
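
As a side note on the 'bytes' syntax described for *alloc_start* above,
here is a rough C sketch of how such a value with an optional K/M/G
suffix could be turned into a byte count. The helper name
parse_size_suffix is made up for this mail and this is not the parser
the kernel actually uses; it only illustrates the "binary multiples,
case-insensitive suffix" wording.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not the kernel's parser: convert a decimal
 * number with an optional case-insensitive K/M/G suffix (binary
 * multiples) into a byte count.
 * Returns 0 on success, -1 on malformed input or overflow. */
static int parse_size_suffix(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long v;
	unsigned int shift = 0;

	assert(s != NULL && out != NULL);
	if (*s < '0' || *s > '9')
		return -1;		/* reject signs, blanks, empty input */
	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno == ERANGE)
		return -1;

	switch (*end) {
	case 'k': case 'K': shift = 10; end++; break;
	case 'm': case 'M': shift = 20; end++; break;
	case 'g': case 'G': shift = 30; end++; break;
	default: break;
	}
	if (*end != '\0')
		return -1;		/* trailing garbage */
	if (v > (UINT64_MAX >> shift))
		return -1;		/* would not fit in 64 bits */
	*out = (uint64_t)v << shift;
	return 0;
}
```

With this sketch, "1M" (the documented default) yields 1048576 and "4k"
yields 4096, while something like "12x" is rejected.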


That would also include these changes:
commit 88a0ba7065e09497806ffc2a493ab72d0940e1af
Author: Christoph Anton Mitterer 
Date:   Wed Nov 25 02:51:25 2015 +0100

btrfs-progs: minor documentation improvements

Overhauled the formatting of symbols:
- Options, terminal-values or parts thereof are marked with *…*.
- Non-terminal-values are marked with '…'.
- Commands, pathnames are marked with `…`.
- Added missing marks and manpage references.
- Used the correct spelling of option names (lower-case).

Signed-off-by: Christoph Anton Mitterer 

commit 830f71df85232e12c3795bc5c0335c1c1150c2f4
Author: Christoph Anton Mitterer 
Date:   Wed Nov 25 02:09:10 2015 +0100

btrfs-progs: minor documentation improvements

Swapped the order of the default value and the availability.
The former seems much more important for daily use, while no one will care
about the version in which something was introduced in 10 years, as
everyone has far newer versions.

Signed-off-by: Christoph Anton Mitterer 

commit a1f913c9dd678fba10d134a651f67d01e8c8ae38
Author: Christoph Anton Mitterer 
Date:   Wed Nov 25 02:02:17 2015 +0100

btrfs-progs: minor documentation improvements

- Moved the documentation of all default values to the top of each option’s
  section.
- Added missing default values.
- Added missing line breaks.

Signed-off-by: Christoph Anton Mitterer 


If the responsible maintainer likes that style, I would continue to
refurbish the remainder of the manpage in it.
But since it's quite some work I 

Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2015-11-24 Thread Christoph Anton Mitterer
On Wed, 2015-11-25 at 08:59 +0800, Qu Wenruo wrote:
> Did you use the compiled btrfsck? Or use the system btrfsck by
> mistake?
I'm pretty sure I did, because I already did the whole procedure twice,
but let me repeat and record it here just to be 100% sure:

$ make clean
Cleaning
$ md5sum cmds-check.c 
a7e7d871c3b666df6b56c724dbfd1d86  cmds-check.c
$ export CFLAGS="-g -O0 -Wall -D_FORTIFY_SOURCE=2"
$ ./configure --disable-convert --disable-documentation
[snip]
$ make
# ./btrfs check /dev/mapper/data-b
Checking filesystem on /dev/mapper/data-b
UUID: 250ddae1-7b37-4b22-89e9-4dc5886c810f
checking extents
[getting a cup of tea]
.. and voila it works...

which is kinda weird... I still have the previous run in the bash
history... and I *did* invoke ./btrfs and not btrfs.
Also I just haven't done any further patching... so it cannot be that
the patch wasn't applied before.

WTF?! Apparently I suffer from Gremlins :-/

*a little while later*

And back they are...(the errors)... o.O
This time I checked both of my devices that showed the symptoms
concurrently... data-b, as above, showed no errors.
data-old-a came with the same errors as before.

Is there anything non-deterministic involved?


Cheers,
Chris.




Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2015-11-24 Thread Christoph Anton Mitterer
Hey again.

So it seems that data-b is always fine (well at least three times in a
row) and data-old-a always gives errors.

including e.g:
bad extent [3067663679488, 3067663695872), type mismatch with chunk
bad extent [3067663876096, 3067663892480), type mismatch with chunk
bad extent [3067663892480, 3067663908864), type mismatch with chunk
bad extent [3067663908864, 3067663925248), type mismatch with chunk
bad extent [3067669348352, 3067669364736), type mismatch with chunk
bad extent [3067669430272, 3067669446656), type mismatch with chunk
bad extent [3067669659648, 3067669676032), type mismatch with chunk
bad extent [3067669790720, 3067669807104), type mismatch with chunk
bad extent [3067669807104, 3067669823488), type mismatch with chunk
bad extent [3067669823488, 3067669839872), type mismatch with chunk
bad extent [3067669872640, 3067669889024), type mismatch with chunk
bad extent [3067669921792, 3067669938176), type mismatch with chunk
bad extent [3067671805952, 3067671822336), type mismatch with chunk
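
For reading the list above: each range is half-open, [start, end), and
every one of them is 16384 bytes long, the same value that shows up as
nr= and max_size= in the backtraces. A small C sketch of that
arithmetic follows; struct extent_range and the two helpers are names
made up for this mail, not anything from btrfs-progs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type for illustration: a half-open byte range
 * [start, end), as printed in the "bad extent" messages. */
struct extent_range {
	uint64_t start;
	uint64_t end;
};

/* Length of a half-open range: end is exclusive, so no +1. */
static uint64_t extent_len(struct extent_range r)
{
	assert(r.end >= r.start);
	return r.end - r.start;
}

/* Two ranges are back to back when one ends where the next begins. */
static bool extents_adjacent(struct extent_range a, struct extent_range b)
{
	return a.end == b.start;
}
```

For example, [3067663876096, 3067663892480) and
[3067663892480, 3067663908864) are adjacent 16384-byte extents, while
the first entry of the list is not adjacent to the second.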

I've started debugging (everything as before) with:
(gdb) break cmds-check.c:4387
Breakpoint 1 at 0x42cf2b: file cmds-check.c, line 4387.
(gdb) break cmds-check.c:4394
Breakpoint 2 at 0x42cf57: file cmds-check.c, line 4394.
(gdb) break cmds-check.c:4411
Breakpoint 3 at 0x42cfa6: file cmds-check.c, line 4411.
(gdb) break cmds-check.c:4421
Breakpoint 4 at 0x42d000: file cmds-check.c, line 4421.

Hit a:
Breakpoint 1, check_extent_type (rec=0x1a44130) at cmds-check.c:4387
4387        rec->wrong_chunk_type = 1;
(gdb) bt
#0  check_extent_type (rec=0x1a44130) at cmds-check.c:4387
#1  0x0042d6a5 in add_extent_rec (extent_cache=0x7fffdf30, 
parent_key=0x0, parent_gen=0, start=1097665216512, nr=16384, 
extent_item_refs=1, is_root=0, inc_ref=0, set_checked=0, 
metadata=0, extent_rec=1, max_size=16384) at cmds-check.c:4576
#2  0x0042ecc9 in process_extent_item (root=0x919d20, 
extent_cache=0x7fffdf30, eb=0x1a0edb0, slot=95) at cmds-check.c:5142
#3  0x00430aea in run_next_block (root=0x919d20, bits=0x91e220, 
bits_nr=1024, last=0x7fffdb78, pending=0x7fffdf10, seen=0x7fffdf20, 
reada=0x7fffdf00, 
nodes=0x7fffdef0, extent_cache=0x7fffdf30, 
chunk_cache=0x7fffdf90, dev_cache=0x7fffdfa0, 
block_group_cache=0x7fffdf70, dev_extent_cache=0x7fffdf40, ri=0x6cef30)
at cmds-check.c:5960
#4  0x004356c4 in deal_root_from_list (list=0x7fffdc00, 
root=0x919d20, bits=0x91e220, bits_nr=1024, pending=0x7fffdf10, 
seen=0x7fffdf20, reada=0x7fffdf00, 
nodes=0x7fffdef0, extent_cache=0x7fffdf30, 
chunk_cache=0x7fffdf90, dev_cache=0x7fffdfa0, 
block_group_cache=0x7fffdf70, dev_extent_cache=0x7fffdf40)
at cmds-check.c:8014
#5  0x00435d91 in check_chunks_and_extents (root=0x919d20) at 
cmds-check.c:8181
#6  0x00438e2b in cmd_check (argc=1, argv=0x7fffe220) at 
cmds-check.c:9627
#7  0x00409d49 in main (argc=2, argv=0x7fffe220) at btrfs.c:252
(gdb) continue
Continuing.

Breakpoint 1, check_extent_type (rec=0x1a44130) at cmds-check.c:4387
4387        rec->wrong_chunk_type = 1;
(gdb) bt
#0  check_extent_type (rec=0x1a44130) at cmds-check.c:4387
#1  0x0042d856 in add_tree_backref (extent_cache=0x7fffdf30, 
bytenr=1097665216512, parent=1314162819072, root=0, found_ref=0) at 
cmds-check.c:4624
#2  0x0042ede2 in process_extent_item (root=0x919d20, 
extent_cache=0x7fffdf30, eb=0x1a0edb0, slot=95) at cmds-check.c:5161
#3  0x00430aea in run_next_block (root=0x919d20, bits=0x91e220, 
bits_nr=1024, last=0x7fffdb78, pending=0x7fffdf10, seen=0x7fffdf20, 
reada=0x7fffdf00, 
nodes=0x7fffdef0, extent_cache=0x7fffdf30, 
chunk_cache=0x7fffdf90, dev_cache=0x7fffdfa0, 
block_group_cache=0x7fffdf70, dev_extent_cache=0x7fffdf40, ri=0x6cef30)
at cmds-check.c:5960
#4  0x004356c4 in deal_root_from_list (list=0x7fffdc00, 
root=0x919d20, bits=0x91e220, bits_nr=1024, pending=0x7fffdf10, 
seen=0x7fffdf20, reada=0x7fffdf00, 
nodes=0x7fffdef0, extent_cache=0x7fffdf30, 
chunk_cache=0x7fffdf90, dev_cache=0x7fffdfa0, 
block_group_cache=0x7fffdf70, dev_extent_cache=0x7fffdf40)
at cmds-check.c:8014
#5  0x00435d91 in check_chunks_and_extents (root=0x919d20) at 
cmds-check.c:8181
#6  0x00438e2b in cmd_check (argc=1, argv=0x7fffe220) at 
cmds-check.c:9627
#7  0x00409d49 in main (argc=2, argv=0x7fffe220) at btrfs.c:252

You've mentioned add_extent_rec() before, but that doesn't seem to
contain bytenr so I cannot break on it.

I tried it with add_tree_backref instead, maybe that's already helpful
for you until you give me further instructions on what to debug:
Breakpoint 5 at 0x42d84a: file cmds-check.c, line 4624.
(gdb) continue 
Continuing.

Breakpoint 5, add_tree_backref (extent_cache=0x7fffdf30, 

[PATCH v2] btrfs-progs: fsck: Fix a false alert where extent record has wrong metadata flag

2015-11-24 Thread Qu Wenruo
In process_extent_item(), 'metadata' is given the initial value 0, but in
the non-skinny-metadata case a metadata extent can't be identified from
the key type alone, and the code forgot to handle that case.

This causes a lot of false alerts on non-skinny-metadata filesystems.

Fix it by setting the correct metadata value before calling
add_extent_rec().

Reported-by: Christoph Anton Mitterer 
Signed-off-by: Qu Wenruo 
---
v2:
   Use bit AND instead of equal to check TREE_BLOCK bit.
---
 cmds-check.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index fd661d9..938b863 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -5134,6 +5134,10 @@ static int process_extent_item(struct btrfs_root *root,
 
 	ei = btrfs_item_ptr(eb, slot, struct btrfs_extent_item);
 	refs = btrfs_extent_refs(eb, ei);
+	if (btrfs_extent_flags(eb, ei) & BTRFS_EXTENT_FLAG_TREE_BLOCK)
+		metadata = 1;
+	else
+		metadata = 0;
 
 	add_extent_rec(extent_cache, NULL, 0, key.objectid, num_bytes,
 		       refs, 0, 0, 0, metadata, 1, num_bytes);
-- 
2.6.2
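
On the v2 note ("Use bit AND instead of equal"): the sketch below shows
why the two tests can disagree. The macros are local stand-ins that
mirror the btrfs flag names; their values are an assumption made for
this illustration and should be checked against ctree.h rather than
taken from here. The helper names are likewise made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring the btrfs flag names; the values are an
 * assumption for illustration, not copied from the headers. */
#define EXTENT_FLAG_DATA	(1ULL << 0)
#define EXTENT_FLAG_TREE_BLOCK	(1ULL << 1)
#define BLOCK_FLAG_FULL_BACKREF	(1ULL << 8)

/* Equality only matches when TREE_BLOCK is the sole bit set. */
static bool is_metadata_by_equality(uint64_t flags)
{
	return flags == EXTENT_FLAG_TREE_BLOCK;
}

/* Masking matches whenever the TREE_BLOCK bit is set, regardless of
 * any other bits that happen to be set alongside it. */
static bool is_metadata_by_mask(uint64_t flags)
{
	return (flags & EXTENT_FLAG_TREE_BLOCK) != 0;
}
```

A flags word with TREE_BLOCK plus any additional bit is classified as
metadata by the mask test but not by the equality test, which is the
situation the v2 change guards against.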




Re: subvols and parents - how?

2015-11-24 Thread Christoph Anton Mitterer
On Tue, 2015-11-24 at 23:30 +, Hugo Mills wrote:
>    Yes, that makes sense.
Feel free to shamelessly use my idea (as well as the one to call btrfs'
RAID1 replica2 or something else)
:-O


>    With a recent mv
root@heisenberg:/mnt# mv --version
mv (GNU coreutils) 8.23

=> not recent enough...


> but I think you'll find that the UUID of the subvols changes. (At
> least, I hope it does. If it doesn't, then my mental model of what
> the FS is doing is *really* screwed up).
Well... see below:

root@heisenberg:~# truncate  -s 2G image
root@heisenberg:~# losetup -f image 
root@heisenberg:~# mkfs.btrfs /dev/loop0 
btrfs-progs v4.3
See http://btrfs.wiki.kernel.org for more information.

Performing full device TRIM (2.00GiB) ...
Label:              (null)
UUID:               10e1a55c-448a-4f37-ae5c-6a7801a7f202
Node size:          16384
Sector size:        4096
Filesystem size:    2.00GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         DUP             110.38MiB
  System:           DUP              12.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1     2.00GiB  /dev/loop0

root@heisenberg:~# mount /dev/loop0 /mnt/
root@heisenberg:/mnt# btrfs subvolume create root
Create subvolume './root'
root@heisenberg:/mnt# btrfs subvolume create below-top
Create subvolume './below-top'
root@heisenberg:/mnt# cd root/
root@heisenberg:/mnt/root# btrfs subvolume create below-root
Create subvolume './below-root'
root@heisenberg:/mnt# btrfs subvolume list /mnt/ -pacguqt
ID   gen  cgen  parent  top level  parent_uuid  uuid                                  path
--   ---  ----  ------  ---------  -----------  ----                                  ----
257  9    7     5       5          -            8fbf521e-77f9-0d49-9891-87767f98c655  root
258  8    8     5       5          -            b49131e9-4207-aa42-8195-c50de5f06136  below-top
259  9    9     257     257        -            20c042be-ead8-204a-a684-94c1a770e739  /root/below-root

root@heisenberg:/mnt# mv root/below-root/ tmp
root@heisenberg:/mnt# mv below-top/ root/
root@heisenberg:/mnt# mv tmp/ below-root
root@heisenberg:/mnt# btrfs subvolume list /mnt/ -pacguqt
ID   gen  cgen  parent  top level  parent_uuid  uuid                                  path
--   ---  ----  ------  ---------  -----------  ----                                  ----
257  9    7     5       5          -            8fbf521e-77f9-0d49-9891-87767f98c655  root
258  8    8     257     257        -            b49131e9-4207-aa42-8195-c50de5f06136  /root/below-top
259  9    9     5       5          -            20c042be-ead8-204a-a684-94c1a770e739  below-root
root@heisenberg:/mnt# 


So the UUIDs seem to stay the same (or are these other UUIDs?)
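
One possible reading of the result above is that mv never had to copy
anything: GNU mv first tries a plain rename(2) and only falls back to
copy-and-delete when that fails with EXDEV. Whether the kernel allows
the rename in a given subvolume layout is not something this sketch
settles; it only shows the decision mv has to make. The enum and the
function name try_rename are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result type: what a mv-like tool learns from rename(2). */
enum move_result {
	MOVE_RENAMED,		/* rename(2) worked, object kept as-is */
	MOVE_NEEDS_COPY,	/* EXDEV: caller must copy and delete */
	MOVE_FAILED		/* any other error, see errno */
};

/* Try an in-place rename and classify the outcome. A successful rename
 * moves the directory entry only, so nothing about the moved object
 * itself has to be recreated. */
static enum move_result try_rename(const char *from, const char *to)
{
	assert(from != NULL && to != NULL);
	if (rename(from, to) == 0)
		return MOVE_RENAMED;
	return errno == EXDEV ? MOVE_NEEDS_COPY : MOVE_FAILED;
}
```

If the three mv calls in the transcript all took the MOVE_RENAMED path,
unchanged UUIDs would be the expected outcome; a copy fallback would
have had to create new subvolumes or plain directories instead.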

Hope I haven't ruined your day now ;-)


Cheers,
Chris.



Re: subvols and parents - how?

2015-11-24 Thread Hugo Mills
On Wed, Nov 25, 2015 at 12:20:09AM +0100, Christoph Anton Mitterer wrote:
> On Tue, 2015-11-24 at 21:55 +, Hugo Mills wrote:
> >    In practice, new content is checked by a number of people when
> > it's
> > put in, so the case of someone putting random poorly-thought-out crap
> > in the wiki isn't particularly likely to happen.
> Well... it may work in 99% of cases... but something could still slip
> through, which doesn't happen as easily with manpages; those also tend to
> be less messy than the huge pile of wiki pages where similar/related
> things are described on different pages.
> 
> Imagine a case where an inexperienced user updates the wiki saying that
> --repair should be used; he may not even be doing it in bad faith, perhaps
> he had success with it and now writes a recipe.
> It may take a while until one of the more experienced guys notices
> that and corrects it.

   You can get update notifications, with the diffs for each page, via
RSS. (At least, I do, and I think David and a few others monitor it in
the same way). The window of failure is fairly small, particularly in
view of the number of such "dangerous" changes made. The total
vulnerability is measured in hours per year...

> But if I, in the meantime, had some fs corruption,... I may already
> experience severe problems by following that suggestion... (and while I do
> have many backups of all my data, others may not, and if their life's
> data is concerned, they'd be screwed).
> 
> So even if it takes you just a few hours to correct such rubbish, you
> know that Murphy's law may still hit n people during that time ;-)
> 
> 
> > Please feel free to add the things you'd like to see. As I said
> > above, we do check the docs on the wiki as they're changed, so if
> > you're wrong on some details, it won't be a major issue for very
> > long. If you want to discuss details before you write something,
> > start
> > a conversation -- either on here, or on IRC (or even on the Talk
> > pages
> > of the wiki).
> Well I can put together a list of things which I think should be part
> of some more general documentation (i.e. less documentation about
> options of the tools)... questions that a complete newcomer to btrfs,
> who however needs more than "just a filesystem", may have.
> 
> 
> >    Note that the "parent" of send -p and of snapshots is not the same
> > relationship as the "parent" (containing subvol) of the tree
> > structure. This is an awkward nomenclature problem, and I'm not sure
> > how to fix it.
> Yeah, that was clear... :-)
> Maybe call the "parent" from send -p "base" or something like that...
> IMHO that would fit better, as the parent there is more like a
> "foundation".

   Yes, that makes sense.

> Anyway, it's still not as bad as the usage of "RAID1" ;-)
> 
> 
> > because
> > you can't rename a subvol across another subvol boundary.
> That's not quite clear to me... I had subvols like that:
> /top/root/below-root
> /top/below-top
> and was able to move that to:
> /top/root/below-top
> /top/below-root
> 
> i.e. not just changing names but swapping as in:
> mv /top/root/below-root /top/tmp
> mv /top/below-top /top/root/below-top
> mv /top/tmp /top/below-root
> 
> with top, root, below-top and below-root all being subvols

   With a recent mv, it'll be doing reflink copies followed by
delete for all of the contents, which makes it pretty efficient, but I
think you'll find that the UUID of the subvols changes. (At least, I
hope it does. If it doesn't, then my mental model of what the FS is
doing is *really* screwed up).

   Hugo.

> Thanks a lot for your explanations :)
> 
> Chris.



-- 
Hugo Mills | Never underestimate the bandwidth of a Volvo filled
hugo@... carfax.org.uk | with backup tapes.
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: shall distros run btrfsck on boot?

2015-11-24 Thread Hugo Mills
On Wed, Nov 25, 2015 at 12:01:49AM +0100, Christoph Anton Mitterer wrote:
> On Tue, 2015-11-24 at 22:33 +, Hugo Mills wrote:
> > whereas a read-only mount of a journalling FS _must_ modify the disk
> > data after an unclean shutdown, in order to be useful (because the FS
> > isn't consistent without the journal replay).
> I've always considered that rather a bug,... or at least a very
> annoying handling in ext*
> If I specify "read-only" then nothing should ever be written.
> If that's not possible because of an unclean shutdown and a journal
> that needs to be replayed, the mount should (without any further
> special option) rather fail than mount it pseudo-read-only.

   At one point, I _think_, btrfs did replay the log tree
unconditionally, even on a RO mount, but it doesn't any more. There
was certainly some discussion on the point. It's actually quite handy
sometimes -- if you have a corrupt log tree, you can check it by
mounting RO (when it works) and RW (when it fails because the log tree
is broken), and then do btrfs-zero-log to clear it.

   For the record, this is about the only good use for btrfs-zero-log.
It doesn't magically fix anything else. (Yes, this is another futile
attempt at killing the persistent "btrfs-zero-log fixes everything"
meme that's been doing the rounds for the last few years).

   Hugo.

-- 
Hugo Mills | Never underestimate the bandwidth of a Volvo filled
hugo@... carfax.org.uk | with backup tapes.
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2015-11-24 Thread Qu Wenruo



On 11/24/2015 09:15 PM, Laurent Bonnaud wrote:

On 23/11/2015 02:00, Qu Wenruo wrote:


Considering the size, I'd like not to touch the dump, metadata is over 5G,


It is only 2GB once compressed :>.


The size seems small enough, so I'll try to download it, as it would be
super useful for debugging.





and I think it's not related to on-disk data, but runtime problem like I 
mentioned above.


To test this hypothesis I did the following:

  - reboot the machine with a 4.3.0 kernel from Debian experimental
  - run "du" on the btrfs FS as a quick sanity check


Nice reproducer.
Is it 100% reproducible, or does it only reproduce some of the time?

Thanks,
Qu



The kernel went read-only again with the following kernel errors:

[ 5759.890934] BTRFS info (device sdb1): disk space caching is enabled
[ 5773.278244] BTRFS warning (device sdb1): block group 314635714560 has wrong 
amount of free space
[ 5773.278247] BTRFS warning (device sdb1): failed to load free space cache for 
block group 314635714560, rebuild it now
[ 5773.947885] [ cut here ]
[ 5773.947908] WARNING: CPU: 0 PID: 2546 at 
/build/linux-7sjCdl/linux-4.3/fs/btrfs/extent-tree.c:2851 
btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs]()
[ 5773.947909] BTRFS: Transaction aborted (error -17)
[ 5773.947910] Modules linked in: xt_multiport cpufreq_conservative 
cpufreq_powersave cpufreq_userspace cpufreq_stats ip6table_filter ip6_tables 
iptable_filter ip_tables x_tables binfmt_misc snd_hda_codec_analog 
snd_hda_codec_generic dell_wmi iTCO_wdt iTCO_vendor_support sparse_keymap evdev 
coretemp kvm_intel dcdbas snd_hda_intel snd_hda_codec snd_hda_core kvm 
snd_hwdep i915 snd_pcm_oss snd_mixer_oss pcspkr sg snd_pcm psmouse lpc_ich 
mfd_core serio_raw i2c_i801 snd_timer snd shpchp tpm_tis video drm_kms_helper 
drm soundcore mei_me mei i2c_algo_bit wmi tpm 8250_fintek button acpi_cpufreq 
processor ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp 
libiscsi_tcp libiscsi scsi_transport_iscsi drbd lru_cache libcrc32c parport_pc 
ppdev lp parport loop dm_crypt dm_mod autofs4 ext4 crc16 mbcache
[ 5773.947951]  jbd2 crc32c_generic btrfs xor raid6_pq md_mod ses enclosure 
hid_generic usbhid hid sd_mod uas usb_storage ahci libahci ata_generic libata 
scsi_mod e1000e ptp pps_core ehci_pci uhci_hcd ehci_hcd usbcore usb_common
[ 5773.947967] CPU: 0 PID: 2546 Comm: kworker/u16:2 Not tainted 
4.3.0-trunk-amd64 #1 Debian 4.3-1~exp1
[ 5773.947968] Hardware name: Dell Inc. OptiPlex 780 /0C27VV, 
BIOS A08 01/21/2011
[ 5773.947981] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[ 5773.947983]  a02a8250 812c53a9 8800af283d30 
8106ebad
[ 5773.947985]  8800ace5eae0 8800af283d80 8800ac6ade70 
8800ac6add10
[ 5773.947987]  0020 8106ec2c a02a8420 
0020
[ 5773.947989] Call Trace:
[ 5773.947994]  [] ? dump_stack+0x40/0x57
[ 5773.947997]  [] ? warn_slowpath_common+0x7d/0xb0
[ 5773.947999]  [] ? warn_slowpath_fmt+0x4c/0x50
[ 5773.948019]  [] ? btrfs_run_delayed_refs+0x26b/0x2a0 
[btrfs]
[ 5773.948027]  [] ? delayed_ref_async_start+0x32/0x80 [btrfs]
[ 5773.948039]  [] ? btrfs_scrubparity_helper+0xc8/0x260 
[btrfs]
[ 5773.948041]  [] ? process_one_work+0x19f/0x3d0
[ 5773.948043]  [] ? worker_thread+0x4d/0x450
[ 5773.948044]  [] ? process_one_work+0x3d0/0x3d0
[ 5773.948046]  [] ? kthread+0xbd/0xe0
[ 5773.948048]  [] ? kthread_create_on_node+0x170/0x170
[ 5773.948051]  [] ? ret_from_fork+0x3f/0x70
[ 5773.948053]  [] ? kthread_create_on_node+0x170/0x170
[ 5773.948054] ---[ end trace 654b175f2543b4e4 ]---
[ 5773.948057] BTRFS: error (device sdb1) in btrfs_run_delayed_refs:2851: 
errno=-17 Object already exists
[ 5773.948092] BTRFS info (device sdb1): forced readonly
[ 5936.235238] perf interrupt took too long (2502 > 2500), lowering 
kernel.perf_event_max_sample_rate to 5
[ 6427.280125] BTRFS (device sdb1): parent transid verify failed on 
353291255808 wanted 9058 found 9056
[ 6427.288873] BTRFS (device sdb1): parent transid verify failed on 
353291255808 wanted 9058 found 9056
[ 6427.381126] BTRFS (device sdb1): parent transid verify failed on 
353291255808 wanted 9058 found 9056
[ 6427.381747] BTRFS (device sdb1): parent transid verify failed on 
353291255808 wanted 9058 found 9056
[...]

Are you interested in the btrfs-image output now?


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs send reproducibly fails for a specific subvolume after sending 15 GiB, scrub reports no errors

2015-11-24 Thread Filipe Manana
On Sun, Nov 22, 2015 at 9:59 PM, Nils Steinger  wrote:
> Hi,
>
> I recently ran into a problem while trying to back up some of my btrfs
> subvolumes over the network:
> `btrfs send` works flawlessly on snapshots of most subvolumes, but keeps
> failing on snapshots of a certain subvolume — always after sending 15 GiB:
>
> btrfs send /btrfs/snapshots/home/2015-11-17_03:28:14_BOOT-AUTOSNAPSHOT |
> pv | ssh kappa "btrfs receive /mnt/300gb/backups/snapshots/zeta/home/"
> At subvol /btrfs/snapshots/home/2015-11-17_03:28:14_BOOT-AUTOSNAPSHOT
> At subvol 2015-11-17_03:28:14_BOOT-AUTOSNAPSHOT
> ERROR: send ioctl failed with -2: No such file or directory
>   15GB 0:34:34 [7,41MB/s]

Which kernel version?

Try with a 4.3 kernel (or the latest you can, like a 4.2 or 4.1). If
it persists, you can create an image of your filesystem like this for
example:

btrfs-image -c 9 /dev/whatever fs.img

The image won't contain your data (it will all be replaced with
zeroes) but file and directory names and xattrs will remain untouched
(there's an option to sanitize file names, but that might not help
debugging what's going on with send).
If this is an option for you, you can send me the image for debugging
and getting the bug fixed. But please make sure you try a recent
kernel first (ideally 4.3) to see if the problem reproduces there, since
the send code (like the rest of btrfs and the Linux kernel) keeps
changing between kernel versions (bug fixes, etc.).

>
> I've tried piping the output to /dev/null instead of ssh and got the
> same error (again after sending 15 GiB), so this seems to be on the
> sending side.
>
> However, btrfs scrub reports no errors and I don't get any messages in
> dmesg when the btrfs send fails.
>
> What could cause this kind of error?
> And is there a way to fix it, preferably without recreating the FS?
>
>
> Regards,
> Nils Steinger
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


Re: subvols and parents - how?

2015-11-24 Thread Christoph Anton Mitterer
On Tue, 2015-11-24 at 08:29 +, Duncan wrote:
> OK, found it on the wiki.  It wasn't under use-cases, where I
> initially 
> thought to look, but under sysadmin guide.  Specifically, see section
> 4.2, managing snapshots, but I'd suggest reading the entire
> subvolumes 
> discussion, section 4, or even most/all of the page.
> 
> https://btrfs.wiki.kernel.org/index.php/SysadminGuide
Well, I had read that, but it's pretty vague and in particular doesn't
mention any of the filesystem-internal implications (if there are
any)... like I wondered before, whether this has effects on ref-links
not being used when e.g. sending/receiving... or on future planned
extensions like recursive snapshots.


> 
> Suppose you only want to rollback /, because some update screwed you
> up, 
> but not /home, which is fine.  If /home is a nested subvolume, then 
> you're now mounting the nested home subvolume from some other nesting
> tree entirely,
That's a bit unclear to me,... I thought when I make a snapshot, any
nested subvols wouldn't be snapshotted and thus be empty dirs.
So I'd rather have expected to simply have no /home (unless I moved it
into the rolled-back subvol manually).


> 5
> |
> +-roots (dir not subvol, note the s, rootS, plural)
> | +-root (subvol, mountpoint /)
> | | +-boot/ (dir)
> | | +-root/ (dir)
> | | +-lib/  (dir)
> | | +-home/ (empty dir as mountpoint)
> | +-root-snapshot-2015.0301 (dated snapshot of root)
> | +-root-snapshot-2015.0601 (dated snapshot of root)
> | +-root-snapshot-2015.0901 (dated snapshot of root)
> +-homes (dir not subvol)
> | +-home (subvol, mountpoint /home)
> | +-home-snapshot-2015.0301 (dated snapshot of home)
> ...
That's more what I've had in mind...
Actually something like this:
5
|
+-root       (=subvol)
| +-boot
| +-home     (subvol=/home being mounted hereon)
| +-lib
+-home       (subvol, the current version)
+-snapshots  (=dir)
  +-root
  | +-2015-01-14 (subvol, snapshot)
  | +-2015-09-30 (subvol, snapshot)
  +-home
    +-2015-06-04 (subvol, snapshot)
    +-2015-09-01 (subvol, snapshot)


And it once more points to the problem of the wiki... anyone can write
(I think even I) and it's totally unclear at first glance how
"[non-]official" and outdated something may be.
Apart from the problem that many important questions (from the PoV of
a more advanced admin who doesn't just run mkfs.btrfs and then never
thinks about it again) remain unanswered :-(


> Meanwhile, if the intention for a subvolume is simply to exclude that
> subtree from snapshotting of the parent, as might be the case for
> example 
That is in fact also a use case... so in practice I'll probably have a mix
of (a) and (b).


> if you have a VMs subvol, with the VM image files set NOCOW to avoid 
> fragmentation, since snapshotting nocow files forces cow1 (a cow at
> the 
> first write of that block, before returning to nocow, because a
> snapshot 
> locks the existing extents in place for the snapshot, so initial
> writes 
> to a block after a snapshot /can't be nocow or it'd change the
> snapshot 
> too...)
Ah that's good to know how that works (or better said, that it works at
all)... I've already wondered myself several times what happens when I
snapshot NOCOW files, ... something that's I guess also missing from
the (missing ;-) ) btrfs-end-user overview and details documentation.


> OTOH, if there's a chance you might want to mount that subvolume in a
> roll-back scenario, under the snapshot you're rolling back to, then
> it 
> makes sense to put it directly under ID 5 again, and mount it in any
> case.
Yes.


> Then there's the security angle to consider.  With the (basically, 
> possibly modified as I suggested) flat layout, mounting something
> doesn't 
> automatically give people in-tree access to nested subvolumes
> (subject to 
> normal file permissions, of course), like nested layout does.  And
> with 
> (possibly modified) flat layout, the whole subvolume tree doesn't
> need to 
> be mounted all the time either, only when you're actually working
> with 
> subvolumes.
Uhm, I don't get the big security advantage here... whether nested or
manually mounted to a subdir,... if the permissions are insecure I'll
have a problem... if they're secure, then not.

Of course, if I use insecure permissions and don't mount the subvols, then
I'd be safe in a flat setup, but not in a nested setup... (where the
subvol is "auto-mounted")...

But that seems pretty awkward.



Mhh I think my main question turns out to be whether the different
layouts would have any technical (i.e. not administrative) effects...
like the aforementioned stuff of recursive snapshots (should such thing
ever come to life).
But at least from the userspace tools it seems that I can move subvols
arbitrarily and they adapt their parent IDs accordingly...

So I guess whatever setup one uses nested/flat/mixed... doesn't rule
out any functionalities for the future?!

thx,
Chris.



Re: subvols and parents - how?

2015-11-24 Thread Hugo Mills
On Tue, Nov 24, 2015 at 10:25:50PM +0100, Christoph Anton Mitterer wrote:
> On Tue, 2015-11-24 at 08:29 +, Duncan wrote:
> > OK, found it on the wiki.  It wasn't under use-cases, where I
> > initially 
> > thought to look, but under sysadmin guide.  Specifically, see section
> > 4.2, managing snapshots, but I'd suggest reading the entire
> > subvolumes 
> > discussion, section 4, or even most/all of the page.
> > 
> > https://btrfs.wiki.kernel.org/index.php/SysadminGuide
> Well, I had read that, but it's pretty vague and in particular doesn't
> mention any of the filesystem-internal implications (if there are
> any)... like I wondered before, whether this has effects on ref-links
> not being used when e.g. sending/receiving... or on future planned
> extensions like recursive snapshots.

   No, there are no particular implications one way or the other in
terms of reflinks. Obviously, if recursive snapshots get implemented,
it'll remove some of the current awkwardness with nested subvols, but
it won't invalidate any existing setups that use the recommendations
in the Sysadmin Guide.

> > Suppose you only want to rollback /, because some update screwed you
> > up, 
> > but not /home, which is fine.  If /home is a nested subvolume, then 
> > you're now mounting the nested home subvolume from some other nesting
> > tree entirely,
> That's a bit unclear to me,... I thought when I make a snapshot, any
> nested subvols wouldn't be snapshotted and thus be empty dirs.

   Correct. This is actually a use-case for nested subvols, and one
which snapper uses -- the target for snapshots of /foo is
/foo/.snapshots/$date, where /foo/.snapshots is a subvol in its own
right. So, if you have a subdir which you won't want to include in
snapshots of a subvol, make it a subvol itself.
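
A toy model may make the point concrete. The Python sketch below is only an illustration of the behaviour described above, not of btrfs internals: the `snapshot` function, the dictionary layout, and the example tree are all made up for this purpose. It copies a subvolume's tree and leaves an empty directory wherever a nested subvolume sits.

```python
def snapshot(subvol):
    """Toy model (not btrfs internals): copy a subvolume's tree, leaving an
    empty directory wherever a nested subvolume sits."""
    def copy_entries(entries):
        out = {}
        for name, node in entries.items():
            if node["kind"] == "subvol":
                # Subvolume boundary: the snapshot does not descend into it.
                out[name] = {"kind": "dir", "entries": {}}
            elif node["kind"] == "dir":
                out[name] = {"kind": "dir",
                             "entries": copy_entries(node["entries"])}
            else:
                out[name] = dict(node)  # plain file
        return out
    return {"kind": "subvol", "entries": copy_entries(subvol["entries"])}


# Hypothetical layout: / is a subvolume with /home nested inside it.
root = {"kind": "subvol", "entries": {
    "etc": {"kind": "dir", "entries": {"fstab": {"kind": "file"}}},
    "home": {"kind": "subvol",
             "entries": {"alice": {"kind": "dir", "entries": {}}}},
}}

snap = snapshot(root)
print(snap["entries"]["home"])  # the nested subvolume shows up as an empty dir
```

In this model, rolling back to `snap` yields an empty `home` directory, which is the situation discussed in the surrounding messages.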

> So I'd rather have expected to simply have no /home (unless I moved it
> into the rolled-back subvol manually).

   Yup, this is one of the main reasons for not nesting subvols.

> > 5
> > |
> > +-roots (dir not subvol, note the s, rootS, plural)
> > | +-root (subvol, mountpoint /)
> > | | +-boot/ (dir)
> > | | +-root/ (dir)
> > | | +-lib/  (dir)
> > | | +-home/ (empty dir as mountpoint)
> > | +-root-snapshot-2015.0301 (dated snapshot of root)
> > | +-root-snapshot-2015.0601 (dated snapshot of root)
> > | +-root-snapshot-2015.0901 (dated snapshot of root)
> > +-homes (dir not subvol)
> > | +-home (subvol, mountpoint /home)
> > | +-home-snapshot-2015.0301 (dated snapshot of home)
> > ...
> That's more what I've had in mind...
> Actually something like this:
> 5
> |
> +-root       (=subvol)
> | +-boot
> | +-home     (subvol=/home being mounted hereon)
> | +-lib
> +-home       (subvol, the current version)
> +-snapshots  (=dir)
>   +-root
>   | +-2015-01-14 (subvol, snapshot)
>   | +-2015-09-30 (subvol, snapshot)
>   +-home
>     +-2015-06-04 (subvol, snapshot)
>     +-2015-09-01 (subvol, snapshot)
> 
> 
> And it once more points to the problem of the wiki... anyone can write
> (I think even I) and it's totally unclear at first glance how
> "[non-]official" and outdated something may be.
> Apart from the problem that many important questions (from the PoV of
> a more advanced admin who doesn't just run mkfs.btrfs and then never
> thinks about it again) remain unanswered :-(

   In practice, new content is checked by a number of people when it's
put in, so the case of someone putting random poorly-thought-out crap
in the wiki isn't particularly likely to happen.

> > Meanwhile, if the intention for a subvolume is simply to exclude that
> > subtree from snapshotting of the parent, as might be the case for
> > example 
> That is in fact also a use case... so in practice I'll probably have a mix
> of (a) and (b).
> 
> 
> > if you have a VMs subvol, with the VM image files set NOCOW to avoid 
> > fragmentation, since snapshotting nocow files forces cow1 (a cow at
> > the 
> > first write of that block, before returning to nocow, because a
> > snapshot 
> > locks the existing extents in place for the snapshot, so initial
> > writes 
> > to a block after a snapshot /can't be nocow or it'd change the
> > snapshot 
> > too...)
> Ah that's good to know how that works (or better said, that it works at
> all)... I've already wondered myself several times what happens when I
> snapshot NOCOW files, ... something that's I guess also missing from
> the (missing ;-) ) btrfs-end-user overview and details documentation.

   Please feel free to add the things you'd like to see. As I said
above, we do check the docs on the wiki as they're changed, so if
you're wrong on some details, it won't be a major issue for very
long. If you want to discuss details before you write something, start
a conversation -- either on here, or on IRC (or even on the Talk pages
of the wiki).

> > OTOH, if there's a chance you might want to mount that subvolume in a
> > roll-back scenario, under the snapshot you're rolling back to, then
> > it 
> > makes sense to put it directly under ID 5 again, 

Re: btrfs send reproducibly fails for a specific subvolume after sending 15 GiB, scrub reports no errors

2015-11-24 Thread Austin S Hemmelgarn

On 2015-11-24 15:50, Christoph Anton Mitterer wrote:

> On Tue, 2015-11-24 at 15:44 -0500, Austin S Hemmelgarn wrote:
> > I would say it's currently usable for one-shot stuff, but probably
> > not reliably usable for automated things without some kind of
> > administrative oversight.  In theory, it wouldn't be hard to write a
> > script to automate fixing this particular issue when send encounters
> > it, but that has its own issues (you have to either toggle the
> > snapshot writable temporarily, or modify the source and re-snapshot).
>
> Well AFAIU, *this* very issue is at least something that bails out
> loudly with an error... I rather worry about cases where send/receive
> just exits without any error (status or message) and still didn't
> manage to correctly copy everything.
>
> The case that I had was that I incrementally send/received (with -p)
> backups to another disk.
> At some point in time I removed one of the older snapshots on that
> backup disk... and then had fs errors... as if the data would have been
> gone.. :(

I had tried using send/receive once with -p, but had numerous issues. 
The incrementals I've been doing have used -c instead, and I hadn't had 
any issues with data loss with that.  The issue outlined here was only a 
small part of why I stopped using it for backups.  The main reason was 
to provide better consistency between my local copies and what I upload 
to S3/Dropbox, meaning I only have to test one back up image per 
filesystem backed-up, instead of two.






Re: btrfs send reproducibly fails for a specific subvolume after sending 15 GiB, scrub reports no errors

2015-11-24 Thread Christoph Anton Mitterer
On Tue, 2015-11-24 at 15:58 -0500, Austin S Hemmelgarn wrote:
> I had tried using send/receive once with -p, but had numerous issues.
 
> The incrementals I've been doing have used -c instead, and I hadn't had 
> any issues with data loss with that.  The issue outlined here was only a 
> small part of why I stopped using it for backups.  The main reason was 
> to provide better consistency between my local copies and what I upload 
> to S3/Dropbox, meaning I only have to test one back up image per 
> filesystem backed-up, instead of two.

Okay maybe I just don't understand how to use send/receive correctly...


What I have is about the following (simplified):

master-fs:
5
|
+--data (subvol, my precious data)
|
+--snapshots
   |
   +--2015-11-01 (subvol, ro-snapshot of /data)

So 2015-11-01 is basically the first snapshot ever made.

Now I want to have it on:
backup-fs
+--2015-11-01 (subvol, ro-snapshot of /data)


So far I did
btrfs send /master-fs/snapshots/2015-11-01 | btrfs receive /backup-fs/2015-11-01




Then time goes by and I get new content in the data subvol, so what
I'd like to have then is a new snapshot on the master-fs:
5
|
+--data (subvol, more of my precious data)
|
+--snapshots
   |
   +--2015-11-01 (subvol, ro-snapshot of /data)
   +--2015-11-20 (subvol, ro-snapshot of /data)

And this should go incrementally on backup-fs:
backup-fs
+--2015-11-01 (subvol, ro-snapshot of /data)
+--2015-11-20 (subvol, ro-snapshot of /data)

So far I used something like:
btrfs send -p 2015-11-01 /master-fs/snapshots/2015-11-20 | btrfs receive 
/backup-fs/2015-11-20

And obviously I want it to share all the ref-links and stuff...


So in other words, what's the difference between -p and -c? :D


Thx,
Chris.




Re: btrfs check help

2015-11-24 Thread Austin S Hemmelgarn

On 2015-11-24 12:06, Vincent Olivier wrote:

> Hi,
>
> Woke up this morning with a kernel panic (for which I do not have details).
> Please find below the output for btrfs check. Is this normal? What should I
> do? Arch Linux 4.2.5. Btrfs-utils 4.3.1. 17x4TB RAID10.

You get bonus points for being on a reasonably up-to-date kernel and
userspace :)

This is actually a pretty tame check result for a filesystem that's been
through kernel panic. I think everything listed here is safe for check
to fix, but I would suggest waiting until the devs provide opinions
before actually running with --repair.  I would also suggest comparing
results between the different devices in the FS; if things are
drastically different, you may have issues that check can't fix on its own.

> [root@3dcpc5 ~]# btrfs check /dev/sdk
> Checking filesystem on /dev/sdk
> UUID: 6a742786-070d-4557-9e67-c73b84967bf5
> checking extents
> checking free space cache
> checking fs roots

These next two lines are errors, but I'm not 100% certain if it's safe
to have check fix them:

> root 5 inode 1341670 errors 400, nbytes wrong
> root 11406 inode 1341670 errors 400, nbytes wrong

This next one is also an error, and I am fairly certain that it's safe
to have check fix as long as the number at the end is not too big.

> found 19328809638262 bytes used err is 1

The rest is just reference info

> total csum bytes: 18849042724
> total tree bytes: 27389886464
> total fs tree bytes: 4449746944
> total extent tree bytes: 3075457024
> btree space waste bytes: 2880474254

The only other thing I know that's worth mentioning is that if the
numbers on these next two lines don't match, you may be missing some
writes from right before the crash.

> file data blocks allocated: 19430708535296
> referenced 20123773407232
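
The comparison of the last two summary lines can be sketched in a few lines of Python. This is only an illustration: the helper name `data_block_mismatch` is hypothetical, the regular expressions assume the "file data blocks allocated:" / "referenced" wording shown in the check output above, and whether a mismatch really points to lost writes is the suggestion made in this message, not something the script can verify.

```python
import re


def data_block_mismatch(check_output):
    """Return (allocated, referenced, referenced - allocated) in bytes,
    or None if either summary line is missing from the text."""
    alloc = re.search(r"file data blocks allocated:\s*(\d+)", check_output)
    ref = re.search(r"^\s*referenced\s+(\d+)", check_output, re.MULTILINE)
    if not (alloc and ref):
        return None
    allocated = int(alloc.group(1))
    referenced = int(ref.group(1))
    return allocated, referenced, referenced - allocated


# The two summary lines from the report in this thread.
sample = """\
file data blocks allocated: 19430708535296
 referenced 20123773407232
"""

result = data_block_mismatch(sample)
if result is not None and result[2] != 0:
    print(f"allocated and referenced differ by {result[2]} bytes")
```

The script only reads text, so it can be run against a saved copy of the check output without touching the filesystem.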








Re: shall distros run btrfsck on boot?

2015-11-24 Thread Austin S Hemmelgarn

On 2015-11-24 12:23, Christoph Anton Mitterer wrote:

> On Tue, 2015-11-24 at 11:14 -0600, Eric Sandeen wrote:
> > In a nutshell, though, I think a filesystem repair should be an
> > admin-initiated action, not something that surprises you on a boot,
> > at least for a journaling filesystem which is designed to maintain
> > its integrity even in the face of a power loss or crash.
>
> Well I wouldn't agree here... I maintain some >2PiB of storage for a
> LHC Tier-2,... right now everything with ext4.
> During normal operation we can of course not have any fsck, but every
> now and then, when we reboot, it happens automatically,... and
> regularly shows at least some (apparently non-serious) glitches.

Yeah, that's pretty normal for any large storage array with a high
uptime.  ext4 also doesn't correct anything on the fly, so it's more
important that you always run a check on boot when you don't reboot
often (which brings up why I personally suggest stuff like GlusterFS or
Ceph for large scale data storage, you can reboot individual nodes one
at a time, have zero down time, and maintain a high degree of
performance and data safety).

> IMHO, either the kernel driver itself already checks "everything", then
> we wouldn't need a dedicated check tool.
> Or it does not, but in that case, there will be people who want to have
> that in-depth checks run regularly (and even if it's just every half a
> year).
> I better wait half an hour at boot, and find such errors, rather than
> that they silently pile up until I really run into troubles.

Well, that depends on the type of errors.  XFS doesn't need a fsck on
mount usually, but there is still a xfs_repair tool for fixing badly
damaged filesystems that the kernel can't mount.  btrfs check falls into
the same general usage as XFS repair, IOW, if the system was shut down
cleanly, you're fine barring software bugs, but if it crashed, you
should be running a check on the FS.  Like I mentioned above, ext4
doesn't correct errors while online, it either (depending on how the fs
is configured) ignores them, goes read-only, or panics the system.
BTRFS on the other hand, can correct many types of errors while online
(that's part of what scrub is for), and is usually pretty resilient when
it comes to disk errors (I have a few TB worth of data on assorted BTRFS
filesystems, I run scrubs on them weekly (which usually turns up about a
single block error across the whole data set per month), and run a check
on them monthly, which has never turned up anything unless the system
had crashed).

> That being said, of course it should be configurable for the admin...
> and it is, via fstab.
> So apart from that, given the expectation that btrfsck should be rock-
> solid as e.g. e2fsck in some future, I wouldn't see why people
> shouldn't have the necessary facilities to have it auto-run.

btrfsck has to parse all the data in the FS, and unlike ext4, BTRFS has
multiple copies of each metadata block (and often on large filesystems,
is configured for multiple copies of each data block), and has checksums
on _everything_, which need to be validated.  There is no way that this
can be made all that much faster short of getting faster hardware to run
it on.






Re: btrfs send reproducibly fails for a specific subvolume after sending 15 GiB, scrub reports no errors

2015-11-24 Thread Hugo Mills
On Tue, Nov 24, 2015 at 10:17:13PM +0100, Christoph Anton Mitterer wrote:
> On Tue, 2015-11-24 at 15:58 -0500, Austin S Hemmelgarn wrote:
> > I had tried using send/receive once with -p, but had numerous issues.
>  
> > The incrementals I've been doing have used -c instead, and I hadn't had 
> > any issues with data loss with that.  The issue outlined here was only a 
> > small part of why I stopped using it for backups.  The main reason was 
> > to provide better consistency between my local copies and what I upload 
> > to S3/Dropbox, meaning I only have to test one back up image per 
> > filesystem backed-up, instead of two.
> 
> Okay maybe I just don't understand how to use send/receive correctly...
> 
> 
> What I have is about the following (simplified):
> 
> master-fs:
> 5
> |
> +--data (subvol, my precious data)
> |
> +--snapshots
>    |
>    +--2015-11-01 (subvol, ro-snapshot of /data)
> 
> So 2015-11-01 is basically the first snapshot ever made.
> 
> Now I want to have it on:
> backup-fs
> +--2015-11-01 (subvol, ro-snapshot of /data)
> 
> 
> So far I did
> btrfs send /master-fs/snapshots/2015-11-01 | btrfs receive 
> /backup-fs/2015-11-01
> 
> 
> 
> 
> Then time goes by and I get new content in the data subvol, so what
> I'd like to have then is a new snapshot on the master-fs:
> 5
> |
> +--data (subvol, more of my precious data)
> |
> +--snapshots
>    |
>    +--2015-11-01 (subvol, ro-snapshot of /data)
>    +--2015-11-20 (subvol, ro-snapshot of /data)
> 
> And this should go incrementally on backup-fs:
> backup-fs
> +--2015-11-01 (subvol, ro-snapshot of /data)
> +--2015-11-20 (subvol, ro-snapshot of /data)
> 
> So far I used something like:
> btrfs send -p 2015-11-01 /master-fs/snapshots/2015-11-20 | btrfs receive 
> /backup-fs/2015-11-20
> 
> And obviously I want it to share all the ref-links and stuff...
> 
> 
> So in other words, what's the difference between -p and -c? :D

   -p only sends the file metadata for the changes from the reference
snapshot to the sent snapshot. -c sends all the file metadata, but
will preserve the reflinks between the sent snapshot and the (one or
more) reference snapshots. You can only use one -p (because there's
only one difference you can compute at any one time), but you can use
as many -c as you like (because you can share extents with any number
of subvols).

   In both cases, the reference snapshot(s) must exist on the
receiving side.

   In implementation terms, on the receiver, -p takes a (writable)
snapshot of the reference subvol, and modifies it according to the
stream data. -c makes a new empty subvol, and populates it from
scratch, using the reflink ioctl to use data which is known to exist
in the reference subvols.
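
As a rough illustration of the two modes described above, the following Python sketch only builds argument lists for `btrfs send`; it does not execute anything. The function name `send_argv` and the snapshot paths (including the extra 2015-10-01 snapshot) are assumptions made for the example. The `-p` and `-c` options are the ones discussed in this thread, and the "one parent, any number of clone sources" shape of the function mirrors the explanation above.

```python
from typing import List, Optional, Sequence


def send_argv(snapshot: str,
              parent: Optional[str] = None,
              clone_sources: Sequence[str] = ()) -> List[str]:
    """Build (but do not run) a `btrfs send` command line.

    parent        -> at most one `-p` (a single difference is computed)
    clone_sources -> any number of `-c` (extents may be shared with each)
    """
    argv = ["btrfs", "send"]
    if parent is not None:
        argv += ["-p", parent]
    for src in clone_sources:
        argv += ["-c", src]
    argv.append(snapshot)
    return argv


snaps = "/master-fs/snapshots"  # hypothetical path, following this thread

# Incremental send relative to one parent snapshot.
print(" ".join(send_argv(f"{snaps}/2015-11-20",
                         parent=f"{snaps}/2015-11-01")))

# Full metadata, but reflinking against several reference snapshots.
print(" ".join(send_argv(f"{snaps}/2015-11-20",
                         clone_sources=[f"{snaps}/2015-11-01",
                                        f"{snaps}/2015-10-01"])))
```

Either command line would then be piped into `btrfs receive` on the backup side, where, as noted above, the reference snapshots must already exist.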

   Hugo.

-- 
Hugo Mills | Anyone who claims their cryptographic protocol is
hugo@... carfax.org.uk | secure is either a genius or a fool. Given the
http://carfax.org.uk/  | genius/fool ratio for our species, the odds aren't
PGP: E2AB1DE4  | good.  Bruce Schneier




Re: [PATCH v2 1/5] btrfs-progs: introduce framework to check kernel supported features

2015-11-24 Thread Austin S Hemmelgarn

On 2015-11-24 09:39, Mike Fleetwood wrote:

> On 23 November 2015 at 12:56, Anand Jain  wrote:
> > In the newer kernel, supported kernel features can be known from
> >    /sys/fs/btrfs/features
> > however this interface was introduced only after 3.14, and most of the
> > incompatible FS features were introduced before 3.14.
> >
> > This patch proposes to maintain kernel version against the feature list,
> > and so that will be the minimum kernel version needed to use the feature.
> >
> > Further, for features supported later than 3.14 this list can still be
> > updated, so it serves as a repository which can be displayed for easy
> > reference.
> >
> > Signed-off-by: Anand Jain 
> > ---
> > v2: Check for condition that what happens when we fail to read kernel
> >  version. Now the code will fall back to use the default as set by
> >  the progs.
> >
> >   utils.c | 80 -
> >   utils.h |  1 +
> >   2 files changed, 76 insertions(+), 5 deletions(-)
> >
> > diff --git a/utils.c b/utils.c
> > index b754686..24042e5 100644
> > --- a/utils.c
> > +++ b/utils.c
> > @@ -32,10 +32,12 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >
> >   #include "kerncompat.h"
> > @@ -567,21 +569,28 @@ out:
> >  return ret;
> >   }
> >
> > +/*
> > + * min_ker_ver: update with minimum kernel version at which the feature
> > + * was integrated into the mainline. For the transit period, that is
> > + * feature not yet in mainline but in mailing list and for testing,
> > + * please use "0.0" to indicate the same.
> > + */
> >   static const struct btrfs_fs_feature {
> >  const char *name;
> >  u64 flag;
> >  const char *desc;
> > +   const char *min_ker_ver;
> >   } mkfs_features[] = {
> >  { "mixed-bg", BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS,
> > -   "mixed data and metadata block groups" },
> > +   "mixed data and metadata block groups", "2.7.31"},
>
> I think you mean 2.6.37 here.
> 67377734fd24c3 "Btrfs: add support for mixed data+metadata block groups"

This brings up a rather important question:
Should compat-X.Y mean features that were considered usable in that 
version, or everything that version offered?  I understand wanting 
consistency with the kernel versions, but we shouldn't be creating 
filesystems that we know will break on the specified kernel even if it 
is mountable on it.
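
To make the version question concrete, here is a small Python sketch of the kind of comparison such a table implies. It is not the patch's C implementation: `parse_kernel_version`, `feature_supported`, and the `MIN_KERNEL` dictionary are hypothetical names, and only the mixed-bg entry (with the 2.6.37 correction from above) comes from this thread. The patch's "0.0" placeholder for not-yet-mainline features is deliberately left out, since the thread does not say how it should be evaluated.

```python
def parse_kernel_version(text):
    """Turn '4.3.0-trunk-amd64' or '2.6.37' into a 3-tuple of ints."""
    numeric = text.split("-", 1)[0]
    parts = tuple(int(p) for p in numeric.split(".") if p.isdigit())[:3]
    # Pad short versions such as '3.14' so tuples compare like-for-like.
    return parts + (0,) * (3 - len(parts))


# Only this entry is taken from the thread; add others at your own risk.
MIN_KERNEL = {"mixed-bg": "2.6.37"}


def feature_supported(feature, running_release):
    """True if the running kernel is at least the feature's minimum version."""
    minimum = parse_kernel_version(MIN_KERNEL[feature])
    return parse_kernel_version(running_release) >= minimum


print(feature_supported("mixed-bg", "4.3.0-trunk-amd64"))
print(feature_supported("mixed-bg", "2.6.31"))
```

On Linux, the running kernel's release string can be obtained with `platform.release()` from the standard library. Note that a check like this only answers "was the feature present", not the question raised above of whether it was considered usable in that version.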







Re: btrfs send reproducibly fails for a specific subvolume after sending 15 GiB, scrub reports no errors

2015-11-24 Thread Christoph Anton Mitterer
On Tue, 2015-11-24 at 15:44 -0500, Austin S Hemmelgarn wrote:
> I would say it's currently usable for one-shot stuff, but probably
> not reliably usable for automated things without some kind of
> administrative oversight.  In theory, it wouldn't be hard to write a
> script to automate fixing this particular issue when send encounters
> it, but that has its own issues (you have to either toggle the
> snapshot writable temporarily, or modify the source and re-snapshot).

Well AFAIU, *this* very issue is at least something that bails out
loudly with an error... I rather worry about cases where send/receive
just exits without any error (status or message) and still didn't
manage to correctly copy everything.

The case that I had was that I incrementally send/received (with -p)
backups to another disk.
At some point in time I removed one of the older snapshots on that
backup disk... and then had fs errors... as if the data would have been
gone.. :(



Re: btrfs check help

2015-11-24 Thread Hugo Mills
On Tue, Nov 24, 2015 at 03:28:28PM -0500, Austin S Hemmelgarn wrote:
> On 2015-11-24 12:06, Vincent Olivier wrote:
> >Hi,
> >
> >Woke up this morning with a kernel panic (for which I do not have details). 
> >Please find below the output for btrfs check. Is this normal ? What should I 
> >do ? Arch Linux 4.2.5. Btrfs-utils 4.3.1. 17x4TB RAID10.
> You get bonus points for being on a reasonably up-to-date kernel and
> userspace :)
> 
> This is actually a pretty tame check result for a filesystem that's
> been through kernel panic. I think everything listed here is safe
> for check to fix, but I would suggest waiting until the devs provide
> opinions before actually running with --repair.  I would also
> suggest comparing results between the different devices in the FS,
> if things are drastically different, you may have issues that check
> can't fix on its own.
> >[root@3dcpc5 ~]# btrfs check /dev/sdk
> >Checking filesystem on /dev/sdk
> >UUID: 6a742786-070d-4557-9e67-c73b84967bf5
> >checking extents
> >checking free space cache
> >checking fs roots
> These next two lines are errors, but I'm not 100% certain if it's
> safe to have check fix them:
> >root 5 inode 1341670 errors 400, nbytes wrong
> >root 11406 inode 1341670 errors 400, nbytes wrong

   I think so yes.

> This next one is also an error, and I am fairly certain that it's
> safe to have check fix as long as the number at the end is not too
> big.
> >found 19328809638262 bytes used err is 1

   Agreed.

   Hugo.

> The rest is just reference info
> >total csum bytes: 18849042724
> >total tree bytes: 27389886464
> >total fs tree bytes: 4449746944
> >total extent tree bytes: 3075457024
> >btree space waste bytes: 2880474254
> The only other thing I know that's worth mentioning is that if the
> numbers on these next two lines don't match, you may be missing some
> writes from right before the crash.
> >file data blocks allocated: 19430708535296
> >referenced 20123773407232

-- 
Hugo Mills | Great films about cricket: Umpire of the Rising Sun
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: btrfs send reproducibly fails for a specific subvolume after sending 15 GiB, scrub reports no errors

2015-11-24 Thread Austin S Hemmelgarn

On 2015-11-24 13:48, Christoph Anton Mitterer wrote:

> Hey.
>
> All that sounds pretty serious, doesn't it? So in other words, AFAIU,
> send/receive cannot really be reliably used.
>
> I did so far for making incremental backups, but I've also experienced
> some problems (though not what this is about here).

I would say it's currently usable for one-shot stuff, but probably not
reliably usable for automated things without some kind of
administrative oversight.  In theory, it wouldn't be hard to write a
script to automate fixing this particular issue when send encounters it,
but that has its own issues (you have to either toggle the snapshot
writable temporarily, or modify the source and re-snapshot).







Re: btrfs send reproducibly fails for a specific subvolume after sending 15 GiB, scrub reports no errors

2015-11-24 Thread Christoph Anton Mitterer
On Tue, 2015-11-24 at 21:27 +, Hugo Mills wrote:
>    -p only sends the file metadata for the changes from the reference
> snapshot to the sent snapshot. -c sends all the file metadata, but
> will preserve the reflinks between the sent snapshot and the (one or
> more) reference snapshots.
Let me see if I got that right:
- -p sends just the differences, for both data and meta-data.
- Plus, -c sends *all* the metadata, you said... but will it send all
data (and simply ignore what's already there) or will it also just send
the differences in terms of data?
- So that means effectively I'll end up with the same... right?

In other words, -p should be a tiny bit faster... but not that extremely much 
(unless I have tons[0] of metadata changes)

>  You can only use one -p (because there's
> only one difference you can compute at any one time), but you can use
> as many -c as you like (because you can share extents with any number
> of subvols).
So that means, if it would work correctly, -p would be the right choice
for me, as I never have multiple snapshots that I need to draw my
reflinks from, right?


>    In implementation terms, on the receiver, -p takes a (writable)
> snapshot of the reference subvol, and modifies it according to the
> stream data. -c makes a new empty subvol, and populates it from
> scratch, using the reflink ioctl to use data which is known to exist
> in the reference subvols.
I see...
I think the manpage needs more information like this... :)


Thanks for your help :-)
Chris.


[0] People may argue that one has XXbytes of metadata, and tons are a
measurement of weight... but when I recently carried 4 of the 8TB HDDs
on my back... I came to the conclusion that data correlates to grams ;-)



Re: Using Btrfs on single drives

2015-11-24 Thread Russell Coker
On Sun, 15 Nov 2015 03:01:57 PM Duncan wrote:
> That looks to me like native drive limitations.
> 
> Due to the fact that a modern hard drive spins at the same speed no 
> matter where the read/write head is located, when it's reading/writing to 
> the first part of the drive -- the outside -- much more linear drive 
> distance will pass under the read/write heads in say a tenth of a second 
> than will be the case as the last part of the drive is filled -- the 
> inside -- and throughput will be much higher at the first of the drive.

http://www.coker.com.au/bonnie++/zcav/results.html

The above page has the results of my ZCAV benchmark (part of the Bonnie++ 
suite) which shows this.  You can safely run ZCAV in read mode on a device 
that's got a filesystem on it so it's not too late to test these things.
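
As a rough worked example of the effect Duncan describes: at a fixed rotation rate, the length of track passing under the head per second scales with the radius. A small sketch of that arithmetic; the RPM and radii below are made-up illustrative values, not measurements of any particular drive, and it assumes roughly constant linear bit density so that throughput follows track speed:

```python
import math


def track_speed_mm_per_s(radius_mm, rpm):
    """Length of track passing under the head each second at a given radius."""
    return 2.0 * math.pi * radius_mm * rpm / 60.0


# Hypothetical values, for illustration only.
RPM = 7200
OUTER_RADIUS_MM = 46.0
INNER_RADIUS_MM = 20.0

outer = track_speed_mm_per_s(OUTER_RADIUS_MM, RPM)
inner = track_speed_mm_per_s(INNER_RADIUS_MM, RPM)

# The rotation rate cancels out: the ratio is simply outer radius / inner radius.
ratio = outer / inner
print(f"outer/inner track speed ratio: {ratio:.2f}")
```

The ZCAV results on the page above are the measured version of this curve.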

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: shall distros run btrfsck on boot?

2015-11-24 Thread Christoph Anton Mitterer
On Tue, 2015-11-24 at 11:14 -0600, Eric Sandeen wrote:
> In a nutshell, though, I think a filesystem repair should be an
> admin-initiated
> action, not something that surprises you on a boot, at least for a
> journaling
> filesystem which is designed to maintain its integrity even in the
> face of
> a power loss or crash.

Well I wouldn't agree here... I maintain some >2PiB of storage for a
LHC Tier-2,... right now everything with ext4.
During normal operation we can of course not have any fsck, but every
now and then, when we reboot, it happens automatically,... and
regularly shows at least some (apparently non-serious) glitches.


IMHO, either the kernel driver itself already checks "everything", in
which case we wouldn't need a dedicated check tool,
or it does not, in which case there will be people who want to have
those in-depth checks run regularly (even if it's just every half a
year).
I'd rather wait half an hour at boot and find such errors than
have them silently pile up until I really run into trouble.

That being said, of course it should be configurable for the admin...
and it is, via fstab.
So apart from that, given the expectation that btrfsck should be as
rock-solid as e.g. e2fsck at some point in the future, I wouldn't see why
people shouldn't have the necessary facilities to have it auto-run.
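
For reference, the fstab knob mentioned above is the sixth field (fs_passno): 0, or a missing field, means the filesystem is not checked at boot; 1 is conventionally the root filesystem and 2 everything else. A minimal sketch of reading that field; the sample lines and the helper name boot_check_enabled are made up for illustration, and what the boot-time check actually does still depends on the fsck helper for that filesystem type:

```python
def boot_check_enabled(fstab_line):
    """Return True if the fstab entry asks for a boot-time check (passno > 0)."""
    fields = fstab_line.split()
    if not fields or fields[0].startswith("#"):
        return False  # blank line or comment
    # Fields: device, mountpoint, type, options, dump, passno.
    passno = int(fields[5]) if len(fields) >= 6 else 0
    return passno > 0


# Hypothetical entries, for illustration only.
print(boot_check_enabled("/dev/sdb1  /data   ext4   defaults  0  2"))
print(boot_check_enabled("/dev/sdc1  /backup btrfs  defaults  0  0"))
```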


Cheers,
Chris.



Re: [Bug] btrfs-progs v4.3.1, mkfs.btrfs manpage, profiles table missing raid1

2015-11-24 Thread David Sterba
On Mon, Nov 23, 2015 at 07:07:51PM +0100, David Sterba wrote:
> > Also, it calls raid5/6 "copies" rather than "parity".  Perhaps add 
> > another column for parity, change the redundancy column to copies, and 
> > adjust accordingly?  Alternatively, keep the single redundancy column and 
> > just change raid5 to 1 parity and raid6 to 2 parity.
> 
> Yeah, parity would be better. I'll split it to copy and parity, where a
> copy really means 1:1 byte copy.

Copy from manual page rendering:

   +---------+------------------------------------+-----------------+
   | Profile |             Redundancy             | Min/max devices |
   |         +--------------+--------+------------+                 |
   |         |    Copies    | Parity |  Striping  |                 |
   +---------+--------------+--------+------------+-----------------+
   | single  |      1       |        |            | 1/any           |
   | DUP     | 2 / 1 device |        |            | 1/1 (see note)  |
   | RAID0   |              |        |   1 to N   | 2/any           |
   | RAID1   |      2       |        |            | 2/any           |
   | RAID10  |      2       |        |   1 to N   | 4/any           |
   | RAID5   |      1       |   1    | 2 to N - 1 | 2/any           |
   | RAID6   |      1       |   2    | 3 to N - 2 | 3/any           |
   +---------+--------------+--------+------------+-----------------+

https://github.com/kdave/btrfs-progs/blob/devel/Documentation/mkfs.btrfs.asciidoc#profiles

(ignore the alignment and other formatting artifacts, it looks better in
the manual page or html)


Re: [PATCH] btrfs-progs: fsck: Fix a false alert where extent record has wrong metadata flag

2015-11-24 Thread David Sterba
On Tue, Nov 24, 2015 at 01:16:51PM +0800, Qu Wenruo wrote:
> In process_extent_item(), 'metadata' is given the initial value 0, but in
> the non-skinny-metadata case a metadata extent can't be identified from
> the key type alone, and the code forgot that case.
> 
> This causes a lot of false alerts on non-skinny-metadata filesystems.
> 
> Fix it by setting the correct metadata value before calling add_extent_rec().
> 
> Reported-by: Christoph Anton Mitterer 
> Signed-off-by: Qu Wenruo 

Applied, thank you both for tracking it down. I'll add a test image.


Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2015-11-24 Thread David Sterba
On Tue, Nov 24, 2015 at 08:46:03AM +0800, Qu Wenruo wrote:
> 
> 
> Christoph Anton Mitterer wrote on 2015/11/23 19:12 +0100:
> > On Mon, 2015-11-23 at 09:10 +0800, Qu Wenruo wrote:
> >> Also, you won't want compiler to do extra optimization
> > I did the following:
> > $ export CFLAGS="-g -O0 -Wall -D_FORTIFY_SOURCE=2"
> 
> Wow, I didn't ever know it's possible to override FORTIFY_SOURCE to 
> suppress the warning.

FWIW, my tip:

make EXTRA_CFLAGS='-g -O0 -U_FORTIFY_SOURCE'


Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2015-11-24 Thread Christoph Anton Mitterer
On Tue, 2015-11-24 at 13:35 +0800, Qu Wenruo wrote:
> Hope you didn't wait too long.
No worries, didn't hold my breath ;)


> The fixing patch is CCed to you, or you can get it from patchwork:
> https://patchwork.kernel.org/patch/7687611/
Unfortunately that doesn't make the error messages go away.
:(

Shall I start debugging again?

Cheers,
Chris.



Re: btrfs send reproducibly fails for a specific subvolume after sending 15 GiB, scrub reports no errors

2015-11-24 Thread Christoph Anton Mitterer
Hey.

All that sounds pretty serious, doesn't it? So in other words, AFAIU,
send/receive cannot really be reliably used.

I did so far for making incremental backups, but I've also experienced
some problems (though not what this is about here).


Cheers,
Chris.



Re: shall distros run btrfsck on boot?

2015-11-24 Thread Eric Sandeen
On 11/24/15 12:56 AM, Duncan wrote:
> Duncan posted on Tue, 24 Nov 2015 06:46:18 + as excerpted:
> 
>> That wouldn't be entirely uncommon, because as Eric mentions, btrfs
>> check is intended to be thorough, where the kernel mount-time check is
>> intended to be fast.
>>
>> But of course, as Eric also mentions, that's yet another reason you
>> don't want btrfs check running at boot... it's *SSLLLOOW*, because
>> it's being thorough.
> 
> Oops!  Mis-attribution.  Qu not Eric.
> 
> (I had read both replies in my email but only saw Eric's on the list, 
> which I read in my news client via gmane's list2news service, when I 
> composed the above.  So I presumed the points I remembered being made 
> were from Eric's post, when it was Qu's.) 

Yeah, I don't think that being thorough requires being slow.  ;)

In a nutshell, though, I think a filesystem repair should be an admin-initiated
action, not something that surprises you on a boot, at least for a journaling
filesystem which is designed to maintain its integrity even in the face of
a power loss or crash.

-Eric



btrfs check help

2015-11-24 Thread Vincent Olivier
Hi,

Woke up this morning with a kernel panic (for which I do not have details). 
Please find below the output of btrfs check. Is this normal? What should I 
do? Arch Linux 4.2.5. Btrfs-utils 4.3.1. 17x4TB RAID10.

Regards,

Vincent

[root@3dcpc5 ~]# btrfs check /dev/sdk
Checking filesystem on /dev/sdk
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents
checking free space cache
checking fs roots
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328809638262 bytes used err is 1
total csum bytes: 18849042724
total tree bytes: 27389886464
total fs tree bytes: 4449746944
total extent tree bytes: 3075457024
btree space waste bytes: 2880474254
file data blocks allocated: 19430708535296
referenced 20123773407232


Re: btrfs send reproducibly fails for a specific subvolume after sending 15 GiB, scrub reports no errors

2015-11-24 Thread Hugo Mills
On Tue, Nov 24, 2015 at 10:36:26PM +0100, Christoph Anton Mitterer wrote:
> On Tue, 2015-11-24 at 21:27 +, Hugo Mills wrote:
> >    -p only sends the file metadata for the changes from the reference
> > snapshot to the sent snapshot. -c sends all the file metadata, but
> > will preserve the reflinks between the sent snapshot and the (one or
> > more) reference snapshots.
> Let me see if I got that right:
> - -p sends just the differences, for both data and meta-data.
> - Plus, -c sends *all* the metadata, you said... but will it send all
> data (and simply ignore what's already there) or will it also just send
> the differences in terms of data?

   Well, if you have a snapshot A, snap to A', and then send -p A A',
it'll send the same amount of data as send -c A A'.

   However, the effect on the receiving system is slightly different
in terms of the subvol metadata -- with -p, it will preserve the
information that A and A' are snapshots of the same original. With -c,
it won't preserve that.

   This will probably have knock-on effects in terms of round-tripping
the snapshots (e.g. for restoring one to the hosed system and
continuing with the incremental backup scheme). I'd have to do some
hard thinking again with the send/receive algebra to work out what the
effect would be, but with the -c approach, you'd probably have
difficulties. The round-tripping feature hasn't been implemented yet,
so the point is currently moot, but it's certainly possible to do it
(with a small send stream change), and it probably will be done at
some point.

> - So that means effectively I'll end up with the same... right?
> 
> In other words, -p should be a tiny bit faster... but not that extremely much 
> (unless I have tons[0] of metadata changes)

   Yes.

> >  You can only use one -p (because there's
> > only one difference you can compute at any one time), but you can use
> > as many -c as you like (because you can share extents with any number
> > of subvols).
> So that means, if it would work correctly, -p would be the right choice
> for me, as I never have multiple snapshots that I need to draw my
> reflinks from, right?

   Correct. The -c case is much less often needed. It's useful if you
have, say, several otherwise unrelated subvols that you need to
transfer efficiently from a filesystem that has had dedup run on it.
(Other use cases may apply as well).
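
To make the difference in invocation concrete, here is a small sketch that only builds the argument lists for the two cases; it does not run anything. The snapshot paths and the helper name build_send_command are hypothetical; -p and -c are the btrfs send options discussed above:

```python
def build_send_command(snapshot, parent=None, clone_sources=()):
    """Build an argv list for btrfs send.

    parent        -> at most one -p, since only one difference can be computed
    clone_sources -> any number of -c, since extents can be shared with many
                     subvols
    """
    argv = ["btrfs", "send"]
    if parent is not None:
        argv += ["-p", parent]
    for source in clone_sources:
        argv += ["-c", source]
    argv.append(snapshot)
    return argv


# Hypothetical paths, for illustration only.
incremental = build_send_command("/snaps/A-new", parent="/snaps/A")
with_clones = build_send_command("/snaps/B",
                                 clone_sources=["/snaps/A", "/snaps/C"])

print(" ".join(incremental))
print(" ".join(with_clones))
```

The single parent argument versus the list of clone sources mirrors the one -p, many -c rule.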

> >    In implementation terms, on the receiver, -p takes a (writable)
> > snapshot of the reference subvol, and modifies it according to the
> > stream data. -c makes a new empty subvol, and populates it from
> > scratch, using the reflink ioctl to use data which is known to exist
> > in the reference subvols.
> I see...
> I think the manpage needs more information like this... :)
[snip]
> [0] People may argue that one has XXbytes of metadata, and tons are a
> measurement of weight... but when I recently carried 4 of the 8TB HDDs
> on my back... I came to the conclusion that data correlates to grams ;-)

   Yeah, I've met that particular equation too... :)

   Hugo.

-- 
Hugo Mills | Anyone who claims their cryptographic protocol is
hugo@... carfax.org.uk | secure is either a genius or a fool. Given the
http://carfax.org.uk/  | genius/fool ratio for our species, the odds aren't
PGP: E2AB1DE4  | good.  Bruce Schneier




Re: shall distros run btrfsck on boot?

2015-11-24 Thread Eric Sandeen
On 11/24/15 2:38 PM, Austin S Hemmelgarn wrote:

> if the system was
> shut down cleanly, you're fine barring software bugs, but if it
> crashed, you should be running a check on the FS.

Um, no...

The *entire point* of having a journaling filesystem is that after a
crash or power loss, a journal replay on next mount will bring the
metadata into a consistent state.

-Eric


Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2015-11-24 Thread Qu Wenruo



On 11/24/2015 09:15 PM, Laurent Bonnaud wrote:

> On 23/11/2015 02:00, Qu Wenruo wrote:
>
> > Considering the size, I'd like not to touch the dump, metadata is over 5G,
>
> It is only 2GB once compressed :>.
>
> > and I think it's not related to on-disk data, but runtime problem like I
> > mentioned above.
>
> To test this hypothesis I did the following:
>
>   - reboot the machine with a 4.3.0 kernel from Debian experimental
>   - run "du" on the btrfs FS as a quick sanity check
>
> The kernel went read-only again with the following kernel errors:
>
> [ 5759.890934] BTRFS info (device sdb1): disk space caching is enabled
> [ 5773.278244] BTRFS warning (device sdb1): block group 314635714560 has wrong
> amount of free space
> [ 5773.278247] BTRFS warning (device sdb1): failed to load free space cache for
> block group 314635714560, rebuild it now
> [ 5773.947885] [ cut here ]
> [ 5773.947908] WARNING: CPU: 0 PID: 2546 at
> /build/linux-7sjCdl/linux-4.3/fs/btrfs/extent-tree.c:2851
> btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs]()
> [ 5773.947909] BTRFS: Transaction aborted (error -17)
> [ 5773.947910] Modules linked in: xt_multiport cpufreq_conservative
> cpufreq_powersave cpufreq_userspace cpufreq_stats ip6table_filter ip6_tables
> iptable_filter ip_tables x_tables binfmt_misc snd_hda_codec_analog
> snd_hda_codec_generic dell_wmi iTCO_wdt iTCO_vendor_support sparse_keymap evdev
> coretemp kvm_intel dcdbas snd_hda_intel snd_hda_codec snd_hda_core kvm
> snd_hwdep i915 snd_pcm_oss snd_mixer_oss pcspkr sg snd_pcm psmouse lpc_ich
> mfd_core serio_raw i2c_i801 snd_timer snd shpchp tpm_tis video drm_kms_helper
> drm soundcore mei_me mei i2c_algo_bit wmi tpm 8250_fintek button acpi_cpufreq
> processor ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi drbd lru_cache libcrc32c parport_pc
> ppdev lp parport loop dm_crypt dm_mod autofs4 ext4 crc16 mbcache
> [ 5773.947951]  jbd2 crc32c_generic btrfs xor raid6_pq md_mod ses enclosure
> hid_generic usbhid hid sd_mod uas usb_storage ahci libahci ata_generic libata
> scsi_mod e1000e ptp pps_core ehci_pci uhci_hcd ehci_hcd usbcore usb_common
> [ 5773.947967] CPU: 0 PID: 2546 Comm: kworker/u16:2 Not tainted
> 4.3.0-trunk-amd64 #1 Debian 4.3-1~exp1
> [ 5773.947968] Hardware name: Dell Inc. OptiPlex 780 /0C27VV,
> BIOS A08 01/21/2011
> [ 5773.947981] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
> [ 5773.947983]  a02a8250 812c53a9 8800af283d30
> 8106ebad
> [ 5773.947985]  8800ace5eae0 8800af283d80 8800ac6ade70
> 8800ac6add10
> [ 5773.947987]  0020 8106ec2c a02a8420
> 0020
> [ 5773.947989] Call Trace:
> [ 5773.947994]  [] ? dump_stack+0x40/0x57
> [ 5773.947997]  [] ? warn_slowpath_common+0x7d/0xb0
> [ 5773.947999]  [] ? warn_slowpath_fmt+0x4c/0x50
> [ 5773.948019]  [] ? btrfs_run_delayed_refs+0x26b/0x2a0
> [btrfs]
> [ 5773.948027]  [] ? delayed_ref_async_start+0x32/0x80 [btrfs]
> [ 5773.948039]  [] ? btrfs_scrubparity_helper+0xc8/0x260
> [btrfs]
> [ 5773.948041]  [] ? process_one_work+0x19f/0x3d0
> [ 5773.948043]  [] ? worker_thread+0x4d/0x450
> [ 5773.948044]  [] ? process_one_work+0x3d0/0x3d0
> [ 5773.948046]  [] ? kthread+0xbd/0xe0
> [ 5773.948048]  [] ? kthread_create_on_node+0x170/0x170
> [ 5773.948051]  [] ? ret_from_fork+0x3f/0x70
> [ 5773.948053]  [] ? kthread_create_on_node+0x170/0x170
> [ 5773.948054] ---[ end trace 654b175f2543b4e4 ]---
> [ 5773.948057] BTRFS: error (device sdb1) in btrfs_run_delayed_refs:2851:
> errno=-17 Object already exists
> [ 5773.948092] BTRFS info (device sdb1): forced readonly
> [ 5936.235238] perf interrupt took too long (2502 > 2500), lowering
> kernel.perf_event_max_sample_rate to 5
> [ 6427.280125] BTRFS (device sdb1): parent transid verify failed on
> 353291255808 wanted 9058 found 9056
> [ 6427.288873] BTRFS (device sdb1): parent transid verify failed on
> 353291255808 wanted 9058 found 9056
> [ 6427.381126] BTRFS (device sdb1): parent transid verify failed on
> 353291255808 wanted 9058 found 9056
> [ 6427.381747] BTRFS (device sdb1): parent transid verify failed on
> 353291255808 wanted 9058 found 9056
> [...]
>
> Are you interested in the btrfs-image output now ?

BTW, did you encounter the same btrfsck error ("type mismatch with chunk")
that Christoph reported?

If so, that would be a great help in debugging the btrfsck false alert.

Thanks,
Qu


Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2015-11-24 Thread Qu Wenruo



On 11/25/2015 02:25 AM, Christoph Anton Mitterer wrote:

> On Tue, 2015-11-24 at 13:35 +0800, Qu Wenruo wrote:
> > Hope you didn't wait too long.
> No worries, didn't hold my breath ;)
>
> > The fixing patch is CCed to you, or you can get it from patchwork:
> > https://patchwork.kernel.org/patch/7687611/
> Unfortunately that doesn't make the error messages go away.
> :(
>
> Shall I start debugging again?

That's too bad...
You can try debugging again, but the result may not change at all. :(

There may be a more involved debugging method: add a breakpoint at
add_extent_rec() for the specific bytenr (the bytenr in the btrfsck error
output, e.g. 5993525264384) to check whether its metadata flag is set
correctly.

If the metadata flag is not set correctly, the backtrace will provide a lot
of useful info...

But that's what a dev should do, not an end user.
I'm totally OK if you can't provide that output.

I'll continue searching the code for any possible false alert.
But without a local test image, I'm afraid you may have to try several
patches until a final fix is found.

Thanks,
Qu

> Cheers,
> Chris.




Re: subvols and parents - how?

2015-11-24 Thread Duncan
Christoph Anton Mitterer posted on Tue, 24 Nov 2015 05:56:00 +0100 as
excerpted:

> When I use subvolumes, these always have a parent subvolume (except
> ID5), so I can basically choose between two ways:
> 
> a) make child subvolumes, e.g.
> 5
> |
> +-root   (=subvol, mountpoint /)
>   +-boot/
>   +-root/
>   +-lib/
>   +-home/ (=subvolume)
> and so on... perhaps the whole thing without the dedicated root
> subvolume (although that's probably not so smart, I guess).
> 
> b) place at least some of the subvolumes directly below the top-level
> and mount them e.g. via /etc/fstab, e.g.
> 5
> |
> +-root   (=subvol, mountpoint /)
> | +-boot/
> | +-root/
> | +-lib/
> +-home/ (=subvolume, mountpoint /home)
> 
> 
> Now I wondered whether this has any technical implications, but neither
> the wiki, nor the manpages seem to explain a lot here.

Very astute question! =:^)

Somewhere on the wiki I believe there's a recommendation to use (b) 
layout, but to some extent it depends on why you're actually doing 
subvolumes.  

OK, found it on the wiki.  It wasn't under use-cases, where I initially 
thought to look, but under sysadmin guide.  Specifically, see section 
4.2, managing snapshots, but I'd suggest reading the entire subvolumes 
discussion, section 4, or even most/all of the page.

https://btrfs.wiki.kernel.org/index.php/SysadminGuide

(More below.)

> The "differences", AFAIU, are as follows:
> - When I mount a given subvolume, its children are automatically
>   "there".
>   Whereas when I don't have them as children (as in (b)) I must of course
>   mount them somehow manually.
> - Analogously for umounting.
> - I can move existing subvols to higher/lower levels, and the parent
>   IDs will change accordingly.
> 
> So basically it makes no difference, right? Or is there anything more
> technical going on? E.g. with the ref-links or so?
> Right now, there are, AFAIK, neither recursive snapshots (and especially
> not atomic ones) nor recursive send/receive, right?
> If that should ever be implemented, would I perhaps have problems with
> (a) or (b)?

If you're doing subvolumes for snapshotting and potential rollback 
purposes, layout (b) can be preferable as it allows a more direct mix and 
match rollback.

Suppose you only want to rollback /, because some update screwed you up, 
but not /home, which is fine.  If /home is a nested subvolume, then 
you're now mounting the nested home subvolume from some other nesting 
tree entirely, whereas if they're all under top-level, you simply mount 
the /home subvolume under whatever snapshot of / you are currently 
booting.

Of course the reverse applies as well, if / is fine but you want to 
rollback /home.  Again, with nesting you're reaching into some other 
nesting to mount what you want, and it can get a bit unintuitive and 
difficult to track, particularly if you go more than the two levels deep, 
but if all the snapshots are direct children of the top level ID 5, it's 
a lot easier.

Tho I'd actually suggest a variant of the flat layout they suggest in the 
sysadmin's guide.  What I'd do is something like this (using your tree 
drawing style):

5
|
+-roots (dir not subvol, note the s, rootS, plural)
| +-root (subvol, mountpoint /)
| | +-boot/ (dir)
| | +-root/ (dir)
| | +-lib/  (dir)
| | +-home/ (empty dir as mountpoint)
| +-root-snapshot-2015.0301 (dated snapshot of root)
| +-root-snapshot-2015.0601 (dated snapshot of root)
| +-root-snapshot-2015.0901 (dated snapshot of root)
+-homes (dir not subvol)
| +-home (subvol, mountpoint /home)
| +-home-snapshot-2015.0301 (dated snapshot of home)
...


Of course, you might also organize by date instead of subvol...

5
|
+- heads (dir, headS plural)
| +-root (subvol)
| +-home (subvol)
| +-whatever (subvol)
+-snapshots-2015.0301 (dir, snapshotS plural)
| +-root-2015.0301 (snapshot of heads/root)
| +-home-2015.0301 (snapshot of heads/home)
| ...
+-snapshots-2015.0601 (dir)
| +-root-2015.0601 (snapshot)
| ...
+-snapshots-2015.0901 (dir)
| +-root-2015.0901 (snapshot)
...


Either of these would make finding a desired snapshot to rollback to much 
easier than a pure flat subvols/snapshots layout, with the preferred one 
depending on whether you want subvols/snapshots grouped by date or by 
snapshotted mountpoint.

The dates organization would make cleaning up old snapshots by date, and 
visually checking that the snapshot cleanup script (if automated) is 
working as intended, somewhat easier, however.
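
With either layout, rolling back one mountpoint comes down to pointing its mount at a different subvolume path relative to the top level. A small sketch that generates fstab lines for that; the device string and the helper name fstab_line are placeholders, the subvolume paths follow the first tree above, and subvol= is the usual btrfs mount option:

```python
def fstab_line(device, mountpoint, subvol_path, options="defaults"):
    """Format one fstab entry that mounts a subvolume by path from the top level."""
    return "{}\t{}\tbtrfs\t{},subvol={}\t0\t0".format(
        device, mountpoint, options, subvol_path)


DEVICE = "UUID=<fs-uuid>"  # placeholder, substitute the real filesystem UUID

# Roll back / to a dated snapshot while keeping the current /home.
lines = [
    fstab_line(DEVICE, "/", "roots/root-snapshot-2015.0601"),
    fstab_line(DEVICE, "/home", "homes/home"),
]

for line in lines:
    print(line)
```

Only the subvol= path of the entry being rolled back changes; the other mounts stay exactly as they were, which is the mix-and-match property described above.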


Meanwhile, if the intention for a subvolume is simply to exclude that 
subtree from snapshotting of the parent, as might be the case for example 
if you have a VMs subvol, with the VM image files set NOCOW to avoid 
fragmentation, since snapshotting nocow files forces cow1 (a cow at the 
first write of that block, before returning to nocow, because a snapshot 
locks the existing extents in place for the snapshot, so initial writes 
to a block after a snapshot /can't be nocow or it'd change the snapshot 
too...), and it's 

Re: [RFC][PATCH 00/12] Enhanced file stat system call

2015-11-24 Thread Martin Steigerwald
Am Dienstag, 24. November 2015, 00:13:08 CET schrieb Christoph Hellwig:
> On Fri, Nov 20, 2015 at 05:19:31PM +0100, Martin Steigerwald wrote:
> > I know it's mostly relevant just for FAT32, but in any case, rather
> > than trying to write 4 GiB and then fail, it would be good to at some
> > time get a dialog at the beginning of the copy.
> 
> pathconf/fpathconf is supposed to handle that.  It's not super pretty
> but part of Posix.  Linus hates it, but it might be time to give it
> another try.

It might be interesting for BTRFS as well, to be able to ask what amount of 
free space there currently is *at* a given path. Cause with BTRFS and 
Subvolumes this may differ between different paths. Even though it's not 
implemented yet, it may be possible in the future to have one subvolume with 
RAID 1 profile and one with RAID 0 profile.

That said, an application wanting to make sure it can write a certain amount of 
data can use fallocate. And that's the only reliable way to ensure it that I 
know of. It can become tedious for several files, but there is no problem in 
principle with preallocating all files if their sizes are known. Even rsync or 
desktop environments could work like that: first fallocate everything, then, 
only if that succeeds, start actually copying data. Disadvantage: on aborted 
copies you have all files with their correct sizes and no easy indication of 
where the copy stopped.
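
The "fallocate everything first" idea can be sketched roughly like this. It uses os.posix_fallocate, Python's wrapper for posix_fallocate(3); the helper names preallocate_all and demo and the file names are made up for illustration, and this is only a sketch of the approach, not how rsync or any desktop environment actually works:

```python
import os
import tempfile


def preallocate_all(targets):
    """Reserve space for every (path, size) pair before any data is copied.

    Returns True if all reservations succeeded. On failure, removes the
    files created so far and returns False, so nothing half-done is left.
    """
    created = []
    try:
        for path, size in targets:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
            created.append(path)
            try:
                if size > 0:  # posix_fallocate rejects a zero length
                    os.posix_fallocate(fd, 0, size)
            finally:
                os.close(fd)
    except OSError:
        for path in created:
            os.unlink(path)
        return False
    return True


def demo():
    """Preallocate two files in a scratch directory and report their sizes."""
    with tempfile.TemporaryDirectory() as scratch:
        targets = [(os.path.join(scratch, "a.bin"), 4096),
                   (os.path.join(scratch, "b.bin"), 1)]
        ok = preallocate_all(targets)
        sizes = [os.path.getsize(path) for path, _ in targets] if ok else []
        return ok, sizes


if __name__ == "__main__":
    print(demo())
```

Only once preallocate_all() returns True would the actual copy start. The disadvantage mentioned above shows up directly: after this step every file already has its final size, so size alone no longer tells you how far an aborted copy got.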

Thanks,
-- 
Martin


Re: [PATCH 00/25] Btrfs-convert rework to support native separate

2015-11-24 Thread Qu Wenruo



David Sterba wrote on 2015/11/23 18:33 +0100:

> On Fri, Nov 20, 2015 at 11:24:04AM +0800, Qu Wenruo wrote:
> > Here comes the 1st version of btrfs-convert rework.
> > Any test is welcomed, and it can already pass the convert test from
> > btrfs-progs. (Since the test doesn't test rollback function)
>
> I went through the patches, looks mostly ok akin to the proposed
> changes. I'll take the independent patches right away. Unfortunately
> there are code changes that clash significantly with the pending
> reiserfs addition to convert, I'm afraid you'll have to rework your
> patchset on top of that.  The code is now pushed to branch
> 'dev/convert-reiser'.

Hi David,

It seems the conflict is quite large: your reiserfs support is based on
the old behavior, just like the old ext2 code: custom extent allocation.

I'm afraid the rebase will take a lot of time since I'm a complete
newbie when it comes to reiserfs... :(

I may need to change a lot of direct ext2 calls to generic ones, and may
even change the generic function calls (no alloc/free, only free space
lookup).

And some (maybe a lot) of the reiserfs code may be removed during the rework.

Will that be OK for you?

Thanks,
Qu



Re: [RFC][PATCH 00/12] Enhanced file stat system call

2015-11-24 Thread Christoph Hellwig
On Tue, Nov 24, 2015 at 09:48:22AM +0100, Martin Steigerwald wrote:
> It might be interesting for BTRFS as well, to be able to ask what amount of 
> free space there currently is *at* a given path. Cause with BTRFS and 
> Subvolumes this may differ between different paths.

We can handle this trivially with the current statfs interface.  Take a
look at xfs_fs_statfs and xfs_qm_statvfs.
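
From userspace this is already a per-path query: you hand statvfs(3) a path, not a device. A minimal sketch using Python's os.statvfs; free_bytes_at is a hypothetical helper name, and what a filesystem chooses to report for a given path or subvolume is up to that filesystem:

```python
import os


def free_bytes_at(path):
    """Bytes available to unprivileged users on the filesystem holding path."""
    st = os.statvfs(path)
    # f_bavail counts free blocks in units of f_frsize (the fragment size).
    return st.f_bavail * st.f_frsize


if __name__ == "__main__":
    print(free_bytes_at("."))
```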


[PATCH] fstests: speedup generic/027 for new version of btrfs

2015-11-24 Thread Zhaolei
From: Zhao Lei 

New versions of btrfs create non-mixed blockgroups in all cases.

For generic/027, the filesystem under test is converted from
mixed-blockgroup to non-mixed blockgroup,
and the test time changed from 400s to 2700s on my node.

To test btrfs with all mount options, this test item needs about
7.5 hours (actually, some mount options such as compress need more time).

This patch reduces the test loop count, to make the test time about equal
to the old version.

Signed-off-by: Zhao Lei 
---
 tests/generic/027 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/generic/027 b/tests/generic/027
index d2e59d6..42f0685 100755
--- a/tests/generic/027
+++ b/tests/generic/027
@@ -78,7 +78,7 @@ rm -f $SCRATCH_MNT/testfile
 loop=100
 # btrfs takes much longer time, reduce the loop count
 if [ "$FSTYP" == "btrfs" ]; then
-   loop=10
+   loop=2
 fi
 
 dir=$SCRATCH_MNT/testdir
-- 
1.8.5.1




Re: [PATCH v2 1/5] btrfs-progs: introduce framework to check kernel supported features

2015-11-24 Thread Mike Fleetwood
On 23 November 2015 at 12:56, Anand Jain  wrote:
> In the newer kernel, supported kernel features can be known from
>   /sys/fs/btrfs/features
> however this interface was introduced only after 3.14, and most of the
> incompatible FS features were introduced before 3.14.
>
> This patch proposes to maintain kernel version against the feature list,
> and so that will be the minimum kernel version needed to use the feature.
>
> Further, for features supported later than 3.14 this list can still be
> updated, so it serves as a repository which can be displayed for easy
> reference.
>
> Signed-off-by: Anand Jain 
> ---
> v2: Check for the condition where we fail to read the kernel
> version. Now the code will fall back to using the default as set by
> the progs.
>
>  utils.c | 80 
> -
>  utils.h |  1 +
>  2 files changed, 76 insertions(+), 5 deletions(-)
>
> diff --git a/utils.c b/utils.c
> index b754686..24042e5 100644
> --- a/utils.c
> +++ b/utils.c
> @@ -32,10 +32,12 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  #include "kerncompat.h"
> @@ -567,21 +569,28 @@ out:
> return ret;
>  }
>
> +/*
> + * min_ker_ver: update with minimum kernel version at which the feature
> + * was integrated into the mainline. For the transit period, that is
> + * feature not yet in mainline but in mailing list and for testing,
> + * please use "0.0" to indicate the same.
> + */
>  static const struct btrfs_fs_feature {
> const char *name;
> u64 flag;
> const char *desc;
> +   const char *min_ker_ver;
>  } mkfs_features[] = {
> { "mixed-bg", BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS,
> -   "mixed data and metadata block groups" },
> +   "mixed data and metadata block groups", "2.7.31"},
I think you mean 2.6.37 here.
67377734fd24c3 "Btrfs: add support for mixed data+metadata block groups"

Thanks,
Mike

> { "extref", BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF,
> -   "increased hardlink limit per file to 65536" },
> +   "increased hardlink limit per file to 65536", "3.7"},
> { "raid56", BTRFS_FEATURE_INCOMPAT_RAID56,
> -   "raid56 extended format" },
> +   "raid56 extended format", "3.9"},
> { "skinny-metadata", BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA,
> -   "reduced-size metadata extent refs" },
> +   "reduced-size metadata extent refs", "3.10"},
> { "no-holes", BTRFS_FEATURE_INCOMPAT_NO_HOLES,
> -   "no explicit hole extents for files" },
> +   "no explicit hole extents for files", "3.14"},
> /* Keep this one last */
> { "list-all", BTRFS_FEATURE_LIST_ALL, NULL }
>  };
> @@ -3077,3 +3086,64 @@ unsigned int get_unit_mode_from_arg(int *argc, char *argv[], int df_mode)
>
> return unit_mode;
>  }
> +
> +static int version_to_code(char *v)
> +{
> +   int i = 0;
> +   char *b[3] = {NULL};
> +   char *save_b = NULL;
> +
> +   for (b[i] = strtok_r(v, ".", &save_b);
> +   b[i] != NULL;
> +   b[i] = strtok_r(NULL, ".", &save_b))
> +   i++;
> +
> +   if (b[2] == NULL)
> +   return KERNEL_VERSION(atoi(b[0]), atoi(b[1]), 0);
> +   else
> +   return KERNEL_VERSION(atoi(b[0]), atoi(b[1]), atoi(b[2]));
> +
> +}
> +
> +static int get_kernel_code()
> +{
> +   int ret;
> +   struct utsname utsbuf;
> +   char *version;
> +
> +   ret = uname(&utsbuf);
> +   if (ret)
> +   return -ret;
> +
> +   if (!strlen(utsbuf.release))
> +   return -EINVAL;
> +
> +   version = strtok(utsbuf.release, "-");
> +
> +   return version_to_code(version);
> +}
> +
> +u64 btrfs_features_allowed_by_kernel(void)
> +{
> +   int i;
> +   int local_kernel_code = get_kernel_code();
> +   u64 features = 0;
> +
> +   /*
> +* When system did not provide the kernel version then just
> +* return 0, the caller has to depend on the intelligence as
> +* per btrfs-progs version
> +*/
> +   if (local_kernel_code <= 0)
> +   return 0;
> +
> +   for (i = 0; i < ARRAY_SIZE(mkfs_features) - 1; i++) {
> +   char *ver = strdup(mkfs_features[i].min_ker_ver);
> +
> +   if (local_kernel_code >= version_to_code(ver))
> +   features |= mkfs_features[i].flag;
> +
> +   free(ver);
> +   }
> +   return (features);
> +}
> diff --git a/utils.h b/utils.h
> index 192f3d1..9044643 100644
> --- a/utils.h
> +++ b/utils.h
> @@ -104,6 +104,7 @@ void btrfs_list_all_fs_features(u64 mask_disallowed);
>  char* btrfs_parse_fs_features(char *namelist, u64 *flags);
>  void btrfs_process_fs_features(u64 flags);
>  void btrfs_parse_features_to_string(char 
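
As a side note on the version parsing in the quoted patch: a minimal
sketch of the same idea using sscanf() instead of strtok_r() is shown
below. This is not the patch's code. parse_version_code() and
VERSION_CODE() are hypothetical names, and the macro only mirrors the
classic packing of KERNEL_VERSION(), so it assumes each component stays
below 256. The sketch avoids modifying the input string and rejects
strings that have no minor component.

```c
#include <stdio.h>

/* Same packing as the classic KERNEL_VERSION() macro (assumption). */
#define VERSION_CODE(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * Hypothetical helper: turn "3.14", "2.6.37" or a uname release such as
 * "4.3.0-trunk-amd64" into a comparable integer.
 * Returns -1 if fewer than two numeric components are present.
 */
static int parse_version_code(const char *release)
{
    int major = 0, minor = 0, patch = 0;
    int n = sscanf(release, "%d.%d.%d", &major, &minor, &patch);

    if (n < 2 || major < 0 || minor < 0 || patch < 0)
        return -1;

    /* A missing third component is treated as 0. */
    return VERSION_CODE(major, minor, patch);
}
```

Because sscanf() stops at the first character that does not match, the
"-trunk-amd64" suffix of a release string is ignored without a separate
strtok() call.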

Re: [PATCH v2 0/5] Make btrfs-progs really compatible with any kernel version

2015-11-24 Thread Anand Jain




Thanks for comments.


Distros would also want to use the latest btrfs-progs on older kernels,
since it will have the latest fsck/send/receive fixes, a better UI
and updated man pages.

btrfs-progs claims backward kernel compatibility, so it shouldn't
fail on the command below when btrfs-progs is upgraded:
   mkfs.btrfs /dev/sda && mount /dev/sda /btrfs
but it does in some test cases.


A warning is unnecessary IMO. Imagine a user who upgrades progs for
better docs/UI, has no intention of upgrading the kernel, and gets a
warning! If the user does not upgrade the kernel, it's a fair assumption
that they don't need and aren't looking for the latest kernel features.
(What did I miss?)


Next.
For users looking to have a disk layout which is compatible with
older kernels (which may not be the running kernel), with the
current patch set it's quite possible to do something like the below:

mkfs.btrfs -O as-per-kernel=3.2
mkfs.btrfs -O as-per-kernel=4.0
mkfs.btrfs -O as-per-kernel=x.x (anything)

And only those features that are supported up to version x.x
(mainline) will be enabled by default, unless the user wants to override
the defaults entirely by using -O.
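
To make the proposed version-to-feature mapping concrete, here is a
minimal table-driven sketch. It is an illustration only: the flag bits
are made up (they are not the real BTRFS_FEATURE_INCOMPAT_* values),
features_for_kernel() is a hypothetical name, and the minimum versions
are the ones quoted in patch 1/5 of this series, with mixed-bg taken as
2.6.37 per Mike Fleetwood's correction.

```c
#include <stddef.h>
#include <stdint.h>

#define VERSION_CODE(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Illustrative bits only; NOT the real on-disk incompat flags. */
enum {
    FEAT_MIXED_BG        = 1u << 0,
    FEAT_EXTREF          = 1u << 1,
    FEAT_RAID56          = 1u << 2,
    FEAT_SKINNY_METADATA = 1u << 3,
    FEAT_NO_HOLES        = 1u << 4,
};

struct feature_min_version {
    const char *name;
    uint64_t flag;
    int min_code;   /* first mainline kernel carrying the feature */
};

static const struct feature_min_version feature_table[] = {
    { "mixed-bg",        FEAT_MIXED_BG,        VERSION_CODE(2, 6, 37) },
    { "extref",          FEAT_EXTREF,          VERSION_CODE(3, 7, 0)  },
    { "raid56",          FEAT_RAID56,          VERSION_CODE(3, 9, 0)  },
    { "skinny-metadata", FEAT_SKINNY_METADATA, VERSION_CODE(3, 10, 0) },
    { "no-holes",        FEAT_NO_HOLES,        VERSION_CODE(3, 14, 0) },
};

/* Collect every feature whose minimum version is <= the target kernel. */
static uint64_t features_for_kernel(int major, int minor, int patch)
{
    int target = VERSION_CODE(major, minor, patch);
    uint64_t mask = 0;
    size_t i;

    for (i = 0; i < sizeof(feature_table) / sizeof(feature_table[0]); i++)
        if (target >= feature_table[i].min_code)
            mask |= feature_table[i].flag;

    return mask;
}
```

With this table, a target of 3.2 yields only mixed-bg, while a target of
4.0 yields all five features.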

Thanks, Anand



Anand Jain wrote:

Btrfs-progs is a tool for the btrfs kernel, and we hope the latest
btrfs-progs will be compatible with any set of older/newer kernels.

So far mkfs.btrfs and btrfs-convert set the default features, e.g.
skinny-metadata, even if the running kernel does not support them, and
so the mount fails on the running kernel.

This set of patches makes sure the progs understand the
kernel-supported features.

So this patch checks whether sysfs tells us the feature is
supported; if not, it relies on the static kernel version which
provided that feature (skinny-metadata in this example). Next,
if for some reason the running kernel does not provide the kernel
version, it falls back to the original method of enabling
the feature, in the hope that the kernel will support it.

Also, the last patch adds a warning when we fail to read either
the sysfs features or the running kernel version.

With this I hope all the concerns from the review comments are
addressed.
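
The three-step fallback described above can be sketched roughly as
follows. This is one reading of the cover letter, not the patch itself:
feature_supported() is a hypothetical name, the sysfs directory and file
name are passed in by the caller, and the sketch assumes that "sysfs
cannot tell" means the features directory is absent.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns 1 if the feature should be enabled.
 *   sysfs_dir    - e.g. the kernel's btrfs features directory
 *   feature_file - file name expected inside sysfs_dir
 *   running_code - packed running kernel version, <= 0 if unknown
 *   min_code     - packed minimum kernel version for the feature
 */
static int feature_supported(const char *sysfs_dir, const char *feature_file,
                             int running_code, int min_code)
{
    char path[4096];

    /* Step 1: if the sysfs directory exists, trust what it lists. */
    if (access(sysfs_dir, F_OK) == 0) {
        int n = snprintf(path, sizeof(path), "%s/%s",
                         sysfs_dir, feature_file);

        if (n < 0 || (size_t)n >= sizeof(path))
            return 0;
        return access(path, F_OK) == 0;
    }

    /* Step 2: no sysfs, so compare against the static version table. */
    if (running_code > 0)
        return running_code >= min_code;

    /* Step 3: nothing known, keep the progs default (enabled). */
    return 1;
}
```

Step 3 is where the last patch in the series would print its warning,
since neither source of information was available.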


Anand Jain (5):
   btrfs-progs: introduce framework to check kernel supported features
   btrfs-progs: add framework to check features supported by sysfs
   btrfs-progs: kernel based default features for mkfs
   btrfs-progs: kernel based default features for btrfs-convert
   btrfs-progs: add warning when we fail to read sysfs or kernel version

  btrfs-convert.c |  18 ++-
  mkfs.c  |  22 -
  utils.c | 146 +++-
  utils.h |   2 +
  4 files changed, 173 insertions(+), 15 deletions(-)




Re: btrfs send reproducibly fails for a specific subvolume after sending 15 GiB, scrub reports no errors

2015-11-24 Thread Austin S Hemmelgarn

On 2015-11-24 00:42, Duncan wrote:

Nils Steinger posted on Mon, 23 Nov 2015 22:10:12 +0100 as excerpted:


Do we know anything about what might cause a filesystem to enter a state
which `send` chokes on?
I've only seen a small sample of the corrupted files before growing
tired of the process and just recreating the whole thing, but all of
them were database files (presumably SQLite). Could it be that the files
were being written to during an unclean shutdown, leading to some kind
of corruption of the FS? Unfortunately, I was a little trigger-happy when
cleaning up old snapshots, so there aren't any left to aid in
troubleshooting this problem further…
That's OK, I've not been able to figure out much anyway, despite the 
case of this I had about a month ago with about 200 different files 
hitting the issue (I had written a script at that time to automate 
fixing it, but haven't been able to find it for some reason), and the 
other cases I've had on my systems over the past year (I only started 
using send about a year ago for backups).  It might be worth noting that 
you're the first person who's directly reported this (I would have, but 
I hate to report stuff that isn't a critical data safety issue without a 
reliable reproducer).


Austin's the one attempting to trace down the problem, so he'd have the
most direct answer there.  (My use-case doesn't involve snapshotting or
send/receive at all.)
I stopped using send/receive for backups about a month ago, after hitting 
this for what I think was the seventh time in the past year.  I still 
use snapshots for backups, but now I use them to generate SquashFS 
images (I really don't care about the block layout, inode numbers, or 
most of the BTRFS-related properties).  That preserves my desire to have 
bootable backups, and also saves significant storage space both locally 
and on the cloud storage services I use for off-site backups (which in 
turn saves money on those too).  I am still trying to pull together 
something to reliably reproduce this though, as I still use send/receive 
for some things (like cloning VMs without taking them offline or 
hitting the issues with block copies of a BTRFS filesystem).


But if any type of files would be likely to create issues, it'd be
something like database or VM image files, since the random-file-rewrite-
pattern they typically have is in general the most problematic for copy-
on-write (COW) filesystems such as btrfs.  Without some sort of
additional fragmentation management (like the autodefrag mount option),
these files will end up _highly_ fragmented on btrfs, often thousands of
fragments, tens of thousands when the files in question are multi-gig.

In general, I've seen this mostly with three types of files:
1. Database files and VM images (in my experience, this has been the 
majority of the issue on filesystems that have them.  Autodefrag doesn't 
seem to help, at least not for SQLite or BerkDB/GDBM databases).
2. Shared libraries and executables (these are the majority of the issue 
on filesystems without databases or VM images, although I can't for the 
life of me figure out why, as they are usually written to very infrequently).
3. Plain text configuration files.

For example, the last time I had this happen, it was on the root 
filesystem of one of my systems, and about a third of the problem files 
were either in /etc or text files under /usr/share, while the remaining 
two thirds were mostly stuff under /usr/lib and /lib.  It's probably worth 
noting also that I've never seen certain files trigger this that I would 
expect to based on the above info, in particular:
1. ClamAV virus databases (IIRC, these are similar in structure to 
SQLite DB's).
2. BOINC applications.
3. Almost anything in /usr/libexec (stuff like GCC and binutils).
4. Almost any kind of script.
It's probably also worth noting that I occasionally see inconsistencies 
in database files that cause this to happen, but have never seen any 
corruption in any other types of file, so it doesn't seem to have an 
impact on data safety.




smime.p7s
Description: S/MIME Cryptographic Signature


Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2015-11-24 Thread Laurent Bonnaud
On 23/11/2015 02:00, Qu Wenruo wrote:

> Considering the size, I'd like not to touch the dump, metadata is over 5G, 

It is only 2GB once compressed :>.

> and I think it's not related to on-disk data, but runtime problem like I 
> mentioned above.

To test this hypothesis I did the following:

 - reboot the machine with a 4.3.0 kernel from Debian experimental
 - run "du" on the btrfs FS as a quick sanity check

The filesystem went read-only again with the following kernel errors:

[ 5759.890934] BTRFS info (device sdb1): disk space caching is enabled
[ 5773.278244] BTRFS warning (device sdb1): block group 314635714560 has wrong 
amount of free space
[ 5773.278247] BTRFS warning (device sdb1): failed to load free space cache for 
block group 314635714560, rebuild it now
[ 5773.947885] [ cut here ]
[ 5773.947908] WARNING: CPU: 0 PID: 2546 at 
/build/linux-7sjCdl/linux-4.3/fs/btrfs/extent-tree.c:2851 
btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs]()
[ 5773.947909] BTRFS: Transaction aborted (error -17)
[ 5773.947910] Modules linked in: xt_multiport cpufreq_conservative 
cpufreq_powersave cpufreq_userspace cpufreq_stats ip6table_filter ip6_tables 
iptable_filter ip_tables x_tables binfmt_misc snd_hda_codec_analog 
snd_hda_codec_generic dell_wmi iTCO_wdt iTCO_vendor_support sparse_keymap evdev 
coretemp kvm_intel dcdbas snd_hda_intel snd_hda_codec snd_hda_core kvm 
snd_hwdep i915 snd_pcm_oss snd_mixer_oss pcspkr sg snd_pcm psmouse lpc_ich 
mfd_core serio_raw i2c_i801 snd_timer snd shpchp tpm_tis video drm_kms_helper 
drm soundcore mei_me mei i2c_algo_bit wmi tpm 8250_fintek button acpi_cpufreq 
processor ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp 
libiscsi_tcp libiscsi scsi_transport_iscsi drbd lru_cache libcrc32c parport_pc 
ppdev lp parport loop dm_crypt dm_mod autofs4 ext4 crc16 mbcache
[ 5773.947951]  jbd2 crc32c_generic btrfs xor raid6_pq md_mod ses enclosure 
hid_generic usbhid hid sd_mod uas usb_storage ahci libahci ata_generic libata 
scsi_mod e1000e ptp pps_core ehci_pci uhci_hcd ehci_hcd usbcore usb_common
[ 5773.947967] CPU: 0 PID: 2546 Comm: kworker/u16:2 Not tainted 
4.3.0-trunk-amd64 #1 Debian 4.3-1~exp1
[ 5773.947968] Hardware name: Dell Inc. OptiPlex 780 /0C27VV, 
BIOS A08 01/21/2011
[ 5773.947981] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[ 5773.947983]  a02a8250 812c53a9 8800af283d30 
8106ebad
[ 5773.947985]  8800ace5eae0 8800af283d80 8800ac6ade70 
8800ac6add10
[ 5773.947987]  0020 8106ec2c a02a8420 
0020
[ 5773.947989] Call Trace:
[ 5773.947994]  [] ? dump_stack+0x40/0x57
[ 5773.947997]  [] ? warn_slowpath_common+0x7d/0xb0
[ 5773.947999]  [] ? warn_slowpath_fmt+0x4c/0x50
[ 5773.948019]  [] ? btrfs_run_delayed_refs+0x26b/0x2a0 
[btrfs]
[ 5773.948027]  [] ? delayed_ref_async_start+0x32/0x80 [btrfs]
[ 5773.948039]  [] ? btrfs_scrubparity_helper+0xc8/0x260 
[btrfs]
[ 5773.948041]  [] ? process_one_work+0x19f/0x3d0
[ 5773.948043]  [] ? worker_thread+0x4d/0x450
[ 5773.948044]  [] ? process_one_work+0x3d0/0x3d0
[ 5773.948046]  [] ? kthread+0xbd/0xe0
[ 5773.948048]  [] ? kthread_create_on_node+0x170/0x170
[ 5773.948051]  [] ? ret_from_fork+0x3f/0x70
[ 5773.948053]  [] ? kthread_create_on_node+0x170/0x170
[ 5773.948054] ---[ end trace 654b175f2543b4e4 ]---
[ 5773.948057] BTRFS: error (device sdb1) in btrfs_run_delayed_refs:2851: 
errno=-17 Object already exists
[ 5773.948092] BTRFS info (device sdb1): forced readonly
[ 5936.235238] perf interrupt took too long (2502 > 2500), lowering 
kernel.perf_event_max_sample_rate to 5
[ 6427.280125] BTRFS (device sdb1): parent transid verify failed on 
353291255808 wanted 9058 found 9056
[ 6427.288873] BTRFS (device sdb1): parent transid verify failed on 
353291255808 wanted 9058 found 9056
[ 6427.381126] BTRFS (device sdb1): parent transid verify failed on 
353291255808 wanted 9058 found 9056
[ 6427.381747] BTRFS (device sdb1): parent transid verify failed on 
353291255808 wanted 9058 found 9056
[...]

Are you interested in the btrfs-image output now ?

-- 
Laurent.



Re: [PATCH v2 0/5] Make btrfs-progs really compatible with any kernel version

2015-11-24 Thread Anand Jain



David Sterba wrote:

On Mon, Nov 23, 2015 at 08:56:13PM +0800, Anand Jain wrote:

Btrfs-progs is a tool for the btrfs kernel, and we hope the latest
btrfs-progs will be compatible with any set of older/newer kernels.

So far mkfs.btrfs and btrfs-convert set the default features, e.g.
skinny-metadata, even if the running kernel does not support them, and
so the mount fails on the running kernel.


So the default behaviour of mkfs will be to best guess the feature set
of the currently running kernel. I think this is the most common scenario
and justifies the change in default behaviour.

For the other cases I'd like to introduce some human-readable shortcuts
to the --features option. Eg. 'mkfs.btrfs -O compat-3.2' will pick all
options supported by the unpatched mainline kernel of version 3.2.


 This is a nice idea. I am planning it. How about 'as-per-kernel=x.x'
 instead of compat-3.2?

 Also, it looks better to list the feature and version mapping,
 as btrfs-progs already knows it as of this patchset.



This would be present for all versions, regardless of whether there was
a change in the options or not.


  Hmm.. I didn't quite get that.


Similarly for convenience, add 'running' that would pick the options
from running kernel but will be explicit.

A remaining option should override the 'running' behaviour and pick the
latest mkfs options. Naming it 'defaults' sounds a bit ambiguous so the
name is yet to be determined.


This set of patches makes sure the progs understand the
kernel-supported features.

So this patch checks whether sysfs tells us the feature is
supported; if not, it relies on the static kernel version which
provided that feature (skinny-metadata in this example). Next,
if for some reason the running kernel does not provide the kernel
version, it falls back to the original method of enabling
the feature, in the hope that the kernel will support it.

Also, the last patch adds a warning when we fail to read either
the sysfs features or the running kernel version.


Your patchset is a good start, the additional options I've described can
be added on top of that.


 Yes.

Thanks, Anand




We might need to switch the version
representation from string to KERNEL_VERSION but that's an
implementation detail.




