Re: [PATCH 3/8] Fix btrfs/094 to work on non-4k block sized filesystems

2015-12-10 Thread Filipe Manana
On Mon, Nov 30, 2015 at 10:17 AM, Chandan Rajendra wrote:
> This commit makes use of the new _filter_xfs_io_blocks_modified filtering
> function to print information in terms of file blocks rather than file
> offset.
>
> Signed-off-by: Chandan Rajendra 
Reviewed-by: Filipe Manana 

Thanks!
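
For readers following the conversion, the byte offsets hard-coded in the old test map onto block counts roughly as sketched below. This is only an illustration: `blocks_to_bytes` is a hypothetical helper, not an xfstests function, and the 4096-byte block size is an assumption chosen to match the old 64K/128K/176K values.

```shell
#!/bin/bash
# Hypothetical helper (not an xfstests function): convert a block count
# into a byte count for a given block size.
blocks_to_bytes()
{
	local nblocks=$1
	local block_size=$2
	echo $((nblocks * block_size))
}

# Assumed 4K block size, matching the values the old test hard-coded.
bs=4096
first_write_off=$(blocks_to_bytes 16 $bs)   # old test: 64K
first_write_len=$(blocks_to_bytes 32 $bs)   # old test: 128K
clone_off=$(blocks_to_bytes 44 $bs)         # old test: 176 * 1024
echo "$first_write_off $first_write_len $clone_off"
```

With a different block size only `bs` changes; the block counts (16, 32, 44) stay the same, which is what lets the golden output be expressed in blocks.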

> ---
>  tests/btrfs/094 | 75 ++---
>  tests/btrfs/094.out | 71 --
>  2 files changed, 100 insertions(+), 46 deletions(-)
>
> diff --git a/tests/btrfs/094 b/tests/btrfs/094
> index 6f6cdeb..45f108d 100755
> --- a/tests/btrfs/094
> +++ b/tests/btrfs/094
> @@ -67,36 +67,41 @@ mkdir $send_files_dir
>  _scratch_mkfs >>$seqres.full 2>&1
>  _scratch_mount "-o compress"
>
> -# Create the file with a single extent of 128K. This creates a metadata file
> -# extent item with a data start offset of 0 and a logical length of 128K.
> -$XFS_IO_PROG -f -c "pwrite -S 0xaa 64K 128K" -c "fsync" \
> -   $SCRATCH_MNT/foo | _filter_xfs_io
> -
> -# Now rewrite the range 64K to 112K of our file. This will make the inode's
> -# metadata continue to point to the 128K extent we created before, but now
> -# with an extent item that points to the extent with a data start offset of
> -# 112K and a logical length of 16K.
> -# That metadata file extent item is associated with the logical file offset
> -# at 176K and covers the logical file range 176K to 192K.
> -$XFS_IO_PROG -c "pwrite -S 0xbb 64K 112K" -c "fsync" \
> -   $SCRATCH_MNT/foo | _filter_xfs_io
> -
> -# Now rewrite the range 180K to 12K. This will make the inode's metadata
> -# continue to point the the 128K extent we created earlier, with a single
> -# extent item that points to it with a start offset of 112K and a logical
> -# length of 4K.
> -# That metadata file extent item is associated with the logical file offset
> -# at 176K and covers the logical file range 176K to 180K.
> -$XFS_IO_PROG -c "pwrite -S 0xcc 180K 12K" -c "fsync" \
> -   $SCRATCH_MNT/foo | _filter_xfs_io
> +BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
> +
> +# Create the file with a single extent of 32 blocks. This creates a metadata
> +# file extent item with a data start offset of 0 and a logical length of
> +# 32 blocks.
> +$XFS_IO_PROG -f -c "pwrite -S 0xaa $((16 * $BLOCK_SIZE)) $((32 * $BLOCK_SIZE))" \
> +-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
> +
> +# Now rewrite the block range [16, 28[ of our file. This will make
> +# the inode's metadata continue to point to the single 32 block extent
> +# we created before, but now with an extent item that points to the
> +# extent with a data start offset referring to the 28th block and a
> +# logical length of 4 blocks.
> +# That metadata file extent item is associated with the block range
> +# [44, 48[.
> +$XFS_IO_PROG -c "pwrite -S 0xbb $((16 * $BLOCK_SIZE)) $((28 * $BLOCK_SIZE))" \
> +-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
> +
> +
> +# Now rewrite the block range [45, 48[. This will make the inode's
> +# metadata continue to point the 32 block extent we created earlier,
> +# with a single extent item that points to it with a start offset
> +# referring to the 28th block and a logical length of 1 block.
> +# That metadata file extent item is associated with the block range
> +# [44, 45[.
> +$XFS_IO_PROG -c "pwrite -S 0xcc $((45 * $BLOCK_SIZE)) $((3 * $BLOCK_SIZE))" \
> +-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
>
>  _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
>
> -# Now clone that same region of the 128K extent into a new file, so that it
> +# Now clone that same region of the 32 block extent into a new file, so that it
>  # gets referenced twice and the incremental send operation below decides to
>  # issue a clone operation instead of copying the data.
>  touch $SCRATCH_MNT/bar
> -$CLONER_PROG -s $((176 * 1024)) -d $((176 * 1024)) -l $((4 * 1024)) \
> +$CLONER_PROG -s $((44 * $BLOCK_SIZE)) -d $((44 * $BLOCK_SIZE)) -l $BLOCK_SIZE \
> $SCRATCH_MNT/foo $SCRATCH_MNT/bar
>
>  _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
> @@ -105,10 +110,13 @@ _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap
>  _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
> -f $send_files_dir/2.snap
>
> -echo "File digests in the original filesystem:"
> -md5sum $SCRATCH_MNT/mysnap1/foo | _filter_scratch
> -md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
> -md5sum $SCRATCH_MNT/mysnap2/bar | _filter_scratch
> +echo "File contents in the original filesystem:"
> +echo "mysnap1/foo"
> +od -t x1 $SCRATCH_MNT/mysnap1/foo | _filter_od
> +echo "mysnap2/foo"
> +od -t x1 $SCRATCH_MNT/mysnap2/foo | _filter_od
> +echo "mysnap2/bar"
> +od -t x1 $SCRATCH_MNT/mysnap2/bar | _filter_od
>
>  # 

Re: !PageLocked BUG_ON hit in clear_page_dirty_for_io

2015-12-10 Thread Dave Jones
On Thu, Dec 10, 2015 at 02:02:20PM -0500, Chris Mason wrote:
 > On Tue, Dec 08, 2015 at 11:25:28PM -0500, Dave Jones wrote:
 > > Not sure if I've already reported this one, but I've been seeing this
 > > a lot this last couple days.
 > > 
 > > kernel BUG at mm/page-writeback.c:2654!
 > > invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
 > > CPU: 1 PID: 2566 Comm: trinity-c1 Tainted: GW   4.4.0-rc4-think+ #14
 > > task: 880462811b80 ti: 8800cd808000 task.ti: 8800cd808000
 > > RIP: 0010:[]  [] clear_page_dirty_for_io+0x180/0x1d0
 > 
 > Huh, are you able to reproduce at will?  From this code path it should
 > mean somebody else is unlocking a page they don't own.

pretty easily yeah. I hit it maybe a couple dozen times yesterday.
So if you've got some idea of printk's to spray anywhere I can give
that a shot.

Dave


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] reflink: more tests

2015-12-10 Thread Darrick J. Wong
On Thu, Dec 10, 2015 at 08:34:49AM -0800, Christoph Hellwig wrote:
> The new 849 fails reliably on btrfs, which makes me wonder if either
> the test is doing something wrong, or the btrfs whole file clone
> behavior is broken, which wouldn't be very reassuring.  I didn't have
> time to look into why it's failing yet.

Huh.  Works reliably for /me; could you send me the output from 849?

--D



Re: attacking btrfs filesystems via UUID collisions?

2015-12-10 Thread Chris Murphy
On Wed, Dec 9, 2015 at 2:48 PM, S.J.  wrote:
>> 1. better practices, we really need to tell users, and documentation
>> writers, that using dd (or variant) to copy Btrfs volumes has a
>> consequence and should not be used to make copies.
>
>
>> 2. Btrfs needs a better way to make a copy of a volume when there are
>> snapshots (including even rw snapshots); e.g. permit send/receive to
>> work on rw snapshots if the fs is ro mounted; e.g. a way to do
>> "recursive" send/receive.
>
>
>> 3. Some way to fail gracefully, when there's ambiguity that cannot be
>> resolved. Once there are duplicate devs (dd or lvm snapshots, etc)
>> then there's simply no way to resolve the ambiguity automatically, and
>> the volume should just refuse to rw mount until the user resolves the
>> ambiguity. I think it's OK to fallback to ro mount (maybe) by default
>> in such a case rather than totally fail to mount.
>
>
> About 3:
> RO fallback for the second device/partitions is not good.
> It won't stop confusing the two partitions, and even if both are RO,
> thinking it's ok to read and then reading the wrong data is bad.

That isn't what I'm suggesting. In the multiple device volume case
where there are two exact (same UUID, same devid, same generation)
instances of one of the block devices, Btrfs could randomly choose
either one if it's an RO mount.

It may very well be safer to just refuse to mount it with an error
indicating the ambiguity, and suggesting the user explicitly specify
the devices to use to assemble the volume, and if the generations
differ on those chosen devices, at least warn about that also.
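
The ambiguity check described above could be sketched roughly as follows. This is an illustration only, not how Btrfs device scanning works: `find_ambiguous_devices` is a hypothetical function, and the "device fsid devid" input format is an assumption (such a list would have to be gathered from the devices' superblocks by some other tool).

```shell
#!/bin/bash
# Hypothetical check: read "device fsid devid" lines on stdin and print each
# fsid/devid pair that is claimed by more than one block device.
find_ambiguous_devices()
{
	awk '{
		key = $2 " " $3
		count[key]++
		devs[key] = devs[key] " " $1
	}
	END {
		for (key in count)
			if (count[key] > 1)
				print key ":" devs[key]
	}' | sort
}

# Made-up sample data: /dev/sdc is a dd copy of /dev/sdb.
printf '%s\n' \
	"/dev/sda 1111-aaaa 1" \
	"/dev/sdb 1111-aaaa 2" \
	"/dev/sdc 1111-aaaa 2" | find_ambiguous_devices
```

Any output at all would be the cue to refuse the mount and ask the user to name the devices explicitly; comparing generations across the chosen devices would be a separate step that this sketch leaves out.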


>
> About 1 and 2 ... if 3 gets fulfilled, why?
> DD itself is not a problem "if" the UUID is changed after it
> (which is a command as simple as dd), and if someone doesn't
> know that, he/she will notice when mount refuses to work
> because of the duplicate UUID.

dd is not a copy operation. It creates a second original. You don't
end up with an original and a copy (or clone); a copy or clone has
some distinguishing difference. The volume UUID is used throughout Btrfs
metadata, not just in the superblocks, so changing it requires a
rewrite of all metadata. That makes dd inefficient for two reasons:
first, it copies unused sectors; second, it copies metadata that
will have to be completely rewritten by btrfstune to change the volume
UUID. On top of that, the subvolume UUIDs aren't changed, so it's an
incomplete solution that has problems (see other threads).

If your workflow requires making an exact copy (for the shelf or for
an emergency) then dd might be OK. But most often it's used because
it's been easy, not because it's a good practice. Note that Btrfs is
not unique: XFS v5 does a very similar thing with the volume UUID,
which resulted in this change:
http://oss.sgi.com/pipermail/xfs/2015-April/041267.html

Using dd also means the volume is offline. For even medium sized
multiple device volumes, it's a huge penalty. dd does not scale. Using
dd means source and destination physical configurations are identical
(at least the number of devices and the data and metadata profiles)
which I may not want or need for a clone. Maybe I want a 1x6TB clone
for the 5x1TB raid5 volume.

Even for an online full volume copy/clone of a 5x1TB raid5, moving all
subvolume+snapshots to a new 3x4TB raid5 (or whatever), that could be
hundreds of subvolumes to btrfs send/receive. OK yeah script it. But
that's tedious even assuming I have a script friendly subvolume naming
convention to get the send/receive order correct, which I don't.

Anyway, I think it's a nice-to-have now that will eventually become a
need. And dd is simply disqualified outside of very specific edge
cases.


-- 
Chris Murphy


Re: !PageLocked BUG_ON hit in clear_page_dirty_for_io

2015-12-10 Thread Chris Mason
On Tue, Dec 08, 2015 at 11:25:28PM -0500, Dave Jones wrote:
> Not sure if I've already reported this one, but I've been seeing this
> a lot this last couple days.
> 
> kernel BUG at mm/page-writeback.c:2654!
> invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
> CPU: 1 PID: 2566 Comm: trinity-c1 Tainted: GW   4.4.0-rc4-think+ #14
> task: 880462811b80 ti: 8800cd808000 task.ti: 8800cd808000
> RIP: 0010:[]  [] clear_page_dirty_for_io+0x180/0x1d0

Huh, are you able to reproduce at will?  From this code path it should
mean somebody else is unlocking a page they don't own.

-chris


Re: [PATCH 1/8] Filter xfs_io's output in units of page size

2015-12-10 Thread Filipe Manana
On Mon, Nov 30, 2015 at 10:17 AM, Chandan Rajendra wrote:
> The helpers introduced in this commit will be used to make btrfs tests that
> assume 4k as the page size to work on non-4k page-sized systems as well.
>
> Signed-off-by: Chandan Rajendra 
Reviewed-by: Filipe Manana 

Thanks!

> ---
>  common/filter | 8 
>  common/rc | 6 ++
>  2 files changed, 14 insertions(+)
>
> diff --git a/common/filter b/common/filter
> index 05f2fab..1be377c 100644
> --- a/common/filter
> +++ b/common/filter
> @@ -261,6 +261,14 @@ _filter_xfs_io_blocks_modified()
> _filter_xfs_io_units_modified "Block" $BLOCK_SIZE
>  }
>
> +_filter_xfs_io_pages_modified()
> +{
> +   PAGE_SIZE=$(get_page_size)
> +
> +   _filter_xfs_io_units_modified "Page" $PAGE_SIZE
> +}
> +
> +
>  _filter_test_dir()
>  {
> sed -e "s,$TEST_DEV,TEST_DEV,g" -e "s,$TEST_DIR,TEST_DIR,g"
> diff --git a/common/rc b/common/rc
> index 4c2f42c..82c1bbb 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -3151,6 +3151,12 @@ get_block_size()
> echo `stat -f -c %S $1`
>  }
>
> +get_page_size()
> +{
> +   echo $(getconf PAGE_SIZE)
> +}
> +
> +
>  init_rc
>
>  
> 
> --
> 2.1.0
>
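
The idea behind reporting xfs_io results in page units can be sketched as below. This is a hedged illustration, not the implementation of `_filter_xfs_io_units_modified`: `range_in_units` is a hypothetical helper and its output format is invented for the example. Only `getconf PAGE_SIZE` is taken from the patch.

```shell
#!/bin/bash
# Hypothetical illustration, not the xfstests filter: express a byte range
# as a range of units (pages or blocks) of the given size.
range_in_units()
{
	local unit_name=$1
	local unit_size=$2
	local offset=$3
	local length=$4
	local first=$((offset / unit_size))
	local last=$(((offset + length - 1) / unit_size))
	echo "$unit_name range: $first - $last"
}

page_size=$(getconf PAGE_SIZE)
# A write of two pages starting at the third page reads the same on any
# page size once it is expressed in pages.
range_in_units "Page" "$page_size" $((2 * page_size)) $((2 * page_size))
```

Because the test computes its offsets from the page size and the output is reported in pages, the same golden output should hold on 4k and 64k page systems alike.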



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


Re: [PATCH V2 4/5] Fix btrfs/056 to work on non-4k block sized filesystems

2015-12-10 Thread Filipe Manana
On Mon, Nov 30, 2015 at 10:16 AM, Chandan Rajendra wrote:
> This commit makes use of the new _filter_xfs_io_blocks_modified and _filter_od
> filtering functions to print information in terms of file blocks rather than
> file offset.
>
> Signed-off-by: Chandan Rajendra 
Reviewed-by: Filipe Manana 

Thanks!
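
As a quick sanity check of the clone range used in this test, the sketch below computes which blocks a clone touches. `clone_block_span` is a hypothetical helper, not part of xfstests, and the 4096-byte block size is an assumption matching the old hard-coded 12288/24576 values.

```shell
#!/bin/bash
# Hypothetical helper: print the first and last block touched by a clone of
# `len` bytes starting at byte offset `off`.
clone_block_span()
{
	local off=$1 len=$2 block_size=$3
	echo "$((off / block_size)) $(((off + len - 1) / block_size))"
}

bs=4096   # assumed block size; the old test hard-coded 12288 and 24576
clone_block_span $((3 * bs)) $((6 * bs)) $bs
```

Blocks 3 through 8 cover the second half of the 2nd extent (block 3), the hole (blocks 4 and 5), the 3rd extent (blocks 6 and 7) and the first half of the 4th extent (block 8), which is what the comment in the test describes.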

> ---
>  tests/btrfs/056 |  51 ++
>  tests/btrfs/056.out | 152 +---
>  2 files changed, 90 insertions(+), 113 deletions(-)
>
> diff --git a/tests/btrfs/056 b/tests/btrfs/056
> index 66a59b8..6dc3bfd 100755
> --- a/tests/btrfs/056
> +++ b/tests/btrfs/056
> @@ -68,33 +68,42 @@ test_btrfs_clone_fsync_log_recover()
> MOUNT_OPTIONS="$MOUNT_OPTIONS $2"
> _mount_flakey
>
> -   # Create a file with 4 extents and 1 hole, all with a size of 8Kb each.
> -   # The hole is in the range [16384, 24576[.
> -   $XFS_IO_PROG -s -f -c "pwrite -S 0x01 -b 8192 0 8192" \
> -   -c "pwrite -S 0x02 -b 8192 8192 8192" \
> -   -c "pwrite -S 0x04 -b 8192 24576 8192" \
> -   -c "pwrite -S 0x05 -b 8192 32768 8192" \
> -   $SCRATCH_MNT/foo | _filter_xfs_io
> -
> -   # Clone destination file, 1 extent of 96kb.
> -   $XFS_IO_PROG -f -c "pwrite -S 0xff -b 98304 0 98304" -c "fsync" \
> -   $SCRATCH_MNT/bar | _filter_xfs_io
> -
> -   # Clone second half of the 2nd extent, the 8kb hole, the 3rd extent
> +   BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
> +
> +   EXTENT_SIZE=$((2 * $BLOCK_SIZE))
> +
> +   # Create a file with 4 extents and 1 hole, all with a size of
> +   # 2 blocks each.
> +   # The hole is in the block range [4, 5].
> +   $XFS_IO_PROG -s -f -c "pwrite -S 0x01 -b $EXTENT_SIZE 0 $EXTENT_SIZE" \
> +   -c "pwrite -S 0x02 -b $EXTENT_SIZE $((2 * $BLOCK_SIZE)) $EXTENT_SIZE" \
> +   -c "pwrite -S 0x04 -b $EXTENT_SIZE $((6 * $BLOCK_SIZE)) $EXTENT_SIZE" \
> +   -c "pwrite -S 0x05 -b $EXTENT_SIZE $((8 * $BLOCK_SIZE)) $EXTENT_SIZE" \
> +   $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
> +
> +   # Clone destination file, 1 extent of 24 blocks.
> +   $XFS_IO_PROG -f -c "pwrite -S 0xff -b $((24 * $BLOCK_SIZE)) 0 $((24 * $BLOCK_SIZE))" \
> +-c "fsync" $SCRATCH_MNT/bar | _filter_xfs_io_blocks_modified
> +
> +   # Clone second half of the 2nd extent, the 2 block hole, the 3rd extent
> # and the first half of the 4th extent into file bar.
> -   $CLONER_PROG -s 12288 -d 0 -l 24576 $SCRATCH_MNT/foo $SCRATCH_MNT/bar
> +   $CLONER_PROG -s $((3 * $BLOCK_SIZE)) -d 0 -l $((6 * $BLOCK_SIZE)) \
> +$SCRATCH_MNT/foo $SCRATCH_MNT/bar
> $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar
>
> # Test small files too consisting of 1 inline extent
> -   $XFS_IO_PROG -f -c "pwrite -S 0x00 -b 3500 0 3500" -c "fsync" \
> -   $SCRATCH_MNT/foo2 | _filter_xfs_io
> +   EXTENT_SIZE=$(($BLOCK_SIZE - 48))
> +   $XFS_IO_PROG -f -c "pwrite -S 0x00 -b $EXTENT_SIZE 0 $EXTENT_SIZE" -c "fsync" \
> +   $SCRATCH_MNT/foo2 | _filter_xfs_io_blocks_modified
>
> -   $XFS_IO_PROG -f -c "pwrite -S 0xcc -b 1000 0 1000" -c "fsync" \
> -   $SCRATCH_MNT/bar2 | _filter_xfs_io
> +   EXTENT_SIZE=$(($BLOCK_SIZE - 1048))
> +   $XFS_IO_PROG -f -c "pwrite -S 0xcc -b $EXTENT_SIZE 0 $EXTENT_SIZE" -c "fsync" \
> +   $SCRATCH_MNT/bar2 | _filter_xfs_io_blocks_modified
>
> # Clone the entire foo2 file into bar2, overwriting all data in bar2
> # and increasing its size.
> -   $CLONER_PROG -s 0 -d 0 -l 3500 $SCRATCH_MNT/foo2 $SCRATCH_MNT/bar2
> +   EXTENT_SIZE=$(($BLOCK_SIZE - 48))
> +   $CLONER_PROG -s 0 -d 0 -l $EXTENT_SIZE $SCRATCH_MNT/foo2 $SCRATCH_MNT/bar2
> $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar2
>
> _flakey_drop_and_remount yes
> @@ -102,10 +111,10 @@ test_btrfs_clone_fsync_log_recover()
> # Verify the cloned range was persisted by fsync and the log recovery
> # code did its work well.
> echo "Verifying file bar content"
> -   od -t x1 $SCRATCH_MNT/bar
> +   od -t x1 $SCRATCH_MNT/bar | _filter_od
>
> echo "Verifying file bar2 content"
> -   od -t x1 $SCRATCH_MNT/bar2
> +   od -t x1 $SCRATCH_MNT/bar2 | _filter_od
>
> _unmount_flakey
>
> diff --git a/tests/btrfs/056.out b/tests/btrfs/056.out
> index 1b77ae3..c4c6b2c 100644
> --- a/tests/btrfs/056.out
> +++ b/tests/btrfs/056.out
> @@ -1,129 +1,97 @@
>  QA output created by 056
>  Testing without the NO_HOLES feature
> -wrote 8192/8192 bytes at offset 0
> -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -wrote 8192/8192 bytes at offset 8192
> -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX 

Re: subvols, ro- and bind mounts - how?

2015-12-10 Thread Christoph Anton Mitterer
Hey.

I'd have an additional question about subvols O:-)

Given the following setup:
5
|
+--root (subvol, /)
   +-- mnt (dir)

with the following done:
- init 1
- remount,ro / (i.e. the subvol root)
- mount /dev/btrfs-device /mnt (i.e. mount the top subvol at /mnt)

The following happened:
- / was ro-mounted (obviously, at least one thing that I had expected
  correctly)
- /mnt was ro-mounted too (and the /mnt/root/ nested subvol then as
  well).
  => why is /mnt (i.e. the top level subvol) mounted ro??
  => I would have expected that, since / (i.e. the subvol "root" is ro
     mounted), it's also ro mounted as the nested subvol below 5, i.e.
     my naive thinking was in terms of logic:
     "/ mounted ro" => "subvol root is mounted ro (everywhere)"
       => "thus /mnt/root/ is mounted ro as well"

However, the latter doesn't seem to be true, because then I did:
- remount,rw /mnt
=> now /mnt/*, including /mnt/root/*, was rw mounted



So I guess my assumption of subvols behaving more or less as if they'd
be a fs (and thus mounted at one place ro => everywhere ro) is not
true, is it?

Do ro, rw (and possibly other options) instead only affect the respective
mountpoint?
And automatically any nested subvols of that mountpoint?

So I could have basically:
/mount-point1/subvol-a  ; ro, noexec
/mount-point2/subvol-a  ; rw, compress=yes
/root   ; rw, compress=no
/root/here/it/is/nested/subvol-a ; (no mountpoint)

(with subvol-a being the same subvol)

And when I write via mount-point1 I'd get an error, but via mount-
point2 it works and in addition I get compression, while when writing
via the /root mountpoint, where it is nested, I'd get the rw and
compress=no from the "parent" mountpoint /root


Does that sound correct?
It seems to make sense actually, though it's a bit unfamiliar... if I'm
not completely wrong, then e.g. in terms of ext* I cannot have the same
fs mounted with different settings... of course I cannot have it
mounted twice at all, but speaking of bind mounts.

So I guess, that when I'd do --bind mounts instead, I actually do get
the "old" behaviour, i.e. when the source is ro, then the --bind
mount's target is also forcibly ro.


Still, one unclear thing: why did /mnt get mounted ro at the very top?



Thanks,
Chris.

btw: Not sure if I just missed it, but I guess the above should be more
or less documented, showing people that mounting subvols (especially
when mounting the same several times, either directly or as nested
subvol) has these implications.



Re: !PageLocked BUG_ON hit in clear_page_dirty_for_io

2015-12-10 Thread Liu Bo
On Thu, Dec 10, 2015 at 04:30:24PM -0500, Chris Mason wrote:
> On Thu, Dec 10, 2015 at 02:35:55PM -0500, Dave Jones wrote:
> > On Thu, Dec 10, 2015 at 02:02:20PM -0500, Chris Mason wrote:
> >  > On Tue, Dec 08, 2015 at 11:25:28PM -0500, Dave Jones wrote:
> >  > > Not sure if I've already reported this one, but I've been seeing this
> >  > > a lot this last couple days.
> >  > > 
> >  > > kernel BUG at mm/page-writeback.c:2654!
> >  > > invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
> >  > > CPU: 1 PID: 2566 Comm: trinity-c1 Tainted: GW   4.4.0-rc4-think+ #14
> >  > > task: 880462811b80 ti: 8800cd808000 task.ti: 8800cd808000
> >  > > RIP: 0010:[]  [] clear_page_dirty_for_io+0x180/0x1d0
> >  > 
> >  > Huh, are you able to reproduce at will?  From this code path it should
> >  > mean somebody else is unlocking a page they don't own.
> > 
> > pretty easily yeah. I hit it maybe a couple dozen times yesterday.
> > So if you've got some idea of printk's to spray anywhere I can give
> > that a shot.
> 
> I'd rather try to trigger it here.  Going to have to add some way to
> record which stack trace last unlocked and/or freed the page.

Looks like a bisect with 4.3 might target the commit.

Thanks,

-liubo
> 
> -chris


Re: !PageLocked BUG_ON hit in clear_page_dirty_for_io

2015-12-10 Thread Chris Mason
On Thu, Dec 10, 2015 at 02:35:55PM -0500, Dave Jones wrote:
> On Thu, Dec 10, 2015 at 02:02:20PM -0500, Chris Mason wrote:
>  > On Tue, Dec 08, 2015 at 11:25:28PM -0500, Dave Jones wrote:
>  > > Not sure if I've already reported this one, but I've been seeing this
>  > > a lot this last couple days.
>  > > 
>  > > kernel BUG at mm/page-writeback.c:2654!
>  > > invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
>  > > CPU: 1 PID: 2566 Comm: trinity-c1 Tainted: GW   4.4.0-rc4-think+ #14
>  > > task: 880462811b80 ti: 8800cd808000 task.ti: 8800cd808000
>  > > RIP: 0010:[]  [] clear_page_dirty_for_io+0x180/0x1d0
>  > 
>  > Huh, are you able to reproduce at will?  From this code path it should
>  > mean somebody else is unlocking a page they don't own.
> 
> pretty easily yeah. I hit it maybe a couple dozen times yesterday.
> So if you've got some idea of printk's to spray anywhere I can give
> that a shot.

I'd rather try to trigger it here.  Going to have to add some way to
record which stack trace last unlocked and/or freed the page.

-chris


Re: !PageLocked BUG_ON hit in clear_page_dirty_for_io

2015-12-10 Thread Georg Lukas
* Chris Mason  [2015-12-10 20:02]:
> Huh, are you able to reproduce at will?  From this code path it should
> mean somebody else is unlocking a page they don't own.

I've got another code path causing this bug that happened during a
"btrfs dev delete missing". Didn't try to reproduce it though, but
downgraded to 4.3 where it doesn't happen:


[10661.929152] BTRFS info (device dm-1): relocating block group 19173384781824 flags 17
[10709.050290] [ cut here ]
[10709.050316] kernel BUG at mm/page-writeback.c:2654!
[10709.050338] invalid opcode:  [#1] SMP
[10709.050366] Modules linked in: dm_crypt loop sha256_ssse3 sha256_generic hmac drbg ansi_cprng xts gf128mul algif_skcipher af_alg cpuid nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc btrfs xor intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass snd_pcm crct10dif_pclmul snd_timer evdev snd crc32_pclmul iTCO_wdt iTCO_vendor_support soundcore cryptd psmouse pcspkr serio_raw hpilo hpwdt lpc_ich raid6_pq 8250_fintek mfd_core acpi_power_meter button pcc_cpufreq acpi_cpufreq tpm_tis tpm shpchp processor coretemp ipmi_watchdog dm_mod ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler fuse autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sd_mod ses enclosure usb_storage crc32c_intel ahci libahci libata scsi_mod uhci_hcd thermal xhci_pci xhci_hcd
[10709.050854]  tg3 ptp pps_core libphy ehci_pci ehci_hcd usbcore usb_common
[10709.050899] CPU: 1 PID: 14215 Comm: btrfs Tainted: GW   4.4.0-rc4-gl+ #44
[10709.050933] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 06/06/2014
[10709.050956] task: 8800791e5100 ti: 880016a2c000 task.ti: 880016a2c000
[10709.050989] RIP: 0010:[]  [] clear_page_dirty_for_io+0xd3/0x190
[10709.051030] RSP: 0018:880016a2f7c0  EFLAGS: 00010246
[10709.051051] RAX: 0180082c RBX: ea000185efc0 RCX: ea000185efc0
[10709.051073] RDX:  RSI: 880016a2f7c0 RDI: ea000185efc0
[10709.051096] RBP: 880036a4b700 R08: 168a3405fe9c R09: 
[10709.051118] R10:  R11: 03e5 R12: 880036a4b700
[10709.051140] R13: 880016a2f8a0 R14:  R15: ea000185efc0
[10709.051162] FS:  7ff63f1708c0() GS:88007ac2() knlGS:
[10709.051196] CS:  0010 DS:  ES:  CR0: 80050033
[10709.051217] CR2: 7fbab82c8000 CR3: 2fc6 CR4: 001406e0
[10709.051238] Stack:
[10709.051255]  880016a2f830 880016a2f910 880036a4b700 880016a2f8a0
[10709.051298]   a0525bed 880016a2f8d8 36a4b428
[10709.051340]   0002 880036a4b598 002a
[10709.051384] Call Trace:
[10709.051417]  [] ? extent_write_cache_pages.isra.31.constprop.51+0x14d/0x330 [btrfs]
[10709.051460]  [] ? extent_writepages+0x48/0x60 [btrfs]
[10709.051489]  [] ? btrfs_real_readdir+0x4f0/0x4f0 [btrfs]
[10709.051513]  [] ? __filemap_fdatawrite_range+0xa2/0xe0
[10709.051543]  [] ? btrfs_fdatawrite_range+0x16/0x40 [btrfs]
[10709.051572]  [] ? __btrfs_write_out_cache.isra.25+0x3c4/0x410 [btrfs]
[10709.051613]  [] ? btrfs_write_out_cache+0x83/0xd0 [btrfs]
[10709.051641]  [] ? btrfs_write_dirty_block_groups+0x232/0x2a0 [btrfs]
[10709.051679]  [] ? commit_cowonly_roots+0x206/0x2a3 [btrfs]
[10709.051708]  [] ? btrfs_commit_transaction+0x516/0x9f0 [btrfs]
[10709.051748]  [] ? start_transaction+0x90/0x480 [btrfs]
[10709.051776]  [] ? relocate_block_group+0x2b8/0x6a0 [btrfs]
[10709.051806]  [] ? btrfs_wait_ordered_roots+0x1a3/0x1c0 [btrfs]
[10709.051845]  [] ? btrfs_relocate_block_group+0x197/0x270 [btrfs]
[10709.051886]  [] ? btrfs_relocate_chunk.isra.38+0x3c/0xc0 [btrfs]
[10709.051926]  [] ? btrfs_shrink_device+0x196/0x520 [btrfs]
[10709.051955]  [] ? btrfs_rm_device+0x30e/0x7b0 [btrfs]
[10709.051984]  [] ? btrfs_ioctl+0x20bb/0x2e10 [btrfs]
[10709.052007]  [] ? page_add_file_rmap+0xa/0x50
[10709.052029]  [] ? do_set_pte+0xc8/0xf0
[10709.052050]  [] ? filemap_map_pages+0x208/0x210
[10709.052073]  [] ? mem_cgroup_try_charge+0x5f/0x1a0
[10709.052095]  [] ? handle_mm_fault+0x11df/0x16a0
[10709.052118]  [] ? getname_flags+0x6a/0x1e0
[10709.052140]  [] ? do_vfs_ioctl+0x293/0x470
[10709.052162]  [] ? SyS_ioctl+0x6f/0x80
[10709.052183]  [] ? entry_SYSCALL_64_fastpath+0x12/0x6d
[10709.052205] Code: b4 24 00 01 00 00 f0 0f ba 33 04 72 20 31 db 48 85 ed 0f 85 9a 00 00 00 4c 89 ef e8 28 1a 05 00 89 d8 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 4d 85 ed 74 0c 49 8b 85 68 02 00 00 65 48 ff 48 20 be 0b 
[10709.052472] RIP  [] clear_page_dirty_for_io+0xd3/0x190
[10709.052499]  RSP 
[10709.052860] ---[ end trace d5563d95fa19d835 ]---

Hope that helps,


Georg




Re: !PageLocked BUG_ON hit in clear_page_dirty_for_io

2015-12-10 Thread Filipe Manana
On Thu, Dec 10, 2015 at 9:35 PM, Liu Bo  wrote:
> On Thu, Dec 10, 2015 at 04:30:24PM -0500, Chris Mason wrote:
>> On Thu, Dec 10, 2015 at 02:35:55PM -0500, Dave Jones wrote:
>> > On Thu, Dec 10, 2015 at 02:02:20PM -0500, Chris Mason wrote:
>> >  > On Tue, Dec 08, 2015 at 11:25:28PM -0500, Dave Jones wrote:
>> >  > > Not sure if I've already reported this one, but I've been seeing this
>> >  > > a lot this last couple days.
>> >  > >
>> >  > > kernel BUG at mm/page-writeback.c:2654!
>> >  > > invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
>> >  > > CPU: 1 PID: 2566 Comm: trinity-c1 Tainted: GW   4.4.0-rc4-think+ #14
>> >  > > task: 880462811b80 ti: 8800cd808000 task.ti: 8800cd808000
>> >  > > RIP: 0010:[]  [] clear_page_dirty_for_io+0x180/0x1d0
>> >  >
>> >  > Huh, are you able to reproduce at will?  From this code path it should
>> >  > mean somebody else is unlocking a page they don't own.
>> >
>> > pretty easily yeah. I hit it maybe a couple dozen times yesterday.
>> > So if you've got some idea of printk's to spray anywhere I can give
>> > that a shot.
>>
>> I'd rather try to trigger it here.  Going to have to add some way to
>> record which stack trace last unlocked and/or freed the page.
>
> Looks like a bisect with 4.3 might target the commit.

Not necessarily a regression added in 4.3. We've had this issue
reported before on older releases, like 4.1 for example:

https://lkml.org/lkml/2015/5/19/190

>
> Thanks,
>
> -liubo
>>
>> -chris



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


Re: subvols, ro- and bind mounts - how?

2015-12-10 Thread S.J.



Hey.

I'd have an additional question about subvols O:-)

Given the following setup:
5
|
+--root (subvol, /)
+-- mnt (dir)

with the following done:
- init 1
- remount,ro / (i.e. the subvol root)
- mount /dev/btrfs-device /mnt (i.e. mount the top subvol at /mnt)

The following happened:
- / was ro-mounted (obviously, at least one thing that I had expected
   correctly)
- /mnt was ro-mounted either (and the /mnt/root/ nested subvol then as
   well).
   => why is /mnt (i.e. the top level subvol) mounted ro??
   => I would have expected that, since / (i.e. the subvol "root" is ro
  mounted), it's also ro mounted as the nested subvol below 5, i.e.
  my naive thinking was in terms of logic:
  "/ mounted ro" => "subvol root is mounted ro (everywhere)"
=> "thus /mnt/root/ is mounted ro as well"

However, the later doesn't seem to be true, cause then I did:
- remount,rw /mnt
=> now /mnt/*, including /mnt/root/* was rw moutned



So I guess my assumption of subvols behaving more or less as if they'd
be a fs (and thus mounted at one place ro => everywhere ro) is not
true, is it?

Do, ro,rw (and possibly others) instead only affect the respective
mountpoint?
And automatically any nested subvols of that mountpoint?

So I could have basically:
/mount-point1/subvol-a  ; ro, noexec
/mount-point2/subvol-a  ; rw, compress=yes
/root   ; rw, compress=no
/root/here/it/is/nested/subvol-a ; (no mountpoint)

(with subvol-a being the same subvol)

And when I write via mount-point1 I'd get an error, but via mount-
point2 it works and in addition I get compression, while when writing
via the /root mountpoint, where it is nested, I'd get the rw and
compress=no from the "parent" mountpoint /root


Does that sound correct?
It seems to make sense actually, though it's a bit unfamiliar... if I'm
not completely wrong, then e.g. in terms of ext* I cannot have the same
fs mounted with different settings... of course I cannot have it
mounted twice at all, but I'm speaking of bind mounts.

So I guess, that when I'd do --bind mounts instead, I actually do get
the "old" behaviour, i.e. when the source is ro, then the --bind
mount's target is also forcibly ro.


Still, one thing remains unclear: why did /mnt get mounted ro in the
first example above?



Thanks,
Chris.

btw: Not sure if I just missed it, but I guess the above should be more
or less documented, showing people that mounting subvols (especially
when mounting the same one several times, either directly or as a
nested subvol) has these implications.


Quote:

" Most mount options apply to the whole filesystem, and only the options 
for the first subvolume
to be mounted will take effect. This is due to lack of implementation 
and may change in the future. "


from https://btrfs.wiki.kernel.org/index.php/Mount_options in a red box 
on the top.




Re: subvols, ro- and bind mounts - how?

2015-12-10 Thread Christoph Anton Mitterer
On Thu, 2015-12-10 at 23:36 +0100, S.J. wrote:
> Quote:
> 
> " Most mount options apply to the whole filesystem, and only the
> options 
> for the first subvolume
> to be mounted will take effect. This is due to lack of implementation
> and may change in the future. "
> 
> from https://btrfs.wiki.kernel.org/index.php/Mount_options in a red
> box 
> on the top.

I had read that, but it doesn't really make clear that options can
effectively differ for the *same* subvol when it is mounted several
times (or when it additionally appears as a nested subvolume).
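
One way to look at this on a running system is to compare the two option
fields that /proc/self/mountinfo reports for each mount: the per-mount
options (field 6) and the superblock options (the last field, after the
"-" separator). The sketch below is only an illustration; the function
name show_mount_opts and the sample lines are made up for this example,
and it makes no claim about how btrfs decides which options apply.

```shell
#!/bin/bash
# show_mount_opts (hypothetical helper): print, for every mountinfo line,
# the mount point, the root inside the filesystem (for btrfs this is the
# subvolume path), the per-mount options and the superblock options.
# Reads the files given as arguments, or stdin when called without any.
show_mount_opts() {
	awk '{
		# Optional fields start at field 7 and end at a lone "-".
		sep = 0
		for (i = 7; i <= NF; i++) {
			if ($i == "-") { sep = i; break }
		}
		if (sep == 0) next
		# After the separator: fs type, mount source, superblock options.
		printf "%s root=%s mount_opts=%s super_opts=%s\n", $5, $4, $6, $(sep + 3)
	}' "$@"
}

# Made-up sample input, only to show the shape of the output.
show_mount_opts <<'EOF'
25 0 0:21 /root / ro,relatime shared:1 - btrfs /dev/sda2 rw,space_cache,subvolid=257,subvol=/root
60 25 0:21 / /mnt rw,relatime shared:30 - btrfs /dev/sda2 rw,space_cache,subvolid=5,subvol=/
EOF
```

On a real system, "show_mount_opts /proc/self/mountinfo" lists all
current mounts, so two mounts of the same subvol can be compared side by
side.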

Chris.



Re: !PageLocked BUG_ON hit in clear_page_dirty_for_io

2015-12-10 Thread Dave Jones
On Thu, Dec 10, 2015 at 04:30:24PM -0500, Chris Mason wrote:
 > On Thu, Dec 10, 2015 at 02:35:55PM -0500, Dave Jones wrote:
 > > On Thu, Dec 10, 2015 at 02:02:20PM -0500, Chris Mason wrote:
 > >  > On Tue, Dec 08, 2015 at 11:25:28PM -0500, Dave Jones wrote:
 > >  > > Not sure if I've already reported this one, but I've been seeing this
 > >  > > a lot this last couple days.
 > >  > > 
 > >  > > kernel BUG at mm/page-writeback.c:2654!
 > >  > > invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
 > >  > > CPU: 1 PID: 2566 Comm: trinity-c1 Tainted: GW   
 > > 4.4.0-rc4-think+ #14
 > >  > > task: 880462811b80 ti: 8800cd808000 task.ti: 8800cd808000
 > >  > > RIP: 0010:[]  [] 
 > > clear_page_dirty_for_io+0x180/0x1d0
 > >  > 
 > >  > Huh, are you able to reproduce at will?  From this code path it should
 > >  > mean somebody else is unlocking a page they don't own.
 > > 
 > > pretty easily yeah. I hit it maybe a couple dozen times yesterday.
 > > So if you've got some idea of printk's to spray anywhere I can give
 > > that a shot.
 > 
 > I'd rather try to trigger it here.  Going to have to add some way to
 > record which stack trace last unlocked and/or freed the page.

perhaps a clue.. this is the log fragment from the last trinity run..

[child4:1416] [25] splice(fd_in=257, off_in=0x0, fd_out=257, off_out=0x0, 
len=0xe000, flags=0x0) = -1 (Invalid argument)
[child4:1416] [26] pwrite64(fd=257, buf=0x2253530, count=782, pos=0x40404000) 
[child5:1414] [28] semop(semid=-16274, tsops=0x0, nsops=0x20c6b4d5) = 
-1 (Invalid argument)
[child5:1414] [29] mlockall(flags=0x1) 
[child6:1427] [50] getgid() = 1000
[child6:1427] [51] mlock(addr=0x7f4c00d62000, len=0) 
[child7:1402] [79] setuid(uid=0x790617fe) = -1 (Invalid argument)
[child7:1402] [80] write(fd=257, buf=0x2255d90, count=2296) 

the oops I got was in write(), so it looked like a parallel write() with 
pwrite() was the trigger.

Sure enough, doing a run with -c write -c pwrite64 reproduces it even faster.

Dave



Re: !PageLocked BUG_ON hit in clear_page_dirty_for_io

2015-12-10 Thread Dave Jones
On Thu, Dec 10, 2015 at 04:30:24PM -0500, Chris Mason wrote:
 > On Thu, Dec 10, 2015 at 02:35:55PM -0500, Dave Jones wrote:
 > > On Thu, Dec 10, 2015 at 02:02:20PM -0500, Chris Mason wrote:
 > >  > On Tue, Dec 08, 2015 at 11:25:28PM -0500, Dave Jones wrote:
 > >  > > Not sure if I've already reported this one, but I've been seeing this
 > >  > > a lot this last couple days.
 > >  > > 
 > >  > > kernel BUG at mm/page-writeback.c:2654!
 > >  > > invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
 > >  > > CPU: 1 PID: 2566 Comm: trinity-c1 Tainted: GW   
 > > 4.4.0-rc4-think+ #14
 > >  > > task: 880462811b80 ti: 8800cd808000 task.ti: 8800cd808000
 > >  > > RIP: 0010:[]  [] 
 > > clear_page_dirty_for_io+0x180/0x1d0
 > >  > 
 > >  > Huh, are you able to reproduce at will?  From this code path it should
 > >  > mean somebody else is unlocking a page they don't own.
 > > 
 > > pretty easily yeah. I hit it maybe a couple dozen times yesterday.
 > > So if you've got some idea of printk's to spray anywhere I can give
 > > that a shot.
 > 
 > I'd rather try to trigger it here.  Going to have to add some way to
 > record which stack trace last unlocked and/or freed the page.

I'm using..

trinity -q -l off -C8 -a64 -x fsync -x fdatasync -x syncfs -x sync 
--enable-fds=testfile,pseudo

interestingly, if I just use 'testfile' by itself, I can't reproduce it.
(That means "create a few files in current dir and use their fds")
the "pseudo" bit means "also use fds from /proc, /sys and /dev".

strange.

(also, using trinity.git rather than the last version released, though
 I doubt it makes a difference in this case)

Dave



Re: [PATCH 3/3] reflink: more tests

2015-12-10 Thread Christoph Hellwig
On Thu, Dec 10, 2015 at 11:14:29AM -0800, Darrick J. Wong wrote:
> On Thu, Dec 10, 2015 at 08:34:49AM -0800, Christoph Hellwig wrote:
> > The new 849 fails reliably on btrfs, which makes me wonder if either
> > the test is doing something wrong, or the btrfs whole file clone
> > behavior is broken, which wouldn't be very reassuring.  I didn't have
> > time to look into why it's failing yet.
> 
> Huh.  Works reliably for /me; could you send me the output from 849?

--- tests/generic/849.out   2015-12-09 15:31:50.492879152 +
+++ /root/xfstests/results//generic/849.out.bad 2015-12-11
00:02:25.154347175 +
@@ -1,6 +1,7 @@
 QA output created by 849
 Create the original files
 f4820540fc0ac02750739896fe028d56  TEST_DIR/test-849/file1
-dc881c004745c49f7f4e9cc766f57bc8  TEST_DIR/test-849/file2
+eb34153e9ed1e774db28cbbe4090a449  TEST_DIR/test-849/file2
 dc881c004745c49f7f4e9cc766f57bc8  TEST_DIR/test-849/file2.chk
 Compare against check files
+file2 and file2.chk do not match


[GIT PULL] Btrfs fixes for 4.4

2015-12-10 Thread fdmanana
From: Filipe Manana 

Hi Chris,

Please consider the following fixes for kernel 4.4. Two of them are fixes
to new issues introduced in the 4.4 merge window and 4.4 release candidates.
The other one just fixes a warning message that is confusing and has made
several users wonder if they are supposed to do anything or not when we
fail to read a space cache.
All these fixes have been previously sent to the mailing list.

Thanks.

The following changes since commit dba72cb30b6a4811038128c8a98b268d18ca60fe:

  btrfs: fix balance range usage filters in 4.4-rc (2015-11-25 05:27:33 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git for-chris-4.4

for you to fetch changes up to 94356889c404faf050895099fd0d23f8bef118c4:

  btrfs: fix misleading warning when space cache failed to load (2015-12-10 
11:38:08 +)


Filipe Manana (2):
  Btrfs: fix unprotected list move from unused_bgs to deleted_bgs list
  Btrfs: fix transaction handle leak in balance

Holger Hoffstätte (1):
  btrfs: fix misleading warning when space cache failed to load

 fs/btrfs/extent-tree.c  | 10 +++---
 fs/btrfs/free-space-cache.c |  2 +-
 fs/btrfs/transaction.c  |  1 -
 fs/btrfs/transaction.h  |  2 +-
 fs/btrfs/volumes.c  |  3 +--
 5 files changed, 10 insertions(+), 8 deletions(-)

-- 
2.1.3



Re: attacking btrfs filesystems via UUID collisions?

2015-12-10 Thread Austin S Hemmelgarn
On 2015-12-09 16:48, S.J. wrote:
>> 1. better practices, we really need to tell users, and documentation
>> writers, that using dd (or variant) to copy Btrfs volumes has a
>> consequence and should not be used to make copies.
> 
>> 2. Btrfs needs a better way to make a copy of a volume when there are
>> snapshots (including even rw snapshots); e.g. permit send/receive to
>> work on rw snapshots if the fs is ro mounted; e.g. a way to do
>> "recursive" send/receive.
> 
>> 3. Some way to fail gracefully, when there's ambiguity that cannot be
>> resolved. Once there are duplicate devs (dd or lvm snapshots, etc)
>> then there's simply no way to resolve the ambiguity automatically, and
>> the volume should just refuse to rw mount until the user resolves the
>> ambiguity. I think it's OK to fallback to ro mount (maybe) by default
>> in such a case rather than totally fail to mount.
> 
> About 3:
> RO fallback for the second device/partitions is not good.
> It won't stop confusing the two partitions, and even if both are RO,
> thinking it's ok to read and then reading the wrong data is bad.
> 
> About 1 and 2 ... if 3 gets fulfilled, why?
> DD itself is not a problem "if" the UUID is changed after it
> (which is a command as simple as dd), and if someone doesn't
> know that, he/she will notice when mount refuses to work
> because UUID duplicate.
Unless things have changed significantly, changing the UUID on a BTRFS
image is not anywhere near as simple as copying it with dd.  The UUID
gets used internally somehow, and changing it would require rewriting
_all_ the metadata blocks.






[PATCH] Btrfs: fix transaction handle leak in balance

2015-12-10 Thread fdmanana
From: Filipe Manana 

If we fail to allocate a new data chunk, we were jumping to the error path
without releasing the transaction handle we got before. Fix this by always
releasing it before doing the jump.

Fixes: 2c9fe8355258 ("btrfs: Fix lost-data-profile caused by balance bg")
Signed-off-by: Filipe Manana 
---
 fs/btrfs/volumes.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3e55e07..f5e5e20 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3548,12 +3548,11 @@ again:
 
ret = btrfs_force_chunk_alloc(trans, chunk_root,
  BTRFS_BLOCK_GROUP_DATA);
+   btrfs_end_transaction(trans, chunk_root);
if (ret < 0) {
mutex_unlock(&fs_info->delete_unused_bgs_mutex);
goto error;
}
-
-   btrfs_end_transaction(trans, chunk_root);
chunk_reserved = 1;
}
 
-- 
2.1.3



Re: subvols and parents - how?

2015-12-10 Thread Austin S Hemmelgarn
On 2015-12-09 22:56, Duncan wrote:
> Austin S Hemmelgarn posted on Wed, 09 Dec 2015 14:04:06 -0500 as
> excerpted:
> 
>> Agreed.  It's not too bad fixing a Gentoo system (as long as
>> /var/lib/portage/world is still correct, you can just nuke the installed
>> package database and emerge world, it'll take time, but it will get your
>> system in a guaranteed consistent state).
> 
> For sufficiently loose values of "consistent", yes, as I found out by 
> experience.  But it can be done, and I do have the experience to prove it.
> 
> What happens in practice is that while yes, as long as @world is correct 
> you can install to current and have all those files tracked again as 
> appropriate, if your package installation database is missing or out of 
> sync with what's actually on your filesystem(s), where the new version of 
> various packages will replace older files as they come across them during 
> the install process (subject to CONFIG_PROTECT of course, this part isn't 
> the problem), the problem is actually where the files of the actually 
> installed but untracked version differ from those of the version you're 
> installing.
Oh, definitely, it's a usable system short term, but not something you
should be depending on.  The other big difference though is that it's
then trivial to bootstrap a clean install on the same system if you have
the space for it (which is what I usually end up doing).





Re: [PATCH 2/8] Fix btrfs/052 to work on non-4k block sized filesystems

2015-12-10 Thread Filipe Manana
On Mon, Nov 30, 2015 at 10:17 AM, Chandan Rajendra
 wrote:
> This commit makes use of the new _filter_xfs_io_blocks_modified filtering
> function to print information in terms of file blocks rather than file
> offset.
>
> Signed-off-by: Chandan Rajendra 
Reviewed-by: Filipe Manana 

Thanks!

> ---
>  tests/btrfs/052 | 122 +
>  tests/btrfs/052.out | 744 
> +++-
>  2 files changed, 515 insertions(+), 351 deletions(-)
>
> diff --git a/tests/btrfs/052 b/tests/btrfs/052
> index c75193d..b760b92 100755
> --- a/tests/btrfs/052
> +++ b/tests/btrfs/052
> @@ -59,78 +59,98 @@ test_btrfs_clone_same_file()
> _scratch_mkfs >/dev/null 2>&1
> _scratch_mount $MOUNT_OPTIONS
>
> -   # Create a file with 5 extents, 4 of 8Kb each and 1 of 64Kb.
> -   $XFS_IO_PROG -f -c "pwrite -S 0x01 -b 8192 0 8192" $SCRATCH_MNT/foo \
> -   | _filter_xfs_io
> +   BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
> +
> +   EXTENT_SIZE=$((2 * $BLOCK_SIZE))
> +
> +   # Create a file with 5 extents, 4 extents of 2 blocks each and 1 
> extent
> +   # of 16 blocks.
> +   OFFSET=0
> +   $XFS_IO_PROG -f -c "pwrite -S 0x01 -b $EXTENT_SIZE $OFFSET 
> $EXTENT_SIZE" $SCRATCH_MNT/foo \
> +   | _filter_xfs_io_blocks_modified
> sync
> -   $XFS_IO_PROG -c "pwrite -S 0x02 -b 8192 8192 8192" $SCRATCH_MNT/foo \
> -   | _filter_xfs_io
> +
> +   OFFSET=$(($OFFSET + $EXTENT_SIZE))
> +   $XFS_IO_PROG -c "pwrite -S 0x02 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" 
> $SCRATCH_MNT/foo \
> +   | _filter_xfs_io_blocks_modified
> sync
> -   $XFS_IO_PROG -c "pwrite -S 0x03 -b 8192 16384 8192" $SCRATCH_MNT/foo \
> -   | _filter_xfs_io
> +
> +   OFFSET=$(($OFFSET + $EXTENT_SIZE))
> +   $XFS_IO_PROG -c "pwrite -S 0x03 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" 
> $SCRATCH_MNT/foo \
> +   | _filter_xfs_io_blocks_modified
> sync
> -   $XFS_IO_PROG -c "pwrite -S 0x04 -b 8192 24576 8192" $SCRATCH_MNT/foo \
> -   | _filter_xfs_io
> +
> +   OFFSET=$(($OFFSET + $EXTENT_SIZE))
> +   $XFS_IO_PROG -c "pwrite -S 0x04 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" 
> $SCRATCH_MNT/foo \
> +   | _filter_xfs_io_blocks_modified
> sync
> -   $XFS_IO_PROG -c "pwrite -S 0x05 -b 65536 32768 65536" 
> $SCRATCH_MNT/foo \
> -   | _filter_xfs_io
> +
> +   OFFSET=$(($OFFSET + $EXTENT_SIZE))
> +   EXTENT_SIZE=$((16 * $BLOCK_SIZE))
> +   $XFS_IO_PROG -c "pwrite -S 0x05 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" 
> $SCRATCH_MNT/foo \
> +   | _filter_xfs_io_blocks_modified
> sync
>
> -   # Digest of initial content.
> -   md5sum $SCRATCH_MNT/foo | _filter_scratch
> +   # Initial file content.
> +   od -t x1 $SCRATCH_MNT/foo | _filter_od
>
> # Same source and target ranges - must fail.
> -   $CLONER_PROG -s 8192 -d 8192 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo
> +   $CLONER_PROG -s $((2 * $BLOCK_SIZE)) -d $((2 * $BLOCK_SIZE)) \
> +-l $((2 * $BLOCK_SIZE)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo
> # Check file content didn't change.
> -   md5sum $SCRATCH_MNT/foo | _filter_scratch
> +   od -t x1 $SCRATCH_MNT/foo | _filter_od
>
> # Intersection between source and target ranges - must fail too.
> -   $CLONER_PROG -s 4096 -d 8192 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo
> +   # $CLONER_PROG -s 4096 -d 8192 -l 8192 $SCRATCH_MNT/foo 
> $SCRATCH_MNT/foo
> +   $CLONER_PROG -s $((1 * $BLOCK_SIZE)) -d $((2 * $BLOCK_SIZE)) \
> +-l $((2 * $BLOCK_SIZE)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo
> # Check file content didn't change.
> -   md5sum $SCRATCH_MNT/foo | _filter_scratch
> +   od -t x1 $SCRATCH_MNT/foo | _filter_od
>
> # Clone an entire extent from a higher range to a lower range.
> -   $CLONER_PROG -s 24576 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo
> -
> -   # Check entire file, the 8Kb block at offset 0 now has the same 
> content
> -   # as the 8Kb block at offset 24576.
> -   od -t x1 $SCRATCH_MNT/foo
> +   $CLONER_PROG -s $((6 * $BLOCK_SIZE)) -d 0 -l $((2 * $BLOCK_SIZE)) \
> +$SCRATCH_MNT/foo $SCRATCH_MNT/foo
> +   # Check entire file, 0th and 1st blocks now have the same content
> +   # as the 6th and 7th blocks.
> +   od -t x1 $SCRATCH_MNT/foo | _filter_od
>
> # Clone an entire extent from a lower range to a higher range.
> -   $CLONER_PROG -s 8192 -d 16384 -l 8192 $SCRATCH_MNT/foo 
> $SCRATCH_MNT/foo
> -
> -   # Check entire file, the 8Kb block at offset 0 now has the same 
> content
> -   # as the 8Kb block at offset 24576, and the 8Kb block at offset 16384
> -   # now has the same content as the 8Kb block at offset 8192.
> -   od -t x1 
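
The block-relative offsets in the quoted patch can be checked against
the old hard-coded values with a little arithmetic. The sketch below is
only an illustration: blocks_to_bytes is a made-up helper name, not
something from fstests, and the two block sizes are just examples.

```shell
#!/bin/bash
# blocks_to_bytes (hypothetical helper): convert a number of blocks into
# a byte count for a given block size.
blocks_to_bytes() {
	local nr_blocks=$1 block_size=$2
	echo $((nr_blocks * block_size))
}

# The "clone an entire extent from a higher range to a lower range" step
# uses a source offset of 6 blocks and a length of 2 blocks.
for bs in 4096 65536; do
	echo "block size $bs:" \
	     "source offset $(blocks_to_bytes 6 "$bs")," \
	     "length $(blocks_to_bytes 2 "$bs")"
done
```

With a 4096 byte block size this gives the offset 24576 and length 8192
that the removed lines used, so the 4k behaviour stays the same while
larger block sizes scale accordingly.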

Re: [PATCH 6/8] Fix btrfs/098 to work on non-4k block sized filesystems

2015-12-10 Thread Filipe Manana
On Mon, Nov 30, 2015 at 10:17 AM, Chandan Rajendra
 wrote:
> This commit makes use of the new _filter_xfs_io_blocks_modified filtering
> function to print information in terms of file blocks rather than file
> offset.
>
> Signed-off-by: Chandan Rajendra 
Reviewed-by: Filipe Manana 

Thanks!

> ---
>  tests/btrfs/098 | 67 
> +
>  tests/btrfs/098.out | 27 -
>  2 files changed, 58 insertions(+), 36 deletions(-)
>
> diff --git a/tests/btrfs/098 b/tests/btrfs/098
> index 8aef119..49f6d16 100755
> --- a/tests/btrfs/098
> +++ b/tests/btrfs/098
> @@ -58,43 +58,50 @@ _scratch_mkfs >>$seqres.full 2>&1
>  _init_flakey
>  _mount_flakey
>
> -# Create our test file with a single 100K extent starting at file offset 
> 800K.
> -# We fsync the file here to make the fsync log tree gets a single csum item 
> that
> -# covers the whole 100K extent, which causes the second fsync, done after the
> -# cloning operation below, to not leave in the log tree two csum items 
> covering
> -# two sub-ranges ([0, 20K[ and [20K, 100K[)) of our extent.
> -$XFS_IO_PROG -f -c "pwrite -S 0xaa 800K 100K"  \
> +BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
> +
> +# Create our test file with a single 25 block extent starting at file offset
> +# mapped by 200th block We fsync the file here to make the fsync log tree 
> get a
> +# single csum item that covers the whole 25 block extent, which causes the
> +# second fsync, done after the cloning operation below, to not leave in the 
> log
> +# tree two csum items covering two block sub-ranges ([0, 5[ and [5, 25[)) of 
> our
> +# extent.
> +$XFS_IO_PROG -f -c "pwrite -S 0xaa $((200 * $BLOCK_SIZE)) $((25 * 
> $BLOCK_SIZE))" \
> -c "fsync" \
> -   $SCRATCH_MNT/foo | _filter_xfs_io
> +   $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
> +
>
> -# Now clone part of our extent into file offset 400K. This adds a file extent
> -# item to our inode's metadata that points to the 100K extent we created 
> before,
> -# using a data offset of 20K and a data length of 20K, so that it refers to
> -# the sub-range [20K, 40K[ of our original extent.
> -$CLONER_PROG -s $((800 * 1024 + 20 * 1024)) -d $((400 * 1024)) \
> -   -l $((20 * 1024)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo
> +# Now clone part of our extent into file offset mapped by 100th block. This 
> adds
> +# a file extent item to our inode's metadata that points to the 25 block 
> extent
> +# we created before, using a data offset of 5 blocks and a data length of 5
> +# blocks, so that it refers to the block sub-range [5, 10[ of our original
> +# extent.
> +$CLONER_PROG -s $(((200 * $BLOCK_SIZE) + (5 * $BLOCK_SIZE))) \
> +-d $((100 * $BLOCK_SIZE)) -l $((5 * $BLOCK_SIZE)) \
> +$SCRATCH_MNT/foo $SCRATCH_MNT/foo
>
>  # Now fsync our file to make sure the extent cloning is durably persisted. 
> This
>  # fsync will not add a second csum item to the log tree containing the 
> checksums
> -# for the blocks in the sub-range [20K, 40K[ of our extent, because there was
> +# for the blocks in the block sub-range [5, 10[ of our extent, because there 
> was
>  # already a csum item in the log tree covering the whole extent, added by the
>  # first fsync we did before.
>  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
>
> -echo "File digest before power failure:"
> -md5sum $SCRATCH_MNT/foo | _filter_scratch
> +echo "File contents before power failure:"
> +od -t x1 $SCRATCH_MNT/foo | _filter_od
>
>  # The fsync log replay first processes the file extent item corresponding to 
> the
> -# file offset 400K (the one which refers to the [20K, 40K[ sub-range of our 
> 100K
> -# extent) and then processes the file extent item for file offset 800K. It 
> used
> -# to happen that when processing the later, it erroneously left in the csum 
> tree
> -# 2 csum items that overlapped each other, 1 for the sub-range [20K, 40K[ 
> and 1
> -# for the whole range of our extent. This introduced a problem where 
> subsequent
> -# lookups for the checksums of blocks within the range [40K, 100K[ of our 
> extent
> -# would not find anything because lookups in the csum tree ended up looking 
> only
> -# at the smaller csum item, the one covering the subrange [20K, 40K[. This 
> made
> -# read requests assume an expected checksum with a value of 0 for those 
> blocks,
> -# which caused checksum verification failure when the read operations 
> finished.
> +# file offset mapped by 100th block (the one which refers to the [5, 10[ 
> block
> +# sub-range of our 25 block extent) and then processes the file extent item 
> for
> +# file offset mapped by 200th block. It used to happen that when processing 
> the
> +# later, it erroneously left in the csum tree 2 csum items that overlapped 
> each
> +# other, 1 for the block sub-range [5, 10[ and 1 for the whole range of our
> +# 

Re: [PATCH V2 3/5] Fix btrfs/055 to work on non-4k block sized filesystems

2015-12-10 Thread Filipe Manana
On Mon, Nov 30, 2015 at 10:16 AM, Chandan Rajendra
 wrote:
> This commit makes use of the new _filter_xfs_io_blocks_modified and _filter_od
> filtering functions to print information in terms of file blocks rather than
> file offset.
>
> Signed-off-by: Chandan Rajendra 
Reviewed-by: Filipe Manana 

Thanks!

> ---
>  tests/btrfs/055 | 128 ++
>  tests/btrfs/055.out | 378 
> +---
>  2 files changed, 259 insertions(+), 247 deletions(-)
>
> diff --git a/tests/btrfs/055 b/tests/btrfs/055
> index c0dd9ed..1f50850 100755
> --- a/tests/btrfs/055
> +++ b/tests/btrfs/055
> @@ -60,88 +60,110 @@ test_btrfs_clone_with_holes()
> _scratch_mkfs "$1" >/dev/null 2>&1
> _scratch_mount
>
> -   # Create a file with 4 extents and 1 hole, all with a size of 8Kb 
> each.
> -   # The hole is in the range [16384, 24576[.
> -   $XFS_IO_PROG -s -f -c "pwrite -S 0x01 -b 8192 0 8192" \
> -   -c "pwrite -S 0x02 -b 8192 8192 8192" \
> -   -c "pwrite -S 0x04 -b 8192 24576 8192" \
> -   -c "pwrite -S 0x05 -b 8192 32768 8192" \
> -   $SCRATCH_MNT/foo | _filter_xfs_io
> +   BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
>
> -   # Clone destination file, 1 extent of 96kb.
> -   $XFS_IO_PROG -s -f -c "pwrite -S 0xff -b 98304 0 98304" \
> -   $SCRATCH_MNT/bar | _filter_xfs_io
> +   EXTENT_SIZE=$((2 * $BLOCK_SIZE))
>
> -   # Clone 2nd extent, 8Kb hole and 3rd extent of foo into bar.
> -   $CLONER_PROG -s 8192 -d 0 -l 24576 $SCRATCH_MNT/foo $SCRATCH_MNT/bar
> +   OFFSET=0
> +
> +   # Create a file with 4 extents and 1 hole, all with 2 blocks each.
> +   # The hole is in the block range [4, 5[.
> +   $XFS_IO_PROG -s -f -c "pwrite -S 0x01 -b $EXTENT_SIZE $OFFSET 
> $EXTENT_SIZE" \
> +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
> +
> +   OFFSET=$(($OFFSET + $EXTENT_SIZE))
> +   $XFS_IO_PROG -s -f -c "pwrite -S 0x02 -b $EXTENT_SIZE $OFFSET 
> $EXTENT_SIZE" \
> +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
> +
> +   OFFSET=$(($OFFSET + 2 * $EXTENT_SIZE))
> +   $XFS_IO_PROG -s -f -c "pwrite -S 0x04 -b $EXTENT_SIZE $OFFSET 
> $EXTENT_SIZE" \
> +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
> +
> +   OFFSET=$(($OFFSET + $EXTENT_SIZE))
> +   $XFS_IO_PROG -s -f -c "pwrite -S 0x05 -b $EXTENT_SIZE $OFFSET 
> $EXTENT_SIZE" \
> +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
> +
> +   # Clone destination file, 1 extent of 24 blocks.
> +   EXTENT_SIZE=$((24 * $BLOCK_SIZE))
> +   $XFS_IO_PROG -s -f -c "pwrite -S 0xff -b $EXTENT_SIZE 0 $EXTENT_SIZE" 
> \
> +   $SCRATCH_MNT/bar | _filter_xfs_io_blocks_modified
> +
> +   # Clone 2nd extent, 2-blocks sized hole and 3rd extent of foo into 
> bar.
> +   $CLONER_PROG -s $((2 * $BLOCK_SIZE)) -d 0 -l $((6 * $BLOCK_SIZE)) \
> +$SCRATCH_MNT/foo $SCRATCH_MNT/bar
>
> # Verify both extents and the hole were cloned.
> echo "1) Check both extents and the hole were cloned"
> -   od -t x1 $SCRATCH_MNT/bar
> +   od -t x1 $SCRATCH_MNT/bar | _filter_od
>
> -   # Cloning range starts at the middle of an hole.
> -   $CLONER_PROG -s 20480 -d 32768 -l 12288 $SCRATCH_MNT/foo \
> -   $SCRATCH_MNT/bar
> +   # Cloning range starts at the middle of a hole.
> +   $CLONER_PROG -s $((5 * $BLOCK_SIZE)) -d $((8 * $BLOCK_SIZE)) \
> +-l $((3 * $BLOCK_SIZE)) $SCRATCH_MNT/foo $SCRATCH_MNT/bar
>
> -   # Verify that half of the hole and the following 8Kb extent were 
> cloned.
> -   echo "2) Check half hole and one 8Kb extent were cloned"
> -   od -t x1 $SCRATCH_MNT/bar
> +   # Verify that half of the hole and the following 2 block extent were 
> cloned.
> +   echo "2) Check half hole and the following 2 block extent were cloned"
> +   od -t x1 $SCRATCH_MNT/bar | _filter_od
>
> -   # Cloning range ends at the middle of an hole.
> -   $CLONER_PROG -s 0 -d 65536 -l 20480 $SCRATCH_MNT/foo $SCRATCH_MNT/bar
> +   # Cloning range ends at the middle of a hole.
> +   $CLONER_PROG -s 0 -d $((16 * $BLOCK_SIZE)) -l $((5 * $BLOCK_SIZE)) \
> +$SCRATCH_MNT/foo $SCRATCH_MNT/bar
>
> -   # Verify that 2 extents of 8kb and a 4kb hole were cloned.
> -   echo "3) Check that 2 extents of 8kb eacg and a 4kb hole were cloned"
> -   od -t x1 $SCRATCH_MNT/bar
> +   # Verify that 2 extents of 2 blocks size and a 1-block hole were 
> cloned.
> +   echo "3) Check that 2 extents of 2 blocks each and a hole of 1 block 
> were cloned"
> +   od -t x1 $SCRATCH_MNT/bar | _filter_od
>
> -   # Create a 24Kb hole at the end of the source file (foo).
> -   $XFS_IO_PROG -c 

Re: [PATCH V2 5/5] Fix btrfs/096 to work on non-4k block sized filesystems

2015-12-10 Thread Filipe Manana
On Mon, Nov 30, 2015 at 10:17 AM, Chandan Rajendra
 wrote:
> This commit makes use of the new _filter_xfs_io_blocks_modified filtering
> function to print information in terms of file blocks rather than file
> offset.
>
> Signed-off-by: Chandan Rajendra 
Reviewed-by: Filipe Manana 

Thanks!

> ---
>  tests/btrfs/096 | 45 +
>  tests/btrfs/096.out | 15 +--
>  2 files changed, 30 insertions(+), 30 deletions(-)
>
> diff --git a/tests/btrfs/096 b/tests/btrfs/096
> index f5b3a7f..896a209 100755
> --- a/tests/btrfs/096
> +++ b/tests/btrfs/096
> @@ -51,30 +51,35 @@ rm -f $seqres.full
>  _scratch_mkfs >>$seqres.full 2>&1
>  _scratch_mount
>
> -# Create our test files. File foo has the same 2K of data at offset 4K as 
> file
> -# bar has at its offset 0.
> -$XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
> -   -c "pwrite -S 0xbb 4k 2K" \
> -   -c "pwrite -S 0xcc 8K 4K" \
> -   $SCRATCH_MNT/foo | _filter_xfs_io
> +BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
>
> -# File bar consists of a single inline extent (2K size).
> -$XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
> -   $SCRATCH_MNT/bar | _filter_xfs_io
> +# Create our test files. File foo has the same 2k of data at offset 
> $BLOCK_SIZE
> +# as file bar has at its offset 0.
> +$XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 $BLOCK_SIZE" \
> +   -c "pwrite -S 0xbb $BLOCK_SIZE 2k" \
> +   -c "pwrite -S 0xcc $(($BLOCK_SIZE * 2)) $BLOCK_SIZE" \
> +   $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
>
> -# Now call the clone ioctl to clone the extent of file bar into file foo at 
> its
> -# offset 4K. This made file foo have an inline extent at offset 4K, something
> -# which the btrfs code can not deal with in future IO operations because all
> -# inline extents are supposed to start at an offset of 0, resulting in all 
> sorts
> -# of chaos.
> -# So here we validate that the clone ioctl returns an EOPNOTSUPP, which is 
> what
> -# it returns for other cases dealing with inlined extents.
> -$CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
> +# File bar consists of a single inline extent (2k in size).
> +$XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2k" \
> +   $SCRATCH_MNT/bar | _filter_xfs_io_blocks_modified
> +
> +# Now call the clone ioctl to clone the extent of file bar into file
> +# foo at its $BLOCK_SIZE offset. This made file foo have an inline
> +# extent at offset $BLOCK_SIZE, something which the btrfs code can not
> +# deal with in future IO operations because all inline extents are
> +# supposed to start at an offset of 0, resulting in all sorts of
> +# chaos.
> +# So here we validate that the clone ioctl returns an EOPNOTSUPP,
> +# which is what it returns for other cases dealing with inlined
> +# extents.
> +$CLONER_PROG -s 0 -d $BLOCK_SIZE -l 2048 \
> $SCRATCH_MNT/bar $SCRATCH_MNT/foo
>
> -# Because of the inline extent at offset 4K, the following write made the 
> kernel
> -# crash with a BUG_ON().
> -$XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
> +# Because of the inline extent at offset $BLOCK_SIZE, the following
> +# write made the kernel crash with a BUG_ON().
> +$XFS_IO_PROG -c "pwrite -S 0xdd $(($BLOCK_SIZE + 2048)) 2k" \
> +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
>
>  status=0
>  exit
> diff --git a/tests/btrfs/096.out b/tests/btrfs/096.out
> index 235198d..2a4251e 100644
> --- a/tests/btrfs/096.out
> +++ b/tests/btrfs/096.out
> @@ -1,12 +1,7 @@
>  QA output created by 096
> -wrote 4096/4096 bytes at offset 0
> -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -wrote 2048/2048 bytes at offset 4096
> -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -wrote 4096/4096 bytes at offset 8192
> -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -wrote 2048/2048 bytes at offset 0
> -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +Blocks modified: [0 - 0]
> +Blocks modified: [1 - 1]
> +Blocks modified: [2 - 2]
> +Blocks modified: [0 - 0]
>  clone failed: Operation not supported
> -wrote 2048/2048 bytes at offset 6144
> -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +Blocks modified: [1 - 1]
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
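
To see why the new golden output no longer depends on the block size,
here is a small sketch that maps a byte range to the blocks it touches.
It is only an illustration of the idea; byte_range_to_blocks is a
made-up name and this is not the actual _filter_xfs_io_blocks_modified
implementation.

```shell
#!/bin/bash
# byte_range_to_blocks (hypothetical helper): given a write's byte
# offset, its length and the block size, print the first and last block
# touched by that write.
byte_range_to_blocks() {
	local offset=$1 length=$2 block_size=$3
	local first=$((offset / block_size))
	local last=$(((offset + length - 1) / block_size))
	echo "Blocks modified: [$first - $last]"
}

# Last write of the old test: 2K at offset 6K, on 4K blocks.
byte_range_to_blocks 6144 2048 4096
# Same step in the new test on 64K blocks: 2K at offset BLOCK_SIZE + 2048.
byte_range_to_blocks $((65536 + 2048)) 2048 65536
```

Both calls report block 1, which is the last line of the new 096.out.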



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 2/5] Fix btrfs/017 to work on non-4k block sized filesystems

2015-12-10 Thread Filipe Manana
On Mon, Nov 30, 2015 at 10:16 AM, Chandan Rajendra
 wrote:
> This commit makes use of the new _filter_xfs_io_blocks_modified filtering
> function to print information in terms of file blocks rather than file
> offset.
>
> Signed-off-by: Chandan Rajendra 
Reviewed-by: Filipe Manana 

Thanks!

> ---
>  tests/btrfs/017 | 16 
>  tests/btrfs/017.out |  3 +--
>  2 files changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/tests/btrfs/017 b/tests/btrfs/017
> index f8855e3..34c5f0a 100755
> --- a/tests/btrfs/017
> +++ b/tests/btrfs/017
> @@ -63,13 +63,21 @@ rm -f $seqres.full
>  _scratch_mkfs "--nodesize 65536" >>$seqres.full 2>&1
>  _scratch_mount
>
> -$XFS_IO_PROG -f -d -c "pwrite 0 8K" $SCRATCH_MNT/foo | _filter_xfs_io
> +BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
> +EXTENT_SIZE=$((2 * $BLOCK_SIZE))
> +
> +$XFS_IO_PROG -f -d -c "pwrite 0 $EXTENT_SIZE" $SCRATCH_MNT/foo \
> +   | _filter_xfs_io_blocks_modified
>
>  _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
>
> -$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo-reflink
> -$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/snap/foo-reflink
> -$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/snap/foo-reflink2
> +$CLONER_PROG -s 0 -d 0 -l $EXTENT_SIZE $SCRATCH_MNT/foo $SCRATCH_MNT/foo-reflink
> +
> +$CLONER_PROG -s 0 -d 0 -l $EXTENT_SIZE $SCRATCH_MNT/foo \
> +$SCRATCH_MNT/snap/foo-reflink
> +
> +$CLONER_PROG -s 0 -d 0 -l $EXTENT_SIZE $SCRATCH_MNT/foo \
> +$SCRATCH_MNT/snap/foo-reflink2
>
>  _run_btrfs_util_prog quota enable $SCRATCH_MNT
>  _run_btrfs_util_prog quota rescan -w $SCRATCH_MNT
> diff --git a/tests/btrfs/017.out b/tests/btrfs/017.out
> index f940f3a..503eb88 100644
> --- a/tests/btrfs/017.out
> +++ b/tests/btrfs/017.out
> @@ -1,5 +1,4 @@
>  QA output created by 017
> -wrote 8192/8192 bytes at offset 0
> -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +Blocks modified: [0 - 1]
>  65536 65536
>  65536 65536
> --
> 2.1.0
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


Re: [PATCH V2 1/5] Filter xfs_io and od's output in units of FS block size

2015-12-10 Thread Filipe Manana
On Mon, Nov 30, 2015 at 10:16 AM, Chandan Rajendra
 wrote:
> The helpers introduced in this commit will be used to make btrfs tests that
> assume 4k as the block size to work on non-4k blocksized filesystem instances
> as well.
>
> Signed-off-by: Chandan Rajendra 
Reviewed-by: Filipe Manana 

Thanks!

> ---
>  common/filter | 45 +
>  1 file changed, 45 insertions(+)
>
> diff --git a/common/filter b/common/filter
> index af456c9..05f2fab 100644
> --- a/common/filter
> +++ b/common/filter
> @@ -229,6 +229,38 @@ _filter_xfs_io_unique()
>  common_line_filter | _filter_xfs_io
>  }
>
> +_filter_xfs_io_units_modified()
> +{
> +   UNIT=$1
> +   UNIT_SIZE=$2
> +
> +   $AWK_PROG -v unit="$UNIT" -v unit_size=$UNIT_SIZE '
> +   /wrote/ {
> +   split($2, bytes, "/")
> +
> +   bytes_written = strtonum(bytes[1])
> +
> +   offset = strtonum($NF)
> +
> +   unit_start = offset / unit_size
> +   unit_start = int(unit_start)
> +   unit_end = (offset + bytes_written - 1) / unit_size
> +   unit_end = int(unit_end)
> +
> +   printf("%ss modified: [%d - %d]\n", unit, unit_start, unit_end)
> +
> +   next
> +   }
> +   '
> +}
> +
> +_filter_xfs_io_blocks_modified()
> +{
> +   BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
> +
> +   _filter_xfs_io_units_modified "Block" $BLOCK_SIZE
> +}
> +
>  _filter_test_dir()
>  {
> sed -e "s,$TEST_DEV,TEST_DEV,g" -e "s,$TEST_DIR,TEST_DIR,g"
> @@ -323,5 +355,18 @@ _filter_ro_mount() {
> -e "s/mount: cannot mount block device/mount: cannot mount/g"
>  }
>
> +_filter_od()
> +{
> +   BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
> +   $AWK_PROG -v block_size=$BLOCK_SIZE '
> +   /^[0-9]+/ {
> +   offset = strtonum("0"$1);
> +   $1 = sprintf("%o", offset / block_size);
> +   print $0;
> +   }
> +   /\*/
> +   '
> +}
> +
>  # make sure this script returns success
>  /bin/true
> --
> 2.1.0
>
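The block-range arithmetic used by _filter_xfs_io_units_modified in the patch above can be tried in isolation. What follows is only a minimal sketch in plain shell, not the fstests helper: blocks_modified is a hypothetical name, and it takes the offset, length and block size as arguments instead of parsing xfs_io output.

```shell
# Hypothetical helper (not part of fstests): print the inclusive range of
# blocks touched by a write of $2 bytes at byte offset $1, for block size $3.
blocks_modified() {
    first=$(( $1 / $3 ))
    last=$(( ($1 + $2 - 1) / $3 ))
    echo "Blocks modified: [$first - $last]"
}

# A two-block write gives the same line for 4k and 64k block sizes.
blocks_modified 0 8192 4096
blocks_modified 0 131072 65536
# A 2k write that starts 2k into block 1 stays within block 1.
blocks_modified $((4096 + 2048)) 2048 4096
```

Because the output depends only on block indexes, the same golden output file can be expected to match regardless of the block size the scratch filesystem was created with.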



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


Re: subvols, ro- and bind mounts - how?

2015-12-10 Thread Chris Murphy
On Thu, Dec 10, 2015 at 3:36 PM, S.J.  wrote:

> Quote:
>
> " Most mount options apply to the whole filesystem, and only the options for
> the first subvolume
> to be mounted will take effect. This is due to lack of implementation and
> may change in the future. "
>
> from https://btrfs.wiki.kernel.org/index.php/Mount_options in a red box on
> the top.

That seems due for a revision because I do rw, ro, rw, rw, ro mounts
in sequence and they stick fine. In fact they stick with the same
subvolume.

[root@f23m ]# mount /dev/sda7 /mnt/1 -o subvol=home
[root@f23m ]# mount /dev/sda7 /mnt/2 -o subvol=home,ro
[root@f23m ]# mount /dev/sda7 /mnt/3 -o subvol=home
[root@f23m ]# mount
[...snip...]
/dev/sda7 on /mnt/1 type btrfs
(rw,relatime,seclabel,ssd,space_cache,subvolid=258,subvol=/home)
/dev/sda7 on /mnt/2 type btrfs
(ro,relatime,seclabel,ssd,space_cache,subvolid=258,subvol=/home)
/dev/sda7 on /mnt/3 type btrfs
(rw,relatime,seclabel,ssd,space_cache,subvolid=258,subvol=/home)

And Project Atomic, a.k.a. ostree and rpm-ostree etc., depends on
mounting different parts of the same fs volume to different mount
points with different read and read/write settings (bind mounts), and
that works too. http://projectatomic.io/


-- 
Chris Murphy


Re: !PageLocked BUG_ON hit in clear_page_dirty_for_io

2015-12-10 Thread Dave Jones
On Thu, Dec 10, 2015 at 05:57:20PM -0500, Dave Jones wrote:
 > On Thu, Dec 10, 2015 at 04:30:24PM -0500, Chris Mason wrote:
 >  > On Thu, Dec 10, 2015 at 02:35:55PM -0500, Dave Jones wrote:
 >  > > On Thu, Dec 10, 2015 at 02:02:20PM -0500, Chris Mason wrote:
 >  > >  > On Tue, Dec 08, 2015 at 11:25:28PM -0500, Dave Jones wrote:
 >  > >  > > Not sure if I've already reported this one, but I've been seeing this
 >  > >  > > a lot this last couple days.
 >  > >  > > 
 >  > >  > > kernel BUG at mm/page-writeback.c:2654!
 >  > >  > > invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
 >  > >  > > CPU: 1 PID: 2566 Comm: trinity-c1 Tainted: GW   4.4.0-rc4-think+ #14
 >  > >  > > task: 880462811b80 ti: 8800cd808000 task.ti: 8800cd808000
 >  > >  > > RIP: 0010:[]  [] clear_page_dirty_for_io+0x180/0x1d0
 >  > >  > 
 >  > >  > Huh, are you able to reproduce at will?  From this code path it should
 >  > >  > mean somebody else is unlocking a page they don't own.
 >  > > 
 >  > > pretty easily yeah. I hit it maybe a couple dozen times yesterday.
 >  > > So if you've got some idea of printk's to spray anywhere I can give
 >  > > that a shot.
 >  > 
 >  > I'd rather try to trigger it here.  Going to have to add some way to
 >  > record which stack trace last unlocked and/or freed the page.
 > 
 > I'm using..
 > 
 > trinity -q -l off -C8 -a64 -x fsync -x fdatasync -x syncfs -x sync --enable-fds=testfile,pseudo
 > 
 > interestingly, if I just use 'testfile' by itself, I can't reproduce it.
 > (That means "create a bunch a few files in current dir and use their fds")
 > the "pseudo" bit means "also use fds from /proc, /sys and /dev".

Actually scratch that. I finally got it to reproduce with just 'testfile'.
Which makes sense, given it's a btrfs bug we're chasing, not anything to do
with sys/proc.

Dave


Re: attacking btrfs filesystems via UUID collisions?

2015-12-10 Thread S.J.


On 10.12.2015 13:41, Hugo Mills wrote:
> On Thu, Dec 10, 2015 at 07:08:51AM -0500, Austin S Hemmelgarn wrote:
> > On 2015-12-09 16:48, S.J. wrote:
> > >> 1. better practices, we really need to tell users, and documentation
> > >> writers, that using dd (or variant) to copy Btrfs volumes has a
> > >> consequence and should not be used to make copies.
> > >
> > >> 2. Btrfs needs a better way to make a copy of a volume when there are
> > >> snapshots (including even rw snapshots); e.g. permit send/receive to
> > >> work on rw snapshots if the fs is ro mounted; e.g. a way to do
> > >> "recursive" send/receive.
> > >
> > >> 3. Some way to fail gracefully, when there's ambiguity that cannot be
> > >> resolved. Once there are duplicate devs (dd or lvm snapshots, etc)
> > >> then there's simply no way to resolve the ambiguity automatically, and
> > >> the volume should just refuse to rw mount until the user resolves the
> > >> ambiguity. I think it's OK to fallback to ro mount (maybe) by default
> > >> in such a case rather than totally fail to mount.
> > >
> > > About 3:
> > > RO fallback for the second device/partitions is not good.
> > > It won't stop confusing the two partitions, and even if both are RO,
> > > thinking it's ok to read and then reading the wrong data is bad.
> > >
> > > About 1 and 2 ... if 3 gets fulfilled, why?
> > > DD itself is not a problem "if" the UUID is changed after it
> > > (which is a command as simple as dd), and if someone doesn't
> > > know that, he/she will notice when mount refuses to work
> > > because UUID duplicate.
> > Unless things have changed significantly, changing the UUID on a BTRFS
> > image is not anywhere near as simple as copying it with dd.  The UUID
> > gets used internally somehow, and changing it would require rewriting
> > _all_ the metadata blocks.
>
>    Indeed, but there is now a tool to do that. :) (btrfstune -u or -U)
>
>    Hugo.

Yes, I meant that :)
I'm not saying that the tool is internally as simple as a
"dumb" dd block copy, but calling it certainly is.



Re: attacking btrfs filesystems via UUID collisions?

2015-12-10 Thread Hugo Mills
On Thu, Dec 10, 2015 at 07:08:51AM -0500, Austin S Hemmelgarn wrote:
> On 2015-12-09 16:48, S.J. wrote:
> >> 1. better practices, we really need to tell users, and documentation
> >> writers, that using dd (or variant) to copy Btrfs volumes has a
> >> consequence and should not be used to make copies.
> > 
> >> 2. Btrfs needs a better way to make a copy of a volume when there are
> >> snapshots (including even rw snapshots); e.g. permit send/receive to
> >> work on rw snapshots if the fs is ro mounted; e.g. a way to do
> >> "recursive" send/receive.
> > 
> >> 3. Some way to fail gracefully, when there's ambiguity that cannot be
> >> resolved. Once there are duplicate devs (dd or lvm snapshots, etc)
> >> then there's simply no way to resolve the ambiguity automatically, and
> >> the volume should just refuse to rw mount until the user resolves the
> >> ambiguity. I think it's OK to fallback to ro mount (maybe) by default
> >> in such a case rather than totally fail to mount.
> > 
> > About 3:
> > RO fallback for the second device/partitions is not good.
> > It won't stop confusing the two partitions, and even if both are RO,
> > thinking it's ok to read and then reading the wrong data is bad.
> > 
> > About 1 and 2 ... if 3 gets fulfilled, why?
> > DD itself is not a problem "if" the UUID is changed after it
> > (which is a command as simple as dd), and if someone doesn't
> > know that, he/she will notice when mount refuses to work
> > because UUID duplicate.
> Unless things have changed significantly, changing the UUID on a BTRFS
> image is not anywhere near as simple as copying it with dd.  The UUID
> gets used internally somehow, and changing it would require rewriting
> _all_ the metadata blocks.

   Indeed, but there is now a tool to do that. :) (btrfstune -u or -U)

   Hugo.

-- 
Hugo Mills | Go not to the elves for counsel, for they will say
hugo@... carfax.org.uk | both no and yes.
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: [PATCH 4/8] Fix btrfs/095 to work on non-4k block sized filesystems

2015-12-10 Thread Filipe Manana
On Mon, Nov 30, 2015 at 10:17 AM, Chandan Rajendra
 wrote:
> This commit makes use of the new _filter_xfs_io_blocks_modified filtering
> function to print information in terms of file blocks rather than file
> offset.
>
> Signed-off-by: Chandan Rajendra 
Reviewed-by: Filipe Manana 

Thanks!

> ---
>  tests/btrfs/095 | 110 +---
>  tests/btrfs/095.out |  42 
>  2 files changed, 96 insertions(+), 56 deletions(-)
>
> diff --git a/tests/btrfs/095 b/tests/btrfs/095
> index 1b4ba90..dec530c 100755
> --- a/tests/btrfs/095
> +++ b/tests/btrfs/095
> @@ -63,84 +63,98 @@ _scratch_mkfs >>$seqres.full 2>&1
>  _init_flakey
>  _mount_flakey
>
> -# Create prealloc extent covering range [160K, 620K[
> -$XFS_IO_PROG -f -c "falloc 160K 460K" $SCRATCH_MNT/foo
> +BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
>
> -# Now write to the last 80K of the prealloc extent plus 40K to the unallocated
> -# space that immediately follows it. This creates a new extent of 40K that spans
> -# the range [620K, 660K[.
> -$XFS_IO_PROG -c "pwrite -S 0xaa 540K 120K" $SCRATCH_MNT/foo | _filter_xfs_io
> +# Create prealloc extent covering file block range [40, 155[
> +$XFS_IO_PROG -f -c "falloc $((40 * $BLOCK_SIZE)) $((115 * $BLOCK_SIZE))" \
> +$SCRATCH_MNT/foo
> +
> +# Now write to the last 20 blocks of the prealloc extent plus 10 blocks to the
> +# unallocated space that immediately follows it. This creates a new extent of 10
> +# blocks that spans the block range [155, 165[.
> +$XFS_IO_PROG -c "pwrite -S 0xaa $((135 * $BLOCK_SIZE)) $((30 * $BLOCK_SIZE))" \
> +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
>
>  # At this point, there are now 2 back references to the prealloc extent in our
> -# extent tree. Both are for our file offset 160K and one relates to a file
> -# extent item with a data offset of 0 and a length of 380K, while the other
> -# relates to a file extent item with a data offset of 380K and a length of 80K.
> +# extent tree. Both are for our file offset mapped by the 40th block of the file
> +# and one relates to a file extent item with a data offset of 0 and a length of
> +# 95 blocks, while the other relates to a file extent item with a data offset of
> +# 95 blocks and a length of 20 blocks.
>
>  # Make sure everything done so far is durably persisted (all back references are
>  # in the extent tree, etc).
>  sync
>
> -# Now clone all extents of our file that cover the offset 160K up to its eof
> -# (660K at this point) into itself at offset 2M. This leaves a hole in the file
> -# covering the range [660K, 2M[. The prealloc extent will now be referenced by
> -# the file twice, once for offset 160K and once for offset 2M. The 40K extent
> -# that follows the prealloc extent will also be referenced twice by our file,
> -# once for offset 620K and once for offset 2M + 460K.
> -$CLONER_PROG -s $((160 * 1024)) -d $((2 * 1024 * 1024)) -l 0 $SCRATCH_MNT/foo \
> -   $SCRATCH_MNT/foo
> -
> -# Now create one new extent in our file with a size of 100Kb. It will span the
> -# range [3M, 3M + 100K[. It also will cause creation of a hole spanning the
> -# range [2M + 460K, 3M[. Our new file size is 3M + 100K.
> -$XFS_IO_PROG -c "pwrite -S 0xbb 3M 100K" $SCRATCH_MNT/foo | _filter_xfs_io
> +# Now clone all extents of our file that cover the file range spanned by 40th
> +# block up to its eof (165th block at this point) into itself at 512th
> +# block. This leaves a hole in the file covering the block range [165, 512[. The
> +# prealloc extent will now be referenced by the file twice, once for offset
> +# mapped by the 40th block and once for offset mapped by 512th block. The 10
> +# blocks extent that follows the prealloc extent will also be referenced twice
> +# by our file, once for offset mapped by the 155th block and once for offset
> +# (512 block + 115 blocks)
> +$CLONER_PROG -s $((40 * $BLOCK_SIZE)) -d $((512 * $BLOCK_SIZE)) -l 0 \
> +$SCRATCH_MNT/foo $SCRATCH_MNT/foo
> +
> +# Now create one new extent in our file with a size of 25 blocks. It will span
> +# the block range [768, 768 + 25[. It also will cause creation of a hole
> +# spanning the block range [512 + 115, 768[. Our new file size is the file
> +# offset mapped by (768 + 25)th block.
> +$XFS_IO_PROG -c "pwrite -S 0xbb $((768 * $BLOCK_SIZE)) $((25 * $BLOCK_SIZE))" \
> +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
>
>  # At this point, there are now (in memory) 4 back references to the prealloc
>  # extent.
>  #
> -# Two of them are for file offset 160K, related to file extent items
> -# matching the file offsets 160K and 540K respectively, with data offsets of
> -# 0 and 380K respectively, and with lengths of 380K and 80K respectively.
> +# Two of them are for file offset mapped by the 40th block, related to file
> +# 
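The block counts in the btrfs/095 patch above appear to be chosen so that a 4k block size reproduces the byte offsets of the original test. A small sketch to check that correspondence is given here; blocks_to_kib is a hypothetical helper name used only for illustration and is not part of fstests.

```shell
# Hypothetical helper: convert a block count ($1) at block size ($2, in bytes)
# into KiB, printed with a trailing "K" like the old test comments.
blocks_to_kib() {
    echo "$(( $1 * $2 / 1024 ))K"
}

# With 4k blocks, the new block-based values map back to the old ones:
blocks_to_kib 40 4096    # falloc start, 160K in the old test
blocks_to_kib 115 4096   # falloc length, 460K in the old test
blocks_to_kib 135 4096   # pwrite start, 540K in the old test
blocks_to_kib 512 4096   # clone destination, 2M (2048K) in the old test
```

On a 64k block size filesystem the same block counts simply scale, so the extent layout described in the comments keeps its shape.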

Re: [PATCH 5/8] Fix btrfs/097 to work on non-4k block sized filesystems

2015-12-10 Thread Filipe Manana
On Mon, Nov 30, 2015 at 10:17 AM, Chandan Rajendra
 wrote:
> This commit makes use of the new _filter_xfs_io_blocks_modified filtering
> function to print information in terms of file blocks rather than file
> offset.
>
> Signed-off-by: Chandan Rajendra 
Reviewed-by: Filipe Manana 

Thanks!

> ---
>  tests/btrfs/097 | 41 -
>  tests/btrfs/097.out | 23 +--
>  2 files changed, 41 insertions(+), 23 deletions(-)
>
> diff --git a/tests/btrfs/097 b/tests/btrfs/097
> index d9138ea..d1cfff1 100755
> --- a/tests/btrfs/097
> +++ b/tests/btrfs/097
> @@ -57,22 +57,29 @@ mkdir $send_files_dir
>  _scratch_mkfs >>$seqres.full 2>&1
>  _scratch_mount
>
> -# Create our test file with a single extent of 64K starting at file offset 128K.
> -$XFS_IO_PROG -f -c "pwrite -S 0xaa 128K 64K" $SCRATCH_MNT/foo | _filter_xfs_io
> +BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
> +
> +# Create our test file with a single extent of 16 blocks starting at a file
> +# offset mapped by 32nd block.
> +$XFS_IO_PROG -f -c "pwrite -S 0xaa $((32 * $BLOCK_SIZE)) $((16 * $BLOCK_SIZE))" \
> +$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
>
>  _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
>
>  # Now clone parts of the original extent into lower offsets of the file.
>  #
>  # The first clone operation adds a file extent item to file offset 0 that points
> -# to our initial extent with a data offset of 16K. The corresponding data back
> -# reference in the extent tree has an offset of 18446744073709535232, which is
> -# the result of file_offset - data_offset = 0 - 16K.
> -#
> -# The second clone operation adds a file extent item to file offset 16K that
> -# points to our initial extent with a data offset of 48K. The corresponding data
> -# back reference in the extent tree has an offset of 18446744073709518848, which
> -# is the result of file_offset - data_offset = 16K - 48K.
> +# to our initial extent with a data offset of 4 blocks. The corresponding data back
> +# reference in the extent tree has a large value for the 'offset' field, which is
> +# the result of file_offset - data_offset = 0 - (file offset of 4th block).  For
> +# example in case of 4k block size, it will be 0 - 16k = 18446744073709535232.
> +
> +# The second clone operation adds a file extent item to file offset mapped by
> +# 4th block that points to our initial extent with a data offset of 12
> +# blocks. The corresponding data back reference in the extent tree has a large
> +# value for the 'offset' field, which is the result of file_offset - data_offset
> +# = (file offset of 4th block) - (file offset of 12th block). For example in
> +# case of 4k block size, it will be 16K - 48K = 18446744073709518848.
>  #
>  # Those large back reference offsets (result of unsigned arithmetic underflow)
>  # confused the back reference walking code (used by an incremental send and
> @@ -83,10 +90,10 @@ _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
>  # "BTRFS error (device sdc): did not find backref in send_root. inode=257, \
>  #  offset=0, disk_byte=12845056 found extent=12845056"
>  #
> -$CLONER_PROG -s $(((128 + 16) * 1024)) -d 0 -l $((16 * 1024)) \
> -   $SCRATCH_MNT/foo $SCRATCH_MNT/foo
> -$CLONER_PROG -s $(((128 + 48) * 1024)) -d $((16 * 1024)) -l $((16 * 1024)) \
> +$CLONER_PROG -s $(((32 + 4) * $BLOCK_SIZE)) -d 0 -l $((4 * $BLOCK_SIZE)) \
> $SCRATCH_MNT/foo $SCRATCH_MNT/foo
> +$CLONER_PROG -s $(((32 + 12) * $BLOCK_SIZE)) -d $((4 * $BLOCK_SIZE)) \
> +-l $((4 * $BLOCK_SIZE)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo
>
>  _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
>
> @@ -94,8 +101,8 @@ _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap
>  _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
> -f $send_files_dir/2.snap
>
> -echo "File digest in the original filesystem:"
> -md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
> +echo "File contents in the original filesystem:"
> +od -t x1 $SCRATCH_MNT/mysnap2/foo | _filter_od
>
>  # Now recreate the filesystem by receiving both send streams and verify we get
>  # the same file contents that the original filesystem had.
> @@ -106,8 +113,8 @@ _scratch_mount
>  _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/1.snap
>  _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/2.snap
>
> -echo "File digest in the new filesystem:"
> -md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
> +echo "File contents in the new filesystem:"
> +od -t x1 $SCRATCH_MNT/mysnap2/foo | _filter_od
>
>  status=0
>  exit
> diff --git a/tests/btrfs/097.out b/tests/btrfs/097.out
> index 5e87eb2..aa9e549 100644
> --- a/tests/btrfs/097.out
> +++ b/tests/btrfs/097.out
> @@ -1,7 +1,18 @@
>  QA output 
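The large back reference offsets quoted in the btrfs/097 comments above are what file_offset - data_offset looks like when the negative result is read as an unsigned 64-bit value. The sketch below reproduces the two 4k examples. It assumes a shell whose printf accepts a negative argument for %u and wraps it modulo 2^64 (bash is believed to behave this way); backref_offset is a hypothetical name, not a btrfs or fstests function.

```shell
# Hypothetical helper: show file_offset ($1) minus data_offset ($2) as an
# unsigned 64-bit number. Assumes printf '%u' wraps negative input modulo 2^64.
backref_offset() {
    printf '%u\n' "$(( $1 - $2 ))"
}

# 4k block size, first clone: file offset 0, data offset 16K.
backref_offset 0 $((16 * 1024))
# 4k block size, second clone: file offset 16K, data offset 48K.
backref_offset $((16 * 1024)) $((48 * 1024))
```

With a different block size the byte values change, which is presumably why the rewritten comments describe the field as "a large value" and give the exact numbers only for the 4k case.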

Re: [PATCH 7/8] Fix btrfs/103 to work on non-4k block sized filesystems

2015-12-10 Thread Filipe Manana
On Mon, Nov 30, 2015 at 10:17 AM, Chandan Rajendra
 wrote:
> This commit makes use of the new _filter_xfs_io_blocks_modified filtering
> function to print information in terms of file blocks rather than file
> offset.
>
> Signed-off-by: Chandan Rajendra 
Reviewed-by: Filipe Manana 

Thanks!

> ---
>  tests/btrfs/103 |  44 +++---
>  tests/btrfs/103.out | 132 ++--
>  2 files changed, 122 insertions(+), 54 deletions(-)
>
> diff --git a/tests/btrfs/103 b/tests/btrfs/103
> index 3020c86..9d11d0f 100755
> --- a/tests/btrfs/103
> +++ b/tests/btrfs/103
> @@ -56,31 +56,37 @@ test_clone_and_read_compressed_extent()
> _scratch_mkfs >>$seqres.full 2>&1
> _scratch_mount $mount_opts
>
> +   BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
> +
> # Create a test file with a single extent that is compressed (the
> # data we write into it is highly compressible no matter which
> # compression algorithm is used, zlib or lzo).
> -   $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 4K"\
> -   -c "pwrite -S 0xbb 4K 8K"\
> -   -c "pwrite -S 0xcc 12K 4K"   \
> -   $SCRATCH_MNT/foo | _filter_xfs_io
> +   $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K $((1 * $BLOCK_SIZE))" \
> +   -c "pwrite -S 0xbb $((1 * $BLOCK_SIZE)) $((2 * $BLOCK_SIZE))" \
> +   -c "pwrite -S 0xcc $((3 * $BLOCK_SIZE)) $((1 * $BLOCK_SIZE))" \
> +   $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
> +
>
> # Now clone our extent into an adjacent offset.
> -   $CLONER_PROG -s $((4 * 1024)) -d $((16 * 1024)) -l $((8 * 1024)) \
> -   $SCRATCH_MNT/foo $SCRATCH_MNT/foo
> +   $CLONER_PROG -s $((1 * $BLOCK_SIZE)) -d $((4 * $BLOCK_SIZE)) \
> +-l $((2 * $BLOCK_SIZE)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo
>
> # Same as before but for this file we clone the extent into a lower
> # file offset.
> -   $XFS_IO_PROG -f -c "pwrite -S 0xaa 8K 4K" \
> -   -c "pwrite -S 0xbb 12K 8K"\
> -   -c "pwrite -S 0xcc 20K 4K"\
> -   $SCRATCH_MNT/bar | _filter_xfs_io
> +   $XFS_IO_PROG -f \
> +   -c "pwrite -S 0xaa $((2 * $BLOCK_SIZE)) $((1 * $BLOCK_SIZE))" \
> +   -c "pwrite -S 0xbb $((3 * $BLOCK_SIZE)) $((2 * $BLOCK_SIZE))" \
> +   -c "pwrite -S 0xcc $((5 * $BLOCK_SIZE)) $((1 * $BLOCK_SIZE))" \
> +   $SCRATCH_MNT/bar | _filter_xfs_io_blocks_modified
>
> -   $CLONER_PROG -s $((12 * 1024)) -d 0 -l $((8 * 1024)) \
> +   $CLONER_PROG -s $((3 * $BLOCK_SIZE)) -d 0 -l $((2 * $BLOCK_SIZE)) \
> $SCRATCH_MNT/bar $SCRATCH_MNT/bar
>
> -   echo "File digests before unmounting filesystem:"
> -   md5sum $SCRATCH_MNT/foo | _filter_scratch
> -   md5sum $SCRATCH_MNT/bar | _filter_scratch
> +   echo "File contents before unmounting filesystem:"
> +   echo "foo:"
> +   od -t x1 $SCRATCH_MNT/foo | _filter_od
> +   echo "bar:"
> +   od -t x1 $SCRATCH_MNT/bar | _filter_od
>
> # Evicting the inode or clearing the page cache before reading again
> # the file would also trigger the bug - reads were returning all bytes
> @@ -91,10 +97,12 @@ test_clone_and_read_compressed_extent()
> # ranges that point to the same compressed extent.
> _scratch_remount
>
> -   echo "File digests after mounting filesystem again:"
> -   # Must match the same digests we got before.
> -   md5sum $SCRATCH_MNT/foo | _filter_scratch
> -   md5sum $SCRATCH_MNT/bar | _filter_scratch
> +   echo "File contents after mounting filesystem again:"
> +   # Must match the same contents we got before.
> +   echo "foo:"
> +   od -t x1 $SCRATCH_MNT/foo | _filter_od
> +   echo "bar:"
> +   od -t x1 $SCRATCH_MNT/bar | _filter_od
>  }
>
>  echo -e "\nTesting with zlib compression..."
> diff --git a/tests/btrfs/103.out b/tests/btrfs/103.out
> index f62de2f..8e31b5f 100644
> --- a/tests/btrfs/103.out
> +++ b/tests/btrfs/103.out
> @@ -1,41 +1,101 @@
>  QA output created by 103
>
>  Testing with zlib compression...
> -wrote 4096/4096 bytes at offset 0
> -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -wrote 8192/8192 bytes at offset 4096
> -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -wrote 4096/4096 bytes at offset 12288
> -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -wrote 4096/4096 bytes at offset 8192
> -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -wrote 8192/8192 bytes at offset 12288
> -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -wrote 4096/4096 bytes at offset 20480
> -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -File digests before unmounting 
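The switch from md5sum to od piped through _filter_od in the patch above relies on rewriting od's octal byte offsets as octal block numbers, so that the dump looks the same for any block size. A minimal sketch of that conversion for a single offset follows; od_offset_to_block is a hypothetical name, and the real filter is the awk code in common/filter, not this function.

```shell
# Hypothetical helper: convert an od offset ($1, octal digits as printed by od)
# into a block number for block size $2 (bytes), printed in octal again.
# The leading 0 forces shell arithmetic to read the offset as octal.
od_offset_to_block() {
    printf '%o\n' "$(( 0$1 / $2 ))"
}

# Byte offset 16384 is block 4 with 4k blocks ...
od_offset_to_block 0040000 4096
# ... and byte offset 262144 is also block 4 with 64k blocks.
od_offset_to_block 1000000 65536
```

Both calls print the same block number, which is the property the new golden output depends on.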

Re: !PageLocked BUG_ON hit in clear_page_dirty_for_io

2015-12-10 Thread Markus Trippelsdorf
On 2015.12.08 at 23:25 -0500, Dave Jones wrote:
> Not sure if I've already reported this one, but I've been seeing this
> a lot this last couple days.
> 
> kernel BUG at mm/page-writeback.c:2654!

Just hit the same issue trying to build ghc-7.10.3:

[55704.436096] [ cut here ]
[55704.436155] kernel BUG at mm/page-writeback.c:2654!
[55704.436213] invalid opcode:  [#1] SMP 
[55704.436261] CPU: 2 PID: 17177 Comm: ghc Not tainted 4.4.0-rc4-00060-g9a0f76fde9ad-dirty #69
[55704.436370] Hardware name: System manufacturer System Product Name/M4A78T-E, BIOS 3503 04/13/2011
[55704.436491] task: 88015c2c1d40 ti: 880199268000 task.ti: 880199268000
[55704.436585] RIP: 0010:[]  [] clear_page_dirty_for_io+0xdd/0x180
[55704.436710] RSP: 0018:88019926bcd0  EFLAGS: 00010246
[55704.436770] RAX: 4868 RBX: ea00029f2080 RCX: 
[55704.436860] RDX:  RSI: 0286 RDI: ea00029f2080
[55704.436949] RBP: 8801c7900e30 R08: a2bc9694357c R09: 
[55704.437037] R10:  R11: 88015c2c1da0 R12: 8801c7900e30
[55704.437131] R13: 88019926bda0 R14: 0007 R15: ea00029f2080
[55704.437222] FS:  7f79ba21e700() GS:88021fd0() knlGS:
[55704.437326] CS:  0010 DS:  ES:  CR0: 8005003b
[55704.437395] CR2: 7fffa6d2cff8 CR3: 7ee1b000 CR4: 06e0
[55704.437485] Stack:
[55704.437495]  88019926bd38 88019926be78 8801c7900e30 812f81ae
[55704.437595]  88019926bdd8 0040
[55704.437693]  8801c7900cf0 000e 000e
[55704.437792] Call Trace:
[55704.437812]  [] ? extent_write_cache_pages.isra.39.constprop.74+0x14e/0x320
[55704.437925]  [] ? extent_writepages+0x4b/0x120
[55704.437998]  [] ? __start_delalloc_inodes+0x3a0/0x3a0
[55704.438080]  [] ? do_writepages+0x25/0x80
[55704.438152]  [] ? do_signal+0x2c7/0x540
[55704.438228]  [] ? filemap_flush+0x65/0xa0
[55704.438303]  [] ? btrfs_release_file+0x2e/0x40
[55704.438378]  [] ? fput+0xcc/0x1c0
[55704.438438]  [] ? task_work_run+0x6c/0xa0
[55704.438510]  [] ? syscall_return_slowpath+0xcc/0xe0
[55704.438591]  [] ? int_ret_from_sys_call+0x25/0x8f
[55704.438672] Code: e1 81 74 19 49 8b 44 24 18 48 3b 05 be e2 d2 00 0f 84 8a 00 00 00 48 8b a8 c8 00 00 00 f0 0f ba 33 04 72 82 31 c0 e9 76 ff ff ff <0f> 0b 48 c7 c0 60 ce e1 81 e9 59 ff ff ff 48 89 df e8 4d 00 01
[55704.439055] RIP  [] clear_page_dirty_for_io+0xdd/0x180
[55704.439144]  RSP 
[55704.475600] ---[ end trace 09f06afe4a05a024 ]---

-- 
Markus


Re: [PATCH 3/3] reflink: more tests

2015-12-10 Thread Christoph Hellwig
The new 849 fails reliably on btrfs, which makes me wonder if either
the test is doing something wrong, or the btrfs whole file clone
behavior is broken, which wouldn't be very reassuring.  I didn't have
time to look into why it's failing yet.