Re: [RFC PATCH v3 0/2] Online data deduplication

2013-05-03 Thread Liu Bo
 You didn't use an INCOMPAT option for this so you need to deal with a user
 mounting the file system with an older kernel or even forgetting to use
 mount -o dedup.  Otherwise your dedup tree will become out of date and you
 could corrupt people's data.  So if you aren't going to use an INCOMPAT flag
 you need to at least use a COMPAT flag so we know the option has been used at
 all and then you need to have a mechanism to know if you need to invalidate
 the hash tree.
 
 Users are also going to make the mistake of thinking dedup will make their
 workload awesome, and when it doesn't they need a way to turn it off.  If you
 do an INCOMPAT option then you need to have a way to delete the hash tree and
 unset the INCOMPAT flag.  If you do the COMPAT route then you get this for
 free since the user just needs to stop using -o dedup, but you'll probably
 also want to provide a mechanism to delete the tree to free up space.  Thanks,
 
 Josef

I made a few mistakes on this.  Yes, I should also provide a way to disable
dedup, and I'm going to use INCOMPAT.

But forgetting to use mount -o dedup will not cause the dedup tree to go out of
date, because the dedup tree is loaded whenever it exists, whether or not
'mount -o dedup' is used.

Thanks for the nice reminder, Josef :)

thanks,
liubo
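
A minimal sketch of why an INCOMPAT bit protects against older kernels,
assuming the usual rule that a kernel refuses to mount a filesystem whose
superblock carries an incompat bit it does not know. The bit value and the
function name below are hypothetical and are not taken from the patch set.

```shell
#!/bin/sh
# Hypothetical bit for the dedup feature; not the value used by any real kernel.
FEATURE_INCOMPAT_DEDUP=$((1 << 20))

# can_mount SUPPORTED_MASK FS_INCOMPAT_FLAGS
# Succeeds only if the filesystem sets no incompat bit outside the kernel's mask.
can_mount() {
    [ $(( $2 & ~$1 )) -eq 0 ]
}

old_kernel=$(( (1 << 20) - 1 ))                         # knows bits 0..19 only
new_kernel=$(( $old_kernel | $FEATURE_INCOMPAT_DEDUP )) # also knows the dedup bit

can_mount "$old_kernel" "$FEATURE_INCOMPAT_DEDUP" || echo "old kernel: mount refused"
can_mount "$new_kernel" "$FEATURE_INCOMPAT_DEDUP" && echo "new kernel: mount allowed"
```

With a plain COMPAT bit the old kernel would still mount read-write, which is
why that route needs a separate way to detect a stale hash tree.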
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Creating recursive snapshots for all filesystems

2013-05-03 Thread Sander
Alexander Skwar wrote (ao):
 Where I'm hanging right now, is that I can't seem to figure out a
 bullet proof way to find all the subvolumes of the filesystems I
 might have.

 Is there an easier way to achieve what I want? I want to achieve:
 
 Creating recursive snapshots for all filesystems

Not sure if this helps, but I have subvolid=0, which contains all my
subvolumes, mounted under /.root/

/etc/fstab:
LABEL=panda   /               btrfs   subvol=rootvolume,space_cache,inode_cache,compress=lzo,ssd  0  0
LABEL=panda   /home           btrfs   subvol=home          0  0
LABEL=panda   /root           btrfs   subvol=root          0  0
LABEL=panda   /var            btrfs   subvol=var           0  0
LABEL=panda   /holding        btrfs   subvol=.holding      0  0
LABEL=panda   /.root          btrfs   subvolid=0           0  0
LABEL=panda   /.backupadmin   btrfs   subvol=backupadmin   0  0
/Varlib       /var/lib        none    bind                 0  0

panda:~# ls -l /.root/
total 0
drwxr-xr-x. 1 root root 580800 Jan 30 17:46 backupadmin
drwxr-xr-x. 1 root root     24 Mar 27  2012 home
drwx------. 1 root root    742 Mar 19 15:50 root
drwxr-xr-x. 1 root root    226 May 16  2012 rootvolume
drwxr-xr-x. 1 root root     96 Apr  3  2012 var

In my snapshots script:

  ...
  mmddhhmm=`date +%Y%m%d_%H.%M`
  ...
  for subvolume in `ls /.root/`
  do
    ...
    /sbin/btrfs subvolume snapshot ${filesystem}/${subvolume}/ \
      /.root/.snapshot_${mmddhhmm}_${hostname}_${subvolume}/ || result=2
    ...
  done
  ...

This creates timestamped snapshots for all subvolumes.
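
The same loop can be sketched without parsing the output of ls, assuming
subvolid=0 is mounted at a single directory as described above. The names
BTRFS, TOP and snapshot_all are hypothetical; BTRFS can be set to "echo" for a
dry run. Like the loop above, this only covers top-level subvolumes.

```shell
#!/bin/sh
# Hypothetical defaults; override BTRFS=echo to print the commands instead.
BTRFS=${BTRFS:-/sbin/btrfs}
TOP=${TOP:-/.root}

# Snapshot every directory directly under $TOP. The glob skips dot-names,
# so earlier .snapshot_* entries are not snapshotted again.
snapshot_all() {
    stamp=$(date +%Y%m%d_%H.%M)
    host=$(uname -n)
    result=0
    for path in "$TOP"/*/; do
        [ -d "$path" ] || continue      # glob matched nothing
        name=$(basename "$path")
        $BTRFS subvolume snapshot "$path" \
            "$TOP/.snapshot_${stamp}_${host}_${name}" || result=2
    done
    return $result
}
```

A dry run would look like: BTRFS=echo TOP=/.root snapshot_all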

Sander


open_ctree failure on upgrading 3.7 to 3.8 kernel

2013-05-03 Thread Tomasz Kusmierz

Hi,

Long story short:
I've got a six-disk btrfs raid10 array plus 2 other disks with ordinary
btrfs filesystems.

Everything was running happily under linux 3.5 and 3.7.
3.5 was a stock ubuntu kernel, 3.7 was a slightly less stock ubuntu kernel.
Now I've upgraded my box to 3.8 and none of the btrfs file systems mount
any more. I get open_ctree errors every time I try to mount them. When
I reboot the system and choose the old kernel from grub, everything runs
smoothly again. Was there any on-disk format change or compatibility change?





Some kernel.log output:

[ 13.517952] device fsid 9415cddb-e3b8-4977-804c-369553a7eda7 devid 4 transid 30 /dev/sdh1

[ 13.518535] btrfs: disk space caching is enabled
[ 13.518773] btrfs: failed to read the system array on sdh1
[ 13.523175] btrfs: open_ctree failed


Re: btrfs subv list - ERROR: Failed to lookup path for root 0 - No such file or directory

2013-05-03 Thread Alexander Skwar
Hi Russell

Russell Coker russell at coker.com.au writes:

 I asked a similar question about 10 days ago and got the below response which
 solved it for me.


Thanks a lot. This solved it for me as well.

Cheers,
Alexander



Script for creating/managing snapshots of all subvolumes of all filesystems

2013-05-03 Thread Alexander Skwar
Hi

FWIW, I've also written a script which creates and manages
(ie. deletes old) snapshots.

It figures out all the available filesystems and creates snaps
for all the available (sub)volumes.

It's also on https://copy.com/WI9AXqTH2nD4 and http://pastebin.com/YX8WKcsR 
to avoid line break issues and also with comments.

Regards,
Alexander

- cut here

#!/bin/sh

# The original argument check is missing from this copy; requiring exactly
# two arguments is an assumption based on the usage text.
if [ $# -ne 2 ]; then
    echo "Usage: $0 SNAPSHOT_TAG NUM_SNAPSHOTS
Create hourly, daily, weekly, and monthly snapshots of btrfs filesystems.

Based somewhat on http://article.gmane.org/gmane.comp.file-systems.btrfs/12609

Here's my crontab:
00,15,30,45  * * * *  $0 frequently  4
38           * * * *  $0 hourly     24
08          00 * * *  $0 daily       7
08          12 * * 0  $0 weekly      4"
    exit 1
fi

SNAPSHOT_TAG=$1
NUM_SNAPSHOTS=$2

snap_prefix=snapshot:$SNAPSHOT_TAG:
snap_date=`date +%Y-%m-%d--%H.%M.%S.%N`
script_name=`basename $0`

log_fac=local5
log_tag=$script_name

btrfs_progs_dev_path=/home/a/Copy/Computerkram/Programme/btrfs-progs.dev/bin
PATH=$btrfs_progs_dev_path:$PATH

btrfs fi show 2>/dev/null | awk '/ path / {print $NF}' | while read dev; do
    set -- `btrfs fi show 2>/dev/null | grep -B2 " path $dev" | \
        grep Label: | sed 's,.*: \(.*\)  uuid: \(.*\),\1 \2,'`
    label=$1
    uuid=$2
    logger -t $log_tag -p $log_fac.info -- \
        "Processing filesystem with label $label and uuid $uuid on $dev"
    safe_dev=`echo $dev | tr / .`
    # GNU mktemp needs at least three X's in the template.
    tmp_mount_dir=`mktemp -d /tmp/.btrfs.mount.$uuid.$safe_dev.XXXXXX`
    if ! mount -t btrfs $dev $tmp_mount_dir; then
        logger -t $log_tag -p $log_fac.err -- \
            "Error! Could not do: mount -t btrfs $dev $tmp_mount_dir"
        exit 1
    fi

    # Snapshot the root volume of this filesystem.
    _snap_name="$tmp_mount_dir/,$snap_prefix$snap_date"
    if ! btrfs subv snaps -r $tmp_mount_dir $_snap_name > /dev/null; then
        logger -t $log_tag -p $log_fac.err -- \
            "Error! Could not do: btrfs subv snaps -r $tmp_mount_dir $_snap_name"
        exit 1
    else
        logger -t $log_tag -p $log_fac.info -- \
            "Created snapshot ,$snap_prefix$snap_date for root volume of fs with uuid $uuid"
    fi
    # Remove all but the last $NUM_SNAPSHOTS root snapshots: entries listed
    # twice (once via tail) are dropped by "uniq -u", the rest get deleted.
    (btrfs subv list -r $tmp_mount_dir | grep " path ,$snap_prefix" \
        | tail -$NUM_SNAPSHOTS
     btrfs subv list -r $tmp_mount_dir | grep " path ,$snap_prefix") \
        | sort | uniq -u \
        | while read __id IdDel __gen GenDel __top __level ToplevelDel __path PathDel; do
        if ! btrfs subv del $tmp_mount_dir/$PathDel > /dev/null; then
            logger -t $log_tag -p $log_fac.err -- \
                "Error! Could not do: btrfs subv del $tmp_mount_dir/$PathDel"
            exit 1
        else
            logger -t $log_tag -p $log_fac.info -- \
                "Removed snapshot $PathDel"
        fi
    done

    # Subvolumes that appear only in the full list (not in the read-only
    # list) are the writable ones; snapshot each of them.
    (btrfs subv list -ar $tmp_mount_dir; btrfs subv list -a $tmp_mount_dir) \
        | sort | uniq -u \
        | while read _id Id _gen Gen _top _level Toplevel _path Path; do
        _snap_name="$tmp_mount_dir/$Path,$snap_prefix$snap_date"
        if ! btrfs subv snaps -r $tmp_mount_dir/$Path $_snap_name > /dev/null; then
            logger -t $log_tag -p $log_fac.err -- \
                "Error! Could not do: btrfs subv snaps -r $tmp_mount_dir/$Path $_snap_name"
            exit 1
        else
            logger -t $log_tag -p $log_fac.info -- \
                "Created snapshot $Path,$snap_prefix$snap_date for subvolume $Path"
        fi
        (btrfs subv list -r $tmp_mount_dir \
            | grep " path $Path,$snap_prefix" | tail -$NUM_SNAPSHOTS
         btrfs subv list -r $tmp_mount_dir | grep " path $Path,$snap_prefix") \
            | sort | uniq -u \
            | while read __id IdDel __gen GenDel __top __level ToplevelDel __path PathDel; do
            if ! btrfs subv del $tmp_mount_dir/$PathDel > /dev/null; then
                logger -t $log_tag -p $log_fac.err -- \
                    "Error! Could not do: btrfs subv del $tmp_mount_dir/$PathDel"
                exit 1
            else
                logger -t $log_tag -p $log_fac.info -- \
                    "Removed snapshot $PathDel"
            fi
        done
    done
    if ! umount $tmp_mount_dir; then
        logger -t $log_tag -p $log_fac.err -- \
            "Error! Could not do: umount $tmp_mount_dir"
    fi
    if ! rmdir $tmp_mount_dir; then
        logger -t $log_tag -p $log_fac.err -- \
            "Error! Could not do: rmdir $tmp_mount_dir"
    fi
done

exit 0
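
The retention step in the script relies on a small trick: print the newest N
entries once more, then keep only the lines that occur exactly once. A
self-contained sketch of that idea follows; the function name
expired_snapshots is hypothetical, and it assumes one unique snapshot name per
input line, oldest first.

```shell
#!/bin/sh
# expired_snapshots KEEP
# Reads snapshot names on stdin and prints every name except the last KEEP.
# Names in the tail are emitted twice, so "uniq -u" filters them out.
expired_snapshots() {
    keep=$1
    all=$(cat)
    {
        printf '%s\n' "$all" | tail -n "$keep"
        printf '%s\n' "$all"
    } | sort | uniq -u
}
```

For example, feeding five names with KEEP=2 prints the three oldest. Note that
the result comes out in sorted order, not in input order, and that the trick
only works if the listing order really is oldest to newest.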




Re: Creating recursive snapshots for all filesystems

2013-05-03 Thread Alexander Skwar
Hi

Sander sander at humilis.net writes:

 
 Alexander Skwar wrote (ao):
  Where I'm hanging right now, is that I can't seem to figure out a
  bullet proof way to find all the subvolumes of the filesystems I
  might have.
 
  Is there an easier way to achieve what I want? I want to achieve:
  
  Creating recursive snapshots for all filesystems
 
 Not sure if this helps, but I have subvolid=0, which contains all my
 subvolumes, mounted under /.root/

Hm, not quite what I'm after and not nearly as easy as ZFS...

Problem with your approach: the admin has to maintain this.
I was looking for something that maintains itself, so to say.
And your approach also wouldn't scale if there are sub-subvolumes.

ZFS really is so much easier (at least regarding that).

Thanks a lot, though. It's a worthwhile idea.

Regards,
Alexander




Re: [PATCH] xfstests 311: test fsync with dm flakey V3

2013-05-03 Thread Rich Johnston

Josef,

The patch does not compile on older kernels (i.e. SLES11 SP2).

fsync-tester.c: In function 'test_three':
fsync-tester.c:133: warning: implicit declaration of function 'syncfs'
/tmp/cciHR6Gb.o: In function `test_three':
/data/lwork/gulag1c/rjohnston/xfstests/src/fsync-tester.c:133: undefined reference to `syncfs'

collect2: ld returned 1 exit status
gmake[3]: *** [fsync-tester] Error 1
gmake[2]: *** [src] Error 2
make[1]: *** [default] Error 2
make: *** [default] Error 2


src/fsync-tester.c
133 syncfs(test_fd);
Typo ?  ^^
Did you mean fsync?

Regards,
--Rich



[PATCH] Btrfs: free up reserved space if we fail to insert extent entry

2013-05-03 Thread Josef Bacik
If we are inserting an extent entry for the first allocation of an extent and
the addition fails we need to clean up the reserved space otherwise we'll get
WARN_ON()'s on unmount because we have left over reserve space.  Thanks,

Signed-off-by: Josef Bacik jba...@fusionio.com
---
 fs/btrfs/extent-tree.c |   45 +++++++++++++++++++++++++++++----------------
 1 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 2305b5c..7049bbc 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6407,16 +6407,16 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 	size = sizeof(*extent_item) + btrfs_extent_inline_ref_size(type);
 
 	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
+	if (!path) {
+		ret = -ENOMEM;
+		goto out;
+	}
 
 	path->leave_spinning = 1;
 	ret = btrfs_insert_empty_item(trans, fs_info->extent_root, path,
 				      ins, size);
-	if (ret) {
-		btrfs_free_path(path);
-		return ret;
-	}
+	if (ret)
+		goto out;
 
 	leaf = path->nodes[0];
 	extent_item = btrfs_item_ptr(leaf, path->slots[0],
@@ -6444,14 +6444,21 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 
 	btrfs_mark_buffer_dirty(path->nodes[0]);
 	btrfs_free_path(path);
+	path = NULL;
 
 	ret = update_block_group(root, ins->objectid, ins->offset, 1);
-	if (ret) { /* -ENOENT, logic error */
+	if (ret) {
 		btrfs_err(fs_info, "update block group failed for %llu %llu",
 			(unsigned long long)ins->objectid,
 			(unsigned long long)ins->offset);
-		BUG();
+		goto out;
 	}
+
+	return ret;
+out:
+	btrfs_free_path(path);
+	btrfs_pin_extent(root, ins->objectid, ins->offset, 1);
+	btrfs_del_csums(trans, root, ins->objectid, ins->offset);
 	return ret;
 }
 
@@ -6476,16 +6483,16 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
 	size += sizeof(*block_info);
 
 	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
+	if (!path) {
+		ret = -ENOMEM;
+		goto out;
+	}
 
 	path->leave_spinning = 1;
 	ret = btrfs_insert_empty_item(trans, fs_info->extent_root, path,
 				      ins, size);
-	if (ret) {
-		btrfs_free_path(path);
-		return ret;
-	}
+	if (ret)
+		goto out;
 
 	leaf = path->nodes[0];
 	extent_item = btrfs_item_ptr(leaf, path->slots[0],
@@ -6517,14 +6524,20 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
 
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_free_path(path);
+	path = NULL;
 
 	ret = update_block_group(root, ins->objectid, root->leafsize, 1);
-	if (ret) { /* -ENOENT, logic error */
+	if (ret) {
 		btrfs_err(fs_info, "update block group failed for %llu %llu",
 			(unsigned long long)ins->objectid,
 			(unsigned long long)ins->offset);
-		BUG();
+		goto out;
 	}
+
+	return ret;
+out:
+	btrfs_free_path(path);
+	btrfs_pin_extent(root, ins->objectid, root->leafsize, 1);
 	return ret;
 }
 
-- 
1.7.7.6



[PATCH] Btrfs: increase the max global reserve size to 1gig

2013-05-03 Thread Josef Bacik
Apparently 512mb was too small; with a fs_mark command we could get so much
delayed work built up that we'd never trip the "let's commit the transaction"
logic until we'd gotten too many delayed refs built up.  Increasing this to 1
gig makes us much safer and we no longer abort with Dave's fs_mark tester.
Thanks,

Signed-off-by: Josef Bacik jba...@fusionio.com
---
 fs/btrfs/extent-tree.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7049bbc..f10ac46 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4516,7 +4516,7 @@ static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
 	spin_lock(&sinfo->lock);
 	spin_lock(&block_rsv->lock);
 
-	block_rsv->size = min_t(u64, num_bytes, 512 * 1024 * 1024);
+	block_rsv->size = min_t(u64, num_bytes, 1024 * 1024 * 1024);
 
 	num_bytes = sinfo->bytes_used + sinfo->bytes_pinned +
 		    sinfo->bytes_reserved + sinfo->bytes_readonly +
-- 
1.7.7.6
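
The effect of the one-line change can be checked with plain arithmetic. This
is only a sketch of the min_t() clamp in the hunk above; how num_bytes is
computed is outside the hunk and is treated as an input here, and the names
MAX_GLOBAL_RSV and clamp_global_reserve are hypothetical.

```shell
#!/bin/sh
# New cap from the patch: 1024 * 1024 * 1024 bytes (the old cap was half that).
MAX_GLOBAL_RSV=$((1024 * 1024 * 1024))

# clamp_global_reserve NUM_BYTES
# Prints min(NUM_BYTES, cap), mirroring min_t(u64, num_bytes, cap).
clamp_global_reserve() {
    if [ "$1" -lt "$MAX_GLOBAL_RSV" ]; then
        echo "$1"
    else
        echo "$MAX_GLOBAL_RSV"
    fi
}

clamp_global_reserve $((768 * 1024 * 1024))   # below the new cap, above the old one
```

A request of 768 MiB, which the old 512 MiB cap would have cut down, now
passes through unchanged; anything above 1 GiB is still clamped.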



Re: [PATCH] xfstests: btrfs/276 - stop all fsstress before exiting

2013-05-03 Thread Rich Johnston

On 04/26/2013 08:10 AM, Eric Sandeen wrote:

On Apr 26, 2013, at 3:35 AM, Jan Schmidt list.bt...@jan-o-sch.net wrote:


On Fri, April 26, 2013 at 07:29 (+0200), Eric Sandeen wrote:

Tests after 276 were failing because the background fsstress
hadn't quit prior to exit, devices couldn't be unmounted, etc.


I don't see how that would happen. Any further insight?


Yes, sorry for not including it.  The parent process was killed, but the 
fsstress processes just got reparented to init.

I tried for a while to use pkill to knock them off first but this seems simpler,
actually.

Eric



Jan, with Eric's explanation, may I put your Reviewed-by: on this patch?



Re: [PATCH] xfstests: btrfs/276 - stop all fsstress before exiting

2013-05-03 Thread Rich Johnston

Thanks for the patch Eric and the review Jan, this has been committed.

--Rich

commit 0b5677123b5d8c0a29b45f55c7b981aeeca9b2c8
Author: Eric Sandeen sand...@redhat.com
Date:   Fri Apr 26 05:29:21 2013 +

xfstests: btrfs/276 - stop all fsstress before exiting



Re: [PATCH] Btrfs: increase the max global reserve size to 1gig

2013-05-03 Thread Liu Bo
On Fri, May 03, 2013 at 08:56:54AM -0400, Josef Bacik wrote:
 Apparently 512mb was too small, with a fs_mark command we could get so much
 delayed work built up that we'd never trip the lets commit the transaction
 logic until we'd gotten too much delayed refs built up.  Increasing this to 1
 gig makes us much safer and we no longer abort with Dave's fs_mark tester.
 Thanks,

I remember making a similar commit last time, but users complained that
they could not boot their systems from a btrfs root partition for lack of
space, and Chris eventually had to revert that one...

thanks,
liubo

 
 Signed-off-by: Josef Bacik jba...@fusionio.com
 ---
  fs/btrfs/extent-tree.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 7049bbc..f10ac46 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -4516,7 +4516,7 @@ static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
  	spin_lock(&sinfo->lock);
  	spin_lock(&block_rsv->lock);
  
 -	block_rsv->size = min_t(u64, num_bytes, 512 * 1024 * 1024);
 +	block_rsv->size = min_t(u64, num_bytes, 1024 * 1024 * 1024);
  
  	num_bytes = sinfo->bytes_used + sinfo->bytes_pinned +
  		    sinfo->bytes_reserved + sinfo->bytes_readonly +
 -- 
 1.7.7.6
 


Re: [PATCH] xfstests 311: test fsync with dm flakey V3

2013-05-03 Thread Josef Bacik
On Fri, May 03, 2013 at 06:28:03AM -0600, Rich Johnston wrote:
 Josef,
 
 The patch does not compile on older kernels (i.e. SLES11 SP2).
 
 fsync-tester.c: In function 'test_three':
 fsync-tester.c:133: warning: implicit declaration of function 'syncfs'
 /tmp/cciHR6Gb.o: In function `test_three':
 /data/lwork/gulag1c/rjohnston/xfstests/src/fsync-tester.c:133: undefined reference to `syncfs'
 collect2: ld returned 1 exit status
 gmake[3]: *** [fsync-tester] Error 1
 gmake[2]: *** [src] Error 2
 make[1]: *** [default] Error 2
 make: *** [default] Error 2
 
 
 src/fsync-tester.c
 133 syncfs(test_fd);
 Typo ?  ^^
 Did you mean fsync?
 

Argh crap I should have noticed this in the manpage

syncfs() first appeared in Linux 2.6.39

You can just replace it with sync(), or do you want me to resend the patch with
that change?  Thanks,

Josef


Re: [PATCH] xfstests 311: test fsync with dm flakey V3

2013-05-03 Thread Rich Johnston

On 05/03/2013 12:30 PM, Josef Bacik wrote:

On Fri, May 03, 2013 at 06:28:03AM -0600, Rich Johnston wrote:

Josef,

The patch does not compile on older kernels (i.e. SLES11 SP2).

fsync-tester.c: In function 'test_three':
fsync-tester.c:133: warning: implicit declaration of function 'syncfs'
/tmp/cciHR6Gb.o: In function `test_three':
/data/lwork/gulag1c/rjohnston/xfstests/src/fsync-tester.c:133: undefined reference to `syncfs'
collect2: ld returned 1 exit status
gmake[3]: *** [fsync-tester] Error 1
gmake[2]: *** [src] Error 2
make[1]: *** [default] Error 2
make: *** [default] Error 2


src/fsync-tester.c
133 syncfs(test_fd);
Typo ?  ^^
Did you mean fsync?



Argh crap I should have noticed this in the manpage

syncfs() first appeared in Linux 2.6.39

You can just replace it with sync(), or do you want me to resend the patch with
that change?  Thanks,

Josef


No need to repost I will change it to sync() at commit time ;-)

--Rich


Re: [PATCH] xfstests 311: test fsync with dm flakey V3

2013-05-03 Thread Rich Johnston
Thanks for another patch Josef, it has been committed with the change 
discussed.


--Rich

commit 2ca254dfddbbab8def35472b6ca39140400aff76
Author: Josef Bacik jba...@fusionio.com
Date:   Fri Apr 26 19:13:59 2013 +

xfstests 311: test fsync with dm flakey V3




Re: [PATCH] xfstests 311: test fsync with dm flakey V3

2013-05-03 Thread Josef Bacik
On Fri, May 03, 2013 at 12:21:59PM -0600, Rich Johnston wrote:
 Thanks for another patch Josef, it has been committed with the change 
 discussed.
 

Err I forgot to point out I already have a sync variable in there so it fails
to compile, we'll need to change the var to do_sync or something.  Want me to
send a patch along?  Thanks,

Josef


Re: [PATCH] xfstests 311: test fsync with dm flakey V3

2013-05-03 Thread Rich Johnston

On 05/03/2013 02:05 PM, Josef Bacik wrote:

On Fri, May 03, 2013 at 12:21:59PM -0600, Rich Johnston wrote:

Thanks for another patch Josef, it has been committed with the change
discussed.



Err I forgot to point out I already have a sync variable in there so it fails
to compile, we'll need to change the var to do_sync or something.  Want me to
send a patch along?  Thanks,

Josef


Sorry this was my fault, I have reverted


commit 7f622f44b651aec13b99ef62c2942388a6fbee5d
Author: Rich Johnston rjohns...@sgi.com
Date:   Fri May 3 14:07:59 2013 -0500

Revert xfstests 311: test fsync with dm flakey V3

and committed it again.

commit dd3b5268312e0518ae695e8ee2a618f13805c425
Author: Josef Bacik jba...@fusionio.com
Date:   Fri Apr 26 19:13:59 2013 +

xfstests 311: test fsync with dm flakey V4



[PATCH] xfstests: unmount scratch mnt in test 307

2013-05-03 Thread Josef Bacik
So if you have a mount command that doesn't use /etc/mtab then it will spit out
a different device for the mounted device.  So say we have

SCRATCH_DEV_POOL="/dev/sda /dev/sdb /dev/sdc"

we will turn this into

SCRATCH_DEV=/dev/sda
SCRATCH_DEV_POOL="/dev/sdb /dev/sdc"

and then when you mkfs this you do _scratch_mkfs $SCRATCH_DEV_POOL which turns
into this

mkfs.btrfs /dev/sdb /dev/sdc /dev/sda

because we do

mkfs $* $SCRATCH_DEV

Then btrfs will always show the lowest devid in /proc/mounts to maintain
consistency, so even though we do mount /dev/sda $SCRATCH_MNT, you will see
/dev/sdb as the mounted device in /proc/mounts.  So then say the next test wants
to just use $SCRATCH_DEV.  It will do _require_scratchdev, which checks whether
$SCRATCH_DEV is mounted.  It will look like it is not, because /proc/mounts
shows /dev/sdb instead of /dev/sda, so it won't umount $SCRATCH_MNT, and then
that test will fail because we can't mkfs the device while it is busy.  I
reproduced this on a box that doesn't use /etc/mtab by doing

./check btrfs/307 generic/015

and 015 would fail.  With this patch it passes now.  Thanks,

Signed-off-by: Josef Bacik jba...@fusionio.com
---
 tests/btrfs/307 |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tests/btrfs/307 b/tests/btrfs/307
index 87314c6..15157b3 100644
--- a/tests/btrfs/307
+++ b/tests/btrfs/307
@@ -35,6 +35,7 @@ _cleanup()
 {
     cd /
     rm -f $tmp.*
+    umount $SCRATCH_MNT
 }
 
 # get standard environment, filters and checks
-- 
1.7.7.6
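
The failure mode described in the commit message can be reproduced on paper
with a small sketch. Everything here is a hypothetical stand-in for the
xfstests helpers: scratch_mkfs_cmdline only echoes the command it would run,
and dev_is_mounted reads a file in /proc/mounts format that is passed in as
its second argument.

```shell
#!/bin/sh
# Stand-ins for the variables after the pool has been split up.
SCRATCH_DEV=/dev/sda
SCRATCH_DEV_POOL="/dev/sdb /dev/sdc"

# Mirrors "mkfs $* $SCRATCH_DEV": pool devices first, SCRATCH_DEV last,
# so SCRATCH_DEV does not end up with the lowest devid.
scratch_mkfs_cmdline() {
    echo mkfs.btrfs "$@" "$SCRATCH_DEV"
}

# dev_is_mounted DEVICE MOUNTS_FILE
# Succeeds if DEVICE appears in the first column of MOUNTS_FILE.
dev_is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "$2"
}

scratch_mkfs_cmdline $SCRATCH_DEV_POOL
```

If the mounts table lists /dev/sdb for the scratch mount point, a check such
as dev_is_mounted "$SCRATCH_DEV" /proc/mounts fails even though the
filesystem containing /dev/sda is mounted, which is the case the explicit
umount in _cleanup works around.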



Re: [PATCH] xfstests: unmount scratch mnt in test 307

2013-05-03 Thread Eric Sandeen
On 5/3/13 3:11 PM, Josef Bacik wrote:
 So if you have a mount command that doesn't use /etc/mtab then it will spit 
 out
 a different device for the mounted device.  So say we have
 
 SCRATCH_DEV_POOL=/dev/sda /dev/sdb /dev/sdc
 
 we will turn this into
 
 SCRATCH_DEV=/dev/sda
 SCRATCH_DEV_POOL=/dev/sdb /dev/sdc
 
 and then when you mkfs this you do _scratch_mkfs $SCRATCH_DEV_POOL which turns
 into this
 
 mkfs.btrfs /dev/sdb /dev/sdc /dev/sda
 
 because we do
 
 mkfs $* $SCRATCH_DEV
 
 Then btrfs will always show the lowest devid in /proc/mounts to maintain
 consistency, so even though we do mount /dev/sda $SCRATCH_MNT, you will see
 /dev/sdb as the mounted device in /proc/mounts.  So then say the next test 
 wants
 to just use $SCRATCH_DEV, it will do _require_scratchdev which will check to 
 see
 if $SCRATCH_DEV is mounted, which it will look like it is not because
 /proc/mounts shows /dev/sdb instead of /dev/sda, and so it won't umount
 $SCRATCH_MNT, and then that test will fail because we can't mkfs the device
 because it is busy.  I reproduced this on a box that doesn't use /etc/mtab by
 doing
 
 ./check btrfs/307 generic/015
 
 and 015 would fail.  With this patch it passes now.  Thanks,
 
 Signed-off-by: Josef Bacik jba...@fusionio.com
 ---
  tests/btrfs/307 |1 +
  1 files changed, 1 insertions(+), 0 deletions(-)
 
 diff --git a/tests/btrfs/307 b/tests/btrfs/307
 index 87314c6..15157b3 100644
 --- a/tests/btrfs/307
 +++ b/tests/btrfs/307
 @@ -35,6 +35,7 @@ _cleanup()
  {
  cd /
  rm -f $tmp.*
 +umount $SCRATCH_MNT
  }
  
  # get standard environment, filters and checks
 

This seems fine for this particular test.

Is it really a hard requirement that each test unmount SCRATCH_[DEV|MNT] if it
used it?  If so, fine... the README does indicate this.

But I wonder if we can make it a little more foolproof by updating
_require_scratch to handle this situation more gracefully?
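
One way such a check could be made more robust, sketched under the assumption
that looking up the mount point is acceptable: match on the second column of
the mounts table instead of the device name. This is not the real
_require_scratch; mnt_is_mounted is a hypothetical helper, and it ignores the
octal escaping that /proc/mounts applies to unusual characters in paths.

```shell
#!/bin/sh
# mnt_is_mounted MOUNT_POINT [MOUNTS_FILE]
# Succeeds if MOUNT_POINT appears in column 2 of MOUNTS_FILE, whatever
# device name the kernel reports for it. MOUNTS_FILE defaults to /proc/mounts.
mnt_is_mounted() {
    awk -v mnt="$1" '$2 == mnt { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}
```

A caller could then do something like: mnt_is_mounted "$SCRATCH_MNT" && umount "$SCRATCH_MNT"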

-Eric


Re: [PATCH] btrfs: use only memmove_extent_buffer and simplify the helpers

2013-05-03 Thread Josef Bacik
On Mon, Apr 29, 2013 at 07:38:01AM -0600, David Sterba wrote:
 After commit a65917156e34594 (Btrfs: stop using highmem for
 extent_buffers) we don't need to call kmap_atomic anymore and can
 reduce the move_pages helper to a simple memmove.
 
 There's only one caller of memcpy_extent_buffer, we can use the
 memmove_ variant here.
 

This makes -l 64k blow the hell up, just try generic/001.  I'm kicking this
patch out.  Thanks,

Josef


Re: [PATCH] btrfs: use only memmove_extent_buffer and simplify the helpers

2013-05-03 Thread Chris Mason
Quoting Josef Bacik (2013-05-03 16:33:44)
 On Mon, Apr 29, 2013 at 07:38:01AM -0600, David Sterba wrote:
  After commit a65917156e34594 (Btrfs: stop using highmem for
  extent_buffers) we don't need to call kmap_atomic anymore and can
  reduce the move_pages helper to a simple memmove.
  
  There's only one caller of memcpy_extent_buffer, we can use the
  memmove_ variant here.
  
 
 This makes -l 64k blow the hell up, just try generic/001.  I'm kicking this
 patch out.  Thanks,

Sorry Dave, I only now remember having this same problem the last time I
tried to get rid of memcpy.

-chris



grub/grub2 boot into btrfs raid root and with no initrd

2013-05-03 Thread Martin
I've made a few attempts to boot into a root filesystem created using:

mkfs.btrfs -d raid1 -m raid1 -L btrfs_root_3 /dev/sda3 /dev/sdb3

Both grub and grub2 pick up a kernel image fine from an ext4 /boot on
/dev/sda1 for example, but then fail to find or assemble the btrfs root.

Setting up an initrd and grub operates fine for the btrfs raid.


What is the special magic to do this without the need for an initrd?

Is the comment/patch below from last year languishing unknown? Or is
there some problem with that kernel approach?


Thanks,
Martin


See:

http://forums.gentoo.org/viewtopic-t-923554-start-0.html


Below is my patch, which is working fine for me with 3.8.2.
Code:

$ cat /etc/portage/patches/sys-kernel/gentoo-sources/earlydevtmpfs.patch
--- init/do_mounts.c.orig   2013-03-24 20:49:53.446971127 +0100
+++ init/do_mounts.c   2013-03-24 20:51:46.408237541 +0100
@@ -529,6 +529,7 @@
create_dev("/dev/root", ROOT_DEV);
if (saved_root_name[0]) {
   create_dev(saved_root_name, ROOT_DEV);
+  devtmpfs_mount("dev");
   mount_block_root(saved_root_name, root_mountflags);
} else {
   create_dev("/dev/root", ROOT_DEV);




Re: grub/grub2 boot into btrfs raid root and with no initrd

2013-05-03 Thread Harald Glatt
On Fri, May 3, 2013 at 10:42 PM, Martin m_bt...@ml1.co.uk wrote:
 I've made a few attempts to boot into a root filesystem created using:

 mkfs.btrfs -d raid1 -m raid1 -L btrfs_root_3 /dev/sda3 /dev/sdb3

 Both grub and grub2 pick up a kernel image fine from an ext4 /boot on
 /dev/sda1 for example, but then fail to find or assemble the btrfs root.

 Setting up an initrd and grub operates fine for the btrfs raid.


 What is the special magic to do this without the need for an initrd?

 Is the comment/patch below from last year languishing unknown? Or is
 there some problem with that kernel approach?


 Thanks,
 Martin


 See:

 http://forums.gentoo.org/viewtopic-t-923554-start-0.html


 Below is my patch, which is working fine for me with 3.8.2.
 Code:

 $ cat /etc/portage/patches/sys-kernel/gentoo-sources/earlydevtmpfs.patch
 --- init/do_mounts.c.orig   2013-03-24 20:49:53.446971127 +0100
 +++ init/do_mounts.c   2013-03-24 20:51:46.408237541 +0100
 @@ -529,6 +529,7 @@
 create_dev("/dev/root", ROOT_DEV);
 if (saved_root_name[0]) {
    create_dev(saved_root_name, ROOT_DEV);
 +  devtmpfs_mount("dev");
    mount_block_root(saved_root_name, root_mountflags);
 } else {
    create_dev("/dev/root", ROOT_DEV);



The initrd has to run a btrfs device scan ('btrfs device scan') so that
btrfs can find the other devices that have btrfs on them. Alternatively
you can list all involved devices in the fstab and on the kernel command
line (via rootflags=) with device=/dev/name
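
As a sketch of what that looks like for the two-device raid1 from the
question, the helper below builds the rootflags= parameter from a list of
member devices. The function name btrfs_rootflags is hypothetical, and whether
the devices are already visible to the kernel at that point of the boot is an
assumption this sketch does not check.

```shell
#!/bin/sh
# btrfs_rootflags DEVICE...
# Prints a rootflags= kernel parameter with one device= option per member.
btrfs_rootflags() {
    flags=
    for dev in "$@"; do
        flags="${flags:+$flags,}device=$dev"
    done
    echo "rootflags=$flags"
}

btrfs_rootflags /dev/sda3 /dev/sdb3
```

The printed string would be appended to the kernel line in the grub
configuration next to root=/dev/sda3; the same device= options can go into
the options column of the fstab entry for /.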


[PATCH] Btrfs-progs: init free space ctl with proper unit

2013-05-03 Thread Josef Bacik
btrfsck was blowing up when checking the free space cache when we ran xfstests
with -l 64k.  That is because I was init'ing the free space ctl to whatever the
leafsize was, which isn't right for data block groups.  With this patch btrfsck
no longer complains.  This also fixes a tiny little typo in free-space-cache.c I
noticed while figuring this problem out.  Thanks,

Signed-off-by: Josef Bacik jba...@fusionio.com
---
 cmds-check.c   |   11 +--
 free-space-cache.c |2 --
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 030ab77..02bfedd 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -3001,8 +3001,15 @@ static int check_space_cache(struct btrfs_root *root)
 
 		start = cache->key.objectid + cache->key.offset;
 		if (!cache->free_space_ctl) {
-			if (btrfs_init_free_space_ctl(cache,
-						      root->leafsize)) {
+			int sectorsize;
+
+			if (cache->flags & (BTRFS_BLOCK_GROUP_METADATA |
+			    BTRFS_BLOCK_GROUP_SYSTEM))
+				sectorsize = root->leafsize;
+			else
+				sectorsize = root->sectorsize;
+
+			if (btrfs_init_free_space_ctl(cache, sectorsize)) {
 				ret = -ENOMEM;
 				break;
 			}
diff --git a/free-space-cache.c b/free-space-cache.c
index 8a77a32..5fb8ece 100644
--- a/free-space-cache.c
+++ b/free-space-cache.c
@@ -808,8 +808,6 @@ int btrfs_add_free_space(struct btrfs_free_space_ctl *ctl, u64 offset,
 	try_merge_free_space(ctl, info);
 
 	ret = link_free_space(ctl, info);
-	if (ret)
-
 	if (ret) {
 		printk(KERN_CRIT "btrfs: unable to add free space :%d\n", ret);
 		BUG_ON(ret == -EEXIST);
-- 
1.7.7.6



Re: [PATCH] xfstests: unmount scratch mnt in test 307

2013-05-03 Thread Dave Chinner
On Fri, May 03, 2013 at 03:15:01PM -0500, Eric Sandeen wrote:
 On 5/3/13 3:11 PM, Josef Bacik wrote:
  So if you have a mount command that doesn't use /etc/mtab then it will spit
  out a different device for the mounted device.  So say we have
  
  SCRATCH_DEV_POOL=/dev/sda /dev/sdb /dev/sdc
  
  we will turn this into
  
  SCRATCH_DEV=/dev/sda
  SCRATCH_DEV_POOL=/dev/sdb /dev/sdc
  
  and then when you mkfs this you do _scratch_mkfs $SCRATCH_DEV_POOL which
  turns into this
  
  mkfs.btrfs /dev/sdb /dev/sdc /dev/sda
  
  because we do
  
  mkfs $* $SCRATCH_DEV
  
  Then btrfs will always show the lowest devid in /proc/mounts to maintain
  consistency, so even though we do mount /dev/sda $SCRATCH_MNT, you will see
  /dev/sdb as the mounted device in /proc/mounts.  So then say the next test
  wants to just use $SCRATCH_DEV, it will do _require_scratchdev which will
  check to see if $SCRATCH_DEV is mounted, which it will look like it is not
  because /proc/mounts shows /dev/sdb instead of /dev/sda, and so it won't
  umount $SCRATCH_MNT, and then that test will fail because we can't mkfs the
  device because it is busy.  I reproduced this on a box that doesn't use
  /etc/mtab by doing
  
  ./check btrfs/307 generic/015
  
  and 015 would fail.  With this patch it passes now.  Thanks,
  
  Signed-off-by: Josef Bacik jba...@fusionio.com
  ---
   tests/btrfs/307 |1 +
   1 files changed, 1 insertions(+), 0 deletions(-)
  
  diff --git a/tests/btrfs/307 b/tests/btrfs/307
  index 87314c6..15157b3 100644
  --- a/tests/btrfs/307
  +++ b/tests/btrfs/307
  @@ -35,6 +35,7 @@ _cleanup()
   {
   cd /
   rm -f $tmp.*
  +umount $SCRATCH_MNT
   }
   
   # get standard environment, filters and checks
  
 
 This seems fine for this particular test.
 
 Is it really a hard requirement that each test unmount SCRATCH_[DEV|MNT] if
 it used it?  If so, fine... the README does indicate this.
 
 But I wonder if we can make it a little more foolproof by updating
 _require_scratch to handle this situation more gracefully?

It already tries to unmount $SCRATCH_DEV, and will throw an error
if it's not mounted on $SCRATCH_MNT. I guess the opposite checks are
necessary in this case, i.e. check that SCRATCH_MNT is not mounted,
and throw an error if it's not SCRATCH_DEV that is mounted
there...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
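
The reversed check described above could look roughly like the sketch
below. It is an illustration only: check_scratch_mnt is a hypothetical
helper, not an existing xfstests function, and it takes the mount table
as a file argument (normally /proc/mounts) so that it can be exercised
without mounting anything.

```shell
#!/bin/sh
# Sketch only: check_scratch_mnt is a hypothetical helper, not an
# existing xfstests function. It looks at what is mounted on the scratch
# mount point instead of searching for the scratch device.
#
# usage: check_scratch_mnt MOUNTS_FILE SCRATCH_DEV SCRATCH_MNT
check_scratch_mnt() {
    mounts_file=$1
    scratch_dev=$2
    scratch_mnt=$3

    # Device of the last (topmost) mount on the mount point, if any.
    mounted_dev=$(awk -v mnt="$scratch_mnt" \
        '$2 == mnt { dev = $1 } END { if (dev != "") print dev }' \
        "$mounts_file")

    if [ -z "$mounted_dev" ]; then
        echo "unmounted"
    elif [ "$mounted_dev" = "$scratch_dev" ]; then
        echo "ok"
    else
        # Something other than SCRATCH_DEV is mounted there.
        echo "mismatch:$mounted_dev"
        return 1
    fi
}
```

Called as check_scratch_mnt /proc/mounts "$SCRATCH_DEV" "$SCRATCH_MNT",
this would report a mismatch in the multi-device case from Josef's
report, where /proc/mounts lists /dev/sdb rather than /dev/sda. Whether
the caller should then unmount or fail the test is left open here.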


[3.9] parallel fsmark perf is real bad on sparse devices

2013-05-03 Thread Dave Chinner
Hi folks,

It's that time again - I ran fsmark on btrfs and found performance
was awful.

tl;dr: memory pressure causes random writeback of metadata (bad),
fragmenting the underlying sparse storage. This causes a downward
spiral as btrfs cycles through good IO patterns that get
fragmented at the device level due to the bad IO patterns
fragmenting the underlying sparse device.

FYI, the storage hardware is a DM RAID0 stripe across 4 SSDs sitting
behind 512MB of BBWC with an XFS filesystem on it. The only file on
the filesystem is the sparse 100TB file used for the device, and the
VM is using virtio,cache=none to access the filesystem image.

i.e. the storage I'm working on this time is a thinly provisioned
100TB device fed to an 8p, 4GB RAM VM, and this script is then run:

$ cat fsmark-50-test-btrfs.sh 
#!/bin/bash

sudo umount /mnt/scratch > /dev/null 2>&1
sudo mkfs.btrfs /dev/vdc
sudo mount /dev/vdc /mnt/scratch
sudo chmod 777 /mnt/scratch
cd /home/dave/src/fs_mark-3.3/
time ./fs_mark  -D  1  -S0  -n  10  -s  0  -L  63 \
-d  /mnt/scratch/0  -d  /mnt/scratch/1 \
-d  /mnt/scratch/2  -d  /mnt/scratch/3 \
-d  /mnt/scratch/4  -d  /mnt/scratch/5 \
-d  /mnt/scratch/6  -d  /mnt/scratch/7 \
| tee >(stats --trim-outliers | tail -1 1>&2)
sync
$
$ ./fsmark-50-test-btrfs.sh

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/vdc
nodesize 4096 leafsize 4096 sectorsize 4096 size 100.00TB
Btrfs Btrfs v0.19

#  ./fs_mark  -D  1  -S0  -n  10  -s  0  -L  63  -d  /mnt/scratch/0  -d 
 /mnt/scratch/1  -d  /mnt/scratch/2  -d  /mnt/scratch/3  -d  /mnt/scratch/4  -d 
 /mnt/scratch/5  -d  /mnt/scratch/6  -d  /mnt/scratch/7
#   Version 3.3, 8 thread(s) starting at Fri May  3 17:08:46 2013
#   Sync method: NO SYNC: Test does not issue sync() or fsync() calls.
#   Directories:  Time based hash between directories across 1 
subdirectories with 180 seconds per subdirectory.
#   File names: 40 bytes long, (16 initial bytes of time stamp with 24 
random bytes at end of name)
#   Files info: size 0 bytes, written with an IO size of 16384 bytes per 
write
#   App overhead is time in microseconds spent in the test not doing file 
writing related system calls.

FSUse%  Count  Size  Files/sec  App Overhead
 0   800  53498.9  7898900
 0  1600  11186.5  9409278
 0  2400  17026.1  7907599
 0  3200  25815.6  9749980
 0  4000  11503.0  8556349
 0  4800  43561.9  8295238
 0  5600  17175.3  8304668
^C 0 80-560(3.2e+06+/-1.1e+06)0 
11186.50-53498.90(23016.4+/-1.1e+04) 
7898900-9749980(8.49463e+06+/-5e+05)

What I'm seeing is that the underlying image file is getting badly,
badly fragmented. This short test created approximately 8 million
extents in the image file in about 10 minutes runtime. Running
xfs_fsr on the image file pointed this out:

# xfs_fsr -d -v vm-100TB-sparse.img
vm-100TB-sparse.img
vm-100TB-sparse.img extents=7971773 can_save=7926036 tmp=./.fsr6198
DEBUG: fsize=109951162777600 blsz_dio=16773120 d_min=512
d_max=2147483136 pgsz=4096
Temporary file has 46107 extents (7971773 in original)
extents before:7971773 after:46107  vm-100TB-sparse.img
#

Most of the data written to the file is contiguous. This means that
btrfs is filling the filesystem in a contiguous manner, but its IO
is anything but contiguous. So, what's happening here?

Turns out that when the machine first runs out of free memory (about
1.2m inodes in), btrfs goes from running a couple of hundred nice
large 512k IOs a second to an intense 10s long burst of 10-15kiops
of tiny random IOs. Looking at it from the IO completion side of
things:

253,32   4  238 5.936043934 0  C   W 103680 + 1024 [0]
253,32   4  239 5.936155917 0  C   W 2201728 + 1024 [0]
253,32   4  240 5.936172087 0  C   W 104704 + 1024 [0]
253,32   4  241 5.936283060 0  C   W 2202752 + 1024 [0]
253,32   4  242 5.936294881 0  C   W 105728 + 1024 [0]
253,32   4  243 5.936385182 0  C   W 106752 + 1024 [0]
253,32   4  244 5.936394695 0  C   W 107776 + 1024 [0]
253,32   4  245 5.936402936 0  C   W 108800 + 1024 [0]
253,32   4  246 5.936406721 0  C   W 109824 + 896 [0]
253,32   4  247 5.936414258 0  C   W 2203776 + 1024 [0]
253,32   4  248 5.936515302 0  C   W 2204800 + 1024 [0]
253,32   4  249 5.936606737 0  C   W 2205824 + 1024 [0]
253,32   4  250 5.936689345 0  C   W 2206848 + 1024 [0]
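
One way to put a number on how sequential a completion trace like this
is: count the writes that start exactly where an earlier write ended.
The sketch below assumes the field layout shown in the lines above
(action in field 6, direction in field 7, start sector in field 8,
length in field 10); count_sequential is a hypothetical helper, not part
of blktrace.

```shell
#!/bin/sh
# Sketch only: count_sequential is a hypothetical helper. It reads
# blkparse-style text on stdin and reports how many completed writes
# begin at the sector where some earlier completed write ended.
count_sequential() {
    awk '
        $6 == "C" && $7 ~ /W/ {
            start = $8 + 0          # start sector
            len = $10 + 0           # length in sectors
            total++
            if (start in ends)      # continues an earlier IO
                seq++
            ends[start + len] = 1   # remember where this IO ends
        }
        END {
            printf "%d of %d completions continue an earlier IO\n", seq, total
        }
    '
}
```

Because it remembers every end offset seen so far, the count tolerates
several interleaved streams, such as the two visible in the trace above.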

All nice and large, mostly sequential IO patterns. Fast forward to
where we've run out of memory:

253,32