date:20131218

Re: [PATCH v2] Btrfs: fix tree mod logging

2013-12-18 Thread Ahmet Inan

Thanks a lot Filipe!

I have been testing this patch for 5 days now, and it fixed this annoying
problem, present since 3.11.0, on 3 NFS servers here.
It is also a candidate for backporting, as it fixes crashes.
Just for the information of others here: your previous patch,
"Btrfs: return immediately if tree log mod is not necessary",
is also needed to make it apply cleanly.

Ahmet
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] Btrfs: introduce lock_ref/unlock_ref

2013-12-18 Thread Dave Chinner

On Wed, Dec 18, 2013 at 04:07:27PM -0500, Josef Bacik wrote:
> qgroups need to have a consistent view of the references for a particular extent
> record.  Currently they do this through sequence numbers on delayed refs, but
> this is no longer acceptable.  So instead introduce lock_ref/unlock_ref.  This
> will provide the qgroup code with a consistent view of the reference while it
> does its accounting calculations without interfering with the delayed ref code.
> Thanks,
> 
> Signed-off-by: Josef Bacik 
> ---
>  fs/btrfs/ctree.h   |  11 ++
>  fs/btrfs/delayed-ref.c |   2 +
>  fs/btrfs/delayed-ref.h |   1 +
>  fs/btrfs/extent-tree.c | 102 +++--
>  4 files changed, 113 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index a924274..8b3fd61 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1273,6 +1273,9 @@ struct btrfs_block_group_cache {
>  
>   /* For delayed block group creation */
>   struct list_head new_bg_list;
> +
> + /* For locking reference modifications */
> + struct extent_io_tree ref_lock;
>  };
>  
>  /* delayed seq elem */
> @@ -3319,6 +3322,14 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
>  int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans,
>struct btrfs_fs_info *fs_info);
>  int __get_raid_index(u64 flags);
> +int lock_ref(struct btrfs_fs_info *fs_info, u64 root_objectid, u64 bytenr,
> +  u64 num_bytes, int for_cow,
> +  struct btrfs_block_group_cache **block_group,
> +  struct extent_state **cached_state);
> +int unlock_ref(struct btrfs_fs_info *fs_info, u64 root_objectid, u64 bytenr,
> +u64 num_bytes, int for_cow,
> +struct btrfs_block_group_cache *block_group,
> +struct extent_state **cached_state);

Please namespace these; they are far too similar to the generic
struct lockref name and its manipulation functions.

Cheers,

Dave.

-- 
Dave Chinner
da...@fromorbit.com

[PATCH] btrfs-progs: sync-up with newly introduced ioctl number

2013-12-18 Thread Anand Jain

For now, manually sync up the new ioctls that were introduced in the btrfs
kernel code and for which there wasn't any btrfs-progs patch.
We might have a better idea for the long run.

Signed-off-by: Anand Jain 
---
 ioctl.h |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/ioctl.h b/ioctl.h
index 29575d8..cd90afe 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -623,6 +623,12 @@ struct btrfs_ioctl_clone_range_args {
  struct btrfs_ioctl_dedup_args)
 #define BTRFS_IOC_GET_FSLIST _IOWR(BTRFS_IOCTL_MAGIC, 56, \
struct btrfs_ioctl_fslist_args)
+#define BTRFS_IOC_GET_FEATURES _IOR(BTRFS_IOCTL_MAGIC, 57, \
+  struct btrfs_ioctl_feature_flags)
+#define BTRFS_IOC_SET_FEATURES _IOW(BTRFS_IOCTL_MAGIC, 58, \
+  struct btrfs_ioctl_feature_flags[2])
+#define BTRFS_IOC_GET_SUPPORTED_FEATURES _IOR(BTRFS_IOCTL_MAGIC, 59, \
+  struct btrfs_ioctl_feature_flags[3])
 #ifdef __cplusplus
 }
 #endif
-- 
1.7.1


[PATCH] btrfs: ioctls would need unique id

2013-12-18 Thread Anand Jain

BTRFS_IOC_SET_FEATURES and BTRFS_IOC_GET_SUPPORTED_FEATURES
conflict with BTRFS_IOC_GET_FEATURES: all three use ioctl number 57.
Give them the unique numbers 58 and 59.

Signed-off-by: Anand Jain 
---
 include/uapi/linux/btrfs.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 7d7f776..0fe736e 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -634,9 +634,9 @@ struct btrfs_ioctl_fslist_args {
struct btrfs_ioctl_fslist_args)
 #define BTRFS_IOC_GET_FEATURES _IOR(BTRFS_IOCTL_MAGIC, 57, \
   struct btrfs_ioctl_feature_flags)
-#define BTRFS_IOC_SET_FEATURES _IOW(BTRFS_IOCTL_MAGIC, 57, \
+#define BTRFS_IOC_SET_FEATURES _IOW(BTRFS_IOCTL_MAGIC, 58, \
   struct btrfs_ioctl_feature_flags[2])
-#define BTRFS_IOC_GET_SUPPORTED_FEATURES _IOR(BTRFS_IOCTL_MAGIC, 57, \
+#define BTRFS_IOC_GET_SUPPORTED_FEATURES _IOR(BTRFS_IOCTL_MAGIC, 59, \
   struct btrfs_ioctl_feature_flags[3])
 
 #endif /* _UAPI_LINUX_BTRFS_H */
-- 
1.7.1
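
To make the clash concrete, here is a minimal userspace sketch, not kernel code. It models the asm-generic ioctl request layout by hand (an assumption; some architectures use different field widths) and compares only the command-number field. All DEMO_/demo_ names are hypothetical, and the three-field struct is an assumed stand-in for struct btrfs_ioctl_feature_flags.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the asm-generic ioctl request layout (an assumption; some
 * architectures differ): bits 0-7 nr, bits 8-15 type, bits 16-29 size,
 * bits 30-31 direction.
 */
#define DEMO_IOC_WRITE 1u
#define DEMO_IOC_READ  2u
#define DEMO_IOC(dir, type, nr, size) \
	(((uint32_t)(dir) << 30) | ((uint32_t)(size) << 16) | \
	 ((uint32_t)(type) << 8) | (uint32_t)(nr))
#define DEMO_IOC_NR(cmd) ((uint32_t)(cmd) & 0xffu)

/* 0x94 is the value of BTRFS_IOCTL_MAGIC in the kernel header. */
#define DEMO_MAGIC 0x94u

/* Assumed stand-in for struct btrfs_ioctl_feature_flags. */
struct demo_feature_flags {
	uint64_t compat_flags;
	uint64_t compat_ro_flags;
	uint64_t incompat_flags;
};

uint32_t demo_get_features(void)
{
	return DEMO_IOC(DEMO_IOC_READ, DEMO_MAGIC, 57,
			sizeof(struct demo_feature_flags));
}

/* nr is a parameter so the old (57) and new (58) values can be compared. */
uint32_t demo_set_features(unsigned int nr)
{
	return DEMO_IOC(DEMO_IOC_WRITE, DEMO_MAGIC, nr,
			sizeof(struct demo_feature_flags[2]));
}

/* Likewise: 57 before the patch, 59 after it. */
uint32_t demo_get_supported_features(unsigned int nr)
{
	return DEMO_IOC(DEMO_IOC_READ, DEMO_MAGIC, nr,
			sizeof(struct demo_feature_flags[3]));
}

/* Returns 1 if both request codes carry the same command number. */
int demo_same_nr(uint32_t a, uint32_t b)
{
	return DEMO_IOC_NR(a) == DEMO_IOC_NR(b);
}

/* With the patched numbers, no two of the three share a command number. */
void demo_check_patched_numbers(void)
{
	assert(!demo_same_nr(demo_get_features(), demo_set_features(58)));
	assert(!demo_same_nr(demo_get_features(),
			     demo_get_supported_features(59)));
	assert(!demo_same_nr(demo_set_features(58),
			     demo_get_supported_features(59)));
}
```

Calling demo_same_nr(demo_get_features(), demo_set_features(57)) returns 1, which is the pre-patch situation; demo_check_patched_numbers() passes with 58 and 59.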


[PATCH v3] btrfs: add framework to read fs info from btrfs-control

2013-12-18 Thread Anand Jain

This adds the ioctl BTRFS_IOC_GET_FSLIST, which reads the fs
info through btrfs-control. It is needed to optimize the
heavily used btrfs-progs function check_mounted(),
plus a few other minor uses.

Signed-off-by: Anand Jain 
---
 v3: rebase and update commit
 v2: accept Zach's suggestion; now holds uuid_mutex

 fs/btrfs/super.c   |   66 
 fs/btrfs/volumes.c |   39 ++
 fs/btrfs/volumes.h |2 +
 include/uapi/linux/btrfs.h |   19 
 4 files changed, 120 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 0d4b1c3..13884c5 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1645,38 +1645,92 @@ static struct file_system_type btrfs_fs_type = {
 };
 MODULE_ALIAS_FS("btrfs");
 
+static int btrfs_ioc_get_fslist(void __user *arg)
+{
+   int ret = 0;
+   u64 sz_fslist_arg;
+   u64 sz_fslist;
+   u64 sz_out;
+   struct btrfs_ioctl_fslist_args *fslist_arg;
+   struct btrfs_ioctl_fslist_args *fslist_arg_tmp;
+   struct btrfs_ioctl_fslist *fslist;
+
+   u64 cnt = 0, ucnt;
+
+   sz_fslist_arg = sizeof(*fslist_arg);
+   sz_fslist = sizeof(*fslist);
+   if (copy_from_user(&ucnt,
+   (struct btrfs_ioctl_fslist_args __user *)(arg +
+   offsetof(struct btrfs_ioctl_fslist_args, count)),
+   sizeof(ucnt)))
+   return -EFAULT;
+
+   cnt = btrfs_get_fslist_cnt();
+
+   if (cnt > ucnt) {
+   if (copy_to_user(arg +
+   offsetof(struct btrfs_ioctl_fslist_args, count),
+   &cnt, sizeof(cnt)))
+   return -EFAULT;
+   return 1;
+   }
+
+   sz_out = sz_fslist_arg + sz_fslist * cnt;
+   fslist_arg_tmp = fslist_arg = memdup_user(arg, sz_out);
+   if (IS_ERR(fslist_arg))
+   return PTR_ERR(fslist_arg);
+   fslist = (struct btrfs_ioctl_fslist *) (++fslist_arg_tmp);
+   cnt = btrfs_get_fslist(fslist, cnt);
+   fslist_arg->count = cnt;
+   if (copy_to_user(arg, fslist_arg, sz_out)) {
+   ret = -EFAULT;
+   goto out;
+   }
+   ret = 0;
+out:
+   kfree(fslist_arg);
+   return ret;
+}
+
 /*
  * used by btrfsctl to scan devices when no FS is mounted
  */
 static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
 {
-   struct btrfs_ioctl_vol_args *vol;
+   struct btrfs_ioctl_vol_args *vol = NULL;
struct btrfs_fs_devices *fs_devices;
int ret = -ENOTTY;
+   void __user *argp = (void __user *)arg;
 
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
 
-   vol = memdup_user((void __user *)arg, sizeof(*vol));
-   if (IS_ERR(vol))
-   return PTR_ERR(vol);
-
switch (cmd) {
case BTRFS_IOC_SCAN_DEV:
+   vol = memdup_user((void __user *)arg, sizeof(*vol));
+   if (IS_ERR(vol))
+   return PTR_ERR(vol);
ret = btrfs_scan_one_device(vol->name, FMODE_READ,
&btrfs_fs_type, &fs_devices);
+   kfree(vol);
break;
case BTRFS_IOC_DEVICES_READY:
+   vol = memdup_user((void __user *)arg, sizeof(*vol));
+   if (IS_ERR(vol))
+   return PTR_ERR(vol);
ret = btrfs_scan_one_device(vol->name, FMODE_READ,
&btrfs_fs_type, &fs_devices);
+   kfree(vol);
if (ret)
break;
ret = !(fs_devices->num_devices == fs_devices->total_devices);
break;
+   case BTRFS_IOC_GET_FSLIST:
+   ret = btrfs_ioc_get_fslist(argp);
+   break;
}
 
-   kfree(vol);
return ret;
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 92303f4..debd619 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6284,3 +6284,42 @@ int btrfs_scratch_superblock(struct btrfs_device *device)
 
return 0;
 }
+
+int btrfs_get_fslist_cnt(void)
+{
+   int cnt = 0;
+   struct btrfs_fs_devices *fs_devices;
+
+   mutex_lock(&uuid_mutex);
+   list_for_each_entry(fs_devices, &fs_uuids, list)
+   cnt++;
+   mutex_unlock(&uuid_mutex);
+
+   return cnt;
+}
+
+u64 btrfs_get_fslist(struct btrfs_ioctl_fslist *fslist, u64 ucnt)
+{
+   u64 cnt = 0;
+   struct btrfs_fs_devices *fs_devices;
+
+   mutex_lock(&uuid_mutex);
+   list_for_each_entry(fs_devices, &fs_uuids, list) {
+   if (!(cnt < ucnt))
+   break;
+   memcpy(fslist->fsid, fs_devices->fsid,
+   BTRFS_FSID_SIZE);
+   fslist->num_devices = fs_devices->num_devices;
+   fslist->missing_devices = fs_devices->missing_device
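
The handler above, as far as the visible part of the patch shows, returns 1 and writes the required entry count back when the caller's buffer is too small. A caller could handle that with a grow-and-retry loop. The following is a userspace mock under that reading, not the real ioctl: demo_get_fslist(), demo_fetch_all() and the fixed count of 3 are hypothetical, and entries are plain integers rather than the real struct, whose layout is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Pretend the kernel currently knows about this many filesystems. */
static const uint64_t demo_known_fs = 3;

/*
 * Mock of the contract read from the patch: *count is the caller's
 * capacity on entry. If it is too small, store the required count and
 * return 1; otherwise fill the entries, store how many, and return 0.
 */
int demo_get_fslist(uint64_t *count, uint64_t *entries)
{
	uint64_t i;

	if (demo_known_fs > *count) {
		*count = demo_known_fs;
		return 1;
	}
	for (i = 0; i < demo_known_fs; i++)
		entries[i] = i;
	*count = demo_known_fs;
	return 0;
}

/*
 * Caller side: start with room for one entry and grow to whatever the
 * mock asks for. Returns a malloc'ed array (caller frees) or NULL.
 */
uint64_t *demo_fetch_all(uint64_t *out_count)
{
	uint64_t cap = 1;
	uint64_t *buf = NULL;

	for (;;) {
		uint64_t count = cap;
		uint64_t *tmp = realloc(buf, cap * sizeof(*buf));

		if (!tmp) {
			free(buf);
			return NULL;
		}
		buf = tmp;
		if (demo_get_fslist(&count, buf) == 0) {
			*out_count = count;
			return buf;
		}
		/* Too small: the mock told us how many entries it needs. */
		assert(count > cap);
		cap = count;
	}
}
```

The list can grow between the two calls in a real system, which is why the sketch loops instead of retrying exactly once.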

[PATCH] btrfs: fix the warning in prepare_pages

2013-12-18 Thread Anand Jain

Fix the compile warning below:

fs/btrfs/file.c: In function ‘prepare_pages’:
fs/btrfs/file.c:1247: warning: ‘err’ may be used uninitialized in this function

Signed-off-by: Anand Jain 
---
 fs/btrfs/file.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 740ae8c..35bf838 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1244,7 +1244,7 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
int i;
unsigned long index = pos >> PAGE_CACHE_SHIFT;
gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
-   int err;
+   int err = 0;
int faili;
 
for (i = 0; i < num_pages; i++) {
-- 
1.7.1
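
For readers unfamiliar with this class of warning, the sketch below shows a code shape that commonly triggers it: a variable assigned only under loop conditions and then tested on every iteration. It is an illustration only and does not reproduce prepare_pages(); demo_prepare() and its parameters are hypothetical.

```c
#include <assert.h>

/*
 * err is assigned only on the first and last iteration but tested on
 * every one. A compiler that cannot prove the first assignment always
 * happens before the test may report "may be used uninitialized" when
 * the initializer is missing, so err starts at 0 here.
 */
int demo_prepare(int num_pages, int first_result, int last_result)
{
	int err = 0;
	int i;

	assert(num_pages >= 0);

	for (i = 0; i < num_pages; i++) {
		if (i == 0)
			err = first_result;
		if (!err && i == num_pages - 1)
			err = last_result;
		if (err)
			return err;
	}
	return 0;
}
```

Initializing to 0 silences the warning without changing behavior on any path where the variable was already assigned before being read.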


Re: Rework qgroup accounting

2013-12-18 Thread Liu Bo

On Wed, Dec 18, 2013 at 04:07:26PM -0500, Josef Bacik wrote:
> People have been complaining about autodefrag/defrag killing their box with OOM.
> This is because the snapshot aware defrag stuff super sucks if you have lots of
> snapshots, and so that needs to be reworked.  The problem is once that is fixed
> you start to hit horrible lock contention on the delayed refs lock because we
> have thousands of like entries that can't be merged until when we go to actually
> run the delayed ref.  This problem exists because of the delayed ref sequence
> number.
> 
> The major user of the delayed ref sequence number is the qgroup code.  It uses
> it to pass into btrfs_find_all_roots to see what roots pointed to a particular
> bytenr either before or including the current operation.  It needs this
> information to know if we were removing the last ref or just the last ref for
> this particular root.  The problem with this is that it has made the delayed ref
> code incredibly fragile and has forced us to do things like
> btrfs_merge_delayed_refs which is what is causing us so much pain when we have
> thousands of ref updates for the same block.
> 
> In order to fix this I'm introducing a new way of adjusting quota counts.  I've
> called them qgroup operations, and we apply them in very specific situations.
> We only add these when we add or remove the only ref for a particular root.
> Obviously we have to account for shared refs as well so there is some extra code
> for these special cases, but basically we make the qgroup accounting only happen
> when we know there was a real change (or likely a real change in the case of
> shared refs).
> 
> In order to do this I've also introduced lock/unlock_ref.  This only gets used
> if we actually have qgroups enabled, but it will be relatively low cost even if
> we have qgroups enabled as it only locks the bytenr for reference updates.  So
> delayed ref updates will not trip over this since we only do one at a time
> anyway, so we'll only have contention if we have delayed refs running at the
> same time as a qgroup operation update.
> 
> Then all we need to account for is the fact that we will get the full view of
> the roots at the time we run the operations, not what they were when our
> particular operation occurred.  This is ok because we will either ignore our
> root in the case of add or not ignore it in case of remove when calculating the
> ref counts.  We use the same ref counting scheme that Arne developed as it's
> pretty freaking awesome, and just adjust how we count the ref counts based on
> our operations.
> 
> In addition to all of this new code I've added a big set of sanity tests to make
> sure everything is working right.  Between this and the qgroups xfstests I'm
> pretty certain I haven't broken anything obvious with qgroups.  This is just the
> first step in getting rid of the delayed ref sequence number and fixing the
> defrag OOM mess but it is the biggest part.  Thanks,

I'd say I love the idea, will look at it closer.

-liubo

Re: [PATCH v4] Btrfs: convert printk to btrfs_ and fix BTRFS prefix

2013-12-18 Thread Josef Bacik



On 12/12/2013 02:57 PM, Frank Holton wrote:

> Convert all applicable cases of printk and pr_* to the btrfs_* macros.
>
> Fix all uses of the BTRFS prefix.
>
> Signed-off-by: Frank Holton


There is trailing whitespace everywhere.  Please run this through
checkpatch.pl so you can fix all of it, or use


let c_space_errors = 1

in your .vimrc so you can see them.  Thanks,

Josef
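
As an illustration of the two whitespace problems this kind of check typically flags (trailing whitespace, and a space directly before a tab), here is a small standalone sketch. It is not how checkpatch.pl or vim implement their checks; the demo_ function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the line, ignoring one final newline, ends in a space or tab. */
int demo_trailing_ws(const char *line)
{
	size_t len;

	assert(line != NULL);
	len = strlen(line);
	if (len > 0 && line[len - 1] == '\n')
		len--;
	return len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t');
}

/* Returns 1 if a space is immediately followed by a tab anywhere in the line. */
int demo_space_before_tab(const char *line)
{
	assert(line != NULL);
	return strstr(line, " \t") != NULL;
}
```

Running both checks over every added line of a patch before sending it would catch the problems mentioned above.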

Re: [PATCH] Btrfs: improve the performance fluctuating of the fsync

2013-12-18 Thread Josef Bacik



On 12/18/2013 05:52 AM, Miao Xie wrote:

> In order to improve the performance of fsync, we use the outstanding
> ordered extents to avoid looking up the checksum from the csum tree.
> But we didn't filter out the ordered extents whose csum is still being
> calculated, so when we got those ordered extents we had to wait for the
> csum calculation. This made the performance drop suddenly. (On
> my box, it dropped from 56MB/s to 4-10MB/s.)
>
> But actually, the csum calculation of the ordered extents which were
> introduced by the current fsync had already completed. Those ordered
> extents whose csum was being calculated didn't belong to the current
> fsync, so we can ignore them.


This isn't true because we will just start IO and carry on and wait 
later on, so we could very well have ordered extents that we started for 
this fsync without their csums ready which is why this code exists.  Thanks,


Josef

[PATCH 1/3] Btrfs: introduce lock_ref/unlock_ref

2013-12-18 Thread Josef Bacik

qgroups need to have a consistent view of the references for a particular extent
record.  Currently they do this through sequence numbers on delayed refs, but
this is no longer acceptable.  So instead introduce lock_ref/unlock_ref.  This
will provide the qgroup code with a consistent view of the reference while it
does its accounting calculations without interfering with the delayed ref code.
Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/ctree.h   |  11 ++
 fs/btrfs/delayed-ref.c |   2 +
 fs/btrfs/delayed-ref.h |   1 +
 fs/btrfs/extent-tree.c | 102 +++--
 4 files changed, 113 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a924274..8b3fd61 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1273,6 +1273,9 @@ struct btrfs_block_group_cache {
 
/* For delayed block group creation */
struct list_head new_bg_list;
+
+   /* For locking reference modifications */
+   struct extent_io_tree ref_lock;
 };
 
 /* delayed seq elem */
@@ -3319,6 +3322,14 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
 int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans,
 struct btrfs_fs_info *fs_info);
 int __get_raid_index(u64 flags);
+int lock_ref(struct btrfs_fs_info *fs_info, u64 root_objectid, u64 bytenr,
+u64 num_bytes, int for_cow,
+struct btrfs_block_group_cache **block_group,
+struct extent_state **cached_state);
+int unlock_ref(struct btrfs_fs_info *fs_info, u64 root_objectid, u64 bytenr,
+  u64 num_bytes, int for_cow,
+  struct btrfs_block_group_cache *block_group,
+  struct extent_state **cached_state);
 /* ctree.c */
 int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
 int level, int *slot);
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index fab60c1..ee1c29d 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -680,6 +680,7 @@ static noinline void add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
ref->action = action;
ref->is_head = 0;
ref->in_tree = 1;
+   ref->for_cow = for_cow;
 
if (need_ref_seq(for_cow, ref_root))
seq = btrfs_get_tree_mod_seq(fs_info, &trans->delayed_ref_elem);
@@ -739,6 +740,7 @@ static noinline void add_delayed_data_ref(struct btrfs_fs_info *fs_info,
ref->action = action;
ref->is_head = 0;
ref->in_tree = 1;
+   ref->for_cow = for_cow;
 
if (need_ref_seq(for_cow, ref_root))
seq = btrfs_get_tree_mod_seq(fs_info, &trans->delayed_ref_elem);
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index a54c9d4..db71a37 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -52,6 +52,7 @@ struct btrfs_delayed_ref_node {
 
unsigned int action:8;
unsigned int type:8;
+   unsigned int for_cow:1;
/* is this node still in the rbtree? */
unsigned int is_head:1;
unsigned int in_tree:1;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index cd4d9ca..03b536c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -672,6 +672,79 @@ struct btrfs_block_group_cache *btrfs_lookup_block_group(
return cache;
 }
 
+
+/* This is used to lock the modification to an extent ref.  This only does
+ * something if the reference is a fs tree.
+ *
+ * @fs_info: the fs_info for this filesystem.
+ * @root_objectid: the root objectid that we are modifying for this extent.
+ * @bytenr: the byte we are modifying the reference for
+ * @num_bytes: the number of bytes we are locking.
+ * @for_cow: if this operation is for cow then we don't need to lock
+ * @block_group: we will store the block group we looked up so that the unlock
+ * doesn't have to do another search.
+ * @cached_state: this is for caching our location so when we unlock we don't
+ * have to do a tree search.
+ *
+ * This can return -ENOMEM if we cannot allocate our extent state.
+ */
+int lock_ref(struct btrfs_fs_info *fs_info, u64 root_objectid, u64 bytenr,
+u64 num_bytes, int for_cow,
+struct btrfs_block_group_cache **block_group,
+struct extent_state **cached_state)
+{
+   struct btrfs_block_group_cache *cache;
+   int ret;
+
+   if (!fs_info->quota_enabled || !need_ref_seq(for_cow, root_objectid))
+   return 0;
+
+   cache = btrfs_lookup_block_group(fs_info, bytenr);
+   ASSERT(cache);
+   ASSERT(cache->key.objectid <= bytenr &&
+  (cache->key.objectid + cache->key.offset >=
+   bytenr + num_bytes));
+   ret = lock_extent_bits(&cache->ref_lock, bytenr,
+  bytenr + num_bytes - 1, 0, cached_state);
+   if (!ret)
+   *block_group = cache;
+   else
+   btrfs_put_block_group(cache)

Rework qgroup accounting

2013-12-18 Thread Josef Bacik

People have been complaining about autodefrag/defrag killing their box with OOM.
This is because the snapshot aware defrag stuff super sucks if you have lots of
snapshots, and so that needs to be reworked.  The problem is once that is fixed
you start to hit horrible lock contention on the delayed refs lock because we
have thousands of like entries that can't be merged until when we go to actually
run the delayed ref.  This problem exists because of the delayed ref sequence
number.

The major user of the delayed ref sequence number is the qgroup code.  It uses
it to pass into btrfs_find_all_roots to see what roots pointed to a particular
bytenr either before or including the current operation.  It needs this
information to know if we were removing the last ref or just the last ref for
this particular root.  The problem with this is that it has made the delayed ref
code incredibly fragile and has forced us to do things like
btrfs_merge_delayed_refs which is what is causing us so much pain when we have
thousands of ref updates for the same block.

In order to fix this I'm introducing a new way of adjusting quota counts.  I've
called them qgroup operations, and we apply them in very specific situations.
We only add these when we add or remove the only ref for a particular root.
Obviously we have to account for shared refs as well so there is some extra code
for these special cases, but basically we make the qgroup accounting only happen
when we know there was a real change (or likely a real change in the case of
shared refs).

In order to do this I've also introduced lock/unlock_ref.  This only gets used
if we actually have qgroups enabled, but it will be relatively low cost even if
we have qgroups enabled as it only locks the bytenr for reference updates.  So
delayed ref updates will not trip over this since we only do one at a time
anyway, so we'll only have contention if we have delayed refs running at the
same time as a qgroup operation update.

Then all we need to account for is the fact that we will get the full view of
the roots at the time we run the operations, not what they were when our
particular operation occurred.  This is ok because we will either ignore our
root in the case of add or not ignore it in case of remove when calculating the
ref counts.  We use the same ref counting scheme that Arne developed as it's
pretty freaking awesome, and just adjust how we count the ref counts based on
our operations.

In addition to all of this new code I've added a big set of sanity tests to make
sure everything is working right.  Between this and the qgroups xfstests I'm
pretty certain I haven't broken anything obvious with qgroups.  This is just the
first step in getting rid of the delayed ref sequence number and fixing the
defrag OOM mess but it is the biggest part.  Thanks,

Josef
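
The rule "only add a qgroup operation when we add or remove the only ref for a particular root" can be pictured with a per-root reference counter for a single extent. The sketch below is a deliberately simplified model, not the code in this series: it ignores shared refs, which the description says need extra handling, and every demo_ name is hypothetical.

```c
#include <assert.h>

enum demo_qgroup_op {
	DEMO_OP_NONE,	/* root already had, and still has, a ref */
	DEMO_OP_ADD,	/* root gained its first ref on the extent */
	DEMO_OP_REMOVE	/* root dropped its last ref on the extent */
};

/*
 * *refs is how many references one root holds on one extent.
 * delta is +1 or -1. An operation is reported only on the 0 -> 1 and
 * 1 -> 0 transitions; every other update changes nothing for quotas.
 */
enum demo_qgroup_op demo_mod_ref(unsigned int *refs, int delta)
{
	assert(delta == 1 || delta == -1);

	if (delta == 1) {
		(*refs)++;
		return *refs == 1 ? DEMO_OP_ADD : DEMO_OP_NONE;
	}

	assert(*refs > 0);
	(*refs)--;
	return *refs == 0 ? DEMO_OP_REMOVE : DEMO_OP_NONE;
}
```

In this model thousands of ref updates for the same block from one root produce at most one operation per transition, which is the effect the description is after.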

[PATCH 3/3] Btrfs: add sanity tests for new qgroup accounting code

2013-12-18 Thread Josef Bacik

This exercises the various parts of the new qgroup accounting code.  We do some
basic stuff and do some things with the shared refs to make sure all that code
works.  I had to add a bunch of infrastructure because I needed to be able to
insert items into a fake tree without having to do all the hard work myself,
hopefully this will be useful in the future.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/Makefile |   2 +-
 fs/btrfs/ctree.c  |   4 +
 fs/btrfs/ctree.h  |   3 +
 fs/btrfs/disk-io.c|  18 +-
 fs/btrfs/disk-io.h|   1 +
 fs/btrfs/extent-tree.c|  25 ++
 fs/btrfs/extent_io.c  |  47 
 fs/btrfs/extent_io.h  |   2 +
 fs/btrfs/qgroup.c |  23 ++
 fs/btrfs/super.c  |   3 +
 fs/btrfs/tests/btrfs-tests.c  |  91 +++
 fs/btrfs/tests/btrfs-tests.h  |   9 +
 fs/btrfs/tests/inode-tests.c  |  35 +--
 fs/btrfs/tests/qgroup-tests.c | 617 ++
 14 files changed, 843 insertions(+), 37 deletions(-)
 create mode 100644 fs/btrfs/tests/qgroup-tests.c

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 1a44e42..e6df2dd 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -16,4 +16,4 @@ btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
 
 btrfs-$(CONFIG_BTRFS_FS_RUN_SANITY_TESTS) += tests/free-space-tests.o \
tests/extent-buffer-tests.o tests/btrfs-tests.o \
-   tests/extent-io-tests.o tests/inode-tests.o
+   tests/extent-io-tests.o tests/inode-tests.o tests/qgroup-tests.o
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index a57507a..38ef590 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1344,6 +1344,10 @@ static inline int should_cow_block(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
   struct extent_buffer *buf)
 {
+#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
+   if (unlikely(root->dummy_root))
+   return 0;
+#endif
/* ensure we can see the force_cow */
smp_rmb();
 
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 944c916..8ad5adb 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1783,6 +1783,7 @@ struct btrfs_root {
int in_radix;
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
int dummy_root;
+   u64 alloc_bytenr;
 #endif
u64 defrag_trans_start;
struct btrfs_key defrag_progress;
@@ -4094,6 +4095,8 @@ static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info)
 /* Sanity test specific functions */
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 void btrfs_test_destroy_inode(struct inode *inode);
+int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
+  u64 rfer, u64 excl);
 #endif
 
 #endif
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3eb27b9..c30de3d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1095,6 +1095,11 @@ struct extent_buffer *btrfs_find_tree_block(struct btrfs_root *root,
 struct extent_buffer *btrfs_find_create_tree_block(struct btrfs_root *root,
 u64 bytenr, u32 blocksize)
 {
+#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
+   if (unlikely(root->dummy_root))
+   return alloc_test_extent_buffer(root->fs_info, bytenr,
+   blocksize);
+#endif
return alloc_extent_buffer(root->fs_info, bytenr, blocksize);
 }
 
@@ -1245,6 +1250,7 @@ struct btrfs_root *btrfs_alloc_dummy_root(void)
return ERR_PTR(-ENOMEM);
__setup_root(4096, 4096, 4096, 4096, root, NULL, 1);
root->dummy_root = 1;
+   root->alloc_bytenr = 0;
 
return root;
 }
@@ -2034,7 +2040,7 @@ static void free_root_pointers(struct btrfs_fs_info *info, int chunk_root)
free_root_extent_buffers(info->chunk_root);
 }
 
-static void del_fs_roots(struct btrfs_fs_info *fs_info)
+void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info)
 {
int ret;
struct btrfs_root *gang[8];
@@ -2929,7 +2935,7 @@ fail_qgroup:
 fail_trans_kthread:
kthread_stop(fs_info->transaction_kthread);
btrfs_cleanup_transaction(fs_info->tree_root);
-   del_fs_roots(fs_info);
+   btrfs_free_fs_roots(fs_info);
 fail_cleaner:
kthread_stop(fs_info->cleaner_kthread);
 
@@ -3454,8 +3460,10 @@ void btrfs_drop_and_free_fs_root(struct btrfs_fs_info *fs_info,
btrfs_free_log_root_tree(NULL, fs_info);
}
 
-   __btrfs_remove_free_space_cache(root->free_ino_pinned);
-   __btrfs_remove_free_space_cache(root->free_ino_ctl);
+   if (root->free_ino_pinned)
+   __btrfs_remove_free_space_cache(root->free_ino_pinned);
+   if (root->free_ino_ctl)
+   __btrfs_remove_free_space_cache(root->free_ino_ctl);
free_fs_root(root);
 }
 
@@ -3580,7 +3588,7 @@ int close_ctree(struct btrfs_root *root)

Re: Backporting bugfixes

2013-12-18 Thread Duncan

Pavel Roskin posted on Wed, 18 Dec 2013 14:31:53 -0500 as excerpted:

> On Wed, 18 Dec 2013 19:23:08 + Chris Mason  wrote:
> 
>> We do tag some commits for stable, but Dave Sterba actually just sent a
>> request to the stable tree to pull in a few more.
> 
> That's great news!  Thank you for a quick reply!

In fact, here's the stable-queue request.

Re: [PATCH] Btrfs-progs: receive: fix the case that we can not find subvolume

2013-12-18 Thread Michael Welsh Duggan

Wang Shilong  writes:

> On 12/18/2013 12:06 PM, Michael Welsh Duggan wrote:
>> Wang Shilong  writes:
>>
>>> It seems that you use older kernel version but use the latest
>>> btrfs-progs, new btrfs-progs use uuid tree to search but
>>> this tree did not exist yet.
>>>
>>> Can you try to upgrade your kernel?
>> What version is necessary?  (I am currently on 3.11.10.)
> 3.12 is ok. By the way, can you run this on 3.11.10:
>
> #dmesg
>
> Let's see if it outputs something like:
>
> btrfs: can not found root: 9

Indeed.

$ dmesg | grep "root 9"
[305770.945287] could not find root 9
[305770.945300] could not find root 9
[305770.945369] could not find root 9
[305770.945398] could not find root 9
[305915.405421] could not find root 9
[305915.405483] could not find root 9
[305962.927150] could not find root 9
[305962.927222] could not find root 9
[399096.924559] could not find root 9
[399096.924617] could not find root 9
[399195.585768] could not find root 9
[399195.585823] could not find root 9

Looks like I'll be rebooting to a new kernel when I get home tonight.

-- 
Michael Welsh Duggan
(m...@md5i.com)


Re: Btrfs stable updates for 3.12

2013-12-18 Thread Greg KH

On Wed, Dec 18, 2013 at 04:14:02PM +0100, David Sterba wrote:
> Hi,
> 
> please queue the following patches to 3.12 stable. They fix a few
> crashes or lockups that were reported by users.
> 
> The patch "stop using vfs_read in send" may seem big for stable, but without it
> the send/receive ioctl hits the global open file limit sooner or later,
> depending on the ram size.
> 
> Subjects:
> Btrfs: do a full search everytime in btrfs_search_old_slot
> Btrfs: reset intwrite on transaction abort
> Btrfs: fix memory leak of chunks' extent map
> Btrfs: fix hole check in log_one_extent [bug 1]
> Btrfs: fix incorrect inode acl reset
> Btrfs: stop using vfs_read in send
> Btrfs: take ordered root lock when removing ordered operations inode
> Btrfs: do not run snapshot-aware defragment on error
> Btrfs: fix a crash when running balance and defrag concurrently
> Btrfs: fix lockdep error in async commit
> Commits:
> d4b4087c43cc00a196c5be57fac41f41309f1d56
> e0228285a8cad70e4b7b4833cc650e36ecd8de89
> 7d3d1744f8a7d62e4875bd69cc2192a939813880
> ed9e8af88e2551aaa6bf51d8063a2493e2d71597
> 8185554d3eb09d23a805456b6fa98dcbb34aa518
> ed2590953bd06b892f0411fc94e19175d32f197a
> 93858769172c4e3678917810e9d5de360eb991cc
> 6f519564d7d978c00351d9ab6abac3deeac31621
> 48ec47364b6d493f0a9cdc116977bf3f34e5c3ec
> b1a06a4b574996692b72b742bf6e6aa0c711a948
> 
> all apply cleanly on top of 3.12.5.

all now applied, along with 4 of these that seem to be applicable to
3.10-stable.

thanks,

greg k-h

Re: Backporting bugfixes

2013-12-18 Thread Pavel Roskin

On Wed, 18 Dec 2013 19:23:08 +
Chris Mason  wrote:

> We do tag some commits for stable, but Dave Sterba actually just sent
> a request to the stable tree to pull in a few more.

That's great news!  Thank you for a quick reply!

-- 
Regards,
Pavel Roskin

Re: Backporting bugfixes

2013-12-18 Thread Chris Mason

On Wed, 2013-12-18 at 14:06 -0500, Pavel Roskin wrote:
> Hello!
> 
> I have noticed that there have been important fixes for btrfs in the  
> mainline Linux git repository.  However, there is just one btrfs fix  
> in Linux 3.12.5 after 3.12.
> 
> I think it's important to submit all serious bugfixes to the stable  
> kernels.  It would protect users against data corruption and improve  
> the image of btrfs as a serious filesystem that can be trusted at  
> least with semi-important data.
> 
> This post was inspired by http://lwn.net/Articles/577218/ and
> https://bugzilla.redhat.com/show_bug.cgi?id=1028750
> 

Hi Pavel,

We do tag some commits for stable, but Dave Sterba actually just sent a
request to the stable tree to pull in a few more.

-chris


Backporting bugfixes

2013-12-18 Thread Pavel Roskin


Hello!

I have noticed that there have been important fixes for btrfs in the  
mainline Linux git repository.  However, there is just one btrfs fix  
in Linux 3.12.5 after 3.12.


I think it's important to submit all serious bugfixes to the stable  
kernels.  It would protect users against data corruption and improve  
the image of btrfs as a serious filesystem that can be trusted at  
least with semi-important data.


This post was inspired by http://lwn.net/Articles/577218/ and
https://bugzilla.redhat.com/show_bug.cgi?id=1028750

--
Regards,
Pavel Roskin

btrfs on bcache

2013-12-18 Thread eb

I've recently set up a system (Kernel 3.12.5-1-ARCH) which is layered as follows:

/dev/sdb3 - cache0 (80 GB Intel SSD)
/dev/sdc1 - backing device (2 TB WD HDD)

sdb3+sdc1 => /dev/bcache0

On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted
as / and /home. What's been bothering me are the following entries in
my kernel log:

[13811.845540] incomplete page write in btrfs with offset 1536 and length 2560
[13870.326639] incomplete page write in btrfs with offset 3072 and length 1024

The offset/length values are always either 1536/2560 or 3072/1024,
they sum up nicely to 4K. There are 607 of those in there as I am
writing this, the machine has been up 18 hours and been under no
particular I/O strain (it's a desktop).
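
The arithmetic behind that observation can be checked with a short script. This is only a sketch: it parses the two log lines quoted above with a regular expression written for this example (PATTERN, parse_page_writes and page_write_ends are illustrative names, not part of any kernel tool) and confirms that each offset/length pair ends exactly at the 4096-byte page boundary and that every value is a multiple of 512.

```python
import re

# The two kernel log lines quoted above.
LOG = """\
[13811.845540] incomplete page write in btrfs with offset 1536 and length 2560
[13870.326639] incomplete page write in btrfs with offset 3072 and length 1024
"""

# Illustrative pattern for this message format only.
PATTERN = re.compile(
    r"incomplete page write in btrfs with offset (\d+) and length (\d+)"
)

def parse_page_writes(log_text):
    """Return (offset, length) integer pairs for every matching log line."""
    return [(int(off), int(length)) for off, length in PATTERN.findall(log_text)]

def page_write_ends(pairs):
    """Return offset + length for each pair, i.e. where each write ends."""
    return [off + length for off, length in pairs]

pairs = parse_page_writes(LOG)
ends = page_write_ends(pairs)
all_512_aligned = all(value % 512 == 0 for pair in pairs for value in pair)

print(pairs)            # [(1536, 2560), (3072, 1024)]
print(ends)             # [4096, 4096]
print(all_512_aligned)  # True
```

Every value being a multiple of 512 is consistent with writes arriving in 512-byte units, but on its own it does not show which layer is responsible.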

Trying to fix this, I detached the cache (still using /dev/bcache0,
but without /dev/sdb3 attached), causing these errors to disappear. As
soon as I re-attached /dev/sdb3 they started again, so I am fairly
sure it's an unfavorable interaction between bcache and btrfs.

Is this something I should be worried about (they're only emitted with
KERN_INFO?) or just an alignment problem? The underlying HDD is using
4K-Sectors, while the block_size of bcache seems to be 512, could that
be the issue here?

I've also encountered incomplete reads and a few csum errors, but I
have not been able to trigger these regularly. I have a feeling that
the error is more likely to be on the bcache end (I've mailed to that
list as well), however any insight into the matter would be much
appreciated.

Thanks,

- eb

Btrfs stable updates for 3.12

2013-12-18 Thread David Sterba

Hi,

please queue the following patches to 3.12 stable. They fix a few
crashes or lockups that were reported by users.

The patch "stop using vfs_read in send" may seem big for stable, but without it
the send/receive ioctl hits the global open file limit sooner or later,
depending on the RAM size.

Subjects:
Btrfs: do a full search everytime in btrfs_search_old_slot
Btrfs: reset intwrite on transaction abort
Btrfs: fix memory leak of chunks' extent map
Btrfs: fix hole check in log_one_extent [bug 1]
Btrfs: fix incorrect inode acl reset
Btrfs: stop using vfs_read in send
Btrfs: take ordered root lock when removing ordered operations inode
Btrfs: do not run snapshot-aware defragment on error
Btrfs: fix a crash when running balance and defrag concurrently
Btrfs: fix lockdep error in async commit
Commits:
d4b4087c43cc00a196c5be57fac41f41309f1d56
e0228285a8cad70e4b7b4833cc650e36ecd8de89
7d3d1744f8a7d62e4875bd69cc2192a939813880
ed9e8af88e2551aaa6bf51d8063a2493e2d71597
8185554d3eb09d23a805456b6fa98dcbb34aa518
ed2590953bd06b892f0411fc94e19175d32f197a
93858769172c4e3678917810e9d5de360eb991cc
6f519564d7d978c00351d9ab6abac3deeac31621
48ec47364b6d493f0a9cdc116977bf3f34e5c3ec
b1a06a4b574996692b72b742bf6e6aa0c711a948

all apply cleanly on top of 3.12.5.

Thanks,
david

Re: no space left, metadata usage almost full?

2013-12-18 Thread Hugo Mills

On Wed, Dec 18, 2013 at 11:54:39PM +0900, Tomasz Chmielewski wrote:
> On Wed, 18 Dec 2013 12:46:52 +
> Hugo Mills  wrote:
> 
> > > # btrfs fi df /home
> > > Data, RAID1: total=2.51TiB, used=1.58TiB
> > > System, RAID1: total=32.00MiB, used=372.00KiB
> > > Metadata, RAID1: total=48.00GiB, used=47.23GiB
> > > # btrfs fi balance start -dusage=5 /home
> 
> >Currently, yes, it is the only approach.
> >Hope the above helps,
> 
> So the balance finished, and metadata is still almost full:
> 
> # btrfs fi df /home
> Data, RAID1: total=1.60TiB, used=1.58TiB
> System, RAID1: total=32.00MiB, used=248.00KiB
> Metadata, RAID1: total=49.00GiB, used=47.24GiB
> 
> Confused about the output - does it actually look any better?

   Yes, because...

> # btrfs fi show /home
> Label: crawler-btrfs  uuid: 60f1759c-45f6-4484-9f60-66a4e9bbf2b6
> Total devices 2 FS bytes used 1.63TiB
> devid    3 size 2.56TiB used 1.66TiB path /dev/sdb4
> devid    4 size 2.56TiB used 1.66TiB path /dev/sda4

   ... you have unallocated space here, so the FS can now allocate
more metadata as it needs to.

> Does it mean that data/system/metadata will be able to grow now,
> until their size in total is 2.56TiB?

   Yes.

   Although note that where btrfs fi df reports space, that's _usable_
space (i.e. how much data you can fit in it), but where btrfs fi show
reports space, it's disk bytes (i.e. how much of the disk has useful
content on it). With RAID-1, the first figure is half the second
figure. In your case, that's simple, but with different RAID levels
for data and metadata the calculation becomes a little bit more
complicated.
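
As a rough check of that relationship, the sketch below redoes the arithmetic with the figures from the outputs quoted above. It assumes every chunk type is RAID1 (two copies) and that the copies are spread evenly over the two devices; the variable names are made up for illustration.

```python
TIB = 1024 ** 4
GIB = 1024 ** 3
MIB = 1024 ** 2

# "total=" values from the post-balance `btrfs fi df /home` output (all RAID1).
df_totals = {
    "Data": 1.60 * TIB,
    "System": 32.00 * MIB,
    "Metadata": 49.00 * GIB,
}
RAID1_COPIES = 2          # RAID1 keeps two copies of every chunk
NUM_DEVICES = 2           # "Total devices 2" in `btrfs fi show`
DEVICE_SIZE = 2.56 * TIB  # "size 2.56TiB" for each device

usable_allocated = sum(df_totals.values())       # what `fi df` adds up to
raw_allocated = usable_allocated * RAID1_COPIES  # disk bytes across all devices
per_device = raw_allocated / NUM_DEVICES         # comparable to "used" in `fi show`
unallocated_per_device = DEVICE_SIZE - per_device

print("allocated per device:   %.2f TiB" % (per_device / TIB))
print("unallocated per device: %.2f TiB" % (unallocated_per_device / TIB))
```

The per-device figure comes out at about 1.65 TiB, close to the 1.66TiB that btrfs fi show reports; the small gap is plausibly just rounding in the printed values. Roughly 0.9 TiB per device is still unallocated and available for new chunks.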

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- What part of "gestalt" don't you understand? ---   



Re: no space left, metadata usage almost full?

2013-12-18 Thread Tomasz Chmielewski

On Wed, 18 Dec 2013 12:46:52 +
Hugo Mills  wrote:

> > # btrfs fi df /home
> > Data, RAID1: total=2.51TiB, used=1.58TiB
> > System, RAID1: total=32.00MiB, used=372.00KiB
> > Metadata, RAID1: total=48.00GiB, used=47.23GiB
> > # btrfs fi balance start -dusage=5 /home

>Currently, yes, it is the only approach.
>Hope the above helps,

So the balance finished, and metadata is still almost full:

# btrfs fi df /home
Data, RAID1: total=1.60TiB, used=1.58TiB
System, RAID1: total=32.00MiB, used=248.00KiB
Metadata, RAID1: total=49.00GiB, used=47.24GiB

Confused about the output - does it actually look any better?


# btrfs fi show /home
Label: crawler-btrfs  uuid: 60f1759c-45f6-4484-9f60-66a4e9bbf2b6
Total devices 2 FS bytes used 1.63TiB
devid    3 size 2.56TiB used 1.66TiB path /dev/sdb4
devid    4 size 2.56TiB used 1.66TiB path /dev/sda4

Btrfs v3.12


Does it mean that data/system/metadata will be able to grow now,
until their size in total is 2.56TiB?

-- 
Tomasz Chmielewski
http://wpkg.org

Re: no space left, metadata usage almost full?

2013-12-18 Thread Hugo Mills

On Wed, Dec 18, 2013 at 09:37:28PM +0900, Tomasz Chmielewski wrote:
> I have a btrfs filesystem which has plenty of free space left, yet it's
> hitting out of space regularly.
> 
> Here is what it looks like:
> 
> # btrfs fi df /home
> Data, RAID1: total=2.51TiB, used=1.58TiB
> System, RAID1: total=32.00MiB, used=372.00KiB
> Metadata, RAID1: total=48.00GiB, used=47.23GiB
> 
> 
> What I read from it is that we're almost full on metadata usage, and that
> might be causing the out of space issues.

   This is highly likely.

> Reading past posts on this group, I can see it's recommended to run
> this if I hit out of space and the fs is low on metadata space:
> 
> # btrfs fi balance start -dusage=5 /home
> 
> Is it really the only workaround? Shouldn't the filesystem be more
> intelligent and be able to grab some more metadata space if it's
> running low?

   Currently, yes, it is the only approach.

   The automatic reclamation of unused chunks (or barely-used chunks)
is on the projects list. Nobody's got round to implementing it yet.
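
To see why a data-only balance helps with a metadata shortage, it can be useful to look at how full each chunk type is. The following is a minimal sketch that parses the btrfs fi df output quoted above; the regular expression and the helper names (parse_df, fill_ratio) are assumptions made for this example and only handle the output format shown in this thread.

```python
import re

UNITS = {"KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3, "TiB": 1024 ** 4}
LINE = re.compile(r"^(\w+), (\w+): total=([\d.]+)(\w+), used=([\d.]+)(\w+)$")

DF_OUTPUT = """\
Data, RAID1: total=2.51TiB, used=1.58TiB
System, RAID1: total=32.00MiB, used=372.00KiB
Metadata, RAID1: total=48.00GiB, used=47.23GiB
"""

def parse_df(text):
    """Map chunk type -> dict with profile, total and used (both in bytes)."""
    rows = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        kind, profile, total, total_unit, used, used_unit = match.groups()
        rows[kind] = {
            "profile": profile,
            "total": float(total) * UNITS[total_unit],
            "used": float(used) * UNITS[used_unit],
        }
    return rows

def fill_ratio(row):
    """Fraction of the allocated chunks that actually holds content."""
    return row["used"] / row["total"]

rows = parse_df(DF_OUTPUT)
for kind, row in rows.items():
    slack_gib = (row["total"] - row["used"]) / UNITS["GiB"]
    print("%-8s %5.1f%% full, %8.2f GiB allocated but unused"
          % (kind, 100 * fill_ratio(row), slack_gib))
```

Metadata is about 98% full while the data chunks carry close to 1 TiB of allocated but unused space. Rewriting the nearly empty data chunks, which is what the -dusage=5 filter limits the balance to, returns that space to the unallocated pool so new metadata chunks can be created.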

> I'd appreciate some clarifications on this (FYI, it was running
> 3.11.4, upgraded to the latest rc now).

   Hope the above helps,
   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- If it's December 1941 in Casablanca,  what time is it ---  
  in New York?   



no space left, metadata usage almost full?

2013-12-18 Thread Tomasz Chmielewski

I have a btrfs filesystem which has plenty of free space left, yet it's
hitting out of space regularly.

Here is what it looks like:

# btrfs fi df /home
Data, RAID1: total=2.51TiB, used=1.58TiB
System, RAID1: total=32.00MiB, used=372.00KiB
Metadata, RAID1: total=48.00GiB, used=47.23GiB


What I read from it is that we're almost full on metadata usage, and that
might be causing the out of space issues.

Reading past posts on this group, I can see it's recommended to run
this if I hit out of space and the fs is low on metadata space:

# btrfs fi balance start -dusage=5 /home

Is it really the only workaround? Shouldn't the filesystem be more
intelligent and be able to grab some more metadata space if it's
running low?

I'd appreciate some clarifications on this (FYI, it was running
3.11.4, upgraded to the latest rc now).


-- 
Tomasz Chmielewski
http://wpkg.org

Re: btrfs balance on single device

2013-12-18 Thread Leonidas Spyropoulos

On Wed, Dec 18, 2013 at 11:05:29AM +, Hugo Mills wrote:
> On Wed, Dec 18, 2013 at 10:44:43AM +, Leonidas Spyropoulos wrote:
> > I'm using the same subject as it might be relevant, feel free to change it.
> > 
> > I'm trying to do some maintenance to the system running over a btrfs file 
> > system on root (/). I started a balance on the '/' partition and it failed 
> > with the below information:
> > $ sudo btrfs balance start /
> > [sudo] password for inglor:
> > ERROR: error during balancing '/' - No space left on device
> > There may be more info in syslog - try dmesg | tail
> > $ dmesg | tail
> > [93827.115887] btrfs: found 29461 extents
> > [93827.481849] btrfs: relocating block group 29855055872 flags 1
> > [93841.646011] btrfs: found 33171 extents
> > [93851.421207] btrfs: found 33171 extents
> > [93851.782054] btrfs: relocating block group 28781314048 flags 1
> > [93866.815342] btrfs: found 52535 extents
> > [93877.159354] btrfs: found 52534 extents
> > [93877.356805] btrfs: relocating block group 28747759616 flags 34
> > [93880.287185] btrfs: found 1 extents
> > [93880.608798] btrfs: 1 enospc errors during balance
> 
>You don't specify your kernel version, but if it's older than 3.11
> or so, you should probably upgrade -- 3.10 and earlier had occasional
> bugs where the block reserve system never kept enough blocks free to
> add a new metadata chunk when it was needed, which led to exactly this
> kind of symptom.

You are right, apologies. It is an up to date Archlinux box with a kernel:
$ uname -a
Linux tiamat 3.12.5-1-ARCH #1 SMP PREEMPT Thu Dec 12 12:57:31 CET 2013 x86_64 GNU/Linux

> 
>Alternatively, and this is a bit of a long shot given that the
> error seems to have been while relocating your system chunk (which
> argues against this particular diagnosis), but:
> 
>Do you have a large file on that filesystem (larger than 1 GiB)?

Unlikely, since the btrfs file system in question is '/' excluding the /opt and 
/media directories (these are other partitions):
$ sudo find / -type f -size +1048576k -and -not -path "/media*" -print
/proc/kcore
find: `/proc/27221/task/27221/fd/5': No such file or directory
find: `/proc/27221/task/27221/fdinfo/5': No such file or directory
find: `/proc/27221/fd/5': No such file or directory
find: `/proc/27221/fdinfo/5': No such file or directory
find: `/run/user/1000/gvfs': Permission denied
inglor@tiamat ~$

> 
>If so, I would recommend switching to a 3.12 kernel, and running a
> defrag on the file. There's a known and now-fixed bug where you can
> get ENOSPC while balancing, if a file has an extent larger than 1 GiB
> in size. (The bug being that there's an extent over 1 GiB in size in
> the first place).

I might try the defrag option anyway and restart the balance operation to see 
if it helps.

Thanks,
Leonidas

> 
>Hugo.
> 
> -- 
> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
>   PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
>  --- I'd make a joke about UDP,  but I don't know if --- 
>  anyone's actually listening...  

Re: [PATCH] Btrfs: improve the performance fluctuating of the fsync

2013-12-18 Thread Leonidas Spyropoulos

On Wed, Dec 18, 2013 at 06:52:44PM +0800, Miao Xie wrote:
> In order to improve the performance of fsync, we use the outstanding
> ordered extents to avoid looking up the checksum from the csum tree.
> But we didn't filter out the ordered extents whose csum was still being
> calculated, so when we got those ordered extents, we had to wait for the
> csum calculation. This made the performance drop suddenly. (On my box,
> it dropped from 56MB/s to 4-10MB/s.)
> 
> But actually, the csum calculation of the ordered extents which were
> introduced by the current fsync had already completed. The ordered
> extents whose csum was still being calculated didn't belong to the
> current fsync, so we can ignore them.
> 
> With this patch, the performance fluctuation no longer happens, and the
> average performance improves by ~2%.
> [..] 

Will this help with apt-get performance on a btrfs file system? As far as I 
understand, the slowness there is caused by multiple fsync calls.

Re: btrfs balance on single device

2013-12-18 Thread Hugo Mills

On Wed, Dec 18, 2013 at 10:44:43AM +, Leonidas Spyropoulos wrote:
> I'm using the same subject as it might be relevant, feel free to change it.
> 
> I'm trying to do some maintenance to the system running over a btrfs file 
> system on root (/). I started a balance on the '/' partition and it failed 
> with the below information:
> $ sudo btrfs balance start /
> [sudo] password for inglor:
> ERROR: error during balancing '/' - No space left on device
> There may be more info in syslog - try dmesg | tail
> $ dmesg | tail
> [93827.115887] btrfs: found 29461 extents
> [93827.481849] btrfs: relocating block group 29855055872 flags 1
> [93841.646011] btrfs: found 33171 extents
> [93851.421207] btrfs: found 33171 extents
> [93851.782054] btrfs: relocating block group 28781314048 flags 1
> [93866.815342] btrfs: found 52535 extents
> [93877.159354] btrfs: found 52534 extents
> [93877.356805] btrfs: relocating block group 28747759616 flags 34
> [93880.287185] btrfs: found 1 extents
> [93880.608798] btrfs: 1 enospc errors during balance

   You don't specify your kernel version, but if it's older than 3.11
or so, you should probably upgrade -- 3.10 and earlier had occasional
bugs where the block reserve system never kept enough blocks free to
add a new metadata chunk when it was needed, which led to exactly this
kind of symptom.

   Alternatively, and this is a bit of a long shot given that the
error seems to have been while relocating your system chunk (which
argues against this particular diagnosis), but:

   Do you have a large file on that filesystem (larger than 1 GiB)?

   If so, I would recommend switching to a 3.12 kernel, and running a
defrag on the file. There's a known and now-fixed bug where you can
get ENOSPC while balancing, if a file has an extent larger than 1 GiB
in size. (The bug being that there's an extent over 1 GiB in size in
the first place).
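
For anyone wanting to look for candidate files, a sketch along the following lines lists regular files larger than 1 GiB. It is only a first filter: file size is not the same thing as extent size, so a hit merely identifies a file worth defragmenting. The function name large_files and its threshold parameter are made up for this example.

```python
import os
import stat

GIB = 1024 ** 3

def large_files(root, threshold=GIB):
    """Return (path, size) pairs for regular files under root bigger than threshold bytes."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue               # vanished or unreadable entry, skip it
            if stat.S_ISREG(info.st_mode) and info.st_size > threshold:
                found.append((path, info.st_size))
    return found

# Example call (can take a long time on a large tree):
# for path, size in large_files("/"):
#     print("%8.2f GiB  %s" % (size / GIB, path))
```

os.walk silently skips directories it cannot read, and pseudo-filesystems such as /proc can produce entries that are not real on-disk files, so results outside the btrfs mount should be ignored.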

   Hugo.

> $ df |grep sda2
> /dev/sda2   20971520  13980396   5797124  71% /
> 
> 
> $ sudo btrfs fi show
> [sudo] password for inglor:
> Label: none  uuid: 699d671b-7064-441d-95ec-c616049fe287
> Total devices 1 FS bytes used 12.75GB
> devid    1 size 20.00GB used 15.31GB path /dev/sda2
> 
> Btrfs v0.20-rc1-358-g194aa4a-dirty
> 
> $ sudo btrfs fi df /
> [sudo] password for inglor:
> Data: total=13.00GB, used=12.16GB
> System, DUP: total=32.00MB, used=4.00KB
> Metadata, DUP: total=1.12GB, used=601.54MB
> 
> Does it really need more than 5.7GB to do a balance? I thought it was supposed 
> to move chunks one by one, and considering that the chunk size for Data is 1GB 
> and for Metadata 512MB (256MB x2 for duplication), that should be more than enough.
> Also, I had less space before and dmesg reported 7 enospc errors. After 
> cleaning out some installed packages, it now reports only 1 enospc. Is that 
> at all relevant?
> 
> Thanks,
> Leonidas

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- I'd make a joke about UDP,  but I don't know if --- 
 anyone's actually listening...  



[PATCH] Btrfs: improve the performance fluctuating of the fsync

2013-12-18 Thread Miao Xie

In order to improve the performance of fsync, we use the outstanding
ordered extents to avoid looking up the checksum from the csum tree.
But we didn't filter out the ordered extents whose csum was still being
calculated, so when we got those ordered extents, we had to wait for the
csum calculation. This made the performance drop suddenly. (On my box,
it dropped from 56MB/s to 4-10MB/s.)

But actually, the csum calculation of the ordered extents which were
introduced by the current fsync had already completed. The ordered
extents whose csum was still being calculated didn't belong to the
current fsync, so we can ignore them.

With this patch, the performance fluctuation no longer happens, and the
average performance improves by ~2%.
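
The idea can be illustrated with a toy model outside the kernel. The sketch below is not kernel code: OrderedExtent is a stand-in with only the csum_bytes_left field that the patch tests, and get_logged_extents is a simplified, hypothetical counterpart of btrfs_get_logged_extents that ignores locking and the log lists.

```python
from dataclasses import dataclass

@dataclass
class OrderedExtent:
    """Toy stand-in for struct btrfs_ordered_extent."""
    file_offset: int
    csum_bytes_left: int  # bytes whose checksum is still being calculated

def get_logged_extents(ordered_tree):
    """Collect only the extents whose checksums are complete.

    Extents with csum_bytes_left != 0 are skipped instead of being
    collected and waited on later.
    """
    return [oe for oe in ordered_tree if oe.csum_bytes_left == 0]

tree = [
    OrderedExtent(file_offset=0, csum_bytes_left=0),         # finished
    OrderedExtent(file_offset=32768, csum_bytes_left=4096),  # still in flight
    OrderedExtent(file_offset=65536, csum_bytes_left=0),     # finished
]

logged = get_logged_extents(tree)
print([oe.file_offset for oe in logged])  # [0, 65536]
```

Filtering at collection time is what lets the later wait be dropped: nothing that reaches the logging step still has checksums outstanding.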

Test Environment:
CPU:2CPU * 2Cores
Memory: 4GB
Partition:  20GB(HDD)

Test Command:
 # sysbench --num-threads=8 --test=fileio --file-num=1 \
 > --file-total-size=8G --file-block-size=32768 \
 > --file-io-mode=sync --file-fsync-freq=100 \
 > --file-fsync-end=no --max-requests=1 \
 > --file-test-mode=rndwr run

Signed-off-by: Miao Xie 
---
 fs/btrfs/ordered-data.c | 3 +++
 fs/btrfs/tree-log.c | 2 --
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index b8c2ded..df87ed5 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -433,6 +433,9 @@ void btrfs_get_logged_extents(struct btrfs_root *log, struct inode *inode)
 	spin_lock_irq(&tree->lock);
 	for (n = rb_first(&tree->tree); n; n = rb_next(n)) {
 		ordered = rb_entry(n, struct btrfs_ordered_extent, rb_node);
+		if (ordered->csum_bytes_left)
+			continue;
+
 		spin_lock(&log->log_extents_lock[index]);
 		if (list_empty(&ordered->log_list)) {
 			list_add_tail(&ordered->log_list, &log->logged_list[index]);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index ba2f151..3eae2eb 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3631,8 +3631,6 @@ again:
 		 * start over after this.
 		 */
 
-		wait_event(ordered->wait, ordered->csum_bytes_left == 0);
-
 		list_for_each_entry(sum, &ordered->list, list) {
 			ret = btrfs_csum_file_blocks(trans, log, sum);
 			if (ret) {
-- 
1.8.3.1


Re: btrfs balance on single device

2013-12-18 Thread Leonidas Spyropoulos

I'm using the same subject as it might be relevant, feel free to change it.

I'm trying to do some maintenance to the system running over a btrfs file 
system on root (/). I started a balance on the '/' partition and it failed with 
the below information:
$ sudo btrfs balance start /
[sudo] password for inglor:
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail
$ dmesg | tail
[93827.115887] btrfs: found 29461 extents
[93827.481849] btrfs: relocating block group 29855055872 flags 1
[93841.646011] btrfs: found 33171 extents
[93851.421207] btrfs: found 33171 extents
[93851.782054] btrfs: relocating block group 28781314048 flags 1
[93866.815342] btrfs: found 52535 extents
[93877.159354] btrfs: found 52534 extents
[93877.356805] btrfs: relocating block group 28747759616 flags 34
[93880.287185] btrfs: found 1 extents
[93880.608798] btrfs: 1 enospc errors during balance

$ df |grep sda2
/dev/sda2   20971520  13980396   5797124  71% /


$ sudo btrfs fi show
[sudo] password for inglor:
Label: none  uuid: 699d671b-7064-441d-95ec-c616049fe287
Total devices 1 FS bytes used 12.75GB
devid    1 size 20.00GB used 15.31GB path /dev/sda2

Btrfs v0.20-rc1-358-g194aa4a-dirty

$ sudo btrfs fi df /
[sudo] password for inglor:
Data: total=13.00GB, used=12.16GB
System, DUP: total=32.00MB, used=4.00KB
Metadata, DUP: total=1.12GB, used=601.54MB

Does it really need more than 5.7GB to do a balance? I thought it was supposed 
to move chunks one by one, and considering that the chunk size for Data is 1GB 
and for Metadata 512MB (256MB x2 for duplication), that should be more than enough.
Also, I had less space before and dmesg reported 7 enospc errors. After 
cleaning out some installed packages, it now reports only 1 enospc. Is that 
at all relevant?
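
The figures in the outputs above can be cross-checked with a little arithmetic. This is a sketch under one assumption: a df line without a profile name ("Data:") is taken to mean a single copy, while DUP lines count twice on disk. All numbers are in GB as printed, and the variable names are illustrative.

```python
# (chunk type, "total=" from `btrfs fi df /` in GB, copies kept on disk)
chunks = [
    ("Data", 13.00, 1),            # no profile printed: assumed single copy
    ("System", 32.00 / 1024, 2),   # DUP: two copies on the same device
    ("Metadata", 1.12, 2),         # DUP
]
DEVICE_SIZE = 20.00    # "size 20.00GB" in `btrfs fi show`
REPORTED_USED = 15.31  # "used 15.31GB" in `btrfs fi show`

raw_allocated = sum(total * copies for _, total, copies in chunks)
unallocated = DEVICE_SIZE - REPORTED_USED

print("raw space allocated to chunks: %.2f GB" % raw_allocated)
print("space not allocated to chunks: %.2f GB" % unallocated)
```

The chunk totals add up to about 15.30 GB, matching the 15.31GB that btrfs fi show reports as used on the device, which leaves roughly 4.69 GB not yet allocated to any chunk. That is less than the 5.7GB that df shows as available, because df also counts free space inside chunks that are already allocated.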

Thanks,
Leonidas

[PATCH v2] Btrfs-progs: receive: fix the case that we can not find the subvolume

2013-12-18 Thread Wang Shilong

If we change our default subvolume, btrfs receive will fail to find the
subvolume. To fix this problem, we have three ideas:

 1. make the btrfs snapshot ioctl support passing the source subvolume's
 objectid.
 2. when we want to use an internal subvolume path, mount it at another
 place that uses subvolume 5 as its default subvolume.
 3. tell the user to mount the toplevel subvol himself and run receive.

We'd better use the third approach, because the first requires a kernel
change and the second is not very good for power users. So give this
option to users.
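
The error handling chosen for the third approach boils down to branching on the errno of the failed open. The sketch below mirrors that logic in Python purely for illustration; it is not the btrfs-progs code. The helper names are hypothetical, the hint text is taken from the patch, and the O_NOATIME flag used by the C code is left out.

```python
import errno
import os

# Hint text as it appears in the patch.
HINT = (
    "It seems that you have changed your default subvolume or you specify "
    "other subvolume to mount btrfs, try to remount this btrfs filesystem "
    "with fs tree, and run btrfs receive again!"
)

def describe_open_failure(path, err):
    """Choose the message the way the patch does: a hint for ENOENT, the raw error otherwise."""
    if err != errno.ENOENT:
        return "ERROR: open %s failed. %s" % (path, os.strerror(err))
    return HINT

def open_parent_subvol(path):
    """Return (fd, None) on success or (None, message) on failure."""
    try:
        return os.open(path, os.O_RDONLY), None
    except OSError as exc:
        return None, describe_open_failure(path, exc.errno)

# A path that does not exist triggers the remount hint.
fd, message = open_parent_subvol("/nonexistent/hypothetical/parent-subvol")
print(message)
```

Any error other than ENOENT still produces the original "open failed" message, so unrelated failures such as permission problems are not misreported as a default-subvolume issue.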

Reported-by: Michael Welsh Duggan 
Signed-off-by: Wang Shilong 
Signed-off-by: Miao Xie 
---
Changelog:
v1->v2:
addressed david's comments and use the third approach to fix the problem
---
 cmds-receive.c | 11 +--
 man/btrfs.8.in | 15 ++-
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index ed44107..cce37a7 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -257,8 +257,15 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
 			O_RDONLY | O_NOATIME);
 	if (args_v2.fd < 0) {
 		ret = -errno;
-		fprintf(stderr, "ERROR: open %s failed. %s\n",
-			parent_subvol->path, strerror(-ret));
+		if (errno != ENOENT)
+			fprintf(stderr, "ERROR: open %s failed. %s\n",
+				parent_subvol->path, strerror(-ret));
+		else
+			fprintf(stderr,
+				"It seems that you have changed your default "
+				"subvolume or you specify other subvolume to\n"
+				"mount btrfs, try to remount this btrfs filesystem "
+				"with fs tree, and run btrfs receive again!\n");
 		goto out;
 	}
 
diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 901caa5..ece6a5a 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -668,11 +668,16 @@ Receive subvolumes from stdin.
 Receives one or more subvolumes that were previously
 sent with btrfs send. The received subvolumes are stored
 into \fI\fP.
-btrfs receive will fail in case a receiving subvolume
-already exists. It will also fail in case a previously
-received subvolume was changed after it was received.
-After receiving a subvolume, it is immediately set to
-read only.
+btrfs receive will fail with the following case:
+
+1.a receiving subvolume already exists.
+
+2.a previously received subvolume was changed after it was received.
+
+3.default subvolume is changed or you don't mount btrfs filesystem with
+fs tree.
+
+After receiving a subvolume, it is immediately set to read only.
 .RS
 
 \fIOptions\fR
-- 
1.8.3.1


Re: Btrfs RAID1 File System Grew Something Extra

2013-12-18 Thread Duncan

Garry T. Williams posted on Tue, 17 Dec 2013 23:12:25 -0500 as excerpted:

> On 12-18-13 10:46:29 Anand Jain wrote:
>> On 12/18/2013 10:03 AM, Garry T. Williams wrote:
>> > I have been using btrfs for my /home partition on my home machine for
>> > a few years now.  I created the file system RAID1 using two disk
>> > partitions.  Recently I noticed btrfs fi df shows extra Data, System,
>> > and Metadata allocations.
>>
>>   this is a known bug in mkfs.btrfs, the workaround for now is to run
>>   balance on FS having some data. so that unused group-
>>   profile will go away.
> 
> Thanks.
> 
> garry@vfr$ sudo btrfs balance start /home
> Done, had to relocate 50 out of 50 chunks
> garry@vfr$ sudo btrfs filesystem df /home
> Data, RAID1: total=22.00GiB, used=21.02GiB
> System, RAID1: total=32.00MiB, used=12.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, RAID1: total=1.00GiB, used=419.60MiB
> 
> Hmmm.
> 
> Well, it's better, but the extra allocation for System is baffling.  I
> believe that this happened sometime after creating the file system.

Keep in mind that btrfs remains under development, still improving old 
features and growing new ones as well as bugfixing (and of course 
unfortunately still adding new bugs with the new code occasionally, it 
comes with the development filesystem territory).

Having seen the same thing happen here, I think the extra allocations 
were there all the time, but simply weren't originally reported.  After 
some improvements in btrfs fi df, it more accurately reported the empty 
chunk-stub relics of mkfs.btrfs where it didn't before, so they appeared 
to be new even if they'd been there all the time.  But a balance normally 
does remove them.

Tho that doesn't explain why the balance didn't remove that 4 MiB single-
mode system stub.  It did on all my btrfs here.  But I run gentoo and 
build/install gentoo's live-git btrfs-progs, and build/run the mainline 
development kernel from live-git as well, so I'm well into the 3.13-rcs 
by now, while you haven't even upgraded to 3.12 yet and are still on 3.11-
stable series, which might account for that.  Or perhaps another balance 
would kill the system-stub as well?  I don't know.

> Also balance on a RAID1 file system with exactly two drives doesn't make
> much sense to me.  Why would any "chunks" have to be relocated? I'm
> clearly missing something here.

You haven't read up on how btrfs balance works at the wiki, have you?  
Which means you're probably missing other information that might be 
helpful in administering your btrfs as well.  It'll likely be worth your 
while to spend some time reading the user documentation there (and to 
bookmark it for further reference, too =:^) :

https://btrfs.wiki.kernel.org/

For balance in particular, see:

https://btrfs.wiki.kernel.org/index.php/FAQ#What_does_.22balance.22_do.3F

Also of interest:

https://btrfs.wiki.kernel.org/index.php/Balance_Filters

Of course, the btrfs manpage is also useful.

Basically, unless you limit it with the -d/-m/-s switches and/or filters, 
balance blindly rewrites/relocates every chunk on the filesystem, 
cleaning up and if applicable converting between redundancy types as it 
does so.  So all chunks are relocated/rewritten.

But the above documentation should also suggest trying this to see if it 
addresses that remaining single-mode system chunk stub:

btrfs balance start -fsconvert=raid1 /home

Especially if you're on spinning rust (not SSD), that should take quite a 
bit less time than a full balance as well, because you're only rebalancing 
the few MiB of system chunks, not the GiBs of data and metadata.

Hopefully that'll kill the single-mode system stub-chunk.  If not, you've 
probably hit a bug and should report it as such, tho you might wish to 
try it with the latest 3.12 stable or 3.13-rc first, in case the bug has 
already been fixed.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Re: Btrfs RAID1 File System Grew Something Extra

2013-12-18 Thread Hugo Mills

On Tue, Dec 17, 2013 at 11:12:25PM -0500, Garry T. Williams wrote:
> On 12-18-13 10:46:29 Anand Jain wrote:
> > On 12/18/2013 10:03 AM, Garry T. Williams wrote:
> > > I have been using btrfs for my /home partition on my home machine for
> > > a few years now.  I created the file system RAID1 using two disk
> > > partitions.  Recently I noticed btrfs fi df shows extra Data, System,
> > > and Metadata allocations.  And btrfs fi show indicates extra
> > > allocations on one of my disk drives accounting for the 20 MiB
> > > allocation in the df display.
> >
> >   this is a known bug in mkfs.btrfs, the workaround for now is
> >   to run balance on FS having some data. so that unused group-
> >   profile will go away.
> 
> Thanks.
> 
> garry@vfr$ sudo btrfs balance start /home
> Done, had to relocate 50 out of 50 chunks
> garry@vfr$ sudo btrfs filesystem df /home
> Data, RAID1: total=22.00GiB, used=21.02GiB
> System, RAID1: total=32.00MiB, used=12.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, RAID1: total=1.00GiB, used=419.60MiB
> 
> Hmmm.
> 
> Well, it's better, but the extra allocation for System is baffling.  I
> believe that this happened sometime after creating the file system.

   It won't be spontaneously created -- it'll have been there since
the beginning. The first system chunk is "special" and is skipped
during balances, so it won't get cleaned up like this.

> Also balance on a RAID1 file system with exactly two drives doesn't
> make much sense to me.  Why would any "chunks" have to be relocated?
> I'm clearly missing something here.

   That's what balance does -- it rewrites every single piece of data
on the filesystem. In this case, you could have used a filter to
balance (and hence remove) only the single chunks:

btrfs balance start -mprofiles=single -dprofiles=single -sprofiles=single /mountpoint
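
   As a rough illustration of the filter idea above, here is a small Python sketch that reads `btrfs filesystem df` output and works out which chunk types still carry a stray "single" profile next to another profile, then builds the matching `profiles=single` filters. The function names (`stray_single_types`, `balance_filters`) are made up for this example, and the parsing assumes the output format quoted earlier in this thread. It only builds the filter strings and does not run anything. Note also that btrfs-progs, as far as I know, refuses to act on system chunks via `-s` unless `-f` is also given, so check `btrfs balance start --help` on your version first.

```python
import re

# Matches the start of lines such as "System, single: total=4.00MiB, used=0.00"
# (format assumed from the `btrfs filesystem df` output quoted in this thread).
LINE_RE = re.compile(r"^(Data|System|Metadata), (\w+):")

# Balance filter letter for each chunk type.
FLAG = {"Data": "d", "Metadata": "m", "System": "s"}


def stray_single_types(df_output):
    """Return chunk types that list a 'single' profile alongside another one."""
    profiles = {}
    for line in df_output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            profiles.setdefault(m.group(1), []).append(m.group(2))
    return [t for t, p in profiles.items() if "single" in p and len(p) > 1]


def balance_filters(types):
    """Build the profiles=single filter option for each given chunk type."""
    return ["-%sprofiles=single" % FLAG[t] for t in types]


# Sample taken from the output quoted above.
SAMPLE = """\
Data, RAID1: total=22.00GiB, used=21.02GiB
System, RAID1: total=32.00MiB, used=12.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID1: total=1.00GiB, used=419.60MiB
"""

if __name__ == "__main__":
    print(" ".join(balance_filters(stray_single_types(SAMPLE))))
```

   On the sample above, only the System type has both a RAID1 and a single entry, so that is the only filter the sketch produces.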

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- One of these days, I'll catch that man without a quotation, ---   
and he'll look undressed.


