[PATCH] Btrfs: fix reada debug code compilation

2013-04-16 Thread Vincent
This fixes the following errors:

  fs/btrfs/reada.c: In function ‘btrfs_reada_wait’:
  fs/btrfs/reada.c:958:42: error: invalid operands to binary < (have ‘atomic_t’ and ‘int’)
  fs/btrfs/reada.c:961:41: error: invalid operands to binary < (have ‘atomic_t’ and ‘int’)
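
The root cause is simply that atomic_t is an opaque type whose value has to be
fetched with atomic_read() before it can be compared with a plain int. A
minimal, hypothetical illustration (not the btrfs code itself):

	static atomic_t elems;

	static int too_few(void)
	{
		/* "return elems < 10;" fails: invalid operands to binary < */
		return atomic_read(&elems) < 10;	/* read the counter first */
	}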

Signed-off-by: Vincent Stehlé vincent.ste...@laposte.net
Cc: Chris Mason chris.ma...@fusionio.com
Cc: linux-btrfs@vger.kernel.org
---
 fs/btrfs/reada.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index 692185e..d1690c3 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -955,10 +955,11 @@ int btrfs_reada_wait(void *handle)
 	while (atomic_read(&rc->elems)) {
 		wait_event_timeout(rc->wait, atomic_read(&rc->elems) == 0,
 				   5 * HZ);
-		dump_devs(rc->root->fs_info, rc->elems < 10 ? 1 : 0);
+		dump_devs(rc->root->fs_info,
+			  atomic_read(&rc->elems) < 10 ? 1 : 0);
 	}
 
-	dump_devs(rc->root->fs_info, rc->elems < 10 ? 1 : 0);
+	dump_devs(rc->root->fs_info, atomic_read(&rc->elems) < 10 ? 1 : 0);
 
 	kref_put(&rc->refcnt, reada_control_release);
 
-- 
1.7.10.4



Re: Activating space_cache after read-only snapshots without space_cache have been taken

2013-04-16 Thread Sander
Liu Bo wrote (ao):
 On Tue, Apr 16, 2013 at 02:28:51AM +0200, Ochi wrote:
  The situation is the following: I have created a backup-volume to
  which I regularly rsync a backup of my system into a subvolume.
  After rsync'ing, I take a _read-only_ snapshot of that subvolume
  with a timestamp added to its name.
  
  Now at the time I started using this backup volume, I was _not_
  using the space_cache mount option and two read-only snapshots were
  taken during this time. Then I started using the space_cache option
  and continued doing snapshots.
  
  A bit later, I started having very long lags when unmounting the
  backup volume (both during shutdown and when unmounting manually). I
  scrubbed and fsck'd the volume but this didn't show any errors.
  Defragmenting the root and subvolumes took a long time but didn't
  improve the situation much.
 
 So are you using '-o nospace_cache' when creating two RO snapshots?

No, he first created two ro snapshots, then (some time later) mounted
with nospace_cache, and then continued to take ro snapshots.

  Now I started having the suspicion that maybe the space cache
  possibly couldn't be written to disk for the readonly
  subvolumes/snapshots that were created during the time when I wasn't
  using the space_cache option, forcing the cache to be rebuilt every
  time.
  
  Clearing the cache didn't help. But when I deleted the two snapshots
  that I think were taken during the time without the mount option,
  the unmounting time seems to have improved considerably.
 
 I don't know why this happens, but maybe you can observe the umount
 process's very slow behaviour by using 'cat /proc/{umount-pid}/stack'
 or 'perf top'.

AFAIUI the problem is not there anymore, but this is a good tip for the
future.

Sander


[PATCH] Btrfs: return error when we specify wrong start

2013-04-16 Thread Liu Bo
We need such a sanity check for wrong start, otherwise, even with
a wrong start that's larger than file size, we can end up not only
changing inode's force compress flag but also FS's incompat flags.
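
For reference, a hedged user-space sketch of the case this patch now rejects;
BTRFS_IOC_DEFRAG_RANGE and struct btrfs_ioctl_defrag_range_args are the
existing uapi interface, while the mount point and file name below are made up:

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/btrfs.h>

	int main(void)
	{
		struct btrfs_ioctl_defrag_range_args range;
		int fd = open("/mnt/btrfs/somefile", O_RDWR);	/* hypothetical path */

		if (fd < 0)
			return 1;
		memset(&range, 0, sizeof(range));
		range.start = 1ULL << 40;	/* far beyond the file size */
		range.len = (__u64)-1;
		if (ioctl(fd, BTRFS_IOC_DEFRAG_RANGE, &range) < 0)
			perror("defrag");	/* with this patch: EINVAL */
		return 0;
	}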

Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/ioctl.c |   11 +++
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 69cd80d..262d9db 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1152,8 +1152,11 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 	u64 new_align = ~((u64)128 * 1024 - 1);
 	struct page **pages = NULL;
 
-	if (extent_thresh == 0)
-		extent_thresh = 256 * 1024;
+	if (isize == 0)
+		return 0;
+
+	if (range->start >= isize)
+		return -EINVAL;
 
 	if (range->flags & BTRFS_DEFRAG_RANGE_COMPRESS) {
 		if (range->compress_type > BTRFS_COMPRESS_TYPES)
@@ -1162,8 +1165,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 			compress_type = range->compress_type;
 	}
 
-	if (isize == 0)
-		return 0;
+	if (extent_thresh == 0)
+		extent_thresh = 256 * 1024;
 
 	/*
 	 * if we were not given a file, allocate a readahead
-- 
1.7.7



[PATCH] Btrfs-progs: record errno for ioctl DEFRAG_RANGE

2013-04-16 Thread Liu Bo
In order to report the exact error message, we need to record errno here.
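
The pattern being fixed is the usual one where errno can be clobbered by a
later call before the error is finally printed. A hedged, stand-alone sketch of
that idea (not the btrfs-progs code itself):

	#include <errno.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	static int do_one(int fd)
	{
		int ret = write(fd, "x", 1);	/* stand-in for the DEFRAG_RANGE ioctl */
		int e = errno;			/* capture it before close() can change it */

		close(fd);			/* may itself fail and reset errno */
		if (ret < 0)
			fprintf(stderr, "ERROR: operation failed - %s\n", strerror(e));
		return ret;
	}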

Signed-off-by: Liu Bo bo.li@oracle.com
---
 cmds-filesystem.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 2210020..3f386e2 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -431,6 +431,7 @@ static int cmd_defrag(int argc, char **argv)
close(fd);
break;
}
+   e = errno;
}
if (ret) {
 		fprintf(stderr, "ERROR: defrag failed on %s - %s\n",
-- 
1.7.7



Re: [PATCH] Btrfs: return error when we specify wrong start

2013-04-16 Thread Jan Schmidt
On Tue, April 16, 2013 at 10:40 (+0200), Liu Bo wrote:
 We need such a sanity check for wrong start, otherwise, even with
 a wrong start that's larger than file size, we can end up not only
 changing inode's force compress flag but also FS's incompat flags.

That reads out very cryptic. Can you please add something hinting at defrag to
the title or at least the description?

Thanks,
-Jan


[PATCH v2 0/3] Btrfs: quota rescan for 3.10

2013-04-16 Thread Jan Schmidt
The kernel side for rescan, which is needed if you want to enable qgroup
tracking on a non-empty volume. The first patch splits
btrfs_qgroup_account_ref into readable and reusable units. The second
patch adds the rescan implementation (refer to its commit message for a
description of the algorithm). The third patch starts an automatic
rescan when qgroups are enabled. It is only separated to potentially
help bisecting things in case of a problem.

The required user space patch was sent at 2013-04-05, subject [PATCH]
Btrfs-progs: quota rescan.
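
For orientation, a hedged sketch of how user space drives this from the ioctl
side; the ioctl names and struct btrfs_ioctl_quota_rescan_args are, as far as I
can tell, what this series adds to include/uapi/linux/btrfs.h, and the mount
point below is made up:

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/btrfs.h>

	int main(void)
	{
		struct btrfs_ioctl_quota_rescan_args args;
		int fd = open("/mnt/btrfs", O_RDONLY);	/* hypothetical mount point */

		if (fd < 0)
			return 1;

		memset(&args, 0, sizeof(args));
		if (ioctl(fd, BTRFS_IOC_QUOTA_RESCAN, &args) < 0)
			perror("quota rescan start");

		memset(&args, 0, sizeof(args));
		if (ioctl(fd, BTRFS_IOC_QUOTA_RESCAN_STATUS, &args) == 0)
			printf("running: %llu  progress: %llu\n",
			       (unsigned long long)args.flags,
			       (unsigned long long)args.progress);
		return 0;
	}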

--
Changes v1->v2:
- fix calculation of the exclusive field for qgroups in level != 0
- split btrfs_qgroup_account_ref
- take into account that mutex_unlock might schedule
- fix kzalloc error checking
- add some reserved ints to struct btrfs_ioctl_quota_rescan_args
- changed modification to unused #define BTRFS_QUOTA_CTL_RESCAN
- added missing (unsigned long long) casts for pr_debug
- more detailed commit messages

Jan Schmidt (3):
  Btrfs: split btrfs_qgroup_account_ref into four functions
  Btrfs: rescan for qgroups
  Btrfs: automatic rescan after quota enable command

 fs/btrfs/ctree.h   |   17 +-
 fs/btrfs/disk-io.c |6 +
 fs/btrfs/ioctl.c   |   83 ++--
 fs/btrfs/qgroup.c  |  517 +++-
 include/uapi/linux/btrfs.h |   12 +-
 5 files changed, 509 insertions(+), 126 deletions(-)



[PATCH v2 2/3] Btrfs: rescan for qgroups

2013-04-16 Thread Jan Schmidt
If qgroup tracking is out of sync, a rescan operation can be started. It
iterates the complete extent tree and recalculates all qgroup tracking data.
This is an expensive operation and should not be used unless required.

A filesystem under rescan can still be umounted. The rescan continues on the
next mount.  Status information is provided with a separate ioctl while a
rescan operation is in progress.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/ctree.h   |   17 ++-
 fs/btrfs/disk-io.c |6 +
 fs/btrfs/ioctl.c   |   83 ++--
 fs/btrfs/qgroup.c  |  295 +--
 include/uapi/linux/btrfs.h |   12 ++-
 5 files changed, 378 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0d82922..bd4e2a7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1019,9 +1019,9 @@ struct btrfs_block_group_item {
  */
 #define BTRFS_QGROUP_STATUS_FLAG_ON		(1ULL << 0)
 /*
- * SCANNING is set during the initialization phase
+ * RESCAN is set during the initialization phase
  */
-#define BTRFS_QGROUP_STATUS_FLAG_SCANNING	(1ULL << 1)
+#define BTRFS_QGROUP_STATUS_FLAG_RESCAN		(1ULL << 1)
 /*
  * Some qgroup entries are known to be out of date,
  * either because the configuration has changed in a way that
@@ -1050,7 +1050,7 @@ struct btrfs_qgroup_status_item {
 * only used during scanning to record the progress
 * of the scan. It contains a logical address
 */
-   __le64 scan;
+   __le64 rescan;
 } __attribute__ ((__packed__));
 
 struct btrfs_qgroup_info_item {
@@ -1587,6 +1587,11 @@ struct btrfs_fs_info {
/* used by btrfs_qgroup_record_ref for an efficient tree traversal */
u64 qgroup_seq;
 
+   /* qgroup rescan items */
+   struct mutex qgroup_rescan_lock; /* protects the progress item */
+   struct btrfs_key qgroup_rescan_progress;
+   struct btrfs_workers qgroup_rescan_workers;
+
/* filesystem state */
unsigned long fs_state;
 
@@ -2864,8 +2869,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_version, struct 
btrfs_qgroup_status_item,
   version, 64);
 BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item,
   flags, 64);
-BTRFS_SETGET_FUNCS(qgroup_status_scan, struct btrfs_qgroup_status_item,
-  scan, 64);
+BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item,
+  rescan, 64);
 
 /* btrfs_qgroup_info_item */
 BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item,
@@ -3784,7 +3789,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
   struct btrfs_fs_info *fs_info);
 int btrfs_quota_disable(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info);
-int btrfs_quota_rescan(struct btrfs_fs_info *fs_info);
+int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
 int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
  struct btrfs_fs_info *fs_info, u64 src, u64 dst);
 int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6d19a0a..60d15fe 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2192,6 +2192,7 @@ int open_ctree(struct super_block *sb,
 	fs_info->qgroup_seq = 1;
 	fs_info->quota_enabled = 0;
 	fs_info->pending_quota_state = 0;
+	mutex_init(&fs_info->qgroup_rescan_lock);
 
 	btrfs_init_free_cluster(&fs_info->meta_alloc_cluster);
 	btrfs_init_free_cluster(&fs_info->data_alloc_cluster);
@@ -2394,6 +2395,8 @@ int open_ctree(struct super_block *sb,
 	btrfs_init_workers(&fs_info->readahead_workers, "readahead",
 			   fs_info->thread_pool_size,
 			   &fs_info->generic_worker);
+	btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1,
+			   &fs_info->generic_worker);
 
/*
 * endios are largely parallel and should have a very
@@ -2428,6 +2431,7 @@ int open_ctree(struct super_block *sb,
 	ret |= btrfs_start_workers(&fs_info->caching_workers);
 	ret |= btrfs_start_workers(&fs_info->readahead_workers);
 	ret |= btrfs_start_workers(&fs_info->flush_workers);
+	ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers);
if (ret) {
err = -ENOMEM;
goto fail_sb_buffer;
@@ -2773,6 +2777,7 @@ fail_sb_buffer:
btrfs_stop_workers(fs_info-delayed_workers);
btrfs_stop_workers(fs_info-caching_workers);
btrfs_stop_workers(fs_info-flush_workers);
+   btrfs_stop_workers(fs_info-qgroup_rescan_workers);
 fail_alloc:
 fail_iput:
btrfs_mapping_tree_free(fs_info-mapping_tree);
@@ -3463,6 +3468,7 @@ int close_ctree(struct btrfs_root *root)
btrfs_stop_workers(fs_info-caching_workers);

[PATCH v2 3/3] Btrfs: automatic rescan after quota enable command

2013-04-16 Thread Jan Schmidt
When qgroup tracking is enabled, we do an automatic cycle of the new rescan
mechanism.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index bb081b5..0ea2c3e 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1356,10 +1356,14 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_root *quota_root = fs_info->quota_root;
 	int ret = 0;
+	int start_rescan_worker = 0;
 
 	if (!quota_root)
 		goto out;
 
+	if (!fs_info->quota_enabled && fs_info->pending_quota_state)
+		start_rescan_worker = 1;
+
 	fs_info->quota_enabled = fs_info->pending_quota_state;
 
 	spin_lock(&fs_info->qgroup_lock);
@@ -1385,6 +1389,12 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 	if (ret)
 		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 
+	if (start_rescan_worker) {
+		ret = btrfs_qgroup_rescan(fs_info);
+		if (ret)
+			pr_err("btrfs: start rescan quota failed: %d\n", ret);
+	}
+
 out:
 
 	return ret;
-- 
1.7.1



[PATCH v2 1/3] Btrfs: split btrfs_qgroup_account_ref into four functions

2013-04-16 Thread Jan Schmidt
The function is separated into a preparation part and the three accounting
steps mentioned in the qgroups documentation. The goal is to make steps two
and three usable by the rescan functionality. A side effect is that the
function is restructured into readable subunits.
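
At the heart of the three steps is a small sequence-number trick: step 1 stamps
every qgroup reachable from the old roots with a refcnt derived from a fresh
seq, so the later steps can tell "visited in step 1" from "not visited" without
resetting anything between runs. A hedged, stand-alone sketch of just that
marking idea (plain C, not the kernel code):

	struct qgroup {
		unsigned long long refcnt;	/* last-visit marker, relative to seq */
	};

	/* step-1 style marking: the first visit raises refcnt above seq,
	 * further visits in the same pass count how many roots reach it */
	static void mark_visited(struct qgroup *qg, unsigned long long seq)
	{
		if (qg->refcnt < seq)
			qg->refcnt = seq + 1;
		else
			++qg->refcnt;
	}

	/* step-2 style test: still below seq means step 1 never touched it */
	static int visited_in_step1(const struct qgroup *qg, unsigned long long seq)
	{
		return qg->refcnt >= seq;
	}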

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |  212 ++---
 1 files changed, 121 insertions(+), 91 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index b44124d..c38a0c5 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1075,6 +1075,122 @@ int btrfs_qgroup_record_ref(struct btrfs_trans_handle 
*trans,
return 0;
 }
 
+static void qgroup_account_ref_step1(struct btrfs_fs_info *fs_info,
+				     struct ulist *roots, struct ulist *tmp,
+				     u64 seq)
+{
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+	struct ulist_node *tmp_unode;
+	struct ulist_iterator tmp_uiter;
+	struct btrfs_qgroup *qg;
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(roots, &uiter))) {
+		qg = find_qgroup_rb(fs_info, unode->val);
+		if (!qg)
+			continue;
+
+		ulist_reinit(tmp);
+						/* XXX id not needed */
+		ulist_add(tmp, qg->qgroupid, (u64)(uintptr_t)qg, GFP_ATOMIC);
+		ULIST_ITER_INIT(&tmp_uiter);
+		while ((tmp_unode = ulist_next(tmp, &tmp_uiter))) {
+			struct btrfs_qgroup_list *glist;
+
+			qg = (struct btrfs_qgroup *)(uintptr_t)tmp_unode->aux;
+			if (qg->refcnt < seq)
+				qg->refcnt = seq + 1;
+			else
+				++qg->refcnt;
+
+			list_for_each_entry(glist, &qg->groups, next_group) {
+				ulist_add(tmp, glist->group->qgroupid,
+					  (u64)(uintptr_t)glist->group,
+					  GFP_ATOMIC);
+			}
+		}
+	}
+}
+
+static void qgroup_account_ref_step2(struct btrfs_fs_info *fs_info,
+				     struct ulist *roots, struct ulist *tmp,
+				     u64 seq, int sgn, u64 num_bytes,
+				     struct btrfs_qgroup *qgroup)
+{
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+	struct btrfs_qgroup *qg;
+	struct btrfs_qgroup_list *glist;
+
+	ulist_reinit(tmp);
+	ulist_add(tmp, qgroup->qgroupid, (uintptr_t)qgroup, GFP_ATOMIC);
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(tmp, &uiter))) {
+
+		qg = (struct btrfs_qgroup *)(uintptr_t)unode->aux;
+		if (qg->refcnt < seq) {
+			/* not visited by step 1 */
+			qg->rfer += sgn * num_bytes;
+			qg->rfer_cmpr += sgn * num_bytes;
+			if (roots->nnodes == 0) {
+				qg->excl += sgn * num_bytes;
+				qg->excl_cmpr += sgn * num_bytes;
+			}
+			qgroup_dirty(fs_info, qg);
+		}
+		WARN_ON(qg->tag >= seq);
+		qg->tag = seq;
+
+		list_for_each_entry(glist, &qg->groups, next_group) {
+			ulist_add(tmp, glist->group->qgroupid,
+				  (uintptr_t)glist->group, GFP_ATOMIC);
+		}
+	}
+}
+
+static void qgroup_account_ref_step3(struct btrfs_fs_info *fs_info,
+				     struct ulist *roots, struct ulist *tmp,
+				     u64 seq, int sgn, u64 num_bytes)
+{
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+	struct btrfs_qgroup *qg;
+	struct ulist_node *tmp_unode;
+	struct ulist_iterator tmp_uiter;
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(roots, &uiter))) {
+		qg = find_qgroup_rb(fs_info, unode->val);
+		if (!qg)
+			continue;
+
+		ulist_reinit(tmp);
+		ulist_add(tmp, qg->qgroupid, (uintptr_t)qg, GFP_ATOMIC);
+		ULIST_ITER_INIT(&tmp_uiter);
+		while ((tmp_unode = ulist_next(tmp, &tmp_uiter))) {
+			struct btrfs_qgroup_list *glist;
+
+			qg = (struct btrfs_qgroup *)(uintptr_t)tmp_unode->aux;
+			if (qg->tag == seq)
+				continue;
+
+			if (qg->refcnt - seq == roots->nnodes) {
+				qg->excl -= sgn * num_bytes;
+				qg->excl_cmpr -= sgn * num_bytes;
+				qgroup_dirty(fs_info, qg);
+			}

Re: [PATCH v2 1/3] Btrfs: split btrfs_qgroup_account_ref into four functions

2013-04-16 Thread Wang Shilong
Hello Jan,

 The function is separated into a preparation part and the three accounting
 steps mentioned in the qgroups documentation. The goal is to make steps two
 and three usable by the rescan functionality. A side effect is that the
 function is restructured into readable subunits.


How about renaming the three functions like:

1 qgroup_walk_old_roots()
2 qgroup_walk_new_root()
3 qgroup_rewalk_old_root()

I'd like this function to be meaningful, but not just step1,2,3.
Maybe you can think out better function name.

Thanks,
Wang

 

Re: [PATCH] Btrfs: return error when we specify wrong start

2013-04-16 Thread Liu Bo
On Tue, Apr 16, 2013 at 10:44:33AM +0200, Jan Schmidt wrote:
 On Tue, April 16, 2013 at 10:40 (+0200), Liu Bo wrote:
  We need such a sanity check for wrong start, otherwise, even with
  a wrong start that's larger than file size, we can end up not only
  changing inode's force compress flag but also FS's incompat flags.
 
 That reads out very cryptic. Can you please add something hinting at defrag 
 to
 the title or at least the description?

Oh yeah, my bad.

thanks,
liubo

 
 Thanks,
 -Jan


[PATCH v2] Btrfs: return error when we specify wrong start to defrag

2013-04-16 Thread Liu Bo
We need such a sanity check for wrong start when we defrag a file, otherwise,
even with a wrong start that's larger than file size, we can end up changing
not only inode's force compress flag but also FS's incompat flags.

Signed-off-by: Liu Bo bo.li@oracle.com
---
v2: make the changelog clearer.

 fs/btrfs/ioctl.c |   11 +++
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 69cd80d..262d9db 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1152,8 +1152,11 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 	u64 new_align = ~((u64)128 * 1024 - 1);
 	struct page **pages = NULL;
 
-	if (extent_thresh == 0)
-		extent_thresh = 256 * 1024;
+	if (isize == 0)
+		return 0;
+
+	if (range->start >= isize)
+		return -EINVAL;
 
 	if (range->flags & BTRFS_DEFRAG_RANGE_COMPRESS) {
 		if (range->compress_type > BTRFS_COMPRESS_TYPES)
@@ -1162,8 +1165,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 			compress_type = range->compress_type;
 	}
 
-	if (isize == 0)
-		return 0;
+	if (extent_thresh == 0)
+		extent_thresh = 256 * 1024;
 
 	/*
 	 * if we were not given a file, allocate a readahead
-- 
1.7.7



Re: [PATCH v2 2/3] Btrfs: rescan for qgroups

2013-04-16 Thread Wang Shilong
Hello, Jan

 If qgroup tracking is out of sync, a rescan operation can be started. It
 iterates the complete extent tree and recalculates all qgroup tracking data.
 This is an expensive operation and should not be used unless required.
 
 A filesystem under rescan can still be umounted. The rescan continues on the
 next mount.  Status information is provided with a separate ioctl while a
 rescan operation is in progress.
 
 Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
   

[PATCH] btrfs-progs: delete unused function btrfs_read_super_device

2013-04-16 Thread Anand Jain
Signed-off-by: Anand Jain anand.j...@oracle.com
---
 volumes.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/volumes.c b/volumes.c
index b555ded..7a9b6f0 100644
--- a/volumes.c
+++ b/volumes.c
@@ -1643,15 +1643,6 @@ static int read_one_dev(struct btrfs_root *root,
return ret;
 }
 
-int btrfs_read_super_device(struct btrfs_root *root, struct extent_buffer *buf)
-{
-   struct btrfs_dev_item *dev_item;
-
-   dev_item = (struct btrfs_dev_item *)offsetof(struct btrfs_super_block,
-dev_item);
-   return read_one_dev(root, buf, dev_item);
-}
-
 int btrfs_read_sys_array(struct btrfs_root *root)
 {
 	struct btrfs_super_block *super_copy = root->fs_info->super_copy;
-- 
1.8.1.227.g44fe835



Re: [PATCH v2 1/3] Btrfs: split btrfs_qgroup_account_ref into four functions

2013-04-16 Thread Jan Schmidt
On Tue, April 16, 2013 at 11:20 (+0200), Wang Shilong wrote:
 Hello Jan,
 
 The function is separated into a preparation part and the three accounting
 steps mentioned in the qgroups documentation. The goal is to make steps two
 and three usable by the rescan functionality. A side effect is that the
 function is restructured into readable subunits.
 
 
 How about renaming the three functions like:
 
 1 qgroup_walk_old_roots()
 2 qgroup_walk_new_root()
 3 qgroup_rewalk_old_root()
 
 I'd like this function to be meaningful, but not just step1,2,3.
 Maybe you can think out better function name.

I'd like to keep it like 1, 2, 3, because that matches the documentation in the
qgroup pdf and the code has always been documented in those three steps.

Thanks,
-Jan


Re: [PATCH v2 2/3] Btrfs: rescan for qgroups

2013-04-16 Thread Jan Schmidt
On Tue, April 16, 2013 at 11:26 (+0200), Wang Shilong wrote:
 Hello, Jan
 
  

Re: [PATCH v2 2/3] Btrfs: rescan for qgroups

2013-04-16 Thread Wang Shilong
Jan Schmidt wrote:

..[snip]..
 
 I don't see where I'm calling ulist_free(roots) in the error path. One of us 
 is
 missing something :-)


Yeah, you are right. I read one '}' too many.. ^_^

Thanks,
Wang

 
 +	spin_lock(&fs_info->qgroup_lock);
 +	seq = fs_info->qgroup_seq;
 +	fs_info->qgroup_seq += roots->nnodes + 1; /* max refcnt */
 +
 +	ulist_reinit(tmp);
 +	ULIST_ITER_INIT(&uiter);
 +	while ((unode = ulist_next(roots, &uiter))) {
 +		struct btrfs_qgroup *qg;
 +
 +		qg = find_qgroup_rb(fs_info, unode->val);
 +		if (!qg)
 +			continue;
 +
 +		ulist_add(tmp, qg->qgroupid, (uintptr_t)qg, GFP_ATOMIC);

 For this patch, you forgot to add the check on ulist_add(); ulist_add()
 may
 return -ENOMEM. In fact, i have sent a patch to fix this problem in
 qgroup.c before.
 So you don't need to change patch 1, but you do need to add the check in
 patch 2.
 
 Thanks for noticing, I'll send a fix in a few days to leave room for more 
 comments.
 
 -Jan
 
 +   }

[snip]



Re: [PATCH v2 1/3] Btrfs: split btrfs_qgroup_account_ref into four functions

2013-04-16 Thread Wang Shilong
Hello Jan,

 On Tue, April 16, 2013 at 11:20 (+0200), Wang Shilong wrote:
 Hello Jan,

 The function is separated into a preparation part and the three accounting
 steps mentioned in the qgroups documentation. The goal is to make steps two
 and three usable by the rescan functionality. A side effect is that the
 function is restructured into readable subunits.

 How about renaming the three functions like:

 1 qgroup_walk_old_roots()
 2 qgroup_walk_new_root()
 3 qgroup_rewalk_old_root()

 I'd like this function to be meaningful, but not just step1,2,3.
 Maybe you can think out better function name.
 
 I'd like to keep it like 1, 2, 3, because that matches the documentation in 
 the
 qgroup pdf and the code has always been documented in those three steps.


Oh, Yes, i have read the pdf carefully. I think the pdf document it three steps
just to make it clear that we need 3 steps. But static checker may want to know 
what is 3 steps
just by the function name but not to read the pdf.

In fact the three steps just do:
1) walk old roots
2) walk new root
3) rewalk old root

So i think renaming the functions like this will make things better. ^_^

Thanks,
Wang

 
 Thanks,
 -Jan
 
 





Re: [PATCH v2 2/3] Btrfs: rescan for qgroups

2013-04-16 Thread Wang Shilong
Hello Jan,
 

 	slot = path->slots[0];
 	ptr = btrfs_item_ptr(l, slot, struct btrfs_qgroup_status_item);
 +	spin_lock(&fs_info->qgroup_lock);


Why do we need to hold qgroup_lock here? Would you please explain...

Thanks,
Wang

 	btrfs_set_qgroup_status_flags(l, ptr, fs_info->qgroup_flags);
 	btrfs_set_qgroup_status_generation(l, ptr, trans->transid);
 -	/* XXX scan */
 +	btrfs_set_qgroup_status_rescan(l, ptr,
 +			fs_info->qgroup_rescan_progress.objectid);
 +	spin_unlock(&fs_info->qgroup_lock);

  
   btrfs_mark_buffer_dirty(l);
  
 @@ -830,7 +854,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
   fs_info-qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON |
   BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
   btrfs_set_qgroup_status_flags(leaf, ptr, fs_info-qgroup_flags);
 - btrfs_set_qgroup_status_scan(leaf, ptr, 0);
 + btrfs_set_qgroup_status_rescan(leaf, ptr, 0);
  
   btrfs_mark_buffer_dirty(leaf);
  
 @@ -894,10 +918,11 @@ out:
   return ret;
  }
  
 -int btrfs_quota_rescan(struct btrfs_fs_info *fs_info)
 +static void qgroup_dirty(struct btrfs_fs_info *fs_info,
 +  struct btrfs_qgroup *qgroup)
  {
 - /* FIXME */
 - return 0;
 + if (list_empty(qgroup-dirty))
 + list_add(qgroup-dirty, fs_info-dirty_qgroups);
  }
  
  int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
 @@ -1045,13 +1070,6 @@ unlock:
   return ret;
  }
  
 -static void qgroup_dirty(struct btrfs_fs_info *fs_info,
 -  struct btrfs_qgroup *qgroup)
 -{
 - if (list_empty(qgroup-dirty))
 - list_add(qgroup-dirty, fs_info-dirty_qgroups);
 -}
 -
  /*
   * btrfs_qgroup_record_ref is called when the ref is added or deleted. it 
 puts
   * the modification into a list that's later used by btrfs_end_transaction to
 @@ -1256,6 +1274,15 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle 
 *trans,
   BUG();
   }
  
 + mutex_lock(fs_info-qgroup_rescan_lock);
 + if (fs_info-qgroup_flags  BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
 + if (fs_info-qgroup_rescan_progress.objectid = node-bytenr) {
 + mutex_unlock(fs_info-qgroup_rescan_lock);
 + return 0;
 + }
 + }
 + mutex_unlock(fs_info-qgroup_rescan_lock);
 +
   /*
* the delayed ref sequence number we pass depends on the direction of
* the operation. for add operations, we pass (node-seq - 1) to skip
 @@ -1269,7 +1296,17 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle 
 *trans,
   if (ret  0)
   return ret;
  
 + mutex_lock(fs_info-qgroup_rescan_lock);
   spin_lock(fs_info-qgroup_lock);
 + if (fs_info-qgroup_flags  BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
 + if (fs_info-qgroup_rescan_progress.objectid = node-bytenr) {
 + ret = 0;
 + mutex_unlock(fs_info-qgroup_rescan_lock);
 + goto unlock;
 + }
 + }
 + mutex_unlock(fs_info-qgroup_rescan_lock);
 +
   quota_root = fs_info-quota_root;
   if (!quota_root)
   goto unlock;
 @@ -1652,3 +1689,233 @@ void assert_qgroups_uptodate(struct 
 btrfs_trans_handle *trans)
   trans-delayed_ref_elem.seq);
   BUG();
  }
 +
 +/*
 + * returns < 0 on error, 0 when more leafs are to be scanned.
 + * returns 1 when done, 2 when done and FLAG_INCONSISTENT was cleared.
 + */
 +static int
 +qgroup_rescan_leaf(struct qgroup_rescan *qscan, struct btrfs_path *path,
 +struct btrfs_trans_handle *trans, struct ulist *tmp,
 +struct extent_buffer *scratch_leaf)
 +{
 + struct btrfs_key found;
 + struct btrfs_fs_info *fs_info = qscan-fs_info;
 + struct ulist *roots = NULL;
 + struct ulist_node *unode;
 + struct ulist_iterator uiter;
 + struct seq_list tree_mod_seq_elem = {};
 + u64 seq;
 + int slot;
 + int ret;
 +
 + path-leave_spinning = 1;
 + mutex_lock(fs_info-qgroup_rescan_lock);
 + ret = btrfs_search_slot_for_read(fs_info-extent_root,
 +  fs_info-qgroup_rescan_progress,
 +  path, 1, 0);
 +
 + pr_debug(current progress key (%llu %u %llu), search_slot ret %d\n,
 +  (unsigned long long)fs_info-qgroup_rescan_progress.objectid,
 +  fs_info-qgroup_rescan_progress.type,
 +  (unsigned long long)fs_info-qgroup_rescan_progress.offset,
 +  ret);
 +
 + if (ret) {
 + /*
 +  * The rescan is about to end, we will not be scanning any
 +  * further blocks. We cannot unset the RESCAN flag here, because
 +  * we want to commit the transaction if everything went well.
 +  * To make the live accounting work in this phase, we set our
 +  * scan progress pointer such that 

[PATCH] Btrfs: remove unused variable in the iterate_extent_inodes()

2013-04-16 Thread Wang Shilong
Signed-off-by: Wang Shilong wangsl-f...@cn.fujitsu.com
---
 fs/btrfs/backref.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index bd605c8..fa531e8 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -1461,8 +1461,6 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info,
iterate_extent_inodes_t *iterate, void *ctx)
 {
int ret;
-   struct list_head data_refs = LIST_HEAD_INIT(data_refs);
-   struct list_head shared_refs = LIST_HEAD_INIT(shared_refs);
struct btrfs_trans_handle *trans;
struct ulist *refs = NULL;
struct ulist *roots = NULL;
-- 
1.7.7.6



Re: [PATCH v2 1/3] Btrfs: split btrfs_qgroup_account_ref into four functions

2013-04-16 Thread David Sterba
On Tue, Apr 16, 2013 at 05:56:05PM +0800, Wang Shilong wrote:
 But static checker may want to know what is 3 steps
 just by the function name but not to read the pdf.

I'm curious what static checker you mean and how a function name helps
there.

thanks,
david


Re: [PATCH v2 1/3] Btrfs: split btrfs_qgroup_account_ref into four functions

2013-04-16 Thread Wang Shilong
Hi David,

 On Tue, Apr 16, 2013 at 05:56:05PM +0800, Wang Shilong wrote:
 But static checker may want to know what is 3 steps
 just by the function name but not to read the pdf.
 
 I'm curious what static checker you mean and how a function name helps
 there.


I mean that other developers will get more information by the function name.
In fact, the three steps do more things, so maybe
my function names are not good enough.
But i really hope that a meaningful function name gives more information. I
don't
insist on this anyway ^_^

Thanks,
Wang

 
 thanks,
 david
 
 





Re: [PATCH v2 2/3] Btrfs: rescan for qgroups

2013-04-16 Thread Jan Schmidt
On Tue, April 16, 2013 at 12:08 (+0200), Wang Shilong wrote:
 Hello Jan,
  
 
  slot = path-slots[0];
  ptr = btrfs_item_ptr(l, slot, struct btrfs_qgroup_status_item);
 +	spin_lock(&fs_info->qgroup_lock);
 
 
 Why do we need to hold qgroup_lock here? Would you please explain...

It would have been easier for me if you had left the relevant context in there,
but I finally found it.

Thinking again about it, as update_qgroup_status_item is only called from
transaction commit context, we can do without a spinlock here. I meant to
protect fs_info-qgroup_flags and fs_info-qgroup_rescan_progress, but it seems
not required.

Thanks,
-Jan


Re: [PATCH v2 2/3] Btrfs: rescan for qgroups

2013-04-16 Thread David Sterba
On Tue, Apr 16, 2013 at 10:45:19AM +0200, Jan Schmidt wrote:
 +static long btrfs_ioctl_quota_rescan_status(struct file *file, void __user 
 *arg)
 +{
 +	struct btrfs_root *root = BTRFS_I(fdentry(file)->d_inode)->root;
 + struct btrfs_ioctl_quota_rescan_args *qsa;
 + int ret = 0;
 +
 + if (!capable(CAP_SYS_ADMIN))
 + return -EPERM;
 +
 + qsa = kzalloc(sizeof(*qsa), GFP_NOFS);

The reserved field increased the size of qsa to 64 bytes. Thinking about
it again, an early ENOMEM is a good indicator that the system is unable
to get memory, so starting a bigger operation does not make much sense
anyway. Keep it as it is.

 + if (!qsa)
 + return -ENOMEM;
 +
 +	if (root->fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
 +		qsa->flags = 1;
 +		qsa->progress = root->fs_info->qgroup_rescan_progress.objectid;
 + }
 +
 + if (copy_to_user(arg, qsa, sizeof(*qsa)))
 + ret = -EFAULT;
 +
 + kfree(qsa);
 + return ret;
 +}
 +
 --- a/include/uapi/linux/btrfs.h
 +++ b/include/uapi/linux/btrfs.h
 @@ -376,12 +376,18 @@ struct btrfs_ioctl_get_dev_stats {
  
  #define BTRFS_QUOTA_CTL_ENABLE   1
  #define BTRFS_QUOTA_CTL_DISABLE  2
 -#define BTRFS_QUOTA_CTL_RESCAN   3
 +#define BTRFS_QUOTA_CTL_RESCAN__NOTUSED  3

Looks ok.

david


Re: [PATCH v2 1/3] Btrfs: split btrfs_qgroup_account_ref into four functions

2013-04-16 Thread David Sterba
On Tue, Apr 16, 2013 at 06:47:15PM +0800, Wang Shilong wrote:
 I mean that other developers will get more information by the function name.

I don't dare to suggest putting a comment before the functions

david


Re: Activating space_cache after read-only snapshots without space_cache have been taken

2013-04-16 Thread Ochi

On 04/16/2013 10:10 AM, Sander wrote:

Liu Bo wrote (ao):

On Tue, Apr 16, 2013 at 02:28:51AM +0200, Ochi wrote:

The situation is the following: I have created a backup-volume to
which I regularly rsync a backup of my system into a subvolume.
After rsync'ing, I take a _read-only_ snapshot of that subvolume
with a timestamp added to its name.

Now at the time I started using this backup volume, I was _not_
using the space_cache mount option and two read-only snapshots were
taken during this time. Then I started using the space_cache option
and continued doing snapshots.

A bit later, I started having very long lags when unmounting the
backup volume (both during shutdown and when unmounting manually). I
scrubbed and fsck'd the volume but this didn't show any errors.
Defragmenting the root and subvolumes took a long time but didn't
improve the situation much.


So are you using '-o nospace_cache' when creating two RO snapshots?


No, he first created two ro snapshots, then (some time later) mounted
with nospace_cache, and then continued to take ro snapshots.


I need to clarify this: The NOspace_cache option was never used, I just 
didn't explicitly activate space_cache in the beginning. However, I was 
not aware that space_cache is the default anyways (at least in Arch 
which is the distro I'm using). I reviewed old system logs and it 
actually looks like space caching was always being used right from the 
beginning, even when I didn't explicitly use the space_cache mount 
option. So I guess this wasn't the problem after all :\



Now I started having the suspicion that maybe the space cache
possibly couldn't be written to disk for the readonly
subvolumes/snapshots that were created during the time when I wasn't
using the space_cache option, forcing the cache to be rebuilt every
time.

Clearing the cache didn't help. But when I deleted the two snapshots
that I think were taken during the time without the mount option,
the unmounting time seems to have improved considerably.


I don't know why this happens, but maybe you can observe the umount
process's very slow behaviour by using 'cat /proc/{umount-pid}/stack'
or 'perf top'.


AFAIUI the problem is not there anymore, but this is a good tip for the
future.

Sander


That's correct, the problem has vanished after the deletion of the 
oldest two snapshots. Mounting and unmounting is reasonably fast now. I 
will just continue to use the volume normally (i.e. making regular 
backups and snapshotting) and report back if the problem appears again.


Just for the records: The btrfs volume and the first snapshots were 
originally created under kernel 3.7.10. I then updated to 3.8.3. I don't 
know if this information is useful - just in case... :)


Thanks,
Sebastian



Re: [PATCH] btrfs-progs: a copy of superblock is zero may not mean btrfs is not there

2013-04-16 Thread David Sterba
On Fri, Apr 12, 2013 at 03:55:06PM +0800, Anand Jain wrote:
 If one of the copies of the superblock is zero, that does not
 confirm to us that btrfs isn't there on that disk. When
 we have more than one copy of the superblock, we should
 rather let the for loop continue to check the other copies.
 
 The following test case and results justify the
 fix:
 
 mkfs.btrfs /dev/sdb /dev/sdc -f
 mount /dev/sdb /btrfs
 dd if=/dev/zero bs=1 count=8 of=/dev/sdc seek=$((64*1024+64))
 ~/before/btrfs-select-super -s 1 /dev/sdc
 using SB copy 1, bytenr 67108864
 
 here btrfs-select-super just wrote superblock to a mounted btrfs

Why does not check_mounted() catch this in the first place? Ie. based on
the status in /proc/mounts not on random bytes in the superblock.

david


Re: [PATCH v2 2/3] Btrfs: rescan for qgroups

2013-04-16 Thread Wang Shilong

Hello Jan, more comments below..

[...snip..]

  
 +
 +static long btrfs_ioctl_quota_rescan_status(struct file *file, void __user 
 *arg)
 +{
 + struct btrfs_root *root = BTRFS_I(fdentry(file)-d_inode)-root;
 + struct btrfs_ioctl_quota_rescan_args *qsa;
 + int ret = 0;
 +
 + if (!capable(CAP_SYS_ADMIN))
 + return -EPERM;
 +
 + qsa = kzalloc(sizeof(*qsa), GFP_NOFS);
 + if (!qsa)
 + return -ENOMEM;
 +

Here, i think we should hold qgroup_rescan_lock and qgroup_lock:

1) qgroup_rescan_lock protects BTRFS_QGROUP_STATUS_FLAG_RESCAN
2) quota disabling may happen at this time, so qgroup_lock should also be
held here.


 + if (root-fs_info-qgroup_flags  BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
 + qsa-flags = 1;
 + qsa-progress = root-fs_info-qgroup_rescan_progress.objectid;
 + }
 +
 + if (copy_to_user(arg, qsa, sizeof(*qsa)))
 + ret = -EFAULT;
 +
 + kfree(qsa);
 + return ret;
 +}
 +
  
[….snip...]
 
 +
 +/*
 + * returns < 0 on error, 0 when more leafs are to be scanned.
 + * returns 1 when done, 2 when done and FLAG_INCONSISTENT was cleared.
 + */
 +static int
 +qgroup_rescan_leaf(struct qgroup_rescan *qscan, struct btrfs_path *path,
 +struct btrfs_trans_handle *trans, struct ulist *tmp,
 +struct extent_buffer *scratch_leaf)
 +{
 + struct btrfs_key found;
 + struct btrfs_fs_info *fs_info = qscan-fs_info;
 + struct ulist *roots = NULL;
 + struct ulist_node *unode;
 + struct ulist_iterator uiter;
 + struct seq_list tree_mod_seq_elem = {};
 + u64 seq;
 + int slot;
 + int ret;
 +
 + path-leave_spinning = 1;
 + mutex_lock(fs_info-qgroup_rescan_lock);

Here in qgroup_rescan_leaf(), we don't need to hold qgroup_rescan_lock.
qgroup_rescan_lock is used to protect qgroup_flags, and in
qgroup_rescan_leaf()
we don't change qgroup_flags, so we don't need to hold qgroup_rescan_lock.

Maybe we can just remove qgroup_rescan_lock altogether; i think
qgroup_lock
can cover what qgroup_rescan_lock does.


 + ret = btrfs_search_slot_for_read(fs_info-extent_root,
 +  fs_info-qgroup_rescan_progress,
 +  path, 1, 0);
 +
 + pr_debug(current progress key (%llu %u %llu), search_slot ret %d\n,
 +  (unsigned long long)fs_info-qgroup_rescan_progress.objectid,
 +  fs_info-qgroup_rescan_progress.type,
 +  (unsigned long long)fs_info-qgroup_rescan_progress.offset,
 +  ret);
 +
 + if (ret) {
 + /*
 +  * The rescan is about to end, we will not be scanning any
 +  * further blocks. We cannot unset the RESCAN flag here, because
 +  * we want to commit the transaction if everything went well.
 +  * To make the live accounting work in this phase, we set our
 +  * scan progress pointer such that every real extent objectid
 +  * will be smaller.
 +  */
 + fs_info-qgroup_rescan_progress.objectid = (u64)-1;
 + btrfs_release_path(path);
 + mutex_unlock(fs_info-qgroup_rescan_lock);
 + return ret;
 + }
 +
 + btrfs_item_key_to_cpu(path-nodes[0], found,
 +   btrfs_header_nritems(path-nodes[0]) - 1);
 + fs_info-qgroup_rescan_progress.objectid = found.objectid + 1;
 +
 + btrfs_get_tree_mod_seq(fs_info, tree_mod_seq_elem);
 + memcpy(scratch_leaf, path-nodes[0], sizeof(*scratch_leaf));
 + slot = path-slots[0];
 + btrfs_release_path(path);
 + mutex_unlock(fs_info-qgroup_rescan_lock);
 +
 + for (; slot  btrfs_header_nritems(scratch_leaf); ++slot) {
 + btrfs_item_key_to_cpu(scratch_leaf, found, slot);
 + if (found.type != BTRFS_EXTENT_ITEM_KEY)
 + continue;
 + ret = btrfs_find_all_roots(trans, fs_info, found.objectid,
 +tree_mod_seq_elem.seq, roots);
 + if (ret  0)
 + break;
 + spin_lock(fs_info-qgroup_lock);

Quota may have been disabled by now, so please add the check, otherwise
we may get a NULL pointer panic here.


Thanks,
Wang
 + seq = fs_info-qgroup_seq;
 + fs_info-qgroup_seq += roots-nnodes + 1; /* max refcnt */
 +
 + ulist_reinit(tmp);
 + ULIST_ITER_INIT(uiter);
 + while ((unode = ulist_next(roots, uiter))) {
 + struct btrfs_qgroup *qg;
 +
 + qg = find_qgroup_rb(fs_info, unode-val);
 + if (!qg)
 + continue;
 +
 + ulist_add(tmp, qg-qgroupid, (uintptr_t)qg, GFP_ATOMIC);
 + }
 +
 + /* this is similar to step 2 of btrfs_qgroup_account_ref */
 + ULIST_ITER_INIT(uiter);
 + while ((unode = 

Re: [PATCH 00/17] Btrfs-progs: some receive related patches

2013-04-16 Thread David Sterba
Hi,
On Tue, Apr 09, 2013 at 07:08:28PM +0200, Stefan Behrens wrote:
 Alex Lyakas (1):
   btrfs-progs: Fix the receive code pathing
 
 Stefan Behrens (16):
   Btrfs-progs: Use /proc/mounts instead of /etc/mtab
   Btrfs-progs: ignore subvols above BTRFS_LAST_FREE_OBJECTID
   Btrfs-progs: close file descriptor in cmds-send.c
   Btrfs-progs: fix a small memory leak in btrfs-list.c
   Btrfs-progs: add a function to free subvol_uuid_search memory
   Btrfs-progs: cleanup subvol_uuid_search memory in btrfs send/receive
   Btrfs-progs: free memory and close file descriptor in btrfs receive
   Btrfs-progs: Set the root-id for received subvols in btrfs receive
   Btrfs-progs: btrfs-receive: different levels (amount) of debug output
   Btrfs-progs: small parent_subvol cleanup for cmds-receive.c
   Btrfs-progs: fix bug in find_root_gen
   Btrfs-progs: btrfs-receive optionally honors the end-cmd
   Btrfs-progs: don't allocate one byte too much each time
   Btrfs-progs: Fix that BTRFS_FSID_SIZE is used instead of
 BTRFS_UUID_SIZE
   Btrfs-progs: remove some unused code
   Btrfs-progs: allow to receive to relative directories

All merged, thanks.

david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs scrub gives unable to find logical $hugenum len 16384

2013-04-16 Thread Martin Steigerwald
On Saturday 13 April 2013 17:48:31 Martin Steigerwald wrote:
 Hi!
 
 Please answer soon whether it would be a good idea to replay a backup
 right now, as I am leaving for Berlin tomorrow for a week without my backup
 drive. Well, I made space on an external 2.5 inch drive that I can take
 with me, after having made sure it has a consistent backup. :)

Ping.

Any hints on this one? I am going to recreate the filesystem next weekend at
the latest.

I did not see any I/O or BTRFS errors in the logs so far, so the filesystem
appears to be good.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/3] Btrfs: automatic rescan after quota enable command

2013-04-16 Thread Wang Shilong
Hi Chris,

I've put my effort into btrfs quota for a while now, and I have sent a
bunch of patches about btrfs quota.
If you pull Jan's qgroup rescan patches, I'd like you to pull this patch
of mine first:
https://patchwork.kernel.org/patch/2402871/
Jan's qgroup rescan patch relies on this patch.

Besides, I made a patch set to fix a race condition in btrfs quota:
https://patchwork.kernel.org/patch/2402901/
https://patchwork.kernel.org/patch/2402911/
https://patchwork.kernel.org/patch/2402881/
https://patchwork.kernel.org/patch/2402891/
https://patchwork.kernel.org/patch/2402921/

Some bug fixes about btrfs quota:
https://patchwork.kernel.org/patch/2356111/ 
https://patchwork.kernel.org/patch/2402941/
https://patchwork.kernel.org/patch/2407571/
https://patchwork.kernel.org/patch/2420731/ 
https://patchwork.kernel.org/patch/2426721/
https://patchwork.kernel.org/patch/2445051/
https://patchwork.kernel.org/patch/2368341/
https://patchwork.kernel.org/patch/2368291/


This patch improves the performance of ulist, which btrfs quota and send
rely on:
https://patchwork.kernel.org/patch/2435001/

some minor cleanups:
https://patchwork.kernel.org/patch/2441951/ 
https://patchwork.kernel.org/patch/2444521/
https://patchwork.kernel.org/patch/2448741/


These patches are mainly related to btrfs quota, and they have been on the
btrfs list and under review for a while. Thanks to Miao Xie for helping with
my btrfs work, and to Arne Jansen, Jan Schmidt and David for reviewing my
patches -- many thanks.

Every effort I have made is aimed at making btrfs quota work well. After you
pull my patches and Jan's qgroup rescan patch, I will take a deeper look at
the code and test Jan's patch.

I am still a newbie and make a lot of mistakes, so forgive me if I get
something wrong.


Thanks,
Wang

 
 Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
 ---
 fs/btrfs/qgroup.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)
 
 diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
 index bb081b5..0ea2c3e 100644
 --- a/fs/btrfs/qgroup.c
 +++ b/fs/btrfs/qgroup.c
 @@ -1356,10 +1356,14 @@ int btrfs_run_qgroups(struct btrfs_trans_handle 
 *trans,
 {
   struct btrfs_root *quota_root = fs_info->quota_root;
   int ret = 0;
 + int start_rescan_worker = 0;
 
   if (!quota_root)
   goto out;
 
 + if (!fs_info->quota_enabled && fs_info->pending_quota_state)
 + start_rescan_worker = 1;
 +
   fs_info->quota_enabled = fs_info->pending_quota_state;
 
   spin_lock(&fs_info->qgroup_lock);
 @@ -1385,6 +1389,12 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
   if (ret)
   fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 
 + if (start_rescan_worker) {
 + ret = btrfs_qgroup_rescan(fs_info);
 + if (ret)
 + pr_err("btrfs: start rescan quota failed: %d\n", ret);
 + }
 +
 out:
 
   return ret;
 -- 
 1.7.1
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix reada debug code compilation

2013-04-16 Thread David Sterba
On Tue, Apr 16, 2013 at 10:15:25AM +0200, Vincent wrote:
 This fixes the following errors:
 
   fs/btrfs/reada.c: In function ‘btrfs_reada_wait’:
   fs/btrfs/reada.c:958:42: error: invalid operands to binary > (have 
 ‘atomic_t’ and ‘int’)
   fs/btrfs/reada.c:961:41: error: invalid operands to binary > (have 
 ‘atomic_t’ and ‘int’)
 
 Signed-off-by: Vincent Stehlé vincent.ste...@laposte.net
 Cc: Chris Mason chris.ma...@fusionio.com
 Cc: linux-btrfs@vger.kernel.org

Reviewed-by: David Sterba dste...@suse.cz
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs: return error when we specify wrong start to defrag

2013-04-16 Thread David Sterba
On Tue, Apr 16, 2013 at 05:20:28PM +0800, Liu Bo wrote:
 We need such a sanity check for wrong start when we defrag a file, otherwise,
 even with a wrong start that's larger than file size, we can end up changing
 not only inode's force compress flag but also FS's incompat flags.

The range-start check is good, but why are you worried about the
incompat flag? LZO support has been around for more than 2 years.

 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -1152,8 +1152,11 @@ int btrfs_defrag_file(struct inode *inode, struct file 
 *file,
   u64 new_align = ~((u64)128 * 1024 - 1);
   struct page **pages = NULL;
  
 - if (extent_thresh == 0)
 - extent_thresh = 256 * 1024;
 + if (isize == 0)
 + return 0;
 +
 + if (range->start >= isize)
 + return -EINVAL;
...
 - if (isize == 0)
 - return 0;
 + if (extent_thresh == 0)
 + extent_thresh = 256 * 1024;

That's a more logical order of the checks, good.

Reviewed-by: David Sterba dste...@suse.cz
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs: return error when we specify wrong start to defrag

2013-04-16 Thread Liu Bo
On Tue, Apr 16, 2013 at 03:38:28PM +0200, David Sterba wrote:
 On Tue, Apr 16, 2013 at 05:20:28PM +0800, Liu Bo wrote:
  We need such a sanity check for wrong start when we defrag a file, 
  otherwise,
  even with a wrong start that's larger than file size, we can end up changing
  not only inode's force compress flag but also FS's incompat flags.
 
 The range-start check is good, but why are you worried about the
 incompat flag? LZO support has been around for more than 2 years.

Since the code that sets the LZO incompat flag is right there, I took it as a
side effect.

Well, I'm not worried now :)

 
  --- a/fs/btrfs/ioctl.c
  +++ b/fs/btrfs/ioctl.c
  @@ -1152,8 +1152,11 @@ int btrfs_defrag_file(struct inode *inode, struct 
  file *file,
  u64 new_align = ~((u64)128 * 1024 - 1);
  struct page **pages = NULL;
   
  -   if (extent_thresh == 0)
  -   extent_thresh = 256 * 1024;
  +   if (isize == 0)
  +   return 0;
  +
  +   if (range->start >= isize)
  +   return -EINVAL;
 ...
  -   if (isize == 0)
  -   return 0;
  +   if (extent_thresh == 0)
  +   extent_thresh = 256 * 1024;
 
 That's a more logical order of the checks, good.
 
 Reviewed-by: David Sterba dste...@suse.cz

Thanks for the quick review!

thanks,
liubo
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: corrupted filesystem. mounts with -o recovery,ro but not -o recovery or -o ro

2013-04-16 Thread Jon Nelson
Tried to mount with -o recovery using 3.8.7.  No change. Does
anybody have any suggestions?


On Sat, Apr 13, 2013 at 6:21 PM, Jon Nelson jnel...@jamponi.net wrote:
 I have a 4-disk btrfs filesystem in raid1 mode.
 I'm running openSUSE 12.3, 3.7.10, x86_64.
 A few days ago something went wrong and the filesystem re-mounted itself RO.
 After reboot, it didn't come up.
 After a fair bit of work, I can get the filesystem to mount with -o
 recovery,ro.  However, if I use -o recovery alone or any other option
 I eventually hit a BUG and that's that. I've tried with up to kernel
 3.8.6 without improvement.

 My first question is this: how can I make it so I can use the
 filesystem without having to mount it with -o recovery,ro from a
 rescue environment? (I have imaged all four drives *and* made a full
 filesystem-level backup, except for snapshots and some others.)

 My second set of questions is: what went wrong initially, what went
 wrong with the recovery attempts, and are there fixes in kernels after
 3.8.6 that might be involved?

 I have *some* logs, and I might be able to share portions of them.
 I also took a btrfs-image.


 Using a very recent btrfs-progs git pull, 'btrfs repair ...' gives me:
 ERROR: device scan failed '/dev/sdb' - Device or resource busy
 ERROR: device scan failed '/dev/sda' - Device or resource busy
 failed to open /dev/sr0: No medium found
 ERROR: device scan failed '/dev/sdb' - Device or resource busy
 ERROR: device scan failed '/dev/sda' - Device or resource busy
 failed to open /dev/sr0: No medium found
 checking extents
 Backref 341888225280 parent 2621340434432 owner 0 offset 0 num_refs 0
 not found in extent tree
 Incorrect local backref count on 341888225280 parent 2621340434432
 owner 0 offset 0 found 1 wanted 0 back 0x6dc8500
 Incorrect local backref count on 341888225280 root 1 owner 496 offset
 0 found 0 wanted 1 back 0x2bb636c0
 backpointer mismatch on [341888225280 262144]
 Unable to find block group for 0
 btrfs: extent-tree.c:284: find_search_start: Assertion `!(1)' failed.
 enabling repair mode
 Checking filesystem on /dev/sdd
 UUID: 7feedf1e-9711-4900-af9c-92738ea8aace


 and some of the errors are here:

 [  314.095449] [ cut here ]
 [  314.095526] WARNING: at
 /home/abuild/rpmbuild/BUILD/kernel-desktop-3.8.6/linux-3.8/fs/btrfs/extent-tree.c:5208
 __btrfs_free_extent+0x853/0x890 [btrfs]()
 [  314.095541] Hardware name: TA790GX XE
 [  314.09] Modules linked in: dm_mod af_packet
 cpufreq_conservative cpufreq_userspace cpufreq_powersave
 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec
 snd_hwdep snd_pcm snd_timer snd bt
 rfs acpi_cpufreq mperf kvm_amd zlib_deflate libcrc32c kvm radeon
 sr_mod ttm drm_kms_helper cdrom processor sg via_velocity drm
 i2c_algo_bit shpchp pci_hotplug sp5100_tco i2c_piix4 edac_core
 edac_mce_amd thermal
 ata_generic thermal_sys r8169 pata_atiixp k10temp pcspkr microcode
 crc_ccitt wmi soundcore snd_page_alloc button autofs4
 [  314.095867] Pid: 5310, comm: btrfs-transacti Not tainted 3.8.6-2-desktop #1
 [  314.095875] Call Trace:
 [  314.095904]  [81004748] dump_trace+0x88/0x300
 [  314.095923]  [815a9128] dump_stack+0x69/0x6f
 [  314.095937]  [81044f49] warn_slowpath_common+0x79/0xc0
 [  314.095968]  [a0400db3] __btrfs_free_extent+0x853/0x890 [btrfs]
 [  314.096061]  [a0404b0f] run_clustered_refs+0x48f/0xb20 [btrfs]
 [  314.096147]  [a0408a9a] btrfs_run_delayed_refs+0xca/0x320 [btrfs]
 [  314.096249]  [a04182e0] btrfs_commit_transaction+0x80/0xb00 
 [btrfs]
 [  314.096379]  [a0411b4d] transaction_kthread+0x19d/0x220 [btrfs]
 [  314.096492]  [81068043] kthread+0xb3/0xc0
 [  314.096506]  [815bbf7c] ret_from_fork+0x7c/0xb0
 [  314.096515] ---[ end trace 64d3998241407ddc ]---
 [  314.096520] btrfs unable to find ref byte nr 2621340344320 parent 0
 root 2  owner 1 offset 0
 [  314.096526] [ cut here ]
 [  314.096551] WARNING: at
 /home/abuild/rpmbuild/BUILD/kernel-desktop-3.8.6/linux-3.8/fs/btrfs/extent-tree.c:5265
 __btrfs_free_extent+0x7ba/0x890 [btrfs]()
 [  314.096554] Hardware name: TA790GX XE
 [  314.096556] Modules linked in: dm_mod af_packet
 cpufreq_conservative cpufreq_userspace cpufreq_powersave
 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec
 snd_hwdep snd_pcm snd_timer snd btrfs acpi_cpufreq mperf kvm_amd
 zlib_deflate libcrc32c kvm radeon sr_mod ttm drm_kms_helper cdrom
 processor sg via_velocity drm i2c_algo_bit shpchp pci_hotplug
 sp5100_tco i2c_piix4 edac_core edac_mce_amd thermal ata_generic
 thermal_sys r8169 pata_atiixp k10temp pcspkr microcode crc_ccitt wmi
 soundcore snd_page_alloc button autofs4
 [  314.096613] Pid: 5310, comm: btrfs-transacti Tainted: GW
 3.8.6-2-desktop #1
 [  314.096615] Call Trace:
 [  314.096627]  [81004748] dump_trace+0x88/0x300
 [  314.096636]  [815a9128] dump_stack+0x69/0x6f
 [  314.096646]  

One random read streaming is fast (~1200MB/s), but two or more are slower (~750MB/s)?

2013-04-16 Thread Matt Pursley
Hi All,

I have an LSI HBA card (LSI SAS 9207-8i) with 12 7200rpm SAS drives
attached.  When it's formatted with mdraid6+ext4 I get about 1200MB/s
for multiple streaming random reads with iozone.  With btrfs in
3.9.0-rc4 I can also get about 1200MB/s, but only with one stream at a
time.

As soon as I add a second (or more), the speed will drop to about
750MB/s.  If I add more streams (10, 20, etc), the total throughput
stays at around 750MB/s.  I only see the full 1200MB/s in btrfs when
I'm running a single read at a time (e.g. sequential reads with dd,
random reads with iozone, etc).

This feels like a bug or misconfiguration on my system, as if it can
read at full speed, but only with one stream running at a time.  The
options I have tried varying are -l 64k with mkfs.btrfs and
-o thread_pool=16 when mounting, but neither of those options seems to
change the behaviour.



Does anyone know why I would see the speed drop when going from one to
more than one stream at a time with btrfs raid6?  We would like to use
btrfs (mostly for snapshots), but we do need to get the full 1200MB/s
streaming speed too.





Thanks,
Matt



___
Here's some example output..



Single thread = ~1.1GB/s
_
kura1 persist # sysctl vm.drop_caches=1 ; dd if=/dev/zero
of=/var/data/persist/testfile bs=640k count=2
vm.drop_caches = 1
2+0 records in
2+0 records out
1310720 bytes (13 GB) copied, 7.14139 s, 1.8 GB/s

kura1 persist # sysctl vm.drop_caches=1 ; dd of=/dev/null
if=/var/data/persist/testfile bs=640k
vm.drop_caches = 1
2+0 records in
2+0 records out
1310720 bytes (13 GB) copied, 11.2666 s, 1.2 GB/s

kura1 persist # sysctl vm.drop_caches=1 ; dd of=/dev/null
if=/var/data/persist/testfile bs=640k
vm.drop_caches = 1
2+0 records in
2+0 records out
1310720 bytes (13 GB) copied, 11.5005 s, 1.1 GB/s





1 thread = ~1000MB/s ...
___
kura1 scripts # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd
of=/dev/null if=/var/data/persist/testfile_$j bs=640k ; done
vm.drop_caches = 1
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 6.52018 s, 1.0 GB/s
kura1 scripts # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd
of=/dev/null if=/var/data/persist/testfile_$j bs=640k ; done
vm.drop_caches = 1
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 6.55731 s, 999 MB/s
___


2 threads = ~750MB/s combined...
___
# sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null
if=/var/data/persist/testfile_$j bs=640k  done
vm.drop_caches = 1
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 17.5068 s, 374 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 17.7599 s, 369 MB/s
___



20 threads = ~750MB/s combined...
___
# sysctl vm.drop_caches=1 ; for j in {1..20} ; do dd of=/dev/null
if=/var/data/persist/testfile_$j bs=640k  done
vm.drop_caches = 1
kura1 scripts # 1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 168.223 s, 39.0 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 168.275 s, 38.9 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 169.466 s, 38.7 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 169.606 s, 38.6 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 170.503 s, 38.4 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 170.629 s, 38.4 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 170.633 s, 38.4 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 170.744 s, 38.4 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 170.844 s, 38.4 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 170.896 s, 38.3 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 171.027 s, 38.3 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 171.135 s, 38.3 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 171.389 s, 38.2 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 171.414 s, 38.2 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 171.674 s, 38.2 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 171.897 s, 38.1 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 171.956 s, 38.1 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 171.995 s, 38.1 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 172.044 s, 38.1 MB/s
1+0 records in
1+0 records out
655360 bytes (6.6 GB) copied, 172.08 s, 38.1 MB/s




### Similar results with random reads in iozone...

1 thread = ~1000MB/s
_
kura1 scripts # for j in {1..1} ; do sysctl vm.drop_caches=1 ; iozone
-f 

[PATCH] Btrfs-progs: fix csum check when extent lands on block group

2013-04-16 Thread Josef Bacik
I was running fsync() tests and noticed that occasionally I was getting a bunch
of errors from fsck complaining about csums not having corresponding extents.
Thankfully after a few days of debugging this it turned out to be a bug with
fsck.  The csums were for an extent that started at the same offset as a block
group, and were offset within the extent.  So the search put us out at the block
group item and we just walked forward from there, never finding the actual
extent.  This is because the block group item key is higher than the extent item
key, so it comes first.  In order to fix this we need to check and see if we
landed on a block group item and take another step backwards to make sure we end
up at the extent item.  With this patch my reproducer no longer finds csums that
don't have matching extent records.  Thanks,
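
For reference (not from the patch itself): the extent tree orders items by
(objectid, type, offset), and BTRFS_EXTENT_ITEM_KEY is 168 while
BTRFS_BLOCK_GROUP_ITEM_KEY is 192, so for the same bytenr the block group
item sorts after the extent item and is the first thing a backwards step
from the search slot lands on.  A minimal sketch of that ordering rule
(struct and function names here are only illustrative, see ctree.h for the
real definitions):

	struct cpu_key {
		unsigned long long objectid;
		unsigned char type;
		unsigned long long offset;
	};

	/* same objectid/type/offset ordering the tree search follows */
	static int compare_keys(const struct cpu_key *a, const struct cpu_key *b)
	{
		if (a->objectid != b->objectid)
			return a->objectid < b->objectid ? -1 : 1;
		if (a->type != b->type)
			return a->type < b->type ? -1 : 1;
		if (a->offset != b->offset)
			return a->offset < b->offset ? -1 : 1;
		return 0;
	}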

Signed-off-by: Josef Bacik jba...@fusionio.com
---
 cmds-check.c |   24 ++--
 1 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 9a7696f..030ab77 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2624,6 +2624,7 @@ static int check_extent_exists(struct btrfs_root *root, 
u64 bytenr,
key.type = BTRFS_EXTENT_ITEM_KEY;
key.offset = 0;
 
+
 again:
ret = btrfs_search_slot(NULL, root->fs_info->extent_root, &key, path,
0, 0);
@@ -2631,8 +2632,27 @@ again:
fprintf(stderr, "Error looking up extent record %d\n", ret);
btrfs_free_path(path);
return ret;
-   } else if (ret && path->slots[0]) {
-   path->slots[0]--;
+   } else if (ret) {
+   if (path->slots[0])
+   path->slots[0]--;
+   else
+   btrfs_prev_leaf(root, path);
+   }
+
+   btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+
+   /*
+* Block group items come before extent items if they have the same
+* bytenr, so walk back one more just in case.  Dear future traveler,
+* first congrats on mastering time travel.  Now if it's not too much
+* trouble could you go back to 2006 and tell Chris to make the
+* BLOCK_GROUP_ITEM_KEY lower than the EXTENT_ITEM_KEY please?
+*/
+   if (key.type == BTRFS_BLOCK_GROUP_ITEM_KEY) {
+   if (path->slots[0])
+   path->slots[0]--;
+   else
+   btrfs_prev_leaf(root, path);
}
 
while (num_bytes) {
-- 
1.7.7.6

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs-progs: make restore deal with really broken file systems V2

2013-04-16 Thread Josef Bacik
All we need for restore to work is the chunk root, the tree root and the fs root
we want to restore from.  So to do this we need to make a few adjustments

1) Make open_ctree_fs_info fail completely if it can't read the chunk tree.
There is no sense in continuing if we can't read the chunk tree since we won't
be able to translate logical to physical blocks.

2) Use open_ctree_fs_info in restore, and if we didn't load a tree root or
fs root go ahead and try to set those up manually ourselves.

This is related to work I did last year on restore, but it uses the
open_ctree_fs_info instead of my open coded open_ctree.  Thanks,

Signed-off-by: Josef Bacik jba...@fusionio.com
---
V1-V2: fix possible invalid pointer dereference pointed out by kdave.

 cmds-check.c   |2 +-
 cmds-restore.c |   51 ---
 debug-tree.c   |2 +-
 disk-io.c  |   37 ++---
 disk-io.h  |6 ++
 5 files changed, 58 insertions(+), 40 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 12192fa..bdf74ba 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -3609,7 +3609,7 @@ int cmd_check(int argc, char **argv)
return -EBUSY;
}
 
-   info = open_ctree_fs_info(argv[optind], bytenr, rw, 1);
+   info = open_ctree_fs_info(argv[optind], bytenr, 0, rw, 1);
if (info == NULL)
return 1;
 
diff --git a/cmds-restore.c b/cmds-restore.c
index f4e75cf..681672a 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -839,27 +839,64 @@ static int do_list_roots(struct btrfs_root *root)
 static struct btrfs_root *open_fs(const char *dev, u64 root_location,
  int super_mirror, int list_roots)
 {
+   struct btrfs_fs_info *fs_info = NULL;
struct btrfs_root *root = NULL;
u64 bytenr;
int i;
 
for (i = super_mirror; i  BTRFS_SUPER_MIRROR_MAX; i++) {
bytenr = btrfs_sb_offset(i);
-   root = open_ctree_recovery(dev, bytenr, root_location);
-   if (root)
+   fs_info = open_ctree_fs_info(dev, bytenr, root_location, 0, 1);
+   if (fs_info)
break;
fprintf(stderr, Could not open root, trying backup super\n);
}
 
-   if (root  list_roots) {
-   int ret = do_list_roots(root);
-   if (ret) {
+   if (!fs_info)
+   return NULL;
+
+   /*
+* All we really need to succeed is reading the chunk tree, everything
+* else we can do by hand, since we only need to read the tree root and
+* the fs_root.
+*/
+   if (!extent_buffer_uptodate(fs_info-tree_root-node)) {
+   u64 generation;
+
+   root = fs_info-tree_root;
+   if (!root_location)
+   root_location = btrfs_super_root(fs_info-super_copy);
+   generation = btrfs_super_generation(fs_info-super_copy);
+   root-node = read_tree_block(root, root_location,
+root-leafsize, generation);
+   if (!extent_buffer_uptodate(root-node)) {
+   fprintf(stderr, Error opening tree root\n);
close_ctree(root);
-   root = NULL;
+   return NULL;
}
}
 
-   return root;
+   if (!list_roots  !fs_info-fs_root) {
+   struct btrfs_key key;
+
+   key.objectid = BTRFS_FS_TREE_OBJECTID;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = (u64)-1;
+   fs_info-fs_root = btrfs_read_fs_root_no_cache(fs_info, key);
+   if (IS_ERR(fs_info-fs_root)) {
+   fprintf(stderr, Couldn't read fs root: %ld\n,
+   PTR_ERR(fs_info-fs_root));
+   close_ctree(fs_info-tree_root);
+   return NULL;
+   }
+   }
+
+   if (list_roots  do_list_roots(fs_info-tree_root)) {
+   close_ctree(fs_info-tree_root);
+   return NULL;
+   }
+
+   return fs_info-fs_root;
 }
 
 static int find_first_dir(struct btrfs_root *root, u64 *objectid)
diff --git a/debug-tree.c b/debug-tree.c
index 0fc0ecd..bae7f94 100644
--- a/debug-tree.c
+++ b/debug-tree.c
@@ -166,7 +166,7 @@ int main(int ac, char **av)
if (ac != 1)
print_usage();
 
-   info = open_ctree_fs_info(av[optind], 0, 0, 1);
+   info = open_ctree_fs_info(av[optind], 0, 0, 0, 1);
if (!info) {
fprintf(stderr, unable to open %s\n, av[optind]);
exit(1);
diff --git a/disk-io.c b/disk-io.c
index a9fd374..5265c3c 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -944,8 +944,10 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const 
char *path,
 
if (!(btrfs_super_flags(disk_super)  BTRFS_SUPER_FLAG_METADUMP)) {

[PATCH 0/4] [RFC] btrfs: offline dedupe

2013-04-16 Thread Mark Fasheh
Hi,
 
The following series of patches implements in btrfs an ioctl to do
offline deduplication of file extents.

To be clear, offline in this sense means that the file system is
mounted and running, but the dedupe is not done during file writes,
but after the fact when some userspace software initiates a dedupe.

The primary patch is loosely based off of one sent by Josef Bacik back
in January, 2011.

http://permalink.gmane.org/gmane.comp.file-systems.btrfs/8508

I've made significant updates and changes from the original. In
particular the structure passed is more fleshed out, this series has a
high degree of code sharing between itself and the clone code, and the
locking has been updated.


The ioctl accepts a struct:

struct btrfs_ioctl_same_args {
__u64 logical_offset;   /* in - start of extent in source */
__u64 length;   /* in - length of extent */
__u16 total_files;  /* in - total elements in info array */
__u16 files_deduped;/* out - number of files that got deduped */
__u32 reserved;
struct btrfs_ioctl_same_extent_info info[0];
};

Userspace puts each duplicate extent (other than the source) in an
item in the info array. As there can be multiple dedupes in one
operation, each info item has its own status and 'bytes_deduped'
member. This provides a number of benefits:

- We don't have to fail the entire ioctl because one of the dedupes failed.

- Userspace will always know how much progress was made on a file as we always
  return the number of bytes deduped.


#define BTRFS_SAME_DATA_DIFFERS 1
/* For extent-same ioctl */
struct btrfs_ioctl_same_extent_info {
__s64 fd;   /* in - destination file */
__u64 logical_offset;   /* in - start of extent in destination */
__u64 bytes_deduped;/* out - total # of bytes we were able
 * to dedupe from this file */
/* status of this dedupe operation:
 * 0 if dedup succeeds
 * < 0 for error
 * == BTRFS_SAME_DATA_DIFFERS if data differs
 */
__s32 status;   /* out - see above description */
__u32 reserved;
};
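
As a rough illustration of the calling convention (not part of this series;
it assumes the two structures above and the BTRFS_IOC_FILE_EXTENT_SAME ioctl
number from patch 4/4 are available through a local copy of the header), a
userspace caller could issue a single dedupe request roughly like this:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/types.h>

/* sketch only: ask to dedupe 'len' bytes at src_off of src_fd into dst_fd at dst_off */
static int dedupe_one(int src_fd, __u64 src_off, __u64 len,
		      int dst_fd, __u64 dst_off)
{
	struct btrfs_ioctl_same_args *args;
	int ret;

	/* fixed header plus room for one info element */
	args = calloc(1, sizeof(*args) + sizeof(args->info[0]));
	if (!args)
		return -ENOMEM;

	args->logical_offset = src_off;
	args->length = len;
	args->total_files = 1;
	args->info[0].fd = dst_fd;
	args->info[0].logical_offset = dst_off;

	ret = ioctl(src_fd, BTRFS_IOC_FILE_EXTENT_SAME, args);
	if (ret == 0)
		printf("status %d, %llu bytes deduped\n",
		       args->info[0].status,
		       (unsigned long long)args->info[0].bytes_deduped);

	free(args);
	return ret;
}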


The kernel patches are based off Linux v3.8. The ioctl has been tested
in the most basic sense, and should not be trusted to keep your data
safe. There are bugs.

A git tree for the kernel changes can be found at:

https://github.com/markfasheh/btrfs-extent-same


I have a userspace project, duperemove available at:

https://github.com/markfasheh/duperemove

Hopefully this can serve as an example of one possible usage of the ioctl.

duperemove takes a list of files as argument and will search them for
duplicated extents. My most recent changes have been to integrate it
with btrfs_extent_same so that the '-D' switch will have it fire off
dedupe requests once processing of data is complete. Integration with
extent_same has *not* been tested yet so don't expect that to work
flawlessly.

Within the duperemove repo is a file, btrfs-extent-same.c, that acts as
a test wrapper around the ioctl. It can be compiled completely
separately from the rest of the project via "make btrfs-extent-same".
This makes direct testing of the ioctl more convenient.

Code review is very much appreciated. Thanks,
 --Mark
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] btrfs: abtract out range locking in clone ioctl()

2013-04-16 Thread Mark Fasheh
The range locking in btrfs_ioctl_clone is trivially broken out into its own
function. This reduces the complexity of btrfs_ioctl_clone() by a small bit
and makes that locking code available to future functions in
fs/btrfs/ioctl.c

Signed-off-by: Mark Fasheh mfas...@suse.de
---
 fs/btrfs/ioctl.c |   36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 338f259..7c80738 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2412,6 +2412,26 @@ out:
return ret;
 }
 
+static inline void lock_extent_range(struct inode *inode, u64 off, u64 len)
+{
+   /* do any pending delalloc/csum calc on src, one way or
+  another, and lock file content */
+   while (1) {
+   struct btrfs_ordered_extent *ordered;
+   lock_extent(&BTRFS_I(inode)->io_tree, off, off + len - 1);
+   ordered = btrfs_lookup_first_ordered_extent(inode,
+   off + len - 1);
+   if (!ordered &&
+   !test_range_bit(&BTRFS_I(inode)->io_tree, off,
+   off + len - 1, EXTENT_DELALLOC, 0, NULL))
+   break;
+   unlock_extent(&BTRFS_I(inode)->io_tree, off, off + len - 1);
+   if (ordered)
+   btrfs_put_ordered_extent(ordered);
+   btrfs_wait_ordered_range(inode, off, len);
+   }
+}
+
 static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
   u64 off, u64 olen, u64 destoff)
 {
@@ -2529,21 +2549,7 @@ static noinline long btrfs_ioctl_clone(struct file 
*file, unsigned long srcfd,
truncate_inode_pages_range(inode-i_data, destoff,
   PAGE_CACHE_ALIGN(destoff + len) - 1);
 
-   /* do any pending delalloc/csum calc on src, one way or
-  another, and lock file content */
-   while (1) {
-   struct btrfs_ordered_extent *ordered;
-   lock_extent(&BTRFS_I(src)->io_tree, off, off + len - 1);
-   ordered = btrfs_lookup_first_ordered_extent(src, off + len - 1);
-   if (!ordered &&
-   !test_range_bit(&BTRFS_I(src)->io_tree, off, off + len - 1,
-   EXTENT_DELALLOC, 0, NULL))
-   break;
-   unlock_extent(&BTRFS_I(src)->io_tree, off, off + len - 1);
-   if (ordered)
-   btrfs_put_ordered_extent(ordered);
-   btrfs_wait_ordered_range(src, off, len);
-   }
+   lock_extent_range(src, off, len);
 
/* clone data */
key.objectid = btrfs_ino(src);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] btrfs_ioctl_clone: Move clone code into it's own function

2013-04-16 Thread Mark Fasheh
There are some 250+ lines here that are easily encapsulated into their own
function. I don't change how anything works here; I just create and document
the new btrfs_clone() function from the btrfs_ioctl_clone() code.

Signed-off-by: Mark Fasheh mfas...@suse.de
---
 fs/btrfs/ioctl.c |  232 ++
 1 file changed, 128 insertions(+), 104 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 7c80738..d237447 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2432,125 +2432,43 @@ static inline void lock_extent_range(struct inode 
*inode, u64 off, u64 len)
}
 }
 
-static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
-  u64 off, u64 olen, u64 destoff)
+/**
+ * btrfs_clone() - clone a range from inode file to another
+ *
+ * @src: Inode to clone from
+ * @inode: Inode to clone to
+ * @off: Offset within source to start clone from
+ * @olen: Original length, passed by user, of range to clone
+ * @olen_aligned: Block-aligned value of olen, extent_same uses
+ *   identical values here
+ * @destoff: Offset within @inode to start clone
+ */
+static int btrfs_clone(struct inode *src, struct inode *inode,
+  u64 off, u64 olen, u64 olen_aligned, u64 destoff)
 {
-   struct inode *inode = fdentry(file)-d_inode;
struct btrfs_root *root = BTRFS_I(inode)-root;
-   struct fd src_file;
-   struct inode *src;
-   struct btrfs_trans_handle *trans;
-   struct btrfs_path *path;
+   struct btrfs_path *path = NULL;
struct extent_buffer *leaf;
-   char *buf;
+   struct btrfs_trans_handle *trans;
+   char *buf = NULL;
struct btrfs_key key;
u32 nritems;
int slot;
int ret;
-   u64 len = olen;
-   u64 bs = root-fs_info-sb-s_blocksize;
-
-   /*
-* TODO:
-* - split compressed inline extents.  annoying: we need to
-*   decompress into destination's address_space (the file offset
-*   may change, so source mapping won't do), then recompress (or
-*   otherwise reinsert) a subrange.
-* - allow ranges within the same file to be cloned (provided
-*   they don't overlap)?
-*/
-
-   /* the destination must be opened for writing */
-   if (!(file-f_mode  FMODE_WRITE) || (file-f_flags  O_APPEND))
-   return -EINVAL;
-
-   if (btrfs_root_readonly(root))
-   return -EROFS;
-
-   ret = mnt_want_write_file(file);
-   if (ret)
-   return ret;
-
-   src_file = fdget(srcfd);
-   if (!src_file.file) {
-   ret = -EBADF;
-   goto out_drop_write;
-   }
-
-   ret = -EXDEV;
-   if (src_file.file-f_path.mnt != file-f_path.mnt)
-   goto out_fput;
-
-   src = src_file.file-f_dentry-d_inode;
-
-   ret = -EINVAL;
-   if (src == inode)
-   goto out_fput;
-
-   /* the src must be open for reading */
-   if (!(src_file.file-f_mode  FMODE_READ))
-   goto out_fput;
-
-   /* don't make the dst file partly checksummed */
-   if ((BTRFS_I(src)-flags  BTRFS_INODE_NODATASUM) !=
-   (BTRFS_I(inode)-flags  BTRFS_INODE_NODATASUM))
-   goto out_fput;
-
-   ret = -EISDIR;
-   if (S_ISDIR(src-i_mode) || S_ISDIR(inode-i_mode))
-   goto out_fput;
-
-   ret = -EXDEV;
-   if (src-i_sb != inode-i_sb)
-   goto out_fput;
+   u64 len = olen_aligned;
 
ret = -ENOMEM;
buf = vmalloc(btrfs_level_size(root, 0));
if (!buf)
-   goto out_fput;
+   return ret;
 
path = btrfs_alloc_path();
if (!path) {
vfree(buf);
-   goto out_fput;
-   }
-   path-reada = 2;
-
-   if (inode  src) {
-   mutex_lock_nested(inode-i_mutex, I_MUTEX_PARENT);
-   mutex_lock_nested(src-i_mutex, I_MUTEX_CHILD);
-   } else {
-   mutex_lock_nested(src-i_mutex, I_MUTEX_PARENT);
-   mutex_lock_nested(inode-i_mutex, I_MUTEX_CHILD);
-   }
-
-   /* determine range to clone */
-   ret = -EINVAL;
-   if (off + len  src-i_size || off + len  off)
-   goto out_unlock;
-   if (len == 0)
-   olen = len = src-i_size - off;
-   /* if we extend to eof, continue to block boundary */
-   if (off + len == src-i_size)
-   len = ALIGN(src-i_size, bs) - off;
-
-   /* verify the end result is block aligned */
-   if (!IS_ALIGNED(off, bs) || !IS_ALIGNED(off + len, bs) ||
-   !IS_ALIGNED(destoff, bs))
-   goto out_unlock;
-
-   if (destoff  inode-i_size) {
-   ret = btrfs_cont_expand(inode, inode-i_size, destoff);
-   if (ret)
-   goto out_unlock;
+   return ret;
}
 
-   /* truncate page cache 

[PATCH 3/4] btrfs: Introduce extent_read_full_page_nolock()

2013-04-16 Thread Mark Fasheh
We want this for btrfs_extent_same. Basically readpage and friends do their
own extent locking but for the purposes of dedupe, we want to have both
files locked down across a set of readpage operations (so that we can
compare data). Introduce this variant and a flag which can be set for
extent_read_full_page() to indicate that we are already locked.

Signed-off-by: Mark Fasheh mfas...@suse.de
---
 fs/btrfs/extent_io.c |   44 
 fs/btrfs/extent_io.h |2 ++
 2 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1b319df..9256503 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2592,7 +2592,7 @@ static int __extent_read_full_page(struct extent_io_tree 
*tree,
   struct page *page,
   get_extent_t *get_extent,
   struct bio **bio, int mirror_num,
-  unsigned long *bio_flags)
+  unsigned long *bio_flags, int parent_locked)
 {
struct inode *inode = page-mapping-host;
u64 start = (u64)page-index  PAGE_CACHE_SHIFT;
@@ -2625,7 +2625,7 @@ static int __extent_read_full_page(struct extent_io_tree 
*tree,
}
 
end = page_end;
-   while (1) {
+   while (1  !parent_locked) {
lock_extent(tree, start, end);
ordered = btrfs_lookup_ordered_extent(inode, start);
if (!ordered)
@@ -2659,15 +2659,18 @@ static int __extent_read_full_page(struct 
extent_io_tree *tree,
kunmap_atomic(userpage);
set_extent_uptodate(tree, cur, cur + iosize - 1,
cached, GFP_NOFS);
-   unlock_extent_cached(tree, cur, cur + iosize - 1,
-cached, GFP_NOFS);
+   if (!parent_locked)
+   unlock_extent_cached(tree, cur,
+cur + iosize - 1,
+cached, GFP_NOFS);
break;
}
em = get_extent(inode, page, pg_offset, cur,
end - cur + 1, 0);
if (IS_ERR_OR_NULL(em)) {
SetPageError(page);
-   unlock_extent(tree, cur, end);
+   if (!parent_locked)
+   unlock_extent(tree, cur, end);
break;
}
extent_offset = cur - em-start;
@@ -2719,7 +2722,8 @@ static int __extent_read_full_page(struct extent_io_tree 
*tree,
if (test_range_bit(tree, cur, cur_end,
   EXTENT_UPTODATE, 1, NULL)) {
check_page_uptodate(tree, page);
-   unlock_extent(tree, cur, cur + iosize - 1);
+   if (!parent_locked)
+   unlock_extent(tree, cur, cur + iosize - 1);
cur = cur + iosize;
pg_offset += iosize;
continue;
@@ -2729,7 +2733,8 @@ static int __extent_read_full_page(struct extent_io_tree 
*tree,
 */
if (block_start == EXTENT_MAP_INLINE) {
SetPageError(page);
-   unlock_extent(tree, cur, cur + iosize - 1);
+   if (!parent_locked)
+   unlock_extent(tree, cur, cur + iosize - 1);
cur = cur + iosize;
pg_offset += iosize;
continue;
@@ -2756,7 +2761,8 @@ static int __extent_read_full_page(struct extent_io_tree 
*tree,
}
if (ret) {
SetPageError(page);
-   unlock_extent(tree, cur, cur + iosize - 1);
+   if (!parent_locked)
+   unlock_extent(tree, cur, cur + iosize - 1);
}
cur = cur + iosize;
pg_offset += iosize;
@@ -2778,7 +2784,21 @@ int extent_read_full_page(struct extent_io_tree *tree, 
struct page *page,
int ret;
 
ret = __extent_read_full_page(tree, page, get_extent, bio, mirror_num,
- bio_flags);
+ bio_flags, 0);
+   if (bio)
+   ret = submit_one_bio(READ, bio, mirror_num, bio_flags);
+   return ret;
+}
+
+int extent_read_full_page_nolock(struct extent_io_tree *tree, struct page 
*page,
+get_extent_t *get_extent, int mirror_num)
+{
+   struct bio *bio = NULL;
+   unsigned long bio_flags = 0;
+   int ret;
+
+   ret = __extent_read_full_page(tree, page, get_extent, bio, 

[PATCH 4/4] btrfs: offline dedupe

2013-04-16 Thread Mark Fasheh
This patch adds an ioctl, BTRFS_IOC_FILE_EXTENT_SAME which will try to
de-duplicate a list of extents across a range of files.

Internally, the ioctl re-uses code from the clone ioctl. This avoids
rewriting a large chunk of extent handling code.

Userspace passes in an array of file, offset pairs along with a length
argument. The ioctl will then (for each dedupe) do a byte-by-byte comparison
of the user data before deduping the extent. Status and number of bytes
deduped are returned for each operation.

Signed-off-by: Mark Fasheh mfas...@suse.de
---
 fs/btrfs/ioctl.c |  272 ++
 fs/btrfs/ioctl.h |   28 +-
 2 files changed, 299 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d237447..7cad49e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -57,6 +57,9 @@
 #include send.h
 #include dev-replace.h
 
+static int btrfs_clone(struct inode *src, struct inode *inode,
+  u64 off, u64 olen, u64 olen_aligned, u64 destoff);
+
 /* Mask out flags that are inappropriate for the given type of inode. */
 static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
 {
@@ -2412,6 +2415,61 @@ out:
return ret;
 }
 
+static noinline int fill_data(struct inode *inode, u64 off, u64 len,
+ char **cur_buffer)
+{
+   struct page *page;
+   void *addr;
+   char *buffer;
+   pgoff_t index;
+   pgoff_t last_index;
+   int ret = 0;
+   int bytes_copied = 0;
+   struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
+
+   buffer = kmalloc(len, GFP_NOFS);
+   if (!buffer)
+   return -ENOMEM;
+
+   index = off >> PAGE_CACHE_SHIFT;
+   last_index = (off + len - 1) >> PAGE_CACHE_SHIFT;
+
+   while (index <= last_index) {
+   page = grab_cache_page(inode->i_mapping, index);
+   if (!page) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   if (!PageUptodate(page)) {
+   extent_read_full_page_nolock(tree, page,
+btrfs_get_extent, 0);
+   lock_page(page);
+   if (!PageUptodate(page)) {
+   unlock_page(page);
+   page_cache_release(page);
+   ret = -EINVAL;
+   goto out;
+   }
+   }
+
+   addr = kmap(page);
+   memcpy(buffer + bytes_copied, addr, PAGE_CACHE_SIZE);
+   kunmap(page);
+   unlock_page(page);
+   page_cache_release(page);
+   bytes_copied += PAGE_CACHE_SIZE;
+   index++;
+   }
+
+   *cur_buffer = buffer;
+
+out:
+   if (ret)
+   kfree(buffer);
+   return ret;
+}
+
 static inline void lock_extent_range(struct inode *inode, u64 off, u64 len)
 {
/* do any pending delalloc/csum calc on src, one way or
@@ -2432,6 +2490,218 @@ static inline void lock_extent_range(struct inode 
*inode, u64 off, u64 len)
}
 }
 
+static void btrfs_double_unlock(struct inode *inode1, u64 loff1,
+   struct inode *inode2, u64 loff2, u64 len)
+{
+   unlock_extent(&BTRFS_I(inode1)->io_tree, loff1, loff1 + len - 1);
+   unlock_extent(&BTRFS_I(inode2)->io_tree, loff2, loff2 + len - 1);
+
+   mutex_unlock(&inode1->i_mutex);
+   mutex_unlock(&inode2->i_mutex);
+}
+
+static void btrfs_double_lock(struct inode *inode1, u64 loff1,
+ struct inode *inode2, u64 loff2, u64 len)
+{
+   if (inode1 < inode2) {
+   mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
+   mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+   lock_extent_range(inode1, loff1, len);
+   lock_extent_range(inode2, loff2, len);
+   } else {
+   mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
+   mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
+   lock_extent_range(inode2, loff2, len);
+   lock_extent_range(inode1, loff1, len);
+   }
+}
+
+static int btrfs_extent_same(struct inode *src, u64 loff, u64 len,
+struct inode *dst, u64 dst_loff)
+{
+   char *orig_buffer = NULL;
+   char *dst_inode_buffer = NULL;
+   int ret;
+
+   /*
+* btrfs_clone() can't handle extents in the same file
+* yet. Once that works, we can drop this check and replace it
+* with a check for the same inode, but overlapping extents.
+*/
+   if (src == dst)
+   return -EINVAL;
+
+   btrfs_double_lock(src, loff, dst, dst_loff, len);
+
+   ret = fill_data(src, loff, len, &orig_buffer);
+   if (ret) {
+   printk(KERN_ERR btrfs: unable to source populate data 
+  

Re: [PATCH 0/4] [RFC] btrfs: offline dedupe

2013-04-16 Thread Marek Otahal
Hi Mark, 

could you compare (apart from online/offline) your implementation to Liu Bo's
work, which appeared on the ML a while ago?
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg23656.html

It would be interesting if the two approaches could share some code, and also
to get confirmation that using one technique does not preclude using the other
in the future.

Best wishes, 
Mark

On Tuesday 16 April 2013 15:15:31 Mark Fasheh wrote:
 Hi,
  
 The following series of patches implements in btrfs an ioctl to do
 offline deduplication of file extents.
 
 To be clear, offline in this sense means that the file system is
 mounted and running, but the dedupe is not done during file writes,
 but after the fact when some userspace software initiates a dedupe.
 
 The primary patch is loosely based off of one sent by Josef Bacik back
 in January, 2011.
 
 http://permalink.gmane.org/gmane.comp.file-systems.btrfs/8508
 
 I've made significant updates and changes from the original. In
 particular the structure passed is more fleshed out, this series has a
 high degree of code sharing between itself and the clone code, and the
 locking has been updated.
 
...
 
 Code review is very much appreciated. Thanks,
  --Mark
-- 

Marek Otahal :o)
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] [RFC] btrfs: offline dedupe

2013-04-16 Thread Mark Fasheh
On Wed, Apr 17, 2013 at 12:50:04AM +0200, Marek Otahal wrote: could you
 compare (appart from online/offline) your implementation to LiuBo's work?,
 appeared on ML a while ago:
 http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg23656.html

Well, that's the primary difference. Liu Bo's patch also requires a format
change since it's done online; my patch requires no format change. So they're
complementary approaches in my opinion.

There's also the possibility that some other file systems could pick up the
ioctl. Ocfs2 in particular should be able to.


 It would be interesting if the two approaches could share some code, and
 also confirmation that using one technique does not disregard using the
 other in future.

Both features can exist together and probably should, I can see great uses
for both cases.

I haven't looked at the patches, but with respect to code sharing I'll take a
look. My patches don't actually add any custom code for the actual "let's
de-dupe this extent" step, as I re-use the code from btrfs_ioctl_clone().
--Mark

--
Mark Fasheh
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: a copy of superblock is zero may not mean btrfs is not there

2013-04-16 Thread Anand Jain



On 04/16/2013 07:57 PM, David Sterba wrote:

On Fri, Apr 12, 2013 at 03:55:06PM +0800, Anand Jain wrote:

If one of the copies of the superblock is zero, that does not
confirm to us that btrfs isn't there on that disk. When
we have more than one copy of the superblock, we should
rather let the for loop continue to check the other copies.

The following test case and results justify the fix:

mkfs.btrfs /dev/sdb /dev/sdc -f
mount /dev/sdb /btrfs
dd if=/dev/zero bs=1 count=8 of=/dev/sdc seek=$((64*1024+64))
~/before/btrfs-select-super -s 1 /dev/sdc
using SB copy 1, bytenr 67108864

here btrfs-select-super just wrote a superblock to a mounted btrfs


Why does not check_mounted() catch this in the first place? Ie. based on
the status in /proc/mounts not on random bytes in the superblock.


 The reason is that, as of now, /proc/mounts only knows about devid 1.

Thanks, Anand
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] [RFC] btrfs: offline dedupe

2013-04-16 Thread Liu Bo
On Tue, Apr 16, 2013 at 04:17:15PM -0700, Mark Fasheh wrote:
 On Wed, Apr 17, 2013 at 12:50:04AM +0200, Marek Otahal wrote: could you
  compare (appart from online/offline) your implementation to LiuBo's work?,
  appeared on ML a while ago:
  http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg23656.html
 
 Well that's the primary difference. Liu Bo's patch requires a format change
 also since it's done online. My patch requires no format change. So they're
 complimentary approaches in my opinion.
 
 There's also the possibility that some other file systems could pick up the
 ioctl. Ocfs2 in particular should be able to.
 
 
  It would be interesting if the two approaches could share some code, and
  also confirmation that using one technique does not disregard using the
  other in future.
 
 Both features can exist together and probably should, I can see great uses
 for both cases.
 
 I haven't looked at the patches but with respect to code sharing I'll take a
 look. My patches don't actually add any custom code for the actual let's
 de-dupe this extent as I re-use the code from btrfs_ioctl_clone().

In online dedup, I just make some changes in the write path; since we regard
dedup as a special kind of compression, doing dedup the same way as
compression is the goal.

The difference is where the hash database lives -- offline dedup puts it in
userspace, while online dedup keeps it in the kernel.

Although there might be no code that can be shared here, I agree that both
the online and offline approaches are useful.

Good job.

thanks,
liubo
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html