Re: [PATCH v2 1/4] btrfs: Remove userspace transaction ioctls

2018-02-05 Thread Wang Shilong

   These ioctl are originally introduced by Sage Weil for Ceph use?
Not sure whether it still useful, Cc Sage just in case.


在 2018年2月5日,下午5:41,Nikolay Borisov  写道:

Commit 3558d4f88ec8 ("btrfs: Deprecate userspace transaction ioctls")
marked the beginning of the end of userspace transaction. This commit
finishes the job!

Signed-off-by: Nikolay Borisov 

---

V2:
* Also remove the usage of btrfs_ioctl_trans_end from btrfs_release_file so 
that the patch compiles on its own as well.

fs/btrfs/ctree.h |  1 -
fs/btrfs/file.c  |  8 -
fs/btrfs/ioctl.c | 95 
3 files changed, 104 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1a462ab85c49..6a4752177ad8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3193,7 +3193,6 @@ void btrfs_destroy_inode(struct inode *inode);
int btrfs_drop_inode(struct inode *inode);
int __init btrfs_init_cachep(void);
void btrfs_destroy_cachep(void);
-long btrfs_ioctl_trans_end(struct file *file);
struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location,
 struct btrfs_root *root, int *was_new);
struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index cba2ac371ce0..101e0c7fea92 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1996,8 +1996,6 @@ int btrfs_release_file(struct inode *inode, struct file 
*filp)
{
struct btrfs_file_private *private = filp->private_data;

-   if (private && private->trans)
-   btrfs_ioctl_trans_end(filp);
if (private && private->filldir_buf)
kfree(private->filldir_buf);
kfree(private);
@@ -2189,12 +2187,6 @@ int btrfs_sync_file(struct file *file, loff_t start, 
loff_t end, int datasync)
}

/*
-* ok we haven't committed the transaction yet, lets do a commit
-*/
-   if (file->private_data)
-   btrfs_ioctl_trans_end(file);
-
-   /*
 * We use start here because we will need to wait on the IO to complete
 * in btrfs_sync_log, which could require joining a transaction (for
 * example checking cross references in the nocow path).  If we use join
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index f573cad72b7e..3094e079fc4f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3935,73 +3935,6 @@ int btrfs_clone_file_range(struct file *src_file, loff_t 
off,
return btrfs_clone_files(dst_file, src_file, off, len, destoff);
}

-/*
- * there are many ways the trans_start and trans_end ioctls can lead
- * to deadlocks.  They should only be used by applications that
- * basically own the machine, and have a very in depth understanding
- * of all the possible deadlocks and enospc problems.
- */
-static long btrfs_ioctl_trans_start(struct file *file)
-{
-   struct inode *inode = file_inode(file);
-   struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-   struct btrfs_root *root = BTRFS_I(inode)->root;
-   struct btrfs_trans_handle *trans;
-   struct btrfs_file_private *private;
-   int ret;
-   static bool warned = false;
-
-   ret = -EPERM;
-   if (!capable(CAP_SYS_ADMIN))
-   goto out;
-
-   if (!warned) {
-   btrfs_warn(fs_info,
-   "Userspace transaction mechanism is considered "
-   "deprecated and slated to be removed in 4.17. "
-   "If you have a valid use case please "
-   "speak up on the mailing list");
-   WARN_ON(1);
-   warned = true;
-   }
-
-   ret = -EINPROGRESS;
-   private = file->private_data;
-   if (private && private->trans)
-   goto out;
-   if (!private) {
-   private = kzalloc(sizeof(struct btrfs_file_private),
- GFP_KERNEL);
-   if (!private)
-   return -ENOMEM;
-   file->private_data = private;
-   }
-
-   ret = -EROFS;
-   if (btrfs_root_readonly(root))
-   goto out;
-
-   ret = mnt_want_write_file(file);
-   if (ret)
-   goto out;
-
-   atomic_inc(_info->open_ioctl_trans);
-
-   ret = -ENOMEM;
-   trans = btrfs_start_ioctl_transaction(root);
-   if (IS_ERR(trans))
-   goto out_drop;
-
-   private->trans = trans;
-   return 0;
-
-out_drop:
-   atomic_dec(_info->open_ioctl_trans);
-   mnt_drop_write_file(file);
-out:
-   return ret;
-}
-
static long btrfs_ioctl_default_subvol(struct file *file, void __user *argp)
{
struct inode *inode = file_inode(file);
@@ -4243,30 +4176,6 @@ static long btrfs_ioctl_space_info(struct btrfs_fs_info 
*fs_info,
return ret;
}

-/*
- * there are many ways the trans_start and trans_end ioctls can lead
- * to deadlocks.  They should only be used by 

Re: [PATCH 0/5] btrfs-progs: resolve property and enhance defragment

2017-11-01 Thread Wang Shilong

在 2017年11月2日,上午11:44,Su Yue  写道:

Sorry, the patchset does not work as expected.
Please ignore it.

On 11/02/2017 11:23 AM, Su Yue wrote:
> The patchset adds an option '--compress-force' to work with
> 'btrfs fi defrag -c'. Then no-compression files will be set with
> compression property specified by '-c' (zlib default).
> patch[1-4] divide property handler to setter, getter, and printer.
> Then patch[5] could enhance defragment easier.
> Su Yue (5):
>   btrfs-progs: property: divide prop_handler_t into
> setter,getter,printer
>   btrfs-progs: property: set/get/print ro property
>   btrfs-progs: property: set/get/print label property
>   btrfs-progs: property: set/get/print compression property
>   btrfs-progs: fi defrag: extend -c to drop nocompress flag on files
>  cmds-filesystem.c |  94 +++--
>  cmds-property.c   |   7 +-
>  props.c   | 207 
> +-
>  props.h   |  19 +++--
>  4 files changed, 283 insertions(+), 44 deletions(-)

FYI, i think it better to add some tests to btrfs-progs if new interface
or function like this is added.

Thanks,
Shilong


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?

2017-08-04 Thread Wang Shilong
Hi Qu,

On Fri, Aug 4, 2017 at 10:05 PM, Qu Wenruo  wrote:
>
>
> On 2017年08月02日 16:38, Brendan Hide wrote:
>>
>> The title seems alarmist to me - and I suspect it is going to be
>> misconstrued. :-/
>>
>>  From the release notes at
>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-7.4_Release_Notes-Deprecated_Functionality.html
>>
>> "Btrfs has been deprecated
>>
>> The Btrfs file system has been in Technology Preview state since the
>> initial release of Red Hat Enterprise Linux 6. Red Hat will not be moving
>> Btrfs to a fully supported feature and it will be removed in a future major
>> release of Red Hat Enterprise Linux.
>>
>> The Btrfs file system did receive numerous updates from the upstream in
>> Red Hat Enterprise Linux 7.4 and will remain available in the Red Hat
>> Enterprise Linux 7 series. However, this is the last planned update to this
>> feature.
>>
>> Red Hat will continue to invest in future technologies to address the use
>> cases of our customers, specifically those related to snapshots,
>> compression, NVRAM, and ease of use. We encourage feedback through your Red
>> Hat representative on features and requirements you have for file systems
>> and storage technology."
>
>
> Personally speaking, unlike most of the btrfs supporters, I think Red Hat is
> doing the correct thing for their enterprise use case.
>
> (To clarify, I'm not going to Red Hat, just in case anyone wonders why I'm
> not supporting btrfs)
>
> [Good things of btrfs]
> Btrfs is indeed a technic pioneer in a lot of aspects (at least in linux
> world):
>
> 1) Metadata CoW instead of traditional journal
> 2) Snapshot and delta-backup
> I think this is the killer feature of Btrfs, and why SUSE is using it
> for root fs.
> 3) Default data CoW
> 4) Data checksum and scrubbing
> 5) Multi-device management
> 6) Online resize/balancing
> And a lot of more.
>
> [Bad things of btrfs]
> But for enterprise usage, it's too advanced and has several problems
> preventing them being widely applied:
>
> 1) Low performance from metadata/data CoW
> This is a little complicated dilemma.
> Although Btrfs can disable data CoW, nodatacow also disables data
> checksum, which is another main feature for btrfs.
> So Btrfs can't default to nodatacow, unlike XFS.
>
> And metadata CoW causes extra metadata write along with superblock
> update (FUA), further degrading the performance.
>
> Such pioneered design makes traditional performance-intense use case
> very unhappy.
> Especially for almost all kind of databases. (Note that nodatacow can't
> always solve the performance problem).
> Most performance intense usage is still based on tradtional fs design
> (journal with no CoW)
>
> 2) Low concurrency caused by tree design.
>  Unlike traditional one-tree-for-one-inode design, btrfs uses
> one-tree-for-one-subvolume.
>  The design makes snapshot implementation very easy, while makes tree
> very hot when a lot of modifiers are trying to modify any metadata.
>
>  Btrfs has a lot of different way to solve it.
>  For extent tree (the most busy tree), we are using delayed-ref to speed
> up extent tree update.
>  For fs tree fsync, we have log tree to speed things up.
>  These approaches work, at the cost of complexity and bugs, and we still
> have slow fs tree modification speed.
>
> 3) Low code reusage of device-mapper.
>  I totally understand that, due to the unique support for data csum,
> btrfs can't use device-mapper directly, as we must verify the data read out
> from device before passing it to higher level.
> So Btrfs uses its own device-mapper like implementation to handle
> multi-device management.
>
> The result is mixed. For easy to handle case like RAID0/1/10 btrfs is
> doing well.
> While for RAID5/6, everyone knows the result.
>
> Such btrfs *enhanced* re-implementation not only makes btrfs larger but
> also more complex and bug-prune.
>
> In short, btrfs is too advanced for generic use cases (performance) and
> developers (bugs), unfortunately.
>
> And even SUSE is just pushing btrfs as root fs, mainly for the snapshot
> feature.
> But still ext4/xfs for data or performance intense use case.
>
>
> [Other solution on the table]
> On the other hand, I think RedHat is pushing storage technology based on LVM
> (thin) and Xfs.
>
> For traditional LVM, it's stable but its snapshot design is old-fashion and
> low-performance.
> While new thin-provision LVM solves the problem using a method just like
> Btrfs, but at block level.
>
> And for XFS, it's still traditional designed, journal based,
> one-tree-for-one-inode.
> But with fancy new features like data CoW.
>
> Even XFS + LVM-thin lacks ability to shrink fs or scrub data or delta
> backup, it can do a lot of things just like Btrfs.
> From snapshot to multi-device management.
>
> And more importantly, has better 

Re: [PATCH] Btrfs: avoid unnecessarily locking inode when clearing a range

2017-08-03 Thread Wang Shilong
On Thu, Aug 3, 2017 at 11:00 PM, Chris Mason <c...@fb.com> wrote:
>
>
> On 07/27/2017 02:52 PM, fdman...@kernel.org wrote:
>>
>> From: Filipe Manana <fdman...@suse.com>
>>
>> If the range being cleared was not marked for defrag and we are not
>> about to clear the range from the defrag status, we don't need to
>> lock and unlock the inode.
>>
>> Signed-off-by: Filipe Manana <fdman...@suse.com>
>
>
> Thanks Filipe, looks like it goes all the way back to:
>
> commit 47059d930f0e002ff851beea87d738146804726d
> Author: Wang Shilong <wangsl.f...@cn.fujitsu.com>
> Date:   Thu Jul 3 18:22:07 2014 +0800
>
> Btrfs: make defragment work with nodatacow option
>
> I can't see how the inode lock is required here.

This blames to me, thanks for fixing it.

Reviewed-by: Wang Shilong <wangshilong1...@gmail.com>

>
> Reviewed-by: Chris Mason <c...@fb.com>
>
> -chris
>
>> ---
>>   fs/btrfs/inode.c | 7 ---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index eb495e956d53..51c45c0a8553 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -1797,10 +1797,11 @@ static void btrfs_clear_bit_hook(void
>> *private_data,
>> u64 len = state->end + 1 - state->start;
>> u32 num_extents = count_max_extents(len);
>>   - spin_lock(>lock);
>> -   if ((state->state & EXTENT_DEFRAG) && (*bits & EXTENT_DEFRAG))
>> +   if ((state->state & EXTENT_DEFRAG) && (*bits & EXTENT_DEFRAG)) {
>> +   spin_lock(>lock);
>> inode->defrag_bytes -= len;
>> -   spin_unlock(>lock);
>> +   spin_unlock(>lock);
>> +   }
>> /*
>>  * set_bit and clear bit hooks normally require _irqsave/restore
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?

2017-08-02 Thread Wang Shilong
I haven't seen active btrfs developers from some time, Redhat looks
put most of their efforts on XFS, It is time to switch to SLES/opensuse!


On Wed, Aug 2, 2017 at 4:38 PM, Brendan Hide  wrote:
> The title seems alarmist to me - and I suspect it is going to be
> misconstrued. :-/
>
> From the release notes at
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-7.4_Release_Notes-Deprecated_Functionality.html
>
> "Btrfs has been deprecated
>
> The Btrfs file system has been in Technology Preview state since the initial
> release of Red Hat Enterprise Linux 6. Red Hat will not be moving Btrfs to a
> fully supported feature and it will be removed in a future major release of
> Red Hat Enterprise Linux.
>
> The Btrfs file system did receive numerous updates from the upstream in Red
> Hat Enterprise Linux 7.4 and will remain available in the Red Hat Enterprise
> Linux 7 series. However, this is the last planned update to this feature.
>
> Red Hat will continue to invest in future technologies to address the use
> cases of our customers, specifically those related to snapshots,
> compression, NVRAM, and ease of use. We encourage feedback through your Red
> Hat representative on features and requirements you have for file systems
> and storage technology."
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PULL] Btrfs for 4.13, part 1

2017-06-24 Thread Wang Shilong
Sorry for bikeshedding.

On Fri, Jun 23, 2017 at 11:16 PM, David Sterba  wrote:
> Hi,
>
> this is the main batch for 4.13. There are some user visible changes, see
> below. The core updates improve error handling (mostly related to bios), with
> the usual incremental work on the GFP_NOFS (mis)use removal. All patches have
>
> Fabian Frederick (1):
>   btrfs: kmap() can't fail
>
<..SNIP..>
>
> Sargun Dhillon (2):
>   btrfs: add quota override flag to enable quota override for 
> CAP_SYS_RESOURCE
>   btrfs: Add quota_override knob into sysfs
>
> Su Yue (9):
>   btrfs: Introduce btrfs_is_name_len_valid to avoid reading beyond 
> boundary
>   btrfs: Check name_len with boundary in verify dir_item
>   btrfs: Check name_len on add_inode_ref call path
>   btrfs: Verify dir_item in replay_xattr_deletes
>   btrfs: Check name_len in btrfs_check_ref_name_override
>   btrfs: Check name_len before read in iterate_dir_item
>   btrfs: Check name_len before reading btrfs_get_name
>   btrfs: Check name_len before in btrfs_del_root_ref
>   btrfs: Verify dir_item in iterate_object_props

Hmm..add those check might be expensive for metadata operations,
especially in hot path, i could see similar behavior in Ext4 for ext4
dentry check.

Could we run some metadata tests to confirm?it makes sense to
add check but with min affects to performace.


Thanks,
Shilong

> Timofey Titovets (3):
>   Btrfs: lzo: fix typo in error message after failed deflate
>   Btrfs: lzo: compressed data size must be less then input size
>   Btrfs: compression must free at least one sector size
>
> Yonghong Song (1):
>   Btrfs: add statx support
>
>  fs/btrfs/backref.c   |  10 +-
>  fs/btrfs/check-integrity.c   |  53 ++---
>  fs/btrfs/compression.c   |  94 ++--
>  fs/btrfs/compression.h   |  44 +++-
>  fs/btrfs/ctree.c |  42 ++--
>  fs/btrfs/ctree.h |  84 ---
>  fs/btrfs/delayed-ref.c   |  29 ++-
>  fs/btrfs/delayed-ref.h   |   6 +-
>  fs/btrfs/dir-item.c  |  94 +++-
>  fs/btrfs/disk-io.c   | 179 +++
>  fs/btrfs/disk-io.h   |   8 +-
>  fs/btrfs/export.c|   5 +
>  fs/btrfs/extent-tree.c   | 481 
> +--
>  fs/btrfs/extent_io.c | 217 --
>  fs/btrfs/extent_io.h |  82 +--
>  fs/btrfs/file-item.c |  31 ++-
>  fs/btrfs/file.c  |  46 ++--
>  fs/btrfs/free-space-tree.c   |  38 ++--
>  fs/btrfs/inode-map.c |   4 +-
>  fs/btrfs/inode.c | 449 
>  fs/btrfs/ioctl.c |  16 +-
>  fs/btrfs/lzo.c   |  33 +--
>  fs/btrfs/print-tree.c|   7 +-
>  fs/btrfs/props.c |   7 +
>  fs/btrfs/qgroup.c| 223 +-
>  fs/btrfs/qgroup.h|   9 +-
>  fs/btrfs/raid56.c|  16 +-
>  fs/btrfs/reada.c |   1 -
>  fs/btrfs/relocation.c|  15 +-
>  fs/btrfs/root-tree.c |   7 +
>  fs/btrfs/scrub.c | 209 +++--
>  fs/btrfs/send.c  | 112 ++---
>  fs/btrfs/super.c |  72 +-
>  fs/btrfs/sysfs.c |  41 
>  fs/btrfs/tests/extent-io-tests.c |   2 +-
>  fs/btrfs/transaction.c   |  23 +-
>  fs/btrfs/tree-log.c  |  44 +++-
>  fs/btrfs/volumes.c   |  74 +++---
>  fs/btrfs/volumes.h   |   7 +
>  fs/btrfs/xattr.c |   2 +-
>  fs/btrfs/zlib.c  |  20 +-
>  include/trace/events/btrfs.h |  36 ---
>  include/uapi/linux/btrfs.h   |  63 +++--
>  43 files changed, 1682 insertions(+), 1353 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] btrfs-progs: du: fix to skip not btrfs dir/file

2016-07-07 Thread Wang Shilong
'btrfs file du' is a very useful tool to watch my system
file usage information with snapshot aware.

when trying to run following commands:
[root@localhost btrfs-progs]# btrfs file du /
 Total   Exclusive  Set shared  Filename
ERROR: Failed to lookup root id - Inappropriate ioctl for device
ERROR: cannot check space of '/': Unknown error -1

and My Filesystem looks like this:
[root@localhost btrfs-progs]# df -Th
Filesystem Type  Size  Used Avail Use% Mounted on
devtmpfs   devtmpfs   16G 0   16G   0% /dev
tmpfs  tmpfs  16G  368K   16G   1% /dev/shm
tmpfs  tmpfs  16G  1.4M   16G   1% /run
tmpfs  tmpfs  16G 0   16G   0% /sys/fs/cgroup
/dev/sda3  btrfs  60G   19G   40G  33% /
tmpfs  tmpfs  16G  332K   16G   1% /tmp
/dev/sdc   btrfs 2.8T  166G  1.7T   9% /data
/dev/sda2  xfs   2.0G  452M  1.6G  23% /boot
/dev/sda1  vfat  1.9G   11M  1.9G   1% /boot/efi
tmpfs  tmpfs 3.2G   24K  3.2G   1% /run/user/1000

So I installed Btrfs as my root partition, but boot partition
can be other fs.

We can Let btrfs tool aware of this is not a btrfs file or
directory and skip those files, so that someone like me
could just run 'btrfs file du /' to scan all btrfs filesystems.

After patch, it will look like:
   Total   Exclusive  Set shared  Filename
 0.00B   0.00B   -  //root/.bash_logout
 0.00B   0.00B   -  //root/.bash_profile
 0.00B   0.00B   -  //root/.bashrc
 0.00B   0.00B   -  //root/.cshrc
 0.00B   0.00B   -  //root/.tcshrc

This works for me to analysis system usage and analysis
performaces.

Signed-off-by: Wang Shilong <wangshilong1...@gmail.com>
---
v1->v2: remove extra unnecessary messages output
---
 cmds-fi-du.c   | 8 +++-
 cmds-inspect.c | 2 +-
 utils.c| 8 
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/cmds-fi-du.c b/cmds-fi-du.c
index 12855a5..6d5bf35 100644
--- a/cmds-fi-du.c
+++ b/cmds-fi-du.c
@@ -389,8 +389,14 @@ static int du_walk_dir(struct du_dir_ctxt *ctxt, struct 
rb_root *shared_extents)
  dirfd(dirstream),
  shared_extents, , ,
  0);
-   if (ret)
+   if (ret == -ENOTTY) {
+   continue;
+   } else if (ret) {
+   fprintf(stderr,
+   "failed to walk dir/file: %s 
:%s\n",
+   entry->d_name, strerror(-ret));
break;
+   }
 
ctxt->bytes_total += tot;
ctxt->bytes_shared += shr;
diff --git a/cmds-inspect.c b/cmds-inspect.c
index dd7b9dd..2ae44be 100644
--- a/cmds-inspect.c
+++ b/cmds-inspect.c
@@ -323,7 +323,7 @@ static int cmd_inspect_rootid(int argc, char **argv)
 
ret = lookup_ino_rootid(fd, );
if (ret) {
-   error("rootid failed with ret=%d", ret);
+   error("failed to lookup root id: %s", strerror(-ret));
goto out;
}
 
diff --git a/utils.c b/utils.c
index 578fdb0..f73b048 100644
--- a/utils.c
+++ b/utils.c
@@ -2815,6 +2815,8 @@ path:
if (fd < 0)
goto err;
ret = lookup_ino_rootid(fd, );
+   if (ret)
+   error("failed to lookup root id: %s", strerror(-ret));
close(fd);
if (ret < 0)
goto err;
@@ -3497,10 +3499,8 @@ int lookup_ino_rootid(int fd, u64 *rootid)
args.objectid = BTRFS_FIRST_FREE_OBJECTID;
 
ret = ioctl(fd, BTRFS_IOC_INO_LOOKUP, );
-   if (ret < 0) {
-   error("failed to lookup root id: %s", strerror(errno));
-   return ret;
-   }
+   if (ret < 0)
+   return -errno;
 
*rootid = args.treeid;
 
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: du: fix to skip not btrfs dir/file

2016-07-06 Thread Wang Shilong
On Wed, Jul 6, 2016 at 10:35 PM, Holger Hoffstätte
<hol...@applied-asynchrony.com> wrote:
> On 07/06/16 14:25, Wang Shilong wrote:
>> 'btrfs file du' is a very useful tool to watch my system
>> file usage with snapshot aware.
>>
>> when trying to run following commands:
>> [root@localhost btrfs-progs]# btrfs file du /
>>  Total   Exclusive  Set shared  Filename
>> ERROR: Failed to lookup root id - Inappropriate ioctl for device
>> ERROR: cannot check space of '/': Unknown error -1
>>
>> and My Filesystem looks like this:
>> [root@localhost btrfs-progs]# df -Th
>> Filesystem Type  Size  Used Avail Use% Mounted on
>> devtmpfs   devtmpfs   16G 0   16G   0% /dev
>> tmpfs  tmpfs  16G  368K   16G   1% /dev/shm
>> tmpfs  tmpfs  16G  1.4M   16G   1% /run
>> tmpfs  tmpfs  16G 0   16G   0% /sys/fs/cgroup
>> /dev/sda3  btrfs  60G   19G   40G  33% /
>> tmpfs  tmpfs  16G  332K   16G   1% /tmp
>> /dev/sdc   btrfs 2.8T  166G  1.7T   9% /data
>> /dev/sda2  xfs   2.0G  452M  1.6G  23% /boot
>> /dev/sda1  vfat  1.9G   11M  1.9G   1% /boot/efi
>> tmpfs  tmpfs 3.2G   24K  3.2G   1% /run/user/1000
>>
>> So I installed Btrfs as my root partition, but boot partition
>> can be other fs.
>>
>> We can Let btrfs tool aware of this is not a btrfs file or
>> directory and skip those files, so that someone like me
>> could just run 'btrfs file du /' to scan all btrfs filesystems.
>>
>> After patch, it will look like:
>>Total   Exclusive  Set shared  Filename
>> skipping not btrfs dir/file: boot
>> skipping not btrfs dir/file: dev
>> skipping not btrfs dir/file: proc
>> skipping not btrfs dir/file: run
>> skipping not btrfs dir/file: sys
>>  0.00B   0.00B   -  //root/.bash_logout
>>  0.00B   0.00B   -  //root/.bash_profile
>>  0.00B   0.00B   -  //root/.bashrc
>>  0.00B   0.00B   -  //root/.cshrc
>>  0.00B   0.00B   -  //root/.tcshrc
>>
>> This works for me to analysis system usage and analysis
>> performaces.
>
> This is great, but can we please skip the "skipping .." messages?
> Maybe it's just me but I really don't see the value of printing them
> when they don't contribute to the result.
> They also mess up the display. :)

I don't have a taste whether it needed or not, because it is somehow
useful to let users know some files/directories skipped

Wait some other guys opinion for this...

thanks,
Shilong

>
> thanks,
> Holger
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: du: fix to skip not btrfs dir/file

2016-07-06 Thread Wang Shilong
'btrfs file du' is a very useful tool to watch my system
file usage with snapshot aware.

when trying to run following commands:
[root@localhost btrfs-progs]# btrfs file du /
 Total   Exclusive  Set shared  Filename
ERROR: Failed to lookup root id - Inappropriate ioctl for device
ERROR: cannot check space of '/': Unknown error -1

and My Filesystem looks like this:
[root@localhost btrfs-progs]# df -Th
Filesystem Type  Size  Used Avail Use% Mounted on
devtmpfs   devtmpfs   16G 0   16G   0% /dev
tmpfs  tmpfs  16G  368K   16G   1% /dev/shm
tmpfs  tmpfs  16G  1.4M   16G   1% /run
tmpfs  tmpfs  16G 0   16G   0% /sys/fs/cgroup
/dev/sda3  btrfs  60G   19G   40G  33% /
tmpfs  tmpfs  16G  332K   16G   1% /tmp
/dev/sdc   btrfs 2.8T  166G  1.7T   9% /data
/dev/sda2  xfs   2.0G  452M  1.6G  23% /boot
/dev/sda1  vfat  1.9G   11M  1.9G   1% /boot/efi
tmpfs  tmpfs 3.2G   24K  3.2G   1% /run/user/1000

So I installed Btrfs as my root partition, but boot partition
can be other fs.

We can Let btrfs tool aware of this is not a btrfs file or
directory and skip those files, so that someone like me
could just run 'btrfs file du /' to scan all btrfs filesystems.

After patch, it will look like:
   Total   Exclusive  Set shared  Filename
skipping not btrfs dir/file: boot
skipping not btrfs dir/file: dev
skipping not btrfs dir/file: proc
skipping not btrfs dir/file: run
skipping not btrfs dir/file: sys
 0.00B   0.00B   -  //root/.bash_logout
 0.00B   0.00B   -  //root/.bash_profile
 0.00B   0.00B   -  //root/.bashrc
 0.00B   0.00B   -  //root/.cshrc
 0.00B   0.00B   -  //root/.tcshrc

This works for me to analysis system usage and analysis
performaces.

Signed-off-by: Wang Shilong <wangshilong1...@gmail.com>
---
 cmds-fi-du.c   | 11 ++-
 cmds-inspect.c |  2 +-
 utils.c|  8 
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/cmds-fi-du.c b/cmds-fi-du.c
index 12855a5..bf0e62c 100644
--- a/cmds-fi-du.c
+++ b/cmds-fi-du.c
@@ -389,8 +389,17 @@ static int du_walk_dir(struct du_dir_ctxt *ctxt, struct 
rb_root *shared_extents)
  dirfd(dirstream),
  shared_extents, , ,
  0);
-   if (ret)
+   if (ret == -ENOTTY) {
+   fprintf(stdout,
+   "skipping not btrfs dir/file: 
%s\n",
+   entry->d_name);
+   continue;
+   } else if (ret) {
+   fprintf(stderr,
+   "failed to walk dir/file: %s 
:%s\n",
+   entry->d_name, strerror(-ret));
break;
+   }
 
ctxt->bytes_total += tot;
ctxt->bytes_shared += shr;
diff --git a/cmds-inspect.c b/cmds-inspect.c
index dd7b9dd..2ae44be 100644
--- a/cmds-inspect.c
+++ b/cmds-inspect.c
@@ -323,7 +323,7 @@ static int cmd_inspect_rootid(int argc, char **argv)
 
ret = lookup_ino_rootid(fd, );
if (ret) {
-   error("rootid failed with ret=%d", ret);
+   error("failed to lookup root id: %s", strerror(-ret));
goto out;
}
 
diff --git a/utils.c b/utils.c
index 578fdb0..f73b048 100644
--- a/utils.c
+++ b/utils.c
@@ -2815,6 +2815,8 @@ path:
if (fd < 0)
goto err;
ret = lookup_ino_rootid(fd, );
+   if (ret)
+   error("failed to lookup root id: %s", strerror(-ret));
close(fd);
if (ret < 0)
goto err;
@@ -3497,10 +3499,8 @@ int lookup_ino_rootid(int fd, u64 *rootid)
args.objectid = BTRFS_FIRST_FREE_OBJECTID;
 
ret = ioctl(fd, BTRFS_IOC_INO_LOOKUP, );
-   if (ret < 0) {
-   error("failed to lookup root id: %s", strerror(errno));
-   return ret;
-   }
+   if (ret < 0)
+   return -errno;
 
*rootid = args.treeid;
 
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: About in-band dedupe for v4.7

2016-05-12 Thread Wang Shilong
Hello Guys,

I am commenting not because of Qu's patch, of course, Qu and Mark Fasheh
Do a really good thing for Btrfs contributions.

Just my two cents!

1) I think Currently, we should really focus on Btrfs stability, there
are still many
bugs hidden inside Btrfs, please See Filipe flighting patches here and
there. Unfortunately,
I have not seen Btrfs's Bug is reducing for these two years even we
have frozen new features here.
we are adopting Btrfs internally sometime before, but it is really
unstable unfortunately.

So Why not sit down and make it stable firstly?

2)I am not against new feature, but for a new feature, I think we
should be really
careful now,  Especially if a new feature affect normal write/read
path, I think following things can
be done to make things better:

->Give your design and documents(maybe put it in wiki or somewhere
else) So that
other guys can really review your design instead of looking a bunch of
codes firstly. And we really
need understand pros and cons of design, also if there is TODO, please
do clarity out how we
can solve this problem(or it is possible?).

   ->We need reallly a lot of testing and benchmarking if it affects
normal write/read path.

   ->I think Chris is nice to merge patches, but I really argue next
time if we want to merge new feautres
Please make sure at least two other guys review for patches.


Thank you!
Shilong

On Wed, May 11, 2016 at 6:11 AM, Mark Fasheh  wrote:
> On Tue, May 10, 2016 at 03:19:52PM +0800, Qu Wenruo wrote:
>> Hi, Chris, Josef and David,
>>
>> As merge window for v4.7 is coming, it would be good to hear your
>> ideas about the inband dedupe.
>>
>> We are addressing the ENOSPC problem which Josef pointed out, and we
>> believe the final fix patch would come out at the beginning of the
>> merge window.(Next week)
>
> How about the fiemap performance problem you referenced before? My guess is
> that it happens because you don't coalesce writes into anything larger than
> a page so you're stuck deduping at some silly size like 4k. This in turn
> fragments the files so much that fiemap has a hard time walking backrefs.
>
> I have to check the patches to be sure but perhaps you can tell me whether
> my hunch is correct or not.
>
>
> In fact, I actually asked privately for time to review your dedupe patches,
> but I've been literally so busy cleaning up after the mess you left in your
> last qgroups rewrite I haven't had time.
>
> You literally broke qgroups in almost every spot that matters. In some cases
> (drop_snapshot) you tore out working code and left in a /* TODO */ comment
> for someone else to complete.  Snapshot create was so trivially and
> completely broken by your changes that weeks later, I'm still hunting a
> solution which doesn't involve adding an extra _commit_ to our commit.  This
> is a MASSIVE regression from where we were before.
>
> IMHO, you should not be trusted with large features or rewrites until you
> can demonstrate:
>
>  - A willingness to *completely* solve the problem you are trying to 'fix',
>not do half the job which someone else will have to complete for you.
>
>  - Actual testing. The snapshot bug I reference above exists purely because
>nobody created a snapshot inside of one and checked the qgroup numbers!
>
> Sorry to be so harsh.
>--Mark
>
> --
> Mark Fasheh
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: compression disk space saving - what are your results?

2015-12-02 Thread Wang Shilong
On Wed, Dec 2, 2015 at 9:53 PM, Tomasz Chmielewski  wrote:
> On 2015-12-02 22:03, Austin S Hemmelgarn wrote:
>
>>>  From these numbers (124 GB used where data size is 153 GB), it appears
>>> that we save around 20% with zlib compression enabled.
>>> Is 20% reasonable saving for zlib? Typically text compresses much better
>>> with that algorithm, although I understand that we have several
>>> limitations when applying that on a filesystem level.
>>
>>
>> This is actually an excellent question.  A couple of things to note
>> before I share what I've seen:
>> 1. Text compresses better with any compression algorithm.  It is by
>> nature highly patterned and moderately redundant data, which is what
>> benefits the most from compression.
>
>
> It looks that compress=zlib does not compress very well. Following Duncan's
> suggestion, I've changed it to compress-force=zlib, and re-copied the data
> to make sure the file are compressed.
>
> Compression ratio is much much better now (on a slightly changed data set):
>
> # df -h
> /dev/xvdb   200G   24G  176G  12% /var/log/remote
>
>
> # du -sh /var/log/remote/
> 138G/var/log/remote/
>
>
> So, 138 GB files use just 24 GB on disk - nice!
>
> However, I would still expect that compress=zlib has almost the same effect
> as compress-force=zlib, for 100% text files/logs.

btw, what is your kernel version? there was a bug that detected inode
compression
ration wrong.

http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=68bb462d42a963169bf7acbe106aae08c17129a5
http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=4bcbb33255131adbe481c0467df26d654ce3bc78

Regards,
Shilong

>
>
> Tomasz Chmielewski
> http://wpkg.org
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: shall distros run btrfsck on boot?

2015-11-23 Thread Wang Shilong
On Tue, Nov 24, 2015 at 12:02 PM, Christoph Anton Mitterer
 wrote:
> Hey.
>
> Short question since that came up on debian-devel.
>
> Now that btrfs check get's more and more useful, are the developers
> going to recommend running it periodically on boot (of course that
> wouldn't work right now, as it would *always* check)?
>
> Plus... is btrfs check (without any arguments) non-desctructive, or are
> there corner-cases where it may lead to any changes on the devices?

As far as i know, 'btrfs check' did not repair anything in default.
only output errors if repair options not specified.

Regards,
Shilong
>
> Thanks,
> Chris.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix wrong check for btrfs_force_chunk_alloc()

2015-06-30 Thread Wang Shilong

 On Sun, Apr 12, 2015 at 7:35 AM, Wang Shilong wangshilong1...@gmail.com 
 wrote:
 btrfs_force_chunk_alloc() return 1 for allocation chunk successfully.
 This problem exists since commit c87f08ca4.

 With this patch, we might fix some enospc problems for balances.

 Signed-off-by: Wang Shilong wangshilong1...@gmail.com
 Reviewed-by: Filipe Manana fdman...@suse.com
 Tested-by: Filipe Manana fdman...@suse.com

Thanks Filipe!

Regards,
Shilong

 
 ---
  fs/btrfs/relocation.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
 index d830853..c453464 100644
 --- a/fs/btrfs/relocation.c
 +++ b/fs/btrfs/relocation.c
 @@ -4037,7 +4037,7 @@ restart:
 if (trans  progress  err == -ENOSPC) {
 ret = btrfs_force_chunk_alloc(trans, rc-extent_root,
   rc-block_group-flags);
 -   if (ret == 0) {
 +   if (ret == 1) {
 err = 0;
 progress = 0;
 goto restart;
 --
 1.7.12.4
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix wrong check for btrfs_force_chunk_alloc()

2015-04-12 Thread Wang Shilong
btrfs_force_chunk_alloc() return 1 for allocation chunk successfully.
This problem exists since commit c87f08ca4.

With this patch, we might fix some enospc problems for balances.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 fs/btrfs/relocation.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index d830853..c453464 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4037,7 +4037,7 @@ restart:
if (trans  progress  err == -ENOSPC) {
ret = btrfs_force_chunk_alloc(trans, rc-extent_root,
  rc-block_group-flags);
-   if (ret == 0) {
+   if (ret == 1) {
err = 0;
progress = 0;
goto restart;
-- 
1.7.12.4
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs Send/Receive WARN_ON Cannot allocate memory And no such file or directory error

2015-03-17 Thread Wang Shilong
Hi,

 
 Hi,
 
 I have been testing btrfs send/receive recently.
 I got an error send ioctl failed with -12: Cannot allocate memory on
 send side.
 WARN_ON happened on len  PATH_MAX in fs_path_ensure_buf.
 
 I got an error rename failed: no such file or directory on receive side.
 The followings are simple reproduced steps and related information,
 Is there any  idea about what this might be or how to fix it?
 
 uanme -a
 Linux ubuntu 4.0.0-rc1-custom #2 SMP Fri Feb 27 22:20:12 CST 2015
 x86_64 x86_64 x86_64 GNU/Linux
 btrfs --version
 Btrfs v3.14.1

I don’t know exact commit that fix this problem.
But there are some bug fixes since v3.14.

Could you try latest kernel and see if problem still exist?
 
 
 Steps to reproduce:
 
  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt
  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt2
 
 $ mkdir /mnt/data
 $ mkdir /mnt/data/n1
 $ mkdir /mnt/data/n1/n2
 $ mkdir /mnt/data/n4
 $ mkdir /mnt/data/n1/n2/p1
 $ mkdir /mnt/data/n1/n2/p1/p2
 $ mkdir /mnt/data/t6
 $ mkdir /mnt/data/t7
 $ mkdir -p /mnt/data/t5/t7
 $ mkdir /mnt/data/t2
 $ mkdir /mnt/data/t4
 $ mkdir -p /mnt/data/t1/t3
 $ mkdir /mnt/data/p1
 $ mv /mnt/data/t1 /mnt/data/p1
 $ mkdir -p /mnt/data/p1/p2
 $ mv /mnt/data/t4 /mnt/data/p1/p2/t1
 $ mv /mnt/data/t5 /mnt/data/n4/t5
 $ mv /mnt/data/n1/n2/p1/p2 /mnt/data/n4/t5/p2
 $ mv /mnt/data/t7 /mnt/data/n4/t5/p2/t7
 $ mv /mnt/data/t2 /mnt/data/n4/t1
 $ mv /mnt/data/p1 /mnt/data/n4/t5/p2/p1
 $ mv /mnt/data/n1/n2 /mnt/data/n4/t5/p2/p1/p2/n2
 $ mv /mnt/data/n4/t5/p2/p1/p2/t1 /mnt/data/n4/t5/p2/p1/p2/n2/t1
 $ mv /mnt/data/n4/t5/t7 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7
 $ mv /mnt/data/n4/t5/p2/p1/t1/t3 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t3
 $ mv /mnt/data/n4/t5/p2/p1/p2/n2/p1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7/p1
 $ mv /mnt/data/t6 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t3/t5
 $ mv /mnt/data/n4/t5/p2/p1/t1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t3/t1
 $ mv /mnt/data/n1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7/p1/n1
 
  $ btrfs subvolume snapshot -r /mnt /mnt/snap1
 
 $ mv /mnt/data/n4/t1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7/p1/t1
 $ mv /mnt/data/n4/t5/p2/p1/p2/n2/t1 /mnt/data/n4/
 $ mv /mnt/data/n4/t5/p2/p1/p2/n2 /mnt/data/n4/t1/n2
 $ mv /mnt/data/n4/t1/t7/p1 /mnt/data/n4/t1/n2/p1
 $ mv /mnt/data/n4/t1/t3/t1 /mnt/data/n4/t1/n2/t1
 $ mv /mnt/data/n4/t1/t3 /mnt/data/n4/t1/n2/t1/t3
 $ mv /mnt/data/n4/t5/p2/p1/p2 /mnt/data/n4/t1/n2/p1/p2
 $ mv /mnt/data/n4/t1/t7 /mnt/data/n4/t1/n2/p1/t7
 $ mv /mnt/data/n4/t5/p2/p1 /mnt/data/n4/t1/n2/p1/p2/p1
 $ mv /mnt/data/n4/t1/n2/t1/t3/t5 /mnt/data/n4/t1/n2/p1/p2/t5
 $ mv /mnt/data/n4/t5 /mnt/data/n4/t1/n2/p1/p2/p1/t5
 $ mv /mnt/data/n4/t1/n2/p1/p2/p1/t5/p2 /mnt/data/n4/t1/n2/p1/p2/p1/p2
 $ mv /mnt/data/n4/t1/n2/p1/p2/p1/p2/t7 /mnt/data/n4/t1/t7
 
  $ btrfs subvolume snapshot -r /mnt /mnt/snap2
 
  $ btrfs send /mnt/snap1 | btrfs receive /mnt2
  $ btrfs send -p /mnt/snap1 /mnt/snap2 | btrfs receive -vv /mnt2
 ERROR: send ioctl failed with -12: Cannot allocate memory
 rename data/n4/t5/p2/p1/p2 - o272-10-0
 rename data/n4/t5/p2/p1/p2/n2/t1/t7 - o266-9-0
 ERROR: rename data/n4/t5/p2/p1/p2/n2/t1/t7 - o266-9-0 failed. No such
 file or directory
 
 It seems that data/n4/t5/p2/p1/p2 had been renamed but the file path
 had not been updated to
 o272-10-0.
 Thus it makes the next rename failed.
 
 Thanks.
 robbieko
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs Send/Receive WARN_ON Cannot allocate memory And no such file or directory error

2015-03-17 Thread Wang Shilong

 
 On Tue, Mar 17, 2015 at 9:34 AM, Wang Shilong wangshilong1...@gmail.com 
 wrote:
 Hi,
 
 
 Hi,
 
 I have been testing btrfs send/receive recently.
 I got an error send ioctl failed with -12: Cannot allocate memory on
 send side.
 WARN_ON happened on len  PATH_MAX in fs_path_ensure_buf.
 
 I got an error rename failed: no such file or directory on receive side.
 The followings are simple reproduced steps and related information,
 Is there any  idea about what this might be or how to fix it?
 
 uanme -a
 Linux ubuntu 4.0.0-rc1-custom #2 SMP Fri Feb 27 22:20:12 CST 2015
 x86_64 x86_64 x86_64 GNU/Linux
 btrfs --version
 Btrfs v3.14.1
 
 I don’t know exact commit that fix this problem.
 But there are some bug fixes since v3.14.
 
 Could you try latest kernel and see if problem still exist?
 
 There's isn't any change between 4.0-rc1 and 4.0-rc4 that will fix
 this issue (iow, futile to try 4.0-rc4 or Chris' integration branch).
 It's another case for an infinite path build loop, likely due to
 inverting the ancestor-descendant relationship of 2 directories.
 
 I'll take a look at it when I get the time, can't promise how soon however.

Oops, you are right, i was only seeing Btrfs v3.14.1 which i was thinking
it was kernel version…


 
 thanks
 
 
 
 Steps to reproduce:
 
 $ mkfs.btrfs -f /dev/sdb
 $ mount /dev/sdb /mnt
 $ mkfs.btrfs -f /dev/sdc
 $ mount /dev/sdc /mnt2
 
 $ mkdir /mnt/data
 $ mkdir /mnt/data/n1
 $ mkdir /mnt/data/n1/n2
 $ mkdir /mnt/data/n4
 $ mkdir /mnt/data/n1/n2/p1
 $ mkdir /mnt/data/n1/n2/p1/p2
 $ mkdir /mnt/data/t6
 $ mkdir /mnt/data/t7
 $ mkdir -p /mnt/data/t5/t7
 $ mkdir /mnt/data/t2
 $ mkdir /mnt/data/t4
 $ mkdir -p /mnt/data/t1/t3
 $ mkdir /mnt/data/p1
 $ mv /mnt/data/t1 /mnt/data/p1
 $ mkdir -p /mnt/data/p1/p2
 $ mv /mnt/data/t4 /mnt/data/p1/p2/t1
 $ mv /mnt/data/t5 /mnt/data/n4/t5
 $ mv /mnt/data/n1/n2/p1/p2 /mnt/data/n4/t5/p2
 $ mv /mnt/data/t7 /mnt/data/n4/t5/p2/t7
 $ mv /mnt/data/t2 /mnt/data/n4/t1
 $ mv /mnt/data/p1 /mnt/data/n4/t5/p2/p1
 $ mv /mnt/data/n1/n2 /mnt/data/n4/t5/p2/p1/p2/n2
 $ mv /mnt/data/n4/t5/p2/p1/p2/t1 /mnt/data/n4/t5/p2/p1/p2/n2/t1
 $ mv /mnt/data/n4/t5/t7 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7
 $ mv /mnt/data/n4/t5/p2/p1/t1/t3 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t3
 $ mv /mnt/data/n4/t5/p2/p1/p2/n2/p1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7/p1
 $ mv /mnt/data/t6 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t3/t5
 $ mv /mnt/data/n4/t5/p2/p1/t1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t3/t1
 $ mv /mnt/data/n1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7/p1/n1
 
 $ btrfs subvolume snapshot -r /mnt /mnt/snap1
 
 $ mv /mnt/data/n4/t1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7/p1/t1
 $ mv /mnt/data/n4/t5/p2/p1/p2/n2/t1 /mnt/data/n4/
 $ mv /mnt/data/n4/t5/p2/p1/p2/n2 /mnt/data/n4/t1/n2
 $ mv /mnt/data/n4/t1/t7/p1 /mnt/data/n4/t1/n2/p1
 $ mv /mnt/data/n4/t1/t3/t1 /mnt/data/n4/t1/n2/t1
 $ mv /mnt/data/n4/t1/t3 /mnt/data/n4/t1/n2/t1/t3
 $ mv /mnt/data/n4/t5/p2/p1/p2 /mnt/data/n4/t1/n2/p1/p2
 $ mv /mnt/data/n4/t1/t7 /mnt/data/n4/t1/n2/p1/t7
 $ mv /mnt/data/n4/t5/p2/p1 /mnt/data/n4/t1/n2/p1/p2/p1
 $ mv /mnt/data/n4/t1/n2/t1/t3/t5 /mnt/data/n4/t1/n2/p1/p2/t5
 $ mv /mnt/data/n4/t5 /mnt/data/n4/t1/n2/p1/p2/p1/t5
 $ mv /mnt/data/n4/t1/n2/p1/p2/p1/t5/p2 /mnt/data/n4/t1/n2/p1/p2/p1/p2
 $ mv /mnt/data/n4/t1/n2/p1/p2/p1/p2/t7 /mnt/data/n4/t1/t7
 
 $ btrfs subvolume snapshot -r /mnt /mnt/snap2
 
 $ btrfs send /mnt/snap1 | btrfs receive /mnt2
 $ btrfs send -p /mnt/snap1 /mnt/snap2 | btrfs receive -vv /mnt2
 ERROR: send ioctl failed with -12: Cannot allocate memory
 rename data/n4/t5/p2/p1/p2 - o272-10-0
 rename data/n4/t5/p2/p1/p2/n2/t1/t7 - o266-9-0
 ERROR: rename data/n4/t5/p2/p1/p2/n2/t1/t7 - o266-9-0 failed. No such
 file or directory
 
 It seems that data/n4/t5/p2/p1/p2 had been renamed but the file path
 had not been updated to
 o272-10-0.
 Thus it makes the next rename failed.
 
 Thanks.
 robbieko
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 Best Regards,
 Wang Shilong
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 
 
 -- 
 Filipe David Manana,
 
 Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Is it safe or useful to use NOCOW flag and autodefrag mount option at same time?

2015-03-15 Thread Wang Shilong
Hello,

 
 Hi,
 first of all a great thank you to all developers and contributors. The
 Btrfs filesystem ist a great filesystem!
 
 Following common recommendations [1], I use these mount options on my
 main developing machine: noatime,autodefrag. This is desktop machine and
 it works well so far. Now, I'm also going to install several KVM virtual
 machines on this system. I want to use qcow2 files stored on SSD with a
 btrfs on it. In order to avoid bad performance with the VMs, I want to
 disable the Copy-On-Write mechanism on the storage directory of my VM
 images as for example described in [2].
 
 My question now is: Is it safe to use the autodefrag mount option and to
 disable CoW for single directories which contain VM files at the same
 time? Is it even useful or may it harm?
 
 Thank you very much for insights and comments in advance.

Btrfs use COW feautr to do defrag, acutally there was bug in the past
(nocow and defrag)..

But make sure your kernel include following commit:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=47059d930f0e002ff851beea87d738146804726d

With above commit, i think autodefrag should work well with nocow.

Best regards,
Wang Shilong

 
 Best regards,
 jploz
 
 
 As requested in the wiki basic information about my fresh setup attached:
 uname -a
 Linux hostname 3.16.0-4-686-pae #1 SMP Debian 3.16.7-ckt7-1 (2015-03-01) i686 
 GNU/Linux
 
 btrfs --version
 Btrfs v3.17
 
 btrfs fi show
 Label: none  uuid: cc6492e7-a7a7-43f9-bba2-a1abe8635d36
   Total devices 1 FS bytes used 4.00GiB
   devid1 size 210.00GiB used 8.02GiB path /dev/sdb2
 
 Label: 'DataHddBtr'  uuid: aa3f41c0-5a60-4fd7-9365-8c99284e032c
   Total devices 1 FS bytes used 384.00KiB
   devid1 size 480.00GiB used 2.04GiB path /dev/sda1
 
 Btrfs v3.17
 
 mount | grep btrfs
 /dev/sdb2 on / type btrfs (rw,noatime,ssd,space_cache,autodefrag)
 /dev/sda1 on /data/btr type btrfs (rw,noatime,space_cache,autodefrag)
 
 btrfs fi df /
 Data, single: total=7.01GiB, used=3.89GiB
 System, single: total=4.00MiB, used=16.00KiB
 Metadata, single: total=1.01GiB, used=114.39MiB
 GlobalReserve, single: total=48.00MiB, used=0.00B
 
 btrfs fi df /data/btr/
 Data, single: total=8.00MiB, used=256.00KiB
 System, DUP: total=8.00MiB, used=16.00KiB
 System, single: total=4.00MiB, used=0.00B
 Metadata, DUP: total=1.00GiB, used=112.00KiB
 Metadata, single: total=8.00MiB, used=0.00B
 GlobalReserve, single: total=16.00MiB, used=0.00B
 
 dmesg | grep -i btrfs
 [2.693408] Btrfs loaded
 [2.756178] BTRFS: device fsid cc6492e7-a7a7-43f9-bba2-a1abe8635d36 devid 
 1 transid 393 /dev/sdb2
 [2.756843] BTRFS: device label DataHddBtr devid 1 transid 9 /dev/sda1
 [2.775447] BTRFS info (device sdb2): disk space caching is enabled
 [2.779944] BTRFS: detected SSD devices, enabling SSD mode
 [3.430997] BTRFS info (device sdb2): enabling auto defrag
 [3.431002] BTRFS info (device sdb2): disk space caching is enabled
 [3.548937] BTRFS info (device sda1): enabling auto defrag
 [3.548939] BTRFS info (device sda1): disk space caching is enabled
 [  183.341398] BTRFS info (device sda1): enabling auto defrag
 [  183.341403] BTRFS info (device sda1): disk space caching is enabled
 
 
 References:
 [1] https://btrfs.wiki.kernel.org/index.php/Gotchas
 [2] https://wiki.archlinux.org/index.php/Btrfs#Copy-On-Write_.28CoW.29
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs performance, sudden drop to 0 IOPs

2015-02-12 Thread Wang Shilong
Hello guys,

 
 On Thu, Feb 12, 2015 at 05:33:41AM +0100, Kai Krakow wrote:
 Duncan 1i5t5.dun...@cox.net schrieb:
 
 P. Remek posted on Tue, 10 Feb 2015 18:44:33 +0100 as excerpted:
 
 In the test, I use --direct=1 parameter for fio which basically does
 O_DIRECT on target file. The O_DIRECT should guarantee that the
 filesystem cache is bypassed and IO is sent directly to the underlaying
 storage. Are you saying that btrfs buffers writes despite of O_DIRECT?
 
 I'm out of my (admin, no claims at developer) league on that.  I see
 someone else replied, and would defer to them on this.
 
 I don't think that O_DIRECT can work efficiently on COW filesystems. It 
 probably has a negative effect and cannot be faster as normal access. Linus 
 itself said one time that O_DIRECT is broken and should go away, and instead 
 cache hinting should be used.
 
 Think of this: For the _unbuffered_ direct-io request to be fulfilled the 
 file system has to go through its COW logic first which it otherwise had 
 buffered and done in background. Bypassing the cache is probably only a 
 side-effect of O_DIRECT, not its purpose.
 
 Hmm, not true in btrfs, the COW logic mentioned above is nothing but to 
 allocate
 a NEW extent, and it's not done in background.
 
 Comparing to nocow logic, the main difference comes from 
 a) COW files' calculating checksums of the dirty data in DIO pages which 
 nocow files don't need to.
 b) their endio handlers.
 
 Or am I missing something?


We did benchmark Btrfs aio/dio performance before, we noticed one big 
differences
from COW and nocow is not only checksum but checksum cost more metadata, which 
will
make Btrfs performance drop suddenly for a while, because of metadata 
reservation.


 
 Thanks,
 
 -liubo
 
 At least I'd try with a nocow-file for the benchmark if you still have to 
 use O_DIRECT.
 
 -- 
 Replies to list only preferred.
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] btrfs: qgroup: destroy related qgroup when removing subvolume if needed.

2015-01-21 Thread Wang Shilong
Hello,

 
 When removing a subvol, if the related qgroup has no child qgroup, we should 
 destroy
 it at the same time. Also remove the TODO entry in qgroup.c.

My two cents here:

Take a quick look at this, i am not sure whether this is right way to do it.

Actually, i think we can only remove a subvolume qgroup safely when both refer 
and excl are 0,
because for subvolume removal case, to make qgroup accounting right, we need 
walk tree.

At this time, qgroup walking might have not finished, if we remove this qgroup 
directly,
accounting for this qgroup will be skipped.

Maybe we could remove such qgroup when at qgroup accounting, we find an 
qgroup’s refer and excl
are 0, we could remove it, or we do it at the background..

Also, there is another risk, that is to say, if qgroup accounting not work 
right, So
we might need gurantee that subvolume it going to be deleted or have been 
deleted at the same time,
then we could remove qgroup safely etc.


 Signed-off-by: Dongsheng Yang yangds.f...@cn.fujitsu.com
 ---
 fs/btrfs/inode.c  | 4 
 fs/btrfs/qgroup.c | 1 -
 2 files changed, 4 insertions(+), 1 deletion(-)
 
 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index e687bb0..31de211 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -59,6 +59,7 @@
 #include backref.h
 #include hash.h
 #include props.h
 +#include qgroup.h
 
 struct btrfs_iget_args {
   struct btrfs_key *location;
 @@ -4029,6 +4030,9 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle 
 *trans,
   ret = btrfs_update_inode_fallback(trans, root, dir);
   if (ret)
   btrfs_abort_transaction(trans, root, ret);
 + ret = btrfs_remove_qgroup(trans, root-fs_info, objectid);
 + if (ret == -EBUSY)
 + ret = 0;
 out:
   btrfs_free_path(path);
   return ret;
 diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
 index c3b1e4f..303c078 100644
 --- a/fs/btrfs/qgroup.c
 +++ b/fs/btrfs/qgroup.c
 @@ -35,7 +35,6 @@
 #include qgroup.h
 
 /* TODO XXX FIXME
 - *  - subvol delete - delete when ref goes to 0? delete limits also?
  *  - reorganize keys
  *  - compressed
  *  - sync
 -- 
 1.8.4.2
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] btrfs-progs: make btrfs qgroups show human readable sizes

2015-01-18 Thread Wang Shilong
(struct btrfs_qgroup 
 *qgroup,
 {
   BUG_ON(column = BTRFS_QGROUP_ALL || column  0);
   int len;
 + int unit_mode = btrfs_qgroup_columns[column].unit_mode;
 + int max_len = btrfs_qgroup_columns[column].max_len;
 
   switch (column) {
 
 @@ -203,24 +223,20 @@ static void print_qgroup_column(struct btrfs_qgroup 
 *qgroup,
   print_qgroup_column_add_blank(BTRFS_QGROUP_QGROUPID, len);
   break;
   case BTRFS_QGROUP_RFER:
 - len = printf(%llu, qgroup-rfer);
 - print_qgroup_column_add_blank(BTRFS_QGROUP_RFER, len);
 + len = printf(%*s, max_len, pretty_size_mode(qgroup-rfer, 
 unit_mode));
   break;
   case BTRFS_QGROUP_EXCL:
 - len = printf(%llu, qgroup-excl);
 - print_qgroup_column_add_blank(BTRFS_QGROUP_EXCL, len);
 + len = printf(%*s, max_len, pretty_size_mode(qgroup-excl, 
 unit_mode));
   break;
   case BTRFS_QGROUP_PARENT:
   len = print_parent_column(qgroup);
   print_qgroup_column_add_blank(BTRFS_QGROUP_PARENT, len);
   break;
   case BTRFS_QGROUP_MAX_RFER:
 - len = printf(%llu, qgroup-max_rfer);
 - print_qgroup_column_add_blank(BTRFS_QGROUP_MAX_RFER, len);
 + len = printf(%*s, max_len, pretty_size_mode(qgroup-max_rfer, 
 unit_mode));
   break;
   case BTRFS_QGROUP_MAX_EXCL:
 - len = printf(%llu, qgroup-max_excl);
 - print_qgroup_column_add_blank(BTRFS_QGROUP_MAX_EXCL, len);
 + len = printf(%*s, max_len, pretty_size_mode(qgroup-max_excl, 
 unit_mode));
   break;
   case BTRFS_QGROUP_CHILD:
   len = print_child_column(qgroup);
 @@ -250,30 +266,41 @@ static void print_table_head()
 {
   int i;
   int len;
 + int max_len;
 
   for (i = 0; i  BTRFS_QGROUP_ALL; i++) {
 + max_len = btrfs_qgroup_columns[i].max_len;
   if (!btrfs_qgroup_columns[i].need_print)
   continue;
 - printf(%s, btrfs_qgroup_columns[i].name);
 - len = btrfs_qgroup_columns[i].max_len -
 -   strlen(btrfs_qgroup_columns[i].name);
 - while (len--)
 - printf( );
 + if ((i == BTRFS_QGROUP_QGROUPID) | (i == BTRFS_QGROUP_PARENT) |
 + (i == BTRFS_QGROUP_CHILD))
 + printf(%-*s, max_len, btrfs_qgroup_columns[i].name);
 + else
 + printf(%*s, max_len, btrfs_qgroup_columns[i].name);
   printf( );
   }
   printf(\n);
   for (i = 0; i  BTRFS_QGROUP_ALL; i++) {
 + max_len = btrfs_qgroup_columns[i].max_len;
   if (!btrfs_qgroup_columns[i].need_print)
   continue;
 -
 - len = strlen(btrfs_qgroup_columns[i].name);
 - while (len--)
 - printf(-);
 - len = btrfs_qgroup_columns[i].max_len -
 -   strlen(btrfs_qgroup_columns[i].name);
 + if ((i == BTRFS_QGROUP_QGROUPID) | (i == BTRFS_QGROUP_PARENT) |
 + (i == BTRFS_QGROUP_CHILD)) {
 + len = strlen(btrfs_qgroup_columns[i].name);
 + while (len--)
 + printf(-);
 + len = max_len - strlen(btrfs_qgroup_columns[i].name);
 + while (len--)
 + printf( );
 + } else {
 + len = max_len - strlen(btrfs_qgroup_columns[i].name);
 + while (len--)
 + printf( );
 + len = strlen(btrfs_qgroup_columns[i].name);
 + while (len--)
 + printf(-);
 + }
   printf( );
 - while (len--)
 - printf( );
   }
   printf(\n);
 }
 diff --git a/qgroup.h b/qgroup.h
 index 653cf1c..09070b6 100644
 --- a/qgroup.h
 +++ b/qgroup.h
 @@ -83,6 +83,7 @@ u64 btrfs_get_path_rootid(int fd);
 int btrfs_show_qgroups(int fd, struct btrfs_qgroup_filter_set *,
  struct btrfs_qgroup_comparer_set *);
 void btrfs_qgroup_setup_print_column(enum btrfs_qgroup_column_enum column);
 +void btrfs_qgroup_setup_human_readable(unsigned unit_mode);
 struct btrfs_qgroup_filter_set *btrfs_qgroup_alloc_filter_set(void);
 void btrfs_qgroup_free_filter_set(struct btrfs_qgroup_filter_set *filter_set);
 int btrfs_qgroup_setup_filter(struct btrfs_qgroup_filter_set **filter_set,
 -- 
 1.9.1
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo

Re: btrfs performance - ssd array

2015-01-17 Thread Wang Shilong
Hello,

 
 Hello,
 
 
 Could you check how many extents with BTRFS and Ext4:
 # filefrag test1
 
 So my findings are odd:
 
 On BTRFS when I run fio with a single worker thread (target file is
 12GB large,and its 100% random write of 4kb blocks), then number of
 extents reported by filefrag is around 3.
 However when I do the same with 4 worker threads, I get some crazy
 number of extents - test1: 3141866 extents found. Also when running
 with 4 threads when I check CPU, the sys% utilization takes 80% of CPU
 ( in the top output I see that all is consumed by kworker processes).
 
 On the EXT4 I get only 13 extents even when running with 4 worker
 threads. (note that I created RAID10 using mdadm before setting up
 ext4 there in order to get comparable storage solution to what we
 test with  BTRFS).
 
 Another odd thing is, that it takes very long time for the filefrag
 utility to return the result on the BTRFS and not only for the case
 where I got 3 milions of extents but also for the first case where I
 ran single worker and the number of extents was only 3. Filefrag on
 EXT4 returns immediately.

So looks like Btrfs lock contention problem.

Take a look at look btrfs_drop_extents(), even for nodatacow
, there will be still many items removal for FS tree.

Unfortunately, btrfs lock contention problems seem very sensible
to item removal operations.

And, here Your SSD is very fast? hm, i am not very sensible to
Your IOPS number.

You could verify this problem by reduing threads number for example
compare 1 thread results. aslo i guess btrfs seq write formace should
be less serious here…
 
 
 To see if this is because bad fragments for BTRFS. I am still not
 sure how fio will test randwrite here, so here is possibilities:
 
 case1:
 if fio don’t repeat write same position for several time, i think
 you could add --overite=0, and retest to see if it helps.
 
 Not sure  what parameter do you mean here.


I mean ‘--overwrite' is an option for fio.

 
 case2:
if fio randwrite did write same position for several time, i think
you could use ‘-o nodatacow’ mount option to verify if this is because
BTRFS COW caused serious fragments.
 
 
 It seems that mounting it with this option does have some effect but
 not very significant and it is not very deterministic. The IOPs are
 slightly higher at the beginning (~25 000 IOPs) but IOPs perfromance
 is very spiky and I can still see that CPU sys% is very high. As soon
 as the kworker threads start consuming CPU, the IOPs performance goes
 down again to some ~15 000 IOPs.

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs performance - ssd array

2015-01-12 Thread Wang Shilong
Hello,

 Hello,
 
 we are currently investigating possiblities and performance limits of
 the Btrfs filesystem. Now it seems we are getting pretty poor
 performance for the writes and I would like to ask, if our results
 makes sense and if it is a result of some well known performance
 bottleneck.
 
 Our setup:
 
 Server:
   CPU: dual socket: E5-2630 v2
   RAM: 32 GB ram
   OS: Ubuntu server 14.10
   Kernel: 3.19.0-031900rc2-generic
   btrfs tools: Btrfs v3.14.1
   2x LSI 9300 HBAs - SAS3 12/Gbs
   8x SSD Ultrastar SSD1600MM 400GB SAS3 12/Gbs
 
 Both HBAs see all 8 disks and we have set up multipathing using
 multipath command and device mapper. Then we using this command to
 create the filesystem:
 
 mkfs.btrfs -f -d raid10 /dev/mapper/prm-0 /dev/mapper/prm-1
 /dev/mapper/prm-2 /dev/mapper/prm-3 /dev/mapper/prm-4
 /dev/mapper/prm-5 /dev/mapper/prm-6 /dev/mapper/prm-7
 
 
 We run performance test using following command:
 
 fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
 --name=test1 --filename=test1 --bs=4k --iodepth=32 --size=12G
 --numjobs=24 --readwrite=randwrite

Could you check how many extents with BTRFS and Ext4:
# filefrag test1

To see if this is because bad fragments for BTRFS. I am still not
sure how fio will test randwrite here, so here is possibilities:

case1:
 if fio don’t repeat write same position for several time, i think
 you could add --overite=0, and retest to see if it helps.

case2:
if fio randwrite did write same position for several time, i think
you could use ‘-o nodatacow’ mount option to verify if this is because
BTRFS COW caused serious fragments.

 
 
 The results for the random read are more or less comparable with the
 performance of EXT4 filesystem, we get approximately 300 000 IOPs for
 random read.
 
 For random write however, we are getting only about 15 000 IOPs, which
 is much lower than for ESX4 (~200 000 IOPs for RAID10).
 
 
 Regards,
 Premek
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: should I use btrfs on Centos 7 for a new production server?

2014-12-30 Thread Wang Shilong
Hello,

 
 I have a well tested and working fine Centos5-Xen system. Accumulated cruft 
 from various development efforts make it desirable to redo the install. 
 Currently a RAID-10 ext4 filesystem with LVM and 750G of storage. There's a 
 hot spare 750 drive in the system.
 
 I'm thinking of migrating the web sites (almost the only use of the server) 
 to a spare then installing Centos-7 and btrfs, then migrating the sites back.
 
 I see RH marks btrfs in C7 as a technology preview but don't understand what 
 that implies for future support and a suitably stable basis for storage.
 
 The demand on the system is low and not likely to change in the near future, 
 storage access speeds are not likely to be dealbreakers and it would be nice 
 to not need to use LVM, btrfs seems to have a better feature set and more 
 intuitive command set. But I'm uncertain about stability. Anyone have an 
 opinion?
 

I used CentOS7 btrfs myself, just doing some tests..it crashed easily.
I don’t know how much efforts that Redhat do on btrfs for 7 series.


 Dave
 
 -- 
 As long as politics is the shadow cast on society by big business,
 the attenuation of the shadow will not change the substance.
 
 -- John Dewey
 
 
 
 
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: should I use btrfs on Centos 7 for a new production server?

2014-12-30 Thread Wang Shilong

 
 Hello,
 
 
 I have a well tested and working fine Centos5-Xen system. Accumulated cruft 
 from various development efforts make it desirable to redo the install. 
 Currently a RAID-10 ext4 filesystem with LVM and 750G of storage. There's a 
 hot spare 750 drive in the system.
 
 I'm thinking of migrating the web sites (almost the only use of the server) 
 to a spare then installing Centos-7 and btrfs, then migrating the sites back.
 
 I see RH marks btrfs in C7 as a technology preview but don't understand what 
 that implies for future support and a suitably stable basis for storage.
 
 The demand on the system is low and not likely to change in the near future, 
 storage access speeds are not likely to be dealbreakers and it would be nice 
 to not need to use LVM, btrfs seems to have a better feature set and more 
 intuitive command set. But I'm uncertain about stability. Anyone have an 
 opinion?
 
 
 I used CentOS7 btrfs myself, just doing some tests..it crashed easily.
 I don’t know how much efforts that Redhat do on btrfs for 7 series.

Maybe use SUSE enterprise for btrfs will be a better choice, they offered
better support for btrfs as far as i know.


 
 
 Dave
 
 -- 
 As long as politics is the shadow cast on society by big business,
 the attenuation of the shadow will not change the substance.
 
 -- John Dewey
 
 
 
 
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 Best Regards,
 Wang Shilong
 

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: call inode_dec_link_count() on mkdir error path

2014-12-23 Thread Wang Shilong
From: Wang Shilong wangshilong1...@gmail.com

In btrfs_mkdir(), if it fails to create dir, we should
clean up existed items, setting inode's link properly
to make sure it could be cleaned up properly.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 fs/btrfs/inode.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8de2335..8a036ed 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6255,8 +6255,10 @@ static int btrfs_mkdir(struct inode *dir, struct dentry 
*dentry, umode_t mode)
 
 out_fail:
btrfs_end_transaction(trans, root);
-   if (drop_on_err)
+   if (drop_on_err) {
+   inode_dec_link_count(inode);
iput(inode);
+   }
btrfs_balance_delayed_items(root);
btrfs_btree_balance_dirty(root);
return err;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BTRFS-PROGS][PATCH][CLEANUP] Remove some gotos

2014-12-21 Thread Wang Shilong
Hello,

 
 Hi all,
 
 before sending this patch I thought twice if it is really worth.
 I know the coding style is really a hot topic. But the world
 is already too ugly, to allow even that. And it is Merry Christmas, 
 so let me to make me a gift: a small clean up.
 
 The culprit are some goto found in 'cmds-filesystem.c' inside the
 function 'cmd_show()'. These are used instead of an 'if' statement.
 
 I wish you all a Happy Christmas and a successful start to 2015.

I think Nice cleanups are always good for Btrfs..Just take a look
at Btrfs kernel codes etc which really scared a lot of code reviewers and 
newbies..

Merry Chirismas guys. ^_^

 
 
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4] Btrfs: deal with all 'subvol=xxx' options once

2014-12-03 Thread Wang Shilong
Steps to reproduce:
 # mkfs.btrfs -f /dev/sdb
 # mount -t btrfs /dev/sdb /mnt
 # btrfs sub create /mnt/dir
 # mount -t btrfs /dev/sdb /mnt -o subvol=dir,subvol=dir

It fails with:
 mount: mount(2) failed: No such file or directory

Btrfs deal with subvolume mounting in a recursive way,
to avoid looping, it will stripe out 'subvol=' string,
then next loop will stop.Problem here is it only deal one
string once, if users specify mount option multiple times.
It will loop several times which is not good, and above
reproducing steps will also return confusing results.

Fix this problem by striping out all 'subvol=xxx' options,
only last is valid.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
v3-v4: zero buffer to make sure strncat() work...
---
 fs/btrfs/super.c | 46 ++
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 21c60ee..cf3faf5 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1117,44 +1117,50 @@ static inline int is_subvolume_inode(struct inode 
*inode)
  */
 static char *setup_root_args(char *args)
 {
-   unsigned len = strlen(args) + 2 + 1;
-   char *src, *dst, *buf;
+   unsigned len;
+   char *src = args;
+   char *p = args;
+   char *dst, *buf;
 
/*
 * We need the same args as before, but with this substitution:
 * s!subvol=[^,]+!subvolid=0!
 *
 * Since the replacement string is up to 2 bytes longer than the
-* original, allocate strlen(args) + 2 + 1 bytes.
+* original, allocate strlen(args) + 2 * N + 1 bytes.
 */
-
-   src = strstr(args, subvol=);
+   p = strstr(src, subvol=);
/* This shouldn't happen, but just in case.. */
-   if (!src)
+   if (!p)
return NULL;
 
-   buf = dst = kmalloc(len, GFP_NOFS);
+   len = strlen(args) + 1;
+   while (p) {
+   len += 2;
+   p = strchr(p, ',');
+   if (!p)
+   break;
+   p = strstr(p, subvol=);
+   }
+
+   buf = dst = kzalloc(len, GFP_NOFS);
if (!buf)
return NULL;
 
-   /*
-* If the subvol= arg is not at the start of the string,
-* copy whatever precedes it into buf.
-*/
-   if (src != args) {
-   *src++ = '\0';
-   strcpy(buf, args);
-   dst += strlen(args);
+   /* loop and replace all subvol=xxx string */
+   while ((p = strstr(src, subvol=))) {
+   strncat(dst, src, p - src);
+   dst += p - src;
+   strcpy(dst, subvolid=0);
+   dst += strlen(subvolid=0);
+   src = p = strchr(p, ',');
+   if (!src)
+   break;
}
-
-   strcpy(dst, subvolid=0);
-   dst += strlen(subvolid=0);
-
/*
 * If there is a , after the original subvol=... string,
 * copy that suffix into our buffer.  Otherwise, we're done.
 */
-   src = strchr(src, ',');
if (src)
strcpy(dst, src);
 
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 00/10] Implement device scrub/replace for RAID56

2014-12-02 Thread Wang Shilong

 
 
 
 On Tue, Dec 2, 2014 at 7:39 AM, Miao Xie mi...@cn.fujitsu.com wrote:
 This patchset implement the device scrub/replace function for RAID56, the
 most implementation of the common data is similar to the other RAID type.
 The differentia or difficulty is the parity process. The basic idea is 
 reading
 and check the data which has checksum out of the raid56 stripe lock, if the
 data is right, then lock the raid56 stripe, read out the other data in the
 same stripe, if no IO error happens, calculate the parity and check the
 original one, if the original parity is right, the scrub parity passes.
 or write out the new one. But if the common data(not parity) that we read out
 is wrong, we will try to recover it, and then check and repair the parity.
 And in order to avoid making the code more and more complex, we copy some
 code of common data process for the parity, the cleanup work is in my
 TODO list.
 We have done some test, the patchset worked well. Of course, more tests
 are welcome. If you are interesting to use it or test it, you can pull
 the patchset from
  https://github.com/miaoxie/linux-btrfs.git raid56-scrub-replace
 Changelog v3 - v4:
 - Fix the problem that the scrub's raid bio was cached, which was reported
  by Chris.
 - Remove the 10st patch, the deadlock that was described in that patch 
 doesn't
  exist on the current kernel.
 - Rebase the patchset to the top of integration branch
 
 Thanks, I'll try this today.  I need to rebase in a new version of the RCU 
 patches, can you please cook one on top of v3.18-rc6 instead?
 

BTW, Chris could you please replace with this newer version:
https://patchwork.kernel.org/patch/5359251/

Fengguang reported a build failure regarding this.


 -chris
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] Btrfs: deal with all 'subvol=xxx' options once

2014-11-25 Thread Wang Shilong
Steps to reproduce:
 # mkfs.btrfs -f /dev/sdb
 # mount -t btrfs /dev/sdb /mnt
 # btrfs sub create /mnt/dir
 # mount -t btrfs /dev/sdb /mnt -o subvol=dir,subvol=dir

It fails with:
 mount: mount(2) failed: No such file or directory

Btrfs deal with subvolume mounting in a recursive way,
to avoid looping, it will stripe out 'subvol=' string,
then next loop will stop.Problem here is it only deal one
string once, if users specify mount option multiple times.
It will loop several times which is not good, and above
reproducing steps will also return confusing results.

Fix this problem by striping out all 'subvol=xxx' options,
only last is valid.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
v1-v2: error handling and comment update
---
 fs/btrfs/super.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 54bd91e..42f3176 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1115,7 +1115,7 @@ static inline int is_subvolume_inode(struct inode *inode)
  * subvolid=0 to make sure we get the actual tree root for path walking to the
  * subvol we want.
  */
-static char *setup_root_args(char *args)
+static char *__setup_root_args(char *args)
 {
unsigned len = strlen(args) + 2 + 1;
char *src, *dst, *buf;
@@ -1129,13 +1129,12 @@ static char *setup_root_args(char *args)
 */
 
src = strstr(args, subvol=);
-   /* This shouldn't happen, but just in case.. */
if (!src)
return NULL;
 
buf = dst = kmalloc(len, GFP_NOFS);
if (!buf)
-   return NULL;
+   return ERR_PTR(-ENOMEM);
 
/*
 * If the subvol= arg is not at the start of the string,
@@ -1161,6 +1160,27 @@ static char *setup_root_args(char *args)
return buf;
 }
 
+static char *setup_root_args(char *args)
+{
+   char *p, *new_args;
+
+   p = new_args = __setup_root_args(args);
+   /* in case users specify subvol=xxx option multiple times */
+   while (p) {
+   p = __setup_root_args(new_args);
+   if (!p)
+   break;
+   if (!IS_ERR(p)) {
+   kfree(new_args);
+   new_args = p;
+   } else {
+   kfree(new_args);
+   return NULL;
+   }
+   }
+   return new_args;
+}
+
 static struct dentry *mount_subvol(const char *subvol_name, int flags,
   const char *device_name, char *data)
 {
-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs: deal with all 'subvol=xxx' options once

2014-11-25 Thread Wang Shilong

 
 On Tue, Nov 25, 2014 at 04:20:11PM +0800, Wang Shilong wrote:
 Steps to reproduce:
 # mkfs.btrfs -f /dev/sdb
 # mount -t btrfs /dev/sdb /mnt
 # btrfs sub create /mnt/dir
 # mount -t btrfs /dev/sdb /mnt -o subvol=dir,subvol=dir
 
 It fails with:
 mount: mount(2) failed: No such file or directory
 
 The bug is real, but I don't like the fix. The mount path is hard to
 read already, and I'm afraid your fix adds another unobvious step to the
 whole processing.
 
 setup_root_args replaces subvol= with subvolid=0 once. I suggest to
 replace all occurences of subvol= here and not rely on the recursive
 behaviour of the mount callbacks.

ok, if you like this way, i will do it.


 
 The (buggy) way how it works now is that the first occurence of subvol
 will get parsed and passed as
 
 newroot = vfs_kern_mount(,subvol=second,...,subvolid=0)
 
 and this will call back again to btrfs_mount and will try to mount the
 subvol 'second' but now relative to 'newroot'.
 
 Try this:
 
 # mkfs.btrfs -f /dev/sdb
 # mount -t btrfs /dev/sdb /mnt
 # btrfs sub create /mnt/dir
 # btrfs sub create /mnt/dir/dir2
 # mount -t btrfs /dev/sdb /mnt -o subvol=dir,subvol=dir2
 
 mount succeeds and the mounted subvolume is dir2.

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] Extent tree rebuilding with snapshots patches

2014-11-25 Thread Wang Shilong
I did send these patches a long while ago, but due to some reasons,
they were not merged, these are important fixes for fsck, without
these patches, extent tree rebuilding did not work with snapshots.

Also, /tests/fsck-tests.sh's extent tree rebuild test could always
detect failure without these patches, unluckily, it need extra
enviroment setting, so i supposed it was always 'NOTRUN'!

last patch fix a regression for root rebuilding, root rebuilding
should be at first, because if root(extent root eg) corrupted,
root items also won't succeed.

patches are based on David's integration-20141125

Wang Shilong (3):
  Btrfs-progs: fsck: deal with snapshot one by one when rebuilding
extent tree
  Btrfs-progs: fsck: add ability to rebuild extent tree with snapshots
  Btrfs-progs, fsck: move root items repair after root rebuilding

 cmds-check.c | 412 +++
 1 file changed, 303 insertions(+), 109 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH resend 2/3] Btrfs-progs: fsck: add ability to rebuild extent tree with snapshots

2014-11-25 Thread Wang Shilong
From: Wang Shilong wangsl.f...@cn.fujitsu.com

This patch makes us to rebuild a really corrupt extent tree with snapshots.
To implement this, we have to verify whether a block is FULL BACKREF.

This idea come from Josef Bacik:

1) We walk down the original tree, every eb we encounter has
btrfs_header_owner(eb) == root-objectid.  We add normal references
for this root (BTRFS_TREE_BLOCK_REF_KEY) for this root.  World peace
is achieved.

2) We walk down the snapshotted tree.  Say we didn't change anything
at all, it was just a clean snapshot and then boom.  So the
btrfs_header_owner(root-node) == root-objectid, so normal backref.
We walk down to the next level, where btrfs_header_owner(eb) !=
root-objectid, but the level above did, so we add normal refs for all
of these blocks.  We go down the next level, now our
btrfs_header_owner(parent) != root-objectid and
btrfs_header_owner(eb) != root-objectid.  This is where we need to
now go back and see if btrfs_header_owner(eb) currently has a ref on
eb.  If it does we are done, move on to the next block in this same
level, we don't have to go further down.

3) Harder case, we snapshotted and then changed things in the original
root.  Do the same thing as in step 2, but now we get down to
btrfs_header_owner(eb) != root-objectid  btrfs_header_owner(parent)
!= root-objectid.  We lookup the references we have for eb and notice
that btrfs_header_owner(eb) no longer refers to eb.  So now we must
set FULL_BACKREF on this extent reference and add a
SHARED_BLOCK_REF_KEY for this eb using the parent-start as the
offset.  And we need to keep walking down and doing the same thing
until we either hit level 0 or btrfs_header_owner(eb) has a ref on the
block.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 cmds-check.c | 132 +--
 1 file changed, 129 insertions(+), 3 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index ff2795d..a102752 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -112,6 +112,7 @@ struct extent_record {
unsigned int owner_ref_checked:1;
unsigned int is_root:1;
unsigned int metadata:1;
+   unsigned int flag_block_full_backref:1;
 };
 
 struct inode_backref {
@@ -4608,6 +4609,127 @@ static int is_dropped_key(struct btrfs_key *key,
return 0;
 }
 
+static int calc_extent_flag(struct btrfs_root *root,
+  struct cache_tree *extent_cache,
+  struct extent_buffer *buf,
+  struct root_item_record *ri,
+  u64 *flags)
+{
+   int i;
+   int nritems = btrfs_header_nritems(buf);
+   struct btrfs_key key;
+   struct extent_record *rec;
+   struct cache_extent *cache;
+   struct data_backref *dback;
+   struct tree_backref *tback;
+   struct extent_buffer *new_buf;
+   u64 owner = 0;
+   u64 bytenr;
+   u64 offset;
+   u64 ptr;
+   int size;
+   int ret;
+   u8 level;
+
+   /*
+* Except file/reloc tree, we can not have
+* FULL BACKREF MODE
+*/
+   if (ri-objectid  BTRFS_FIRST_FREE_OBJECTID)
+   goto normal;
+   /*
+* root node
+*/
+   if (buf-start == ri-bytenr)
+   goto normal;
+   if (btrfs_is_leaf(buf)) {
+   /*
+* we are searching from original root, world
+* peace is achieved, we use normal backref.
+*/
+   owner = btrfs_header_owner(buf);
+   if (owner == ri-objectid)
+   goto normal;
+   /*
+* we check every eb here, and if any of
+* eb dosen't have original root refers
+* to this eb, we set full backref flag for
+* this extent, otherwise normal backref.
+*/
+   for (i = 0; i  nritems; i++) {
+   struct btrfs_file_extent_item *fi;
+   btrfs_item_key_to_cpu(buf, key, i);
+
+   if (key.type != BTRFS_EXTENT_DATA_KEY)
+   continue;
+   fi = btrfs_item_ptr(buf, i,
+   struct btrfs_file_extent_item);
+   if (btrfs_file_extent_type(buf, fi) ==
+   BTRFS_FILE_EXTENT_INLINE)
+   continue;
+   if (btrfs_file_extent_disk_bytenr(buf, fi) == 0)
+   continue;
+   bytenr = btrfs_file_extent_disk_bytenr(buf, fi);
+   cache = lookup_cache_extent(extent_cache, bytenr, 1);
+   if (!cache)
+   goto full_backref;
+   offset = btrfs_file_extent_offset(buf, fi);
+   rec = container_of(cache, struct

[PATCH resend 1/3] Btrfs-progs: fsck: deal with snapshot one by one when rebuilding extent tree

2014-11-25 Thread Wang Shilong
From: Wang Shilong wangsl.f...@cn.fujitsu.com

Previously, we deal with node block firstly and then leaf block which can
maximize readahead. However, to rebuild extent tree, we need deal with snapshot
one by one.

This patch makes us deal with snapshot one by one if we need rebuild extent
tree otherwise we drop into previous way.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 cmds-check.c | 248 +--
 1 file changed, 158 insertions(+), 90 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 24b6b59..ff2795d 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -128,10 +128,14 @@ struct inode_backref {
char name[0];
 };
 
-struct dropping_root_item_record {
+struct root_item_record {
struct list_head list;
-   struct btrfs_root_item ri;
-   struct btrfs_key found_key;
+   u64 objectid;
+   u64 bytenr;
+   u8 level;
+   u8 drop_level;
+   int level_size;
+   struct btrfs_key drop_key;
 };
 
 #define REF_ERR_NO_DIR_ITEM(1  0)
@@ -4618,7 +4622,7 @@ static int run_next_block(struct btrfs_trans_handle 
*trans,
  struct rb_root *dev_cache,
  struct block_group_tree *block_group_cache,
  struct device_extent_tree *dev_extent_cache,
- struct btrfs_root_item *ri)
+ struct root_item_record *ri)
 {
struct extent_buffer *buf;
u64 bytenr;
@@ -4851,11 +4855,8 @@ static int run_next_block(struct btrfs_trans_handle 
*trans,
size = btrfs_level_size(root, level - 1);
btrfs_node_key_to_cpu(buf, key, i);
if (ri != NULL) {
-   struct btrfs_key drop_key;
-   btrfs_disk_key_to_cpu(drop_key,
- ri-drop_progress);
if ((level == ri-drop_level)
-is_dropped_key(key, drop_key)) {
+is_dropped_key(key, ri-drop_key)) {
continue;
}
}
@@ -4896,7 +4897,7 @@ static int add_root_to_pending(struct extent_buffer *buf,
   struct cache_tree *pending,
   struct cache_tree *seen,
   struct cache_tree *nodes,
-  struct btrfs_key *root_key)
+  u64 objectid)
 {
if (btrfs_header_level(buf)  0)
add_pending(nodes, seen, buf-start, buf-len);
@@ -4905,13 +4906,12 @@ static int add_root_to_pending(struct extent_buffer 
*buf,
add_extent_rec(extent_cache, NULL, 0, buf-start, buf-len,
   0, 1, 1, 0, 1, 0, buf-len);
 
-   if (root_key-objectid == BTRFS_TREE_RELOC_OBJECTID ||
+   if (objectid == BTRFS_TREE_RELOC_OBJECTID ||
btrfs_header_backref_rev(buf)  BTRFS_MIXED_BACKREF_REV)
add_tree_backref(extent_cache, buf-start, buf-start,
 0, 1);
else
-   add_tree_backref(extent_cache, buf-start, 0,
-root_key-objectid, 1);
+   add_tree_backref(extent_cache, buf-start, 0, objectid, 1);
return 0;
 }
 
@@ -6481,6 +6481,99 @@ static int check_devices(struct rb_root *dev_cache,
return ret;
 }
 
+static int add_root_item_to_list(struct list_head *head,
+ u64 objectid, u64 bytenr,
+ u8 level, u8 drop_level,
+ int level_size, struct btrfs_key *drop_key)
+{
+
+   struct root_item_record *ri_rec;
+   ri_rec = malloc(sizeof(*ri_rec));
+   if (!ri_rec)
+   return -ENOMEM;
+   ri_rec-bytenr = bytenr;
+   ri_rec-objectid = objectid;
+   ri_rec-level = level;
+   ri_rec-level_size = level_size;
+   ri_rec-drop_level = drop_level;
+   if (drop_key)
+   memcpy(ri_rec-drop_key, drop_key, sizeof(*drop_key));
+   list_add_tail(ri_rec-list, head);
+
+   return 0;
+}
+
+static int deal_root_from_list(struct list_head *list,
+  struct btrfs_trans_handle *trans,
+  struct btrfs_root *root,
+  struct block_info *bits,
+  int bits_nr,
+  struct cache_tree *pending,
+  struct cache_tree *seen,
+  struct cache_tree *reada,
+  struct cache_tree *nodes,
+  struct cache_tree *extent_cache,
+  struct cache_tree *chunk_cache

[PATCH 3/3] Btrfs-progs, fsck: move root items repair after root rebuilding

2014-11-25 Thread Wang Shilong
If some critical roots are corrupt, reapr_root_items() will fail,
this is detected by fsck_tests.sh's extent rebuilding tests.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 cmds-check.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index a102752..ae9005e 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -7987,22 +7987,6 @@ int cmd_check(int argc, char **argv)
 
root = info-fs_root;
 
-   ret = repair_root_items(info);
-   if (ret  0)
-   goto close_out;
-   if (repair) {
-   fprintf(stderr, Fixed %d roots.\n, ret);
-   ret = 0;
-   } else if (ret  0) {
-   fprintf(stderr,
-  Found %d roots with an outdated root item.\n,
-  ret);
-   fprintf(stderr,
-   Please run a filesystem check with the option --repair 
to fix them.\n);
-   ret = 1;
-   goto close_out;
-   }
-
/*
 * repair mode will force us to commit transaction which
 * will make us fail to load log tree when mounting.
@@ -8101,6 +8085,22 @@ int cmd_check(int argc, char **argv)
if (ret)
fprintf(stderr, Errors found in extent allocation tree or 
chunk allocation\n);
 
+   ret = repair_root_items(info);
+   if (ret  0)
+   goto close_out;
+   if (repair) {
+   fprintf(stderr, Fixed %d roots.\n, ret);
+   ret = 0;
+   } else if (ret  0) {
+   fprintf(stderr,
+  Found %d roots with an outdated root item.\n,
+  ret);
+   fprintf(stderr,
+   Please run a filesystem check with the option --repair 
to fix them.\n);
+   ret = 1;
+   goto close_out;
+   }
+
fprintf(stderr, checking free space cache\n);
ret = check_space_cache(root);
if (ret)
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] Btrfs-progs, fsck: move root items repair after root rebuilding

2014-11-25 Thread Wang Shilong
If some critical roots are corrupt, reapr_root_items() will fail,
this is detected by fsck_tests.sh's extent rebuilding tests.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 cmds-check.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index a102752..ae9005e 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -7987,22 +7987,6 @@ int cmd_check(int argc, char **argv)
 
root = info-fs_root;
 
-   ret = repair_root_items(info);
-   if (ret  0)
-   goto close_out;
-   if (repair) {
-   fprintf(stderr, Fixed %d roots.\n, ret);
-   ret = 0;
-   } else if (ret  0) {
-   fprintf(stderr,
-  Found %d roots with an outdated root item.\n,
-  ret);
-   fprintf(stderr,
-   Please run a filesystem check with the option --repair 
to fix them.\n);
-   ret = 1;
-   goto close_out;
-   }
-
/*
 * repair mode will force us to commit transaction which
 * will make us fail to load log tree when mounting.
@@ -8101,6 +8085,22 @@ int cmd_check(int argc, char **argv)
if (ret)
fprintf(stderr, Errors found in extent allocation tree or 
chunk allocation\n);
 
+   ret = repair_root_items(info);
+   if (ret  0)
+   goto close_out;
+   if (repair) {
+   fprintf(stderr, Fixed %d roots.\n, ret);
+   ret = 0;
+   } else if (ret  0) {
+   fprintf(stderr,
+  Found %d roots with an outdated root item.\n,
+  ret);
+   fprintf(stderr,
+   Please run a filesystem check with the option --repair 
to fix them.\n);
+   ret = 1;
+   goto close_out;
+   }
+
fprintf(stderr, checking free space cache\n);
ret = check_space_cache(root);
if (ret)
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH resend 1/3] Btrfs-progs: fsck: deal with snapshot one by one when rebuilding extent tree

2014-11-25 Thread Wang Shilong
From: Wang Shilong wangsl.f...@cn.fujitsu.com

Previously, we deal with node block firstly and then leaf block which can
maximize readahead. However, to rebuild extent tree, we need deal with snapshot
one by one.

This patch makes us deal with snapshot one by one if we need rebuild extent
tree otherwise we drop into previous way.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 cmds-check.c | 248 +--
 1 file changed, 158 insertions(+), 90 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 24b6b59..ff2795d 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -128,10 +128,14 @@ struct inode_backref {
char name[0];
 };
 
-struct dropping_root_item_record {
+struct root_item_record {
struct list_head list;
-   struct btrfs_root_item ri;
-   struct btrfs_key found_key;
+   u64 objectid;
+   u64 bytenr;
+   u8 level;
+   u8 drop_level;
+   int level_size;
+   struct btrfs_key drop_key;
 };
 
 #define REF_ERR_NO_DIR_ITEM(1  0)
@@ -4618,7 +4622,7 @@ static int run_next_block(struct btrfs_trans_handle 
*trans,
  struct rb_root *dev_cache,
  struct block_group_tree *block_group_cache,
  struct device_extent_tree *dev_extent_cache,
- struct btrfs_root_item *ri)
+ struct root_item_record *ri)
 {
struct extent_buffer *buf;
u64 bytenr;
@@ -4851,11 +4855,8 @@ static int run_next_block(struct btrfs_trans_handle 
*trans,
size = btrfs_level_size(root, level - 1);
btrfs_node_key_to_cpu(buf, key, i);
if (ri != NULL) {
-   struct btrfs_key drop_key;
-   btrfs_disk_key_to_cpu(drop_key,
- ri-drop_progress);
if ((level == ri-drop_level)
-is_dropped_key(key, drop_key)) {
+is_dropped_key(key, ri-drop_key)) {
continue;
}
}
@@ -4896,7 +4897,7 @@ static int add_root_to_pending(struct extent_buffer *buf,
   struct cache_tree *pending,
   struct cache_tree *seen,
   struct cache_tree *nodes,
-  struct btrfs_key *root_key)
+  u64 objectid)
 {
if (btrfs_header_level(buf)  0)
add_pending(nodes, seen, buf-start, buf-len);
@@ -4905,13 +4906,12 @@ static int add_root_to_pending(struct extent_buffer 
*buf,
add_extent_rec(extent_cache, NULL, 0, buf-start, buf-len,
   0, 1, 1, 0, 1, 0, buf-len);
 
-   if (root_key-objectid == BTRFS_TREE_RELOC_OBJECTID ||
+   if (objectid == BTRFS_TREE_RELOC_OBJECTID ||
btrfs_header_backref_rev(buf)  BTRFS_MIXED_BACKREF_REV)
add_tree_backref(extent_cache, buf-start, buf-start,
 0, 1);
else
-   add_tree_backref(extent_cache, buf-start, 0,
-root_key-objectid, 1);
+   add_tree_backref(extent_cache, buf-start, 0, objectid, 1);
return 0;
 }
 
@@ -6481,6 +6481,99 @@ static int check_devices(struct rb_root *dev_cache,
return ret;
 }
 
+static int add_root_item_to_list(struct list_head *head,
+ u64 objectid, u64 bytenr,
+ u8 level, u8 drop_level,
+ int level_size, struct btrfs_key *drop_key)
+{
+
+   struct root_item_record *ri_rec;
+   ri_rec = malloc(sizeof(*ri_rec));
+   if (!ri_rec)
+   return -ENOMEM;
+   ri_rec-bytenr = bytenr;
+   ri_rec-objectid = objectid;
+   ri_rec-level = level;
+   ri_rec-level_size = level_size;
+   ri_rec-drop_level = drop_level;
+   if (drop_key)
+   memcpy(ri_rec-drop_key, drop_key, sizeof(*drop_key));
+   list_add_tail(ri_rec-list, head);
+
+   return 0;
+}
+
+static int deal_root_from_list(struct list_head *list,
+  struct btrfs_trans_handle *trans,
+  struct btrfs_root *root,
+  struct block_info *bits,
+  int bits_nr,
+  struct cache_tree *pending,
+  struct cache_tree *seen,
+  struct cache_tree *reada,
+  struct cache_tree *nodes,
+  struct cache_tree *extent_cache,
+  struct cache_tree *chunk_cache

[PATCH 0/3] Extent tree rebuilding with snapshots patches

2014-11-25 Thread Wang Shilong
I did send these patches a long while ago, but due to some reasons,
they were not merged, these are important fixes for fsck, without
these patches, extent tree rebuilding did not work with snapshots.

Also, /tests/fsck-tests.sh's extent tree rebuild test could always
detect failure without these patches, unluckily, it need extra
enviroment setting, so i supposed it was always 'NOTRUN'!

last patch fix a regression for root rebuilding, root rebuilding
should be at first, because if root(extent root eg) corrupted,
root items also won't succeed.

patches are based on David's integration-20141125

Wang Shilong (3):
  Btrfs-progs: fsck: deal with snapshot one by one when rebuilding
extent tree
  Btrfs-progs: fsck: add ability to rebuild extent tree with snapshots
  Btrfs-progs, fsck: move root items repair after root rebuilding

 cmds-check.c | 412 +++
 1 file changed, 303 insertions(+), 109 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH resend 2/3] Btrfs-progs: fsck: add ability to rebuild extent tree with snapshots

2014-11-25 Thread Wang Shilong
From: Wang Shilong wangsl.f...@cn.fujitsu.com

This patch makes us to rebuild a really corrupt extent tree with snapshots.
To implement this, we have to verify whether a block is FULL BACKREF.

This idea come from Josef Bacik:

1) We walk down the original tree, every eb we encounter has
btrfs_header_owner(eb) == root-objectid.  We add normal references
for this root (BTRFS_TREE_BLOCK_REF_KEY) for this root.  World peace
is achieved.

2) We walk down the snapshotted tree.  Say we didn't change anything
at all, it was just a clean snapshot and then boom.  So the
btrfs_header_owner(root-node) == root-objectid, so normal backref.
We walk down to the next level, where btrfs_header_owner(eb) !=
root-objectid, but the level above did, so we add normal refs for all
of these blocks.  We go down the next level, now our
btrfs_header_owner(parent) != root-objectid and
btrfs_header_owner(eb) != root-objectid.  This is where we need to
now go back and see if btrfs_header_owner(eb) currently has a ref on
eb.  If it does we are done, move on to the next block in this same
level, we don't have to go further down.

3) Harder case, we snapshotted and then changed things in the original
root.  Do the same thing as in step 2, but now we get down to
btrfs_header_owner(eb) != root-objectid  btrfs_header_owner(parent)
!= root-objectid.  We lookup the references we have for eb and notice
that btrfs_header_owner(eb) no longer refers to eb.  So now we must
set FULL_BACKREF on this extent reference and add a
SHARED_BLOCK_REF_KEY for this eb using the parent-start as the
offset.  And we need to keep walking down and doing the same thing
until we either hit level 0 or btrfs_header_owner(eb) has a ref on the
block.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 cmds-check.c | 132 +--
 1 file changed, 129 insertions(+), 3 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index ff2795d..a102752 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -112,6 +112,7 @@ struct extent_record {
unsigned int owner_ref_checked:1;
unsigned int is_root:1;
unsigned int metadata:1;
+   unsigned int flag_block_full_backref:1;
 };
 
 struct inode_backref {
@@ -4608,6 +4609,127 @@ static int is_dropped_key(struct btrfs_key *key,
return 0;
 }
 
+static int calc_extent_flag(struct btrfs_root *root,
+  struct cache_tree *extent_cache,
+  struct extent_buffer *buf,
+  struct root_item_record *ri,
+  u64 *flags)
+{
+   int i;
+   int nritems = btrfs_header_nritems(buf);
+   struct btrfs_key key;
+   struct extent_record *rec;
+   struct cache_extent *cache;
+   struct data_backref *dback;
+   struct tree_backref *tback;
+   struct extent_buffer *new_buf;
+   u64 owner = 0;
+   u64 bytenr;
+   u64 offset;
+   u64 ptr;
+   int size;
+   int ret;
+   u8 level;
+
+   /*
+* Except file/reloc tree, we can not have
+* FULL BACKREF MODE
+*/
+   if (ri-objectid  BTRFS_FIRST_FREE_OBJECTID)
+   goto normal;
+   /*
+* root node
+*/
+   if (buf-start == ri-bytenr)
+   goto normal;
+   if (btrfs_is_leaf(buf)) {
+   /*
+* we are searching from original root, world
+* peace is achieved, we use normal backref.
+*/
+   owner = btrfs_header_owner(buf);
+   if (owner == ri-objectid)
+   goto normal;
+   /*
+* we check every eb here, and if any of
+* eb dosen't have original root refers
+* to this eb, we set full backref flag for
+* this extent, otherwise normal backref.
+*/
+   for (i = 0; i  nritems; i++) {
+   struct btrfs_file_extent_item *fi;
+   btrfs_item_key_to_cpu(buf, key, i);
+
+   if (key.type != BTRFS_EXTENT_DATA_KEY)
+   continue;
+   fi = btrfs_item_ptr(buf, i,
+   struct btrfs_file_extent_item);
+   if (btrfs_file_extent_type(buf, fi) ==
+   BTRFS_FILE_EXTENT_INLINE)
+   continue;
+   if (btrfs_file_extent_disk_bytenr(buf, fi) == 0)
+   continue;
+   bytenr = btrfs_file_extent_disk_bytenr(buf, fi);
+   cache = lookup_cache_extent(extent_cache, bytenr, 1);
+   if (!cache)
+   goto full_backref;
+   offset = btrfs_file_extent_offset(buf, fi);
+   rec = container_of(cache, struct

Re: [PATCH 3/3] Btrfs-progs, fsck: move root items repair after root rebuilding

2014-11-25 Thread Wang Shilong
Oops, sorry for sending this twice, just entered one more *.patch…


 
 I did send these patches a long while ago, but due to some reasons,
 they were not merged, these are important fixes for fsck, without
 these patches, extent tree rebuilding did not work with snapshots.
 
 Also, /tests/fsck-tests.sh's extent tree rebuild test could always
 detect failure without these patches, unluckily, it need extra
 enviroment setting, so i supposed it was always 'NOTRUN'!
 
 last patch fix a regression for root rebuilding, root rebuilding
 should be at first, because if root(extent root eg) corrupted,
 root items also won't succeed.
 
 patches are based on David's integration-20141125
 
 Wang Shilong (3):
  Btrfs-progs: fsck: deal with snapshot one by one when rebuilding
extent tree
  Btrfs-progs: fsck: add ability to rebuild extent tree with snapshots
  Btrfs-progs, fsck: move root items repair after root rebuilding
 
 cmds-check.c | 412 +++
 1 file changed, 303 insertions(+), 109 deletions(-)
 
 -- 
 2.1.0
 

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: deal with all 'subvol=xxx' options once

2014-11-24 Thread Wang Shilong
Steps to reproduce:
 # mkfs.btrfs -f /dev/sdb
 # mount -t btrfs /dev/sdb /mnt
 # btrfs sub create /mnt/dir
 # mount -t btrfs /dev/sdb /mnt -o subvol=dir,subvol=dir

It fails with:
 mount: mount(2) failed: No such file or directory

Btrfs deal with subvolume mounting in a recursive way,
to avoid looping, it will stripe out 'subvol=' string,
then next loop will stop.Problem here is it only deal one
string once, if users specify mount option multiple times.
It will loop several times which is not good, and above
reproducing steps will also return confusing results.

Fix this problem by striping out all 'subvol=xxx' options,
only last is valid.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 fs/btrfs/super.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 54bd91e..2b2fa4b 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1115,7 +1115,7 @@ static inline int is_subvolume_inode(struct inode *inode)
  * subvolid=0 to make sure we get the actual tree root for path walking to the
  * subvol we want.
  */
-static char *setup_root_args(char *args)
+static char *__setup_root_args(char *args)
 {
unsigned len = strlen(args) + 2 + 1;
char *src, *dst, *buf;
@@ -1161,6 +1161,24 @@ static char *setup_root_args(char *args)
return buf;
 }
 
+static char *setup_root_args(char *args)
+{
+   char *p, *new_args;
+
+   p = new_args = __setup_root_args(args);
+   /* in case users specify subvol=xxx option multiple times */
+   while (p) {
+   p = __setup_root_args(new_args);
+   if (p) {
+   kfree(new_args);
+   new_args = p;
+   } else {
+   break;
+   }
+   }
+   return new_args;
+}
+
 static struct dentry *mount_subvol(const char *subvol_name, int flags,
   const char *device_name, char *data)
 {
-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: deal with all 'subvol=xxx' options once

2014-11-24 Thread Wang Shilong

 
 Steps to reproduce:
 # mkfs.btrfs -f /dev/sdb
 # mount -t btrfs /dev/sdb /mnt
 # btrfs sub create /mnt/dir
 # mount -t btrfs /dev/sdb /mnt -o subvol=dir,subvol=dir
 
 It fails with:
 mount: mount(2) failed: No such file or directory
 
 Btrfs deal with subvolume mounting in a recursive way,
 to avoid looping, it will stripe out 'subvol=' string,
 then next loop will stop.Problem here is it only deal one
 string once, if users specify mount option multiple times.
 It will loop several times which is not good, and above
 reproducing steps will also return confusing results.
 
 Fix this problem by striping out all 'subvol=xxx' options,
 only last is valid.
 
 Signed-off-by: Wang Shilong wangshilong1...@gmail.com
 ---
 fs/btrfs/super.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)
 
 diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
 index 54bd91e..2b2fa4b 100644
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -1115,7 +1115,7 @@ static inline int is_subvolume_inode(struct inode 
 *inode)
  * subvolid=0 to make sure we get the actual tree root for path walking to the
  * subvol we want.
  */
 -static char *setup_root_args(char *args)
 +static char *__setup_root_args(char *args)
 {
   unsigned len = strlen(args) + 2 + 1;
   char *src, *dst, *buf;
 @@ -1161,6 +1161,24 @@ static char *setup_root_args(char *args)
   return buf;
 }
 
 +static char *setup_root_args(char *args)
 +{
 + char *p, *new_args;
 +
 + p = new_args = __setup_root_args(args);
 + /* in case users specify subvol=xxx option multiple times */
 + while (p) {
 + p = __setup_root_args(new_args);

Oh, i missed error handling here, will update it..



 + if (p) {
 + kfree(new_args);
 + new_args = p;
 + } else {
 + break;
 + }
 + }
 + return new_args;
 +}
 +
 static struct dentry *mount_subvol(const char *subvol_name, int flags,
  const char *device_name, char *data)
 {
 -- 
 1.7.12.4
 

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] Btrfs: fix allocationg memory failure for btrfsic_state structure

2014-11-22 Thread Wang Shilong
size of @btrfsic_state needs more than 2M, it is very likely to
fail allocating memory using kzalloc(). see following mesage:

[91428.902148] Call Trace:
[816f6e0f] dump_stack+0x4d/0x66
[811b1c7f] warn_alloc_failed+0xff/0x170
[811b66e1] __alloc_pages_nodemask+0x951/0xc30
[811fd9da] alloc_pages_current+0x11a/0x1f0
[811b1e0b] ? alloc_kmem_pages+0x3b/0xf0
[811b1e0b] alloc_kmem_pages+0x3b/0xf0
[811d1018] kmalloc_order+0x18/0x50
[811d1074] kmalloc_order_trace+0x24/0x140
[a06c097b] btrfsic_mount+0x8b/0xae0 [btrfs]
[810af555] ? check_preempt_curr+0x85/0xa0
[810b2de3] ? try_to_wake_up+0x103/0x430
[a063d200] open_ctree+0x1bd0/0x2130 [btrfs]
[a060fdde] btrfs_mount+0x62e/0x8b0 [btrfs]
[811fd9da] ? alloc_pages_current+0x11a/0x1f0
[811b0a5e] ? __get_free_pages+0xe/0x50
[81230429] mount_fs+0x39/0x1b0
[812509fb] vfs_kern_mount+0x6b/0x150
[812537fb] do_mount+0x27b/0xc30
[811b0a5e] ? __get_free_pages+0xe/0x50
[812544f6] SyS_mount+0x96/0xf0
[81701970] system_call_fastpath+0x16/0x1b

Since we are allocating memory for hash table array, so
it will be good if we could allocate continuous pages here.

Fix this problem by firstly trying kzalloc(), if we fail,
use vzalloc() instead.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
v1-v2:
include vmalloc.h and swith kvfree() helper.
---
 fs/btrfs/check-integrity.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index cb7f3fe..f88b3a6 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -94,6 +94,7 @@
 #include linux/mutex.h
 #include linux/genhd.h
 #include linux/blkdev.h
+#include linux/vmalloc.h
 #include ctree.h
 #include disk-io.h
 #include hash.h
@@ -3130,10 +3131,13 @@ int btrfsic_mount(struct btrfs_root *root,
   root-sectorsize, PAGE_CACHE_SIZE);
return -1;
}
-   state = kzalloc(sizeof(*state), GFP_NOFS);
-   if (NULL == state) {
-   printk(KERN_INFO btrfs check-integrity: kmalloc() failed!\n);
-   return -1;
+   state = kzalloc(sizeof(*state), GFP_KERNEL | __GFP_NOWARN | 
__GFP_REPEAT);
+   if (!state) {
+   state = vzalloc(sizeof(*state));
+   if (!state) {
+   printk(KERN_INFO btrfs check-integrity: vzalloc() 
failed!\n);
+   return -1;
+   }
}
 
if (!btrfsic_is_initialized) {
@@ -3277,5 +3281,5 @@ void btrfsic_unmount(struct btrfs_root *root,
 
mutex_unlock(btrfsic_mutex);
 
-   kfree(state);
+   kvfree(state);
 }
-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: switch to kvfree() helper

2014-11-22 Thread Wang Shilong
A new helper kvfree() in mm/utils.c will do this.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 fs/btrfs/raid56.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 6a41631..12e343b 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -221,12 +221,8 @@ int btrfs_alloc_stripe_hash_table(struct btrfs_fs_info 
*info)
}
 
x = cmpxchg(info-stripe_hash_table, NULL, table);
-   if (x) {
-   if (is_vmalloc_addr(x))
-   vfree(x);
-   else
-   kfree(x);
-   }
+   if (x)
+   kvfree(x);
return 0;
 }
 
@@ -436,10 +432,7 @@ void btrfs_free_stripe_hash_table(struct btrfs_fs_info 
*info)
if (!info-stripe_hash_table)
return;
btrfs_clear_rbio_cache(info);
-   if (is_vmalloc_addr(info-stripe_hash_table))
-   vfree(info-stripe_hash_table);
-   else
-   kfree(info-stripe_hash_table);
+   kvfree(info-stripe_hash_table);
info-stripe_hash_table = NULL;
 }
 
-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Quota question

2014-11-12 Thread Wang Shilong

 Anyone on this ? There is an issue with quotas depending on the write rate. 
 The more we can write before a sync, the more we can exceed quotas limits

Absolutely, this was a regression bug since Josef introduce delay refs merging 
for qgroup.
Josef, would you please spend some time tracking this problem down?


 -- 
 Cyril SCETBON
 
 I did some new tests and the issue with quota exceeded comes from the fact 
 that the data is not sync to disk : http://pastebin.com/fZdh2YRu
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: request for info on the list of parameters to tweak for PCIe SSDs

2014-10-31 Thread Wang Shilong
Hi Lakshmi,

Personally, i haven’t done many benchmarks about performance tuning, but
there are some info you could refer to:

If you feel intrested, you could take a look at following url which offers 
btrfs mount options:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/btrfs.txt

And for Wiki, there are many information which might help you:
https://btrfs.wiki.kernel.org/index.php/Main_Page

There are also some options that you could tune for mkfs, see detailed info
by ‘man mkfs.btrfs’.I supposed for example different ’nodesize/leafsize’ 
will have effects for performance. Also there are some new features Btrfs,
like skinny_metadata, no_holes etc, you could clone a Latest btrfs-progs tools 
from:
http://git.kernel.org/cgit/linux/kernel/git/kdave/btrfs-progs.git/

Some features are not enabled by default(skinny_medata,no_holes etc),
but for benchmarking, you could turn on them and using latest kernel with 
latest btrfs-progs:

root@localhost autotest]# mkfs.btrfs -O list-all
Filesystem features available at mkfs time:
mixed-bg- mixed data and metadata block groups (0x4)
extref  - increased hardlink limit per file to 65536 (0x40, default)
raid56  - raid56 extended format (0x80)
skinny-metadata - reduced-size metadata extent refs (0x100)
no-holes- no explicit hole extents for files (0x200)

It will be good that you could do some benchmarks for Btrfs with different 
parameters, and feel
free to let others know your results(i am also curious).^_^

Best Regards,
Wang Shilong

 Hi,
 
 Could you kindly help us with the list of all the btrfs  file system 
 parameters , that can be tweaked for the best performance of the PCIe based 
 SSDs ?
 
 Thanks,
 Lakshmi
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Poll: time to switch skinny-metadata on by default?

2014-10-18 Thread Wang Shilong
 info in comments 10,14 in
 https://bugzilla.kernel.org/show_bug.cgi?id=64961
 
 I can't reproduce this, how big is your home directory, and are you
 still seeing corruptions after just rsyncing to a clean fs?  Thanks,
 
   as I wrote in comment 10, it has improved since year ago when I
 reported it: I see no corruption at all, neither after rsync, nor after
 balance crash: btrfs check doesn't find anything wrong, files look ok.
 The only problem is that after adding a disk the balance segfaults on a
 kernel bug and the fs gets stuck.  When I run balance again after
 reboot, it makes only a very small progress and crashes again the same
 way.
 
   There are some 2.5TB of data in 7.5M files on that fs.  And couple
 dozen ro snapshots -- I'm testing 3.17 + revert of 9c3b306e1c9e right
 now, but it takes more than day to copy the data and recreate all the
 snapshots.  But a test with ^skinny-metadata showed no problems, so I
 don't thing I got bitten by that bug.
 
   I have btrfs-image of one of previous runs after crashed balance.
 It's 15GB. I can place it somewhere with fast link, are you interested?
 
 
 
 Yup, send me the link and I'll pull it down.  Thanks,
 
 Josef
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Poll: time to switch skinny-metadata on by default?

2014-10-18 Thread Wang Shilong
Sure, that is cool, let me know if i could give any help!
I have an idle VM that could run btrfs tests there.^_^


 Thanks I'll run this on Monday.
 
 Josef
 
 Wang Shilong wangshilong1...@gmail.com wrote:
 
 
 Hello Josef,
 
 With Skinny metadta and i running your btrfs-next repo for-suse branch
 (which has extent ref patch), i hit following problem:
 
 [  250.679705] BTRFS info (device sdb): relocating block group 35597058048 
 flags 36
 [  250.728815] BTRFS info (device sdb): relocating block group 35462840320 
 flags 36
 [  253.562133] Dropping a ref for a root that doesn't have a ref on the block
 [  253.562475] Dumping block entry [34793177088 8192], num_refs 3, metadata 0
 [  253.562795]   Ref root 0, parent 35532013568, owner 23988, offset 0, 
 num_refs 18446744073709551615
 [  253.563126]   Ref root 0, parent 35560964096, owner 23988, offset 0, 
 num_refs 1
 [  253.563505]   Ref root 0, parent 35654615040, owner 23988, offset 0, 
 num_refs 1
 [  253.563837]   Ref root 0, parent 35678650368, owner 23988, offset 0, 
 num_refs 1
 [  253.564162]   Root entry 5, num_refs 1
 [  253.564520]   Root entry 18446744073709551608, num_refs 
 18446744073709551615
 [  253.564860]   Ref action 4, root 5, ref_root 5, parent 0, owner 23988, 
 offset 0, num_refs 1
 [  253.565205][a049d2f1] process_leaf.isra.6+0x281/0x3e0 [btrfs]
 [  253.565225][a049de83] build_ref_tree_for_root+0x433/0x460 
 [btrfs]
 [  253.565234][a049e1af] btrfs_build_ref_tree+0x18f/0x1c0 
 [btrfs]
 [  253.565241][a0419ce8] open_ctree+0x18b8/0x21a0 [btrfs]
 [  253.565247][a03ecb0e] btrfs_mount+0x62e/0x8b0 [btrfs]
 [  253.565251][812324e9] mount_fs+0x39/0x1b0
 [  253.565255][8125285b] vfs_kern_mount+0x6b/0x150
 [  253.565257][8125565b] do_mount+0x27b/0xc30
 [  253.565259][81256356] SyS_mount+0x96/0xf0
 [  253.565260][81795429] system_call_fastpath+0x16/0x1b
 [  253.565263][] 0x
 [  253.565272]   Ref action 1, root 18446744073709551608, ref_root 0, parent 
 35654615040, owner 23988, offset 0, num_refs 1
 [  253.565681][a049d564] btrfs_ref_tree_mod+0x114/0x570 [btrfs]
 [  253.565692][a03f946b] btrfs_inc_extent_ref+0x6b/0x120 [btrfs]
 [  253.565697][a03fb77c] __btrfs_mod_ref+0x16c/0x2b0 [btrfs]
 [  253.565702][a0401504] btrfs_inc_ref+0x14/0x20 [btrfs]
 [  253.565707][a03f05ff] update_ref_for_cow+0x15f/0x380 [btrfs]
 [  253.565711][a03f0a3d] __btrfs_cow_block+0x21d/0x540 [btrfs]
 [  253.565716][a03f0f0c] btrfs_cow_block+0x12c/0x290 [btrfs]
 [  253.565721][a046f59c] do_relocation+0x49c/0x570 [btrfs]
 [  253.565728][a04723ce] relocate_tree_blocks+0x60e/0x660 
 [btrfs]
 [  253.565735][a0473ce7] relocate_block_group+0x407/0x690 
 [btrfs]
 [  253.565741][a0474148] btrfs_relocate_block_group+0x1d8/0x2f0 
 [btrfs]
 [  253.565746][a04455a7] 
 btrfs_relocate_chunk.isra.30+0x77/0x800 [btrfs]
 [  253.565753][a0448a8b] __btrfs_balance+0x4eb/0x8d0 [btrfs]
 [  253.565760][a044928a] btrfs_balance+0x41a/0x720 [btrfs]
 [  253.565766][a045112a] btrfs_ioctl_balance+0x16a/0x530 [btrfs]
 [  253.565772][a0456df8] btrfs_ioctl+0x588/0x2cb0 [btrfs]
 [  253.565779]   Ref action 1, root 18446744073709551608, ref_root 0, parent 
 35560964096, owner 23988, offset 0, num_refs 1
 [  253.566143][a049d564] btrfs_ref_tree_mod+0x114/0x570 [btrfs]
 [  253.566152][a03f946b] btrfs_inc_extent_ref+0x6b/0x120 [btrfs]
 [  253.566180][a03fb77c] __btrfs_mod_ref+0x16c/0x2b0 [btrfs]
 [  253.566186][a0401504] btrfs_inc_ref+0x14/0x20 [btrfs]
 [  253.566191][a03f071b] update_ref_for_cow+0x27b/0x380 [btrfs]
 [  253.566195][a03f0a3d] __btrfs_cow_block+0x21d/0x540 [btrfs]
 [  253.566199][a03f0f0c] btrfs_cow_block+0x12c/0x290 [btrfs]
 [  253.566203][a046f59c] do_relocation+0x49c/0x570 [btrfs]
 [  253.566210][a04723ce] relocate_tree_blocks+0x60e/0x660 
 [btrfs]
 [  253.566216][a0473ce7] relocate_block_group+0x407/0x690 
 [btrfs]
 [  253.566222][a0474148] btrfs_relocate_block_group+0x1d8/0x2f0 
 [btrfs]
 [  253.566227][a04455a7] 
 btrfs_relocate_chunk.isra.30+0x77/0x800 [btrfs]
 [  253.566233][a0448a8b] __btrfs_balance+0x4eb/0x8d0 [btrfs]
 [  253.566240][a044928a] btrfs_balance+0x41a/0x720 [btrfs]
 [  253.566245][a045112a] btrfs_ioctl_balance+0x16a/0x530 [btrfs]
 [  253.566252][a0456df8] btrfs_ioctl+0x588/0x2cb0 [btrfs]
 [  253.566258]   Ref action 2, root 18446744073709551608, ref_root 5, parent 
 0, owner 23988, offset 0, num_refs 18446744073709551615
 [  253.566641][a049d710] btrfs_ref_tree_mod+0x2c0/0x570 [btrfs]
 [  253.566651][a040404a

Re: [PATCH 1/2] Btrfs: check_int: use the known block location

2014-10-17 Thread Wang Shilong

 The xfstest btrfs/014 which tests the balance operation caused issues with
 the check_int module. The attempt was made to use btrfs_map_block() to
 find the physical location for a written block. However, this was not
 at all needed since the location of the written block was known since
 a hook to submit_bio() was the reason for entering the check_int module.
 Additionally, after a block relocation it happened that btrfs_map_block()
 failed causing misleading error messages afterwards.
 
 This patch changes the check_int module to use the known information of
 the physical location from the bio.
 
 Reported-by: Wang Shilong wangshilong1...@gmail.com
 Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de

This passed my Tests after applying both patches.

Tested-by: Wang Shilong wangshilong1...@gmail.com


 ---
 fs/btrfs/check-integrity.c | 66 --
 1 file changed, 11 insertions(+), 55 deletions(-)
 
 diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
 index ce92ae30250f..65fc2e0bbc4a 100644
 --- a/fs/btrfs/check-integrity.c
 +++ b/fs/btrfs/check-integrity.c
 @@ -326,9 +326,6 @@ static int btrfsic_handle_extent_data(struct 
 btrfsic_state *state,
 static int btrfsic_map_block(struct btrfsic_state *state, u64 bytenr, u32 len,
struct btrfsic_block_data_ctx *block_ctx_out,
int mirror_num);
 -static int btrfsic_map_superblock(struct btrfsic_state *state, u64 bytenr,
 -   u32 len, struct block_device *bdev,
 -   struct btrfsic_block_data_ctx *block_ctx_out);
 static void btrfsic_release_block_ctx(struct btrfsic_block_data_ctx 
 *block_ctx);
 static int btrfsic_read_block(struct btrfsic_state *state,
 struct btrfsic_block_data_ctx *block_ctx);
 @@ -1609,25 +1606,6 @@ static int btrfsic_map_block(struct btrfsic_state 
 *state, u64 bytenr, u32 len,
   return ret;
 }
 
 -static int btrfsic_map_superblock(struct btrfsic_state *state, u64 bytenr,
 -   u32 len, struct block_device *bdev,
 -   struct btrfsic_block_data_ctx *block_ctx_out)
 -{
 - block_ctx_out-dev = btrfsic_dev_state_lookup(bdev);
 - block_ctx_out-dev_bytenr = bytenr;
 - block_ctx_out-start = bytenr;
 - block_ctx_out-len = len;
 - block_ctx_out-datav = NULL;
 - block_ctx_out-pagev = NULL;
 - block_ctx_out-mem_to_free = NULL;
 - if (NULL != block_ctx_out-dev) {
 - return 0;
 - } else {
 - printk(KERN_INFO btrfsic: error, cannot lookup dev (#2)!\n);
 - return -ENXIO;
 - }
 -}
 -
 static void btrfsic_release_block_ctx(struct btrfsic_block_data_ctx 
 *block_ctx)
 {
   if (block_ctx-mem_to_free) {
 @@ -2004,24 +1982,13 @@ again:
   }
   }
 
 - if (block-is_superblock)
 - ret = btrfsic_map_superblock(state, bytenr,
 -  processed_len,
 -  bdev, block_ctx);
 - else
 - ret = btrfsic_map_block(state, bytenr, processed_len,
 - block_ctx, 0);
 - if (ret) {
 - printk(KERN_INFO
 -btrfsic: btrfsic_map_block(root @%llu)
 - failed!\n, bytenr);
 - goto continue_loop;
 - }
 - block_ctx.datav = mapped_datav;
 - /* the following is required in case of writes to mirrors,
 -  * use the same that was used for the lookup */
   block_ctx.dev = dev_state;
   block_ctx.dev_bytenr = dev_bytenr;
 + block_ctx.start = bytenr;
 + block_ctx.len = processed_len;
 + block_ctx.pagev = NULL;
 + block_ctx.mem_to_free = NULL;
 + block_ctx.datav = mapped_datav;
 
   if (is_metadata || state-include_extent_data) {
   block-never_written = 0;
 @@ -2135,10 +2102,6 @@ again:
   /* this is getting ugly for the
* include_extent_data case... */
   bytenr = 0; /* unknown */
 - block_ctx.start = bytenr;
 - block_ctx.len = processed_len;
 - block_ctx.mem_to_free = NULL;
 - block_ctx.pagev = NULL;
   } else {
   processed_len = state-metablock_size;
   bytenr = btrfs_stack_header_bytenr(
 @@ -2151,22 +2114,15 @@ again:
  Written block @%llu (%s/%llu/?)
   !found in hash table, M.\n,
  bytenr, dev_state-name, dev_bytenr);
 -
 - ret

Re: [PATCH 2/2] Btrfs: check-int: don't complain about balanced blocks

2014-10-17 Thread Wang Shilong
 The xfstest btrfs/014 which tests the balance operation caused that the
 check_int module complained that known blocks changed their physical
 location. Since this is not an error in this case, only print such
 message if the verbose mode was enabled.
 
 Reported-by: Wang Shilong wangshilong1...@gmail.com
 Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de

This passed my Tests after applying both patches.

Tested-by: Wang Shilong wangshilong1...@gmail.com

 ---
 fs/btrfs/check-integrity.c | 87 ++
 1 file changed, 49 insertions(+), 38 deletions(-)
 
 diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
 index 65fc2e0bbc4a..65226d7c9fe0 100644
 --- a/fs/btrfs/check-integrity.c
 +++ b/fs/btrfs/check-integrity.c
 @@ -1325,24 +1325,28 @@ static int btrfsic_create_link_to_next_block(
   l = NULL;
   next_block-generation = BTRFSIC_GENERATION_UNKNOWN;
   } else {
 - if (next_block-logical_bytenr != next_bytenr 
 - !(!next_block-is_metadata 
 -   0 == next_block-logical_bytenr)) {
 - printk(KERN_INFO
 -Referenced block @%llu (%s/%llu/%d)
 - found in hash table, %c,
 - bytenr mismatch (!= stored %llu).\n,
 -next_bytenr, next_block_ctx-dev-name,
 -next_block_ctx-dev_bytenr, *mirror_nump,
 -btrfsic_get_block_type(state, next_block),
 -next_block-logical_bytenr);
 - } else if (state-print_mask  BTRFSIC_PRINT_MASK_VERBOSE)
 - printk(KERN_INFO
 -Referenced block @%llu (%s/%llu/%d)
 - found in hash table, %c.\n,
 -next_bytenr, next_block_ctx-dev-name,
 -next_block_ctx-dev_bytenr, *mirror_nump,
 -btrfsic_get_block_type(state, next_block));
 + if (state-print_mask  BTRFSIC_PRINT_MASK_VERBOSE) {
 + if (next_block-logical_bytenr != next_bytenr 
 + !(!next_block-is_metadata 
 +   0 == next_block-logical_bytenr))
 + printk(KERN_INFO
 +Referenced block @%llu (%s/%llu/%d)
 + found in hash table, %c,
 + bytenr mismatch (!= stored %llu).\n,
 +next_bytenr, next_block_ctx-dev-name,
 +next_block_ctx-dev_bytenr, *mirror_nump,
 +btrfsic_get_block_type(state,
 +   next_block),
 +next_block-logical_bytenr);
 + else
 + printk(KERN_INFO
 +Referenced block @%llu (%s/%llu/%d)
 + found in hash table, %c.\n,
 +next_bytenr, next_block_ctx-dev-name,
 +next_block_ctx-dev_bytenr, *mirror_nump,
 +btrfsic_get_block_type(state,
 +   next_block));
 + }
   next_block-logical_bytenr = next_bytenr;
 
   next_block-mirror_num = *mirror_nump;
 @@ -1528,7 +1532,9 @@ static int btrfsic_handle_extent_data(
   return -1;
   }
   if (!block_was_created) {
 - if (next_block-logical_bytenr != next_bytenr 
 + if ((state-print_mask 
 +  BTRFSIC_PRINT_MASK_VERBOSE) 
 + next_block-logical_bytenr != next_bytenr 
   !(!next_block-is_metadata 
 0 == next_block-logical_bytenr)) {
   printk(KERN_INFO
 @@ -1881,25 +1887,30 @@ again:
  dev_state,
  dev_bytenr);
   }
 - if (block-logical_bytenr != bytenr 
 - !(!block-is_metadata 
 -   block-logical_bytenr == 0))
 - printk(KERN_INFO
 -Written block @%llu (%s/%llu/%d)
 - found in hash table, %c,
 - bytenr mismatch
 - (!= stored %llu).\n,
 -bytenr, dev_state-name, dev_bytenr

Re: [PATCH v2] Btrfs-progs: check, fix return value check of is_child_root()

2014-10-16 Thread Wang Shilong

 The following commit:
 
btrfs-progs: fsck: remove unfriendly BUG_ON() for searching tree failure
f495a2ac66116f0a1b15e73380c8cbca6e0a4ca0
 
 introduced a regression, detected through xfstests/btrfs/054, where
 previously a negative return value (-1) was used to mean a particular
 root didn't had any parent root, and now, after that change, a negative
 value is also used to mean that an error happened. That change also made
 the only caller of is_child_root() interpret any negative return value
 as an error and therefore incorrectly made the caller leave with an
 error, instead of continuing.
 
 This affects only the 3.17 release candidates (3.16 and older releases
 don't have this issue).
 
 Signed-off-by: Filipe Manana fdman...@suse.com

Thanks for Fixing this, Filipe!

Reviewed-by: Wang Shilong wangshilong1...@gmail.com

 ---
 
 V2: Made it return 2 (instead of -1) when the root child_root_id doesn't
have any parent roots, in order to behave exactly like the code
pre-commit f495a2ac66116f0a1b15e73380c8cbca6e0a4ca0.
 
 cmds-check.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)
 
 diff --git a/cmds-check.c b/cmds-check.c
 index 99d1a94..310eb2a 100644
 --- a/cmds-check.c
 +++ b/cmds-check.c
 @@ -895,6 +895,14 @@ static int leave_shared_node(struct btrfs_root *root,
   return 0;
 }
 
 +/*
 + * Returns:
 + *  0 - on error
 + * 1   - if the root with id child_root_id is a child of root parent_root_id
 + * 0   - if the root child_root_id isn't a child of the root parent_root_id 
 but
 + *   has other root(s) as parent(s)
 + * 2   - if the root child_root_id doesn't have any parent roots
 + */
 static int is_child_root(struct btrfs_root *root, u64 parent_root_id,
u64 child_root_id)
 {
 @@ -952,7 +960,7 @@ out:
   btrfs_release_path(path);
   if (ret  0)
   return ret;
 - return has_parent? 0 : -1;
 + return has_parent ? 0 : 2;
 }
 
 static int process_dir_item(struct btrfs_root *root,
 -- 
 1.9.1
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Problems running Balance and checking integrity concurrently.

2014-10-16 Thread Wang Shilong

Hi Stefan Behrens,

Thanks very much for your attention!

More stress tests will be followed and will
give you feedbacks when finished.


 On Fri, 10 Oct 2014 21:35:02 +0800, Wang Shilong wrote:
 Hello,
 
 I have reproduced a problem with Btrfs integrity and balance using latest
 btrfs kernel,it is very easy to reproduce:
 
 With mount option “check_int”, and run xfstests/btrfs/ tests, below test 
 could definitely
 reproduce problem:
 
 ./check tests/btrfs/014
 
 The following two patches fix the issue.
 
 

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Problems running Balance and checking integrity concurrently.

2014-10-10 Thread Wang Shilong
Hello,

I have reproduced a problem with Btrfs integrity and balance using latest
btrfs kernel,it is very easy to reproduce:

With mount option “check_int”, and run xfstests/btrfs/ tests, below test could 
definitely
reproduce problem:

./check tests/btrfs/014

5.122248] Written block @25800343552 (sdc/1210187776/0) found in hash table, M, 
bytenr mismatch (!= stored 20968505344).
[  455.122283] Written block @25800359936 (sdc/1210204160/1) found in hash 
table, M, bytenr mismatch (!= stored 20968521728).
[  455.122301] Written block @25800376320 (sdc/1210220544/1) found in hash 
table, M, bytenr mismatch (!= stored 20968538112).
[  455.122347] Referenced block @25800343552 (sdc/1344405504/2) found in hash 
table, M, bytenr mismatch (!= stored 20968505344).
[  455.122427] Referenced block @25800359936 (sdc/1344421888/2) found in hash 
table, M, bytenr mismatch (!= stored 20968521728).
[  455.122853] Written block @25800376320 (sdc/1344438272/0) found in hash 
table, M, bytenr mismatch (!= stored 20968538112).
[  455.126275] BTRFS critical (device sdc): unable to find logical 3452043264 
len 4096
[  455.126491] btrfsic: btrfsic_map_block(root @3452043264) failed!
[  455.126496] BTRFS critical (device sdc): unable to find logical 3452059648 
len 4096
[  455.126769] btrfsic: btrfsic_map_block(root @3452059648) failed!
[  455.126773] BTRFS critical (device sdc): unable to find logical 3452076032 
len 4096
[  455.127007] btrfsic: btrfsic_map_block(root @3452076032) failed!
[  455.127011] BTRFS critical (device sdc): unable to find logical 3452092416 
len 4096
snip
[  455.129864] btrfsic: btrfsic_map_block(root @3452272640) failed!
[  455.129869] BTRFS critical (device sdc): unable to find logical 3452289024 
len 4096
[  455.130113] btrfsic: btrfsic_map_block(root @3452289024) failed!
[  455.130756] BTRFS critical (device sdc): unable to find logical 3451994112 
len 4096
[  455.130984] btrfsic: btrfsic_map_block(root @3451994112) failed!
[  455.130988] BTRFS critical (device sdc): unable to find logical 3452010496 
len 4096
[  455.131192] btrfsic: btrfsic_map_block(root @3452010496) failed!
[  455.131195] BTRFS critical (device sdc): unable to find logical 3452026880 
len 4096
[  455.131401] btrfsic: btrfsic_map_block(root @3452026880) failed!
[  455.131914] Written block @25800392704 (sdc/1210236928/0) found in hash 
table, M, bytenr mismatch (!= stored 20968554496).
[  455.131938] Referenced block @25800409088 (sdc/1210253312/1) found in hash 
table, M, bytenr mismatch (!= stored 20968570880).
[  455.131971] Referenced block @25800409088 (sdc/1344471040/2) found in hash 
table, M, bytenr mismatch (!= stored 20968570880).
[  455.132017] Referenced block @25800425472 (sdc/1210269696/1) found in hash 
table, M, bytenr mismatch (!= stored 20968587264).
[  455.132030] Referenced block @25800425472 (sdc/1344487424/2) found in hash 
table, M, bytenr mismatch (!= stored 20968587264).
[  455.132344] Written block @25800392704 (sdc/1344454656/0) found in hash 
table, M, bytenr mismatch (!= stored 20968554496

I supposed there are some race problems between integrity check and
Btrfs balance, could you guys please take a look at this issue!


Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix allocationg memory failure for btrfsic_state structure

2014-10-10 Thread Wang Shilong
size of @btrfsic_state needs more than 2M, it is very likely to
fail allocating memory using kzalloc(). see following mesage:

[91428.902148] Call Trace:
[816f6e0f] dump_stack+0x4d/0x66
[811b1c7f] warn_alloc_failed+0xff/0x170
[811b66e1] __alloc_pages_nodemask+0x951/0xc30
[811fd9da] alloc_pages_current+0x11a/0x1f0
[811b1e0b] ? alloc_kmem_pages+0x3b/0xf0
[811b1e0b] alloc_kmem_pages+0x3b/0xf0
[811d1018] kmalloc_order+0x18/0x50
[811d1074] kmalloc_order_trace+0x24/0x140
[a06c097b] btrfsic_mount+0x8b/0xae0 [btrfs]
[810af555] ? check_preempt_curr+0x85/0xa0
[810b2de3] ? try_to_wake_up+0x103/0x430
[a063d200] open_ctree+0x1bd0/0x2130 [btrfs]
[a060fdde] btrfs_mount+0x62e/0x8b0 [btrfs]
[811fd9da] ? alloc_pages_current+0x11a/0x1f0
[811b0a5e] ? __get_free_pages+0xe/0x50
[81230429] mount_fs+0x39/0x1b0
[812509fb] vfs_kern_mount+0x6b/0x150
[812537fb] do_mount+0x27b/0xc30
[811b0a5e] ? __get_free_pages+0xe/0x50
[812544f6] SyS_mount+0x96/0xf0
[81701970] system_call_fastpath+0x16/0x1b

Since we are allocating memory for hash table array, so
it will be good if we could allocate continuous pages here.

Fix this problem by firstly trying kzalloc(), if we fail,
use vzalloc() instead.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 fs/btrfs/check-integrity.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index cb7f3fe..66e820f 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -3130,10 +3130,13 @@ int btrfsic_mount(struct btrfs_root *root,
   root-sectorsize, PAGE_CACHE_SIZE);
return -1;
}
-   state = kzalloc(sizeof(*state), GFP_NOFS);
-   if (NULL == state) {
-   printk(KERN_INFO btrfs check-integrity: kmalloc() failed!\n);
-   return -1;
+   state = kzalloc(sizeof(*state), GFP_KERNEL | __GFP_NOWARN | 
__GFP_REPEAT);
+   if (!state) {
+   state = vzalloc(sizeof(*state));
+   if (!state) {
+   printk(KERN_INFO btrfs check-integrity: vzalloc() 
failed!\n);
+   return -1;
+   }
}
 
if (!btrfsic_is_initialized) {
@@ -3277,5 +3280,8 @@ void btrfsic_unmount(struct btrfs_root *root,
 
mutex_unlock(btrfsic_mutex);
 
-   kfree(state);
+   if (is_vmalloc_addr(state))
+   vfree(state);
+   else
+   kfree(state);
 }
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix incorrect compression ratio detection

2014-10-07 Thread Wang Shilong
Steps to reproduce:
 # mkfs.btrfs -f /dev/sdb
 # mount -t btrfs /dev/sdb /mnt -o compress=lzo
 # dd if=/dev/zero of=/mnt/data bs=$((33*4096)) count=1

after previous steps, inode will be detected as bad compression ratio,
and NOCOMPRESS flag will be set for that inode.

Reason is that compress have a max limit pages every time(128K), if a
132k write in, it will be splitted into two write(128k+4k), this bug
is a leftover for commit 68bb462d42a(Btrfs: don't compress for a small write)

Fix this problem by checking every time before compression, if it is a
small write(=blocksize), we bail out and fall into nocompression directly.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 fs/btrfs/inode.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 344a322..b78e90a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -411,14 +411,6 @@ static noinline int compress_file_range(struct inode 
*inode,
(start  0 || end + 1  BTRFS_I(inode)-disk_i_size))
btrfs_add_inode_defrag(NULL, inode);
 
-   /*
-* skip compression for a small file range(=blocksize) that
-* isn't an inline extent, since it dosen't save disk space at all.
-*/
-   if ((end - start + 1) = blocksize 
-   (start  0 || end + 1  BTRFS_I(inode)-disk_i_size))
-   goto cleanup_and_bail_uncompressed;
-
actual_end = min_t(u64, isize, end + 1);
 again:
will_compress = 0;
@@ -440,6 +432,14 @@ again:
 
total_compressed = actual_end - start;
 
+   /*
+* skip compression for a small file range(=blocksize) that
+* isn't an inline extent, since it dosen't save disk space at all.
+*/
+   if (total_compressed = blocksize 
+  (start  0 || end + 1  BTRFS_I(inode)-disk_i_size))
+   goto cleanup_and_bail_uncompressed;
+
/* we want to make sure that amount of ram required to uncompress
 * an extent is reasonable, so we limit the total size in ram
 * of a compressed extent to 128k.  This is a crucial number
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] xfstests: add regression for inode bad compression ratio detection

2014-10-07 Thread Wang Shilong
This test add a regression test to make sure btrfs dosen't set
inode NOCOMPRESS flag when a small write(=blocksize) happens.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
Test is a little relying on btrfs now have a max 128k limit write, not sure
whether it is good since it is possible we may change it in the future, let me
know if you have better idea ^_^.
---
 tests/btrfs/060 | 72 +
 tests/btrfs/060.out |  3 +++
 tests/btrfs/group   |  1 +
 3 files changed, 76 insertions(+)
 create mode 100644 tests/btrfs/060
 create mode 100644 tests/btrfs/060.out

diff --git a/tests/btrfs/060 b/tests/btrfs/060
new file mode 100644
index 000..6ef2e0f
--- /dev/null
+++ b/tests/btrfs/060
@@ -0,0 +1,72 @@
+#! /bin/bash
+# FS QA Test No. btrfs/060
+#
+# Regression test for btrfs incorrection inode ratio detection.
+# This was fixed in the following linux kernel patch:
+#
+# Btrfs: fix incorrect compression ratio detection
+#
+#---
+# Copyright (c) 2014 Wang Shilong.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo QA output created by $seq
+
+here=`pwd`
+tmp=/tmp/$$
+
+status=1   # failure is the default!
+trap _cleanup; exit \$status 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -fr $tmp
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_test
+_require_scratch
+_need_to_be_root
+
+rm -f $seqres.full
+
+_scratch_mkfs  $seqres.full 21
+_scratch_mount -o compress=lzo
+
+$XFS_IO_PROG -f -c pwrite 0 10M -c fsync \
+   $SCRATCH_MNT/data  $seqres.full 21
+
+filefrag $SCRATCH_MNT/data | $AWK_PROG '{print $2}'
+
+$XFS_IO_PROG -f -c pwrite 0 $((4096*33)) -c fsync \
+   $SCRATCH_MNT/data  $seqres.full 21
+
+$XFS_IO_PROG -f -c pwrite 0 10M -c fsync \
+   $SCRATCH_MNT/data  $seqres.full 21
+filefrag $SCRATCH_MNT/data | $AWK_PROG '{print $2}'
+
+status=0
+exit
diff --git a/tests/btrfs/060.out b/tests/btrfs/060.out
new file mode 100644
index 000..1c306d2
--- /dev/null
+++ b/tests/btrfs/060.out
@@ -0,0 +1,3 @@
+QA output created by 060
+80
+80
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 68b5c79..d06c163 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -62,3 +62,4 @@
 057 auto quick
 058 auto quick
 059 auto quick
+060 auto quick
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] xfstests: add regression for inode bad compression ratio detection

2014-10-07 Thread Wang Shilong
This test add a regression test to make sure btrfs dosen't set
inode NOCOMPRESS flag when a small write(=blocksize) happens.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
v1-v2:
switch _extent_count() helper.

Test relies on btrfs have a max 128k limit write, not sure
whether it is good since it is possible we may change it in the future.
let me know if you have better idea here. ^_^
---
 tests/btrfs/060 | 74 +
 tests/btrfs/060.out |  3 +++
 tests/btrfs/group   |  1 +
 3 files changed, 78 insertions(+)
 create mode 100644 tests/btrfs/060
 create mode 100644 tests/btrfs/060.out

diff --git a/tests/btrfs/060 b/tests/btrfs/060
new file mode 100644
index 000..3ff4055
--- /dev/null
+++ b/tests/btrfs/060
@@ -0,0 +1,74 @@
+#! /bin/bash
+# FS QA Test No. btrfs/060
+#
+# Regression test for btrfs incorrect inode ratio detection.
+# This was fixed in the following linux kernel patch:
+#
+# Btrfs: fix incorrect compression ratio detection
+#
+#---
+# Copyright (c) 2014 Wang Shilong.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo QA output created by $seq
+
+here=`pwd`
+tmp=/tmp/$$
+
+status=1   # failure is the default!
+trap _cleanup; exit \$status 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -fr $tmp
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/defrag
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_test
+_require_scratch
+_need_to_be_root
+
+rm -f $seqres.full
+
+_scratch_mkfs  $seqres.full 21
+_scratch_mount -o compress=lzo
+
+$XFS_IO_PROG -f -c pwrite 0 10M -c fsync \
+   $SCRATCH_MNT/data  $seqres.full 21
+
+_extent_count $SCRATCH_MNT/data
+
+$XFS_IO_PROG -f -c pwrite 0 $((4096*33)) -c fsync \
+   $SCRATCH_MNT/data  $seqres.full 21
+
+$XFS_IO_PROG -f -c pwrite 0 10M -c fsync \
+   $SCRATCH_MNT/data  $seqres.full 21
+
+_extent_count $SCRATCH_MNT/data
+
+status=0
+exit
diff --git a/tests/btrfs/060.out b/tests/btrfs/060.out
new file mode 100644
index 000..1c306d2
--- /dev/null
+++ b/tests/btrfs/060.out
@@ -0,0 +1,3 @@
+QA output created by 060
+80
+80
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 68b5c79..d06c163 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -62,3 +62,4 @@
 057 auto quick
 058 auto quick
 059 auto quick
+060 auto quick
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] xfstests,btrfs/010: use _extent_count() helper

2014-10-07 Thread Wang Shilong
cleanup to swith _extent_count(), this way we remove a
dependence on filefrag.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 tests/btrfs/010 | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)
 mode change 100755 = 100644 tests/btrfs/010

diff --git a/tests/btrfs/010 b/tests/btrfs/010
old mode 100755
new mode 100644
index 371f1b8..6cfc561
--- a/tests/btrfs/010
+++ b/tests/btrfs/010
@@ -40,6 +40,7 @@ _cleanup()
 # get standard environment, filters and checks
 . ./common/rc
 . ./common/filter
+. ./common/defrag
 
 # real QA test starts here
 
@@ -47,7 +48,6 @@ _cleanup()
 _supported_fs btrfs
 _supported_os Linux
 _require_scratch
-_require_command /usr/sbin/filefrag
 
 rm -f $seqres.full
 
@@ -71,9 +71,9 @@ done
 
 $BTRFS_UTIL_PROG filesystem defrag $subvol/foobar  sync
 
-num_extent_orig=`filefrag $subvol/foobar | awk -F ' ' '{print $2}'`
-num_extent_snap1=`filefrag $SCRATCH_MNT/snap-1/foobar | awk -F ' ' '{print 
$2}'`
-num_extent_snap2=`filefrag $SCRATCH_MNT/snap-2/foobar | awk -F ' ' '{print 
$2}'`
+num_extent_orig=`_extent_count $subvol/foobar`
+num_extent_snap1=`_extent_count $SCRATCH_MNT/snap-1/foobar`
+num_extent_snap2=`_extent_count $SCRATCH_MNT/snap-2/foobar`
 
 echo $num_extent_orig $num_extent_snap1 $num_extent_snap2  $seqres.full
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] xfstests,btrfs/004: use FILEFRAG_PROG macro

2014-10-07 Thread Wang Shilong
Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
i didn't find a way to remove filefrag here, but let us
use macro firstly...
---
 tests/btrfs/004 | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/btrfs/004 b/tests/btrfs/004
index 1d5b816..1f59963 100755
--- a/tests/btrfs/004
+++ b/tests/btrfs/004
@@ -53,7 +53,6 @@ _supported_os Linux
 _require_scratch
 _require_no_large_scratch_dev
 _require_btrfs inspect-internal
-_require_command /usr/sbin/filefrag
 
 rm -f $seqres.full
 
@@ -82,7 +81,7 @@ _filter_extents()
 
 _check_file_extents()
 {
-   cmd=filefrag -v $1
+   cmd=$FILEFRAG_PROG -v $1
echo # $cmd  $seqres.full
out=`$cmd | _filter_extents`
if [ -z $out ]; then
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xfstests,btrfs/004: use FILEFRAG_PROG macro

2014-10-07 Thread Wang Shilong
Cc list

 On Wed, Oct 08, 2014 at 06:35:22AM -0400, Wang Shilong wrote:
 Signed-off-by: Wang Shilong wangshilong1...@gmail.com
 ---
 i didn't find a way to remove filefrag here, but let us
 use macro firstly...
 
 YOu shoul dbe able to use fiemap to calculate the same output.
 i.e. _check_file_extents() outputs a strings of the format 
 #physical#length#logical#flags and this is then cut up later
 later by the test. fiemap outputs the same information,
 it just requires slightly different processing via the
 _filter_extents function to munge it into the same format…
 
 I supposed there are some differences between fiemap and filefrag:
 
 [root@vm01 mnt]# filefrag -v data 
 Filesystem type is: 9123683e
 File size of data is 1024 (1 block of 4096 bytes)
 ext: logical_offset:physical_offset: length:   expected: flags:
   0:0..4095:  0..  4095:   4096: 
 not_aligned,inline,eof
 data: 1 extent found
 
 [root@vm01 mnt]# xfs_io -c fiemap data 
 data:
   0: [0..7]: 0..7
 
 Hm….
 
 
 
 Cheers,
 
 Dave.
 -- 
 Dave Chinner
 da...@fromorbit.com
 
 Best Regards,
 Wang Shilong
 

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs rescue super-recover memory corruption

2014-09-18 Thread Wang Shilong
Hi Eric,

 This:
 
 # truncate --size=8g
 # dd if=/dev/zero of=file conv=notrunc  bs=4 seek=16384 count=1
 # valgrind ./btrfs rescue super-recover file -v
 
 yields:
 
 ==4604== Memcheck, a memory error detector
 ==4604== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
 ==4604== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
 ==4604== Command: ./btrfs rescue super-recover file -v
 ==4604== 
 All Devices:
   Device: id = 1, name = file
 
 Before Recovering:
   [All good supers]:
   device name = file
   superblock bytenr = 67108864
 
   [All bad supers]:
   device name = file
   superblock bytenr = 65536
 
 
 Make sure this is a btrfs disk otherwise the tool will destroy other fs, Are 
 you sure? [y/N]: y
 Recovered bad superblocks successful
 ==4604== Invalid read of size 8
 ==4604==at 0x426B55: btrfs_recover_superblocks (list.h:204)
 ==4604==by 0x421C79: cmd_super_recover (cmds-rescue.c:148)
 ==4604==by 0x40420A: handle_command_group (btrfs.c:145)
 ==4604==by 0x421B54: cmd_rescue (cmds-rescue.c:162)
 ==4604==by 0x404199: main (btrfs.c:247)
 ==4604==  Address 0x4c250b0 is 48 bytes inside a block of size 96 free'd
 ==4604==at 0x4A063F0: free (vg_replace_malloc.c:446)
 ==4604==by 0x43C77E: btrfs_close_devices (volumes.c:196)
 ==4604==by 0x42F5D1: close_ctree (disk-io.c:1404)
 ==4604==by 0x426A85: btrfs_recover_superblocks (super-recover.c:340)
 ==4604==by 0x421C79: cmd_super_recover (cmds-rescue.c:148)
 ==4604==by 0x40420A: handle_command_group (btrfs.c:145)
 ==4604==by 0x421B54: cmd_rescue (cmds-rescue.c:162)
 ==4604==by 0x404199: main (btrfs.c:247)
 ==4604== 
 ==4604== Invalid free() / delete / delete[] / realloc()
 ==4604==at 0x4A063F0: free (vg_replace_malloc.c:446)
 ==4604==by 0x426B9E: btrfs_recover_superblocks (super-recover.c:85)
 ==4604==by 0x421C79: cmd_super_recover (cmds-rescue.c:148)
 ==4604==by 0x40420A: handle_command_group (btrfs.c:145)
 ==4604==by 0x421B54: cmd_rescue (cmds-rescue.c:162)
 ==4604==by 0x404199: main (btrfs.c:247)
 ==4604==  Address 0x4c25080 is 0 bytes inside a block of size 96 free'd
 ==4604==at 0x4A063F0: free (vg_replace_malloc.c:446)
 ==4604==by 0x43C77E: btrfs_close_devices (volumes.c:196)
 ==4604==by 0x42F5D1: close_ctree (disk-io.c:1404)
 ==4604==by 0x426A85: btrfs_recover_superblocks (super-recover.c:340)
 ==4604==by 0x421C79: cmd_super_recover (cmds-rescue.c:148)
 ==4604==by 0x40420A: handle_command_group (btrfs.c:145)
 ==4604==by 0x421B54: cmd_rescue (cmds-rescue.c:162)
 ==4604==by 0x404199: main (btrfs.c:247)
 ==4604== 
 ==4604== 
 ==4604== HEAP SUMMARY:
 ==4604== in use at exit: 0 bytes in 0 blocks
 ==4604==   total heap usage: 72 allocs, 73 frees, 140,384 bytes allocated
 ==4604== 
 ==4604== All heap blocks were freed -- no leaks are possible
 ==4604== 
 ==4604== For counts of detected and suppressed errors, rerun with: -v
 ==4604== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 6 from 6)
 
 i.e. I think we are double freeing memory:
 
close_ctree(root); // -- here
 no_recover:
recover_err_str(ret);
free_recover_superblock(recover); // -- and here
 
 I can't really work out what all this is all doing, but maybe the fix is 
 obvious
 to Wang Shilong (who wrote the original code)?

Though i no longer spend much time on btrfs, i will take a look at this. ^_^

Thanks Eric
 
 Thanks,
 -Eric

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs-progs: super-recover: fix double free fs_devices memory

2014-09-18 Thread Wang Shilong
super-recover collects btrfs devices infomation using existed
functions scan_one_devices().

Problem is fs_devices is freed twice in close_ctree() and
free_recover_superblock() for super correction path.

Fix this problem by checking whether fs_devices memory
have been freed before we free it.

Cc: Eric Sandeen sand...@redhat.com
Cc: Chris Murphy li...@colorremedies.com
Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 super-recover.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/super-recover.c b/super-recover.c
index 767de4b..419b86a 100644
--- a/super-recover.c
+++ b/super-recover.c
@@ -69,21 +69,11 @@ void init_recover_superblock(struct 
btrfs_recover_superblock *recover)
 static
 void free_recover_superblock(struct btrfs_recover_superblock *recover)
 {
-   struct btrfs_device *device;
struct super_block_record *record;
 
if (!recover-fs_devices)
return;
 
-   while (!list_empty(recover-fs_devices-devices)) {
-   device = list_entry(recover-fs_devices-devices.next,
-   struct btrfs_device, dev_list);
-   list_del_init(device-dev_list);
-   free(device-name);
-   free(device);
-   }
-   free(recover-fs_devices);
-
while (!list_empty(recover-good_supers)) {
record = list_entry(recover-good_supers.next,
struct super_block_record, list);
@@ -341,6 +331,9 @@ int btrfs_recover_superblocks(const char *dname,
 no_recover:
recover_err_str(ret);
free_recover_superblock(recover);
+   /* check if we have freed fs_deivces in close_ctree() */
+   if (!root)
+   btrfs_close_devices(recover.fs_devices);
return ret;
 }
 
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: remove empty block groups automatically

2014-09-13 Thread Wang Shilong
 @@ static int btrfs_relocate_chunk(struct btrfs_root 
 *root,
 
   if (map-stripes[i].dev) {
   ret = btrfs_update_device(trans, map-stripes[i].dev);
 - BUG_ON(ret);
 + if (ret) {
 + btrfs_abort_transaction(trans, root, ret);
 + goto out;
 + }
   }
   }
   ret = btrfs_free_chunk(trans, root, chunk_tree, chunk_objectid,
  chunk_offset);
 -
 - BUG_ON(ret);
 + if (ret) {
 + btrfs_abort_transaction(trans, root, ret);
 + goto out;
 + }
 
   trace_btrfs_chunk_free(root, map, chunk_offset, em-len);
 
   if (map-type  BTRFS_BLOCK_GROUP_SYSTEM) {
   ret = btrfs_del_sys_chunk(root, chunk_objectid, chunk_offset);
 - BUG_ON(ret);
 + if (ret) {
 + btrfs_abort_transaction(trans, root, ret);
 + goto out;
 + }
   }
 
   ret = btrfs_remove_block_group(trans, extent_root, chunk_offset);
 - BUG_ON(ret);
 + if (ret) {
 + btrfs_abort_transaction(trans, extent_root, ret);
 + goto out;
 + }
 
   write_lock(em_tree-lock);
   remove_extent_mapping(em_tree, em);
 @@ -2658,11 +2660,46 @@ static int btrfs_relocate_chunk(struct btrfs_root 
 *root,
 
   /* once for the tree */
   free_extent_map(em);
 +out:
   /* once for us */
   free_extent_map(em);
 + return ret;
 +}
 +
 +static int btrfs_relocate_chunk(struct btrfs_root *root,
 +  u64 chunk_tree, u64 chunk_objectid,
 +  u64 chunk_offset)
 +{
 + struct btrfs_root *extent_root;
 + struct btrfs_trans_handle *trans;
 + int ret;
 +
 + root = root-fs_info-chunk_root;
 + extent_root = root-fs_info-extent_root;
 +
 + ret = btrfs_can_relocate(extent_root, chunk_offset);
 + if (ret)
 + return -ENOSPC;
 +
 + /* step one, relocate all the extents inside this chunk */
 + ret = btrfs_relocate_block_group(extent_root, chunk_offset);
 + if (ret)
 + return ret;
 
 + trans = btrfs_start_transaction(root, 0);
 + if (IS_ERR(trans)) {
 + ret = PTR_ERR(trans);
 + btrfs_std_error(root-fs_info, ret);
 + return ret;
 + }
 +
 + /*
 +  * step two, delete the device extents and the
 +  * chunk tree entries
 +  */
 + ret = btrfs_remove_chunk(trans, root, chunk_offset);
   btrfs_end_transaction(trans, root);
 - return 0;
 + return ret;
 }
 
 static int btrfs_relocate_sys_chunks(struct btrfs_root *root)
 diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
 index 2b37da3..f648d19 100644
 --- a/fs/btrfs/volumes.h
 +++ b/fs/btrfs/volumes.h
 @@ -462,6 +462,8 @@ unsigned long btrfs_full_stripe_len(struct btrfs_root 
 *root,
 int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
   struct btrfs_root *extent_root,
   u64 chunk_offset, u64 chunk_size);
 +int btrfs_remove_chunk(struct btrfs_trans_handle *trans,
 +struct btrfs_root *root, u64 chunk_offset);
 
 static inline int btrfs_dev_stats_dirty(struct btrfs_device *dev)
 {
 -- 
 1.8.3.1
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs-progs: fix some build warnings on 32bit platform

2014-07-29 Thread Wang Shilong
Hi David,

 On Sun, Jul 27, 2014 at 12:49:55AM +0800, Wang Shilong wrote:
 cmds-restore.c:120:4: warning: format %lu expects argument of type
 long unsigned int, but argument 3 has type size_t [-Wformat=]
fprintf(stderr, bad compress length %lu\n, in_len);
 
 --- a/cmds-restore.c
 +++ b/cmds-restore.c
 @@ -117,7 +117,7 @@ static int decompress_lzo(unsigned char *inbuf, char 
 *outbuf, u64 compress_len,
  in_len = read_compress_length(inbuf);
 
  if ((tot_in + LZO_LEN + in_len)  tot_len) {
 -fprintf(stderr, bad compress length %lu\n, in_len);
 +fprintf(stderr, bad compress length %u\n, in_len);
 
 in_len is size_t, this prints a warning on 64bit. Let's make it %lu and
 cas to unsigned long (fixed locally, no need to resend), we'er not using
 the 'z' size_t modifier for printf anywhere.


I am sorry that i forgot to test the patch on 64bit.  As i have left current
company, i could not setup a 64 bit machine until next week.

So i will appreciate it if you could fix this when merging or wait
until next week once i got a 64bit machine.


Thanks,
Wang--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch

2014-07-28 Thread Wang Shilong

Hi Takashi,

This seems like a promising fix, i just have a little question:

ulist_add() logic is we firstly cast a pointer to u64(see ptr_to_u64()), 
and then
cast u64 to pointer back(u64_to_ptr()). So normally, arg u64 is actually 
a pointer.


If the below overflow happens, that means casting can change original value?

Am i missing something here?

Thanks,
Wang
On 07/28/2014 04:57 PM, Takashi Iwai wrote:

We've got bug reports that btrfs crashes when quota is enabled on
32bit kernel, typically with the Oops like below:
  BUG: unable to handle kernel NULL pointer dereference at 0004
  IP: [f9234590] find_parent_nodes+0x360/0x1380 [btrfs]
  *pde = 
  Oops:  [#1] SMP
  CPU: 0 PID: 151 Comm: kworker/u8:2 Tainted: G S  W 
3.15.2-1.gd43d97e-default #1
  Workqueue: btrfs-qgroup-rescan normal_work_helper [btrfs]
  task: f1478130 ti: f147c000 task.ti: f147c000
  EIP: 0060:[f9234590] EFLAGS: 00010213 CPU: 0
  EIP is at find_parent_nodes+0x360/0x1380 [btrfs]
  EAX: f147dda8 EBX: f147ddb0 ECX: 0011 EDX: 
  ESI:  EDI: f147dda4 EBP: f147ddf8 ESP: f147dd38
   DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
  CR0: 8005003b CR2: 0004 CR3: 00bf3000 CR4: 0690
  Stack:
     f147dda4 0050 0001  0001 0050
   0001  d3059000 0001 0022 00a8  
    00a1   0001   1180
  Call Trace:
   [f923564d] __btrfs_find_all_roots+0x9d/0xf0 [btrfs]
   [f9237bb1] btrfs_qgroup_rescan_worker+0x401/0x760 [btrfs]
   [f9206148] normal_work_helper+0xc8/0x270 [btrfs]
   [c025e38b] process_one_work+0x11b/0x390
   [c025eea1] worker_thread+0x101/0x340
   [c026432b] kthread+0x9b/0xb0
   [c0712a71] ret_from_kernel_thread+0x21/0x30
   [c0264290] kthread_create_on_node+0x110/0x110

This indicates a NULL corruption in prefs_delayed list.  The further
investigation and bisection pointed that the call of ulist_add_merge()
results in the corruption.

ulist_add_merge() takes u64 as aux and writes a 64bit value into
old_aux.  The callers of this function in backref.c, however, pass a
pointer of a pointer to old_aux.  That is, the function overwrites
64bit value on 32bit pointer.  This caused a NULL in the adjacent
variable, in this case, prefs_delayed.

Here is a quick attempt to band-aid over this: a new function,
ulist_add_merge_ptr() is introduced to pass/store properly a pointer
value instead of u64.  There are still ugly void ** cast remaining
in the callers because void ** cannot be taken implicitly.  But, it's
safer than explicit cast to u64, anyway.

Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=887046
Cc: sta...@vger.kernel.org [v3.11+]
Signed-off-by: Takashi Iwai ti...@suse.de
---

Alternatively, we can change the argument of aux and old_aux to a
pointer from u64, as backref.c is the only user of ulist_add_merge()
function.  I'll cook up another patch if it's the preferred way.

  fs/btrfs/backref.c | 11 +--
  fs/btrfs/ulist.h   | 15 +++
  2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index e25564bfcb46..d7a24620a963 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -276,9 +276,8 @@ static int add_all_parents(struct btrfs_root *root, struct 
btrfs_path *path,
}
if (ret  0)
goto next;
-   ret = ulist_add_merge(parents, eb-start,
- (uintptr_t)eie,
- (u64 *)old, GFP_NOFS);
+   ret = ulist_add_merge_ptr(parents, eb-start,
+ eie, (void **)old, GFP_NOFS);
if (ret  0)
break;
if (!ret  extent_item_pos) {
@@ -1008,9 +1007,9 @@ again:
goto out;
ref-inode_list = eie;
}
-   ret = ulist_add_merge(refs, ref-parent,
- (uintptr_t)ref-inode_list,
- (u64 *)eie, GFP_NOFS);
+   ret = ulist_add_merge_ptr(refs, ref-parent,
+ ref-inode_list,
+ (void **)eie, GFP_NOFS);
if (ret  0)
goto out;
if (!ret  extent_item_pos) {
diff --git a/fs/btrfs/ulist.h b/fs/btrfs/ulist.h
index 7f78cbf5cf41..695fc2bac680 100644
--- a/fs/btrfs/ulist.h
+++ b/fs/btrfs/ulist.h
@@ -57,6 +57,21 @@ void ulist_free(struct ulist *ulist);
  int ulist_add(struct ulist *ulist, u64 val, u64 aux, gfp_t gfp_mask);
  int ulist_add_merge(struct ulist *ulist, u64 val, u64 aux,
u64 *old_aux, gfp_t 

Re: Help with Project on brtfs wiki

2014-07-26 Thread Wang Shilong

 Hi Hugo,
 
 Am Samstag, 26. Juli 2014, 11:05:03 schrieb Hugo Mills:
 I am not asking to hold my hand
 
   You are, though. Something like using grep to find code that (might
 be) related to the thing you're trying to work with is _fundamental_.
 You should expect to be reading the code -- several layers deep, in
 both directions from any given function -- in order to understand what
 it does and how it fits in, and what problems you might enncounter if
 you change it. If you won't put in that degree of basic effort, then
 you won't get very far.
 
 Actually, from what I read from Nick Krause so far here and on LKML:
 
 Can it be that he is at admittedly quite inventive trolling?
 
 I found none of his posts to be even remotely convincing although he 
 submitted 
 a patch elsewhere.
 
 Thanks for your detailed explaination tough.

Hugo is one of most reliable and serious man i have known.

Anyway, Nick if you really want to do something, just follow Hugo, Ted and 
others'
advices. Contributing Btrfs has many ways, try it and report some BUGs, add some
regression tests etc are all welcome IMO.

Thanks,
Wang
 
 
 To Nick: If I am mistaken – mea culpa. But your posts just seem to follow 
 what 
 I perceive the usual style of something into trolling.
 
 Ciao,
 -- 
 Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
 GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs-progs: fix some build warnings on 32bit platform

2014-07-26 Thread Wang Shilong
Fix following build warnings on 32bit platform:

...
utils.c:1708:3: warning: left shift count = width of
type [enabled by default]
   if (x  i  (1UL  63))
   ^
qgroup-verify.c:393:9: warning: cast to pointer from integer
of different size [-Wint-to-pointer-cast]
  return (struct tree_block *)unode-aux;
 ^
qgroup-verify.c:407:38: warning: cast from pointer to integer
of different size [-Wpointer-to-int-cast]
   if (ulist_add(tree_blocks, bytenr, (unsigned long long)block, 0) = 0)
  ^
cmds-restore.c:120:4: warning: format %lu expects argument of type
long unsigned int, but argument 3 has type size_t [-Wformat=]
fprintf(stderr, bad compress length %lu\n, in_len);
...

BTW, this patch also switches other castings with new helpers.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 cmds-inspect.c  | 4 ++--
 cmds-restore.c  | 2 +-
 extent-tree.c   | 3 +--
 kerncompat.h| 4 
 qgroup-verify.c | 4 ++--
 utils.c | 2 +-
 6 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/cmds-inspect.c b/cmds-inspect.c
index cd9d2c6..c6c17a5 100644
--- a/cmds-inspect.c
+++ b/cmds-inspect.c
@@ -49,7 +49,7 @@ static int __ino_to_path_fd(u64 inum, int fd, int verbose, 
const char *prepend)
memset(fspath, 0, sizeof(*fspath));
ipa.inum = inum;
ipa.size = 4096;
-   ipa.fspath = (uintptr_t)fspath;
+   ipa.fspath = ptr_to_u64(fspath);
 
ret = ioctl(fd, BTRFS_IOC_INO_PATHS, ipa);
if (ret) {
@@ -185,7 +185,7 @@ static int cmd_logical_resolve(int argc, char **argv)
memset(inodes, 0, sizeof(*inodes));
loi.logical = arg_strtou64(argv[optind]);
loi.size = size;
-   loi.inodes = (uintptr_t)inodes;
+   loi.inodes = ptr_to_u64(inodes);
 
fd = open_file_or_dir(argv[optind+1], dirstream);
if (fd  0) {
diff --git a/cmds-restore.c b/cmds-restore.c
index 3465f84..abd9773 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -117,7 +117,7 @@ static int decompress_lzo(unsigned char *inbuf, char 
*outbuf, u64 compress_len,
in_len = read_compress_length(inbuf);
 
if ((tot_in + LZO_LEN + in_len)  tot_len) {
-   fprintf(stderr, bad compress length %lu\n, in_len);
+   fprintf(stderr, bad compress length %u\n, in_len);
return -1;
}
 
diff --git a/extent-tree.c b/extent-tree.c
index 7979457..c46c92b 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -3090,8 +3090,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
break;
ret = get_state_private(info-block_group_cache, start, ptr);
if (!ret) {
-   cache = (struct btrfs_block_group_cache *)
-   (uintptr_t)ptr;
+   cache = u64_to_ptr(ptr);
if (cache-free_space_ctl) {
btrfs_remove_free_space_cache(cache);
kfree(cache-free_space_ctl);
diff --git a/kerncompat.h b/kerncompat.h
index 652275e..bb03194 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -28,6 +28,10 @@
 #include assert.h
 #include stddef.h
 #include linux/types.h
+#include stdint.h
+
+#define ptr_to_u64(x)  ((u64)(uintptr_t)x)
+#define u64_to_ptr(x)  ((void *)(uintptr_t)x)
 
 #ifndef READ
 #define READ 0
diff --git a/qgroup-verify.c b/qgroup-verify.c
index 2e1716d..a5bd4f8 100644
--- a/qgroup-verify.c
+++ b/qgroup-verify.c
@@ -390,7 +390,7 @@ static u64 resolve_one_root(u64 bytenr)
 
 static inline struct tree_block *unode_tree_block(struct ulist_node *unode)
 {
-   return (struct tree_block *)unode-aux;
+   return u64_to_ptr(unode-aux);
 }
 static inline u64 unode_bytenr(struct ulist_node *unode)
 {
@@ -404,7 +404,7 @@ static int alloc_tree_block(u64 bytenr, u64 num_bytes, int 
level)
if (block) {
block-num_bytes = num_bytes;
block-level = level;
-   if (ulist_add(tree_blocks, bytenr, (unsigned long long)block, 
0) = 0)
+   if (ulist_add(tree_blocks, bytenr, ptr_to_u64(block), 0) = 0)
return 0;
free(block);
}
diff --git a/utils.c b/utils.c
index 11250d9..241e5ff 100644
--- a/utils.c
+++ b/utils.c
@@ -1705,7 +1705,7 @@ static int fls64(u64 x)
int i;
 
for (i = 0; i 64; i++)
-   if (x  i  (1UL  63))
+   if (x  i  (1ULL  63))
return 64 - i;
return 64 - i;
 }
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs-progs: remove author and copyright info from man page

2014-07-24 Thread Wang Shilong
From:
http://man7.org/linux/man-pages/man7/man-pages.7.html
...
AUTHORS lists authors of the documentation or program.Use of
an AUTHORS section is strongly discouraged. Generally,
it is better not to clutter every page with a list of
(over time potentially numerous) authors; if you write
or significantly amend a page, add a copyright notice
as a comment in the source file.  If you are the author
of a device driver and want to include an address for
reporting bugs, place this under the BUGS section.
...

Suggested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 Documentation/btrfs-convert.txt | 12 
 Documentation/btrfs-debug-tree.txt  | 12 
 Documentation/btrfs-find-root.txt   | 12 
 Documentation/btrfs-image.txt   | 12 
 Documentation/btrfs-map-logical.txt | 12 
 Documentation/btrfs-show-super.txt  | 12 
 Documentation/btrfs-zero-log.txt| 12 
 Documentation/btrfstune.txt | 12 
 8 files changed, 96 deletions(-)

diff --git a/Documentation/btrfs-convert.txt b/Documentation/btrfs-convert.txt
index 1eff0bf..11d6044 100644
--- a/Documentation/btrfs-convert.txt
+++ b/Documentation/btrfs-convert.txt
@@ -31,18 +31,6 @@ EXIT STATUS
 *btrfs-convert* will return 0 if no error happened.
 If any problems happened, 1 will be returned.
 
-AUTHOR
---
-Written by Shilong Wang and Wenruo Qu.
-
-COPYRIGHT
--
-Copyright (C) 2013 FUJITSU LIMITED.
-
-License GPLv2: GNU GPL version 2 http://gnu.org/licenses/gpl.html.
-
-This is free software: you are free  to  change  and  redistribute  it. There 
is NO WARRANTY, to the extent permitted by law.
-
 SEE ALSO
 
 `mkfs.btrfs`(8)
diff --git a/Documentation/btrfs-debug-tree.txt 
b/Documentation/btrfs-debug-tree.txt
index bfc8aa4..23fc115 100644
--- a/Documentation/btrfs-debug-tree.txt
+++ b/Documentation/btrfs-debug-tree.txt
@@ -33,18 +33,6 @@ EXIT STATUS
 *btrfs-debug-tree* will return 0 if no error happened.
 If any problems happened, 1 will be returned.
 
-AUTHOR
---
-Written by Shilong Wang and Wenruo Qu.
-
-COPYRIGHT
--
-Copyright (C) 2013 FUJITSU LIMITED.
-
-License GPLv2: GNU GPL version 2 http://gnu.org/licenses/gpl.html.
-
-This is free software: you are free  to  change  and  redistribute  it. There 
is NO WARRANTY, to the extent permitted by law.
-
 SEE ALSO
 
 `mkfs.btrfs`(8)
diff --git a/Documentation/btrfs-find-root.txt 
b/Documentation/btrfs-find-root.txt
index a360f8f..c934b4c 100644
--- a/Documentation/btrfs-find-root.txt
+++ b/Documentation/btrfs-find-root.txt
@@ -28,18 +28,6 @@ EXIT STATUS
 *btrfs-find-root* will return 0 if no error happened.
 If any problems happened, 1 will be returned.
 
-AUTHOR
---
-Written by Shilong Wang and Wenruo Qu.
-
-COPYRIGHT
--
-Copyright (C) 2013 FUJITSU LIMITED.
-
-License GPLv2: GNU GPL version 2 http://gnu.org/licenses/gpl.html.
-
-This is free software: you are free to change and redistribute it. There is NO 
WARRANTY, to the extent permitted by law.
-
 SEE ALSO
 
 `mkfs.btrfs`(8)
diff --git a/Documentation/btrfs-image.txt b/Documentation/btrfs-image.txt
index 155194a..b7751f9 100644
--- a/Documentation/btrfs-image.txt
+++ b/Documentation/btrfs-image.txt
@@ -56,18 +56,6 @@ EXIT STATUS
 *btrfs-image* will return 0 if no error happened.
 If any problems happened, 1 will be returned.
 
-AUTHOR
---
-Written by Shilong Wang and Wenruo Qu.
-
-COPYRIGHT
--
-Copyright (C) 2013 FUJITSU LIMITED.
-
-License GPLv2: GNU GPL version 2 http://gnu.org/licenses/gpl.html.
-
-This is free software: you are free  to  change  and  redistribute  it. There 
is NO WARRANTY, to the extent permitted by law.
-
 SEE ALSO
 
 `mkfs.btrfs`(8)
diff --git a/Documentation/btrfs-map-logical.txt 
b/Documentation/btrfs-map-logical.txt
index a8710bf..a3d110c 100644
--- a/Documentation/btrfs-map-logical.txt
+++ b/Documentation/btrfs-map-logical.txt
@@ -32,18 +32,6 @@ EXIT STATUS
 *btrfs-map-logical* will return 0 if no error happened.
 If any problems happened, 1 will be returned.
 
-AUTHOR
---
-Written by Shilong Wang and Wenruo Qu.
-
-COPYRIGHT
--
-Copyright (C) 2013 FUJITSU LIMITED.
-
-License GPLv2: GNU GPL version 2 http://gnu.org/licenses/gpl.html.
-
-This is free software: you are free  to  change  and  redistribute  it. There 
is NO WARRANTY, to the extent permitted by law.
-
 SEE ALSO
 
 `mkfs.btrfs`(8)
diff --git a/Documentation/btrfs-show-super.txt 
b/Documentation/btrfs-show-super.txt
index 6fee0f1..1646be3 100644
--- a/Documentation/btrfs-show-super.txt
+++ b/Documentation/btrfs-show-super.txt
@@ -45,18 +45,6 @@ EXIT STATUS
 *btrfs-show-super* will return 0 if no error happened.
 If any problems happened, 1 will be returned.
 
-AUTHOR
---
-Written by Shilong Wang and Wenruo Qu.
-
-COPYRIGHT
--
-Copyright (C) 2013 FUJITSU LIMITED.
-
-License GPLv2: GNU GPL version 2 http

Re: btrfs_qgroup_create unused parameter

2014-07-24 Thread Wang Shilong

Hi Kevin,

On 07/25/2014 07:23 AM, Kevin Brandstatter wrote:

I submitted a patch for this a week or two ago
(https://patchwork.kernel.org/patch/4486121/), but latest for-linus
doesn't have it merged, is it just being put of as minor, or is there a
problem with it?

I believe your patch will be picked up by Chris and sent to Linus
when next merge window is open.

Since Chris is sometimes busy, patch merging is always
delayed for some time.

Thanks,
Wang


-Kevin

On 07/04/2014 09:09 PM, Wang Shilong wrote:

Hi

I think you are right,  @name here is unneeded..
You can give a patch for that.^_^

Wang

|The code is pasted below for convenience of reference, but in the function to
create a qgruop, it taks a 4th parameter (char * name). I assume this is the 
name
of the path to limit, however, i don't see where its used anywhere in the 
function.

-Kevin Brandstatter

int btrfs_create_qgroup(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info, u64 qgroupid, *char** 
*name**)*
{
struct btrfs_root *quota_root;
struct btrfs_qgroup *qgroup;
int ret = 0;

mutex_lock(fs_info-qgroup_ioctl_lock);
quota_root = fs_info-quota_root;
if (!quota_root) {
ret = -EINVAL;
goto out;
}
qgroup = find_qgroup_rb(fs_info, qgroupid);
if (qgroup) {
ret = -EEXIST;
goto out;
}

ret = add_qgroup_item(trans, quota_root, qgroupid);
if (ret)
goto out;

spin_lock(fs_info-qgroup_lock);
qgroup = add_qgroup_rb(fs_info, qgroupid);
spin_unlock(fs_info-qgroup_lock);

if (IS_ERR(qgroup))
ret = PTR_ERR(qgroup);
out:
mutex_unlock(fs_info-qgroup_ioctl_lock);
return ret;
}|

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix compressed write corruption on enospc

2014-07-24 Thread Wang Shilong

On 07/24/2014 10:48 PM, Liu Bo wrote:

When failing to allocate space for the whole compressed extent, we'll
fallback to uncompressed IO, but we've forgotten to redirty the pages
which belong to this compressed extent, and these 'clean' pages will
simply skip 'submit' part and go to endio directly, at last we got data
corruption as we write nothing.

Signed-off-by: Liu Bo bo.li@oracle.com
---
  fs/btrfs/inode.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3668048..8ea7610 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -709,6 +709,18 @@ retry:
unlock_extent(io_tree, async_extent-start,
  async_extent-start +
  async_extent-ram_size - 1);
+
+   /*
+* we need to redirty the pages if we decide to
+* fallback to uncompressed IO, otherwise we
+* will not submit these pages down to lower
+* layers.
+*/
+   extent_range_redirty_for_io(inode,
+   async_extent-start,
+   async_extent-start +
+   async_extent-ram_size - 1);
+
goto retry;

BTW, if such ENOSPC happens, it means we could not reserve compressed space.
So we retry with nocompression codes, it will try to reserve more space. 
Any reason

we do such things?

Thanks,
Wang

}
goto out_free;


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix compressed write corruption on enospc

2014-07-24 Thread Wang Shilong

On 07/25/2014 10:08 AM, Liu Bo wrote:

On Fri, Jul 25, 2014 at 09:53:43AM +0800, Wang Shilong wrote:

On 07/24/2014 10:48 PM, Liu Bo wrote:

When failing to allocate space for the whole compressed extent, we'll
fallback to uncompressed IO, but we've forgotten to redirty the pages
which belong to this compressed extent, and these 'clean' pages will
simply skip 'submit' part and go to endio directly, at last we got data
corruption as we write nothing.

Signed-off-by: Liu Bo bo.li@oracle.com
---
  fs/btrfs/inode.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3668048..8ea7610 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -709,6 +709,18 @@ retry:
unlock_extent(io_tree, async_extent-start,
  async_extent-start +
  async_extent-ram_size - 1);
+
+   /*
+* we need to redirty the pages if we decide to
+* fallback to uncompressed IO, otherwise we
+* will not submit these pages down to lower
+* layers.
+*/
+   extent_range_redirty_for_io(inode,
+   async_extent-start,
+   async_extent-start +
+   async_extent-ram_size - 1);
+
goto retry;

BTW, if such ENOSPC happens, it means we could not reserve compressed space.
So we retry with nocompression codes, it will try to reserve more
space. Any reason
we do such things?

Compressed extents needs continuous space while uncompressed extents can have
more choices.

Yeah, that is reasonable. Thanks for your answer.^_^



thanks,
-liubo


Thanks,
Wang

}
goto out_free;

.



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Quota Ignored On write problem still exist with 3.16-rc5

2014-07-22 Thread Wang Shilong
Hi Satoru-san,

On 07/23/2014 08:53 AM, Satoru Takeuchi wrote:
 Hi Wang,

 (2014/07/18 19:29), Wang Shilong wrote:
 On 07/18/2014 04:45 PM, Satoru Takeuchi wrote:
 Hi Josef, Chris,

 I found Quota Ignored On write problem still exist with 3.16-rc5,
 which Kevin reported before.

 Kevin's report:
 https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35292.html

 The result of bisect:
 https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35304.html
 I guess this is because Josef's patch delayed qgroup accounting, it will
 cause @refer and @excl updating very late...

 The patch maybe optimize to merge some delayed refs(for example), but
 it updates qgroup accounting when commiting transaction which will be
 very late,
 we may have accumulated many data..
 Thank you for your comment. I know of the code logic which caused
 this problem. However, what I want to say here is that this
 problem should be fixed as soon as possible. It is a important
 regression problem and we've already know the root cause.

 If it's impossible to fix it by releasing 3.16, I consider this
 patch should be reverted.
Since Btrfs Quota function is under heavy development, and should be
considered as *broken*.

I think we'd better close Btrfs quota function(Like snapshot-aware function)
until we really sit down and solve everything.

Thanks,
Wang

 Thanks,
 Satoru

 Thanks,
 Wang
 I bisected and found the bad commit is the following patch.

 ===
 commit fcebe4562dec83b3f8d3088d77584727b09130b2
 Author: Josef Bacik jba...@fb.com
 Date:   Tue May 13 17:30:47 2014 -0700

  Btrfs: rework qgroup accounting
 ===

 Josef, please take a look at this patch.
 Reproducer:
 https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35299.html

 Could you tell me the progress of fixing this bug?
 In addition, could you fix it by 3.16?

 command log:
 ===
 # ./test.sh
 + uname -a
 Linux luna.soft.fujitsu.com 3.16.0-rc5 #2 SMP Tue Jul 15 13:39:46 JST 2014 
 x86_64 x86_64 x86_64 GNU/Linux
 + df -T /test7
 Filesystem Type  1K-blocks  Used Available Use% Mounted on
 /dev/sdc7  btrfs  29296640  1536  27169536   1% /test7
 + btrfs quota ena /test7
 + cd /test7
 + btrfs sub cre test
 Create subvolume './test'
 + btrfs sub l -a /test7
 ID 270 gen 66 top level 5 path test
 + btrfs qg lim 1G test  # limit test subvol to 1GB
 + btrfs qg show -pcre /test7
 qgroupid rfer  excl  max_rfer   max_excl parent  child
          --  -
 0/5  16384 16384 0  0--- ---
 0/27016384 16384 1073741824 0--- ---
 + dd if=/dev/zero of=test/file0 bs=1M count=2000
 2000+0 records in
 2000+0 records out
 2097152000 bytes (2.1 GB) copied, 9.67876 s, 217 MB/s   # write 2GB. It's a 
 bug!
 + sync
 + ls -lisaR /test7
 /test7:
 total 20
 256 16 drwxr-xr-x   1 root root8 Jul 18 15:12 .
2  4 drwxr-xr-x. 43 root root 4096 Jul 16 08:34 ..
 256  0 drwxr-xr-x   1 root root   10 Jul 18 15:17 test

 /test7/test:
 total 2048016
 256   0 drwxr-xr-x 1 root root 10 Jul 18 15:17 .
 256  16 drwxr-xr-x 1 root root  8 Jul 18 15:12 ..
 257 2048000 -rw-r--r-- 1 root root 2097152000 Jul 18 15:17 file0
 + btrfs qg show -pcre /test7
 qgroupid rfer   excl   max_rfer   max_excl parent  child
            --  -
 0/5  16384  16384  0  0--- ---
 0/2702097168384 2097168384 1073741824 0--- ---
 + btrfs quota dis /test7
 + btrfs sub del test
 Transaction commit: none (default)
 Delete subvolume '/test7/test'
 + set +x
 #
 ===

NOTE:
The reproducer here (./test.sh) is a bit different from above-mentioned
one because of some reason.

 Thanks,
 Satoru

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 .

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

 .


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 1 week to rebuid 4x 3TB raid10 is a long time!

2014-07-21 Thread Wang Shilong

On 07/21/2014 10:00 PM, TM wrote:

Wang Shilong wangsl.fnst at cn.fujitsu.com writes:


Just my two cents:

Since 'btrfs replace' support RADI10, I suppose using replace
operation is better than 'device removal and add'.

Another Question is related to btrfs snapshot-aware balance.
How many snapshots did you have in your system?

Of course, During balance/resize/device removal operations,
you could still snapshot, but fewer snapshots should speed things up!

Anyway 'btrfs replace' is implemented more effective than
'device remova and add'.



Hi Wang,
just one subvolume, no snaphots or anything else.

device replace: to tell you the truth I have not used it in the past. Most
of my testing was done 2 years ago. So in this 'kind of production' system I
did not try it. But if I knew that it was faster, perhaps I could have used
it. Anyone has statistics for such a replace and the time it takes?

I don't have specific statistics about this. The conclusion come from
implementation differences between replace and 'device removal'.




Also, can replace be used when one device is missing? Cant find
documentation. eg.
btrfs replace start missing /dev/sdXX


TM


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 1 week to rebuid 4x 3TB raid10 is a long time!

2014-07-21 Thread Wang Shilong

On 07/21/2014 10:00 PM, TM wrote:

Wang Shilong wangsl.fnst at cn.fujitsu.com writes:


Just my two cents:

Since 'btrfs replace' support RADI10, I suppose using replace
operation is better than 'device removal and add'.

Another Question is related to btrfs snapshot-aware balance.
How many snapshots did you have in your system?

Of course, During balance/resize/device removal operations,
you could still snapshot, but fewer snapshots should speed things up!

Anyway 'btrfs replace' is implemented more effective than
'device remova and add'.



Hi Wang,
just one subvolume, no snaphots or anything else.

device replace: to tell you the truth I have not used it in the past. Most
of my testing was done 2 years ago. So in this 'kind of production' system I
did not try it. But if I knew that it was faster, perhaps I could have used
it. Anyone has statistics for such a replace and the time it takes?

I don't have specific statistics about this. The conclusion come from
implementation differences between replace and 'device removal'.



Also, can replace be used when one device is missing? Cant find
documentation. eg.
btrfs replace start missing /dev/sdXX
The latest btrfs-progs include man page of btrfs-replace. Actually, you 
could use it

something like:

btrfs replace start  srcdev|devid targetdev mnt

You could use 'btrfs file show' to see missing device id. and then run 
btrfs replace.


Thanks,
Wang



TM


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 1 week to rebuid 4x 3TB raid10 is a long time!

2014-07-20 Thread Wang Shilong

Hi,

On 07/20/2014 04:45 PM, TM wrote:

Hi,

I have a raid10 with 4x 3TB disks on a microserver
http://n40l.wikia.com/wiki/Base_Hardware_N54L , 8Gb RAM

Recently one disk started to fail (smart errors), so I replaced it
Mounted as degraded, added new disk, removed old
Started yesterday
I am monitoring /var/log/messages and it seems it will take a long time
Started at about 8010631739392
And 20 hours later I am at 6910631739392
btrfs: relocating block group 6910631739392 flags 65

At this rate it will take a week to complete the raid rebuild!!!

Just my two cents:

Since 'btrfs replace' support RADI10, I suppose using replace
operation is better than 'device removal and add'.

Another Question is related to btrfs snapshot-aware balance.
How many snapshots did you have in your system?

Of course, During balance/resize/device removal operations,
you could still snapshot, but fewer snapshots should speed things up!

Anyway 'btrfs replace' is implemented more effective than
'device remova and add'.:-)

Thanks,
Wang


Furthermore it seems that the operation is getting slower and slower
When the rebuild started I had a new message every half a minute, now it’s
getting to OneAndHalf minutes
Most files are small files like flac/jpeg

One week for a raid10 rebuild 4x3TB drives is a very long time.
Any thoughts?
Can you share any statistics from your RAID10 rebuilds?

If I shut down the system, before the rebuild, what is the proper procedure
to remount it? Again degraded? Or normally? Can the process of rebuilding
the raid continue after a reboot? Will it survive, and continue rebuilding?

Thanks in advance
TM


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Quota Ignored On write problem still exist with 3.16-rc5

2014-07-18 Thread Wang Shilong
On 07/18/2014 04:45 PM, Satoru Takeuchi wrote:
 Hi Josef, Chris,

 I found Quota Ignored On write problem still exist with 3.16-rc5,
 which Kevin reported before.

 Kevin's report:
 https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35292.html

 The result of bisect:
 https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35304.html
I guess this is because Josef's patch delayed qgroup accounting, it will
cause @refer and @excl updating very late...

The patch maybe optimize to merge some delayed refs(for example), but
it updates qgroup accounting when commiting transaction which will be
very late,
we may have accumulated many data..

Thanks,
Wang

 I bisected and found the bad commit is the following patch.

 ===
 commit fcebe4562dec83b3f8d3088d77584727b09130b2
 Author: Josef Bacik jba...@fb.com
 Date:   Tue May 13 17:30:47 2014 -0700

 Btrfs: rework qgroup accounting
 ===

 Josef, please take a look at this patch.
 Reproducer:
 https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35299.html

 Could you tell me the progress of fixing this bug?
 In addition, could you fix it by 3.16?

 command log:
 ===
 # ./test.sh
 + uname -a
 Linux luna.soft.fujitsu.com 3.16.0-rc5 #2 SMP Tue Jul 15 13:39:46 JST 2014 
 x86_64 x86_64 x86_64 GNU/Linux
 + df -T /test7
 Filesystem Type  1K-blocks  Used Available Use% Mounted on
 /dev/sdc7  btrfs  29296640  1536  27169536   1% /test7
 + btrfs quota ena /test7
 + cd /test7
 + btrfs sub cre test
 Create subvolume './test'
 + btrfs sub l -a /test7
 ID 270 gen 66 top level 5 path test
 + btrfs qg lim 1G test  # limit test subvol to 1GB
 + btrfs qg show -pcre /test7
 qgroupid rfer  excl  max_rfer   max_excl parent  child
          --  -
 0/5  16384 16384 0  0--- ---
 0/27016384 16384 1073741824 0--- ---
 + dd if=/dev/zero of=test/file0 bs=1M count=2000
 2000+0 records in
 2000+0 records out
 2097152000 bytes (2.1 GB) copied, 9.67876 s, 217 MB/s   # write 2GB. It's a 
 bug!
 + sync
 + ls -lisaR /test7
 /test7:
 total 20
 256 16 drwxr-xr-x   1 root root8 Jul 18 15:12 .
   2  4 drwxr-xr-x. 43 root root 4096 Jul 16 08:34 ..
 256  0 drwxr-xr-x   1 root root   10 Jul 18 15:17 test

 /test7/test:
 total 2048016
 256   0 drwxr-xr-x 1 root root 10 Jul 18 15:17 .
 256  16 drwxr-xr-x 1 root root  8 Jul 18 15:12 ..
 257 2048000 -rw-r--r-- 1 root root 2097152000 Jul 18 15:17 file0
 + btrfs qg show -pcre /test7
 qgroupid rfer   excl   max_rfer   max_excl parent  child
            --  -
 0/5  16384  16384  0  0--- ---
 0/2702097168384 2097168384 1073741824 0--- ---
 + btrfs quota dis /test7
 + btrfs sub del test
 Transaction commit: none (default)
 Delete subvolume '/test7/test'
 + set +x
 #
 ===

   NOTE:
   The reproducer here (./test.sh) is a bit different from above-mentioned
   one because of some reason.

 Thanks,
 Satoru

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 .


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: `btrfsck: extent_io.c:612: free_extent_buffer: Assertion `!(eb-flags 1)' failed.` in `btrfsck`

2014-07-18 Thread Wang Shilong

Hi,

There are some patches for fsck flighting, they  are integrated  in David's 
branches.
You can pull from David's latest branch, and see if it helps:

https://github.com/kdave/btrfs-progs  integration-20140704

Have a try and see if it helps anyway.

Thanks,
Wang

 Hi together,
 I'm experiencing the following issues when I invoke `btrfsck` on a
 sparse file image with a GPT and one (the only) btrfs partition attached
 to a loop device
 
$ sudo btrfsck --repair --init-csum-tree --init-extent-tree -b
 /dev/loop0p1
Incorrect local backref count on 128510738432 root 5 owner 3849475
 offset 0 found 1 wanted 0 back 0xbab41270
backpointer mismatch on [128510738432 4096]
ref mismatch on [128510742528 12288] extent item 0, found 1
btrfsck: extent_io.c:612: free_extent_buffer: Assertion `!(eb-flags
  1)' failed.
 
$ sudo btrfsck --repair --init-csum-tree --init-extent-tree /dev/loop0p1
Incorrect local backref count on 128510726144 root 5 owner 3849470
 offset 0 found 1 wanted 0 back 0xbbcb9500
backpointer mismatch on [128510726144 12288]
ref mismatch on [128510738432 4096] extent item 0, found 1
adding new data backref on 128510738432 root 5 owner 3849475 offset
 0 found 1
Backref 128510738432 root 5 owner 3849475 offset 0 num_refs 0 not
 found in extent tree
Incorrect local backref count on 128510738432 root 5 owner 3849475
 offset 0 found 1 wanted 0 back 0xbbcb9630
backpointer mismatch on [128510738432 4096]
ref mismatch on [128510742528 12288] extent item 0, found 1
btrfsck: extent_io.c:612: free_extent_buffer: Assertion `!(eb-flags
  1)' failed.
 
$ sudo btrfsck --repair /dev/loop0p1
Incorrect local backref count on 130861096960 root 5 owner 22733727
 offset 0 found 1 wanted 0 back 0xc7c7d170
backpointer mismatch on [130861096960 8192]
ref mismatch on [130861105152 8192] extent item 0, found 1
btrfsck: extent_io.c:612: free_extent_buffer: Assertion `!(eb-flags
  1)' failed.
 
$ sudo btrfsck --repair /dev/loop0p1
Backref 130861096960 root 5 owner 22733727 offset 0 num_refs 0 not
 found in extent tree
Incorrect local backref count on 130861096960 root 5 owner 22733727
 offset 0 found 1 wanted 0 back 0xc7f31170
backpointer mismatch on [130861096960 8192]
ref mismatch on [130861105152 8192] extent item 0, found 1
btrfsck: extent_io.c:612: free_extent_buffer: Assertion `!(eb-flags
  1)' failed.
 
 I'm using `btrfs-progs` 24cf4d8c3ee924b474f68514e0167cc2e602a48d on
 Linux 3.16-rc5 (anything else, i.e. older versions, give me immediate
 error after start because errornous file system)
 
 I'd like to know whether this (assertion) error is related to a bug or
 missing feature in btrfs-progs and might be fixed at some point or
 whether this might indicate a completely messed up btrfs.
 
 Best regards,
 Karl-P. Richter
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/6] Btrfs: fix wrong extent mapping for DirectIO

2014-07-16 Thread Wang Shilong
btrfs_next_leaf() will use current leaf's last key to search
and then return a bigger one. So it may still return a file extent
item that is smaller than expected value and we will
get an overflow here for @em-len.

This is easy to reproduce for Btrfs Direct writting, it did not
cause any problem, because writting will re-insert right mapping later.

However, by hacking code to make DIO support compression, wrong extent
mapping is kept and it encounter merging failure(EEXIST) quickly.

Fix this problem by looping to find next file extent item that is bigger
than @start or we could not find anything more.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 fs/btrfs/inode.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ed8b55c..d1ba5e4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6308,6 +6308,8 @@ next:
goto not_found;
if (start + len = found_key.offset)
goto not_found;
+   if (start  found_key.offset)
+   goto next;
em-start = start;
em-orig_start = start;
em-len = found_key.offset - start;
-- 
1.8.2.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/6] Btrfs: fix off-by-one in cow_file_range_inline()

2014-07-16 Thread Wang Shilong
Btrfs could still inline file data if its size is same as
page size, so don't skip max value here.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 fs/btrfs/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e678391..e2c3d63 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -249,8 +249,8 @@ static noinline int cow_file_range_inline(struct btrfs_root 
*root,
data_len = compressed_size;
 
if (start  0 ||
-   actual_end = PAGE_CACHE_SIZE ||
-   data_len = BTRFS_MAX_INLINE_DATA_SIZE(root) ||
+   actual_end  PAGE_CACHE_SIZE ||
+   data_len  BTRFS_MAX_INLINE_DATA_SIZE(root) ||
(!compressed_size 
(actual_end  (root-sectorsize - 1)) == 0) ||
end + 1  isize ||
-- 
1.8.2.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/6] Btrfs: fix wrong max inline data size limit

2014-07-16 Thread Wang Shilong
inline data is stored from offset of @disk_bytenr in
struct btrfs_file_extent_item. So substracting total
size of struct btrfs_file_extent_item is wrong, fix it.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 fs/btrfs/ctree.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index be91397..22fbd04 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -394,7 +394,7 @@ struct btrfs_header {
 #define BTRFS_LEAF_DATA_SIZE(r) (__BTRFS_LEAF_DATA_SIZE(r-leafsize))
 #define BTRFS_MAX_INLINE_DATA_SIZE(r) (BTRFS_LEAF_DATA_SIZE(r) - \
sizeof(struct btrfs_item) - \
-   sizeof(struct btrfs_file_extent_item))
+   offsetof(struct btrfs_file_extent_item, 
disk_bytenr))
 #define BTRFS_MAX_XATTR_SIZE(r)(BTRFS_LEAF_DATA_SIZE(r) - \
 sizeof(struct btrfs_item) -\
 sizeof(struct btrfs_dir_item))
-- 
1.8.2.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/6] Btrfs: fix wrong skipping compression for an inode

2014-07-16 Thread Wang Shilong
If a file's compression ratios is bad, we will set NOCOMPRESS
flag for it, and it will skip compression for that inode next time.

However, if we remount fs to COMPRESS_FORCE, it still should try
if we could compress pages for that inode, this patch fix wrong
check for this problem.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 fs/btrfs/inode.c | 25 -
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a58f31c..20798f8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -348,6 +348,23 @@ static noinline int add_async_extent(struct async_cow *cow,
return 0;
 }
 
+static inline int inode_need_compress(struct inode *inode)
+{
+   struct btrfs_root *root = BTRFS_I(inode)-root;
+
+   /* force compress */
+   if (btrfs_test_opt(root, FORCE_COMPRESS))
+   return 1;
+   /* bad compression ratios */
+   if (BTRFS_I(inode)-flags  BTRFS_INODE_NOCOMPRESS)
+   return 0;
+   if (btrfs_test_opt(root, COMPRESS) ||
+   BTRFS_I(inode)-flags  BTRFS_INODE_COMPRESS ||
+   BTRFS_I(inode)-force_compress)
+   return 1;
+   return 0;
+}
+
 /*
  * we create compressed extents in two phases.  The first
  * phase compresses a range of pages that have already been
@@ -444,10 +461,7 @@ again:
 * inode has not been flagged as nocompress.  This flag can
 * change at any time if we discover bad compression ratios.
 */
-   if (!(BTRFS_I(inode)-flags  BTRFS_INODE_NOCOMPRESS) 
-   (btrfs_test_opt(root, COMPRESS) ||
-(BTRFS_I(inode)-force_compress) ||
-(BTRFS_I(inode)-flags  BTRFS_INODE_COMPRESS))) {
+   if (inode_need_compress(inode)) {
WARN_ON(pages);
pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS);
if (!pages) {
@@ -1076,7 +1090,8 @@ static int cow_file_range_async(struct inode *inode, 
struct page *locked_page,
async_cow-locked_page = locked_page;
async_cow-start = start;
 
-   if (BTRFS_I(inode)-flags  BTRFS_INODE_NOCOMPRESS)
+   if (BTRFS_I(inode)-flags  BTRFS_INODE_NOCOMPRESS 
+   !btrfs_test_opt(root, FORCE_COMPRESS))
cur_end = end;
else
cur_end = min(end, start + 512 * 1024 - 1);
-- 
1.8.2.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/6] Btrfs: fall into nocompression codes quickly if possible

2014-07-16 Thread Wang Shilong
If flag NOCOMPRESS is set which means bad compression ratio,
we could avoid call cow_file_range_async() for this case earlier.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 fs/btrfs/inode.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 20798f8..e678391 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1468,7 +1468,6 @@ static int run_delalloc_range(struct inode *inode, struct 
page *locked_page,
  unsigned long *nr_written)
 {
int ret;
-   struct btrfs_root *root = BTRFS_I(inode)-root;
int force_cow = need_force_cow(inode, start, end);
 
if (BTRFS_I(inode)-flags  BTRFS_INODE_NODATACOW  !force_cow) {
@@ -1477,9 +1476,7 @@ static int run_delalloc_range(struct inode *inode, struct 
page *locked_page,
} else if (BTRFS_I(inode)-flags  BTRFS_INODE_PREALLOC  !force_cow) {
ret = run_delalloc_nocow(inode, locked_page, start, end,
 page_started, 0, nr_written);
-   } else if (!btrfs_test_opt(root, COMPRESS) 
-  !(BTRFS_I(inode)-force_compress) 
-  !(BTRFS_I(inode)-flags  BTRFS_INODE_COMPRESS)) {
+   } else if (!inode_need_compress(inode)) {
ret = cow_file_range(inode, locked_page, start, end,
  page_started, nr_written, 1);
} else {
-- 
1.8.2.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/6] Btrfs: fix wrong write range for filemap_fdatawrite_range()

2014-07-16 Thread Wang Shilong
filemap_fdatawrite_range() expect the third arg to be @end
not @len, fix it.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 fs/btrfs/inode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e2c3d63..ed8b55c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7567,7 +7567,8 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
count = iov_iter_count(iter);
if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
 BTRFS_I(inode)-runtime_flags))
-   filemap_fdatawrite_range(inode-i_mapping, offset, count);
+   filemap_fdatawrite_range(inode-i_mapping, offset,
+offset + count - 1);
 
if (rw  WRITE) {
/*
-- 
1.8.2.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qgroup destroy / assign

2014-07-06 Thread Wang Shilong

Hi Kevin,

On 07/05/2014 05:10 AM, Kevin Brandstatter wrote:

how are qgroups accounted for? Are they specifially tied to one
subvolume on creation?

Qgroup implementation is aslo a little confusing for me at first:-) .

Yes, a qgroup is created automatically tied to one subvolume on creation
with the same objectid.

To implement qgroup group, you may want to do something like following:

[1/1]
/   \
   / \
 sub1(5)   subv2(257)



If so, is it possible to auto delete relavant qgroups on deletion of the
subvolume?

I supposed so, according to latest qgroup patches flighting on, a subvolume
qgroup should be destroyed safely, when it finished sub-tree space 
accounting.




also, how exactly does qgroup assign work? I havent been able to get it
to work at all.
in btrfsprogs cmds-cgroup.c
if ((args.src  48) = (args.dst  48)) {
   fprintf(stderr, ERROR: bad relation requested '%s'\n, path);
   return 1;
}

Oh, this is to implement a strict level qgroup group, which means a u64 is
divided into two parts, 16bits for level and the rest for id.

So we ask parent qgroup's level must be greater than child's qgroup.that
is the code you see above.

You could create a qgroup relation like this:

# btrfs qgroup assign 256 1/1 mnt

Hopely, this could help you.

always seems to fail. I tried creating another qgroup id 1000, and
assigning it to as sub, and vice versa, as well as assigning the sub to
the root, and vice versa, as well as one subvol to another.
The fixme comment leads me to believe that the src should be a path not
a qgroup (FIXME src should accept subvol path)
but the progs let me create a qgroup without a subvol, which makes sense
if you want to be able to have some meta-qgroup for a bunch of subvols.
Further on noticing that a sub create also creates a qgroup with the
same id as the subvol, it would seem that the qgroup is tied to the
subvol via this shared id.

-Kevin Brandstatter
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269

2014-07-04 Thread Wang Shilong

On 07/04/2014 02:02 PM, Marc MERLIN wrote:

On Fri, Jul 04, 2014 at 01:29:29PM +0800, Wang Shilong wrote:

Well, I explained the problem, ext4 and others of course tell me which
devid
an error is on, hopefully btrfs will able to do so in the near future.

So it is ok for you to print one of btrfs filesystem device(for example
device name) ? maybe it is not really physical address the metadata
locates in, this is easier.

Yes, the device name is great, now I can see which of my 3 filesystems has a
problem, that's a start.
Next would be knowing which filename this occurred in, but I understand this
would be harder to get from that point in the code.
Ideally scrub should be able to find that problem and report it, at least I
would know which filesystem to rescan for errors:


Back to the original problem, would you agree that
find / -type f -print0 | xargs grep . /dev/nul?

I'll also have to try this to see if I get lucky with it :)


+   printk_ratelimited(BTRFS (device: %s) parent transid verify
failed on %llu wanted %llu found %llu\n,
+   eb-fs_info-sb-s_id, eb-start,
+   parent_transid, btrfs_header_generation(eb));

That looks great. Ideally all such errors would look like this.
OK,  i found there is a mirror num recorded  in struct @extent_buffer, 
so it is not diffcult
to locate the real physical address that this corrupt metadata block 
locates.


But another question is that if such problems happen, it should also 
help little, because

usually other mirrors maybe have the same errors...

Looking at btrfs codes, if it found such error which means checksum has 
passed but generation
verification fails it  dosen't even try other mirrors if existed, so i 
supposed such kinds of errors

is fatal and could be corrected by btrfs itself..

Anyway, let me see if there are any other output even dose not output 
btrfs specific info.


Thanks,


Thanks for looking into this, I appreciate it.

Best,
Marc


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269

2014-07-04 Thread Wang Shilong

On 07/04/2014 02:02 PM, Marc MERLIN wrote:

On Fri, Jul 04, 2014 at 01:29:29PM +0800, Wang Shilong wrote:

Well, I explained the problem, ext4 and others of course tell me which
devid
an error is on, hopefully btrfs will able to do so in the near future.

So it is ok for you to print one of btrfs filesystem device(for example
device name) ? maybe it is not really physical address the metadata
locates in, this is easier.

Yes, the device name is great, now I can see which of my 3 filesystems has a
problem, that's a start.
Next would be knowing which filename this occurred in, but I understand this
would be harder to get from that point in the code.
Ideally scrub should be able to find that problem and report it, at least I
would know which filesystem to rescan for errors:
So there is a problem, ususally such generation verifications errors is 
related to
Btrfs metdata block.it is maybe just a Btrfs node , not related to any 
actually files,

or even a leaf that contains more that one file

If this is a read/write path from normal fs/file root, we may output 
its' root..but if this is

something like extent root...i think it helps little




Back to the original problem, would you agree that
find / -type f -print0 | xargs grep . /dev/nul?

I'll also have to try this to see if I get lucky with it :)


+   printk_ratelimited(BTRFS (device: %s) parent transid verify
failed on %llu wanted %llu found %llu\n,
+   eb-fs_info-sb-s_id, eb-start,
+   parent_transid, btrfs_header_generation(eb));

That looks great. Ideally all such errors would look like this.

Thanks for looking into this, I appreciate it.

Best,
Marc


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs_qgroup_create unused parameter

2014-07-04 Thread Wang Shilong
Hi 

I think you are right,  @name here is unneeded..
You can give a patch for that.^_^

Wang
 |The code is pasted below for convenience of reference, but in the function 
 to 
 create a qgruop, it taks a 4th parameter (char * name). I assume this is the 
 name
 of the path to limit, however, i don't see where its used anywhere in the 
 function.
 
 -Kevin Brandstatter
 
 int btrfs_create_qgroup(struct btrfs_trans_handle *trans,
   struct btrfs_fs_info *fs_info, u64 qgroupid, *char** 
 *name**)*
 {
   struct btrfs_root *quota_root;
   struct btrfs_qgroup *qgroup;
   int ret = 0;
 
   mutex_lock(fs_info-qgroup_ioctl_lock);
   quota_root = fs_info-quota_root;
   if (!quota_root) {
   ret = -EINVAL;
   goto out;
   }
   qgroup = find_qgroup_rb(fs_info, qgroupid);
   if (qgroup) {
   ret = -EEXIST;
   goto out;
   }
 
   ret = add_qgroup_item(trans, quota_root, qgroupid);
   if (ret)
   goto out;
 
   spin_lock(fs_info-qgroup_lock);
   qgroup = add_qgroup_rb(fs_info, qgroupid);
   spin_unlock(fs_info-qgroup_lock);
 
   if (IS_ERR(qgroup))
   ret = PTR_ERR(qgroup);
 out:
   mutex_unlock(fs_info-qgroup_ioctl_lock);
   return ret;
 }|
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269

2014-07-03 Thread Wang Shilong

On 07/03/2014 04:13 PM, Liu Bo wrote:

On Wed, Jul 02, 2014 at 01:41:52PM -0700, Marc MERLIN wrote:

This got triggered by an rsync I think. I'm not sure which of my btrfs FS
has the issue yet since BUG_ON isn't very helpful as discussed earlier.

[160562.925463] parent transid verify failed on 2776298520576 wanted 41015 
found 18120
[160562.950297] [ cut here ]
[160562.965904] kernel BUG at fs/btrfs/locking.c:269!

But shouldn't messages like 'parent transid verify failed' print which device
this happened on to give the operator a hint on where the problem is?

Could someone do a pass at those and make sure they all print the device
ID/name?

Bug below:

Full log before the crash:
INFO: task btrfs-transacti:3358 blocked for more than 120 seconds.
   Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
echo 0  /proc/sys/kernel/hung_task_timeout_secs disables this message.
btrfs-transacti D  0  3358  2 0x
  8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8
  8800c6914390 000141c0 7fff 8801433b8f10
  0002 8161c9b0 7fff 8800c50ebc60
Call Trace:
  [8161c9b0] ? sock_rps_reset_flow+0x32/0x32
  [8161d3c6] schedule+0x73/0x75
  [8161c9e9] schedule_timeout+0x39/0x129
  [8107653d] ? get_parent_ip+0xd/0x3c
  [8162338f] ? preempt_count_add+0x7a/0x8d
  [8161dbac] __wait_for_common+0x11a/0x159
  [8107810f] ? wake_up_state+0x12/0x12
  [8161dc0f] wait_for_completion+0x24/0x26
  [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28
  [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e
  [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55
  [8122b25a] btrfs_commit_transaction+0x20d/0x8b0
  [81227b41] transaction_kthread+0xf8/0x1ab
  [81227a49] ? btrfs_cleanup_transaction+0x44c/0x44c
  [8106b4b4] kthread+0xae/0xb6
  [8106b406] ? __kthread_parkme+0x61/0x61
  [8162667c] ret_from_fork+0x7c/0xb0
  [8106b406] ? __kthread_parkme+0x61/0x61
INFO: task kworker/u8:13:13157 blocked for more than 120 seconds.
   Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
echo 0  /proc/sys/kernel/hung_task_timeout_secs disables this message.
kworker/u8:13   D  0 13157  2 0x0080
Workqueue: btrfs-flush_delalloc normal_work_helper
  8800041cfc00 0046 8800041cfbd0 8800041cffd8
  8800034f40d0 000141c0 88021f2941c0 8800034f40d0
  8800041cfca0 810fdc2f 0002 8800041cfc10
Call Trace:
  [810fdc2f] ? wait_on_page_read+0x3c/0x3c
  [8161d3c6] schedule+0x73/0x75
  [8161d56b] io_schedule+0x60/0x7a
  [810fdc3d] sleep_on_page+0xe/0x12
  [8161d7fd] __wait_on_bit+0x48/0x7a
  [810fdbdd] wait_on_page_bit+0x7a/0x7c
  [81084821] ? autoremove_wake_function+0x34/0x34
  [810fef03] filemap_fdatawait_range+0x7e/0x126
  [8122f1cf] ? btrfs_submit_direct+0x3f4/0x3f4
  [8122d7aa] ? btrfs_writepages+0x28/0x2a
  [811084c6] ? do_writepages+0x1e/0x2c
  [810ff38e] ? __filemap_fdatawrite_range+0x55/0x57
  [8124006f] btrfs_wait_ordered_range+0x6a/0x11a
  [8122fe01] btrfs_run_delalloc_work+0x27/0x69
  [812508db] normal_work_helper+0xfe/0x240
  [81065d7e] process_one_work+0x195/0x2d2
  [81066020] worker_thread+0x136/0x205
  [81065eea] ? process_scheduled_works+0x2f/0x2f
  [8106b4b4] kthread+0xae/0xb6
  [8106b406] ? __kthread_parkme+0x61/0x61
  [8162667c] ret_from_fork+0x7c/0xb0
  [8106b406] ? __kthread_parkme+0x61/0x61
INFO: task btrfs-transacti:3358 blocked for more than 120 seconds.
   Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
echo 0  /proc/sys/kernel/hung_task_timeout_secs disables this message.
btrfs-transacti D  0  3358  2 0x
  8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8
  8800c6914390 000141c0 7fff 8801433b8f10
  0002 8161c9b0 7fff 8800c50ebc60
Call Trace:
  [8161c9b0] ? sock_rps_reset_flow+0x32/0x32
  [8161d3c6] schedule+0x73/0x75
  [8161c9e9] schedule_timeout+0x39/0x129
  [8107653d] ? get_parent_ip+0xd/0x3c
  [8162338f] ? preempt_count_add+0x7a/0x8d
  [8161dbac] __wait_for_common+0x11a/0x159
  [8107810f] ? wake_up_state+0x12/0x12
  [8161dc0f] wait_for_completion+0x24/0x26
  [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28
  [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e
  [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55
  [8122b25a] btrfs_commit_transaction+0x20d/0x8b0
  [81227b41] transaction_kthread+0xf8/0x1ab
  [81227a49] ? btrfs_cleanup_transaction+0x44c/0x44c

Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269

2014-07-03 Thread Wang Shilong

On 07/04/2014 12:11 PM, Marc MERLIN wrote:

On Fri, Jul 04, 2014 at 11:07:22AM +0800, Liu Bo wrote:

[160562.925463] parent transid verify failed on 2776298520576 wanted 41015 
found 18120

What should I be doing about this?
Does it mean that I do have some kind of corruption/damage on my
filesystem?


If there is another copy for the block(RAID1, DUP, RAID5/6), it'd try to read
the copy and repair the crc with the good one, it's all we can do about it.

Right. It's not quite my question though.
I mean I don't know what device it's on, never mind what file is affected.
If I know which file is corrupted, I can simply delete it and restore from
backup, no biggie.
Right now I don't even know which one of my 3 btrfs filesystems (over 10TB)
has this problem. That makes the message kind of problematic: you have a
problem, but not I'm not giving you any fighting chance of finding out
where :)
  

Also, is it possible to have all these messages state which devid they
occurred on? I don't even know which device I should be worrying about
right now, and although I'm running scrub now, my understanding is that
scrub doesn't actually look at FS structures and is likely to miss this
anyway.

Yes we can but it'd need a bit more effort, for now, all device msg we've seen
in panic info comes from sb-s_id which points to @fs_info-latest_device.

Food for though, as is the message is unfortunately close to useless, except
to an FS developer with a system that has only one btrfs filesystem.

On Fri, Jul 04, 2014 at 11:50:25AM +0800, Wang Shilong wrote:

I am afraid, scrub maybe could not fix such kind of errors, all scrub
doing is to verify whether checksums match and if possible use good
mirrors to rewrite bad one.

I wouldn't be bothered if scrub can't fix it, but it would be good if it
could tell me.
  

Such errors seem imply contention itself is corrupted, we may have passed
checksum check after ending io, but we fail generation check afterwards.
  
So should I really replace scrub with

find / -type f -print0 | xargs grep . /dev/null ?

Basically we need something that will scan the filesystem and ensure that
all files are reachable correctly without causing filesystem problems, and
if one is bad, output the name of the bad file(s).
Scrub only does a half job of that it seems.


To get physical device name, we still need mirror num to know which device
we are locating.

Ok, so it's missing for now and therefore the code can't easily report it,
I understand.

Well, I explained the problem, ext4 and others of course tell me which devid
an error is on, hopefully btrfs will able to do so in the near future.
So it is ok for you to print one of btrfs filesystem device(for example 
device name)  ?
maybe it is not really  physical address the metadata locates in, this 
is easier.





Back to the original problem, would you agree that
find / -type f -print0 | xargs grep . /dev/nul?
may do a better job scanning the entire FS for problems than scrub would?

Thanks,
Marc


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269

2014-07-03 Thread Wang Shilong

On 07/04/2014 11:07 AM, Liu Bo wrote:

On Thu, Jul 03, 2014 at 06:44:21AM -0700, Marc MERLIN wrote:

Thanks for the patch. Hopefully this will make it to the next 3.15.x
kernel.

I also went back to 3.14 anyway since the 'blocked for 120 seconds' look
like another instance of deadlocks we've been discussing here.

But just curious:


[160562.925463] parent transid verify failed on 2776298520576 wanted 41015 
found 18120

What should I be doing about this?
Does it mean that I do have some kind of corruption/damage on my
filesystem?


If there is another copy for the block(RAID1, DUP, RAID5/6), it'd try to read
the copy and repair the crc with the good one, it's all we can do about it.


Also, is it possible to have all these messages state which devid they
occurred on? I don't even know which device I should be worrying about
right now, and although I'm running scrub now, my understanding is that
scrub doesn't actually look at FS structures and is likely to miss this
anyway.

Yes we can but it'd need a bit more effort, for now, all device msg we've seen
in panic info comes from sb-s_id which points to @fs_info-latest_device.

You means something like this:

+   printk_ratelimited(BTRFS (device: %s) parent transid verify 
failed on %llu wanted %llu found %llu\n,

+   eb-fs_info-sb-s_id, eb-start,
+   parent_transid, btrfs_header_generation(eb));




thanks,
-liubo


Thanks,
Marc
--
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
    what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901

.



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5] btrfs: label should not contain return char

2014-07-01 Thread Wang Shilong
Hi Satoru and all,

I think there maybe a leftover issue.
That is if we don't set label, in default it will output a blank line.

Steps to reproduce:

# mkfs.btrfs -f /dev/sdb
# mount /dev/sdb
# cat /sys/fs/btrfs/uuid/label --an extra line will be outputed.

This is because in btrfs_label_show(), we did something like this directly:

return snprintf(buf, PAGE_SIZE, %s\n, fs_info-super_copy-label);

Maybe we can have a check whether label is NULL before we output?
otherwise,the
extra blank line is outputed, IMO this is not so nice thing!

Thanks,
Wang

On 07/01/2014 01:22 PM, Satoru Takeuchi wrote:
 Although Anand once sent the following two patches,

  - [PATCH 1/2 v4] btrfs: label should not contain return char
  - [PATCH 2/2 v4] btrfs: usage error should not be logged into system log

 only the latter patch was merged to mason/for-linus and 3.16-rc3
 as 402a0f4 (by accident?). It results in that the former patch can't
 be cleanly applied to 3.16-rc3.

 I fixed this problem, wrote a reproducer, and tested it.

 Test Result:
 3.16-rc3 w/o this patch: fail
 3.16-rc3 w/ this patch:  pass



 Subject: [PATCH v5] btrfs: label should not contain return char

 From: Anand Jain anand.j...@oracle.com

 generally if you use
   echo test  /sys/fs/btrfs/fsid/label
 it would introduce return char at the end and it can not
 be part of the label. The correct command is
   echo -n test  /sys/fs/btrfs/fsid/label

 This patch will check for this user error

 reproducer.sh:
 ===
 #!/bin/bash

 TEST_DEV=/dev/vdb
 TEST_DIR=/home/sat/mnt

 umount /home/sat/mnt

 mkfs.btrfs -f $TEST_DEV
 UUID=$(btrfs fi show $TEST_DEV | head -1 | sed -e 's/.*uuid: 
 \([-0-9a-z]*\)$/\1/')
 mount $TEST_DEV $TEST_DIR
 LABELFILE=/sys/fs/btrfs/$UUID/label

 echo testlabel $LABELFILE
 LINES=$(cat $LABELFILE | wc -l | awk '{print $1}')

 RET=1
 if [ $LINES -eq 1 ] ; then
 echo '[PASS] Trailing \n is removed correctly.' 2
 RET=0
 else
 echo '[FAIL] Trailing \n still exists.' 2
 fi

 exit $RET
 ===

 Signed-off-by: Anand Jain anand.j...@oracle.com
 Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
 Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
 Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
 ---
  v5: tweak to be able to apply on the top of 402a0f4.
  add test program.
  v4: used memcpy and memset. Thanks David again
  v3: accepts review comments. Thanks David and Eric again
  v2: accepts review comments. Thanks Eric and Roman

 ---
  fs/btrfs/sysfs.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

 diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
 index df39458..dcae61a 100644
 --- a/fs/btrfs/sysfs.c
 +++ b/fs/btrfs/sysfs.c
 @@ -374,8 +374,15 @@ static ssize_t btrfs_label_store(struct kobject *kobj,
   struct btrfs_trans_handle *trans;
   struct btrfs_root *root = fs_info-fs_root;
   int ret;
 + size_t p_len;
  
 - if (len = BTRFS_LABEL_SIZE)
 + /*
 +  * p_len is the len until the first occurrence of either
 +  * '\n' or '\0'
 +  */
 + p_len = strcspn(buf, \n);
 +
 + if (p_len = BTRFS_LABEL_SIZE)
   return -EINVAL;
  
   trans = btrfs_start_transaction(root, 0);
 @@ -383,7 +390,8 @@ static ssize_t btrfs_label_store(struct kobject *kobj,
   return PTR_ERR(trans);
  
   spin_lock(root-fs_info-super_lock);
 - strcpy(fs_info-super_copy-label, buf);
 + memset(fs_info-super_copy-label, 0, BTRFS_LABEL_SIZE);
 + memcpy(fs_info-super_copy-label, buf, p_len);
   spin_unlock(root-fs_info-super_lock);
   ret = btrfs_commit_transaction(trans, root);
  

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2 v3 RESEND] Btrfs: device_list_add() should not update list when mounted

2014-07-01 Thread Wang Shilong

Hi Anand,

Sorry for delay reply, more comments below:

On 06/13/2014 12:26 PM, Anand Jain wrote:

From: Anand Jain anand.j...@oracle.com

device_list_add() is called when user runs btrfs dev scan, which would add
any btrfs device into the btrfs_fs_devices list.

Now think of a mounted btrfs. And a new device which contains the a SB
from the mounted btrfs devices.

In this situation when user runs btrfs dev scan, the current code would
just replace existing device with the new device.

Which is to note that old device is neither closed nor gracefully
removed from the btrfs.

The FS is still operational with the old bdev however the device name
is the btrfs_device is new which is provided by the btrfs dev scan.

reproducer:

devmgt[1] detach /dev/sdc

replace the missing disk /dev/sdc

btrfs rep start -f 1 /dev/sde /btrfs
Label: none  uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120
 Total devices 2 FS bytes used 32.00KiB
 devid1 size 958.94MiB used 115.88MiB path /dev/sde
 devid2 size 958.94MiB used 103.88MiB path /dev/sdd

make /dev/sdc to reappear

devmgt attach host2

btrfs dev scan

btrfs fi show -m
Label: none  uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120^M
 Total devices 2 FS bytes used 32.00KiB^M
 devid1 size 958.94MiB used 115.88MiB path /dev/sdc - Wrong.
 devid2 size 958.94MiB used 103.88MiB path /dev/sdd

since /dev/sdc has been replaced with /dev/sde, the /dev/sdc shouldn't be
part of the btrfs-fsid when it reappears. If user want it to be part of it
then sys admin should be using btrfs device add instead.

[1] github.com/anajain/devmgt.git

Signed-off-by: Anand Jain anand.j...@oracle.com
---
v2-v3: simpler commit and comment message
v1-v2: remove '---' in commit messages which conflict with git am

  fs/btrfs/volumes.c | 27 +++
  1 file changed, 27 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bb2cd66..56822f0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -472,6 +472,7 @@ static noinline int device_list_add(const char *path,
device = __find_device(fs_devices-devices, devid,
   disk_super-dev_item.uuid);
}
+

You add an extra line here.

if (!device) {
if (fs_devices-opened)
return -EBUSY;
@@ -497,6 +498,32 @@ static noinline int device_list_add(const char *path,
  
  		device-fs_devices = fs_devices;

} else if (!device-name || strcmp(device-name-str, path)) {
+   /*
+* When FS is already mounted.
+* 1. If you are here and if the device-name is NULL that means
+*this device was missing at time of FS mount.
+* 2. If you are here and if the device-name is different from 
'path'
+*that means either
+*  a. The same device disappeared and reappeared with 
different
+* name. or
+*  b. The missing-disk-which-was-replaced, has reappeared 
now.
+*
+*  We must allow 1 and 2a above. But 2b would be a spurious and
+*  unintentional.
+*  Further in case of 1 and 2a above, the disk at 'path' would 
have
+*  missed some transaction when it was away and in case of 2a
+*  the stale bdev has to be updated as well.
+*  2b must not be allowed at all time.
+*/
+
+   /*
+* As of now don't allow update to btrfs_fs_device through the
+* btrfs dev scan cli, after FS has been mounted.
+*/
+
+   if (fs_devices-opened)
+   return -EBUSY;
+
I agree we don't allow to update device list when mounted, i tested this 
patch, it worked but

it will output the following message:

Scanning for Btrfs filesystems in '/dev/sdc'
ERROR: unable to scan the device '/dev/sdc' - Device or resource busy

I think this is a little confusing for common users. Maybe don't return 
error but output

some log message into kernel buffer, better?

Thanks,
Wang

name = rcu_string_strdup(path, GFP_NOFS);
if (!name)
return -ENOMEM;


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] btrfs: check generation as replace duplicates devid+uuid

2014-07-01 Thread Wang Shilong

Hi Anand,

On 06/13/2014 12:26 PM, Anand Jain wrote:
  

..snip..

@@ -523,6 +523,16 @@ static noinline int device_list_add(const char *path,
  
  		if (fs_devices-opened)

return -EBUSY;
+   else {
+   /*
+* That is if the FS is _not_ mounted and if you are 
here, that
+* means there is more than one disk with same uuid and 
devid.
+* We keep the one with larger generation number or the 
last-in
+* if generation are equal.
+*/
+   if (found_transid  device-generation)
+   return -EINVAL;
+   }
I tried this patch it outputed the following message if it encounter two 
device with the same uuid and device id:


Scanning for Btrfs filesystems
ERROR: device scan failed '/dev/sdc' - Invalid argument

Same comment as your first patch here.
  
  		name = rcu_string_strdup(path, GFP_NOFS);

if (!name)
@@ -535,6 +545,15 @@ static noinline int device_list_add(const char *path,
}
}
  
+	/*

+* Unmount does not free the btrfs_device struct but would zero
+* generation along with most of the other members. So just update
+* it back. We need it to pick the disk with largest generation
+* (as above).
+*/
+   if (!fs_devices-opened)
+   device-generation = found_transid;
+
if (found_transid  fs_devices-latest_trans) {
fs_devices-latest_devid = devid;
fs_devices-latest_trans = found_transid;


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] btrfs: label should not contain return char

2014-07-01 Thread Wang Shilong
Hi Satoru,

On 07/01/2014 04:00 PM, Satoru Takeuchi wrote:
 Hi Wang,

 (2014/07/01 15:46), Wang Shilong wrote:
 Hi Satoru and all,

 I think there maybe a leftover issue.
 That is if we don't set label, in default it will output a blank line.

 Steps to reproduce:

 # mkfs.btrfs -f /dev/sdb
 # mount /dev/sdb
 # cat /sys/fs/btrfs/uuid/label --an extra line will be outputed.

 This is because in btrfs_label_show(), we did something like this directly:

 return snprintf(buf, PAGE_SIZE, %s\n, fs_info-super_copy-label);

 Maybe we can have a check whether label is NULL before we output?
 otherwise,the
 extra blank line is outputed, IMO this is not so nice thing!
 OK, how about it is? I also add a test for empty-label case.



 From: Anand Jain anand.j...@oracle.com

 generally if you use
   echo test  /sys/fs/btrfs/fsid/label
 it would introduce return char at the end and it can not
 be part of the label. The correct command is
   echo -n test  /sys/fs/btrfs/fsid/label

 This patch will check for this user error

 reproducer.sh:
 ===
 #!/bin/bash

 TEST_DEV=/dev/vdb
 TEST_DIR=/home/sat/mnt

 umount /home/sat/mnt

 mkfs.btrfs -f $TEST_DEV
 UUID=$(btrfs fi show $TEST_DEV | head -1 | sed -e 's/.*uuid: 
 \([-0-9a-z]*\)$/\1/')
 mount $TEST_DEV $TEST_DIR
 LABELFILE=/sys/fs/btrfs/$UUID/label

 echo Test for empty label... 2
 LINES=$(cat $LABELFILE | wc -l | awk '{print $1}')
 RET=0

 if [ $LINES -eq 0 ] ; then
 echo '[PASS] Trailing \n is removed correctly.' 2
 else
 echo '[FAIL] Trailing \n still exists.' 2
 RET=1
 fi

 echo Test for non-empty label... 2

 echo testlabel $LABELFILE
 LINES=$(cat $LABELFILE | wc -l | awk '{print $1}')

 if [ $LINES -eq 1 ] ; then
 echo '[PASS] Trailing \n is removed correctly.' 2
 else
 echo '[FAIL] Trailing \n still exists.' 2
 RET=1
 fi

 exit $RET 
 ===

 Signed-off-by: Anand Jain anand.j...@oracle.com
 Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
 Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
 Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
Thanks for pushing this patch, Now it Looks good to me.

Reviewed-by: Wang Shilong wangsl.f...@cn.fujitsu.com


 ---
  v6: fix empty-label case
  v5: tweak to be able to apply on the top of 402a0f4.
  add test program.
  v4: used memcpy and memset. Thanks David again
  v3: accepts review comments. Thanks David and Eric again
  v2: accepts review comments. Thanks Eric and Roman
 ---
  fs/btrfs/sysfs.c | 15 ---
  1 file changed, 12 insertions(+), 3 deletions(-)

 diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
 index df39458..68d1f63 100644
 --- a/fs/btrfs/sysfs.c
 +++ b/fs/btrfs/sysfs.c
 @@ -363,7 +363,8 @@ static ssize_t btrfs_label_show(struct kobject *kobj,
   struct kobj_attribute *a, char *buf)
  {
   struct btrfs_fs_info *fs_info = to_fs_info(kobj);
 - return snprintf(buf, PAGE_SIZE, %s\n, fs_info-super_copy-label);
 + char *label = fs_info-super_copy-label;
 + return snprintf(buf, PAGE_SIZE, label[0] ? %s\n : %s, label);
  }
  
  static ssize_t btrfs_label_store(struct kobject *kobj,
 @@ -374,8 +375,15 @@ static ssize_t btrfs_label_store(struct kobject *kobj,
   struct btrfs_trans_handle *trans;
   struct btrfs_root *root = fs_info-fs_root;
   int ret;
 + size_t p_len;
  
 - if (len = BTRFS_LABEL_SIZE)
 + /*
 +  * p_len is the len until the first occurrence of either
 +  * '\n' or '\0'
 +  */
 + p_len = strcspn(buf, \n);
 +
 + if (p_len = BTRFS_LABEL_SIZE)
   return -EINVAL;
  
   trans = btrfs_start_transaction(root, 0);
 @@ -383,7 +391,8 @@ static ssize_t btrfs_label_store(struct kobject *kobj,
   return PTR_ERR(trans);
  
   spin_lock(root-fs_info-super_lock);
 - strcpy(fs_info-super_copy-label, buf);
 + memset(fs_info-super_copy-label, 0, BTRFS_LABEL_SIZE);
 + memcpy(fs_info-super_copy-label, buf, p_len);
   spin_unlock(root-fs_info-super_lock);
   ret = btrfs_commit_transaction(trans, root);
  

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs: clear compress-force when remounting with compress option

2014-07-01 Thread Wang Shilong

On 07/02/2014 09:58 AM, Satoru Takeuchi wrote:

(2014/07/02 1:04), David Sterba wrote:

On Mon, Jun 30, 2014 at 10:51:25AM +0800, Wang Shilong wrote:

Steps to reproduce:
  # mkfs.btrfs -f /dev/sdb
  # mount /dev/sdb /mnt -o compress-force=lzo
  # mount /dev/sdb /mnt -o remount,compress=zlib
  # cat /proc/mounts

Remounting from compress-force to compress could not clear 
compress-force
option. The problem is there is no way for users to clear 
compress-force

option separately.

Fix this problem by clearing @FORCE_COMPRESS flag when remounting to
compress=xxx.

Suggested-by: Tsutomu Itoh t-i...@jp.fujitsu.com
Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com


I'm sorry for lack of answer in the V1 thread, V2 is the minimal bugfix
and I'm fine with it right away.

Reviewed-by: David Sterba dste...@suse.cz


Test result (I used your reproducer):
  3.16-rc3:   FAIL
  3.16-rc3 w/ your patch: PASS

Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com


Satoru, Thanks for doing this. ^_^

Regards,
Wang


Thanks,
Satoru


--
To unsubscribe from this list: send the line unsubscribe 
linux-btrfs in

the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html



.



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2 v3 RESEND] Btrfs: device_list_add() should not update list when mounted

2014-07-01 Thread Wang Shilong

On 07/02/2014 12:31 AM, Anand Jain wrote:



Thanks for the commenting Wang.
inline below.


On 01/07/2014 15:16, Wang Shilong wrote:

Hi Anand,

Sorry for delay reply, more comments below:

On 06/13/2014 12:26 PM, Anand Jain wrote:

From: Anand Jain anand.j...@oracle.com


[...]

I agree we don't allow to update device list when mounted, i tested this
patch, it worked but


Thanks for testing this patch set.


it will output the following message:

Scanning for Btrfs filesystems in '/dev/sdc'
ERROR: unable to scan the device '/dev/sdc' - Device or resource busy

I think this is a little confusing for common users. Maybe don't return
error but output
some log message into kernel buffer, better?



 the Other choices are:

 display the error only when specific device is used
 by the user like 'btrfs dev scan /dev/sdc' And don't print busy /
 invalid error when a system wide scan is used like 'btrfs dev scan'.
 To achieve this we have to tweak btrfs-progs.


Maybe return EEXIST value here is better.

Also for your second patch, if we replace one disk with existed disk,
printk that info is good.:-)

Regard,
Wang


 or put the error under the verbose option in the btrfs-progs.





Thanks,
Wang

  name = rcu_string_strdup(path, GFP_NOFS);
  if (!name)
  return -ENOMEM;


--
To unsubscribe from this list: send the line unsubscribe 
linux-btrfs in

the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

.



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] btrfs: looping 'mkfs.btrfs -f dev' may fail with EBUSY

2014-07-01 Thread Wang Shilong

On 06/12/2014 02:25 PM, Anand Jain wrote:

The thread holding the O_EXCL flag seems to be BTRFS_IOC_SCAN_DEV ioctl,
which in turn calls btrfs_scan_one_device() to open dev with the O_EXCL flag.

But btrfs_scan_one_device() does not write anything to the disk.
and it is called by
   . An intermediary step (not the final open_ctree) in the
 mount thread to read the SB and
   . btrfs-control ioctls viz BTRFS_IOC_SCAN_DEV and
 BTRFS_IOC_DEVICES_READY
they don't need the O_EXCL.

test script: (run this in a loop)
static int test_skip_this_disk(char *path)
{
 int fd;
 char c;

 printf(%s , path);
 fd = open(path, O_RDWR|O_EXCL);
 if (fd  0) {
 printf(Open failed\n);
 return 1;
 }
 /*fflush(stdout);
 printf(Open Fine press enter\n);
 scanf(%c, c);*/
 close(fd);
 return 0;
}

main(int arg, char **argv)
{
 int i;

 if (arg == 1) {
 printf(usage: %s dev-with-btrfs-sb .. \n, argv[0]);
 exit(1);
 }

 for (i = 1; i  arg; i++)
 test_skip_this_disk(argv[i]);
}

dump stack after the userland close(fd)

dump_stack+0x9/0x60
btrfs_scan_one_device+0x18d/0x1f0 [btrfs]
btrfs_control_ioctl+0xb9/0x210 [btrfs]
do_vfs_ioctl+0x84/0x4c0
inode_has_perm+0x28/0x30
file_has_perm+0x8a/0xa0
SyS_ioctl+0x91/0xa0
system_call_fastpath+0x16/0x1b

Signed-off-by: Anand Jain anand.j...@oracle.com
---
  fs/btrfs/volumes.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 636faa0..c186b5e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -914,7 +914,6 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, 
void *holder,
 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
 */
bytenr = btrfs_sb_offset(0);
-   flags |= FMODE_EXCL;
I could not think whether it will cause some big problem if we remove 
this flag.

So Cc Chris and others.

Any ideas about this problem?

Thanks,
Wang

mutex_lock(uuid_mutex);
  
  	bdev = blkdev_get_by_path(path, flags, holder);


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fill_holes: Fix slot number passed to hole_mergeable() call.

2014-07-01 Thread Wang Shilong

On 07/01/2014 02:34 PM, Chandan Rajendra wrote:

For a non-existent key, btrfs_search_slot() sets path-slots[0] to the slot
where the key could have been present, which in this case would be the slot
containing the extent item which would be the next neighbor of the file range
being punched. The current code passes an incremented path-slots[0] and we
skip to the wrong file extent item. This would mean that we would fail to
merge the yet to be created hole with the next neighboring hole (if one
exists). Fix this.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---

Nice catch, this looks good to me.

Reviewed-by: Wang Shilong wangsl.f...@cn.fujitsu.com


  fs/btrfs/file.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index ad7c059..3cd7997 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2126,10 +2126,9 @@ static int fill_holes(struct btrfs_trans_handle *trans, 
struct inode *inode,
goto out;
}
  
-	if (hole_mergeable(inode, leaf, path-slots[0]+1, offset, end)) {

+   if (hole_mergeable(inode, leaf, path-slots[0], offset, end)) {
u64 num_bytes;
  
-		path-slots[0]++;

key.offset = offset;
btrfs_set_item_key_safe(root, path, key);
fi = btrfs_item_ptr(leaf, path-slots[0],


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Scrub being stuck

2014-06-30 Thread Wang Shilong

On 06/30/2014 01:14 AM, Florian Lindner wrote:

Hello,

I've just completed a scrub on my home filesystem and now I wanted to start
on my Archiv btrfs RAID 0.

root@horus ~ # uname -
a
Linux horus 3.15.2-1-ARCH #1 SMP PREEMPT Fri Jun 27 07:41:19 CEST 2014
x86_64 GNU/Linux

root@horus ~ # btrfs --version
Btrfs v3.14.2-dirty

root@horus ~ # btrfs fi show
Label: 'home'  uuid: 571009c0-6cc4-4598-a77b-f175501d23fd
 Total devices 1 FS bytes used 2.03TiB
 devid1 size 2.73TiB used 2.04TiB path /dev/sdb1

Label: 'Archiv'  uuid: b2504193-efeb-4ef3-8797-a8d73536a416
 Total devices 2 FS bytes used 5.14TiB
 devid1 size 2.73TiB used 2.73TiB path /dev/sdd1
 devid2 size 2.73TiB used 2.73TiB path /dev/sde1

Btrfs v3.14.2-dirty

root@horus ~ # btrfs fi df /mnt/Archiv
Data, RAID0: total=5.44TiB, used=5.13TiB
Data, single: total=8.00MiB, used=244.00KiB
System, RAID1: total=32.00MiB, used=396.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID1: total=10.94GiB, used=8.71GiB
unknown, single: total=512.00MiB, used=0.00

root@horus ~ # btrfs scrub status /mnt/Archiv
scrub status for b2504193-efeb-4ef3-8797-a8d73536a416
 scrub started at Thu Jan  2 21:13:35 2014, running for 1336 seconds
 total bytes scrubbed: 473.28GiB with 0 errors

root@horus ~ # btrfs scrub cancel /mnt/Archiv
ERROR: scrub cancel failed on /mnt/Archiv: not running

root@horus ~ # btrfs scrub resume
/mnt/Archiv
ERROR: scrub is already running.
To cancel use 'btrfs scrub cancel /mnt/Archiv'.
To see the status use 'btrfs scrub status [-d] /mnt/Archiv'.

root@horus ~ # btrfs scrub start
/mnt/Archiv
ERROR: scrub is already running.
To cancel use 'btrfs scrub cancel /mnt/Archiv'.
To see the status use 'btrfs scrub status [-d] /mnt/Archiv'.


A scrub I started at Jan 02 seems to be stuck somehow. The command outputs
above were generated right now.

I'll be hapy to provide any more information.

Thx!

Florian
Scrub log record file seems corrupted sometimes, if you use latest 
btrfs-progs version.
There should be an option '-f' for 'btrfs scrub start' which will allow 
scrub to skip check in userspace.


Actually, i am thinking whether it would be better to remove userspace 
check about

scrub status since log record file is not so reliable.

Thanks,
Wang


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
.



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] Btrfs: clear compress-force when remounting with compress option

2014-06-29 Thread Wang Shilong
Steps to reproduce:
 # mkfs.btrfs -f /dev/sdb
 # mount /dev/sdb /mnt -o compress-force=lzo
 # mount /dev/sdb /mnt -o remount,compress=zlib
 # cat /proc/mounts

Remounting from compress-force to compress could not clear compress-force
option. The problem is there is no way for users to clear compress-force
option separately.

Fix this problem by clearing @FORCE_COMPRESS flag when remounting to
compress=xxx.

Suggested-by: Tsutomu Itoh t-i...@jp.fujitsu.com
Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
v1-v2: remove unnecessary message output which was addressed
by David.
---
 fs/btrfs/super.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 4662d92..5aae8f0 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -513,6 +513,13 @@ int btrfs_parse_options(struct btrfs_root *root, char 
*options)
btrfs_info(root-fs_info,
   btrfs: use %s compression,
   compress_type);
+   /*
+* If we remount from compress-force=xxx to
+* compress=xxx, we need clear FORCE_COMPRESS
+* flag, otherwise, there is no way for users
+* to disable forcible compression separately.
+*/
+   btrfs_clear_opt(info-mount_opt, 
FORCE_COMPRESS);
}
break;
case Opt_ssd:
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix race between balance recovery and root deletion

2014-06-25 Thread Wang Shilong
Balance recovery is called when RW mounting or remounting from
RO to RW, it is called to finish roots merging.

When doing balance recovery, relocation root's corresponding
fs root(whose root refs is 0) might be destroyed by cleaner
thread, this will make btrfs fail to mount.

Fix this problem by holding @cleaner_mutex when doing balance
recovery.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c | 2 ++
 fs/btrfs/super.c   | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8bb4aa1..61d8e87 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2904,7 +2904,9 @@ retry_root_backup:
if (ret)
goto fail_qgroup;
 
+   mutex_lock(fs_info-cleaner_mutex);
ret = btrfs_recover_relocation(tree_root);
+   mutex_unlock(fs_info-cleaner_mutex);
if (ret  0) {
printk(KERN_WARNING
   BTRFS: failed to recover relocation\n);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 4662d92..b6ebde2 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1467,7 +1467,9 @@ static int btrfs_remount(struct super_block *sb, int 
*flags, char *data)
goto restore;
 
/* recover relocation */
+   mutex_lock(fs_info-cleaner_mutex);
ret = btrfs_recover_relocation(root);
+   mutex_unlock(fs_info-cleaner_mutex);
if (ret)
goto restore;
 
-- 
1.8.2.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   4   5   6   7   8   >