Re: [PATCH] Btrfs: make sure fs_info is not null before its field is used in __btrfs_panic

2012-12-08 Thread Li Zefan
On 2012/12/7 23:42, Wang Sheng-Hui wrote:
 We should make sure fs_info is not null before we refer to its field.
 Add simple check here.

Why? Is there any caller passing NULL @fs_info to this function?

 
 Signed-off-by: Wang Sheng-Hui shh...@gmail.com
 ---
  fs/btrfs/super.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
 index 915ac14..c6a3633 100644
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -280,7 +280,7 @@ void __btrfs_panic(struct btrfs_fs_info *fs_info, const 
 char *function,
   vaf.va = args;
  
   errstr = btrfs_decode_error(fs_info, errno, nbuf);
 - if (fs_info-mount_opt  BTRFS_MOUNT_PANIC_ON_FATAL_ERROR)
 + if (fs_info  (fs_info-mount_opt  BTRFS_MOUNT_PANIC_ON_FATAL_ERROR))
   panic(KERN_CRIT BTRFS panic (device %s) in %s:%d: %pV (%s)\n,
   s_id, function, line, vaf, errstr);
  
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Return EINVAL when length to trim is less than FSB

2012-10-24 Thread Li Zefan
On 2012/10/24 16:15, Lukáš Czerner wrote:
 On Tue, 16 Oct 2012, Lukas Czerner wrote:
 
 Date: Tue, 16 Oct 2012 11:34:36 +0200
 From: Lukas Czerner lczer...@redhat.com
 To: linux-btrfs@vger.kernel.org
 Cc: jba...@fusionio.com, Lukas Czerner lczer...@redhat.com
 Subject: [PATCH] btrfs: Return EINVAL when length to trim is less than FSB

 Currently if len argument in btrfs_ioctl_fitrim() is smaller than
 one FSB we will continue and finally return 0 bytes discarded.
 However if the length to discard is smaller then file system block
 we should really return EINVAL.
 
 ping
 

This patch has already sit in Josef's btrfs-next tree:

http://git.kernel.org/?p=linux/kernel/git/josef/btrfs-next.git;a=commit;h=cd50b6b30f09446eafdd715db6d59299fcc5b19a

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: enquiry about defrag

2012-09-10 Thread Li Zefan
 may i ask a stupid question, if i remove my compress-force=lzo option, will 
 compression disabled for new written data?

the answer is yes. :)

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tail packing

2012-09-02 Thread Li Zefan
On 2012/9/2 13:25, David Sterba wrote:
 On Fri, Aug 31, 2012 at 04:40:36PM +0200, Ben Wreder wrote:
 The disk format description implies that btrfs should be able to
 support tail packing. I just did some experimentation, and while small
 files are packed, it seems that files occupying more than one block
 are not. For example, a lot of 32769-byte files will end up taking
 over 36KB on average (a small excess due to the metadata itself).

 So just to confirm - btrfs does not currently support tail-packing, is that 
 correct?
 
 Currently it's not the reiserfs-style tail packing. The feature is
 advertised as Space-efficient packing of small files, as Josef
 replied, if a file fits into a leaf block, it's stored into the metadata
 blocks. The limit is 3916 bytes for 4k blocks, without compression. If
 compression is on, the 'small file' size limit is even higher for
 moderately compressible files.

If the actual size of the file is  3916 bytes, no matter how compressible it 
is,
it won't be inlined.

 An example where this could apply is a
 maildir, and IIRC this was why reiserfs was chosen on mail servers.
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 1/1] Btrfs: Check INCOMPAT flags on remount and add helper function

2012-07-22 Thread Li Zefan
 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h

 index a0ee2f8..3a1a700 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -3103,6 +3103,19 @@ void __btrfs_abort_transaction(struct 
 btrfs_trans_handle *trans,
  struct btrfs_root *root, const char *function,
  unsigned int line, int errno);
  
 +static inline void btrfs_chk_lzo_incompat(struct btrfs_root *root)


Isn't btrfs_set_lzo_incompat() is a better name?

 +{
 + struct btrfs_super_block *disk_super;
 + u64 features;
 +
 + disk_super = root-fs_info-super_copy;
 + features = btrfs_super_incompat_flags(disk_super);
 + if (!(features  BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)) {
 + features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
 + btrfs_set_super_incompat_flags(disk_super, features);
 + }
 +}
 +


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: kill free_space pointer from inode structure

2012-07-11 Thread Li Zefan
On 2012/7/11 3:29, Josef Bacik wrote:

 On Mon, Jul 09, 2012 at 08:21:07PM -0600, Li Zefan wrote:
 Inodes always allocate free space with BTRFS_BLOCK_GROUP_DATA type,
 which means every inode has the same BTRFS_I(inode)-free_space pointer.

 This shrinks struct btrfs_inode by 4 bytes (or 8 bytes on 64 bits).

 Signed-off-by: Li Zefan lize...@huawei.com
 
 Li I can't apply any of your patches because they are all in base64 format and
 I'm having a hell of a time pulling them out to apply them, can you resend 
 with
 git send-email or something so I can apply them properly?  Thanks,
 


Hmm.. I got no complaints from Tejun or Chris before, so I didn't realize all
the emails I sent were in base64. It should be the email server that encoded
my patches, so I don't think using git-send-email will make any difference.
(not to mention I failed to make git-send-email work in my office :(

Is it ok if I attach the patches in attachments? Otherwise I'll use gmail
instead when I'm at home.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: zero unused bytes in inode item

2012-07-10 Thread Li Zefan
The otime field is not zeroed, so users will see random otime in an old
filesystem with a new kernel which has otime support in the future.

The reserved bytes are also not zeroed, and we'll have compatibility
issue if we make use of those bytes.

Signed-off-by: Li Zefan lize...@huawei.com
---
 fs/btrfs/delayed-inode.c |1 +
 fs/btrfs/inode.c |2 ++
 2 files changed, 3 insertions(+)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 21d91a8..335605c 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -62,6 +62,7 @@ static inline void btrfs_init_delayed_node(
INIT_LIST_HEAD(delayed_node-n_list);
INIT_LIST_HEAD(delayed_node-p_list);
delayed_node-bytes_reserved = 0;
+   memset(delayed_node-inode_item, 0, sizeof(delayed_node-inode_item));
 }
 
 static inline int btrfs_is_continuous_delayed_item(
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b189dd8..3a612f8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4688,6 +4688,8 @@ static struct inode *btrfs_new_inode(struct 
btrfs_trans_handle *trans,
inode-i_mtime = inode-i_atime = inode-i_ctime = CURRENT_TIME;
inode_item = btrfs_item_ptr(path-nodes[0], path-slots[0],
  struct btrfs_inode_item);
+   memset_extent_buffer(path-nodes[0], 0, (unsigned long)inode_item,
+sizeof(*inode_item));
fill_inode_item(trans, path-nodes[0], inode_item, inode);
 
ref = btrfs_item_ptr(path-nodes[0], path-slots[0] + 1,
-- 
1.7.9.7
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: kill free_space pointer from inode structure

2012-07-09 Thread Li Zefan
Inodes always allocate free space with BTRFS_BLOCK_GROUP_DATA type,
which means every inode has the same BTRFS_I(inode)-free_space pointer.

This shrinks struct btrfs_inode by 4 bytes (or 8 bytes on 64 bits).

Signed-off-by: Li Zefan lize...@huawei.com
---
 fs/btrfs/btrfs_inode.h |3 ---
 fs/btrfs/ctree.h   |3 ++-
 fs/btrfs/extent-tree.c |   20 
 fs/btrfs/inode.c   |3 ---
 4 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 12394a9..0b18b74 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -87,9 +87,6 @@ struct btrfs_inode {
/* node for the red-black tree that links inodes in subvolume root */
struct rb_node rb_node;
 
-   /* the space_info for where this inode's data allocations are done */
-   struct btrfs_space_info *space_info;
-
unsigned long runtime_flags;
 
/* full 64 bit generation number, struct vfs_inode doesn't have a big
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fa5c45b..6761490 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1240,6 +1240,8 @@ struct btrfs_fs_info {
 */
struct list_head space_info;
 
+   struct btrfs_space_info *data_sinfo;
+
struct reloc_control *reloc_ctl;
 
spinlock_t delalloc_lock;
@@ -2607,7 +2609,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle 
*trans,
 struct btrfs_root *root, u64 group_start);
 u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags);
 u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data);
-void btrfs_set_inode_space_info(struct btrfs_root *root, struct inode *ionde);
 void btrfs_clear_space_info_full(struct btrfs_fs_info *info);
 int btrfs_check_data_free_space(struct inode *inode, u64 bytes);
 void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index bbab3ff..b1336f5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3133,6 +3133,8 @@ static int update_space_info(struct btrfs_fs_info *info, 
u64 flags,
init_waitqueue_head(found-wait);
*space_info = found;
list_add_rcu(found-list, info-space_info);
+   if (flags  BTRFS_BLOCK_GROUP_DATA)
+   info-data_sinfo = found;
return 0;
 }
 
@@ -3262,12 +3264,6 @@ u64 btrfs_get_alloc_profile(struct btrfs_root *root, int 
data)
return get_alloc_profile(root, flags);
 }
 
-void btrfs_set_inode_space_info(struct btrfs_root *root, struct inode *inode)
-{
-   BTRFS_I(inode)-space_info = __find_space_info(root-fs_info,
-  BTRFS_BLOCK_GROUP_DATA);
-}
-
 /*
  * This will check the space that the inode allocates from to make sure we have
  * enough space for bytes.
@@ -3276,6 +3272,7 @@ int btrfs_check_data_free_space(struct inode *inode, u64 
bytes)
 {
struct btrfs_space_info *data_sinfo;
struct btrfs_root *root = BTRFS_I(inode)-root;
+   struct btrfs_fs_info *fs_info = root-fs_info;
u64 used;
int ret = 0, committed = 0, alloc_chunk = 1;
 
@@ -3288,7 +3285,7 @@ int btrfs_check_data_free_space(struct inode *inode, u64 
bytes)
committed = 1;
}
 
-   data_sinfo = BTRFS_I(inode)-space_info;
+   data_sinfo = fs_info-data_sinfo;
if (!data_sinfo)
goto alloc;
 
@@ -3329,10 +3326,9 @@ alloc:
goto commit_trans;
}
 
-   if (!data_sinfo) {
-   btrfs_set_inode_space_info(root, inode);
-   data_sinfo = BTRFS_I(inode)-space_info;
-   }
+   if (!data_sinfo)
+   data_sinfo = fs_info-data_sinfo;
+
goto again;
}
 
@@ -3379,7 +3375,7 @@ void btrfs_free_reserved_data_space(struct inode *inode, 
u64 bytes)
/* make sure bytes are sectorsize aligned */
bytes = (bytes + root-sectorsize - 1)  ~((u64)root-sectorsize - 1);
 
-   data_sinfo = BTRFS_I(inode)-space_info;
+   data_sinfo = root-fs_info-data_sinfo;
spin_lock(data_sinfo-lock);
data_sinfo-bytes_may_use -= bytes;
trace_btrfs_space_reservation(root-fs_info, space_info,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0d507e6..b189dd8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4077,7 +4077,6 @@ static int btrfs_init_locked_inode(struct inode *inode, 
void *p)
struct btrfs_iget_args *args = p;
inode-i_ino = args-ino;
BTRFS_I(inode)-root = args-root;
-   btrfs_set_inode_space_info(args-root, inode);
return 0;
 }
 
@@ -4662,7 +4661,6 @@ static struct inode *btrfs_new_inode(struct 
btrfs_trans_handle *trans,
BTRFS_I(inode)-root = root;
BTRFS_I(inode)-generation

[PATCH] Btrfs: rewrite BTRFS_SETGET_FUNCS

2012-07-09 Thread Li Zefan
BTRFS_SETGET_FUNCS macro is used to generate btrfs_set_foo() and
btrfs_foo() functions, which read and write specific fields in the
extent buffer.

The total number of set/get functions is ~200, but in fact we only
need 8 functions: 2 for u8 field, 2 for u16, 2 for u32 and 2 for u64.

It results in redunction of ~37K bytes.

   textdata bss dec hex filename
 629661   12489 216  642366   9cd3e fs/btrfs/btrfs.o.orig
 592637   12489 216  605342   93c9e fs/btrfs/btrfs.o

Signed-off-by: Li Zefan lize...@huawei.com
---
 fs/btrfs/ctree.h|   53 +++--
 fs/btrfs/struct-funcs.c |  196 ---
 2 files changed, 146 insertions(+), 103 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6761490..e262362 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1623,13 +1623,54 @@ static inline void btrfs_init_map_token (struct 
btrfs_map_token *token)
offsetof(type, member), \
   sizeof(((type *)0)-member)))
 
-#ifndef BTRFS_SETGET_FUNCS
+#define DECLARE_BTRFS_SETGET_BITS(bits)
\
+u##bits btrfs_get_token_##bits(struct extent_buffer *eb, void *ptr,\
+  unsigned long off,   \
+  struct btrfs_map_token *token);  \
+void btrfs_set_token_##bits(struct extent_buffer *eb, void *ptr,   \
+   unsigned long off, u##bits val, \
+   struct btrfs_map_token *token); \
+static inline u##bits btrfs_get_##bits(struct extent_buffer *eb, void *ptr, \
+  unsigned long off)   \
+{  \
+   return btrfs_get_token_##bits(eb, ptr, off, NULL);  \
+}  \
+static inline void btrfs_set_##bits(struct extent_buffer *eb, void *ptr, \
+   unsigned long off, u##bits val) \
+{  \
+   btrfs_set_token_##bits(eb, ptr, off, val, NULL);\
+}
+
+DECLARE_BTRFS_SETGET_BITS(8)
+DECLARE_BTRFS_SETGET_BITS(16)
+DECLARE_BTRFS_SETGET_BITS(32)
+DECLARE_BTRFS_SETGET_BITS(64)
+
 #define BTRFS_SETGET_FUNCS(name, type, member, bits)   \
-u##bits btrfs_##name(struct extent_buffer *eb, type *s);   \
-u##bits btrfs_token_##name(struct extent_buffer *eb, type *s, struct 
btrfs_map_token *token);  \
-void btrfs_set_token_##name(struct extent_buffer *eb, type *s, u##bits val, 
struct btrfs_map_token *token);\
-void btrfs_set_##name(struct extent_buffer *eb, type *s, u##bits val);
-#endif
+static inline u##bits btrfs_##name(struct extent_buffer *eb, type *s)  \
+{  \
+   BUILD_BUG_ON(sizeof(u##bits) != sizeof(((type *)0))-member);   \
+   return btrfs_get_##bits(eb, s, offsetof(type, member)); \
+}  \
+static inline void btrfs_set_##name(struct extent_buffer *eb, type *s, \
+   u##bits val)\
+{  \
+   BUILD_BUG_ON(sizeof(u##bits) != sizeof(((type *)0))-member);   \
+   btrfs_set_##bits(eb, s, offsetof(type, member), val);   \
+}  \
+static inline u##bits btrfs_token_##name(struct extent_buffer *eb, type *s, \
+struct btrfs_map_token *token) \
+{  \
+   BUILD_BUG_ON(sizeof(u##bits) != sizeof(((type *)0))-member);   \
+   return btrfs_get_token_##bits(eb, s, offsetof(type, member), token); \
+}  \
+static inline void btrfs_set_token_##name(struct extent_buffer *eb,\
+ type *s, u##bits val, \
+ struct btrfs_map_token *token) \
+{  \
+   BUILD_BUG_ON(sizeof(u##bits) != sizeof(((type *)0))-member);   \
+   btrfs_set_token_##bits(eb, s, offsetof(type, member), val, token); \
+}
 
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits)\
 static inline u##bits btrfs_##name(struct extent_buffer *eb)   \
diff --git a/fs/btrfs/struct-funcs.c b/fs/btrfs/struct-funcs.c
index c6ffa58..1f4ede9 100644
--- a/fs/btrfs/struct-funcs.c
+++ b/fs/btrfs/struct-funcs.c
@@ -17,15 +17,27 @@
  */
 
 #include linux/highmem.h
+#include asm/unaligned.h
 
-/* this is some

Re: [PATCH] Btrfs: Add code to support file creation time.

2012-07-09 Thread Li Zefan
On 2012/7/5 9:52, Alexander Block wrote:

 On Thu, Jul 5, 2012 at 3:07 AM, Li Zefan lize...@huawei.com wrote:
 On 2012/7/4 19:04, Alexander Block wrote:

 On Wed, Jul 4, 2012 at 9:56 AM, Li Zefan lize...@huawei.com wrote:
 On 2012/7/4 15:18, chandan r wrote:

 This patch adds a new member to the 'struct btrfs_inode' structure to hold
 the file creation time.



 Well, how do users use this file creation time? There's no syscall and 
 there's
 no ioctl that exports this information. That xstat syscall hasn't been 
 accepted,
 so you can revise and repost the patch when you see it happens.
 In my opinion we should still include this patch. Currently, otime is never 
 even
 initialized, having undefined values. If it ever gets possible to
 access otime, we
 would at least have some inodes with valid otime fields.


 otime (on disk) is initialized to 0, not some undefined value. But yeah, 
 your point makes
 some sense, that with this patch we can access valid otime in an old 
 filesystem once we
 update to a new kernel which has otime support.
 This is true for the inode items found in the root tree. But the inode
 items found in the filesystem trees are not initialized at all. I did
 a fast check by adding printing of the otime field in
 btrfs-debug-tree...and every inode's otime looks random.
 btrfs_new_inode uses btrfs_insert_empty_items which does not zero the
 new item, then fill_inode_item is used to initialize the fields and
 there otime is missing.
 


That's bad..
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: Add code to support file creation time.

2012-07-04 Thread Li Zefan
On 2012/7/4 19:04, Alexander Block wrote:

 On Wed, Jul 4, 2012 at 9:56 AM, Li Zefan lize...@huawei.com wrote:
 On 2012/7/4 15:18, chandan r wrote:

 This patch adds a new member to the 'struct btrfs_inode' structure to hold
 the file creation time.



 Well, how do users use this file creation time? There's no syscall and 
 there's
 no ioctl that exports this information. That xstat syscall hasn't been 
 accepted,
 so you can revise and repost the patch when you see it happens.
 In my opinion we should still include this patch. Currently, otime is never 
 even
 initialized, having undefined values. If it ever gets possible to
 access otime, we
 would at least have some inodes with valid otime fields.


otime (on disk) is initialized to 0, not some undefined value. But yeah, your 
point makes
some sense, that with this patch we can access valid otime in an old filesystem 
once we
update to a new kernel which has otime support.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: Add code to support file creation time.

2012-07-04 Thread Li Zefan
On 2012/7/4 15:18, chandan r wrote:

 This patch adds a new member to the 'struct btrfs_inode' structure to hold
 the file creation time.
 


Well, how do users use this file creation time? There's no syscall and there's
no ioctl that exports this information. That xstat syscall hasn't been accepted,
so you can revise and repost the patch when you see it happens.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix defrag regression

2012-06-11 Thread Li Zefan
If a file has 3 small extents:

| ext1 | ext2 | ext3 |

Running btrfs fi defrag will only defrag the last two extents, if those
extent mappings hasn't been read into memory from disk.

This bug was introduced by commit 17ce6ef8d731af5edac8c39e806db4c7e1f6956f
(Btrfs: add a check to decide if we should defrag the range)

The cause is, that commit looked into previous and next extents using
lookup_extent_mapping() only.

While at it, remove the code that checks the previous extent, since
it's sufficient to check the next extent.

Signed-off-by: Li Zefan lize...@huawei.com
---
 fs/btrfs/ioctl.c |   97 +++---
 1 file changed, 49 insertions(+), 48 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 24b776c..ac910f2 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -785,39 +785,57 @@ none:
return -ENOENT;
 }
 
-/*
- * Validaty check of prev em and next em:
- * 1) no prev/next em
- * 2) prev/next em is an hole/inline extent
- */
-static int check_adjacent_extents(struct inode *inode, struct extent_map *em)
+static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start)
 {
struct extent_map_tree *em_tree = BTRFS_I(inode)-extent_tree;
-   struct extent_map *prev = NULL, *next = NULL;
-   int ret = 0;
+   struct extent_io_tree *io_tree = BTRFS_I(inode)-io_tree;
+   struct extent_map *em;
+   u64 len = PAGE_CACHE_SIZE;
 
+   /*
+* hopefully we have this extent in the tree already, try without
+* the full extent lock
+*/
read_lock(em_tree-lock);
-   prev = lookup_extent_mapping(em_tree, em-start - 1, (u64)-1);
-   next = lookup_extent_mapping(em_tree, em-start + em-len, (u64)-1);
+   em = lookup_extent_mapping(em_tree, start, len);
read_unlock(em_tree-lock);
 
-   if ((!prev || prev-block_start = EXTENT_MAP_LAST_BYTE) 
-   (!next || next-block_start = EXTENT_MAP_LAST_BYTE))
-   ret = 1;
-   free_extent_map(prev);
-   free_extent_map(next);
+   if (!em) {
+   /* get the big lock and read metadata off disk */
+   lock_extent(io_tree, start, start + len - 1);
+   em = btrfs_get_extent(inode, NULL, 0, start, len, 0);
+   unlock_extent(io_tree, start, start + len - 1);
 
+   if (IS_ERR(em))
+   return NULL;
+   }
+
+   return em;
+}
+
+static bool defrag_check_next_extent(struct inode *inode, struct extent_map 
*em)
+{
+   struct extent_map *next;
+   bool ret = true;
+
+   /* this is the last extent */
+   if (em-start + em-len = i_size_read(inode))
+   return false;
+
+   next = defrag_lookup_extent(inode, em-start + em-len);
+   if (!next || next-block_start = EXTENT_MAP_LAST_BYTE)
+   ret = false;
+
+   free_extent_map(next);
return ret;
 }
 
-static int should_defrag_range(struct inode *inode, u64 start, u64 len,
-  int thresh, u64 *last_len, u64 *skip,
-  u64 *defrag_end)
+static int should_defrag_range(struct inode *inode, u64 start, int thresh,
+  u64 *last_len, u64 *skip, u64 *defrag_end)
 {
-   struct extent_io_tree *io_tree = BTRFS_I(inode)-io_tree;
-   struct extent_map *em = NULL;
-   struct extent_map_tree *em_tree = BTRFS_I(inode)-extent_tree;
+   struct extent_map *em;
int ret = 1;
+   bool next_mergeable = true;
 
/*
 * make sure that once we start defragging an extent, we keep on
@@ -828,23 +846,9 @@ static int should_defrag_range(struct inode *inode, u64 
start, u64 len,
 
*skip = 0;
 
-   /*
-* hopefully we have this extent in the tree already, try without
-* the full extent lock
-*/
-   read_lock(em_tree-lock);
-   em = lookup_extent_mapping(em_tree, start, len);
-   read_unlock(em_tree-lock);
-
-   if (!em) {
-   /* get the big lock and read metadata off disk */
-   lock_extent(io_tree, start, start + len - 1);
-   em = btrfs_get_extent(inode, NULL, 0, start, len, 0);
-   unlock_extent(io_tree, start, start + len - 1);
-
-   if (IS_ERR(em))
-   return 0;
-   }
+   em = defrag_lookup_extent(inode, start);
+   if (!em)
+   return 0;
 
/* this will cover holes, and inline extents */
if (em-block_start = EXTENT_MAP_LAST_BYTE) {
@@ -852,18 +856,15 @@ static int should_defrag_range(struct inode *inode, u64 
start, u64 len,
goto out;
}
 
-   /* If we have nothing to merge with us, just skip. */
-   if (check_adjacent_extents(inode, em)) {
-   ret = 0;
-   goto out;
-   }
+   next_mergeable = defrag_check_next_extent(inode, em);
 
/*
-* we hit a real extent, if it is big don't

[PATCH] Btrfs: fix incompat flags setting

2012-06-11 Thread Li Zefan
It's a bug, but it happens to work, as BTRFS_COMPRESS_LZO == 2, which
has only one bit set.

Signed-off-by: Li Zefan lize...@huawei.com
---
 fs/btrfs/disk-io.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7ae51de..fe82070 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2118,7 +2118,7 @@ int open_ctree(struct super_block *sb,
 
features = btrfs_super_incompat_flags(disk_super);
features |= BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF;
-   if (tree_root-fs_info-compress_type  BTRFS_COMPRESS_LZO)
+   if (tree_root-fs_info-compress_type == BTRFS_COMPRESS_LZO)
features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
 
/*
-- 
1.7.9.7
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Make existing snapshots read-only?

2012-05-28 Thread Li Zefan
On 2012/5/29 2:37, Bruce Guenter wrote:

 
 Is there any way to mark existing snapshots as read-only? Making new
 ones read-only is easy enough, but what about existing ones?
 


We have code in the kernel side, so what we need to do is to update btrfs-progs,
which is trivial.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Btrfs: make clear_extent_bit() always return 0 on success

2012-04-05 Thread Li Zefan
(Note: I've changed my email address ;)

David Sterba wrote:

 On Mon, Mar 12, 2012 at 04:39:28PM +0800, Li Zefan wrote:
 Currently it returns a set of bits that were cleared, but this return
 value is not used at all.

 Moreover it doesn't seem to be useful, because we may clear the bits
 of a few extent_states, but only the cleared bits of last one is
 returned.

 Signed-off-by: Li Zefan l...@cn.fujitsu.com
 ---
  fs/btrfs/extent_io.c |   19 +++
  1 files changed, 7 insertions(+), 12 deletions(-)

 diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
 index a55fbe6..c968c95 100644
 --- a/fs/btrfs/extent_io.c
 +++ b/fs/btrfs/extent_io.c
 @@ -394,18 +394,16 @@ static int split_state(struct extent_io_tree *tree, 
 struct extent_state *orig,
  
  /*
   * utility function to clear some bits in an extent state struct.
 - * it will optionally wake up any one waiting on this state (wake == 1), or
 - * forcibly remove the state from the tree (delete == 1).
 + * it will optionally wake up any one waiting on this state (wake == 1)
   *
   * If no bits are set on the state struct after clearing things, the
   * struct is freed and removed from the tree
   */
 -static int clear_state_bit(struct extent_io_tree *tree,
 +static void clear_state_bit(struct extent_io_tree *tree,
  struct extent_state *state,
  int *bits, int wake)
  {
  int bits_to_clear = *bits  ~EXTENT_CTLBITS;
 -int ret = state-state  bits_to_clear;
  
  if ((bits_to_clear  EXTENT_DIRTY)  (state-state  EXTENT_DIRTY)) {
  u64 range = state-end - state-start + 1;
 @@ -427,7 +425,6 @@ static int clear_state_bit(struct extent_io_tree *tree,
  } else {
  merge_state(tree, state);
  }
 -return ret;
  }
  
  static struct extent_state *
 
 The above part of the patch still applies and with only subject change
 to something like
 
   Btrfs: retrurn void from clear_state_bit
 
 is a rc2 material. So, Li, if you're ok with this change I'm adding it
 (with the 2/2 patch) to my local queue of rc patches for Chris.
 


Thanks for doing this!

--
Li Zefan

 
 david
 
 (the rest of the patch was done within the error handling series)
 
 @@ -449,8 +446,7 @@ alloc_extent_state_atomic(struct extent_state *prealloc)
   *
   * the range [start, end] is inclusive.
   *
 - * This takes the tree lock, and returns  0 on error,  0 if any of the
 - * bits were already set, or zero if none of the bits were already set.
 + * This takes the tree lock, and returns  0 on error.
   */
  int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
   int bits, int wake, int delete,
 @@ -464,7 +460,6 @@ int clear_extent_bit(struct extent_io_tree *tree, u64 
 start, u64 end,
  struct rb_node *node;
  u64 last_end;
  int err;
 -int set = 0;
  int clear = 0;
  
  if (delete)
 @@ -547,7 +542,7 @@ hit_next:
  if (err)
  goto out;
  if (state-end = end) {
 -set |= clear_state_bit(tree, state, bits, wake);
 +clear_state_bit(tree, state, bits, wake);
  if (last_end == (u64)-1)
  goto out;
  start = last_end + 1;
 @@ -568,13 +563,13 @@ hit_next:
  if (wake)
  wake_up(state-wq);
  
 -set |= clear_state_bit(tree, prealloc, bits, wake);
 +clear_state_bit(tree, prealloc, bits, wake);
  
  prealloc = NULL;
  goto out;
  }
  
 -set |= clear_state_bit(tree, state, bits, wake);
 +clear_state_bit(tree, state, bits, wake);
  next:
  if (last_end == (u64)-1)
  goto out;
 @@ -591,7 +586,7 @@ out:
  if (prealloc)
  free_extent_state(prealloc);
  
 -return set;
 +return 0;
  
  search_again:
  if (start  end)
 -- 1.7.3.1 


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Btrfs: make clear_extent_bit() always return 0 on success

2012-03-12 Thread Li Zefan
Currently it returns a set of bits that were cleared, but this return
value is not used at all.

Moreover it doesn't seem to be useful, because we may clear the bits
of a few extent_states, but only the cleared bits of last one is
returned.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/extent_io.c |   19 +++
 1 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a55fbe6..c968c95 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -394,18 +394,16 @@ static int split_state(struct extent_io_tree *tree, 
struct extent_state *orig,
 
 /*
  * utility function to clear some bits in an extent state struct.
- * it will optionally wake up any one waiting on this state (wake == 1), or
- * forcibly remove the state from the tree (delete == 1).
+ * it will optionally wake up any one waiting on this state (wake == 1)
  *
  * If no bits are set on the state struct after clearing things, the
  * struct is freed and removed from the tree
  */
-static int clear_state_bit(struct extent_io_tree *tree,
+static void clear_state_bit(struct extent_io_tree *tree,
struct extent_state *state,
int *bits, int wake)
 {
int bits_to_clear = *bits  ~EXTENT_CTLBITS;
-   int ret = state-state  bits_to_clear;
 
if ((bits_to_clear  EXTENT_DIRTY)  (state-state  EXTENT_DIRTY)) {
u64 range = state-end - state-start + 1;
@@ -427,7 +425,6 @@ static int clear_state_bit(struct extent_io_tree *tree,
} else {
merge_state(tree, state);
}
-   return ret;
 }
 
 static struct extent_state *
@@ -449,8 +446,7 @@ alloc_extent_state_atomic(struct extent_state *prealloc)
  *
  * the range [start, end] is inclusive.
  *
- * This takes the tree lock, and returns  0 on error,  0 if any of the
- * bits were already set, or zero if none of the bits were already set.
+ * This takes the tree lock, and returns  0 on error.
  */
 int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 int bits, int wake, int delete,
@@ -464,7 +460,6 @@ int clear_extent_bit(struct extent_io_tree *tree, u64 
start, u64 end,
struct rb_node *node;
u64 last_end;
int err;
-   int set = 0;
int clear = 0;
 
if (delete)
@@ -547,7 +542,7 @@ hit_next:
if (err)
goto out;
if (state-end = end) {
-   set |= clear_state_bit(tree, state, bits, wake);
+   clear_state_bit(tree, state, bits, wake);
if (last_end == (u64)-1)
goto out;
start = last_end + 1;
@@ -568,13 +563,13 @@ hit_next:
if (wake)
wake_up(state-wq);
 
-   set |= clear_state_bit(tree, prealloc, bits, wake);
+   clear_state_bit(tree, prealloc, bits, wake);
 
prealloc = NULL;
goto out;
}
 
-   set |= clear_state_bit(tree, state, bits, wake);
+   clear_state_bit(tree, state, bits, wake);
 next:
if (last_end == (u64)-1)
goto out;
@@ -591,7 +586,7 @@ out:
if (prealloc)
free_extent_state(prealloc);
 
-   return set;
+   return 0;
 
 search_again:
if (start  end)
-- 1.7.3.1 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2][RESEND] Btrfs: avoid possible use-after-free in clear_extent_bit()

2012-03-12 Thread Li Zefan
clear_extent_bit()
{
next_node = rb_next(state-rb_node);
...
clear_state_bit(state);  -- this may free next_node
if (next_node) {
state = rb_entry(next_node);
...
}
}

clear_state_bit() calls merge_state() which may free the next node
of the passing extent_state, so clear_extent_bit() may end up
referencing freed memory.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---

no more Chinese characters in this section. ;)

---
 fs/btrfs/extent_io.c |   36 +---
 1 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c968c95..20f2f5a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -392,6 +392,15 @@ static int split_state(struct extent_io_tree *tree, struct 
extent_state *orig,
return 0;
 }
 
+static struct extent_state *next_state(struct extent_state *state)
+{
+   struct rb_node *next = rb_next(state-rb_node);
+   if (next)
+   return rb_entry(next, struct extent_state, rb_node);
+   else
+   return NULL;
+}
+
 /*
  * utility function to clear some bits in an extent state struct.
  * it will optionally wake up any one waiting on this state (wake == 1)
@@ -399,10 +408,11 @@ static int split_state(struct extent_io_tree *tree, 
struct extent_state *orig,
  * If no bits are set on the state struct after clearing things, the
  * struct is freed and removed from the tree
  */
-static void clear_state_bit(struct extent_io_tree *tree,
-   struct extent_state *state,
-   int *bits, int wake)
+static struct extent_state *clear_state_bit(struct extent_io_tree *tree,
+   struct extent_state *state,
+   int *bits, int wake)
 {
+   struct extent_state *next;
int bits_to_clear = *bits  ~EXTENT_CTLBITS;
 
if ((bits_to_clear  EXTENT_DIRTY)  (state-state  EXTENT_DIRTY)) {
@@ -415,6 +425,7 @@ static void clear_state_bit(struct extent_io_tree *tree,
if (wake)
wake_up(state-wq);
if (state-state == 0) {
+   next = next_state(state);
if (state-tree) {
rb_erase(state-rb_node, tree-state);
state-tree = NULL;
@@ -424,7 +435,9 @@ static void clear_state_bit(struct extent_io_tree *tree,
}
} else {
merge_state(tree, state);
+   next = next_state(state);
}
+   return next;
 }
 
 static struct extent_state *
@@ -456,7 +469,6 @@ int clear_extent_bit(struct extent_io_tree *tree, u64 
start, u64 end,
struct extent_state *state;
struct extent_state *cached;
struct extent_state *prealloc = NULL;
-   struct rb_node *next_node;
struct rb_node *node;
u64 last_end;
int err;
@@ -508,14 +520,11 @@ hit_next:
WARN_ON(state-end  start);
last_end = state-end;
 
-   if (state-end  end  !need_resched())
-   next_node = rb_next(state-rb_node);
-   else
-   next_node = NULL;
-
/* the state doesn't have the wanted bits, go ahead */
-   if (!(state-state  bits))
+   if (!(state-state  bits)) {
+   state = next_state(state);
goto next;
+   }
 
/*
 * |  desired range  |
@@ -569,16 +578,13 @@ hit_next:
goto out;
}
 
-   clear_state_bit(tree, state, bits, wake);
+   state = clear_state_bit(tree, state, bits, wake);
 next:
if (last_end == (u64)-1)
goto out;
start = last_end + 1;
-   if (start = end  next_node) {
-   state = rb_entry(next_node, struct extent_state,
-rb_node);
+   if (start = end  state  !need_resched())
goto hit_next;
-   }
goto search_again;
 
 out:
-- 1.7.3.1 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: fix locking issues in find_parent_nodes()

2012-02-29 Thread Li Zefan
- We might unlock head-mutex while it was not locked
- We might leave the function without unlocking delayed_refs-lock

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/backref.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 98f6bf10..0436c12 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -583,7 +583,7 @@ static int find_parent_nodes(struct btrfs_trans_handle 
*trans,
struct btrfs_path *path;
struct btrfs_key info_key = { 0 };
struct btrfs_delayed_ref_root *delayed_refs = NULL;
-   struct btrfs_delayed_ref_head *head = NULL;
+   struct btrfs_delayed_ref_head *head;
int info_level = 0;
int ret;
struct list_head prefs_delayed;
@@ -607,6 +607,8 @@ static int find_parent_nodes(struct btrfs_trans_handle 
*trans,
 * at a specified point in time
 */
 again:
+   head = NULL;
+
ret = btrfs_search_slot(trans, fs_info-extent_root, key, path, 0, 0);
if (ret  0)
goto out;
@@ -635,8 +637,10 @@ again:
goto again;
}
ret = __add_delayed_refs(head, seq, info_key, prefs_delayed);
-   if (ret)
+   if (ret) {
+   spin_unlock(delayed_refs-lock);
goto out;
+   }
}
spin_unlock(delayed_refs-lock);
 
-- 
1.7.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: avoid setting -d_op twice

2012-02-21 Thread Li Zefan
Follow those instructions, and you'll trigger a warning in the
beginning of d_set_d_op():

  # mkfs.btrfs /dev/loop3
  # mount /dev/loop3 /mnt
  # btrfs sub create /mnt/sub
  # btrfs sub snap /mnt /mnt/snap
  # touch /mnt/snap/sub
  touch: cannot touch `tmp': Permission denied

__d_alloc() set d_op to sb-s_d_op (btrfs_dentry_operations), and
then simple_lookup() reset it to simple_dentry_operations, which
triggered the warning.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/inode.c |   14 +-
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 32214fe..2a623cc 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3908,7 +3908,7 @@ static struct inode *new_simple_dir(struct super_block *s,
BTRFS_I(inode)-dummy_inode = 1;
 
inode-i_ino = BTRFS_EMPTY_SUBVOL_DIR_OBJECTID;
-   inode-i_op = simple_dir_inode_operations;
+   inode-i_op = btrfs_dir_ro_inode_operations;
inode-i_fop = simple_dir_operations;
inode-i_mode = S_IFDIR | S_IRUGO | S_IWUSR | S_IXUGO;
inode-i_mtime = inode-i_atime = inode-i_ctime = CURRENT_TIME;
@@ -3979,14 +3979,18 @@ struct inode *btrfs_lookup_dentry(struct inode *dir, 
struct dentry *dentry)
 static int btrfs_dentry_delete(const struct dentry *dentry)
 {
struct btrfs_root *root;
+   struct inode *inode = dentry-d_inode;
 
-   if (!dentry-d_inode  !IS_ROOT(dentry))
-   dentry = dentry-d_parent;
+   if (!inode  !IS_ROOT(dentry))
+   inode = dentry-d_parent-d_inode;
 
-   if (dentry-d_inode) {
-   root = BTRFS_I(dentry-d_inode)-root;
+   if (inode) {
+   root = BTRFS_I(inode)-root;
if (btrfs_root_refs(root-root_item) == 0)
return 1;
+
+   if (btrfs_ino(inode) == BTRFS_EMPTY_SUBVOL_DIR_OBJECTID)
+   return 1;
}
return 0;
 }
-- 
1.7.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: avoid setting -d_op twice

2012-02-21 Thread Li Zefan
David Sterba wrote:
 Hi,
 
 thanks for the patch!
 
 On Tue, Feb 21, 2012 at 05:04:28PM +0800, Li Zefan wrote:
 Follow those instructions, and you'll trigger a warning in the
 beginning of d_set_d_op():

   # mkfs.btrfs /dev/loop3
   # mount /dev/loop3 /mnt
   # btrfs sub create /mnt/sub
   # btrfs sub snap /mnt /mnt/snap
   # touch /mnt/snap/sub
   touch: cannot touch `tmp': Permission denied
 
 is this the right error code? permission denied == EACCESS and this is
 returned in case of file
 
 (strace touch file)
 
 open(file, O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = -1 EACCES 
 (Permission denied)
 
 while mkdir returns (strace mkdir a-dir)
 
 mkdir(a-dir, 0777)= -1 EPERM (Operation not permitted)
 
 
 The commands were run as root and EPERM looks more adequate than
 EACCESS.
 

This is a different issue, and EPERM is returned by VFS not btrfs.

Feel free to submit a patch to fsdevel for discussion.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Problems Implementing Snappy Patches

2012-02-15 Thread Li Zefan
Mitch Harder wrote:
 I've been trying to test the snappy compression patches, but I'm
 getting corruptions when trying to use snappy as built on my system.
 
 I'm checking out the Linux 3.2.6 kernel, merging that with the latest
 'for-linus' branch on Chris Mason's kernel.org repo, and then
 integrating the snappy and lz4 patches from David Sterba's git
 repository (dev/compression-squad branch).
 
 I've tried a simple merge of the dev/compression-squad branch (which
 merged without complaint by git), and I also tried a second build of
 the kernel integrating the dev/compression-squad branch patches with
 format-patch (git format-patch -k -m -U5 --stdout range | git am -3
 -k).
 
 I've also tried sourcing my snappy patches from Chris Mason's snappy
 branch on kernel.org with the same result.
 

A month ago I saved emails as patches and applied them on linux-btrfs'
for-linus branch, and I got the same problem. Then I used the snappy
branch (plus my fix), and it was fine. Don't know why.

 I get the same results either way.
 
 My system is a Core 2 Duo x86_64 Sabayon based system (Gentoo is our
 parent distro).  My target btrfs/snappy partition is a 16 GB partition
 I use for testing on a 500GB Western Digital Hard Drive.
 
 When I copy directory containing a kernel sources git repository to a
 freshly formated partition mounted with snappy, I get a corrupted
 copy.  If I mount with lzo or lz4 compression, I don't see any
 corruptions from the copy.
 
 I'm not showing any errors in dmesg.
 

You're not seeing any errors, right?
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 12/12] Btrfs: Fix file clone when source offset is not 0

2012-02-01 Thread Li Zefan
Jan Schmidt wrote:
 On 30.01.2012 07:33, Li Zefan wrote:
 Jan Schmidt wrote:
 I was looking at the clone range ioctl and have some remarks:

 On 27.01.2011 09:46, Li Zefan wrote:
 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index f87552a..1b61dab 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -1788,7 +1788,10 @@ static noinline long btrfs_ioctl_clone(struct file 
 *file, unsigned long srcfd,
  
memcpy(new_key, key, sizeof(new_key));
new_key.objectid = inode-i_ino;
 -  new_key.offset = key.offset + destoff - off;
 +  if (off = key.offset)
 +  new_key.offset = key.offset + destoff - off;
 +  else
 +  new_key.offset = destoff;
  ^^^
 1) This looks spurious to me. What if destoff isn't aligned? That's what
 the key.offset - off code above is for. Before the patch, the code
 didn't work at all, I agree. But this fix can only work for aligned
 requests.

 2) The error in new_key also has propagated to the extent item's backref
 and wasn't fixed there. I did a range clone and ended up with an extent
 item like that:
 item 30 key (1318842368 EXTENT_ITEM 131072) itemoff 1047
 itemsize 169
 extent refs 8 gen 11103 flags 1
 [...]
 extent data backref root 257 objectid 272 offset
 18446744073709494272 count 1

 The last offset (equal to -14 * 4k) is obviously wrong. I didn't figure
 out how the variables are computed, but it looks like there's something
 wrong with the datao u64 to me.


 Unfortunately this is expected. The calculation is:

 extent_item.extent_data_ref.offset = file_pos - file_extent.extent_offset

 so you may get negative offset.
 
 I see where the negative offset comes from. But what can this offset be
 used for?
 
 The design idea was to reduce the number of extent backrefs in that
 a data backref can point to different file extents in the same file
 (in this case the count field  1). We didn't expect nagetive
 offset until range clone was implemented.
 
 Reducing the number of backrefs is a good thing. In case of count  1,
 it's clear that the offset cannot reference all of the extent data
 items. We have different design choices:
 
 a) Use the above computation and leave the filesystem with an unusable
 offset value for extent backrefs.
 
 b) Use either one of the extent data item offsets this backref references.
 
 c) Always use a predefined constant (like 0 or -1) when count  1.
 
 d) Disallow count  1 for those refs and turn them into shared refs as
 soon as count gets  1.
 

I expressed the same doubt. See this thread:

http://marc.info/?t=13142591281r=1w=2
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-raid questions I couldn't find an answer to on the wiki

2012-01-29 Thread Li Zefan
Goffredo Baroncelli wrote:
 On Thursday, 26 January, 2012 16:41:32 Duncan wrote:
 1) My /boot partition and its backup (which I do want to keep separate 
 from root) are only 128 MB each.  The wiki recommends 1 gig sizes 
 minimum, but there's some indication that's dated info due to mixed data/
 metadata mode in recent kernels.

 Is a 128 MB btrfs reasonable?  What's the mixed-mode minumum recommended 
 and what is overhead going to look like?
 
 IIRC, the minimum size should be 256MB. Anyway, if you want/allow a separate 
 partition for  /boot I suggest to use a classic filesystem like ext3.
 

The 256MB limitation has been removed.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Setting options permanently?

2012-01-29 Thread Li Zefan
 I currently don't see how to repair this afterwards without removing the
 uncompressed files and writing new ones, which on the other hand spoils
 hte memory saving effect of using snapshots instead of copies.
 
 A rebalance is the usual way of handling such things, since that rewrites 
 every chunk, taking account of current mount options, etc.  If I've been 
 reading correctly, a defrag should handle it now as well, altho that used 
 to break snapshots/cow/reflink-copies, but I /think/ I read that with 3.2 
 (or was it 3.3-rc1) it doesn't, any more.
 

Rebalance won't do the trick, but defrag can. And defrag still breaks
snapshots, which I'm working on.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 12/12] Btrfs: Fix file clone when source offset is not 0

2012-01-29 Thread Li Zefan
Jan Schmidt wrote:
 I was looking at the clone range ioctl and have some remarks:
 
 On 27.01.2011 09:46, Li Zefan wrote:
 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index f87552a..1b61dab 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -1788,7 +1788,10 @@ static noinline long btrfs_ioctl_clone(struct file 
 *file, unsigned long srcfd,
  
  memcpy(new_key, key, sizeof(new_key));
  new_key.objectid = inode-i_ino;
 -new_key.offset = key.offset + destoff - off;
 +if (off = key.offset)
 +new_key.offset = key.offset + destoff - off;
 +else
 +new_key.offset = destoff;
^^^
 1) This looks spurious to me. What if destoff isn't aligned? That's what
 the key.offset - off code above is for. Before the patch, the code
 didn't work at all, I agree. But this fix can only work for aligned
 requests.
 
 2) The error in new_key also has propagated to the extent item's backref
 and wasn't fixed there. I did a range clone and ended up with an extent
 item like that:
 item 30 key (1318842368 EXTENT_ITEM 131072) itemoff 1047
 itemsize 169
 extent refs 8 gen 11103 flags 1
 [...]
 extent data backref root 257 objectid 272 offset
 18446744073709494272 count 1
 
 The last offset (equal to -14 * 4k) is obviously wrong. I didn't figure
 out how the variables are computed, but it looks like there's something
 wrong with the datao u64 to me.
 

Unfortunately this is expected. The calculation is:

extent_item.extent_data_ref.offset = file_pos - file_extent.extent_offset

so you may get negative offset.

The design idea was to reduce the number of extent backrefs in that
a data backref can point to different file extents in the same file
(in this case the count field  1). We didn't expect nagetive
offset until range clone was implemented.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Setting options permanently?

2012-01-29 Thread Li Zefan
 Is there a way to set options like compression on a btrfs permanently to
 activate them even when mounted automatically by the desktop or manually
 by a third person?

Actually there is.

Btrfs supports per file compression flag, and if this flag is set to a
directory, all files in that directory will inherit the flag.

So you can mount the filesystem for the first time and set the flag for
the root dir, and that's it. There's a flaw that the compression method
is zlib, which should be changed to lzo (or lz4, or snappy..) in the
future.

chattr is used to set file flags, and there was a patchset to enable
compression flag for chattr, but was never merged:

http://www.spinics.net/lists/linux-btrfs/msg09604.html
http://www.spinics.net/lists/linux-btrfs/msg09605.html
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Updated btrfs/crypto snappy interface ready for merging

2012-01-17 Thread Li Zefan
Andi Kleen wrote:
 It's because decompressing inline extents always fails. I've fixed it
 and will send the patch out in a new mail thread.
 
 Thanks for fixing.
 

 But seems there's bug in lib snappy code, which makes the decompressed
 data doesn't quite match the original data.

 Simply copy a file to a btrfs filesystem with snappy enabled, and clear
 page cache, and check the file:
 
 Hmm weird, I have never seen this. Do you have a reproducer?
 

Just use some randomly chosen text files. For example:

# mount -t btrfs -o compress=snappy /dev/xxx /mnt
# cp -r btrfs-progs-unstable/ /mnt
# sync
# echo 3  /proc/sys/vm/drop_caches
# diff -Nurp btrfs-progs-unstable /mnt/btrfs-progs-unstable

I've tested on both x86_32 and x86_64.

 The basic compression code is quite well tested, I have a reasonable
 unit test.
 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix decompressing of snappy-compressed inline extents

2012-01-17 Thread Li Zefan
The first four bytes is the length of all data chunks, and the first
four bytes of each chunk is the length of compressed chunk data,
even when there's only one chunk, which is the case for inline
extents.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/snappy.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/snappy.c b/fs/btrfs/snappy.c
index d6bd607..35b877e 100644
--- a/fs/btrfs/snappy.c
+++ b/fs/btrfs/snappy.c
@@ -392,12 +392,16 @@ static int btrfs_snappy_decompress(struct list_head *ws, 
unsigned char *data_in,
struct workspace *workspace = list_entry(ws, struct workspace, list);
size_t in_len;
size_t out_len;
+   size_t tot_len;
int ret = 0;
char *kaddr;
unsigned long bytes;
 
BUG_ON(srclen  SNAPPY_LEN);
 
+   tot_len = read_compress_length(data_in);
+   data_in += SNAPPY_LEN;
+
in_len = read_compress_length(data_in);
data_in += SNAPPY_LEN;
 
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Updated btrfs/crypto snappy interface ready for merging

2012-01-17 Thread Li Zefan
Andi Kleen wrote:
 It's because decompressing inline extents always fails. I've fixed it
 and will send the patch out in a new mail thread.
 
 Thanks for fixing.
 

 But seems there's bug in lib snappy code, which makes the decompressed
 data doesn't quite match the original data.

 Simply copy a file to a btrfs filesystem with snappy enabled, and clear
 page cache, and check the file:
 
 Hmm weird, I have never seen this. Do you have a reproducer?
 
 The basic compression code is quite well tested, I have a reasonable
 unit test.
 

At first I saved emails and patched them in linux-btrfs git tree, and I
got weired result. Then I pulled the snappy branch directly, and now
nothing is wrong.. Sorry for the noise.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix incompat flags setting

2012-01-16 Thread Li Zefan
The new snappy code reveals this old bug in btrfs, which happened to
work for LZO.

# mkfs.btrfs /dev/loop2
# mount /dev/loop2
# umount /mnt
# btrfsck /dev/loop2
couldn't open because of unsupported option features (10).

The corresponding incompat flag was set, though we didn't mount with
snappy enabled.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---

This patch is for the snappy branch.

---
 fs/btrfs/disk-io.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5924163..27101e4 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2115,9 +2115,9 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 
features = btrfs_super_incompat_flags(disk_super);
features |= BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF;
-   if (tree_root-fs_info-compress_type  BTRFS_COMPRESS_LZO)
+   if (tree_root-fs_info-compress_type == BTRFS_COMPRESS_LZO)
features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
-   if (tree_root-fs_info-compress_type  BTRFS_COMPRESS_SNAPPY)
+   if (tree_root-fs_info-compress_type == BTRFS_COMPRESS_SNAPPY)
features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_SNAPPY;
btrfs_set_super_incompat_flags(disk_super, features);
 
-- 
1.7.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/11] Btrfs: simplfy calculation of stripe length for discard operation

2012-01-10 Thread Li Zefan
For btrfs raid, while discarding a range of space, we'll need to know
the start offset and length to discard for each device, and it's done
in btrfs_map_block().

However the calculation is a bit complex for raid0 and raid10, so I
reimplement it based on a fact that:

dev1  dev2   dev3(raid0)
---
s0 s3 s6  s1 s4 s7   s2 s5

Each device has (total_stripes / nr_dev) stripes, or plus one.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/volumes.c |   95 +---
 1 files changed, 31 insertions(+), 64 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 540fdd2..563ef65 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3024,80 +3024,47 @@ static int __btrfs_map_block(struct btrfs_mapping_tree 
*map_tree, int rw,
atomic_set(bbio-error, 0);
 
if (rw  REQ_DISCARD) {
+   int factor = 0;
+   int sub_stripes = 0;
+   u64 stripes_per_dev = 0;
+   u32 remaining_stripes = 0;
+
+   if (map-type 
+   (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10)) {
+   if (map-type  BTRFS_BLOCK_GROUP_RAID0)
+   sub_stripes = 1;
+   else
+   sub_stripes = map-sub_stripes;
+
+   factor = map-num_stripes / sub_stripes;
+   stripes_per_dev = div_u64_rem(stripe_nr_end -
+ stripe_nr_orig,
+ factor,
+ remaining_stripes);
+   }
+
for (i = 0; i  num_stripes; i++) {
bbio-stripes[i].physical =
map-stripes[stripe_index].physical +
stripe_offset + stripe_nr * map-stripe_len;
bbio-stripes[i].dev = map-stripes[stripe_index].dev;
 
-   if (map-type  BTRFS_BLOCK_GROUP_RAID0) {
-   u64 stripes;
-   u32 last_stripe = 0;
-   int j;
-
-   div_u64_rem(stripe_nr_end - 1,
-   map-num_stripes,
-   last_stripe);
-
-   for (j = 0; j  map-num_stripes; j++) {
-   u32 test;
-
-   div_u64_rem(stripe_nr_end - 1 - j,
-   map-num_stripes, test);
-   if (test == stripe_index)
-   break;
-   }
-   stripes = stripe_nr_end - 1 - j;
-   do_div(stripes, map-num_stripes);
-   bbio-stripes[i].length = map-stripe_len *
-   (stripes - stripe_nr + 1);
-
-   if (i == 0) {
+   if (map-type  (BTRFS_BLOCK_GROUP_RAID0 |
+BTRFS_BLOCK_GROUP_RAID10)) {
+   bbio-stripes[i].length = stripes_per_dev *
+ map-stripe_len;
+   if (i / sub_stripes  remaining_stripes)
+   bbio-stripes[i].length +=
+   map-stripe_len;
+   if (i  sub_stripes)
bbio-stripes[i].length -=
stripe_offset;
-   stripe_offset = 0;
-   }
-   if (stripe_index == last_stripe)
-   bbio-stripes[i].length -=
-   stripe_end_offset;
-   } else if (map-type  BTRFS_BLOCK_GROUP_RAID10) {
-   u64 stripes;
-   int j;
-   int factor = map-num_stripes /
-map-sub_stripes;
-   u32 last_stripe = 0;
-
-   div_u64_rem(stripe_nr_end - 1,
-   factor, last_stripe);
-   last_stripe *= map-sub_stripes;
-
-   for (j = 0; j  factor; j++) {
-   u32 test;
-
-   div_u64_rem(stripe_nr_end - 1 - j,
-   factor, test

[PATCH 09/11][RESEND] Btrfs: rewrite btrfs_trim_block_group()

2012-01-10 Thread Li Zefan
There are various bugs in block group trimming:

- It may trim from offset smaller than user-specified offset.
- It may trim beyond user-specified range.
- It may leak free space for extents smaller than specified minlen.
- It may truncate the last trimmed extent thus leak free space.
- With mixed extents+bitmaps, some extents may not be trimmed.
- With mixed extents+bitmaps, some bitmaps may not be trimmed (even
none will be trimmed). Even for those trimmed, not all the free space
in the bitmaps will be trimmed.

I rewrite btrfs_trim_block_group() and break it into two functions.
One is to trim extents only, and the other is to trim bitmaps only.

Before patching:

# fstrim -v /mnt/
/mnt/: 1496465408 bytes were trimmed

After patching:

# fstrim -v /mnt/
/mnt/: 2193768448 bytes were trimmed

And this matches the total free space:

# btrfs fi df /mnt
Data: total=3.58GB, used=1.79GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=205.12MB, used=97.14MB
Metadata: total=8.00MB, used=0.00

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/free-space-cache.c |  235 ++-
 1 files changed, 164 insertions(+), 71 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index e4eb222..b3cbb89 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2594,17 +2594,57 @@ void btrfs_init_free_cluster(struct btrfs_free_cluster 
*cluster)
cluster-block_group = NULL;
 }
 
-int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
-  u64 *trimmed, u64 start, u64 end, u64 minlen)
+static int do_trimming(struct btrfs_block_group_cache *block_group,
+  u64 *total_trimmed, u64 start, u64 bytes,
+  u64 reserved_start, u64 reserved_bytes)
 {
-   struct btrfs_free_space_ctl *ctl = block_group-free_space_ctl;
-   struct btrfs_free_space *entry = NULL;
+   struct btrfs_space_info *space_info = block_group-space_info;
struct btrfs_fs_info *fs_info = block_group-fs_info;
-   u64 bytes = 0;
-   u64 actually_trimmed;
-   int ret = 0;
+   int ret;
+   int update = 0;
+   u64 trimmed = 0;
 
-   *trimmed = 0;
+   spin_lock(space_info-lock);
+   spin_lock(block_group-lock);
+   if (!block_group-ro) {
+   block_group-reserved += reserved_bytes;
+   space_info-bytes_reserved += reserved_bytes;
+   update = 1;
+   }
+   spin_unlock(block_group-lock);
+   spin_unlock(space_info-lock);
+
+   ret = btrfs_error_discard_extent(fs_info-extent_root,
+start, bytes, trimmed);
+   if (!ret)
+   *total_trimmed += trimmed;
+
+   btrfs_add_free_space(block_group, reserved_start, reserved_bytes);
+
+   if (update) {
+   spin_lock(space_info-lock);
+   spin_lock(block_group-lock);
+   if (block_group-ro)
+   space_info-bytes_readonly += reserved_bytes;
+   block_group-reserved -= reserved_bytes;
+   space_info-bytes_reserved -= reserved_bytes;
+   spin_unlock(space_info-lock);
+   spin_unlock(block_group-lock);
+   }
+
+   return ret;
+}
+
+static int trim_no_bitmap(struct btrfs_block_group_cache *block_group,
+ u64 *total_trimmed, u64 start, u64 end, u64 minlen)
+{
+   struct btrfs_free_space_ctl *ctl = block_group-free_space_ctl;
+   struct btrfs_free_space *entry;
+   struct rb_node *node;
+   int ret = 0;
+   u64 extent_start;
+   u64 extent_bytes;
+   u64 bytes;
 
while (start  end) {
spin_lock(ctl-tree_lock);
@@ -2615,81 +2655,118 @@ int btrfs_trim_block_group(struct 
btrfs_block_group_cache *block_group,
}
 
entry = tree_search_offset(ctl, start, 0, 1);
-   if (!entry)
-   entry = tree_search_offset(ctl,
-  offset_to_bitmap(ctl, start),
-  1, 1);
-
-   if (!entry || entry-offset = end) {
+   if (!entry) {
spin_unlock(ctl-tree_lock);
break;
}
 
-   if (entry-bitmap) {
-   ret = search_bitmap(ctl, entry, start, bytes);
-   if (!ret) {
-   if (start = end) {
-   spin_unlock(ctl-tree_lock);
-   break;
-   }
-   bytes = min(bytes, end - start);
-   bitmap_clear_bits(ctl, entry, start, bytes

[PATCH 10/11] Btrfs: update global block_rsv when creating a new block group

2012-01-10 Thread Li Zefan
A bug was triggered while using seed device:

# mkfs.btrfs /dev/loop1
# btrfstune -S 1 /dev/loop1
# mount -o /dev/loop1 /mnt
# btrfs dev add /dev/loop2 /mnt

btrfs: block rsv returned -28
[ cut here ]
WARNING: at fs/btrfs/extent-tree.c:5969 btrfs_alloc_free_block+0x166/0x396 
[btrfs]()
...
Call Trace:
...
[f7b7c31c] btrfs_cow_block+0x101/0x147 [btrfs]
[f7b7eaa6] btrfs_search_slot+0x1b8/0x55f [btrfs]
[f7b7f844] btrfs_insert_empty_items+0x42/0x7f [btrfs]
[f7b7f8c1] btrfs_insert_item+0x40/0x7e [btrfs]
[f7b8ac02] btrfs_make_block_group+0x243/0x2aa [btrfs]
[f7bb3f53] __btrfs_alloc_chunk+0x672/0x70e [btrfs]
[f7bb41ff] init_first_rw_device+0x77/0x13c [btrfs]
[f7bb5a62] btrfs_init_new_device+0x664/0x9fd [btrfs]
[f7bbb65a] btrfs_ioctl+0x694/0xdbe [btrfs]
[c04f55f7] do_vfs_ioctl+0x496/0x4cc
[c04f5660] sys_ioctl+0x33/0x4f
[c07b9edf] sysenter_do_call+0x12/0x38
---[ end trace 906adac595facc7d ]---

Since seed device is readonly, there's no usable space in the filesystem.
Afterwards we add a sprout device to it, and the kernel creates a METADATA
block group and a SYSTEM block group where comes free space we can reserve,
but we still get revervation failure because the global block_rsv hasn't
been updated accordingly.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5b53479..bf30f67 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7446,6 +7446,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle 
*trans,
ret = update_space_info(root-fs_info, cache-flags, size, bytes_used,
cache-space_info);
BUG_ON(ret);
+   update_global_block_rsv(root-fs_info);
 
spin_lock(cache-space_info-lock);
cache-space_info-bytes_readonly += cache-bytes_super;
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/11] Btrfs: fix possible deadlock when opening a seed device

2012-01-10 Thread Li Zefan
The correct lock order is uuid_mutex - volume_mutex - chunk_mutex,
but when we mount a filesystem which has backing seed devices, we have
this lock chain:

open_ctree()
lock(chunk_mutex);
read_chunk_tree();
read_one_dev();
open_seed_devices();
lock(uuid_mutex);

and then we hit a lockdep splat.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c |2 --
 fs/btrfs/volumes.c |9 +++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3f9d555..858ab34 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2270,9 +2270,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
   (unsigned long)btrfs_header_chunk_tree_uuid(chunk_root-node),
   BTRFS_UUID_SIZE);
 
-   mutex_lock(fs_info-chunk_mutex);
ret = btrfs_read_chunk_tree(chunk_root);
-   mutex_unlock(fs_info-chunk_mutex);
if (ret) {
printk(KERN_WARNING btrfs: failed to read chunk tree on %s\n,
   sb-s_id);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 563ef65..fbb493b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3506,7 +3506,7 @@ static int open_seed_devices(struct btrfs_root *root, u8 
*fsid)
struct btrfs_fs_devices *fs_devices;
int ret;
 
-   mutex_lock(uuid_mutex);
+   BUG_ON(!mutex_is_locked(uuid_mutex));
 
fs_devices = root-fs_info-fs_devices-seed;
while (fs_devices) {
@@ -3544,7 +3544,6 @@ static int open_seed_devices(struct btrfs_root *root, u8 
*fsid)
fs_devices-seed = root-fs_info-fs_devices-seed;
root-fs_info-fs_devices-seed = fs_devices;
 out:
-   mutex_unlock(uuid_mutex);
return ret;
 }
 
@@ -3687,6 +3686,9 @@ int btrfs_read_chunk_tree(struct btrfs_root *root)
if (!path)
return -ENOMEM;
 
+   mutex_lock(uuid_mutex);
+   lock_chunks(root);
+
/* first we search for all of the device items, and then we
 * read in all of the chunk items.  This way we can create chunk
 * mappings that reference all of the devices that are afound
@@ -3737,6 +3739,9 @@ again:
}
ret = 0;
 error:
+   unlock_chunks(root);
+   mutex_unlock(uuid_mutex);
+
btrfs_free_path(path);
return ret;
 }
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/11] Btrfs: reserve metadata space in btrfs_ioctl_setflags()

2012-01-10 Thread Li Zefan
Check and reserve space for btrfs_update_inode().

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ioctl.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9619fb0..fe8a60c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -254,7 +254,7 @@ static int btrfs_ioctl_setflags(struct file *file, void 
__user *arg)
ip-flags = ~(BTRFS_INODE_COMPRESS | BTRFS_INODE_NOCOMPRESS);
}
 
-   trans = btrfs_join_transaction(root);
+   trans = btrfs_start_transaction(root, 1);
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
goto out_drop;
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/11] Btrfs: check the return value of io_ctl_init()

2012-01-10 Thread Li Zefan
It can return -ENOMEM.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/free-space-cache.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 4e55af3..e4eb222 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -637,7 +637,10 @@ int __load_free_space_cache(struct btrfs_root *root, 
struct inode *inode,
if (!num_entries)
return 0;
 
-   io_ctl_init(io_ctl, inode, root);
+   ret = io_ctl_init(io_ctl, inode, root);
+   if (ret)
+   return ret;
+
ret = readahead_cache(inode);
if (ret)
goto out;
@@ -851,7 +854,9 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct 
inode *inode,
if (!i_size_read(inode))
return -1;
 
-   io_ctl_init(io_ctl, inode, root);
+   ret = io_ctl_init(io_ctl, inode, root);
+   if (ret)
+   return -1;
 
/* Get the cluster for this block_group if it exists */
if (block_group  !list_empty(block_group-cluster_list))
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 00/11] Btrfs: some patches for 3.3

2012-01-10 Thread Li Zefan
The biggest one is a fix for fstrim, and there's a fix for on-disk
free space cache. Others are small fixes and cleanups.

The last three have been sent weeks ago.

The patchset is also available in this repo:

git://repo.or.cz/linux-btrfs-devel.git for-chris

Note there's a small confict with Al Viro's vfs changes.

Li Zefan (11):
  Btrfs: add pinned extents to on-disk free space cache correctly
  Btrfs: avoid possible NULL deref in io_ctl_drop_pages()
  Btrfs: check the return value of io_ctl_init()
  Btrfs: remove BUG_ON()s in btrfs_ioctl_setflags()
  Btrfs: reserve metadata space in btrfs_ioctl_setflags()
  Btrfs: don't pass a trans handle unnecessarily in volumes.c
  Btrfs: don't pre-allocate btrfs bio
  Btrfs: simplfy calculation of stripe length for discard operation
  Btrfs: rewrite btrfs_trim_block_group()
  Btrfs: update global block_rsv when creating a new block group
  Btrfs: fix possible deadlock when opening a seed device

 fs/btrfs/disk-io.c  |2 -
 fs/btrfs/extent-tree.c  |3 +-
 fs/btrfs/free-space-cache.c |  293 +--
 fs/btrfs/ioctl.c|   20 +++-
 fs/btrfs/volumes.c  |  189 ++--
 fs/btrfs/volumes.h  |3 +-
 6 files changed, 280 insertions(+), 230 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/11] Btrfs: avoid possible NULL deref in io_ctl_drop_pages()

2012-01-10 Thread Li Zefan
If we run into some failure path in io_ctl_prepare_pages(),
io_ctl-pages[] array may have some NULL pointers.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/free-space-cache.c |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 01840ef..4e55af3 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -319,9 +319,11 @@ static void io_ctl_drop_pages(struct io_ctl *io_ctl)
io_ctl_unmap_page(io_ctl);
 
for (i = 0; i  io_ctl-num_pages; i++) {
-   ClearPageChecked(io_ctl-pages[i]);
-   unlock_page(io_ctl-pages[i]);
-   page_cache_release(io_ctl-pages[i]);
+   if (io_ctl-pages[i]) {
+   ClearPageChecked(io_ctl-pages[i]);
+   unlock_page(io_ctl-pages[i]);
+   page_cache_release(io_ctl-pages[i]);
+   }
}
 }
 
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/11] Btrfs: don't pass a trans handle unnecessarily in volumes.c

2012-01-10 Thread Li Zefan
Some functions never use the transaction handle passed to them.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c |2 +-
 fs/btrfs/volumes.c |   18 +++---
 fs/btrfs/volumes.h |3 +--
 3 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8603ee4..5b53479 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7084,7 +7084,7 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 
bytenr)
 * space to fit our block group in.
 */
if (device-total_bytes  device-bytes_used + min_free) {
-   ret = find_free_dev_extent(NULL, device, min_free,
+   ret = find_free_dev_extent(device, min_free,
   dev_offset, NULL);
if (!ret)
dev_nr++;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f4b839f..73f673c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -829,7 +829,6 @@ out:
 
 /*
  * find_free_dev_extent - find free space in the specified device
- * @trans: transaction handler
  * @device:the device which we search the free space in
  * @num_bytes: the size of the free space that we need
  * @start: store the start of the free space.
@@ -848,8 +847,7 @@ out:
  * But if we don't find suitable free space, it is used to store the size of
  * the max free space.
  */
-int find_free_dev_extent(struct btrfs_trans_handle *trans,
-struct btrfs_device *device, u64 num_bytes,
+int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
 u64 *start, u64 *len)
 {
struct btrfs_key key;
@@ -893,7 +891,7 @@ int find_free_dev_extent(struct btrfs_trans_handle *trans,
key.offset = search_start;
key.type = BTRFS_DEV_EXTENT_KEY;
 
-   ret = btrfs_search_slot(trans, root, key, path, 0, 0);
+   ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
if (ret  0)
goto out;
if (ret  0) {
@@ -1469,8 +1467,7 @@ error_undo:
 /*
  * does all the dirty work required for changing file system's UUID.
  */
-static int btrfs_prepare_sprout(struct btrfs_trans_handle *trans,
-   struct btrfs_root *root)
+static int btrfs_prepare_sprout(struct btrfs_root *root)
 {
struct btrfs_fs_devices *fs_devices = root-fs_info-fs_devices;
struct btrfs_fs_devices *old_devices;
@@ -1695,7 +1692,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char 
*device_path)
 
if (seeding_dev) {
sb-s_flags = ~MS_RDONLY;
-   ret = btrfs_prepare_sprout(trans, root);
+   ret = btrfs_prepare_sprout(root);
BUG_ON(ret);
}
 
@@ -2323,8 +2320,7 @@ done:
return ret;
 }
 
-static int btrfs_add_system_chunk(struct btrfs_trans_handle *trans,
-  struct btrfs_root *root,
+static int btrfs_add_system_chunk(struct btrfs_root *root,
   struct btrfs_key *key,
   struct btrfs_chunk *chunk, int item_size)
 {
@@ -2496,7 +2492,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle 
*trans,
if (total_avail == 0)
continue;
 
-   ret = find_free_dev_extent(trans, device,
+   ret = find_free_dev_extent(device,
   max_stripe_size * dev_stripes,
   dev_offset, max_avail);
if (ret  ret != -ENOSPC)
@@ -2687,7 +2683,7 @@ static int __finish_chunk_alloc(struct btrfs_trans_handle 
*trans,
BUG_ON(ret);
 
if (map-type  BTRFS_BLOCK_GROUP_SYSTEM) {
-   ret = btrfs_add_system_chunk(trans, chunk_root, key, chunk,
+   ret = btrfs_add_system_chunk(chunk_root, key, chunk,
 item_size);
BUG_ON(ret);
}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 78f2d4d..c1701ec 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -230,7 +230,6 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 
new_size);
 int btrfs_init_new_device(struct btrfs_root *root, char *path);
 int btrfs_balance(struct btrfs_root *dev_root);
 int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
-int find_free_dev_extent(struct btrfs_trans_handle *trans,
-struct btrfs_device *device, u64 num_bytes,
+int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
 u64 *start, u64 *max_avail);
 #endif
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 01/11] Btrfs: add pinned extents to on-disk free space cache correctly

2012-01-10 Thread Li Zefan
I got this while running xfstests:

[24256.836098] block group 317849600 has an wrong amount of free space
[24256.836100] btrfs: failed to load free space cache for block group 317849600

We should clamp the extent returned by find_first_extent_bit(),
so the start of the extent won't smaller than the start of the
block group.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/free-space-cache.c |   41 -
 1 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index ec23d43..01840ef 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -838,7 +838,7 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct 
inode *inode,
struct io_ctl io_ctl;
struct list_head bitmap_list;
struct btrfs_key key;
-   u64 start, end, len;
+   u64 start, extent_start, extent_end, len;
int entries = 0;
int bitmaps = 0;
int ret;
@@ -857,25 +857,12 @@ int __btrfs_write_out_cache(struct btrfs_root *root, 
struct inode *inode,
 struct btrfs_free_cluster,
 block_group_list);
 
-   /*
-* We shouldn't have switched the pinned extents yet so this is the
-* right one
-*/
-   unpin = root-fs_info-pinned_extents;
-
/* Lock all pages first so we can lock the extent safely. */
io_ctl_prepare_pages(io_ctl, inode, 0);
 
lock_extent_bits(BTRFS_I(inode)-io_tree, 0, i_size_read(inode) - 1,
 0, cached_state, GFP_NOFS);
 
-   /*
-* When searching for pinned extents, we need to start at our start
-* offset.
-*/
-   if (block_group)
-   start = block_group-key.objectid;
-
node = rb_first(ctl-free_space_offset);
if (!node  cluster) {
node = rb_first(cluster-root);
@@ -918,9 +905,20 @@ int __btrfs_write_out_cache(struct btrfs_root *root, 
struct inode *inode,
 * We want to add any pinned extents to our free space cache
 * so we don't leak the space
 */
+
+   /*
+* We shouldn't have switched the pinned extents yet so this is the
+* right one
+*/
+   unpin = root-fs_info-pinned_extents;
+
+   if (block_group)
+   start = block_group-key.objectid;
+
while (block_group  (start  block_group-key.objectid +
   block_group-key.offset)) {
-   ret = find_first_extent_bit(unpin, start, start, end,
+   ret = find_first_extent_bit(unpin, start,
+   extent_start, extent_end,
EXTENT_DIRTY);
if (ret) {
ret = 0;
@@ -928,20 +926,21 @@ int __btrfs_write_out_cache(struct btrfs_root *root, 
struct inode *inode,
}
 
/* This pinned extent is out of our range */
-   if (start = block_group-key.objectid +
+   if (extent_start = block_group-key.objectid +
block_group-key.offset)
break;
 
-   len = block_group-key.objectid +
-   block_group-key.offset - start;
-   len = min(len, end + 1 - start);
+   extent_start = max(extent_start, start);
+   extent_end = min(block_group-key.objectid +
+block_group-key.offset, extent_end + 1);
+   len = extent_end - extent_start;
 
entries++;
-   ret = io_ctl_add_entry(io_ctl, start, len, NULL);
+   ret = io_ctl_add_entry(io_ctl, extent_start, len, NULL);
if (ret)
goto out_nospc;
 
-   start = end + 1;
+   start = extent_end;
}
 
/* Write out the bitmaps */
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/11] Btrfs: don't pre-allocate btrfs bio

2012-01-10 Thread Li Zefan
We pre-allocate a btrfs bio with fixed size, and then may re-allocate
memory if we find stripes are bigger than the fixed size. But this
pre-allocation is not necessary.

Also we don't have to calcuate the stripe number twice.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/volumes.c |   67 ---
 1 files changed, 21 insertions(+), 46 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 73f673c..540fdd2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2897,26 +2897,13 @@ static int __btrfs_map_block(struct btrfs_mapping_tree 
*map_tree, int rw,
u64 stripe_nr;
u64 stripe_nr_orig;
u64 stripe_nr_end;
-   int stripes_allocated = 8;
-   int stripes_required = 1;
int stripe_index;
int i;
+   int ret = 0;
int num_stripes;
int max_errors = 0;
struct btrfs_bio *bbio = NULL;
 
-   if (bbio_ret  !(rw  (REQ_WRITE | REQ_DISCARD)))
-   stripes_allocated = 1;
-again:
-   if (bbio_ret) {
-   bbio = kzalloc(btrfs_bio_size(stripes_allocated),
-   GFP_NOFS);
-   if (!bbio)
-   return -ENOMEM;
-
-   atomic_set(bbio-error, 0);
-   }
-
read_lock(em_tree-lock);
em = lookup_extent_mapping(em_tree, logical, *length);
read_unlock(em_tree-lock);
@@ -2935,32 +2922,6 @@ again:
if (mirror_num  map-num_stripes)
mirror_num = 0;
 
-   /* if our btrfs_bio struct is too small, back off and try again */
-   if (rw  REQ_WRITE) {
-   if (map-type  (BTRFS_BLOCK_GROUP_RAID1 |
-BTRFS_BLOCK_GROUP_DUP)) {
-   stripes_required = map-num_stripes;
-   max_errors = 1;
-   } else if (map-type  BTRFS_BLOCK_GROUP_RAID10) {
-   stripes_required = map-sub_stripes;
-   max_errors = 1;
-   }
-   }
-   if (rw  REQ_DISCARD) {
-   if (map-type  (BTRFS_BLOCK_GROUP_RAID0 |
-BTRFS_BLOCK_GROUP_RAID1 |
-BTRFS_BLOCK_GROUP_DUP |
-BTRFS_BLOCK_GROUP_RAID10)) {
-   stripes_required = map-num_stripes;
-   }
-   }
-   if (bbio_ret  (rw  (REQ_WRITE | REQ_DISCARD)) 
-   stripes_allocated  stripes_required) {
-   stripes_allocated = map-num_stripes;
-   free_extent_map(em);
-   kfree(bbio);
-   goto again;
-   }
stripe_nr = offset;
/*
 * stripe_nr counts the total number of stripes we have to stride
@@ -3055,6 +3016,13 @@ again:
}
BUG_ON(stripe_index = map-num_stripes);
 
+   bbio = kzalloc(btrfs_bio_size(num_stripes), GFP_NOFS);
+   if (!bbio) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   atomic_set(bbio-error, 0);
+
if (rw  REQ_DISCARD) {
for (i = 0; i  num_stripes; i++) {
bbio-stripes[i].physical =
@@ -3151,15 +3119,22 @@ again:
stripe_index++;
}
}
-   if (bbio_ret) {
-   *bbio_ret = bbio;
-   bbio-num_stripes = num_stripes;
-   bbio-max_errors = max_errors;
-   bbio-mirror_num = mirror_num;
+
+   if (rw  REQ_WRITE) {
+   if (map-type  (BTRFS_BLOCK_GROUP_RAID1 |
+BTRFS_BLOCK_GROUP_RAID10 |
+BTRFS_BLOCK_GROUP_DUP)) {
+   max_errors = 1;
+   }
}
+
+   *bbio_ret = bbio;
+   bbio-num_stripes = num_stripes;
+   bbio-max_errors = max_errors;
+   bbio-mirror_num = mirror_num;
 out:
free_extent_map(em);
-   return 0;
+   return ret;
 }
 
 int btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw,
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/11] Btrfs: remove BUG_ON()s in btrfs_ioctl_setflags()

2012-01-10 Thread Li Zefan
We can recover from errors and return -errno to user space.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ioctl.c |   18 ++
 1 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index c04f02c..9619fb0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -176,6 +176,8 @@ static int btrfs_ioctl_setflags(struct file *file, void 
__user *arg)
struct btrfs_trans_handle *trans;
unsigned int flags, oldflags;
int ret;
+   u64 ip_oldflags;
+   unsigned int i_oldflags;
 
if (btrfs_root_readonly(root))
return -EROFS;
@@ -192,6 +194,9 @@ static int btrfs_ioctl_setflags(struct file *file, void 
__user *arg)
 
mutex_lock(inode-i_mutex);
 
+   ip_oldflags = ip-flags;
+   i_oldflags = inode-i_flags;
+
flags = btrfs_mask_flags(inode-i_mode, flags);
oldflags = btrfs_flags_to_ioctl(ip-flags);
if ((flags ^ oldflags)  (FS_APPEND_FL | FS_IMMUTABLE_FL)) {
@@ -250,18 +255,23 @@ static int btrfs_ioctl_setflags(struct file *file, void 
__user *arg)
}
 
trans = btrfs_join_transaction(root);
-   BUG_ON(IS_ERR(trans));
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   goto out_drop;
+   }
 
btrfs_update_iflags(inode);
inode-i_ctime = CURRENT_TIME;
ret = btrfs_update_inode(trans, root, inode);
-   BUG_ON(ret);
 
btrfs_end_transaction(trans, root);
+ out_drop:
+   if (ret) {
+   ip-flags = ip_oldflags;
+   inode-i_flags = i_oldflags;
+   }
 
mnt_drop_write(file-f_path.mnt);
-
-   ret = 0;
  out_unlock:
mutex_unlock(inode-i_mutex);
return ret;
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs truncate() does not change inode times

2012-01-05 Thread Li Zefan
idank wrote:
 Hi all,
 I was running fstest (http://www.tuxera.com/community/posix-test-suite/) on 
 btrfs. Only one test failed, and I believe it to be a bug in btrfs. The 
 scenario is as follows:
 * crate a file.
 * note its times with stat.
 * sleep a few seconds
 * call truncate() on the file (not ftruncate(). ftruncate() works).
 * sync
 * note the file's times again with stat.
 expected result: ctime and mtime are greater.
 actual result: ctime and mtime remain unchanged.
 
 Example:

I followed your example, but got the expected result:

[root@lizf pjd-fstest-20090130-RC]# stat ctime_test 
  File: ctime_test
  Size: 0   Blocks: 0  IO Block: 4096   普通空文件
Device: 1bh/27d Inode: 1589980 Links: 1
Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 2012-01-06 10:27:53.900570365 +0800
Modify: 2012-01-06 10:27:53.900570365 +0800
Change: 2012-01-06 10:27:53.900570365 +0800
[root@lizf pjd-fstest-20090130-RC]# ./fstest truncate ctime_test 200
0
[root@lizf pjd-fstest-20090130-RC]# sleep 2
[root@lizf pjd-fstest-20090130-RC]# stat ctime_test 
  File: ctime_test
  Size: 200 Blocks: 0  IO Block: 4096   普通文件
Device: 1bh/27d Inode: 1589980 Links: 1
Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 2012-01-06 10:27:53.900570365 +0800
Modify: 2012-01-06 10:28:12.238569720 +0800
Change: 2012-01-06 10:28:12.238569720 +0800
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Crash in io_ctl_drop_pages after mount with csum errors

2012-01-05 Thread Li Zefan
David Sterba wrote:
 I mounted a multi-folume fs created not-so-long ago in a 3.1 based
 kernel and mounted with v3.2-rc7-83-g115e8e7 , it crashed immediately.
 It's quite possible that the disk is to blame, it's an old 160G
 SP1614C, but syslog does not contain any error messages. I'm not sure
 whether the fs was cleanly unmounted, seems not, but anyway I do not
 expect a crash.
 
 Label: none  uuid: 5f06f9eb-9736-49f7-91a2-2f45522512ef
 Total devices 4 FS bytes used 1.38GB
 devid4 size 34.00GB used 34.00GB path /dev/sdg8
 devid3 size 34.00GB used 34.00GB path /dev/sdg7
 devid2 size 34.00GB used 34.00GB path /dev/sdg6
 devid1 size 34.00GB used 34.00GB path /dev/sdg5
 
 mount options: compress-force=lzo,space_cache,autodefrag,inode_cache
 
 [ 1461.732855] btrfs: force lzo compression
 [ 1461.732876] btrfs: enabling auto defrag
 [ 1461.732893] btrfs: enabling inode map caching
 [ 1461.732907] btrfs: disk space caching is enabled
 [ 1499.796181] btrfs: csum mismatch on free space cache
 [ 1499.796266] btrfs: failed to load free space cache for block group 29360128
 [ 1499.888699] btrfs csum failed ino 18446744073709551604 off 65536 csum 
 2566472073 private 1925235876
 [ 1499.26] btrfs csum failed ino 18446744073709551604 off 327680 csum 
 2566472073 private 1925235876
 [ 1499.906229] btrfs csum failed ino 18446744073709551604 off 0 csum 
 1695430581 private 1170642078
 [ 1499.906345] btrfs csum failed ino 18446744073709551604 off 262144 csum 
 2566472073 private 1925235876
 [ 1499.906446] btrfs csum failed ino 18446744073709551604 off 524288 csum 
 2566472073 private 1925235876
 [ 1499.924469] btrfs csum failed ino 18446744073709551604 off 196608 csum 
 2566472073 private 1925235876
 [ 1499.924574] btrfs csum failed ino 18446744073709551604 off 458752 csum 
 2566472073 private 1925235876
 [ 1499.946076] btrfs csum failed ino 18446744073709551604 off 131072 csum 
 2566472073 private 1925235876
 [ 1499.946217] btrfs csum failed ino 18446744073709551604 off 393216 csum 
 2566472073 private 1925235876
 [ 1499.946318] btrfs csum failed ino 18446744073709551604 off 0 csum 
 1695430581 private 1170642078
 [ 1499.946362] btrfs: error reading free space cache

We have inconsitent data on disk with both free space cache and free ino cache.

 [ 1499.946409] BUG: unable to handle kernel NULL pointer dereference at 
 0001
 [ 1499.946437] IP: [a0456dd7] io_ctl_drop_pages+0x37/0x70 [btrfs]

0x01 is weired, don't know how it occured. Nevertheless we need this fix:

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index ec23d43..81771ca 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -319,9 +319,11 @@ static void io_ctl_drop_pages(struct io_ctl *io_ctl)
io_ctl_unmap_page(io_ctl);
 
for (i = 0; i  io_ctl-num_pages; i++) {
-   ClearPageChecked(io_ctl-pages[i]);
-   unlock_page(io_ctl-pages[i]);
-   page_cache_release(io_ctl-pages[i]);
+   if (io_ctl-pages[i]) {
+   ClearPageChecked(io_ctl-pages[i]);
+   unlock_page(io_ctl-pages[i]);
+   page_cache_release(io_ctl-pages[i]);
+   }
}
 }

I'll resend the patch along with my other pending patches for 3.3.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/10] Btrfs: backref walking rewrite

2012-01-04 Thread Li Zefan
Jan Schmidt wrote:
 This patch series is a major rewrite of the backref walking code. The patch
 series Arne sent some weeks ago for quota groups had a very interesting
 function, find_all_roots. I took this from him together with the bits needed
 for find_all_roots to work and replaced a major part of the code in backref.c
 with it.
 
 It can be pulled from
   git://git.jan-o-sch.net/btrfs-unstable for-chris
 There's also a gitweb for that repo on
   http://git.jan-o-sch.net/?p=btrfs-unstable
 

Thanks for the work!

I got a compile warning:

  CC [M]  fs/btrfs/backref.o
fs/btrfs/backref.c: In function 'inode_to_path':
fs/btrfs/backref.c:1312:3: warning: format '%ld' expects type 'long int', but 
argument 3 has type 'int
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: fstrim on BTRFS

2011-12-29 Thread Li Zefan
Fajar A. Nugraha wrote:
 On Wed, Dec 28, 2011 at 11:57 PM, Martin Steigerwald
 mar...@lichtvoll.de wrote:
 But BTRFS does not:

 merkaba:~ fstrim -v /
 /: 4431613952 bytes were trimmed
 merkaba:~ fstrim -v /
 /: 4341846016 bytes were trimmed
 
  and apparently it can't trim everything. Or maybe my kernel is
 just too old.
 
 
 $ sudo fstrim -v /
 2258165760 Bytes was trimmed
 
 $ df -h /
 FilesystemSize  Used Avail Use% Mounted on
 /dev/sda6  50G   34G   12G  75% /
 
 $ mount | grep / 
 /dev/sda6 on / type btrfs (rw,noatime,subvolid=258,compress-force=lzo)
 
 so only about 2G out of 12G can be trimmed. This is on kernel 3.1.4.
 

That's because only free spaces in block groups will be trimmed. Btrfs
allocates space from block groups, and when there's no space availabe,
it will allocate a new block group from the pool. In your case there's
~10G in the pool.

You can do a btrfs fi df /, and you'll see the total size of existing
block groups.

You can empty the pool by:

# dd if=/dev/zero of=/mytmpfile bs=1M

Then release the space (but it won't return back to the pool):

# rm /mytmpfile
# sync

and try btrfs fi df / and trim again.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC][PATCH 0/3] Btrfs: speed up fstrim

2011-12-29 Thread Li Zefan
(This patchset is not for merge or review, except the first patch)

By remembering which areas have been trimmed, we can speed up
fstrim significantly.

# fstrim -v /mnt/
/mnt/: 152772608 bytes were trimmed
# fstrim -v /mnt/
/mnt/: 0 bytes were trimmed

To implement this, after a free space item has been trimmed, we
mark it as trimmed before inserting it into free space cache.

(*)If we want to speed up the first fstrim after mounting the filesystem,
we have to save the trimmed flag to disk, which will break backward
compatibility, but only 3.2-rcX kernels will be affected.

That is, if you use fstrim in newest kernel with this patchset applied,
and then you mount the fs in a 3.2-rcX kernel, you may trigger a BUG_ON()
in __load_free_space_cache() sooner or later.

So, is this acceptable?

 # fstrim -v /mnt/
 /mnt/: 267714560 bytes were trimmed
 # fstrim -v /mnt/
 /mnt/: 0 bytes were trimmed
 # sync
 # umount /mnt
 # !mount
 # fstrim -v /mnt/
 /mnt/: 152240128 bytes were trimmed

Because caches for block groups smaller than 100M will not be written
to disk, we'll still have to trim them.

*See this thread for a user request for this feature:
https://lkml.org/lkml/2011/12/1/24

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3][URGENT] Btrfs: allow future use of type field of struct btrfs_free_space_entry

2011-12-29 Thread Li Zefan
This field indicates if an entry is an extent or a bitmap, and only 2
bits of it are used.

This patch makes the other bits are avaiable for future use without
breaking old kernels. For example, we're going to use one bit to
mark if the free space has been trimmed.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---

This has to be queued for 3.2, so later patches can affect 3.2-rcX
kernels only.

---
 fs/btrfs/ctree.h|4 ++--
 fs/btrfs/free-space-cache.c |4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6738503..ca4eb2d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -278,8 +278,8 @@ struct btrfs_chunk {
/* additional stripes go here */
 } __attribute__ ((__packed__));
 
-#define BTRFS_FREE_SPACE_EXTENT1
-#define BTRFS_FREE_SPACE_BITMAP2
+#define BTRFS_FREE_SPACE_EXTENT(1  0)
+#define BTRFS_FREE_SPACE_BITMAP(1  1)
 
 struct btrfs_free_space_entry {
__le64 offset;
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index ec23d43..044c0ec 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -669,7 +669,7 @@ int __load_free_space_cache(struct btrfs_root *root, struct 
inode *inode,
goto free_cache;
}
 
-   if (type == BTRFS_FREE_SPACE_EXTENT) {
+   if (type  BTRFS_FREE_SPACE_EXTENT) {
spin_lock(ctl-tree_lock);
ret = link_free_space(ctl, e);
spin_unlock(ctl-tree_lock);
@@ -679,7 +679,7 @@ int __load_free_space_cache(struct btrfs_root *root, struct 
inode *inode,
kmem_cache_free(btrfs_free_space_cachep, e);
goto free_cache;
}
-   } else {
+   } else if (type  BTRFS_FREE_SPACE_BITMAP) {
BUG_ON(!num_bitmaps);
num_bitmaps--;
e-bitmap = kzalloc(PAGE_CACHE_SIZE, GFP_NOFS);
-- 1.7.3.1 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] Btrfs: speed up fstrim

2011-12-29 Thread Li Zefan
By remembering which areas has been trimmed, we can speed up
fstrim significantly.

# fstrim -v /mnt/
/mnt/: 152772608 bytes were trimmed
# fstrim -v /mnt/
/mnt/: 0 bytes were trimmed

No bytes has to be trimmed for the second run.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c  |   29 ++---
 fs/btrfs/free-space-cache.c |   38 --
 fs/btrfs/free-space-cache.h |7 ---
 fs/btrfs/inode-map.c|   16 +---
 4 files changed, 63 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f5fbe57..e743395 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -319,7 +319,7 @@ static u64 add_new_free_space(struct 
btrfs_block_group_cache *block_group,
size = extent_start - start;
total_added += size;
ret = btrfs_add_free_space(block_group, start,
-  size);
+  size, false);
BUG_ON(ret);
start = extent_end + 1;
} else {
@@ -330,7 +330,7 @@ static u64 add_new_free_space(struct 
btrfs_block_group_cache *block_group,
if (start  end) {
size = end - start;
total_added += size;
-   ret = btrfs_add_free_space(block_group, start, size);
+   ret = btrfs_add_free_space(block_group, start, size, false);
BUG_ON(ret);
}
 
@@ -4631,7 +4631,7 @@ static int unpin_extent_range(struct btrfs_root *root, 
u64 start, u64 end)
 
if (start  cache-last_byte_to_unpin) {
len = min(len, cache-last_byte_to_unpin - start);
-   btrfs_add_free_space(cache, start, len);
+   btrfs_add_free_space(cache, start, len, false);
}
 
start += len;
@@ -4987,7 +4987,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle 
*trans,
 
WARN_ON(test_bit(EXTENT_BUFFER_DIRTY, buf-bflags));
 
-   btrfs_add_free_space(cache, buf-start, buf-len);
+   btrfs_add_free_space(cache, buf-start, buf-len, false);
btrfs_update_reserved_bytes(cache, buf-len, RESERVE_FREE);
}
 out:
@@ -5427,14 +5427,16 @@ checks:
search_start = stripe_align(root, offset);
/* move on to the next group */
if (search_start + num_bytes = search_end) {
-   btrfs_add_free_space(used_block_group, offset, 
num_bytes);
+   btrfs_add_free_space(used_block_group, offset,
+num_bytes, false);
goto loop;
}
 
/* move on to the next group */
if (search_start + num_bytes 
used_block_group-key.objectid + 
used_block_group-key.offset) {
-   btrfs_add_free_space(used_block_group, offset, 
num_bytes);
+   btrfs_add_free_space(used_block_group, offset,
+num_bytes, false);
goto loop;
}
 
@@ -5443,13 +5445,14 @@ checks:
 
if (offset  search_start)
btrfs_add_free_space(used_block_group, offset,
-search_start - offset);
+search_start - offset, false);
BUG_ON(offset  search_start);
 
ret = btrfs_update_reserved_bytes(used_block_group, num_bytes,
  alloc_type);
if (ret == -EAGAIN) {
-   btrfs_add_free_space(used_block_group, offset, 
num_bytes);
+   btrfs_add_free_space(used_block_group, offset,
+num_bytes, false);
goto loop;
}
 
@@ -5459,7 +5462,7 @@ checks:
 
if (offset  search_start)
btrfs_add_free_space(used_block_group, offset,
-search_start - offset);
+search_start - offset, false);
BUG_ON(offset  search_start);
if (used_block_group != block_group)
btrfs_put_block_group(used_block_group);
@@ -5668,6 +5671,7 @@ static int __btrfs_free_reserved_extent(struct btrfs_root 
*root,
 {
struct btrfs_block_group_cache *cache;
int ret = 0;
+   bool trimmed = false;
 
cache = btrfs_lookup_block_group(root-fs_info, start);
if (!cache) {
@@ -5676,13 +5680,16 @@ static int __btrfs_free_reserved_extent(struct 
btrfs_root *root,
return -ENOSPC

[PATCH 3/3] Btrfs: save trimmed flag onto disk

2011-12-29 Thread Li Zefan
To speed up the first fstrim after mounting the filesystem, we save the
trimmed flag to disk.

 # fstrim -v /mnt/
 /mnt/: 267714560 bytes were trimmed
 # fstrim -v /mnt/
 /mnt/: 0 bytes were trimmed
 # sync
 # umount /mnt
 # !mount
 # fstrim -v /mnt/
 /mnt/: 152240128 bytes were trimmed

Because caches for block groups smaller than 100M will not be written
to disk, we'll still have to trim them.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ctree.h|1 +
 fs/btrfs/free-space-cache.c |   19 ---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ca4eb2d..84e9ff6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -280,6 +280,7 @@ struct btrfs_chunk {
 
 #define BTRFS_FREE_SPACE_EXTENT(1  0)
 #define BTRFS_FREE_SPACE_BITMAP(1  1)
+#define BTRFS_FREE_SPACE_TRIMMED   (1  2)
 
 struct btrfs_free_space_entry {
__le64 offset;
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index cba2a94..592ba54 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -469,7 +469,7 @@ static int io_ctl_check_crc(struct io_ctl *io_ctl, int 
index)
 }
 
 static int io_ctl_add_entry(struct io_ctl *io_ctl, u64 offset, u64 bytes,
-   void *bitmap)
+   void *bitmap, bool trimmed)
 {
struct btrfs_free_space_entry *entry;
 
@@ -481,6 +481,8 @@ static int io_ctl_add_entry(struct io_ctl *io_ctl, u64 
offset, u64 bytes,
entry-bytes = cpu_to_le64(bytes);
entry-type = (bitmap) ? BTRFS_FREE_SPACE_BITMAP :
BTRFS_FREE_SPACE_EXTENT;
+   if (trimmed)
+   entry-type |= BTRFS_FREE_SPACE_TRIMMED;
io_ctl-cur += sizeof(struct btrfs_free_space_entry);
io_ctl-size -= sizeof(struct btrfs_free_space_entry);
 
@@ -669,6 +671,9 @@ int __load_free_space_cache(struct btrfs_root *root, struct 
inode *inode,
goto free_cache;
}
 
+   if (type  BTRFS_FREE_SPACE_TRIMMED)
+   e-trimmed = true;
+
if (type  BTRFS_FREE_SPACE_EXTENT) {
spin_lock(ctl-tree_lock);
ret = link_free_space(ctl, e);
@@ -899,7 +904,7 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct 
inode *inode,
entries++;
 
ret = io_ctl_add_entry(io_ctl, e-offset, e-bytes,
-  e-bitmap);
+  e-bitmap, e-trimmed);
if (ret)
goto out_nospc;
 
@@ -937,7 +942,7 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct 
inode *inode,
len = min(len, end + 1 - start);
 
entries++;
-   ret = io_ctl_add_entry(io_ctl, start, len, NULL);
+   ret = io_ctl_add_entry(io_ctl, start, len, NULL, false);
if (ret)
goto out_nospc;
 
@@ -2696,6 +2701,14 @@ int btrfs_trim_block_group(struct 
btrfs_block_group_cache *block_group,
if (update) {
spin_lock(space_info-lock);
spin_lock(block_group-lock);
+
+   if (btrfs_test_opt(fs_info-tree_root,
+   SPACE_CACHE) 
+   block_group-disk_cache_state 
+   BTRFS_DC_CLEAR);
+   block_group-disk_cache_state =
+   BTRFS_DC_CLEAR;
+   block_group-dirty = 1;
if (block_group-ro)
space_info-bytes_readonly += bytes;
block_group-reserved -= bytes;
-- 1.7.3.1 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: fstrim on BTRFS

2011-12-29 Thread Li Zefan
Fajar A. Nugraha wrote:
 On Thu, Dec 29, 2011 at 4:39 PM, Li Zefan l...@cn.fujitsu.com wrote:
 Fajar A. Nugraha wrote:
 On Wed, Dec 28, 2011 at 11:57 PM, Martin Steigerwald
 mar...@lichtvoll.de wrote:
 But BTRFS does not:

 merkaba:~ fstrim -v /
 /: 4431613952 bytes were trimmed
 merkaba:~ fstrim -v /
 /: 4341846016 bytes were trimmed

  and apparently it can't trim everything. Or maybe my kernel is
 just too old.


 $ sudo fstrim -v /
 2258165760 Bytes was trimmed

 $ df -h /
 FilesystemSize  Used Avail Use% Mounted on
 /dev/sda6  50G   34G   12G  75% /

 $ mount | grep / 
 /dev/sda6 on / type btrfs (rw,noatime,subvolid=258,compress-force=lzo)

 so only about 2G out of 12G can be trimmed. This is on kernel 3.1.4.


 That's because only free spaces in block groups will be trimmed. Btrfs
 allocates space from block groups, and when there's no space availabe,
 it will allocate a new block group from the pool. In your case there's
 ~10G in the pool.
 
 Thanks for your response.
 

 You can do a btrfs fi df /, and you'll see the total size of existing
 block groups.
 
 $ sudo btrfs fi df /
 Data: total=43.47GB, used=31.88GB
 System, DUP: total=8.00MB, used=12.00KB
 System: total=4.00MB, used=0.00
 Metadata, DUP: total=3.25GB, used=619.88MB

This is DUP, so the actual physical size is (3.25 * 2) = 6.5G

 Metadata: total=8.00MB, used=0.00
 
 That should mean existing block groups is at least 46GB, right? In

so the sum is 50G.

 which case my pool (a 50G partition) should only have about 4GB of
 space not allocated to block groups. The numbers don't seem to match.
 

The pool has been emptied, so there're other reasons that you had only
~2GB trimmed, and the possible reason is fstrim in btrfs is buggy.

I sent a fix weeks ago, which is not merged yet:

http://marc.info/?l=linux-btrfsm=132212530410572w=2


 You can empty the pool by:

# dd if=/dev/zero of=/mytmpfile bs=1M

 Then release the space (but it won't return back to the pool):

# rm /mytmpfile
# sync
 
 Is there a bad side effect of doing so? For example, since all free
 space in the pool would be allocated to data block group, would that
 mean my metadata block group is capped at 3.25GB?

You can config the ratio of data block groups and metadata block groups
via metadata_ratio= mount option.

 Or would some data
 block group can be converted to metadata, and vice versa?
 

This won't happen. Also empty block groups won't be reclaimed, but it's
in TODO list.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: fstrim on BTRFS

2011-12-28 Thread Li Zefan
Martin Steigerwald wrote:
 Hi!
 
 With 3.2-rc4 (probably earlier), Ext4 seems to remember what areas it 
 trimmed:
 
 merkaba:~ fstrim -v /boot
 /boot: 224657408 bytes were trimmed
 merkaba:~ fstrim -v /boot
 /boot: 0 bytes were trimmed
 
 
 But BTRFS does not:
 
 merkaba:~ fstrim -v /
 /: 4431613952 bytes were trimmed
 merkaba:~ fstrim -v /
 /: 4341846016 bytes were trimmed
 
 
 Is it planned to add this feature to BTRFS as well?
 

There's no such plan, but it's do-able, and I can take care of it.
There's an issue though.

Whether we want to store TRIMMED information on disk? ext4 doesn't
do this, so the first fstrim will be slow though you've done fstrim
in previous mount.

For btrfs this issue can't be solved without disk format change that
will break older kernels, but only 3.2-rcX kernels will be affected if
we push the following change into mainline before 3.2 release.

---
 ctree.h|4 ++--
 free-space-cache.c |5 +++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6738503..919e055 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -278,8 +278,8 @@ struct btrfs_chunk {
/* additional stripes go here */
 } __attribute__ ((__packed__));
 
-#define BTRFS_FREE_SPACE_EXTENT1
-#define BTRFS_FREE_SPACE_BITMAP2
+#define BTRFS_FREE_SPACE_EXTENT0
+#define BTRFS_FREE_SPACE_BITMAP1
 
 struct btrfs_free_space_entry {
__le64 offset;
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index ec23d43..8a7c0e0 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -481,6 +481,7 @@ static int io_ctl_add_entry(struct io_ctl *io_ctl, u64 
offset, u64 bytes,
entry-bytes = cpu_to_le64(bytes);
entry-type = (bitmap) ? BTRFS_FREE_SPACE_BITMAP :
BTRFS_FREE_SPACE_EXTENT;
+   entry-type =  1  entry-type;
io_ctl-cur += sizeof(struct btrfs_free_space_entry);
io_ctl-size -= sizeof(struct btrfs_free_space_entry);
 
@@ -669,7 +670,7 @@ int __load_free_space_cache(struct btrfs_root *root, struct 
inode *inode,
goto free_cache;
}
 
-   if (type == BTRFS_FREE_SPACE_EXTENT) {
+   if (type  BTRFS_FREE_SPACE_EXTENT) {
spin_lock(ctl-tree_lock);
ret = link_free_space(ctl, e);
spin_unlock(ctl-tree_lock);
@@ -679,7 +680,7 @@ int __load_free_space_cache(struct btrfs_root *root, struct 
inode *inode,
kmem_cache_free(btrfs_free_space_cachep, e);
goto free_cache;
}
-   } else {
+   } else if (type  BTRFS_FREE_SPACE_BITMAP) {
BUG_ON(!num_bitmaps);
num_bitmaps--;
e-bitmap = kzalloc(PAGE_CACHE_SIZE, GFP_NOFS);
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] xfstests: add a new test to verify on disk ctime update for chattr

2011-12-21 Thread Li Zefan
We had a bug in btrfs which can be triggered by this test.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 277 |   71 +++
 277.out |1 +
 group   |1 +
 3 files changed, 73 insertions(+), 0 deletions(-)
 create mode 100755 277
 create mode 100644 277.out

diff --git a/277 b/277
new file mode 100755
index 000..8021214
--- /dev/null
+++ b/277
@@ -0,0 +1,71 @@
+#! /bin/bash
+# FS QA Test No. 277
+#
+# Check if ctime update caused by chattr is written to disk
+#
+#---
+# Copyright (c) 2011 Fujitsu.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#
+#---
+#
+# creator
+owner=l...@cn.fujitsu.com
+
+seq=`basename $0`
+echo QA output created by $seq
+
+status=1   # failure is the default!
+
+_cleanup()
+{
+rm -f $SCRATCH_MNT/tmp*
+}
+
+trap _cleanup ; exit \$status 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+
+_scratch_mkfs  /dev/null 21
+_scratch_mount
+
+touch $SCRATCH_MNT/tmp
+_scratch_remount
+ctime1=`stat -c %z $SCRATCH_MNT/tmp`
+
+sleep 1
+chattr +A $SCRATCH_MNT/tmp
+chattr -A $SCRATCH_MNT/tmp
+ctime2=`stat -c %z $SCRATCH_MNT/tmp`
+
+_scratch_remount
+ctime3=`stat -c %z $SCRATCH_MNT/tmp`
+
+if [ $ctime1 == $ctime2 ]; then
+   echo error: ctime not updated after chattr
+elif [ $ctime1 == $ctime3 ]; then
+   echo error: on disk ctime not updated
+else
+   status=0
+fi
+
+exit
diff --git a/277.out b/277.out
new file mode 100644
index 000..9614b16
--- /dev/null
+++ b/277.out
@@ -0,0 +1 @@
+QA output created by 277
diff --git a/group b/group
index dd9f00d..99592d3 100644
--- a/group
+++ b/group
@@ -390,3 +390,4 @@ deprecated
 274 auto rw
 275 auto rw
 276 auto rw metadata
+277 auto ioctl quick metadata
-- 
1.7.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Consistent crash with cp reflink on large files

2011-12-12 Thread Li Zefan
Nik Markovic wrote:
 Hi All,
 
 I have been encountering consistent btrfs filesystem crashes when
 using cp –reflink=always on a large file and modifying it. I believe
 that the test file needs to be fairly large as I was not able to
 reproduce with smaller files. The filesystem size is 45GB and file
 size is 10GB.
 

It should be fixed in 3.1 by this commit:

commit b6f3409b2197e8fcedb43e6600e37b7cfbe0715b
Author: Sage Weil s...@newdream.net
Date:   Tue Sep 20 14:48:51 2011 -0400

Btrfs: reserve sufficient space for ioctl clone

Fix a crash/BUG_ON in the clone ioctl due to insufficient reservation. We
need to reserve space for:

 - adjusting the old extent (possibly splitting it)
 - adding the new extent
 - updating the inode
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Consistent crash with cp reflink on large files

2011-12-12 Thread Li Zefan
Nik Markovic wrote:
 Li Zefan wrote:

 Nik Markovic wrote:
 Hi All,

 I have been encountering consistent btrfs filesystem crashes when
 using cp –reflink=always on a large file and modifying it. I believe
 that the test file needs to be fairly large as I was not able to
 reproduce with smaller files. The filesystem size is 45GB and file
 size is 10GB.


 It should be fixed in 3.1 by this commit:

 commit b6f3409b2197e8fcedb43e6600e37b7cfbe0715b
 Author: Sage Weil s...@newdream.net
 Date:   Tue Sep 20 14:48:51 2011 -0400

Btrfs: reserve sufficient space for ioctl clone

Fix a crash/BUG_ON in the clone ioctl due to insufficient reservation. We
need to reserve space for:

 - adjusting the old extent (possibly splitting it)
 - adding the new extent
 - updating the inode
 Thanks Li,
 
 You are probably right. I upgraded to 3.2-rc5 today and was not able
 to reproduce with 100k iterations.
 Do you think that applying the same patch to 3.0 should fix this issue in 3.0?
 

yes, it should be fine.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG during btrfs device delete missing

2011-12-10 Thread Li Zefan
 On Thu, Dec 08, 2011 at 11:06:47AM -0800, David Marcin wrote:
 raid10 metadata and data filesystem.  dmesg log follows.  The system
 is unable to unmount the filesystem after this occurs.

 Filesystem mounted at/mnt/btrfs with -o compress,degraded
 Command: btrfs device delete missing /mnt/btrfs

 [  283.398222] [ cut here ]
 [  283.398289] kernel BUG at 
 /home/apw/COD/linux/fs/btrfs/transaction.c:1329!
 
 So this crash means we failed to write all the blocks required to commit
 the transaction.  The reason is that we're getting failed bios to the
 missing device, and that failure isn't properly eaten by the
 raid aware endio code.
 
 If you pull the top commit from my for-linus branch, it should all work.
 
 I know you've got a big FS here, I haven't tested this on raid10 yet,
 only raid1.  If you want to wait a bit for safety I'll do a raid10 run
 too.
 

The fix looks good to me, and I've tested it on raid10.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix ctime update of on-disk inode

2011-12-08 Thread Li Zefan
To reproduce the bug:

# touch /mnt/tmp
# stat /mnt/tmp | grep Change
Change: 2011-12-09 09:32:23.412105981 +0800
# chattr +i /mnt/tmp
# stat /mnt/tmp | grep Change
Change: 2011-12-09 09:32:43.198105295 +0800
# umount /mnt
# mount /dev/loop1 /mnt
# stat /mnt/tmp | grep Change
Change: 2011-12-09 09:32:23.412105981 +0800

We should update ctime of in-memory inode before calling
btrfs_update_inode().

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ioctl.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 72d4616..40eaa9f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -252,11 +252,11 @@ static int btrfs_ioctl_setflags(struct file *file, void 
__user *arg)
trans = btrfs_join_transaction(root);
BUG_ON(IS_ERR(trans));
 
+   btrfs_update_iflags(inode);
+   inode-i_ctime = CURRENT_TIME;
ret = btrfs_update_inode(trans, root, inode);
BUG_ON(ret);
 
-   btrfs_update_iflags(inode);
-   inode-i_ctime = CURRENT_TIME;
btrfs_end_transaction(trans, root);
 
mnt_drop_write(file-f_path.mnt);
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix ctime update of on-disk inode

2011-12-08 Thread Li Zefan
Christoph Hellwig wrote:
 On Fri, Dec 09, 2011 at 09:40:35AM +0800, Li Zefan wrote:
 To reproduce the bug:

 # touch /mnt/tmp
 # stat /mnt/tmp | grep Change
 Change: 2011-12-09 09:32:23.412105981 +0800
 # chattr +i /mnt/tmp
 # stat /mnt/tmp | grep Change
 Change: 2011-12-09 09:32:43.198105295 +0800
 # umount /mnt
 # mount /dev/loop1 /mnt
 # stat /mnt/tmp | grep Change
 Change: 2011-12-09 09:32:23.412105981 +0800

 We should update ctime of in-memory inode before calling
 btrfs_update_inode().
 
 Can you submit this one as a test for xfstests?
 

Sure.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix possible deadlock when opening a seed device

2011-12-06 Thread Li Zefan
The correct lock order is uuid_mutex - volume_mutex - chunk_mutex,
but when we mount a filesystem which has backing seed devices, we have
this lock chain:

open_ctree()
lock(chunk_mutex);
read_chunk_tree();
read_one_dev();
open_seed_devices();
lock(uuid_mutex);

and then we hit a lockdep splat.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c |2 --
 fs/btrfs/volumes.c |9 +++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 94abc25..b2d103f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2262,9 +2262,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
   (unsigned long)btrfs_header_chunk_tree_uuid(chunk_root-node),
   BTRFS_UUID_SIZE);
 
-   mutex_lock(fs_info-chunk_mutex);
ret = btrfs_read_chunk_tree(chunk_root);
-   mutex_unlock(fs_info-chunk_mutex);
if (ret) {
printk(KERN_WARNING btrfs: failed to read chunk tree on %s\n,
   sb-s_id);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c37433d..51b8a8d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3562,7 +3562,7 @@ static int open_seed_devices(struct btrfs_root *root, u8 
*fsid)
struct btrfs_fs_devices *fs_devices;
int ret;
 
-   mutex_lock(uuid_mutex);
+   BUG_ON(!mutex_is_locked(uuid_mutex));
 
fs_devices = root-fs_info-fs_devices-seed;
while (fs_devices) {
@@ -3600,7 +3600,6 @@ static int open_seed_devices(struct btrfs_root *root, u8 
*fsid)
fs_devices-seed = root-fs_info-fs_devices-seed;
root-fs_info-fs_devices-seed = fs_devices;
 out:
-   mutex_unlock(uuid_mutex);
return ret;
 }
 
@@ -3743,6 +3742,9 @@ int btrfs_read_chunk_tree(struct btrfs_root *root)
if (!path)
return -ENOMEM;
 
+   mutex_lock(uuid_mutex);
+   lock_chunks(root);
+
/* first we search for all of the device items, and then we
 * read in all of the chunk items.  This way we can create chunk
 * mappings that reference all of the devices that are afound
@@ -3793,6 +3795,9 @@ again:
}
ret = 0;
 error:
+   unlock_chunks(root);
+   mutex_unlock(uuid_mutex);
+
btrfs_free_path(path);
return ret;
 }
-- 1.7.3.1 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix oops when calling statfs on readonly device

2011-11-28 Thread Li Zefan
To reproduce this bug:

  # dd if=/dev/zero of=img bs=1M count=256
  # mkfs.btrfs img
  # losetup -r /dev/loop1 img
  # mount /dev/loop1 /mnt
  OOPS!!

It triggered BUG_ON(!nr_devices) in btrfs_calc_avail_data_space().

To fix this, instead of checking write-only devices, we check all open
deivces:

  # df -h /dev/loop1
  FilesystemSize  Used Avail Use% Mounted on
  /dev/loop1250M   28K  238M   1% /mnt

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/super.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 8bd9d6d..1a3ce9e 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1083,7 +1083,7 @@ static int btrfs_calc_avail_data_space(struct btrfs_root 
*root, u64 *free_bytes)
int i = 0, nr_devices;
int ret;
 
-   nr_devices = fs_info-fs_devices-rw_devices;
+   nr_devices = fs_info-fs_devices-open_devices;
BUG_ON(!nr_devices);
 
devices_info = kmalloc(sizeof(*devices_info) * nr_devices,
@@ -1105,8 +1105,8 @@ static int btrfs_calc_avail_data_space(struct btrfs_root 
*root, u64 *free_bytes)
else
min_stripe_size = BTRFS_STRIPE_LEN;
 
-   list_for_each_entry(device, fs_devices-alloc_list, dev_alloc_list) {
-   if (!device-in_fs_metadata)
+   list_for_each_entry(device, fs_devices-devices, dev_list) {
+   if (!device-in_fs_metadata || !device-bdev)
continue;
 
avail_space = device-total_bytes - device-bytes_used;
-- 1.7.3.1 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: check if the to-be-added device is writable

2011-11-28 Thread Li Zefan
If we call ioctl(BTRFS_IOC_ADD_DEV) directly, we'll succeed in adding
a readonly device to a btrfs filesystem, and btrfs will write to
that device, emitting kernel errors:

[ 3109.833692] lost page write due to I/O error on loop2
[ 3109.833720] lost page write due to I/O error on loop2
...

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/volumes.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c37433d..0a8c8f8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1611,7 +1611,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char 
*device_path)
if ((sb-s_flags  MS_RDONLY)  !root-fs_info-fs_devices-seeding)
return -EINVAL;
 
-   bdev = blkdev_get_by_path(device_path, FMODE_EXCL,
+   bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
  root-fs_info-bdev_holder);
if (IS_ERR(bdev))
return PTR_ERR(bdev);
-- 1.7.3.1 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] btrfs: btrfs_calc_avail_data_space cope with no read/write devices V2

2011-11-28 Thread Li Zefan
Andy Whitcroft wrote:
 On Mon, Nov 28, 2011 at 06:11:06AM -0600, Jonathan Nieder wrote:
 Hi,

 Andy Whitcroft wrote:

 When we mount a btrfs filesystem from read-only media there will be no
 read/write devices; for example mounting an SD card with its lock enabled.
 This triggers an immediate BUG during mount:

   kernel BUG at .../fs/btrfs/super.c:984!
 [...]
 BugLink: http://bugs.launchpad.net/bugs/816770
 Signed-off-by: Andy Whitcroft a...@canonical.com
 ---

 Fixes http://bugs.debian.org/649847
 Tested-by: Ivan Vilata i Balaguer i...@selidor.net

 This patch fixes a regression introduced by 6d07bcec969a (btrfs: fix
 wrong free space information of btrfs), which hit mainline in
 v2.6.38-rc1.  Josef Bacik acked the patch, but I can't seem to find it
 in linux-next, linux-btrfs, or Josef's btrfs-work tree.  Maybe it was
 just forgotten.

 What can I do to help this patch get unstuck?
 
 I had completely forgotten about this one.  I wonder where it did get
 to.  I seem to remember testing this in house successfully.
 

This patch has the same problem with your previous one, that it will set
f_bavail to 0. I've sent out a new patch yesterday.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/20] Here's my current btrfs patchset

2011-11-28 Thread Li Zefan
Alexandre Oliva wrote:
 The first 11 patches are relatively simple fixes or improvements that
 I suppose go could make it even in 3.2 (02 is particularly essential
 to avoid progressive performance degradation and metadata space waste
 in the default clustered allocation strategy).
 

I think 02 (especially!) and 04 are good candidates for 3.2, and others
are all improvements to me, that can wait until next merge window.

 Patch 12 and its complement 15, and also 19, are debugging aids that
 helped me track down the problem fixed in 02.
 
 Patch 13 is a revised version of the larger-clusters patch I posted
 before, that adds a microoptimization to the bitmap computations to
 the earlier version.
 
 Patches 14 to 20 are probably not suitable for inclusion, and are
 provided only for reference, although I'm still undecided on 16: it
 seems to me to make sense to stick to the ordered list and index
 instead of jumping to the current cluster's block group, but it may
 also make sense performance-wise to start at the current cluster and
 advance from there.  We still do that, as long as we find a cluster
 to begin with, but I'm yet to double check on the race that causes
 multiple subsequent releases/creation of clusters under heavy load.
 I'm sure I saw it, and I no longer do, but now I'm no longer sure
 whether this is the patch that fixed it, or about the details of how
 we came about that scenario.
 
 Patches 14, 17, 18 and 20 were posted before, and I'm probably dropping
 them from future patchsets unless I find them to be still useful.
 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] Btrfs: rewrite btrfs_trim_block_group()

2011-11-24 Thread Li Zefan
There are various bugs in block group trimming:

- It may trim from offset smaller than user-specified offset.
- It may trim beyond user-specified range.
- It may leak free space for extents smaller than specified minlen.
- It may truncate the last trimmed extent thus leak free space.
- With mixed extents+bitmaps, some extents may not be trimmed.
- With mixed extents+bitmaps, some bitmaps may not be trimmed (even
none will be trimmed). Even for those trimmed, not all the free space
in the bitmaps will be trimmed.

I rewrite btrfs_trim_block_group() and break it into two functions.
One is to trim extents only, and the other is to trim bitmaps only.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---

v2: fix possible uninitialized return value

---
 fs/btrfs/free-space-cache.c |  235 ++-
 1 files changed, 164 insertions(+), 71 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 6e5b7e4..bd2a3cd 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2586,17 +2586,57 @@ void btrfs_init_free_cluster(struct btrfs_free_cluster 
*cluster)
cluster-block_group = NULL;
 }
 
-int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
-  u64 *trimmed, u64 start, u64 end, u64 minlen)
+static int do_trimming(struct btrfs_block_group_cache *block_group,
+  u64 *total_trimmed, u64 start, u64 bytes,
+  u64 reserved_start, u64 reserved_bytes)
 {
-   struct btrfs_free_space_ctl *ctl = block_group-free_space_ctl;
-   struct btrfs_free_space *entry = NULL;
+   struct btrfs_space_info *space_info = block_group-space_info;
struct btrfs_fs_info *fs_info = block_group-fs_info;
-   u64 bytes = 0;
-   u64 actually_trimmed;
-   int ret = 0;
+   int ret;
+   int update = 0;
+   u64 trimmed = 0;
 
-   *trimmed = 0;
+   spin_lock(space_info-lock);
+   spin_lock(block_group-lock);
+   if (!block_group-ro) {
+   block_group-reserved += reserved_bytes;
+   space_info-bytes_reserved += reserved_bytes;
+   update = 1;
+   }
+   spin_unlock(block_group-lock);
+   spin_unlock(space_info-lock);
+
+   ret = btrfs_error_discard_extent(fs_info-extent_root,
+start, bytes, trimmed);
+   if (!ret)
+   *total_trimmed += trimmed;
+
+   btrfs_add_free_space(block_group, reserved_start, reserved_bytes);
+
+   if (update) {
+   spin_lock(space_info-lock);
+   spin_lock(block_group-lock);
+   if (block_group-ro)
+   space_info-bytes_readonly += reserved_bytes;
+   block_group-reserved -= reserved_bytes;
+   space_info-bytes_reserved -= reserved_bytes;
+   spin_unlock(space_info-lock);
+   spin_unlock(block_group-lock);
+   }
+
+   return ret;
+}
+
+static int trim_no_bitmap(struct btrfs_block_group_cache *block_group,
+ u64 *total_trimmed, u64 start, u64 end, u64 minlen)
+{
+   struct btrfs_free_space_ctl *ctl = block_group-free_space_ctl;
+   struct btrfs_free_space *entry;
+   struct rb_node *node;
+   int ret = 0;
+   u64 extent_start;
+   u64 extent_bytes;
+   u64 bytes;
 
while (start  end) {
spin_lock(ctl-tree_lock);
@@ -2607,81 +2647,118 @@ int btrfs_trim_block_group(struct 
btrfs_block_group_cache *block_group,
}
 
entry = tree_search_offset(ctl, start, 0, 1);
-   if (!entry)
-   entry = tree_search_offset(ctl,
-  offset_to_bitmap(ctl, start),
-  1, 1);
-
-   if (!entry || entry-offset = end) {
+   if (!entry) {
spin_unlock(ctl-tree_lock);
break;
}
 
-   if (entry-bitmap) {
-   ret = search_bitmap(ctl, entry, start, bytes);
-   if (!ret) {
-   if (start = end) {
-   spin_unlock(ctl-tree_lock);
-   break;
-   }
-   bytes = min(bytes, end - start);
-   bitmap_clear_bits(ctl, entry, start, bytes);
-   if (entry-bytes == 0)
-   free_bitmap(ctl, entry);
-   } else {
-   start = entry-offset + BITS_PER_BITMAP *
-   block_group-sectorsize;
+   /* skip bitmaps */
+   while (entry-bitmap) {
+   node = rb_next(entry-offset_index

Re: [PATCH] Btrfs: rewrite btrfs_trim_block_group()

2011-11-23 Thread Li Zefan
David Sterba wrote:
 On Thu, Nov 17, 2011 at 03:26:17PM +0800, Li Zefan wrote:
 diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
 index 8c32434..89cc54e 100644
 --- a/fs/btrfs/free-space-cache.c
 +++ b/fs/btrfs/free-space-cache.c
 +static int trim_no_bitmap(struct btrfs_block_group_cache *block_group,
 +  u64 *total_trimmed, u64 start, u64 end, u64 minlen)
 +{
 +struct btrfs_free_space_ctl *ctl = block_group-free_space_ctl;
 +struct btrfs_free_space *entry;
 +struct rb_node *node;
 +int ret;
 int ret = 0;
 
   CC [M]  fs/btrfs/free-space-cache.o
 fs/btrfs/free-space-cache.c: In function trim_no_bitmap:
 fs/btrfs/free-space-cache.c:2636: warning: ret may be used uninitialized in 
 this function
 
 Although it seems the uninitialized value could never be returned.
 

Actually it's possible. The odd thing is my gcc doesn't complain about this.

Will fix. Thanks!

--
Li Zefan
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] 254: avoid output mismatch due to space cache

2011-11-21 Thread Li Zefan
This reverts commit a0c92a5871082c0aa6a7caae496e67a6e57bb0b6 (disable
space cache), as option nospace_cache was newly introduced in linux-3.2,
so we'll fail to mount btrfs in older kernels.

As an alternative fix, we just list subvolme names, don't assume
what ID numbers those subvolumes will have.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 254 |4 ++--
 254.out |4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/254 b/254
index 1cd4832..7b74a02 100755
--- a/254
+++ b/254
@@ -48,7 +48,7 @@ _supported_os Linux
 _require_scratch
 
 _scratch_mkfs  /dev/null 21
-_scratch_mount -o nospace_cache
+_scratch_mount
 
 # First test basic snapshotting
 echo Creating file foo in root dir
@@ -95,7 +95,7 @@ ls $SCRATCH_MNT
 
 # Test listing the subvolumes
 echo Listing subvolumes
-btrfs subvolume list $SCRATCH_MNT | _filter_scratch
+btrfs subvolume list $SCRATCH_MNT | awk '{ print $NF }'
 
 # Delete the snapshot
 btrfs subvolume delete $SCRATCH_MNT/snap | _filter_scratch
diff --git a/254.out b/254.out
index 582357a..d4b5346 100644
--- a/254.out
+++ b/254.out
@@ -31,8 +31,8 @@ List root dir
 snap
 subvol
 Listing subvolumes
-ID 256 top level 5 path snap
-ID 257 top level 5 path subvol
+snap
+subvol
 Delete subvolume 'SCRATCH_MNT/snap'
 List root dir
 subvol
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] 254: disable space cache

2011-11-20 Thread Li Zefan
Hidetoshi Seto wrote:
 (2011/11/18 17:43), Li Zefan wrote:
 I can't pass 254, and below is the output:

 254 3s ... - output mismatch (see 254.out.bad)
 ...
  ID 256 top level 5 path snap
 -ID 257 top level 5 path subvol
 +ID 258 top level 5 path subvol

 When space cache is enabled (and now mkfs.btrfs always enables it),
 there will be some space cache inodes in the root tree, and they
 consume some IDs, and that's why subvol has the ID 258 but not 257.

 Just disable space cache for this test case.

 Signed-off-by: Li Zefan l...@cn.fujitsu.com
 ---
  254 |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/254 b/254
 index 5216120..1cd4832 100755
 --- a/254
 +++ b/254
 @@ -48,7 +48,7 @@ _supported_os Linux
  _require_scratch
  
  _scratch_mkfs  /dev/null 21
 -_scratch_mount
 +_scratch_mount -o nospace_cache
  
  # First test basic snapshotting
  echo Creating file foo in root dir
 
 I got following error on fedora 16 with your patch:
 

This is an alternative fix:



[PATCH] 254: avoid output mismatch due to space cache

I can't pass 254, and below is the output:

254 3s ... - output mismatch (see 254.out.bad)
...
 ID 256 top level 5 path snap
-ID 257 top level 5 path subvol
+ID 258 top level 5 path subvol

When space cache is enabled (and now mkfs.btrfs always enables it),
there will be some space cache inodes in the root tree, and they
consume some IDs, and that's why subvol has the ID 258 but not 257.

Just list subvolume names, don't assume what ID numbers those subvolumes
will have.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 254 |2 +-
 254.out |4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/254 b/254
index 5216120..7b74a02 100755
--- a/254
+++ b/254
@@ -95,7 +95,7 @@ ls $SCRATCH_MNT
 
 # Test listing the subvolumes
 echo Listing subvolumes
-btrfs subvolume list $SCRATCH_MNT | _filter_scratch
+btrfs subvolume list $SCRATCH_MNT | awk '{ print $NF }'
 
 # Delete the snapshot
 btrfs subvolume delete $SCRATCH_MNT/snap | _filter_scratch
diff --git a/254.out b/254.out
index 582357a..d4b5346 100644
--- a/254.out
+++ b/254.out
@@ -31,8 +31,8 @@ List root dir
 snap
 subvol
 Listing subvolumes
-ID 256 top level 5 path snap
-ID 257 top level 5 path subvol
+snap
+subvol
 Delete subvolume 'SCRATCH_MNT/snap'
 List root dir
 subvol
-- 
1.7.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] 254: disable space cache

2011-11-18 Thread Li Zefan
I can't pass 254, and below is the output:

254 3s ... - output mismatch (see 254.out.bad)
...
 ID 256 top level 5 path snap
-ID 257 top level 5 path subvol
+ID 258 top level 5 path subvol

When space cache is enabled (and now mkfs.btrfs always enables it),
there will be some space cache inodes in the root tree, and they
consume some IDs, and that's why subvol has the ID 258 but not 257.

Just disable space cache for this test case.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 254 |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/254 b/254
index 5216120..1cd4832 100755
--- a/254
+++ b/254
@@ -48,7 +48,7 @@ _supported_os Linux
 _require_scratch
 
 _scratch_mkfs  /dev/null 21
-_scratch_mount
+_scratch_mount -o nospace_cache
 
 # First test basic snapshotting
 echo Creating file foo in root dir
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] 254: fix to pass subvolid=0 to set default btrfs subvolme

2011-11-17 Thread Li Zefan
The usage is 'btrfs subvolume set-default id path', not
'path path'.

The code happens to work because strotoull(path) returns 0, but it
will fail if in the future we check the argument more strict in
btrfs-progs.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 254 |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/254 b/254
index 6320291..5216120 100755
--- a/254
+++ b/254
@@ -78,7 +78,7 @@ ls $SCRATCH_MNT/subvol
 echo Creating file bar in subvol
 dd if=/dev/zero of=$SCRATCH_MNT/subvol/bar bs=1M count=1  /dev/null
 echo Setting subvol to the default
-btrfs subvolume set-default $SCRATCH_MNT/subvol $SCRATCH_MNT/subvol | 
_filter_scratch
+btrfs subvolume set-default 0 $SCRATCH_MNT/subvol | _filter_scratch
 _scratch_remount
 echo List root dir which is now subvol
 ls $SCRATCH_MNT
@@ -88,7 +88,7 @@ _scratch_mount -o subvolid=0
 echo List root dir
 ls $SCRATCH_MNT
 echo Setting the root dir as the default again
-btrfs subvolume set-default $SCRATCH_MNT $SCRATCH_MNT | _filter_scratch
+btrfs subvolume set-default 0 $SCRATCH_MNT | _filter_scratch
 _scratch_remount
 echo List root dir
 ls $SCRATCH_MNT
-- 1.7.3.1 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/2] Btrfs: fix to search one more bitmap for cluster setup

2011-11-16 Thread Li Zefan
Suppose there are two bitmaps [0, 256], [256, 512] and one extent
[100, 120] in the free space cache, and we want to setup a cluster
with offset=100, bytes=50.

In this case, there will be only one bitmap [256, 512] in the temporary
bitmaps list, and then setup_cluster_bitmap() won't search bitmap [0, 256].

The cause is, the list is constructed in setup_cluster_no_bitmap(),
and only bitmaps with bitmap_entry-offset = offset will be added
into the list, and the very bitmap that convers offset has
bitmap_entry-offset = offset.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---

v2: fix a NULL pointer deref.

---
 fs/btrfs/free-space-cache.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 181760f..8f792f4 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2453,11 +2453,23 @@ setup_cluster_bitmap(struct btrfs_block_group_cache 
*block_group,
struct btrfs_free_space *entry;
struct rb_node *node;
int ret = -ENOSPC;
+   u64 bitmap_offset = offset_to_bitmap(ctl, offset);
 
if (ctl-total_bitmaps == 0)
return -ENOSPC;
 
/*
+* The bitmap that covers offset won't be in the list unless offset
+* is just its start offset.
+*/
+   entry = list_first_entry(bitmaps, struct btrfs_free_space, list);
+   if (entry-offset != bitmap_offset) {
+   entry = tree_search_offset(ctl, bitmap_offset, 1, 0);
+   if (entry  list_empty(entry-list))
+   list_add(entry-list, bitmaps);
+   }
+
+   /*
 * First check our cached list of bitmaps and see if there is an entry
 * here that will work.
 */
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: rewrite btrfs_trim_block_group()

2011-11-16 Thread Li Zefan
There are various bugs in block group trimming:

- It may trim from offset smaller than user-specified offset.
- It may trim beyond user-specified range.
- It may leak free space for extents smaller than specified minlen.
- It may truncate the last trimmed extent thus leak free space.
- With mixed extents+bitmaps, some extents may not be trimmed.
- With mixed extents+bitmaps, some bitmaps may not be trimmed (even
none will be trimmed). Even for those trimmed, not all the free space
in the bitmaps will be trimmed.

I rewrite btrfs_trim_block_group() and break it into two functions.
One is to trim extents only, and the other is to trim bitmaps only.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/free-space-cache.c |  235 ++-
 1 files changed, 164 insertions(+), 71 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 8c32434..89cc54e 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2575,17 +2575,57 @@ void btrfs_init_free_cluster(struct btrfs_free_cluster 
*cluster)
cluster-block_group = NULL;
 }
 
-int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
-  u64 *trimmed, u64 start, u64 end, u64 minlen)
+static int do_trimming(struct btrfs_block_group_cache *block_group,
+  u64 *total_trimmed, u64 start, u64 bytes,
+  u64 reserved_start, u64 reserved_bytes)
 {
-   struct btrfs_free_space_ctl *ctl = block_group-free_space_ctl;
-   struct btrfs_free_space *entry = NULL;
+   struct btrfs_space_info *space_info = block_group-space_info;
struct btrfs_fs_info *fs_info = block_group-fs_info;
-   u64 bytes = 0;
-   u64 actually_trimmed;
-   int ret = 0;
+   int ret;
+   int update = 0;
+   u64 trimmed = 0;
 
-   *trimmed = 0;
+   spin_lock(space_info-lock);
+   spin_lock(block_group-lock);
+   if (!block_group-ro) {
+   block_group-reserved += reserved_bytes;
+   space_info-bytes_reserved += reserved_bytes;
+   update = 1;
+   }
+   spin_unlock(block_group-lock);
+   spin_unlock(space_info-lock);
+
+   ret = btrfs_error_discard_extent(fs_info-extent_root,
+start, bytes, trimmed);
+   if (!ret)
+   *total_trimmed += trimmed;
+
+   btrfs_add_free_space(block_group, reserved_start, reserved_bytes);
+
+   if (update) {
+   spin_lock(space_info-lock);
+   spin_lock(block_group-lock);
+   if (block_group-ro)
+   space_info-bytes_readonly += reserved_bytes;
+   block_group-reserved -= reserved_bytes;
+   space_info-bytes_reserved -= reserved_bytes;
+   spin_unlock(space_info-lock);
+   spin_unlock(block_group-lock);
+   }
+
+   return ret;
+}
+
+static int trim_no_bitmap(struct btrfs_block_group_cache *block_group,
+ u64 *total_trimmed, u64 start, u64 end, u64 minlen)
+{
+   struct btrfs_free_space_ctl *ctl = block_group-free_space_ctl;
+   struct btrfs_free_space *entry;
+   struct rb_node *node;
+   int ret;
+   u64 extent_start;
+   u64 extent_bytes;
+   u64 bytes;
 
while (start  end) {
spin_lock(ctl-tree_lock);
@@ -2596,81 +2636,118 @@ int btrfs_trim_block_group(struct 
btrfs_block_group_cache *block_group,
}
 
entry = tree_search_offset(ctl, start, 0, 1);
-   if (!entry)
-   entry = tree_search_offset(ctl,
-  offset_to_bitmap(ctl, start),
-  1, 1);
-
-   if (!entry || entry-offset = end) {
+   if (!entry) {
spin_unlock(ctl-tree_lock);
break;
}
 
-   if (entry-bitmap) {
-   ret = search_bitmap(ctl, entry, start, bytes);
-   if (!ret) {
-   if (start = end) {
-   spin_unlock(ctl-tree_lock);
-   break;
-   }
-   bytes = min(bytes, end - start);
-   bitmap_clear_bits(ctl, entry, start, bytes);
-   if (entry-bytes == 0)
-   free_bitmap(ctl, entry);
-   } else {
-   start = entry-offset + BITS_PER_BITMAP *
-   block_group-sectorsize;
+   /* skip bitmaps */
+   while (entry-bitmap) {
+   node = rb_next(entry-offset_index);
+   if (!node

[PATCH 2/2] Btrfs: avoid unnecessary bitmap search for cluster setup

2011-11-15 Thread Li Zefan
setup_cluster_no_bitmap() searches all the extents and bitmaps starting
from offset. Therefore if it returns -ENOSPC, all the bitmaps starting
from offset are in the bitmaps list, so it's sufficient to search from
this list in setup_cluser_bitmap().

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/free-space-cache.c |   42 --
 1 files changed, 4 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index df3bc22..584ef14 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2451,7 +2451,6 @@ setup_cluster_bitmap(struct btrfs_block_group_cache 
*block_group,
 {
struct btrfs_free_space_ctl *ctl = block_group-free_space_ctl;
struct btrfs_free_space *entry;
-   struct rb_node *node;
int ret = -ENOSPC;
u64 bitmap_offset = offset_to_bitmap(ctl, offset);
 
@@ -2469,10 +2468,6 @@ setup_cluster_bitmap(struct btrfs_block_group_cache 
*block_group,
list_add(entry-list, bitmaps);
}
 
-   /*
-* First check our cached list of bitmaps and see if there is an entry
-* here that will work.
-*/
list_for_each_entry(entry, bitmaps, list) {
if (entry-bytes  min_bytes)
continue;
@@ -2483,38 +2478,10 @@ setup_cluster_bitmap(struct btrfs_block_group_cache 
*block_group,
}
 
/*
-* If we do have entries on our list and we are here then we didn't find
-* anything, so go ahead and get the next entry after the last entry in
-* this list and start the search from there.
+* The bitmaps list has all the bitmaps that record free space
+* starting after offset, so no more search is required.
 */
-   if (!list_empty(bitmaps)) {
-   entry = list_entry(bitmaps-prev, struct btrfs_free_space,
-  list);
-   node = rb_next(entry-offset_index);
-   if (!node)
-   return -ENOSPC;
-   entry = rb_entry(node, struct btrfs_free_space, offset_index);
-   goto search;
-   }
-
-   entry = tree_search_offset(ctl, offset_to_bitmap(ctl, offset), 0, 1);
-   if (!entry)
-   return -ENOSPC;
-
-search:
-   node = entry-offset_index;
-   do {
-   entry = rb_entry(node, struct btrfs_free_space, offset_index);
-   node = rb_next(entry-offset_index);
-   if (!entry-bitmap)
-   continue;
-   if (entry-bytes  min_bytes)
-   continue;
-   ret = btrfs_bitmap_cluster(block_group, entry, cluster, offset,
-  bytes, min_bytes);
-   } while (ret  node);
-
-   return ret;
+   return -ENOSPC;
 }
 
 /*
@@ -2532,8 +2499,8 @@ int btrfs_find_space_cluster(struct btrfs_trans_handle 
*trans,
 u64 offset, u64 bytes, u64 empty_size)
 {
struct btrfs_free_space_ctl *ctl = block_group-free_space_ctl;
-   struct list_head bitmaps;
struct btrfs_free_space *entry, *tmp;
+   LIST_HEAD(bitmaps);
u64 min_bytes;
int ret;
 
@@ -2572,7 +2539,6 @@ int btrfs_find_space_cluster(struct btrfs_trans_handle 
*trans,
goto out;
}
 
-   INIT_LIST_HEAD(bitmaps);
ret = setup_cluster_no_bitmap(block_group, cluster, bitmaps, offset,
  bytes, min_bytes);
if (ret)
-- 1.7.3.1 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/8] btrfs: added helper functions to iterate backrefs

2011-11-02 Thread Li Zefan
(as this is going to be merged into mainline..)

 +/*
 + * calls iterate() for every inode that references the extent identified by
 + * the given parameters. will use the path given as a parameter and return it
 + * released.
 + * when the iterator function returns a non-zero value, iteration stops.
 + */
 +int iterate_extent_inodes(struct btrfs_fs_info *fs_info,
 + struct btrfs_path *path,
 + u64 extent_item_objectid,
 + u64 extent_offset,
 + iterate_extent_inodes_t *iterate, void *ctx)

While trying to use this API, I found there's a big defect in this function.

   fs_tree 1   fs_tree 2
   \  /
\/
 \  /
  \/
 node
  |
  |
 leaf  (EXTENT_DATA item)

In the above case, the function will find only 1 reference.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs fi defrag -c

2011-10-27 Thread Li Zefan
Stephane Chazelas wrote:
 I don't quite understand the behavior of btrfs fi defrag
 
 ~# truncate -s2G ~/a
 ~# mkfs.btrfs ~/a
 nodesize 4096 leafsize 4096 sectorsize 4096 size 2.00GB
 ~# mount -o loop ~/a /mnt/1
 /mnt/1# cd x
 /mnt/1# df -h .
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/loop1  2.0G   64K  1.8G   1% /mnt/1
 /mnt/1# yes | head -c400M  a
 /mnt/1# df -h .
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/loop1  2.0G   64K  1.8G   1% /mnt/1
 /mnt/1# sync
 /mnt/1# df -h .
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/loop1  2.0G  402M  1.4G  23% /mnt/1
 /mnt/1# btrfs fi defrag -c a
 
 (exit status == 20 BTW).
 

int do_defrag(int ac, char **av)
{
...
return errors + 20;
}

This doesn't make sense to me.

 (20)/mnt/1# sync
 /mnt/1# df -h .
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/loop1  2.0G  415M  994M  30% /mnt/1
 
 No space gain, even lost 15M or 400M depending on how you look at it.
 

Here's mine:

# df . -h
FilesystemSize  Used Avail Use% Mounted on
/home/lizf/tmp/a  2.0G  409M  1.4G  23% /mnt

And I was not suprised, as there's a regression.

With this fix:

http://marc.info/?l=linux-btrfsm=131495014823121w=2

# df . -h
FilesystemSize  Used Avail Use% Mounted on
/home/lizf/tmp/a  2.0G   14M  1.8G   1% /mnt
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs fi defrag -c

2011-10-27 Thread Li Zefan
 When the FS is mounted with compress:
 
 ~# mkfs.btrfs ./a
 nodesize 4096 leafsize 4096 sectorsize 4096 size 2.00GB
 ~# mount -o compress ./a /mnt/1
 ~# cd /mnt/1
 /mnt/1# yes | head -c400M  a
 /mnt/1# sync
 /mnt/1# df -h .
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/loop1  2.0G   14M  1.8G   1% /mnt/1
 /mnt/1# btrfs fi defrag -c ./a
 (20)/mnt/1# sync
 /mnt/1# df -h .
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/loop1  2.0G   21M  1.4G   2% /mnt/1
 
 Lost 400M?
 
 /mnt/1# btrfs fi defrag ./a
 (20)/mnt/1# sync
 /mnt/1# df -h .
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/loop1  2.0G   21M  1.4G   2% /mnt/1
 
 I take it it doesn't uncompress?
 
 I'm a bit confused here.
 

Yes, it won't be uncompressed if you mount with compress option.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] Btrfs: fix defragmentation regression

2011-10-18 Thread Li Zefan
16:48, Dan Merillat wrote:
 On Fri, Sep 2, 2011 at 4:42 AM, Christoph Hellwig h...@infradead.org wrote:
 On Fri, Sep 02, 2011 at 03:56:25PM +0800, Li Zefan wrote:
 There's an off-by-one bug:

   # create a file with lots of 4K file extents
   # btrfs fi defrag /mnt/file
   # sync
   # filefrag -v /mnt/file
   Filesystem type is: 9123683e
   File size of /mnt/file is 1228800 (300 blocks, blocksize 4096)
ext logical physical expected length flags
  0   0 3372  64
  1  64 3136 3435  1
  2  65 3436 3136 64
  3 129 3201 3499  1
  4 130 3500 3201 64
  5 194 3266 3563  1
  6 195 3564 3266 64
  7 259 3331 3627  1
  8 260 3628 3331 40 eof

 After this patch:

 Can you please create an xfstests testcase for this?
 
 Did this fix get lost?  I don't see it in git, and defragmenting a
 file still results in 10x as many fragments as it started with.
 (3.1-rc9)
 

No, it's queued for 3.2, but I think it's a good candidate for 3.1.x.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix recursive auto-defrag

2011-10-05 Thread Li Zefan
Follow those steps:

  # mount -o autodefrag /dev/sda7 /mnt
  # dd if=/dev/urandom of=/mnt/tmp bs=200K count=1
  # sync
  # dd if=/dev/urandom of=/mnt/tmp bs=8K count=1 conv=notrunc

and then it'll go into a loop: writeback - defrag - writeback ...

It's because writeback writes [8K, 200K] and then writes [0, 8K].

I tried to make writeback know if the pages are dirtied by defrag,
but the patch was a bit intrusive. Here I simply set writeback_index
when we defrag a file.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ioctl.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 970977a..7a10f94 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1047,6 +1047,13 @@ int btrfs_defrag_file(struct inode *inode, struct file 
*file,
if (!max_to_defrag)
max_to_defrag = last_index - 1;
 
+   /*
+* make writeback starts from i, so the defrag range can be
+* written sequentially.
+*/
+   if (i  inode-i_mapping-writeback_index)
+   inode-i_mapping-writeback_index = i;
+
while (i = last_index  defrag_count  max_to_defrag) {
/*
 * make sure we stop running if someone unmounts
-- 1.7.3.1 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: File compression control, again.

2011-09-27 Thread Li Zefan
01:17, Artem worte:
 Hi!
 
 So, it makes sense to keep the compression on by default
 and with LZO many people are going there.
 
 But, I want to create a database on a compressed btrfs filesystem
 which is seek-heavy rather than throughput-heavy
 and I really want to turn the compression off just for that database
 (smaller I/O reads, more precise cache usage).
 
 Maintenance nightmare that Miguel mentioned
 ( http://thread.gmane.org/gmane.comp.file-systems.btrfs/1665/focus=1669 )
 isn't really a problem as I'm going to automatically set the flags
 from the program, just before opening the database.
 
 Looking at fs/btrfs/ioctl.h I don't see any way to to this.
 Subvolume compression control isn't working either from what I heard
 (cf. https://bbs.archlinux.org/viewtopic.php?id=126635
  http://thread.gmane.org/gmane.comp.file-systems.btrfs/6237/ ).
 
 Am I right that fine-grained compression control is no longer supported by 
 btrfs?
 If so, I would like to vote for it to be added.
 

See this Per file/directory controls for COW and compression:

http://marc.info/?l=linux-btrfsm=130078867208491w=2

And the user tool patch (which got no reply):

http://marc.info/?l=linux-btrfsm=130311215721242w=2

So you can create a directory, and set the no-compress flag for it, and
then any file created in that dir will inherit the flag.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel bug

2011-09-25 Thread Li Zefan
Roman Kapusta wrote:
 Segfault and kernel bug after reflinking several files (20-30 files)
 each of size around 1GB through following command:
 
 cp -r --reflink sorce_dir .
 Segmentation fault
 
 I'm on Fedora 15, kernel 3.0.4

This has been fixed in Linux 3.1-rc7.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Inefficient storing of ISO images with compress=lzo

2011-09-19 Thread Li Zefan
07:06, Maciej Marcin Piechotka wrote:
 On Mon, 2011-09-19 at 10:53 +0800, Li Zefan wrote:
 Maciej Marcin Piechotka wrote:
 I've noticed that:

  - with x86-64 Fedora 15 DVD install images:
- du -sh ROOT VOLUME was 36 GB
- btrfs df | grep -i data have shown over 40 GB used
  - without
- du -sh ROOT VOLUME is 34 GB
- btrfs df | grep -i data have shown less then 34 GB used

 It seems that iso files are considered compressable while they may not be 
 (and penalty is severe - 3x).


 With compress option specified, btrfs will try to compress the file, at most
 128K at one time, and if the compressed result is not smaller, the file will
 be marked as uncompressable.

 I just tried with Fedora-14-i386-DVD.iso, and the first 896K is compressed,
 with a compress ratio about 71.7%, and the remaining data is not compressed.

 --
 Li Zefan
 
 Just a question from person who don't know how btrfs operates - what if
 the beginning of file is well compressable and the rest is not?
 

It's explained in the previous mail - the beginning part will be compressed,
and the rest will not.

 In any case the compression was my uneducated guess where is missing
 4GB.
 

It probably has nothing to do with compression. You can try without 
compress=lzo,
and see if the issue still exists.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Inefficient storing of ISO images with compress=lzo

2011-09-18 Thread Li Zefan
Maciej Marcin Piechotka wrote:
 I've noticed that:
 
  - with x86-64 Fedora 15 DVD install images:
- du -sh ROOT VOLUME was 36 GB
- btrfs df | grep -i data have shown over 40 GB used
  - without
- du -sh ROOT VOLUME is 34 GB
- btrfs df | grep -i data have shown less then 34 GB used
 
 It seems that iso files are considered compressable while they may not be 
 (and penalty is severe - 3x).
 

With compress option specified, btrfs will try to compress the file, at most
128K at one time, and if the compressed result is not smaller, the file will
be marked as uncompressable.

I just tried with Fedora-14-i386-DVD.iso, and the first 896K is compressed,
with a compress ratio about 71.7%, and the remaining data is not compressed.

--
Li Zefan
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix pages truncation in btrfs_ioctl_clone()

2011-09-18 Thread Li Zefan
 Also move the function before locking extent state.
 
 Hmm, any reason?  i_mutex protects us from a racing write(2), but what 
 about a racing mmap()?  e.g.
 
 cloner: truncates dest pages
 writer: mmap - page_mkwrite locks extent, creates new dirty page, unlocks
 cloner: locks extent, clones, unlocks extent
 

(besides Chris' comments)

How can we avoid the race on dst file by locking src file extent state..

--
Li Zefan
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Inefficient storing of ISO images with compress=lzo

2011-09-18 Thread Li Zefan
Li Zefan wrote:
 Maciej Marcin Piechotka wrote:
 I've noticed that:

  - with x86-64 Fedora 15 DVD install images:
- du -sh ROOT VOLUME was 36 GB
- btrfs df | grep -i data have shown over 40 GB used
  - without
- du -sh ROOT VOLUME is 34 GB
- btrfs df | grep -i data have shown less then 34 GB used

 It seems that iso files are considered compressable while they may not be 
 (and penalty is severe - 3x).

 
 With compress option specified, btrfs will try to compress the file, at most
 128K at one time, and if the compressed result is not smaller, the file will
 be marked as uncompressable.
 
 I just tried with Fedora-14-i386-DVD.iso, and the first 896K is compressed,
 with a compress ratio about 71.7%, and the remaining data is not compressed.
 

correct: the compression ratio is 38.3%, pretty good :)
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix pages truncation in btrfs_ioctl_clone()

2011-09-16 Thread Li Zefan
It's a bug in commit f81c9cdc567cd3160ff9e64868d9a1a7ee226480
(Btrfs: truncate pages from clone ioctl target range)

We should pass the dest range to the truncate function, but not the
src range.

Also move the function before locking extent state.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ioctl.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5492bb3..f6af8b0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2237,6 +2237,10 @@ static noinline long btrfs_ioctl_clone(struct file 
*file, unsigned long srcfd,
goto out_unlock;
}
 
+   /* truncate page cache pages from target inode range */
+   truncate_inode_pages_range(inode-i_data, destoff,
+  PAGE_CACHE_ALIGN(destoff + len) - 1);
+
/* do any pending delalloc/csum calc on src, one way or
   another, and lock file content */
while (1) {
@@ -2253,10 +2257,6 @@ static noinline long btrfs_ioctl_clone(struct file 
*file, unsigned long srcfd,
btrfs_wait_ordered_range(src, off, len);
}
 
-   /* truncate page cache pages from target inode range */
-   truncate_inode_pages_range(inode-i_data, off,
-  ALIGN(off + len, PAGE_CACHE_SIZE) - 1);
-
/* clone data */
key.objectid = btrfs_ino(src);
key.type = BTRFS_EXTENT_DATA_KEY;
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Btrfs: don't make a file partly checksummed through file clone

2011-09-15 Thread Li Zefan
David Sterba wrote:
 On Wed, Sep 14, 2011 at 01:25:21PM +0800, Li Zefan wrote:
 It's because part of the file is checksummed and the other part is not,
 and then btrfs will complain checksum is not found when we read the file.

 Disallow file clone if src and dst file have different checksum flag,
 so we ensure a file is completely checksummed or unchecksummed.
 
 Your fix prevents the bug, but I don't think it's good to let file clone
 fail without any other message. ret is set to -EINVAL at the time of
 'goto out_fput', which is fine, but the user has no clue what happened
 or how to fix it.

While I agree with you on this comment..

 
 The nodatasum status is recorded in inode flags and remains like that
 regardless of the 'mount -o nodatasum', persistent and de facto
 unchangable (unless the file is created again with the opposite nodatasum
 mount). Even more, the user has no way to find out nodatasum flag of
 any inode/file (the corresponding FS_NODATASUM_FL is not there).
 
 My suggestion how to fix this:
 1. add FS_NODATASUM_FL file flag and code to set/get via setflags ioctl

This means we can have a file partly checksummed, which is what we want
to avoid in this patch.

 2. [this patch to skip cloning in case of nodatasum flag mismatch]
 3. ... add a printk why it failed

I don't think this is a good idea.

 
 The user then has at least option to drop/add the nodatasum flag for one of
 the. Unfortunatelly this makes file cloning less straightforward.
 

I don't know if Chris has plan on finer-grained checksum (not per file
but per extent), if yes, we can eliminate this constraint in file cloning
in the future.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Btrfs: don't make a file partly checksummed through file clone

2011-09-13 Thread Li Zefan
To reproduce the bug:

  # mount /dev/sda7 /mnt
  # dd if=/dev/zero of=/mnt/src bs=4K count=1
  # umount /mnt

  # mount -o nodatasum /dev/sda7 /mnt
  # dd if=/dev/zero of=/mnt/dst bs=4K count=1
  # clone_range -s 4K -l 4K /mnt/src /mnt/dst

  # echo 3  /proc/sys/vm/drop_caches
  # cat /mnt/dst
  # dmesg
  ...
  btrfs no csum found for inode 258 start 0
  btrfs csum failed ino 258 off 0 csum 2566472073 private 0

It's because part of the file is checksummed and the other part is not,
and then btrfs will complain checksum is not found when we read the file.

Disallow file clone if src and dst file have different checksum flag,
so we ensure a file is completely checksummed or unchecksummed.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ioctl.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 970977a..dc82bbb 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2177,6 +2177,11 @@ static noinline long btrfs_ioctl_clone(struct file 
*file, unsigned long srcfd,
if (!(src_file-f_mode  FMODE_READ))
goto out_fput;
 
+   /* don't make the dst file partly checksummed */
+   if ((BTRFS_I(src)-flags  BTRFS_INODE_NODATASUM) !=
+   (BTRFS_I(inode)-flags  BTRFS_INODE_NODATASUM))
+   goto out_fput;
+
ret = -EISDIR;
if (S_ISDIR(src-i_mode) || S_ISDIR(inode-i_mode))
goto out_fput;
-- 1.7.3.1 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Btrfs: don't change inode flag of the dest clone file

2011-09-13 Thread Li Zefan
The dst file will have the same inode flags with dst file after
file clone, and I think it's unexpected.

For example, the dst file will suddenly become immutable after
getting some share of data with src file, if the src is immutable.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ioctl.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index dc82bbb..a401514 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2434,7 +2434,6 @@ static noinline long btrfs_ioctl_clone(struct file *file, 
unsigned long srcfd,
if (endoff  inode-i_size)
btrfs_i_size_write(inode, endoff);
 
-   BTRFS_I(inode)-flags = BTRFS_I(src)-flags;
ret = btrfs_update_inode(trans, root, inode);
BUG_ON(ret);
btrfs_end_transaction(trans, root);
-- 1.7.3.1 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: trivial fix, a potential memory leak in btrfs_parse_early_options()

2011-09-13 Thread Li Zefan
14:06, Jeff Liu wrote:
 Signed-off-by: Jie Liu jeff@oracle.com
 
 ---
  fs/btrfs/super.c |   10 --
  1 files changed, 8 insertions(+), 2 deletions(-)
 
 diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
 index 15634d4..16f31e1 100644
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -406,7 +406,7 @@ static int btrfs_parse_early_options(const char *options, 
 fmode_t flags,
  u64 *subvol_rootid, struct btrfs_fs_devices **fs_devices)
  {
  substring_t args[MAX_OPT_ARGS];
 -char *opts, *orig, *p;
 +char *device_name, *opts, *orig, *p;

Seems your email client replaced tabs with spaces.

Please read Documentation/email-clients.txt

  int error = 0;
  int intarg;
 
 @@ -457,8 +457,14 @@ static int btrfs_parse_early_options(const char 
 *options, fmode_t flags,
  }
  break;
  case Opt_device:
 -error = btrfs_scan_one_device(match_strdup(args[0]),
 +device_name = match_strdup(args[0]);
 +if (!device_name) {
 +error = -ENOMEM;
 +goto out_free_opts;
 +}
 +error = btrfs_scan_one_device(device_name,
  flags, holder, fs_devices);
 +kfree(device_name);
  if (error)
  goto out_free_opts;
  break;
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Verifying TRIM support on SSDs when using BtrFS

2011-09-08 Thread Li Zefan
(resend with linux-btrfs Cced)

 Hi All,
 
 My company is considering using BtrFS for some of our production
 systems, but before we can use it, I must positively verify that the
 TRIM command is being issued to the SSDs. 營 attempted to verify TRIM
 first using examples that work for Ext4
 http://andyduffell.com/techblog/?p=852, but I had to make changes to
 the procedure to work with BtrFS.
 
 The server is Ubuntu 11.04 server 64-bit release (mkfs.btrfs version
 0.19). I have installed the Linux 3.0.0 kernel as the BtrFS changelog
 states that bulk TRIM is not available in the kernel shipped with
 Ubuntu 11.04 (2.6.38).
 
 My testing methodology:
 
 * Manually TRIM the disks before starting: for i in {0..10} ; do let
 A=$i * 65536 ; hdparm --trim-sector-ranges $A:65535
 --please-destroy-my-drive /dev/sda ; done
 * Verify the drive was TRIM'd: ./sectors.pl |grep + | tee sectors-$(date +%s)
 * Partition the drive: fdisk /dev/sda
 * Make the file system: mkfs.btrfs /dev/sda1
 * Mount: sudo mount -t btrfs -o ssd /dev/sda1 /mnt
 * Create a file: dd if=/dev/urandom of=/mnt/testfile bs=1k count=5
 oflag=direct
 * Verify the file is on the disk: ./sectors.pl | tee sectors-$(date +%s)
 * Delete the test file: rm /mnt/testfile
 * See that the test file is TRIM'd from the disk: ./sectors.pl | tee
 sectors-$(date +%s)
 * Verify the TRIM'd blocks: diff the two most recent sectors-* files
 
 At this point, the pre-delete and post delete verifications still show
 the same disk blocks in use. I should instead see a reduction in the
 number of in use blocks. Waiting an hour (in case it takes a while for
 the TRIM command to be issued) after the test file is deleted still
 shows the same blocks in use.
 
 I have also tried mounting with the -o ssd,discard options, but that
 doesn't seem to help at all.
 
 I also posted this question at
 http://serverfault.com/questions/307397/verify-trim-support-with-btrfs-on-ssd
 but a good answer has yet to surface.  If you are interested, you can
 see my sectors.pl and a little more detail at that URL.
 
 Is my testing methodology flawed?  Am I missing something here?
 Thanks for your help.
 

Have you tried to use this methodology to test ext4? So you can verify if
it's flawed.

I used a much simpler way to test this, and I found btrfs works fine:

  # mount -t btrfs -o discard /dev/sdc5 /mnt
  # dd if=/dev/urandom of=/mnt/tmp bs=1M count=5
  # sync
  # hexdump /mnt/tmp
  000 064c ded5 c386 4721 060c 13e8 b090 b4a0
  ...
  # hexdump /dev/sdc5 | grep '064c ded5'
  0c0 064c ded5 c386 4721 060c 13e8 b090 b4a0

Now verify the discard feature:

  # rm /mnt/tmp
  # sync
  # echo 3  /proc/sys/vm/drop_caches
  # hexdump /dev/sdc5 | grep '064c ded5'
  (grep returns nothing, as expected)

So there's nothing wrong in btrfs.

Try the above test but without discard option, and you'll see the sectors
will not be zeroed.

--
Li Zefan

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: remove BUG_ON() in compress_file_range()

2011-09-07 Thread Li Zefan
It's not a big deal if we fail to allocate the array, and instead of
panic we can just give up compressing.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/inode.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0ccc743..63b4fc0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -393,7 +393,10 @@ again:
 (BTRFS_I(inode)-flags  BTRFS_INODE_COMPRESS))) {
WARN_ON(pages);
pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS);
-   BUG_ON(!pages);
+   if (!pages) {
+   /* just bail out to the uncompressed code */
+   goto cont;
+   }
 
if (BTRFS_I(inode)-force_compress)
compress_type = BTRFS_I(inode)-force_compress;
@@ -424,6 +427,7 @@ again:
will_compress = 1;
}
}
+cont:
if (start == 0) {
trans = btrfs_join_transaction(root);
BUG_ON(IS_ERR(trans));
-- 1.7.3.1 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix direct-io vs nodatacow

2011-09-07 Thread Li Zefan
To reproduce the bug:

  # mount -o nodatacow /dev/sda7 /mnt/
  # dd if=/dev/zero of=/mnt/tmp bs=4K count=1
  1+0 records in
  1+0 records out
  4096 bytes (4.1 kB) copied, 0.000136115 s, 30.1 MB/s
  # dd if=/dev/zero of=/mnt/tmp bs=4K count=1 conv=notrunc oflag=direct
  dd: writing `/mnt/tmp': Input/output error
  1+0 records in
  0+0 records out

btrfs_ordered_update_i_size() may return 1, but btrfs_endio_direct_write()
mistakenly takes it as an error.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/inode.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0ccc743..3bd35fe 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5785,8 +5785,7 @@ again:
if (test_bit(BTRFS_ORDERED_NOCOW, ordered-flags)) {
ret = btrfs_ordered_update_i_size(inode, 0, ordered);
if (!ret)
-   ret = btrfs_update_inode(trans, root, inode);
-   err = ret;
+   err = btrfs_update_inode(trans, root, inode);
goto out;
}
 
-- 1.7.3.1 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix array bound checking

2011-09-06 Thread Li Zefan
Otherwise we can execced the array bound of path-slots[].

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ctree.c |   10 ++
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 011cab3..0fe615e 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -902,9 +902,10 @@ static noinline int balance_level(struct 
btrfs_trans_handle *trans,
 
orig_ptr = btrfs_node_blockptr(mid, orig_slot);
 
-   if (level  BTRFS_MAX_LEVEL - 1)
+   if (level  BTRFS_MAX_LEVEL - 1) {
parent = path-nodes[level + 1];
-   pslot = path-slots[level + 1];
+   pslot = path-slots[level + 1];
+   }
 
/*
 * deal with the case where there is only one pointer in the root
@@ -1107,9 +1108,10 @@ static noinline int push_nodes_for_insert(struct 
btrfs_trans_handle *trans,
mid = path-nodes[level];
WARN_ON(btrfs_header_generation(mid) != trans-transid);
 
-   if (level  BTRFS_MAX_LEVEL - 1)
+   if (level  BTRFS_MAX_LEVEL - 1) {
parent = path-nodes[level + 1];
-   pslot = path-slots[level + 1];
+   pslot = path-slots[level + 1];
+   }
 
if (!parent)
return 1;
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -rc6] btrfs: xattr: fix attribute removal

2011-09-06 Thread Li Zefan
David Sterba wrote:
 An attribute is not removed by 'setfattr -x attr file' and remains
 visible in attr list. This makes xfstests/062 pass again.
 
 Signed-off-by: David Sterba dste...@suse.cz

Reviewed-by: Li Zefan l...@cn.fujitsu.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] Btrfs: fix defragmentation regression

2011-09-02 Thread Li Zefan
There's an off-by-one bug:

  # create a file with lots of 4K file extents
  # btrfs fi defrag /mnt/file
  # sync
  # filefrag -v /mnt/file
  Filesystem type is: 9123683e
  File size of /mnt/file is 1228800 (300 blocks, blocksize 4096)
   ext logical physical expected length flags
 0   0 3372  64
 1  64 3136 3435  1
 2  65 3436 3136 64
 3 129 3201 3499  1
 4 130 3500 3201 64
 5 194 3266 3563  1
 6 195 3564 3266 64
 7 259 3331 3627  1
 8 260 3628 3331 40 eof

After this patch:

  ...
  # filefrag -v /mnt/file
  Filesystem type is: 9123683e
  File size of /mnt/file is 1228800 (300 blocks, blocksize 4096)
   ext logical physical expected length flags
 0   0 3372 300 eof
  /mnt/file: 1 extent found

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ioctl.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 970977a..31fe6d4 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1081,7 +1081,6 @@ int btrfs_defrag_file(struct inode *inode, struct file 
*file,
 
defrag_count += ret;
balance_dirty_pages_ratelimited_nr(inode-i_mapping, ret);
-   i += ret;
 
if (newer_than) {
if (newer_off == (u64)-1)
@@ -1101,7 +1100,10 @@ int btrfs_defrag_file(struct inode *inode, struct file 
*file,
break;
}
} else {
-   i++;
+   if (ret  0)
+   i += ret;
+   else
+   i++;
}
}
 
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] Btrfs: use i_size_read() in btrfs_defrag_file()

2011-09-02 Thread Li Zefan
Don't use inode-i_size directly, since we're not holding i_mutex.

This also fixes another bug, that i_size can change after it's checked
against 0 and then (i_size - 1) can be negative.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ioctl.c |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 31fe6d4..6f2b257 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -972,6 +972,7 @@ int btrfs_defrag_file(struct inode *inode, struct file 
*file,
struct btrfs_super_block *disk_super;
struct file_ra_state *ra = NULL;
unsigned long last_index;
+   u64 isize = i_size_read(inode);
u64 features;
u64 last_len = 0;
u64 skip = 0;
@@ -997,7 +998,7 @@ int btrfs_defrag_file(struct inode *inode, struct file 
*file,
compress_type = range-compress_type;
}
 
-   if (inode-i_size == 0)
+   if (isize == 0)
return 0;
 
/*
@@ -1022,10 +1023,10 @@ int btrfs_defrag_file(struct inode *inode, struct file 
*file,
 
/* find the last page to defrag */
if (range-start + range-len  range-start) {
-   last_index = min_t(u64, inode-i_size - 1,
+   last_index = min_t(u64, isize - 1,
 range-start + range-len - 1)  PAGE_CACHE_SHIFT;
} else {
-   last_index = (inode-i_size - 1)  PAGE_CACHE_SHIFT;
+   last_index = (isize - 1)  PAGE_CACHE_SHIFT;
}
 
if (newer_than) {
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] Btrfs: fix wrong max_to_defrag in btrfs_defrag_file()

2011-09-02 Thread Li Zefan
It's off-by-one, and thus we may skip the last page while defragmenting.

An example case:

  # create /mnt/file with 2 4K file extents
  # btrfs fi defrag /mnt/file
  # sync
  # filefrag /mnt/file
  /mnt/file: 2 extents found

So it's not defragmented.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ioctl.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 6f2b257..57aa5b7 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1046,7 +1046,7 @@ int btrfs_defrag_file(struct inode *inode, struct file 
*file,
i = range-start  PAGE_CACHE_SHIFT;
}
if (!max_to_defrag)
-   max_to_defrag = last_index - 1;
+   max_to_defrag = last_index;
 
while (i = last_index  defrag_count  max_to_defrag) {
/*
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] Btrfs: honor extent thresh during defragmentation

2011-09-02 Thread Li Zefan
We won't defrag an extent, if it's bigger than the threshold we
specified and there's no small extent before it, but actually
the code doesn't work this way.

There are three bugs:

- When should_defrag_range() decides we should keep on defragmenting
  an extent, last_len is not incremented. (old bug)

- The length that passes to should_defrag_range() is not the length
  we're going to defrag. (new bug)

- We always defrag 256K bytes data, and a big extent can be part of
  this range. (new bug)

For a file with 4 extents:

| 4K | 4K | 256K | 256K |

The result of defrag with (the default) 256K extent thresh should be:

| 264K | 256K |

but with those bugs, we'll get:

| 520K |

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ioctl.c |   37 ++---
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 57aa5b7..90d7904 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -760,7 +760,7 @@ static int should_defrag_range(struct inode *inode, u64 
start, u64 len,
int ret = 1;
 
/*
-* make sure that once we start defragging and extent, we keep on
+* make sure that once we start defragging an extent, we keep on
 * defragging it
 */
if (start  *defrag_end)
@@ -805,7 +805,6 @@ static int should_defrag_range(struct inode *inode, u64 
start, u64 len,
 * extent will force at least part of that big extent to be defragged.
 */
if (ret) {
-   *last_len += len;
*defrag_end = extent_map_end(em);
} else {
*last_len = 0;
@@ -978,13 +977,14 @@ int btrfs_defrag_file(struct inode *inode, struct file 
*file,
u64 skip = 0;
u64 defrag_end = 0;
u64 newer_off = range-start;
-   int newer_left = 0;
unsigned long i;
+   unsigned long ra_index = 0;
int ret;
int defrag_count = 0;
int compress_type = BTRFS_COMPRESS_ZLIB;
int extent_thresh = range-extent_thresh;
-   int newer_cluster = (256 * 1024)  PAGE_CACHE_SHIFT;
+   int max_cluster = (256 * 1024)  PAGE_CACHE_SHIFT;
+   int cluster = max_cluster;
u64 new_align = ~((u64)128 * 1024 - 1);
struct page **pages = NULL;
 
@@ -1014,7 +1014,7 @@ int btrfs_defrag_file(struct inode *inode, struct file 
*file,
ra = file-f_ra;
}
 
-   pages = kmalloc(sizeof(struct page *) * newer_cluster,
+   pages = kmalloc(sizeof(struct page *) * max_cluster,
GFP_NOFS);
if (!pages) {
ret = -ENOMEM;
@@ -1039,7 +1039,6 @@ int btrfs_defrag_file(struct inode *inode, struct file 
*file,
 * the extents in the file evenly spaced
 */
i = (newer_off  new_align)  PAGE_CACHE_SHIFT;
-   newer_left = newer_cluster;
} else
goto out_ra;
} else {
@@ -1071,12 +1070,26 @@ int btrfs_defrag_file(struct inode *inode, struct file 
*file,
i = max(i + 1, next);
continue;
}
+
+   if (!newer_than) {
+   cluster = (PAGE_CACHE_ALIGN(defrag_end) 
+  PAGE_CACHE_SHIFT) - i;
+   cluster = min(cluster, max_cluster);
+   } else {
+   cluster = max_cluster;
+   }
+
if (range-flags  BTRFS_DEFRAG_RANGE_COMPRESS)
BTRFS_I(inode)-force_compress = compress_type;
 
-   btrfs_force_ra(inode-i_mapping, ra, file, i, newer_cluster);
+   if (i + cluster  ra_index) {
+   ra_index = max(i, ra_index);
+   btrfs_force_ra(inode-i_mapping, ra, file, ra_index,
+  cluster);
+   ra_index += max_cluster;
+   }
 
-   ret = cluster_pages_for_defrag(inode, pages, i, newer_cluster);
+   ret = cluster_pages_for_defrag(inode, pages, i, cluster);
if (ret  0)
goto out_ra;
 
@@ -1096,15 +1109,17 @@ int btrfs_defrag_file(struct inode *inode, struct file 
*file,
if (!ret) {
range-start = newer_off;
i = (newer_off  new_align)  PAGE_CACHE_SHIFT;
-   newer_left = newer_cluster;
} else {
break;
}
} else {
-   if (ret  0)
+   if (ret  0) {
i += ret;
-   else
+   last_len += ret  PAGE_CACHE_SHIFT;
+   } else {
i

Re: [PATCH] btrfs: fix memory leak in btrfs_defrag_file

2011-09-01 Thread Li Zefan
Diego Calleja wrote:
 kmemleak found this:
 unreferenced object 0x8801b64af968 (size 512):
   comm btrfs-cleaner, pid 3317, jiffies 4306810886 (age 903.272s)
   hex dump (first 32 bytes):
 00 82 01 07 00 ea ff ff c0 83 01 07 00 ea ff ff  
 80 82 01 07 00 ea ff ff c0 87 01 07 00 ea ff ff  
   backtrace:
 [816875cc] kmemleak_alloc+0x5c/0xc0
 [8114aec3] kmem_cache_alloc_trace+0x163/0x240
 [8127a290] btrfs_defrag_file+0xf0/0xb20
 [8125d9a5] btrfs_run_defrag_inodes+0x165/0x210
 [812479d7] cleaner_kthread+0x177/0x190
 [81075c7d] kthread+0x8d/0xa0
 [816af5f4] kernel_thread_helper+0x4/0x10
 [] 0x
 
 pages is not always freed. Fix it removing the unnecesary additional return.
 
 Signed-off-by: Diego Calleja dieg...@gmail.com
 

Reviewed-by: Li Zefan l...@cn.fujitsu.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: check file extent backref offset underflow

2011-08-28 Thread Li Zefan
Yan, Zheng wrote:
 Offset field in data extent backref can underflow if clone range ioctl
 is used. We can reliably detect the underflow because max file size is
 limited to 2^63 and max data extent size is limited by block group size.
 
 Signed-off-by: Zheng Yan  zheng.z@intel.com

Tested-by: Li Zefan l...@cn.fujitsu.com

...
 @@ -3323,8 +3323,11 @@ static int find_data_references(struct reloc_control 
 *rc,
   }
  
   key.objectid = ref_objectid;
 - key.offset = ref_offset;
   key.type = BTRFS_EXTENT_DATA_KEY;
 + if (ref_offset  ((u64)-1  32))
 + key.offset = 0;
 + else
 + key.offset = ref_offset;

This needs comment, as we're working around a corner case and a magic number is
used.

  
   path-search_commit_root = 1;
   path-skip_locking = 1;
 --
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   4   >