[PATCH] Btrfs: fix double decrease of the writer counter

2012-09-17 Thread Miao Xie
__btrfs_end_transaction() invokes sb_end_intwrite(), but if it then needs to
run btrfs_commit_transaction(), the writer counter is decreased twice,
because btrfs_commit_transaction() also invokes sb_end_intwrite().
Fix it.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/transaction.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 27c2600..3134fdc 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -551,8 +551,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	btrfs_trans_release_metadata(trans, root);
 	trans->block_rsv = NULL;
 
-	sb_end_intwrite(root->fs_info->sb);
-
 	if (lock && !atomic_read(&root->fs_info->open_ioctl_trans) &&
 	    should_end_transaction(trans, root)) {
 		trans->transaction->blocked = 1;
@@ -573,6 +571,8 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 		}
 	}
 
+	sb_end_intwrite(root->fs_info->sb);
+
 	WARN_ON(cur_trans != info->running_transaction);
 	WARN_ON(atomic_read(&cur_trans->num_writers) < 1);
 	atomic_dec(&cur_trans->num_writers);
-- 
1.7.6.5
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
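
Editorial sketch: the imbalance described in the patch above can be modelled in
plain userspace C. This is not kernel code. The struct and every model_* name
are hypothetical stand-ins, and the sketch assumes that the commit path returns
early from __btrfs_end_transaction(), which the posted hunks suggest but do not
show.

```c
/* Hypothetical stand-in for the superblock's internal-writer count. */
struct sb_model {
	int int_writers;
};

static void model_start_intwrite(struct sb_model *sb) { sb->int_writers++; }
static void model_end_intwrite(struct sb_model *sb)   { sb->int_writers--; }

/* Models a commit that drops the writer reference on its own. */
static int model_commit(struct sb_model *sb)
{
	model_end_intwrite(sb);
	return 0;
}

/* Ordering before the patch: release first, then possibly commit. */
static int model_end_transaction_old(struct sb_model *sb, int need_commit)
{
	model_end_intwrite(sb);
	if (need_commit)
		return model_commit(sb);	/* second release */
	return 0;
}

/* Ordering after the patch: the commit path returns before the release. */
static int model_end_transaction_new(struct sb_model *sb, int need_commit)
{
	if (need_commit)
		return model_commit(sb);
	model_end_intwrite(sb);
	return 0;
}

/* Runs one start/end cycle and returns the leftover count (0 is balanced). */
static int model_balance(int use_new, int need_commit)
{
	struct sb_model sb = { 0 };

	model_start_intwrite(&sb);
	if (use_new)
		model_end_transaction_new(&sb, need_commit);
	else
		model_end_transaction_old(&sb, need_commit);
	return sb.int_writers;
}
```

With the old ordering, a cycle that ends in a commit leaves the model counter
below zero; with the new ordering every cycle returns to zero.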


Re: Oops with a degraded volume

2012-09-17 Thread Liu Bo
On 09/15/2012 10:17 PM, Antoine Sirinelli wrote:
> Hi,
>
> I have experienced a very reproducible Oops within the btrfs driver. On
> Linux 3.5.4, if I mount a volume with the option "degraded" because
> one of the devices is missing, I get an Oops when I unmount it (or
> even before). You can see attached the kernel log.
>

Thanks for the report.  And this has been fixed by

commit 99f5944b8477914406173b47b4f261356286730b
Btrfs: do not strdup non existent strings

You can find this commit in 3.6.0-rc5. :)

thanks,
liubo

> Here is how I create my btrfs volume:
>
> # mkfs.btrfs /dev/vdb /dev/vdc
> # mount /dev/vdb /mnt
> # dd if=/dev/zero of=/mnt/zeros count=1M
> # umount /mnt
> # shutdown -h now
>
> I am then wiping one volume (/dev/vdc) and restarting the system. To
> get a crash, here is what I am doing:
>
> # mount -o degraded /dev/vdb /mnt
> # umount /mnt
>
> I recognise the volume is not usable after having erased one drive but I
> would expect the kernel not to crash in such circumstances. I am not an
> expert, I am just reporting a crash from a user's point of view.
>
> Antoine
>





Experiences: Why BTRFS had to yield for ZFS

2012-09-17 Thread Casper Bang
Abstract
For database testing purposes, a COW filesystem was needed in order to
facilitate snapshotting and rollback, so as to provide mirrors of
our production database at fixed intervals (every night and on
demand).

Platform
An HP Proliant 380P (2x Intel Xeon E5-2620 with 12 cores for a total
of 24 threads) with built-in Smart Array SAS/SATA (Gen8) controllers
was combined with 10x consumer Samsung 830 512GB SSD (SATAIII, 6Gb/s).
Oracle (Unbreakable) Linux x64 2.6.39-200.29.3.el6uek.x86_64 #1 SMP
Tue Aug 28 13:03:31 EDT 2012 and Oracle database standard edition
10.2.0.4 64bit.

Setup
The OS was installed on the first disk (sda) and the remaining 9 (sdb - sdj)
were pooled into some 4.4TB for holding Oracle datafiles. An
initial backup of the 1.5TB prod database would get restored as
a (shut down) sync instance on the test server on the COW filesystem.
A script on the test server would then apply Oracle archive files
from the production environment to this Oracle sync database every
10 minutes, effectively keeping it nearly up to date with production.
The most reliable way to do this was with a simple NFS mount (rather
than rsync or samba). The idea was that it would then be very fast
and easy to make a new snapshot of the sync database, start it up, and
voila, you'd have a new instance ready to play with. A desktop machine
with ext4 partitions gave a lower bound for applying archivelog
data of around 1200 kb/s - we expected an order of magnitude higher
performance on the server.

BTRFS experiences
We used native BTRFS from the kernel, with atime off and ssd mode. BTRFS
proved to be very fast at reading for a large TRDBMS (2x speedup
compared to a SAN). However, applying archivelog on a BTRFS filesystem
scaled poorly: it started out with a decent apply rate but
eventually dropped to around 400-500 kb/s. BTRFS had to be
abandoned because of this, since the script would never finish
applying archivelogs before new ones arrived. The desktop machine with
traditional spinning drives formatted for BTRFS showed similar
behaviour, so hardware (server, controller and disks) was excluded as a
cause.

ZFS experiences
We then tried using ZFS via custom-built SPL/ZFS 0.6.0-rc10 modules
with recordsize equal to that of the Oracle database (8K); compression
off, quota off, dedup off, checksum on and atime on.
ZFS proved to be on par with a SAN when it comes to reading for a
large TRDBMS. Thankfully, ZFS did not degrade much in archivelog apply
performance, and proved to have a lower bound of 15MB/s.

Conclusion
We had hoped to be able to utilize BTRFS, due to its license and
inclusion in the Linux mainline kernel. However, for practical
purposes, we're not able to make use of BTRFS due to its performance
when writing - especially considering this is even without mixing in
snapshotting. While ZFS doesn't give us quite the boost in read
performance we had expected from SSDs, it seems more optimized for
writing and will allow us to complete our project of getting clones
of a production database environment up and running in a snap.

Take it for what it's worth: a couple of developers' experiences with
BTRFS. We are not likely to go back and change things now that it works,
but we are curious as to why we see such big differences between the
two file-systems. Any comments and/or feedback appreciated.

Regards,
Jesper and Casper


Re: Experiences: Why BTRFS had to yield for ZFS

2012-09-17 Thread Ralf Hildebrandt
* Casper Bang <casper.b...@gmail.com>:

> Oracle (Unbreakable) Linux x64 2.6.39-200.29.3.el6uek.x86_64 #1 SMP

And the btrfs was that from vanilla 2.6.39 (i.e. over a year old)?

-- 
Ralf Hildebrandt                   Charite Universitätsmedizin Berlin
ralf.hildebra...@charite.de        Campus Benjamin Franklin
http://www.charite.de              Hindenburgdamm 30, 12203 Berlin
Geschäftsbereich IT, Abt. Netzwerk fon: +49-30-450.570.155


[PATCH 1/2 v3] Btrfs: use flag EXTENT_DEFRAG for snapshot-aware defrag

2012-09-17 Thread Liu Bo
We're going to use the EXTENT_DEFRAG flag to indicate which ranges
belong to defragmentation so that we can implement snapshot-aware defrag:

We set the EXTENT_DEFRAG flag when dirtying the extents that need to be
defragmented, so that later on the writeback thread can differentiate between
normal writeback and writeback started by defragmentation.

This patch is used for the latter one.

Original patch by Li Zefan <l...@cn.fujitsu.com>
Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/extent_io.c |8 
 fs/btrfs/extent_io.h |2 ++
 fs/btrfs/file.c  |4 ++--
 fs/btrfs/inode.c |   20 
 fs/btrfs/ioctl.c |8 
 5 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4c87847..604e404 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1144,6 +1144,14 @@ int set_extent_delalloc(struct extent_io_tree *tree, u64 start, u64 end,
 			      NULL, cached_state, mask);
 }
 
+int set_extent_defrag(struct extent_io_tree *tree, u64 start, u64 end,
+		      struct extent_state **cached_state, gfp_t mask)
+{
+	return set_extent_bit(tree, start, end,
+			      EXTENT_DELALLOC | EXTENT_UPTODATE | EXTENT_DEFRAG,
+			      NULL, cached_state, mask);
+}
+
 int clear_extent_dirty(struct extent_io_tree *tree, u64 start, u64 end,
 		       gfp_t mask)
 {
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 25900af..512f8da 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -235,6 +235,8 @@ int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		       int bits, int clear_bits, gfp_t mask);
 int set_extent_delalloc(struct extent_io_tree *tree, u64 start, u64 end,
 			struct extent_state **cached_state, gfp_t mask);
+int set_extent_defrag(struct extent_io_tree *tree, u64 start, u64 end,
+		      struct extent_state **cached_state, gfp_t mask);
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 			  u64 *start_ret, u64 *end_ret, int bits);
 struct extent_state *find_first_extent_bit_state(struct extent_io_tree *tree,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5caf285..226690a 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1173,8 +1173,8 @@ again:
 
 		clear_extent_bit(&BTRFS_I(inode)->io_tree, start_pos,
 				  last_pos - 1, EXTENT_DIRTY | EXTENT_DELALLOC |
-				  EXTENT_DO_ACCOUNTING, 0, 0, &cached_state,
-				  GFP_NOFS);
+				  EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
+				  0, 0, &cached_state, GFP_NOFS);
 		unlock_extent_cached(&BTRFS_I(inode)->io_tree,
 				     start_pos, last_pos - 1, &cached_state,
 				     GFP_NOFS);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b2c3514..55857eb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3531,7 +3531,8 @@ again:
 	}
 
 	clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, page_end,
-			  EXTENT_DIRTY | EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING,
+			  EXTENT_DIRTY | EXTENT_DELALLOC |
+			  EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
 			  0, 0, &cached_state, GFP_NOFS);
 
 	ret = btrfs_set_extent_delalloc(inode, page_start, page_end,
@@ -5998,7 +5999,8 @@ unlock:
 	if (lockstart < lockend) {
 		if (create && len < lockend - lockstart) {
 			clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
-					 lockstart + len - 1, unlock_bits, 1, 0,
+					 lockstart + len - 1,
+					 unlock_bits | EXTENT_DEFRAG, 1, 0,
 					 &cached_state, GFP_NOFS);
 			/*
 			 * Beside unlock, we also need to cleanup reserved space
@@ -6006,8 +6008,8 @@ unlock:
 			 */
 			clear_extent_bit(&BTRFS_I(inode)->io_tree,
 					 lockstart + len, lockend,
-					 unlock_bits | EXTENT_DO_ACCOUNTING,
-					 1, 0, NULL, GFP_NOFS);
+					 unlock_bits | EXTENT_DO_ACCOUNTING |
+					 EXTENT_DEFRAG, 1, 0, NULL, GFP_NOFS);
 		} else {
 			clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
 					 lockend, unlock_bits, 1, 0,
@@ -6572,8 +6574,8 @@ static void btrfs_invalidatepage(struct page *page, unsigned long offset)
 		 */
 		clear_extent_bit(tree, page_start, page_end,
 

[PATCH 2/2 v3] Btrfs: snapshot-aware defrag

2012-09-17 Thread Liu Bo
This comes from one of btrfs's project ideas,
As we defragment files, we break any sharing from other snapshots.
The balancing code will preserve the sharing, and defrag needs to grow this
as well.

Now we're able to fill the gap with this patch, in which we make full use of
the backref walking code.

Here is the basic idea:
o  set the writeback ranges started by defragment with flag EXTENT_DEFRAG
o  at endio, after we finish updating fs tree, we use backref walking to find
   all parents of the ranges and re-link them with the new COWed file layout by
   adding corresponding backrefs.

Original patch by Li Zefan <l...@cn.fujitsu.com>
Signed-off-by: Liu Bo bo.li@oracle.com
---
Changes since v2:
- adopt better names for local structures.
- add proper reschedule phrase
- better error handling
- minor cleanups
(Thanks, David)

 fs/btrfs/inode.c |  617 ++
 1 files changed, 617 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 55857eb..8278aa2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -54,6 +54,7 @@
 #include "locking.h"
 #include "free-space-cache.h"
 #include "inode-map.h"
+#include "backref.h"
 
 struct btrfs_iget_args {
u64 ino;
@@ -1846,6 +1847,608 @@ out:
return ret;
 }
 
+/* snapshot-aware defrag */
+struct sa_defrag_extent_backref {
+   struct rb_node node;
+   struct old_sa_defrag_extent *old;
+   u64 root_id;
+   u64 inum;
+   u64 file_pos;
+   u64 extent_offset;
+   u64 num_bytes;
+   u64 generation;
+};
+
+struct old_sa_defrag_extent {
+   struct list_head list;
+   struct new_sa_defrag_extent *new;
+
+   u64 extent_offset;
+   u64 bytenr;
+   u64 offset;
+   u64 len;
+   int count;
+};
+
+struct new_sa_defrag_extent {
+   struct rb_root root;
+   struct list_head head;
+   struct btrfs_path *path;
+   struct inode *inode;
+   u64 file_pos;
+   u64 len;
+   u64 bytenr;
+   u64 disk_len;
+   u8 compress_type;
+};
+
+static int backref_comp(struct sa_defrag_extent_backref *b1,
+			struct sa_defrag_extent_backref *b2)
+{
+	if (b1->root_id < b2->root_id)
+		return -1;
+	else if (b1->root_id > b2->root_id)
+		return 1;
+
+	if (b1->inum < b2->inum)
+		return -1;
+	else if (b1->inum > b2->inum)
+		return 1;
+
+	if (b1->file_pos < b2->file_pos)
+		return -1;
+	else if (b1->file_pos > b2->file_pos)
+		return 1;
+
+	WARN_ON(1);
+	return 0;
+}
+
+static void backref_insert(struct rb_root *root,
+			   struct sa_defrag_extent_backref *backref)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct sa_defrag_extent_backref *entry;
+	int ret;
+
+	while (*p) {
+		parent = *p;
+		entry = rb_entry(parent, struct sa_defrag_extent_backref, node);
+
+		ret = backref_comp(backref, entry);
+		if (ret < 0)
+			p = &(*p)->rb_left;
+		else if (ret > 0)
+			p = &(*p)->rb_right;
+		else
+			BUG_ON(1);
+	}
+
+	rb_link_node(&backref->node, parent, p);
+	rb_insert_color(&backref->node, root);
+}
+
+/*
+ * Note the backref might has changed, and in this case we just return 0.
+ */
+static noinline int record_one_backref(u64 inum, u64 offset, u64 root_id,
+				       void *ctx)
+{
+	struct btrfs_file_extent_item *extent;
+	struct btrfs_fs_info *fs_info;
+	struct old_sa_defrag_extent *old = ctx;
+	struct new_sa_defrag_extent *new = old->new;
+	struct btrfs_path *path = new->path;
+	struct btrfs_key key;
+	struct btrfs_root *root;
+	struct sa_defrag_extent_backref *backref;
+	struct extent_buffer *leaf;
+	struct inode *inode = new->inode;
+	int slot;
+	int ret;
+	u64 extent_offset;
+	u64 num_bytes;
+
+	if (BTRFS_I(inode)->root->root_key.objectid == root_id &&
+	    inum == btrfs_ino(inode))
+		return 0;
+
+	key.objectid = root_id;
+	key.type = BTRFS_ROOT_ITEM_KEY;
+	key.offset = (u64)-1;
+
+	fs_info = BTRFS_I(inode)->root->fs_info;
+	root = btrfs_read_fs_root_no_name(fs_info, &key);
+	if (IS_ERR(root)) {
+		if (PTR_ERR(root) == -ENOENT)
+			return 0;
+		WARN_ON(1);
+		pr_debug("inum=%llu, offset=%llu, root_id=%llu\n",
+			 inum, offset, root_id);
+		return PTR_ERR(root);
+	}
+
+	key.objectid = inum;
+	key.type = BTRFS_EXTENT_DATA_KEY;
+	if (offset > (u64)-1 << 32)
+		key.offset = 0;
+	else
+		key.offset = offset;
+
+	ret =
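
Editorial sketch: the ordering that backref_comp() imposes on the rb-tree
(root_id first, then inum, then file_pos) can be tried out in userspace. The
struct backref_key and backref_key_cmp() below are hypothetical names, not part
of the patch, and the sketch returns 0 for equal keys instead of warning.

```c
#include <stdint.h>

/* Hypothetical userspace mirror of the three ordering keys. */
struct backref_key {
	uint64_t root_id;
	uint64_t inum;
	uint64_t file_pos;
};

/* Returns -1, 0 or 1, comparing root_id, then inum, then file_pos. */
static int backref_key_cmp(const struct backref_key *a,
			   const struct backref_key *b)
{
	if (a->root_id != b->root_id)
		return a->root_id < b->root_id ? -1 : 1;
	if (a->inum != b->inum)
		return a->inum < b->inum ? -1 : 1;
	if (a->file_pos != b->file_pos)
		return a->file_pos < b->file_pos ? -1 : 1;
	return 0;
}
```

Two backrefs in the same root and inode are therefore ordered by file
position, and any difference in inode number takes precedence over file
position.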

Re: [PATCH 2/2 v3] Btrfs: snapshot-aware defrag

2012-09-17 Thread Liu Bo
Please only push this one, since the first one remains unchanged. I also posted
it so that others can review it more easily.

thanks,
liubo


Re: Experiences: Why BTRFS had to yield for ZFS

2012-09-17 Thread Avi Miller
Hi,

On 17/09/2012, at 7:55 PM, Casper Bang <casper.b...@gmail.com> wrote:

> We're using the latest available kernel for our Oracle Unbreakable
> Linux 6.3 from Aug 28. We have no other option, since the Oracle database
> software needs to run on a certified distro.

Oracle Database is not certified to run on either btrfs or ZFS on Linux, so if
certification is an issue, you can't use either filesystem. Out of interest,
have you done a performance benchmark with ASM using ASMlib on the same
platform?

--
Oracle http://www.oracle.com
Avi Miller | Principal Program Manager | +61 (412) 229 687
Oracle Linux and Virtualization
417 St Kilda Road, Melbourne, Victoria 3004 Australia








Re: [PATCH] Btrfs + Btrfs-progs: make pipe functions re-usable

2012-09-17 Thread Hugo Mills
On Mon, Sep 17, 2012 at 12:48:10PM +0800, Anand Jain wrote:
>
> >>  btrfs send introduced a part of code to read kernel-data
> >>  from user-end using pipe. We need this part of code to be
> >>  useable outside of send sub-cmd, so that developing
> >>  service sub-cmd can use it.
> >
> > What's 'service sub-cmd' please?
>
>   at the moment 'btrfs service history mnt|dev'
>   to show logs of maintenance.
>   comments/suggestions welcome.

   As I said in our private email exchange some months ago, I don't
think this is the right way to be doing this. For example, if you use
an alternative tool (such as btrfs-gui) which uses the ioctls
directly, you've lost that logging information.

   Keeping a log of what's been done to the FS is much better done by
extending the available logging in the kernel (and making it a
compile-time option for those who don't want or need it). You can then
write a simple shell script to chomp through the normal kernel logs to
extract this information.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I'll take your bet, but make it ten thousand francs. I'm only ---  
   a _poor_ corrupt official.


signature.asc
Description: Digital signature


Re: Experiences: Why BTRFS had to yield for ZFS

2012-09-17 Thread Avi Miller
Hi,

On 17/09/2012, at 8:47 PM, Casper Bang <casper.b...@gmail.com> wrote:

> month, that just makes me wonder why Oracle didn't use these latest bits.

We used the most stable release of btrfs that was available when the
development of the UEK was done. Keep in mind that while it's versioned at
2.6.39, it's actually 3.0.16 under the hood. It's just that some userspace
doesn't like having a kernel version that doesn't start with 2.6.

>> Out of interest, have you done a performance benchmark with ASM using ASMlib
>> on the same platform?
>
> Sorry, no. Our experience with ASM is limited, we came to the conclusion once
> that we like being able to handle the files in a plain mountable file-system.

Perhaps, but ASM would provide all the functionality you require, including 
snapshots and rollback, at the highest possible performance. Certainly a lot 
higher than both ZFS and btrfs. And it's fully certified and supported by 
Oracle.

As an alternative, why not consider using Oracle VM on the machine and creating 
database VMs instead? You can then use the snapshot capability of Oracle VM 
while still running supported and certified filesystems inside each guest.

(We should also probably take this discussion off-list, as it has drifted away 
from btrfs proper). Feel free to reply to me directly if you want.

--
Oracle http://www.oracle.com
Avi Miller | Principal Program Manager | +61 (412) 229 687
Oracle Linux and Virtualization
417 St Kilda Road, Melbourne, Victoria 3004 Australia








enquiry about autodefrag option (resent)

2012-09-17 Thread ching
I am testing btrfs for long-term storage and backup, and I would like
to know more about the autodefrag option:

1. Will the autodefrag option benefit an SSD?

My understanding is:

   autodefrag -> number of extents decreases -> metadata decreases -> a
healthier filesystem in the long run

   (P.S. I am aware that autodefrag will introduce extra write I/O)

2. AFAIK, autodefrag detects small random writes into files and
queues them up for an automatic defrag process, so the filesystem will
defragment itself while it's used.

If the system reboots, crashes or is remounted read-only, will the
autodefrag process continue afterwards?




Re: [PATCH 1/2] Btrfs: cleanup duplicated division functions

2012-09-17 Thread Ilya Dryomov
On Mon, Sep 17, 2012 at 10:21:00AM +0800, Miao Xie wrote:
> On fri, 14 Sep 2012 15:54:18 +0200, David Sterba wrote:
> > On Thu, Sep 13, 2012 at 06:51:36PM +0800, Miao Xie wrote:
> >> div_factor{_fine} has been implemented for two times, cleanup it.
> >> And I move them into a independent file named math.h because they are
> >> common math functions.
> >
> > You removed the sanity checks:
> >
> > -	if (factor <= 0)
> > -		return 0;
> > -	if (factor >= 100)
> > -		return num;
>
> As inline functions, they should not contain complex checks, the caller should
> make sure the parameters are right. I think.

div_factor_fine() in volumes.c is not inline, and is called from
chunk_usage_filter() on unvalidated user input.  If you think the caller
should do those checks, you should move them to the caller as part of
your patch.

Thanks,

Ilya
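
Editorial sketch: a userspace version of a percentage helper that keeps the
two sanity checks quoted above. The function name div_factor_fine_checked()
is hypothetical, and plain 64-bit division stands in for the kernel's
do_div(); treat it as an illustration of the checks, not as the code from
volumes.c.

```c
#include <stdint.h>

/*
 * Scale num by factor percent. Out-of-range factors are clamped instead of
 * being trusted: a factor <= 0 yields 0 and a factor >= 100 yields num.
 * The multiplication can wrap for very large num values.
 */
static uint64_t div_factor_fine_checked(uint64_t num, int factor)
{
	if (factor <= 0)
		return 0;
	if (factor >= 100)
		return num;
	return num * (uint64_t)factor / 100;
}
```

Keeping the checks inside the helper means a caller that passes unvalidated
input, such as a negative or oversized percentage, still gets a bounded
result.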


Re: [PATCH] Btrfs: fix race with freeze and free space inodes

2012-09-17 Thread Josef Bacik
On Sun, Sep 16, 2012 at 11:36:57PM -0600, Miao Xie wrote:
> On fri, 14 Sep 2012 11:26:20 -0400, Josef Bacik wrote:
> > So we start our freeze, somebody comes in and does an fsync() on a file
> > where we have to commit a transaction for whatever reason, and we will
> > deadlock because the freeze is waiting on FS_FREEZE people to stop writing
> > to the file system, but the transaction is waiting for its free space inodes
> > to be written out, which are in turn waiting on sb_start_intwrite while
> > trying to write the file extents.  To fix this we'll just skip the
> > sb_start_intwrite() if we TRANS_JOIN_NOLOCK since we're being waited on by a
> > transaction commit so we're safe wrt to freeze and this will keep us from
> > deadlocking.  Thanks,
> >
> > Signed-off-by: Josef Bacik <jba...@fusionio.com>
> > ---
> >  fs/btrfs/transaction.c |   10 +-
> >  1 files changed, 9 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> > index c9265a6..ba74dfb 100644
> > --- a/fs/btrfs/transaction.c
> > +++ b/fs/btrfs/transaction.c
> > @@ -342,7 +342,15 @@ again:
> >  	if (!h)
> >  		return ERR_PTR(-ENOMEM);
> >
> > -	if (!__sb_start_write(root->fs_info->sb, SB_FREEZE_FS, false)) {
> > +	/*
> > +	 * If we are JOIN_NOLOCK we're already committing a transaction and
> > +	 * waiting on this guy, so we don't need to do the sb_start_intwrite
> > +	 * because we're already holding a ref.  We need this because we could
> > +	 * have raced in and did an fsync() on a file which can kick a commit
> > +	 * and then we deadlock with somebody doing a freeze.
> > +	 */
> > +	if (type != TRANS_JOIN_NOLOCK &&
> > +	    !__sb_start_write(root->fs_info->sb, SB_FREEZE_FS, false)) {
> >  		if (type == TRANS_JOIN_FREEZE)
> >  			return ERR_PTR(-EPERM);
> >  		sb_start_intwrite(root->fs_info->sb);
> >
>
> This patch forgets to deal with it in __btrfs_end_transaction(), or the
> freeze counter will be wrong.
>

This was fixed locally, I just sent the wrong patch. Thanks,

Josef
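
Editorial sketch: the point raised in the quoted review, that a conditional
acquire needs a matching conditional release, modelled in userspace C. All
names here are hypothetical, and how the local fix actually handles
__btrfs_end_transaction() is not shown in this thread.

```c
/* Hypothetical join types; only the NOLOCK case skips the acquire. */
enum join_type { JOIN_NORMAL, JOIN_NOLOCK };

/* Hypothetical stand-in for the freeze-protection writer count. */
struct freeze_model {
	int writers;
};

/* Acquire side: skipped for JOIN_NOLOCK, as in the quoted hunk. */
static void model_start_trans(struct freeze_model *fs, enum join_type type)
{
	if (type != JOIN_NOLOCK)
		fs->writers++;
}

/* Release side that ignores the join type: unbalanced for JOIN_NOLOCK. */
static void model_end_trans_unpaired(struct freeze_model *fs,
				     enum join_type type)
{
	(void)type;
	fs->writers--;
}

/* Release side that mirrors the acquire condition. */
static void model_end_trans_paired(struct freeze_model *fs,
				   enum join_type type)
{
	if (type != JOIN_NOLOCK)
		fs->writers--;
}

/* Runs one start/end cycle and returns the leftover count (0 is balanced). */
static int model_leftover(enum join_type type, int paired)
{
	struct freeze_model fs = { 0 };

	model_start_trans(&fs, type);
	if (paired)
		model_end_trans_paired(&fs, type);
	else
		model_end_trans_unpaired(&fs, type);
	return fs.writers;
}
```

In this model only the JOIN_NOLOCK handle with an unconditional release ends
up with a negative count, which is the kind of counter error the review
comment warns about.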


Re: [PATCH] Btrfs + Btrfs-progs: make pipe functions re-usable

2012-09-17 Thread David Sterba
On Mon, Sep 17, 2012 at 12:48:10PM +0800, Anand Jain wrote:
> >>  btrfs send introduced a part of code to read kernel-data
> >>  from user-end using pipe. We need this part of code to be
> >>  useable outside of send sub-cmd, so that developing
> >>  service sub-cmd can use it.
> >
> > What's 'service sub-cmd' please?
>
>   at the moment 'btrfs service history mnt|dev'
>   to show logs of maintenance.
>   comments/suggestions welcome.

Sorry, but without a more detailed description I can hardly give useful
comments.  The patch looks ok but stands alone, you can post it with
your proposed feature together.

david


Re: [RFC inside][PATCH] btrfs: allow setting NOCOW for a zero sized file via ioctl

2012-09-17 Thread David Sterba
Hi,

Josef, I noticed that you did not add the patch to btrfs-next. This is
understandable for an RFC patch of course, but I'd like to ask you to add
it into the queue, so people testing -next have a chance to give it a
try.

Thanks,
david


Re: Root dentry has weird name

2012-09-17 Thread Marc MERLIN
On Mon, Sep 17, 2012 at 06:17:53PM +0200, David Sterba wrote:
> On Fri, Sep 14, 2012 at 10:09:12AM -0700, Marc MERLIN wrote:
> > I only have btrfs on my laptop and just started getting this.
>
> Afaik, this is not directly related to btrfs. Search for the "Root
> dentry has weird name" message and you'll see occurrences from kernel
> 3.0, 3.1.
>
> > I'm not too clear about whether it's in memory or on my filesystem
> > somewhere.
>
> It's reflecting an in-memory state.
>
> > Can you recommend what I should do: reboot? fsck somehow? other?
>
> Reboot should help, also check for potential NFS problems, like
> unreachable server.

Thanks for your answer. I indeed should have searched that first instead of
assuming it was btrfs related.
I can also confirm that rebooting made the message go away.

Thanks for your answer and sorry for posting to the wrong list.

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


This code cannot be right

2012-09-17 Thread Alan Cox

ctree.c:btrfs_insert_some_items()
{
...


if (total_size + data_size[i]+ ...
{
break;
nr = i;
}
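
Editorial sketch: why the excerpt above cannot be right. A statement placed
after `break` in the same block never runs, so the count is never trimmed. The
two functions below are a hypothetical userspace reduction with made-up names
and a simplified size check; they are not the code from ctree.c.

```c
/* Statement order as in the excerpt: the assignment after break is dead. */
static int items_that_fit_buggy(const unsigned *sizes, int n, unsigned space)
{
	unsigned total = 0;
	int nr = n;
	int i;

	for (i = 0; i < n; i++) {
		if (total + sizes[i] > space) {
			break;
			nr = i;		/* unreachable */
		}
		total += sizes[i];
	}
	return nr;
}

/* Statements swapped: record how many items fit, then leave the loop. */
static int items_that_fit_fixed(const unsigned *sizes, int n, unsigned space)
{
	unsigned total = 0;
	int nr = n;
	int i;

	for (i = 0; i < n; i++) {
		if (total + sizes[i] > space) {
			nr = i;
			break;
		}
		total += sizes[i];
	}
	return nr;
}
```

With item sizes 10, 20 and 30 and room for 25 bytes, the first version still
reports three items while the second reports one.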




Re: [PATCH 1/2] Btrfs: cleanup duplicated division functions

2012-09-17 Thread David Sterba
On Mon, Sep 17, 2012 at 10:21:00AM +0800, Miao Xie wrote:
> On fri, 14 Sep 2012 15:54:18 +0200, David Sterba wrote:
> > On Thu, Sep 13, 2012 at 06:51:36PM +0800, Miao Xie wrote:
> >> div_factor{_fine} has been implemented for two times, cleanup it.
> >> And I move them into a independent file named math.h because they are
> >> common math functions.
> >
> > You removed the sanity checks:
> >
> > -	if (factor <= 0)
> > -		return 0;
> > -	if (factor >= 100)
> > -		return num;
>
> As inline functions, they should not contain complex checks, the caller should
> make sure the parameters are right. I think.

It's compiler's job to decide whether a function should be inlined or
not. The keyword/function attribute 'inline' is only a hint, unless
always_inline is used and the author should be sure that it really has
the expected outcome and that compiler is wrong here.

I don't agree that each caller should do the checks, it only makes code
harder to read and forces the authors to check for conditions that may
not be apparent or are just ommitted.

If we need a function that does not check the boundaries, then of course
go for it, but I don't see such case yet.
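
For reference, a minimal userspace sketch of a percentage helper that keeps
the two range checks quoted above. The function name follows the thread; the
plain 64-bit division is an assumption made for illustration (the kernel code
would use its own division helper), and the example function is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale num by factor/100. Out-of-range factors are clamped by the
 * function itself, so callers do not have to repeat the checks.
 */
static inline uint64_t div_factor_fine(uint64_t num, int factor)
{
	if (factor <= 0)
		return 0;
	if (factor >= 100)
		return num;
	return num * (uint64_t)factor / 100;
}

/* Hypothetical usage: in-range and out-of-range factors. */
static void div_factor_fine_example(void)
{
	assert(div_factor_fine(1000, 50) == 500);
	assert(div_factor_fine(1000, -5) == 0);
	assert(div_factor_fine(1000, 250) == 1000);
}
```

Without the first check, a negative factor would be converted to a huge
unsigned value before the multiplication, which is the kind of non-obvious
condition a caller can easily miss.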

  in new version. And I don't think it's necessary to add an extra include
  with a rather generic name and trivial code. A separate .h/.c with
  non-filesystem related support code like this looks more suitable.
  
  Do you intend to use the functions out of extent-tree.c ?
 
 They are used in both extent-tree.c and volumes.c from the outset, but they
 were implemented in these two files severally.

Ah, I see.

david


Re: [PATCH] Btrfs: fix double decrease of the writer counter

2012-09-17 Thread Josef Bacik
On Mon, Sep 17, 2012 at 12:34:27AM -0600, Miao Xie wrote:
 In __btrfs_end_transaction(), we have invoked sb_end_intwrite(), but if we
 need run btrfs_commit_transaction(), we will decrease the writer counter
 for two times because btrfs_commit_transaction() also invokes 
 sb_end_intwrite().
 Fix it.
 

Already fixed in btrfs-next.  Thanks,

Josef
 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/transaction.c |4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
 index 27c2600..3134fdc 100644
 --- a/fs/btrfs/transaction.c
 +++ b/fs/btrfs/transaction.c
 @@ -551,8 +551,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
  	btrfs_trans_release_metadata(trans, root);
  	trans->block_rsv = NULL;
  
 -	sb_end_intwrite(root->fs_info->sb);
 -
  	if (lock && !atomic_read(&root->fs_info->open_ioctl_trans) &&
  	    should_end_transaction(trans, root)) {
  		trans->transaction->blocked = 1;
 @@ -573,6 +571,8 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
  		}
  	}
  
 +	sb_end_intwrite(root->fs_info->sb);
 +
  	WARN_ON(cur_trans != info->running_transaction);
  	WARN_ON(atomic_read(&cur_trans->num_writers) < 1);
  	atomic_dec(&cur_trans->num_writers);
 -- 
 1.7.6.5


Re: [PATCH V4 01/12] Btrfs: fix error path in create_pending_snapshot()

2012-09-17 Thread David Sterba
On Thu, Sep 06, 2012 at 06:00:32PM +0800, Miao Xie wrote:
 This patch fixes the following problem:
 - If we failed to deal with the delayed dir items, we should abort 
 transaction,
   just as its comment said. Fix it.
 - If root reference or root back reference insertion failed, we should
   abort transaction. Fix it.
 - Fix the double free problem of pending->inherit.
 - Do not restore the trans->rsv if we doesn't change it.
 - make the error path more clearly.

I've noticed a pattern in the error + transaction abort paths, that is
touched in this patch and would like to ask you to update it:

 @@ -1018,10 +1016,9 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
  					 BTRFS_FT_DIR, index);
  	if (ret == -EEXIST) {
  		pending->error = -EEXIST;
 -		dput(parent);
  		goto fail;

normal exit path: here we don't abort the transaction, we just go to the
exit block and do the cleanup

  	} else if (ret) {
 -		goto abort_trans_dput;
 +		goto abort_trans;

a transaction abort path: here we jump to a common block that calls
abort, but we lose the information about where the abort occurred

I went through the code and saw several uses of this pattern (and I
remember more than one bug report that pointed to an abort_transaction
call without leaving any trace of which condition failed).

(Search regex I used: 'goto.*abort')

So the proposed pattern to use is

---
	if (condition) {
		btrfs_abort_transaction(...);
		goto fail;
	}


fail:
	cleanup
	return ...;
---
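
A small self-contained sketch of why aborting at the failing condition keeps
the call-site information. Everything here is a hypothetical stand-in: the
abort_transaction() macro only models an abort helper that records the line
it was invoked from, and step() models an operation that may fail.

```c
#include <assert.h>

struct trans {
	int aborted;	/* error code passed to the abort, 0 if none */
	int abort_line;	/* source line of the abort call site */
};

/* Hypothetical stand-in for an abort helper that records its call site. */
#define abort_transaction(t, err) \
	do { (t)->aborted = (err); (t)->abort_line = __LINE__; } while (0)

/* Hypothetical operation that either succeeds or fails with -5. */
static int step(int fail)
{
	return fail ? -5 : 0;
}

static int create_snapshot_sketch(struct trans *t, int fail_first,
				  int fail_second)
{
	int ret;

	ret = step(fail_first);
	if (ret) {
		abort_transaction(t, ret);	/* identifies the first check */
		goto fail;
	}

	ret = step(fail_second);
	if (ret) {
		abort_transaction(t, ret);	/* identifies the second check */
		goto fail;
	}
fail:
	/* common cleanup would go here, for both success and failure */
	return ret;
}
```

With a single shared abort block, both failures would report the same line;
here the two call sites stay distinguishable.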

 @@ -1120,15 +1114,15 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
  	ret = btrfs_reloc_post_snapshot(trans, pending);
  	if (ret)
  		goto abort_trans;
 -	ret = 0;
  fail:
 -	kfree(new_root_item);
 +	dput(parent);
  	trans->block_rsv = rsv;
 +no_free_objectid:
 +	kfree(new_root_item);
 +root_item_alloc_fail:
  	btrfs_block_rsv_release(root, &pending->block_rsv, (u64)-1);
  	return ret;
  
 -abort_trans_dput:
 -	dput(parent);
  abort_trans:
  	btrfs_abort_transaction(trans, root, ret);
  	goto fail;

(end of function here)

this will also remove all the instances where a function ends with a
'goto'. All instances are convertible to the pattern described above.

An alternate approach that I originally considered for fixing this was to
introduce a call like 'btrfs_mark_transaction_abort_callsite', which
would need to add a field to fs_info and print it later. But if we're
going to touch all the code anyway, it makes sense to utilize the
infrastructure we already have.

Please consider updating your patch, I'll send a separate patch that
deals with aborts outside of create_pending_snapshot.

TIA,
david


Re: [PATCH 2/2 v3] Btrfs: snapshot-aware defrag

2012-09-17 Thread Josef Bacik
On Mon, Sep 17, 2012 at 03:58:56AM -0600, Liu Bo wrote:
 This comes from one of btrfs's project ideas,
 As we defragment files, we break any sharing from other snapshots.
 The balancing code will preserve the sharing, and defrag needs to grow this
 as well.
 
 Now we're able to fill the blank with this patch, in which we make full use of
 backref walking stuff.
 
 Here is the basic idea,
 o  set the writeback ranges started by defragment with flag EXTENT_DEFRAG
 o  at endio, after we finish updating fs tree, we use backref walking to find
all parents of the ranges and re-link them with the new COWed file layout 
 by
adding corresponding backrefs.
 
 Originally patch by Li Zefan l...@cn.fujitsu.com
 Signed-off-by: Liu Bo bo.li@oracle.com

I was trying to fix up the rejects on this patch when I noticed there were no
tabs, only spaces.  That's not going to work, and now I have to go back and make
sure none of your other patches did this.  Thanks,

Josef


Re: 'umount' of multi-device volume hangs until the device is physically un-plugged

2012-09-17 Thread Josef Bacik
On Sun, Sep 16, 2012 at 10:07:39PM -0600, Kay Sievers wrote:
 I'm currently playing around with native btrfs multi-device support in
 systemd. There might be a few hotplug issues to solve, here is the
 first one:
 
 A mounted (otherwise unused) multi-device volume (USB multi-slot card
 reader), hangs at:
   $ umount /mnt
 with (fedora) kernel
   3.6.0-0.rc5.git0.1.fc18.x86_64
 
 Any idea what to look for or what to try?
 

Can I see the whole sysrq+w?  Also can you try btrfs-next and see if you have
the same problems?  Thanks,

Josef


Re: This code cannot be right

2012-09-17 Thread Chris Mason
On Mon, Sep 17, 2012 at 10:25:54AM -0600, Alan Cox wrote:
 
 ctree.c:btrfs_insert_some_items()
 {
 ...
 
 
   if (total_size + data_size[i]+ ...
   {
   break;
   nr = i;
   }
 

Hi Alan,

Definitely not right ;) It's actually unused, but I thought I had gotten
rid of it long ago.  I'll queue up a patch for the next merge window,
thanks.

-chris
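
For readers skimming the thread: the problem in the quoted snippet is that
the assignment sits after the break, so it can never run. A minimal sketch of
the ordering that would make both statements take effect, assuming the intent
was to cap the item count at the first item that no longer fits. The helper
name, parameters and sizes are hypothetical and not taken from ctree.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many leading items fit into `capacity` bytes. */
static size_t items_that_fit(const size_t *data_size, size_t nr,
			     size_t item_overhead, size_t capacity)
{
	size_t total_size = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (total_size + data_size[i] + item_overhead > capacity) {
			nr = i;	/* record the cut-off first ... */
			break;	/* ... then leave the loop */
		}
		total_size += data_size[i] + item_overhead;
	}
	return nr;
}

/* Hypothetical usage with three items of 10, 20 and 30 bytes. */
static void items_that_fit_example(void)
{
	const size_t sizes[] = { 10, 20, 30 };

	assert(items_that_fit(sizes, 3, 5, 45) == 2);
	assert(items_that_fit(sizes, 3, 5, 100) == 3);
}
```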



[PATCH] Btrfs: do not hold the write_lock on the extent tree while logging V2

2012-09-17 Thread Josef Bacik
Dave Sterba pointed out a sleeping while atomic bug while doing fsync.  This
is because I'm an idiot and didn't realize that rwlock's were spin locks, so
we've been holding this thing while doing allocations and such which is not
good.  This patch fixes this by dropping the write lock before we do
anything heavy and re-acquire it when it is done.  We also need to take a
ref on the em's in case their corresponding pages are evicted and mark them
as being logged so that releasepage does not remove them and doesn't remove
them from our local list.  Thanks,

Reported-by: Dave Sterba d...@jikos.cz
Signed-off-by: Josef Bacik jba...@fusionio.com
---
V1->V2: drop our ref if we had an error
 fs/btrfs/extent_map.c |    3 ++-
 fs/btrfs/extent_map.h |    1 +
 fs/btrfs/tree-log.c   |   20 ++++++++++++++++----
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 8d1364d..b8cbc8d 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -407,7 +407,8 @@ int remove_extent_mapping(struct extent_map_tree *tree, struct extent_map *em)
 
 	WARN_ON(test_bit(EXTENT_FLAG_PINNED, &em->flags));
 	rb_erase(&em->rb_node, &tree->map);
-	list_del_init(&em->list);
+	if (!test_bit(EXTENT_FLAG_LOGGING, &em->flags))
+		list_del_init(&em->list);
 	em->in_tree = 0;
 	return ret;
 }
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index 8e6294b..6792255 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -13,6 +13,7 @@
 #define EXTENT_FLAG_COMPRESSED 1
 #define EXTENT_FLAG_VACANCY 2 /* no file extent item found */
 #define EXTENT_FLAG_PREALLOC 3 /* pre-allocated extent */
+#define EXTENT_FLAG_LOGGING 4 /* Logging this extent */
 
 struct extent_map {
 	struct rb_node rb_node;
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 038a522..a3e88cf 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2945,6 +2945,9 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans,
 		list_del_init(&em->list);
 		if (em->generation <= test_gen)
 			continue;
+		/* Need a ref to keep it from getting evicted from cache */
+		atomic_inc(&em->refs);
+		set_bit(EXTENT_FLAG_LOGGING, &em->flags);
 		list_add_tail(&em->list, &extents);
 	}
 
@@ -2954,13 +2957,18 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans,
 		em = list_entry(extents.next, struct extent_map, list);
 
 		list_del_init(&em->list);
+		clear_bit(EXTENT_FLAG_LOGGING, &em->flags);
 
 		/*
 		 * If we had an error we just need to delete everybody from our
 		 * private list.
 		 */
-		if (ret)
+		if (ret) {
+			free_extent_map(em);
 			continue;
+		}
+
+		write_unlock(&tree->lock);
 
 		/*
 		 * If the previous EM and the last extent we left off on aren't
@@ -2971,21 +2979,25 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans,
 			ret = copy_items(trans, inode, dst_path, args.src,
 					 args.start_slot, args.nr,
 					 LOG_INODE_ALL);
-			if (ret)
+			if (ret) {
+				free_extent_map(em);
 				continue;
+			}
 			btrfs_release_path(path);
 			args.nr = 0;
 		}
 
 		ret = log_one_extent(trans, inode, root, em, path, dst_path, &args);
+		free_extent_map(em);
+		write_lock(&tree->lock);
 	}
+	WARN_ON(!list_empty(&extents));
+	write_unlock(&tree->lock);
 
 	if (!ret && args.nr)
 		ret = copy_items(trans, inode, dst_path, args.src,
 				 args.start_slot, args.nr, LOG_INODE_ALL);
 	btrfs_release_path(path);
-	WARN_ON(!list_empty(&extents));
-	write_unlock(&tree->lock);
 	return ret;
 }
 
-- 
1.7.7.6
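
The locking idea in this patch (pin each entry under the lock, drop the lock
around the heavy work, retake it afterwards) can be modelled in a few lines of
userspace C. This is only a sketch under stated assumptions: struct lock is a
flag that asserts correct pairing, not a real rwlock, and all names below are
hypothetical rather than the btrfs ones.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock: only tracks whether it is held, to check the protocol. */
struct lock { int held; };

static void lock_acquire(struct lock *l) { assert(!l->held); l->held = 1; }
static void lock_release(struct lock *l) { assert(l->held); l->held = 0; }

struct extent {
	int refs;	/* reference count; > 0 keeps the entry alive */
	int logging;	/* set while the entry sits on the private list */
	int logged;
	struct extent *next;
};

/* Models work that may sleep, so it must run without the lock held. */
static int log_one(struct lock *l, struct extent *e)
{
	assert(!l->held);
	e->logged = 1;
	return 0;
}

static int log_all(struct lock *l, struct extent *list)
{
	struct extent *e;
	int ret = 0;

	lock_acquire(l);
	for (e = list; e; e = e->next) {
		e->refs++;	/* pin so the entry survives the unlock */
		e->logging = 1;
	}

	for (e = list; e; e = e->next) {
		e->logging = 0;
		if (ret) {	/* after an error, only drop the pins */
			e->refs--;
			continue;
		}
		lock_release(l);	/* drop the lock before heavy work */
		ret = log_one(l, e);
		e->refs--;
		lock_acquire(l);	/* retake it before touching the list */
	}
	lock_release(l);
	return ret;
}
```

Note that every path through the second loop leaves the lock held at the end
of the iteration, which is the invariant the final release relies on.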



Re: Oops with a degraded volume

2012-09-17 Thread Antoine Sirinelli
On Mon, Sep 17, 2012 at 02:46:00PM +0800, Liu Bo wrote:
 On 09/15/2012 10:17 PM, Antoine Sirinelli wrote:
  I have experienced a very reproducible Oops within the btrfs driver. On
  a linux 3.5.4, if I mount a volume with the option degraded because
  one of the device is missing, I would get an Oops when I unmount it (or
  even before). You can see attached the kernel log.
 
 Thanks for the report.  And this has been fixed by
 
 commit 99f5944b8477914406173b47b4f261356286730b
 Btrfs: do not strdup non existent strings
 
 You can find this commit in 3.6.0-rc5. :)

That's right, I have done the same test with rc6 and it does not crash
anymore.

Many thanks,

Antoine




btrfs raid1 degraded in need of chunk tree rebuild

2012-09-17 Thread Vladi Gergov
Below is my original post about my fs. Just wondering if anyone knows if
I can at this point get my data back or cut my losses. Is an fsck capable
of getting this fixed close, or has my 2 year wait been in vain? Thanks
in advance!

Excerpts from Vladi Gergov's message of 2010-10-29 16:53:42 -0400:
  gypsyops @ /mnt  sudo mount -o degraded /dev/sdc das3/
 Password: 
 mount: wrong fs type, bad option, bad superblock on /dev/sdc,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so
 
 [  684.577540] device label das4 devid 2 transid 107954 /dev/sdc
 [  684.595150] btrfs: allowing degraded mounts
 [  684.595594] btrfs: failed to read chunk root on sdb
 [  684.604110] btrfs: open_ctree failed
 
  gypsyops @ /mnt  sudo btrfsck /dev/sdc
 btrfsck: volumes.c:1367: btrfs_read_sys_array: Assertion `!(ret)'
 failed.

Ok, I dug through this and found the bug responsible for your
unmountable FS.  When we're mounted in degraded mode, and we don't have
enough drives available to do raid1,10, we can use the wrong raid
level for new allocations.

I'm fixing the kernel side so this doesn't happen anymore, but I'll need
to rebuild the chunk tree (and probably a few others) off your good disk
to fix things.

I've got it reproduced here though, so I'll make an fsck that can scan
for the correct trees and fix it for you.

Since you're basically going to be my first external fsck customer, is
there any way you can do a raw device based backup of the blocks?  This
way if I do mess things up we can repeat the experiment.

-chris

-- 

,-| Vladi
`-| Gergov


Re: 'umount' of multi-device volume hangs until the device is physically un-plugged

2012-09-17 Thread Kay Sievers
On Mon, Sep 17, 2012 at 7:19 PM, Josef Bacik jba...@fusionio.com wrote:
 On Sun, Sep 16, 2012 at 10:07:39PM -0600, Kay Sievers wrote:
 I'm currently playing around with native btrfs multi-device support in
 systemd. There might be a few hotplug issues to solve, here is the
 first one:

 A mounted (otherwise unused) multi-device volume (USB multi-slot card
 reader), hangs at:
   $ umount /mnt
 with (fedora) kernel
   3.6.0-0.rc5.git0.1.fc18.x86_64

 Any idea what to look for or what to try?

 Can I see the whole sysrq+w?  Also can you try btrfs-next and see if you have
 the same problems?  Thanks,

Hmm, I can't reproduce that today. Nothing has really changed with the
setup. It was easy to reproduce yesterday, even across multiple
reboots.

I'll come back if I see it again. Thanks,
Kay


BTRFS_IOC_DEVICES_READY and removed devices

2012-09-17 Thread Kay Sievers
We are currently playing around with native btrfs multi-device support
in systemd. We already committed the needed pieces to systemd git, to
register all detected btrfs filesystems with the kernel.

For volumes which are listed in fstab for mounting, we delay the
actual mount-attempt of a multi-device volume until we see READY
returned from BTRFS_IOC_DEVICES_READY. A line with UUID= in /etc/fstab
with nofail in the options field, and we can boot up without any
device plugged in. Now plugging in devices one-after-the-other until
the volume has a full tree of devices; with the last device there,
systemd just mounts the volume as expected.

This seems to work very well so far, unless a device which is already
registered disappears, which is a kind of valid hotplug scenario we
should handle better:

If one device of a 2-device volume is registered with the in-kernel
cache, and then the device is unplugged from the system, the cache
state does not get updated. If the other device of the 2-device
volume is then registered, BTRFS_IOC_DEVICES_READY indicates ready; but in
fact only one of the two needed devices is available at that time, and
mounting fails.

Can we somehow subscribe to device media-changes/removal to prevent
the stale device state in the in-kernel cache?

Or alternatively make BTRFS_IOC_DEVICES_READY re-validate all involved
block devices before it returns READY?
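
To make the two behaviours concrete, here is a small userspace model of the
difference between trusting the registration cache and re-validating it
before reporting ready. It is only a sketch: the struct, the function names
and the presence callback are hypothetical and do not correspond to the
kernel's device cache or to the real ioctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache entry for one member device of a volume. */
struct cached_dev {
	const char *path;
	bool registered;
};

/* Hypothetical probe: reports whether a block device is still there. */
typedef bool (*present_fn)(const char *path);

/* Current behaviour as described: trust whatever the cache says. */
static bool volume_ready_cached(const struct cached_dev *devs, size_t total)
{
	size_t i;

	for (i = 0; i < total; i++)
		if (!devs[i].registered)
			return false;
	return true;
}

/* Proposed behaviour: drop stale entries before answering. */
static bool volume_ready_revalidated(struct cached_dev *devs, size_t total,
				     present_fn present)
{
	bool ready = true;
	size_t i;

	for (i = 0; i < total; i++) {
		if (devs[i].registered && !present(devs[i].path))
			devs[i].registered = false;	/* device went away */
		if (!devs[i].registered)
			ready = false;
	}
	return ready;
}

/* Example probe for the scenario above: /dev/sda was unplugged. */
static bool only_sdb_present(const char *path)
{
	return strcmp(path, "/dev/sdb") == 0;
}
```

In the scenario above, the cached check reports ready for a volume with both
entries registered, while the re-validating check notices that the first
device is gone and clears its stale entry.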

Thanks,
Kay


[PATCH 0/2] Btrfs: set mount options permanently

2012-09-17 Thread Hidetoshi Seto
Following patches are going to implement one of unclaimed features
listed in the btrfs wiki:

https://btrfs.wiki.kernel.org/index.php/Project_ideas#Set_mount_options_permanently

Special thanks to Kazuhiro Yamashita for his time and efforts.
Your comments/reviews are welcomed.

Thanks,
H.Seto



[PATCH 1/2] Btrfs: make space to keep default mount options

2012-09-17 Thread Hidetoshi Seto
This patch creates space in the super block to hold the default mount
options, and changes super.c to read the saved default mount options
first when mounting devices.

Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
---
 fs/btrfs/ctree.h |    5 ++++-
 fs/btrfs/super.c |    2 ++
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fa5c45b..3eb0551 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -458,8 +458,11 @@ struct btrfs_super_block {
 
 	__le64 cache_generation;
 
+	/* default mount options */
+	unsigned long default_mount_opt;
+
 	/* future expansion */
-	__le64 reserved[31];
+	__le64 reserved[30];
 	u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE];
 	struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS];
 } __attribute__ ((__packed__));
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index e239915..7ef4a2e 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -340,6 +340,8 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 	char *compress_type;
 	bool compress_force = false;
 
+	info->mount_opt = info->super_copy->default_mount_opt;
+
 	cache_gen = btrfs_super_cache_generation(root->fs_info->super_copy);
 	if (cache_gen)
 		btrfs_set_opt(info->mount_opt, SPACE_CACHE);
-- 
1.7.7.6
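
A compile-time sketch of the layout idea in this patch: one slot of the
reserved array is given to the new field so that everything after it stays at
the same offset. The struct names are hypothetical cut-down stand-ins for the
tail of the super block, and the new field is declared as uint64_t here purely
as an assumption for the illustration (the patch declares it as unsigned long,
whose size depends on the ABI).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tail of the super block before the change. */
struct sb_tail_old {
	uint64_t cache_generation;
	uint64_t reserved[31];
} __attribute__((__packed__));

/* Same tail with one reserved slot turned into the new field. */
struct sb_tail_new {
	uint64_t cache_generation;
	uint64_t default_mount_opt;	/* assumption: fixed 64-bit field */
	uint64_t reserved[30];
} __attribute__((__packed__));

/* The carve-out must not move anything that follows the reserved area. */
_Static_assert(sizeof(struct sb_tail_old) == sizeof(struct sb_tail_new),
	       "on-disk size must not change");
```

The size equality only holds when the new field is exactly as wide as the
reserved slot it replaces.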




[PATCH 2/2] Btrfs-progs: add mount-option command

2012-09-17 Thread Hidetoshi Seto
This patch adds a mount-option command.
The command can set/get the default mount options.
Currently, the command can set/get 24 options.
These options are the same as the mount options stored
in fs_info/mount-opt.

Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
---
 Makefile |5 +-
 btrfs-parse-mntopt.c |  111 +
 btrfs-parse-mntopt.h |   65 ++
 btrfs.c  |1 +
 cmds-mount.c |  150 ++
 commands.h   |2 +
 ctree.h  |   41 +-
 7 files changed, 372 insertions(+), 3 deletions(-)
 create mode 100644 btrfs-parse-mntopt.c
 create mode 100644 btrfs-parse-mntopt.h
 create mode 100644 cmds-mount.c

diff --git a/Makefile b/Makefile
index c0aaa3d..6f67f4c 100644
--- a/Makefile
+++ b/Makefile
@@ -5,9 +5,10 @@ objects = ctree.o disk-io.o radix-tree.o extent-tree.o 
print-tree.o \
  root-tree.o dir-item.o file-item.o inode-item.o \
  inode-map.o crc32c.o rbtree.o extent-cache.o extent_io.o \
  volumes.o utils.o btrfs-list.o btrfslabel.o repair.o \
- send-stream.o send-utils.o
+ send-stream.o send-utils.o btrfs-parse-mntopt.o
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
-  cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o
+  cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
+  cmds-mount.o
 
 CHECKFLAGS= -D__linux__ -Dlinux -D__STDC__ -Dunix -D__unix__ -Wbitwise \
-Wuninitialized -Wshadow -Wundef
diff --git a/btrfs-parse-mntopt.c b/btrfs-parse-mntopt.c
new file mode 100644
index 0000000..87b341c
--- /dev/null
+++ b/btrfs-parse-mntopt.c
@@ -0,0 +1,111 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include "ctree.h"
+#include "btrfs-parse-mntopt.h"
+
+void btrfs_parse_string2mntopt(struct btrfs_root *root, char **options)
+{
+	struct btrfs_super_block *sb = root->fs_info->super_copy;
+	char *p = NULL;
+	int i = 0;
+
+	memset(&sb->default_mount_opt, 0, sizeof(unsigned long));
+	while ((p = strsep(options, ",")) != NULL) {
+		int token = DEF_MNTOPT_NUM + 1;
+
+		if (!*p)
+			continue;
+		for (i = 0; i < DEF_MNTOPT_NUM; i++) {
+			if (!strcmp(p, toke[i].pattern)) {
+				token = toke[i].token;
+				break;
+			}
+		}
+		if (token > DEF_MNTOPT_NUM) {
+			printf("error: %s\n", p);
+			return;
+		}
+
+		switch (token) {
+		case Opt_degraded:
+			btrfs_set_opt(sb->default_mount_opt, DEGRADED);
+			break;
+
+		case Opt_nodatasum:
+			btrfs_set_opt(sb->default_mount_opt, NODATASUM);
+			break;
+		case Opt_nodatacow:
+			btrfs_set_opt(sb->default_mount_opt, NODATACOW);
+			btrfs_set_opt(sb->default_mount_opt, NODATASUM);
+			break;
+		case Opt_ssd:
+			btrfs_set_opt(sb->default_mount_opt, SSD);
+			break;
+		case Opt_ssd_spread:
+			btrfs_set_opt(sb->default_mount_opt, SSD);
+			btrfs_set_opt(sb->default_mount_opt, SSD_SPREAD);
+			break;
+		case Opt_nossd:
+			btrfs_set_opt(sb->default_mount_opt, NOSSD);
+			btrfs_clear_opt(sb->default_mount_opt, SSD);
+			btrfs_clear_opt(sb->default_mount_opt, SSD_SPREAD);
+			break;
+		case Opt_nobarrier:
+			btrfs_set_opt(sb->default_mount_opt, NOBARRIER);
+			break;
+		case Opt_notreelog:
+			btrfs_set_opt(sb->default_mount_opt, NOTREELOG);
+			break;
+		case Opt_flushoncommit:
+			btrfs_set_opt(sb->default_mount_opt, FLUSHONCOMMIT);
+			break;
+		case Opt_discard:
+			btrfs_set_opt(sb->default_mount_opt, DISCARD);
+			break;
+		case Opt_space_cache:
+			btrfs_set_opt(sb->default_mount_opt, SPACE_CACHE);
+			break;
+		case Opt_no_space_cache:
+			btrfs_clear_opt(sb->default_mount_opt, SPACE_CACHE);
+			break;
+		case Opt_inode_cache:
+			btrfs_set_opt(sb->default_mount_opt, INODE_MAP_CACHE);
+			break;
+		case Opt_clear_cache:
+			btrfs_set_opt(sb->default_mount_opt, CLEAR_CACHE);
+			break;
+		case 
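
The archived patch is cut off at this point. As a rough illustration of the
technique it uses (split a comma-separated option string and fold each known
name into a bitmask), here is a self-contained sketch. It is not the patch's
code: the option bits and table are hypothetical, only a handful of options
are covered, and it scans with strcspn() instead of strsep() so that it stays
within ISO C and leaves the input string unmodified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option bits, not the real btrfs mount option values. */
enum {
	OPT_DEGRADED  = 1 << 0,
	OPT_NODATASUM = 1 << 1,
	OPT_NODATACOW = 1 << 2,
	OPT_SSD       = 1 << 3,
};

struct opt_token {
	const char *pattern;
	unsigned long bits;
};

static const struct opt_token tokens[] = {
	{ "degraded",  OPT_DEGRADED },
	{ "nodatasum", OPT_NODATASUM },
	{ "nodatacow", OPT_NODATACOW | OPT_NODATASUM },	/* implies nodatasum */
	{ "ssd",       OPT_SSD },
};

/*
 * Parse "a,b,c" into a bitmask. Empty fields are skipped. Returns 0 on
 * success, or -1 on an unknown option, in which case *out is untouched.
 */
static int parse_mount_opts(const char *options, unsigned long *out)
{
	const size_t n = sizeof(tokens) / sizeof(tokens[0]);
	unsigned long opts = 0;
	const char *p = options;

	while (*p) {
		size_t len = strcspn(p, ",");
		size_t i;

		if (len > 0) {
			for (i = 0; i < n; i++) {
				if (strlen(tokens[i].pattern) == len &&
				    strncmp(p, tokens[i].pattern, len) == 0) {
					opts |= tokens[i].bits;
					break;
				}
			}
			if (i == n)
				return -1;	/* unknown option name */
		}
		p += len;
		if (*p == ',')
			p++;
	}
	*out = opts;
	return 0;
}
```

Building the mask in a local variable and storing it only on success means a
bad option string cannot leave a half-updated value behind.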

Re: [PATCH V4 01/12] Btrfs: fix error path in create_pending_snapshot()

2012-09-17 Thread Miao Xie
On mon, 17 Sep 2012 18:56:27 +0200, David Sterba wrote:
 On Thu, Sep 06, 2012 at 06:00:32PM +0800, Miao Xie wrote:
 This patch fixes the following problem:
 - If we failed to deal with the delayed dir items, we should abort 
 transaction,
   just as its comment said. Fix it.
 - If root reference or root back reference insertion failed, we should
   abort transaction. Fix it.
 - Fix the double free problem of pending->inherit.
 - Do not restore the trans->rsv if we doesn't change it.
 - make the error path more clearly.
 
 I've noticed a pattern in the error + transaction abort paths, that is
 touched in this patch and would like to ask you to update it:

OK, I will send a separate patch to fix this problem.

Thanks for your review.
Miao


Re: [PATCH 2/2] Btrfs-progs: add mount-option command

2012-09-17 Thread Miao Xie
On tue, 18 Sep 2012 10:30:17 +0900, Hidetoshi Seto wrote:
 This patch adds mount-option command.
 The command can set/get default mount options.
 Now, the command can set/get 24 options.
 These options are equal to mount options which store
 in fs_info/mount-opt.

I don't think we need to implement a separate command to do this;
we can add it to btrfstune, just like ext3/4. That way, users
who used ext3/4 before can become familiar with the btrfs commands as
soon as possible.

Besides that, why not add an option to mkfs.btrfs?

Thanks
Miao

 

Re: [PATCH] Btrfs + Btrfs-progs: make pipe functions re-usable

2012-09-17 Thread Anand Jain



As I said in our private email exchange some months ago, I don't
think this is the right way to be doing this. For example, if you use
an alternative tool (such as btrfs-gui) which uses the ioctls
directly, you've lost that logging information.


 I agree with that Hugo. Thanks. These changes are partly for
 the same reason.


Keeping a log of what's been done to the FS is much better done by
extending the available logging in the kernel


 Could you please point out the modules you are talking about.
 I reviewed some but just in case if I have missed out any.



Thanks,  Anand


Re: [PATCH 1/2] Btrfs: cleanup duplicated division functions

2012-09-17 Thread Miao Xie
On Mon, 17 Sep 2012 18:31:13 +0200, David Sterba wrote:
 On Mon, Sep 17, 2012 at 10:21:00AM +0800, Miao Xie wrote:
 On fri, 14 Sep 2012 15:54:18 +0200, David Sterba wrote:
 On Thu, Sep 13, 2012 at 06:51:36PM +0800, Miao Xie wrote:
 div_factor{_fine} has been implemented for two times, cleanup it.
 And I move them into a independent file named math.h because they are
 common math functions.

 You removed the sanity checks:

 -	if (factor <= 0)
 -		return 0;
 -	if (factor >= 100)
 -		return num;

 As inline functions, they should not contain complex checks, the caller 
 should
 make sure the parameters are right. I think.
 
 It's compiler's job to decide whether a function should be inlined or
 not. The keyword/function attribute 'inline' is only a hint, unless
 always_inline is used and the author should be sure that it really has
 the expected outcome and that compiler is wrong here.

Right, but I think we should make the functions as simple as possible since
they are marked as inline, because a simple function is more likely to be
inlined than a complex one.

 I don't agree that each caller should do the checks, it only makes code
 harder to read and forces the authors to check for conditions that may
 not be apparent or are just omitted.

Right. But for these functions, we are sure that the parameter values are
in the right range in most places, and all of those places are in the hot
path. The only place where we need to check the parameters is in the slow
path, which is also the reason why we make them inline. So doing those
checks just wastes time; we only need to modify that caller.

Thanks
Miao
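
One way to reconcile the two positions in this thread is to keep an unchecked
helper for callers that already guarantee the range and a checked wrapper for
everyone else. The sketch below is a userspace illustration under that
assumption; both function names are hypothetical and neither comes from the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hot-path variant: the caller guarantees 0 <= factor <= 100. */
static inline uint64_t div_factor_fine_unchecked(uint64_t num, int factor)
{
	assert(factor >= 0 && factor <= 100);	/* debug-only sanity check */
	return num * (uint64_t)factor / 100;
}

/* Slow-path variant: clamps the factor before delegating. */
static inline uint64_t div_factor_fine_checked(uint64_t num, int factor)
{
	if (factor <= 0)
		return 0;
	if (factor >= 100)
		return num;
	return div_factor_fine_unchecked(num, factor);
}
```

The assert() compiles away when NDEBUG is defined, so the unchecked variant
stays as cheap as a bare multiply and divide in release builds.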

 If we need a function that does not check the boundaries, then of course
 go for it, but I don't see such case yet.
 
 in new version. And I don't think it's necessary to add an extra include
 with a rather generic name and trivial code. A separate .h/.c with
 non-filesystem related support code like this looks more suitable.

 Do you intend to use the functions out of extent-tree.c ?

 They are used in both extent-tree.c and volumes.c from the outset, but they
 were implemented in these two files severally.
 
 Ah, I see.
 
 david
 




Re: [PATCH 2/2] Btrfs-progs: add mount-option command

2012-09-17 Thread Roman Mamedov
On Tue, 18 Sep 2012 10:31:41 +0800
Miao Xie mi...@cn.fujitsu.com wrote:

 On tue, 18 Sep 2012 10:30:17 +0900, Hidetoshi Seto wrote:
  This patch adds mount-option command.
  The command can set/get default mount options.
  Now, the command can set/get 24 options.
  These options are equal to mount options which store
  in fs_info/mount-opt.
 
 I don't think we need implement a separate command to do this,
 we can add it into btrfstune just like ext3/4. If so, the users
 who used ext3/4 before can be familiar with btrfs command as soon
 as possible.

btrfstune currently only does one thing:

$ sudo btrfstune
usage: btrfstune [options] device
	-S value	enable/disable seeding

To me it'd seem more logical the other way around: why not move this operation
to the base btrfs utility under some command, and remove btrfstune completely?

-- 
With respect,
Roman

~~~
Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free.




Re: Experiences: Why BTRFS had to yield for ZFS

2012-09-17 Thread Anand Jain




A script on the test server would then apply Oracle archive files
from the production environment to this Oracle sync database every
10th minute, effectively keeping it nearly up to date with production.



The most reliable way to do this was with a simple NFS mount (rather
than rsync or samba). The idea then was that it would be very fast
and easy to make a new snapshot of the sync database, start it up, and
voila, you'd have a new instance ready to play with. A desktop machine



 About the archive-log-apply script: if you can, could you share the
 script itself, or provide more details about it?
 (It will help us understand the workload in question.)

Thanks, Anand


Re: [PATCH] Btrfs + Btrfs-progs: make pipe functions re-usable

2012-09-17 Thread Anand Jain



What's 'service sub-cmd' please?


   At the moment, 'btrfs service history mnt|dev'
   shows logs of maintenance.
   Comments/suggestions welcome.


Sorry, but without a more detailed description I can hardly give useful
comments.



David,

  'btrfs service history mnt|dev'
 is basically meant to show the list of CLI/GUI commands that have
 been run successfully on the btrfs filesystem as part of its
 creation (maybe), configuration and maintenance.

 HTH.

Thanks, Anand



[PATCH] Btrfs: fix the missing error information in create_pending_snapshot()

2012-09-17 Thread Miao Xie
The macro btrfs_abort_transaction() records the line number of the code
where it is invoked, so we should invoke it at the place where the error
occurs; otherwise we lose the line number of the actual failure.

Reported-by: David Sterba d...@jikos.cz
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
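
The reasoning in the changelog can be illustrated with a small userspace
sketch. Everything in it is a hypothetical stand-in, not the kernel's
definition: abort_transaction(), report_abort() and last_abort_line exist
only for this example. The sketch assumes a macro that passes __LINE__ to a
reporting function, and compares invoking it from one shared label with
invoking it at each error site.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
static int last_abort_line;

static void report_abort(const char *func, int line, int err)
{
	last_abort_line = line;
	fprintf(stderr, "%s:%d: transaction aborted (error %d)\n",
		func, line, err);
}

/* The macro expands at the call site, so __LINE__ is the caller's line. */
#define abort_transaction(err) report_abort(__func__, __LINE__, (err))

/* Shared label: every failure reports the same line, the label's. */
static int shared_label(int fail_first, int fail_second)
{
	int ret = 0;

	if (fail_first) {
		ret = -5;
		goto abort_trans;
	}
	if (fail_second) {
		ret = -12;
		goto abort_trans;
	}
	return 0;
abort_trans:
	abort_transaction(ret);
	return ret;
}

/* Per-site invocation: each failure reports its own line. */
static int at_error_site(int fail_first, int fail_second)
{
	int ret = 0;

	if (fail_first) {
		ret = -5;
		abort_transaction(ret);
		goto fail;
	}
	if (fail_second) {
		ret = -12;
		abort_transaction(ret);
		goto fail;
	}
fail:
	return ret;
}

/* Small usage example comparing the two variants. */
static void example_usage(void)
{
	int first, second;

	shared_label(1, 0);
	first = last_abort_line;
	shared_label(0, 1);
	second = last_abort_line;
	assert(first == second);	/* the error site is lost */

	at_error_site(1, 0);
	first = last_abort_line;
	at_error_site(0, 1);
	second = last_abort_line;
	assert(first != second);	/* the error site is preserved */
}
```
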
 fs/btrfs/transaction.c |   57 +--
 1 files changed, 35 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 7d3fc93..cf98dbc 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1042,7 +1042,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 		goto fail;
 	} else if (IS_ERR(dir_item)) {
 		ret = PTR_ERR(dir_item);
-		goto abort_trans;
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
 	}
 	btrfs_release_path(path);
 
@@ -1053,8 +1054,10 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	 * snapshot
 	 */
 	ret = btrfs_run_delayed_items(trans, root);
-	if (ret)	/* Transaction aborted */
-		goto abort_trans;
+	if (ret) {	/* Transaction aborted */
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
+	}
 
 	record_root_in_trans(trans, root);
 	btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
@@ -1087,7 +1090,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	if (ret) {
 		btrfs_tree_unlock(old);
 		free_extent_buffer(old);
-		goto abort_trans;
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
 	}
 
 	btrfs_set_lock_blocking(old);
@@ -1096,8 +1100,10 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	/* clean up in any case */
 	btrfs_tree_unlock(old);
 	free_extent_buffer(old);
-	if (ret)
-		goto abort_trans;
+	if (ret) {
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
+	}
 
 	/* see comments in should_cow_block() */
 	root->force_cow = 1;
@@ -1109,8 +1115,10 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	ret = btrfs_insert_root(trans, tree_root, &key, new_root_item);
 	btrfs_tree_unlock(tmp);
 	free_extent_buffer(tmp);
-	if (ret)
-		goto abort_trans;
+	if (ret) {
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
+	}
 
 	/*
 	 * insert root back/forward references
@@ -1119,23 +1127,30 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 				 parent_root->root_key.objectid,
 				 btrfs_ino(parent_inode), index,
 				 dentry->d_name.name, dentry->d_name.len);
-	if (ret)
-		goto abort_trans;
+	if (ret) {
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
+	}
 
 	key.offset = (u64)-1;
 	pending->snap = btrfs_read_fs_root_no_name(root->fs_info, &key);
 	if (IS_ERR(pending->snap)) {
 		ret = PTR_ERR(pending->snap);
-		goto abort_trans;
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
 	}
 
 	ret = btrfs_reloc_post_snapshot(trans, pending);
-	if (ret)
-		goto abort_trans;
+	if (ret) {
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
+	}
 
 	ret = btrfs_run_delayed_refs(trans, root, (unsigned long)-1);
-	if (ret)
-		goto abort_trans;
+	if (ret) {
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
+	}
 
 	ret = btrfs_insert_dir_item(trans, parent_root,
 				    dentry->d_name.name, dentry->d_name.len,
@@ -1143,15 +1158,17 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 				    BTRFS_FT_DIR, index);
 	/* We have check then name at the beginning, so it is impossible. */
 	BUG_ON(ret == -EEXIST);
-	if (ret)
-		goto abort_trans;
+	if (ret) {
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
+	}
 
 	btrfs_i_size_write(parent_inode, parent_inode->i_size +
 					 dentry->d_name.len * 2);
 	parent_inode->i_mtime = parent_inode->i_ctime = CURRENT_TIME;
 	ret = btrfs_update_inode(trans, parent_root, parent_inode);
 	if (ret)
-		goto abort_trans;
+		btrfs_abort_transaction(trans, root, ret);
 fail:
 	dput(parent);
 	trans->block_rsv = 

Re: [PATCH V3 4/7] Btrfs-progs: fix wrong way to check if the root item contains otime and uuid

2012-09-17 Thread Anand Jain




-   if(ri->generation == ri->generation_v2) {
+   if(sh->len == sizeof(struct btrfs_root_item)) {
        t = ri->otime.sec;


 This looks fine now, but will this still work when we
 move to v3 and still need access to the members
 introduced in v2?

 ker  cli
 v3   v2   v2 introduced members are unnecessarily blocked
 v2   v3   --as above--

Thanks, Anand
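
The concern above can be sketched as follows. The structs are hypothetical
and heavily simplified, not the on-disk btrfs layout, and root_item_v3 is an
imagined later revision. has_otime_exact() mirrors the exact-size comparison
from the quoted hunk; has_otime_min() is one possible alternative that was
not proposed in this thread, shown only to make the difference concrete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layouts; not the on-disk btrfs format. */
struct root_item_v1 {
	uint64_t generation;
};

struct root_item_v2 {
	struct root_item_v1 v1;
	uint64_t generation_v2;
	uint64_t otime_sec;
};

/* An imagined later revision that appends one more field. */
struct root_item_v3 {
	struct root_item_v2 v2;
	uint64_t extra;
};

/* Exact-size test: only items of exactly the v2 size qualify. */
static int has_otime_exact(size_t item_len)
{
	return item_len == sizeof(struct root_item_v2);
}

/* Minimum-size test: any item long enough to hold otime_sec qualifies. */
static int has_otime_min(size_t item_len)
{
	return item_len >=
	       offsetof(struct root_item_v2, otime_sec) + sizeof(uint64_t);
}

/* Small usage example: a v3-sized item read by v2-era code. */
static void example_usage(void)
{
	size_t v3_len = sizeof(struct root_item_v3);

	assert(!has_otime_exact(v3_len));	/* v2 members are blocked */
	assert(has_otime_min(v3_len));		/* v2 members stay readable */
}
```
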