[PATCH] Btrfs: fix double decrease of the writer counter
In __btrfs_end_transaction(), we have invoked sb_end_intwrite(), but if we
need to run btrfs_commit_transaction(), we will decrease the writer counter
twice because btrfs_commit_transaction() also invokes sb_end_intwrite().
Fix it.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/transaction.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 27c2600..3134fdc 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -551,8 +551,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	btrfs_trans_release_metadata(trans, root);
 	trans->block_rsv = NULL;
 
-	sb_end_intwrite(root->fs_info->sb);
-
 	if (lock && !atomic_read(&root->fs_info->open_ioctl_trans) &&
 	    should_end_transaction(trans, root)) {
 		trans->transaction->blocked = 1;
@@ -573,6 +571,8 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 		}
 	}
 
+	sb_end_intwrite(root->fs_info->sb);
+
 	WARN_ON(cur_trans != info->running_transaction);
 	WARN_ON(atomic_read(&cur_trans->num_writers) < 1);
 	atomic_dec(&cur_trans->num_writers);
--
1.7.6.5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Oops with a degraded volume
On 09/15/2012 10:17 PM, Antoine Sirinelli wrote:
> Hi,
>
> I have experienced a very reproducible Oops within the btrfs driver. On
> Linux 3.5.4, if I mount a volume with the option degraded because one of
> the devices is missing, I get an Oops when I unmount it (or even
> before). You can see attached the kernel log.

Thanks for the report.

And this has been fixed by commit 99f5944b8477914406173b47b4f261356286730b
    Btrfs: do not strdup non existent strings

You can find this commit in 3.6.0-rc5. :)

thanks,
liubo

> Here is how I create my btrfs volume:
>
> # mkfs.btrfs /dev/vdb /dev/vdc
> # mount /dev/vdb /mnt
> # dd if=/dev/zero of=/mnt/zeros count=1M
> # umount /mnt
> # shutdown -h now
>
> I am then wiping one volume (/dev/vdc) and restarting the system. To get
> a crash, here is what I am doing:
>
> # mount -o degraded /dev/vdb /mnt
> # umount /mnt
>
> I recognise the volume is not usable after having erased one drive but I
> would expect the kernel not to crash in such circumstances.
>
> I am not an expert, I am just reporting a crash from a user's point of
> view.
>
> Antoine
Experiences: Why BTRFS had to yield for ZFS
Abstract

For database testing purposes, a COW filesystem was needed in order to
facilitate snapshotting and rollback, so as to provide mirrors of our
production database at fixed intervals (every night and on demand).

Platform

An HP Proliant 380P (2x Intel Xeon E5-2620 with 12 cores for a total of 24
threads) with built-in Smart Array SAS/SATA (Gen8) controllers was combined
with 10x consumer Samsung 830 512GB SSDs (SATA III, 6Gb/s). Oracle
(Unbreakable) Linux x64 2.6.39-200.29.3.el6uek.x86_64 #1 SMP Tue Aug 28
13:03:31 EDT 2012 and Oracle Database Standard Edition 10.2.0.4 64-bit.

Setup

The OS was installed on the first disk (sda) and the remaining 9 (sdb - sdj)
were pooled into some 4.4TB for containing Oracle datafiles. An initial
backup of the 1.5TB production database would get restored as a (shut down)
sync instance on the test server on the COW filesystem. A script on the test
server would then apply Oracle archive files from the production environment
to this Oracle sync database every 10 minutes, effectively keeping it nearly
up to date with production. The most reliable way to do this was with a
simple NFS mount (rather than rsync or samba). The idea then was that it
would be very fast and easy to make a new snapshot of the sync database,
start it up, and voila, you'd have a new instance ready to play with. A
desktop machine with ext4 partitions set the lower bound for applying
archivelog data at around 1200 kb/s; we expected an order of magnitude
higher performance on the server.

BTRFS experiences

We used native BTRFS from the kernel, with atime off and ssd mode. BTRFS
proved to be very fast at reading for a large TRDBMS (2x speedup compared to
a SAN). However, applying archivelog on a BTRFS filesystem scaled poorly: it
started out with a decent apply rate but eventually dropped to around
400-500 kb/s. BTRFS had to be abandoned due to this, since the script would
never be able to finish applying archivelogs before new ones arrived. The
desktop machine with traditional spinning drives formatted for BTRFS showed
a similar scenario, so hardware (server, controller and disks) was excluded
as a cause.

ZFS experiences

We then tried using ZFS via custom-built SPL/ZFS 0.6.0-rc10 modules with
recordsize equal to that of the Oracle database (8K); compression off, quota
off, dedup off, checksum on and atime on. ZFS proved to be on par with a SAN
when it comes to reading for a large TRDBMS. Thankfully, ZFS did not degrade
much in archivelog apply performance, and proved to have a lower bound of
15MB/s.

Conclusion

We had hoped to be able to utilize BTRFS, due to its license and inclusion
in the Linux mainline kernel. However, for practical purposes, we're not
able to make use of BTRFS due to its write performance, especially
considering this is even without mixing in snapshotting. While ZFS doesn't
give us quite the boost in read performance we had expected from SSDs, it
seems more optimized for writing and will allow us to complete our project
of getting clones of a production database environment up and running in a
snap.

Take it for what it's worth: a couple of developers' experiences with BTRFS.
We are not likely to go back and change things now that it works, but we are
curious as to why we see such big differences between the two filesystems.
Any comments and/or feedback appreciated.

Regards,
Jesper and Casper
Re: Experiences: Why BTRFS had to yield for ZFS
* Casper Bang casper.b...@gmail.com:
> Oracle (Unbreakable) Linux x64 2.6.39-200.29.3.el6uek.x86_64 #1 SMP

And the btrfs was that from vanilla 2.6.39 (i.e. over a year old)?

--
Ralf Hildebrandt
Charite Universitätsmedizin Berlin
ralf.hildebra...@charite.de
Campus Benjamin Franklin
http://www.charite.de
Hindenburgdamm 30, 12203 Berlin
Geschäftsbereich IT, Abt. Netzwerk
fon: +49-30-450.570.155
[PATCH 1/2 v3] Btrfs: use flag EXTENT_DEFRAG for snapshot-aware defrag
We're going to use this flag EXTENT_DEFRAG to indicate which range
belongs to defragment so that we can implement snapshot-aware defrag:

We set the EXTENT_DEFRAG flag when dirtying the extents that need
defragmented, so later on writeback thread can differentiate between
normal writeback and writeback started by defragmentation.

This patch is used for the latter one.

Originally patch by Li Zefan l...@cn.fujitsu.com
Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/extent_io.c |8
 fs/btrfs/extent_io.h |2 ++
 fs/btrfs/file.c |4 ++--
 fs/btrfs/inode.c | 20
 fs/btrfs/ioctl.c |8
 5 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4c87847..604e404 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1144,6 +1144,14 @@ int set_extent_delalloc(struct extent_io_tree *tree, u64 start, u64 end,
 			      NULL, cached_state, mask);
 }
 
+int set_extent_defrag(struct extent_io_tree *tree, u64 start, u64 end,
+		      struct extent_state **cached_state, gfp_t mask)
+{
+	return set_extent_bit(tree, start, end,
+			      EXTENT_DELALLOC | EXTENT_UPTODATE | EXTENT_DEFRAG,
+			      NULL, cached_state, mask);
+}
+
 int clear_extent_dirty(struct extent_io_tree *tree, u64 start, u64 end,
 		       gfp_t mask)
 {
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 25900af..512f8da 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -235,6 +235,8 @@ int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		       int bits, int clear_bits, gfp_t mask);
 int set_extent_delalloc(struct extent_io_tree *tree, u64 start, u64 end,
 			struct extent_state **cached_state, gfp_t mask);
+int set_extent_defrag(struct extent_io_tree *tree, u64 start, u64 end,
+		      struct extent_state **cached_state, gfp_t mask);
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 			  u64 *start_ret, u64 *end_ret, int bits);
 struct extent_state *find_first_extent_bit_state(struct extent_io_tree *tree,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5caf285..226690a 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1173,8 +1173,8 @@ again:
 		clear_extent_bit(&BTRFS_I(inode)->io_tree, start_pos,
 				  last_pos - 1, EXTENT_DIRTY | EXTENT_DELALLOC |
-				  EXTENT_DO_ACCOUNTING, 0, 0, &cached_state,
-				  GFP_NOFS);
+				  EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
+				  0, 0, &cached_state, GFP_NOFS);
 		unlock_extent_cached(&BTRFS_I(inode)->io_tree,
 				     start_pos, last_pos - 1, &cached_state,
 				     GFP_NOFS);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b2c3514..55857eb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3531,7 +3531,8 @@ again:
 	}
 
 	clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, page_end,
-			  EXTENT_DIRTY | EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING,
+			  EXTENT_DIRTY | EXTENT_DELALLOC |
+			  EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
 			  0, 0, &cached_state, GFP_NOFS);
 
 	ret = btrfs_set_extent_delalloc(inode, page_start, page_end,
@@ -5998,7 +5999,8 @@ unlock:
 	if (lockstart < lockend) {
 		if (create && len < lockend - lockstart) {
 			clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
-					 lockstart + len - 1, unlock_bits, 1, 0,
+					 lockstart + len - 1,
+					 unlock_bits | EXTENT_DEFRAG, 1, 0,
 					 &cached_state, GFP_NOFS);
 			/*
 			 * Beside unlock, we also need to cleanup reserved space
@@ -6006,8 +6008,8 @@ unlock:
 			 */
 			clear_extent_bit(&BTRFS_I(inode)->io_tree,
 					 lockstart + len, lockend,
-					 unlock_bits | EXTENT_DO_ACCOUNTING,
-					 1, 0, NULL, GFP_NOFS);
+					 unlock_bits | EXTENT_DO_ACCOUNTING |
+					 EXTENT_DEFRAG, 1, 0, NULL, GFP_NOFS);
 		} else {
 			clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
 					 lockend, unlock_bits, 1, 0,
@@ -6572,8 +6574,8 @@ static void btrfs_invalidatepage(struct page *page, unsigned long offset)
 		 */
 		clear_extent_bit(tree, page_start, page_end,
[PATCH 2/2 v3] Btrfs: snapshot-aware defrag
This comes from one of btrfs's project ideas,
As we defragment files, we break any sharing from other snapshots.
The balancing code will preserve the sharing, and defrag needs to grow this
as well.

Now we're able to fill the blank with this patch, in which we make full use
of backref walking stuff.

Here is the basic idea,
o  set the writeback ranges started by defragment with flag EXTENT_DEFRAG
o  at endio, after we finish updating fs tree, we use backref walking to find
   all parents of the ranges and re-link them with the new COWed file layout
   by adding corresponding backrefs.

Originally patch by Li Zefan l...@cn.fujitsu.com
Signed-off-by: Liu Bo bo.li@oracle.com
---
Changes since v2:
- adopt better names for local structures.
- add proper reschedule phrase
- better error handling
- minor cleanups
(Thanks, David)

 fs/btrfs/inode.c | 617 ++
 1 files changed, 617 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 55857eb..8278aa2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -54,6 +54,7 @@
 #include "locking.h"
 #include "free-space-cache.h"
 #include "inode-map.h"
+#include "backref.h"
 
 struct btrfs_iget_args {
 	u64 ino;
@@ -1846,6 +1847,608 @@ out:
 	return ret;
 }
 
+/* snapshot-aware defrag */
+struct sa_defrag_extent_backref {
+	struct rb_node node;
+	struct old_sa_defrag_extent *old;
+	u64 root_id;
+	u64 inum;
+	u64 file_pos;
+	u64 extent_offset;
+	u64 num_bytes;
+	u64 generation;
+};
+
+struct old_sa_defrag_extent {
+	struct list_head list;
+	struct new_sa_defrag_extent *new;
+
+	u64 extent_offset;
+	u64 bytenr;
+	u64 offset;
+	u64 len;
+	int count;
+};
+
+struct new_sa_defrag_extent {
+	struct rb_root root;
+	struct list_head head;
+	struct btrfs_path *path;
+	struct inode *inode;
+	u64 file_pos;
+	u64 len;
+	u64 bytenr;
+	u64 disk_len;
+	u8 compress_type;
+};
+
+static int backref_comp(struct sa_defrag_extent_backref *b1,
+			struct sa_defrag_extent_backref *b2)
+{
+	if (b1->root_id < b2->root_id)
+		return -1;
+	else if (b1->root_id > b2->root_id)
+		return 1;
+
+	if (b1->inum < b2->inum)
+		return -1;
+	else if (b1->inum > b2->inum)
+		return 1;
+
+	if (b1->file_pos < b2->file_pos)
+		return -1;
+	else if (b1->file_pos > b2->file_pos)
+		return 1;
+
+	WARN_ON(1);
+	return 0;
+}
+
+static void backref_insert(struct rb_root *root,
+			   struct sa_defrag_extent_backref *backref)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct sa_defrag_extent_backref *entry;
+	int ret;
+
+	while (*p) {
+		parent = *p;
+		entry = rb_entry(parent, struct sa_defrag_extent_backref, node);
+
+		ret = backref_comp(backref, entry);
+		if (ret < 0)
+			p = &(*p)->rb_left;
+		else if (ret > 0)
+			p = &(*p)->rb_right;
+		else
+			BUG_ON(1);
+	}
+
+	rb_link_node(&backref->node, parent, p);
+	rb_insert_color(&backref->node, root);
+}
+
+/*
+ * Note the backref might has changed, and in this case we just return 0.
+ */
+static noinline int record_one_backref(u64 inum, u64 offset, u64 root_id,
+				       void *ctx)
+{
+	struct btrfs_file_extent_item *extent;
+	struct btrfs_fs_info *fs_info;
+	struct old_sa_defrag_extent *old = ctx;
+	struct new_sa_defrag_extent *new = old->new;
+	struct btrfs_path *path = new->path;
+	struct btrfs_key key;
+	struct btrfs_root *root;
+	struct sa_defrag_extent_backref *backref;
+	struct extent_buffer *leaf;
+	struct inode *inode = new->inode;
+	int slot;
+	int ret;
+	u64 extent_offset;
+	u64 num_bytes;
+
+	if (BTRFS_I(inode)->root->root_key.objectid == root_id &&
+	    inum == btrfs_ino(inode))
+		return 0;
+
+	key.objectid = root_id;
+	key.type = BTRFS_ROOT_ITEM_KEY;
+	key.offset = (u64)-1;
+
+	fs_info = BTRFS_I(inode)->root->fs_info;
+	root = btrfs_read_fs_root_no_name(fs_info, &key);
+	if (IS_ERR(root)) {
+		if (PTR_ERR(root) == -ENOENT)
+			return 0;
+		WARN_ON(1);
+		pr_debug("inum=%llu, offset=%llu, root_id=%llu\n",
+			 inum, offset, root_id);
+		return PTR_ERR(root);
+	}
+
+	key.objectid = inum;
+	key.type = BTRFS_EXTENT_DATA_KEY;
+	if (offset > (u64)-1 << 32)
+		key.offset = 0;
+	else
+		key.offset = offset;
+
+	ret =
Re: [PATCH 2/2 v3] Btrfs: snapshot-aware defrag
Please only push this one since the first one remains unchanged; I also
posted it for others to better review.

thanks,
liubo

On 09/17/2012 05:58 PM, Liu Bo wrote:
> This comes from one of btrfs's project ideas,
> As we defragment files, we break any sharing from other snapshots.
> The balancing code will preserve the sharing, and defrag needs to grow this
> as well.
>
> Now we're able to fill the blank with this patch, in which we make full use
> of backref walking stuff.
>
> Here is the basic idea,
> o  set the writeback ranges started by defragment with flag EXTENT_DEFRAG
> o  at endio, after we finish updating fs tree, we use backref walking to find
>    all parents of the ranges and re-link them with the new COWed file layout
>    by adding corresponding backrefs.
>
> Originally patch by Li Zefan l...@cn.fujitsu.com
> Signed-off-by: Liu Bo bo.li@oracle.com
> ---
> Changes since v2:
> - adopt better names for local structures.
> - add proper reschedule phrase
> - better error handling
> - minor cleanups
> (Thanks, David)
>
>  fs/btrfs/inode.c | 617 ++
>  1 files changed, 617 insertions(+), 0 deletions(-)
>
> [...]
Re: Experiences: Why BTRFS had to yield for ZFS
Hi,

On 17/09/2012, at 7:55 PM, Casper Bang casper.b...@gmail.com wrote:
> We're using the latest available kernel for our Oracle Unbreakable Linux
> 6.3 from Aug 28. We have no other option, since the Oracle database
> software needs to run on a certified distro.

Oracle Database is not certified to run on either btrfs or ZFS on Linux, so
if certification is an issue, you can't use either filesystem.

Out of interest, have you done a performance benchmark with ASM using ASMlib
on the same platform?

--
Oracle http://www.oracle.com
Avi Miller | Principal Program Manager | +61 (412) 229 687
Oracle Linux and Virtualization
417 St Kilda Road, Melbourne, Victoria 3004 Australia
Re: [PATCH] Btrfs + Btrfs-progs: make pipe functions re-usable
On Mon, Sep 17, 2012 at 12:48:10PM +0800, Anand Jain wrote:
> > > btrfs send introduced a part of code to read kernel-data
> > > from user-end using pipe. We need this part of code to be
> > > useable outside of send sub-cmd, so that developing service
> > > sub-cmd can use it.
> >
> > What's 'service sub-cmd' please?
>
> at the moment 'btrfs service history mnt|dev' to show logs
> of maintenance. comments/suggestions welcome.

As I said in our private email exchange some months ago, I don't think this
is the right way to be doing this. For example, if you use an alternative
tool (such as btrfs-gui) which uses the ioctls directly, you've lost that
logging information.

Keeping a log of what's been done to the FS is much better done by extending
the available logging in the kernel (and making it a compile-time option for
those who don't want or need it). You can then write a simple shell script
to chomp through the normal kernel logs to extract this information.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I'll take your bet, but make it ten thousand francs. I'm only ---
                      a _poor_ corrupt official.
Re: Experiences: Why BTRFS had to yield for ZFS
Hi,

On 17/09/2012, at 8:47 PM, Casper Bang casper.b...@gmail.com wrote:
> month, that just makes me wonder why Oracle didn't use these latest bits.

We used the most stable release of btrfs that was available when the
development of the UEK was done. Keep in mind that while it's versioned at
2.6.39, it's actually 3.0.16 under the hood. It's just that some userspace
doesn't like having a kernel version that doesn't start with 2.6.

>> Out of interest, have you done a performance benchmark with ASM using
>> ASMlib on the same platform?
>
> Sorry, no. Our experience with ASM is limited, we came to the conclusion
> once that we like being able to handle the files in a plain mountable
> file-system.

Perhaps, but ASM would provide all the functionality you require, including
snapshots and rollback, at the highest possible performance. Certainly a lot
higher than both ZFS and btrfs. And it's fully certified and supported by
Oracle.

As an alternative, why not consider using Oracle VM on the machine and
creating database VMs instead? You can then use the snapshot capability of
Oracle VM while still running supported and certified filesystems inside
each guest.

(We should also probably take this discussion off-list, as it has drifted
away from btrfs proper). Feel free to reply to me directly if you want.

--
Oracle http://www.oracle.com
Avi Miller | Principal Program Manager | +61 (412) 229 687
Oracle Linux and Virtualization
417 St Kilda Road, Melbourne, Victoria 3004 Australia
enquiry about autodefrag option (resent)
I am testing btrfs for long-term storage and backup, and I would like to
know more about the autodefrag option:

1. Will the autodefrag option benefit an SSD? My understanding is:
   autodefrag -> number of extents decreases -> metadata decreases ->
   a healthier filesystem in the long run
   (P.S. I am aware that autodefrag will introduce extra write I/O)

2. AFAIK, autodefrag detects small random writes into files and queues them
   up for an automatic defrag process, so the filesystem will defragment
   itself while it's used. If the system reboots, crashes or is remounted
   read-only, will the autodefrag process continue after resume?
Re: [PATCH 1/2] Btrfs: cleanup duplicated division functions
On Mon, Sep 17, 2012 at 10:21:00AM +0800, Miao Xie wrote:
> On fri, 14 Sep 2012 15:54:18 +0200, David Sterba wrote:
> > On Thu, Sep 13, 2012 at 06:51:36PM +0800, Miao Xie wrote:
> >> div_factor{_fine} has been implemented for two times, cleanup it.
> >> And I move them into a independent file named math.h because they are
> >> common math functions.
> >
> > You removed the sanity checks:
> >
> > -	if (factor <= 0)
> > -		return 0;
> > -	if (factor >= 100)
> > -		return num;
>
> As inline functions, they should not contain complex checks, the caller
> should make sure the parameters are right. I think.

div_factor_fine() in volumes.c is not inline, and is called from
chunk_usage_filter() on unvalidated user input. If you think the caller
should do those checks, you should move them to the caller as part of your
patch.

Thanks,

		Ilya
Re: [PATCH] Btrfs: fix race with freeze and free space inodes
On Sun, Sep 16, 2012 at 11:36:57PM -0600, Miao Xie wrote:
> On fri, 14 Sep 2012 11:26:20 -0400, Josef Bacik wrote:
> > So we start our freeze, somebody comes in and does an fsync() on a file
> > where we have to commit a transaction for whatever reason, and we will
> > deadlock because the freeze is waiting on FS_FREEZE people to stop
> > writing to the file system, but the transaction is waiting for its free
> > space inodes to be written out, which are in turn waiting on
> > sb_start_intwrite while trying to write the file extents. To fix this
> > we'll just skip the sb_start_intwrite() if we TRANS_JOIN_NOLOCK since
> > we're being waited on by a transaction commit so we're safe wrt to
> > freeze and this will keep us from deadlocking. Thanks,
> >
> > Signed-off-by: Josef Bacik jba...@fusionio.com
> > ---
> >  fs/btrfs/transaction.c | 10 +-
> >  1 files changed, 9 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> > index c9265a6..ba74dfb 100644
> > --- a/fs/btrfs/transaction.c
> > +++ b/fs/btrfs/transaction.c
> > @@ -342,7 +342,15 @@ again:
> >  	if (!h)
> >  		return ERR_PTR(-ENOMEM);
> >
> > -	if (!__sb_start_write(root->fs_info->sb, SB_FREEZE_FS, false)) {
> > +	/*
> > +	 * If we are JOIN_NOLOCK we're already committing a transaction and
> > +	 * waiting on this guy, so we don't need to do the sb_start_intwrite
> > +	 * because we're already holding a ref. We need this because we could
> > +	 * have raced in and did an fsync() on a file which can kick a commit
> > +	 * and then we deadlock with somebody doing a freeze.
> > +	 */
> > +	if (type != TRANS_JOIN_NOLOCK &&
> > +	    !__sb_start_write(root->fs_info->sb, SB_FREEZE_FS, false)) {
> >  		if (type == TRANS_JOIN_FREEZE)
> >  			return ERR_PTR(-EPERM);
> >  		sb_start_intwrite(root->fs_info->sb);
>
> This patch forgets to deal with it in __btrfs_end_transaction(), or the
> freeze counter will be wrong.

This was fixed locally, I just sent the wrong patch, thanks,

Josef
Re: [PATCH] Btrfs + Btrfs-progs: make pipe functions re-usable
On Mon, Sep 17, 2012 at 12:48:10PM +0800, Anand Jain wrote:
> > > btrfs send introduced a part of code to read kernel-data
> > > from user-end using pipe. We need this part of code to be
> > > useable outside of send sub-cmd, so that developing service
> > > sub-cmd can use it.
> >
> > What's 'service sub-cmd' please?
>
> at the moment 'btrfs service history mnt|dev' to show logs
> of maintenance. comments/suggestions welcome.

Sorry, but without a more detailed description I can hardly give useful
comments. The patch looks ok but stands alone, you can post it with your
proposed feature together.

david
Re: [RFC inside][PATCH] btrfs: allow setting NOCOW for a zero sized file via ioctl
Hi,

Josef, I noticed that you did not add the patch to btrfs-next. This is
understandable for an RFC patch of course, but I'd like to ask you to add it
to the queue, so people testing -next have a chance to give it a try.

Thanks,
david
Re: Root dentry has weird name
On Mon, Sep 17, 2012 at 06:17:53PM +0200, David Sterba wrote:
> On Fri, Sep 14, 2012 at 10:09:12AM -0700, Marc MERLIN wrote:
> > I only have btrfs on my laptop and just started getting this.
>
> Afaik, this is not directly related to btrfs. Search for the "Root dentry
> has weird name" message and you'll see occurrences from kernel 3.0, 3.1.
>
> > I'm not too clear about whether it's in memory or on my filesystem
> > somewhere.
>
> It's reflecting an in-memory state.
>
> > Can you recommend what I should do: reboot? fsck somehow? other?
>
> Reboot should help, also check for potential NFS problems, like
> unreachable server.

Thanks for your answer. I indeed should have searched that first instead of
assuming it was btrfs related.

I can also confirm that rebooting made the message go away.

Thanks for your answer and sorry for posting to the wrong list.

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
This code cannot be right
ctree.c:btrfs_insert_some_items()
{
	...
	if (total_size + data_size[i] + ... {
		break;
		nr = i;
	}
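The complaint in the snippet above is that a statement placed after `break` can never run, so `nr` would keep whatever value it had before the loop. A minimal userspace sketch of the intended ordering follows; `count_fitting`, `sizes` and `budget` are hypothetical names chosen for illustration and are not taken from ctree.c, and the elided condition in the original is not reproduced here.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not the kernel function: return how many leading
 * items fit into `budget` bytes. The count is recorded before leaving
 * the loop; writing `break; nr = i;` would turn the assignment into
 * dead code.
 */
size_t count_fitting(const size_t *sizes, size_t n, size_t budget)
{
	size_t total = 0;
	size_t nr = n;	/* default: every item fits */
	size_t i;

	for (i = 0; i < n; i++) {
		if (total + sizes[i] > budget) {
			nr = i;	/* record first, then stop */
			break;
		}
		total += sizes[i];
	}
	return nr;
}
```

Most compilers can flag the original ordering when unreachable-code warnings are enabled, which may be a cheap way to look for similar spots.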
Re: [PATCH 1/2] Btrfs: cleanup duplicated division functions
On Mon, Sep 17, 2012 at 10:21:00AM +0800, Miao Xie wrote:
> On fri, 14 Sep 2012 15:54:18 +0200, David Sterba wrote:
> > On Thu, Sep 13, 2012 at 06:51:36PM +0800, Miao Xie wrote:
> >> div_factor{_fine} has been implemented for two times, cleanup it.
> >> And I move them into a independent file named math.h because they are
> >> common math functions.
> >
> > You removed the sanity checks:
> >
> > -	if (factor <= 0)
> > -		return 0;
> > -	if (factor >= 100)
> > -		return num;
>
> As inline functions, they should not contain complex checks, the caller
> should make sure the parameters are right. I think.

It's the compiler's job to decide whether a function should be inlined or
not. The keyword/function attribute 'inline' is only a hint, unless
always_inline is used, and then the author should be sure that it really has
the expected outcome and that the compiler is wrong here.

I don't agree that each caller should do the checks, it only makes code
harder to read and forces the authors to check for conditions that may not
be apparent or are just omitted.

If we need a function that does not check the boundaries, then of course go
for it, but I don't see such case yet.

> in new version.
>
> > And I don't think it's necessary to add an extra include with a rather
> > generic name and trivial code. A separate .h/.c with non-filesystem
> > related support code like this looks more suitable. Do you intend to
> > use the functions out of extent-tree.c ?
>
> They are used in both extent-tree.c and volumes.c from the outset, but
> they were implemented in these two files severally.

Ah, I see.

david
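To make the point about keeping the checks in the helper concrete, here is a minimal userspace sketch built only from the two checks quoted in this thread. It uses `uint64_t` and plain division; the percentage arithmetic after the checks is an assumption about what the helper computes, not a copy of the kernel code.

```c
#include <stdint.h>

/*
 * Sketch of a percentage helper that validates `factor` itself, so that
 * callers passing unvalidated input (for example a value that came from
 * userspace) cannot get a surprising result.
 * Note: num * factor can overflow for very large num; this sketch does
 * not guard against that.
 */
uint64_t div_factor_fine(uint64_t num, int factor)
{
	if (factor <= 0)
		return 0;
	if (factor >= 100)
		return num;
	return num * (uint64_t)factor / 100;
}
```

With the checks inside the function, a call such as `div_factor_fine(bytes, user_value)` stays well defined for any `int`, which is the property Ilya and David are asking to preserve.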
Re: [PATCH] Btrfs: fix double decrease of the writer counter
On Mon, Sep 17, 2012 at 12:34:27AM -0600, Miao Xie wrote:
> In __btrfs_end_transaction(), we have invoked sb_end_intwrite(),
> but if we need run btrfs_commit_transaction(), we will decrease
> the writer counter for two times because btrfs_commit_transaction()
> also invokes sb_end_intwrite(). Fix it.

Already fixed in btrfs-next. Thanks,

Josef

> Signed-off-by: Miao Xie mi...@cn.fujitsu.com
> ---
>  fs/btrfs/transaction.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 27c2600..3134fdc 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -551,8 +551,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
>  	btrfs_trans_release_metadata(trans, root);
>  	trans->block_rsv = NULL;
>
> -	sb_end_intwrite(root->fs_info->sb);
> -
>  	if (lock && !atomic_read(&root->fs_info->open_ioctl_trans) &&
>  	    should_end_transaction(trans, root)) {
>  		trans->transaction->blocked = 1;
> @@ -573,6 +571,8 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
>  		}
>  	}
>
> +	sb_end_intwrite(root->fs_info->sb);
> +
>  	WARN_ON(cur_trans != info->running_transaction);
>  	WARN_ON(atomic_read(&cur_trans->num_writers) < 1);
>  	atomic_dec(&cur_trans->num_writers);
> --
> 1.7.6.5
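The double decrement described in this thread can be illustrated with a toy counter. This is a userspace sketch only: `sb_sketch`, `commit_sketch` and the two `end_transaction_*` functions are hypothetical stand-ins, and it assumes the commit path returns directly instead of falling through to the rest of the end-transaction code.

```c
/* Toy model of a superblock writer counter; not the kernel structure. */
struct sb_sketch {
	int writers;
};

void start_write(struct sb_sketch *sb) { sb->writers++; }
void end_write(struct sb_sketch *sb)   { sb->writers--; }

/* Stand-in for the commit path, which drops the writer count itself. */
int commit_sketch(struct sb_sketch *sb)
{
	end_write(sb);
	return 0;
}

/* Before the fix: the count is dropped here and again inside commit. */
int end_transaction_buggy(struct sb_sketch *sb, int need_commit)
{
	end_write(sb);
	if (need_commit)
		return commit_sketch(sb);
	return 0;
}

/* After the fix: the count is dropped only when commit was not taken. */
int end_transaction_fixed(struct sb_sketch *sb, int need_commit)
{
	if (need_commit)
		return commit_sketch(sb);
	end_write(sb);
	return 0;
}
```

Starting one writer and ending it through the buggy path with `need_commit` set leaves the counter one below zero, while the fixed ordering brings it back to zero on both paths.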
Re: [PATCH V4 01/12] Btrfs: fix error path in create_pending_snapshot()
On Thu, Sep 06, 2012 at 06:00:32PM +0800, Miao Xie wrote:
> This patch fixes the following problem:
> - If we failed to deal with the delayed dir items, we should abort transaction, just as its comment said. Fix it.
> - If root reference or root back reference insertion failed, we should abort transaction. Fix it.
> - Fix the double free problem of pending->inherit.
> - Do not restore the trans->rsv if we doesn't change it.
> - make the error path more clearly.

I've noticed a pattern in the error + transaction abort paths that is touched in this patch, and would like to ask you to update it:

> @@ -1018,10 +1016,9 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  				BTRFS_FT_DIR, index);
>  	if (ret == -EEXIST) {
>  		pending->error = -EEXIST;
> -		dput(parent);
>  		goto fail;

normal exit path: here we don't abort the transaction, just go to the exit block and do the cleanup

>  	} else if (ret) {
> -		goto abort_trans_dput;
> +		goto abort_trans;

a transaction abort path: here we jump to a common block that calls abort, but we lose the information about where the abort occurred

I went through the code and saw several uses of this pattern (and I remember more than one bug report that pointed to an abort_transaction call without leaving any trace of which condition failed).

(Search regex I used: 'goto.*abort')

So the proposed pattern to use is

---
	if (condition) {
		btrfs_transaction_abort(...);
		goto fail;
	}

fail:
	cleanup
	return ...;
---

> @@ -1120,15 +1114,15 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  	ret = btrfs_reloc_post_snapshot(trans, pending);
>  	if (ret)
>  		goto abort_trans;
> -	ret = 0;
>  fail:
> -	kfree(new_root_item);
> +	dput(parent);
>  	trans->block_rsv = rsv;
> +no_free_objectid:
> +	kfree(new_root_item);
> +root_item_alloc_fail:
>  	btrfs_block_rsv_release(root, &pending->block_rsv, (u64)-1);
>  	return ret;
>  
> -abort_trans_dput:
> -	dput(parent);
>  abort_trans:
>  	btrfs_abort_transaction(trans, root, ret);
>  	goto fail;

(end of function here)

this will also remove all the instances where a function ends with a 'goto'. All instances are convertible to the pattern described above.

An alternate approach that I originally considered for fixing this was to introduce a call like 'btrfs_mark_transaction_abort_callsite', which would need to add a field to fs_info and print it later. But, if we're going to touch all the code, it makes sense to utilize the infrastructure we already have.

Please consider updating your patch; I'll send a separate patch that deals with aborts outside of create_pending_snapshot.

TIA,
david
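The point about losing the call site can be shown with a small userspace sketch. It assumes the abort helper is a macro that expands __LINE__ where it is written, which is what makes aborting at the failing check more informative than jumping to one shared abort label. The names struct txn, txn_abort, step and create_snapshot are hypothetical and are not the btrfs API.

```c
#include <stdio.h>

/* Hypothetical stand-in for a transaction handle; not the btrfs structure. */
struct txn {
	int aborted;     /* set once the first abort is recorded */
	int abort_line;  /* source line of the call site that aborted */
	int err;         /* error code passed to the first abort */
};

/* Only the first abort is recorded, so the original failure is preserved. */
static void txn_abort_at(struct txn *t, int err, int line)
{
	if (t->aborted)
		return;
	t->aborted = 1;
	t->abort_line = line;
	t->err = err;
	fprintf(stderr, "transaction aborted at line %d (error %d)\n", line, err);
}

/* The macro expands __LINE__ where it is used, i.e. at the failing check. */
#define txn_abort(t, err) txn_abort_at((t), (err), __LINE__)

/* Hypothetical step that fails on request. */
static int step(int fail)
{
	return fail ? -5 : 0;
}

static int create_snapshot(struct txn *t, int fail_first, int fail_second)
{
	int ret;

	ret = step(fail_first);
	if (ret) {
		txn_abort(t, ret);	/* records this line */
		goto fail;
	}

	ret = step(fail_second);
	if (ret) {
		txn_abort(t, ret);	/* records a different line */
		goto fail;
	}

fail:
	/* common cleanup would go here, shared by success and error paths */
	return ret;
}
```

With a single shared abort label, both failures would report the same line; here each failing check reports its own.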
Re: [PATCH 2/2 v3] Btrfs: snapshot-aware defrag
On Mon, Sep 17, 2012 at 03:58:56AM -0600, Liu Bo wrote:
> This comes from one of btrfs's project ideas, As we defragment files, we break any sharing from other snapshots. The balancing code will preserve the sharing, and defrag needs to grow this as well.
>
> Now we're able to fill the blank with this patch, in which we make full use of backref walking stuff.
>
> Here is the basic idea,
> o set the writeback ranges started by defragment with flag EXTENT_DEFRAG
> o at endio, after we finish updating fs tree, we use backref walking to find all parents of the ranges and re-link them with the new COWed file layout by adding corresponding backrefs.
>
> Originally patch by Li Zefan l...@cn.fujitsu.com
> Signed-off-by: Liu Bo bo.li@oracle.com

I was trying to fix up the rejects on this patch when I noticed there were no tabs, only spaces. That's not going to work, and now I have to go back and make sure none of your other patches did this. Thanks,

Josef
Re: 'umount' of multi-device volume hangs until the device is physically un-plugged
On Sun, Sep 16, 2012 at 10:07:39PM -0600, Kay Sievers wrote:
> I'm currently playing around with native btrfs multi-device support in systemd. There might be a few hotplug issues to solve, here is the first one:
>
> A mounted (otherwise unused) multi-device volume (USB multi-slot card reader), hangs at:
>   $ umount /mnt
> with (fedora) kernel 3.6.0-0.rc5.git0.1.fc18.x86_64
>
> Any idea what to look for or what to try?

Can I see the whole sysrq+w? Also can you try btrfs-next and see if you have the same problems? Thanks,

Josef
Re: This code cannot be right
On Mon, Sep 17, 2012 at 10:25:54AM -0600, Alan Cox wrote:
> ctree.c:btrfs_insert_some_items()
> {
> ...
> 	if (total_size + data_size[i]+ ... {
> 		break;
> 		nr = i;
> 	}

Hi Alan,

Definitely not right ;) It's actually unused, but I thought I had gotten rid of it long ago. I'll queue up a patch for the next merge window, thanks.

-chris
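The problem in the quoted fragment is that the assignment after break is unreachable, so nr never records where the loop stopped. A minimal userspace sketch of the intended order follows. The condition in the original is elided, so the size check, the item_overhead parameter and the function name items_that_fit are hypothetical and only illustrate the shape of the loop.

```c
#include <stddef.h>

/*
 * Count how many leading items fit into free_space.
 * Each item costs data_size[i] plus a fixed per-item overhead
 * (an assumption made for this sketch).
 */
static size_t items_that_fit(const unsigned *data_size, size_t count,
			     unsigned item_overhead, unsigned free_space)
{
	unsigned total_size = 0;
	size_t nr = count;	/* default: everything fits */
	size_t i;

	for (i = 0; i < count; i++) {
		if (total_size + data_size[i] + item_overhead > free_space) {
			nr = i;	/* record the stopping point first ... */
			break;	/* ... then leave the loop */
		}
		total_size += data_size[i] + item_overhead;
	}
	return nr;
}
```

Most compilers can flag the original form when unreachable-code warnings are enabled, since no statement after break in the same block can execute.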
[PATCH] Btrfs: do not hold the write_lock on the extent tree while logging V2
Dave Sterba pointed out a sleeping while atomic bug while doing fsync. This is because I'm an idiot and didn't realize that rwlock's were spin locks, so we've been holding this thing while doing allocations and such which is not good. This patch fixes this by dropping the write lock before we do anything heavy and re-acquire it when it is done. We also need to take a ref on the em's in case their corresponding pages are evicted and mark them as being logged so that releasepage does not remove them and doesn't remove them from our local list. Thanks, Reported-by: Dave Sterba d...@jikos.cz Signed-off-by: Josef Bacik jba...@fusionio.com --- V1-V2: drop our ref if we had an error fs/btrfs/extent_map.c |3 ++- fs/btrfs/extent_map.h |1 + fs/btrfs/tree-log.c | 20 3 files changed, 19 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c index 8d1364d..b8cbc8d 100644 --- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -407,7 +407,8 @@ int remove_extent_mapping(struct extent_map_tree *tree, struct extent_map *em) WARN_ON(test_bit(EXTENT_FLAG_PINNED, em-flags)); rb_erase(em-rb_node, tree-map); - list_del_init(em-list); + if (!test_bit(EXTENT_FLAG_LOGGING, em-flags)) + list_del_init(em-list); em-in_tree = 0; return ret; } diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h index 8e6294b..6792255 100644 --- a/fs/btrfs/extent_map.h +++ b/fs/btrfs/extent_map.h @@ -13,6 +13,7 @@ #define EXTENT_FLAG_COMPRESSED 1 #define EXTENT_FLAG_VACANCY 2 /* no file extent item found */ #define EXTENT_FLAG_PREALLOC 3 /* pre-allocated extent */ +#define EXTENT_FLAG_LOGGING 4 /* Logging this extent */ struct extent_map { struct rb_node rb_node; diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 038a522..a3e88cf 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -2945,6 +2945,9 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans, list_del_init(em-list); if (em-generation = test_gen) continue; + /* Need a ref 
to keep it from getting evicted from cache */ + atomic_inc(em-refs); + set_bit(EXTENT_FLAG_LOGGING, em-flags); list_add_tail(em-list, extents); } @@ -2954,13 +2957,18 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans, em = list_entry(extents.next, struct extent_map, list); list_del_init(em-list); + clear_bit(EXTENT_FLAG_LOGGING, em-flags); /* * If we had an error we just need to delete everybody from our * private list. */ - if (ret) + if (ret) { + free_extent_map(em); continue; + } + + write_unlock(tree-lock); /* * If the previous EM and the last extent we left off on aren't @@ -2971,21 +2979,25 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans, ret = copy_items(trans, inode, dst_path, args.src, args.start_slot, args.nr, LOG_INODE_ALL); - if (ret) + if (ret) { + free_extent_map(em); continue; + } btrfs_release_path(path); args.nr = 0; } ret = log_one_extent(trans, inode, root, em, path, dst_path, args); + free_extent_map(em); + write_lock(tree-lock); } + WARN_ON(!list_empty(extents)); + write_unlock(tree-lock); if (!ret args.nr) ret = copy_items(trans, inode, dst_path, args.src, args.start_slot, args.nr, LOG_INODE_ALL); btrfs_release_path(path); - WARN_ON(!list_empty(extents)); - write_unlock(tree-lock); return ret; } -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
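The locking change in this patch can be sketched in userspace as follows. This is a toy model under loud assumptions: toy_lock only tracks whether it is held (it is not an rwlock), toy_extent stands in for an extent map, and heavy_work_under_lock counts how often the slow step ran while the lock was held. It shows the order of operations the commit message describes: take a reference and mark the entry while locked, drop the lock for the heavy work, then drop the reference and re-take the lock.

```c
#include <stddef.h>

/* Toy lock that only tracks whether it is held; not a real rwlock. */
struct toy_lock { int held; };

static void toy_write_lock(struct toy_lock *l)   { l->held = 1; }
static void toy_write_unlock(struct toy_lock *l) { l->held = 0; }

/* Toy stand-in for an extent map entry. */
struct toy_extent {
	int refs;     /* reference count */
	int logging;  /* set while the entry sits on our private list */
	int logged;   /* set once the heavy work has processed it */
};

/* Counts violations: heavy work performed while the lock was held. */
static int heavy_work_under_lock;

/* Stand-in for the step that may allocate or sleep. */
static int toy_log_one_extent(struct toy_lock *l, struct toy_extent *em)
{
	if (l->held)
		heavy_work_under_lock++;
	em->logged = 1;
	return 0;
}

static int toy_log_changed_extents(struct toy_lock *l,
				   struct toy_extent *ems, size_t n)
{
	int ret = 0;
	size_t i;

	toy_write_lock(l);

	/* Pin and mark every entry while the lock is held. */
	for (i = 0; i < n; i++) {
		ems[i].refs++;
		ems[i].logging = 1;
	}

	for (i = 0; i < n; i++) {
		struct toy_extent *em = &ems[i];

		em->logging = 0;
		if (ret) {
			/* After an error, only drop our references. */
			em->refs--;
			continue;
		}

		toy_write_unlock(l);		/* never do heavy work locked */
		ret = toy_log_one_extent(l, em);
		em->refs--;			/* drop the pin taken above */
		toy_write_lock(l);
	}

	toy_write_unlock(l);
	return ret;
}
```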
Re: Oops with a degraded volume
On Mon, Sep 17, 2012 at 02:46:00PM +0800, Liu Bo wrote:
> On 09/15/2012 10:17 PM, Antoine Sirinelli wrote:
>> I have experienced a very reproducible Oops within the btrfs driver. On a linux 3.5.4, if I mount a volume with the option degraded because one of the device is missing, I would get an Oops when I unmount it (or even before). You can see attached the kernel log.
>
> Thanks for the report. And this has been fixed by
>
> commit 99f5944b8477914406173b47b4f261356286730b
> Btrfs: do not strdup non existent strings
>
> You can find this commit in 3.6.0-rc5. :)

That's right, I have done the same test with rc6 and it does not crash anymore.

Many thanks,

Antoine
btrfs raid1 degraded in need of chuck tree rebuild
Below is my original post about my fs. Just wondering if anyone knows whether I can at this point get my data back or should cut my losses. Is an fsck capable of getting this fixed close, or has my 2 year wait been in vain? Thanks in advance!

> Excerpts from Vladi Gergov's message of 2010-10-29 16:53:42 -0400:
>> gypsyops @ /mnt sudo mount -o degraded /dev/sdc das3/
>> Password:
>> mount: wrong fs type, bad option, bad superblock on /dev/sdc,
>>        missing codepage or helper program, or other error
>>        In some cases useful info is found in syslog - try
>>        dmesg | tail or so
>>
>> [ 684.577540] device label das4 devid 2 transid 107954 /dev/sdc
>> [ 684.595150] btrfs: allowing degraded mounts
>> [ 684.595594] btrfs: failed to read chunk root on sdb
>> [ 684.604110] btrfs: open_ctree failed
>>
>> gypsyops @ /mnt sudo btrfsck /dev/sdc
>> btrfsck: volumes.c:1367: btrfs_read_sys_array: Assertion `!(ret)' failed.
>
> Ok, I dug through this and found the bug responsible for your unmountable FS. When we're mounted in degraded mode, and we don't have enough drives available to do raid1,10, we can use the wrong raid level for new allocations.
>
> I'm fixing the kernel side so this doesn't happen anymore, but I'll need to rebuild the chunk tree (and probably a few others) off your good disk to fix things.
>
> I've got it reproduced here though, so I'll make an fsck that can scan for the correct trees and fix it for you.
>
> Since you're basically going to be my first external fsck customer, is there any way you can do a raw device based backup of the blocks? This way if I do mess things up we can repeat the experiment.
>
> -chris

--
,-| Vladi
`-| Gergov
Re: 'umount' of multi-device volume hangs until the device is physically un-plugged
On Mon, Sep 17, 2012 at 7:19 PM, Josef Bacik jba...@fusionio.com wrote:
> On Sun, Sep 16, 2012 at 10:07:39PM -0600, Kay Sievers wrote:
>> I'm currently playing around with native btrfs multi-device support in systemd. There might be a few hotplug issues to solve, here is the first one:
>>
>> A mounted (otherwise unused) multi-device volume (USB multi-slot card reader), hangs at:
>>   $ umount /mnt
>> with (fedora) kernel 3.6.0-0.rc5.git0.1.fc18.x86_64
>>
>> Any idea what to look for or what to try?
>
> Can I see the whole sysrq+w? Also can you try btrfs-next and see if you have the same problems? Thanks,

Hmm, I can't reproduce that today. Nothing really has changed with the setup. It was easy to reproduce yesterday, even across multiple reboots.

I'll come back if I see it again.

Thanks,
Kay
BTRFS_IOC_DEVICES_READY and removed devices
We are currently playing around with native btrfs multi-device support in systemd. We already committed the needed pieces to systemd git, to register all detected btrfs filesystems with the kernel.

For volumes which are listed in fstab for mounting, we delay the actual mount attempt of a multi-device volume until we see READY returned from BTRFS_IOC_DEVICES_READY.

With a UUID= line in /etc/fstab and nofail in the options field, we can boot up without any device plugged in. We can then plug in devices one after the other until the volume has a full tree of devices; with the last device there, systemd just mounts the volume as expected.

This seems to work very well so far, unless a device which is already registered disappears, which is a kind of valid hotplug scenario we should handle better:

If one device of a 2-device volume is registered with the in-kernel cache, and then the device is unplugged from the system, the cache state does not get updated. If the other device of the 2-device volume is then registered, BTRFS_IOC_DEVICES_READY indicates ready; but in fact only one of the two needed devices is available at that time, and mounting fails.

Can we somehow subscribe to device media-changes/removal to prevent the stale device state in the in-kernel cache? Or alternatively make BTRFS_IOC_DEVICES_READY re-validate all involved block devices before it returns READY?

Thanks,
Kay
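The stale-state scenario and the second proposal (re-validate before reporting READY) can be modelled with a toy sketch. Everything here is hypothetical: struct toy_device and both functions are illustration only and do not reflect how the in-kernel device cache is implemented. The sketch only contrasts a readiness check that trusts registration with one that also confirms each device is still present.

```c
#include <stddef.h>

/* Toy model of one cached device entry; not a kernel structure. */
struct toy_device {
	int registered;  /* was seen by the registration step at some point */
	int present;     /* is still physically attached right now */
};

/* Cache-only check: counts every device that was ever registered. */
static int ready_from_cache(const struct toy_device *devs, size_t n,
			    size_t needed)
{
	size_t count = 0, i;

	for (i = 0; i < n; i++)
		if (devs[i].registered)
			count++;
	return count >= needed;
}

/* Re-validating check: a device counts only if it is still present. */
static int ready_revalidated(const struct toy_device *devs, size_t n,
			     size_t needed)
{
	size_t count = 0, i;

	for (i = 0; i < n; i++)
		if (devs[i].registered && devs[i].present)
			count++;
	return count >= needed;
}
```

For a 2-device volume where the first device was registered and then unplugged, the cache-only check reports ready as soon as the second device is registered, while the re-validating check does not.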
[PATCH 0/2] Btrfs: set mount options permanently
The following patches implement one of the unclaimed features listed in the btrfs wiki:

https://btrfs.wiki.kernel.org/index.php/Project_ideas#Set_mount_options_permanently

Special thanks to Kazuhiro Yamashita for his time and efforts.

Your comments/reviews are welcome.

Thanks,
H.Seto
[PATCH 1/2] Btrfs: make space to keep default mount options
This patch creates space in the super block to hold default mount options, and changes super.c to read the saved default mount options first when mounting devices.

Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
---
 fs/btrfs/ctree.h |    5 ++++-
 fs/btrfs/super.c |    2 ++
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fa5c45b..3eb0551 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -458,8 +458,11 @@ struct btrfs_super_block {
 
 	__le64 cache_generation;
 
+	/* default mount options */
+	unsigned long default_mount_opt;
+
 	/* future expansion */
-	__le64 reserved[31];
+	__le64 reserved[30];
 	u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE];
 	struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS];
 } __attribute__ ((__packed__));
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index e239915..7ef4a2e 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -340,6 +340,8 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 	char *compress_type;
 	bool compress_force = false;
 
+	info->mount_opt = info->super_copy->default_mount_opt;
+
 	cache_gen = btrfs_super_cache_generation(root->fs_info->super_copy);
 	if (cache_gen)
 		btrfs_set_opt(info->mount_opt, SPACE_CACHE);
--
1.7.7.6
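The patch takes one 64-bit slot from the reserved array for the new field, so the size of the packed structure should stay the same. Below is a minimal sketch of that bookkeeping, with assumptions stated: only the tail of the structure is modelled, the struct names are hypothetical, and the new field is written as a fixed-width uint64_t. The patch itself declares it as unsigned long, whose width depends on the platform ABI (8 bytes on LP64, 4 bytes on 32-bit Linux), so the equality below holds for the patch only where unsigned long is 8 bytes.

```c
#include <stdint.h>

/* Tail of a hypothetical packed on-disk structure before the change. */
struct tail_before {
	uint64_t cache_generation;
	uint64_t reserved[31];
} __attribute__((__packed__));

/* Same tail after one reserved slot is given to the new field. */
struct tail_after {
	uint64_t cache_generation;
	uint64_t default_mount_opt;	/* fixed-width stand-in for the new field */
	uint64_t reserved[30];
} __attribute__((__packed__));

/* Compile-time check that the on-disk size did not change. */
_Static_assert(sizeof(struct tail_before) == sizeof(struct tail_after),
	       "on-disk size must not change");
```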
[PATCH 2/2] Btrfs-progs: add mount-option command
This patch adds mount-option command. The command can set/get default mount options. Now, the command can set/get 24 options. These options are equal to mount options which store in fs_info/mount-opt. Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com --- Makefile |5 +- btrfs-parse-mntopt.c | 111 + btrfs-parse-mntopt.h | 65 ++ btrfs.c |1 + cmds-mount.c | 150 ++ commands.h |2 + ctree.h | 41 +- 7 files changed, 372 insertions(+), 3 deletions(-) create mode 100644 btrfs-parse-mntopt.c create mode 100644 btrfs-parse-mntopt.h create mode 100644 cmds-mount.c diff --git a/Makefile b/Makefile index c0aaa3d..6f67f4c 100644 --- a/Makefile +++ b/Makefile @@ -5,9 +5,10 @@ objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \ root-tree.o dir-item.o file-item.o inode-item.o \ inode-map.o crc32c.o rbtree.o extent-cache.o extent_io.o \ volumes.o utils.o btrfs-list.o btrfslabel.o repair.o \ - send-stream.o send-utils.o + send-stream.o send-utils.o btrfs-parse-mntopt.o cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \ - cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o + cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \ + cmds-mount.o CHECKFLAGS= -D__linux__ -Dlinux -D__STDC__ -Dunix -D__unix__ -Wbitwise \ -Wuninitialized -Wshadow -Wundef diff --git a/btrfs-parse-mntopt.c b/btrfs-parse-mntopt.c new file mode 100644 index 000..87b341c --- /dev/null +++ b/btrfs-parse-mntopt.c @@ -0,0 +1,111 @@ +#include stdio.h +#include stdlib.h +#include string.h +#include ctree.h +#include btrfs-parse-mntopt.h + +void btrfs_parse_string2mntopt(struct btrfs_root *root, char **options) +{ + struct btrfs_super_block *sb = root-fs_info-super_copy; + char *p = NULL; + int i = 0; + + memset(sb-default_mount_opt, 0, sizeof(unsigned long)); + while ((p = strsep(options, ,)) != NULL) { + int token = DEF_MNTOPT_NUM + 1; + + if (!*p) + continue; + for (i = 0; i DEF_MNTOPT_NUM; i++) { + if (!strcmp(p, toke[i].pattern)) { + token = 
toke[i].token; + break; + } + } + if (token DEF_MNTOPT_NUM) { + printf(error: %s\n, p); + return; + } + + switch (token) { + case Opt_degraded: + btrfs_set_opt(sb-default_mount_opt, DEGRADED); + break; + + case Opt_nodatasum: + btrfs_set_opt(sb-default_mount_opt, NODATASUM); + break; + case Opt_nodatacow: + btrfs_set_opt(sb-default_mount_opt, NODATACOW); + btrfs_set_opt(sb-default_mount_opt, NODATASUM); + break; + case Opt_ssd: + btrfs_set_opt(sb-default_mount_opt, SSD); + break; + case Opt_ssd_spread: + btrfs_set_opt(sb-default_mount_opt, SSD); + btrfs_set_opt(sb-default_mount_opt, SSD_SPREAD); + break; + case Opt_nossd: + btrfs_set_opt(sb-default_mount_opt, NOSSD); + btrfs_clear_opt(sb-default_mount_opt, SSD); + btrfs_clear_opt(sb-default_mount_opt, SSD_SPREAD); + break; + case Opt_nobarrier: + btrfs_set_opt(sb-default_mount_opt, NOBARRIER); + break; + case Opt_notreelog: + btrfs_set_opt(sb-default_mount_opt, NOTREELOG); + break; + case Opt_flushoncommit: + btrfs_set_opt(sb-default_mount_opt, FLUSHONCOMMIT); + break; + case Opt_discard: + btrfs_set_opt(sb-default_mount_opt, DISCARD); + break; + case Opt_space_cache: + btrfs_set_opt(sb-default_mount_opt, SPACE_CACHE); + break; + case Opt_no_space_cache: + btrfs_clear_opt(sb-default_mount_opt, SPACE_CACHE); + break; + case Opt_inode_cache: + btrfs_set_opt(sb-default_mount_opt, INODE_MAP_CACHE); + break; + case Opt_clear_cache: + btrfs_set_opt(sb-default_mount_opt, CLEAR_CACHE); + break; + case
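The parsing loop in the patch above (split on commas, look each word up in a token table, set or clear bits) can be sketched in portable C. Assumptions: the bit values and the TOY_ names are hypothetical and are not the kernel's BTRFS_MOUNT_* numbers, only a handful of options are modelled, and strcspn() is swapped in for the strsep() call used in the patch so the sketch needs nothing beyond standard C. Unlike the patch, which clears the stored value before parsing, this sketch leaves the output untouched when an unknown option is found.

```c
#include <string.h>

/* Hypothetical bit values for illustration only. */
#define TOY_NODATASUM  (1UL << 0)
#define TOY_NODATACOW  (1UL << 1)
#define TOY_SSD        (1UL << 2)
#define TOY_SSD_SPREAD (1UL << 3)
#define TOY_NOSSD      (1UL << 4)
#define TOY_DEGRADED   (1UL << 5)

struct toy_token {
	const char *pattern;   /* option name as typed by the user */
	unsigned long set;     /* bits switched on by this option */
	unsigned long clear;   /* bits switched off by this option */
};

/* Mirrors the set/clear combinations visible in the patch's switch. */
static const struct toy_token toy_tokens[] = {
	{ "degraded",   TOY_DEGRADED,                  0 },
	{ "nodatasum",  TOY_NODATASUM,                 0 },
	{ "nodatacow",  TOY_NODATACOW | TOY_NODATASUM, 0 },
	{ "ssd",        TOY_SSD,                       0 },
	{ "ssd_spread", TOY_SSD | TOY_SSD_SPREAD,      0 },
	{ "nossd",      TOY_NOSSD, TOY_SSD | TOY_SSD_SPREAD },
};

/*
 * Parse a comma-separated option string into a bitmask.
 * Returns 0 on success, -1 on the first unknown option.
 * *out is written only on success; empty words are skipped.
 */
static int toy_parse_mntopt(const char *options, unsigned long *out)
{
	const size_t n = sizeof(toy_tokens) / sizeof(toy_tokens[0]);
	unsigned long opts = 0;
	const char *p = options;

	while (*p) {
		size_t len = strcspn(p, ",");	/* length of the current word */
		size_t i;

		if (len > 0) {
			for (i = 0; i < n; i++) {
				if (strlen(toy_tokens[i].pattern) == len &&
				    strncmp(p, toy_tokens[i].pattern, len) == 0)
					break;
			}
			if (i == n)
				return -1;	/* unknown option */
			opts &= ~toy_tokens[i].clear;
			opts |= toy_tokens[i].set;
		}

		p += len;
		if (*p == ',')
			p++;			/* step over the separator */
	}

	*out = opts;
	return 0;
}
```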
Re: [PATCH V4 01/12] Btrfs: fix error path in create_pending_snapshot()
On mon, 17 Sep 2012 18:56:27 +0200, David Sterba wrote:
> On Thu, Sep 06, 2012 at 06:00:32PM +0800, Miao Xie wrote:
>> This patch fixes the following problem:
>> - If we failed to deal with the delayed dir items, we should abort transaction, just as its comment said. Fix it.
>> - If root reference or root back reference insertion failed, we should abort transaction. Fix it.
>> - Fix the double free problem of pending->inherit.
>> - Do not restore the trans->rsv if we doesn't change it.
>> - make the error path more clearly.
>
> I've noticed a pattern in the error + transaction abort paths, that is touched in this patch and would like to ask you to update it:

OK, I will send a separate patch to fix this problem.

Thanks for your review.

Miao
Re: [PATCH 2/2] Btrfs-progs: add mount-option command
On tue, 18 Sep 2012 10:30:17 +0900, Hidetoshi Seto wrote: This patch adds mount-option command. The command can set/get default mount options. Now, the command can set/get 24 options. These options are equal to mount options which store in fs_info/mount-opt. I don't think we need implement a separate command to do this, we can add it into btrfstune just like ext3/4. If so, the users who used ext3/4 before can be familiar with btrfs command as soon as possible. Beside that, why not add a option into mkfs.btrfs? Thanks Miao Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com --- Makefile |5 +- btrfs-parse-mntopt.c | 111 + btrfs-parse-mntopt.h | 65 ++ btrfs.c |1 + cmds-mount.c | 150 ++ commands.h |2 + ctree.h | 41 +- 7 files changed, 372 insertions(+), 3 deletions(-) create mode 100644 btrfs-parse-mntopt.c create mode 100644 btrfs-parse-mntopt.h create mode 100644 cmds-mount.c diff --git a/Makefile b/Makefile index c0aaa3d..6f67f4c 100644 --- a/Makefile +++ b/Makefile @@ -5,9 +5,10 @@ objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \ root-tree.o dir-item.o file-item.o inode-item.o \ inode-map.o crc32c.o rbtree.o extent-cache.o extent_io.o \ volumes.o utils.o btrfs-list.o btrfslabel.o repair.o \ - send-stream.o send-utils.o + send-stream.o send-utils.o btrfs-parse-mntopt.o cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \ -cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o +cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \ +cmds-mount.o CHECKFLAGS= -D__linux__ -Dlinux -D__STDC__ -Dunix -D__unix__ -Wbitwise \ -Wuninitialized -Wshadow -Wundef diff --git a/btrfs-parse-mntopt.c b/btrfs-parse-mntopt.c new file mode 100644 index 000..87b341c --- /dev/null +++ b/btrfs-parse-mntopt.c @@ -0,0 +1,111 @@ +#include stdio.h +#include stdlib.h +#include string.h +#include ctree.h +#include btrfs-parse-mntopt.h + +void btrfs_parse_string2mntopt(struct btrfs_root *root, char **options) +{ + struct 
btrfs_super_block *sb = root-fs_info-super_copy; + char *p = NULL; + int i = 0; + + memset(sb-default_mount_opt, 0, sizeof(unsigned long)); + while ((p = strsep(options, ,)) != NULL) { + int token = DEF_MNTOPT_NUM + 1; + + if (!*p) + continue; + for (i = 0; i DEF_MNTOPT_NUM; i++) { + if (!strcmp(p, toke[i].pattern)) { + token = toke[i].token; + break; + } + } + if (token DEF_MNTOPT_NUM) { + printf(error: %s\n, p); + return; + } + + switch (token) { + case Opt_degraded: + btrfs_set_opt(sb-default_mount_opt, DEGRADED); + break; + + case Opt_nodatasum: + btrfs_set_opt(sb-default_mount_opt, NODATASUM); + break; + case Opt_nodatacow: + btrfs_set_opt(sb-default_mount_opt, NODATACOW); + btrfs_set_opt(sb-default_mount_opt, NODATASUM); + break; + case Opt_ssd: + btrfs_set_opt(sb-default_mount_opt, SSD); + break; + case Opt_ssd_spread: + btrfs_set_opt(sb-default_mount_opt, SSD); + btrfs_set_opt(sb-default_mount_opt, SSD_SPREAD); + break; + case Opt_nossd: + btrfs_set_opt(sb-default_mount_opt, NOSSD); + btrfs_clear_opt(sb-default_mount_opt, SSD); + btrfs_clear_opt(sb-default_mount_opt, SSD_SPREAD); + break; + case Opt_nobarrier: + btrfs_set_opt(sb-default_mount_opt, NOBARRIER); + break; + case Opt_notreelog: + btrfs_set_opt(sb-default_mount_opt, NOTREELOG); + break; + case Opt_flushoncommit: + btrfs_set_opt(sb-default_mount_opt, FLUSHONCOMMIT); + break; + case Opt_discard: + btrfs_set_opt(sb-default_mount_opt, DISCARD); + break; + case Opt_space_cache: + btrfs_set_opt(sb-default_mount_opt, SPACE_CACHE); + break; + case Opt_no_space_cache: + btrfs_clear_opt(sb-default_mount_opt, SPACE_CACHE); + break; +
Re: [PATCH] Btrfs + Btrfs-progs: make pipe functions re-usable
> As I said in our private email exchange some months ago, I don't think this is the right way to be doing this. For example, if you use an alternative tool (such as btrfs-gui) which uses the ioctls directly, you've lost that logging information.

I agree with that, Hugo. Thanks. These changes are partly for the same reason.

> Keeping a log of what's been done to the FS is much better done by extending the available logging in the kernel

Could you please point out the modules you are talking about? I reviewed some, but just in case I have missed any.

Thanks, Anand
Re: [PATCH 1/2] Btrfs: cleanup duplicated division functions
On Mon, 17 Sep 2012 18:31:13 +0200, David Sterba wrote:
> On Mon, Sep 17, 2012 at 10:21:00AM +0800, Miao Xie wrote:
>> On fri, 14 Sep 2012 15:54:18 +0200, David Sterba wrote:
>>> On Thu, Sep 13, 2012 at 06:51:36PM +0800, Miao Xie wrote:
>>>> div_factor{_fine} has been implemented for two times, cleanup it. And I move them into a independent file named math.h because they are common math functions.
>>>
>>> You removed the sanity checks:
>>>
>>> -	if (factor <= 0)
>>> -		return 0;
>>> -	if (factor >= 100)
>>> -		return num;
>>
>> As inline functions, they should not contain complex checks, the caller should make sure the parameters are right. I think.
>
> It's compiler's job to decide whether a function should be inlined or not. The keyword/function attribute 'inline' is only a hint, unless always_inline is used and the author should be sure that it really has the expected outcome and that compiler is wrong here.

Right, but I think we should make the functions as simple as possible since they are marked as inline, because a simple function is more likely to be inlined than a complex one.

> I don't agree that each caller should do the checks, it only makes code harder to read and forces the authors to check for conditions that may not be apparent or are just omitted.

Right. But for these functions, we are sure the value of the parameters is in the right range in most places, and all the places where we are sure the value is right are in the hot path. The only place where we need to check the parameters is in the slow path; this is also the reason why we make them inline. So doing those checks just wastes time. We just need to modify the caller.

Thanks
Miao

> If we need a function that does not check the boundaries, then of course go for it, but I don't see such case yet.
>
>>> in new version. And I don't think it's necessary to add an extra include with a rather generic name and trivial code. A separate .h/.c with non-filesystem related support code like this looks more suitable. Do you intend to use the functions out of extent-tree.c ?
>>
>> They are used in both extent-tree.c and volumes.c from the outset, but they were implemented in these two files severally.
>
> Ah, I see.
>
> david
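The two positions in this exchange can be put side by side in a small userspace sketch. Assumptions: uint64_t and plain division stand in for the kernel's u64 and do_div(), and the two function names are hypothetical. The checked variant keeps the range checks quoted in the thread; the unchecked variant relies on the caller to pass a factor between 0 and 100.

```c
#include <stdint.h>

/* Checked variant: clamps out-of-range factors, as the removed lines did. */
static inline uint64_t div_factor_fine_checked(uint64_t num, int factor)
{
	if (factor <= 0)
		return 0;
	if (factor >= 100)
		return num;
	return num * (uint64_t)factor / 100;
}

/* Unchecked variant: the caller must guarantee 0 < factor < 100. */
static inline uint64_t div_factor_fine_unchecked(uint64_t num, int factor)
{
	return num * (uint64_t)factor / 100;
}
```

For in-range factors both return the same value. With a factor of 150 the checked variant returns num unchanged, while the unchecked one returns a value larger than num, which is the kind of silent misuse the checks guard against.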
Re: [PATCH 2/2] Btrfs-progs: add mount-option command
On Tue, 18 Sep 2012 10:31:41 +0800 Miao Xie mi...@cn.fujitsu.com wrote:
> On tue, 18 Sep 2012 10:30:17 +0900, Hidetoshi Seto wrote:
>> This patch adds mount-option command. The command can set/get default mount options. Now, the command can set/get 24 options. These options are equal to mount options which store in fs_info/mount-opt.
>
> I don't think we need implement a separate command to do this, we can add it into btrfstune just like ext3/4. If so, the users who used ext3/4 before can be familiar with btrfs command as soon as possible.

btrfstune currently only does one thing:

$ sudo btrfstune
usage: btrfstune [options] device
	-S value	enable/disable seeding

To me it'd seem more logical the other way: why not move this operation to the base btrfs utility under some command, and remove btrfstune completely?

--
With respect,
Roman

~~~
Stallman had a printer, with code he could not see. So he began to tinker, and set the software free.
Re: Experiences: Why BTRFS had to yield for ZFS
A script on the test server, would then apply Oracle archive files from the production environment to this Oracle sync database, every 10'th minute, effectively making it near up-to-date with production. The most reliable way to do this was with a simple NFS mount (rather than rsync or samba). The idea then was, that it would be very fast and easy to make a new snapshot of the sync database, start it up, and voila you'd have a new instance ready to play with. A desktop machine archive-log-apply script - if you could, can you share the script itself ? or provide more details about the script. (It will help to understand the work-load in question). Thanks, Anand
Re: [PATCH] Btrfs + Btrfs-progs: make pipe functions re-usable
>>> What's 'service sub-cmd' please?
>>
>> at the moment 'btrfs service history <mnt|dev>' to show logs of maintenance. comments/suggestions welcome.
>
> Sorry, but without a more detailed description I can hardly give useful comments.

David,

'btrfs service history <mnt|dev>' basically shows the list of cli/gui commands which were successfully run on the btrfs as part of its creation (may be), configuration and maintenance.

HTH.

Thanks, Anand
[PATCH] Btrfs: fix the missing error information in create_pending_snapshot()
The btrfs_abort_transaction() macro reports the line number of the place it is invoked from, so we should invoke it where the error occurs; otherwise we lose the line number of the failure.

Reported-by: David Sterba d...@jikos.cz
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/transaction.c |   57 +--
 1 files changed, 35 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 7d3fc93..cf98dbc 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1042,7 +1042,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 		goto fail;
 	} else if (IS_ERR(dir_item)) {
 		ret = PTR_ERR(dir_item);
-		goto abort_trans;
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
 	}
 	btrfs_release_path(path);
 
@@ -1053,8 +1054,10 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	 * snapshot
 	 */
 	ret = btrfs_run_delayed_items(trans, root);
-	if (ret)	/* Transaction aborted */
-		goto abort_trans;
+	if (ret) {	/* Transaction aborted */
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
+	}
 
 	record_root_in_trans(trans, root);
 	btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
@@ -1087,7 +1090,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	if (ret) {
 		btrfs_tree_unlock(old);
 		free_extent_buffer(old);
-		goto abort_trans;
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
 	}
 
 	btrfs_set_lock_blocking(old);
@@ -1096,8 +1100,10 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	/* clean up in any case */
 	btrfs_tree_unlock(old);
 	free_extent_buffer(old);
-	if (ret)
-		goto abort_trans;
+	if (ret) {
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
+	}
 
 	/* see comments in should_cow_block() */
 	root->force_cow = 1;
@@ -1109,8 +1115,10 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	ret = btrfs_insert_root(trans, tree_root, &key, new_root_item);
 	btrfs_tree_unlock(tmp);
 	free_extent_buffer(tmp);
-	if (ret)
-		goto abort_trans;
+	if (ret) {
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
+	}
 
 	/*
 	 * insert root back/forward references
@@ -1119,23 +1127,30 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 				 parent_root->root_key.objectid,
 				 btrfs_ino(parent_inode), index,
 				 dentry->d_name.name, dentry->d_name.len);
-	if (ret)
-		goto abort_trans;
+	if (ret) {
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
+	}
 
 	key.offset = (u64)-1;
 	pending->snap = btrfs_read_fs_root_no_name(root->fs_info, &key);
 	if (IS_ERR(pending->snap)) {
 		ret = PTR_ERR(pending->snap);
-		goto abort_trans;
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
 	}
 
 	ret = btrfs_reloc_post_snapshot(trans, pending);
-	if (ret)
-		goto abort_trans;
+	if (ret) {
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
+	}
 
 	ret = btrfs_run_delayed_refs(trans, root, (unsigned long)-1);
-	if (ret)
-		goto abort_trans;
+	if (ret) {
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
+	}
 
 	ret = btrfs_insert_dir_item(trans, parent_root,
 			dentry->d_name.name, dentry->d_name.len,
@@ -1143,15 +1158,17 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 			BTRFS_FT_DIR, index);
 	/* We have check then name at the beginning, so it is impossible. */
 	BUG_ON(ret == -EEXIST);
-	if (ret)
-		goto abort_trans;
+	if (ret) {
+		btrfs_abort_transaction(trans, root, ret);
+		goto fail;
+	}
 
 	btrfs_i_size_write(parent_inode, parent_inode->i_size +
 					 dentry->d_name.len * 2);
 	parent_inode->i_mtime = parent_inode->i_ctime = CURRENT_TIME;
 	ret = btrfs_update_inode(trans, parent_root, parent_inode);
 	if (ret)
-		goto abort_trans;
+		btrfs_abort_transaction(trans, root, ret);
 fail:
 	dput(parent);
 	trans->block_rsv =
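The reasoning in the commit message above can be illustrated with a small user-space sketch. It assumes only that the abort macro captures __LINE__ at its point of use; struct demo_trans, demo_abort_at(), demo_abort(), shared_label() and per_site() are hypothetical names made up for this illustration and are not btrfs code.

```c
#include <stdio.h>

/* Hypothetical stand-in for a transaction handle; not the btrfs struct. */
struct demo_trans {
	int aborted;	/* error recorded by the first abort, 0 if none */
	int abort_line;	/* source line of the call that aborted */
};

/* Helper doing the real work; the macro below passes the line in. */
static void demo_abort_at(struct demo_trans *trans, int err, int line)
{
	if (trans->aborted)
		return;	/* keep the first failure only */
	trans->aborted = err;
	trans->abort_line = line;
	fprintf(stderr, "transaction aborted at line %d (err %d)\n", line, err);
}

/* __LINE__ expands where the macro is used, not where it is defined. */
#define demo_abort(trans, err) demo_abort_at((trans), (err), __LINE__)

/* Old pattern: every failure jumps to one label, so one line is reported. */
static int shared_label(struct demo_trans *trans, int fail_step)
{
	int ret = 0;

	if (fail_step == 1) {
		ret = -5;
		goto abort_trans;
	}
	if (fail_step == 2) {
		ret = -12;
		goto abort_trans;
	}
	return 0;
abort_trans:
	demo_abort(trans, ret);
	return ret;
}

/* New pattern: abort at each failure site, so the reported lines differ. */
static int per_site(struct demo_trans *trans, int fail_step)
{
	int ret = 0;

	if (fail_step == 1) {
		ret = -5;
		demo_abort(trans, ret);
		goto fail;
	}
	if (fail_step == 2) {
		ret = -12;
		demo_abort(trans, ret);
		goto fail;
	}
fail:
	return ret;
}
```

With shared_label() both failures report the line of the abort_trans label, while per_site() reports a distinct line for each failing step, which is the information the patch wants to preserve.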
Re: [PATCH V3 4/7] Btrfs-progs: fix wrong way to check if the root item contains otime and uuid
> -	if(ri->generation == ri->generation_v2) {
> +	if(sh->len == sizeof(struct btrfs_root_item)) {
>  		t = ri->otime.sec;

This looks fine for now, but will it still work when we move to v3 and still need access to the members introduced in v2?

 kernel  cli  result
 v3      v2   members introduced in v2 are unnecessarily blocked
 v2      v3   same as above

Thanks, Anand
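To make the concern concrete, here is a minimal sketch comparing an exact-size check with a "long enough" check. Everything in it is an assumption for illustration: the demo_root_item_v1/v2/v3 layouts are made-up, heavily reduced stand-ins (no v3 root item exists in this thread), and has_otime_prefix() is only one possible alternative, not something the patch or the reply proposes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced layouts; NOT the on-disk btrfs_root_item. */
struct demo_root_item_v1 {
	uint64_t generation;
};

struct demo_root_item_v2 {
	uint64_t generation;
	uint64_t generation_v2;
	uint64_t otime_sec;	/* member introduced in v2 */
};

struct demo_root_item_v3 {
	uint64_t generation;
	uint64_t generation_v2;
	uint64_t otime_sec;
	uint64_t v3_field;	/* imaginary member added by a later version */
};

/* Exact-size check, in the spirit of the quoted hunk, for a v2 tool. */
static int has_otime_strict(uint32_t item_len)
{
	return item_len == sizeof(struct demo_root_item_v2);
}

/* Prefix check: accept any item long enough to contain otime_sec. */
static int has_otime_prefix(uint32_t item_len)
{
	return item_len >= offsetof(struct demo_root_item_v2, otime_sec) +
			   sizeof(uint64_t);
}
```

In this sketch a v2 tool using has_otime_strict() rejects an item written in the larger v3 layout, which corresponds to the first row of the table, while has_otime_prefix() accepts both v2 and v3 items and still rejects the shorter v1 item.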