RE: [PATCH] btrfs file write debugging patch
Is your system running out of memory, or is there any other thread like flush-btrfs competing for the same page? I can only see one process in your ftrace log. You may need to trace all btrfs.ko function calls instead of a single process. Thanks!

-----Original Message-----
From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Mitch Harder
Sent: Tuesday, March 01, 2011 4:20 AM
To: Maria Wikström
Cc: Josef Bacik; Johannes Hirte; Chris Mason; Zhong, Xin; linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs file write debugging patch

2011/2/28 Mitch Harder mitch.har...@sabayonlinux.org:
2011/2/28 Maria Wikström ma...@ponstudios.se:
mån 2011-02-28 klockan 11:10 -0500 skrev Josef Bacik:
On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote:
On Monday 28 February 2011 02:46:05 Chris Mason wrote:
Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:

Some clarification on my previous message...

After looking at my ftrace log more closely, I can see where Btrfs is trying to release the allocated pages. However, the calculation for the number of dirty_pages is equal to 1 when copied == 0.

So I'm seeing at least two problems:
(1) It keeps looping when copied == 0.
(2) One dirty page is not being released on every loop even though copied == 0 (at least this problem keeps it from being an infinite loop by eventually exhausting reservable space on the disk).

Hi everyone,

There are actually two bugs here. First the one that Mitch hit, and a second one that still results in bad file_write results with my debugging hunks (the first two hunks below) in place.

My patch fixes Mitch's bug by checking for copied == 0 after btrfs_copy_from_user and doing the correct delalloc accounting. This one looks solved, but you'll notice the patch is bigger.

First, I add some random failures to btrfs_copy_from_user() by failing every once in a while.
This was much more reliable than trying to use memory pressure to make copy_from_user fail. If copy_from_user fails and we partially update a page, we end up with a page that may go away due to memory pressure. But btrfs_file_write assumes that only the first and last page may have good data that needs to be read off the disk.

This patch ditches that code and puts it into prepare_pages instead. But I'm still having some errors during long stress.sh runs. Ideas are more than welcome; hopefully some other timezones will kick in ideas while I sleep.

At least it doesn't fix the emerge problem for me. The behavior is now the same as with 2.6.38-rc3. It needs an 'emerge --oneshot dev-libs/libgcrypt' with no further interaction to get the emerge process to hang with an svn process consuming 100% CPU. I can cancel the emerge process with ctrl-c, but the spawned svn process stays, and it needs a reboot to get rid of it.

Can you cat /proc/$pid/wchan a few times so we can get an idea of where it's looping? Thanks,

Josef

It behaves the same way here with btrfs-unstable. The output of cat /proc/$pid/wchan is 0.

// Maria
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

I've applied the patch at the head of this thread (with the jiffies debugging commented out), and I'm attaching an ftrace using the function_graph tracer from when I'm stuck in the loop. I've just snipped out a couple of the loops (the full trace file is quite large, and mostly repetitious).

I'm going to try to modify file.c with some trace_printk debugging to show the values of several of the relevant variables at various stages. I'm going to try to exit the loop after 256 tries with an EFAULT so I can stop the tracing at that point and capture a trace of the entry into the problem (the ftrace ring buffer fills up too fast for me to capture the entry point).
As promised, I've put together a modified file.c with many trace_printk debugging statements to augment the ftrace. The trace is ~128K compressed (about 31,600 lines or 2.6MB uncompressed), so I'm putting it up on my local server instead of attaching. Let me know if it would be more appropriate to send to the list as an attachment.

http://dontpanic.dyndns.org/ftrace-btrfs-file-write-debug-v2.gz

I preface all my trace_printk comments with TPK: to make skipping through the trace easier. The trace contains about 3 or 4 successful passes through the btrfs_file_aio_write() function to show what a successful pass looks like. The pass through btrfs_file_aio_write() that breaks begins on line 1088. I let it loop through the while (iov_iter_count(i) > 0) {} loop 256 times when copied == 0 (otherwise it would loop infinitely), then exit out and stop the trace. For reference,
Re: [PATCH V2] Btrfs: Batched discard support for btrfs
On Friday, February 25, 2011 04:16:27 PM Li Dongyang wrote:

Thanks for your comments, here is the updated patch. I've tested it with xfstests 251 (thanks to Lukas), and it looks fine to me.

When we call btrfs_map_block() for RAID0/1/10 or DUP, it only returns a single stripe length at most. I'm a bit confused why we are doing this, and it makes a little trouble for this patch: we just trim the first stripe on each device right now. We could loop in btrfs_discard_extent(), mapping each stripe and trimming them, but I think the ideal way is mapping the full length of the free extent and trimming it all at once. Ideas?

Thanks a lot,
Li Dongyang

Signed-off-by: Li Dongyang lidongy...@novell.com
Reviewed-by: David Sterba dste...@suse.cz
Reviewed-by: Kurt Garloff garl...@suse.de
---
Changelog V2:
* Check if we have devices that support trim before trying to trim the fs; also adjust minlen according to the discard_granularity.
* Update reserved extent calculations in btrfs_trim_block_group().
* Call cond_resched() without checking need_resched().
* Use bitmap_clear_bits() and unlink_free_space() instead of btrfs_remove_free_space(), so we won't search the same extent twice.
* Try harder in btrfs_discard_extent(); now we won't report errors if it's not EOPNOTSUPP.
* Make sure the block group is cached before trimming it, or we'll see an empty caching tree if the block group is not cached.
* Minor return value fix in btrfs_discard_block_group().
---
 fs/btrfs/ctree.h            |    5 ++-
 fs/btrfs/disk-io.c          |    5 ++-
 fs/btrfs/extent-tree.c      |  102 +--
 fs/btrfs/free-space-cache.c |   92 ++
 fs/btrfs/free-space-cache.h |    2 +
 fs/btrfs/ioctl.c            |   47
 6 files changed, 227 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2c98b3a..5cbc05c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2147,6 +2147,8 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
 		      u64 root_objectid, u64 owner, u64 offset);
 int btrfs_free_reserved_extent(struct btrfs_root *root,
 			       u64 start, u64 len);
+int btrfs_update_reserved_bytes(struct btrfs_block_group_cache *cache,
+				u64 num_bytes, int reserve, int sinfo);
 int btrfs_prepare_extent_commit(struct btrfs_trans_handle *trans,
 				struct btrfs_root *root);
 int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
@@ -2217,7 +2219,8 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
 int btrfs_error_unpin_extent_range(struct btrfs_root *root,
 				   u64 start, u64 end);
 int btrfs_error_discard_extent(struct btrfs_root *root, u64 bytenr,
-			       u64 num_bytes);
+			       u64 num_bytes, u64 *actual_bytes);
+int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range);

 /* ctree.c */
 int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e1aa8d6..bcb9451 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2947,7 +2947,10 @@ static int btrfs_destroy_pinned_extent(struct btrfs_root *root,
 			break;

 		/* opt_discard */
-		ret = btrfs_error_discard_extent(root, start, end + 1 - start);
+		if (btrfs_test_opt(root, DISCARD))
+			ret = btrfs_error_discard_extent(root, start,
+							 end + 1 - start,
+							 NULL);

 		clear_extent_dirty(unpin, start, end, GFP_NOFS);
 		btrfs_error_unpin_extent_range(root, start, end);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f3c96fc..38100c8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -36,8 +36,6 @@ static int update_block_group(struct btrfs_trans_handle *trans,
 			      struct btrfs_root *root,
 			      u64 bytenr, u64 num_bytes, int alloc);
-static int update_reserved_bytes(struct btrfs_block_group_cache *cache,
-				 u64 num_bytes, int reserve, int sinfo);
 static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 			       struct btrfs_root *root,
 			       u64 bytenr, u64 num_bytes, u64 parent,
@@ -442,7 +440,7 @@ static int cache_block_group(struct btrfs_block_group_cache *cache,
 	 * allocate blocks for the tree root we can't do the fast caching since
 	 * we likely hold important locks.
 	 */
-	if (!trans->transaction->in_commit &&
+	if (trans && (!trans->transaction->in_commit) && (root && root !=
RE: [PATCH] btrfs file write debugging patch
Hi Mitch,

I suspect there's lock contention between flush-btrfs (lock_delalloc_pages) and btrfs_file_aio_write. However, I cannot recreate it locally. Could you please try the patch below? Thanks!

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 65338a1..b9d0929 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1007,17 +1007,16 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 			goto out;
 		}

-		ret = btrfs_delalloc_reserve_space(inode,
-					num_pages << PAGE_CACHE_SHIFT);
-		if (ret)
-			goto out;
-
 		ret = prepare_pages(root, file, pages, num_pages,
 				    pos, first_index, last_index,
 				    write_bytes);
-		if (ret) {
-			btrfs_delalloc_release_space(inode,
+		if (ret)
+			goto out;
+
+		ret = btrfs_delalloc_reserve_space(inode,
 					num_pages << PAGE_CACHE_SHIFT);
+		if (ret) {
+			btrfs_drop_pages(pages, num_pages);
 			goto out;
 		}
Re: [PATCH] btrfs file write debugging patch
On Tue, Mar 1, 2011 at 4:14 AM, Zhong, Xin xin.zh...@intel.com wrote:
Is your system running out of memory or is there any other thread like flush-btrfs competing for the same page?

There's no sign of memory pressure. Although I only have 1 GB in this box, I'm still showing ~1/2 GB RAM free during this build. There's no swap space allocated, and nothing in dmesg indicates a transient spike of RAM pressure.

I can only see one process in your ftrace log. You may need to trace all btrfs.ko function calls instead of a single process. Thanks!

That ftrace.log was run with ftrace defaults for a function trace, so it should collect calls from the whole system. For the sake of consistency, I am intentionally trying to ensure that very few other things are going on at the same time as this build. And I'm building with -j1 so things will happen the same way each time.

Also, I supplied just the tail end of the trace log. The full log shows a few of the other build processes leading up to the problem, but the ftrace ring buffer fills up surprisingly fast. Even with a 50MB ring buffer for ftrace, I usually collect less than 1 second of information when something busy like a build is going on. Let me know if you'd like to see the full log. It's bigger, but I can find someplace to put it.

But I'm pretty sure that wmldbcreate is the only thing going on when the breakage occurs; otherwise I wouldn't get such consistent breakage in the same spot every time.
Re: [PATCH] btrfs file write debugging patch
On Tue, Mar 1, 2011 at 5:56 AM, Zhong, Xin xin.zh...@intel.com wrote:

Hi Mitch,

I suspect there's lock contention between flush-btrfs (lock_delalloc_pages) and btrfs_file_aio_write. However, I cannot recreate it locally. Could you please try the patch below? Thanks!

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 65338a1..b9d0929 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1007,17 +1007,16 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 			goto out;
 		}

-		ret = btrfs_delalloc_reserve_space(inode,
-					num_pages << PAGE_CACHE_SHIFT);
-		if (ret)
-			goto out;
-
 		ret = prepare_pages(root, file, pages, num_pages,
 				    pos, first_index, last_index,
 				    write_bytes);
-		if (ret) {
-			btrfs_delalloc_release_space(inode,
+		if (ret)
+			goto out;
+
+		ret = btrfs_delalloc_reserve_space(inode,
 					num_pages << PAGE_CACHE_SHIFT);
+		if (ret) {
+			btrfs_drop_pages(pages, num_pages);
 			goto out;
 		}

Thanks. I've tested this patch, but the build is still failing at the same point as before.
[2.6.38-rc6, patch] fix delayed_refs locking on error path...
Correctly unlock delayed_refs in the error case.

Signed-off-by: Daniel J Blueman daniel.blue...@gmail.com

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e1aa8d6..c48d699 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2787,6 +2787,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
 	spin_lock(&delayed_refs->lock);
 	if (delayed_refs->num_entries == 0) {
 		printk(KERN_INFO "delayed_refs has NO entry\n");
+		spin_unlock(&delayed_refs->lock);
 		return ret;
 	}
--
Daniel J Blueman
[2.6.38-rc6, patch] mark some internal functions static...
Prevent needless exporting of internal functions from compilation units by marking them static.

Signed-off-by: Daniel J Blueman daniel.blue...@gmail.com

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index b5baff0..5e49196 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -74,7 +74,7 @@ noinline void btrfs_set_path_blocking(struct btrfs_path *p)
  * retake all the spinlocks in the path. You can safely use NULL
  * for held
  */
-noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
+static noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
 					struct extent_buffer *held)
 {
 	int i;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e1aa8d6..c48d699 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2279,7 +2279,7 @@ static int write_dev_supers(struct btrfs_device *device,
 	return errors < i ? 0 : -1;
 }

-int write_all_supers(struct btrfs_root *root, int max_mirrors)
+static int write_all_supers(struct btrfs_root *root, int max_mirrors)
 {
 	struct list_head *head;
 	struct btrfs_device *dev;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f3c96fc..1961081 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -77,7 +77,7 @@ static int block_group_bits(struct btrfs_block_group_cache *cache, u64 bits)
 	return (cache->flags & bits) == bits;
 }

-void btrfs_get_block_group(struct btrfs_block_group_cache *cache)
+static void btrfs_get_block_group(struct btrfs_block_group_cache *cache)
 {
 	atomic_inc(&cache->count);
 }
@@ -3576,7 +3576,7 @@ static void block_rsv_add_bytes(struct btrfs_block_rsv *block_rsv,
 	spin_unlock(&block_rsv->lock);
 }

-void block_rsv_release_bytes(struct btrfs_block_rsv *block_rsv,
+static void block_rsv_release_bytes(struct btrfs_block_rsv *block_rsv,
 			     struct btrfs_block_rsv *dest, u64 num_bytes)
 {
 	struct btrfs_space_info *space_info = block_rsv->space_info;
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index a039065..ec5015c 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1371,7 +1371,7 @@ out:
 	return ret;
 }

-bool try_merge_free_space(struct btrfs_block_group_cache *block_group,
+static bool try_merge_free_space(struct btrfs_block_group_cache *block_group,
 			  struct btrfs_free_space *info, bool update_stat)
 {
 	struct btrfs_free_space *left_info;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index be2d4f6..7b97854 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2193,7 +2193,7 @@ static void get_block_group_info(struct list_head *groups_list,
 	}
 }

-long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
+static long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
 {
 	struct btrfs_ioctl_space_args space_args;
 	struct btrfs_ioctl_space_info space;
--
Daniel J Blueman
Re: [PATCH] btrfs file write debugging patch
Hi, Mitch

I think you can configure ftrace to just trace function calls of btrfs.ko, which will save a lot of trace buffer space. See the command below:

# echo ':mod:btrfs' > /sys/kernel/debug/tracing/set_ftrace_filter

And please send out the full ftrace log again. Another helpful piece of information might be the strace log of the wmldbcreate process. It will show us the I/O pattern of this command. Thanks a lot for your help!
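Putting that filter together with the ring-buffer sizing discussed earlier in the thread, a full capture session might look like the sketch below. The 51200 KB buffer size mirrors the ~50MB buffer Mitch mentioned and is an assumption, as is the debugfs mount point:

```shell
# Run as root; assumes debugfs is mounted at /sys/kernel/debug
cd /sys/kernel/debug/tracing

# Limit tracing to symbols from btrfs.ko to stretch the buffer
echo ':mod:btrfs' > set_ftrace_filter

# Enlarge the per-CPU ring buffer (in KB) so more than ~1s of a busy build fits
echo 51200 > buffer_size_kb

# Select the plain function tracer and start capturing
echo function > current_tracer
echo 1 > tracing_on

# ... reproduce the hang (e.g. run wmldbcreate) ...

echo 0 > tracing_on
cp trace /tmp/btrfs-trace.log
```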
btrfs wishlist
Hi all

Having managed ZFS for about two years, I want to post a wishlist.

INCLUDED IN ZFS
- Mirror existing single-drive filesystem, as in 'zfs attach'
- RAIDz-stuff - single and hopefully multiple-parity RAID configuration with block-level checksumming
- Background scrub/fsck
- Pool-like management with multiple RAIDs/mirrors (VDEVs)
- Autogrow as in ZFS autoexpand

NOT INCLUDED IN CURRENT ZFS
- Adding/removing drives from VDEVs
- Rebalancing a pool
- dedup

This may be a long shot, but can someone tell if this is doable in a year or five?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Re: btrfs wishlist
Excerpts from Roy Sigurd Karlsbakk's message of 2011-03-01 13:35:42 -0500:

Hi all

Having managed ZFS for about two years, I want to post a wishlist.

INCLUDED IN ZFS
- Mirror existing single-drive filesystem, as in 'zfs attach'

This one is easy, we do plan on adding it.

- RAIDz-stuff - single and hopefully multiple-parity RAID configuration with block-level checksumming

We'll have raid56, but it won't be variable stripe size. There will be one stripe size for data and one for metadata, but that's it.

- Background scrub/fsck

These are in the works.

- Pool-like management with multiple RAIDs/mirrors (VDEVs)

We have a pool of drives now... I'm not sure exactly what the vdevs are.

- Autogrow as in ZFS autoexpand

We grow to the available storage now.

NOT INCLUDED IN CURRENT ZFS
- Adding/removing drives from VDEVs

We can add and remove drives on the fly today.

- Rebalancing a pool

We can rebalance space between drives today.

- dedup

ZFS does have dedup; we don't yet. This one has a firm maybe.

-chris
Re: btrfs wishlist
On Tue, Mar 1, 2011 at 10:39 AM, Chris Mason chris.ma...@oracle.com wrote:
Excerpts from Roy Sigurd Karlsbakk's message of 2011-03-01 13:35:42 -0500:
- Pool-like management with multiple RAIDs/mirrors (VDEVs)
We have a pool of drives now... I'm not sure exactly what the vdevs are.

This functionality is in btrfs already, but it's using different terminology and configuration methods.

In ZFS, the lowest level in the storage stack is the physical block device. You group these block devices together into a virtual device (aka vdev). The possible vdevs are:
- single disk vdev, with no redundancy
- mirror vdev, with any number of devices (n-way mirroring)
- raidz1 vdev, single-parity redundancy
- raidz2 vdev, dual-parity redundancy
- raidz3 vdev, triple-parity redundancy
- log vdev, a separate device for journaling, or as a write cache
- cache vdev, a separate device that acts as a read cache

A ZFS pool is made up of a collection of vdevs. For example, a simple, non-redundant pool setup for a laptop would be:

zpool create laptoppool da0

To create a pool with a dual-parity vdev using 8 disks:

zpool create mypool raidz2 da0 da1 da2 da3 da4 da5 da6 da7

To later add to the existing pool:

zpool add mypool raidz2 da8 da9 da10 da11 da12 da13 da14 da15

Later, you create your ZFS filesystems on top of the pool.

With btrfs, you set up the redundancy and the filesystem all in one shot, thus combining the vdev with the pool (aka filesystem). ZFS has better separation of the different layers (device, pool, filesystem) and better tools for working with them (zpool / zfs), but similar functionality is (or at least appears to be) in btrfs already. Using device mapper / md underneath btrfs also gives you a similar setup to ZFS.

--
Freddie Cash
fjwc...@gmail.com
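For comparison with the zpool commands above, the one-shot btrfs setup looks roughly like the sketch below. The device names and mount point are illustrative:

```shell
# Create a filesystem across four devices in one step:
# raid10 for data, raid1 for metadata
mkfs.btrfs -d raid10 -m raid1 /dev/sda /dev/sdb /dev/sdc /dev/sdd
mount /dev/sda /mnt

# Grow the pool later by adding a device, then rebalance
# so existing data spreads onto it
btrfs device add /dev/sde /mnt
btrfs filesystem balance /mnt
```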
Re: [PATCH] btrfs file write debugging patch
2011/3/1 Xin Zhong thierryzh...@hotmail.com:
Hi, Mitch
I think you can configure ftrace to just trace function calls of btrfs.ko, which will save a lot of trace buffer space. See the command below:
# echo ':mod:btrfs' > /sys/kernel/debug/tracing/set_ftrace_filter
And please send out the full ftrace log again. Another helpful piece of information might be the strace log of the wmldbcreate process. It will show us the I/O pattern of this command. Thanks a lot for your help!

I manually ran an strace around the build command (wmldbcreate) that is causing my problem, and I am attaching the strace for that. Please note that wmldbcreate does not seem to care when an error is returned, and continues on. So the error is occurring somewhat silently in the middle, and isn't the last item. The error is probably associated with one of the 12288-byte writes.

I have re-run an ftrace following the conditions above, and have hosted that file (~1.1MB compressed) on my local server at:

http://dontpanic.dyndns.org/trace-openmotif-btrfs-v15.gz

Please note I am still using some debugging modifications of my own to file.c. They serve the purpose of:
(1) Avoiding an infinite loop by identifying when the problem is occurring, and exiting with an error after 256 loops.
(2) Stopping the trace after exiting, to keep from flooding the ftrace buffer.
(3) Providing debugging comments (all prefaced with TPK: in the trace).

Let me know if you want me to change any of the conditions.

wmldbcreate-strace.gz
Description: GNU Zip compressed data
Re: [PATCH] btrfs file write debugging patch
On Mon, Feb 28, 2011 at 02:20:22PM -0600, Mitch Harder wrote:
As promised, I've put together a modified file.c with many trace_printk debugging statements to augment the ftrace.
*snip*

Just my few cents. I've applied the patch from Chris Mason (Sun, 27 Feb 2011 20:46:05 -0500) and this one from Mitch (Mon, 28 Feb 2011 14:20:22 -0600) on top of vanilla 2.6.38-rc6, and it seems that it resolves my issues with hanging `svn info' during the libgcrypt emerge.

Piotr Szymaniak.
--
"(...) I can't imagine what that mountain of meat could give you that I couldn't offer. Aside, of course, from fifty pounds of overgrown muscle." "Maybe it's overgrown muscles that attract me. (...) After all, many men are attracted to overgrown fatty breast tissue."
-- Graham Masterton, The Wells of Hell