RE: [PATCH] btrfs file write debugging patch

2011-03-01 Thread Zhong, Xin
Is your system running out of memory or is there any other thread like 
flush-btrfs competing for the same page?

I can only see one process in your ftrace log. You may need to trace all 
btrfs.ko function calls instead of a single process. Thanks!

-----Original Message-----
From: linux-btrfs-ow...@vger.kernel.org 
[mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Mitch Harder
Sent: Tuesday, March 01, 2011 4:20 AM
To: Maria Wikström
Cc: Josef Bacik; Johannes Hirte; Chris Mason; Zhong, Xin; 
linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs file write debugging patch

2011/2/28 Mitch Harder <mitch.har...@sabayonlinux.org>:
 2011/2/28 Maria Wikström <ma...@ponstudios.se>:
 On Mon, 2011-02-28 at 11:10 -0500, Josef Bacik wrote:
 On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote:
  On Monday 28 February 2011 02:46:05 Chris Mason wrote:
   Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:
Some clarification on my previous message...
   
After looking at my ftrace log more closely, I can see where Btrfs is
trying to release the allocated pages.  However, the calculation for
the number of dirty_pages is equal to 1 when copied == 0.
   
So I'm seeing at least two problems:
(1)  It keeps looping when copied == 0.
(2)  One dirty page is not being released on every loop even though
copied == 0 (at least this problem keeps it from being an infinite
loop by eventually exhausting reserveable space on the disk).
  
   Hi everyone,
  
   There are actually two bugs here.  First the one that Mitch hit, and a
   second one that still results in bad file_write results with my
   debugging hunks (the first two hunks below) in place.
  
   My patch fixes Mitch's bug by checking for copied == 0 after
   btrfs_copy_from_user and doing the correct delalloc accounting.  This
   one looks solved, but you'll notice the patch is bigger.
  
   First, I add some random failures to btrfs_copy_from_user() by failing
   every once in a while.  This was much more reliable than trying to
   use memory pressure to make copy_from_user fail.
  
   If copy_from_user fails and we partially update a page, we end up with a
   page that may go away due to memory pressure.  But, btrfs_file_write
   assumes that only the first and last page may have good data that needs
   to be read off the disk.
  
   This patch ditches that code and puts it into prepare_pages instead.
   But I'm still having some errors during long stress.sh runs.  Ideas are
   more than welcome, hopefully some other timezones will kick in ideas
   while I sleep.
 
  At least it doesn't fix the emerge problem for me. The behavior is now the
  same as with 2.6.38-rc3. It takes an 'emerge --oneshot dev-libs/libgcrypt'
  with no further interaction to get the emerge process to hang with an svn
  process consuming 100% CPU. I can cancel the emerge process with ctrl-c, but
  the spawned svn process stays, and it needs a reboot to get rid of it.

 Can you cat /proc/$pid/wchan a few times so we can get an idea of where it's
 looping?  Thanks,

 Josef

 It behaves the same way here with btrfs-unstable.
 The output of cat /proc/$pid/wchan is 0.

 // Maria

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html





 I've applied the patch at the head of this thread (with the jiffies
 debugging commented out) and I'm attaching a ftrace using the
 function_graph tracer when I'm stuck in the loop.  I've just snipped
 out a couple of the loops (the full trace file is quite large, and
 mostly repetitious).

 I'm going to try to modify file.c with some trace_printk debugging to
 show the values of several of the relevant variables at various
 stages.

 I'm going to try to exit the loop after 256 tries with an EFAULT so I
 can stop the tracing at that point and capture a trace of the entry
 into the problem (the ftrace ring buffer fills up too fast for me to
 capture the entry point).


As promised, I've put together a modified file.c with many trace_printk
debugging statements to augment the ftrace.

The trace is ~128K compressed (about 31,600 lines or 2.6MB
uncompressed), so I'm putting it up on my local server instead of
attaching.  Let me know if it would be more appropriate to send to the
list as an attachment.

http://dontpanic.dyndns.org/ftrace-btrfs-file-write-debug-v2.gz

I preface all my trace_printk comments with TPK: to make skipping
through the trace easier.

The trace contains the trace of about 3 or 4 successful passes through
the btrfs_file_aio_write() function to show what a successful trace
looks like.

The pass through the btrfs_file_aio_write() that breaks begins on line 1088.

I let it loop through the while (iov_iter_count(i) > 0) {} loop
256 times when copied==0 (otherwise it would loop infinitely).  Then
I exit out and stop the trace.
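
To make the 256-try bail-out concrete, here is a minimal userspace sketch in C. It is not the actual file.c change: bounded_write_loop, copy_fn and MAX_ZERO_COPY_RETRIES are hypothetical names, and the only details taken from this thread are the cap of 256 passes and the EFAULT return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ZERO_COPY_RETRIES 256 /* cap taken from the message above */

/* Hypothetical stand-in for the copy step: returns bytes copied (may be 0). */
typedef size_t (*copy_fn)(size_t wanted, void *ctx);

/*
 * Userspace model of a write loop that bails out with -EFAULT
 * after too many consecutive passes that copy nothing.
 */
static long bounded_write_loop(size_t remaining, copy_fn copy, void *ctx)
{
	long written = 0;
	int zero_passes = 0;

	while (remaining > 0) {
		size_t copied = copy(remaining, ctx);

		if (copied == 0) {
			/* No progress: count the pass and give up at the cap. */
			if (++zero_passes >= MAX_ZERO_COPY_RETRIES)
				return -EFAULT;
			continue;
		}
		zero_passes = 0;
		remaining -= copied;
		written += (long)copied;
	}
	return written;
}

/* Two toy copy steps for exercising the loop. */
static size_t copy_never(size_t wanted, void *ctx)
{
	(void)wanted;
	(void)ctx;
	return 0;
}

static size_t copy_all(size_t wanted, void *ctx)
{
	(void)ctx;
	return wanted;
}
```

Calling bounded_write_loop(12288, copy_never, NULL) gives up with -EFAULT instead of spinning forever, while copy_all completes the whole request in one pass.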

For reference, 

Re: [PATCH V2] Btrfs: Batched discard support for btrfs

2011-03-01 Thread Li Dongyang
On Friday, February 25, 2011 04:16:27 PM Li Dongyang wrote:
 Thanks for your comments, here is the updated patch.
 I've tested it with xfstests 251(thanks to Lukas), and it looks fine to me.
 
When we call btrfs_map_block() for RAID0/1/10 or DUP, it returns at most a
single stripe length. I'm a bit confused about why we do this, and it causes
some trouble for this patch: right now we only trim the first stripe on each
device. We could loop in btrfs_discard_extent(), mapping each stripe and
trimming it, but I think the ideal way is to map the full length of the free
extent and trim it all at once. Ideas?
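
The per-stripe loop mentioned above could look roughly like the following userspace sketch in C. It only models a flat byte range cut at stripe boundaries; it does not call btrfs_map_block() or know anything about devices, and trim_by_stripe, trim_fn and count_bytes are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback standing in for a per-stripe discard. */
typedef void (*trim_fn)(uint64_t start, uint64_t len, void *ctx);

/*
 * Walk [start, start + len) in pieces that never cross a stripe
 * boundary, calling trim() once per piece. Returns the piece count.
 * stripe_len must be non-zero.
 */
static unsigned trim_by_stripe(uint64_t start, uint64_t len,
			       uint64_t stripe_len, trim_fn trim, void *ctx)
{
	unsigned pieces = 0;

	while (len > 0) {
		/* Bytes left before the next stripe boundary. */
		uint64_t room = stripe_len - (start % stripe_len);
		uint64_t chunk = len < room ? len : room;

		trim(start, chunk, ctx);
		start += chunk;
		len -= chunk;
		pieces++;
	}
	return pieces;
}

/* Toy callback that just sums the bytes it was asked to trim. */
static void count_bytes(uint64_t start, uint64_t len, void *ctx)
{
	(void)start;
	*(uint64_t *)ctx += len;
}
```

The alternative described above, mapping the whole free extent once and trimming it in a single call, would avoid this loop entirely.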

Thanks a lot,
Li Dongyang
 Signed-off-by: Li Dongyang <lidongy...@novell.com>
 Reviewed-by: David Sterba <dste...@suse.cz>
 Reviewed-by: Kurt Garloff <garl...@suse.de>
 ---
 Changelog V2:
 *Check if we have devices that support trim before trying to trim the fs,
  also adjust minlen according to the discard_granularity.
 *Update reserved extent calculations in btrfs_trim_block_group().
 *Call cond_resched() without checking need_resched().
 *Use bitmap_clear_bits() and unlink_free_space() instead of
  btrfs_remove_free_space(), so we won't search the same extent twice.
 *Try harder in btrfs_discard_extent(), now we won't report errors
  if it's not EOPNOTSUPP.
 *Make sure the block group is cached before trimming it, or we'll see an
  empty caching tree if the block group is not cached.
 *Minor return value fix in btrfs_discard_block_group().
 ---
  fs/btrfs/ctree.h            |    5 ++-
  fs/btrfs/disk-io.c          |    5 ++-
  fs/btrfs/extent-tree.c      |  102 +--
  fs/btrfs/free-space-cache.c |   92 ++
  fs/btrfs/free-space-cache.h |    2 +
  fs/btrfs/ioctl.c            |   47 
  6 files changed, 227 insertions(+), 26 deletions(-)
 
 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index 2c98b3a..5cbc05c 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -2147,6 +2147,8 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
 u64 root_objectid, u64 owner, u64 offset);
  
  int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len);
 +int btrfs_update_reserved_bytes(struct btrfs_block_group_cache *cache,
 + u64 num_bytes, int reserve, int sinfo);
  int btrfs_prepare_extent_commit(struct btrfs_trans_handle *trans,
   struct btrfs_root *root);
  int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
 @@ -2217,7 +2219,8 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
  int btrfs_error_unpin_extent_range(struct btrfs_root *root,
  u64 start, u64 end);
  int btrfs_error_discard_extent(struct btrfs_root *root, u64 bytenr,
 -u64 num_bytes);
 +u64 num_bytes, u64 *actual_bytes);
 +int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range);
  
  /* ctree.c */
  int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index e1aa8d6..bcb9451 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -2947,7 +2947,10 @@ static int btrfs_destroy_pinned_extent(struct btrfs_root *root,
   break;
  
   /* opt_discard */
 - ret = btrfs_error_discard_extent(root, start, end + 1 - start);
 + if (btrfs_test_opt(root, DISCARD))
 + ret = btrfs_error_discard_extent(root, start,
 +  end + 1 - start,
 +  NULL);
  
   clear_extent_dirty(unpin, start, end, GFP_NOFS);
   btrfs_error_unpin_extent_range(root, start, end);
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index f3c96fc..38100c8 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -36,8 +36,6 @@
  static int update_block_group(struct btrfs_trans_handle *trans,
 struct btrfs_root *root,
 u64 bytenr, u64 num_bytes, int alloc);
 -static int update_reserved_bytes(struct btrfs_block_group_cache *cache,
 -  u64 num_bytes, int reserve, int sinfo);
  static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
   u64 bytenr, u64 num_bytes, u64 parent,
 @@ -442,7 +440,7 @@ static int cache_block_group(struct btrfs_block_group_cache *cache,
* allocate blocks for the tree root we can't do the fast caching since
* we likely hold important locks.
*/
 - if (!trans->transaction->in_commit &&
 + if (trans && (!trans->transaction->in_commit) &&
   (root && root != 

RE: [PATCH] btrfs file write debugging patch

2011-03-01 Thread Zhong, Xin
Hi Mitch,

I suspect there's lock contention between flush-btrfs (lock_delalloc_pages)
and btrfs_file_aio_write. However, I cannot recreate it locally. Could you
please try the patch below? Thanks!

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 65338a1..b9d0929 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1007,17 +1007,16 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 			goto out;
 		}
 
-		ret = btrfs_delalloc_reserve_space(inode,
-					num_pages << PAGE_CACHE_SHIFT);
-		if (ret)
-			goto out;
-
 		ret = prepare_pages(root, file, pages, num_pages,
 				    pos, first_index, last_index,
 				    write_bytes);
-		if (ret) {
-			btrfs_delalloc_release_space(inode,
+		if (ret)
+			goto out;
+
+		ret = btrfs_delalloc_reserve_space(inode,
 					num_pages << PAGE_CACHE_SHIFT);
+		if (ret) {
+			btrfs_drop_pages(pages, num_pages);
 			goto out;
 		}
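
For anyone following the reordering in that patch, here is a small userspace model in C of the new sequence: prepare the pages first, reserve space second, and drop the pages again if the reservation fails. None of the model_* functions are btrfs functions, the write_state struct is invented bookkeeping, and the 4096-byte page size is an assumption made only for this sketch.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_PAGE_SIZE 4096L /* assumed page size, for illustration only */

/* Toy bookkeeping; none of these names are btrfs functions. */
struct write_state {
	int pages_held;      /* pages pinned by the prepare step */
	long bytes_reserved; /* space reserved for delalloc */
};

static int model_prepare_pages(struct write_state *s, int num_pages, int fail)
{
	if (fail)
		return -ENOMEM;
	s->pages_held += num_pages;
	return 0;
}

static int model_reserve_space(struct write_state *s, long bytes, int fail)
{
	if (fail)
		return -ENOSPC;
	s->bytes_reserved += bytes;
	return 0;
}

static void model_drop_pages(struct write_state *s, int num_pages)
{
	s->pages_held -= num_pages;
}

/* Prepare first, reserve second; undo the pages if the reservation fails. */
static int model_write_setup(struct write_state *s, int num_pages,
			     int fail_prepare, int fail_reserve)
{
	int ret = model_prepare_pages(s, num_pages, fail_prepare);

	if (ret)
		return ret; /* nothing acquired yet, nothing to undo */

	ret = model_reserve_space(s, num_pages * MODEL_PAGE_SIZE, fail_reserve);
	if (ret) {
		model_drop_pages(s, num_pages);
		return ret;
	}
	return 0;
}
```

The point of the model is only that each failure path releases exactly what was acquired before it, which is what the swapped error handling in the patch is meant to preserve.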



Re: [PATCH] btrfs file write debugging patch

2011-03-01 Thread Mitch Harder
On Tue, Mar 1, 2011 at 4:14 AM, Zhong, Xin <xin.zh...@intel.com> wrote:
 Is your system running out of memory or is there any other thread like 
 flush-btrfs competing for the same page?


There's no sign of memory pressure.  Although I only have 1 GB in this
box, I still show ~1/2 GB RAM free during this build.  There's no
swap space allocated, and nothing in dmesg that indicates there's a
transient spike of RAM pressure.

 I can only see one process in your ftrace log. You may need to trace all 
 btrfs.ko function calls instead of a single process. Thanks!


That ftrace.log was run with ftrace defaults for a function trace.  It
should collect calls from the whole system.

For the sake of consistency, I am intentionally trying to ensure that
very few other things are going on at the same time as this build.
And I'm building with -j1 so things will happen the same way each
time.

Also, I supplied just the tail end of the trace log.  The full log
shows a few of the other build processes leading up to the problem,
but the ftrace ring buffer fills up surprisingly fast.  Even with a
50MB ring buffer for ftrace, I usually collect less than 1 second of
information when something busy like a build is going on.

Let me know if you'd like to see the full log.  It's bigger, but I can
find someplace to put it.

But I'm pretty sure that wmldbcreate is the only thing that is going
on when the breakage occurs.  Otherwise I wouldn't get such consistent
breakage in the same spot every time.


Re: [PATCH] btrfs file write debugging patch

2011-03-01 Thread Mitch Harder
On Tue, Mar 1, 2011 at 5:56 AM, Zhong, Xin <xin.zh...@intel.com> wrote:
 Hi Mitch,

 I suspect there's a lock contention between flush-btrfs
 (lock_delalloc_pages) and btrfs_file_aio_write. However I can not recreate
 it locally. Could you please try below patch? Thanks!

 diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
 index 65338a1..b9d0929 100644
 --- a/fs/btrfs/file.c
 +++ b/fs/btrfs/file.c
 @@ -1007,17 +1007,16 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
                        goto out;
                }

 -               ret = btrfs_delalloc_reserve_space(inode,
 -                                       num_pages << PAGE_CACHE_SHIFT);
 -               if (ret)
 -                       goto out;
 -
                ret = prepare_pages(root, file, pages, num_pages,
                                    pos, first_index, last_index,
                                    write_bytes);
 -               if (ret) {
 -                       btrfs_delalloc_release_space(inode,
 +               if (ret)
 +                       goto out;
 +
 +               ret = btrfs_delalloc_reserve_space(inode,
                                         num_pages << PAGE_CACHE_SHIFT);
 +               if (ret) {
 +                       btrfs_drop_pages(pages, num_pages);
                        goto out;
                }



Thanks.

I've tested this patch, but the build is still failing at the same
point as before.


[2.6.38-rc6, patch] fix delayed_refs locking on error path...

2011-03-01 Thread Daniel J Blueman
Correctly unlock delayed_refs in the error case.

Signed-off-by: Daniel J Blueman <daniel.blue...@gmail.com>

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e1aa8d6..c48d699 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2787,6 +2787,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
 	spin_lock(&delayed_refs->lock);
 	if (delayed_refs->num_entries == 0) {
 		printk(KERN_INFO "delayed_refs has NO entry\n");
+		spin_unlock(&delayed_refs->lock);
 		return ret;
 	}
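
The pattern being fixed here, an early return that leaves a lock held, can be shown with a toy userspace model in C. The toy_lock below only records whether it is held; it is not a kernel spinlock, and toy_refs and toy_destroy_refs are hypothetical names used purely for illustration.

```c
#include <assert.h>

/* Toy lock that only records whether it is held; not a kernel spinlock. */
struct toy_lock {
	int held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held); /* a second acquire would deadlock a real spinlock */
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

struct toy_refs {
	struct toy_lock lock;
	int num_entries;
};

/* Returns the number of entries destroyed; releases the lock on every path. */
static int toy_destroy_refs(struct toy_refs *refs)
{
	int destroyed;

	toy_lock_acquire(&refs->lock);
	if (refs->num_entries == 0) {
		toy_lock_release(&refs->lock); /* the early return must unlock too */
		return 0;
	}
	destroyed = refs->num_entries;
	refs->num_entries = 0;
	toy_lock_release(&refs->lock);
	return destroyed;
}
```

Without the release in the empty case, a second call to toy_destroy_refs would trip the assert in toy_lock_acquire, which is the toy equivalent of the next spin_lock() never returning.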

-- 
Daniel J Blueman


[2.6.38-rc6, patch] mark some internal functions static...

2011-03-01 Thread Daniel J Blueman
Prevent needless exporting of internal functions from compilation
units by marking them static.

Signed-off-by: Daniel J Blueman <daniel.blue...@gmail.com>

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index b5baff0..5e49196 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -74,7 +74,7 @@ noinline void btrfs_set_path_blocking(struct btrfs_path *p)
  * retake all the spinlocks in the path.  You can safely use NULL
  * for held
  */
-noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
+static noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
struct extent_buffer *held)
 {
int i;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e1aa8d6..c48d699 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2279,7 +2279,7 @@ static int write_dev_supers(struct btrfs_device *device,
return errors < i ? 0 : -1;
 }

-int write_all_supers(struct btrfs_root *root, int max_mirrors)
+static int write_all_supers(struct btrfs_root *root, int max_mirrors)
 {
struct list_head *head;
struct btrfs_device *dev;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f3c96fc..1961081 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -77,7 +77,7 @@ static int block_group_bits(struct btrfs_block_group_cache *cache, u64 bits)
return (cache->flags & bits) == bits;
 }

-void btrfs_get_block_group(struct btrfs_block_group_cache *cache)
+static void btrfs_get_block_group(struct btrfs_block_group_cache *cache)
 {
atomic_inc(&cache->count);
 }
@@ -3576,7 +3576,7 @@ static void block_rsv_add_bytes(struct btrfs_block_rsv *block_rsv,
spin_unlock(&block_rsv->lock);
 }

-void block_rsv_release_bytes(struct btrfs_block_rsv *block_rsv,
+static void block_rsv_release_bytes(struct btrfs_block_rsv *block_rsv,
 struct btrfs_block_rsv *dest, u64 num_bytes)
 {
struct btrfs_space_info *space_info = block_rsv->space_info;
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index a039065..ec5015c 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1371,7 +1371,7 @@ out:
return ret;
 }

-bool try_merge_free_space(struct btrfs_block_group_cache *block_group,
+static bool try_merge_free_space(struct btrfs_block_group_cache *block_group,
  struct btrfs_free_space *info, bool update_stat)
 {
struct btrfs_free_space *left_info;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index be2d4f6..7b97854 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2193,7 +2193,7 @@ static void get_block_group_info(struct list_head *groups_list,
}
 }

-long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
+static long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
 {
struct btrfs_ioctl_space_args space_args;
struct btrfs_ioctl_space_info space;
-- 
Daniel J Blueman


Re: [PATCH] btrfs file write debugging patch

2011-03-01 Thread Xin Zhong

Hi, Mitch
I think you can configure ftrace to trace only the function calls of btrfs.ko,
which will save a lot of trace buffer space. See the command below:
# echo ':mod:btrfs' > /sys/kernel/debug/tracing/set_ftrace_filter
And please send out the full ftrace log again.
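
If the filter needs to be set from a test harness rather than a shell, the same write can be done from C. This is only a sketch: write_trace_filter is a hypothetical helper, and it assumes debugfs is mounted at the path used in the command above and that the caller has permission to write there.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write `filter` followed by a newline to `path`, truncating the file
 * first (the same effect as `echo filter > path`). Returns 0 on success,
 * -1 if the file cannot be opened or written.
 */
static int write_trace_filter(const char *path, const char *filter)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%s\n", filter) < 0) {
		fclose(f);
		return -1;
	}
	/* fclose flushes; a failure here means the write did not land. */
	return fclose(f) == 0 ? 0 : -1;
}
```

A call such as write_trace_filter("/sys/kernel/debug/tracing/set_ftrace_filter", ":mod:btrfs") would then mirror the shell command.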

Another helpful piece of information might be the strace log of the
wmldbcreate process. It will show us the I/O pattern of this command.
Thanks a lot for your help!
  


btrfs wishlist

2011-03-01 Thread Roy Sigurd Karlsbakk
Hi all

Having managed ZFS for about two years, I want to post a wishlist.

INCLUDED IN ZFS

- Mirror existing single-drive filesystem, as in 'zfs attach'
- RAIDz-stuff - single and hopefully multiple-parity RAID configuration with 
block-level checksumming
- Background scrub/fsck
- Pool-like management with multiple RAIDs/mirrors (VDEVs)
- Autogrow as in ZFS autoexpand

NOT INCLUDED IN CURRENT ZFS

- Adding/removing drives from VDEVs
- Rebalancing a pool
- dedup

This may be a long shot, but can someone tell if this is doable in a year or 
five?

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum is presented intelligibly.
It is an elementary imperative for all pedagogues to avoid excessive use of
idioms of foreign origin. In most cases, adequate and relevant synonyms exist
in Norwegian.


Re: btrfs wishlist

2011-03-01 Thread Chris Mason
Excerpts from Roy Sigurd Karlsbakk's message of 2011-03-01 13:35:42 -0500:
 Hi all
 
 Having managed ZFS for about two years, I want to post a wishlist.
 
 INCLUDED IN ZFS
 
 - Mirror existing single-drive filesystem, as in 'zfs attach'

This one is easy, we do plan on adding it.

 - RAIDz-stuff - single and hopefully multiple-parity RAID configuration with 
 block-level checksumming

We'll have raid56, but it won't be variable stripe size.  There will be
one stripe size for data and one for metadata but that's it.

 - Background scrub/fsck

These are in the works

 - Pool-like management with multiple RAIDs/mirrors (VDEVs)

We have a pool of drives now. I'm not sure exactly what the vdevs are.

 - Autogrow as in ZFS autoexpand

We grow to the available storage now.

 
 NOT INCLUDED IN CURRENT ZFS
 
 - Adding/removing drives from VDEVs

We can add and remove drives on the fly today

 - Rebalancing a pool

We can rebalance space between drives today.

 - dedup

ZFS does have dedup; we don't yet.  This one has a firm maybe.


-chris


Re: btrfs wishlist

2011-03-01 Thread Freddie Cash
On Tue, Mar 1, 2011 at 10:39 AM, Chris Mason <chris.ma...@oracle.com> wrote:
 Excerpts from Roy Sigurd Karlsbakk's message of 2011-03-01 13:35:42 -0500:

 - Pool-like management with multiple RAIDs/mirrors (VDEVs)

 We have a pool of drives now. I'm not sure exactly what the vdevs are.

This functionality is in btrfs already, but it's using different
terminology and configuration methods.

In ZFS, the lowest level in the storage stack is the physical block device.

You group these block devices together into a virtual device (aka
vdev).  The possible vdevs are:
  - single disk vdev, with no redundancy
  - mirror vdev, with any number of devices (n-way mirroring)
  - raidz1 vdev, single-parity redundancy
  - raidz2 vdev, dual-parity redundancy
  - raidz3 vdev, triple-parity redundancy
  - log vdev, separate device for journaling, or as a write cache
  - cache vdev, separate device that acts as a read cache

A ZFS pool is made up of a collection of the vdevs.

For example, a simple, non-redundant pool setup for a laptop would be:
  zpool create laptoppool da0

To create a pool with a dual-parity vdev using 8 disks:
  zpool create mypool raidz2 da0 da1 da2 da3 da4 da5 da6 da7

To later add to the existing pool:
  zpool add mypool raidz2 da8 da9 da10 da11 da12 da13 da14 da15

Later, you create your ZFS filesystems on top of the pool.

With btrfs, you set up the redundancy and the filesystem all in one
shot, thus combining the vdev with the pool (aka filesystem).

ZFS has better separation of the different layers (device, pool,
filesystem), and better tools for working with them (zpool / zfs) but
similar functionality is (or at least appears to be) in btrfs already.

Using device mapper / md underneath btrfs also gives you a similar setup to ZFS.

-- 
Freddie Cash
fjwc...@gmail.com


Re: [PATCH] btrfs file write debugging patch

2011-03-01 Thread Mitch Harder
2011/3/1 Xin Zhong <thierryzh...@hotmail.com>:

 Hi, Mitch
 I think you can config ftrace to just trace function calls of btrfs.ko which
 will save a lot of trace buffer space. See below command:
 # echo ':mod:btrfs' > /sys/kernel/debug/tracing/set_ftrace_filter
 And please send out the full ftrace log again.

 Another helpful information might be the strace log of the wmldbcreate 
 process. It will show us the io pattern of this command.
 Thanks a lot for your help!



I manually ran an strace around the build command (wmldbcreate) that
is causing my problem, and I am attaching the strace for that.

Please note that wmldbcreate does not seem to care when an error is
returned, and continues on.  So the error is occurring somewhat
silently in the middle, and isn't the last item.  The error is
probably associated with one of the 12288 byte writes.

I have re-run an ftrace following the conditions above, and have
hosted that file (~1.1MB compressed) on my local server at:

http://dontpanic.dyndns.org/trace-openmotif-btrfs-v15.gz

Please note I am still using some debugging modifications of my own to file.c.

They serve the purpose of:
(1) Avoiding an infinite loop by identifying when the problem is
occurring, and exiting with an error after 256 loops.
(2) Stopping the trace after exiting to keep from flooding the ftrace buffer.
(3) Providing debugging comments (all prefaced with TPK: in the trace).

Let me know if you want me to change any of the conditions.


wmldbcreate-strace.gz
Description: GNU Zip compressed data


Re: [PATCH] btrfs file write debugging patch

2011-03-01 Thread Piotr Szymaniak
On Mon, Feb 28, 2011 at 02:20:22PM -0600, Mitch Harder wrote:
 As promised, I've put together a modified file.c with many trace_printk
 debugging statements to augment the ftrace.
 *snip*

Just my few cents. I've applied the patch from Chris Mason (Sun, 27 Feb
2011 20:46:05 -0500) and this one from Mitch (Mon, 28 Feb 2011 14:20:22
-0600) on top of vanilla 2.6.38-rc6 and it seems that it resolves my
issues with hanging `svn info' during libgcrypt emerge.

Piotr Szymaniak.
-- 
 - (...) I can't imagine what that mountain of meat could give you that I
couldn't offer. Apart, of course, from fifty pounds of overgrown muscle.
 - Maybe overgrown muscles are exactly what attracts me. (...) After all, many
men are attracted to overgrown fatty breast tissue.
  -- Graham Masterton, The Wells of Hell

