Re: crc32c implementation on x86 with SSE4.2... CONFIG_BTRFS_HW_SUM

2008-10-17 Thread Tomasz Torcz
On Thu, 2008-10-16 at 09:49 -0400, Chris Mason wrote:
 On Thu, 2008-10-16 at 14:40 +0100, Miguel Sousa Filipe wrote:
  Hi there,
  
  I noticed that btrfs, in the git tree, has its own implementation of
  crc32c for x86 with SSE4.2, which implements a crc32 instruction, it
  appears.
 I don't see intel's patches in mainline yet, but I know there was a
 plan to get them there.

  It seems to be merged:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8cb51ba8e06570a5fff674b3744d12a1b089f2d0

  Using the generic implementation, replaceable by an arch-specific
version, is surely better. This way btrfs can avoid implementing crc32c
for each architecture itself (for example UltraSPARC T2, which computes
CRC at a healthy 48 GB/s).
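
A minimal user-space sketch of the "generic, replaceable by arch version" idea, in C. It is not the kernel crypto API: the names crc32c_fn, crc32c_impl and crc32c_register are hypothetical, and the sketch uses the common convention of inverting the CRC on entry and exit, which may differ from what a given kernel interface expects. The generic routine is a plain bit-at-a-time CRC32C (Castagnoli polynomial, reflected form 0x82F63B78); an architecture with a hardware instruction would register a faster function with the same signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signature shared by the generic routine and any arch-specific override. */
typedef uint32_t (*crc32c_fn)(uint32_t crc, const void *buf, size_t len);

/* Portable bit-at-a-time CRC32C, reflected polynomial 0x82F63B78. */
static uint32_t crc32c_generic(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	crc = ~crc;
	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Hypothetical dispatch slot: starts out generic, arch code may replace it. */
static crc32c_fn crc32c_impl = crc32c_generic;

/* Hypothetical hook an arch-specific module would call at init time. */
static void crc32c_register(crc32c_fn fn)
{
	assert(fn != NULL);
	crc32c_impl = fn;
}

/* Callers only ever use this entry point, whichever version is installed. */
static uint32_t crc32c(uint32_t crc, const void *buf, size_t len)
{
	return crc32c_impl(crc, buf, len);
}
```

With this shape the filesystem code calls crc32c() only, so adding a new architecture means registering one function rather than touching every caller.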

-- 
Tomasz Torcz

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Need some help with a backref problem

2008-10-17 Thread Josef Bacik
On Fri, Oct 17, 2008 at 09:48:58AM +0800, Yan Zheng wrote:
 2008/10/17 Josef Bacik [EMAIL PROTECTED]:
  On Thu, Oct 16, 2008 at 04:54:12PM -0400, Josef Bacik wrote:
  Hello,
 
  It's the end of the day here and I haven't figured this out, so hopefully
  Yan you can figure this out and I can come in tomorrow and keep working on
  taking alloc_mutex out :).  What is happening is I'm getting -ENOENT from
  lookup_extent_backref in finish_current_insert() when extent_op->type ==
  PENDING_BACKREF_UPDATE.  The way I have locking is that the only way this
  can happen is if we delete the extent backref completely, and then do
  btrfs_update_ref.  I put a lookup_extent_backref in
  __btrfs_update_extent_ref and did a BUG_ON(ret), and it gave me this
  backtrace
 
 
 I guess there are two or more threads running finish_current_insert at the
 same time. (find_first_extent_bit vs clear_extent_bits race)


This can't happen, it's guarded by a new mutex that is responsible for the
extent_ins/del_pending/pinned_extents extent io trees.
 
   [a035ecac] ? btrfs_update_ref+0x2ce/0x322 [btrfs]
   [a034f859] ? insert_ptr+0x176/0x184 [btrfs]
   [a0354615] ? split_node+0x54a/0x5b3 [btrfs]
   [a03555af] ? btrfs_search_slot+0x4ef/0x7aa [btrfs]
   [8109a1ad] ? check_bytes_and_report+0x37/0xc9
   [8109a1ad] ? check_bytes_and_report+0x37/0xc9
   [a0355da7] ? btrfs_insert_empty_items+0x7d/0x43b [btrfs]
   [a03561b4] ? btrfs_insert_item+0x4f/0xa4 [btrfs]
   [a0358ef6] ? finish_current_insert+0xfc/0x2b5 [btrfs]
   [8109a832] ? init_object+0x27/0x6e
   [a035a77b] ? __btrfs_alloc_reserved_extent+0x37d/0x3dc [btrfs]
   [a035aa26] ? btrfs_alloc_reserved_extent+0x2b/0x5b [btrfs]
   [a036accf] ? btrfs_finish_ordered_io+0x21b/0x344 [btrfs]
   [a037a782] ? end_bio_extent_writepage+0x9b/0x172 [btrfs]
   [a037fe51] ? worker_loop+0x42/0x125 [btrfs]
   [a037fe0f] ? worker_loop+0x0/0x125 [btrfs]
   [81046721] ? kthread+0x47/0x76
   [8100cd59] ? child_rip+0xa/0x11
   [810466da] ? kthread+0x0/0x76
   [8100cd4f] ? child_rip+0x0/0x11
 
  And I also put in some printk's to figure out when exactly this was
  happening, and it happens in split_node() when c == root->node, so we do
  an insert_new_root.  My first reaction was to put a
  c = path->nodes[level] after the insert_new_root, but looking at it
  that's just going to give me the same thing back.  I can't figure out if
  I'm doing something wrong or if there is something wonky with the backref
  stuff, and what is even more worrisome is that I can't figure out why
  having alloc_mutex in there kept this problem from happening before,
  since the way this happens doesn't have anything to do with alloc_mutex.
  All help is appreciated, even random thinking out loud, hopefully we can
  figure out what is going on and I can finish ripping alloc_mutex out.
  Thanks,
 
 
  Ok I think I figured it out, we need to have a
 
  c = root->node;
 
  after the insert_new_root, since we will have freed the old extent buffer
  and replaced it with a new one.  Does that sound right?  Thanks,
 
 
 I don't think so. If we do this, we will end up splitting the new root.


Ok, I think I understand this now, thanks,

Josef 
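
A toy illustration of Yan's point, assuming a much simplified tree; the struct node and struct tree types and the node_to_split() helper are hypothetical and are not the btrfs structures. In this model, inserting a new root does not free the old root: the old root becomes the first child of the new one. The node that was full and needed splitting is therefore still the old node, and re-reading the root pointer afterwards would pick the freshly created root instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy node: only the level and the first child pointer are modelled. */
struct node {
	int level;
	struct node *child0;
};

struct tree {
	struct node *root;
};

/*
 * Toy analogue of insert_new_root(): allocate a node one level up and
 * hang the old root under it.  The old root is kept, not freed.
 */
static int insert_new_root(struct tree *t)
{
	struct node *old = t->root;
	struct node *n;

	assert(old != NULL);
	n = calloc(1, sizeof(*n));
	if (!n)
		return -1;
	n->level = old->level + 1;
	n->child0 = old;
	t->root = n;
	return 0;
}

/*
 * Return the node that should actually be split.  If c was the root, a new
 * root is inserted first, but c itself is still the full node.  Returning
 * t->root here instead would split the new, nearly empty root.
 */
static struct node *node_to_split(struct tree *t, struct node *c)
{
	if (c == t->root && insert_new_root(t))
		return NULL;
	return c;
}
```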


Re: Need some help with a backref problem

2008-10-17 Thread Josef Bacik
On Fri, Oct 17, 2008 at 09:48:58AM +0800, Yan Zheng wrote:
 2008/10/17 Josef Bacik [EMAIL PROTECTED]:
  On Thu, Oct 16, 2008 at 04:54:12PM -0400, Josef Bacik wrote:
  Hello,
 
  It's the end of the day here and I haven't figured this out, so hopefully
  Yan you can figure this out and I can come in tomorrow and keep working on
  taking alloc_mutex out :).  What is happening is I'm getting -ENOENT from
  lookup_extent_backref in finish_current_insert() when extent_op->type ==
  PENDING_BACKREF_UPDATE.  The way I have locking is that the only way this
  can happen is if we delete the extent backref completely, and then do
  btrfs_update_ref.  I put a lookup_extent_backref in
  __btrfs_update_extent_ref and did a BUG_ON(ret), and it gave me this
  backtrace
 
 
 I guess there are two or more threads running finish_current_insert at the
 same time. (find_first_extent_bit vs clear_extent_bits race)
 
   [a035ecac] ? btrfs_update_ref+0x2ce/0x322 [btrfs]
   [a034f859] ? insert_ptr+0x176/0x184 [btrfs]
   [a0354615] ? split_node+0x54a/0x5b3 [btrfs]
   [a03555af] ? btrfs_search_slot+0x4ef/0x7aa [btrfs]
   [8109a1ad] ? check_bytes_and_report+0x37/0xc9
   [8109a1ad] ? check_bytes_and_report+0x37/0xc9
   [a0355da7] ? btrfs_insert_empty_items+0x7d/0x43b [btrfs]
   [a03561b4] ? btrfs_insert_item+0x4f/0xa4 [btrfs]
   [a0358ef6] ? finish_current_insert+0xfc/0x2b5 [btrfs]
   [8109a832] ? init_object+0x27/0x6e
   [a035a77b] ? __btrfs_alloc_reserved_extent+0x37d/0x3dc [btrfs]
   [a035aa26] ? btrfs_alloc_reserved_extent+0x2b/0x5b [btrfs]
   [a036accf] ? btrfs_finish_ordered_io+0x21b/0x344 [btrfs]
   [a037a782] ? end_bio_extent_writepage+0x9b/0x172 [btrfs]
   [a037fe51] ? worker_loop+0x42/0x125 [btrfs]
   [a037fe0f] ? worker_loop+0x0/0x125 [btrfs]
   [81046721] ? kthread+0x47/0x76
   [8100cd59] ? child_rip+0xa/0x11
   [810466da] ? kthread+0x0/0x76
   [8100cd4f] ? child_rip+0x0/0x11
 
  And I also put in some printk's to figure out when exactly this was
  happening, and it happens in split_node() when c == root->node, so we do
  an insert_new_root.  My first reaction was to put a
  c = path->nodes[level] after the insert_new_root, but looking at it
  that's just going to give me the same thing back.  I can't figure out if
  I'm doing something wrong or if there is something wonky with the backref
  stuff, and what is even more worrisome is that I can't figure out why
  having alloc_mutex in there kept this problem from happening before,
  since the way this happens doesn't have anything to do with alloc_mutex.
  All help is appreciated, even random thinking out loud, hopefully we can
  figure out what is going on and I can finish ripping alloc_mutex out.
  Thanks,
 
 
  Ok I think I figured it out, we need to have a
 
  c = root->node;
 
  after the insert_new_root, since we will have freed the old extent buffer
  and replaced it with a new one.  Does that sound right?  Thanks,
 
 
 I don't think so. If we do this, we will end up splitting the new root.


Here is my patch, it's a bit of a mess right now, thanks,

Josef

 
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 9caeb37..4f2 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1390,8 +1390,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root
 	lowest_level = p->lowest_level;
 	WARN_ON(lowest_level && ins_len > 0);
 	WARN_ON(p->nodes[0] != NULL);
-	WARN_ON(cow && root == root->fs_info->extent_root &&
-		!mutex_is_locked(&root->fs_info->alloc_mutex));
+
 	if (ins_len < 0)
 		lowest_unlock = 2;
 
@@ -2051,6 +2050,7 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
 	if (c == root->node) {
 		/* trying to split the root, lets make a new one */
 		ret = insert_new_root(trans, root, path, level + 1);
+		printk(KERN_ERR "splitting the root, %llu\n", c->start);
 		if (ret)
 			return ret;
 	} else {
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fad58b9..d1e304f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -516,12 +516,14 @@ struct btrfs_free_space {
 	struct rb_node offset_index;
 	u64 offset;
 	u64 bytes;
+	unsigned long ip;
 };
 
 struct btrfs_block_group_cache {
 	struct btrfs_key key;
 	struct btrfs_block_group_item item;
 	spinlock_t lock;
+	struct mutex alloc_mutex;
 	u64 pinned;
 	u64 reserved;
 	u64 flags;
@@ -600,6 +602,7 @@ struct btrfs_fs_info {
 	struct mutex transaction_kthread_mutex;
 	struct mutex cleaner_mutex;
 	struct mutex alloc_mutex;
+	struct mutex extent_io_mutex;
 	struct mutex chunk_mutex;
 	struct mutex drop_mutex;
 	struct mutex volume_mutex;
@@ -1879,8 +1882,12 @@ int 

Re: Data-deduplication?

2008-10-17 Thread Valerie Aurora Henson
On Thu, Oct 16, 2008 at 03:30:49PM -0400, Chris Mason wrote:
 On Thu, 2008-10-16 at 15:25 -0400, Valerie Aurora Henson wrote:
  
  Both deduplication and compression have an interesting side effect in
  which a write to a previously allocated block can return ENOSPC.
  This is even more exciting when you factor in mmap.  Any thoughts on
  how to handle this?
 
 Unfortunately we'll have a number of places where ENOSPC will jump in
 where people don't expect it, and this includes any COW overwrite of an
 existing extent.  The old extent isn't freed until snapshot deletion
 time, which won't happen until after the current transaction commits.
 
 Another example is fallocate.  The extent will have a little flag that
 says I'm a preallocated extent, which is how we'll know we're allowed to
 overwrite it directly instead of doing COW.
 
 But, to write to the fallocated extent, we'll have to clear the flag.
 So, we'll have to cow the block that holds the file extent pointer,
 which means we can enospc.

I'm sure you know this, but for the peanut gallery: You can avoid some
of these purely copy-on-write ENOSPC cases.  Any operation
where the space used afterwards is less than or equal to the space
used before - like in your fallocate case - can avoid ENOSPC as long
as you reserve a certain amount of space on the fs and break down the
changes into small enough groups.  Most file systems don't let you
fill up beyond 90-95% anyway because performance goes to hell.  You
also need to do this so you can delete when your file system is full.
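
A rough sketch of that reservation idea in C, under stated assumptions: struct fs_space and the three helpers are hypothetical, space is counted in abstract blocks, and the old blocks of a batch are assumed to be releasable as soon as that batch completes (in a real copy-on-write filesystem that would require something like a commit per batch). Ordinary allocations stop at total minus reserve; a space-neutral overwrite may borrow from the reserve, one small batch at a time.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical space accounting, in blocks. */
struct fs_space {
	uint64_t total;    /* size of the filesystem */
	uint64_t used;     /* blocks currently allocated */
	uint64_t reserve;  /* blocks ordinary allocations may not touch */
};

/* Allocate n blocks; only space-neutral operations may dip into the reserve. */
static int alloc_blocks(struct fs_space *fs, uint64_t n, int may_use_reserve)
{
	uint64_t limit;

	assert(fs->reserve <= fs->total && fs->used <= fs->total);
	limit = may_use_reserve ? fs->total : fs->total - fs->reserve;
	if (n > limit || fs->used > limit - n)
		return -ENOSPC;
	fs->used += n;
	return 0;
}

static void free_blocks(struct fs_space *fs, uint64_t n)
{
	assert(n <= fs->used);
	fs->used -= n;
}

/*
 * Copy-on-write overwrite of nblocks existing blocks, done in batches.
 * Each batch allocates its new copies (possibly from the reserve) and then
 * releases the old ones, so usage is unchanged when the batch completes.
 * This cannot fail with ENOSPC as long as batch <= reserve.
 */
static int cow_overwrite(struct fs_space *fs, uint64_t nblocks, uint64_t batch)
{
	assert(batch > 0);
	while (nblocks) {
		uint64_t n = nblocks < batch ? nblocks : batch;
		int ret = alloc_blocks(fs, n, 1);

		if (ret)
			return ret;
		free_blocks(fs, n);
		nblocks -= n;
	}
	return 0;
}
```

The batch size is the knob: a batch larger than the reserve can still hit ENOSPC on a full filesystem, which is why the changes need to be broken into small enough groups.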

In general, it'd be nice to say that if your app can't handle surprise
ENOSPC, then if you run without snapshots, compression, or data dedup,
we guarantee you'll only get ENOSPC in the normal cases.  What do
you think?

-VAL


Re: Data-deduplication?

2008-10-17 Thread Christoph Hellwig
On Thu, Oct 16, 2008 at 03:25:01PM -0400, Valerie Aurora Henson wrote:
 Both deduplication and compression have an interesting side effect in
 which a write to a previously allocated block can return ENOSPC.
 This is even more exciting when you factor in mmap.  Any thoughts on
 how to handle this?

Note that this can already happen in today's filesystems.  Writing into
some preallocated space can always cause splits of the allocation or
bmap btrees, as the previous big preallocated extent is now split into
one allocated and at least one (or two if writing into the middle)
preallocated extents.
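
A small sketch of the split described above, in C. The struct extent type and split_prealloc() are hypothetical and model only extent ranges, not any real filesystem's btree. Writing into a preallocated extent yields one written extent plus zero, one, or two remaining preallocated pieces, so a single record can turn into up to three, and the extra records are what may need new metadata space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical extent record. */
struct extent {
	uint64_t start;
	uint64_t len;
	int prealloc;  /* 1 = preallocated (unwritten), 0 = written */
};

/*
 * Split a preallocated extent around a write covering
 * [wstart, wstart + wlen).  The write must lie entirely inside ext.
 * Fills out[] in ascending order and returns the number of records (1..3).
 */
static int split_prealloc(const struct extent *ext, uint64_t wstart,
			  uint64_t wlen, struct extent out[3])
{
	uint64_t end = ext->start + ext->len;
	uint64_t wend = wstart + wlen;
	int n = 0;

	assert(ext->prealloc && wlen > 0);
	assert(wstart >= ext->start && wend <= end);

	/* Leading part stays preallocated. */
	if (wstart > ext->start)
		out[n++] = (struct extent){ ext->start, wstart - ext->start, 1 };

	/* The written range becomes a regular allocated extent. */
	out[n++] = (struct extent){ wstart, wlen, 0 };

	/* Trailing part stays preallocated. */
	if (wend < end)
		out[n++] = (struct extent){ wend, end - wend, 1 };

	return n;
}
```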
