Re: crc32c implementation on x86 with SSE4.2... CONFIG_BTRFS_HW_SUM
On 2008-10-16 (Thu) at 09:49 -0400, Chris Mason writes:

On Thu, 2008-10-16 at 14:40 +0100, Miguel Sousa Filipe wrote:

Hi there, I noticed that btrfs, in the git tree, has its own implementation of crc32c for x86 with SSE4.2 that implement a crc32 instruction. It appears.

I don't see Intel's patches in mainline yet, but I know there was a plan to get them there.

It seems to be merged: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8cb51ba8e06570a5fff674b3744d12a1b089f2d0

Using a generic implementation that can be replaced by an arch-specific version is surely better. This way you avoid implementing CRC32 separately for each architecture (for example UltraSPARC T2, which computes CRCs at a healthy 48 GB/s).

-- 
Tomasz Torcz

-- 
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
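The "generic, replaceable by arch version" idea can be sketched in user-space C as a single function pointer that defaults to a portable bit-at-a-time CRC32C (Castagnoli polynomial, reflected form 0x82F63B78) and can be swapped for a faster backend, for example one using the SSE4.2 crc32 instruction. This is a minimal sketch under stated assumptions: the names `crc32c_generic`, `crc32c_register_arch`, and `crc32c` are hypothetical and are not the kernel crypto API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-32C (Castagnoli) polynomial 0x1EDC6F41 in bit-reflected form. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

typedef uint32_t (*crc32c_fn)(uint32_t crc, const void *buf, size_t len);

/* Portable bit-at-a-time implementation: slow, but works on any CPU. */
static uint32_t crc32c_generic(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	crc = ~crc;  /* standard initial value 0xFFFFFFFF when crc == 0 */
	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			/* shift right; XOR in the polynomial if the low bit was set */
			crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
	}
	return ~crc;  /* standard final XOR 0xFFFFFFFF */
}

/* Every caller goes through this pointer; it defaults to the generic code. */
static crc32c_fn crc32c_impl = crc32c_generic;

/* Hypothetical hook: an arch backend would register itself here after
 * probing the CPU for the instructions it needs. */
static void crc32c_register_arch(crc32c_fn fn)
{
	assert(fn != NULL);
	crc32c_impl = fn;
}

/* Public entry point; pass the previous result as crc to checksum in pieces. */
static uint32_t crc32c(uint32_t crc, const void *buf, size_t len)
{
	return crc32c_impl(crc, buf, len);
}
```

Because the pre- and post-inversion cancel between calls, `crc32c(crc32c(0, a, n), b, m)` equals the checksum of `a` followed by `b`, so an arch backend only has to match the generic function's results, not its internal structure.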
Re: Need some help with a backref problem
On Fri, Oct 17, 2008 at 09:48:58AM +0800, Yan Zheng wrote:

2008/10/17 Josef Bacik [EMAIL PROTECTED]:

On Thu, Oct 16, 2008 at 04:54:12PM -0400, Josef Bacik wrote:

Hello,

It's the end of the day here and I haven't figured this out, so hopefully, Yan, you can figure it out and I can come in tomorrow and keep working on taking alloc_mutex out :). What is happening is that I'm getting -ENOENT from lookup_extent_backref in finish_current_insert() when extent_op->type == PENDING_BACKREF_UPDATE. The way I have the locking set up, the only way this can happen is if we delete the extent backref completely and then do btrfs_update_ref. I put a lookup_extent_backref in __btrfs_update_extent_ref and did a BUG_ON(ret), and it gave me this backtrace:

I guess there are two or more threads running finish_current_insert at the same time (a find_first_extent_bit vs clear_extent_bits race).

This can't happen; it's guarded by a new mutex that is responsible for the extent_ins/del_pending/pinned_extents extent io trees.

[a035ecac] ? btrfs_update_ref+0x2ce/0x322 [btrfs]
[a034f859] ? insert_ptr+0x176/0x184 [btrfs]
[a0354615] ? split_node+0x54a/0x5b3 [btrfs]
[a03555af] ? btrfs_search_slot+0x4ef/0x7aa [btrfs]
[8109a1ad] ? check_bytes_and_report+0x37/0xc9
[8109a1ad] ? check_bytes_and_report+0x37/0xc9
[a0355da7] ? btrfs_insert_empty_items+0x7d/0x43b [btrfs]
[a03561b4] ? btrfs_insert_item+0x4f/0xa4 [btrfs]
[a0358ef6] ? finish_current_insert+0xfc/0x2b5 [btrfs]
[8109a832] ? init_object+0x27/0x6e
[a035a77b] ? __btrfs_alloc_reserved_extent+0x37d/0x3dc [btrfs]
[a035aa26] ? btrfs_alloc_reserved_extent+0x2b/0x5b [btrfs]
[a036accf] ? btrfs_finish_ordered_io+0x21b/0x344 [btrfs]
[a037a782] ? end_bio_extent_writepage+0x9b/0x172 [btrfs]
[a037fe51] ? worker_loop+0x42/0x125 [btrfs]
[a037fe0f] ? worker_loop+0x0/0x125 [btrfs]
[81046721] ? kthread+0x47/0x76
[8100cd59] ? child_rip+0xa/0x11
[810466da] ? kthread+0x0/0x76
[8100cd4f] ? child_rip+0x0/0x11

I also put in some printk's to figure out exactly when this was happening, and it happens in split_node() when c == root->node, so we do an insert_new_root. My first reaction was to put a c = path->nodes[level] after the insert_new_root, but looking at it, that's just going to give me the same thing back. I can't figure out whether I'm doing something wrong or whether there is something wonky with the backref stuff, and what is even more worrisome is that I can't figure out why having alloc_mutex in there kept this problem from happening before, since the way this happens doesn't have anything to do with alloc_mutex. All help is appreciated, even random thinking out loud; hopefully we can figure out what is going on and I can finish ripping alloc_mutex out. Thanks,

OK, I think I figured it out: we need to have a c = root->node; after the insert_new_root, since we will have freed the old extent buffer and replaced it with a new one. Does that sound right? Thanks,

I don't think so. If we do this, we will end up splitting the new root.

OK, I think I understand this now, thanks,

Josef
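Yan's objection can be illustrated with a toy in-memory B-tree. When the node being split is the root, a new root is created that points at the old one, and the node that still needs splitting is the original `c`, now the new root's only child; re-pointing `c` at the root would split the brand-new root instead. The functions below reuse the names `split_node` and `insert_new_root` for readability only. The structs, the fixed `MAX_ITEMS` capacity, and the absence of locking, COW, and extent buffers are simplifying assumptions, so this is a sketch of the general technique, not the btrfs code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ITEMS 4  /* hypothetical, tiny so splits happen quickly */

struct node {
	int nritems;
	int keys[MAX_ITEMS];
	struct node *ptrs[MAX_ITEMS];  /* child pointers; NULL in leaves */
};

struct tree {
	struct node *root;
};

static struct node *node_alloc(void)
{
	struct node *n = calloc(1, sizeof(*n));

	assert(n != NULL);
	return n;
}

/* Grow the tree by one level: the new root gets a single pointer to the old
 * root, which becomes an ordinary child that can then be split. */
static struct node *insert_new_root(struct tree *t)
{
	struct node *old = t->root;
	struct node *nr = node_alloc();

	nr->nritems = 1;
	nr->keys[0] = old->keys[0];
	nr->ptrs[0] = old;
	t->root = nr;
	return nr;
}

/* Split c in half and link the new right half into c's parent. */
static void split_node(struct tree *t, struct node *c, struct node *parent)
{
	if (c == t->root) {
		/* Splitting the root: grow a new one, but keep splitting c,
		 * which is now the new root's only child.  Re-pointing c at
		 * t->root here would split the brand-new root instead. */
		parent = insert_new_root(t);
	}
	assert(parent != NULL && parent->nritems < MAX_ITEMS);

	struct node *right = node_alloc();
	int mid = c->nritems / 2;

	right->nritems = c->nritems - mid;
	memcpy(right->keys, c->keys + mid, right->nritems * sizeof(c->keys[0]));
	memcpy(right->ptrs, c->ptrs + mid, right->nritems * sizeof(c->ptrs[0]));
	c->nritems = mid;

	/* Find c's slot in the parent and insert right just after it. */
	int slot = 0;
	while (parent->ptrs[slot] != c)
		slot++;
	int tail = parent->nritems - slot - 1;
	memmove(parent->keys + slot + 2, parent->keys + slot + 1, tail * sizeof(parent->keys[0]));
	memmove(parent->ptrs + slot + 2, parent->ptrs + slot + 1, tail * sizeof(parent->ptrs[0]));
	parent->keys[slot + 1] = right->keys[0];
	parent->ptrs[slot + 1] = right;
	parent->nritems++;
}
```

After splitting a full root holding keys 0, 10, 20, 30, the tree has a new root with two children: the original node (keys 0, 10) and a new right node (keys 20, 30).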
Re: Need some help with a backref problem
On Fri, Oct 17, 2008 at 09:48:58AM +0800, Yan Zheng wrote:

2008/10/17 Josef Bacik [EMAIL PROTECTED]:

On Thu, Oct 16, 2008 at 04:54:12PM -0400, Josef Bacik wrote:

Hello,

It's the end of the day here and I haven't figured this out, so hopefully, Yan, you can figure it out and I can come in tomorrow and keep working on taking alloc_mutex out :). What is happening is that I'm getting -ENOENT from lookup_extent_backref in finish_current_insert() when extent_op->type == PENDING_BACKREF_UPDATE. The way I have the locking set up, the only way this can happen is if we delete the extent backref completely and then do btrfs_update_ref. I put a lookup_extent_backref in __btrfs_update_extent_ref and did a BUG_ON(ret), and it gave me this backtrace:

I guess there are two or more threads running finish_current_insert at the same time (a find_first_extent_bit vs clear_extent_bits race).

[a035ecac] ? btrfs_update_ref+0x2ce/0x322 [btrfs]
[a034f859] ? insert_ptr+0x176/0x184 [btrfs]
[a0354615] ? split_node+0x54a/0x5b3 [btrfs]
[a03555af] ? btrfs_search_slot+0x4ef/0x7aa [btrfs]
[8109a1ad] ? check_bytes_and_report+0x37/0xc9
[8109a1ad] ? check_bytes_and_report+0x37/0xc9
[a0355da7] ? btrfs_insert_empty_items+0x7d/0x43b [btrfs]
[a03561b4] ? btrfs_insert_item+0x4f/0xa4 [btrfs]
[a0358ef6] ? finish_current_insert+0xfc/0x2b5 [btrfs]
[8109a832] ? init_object+0x27/0x6e
[a035a77b] ? __btrfs_alloc_reserved_extent+0x37d/0x3dc [btrfs]
[a035aa26] ? btrfs_alloc_reserved_extent+0x2b/0x5b [btrfs]
[a036accf] ? btrfs_finish_ordered_io+0x21b/0x344 [btrfs]
[a037a782] ? end_bio_extent_writepage+0x9b/0x172 [btrfs]
[a037fe51] ? worker_loop+0x42/0x125 [btrfs]
[a037fe0f] ? worker_loop+0x0/0x125 [btrfs]
[81046721] ? kthread+0x47/0x76
[8100cd59] ? child_rip+0xa/0x11
[810466da] ? kthread+0x0/0x76
[8100cd4f] ? child_rip+0x0/0x11

I also put in some printk's to figure out exactly when this was happening, and it happens in split_node() when c == root->node, so we do an insert_new_root. My first reaction was to put a c = path->nodes[level] after the insert_new_root, but looking at it, that's just going to give me the same thing back. I can't figure out whether I'm doing something wrong or whether there is something wonky with the backref stuff, and what is even more worrisome is that I can't figure out why having alloc_mutex in there kept this problem from happening before, since the way this happens doesn't have anything to do with alloc_mutex. All help is appreciated, even random thinking out loud; hopefully we can figure out what is going on and I can finish ripping alloc_mutex out. Thanks,

OK, I think I figured it out: we need to have a c = root->node; after the insert_new_root, since we will have freed the old extent buffer and replaced it with a new one. Does that sound right? Thanks,

I don't think so. If we do this, we will end up splitting the new root.

Here is my patch; it's a bit of a mess right now. Thanks,

Josef

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 9caeb37..4f2 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1390,8 +1390,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root
 	lowest_level = p->lowest_level;
 	WARN_ON(lowest_level && ins_len > 0);
 	WARN_ON(p->nodes[0] != NULL);
-	WARN_ON(cow && root == root->fs_info->extent_root &&
-		!mutex_is_locked(&root->fs_info->alloc_mutex));
+
 	if (ins_len < 0)
 		lowest_unlock = 2;
 
@@ -2051,6 +2050,7 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
 	if (c == root->node) {
 		/* trying to split the root, lets make a new one */
 		ret = insert_new_root(trans, root, path, level + 1);
+		printk(KERN_ERR "splitting the root, %llu\n", c->start);
 		if (ret)
 			return ret;
 	} else {
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fad58b9..d1e304f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -516,12 +516,14 @@ struct btrfs_free_space {
 	struct rb_node offset_index;
 	u64 offset;
 	u64 bytes;
+	unsigned long ip;
 };
 
 struct btrfs_block_group_cache {
 	struct btrfs_key key;
 	struct btrfs_block_group_item item;
 	spinlock_t lock;
+	struct mutex alloc_mutex;
 	u64 pinned;
 	u64 reserved;
 	u64 flags;
@@ -600,6 +602,7 @@ struct btrfs_fs_info {
 	struct mutex transaction_kthread_mutex;
 	struct mutex cleaner_mutex;
 	struct mutex alloc_mutex;
+	struct mutex extent_io_mutex;
 	struct mutex chunk_mutex;
 	struct mutex drop_mutex;
 	struct mutex volume_mutex;
@@ -1879,8 +1882,12 @@ int
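The race Yan suspects, and the guard Josef describes (a mutex covering the pending extent io trees, added as `extent_io_mutex` in the patch), follow a general pattern: "find the first pending entry" and "clear it" must happen under one lock, or two workers can find the same entry. The user-space sketch below shows that pattern with POSIX threads; the `bool` array standing in for an extent io tree and the names `claim_first_pending`, `worker`, and `run_workers` are hypothetical, not btrfs code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NPENDING    1000  /* hypothetical number of queued operations */
#define MAX_THREADS 8

static bool pending[NPENDING];   /* stand-in for bits set in an extent io tree */
static int processed[NPENDING];  /* how many times each entry was handled */
static pthread_mutex_t pending_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Find and clear the first pending entry under a single lock hold, so no
 * other thread can observe the same entry between the two steps. */
static int claim_first_pending(void)
{
	int found = -1;

	pthread_mutex_lock(&pending_mutex);
	for (int i = 0; i < NPENDING; i++) {
		if (pending[i]) {
			pending[i] = false;  /* "clear" while still holding the lock */
			found = i;
			break;
		}
	}
	pthread_mutex_unlock(&pending_mutex);
	return found;
}

static void *worker(void *arg)
{
	int idx;

	(void)arg;
	/* Each claimed index is owned by exactly one thread, so the
	 * increment below needs no further locking. */
	while ((idx = claim_first_pending()) >= 0)
		processed[idx]++;
	return NULL;
}

/* Mark every entry pending, then drain them with nthreads workers. */
static void run_workers(int nthreads)
{
	pthread_t tids[MAX_THREADS];

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	for (int i = 0; i < NPENDING; i++) {
		pending[i] = true;
		processed[i] = 0;
	}
	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tids[i], NULL, worker, NULL);

		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
}
```

After `run_workers(4)`, every `processed[i]` should be exactly 1. If the lock were dropped between the find and the clear, two workers could claim the same index, which is the shape of the find_first_extent_bit vs clear_extent_bits race described above.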
Re: Data-deduplication?
On Thu, Oct 16, 2008 at 03:30:49PM -0400, Chris Mason wrote:

On Thu, 2008-10-16 at 15:25 -0400, Valerie Aurora Henson wrote:

Both deduplication and compression have an interesting side effect in which a write to a previously allocated block can return ENOSPC. This is even more exciting when you factor in mmap. Any thoughts on how to handle this?

Unfortunately we'll have a number of places where ENOSPC will jump in where people don't expect it, and this includes any COW overwrite of an existing extent. The old extent isn't freed until snapshot deletion time, which won't happen until after the current transaction commits.

Another example is fallocate. The extent will have a little flag that says "I'm a preallocated extent", which is how we'll know we're allowed to overwrite it directly instead of doing COW. But to write to the fallocated extent, we'll have to clear the flag. So we'll have to COW the block that holds the file extent pointer, which means we can hit ENOSPC.

I'm sure you know this, but for the peanut gallery: you can avoid some of these purely copy-on-write ENOSPC cases. Any operation where the space used afterwards is less than or equal to the space used before (like your fallocate case) can avoid ENOSPC as long as you reserve a certain amount of space on the fs and break the changes down into small enough groups. Most file systems don't let you fill up beyond 90-95% anyway, because performance goes to hell. You also need to do this so you can delete when your file system is full.

In general, it'd be nice to say that if your app can't handle surprise ENOSPC, then as long as you run without snapshots, compression, or data dedup, we guarantee you'll only get ENOSPC in the normal cases. What do you think?

-VAL
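Valerie's suggestion (hold back a reserve, let only space-neutral operations dip into it, and break those operations into groups small enough to fit) can be sketched as simple block accounting. This is a minimal sketch assuming a single pool counted in blocks; `struct space_info`, `space_alloc`, `space_free`, and `cow_rewrite` are hypothetical names, and the model ignores metadata, transactions, and snapshots.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical space accounting for a COW filesystem, in blocks. */
struct space_info {
	uint64_t total;
	uint64_t used;
	uint64_t reserve;  /* blocks held back for space-neutral operations */
};

/* Ordinary growth must leave the reserve intact; an operation flagged
 * space_neutral (it frees at least as much as it allocates once its group
 * completes) may dip into the reserve. */
static int space_alloc(struct space_info *s, uint64_t nblocks, bool space_neutral)
{
	uint64_t limit = space_neutral ? s->total : s->total - s->reserve;

	if (s->used > limit || nblocks > limit - s->used)
		return -ENOSPC;
	s->used += nblocks;
	return 0;
}

static void space_free(struct space_info *s, uint64_t nblocks)
{
	assert(nblocks <= s->used);
	s->used -= nblocks;
}

/* COW-overwrite nblocks in groups no larger than the reserve: each group
 * allocates new copies, then frees the old blocks, so the temporary
 * overshoot never exceeds one group. */
static int cow_rewrite(struct space_info *s, uint64_t nblocks)
{
	if (nblocks > 0 && s->reserve == 0)
		return -ENOSPC;
	while (nblocks > 0) {
		uint64_t group = nblocks < s->reserve ? nblocks : s->reserve;
		int ret = space_alloc(s, group, true);

		if (ret)
			return ret;
		space_free(s, group);  /* old copies released when the group completes */
		nblocks -= group;
	}
	return 0;
}
```

With 100 blocks total and a reserve of 5, ordinary writes stop at 95 used blocks; a 50-block overwrite done in one piece fails with -ENOSPC, but the same overwrite done in 5-block groups succeeds and leaves usage unchanged.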
Re: Data-deduplication?
On Thu, Oct 16, 2008 at 03:25:01PM -0400, Valerie Aurora Henson wrote:

Both deduplication and compression have an interesting side effect in which a write to a previously allocated block can return ENOSPC. This is even more exciting when you factor in mmap. Any thoughts on how to handle this?

Note that this can already happen in today's filesystems. Writing into preallocated space can always cause splits of the allocation or bmap btrees, as the previous big preallocated extent is now split into one allocated extent and at least one preallocated extent (two if writing into the middle).
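The extent split described above can be made concrete: a write into a preallocated extent turns one mapping record into up to three (untouched preallocated head, newly written range, untouched preallocated tail), and each extra record may need new btree space. The sketch below models only that split; `struct extent`, `enum ext_state`, and `split_prealloc` are hypothetical names, not any filesystem's actual API.

```c
#include <assert.h>
#include <stdint.h>

enum ext_state { EXT_PREALLOC, EXT_WRITTEN };

struct extent {
	uint64_t start;
	uint64_t len;
	enum ext_state state;
};

/* Write [wstart, wstart + wlen) into the preallocated extent *pre.  Stores up
 * to three resulting extents in out[] and returns how many there are.  Every
 * result beyond the first is an extra record in the mapping btree, which is
 * why such a write can need new metadata blocks and fail with ENOSPC. */
static int split_prealloc(const struct extent *pre, uint64_t wstart,
			  uint64_t wlen, struct extent out[3])
{
	uint64_t pend = pre->start + pre->len;
	uint64_t wend = wstart + wlen;
	int n = 0;

	assert(pre->state == EXT_PREALLOC);
	assert(wlen > 0 && wstart >= pre->start && wend <= pend);

	if (wstart > pre->start)  /* untouched preallocated head */
		out[n++] = (struct extent){ pre->start, wstart - pre->start, EXT_PREALLOC };
	out[n++] = (struct extent){ wstart, wlen, EXT_WRITTEN };
	if (wend < pend)          /* untouched preallocated tail */
		out[n++] = (struct extent){ wend, pend - wend, EXT_PREALLOC };
	return n;
}
```

For a 100-block preallocated extent, a write to blocks 40..49 yields three extents, a write at the start yields two, and a write covering the whole extent yields one, matching the "at least one (or two if writing into the middle)" preallocated remainder described in the message.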