Re: [RFC PATCH 0/3] apply the Probabilistic Skiplist on btrfs
On 01/06/2012 07:51 AM, David Sterba wrote:
> Hi,
>
> I've let it run through xfstests and ended at 091, patches applied on
> top of 3.2, mount options
> compress-force=lzo,discard,inode_cache,space_cache,autodefrag
> fresh mkfs with defaults.

Hi David,

Thanks a lot for your work! I also found this and have fixed it.
I will send a V2 patchset after it goes through xfstests.

thanks,
liubo

> [ 1081.623819] btrfs: force lzo compression
> [ 1081.629166] btrfs: enabling inode map caching
> [ 1081.634853] btrfs: enabling auto defrag
> [ 1081.638569] btrfs: disk space caching is enabled
> [ 1119.693957] [ cut here ]
> [ 1119.697876] kernel BUG at fs/btrfs/file.c:530!
> [ 1119.697876] invalid opcode: [#1] SMP
> [ 1119.697876] CPU 1
> [ 1119.697876] Modules linked in: loop btrfs aoe
> [ 1119.697876]
> [ 1119.697876] Pid: 25819, comm: fsx Not tainted 3.2.0-default+ #95 Intel Corporation Santa Rosa platform/Matanzas
> [ 1119.697876] RIP: 0010:[a0048a18] [a0048a18] btrfs_drop_extent_cache+0x3f8/0x400 [btrfs]
> [ 1119.697876] RSP: 0018:88000c47f698 EFLAGS: 00010282
> [ 1119.697876] RAX: ffef RBX: 88006ff01e48 RCX: 00026fff
> [ 1119.697876] RDX: 88006ed5d830 RSI: 00022000 RDI:
> [ 1119.697876] RBP: 88000c47f738 R08: R09: 00022000
> [ 1119.697876] R10: fffe R11: 00026fff R12: 88001ada9e48
> [ 1119.697876] R13: 0001f000 R14: R15: 88000c47f708
> [ 1119.697876] FS: 7f262e570700() GS:88007de0() knlGS:
> [ 1119.697876] CS: 0010 DS: ES: CR0: 8005003b
> [ 1119.697876] CR2: 7fc4364fc000 CR3: 79435000 CR4: 06e0
> [ 1119.697876] DR0: DR1: DR2:
> [ 1119.697876] DR3: DR6: 0ff0 DR7: 0400
> [ 1119.697876] Process fsx (pid: 25819, threadinfo 88000c47e000, task 880063640700)
> [ 1119.697876] Stack:
> [ 1119.697876] 8800 81092040 88000c47f6f0 01000246
> [ 1119.697876] 0001 3000 88006e5c44f0
> [ 1119.697876] 88006e5c43e0
> [ 1119.697876] Call Trace:
> [ 1119.697876] [81092040] ? trace_hardirqs_on_caller+0x20/0x1d0
> [ 1119.697876] [a003a0b0] ? csum_exist_in_range+0xa0/0xa0 [btrfs]
> [ 1119.697876] [a003f296] cow_file_range+0x136/0x3e0 [btrfs]
> [ 1119.697876] [810921fd] ? trace_hardirqs_on+0xd/0x10
> [ 1119.697876] [a003f8a7] run_delalloc_nocow+0x367/0x820 [btrfs]
> [ 1119.697876] [81357dae] ? do_raw_spin_unlock+0x5e/0xb0
> [ 1119.697876] [a00400c9] run_delalloc_range+0x369/0x370 [btrfs]
> [ 1119.697876] [a00582c0] __extent_writepage+0x5f0/0x750 [btrfs]
> [ 1119.697876] [81349f4d] ? radix_tree_gang_lookup_tag_slot+0x8d/0xd0
> [ 1119.697876] [810f30d1] ? find_get_pages_tag+0x111/0x1b0
> [ 1119.697876] [a0058692] extent_write_cache_pages.clone.0+0x272/0x3f0 [btrfs]
> [ 1119.697876] [81357dae] ? do_raw_spin_unlock+0x5e/0xb0
> [ 1119.697876] [81131604] ? kfree+0xd4/0x180
> [ 1119.697876] [81092040] ? trace_hardirqs_on_caller+0x20/0x1d0
> [ 1119.697876] [a0058a56] extent_writepages+0x46/0x60 [btrfs]
> [ 1119.697876] [a003b590] ? acls_after_inode_item+0xd0/0xd0 [btrfs]
> [ 1119.697876] [a003ad17] btrfs_writepages+0x27/0x30 [btrfs]
> [ 1120.018734] [810fdcc4] do_writepages+0x24/0x40
> [ 1120.018734] [810f3cdb] __filemap_fdatawrite_range+0x5b/0x60
> [ 1120.018734] [810f3d3a] filemap_write_and_wait_range+0x5a/0x80
> [ 1120.018734] [a004859a] btrfs_file_aio_write+0x4da/0x560 [btrfs]
> [ 1120.018734] [8113a852] do_sync_write+0xe2/0x120
> [ 1120.018734] [8187d2ad] ? __mutex_unlock_slowpath+0xdd/0x180
> [ 1120.018734] [8187d35e] ? mutex_unlock+0xe/0x10
> [ 1120.018734] [a004703f] ? btrfs_file_llseek+0x6f/0x390 [btrfs]
> [ 1120.018734] [8113b15e] vfs_write+0xce/0x190
> [ 1120.018734] [8113b4a4] sys_write+0x54/0xa0
> [ 1120.018734] [81887a82] system_call_fastpath+0x16/0x1b
> [ 1120.018734] Code: 5e 41 5f c9 c3 0f 0b be bf 01 00 00 48 c7 c7 e6 02 09 a0 48 89 95 68 ff ff ff e8 e4 a2 00 e1 48 8b 95 68 ff ff ff e9 3c fc ff ff 0f 0b 0f 0b 0f 1f 40 00 55 48 89 e5 41 57 41 56 41 55 41 54 53
> [ 1120.018734] RIP [a0048a18] btrfs_drop_extent_cache+0x3f8/0x400 [btrfs]
> [ 1120.018734] RSP 88000c47f698
> [ 1120.047841] ---[ end trace ca0f509767e0195d ]---
>
> xfstests/091 output:
>
> 091 57s ... [19:47:50] [19:48:28] [failed, exit status 1] - output mismatch (see 091.out.bad)
> --- 091.out 2011-11-01 10:31:12.0 +0100
> +++ 091.out.bad 2012-01-05 19:48:28.0 +0100
> @@ -5,3 +5,41 @@
>  fsx -N 1 -o 8192 -l 50 -r
[RFC PATCH 0/3] apply the Probabilistic Skiplist on btrfs
Since we are inclined to apply a lockless scheme to some btrfs objects for
higher performance, we want to build an RCU version of the Probabilistic
Skiplist.

Our skiplist algorithm is based on the skiplist experiments of
Con Kolivas ker...@kolivas.org for the BFS cpu scheduler. More details
about the skiplist design are in patch 1.

Right now we plan to apply the skiplist to extent_map and extent_state.
We start with extent_map, since it is a read-mostly structure and the
change is quite direct. All we need to do is
a) replace the rbtree with a skiplist,
b) add RCU support.
More details are in patch 2 and patch 3.

I've done some simple performance tests on my 2-core box and there is no
obvious difference, but I want to focus on the design side first and make
sure there are no more bugs in it.

As a long-term goal, we want to move the skiplist to lib, like lib/rbtree.c.

MORE TESTS ARE WELCOME!

Liu Bo (3):
  Btrfs: add the Probabilistic Skiplist
  Btrfs: rebuild extent_map based on skiplist
  Btrfs: convert rwlock to RCU for extent_map

 fs/btrfs/Makefile      |    2 +-
 fs/btrfs/compression.c |    8 +-
 fs/btrfs/disk-io.c     |   15 ++-
 fs/btrfs/extent_io.c   |   13 +-
 fs/btrfs/extent_map.c  |  296 ++--
 fs/btrfs/extent_map.h  |   21 +++-
 fs/btrfs/file.c        |   23 +++-
 fs/btrfs/inode.c       |   69
 fs/btrfs/ioctl.c       |    8 +-
 fs/btrfs/relocation.c  |    9 +-
 fs/btrfs/scrub.c       |    4 +-
 fs/btrfs/skiplist.c    |   98
 fs/btrfs/skiplist.h    |  217 +++
 fs/btrfs/volumes.c     |   68 ++-
 14 files changed, 651 insertions(+), 200 deletions(-)
 create mode 100644 fs/btrfs/skiplist.c
 create mode 100644 fs/btrfs/skiplist.h

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
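For readers who want a concrete picture of what "replace the rbtree with a skiplist" could look like, here is a minimal userspace sketch of a probabilistic skiplist keyed by extent start offset. It is not the code from patch 1: the names (sl_node, sl_list, sl_insert, sl_lookup), the level cap, and the coin-flip generator are all assumptions made for illustration. It is single-threaded and leaves out the RCU part entirely; an RCU version would presumably publish the new node's forward pointers with something like rcu_assign_pointer() so that lockless readers never see a half-linked node.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SL_MAX_LEVEL 16	/* hypothetical cap, not taken from the patch */

struct sl_node {
	uint64_t start;			/* key: first byte of the extent */
	uint64_t len;			/* extent length in bytes */
	int level;			/* forward pointers in use */
	struct sl_node *next[SL_MAX_LEVEL];
};

struct sl_list {
	struct sl_node head;		/* sentinel, its key is never compared */
	int level;			/* highest level currently in use */
	unsigned int seed;		/* state of the level generator */
};

static void sl_init(struct sl_list *l, unsigned int seed)
{
	assert(l != NULL);
	l->level = 1;
	l->seed = seed;
	l->head.start = 0;
	l->head.len = 0;
	l->head.level = SL_MAX_LEVEL;
	for (int i = 0; i < SL_MAX_LEVEL; i++)
		l->head.next[i] = NULL;
}

/* Each additional level is kept with probability 1/2 (simple LCG coin). */
static int sl_random_level(struct sl_list *l)
{
	int level = 1;

	while (level < SL_MAX_LEVEL) {
		l->seed = l->seed * 1103515245u + 12345u;
		if (!((l->seed >> 16) & 1))
			break;
		level++;
	}
	return level;
}

/* Insert [start, start + len). Returns 0, or -1 on overlap or failure. */
static int sl_insert(struct sl_list *l, uint64_t start, uint64_t len)
{
	struct sl_node *update[SL_MAX_LEVEL];
	struct sl_node *cur = &l->head;
	struct sl_node *n;

	if (len == 0)
		return -1;

	/* Walk from the top level down, recording the predecessor per level. */
	for (int i = l->level - 1; i >= 0; i--) {
		while (cur->next[i] && cur->next[i]->start < start)
			cur = cur->next[i];
		update[i] = cur;
	}

	/* Reject ranges that overlap the predecessor or the successor. */
	if (cur != &l->head && cur->start + cur->len > start)
		return -1;
	if (cur->next[0] && start + len > cur->next[0]->start)
		return -1;

	n = calloc(1, sizeof(*n));
	if (!n)
		return -1;
	n->start = start;
	n->len = len;
	n->level = sl_random_level(l);

	/* Levels the list has not used yet hang directly off the head. */
	for (int i = l->level; i < n->level; i++)
		update[i] = &l->head;
	if (n->level > l->level)
		l->level = n->level;

	/* Link bottom-up: the node is reachable at level 0 first. */
	for (int i = 0; i < n->level; i++) {
		n->next[i] = update[i]->next[i];
		update[i]->next[i] = n;
	}
	return 0;
}

/* Return the extent containing offset, or NULL if there is none. */
static struct sl_node *sl_lookup(struct sl_list *l, uint64_t offset)
{
	struct sl_node *cur = &l->head;

	for (int i = l->level - 1; i >= 0; i--)
		while (cur->next[i] && cur->next[i]->start <= offset)
			cur = cur->next[i];

	if (cur != &l->head && offset < cur->start + cur->len)
		return cur;
	return NULL;
}

/* Free every node by walking the bottom level. */
static void sl_destroy(struct sl_list *l)
{
	struct sl_node *cur = l->head.next[0];

	while (cur) {
		struct sl_node *next = cur->next[0];

		free(cur);
		cur = next;
	}
	sl_init(l, l->seed);
}
```

A caller would run sl_init() once, add non-overlapping ranges with sl_insert(), and resolve a file offset to its extent with sl_lookup(). Lookups only follow forward pointers and never rebalance anything, which is one reason a skiplist is often considered easier to make lockless for readers than an rbtree.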
Re: [RFC PATCH 0/3] apply the Probabilistic Skiplist on btrfs
Hi,

I've let it run through xfstests and ended at 091, patches applied on
top of 3.2, mount options
compress-force=lzo,discard,inode_cache,space_cache,autodefrag
fresh mkfs with defaults.

[ 1081.623819] btrfs: force lzo compression
[ 1081.629166] btrfs: enabling inode map caching
[ 1081.634853] btrfs: enabling auto defrag
[ 1081.638569] btrfs: disk space caching is enabled
[ 1119.693957] [ cut here ]
[ 1119.697876] kernel BUG at fs/btrfs/file.c:530!
[ 1119.697876] invalid opcode: [#1] SMP
[ 1119.697876] CPU 1
[ 1119.697876] Modules linked in: loop btrfs aoe
[ 1119.697876]
[ 1119.697876] Pid: 25819, comm: fsx Not tainted 3.2.0-default+ #95 Intel Corporation Santa Rosa platform/Matanzas
[ 1119.697876] RIP: 0010:[a0048a18] [a0048a18] btrfs_drop_extent_cache+0x3f8/0x400 [btrfs]
[ 1119.697876] RSP: 0018:88000c47f698 EFLAGS: 00010282
[ 1119.697876] RAX: ffef RBX: 88006ff01e48 RCX: 00026fff
[ 1119.697876] RDX: 88006ed5d830 RSI: 00022000 RDI:
[ 1119.697876] RBP: 88000c47f738 R08: R09: 00022000
[ 1119.697876] R10: fffe R11: 00026fff R12: 88001ada9e48
[ 1119.697876] R13: 0001f000 R14: R15: 88000c47f708
[ 1119.697876] FS: 7f262e570700() GS:88007de0() knlGS:
[ 1119.697876] CS: 0010 DS: ES: CR0: 8005003b
[ 1119.697876] CR2: 7fc4364fc000 CR3: 79435000 CR4: 06e0
[ 1119.697876] DR0: DR1: DR2:
[ 1119.697876] DR3: DR6: 0ff0 DR7: 0400
[ 1119.697876] Process fsx (pid: 25819, threadinfo 88000c47e000, task 880063640700)
[ 1119.697876] Stack:
[ 1119.697876] 8800 81092040 88000c47f6f0 01000246
[ 1119.697876] 0001 3000 88006e5c44f0
[ 1119.697876] 88006e5c43e0
[ 1119.697876] Call Trace:
[ 1119.697876] [81092040] ? trace_hardirqs_on_caller+0x20/0x1d0
[ 1119.697876] [a003a0b0] ? csum_exist_in_range+0xa0/0xa0 [btrfs]
[ 1119.697876] [a003f296] cow_file_range+0x136/0x3e0 [btrfs]
[ 1119.697876] [810921fd] ? trace_hardirqs_on+0xd/0x10
[ 1119.697876] [a003f8a7] run_delalloc_nocow+0x367/0x820 [btrfs]
[ 1119.697876] [81357dae] ? do_raw_spin_unlock+0x5e/0xb0
[ 1119.697876] [a00400c9] run_delalloc_range+0x369/0x370 [btrfs]
[ 1119.697876] [a00582c0] __extent_writepage+0x5f0/0x750 [btrfs]
[ 1119.697876] [81349f4d] ? radix_tree_gang_lookup_tag_slot+0x8d/0xd0
[ 1119.697876] [810f30d1] ? find_get_pages_tag+0x111/0x1b0
[ 1119.697876] [a0058692] extent_write_cache_pages.clone.0+0x272/0x3f0 [btrfs]
[ 1119.697876] [81357dae] ? do_raw_spin_unlock+0x5e/0xb0
[ 1119.697876] [81131604] ? kfree+0xd4/0x180
[ 1119.697876] [81092040] ? trace_hardirqs_on_caller+0x20/0x1d0
[ 1119.697876] [a0058a56] extent_writepages+0x46/0x60 [btrfs]
[ 1119.697876] [a003b590] ? acls_after_inode_item+0xd0/0xd0 [btrfs]
[ 1119.697876] [a003ad17] btrfs_writepages+0x27/0x30 [btrfs]
[ 1120.018734] [810fdcc4] do_writepages+0x24/0x40
[ 1120.018734] [810f3cdb] __filemap_fdatawrite_range+0x5b/0x60
[ 1120.018734] [810f3d3a] filemap_write_and_wait_range+0x5a/0x80
[ 1120.018734] [a004859a] btrfs_file_aio_write+0x4da/0x560 [btrfs]
[ 1120.018734] [8113a852] do_sync_write+0xe2/0x120
[ 1120.018734] [8187d2ad] ? __mutex_unlock_slowpath+0xdd/0x180
[ 1120.018734] [8187d35e] ? mutex_unlock+0xe/0x10
[ 1120.018734] [a004703f] ? btrfs_file_llseek+0x6f/0x390 [btrfs]
[ 1120.018734] [8113b15e] vfs_write+0xce/0x190
[ 1120.018734] [8113b4a4] sys_write+0x54/0xa0
[ 1120.018734] [81887a82] system_call_fastpath+0x16/0x1b
[ 1120.018734] Code: 5e 41 5f c9 c3 0f 0b be bf 01 00 00 48 c7 c7 e6 02 09 a0 48 89 95 68 ff ff ff e8 e4 a2 00 e1 48 8b 95 68 ff ff ff e9 3c fc ff ff 0f 0b 0f 0b 0f 1f 40 00 55 48 89 e5 41 57 41 56 41 55 41 54 53
[ 1120.018734] RIP [a0048a18] btrfs_drop_extent_cache+0x3f8/0x400 [btrfs]
[ 1120.018734] RSP 88000c47f698
[ 1120.047841] ---[ end trace ca0f509767e0195d ]---

xfstests/091 output:

091 57s ... [19:47:50] [19:48:28] [failed, exit status 1] - output mismatch (see 091.out.bad)
--- 091.out 2011-11-01 10:31:12.0 +0100
+++ 091.out.bad 2012-01-05 19:48:28.0 +0100
@@ -5,3 +5,41 @@
 fsx -N 1 -o 8192 -l 50 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
 fsx -N 1 -o 32768 -l 50 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
 fsx -N 1 -o 128000 -l 50 -r PSIZE -t BSIZE -w BSIZE -Z -W
+./091: line 46: 25819 Segmentation fault $here/ltp/fsx $args $TEST_DIR/junk $seq.full 21
+fsx -N 1 -l 50 -r