Re: abort transaction and RO
On 01/15/2013 02:35 PM, Bernd Schubert wrote:

Hrmm, that bug then seems to cause another bug. After the file system went into RO, I simply umounted and mounted again, and a few seconds after that my entire system failed. Relevant logs are attached.

Further log attachment:

btrfsck /dev/vg_fuj2/test > /tmp/btrfs.log 2>&1

checking extents
ref mismatch on [12689408 262144] extent item 1, found 0
Incorrect local backref count on 12689408 root 1 owner 256 offset 0 found 0 wanted 1 back 0x1869ef0
backpointer mismatch on [12689408 262144]
owner ref check failed [12689408 262144]
ref mismatch on [12951552 262144] extent item 0, found 1
Backref 12951552 root 1 owner 256 offset 0 num_refs 0 not found in extent tree
Incorrect local backref count on 12951552 root 1 owner 256 offset 0 found 1 wanted 0 back 0x185faa0
backpointer mismatch on [12951552 262144]
ref mismatch on [13213696 262144] extent item 1, found 0
Incorrect local backref count on 13213696 root 1 owner 273 offset 0 found 0 wanted 1 back 0x1869fd0
backpointer mismatch on [13213696 262144]
owner ref check failed [13213696 262144]
ref mismatch on [13475840 262144] extent item 1, found 0
Incorrect local backref count on 13475840 root 1 owner 259 offset 0 found 0 wanted 1 back 0x186a0b0
backpointer mismatch on [13475840 262144]
owner ref check failed [13475840 262144]
ref mismatch on [13737984 262144] extent item 1, found 0
Incorrect local backref count on 13737984 root 1 owner 260 offset 0 found 0 wanted 1 back 0x186a190
backpointer mismatch on [13737984 262144]
owner ref check failed [13737984 262144]
ref mismatch on [14000128 262144] extent item 1, found 0
Incorrect local backref count on 14000128 root 1 owner 261 offset 0 found 0 wanted 1 back 0x186a270
backpointer mismatch on [14000128 262144]
owner ref check failed [14000128 262144]
ref mismatch on [14262272 262144] extent item 1, found 0
Incorrect local backref count on 14262272 root 1 owner 257 offset 0 found 0 wanted 1 back 0x186a350
backpointer mismatch on [14262272 262144]
owner ref check failed [14262272 262144]
ref mismatch on [14524416 262144] extent item 1, found 0
Incorrect local backref count on 14524416 root 1 owner 262 offset 0 found 0 wanted 1 back 0x186a430
backpointer mismatch on [14524416 262144]
owner ref check failed [14524416 262144]
ref mismatch on [14786560 262144] extent item 1, found 0
Incorrect local backref count on 14786560 root 1 owner 263 offset 0 found 0 wanted 1 back 0x186a510
backpointer mismatch on [14786560 262144]
owner ref check failed [14786560 262144]
ref mismatch on [15048704 262144] extent item 1, found 0
Incorrect local backref count on 15048704 root 1 owner 264 offset 0 found 0 wanted 1 back 0x186a5f0
backpointer mismatch on [15048704 262144]
owner ref check failed [15048704 262144]
ref mismatch on [15310848 262144] extent item 1, found 0
Incorrect local backref count on 15310848 root 1 owner 265 offset 0 found 0 wanted 1 back 0x186a6d0
backpointer mismatch on [15310848 262144]
owner ref check failed [15310848 262144]
ref mismatch on [15572992 262144] extent item 1, found 0
Incorrect local backref count on 15572992 root 1 owner 266 offset 0 found 0 wanted 1 back 0x186a7b0
backpointer mismatch on [15572992 262144]
owner ref check failed [15572992 262144]
ref mismatch on [15835136 262144] extent item 1, found 0
Incorrect local backref count on 15835136 root 1 owner 267 offset 0 found 0 wanted 1 back 0x186a890
backpointer mismatch on [15835136 262144]
owner ref check failed [15835136 262144]
ref mismatch on [16097280 262144] extent item 1, found 0
Incorrect local backref count on 16097280 root 1 owner 268 offset 0 found 0 wanted 1 back 0x186a970
backpointer mismatch on [16097280 262144]
owner ref check failed [16097280 262144]
ref mismatch on [16359424 262144] extent item 1, found 0
Incorrect local backref count on 16359424 root 1 owner 269 offset 0 found 0 wanted 1 back 0x186aa50
backpointer mismatch on [16359424 262144]
owner ref check failed [16359424 262144]
ref mismatch on [16621568 262144] extent item 1, found 0
Incorrect local backref count on 16621568 root 1 owner 270 offset 0 found 0 wanted 1 back 0x186ab30
backpointer mismatch on [16621568 262144]
owner ref check failed [16621568 262144]
ref mismatch on [16883712 262144] extent item 1, found 0
Incorrect local backref count on 16883712 root 1 owner 271 offset 0 found 0 wanted 1 back 0x186ac10
backpointer mismatch on [16883712 262144]
owner ref check failed [16883712 262144]
ref mismatch on [17145856 262144] extent item 1, found 0
Incorrect local backref count on 17145856 root 1 owner 272 offset 0 found 0 wanted 1 back 0x186acf0
backpointer mismatch on [17145856 262144]
owner ref check failed [17145856 262144]
ref mismatch on [17408000 262144] extent item 1, found 0
Incorrect local backref count on 17408000 root 1 owner 281 offset 0 found 0 wanted 1 back 0x186add0
backpointer mismatch on [17408000 262144]
owner ref check failed [17408000 262144]
ref mismatch on [17670144 262144] extent item 1, found 0
scrub question
When you start btrfs scrub and point at one subvolume, what is scrubbed? Just that subvolume or the entire volume? Gene -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: scrub question
On Tue, Jan 15, 2013 at 8:21 AM, Gene Czarcinski g...@czarc.net wrote: When you start btrfs scrub and point at one subvolume, what is scrubbed? Just that subvolume or the entire volume? The entire volume.
Re: btrfs for files >10GB => random spontaneous CRC failure.
Hi, I had a similar scenario to Tomasz's:

- Started with a single 3TB disk.
- Filled the 3TB disk with a lot of files (more than 30 of them at 10-30GB).
- Added 2x 1.5TB disks.
- btrfs balance start -dconvert=raid1 -mconvert=raid1 $MOUNT

# btrfs scrub start $MOUNT
# btrfs scrub status $MOUNT
scrub status for $ID
        scrub started at Tue Jan 15 07:10:15 2013 and finished after 24020 seconds
        total bytes scrubbed: 4.30TB with 0 errors

So at least it is no general bug in btrfs - maybe this helps you...

# uname -a
Linux n40l 3.7.2 #1 SMP Sun Jan 13 11:46:56 CET 2013 x86_64 GNU/Linux
# btrfs version
Btrfs v0.20-rc1-37-g91d9ee

Regards
Lars

On 14.01.2013 17:34, Chris Mason wrote:
On Mon, Jan 14, 2013 at 09:32:25AM -0700, Tomasz Kusmierz wrote:
On 14/01/13 15:57, Chris Mason wrote:
On Mon, Jan 14, 2013 at 08:22:36AM -0700, Tomasz Kusmierz wrote:
On 14/01/13 14:59, Chris Mason wrote:
On Mon, Jan 14, 2013 at 04:09:47AM -0700, Tomasz Kusmierz wrote:

Hi, since I had some free time over Christmas, I decided to conduct a few tests on btrfs to see how it will cope with real-life storage for normal everyday users, and I've found that the filesystem will always mess up your files that are larger than 10GB.

Hi Tom, I'd like to nail down the test case a little better:
1) Create on one drive, fill with data
2) Add a second drive, convert to raid1
3) Find corruptions?
What happens if you start with two drives in raid1? In other words, I'm trying to see if this is a problem with the conversion code. -chris

Ok, my description might be a bit enigmatic, so to cut a long story short the tests are:
1) create a single-drive default btrfs volume on a single partition - fill with test data - scrub - admire errors.
2) create a raid1 (-d raid1 -m raid1) volume with two partitions on separate disks, each the same size etc. - fill with test data - scrub - admire errors.
3) create a raid10 (-d raid10 -m raid1) volume with four partitions on separate disks, each the same size etc. - fill with test data - scrub - admire errors.
All disks are the same age + size + model ... two different batches to avoid same-time failure.

Ok, so we have two possible causes. #1 btrfs is writing garbage to your disks. #2 something in your kernel is corrupting your data. Since you're able to see this 100% of the time, let's assume that if #2 were true, we'd be able to trigger it on other filesystems. So, I've attached an old friend, stress.sh. Use it like this:

stress.sh -n 5 -c <your source directory> -s <your btrfs mount point>

It will run in a loop with 5 parallel processes and make 5 copies of your data set into the destination. It will run forever until there are errors. You can use a higher process count (-n) to force more concurrency and use more ram. It may help to pin down all but 2 or 3 GB of your memory. What I'd like you to do is find a data set and command line that make the script find errors on btrfs. Then, try the same thing on xfs or ext4 and let it run at least twice as long. Then report back ;) -chris

Chris, will do, just please remember that 2TB of test data on consumer-grade sata drives will take a while to test :) Many thanks.

You might want to start with a smaller data set, 20GB or so total. -chris

--
ADC-Ingenieurbüro Wiedemann | In der Borngasse 12 | 57520 Friedewald | Tel: 02743-930233 | Fax: 02743-930235 | www.adc-wiedemann.de
GF: Dipl.-Ing. Hendrik Wiedemann | Umsatzsteuer-ID: DE 147979431
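stress.sh itself is not reproduced in the thread, but the copy-and-verify loop it performs can be sketched roughly like this (the function name and output strings are mine, not from Chris's script):

```shell
# Copy a source tree N times into a destination on the filesystem under
# test, then compare every copy byte-for-byte against the source; any
# mismatch means something corrupted the data on its way to or from disk.
stress_copies() {
    src=$1; dest=$2; n=$3
    i=0
    while [ "$i" -lt "$n" ]; do
        cp -r "$src" "$dest/copy-$i" || return 1
        i=$((i + 1))
    done
    i=0
    while [ "$i" -lt "$n" ]; do
        if ! diff -r "$src" "$dest/copy-$i" > /dev/null; then
            echo "corruption found in copy-$i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $n copies verified"
}
```

The real script additionally runs several such copy streams in parallel (the -n switch) to add concurrency and memory pressure, which is what makes marginal hardware fail faster.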
Re: [PATCH] Btrfs: fix crash of starting balance
On Tue, Jan 15, 2013 at 10:47:57PM +0800, Liu Bo wrote:

We will crash on BUG_ON(ret == -EEXIST) when we do not resume the existing balance but attempt to start a new one. The steps can be:
1. start balance
2. pause balance
3. start balance

Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/volumes.c | 7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5cce6aa..3901654 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3100,7 +3100,12 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 		goto out;
 
 	if (!(bctl->flags & BTRFS_BALANCE_RESUME)) {
-		BUG_ON(ret == -EEXIST);
+		/*
+		 * This can happen when we do not resume the existing balance
+		 * but try to start a new one instead.
+		 */
+		if (ret == -EEXIST)
+			goto out;
 		set_balance_control(bctl);
 	} else {
 		BUG_ON(ret != -EEXIST);

OK, it seems the balance pause/resume logic got broken by the dev-replace code (5ac00addc7ac09110995fe967071d191b5981cc1), which went into v3.8-rc1. This is most certainly not the right way to fix it; that BUG_ON is there for a reason. I'll send a fix in a couple of days.

Thanks,

Ilya
Re: [PATCH] mm/slab: add a leak decoder callback
The merge processing occurs during kmem_cache_create and you are setting up the decoder field afterwards! Won't work. In the thread I suggested providing the callback at destruction: http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg21130.html I liked that it limits accessibility of the callback to the only path that uses it. - z
Re: Can moving data to a subvolume not take as long as a full copy?
On Tue, Jan 15, 2013 at 8:49 AM, Marc MERLIN m...@merlins.org wrote: On Mon, Jan 14, 2013 at 10:48:50PM -0800, David Brown wrote: Why not make a snapshot of the root volume, and then delete the files you want to move from the original root, and delete the rest of root from the snapshot? Are a snapshot of the root volume and a subvolume effectively the same thing as far as btrfs sees them? Once I have that snapshot which I'll treat as a subvolume, can I then snapshot that snapshot/subvolume further? Yes, the product of the btrfs snapshot command is a subvolume.
Re: btrfs for files >10GB => random spontaneous CRC failure.
Chris & all,

Sorry for not replying for that long, but Chris's old friend stress.sh has proven that all my storage is affected by this bug, and the first thing was to bring everything down before corruption could spread any further. Anyway, for the subject's sake: the btrfs stress failed after 2h, the ext4 stress failed after 8h (according to time ./stress.sh blablabla) - so it might be related to the fact that ext4 always seemed slower than btrfs on my machine.

Anyway, I wanted to use this opportunity to thank Chris and everybody related to btrfs development - your file system found a hidden bug in my setup that would have stayed there until it corrupted pretty much everything. I don't even want to think how much my main storage got corrupted over time (ext4 over lvm over md raid 5).

p.s. bizarre that when I fill an ext4 partition with test data everything checks out OK (crc over all files), but with Chris's tool it gets corrupted - for both the crappy Adaptec pcie controller and the motherboard's built-in one. Also, since the course of history has proven that my testing facilities are crap - any suggestions on how I can test ram, cpu and controller would be appreciated.
Re: btrfs for files >10GB => random spontaneous CRC failure.
On Tue, Jan 15, 2013 at 04:32:10PM -0700, Tom Kusmierz wrote:

Chris & all, sorry for not replying for that long, but Chris's old friend stress.sh has proven that all my storage is affected by this bug, and the first thing was to bring everything down before corruption could spread any further. Anyway, for the subject's sake: the btrfs stress failed after 2h, the ext4 stress failed after 8h (according to time ./stress.sh blablabla) - so it might be related to the fact that ext4 always seemed slower than btrfs on my machine.

Ok, great. These problems are really hard to debug, and I'm glad we've nailed it down to the lower layers.

Anyway, I wanted to use this opportunity to thank Chris and everybody related to btrfs development - your file system found a hidden bug in my setup that would have stayed there until it corrupted pretty much everything. I don't even want to think how much my main storage got corrupted over time (ext4 over lvm over md raid 5). p.s. bizarre that when I fill an ext4 partition with test data everything checks out OK (crc over all files), but with Chris's tool it gets corrupted - for both the crappy Adaptec pcie controller and the motherboard's built-in one.

One really hard part of tracking down corruption is that our boxes have so much ram right now that problems are often hidden by the page cache. My first advice is to boot with much less ram (1G/2G) or pin down all your ram for testing. A problem that triggers in 10 minutes is a billion times easier to figure out than one that triggers in 8 hours.

Also, since the course of history has proven that my testing facilities are crap - any suggestions on how I can test ram, cpu and controller would be appreciated.

Step one is to figure out if you've got a CPU/memory problem or an IO problem. memtest is often able to find CPU and memory problems, but if you pass memtest I like to use gcc for extra hard testing. If you have the ram, make a copy of the linux kernel tree in /dev/shm or any ramdisk/tmpfs mount. Then run make -j ; make clean in a loop until your box either crashes, gcc reports an internal compiler error, or 16 hours go by. Your loop will need to check for failed makes and stop once you get the first failure. Hopefully that will catch it. Otherwise we need to look at the IO stack.

-chris
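The stop-at-first-failure loop Chris describes can be sketched as a small POSIX sh helper (the helper name is mine, not from the thread); in the scenario above you would run something like `run_cmd_loop 1000 sh -c 'make -j$(nproc) && make clean'` from the kernel tree in tmpfs, with an outer time limit of 16 hours:

```shell
# Run the given command repeatedly, stopping at the first failure (the
# suspected hardware error) or after MAX successful passes.
run_cmd_loop() {
    max=$1; shift
    n=0
    while [ "$n" -lt "$max" ]; do
        if ! "$@" > /dev/null 2>&1; then
            echo "failure after $n successful passes"
            return 1
        fi
        n=$((n + 1))
    done
    echo "completed $max passes with no failure"
}
```

Checking the exit status of every make, rather than just letting the loop spin, is the important part: an internal compiler error exits non-zero, and that first failing pass is the data point you want.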
Re: [PATCH] mm/slab: add a leak decoder callback
On Tue, Jan 15, 2013 at 04:30:52PM +0000, Christoph Lameter wrote: On Mon, 14 Jan 2013, Liu Bo wrote: This adds a leak decoder callback so that kmem_cache_destroy() can use it to generate debugging output for the allocated objects.

Interesting idea.

@@ -3787,6 +3789,9 @@ static int slab_unmergeable(struct kmem_cache *s)
 	if (s->ctor)
 		return 1;
 
+	if (s->decoder)
+		return 1;
+
 	/*
 	 * We may have set a slab to be unmergeable during bootstrap.
 	 */

The merge processing occurs during kmem_cache_create and you are setting up the decoder field afterwards! Won't work.

You're right, I missed the lock part.

thanks,
liubo
Re: [PATCH] mm/slab: add a leak decoder callback
On Tue, Jan 15, 2013 at 09:01:05AM -0800, Zach Brown wrote: The merge processing occurs during kmem_cache_create and you are setting up the decoder field afterwards! Won't work. In the thread I suggested providing the callback at destruction: http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg21130.html I liked that it limits accessibility of the callback to the only path that uses it.

Well, I was trying to avoid an API change, but it seems we have to, so I'll update the patch as you suggest in the next version.

thanks,
liubo
Re: [PATCH] Btrfs: fix crash of starting balance
On Tue, Jan 15, 2013 at 06:59:04PM +0200, Ilya Dryomov wrote:

OK, it seems the balance pause/resume logic got broken by the dev-replace code (5ac00addc7ac09110995fe967071d191b5981cc1), which went into v3.8-rc1. This is most certainly not the right way to fix it; that BUG_ON is there for a reason. I'll send a fix in a couple of days.

Okay, I'm right here waiting to test it ;)

thanks,
liubo
[PATCH V2] mm/slab: add a leak decoder callback
This adds a leak decoder callback so that slab destruction can use it to generate debugging output for the allocated objects. Callers like btrfs use their own leak tracking, which manages allocated objects in a list (or something else) and so duplicates what slab already does; adding a callback for leak tracking avoids that duplication as well as the runtime overhead. (The idea is from Zach Brown z...@zabbo.net.)

Signed-off-by: Liu Bo bo.li@oracle.com
---
v2: add a wrapper API for slab destruction so the decoder only works in that particular path.

 fs/btrfs/extent_io.c     | 26 ++++++++++++++++++++++++--
 fs/btrfs/extent_map.c    | 13 ++++++++++++-
 include/linux/slab.h     |  2 ++
 include/linux/slab_def.h |  1 +
 include/linux/slub_def.h |  1 +
 mm/slab_common.c         | 17 ++++++++++++++++-
 mm/slub.c                |  2 ++
 7 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bcc8dff..355c7fc 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -63,6 +63,26 @@ tree_fs_info(struct extent_io_tree *tree)
 	return btrfs_sb(tree->mapping->host->i_sb);
 }
 
+static void extent_state_leak_decoder(void *object)
+{
+	struct extent_state *state = object;
+
+	printk(KERN_ERR "btrfs state leak: start %llu end %llu "
+	       "state %lu in tree %p refs %d\n",
+	       (unsigned long long)state->start,
+	       (unsigned long long)state->end,
+	       state->state, state->tree, atomic_read(&state->refs));
+}
+
+static void extent_buffer_leak_decoder(void *object)
+{
+	struct extent_buffer *eb = object;
+
+	printk(KERN_ERR "btrfs buffer leak start %llu len %lu "
+	       "refs %d\n", (unsigned long long)eb->start,
+	       eb->len, atomic_read(&eb->refs));
+}
+
 int __init extent_io_init(void)
 {
 	extent_state_cache = kmem_cache_create("btrfs_extent_state",
@@ -115,9 +135,11 @@ void extent_io_exit(void)
 	 */
 	rcu_barrier();
 	if (extent_state_cache)
-		kmem_cache_destroy(extent_state_cache);
+		kmem_cache_destroy_decoder(extent_state_cache,
+					   extent_state_leak_decoder);
 	if (extent_buffer_cache)
-		kmem_cache_destroy(extent_buffer_cache);
+		kmem_cache_destroy_decoder(extent_buffer_cache,
+					   extent_buffer_leak_decoder);
 }
 
 void extent_io_tree_init(struct extent_io_tree *tree,
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index f359e4c..bccba3d 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -16,6 +16,16 @@
 static LIST_HEAD(emaps);
 static DEFINE_SPINLOCK(map_leak_lock);
 #endif
 
+static void extent_map_leak_decoder(void *object)
+{
+	struct extent_map *em = object;
+
+	printk(KERN_ERR "btrfs ext map leak: start %llu len %llu block %llu "
+	       "flags %lu refs %d in tree %d compress %d\n",
+	       em->start, em->len, em->block_start, em->flags,
+	       atomic_read(&em->refs), em->in_tree, (int)em->compress_type);
+}
+
 int __init extent_map_init(void)
 {
 	extent_map_cache = kmem_cache_create("btrfs_extent_map",
@@ -39,7 +49,8 @@ void extent_map_exit(void)
 	}
 
 	if (extent_map_cache)
-		kmem_cache_destroy(extent_map_cache);
+		kmem_cache_destroy_decoder(extent_map_cache,
+					   extent_map_leak_decoder);
 }
 
 /**
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 5d168d7..5c6a8d8 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -114,6 +114,7 @@ struct kmem_cache {
 	const char *name;	/* Slab name for sysfs */
 	int refcount;		/* Use counter */
 	void (*ctor)(void *);	/* Called on object slot creation */
+	void (*decoder)(void *);/* Called on object slot leak detection */
 	struct list_head list;	/* List of all slab caches on the system */
 };
 #endif
@@ -132,6 +133,7 @@ struct kmem_cache *
 kmem_cache_create_memcg(struct mem_cgroup *, const char *, size_t, size_t,
 			unsigned long, void (*)(void *), struct kmem_cache *);
 void kmem_cache_destroy(struct kmem_cache *);
+void kmem_cache_destroy_decoder(struct kmem_cache *, void (*)(void *));
 int kmem_cache_shrink(struct kmem_cache *);
 void kmem_cache_free(struct kmem_cache *, void *);
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 8bb6e0e..7ca8309 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -48,6 +48,7 @@ struct kmem_cache {
 	/* constructor func */
 	void (*ctor)(void *obj);
+	void (*decoder)(void *obj);
 
 	/* 4) cache creation/removal */
 	const char *name;
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 9db4825..fc18af7 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -93,6 +93,7 @@ struct kmem_cache {
Re: [PATCH V2] mm/slab: add a leak decoder callback
On Wed, 16 Jan 2013 11:03:13 +0800, Liu Bo wrote:

This adds a leak decoder callback so that slab destruction can use it to generate debugging output for the allocated objects. Callers like btrfs use their own leak tracking, which manages allocated objects in a list (or something else) and so duplicates what slab already does; adding a callback for leak tracking avoids that duplication as well as the runtime overhead.

If the slab cache is merged with another one, does this patch still work?

Thanks
Miao