Re: space cache generation (...) does not match inode (...)
On Mon, Aug 8, 2011 at 8:14 AM, Josef Bacik jo...@redhat.com wrote:
> On 08/06/2011 10:16 PM, Andrew Lutomirski wrote:
>> I've always gotten space cache generation warnings, but some time after 3.0 they started going nuts. I get:
>>
>> space cache generation (14667727114112179905) does not match inode (154185)
>>
>> and other similar messages (with a huge number and a smaller number) at rates higher than one message per ms. They don't happen constantly, but they come in bursts big enough to fill my log buffer.
>
> Yeah sorry that's going to happen when you first switch to 3.0. We switched the space cache stuff over to using the normal checksumming code so all old space cache is going to look invalid. This is nothing to worry about, it will just end up discarded and re-generated. Thanks,
>
> Josef

Can you put in a rate limit and make the message less alarming? There's enough log spam from it that I can't see anything else in my log.

--Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
space cache generation (...) does not match inode (...)
I've always gotten space cache generation warnings, but some time after 3.0 they started going nuts. I get:

space cache generation (14667727114112179905) does not match inode (154185)

and other similar messages (with a huge number and a smaller number) at rates higher than one message per ms. They don't happen constantly, but they come in bursts big enough to fill my log buffer.

--Andy
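For reference, the kind of rate limit asked for in this thread can be sketched in user space as a simple burst-per-interval limiter. This is an illustration only: `struct ratelimit`, `ratelimit_allow` and the field names are hypothetical names chosen for this sketch, not the btrfs code and not the kernel's own helper (the kernel provides `printk_ratelimited()` for this purpose).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space limiter: allow at most `burst` messages
 * per `interval_ms` window and count the rest as suppressed. */
struct ratelimit {
	uint64_t interval_ms;  /* window length in milliseconds */
	unsigned burst;        /* messages allowed per window */
	uint64_t window_start; /* start of the current window, in ms */
	unsigned printed;      /* messages let through in this window */
	unsigned suppressed;   /* messages dropped in this window */
};

/* Returns true if a message arriving at time now_ms should be logged. */
static bool ratelimit_allow(struct ratelimit *rl, uint64_t now_ms)
{
	assert(rl != NULL);

	if (now_ms - rl->window_start >= rl->interval_ms) {
		/* A new window has started: reset the counters. */
		rl->window_start = now_ms;
		rl->printed = 0;
		rl->suppressed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->suppressed++;
	return false;
}
```

With a window of 5000 ms and a burst of 10, a burst of one message per ms lets the first 10 through and drops the rest until the next window opens, which is enough to keep the log buffer readable.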
Re: kernel BUG at fs/btrfs/inode.c:4676!
On Fri, Jun 10, 2011 at 2:43 PM, Marek Otahal markota...@gmail.com wrote:
> The test-case is quite easy,
> 1. mount the FS, just with compress-force=lzo option // I didn't try without, but on my other btrfs partition that doesn't use compression the err never happened
> ...so, can the others who experience the bug confirm compress=lzo used?

Yes, I use compress=lzo.

--Andy
How to remove a device on a RAID-1 before replacing it?
I have a disk with a SMART failure. It still works but I assume it'll fail sooner or later. I want to remove it from my btrfs volume, replace it, and add the new one. But the obvious command doesn't work:

# btrfs device delete /dev/dm-5 /mnt/foo
ERROR: error removing the device '/dev/dm-5'

dmesg says:

btrfs: unable to go below two devices on raid1

With mdadm, I would fail the device, remove it, run degraded until I get a new device, and hot-add that device. With btrfs, I'd like some confirmation from the fs that data is balanced appropriately so I won't get data loss if I just yank the drive. And I don't even know how to tell btrfs to release the drive so I can safely remove it. (Mounting with -o degraded doesn't help. I could umount, remove the disk, then remount, but that feels like a hack.)

This is 2.6.38.1 running Fedora 14's version of btrfs-progs, but btrfs-progs-unstable git does the same thing, as does btrfs-vol -r.

--Andy
Re: How to remove a device on a RAID-1 before replacing it?
On Tue, Mar 29, 2011 at 4:21 PM, cwillu cwi...@cwillu.com wrote:
> On Tue, Mar 29, 2011 at 2:09 PM, Andrew Lutomirski l...@mit.edu wrote:
>> I have a disk with a SMART failure. It still works but I assume it'll fail sooner or later. I want to remove it from my btrfs volume, replace it, and add the new one. But the obvious command doesn't work:
>>
>> # btrfs device delete /dev/dm-5 /mnt/foo
>> ERROR: error removing the device '/dev/dm-5'
>>
>> dmesg says:
>>
>> btrfs: unable to go below two devices on raid1
>>
>> With mdadm, I would fail the device, remove it, run degraded until I get a new device, and hot-add that device. With btrfs, I'd like some confirmation from the fs that data is balanced appropriately so I won't get data loss if I just yank the drive. And I don't even know how to tell btrfs to release the drive so I can safely remove it. (Mounting with -o degraded doesn't help. I could umount, remove the disk, then remount, but that feels like a hack.)
>
> There's no nice way to remove a failing disk in btrfs right now (btrfs dev delete is more of an online management thing to politely remove a perfectly functional disk you'd like to use for something else.) As I understand things, the only way to do it right now is the umount, remove disk, remount w/ degraded, and then btrfs add the new device.

Well, the disk *is* perfectly functional. It just won't be for long.

I guess what I'm saying is that btrfs dev delete isn't really working: I want to be able to convert to non-RAID and back, or to degraded and back, or something else equivalent.

--Andy
Re: 2.6.37: Multi-second I/O latency while untarring
On Mon, Feb 14, 2011 at 10:22 AM, Chris Mason chris.ma...@oracle.com wrote:
> Excerpts from Andrew Lutomirski's message of 2011-02-11 19:35:02 -0500:
>> On Fri, Feb 11, 2011 at 10:44 AM, Chris Mason chris.ma...@oracle.com wrote:
>>> We can tell more if you post the full traces from latencytop. I have a patch here for latencytop that adds a -c mode, which dumps the traces out to a text files.
>>>
>>> http://oss.oracle.com/~mason/latencytop.patch
>>>
>>> Based on what you have here, I think it's probably a latency problem between btrfs and the dm-crypt stuff. How easily can setup a test partition without dm-crypt?
>>
>> Done, on the same physical disk as before. The latency is just as bad. On this test, I wrote a total of 3.1G, which is under half of my RAM. That should rule out lots of VM issues. latencytop trace below.
>
> Just to confirm, you say on a physical disk you mean without dm-crypt?

Sorry for the exceedingly slow reply.

This problem is really bad with 2.6.38.1. To make it a little easier to demonstrate, I wrote a tool that shows off the problem. I made a test btrfs partition on a plain disk partition (same disk as my dm-crypt but an unencrypted partition). Now clone a kernel tree there and run make -j8. Wait until the disk starts to write data out in earnest (takes awhile to dirty enough pages). Watch crap like this happen (with nr_requests = 2048, scheduler = deadline).

io_latency_watch read <1M file on test partition>

read took 0.000 seconds (worst = 0.963s)
read took 0.000 seconds (worst = 0.963s)
read took 0.022 seconds (worst = 0.963s)
read took 0.000 seconds (worst = 0.963s)
read took 0.028 seconds (worst = 0.963s)
read took 1.430 seconds (worst = 1.430s)
read took 0.270 seconds (worst = 1.430s)
read took 1.237 seconds (worst = 1.430s)
read took 0.282 seconds (worst = 1.430s)
read took 0.131 seconds (worst = 1.430s)

io_latency_watch read <1M file on other partition on same disk> is similar, and io_latency_test write <dir on other partition> is even worse. The cfq scheduler is similar.
--Andy

/* io_latency_test.c
 * Copyright (c) 2011 Andy Lutomirski
 * Licensed under GPLv2.
 *
 * Compile with gcc -O2 -std=gnu99 -lrt
 */

#define _FILE_OFFSET_BITS 64
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdbool.h>
#include <time.h>
#include <stdint.h>
#include <string.h>
#include <signal.h>
#include <inttypes.h>
#include <fcntl.h>

/* Temporary file created by do_write(); removed on SIGINT. */
volatile const char *file_to_unlink;

void handler(int x)
{
	if (file_to_unlink)
		unlink((char*)file_to_unlink);
	_exit(0);
}

/* Time random 4k O_DIRECT reads from an existing file. */
void do_read(const char *name)
{
	int fd = open(name, O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		exit(1);
	}

	uint64_t worst = 0;
	off_t size = lseek(fd, 0, SEEK_END);
	if (size == (off_t)-1) {
		perror("lseek");
		abort();
	}
	size -= (size % 4096);
	if (size < 4096) {
		printf("File is smaller than 4k\n");
		exit(1);
	}
	printf("File size is %" PRIu64 " bytes -- bigger is better\n", (uint64_t)size);

	while(true) {
		uint64_t pos = 4096 * (random() % (size / 4096));
		struct timespec start;
		clock_gettime(CLOCK_MONOTONIC, &start);
		/* O_DIRECT reads generally need an aligned buffer. */
		static unsigned char x[4096] __attribute__((aligned(4096)));
		if (pread(fd, x, 4096, pos) != 4096) {
			perror("pread");
			abort();
		}
		struct timespec end;
		clock_gettime(CLOCK_MONOTONIC, &end);
		/* Elapsed time in nanoseconds (1e9 ns per second). */
		uint64_t ns = (end.tv_nsec - start.tv_nsec) + 1000000000ULL * (end.tv_sec - start.tv_sec);
		if (ns > worst)
			worst = ns;
		printf("read took %.3f seconds (worst = %.3fs)\n", 1e-9 * ns, 1e-9 * worst);
		if (posix_fadvise(fd, 0, size, POSIX_FADV_DONTNEED) != 0)
			perror("posix_fadvise");
		usleep(100);
	}
}

/* Time one-byte writes followed by fdatasync() on a temporary file in dir. */
void do_write(const char *dir)
{
	char *name;
	/* mkstemp() requires the template to end in six X characters. */
	if (asprintf(&name, "%s/tmp.XXXXXX", dir) == -1)
		abort();
	int fd = mkstemp(name);
	if (fd == -1) {
		perror("mkstemp");
		abort();
	}
	file_to_unlink = name;

	uint64_t worst = 0;
	unsigned char x = 0;
	while(true) {
		x++;
		struct timespec start;
		clock_gettime(CLOCK_MONOTONIC, &start);
		if (pwrite(fd, &x, 1, 0) != 1) {
			perror("pwrite");
			abort();
		}
		if (fdatasync(fd) != 0) {
			perror("fdatasync");
			abort();
		}
		struct timespec end;
		clock_gettime(CLOCK_MONOTONIC, &end);
		uint64_t ns = (end.tv_nsec - start.tv_nsec) + 1000000000ULL * (end.tv_sec - start.tv_sec);
		if (ns > worst)
			worst = ns;
		printf("write + fsync took %.3f seconds (worst = %.3fs)\n", 1e-9 * ns, 1e-9 * worst);
		usleep(100);
	}
}

int main(int argc, char **argv)
{
	if (argc != 3) {
		printf("Usage: %s write <dir> or %s read <file>\n", argv[0], argv[0]);
		return 1;
	}

	bool write;
	if (!strcmp(argv[1], "write")) {
		write = true;
	} else if (!strcmp(argv[1], "read")) {
		write = false;
	} else {
		printf("Bad mode\n");
		return 1;
	}

	struct sigaction sa;
	sa.sa_handler = handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	if (sigaction(SIGINT, &sa, 0) != 0) {
		perror("sigaction");
		exit(1);
	}

	/* The archived copy is cut off above; the dispatch below is an
	 * assumed completion based on the functions defined earlier. */
	if (write)
		do_write(argv[2]);
	else
		do_read(argv[2]);
	return 0;
}
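The latency figure the tool prints comes from subtracting two CLOCK_MONOTONIC timestamps and scaling the seconds term by 1e9. A minimal sketch of that arithmetic, pulled out into a helper; `elapsed_ns` is a name made up for this illustration and is not part of the posted program:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: nanoseconds elapsed from *start to *end.
 * Assumes *end is not earlier than *start, which holds for two
 * successive CLOCK_MONOTONIC readings. */
static uint64_t elapsed_ns(const struct timespec *start,
			   const struct timespec *end)
{
	assert(start != NULL && end != NULL);

	/* Work in signed 64-bit so that a smaller tv_nsec in *end
	 * borrows correctly from the seconds term. */
	int64_t sec = (int64_t)end->tv_sec - (int64_t)start->tv_sec;
	int64_t nsec = (int64_t)end->tv_nsec - (int64_t)start->tv_nsec;

	return (uint64_t)(sec * 1000000000LL + nsec);
}
```

For example, a read that starts at 1.9 s and ends at 3.1 s gives 2 * 1e9 - 800000000 = 1200000000 ns, which the tool would print as 1.200 seconds after multiplying by 1e-9.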
2.6.37: bash is looping unkillably in btrfs
I have two processes that are unkillable and taking about 50% of a CPU each. There is no actual I/O happening (disk light is off and the disk even spun down after awhile).

This may or may not be related to unmounting a filesystem. (I'm not sure -- I have two btrfs filesystems and I unmounted one before I noticed the problem).

bash is in:

[81423344] schedule_timeout+0x36/0xe3
[810348df] ? ttwu_post_activation+0x60/0xf9
[81030ee2] ? need_resched+0x23/0x2d
[8142312c] wait_for_common+0xa8/0xf8
[8103af01] ? default_wake_function+0x0/0x14
[81047c5f] ? local_bh_enable_ip+0xe/0x10
[81423234] wait_for_completion+0x1d/0x1f
[81114fa5] writeback_inodes_sb_nr+0x76/0x7d
[81115378] writeback_inodes_sb_nr_if_idle+0x41/0x56
[a01562b8] shrink_delalloc.clone.43+0xa4/0x13c [btrfs]
[814245c6] ? _raw_spin_lock+0xe/0x10
[a0159aae] btrfs_delalloc_reserve_metadata+0x12e/0x140 [btrfs]
[a0159b94] btrfs_delalloc_reserve_space+0x2a/0x47 [btrfs]
[810b5db0] ? unlock_page+0x2a/0x2f
[a0172885] btrfs_file_aio_write+0x503/0x8b3 [btrfs]
[811c302e] ? security_dentry_open+0x2f/0x33
[8110dcef] ? mnt_want_write+0x2e/0x4a
[810f78fe] do_sync_write+0xcb/0x108
[81030efa] ? should_resched+0xe/0x2e
[811ca3a2] ? selinux_file_permission+0x5a/0xb9
[811c2f0a] ? security_file_permission+0x2e/0x33
[810f7fed] vfs_write+0xac/0xff
[810f81f4] sys_write+0x4a/0x6e
[81002beb] system_call_fastpath+0x16/0x1b

flush-btrfs-6 is in:

[81115f50] bdi_writeback_thread+0x151/0x20b
[81115dff] ? bdi_writeback_thread+0x0/0x20b
[81115dff] ? bdi_writeback_thread+0x0/0x20b
[8105c550] kthread+0x82/0x8a
[81003994] kernel_thread_helper+0x4/0x10
[8105c4ce] ? kthread+0x0/0x8a
[81003990] ? kernel_thread_helper+0x0/0x10

perf top says:

870.00 23.2% clockevents_program_event     /lib/modules/2.6.37+/build/vmlinux
541.00 14.4% update_ts_time_stats          /lib/modules/2.6.37+/build/vmlinux
465.00 12.4% sched_clock                   /lib/modules/2.6.37+/build/vmlinux
133.00  3.5% get_next_timer_interrupt      /lib/modules/2.6.37+/build/vmlinux
116.00  3.1% schedule                      /lib/modules/2.6.37+/build/vmlinux
 57.00  1.5% pick_next_task_fair           /lib/modules/2.6.37+/build/vmlinux
 55.00  1.5% select_task_rq_fair           /lib/modules/2.6.37+/build/vmlinux
 50.00  1.3% __switch_to                   /lib/modules/2.6.37+/build/vmlinux
 46.00  1.2% enqueue_hrtimer               /lib/modules/2.6.37+/build/vmlinux
 45.00  1.2% ktime_get                     /lib/modules/2.6.37+/build/vmlinux
 44.00  1.2% read_hpet                     /lib/modules/2.6.37+/build/vmlinux
 37.00  1.0% try_to_wake_up                /lib/modules/2.6.37+/build/vmlinux
 36.00  1.0% task_rq_lock                  /lib/modules/2.6.37+/build/vmlinux
 36.00  1.0% update_rq_clock               /lib/modules/2.6.37+/build/vmlinux
 34.00  0.9% sched_clock_cpu               /lib/modules/2.6.37+/build/vmlinux
 33.00  0.9% tick_nohz_restart_sched_tick  /lib/modules/2.6.37+/build/vmlinux
 32.00  0.9% resched_task                  /lib/modules/2.6.37+/build/vmlinux
 31.00  0.8% enqueue_entity                /lib/modules/2.6.37+/build/vmlinux
 31.00  0.8% _raw_spin_unlock_irqrestore   /lib/modules/2.6.37+/build/vmlinux
 31.00  0.8% reschedule_interrupt          /lib/modules/2.6.37+/build/vmlinux
 30.00  0.8% c1e_idle                      /lib/modules/2.6.37+/build/vmlinux
 30.00  0.8% _raw_spin_lock                /lib/modules/2.6.37+/build/vmlinux
 29.00  0.8% tick_nohz_stop_sched_tick     /lib/modules/2.6.37+/build/vmlinux
 27.00  0.7% place_entity                  /lib/modules/2.6.37+/build/vmlinux
 25.00  0.7% wb_do_writeback               /lib/modules/2.6.37+/build/vmlinux
 25.00  0.7% sched_clock_local             /lib/modules/2.6.37+/build/vmlinux
 24.00  0.6% do_raw_spin_lock              /lib/modules/2.6.37+/build/vmlinux
 22.00  0.6% local_bh_disable              /lib/modules/2.6.37+/build/vmlinux
 22.00  0.6% rb_insert_color               /lib/modules/2.6.37+/build/vmlinux
 22.00  0.6% _raw_spin_lock_bh             /lib/modules/2.6.37+/build/vmlinux
 21.00  0.6% __hrtimer_start_range_ns      /lib/modules/2.6.37+/build/vmlinux
 21.00  0.6% rb_erase                      /lib/modules/2.6.37+/build/vmlinux
 20.00  0.5% hrtick_update                 /lib/modules/2.6.37+/build/vmlinux
 19.00  0.5% dec128                        /lib/modules/2.6.37+/kernel/arch/x86/crypto/aes-x86_64.ko
 19.00  0.5% select_nohz_load_balancer     /lib/modules/2.6.37+/build/vmlinux
 18.00  0.5% select_idle_sibling           /lib/modules/2.6.37+/build/vmlinux
 17.00  0.5% update_curr                   /lib/modules/2.6.37+/build/vmlinux

powertop shows 47k IPIs per second (rescheduling interrupts).

perf record -p 14726
2.6.37: Multi-second I/O latency while untarring
As I type this, I have an ssh process running that's dumping data into a fifo at high speed (maybe 500Mbps) and a tar process that's untarring from the same fifo onto btrfs. The btrfs fs is mounted -o space_cache,compress. This machine has 8GB ram, 8 logical cores, and a fast (i7-2600) CPU, so it's not an issue with the machine struggling under load.

Every few tens of seconds, my system stalls for several seconds. These stalls cause keyboard input to be lost, firefox to hang, etc.

Setting tar's ionice priority to best effort / 7 or to idle makes no difference.

ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes no difference.

max_sectors_kb = 64 in addition to the above doesn't help either.

latencytop shows regular instances of 2-7 *second* latency, variously in sync_page, start_transaction, btrfs_start_ordered_extent, and do_get_write_access (from jbd2 on my ext4 root partition).

echo 3 > drop_caches gave me 7 GB free RAM. I still had stalls when 4-5 GB were still free (so it shouldn't be a problem with important pages being evicted).

In case it matters, all of my partitions are on LVM on dm-crypt, but this machine has AES-NI so the overhead from that should be minimal. In fact, overall CPU usage is only about 10%.

What gives? I thought this stuff was supposed to be better on modern kernels.

--Andy
Re: 2.6.37: Multi-second I/O latency while untarring
On Fri, Feb 11, 2011 at 10:44 AM, Chris Mason chris.ma...@oracle.com wrote:
> Excerpts from Andrew Lutomirski's message of 2011-02-11 10:08:52 -0500:
>> As I type this, I have an ssh process running that's dumping data into a fifo at high speed (maybe 500Mbps) and a tar process that's untarring from the same fifo onto btrfs. The btrfs fs is mounted -o space_cache,compress. This machine has 8GB ram, 8 logical cores, and a fast (i7-2600) CPU, so it's not an issue with the machine struggling under load.
>>
>> Every few tens of seconds, my system stalls for several seconds. These stalls cause keyboard input to be lost, firefox to hang, etc.
>>
>> Setting tar's ionice priority to best effort / 7 or to idle makes no difference.
>>
>> ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes no difference.
>>
>> max_sectors_kb = 64 in addition to the above doesn't help either.
>>
>> latencytop shows regular instances of 2-7 *second* latency, variously in sync_page, start_transaction, btrfs_start_ordered_extent, and do_get_write_access (from jbd2 on my ext4 root partition).
>>
>> echo 3 > drop_caches gave me 7 GB free RAM. I still had stalls when 4-5 GB were still free (so it shouldn't be a problem with important pages being evicted).
>>
>> In case it matters, all of my partitions are on LVM on dm-crypt, but this machine has AES-NI so the overhead from that should be minimal. In fact, overall CPU usage is only about 10%.
>>
>> What gives? I thought this stuff was supposed to be better on modern kernels.
>
> We can tell more if you post the full traces from latencytop. I have a patch here for latencytop that adds a -c mode, which dumps the traces out to a text files.
>
> http://oss.oracle.com/~mason/latencytop.patch

Big dump at end of email from latencytop git + your patch.

> Based on what you have here, I think it's probably a latency problem between btrfs and the dm-crypt stuff. How easily can setup a test partition without dm-crypt?

Not so easily on that disk. I left some space inside the LVM to play with but none outside. I'll try hooking up another disk over eSATA l (on a Cougar Point 3Gbps controller, so it might blow up).

And here's the dump:

=== Fri Feb 11 14:44:07 2011
Globals:
Cause                          Maximum       Percentage
synchronous write              4249.1 msec   35.5 %
Writing to a pipe              4248.5 msec   35.5 %
Writing a page to disk          105.9 msec    2.1 %
Page fault                       23.7 msec    0.2 %
Reading from a pipe               4.7 msec   19.8 %
Waiting for event (select)        4.6 msec    6.4 %
Waiting for event (poll)          1.3 msec    0.0 %
Executing raw SCSI command        1.3 msec    0.2 %
opening cdrom device              1.3 msec    0.3 %

Process details:

Process ksoftirqd/1 (10) Total: 50.0 msec
    [run_ksoftirqd]    4.8 msec    100.0 %
        run_ksoftirqd kthread kernel_thread_helper
Process ksoftirqd/2 (15) Total: 8.7 msec
    [run_ksoftirqd]    4.9 msec    100.0 %
        run_ksoftirqd kthread kernel_thread_helper
Process ksoftirqd/3 (19) Total: 2.9 msec
    [run_ksoftirqd]    2.9 msec    100.0 %
        run_ksoftirqd kthread kernel_thread_helper
Process ksoftirqd/5 (27) Total: 80.6 msec
    [run_ksoftirqd]    5.0 msec    100.0 %
        run_ksoftirqd kthread kernel_thread_helper
Process scsi_eh_1 (62) Total: 45.0 msec
    Executing internal ATA command    0.7 msec    62.3 %
        ata_exec_internal_sg ata_exec_internal atapi_eh_request_sense ata_eh_link_autopsy ata_eh_autopsy sata_pmp_error_handler ahci_error_handler ata_scsi_error scsi_error_handler kthread kernel_thread_helper
    SCSI error handler    0.6 msec    37.7 %
        scsi_error_handler kthread kernel_thread_helper
Process kworker/7:1 (76) Total: 8.7 msec
    .    3.9 msec    100.0 %
        worker_thread kthread kernel_thread_helper
Process kworker/4:1 (139) Total: 124.0 msec
    .    4.9 msec    100.0 %
        worker_thread kthread kernel_thread_helper
Process kworker/6:1 (140) Total: 11.7 msec
    .    3.8 msec    100.0 %
        worker_thread kthread kernel_thread_helper
Process kworker/5:1 (141) Total: 12.5 msec
    .    4.9 msec    100.0 %
        worker_thread kthread kernel_thread_helper
Process kworker/2:1 (142) Total: 26.1 msec
    .    4.9 msec    100.0 %
        worker_thread kthread kernel_thread_helper
Process kworker/1:1 (143) Total: 47.1 msec
    .    4.9 msec    100.0 %
        worker_thread kthread kernel_thread_helper
Process kworker/3:1 (150) Total: 4.6 msec
    .    3.1 msec    100.0 %
        worker_thread kthread kernel_thread_helper
Process jbd2/dm-1-8 (376) Total: 66.7 msec
    Writing buffer to disk (synchronous)    66.7 msec    100.0 %
Re: 2.6.37: Multi-second I/O latency while untarring
On Fri, Feb 11, 2011 at 10:44 AM, Chris Mason chris.ma...@oracle.com wrote:
> Excerpts from Andrew Lutomirski's message of 2011-02-11 10:08:52 -0500:
>> As I type this, I have an ssh process running that's dumping data into a fifo at high speed (maybe 500Mbps) and a tar process that's untarring from the same fifo onto btrfs. The btrfs fs is mounted -o space_cache,compress. This machine has 8GB ram, 8 logical cores, and a fast (i7-2600) CPU, so it's not an issue with the machine struggling under load.
>>
>> Every few tens of seconds, my system stalls for several seconds. These stalls cause keyboard input to be lost, firefox to hang, etc.
>>
>> Setting tar's ionice priority to best effort / 7 or to idle makes no difference.
>>
>> ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes no difference.
>>
>> max_sectors_kb = 64 in addition to the above doesn't help either.
>>
>> latencytop shows regular instances of 2-7 *second* latency, variously in sync_page, start_transaction, btrfs_start_ordered_extent, and do_get_write_access (from jbd2 on my ext4 root partition).
>>
>> echo 3 > drop_caches gave me 7 GB free RAM. I still had stalls when 4-5 GB were still free (so it shouldn't be a problem with important pages being evicted).
>>
>> In case it matters, all of my partitions are on LVM on dm-crypt, but this machine has AES-NI so the overhead from that should be minimal. In fact, overall CPU usage is only about 10%.
>>
>> What gives? I thought this stuff was supposed to be better on modern kernels.
>
> We can tell more if you post the full traces from latencytop. I have a patch here for latencytop that adds a -c mode, which dumps the traces out to a text files.
>
> http://oss.oracle.com/~mason/latencytop.patch
>
> Based on what you have here, I think it's probably a latency problem between btrfs and the dm-crypt stuff. How easily can setup a test partition without dm-crypt?

Done, on the same physical disk as before. The latency is just as bad. On this test, I wrote a total of 3.1G, which is under half of my RAM. That should rule out lots of VM issues. latencytop trace below.

The impression I get (from watching the disk activity light) is that the disk is mostly idle but every now and then writes out a ton of data. While it's writing, the system often becomes unusable.

P.S. How bad is this? I got it on both disks.

btrfs: free space inode generation (0) did not match free space cache generation (11070) for block group 1103101952

=== Fri Feb 11 19:30:57 2011
Globals:
Cause                                       Maximum       Percentage
Writing a page to disk                      2009.0 msec   19.7 %
fsync() on a file (type 'F' for details)     612.2 msec    5.0 %
synchronous write                            573.6 msec    1.8 %
Page fault                                    57.3 msec    0.7 %
Writing buffer to disk (synchronous)          45.2 msec    0.1 %
Unlinking file                                12.6 msec    0.0 %
Waiting for event (select)                     5.0 msec   22.3 %
Reading from a pipe                            5.0 msec   29.9 %
Waiting for event (poll)                       5.0 msec   17.8 %

Process details:

Process kthreadd (2) Total: 1.9 msec
    kthreadd kernel thread    1.9 msec    100.0 %
        kthreadd kernel_thread_helper
Process ksoftirqd/0 (3) Total: 18.5 msec
    [run_ksoftirqd]    4.0 msec    100.0 %
        run_ksoftirqd kthread kernel_thread_helper
Process ksoftirqd/1 (10) Total: 19.6 msec
    [run_ksoftirqd]    4.9 msec    100.0 %
        run_ksoftirqd kthread kernel_thread_helper
Process kworker/0:1 (11) Total: 556.3 msec
    .    5.0 msec    100.0 %
        worker_thread kthread kernel_thread_helper
Process ksoftirqd/2 (15) Total: 8.1 msec
    [run_ksoftirqd]    2.9 msec    100.0 %
        run_ksoftirqd kthread kernel_thread_helper
Process ksoftirqd/4 (23) Total: 11.2 msec
    [run_ksoftirqd]    4.3 msec    100.0 %
        run_ksoftirqd kthread kernel_thread_helper
Process scsi_eh_1 (62) Total: 38.8 msec
    SCSI error handler    0.9 msec    39.9 %
        scsi_error_handler kthread kernel_thread_helper
    Executing internal ATA command    0.7 msec    60.1 %
        ata_exec_internal_sg ata_exec_internal atapi_eh_request_sense ata_eh_link_autopsy ata_eh_autopsy sata_pmp_error_handler ahci_error_handler ata_scsi_error scsi_error_handler kthread kernel_thread_helper
Process kworker/u:4 (69) Total: 616.5 msec
    Creating block layer request    54.9 msec    77.8 %
        get_request_wait __make_request generic_make_request kcryptd_crypt_write_io_submit kcryptd_crypt process_one_work worker_thread kthread kernel_thread_helper
    .    5.0 msec    22.2 %
        worker_thread kthread kernel_thread_helper
Process kworker/u:5 (70) Total: 1712.3 msec
    Creating block layer request    492.8 msec    94.3 %
[2.6.33 regression] btrfs mount causes memory corruption
Mounting btrfs corrupts memory and causes nasty crashes within a few seconds. This seems to happen even if the mount fails (note the unrecognized mount option).

This is a regression from 2.6.32, and I've attached an example.

--Andy

Btrfs loaded
device fsid cf4a8e080605f191-af91bbbf445c98b8 devid 2 transid 68136 /dev/dm-2
device fsid cf4a8e080605f191-af91bbbf445c98b8 devid 1 transid 68136 /dev/dm-1
device fsid cf4a8e080605f191-af91bbbf445c98b8 devid 2 transid 68136 /dev/mapper/big_2
device fsid cf4a8e080605f191-af91bbbf445c98b8 devid 1 transid 68136 /dev/mapper/big_1
device fsid cf4a8e080605f191-af91bbbf445c98b8 devid 1 transid 68136 /dev/mapper/big_1
btrfs: unrecognized mount option 'acl'
btrfs: open_ctree failed
[ cut here ]
kernel BUG at mm/slub.c:2969!
invalid opcode: [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 6
Pid: 2692, comm: bash Tainted: GW 2.6.33 #2 P6T WS PRO/System Product Name
RIP: 0010:[810fbbde] [810fbbde] kfree+0x62/0xd5
RSP: 0018:88019db87c68 EFLAGS: 00010246
RAX: 0048 RBX: 88019db87d18 RCX: 8801b175de20
RDX: ea00 RSI: ea000380 RDI: 8801
RBP: 88019db87c88 R08: 81a57aa0 R09: 8801b551c240
R10: 0002412fde13 R11: R12: 8801
R13: 811d9532 R14: 0010 R15: 88019db87ce8
FS: 7fde0bce7700() GS:8800282c() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f041b1b4600 CR3: 0001b776a000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process bash (pid: 2692, threadinfo 88019db86000, task 88019d928000)
Stack:
 8801b551c240 88019db87d18 88019b65f164
 0 88019db87ca8 811d9532 88019db87ce8 8801b4b8f548
 0 88019db87cc8 811de035 8801b4b8f548 8801b644bba8
Call Trace:
 [811d9532] ebitmap_destroy+0x21/0x3c
 [811de035] context_destroy+0x58/0x6c
 [811e0787] security_compute_sid+0x26d/0x282
 [811e0815] security_transition_sid+0x1f/0x21
 [811d45d9] selinux_bprm_set_creds+0xd1/0x25f
 [810e3510] ? vma_link+0x88/0xb1
 [811d4a29] ? selinux_vm_enough_memory+0x40/0x45
 [8120cc58] ? spin_unlock_irqrestore+0x9/0xb
 [8120cce0] ? __up_write+0x42/0x47
 [811c909d] security_bprm_set_creds+0x13/0x15
 [8110cc3b] prepare_binprm+0xc3/0xf0
 [8110d55e] do_execve+0x150/0x2d2
 [81010eaf] sys_execve+0x43/0x5a
 [8100a0ca] stub_execve+0x6a/0xc0
Code: 83 c3 08 48 83 3b 00 eb ec 49 83 fc 10 0f 86 82 00 00 00 4c 89 e7 e8 c5 e2 ff ff 48 89 c6 48 8b 00 84 c0 78 14 66 a9 00 c0 75 04 0f 0b eb fe 48 89 f7 e8 ea 36 fd ff eb 5c 48 8b 4d 08 48 8b 7e
RIP [810fbbde] kfree+0x62/0xd5
 RSP 88019db87c68
---[ end trace 57f7151f6a5def07 ]---
Re: [2.6.33 regression] btrfs mount causes memory corruption
On Thu, Feb 25, 2010 at 3:23 PM, Josef Bacik jo...@redhat.com wrote:
> On Thu, Feb 25, 2010 at 03:01:08PM -0500, Andrew Lutomirski wrote:
>> Mounting btrfs corrupts memory and causes nasty crashes within a few seconds. This seems to happen even if the mount fails (note the unrecognized mount option).
>>
>> This is a regression from 2.6.32, and I've attached an example.
>
> And it only happens when you mount a btrfs fs? Can you show me a trace of when you mount a btrfs fs with valid mount options? I'd like to see if we're not cleaning up something properly or what. Thanks,

Seems OK. Or maybe I just got lucky, but it's crashed every time I tried to mount with 'acl' before. I even went through a couple iterations of trying to mount with 'xattr' and 'user_xattr', both of which failed.

--Andy
Re: [2.6.33 regression] btrfs mount causes memory corruption
On Thu, Feb 25, 2010 at 3:38 PM, Josef Bacik jo...@redhat.com wrote:
> Ok it looks like we have a problem kfree'ing the wrong stuff. we kstrdup the options string, but then strsep screws with the pointer, so when we kfree() it, we're not giving it the right pointer. Please try this patch, and mount with -o acl and other such garbage to make sure it actually worked (acl isn't a valid mount option btw). Let me know if it works. Thanks,
>
> Josef
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 8a1ea6e..f8b4521 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -128,7 +128,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
>  {
>  	struct btrfs_fs_info *info = root->fs_info;
>  	substring_t args[MAX_OPT_ARGS];
> -	char *p, *num;
> +	char *p, *num, *orig;
>  	int intarg;
>  	int ret = 0;
>
> @@ -143,6 +143,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
>  	if (!options)
>  		return -ENOMEM;
>
> +	orig = options;
>
>  	while ((p = strsep(&options, ",")) != NULL) {
>  		int token;
> @@ -280,7 +281,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
>  		}
>  	}
>  out:
> -	kfree(options);
> +	kfree(orig);
>  	return ret;
>  }

Thanks for the instant patch. I hammered on it a bit and it hasn't crashed yet. I'll let you know if it crashes later. (The earlier trial with xattr crashed after a couple minutes.)

In the mean time,

Tested-by: Andy Lutomirski l...@mit.edu

--Andy
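The pointer problem discussed in this thread is easy to reproduce in user space, because strsep() advances the pointer it is handed and sets it to NULL once the string is used up. The sketch below is an illustration only, with strdup() and free() standing in for kstrdup() and kfree(); `count_options` is a hypothetical function written for this example, not the kernel code.

```c
#define _DEFAULT_SOURCE /* for strsep() and strdup() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: count the non-empty, comma-separated
 * options in a string. Returns -1 if the copy cannot be allocated. */
static int count_options(const char *opts)
{
	assert(opts != NULL);

	char *options = strdup(opts);
	if (!options)
		return -1;

	/* Remember the allocation: strsep() moves `options` forward. */
	char *orig = options;
	int count = 0;
	char *p;

	while ((p = strsep(&options, ",")) != NULL) {
		if (*p) /* skip empty tokens, as in "a,,b" */
			count++;
	}

	/* After a full parse `options` is NULL, so free(options) would
	 * silently leak the copy. If the loop were left early, `options`
	 * would point into the middle of the buffer, and passing that
	 * to free() is undefined behavior. Free the saved pointer. */
	free(orig);
	return count;
}
```

Calling `count_options("acl,compress,degraded")` returns 3. The early-exit case matches the report above, where the crash showed up after a mount that failed on an unrecognized option.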
Where is current btrfs and btrfs-progs development?
It looks like the git trees at:

http://git.kernel.org/smart/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git

and

http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=summary

are several weeks out of date. For example, the patch here:

http://news.gmane.org/gmane.comp.file-systems.btrfs

looks like it's based on a revision that isn't in the btrfs-progs-unstable tree.

Is there an up to date tree or patch queue somewhere?

--Andy
Snapshot mysteries (and an oops)
Hi all-

I'm a bit mystified by snapshots. I think that there are some bugs in btrfsctl at least (or maybe its documentation). There's definitely at least one bug in the kernel.

Here are some commands I just tried (vanilla 2.6.32, btrfs-progs from git today. test is a brand-new empty btrfs filesystem, mounted with default options). Questions and comments are inline:

[test]# btrfsctl -S subvol1 .
operation complete
Btrfs v0.19-4-gab8fb4c
[test]# touch subvol1/file1
[test]# btrfsctl -s snap1 subvol1
operation complete
Btrfs v0.19-4-gab8fb4c
[test]# ls snap1
file1

OK, so it looks like I can make a snapshot of a subvolume, and everything works as expected.

[test]# mkdir dir2
[test]# touch dir2/file2
[test]# btrfsctl -s snap2 dir2
operation complete
Btrfs v0.19-4-gab8fb4c
[test]# ls snap2
dir2 snap1 subvol1
[test]# ls snap2/snap1
[test]#

WTF? It looks like btrfsctl just snapshotted the subvolume containing dir2 instead of snapshotting the directory. I would have expected it to either snapshot just the directory or, if that's impossible, to fail.

[test]# rm -rf snap1
rm: cannot remove directory `snap1': Directory not empty
[test]# ls snap1
[test]#

OK, so rmdir can't remove snapshots. (Is there any good reason for that?)

[test]# btrfsctl -D snap1
ioctl:: No such file or directory
[test]# btrfsctl -D snap1 .
operation complete
Btrfs v0.19-4-gab8fb4c

I can't make any sense of that. What's the second parameter to -D supposed to do?

[test]# btrfsctl -D subvol1 .
operation complete
Btrfs v0.19-4-gab8fb4c

Phew. That worked :)

[test]# rm -rf *

OK, now I'm back to where I started.

[test]# btrfsctl -S subvol2 .
operation complete
Btrfs v0.19-4-gab8fb4c
[test]# touch subvol2/file
[test]# ln subvol2/file file
Segmentation fault

Crap. I guess I wasn't supposed to try that. dmesg attached:

Thanks,
Andy

Process ln (pid: 3153, threadinfo 88019694, task 8801a4149780)
Stack:
 88017a741e00 88018af585d0 000e
 0 880196941e28 88018af585d0 88017a741e00 88017a7905d0
 0 88017e4b3680 88017a790688 880196941e78 81105988
Call Trace:
 [81105988] vfs_link+0xd5/0x14a
 [811057e9] ? lookup_hash+0x3b/0x3f
 [81107eb1] sys_linkat+0xc4/0x121
 [8106af52] ? up_read+0xe/0x10
 [8141d2a9] ? do_page_fault+0x269/0x299
 [81095e6c] ? audit_syscall_entry+0x11e/0x14a
 [81107f2c] sys_link+0x1e/0x22
 [81011cf2] system_call_fastpath+0x16/0x1b
Code: ff 85 c0 41 89 c6 ba 01 00 00 00 75 39 49 8b 44 24 20 48 89 da 4c 89 fe 4c 89 e7 49 89 45 e0 e8 8f dc ff ff 85 c0 41 89 c6 74 04 0f 0b eb fe 48 8b 45 b8 31 d2 48 89 de 4c 89 e7 48 8b 48 28 e8
RIP [a0bc4305] btrfs_link+0xcf/0x144 [btrfs]
 RSP 880196941dd8
---[ end trace 95f0a8585b4e506f ]---