Because it's unsafe to do any I/O at that point. We'd rather you have to reboot than scribble more bad data on your disk!

Joel

On Tue, Jul 03, 2012 at 11:35:32PM -0700, Aleks Clark wrote:
> it said 'clean' and exited. Working on bringing the cluster down. Is
> there a reason why, after the kernel panics, ocfs2 makes all i/o
> block? I can't even unmount the filesystem on any node, I have to
> actually reboot it.
>
> On Tue, Jul 3, 2012 at 11:17 PM, Joel Becker <jl...@evilplan.org> wrote:
> > On Tue, Jul 03, 2012 at 06:57:53PM -0700, Aleks Clark wrote:
> >> well, by 'clean', it said it was clean. the locks persisted though. I
> >> seriously can't believe there's no way to force lock removal. is it
> >> just a file somewhere I can delete?
> >
> > There's no lock hanging around past a full restart. This looks like
> > on-disk corruption. Did fsck.ocfs2 say that it ran multiple passes, or
> > just say "clean" and exit? Please try fsck.ocfs2 with the '-f' flag
> > (obviously with the filesystem not mounted on ANY node).
> >
> > Joel
> >
> >>
> >> On Tue, Jul 3, 2012 at 6:56 PM, Aleks Clark <aleks.cl...@gmail.com> wrote:
> >> > yep, tried that, returned clean.
> >> >
> >> > On Tue, Jul 3, 2012 at 6:25 PM, herbert van.den.bergh
> >> > <herbert.van.den.be...@oracle.com> wrote:
> >> >>
> >> >> One more thing: did you try running fsck.ocfs2 on it?
> >> >>
> >> >> Thanks,
> >> >> Herbert.
> >> >>
> >> >> On 7/3/2012 6:23 PM, herbert van.den.bergh wrote:
> >> >>>
> >> >>> Hmm doesn't mean much to me, but maybe to someone else on the list.
> >> >>> But I bet their first suggestion will be to try a recent kernel...
> >> >>>
> >> >>> Thanks,
> >> >>> Herbert.
> >> >>>
> >> >>> On 7/3/2012 6:19 PM, Aleks Clark wrote:
> >> >>>>
> >> >>>> Nick, I don't think so, it's a 2tb partition with only 300gb used.
> >> >>>>
> >> >>>> Herb,
> >> >>>>
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.578659] (25326,0):ocfs2_rotate_tree_right:2483 ERROR: bug expression: path_leaf_bh(left_path) == path_leaf_bh(right_path)
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.578714] (25326,0):ocfs2_rotate_tree_right:2483 ERROR: Owner 18319883: error during insert of 15761664 (left path cpos 20725762) results in two identical paths ending at 395267
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.578800] ------------[ cut here ]------------
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.578826] kernel BUG at /build/buildd-linux-2.6_2.6.32-38-amd64-bk66e4/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/alloc.c:2483!
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.578881] invalid opcode: 0000 [#1] SMP
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.578909] last sysfs file: /sys/devices/virtual/net/lo/operstate
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.578937] CPU 0
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.578960] Modules linked in: drbd tun ocfs2 jbd2 quota_tree raid0 ip6table_filter ip6_tables iptable_filter ip_tables sha1_generic ebtable_nat ebtables hmac x_tables lru_cache cn kvm_intel kvm ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bridge stp loop md_mod snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core pcspkr processor button psmouse joydev evdev serio_raw usbhid hid ext3 jbd mbcache dm_mod sd_mod crc_t10dif ahci ehci_hcd libata usbcore scsi_mod e1000e nls_base thermal thermal_sys [last unloaded: drbd]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579279] Pid: 25326, comm: kvm Not tainted 2.6.32-5-amd64 #1 X9SCL/X9SCM
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579309] RIP: 0010:[<ffffffffa041177b>] [<ffffffffa041177b>] ocfs2_do_insert_extent+0x5dc/0x1aaf [ocfs2]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579363] RSP: 0018:ffff880014839688 EFLAGS: 00010292
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579390] RAX: 00000000000000bf RBX: 0000000000060803 RCX: 0000000000001806
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579435] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579479] RBP: ffff8800148398a8 R08: 00000000000209d0 R09: 000000000000000a
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579524] R10: 0000000000000000 R11: 0000000100000000 R12: 00000000013c4002
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579568] R13: ffff88002a1e4030 R14: 0000000000000001 R15: ffff88023c153c60
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579613] FS: 00007f0cfef83700(0000) GS:ffff880008a00000(0000) knlGS:0000000000000000
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579659] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579687] CR2: 00007f0d25dbf000 CR3: 000000023ccb6000 CR4: 00000000000426e0
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579776] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579821] Process kvm (pid: 25326, threadinfo ffff880014838000, task ffff88023b999c40)
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579867] Stack:
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579887] 0000000000f08100 00000000013c4002 0000000000060803 ffff880014839718
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579923]<0> ffff880232abde80 ffff88023b999c40 ffff88023b999c40 ffff8800148397a8
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.579977]<0> ffff8800148397c8 ffff8800148398a8 ffff88023d8027f8 0000000000f08100
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580047] Call Trace:
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580074] [<ffffffffa04186b9>] ? ocfs2_insert_extent+0x5fb/0x6e6 [ocfs2]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580108] [<ffffffffa0442e08>] ? __ocfs2_journal_access+0x261/0x32a [ocfs2]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580156] [<ffffffffa04194da>] ? ocfs2_add_clusters_in_btree+0x35f/0x53c [ocfs2]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580205] [<ffffffffa0436a34>] ? ocfs2_add_inode_data+0x62/0x6e [ocfs2]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580239] [<ffffffffa0442f53>] ? ocfs2_journal_access_di+0x0/0xf [ocfs2]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580272] [<ffffffffa041c1d5>] ? ocfs2_write_begin_nolock+0x1376/0x1de2 [ocfs2]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580321] [<ffffffffa0466e02>] ? ocfs2_set_buffer_uptodate+0x15/0x60e [ocfs2]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580370] [<ffffffffa043a9a5>] ? ocfs2_validate_inode_block+0x0/0x1ab [ocfs2]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580418] [<ffffffffa0442f53>] ? ocfs2_journal_access_di+0x0/0xf [ocfs2]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580451] [<ffffffffa041cd57>] ? ocfs2_write_begin+0x116/0x1d2 [ocfs2]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580484] [<ffffffff810b4fd0>] ? generic_file_buffered_write+0x118/0x278
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580515] [<ffffffff810b54e1>] ? __generic_file_aio_write+0x25f/0x293
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580548] [<ffffffffa0434fc8>] ? ocfs2_prepare_inode_for_write+0x683/0x69c [ocfs2]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580597] [<ffffffffa042c4e2>] ? ocfs2_rw_lock+0x16d/0x239 [ocfs2]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580628] [<ffffffffa0435b19>] ? ocfs2_file_aio_write+0x45f/0x5da [ocfs2]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580674] [<ffffffff8101654b>] ? sched_clock+0x5/0x8
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580703] [<ffffffff8104a4cc>] ? default_wake_function+0x0/0x9
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580733] [<ffffffff810eebf2>] ? do_sync_write+0xce/0x113
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580762] [<ffffffff81064f92>] ? autoremove_wake_function+0x0/0x2e
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580792] [<ffffffff8105cd26>] ? kill_pid_info+0x31/0x3b
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580819] [<ffffffff8105cefc>] ? sys_kill+0x72/0x140
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580847] [<ffffffff810ef544>] ? vfs_write+0xa9/0x102
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580875] [<ffffffff810ef5f4>] ? sys_pwrite64+0x57/0x77
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580902] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.580930] Code: 41 b8 b3 09 00 00 48 63 d2 48 c7 c7 6f 48 48 a0 89 0c 24 31 c0 48 c7 c1 c0 df 47 a0 48 89 5c 24 10 44 89 64 24 08 e8 5c 91 ee e0<0f> 0b eb fe 83 7c 24 5c 00 75 1a 49 8b 54 17 08 8b 5c 24 58 0f
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.581120] RIP [<ffffffffa041177b>] ocfs2_do_insert_extent+0x5dc/0x1aaf [ocfs2]
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.581167] RSP<ffff880014839688>
> >> >>>> Jul 3 14:47:26 castor kernel: [3488036.581581] ---[ end trace fb597ecc3418e6d6 ]---
> >> >>>>
> >> >>>> On Tue, Jul 3, 2012 at 5:39 PM, Herbert van den Bergh
> >> >>>> <herbert.van.den.be...@oracle.com> wrote:
> >> >>>>>
> >> >>>>> On 07/03/2012 04:12 PM, Aleks Clark wrote:
> >> >>>>>>
> >> >>>>>> Ok, so I've got this ocfs2 cluster that's been running for a long
> >> >>>>>> while, hosting my VMs. All of a sudden I'm getting kernel panics
> >> >>>>>> originating from ocfs2 when trying to spin up one particular file.
> >> >>>>>> I've determined that there are several locks on this file, one of
> >> >>>>>> them exclusive. I restarted the whole cluster to try to get rid of
> >> >>>>>> it, but no go. I also tried to copy the file, both on and off of the
> >> >>>>>> cluster, but only half of it copied. Any way to get around either
> >> >>>>>> issue would be appreciated.
> >> >>>>>
> >> >>>>> The panic stack may be helpful, and any messages that the kernel spit
> >> >>>>> out before it.
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> Herbert.
> >> >>>>>
> >> >>>>
> >> >>> _______________________________________________
> >> >>> Ocfs2-users mailing list
> >> >>> Ocfs2-users@oss.oracle.com
> >> >>> https://oss.oracle.com/mailman/listinfo/ocfs2-users
> >> >
> >> > --
> >> > Aleks Clark
> >>
> >> --
> >> Aleks Clark
> >
> > --
> >
> > Joel's First Law:
> >
> > Nature abhors a GUI.
> >
> > http://www.jlbec.org/
> > jl...@evilplan.org
>
> --
> Aleks Clark

--

"Heav'n hath no rage like love to hatred turn'd, nor Hell a fury,
like a woman scorn'd."
- William Congreve

http://www.jlbec.org/
jl...@evilplan.org