Yet another panic today:
Oct 8 12:36:00 n9 kernel: [79230.175890] Unable to handle kernel NULL pointer dereference at 0000000000000258 RIP:
Oct 8 12:36:00 n9 kernel: [79230.175917] [<ffffffff88473a7e>] :ocfs2:ocfs2_get_dentry_osb+0xe/0x20
Oct 8 12:36:00 n9 kernel: [79230.176023] PGD 3d08c5067 PUD 331112067 PMD 0
Oct 8 12:36:00 n9 kernel: [79230.176059] Oops: 0000 [1] SMP
Oct 8 12:36:00 n9 kernel: [79230.176091] CPU 3
Oct 8 12:36:00 n9 kernel: [79230.176117] Modules linked in: nfs lockd nfs_acl sunrpc ocfs2 crc32c libcrc32c ipmi_devintf ipmi_si ipmi_msghandler ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs iptable_filter ip_tables x_tables xfs ipv6 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi parport_pc lp parport loop i2c_piix4 dcdbas i2c_core psmouse button shpchp pci_hotplug k8temp serio_raw pcspkr evdev ext3 jbd mbcache sr_mod cdrom sg sd_mod pata_serverworks usbhid hid ata_generic tg3 ehci_hcd pata_acpi sata_svw ohci_hcd libata scsi_mod usbcore thermal processor fan fbcon tileblit font bitblit softcursor fuse
Oct 8 12:36:00 n9 kernel: [79230.176537] Pid: 4915, comm: o2net Not tainted 2.6.24-24-server #1
Oct 8 12:36:00 n9 kernel: [79230.176571] RIP: 0010:[<ffffffff88473a7e>] [<ffffffff88473a7e>] :ocfs2:ocfs2_get_dentry_osb+0xe/0x20
Oct 8 12:36:00 n9 kernel: [79230.176636] RSP: 0000:ffff8104119b3ca8 EFLAGS: 00010282
Oct 8 12:36:00 n9 kernel: [79230.176667] RAX: 0000000000000000 RBX: ffff8103def84018 RCX: 0000000000000005
Oct 8 12:36:00 n9 kernel: [79230.176703] RDX: ffff8103def83100 RSI: 0000000000000005 RDI: ffff8103def84018
Oct 8 12:36:00 n9 kernel: [79230.176738] RBP: ffff8103def84400 R08: ffff8103def84400 R09: ffff8103dee43a00
Oct 8 12:36:00 n9 kernel: [79230.176774] R10: 000000000000004e R11: ffffffff8847b580 R12: 0900000000007aa4
Oct 8 12:36:00 n9 kernel: [79230.176809] R13: 0000000000000005 R14: 0000000000000000 R15: 000000000000001f
Oct 8 12:36:00 n9 kernel: [79230.176845] FS: 00002ad989b79670(0000) GS:ffff810416d4ac80(0000) knlGS:00000000f5420b90
Oct 8 12:36:00 n9 kernel: [79230.176899] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Oct 8 12:36:00 n9 kernel: [79230.176931] CR2: 0000000000000258 CR3: 0000000370517000 CR4: 00000000000006e0
Oct 8 12:36:00 n9 kernel: [79230.176966] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 8 12:36:00 n9 kernel: [79230.177002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 8 12:36:00 n9 kernel: [79230.177037] Process o2net (pid: 4915, threadinfo ffff8104119b2000, task ffff8104115247f0)
Oct 8 12:36:00 n9 kernel: [79230.177092] Stack: ffffffff8847b5a6 ffff810411440400 00000000161974a2 ffff8104114c1028
Oct 8 12:36:00 n9 kernel: [79230.177155] 0000000000000000 ffff8103def84400 0900000000007aa4 ffff8104114c1018
Oct 8 12:36:00 n9 kernel: [79230.177215] 0000000000000000 000000000000001f ffffffff8840bef4 000000000000012c
Oct 8 12:36:00 n9 kernel: [79230.177256] Call Trace:
Oct 8 12:36:00 n9 kernel: [79230.177312] [<ffffffff8847b5a6>] :ocfs2:ocfs2_blocking_ast+0x26/0x310
Oct 8 12:36:00 n9 kernel: [79230.177366] [ocfs2_dlm:dlm_proxy_ast_handler+0x824/0x830] :ocfs2_dlm:dlm_proxy_ast_handler+0x824/0x830
Oct 8 12:36:00 n9 kernel: [79230.177427] [ocfs2_nodemanager:do_gettimeofday+0x2f/0x2fb90] do_gettimeofday+0x2f/0xc0
Oct 8 12:36:00 n9 kernel: [79230.177481] [ocfs2_nodemanager:o2net_process_message+0x4cc/0x5b0] :ocfs2_nodemanager:o2net_process_message+0x4cc/0x5b0
Oct 8 12:36:00 n9 kernel: [79230.177540] [__dequeue_entity+0x3d/0x50] __dequeue_entity+0x3d/0x50
Oct 8 12:36:00 n9 kernel: [79230.177580] [ocfs2_nodemanager:o2net_recv_tcp_msg+0x65/0x80] :ocfs2_nodemanager:o2net_recv_tcp_msg+0x65/0x80
Oct 8 12:36:00 n9 kernel: [79230.177643] [ocfs2_nodemanager:o2net_rx_until_empty+0x38b/0x900] :ocfs2_nodemanager:o2net_rx_until_empty+0x38b/0x900
Oct 8 12:36:00 n9 kernel: [79230.177707] [ocfs2_nodemanager:o2net_rx_until_empty+0x0/0x900] :ocfs2_nodemanager:o2net_rx_until_empty+0x0/0x900
Oct 8 12:36:00 n9 kernel: [79230.177765]
[run_workqueue+0xcc/0x170] run_workqueue+0xcc/0x170
Oct 8 12:36:00 n9 kernel: [79230.177799] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
Oct 8 12:36:00 n9 kernel: [79230.177832] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
Oct 8 12:36:00 n9 kernel: [79230.177865] [worker_thread+0xa3/0x110] worker_thread+0xa3/0x110
Oct 8 12:36:00 n9 kernel: [79230.177899] [<ffffffff80254510>] autoremove_wake_function+0x0/0x30
Oct 8 12:36:00 n9 kernel: [79230.177935] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
Oct 8 12:36:00 n9 kernel: [79230.177969] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
Oct 8 12:36:00 n9 kernel: [79230.178001] [kthread+0x4b/0x80] kthread+0x4b/0x80
Oct 8 12:36:00 n9 kernel: [79230.178036] [child_rip+0xa/0x12] child_rip+0xa/0x12
Oct 8 12:36:00 n9 kernel: [79230.178073] [kthread+0x0/0x80] kthread+0x0/0x80
Oct 8 12:36:00 n9 kernel: [79230.178104] [child_rip+0x0/0x12] child_rip+0x0/0x12
Oct 8 12:36:00 n9 kernel: [79230.179971]
Oct 8 12:36:00 n9 kernel: [79230.179993]
Oct 8 12:36:00 n9 kernel: [79230.179993] Code: 48 8b 80 58 02 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 8b 47
Oct 8 12:36:00 n9 kernel: [79230.180111] RIP [<ffffffff88473a7e>] :ocfs2:ocfs2_get_dentry_osb+0xe/0x20
Oct 8 12:36:00 n9 kernel: [79230.180156] RSP <ffff8104119b3ca8>
Oct 8 12:36:00 n9 kernel: [79230.180183] CR2: 0000000000000258
Oct 8 12:36:00 n9 kernel: [79230.180566] ---[ end trace ae9a4fee19ded66d ]---

On Wed, Oct 7, 2009 at 8:31 PM, Sunil Mushran <sunil.mush...@oracle.com> wrote:

> It could be the stale inode info was propagated by the nfs node
> to the oopsing node via the lvb. But I am not sure about that.
>
> In any event, applying the fix would be a step forward. The fix
> has been in mainline for quite some time now.
> Laurence Mayer wrote:
>
>> Nope, the node that crashed is not the NFS server.
>> How should I proceed?
>> What do you suggest?
>> Could this happen again?
>>
>> On Wed, Oct 7, 2009 at 8:16 PM, Sunil Mushran <sunil.mush...@oracle.com> wrote:
>>
>> And does the node exporting the volume encounter the oops?
>>
>> If so, the likeliest candidate would be:
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6ca497a83e592d64e050c4d04b6dedb8c915f39a
>>
>> If it is on another node, I am currently unsure whether an nfs
>> export on one node could cause this to occur on another. Need more
>> coffee.
>>
>> The problem, in short, is due to how nfs bypasses the normal fs lookup
>> to access files. It uses the file handle to directly access the inode,
>> bypassing the locking. Normally that is not a problem. The race window
>> is if the file is deleted (on any node in the cluster) and nfs reads that
>> inode without the lock. In the oops we see the disk generation is greater
>> than the in-memory inode generation. That means the inode was deleted and
>> reused. The fix closes the race window.
>>
>> Sunil
>>
>> Laurence Mayer wrote:
>>
>> Yes.
>> We have set up a 10-node cluster, with one of the nodes exporting
>> NFS to the workstations.
>> Please expand your answer.
>> Thanks
>> Laurence
>>
>> On Wed, Oct 7, 2009 at 7:12 PM, Sunil Mushran <sunil.mush...@oracle.com> wrote:
>>
>> Are you exporting this volume via nfs? We fixed a small race (in the nfs
>> access path) that could lead to this oops.
>> Laurence Mayer wrote:
>>
>> Hi again,
>> OS: Ubuntu 8.04 x64
>> Kern: Linux n1 2.6.24-24-server #1 SMP Tue Jul 7 19:39:36 UTC 2009 x86_64 GNU/Linux
>> 10 Node Cluster
>> OCFS2 Version: 1.3.9-0ubuntu1
>> I received this panic on the 5th Oct; I cannot work out why this has started to happen.
>> Please can you provide directions.
>> Let me know if you require any further details or information.
>>
>> Oct 5 10:21:22 n1 kernel: [1006473.993681] (1387,3):ocfs2_meta_lock_update:1675 ERROR: bug expression: inode->i_generation != le32_to_cpu(fe->i_generation)
>> Oct 5 10:21:22 n1 kernel: [1006473.993756] (1387,3):ocfs2_meta_lock_update:1675 ERROR: Invalid dinode 3064741 disk generation: 1309441612 inode->i_generation: 1309441501
>> Oct 5 10:21:22 n1 kernel: [1006473.993865] ------------[ cut here ]------------
>> Oct 5 10:21:22 n1 kernel: [1006473.993896] kernel BUG at /build/buildd/linux-2.6.24/fs/ocfs2/dlmglue.c:1675!
>> Oct 5 10:21:22 n1 kernel: [1006473.993949] invalid opcode: 0000 [3] SMP
>> Oct 5 10:21:22 n1 kernel: [1006473.993982] CPU 3
>> Oct 5 10:21:22 n1 kernel: [1006473.994008] Modules linked in: ocfs2 crc32c libcrc32c nfsd auth_rpcgss exportfs ipmi_devintf ipmi_si ipmi_msghandler ipv6 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs iptable_filter ip_tables x_tables xfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi nfs lockd nfs_acl sunrpc parport_pc lp parport loop serio_raw psmouse i2c_piix4 i2c_core dcdbas evdev button k8temp shpchp pci_hotplug pcspkr ext3 jbd mbcache sg sr_mod cdrom sd_mod ata_generic pata_acpi usbhid hid ehci_hcd tg3 sata_svw pata_serverworks ohci_hcd libata scsi_mod usbcore thermal processor fan fbcon tileblit font bitblit softcursor fuse
>> Oct 5 10:21:22 n1 kernel: [1006473.994445] Pid: 1387, comm: R Tainted: G D 2.6.24-24-server #1
>> Oct 5 10:21:22 n1 kernel: [1006473.994479] RIP: 0010:[<ffffffff8856c404>] [<ffffffff8856c404>] :ocfs2:ocfs2_meta_lock_full+0x6a4/0xec0
>> Oct 5 10:21:22 n1 kernel: [1006473.994558] RSP: 0018:ffff8101238f9d58 EFLAGS: 00010296
>> Oct 5 10:21:22 n1 kernel: [1006473.994590] RAX: 0000000000000093 RBX: ffff8102eaf03000 RCX: 00000000ffffffff
>> Oct 5 10:21:22 n1 kernel: [1006473.994642] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff8058ffa4
>> Oct 5 10:21:22 n1 kernel: [1006473.994694] RBP: 0000000100080000 R08: 0000000000000000 R09: 00000000ffffffff
>> Oct 5 10:21:22 n1 kernel: [1006473.994746] R10: 0000000000000000 R11: 0000000000000000 R12: ffff81012599ee00
>> Oct 5 10:21:22 n1 kernel: [1006473.994799] R13: ffff81012599ef08 R14: ffff81012599f2b8 R15: ffff81012599ef08
>> Oct 5 10:21:22 n1 kernel: [1006473.994851] FS: 00002b3802fed670(0000) GS:ffff810418022c80(0000) knlGS:00000000f546bb90
>> Oct 5 10:21:22 n1 kernel: [1006473.994906] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> Oct 5 10:21:22 n1 kernel: [1006473.994938] CR2: 00007f5db5542000 CR3: 0000000167ddf000 CR4: 00000000000006e0
>> Oct 5 10:21:22 n1 kernel: [1006473.994990] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Oct 5 10:21:22 n1 kernel: [1006473.995042] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Oct 5 10:21:22 n1 kernel: [1006473.995095] Process R (pid: 1387, threadinfo ffff8101238f8000, task ffff8104110cc000)
>> Oct 5 10:21:22 n1 kernel: [1006473.995148] Stack: 000000004e0c7e4c ffff81044e0c7ddd ffff8101a3b4d2b8 00000000802c34c0
>> Oct 5 10:21:22 n1 kernel: [1006473.995212] 0000000000000000 0000000100000000 ffffffff80680c00 00000000804715e2
>> Oct 5 10:21:22 n1 kernel: [1006473.995272] 0000000100000000 ffff8101238f9e48 ffff810245558b80 ffff81031e358680
>> Oct 5 10:21:22 n1 kernel: [1006473.995313] Call Trace:
>> Oct 5 10:21:22 n1 kernel: [1006473.995380] [<ffffffff8857d03f>] :ocfs2:ocfs2_inode_revalidate+0x5f/0x290
>> Oct 5 10:21:22 n1 kernel: [1006473.995427] [<ffffffff88577fe6>] :ocfs2:ocfs2_getattr+0x56/0x1c0
>> Oct 5 10:21:22 n1 kernel: [1006473.995470] [vfs_stat_fd+0x46/0x80] vfs_stat_fd+0x46/0x80
>> Oct 5 10:21:22 n1 kernel: [1006473.995514] [<ffffffff88569634>] :ocfs2:ocfs2_meta_unlock+0x1b4/0x210
>> Oct 5 10:21:22 n1 kernel: [1006473.995553] [filldir+0x0/0xf0] filldir+0x0/0xf0
>> Oct 5 10:21:22 n1 kernel: [1006473.995594] [<ffffffff8856799e>] :ocfs2:ocfs2_readdir+0xce/0x230
>> Oct 5 10:21:22 n1 kernel: [1006473.995631] [sys_newstat+0x27/0x50] sys_newstat+0x27/0x50
>> Oct 5 10:21:22 n1 kernel: [1006473.995664] [vfs_readdir+0xa5/0xd0] vfs_readdir+0xa5/0xd0
>> Oct 5 10:21:22 n1 kernel: [1006473.995699] [sys_getdents+0xcf/0xe0] sys_getdents+0xcf/0xe0
>> Oct 5 10:21:22 n1 kernel: [1006473.997568] [system_call+0x7e/0x83] system_call+0x7e/0x83
>> Oct 5 10:21:22 n1 kernel: [1006473.997605]
>> Oct 5 10:21:22 n1 kernel: [1006473.997627]
>> Oct 5 10:21:22 n1 kernel: [1006473.997628] Code: 0f 0b eb fe 83 fd fe 0f 84 73 fc ff ff 81 fd 00 fe ff ff 0f
>> Oct 5 10:21:22 n1 kernel: [1006473.997745] RIP [<ffffffff8856c404>] :ocfs2:ocfs2_meta_lock_full+0x6a4/0xec0
>> Oct 5 10:21:22 n1 kernel: [1006473.997808] RSP <ffff8101238f9d58>
>>
>> Thanks
>> Laurence
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users