Hello, 

Since I updated Lustre from 2.2 to 2.5.1 (CentOS 6.5) and copied the MDT to a new SSD 
disk, I get random kernel panics on the MDS (on both nodes of the HA pair). During the 
last kernel panic I got this log:

<4>Lustre: MGS: non-config logname received: params
<3>LustreError: 11-0: cetafs-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
<4>Lustre: MGS: non-config logname received: params
<4>Lustre: cetafs-MDT0000: Will be in recovery for at least 5:00, or until 102 clients reconnect
<4>Lustre: MGS: non-config logname received: params
<4>Lustre: MGS: non-config logname received: params
<4>Lustre: Skipped 5 previous similar messages
<4>Lustre: MGS: non-config logname received: params
<4>Lustre: Skipped 9 previous similar messages
<4>Lustre: MGS: non-config logname received: params
<4>Lustre: Skipped 2 previous similar messages
<4>Lustre: MGS: non-config logname received: params
<4>Lustre: Skipped 23 previous similar messages
<4>Lustre: MGS: non-config logname received: params
<4>Lustre: Skipped 8 previous similar messages
<3>LustreError: 3461:0:(ldlm_lib.c:1751:check_for_next_transno()) cetafs-MDT0000: waking for gap in transno, VBR is OFF (skip: 17188113481, ql: 1, comp: 101, conn: 102, next: 17188113493, last_committed: 17188113480)
<6>Lustre: cetafs-MDT0000: Recovery over after 1:13, of 102 clients 102 recovered and 0 were evicted.
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<ffffffffa0c3b6a0>] __iam_path_lookup+0x70/0x1f0 [osd_ldiskfs]
<4>PGD 106c0bf067 PUD 106c0be067 PMD 0 
<4>Oops: 0002 [#1] SMP 
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 0 
<4>Modules linked in: osp(U) mdd(U) lfsck(U) lod(U) mdt(U) mgs(U) mgc(U) 
fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U) ldiskfs(U) lustre(U) lov(U) osc(U) 
mdc(U) fid(U) fld(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) 
lvfs(U) sha512_generic sha256_generic crc32c_intel libcfs(U) ipmi_devintf 
cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm 
ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_multipath microcode 
iTCO_wdt iTCO_vendor_support sb_edac edac_core lpc_ich mfd_core i2c_i801 igb 
i2c_algo_bit i2c_core ptp pps_core ioatdma dca mlx4_ib ib_sa ib_mad ib_core 
mlx4_en mlx4_core sg ext4 jbd2 mbcache sd_mod crc_t10dif ahci isci libsas 
mpt2sas scsi_transport_sas raid_class megaraid_sas dm_mirror dm_region_hash 
dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 3362, comm: mdt00_001 Not tainted 2.6.32-431.5.1.el6_lustre.x86_64 #1 Bull SAS bullx/X9DRH-7TF/7F/iTF/iF
<4>RIP: 0010:[<ffffffffa0c3b6a0>]  [<ffffffffa0c3b6a0>] __iam_path_lookup+0x70/0x1f0 [osd_ldiskfs]
<4>RSP: 0018:ffff88085e2754b0  EFLAGS: 00010246
<4>RAX: 00000000fffffffb RBX: ffff88085e275600 RCX: 000000000009c93c
<4>RDX: 0000000000000000 RSI: 000000000009c93b RDI: ffff88106bcc32f0
<4>RBP: ffff88085e275500 R08: 0000000000000000 R09: 00000000ffffffff
<4>R10: 0000000000000000 R11: 0000000000000000 R12: ffff88085e2755c8
<4>R13: 0000000000005250 R14: ffff8810569bf308 R15: 0000000000000001
<4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 000000106dd9b000 CR4: 00000000000407f0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process mdt00_001 (pid: 3362, threadinfo ffff88085e274000, task ffff88085f55c080)
<4>Stack:
<4> 0000000000000000 ffff88085e2755d8 ffff8810569bf288 ffffffffa00fd2c4
<4><d> ffff88085e275660 ffff88085e2755c8 ffff88085e2756c8 0000000000000000
<4><d> 0000000000000000 ffff88085db2a480 ffff88085e275530 ffffffffa0c3ba6c
<4>Call Trace:
<4> [<ffffffffa00fd2c4>] ? do_get_write_access+0x3b4/0x520 [jbd2]
<4> [<ffffffffa0c3ba6c>] iam_lookup_lock+0x7c/0xb0 [osd_ldiskfs]
<4> [<ffffffffa0c3bad4>] __iam_it_get+0x34/0x160 [osd_ldiskfs]
<4> [<ffffffffa0c3be1e>] iam_it_get+0x2e/0x150 [osd_ldiskfs]
<4> [<ffffffffa0c3bf4e>] iam_it_get_exact+0xe/0x30 [osd_ldiskfs]
<4> [<ffffffffa0c3d47f>] iam_insert+0x4f/0xb0 [osd_ldiskfs]
<4> [<ffffffffa0c366ea>] osd_oi_iam_refresh+0x18a/0x330 [osd_ldiskfs]
<4> [<ffffffffa0c3ea40>] ? iam_lfix_ipd_alloc+0x0/0x20 [osd_ldiskfs]
<4> [<ffffffffa0c386dd>] osd_oi_insert+0x11d/0x480 [osd_ldiskfs]
<4> [<ffffffff811ae522>] ? generic_setxattr+0xa2/0xb0
<4> [<ffffffffa0c25021>] ? osd_ea_fid_set+0xf1/0x410 [osd_ldiskfs]
<4> [<ffffffffa0c33595>] osd_object_ea_create+0x5b5/0x700 [osd_ldiskfs]
<4> [<ffffffffa0e173bf>] lod_object_create+0x13f/0x260 [lod]
<4> [<ffffffffa0e756c0>] mdd_object_create_internal+0xa0/0x1c0 [mdd]
<4> [<ffffffffa0e86428>] mdd_create+0xa38/0x1730 [mdd]
<4> [<ffffffffa0c2af37>] ? osd_xattr_get+0x97/0x2e0 [osd_ldiskfs]
<4> [<ffffffffa0e14770>] ? lod_index_lookup+0x0/0x30 [lod]
<4> [<ffffffffa0d50358>] mdo_create+0x18/0x50 [mdt]
<4> [<ffffffffa0d5a64c>] mdt_reint_open+0x13ac/0x21a0 [mdt]
<4> [<ffffffffa065983c>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
<4> [<ffffffffa04f4600>] ? lu_ucred_key_init+0x160/0x1a0 [obdclass]
<4> [<ffffffffa0d431f1>] mdt_reint_rec+0x41/0xe0 [mdt]
<4> [<ffffffffa0d2add3>] mdt_reint_internal+0x4c3/0x780 [mdt]
<4> [<ffffffffa0d2b35d>] mdt_intent_reint+0x1ed/0x520 [mdt]
<4> [<ffffffffa0d26a0e>] mdt_intent_policy+0x3ae/0x770 [mdt]
<4> [<ffffffffa0610511>] ldlm_lock_enqueue+0x361/0x8c0 [ptlrpc]
<4> [<ffffffffa0639abf>] ldlm_handle_enqueue0+0x4ef/0x10a0 [ptlrpc]
<4> [<ffffffffa0d26ed6>] mdt_enqueue+0x46/0xe0 [mdt]
<4> [<ffffffffa0d2dbca>] mdt_handle_common+0x52a/0x1470 [mdt]
<4> [<ffffffffa0d68545>] mds_regular_handle+0x15/0x20 [mdt]
<4> [<ffffffffa0669a45>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
<4> [<ffffffffa03824ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
<4> [<ffffffffa03933df>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
<4> [<ffffffffa06610e9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
<4> [<ffffffff81054839>] ? __wake_up_common+0x59/0x90
<4> [<ffffffffa066adad>] ptlrpc_main+0xaed/0x1740 [ptlrpc]
<4> [<ffffffffa066a2c0>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
<4> [<ffffffff8109aee6>] kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109ae50>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>Code: 00 48 8b 5d b8 45 31 ff 0f 1f 00 49 8b 46 30 31 d2 48 89 d9 44 89 ee 48 8b 7d c0 ff 50 20 48 8b 13 66 2e 0f 1f 84 00 00 00 00 00 <f0> 0f ba 2a 19 19 c9 85 c9 74 15 48 8b 0a f7 c1 00 00 00 02 74
<1>RIP  [<ffffffffa0c3b6a0>] __iam_path_lookup+0x70/0x1f0 [osd_ldiskfs]
<4> RSP <ffff88085e2754b0>
<4>CR2: 0000000000000000
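
For anyone triaging the oops above: the "Oops: 0002" field is the x86 page-fault error code, and "CR2: 0000000000000000" is the faulting address. The small Python sketch below decodes only the low three bits of that code, following the bit layout used in the kernel's arch/x86/mm/fault.c. The helper name decode_pf_error is hypothetical and only serves this illustration. Applied to 0x0002, it reports a kernel-mode write to a page that is not present. Combined with CR2 = 0, this is consistent with the NULL pointer write reported in __iam_path_lookup.

```python
# Low three bits of the x86 page-fault error code
# (as in the kernel's arch/x86/mm/fault.c; higher bits are ignored here).
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access,      1: write access
PF_USER = 1 << 2   # 0: kernel mode,      1: user mode


def decode_pf_error(code: int) -> dict:
    """Hypothetical helper: turn the 'Oops: NNNN' value into readable fields."""
    return {
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
    }


# Value taken from the "Oops: 0002 [#1] SMP" line of the log.
print(decode_pf_error(0x0002))
# {'cause': 'page not present', 'access': 'write', 'mode': 'kernel'}
```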

Any suggestions are welcome.

THANKS!!!

Alfonso Pardo Diaz
System Administrator / Researcher
c/ Sola nº 1; 10200 Trujillo, ESPAÑA
Tel: +34 927 65 93 17 Fax: +34 927 32 32 37

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
