Re: [Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)
> The journal will prevent inconsistencies in the filesystem in case of a crash.
> It cannot prevent corruption of the on-disk data, inconsistencies caused by
> cache enabled on the disks or in a RAID controller, software bugs, memory
> corruption, bad cables, etc.

The OSS is part of a 'Snowbird' installation, so the RAID/disk part should be
fine. I hope that we 'just' hit a small software bug :-/

> That is why it is still a good idea for users to run e2fsck periodically on a
> filesystem.

Ok, we will keep this in mind (e2fsck was surprisingly fast anyway!)

Regards,
 Adrian

--
RFC 1925:
 (11) Every old idea will be proposed again with a different name and
      a different presentation, regardless of whether it works.

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)
On 2010-08-14, at 2:28, Adrian Ulrich wrote:
>> - the on-disk structure of the object directory for this OST is corrupted.
>>   Run "e2fsck -fp /dev/{ostdev}" on the unmounted OST filesystem.
>
> e2fsck fixed it: The OST has now been running for 40 minutes without problems.
>
> But shouldn't the journal of ext3/ldiskfs make running e2fsck unnecessary?

The journal will prevent inconsistencies in the filesystem in case of a crash.
It cannot prevent corruption of the on-disk data, inconsistencies caused by
cache enabled on the disks or in a RAID controller, software bugs, memory
corruption, bad cables, etc.

That is why it is still a good idea for users to run e2fsck periodically on a
filesystem. If you are using LVM there is an lvcheck script I wrote that can
check a filesystem snapshot on a running system, but otherwise you should do
it whenever the opportunity arises.

Cheers, Andreas
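[Editorial sketch of the periodic-check idea above. The volume-group and LV names are invented for illustration and the lvcheck script itself is not reproduced here; the exit-code mapping follows the convention documented in e2fsck(8).]

```shell
# Hypothetical snapshot-based check of an LVM-backed OST (names are examples):
#   lvcreate -s -L 1G -n ost5-check /dev/vg_oss/ost5   # snapshot the OST LV
#   e2fsck -fn /dev/vg_oss/ost5-check                  # forced read-only check
#   lvremove -f /dev/vg_oss/ost5-check                 # drop the snapshot

# Interpret e2fsck's exit status (per e2fsck(8)): 0 = clean,
# 1 = errors corrected, 2 = corrected and reboot advised, >= 4 = real trouble.
interpret_fsck() {
    case "$1" in
        0) echo "clean" ;;
        1) echo "errors corrected" ;;
        2) echo "errors corrected, reboot advised" ;;
        *) echo "needs attention" ;;
    esac
}

interpret_fsck 0   # prints "clean"
```

With `-n` the check never modifies the snapshot, so a nonzero status is only a signal to schedule a real offline `e2fsck -fp` run.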
Re: [Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)
> - the on-disk structure of the object directory for this OST is corrupted.
>   Run "e2fsck -fp /dev/{ostdev}" on the unmounted OST filesystem.

e2fsck fixed it: The OST has now been running for 40 minutes without problems:

e2fsck 1.41.6.sun1 (30-May-2009)
lustre1-OST0005: recovering journal
lustre1-OST0005 has been mounted 72 times without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Directory inode 440696867, block 493, offset 0: directory corrupted
Salvage? yes
Directory inode 440696853, block 517, offset 0: directory corrupted
Salvage? yes
Directory inode 440696842, block 560, offset 0: directory corrupted
Salvage? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached inode 17769156
Connect to /lost+found? yes
Inode 17769156 ref count is 2, should be 1.  Fix? yes
Unattached zero-length inode 17883901.  Clear? yes
Pass 5: Checking group summary information

lustre1-OST0005: ***** FILE SYSTEM WAS MODIFIED *****
lustre1-OST0005: 44279/488382464 files (15.4% non-contiguous), 280329314/1953524992 blocks

But shouldn't the journal of ext3/ldiskfs make running e2fsck unnecessary?

Have a nice weekend and thanks a lot for the fast reply!

Regards,
 Adrian
Re: [Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)
On 2010-08-13, at 12:29, Adrian Ulrich wrote:
> Pid: 11833, comm: ll_ost_creat_00 Tainted: G 2.6.18-128.7.1.el5_lustre.1.8.1.1 #1
> :ldiskfs:ldiskfs_find_entry+0x1d4/0x5c0
> [] :ldiskfs:ldiskfs_lookup+0x53/0x290
> [] __lookup_hash+0x10b/0x130
> [] lookup_one_len+0x53/0x61
> [] :obdfilter:filter_fid2dentry+0x42d/0x730
> [] :obdfilter:filter_statfs+0x273/0x350
> [] :obdfilter:filter_parent_lock+0x20/0x220
> [] :obdfilter:filter_precreate+0x843/0x19e0
> [] :obdfilter:filter_create+0x10b9/0x15e0
> [] :ost:ost_handle+0x131d/0x5a70

Two possibilities I can see:

- The MDS sent a very large create request. Compare the values from:

    mds> lctl get_param osc.*.prealloc_*
    oss> lctl get_param obdfilter.*.last_id

  and see if they match. If last_id is growing quickly, the thread is busy
  precreating many objects for some reason. If this OST has a much higher
  prealloc_last_id on the MDS, something is bad in the MDS lov_objids file.

- The on-disk structure of the object directory for this OST is corrupted.
  Run "e2fsck -fp /dev/{ostdev}" on the unmounted OST filesystem.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
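[Editorial sketch of the comparison described above. The OST device name is an example for this cluster, and the 10000-object threshold is an arbitrary illustration, not a Lustre-defined limit.]

```shell
# Values to compare, run on the respective nodes (device names are examples):
#   mds> lctl get_param osc.lustre1-OST0005-osc.prealloc_last_id
#   oss> lctl get_param obdfilter.lustre1-OST0005.last_id

# Flag a suspicious gap between the MDS's preallocation window and the
# last object id the OST has actually created.
check_gap() {
    mds_last_id=$1   # prealloc_last_id reported on the MDS
    ost_last_id=$2   # last_id reported on the OSS
    gap=$((mds_last_id - ost_last_id))
    if [ "$gap" -gt 10000 ]; then
        echo "SUSPECT: MDS wants $gap more objects than the OST has created"
    else
        echo "OK: gap is $gap objects"
    fi
}

check_gap 120500 120000   # prints "OK: gap is 500 objects"
```

A huge positive gap would point at the first possibility (a runaway precreate request from a bad lov_objids file); a small, slowly moving gap points elsewhere.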
Re: [Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)
Hi Alexey,

> in general a soft-lockup isn't an error, it's just a notice that some
> operation needed too much time (more than 10s, I think).
> The attached soft-lockup says the OST is busy creating objects after the
> MDS<->OST reconnect,

Yes, I know that a soft-lockup doesn't mean that I hit a bug, but having
ll_ost_creat_* wasting 100% CPU doesn't seem to be normal.

> I think you have too-busy disks or an overloaded node.

Disk %busy is < 5% for all attached disks. The OST is doing almost nothing
(there are a few read()'s, that's all).

> if you have slow disks, a client can be disconnected before its request is
> processed, and that request then blocks the reconnect from that client.

The recovery of the clients seems to be ok: all clients can write/read data
from the OST, but there is something wrong between the MDS <-> OST0005.
This might just be a side-effect of the ll_ost_creat_* issue, though :-/

Regards,
 Adrian

--
RFC 1925:
 (11) Every old idea will be proposed again with a different name and
      a different presentation, regardless of whether it works.
Re: [Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)
Adrian,

In general a soft-lockup isn't an error, it's just a notice that some
operation needed too much time (more than 10s, I think). The attached
soft-lockup says the OST is busy creating objects after the MDS<->OST
reconnect; I think you have too-busy disks or an overloaded node.

To avoid that soft-lockup you can reduce the maximal number of precreated
objects via procfs (if my memory is correct).

Why the clients have reconnects is a different question, but it can be
related to the soft-lockup: if you have slow disks, a client can be
disconnected before its request is processed, and that request then blocks
the reconnect from that client.

On Aug 13, 2010, at 21:29, Adrian Ulrich wrote:
> Oh, sorry for mixing this up.
>
> 'lctl df' doesn't show much new stuff:
>
> 0001:0400:0:1281721514.362102:0:13008:0:(ldlm_lib.c:541:target_handle_reconnect())
> lustre1-OST0005: 9f880a5e-2331-07a8-8611-d6e3102f466e reconnecting
> 0001:0400:0:1281721514.362107:0:13008:0:(ldlm_lib.c:835:target_handle_connect())
> lustre1-OST0005: refuse reconnection from
> 9f880a5e-2331-07a8-8611-d6e3102f4...@10.201.48.12@o2ib to 0x8101c93b6000;
> still busy with 1 active RPCs
> 0001:0400:7:1281721525.880767:0:11822:0:(ldlm_lib.c:541:target_handle_reconnect())
> lustre1-OST0005: lustre1-mdtlov_UUID reconnecting
>
>> Please post the soft-lockup report. One possibility: the MDS asks for too
>> many objects to be created on that OST, or the OST has too many reconnects.
>
> LustreError: 12972:0:(ldlm_lib.c:1863:target_send_reply_msg()) Skipped 71
> previous similar messages
> BUG: soft lockup - CPU#4 stuck for 59s!
> [full soft-lockup trace trimmed; the complete report is in Adrian's earlier message]
>
> Btw: We are running 1.8.1.1 (with rhel kernel
> 2.6.18-128.7.1.el5_lustre.1.8.1.1)
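[Editorial sketch of the procfs tuning Alexey suggests above. The parameter name is an assumption for Lustre 1.8 and may differ between versions; verify it with `lctl list_param osc.*` on the MDS before relying on it.]

```shell
# Hypothetical tuning sketch (parameter name unverified): on the MDS, inspect
# how many objects each precreate RPC asks the OST to create...
#   lctl get_param osc.lustre1-OST0005-osc.create_count
# ...and reduce it so one batch completes well inside the ~10s soft-lockup
# detection window:
#   lctl set_param osc.lustre1-OST0005-osc.create_count=32
```

A smaller batch trades slightly more create RPCs for shorter uninterrupted work on the OST threads, which is what silences the soft-lockup warnings.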
Re: [Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)
Hi Alexey,

> llog_reader is a tool to read the configuration llog; if you want to decode
> a debug log, you should use: lctl df $file > $output

Oh, sorry for mixing this up.

'lctl df' doesn't show much new stuff:

0001:0400:0:1281721514.362102:0:13008:0:(ldlm_lib.c:541:target_handle_reconnect())
lustre1-OST0005: 9f880a5e-2331-07a8-8611-d6e3102f466e reconnecting
0001:0400:0:1281721514.362107:0:13008:0:(ldlm_lib.c:835:target_handle_connect())
lustre1-OST0005: refuse reconnection from
9f880a5e-2331-07a8-8611-d6e3102f4...@10.201.48.12@o2ib to 0x8101c93b6000;
still busy with 1 active RPCs
0001:0400:7:1281721525.880767:0:11822:0:(ldlm_lib.c:541:target_handle_reconnect())
lustre1-OST0005: lustre1-mdtlov_UUID reconnecting

> Please post the soft-lockup report. One possibility: the MDS asks for too
> many objects to be created on that OST, or the OST has too many reconnects.

LustreError: 12972:0:(ldlm_lib.c:1863:target_send_reply_msg()) Skipped 71 previous similar messages
BUG: soft lockup - CPU#4 stuck for 59s! [ll_ost_creat_00:11833]
CPU 4:
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U)
crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ko2iblnd(U)
ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) raid456(U) xor(U) raid1(U)
netconsole(U) lockd(U) sunrpc(U) rdma_ucm(U) qlgc_vnic(U) ib_sdp(U) rdma_cm(U)
iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6(U)
xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U) iw_nes(U) iw_cxgb3(U)
cxgb3(U) ib_ipath(U) ib_mthca(U) mptctl(U) dm_mirror(U) dm_multipath(U)
scsi_dh(U) video(U) hwmon(U) backlight(U) sbs(U) i2c_ec(U) button(U)
battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U)
parport(U) mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) sr_mod(U) cdrom(U) sg(U)
tpm_infineon(U) tpm(U) tpm_bios(U) mlx4_core(U) i5000_edac(U) edac_mc(U)
i2c_i801(U) pcspkr(U) e1000e(U) i2c_core(U) serio_raw(U) dm_raid45(U)
dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U)
usb_storage(U) ahci(U) ata_piix(U) libata(U) mptsas(U) scsi_transport_sas(U)
mptfc(U) scsi_transport_fc(U) mptspi(U) mptscsih(U) mptbase(U)
scsi_transport_spi(U) shpchp(U) aacraid(U) sd_mod(U) scsi_mod(U) ext3(U)
jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 11833, comm: ll_ost_creat_00 Tainted: G 2.6.18-128.7.1.el5_lustre.1.8.1.1 #1
RIP: 0010:[] [] :ldiskfs:ldiskfs_find_entry+0x1d4/0x5c0
RSP: 0018:8101e715d500 EFLAGS: 0202
RAX:  RBX: 0008 RCX: 0c91a9f1
RDX: 8101e8893800 RSI: 8101e715d4e8 RDI: 81010773e838
RBP: 0002 R08: 81017bd9cff8 R09: 81017bd9c000
R10: 810216dfb000 R11: 4c6578dc R12: 81017d41e6d0
R13: 80063b4c R14: 8101e715d5b8 R15: 80014fae
FS: 2ab1d8b97220() GS:81021fc74bc0() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 09c9c178 CR3: 00201000 CR4: 06e0

Call Trace:
[] vsnprintf+0x559/0x59e
[] cache_alloc_refill+0x106/0x186
[] :ldiskfs:ldiskfs_lookup+0x53/0x290
[] __lookup_hash+0x10b/0x130
[] lookup_one_len+0x53/0x61
[] :obdfilter:filter_fid2dentry+0x42d/0x730
[] :obdfilter:filter_statfs+0x273/0x350
[] __down_trylock+0x44/0x4e
[] :obdfilter:filter_parent_lock+0x20/0x220
[] :obdfilter:filter_precreate+0x843/0x19e0
[] :lnet:lnet_ni_send+0x93/0xd0
[] dput+0x23/0x10a
[] :obdfilter:filter_create+0x10b9/0x15e0
[] :lnet:LNetPut+0x702/0x800
[] :ptlrpc:ptl_send_buf+0x3f3/0x5b0
[] :ptlrpc:lustre_msg_add_version+0x34/0x110
[] :ptlrpc:ptlrpc_send_reply+0x5c8/0x5e0
[] :ptlrpc:lustre_pack_reply+0x29/0xb0
[] :ost:ost_handle+0x131d/0x5a70
[] vsnprintf+0x559/0x59e
[] :libcfs:libcfs_debug_vmsg2+0x6e8/0x990
[] __next_cpu+0x19/0x28
[] find_busiest_group+0x20d/0x621
[] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
[] enqueue_task+0x41/0x56
[] :ptlrpc:ptlrpc_check_req+0x1d/0x110
[] :ptlrpc:ptlrpc_server_handle_request+0xa97/0x1170
[] lock_timer_base+0x1b/0x3c
[] __wake_up_common+0x3e/0x68
[] :ptlrpc:ptlrpc_main+0x1218/0x13e0
[] default_wake_function+0x0/0xe
[] child_rip+0xa/0x11
[] :ptlrpc:ptlrpc_main+0x0/0x13e0
[] child_rip+0x0/0x11

Btw: We are running 1.8.1.1 (with rhel kernel 2.6.18-128.7.1.el5_lustre.1.8.1.1)

Regards,
 Adrian
Re: [Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)
On Aug 13, 2010, at 20:49, Adrian Ulrich wrote:
> Hi,
>
> Since a few hours we have a problem with one of our OSTs:
>
> One (and only one) ll_ost_create_ process on one of the OSTs
> seems to go crazy and uses 100% CPU.
>
> Rebooting the OST + MDS didn't help and there isn't much
> going on on the filesystem itself:
>
> - /proc/fs/lustre/ost/OSS/ost_create/stats is almost 'static'
> - iostat shows almost no usage
> - ib traffic is < 100 kb/s
>
> The MDS logs this every ~3 minutes:
>
> Aug 13 19:11:14 mds1 kernel: LustreError: 11-0: an error occurred while
> communicating with 10.201.62...@o2ib. The ost_connect operation failed with -16
>
> ..and later:
>
> Aug 13 19:17:16 mds1 kernel: LustreError:
> 10253:0:(osc_create.c:390:osc_create()) lustre1-OST0005-osc: oscc recovery
> failed: -110
> Aug 13 19:17:16 mds1 kernel: LustreError:
> 10253:0:(lov_obd.c:1129:lov_clear_orphans()) error in orphan recovery on OST
> idx 5/32: rc = -110
> Aug 13 19:17:16 mds1 kernel: LustreError:
> 10253:0:(mds_lov.c:1022:__mds_lov_synchronize()) lustre1-OST0005_UUID failed
> at mds_lov_clear_orphans: -110
> Aug 13 19:17:16 mds1 kernel: LustreError:
> 10253:0:(mds_lov.c:1031:__mds_lov_synchronize()) lustre1-OST0005_UUID sync
> failed -110, deactivating
> Aug 13 19:17:54 mds1 kernel: Lustre:
> 6544:0:(import.c:508:import_select_connection()) lustre1-OST0005-osc: tried
> all connections, increasing latency to 51s

-110 = -ETIMEDOUT: the operation didn't finish before its deadline, or there
is a network problem.

> oops! (lustre1-OST0005 is hosted on the OSS with the crazy ll_ost_create
> process)

ll_ost_create works on destroying old created objects, I think.

> On the affected OSS we get
>
> Lustre: 11764:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005:
> refuse reconnection from lustre1-mdtlov_u...@10.201.62.11@o2ib to
> 0x8102164d0200; still busy with 2 active RPCs
>
> $ llog_reader lustre-log.1281718692.11833 shows:

llog_reader is a tool to read the configuration llog; if you want to decode a
debug log, you should use: lctl df $file > $output

> And we get tons of soft-cpu lockups :-/
>
> Any ideas?

Please post the soft-lockup report. One possibility: the MDS asks for too
many objects to be created on that OST, or the OST has too many reconnects.

> Regards,
> Adrian
[Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)
Hi,

Since a few hours we have a problem with one of our OSTs:

One (and only one) ll_ost_create_ process on one of the OSTs
seems to go crazy and uses 100% CPU.

Rebooting the OST + MDS didn't help and there isn't much
going on on the filesystem itself:

- /proc/fs/lustre/ost/OSS/ost_create/stats is almost 'static'
- iostat shows almost no usage
- ib traffic is < 100 kb/s

The MDS logs this every ~3 minutes:

Aug 13 19:11:14 mds1 kernel: LustreError: 11-0: an error occurred while
communicating with 10.201.62...@o2ib. The ost_connect operation failed with -16

..and later:

Aug 13 19:17:16 mds1 kernel: LustreError:
10253:0:(osc_create.c:390:osc_create()) lustre1-OST0005-osc: oscc recovery
failed: -110
Aug 13 19:17:16 mds1 kernel: LustreError:
10253:0:(lov_obd.c:1129:lov_clear_orphans()) error in orphan recovery on OST
idx 5/32: rc = -110
Aug 13 19:17:16 mds1 kernel: LustreError:
10253:0:(mds_lov.c:1022:__mds_lov_synchronize()) lustre1-OST0005_UUID failed
at mds_lov_clear_orphans: -110
Aug 13 19:17:16 mds1 kernel: LustreError:
10253:0:(mds_lov.c:1031:__mds_lov_synchronize()) lustre1-OST0005_UUID sync
failed -110, deactivating
Aug 13 19:17:54 mds1 kernel: Lustre:
6544:0:(import.c:508:import_select_connection()) lustre1-OST0005-osc: tried
all connections, increasing latency to 51s

oops! (lustre1-OST0005 is hosted on the OSS with the crazy ll_ost_create
process)

On the affected OSS we get:

Lustre: 11764:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005:
refuse reconnection from lustre1-mdtlov_u...@10.201.62.11@o2ib to
0x8102164d0200; still busy with 2 active RPCs

$ llog_reader lustre-log.1281718692.11833 shows:

Bit 0 of 284875 not set
Bit -32510 of 284875 not set
Bit -32510 of 284875 not set
Bit -32511 of 284875 not set
Bit 0 of 284875 not set
Bit -1 of 284875 not set
Bit 0 of 284875 not set
Bit -32510 of 284875 not set
Bit -32510 of 284875 not set
Bit -32510 of 284875 not set
Bit -1 of 284875 not set
Bit 0 of 284875 not set
Segmentation fault   <-- *ouch*

And we get tons of soft-cpu lockups :-/

Any ideas?

Regards,
 Adrian