Re: [Lustre-discuss] HA problem with Lustre 2.2
> Mar 28 16:05:56 mds1 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.1.44@o2ib. The ost_connect operation failed with -19

I suppose that's the NID of a 'dead' OST? How was the filesystem formatted? Did you specify --mgsnode and --failnode?

-- 
RFC 1925: (11) Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Questions concerning quotacheck.
Hi,

> But apart from that, when else is quotacheck required? For example, is there any case where the quotas will not be coherent, and quotacheck has to be run again in order to recheck them?

Lustre versions prior to 1.6.5 required you to run quotacheck after server crashes. Recent versions (1.8, 2.x) use journaled quotas and will survive crashes just fine. We are running Lustre 1.8 and 2.2 servers and I never had to re-run quotacheck on them.

> Secondly, the operations manual v1.8.4, section 9.1.2, states that the time quotacheck requires to complete its task is proportional to the number of files in the filesystem. So is there a practical way to get an indication of the amount of time quotacheck will need?

I can't give you exact timings (or a formula), but it's pretty fast: we enabled quotas on our 1.8.x system while about 100 TB were already used (medium-sized files). The initial `quotacheck' run finished within 30 minutes.

Regards, Adrian
Re: [Lustre-discuss] Large Corosync/Pacemaker clusters
> I will try the renice solution you proposed.

Re-nicing corosync should not be required, as the process is supposed to run with RT priority anyway.

> I have been thinking that I could increase the token timeout value in /etc/corosync/corosync.conf to prevent short hiccups. Did you specify a value for this parameter or did you leave the default 1000 ms value?

We configured the token timeout to 17 seconds:

totem {
    [...]
    transport: udpu
    rrp_mode: passive
    token: 17000
}

This configuration has worked just fine for us for months: we didn't see a single 'false positive STONITH' with it.

Regards, Adrian
Re: [Lustre-discuss] [wc-discuss] The ost_connect operation failed with -16
Hello,

> May 30 09:58:36 ccopt kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.123@tcp. The ost_connect operation failed with -16

Error -16 stands for -EBUSY.

> When we got this error message, we failed to run ls, df, vi, touch and so on, which prevented us from doing anything in the file system.

That's to be expected in such a situation. I suppose that 'lfs check servers' returned 'temporarily unavailable' for some OSTs?

> I think the ost_connect failure could report some error messages to users instead of causing interactive actions to get stuck.

No: users shouldn't get an error in such a situation: the filesystem will just hang until the situation has recovered (= the client was able to re-connect to the OST).

Regards, Adrian
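These negative return codes in Lustre console messages are negated POSIX errno values. A quick way to decode them is a short Perl one-off (a sketch; the annotations assume Linux errno numbering):

```perl
#!/usr/bin/env perl
# Decode the negative return codes seen in Lustre console messages:
# they are negated POSIX errno values (on Linux: 19=ENODEV,
# 16=EBUSY, 110=ETIMEDOUT).
use strict;
use warnings;
use POSIX qw(strerror);

for my $rc (-19, -16, -110) {
    printf "rc %-4d => %s\n", $rc, strerror(-$rc);
}
```

On a Linux box this prints "No such device" for -19 (the dead-OST case above) and "Device or resource busy" for -16.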
Re: [Lustre-discuss] stripe don't work
> lmm_stripe_count: 1
> [...]
> The file is only allocated onto the OST with index 4

Yes, because the stripe *count* is set to '1'. Set it to '-1' to use all OSTs:

$ lfs setstripe -c -1 -s 1M .
Re: [Lustre-discuss] problem with secondary groups
Hi Jason,

You need to set L_GETIDENTITY_TEST if you would like to have output on stdout. (It would write the result to /proc/... otherwise.) Example:

[root@n-mds1 ~]# L_GETIDENTITY_TEST=1 l_getidentity nero-MDT 3480
uid=3480 gid=888,10,121,194,196,198,888,1000

Regards, Adrian
Re: [Lustre-discuss] Most recent Linux Kernel on the client for a 1.8.7 server
> could someone please tell me what the most recent kernel version (and lustre version) is on the client side, if I have to stick to 1.8.7 on the server side?

2.x clients will refuse to talk to 1.8.x servers. You can build the 1.8.x client with a few patches on CentOS 6 (2.6.32), but you should really consider upgrading to 2.x in the future.

Regards, Adrian
Re: [Lustre-discuss] recovery from multiple disks failure on the same md
Hi,

> An OST (RAID 6: 8+2, 1 spare) had 2 disk failures almost at the same time. While recovering it, another disk failed, so the recovery procedure seems to have halted.

So did the md array stop itself on the third disk failure (or at least turn read-only)? If it did, you might be able to get it running again without catastrophic corruption. This is what I would try (without any warranty!):

- Forget about the 2 syncing spares.
- Take the third failed disk and attach it to some PC.
- Copy as much data as possible to a new spare using dd_rescue (-r might help).
- Put the drive with the fresh copy (= the good, new drive) into the array and assemble + start it. Use --force if mdadm complains about outdated metadata. (Starting it as 'readonly' for now would also be a good idea.)
- Add a new spare to the array and sync it as fast as possible to get at least 1 parity disk.
- Run 'fsck -n /dev/mdX' to see how badly damaged your filesystem is. If you think that fsck can fix the errors (and will not cause more damage), run it without '-n'.
- Add the 2nd parity disk, sync it, mount the filesystem and pray.

The amount of data corruption will be linked to the success of dd_rescue: you are probably lucky if it only failed to read a few sectors.

And I agree with Kevin: if you have a support contract, ask them to fix it. (And if you have enough hardware + time: create a backup of ALL drives in the failed RAID via 'dd' before touching anything!)

I'd also recommend starting periodic scrubbing: we do this once per month with low priority (~5 MB/s) with little impact on the users.

Regards and good luck, Adrian
Re: [Lustre-discuss] Slow Directory Listing
> While normal file access works fine, the directory listing is extremely slow. Depending on the number of files in a directory, the listing takes around 5 - 15 secs. I tried 'ls --color=none' and it worked fine; it listed the contents immediately.

That's because 'ls --color=always|auto' does an lstat() of each file (--color=none doesn't), which causes Lustre to send:
- 1 RPC to the MDS per file
- 1 RPC (per file) to EACH OSS where the file is stored, to get the file size

Some time ago I created a patch to speed up 'ls' while keeping (most of) the colors (https://github.com/adrian-bl/patchwork/blob/master/coreutils/ls/ls-lustre.diff). But patching Samba will not be possible in your case, as it really needs the information returned by stat().

> Double clicking on a directory takes a long, long time to display.

Attach `strace' to Samba: it will probably be busy doing lstat(), which is a 'slow' operation on Lustre in any case.

> The cluster consists of
> - two DRBD-mirrored MDS servers (Dell R610s) with 10K RPM disks
> - four OSS nodes (2-node clusters (Dell R710s) with common storage (Dell MD3200))

How many OSTs do you have per OSS? What's your stripe setting? Setting the stripe count to 1 could give you a huge speedup (without affecting normal I/O, as I assume that the 9 MB files are read/written sequentially).

Regards, Adrian
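To put rough numbers on that RPC count: a back-of-the-envelope sketch (the file and stripe counts below are made-up examples, not measurements from the poster's cluster):

```shell
#!/bin/sh
# Rough RPC estimate for a colorized `ls`: one MDS RPC per file
# plus (roughly) one OSS RPC per stripe to fetch the file size.
files=10000
for stripe in 4 1; do
    echo "stripe=$stripe: $(( files * (1 + stripe) )) RPCs"
done
```

With 10000 files this is 50000 RPCs at a stripe count of 4 versus 20000 at a stripe count of 1, which is why reducing the stripe count helps listing speed.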
Re: [Lustre-discuss] stuck OSS node
Hi Craig,

> Has anyone seen anything like this?

Yes: we had a similar problem a couple of times. First, try to umount all OSTs on the affected OSS. Some OSTs will (most likely) fail to umount (umount gets stuck due to the ll_ost_io_?? thread). Note the 'broken' OSTs and kill the OSS (echo b > /proc/sysrq-trigger) after the 'good' OSTs have finished umounting.

Afterwards do a simple 'e2fsck -f -p' on the bad OSTs - it should complain about corrupted directories and other nice things. If it doesn't, upgrade to the latest fsck from Whamcloud. (We had a corruption a few months ago that was unfixable/not detected with the 1.8.4-sun e2fsprogs.)

> This is a recent phenomenon - we are not sure, but we think it may be related to a particular workload. Our o2ib clients don't seem to have any trouble.

I don't think that this issue is related to the network: it's probably just 'bad luck' that only the tcp clients hit the corrupted directories.

Regards, Adrian
Re: [Lustre-discuss] New wc-discuss Lustre Mailing List
> you can subscribe simply by sending an e-mail to wc-discuss+subscr...@googlegroups.com.

This bounces, but sending an e-mail to wc-discuss+subscr...@whamcloud.com works. However, the link in the verification mail will take you to a login page - so you still need to have a Google account to subscribe :-(
Re: [Lustre-discuss] Cannot mount MDS: Lustre: Denying initial registration attempt from nid 10.201.62...@o2ib, specified as failover
Hi Kevin,

> But you specified that as a failover node:
> # tunefs.lustre --erase-params --param=failover.node=10.201.62...@o2ib,10.201.30...@tcp failover.node=10.201.62...@o2ib,10.201.30...@tcp mdt.group_upcall=/usr/sbin/l_getgroups /dev/md10

Well, first I was just running

# tunefs.lustre --param mdt.quota_type=ug /dev/md10

and this alone was enough to break it. Then I tried to remove the quota option with --erase-params, and I included both nodes (the primary + failover) because 'tunefs.lustre /dev/md10' displayed them.

> Not sure what you mean when you say it worked before

It worked before we added the *.quota_type parameters: this installation is over 1 year old and saw quite a few remounts and an upgrade from 1.8.1.1 to 1.8.4.

> did you specify both sets on your mkfs command line?

The initial installation was done / dictated by the Swiss branch of a (no longer existing) three-letter company. This command was used to create the filesystem on the MDS:

# FS_NAME=lustre1
# MGS_1=10.201.62...@o2ib0,10.201.30...@tcp0
# MGS_2=10.201.62...@o2ib0,10.201.30...@tcp0
# mkfs.lustre --reformat --fsname ${FS_NAME} --mdt --mgs --failnode=${MGS_1} --failnode=${MGS_2} /dev/md10

Regards and thanks, Adrian
[Lustre-discuss] Cannot mount MDS: Lustre: Denying initial registration attempt from nid 10.201.62...@o2ib, specified as failover
Hi,

Our MDS refuses to start after we tried to enable quotas. What we did:

# umount /lustre/mds
# tunefs.lustre --param mdt.quota_type=ug /dev/md10
(as described in http://wiki.lustre.org/manual/LustreManual18_HTML/ConfiguringQuotas.html)
# sync
# mount -t lustre /dev/md10 /lustre/mds
--- at this point, the mds crashed ---

Now the MDS refuses to start up:

Lustre: OBD class driver, http://www.lustre.org/
Lustre: Lustre Version: 1.8.4
Lustre: Build Version: 1.8.4-20100726215630-PRISTINE-2.6.18-194.3.1.el5_lustre.1.8.4
Lustre: Listener bound to ib0:10.201.62.11:987:mlx4_0
Lustre: Register global MR array, MR size: 0x, array size: 1
Lustre: Added LNI 10.201.62...@o2ib [8/64/0/180]
Lustre: Added LNI 10.201.30...@tcp [8/256/0/180]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; http://www.lustre.org/
init dynlocks cache
ldiskfs created from ext3-2.6-rhel5
kjournald starting. Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on md10, internal journal
LDISKFS-fs: recovery complete.
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on md10, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
Lustre: MGS MGS started
Lustre: mgc10.201.62...@o2ib: Reactivating import
Lustre: Denying initial registration attempt from nid 10.201.62...@o2ib, specified as failover
LustreError: 137-5: UUID 'lustre1-MDT_UUID' is not available for connect (no target)
LustreError: 6440:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19) r...@81021986a000 x1352839800570911/t0 o38-?@?:0/0 lens 368/0 e 0 to 0 dl 1290181453 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 137-5: UUID 'lustre1-MDT_UUID' is not available for connect (no target)
LustreError: Skipped 1 previous similar message
LustreError: 6441:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19) r...@81021986ac00 x1352839303546603/t0 o38-?@?:0/0 lens 368/0 e 0 to 0 dl 1290181453 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6441:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 1 previous similar message
LustreError: 137-5: UUID 'lustre1-MDT_UUID' is not available for connect (no target)
LustreError: Skipped 17 previous similar messages
LustreError: 6459:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19) r...@8101ee758400 x1352840769468288/t0 o38-?@?:0/0 lens 368/0 e 0 to 0 dl 1290181454 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6459:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 17 previous similar messages
LustreError: 6423:0:(mgs_handler.c:671:mgs_handle()) MGS handle cmd=253 rc=-99
LustreError: 11-0: an error occurred while communicating with 0...@lo. The mgs_target_reg operation failed with -99
LustreError: 6177:0:(obd_mount.c:1097:server_start_targets()) Required registration failed for lustre1-MDT: -99
LustreError: 137-5: UUID 'lustre1-MDT_UUID' is not available for connect (no target)
LustreError: Skipped 17 previous similar messages
LustreError: 6451:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19) r...@8101ea921800 x1352839510145001/t0 o38-?@?:0/0 lens 368/0 e 0 to 0 dl 1290181455 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6451:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 18 previous similar messages
LustreError: 6177:0:(obd_mount.c:1655:server_fill_super()) Unable to start targets: -99
LustreError: 6177:0:(obd_mount.c:1438:server_put_super()) no obd lustre1-MDT
LustreError: 6177:0:(obd_mount.c:147:server_deregister_mount()) lustre1-MDT not registered
Lustre: MGS has stopped.
LustreError: 137-5: UUID 'lustre1-MDT_UUID' is not available for connect (no target)
LustreError: 6464:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19) r...@8101ec658000 x1352839459803293/t0 o38-?@?:0/0 lens 368/0 e 0 to 0 dl 1290181457 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6464:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 50 previous similar messages
LustreError: Skipped 58 previous similar messages
Lustre: server umount lustre1-MDT complete
LustreError: 6177:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount (-99)

Removing the quota params via

# tunefs.lustre --erase-params --param=failover.node=10.201.62...@o2ib,10.201.30...@tcp failover.node=10.201.62...@o2ib,10.201.30...@tcp mdt.group_upcall=/usr/sbin/l_getgroups /dev/md10

did not help. So what does 'Lustre: Denying initial registration attempt from nid 10.201.62...@o2ib, specified as failover' exactly mean? This *is* 10.201.62.11, and tunefs shows:

checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: lustre1-MDT
Index: 0
Lustre FS: lustre1
Mount type:
Re: [Lustre-discuss] Cannot mount MDS: Lustre: Denying initial registration attempt from nid 10.201.62...@o2ib, specified as failover
> Lustre: Denying initial registration attempt from nid 10.201.62...@o2ib, specified as failover

I've removed the own NID from the MDT and all OSTs: afterwards I was able to mount everything, and Lustre has now been working (again) for 8 hours. However, I have no idea:
- if this is correct (will failover still work?)
- why it worked before
- why it didn't work again after removing the quota option.
Re: [Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)
> - the on-disk structure of the object directory for this OST is corrupted. Run e2fsck -fp /dev/{ostdev} on the unmounted OST filesystem.

e2fsck fixed it: the OST has now been running for 40 minutes without problems:

e2fsck 1.41.6.sun1 (30-May-2009)
lustre1-OST0005: recovering journal
lustre1-OST0005 has been mounted 72 times without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Directory inode 440696867, block 493, offset 0: directory corrupted
Salvage<y>? yes
Directory inode 440696853, block 517, offset 0: directory corrupted
Salvage<y>? yes
Directory inode 440696842, block 560, offset 0: directory corrupted
Salvage<y>? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached inode 17769156
Connect to /lost+found<y>? yes
Inode 17769156 ref count is 2, should be 1. Fix<y>? yes
Unattached zero-length inode 17883901. Clear<y>? yes
Pass 5: Checking group summary information
lustre1-OST0005: ***** FILE SYSTEM WAS MODIFIED *****
lustre1-OST0005: 44279/488382464 files (15.4% non-contiguous), 280329314/1953524992 blocks

But shouldn't the journal of ext3/ldiskfs make running e2fsck unnecessary?

Have a nice weekend and thanks a lot for the fast reply! Regards, Adrian
Re: [Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)
> The journal will prevent inconsistencies in the filesystem in case of a crash. It cannot prevent corruption of the on-disk data, inconsistencies caused by cache enabled on the disks or in a RAID controller, software bugs, memory corruption, bad cables, etc.

The OSS is part of a 'Snowbird' installation, so the RAID/disk part should be fine. I hope that we 'just' hit a small software bug :-/

> That is why it is still a good idea for users to run e2fsck periodically on a filesystem.

OK, we will keep this in mind (e2fsck was surprisingly fast anyway!)

Regards, Adrian
[Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)
Hi,

For a few hours now we have had a problem with one of our OSTs: one (and only one) ll_ost_create_ process on one of the OSSs seems to go crazy and uses 100% CPU. Rebooting the OSS + MDS didn't help, and there isn't much going on on the filesystem itself:
- /proc/fs/lustre/ost/OSS/ost_create/stats is almost 'static'
- iostat shows almost no usage
- IB traffic is 100 kb/s

The MDS logs this every ~3 minutes:

Aug 13 19:11:14 mds1 kernel: LustreError: 11-0: an error occurred while communicating with 10.201.62...@o2ib. The ost_connect operation failed with -16

..and later:

Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(osc_create.c:390:osc_create()) lustre1-OST0005-osc: oscc recovery failed: -110
Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(lov_obd.c:1129:lov_clear_orphans()) error in orphan recovery on OST idx 5/32: rc = -110
Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(mds_lov.c:1022:__mds_lov_synchronize()) lustre1-OST0005_UUID failed at mds_lov_clear_orphans: -110
Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(mds_lov.c:1031:__mds_lov_synchronize()) lustre1-OST0005_UUID sync failed -110, deactivating
Aug 13 19:17:54 mds1 kernel: Lustre: 6544:0:(import.c:508:import_select_connection()) lustre1-OST0005-osc: tried all connections, increasing latency to 51s

oops! (lustre1-OST0005 is hosted on the OSS with the crazy ll_ost_create process)

On the affected OSS we get:

Lustre: 11764:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005: refuse reconnection from lustre1-mdtlov_u...@10.201.62.11@o2ib to 0x8102164d0200; still busy with 2 active RPCs

$ llog_reader lustre-log.1281718692.11833 shows:
Bit 0 of 284875 not set
Bit -32510 of 284875 not set
Bit -32510 of 284875 not set
Bit -32511 of 284875 not set
Bit 0 of 284875 not set
Bit -1 of 284875 not set
Bit 0 of 284875 not set
Bit -32510 of 284875 not set
Bit -32510 of 284875 not set
Bit -32510 of 284875 not set
Bit -1 of 284875 not set
Bit 0 of 284875 not set
Segmentation fault -- *ouch*

And we get tons of soft CPU lockups :-/ Any ideas?

Regards, Adrian
Re: [Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)
Hi Alexey,

> Llog_reader is a tool to read the configuration llog; if you want to decode a debug log, you should use lctl df $file $output

Oh, sorry for mixing this up. 'lctl df' doesn't show much new stuff:

0001:0400:0:1281721514.362102:0:13008:0:(ldlm_lib.c:541:target_handle_reconnect()) lustre1-OST0005: 9f880a5e-2331-07a8-8611-d6e3102f466e reconnecting
0001:0400:0:1281721514.362107:0:13008:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005: refuse reconnection from 9f880a5e-2331-07a8-8611-d6e3102f4...@10.201.48.12@o2ib to 0x8101c93b6000; still busy with 1 active RPCs
0001:0002: 0001:0400:7:1281721525.880767:0:11822:0:(ldlm_lib.c:541:target_handle_reconnect()) lustre1-OST0005: lustre1-mdtlov_UUID reconnecting

> please post the soft-lockup report. One possibility: the MDS asks for too many objects to be created on that OST, or the OST has too many reconnects.

LustreError: 12972:0:(ldlm_lib.c:1863:target_send_reply_msg()) Skipped 71 previous similar messages
BUG: soft lockup - CPU#4 stuck for 59s! [ll_ost_creat_00:11833]
CPU 4:
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) raid456(U) xor(U) raid1(U) netconsole(U) lockd(U) sunrpc(U) rdma_ucm(U) qlgc_vnic(U) ib_sdp(U) rdma_cm (U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U) iw_nes(U) iw_cxgb3(U) cxgb3(U) ib_ipath(U) ib_mthca(U) mptctl(U) dm_mirror(U) dm_multipath(U) scsi_dh(U) video(U) hwmon(U) backlight(U) sbs(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) sr_mod(U) cdrom(U) sg(U) tpm_infineon(U) tpm(U) tpm_bios(U) mlx4_core(U) i5000_edac(U) edac_mc(U) i2c_i801(U) pcspkr(U) e1000e(U) i2c_core(U) serio_raw(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) usb_storage(U) ahci(U)
ata_piix(U) libata(U) mptsas(U) scsi_transport_sas(U) mptfc(U) scsi_transport_fc(U) mptspi(U) mptscsih(U) mptbase(U) scsi_transport_spi(U) shpchp(U) aacraid(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U) Pid: 11833, comm: ll_ost_creat_00 Tainted: G 2.6.18-128.7.1.el5_lustre.1.8.1.1 #1 RIP: 0010:[88b49af4] [88b49af4] :ldiskfs:ldiskfs_find_entry+0x1d4/0x5c0 RSP: 0018:8101e715d500 EFLAGS: 0202 RAX: RBX: 0008 RCX: 0c91a9f1 RDX: 8101e8893800 RSI: 8101e715d4e8 RDI: 81010773e838 RBP: 0002 R08: 81017bd9cff8 R09: 81017bd9c000 R10: 810216dfb000 R11: 4c6578dc R12: 81017d41e6d0 R13: 80063b4c R14: 8101e715d5b8 R15: 80014fae FS: 2ab1d8b97220() GS:81021fc74bc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 09c9c178 CR3: 00201000 CR4: 06e0 Call Trace: [8001a1b8] vsnprintf+0x559/0x59e [8005ba4d] cache_alloc_refill+0x106/0x186 [88b4bf63] :ldiskfs:ldiskfs_lookup+0x53/0x290 [800366e8] __lookup_hash+0x10b/0x130 [800e2c9b] lookup_one_len+0x53/0x61 [88bd71ed] :obdfilter:filter_fid2dentry+0x42d/0x730 [88bd3383] :obdfilter:filter_statfs+0x273/0x350 [8026f08b] __down_trylock+0x44/0x4e [88bd2f00] :obdfilter:filter_parent_lock+0x20/0x220 [88bd7d43] :obdfilter:filter_precreate+0x843/0x19e0 [887e0923] :lnet:lnet_ni_send+0x93/0xd0 [8000d0bd] dput+0x23/0x10a [88be1e19] :obdfilter:filter_create+0x10b9/0x15e0 [887e6ad2] :lnet:LNetPut+0x702/0x800 [888e9a13] :ptlrpc:ptl_send_buf+0x3f3/0x5b0 [888ef394] :ptlrpc:lustre_msg_add_version+0x34/0x110 [888ea198] :ptlrpc:ptlrpc_send_reply+0x5c8/0x5e0 [888f1f69] :ptlrpc:lustre_pack_reply+0x29/0xb0 [88ba161d] :ost:ost_handle+0x131d/0x5a70 [8001a1b8] vsnprintf+0x559/0x59e [88797978] :libcfs:libcfs_debug_vmsg2+0x6e8/0x990 [80148e8c] __next_cpu+0x19/0x28 [80088f36] find_busiest_group+0x20d/0x621 [888f3f05] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0 [80089d8d] enqueue_task+0x41/0x56 [888f8c1d] :ptlrpc:ptlrpc_check_req+0x1d/0x110 [888fb357] :ptlrpc:ptlrpc_server_handle_request+0xa97/0x1170 [8003d382] lock_timer_base+0x1b/0x3c [8008881d] 
__wake_up_common+0x3e/0x68 [888fee08] :ptlrpc:ptlrpc_main+0x1218/0x13e0 [8008a3f3] default_wake_function+0x0/0xe [8005dfb1] child_rip+0xa/0x11 [888fdbf0] :ptlrpc:ptlrpc_main+0x0/0x13e0 [8005dfa7] child_rip+0x0/0x11

Btw: we are running 1.8.1.1 (with RHEL kernel 2.6.18-128.7.1.el5_lustre.1.8.1.1)

Regards, Adrian
Re: [Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)
Hi Alexey,

> in general a soft lockup isn't an error; it's just a notice that some operation needed too much time (more than 10s, I think). The attached soft lockup says the OST is busy creating objects after an MDS<->OST reconnect.

Yes, I know that a soft lockup doesn't mean that I hit a bug, but having ll_ost_creat_* wasting 100% CPU doesn't seem to be normal.

> i think you have too busy disks or an overloaded node.

Disk %busy is 5% for all attached disks. The OST is doing almost nothing (there are a few read()s, that's all).

> if you have slow disks - a client can be disconnected before its request is processed, and that request blocks the reconnect from that client.

The recovery of the clients seems to be OK: all clients can write/read data from the OST, but there is something wrong between the MDS and OST0005. This might just be a side-effect of the ll_ost_creat_* issue :-/

Regards, Adrian
Re: [Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
Hi Frederik,

> 'lustre-info.pl --monitor=io-size' seems to sit at collecting data,

`io-size' reads its data from the 'disk I/O size' part of brw_stats:

                       read      |     write
disk I/O size      ios   % cum % |  ios   % cum %

..and in your case there are no stats (for reasons unknown to me...); that's why lustre-info.pl cannot display anything. Otherwise the file looks fine: e.g. `--monitor=io-time' should work.

Regards, Adrian
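For anyone poking at this themselves, a minimal, standalone sketch of how the 'disk I/O size' section can be parsed (the sample input below is invented for illustration; real data lives under /proc/fs/lustre/obdfilter/<OST>/brw_stats, and lustre-info.pl's actual parser may differ):

```perl
#!/usr/bin/env perl
# Standalone sketch: extract per-size I/O counters from the
# 'disk I/O size' section of a brw_stats-style file.
use strict;
use warnings;

sub parse_disk_io_size {
    my ($text) = @_;
    my (%by_size, $in_section);
    for my $line (split /\n/, $text) {
        if ($line =~ /^disk I\/O size/) { $in_section = 1; next }
        next unless $in_section;
        # e.g. "128K:   50  40  41  |  200  48  50"
        if ($line =~ /^(\S+):\s+(\d+)\s+\d+\s+\d+\s+\|\s+(\d+)/) {
            $by_size{$1} = { read_ios => $2, write_ios => $3 };
        }
    }
    return \%by_size;
}

# Hypothetical sample data:
my $sample = <<'EOF';
                        read      |     write
disk I/O size       ios   % cum % |  ios   % cum %
4K:                   2   1   1   |   10   2   2
128K:                50  40  41   |  200  48  50
1M:                  70  58 100   |  205  50 100
EOF

my $stats = parse_disk_io_size($sample);
printf "1M: %d reads, %d writes\n",
    $stats->{'1M'}{read_ios}, $stats->{'1M'}{write_ios};
```

If the section header is present but no counter lines follow (as in Frederik's file), the returned hash is simply empty, which matches the "sits at collecting data" symptom.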
Re: [Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
Hi Frederik,

> If you've got a lot of OSTs on your server you need a wide monitor for some of the options, like --monitor=ost-patterns for all OSTs...

The output format is not ideal, but it's a good reason to upgrade your workstation to a dual-head configuration ;-)

> Looking at the code it seems the value for 'setattr' is missing from the stats file for some of our OSTs. Looking at the stats file, indeed the setattr line is missing for some OSTs.

As Andreas already said: if 'setattr' is missing, there was no setattr operation (yet). Changing

printf(", %s=%5.1f R/s", $type, $stats->{$type}/$slice);

into

printf(", %s=%5.1f R/s", $type, ($stats->{$type}||0)/$slice);

should fix the warning. (The totals are OK, because in Perl undef/$x == 0/$x.)

> 'lustre-info.pl --monitor=io-size' seems to sit at collecting data, please wait... for a very long time until I killed it. I have not had the time to debug this yet.

I never tested it with anything else than 1.8.1.1, but this should be trivial to fix: could you mail me the output of /proc/fs/lustre/obdfilter/##SOME_OST##/exports/##A_RANDOM_NID##/brw_stats ?

Regards, Adrian
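The ||0 guard works because Perl treats undef as 0 in numeric context but warns about it under `use warnings`. A small self-contained illustration (the %stats contents are made up, mimicking an OST with no setattr line):

```perl
#!/usr/bin/env perl
# Why ($stats{$type} || 0) helps: a missing hash key yields undef,
# and using undef as a number emits an "uninitialized value"
# warning under `use warnings`; the guard substitutes 0 explicitly.
use strict;
use warnings;

my %stats = ( read => 12 );   # no 'setattr' key, like the quiet OSTs
my $slice = 4;

my $setattr_rate = ($stats{setattr} || 0) / $slice;  # 0, no warning
my $read_rate    = ($stats{read}    || 0) / $slice;  # 3
printf "setattr=%5.1f R/s, read=%5.1f R/s\n", $setattr_rate, $read_rate;
```

The totals mentioned above stay correct because the substituted 0 contributes nothing to a sum, just as undef-treated-as-0 did.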
[Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
First: Sorry for the shameless self-advertising, but... I uploaded two Lustre-related modules to the CPAN:

#1: Lustre::Info provides easy access to information located at /proc/fs/lustre; it also comes with a 'performance monitoring' script called 'lustre-info.pl'
#2: Lustre::LFS offers IO::Dir- and IO::File-like filehandles, but with additional Lustre-specific features ($dir_fh->set_stripe...)

Examples and details:

Lustre::Info and lustre-info.pl
---
Lustre::Info provides a Perl OO interface to Lustre's procfs information. (Confusing) example code to get the blockdevice of all OSTs:

my $l = Lustre::Info->new;
print join("\n", map( { $l->get_ost($_)->get_name.": ".$l->get_ost($_)->get_blockdevice } @{$l->get_ost_list}), '' ) if $l->is_ost;

..output:

$ perl test.pl
lustre1-OST001e: /dev/md17
lustre1-OST0016: /dev/md15
lustre1-OST000e: /dev/md13
lustre1-OST0006: /dev/md11

The module also includes a script called 'lustre-info.pl' that can be used to gather some live performance statistics. Use `--ost-stats' to get a quick overview of what's going on:

$ lustre-info.pl --ost-stats
lustre1-OST0006 (@ /dev/md11) : write= 5.594 MB/s, read= 0.000 MB/s, create= 0.0 R/s, destroy= 0.0 R/s, setattr= 0.0 R/s, preprw= 6.0 R/s
lustre1-OST000e (@ /dev/md13) : write= 3.997 MB/s, read= 0.000 MB/s, create= 0.0 R/s, destroy= 0.0 R/s, setattr= 0.0 R/s, preprw= 4.0 R/s
lustre1-OST0016 (@ /dev/md15) : write= 5.502 MB/s, read= 0.000 MB/s, create= 0.0 R/s, destroy= 0.0 R/s, setattr= 0.0 R/s, preprw= 6.0 R/s
lustre1-OST001e (@ /dev/md17) : write= 5.905 MB/s, read= 0.000 MB/s, create= 0.0 R/s, destroy= 0.0 R/s, setattr= 0.0 R/s, preprw= 6.7 R/s

You can also get client-OST details via `--monitor=MODE':

$ lustre-info.pl --monitor=ost --as-list   # this will only show clients where read+write >= 1MB/s
client nid        | lustre1-OST0006 | lustre1-OST000e | lustre1-OST0016 | lustre1-OST001e | +++ TOTALS +++ (MB/s)
10.201.46...@o2ib | r= 0.0, w= 0.0  | r= 0.0, w= 0.0  | r= 0.0, w= 0.0  | r= 0.0, w= 1.1  | read= 0.0, write= 1.1
10.201.47...@o2ib | r= 0.0, w= 0.0  | r= 0.0, w= 1.2  | r= 0.0, w= 2.0  | r= 0.0, w= 0.0  | read= 0.0, write= 3.2

There are many more options; check out `lustre-info.pl --help' for details!

Lustre::LFS::Dir and Lustre::LFS::File
---
These two packages behave like IO::File and IO::Dir, but both of them add some Lustre-only features to the returned filehandle. Quick example:

my $fh = Lustre::LFS::File->new;   # $fh is a normal IO::File-like FH
$fh->open("> test") or die;
print $fh "Foo Bar!\n";
my $stripe_info = $fh->get_stripe or die "Not on a lustre filesystem?!\n";

Keep in mind that both Lustre modules are far from complete: Lustre::Info really needs some MDT support, and Lustre::LFS is just a wrapper for /usr/bin/lfs: an XS version would be much better. But I'd love to hear some feedback if someone decides to play around with these modules + lustre-info.pl :-)

Cheers, Adrian