Re: [Lustre-discuss] HA problem with Lustre 2.2

2013-04-01 Thread Adrian Ulrich

 Mar 28 16:05:56 mds1 kernel: LustreError: 11-0: an error occurred while
 communicating with 192.168.1.44@o2ib. The ost_connect operation failed with
 -19

I suppose that's the NID of a 'dead' OST?

How was the filesystem formatted? Did you specify --mgsnode and --failnode?
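
For reference, a minimal sketch of how an OST is usually formatted with both
parameters; the device name and NIDs below are made-up placeholders:

 # mkfs.lustre --fsname=lustre1 --ost \
     --mgsnode=<mgs-nid>@o2ib \
     --failnode=<partner-oss-nid>@o2ib \
     /dev/sdX

Running 'tunefs.lustre /dev/sdX' against the unmounted OST shows what was
actually configured.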

-- 
 RFC 1925:
   (11) Every old idea will be proposed again with a different name and
a different presentation, regardless of whether it works.



Re: [Lustre-discuss] Questions concerning quotacheck.

2013-01-22 Thread Adrian Ulrich
Hi,


 But apart from that, when else is quotacheck required? For example, is there
 any case where the quotas will not be coherent and quotacheck has to be run
 again in order to recheck them?

Lustre versions prior to 1.6.5 required you to run quotacheck after server 
crashes.
Recent versions (1.8, 2.x) use journaled quotas and will survive crashes just 
fine.

We are running Lustre 1.8 and 2.2 servers and I never had to re-run quotacheck
on them.



 Secondly, inside the operations manual v1.8.4, section 9.1.2, it states that
 the time required for quotacheck to complete is proportional to the number of
 files in the filesystem. So is there a practical way to get an indication of
 how much time quotacheck will need?

I can't give you exact timings (or a formula), but it's pretty fast:

We enabled quotas on our 1.8.x system while about 100 TB were already in use
(medium-sized files). The initial `quotacheck' run finished within 30 minutes.
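
For reference, a hedged sketch of the commands involved, assuming quotas are
already enabled on the servers and the filesystem is mounted at /mnt/lustre
(mount point and user name are placeholders):

 $ lfs quotacheck -ug /mnt/lustre     # (re)build the usage tables for users + groups
 $ lfs quota -u someuser /mnt/lustre  # verify the result afterwards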



Regards,
 Adrian


-- 
 RFC 1925:
   (11) Every old idea will be proposed again with a different name and
a different presentation, regardless of whether it works.



Re: [Lustre-discuss] Large Corosync/Pacemaker clusters

2012-11-07 Thread Adrian Ulrich

 I will try the renice solution you proposed.

Re-nicing corosync should not be required, as the process is supposed to run
with realtime priority anyway.


 I have been thinking that I could increase the token timeout value in 
 /etc/corosync/corosync.conf , to prevent short hiccups. Did you 
 specify a value to this parameter or did you leave the default 1000ms value?

We configured the token timeout to 17 seconds:

 totem {
[...]
transport: udpu
rrp_mode: passive
token: 17000
 }


This configuration has worked just fine for us for months: we haven't seen a
single 'false positive' STONITH with it.


Regards,
 Adrian









Re: [Lustre-discuss] [wc-discuss] The ost_connect operation failed with -16

2012-05-29 Thread Adrian Ulrich
Hello,


 May 30 09:58:36 ccopt kernel: LustreError: 11-0: an error occurred while 
 communicating with 192.168.50.123@tcp. The ost_connect operation failed with 
 -16

Error -16 stands for -EBUSY


 When you got this error message, you failed to run ls, df, vi, touch
 and so on, which prevented us from doing anything in the filesystem.

That's to be expected in such a situation. I suppose that 'lfs check servers'
returned 'temporarily unavailable' for some OSTs?
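
If it happens again, the quickest way to confirm which targets are affected is
to run this from a root shell on a client:

 $ lfs check servers    # healthy targets report 'active', unreachable ones print an error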


 I think the ost_connect failure could report an error message to users
 instead of causing interactive actions to get stuck.

No: users shouldn't get an error in such a situation. The filesystem will just
hang until the situation has recovered (i.e. the client was able to reconnect
to the OST).



Regards,
 Adrian


Re: [Lustre-discuss] stripe don't work

2012-05-28 Thread Adrian Ulrich

 lmm_stripe_count:   1
 [...]
 The file is only allocated into OST with index 4

Yes, because the stripe *count* is set to '1'.

Set it to '-1' to use all OSTs:

 $ lfs setstripe -c -1 -s 1M .
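
To double-check the result afterwards (just a quick sanity check; 'bigfile' is
an example name, and only files created after the setstripe call pick up the
new layout):

 $ dd if=/dev/zero of=bigfile bs=1M count=64
 $ lfs getstripe bigfile    # the objects should now be spread over several OSTs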




Re: [Lustre-discuss] problem with secondary groups

2012-05-25 Thread Adrian Ulrich
Hi Jason,

You need to set L_GETIDENTITY_TEST if you would like to have the output on stdout.
(Otherwise it writes the result to /proc/....)

Example:
 [root@n-mds1 ~]# L_GETIDENTITY_TEST=1 l_getidentity nero-MDT 3480
 uid=3480 gid=888,10,121,194,196,198,888,1000



Regards,
 Adrian



-- 
 RFC 1925:
   (11) Every old idea will be proposed again with a different name and
a different presentation, regardless of whether it works.



Re: [Lustre-discuss] Most recent Linux Kernel on the client for a 1.8.7 server

2012-05-16 Thread Adrian Ulrich

 could someone please tell me what the most recent kernel version (and lustre 
 version) is on the client side, if I have to stick to 1.8.7 on the server 
 side?

2.x clients will refuse to talk to 1.8.x servers.

You can build the 1.8.x client with a few patches on CentOS 6 (2.6.32), but you
should really consider upgrading to 2.x in the future.
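
Very roughly, a client-only build looks like this (the kernel source path is a
placeholder, and the exact patch set depends on your kernel; treat this as a
sketch, not a recipe):

 $ ./configure --disable-server \
       --with-linux=/usr/src/kernels/2.6.32-XXX.el6.x86_64
 $ make rpms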

Regards,
 Adrian



-- 
 RFC 1925:
   (11) Every old idea will be proposed again with a different name and
a different presentation, regardless of whether it works.



Re: [Lustre-discuss] recovery from multiple disks failure on the same md

2012-05-07 Thread Adrian Ulrich
Hi,


 An OST (RAID 6: 8+2, 1 spare) had 2 disk failures almost at the same time.
 While recovering it, another disk failed, so the recovery procedure seems to
 have halted,

So did the md array stop itself on the 3rd disk failure (or at least turn
read-only)?

If it did, you might be able to get it running again without catastrophic
corruption.


This is what I would try (without any warranty!):


 - Forget about the 2 syncing spares

 - Take the 3rd failed disk and attach it to another PC

 - Copy as much data as possible to a new spare using dd_rescue
(-r might help)

 - Put the drive with the fresh copy (= the good, new drive) into the array 
and assemble + start it.
Use --force if mdadm complains about outdated metadata.
(and starting it as 'readonly' for now would also be a good idea)

 - Add a new spare to the array and sync it as fast as possible to get at 
least 1 parity disk.

 - Run 'fsck -n /dev/mdX' to see how badly damaged your filesystem is.
If you think that fsck can fix the errors (and will not cause more
damage), run it without '-n'.

 - Add the 2nd parity disk, sync it, mount the filesystem and pray.


The amount of data corruption will depend on how well dd_rescue did: you can
consider yourself lucky if it only failed to read a few sectors.
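
A rough outline of the steps above in shell form; every device name here is
made up, the real member list depends on your array, and this is (again)
without any warranty:

 # 1. clone the 3rd failed disk onto a fresh drive
 $ dd_rescue /dev/sdFAILED /dev/sdNEW
 # 2. assemble the array read-only, forcing mdadm past outdated metadata if needed
 $ mdadm --assemble --force --readonly /dev/md20 /dev/sd[b-i] /dev/sdNEW
 # 3. dry-run fsck first; only drop '-n' once the damage looks fixable
 $ fsck -n /dev/md20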


And I agree with Kevin:

If you have a support contract, ask them to fix it.
(...and if you have enough hardware and time, create a backup of ALL drives in
the failed RAID via 'dd' before touching anything!)


I'd also recommend starting periodic scrubbing: we do this once per month at
low priority (~5 MB/s) with little impact on users.
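
On md-based OSTs this only needs the kernel's built-in check; for example (the
md device name is a placeholder):

 $ echo 5000 > /proc/sys/dev/raid/speed_limit_max   # cap the check at ~5 MB/s
 $ echo check > /sys/block/md11/md/sync_action      # start a scrub; watch /proc/mdstat for progress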


Regards and good luck,
 Adrian


Re: [Lustre-discuss] Slow Directory Listing

2011-09-06 Thread Adrian Ulrich

 While normal file access works fine, the directory listing is extremely
 slow.
 Depending on the number of files in a directory, the listing takes around 5
 - 15 secs.
 
 I tried 'ls --color=none' and it worked fine; listed the contents
 immediately.

That's because 'ls --color=always|auto' does an lstat() on each file
(--color=none doesn't), which causes Lustre to send:

 - 1 RPC to the MDS per file
 - 1 RPC (per file) to EACH OSS where the file is stored to get the file size

Some time ago I created a patch to speed up 'ls' while keeping (most of) the
colors:
(https://github.com/adrian-bl/patchwork/blob/master/coreutils/ls/ls-lustre.diff)

But patching Samba is not an option in your case, as it really needs the
information returned by stat().


 Double clicking on directory takes a long long time to display.

Attach `strace' to Samba: it will probably be busy doing lstat(), which is a
'slow' operation on Lustre in any case.
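
Something along these lines usually makes the pattern obvious (the PID is a
placeholder for the smbd process serving that share):

 $ strace -c -f -p 12345                     # per-syscall counters; stop with Ctrl-C
 $ strace -f -e trace=stat,lstat -p 12345    # or watch the stat()/lstat() storm live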



 The cluster consist of -
 - two DRBD Mirrored MDS Servers (Dell R610s) with 10K RPM disks
 - four OSS Nodes (2 Node Cluster (Dell R710s) with a common storage (Dell
 MD3200))

How many OSTs do you have per OSS?
What's your stripe setting? Setting the stripe count to 1 could give you a huge
speedup (without affecting normal I/O, as I assume the 9 MB files are
read/written sequentially).
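
Setting that would be a one-liner on the directory Samba exports (the path is a
placeholder); only files created afterwards are affected:

 $ lfs setstripe -c 1 /lustre/samba-share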


Regards,
 Adrian


-- 
 RFC 1925:
   (11) Every old idea will be proposed again with a different name and
a different presentation, regardless of whether it works.



Re: [Lustre-discuss] stuck OSS node

2011-08-05 Thread Adrian Ulrich
Hi Craig,

 Has anyone seen anything like this?

Yes: we had a similar problem a couple of times:


First, try to unmount all OSTs on the affected OSS.

Some OSTs will (most likely) fail to unmount (umount gets stuck due to the
ll_ost_io_?? thread).
Note the 'broken' OSTs and kill the OSS (echo b > /proc/sysrq-trigger) after
the 'good' OSTs have finished unmounting.

Afterwards, run a simple 'e2fsck -f -p' on the bad OSTs - it should complain
about corrupted directories and other nice things. If it doesn't, upgrade to
the latest e2fsck from Whamcloud.
(We had a corruption a few months ago that was not detected/not fixable with
the 1.8.4-sun e2fsprogs.)
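
In shell terms the whole sequence looks roughly like this (mount points and
devices are made up):

 $ umount /lustre/ost11           # repeat for every OST on the affected OSS
 $ echo b > /proc/sysrq-trigger   # only after the 'good' OSTs are unmounted
 $ e2fsck -f -p /dev/md11         # after the reboot, on each OST that was stuck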



 This is a recent phenomenon - we are not
 sure, but we think it may be related to a particular workload.  Our o2ib
 clients don't seem to have any trouble.

I don't think that this issue is related to the network: it's probably just
'bad luck' that only the TCP clients hit the corrupted directories.



Regards,
 Adrian


Re: [Lustre-discuss] New wc-discuss Lustre Mailing List

2011-07-03 Thread Adrian Ulrich

 you can subscribe simply by sending an e-mail to
 wc-discuss+subscr...@googlegroups.com.

This bounces, but sending an e-mail to wc-discuss+subscr...@whamcloud.com
works.
However, the link in the verification mail takes you to a login page, so you
still need a Google account to subscribe :-(




-- 
 RFC 1925:
   (11) Every old idea will be proposed again with a different name and
a different presentation, regardless of whether it works.





Re: [Lustre-discuss] Cannot mount MDS: Lustre: Denying initial registration attempt from nid 10.201.62...@o2ib, specified as failover

2010-11-21 Thread Adrian Ulrich
Hi Kevin,

 But you specified that as a failover node:
   # tunefs.lustre --erase-params 
 --param=failover.node=10.201.62...@o2ib,10.201.30...@tcp 
 failover.node=10.201.62...@o2ib,10.201.30...@tcp 
 mdt.group_upcall=/usr/sbin/l_getgroups /dev/md10

Well: first I was just running

# tunefs.lustre --param mdt.quota_type=ug /dev/md10

and this alone was enough to break it.

Then I tried to remove the quota option with --erase-params, and I included
both nodes (the primary + the failover) because 'tunefs.lustre /dev/md10'
displayed them.


 Not sure what you mean when you say it worked before

It worked before we added the *.quota_type parameters: this installation
is over 1 year old and saw quite a few remounts and an upgrade from
1.8.1.1 to 1.8.4.


 did you specify both sets on your mkfs command line?

The initial installation was done / dictated by the Swiss branch of
a (no longer existing) three-letter company. This command was used
to create the filesystem on the MDS:

# FS_NAME=lustre1
# MGS_1=10.201.62...@o2ib0,10.201.30...@tcp0
# MGS_2=10.201.62...@o2ib0,10.201.30...@tcp0
# mkfs.lustre --reformat --fsname ${FS_NAME} --mdt --mgs --failnode=${MGS_1} \
    --failnode=${MGS_2} /dev/md10


Regards and thanks,
 Adrian


-- 
 RFC 1925:
   (11) Every old idea will be proposed again with a different name and
a different presentation, regardless of whether it works.



[Lustre-discuss] Cannot mount MDS: Lustre: Denying initial registration attempt from nid 10.201.62...@o2ib, specified as failover

2010-11-19 Thread Adrian Ulrich
Hi,


Our MDS refuses to start after we tried to enable quotas:


What we did:
 # umount /lustre/mds
 # tunefs.lustre --param mdt.quota_type=ug /dev/md10 (as described in 
http://wiki.lustre.org/manual/LustreManual18_HTML/ConfiguringQuotas.html)
 # sync
 # mount -t lustre /dev/md10 /lustre/mds
--- at this point, the MDS crashed ---

Now the MDS refuses to start up:

Lustre: OBD class driver, http://www.lustre.org/
Lustre: Lustre Version: 1.8.4
Lustre: Build Version: 
1.8.4-20100726215630-PRISTINE-2.6.18-194.3.1.el5_lustre.1.8.4
Lustre: Listener bound to ib0:10.201.62.11:987:mlx4_0
Lustre: Register global MR array, MR size: 0x, array size: 1
Lustre: Added LNI 10.201.62...@o2ib [8/64/0/180]
Lustre: Added LNI 10.201.30...@tcp [8/256/0/180]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; http://www.lustre.org/
init dynlocks cache
ldiskfs created from ext3-2.6-rhel5
kjournald starting.  Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on md10, internal journal
LDISKFS-fs: recovery complete.
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on md10, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
Lustre: MGS MGS started
Lustre: mgc10.201.62...@o2ib: Reactivating import
Lustre: Denying initial registration attempt from nid 10.201.62...@o2ib, 
specified as failover
LustreError: 137-5: UUID 'lustre1-MDT_UUID' is not available  for connect 
(no target)
LustreError: 6440:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing 
error (-19)  r...@81021986a000 x1352839800570911/t0 o38-?@?:0/0 lens 
368/0 e 0 to 0 dl 1290181453 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 137-5: UUID 'lustre1-MDT_UUID' is not available  for connect 
(no target)
LustreError: Skipped 1 previous similar message
LustreError: 6441:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing 
error (-19)  r...@81021986ac00 x1352839303546603/t0 o38-?@?:0/0 lens 
368/0 e 0 to 0 dl 1290181453 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6441:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 1 
previous similar message
LustreError: 137-5: UUID 'lustre1-MDT_UUID' is not available  for connect 
(no target)
LustreError: Skipped 17 previous similar messages
LustreError: 6459:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing 
error (-19)  r...@8101ee758400 x1352840769468288/t0 o38-?@?:0/0 lens 
368/0 e 0 to 0 dl 1290181454 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6459:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 17 
previous similar messages
LustreError: 6423:0:(mgs_handler.c:671:mgs_handle()) MGS handle cmd=253 rc=-99
LustreError: 11-0: an error occurred while communicating with 0...@lo. The 
mgs_target_reg operation failed with -99
LustreError: 6177:0:(obd_mount.c:1097:server_start_targets()) Required 
registration failed for lustre1-MDT: -99
LustreError: 137-5: UUID 'lustre1-MDT_UUID' is not available  for connect 
(no target)
LustreError: Skipped 17 previous similar messages
LustreError: 6451:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing 
error (-19)  r...@8101ea921800 x1352839510145001/t0 o38-?@?:0/0 lens 
368/0 e 0 to 0 dl 1290181455 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6451:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 18 
previous similar messages
LustreError: 6177:0:(obd_mount.c:1655:server_fill_super()) Unable to start 
targets: -99
LustreError: 6177:0:(obd_mount.c:1438:server_put_super()) no obd lustre1-MDT
LustreError: 6177:0:(obd_mount.c:147:server_deregister_mount()) lustre1-MDT 
not registered
Lustre: MGS has stopped.
LustreError: 137-5: UUID 'lustre1-MDT_UUID' is not available  for connect 
(no target)
LustreError: 6464:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing 
error (-19)  r...@8101ec658000 x1352839459803293/t0 o38-?@?:0/0 lens 
368/0 e 0 to 0 dl 1290181457 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6464:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 50 
previous similar messages
LustreError: Skipped 58 previous similar messages
Lustre: server umount lustre1-MDT complete
LustreError: 6177:0:(obd_mount.c:2050:lustre_fill_super()) Unable to
mount  (-99)


Removing the quota params via
 # tunefs.lustre --erase-params 
--param=failover.node=10.201.62...@o2ib,10.201.30...@tcp 
failover.node=10.201.62...@o2ib,10.201.30...@tcp 
mdt.group_upcall=/usr/sbin/l_getgroups /dev/md10

did not help.


So what exactly does 'Lustre: Denying initial registration attempt from nid
10.201.62...@o2ib, specified as failover' mean?
This node *is* 10.201.62.11, and tunefs shows:

checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target: lustre1-MDT
Index:  0
Lustre FS:  lustre1
Mount type: 

Re: [Lustre-discuss] Cannot mount MDS: Lustre: Denying initial registration attempt from nid 10.201.62...@o2ib, specified as failover

2010-11-19 Thread Adrian Ulrich

 Lustre: Denying initial registration attempt from nid 10.201.62...@o2ib, 
 specified as failover

I've removed the node's own NID from the MDT and all OSTs: afterwards I was
able to mount everything, and Lustre has now been working (again) for 8 hours.
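
For the record, the change boiled down to rewriting the failover parameter so
that it only lists the partner node - roughly this kind of tunefs.lustre
invocation (the NIDs are placeholders, and this is not necessarily the exact
command we ran):

 # tunefs.lustre --erase-params \
     --param="failover.node=<partner-nid>@o2ib,<partner-nid>@tcp" \
     --param="mdt.group_upcall=/usr/sbin/l_getgroups" \
     /dev/md10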

However, I have no idea:

 - whether this is correct (will failover still work?)
 - why it worked before
 - why it didn't start working again after removing the quota option.



-- 
 RFC 1925:
   (11) Every old idea will be proposed again with a different name and
a different presentation, regardless of whether it works.



Re: [Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)

2010-08-14 Thread Adrian Ulrich

 - the on-disk structure of the object directory for this OST is corrupted.
   Run e2fsck -fp /dev/{ostdev} on the unmounted OST filesystem.

e2fsck fixed it: the OST has now been running for 40 minutes without problems:

e2fsck 1.41.6.sun1 (30-May-2009)
lustre1-OST0005: recovering journal
lustre1-OST0005 has been mounted 72 times without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Directory inode 440696867, block 493, offset 0: directory corrupted
Salvage<y>? yes

Directory inode 440696853, block 517, offset 0: directory corrupted
Salvage<y>? yes

Directory inode 440696842, block 560, offset 0: directory corrupted
Salvage<y>? yes

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached inode 17769156
Connect to /lost+found<y>? yes

Inode 17769156 ref count is 2, should be 1.  Fix<y>? yes

Unattached zero-length inode 17883901.  Clear<y>? yes

Pass 5: Checking group summary information

lustre1-OST0005: * FILE SYSTEM WAS MODIFIED *
lustre1-OST0005: 44279/488382464 files (15.4% non-contiguous), 
280329314/1953524992 blocks



But shouldn't the journal of ext3/ldiskfs make running e2fsck unnecessary?


Have a nice weekend and thanks a lot for the fast reply!

Regards,
 Adrian




Re: [Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)

2010-08-14 Thread Adrian Ulrich
 The journal will prevent inconsistencies in the filesystem in case of a crash.
 It cannot prevent corruption of the on-disk data, inconsistencies caused by 
 cache
 enabled on the disks or in a RAID controller, software bugs, memory 
 corruption, bad cables, etc. 

The OSS is part of a 'Snowbird' installation, so the RAID/Disk part should be 
fine.
I hope that we 'just' hit a small software bug :-/


 That is why it is still a good idea for users to run e2fsck periodically on a 
 filesystem.

Ok, we will keep this in mind (e2fsck was surprisingly fast anyway!)


Regards,
 Adrian

-- 
 RFC 1925:
   (11) Every old idea will be proposed again with a different name and
a different presentation, regardless of whether it works.



[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)

2010-08-13 Thread Adrian Ulrich
Hi,

Since a few hours we have a problem with one of our OSTs:

One (and only one) ll_ost_create_ process on one of the OSS nodes
seems to have gone crazy and uses 100% CPU.

Rebooting the OSS + MDS didn't help, and there isn't much
going on on the filesystem itself:

 - /proc/fs/lustre/ost/OSS/ost_create/stats is almost 'static'
 - iostat shows almost no usage
 - IB traffic is < 100 kB/s


The MDS logs this every ~3 minutes:
 Aug 13 19:11:14 mds1 kernel: LustreError: 11-0: an error occurred while 
communicating with 10.201.62...@o2ib. The ost_connect operation failed with -16
..and later:
 Aug 13 19:17:16 mds1 kernel: LustreError: 
10253:0:(osc_create.c:390:osc_create()) lustre1-OST0005-osc: oscc recovery 
failed: -110
 Aug 13 19:17:16 mds1 kernel: LustreError: 
10253:0:(lov_obd.c:1129:lov_clear_orphans()) error in orphan recovery on OST 
idx 5/32: rc = -110
 Aug 13 19:17:16 mds1 kernel: LustreError: 
10253:0:(mds_lov.c:1022:__mds_lov_synchronize()) lustre1-OST0005_UUID failed at 
mds_lov_clear_orphans: -110
 Aug 13 19:17:16 mds1 kernel: LustreError: 
10253:0:(mds_lov.c:1031:__mds_lov_synchronize()) lustre1-OST0005_UUID sync 
failed -110, deactivating
 Aug 13 19:17:54 mds1 kernel: Lustre: 
6544:0:(import.c:508:import_select_connection()) lustre1-OST0005-osc: tried all 
connections, increasing latency to 51s

oops! (lustre1-OST0005 is hosted on the OSS with the crazy ll_ost_create 
process)

On the affected OSS we get:
 Lustre: 11764:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005: 
refuse reconnection from lustre1-mdtlov_u...@10.201.62.11@o2ib to 
0x8102164d0200; still busy with 2 active RPCs


$ llog_reader lustre-log.1281718692.11833 shows:
Bit 0 of 284875 not set
Bit -32510 of 284875 not set
Bit -32510 of 284875 not set
Bit -32511 of 284875 not set
Bit 0 of 284875 not set
Bit -1 of 284875 not set
Bit 0 of 284875 not set
Bit -32510 of 284875 not set
Bit -32510 of 284875 not set
Bit -32510 of 284875 not set
Bit -1 of 284875 not set
Bit 0 of 284875 not set
Segmentation fault -- *ouch*


And we get tons of soft CPU lockups :-/

Any ideas?


Regards,
 Adrian




Re: [Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)

2010-08-13 Thread Adrian Ulrich
Hi Alexey,

 llog_reader is a tool to read the configuration llog; if you want to decode a
 debug log, you should use 'lctl df $file > $output'

Oh, sorry for mixing this up.

'lctl df' doesn't show much new stuff:

0001:0400:0:1281721514.362102:0:13008:0:(ldlm_lib.c:541:target_handle_reconnect())
 lustre1-OST0005:
 9f880a5e-2331-07a8-8611-d6e3102f466e reconnecting
0001:0400:0:1281721514.362107:0:13008:0:(ldlm_lib.c:835:target_handle_connect())
 lustre1-OST0005:
refuse reconnection from 9f880a5e-2331-07a8-8611-d6e3102f4...@10.201.48.12@o2ib 
to 0x8101c93b6000; still busy with 1 active RPCs
0001:0002:
0001:0400:7:1281721525.880767:0:11822:0:(ldlm_lib.c:541:target_handle_reconnect())
 lustre1-OST0005:
lustre1-mdtlov_UUID reconnecting


 Please post the soft-lockup report. One possibility: the MDS asks that OST to
 create too many objects, or the OST has too many reconnects.

LustreError: 12972:0:(ldlm_lib.c:1863:target_send_reply_msg()) Skipped 71 
previous similar messages
BUG: soft lockup - CPU#4 stuck for 59s! [ll_ost_creat_00:11833]
CPU 4:
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) 
crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ko2iblnd(U)
 ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) raid456(U) xor(U) raid1(U) 
netconsole(U) lockd(U) sunrpc(U) rdma_ucm(U) qlgc_vnic(U) ib_sdp(U) rdma_cm
(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6(U) 
xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U) iw_nes(U) iw_cxgb3(U) 
cxgb3(U) ib_ipath(U) ib_mthca(U) mptctl(U) dm_mirror(U) dm_multipath(U) 
scsi_dh(U) video(U) hwmon(U) backlight(U) sbs(U) i2c_ec(U) button(U) battery(U) 
asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) mlx4_ib(U) 
ib_mad(U) ib_core(U) joydev(U) sr_mod(U) cdrom(U) sg(U) tpm_infineon(U) 
tpm(U) tpm_bios(U) mlx4_core(U) i5000_edac(U) edac_mc(U) i2c_i801(U) pcspkr(U) 
e1000e(U) i2c_core(U) serio_raw(U) dm_raid45(U) dm_message(U) 
dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) usb_storage(U) ahci(U) 
ata_piix(U) libata(U) mptsas(U) scsi_transport_sas(U) mptfc(U) 
scsi_transport_fc(U) mptspi(U) mptscsih(U) mptbase(U) scsi_transport_spi(U) 
shpchp(U) aacraid(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) 
ohci_hcd(U) ehci_hcd(U)
Pid: 11833, comm: ll_ost_creat_00 Tainted: G  
2.6.18-128.7.1.el5_lustre.1.8.1.1 #1
RIP: 0010:[88b49af4]  [88b49af4] 
:ldiskfs:ldiskfs_find_entry+0x1d4/0x5c0
RSP: 0018:8101e715d500  EFLAGS: 0202
RAX:  RBX: 0008 RCX: 0c91a9f1
RDX: 8101e8893800 RSI: 8101e715d4e8 RDI: 81010773e838
RBP: 0002 R08: 81017bd9cff8 R09: 81017bd9c000
R10: 810216dfb000 R11: 4c6578dc R12: 81017d41e6d0
R13: 80063b4c R14: 8101e715d5b8 R15: 80014fae
FS:  2ab1d8b97220() GS:81021fc74bc0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 09c9c178 CR3: 00201000 CR4: 06e0

Call Trace:
 [8001a1b8] vsnprintf+0x559/0x59e
 [8005ba4d] cache_alloc_refill+0x106/0x186
 [88b4bf63] :ldiskfs:ldiskfs_lookup+0x53/0x290
 [800366e8] __lookup_hash+0x10b/0x130
 [800e2c9b] lookup_one_len+0x53/0x61
 [88bd71ed] :obdfilter:filter_fid2dentry+0x42d/0x730
 [88bd3383] :obdfilter:filter_statfs+0x273/0x350
 [8026f08b] __down_trylock+0x44/0x4e
 [88bd2f00] :obdfilter:filter_parent_lock+0x20/0x220
 [88bd7d43] :obdfilter:filter_precreate+0x843/0x19e0
 [887e0923] :lnet:lnet_ni_send+0x93/0xd0
 [8000d0bd] dput+0x23/0x10a
 [88be1e19] :obdfilter:filter_create+0x10b9/0x15e0
 [887e6ad2] :lnet:LNetPut+0x702/0x800
 [888e9a13] :ptlrpc:ptl_send_buf+0x3f3/0x5b0
 [888ef394] :ptlrpc:lustre_msg_add_version+0x34/0x110
 [888ea198] :ptlrpc:ptlrpc_send_reply+0x5c8/0x5e0
 [888f1f69] :ptlrpc:lustre_pack_reply+0x29/0xb0
 [88ba161d] :ost:ost_handle+0x131d/0x5a70
 [8001a1b8] vsnprintf+0x559/0x59e
 [88797978] :libcfs:libcfs_debug_vmsg2+0x6e8/0x990
 [80148e8c] __next_cpu+0x19/0x28
 [80088f36] find_busiest_group+0x20d/0x621
 [888f3f05] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
 [80089d8d] enqueue_task+0x41/0x56
 [888f8c1d] :ptlrpc:ptlrpc_check_req+0x1d/0x110
 [888fb357] :ptlrpc:ptlrpc_server_handle_request+0xa97/0x1170
 [8003d382] lock_timer_base+0x1b/0x3c
 [8008881d] __wake_up_common+0x3e/0x68
 [888fee08] :ptlrpc:ptlrpc_main+0x1218/0x13e0
 [8008a3f3] default_wake_function+0x0/0xe
 [8005dfb1] child_rip+0xa/0x11
 [888fdbf0] :ptlrpc:ptlrpc_main+0x0/0x13e0
 [8005dfa7] child_rip+0x0/0x11


Btw: we are running 1.8.1.1 (with the RHEL kernel 2.6.18-128.7.1.el5_lustre.1.8.1.1).

Regards,
 Adrian

Re: [Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)

2010-08-13 Thread Adrian Ulrich
Hi Alexey,

 In general a soft lockup isn't an error; it's just a notice that some
 operation needed too much time (more than 10s, I think).
 The attached soft lockup says the OST is busy creating objects after the
 MDS-OST reconnect,

Yes, I know that a soft lockup doesn't mean that I hit a bug, but having
ll_ost_creat_* wasting 100% CPU doesn't seem normal.

 I think you have disks that are too busy, or an overloaded node.

Disk %busy is < 5% for all attached disks.
The OST is doing almost nothing (there are a few read()s, that's all).


 If you have slow disks, a client can be disconnected before its request is
 processed, and that request then blocks the reconnect from that client.

The recovery of the clients seems to be OK: all clients can read/write data
from the OST, but there is something wrong between the MDS and OST0005.

But this might just be a side-effect of the ll_ost_creat_* issue :-/

Regards,
 Adrian


-- 
 RFC 1925:
   (11) Every old idea will be proposed again with a different name and
a different presentation, regardless of whether it works.



Re: [Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN

2010-07-30 Thread Adrian Ulrich
Hi Frederik,

 'lustre-info.pl --monitor=io-size' seems to sit at collecting data, 

`io-size' reads its data from the 'disk I/O size' part of brw_stats:

   read  | write
 disk I/O size  ios   % cum % |  ios   % cum %

...and in your case there are no stats (for reasons unknown to me...), which is
why lustre-info.pl cannot display anything.

Otherwise the file looks fine: e.g. `--monitor=io-time' should work.
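
If you want to eyeball those histograms by hand, they live under obdfilter in
procfs (the OST name is a placeholder):

 $ cat /proc/fs/lustre/obdfilter/lustre1-OST0006/brw_stats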

Regards,
 Adrian




Re: [Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN

2010-07-29 Thread Adrian Ulrich
Hi Frederik,

 If you've got a lot of OSTs on your server you need a wide monitor for some
 of the options, like --monitor=ost-patterns for all OSTs...

The output format is not ideal, but it's a good reason to upgrade your
workstation to a dual-head configuration ;-)


 Looking at the code it seems the value for 'setattr' is missing from the
 stats file for some of our OSTs. Looking at the stats file, indeed the
 setattr line is missing for some OSTs.

As Andreas already said: if 'setattr' is missing, there was no setattr
operation (yet).
Changing
  printf(", %s=%5.1f R/s",$type,$stats->{$type}/$slice);
into
  printf(", %s=%5.1f R/s",$type,(($stats->{$type}||0)/$slice) );

should fix the warning. (The totals are OK because in Perl, undef/$x == 0/$x.)


 'lustre-info.pl --monitor=io-size' seems to sit at "collecting data,
 please wait..." for a very long time until I killed it; I have not had
 the time to debug this yet.

I never tested it with anything other than 1.8.1.1, but this should be
trivial to fix:

Could you mail me the output of
 /proc/fs/lustre/obdfilter/##SOME_OST##/exports/##A_RANDOM_NID##/brw_stats ?

Regards,
 Adrian



-- 
 RFC 1925:
   (11) Every old idea will be proposed again with a different name and
a different presentation, regardless of whether it works.



[Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN

2010-07-28 Thread Adrian Ulrich
First: Sorry for the shameless self-advertising, but...

I uploaded two Lustre-related modules to the CPAN:

#1: Lustre::Info provides easy access to the information located
under /proc/fs/lustre; it also comes with a 'performance monitoring'
script called 'lustre-info.pl'.

#2: Lustre::LFS offers IO::Dir- and IO::File-like filehandles but
    with additional Lustre-specific features ($dir_fh->set_stripe, ...).


Examples and details:

Lustre::Info and lustre-info.pl
---

Lustre::Info provides a Perl OO interface to Lustre's procfs information.

(Somewhat confusing) example code to get the block device of all OSTs:
 
 #
 my $l = Lustre::Info->new;
 print join("\n", map( { $l->get_ost($_)->get_name.": "
   .$l->get_ost($_)->get_blockdevice } @{$l->get_ost_list}), '' ) if $l->is_ost;
 #

..output:
 $ perl test.pl
 lustre1-OST001e: /dev/md17
 lustre1-OST0016: /dev/md15
 lustre1-OST000e: /dev/md13
 lustre1-OST0006: /dev/md11

The module also includes a script called 'lustre-info.pl' that can
be used to gather some live performance statistics:

Use `--ost-stats' to get a quick overview of what's going on:
$ lustre-info.pl --ost-stats
 lustre1-OST0006 (@ /dev/md11) :  write=   5.594 MB/s, read=   0.000 MB/s, 
create=  0.0 R/s, destroy=  0.0 R/s, setattr=  0.0 R/s, preprw=  6.0 R/s
 lustre1-OST000e (@ /dev/md13) :  write=   3.997 MB/s, read=   0.000 MB/s, 
create=  0.0 R/s, destroy=  0.0 R/s, setattr=  0.0 R/s, preprw=  4.0 R/s
 lustre1-OST0016 (@ /dev/md15) :  write=   5.502 MB/s, read=   0.000 MB/s, 
create=  0.0 R/s, destroy=  0.0 R/s, setattr=  0.0 R/s, preprw=  6.0 R/s
 lustre1-OST001e (@ /dev/md17) :  write=   5.905 MB/s, read=   0.000 MB/s, 
create=  0.0 R/s, destroy=  0.0 R/s, setattr=  0.0 R/s, preprw=  6.7 R/s


You can also get per-client/per-OST details via `--monitor=MODE'

$ lustre-info.pl --monitor=ost --as-list   # this will only show clients where read+write >= 1 MB/s
 client nid        | lustre1-OST0006    | lustre1-OST000e    | lustre1-OST0016    | lustre1-OST001e    | +++ TOTALS +++ (MB/s)
10.201.46...@o2ib  | r=   0.0, w=   0.0 | r=   0.0, w=   0.0 | r=   0.0, w=   0.0 | r=   0.0, w=   1.1 | read=   0.0, write=   1.1
10.201.47...@o2ib  | r=   0.0, w=   0.0 | r=   0.0, w=   1.2 | r=   0.0, w=   2.0 | r=   0.0, w=   0.0 | read=   0.0, write=   3.2


There are many more options, checkout `lustre-info.pl --help' for details!


Lustre::LFS::Dir and Lustre::LFS::File
---

These two packages behave like IO::File and IO::Dir, but both of
them add some Lustre-only features to the returned filehandle.

Quick example:
 my $fh = Lustre::LFS::File->new;   # $fh is a normal IO::File-like FH
 $fh->open("> test") or die;
 print $fh "Foo Bar!\n";
 my $stripe_info = $fh->get_stripe or die "Not on a lustre filesystem?!\n";



Keep in mind that both Lustre modules are far from complete:
Lustre::Info really needs some MDT support, and Lustre::LFS is just a
wrapper around /usr/bin/lfs: an XS version would be much better.

But I'd love to hear some feedback if someone decides to play around
with these modules + lustre-info.pl :-)


Cheers,
 Adrian

