Re: [lustre-discuss] "Not on preferred path" error

2016-09-21 Thread Ben Evans
That's the way multipath is showing it, yes. However, back in the 1.8 days
we used LSI's proprietary multipathing kernel module, MPP. MPP presented
both paths to the device-driver layer as a single device, so the multipath
view would show only a single path.

I no longer have any of my notes from this sort of thing; perhaps there
are some old-school LSI/NetApp/Engenio people on here who would have a
better chance of diagnosing it.
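As a quick cross-check of the single-path symptom, you could count how many paths the device-mapper layer sees per LUN by parsing `multipath -ll` output. A sketch (the sample stanza is copied from the listing further down this thread; in practice you would pipe the live `multipath -ll` output into the awk script instead):

```shell
#!/bin/sh
# Count SCSI paths per multipath map.  A healthy dual-controller setup
# should report 2 per map; this thread's listing shows only 1 each.
# SAMPLE holds one stanza so the sketch is self-contained; replace the
# printf with `multipath -ll` to run against a live system.
SAMPLE='map03 (360080e50002ee510023f50092c6c) dm-13 LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
  \_ 3:0:1:3 sdk 8:160 [active][ready]'

printf '%s\n' "$SAMPLE" | awk '
  /^map/ { map = $1 }                             # start of a map stanza
  /[0-9]+:[0-9]+:[0-9]+:[0-9]+ sd/ { n[map]++ }   # an H:C:T:L path line
  END { for (m in n) printf "%s: %d path(s)\n", m, n[m] }'
```

For the inlined sample this prints `map03: 1 path(s)`.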

-Ben Evans

On 9/21/16, 1:37 PM, "Tao, Zhiqi" <zhiqi@intel.com> wrote:

>It appears that there is only one SAS path to the back-end storage, which
>explains why some of the LUNs show as not on their preferred path.
>
>Typically we recommend having two SAS connections from each OSS to the
>storage: one to the upper controller and one to the lower controller,
>with the LUNs distributed between the two controllers. In the event of a
>SAS connection failure, all LUNs fail over to one controller, and the
>ones that used to go through the other controller then show as not on
>their preferred path. Because this kind of failover happens at the
>multipath layer, it is transparent to Lustre; the file system continues
>to run, as you observed.
>
>Best Regards,
>Zhiqi
>

Re: [lustre-discuss] "Not on preferred path" error

2016-09-21 Thread Tao, Zhiqi
It appears that there is only one SAS path to the back-end storage, which
explains why some of the LUNs show as not on their preferred path.

Typically we recommend having two SAS connections from each OSS to the
storage: one to the upper controller and one to the lower controller, with
the LUNs distributed between the two controllers. In the event of a SAS
connection failure, all LUNs fail over to one controller, and the ones that
used to go through the other controller then show as not on their preferred
path. Because this kind of failover happens at the multipath layer, it is
transparent to Lustre; the file system continues to run, as you observed.

Best Regards,
Zhiqi
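The dual-controller layout described above is normally expressed in /etc/multipath.conf. A hypothetical fragment, using keywords commonly seen for LSI/Engenio RDAC-style arrays (exact settings vary with the multipath-tools version; this is a sketch, not this system's actual configuration):

```text
defaults {
        user_friendly_names     yes
}
devices {
        device {
                vendor                  "LSI"
                product                 "VirtualDisk"
                path_grouping_policy    group_by_prio
                prio                    rdac
                path_checker            rdac
                hardware_handler        "1 rdac"
                failback                immediate
        }
}
```

With group_by_prio, the paths to a LUN's owning controller form the high-priority (preferred) group; if that group's path fails, I/O moves to the other controller and the array reports the LUN as not on its preferred path.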


Re: [lustre-discuss] "Not on preferred path" error

2016-09-20 Thread Lewis Hyatt
I see, thanks. This is what we see from running the multipath commands... I
don't see anything in it that means anything to me, but FWIW it looks the
same as on our other OSS, which is working OK.


$multipath -ll
map03 (360080e50002ee510023f50092c6c) dm-13 LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 3:0:1:3 sdk 8:160 [active][ready]
map02 (360080e50002ee410024250092c11) dm-12 LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 3:0:1:2 sdj 8:144 [active][ready]
map01 (360080e50002ee510023b50092c4c) dm-11 LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][enabled]
 \_ 3:0:1:1 sdi 8:128 [active][ready]
map00 (360080e50002ee410023e50092bf2) dm-10 LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][enabled]
 \_ 3:0:1:0 sdh 8:112 [active][ready]
map09 (360080e50002ee4dc02f250092c62) dm-7 LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][enabled]
 \_ 3:0:0:3 sde 8:64  [active][ready]
map11 (360080e50002ee4dc02f650092c84) dm-9 LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 3:0:0:5 sdg 8:96  [active][ready]
map08 (360080e50002ec89002e550092a07) dm-6 LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][enabled]
 \_ 3:0:0:2 sdd 8:48  [active][ready]
map10 (360080e50002ec89002e950092a27) dm-8 LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 3:0:0:4 sdf 8:80  [active][ready]
map07 (360080e50002ee4dc02ee50092c44) dm-5 LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 3:0:0:1 sdc 8:32  [active][ready]
map06 (360080e50002ec89002e1500929e9) dm-4 LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 3:0:0:0 sdb 8:16  [active][ready]
map05 (360080e50002ee510024350092c8c) dm-15 LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][enabled]
 \_ 3:0:1:5 sdm 8:192 [active][ready]
map04 (360080e50002ee410024650092c31) dm-14 LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][enabled]
 \_ 3:0:1:4 sdl 8:176 [active][ready]

===

$multipath -r
reload: map06 (360080e50002ec89002e1500929e9)  LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
 \_ 3:0:0:0 sdb 8:16  [active][ready]
reload: map07 (360080e50002ee4dc02ee50092c44)  LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
 \_ 3:0:0:1 sdc 8:32  [active][ready]
reload: map08 (360080e50002ec89002e550092a07)  LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
 \_ 3:0:0:2 sdd 8:48  [active][ready]
reload: map09 (360080e50002ee4dc02f250092c62)  LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
 \_ 3:0:0:3 sde 8:64  [active][ready]
reload: map10 (360080e50002ec89002e950092a27)  LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
 \_ 3:0:0:4 sdf 8:80  [active][ready]
reload: map11 (360080e50002ee4dc02f650092c84)  LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
 \_ 3:0:0:5 sdg 8:96  [active][ready]
reload: map00 (360080e50002ee410023e50092bf2)  LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
 \_ 3:0:1:0 sdh 8:112 [active][ready]
reload: map01 (360080e50002ee510023b50092c4c)  LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
 \_ 3:0:1:1 sdi 8:128 [active][ready]
reload: map02 (360080e50002ee410024250092c11)  LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
 \_ 3:0:1:2 sdj 8:144 [active][ready]
reload: map03 (360080e50002ee510023f50092c6c)  LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
 \_ 3:0:1:3 sdk 8:160 [active][ready]
reload: map04 (360080e50002ee410024650092c31)  LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
 \_ 3:0:1:4 sdl 8:176 [active][ready]
reload: map05 (360080e50002ee510024350092c8c)  LSI,VirtualDisk
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
 \_ 3:0:1:5 sdm 8:192 [active][ready]

Thanks again for the assistance all, I really appreciate it.

-lewis



Re: [lustre-discuss] "Not on preferred path" error

2016-09-20 Thread Ben Evans
multipath is a Linux utility that handles communication from the server to
the disk array. It is independent of Lustre and InfiniBand. For OSSes, each
OSS had two connections to each storage array it communicated with; usually
there was a pair of arrays per OSS pair (except for a rare handful of our
systems, which had one).

-Ben Evans


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] "Not on preferred path" error

2016-09-20 Thread Lewis Hyatt

Thanks so much for the information; we will look into this ASAP.
Forgive my ignorance, but is multipath here referring to some Lustre-specific
or InfiniBand-related process? I'm not familiar with it in this context. Thanks again.


-lewis


On 9/20/16 2:24 PM, Ben Evans wrote:

Lewis,

Yes, "Not on preferred path" is something that bubbles up through the TS
gui from multipath.

A simple thing you can check is running multipath -ll on the OSS (and its
peer) in question and seeing if it reports that one or more paths are down.
If it's just on one OSS, try running 'multipath -r'.  If it doesn't come
back looking OK, then it's most likely a cable issue, and you can try
re-seating the cable to see if that helps.  It's been a long time since I
diagnosed this, and I can't remember the details of how to associate cables
with paths, but there should be indicator lights on the back of everything,
and the path that is down should be red.

The high load is probably associated with the cable issue, since you're
putting more strain on one path.

-Ben Evans

On 9/20/16, 12:21 PM, "lustre-discuss on behalf of Lewis Hyatt"

wrote:


Hello-

I am having an issue with a lustre 1.8 array that I have little hope
of figuring out on my own, so I thought I would try here to see if
anyone might know what this warning/error means. Our array was built
by Terascala, which no longer exists, so we have no support for it and
little documentation (and not much in-house knowledge). I see this
complaint "Not on preferred path" on the GUI that we have, which I
assume was something custom made by Terascala, and I am not sure even
what path it is referring to; we use infiniband for all connections
and it could relate to this, but not sure. We see this error on 3 of
the 12 OSTs. More specifically, we have 2 OSSs, each handling 6 OSTs,
and all 3 of the "not on optimal path" OSTs are on the same OSS.

We do not know if it's related, but this same OSS is in a very bad
state, with very high load average (200), very high I/O wait time, and
taking many seconds to respond to each read request, making the array
more or less unusable. That's the problem we are trying to fix.

I realize there's not much hope for anyone to help us with that given
how little information I am able to provide. But I was hoping someone
out there might know what this "not on optimal path" error means, and
if it matters for anything or not, so we have somewhere to start.
Thanks very much!

I could provide screen shots of the management GUI we have, if it
would be informative.

-Lewis





Re: [lustre-discuss] "Not on preferred path" error

2016-09-20 Thread Joe Landman

On 09/20/2016 01:39 PM, Lewis Hyatt wrote:

Thanks very much for the suggestions. dmesg output is here:
http://pastebin.com/jCafCZiZ
We don't see any disk-related stuff there, and also our GUI shows all
the RAID arrays as being fine.


Hmmm  I rarely trust GUIs for RAID.  Do you have underlying CLI 
tools you can do a sanity check with?



If anything in there jumps out at you, I'd really appreciate your
thoughts! We are almost certainly going to reboot the affected OSS later
today to see how that goes.


Nothing leaps out other than that two particular targets, twlstr-OST000b
and twlstr-OST0006, appear to be "slow".  This appears to be what is
causing the client evictions, lock problems, etc.


The question is why these two OSTs are slow: what is the underlying RAID,
how many operations are queued up, etc.?


A tool we recommend for (nearly instantaneous) holistic views of a system
is glances, which you can install via pip:


pip install glances

then run it as

glances -t 1

to get a second by second view of your system.  Dstat is also good.

Dumb question ... what does

swapon -s

report?  I am assuming you aren't swapping (and don't have swap enabled
on the system), but it never hurts to ask.
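The basic checks above can be bundled into a quick sketch that reads straight from /proc, so nothing extra needs to be installed (glances/dstat still give the richer live view):

```shell
#!/bin/sh
# Baseline health snapshot on a Linux OSS, from /proc:
# swap status (equivalent to `swapon -s`) and the 1/5/15-minute load.

echo "== active swap devices (header line only = swap disabled) =="
cat /proc/swaps

echo "== load average (1/5/15 min) =="
cut -d' ' -f1-3 /proc/loadavg
```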


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615


Re: [lustre-discuss] "Not on preferred path" error

2016-09-20 Thread Lewis Hyatt

Thanks very much for the suggestions. dmesg output is here:
http://pastebin.com/jCafCZiZ
We don't see any disk-related stuff there, and also our GUI shows all the RAID 
arrays as being fine.


If anything in there jumps out at you, I'd really appreciate your thoughts! We 
are almost certainly going to reboot the affected OSS later today to see how 
that goes.


We're a fairly small team (12 people or so), so I have a good feel for what
everyone is doing, and they should not be abusing it too badly... We did
recently ask people to delete small files they may have; do you think
deleting a lot of small files could trigger such issues? Thanks again!


-lewis




Re: [lustre-discuss] "Not on preferred path" error

2016-09-20 Thread Bob Ball
Stabbing in the dark, but this sounds like a multipath problem. Perhaps
you have two or more paths to the storage, and one or more of them is down
for some reason: perhaps the hardware itself, perhaps a cable is
pulled...  You could look for LEDs in a bad state.


I always find it instructive to reboot such a system and watch what 
comes up on the console during the startup.


bob



Re: [lustre-discuss] "Not on preferred path" error

2016-09-20 Thread Joe Landman

On 09/20/2016 12:21 PM, Lewis Hyatt wrote:


We do not know if it's related, but this same OSS is in a very bad
state, with very high load average (200), very high I/O wait time, and
taking many seconds to respond to each read request, making the array
more or less unusable. That's the problem we are trying to fix.


This sounds like a storage system failure.  Queuing up of IOs to drive 
the load to 200 usually means something is broken elsewhere in the stack 
at a lower level.  Not always ... sometimes you have users who like to 
write several million/billion small ( < 100 byte ) files.


What does dmesg report?  Try to do a pastebin/gist of it, and point it 
to the list.


Things that come to mind are

a) offlined RAID (most likely):  This would explain the user load, and 
all sorts of strange messages about block devices and file systems in 
the logs


b) A user DoS against the storage: usually someone writing many tiny files.

There are other possibilities, but these seem more likely.
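If you want to test the tiny-files theory (option b), a rough sketch to run from a client; the directory argument is a placeholder to point at a suspect scratch tree on the Lustre mount:

```shell
#!/bin/sh
# Count files smaller than 100 bytes under a directory tree -- a crude
# way to check whether someone has been writing millions of tiny files.
# The default "." is a placeholder; pass a suspect directory instead.
DIR=${1:-.}
find "$DIR" -type f -size -100c | wc -l
```

Note that a recursive find over a full Lustre namespace is itself a metadata load, so you may want to scope it to recently modified areas.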



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615