Re: [lustre-discuss] Error Lustre/multipath/storage

2016-03-29 Thread Angelo
Sorry for my mistake, Andreas.

I'm using RAID-1 for the MDT.

Regards,
Angelo

2016-03-28 19:51 GMT-03:00 Dilger, Andreas :

> On 2016/03/28, 08:01, "lustre-discuss on behalf of Angelo Cavalcanti" <
> lustre-discuss-boun...@lists.lustre.org on behalf of acrribe...@gmail.com>
> wrote:
>
> Dear all,
>
> We're having trouble with a lustre 2.5.3 implementation. This is our setup:
>
>
>  *   One server for MGS/MDS/MDT. The MDT is served from a RAID-6-backed
>      partition of 2TB (what type of HD?)
>
> Note that using RAID-6 for the MDT storage will significantly hurt your
> metadata
> performance, since this will incur a lot of read-modify-write overhead
> when doing
> 4KB metadata block updates.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel High Performance Data Division
>
>
> [...]

Re: [lustre-discuss] Error Lustre/multipath/storage

2016-03-29 Thread Angelo
Thank you very much, Nate.

It works.

I set the "max_sectors" parameter to "4096":

# cat /etc/modprobe.d/mpt2sas.conf
options mpt2sas max_sectors=4096

The bonnie++ tests then completed successfully.
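
For anyone applying the same workaround, a rough sketch of the remaining steps (assuming a dracut-based distribution such as CentOS 6; sdi is just one of the path devices from the logs below, adjust as needed). Rebuild the initramfs so the module option is applied at boot:

# dracut -f /boot/initramfs-$(uname -r).img $(uname -r)

After a reboot the limits can be checked per path device; with max_sectors=4096 (512-byte sectors), max_hw_sectors_kb should report at most 2048:

# cat /sys/block/sdi/queue/max_hw_sectors_kb
# cat /sys/block/sdi/queue/max_sectors_kb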

Regards,
Angelo

2016-03-28 19:55 GMT-03:00 Nate Pearlstein :

> I thought I responded to the entire list but only sent to Angelo,
>
> Very likely, Lustre on the OSS nodes is setting max_sectors_kb all the way
> up to max_hw_sectors_kb, and this value ends up being too large for the SAS
> HCA. You should set max_sectors for your mpt2sas module to something smaller,
> like 4096, and rebuild the initrd; this will put a better limit on
> max_hw_sectors_kb for the IS5600 LUNs…
>
> > [...]

Re: [lustre-discuss] Error Lustre/multipath/storage

2016-03-28 Thread Nate Pearlstein
I thought I responded to the entire list but only sent to Angelo,

Very likely, Lustre on the OSS nodes is setting max_sectors_kb all the way
up to max_hw_sectors_kb, and this value ends up being too large for the SAS
HCA. You should set max_sectors for your mpt2sas module to something smaller,
like 4096, and rebuild the initrd; this will put a better limit on
max_hw_sectors_kb for the IS5600 LUNs…
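
One quick way to see whether that is happening on an OSS is to compare the two limits for each path device behind the multipath map (sda/sde/sdi/sdm are just the device names taken from the logs in this thread):

# for d in sda sde sdi sdm; do echo "$d: $(cat /sys/block/$d/queue/max_sectors_kb)/$(cat /sys/block/$d/queue/max_hw_sectors_kb) KB"; done

If the first number has been pushed up to the second, and the second is larger than the HCA can actually handle, the symptoms above are what you would expect.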


> On Mar 28, 2016, at 6:51 PM, Dilger, Andreas  wrote:
> [...]

Re: [lustre-discuss] Error Lustre/multipath/storage

2016-03-28 Thread Dilger, Andreas
On 2016/03/28, 08:01, "lustre-discuss on behalf of Angelo Cavalcanti" <
lustre-discuss-boun...@lists.lustre.org on behalf of acrribe...@gmail.com> wrote:


Dear all,

We're having trouble with a lustre 2.5.3 implementation. This is our setup:


  *   One server for MGS/MDS/MDT. The MDT is served from a RAID-6-backed partition
of 2TB (what type of HD?)

Note that using RAID-6 for the MDT storage will significantly hurt your metadata
performance, since this will incur a lot of read-modify-write overhead when 
doing
4KB metadata block updates.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel High Performance Data Division


  *   Two OSS/OST servers in an active/active HA configuration with Pacemaker. Both are
connected to the storage via SAS.


  *   One SGI Infinite Storage IS5600 with two RAID-6-backed volume groups.
Each group has two volumes; each volume has 15TB capacity.


Volumes are recognized by the OSSs as multipath devices; each volume has 4 paths.
Volumes were created with a GPT partition table and a single partition.
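
(As a sanity check of that layout, a generic multipath-tools command, not anything Lustre-specific, should show every map with its four paths active on each OSS:)

# multipath -ll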


Volume partitions were then formatted as OSTs with the following command:


# mkfs.lustre --replace --reformat --ost --mkfsoptions=" -E 
stride=128,stripe_width=1024" 
--mountfsoptions="errors=remount-ro,extents,mballoc" --fsname=lustre1 
--mgsnode=10.149.0.153@o2ib1 --index=0 --servicenode=10.149.0.151@o2ib1 
--servicenode=10.149.0.152@o2ib1 /dev/mapper/360080e500029eaec012656951fcap1
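
A sketch of how the result can be double-checked after formatting, using the same device path as above (dumpe2fs shows the RAID stride/stripe width recorded in the ldiskfs superblock, and tunefs.lustre --dryrun prints the Lustre target parameters without changing anything):

# dumpe2fs -h /dev/mapper/360080e500029eaec012656951fcap1 | grep -i raid
# tunefs.lustre --dryrun /dev/mapper/360080e500029eaec012656951fcap1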


Testing with bonnie++ on a client with the command below:

$ ./bonnie++-1.03e/bonnie++ -m lustre1 -d /mnt/lustre -s 128G:1024k -n 0 -f -b 
-u vhpc


There is no problem creating files inside the Lustre mount point, but *rewriting* the
same files results in the errors below:


Mar 18 17:46:13 oss01 multipathd: 8:128: mark as failed

Mar 18 17:46:13 oss01 multipathd: 360080e500029eaec012656951fca: remaining 
active paths: 3

Mar 18 17:46:13 oss01 kernel: sd 1:0:0:0: [sdi] Unhandled error code

Mar 18 17:46:13 oss01 kernel: sd 1:0:0:0: [sdi] Result: hostbyte=DID_SOFT_ERROR 
driverbyte=DRIVER_OK

Mar 18 17:46:13 oss01 kernel: sd 1:0:0:0: [sdi] CDB: Read(10): 28 00 00 06 d8 
22 00 20 00 00

Mar 18 17:46:13 oss01 kernel: __ratelimit: 109 callbacks suppressed

Mar 18 17:46:13 oss01 kernel: device-mapper: multipath: Failing path 8:128.

Mar 18 17:46:13 oss01 kernel: sd 1:0:1:0: [sdm] Unhandled error code

Mar 18 17:46:13 oss01 kernel: sd 1:0:1:0: [sdm] Result: hostbyte=DID_SOFT_ERROR 
driverbyte=DRIVER_OK

Mar 18 17:46:13 oss01 kernel: sd 1:0:1:0: [sdm] CDB: Read(10): 28 00 00 07 18 
22 00 18 00 00

Mar 18 17:46:13 oss01 kernel: device-mapper: multipath: Failing path 8:192.

Mar 18 17:46:13 oss01 kernel: sd 1:0:1:0: [sdm] Unhandled error code

Mar 18 17:46:13 oss01 kernel: sd 1:0:1:0: [sdm] Result: hostbyte=DID_SOFT_ERROR 
driverbyte=DRIVER_OK

Mar 18 17:46:13 oss01 kernel: sd 1:0:1:0: [sdm] CDB: Read(10): 28 00 00 06 d8 
22 00 20 00 00

Mar 18 17:46:13 oss01 kernel: sd 0:0:1:0: [sde] Unhandled error code

Mar 18 17:46:13 oss01 kernel: sd 0:0:1:0: [sde] Result: hostbyte=DID_SOFT_ERROR 
driverbyte=DRIVER_OK

Mar 18 17:46:13 oss01 kernel: sd 0:0:1:0: [sde] CDB: Read(10): 28 00 00 07 18 
22 00 18 00 00

Mar 18 17:46:13 oss01 kernel: device-mapper: multipath: Failing path 8:64.

Mar 18 17:46:13 oss01 kernel: sd 0:0:0:0: [sda] Unhandled error code

Mar 18 17:46:13 oss01 kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_SOFT_ERROR 
driverbyte=DRIVER_OK

Mar 18 17:46:13 oss01 kernel: sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 07 18 
22 00 18 00 00

Mar 18 17:46:13 oss01 kernel: device-mapper: multipath: Failing path 8:0.

Mar 18 17:46:13 oss01 kernel: sd 0:0:0:0: [sda] Unhandled error code

Mar 18 17:46:13 oss01 kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_SOFT_ERROR 
driverbyte=DRIVER_OK

Mar 18 17:46:13 oss01 kernel: sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 06 d8 
22 00 20 00 00

Mar 18 17:46:14 oss01 multipathd: 360080e500029eaec012656951fca: sdi - rdac 
checker reports path is up

Mar 18 17:46:14 oss01 multipathd: 8:128: reinstated

Mar 18 17:46:14 oss01 multipathd: 360080e500029eaec012656951fca: remaining 
active paths: 4

Mar 18 17:46:14 oss01 kernel: sd 1:0:0:0: [sdi] Unhandled error code

Mar 18 17:46:14 oss01 kernel: sd 1:0:0:0: [sdi] Result: hostbyte=DID_SOFT_ERROR 
driverbyte=DRIVER_OK

Mar 18 17:46:14 oss01 kernel: sd 1:0:0:0: [sdi] CDB: Read(10): 28 00 00 07 18 
22 00 18 00 00

Mar 18 17:46:14 oss01 kernel: device-mapper: multipath: Failing path 8:128.

Mar 18 17:46:14 oss01 kernel: sd 1:0:0:0: [sdi] Unhandled error code

Mar 18 17:46:14 oss01 kernel: sd 1:0:0:0: [sdi] Result: hostbyte=DID_SOFT_ERROR 
driverbyte=DRIVER_OK

Mar 18 17:46:14 oss01 kernel: sd 1:0:0:0: [sdi] CDB: Read(10): 28 00 00 06 d8 
22 00 20 00 00

Mar 18 17:46:14 oss01 multipathd: 8:128: mark as failed

Mar 18 17:46:14 oss01 multipathd: 360080e500029eaec012656951fca: remaining 
active paths: 3

Mar 18 17:46:14 oss01 multipathd: 8:192: mark as failed


Re: [lustre-discuss] Error Lustre/multipath/storage

2016-03-28 Thread Stu Midgley
Upgrade your IS5600 firmware.  We were seeing this until we upgraded to
the latest NetApp firmware.

On Mon, Mar 28, 2016 at 10:30 PM, Ben Evans  wrote:
> [...]

Re: [lustre-discuss] Error Lustre/multipath/storage

2016-03-28 Thread Stu Midgley
This is what our multipathd.conf looks like:

defaults {
    find_multipaths yes
    user_friendly_names yes
    queue_without_daemon yes
}

blacklist {
}

devices {
    device {
        vendor "(NETAPP|LSI|ENGENIO)"
        product "INF-01-00"
        product_blacklist "Universal Xport"
        path_grouping_policy "group_by_prio"
        path_checker "rdac"
        features "2 pg_init_retries 50"
        hardware_handler "1 rdac"
        prio "rdac"
        failback "immediate"
        rr_weight "uniform"
        no_path_retry 30
        retain_attached_hw_handler "yes"
        detect_prio "yes"
    }
}
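
If it helps, a sketch of reloading a changed /etc/multipath.conf into the running daemon and then re-checking the maps (CentOS 6-era commands):

# multipathd -k"reconfigure"
# multipath -ll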

On Mon, Mar 28, 2016 at 11:23 PM, Stu Midgley  wrote:
> [...]

Re: [lustre-discuss] Error Lustre/multipath/storage

2016-03-28 Thread Ben Evans
You're getting multipathing errors, which means it's most likely not a
filesystem-level issue.  See if you can get the logs from the storage array as
well; there might be some detail there as to what is happening.

Can you check your logs and determine whether it's a single connection that is
always failing?  If so, can you try replacing the cable and see if that clears
it up?  Next would be checking to make sure that the source and destination SAS
ports are good.
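
One way to narrow that down is to watch the SAS PHY error counters on each OSS while the rewrite test runs; counters that keep growing on a single phy usually point at a bad cable or port (the sysfs layout can differ slightly between kernels):

# grep . /sys/class/sas_phy/phy-*/invalid_dword_count /sys/class/sas_phy/phy-*/running_disparity_error_count /sys/class/sas_phy/phy-*/loss_of_dword_sync_count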

-Ben Evans

From: lustre-discuss on behalf of Angelo Cavalcanti <acrribe...@gmail.com>
Date: Monday, March 28, 2016 at 10:01 AM
To: "lustre-discuss@lists.lustre.org"
Subject: [lustre-discuss] Error Lustre/multipath/storage


[...]