Re: [zfs-discuss] MPxIO n00b question

2012-05-30 Thread Sašo Kiselkov
On 05/25/2012 08:40 PM, Richard Elling wrote:
 See the solution at https://www.illumos.org/issues/644
  -- richard

And predictably, I'm back with another n00b question regarding this
array. I've put a pair of LSI-9200-8e controllers in the server and
attached the cables from the enclosure to each of the HBAs. As a result
(why?) I'm getting some really strange behavior:

 * piss poor performance (around 5MB/s per disk tops)
 * fmd(1M) running one core at near 100% saturation each time something
   writes or reads from the pool
 * using fmstat I noticed that it's the eft module receiving hundreds of
   fault reports every second
 * fmd is flooded by multipath failover ereports like:

...
May 29 21:11:44.9408 ereport.io.scsi.cmd.disk.tran
May 29 21:11:44.9423 ereport.io.scsi.cmd.disk.tran
May 29 21:11:44.8474 ereport.io.scsi.cmd.disk.recovered
May 29 21:11:44.9455 ereport.io.scsi.cmd.disk.tran
May 29 21:11:44.9457 ereport.io.scsi.cmd.disk.dev.rqs.derr
May 29 21:11:44.9462 ereport.io.scsi.cmd.disk.tran
May 29 21:11:44.9527 ereport.io.scsi.cmd.disk.tran
May 29 21:11:44.9535 ereport.io.scsi.cmd.disk.dev.rqs.derr
May 29 21:11:44.6362 ereport.io.scsi.cmd.disk.recovered
...



I suspect that multipath is somehow not very happy with my
Toshiba disks, but I have no idea what to do to make it work at least
somewhat acceptably. I've tried messing with scsi_vhci.conf to set
load-balance=none and to change the scsi-vhci-failover-override for the
Toshiba disks to f_asym_lsi, flashing the latest as well as older firmware
on the cards, reseating them in other PCIe slots, removing one cable
and even removing one whole HBA, unloading the eft fmd module, etc., but
nothing has helped so far and I'm sort of out of ideas. Anybody else got
an idea on what I might try?
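
For reference, the scsi_vhci.conf settings I experimented with looked roughly
like this (just a sketch; the vendor/product string is a placeholder and would
have to be the exact 8-character INQUIRY vendor ID plus the product ID of the
actual drives):

  # /kernel/drv/scsi_vhci.conf (excerpt; placeholder VID/PID)
  load-balance="none";
  scsi-vhci-failover-override =
          "TOSHIBA MKxxxxxxxx", "f_asym_lsi";

followed by a reboot (or possibly update_drv -f scsi_vhci) to pick the
changes up.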

Cheers,
--
Saso
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] MPxIO n00b question

2012-05-30 Thread Richard Elling
On May 30, 2012, at 1:07 PM, Sašo Kiselkov wrote:

 On 05/25/2012 08:40 PM, Richard Elling wrote:
 See the solution at https://www.illumos.org/issues/644
 -- richard
 
 And predictably, I'm back with another n00b question regarding this
 array. I've put a pair of LSI-9200-8e controllers in the server and
 attached the cables from the enclosure to each of the HBAs. As a result
 (why?) I'm getting some really strange behavior:
 
 * piss poor performance (around 5MB/s per disk tops)
 * fmd(1M) running one core at near 100% saturation each time something
   writes or reads from the pool
 * using fmstat I noticed that it's the eft module receiving hundreds of
   fault reports every second
 * fmd is flooded by multipath failover ereports like:
 
 ...
 May 29 21:11:44.9408 ereport.io.scsi.cmd.disk.tran
 May 29 21:11:44.9423 ereport.io.scsi.cmd.disk.tran
 May 29 21:11:44.8474 ereport.io.scsi.cmd.disk.recovered
 May 29 21:11:44.9455 ereport.io.scsi.cmd.disk.tran
 May 29 21:11:44.9457 ereport.io.scsi.cmd.disk.dev.rqs.derr
 May 29 21:11:44.9462 ereport.io.scsi.cmd.disk.tran
 May 29 21:11:44.9527 ereport.io.scsi.cmd.disk.tran
 May 29 21:11:44.9535 ereport.io.scsi.cmd.disk.dev.rqs.derr
 May 29 21:11:44.6362 ereport.io.scsi.cmd.disk.recovered
 ...
 
 
 
 I suspect that multipath is somehow not very happy with my
 Toshiba disks, but I have no idea what to do to make it work at least
 somewhat acceptably. I've tried messing with scsi_vhci.conf to set
 load-balance=none and to change the scsi-vhci-failover-override for the
 Toshiba disks to f_asym_lsi, flashing the latest as well as older firmware
 on the cards, reseating them in other PCIe slots, removing one cable
 and even removing one whole HBA, unloading the eft fmd module, etc., but
 nothing has helped so far and I'm sort of out of ideas. Anybody else got
 an idea on what I might try?

Those ereports are consistent with faulty cabling. You can trace all of the
cables and errors using tools like lsiutil, sg_logs, kstats, etc. Unfortunately,
it is not really possible to get into this level of detail over email, and it
can consume many hours.
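
For example (a rough sketch, not verified on your box; device names are
placeholders and the interesting counters vary a bit by drive and HBA):

  # SAS phy/link error counters as reported by the drive
  # (protocol-specific port log page, 0x18)
  sg_logs --page=0x18 /dev/rdsk/c5tXXXXXXXXXXXXd0s0
  # transport/device error counters kept by sd (the same data iostat -En shows)
  kstat -p sderr
  iostat -En
  # lsiutil's diagnostics menus show per-phy error counters from the HBA side

Invalid-dword and running-disparity-error counts that keep climbing on one
path usually point at a bad cable, connector, or expander port.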
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] MPxIO n00b question

2012-05-30 Thread Sašo Kiselkov
On 05/30/2012 10:53 PM, Richard Elling wrote:
 On May 30, 2012, at 1:07 PM, Sašo Kiselkov wrote:
 
 On 05/25/2012 08:40 PM, Richard Elling wrote:
 See the solution at https://www.illumos.org/issues/644
 -- richard

 And predictably, I'm back with another n00b question regarding this
 array. I've put a pair of LSI-9200-8e controllers in the server and
 attached the cables from the enclosure to each of the HBAs. As a result
 (why?) I'm getting some really strange behavior:

 * piss poor performance (around 5MB/s per disk tops)
 * fmd(1M) running one core at near 100% saturation each time something
   writes or reads from the pool
 * using fmstat I noticed that it's the eft module receiving hundreds of
   fault reports every second
 * fmd is flooded by multipath failover ereports like:

 ...
 May 29 21:11:44.9408 ereport.io.scsi.cmd.disk.tran
 May 29 21:11:44.9423 ereport.io.scsi.cmd.disk.tran
 May 29 21:11:44.8474 ereport.io.scsi.cmd.disk.recovered
 May 29 21:11:44.9455 ereport.io.scsi.cmd.disk.tran
 May 29 21:11:44.9457 ereport.io.scsi.cmd.disk.dev.rqs.derr
 May 29 21:11:44.9462 ereport.io.scsi.cmd.disk.tran
 May 29 21:11:44.9527 ereport.io.scsi.cmd.disk.tran
 May 29 21:11:44.9535 ereport.io.scsi.cmd.disk.dev.rqs.derr
 May 29 21:11:44.6362 ereport.io.scsi.cmd.disk.recovered
 ...



 I suspect that multipath is somehow not very happy with my
 Toshiba disks, but I have no idea what to do to make it work at least
 somewhat acceptably. I've tried messing with scsi_vhci.conf to set
 load-balance=none and to change the scsi-vhci-failover-override for the
 Toshiba disks to f_asym_lsi, flashing the latest as well as older firmware
 on the cards, reseating them in other PCIe slots, removing one cable
 and even removing one whole HBA, unloading the eft fmd module, etc., but
 nothing has helped so far and I'm sort of out of ideas. Anybody else got
 an idea on what I might try?
 
 Those ereports are consistent with faulty cabling. You can trace all of the
 cables and errors using tools like lsiutil, sg_logs, kstats, etc. Unfortunately,
 it is not really possible to get into this level of detail over email, and it
 can consume many hours.
  -- richard

That's actually a pretty good piece of information for me! I will try
changing my cabling to see if I can get the errors to go away. Thanks
again for the suggestions!

Cheers
--
Saso
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] MPxIO n00b question

2012-05-30 Thread Sašo Kiselkov
On 05/30/2012 10:53 PM, Richard Elling wrote:
 Those ereports are consistent with faulty cabling. You can trace all of the
 cables and errors using tools like lsiutil, sg_logs, kstats, etc. Unfortunately,
 it is not really possible to get into this level of detail over email, and it
 can consume many hours.
  -- richard

And it turns out you were right. Looking at the errors with iostat -E
while manipulating the data path with mpathadm clearly shows that one
of the paths is faulty. Thanks again for pointing me in the right
direction!
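
For the archives, this is roughly what I did (port and device names below
are just placeholders):

  mpathadm list lu                        # find the LU and its path count
  mpathadm show lu /dev/rdsk/cXtWWNd0s2   # note the initiator/target port pair of each path
  # steer I/O over one path at a time by disabling the other one
  mpathadm disable path -i <initiator-port> -t <target-port> -l /dev/rdsk/cXtWWNd0s2
  iostat -En                              # see which path makes the error counters climb
  mpathadm enable path -i <initiator-port> -t <target-port> -l /dev/rdsk/cXtWWNd0s2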

Cheers,
--
Saso
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] MPxIO n00b question

2012-05-25 Thread Sašo Kiselkov
I'm currently trying to get a SuperMicro JBOD with dual SAS expander
chips running with MPxIO, but I'm a total amateur at this and would like
to ask how to detect whether MPxIO is working (or not).

My SAS topology is:

 *) One LSI SAS2008-equipped HBA (running the latest IT firmware from
LSI) with two external ports.
 *) Two SAS cables running from the HBA to the SuperMicro JBOD, where
they enter the JBOD's rear backplane (which is equipped with two
LSI SAS expander chips).
 *) From the rear backplane, via two internal SAS cables to the front
backplane (also with two SAS expanders on it)
 *) The JBOD is populated with 45 2TB Toshiba SAS 7200rpm drives

The machine also has a PERC H700 for the boot media, configured into a
hardware RAID-1 (on which rpool resides).
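
Is mpathadm the right way to tell? I.e., should something like the following
show a single logical unit with two operational paths once MPxIO is active
(the device name is just a placeholder)?

  stmsboot -L                             # mapping from non-STMS to STMS (MPxIO) device names
  mpathadm list lu                        # each LU with its total/operational path count
  mpathadm show lu /dev/rdsk/cXtWWNd0s2   # should list two initiator/target port pairs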

Here is the relevant part from cfgadm -al for the MPxIO bits:

c5                         scsi-sas     connected    configured   unknown
c5::dsk/c5t5393D8CB4452d0  disk         connected    configured   unknown
c5::dsk/c5t5393E8C90CF2d0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF2A6d0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF2AAd0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF2BEd0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF2C6d0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF2E2d0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF2F2d0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF5C6d0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF28Ad0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF32Ed0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF35Ad0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF35Ed0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF36Ad0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF36Ed0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF52Ed0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF53Ad0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF53Ed0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF312d0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF316d0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF506d0  disk         connected    configured   unknown
c5::dsk/c5t5393E8CAF546d0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C84F5Ed0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C84FBAd0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C851EEd0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C852A6d0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C852C2d0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C852CAd0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C852EAd0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C854BAd0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C854E2d0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C855AAd0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C8509Ad0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C8520Ad0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C8528Ad0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C8530Ed0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C8531Ed0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C8557Ed0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C8558Ed0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C8560Ad0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C85106d0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C85222d0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C85246d0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C85366d0  disk         connected    configured   unknown
c5::dsk/c5t5393F8C85636d0  disk         connected    configured   unknown
c5::es/ses0                ESI          connected    configured   unknown
c5::es/ses1                ESI          connected    configured   unknown
c5::smp/expd0              smp          connected    configured   unknown
c5::smp/expd1              smp          connected    configured   unknown
c6                         scsi-sas     connected    configured   unknown
c6::dsk/c6t5393D8CB4453d0  disk         connected    configured   unknown
c6::dsk/c6t5393E8C90CF3d0  disk         connected    configured   unknown
c6::dsk/c6t5393E8CAF2A7d0  disk         connected    configured   unknown
c6::dsk/c6t5393E8CAF2ABd0  disk

Re: [zfs-discuss] MPxIO n00b question

2012-05-25 Thread Jim Klimov

Sorry I can't comment on MPxIO, except that I thought zfs
could by itself discern two paths to the same drive, if
only to protect against double-importing the disk into a pool.

2012-05-25 21:07, Sašo Kiselkov wrote:
 I'd like to create a 5x 9-drive raidz's on the JBOD, but
 I'm not sure how to do it now that I can see each drive twice...


I am not sure it is a good idea to use such low protection
(raidz1) with large drives. At least, I was led to believe
that above 2TB per drive raidz2 is preferable and raidz3 is
optimal, because long scrub/resilver times leave a pool that
already has one failure exposed for a long time to a possibly
fatal second failure (with single-parity protection).
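
Just to illustrate (a sketch with made-up placeholder device names, to be
replaced by the real MPxIO names), five 9-wide raidz2 top-level vdevs would
be created along these lines, at the cost of two parity disks per group
instead of one:

  zpool create tank \
      raidz2 d01 d02 d03 d04 d05 d06 d07 d08 d09 \
      raidz2 d10 d11 d12 d13 d14 d15 d16 d17 d18 \
      raidz2 d19 d20 d21 d22 d23 d24 d25 d26 d27 \
      raidz2 d28 d29 d30 d31 d32 d33 d34 d35 d36 \
      raidz2 d37 d38 d39 d40 d41 d42 d43 d44 d45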

//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] MPxIO n00b question

2012-05-25 Thread Sašo Kiselkov
On 05/25/2012 07:35 PM, Jim Klimov wrote:
 Sorry I can't comment on MPxIO, except that I thought zfs could by
 itself discern two paths to the same drive, if only to protect
 against double-importing the disk into pool.

Unfortunately, it isn't the same thing. MPxIO provides redundant
signaling to the drives, independent of the storage/RAID layer above
it, so it does have its place (besides simply increasing throughput).

 I am not sure it is a good idea to use such low protection (raidz1)
 with large drives. At least, I was led to believe that above 2TB per
 drive raidz2 is preferable and raidz3 is optimal, because long
 scrub/resilver times leave a pool that already has one failure
 exposed for a long time to a possibly fatal second failure (with
 single-parity protection).

I'd use lower protection if it were available :) The data on that
array is not very important; the primary design parameter is low cost
per MB. We're in a very demanding I/O environment: we need large
quantities of high-throughput, high-IOPS storage, but we don't need
stellar reliability. If the pool gets corrupted due to an unfortunate
double-drive failure, well, that's tough, but not unbearable (the pool
stores customer channel recordings for nPVR, so nothing critical really).

--
Saso
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] MPxIO n00b question

2012-05-25 Thread Jim Klimov

2012-05-25 21:45, Sašo Kiselkov wrote:

On 05/25/2012 07:35 PM, Jim Klimov wrote:

Sorry I can't comment on MPxIO, except that I thought zfs could by
itself discern two paths to the same drive, if only to protect
against double-importing the disk into pool.


Unfortunately, it isn't the same thing. MPxIO provides redundant
signaling to the drives, independent of the storage/RAID layer above
it, so it does have its place (besides simply increasing throughput).


Yes, I know - I just don't have hands-on experience with that
in Solaris (and limited in Linux), not so many double-link
boxes around here :)


I'd use lower protection if it were available :)

 The data on that array is not very important, the primary design
 parameter is low cost per MB.

Why not just stripe it all then? That would give good speeds ;)
Arguably, mirroring would indeed cost about twice as much per MB,
but it may be a tradeoff that is useful to you: it can also give
a lot more IOPS, due to more TLVDEVs being available for striping,
and roughly double read speeds thanks to mirroring.
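
(Again only a sketch, with placeholder device names and the remaining pairs
continuing the same way:

  zpool create tank \
      mirror d01 d02  mirror d03 d04  mirror d05 d06   # ...and so on, pair by pair

about half the usable capacity of raidz, but far more top-level vdevs to
spread random I/O across.)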


We're in a very demanding IO environment, we need large
quantities of high-throughput, high-IOPS storage, but we don't need
stellar reliability.


Does your array include SSD L2ARC caches?

I guess (and want to be corrected if wrong) that, since ZFS tolerates
the loss of L2ARC devices so well that mirroring them is not even
supported, you may get away with several single-link SSDs connected
to one controller or another (likely a dedicated one, other than
those driving the disk arrays, since the IOPS on a few SSDs will be
higher than on tens of disks). You likely shouldn't connect those
single-link (SATA) SSDs to the dual-link backplane either - i.e.
mount them in the server chassis, not in the JBOD box.

I may be wrong though :)
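
(Adding them later is a one-liner anyway, with hypothetical device names:

  zpool add tank cache c8t0d0 c8t1d0

and losing a cache device only costs performance; it never endangers the
data in the pool.)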


If the pool gets corrupted due to unfortunate
double-drive failure, well, that's tough, but not unbearable (the pool
stores customer channel recordings for nPVR, so nothing critical really).


//Jim


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] MPxIO n00b question

2012-05-25 Thread Richard Elling
See the solution at https://www.illumos.org/issues/644
 -- richard

On May 25, 2012, at 10:07 AM, Sašo Kiselkov wrote:

 I'm currently trying to get a SuperMicro JBOD with dual SAS expander
 chips running with MPxIO, but I'm a total amateur at this and would like
 to ask how to detect whether MPxIO is working (or not).
 
 My SAS topology is:
 
 *) One LSI SAS2008-equipped HBA (running the latest IT firmware from
LSI) with two external ports.
 *) Two SAS cables running from the HBA to the SuperMicro JBOD, where
they enter the JBOD's rear backplane (which is equipped with two
LSI SAS expander chips).
 *) From the rear backplane, via two internal SAS cables to the front
backplane (also with two SAS expanders on it)
 *) The JBOD is populated with 45 2TB Toshiba SAS 7200rpm drives
 
 The machine also has a PERC H700 for the boot media, configured into a
 hardware RAID-1 (on which rpool resides).
 
 Here is the relevant part from cfgadm -al for the MPxIO bits:
 
 c5                         scsi-sas     connected    configured   unknown
 c5::dsk/c5t5393D8CB4452d0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8C90CF2d0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF2A6d0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF2AAd0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF2BEd0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF2C6d0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF2E2d0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF2F2d0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF5C6d0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF28Ad0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF32Ed0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF35Ad0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF35Ed0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF36Ad0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF36Ed0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF52Ed0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF53Ad0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF53Ed0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF312d0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF316d0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF506d0  disk         connected    configured   unknown
 c5::dsk/c5t5393E8CAF546d0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C84F5Ed0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C84FBAd0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C851EEd0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C852A6d0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C852C2d0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C852CAd0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C852EAd0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C854BAd0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C854E2d0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C855AAd0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C8509Ad0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C8520Ad0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C8528Ad0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C8530Ed0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C8531Ed0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C8557Ed0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C8558Ed0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C8560Ad0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C85106d0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C85222d0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C85246d0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C85366d0  disk         connected    configured   unknown
 c5::dsk/c5t5393F8C85636d0  disk         connected    configured   unknown
 c5::es/ses0                ESI          connected    configured   unknown
 c5::es/ses1                ESI          connected    configured   unknown
 c5::smp/expd0              smp          connected    configured   unknown
 c5::smp/expd1              smp          connected    configured   unknown
 c6                         scsi-sas     connected    configured   unknown
 c6::dsk/c6t5393D8CB4453d0

Re: [zfs-discuss] MPxIO n00b question

2012-05-25 Thread Sašo Kiselkov
On 05/25/2012 08:40 PM, Richard Elling wrote:
 See the solution at https://www.illumos.org/issues/644 -- richard

Good Lord, that was it! It never occurred to me that the drives had a
say in this. Thanks a billion!

Cheers,
--
Saso
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss