Re: [zfs-discuss] mpt errors on snv 127

2009-12-08 Thread Chad Cantwell
FYI to everyone: the Asus P5W64 motherboard previously in my OpenSolaris
machine was the culprit, not the general mpt issues.  At the time the
motherboard was originally put in that machine, there was not enough zfs
I/O load to trigger the problem, which led to the false impression that the
hardware was fine.  I'm using a 5400-chipset Xeon board now (Asus DSEB-GH)
and my LSI cards are working perfectly again: over 2 hours of heavy I/O and
no errors or warnings with snv 127 (with the P5W64/LSI combo on build 127
it would never run more than 15 minutes without warnings).  I chose this
board partly because it has PCI-X slots, which I thought might be useful
for AOC-SAT2-MV8 cards if I couldn't shake the mpt issues, but now that the
mpt issues are gone I can continue with this controller if I want.

Thanks everyone for your help,
Chad


On Sun, Dec 06, 2009 at 11:12:50PM -0800, Chad Cantwell wrote:
 Thanks for the info on the yukon driver.  I realize too many variables make
 things impossible to determine, but I had made these hardware changes a
 while back, and they seemed to work fine at the time.  Since they aren't
 now, even in the older OpenSolaris (I've tried 2009.06 and 2008.11 now),
 the problem seems to be a hardware quirk, and the only way to narrow it
 down is to change hardware back until it works like it used to in at least
 the older snv builds.  I've ruled out the ethernet controller.  I'm leaning
 toward the current motherboard (Asus P5W64) not playing nicely with the LSI
 cards, but it will probably be several days until I get to the bottom of
 this, since it takes a while to test after making a change...
 
 Thanks,
 Chad
 
 On Mon, Dec 07, 2009 at 11:09:39AM +1000, James C. McPherson wrote:
  
  
  Gday Chad,
  the more swaptronics you partake in, the more difficult it
  is going to be for us (collectively) to figure out what is
  going wrong on your system. Btw, since you're running a build
  past 124, you can use the yge driver instead of the yukonx
  (from Marvell) or myk (from Murayama-san) drivers.
  
  As another comment in this thread has mentioned, a full scrub
  can be a serious test of your hardware depending on how much
  data you've got to walk over. If you can keep the hardware
  variables to a minimum then clarity will be more achievable.
  
  
  thankyou,
  James C. McPherson
  --
  Senior Kernel Software Engineer, Solaris
  Sun Microsystems
  http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-12-06 Thread Chad Cantwell
Thanks for the info on the yukon driver.  I realize too many variables make
things impossible to determine, but I had made these hardware changes a
while back, and they seemed to work fine at the time.  Since they aren't
now, even in the older OpenSolaris (I've tried 2009.06 and 2008.11 now),
the problem seems to be a hardware quirk, and the only way to narrow it
down is to change hardware back until it works like it used to in at least
the older snv builds.  I've ruled out the ethernet controller.  I'm leaning
toward the current motherboard (Asus P5W64) not playing nicely with the LSI
cards, but it will probably be several days until I get to the bottom of
this, since it takes a while to test after making a change...

Thanks,
Chad

On Mon, Dec 07, 2009 at 11:09:39AM +1000, James C. McPherson wrote:
 
 
 Gday Chad,
 the more swaptronics you partake in, the more difficult it
 is going to be for us (collectively) to figure out what is
 going wrong on your system. Btw, since you're running a build
 past 124, you can use the yge driver instead of the yukonx
 (from Marvell) or myk (from Murayama-san) drivers.
 
 As another comment in this thread has mentioned, a full scrub
 can be a serious test of your hardware depending on how much
 data you've got to walk over. If you can keep the hardware
 variables to a minimum then clarity will be more achievable.
 
 
 thankyou,
 James C. McPherson
 --
 Senior Kernel Software Engineer, Solaris
 Sun Microsystems
 http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-12-05 Thread Chad Cantwell
Hi all,

Unfortunately for me, there does seem to be a hardware component to my
problem.  Although my rsync copied almost 4TB of data with no iostat errors
after going back to OpenSolaris 2009.06, I/O on one of my mpt cards did
eventually hang, with 6 disk lights on and 2 off, until rebooting.  A few
hardware changes have been made since the last time I did a full backup, so
it's possible that whatever problem was introduced didn't occur frequently
enough under light I/O usage for me to detect it until now, when I was
reinstalling and copying massive amounts of data back.

The changes I had made since originally installing osol2009.06 several months ago are:

- stopped using the onboard Marvell Yukon2 ethernet (which used a 3rd-party
driver) in favor of an Intel 1000 PT dual port, which necessitated an extra
PCI-E slot, prompting the following item:
- swapped motherboards between 2 machines (they were similar, though, with
similar onboard hardware, and it shouldn't have been a major change).
Originally an Asus P5Q Deluxe w/3 PCI-E slots, now a slightly older Asus
P5W64 w/4 PCI-E slots.
- the Intel 1000 PT dual-port card has been aggregated as aggr0 since it was
installed (the older Yukon2 was a basic interface); a sketch of the aggr
setup follows below.
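
A minimal sketch of that aggr0 setup and its later teardown, as referenced
in the last item; the e1000g0/e1000g1 link names and the Crossbow-era dladm
syntax are assumptions, not details taken from the thread:

# assuming the dual-port card enumerates as e1000g0/e1000g1
dladm create-aggr -l e1000g0 -l e1000g1 aggr0
dladm show-aggr
# and to back it out when reverting to the plain onboard interface:
dladm delete-aggr aggr0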

The above changes were made a while ago, before upgrading OpenSolaris to
127, and things seemed to be working fine for at least 2-3 months with
rsync updating (it never hung, hit a fatal zfs error, or lost access to
data in a way that required a reboot).

New changes since troubleshooting the snv 127 mpt issues:
- upgraded the LSI 3081 firmware from 1.28.2 (or was it .02) to 1.29, the
latest.  If this turns out to be an issue, I do have the previous IT
firmware that I was using before, which I can flash back.

Another, albeit unlikely, factor: when I originally copied all my data to
my first OpenSolaris raidz2 pool, I didn't use rsync at all, I used netcat
and tar, and only set up rsync later for updates.  Perhaps the huge initial
single rsync of the large tree does something strange that the original
netcat and tar copy did not (I know, unlikely, but I'm grasping at straws
here to determine what has happened).
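
For reference, the netcat-and-tar pattern mentioned above looks roughly
like this; the port, the paths, and the exact netcat flags are illustrative
and vary between netcat implementations:

# on the receiving machine (the one with the raidz2 pool):
nc -l 9000 | tar xf - -C /tank/backup
# on the sending machine:
cd /data; tar cf - . | nc the-vault 9000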

I'll work on ruling out the potential sources of hardware problems before I
report any more on the mpt issues, since my test case would probably
confound things at this point.  I am affected by the mpt bugs, since I
would get the timeouts almost constantly in snv 127+, but since I'm also
apparently affected by some other, unknown hardware issue, my data on the
mpt problems might lead people in the wrong direction at this point.

I will first try going back to the non-aggregated Yukon ethernet and
removing the Intel dual-port PCI-E network adapter; then, if the problem
persists, I'll try half of my drives on each LSI controller individually to
confirm whether one controller has a problem the other does not, or whether
one drive in one set causes a new problem on whichever controller it's
attached to.  I hope to have some kind of answer at that point and not have
to resort to motherboard swapping again.

Chad

On Thu, Dec 03, 2009 at 10:44:53PM -0800, Chad Cantwell wrote:
 I eventually performed a few more tests, adjusting some zfs tuning options 
 which had no effect, and trying the
 itmpt driver which someone had said would work, and regardless my system 
 would always freeze quite rapidly in
 snv 127 and 128a.  Just to double check my hardware, I went back to the 
 opensolaris 2009.06 release version, and
 everything is working fine.  The system has been running a few hours and 
 copied a lot of data and not had any
 trouble, mpt syslog events, or iostat errors.
 
 One thing I found interesting, and I don't know if it's significant or not, 
 is that under the recent builds and
 under 2009.06, I had run echo '::interrupts' | mdb -k to check the 
 interrupts used.  (I don't have the printout
 handy for snv 127+, though).
 
 I have a dual port gigabit Intel 1000 P PCI-e card, which shows up as e1000g0 
 and e1000g1.  In snv 127+, each of
 my e1000g devices shares an IRQ with my mpt devices (mpt0, mpt1) on the IRQ 
 listing, whereas in opensolaris
 2009.06, all 4 devices are on different IRQs.  I don't know if this is 
 significant, but most of my testing when
 I encountered errors was data transfer via the network, so it could have 
 potentially been interfering with the
 mpt drivers when it was on the same IRQ.  The errors did seem to be less 
 frequent when the server I was copying
 from was linked at 100 instead of 1000 (one of my tests), but that is as 
 likely to be a result of the slower zpool
 throughput as it is to be related to the network traffic.
 
 I'll probably stay with 2009.06 for now since it works fine for me, but I can 
 try a newer build again once some
 more progress is made in this area and people want to see if it's fixed (this
 machine is mainly to back up another
 array so it's not too big a deal to test later when the mpt drivers are 
 looking better and wipe again in the event
 of problems)
 
 Chad
 
 

Re: [zfs-discuss] mpt errors on snv 127

2009-12-03 Thread Chad Cantwell
I eventually performed a few more tests, adjusting some zfs tuning options 
which had no effect, and trying the
itmpt driver which someone had said would work, and regardless my system would 
always freeze quite rapidly in
snv 127 and 128a.  Just to double check my hardware, I went back to the 
opensolaris 2009.06 release version, and
everything is working fine.  The system has been running a few hours and copied 
a lot of data and not had any
trouble, mpt syslog events, or iostat errors.

One thing I found interesting, and I don't know if it's significant or not,
is that under both the recent builds and under 2009.06 I had run
"echo '::interrupts' | mdb -k" to check the interrupts used.  (I don't have
the printout handy for snv 127+, though.)

I have a dual-port gigabit Intel 1000 P PCI-E card, which shows up as
e1000g0 and e1000g1.  In snv 127+, each of my e1000g devices shares an IRQ
with one of my mpt devices (mpt0, mpt1) in the IRQ listing, whereas in
OpenSolaris 2009.06 all 4 devices are on different IRQs.  I don't know if
this is significant, but most of my testing when I encountered errors was
data transfer via the network, so the traffic could potentially have been
interfering with the mpt driver when it was on the same IRQ.  The errors
did seem to be less frequent when the server I was copying from was linked
at 100 instead of 1000 (one of my tests), but that is as likely to be a
result of the slower zpool throughput as it is to be related to the network
traffic.
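
The check described above is just this one-liner; the output below is a
mock-up of the kind of sharing in question (column layout from memory, not
a capture from this machine):

echo '::interrupts' | mdb -k
# IRQ  Vect IPL Bus  Trg Type   CPU Share APIC/INT# ISR(s)
# 24   0x60 5   PCI  Lvl  Fixed 1   2     0x1/0x4   mpt_intr, e1000g_intr
# a Share count above 1, with mpt and e1000g ISRs on the same line, is
# the shared-IRQ condition being described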

I'll probably stay with 2009.06 for now, since it works fine for me, but I
can try a newer build again once some more progress is made in this area
and people want to see if it's fixed (this machine is mainly there to back
up another array, so it's not too big a deal to test later, when the mpt
drivers are looking better, and wipe again in the event of problems).

Chad

On Tue, Dec 01, 2009 at 03:06:31PM -0800, Chad Cantwell wrote:
 To update everyone, I did a complete zfs scrub, and it generated no errors
 in iostat, and I have 4.8T of
 data on the filesystem so it was a fairly lengthy test.  The machine also has 
 exhibited no evidence of
 instability.  If I were to start copying a lot of data to the filesystem 
 again though, I'm sure it would
 generate errors and crash again.
 
 Chad
 
 
 On Tue, Dec 01, 2009 at 12:29:16AM -0800, Chad Cantwell wrote:
  Well, ok, the msi=0 thing didn't help after all.  A few minutes after my 
  last message a few errors showed
  up in iostat, and then in a few minutes more the machine was locked up 
  hard...  Maybe I will try just
  doing a scrub instead of my rsync process and see how that does.
  
  Chad
  
  
  On Tue, Dec 01, 2009 at 12:13:36AM -0800, Chad Cantwell wrote:
   I don't think the hardware has any problems, it only started having 
   errors when I upgraded OpenSolaris.
   It's still working fine again now after a reboot.  Actually, I reread one 
   of your earlier messages,
   and I didn't realize at first when you said non-Sun JBOD that this 
   didn't apply to me (in regards to
   the msi=0 fix) because I didn't realize JBOD was shorthand for an 
   external expander device.  Since
   I'm just using baremetal, and passive backplanes, I think the msi=0 fix 
   should apply to me based on
   what you wrote earlier, anyway I've put 
 set mpt:mpt_enable_msi = 0
   now in /etc/system and rebooted as it was suggested earlier.  I've 
   resumed my rsync, and so far there
   have been no errors, but it's only been 20 minutes or so.  I should have 
   a good idea by tomorrow if this
   definitely fixed the problem (since even when the machine was not 
   crashing it was tallying up iostat errors
   fairly rapidly)
   
   Thanks again for your help.  Sorry for wasting your time if the 
   previously posted workaround fixes things.
   I'll let you know tomorrow either way.
   
   Chad
   
   On Tue, Dec 01, 2009 at 05:57:28PM +1000, James C. McPherson wrote:
Chad Cantwell wrote:
After another crash I checked the syslog and there were some different 
errors than the ones
I saw previously during operation:
...

Nov 30 20:59:13 the-vault   LSI PCI device (1000,) not 
supported.
...
Nov 30 20:59:13 the-vault   mpt_config_space_init failed
...
Nov 30 20:59:15 the-vault   mpt_restart_ioc failed


Nov 30 21:33:02 the-vault fmd: [ID 377184 daemon.error] SUNW-MSG-ID: 
PCIEX-8000-8R, TYPE: Fault, VER: 1, SEVERITY: Major
Nov 30 21:33:02 the-vault EVENT-TIME: Mon Nov 30 21:33:02 PST 2009
Nov 30 21:33:02 the-vault PLATFORM: System-Product-Name, CSN: 
System-Serial-Number, HOSTNAME: the-vault
Nov 30 21:33:02 the-vault SOURCE: eft, REV: 1.16
Nov 30 21:33:02 the-vault EVENT-ID: 
7886cc0d-4760-60b2-e06a-8158c3334f63
Nov 30 21:33:02 the-vault DESC: The transmitting device sent an 
invalid request.
Nov 30 21:33:02 the-vault   Refer to http://sun.com/msg/PCIEX-8000-8R 
for more information.
Nov 30 

Re: [zfs-discuss] mpt errors on snv 127

2009-12-01 Thread Chad Cantwell
I don't think the hardware has any problems; it only started having errors
when I upgraded OpenSolaris, and it's still working fine again now after a
reboot.  Actually, I reread one of your earlier messages, and I didn't
realize at first when you said "non-Sun JBOD" that this didn't apply to me
(in regards to the msi=0 fix), because I didn't realize JBOD was shorthand
for an external expander device.  Since I'm just using bare metal and
passive backplanes, I think the msi=0 fix should apply to me based on what
you wrote earlier; anyway, I've put
set mpt:mpt_enable_msi = 0
now in /etc/system and rebooted, as was suggested earlier.  I've resumed my
rsync, and so far there have been no errors, but it's only been 20 minutes
or so.  I should have a good idea by tomorrow whether this definitely fixed
the problem (since even when the machine was not crashing, it was tallying
up iostat errors fairly rapidly).
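
Spelled out end to end, the workaround being tried here amounts to the
following; the iostat check at the bottom is just a standard way to watch
the error counters mentioned above, not a step from the thread:

# append the tunable so the mpt driver comes up without MSI, then reboot
echo 'set mpt:mpt_enable_msi = 0' >> /etc/system
init 6
# after reboot, watch whether the per-device soft/hard/transport
# error counters keep climbing while the rsync runs:
iostat -En | grep -i errors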

Thanks again for your help.  Sorry for wasting your time if the previously 
posted workaround fixes things.
I'll let you know tomorrow either way.

Chad

On Tue, Dec 01, 2009 at 05:57:28PM +1000, James C. McPherson wrote:
 Chad Cantwell wrote:
 After another crash I checked the syslog and there were some different 
 errors than the ones
 I saw previously during operation:
 ...
 
 Nov 30 20:59:13 the-vault   LSI PCI device (1000,) not supported.
 ...
 Nov 30 20:59:13 the-vault   mpt_config_space_init failed
 ...
 Nov 30 20:59:15 the-vault   mpt_restart_ioc failed
 
 
 Nov 30 21:33:02 the-vault fmd: [ID 377184 daemon.error] SUNW-MSG-ID: 
 PCIEX-8000-8R, TYPE: Fault, VER: 1, SEVERITY: Major
 Nov 30 21:33:02 the-vault EVENT-TIME: Mon Nov 30 21:33:02 PST 2009
 Nov 30 21:33:02 the-vault PLATFORM: System-Product-Name, CSN: 
 System-Serial-Number, HOSTNAME: the-vault
 Nov 30 21:33:02 the-vault SOURCE: eft, REV: 1.16
 Nov 30 21:33:02 the-vault EVENT-ID: 7886cc0d-4760-60b2-e06a-8158c3334f63
 Nov 30 21:33:02 the-vault DESC: The transmitting device sent an invalid 
 request.
 Nov 30 21:33:02 the-vault   Refer to http://sun.com/msg/PCIEX-8000-8R for 
 more information.
 Nov 30 21:33:02 the-vault AUTO-RESPONSE: One or more device instances may be 
 disabled
 Nov 30 21:33:02 the-vault IMPACT: Loss of services provided by the device 
 instances associated with this fault
 Nov 30 21:33:02 the-vault REC-ACTION: Ensure that the latest drivers and 
 patches are installed. Otherwise schedule a repair procedure to replace the 
 affected device(s).  Use fmadm faulty to identify the devices or contact Sun for support.
 
 
 Sorry to have to tell you, but that HBA is dead. Or at
 least dying horribly. If you can't init the config space
 (that's the pci bus config space), then you've got about
 1/2 the nails in the coffin hammered in. Then the failure
 to restart the IOC (io controller unit) == the rest of
 the lid hammered down.
 
 
 best regards,
 James C. McPherson
 --
 Senior Kernel Software Engineer, Solaris
 Sun Microsystems
 http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-12-01 Thread Chad Cantwell
Well, ok, the msi=0 thing didn't help after all.  A few minutes after my
last message a few errors showed up in iostat, and a few minutes after that
the machine was locked up hard...  Maybe I will try just doing a scrub
instead of my rsync process and see how that does.

Chad


On Tue, Dec 01, 2009 at 12:13:36AM -0800, Chad Cantwell wrote:
 I don't think the hardware has any problems, it only started having errors 
 when I upgraded OpenSolaris.
 It's still working fine again now after a reboot.  Actually, I reread one of 
 your earlier messages,
 and I didn't realize at first when you said non-Sun JBOD that this didn't 
 apply to me (in regards to
 the msi=0 fix) because I didn't realize JBOD was shorthand for an external 
 expander device.  Since
 I'm just using baremetal, and passive backplanes, I think the msi=0 fix 
 should apply to me based on
 what you wrote earlier, anyway I've put 
   set mpt:mpt_enable_msi = 0
 now in /etc/system and rebooted as it was suggested earlier.  I've resumed my 
 rsync, and so far there
 have been no errors, but it's only been 20 minutes or so.  I should have a 
 good idea by tomorrow if this
 definitely fixed the problem (since even when the machine was not crashing it 
 was tallying up iostat errors
 fairly rapidly)
 
 Thanks again for your help.  Sorry for wasting your time if the previously 
 posted workaround fixes things.
 I'll let you know tomorrow either way.
 
 Chad
 
 On Tue, Dec 01, 2009 at 05:57:28PM +1000, James C. McPherson wrote:
  Chad Cantwell wrote:
  After another crash I checked the syslog and there were some different 
  errors than the ones
  I saw previously during operation:
  ...
  
  Nov 30 20:59:13 the-vault   LSI PCI device (1000,) not supported.
  ...
  Nov 30 20:59:13 the-vault   mpt_config_space_init failed
  ...
  Nov 30 20:59:15 the-vault   mpt_restart_ioc failed
  
  
  Nov 30 21:33:02 the-vault fmd: [ID 377184 daemon.error] SUNW-MSG-ID: 
  PCIEX-8000-8R, TYPE: Fault, VER: 1, SEVERITY: Major
  Nov 30 21:33:02 the-vault EVENT-TIME: Mon Nov 30 21:33:02 PST 2009
  Nov 30 21:33:02 the-vault PLATFORM: System-Product-Name, CSN: 
  System-Serial-Number, HOSTNAME: the-vault
  Nov 30 21:33:02 the-vault SOURCE: eft, REV: 1.16
  Nov 30 21:33:02 the-vault EVENT-ID: 7886cc0d-4760-60b2-e06a-8158c3334f63
  Nov 30 21:33:02 the-vault DESC: The transmitting device sent an invalid 
  request.
  Nov 30 21:33:02 the-vault   Refer to http://sun.com/msg/PCIEX-8000-8R for 
  more information.
  Nov 30 21:33:02 the-vault AUTO-RESPONSE: One or more device instances may 
  be disabled
  Nov 30 21:33:02 the-vault IMPACT: Loss of services provided by the device 
  instances associated with this fault
  Nov 30 21:33:02 the-vault REC-ACTION: Ensure that the latest drivers and 
  patches are installed. Otherwise schedule a repair procedure to replace 
 the affected device(s).  Use fmadm faulty to identify the devices or contact Sun for support.
  
  
  Sorry to have to tell you, but that HBA is dead. Or at
  least dying horribly. If you can't init the config space
  (that's the pci bus config space), then you've got about
  1/2 the nails in the coffin hammered in. Then the failure
  to restart the IOC (io controller unit) == the rest of
  the lid hammered down.
  
  
  best regards,
  James C. McPherson
  --
  Senior Kernel Software Engineer, Solaris
  Sun Microsystems
  http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-12-01 Thread Mark Nipper
This is basically just a "me too".  I'm using different hardware but seeing
essentially the same problems.  The relevant hardware I have is:
---
SuperMicro MBD-H8Di3+-F-O motherboard with LSI 1068E onboard
SuperMicro SC846E2-R900B 4U chassis with two LSI SASx36 expander chips on the 
backplane
24 Western Digital RE4-GP 2TB 7.2k RPM SATA drives
---

I have two SFF-8087 to SFF-8087 cables running from the two ports on the
motherboard (4 channels each) to two ports on the backplane, each port
going to one of the LSI expander chips.  The backplane has four additional
ports which support cascading additional enclosures together, but I'm not
making use of any of this at the moment.

The machine is currently dead at the data center, and it's late, so if you
want anything more from me, just let me know and I'll run stuff tomorrow on
the machine.  Otherwise, the behavior sounds the same as all of the other
mpt reports recently.  I was not seeing these types of problems with
2009.06, but I also wanted to upgrade to get raidz3 support.

Just tell me what other commands you might want output from to help diagnose the problem.
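
A hedged starter set of the diagnostics that tend to get asked for in these
mpt threads (all stock OpenSolaris tools; the grep pattern is only
illustrative):

fmdump -eV | tail -100           # recent FMA error telemetry
iostat -En                       # per-device error counters
prtconf -pv | grep -i '1000,'    # how the LSI HBAs enumerate on PCI
echo '::interrupts' | mdb -k     # interrupt routing/sharing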
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-12-01 Thread Mark Johnson



Chad Cantwell wrote:

Hi,

 I was using OpenSolaris 2009.06 for quite a while with the
opensolaris-provided mpt driver to operate a zfs raidz2 pool of ~20T, and
this worked perfectly fine (no issues or device errors logged for several
months, no hanging).  A few days ago I decided to reinstall with the latest
OpenSolaris in order to take advantage of raidz3.


Just to be clear... The same setup was working fine on osol2009.06,
you upgraded to b127 and it started failing?

Did you keep the osol2009.06 BE around so you can reboot back to it?

If so, have you tried the osol2009.06 mpt driver in the
BE with the latest bits (make sure you make a backup copy
of the mpt driver)?



MRJ


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-12-01 Thread Mark Johnson



Mark Johnson wrote:



Chad Cantwell wrote:

Hi,

 I was using OpenSolaris 2009.06 for quite a while with the
opensolaris-provided mpt driver to operate a zfs raidz2 pool of ~20T, and
this worked perfectly fine (no issues or device errors logged for several
months, no hanging).  A few days ago I decided to reinstall with the latest
OpenSolaris in order to take advantage of raidz3.


Just to be clear... The same setup was working fine on osol2009.06,
you upgraded to b127 and it started failing?

Did you keep the osol2009.06 BE around so you can reboot back to it?

If so, have you tried the osol2009.06 mpt driver in the
BE with the latest bits (make sure you make a backup copy
of the mpt driver)?


What's the earliest build someone has seen this
problem? i.e. if we binary chop, has anyone seen it in
b118?

I have no idea if the old mpt drivers will work on a
new kernel... But if someone wants to try... Something
like the following should work...


# first, I would work out of a test BE in case you
# mess something up.
beadm create test-be
beadm activate test-be
reboot

# assuming your latest BE is called snv127, mount it and back up
# the stock mpt driver and conf file.
beadm mount snv127 /mnt
cp /mnt/kernel/drv/mpt.conf /mnt/kernel/drv/mpt.conf.orig
cp /mnt/kernel/drv/amd64/mpt /mnt/kernel/drv/amd64/mpt.orig

# see what builds are out there...
pkg search /kernel/drv/amd64/mpt


# There's probably an easier way to do this...
# grab an older mpt. This will take a while since it's
# not in its own package and ckr has some dependencies,
# so it will pull in a bunch of other packages.
# change out 118 for the build you want to grab.
mkdir /tmp/mpt
pkg image-create -f -F -a opensolaris.org=http://pkg.opensolaris.org/dev /tmp/mpt
pkg -R /tmp/mpt/ install sunw...@0.5.11-0.118
cp /tmp/mpt/kernel/drv/mpt.conf /mnt/kernel/drv/mpt.conf
cp /tmp/mpt/kernel/drv/amd64/mpt /mnt/kernel/drv/amd64/mpt
rm -rf /tmp/mpt/
bootadm update-archive -R /mnt
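
# One quick way to confirm, after rebooting into the test BE, that the
# swapped-in binary is the one that actually loaded (modinfo is standard
# Solaris; nothing here is specific to this recipe):
modinfo | grep -w mpt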




MRJ



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-12-01 Thread Markus Kovero
We actually tried this, although using the Solaris 10 version of the mpt
driver.  Surprisingly, it didn't work :-)

Yours
Markus Kovero

-Original Message-
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Mark Johnson
Sent: 1. joulukuuta 2009 15:57
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] mpt errors on snv 127



Mark Johnson wrote:
 
 
 Chad Cantwell wrote:
 Hi,

  I was using OpenSolaris 2009.06 for quite a while with the
 opensolaris-provided mpt driver to operate a zfs raidz2 pool of ~20T, and
 this worked perfectly fine (no issues or device errors logged for several
 months, no hanging).  A few days ago I decided to reinstall with the latest
 OpenSolaris in order to take advantage of raidz3.
 
 Just to be clear... The same setup was working fine on osol2009.06,
 you upgraded to b127 and it started failing?
 
 Did you keep the osol2009.06 BE around so you can reboot back to it?
 
 If so, have you tried the osol2009.06 mpt driver in the
 BE with the latest bits (make sure you make a backup copy
 of the mpt driver)?

What's the earliest build someone has seen this
problem? i.e. if we binary chop, has anyone seen it in
b118?

I have no idea if the old mpt drivers will work on a
new kernel... But if someone wants to try... Something
like the following should work...


# first, I would work out of a test BE in case you
# mess something up.
beadm create test-be
beadm activate test-be
reboot

# assuming your latest BE is called snv127, mount it and back up
# the stock mpt driver and conf file.
beadm mount snv127 /mnt
cp /mnt/kernel/drv/mpt.conf /mnt/kernel/drv/mpt.conf.orig
cp /mnt/kernel/drv/amd64/mpt /mnt/kernel/drv/amd64/mpt.orig

# see what builds are out there...
pkg search /kernel/drv/amd64/mpt


# There's probably an easier way to do this...
# grab an older mpt. This will take a while since it's
# not in its own package and ckr has some dependencies,
# so it will pull in a bunch of other packages.
# change out 118 for the build you want to grab.
mkdir /tmp/mpt
pkg image-create -f -F -a opensolaris.org=http://pkg.opensolaris.org/dev /tmp/mpt
pkg -R /tmp/mpt/ install sunw...@0.5.11-0.118
cp /tmp/mpt/kernel/drv/mpt.conf /mnt/kernel/drv/mpt.conf
cp /tmp/mpt/kernel/drv/amd64/mpt /mnt/kernel/drv/amd64/mpt
rm -rf /tmp/mpt/
bootadm update-archive -R /mnt




MRJ



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-12-01 Thread Adam Cheal
 
 What's the earliest build someone has seen this
 problem? i.e. if we binary chop, has anyone seen it
 in
 b118?
 

We have used every stable build from b118 up, as b118 was the first
reliable one that could be used in a CIFS-heavy environment.  The problem
occurs on all of them.

- Adam
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-12-01 Thread Travis Tabbal
If someone from Sun will confirm that it should work to use the mpt driver
from 2009.06, I'd be willing to set up a BE and try it.  I still have the
snapshot from my 2009.06 install, so I should be able to mount that and
grab the files easily enough.
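
Pulling files out of a kept root snapshot can look roughly like this; the
dataset and snapshot names are illustrative guesses, not the actual layout:

# clone the old root snapshot somewhere temporary, copy the driver
# out, then drop the clone
zfs clone rpool/ROOT/opensolaris@2009.06 rpool/peek
zfs set mountpoint=/mnt rpool/peek
cp /mnt/kernel/drv/amd64/mpt /var/tmp/mpt.200906
zfs unmount rpool/peek
zfs destroy rpool/peek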
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-12-01 Thread Carson Gaspar

Travis Tabbal wrote:

If someone from Sun will confirm that it should work to use the mpt
driver from 2009.06, I'd be willing to set up a BE and try it. I
still have the snapshot from my 2009.06 install, so I should be able
to mount that and grab the files easily enough.


I tried; it doesn't work.  It's interesting to note that the (much, much
older) itmpt driver works just fine.  It seems someone has gotten creative
with the mpt driver's use of the DDI.


--
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-12-01 Thread Chad Cantwell
First I tried just upgrading to b127; that had a few issues besides the mpt
driver.  After that I did a clean install of b127, so no, I don't have my
osol2009.06 root still there.  I wasn't sure how to install another copy
and leave it there (I suspect it is possible, since I saw that doing
upgrades creates a second root environment, but my forte isn't Solaris, so
I just reformatted the root device).
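
For what it's worth, keeping the old root bootable is exactly what boot
environments are for; a minimal sketch, with an illustrative BE name:

# preserve the current root as a named BE before upgrading;
# it stays selectable from the GRUB menu
beadm create osol-200906-keep
beadm list
# to fall back later:
beadm activate osol-200906-keep
init 6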

On Tue, Dec 01, 2009 at 08:09:32AM -0500, Mark Johnson wrote:
 
 
 Chad Cantwell wrote:
 Hi,
 
  I was using OpenSolaris 2009.06 for quite a while with the
 opensolaris-provided mpt driver to operate a zfs raidz2 pool of ~20T, and
 this worked perfectly fine (no issues or device errors logged for several
 months, no hanging).  A few days ago I decided to reinstall with the latest
 OpenSolaris in order to take advantage of raidz3.
 
 Just to be clear... The same setup was working fine on osol2009.06,
 you upgraded to b127 and it started failing?
 
 Did you keep the osol2009.06 BE around so you can reboot back to it?
 
 If so, have you tried the osol2009.06 mpt driver in the
 BE with the latest bits (make sure you make a backup copy
 of the mpt driver)?
 
 
 
 MRJ
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-12-01 Thread Chad Cantwell
To update everyone: I did a complete zfs scrub, and it generated no errors
in iostat; with 4.8T of data on the filesystem, it was a fairly lengthy
test.  The machine has also exhibited no evidence of instability.  If I
were to start copying a lot of data to the filesystem again, though, I'm
sure it would generate errors and crash again.
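
The scrub-as-stress-test described above is just the following; the pool
name tank is a placeholder:

zpool scrub tank
zpool status -v tank    # scrub progress, plus any checksum errors found
iostat -xne 10          # per-device error columns while the scrub runs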

Chad


On Tue, Dec 01, 2009 at 12:29:16AM -0800, Chad Cantwell wrote:
 Well, ok, the msi=0 thing didn't help after all.  A few minutes after my last 
 message a few errors showed
 up in iostat, and then in a few minutes more the machine was locked up 
 hard...  Maybe I will try just
 doing a scrub instead of my rsync process and see how that does.
 
 Chad
 
 
 On Tue, Dec 01, 2009 at 12:13:36AM -0800, Chad Cantwell wrote:
  I don't think the hardware has any problems, it only started having errors 
  when I upgraded OpenSolaris.
  It's still working fine again now after a reboot.  Actually, I reread one 
  of your earlier messages,
  and I didn't realize at first when you said non-Sun JBOD that this didn't 
  apply to me (in regards to
  the msi=0 fix) because I didn't realize JBOD was shorthand for an external 
  expander device.  Since
  I'm just using baremetal, and passive backplanes, I think the msi=0 fix 
  should apply to me based on
  what you wrote earlier, anyway I've put 
  set mpt:mpt_enable_msi = 0
  now in /etc/system and rebooted as it was suggested earlier.  I've resumed 
  my rsync, and so far there
  have been no errors, but it's only been 20 minutes or so.  I should have a 
  good idea by tomorrow if this
  definitely fixed the problem (since even when the machine was not crashing 
  it was tallying up iostat errors
  fairly rapidly)
  
  Thanks again for your help.  Sorry for wasting your time if the previously 
  posted workaround fixes things.
  I'll let you know tomorrow either way.
  
  Chad
  
  On Tue, Dec 01, 2009 at 05:57:28PM +1000, James C. McPherson wrote:
   Chad Cantwell wrote:
   After another crash I checked the syslog and there were some different 
   errors than the ones
   I saw previously during operation:
   ...
   
   Nov 30 20:59:13 the-vault   LSI PCI device (1000,) not supported.
   ...
   Nov 30 20:59:13 the-vault   mpt_config_space_init failed
   ...
   Nov 30 20:59:15 the-vault   mpt_restart_ioc failed
   
   
   Nov 30 21:33:02 the-vault fmd: [ID 377184 daemon.error] SUNW-MSG-ID: 
   PCIEX-8000-8R, TYPE: Fault, VER: 1, SEVERITY: Major
   Nov 30 21:33:02 the-vault EVENT-TIME: Mon Nov 30 21:33:02 PST 2009
   Nov 30 21:33:02 the-vault PLATFORM: System-Product-Name, CSN: 
   System-Serial-Number, HOSTNAME: the-vault
   Nov 30 21:33:02 the-vault SOURCE: eft, REV: 1.16
   Nov 30 21:33:02 the-vault EVENT-ID: 7886cc0d-4760-60b2-e06a-8158c3334f63
   Nov 30 21:33:02 the-vault DESC: The transmitting device sent an invalid 
   request.
   Nov 30 21:33:02 the-vault   Refer to http://sun.com/msg/PCIEX-8000-8R 
   for more information.
   Nov 30 21:33:02 the-vault AUTO-RESPONSE: One or more device instances 
   may be disabled
   Nov 30 21:33:02 the-vault IMPACT: Loss of services provided by the 
   device instances associated with this fault
   Nov 30 21:33:02 the-vault REC-ACTION: Ensure that the latest drivers and 
   patches are installed. Otherwise schedule a repair procedure to replace 
   the affected device(s).  Use fmadm faulty to identify the devices or contact Sun for support.
   
   
   Sorry to have to tell you, but that HBA is dead. Or at
   least dying horribly. If you can't init the config space
   (that's the pci bus config space), then you've got about
   1/2 the nails in the coffin hammered in. Then the failure
   to restart the IOC (io controller unit) == the rest of
   the lid hammered down.
   
   
   best regards,
   James C. McPherson
   --
   Senior Kernel Software Engineer, Solaris
   Sun Microsystems
   http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-11-30 Thread James C. McPherson

Chad Cantwell wrote:

Hi,

Sorry for not replying to one of the already open threads on this topic;
I've just joined the list for the purposes of this discussion and have
nothing in my client to reply to yet.

I have an x86_64 opensolaris machine running on a Core 2 Quad Q9650
platform with two LSI SAS3081E-R PCI-E 8 port SAS controllers, with
8 drives each. 


Are these disks internal to your server's chassis, or external in
a jbod? If in a jbod, which one? Also, which cables are you using?


thankyou,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-11-30 Thread Chad Cantwell
Hi,

Replied to your previous general query already, but in summary: they are in
the server chassis.  It's a Chenbro 16-hotswap-bay case.  It has 4 mini
backplanes that each connect via an SFF-8087 cable (1m) to my LSI cards
(2 cables / 8 drives per card).

Chad

On Tue, Dec 01, 2009 at 01:02:34PM +1000, James C. McPherson wrote:
 Chad Cantwell wrote:
 Hi,
 
 Sorry for not replying to one of the already open threads on this topic;
 I've just joined the list for the purposes of this discussion and have
 nothing in my client to reply to yet.
 
 I have an x86_64 opensolaris machine running on a Core 2 Quad Q9650
 platform with two LSI SAS3081E-R PCI-E 8 port SAS controllers, with
 8 drives each.
 
 Are these disks internal to your server's chassis, or external in
 a jbod? If in a jbod, which one? Also, which cables are you using?
 
 
 thankyou,
 James C. McPherson
 --
 Senior Kernel Software Engineer, Solaris
 Sun Microsystems
 http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-11-30 Thread James C. McPherson

Chad Cantwell wrote:

Hi,

Replied to your previous general query already, but in summary, they are in the
server chassis.  It's a Chenbro 16 hotswap bay case.  It has 4 mini backplanes
that each connect via an SFF-8087 cable (1m) to my LSI cards (2 cables / 8 
drives
per card).


Hi Chad,
thanks for the followup. Just to confirm - you've got this
Chenbro chassis connected to the actual server chassis (where
the cpu is), or do you have the cpu inside the Chenbro chassis?


thankyou,
James
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-11-30 Thread Chad Cantwell
Hi,

The Chenbro chassis contains everything: the motherboard/CPU and the disks.
As far as I know, the Chenbro backplanes are basically electrical jumpers
that the LSI cards shouldn't be aware of; they pass the SATA signals
straight through from the SFF-8087 cables to the disks.

Thanks,
Chad

On Tue, Dec 01, 2009 at 01:43:06PM +1000, James C. McPherson wrote:
 Chad Cantwell wrote:
 Hi,
 
 Replied to your previous general query already, but in summary, they are in 
 the
 server chassis.  It's a Chenbro 16 hotswap bay case.  It has 4 mini 
 backplanes
 that each connect via an SFF-8087 cable (1m) to my LSI cards (2 cables / 8 
 drives
 per card).
 
 Hi Chad,
 thanks for the followup. Just to confirm - you've got this
 Chenbro chassis connected to the actual server chassis (where
 the cpu is), or do you have the cpu inside the Chenbro chassis?
 
 
 thankyou,
 James
 --
 Senior Kernel Software Engineer, Solaris
 Sun Microsystems
 http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-11-30 Thread Chad Cantwell
After another crash I checked the syslog, and there were some different
errors than the ones I saw previously during operation:

Nov 30 20:26:11 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 20:26:11 the-vault   Disconnected command timeout for Target 10
Nov 30 20:59:12 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 20:59:12 the-vault   mpt_send_handshake_msg task 3 failed
Nov 30 20:59:13 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 20:59:13 the-vault   LSI PCI device (1000,) not supported.
Nov 30 20:59:13 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 20:59:13 the-vault   mpt_config_space_init failed
Nov 30 20:59:15 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 20:59:15 the-vault   LSI PCI device (1000,) not supported.
Nov 30 20:59:15 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 20:59:15 the-vault   mpt_config_space_init failed
Nov 30 20:59:15 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 20:59:15 the-vault   mpt_restart_ioc failed
Nov 30 21:32:17 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 21:32:17 the-vault   mpt_send_handshake_msg task 4 failed
Nov 30 21:32:18 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 21:32:18 the-vault   LSI PCI device (1000,) not supported.
Nov 30 21:32:18 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 21:32:18 the-vault   mpt_config_space_init failed
Nov 30 21:32:19 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 21:32:19 the-vault   LSI PCI device (1000,) not supported.
Nov 30 21:32:19 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 21:32:19 the-vault   mpt_config_space_init failed
Nov 30 21:32:19 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 21:32:19 the-vault   mpt_restart_ioc failed
Nov 30 21:32:19 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 21:32:19 the-vault   Rejecting future commands
Nov 30 21:32:19 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@0/pci1000,3...@0 (mpt0):
Nov 30 21:32:19 the-vault   Disconnected command timeout for Target 14
Nov 30 21:32:19 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 21:32:19 the-vault   rejecting command, throttle choked
Nov 30 21:32:19 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 21:32:19 the-vault   rejecting command, throttle choked
Nov 30 21:32:19 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 21:32:19 the-vault   rejecting command, throttle choked
Nov 30 21:32:19 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 21:32:19 the-vault   rejecting command, throttle choked
Nov 30 21:32:19 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 21:32:19 the-vault   rejecting command, throttle choked
Nov 30 21:32:19 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 21:32:19 the-vault   rejecting command, throttle choked
Nov 30 21:32:19 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 21:32:19 the-vault   rejecting command, throttle choked
Nov 30 21:32:19 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 30 21:32:19 the-vault   rejecting command, throttle choked

Re: [zfs-discuss] mpt errors on snv 127

2009-11-30 Thread James C. McPherson

Chad Cantwell wrote:

After another crash I checked the syslog, and there were some different
errors than the ones I saw previously during operation:

...


Nov 30 20:59:13 the-vault   LSI PCI device (1000,) not supported.

...

Nov 30 20:59:13 the-vault   mpt_config_space_init failed

...

Nov 30 20:59:15 the-vault   mpt_restart_ioc failed




Nov 30 21:33:02 the-vault fmd: [ID 377184 daemon.error] SUNW-MSG-ID: PCIEX-8000-8R, TYPE: Fault, VER: 1, SEVERITY: Major
Nov 30 21:33:02 the-vault EVENT-TIME: Mon Nov 30 21:33:02 PST 2009
Nov 30 21:33:02 the-vault PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: the-vault
Nov 30 21:33:02 the-vault SOURCE: eft, REV: 1.16
Nov 30 21:33:02 the-vault EVENT-ID: 7886cc0d-4760-60b2-e06a-8158c3334f63
Nov 30 21:33:02 the-vault DESC: The transmitting device sent an invalid request.
Nov 30 21:33:02 the-vault   Refer to http://sun.com/msg/PCIEX-8000-8R for more information.
Nov 30 21:33:02 the-vault AUTO-RESPONSE: One or more device instances may be disabled
Nov 30 21:33:02 the-vault IMPACT: Loss of services provided by the device instances associated with this fault
Nov 30 21:33:02 the-vault REC-ACTION: Ensure that the latest drivers and patches are installed. Otherwise schedule a repair procedure to replace the affected device(s).  Use fmadm faulty to identify the devices or contact Sun for support.



Sorry to have to tell you, but that HBA is dead. Or at
least dying horribly. If you can't init the config space
(that's the pci bus config space), then you've got about
1/2 the nails in the coffin hammered in. Then the failure
to restart the IOC (io controller unit) == the rest of
the lid hammered down.


best regards,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss