Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-10-14 Thread Aaron Brady
Well, I upgraded to b124, disabling ACPI because of [1], and I get exactly the 
same behaviour. I've removed the device from the zpool, and tried dd-ing from 
the device while I remove it; it still hangs all IO on the system until the 
disk is re-inserted.

I'm running the kernel with -v (from diagnosing the ACPI issue) and nothing 
enlightening is printed in dmesg.

1: http://defect.opensolaris.org/bz/show_bug.cgi?id=11739
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-10-14 Thread Ross
What are you running there?  snv or OpenSolaris?

Could you try an OpenSolaris 2009.06 live disc and boot directly from that?
Once I was running that build, every single hot plug I tried worked flawlessly.
I tried for several hours to replicate the problems that caused me to log that
bug report, but the issue appeared completely resolved.


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-10-13 Thread Aaron Brady
All's gone quiet on this issue, and the bug is closed, but I'm having exactly 
the same problem; pulling a disk on this card, under OpenSolaris 111, is 
pausing all IO (including, weirdly, network IO), and using the ZFS utilities 
(zfs list, zpool list, zpool status) causes a hang until I replace the disk.


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-10-13 Thread Tim Cook
On Tue, Oct 13, 2009 at 8:54 AM, Aaron Brady bra...@gmail.com wrote:

 All's gone quiet on this issue, and the bug is closed, but I'm having
 exactly the same problem; pulling a disk on this card, under OpenSolaris
 111, is pausing all IO (including, weirdly, network IO), and using the ZFS
 utilities (zfs list, zpool list, zpool status) causes a hang until I replace
 the disk.
 --



Did you set your failmode to continue?


--Tim


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-10-13 Thread Ross
Hi Tim, that doesn't help in this case - it's a complete lockup apparently 
caused by driver issues.

However, the good news for Insom is that the bug is closed because the problem 
now appears fixed.  I tested it and found that it's no longer occurring in 
OpenSolaris 2008.11 or 2009.06.

If you move to a newer build of OpenSolaris you should be fine.


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-10-13 Thread Aaron Brady
I did, but as tcook suggests running a later build, I'll try an image-update 
(though, 111 > 2008.11, right?)


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-10-13 Thread Tim Cook
On Tue, Oct 13, 2009 at 9:42 AM, Aaron Brady bra...@gmail.com wrote:

 I did, but as tcook suggests running a later build, I'll try an
 image-update (though, 111 > 2008.11, right?)



It should be, yes.  b111 was released in April of 2009.

--Tim


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-02-12 Thread Ross
This sounds like exactly the kind of problem I've been shouting about for 6 
months or more.  I posted a huge thread on availability on these forums because 
I had concerns over exactly this kind of hanging.

ZFS doesn't trust hardware or drivers when it comes to your data - everything 
is checksummed.  However, when it comes to seeing whether devices are 
responding, and checking for faults, it blindly trusts whatever the hardware or 
driver tells it.  Unfortunately, that means ZFS is vulnerable to any unexpected 
bug or error in the storage chain.  I've encountered at least two hang 
conditions myself (and I'm not exactly a heavy user), and I've seen several 
others on the forums, including a few on x4500s.

Now, I do accept that errors like this will be few and far between, but they 
still mean you have the risk that a badly handled error condition can hang 
your entire server, instead of just one drive.  Solaris can handle things like 
CPUs or memory going faulty, for crying out loud.  Its RAID storage system had 
better be able to handle a disk failing.

Sun seem to be taking the approach that these errors should be dealt with in 
the driver layer.  And while that's technically correct, a reliable storage 
system had damn well better be able to keep the server limping along while we 
wait for patches to the storage drivers.

ZFS absolutely needs an error handling layer between the volume manager and the 
devices.  It needs to time out items that are not responding, and it needs to 
drop bad devices if they could cause problems elsewhere.
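
A rough sketch of the kind of layer described above: every device read gets a deadline, and a device that misses it (or returns an error) is marked faulted instead of stalling the caller. This is a hypothetical Python illustration, not how ZFS or the marvell88sx driver actually behave; the `DeviceWatchdog` class, the ONLINE/FAULTED state names, the device names and the timeout values are all assumptions made for the example.

```python
import threading
import time
from typing import Callable, Dict, List, Optional


class DeviceWatchdog:
    """Hypothetical timeout layer between a volume manager and its devices.

    Every read gets a deadline; a device that misses it (or raises) is
    marked FAULTED so the caller can carry on with the remaining
    redundancy instead of hanging along with it.
    """

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.state: Dict[str, str] = {}

    def read(self, device: str, io: Callable[[], bytes]) -> Optional[bytes]:
        if self.state.get(device) == "FAULTED":
            return None  # already dropped: don't queue more I/O behind a hang

        data: List[bytes] = []
        errors: List[Exception] = []

        def worker() -> None:
            try:
                data.append(io())
            except Exception as exc:  # any failed read counts as a fault
                errors.append(exc)

        # Daemon thread: a read stuck in the driver can't block process exit.
        t = threading.Thread(target=worker, daemon=True)
        t.start()
        t.join(self.timeout_s)

        if t.is_alive() or errors:
            self.state[device] = "FAULTED"
            return None
        self.state[device] = "ONLINE"
        return data[0]


if __name__ == "__main__":
    def hung_read() -> bytes:
        time.sleep(5)  # simulates a pulled disk whose read never returns
        return b"late"

    wd = DeviceWatchdog(timeout_s=0.5)
    print(wd.read("c1t0d0", lambda: b"ok"))  # b'ok'
    print(wd.read("c1t1d0", hung_read))      # None, after about 0.5 s
    print(wd.state)  # {'c1t0d0': 'ONLINE', 'c1t1d0': 'FAULTED'}
```

The sketch also shows its own limit: the stuck thread is abandoned, not cancelled, so the hang is contained rather than fixed. That matches the split argued for here, where the driver still needs patching but the storage layer keeps the server limping along in the meantime.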

And yes, I'm repeating myself, but I can't understand why this is not being 
acted on.  Right now the error checking appears to be such that if an 
unexpected, or badly handled error condition occurs in the driver stack, the 
pool or server hangs.  Whereas the expected behavior would be for just one 
drive to fail.  The absolute worst case scenario should be that an entire 
controller has to be taken offline (and I would hope that the controllers in an 
x4500 would be running separate instances of the driver software).

None of those conditions should be fatal; good storage designs cope with 
them all, and good error handling at the ZFS layer is absolutely vital when you 
have projects like Comstar introducing more and more types of storage device 
for ZFS to work with.

Each extra type of storage introduces yet more software into the equation, and 
increases the risk of finding faults like this.  While they will be rare, they 
should be expected, and ZFS should be designed to handle them.


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-02-12 Thread Tim
On Thu, Feb 12, 2009 at 9:25 AM, Ross myxi...@googlemail.com wrote:

 This sounds like exactly the kind of problem I've been shouting about for 6
 months or more.  I posted a huge thread on availability on these forums
 because I had concerns over exactly this kind of hanging.

 And yes, I'm repeating myself, but I can't understand why this is not being
 acted on.  Right now the error checking appears to be such that if an
 unexpected, or badly handled error condition occurs in the driver stack, the
 pool or server hangs.  Whereas the expected behavior would be for just one
 drive to fail.  The absolute worst case scenario should be that an entire
 controller has to be taken offline (and I would hope that the controllers in
 an x4500 would be running separate instances of the driver software).




I'd imagine for the exact same reason short-stroking/right-sizing isn't a
concern.

We don't have this problem in the 7000 series, perhaps you should buy one
of those.

;)

--Tim


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-02-12 Thread Ross Smith
Heh, yeah, I've thought the same kind of thing in the past.  The
problem is that the argument doesn't really work for system admins.

As far as I'm concerned, the 7000 series is a new hardware platform,
with relatively untested drivers, running a software solution that I
know is prone to locking up when hardware faults are handled badly by
drivers.  Fair enough, that actual solution is out of our price range,
but I would still be very dubious about purchasing it.  At the very
least I'd be waiting a year for other people to work the kinks out of
the drivers.

Which is a shame, because ZFS has so many other great features it's
easily our first choice for a storage platform.  The one and only
concern we have is its reliability.  We have snv_106 running as a test
platform now.  If I felt I could trust ZFS 100% I'd roll it out
tomorrow.



On Thu, Feb 12, 2009 at 4:25 PM, Tim t...@tcsac.net wrote:


 I'd imagine for the exact same reason short-stroking/right-sizing isn't a
 concern.

 We don't have this problem in the 7000 series, perhaps you should buy one
 of those.

 ;)

 --Tim



Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-02-12 Thread Bob Friesenhahn

On Thu, 12 Feb 2009, Ross Smith wrote:


As far as I'm concerned, the 7000 series is a new hardware platform,


You are joking, right?  Have you ever looked at the photos of these 
new systems or compared them to other Sun systems?  They are just 
re-purposed existing systems with a bit of extra secret sauce added.


Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-02-12 Thread Tim
On Thu, Feb 12, 2009 at 5:16 PM, Bob Friesenhahn 
bfrie...@simple.dallas.tx.us wrote:

 On Thu, 12 Feb 2009, Ross Smith wrote:


 As far as I'm concerned, the 7000 series is a new hardware platform,


 You are joking right?  Have you ever looked at the photos of these new
 systems or compared them to other Sun systems?  They are just re-purposed
 existing systems with a bit of extra secret sauce added.

 Bob


Ya, that *secret sauce* is what makes it a new system.  And out of the last
4 X4240s I've ordered, two had to have new motherboards installed within a
week, and one had to have a new power supply.  The other appears to have a
DVD-ROM drive going flaky.  So the fact they're based on existing hardware
isn't exactly confidence inspiring either.

Sun's old SPARC gear: rock solid.  The newer x64 has been leaving a bad
taste in my mouth TBQH.  The engineering behind the systems when I open them
up is absolutely phenomenal.  The failure rate, however, is downright scary.

--Tim


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-02-11 Thread Peter Schultze
 Yup, was an absolute nightmare to diagnose on top of everything else.
 Definitely doesn't happen in windows too.  I really want somebody to try
 snv_94 on a Thumper to see if you get the same behaviour there, or whether
 it's unique to Supermicro's Marvell card.

On a Thumper under S10U5 we recently had a hardware failure
of one disk. This caused all I/O to the entire 46-disk pool to hang.
zpool status commands also hung. Reset commands
from the service processor timed out unsuccessfully. The system
had to be power cycled manually. After that, booting took about
30 minutes. At that point the bad disk could be unconfigured
with cfgadm and then hot swapped with a warranty replacement.

So it appears that bug 6735931 also affects the X4500 upon disk
hardware failure, in a way that seriously impairs the entire system's
fault tolerance.

I would be willing to test any T-patch coming out soon.

I found this thread after seeing a total failure of a hot unplug
of a 1.5TB disk from a (different) newly assembled system with 3 AOC-SAT2-MV8
cards and 24 disks + one hot spare. After removing one disk
the entire system also froze, instead of initiating a resilver
process with the hot spare. Clearly the marvell88sx driver cannot handle
disk outages in any environment.


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-20 Thread Brian D. Horn
Well, when you leave out a bunch of relevant information you also leave
people guessing! :-)

Regardless, is it possible that all of your testing was done with ZFS and not
just the raw disk?  If so, it is possible that ZFS isn't noticing the hot
unplugging of the disk until it tries to access the drive.  I don't know this,
but it would be consistent with what you have related to date.


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-20 Thread James C. McPherson
Ross wrote:
 lol, I got bored after 13 pages and a whole day of going back through my
 notes to pick out the relevant information.
 
 Besides, I did mention that I was using cfgadm to see what was connected
 :-p.  If you're really interested, most of my troubleshooting notes have
 been posted to the forum, but unfortunately Sun's software has split it
 into three or four pieces.  Just search for posts talking about the
 AOC-SAT2-MV8 card to find them.
 
 Without fail, cfgadm changes the status from disk to sata-port when I
 unplug a device attached to port 6 or 7, but most of the time unplugging
 disks 0-5 results in no change in cfgadm, until I also attach disk 6 or 7.

That does seem inconsistent, or at least, it's not what I'd expect.

 Often the system hung completely when you pulled one of the disks 0-5,
 and wouldn't respond again until you re-inserted it.
 
 I'm 99.99% sure this is a driver issue for this controller.

Have you logged a bug on it yet?


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-20 Thread Ross Smith

  Without fail, cfgadm changes the status from disk to sata-port when I
  unplug a device attached to port 6 or 7, but most of the time unplugging
  disks 0-5 results in no change in cfgadm, until I also attach disk 6 or 7.
 
 That does seem inconsistent, or at least, it's not what I'd expect.

Yup, was an absolute nightmare to diagnose on top of everything else.  
And it definitely doesn't happen in Windows.  I really want somebody to try snv_94 
on a Thumper to see if you get the same behaviour there, or whether it's unique 
to Supermicro's Marvell card.

  Often the system hung completely when you pulled one of the disks 0-5,
  and wouldn't respond again until you re-inserted it.
  
  I'm 99.99% sure this is a driver issue for this controller.
 
 Have you logged a bug on it yet?

Yup, 6735931.  Added the information about it working in Windows today too.

Ross



Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-20 Thread James C. McPherson
Ross Smith wrote:
  Without fail, cfgadm changes the status from disk to sata-port when I
  unplug a device attached to port 6 or 7, but most of the time unplugging
  disks 0-5 results in no change in cfgadm, until I also attach disk 6 or 7.
 
 That does seem inconsistent, or at least, it's not what I'd expect.
 
 Yup, was an absolute nightmare to diagnose on top of everything else.  
 Definitely doesn't happen in windows too.  I really want somebody to try 
 snv_94 on a Thumper to see if you get the same behaviour there, or 
 whether it's unique to Supermicro's Marvell card.

That's a very good question.

  Often the system hung completely when you pulled one of the disks 0-5,
  and wouldn't respond again until you re-inserted it.
  
  I'm 99.99% sure this is a driver issue for this controller.
 
 Have you logged a bug on it yet?
 
 Yup, 6735931.  Added the information about it working in Windows today too.


Heh... I should have recognised that, I moved it from the
triage queue to driver/sata :-)


James


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-20 Thread Tim
I don't think it's just b94; I recall this behavior for as long as I've
had the card.  I'd also be interested to know if the Sun driver team
has ever even tested with this card.  I realize it's probably not a top
priority, but it sure would be nice to have it working properly.


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-15 Thread Ross
Haven't a clue, but I've just gotten around to installing Windows on this box 
to test and I can confirm that hot plug works just fine in Windows.

Drives appear and disappear in Device Manager the second I unplug the 
hardware.  Any drive, either controller.  So far I've done a couple of dozen 
removals, pulling individual drives, or as many as half a dozen at once.  I've 
even gone as far as to immediately pull a drive I only just connected.  Windows 
has no problems at all.

Unfortunately for me, Windows doesn't support ZFS...  right now it's looking a 
whole load more stable.

Ross




Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-15 Thread Tim
You could always try FreeBSD :)

--Tim



Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-15 Thread Ross Smith

Oh god no, I'm already learning three new operating systems; now is not a good 
time to add a fourth.

Ross
--
Windows admin now working with Ubuntu, OpenSolaris and ESX





Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-15 Thread Florin Iucha
On Fri, Aug 15, 2008 at 10:07:31AM -0500, Tim wrote:
 You could always try FreeBSD :)

  Unfortunately for me, Windows doesn't support ZFS...  right now it's
  looking a whole load more stable.

Nope: FreeBSD doesn't have proper power management either.

florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-14 Thread Ross
This is the problem when you try to write up a good summary of what you found.  
I've got pages and pages of notes of all the tests I did here, far more than I 
could include in that PDF.

What makes me think it's the driver is that I've done much of what you suggested.  
I've replicated the exact same behaviour on two different cards, individually 
and with both cards attached to the server.  It's also consistent across many 
different brands and types of drive, and occurs even if I have just 4 drives 
connected out of 8 on a single controller.

I did wonder whether it could be hardware related, so I tested plugging and 
unplugging drives while the computer was booting.  While doing that and 
hot-plugging drives in the BIOS, at no point did I see any hanging of the 
system, which tends to confirm my thought that it's driver related.

I was also able to power on the system with all drives connected, wait for the 
controllers to finish scanning the drives, then remove a few at the GRUB boot 
screen.  From there when I continue to boot Solaris, the correct state is 
detected every time for all drives.

Based on that, it appears that it's purely a problem with detection of the 
insertion / removal event after Solaris has loaded its drivers.  Initial 
detection is fine, it's purely hot swap detection on ports 0-5 that fails.  I 
know it sounds weird, but trust me I checked this pretty carefully, and 
experience has taught me never to assume computers won't behave in odd ways.
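
The per-port check used earlier in this thread (watching whether `cfgadm` flips a port's Type from `disk` to `sata-port` after a pull) can be automated by diffing two snapshots of `cfgadm -al` output. This is a hypothetical Python sketch: the sample output is invented to mirror the symptom described (port 6 updates, port 0 does not), and the column layout (Ap_Id, Type, Receptacle, Occupant, Condition) is assumed from typical `cfgadm -al` listings, so check it against your own system.

```python
from typing import Dict, Optional, Tuple


def port_types(cfgadm_output: str) -> Dict[str, str]:
    """Map each SATA attachment point to its Type column (e.g. disk, sata-port)."""
    ports: Dict[str, str] = {}
    for line in cfgadm_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2 or not fields[0].startswith("sata"):
            continue
        # "sata1/0::dsk/c2t0d0" -> "sata1/0", so the key stays stable
        # whether or not a disk is attached.
        ports[fields[0].split("::")[0]] = fields[1]
    return ports


def changed_ports(before: str, after: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {port: (old_type, new_type)} for every port whose Type changed."""
    old, new = port_types(before), port_types(after)
    return {
        port: (old.get(port), new.get(port))
        for port in sorted(old.keys() | new.keys())
        if old.get(port) != new.get(port)
    }


# Invented sample: the disks on ports 0 and 6 have both been physically pulled.
BEFORE = """Ap_Id                Type       Receptacle  Occupant      Condition
sata1/0::dsk/c2t0d0  disk       connected   configured    ok
sata1/6::dsk/c2t6d0  disk       connected   configured    ok
"""
AFTER = """Ap_Id                Type       Receptacle  Occupant      Condition
sata1/0::dsk/c2t0d0  disk       connected   configured    ok
sata1/6              sata-port  empty       unconfigured  ok
"""

print(changed_ports(BEFORE, AFTER))  # {'sata1/6': ('disk', 'sata-port')}
```

With this symptom, a pull on ports 0-5 simply never appears in the diff, which is what points at hot-plug event detection in the driver rather than at the disks themselves.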

I do appreciate my diagnosis may be wrong as I have very limited knowledge of 
Solaris' internals, but that is my best guess right now.

Ross
 
 


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-14 Thread Tim
I don't have any extra cards lying around and can't really take my server
down, so my immediate question would be:
Is there any sort of PCI bridge chip on the card?  I know in my experience
I've seen all sorts of headaches with less than stellar bridge chips.
Specifically some of the IBM bridge chips.

Food for thought.

--Tim







Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-11 Thread Ross
Ok, I've now reported most of the problems I found, but have additional 
information to add to bugs 6667199 and 667208.  Can anybody tell me how I go 
about reporting that to Sun?

thanks,

Ross
 
 