[zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
I have two identical Supermicro boxes with 32GB ram. Hardware details at
the end of the message.

They were running OI 151.a.5 for months. The zpool configuration was one
storage zpool with 3 vdevs of 8 disks in RAIDZ2.

The OI installation is absolutely clean. Just next-next-next until done.
All I do is configure the network after install. I don't install or enable
any other services.

Then I added more disks and rebuilt the systems with OI 151.a.7, and this
time configured the zpool with 6 vdevs of 5 disks in RAIDZ.

The systems started crashing really badly. They just disappear from the
network: black, unresponsive console, no error lights, but no activity
indication either. The only way out is to power cycle the system.

There is no pattern to the crashes. It may crash in 2 days or it may crash in
2 hours.

I upgraded the memory on both systems to 128GB, to no avail. This is the max
memory they can take.

In summary, all I did was upgrade to OI 151.a.7 and reconfigure the zpool.

Any idea what could be the problem?

Thank you

-- Peter

Supermicro X9DRH-iF
Xeon E5-2620 @ 2.0 GHz 6-Core
LSI SAS9211-8i HBA
32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Michael Schuster
Peter,

sorry if this is so obvious that you didn't mention it: Have you checked
/var/adm/messages and other diagnostic tool output?

regards
Michael

-- 
Michael Schuster
http://recursiveramblings.wordpress.com/


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Will Murnane
Does the Supermicro IPMI show anything when it crashes?  Does anything show
up in event logs in the BIOS, or in system logs under OI?




Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
I'm sorry, I should have mentioned that I can't find any errors in the
logs. The last entry in /var/adm/messages is from when I removed the keyboard
after the last reboot, and then it shows the new boot-up messages from when I
boot the system after the crash. The BIOS log is empty. I'm not sure how
to check the IPMI, but IPMI is not configured and I'm not using it.

Just another observation - the crashes are more intense the more data the
system serves (NFS).

I'm looking into firmware upgrades for the LSI now.




Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Michael Schuster
How about crash dumps?

michael

-- 
Michael Schuster
http://recursiveramblings.wordpress.com/


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
I'm going to need some help with the crash dumps. I'm not very familiar
with Solaris.

Do I have to enable something to get the crash dumps? Where should I look
for them?

Thanks for the help.




Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Jim Klimov

On 2013-03-20 17:15, Peter Wood wrote:

I'm going to need some help with the crash dumps. I'm not very familiar
with Solaris.

Do I have to enable something to get the crash dumps? Where should I
look for them?


Typically, kernel crash dumps are created as a result of a kernel
panic; they may also be forced by administrative actions like an NMI.
They require you to configure a dump volume of sufficient size (see
dumpadm) and a /var/crash directory, which may be a dataset on a large
enough pool - after the reboot, the dump data will be migrated there.
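
A minimal sketch of that setup; the zvol name and sizes here are
illustrative assumptions, not values from the thread:

```shell
# Show the current dump configuration (dump device, savecore directory).
dumpadm

# Dedicate a zvol as the dump device, sized to hold kernel memory.
zfs create -V 64G rpool/dump2
dumpadm -d /dev/zvol/dsk/rpool/dump2

# Have savecore extract dumps into /var/crash/<hostname> after reboot.
dumpadm -s /var/crash/`uname -n`

# After a panic and reboot, retrieve a pending dump manually if needed.
savecore -v
```

The resulting unix.N/vmcore.N files can then be inspected with mdb -k.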

To help with the hangs you can try the BIOS watchdog (which would
require a bmc driver; the one known from OpenSolaris is alas
not open-sourced and not redistributable), or a software deadman
timer:

http://www.cuddletech.com/blog/pivot/entry.php?id=1044

http://wiki.illumos.org/display/illumos/System+Hangs
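
For the deadman route, the gist of those two links is a couple of
/etc/system tunables; a sketch, with values that are assumptions rather
than anything from the thread:

```
* /etc/system fragment: arm the software deadman so a hard hang
* turns into a panic (and thus a crash dump) instead of a silent freeze.
set snooping=1
* interval in microseconds before the deadman fires (50 seconds here)
set snoop_interval=50000000
```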

Also, if you configure crash dump on NMI and set up your IPMI card,
then you can likely gain remote access to both the server console
(physical and/or serial) and may be able to trigger the NMI, too.

HTH,
//Jim





Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
Hi Jim,

Thanks for the pointers. I'll definitely look into this.


--
Peter Blajev
IT Manager, TAAZ Inc.
Office: 858-597-0512 x125



Re: [zfs-discuss] [BULK] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
No problem, Trey. Anything will help.

Yes, I did a clean install, overwriting the old OS.



  Just to make sure, you actually did an overwrite reinstall with OI151a7
 rather than upgrading the existing OS images?   If you did a pkg
 image-update, you should be able to boot back into the oi151a5 image from
 grub.  Apologies in advance if I'm stating the obvious.

  -- Trey




Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Jens Elkner
On Wed, Mar 20, 2013 at 08:50:40AM -0700, Peter Wood wrote:
I'm sorry. I should have mentioned it that I can't find any errors in the
logs. The last entry in /var/adm/messages is that I removed the keyboard
after the last reboot and then it shows the new boot up messages when I
boot up the system after the crash. The BIOS log is empty. I'm not sure
how to check the IPMI but IPMI is not configured and I'm not using it.

You definitely should! Plug a cable into the dedicated network port
and configure it (the easiest way for you is probably to jump into the BIOS
and assign the appropriate IP address etc.). Then, for a quick look,
point your browser at the given IP on port 80 (default login is
ADMIN/ADMIN). You may also now configure some other details
(accounts/passwords/roles).

To track the problem, either write a script which polls the parameters
in question periodically, or just install the latest IPMIView and use
it to monitor your sensors ad hoc:
see ftp://ftp.supermicro.com/utility/IPMIView/

Just another observation - the crashes are more intense the more data the
system serves (NFS).
I'm looking into FRMW upgrades for the LSI now.

Latest LSI FW should be P15; for this MB, type 217 (2.17), MB-BIOS C28 (1.0b).
However, I doubt that your problem has anything to do with the
SAS-ctrl, OI, or ZFS.

My guess is that either your MB is broken (we had an X9DRH-iF which
instantly disappeared as soon as it got some real load) or you have
a heat problem (watch your CPU temp, e.g. via ipmiviewer). With 2GHz
that's not very likely, but worth a try (socket placement on this board
is not really smart IMHO).

To test quickly:
- disable all additional, unneeded services in OI which may put some
  load on the machine (like the NFS service, http and so on) and perhaps
  even export unneeded pools (just to be sure)
- fire up your ipmiviewer and look at the sensors (set update to
  10s) or refresh manually often
- start 'openssl speed -multi 32' and keep watching your CPU temp
  sensors (with 2GHz I guess it takes ~ 12min)

I guess your machine will disappear before the CPUs get really hot
(broken MB). If the CPUs switch off (usually first CPU2 and a little bit
later CPU1), you have a cooling problem. If nothing happens, well, then
it could be an OI or ZFS problem ;-)
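
The quick test above could be scripted roughly as follows; the BMC
address, credentials, and sensor query are placeholder assumptions, not
values from the thread:

```shell
#!/bin/sh
# Log temperature sensors every 10s while all cores are loaded, so the
# last readings before a hang survive (ideally write to another machine).
LOG=/var/tmp/temps.log

# Put sustained load on all cores in the background.
openssl speed -multi 32 >/dev/null 2>&1 &
BURN=$!

while kill -0 "$BURN" 2>/dev/null; do
    date >> "$LOG"
    # Query the BMC over the network; host/user/password are placeholders.
    ipmitool -I lanplus -H 10.20.1.200 -U ADMIN -P ADMIN \
        sdr type Temperature >> "$LOG"
    sleep 10
done
```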

Have fun,
jel.
-- 
Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany Tel: +49 391 67 52768


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
Great write up Jens.

The chance of two MBs being broken is probably low, but overheating is a very
good point. It was on my to-do list to set up IPMI, and it seems that now is
the best time to do it.

Thanks



[zfs-discuss] This mailing list EOL???

2013-03-20 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
I can't seem to find any factual indication that opensolaris.org mailing lists 
are going away, and I can't even find the reference to whoever said it was EOL 
in a few weeks ... a few weeks ago.

So ... are these mailing lists going bye-bye?


Re: [zfs-discuss] This mailing list EOL???

2013-03-20 Thread Cindy Swearingen

Hi Ned,

This list is migrating to java.net and will not be available
in its current form after March 24, 2013.

The archive of this list is available here:

http://www.mail-archive.com/zfs-discuss@opensolaris.org/

I will provide an invitation to the new list shortly.

Thanks for your patience.

Cindy



Re: [zfs-discuss] This mailing list EOL???

2013-03-20 Thread Deirdre Straughan
Will the archives of all the lists be preserved? I don't think we've seen a
clear answer on that (it's possible you haven't, either!).



-- 


best regards,
Deirdré Straughan
Community Architect, SmartOS
illumos Community Manager


cell 720 371 4107


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
I can reproduce the problem. I can crash the system.

Here are the steps I did (some steps may not be needed, but I haven't tested
which):

- Clean install of OI 151.a.7 on Supermicro hardware described above (32GB
RAM though, not the 128GB)

- Create 1 zpool, 6 raidz vdevs with 5 drives each

- NFS export a dataset
  zfs set sharenfs=rw=@10.20.1/24 vol01/htmlspace

- Create zfs child dataset
  zfs create vol01/htmlspace/A

  $ zfs get -H sharenfs vol01/htmlspace/A
  vol01/htmlspace/A   sharenfsrw=@10.20.1/24  inherited from
vol01/htmlspace

- Stop NFS sharing for the child dataset

  zfs set sharenfs=off vol01/htmlspace/A

The crash is instant after the sharenfs=off command.
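
For reference, the sequence above as one script. The zpool create line is a
sketch: the device names are placeholders (take the real ones from format),
and four of the six raidz vdevs are elided.

```shell
# 6 raidz vdevs of 5 drives each (c0t0d0... are placeholder device names).
zpool create vol01 \
    raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 \
    raidz c0t5d0 c0t6d0 c0t7d0 c0t8d0 c0t9d0
    # ... four more raidz vdevs in the real pool

# NFS-export a dataset, then create a child that inherits the share.
zfs create vol01/htmlspace
zfs set sharenfs='rw=@10.20.1/24' vol01/htmlspace
zfs create vol01/htmlspace/A
zfs get -H sharenfs vol01/htmlspace/A   # shows: inherited from vol01/htmlspace

# The command that triggers the instant hang:
zfs set sharenfs=off vol01/htmlspace/A
```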

I thought it was a coincidence, so after reboot I tried it on another dataset.
Instant crash again. I get my prompt back but that's it. The system is gone
after that.

The NFS-exported file systems are not accessed by any system on the
network. They are not in use. That's why I wanted to stop exporting them.
And even if they were in use, this shouldn't crash the system, right?

I can't try the other box because it is heavily in production. At least not
until later tonight.

I thought I'd collect some advice to make each crash as useful as possible.

Any pointers are appreciated.

Thanks,

-- Peter




[zfs-discuss] Please join us on the new zfs discuss list on java.net

2013-03-20 Thread Cindy Swearingen

Hi Everyone,

The ZFS discussion list is moving to java.net.

This opensolaris/zfs discussion will not be available after March 24.
There is no way to migrate the existing list to the new list.

The solaris-zfs project is here:

http://java.net/projects/solaris-zfs

See the steps below to join the ZFS project or just the discussion list,
but you must create an account on java.net to join the list.

Thanks, Cindy

1. Create an account on java.net.

https://java.net/people/new

2. When logged in to your java.net account, join the solaris-zfs
project as an Observer by clicking the Join This Project link on the
left side of this page:

http://java.net/projects/solaris-zfs

3. Subscribe to the zfs discussion mailing list here:

http://java.net/projects/solaris-zfs/lists