Re: [OpenIndiana-discuss] NFS exported dataset crashes the system

2013-04-25 Thread Peter Wood
 If we create local users in /etc/passwd and /etc/group, can you please
 tell us how to refresh the NFSv4 server so that it updates the user mapping
 table in OpenIndiana? How do you handle this? If we restart the NFS service
 in OpenIndiana using /etc/init.d/nfs restart, will NFSv4 clients reconnect
 or will they enter an unstable state?


If you create the user with the same UID on your Debian boxes and on the OI
server there should be no need to do anything else. The mapping is handled
by idmapd (Linux) and svc:/network/nfs/mapid (OpenIndiana). Just make sure
they are configured to use the same mapid domain.
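
A quick way to compare the domain on both sides is sketched below. It is only
an illustration: it assumes the Debian client keeps its domain in
/etc/idmapd.conf as a "Domain = ..." line, and that the OI server reports the
value through "sharectl get -p nfsmapid_domain nfs" in name=value form. The
helper names idmapd_domain and oi_mapid_domain are made up for this example.

```shell
#!/bin/sh
# Print the Domain value from a Linux idmapd.conf-style file.
# Matches a line such as "Domain = example.com"; commented-out lines are ignored.
idmapd_domain() {
    sed -n 's/^[[:space:]]*Domain[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" | head -n 1
}

# Print the mapid domain on the OpenIndiana side, if sharectl is present.
# Assumption: the output has the form "nfsmapid_domain=<value>".
oi_mapid_domain() {
    if command -v sharectl >/dev/null 2>&1; then
        sharectl get -p nfsmapid_domain nfs | sed 's/^[^=]*=//'
    fi
}

# On a Debian client:   idmapd_domain /etc/idmapd.conf
# On the OI server:     oi_mapid_domain
```

If the two values differ, NFSv4 clients typically show files as owned by
"nobody" even when the numeric UIDs match on both machines.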
___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] NFS exported dataset crashes the system

2013-04-24 Thread Peter Wood
The first thing I'd do is go into the BIOS and disable CPU C-states and
all power-saving features. If that doesn't help, then try NFSv4.

The reason I disable CPU C-states is previous experience with
OpenSolaris on Dell boxes about two years ago: it would crash the system in
a similar fashion. There are multiple reports on the Internet about this, and
that fix definitely worked for us. To be on the safe side, I do the same
on the Supermicro boxes.

We switched to NFSv4 about two days ago and so far there has been no crash.
I'll be more confident that this is the fix for us after running for at
least 5 days with no crash.

I wish I had the resources to do more tests. Unfortunately, all I can tell
right now is that crashes are happening on Supermicro hardware but not on
Dell, and the trigger is exporting one particular directory via NFSv3. I
don't think it is the high IOPS. More likely it is related to the way the
directory is used. We move files and directories around and re-point
symlinks while everything is being accessed from the clients, and we do
this every 15 minutes.
Something like: mv nfsdir/targetdir nfsdir/targetdir.old; mv
nfsdir/targetdir.new nfsdir/targetdir.

To me it looks more like a locking issue than a high-IOPS issue.
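
The rotation described in this message can be sketched as a small script. This
is a minimal illustration run in a scratch directory, not the actual job: the
function name rotate is hypothetical, and removing the previous targetdir.old
first is an assumption added so that the sequence can be repeated.

```shell
#!/bin/sh
# Sketch of the 15-minute rotation, using the directory names from the message.
rotate() {
    nfsdir=$1
    rm -rf "$nfsdir/targetdir.old"                    # assumption: drop the previous generation
    mv "$nfsdir/targetdir" "$nfsdir/targetdir.old"    # retire the live directory
    mv "$nfsdir/targetdir.new" "$nfsdir/targetdir"    # promote the new one
}

# Demonstration in a temporary directory.
demo=$(mktemp -d)
mkdir "$demo/targetdir" "$demo/targetdir.new"
echo old > "$demo/targetdir/version"
echo new > "$demo/targetdir.new/version"
rotate "$demo"
cat "$demo/targetdir/version"    # prints: new
rm -rf "$demo"
```

Between the two mv calls there is a short window in which targetdir does not
exist, so a client that has files open under the old path can see errors or
stale handles during the swap.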


On Tue, Apr 23, 2013 at 11:26 PM, Alberto Picón Couselo alpic...@gmail.com wrote:

 Hi.

 We have almost the same hardware as yours and we had the same issue. We
 have exported a ZFS pool to three Xen VMs running Debian 6.0, mounted using
 NFSv3. When one of these boxes launches a high-I/O PHP process to create a
 backup, it creates, copies and deletes a large number of files. The box just
 crashed the same way as yours during the deletion process: no ping, no
 log, no response at all. We had to do a cold restart by unplugging the
 system power cords...

 We have changed to NFSv4 hoping to fix this issue. Could you please comment
 on your results regarding this issue?

 Any help would be kindly appreciated.

 Best Regards,




Re: [OpenIndiana-discuss] NFS exported dataset crashes the system

2013-04-24 Thread Alberto Picón Couselo
I can confirm that we have disabled all power-saving features on the
boxes. However, I can't guarantee that CPU C-states are completely disabled.


Anyway, we have changed to NFSv4 to test the system's stability. The PHP
process reads a folder with a huge number of hashed files and folders
and creates a tarball, deleting the copy afterwards. As you say, we
think it could be due to some kind of locking or high-I/O issue related to NFSv3.


If we create local users in /etc/passwd and /etc/group, can you please
tell us how to refresh the NFSv4 server so that it updates the user mapping
table in OpenIndiana? How do you handle this? If we restart the NFS service
in OpenIndiana using /etc/init.d/nfs restart, will NFSv4 clients
reconnect or will they enter an unstable state?


Thank you very much in advance,


Re: [OpenIndiana-discuss] NFS exported dataset crashes the system

2013-04-11 Thread Paul van der Zwan

On 11 Apr 2013, at 0:29 , Peter Wood peterwood...@gmail.com wrote:

 Makes sense. I haven't tried that.
 
 If I'm correct, ZFS on OI supports NFSv2, v3 and v4.
 
 By switching to NFSv4, you mean that on your client machine (the FreeBSD VM)
 you set up the NFS client to use the NFSv4 protocol. Do I understand this
 correctly? Or did you do something on the OI server to accept only NFSv4
 connections?
 
 Could you please give more information?

I haven't changed the server, only the mount options on the client.
Its /etc/fstab now has:
192.168.178.24:/data/ports  /usr/ports  nfs  rw,nfsv4  -  -
192.168.178.24:/data/src    /usr/src    nfs  rw,nfsv4  -  -
192.168.178.24:/data/obj    /usr/obj    nfs  rw,nfsv4  -  -

A make buildworld does seem to take quite a bit longer than when I was using
NFSv3, so it might just be a case of a lighter load. I have no hard data, but
it feels like it takes twice as long.

Paul




Re: [OpenIndiana-discuss] NFS exported dataset crashes the system

2013-04-11 Thread Paul van der Zwan

On 10 Apr 2013, at 22:03 , Ian Collins i...@ianshome.com wrote:

 Had you tried decoupling the VM host from the NFS storage?
 

I haven't tried it yet, but I will run a VM on a remote system and see what
happens when I run a make buildworld over both NFSv3 and NFSv4.


Paul




Re: [OpenIndiana-discuss] NFS exported dataset crashes the system

2013-04-11 Thread Paul van der Zwan

On 10 Apr 2013, at 22:03 , Ian Collins i...@ianshome.com wrote:

 Had you tried decoupling the VM host from the NFS storage?
 

I have just copied one of the VMs to my iMac and ran a make buildworld
from it.
I had /usr/src and /usr/obj mounted over NFSv3, and it completely locked up
the server during the make cleandir phase, that is, while it was deleting a
lot of files.

I had to power off and restart the server.
I will try an NFSv4-mounted attempt next.
 
Paul




Re: [OpenIndiana-discuss] NFS exported dataset crashes the system

2013-04-10 Thread Paul van der Zwan

On 9 Apr 2013, at 3:13 , Peter Wood peterwood...@gmail.com wrote:

 I've asked the ZFS discussion list for help on this but now I have more
 information and it looks like a bug in the drivers or something.
 
 I have a number of Dell PE R710 and PE 2950 servers running OpenSolaris, OI
 151a and OI 151a.7. All these systems are used as storage servers, clean OS
 install, no extra services running. The systems are NFS exporting a lot of
 ZFS datasets that are mounted on about ten CentOS-5.9 systems.
 
 The above setup has been working for 2+ years with no problem.
 
 Recently we bought two Supermicro systems:
  Supermicro X9DRH-iF
  Xeon E5-2620 @ 2.0 GHz 6-Core
  128GB RAM
  LSI SAS9211-8i HBA
  32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K
 
 I installed OI 151a.7 on them and started migrating data from the old Dell
 servers (zfs send/receive).
 
 Things have been working great for about two months until I migrated one
 particular directory to one of the new Supermicro systems and after about
 two days the system crashed. No network connectivity, black console, no
 response to keyboard keys, no activity lights (no error lights either) on
 the chassis. The only way out is to hit the reset button. Nothing in the
 logs as far as I can tell. Log entries just stop when the system crashes.
 
 In the following two months I did a lot of testing and made a lot of trips to
 the colo in the middle of the night, and the observation is that, regardless
 of the OS, everything works on the Dell servers. As soon as I move that
 directory to any of the Supermicro servers with OI 151a.7, it crashes
 them within anywhere from 2 hours to 5 days.
 
 The Supermicro servers can be idle, exporting nothing, or can be exporting
 15+ other directories with high IOPS and work for months with no
 problems, but as soon as I have them export that directory they crash within
 5 days at most.
 
 There is only one difference between that directory and all the other exported
 directories. One of the client systems that mounts it and writes to it is
 an old Debian 5.0 system. I have no idea why that would crash a Supermicro
 system but not a Dell system.
 
 We worked directly with LSI developers and upgraded the firmware to some
 unpublished, prerelease development version to no avail. We disabled all
 power saving features and CPU C states in the BIOS and nothing changed.
 
 Any idea?

I had a similar kind of problem where a VirtualBox FreeBSD 9.1 VM could hang
the server.
It had /usr/src and /usr/obj NFS-mounted from the OI a7 box it was running on.
They are separate NFS-shared datasets in one of my 3 pools.

When I ran a make buildworld in that VM, it consistently locked up the OI host:
no console access and no network access (not even ping).
As a test I switched from NFSv3 to NFSv4, and I have not seen a hang since.
So it looked like a heavy NFSv3 load was the issue.

Paul




Re: [OpenIndiana-discuss] NFS exported dataset crashes the system

2013-04-10 Thread Marcel Telka
Please try to get a crash dump when the system is in the hung state.
I'm interested in analyzing it.


Thanks.

-- 
+-------------------------------------------+
| Marcel Telka   e-mail:   mar...@telka.sk  |
|                homepage: http://telka.sk/ |
|                jabber:   mar...@jabber.sk |
+-------------------------------------------+



Re: [OpenIndiana-discuss] NFS exported dataset crashes the system

2013-04-10 Thread Paul van der Zwan

On 10 Apr 2013, at 16:46 , Marcel Telka mar...@telka.sk wrote:

 Please try to get a crash dump when the system is in the hung state.
 I'm interested in analyzing it.
 
 

When it hung the system would not respond to anything at all.
The only way out I could find was a hard reset or power cycle.

I do have the following in /etc/system:
set snooping=1
set pcplusmp:apic_panic_on_nmi=1
But that did not make a difference.

BTW, the hang was/is reproducible: every time I ran a make buildworld inside
the VM, it would hang.
I have run a few make buildworlds now that I use NFSv4, with no hangs so far.
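
For context on the /etc/system settings above: snooping=1 enables the kernel
deadman timer, and apic_panic_on_nmi=1 makes the kernel panic when it receives
an NMI, which should leave a crash dump if a dump device is configured. A
hedged sketch of how that might be used on hardware with a BMC follows. It
assumes IPMI access is available; BMC_HOST and BMC_USER are hypothetical
placeholders, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# On the OI server, before the hang: show the dump configuration and
# make sure savecore runs automatically on the next boot.
prepare_dump() {
    run dumpadm
    run dumpadm -y
}

# From another machine, once the server is hung: ask the BMC to raise an NMI.
# The host and user defaults below are placeholders, not real values.
send_nmi() {
    run ipmitool -I lanplus -H "${BMC_HOST:-bmc.example.com}" \
        -U "${BMC_USER:-admin}" chassis power diag
}

prepare_dump
send_nmi
```

Whether the NMI actually reaches a kernel that is this far gone is not
something the thread establishes, so this is a thing to try rather than a
known fix.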

Regards, 

Paul




Re: [OpenIndiana-discuss] NFS exported dataset crashes the system

2013-04-10 Thread Ian Collins

Paul van der Zwan wrote:


When it hung the system would not respond to anything at all.
The only way out I could find was a hard reset or power cycle.

I do have the following in /etc/system:
set snooping=1
set pcplusmp:apic_panic_on_nmi=1
But that did not make a difference.

BTW, the hang was/is reproducible: every time I ran a make buildworld inside
the VM, it would hang.
I have run a few make buildworlds now that I use NFSv4, with no hangs so far.


Had you tried decoupling the VM host from the NFS storage?

--
Ian.




Re: [OpenIndiana-discuss] NFS exported dataset crashes the system

2013-04-10 Thread Peter Wood
On Wed, Apr 10, 2013 at 7:35 AM, Paul van der Zwan pa...@vanderzwan.org wrote:


 I had a similar kind of problem where a VirtualBox FreeBSD 9.1 VM could
 hang the server.
 It had /usr/src and /usr/obj NFS-mounted from the OI a7 box it was running
 on.
 They are separate NFS-shared datasets in one of my 3 pools.

 When I ran a make buildworld in that VM, it consistently locked up the OI
 host: no console access and no network access (not even ping).
 As a test I switched from NFSv3 to NFSv4, and I have not seen a hang
 since.
 So it looked like a heavy NFSv3 load was the issue.

 Paul


Makes sense. I haven't tried that.

If I'm correct, ZFS on OI supports NFSv2, v3 and v4.

By switching to NFSv4, you mean that on your client machine (the FreeBSD VM)
you set up the NFS client to use the NFSv4 protocol. Do I understand this
correctly? Or did you do something on the OI server to accept only NFSv4
connections?

Could you please give more information?

Thanks,

-- Peter


Re: [OpenIndiana-discuss] NFS exported dataset crashes the system

2013-04-09 Thread Ram Chander
There could be corruption in that directory. Can you run a scrub on the pool?

zpool scrub pool
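
A scrub runs in the background, so its result has to be read back afterwards
with zpool status. A minimal sketch, assuming "pool" above stands for the real
pool name; the function name scrub_pool and the pool name tank are
placeholders for illustration.

```shell
#!/bin/sh
# Start a scrub on the given pool, then print its status.
# "zpool status -v" lists any files with unrecoverable errors.
scrub_pool() {
    pool=$1
    if ! command -v zpool >/dev/null 2>&1; then
        echo "zpool not found; run this on the storage server" >&2
        return 1
    fi
    zpool scrub "$pool" && zpool status -v "$pool"
}

# Example (tank is a placeholder pool name):
#   scrub_pool tank
```

The status output right after starting will only show the scrub in progress,
so the zpool status -v step needs repeating once the scrub has finished.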


On Tue, Apr 9, 2013 at 6:43 AM, Peter Wood peterwood...@gmail.com wrote:

 I've asked the ZFS discussion list for help on this but now I have more
 information and it looks like a bug in the drivers or something.

 I have a number of Dell PE R710 and PE 2950 servers running OpenSolaris, OI
 151a and OI 151a.7. All these systems are used as storage servers, clean OS
 install, no extra services running. The systems are NFS exporting a lot of
 ZFS datasets that are mounted on about ten CentOS-5.9 systems.

 The above setup has been working for 2+ years with no problem.

 Recently we bought two Supermicro systems:
   Supermicro X9DRH-iF
   Xeon E5-2620 @ 2.0 GHz 6-Core
   128GB RAM
   LSI SAS9211-8i HBA
   32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K

 I installed OI 151a.7 on them and started migrating data from the old Dell
 servers (zfs send/receive).

 Things have been working great for about two months until I migrated one
 particular directory to one of the new Supermicro systems and after about
 two days the system crashed. No network connectivity, black console, no
 response to keyboard keys, no activity lights (no error lights either) on
 the chassis. The only way out is to hit the reset button. Nothing in the
 logs as far as I can tell. Log entries just stop when the system crashes.

 In the following two months I did a lot of testing and a lot of trips to
 the colo in the middle of the night and the observation is that regardless
 of the OS everything works on the Dell servers. As soon as I move that
 directory to any of the Supermicro servers with OI 151a.7 it will crash
 them anywhere from 2 hours to 5 days later.

 The Supermicro servers can be idle, exporting nothing, or can be exporting
 15+ other directories with high IOPS and working for months with no
 problems, but as soon as I have them export that directory they'll crash in
 5 days at the most.

 There is only one difference between that directory and all other exported
 directories. One of the client systems that mounts it and writes to it is
 an old Debian 5.0 system. No idea why that would crash a Supermicro system
 but not a Dell system.

 We worked directly with LSI developers and upgraded the firmware to some
 unpublished, prerelease development version to no avail. We disabled all
 power saving features and CPU C states in the BIOS and nothing changed.

 Any idea?

 Thanks a lot.

 -- Peter


[OpenIndiana-discuss] NFS exported dataset crashes the system

2013-04-08 Thread Peter Wood
I've asked the ZFS discussion list for help on this but now I have more
information and it looks like a bug in the drivers or something.

I have a number of Dell PE R710 and PE 2950 servers running OpenSolaris, OI
151a and OI 151a.7. All these systems are used as storage servers, clean OS
install, no extra services running. The systems are NFS exporting a lot of
ZFS datasets that are mounted on about ten CentOS-5.9 systems.

The above setup has been working for 2+ years with no problem.

Recently we bought two Supermicro systems:
  Supermicro X9DRH-iF
  Xeon E5-2620 @ 2.0 GHz 6-Core
  128GB RAM
  LSI SAS9211-8i HBA
  32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K

I installed OI 151a.7 on them and started migrating data from the old Dell
servers (zfs send/receive).

Things have been working great for about two months until I migrated one
particular directory to one of the new Supermicro systems and after about
two days the system crashed. No network connectivity, black console, no
response to keyboard keys, no activity lights (no error lights either) on
the chassis. The only way out is to hit the reset button. Nothing in the
logs as far as I can tell. Log entries just stop when the system crashes.

In the following two months I did a lot of testing and a lot of trips to
the colo in the middle of the night and the observation is that regardless
of the OS everything works on the Dell servers. As soon as I move that
directory to any of the Supermicro servers with OI 151a.7 it will crash
them anywhere from 2 hours to 5 days later.

The Supermicro servers can be idle, exporting nothing, or can be exporting
15+ other directories with high IOPS and working for months with no
problems, but as soon as I have them export that directory they'll crash in
5 days at the most.

There is only one difference between that directory and all other exported
directories. One of the client systems that mounts it and writes to it is
an old Debian 5.0 system. No idea why that would crash a Supermicro system
but not a Dell system.

We worked directly with LSI developers and upgraded the firmware to some
unpublished, prerelease development version to no avail. We disabled all
power saving features and CPU C states in the BIOS and nothing changed.

Any idea?

Thanks a lot.

-- Peter