Re: [zfs-discuss] Solaris Based Systems Lock Up - Possibly ZFS/memory related?

2011-11-08 Thread Lachlan Mulcahy
Hi All,


On Wed, Nov 2, 2011 at 5:24 PM, Lachlan Mulcahy
lmulc...@marinsoftware.comwrote:

 Now trying another suggestion sent to me by a direct poster:

 *   Recommendation from Sun (Oracle) to work around a bug:
 *   6958068 - Nehalem deeper C-states cause erratic scheduling
 behavior
 set idle_cpu_prefer_mwait = 0
 set idle_cpu_no_deep_c = 1

 Was apparently the cause of a similar symptom for them and we are using
 Nehalem.

 At this point I'm running out of options, so it can't hurt to try it.


 So far the system has been running without any lock ups since very late
 Monday evening -- we're now almost 48 hours on.

 So far so good, but it's hard to be certain this is the solution, since I
 could never prove it was the root cause.

 For now I'm just continuing to test and build confidence level. More time
 will make me more confident. Maybe a week or so


We're now over a week running with C-states disabled and have not
experienced any further system lock ups. I am feeling much more confident
in this system now -- it will probably see at least another week or two in
addition to more load/QA testing and then be pushed into production.

Will update if I see the issue crop up again, but for anyone else
experiencing a similar symptom, I'd highly recommend trying this as a
solution.

So far it seems to have worked for us.
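
For anyone who wants to try the same workaround, this is roughly all there is
to it (a sketch only -- please sanity-check the tunables against your own
Solaris release first). Add the two lines to /etc/system and reboot:

  * workaround for 6958068 - Nehalem deeper C-states cause erratic scheduling
  set idle_cpu_prefer_mwait = 0
  set idle_cpu_no_deep_c = 1

After the reboot you can confirm the values were picked up with something like:

  echo "idle_cpu_no_deep_c/D" | mdb -k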

Regards,
-- 
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris Based Systems Lock Up - Possibly ZFS/memory related?

2011-11-03 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Lachlan Mulcahy
 
 I have been having issues with Solaris kernel based systems locking up
and
 am wondering if anyone else has observed a similar symptom before.
 
 ...

 Dell R710 / 80G Memory with two daisy chained MD1220 disk arrays - 22 Disks
 each - 600GB 10k RPM SAS Drives
 Storage Controller: LSI, Inc. 1068E (JBOD)

Please see
http://mail.opensolaris.org/pipermail/zfs-discuss/2010-November/046189.html

But I'll need to expand upon this a little more here: when we bought that
system, Solaris was a supported OS on the R710.  We paid for Oracle Gold
support (or whatever they called it) and we dug into it for hours and hours,
weeks and weeks, and never got anywhere.

When I replaced the NIC (don't use the built-in Broadcom NIC) it became much
better.  It went from crashing weekly to crashing monthly.

I don't believe you'll ever be able to make the problem go away completely.
This is the nature of running on unsupported hardware - even if you pay and
get a support contract - they just don't develop or test on that platform
with any quantity, so the end result is crap.

We have since reprovisioned the R710 to other purposes, where it's perfectly
stable.  We have also bought a Sun (Oracle) server to fill the requirements
that were formerly filled by the R710 with Solaris, and it's also perfectly
stable.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris Based Systems Lock Up - Possibly ZFS/memory related?

2011-11-03 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Lachlan Mulcahy
 
 *       Recommendation from Sun (Oracle) to work around a bug:
 *       6958068 - Nehalem deeper C-states cause erratic scheduling
behavior
 set idle_cpu_prefer_mwait = 0
 set idle_cpu_no_deep_c = 1
 Was apparently the cause of a similar symptom for them and we are using
 Nehalem.

FWIW, we also disabled the c-states.  It seemed to make an improvement, but
not what I would call a fix for us.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris Based Systems Lock Up - Possibly ZFS/memory related?

2011-11-03 Thread Lachlan Mulcahy
Hi Edward,

Thanks for your input.


Please see
 http://mail.opensolaris.org/pipermail/zfs-discuss/2010-November/046189.html

 But I'll need to expand upon this a little more here:  When we bought that
 system, solaris was a supported os on the R710.  We paid for oracle gold
 support (or whatever they called it) and we dug into it for hours and
 hours,
 weeks and weeks, never got anywhere.

 When I replaced the NIC (don't use the built-in bcom nic) it became much
 better.  It went from crashing weekly to crashing monthly.


We have had no end of trouble with those junky Broadcom NICs.

When we originally moved to all Dell hardware from a hosted solution, we had
a load of problems with the NICs just dropping packets when the CPU got
busy. The platform for those machines was CentOS 5.5 -- it never caused
instability or server crashes, however.

We generally use Intel NICs for our production interfaces now. Over the
years I've found that Intel simply makes rock-solid NICs.

We still use those Broadcom NICs, but mostly for out of band/maintenance
network access. On the host in question we are using Intel for the main
production interface and a Broadcom device for maintenance.

If I see any more issues, I'll consider disabling the onboard Broadcoms
via the BIOS. So far, with the sleep states disabled, it seems better.
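
As an aside, a quick way to see which driver each port is bound to -- and so
which ports are the Broadcoms -- is dladm; a rough sketch, assuming a
Crossbow-era build:

  # bnx/bge devices are the Broadcom ports, e1000g/igb the Intel ones
  dladm show-phys
  dladm show-link

On older Solaris 10 builds without show-phys, dladm show-dev gives similar
information.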

I don't believe you'll ever be able to make the problem go away completely.
 This is the nature of running on unsupported hardware - even if you pay and
 get a support contract - they just don't develop or test on that platform
 with any quantity, so the end result is crap.

 We have since reprovisioned the R710 to other purposes, where it's
 perfectly
 stable.  We have also bought Sun (oracle) server to fill the requirements
 that were formerly filled by the R710 with solaris, and it's also perfectly
 stable.


Unfortunately the point of using Solaris/ZFS for us in this particular
instance is to avoid having to buy more hardware.

We have a system that just needs to hold a lot of data, and RAIDZ2 w/ lzjb
gets us about 4-5X the usable space of a regular xfs/ext4 RAID-10 with the
same disks.
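
For the curious, the general shape of the pool is something like the sketch
below -- the pool name and c#t#d# device names are placeholders rather than
our actual layout:

  # raidz2 vdevs built from the JBOD disks, lzjb compression on the pool root
  zpool create tank \
      raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
      raidz2 c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0
  zfs set compression=lzjb tank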

This system is legacy and just needs to live for another few months and
support the data growth, so the idea here is to try to avoid spending money
on something that is going away.

This sort of precludes buying SnOracle (Sun) hardware just to run Solaris on
tried-and-true gear. :-/

Thanks and Regards,
-- 
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris Based Systems Lock Up - Possibly ZFS/memory related?

2011-11-02 Thread Lachlan Mulcahy
Hi All,

No joy.. the system seized up again within a few hours of coming back up.

 Now trying another suggestion sent to me by a direct poster:

 *   Recommendation from Sun (Oracle) to work around a bug:
 *   6958068 - Nehalem deeper C-states cause erratic scheduling behavior
 set idle_cpu_prefer_mwait = 0
 set idle_cpu_no_deep_c = 1

 Was apparently the cause of a similar symptom for them and we are using
 Nehalem.

 At this point I'm running out of options, so it can't hurt to try it.


So far the system has been running without any lock ups since very late
Monday evening -- we're now almost 48 hours on.

So far so good, but it's hard to be certain this is the solution, since I
could never prove it was the root cause.

For now I'm just continuing to test and build up my confidence level. More
time will make me more confident. Maybe a week or so...

Regards,
-- 
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris Based Systems Lock Up - Possibly ZFS/memory related?

2011-10-31 Thread Marion Hakanson
lmulc...@marinsoftware.com said:
 . . .
 The MySQL server is:
 Dell R710 / 80G Memory with two daisy chained MD1220 disk arrays - 22 Disks
 each - 600GB 10k RPM SAS Drives
 Storage Controller: LSI, Inc. 1068E (JBOD)
 
 I have also seen similar symptoms on systems with MD1000 disk arrays
 containing 2TB 7200RPM SATA drives.
 
 The only thing of note that seems to show up in the /var/adm/messages file on
 this MySQL server is:
 
 Oct 31 18:24:51 mslvstdp02r scsi: [ID 243001 kern.warning] WARNING:
   /pci@0,0/pci8086,3410@9/pci1000,3080@0 (mpt0):
 Oct 31 18:24:51 mslvstdp02r mpt request inquiry page 0x89 for SATA
   target:58 failed! Oc
 . . .

Have you got the latest firmware on your LSI 1068E HBAs?  These have been
known to have lockups/timeouts when used with SAS expanders (disk enclosures)
with incompatible firmware revisions, and/or with older mpt drivers.

The MD1220 is a 6Gbit/sec device.  You may be better off with a matching
HBA -- Dell has certainly told us the MD1200-series is not intended for
use with 3Gbit/sec HBAs.  We're doing fine with the LSI SAS 9200-8e,
for example, when connecting to Dell MD1200s with the 2TB nearline SAS
disk drives.

Last, are you sure it's memory-related?  You might keep an eye on arcstat.pl
output and see what the ARC sizes look like just prior to lockup.  Also,
maybe you can look up instructions on how to force a crash dump when the
system hangs -- one of the experts around here could tell a lot from a
crash dump file.
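
Something along these lines usually does the trick on x86 Solaris, though
double-check the tunable names against your release (the NMI knob lives in
the pcplusmp module on older kernels, apix on newer ones):

  * /etc/system: panic -- and therefore dump -- on NMI, so you can trigger a
  * crash dump from the DRAC while the box is hung
  set pcplusmp:apic_panic_on_nmi = 1

  * optional: deadman timer, panics the box if the clock stops advancing
  set snooping = 1

Make sure dumpadm shows a dump device big enough for kernel pages; savecore
will then drop the dump under /var/crash on the next boot.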

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris Based Systems Lock Up - Possibly ZFS/memory related?

2011-10-31 Thread Lachlan Mulcahy
Hi Marion,

Thanks for your swift reply!

Have you got the latest firmware on your LSI 1068E HBA's?  These have been
 known to have lockups/timeouts when used with SAS expanders (disk
 enclosures)
 with incompatible firmware revisions, and/or with older mpt drivers.


I'll need to check that out -- I'm 90% sure that these are fresh-out-of-the-box
HBAs.

Will try a firmware upgrade and see if we get any joy...


 The MD1220 is a 6Gbit/sec device.  You may be better off with a matching
 HBA  -- Dell has certainly told us the MD1200-series is not intended for
 use with the 3Gbit/sec HBA's.  We're doing fine with the LSI SAS 9200-8e,
 for example, when connecting to Dell MD1200's with the 2TB nearline SAS
 disk drives.


I was aware the MD1220 is a 6G device, but I figured that since our IO
throughput doesn't actually come close to saturating 3Gbit/sec, it would
just operate at the lower speed and be OK. I guess it is something to
look at if I run out of other options...


Last, are you sure it's memory-related?  You might keep an eye on 
 arcstat.pl
 output and see what the ARC sizes look like just prior to lockup.  Also,
 maybe you can look up instructions on how to force a crash dump when the
 system hangs -- one of the experts around here could tell a lot from a
 crash dump file.


I'm starting to doubt that it is a memory issue now -- especially since I
now have some results from my latest test...

output of arcstat.pl looked like this just prior to the lock up:

time      arcsz  c    mh%  mhit  hit%  hits  l2hit%  l2hits
19:57:36  24G    24G  94   161   61    194   1       1
19:57:41  24G    24G  96   174   62    213   0       0
19:57:46  23G    24G  94   161   62    192   1       1
19:57:51  24G    24G  96   169   63    205   0       0
19:57:56  24G    24G  95   169   61    206   0       0

^-- This is the very last line printed...

I actually discovered and rebooted the machine via DRAC at around 20:44, so
it had been in its bad state for around 1 hour.

Some snippets from the output some 20 minutes earlier show the point at
which the arcsz grew to reach the maximum:

time      arcsz  c    mh%  mhit  hit%  hits  l2hit%  l2hits
19:36:45  21G    24G  95   152   58    177   0       0
19:37:00  22G    24G  95   156   57    182   0       0
19:37:15  22G    24G  95   159   59    185   0       0
19:37:30  23G    24G  94   153   58    178   0       0
19:37:45  23G    24G  95   169   59    195   0       0
19:38:00  24G    24G  95   160   59    187   0       0
19:38:25  24G    24G  96   151   58    177   0       0

So it seems that arcsz reaching the 24G maximum wasn't necessarily to
blame, since the system operated for a good 20mins in this state.

I was also logging vmstat 5 prior to the crash (though I forgot to
include some timestamps in my output) and these are the final lines
recorded in that log:

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 25885248 18012208 71 2090 0 0 0 0 0  0  0  0 22 17008 210267 30229  1  5 94
 0 0 0 25884764 18001848 71 2044 0 0 0 0 0  0  0  0 25 14846 151228 25911  1  5 94
 0 0 0 25884208 17991876 71 2053 0 0 0 0 0  0  0  0  8 16343 185416 28946  1  5 93

So it seems there was some 17-18G free in the system when the lock up
occurred. Curious...
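
Note for next time: newer Solaris releases can timestamp this for you, so
something like

  vmstat -T d 5 > /var/tmp/vmstat.log &

(or, failing that, a small shell loop that prefixes each sample with date
output) would have saved me the guesswork.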

I was also capturing some ARC info from mdb -k, and the output prior to the
lock up was...

Monday, October 31, 2011 07:57:51 PM UTC
arc_no_grow   = 0
arc_tempreserve   = 0 MB
arc_meta_used =  4621 MB
arc_meta_limit= 20480 MB
arc_meta_max  =  4732 MB
Monday, October 31, 2011 07:57:56 PM UTC
arc_no_grow   = 0
arc_tempreserve   = 0 MB
arc_meta_used =  4622 MB
arc_meta_limit= 20480 MB
arc_meta_max  =  4732 MB

Looks like metadata was not primarily responsible for consuming all of that
24G of ARC in arcstat.pl output...
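
For anyone wanting to capture the same data, a small loop around mdb's ::arc
dcmd does it -- roughly:

  while true; do
      date -u
      echo ::arc | mdb -k | egrep 'arc_no_grow|arc_tempreserve|arc_meta'
      sleep 5
  done > /var/tmp/arc.log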

Also, there seems to be nothing interesting in /var/adm/messages leading up
to my rebooting:

Oct 31 18:42:57 mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency
error 512 PPM exceeds tolerance 500 PPM
Oct 31 18:44:01 mslvstdp02r last message repeated 1 time
Oct 31 18:45:05 mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency
error 512 PPM exceeds tolerance 500 PPM
Oct 31 18:46:09 mslvstdp02r last message repeated 1 time
Oct 31 18:47:23 mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency
error 505 PPM exceeds tolerance 500 PPM
Oct 31 19:06:13 mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency
error 505 PPM exceeds tolerance 500 PPM
Oct 31 19:09:27 mslvstdp02r last message repeated 4 times
Oct 31 19:25:04 mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency
error 505 PPM exceeds tolerance 500 PPM
Oct 31 19:28:17 mslvstdp02r last message repeated 3 times

Re: [zfs-discuss] Solaris Based Systems Lock Up - Possibly ZFS/memory related?

2011-10-31 Thread Lachlan Mulcahy
Hi All/Marion,

A small update...


known to have lockups/timeouts when used with SAS expanders (disk
 enclosures)
 with incompatible firmware revisions, and/or with older mpt drivers.


 I'll need to check that out -- I'm 90% sure that these are fresh out of
 box HBAs.

 Will try an upgrade there and see if we get any joy there...


We did not have the latest firmware on the HBA -- through a lot of pain I
managed to boot into an MS-DOS disk and run the firmware update. We're now
running the latest HBA BIOS and firmware for this card from the LSI.com
website.


  The MD1220 is a 6Gbit/sec device.  You may be better off with a matching
 HBA  -- Dell has certainly told us the MD1200-series is not intended for
 use with the 3Gbit/sec HBA's.  We're doing fine with the LSI SAS 9200-8e,
 for example, when connecting to Dell MD1200's with the 2TB nearline SAS
 disk drives.


 I was aware the MD1220 is a 6G device, but I figured that since our IO
 throughput doesn't actually come close to saturating 3Gbit/sec that it
 would just operate at the lower speed and be OK. I guess it is something to
 look at if I run out of other options...


This was my mistake -- this particular system has MD1120s attached to it. We
have a mix of 1220s and 1120s, as we've been with Dell since the 1120s were
the current model.

Just kicked off the system running with the same logging as before with
this new firmware, so I'll see if this goes any better.

Regards,
-- 
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris Based Systems Lock Up - Possibly ZFS/memory related?

2011-10-31 Thread Lachlan Mulcahy
Hi All,


We did not have the latest firmware on the HBA - through a lot of pain I
 managed to boot into an MS-DOS disk and run the firmware update. We're now
 running the latest on this card from the LSI.com website. (both HBA BIOS
 and Firmware)


No joy.. the system seized up again within a few hours of coming back up.

Now trying another suggestion sent to me by a direct poster:

*   Recommendation from Sun (Oracle) to work around a bug:
*   6958068 - Nehalem deeper C-states cause erratic scheduling behavior
set idle_cpu_prefer_mwait = 0
set idle_cpu_no_deep_c = 1

This was apparently the cause of a similar symptom for them, and we are using
Nehalem.

At this point I'm running out of options, so it can't hurt to try it.

Regards,
-- 
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris Based Systems Lock Up - Possibly ZFS/memory related?

2011-10-31 Thread Richard Elling
FWIW, we recommend disabling C-states in the BIOS for NexentaStor systems.
C-states are evil.
 -- richard

On Oct 31, 2011, at 9:46 PM, Lachlan Mulcahy wrote:

 Hi All,
 
 
 We did not have the latest firmware on the HBA - through a lot of pain I 
 managed to boot into an MS-DOS disk and run the firmware update. We're now 
 running the latest on this card from the LSI.com website. (both HBA BIOS and 
 Firmware)
 
 No joy.. the system seized up again within a few hours of coming back up. 
 
 Now trying another suggestion sent to me by a direct poster:
 
 *   Recommendation from Sun (Oracle) to work around a bug:
 *   6958068 - Nehalem deeper C-states cause erratic scheduling behavior
 set idle_cpu_prefer_mwait = 0
 set idle_cpu_no_deep_c = 1
 
 Was apparently the cause of a similar symptom for them and we are using 
 Nehalem.
 
 At this point I'm running out of options, so it can't hurt to try it.
 
 Regards,
 -- 
 Lachlan Mulcahy
 Senior DBA, 
 Marin Software Inc.
 San Francisco, USA
 
 AU Mobile: +61 458 448 721
 US Mobile: +1 (415) 867 2839
 Office : +1 (415) 671 6080
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 

ZFS and performance consulting
http://www.RichardElling.com
LISA '11, Boston, MA, December 4-9
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss