Re: Terrible disk performance with LSI / FreeBSD 9.2-RC1

2013-08-10 Thread J David
To follow up on this issue, at one point the stats were down to this:

extended device statistics
device    r/s   w/s    kr/s    kw/s qlen svc_t  %b
da0       0.0   0.0     0.0     0.0    0   0.0   0
da1       0.0   0.0     0.0     0.0    0   0.0   0
da2     127.9   0.0   202.3     0.0    1  47.5 100
da3     125.9   0.0   189.3     0.0    1  43.1  97
da4     127.9   0.0   189.8     0.0    1  45.8 100
da5     128.9   0.0   206.3     0.0    0  42.5  99
da6     127.9   0.0   202.3     0.0    1  46.2  98
da7       0.0 249.7     0.0   334.2   10  39.5 100

At some point, I figured out that 125 random IOPS is pretty much the
limit for 7200 RPM SATA drives.  So mostly what we're looking at here
is that the resilver of a raidz2 is the pathological worst case.  Lesson
learned: raidz2 is just not viable without some kind of sorting of the
resilver I/O.  I wish I understood ZFS well enough to do something about
that, but research suggests the problem is non-trivial. :(
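The ~125 IOPS ceiling falls out of the usual back-of-the-envelope model for a single spindle: every random I/O costs an average seek plus, on average, half a revolution of rotational latency. A minimal Python sketch of that arithmetic; the seek times are assumed values for illustration, not figures from the RE3 datasheet, and real drives land higher with short seeks or NCQ reordering and lower with long seeks.

```python
def random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random-I/O ceiling for one spinning disk.

    Per-request service time is modeled as average seek time plus average
    rotational latency (half a revolution). Transfer time is ignored, which
    is a fair approximation for small requests.
    """
    ms_per_rev = 60_000.0 / rpm           # time for one full revolution
    avg_rotational_ms = ms_per_rev / 2.0  # on average, wait half a turn
    return 1000.0 / (avg_seek_ms + avg_rotational_ms)


if __name__ == "__main__":
    # Assumed average seek times; a 7200 RPM disk adds ~4.17 ms of latency.
    for seek_ms in (4.0, 8.0, 9.0):
        print(f"7200 RPM, {seek_ms:.0f} ms seek: ~{random_iops(7200, seek_ms):.0f} IOPS")
```

With an assumed ~4 ms average seek this gives roughly 120 IOPS; at 8-9 ms it drops to around 75-85.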

There also seems to be a separate ZFS issue related to having a very
large number of snapshots (e.g. hourly for several months on a couple
of filesystems).  Some combination of the OS updates we've been doing
to get this machine to 9.2-RC1 and deleting a ton of snapshots seems to
have cleared that one up.  It would be nice to know which it was; I
guess we'll find out in a few months.

So it seems like the combination of these two issues is mostly what
is/was plaguing us.

Thanks!
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: Terrible disk performance with LSI / FreeBSD 9.2-RC1

2013-08-07 Thread J David
On Wed, Aug 7, 2013 at 3:15 PM, James Gosnell <jamesgosn...@gmail.com> wrote:
> Maybe one of your drives is bad, so it's constantly doing error correction?

Not according to SMART; all the drives report no problems.  Also, all
the drives seem to perform in lock-step for both reading and writing.
By contrast, when one drive in an array is failing, all the drives may
be pulling the same number of reads, but the failing drive will often
report 100% busy and/or multi-second svc_t's while the others sit at 4%
busy with 20 msec svc_t's or similar.  In this case, it's acting like
the disks are all hugely overloaded, but without even the high svc_t's
I typically associate with overworking an array.

The speeds do fluctuate.  Last night it was down to 64k/sec reads per
drive (about 15 reads/sec) and still reporting 90% busy on all drives.
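Those two figures together imply very small reads: throughput is just operations per second times the average request size, so 64 KB/sec at ~15 reads/sec is only about 4 KB per read. A small Python sketch of that relationship; the 128 KB request size used for comparison is a hypothetical value, not something measured on this system.

```python
def avg_request_kb(kb_per_sec: float, ops_per_sec: float) -> float:
    """Average request size implied by a throughput and an operation rate
    (e.g. iostat's kr/s divided by r/s)."""
    if ops_per_sec == 0:
        return 0.0
    return kb_per_sec / ops_per_sec


def throughput_kb_per_sec(ops_per_sec: float, request_kb: float) -> float:
    """Throughput achieved at a given operation rate and request size."""
    return ops_per_sec * request_kb


if __name__ == "__main__":
    # Per-drive figures from the message: 64 KB/sec at about 15 reads/sec.
    print(f"average read: ~{avg_request_kb(64, 15):.1f} KB")
    # Same 15 reads/sec with hypothetical 128 KB requests.
    print(f"with 128 KB reads: {throughput_kb_per_sec(15, 128):.0f} KB/sec")
```

At a fixed operation rate, throughput scales linearly with request size, which is why small random reads look so slow in KB/sec terms.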

It feels like some sort of issue with the
bus/controller/kernel/driver/ZFS that is affecting all the drives
equally.

Also, even "ls" takes forever (10-30 seconds for "ls -lh /"), but when
it eventually does finish, "time ls -lh /" reports:

0.02 real 0.00 user 0.00 sys

Really not sure what to make of that.  An attempt to run
"ps axlww | fgrep ls" while the ls was running failed, because the ps
hangs just as long as the ls.  So it's like the system is repeatedly
putting anything that touches the disks on hold, even if all the data
being requested is clearly in cache.  (That apparently even includes
loading the /bin/ls binary, or running "ls -lh /" twice in a row.)

Thanks!


Terrible disk performance with LSI / FreeBSD 9.2-RC1

2013-08-06 Thread J David
We have a machine running 9.2-RC1 that's getting terrible disk I/O
performance.  Its performance has always been pretty bad, but it
didn't really become clear how bad until we did a zpool replace on one
of the drives and realized it was going to take 3 weeks to rebuild a
1TB drive.
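As a sanity check on the 3-week estimate, resilver time at a steady rate is just the amount of data divided by the rate. A Python sketch using the sub-800k/sec resilver rate reported further down, assuming the worst case where the whole 931G that mptutil reports for each RE3 has to be rewritten, and reading "k" as KiB; since ZFS resilvers only allocated blocks, a partly empty pool would finish sooner.

```python
SECONDS_PER_DAY = 86_400


def resilver_days(bytes_to_copy: float, rate_kib_per_sec: float) -> float:
    """Days needed to rewrite bytes_to_copy at a constant KiB/sec rate."""
    seconds = bytes_to_copy / (rate_kib_per_sec * 1024)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    full_disk = 931 * 1024**3  # worst case: every sector of the new disk
    for rate in (800, 600, 500):
        print(f"{rate} KiB/sec: ~{resilver_days(full_disk, rate):.1f} days")
```

That works out to about 14 days at 800k/sec and roughly 22-23 days at 500k/sec, so a rate somewhat below 800k/sec is consistent with the 3-week estimate.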

The hardware specs are:
- 2 x Xeon L5420
- 32 GiB RAM
- LSI Logic SAS 1068E
- 2 x 32GB SSDs
- 6 x 1TB Western Digital RE3 7200RPM SATA

The LSI controller has the most recent firmware I'm aware of
(6.36.00.00 / 1.33.00.00 dated 2011.08.24), is in IT mode, and appears
to be working fine:

mpt0 Adapter:
       Board Name: USASLP-L8i
   Board Assembly: USASLP-L8i
        Chip Name: C1068E
    Chip Revision: B3
      RAID Levels: none

mpt0 Configuration: 0 volumes, 8 drives
drive da0 (30G) ONLINE FTM32GL25H 10 SATA
drive da1 (29G) ONLINE SSDSA2SH032G1GN 8860 SATA
drive da2 (931G) ONLINE WDC WD1002FBYS-0 0C05 SATA
drive da3 (931G) ONLINE WDC WD1002FBYS-0 0C05 SATA
drive da4 (931G) ONLINE WDC WD1002FBYS-0 0C05 SATA
drive da5 (931G) ONLINE WDC WD1002FBYS-0 0C05 SATA
drive da6 (931G) ONLINE WDC WD1002FBYS-0 0C05 SATA
drive da7 (931G) ONLINE WDC WD1002FBYS-0 0C05 SATA

The eight drives are configured as ZIL and L2ARC on the SSDs, and a
six-drive raidz2 on the spinning disks.

We did a zpool replace on the last drive in the line (da7), and the
resilver is proceeding at less than 800k/sec.

extended device statistics
device    r/s   w/s    kr/s    kw/s qlen svc_t  %b
da0       0.0   0.0     0.0     0.1    0   0.9   0
da1       0.0   8.2     0.0    19.9    0   0.1   0
da2     125.6  23.0   768.2    40.5    4  33.0  88
da3     126.6  23.1   769.0    41.3    4  32.3  89
da4     126.0  24.0   768.5    42.7    4  32.1  88
da5     125.9  22.0   768.2    40.1    4  31.6  87
da6     124.0  22.0   766.6    39.9    5  31.4  84
da7       0.0 136.9     0.0   801.3    0   0.6   4
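To compare snapshots like this one more systematically, the table can be turned into numbers. A minimal Python sketch that parses rows in the column order shown above (device, r/s, w/s, kr/s, kw/s, qlen, svc_t, %b); DevStat and parse_iostat_x are made-up names for this example, and it assumes whole-number qlen and %b values as in this output. On the resilver sample, dividing kr/s by r/s shows each source disk averaging only about 6 KB per read.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DevStat:
    device: str
    r_s: float    # reads per second
    w_s: float    # writes per second
    kr_s: float   # KB read per second
    kw_s: float   # KB written per second
    qlen: int     # queue length
    svc_t: float  # average service time, ms
    busy: int     # percent busy (%b)


def parse_iostat_x(text: str) -> list[DevStat]:
    """Parse device rows laid out as: device r/s w/s kr/s kw/s qlen svc_t %b."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the banner, the header row, and anything malformed.
        if len(fields) != 8 or fields[0] == "device":
            continue
        name, *nums = fields
        rows.append(DevStat(name, float(nums[0]), float(nums[1]),
                            float(nums[2]), float(nums[3]), int(nums[4]),
                            float(nums[5]), int(nums[6])))
    return rows


# Rows copied from the resilver snapshot above (da2-da7).
SAMPLE = """\
device    r/s   w/s    kr/s    kw/s qlen svc_t  %b
da2     125.6  23.0   768.2    40.5    4  33.0  88
da3     126.6  23.1   769.0    41.3    4  32.3  89
da4     126.0  24.0   768.5    42.7    4  32.1  88
da5     125.9  22.0   768.2    40.1    4  31.6  87
da6     124.0  22.0   766.6    39.9    5  31.4  84
da7       0.0 136.9     0.0   801.3    0   0.6   4
"""

if __name__ == "__main__":
    stats = parse_iostat_x(SAMPLE)
    for s in stats:
        if s.r_s > 0:
            print(f"{s.device}: {s.kr_s / s.r_s:.1f} KB/read at {s.busy}% busy")
    total_read = sum(s.kr_s for s in stats)
    new_disk = next(s for s in stats if s.device == "da7")
    print(f"read {total_read:.1f} KB/s in total, wrote {new_disk.kw_s:.1f} KB/s to da7")
```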

The system has plenty of free RAM, is 99.7% idle, has nothing else
going on, and runs like a one-legged dog.

There are no error messages or any sign of a problem anywhere, other
than the really terrible performance.  (When not rebuilding, it does
light NFS duty.  That performance is similarly bad, but has never
really mattered.)

Similar systems running Solaris put out 10x these numbers while
reporting 30% busy instead of 90%.

Does anyone have any suggestions for how I could troubleshoot this
further?  At this point, I'm kind of at a loss as to where to go from
here.  My goal is to try to phase out the Solaris machines, but this
is kind of a roadblock.

Thanks for any advice!


Re: 9-STABLE doesn't boot: can't load 'kernel'

2013-04-16 Thread J David
loader.conf was empty and there are no 4k gnops, no geli, nothing like
that.  This is a 100% normal install.

Although, since you mentioned 4k blocks, I did leave a gap between ada0p1
and ada0p2 to start the root partition on a 4k boundary.  (It's an SSD that
will almost never be written to once installed, so that might be a bit
silly, but it's a habit already.)
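Whether a partition is 4k-aligned comes down to whether its starting byte offset (start LBA times sector size) is a multiple of 4096. A minimal Python sketch; it assumes 512-byte logical sectors, which matches the gpart sizes in this thread (990 sectors shown as 495k), and is_aligned is a hypothetical helper name.

```python
SECTOR_BYTES = 512  # logical sector size implied by the gpart output


def is_aligned(start_lba: int, boundary: int = 4096,
               sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if a partition starting at start_lba begins on a boundary-byte edge."""
    return (start_lba * sector_bytes) % boundary == 0


if __name__ == "__main__":
    # 34: first usable GPT sector; 256: first free sector after the old
    # freebsd-boot; 2048: old root start (after the gap); 1024: new start.
    for lba in (34, 256, 1024, 2048):
        print(f"LBA {lba:5}: 4k={is_aligned(lba)}, 1MiB={is_aligned(lba, 1 << 20)}")
```

By this arithmetic, LBA 256 (where the old root partition would have started without the gap) was already 4k-aligned; the gap's extra effect was putting the old root on a 1 MiB boundary.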

I decided to try this again without the gap, and that seems to have worked.
I made it through installing, partitioning, updating the OS to 9-STABLE and
installing new boot blocks, and it still seems to be working.  I even got it
to work with a ZFS root.

Here's the partition table I ended up with:

=>        34  234441581  ada0  GPT  (111G)
          34        990     1  freebsd-boot  (495k)
        1024  226051072     2  freebsd-zfs  (107G)
   226052096    8389519     3  freebsd-swap  (4.0G)
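The new table has no unallocated space: each partition starts exactly where the previous one ends, and swap runs to the last usable sector. A short Python sketch that checks this from the gpart numbers; find_gaps is a hypothetical helper, and OLD_LAYOUT is the original table with the gap, taken from the first "can't load 'kernel'" message.

```python
# (start LBA, size in sectors) for each partition, from "gpart show".
NEW_LAYOUT = [(34, 990), (1024, 226051072), (226052096, 8389519)]
OLD_LAYOUT = [(34, 222), (2048, 8388608), (8390656, 8388608),
              (16779264, 217662351)]
DISK_FIRST, DISK_SECTORS = 34, 234441581  # from the "=>" line of both tables


def find_gaps(layout, first=DISK_FIRST, sectors=DISK_SECTORS):
    """Return (start, length) for every unallocated run between partitions."""
    gaps, cursor = [], first
    for start, size in sorted(layout):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, start + size)
    end = first + sectors
    if cursor < end:
        gaps.append((cursor, end - cursor))
    return gaps


if __name__ == "__main__":
    print("new layout gaps:", find_gaps(NEW_LAYOUT))
    print("old layout gaps:", find_gaps(OLD_LAYOUT))
```

For the old layout it reports a single 1792-sector run at LBA 256, matching the "- free -" line in that gpart output.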

I'm not sure why this would make a difference, but either it does, or redoing
the install cleared out whatever else was wrong.  This box will be stress tested and
rebooted quite a bit in the next few days, so I will report back if it
comes unglued. :)

Thanks for the suggestion!


9-STABLE doesn't boot: can't load 'kernel'

2013-04-15 Thread J David
After installing 9.1-RELEASE amd64 on a system, it boots up fine.  If I
then build and install a new 9-STABLE kernel and world, reboots die in the
loader with:

can't load 'kernel'

This is a pretty straightforward system: one drive, not large (128GB SSD),
GPT partitioned, gptboot boot code.  One UFS root partition to boot from,
a swap partition, and the rest for ZFS.

(At first I tried to do this system with root-on-ZFS, but that also failed,
adding "unable to load zpool by guid" or similar before the "can't load
'kernel'" message.)

Once this happens, the disk is unbootable.  I can start from the install CD
and access the disk just fine, but even if I move kernel.old back to
kernel, it doesn't boot anymore.  Likewise, it doesn't matter if I
overwrite the boot code with gptboot and pmbr from the install CD or the new
ones from /boot after installworld.

The disk looks like:

# gpart show
=>        34  234441581  ada0  GPT  (111G)
          34        222     1  freebsd-boot  (111k)
         256       1792        - free -  (896k)
        2048    8388608     2  freebsd-ufs  (4.0G)
     8390656    8388608     3  freebsd-swap  (4.0G)
    16779264  217662351     4  freebsd-zfs  (103G)

In the loader:
BTX loader 1.00  BTX version is 1.02
Consoles: internal video/keyboard
BIOS drive C: is disk0
BIOS 621kB/2067924kB available memory

FreeBSD/x86 bootstrap loader, Revision 1.1
(root@builder, Mon Apr 15 09:14:38 UTC 2013)

can't load 'kernel'

Type '?' for a list of commands, 'help' for more detailed help.
OK show
[…]
currdev=disk0p2:
[…]
loaddev=disk0p2:
[…]
OK lsdev
cd devices:
disk devices:
   disk0: BIOS drive C:
pxe devices:
OK ls
open '/' failed: no such file or directory
OK help
Verbose help not available, use '?' to list commands

So it's getting the boot device right (disk0p2 / ada0p2), but can't see it
at all.

Does anyone know what might be wrong?

Thanks for any advice!