FreeBSD Quarterly Status Report, July-September 2012.

2013-03-04 Thread Isabell Long
FreeBSD Quarterly Status Report, July-September 2012.

Introduction

   This report covers FreeBSD-related projects between July and September
   2012. This is the third of the four reports planned for 2012.

   Highlights from this quarter include successful participation in Google
   Summer of Code, major work in areas of the source and ports trees, and
   a Developer Summit attended by over 30 developers.

   Thanks to all the reporters for the excellent work! This report
   contains 12 entries and we hope you enjoy reading it.
 __

Projects

 * FreeBSD on Altera FPGAs
 * Native iSCSI Target
 * Parallel rc.d execution

FreeBSD Team Reports

 * FreeBSD Bugbusting Team
 * FreeBSD Foundation
 * The FreeBSD Core Team

Kernel

 * FreeBSD on ARMv6/ARMv7

Documentation

 * The FreeBSD Japanese Documentation Project

Ports

 * KDE/FreeBSD
 * Ports Collection

Miscellaneous

 * FreeBSD Developer Summit, Cambridge, UK

FreeBSD in Google Summer of Code

 * Google Summer of Code 2012
 __

FreeBSD Bugbusting Team

   URL: http://www.FreeBSD.org/support.html#gnats
   URL: https://wiki.freebsd.org/BugBusting

   Contact: Eitan Adler ead...@freebsd.org
   Contact: Gavin Atkinson ga...@freebsd.org
   Contact: Oleksandr Tymoshenko go...@freebsd.org

   In August, Eitan Adler (eadler@) and Oleksandr Tymoshenko (gonzo@)
   joined the Bugmeister team. At the same time, Remko Lodder and Volker
   Werth stepped down. We extend our thanks to Volker and Remko for their
   work in the past, and welcome Oleksandr and Eitan. Eitan and Oleksandr
   have been working hard on the migration away from GNATS, and have made
   significant progress evaluating replacement software and creating
   scripts to export data from GNATS.

   The bugbusting team continues to work on making the contents of the
   GNATS PR database cleaner and more accessible, so that committers can
   find and resolve PRs more easily. This is done by tagging PRs to
   indicate the areas involved, and by ensuring that each PR contains
   sufficient information to resolve the issue.

   As always, anybody interested in helping out with the PR queue is
   welcome to join us in #freebsd-bugbusters on EFnet. We are always
   looking for additional help, whether your interests lie in triaging
   incoming PRs, generating patches to resolve existing problems, or
   simply helping with the database housekeeping (identifying duplicate
   PRs, ones that have already been resolved, etc). This is a great way of
   getting more involved with FreeBSD!

Open tasks:

1. Further research into tools suitable to replace GNATS.
2. Get more users involved with triaging PRs as they come in.
3. Assist committers with closing PRs.
 __

FreeBSD Developer Summit, Cambridge, UK

   URL: https://wiki.freebsd.org/201208DevSummit

   Contact: Robert Watson rwat...@freebsd.org

   At the end of August, an off-season Developer Summit was held in
   Cambridge, UK, at the University of Cambridge Computer Laboratory. This
   was a three-day event, with a documentation summit scheduled for the
   day before. The three days of the main event were split into three
   sessions, with two tracks in each. Several sessions also involved ARM
   developers based nearby, which proved productive and led to further
   engagement between the FreeBSD community and ARM.

   The schedule was finalized on the first day, producing a wide range of
   topics to discuss, after which the attendees split into groups. A short
   summary from each group was presented in the final session and then
   published on the event's home page on the FreeBSD wiki. The summit
   contributed greatly to arriving at a tentative plan for throwing the
   switch to make clang the default compiler on HEAD. This was further
   discussed on the mailing lists and has since happened, bringing us one
   big step closer to a GPL-free FreeBSD 10. As part of the program, an
   afternoon of short talks from researchers at the Cambridge Computer
   Laboratory covered operating systems work in general and FreeBSD in
   particular. Robert Watson showed off a tablet running FreeBSD on a
   MIPS-compatible soft-core processor implemented on an Altera FPGA.
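
   As a point of reference for the clang switch mentioned above, the base
   compiler on a current HEAD can be checked directly, and on branches
   where GCC is still the default, clang can be selected per-build, for
   example via make.conf (one common method; the exact knobs have varied
   between branches):

      # on HEAD after the switch, cc is clang:
      cc --version

      # one way to build with clang on an older branch, via /etc/make.conf:
      CC=clang
      CXX=clang++
      CPP=clang-cpp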

   In association with the event, a dinner was hosted by St. John's
   college and co-sponsored by Google and the FreeBSD Foundation. The day
   after the conference, a trip was organized to Bletchley Park, which was
   celebrating Turing's centenary in 2012.
 __

FreeBSD Foundation

   URL: http://www.freebsdfoundation.org/press/2012Jul-newsletter.shtml

   Contact: Deb Goodkin d...@freebsdfoundation.org

   The Foundation hosted and sponsored the Cambridge FreeBSD developer
   summit in 

FreeBSD Quarterly Status Report, October-December 2012.

2013-03-04 Thread Isabell Long
FreeBSD Quarterly Status Report, October-December 2012.

Introduction

   This report covers FreeBSD-related projects between October and
   December 2012. This is the last of four reports planned for 2012.

   Highlights from this status report include a very successful EuroBSDCon
   2012 conference and associated FreeBSD Developer Summit, both held in
   Warsaw, Poland. Other highlights are several projects related to the
   FreeBSD port to the ARM architecture, extending support for platforms,
   boards and CPUs, improvements to the performance of the pf(4) firewall,
   and a new native iSCSI target.

   Thanks to all the reporters for the excellent work! This report
   contains 27 entries and we hope you enjoy reading it.

   The deadline for submissions covering the period between January and
   March 2013 is April 21st, 2013.
 __

Projects

 * BHyVe
 * Native iSCSI Target
 * NFS Version 4
 * pxe_http -- booting FreeBSD from apache
 * UEFI
 * Unprivileged install and image creation

Userland Programs

 * BSD-licenced patch(1)
 * bsdconfig(8)

FreeBSD Team Reports

 * FreeBSD Core Team
 * FreeBSD Documentation Engineering
 * FreeBSD Foundation
 * Postmaster

Kernel

 * AMD GPUs kernel-modesetting support
 * Common Flash Interface (CFI) driver improvements
 * SMP-Friendly pf(4)
 * Unmapped I/O

Documentation

 * The FreeBSD Japanese Documentation Project

Architectures

 * Compiler improvements for FreeBSD/ARMv6
 * FreeBSD on AARCH64
 * FreeBSD on BeagleBone
 * FreeBSD on Raspberry Pi

Ports

 * FreeBSD Haskell Ports
 * KDE/FreeBSD
 * Ports Collection
 * Xfce

Miscellaneous

 * EuroBSDcon 2012
 * FreeBSD Developer Summit, Warsaw
 __

AMD GPUs kernel-modesetting support

   URL: https://wiki.FreeBSD.org/AMD_GPU
   URL: http://people.FreeBSD.org/~kib/misc/ttm.1.patch

   Contact: Alexander Kabaev k...@freebsd.org
   Contact: Jean-Sébastien Pédron dumbb...@freebsd.org
   Contact: Konstantin Belousov k...@freebsd.org

   Jean-Sébastien Pédron started to port the AMD GPU driver from Linux to
   FreeBSD 10-CURRENT in January 2013. This work is based on a previous
   effort by Alexander Kabaev. Konstantin Belousov provided the initial
   port of the TTM memory manager.

   As of this writing, the driver is building but the tested device fails
   to attach.

   Status updates will be posted to the FreeBSD wiki.
 __

BHyVe

   URL: https://wiki.FreeBSD.org/BHyVe
   URL: http://www.bhyve.org/

   Contact: Neel Natu n...@freebsd.org
   Contact: Peter Grehan gre...@freebsd.org

   BHyVe is a type-2 hypervisor for FreeBSD/amd64 hosts with Intel VT-x
   and EPT CPU support. The bhyve project branch was merged into CURRENT
   on Jan 18. Work is progressing on performance, ease of use, AMD SVM
   support, and being able to run non-FreeBSD operating systems.
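
   As a rough sketch of what trying the merged code involves (module and
   tool names only; option syntax is still settling and is documented on
   the wiki page above):

      # load the hypervisor kernel module (needs Intel VT-x with EPT)
      kldload vmm
      kldstat | grep vmm

      # guests are then loaded and run with bhyveload(8) and bhyve(8)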

Open tasks:

1. Booting Linux/*BSD/Windows.
2. Moving the codebase to a more modular design consisting of a small base
   and loadable modules.
3. Various hypervisor features such as suspend/resume/live migration/sparse
   disk support.
 __

BSD-licenced patch(1)

   URL: http://code.google.com/p/bsd-patch/

   Contact: Pedro Giffuni p...@freebsd.org
   Contact: Gabor Kovesdan ga...@freebsd.org
   Contact: Xin Li delp...@freebsd.org

   FreeBSD has for a long time been using a very old version of GNU patch
   that is partially under the GPLv2. The original GNU patch utility is
   based on an initial implementation by Larry Wall that was not actually
   copyleft. OpenBSD made many enhancements to an older, non-copyleft
   version of patch; that version was later adopted and further refined by
   DragonFlyBSD and NetBSD, but there was no centralized development of
   the tool, and FreeBSD kept working on its own version independently. In
   less than a week we took the DragonFlyBSD version and adapted the
   FreeBSD enhancements to make it behave closer to the version used
   natively in FreeBSD. Most of the work was done by Pedro Giffuni,
   adapting patches from sepotvin@ and ed@, with additional contributions
   from Christoph Mallon, Gabor Kovesdan, and Xin Li. As a result, a new
   version of patch is now committed in head/usr.bin/patch, which you can
   try by setting WITH_BSD_PATCH in your builds. The new patch(1) does not
   support the FreeBSD-specific -I and -S options, which do not seem
   necessary. In GNU patch, -I actually means 'ignore whitespace', and we
   now support that meaning too.
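
   To give it a try, the knob mentioned above is set in src.conf(5) before
   rebuilding; a minimal sketch (the "=yes" form is the conventional way
   to enable a WITH_* knob):

      # /etc/src.conf
      WITH_BSD_PATCH=yes

      # then rebuild and install world as usual, for example:
      # cd /usr/src && make buildworld installworld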

Open tasks:

1. Testing. A lot more testing.
 __

bsdconfig(8)

   URL: 

Re: FreeBSD 9.1 - openldap slapd lockups, mutex problems

2013-03-04 Thread Pierre Guinoiseau
Hi,

I've tested it in an 8.3R jail on a 9.1R host, same setup, and the problem
is still there. So it may be a kernel bug in 9.1R.
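
For reference, the Berkeley DB lock limits that db_stat -c reports on
(discussed in the quoted exchange below) are normally raised through a
DB_CONFIG file in slapd's database directory; a minimal sketch, with an
assumed path and purely illustrative values:

  # DB_CONFIG in the back-bdb/back-hdb database directory
  # (commonly /var/db/openldap-data); numbers are examples only
  set_lk_max_locks   20000
  set_lk_max_lockers 20000
  set_lk_max_objects 20000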

On 14/02/2013 10:19:45, Oliver Brandmueller o...@e-gitt.net wrote:

 Hi,
 
 On Thu, Feb 14, 2013 at 03:13:57AM +0100, Pierre Guinoiseau wrote:
   I have seen openldap spin the cpu and even run out of memory to get 
   killed on some of our test systems running ~9.1-rel with zfs.
 [...]
  I have the same problem too, inside a jail stored on ZFS. I've tried
  various tunings in slapd.conf, but none fixed the problem. While it
  hangs, db_stat -c shows that all locks are being used; I've tried
  setting the limit really high, far more than normally needed, but it
  didn't help. I may have the same problem with amavisd-new, but I have
  to verify that to be sure the symptoms are similar.
 
 I have amd64 9.1-STABLE r245456 (about Jan 15) running. I have openldap 
 openldap-server-2.4.33_2 running, depending on libltdl-2.4.2 and 
 db46-4.6.21.4 .
 
 The system is zfs only (for the local filesystems, where openldap is 
 running - it has several NFS mounts for other purposes though). It's up 
 and running for about a month now (29 days) and never showed any 
 problematic behaviour regarding to slapd.
 
 I have ~10 SEARCH requests per second on average and only minor
 ADD/MODIFY/DELETE operations. It has several binds and unbinds, about
 1/10th of the requests. It runs in slurpd slave mode for my master LDAP.
 
 zroot/var/db runs with compression=off, dedup=off, zroot is a mirrored 
 pool on 2 Intel SATA SSD drives inside a GPT partition. Swap is on a ZFS 
 zvol.
 
 - Oliver
 
 
 -- 
 | Oliver Brandmueller  http://sysadm.in/ o...@sysadm.in |
 |Ich bin das Internet. Sowahr ich Gott helfe. |






Re: Musings on ZFS Backup strategies

2013-03-04 Thread Volodymyr Kostyrko

02.03.2013 03:12, David Magda:


On Mar 1, 2013, at 12:55, Volodymyr Kostyrko wrote:


Yes, I'm working with backups the same way: I wrote a simple script that
synchronizes two filesystems between distant servers. I also use the same
script to synchronize bushy filesystems (with hundreds of thousands of
files) where rsync produces too big a load when synchronizing.

https://github.com/kworr/zfSnap/commit/08d8b499dbc2527a652cddbc601c7ee8c0c23301


There are quite a few scripts out there:

http://www.freshports.org/search.php?query=zfs


A lot of them require python or ruby, and none of them manages 
synchronizing snapshots over network.



For file level copying, where you don't want to walk the entire tree, here is the 
zfs diff command:


zfs diff [-FHt] snapshot [snapshot|filesystem]

 Describes differences between a snapshot and a successor dataset. The
 successor dataset can be a later snapshot or the current filesystem.

 The changed files are displayed including the change type. The change
 type is displayed using a single character. If a file or directory
 was renamed, the old and the new names are displayed.


http://www.freebsd.org/cgi/man.cgi?query=zfs

This allows one to get a quick list of files and directories, then use 
tar/rsync/cp/etc. to do the actual copy (where the destination does not have to 
be ZFS: e.g., NFS, ext4, Lustre, HDFS, etc.).
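
A small sketch of that approach (dataset names, mount point, and
destination are made up; renamed entries are skipped for brevity):

  # with -H the zfs diff output is tab-separated: change-type, then path
  zfs diff -H tank/data@snap1 tank/data@snap2 |
      awk -F'\t' '$1 != "R" { print $2 }' |
      sed 's|^/tank/data/||' |
      rsync -a --files-from=- /tank/data/ backuphost:/srv/copy/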


I know that, but I see no reason to revert to file-based sync if I can
do block-based.


--
Sphinx of black quartz, judge my vow.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Musings on ZFS Backup strategies

2013-03-04 Thread David Magda
On Mon, March 4, 2013 11:07, Volodymyr Kostyrko wrote:
 02.03.2013 03:12, David Magda:
 There are quite a few scripts out there:

  http://www.freshports.org/search.php?query=zfs

 A lot of them require python or ruby, and none of them manages
 synchronizing snapshots over network.

Yes, but I think it is worth considering the creation of snapshots and
the transfer of snapshots as two separate steps. Treating them
independently (perhaps in two different scripts) helps prevent breakage
in one from affecting the other.

Snapshots are not backups (IMHO), but they are handy for users and
sysadmins in the simple case of accidentally deleted files. If your
network access / copying breaks or is slow for some reason, at least you
have copies locally. Similarly if you're having issues with the machine
that keeps your remote pool.

By keeping the snapshots going separately, once any problems with the
network or remote server are solved, you can use them to incrementally
sync up the remote pool. You can simply run the remote-sync scripts more
often to catch up.
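
A minimal sketch of that two-step split (pool, dataset, and host names
are made up):

  # step 1, run locally (e.g. from cron): take a dated snapshot
  zfs snapshot tank/data@2013-03-04

  # step 2, a separate script: send the increment relative to the newest
  # snapshot both sides already have, and apply it to the remote pool
  zfs send -i tank/data@2013-03-03 tank/data@2013-03-04 |
      ssh backuphost zfs receive -F backup/data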

It's just an idea, and everyone has different needs. I often find it handy
to keep different steps in different scripts that are loosely coupled.

 This allows one to get a quick list of files and directories, then use
 tar/rsync/cp/etc. to do the actual copy (where the destination does not
 have to be ZFS: e.g., NFS, ext4, Lustre, HDFS, etc.).

 I know that but I see no reason in reverting to file-based synch if I
 can do block-based.

Sure. I just thought I'd mention it in the thread in case others do need
that functionality and were not aware of zfs diff. Not everyone does, or
can do, pool-to-pool backups.


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Musings on ZFS Backup strategies

2013-03-04 Thread Volodymyr Kostyrko

04.03.2013 19:04, David Magda:

On Mon, March 4, 2013 11:07, Volodymyr Kostyrko wrote:

02.03.2013 03:12, David Magda:

There are quite a few scripts out there:

http://www.freshports.org/search.php?query=zfs


A lot of them require python or ruby, and none of them manages
synchronizing snapshots over network.


Yes, but I think it is worth considering the creation of snapshots, and
the transfer of snapshots, as two separate steps. By treating them
independently (perhaps in two different scripts), it helps prevent the
breakage in one from affecting the other.


Exactly. My script is just an addition to zfSnap or any other tool that
manages snapshots. Currently it does nothing more than comparing the
lists of available snapshots and doing the network transfer.



Snapshots are not backups (IMHO), but they are handy for users and
sysadmins for the simple situations of accidentally files. If your network
access / copying breaks or is slow for some reason, at least you have
simply copies locally. Similarly if you're having issues with the machine
that keeps your remove pool.


Yes, I addressed exactly that by adding the ability to restart the
transfer from any point, or to simply not care: once initialized, the
process is autonomous, and in case of failure everything is rolled back
to the last known good snapshot. I also added the ability to compress
and rate-limit the traffic.



By keeping the snapshots going separately, once any problems with the
network or remote server are solved, you can use them to incrementally
sync up the remote pool. You can simply run the remote-sync scripts more
often to do the catch up.

It's just an idea, and everyone has different needs. I often find it handy
to keep different steps in different scripts that are loosely coupled.


I just tried to give another use for snapshots, or at least a way to
simplify things in one specific situation.


--
Sphinx of black quartz, judge my vow.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: 9.1 minimal ram requirements

2013-03-04 Thread Kenneth D. Merry

I just checked in a change to HEAD (247814) that compiles CTL in GENERIC
but disables it by default.  (i.e. it uses no memory)  You can re-enable
it with the existing loader tunable.

i.e. set kern.cam.ctl.disable=0 in /boot/loader.conf and it will be
enabled.
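
In loader.conf(5) terms (the =1 form is the workaround for 9.1 systems
quoted further down in the thread, where CTL is compiled in and grabbing
memory):

  # /boot/loader.conf -- pick whichever applies:
  # HEAD r247814 and later: CTL is built but off by default, re-enable it
  kern.cam.ctl.disable=0
  # 9.1 with "device ctl" in the kernel: disable CTL to reclaim ~35MB
  #kern.cam.ctl.disable=1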

Ken

On Wed, Feb 27, 2013 at 18:26:28 -0800, Adrian Chadd wrote:
 Hi Ken,
 
 I'd like to fix this for 9.2 and -HEAD.
 
 Would you mind if I disabled CTL in GENERIC (but still build it as a
 module) until you've fixed the initial RAM reservation that it
 requires?
 
 Thanks,
 
 
 
 Adrian
 
 
 On 22 December 2012 22:32, Adrian Chadd adr...@freebsd.org wrote:
  Ken,
 
  Does CAM CTL really need to pre-allocate 35MB of RAM at startup?
 
 
 
  Adrian
 
  On 22 December 2012 16:45, Sergey Kandaurov pluk...@gmail.com wrote:
  On 23 December 2012 03:40, Marten Vijn i...@martenvijn.nl wrote:
  On 12/23/2012 12:27 AM, Jakub Lach wrote:
 
  Guys, I've heard about some absurd RAM requirements
  for 9.1, has anybody tested it?
 
  e.g.
 
  http://forums.freebsd.org/showthread.php?t=36314
 
 
  Yup, I can confirm this with nanobsd (cross-)compiled for my Soekris
  net4501, which has 64 MB of memory:
 
  from dmesg: real memory  = 67108864 (64 MB)
 
  while the same config compiled against a 9.0 tree still works...
 
 
  This (i.e. the kmem_map too small message seen with kernel memory
  shortage) could be due to CAM CTL ('device ctl' added in 9.1), which is
  quite a big kernel memory consumer.
  Try to disable CTL in loader with kern.cam.ctl.disable=1 to finish boot.
  A longer term workaround could be to postpone those memory allocations
  until the first call to CTL.
 
  # cam ctl init allocates roughly 35 MB of kernel memory at once
  # three memory pools, somewhat under M_DEVBUF, and memory disk
  # devbuf takes 1022K with kern.cam.ctl.disable=1
 
       Type  InUse  MemUse  HighUse  Requests  Size(s)
     devbuf    213  20366K        -       265  16,32,64,128,256,512,1024,2048,4096
     ctlmem   5062  10113K        -      5062  64,2048
     ctlblk    200    800K        -       200  4096
    ramdisk      1   4096K        -         1
    ctlpool    532    138K        -       532  16,512
 
  --
  wbr,
  pluknet
  ___
  freebsd-stable@freebsd.org mailing list
  http://lists.freebsd.org/mailman/listinfo/freebsd-stable
  To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org

-- 
Kenneth Merry
k...@freebsd.org
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: GNATS now available via rsync

2013-03-04 Thread Jason Helfman
On Sun, Dec 23, 2012 at 1:51 PM, Simon L. B. Nielsen si...@freebsd.orgwrote:

 Hey,

 The GNATS database can now be mirrored using rsync from:

   rsync://bit0.us-west.freebsd.org/FreeBSD-bit/gnats/

 I expect that URL to be permanent, at least while GNATS is still
 alive. At a later point there will be more mirrors (a us-east will be
 the first) and I will find a place to publish the mirror list.
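
 A one-shot mirror into a local directory then looks something like this
 (flags and local destination are illustrative):

   rsync -av --delete \
       rsync://bit0.us-west.freebsd.org/FreeBSD-bit/gnats/ /local/gnats/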

 On a side note, GNATS changes aren't mirrored to the old CVSup system
 right now, as cvsupd broke on FreeBSD 10.0, which the host running
 GNATS is running. There are no current plans on clusteradm@'s side to
 fix this, now that an alternative way to get GNATS exists and cvsup is
 deprecated in the long term anyway.


I have supplied an update to reflect this change in the committers' guide
here:

http://www.freebsd.org/doc/en/articles/committers-guide/gnats.html

-jgh

--
Jason Helfman  | FreeBSD Committer
j...@freebsd.org | http://people.freebsd.org/~jgh  | The Power to Serve
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


ZFS stalls -- and maybe we should be talking about defaults?

2013-03-04 Thread Karl Denninger
Well now this is interesting.

I have converted a significant number of filesystems to ZFS over the
last week or so and have noted a few things.  A couple of them aren't so
good.

The machine in question has 12GB of RAM and dual Xeon 5500-series
processors.  It also has an ARECA 1680ix in it with 2GB of local cache
and a BBU.  The ZFS spindles are all exported as JBOD drives.  I set up
four disks under GPT, added a single freebsd-zfs partition to each,
labeled them, and the labeled providers are then geli-encrypted and
added to the pool.  When the same disks were running UFS filesystems
they were set up as a 0+1 RAID array under the ARECA adapter, exported
as a single unit, GPT-labeled as a single pack, and then gpart-sliced
and newfs'd under UFS+SU.

Since I previously ran UFS filesystems on this configuration, I know
what performance level I achieved with that, and the entire system had
been running flawlessly set up that way for the last couple of years.
Presently the machine is running 9.1-STABLE, r244942M.

Immediately after the conversion I set up a second pool to play with
backup strategies to a single drive and ran into a problem.  The disk I
used for that testing is one that previously was in the rotation and is
also known good.  I began to get EXTENDED stalls with zero I/O going on,
some lasting for 30 seconds or so.  The system was not frozen but
anything that touched I/O would lock until it cleared.  Dedup is off,
incidentally.

My first thought was that I had a bad drive, cable or other physical
problem.  However, searching for that proved fruitless -- there was
nothing being logged anywhere -- not in the SMART data, not by the
adapter, not by the OS.  Nothing.  Sticking a digital storage scope on
the +5V and +12V rails didn't disclose anything interesting with the
power in the chassis; it's stable.  Further, swapping the only disk that
had changed (the new backup volume) with a different one didn't change
behavior either.

The last straw was when I was able to reproduce the stalls WITHIN the
original pool against the same four disks that had been running
flawlessly for two years under UFS, and still couldn't find any evidence
of a hardware problem (not even ECC-corrected data returns.)  All the
disks involved are completely clean -- zero sector reassignments, the
drive-specific log is clean, etc.

Attempting to cut back the ARECA adapter's aggressiveness (buffering,
etc) on the theory that I was tickling something in its cache management
algorithm that was pissing it off proved fruitless as well, even when I
shut off ALL caching and NCQ options.  I also set
vfs.zfs.prefetch_disable=1 to no effect.  H...

Last night, after reading the ZFS Tuning wiki for FreeBSD, I went on a
lark, limited the ARC cache to 2GB (vfs.zfs.arc_max=20), set
vfs.zfs.write_limit_override to 102400 (1GB), and rebooted.

The problem instantly disappeared, and I cannot provoke its return even
with multiple full-bore snapshot and rsync filesystem copies running
while a scrub is being done.
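
For reference, a loader.conf sketch matching the 2GB ARC cap and 1GB
write limit described above, with the byte values written out (tune to
your own workload, of course):

  # /boot/loader.conf
  vfs.zfs.arc_max=2147483648               # 2GB cap on the ARC
  vfs.zfs.write_limit_override=1073741824  # 1GB per-txg write limit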
I'm pinging between being I/O and processor (geli) limited now in normal
operation and slamming the I/O channel during a scrub.  It appears that
performance is roughly equivalent, maybe a bit less, than it was with
UFS+SU -- but it's fairly close.

The operating theory I have at the moment is that the ARC cache was in
some way getting into a near-deadlock situation with other memory
demands on the system (there IS a Postgres server running on this
hardware although it's a replication server and not taking queries --
nonetheless it does grab a chunk of RAM) leading to the stalls. 
Limiting its grab of RAM appears to have resolved the contention
issue.  I was unable to catch it actually running out of free memory
although it was consistently into the low five-digit free page count and
the kernel never garfed on the console about resource exhaustion --
other than a bitch about swap stalling (the infamous "more than 20
seconds" message).  Page space in use near the time in question (I could
not get a display while locked as it went to I/O and froze) was not
zero, but pretty close to it (a few thousand blocks.)  That the system
was driven into light paging does appear to be significant and
indicative of some sort of memory contention issue as under operation
with UFS filesystems this machine has never been observed to allocate
page space.

Has anyone seen anything like this before, and if so, is this a case of
bad defaults or of bad interaction between the various sources of
contention for kernel memory allocation?

This isn't exactly a resource-constrained machine running x64 code with
12GB of RAM and two quad-core processors in it!

-- 
-- Karl Denninger
/The Market Ticker ®/ http://market-ticker.org
Cuda Systems LLC
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to 

carp on stable/9: is there a way to keep jumbo? (fwd)

2013-03-04 Thread Dmitry Morozovsky
Collegaues,

sorry, sent to the wrong list (the only excuse I have is possibly that
I'm trying to make HAST based on carp...)

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***


-- Forwarded message --
Date: Tue, 5 Mar 2013 02:31:51
From: Dmitry Morozovsky ma...@rinet.ru
To: freebsd...@freebsd.org
Subject: carp on stable/9: is there a way to keep jumbo?

Dear colleagues,

yes, I know glebius@ overhauled carp in -current, but I'm a bit nervous
about deploying a bleeding-edge system on a NAS/SAN ;)

So, my question is about the current state of carp in stable/9: building
an HA pair, I found that carp interfaces lose jumbo-frame capability:

root@cthulhu4:~# ifconfig | grep mtu
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
carp1: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
root@cthulhu4:~# ifconfig carp1 mtu 9000
ifconfig: ioctl (set mtu): Invalid argument

Is it unavoidable at the moment, or am I missing something obvious?

Thanks!

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***

___
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to freebsd-fs-unsubscr...@freebsd.org
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS stalls -- and maybe we should be talking about defaults?

2013-03-04 Thread Steven Hartland

What does zfs-stats -a show when your having the stall issue?

You can also use zfs iostats to show individual disk iostats
which may help identify a single failing disk e.g.
zpool iostat -v 1

Also have you investigated which of the two sysctls you changed
fixed it or does it require both?

   Regards
   Steve

- Original Message - 
From: Karl Denninger k...@denninger.net

To: freebsd-stable@freebsd.org
Sent: Monday, March 04, 2013 10:48 PM
Subject: ZFS stalls -- and maybe we should be talking about defaults?


Well now this is interesting.

I have converted a significant number of filesystems to ZFS over the
last week or so and have noted a few things.  A couple of them aren't so
good.

The subject machine in question has 12GB of RAM and dual Xeon
5500-series processors.  It also has an ARECA 1680ix in it with 2GB of
local cache and the BBU for it.  The ZFS spindles are all exported as
JBOD drives.  I set up four disks under GPT, have a single freebsd-zfs
partition added to them, are labeled and the providers are then
geli-encrypted and added to the pool.  When the same disks were running
on UFS filesystems they were set up as a 0+1 RAID array under the ARECA
adapter, exported as a single unit, GPT labeled as a single pack and
then gpart-sliced and newfs'd under UFS+SU.

Since I previously ran UFS filesystems on this config I know what the
performance level I achieved with that, and the entire system had been
running flawlessly set up that way for the last couple of years.
Presently the machine is running 9.1-Stable, r244942M

Immediately after the conversion I set up a second pool to play with
backup strategies to a single drive and ran into a problem.  The disk I
used for that testing is one that previously was in the rotation and is
also known good.  I began to get EXTENDED stalls with zero I/O going on,
some lasting for 30 seconds or so.  The system was not frozen but
anything that touched I/O would lock until it cleared.  Dedup is off,
incidentally.

My first thought was that I had a bad drive, cable or other physical
problem.  However, searching for that proved fruitless -- there was
nothing being logged anywhere -- not in the SMART data, not by the
adapter, not by the OS.  Nothing.  Sticking a digital storage scope on
the +5V and +12V rails didn't disclose anything interesting with the
power in the chassis; it's stable.  Further, swapping the only disk that
had changed (the new backup volume) with a different one didn't change
behavior either.

The last straw was when I was able to reproduce the stalls WITHIN the
original pool against the same four disks that had been running
flawlessly for two years under UFS, and still couldn't find any evidence
of a hardware problem (not even ECC-corrected data returns.)  All the
disks involved are completely clean -- zero sector reassignments, the
drive-specific log is clean, etc.

Attempting to cut back the ARECA adapter's aggressiveness (buffering,
etc) on the theory that I was tickling something in its cache management
algorithm that was pissing it off proved fruitless as well, even when I
shut off ALL caching and NCQ options.  I also set
vfs.zfs.prefetch_disable=1 to no effect.  H...

Last night after reading the ZFS Tuning wiki for FreeBSD I went on a
lark and limited the ARC cache to 2GB (vfs.zfs.arc_max=20), set
vfs.zfs.write_limit_override to 102400 (1GB) and rebooted.  /*

The problem instantly disappeared and I cannot provoke its return even
with multiple full-bore snapshot and rsync filesystem copies running
while a scrub is being done.*/
/**/
I'm pinging between being I/O and processor (geli) limited now in normal
operation and slamming the I/O channel during a scrub.  It appears that
performance is roughly equivalent, maybe a bit less, than it was with
UFS+SU -- but it's fairly close.

The operating theory I have at the moment is that the ARC cache was in
some way getting into a near-deadlock situation with other memory
demands on the system (there IS a Postgres server running on this
hardware although it's a replication server and not taking queries --
nonetheless it does grab a chunk of RAM) leading to the stalls.
Limiting its grab of RAM appears to have to resolved the contention
issue.  I was unable to catch it actually running out of free memory
although it was consistently into the low five-digit free page count and
the kernel never garfed on the console about resource exhaustion --
other than a bitch about swap stalling (the infamous more than 20
seconds message.)  Page space in use near the time in question (I could
not get a display while locked as it went to I/O and froze) was not
zero, but pretty close to it (a few thousand blocks.)  That the system
was driven into light paging does appear to be significant and
indicative of some sort of memory contention issue as under operation
with UFS filesystems this machine has never been observed to allocate
page space.

Anyone seen anything like this before and if 

Re: carp on stable/9: is there a way to keep jumbo? (fwd)

2013-03-04 Thread Steven Hartland

You might want to try:-
http://blog.multiplay.co.uk/dropzone/freebsd/carp-mtu.patch

Be warned it doesn't do any validation, so if you use it against physical
interfaces with a smaller MTU things will likely go badly wrong; hell,
they may go badly wrong anyway, as it's just a very quick and dirty hack ;-)
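
For anyone wanting to try it, fetching and applying it to a stable/9
checkout goes along these lines (the -p strip level depends on how the
diff was generated, so check the paths inside the file first):

  cd /usr/src
  fetch http://blog.multiplay.co.uk/dropzone/freebsd/carp-mtu.patch
  patch -p0 < carp-mtu.patch
  # then rebuild and install the kernel as usual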

   Regards
   Steve
- Original Message - 
From: Dmitry Morozovsky ma...@rinet.ru

To: freebsd-stable@FreeBSD.org
Sent: Monday, March 04, 2013 10:49 PM
Subject: carp on stable/9: is there a way to keep jumbo? (fwd)



Collegaues,

sorry, sent to the wrong list (the only escuse for me is possibly that I'm 
trying to make HAST base on carp...)


--
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***


-- Forwarded message --
Date: Tue, 5 Mar 2013 02:31:51
From: Dmitry Morozovsky ma...@rinet.ru
To: freebsd...@freebsd.org
Subject: carp on stable/9: is there a way to keep jumbo?

Dear collesagues,

yes, I know glebius@ overhauled carp in -current, but I'm a bit nervous to 
deploy bleeding edge system on a NAS/SAN ;)


So, my question is about current state of carp in stable/9: building HA pair I 
found that carp interfaces lose jumbo capabilities:


root@cthulhu4:~# ifconfig | grep mtu
em0: flags=8943UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST metric 0 mtu 
9000
em1: flags=8943UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST metric 0 mtu 
9000
lo0: flags=8049UP,LOOPBACK,RUNNING,MULTICAST metric 0 mtu 16384
lagg0: flags=8943UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST metric 0 mtu 
9000
carp0: flags=49UP,LOOPBACK,RUNNING metric 0 mtu 1500
carp1: flags=49UP,LOOPBACK,RUNNING metric 0 mtu 1500
root@cthulhu4:~# ifconfig carp1 mtu 9000
ifconfig: ioctl (set mtu): Invalid argument

Is it unavoidable at the moment, or am I missing something obvious?

Thanks!

--
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***

___
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to freebsd-fs-unsubscr...@freebsd.org
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org





___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS stalls -- and maybe we should be talking about defaults?

2013-03-04 Thread Dennis Glatting
I get stalls with 256GB of RAM and arc_max=64G (my limit is usually 25%)
on a 64-core system with 20 new 3TB Seagate disks under LSI2008 chips,
without much load. Interestingly, pbzip2 consistently created a problem
on a volume whereas gzip did not.

Here, stalls happen across several systems; however, I have had fewer
problems under 8.3 than under 9.1. If I go to hardware RAID5 (LSI2008 --
same chips: IR vs. IT) I don't have a problem.




On Mon, 2013-03-04 at 16:48 -0600, Karl Denninger wrote:
 Well now this is interesting.
 
 I have converted a significant number of filesystems to ZFS over the
 last week or so and have noted a few things.  A couple of them aren't so
 good.
 
 The subject machine in question has 12GB of RAM and dual Xeon
 5500-series processors.  It also has an ARECA 1680ix in it with 2GB of
 local cache and the BBU for it.  The ZFS spindles are all exported as
 JBOD drives.  I set up four disks under GPT, have a single freebsd-zfs
 partition added to them, are labeled and the providers are then
 geli-encrypted and added to the pool.  When the same disks were running
 on UFS filesystems they were set up as a 0+1 RAID array under the ARECA
 adapter, exported as a single unit, GPT labeled as a single pack and
 then gpart-sliced and newfs'd under UFS+SU.
 
 Since I previously ran UFS filesystems on this config I know what the
 performance level I achieved with that, and the entire system had been
 running flawlessly set up that way for the last couple of years. 
 Presently the machine is running 9.1-Stable, r244942M
 
 Immediately after the conversion I set up a second pool to play with
 backup strategies to a single drive and ran into a problem.  The disk I
 used for that testing is one that previously was in the rotation and is
 also known good.  I began to get EXTENDED stalls with zero I/O going on,
 some lasting for 30 seconds or so.  The system was not frozen but
 anything that touched I/O would lock until it cleared.  Dedup is off,
 incidentally.
 
 My first thought was that I had a bad drive, cable or other physical
 problem.  However, searching for that proved fruitless -- there was
 nothing being logged anywhere -- not in the SMART data, not by the
 adapter, not by the OS.  Nothing.  Sticking a digital storage scope on
 the +5V and +12V rails didn't disclose anything interesting with the
 power in the chassis; it's stable.  Further, swapping the only disk that
 had changed (the new backup volume) with a different one didn't change
 behavior either.
 
 The last straw was when I was able to reproduce the stalls WITHIN the
 original pool against the same four disks that had been running
 flawlessly for two years under UFS, and still couldn't find any evidence
 of a hardware problem (not even ECC-corrected data returns.)  All the
 disks involved are completely clean -- zero sector reassignments, the
 drive-specific log is clean, etc.
 
 Attempting to cut back the ARECA adapter's aggressiveness (buffering,
 etc) on the theory that I was tickling something in its cache management
 algorithm that was pissing it off proved fruitless as well, even when I
 shut off ALL caching and NCQ options.  I also set
 vfs.zfs.prefetch_disable=1 to no effect.  H...
 
 Last night after reading the ZFS Tuning wiki for FreeBSD I went on a
 lark and limited the ARC cache to 2GB (vfs.zfs.arc_max=20), set
 vfs.zfs.write_limit_override to 102400 (1GB) and rebooted.  /*
 
 The problem instantly disappeared and I cannot provoke its return even
 with multiple full-bore snapshot and rsync filesystem copies running
 while a scrub is being done.*/
 /**/
 I'm pinging between being I/O and processor (geli) limited now in normal
 operation and slamming the I/O channel during a scrub.  It appears that
 performance is roughly equivalent, maybe a bit less, than it was with
 UFS+SU -- but it's fairly close.
 
 The operating theory I have at the moment is that the ARC cache was in
 some way getting into a near-deadlock situation with other memory
 demands on the system (there IS a Postgres server running on this
 hardware although it's a replication server and not taking queries --
 nonetheless it does grab a chunk of RAM) leading to the stalls. 
 Limiting its grab of RAM appears to have to resolved the contention
 issue.  I was unable to catch it actually running out of free memory
 although it was consistently into the low five-digit free page count and
 the kernel never garfed on the console about resource exhaustion --
 other than a bitch about swap stalling (the infamous more than 20
 seconds message.)  Page space in use near the time in question (I could
 not get a display while locked as it went to I/O and froze) was not
 zero, but pretty close to it (a few thousand blocks.)  That the system
 was driven into light paging does appear to be significant and
 indicative of some sort of memory contention issue as under operation
 with UFS filesystems this machine has never been observed to allocate
 page space.
 
 

Re: ZFS stalls -- and maybe we should be talking about defaults?

2013-03-04 Thread Karl Denninger
On 3/4/2013 6:33 PM, Steven Hartland wrote:
 What does zfs-stats -a show when your having the stall issue?

 You can also use zfs iostats to show individual disk iostats
 which may help identify a single failing disk e.g.
 zpool iostat -v 1

 Also have you investigated which of the two sysctls you changed
 fixed it or does it require both?

Regards
Steve

 - Original Message - From: Karl Denninger k...@denninger.net
 To: freebsd-stable@freebsd.org
 Sent: Monday, March 04, 2013 10:48 PM
 Subject: ZFS stalls -- and maybe we should be talking about defaults?


 Well now this is interesting.

 I have converted a significant number of filesystems to ZFS over the
 last week or so and have noted a few things.  A couple of them aren't so
 good.

 The subject machine in question has 12GB of RAM and dual Xeon
 5500-series processors.  It also has an ARECA 1680ix in it with 2GB of
 local cache and the BBU for it.  The ZFS spindles are all exported as
 JBOD drives.  I set up four disks under GPT, have a single freebsd-zfs
 partition added to them, are labeled and the providers are then
 geli-encrypted and added to the pool.  When the same disks were running
 on UFS filesystems they were set up as a 0+1 RAID array under the ARECA
 adapter, exported as a single unit, GPT labeled as a single pack and
 then gpart-sliced and newfs'd under UFS+SU.

 Since I previously ran UFS filesystems on this config I know what the
 performance level I achieved with that, and the entire system had been
 running flawlessly set up that way for the last couple of years.
 Presently the machine is running 9.1-Stable, r244942M

 Immediately after the conversion I set up a second pool to play with
 backup strategies to a single drive and ran into a problem.  The disk I
 used for that testing is one that previously was in the rotation and is
 also known good.  I began to get EXTENDED stalls with zero I/O going on,
 some lasting for 30 seconds or so.  The system was not frozen but
 anything that touched I/O would lock until it cleared.  Dedup is off,
 incidentally.

 My first thought was that I had a bad drive, cable or other physical
 problem.  However, searching for that proved fruitless -- there was
 nothing being logged anywhere -- not in the SMART data, not by the
 adapter, not by the OS.  Nothing.  Sticking a digital storage scope on
 the +5V and +12V rails didn't disclose anything interesting with the
 power in the chassis; it's stable.  Further, swapping the only disk that
 had changed (the new backup volume) with a different one didn't change
 behavior either.

 The last straw was when I was able to reproduce the stalls WITHIN the
 original pool against the same four disks that had been running
 flawlessly for two years under UFS, and still couldn't find any evidence
 of a hardware problem (not even ECC-corrected data returns.)  All the
 disks involved are completely clean -- zero sector reassignments, the
 drive-specific log is clean, etc.

 Attempting to cut back the ARECA adapter's aggressiveness (buffering,
 etc) on the theory that I was tickling something in its cache management
 algorithm that was pissing it off proved fruitless as well, even when I
 shut off ALL caching and NCQ options.  I also set
 vfs.zfs.prefetch_disable=1 to no effect.  H...

 Last night after reading the ZFS Tuning wiki for FreeBSD I went on a
 lark and limited the ARC cache to 2GB (vfs.zfs.arc_max=20), set
 vfs.zfs.write_limit_override to 102400 (1GB) and rebooted.  /*

 The problem instantly disappeared and I cannot provoke its return even
 with multiple full-bore snapshot and rsync filesystem copies running
 while a scrub is being done.*/
 /**/
 I'm pinging between being I/O and processor (geli) limited now in normal
 operation and slamming the I/O channel during a scrub.  It appears that
 performance is roughly equivalent, maybe a bit less, than it was with
 UFS+SU -- but it's fairly close.

 The operating theory I have at the moment is that the ARC cache was in
 some way getting into a near-deadlock situation with other memory
 demands on the system (there IS a Postgres server running on this
 hardware although it's a replication server and not taking queries --
 nonetheless it does grab a chunk of RAM) leading to the stalls.
 Limiting its grab of RAM appears to have to resolved the contention
 issue.  I was unable to catch it actually running out of free memory
 although it was consistently into the low five-digit free page count and
 the kernel never garfed on the console about resource exhaustion --
 other than a bitch about swap stalling (the infamous more than 20
 seconds message.)  Page space in use near the time in question (I could
 not get a display while locked as it went to I/O and froze) was not
 zero, but pretty close to it (a few thousand blocks.)  That the system
 was driven into light paging does appear to be significant and
 indicative of some sort of memory contention issue as under operation
 with UFS 

Re: ZFS stalls -- and maybe we should be talking about defaults?

2013-03-04 Thread Karl Denninger
Stick this in /boot/loader.conf and see if your lockups goes away:

vfs.zfs.write_limit_override=102400

I've got a sentinel running that watches for zero-bandwidth "zpool
iostat 5" samples; it has been running for close to 12 hours now, and
with the two tunables I changed, the stalls do not appear to be
happening any more.

This system always has small-ball write I/Os going to it, as it's a
postgresql hot-standby mirror backing a VERY active system and is
receiving streaming log data from the primary at a colocation site, so
the odds of it ever experiencing an actual zero for I/O (unless there's
a connectivity problem) are pretty remote.
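
A crude sketch of such a sentinel (not the actual script; "tank" is an
assumed pool name, and fields 6 and 7 of the per-interval output are the
read/write bandwidth columns):

  #!/bin/sh
  # flag any 5-second zpool iostat sample with zero read and write bandwidth
  zpool iostat tank 5 | while read pool alloc free rops wops rbw wbw; do
      if [ "$rbw" = "0" ] && [ "$wbw" = "0" ]; then
          echo "$(date): zero-bandwidth sample on $pool"
      fi
  done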

If it turns out that the write_limit_override tunable is the one
responsible for stopping the hangs I can drop the ARC limit tunable
although I'm not sure I want to; I don't see much if any performance
penalty from leaving it where it is and if the larger cache isn't
helping anything then why use it?  I'm inclined to stick an SSD in the
cabinet as a cache drive instead of dedicating RAM to this -- even
though it's not AS fast as RAM it's still MASSIVELY quicker than getting
data off a rotating plate of rust.

Am I correct that a ZFS filesystem does NOT use the VM buffer cache at all?

On 3/4/2013 8:07 PM, Dennis Glatting wrote:
 I get stalls with 256GB of RAM with arc_max=64G (my limit is usually 25%
 ) on a 64 core system with 20 new 3TB Seagate disks under LSI2008 chips
 without much load. Interestingly pbzip2 consistently created a problem
 on a volume whereas gzip does not.

 Here, stalls happen across several systems however I have had less
 problems under 8.3 than 9.1. If I go to hardware RAID5 (LSI2008 -- same
 chips: IR vs IT) I don't have a problem.




 On Mon, 2013-03-04 at 16:48 -0600, Karl Denninger wrote:
 Well now this is interesting.

 I have converted a significant number of filesystems to ZFS over the
 last week or so and have noted a few things.  A couple of them aren't so
 good.

 The subject machine in question has 12GB of RAM and dual Xeon
 5500-series processors.  It also has an ARECA 1680ix in it with 2GB of
 local cache and the BBU for it.  The ZFS spindles are all exported as
 JBOD drives.  I set up four disks under GPT, have a single freebsd-zfs
 partition added to them, are labeled and the providers are then
 geli-encrypted and added to the pool.  When the same disks were running
 on UFS filesystems they were set up as a 0+1 RAID array under the ARECA
 adapter, exported as a single unit, GPT labeled as a single pack and
 then gpart-sliced and newfs'd under UFS+SU.

 Since I previously ran UFS filesystems on this config I know what the
 performance level I achieved with that, and the entire system had been
 running flawlessly set up that way for the last couple of years. 
 Presently the machine is running 9.1-Stable, r244942M

 Immediately after the conversion I set up a second pool to play with
 backup strategies to a single drive and ran into a problem.  The disk I
 used for that testing is one that previously was in the rotation and is
 also known good.  I began to get EXTENDED stalls with zero I/O going on,
 some lasting for 30 seconds or so.  The system was not frozen but
 anything that touched I/O would lock until it cleared.  Dedup is off,
 incidentally.

 My first thought was that I had a bad drive, cable or other physical
 problem.  However, searching for that proved fruitless -- there was
 nothing being logged anywhere -- not in the SMART data, not by the
 adapter, not by the OS.  Nothing.  Sticking a digital storage scope on
 the +5V and +12V rails didn't disclose anything interesting with the
 power in the chassis; it's stable.  Further, swapping the only disk that
 had changed (the new backup volume) with a different one didn't change
 behavior either.

 The last straw was when I was able to reproduce the stalls WITHIN the
 original pool against the same four disks that had been running
 flawlessly for two years under UFS, and still couldn't find any evidence
 of a hardware problem (not even ECC-corrected data returns.)  All the
 disks involved are completely clean -- zero sector reassignments, the
 drive-specific log is clean, etc.

 Attempting to cut back the ARECA adapter's aggressiveness (buffering,
 etc) on the theory that I was tickling something in its cache management
 algorithm that was pissing it off proved fruitless as well, even when I
 shut off ALL caching and NCQ options.  I also set
 vfs.zfs.prefetch_disable=1 to no effect.  H...

 Last night after reading the ZFS Tuning wiki for FreeBSD I went on a
 lark and limited the ARC cache to 2GB (vfs.zfs.arc_max=20), set
 vfs.zfs.write_limit_override to 102400 (1GB) and rebooted.  /*

 The problem instantly disappeared and I cannot provoke its return even
 with multiple full-bore snapshot and rsync filesystem copies running
 while a scrub is being done.*/
 /**/
 I'm pinging between being I/O and processor (geli) limited now in normal
 operation and slamming the I/O channel 

Re: ZFS stalls -- and maybe we should be talking about defaults?

2013-03-04 Thread Steven Hartland
- Original Message - 
From: Karl Denninger k...@denninger.net



Stick this in /boot/loader.conf and see if your lockups goes away:

vfs.zfs.write_limit_override=102400

...


If it turns out that the write_limit_override tunable is the one
responsible for stopping the hangs I can drop the ARC limit tunable
although I'm not sure I want to; I don't see much if any performance
penalty from leaving it where it is and if the larger cache isn't
helping anything then why use it?  I'm inclined to stick an SSD in the
cabinet as a cache drive instead of dedicating RAM to this -- even
though it's not AS fast as RAM it's still MASSIVELY quicker than getting
data off a rotating plate of rust.


Now, interesting you should say that: I've recently seen a stall on a
ZFS-only box running on 6 x SSD RAIDZ2.

The stall was caused by a fairly large MySQL import, with nothing else
running.

When it happened I thought the machine had wedged, but minutes (not
seconds) later, everything sprang into action again.


Am I correct that a ZFS filesystem does NOT use the VM buffer cache
at all?


Correct

   Regards
   Steve



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS stalls -- and maybe we should be talking about defaults?

2013-03-04 Thread Karl Denninger

On 3/4/2013 9:25 PM, Steven Hartland wrote:
 - Original Message - From: Karl Denninger k...@denninger.net

 Stick this in /boot/loader.conf and see if your lockups goes away:

 vfs.zfs.write_limit_override=102400
 ...

 If it turns out that the write_limit_override tunable is the one
 responsible for stopping the hangs I can drop the ARC limit tunable
 although I'm not sure I want to; I don't see much if any performance
 penalty from leaving it where it is and if the larger cache isn't
 helping anything then why use it?  I'm inclined to stick an SSD in the
 cabinet as a cache drive instead of dedicating RAM to this -- even
 though it's not AS fast as RAM it's still MASSIVELY quicker than getting
 data off a rotating plate of rust.

 Now interesting you should say that I've seen a stall recently on ZFS
 only box running on 6 x SSD RAIDZ2.

 The stall was caused by fairly large mysql import, with nothing else
 running.

 Then it happened I thought the machine had wedged, but minutes (not
 seconds) later, everything sprung into action again.

That's exactly what I can reproduce here; the stalls are anywhere from a
few seconds to well north of a half-minute.  It looks like the machine
is hung -- but it is not.

The machine in question normally runs with zero swap allocated, but it
always has 1.5GB of shared memory allocated to Postgres (shared_buffers
= 1500MB in its config file).

I wonder if the ARC cache management code is misbehaving when shared
segments are in use?

-- 
-- Karl Denninger
/The Market Ticker ®/ http://market-ticker.org
Cuda Systems LLC
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS stalls -- and maybe we should be talking about defaults?

2013-03-04 Thread Dennis Glatting
On Mon, 2013-03-04 at 20:58 -0600, Karl Denninger wrote:
 Stick this in /boot/loader.conf and see if your lockups goes away:
 
 vfs.zfs.write_limit_override=102400
 

K.


 I've got a sentinal running that watches for zero-bandwidth zpool
 iostat 5s that has been running for close to 12 hours now and with the
 two tunables I changed it doesn't appear to be happening any more.
 

I've also done this, as well as watching top and systat -vmstat. Disk
I/O stops but the system stays alive as seen through top, systat, and
the network. However, if I try to log in, the login won't complete.

All of my systems are hardware RAID1 for the OS (LSI and Areca) and
typically a separate disk for swap. All other disks are ZFS.

 This system always has small-ball write I/Os going to it as it's a
 postgresql hot standby mirror backing a VERY active system and is
 receiving streaming logdata from the primary at a colocation site, so
 the odds of it ever experiencing an actual zero for I/O (unless there's
 a connectivity problem) is pretty remote.
 

I am doing multi TB sorts and GB database loads.


 If it turns out that the write_limit_override tunable is the one
 responsible for stopping the hangs I can drop the ARC limit tunable
 although I'm not sure I want to; I don't see much if any performance
 penalty from leaving it where it is and if the larger cache isn't
 helping anything then why use it?  I'm inclined to stick an SSD in the
 cabinet as a cache drive instead of dedicating RAM to this -- even
 though it's not AS fast as RAM it's still MASSIVELY quicker than getting
 data off a rotating plate of rust.
 

I forgot to mention that my three 8.3 systems occasionally offline a
disk (one or two a week, total). I simply online the disk and after the
resilver all is well. There are ~40 disks across those three systems. Of
my 9.1 systems, three are busy but have a smaller number of disks (about
eight across two volumes, a RAIDZ2 and a mirror).

I also have a ZFS-on-Linux (CentOS) system for play (about 12 disks). It
did not exhibit problems when it was in use but it did teach me a lesson
on the evils of dedup. :)


 Am I correct that a ZFS filesystem does NOT use the VM buffer cache at all?
 

Dunno.


 On 3/4/2013 8:07 PM, Dennis Glatting wrote:
  I get stalls with 256GB of RAM and arc_max=64G (my limit is usually
  25%) on a 64-core system with 20 new 3TB Seagate disks under LSI2008
  chips, without much load.  Interestingly, pbzip2 consistently creates a
  problem on a volume whereas gzip does not.
 
  Here, stalls happen across several systems; however, I have had fewer
  problems under 8.3 than 9.1.  If I go to hardware RAID5 (LSI2008 -- same
  chips: IR vs IT) I don't have a problem.
 
 
 
 
  On Mon, 2013-03-04 at 16:48 -0600, Karl Denninger wrote:
  Well now this is interesting.
 
  I have converted a significant number of filesystems to ZFS over the
  last week or so and have noted a few things.  A couple of them aren't so
  good.
 
  The subject machine in question has 12GB of RAM and dual Xeon
  5500-series processors.  It also has an ARECA 1680ix in it with 2GB of
  local cache and the BBU for it.  The ZFS spindles are all exported as
  JBOD drives.  I set up four disks under GPT, added a single freebsd-zfs
  partition to each, labeled them, and the providers were then
  geli-encrypted and added to the pool.  When the same disks were running
  on UFS filesystems they were set up as a 0+1 RAID array under the ARECA
  adapter, exported as a single unit, GPT-labeled as a single pack and
  then gpart-sliced and newfs'd under UFS+SU.
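
Spelled out for a single disk, that kind of setup is roughly the sequence
below; device names, labels, the pool name, and the raidz layout are
illustrative (the thread does not say how the four disks are arranged):

    # GPT with a single labeled freebsd-zfs partition
    gpart create -s gpt da1
    gpart add -t freebsd-zfs -l zdisk1 da1

    # Encrypt the labeled provider with geli and attach it
    geli init -s 4096 /dev/gpt/zdisk1
    geli attach /dev/gpt/zdisk1

    # The resulting .eli providers (one per disk) go into the pool
    zpool create tank raidz /dev/gpt/zdisk1.eli /dev/gpt/zdisk2.eli \
        /dev/gpt/zdisk3.eli /dev/gpt/zdisk4.eli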
 
  Since I previously ran UFS filesystems on this config, I know what
  performance level I achieved with that, and the entire system had been
  running flawlessly set up that way for the last couple of years.
  Presently the machine is running 9.1-STABLE, r244942M.
 
  Immediately after the conversion I set up a second pool to play with
  backup strategies to a single drive and ran into a problem.  The disk I
  used for that testing is one that previously was in the rotation and is
  also known good.  I began to get EXTENDED stalls with zero I/O going on,
  some lasting for 30 seconds or so.  The system was not frozen but
  anything that touched I/O would lock until it cleared.  Dedup is off,
  incidentally.
 
  My first thought was that I had a bad drive, cable or other physical
  problem.  However, searching for that proved fruitless -- there was
  nothing being logged anywhere -- not in the SMART data, not by the
  adapter, not by the OS.  Nothing.  Sticking a digital storage scope on
  the +5V and +12V rails didn't disclose anything interesting with the
  power in the chassis; it's stable.  Further, swapping the only disk that
  had changed (the new backup volume) with a different one didn't change
  behavior either.
 
  The last straw was when I was able to reproduce the stalls WITHIN the
  original pool against the same four disks that had been running
  flawlessly for two years under UFS, and still 

Re: ZFS stalls -- and maybe we should be talking about defaults?

2013-03-04 Thread Dennis Glatting
On Tue, 2013-03-05 at 03:25 +, Steven Hartland wrote:
 - Original Message - 
 From: Karl Denninger k...@denninger.net
 
  Stick this in /boot/loader.conf and see if your lockups go away:
 
  vfs.zfs.write_limit_override=102400
 ...
 
  If it turns out that the write_limit_override tunable is the one
  responsible for stopping the hangs, I can drop the ARC limit tunable,
  although I'm not sure I want to; I don't see much, if any, performance
  penalty from leaving it where it is, and if the larger cache isn't
  helping anything then why use it?  I'm inclined to stick an SSD in the
  cabinet as a cache drive instead of dedicating RAM to this -- even
  though it's not AS fast as RAM, it's still MASSIVELY quicker than getting
  data off a rotating plate of rust.
 
 Now it's interesting you should say that: I've seen a stall recently on
 a ZFS-only box running on a 6 x SSD RAIDZ2.
 
 The stall was caused by a fairly large MySQL import, with nothing else
 running.
 
 When it happened I thought the machine had wedged, but minutes (not
 seconds) later, everything sprang into action again.
 

I've seen this too.


  Am I correct that a ZFS filesystem does NOT use the VM buffer cache
  at all?
 
 Correct
 
 Regards
 Steve
 
 

-- 
Dennis Glatting d...@pki2.com



Re: ZFS stalls -- and maybe we should be talking about defaults?

2013-03-04 Thread Steven Hartland
- Original Message - 
From: Karl Denninger k...@denninger.net

When it happened I thought the machine had wedged, but minutes (not
seconds) later, everything sprang into action again.


That's exactly what I can reproduce here; the stalls are anywhere from a
few seconds to well north of a half-minute.  It looks like the machine
is hung -- but it is not.


Out of interest, when this happens for you, is syncer using lots of CPU?

If it's anything like my stalls, you'll need top loaded prior to the fact.

   Regards
   Steve




Re: ZFS stalls -- and maybe we should be talking about defaults?

2013-03-04 Thread Karl Denninger

On 3/4/2013 10:01 PM, Steven Hartland wrote:
 - Original Message - From: Karl Denninger k...@denninger.net
 When it happened I thought the machine had wedged, but minutes (not
 seconds) later, everything sprang into action again.

 That's exactly what I can reproduce here; the stalls are anywhere from a
 few seconds to well north of a half-minute.  It looks like the machine
 is hung -- but it is not.

 Out of interest, when this happens for you, is syncer using lots of CPU?
 
 If it's anything like my stalls, you'll need top loaded prior to the fact.

Regards
Steve
Don't know.  But the CPU is getting hammered when it happens because I
am geli-encrypting all my drives and as a consequence it is not at all
uncommon for the load average to be north of 10 when the system is under
heavy I/O load.  System response is fine right up until it stalls.

I'm going to put some effort into trying to isolate exactly what is
going on here in the coming days since I happen to have a spare box in
an identical configuration that I can afford to lock up without
impacting anyone doing real work :-)
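
Since a stall also blocks starting new processes, the monitoring has to be
running before the event; one way to do that with standard top(1) flags
(the log path is made up):

    # Interactive: -S shows system processes such as the syncer, -H shows threads
    top -S -H -s 2

    # Or keep batch-mode snapshots you can read back after a stall
    while :; do top -S -H -b -d 1 >> /var/log/top-snapshots.log; sleep 2; done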

-- 
-- Karl Denninger
/The Market Ticker ®/ http://market-ticker.org
Cuda Systems LLC


Re: ZFS stalls -- and maybe we should be talking about defaults?

2013-03-04 Thread Ben Morrow
Quoth Karl Denninger k...@denninger.net:
 
 Note that the machine is not booting from ZFS -- it is booting from and
 has its swap on a UFS 2-drive mirror (handled by the disk adapter; looks
 like a single da0 drive to the OS) and that drive stalls as well when
 it freezes.  It's definitely a kernel thing when it happens as the OS
 would otherwise not have locked (just I/O to the user partitions) -- but
 it does. 

Is it still the case that mixing UFS and ZFS can cause problems, or were
they all fixed? I remember a while ago (before the ARC usage monitoring
code was added) there were a number of reports of serious problems
running an rsync from UFS to ZFS.

If you can, it might be worth trying your scratch machine booting from
ZFS. Probably the best way is to leave your swap partition where it is
(IMHO it's not worth trying to swap onto a zvol) and convert the UFS
partition into a separate zpool to boot from. You will also need to
replace the boot blocks; assuming you're using GPT, you can do this with
gpart bootcode -p /boot/gptzfsboot -i N disk, where N is the index of
the freebsd-boot partition.
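
A sketch of that conversion for a GPT disk follows; the disk name,
partition indices, and pool name are examples, so check gpart show first
(repurposing the UFS partition destroys its contents, and copying the root
filesystem over plus setting mountpoints is omitted here):

    gpart show da0                                   # confirm the layout first
    gpart modify -t freebsd-zfs -i 2 da0             # retype the old UFS partition
    zpool create zroot da0p2
    zpool set bootfs=zroot zroot
    gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0   # freebsd-boot is index 1 here
    # plus zfs_load="YES" and vfs.root.mountfrom="zfs:zroot" in /boot/loader.conf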

Ben



Re: ZFS stalls -- and maybe we should be talking about defaults?

2013-03-04 Thread Jeremy Chadwick
On Tue, Mar 05, 2013 at 05:05:47AM +, Ben Morrow wrote:
 Quoth Karl Denninger k...@denninger.net:
  
  Note that the machine is not booting from ZFS -- it is booting from and
  has its swap on a UFS 2-drive mirror (handled by the disk adapter; looks
  like a single da0 drive to the OS) and that drive stalls as well when
  it freezes.  It's definitely a kernel thing when it happens as the OS
  would otherwise not have locked (just I/O to the user partitions) -- but
  it does. 
 
 Is it still the case that mixing UFS and ZFS can cause problems, or were
 they all fixed? I remember a while ago (before the ARC usage monitoring
 code was added) there were a number of reports of serious problems
 running an rsync from UFS to ZFS.

This problem still exists on stable/9.  The behaviour manifests itself
as fairly bad performance (I cannot remember whether it was stalling or
just awful throughput rates).  I can only speculate as to what the root
cause is, but my guess is that it has something to do with the two
caching systems (UFS vs. ZFS ARC) fighting over large sums of memory.

The advice I've given people in the past is: if you do a LOT of I/O
between UFS and ZFS on the same box, it's time to move to 100% ZFS.
That said, I still do not recommend ZFS for a root filesystem (this
still bites people even today), and swap-on-ZFS is a huge no-no.

I will note that I myself use pure UFS+SU (not SUJ) for my main OS
installation (that means /, swap, /var, /tmp, and /usr) on a dedicated
SSD, while everything else is ZFS raidz1 (no dedup, no compression;
won't ever enable these until that thread priority problem is fixed on
FreeBSD).

However, when I was migrating from gmirror+UFS+SU to ZFS, I witnessed
what I described in my 1st and 2nd paragraphs.  What userland utilities
were used (rsync vs. cp) made no difference; the problem is in the
kernel.

Footnote about this thread:

This thread contains all sorts of random pieces of information about
systems, with very little actual detail in them (barring the symptoms,
which are always useful to know!).

For example, just because your machine has 8 cores and 12GB of RAM
doesn't mean jack squat if some software in the kernel is designed
oddly.  Reworded: throwing more hardware at a problem solves nothing.

The most useful thing (for me) that I found was deep within the thread,
a few words along the lines of "De-dup isn't used".  What about
compression, and whether it has *ever* been enabled on the filesystem
(even if not presently enabled)?  It matters.  All of this matters.
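
Checking that takes a few seconds with standard properties; the ratios
also reveal whether the features were ever used, since data written while
they were enabled keeps contributing to them (pool name is a placeholder):

    # A compressratio or dedupratio above 1.00x means compressed or deduped
    # data is still on disk, even if the feature is currently "off".
    zfs get -r compression,compressratio,dedup tank
    zpool get dedupratio tank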

I see lots of end-users talking about these problems, but (barring
Steven) literally no kernel people who are in the know about ZFS
explaining how said users can get them (the devs) the information that
can help track this down.  Those devs live on freebsd-fs@ and
freebsd-hackers@, and not too many read freebsd-stable@.
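
For what it's worth, a starting set of details to attach when reporting
one of these stalls might look like this (just a suggestion with a
placeholder pool name, not an official checklist):

    uname -a                                # exact kernel revision
    zpool status -v ; zpool get all tank
    zfs get -r compression,dedup tank
    gpart show ; geli status                # partitioning and encryption layout
    camcontrol devlist                      # disk models and firmware
    sysctl vfs.zfs kstat.zfs.misc.arcstats  # ZFS tunables and ARC statistics
    vmstat -z                               # UMA zone usage
    procstat -kk -a > /tmp/stacks.txt       # kernel stacks, captured during a stall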

Step back for a moment and look at this anti-KISS configuration:

- Hardware RAID controller involved (Areca 1680ix)
- Hardware RAID controller has its own battery-backed cache (2GB)
- Therefore arcmsr(4) is involved -- revision of driver/OS build
  matters here, ditto with firmware version
- 4 disks are involved, models unknown
- Disks are GPT and are *partitioned*, and ZFS refers to the partitions,
  not the raw disks -- this matters (honest, it really does; the ZFS
  code handles things differently with raw disks)
- Providers are GELI-encrypted

Now ask yourself if any dev is really going to tackle this one given the
above mess.

My advice would be to get rid of the hardware RAID (go with Intel ICHxx
or ESBx on-board with AHCI), use raw disks for ZFS (for 4096-byte-sector
disks use the gnop(8) method, which is a one-time thing), and get rid of
GELI.  If you can reproduce the problem there 100% of the time, awesome;
it's a clean/clear setup for someone to help investigate.
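
The gnop(8) method Jeremy mentions is the usual trick for getting a
4K-aligned (ashift=12) pool; roughly, with example device and pool names:

    # Create a temporary 4096-byte-sector pass-through and build the pool on it
    gnop create -S 4096 /dev/ada1
    zpool create tank ada1.nop

    # The alignment is now recorded in the vdev label; the nop layer can go away
    zpool export tank
    gnop destroy /dev/ada1.nop
    zpool import tank

    zdb -C tank | grep ashift               # should report ashift: 12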

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: ZFS stalls -- and maybe we should be talking about defaults?

2013-03-04 Thread Garrett Wollman
In article 8c68812328e3483ba9786ef155911...@multiplay.co.uk,
kill...@multiplay.co.uk writes:

Now it's interesting you should say that: I've seen a stall recently on
a ZFS-only box running on a 6 x SSD RAIDZ2.

The stall was caused by a fairly large MySQL import, with nothing else
running.

When it happened I thought the machine had wedged, but minutes (not
seconds) later, everything sprang into action again.

I have certainly seen what you might describe as stalls, caused, so
far as I can tell, by kernel memory starvation.  I've seen it take as
much as half an hour to recover from these (which is too long for my
users).  Right now I have the ARC limited to 64 GB (on a 96 GB file
server) and that has made it more stable, but it's still not behaving
quite as I would like, and I'm looking to put more memory into the
system (to be used for non-ARC functions).  Looking at my munin
graphs, I find that backups in particular put very heavy pressure on
memory, doubling the UMA allocations over steady state, and this takes
about four or five hours to climb back down.  See
http://people.freebsd.org/~wollman/vmstat_z-day.png for an example.
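
For reference, the ARC cap is a boot-time tunable, and the UMA pressure in
those graphs can be watched directly with vmstat; the values below are
illustrative, not Garrett's exact settings:

    # /boot/loader.conf -- cap the ARC at 64 GB (value is in bytes)
    vfs.zfs.arc_max="68719476736"

    # At runtime, compare the cap with the actual ARC size
    sysctl vfs.zfs.arc_max kstat.zfs.misc.arcstats.size

    # UMA zones; the zio_* and arc_* zones are the interesting ones here
    vmstat -z | egrep 'ITEM|zio|arc'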

Some of the stalls are undoubtedly caused by internal fragmentation
rather than actual data in use.  (Solaris used to have this issue, and
some hooks were added to allow some amount of garbage collection with
the cooperation of the filesystem.)

-GAWollman
