FreeBSD Quarterly Status Report, July-September 2012.
Introduction

This report covers FreeBSD-related projects between July and September 2012. This is the third of the four reports planned for 2012. Highlights from this quarter include successful participation in Google Summer of Code, major work in areas of the source and ports trees, and a Developer Summit attended by over 30 developers. Thanks to all the reporters for the excellent work! This report contains 12 entries and we hope you enjoy reading it.

__

Projects
 * FreeBSD on Altera FPGAs
 * Native iSCSI Target
 * Parallel rc.d execution

FreeBSD Team Reports
 * FreeBSD Bugbusting Team
 * FreeBSD Foundation
 * The FreeBSD Core Team

Kernel
 * FreeBSD on ARMv6/ARMv7

Documentation
 * The FreeBSD Japanese Documentation Project

Ports
 * KDE/FreeBSD
 * Ports Collection

Miscellaneous
 * FreeBSD Developer Summit, Cambridge, UK

FreeBSD in Google Summer of Code
 * Google Summer of Code 2012

__

FreeBSD Bugbusting Team

URL: http://www.FreeBSD.org/support.html#gnats
URL: https://wiki.freebsd.org/BugBusting

Contact: Eitan Adler ead...@freebsd.org
Contact: Gavin Atkinson ga...@freebsd.org
Contact: Oleksandr Tymoshenko go...@freebsd.org

In August, Eitan Adler (eadler@) and Oleksandr Tymoshenko (gonzo@) joined the Bugmeister team. At the same time, Remko Lodder and Volker Werth stepped down. We extend our thanks to Volker and Remko for their work in the past, and welcome Oleksandr and Eitan. Eitan and Oleksandr have been working hard on migrating away from GNATS, and have made significant progress on evaluating new software and on creating scripts to export data from GNATS.

The bugbusting team continues working to make the contents of the GNATS PR database cleaner and more accessible, and to make it easier for committers to find and resolve PRs, by tagging PRs to indicate the areas involved and by ensuring that each PR contains sufficient information to resolve the issue.

As always, anybody interested in helping out with the PR queue is welcome to join us in #freebsd-bugbusters on EFnet. We are always looking for additional help, whether your interests lie in triaging incoming PRs, generating patches to resolve existing problems, or simply helping with the database housekeeping (identifying duplicate PRs, ones that have already been resolved, etc.). This is a great way of getting more involved with FreeBSD!

Open tasks:
 1. Further research into tools suitable to replace GNATS.
 2. Get more users involved with triaging PRs as they come in.
 3. Assist committers with closing PRs.

__

FreeBSD Developer Summit, Cambridge, UK

URL: https://wiki.freebsd.org/201208DevSummit

Contact: Robert Watson rwat...@freebsd.org

At the end of August, an off-season Developer Summit was held in Cambridge, UK at the University of Cambridge Computer Laboratory. This was a three-day event, with a documentation summit scheduled for the day before. Each day of the main event was split into sessions, with two tracks in each. Some sessions even involved ARM developers from the neighborhood, which proved to be productive and led to further engagement between the FreeBSD community and ARM. The schedule was finalized on the first day, spawning a plethora of topics to discuss, after which attendees split into groups. A short summary from each of the groups was presented in the final session and then published on the event's home page on the FreeBSD wiki. This summit contributed greatly to arriving at a tentative plan for throwing the switch to make clang the default compiler on HEAD.
This was further discussed on the mailing list, and has now happened, bringing us one big step closer to a GPL-free FreeBSD 10. As part of the program, an afternoon of short talks from researchers at the Cambridge Computer Laboratory covered either operating systems work in general or FreeBSD in particular. Robert Watson showed off a tablet running FreeBSD on a MIPS-compatible soft-core processor on an Altera FPGA. In association with the event, a dinner was hosted by St. John's College and co-sponsored by Google and the FreeBSD Foundation. The day after the conference, a trip was organized to Bletchley Park, which was celebrating Turing's centenary in 2012.

__

FreeBSD Foundation

URL: http://www.freebsdfoundation.org/press/2012Jul-newsletter.shtml

Contact: Deb Goodkin d...@freebsdfoundation.org

The Foundation hosted and sponsored the Cambridge FreeBSD developer summit in
FreeBSD Quarterly Status Report, October-December 2012.
Introduction

This report covers FreeBSD-related projects between October and December 2012. This is the last of the four reports planned for 2012. Highlights from this status report include a very successful EuroBSDCon 2012 conference and associated FreeBSD Developer Summit, both held in Warsaw, Poland. Other highlights are several projects related to the FreeBSD port to the ARM architecture, extending support for platforms, boards and CPUs, improvements to the performance of the pf(4) firewall, and a new native iSCSI target. Thanks to all the reporters for the excellent work! This report contains 27 entries and we hope you enjoy reading it.

The deadline for submissions covering the period between January and March 2013 is April 21st, 2013.

__

Projects
 * BHyVe
 * Native iSCSI Target
 * NFS Version 4
 * pxe_http -- booting FreeBSD from apache
 * UEFI
 * Unprivileged install and image creation

Userland Programs
 * BSD-licenced patch(1)
 * bsdconfig(8)

FreeBSD Team Reports
 * FreeBSD Core Team
 * FreeBSD Documentation Engineering
 * FreeBSD Foundation
 * Postmaster

Kernel
 * AMD GPUs kernel-modesetting support
 * Common Flash Interface (CFI) driver improvements
 * SMP-Friendly pf(4)
 * Unmapped I/O

Documentation
 * The FreeBSD Japanese Documentation Project

Architectures
 * Compiler improvements for FreeBSD/ARMv6
 * FreeBSD on AARCH64
 * FreeBSD on BeagleBone
 * FreeBSD on Raspberry Pi

Ports
 * FreeBSD Haskell Ports
 * KDE/FreeBSD
 * Ports Collection
 * Xfce

Miscellaneous
 * EuroBSDcon 2012
 * FreeBSD Developer Summit, Warsaw

__

AMD GPUs kernel-modesetting support

URL: https://wiki.FreeBSD.org/AMD_GPU
URL: http://people.FreeBSD.org/~kib/misc/ttm.1.patch

Contact: Alexander Kabaev k...@freebsd.org
Contact: Jean-Sébastien Pédron dumbb...@freebsd.org
Contact: Konstantin Belousov k...@freebsd.org

Jean-Sébastien Pédron started to port the AMD GPU driver from Linux to FreeBSD 10-CURRENT in January 2013. This work is based on a previous effort by Alexander Kabaev. Konstantin Belousov provided the initial port of the TTM memory manager. As of this writing, the driver builds, but the tested device fails to attach. Status updates will be posted to the FreeBSD wiki.

__

BHyVe

URL: https://wiki.FreeBSD.org/BHyVe
URL: http://www.bhyve.org/

Contact: Neel Natu n...@freebsd.org
Contact: Peter Grehan gre...@freebsd.org

BHyVe is a type-2 hypervisor for FreeBSD/amd64 hosts with Intel VT-x and EPT CPU support. The bhyve project branch was merged into CURRENT on January 18. Work is progressing on performance, ease of use, AMD SVM support, and the ability to run non-FreeBSD operating systems.

Open tasks:
 1. Booting Linux/*BSD/Windows.
 2. Moving the codebase to a more modular design consisting of a small base and loadable modules.
 3. Various hypervisor features such as suspend/resume, live migration, and sparse disk support.

__

BSD-licenced patch(1)

URL: http://code.google.com/p/bsd-patch/

Contact: Pedro Giffuni p...@freebsd.org
Contact: Gabor Kovesdan ga...@freebsd.org
Contact: Xin Li delp...@freebsd.org

For a while, FreeBSD has been using a very old version of GNU patch that is partially under the GPLv2. The original GNU patch utility is based on an initial implementation by Larry Wall that was not actually copyleft. OpenBSD made many enhancements to an older non-copyleft version of patch; this version was later adopted and further refined by DragonFlyBSD and NetBSD, but there was no centralized development of the tool, and FreeBSD kept working independently.
In less than a week, we took the version from DragonFlyBSD and adapted the FreeBSD enhancements to make it behave closer to the version used natively in FreeBSD. Most of the work was done by Pedro Giffuni, adapting patches from sepotvin@ and ed@; additional contributions were made by Christoph Mallon, Gabor Kovesdan and Xin Li. As a result, we now have a new version of patch committed in head/usr.bin/patch that you can try by using WITH_BSD_PATCH in your builds. The new patch(1) does not support the FreeBSD-specific -I and -S options, which do not seem necessary. In GNU patch, -I actually means 'ignore whitespaces', and we now support that too.

Open tasks:
 1. Testing. A lot more testing.

__

bsdconfig(8)

URL:
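For anyone who wants to try it, here is a minimal sketch of exercising the knob; the WITH_BSD_PATCH name comes from the report, while the paths assume a standard source checkout:

  # enable the BSD-licensed patch(1) for builds of the base system
  echo 'WITH_BSD_PATCH=yes' >> /etc/src.conf

  # rebuild and install just the utility from a /usr/src checkout
  cd /usr/src/usr.bin/patch
  make obj && make && make install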
Re: FreeBSD 9.1 - openldap slapd lockups, mutex problems
Hi,

I've tested it in an 8.3R jail on a 9.1R host, same setup, and the problem is still there. So it may be a kernel bug in 9.1R.

On 14/02/2013 10:19:45, Oliver Brandmueller o...@e-gitt.net wrote:

Hi,

On Thu, Feb 14, 2013 at 03:13:57AM +0100, Pierre Guinoiseau wrote:

I have seen openldap spin the cpu and even run out of memory to get killed on some of our test systems running ~9.1-rel with zfs. [...]

I have the same problem too, inside a jail, stored on ZFS. I've tried various tunings in slapd.conf, but none fixed the problem. While hanging, db_stat -c shows that all locks are being used; I've tried to set the limit really high, far more than normally needed, but it didn't help. I may have the same problem with amavisd-new, but I have to verify that to be sure the symptoms are similar.

I have amd64 9.1-STABLE r245456 (about Jan 15) running. I have openldap-server-2.4.33_2 running, depending on libltdl-2.4.2 and db46-4.6.21.4. The system is ZFS-only (for the local filesystems where openldap is running -- it has several NFS mounts for other purposes, though). It's been up and running for about a month now (29 days) and has never shown any problematic behaviour with regard to slapd. I have ~10 SEARCH requests per second on average and only minor ADD/MODIFY/DELETE operations. It has several binds and unbinds, about 1/10th of the requests. It runs in slurpd slave mode for my master LDAP.

zroot/var/db runs with compression=off, dedup=off; zroot is a mirrored pool on 2 Intel SATA SSD drives inside a GPT partition. Swap is on a ZFS zvol.

- Oliver

-- 
| Oliver Brandmueller http://sysadm.in/ o...@sysadm.in |
| I am the Internet. So help me God. |
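For reference, the Berkeley DB lock limits mentioned above are raised through the backend's DB_CONFIG file; a sketch, assuming the default database directory (the values are illustrative, not a recommendation):

  # /var/db/openldap-data/DB_CONFIG
  set_lk_max_locks   300000
  set_lk_max_lockers 300000
  set_lk_max_objects 300000

  # after restarting slapd, watch lock usage while the problem occurs
  db_stat -c -h /var/db/openldap-data | head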
Re: Musings on ZFS Backup strategies
02.03.2013 03:12, David Magda:

On Mar 1, 2013, at 12:55, Volodymyr Kostyrko wrote:

Yes, I'm working with backups the same way. I wrote a simple script that synchronizes two filesystems between distant servers. I also use the same script to synchronize bushy filesystems (with hundreds of thousands of files) where rsync produces too big a load for synchronizing.

https://github.com/kworr/zfSnap/commit/08d8b499dbc2527a652cddbc601c7ee8c0c23301

There are quite a few scripts out there: http://www.freshports.org/search.php?query=zfs

A lot of them require python or ruby, and none of them manages synchronizing snapshots over the network.

For file-level copying, where you don't want to walk the entire tree, there is the zfs diff command:

  zfs diff [-FHt] snapshot [snapshot|filesystem]

  Describes differences between a snapshot and a successor dataset. The successor dataset can be a later snapshot or the current filesystem. The changed files are displayed, including the change type. The change type is displayed using a single character. If a file or directory was renamed, the old and the new names are displayed.

http://www.freebsd.org/cgi/man.cgi?query=zfs

This allows one to get a quick list of files and directories, then use tar/rsync/cp/etc. to do the actual copy (where the destination does not have to be ZFS: e.g., NFS, ext4, Lustre, HDFS, etc.).

I know that, but I see no reason to revert to file-based sync if I can do block-based.

-- 
Sphinx of black quartz, judge my vow.
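To illustrate the zfs diff workflow described above (the dataset name is hypothetical):

  # take a snapshot, let changes accumulate, then list what changed;
  # -F adds the file type, -H produces tab-separated, script-friendly output
  zfs snapshot tank/home@monday
  zfs diff -FH tank/home@monday tank/home

The first column of each output line is the change type (+ added, - removed, M modified, R renamed), which makes it easy to feed the paths to tar, rsync, or cp for the actual copy.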
Re: Musings on ZFS Backup strategies
On Mon, March 4, 2013 11:07, Volodymyr Kostyrko wrote:

02.03.2013 03:12, David Magda:

There are quite a few scripts out there: http://www.freshports.org/search.php?query=zfs

A lot of them require python or ruby, and none of them manages synchronizing snapshots over the network.

Yes, but I think it is worth considering the creation of snapshots, and the transfer of snapshots, as two separate steps. By treating them independently (perhaps in two different scripts), it helps prevent breakage in one from affecting the other.

Snapshots are not backups (IMHO), but they are handy for users and sysadmins in the simple case of accidentally deleted files. If your network access / copying breaks or is slow for some reason, at least you still have copies locally. Similarly if you're having issues with the machine that keeps your remote pool.

By keeping the snapshots going separately, once any problems with the network or remote server are solved, you can use them to incrementally sync up the remote pool: you can simply run the remote-sync scripts more often to do the catch-up. It's just an idea, and everyone has different needs. I often find it handy to keep different steps in different scripts that are loosely coupled.

This allows one to get a quick list of files and directories, then use tar/rsync/cp/etc. to do the actual copy (where the destination does not have to be ZFS: e.g., NFS, ext4, Lustre, HDFS, etc.).

I know that, but I see no reason to revert to file-based sync if I can do block-based.

Sure. I just thought I'd mention it in the thread in case others do need that functionality and were not aware of zfs diff. Not everyone does or can do pool-to-pool backups.
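A minimal sketch of that decoupling, with hypothetical dataset and host names -- one job snapshots locally no matter what, the other catches the remote pool up with whatever increments exist:

  # job 1: local snapshots, taken regardless of network state
  zfs snapshot -r tank/data@$(date +%Y-%m-%d)

  # job 2: incremental transfer; -I sends every snapshot between the
  # two named ones, so it naturally catches up after an outage
  zfs send -I tank/data@2013-03-01 tank/data@2013-03-04 | \
      ssh backuphost zfs receive -F backup/data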
Re: Musings on ZFS Backup strategies
04.03.2013 19:04, David Magda:

On Mon, March 4, 2013 11:07, Volodymyr Kostyrko wrote:

02.03.2013 03:12, David Magda:

There are quite a few scripts out there: http://www.freshports.org/search.php?query=zfs

A lot of them require python or ruby, and none of them manages synchronizing snapshots over the network.

Yes, but I think it is worth considering the creation of snapshots, and the transfer of snapshots, as two separate steps. By treating them independently (perhaps in two different scripts), it helps prevent breakage in one from affecting the other.

Exactly. My script is just an addition to zfSnap or any other tool that manages snapshots. Currently it does nothing more than comparing the lists of available snapshots and doing the network transfer.

Snapshots are not backups (IMHO), but they are handy for users and sysadmins in the simple case of accidentally deleted files. If your network access / copying breaks or is slow for some reason, at least you still have copies locally. Similarly if you're having issues with the machine that keeps your remote pool.

Yes, I addressed that specifically by adding the ability to restart the transfer from any point -- or to simply not care: once initialized, the process is autonomous, and in case of failure everything is rolled back to the last known good snapshot. I also added the possibility to compress or rate-limit traffic.

By keeping the snapshots going separately, once any problems with the network or remote server are solved, you can use them to incrementally sync up the remote pool. You can simply run the remote-sync scripts more often to do the catch-up. It's just an idea, and everyone has different needs. I often find it handy to keep different steps in different scripts that are loosely coupled.

I just tried to give another use for snapshots. Or at least a way to simplify things in one specific situation.

-- 
Sphinx of black quartz, judge my vow.
Re: 9.1 minimal ram requirements
I just checked in a change to HEAD (247814) that compiles CTL in GENERIC but disables it by default (i.e., it uses no memory). You can re-enable it with the existing loader tunable, i.e., set kern.cam.ctl.disable=0 in /boot/loader.conf and it will be enabled.

Ken

On Wed, Feb 27, 2013 at 18:26:28 -0800, Adrian Chadd wrote:

Hi Ken,

I'd like to fix this for 9.2 and -HEAD. Would you mind if I disabled CTL in GENERIC (but still build it as a module) until you've fixed the initial RAM reservation that it requires?

Thanks, Adrian

On 22 December 2012 22:32, Adrian Chadd adr...@freebsd.org wrote:

Ken, does CAM CTL really need to pre-allocate 35MB of RAM at startup?

Adrian

On 22 December 2012 16:45, Sergey Kandaurov pluk...@gmail.com wrote:

On 23 December 2012 03:40, Marten Vijn i...@martenvijn.nl wrote:

On 12/23/2012 12:27 AM, Jakub Lach wrote:

Guys, I've heard about some absurd RAM requirements for 9.1, has anybody tested it? e.g. http://forums.freebsd.org/showthread.php?t=36314

Yup, I can confirm this with nanobsd (cross) compiled for my Soekris net4501, which has 64 MB of memory. From dmesg:

  real memory = 67108864 (64 MB)

while the same config compiled against a 9.0 tree still works...

This (i.e., the kmem_map too small message seen with kernel memory shortage) could be due to CAM CTL ('device ctl', added in 9.1), which is quite a big kernel memory consumer. Try disabling CTL in the loader with kern.cam.ctl.disable=1 to finish the boot. A longer-term workaround could be to postpone those memory allocations until the first call into CTL.

# cam ctl init allocates roughly 35 MB of kernel memory at once:
# three memory pools, somewhat under M_DEVBUF, and a memory disk
# (devbuf takes 1022K with kern.cam.ctl.disable=1)

  Type     InUse  MemUse HighUse Requests  Size(s)
  devbuf     213  20366K       -      265  16,32,64,128,256,512,1024,2048,4096
  ctlmem    5062  10113K       -     5062  64,2048
  ctlblk     200    800K       -      200  4096
  ramdisk      1   4096K       -        1
  ctlpool    532    138K       -      532  16,512

-- 
wbr, pluknet

-- 
Kenneth Merry k...@freebsd.org
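A quick way to check whether CTL's pools are consuming memory on a given system: vmstat -m lists kernel malloc types, including the ctl* pools shown above.

  # with kern.cam.ctl.disable=1 in /boot/loader.conf, these pools
  # should be absent or near-empty in the output
  vmstat -m | egrep 'ctlmem|ctlblk|ctlpool'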
Re: GNATS now available via rsync
On Sun, Dec 23, 2012 at 1:51 PM, Simon L. B. Nielsen si...@freebsd.org wrote:

Hey,

The GNATS database can now be mirrored using rsync from:

  rsync://bit0.us-west.freebsd.org/FreeBSD-bit/gnats/

I expect that URL to be permanent, at least while GNATS is still alive. At a later point there will be more mirrors (a us-east one will be the first) and I will find a place to publish the mirror list.

On a side note, GNATS changes aren't mirrored to the old CVSup system right now, as cvsupd broke on FreeBSD 10.0, which the host running GNATS is running. There are no current plans from clusteradm@'s side to fix this, now that an alternative way to get GNATS exists and cvsup is deprecated long-term anyway.

I have supplied an update to reflect this change in the committer's guide here: http://www.freebsd.org/doc/en/articles/committers-guide/gnats.html

-jgh

-- 
Jason Helfman | FreeBSD Committer
j...@freebsd.org | http://people.freebsd.org/~jgh | The Power to Serve
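For anyone wanting to pull a copy, a one-liner sketch (the destination directory is illustrative):

  # mirror the PR database; --delete keeps the local copy exact
  rsync -avz --delete rsync://bit0.us-west.freebsd.org/FreeBSD-bit/gnats/ /var/db/gnats-mirror/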
ZFS stalls -- and maybe we should be talking about defaults?
Well now this is interesting.

I have converted a significant number of filesystems to ZFS over the last week or so and have noted a few things. A couple of them aren't so good.

The subject machine in question has 12GB of RAM and dual Xeon 5500-series processors. It also has an ARECA 1680ix in it with 2GB of local cache and the BBU for it. The ZFS spindles are all exported as JBOD drives. I set up four disks under GPT, each with a single freebsd-zfs partition added to it and labeled; the providers are then geli-encrypted and added to the pool.

When the same disks were running on UFS filesystems they were set up as a 0+1 RAID array under the ARECA adapter, exported as a single unit, GPT-labeled as a single pack, and then gpart-sliced and newfs'd under UFS+SU.

Since I previously ran UFS filesystems on this config I know what performance level I achieved with that, and the entire system had been running flawlessly set up that way for the last couple of years. Presently the machine is running 9.1-Stable, r244942M.

Immediately after the conversion I set up a second pool to play with backup strategies to a single drive and ran into a problem. The disk I used for that testing is one that previously was in the rotation and is also known good. I began to get EXTENDED stalls with zero I/O going on, some lasting for 30 seconds or so. The system was not frozen, but anything that touched I/O would lock until it cleared. Dedup is off, incidentally.

My first thought was that I had a bad drive, cable or other physical problem. However, searching for that proved fruitless -- there was nothing being logged anywhere: not in the SMART data, not by the adapter, not by the OS. Nothing. Sticking a digital storage scope on the +5V and +12V rails didn't disclose anything interesting with the power in the chassis; it's stable. Further, swapping the only disk that had changed (the new backup volume) with a different one didn't change behavior either.

The last straw was when I was able to reproduce the stalls WITHIN the original pool against the same four disks that had been running flawlessly for two years under UFS, and still couldn't find any evidence of a hardware problem (not even ECC-corrected data returns). All the disks involved are completely clean -- zero sector reassignments, the drive-specific log is clean, etc.

Attempting to cut back the ARECA adapter's aggressiveness (buffering, etc.) on the theory that I was tickling something in its cache management algorithm that was pissing it off proved fruitless as well, even when I shut off ALL caching and NCQ options. I also set vfs.zfs.prefetch_disable=1 to no effect. H...

Last night, after reading the ZFS Tuning wiki for FreeBSD, I went on a lark and limited the ARC cache to 2GB (vfs.zfs.arc_max=2000000000), set vfs.zfs.write_limit_override to 1024000000 (1GB), and rebooted.
The problem instantly disappeared, and I cannot provoke its return even with multiple full-bore snapshot and rsync filesystem copies running while a scrub is being done.

I'm pinging between being I/O- and processor- (geli-) limited now in normal operation, and slamming the I/O channel during a scrub. It appears that performance is roughly equivalent, maybe a bit less, than it was with UFS+SU -- but it's fairly close.

The operating theory I have at the moment is that the ARC cache was in some way getting into a near-deadlock situation with other memory demands on the system (there IS a Postgres server running on this hardware, although it's a replication server and not taking queries -- nonetheless it does grab a chunk of RAM), leading to the stalls. Limiting its grab of RAM appears to have resolved the contention issue. I was unable to catch it actually running out of free memory, although it was consistently into the low five-digit free page count, and the kernel never garfed on the console about resource exhaustion -- other than a bitch about swap stalling (the infamous more than 20 seconds message). Page space in use near the time in question (I could not get a display while locked, as it went to I/O and froze) was not zero, but pretty close to it (a few thousand blocks).

That the system was driven into light paging does appear to be significant and indicative of some sort of memory contention issue, as under operation with UFS filesystems this machine has never been observed to allocate page space.

Anyone seen anything like this before, and if so, is this a case of bad defaults or some bad behavior between various kernel memory allocation contention sources? This isn't exactly a resource-constrained machine, running x64 code with 12GB of RAM and two quad-core processors in it!

-- 
-- Karl Denninger
/The Market Ticker ®/ http://market-ticker.org
Cuda Systems LLC
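For reference, a sketch of the per-disk layout described above (device names, labels, and geli parameters are illustrative; the post does not state the pool's vdev layout, so a plain stripe is shown):

  # one labeled GPT partition per disk, geli on top, ZFS on the .eli provider
  gpart create -s gpt da1
  gpart add -t freebsd-zfs -l zdisk1 da1
  geli init /dev/gpt/zdisk1
  geli attach /dev/gpt/zdisk1
  # ... repeat for da2-da4, then:
  zpool create tank /dev/gpt/zdisk1.eli /dev/gpt/zdisk2.eli /dev/gpt/zdisk3.eli /dev/gpt/zdisk4.eli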
carp on stable/9: is there a way to keep jumbo? (fwd)
Colleagues,

sorry, sent to the wrong list (the only excuse for me is possibly that I'm trying to base HAST on carp...)

-- 
Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***

-- Forwarded message --
Date: Tue, 5 Mar 2013 02:31:51
From: Dmitry Morozovsky ma...@rinet.ru
To: freebsd...@freebsd.org
Subject: carp on stable/9: is there a way to keep jumbo?

Dear colleagues,

yes, I know glebius@ overhauled carp in -current, but I'm a bit nervous to deploy a bleeding-edge system on a NAS/SAN ;)

So, my question is about the current state of carp in stable/9: building an HA pair, I found that carp interfaces lose jumbo capabilities:

  root@cthulhu4:~# ifconfig | grep mtu
  em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
  em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
  lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
  lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
  carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
  carp1: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500

  root@cthulhu4:~# ifconfig carp1 mtu 9000
  ifconfig: ioctl (set mtu): Invalid argument

Is it unavoidable at the moment, or am I missing something obvious? Thanks!

-- 
Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***
Re: ZFS stalls -- and maybe we should be talking about defaults?
What does zfs-stats -a show when you're having the stall issue?

You can also use zfs iostats to show individual disk iostats, which may help identify a single failing disk, e.g.:

  zpool iostat -v 1

Also, have you investigated which of the two sysctls you changed fixed it, or does it require both?

Regards
Steve

- Original Message - 
From: Karl Denninger k...@denninger.net
To: freebsd-stable@freebsd.org
Sent: Monday, March 04, 2013 10:48 PM
Subject: ZFS stalls -- and maybe we should be talking about defaults?

[...]
Re: carp on stable/9: is there a way to keep jumbo? (fwd)
You might want to try:

  http://blog.multiplay.co.uk/dropzone/freebsd/carp-mtu.patch

Be warned, it doesn't do any validation, so if you use it against physical interfaces with a smaller MTU things will likely go badly wrong; hell, they may go badly wrong anyway, as it's just a very quick and dirty hack ;-)

Regards
Steve

- Original Message - 
From: Dmitry Morozovsky ma...@rinet.ru
To: freebsd-stable@FreeBSD.org
Sent: Monday, March 04, 2013 10:49 PM
Subject: carp on stable/9: is there a way to keep jumbo? (fwd)

[...]
Re: ZFS stalls -- and maybe we should be talking about defaults?
I get stalls with 256GB of RAM with arc_max=64G (my limit is usually 25%) on a 64-core system with 20 new 3TB Seagate disks under LSI2008 chips, without much load. Interestingly, pbzip2 consistently creates a problem on a volume whereas gzip does not.

Here, stalls happen across several systems; however, I have had fewer problems under 8.3 than 9.1. If I go to hardware RAID5 (LSI2008 -- same chips: IR vs IT) I don't have a problem.

On Mon, 2013-03-04 at 16:48 -0600, Karl Denninger wrote:

[...]
Re: ZFS stalls -- and maybe we should be talking about defaults?
On 3/4/2013 6:33 PM, Steven Hartland wrote:

What does zfs-stats -a show when you're having the stall issue?

You can also use zfs iostats to show individual disk iostats, which may help identify a single failing disk, e.g.:

  zpool iostat -v 1

Also, have you investigated which of the two sysctls you changed fixed it, or does it require both?

Regards
Steve

[...]
Re: ZFS stalls -- and maybe we should be talking about defaults?
Stick this in /boot/loader.conf and see if your lockups go away:

  vfs.zfs.write_limit_override=1024000000

I've got a sentinel running that watches for zero-bandwidth 'zpool iostat 5' output; it has been running for close to 12 hours now, and with the two tunables I changed it doesn't appear to be happening any more. This system always has small-ball write I/Os going to it, as it's a postgresql hot-standby mirror backing a VERY active system and is receiving streaming logdata from the primary at a colocation site, so the odds of it ever experiencing an actual zero for I/O (unless there's a connectivity problem) are pretty remote.

If it turns out that the write_limit_override tunable is the one responsible for stopping the hangs, I can drop the ARC limit tunable, although I'm not sure I want to; I don't see much if any performance penalty from leaving it where it is, and if the larger cache isn't helping anything then why use it? I'm inclined to stick an SSD in the cabinet as a cache drive instead of dedicating RAM to this -- even though it's not AS fast as RAM, it's still MASSIVELY quicker than getting data off a rotating plate of rust.

Am I correct that a ZFS filesystem does NOT use the VM buffer cache at all?

On 3/4/2013 8:07 PM, Dennis Glatting wrote:

[...]
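A rough sketch of what such a zero-bandwidth sentinel could look like (the actual script was not posted; pool name and log path are hypothetical):

  #!/bin/sh
  # watch 'zpool iostat 5' and log whenever both bandwidth columns read zero
  zpool iostat tank 5 | while read pool alloc free rops wops rbw wbw; do
      if [ "$rbw" = "0" ] && [ "$wbw" = "0" ]; then
          echo "$(date): zero I/O bandwidth on $pool" >> /var/log/zfs-stall.log
      fi
  done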
Re: ZFS stalls -- and maybe we should be talking about defaults?
- Original Message - 
From: Karl Denninger k...@denninger.net

Stick this in /boot/loader.conf and see if your lockups go away:

  vfs.zfs.write_limit_override=1024000000

...

If it turns out that the write_limit_override tunable is the one responsible for stopping the hangs, I can drop the ARC limit tunable, although I'm not sure I want to; I don't see much if any performance penalty from leaving it where it is, and if the larger cache isn't helping anything then why use it? I'm inclined to stick an SSD in the cabinet as a cache drive instead of dedicating RAM to this -- even though it's not AS fast as RAM, it's still MASSIVELY quicker than getting data off a rotating plate of rust.

Now, interesting you should say that: I've seen a stall recently on a ZFS-only box running on 6 x SSD RAIDZ2. The stall was caused by a fairly large mysql import, with nothing else running. When it happened I thought the machine had wedged, but minutes (not seconds) later, everything sprang into action again.

Am I correct that a ZFS filesystem does NOT use the VM buffer cache at all?

Correct

Regards
Steve
Re: ZFS stalls -- and maybe we should be talking about defaults?
On 3/4/2013 9:25 PM, Steven Hartland wrote:

- Original Message - 
From: Karl Denninger k...@denninger.net

Stick this in /boot/loader.conf and see if your lockups go away:

  vfs.zfs.write_limit_override=1024000000

[...]

Now, interesting you should say that: I've seen a stall recently on a ZFS-only box running on 6 x SSD RAIDZ2. The stall was caused by a fairly large mysql import, with nothing else running. When it happened I thought the machine had wedged, but minutes (not seconds) later, everything sprang into action again.

That's exactly what I can reproduce here; the stalls are anywhere from a few seconds to well north of a half-minute. It looks like the machine is hung -- but it is not.

The machine in question normally runs with zero swap allocated, but it always has 1.5GB of shared memory allocated to Postgres (shared_buffers = 1500MB in its config file). I wonder if the ARC cache management code is misbehaving when shared segments are in use?

-- 
-- Karl Denninger
/The Market Ticker ®/ http://market-ticker.org
Cuda Systems LLC
Re: ZFS stalls -- and maybe we should be talking about defaults?
On Mon, 2013-03-04 at 20:58 -0600, Karl Denninger wrote:

Stick this in /boot/loader.conf and see if your lockups go away:

  vfs.zfs.write_limit_override=1024000000

K.

I've got a sentinel running that watches for zero-bandwidth 'zpool iostat 5' output; it has been running for close to 12 hours now, and with the two tunables I changed it doesn't appear to be happening any more.

I've also done this, as well as top and systat -vmstat. Disk I/O stops, but the system lives through top, systat, and the network. However, if I try to log in, the login won't complete. All of my systems are hardware RAID1 for the OS (LSI and Areca), typically with a separate disk for swap. All other disks are ZFS.

This system always has small-ball write I/Os going to it, as it's a postgresql hot-standby mirror backing a VERY active system and is receiving streaming logdata from the primary at a colocation site, so the odds of it ever experiencing an actual zero for I/O (unless there's a connectivity problem) are pretty remote.

I am doing multi-TB sorts and GB database loads.

If it turns out that the write_limit_override tunable is the one responsible for stopping the hangs, I can drop the ARC limit tunable, although I'm not sure I want to; I don't see much if any performance penalty from leaving it where it is, and if the larger cache isn't helping anything then why use it? I'm inclined to stick an SSD in the cabinet as a cache drive instead of dedicating RAM to this -- even though it's not AS fast as RAM, it's still MASSIVELY quicker than getting data off a rotating plate of rust.

I forgot to mention that on my three 8.3 systems, they occasionally offline a disk (one or two a week, total). I simply online the disk, and after the resilver all is well. There are ~40 disks across those three systems. Of my 9.1 systems, three are busy but with a smaller number of disks (about eight across two volumes, RAIDZ2 and mirror). I also have a ZFS-on-Linux (CentOS) system for play (about 12 disks). It did not exhibit problems when it was in use, but it did teach me a lesson on the evils of dedup. :)

Am I correct that a ZFS filesystem does NOT use the VM buffer cache at all?

Dunno.

On 3/4/2013 8:07 PM, Dennis Glatting wrote:

[...]
Re: ZFS stalls -- and maybe we should be talking about defaults?
On Tue, 2013-03-05 at 03:25 +0000, Steven Hartland wrote:

- Original Message - From: Karl Denninger k...@denninger.net

Stick this in /boot/loader.conf and see if your lockups go away:

vfs.zfs.write_limit_override=102400

...

If it turns out that the write_limit_override tunable is the one responsible for stopping the hangs, I can drop the ARC limit tunable, although I'm not sure I want to; I don't see much if any performance penalty from leaving it where it is, and if the larger cache isn't helping anything, then why use it? I'm inclined to stick an SSD in the cabinet as a cache drive instead of dedicating RAM to this -- even though it's not AS fast as RAM, it's still MASSIVELY quicker than getting data off a rotating plate of rust.

Interesting you should say that: I've recently seen a stall on a ZFS-only box running on a 6 x SSD RAIDZ2. The stall was caused by a fairly large MySQL import, with nothing else running. When it happened I thought the machine had wedged, but minutes (not seconds) later everything sprang into action again.

I've seen this too.

Am I correct that a ZFS filesystem does NOT use the VM buffer cache at all?

Correct.

Regards
Steve

--
Dennis Glatting d...@pki2.com
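[Editorial note: the SSD-as-cache idea Karl floats above corresponds to adding an L2ARC cache vdev to an existing pool. A minimal, hedged example follows; the pool name "tank" and device "ada1" are illustrative, not from the thread.]

    # Add the SSD as an L2ARC cache device. Cache vdevs hold read-cache
    # data only and can be detached again later with:
    #   zpool remove tank ada1
    zpool add tank cache ada1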
Re: ZFS stalls -- and maybe we should be talking about defaults?
- Original Message - From: Karl Denninger k...@denninger.net

When it happened I thought the machine had wedged, but minutes (not seconds) later, everything sprang into action again.

That's exactly what I can reproduce here; the stalls are anywhere from a few seconds to well north of half a minute. It looks like the machine is hung -- but it is not.

Out of interest, when this happens for you, is the syncer using lots of CPU? If it's anything like my stalls, you'll need top loaded prior to the fact.

Regards
Steve
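[Editorial note: one hedged way to have that information ready before a stall hits, using stock FreeBSD top(1):]

    # Show system processes (-S) and individual threads (-H), sorted by
    # CPU usage, so a spinning syncer thread is visible at the top.
    # Start this before the stall, since new logins won't complete.
    top -S -H -o cpu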
Re: ZFS stalls -- and maybe we should be talking about defaults?
On 3/4/2013 10:01 PM, Steven Hartland wrote:

- Original Message - From: Karl Denninger k...@denninger.net

When it happened I thought the machine had wedged, but minutes (not seconds) later, everything sprang into action again.

That's exactly what I can reproduce here; the stalls are anywhere from a few seconds to well north of half a minute. It looks like the machine is hung -- but it is not.

Out of interest, when this happens for you, is the syncer using lots of CPU? If it's anything like my stalls, you'll need top loaded prior to the fact.

Regards
Steve

Don't know. But the CPU is getting hammered when it happens, because I am geli-encrypting all my drives, and as a consequence it is not at all uncommon for the load average to be north of 10 when the system is under heavy I/O load. System response is fine right up until it stalls.

I'm going to put some effort into trying to isolate exactly what is going on here in the coming days, since I happen to have a spare box in an identical configuration that I can afford to lock up without impacting anyone doing real work. :-)

--
Karl Denninger
/The Market Ticker ®/ http://market-ticker.org
Cuda Systems LLC
Re: ZFS stalls -- and maybe we should be talking about defaults?
Quoth Karl Denninger k...@denninger.net:

Note that the machine is not booting from ZFS -- it is booting from, and has its swap on, a UFS two-drive mirror (handled by the disk adapter; it looks like a single da0 drive to the OS), and that drive stalls as well when it freezes. It's definitely a kernel thing when it happens, as the OS would otherwise not have locked (just I/O to the user partitions) -- but it does.

Is it still the case that mixing UFS and ZFS can cause problems, or were they all fixed? I remember a while ago (before the ARC usage-monitoring code was added) there were a number of reports of serious problems running an rsync from UFS to ZFS.

If you can, it might be worth trying your scratch machine booting from ZFS. Probably the best way is to leave your swap partition where it is (IMHO it's not worth trying to swap onto a zvol) and convert the UFS partition into a separate zpool to boot from. You will also need to replace the boot blocks; assuming you're using GPT, you can do this with "gpart bootcode -p /boot/gptzfsboot -i <index> <disk>", where <index> is the index of the freebsd-boot partition.

Ben
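[Editorial note: a hedged sketch of the boot-block step Ben describes, assuming a GPT-partitioned disk da0 with its freebsd-boot partition at index 1; the disk name and index are illustrative, not from the thread.]

    # Install the ZFS-aware gptzfsboot stage into the freebsd-boot
    # partition; repeat for each disk the BIOS might boot from.
    gpart bootcode -p /boot/gptzfsboot -i 1 da0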
Re: ZFS stalls -- and maybe we should be talking about defaults?
On Tue, Mar 05, 2013 at 05:05:47AM +0000, Ben Morrow wrote:

Quoth Karl Denninger k...@denninger.net:

Note that the machine is not booting from ZFS -- it is booting from, and has its swap on, a UFS two-drive mirror (handled by the disk adapter; it looks like a single da0 drive to the OS), and that drive stalls as well when it freezes. It's definitely a kernel thing when it happens, as the OS would otherwise not have locked (just I/O to the user partitions) -- but it does.

Is it still the case that mixing UFS and ZFS can cause problems, or were they all fixed? I remember a while ago (before the ARC usage-monitoring code was added) there were a number of reports of serious problems running an rsync from UFS to ZFS.

This problem still exists on stable/9. The behaviour manifests itself as fairly bad performance (I cannot remember if it was stalling or just awful throughput rates). I can only speculate as to the root cause, but my guess is that it has something to do with the two caching systems (UFS vs. the ZFS ARC) fighting over large sums of memory. The advice I've given people in the past is: if you do a LOT of I/O between UFS and ZFS on the same box, it's time to move to 100% ZFS.

That said, I still do not recommend ZFS for a root filesystem (this still bites people even today), and swap-on-ZFS is a huge no-no. I will note that I myself use pure UFS+SU (not SUJ) for my main OS installation (that means /, swap, /var, /tmp, and /usr) on a dedicated SSD, while everything else is ZFS raidz1 (no dedup, no compression; I won't ever enable these until that thread-priority problem is fixed on FreeBSD). However, when I was migrating from gmirror+UFS+SU to ZFS, I witnessed what I described in my first and second paragraphs. What userland utilities were used (rsync vs. cp) made no difference; the problem is in the kernel.

A footnote about this thread: it contains all sorts of random pieces of information about systems, with very little actual detail (barring the symptoms, which are always useful to know!). For example, just because your machine has 8 cores and 12GB of RAM doesn't mean jack squat if some software in the kernel is designed oddly. Reworded: throwing more hardware at a problem solves nothing.

The most useful thing (for me) that I found was deep within the thread: a few words along the lines of "de-dup isn't used". What about compression, and has it *ever* been enabled on the filesystem (even if not presently enabled)? It matters. All of this matters.

I see lots of end-users talking about these problems, but (barring Steven) literally no kernel people who are in the know about ZFS mentioning how said users can get them (the devs) information that can help track this down. Those devs live on freebsd-fs@ and freebsd-hackers@, and not too many read freebsd-stable@.

Step back for a moment and look at this anti-KISS configuration:

- A hardware RAID controller is involved (Areca 1680ix)
- The hardware RAID controller has its own battery-backed cache (2GB)
- Therefore arcmsr(4) is involved -- the revision of the driver/OS build matters here, ditto the firmware version
- 4 disks are involved, models unknown
- The disks are GPT and are *partitioned*, and ZFS refers to the partitions, not the raw disks -- this matters (honest, it really does; the ZFS code handles things differently with raw disks)
- The providers are GELI-encrypted

Now ask yourself if any dev is really going to tackle this one given the above mess.
My advice would be to get rid of the hardware RAID (go with Intel ICHxx or ESBx on-board SATA with AHCI), use raw disks for ZFS (for 4096-byte-sector disks, use the gnop(8) method, which is a one-time thing), and get rid of GELI. If you can reproduce the problem there 100% of the time, awesome: it's a clean, clear setup for someone to help investigate.

--
| Jeremy Chadwick                          j...@koitsu.org |
| UNIX Systems Administrator        http://jdc.koitsu.org/ |
| Mountain View, CA, US                                    |
| Making life hard for others since 1977.    PGP 4BD6C0CB  |
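[Editorial note: a hedged sketch of the gnop(8) trick Jeremy refers to, for creating a pool with 4K-aligned I/O on a raw 4096-byte-sector disk; the pool name "tank" and device "da2" are illustrative, and the disk is assumed blank.]

    # Create a transparent provider that reports 4096-byte sectors, so
    # the new pool is created with ashift=12.
    gnop create -S 4096 da2
    zpool create tank da2.nop
    # The .nop device is only needed at creation time: export the pool,
    # destroy the nop provider, and re-import against the raw disk.
    zpool export tank
    gnop destroy da2.nop
    zpool import tank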
Re: ZFS stalls -- and maybe we should be talking about defaults?
In article 8c68812328e3483ba9786ef155911...@multiplay.co.uk, kill...@multiplay.co.uk writes:

Interesting you should say that: I've recently seen a stall on a ZFS-only box running on a 6 x SSD RAIDZ2. The stall was caused by a fairly large MySQL import, with nothing else running. When it happened I thought the machine had wedged, but minutes (not seconds) later everything sprang into action again.

I have certainly seen what you might describe as stalls, caused, so far as I can tell, by kernel memory starvation. I've seen it take as much as half an hour to recover from these (which is too long for my users). Right now I have the ARC limited to 64 GB (on a 96 GB file server), and that has made it more stable, but it's still not behaving quite as I would like, and I'm looking to put more memory into the system (to be used for non-ARC functions).

Looking at my munin graphs, I find that backups in particular put very heavy pressure on the system, doubling UMA allocations over steady state, and it takes about four or five hours for them to climb back down. See http://people.freebsd.org/~wollman/vmstat_z-day.png for an example. Some of the stalls are undoubtedly caused by internal fragmentation rather than actual data in use. (Solaris used to have this issue, and some hooks were added to allow some amount of garbage collection with the cooperation of the filesystem.)

-GAWollman
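[Editorial note: for reference, a hedged example of the ARC cap Wollman mentions, expressed as the stock vfs.zfs.arc_max tunable in /boot/loader.conf; the 64 GB figure is his, and the setting takes effect at the next boot.]

    # Cap the ZFS ARC at 64 GB so the remaining RAM stays available for
    # non-ARC use; loader tunables accept k/m/g size suffixes.
    vfs.zfs.arc_max="64G"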