Re: SunFire X2200 ilo's bge1 DOWN/UP
On Fri, May 31, 2013 at 08:24:47AM +0300, Daniel Braniss wrote:
On Thursday, May 30, 2013 2:44:35 am Daniel Braniss wrote:

Attachment: bge.media_sts.diff

Index: sys/dev/bge/if_bge.c
===================================================================
--- sys/dev/bge/if_bge.c	(revision 251021)
+++ sys/dev/bge/if_bge.c	(working copy)
@@ -5583,6 +5583,10 @@ bge_ifmedia_sts(struct ifnet *ifp, struct ifmediar
 	BGE_LOCK(sc);
+	if ((ifp->if_flags & IFF_UP) == 0) {
+		BGE_UNLOCK(sc);
+		return;
+	}
 	if (sc->bge_flags & BGE_FLAG_TBI) {
 		ifmr->ifm_status = IFM_AVALID;
 		ifmr->ifm_active = IFM_ETHER;

after 18 hours, the logs are empty! it seems the patch fixes the problem. now maybe it's time to hunt for whoever is randomly calling bge_ifmedia_sts ...

It could be any number of daemons that query interface state, such as an SNMP server, ladvd, etc. If you wanted help, you could modify the patch so that it does something like this:

#include <sys/proc.h>

	if (/* test for IFF_UP */) {
		BGE_UNLOCK(sc);
		if_printf(ifp, "state queried on down interface by pid %d (%s)\n",
		    curthread->td_proc->p_pid, curthread->td_proc->p_comm);
		return;
	}

-- 
John Baldwin

snmpd calls this several times a second (difficult to measure, since syslog just says "last message repeated 22 times"). in any case, the DOWN/UP appears once every few hours, oh well. I have now stopped the snmpd daemon; maybe there is someone else ...

I have no idea why snmpd wants to know the media status of interfaces that are in the down state. The media status resolved after bringing up the interface may differ from the one seen before. The patch also makes dhclient think the driver got a valid link regardless of link establishment. I guess that wouldn't be an issue, though. I'll commit the patch after some more testing. Thanks for reporting and testing!

no problem! after more than 3 days there were no more 'reports', so snmpd was the culprit.
the snmpd we use is from ports, i'll try and see what's going on ...

thanks,
danny
___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
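The patch follows a common driver pattern. It takes the softc lock, bails out early if the interface is not administratively up, and drops the lock on that early-return path as well. The userland model below is only a sketch of that control flow. `struct model_softc`, `struct model_ifnet`, `struct model_media_req`, `model_media_status()` and the lock counter are hypothetical stand-ins, not the real `struct ifnet`/`ifmediareq` or bge(4) code. The `MODEL_*` flag values are assumptions chosen for illustration.

```c
/* Hypothetical stand-ins for the kernel types; NOT the real FreeBSD definitions. */
#define MODEL_IFF_UP      0x1   /* "interface is administratively up" */
#define MODEL_IFM_AVALID  0x1   /* "status field is valid" */
#define MODEL_IFM_ACTIVE  0x2   /* "link is active" */

struct model_softc {
	int lock_depth;             /* 1 while the (pretend) driver lock is held */
	int link_up;                /* what the PHY would report if polled */
	unsigned long phy_polls;    /* how many times the PHY was actually polled */
};

struct model_ifnet {
	int if_flags;
	struct model_softc *if_softc;
};

struct model_media_req {
	int ifm_status;
};

static void model_lock(struct model_softc *sc)   { sc->lock_depth++; }
static void model_unlock(struct model_softc *sc) { sc->lock_depth--; }

/* Same shape as the patched bge_ifmedia_sts(): lock, return early if the
 * interface is down (releasing the lock first), otherwise poll the PHY. */
static void
model_media_status(struct model_ifnet *ifp, struct model_media_req *ifmr)
{
	struct model_softc *sc = ifp->if_softc;

	model_lock(sc);
	if ((ifp->if_flags & MODEL_IFF_UP) == 0) {
		/* Interface is down: leave the PHY alone and unlock on this path too. */
		model_unlock(sc);
		return;
	}
	sc->phy_polls++;
	ifmr->ifm_status = MODEL_IFM_AVALID;
	if (sc->link_up)
		ifmr->ifm_status |= MODEL_IFM_ACTIVE;
	model_unlock(sc);
}
```

A query against a down interface leaves the PHY untouched and the lock released. Once the up flag is set, the PHY is polled as before.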
Re: Corrupt GPT header on disk from twa array - fixable?
On Jun 3, 2013, at 1:09, Warren Block wbl...@wonkity.com wrote:
On Mon, 3 Jun 2013, Alban Hertroys wrote:

Really, the easiest way would be to temporarily install the old RAID controller and copy the data off the array.

Well, that would mean I'd have to assemble the old server again, as the controller is not compatible with the hardware in the new one. And that would probably be unnecessary as well, since I already did copy the data off those disks. I was just curious whether it would be possible to read that data off the disks while I still have them (with their original contents) in the new server, in case I _did_ forget to copy something over or something wasn't copied over correctly.

I copied the data over a 100 Mbit Ethernet link, which was the fastest option I had with the old server; it had USB 1 and no native SATA. Hence the RAID controller, but that was on a now-deprecated PCI-X channel (those 64-bit parallel things) and all 4 ports were in use. Not to mention that the CPU was so old that it had a rather narrow margin for operating temperatures and overheated several times during the copying process, because rsync+sshd put a relatively high load on the CPU (an old Athlon XP 2000+).

PCI-X cards will operate in PCI slots. Or at least some will; I've done that with an Intel network card. The motherboard can't have components that block the unused part of the edge connector, or the offending card edge could be removed with extreme prejudice.

Not this 3Ware card. I remember buying that particular motherboard because the card wouldn't fit in the PCI slots on the board I had. Those PCI-X slots have a key divider at the opposite end from the one in normal PCI slots, and the card has no groove to match the divider in the PCI slot.

Alban Hertroys
-- 
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Corrupt GPT header on disk from twa array - fixable?
On Mon, Jun 03, 2013 at 09:14:41AM +0200, Alban Hertroys wrote: [...] Not this 3Ware card. I remember buying that particular motherboard because the card wouldn't fit in the PCI slots on the board I had. There's a division in those PCI-X slots opposite of where there's one in normal PCI slots and no groove in the card to match the division in the PCI slot.
This is all beside the point, but to clarify, please see the following diagram:

http://en.wikipedia.org/wiki/File:PCI_Keying.png

I recommend reading the caption under the diagram, in addition to the "Mixing of 32-bit and 64-bit PCI cards in different width slots" section:

http://en.wikipedia.org/wiki/PCI-X

It sounds like your 3Ware card is 5V PCI-X (whether it is 32-bit or 64-bit is irrelevant), and your new motherboard only supports 3.3V PCI (which is pretty much the norm on all motherboards today when it comes to classic PCI). The 5V stuff is generally shunned (for both PCI and PCI-X) and is uncommon at this point in time. You can find some server-class boards that offer this capability, such as Supermicro's UIO slots, where you purchase the proper type of riser (adapter) for the type of card you have (e.g. UIO-5.5V PCI-X 64-bit), but you will not find this on consumer/desktop or even enthusiast boards. Example:

http://www.supermicro.com/support/resources/riser/riser.aspx

If you want to know what kind of card it is, ask 3Ware or see the user manual. Note that many vendors do not disclose all the relevant data in the manual or on their site. The data you need are: voltage (3.3V vs. 5V vs. universal), bus width (32-bit vs. 64-bit), and, if 64-bit, whether the card will function in a 32-bit slot (some cards won't).

Educational footnote: AGP is another one of those standards that went through the same nonsense (specifically 3.3V vs. 1.5V), except the situation was worse: some card manufacturers began selling 1.5V cards with incorrect notching, resulting in smoke/fire when installed in a 3.3V slot. I have one such card, and keep it solely as a reminder of manufacturer/vendor idiocy.

-- 
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.            PGP 4BD6C0CB |
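The checklist above (signalling voltage, bus width, and whether a 64-bit card tolerates a 32-bit slot) can be written as a small compatibility test. This is a simplified sketch under stated assumptions. It models only those three properties and treats a "universal" card as fitting either voltage key. It ignores physical obstructions on the motherboard and PCI-X bus-speed negotiation. All type and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of the card/slot properties listed above. */
enum sig_voltage { V3_3, V5, V_UNIVERSAL };

struct pci_card {
	enum sig_voltage voltage;   /* 3.3V-keyed, 5V-keyed, or universal */
	int width;                  /* 32 or 64 bits */
	bool works_in_32bit_slot;   /* only meaningful for 64-bit cards */
};

struct pci_slot {
	enum sig_voltage voltage;   /* assumed keyed for exactly one voltage */
	int width;                  /* 32 or 64 bits */
};

/* True if, in this simplified model, the card both fits and functions in the slot. */
static bool
card_fits_slot(const struct pci_card *card, const struct pci_slot *slot)
{
	/* Voltage keying: a universal card has both notches; otherwise they must match. */
	if (card->voltage != V_UNIVERSAL && card->voltage != slot->voltage)
		return false;
	/* A 64-bit card in a 32-bit slot only works if the card supports it. */
	if (card->width == 64 && slot->width == 32 && !card->works_in_32bit_slot)
		return false;
	/* A 32-bit card in a 64-bit slot is fine once the voltage matches. */
	return true;
}
```

With this model, a 5V-keyed card never fits a 3.3V slot, whatever its width, which is the situation described above.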
Re: 9.1-stable: ATI IXP600 AHCI: CAM timeout
Ian Lepore wrote:
On Wed, 2013-05-29 at 16:21 +0200, Oliver Fromme wrote:
Steven Hartland wrote:

Have you checked your SATA cables and PSU outputs? Both of these could be the underlying cause of poor signalling.

I can't easily check that because it is a cheap rented server in a remote location. But I don't believe it is bad cabling or the PSU anyway; otherwise the problem would occur intermittently all the time whenever the load on the disks is sufficiently high. But it only occurs at tags=3 and above. At tags=2 it does not occur at all, no matter how hard I hammer on the disks. At the moment I'm inclined to believe that it is a bug either in the HDD firmware or in the controller. The disks aren't exactly new; they're 400 GB Samsung ones that are several years old. I think it's not uncommon to have bugs in the NCQ implementation in such disks. The only thing that puzzles me is that the problem also disappears completely when I reduce the SATA revision from II to I, even at tags=32.

It seems to me that you dismiss signaling problems too quickly. Consider the possibilities... A bad cable leads to intermittent errors at higher speeds. When NCQ is disabled or limited, the software handles these errors pretty much transparently. When NCQ is not limited and there are many outstanding requests, suddenly the error handling in the software breaks down somehow, and a minor recoverable problem becomes an in-your-face error.

It could also be a software bug in the way CAM handles the failure of NCQ commands. When command queueing is used on a SCSI drive and a queued command fails, only that command fails. A queued command failure on a SATA device fails ALL currently queued commands. I've not looked at the code, but do the SATA CAM drivers do the right thing here? Fewer commands queued makes it less likely that multiple commands will be in progress when a failure occurs. A lower link rate also makes you more immune to signal failures.
Mike
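Mike's argument can be put in numbers. With SCSI tagged queueing, a failed queued command costs one command. With SATA NCQ, it costs every command in flight, so the recovery path has more to redo as the tag count grows. The sketch below encodes only that claim. It is not CAM or ahci(4) code, and `commands_aborted()` and its enum are hypothetical names.

```c
/* Hypothetical model of the argument above; not CAM or ahci(4) code. */
enum queue_proto { PROTO_SCSI_TCQ, PROTO_SATA_NCQ };

/* Commands that must be retried when one queued command fails, given how many
 * commands were outstanding at the time (including the failing one). */
static int
commands_aborted(enum queue_proto proto, int outstanding)
{
	if (outstanding <= 0)
		return 0;
	if (proto == PROTO_SCSI_TCQ)
		return 1;            /* only the failing command is affected */
	return outstanding;      /* NCQ: every outstanding command is aborted */
}
```

At tags=2, one failure costs at most two commands. At tags=32 it can cost 32, which is the point about fewer queued commands reducing exposure.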
Re: [HEADSUP] New pkg-devel 1.1.0 beta1
On Thu, May 30, 2013 at 05:20:54PM +0200, Baptiste Daroussin wrote: The pkg development team is proud to announce the new 1.1.0 beta1 release of pkg.
- new experimental pkg convert (can convert from and to legacy pkg database); pkg2ng now uses pkg convert (still recommended to use pkg2ng)

Converting packages from /var/db/pkg
Converting pkg-1.1.0.b3_1...
pkg: unknown keyword display, ignoring @display
Installing pkg-1.1.0.b3_1...Segmentation fault (core dumped)
Re: [HEADSUP] New pkg-devel 1.1.0 beta1
On Mon, Jun 03, 2013 at 07:17:24PM +0400, Slawa Olhovchenkov wrote: [...] Installing pkg-1.1.0.b3_1...Segmentation fault (core dumped)

Have you run pkg2ng?

regards,
Bapt
Re: [HEADSUP] New pkg-devel 1.1.0 beta1
On Mon, Jun 03, 2013 at 05:34:19PM +0200, Baptiste Daroussin wrote: [...] Have you run pkg2ng?

Yes, this is the output of running pkg2ng.
Re: freebsd-stable Digest, Vol 515, Issue 6
Folks,

Hate to follow myself up, but on the one 9.1-STABLE machine where the disk I/O bogging-down issue was a showstopper, I fixed it by reverting to 8.4-STABLE. The symptoms instantly went away; the box became performant and responsive.

regards,
Ross

-- 
Ross Alexander, (780) 675-6823 / (780) 689-0749, r...@athabascau.ca
Always do right. This will gratify some people, and astound the rest. -- Samuel Clemens
9.1-current disk throughput stalls ?
Folks,

I wonder if anyone here has insight on a disk throughput problem that's come up over the last week or two.

Now, I habitually run an 'svn up' and then rebuild world + kernel every Saturday morning on the home machines. It's all scripted and logged; I've been doing this for years and the process is very cut and dried. Saturday AM, I started it as usual; today it was still running, but only about 15% done. Normally it completes in 39 minutes, +/- 1 minute.

What I've noticed is that disk performance on disk-intensive work has gotten very flaky over the last two or three weeks. A buildworld, to pick an example, will run nicely for three to five minutes and then bog down. The disks stay busy, but forward progress slows to a crawl and then apparently stops. Individual cleandirs are taking five to ten seconds each on an otherwise unloaded machine. It feels like a VAX-11/780 with 30 users and RA-80s, if anyone here remembers those days :).

Here's a 'systat -vms':

    5 users    Load  0.30  0.30  0.27                  Jun  3 09:07

    Mem (KB): 796676 wire, 65484 act, 45332 inact, 15071692 free
    CPU: 0.9% Sys, 0.2% Intr, 0.3% User, 0.0% Nice, 98.6% Idle
    Interrupts: 630 total

    Disks   ada0   ada1  pass0  pass1
    KB/t    5.42   5.96   0.00   0.00
    tps      197    192      0      0
    MB/s    1.04   1.12   0.00   0.00
    %busy     74     82      0      0

This is taken during the early stages of a buildworld. The cleandir job steps are *crawling* along. Rattling the keyboard (USB or serial, although an SSH session sometimes seems to work as well) gets the buildworld doing some useful work again.
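The per-disk numbers in that systat output are consistent with each other. Kilobytes per transfer times transfers per second reproduces the MB/s column: 5.42 KB x 197 tps is about 1068 KB/s, or about 1.04 MB/s, for ada0, and 5.96 KB x 192 tps is about 1144 KB/s, or about 1.12 MB/s, for ada1. So each disk was handling roughly 200 small transfers a second while moving only about 1 MB/s. A minimal helper for that arithmetic is below. `mb_per_sec()` is a hypothetical name, and it assumes systat's MB means 1024 KB here, which the numbers bear out (dividing by 1000 would give 1.07 for ada0).

```c
/* Throughput implied by systat's per-disk columns, assuming 1 MB = 1024 KB:
 * MB/s = (KB per transfer) * (transfers per second) / 1024. */
static double
mb_per_sec(double kb_per_transfer, double transfers_per_sec)
{
	return kb_per_transfer * transfers_per_sec / 1024.0;
}
```

For example, `mb_per_sec(5.42, 197.0)` evaluates to about 1.043.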
Meanwhile, the application load (two instances of WSPR, an instance of baudline, KDE, and a vncserver), which is soundcard-I/O bound and does little to no disk I/O, runs along perfectly happily.

The oldest kernel I have that shows the syndrome is:

FreeBSD aukward.bogons 9.1-STABLE FreeBSD 9.1-STABLE #59 r250498: Sat May 11 00:03:15 MDT 2013 toor@aukward.bogons:/usr/obj/usr/src/sys/GENERIC amd64

H/W info:

hw.machine: amd64
hw.model: AMD Phenom(tm) II X4 965 Processor
hw.ncpu: 4
hw.physmem: 16883937280
hw.clockrate: 3411
kern.sched.name: ULE

ahci0: <ATI IXP700 AHCI SATA controller> port 0xa000-0xa007,0x9000-0x9003,\
    0x8000-0x8007,0x7000-0x7003,0x6000-0x600f mem 0xfe6ffc00-0xfe6f \
    irq 19 at device 17.0 on pci0
ahci0: AHCI v1.20 with 6 6Gbps ports, Port Multiplier supported
ahcich0: <AHCI channel> at channel 0 on ahci0
[...]
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD1200JD-22HBC0 08.02D08> ATA-6 SATA 1.x device
ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
ada0: 114473MB (234441648 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
ada1: <WDC WD1200JD-22HBC0 08.02D08> ATA-6 SATA 1.x device
ada1: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
ada1: 114473MB (234441648 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad8

I'm not paging, I don't have wild interrupt loads (checked with 'vmstat -i'), and the ZFS pool is not in the middle of a scrub, but the machine responds badly even to trivial commands and buildworld doesn't get finished.

I am seeing very similar behaviour on three other 9.1-current machines, all of which are AHCI/SATA setups, using both Seagate and WD disks (of random sizes and ages). All these boxes ran fine a month ago.

BTW, when I do the rattle-keyboard-to-get-disks-going trick, the NFS daemon reports that the system clock slews badly: machine time drops behind wall-clock time. Something is locking the clock update off.
(Hmmm, I see I'm running a pre-5000/feature
Re: [HEADSUP] New pkg-devel 1.1.0 beta1
On Mon, Jun 03, 2013 at 07:39:03PM +0400, Slawa Olhovchenkov wrote: [...] Have you run pkg2ng? Yes, this is run pkg2ng.

Ok, I'll have a look and fix it asap.

regards,
Bapt
ZFS LZ4 Upgrade
Hi.

I'm planning on doing a ZFS root installation on a remote server very soon. The company only offers a 9.0 and 9.1 installation and a rescue system (basically an NFS/PXE boot with a ramdisk). I'd like to use LZ4 with the ZFS root pool, so I'm going to be upgrading to -STABLE once I have the initial system installed. Here's what I'll do:

- install the 9.1 system
- svn source, buildworld/kernel, install, reboot
- upon booting the -STABLE system, begin enabling LZ4 compression on /usr/ports, /usr/src, etc.

Will this work, or do I need to find some way to initially create the zpool with a -STABLE system? Is it just a matter of running zfs upgrade and zpool upgrade before enabling LZ4, or am I missing something?

Thanks
-kenta
Re: ZFS LZ4 Upgrade
On Jun 3, 2013, at 12:52 PM, Kenta Suzumoto ken...@hush.com wrote: [...] Is it just a matter of running zfs upgrade and zpool upgrade before enabling LZ4, or am I missing something?

That should work. Just keep in mind that blocks written before you enable compression won't be compressed. So you may want to create a _new_ ZFS filesystem for src (and for ports as well, if it already exists) after your source upgrade, then copy the contents of /usr/src over to it. (Then update the mountpoints as desired, etc.)

JN
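JN's point is that the compression property only affects blocks written after it is set. Existing blocks keep the encoding they were written with until something rewrites them, which is why copying /usr/src into a fresh filesystem helps. The toy model below illustrates only that rule. It is not how ZFS stores blocks, and all names (`toy_dataset`, `toy_write`, `toy_copy`, `MAX_BLOCKS`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: each block remembers the compression setting in effect when it was
 * written. Names and sizes are hypothetical; this is not ZFS's on-disk format. */
#define MAX_BLOCKS 64

struct toy_dataset {
	bool compression_on;               /* current property value */
	bool block_compressed[MAX_BLOCKS];
	size_t nblocks;
};

static bool
toy_write(struct toy_dataset *ds)
{
	if (ds->nblocks >= MAX_BLOCKS)
		return false;
	/* The property is sampled at write time only. */
	ds->block_compressed[ds->nblocks++] = ds->compression_on;
	return true;
}

static size_t
toy_compressed_count(const struct toy_dataset *ds)
{
	size_t n = 0;
	for (size_t i = 0; i < ds->nblocks; i++)
		if (ds->block_compressed[i])
			n++;
	return n;
}

/* Copying rewrites every block, so the destination's current setting applies. */
static bool
toy_copy(const struct toy_dataset *src, struct toy_dataset *dst)
{
	for (size_t i = 0; i < src->nblocks; i++)
		if (!toy_write(dst))
			return false;
	return true;
}
```

Turning compression on for the source afterwards leaves its old blocks uncompressed. Copying into a dataset that already has compression on yields all-compressed blocks.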
Re: 9.1-stable: ATI IXP600 AHCI: CAM timeout
On Mon, Jun 03, 2013 at 03:06:53PM +0100, Mike Pumford wrote: [...] It could also be a software bug in the way CAM handles the failure of NCQ commands. When command queueing is used on a SCSI drive and a queued command fails, only that command fails. A queued command failure on a SATA device fails ALL currently queued commands. I've not looked at the code, but do the SATA CAM drivers do the right thing here?
Quoting the T13/2015-D ATA8-ACS2 WD spec:

If an error occurs while the device is processing an NCQ command, then the device shall return command aborted for all NCQ commands that are in the queue and shall return command aborted for any new commands, except a READ LOG EXT command requesting log address 10h, until the device completes a READ LOG EXT command requesting log address 10h (i.e., reading the NCQ Command Error log) without error.

While I can't easily provide an answer to your question, I can tell you that sys/dev/ahci/ahci.c does execute READ LOG EXT (command 0x2f) in certain scenarios (the code is in the function ahci_issue_recovery()). The one person who can answer this question is mav@, who is now CC'd.

Fewer commands queued makes it less likely that multiple commands will be in progress when a failure occurs. A lower link rate also makes you more immune to signal failures.

He isn't seeing SATA-level signal/link failures; the AHCI driver would complain about those, and those messages aren't there. Unless, of course, those messages are only visible when verbose booting is enabled (I hope not).

-- 
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.            PGP 4BD6C0CB |
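The ACS-2 rule quoted above amounts to a small state machine on the device side. After an NCQ failure, the device aborts everything queued and rejects new commands. The only exception is READ LOG EXT for log 10h, which clears the error state. This is a sketch of that rule only, not ahci(4) code. It assumes the READ LOG EXT completes without error, and `struct ncq_device`, `ncq_fail()` and `submit()` are hypothetical names. The 0x2F opcode and log address 10h are the values mentioned above.

```c
#include <stdbool.h>

/* Hypothetical model of the quoted ACS-2 rule; not ahci(4) code. */
#define CMD_READ_LOG_EXT  0x2f   /* ATA READ LOG EXT opcode */
#define LOG_NCQ_ERROR     0x10   /* NCQ Command Error log address */

enum cmd_result { CMD_OK, CMD_ABORTED };

struct ncq_device {
	bool error_pending;   /* set after an NCQ command fails */
	int queued;           /* NCQ commands currently outstanding */
};

/* An NCQ command fails: every outstanding NCQ command is aborted. Returns how many. */
static int
ncq_fail(struct ncq_device *dev)
{
	int aborted = dev->queued;

	dev->queued = 0;
	dev->error_pending = true;
	return aborted;
}

/* Submit a new command; while an error is pending only READ LOG EXT 10h is accepted. */
static enum cmd_result
submit(struct ncq_device *dev, int opcode, int log_address)
{
	if (dev->error_pending) {
		if (opcode == CMD_READ_LOG_EXT && log_address == LOG_NCQ_ERROR) {
			dev->error_pending = false;   /* assumes the log read succeeds */
			return CMD_OK;
		}
		return CMD_ABORTED;
	}
	return CMD_OK;
}
```

The host driver's recovery path has to issue that log read before anything else will be accepted, which is why the question about what ahci_issue_recovery() does matters.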
Re: 9.1-current disk throughput stalls ?
On Mon, Jun 03, 2013 at 09:38:45AM -0600, Ross Alexander wrote: I wonder if anyone here has insight on a disk throughput problem that's come up over the last week or two. [...]
can not build xf86-video-intel
Hi,

Recently I switched from using ports only to packages, and I am now trying to enable KMS. I've rebuilt libdrm, but xf86-video-intel fails with the following error (any ideas how to fix that?):

  CC       sna_display_fake.lo
  CC       sna_driver.lo
sna_driver.c:437:2: error: implicit declaration of function 'xf86getBoolValue' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        xf86getBoolValue(&val, xf86GetOptValString(sna->Options, id));
        ^
1 error generated.
*** [sna_driver.lo] Error code 1

Stop in /usr/ports/x11-drivers/xf86-video-intel/work/xf86-video-intel-2.21.6/src/sna.
*** [all-recursive] Error code 1

Stop in /usr/ports/x11-drivers/xf86-video-intel/work/xf86-video-intel-2.21.6/src/sna.
*** [all-recursive] Error code 1

Stop in /usr/ports/x11-drivers/xf86-video-intel/work/xf86-video-intel-2.21.6/src.
*** [all-recursive] Error code 1

Stop in /usr/ports/x11-drivers/xf86-video-intel/work/xf86-video-intel-2.21.6.
*** [all] Error code 1

Stop in /usr/ports/x11-drivers/xf86-video-intel/work/xf86-video-intel-2.21.6.
*** [do-build] Error code 1

Stop in /usr/ports/x11-drivers/xf86-video-intel.
*** [reinstall] Error code 1

Stop in /usr/ports/x11-drivers/xf86-video-intel.
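In C99 a function must be declared before it is called. Clang reports a call to an undeclared function under -Wimplicit-function-declaration, and the -Werror shown in the diagnostic turns that into a hard error. In other words, with the installed headers, nothing declares xf86getBoolValue() before sna_driver.c uses it. The snippet below only illustrates the language rule, using a hypothetical `parse_bool()` helper. It is not a fix for the port.

```c
#include <stdbool.h>
#include <string.h>

/* The prototype must be visible before the first call; deleting this line makes
 * clang report "implicit declaration of function ... is invalid in C99". */
static bool parse_bool(bool *val, const char *str);

static bool
option_enabled(const char *str)
{
	bool val = false;

	if (!parse_bool(&val, str))   /* call site precedes the definition below */
		return false;
	return val;
}

/* Hypothetical helper: accepts a few common spellings, false if unrecognised. */
static bool
parse_bool(bool *val, const char *str)
{
	if (str == NULL)
		return false;
	if (strcmp(str, "on") == 0 || strcmp(str, "true") == 0 || strcmp(str, "1") == 0) {
		*val = true;
		return true;
	}
	if (strcmp(str, "off") == 0 || strcmp(str, "false") == 0 || strcmp(str, "0") == 0) {
		*val = false;
		return true;
	}
	return false;
}
```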
bce kernel page faults and NMIs (was: Strange reboot since 9.1)
Howdy folks,

This email is a follow-on to a 3-month-old thread about kernel page faults from the bce driver[0].

0: http://lists.freebsd.org/pipermail/freebsd-stable/2013-March/072713.html

Sorry to revive such an old thread, but a couple of bits of new information have come to light here that may be useful for others.

The suggestion to disable header splitting that Marius Strobl made[1] did fix the kernel page fault rooted in bce_intr() that we were seeing (and that other folks reported in the original thread). I'm no bce expert, but it looks to me like the bce driver does not apply the same flow control to its page queue as it does to its receive queue; maybe that's related to the problem?

1: http://lists.freebsd.org/pipermail/freebsd-stable/2013-March/072766.html

After disabling bce header splitting we stopped getting kernel page faults, but we still had problems with this NIC (Broadcom NetXtreme II BCM5716 Gigabit Ethernet) producing frequent PCI errors and occasional NMIs. I found a thread[2] that suggests that the NIC firmware version may be relevant to the NMI problem. The Red Hat people are reporting that firmware version 6.0.1 is bad and 6.4.5 is good; 9.1 ships with 6.0.17, so who knows what that means...

We ended up reverting to the bce driver from FreeBSD 7, and that fixed our NMI problems. (The bce driver from FreeBSD 7 also has header splitting disabled by default: Bonus!)

2: https://bugzilla.redhat.com/show_bug.cgi?id=693542

-- 
Sebastian Kuzminsky
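Comparing the dotted version numbers component by component shows why the firmware data point is inconclusive. 6.0.17 sorts after the reportedly bad 6.0.1 and before the reportedly good 6.4.5. A minimal comparison helper is sketched below. `version_cmp()` is a hypothetical name; it treats each run of digits as a number, treats any other character as a separator, and counts a missing component as 0.

```c
#include <ctype.h>

/* Compare dotted numeric version strings; returns <0, 0 or >0 like strcmp().
 * Hypothetical helper: digits form numbers, any other character separates them. */
static int
version_cmp(const char *a, const char *b)
{
	while (*a != '\0' || *b != '\0') {
		unsigned long na = 0, nb = 0;

		/* Read one numeric component from each string; a missing one counts as 0. */
		while (isdigit((unsigned char)*a))
			na = na * 10 + (unsigned long)(*a++ - '0');
		while (isdigit((unsigned char)*b))
			nb = nb * 10 + (unsigned long)(*b++ - '0');
		if (na != nb)
			return na < nb ? -1 : 1;
		/* Skip the separator, guaranteeing progress on every iteration. */
		if (*a != '\0')
			a++;
		if (*b != '\0')
			b++;
	}
	return 0;
}
```

`version_cmp("6.0.17", "6.0.1")` is positive because 17 > 1 in the third component. `version_cmp("6.0.17", "6.4.5")` is negative because 0 < 4 in the second.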
Re: 9.1-current disk throughput stalls ?
On Mon, 3 Jun 2013, Jeremy Chadwick wrote:

1. There is no such thing as 9.1-CURRENT. Either you meant 9.1-STABLE (what should be called stable/9) or -CURRENT (what should be called head).

I wrote:

The oldest kernel I have that shows the syndrome is - FreeBSD aukward.bogons 9.1-STABLE FreeBSD 9.1-STABLE #59 r250498: Sat May 11 00:03:15 MDT 2013 toor@aukward.bogons:/usr/obj/usr/src/sys/GENERIC amd64

See above. You're right; I shouldn't post after a 07:00 dentist's appointment while my spouse is worrying me about the insurance adjuster's report on the car damage :(. Hey, I'm very fallible. I'll try harder.

2. Is there some reason you excluded details of your ZFS setup? zpool status would be a good start.

Thanks for the useful hint as to what info you need to diagnose this.

One of the machines ran a 5-drive raidz1 pool (Mnemosyne). Another was a 2-drive gmirror, in the simplest possible gpart/gmirror setup (Mnemosyne-sub-1). The third is a 2-drive ZFS raid-1, again in the simplest possible gpart/gmirror manner (Aukward). The fourth is a conceptually identical 2-drive ZFS raid-1, swapping to a zvol (Griffon).

If you look on the FreeBSD wiki, there are pages on bootable ZFS (gptzfsboot) and bootable mirrors:

https://wiki.freebsd.org/RootOnZFS
http://www.freebsdwiki.net/index.php/RAID1,_Software,_How_to_setup

Well, I just followed those cookbook-style (modulo device and pool names). I didn't see any reason to be creative; I build for reliability, not performance.
Aukward is gpart/zfs raid-1 box #1:

aukward:/u0/rwa ls -l /dev/gpt
total 0
crw-r-----  1 root  operator  0x91 Jun  3 10:18 vol0
crw-r-----  1 root  operator  0x8e Jun  3 10:18 vol1

aukward:/u0/rwa zpool list -v
NAME           SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
ult_root       111G   108G  2.53G    97%  1.00x  ONLINE  -
  mirror       111G   108G  2.53G      -
    gpt/vol0      -      -      -      -
    gpt/vol1      -      -      -      -

aukward:/u0/rwa zpool status
  pool: ult_root
 state: ONLINE
  scan: scrub repaired 0 in 1h13m with 0 errors on Sun May  5 04:29:30 2013
config:

        NAME          STATE     READ WRITE CKSUM
        ult_root      ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            gpt/vol0  ONLINE       0     0     0
            gpt/vol1  ONLINE       0     0     0

errors: No known data errors

(Yes, that machine has no swap. It has NEVER had swap; it has 16 GB and uses maybe 10% at max load. It has been running 9.x since prerelease days, FWIW. The ARC is throttled to 2 GB; zfs-stats says I never get near using even that. It's just the box that drives the radios, a ham radio hobby machine.)

Griffon is also gpart/zfs raid-1 -

griffon:/u0/rwa uname -a
FreeBSD griffon.cs.athabascau.ca 9.1-STABLE FreeBSD 9.1-STABLE #25 r251062M: Tue May 28 10:39:13 MDT 2013 t...@griffon.cs.athabascau.ca:/usr/obj/usr/src/sys/GENERIC amd64

griffon:/u0/rwa ls -l /dev/gpt
total 0
crw-r-----  1 root  operator  0x7b Jun  3 08:38 disk0
crw-r-----  1 root  operator  0x80 Jun  3 08:38 disk1
crw-r-----  1 root  operator  0x79 Jun  3 08:38 swap0
crw-r-----  1 root  operator  0x7e Jun  3 08:38 swap1

and the pool is fat and happy -

griffon:/u0/rwa zpool status -v
  pool: pool0
 state: ONLINE
  scan: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        pool0          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            gpt/disk0  ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0

errors: No known data errors

Note that swap is through a ZFS zvol:

griffon:/u0/rwa cat /etc/fstab
# Device               Mountpoint       FStype   Options     Dump  Pass#
#
#
/dev/zvol/pool0/swap   none             swap     sw          0     0
pool0                  /                zfs      rw          0     0
pool0/tmp              /tmp             zfs      rw          0     0
pool0/var              /var             zfs      rw          0     0
pool0/usr              /usr             zfs      rw          0     0
pool0/u0               /u0              zfs      rw          0     0
/dev/cd0               /cdrom           cd9660   ro,noauto   0     0
/dev/ada2s1d           /mnt0            ufs      rw,noauto   0     0
/dev/da0s1             /u0/rwa/camera   msdosfs  rw,noauto   0     0

The machine has 32 GB and never swaps. It runs VirtualBox loads, anything from one to forty virtuals (little OpenBSD images). Load is always light.

As for the 5-drive raidz1 box (Mnemosyne), I first replaced the ZFS pool with a simple gpart/gmirror. The drives gmirrored are known to be good. That *also* ran like mud. Then I downgraded to 8.4-STABLE, GENERIC kernel, and it's just fine now, thanks. I
Re: 9.1-current disk throughput stalls ?
On Mon, Jun 03, 2013 at 03:48:30PM -0600, Ross Alexander wrote:
[...]
Re: 9.1-current disk throughput stalls ?
On Mon, Jun 03, 2013 at 03:34:26PM -0700, Jeremy Chadwick wrote:

> 7. ZFS setup is a mirror (RAID-1-like),

Should have referenced [2].

> 12. Rolling back to 8.4-STABLE (date/build unknown) apparently fixes your issue (I would appreciate you running the system for 72 hours before making this statement, and doing the *exact same things* on it that cause the problem with 9.1-STABLE)

[2] I should have used the word "exacerbate" instead of "cause".

> v) I really wish you would not have rolled this system back to 8.4-STABLE. For anyone to debug this, we need the system in a consistent state. Changing kernels/etc.

User error while using vim (I have an awful tendency to nuke entire lines when switching between input mode and navigation mode); the last line should read "Changing kernels/etc. in the middle of troubleshooting a problem you ask for assistance with makes things very difficult." (And I say that knowing that rolling back as a form of testing is good, since it can help narrow things down to a specific version or release, i.e. a software problem.)

-- 
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
Re: ZFS LZ4 Upgrade
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 06/03/13 11:52, Kenta Suzumoto wrote:
> Hi. I'm planning on doing a ZFS root installation on a remote server very soon. The company only offers a 9.0 and 9.1 installation and rescue (nfs/pxe boot with ramdisk basically) system. I'd like to use LZ4 with the ZFS root pool, so I'm going to be upgrading to -STABLE once I have the initial system installed. Here's what I'll do:
>
> - install the 9.1 system
> - svn source, buildworld/kernel, install, reboot
> - upon booting the -STABLE system, begin enabling LZ4 compression on /usr/ports, /usr/src, etc.
>
> Will this work, or do I need to find some way to initially create the zpool with a -STABLE system? Is it just a matter of running zfs upgrade and zpool upgrade before enabling LZ4, or am I missing something?
>
> Thanks

This should work; you don't have to have the pool created on a -STABLE system and can always do 'zpool upgrade' when necessary. Please note that if you want to upgrade the root pool, make sure that you have a fresh bootblock before rebooting (zpool upgrade should give you a reminder). The FreeBSD ZFS boot code supports booting from LZ4-compressed datasets.

I would recommend using a dedicated root pool, by the way; it has a lot of advantages. For example, you can use a separate ZIL device for the non-root pool, and you have the flexibility of using different RAID-Z layouts for different purposes (e.g. a mirror for the boot pool and RAID-Z for the rest), so if the first disk's boot partition is physically damaged, your remote hands can just remove that disk and the system is still bootable.

Cheers,
- -- 
Xin LI delp...@delphij.net    https://www.delphij.net/
FreeBSD - The Power to Serve!
Live free or die
-----BEGIN PGP SIGNATURE-----

iQEcBAEBCgAGBQJRrSLoAAoJEG80Jeu8UPuzLW8IAMRJJHloskYlrikKGw/Iao8A
jlifL2sDpnztgqhGtfltXOIyfKJL7EDw8s8ccJq8Xy1vKy2pndOgCc7GRfuvYS64
07kGrEKQKAv0BcrD6uddtMFstDdrI7cnK3btyAavNJhXR/b2A8f8/jze13mPpdUJ
DS2PBJ0rwmHqU7VXxkCq/M8atZGT7pq2ednPcHXX3QaqPmpopUtX89x6D99S39U7
a1UKN2Ic78CAc5R01I80y6II85yBylwIlT5lRPE/SVYbtBFsidhQQ8/Hx/Fcz3qI
7ceK6CiAoQy7ntBvE+uBbQ2630Z3m6kOmInFJD/DnQnf1n6twmBxqwsFTjITbHw=
=pCl7
-----END PGP SIGNATURE-----
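For anyone following the same LZ4 path, here is a minimal sketch of the post-upgrade steps described in the LZ4 thread: once the -STABLE world and kernel are running, do 'zpool upgrade', refresh the boot blocks before the next reboot, then turn on LZ4. It is not taken from the thread. The pool name zroot, the boot disks ada0/ada1, the freebsd-boot partition index 1 and the dataset list are hypothetical placeholders, and the gpart bootcode line assumes GPT-partitioned disks booting via gptzfsboot. By default (DRY_RUN=1) it only prints the commands it would run.

```sh
#!/bin/sh
# Sketch only. ASSUMPTIONS (hypothetical, adjust for the real machine):
#   root pool "zroot", GPT boot disks ada0 and ada1, freebsd-boot partition
#   index 1, datasets to compress: zroot/usr/ports and zroot/usr/src.
# With DRY_RUN=1 (the default) commands are printed; DRY_RUN=0 runs them.
set -eu

POOL=${POOL:-zroot}
BOOT_DISKS=${BOOT_DISKS:-"ada0 ada1"}
BOOT_INDEX=${BOOT_INDEX:-1}
LZ4_DATASETS=${LZ4_DATASETS:-"$POOL/usr/ports $POOL/usr/src"}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

upgrade_and_enable_lz4() {
    # 1. Upgrade the pool; only do this after booting the -STABLE
    #    world/kernel that knows about LZ4.
    run zpool upgrade "$POOL"

    # 2. Refresh the boot blocks on every boot disk BEFORE rebooting,
    #    so the boot code can still read the upgraded pool.
    for disk in $BOOT_DISKS; do
        run gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i "$BOOT_INDEX" "$disk"
    done

    # 3. Enable LZ4; only data written from now on is compressed.
    for ds in $LZ4_DATASETS; do
        run zfs set compression=lz4 "$ds"
    done
}

upgrade_and_enable_lz4
```

Review the printed commands first, then rerun as root with something like DRY_RUN=0 POOL=yourpool BOOT_DISKS="ada0 ada1" sh lz4-upgrade.sh (the script name is hypothetical). Keep in mind that compression=lz4 only applies to blocks written after the property is set; existing files stay uncompressed until they are rewritten.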