Re: Possible PEBKAC bug for fwget(8)?
On 2023-Jul-07 08:03:40 +0100, Graham Perrin wrote:
>PCI pictured at
><https://en.wikipedia.org/wiki/Peripheral_Component_Interconnect>, somehow I
>don't imagine finding that type of slot inside the HP EliteBook where I ran
>the command ;-)

Whilst you probably don't have a full-size PCI or PCIe connector in your
laptop, it's very likely that it has a Mini PCIe connector for the WiFi
adapter. Even without that, there are virtual PCI buses inside your CPU
chip - have a look at the output of "pciconf -lv".

-- 
Peter Jeremy
signature.asc Description: PGP signature
ntpd fails on recent -current/arm64
Somewhere between c283016-g607bc91d90a3 and c283077-g7f658f99f7ed, some
change in the kernel has made ntpd stop working on my arm64 test box.
(My amd64 test box is a couple of days behind so I'm not sure if it's
arm-specific.)

What I've identified so far:
* The problem is in the kernel, not userland.
* The impact seems to be limited to ntpd (in particular, ntpdate works).
* ntpd appears to be correctly exchanging NTP packets with peers.
* ntpd is not responding to "ntpq -p" queries.
* ntp_gettime and ntp_adjtime both return TIME_ERROR to ntptime.

I've looked through the commits and, beyond much of netinet being
roto-tilled, I can't see anything obvious. Is anyone else seeing
anything similar? Can anyone suggest where to look next?

-- 
Peter Jeremy
Re: Beadm can't create snapshot
On 2022-Aug-23 15:19:34 +0200, Ronald Klop wrote:
>From: Kyle Evans
>> I was not aware that beadm touches loader.conf, but I find that
>> slightly horrifying. I won't personally make bectl do that, but I
>> guess I could at least document that it doesn't...
>
>Today I looked up something for boot environments myself and read this:
>https://wiki.freebsd.org/BootEnvironments#Setting_Boot_Dataset
>
>"In order for boot environments to be effective, you must let the bootfs zpool
>property control which dataset gets mounted as the root. Particularly,
>/etc/fstab must be purged of any / mount, and /boot/loader.conf must not be
>setting vfs.root.mountfrom directly. "
>
>So it is documented somewhere at least.

Looking at the wiki history, Kyle wrote that in January 2020. I wonder
if he recalls where that requirement came from.

I've gone rummaging through the mailing list history and other wiki
pages. It seems that vfs.root.mountfrom used to be required - e.g.
https://lists.freebsd.org/pipermail/freebsd-fs/2011-September/012482.html
https://lists.freebsd.org/pipermail/svn-src-head/2011-October/030641.html
and people wanted to change that - e.g.
https://lists.freebsd.org/pipermail/freebsd-current/2009-October/012933.html
https://lists.freebsd.org/pipermail/freebsd-fs/2010-March/008010.html
resulting in it becoming optional in May 2012:
https://lists.freebsd.org/pipermail/svn-src-head/2012-May/036902.html

Based on the quoted wiki entry, it seems that sometime between May 2012
and January 2020, vfs.root.mountfrom went from "must be set" to "must
not be set" and I can't find anywhere where that is publicised.

This is a serious problem because we now have the situation where some
documentation still says to set vfs.root.mountfrom - e.g.
https://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/Mirror step 2.6
and people are still using it without being warned that it shouldn't
be used - e.g. the thread starting
https://lists.freebsd.org/pipermail/freebsd-fs/2020-July/028351.html

I've had a look at the beadm source and it preserves/updates
vfs.root.mountfrom if it's present in loader.conf but doesn't add it
if it's not present.

IMO, if bectl isn't going to update loader.conf, it needs to warn and
fail if loader.conf contains a vfs.root.mountfrom that points to a BE
that's different to bootfs. (And ideally, a similar check of
/etc/fstab, though beadm doesn't touch that.)

-- 
Peter Jeremy
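The consistency check proposed in the message above (refuse to proceed when loader.conf pins a root that differs from bootfs) might be sketched roughly as follows. This is an illustration in Python, not bectl or beadm code: the function name `mountfrom_conflict` is hypothetical, the file contents and bootfs value are passed in as plain strings, and it assumes that a later assignment in loader.conf overrides an earlier one. Real loader.conf parsing has more corner cases than this handles.

```python
import re

# Hypothetical helper for illustration; not part of bectl or beadm.
_MOUNTFROM = re.compile(r'^\s*vfs\.root\.mountfrom\s*=\s*"?([^"#\s]+)"?')


def mountfrom_conflict(loader_conf_text, bootfs):
    """Return the vfs.root.mountfrom value if it names something other
    than the bootfs dataset, or None if it is unset or consistent."""
    value = None
    for line in loader_conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue                      # skip comment lines
        m = _MOUNTFROM.match(line)
        if m:
            value = m.group(1)            # assumption: last assignment wins
    if value is None:
        return None
    # Strip the "zfs:" filesystem-type prefix before comparing datasets.
    dataset = value[4:] if value.startswith("zfs:") else value
    return value if dataset != bootfs else None
```

A caller would warn and stop when the function returns a value, and carry on when it returns None.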
Re: Beadm can't create snapshot
On 2022-Aug-22 10:56:51 +0200, "Patrick M. Hausen" wrote:
>> On 22.08.2022 at 10:45, Peter Jeremy wrote:
>> On 2022-Aug-17 18:07:20 +0200, "Patrick M. Hausen" wrote:
>>> Isn't beadm retired in favour of bectl?
>>
>> 2) "bectl activate" doesn't update /boot/loader.conf so the wrong
>> root filesystem is mounted.
>
>You mean the vfs.root.mountfrom option? I thought that, too, was deprecated and
>replaced by the bootfs property of the zpool.

I've looked through the mailing list archives and searched the 'net
and haven't found anything saying vfs.root.mountfrom is deprecated.
loader(8) mentions that it will fall back to using "currdev" if
there's no root entry in /etc/fstab and vfs.root.mountfrom isn't set.

At the very least, it's an undocumented incompatibility between beadm
and bectl: I can't take an existing system that's using beadm and just
switch to using bectl.

-- 
Peter Jeremy
Re: Beadm can't create snapshot
On 2022-Aug-17 18:07:20 +0200, "Patrick M. Hausen" wrote:
>Isn't beadm retired in favour of bectl?

bectl still has a number of bugs:
1) The output from "bectl list" is in filesystem/bename order rather
   than creation date order. This is an issue if you use (e.g.) git
   commit hashes as the name.
2) "bectl activate" doesn't update /boot/loader.conf so the wrong
   root filesystem is mounted.

That said, "bectl create" appears to be a workable replacement for
"beadm create" and avoids the current "'snapshots_changed' is
readonly" bugs.

-- 
Peter Jeremy
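To illustrate why the ordering in point 1 matters when names are commit hashes, here is a small Python sketch that sorts boot environments by creation time rather than by name. The `(name, created)` pairs are made-up sample data, the "YYYY-MM-DD HH:MM" timestamp format is an assumption, and how the creation time is obtained from the system is deliberately left out.

```python
from datetime import datetime


def by_creation(envs):
    """Sort (name, created) pairs oldest-first.

    created is assumed to be a 'YYYY-MM-DD HH:MM' string.
    """
    return sorted(envs, key=lambda e: datetime.strptime(e[1], "%Y-%m-%d %H:%M"))


# Made-up sample: names are git hashes, so name order says nothing about age.
sample = [
    ("7f658f99f7ed", "2022-08-10 09:15"),
    ("02611ef8ee9", "2022-08-17 18:07"),
    ("607bc91d90a3", "2022-07-30 22:41"),
]

for name, created in by_creation(sample):
    print(created, name)
```

Sorting the same sample by name puts the newest environment first, which is the kind of misleading listing the message describes.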
Re: recover deleted file
On 2022-Apr-17 01:13:02 +0300, Sami Halabi wrote:
>I understand its hard to undelete since no one designed UFS/ZFS to do so..
>that why I asked in later replies to see if someone would step in and
>implement such a "feature" and I suggested some directions/thoughts.

As you point out, neither UFS nor ZFS were designed to support an
"undelete" function: Once an inode has no references (open files or
directory entries), the inode and all associated data blocks are
returned to the free list and could be used by a subsequent allocation.

What semantics would you like UFS or ZFS to implement instead? Is it
just that the inode and associated data blocks should stay in limbo for
some period? If so, what controls the period? What if a file is
truncated to 0 or overwritten before being unlinked? How much would you
be willing to pay for "undelete" functionality?

>As soren@ suggested in later reply it maybe would be easier to implement
>custom rm script that moves files to "Recycle bin" directory (and empty it
>after some period)

Alternatively, you could alias "rm" to "rm -i".

>but as a programmer I know that perfection is needed :)
>so It might start as a simple task and end in many what-if's
>(unfortunattly I did my last C programming in late 2003!).

This doesn't need to be C. You could do this in your scripting
language of choice. Or you could offer to pay someone to do this for
you.

>What amzes me is that this "feature" was asked too much in the last decade
>or two and no one ever implemented it, maybe it's not needed in daily
>usage, but in disasters it would be super userful, save admins many time
>and nerves..

I went rummaging back through my mail archives and it actually doesn't
seem to come up that often. You seem to be about the 3rd person this
century on the lists I read. I did find a discussion in zfs-discuss
from May/June 2006 about supporting undelete but it seems that no
agreement on the desired behaviour was achieved.

>For now I did some backup tools locally and used chflags to mark them
>undeletable so I wouldn't do that mistake again,

You could also consider snapshots - both UFS and ZFS support snapshots.
If the information is very critical (you mentioned legal consequences)
then you might like to consider real-time replication of the MySQL redo
logs to another system - though that won't necessarily protect you
from someone accidentally doing a "DELETE FROM xxx;" or "DROP TABLE xxx;"

-- 
Peter Jeremy
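The "recycle bin" idea discussed in the message above does not need filesystem support. A minimal sketch in Python might look like the following. The function names `trash` and `purge`, the timestamp-prefix naming scheme and the trash directory location are all assumptions made for this example. It ignores the harder what-ifs raised in the thread (files truncated or overwritten before removal, moves across filesystems being slow, name clashes within the same nanosecond).

```python
import os
import shutil
import time


def trash(path, trash_dir):
    """Move path into trash_dir instead of unlinking it; return the new location."""
    os.makedirs(trash_dir, exist_ok=True)
    # Prefix with a nanosecond timestamp so same-named files do not collide
    # and so purge() can tell how long ago the file was trashed.
    dest = os.path.join(trash_dir, f"{time.time_ns()}-{os.path.basename(path)}")
    shutil.move(path, dest)
    return dest


def purge(trash_dir, max_age_seconds, now=None):
    """Permanently delete regular files trashed more than max_age_seconds ago."""
    now = time.time() if now is None else now
    removed = []
    for name in os.listdir(trash_dir):
        full = os.path.join(trash_dir, name)
        prefix = name.split("-", 1)[0]
        if not (prefix.isdigit() and os.path.isfile(full)):
            continue                      # not something trash() created
        if now - int(prefix) / 1e9 > max_age_seconds:
            os.remove(full)
            removed.append(name)
    return removed
```

`purge` would typically run from cron; until it runs, a mistakenly removed file can simply be moved back out of the trash directory.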
Re: Rock64 configuration fails to boot for main 22c4ab6cb015 but worked for main 06bd74e1e39c (Nov 21): e.MMC mishandled?
On 2021-Dec-09 08:19:30 +0100, Emmanuel Vadot wrote:
>
> Hi Mark,
>
>On Wed, 8 Dec 2021 20:36:20 -0800
>Mark Millard via freebsd-current wrote:
>
>> [ Note: w...@freebsd.org is only a guess, based on:
>> https://lists.freebsd.org/archives/dev-commits-src-main/2021-December/001931.html
>> ]
>>
>> Attempting to update to:
>>
>> main-n251456-22c4ab6cb015-dirty: Tue Dec 7 19:38:53 PST 2021
>>
>> resulted in boot failure (showing some boot -v output):

[hang just before root is mounted]

> Could you try reverting
>8661e085fb953855dbc7059f21a64a05ae61b22c "mmc: Fix HS200/HS400
>capability check" and let me know ?

I had exactly the same boot failure but was still working backwards
through the root mount code trying to isolate the issue. Reverting
8661e085fb953855dbc7059f21a64a05ae61b22c solves the problem for me.

I'd noticed the mmc1 difference and mmcsd1 error:
mmc1: bus: 8bit, 200MHz (HS200 timing)
mmc1: memory: 30310400 blocks, erase sector 1024 blocks
mmc1: setting transfer rate to 150.000MHz (HS200 timing)
but I didn't think it was the cause.

I had tracked down that the hang was somewhere between
https://cgit.freebsd.org/src/tree/sys/kern/vfs_mountroot.c#n779
and
https://cgit.freebsd.org/src/tree/sys/kern/vfs_mountroot.c#n1008
which led me to suspect that the problem might be in the geom layer
(e.g. g_waitidle()) but was still considering where to add my next
tranche of printf's when I saw Mark's mail.

-- 
Peter Jeremy
Re: Install to ZFS root is using device names hence failing when device tree is changed.
On 2021-Sep-06 17:45:31 +0200, Karel Gardas wrote:
>just installed 14-current snapshot from 2.9. on uefi amd64 machine.
>Installed from USB memstick which was detected as da0 into the ssd
>hanging on usb3 in external enclosure which was detected as da1.
>
>ZFS root pool is then using /dev/da1p3 as swap and /dev/da1p1 as
>/boot/efi and probably also something as root zpool.
>
>Anyway, expected thing happen. When I pulled out USB stick identified as
>da0 on reboot, the drive on USB3 switch from da1 to da0 and result is
>unbootable system with complains about various /dev/da1xx drives missing
>for swap efi boot etc.

Can you give more details about exactly what the errors are and when
they occur during the boot cycle? In particular:
* Low-level boot (anything prior to the FreeBSD kernel) knows nothing
  about da0 or da1, so any problems there are associated with your BIOS
  config, not FreeBSD.
* The swap partition will, by default, appear as a hard-wired device
  name in /etc/fstab - that will definitely need updating. This will
  prevent the "swapon" working but won't prevent the boot.
* ZFS doesn't care about device names - it looks for ZFS labels on all
  possible devices.

-- 
Peter Jeremy
Re: Files in /etc containing empty VCSId header
On 2021-Jun-08 17:13:45 -0600, Ian Lepore wrote:
>On Tue, 2021-06-08 at 15:11 -0700, Rodney W. Grimes wrote:
>> There is a command for that which does or use to do a pretty
>> decent job of it called whereis(1).

Thanks. That looks useful.

>revolution > whereis ntp.conf
>ntp.conf:
>revolution > whereis netif
>netif:
>revolution > whereis services
>services:
>
>So how does that help me locate the origin of these files in the source
>tree?

It works for me™:
server% whereis ntp.conf
ntp.conf: /usr/src/usr.sbin/ntp/ntpd/ntp.conf
server% whereis netif
netif: /usr/src/libexec/rc/rc.d/netif
server% whereis services
services: /usr/src/contrib/unbound/services

Is your source tree somewhere other than /usr/src?

-- 
Peter Jeremy
Re: geli broken in 13.0-BETA4 and later on armv8
On 2021-Mar-06 10:39:02 -0800, Oleksandr Tymoshenko wrote:
>Peter Jeremy via freebsd-current (freebsd-current@freebsd.org) wrote:
>> [Adding arm@ and making it clearer that this is armv8-only]
>>
>> On 2021-Mar-06 20:26:19 +1100, Peter Jeremy
>> wrote:
>> >On 2021-Mar-06 19:18:37 +1100, Peter Jeremy via freebsd-stable
>> > wrote:
>> >>Somewhere between 13.0-ALPHA2 (c256201-g02611ef8ee9) and 13.0-BETA4
>> >>(releng/13.0-n244592-e32bc253629), geli (at least on my RockPro64 -
>> >>RK3399, arm64) has changed so that a geli-encrypted partition (using
>> >>AES-XTS 128) that was readable on 13.0-ALPHA2 becomes garbage on
>> >>13.0-BETA4.
>> >
>> >I've confirmed that the problem is f76393a6305b - reverting that
>> >commit fixes the problem in releng/13.0.
>> >
>> >I've further verified that the bug is still present in main (14.x)
>> >at 028616d0dd69.
>
>Could you test this patch and let me know if it fixes the issue?
>
>https://people.freebsd.org/~gonzo/patches/armv8crypto-xts-fix.diff

Yes, it does. Thank you very much.

-- 
Peter Jeremy
Re: geli broken in 13.0-BETA4 and later on armv8
[Adding arm@ and making it clearer that this is armv8-only]

On 2021-Mar-06 20:26:19 +1100, Peter Jeremy wrote:
>On 2021-Mar-06 19:18:37 +1100, Peter Jeremy via freebsd-stable
> wrote:
>>Somewhere between 13.0-ALPHA2 (c256201-g02611ef8ee9) and 13.0-BETA4
>>(releng/13.0-n244592-e32bc253629), geli (at least on my RockPro64 -
>>RK3399, arm64) has changed so that a geli-encrypted partition (using
>>AES-XTS 128) that was readable on 13.0-ALPHA2 becomes garbage on
>>13.0-BETA4.
>
>I've confirmed that the problem is f76393a6305b - reverting that
>commit fixes the problem in releng/13.0.
>
>I've further verified that the bug is still present in main (14.x)
>at 028616d0dd69.

-- 
Peter Jeremy
Re: geli broken in 13.0-BETA4 and later
On 2021-Mar-06 19:18:37 +1100, Peter Jeremy via freebsd-stable wrote:
>Somewhere between 13.0-ALPHA2 (c256201-g02611ef8ee9) and 13.0-BETA4
>(releng/13.0-n244592-e32bc253629), geli (at least on my RockPro64 -
>RK3399, arm64) has changed so that a geli-encrypted partition (using
>AES-XTS 128) that was readable on 13.0-ALPHA2 becomes garbage on
>13.0-BETA4.

I've confirmed that the problem is f76393a6305b - reverting that
commit fixes the problem in releng/13.0.

I've further verified that the bug is still present in main (14.x)
at 028616d0dd69.

-- 
Peter Jeremy
Re: New Xorg - different key-codes
On 2020-Mar-11 10:29:08 +0100, Niclas Zeising wrote:
>This has to do with switching to using evdev to handle input devices on
>FreeBSD 12 and CURRENT. There's been several reports, and suggested
>solutions to this, as well as an UPDATING entry detailing the change.

The UPDATING entry says that it's switched from devd to udev. There's
no mention of evdev or that the keycodes have been roto-tilled. It's
basically a vanilla "things have been changed, see the documentation"
entry. Given that entry, it's hardly surprising that people are
confused.

-- 
Peter Jeremy
Re: System clock is slow
On 2020-Mar-09 19:59:09 -0400, Theron wrote:
>Since switching from 12.1-RELEASE to CURRENT I've noticed timing
>problems with audio applications. It turns out that the problem is not
>with the audio drivers, but with the system clock driver, which now
>reports passage of time 0.3% too slow. Although I discovered this only
>recently, it's been broken since r352684 made on Sept. 25. Has anyone
>else noticed?

Note that r352684 was MFC'd to both 11-stable (r353007) and 12-stable
(r353006) in early October and I don't recall seeing any adverse
reports before this.

Are you running NTP? If so, is NTP maintaining lock and what is the
reported PLL frequency (ntpq -c kerni)?

What does "sysctl kern.timecounter" report and have you tried using any
of the alternative timecounters listed in kern.timecounter.choice?

Are you overclocking your CPU (or doing anything else non-standard)?

-- 
Peter Jeremy
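For a sense of scale on the reported 0.3% error and the question about NTP lock, the arithmetic can be worked through in a few lines of Python. The only figure taken from the thread is the 0.3%. The 500 ppm constant is ntpd's usual limit on frequency correction and is included here as background for illustration, not something stated in the messages above.

```python
NTP_MAX_FREQ_PPM = 500  # ntpd's usual limit on frequency correction


def drift(fraction_slow):
    """Return (ppm, seconds lost per day) for a clock slow by the given fraction."""
    ppm = fraction_slow * 1e6
    seconds_per_day = fraction_slow * 86400
    return ppm, seconds_per_day


ppm, per_day = drift(0.003)  # 0.3% slow, as reported
print(f"{ppm:.0f} ppm, {per_day:.1f} s/day, "
      f"correctable by frequency steering: {ppm <= NTP_MAX_FREQ_PPM}")
```

A 0.3% error is about 3000 ppm, or roughly 259 seconds a day, which is well beyond what frequency steering alone can absorb. That is one reason the state of the NTP lock is a useful thing to ask about.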
Re: Which AMD CPUs are supported -- temperature
On 2020-Feb-13 13:27:17 -0800, Chris wrote:
>My BIOS appears to have the correct temp reading. Would it be of any use
>to anyone besides myself, if I were to decompile it, and get the source
>for the temp reading/monitoring from it?

I would definitely like to have this information. If you are able to
share the two constants (both step size and reference temperature),
that would be great.

-- 
Peter Jeremy
Re: Which AMD CPUs are supported -- temperature
On 2020-Feb-12 15:23:51 -0500, mike tancsa wrote:
>Not sure about the older Athlon CPUs, but the 2 generations of Ryzen's I
>have seem correct as well as an APU
>
>CPU: AMD GX-412TC SOC (998.17-MHz K8-class CPU)

OTOH, I'm not confident about temperatures on my APU. The publicly
available data just says that the SoC reports "a temperature on its
own scale" relative to a Tctl_max which "is specified in the power and
thermal data sheet" (that I have been unable to locate). Everyone
seems to assume that the step size is 0.125K but I haven't found that
publicly documented anywhere. The AMD Product Brief states that the
maximum temperature is 90°C but using that as Tctl_max gives me
temperature readings that don't look right.

>And on a fanless APU
>
># sysctl -a dev.cpu.0.temperature
>dev.cpu.0.temperature: 62.6C
>
># sysctl -a dev.amdtemp.0.core0.sensor0
>dev.amdtemp.0.core0.sensor0: 63.1C

At what ambient temperature? I see a similar value from my (idle) APU3
but don't believe the (implied) ~35K junction-to-ambient difference.

-- 
Peter Jeremy
Re: head -r356066 reaching kern.ipc.nmbclusters on Rock64 (CortexA53 with 4GiByte of RAM) while putting files on it via nfs: some evidence
Sorry for the delay in responding.

On 2019-Dec-27 21:59:49 -0800, Mark Millard via freebsd-arm wrote:
>The following sort of sequence leads to the Rock64 not
>responding on the console or over ethernet, after notifying
>of nmbclusters having been reached. (This limits what
>information I have of what things were like at the end.)

There's a bug in the dwc(4) driver such that it can leak mbuf clusters.
I've been running with the following patch but need to clean it up
somewhat before I can commit it:

Index: sys/dev/dwc/if_dwc.c
===================================================================
--- sys/dev/dwc/if_dwc.c	(revision 356350)
+++ sys/dev/dwc/if_dwc.c	(working copy)
@@ -755,7 +755,6 @@ dwc_rxfinish_locked(struct dwc_softc *sc)
 {
 	struct ifnet *ifp;
-	struct mbuf *m0;
 	struct mbuf *m;
 	int error, idx, len;
 	uint32_t rdes0;
 
@@ -762,9 +761,8 @@
 	ifp = sc->ifp;
 
-	for (;;) {
+	for (; ; sc->rx_idx = next_rxidx(sc, sc->rx_idx)) {
 		idx = sc->rx_idx;
-
 		rdes0 = sc->rxdesc_ring[idx].tdes0;
 		if ((rdes0 & DDESC_RDES0_OWN) != 0)
 			break;
 
@@ -773,9 +771,9 @@
 		    BUS_DMASYNC_POSTREAD);
 		bus_dmamap_unload(sc->rxbuf_tag, sc->rxbuf_map[idx].map);
 
+		m = sc->rxbuf_map[idx].mbuf;
 		len = (rdes0 >> DDESC_RDES0_FL_SHIFT) & DDESC_RDES0_FL_MASK;
 		if (len != 0) {
-			m = sc->rxbuf_map[idx].mbuf;
 			m->m_pkthdr.rcvif = ifp;
 			m->m_pkthdr.len = len;
 			m->m_len = len;
@@ -784,24 +782,33 @@
 			/* Remove trailing FCS */
 			m_adj(m, -ETHER_CRC_LEN);
 
+			/* Consume the mbuf and mark it as consumed */
+			sc->rxbuf_map[idx].mbuf = NULL;
 			DWC_UNLOCK(sc);
 			(*ifp->if_input)(ifp, m);
 			DWC_LOCK(sc);
+			m = NULL;
 		} else {
 			/* XXX Zero-length packet ? */
 		}
 
-		if ((m0 = dwc_alloc_mbufcl(sc)) != NULL) {
-			if ((error = dwc_setup_rxbuf(sc, idx, m0)) != 0) {
-				/*
-				 * XXX Now what?
-				 * We've got a hole in the rx ring.
-				 */
+		if (m == NULL) {
+			if ((m = dwc_alloc_mbufcl(sc)) == NULL) {
+				if_inc_counter(sc->ifp, IFCOUNTER_IQDROPS, 1);
+				continue;
 			}
-		} else
+		}
+
+		if ((error = dwc_setup_rxbuf(sc, idx, m)) != 0) {
+			m_free(m);
+			device_printf(sc->dev,
+			    "dwc_setup_rxbuf returned %d\n", error);
 			if_inc_counter(sc->ifp, IFCOUNTER_IQDROPS, 1);
-
-		sc->rx_idx = next_rxidx(sc, sc->rx_idx);
+			/*
+			 * XXX Now what?
+			 * We've got a hole in the rx ring.
+			 */
+		}
 	}
 }
 

-- 
Peter Jeremy
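One way to read the patch above is that the old receive loop always allocated a replacement cluster for the ring slot, even when the slot's existing mbuf had not been handed up the stack (the zero-length case), so the existing one was overwritten and lost. The toy Python model below illustrates that reading only. It is not driver code: `Pool`, `rx_old`, `rx_new` and `leaked` are names invented for the example, and the allocation-failure and `dwc_setup_rxbuf` error paths are left out.

```python
class Pool:
    """Counts clusters that have been allocated but not yet freed."""

    def __init__(self):
        self.live = 0

    def alloc(self):
        self.live += 1
        return object()

    def free(self, m):
        self.live -= 1


def rx_old(pool, ring, lengths):
    """Old loop: always allocate a replacement, consumed or not."""
    idx = 0
    for length in lengths:
        if length != 0:
            pool.free(ring[idx])      # stands in for the stack consuming the packet
        ring[idx] = pool.alloc()      # overwrites (leaks) an unconsumed mbuf
        idx = (idx + 1) % len(ring)


def rx_new(pool, ring, lengths):
    """Patched loop: allocate only if the slot's mbuf was handed up the stack."""
    idx = 0
    for length in lengths:
        m = ring[idx]
        if length != 0:
            pool.free(m)              # consumed by the stack
            m = None
        if m is None:
            m = pool.alloc()
        ring[idx] = m                 # an unconsumed mbuf is simply reused
        idx = (idx + 1) % len(ring)


def leaked(rx, lengths, ring_size=4):
    """Run rx over the given frame lengths; return clusters beyond the ring size."""
    pool = Pool()
    ring = [pool.alloc() for _ in range(ring_size)]
    rx(pool, ring, lengths)
    return pool.live - ring_size


frames = [64, 0, 1500, 0, 0, 60]      # made-up lengths; 0 means a zero-length frame
print("old:", leaked(rx_old, frames), "new:", leaked(rx_new, frames))
```

In this model the old loop leaks one cluster per zero-length frame, which over time would be consistent with the nmbclusters exhaustion reported in the quoted message, while the patched loop keeps the number of live clusters equal to the ring size.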
buildworld has mandatory dependency on optional executable.
I've just discovered that "make buildworld" has a mandatory dependency
on kbdcontrol (see
https://svnweb.freebsd.org/base/head/Makefile.inc1?annotate=354138#l2207 )
but, if WITHOUT_LEGACY_CONSOLE is defined, then kbdcontrol isn't built
(https://svnweb.freebsd.org/base/head/usr.sbin/Makefile?annotate=352949#l162 )
and the installed version will be deleted by "make delete-old":
https://svnweb.freebsd.org/base/head/tools/build/mk/OptionalObsoleteFiles.inc?annotate=353358#l4520

This seems undesirable...

The "make buildworld" failure doesn't make the cause obvious - it just
reports "*** Error code 1" in bootstrap-tools. Having traced the
failure, I now see ".ERROR_TARGET='_bootstrap-tools-link-kbdcontrol'"
but that was only obvious in hindsight.

-- 
Peter Jeremy
Re: Reproducable deadlock in NFS client
On 2019-Oct-03 23:28:07 +, Rick Macklem wrote:
>1 - kib@ just put a patch up on phabricator that reorganizes the handling
>    of vnode_pager_setsize().
>    D21883
>    (If you could test this patch, that might be the best approach.)

That fixes my problem. I've added a note to D21883.

>ps: Btw, capturing "procstat -kk" and "ps axHl" would give you/us more info.
>    (The "H" on "ps" shows the iod threads.)
>    If you can drop into the debugger when it is hung as above, you could
>    capture the stuff listed here:
>https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

Thanks for the pointer and sorry for leaving that out.

-- 
Peter Jeremy
Reproduceable deadlock in NFS Client
My diskless Rock64 has taken to deadlocking reproducibly whilst
building libprivatesqlite3.a as part of buildworld when running
r352792. At the time of the deadlock, the relevant running process is:

ar -crD libprivatesqlite3.a sqlite3.o

And those files are:

-rw-r--r--  1 root  wheel  3178496  4 Oct 01:10 libprivatesqlite3.a
-rw-r--r--  1 root  wheel  7975272  4 Oct 01:10 sqlite3.o

The "ar" reports it's in bo_wwait and, after about 30 minutes, I get:

deadlres_td_sleep_q: possible deadlock detected for 0xfd00012c9560, blocked for 1800613 ticks
cpuid = 2
time = 1570117920
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
	 pc = 0x0054b83c  lr = 0x000e2b08
	 sp = 0x4030a790  fp = 0x4030a9a0
db_trace_self_wrapper() at vpanic+0x18c
	 pc = 0x000e2b08  lr = 0x0027fb54
	 sp = 0x4030a9b0  fp = 0x4030aa50
vpanic() at panic+0x44
	 pc = 0x0027fb54  lr = 0x0027f904
	 sp = 0x4030aa60  fp = 0x4030aae0
panic() at deadlkres+0x33c
	 pc = 0x0027f904  lr = 0x0021c19c
	 sp = 0x4030aaf0  fp = 0x4030ab50
deadlkres() at fork_exit+0x7c
	 pc = 0x0021c19c  lr = 0x002404f4
	 sp = 0x4030ab60  fp = 0x4030ab90
fork_exit() at fork_trampoline+0x10
	 pc = 0x002404f4  lr = 0x0056743c
	 sp = 0x4030aba0  fp = 0x0000

-- 
Peter Jeremy
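The "blocked for 1800613 ticks" figure in the panic message above lines up with the "about 30 minutes" in the text if the kernel tick rate is 1000 Hz. That rate is an assumption here (it is a common kern.hz value, but the message does not state it). A quick Python check of the arithmetic:

```python
def ticks_to_seconds(ticks, hz=1000):
    """Convert kernel ticks to seconds; hz=1000 is assumed, not taken from the report."""
    return ticks / hz


seconds = ticks_to_seconds(1800613)
print(f"{seconds:.3f} s = {seconds / 60:.2f} minutes")
```

At 1000 Hz this comes to just over 1800 seconds, i.e. the thread had been blocked for a touch more than half an hour when the deadlock detector fired.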
Re: panic: sleeping thread on r352386
On 2019-Sep-17 15:24:30 +0300, Konstantin Belousov wrote:
>Try this.
>
>diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c
>index 63ea4736707..a23b4ba4efa 100644

Sorry for the delay but I'm not seeing problems with this version of
your patch (now r352457) either. Thank you for your efforts.

-- 
Peter Jeremy
Re: panic: sleeping thread on r352386
On 2019-Sep-17 11:06:58 +0300, Konstantin Belousov wrote:
>Try the following change, which more accurately tries to avoid
>vnode_pager_setsize(). The real cause requires much more extensive
>changes.
>
>diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c
>index 63ea4736707..16dc7745c77 100644
>--- a/sys/fs/nfsclient/nfs_clport.c
>+++ b/sys/fs/nfsclient/nfs_clport.c
...

With that patch, I'm back to "Sleeping thread (...) owns a
non-sleepable lock" panics.

-- 
Peter Jeremy
Re: "Sleeping with non-sleepable lock" in NFS on recent -current
On 2019-Sep-16 11:19:02 +0300, Konstantin Belousov wrote:
>diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c
>index 471e029a8b5..63ea4736707 100644
...

Thanks, that patch seems much more stable.

-- 
Peter Jeremy
Re: "Sleeping with non-sleepable lock" in NFS on recent -current
On 2019-Sep-16 09:32:52 +0300, Konstantin Belousov wrote:
>On Mon, Sep 16, 2019 at 04:12:05PM +1000, Peter Jeremy wrote:
>> I'm consistently seeing panics in the NFS code on recent -current on aarm64.
>> The panics are one of the following two:
>> Sleeping on "vmopar" with the following non-sleepable locks held:
>> exclusive sleep mutex NEWNFSnode lock (NEWNFSnode lock) r = 0
>> (0xfd0078b346f0) locked @ /usr/src/sys/fs/nfsclient/nfs_clport.c:432
>>
>> Sleeping thread (tid 100077, pid 35) owns a non-sleepable lock
>>
>> Both panics have nearly identical backtraces (see below). I'm running
>> diskless on a Rock64 with both filesystem and swap over NFS. The panics
>> can be fairly reliably triggered by any of:
>> * "make -j4 buildworld"
>> * linking the kernel (as part of buildkernel)
>> * "make installworld"
>>
>> Has anyone else seen this?
...
>Weird since this should have been fixed long time ago. Anyway, please
>try the following, it should fix the rest of cases.
>
>diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c
...
>@@ -540,7 +541,7 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr
>*nap, void *nvaper,
> 	} else {
> 		np->n_size = vap->va_size;
> 		np->n_flag |= NSIZECHANGED;
>-		vnode_pager_setsize(vp, np->n_size);
>+		setnsize = 1;

Should this else block include a "nsize = np->n_size;"? Without it,
nsize will remain set to 0, which looks wrong.

-- 
Peter Jeremy
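The general pattern behind the quoted patch (record what needs doing while the mutex is held, then make the potentially sleeping call after the mutex is released) can be illustrated outside the kernel. The Python sketch below is an analogy, not the NFS client code: `Node`, `load_attr` and the `pager_setsize` callback are stand-ins invented for the example. It also shows why the size has to be captured in every branch that sets the flag, which is the point of the question above.

```python
import threading


class Node:
    """Toy stand-in for an NFS node protected by a non-sleepable mutex."""

    def __init__(self, size):
        self.lock = threading.Lock()
        self.size = size


def load_attr(node, new_size, pager_setsize):
    """Update node.size under the lock; call pager_setsize only after unlocking."""
    setnsize = False
    nsize = 0
    with node.lock:
        if new_size != node.size:
            node.size = new_size
            nsize = node.size         # capture in every branch that sets the flag
            setnsize = True
    if setnsize:
        pager_setsize(node, nsize)    # may block; the lock is no longer held
```

If a branch set `setnsize` without also setting `nsize`, the deferred call would be made with the initial value of 0 rather than the new size.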
"Sleeping with non-sleepable lock" in NFS on recent -current
I'm consistently seeing panics in the NFS code on recent -current on arm64.
The panics are one of the following two:

Sleeping on "vmopar" with the following non-sleepable locks held:
exclusive sleep mutex NEWNFSnode lock (NEWNFSnode lock) r = 0
(0xfd0078b346f0) locked @ /usr/src/sys/fs/nfsclient/nfs_clport.c:432

Sleeping thread (tid 100077, pid 35) owns a non-sleepable lock

Both panics have nearly identical backtraces (see below). I'm running
diskless on a Rock64 with both filesystem and swap over NFS. The panics
can be fairly reliably triggered by any of:
* "make -j4 buildworld"
* linking the kernel (as part of buildkernel)
* "make installworld"

Has anyone else seen this?

The first panic (sleeping on vmopar) has a backtrace:
sched_switch() at mi_switch+0x19c
	 pc = 0x002ab368  lr = 0x0028a9f4
	 sp = 0x61192660  fp = 0x61192680
mi_switch() at sleepq_switch+0x100
	 pc = 0x0028a9f4  lr = 0x002d56dc
	 sp = 0x61192690  fp = 0x611926d0
sleepq_switch() at sleepq_wait+0x48
	 pc = 0x002d56dc  lr = 0x002d5594
	 sp = 0x611926e0  fp = 0x61192700
sleepq_wait() at _sleep+0x2c4 [***]
	 pc = 0x002d5594  lr = 0x00289eec
	 sp = 0x61192710  fp = 0x611927b0
_sleep() at vm_object_page_remove+0x178 [***]
	 pc = 0x00289eec  lr = 0x0052211c
	 sp = 0x611927c0  fp = 0x61192820
vm_object_page_remove() at vnode_pager_setsize+0xc0
	 pc = 0x0052211c  lr = 0x00539a70
	 sp = 0x61192830  fp = 0x61192870
vnode_pager_setsize() at nfscl_loadattrcache+0x2e8
	 pc = 0x00539a70  lr = 0x001ed4b4
	 sp = 0x61192880  fp = 0x611928e0
nfscl_loadattrcache() at ncl_writerpc+0x104
	 pc = 0x001ed4b4  lr = 0x001e2158
	 sp = 0x611928f0  fp = 0x61192a40
ncl_writerpc() at ncl_doio+0x36c
	 pc = 0x001e2158  lr = 0x001f0370
	 sp = 0x61192a50  fp = 0x61192ae0
ncl_doio() at nfssvc_iod+0x228
	 pc = 0x001f0370  lr = 0x001f1d88
	 sp = 0x61192af0  fp = 0x61192b50
nfssvc_iod() at fork_exit+0x7c
	 pc = 0x001f1d88  lr = 0x0023ff5c
	 sp = 0x61192b60  fp = 0x61192b90
fork_exit() at fork_trampoline+0x10
	 pc = 0x0023ff5c  lr = 0x00562c34
	 sp = 0x61192ba0  fp = 0x

For the second panic, the [***] lines change to:
sleepq_wait() at vm_page_sleep_if_busy+0x80
vm_page_sleep_if_busy() at vm_object_page_remove+0xfc

-- 
Peter Jeremy
Re: "panic: Duplicate alloc" in dwmmc_attach on Rock64
On 2019-Jun-21 20:59:39 +1000, Peter Jeremy wrote:
>Since r349169, my Rock64 has consistently panic'd whilst attaching
>rockchip_dwmmc1. A kernel built at r349135 works OK. The relevant
>output looks like:
>rockchip_dwmmc0: (RockChip)> mem 0xff50-0xff503fff irq 40 on ofwbus0
>rockchip_dwmmc0: Hardware version ID is 270a
>mmc0: on rockchip_dwmmc0
>rockchip_dwmmc1: (RockChip)> mem 0xff52-0xff523fff irq 42 on ofwbus0
>rockchip_dwmmc1: Hardware version ID is 270a
>panic: Duplicate alloc of 0xfd89cf50 from zone 0xfd817540(16)
>slab 0xfd89cf90(0)

I did some more digging and narrowed this down to r349151 (which has
nothing that would be an obvious cause). And the problem went away
somewhere between r349269 and r349288. Since there's nothing obvious
there either, I presume this is something more subtle like a race
condition that has been provoked by the code changes.

-- 
Peter Jeremy
"panic: Duplicate alloc" in dwmmc_attach on Rock64
Since r349169, my Rock64 has consistently panic'd whilst attaching
rockchip_dwmmc1. A kernel built at r349135 works OK. The relevant
output looks like:

rockchip_dwmmc0: mem 0xff50-0xff503fff irq 40 on ofwbus0
rockchip_dwmmc0: Hardware version ID is 270a
mmc0: on rockchip_dwmmc0
rockchip_dwmmc1: mem 0xff52-0xff523fff irq 42 on ofwbus0
rockchip_dwmmc1: Hardware version ID is 270a
panic: Duplicate alloc of 0xfd89cf50 from zone 0xfd817540(16) slab 0xfd89cf90(0)
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
	 pc = 0x00535d54  lr = 0x000df10c
	 sp = 0x000104d0  fp = 0x000106e0
db_trace_self_wrapper() at vpanic+0x18c
	 pc = 0x000df10c  lr = 0x00278218
	 sp = 0x000106f0  fp = 0x00010790
vpanic() at panic+0x44
	 pc = 0x00278218  lr = 0x00277fc8
	 sp = 0x000107a0  fp = 0x00010820
panic() at uma_dbg_alloc+0x144
	 pc = 0x00277fc8  lr = 0x004fa4b0
	 sp = 0x00010830  fp = 0x00010850
uma_dbg_alloc() at uma_zalloc_arg+0x9b0
	 pc = 0x004fa4b0  lr = 0x004f9960
	 sp = 0x00010860  fp = 0x000108e0
uma_zalloc_arg() at malloc+0x9c
	 pc = 0x004f9960  lr = 0x00252a8c
	 sp = 0x000108f0  fp = 0x00010920
malloc() at bounce_bus_dmamem_alloc+0x4c
	 pc = 0x00252a8c  lr = 0x00533b64
	 sp = 0x00010930  fp = 0x00010960
bounce_bus_dmamem_alloc() at dwmmc_attach+0x5fc
	 pc = 0x00533b64  lr = 0x00556f14
	 sp = 0x00010970  fp = 0x000109e0
dwmmc_attach() at device_attach+0x3f4
	 pc = 0x00556f14  lr = 0x002abd8c
	 sp = 0x000109f0  fp = 0x00010a40
device_attach() at bus_generic_new_pass+0x12c
	 pc = 0x002abd8c  lr = 0x002adb40
	 sp = 0x00010a50  fp = 0x00010a80
...

I've looked through all the intervening commits and don't see any
smoking gun. Does anyone have any suggestions?

-- 
Peter Jeremy
Re: error: yacc.h: No such file or directory
On 2019-Jun-18 07:01:31 -0700, Enji Cooper wrote:
>
>> On Jun 18, 2019, at 06:59, Enji Cooper wrote:
>> PS This is one of the reasons why I wasn’t quick to discount Peter Jeremy’s
>> reported build issue.
>
>Correction: I meant Julian Stacey.

I'm not sure how I feel about being confused with jhs.

Actually, I had also seen this problem in both mkesdb_static and
mkcsmapper_static but hadn't reported it because I was investigating
something else and wasn't certain that it wasn't self-inflicted.

-- 
Peter Jeremy
Re: FreeBSD 12 kernel broken
On 2019-Mar-22 19:08:18 +0300, Rozhuk Ivan wrote:
>ld: error: undefined symbol: xz_dec_init
>>>> referenced by g_uzip_lzma.c:106 (/usr/src/sys/geom/uzip/g_uzip_lzma.c:106)
>>>>               g_uzip_lzma.o:(g_uzip_lzma_ctor)
>
>ld: error: undefined symbol: xz_dec_run
>>>> referenced by g_uzip_lzma.c:81 (/usr/src/sys/geom/uzip/g_uzip_lzma.c:81)
>>>>               g_uzip_lzma.o:(g_uzip_lzma_decompress)
>
>ld: error: undefined symbol: xz_dec_end
>>>> referenced by g_uzip_lzma.c:60 (/usr/src/sys/geom/uzip/g_uzip_lzma.c:60)
>>>>               g_uzip_lzma.o:(g_uzip_lzma_free)
>--- kernel.full ---
>*** [kernel.full] Error code 1

Are you talking about FreeBSD 12 or FreeBSD 13?

-- 
Peter Jeremy
Re: Optimization bug with floating-point?
On 2019-Mar-13 23:30:07 -0700, Steve Kargl wrote:
>AFAICT, all libm float routines need to be modified to conditional
>include ieeefp.h and call fpsetprec(FP_PD). This will work around
>issues is FP and libm. FreeBSD needs to issue an erratum about
>the numerical issues with clang.

I vaguely recall looking into the x87 initialisation a long time ago
and STR that the startup code (either crtX or in the kernel) does a
fninit() to set the precision. I don't recall exactly where.

IMO, calling fpsetprec() in every libm float function is overkill. It
should be enough to fpsetprec() before main() and add a note in the man
pages that libm is built to use the default FPU configuration and
changing the configuration (precision or rounding) may result in larger
errors.

-- 
Peter Jeremy
Re: how to browse svnweb source?
On 2018-May-28 18:06:07 -0700, Jeffrey Bouquet wrote: >> > Suddenly the site www.secnetix.de/olli/FreeBSD/svnews which showed >> > sequential >> > source as for example xx1966 on april 3 xx2040 on april 4 this year, >> > is not loading >> > in the browser. That site is not associated with the FreeBSD Project so you would need to discuss the absence of information on that site with whoever runs it. >I tried that url every which way, sorting the headings, etc, and onscreen >would be at best, a description of the new source but not specifically which >files were changed and their complete path. Nothing like the url mentioned >above at >.de in the latter's overview. Without knowing what that site displayed, it's very difficult to know where (or if) svnweb provides the information. Given a known revision, you can check (eg) https://svnweb.freebsd.org/base?view=revision&revision=333926 If you want a sequential list of commits, you might be better off with (eg) https://lists.freebsd.org/pipermail/svn-src-all/ -- Peter Jeremy signature.asc Description: PGP signature
Re: Strange ARC/Swap/CPU on yesterday's -CURRENT
On 2018-Mar-11 10:43:58 -1000, Jeff Roberson wrote: >Also, if you could try going back to r328953 or r326346 and let me know if >the problem exists in either. That would be very helpful. If anyone is >willing to debug this with me contact me directly and I will send some >test patches or debugging info after you have done the above steps. I ran into this on 11-stable and tracked it to r326619 (MFC of r325851). I initially got around the problem by reverting that commit but either it or something very similar is still present in 11-stable r331053. I've seen it in my main server (32GB RAM) but haven't managed to reproduce it in smaller VBox guests - one difficulty I faced was artificially filling ARC. -- Peter Jeremy signature.asc Description: PGP signature
Re: Build error: 'emmintrin.h' file not found
On 2018-Jan-24 17:34:33 +0100, Florian Limberger wrote: >since a few days I can't build 12-CURRENT anymore, due to the 'emmintrin.h' >header missing. I ran into a similar problem about a month ago. First of all, does your host system have emmintrin.h? E.g. what is the output of "find /usr/lib/clang -name emmintrin.h" ? -- Peter Jeremy signature.asc Description: PGP signature
Re: Unable to build 12-current/amd64
On 2017-Dec-23 13:42:40 +0100, Dimitry Andric wrote:
>On 23 Dec 2017, at 10:56, Peter Jeremy wrote:
>>
>> Since r326496, buildworld on my 12-current/amd64 system has consistently
>> died as follows.
>...
>> /usr/src/contrib/llvm/tools/clang/lib/Basic/SourceManager.cpp:1166:10: fatal
>> error: 'emmintrin.h' file not found
>> #include <emmintrin.h>
>> ^
>> 1 error generated.
>> *** Error code 1
>>
>> Stop.
>> make[4]: stopped in /usr/src/lib/clang/libclang
>>
>> I'm building on a 12.0-CURRENT VirtualBox guest at r326430. I've checked
>> that my /usr/src is clean and deleted /usr/obj to no effect. I have dug
>> into SourceManager.cpp and the #include is protected by a #if __SSE2__,
>> which is relying on clang internal checks to define (and my CPU supports
>> SSE2). Does anyone have any ideas to explain what is going on?
>
>First of all, does your host system have emmintrin.h? E.g. what is the
>output of "find /usr/lib/clang -name emmintrin.h" ?

Aha. Somehow my entire /usr/lib/clang/5.0.0 tree was missing. I'm not sure if that was an installworld glitch or something I accidentally did. In any case, restoring it has fixed the problem. Thanks for the pointer.

-- Peter Jeremy signature.asc Description: PGP signature
Unable to build 12-current/amd64
Since r326496, buildworld on my 12-current/amd64 system has consistently died as follows. I have no problems building on i386 or building 12-current/amd64 on 11-stable. ... >>> stage 3: cross tools -- cd /usr/src; INSTALL="sh /usr/src/tools/install.sh" TOOLS_PREFIX=/usr/obj/usr/src/amd64.amd64/tmp PATH=/usr/obj/usr/src/amd64.amd64/tmp/legacy/usr/sbin:/usr/obj/usr/src/amd64.amd64/tmp/legacy/usr/bin:/usr/obj/usr/src/amd64.amd64/tmp/legacy/bin:/sbin:/bin:/usr/sbin:/usr/bin WORLDTMP=/usr/obj/usr/src/amd64.amd64/tmp MAKEFLAGS="-m /usr/src/tools/build/mk -m /usr/src/share/mk" make -f Makefile.inc1 DESTDIR= OBJTOP='/usr/obj/usr/src/amd64.amd64/tmp/obj-tools' OBJROOT='${OBJTOP}/' MAKEOBJDIRPREFIX= BOOTSTRAPPING=1200054 BWPHASE=cross-tools SSP_CFLAGS= MK_HTML=no NO_LINT=yes MK_MAN=no -DNO_PIC MK_PROFILE=no -DNO_SHARED -DNO_CPU_CFLAGS MK_WARNS=no MK_CTF=no MK_CLANG_EXTRAS=no MK_CLANG_FULL=no MK_LLDB=no MK_TESTS=no MK_INCLUDES=yes TARGET=amd64 TARGET_ARCH=amd64 MK_GDB=no MK_LLD_IS_LD=no MK_TESTS=no cross-tools ... ===> lib/clang/libclang (all) ... 
c++ -O2 -pipe -I/usr/obj/usr/src/amd64.amd64/tmp/obj-tools/lib/clang/libclang -I/usr/obj/usr/src/amd64.amd64/tmp/obj-tools/lib/clang/libllvm -I/usr/src/contrib/llvm/tools/clang/lib/Driver -I/usr/src/contrib/llvm/tools/clang/include -I/usr/src/lib/clang/include -I/usr/src/contrib/llvm/include -DLLVM_BUILD_GLOBAL_ISEL -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -DLLVM_DEFAULT_TARGET_TRIPLE=\"x86_64-unknown-freebsd12.0\" -DLLVM_HOST_TRIPLE=\"x86_64-unknown-freebsd12.0\" -DDEFAULT_SYSROOT=\"/usr/obj/usr/src/amd64.amd64/tmp\" -ffunction-sections -fdata-sections -gline-tables-only -MD -MF.depend.Basic_SourceLocation.o -MTBasic/SourceLocation.o -Qunused-arguments -I/usr/obj/usr/src/amd64.amd64/tmp/legacy/usr/include -std=c++11 -fno-exceptions -fno-rtti -gline-tables-only -stdlib=libc++ -Wno-c++11-extensions -c /usr/src/contrib/llvm/tools/clang/lib/Basic/SourceLocation.cpp -o Basic/SourceLocation.o c++ -O2 -pipe -I/usr/obj/usr/src/amd64.amd64/tmp/obj-tools/lib/clang/libclang -I/usr/obj/usr/src/amd64.amd64/tmp/obj-tools/lib/clang/libllvm -I/usr/src/contrib/llvm/tools/clang/lib/Driver -I/usr/src/contrib/llvm/tools/clang/include -I/usr/src/lib/clang/include -I/usr/src/contrib/llvm/include -DLLVM_BUILD_GLOBAL_ISEL -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -DLLVM_DEFAULT_TARGET_TRIPLE=\"x86_64-unknown-freebsd12.0\" -DLLVM_HOST_TRIPLE=\"x86_64-unknown-freebsd12.0\" -DDEFAULT_SYSROOT=\"/usr/obj/usr/src/amd64.amd64/tmp\" -ffunction-sections -fdata-sections -gline-tables-only -MD -MF.depend.Basic_SourceManager.o -MTBasic/SourceManager.o -Qunused-arguments -I/usr/obj/usr/src/amd64.amd64/tmp/legacy/usr/include -std=c++11 -fno-exceptions -fno-rtti -gline-tables-only -stdlib=libc++ -Wno-c++11-extensions -c /usr/src/contrib/llvm/tools/clang/lib/Basic/SourceManager.cpp -o Basic/SourceManager.o /usr/src/contrib/llvm/tools/clang/lib/Basic/SourceManager.cpp:1166:10: fatal error: 'emmintrin.h' file not found #include ^ 1 error generated. *** Error code 1 Stop. 
make[4]: stopped in /usr/src/lib/clang/libclang I'm building on a 12.0-CURRENT VirtualBox guest at r326430. I've checked that my /usr/src is clean and deleted /usr/obj to no effect. I have dug into SourceManager.cpp and the #include is protected by a #if __SSE2__, which is relying on clang internal checks to define (and my CPU supports SSE2). Does anyone have any ideas to explain what is going on? -- Peter Jeremy signature.asc Description: PGP signature
Re: get_swap_pager(x) failed
On 2017-Dec-13 11:23:46 +0000, Gary Palmer wrote: >An open question would be why ARC is not reducing if the system is >under memory pressure. It's meant to, but there have been various >bugs in that implementation. The OP doesn't say what version of -current he is running but I would point the finger at r325851. I have discovered that, in 11-stable, r326619 (which is the MFC of r325851) stops ARC responding to memory backpressure. -- Peter Jeremy signature.asc Description: PGP signature
Re: dump trying to access incorrect block numbers?
On 2017-Jul-07 10:44:36 -0400, Michael Butler wrote: >Recent builds doing a backup (dump) cause nonsensical errors in syslog: I can't directly offer any ideas but some more background might help: When did you first notice this (what SVN revision)? Do you know what the last good SVN revision was? Is this a new or old filesystem? Is the filesystem mounted/active or not when you dump it? What are the relevant parameters for the filesystem on ada0s3a? Are you running softupdates, journalling etc? Which dump(8) phase is reporting the errors? What are the exact dump and fsck commands you ran? >I now have two UFS-based systems showing the same symptoms - what's up >with this? Was there anything you did on either filesystem that might have triggered it? -- Peter Jeremy signature.asc Description: PGP signature
Re: ino64? r318606 -> r318739 OK; r318739 -> r318781 fails SIGSEGV
On 2017-May-24 20:21:54 +0300, Konstantin Belousov wrote: >No SIGSEGV etc, so I think that the effects seen are due to build system. >rm -rf obj/* is the safest trick, I believe. But the behaviour does indicate that meta mode is not doing the right thing under all circumstances. It's blatantly breaking in this scenario but could be causing more subtle (and unnoticed) breakage in other cases. This makes me feel that this is worth investigating further. -- Peter Jeremy signature.asc Description: PGP signature
Re: ino64? r318606 -> r318739 OK; r318739 -> r318781 fails SIGSEGV
On 2017-May-24 18:01:42 -0700, "Simon J. Gerraty" wrote: >Peter Jeremy wrote: >> as follows. My suspicion is that meta mode isn't seeing enough of the >> differences between the bootstrap and main build steps and so causing make >> to incorrectly skip steps. > >I see a number of places in src/Makefile* where BUILD_TOOLS_META=.NOMETA >is added to env of things like CROSSENV, CD2MAKE, LIBCOMPATWMAKEENV > >Use of .NOMETA could be leading to problems - but I'm not familiar with >where BUILD_TOOLS_META is used. I've not looked at the guts of how meta mode works or is inhibited either. In my case, I have "WITH_META_MODE=yes" in /etc/src-env.conf and was using "make buildworld" - which failed. The upgrade worked cleanly when I manually deleted all the .meta files. If I get a round tuit, I'll try to revert to before the update and have a closer look at what broke with the "normal" build, if no-one else beats me to it. -- Peter Jeremy signature.asc Description: PGP signature
Re: ino64? r318606 -> r318739 OK; r318739 -> r318781 fails SIGSEGV
On 2017-May-24 08:47:41 -0700, Ngie Cooper wrote: >There was another report on the list about a stale MAKEOBJDIRPREFIX > causing someone grief. I think it's safe to say that meta mode and -DNO_CLEAN > might not work across this transition--in particular meta mode tends to err > on the side of not to rebuilding things. I ran into a very similar problem trying to update from r318744 to r318781. In my case, even two "make clean" wasn't enough and "make buildworld" died as follows. My suspicion is that meta mode isn't seeing enough of the differences between the bootstrap and main build steps and so causing make to incorrectly skip steps. -- >>> stage 2.3: build tools -- cd /usr/src; MAKEOBJDIRPREFIX=/usr/obj INSTALL="sh /usr/src/tools/install.sh" TOOLS_PREFIX=/usr/obj/usr/src/tmp PATH=/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/bin:/sbin:/bin:/usr/sbin:/usr/bin WORLDTMP=/usr/obj/usr/src/tmp MAKEFLAGS="-m /usr/src/tools/build/mk -m /usr/src/share/mk" /usr/obj/usr/src/make.amd64/bmake -f Makefile.inc1 TARGET=amd64 TARGET_ARCH=amd64 DESTDIR= BOOTSTRAPPING=1200031 SSP_CFLAGS= -DNO_LINT -DNO_CPU_CFLAGS MK_WARNS=no MK_CTF=no MK_CLANG_EXTRAS=no MK_CLANG_FULL=no MK_LLDB=no MK_TESTS=no build-tools ... ===> usr.bin/mkesdb_static (obj,build-tools) Building /usr/obj/usr/src/usr.bin/mkesdb_static/citrus_bcs.o Building /usr/obj/usr/src/usr.bin/mkesdb_static/citrus_db_factory.o Building /usr/obj/usr/src/usr.bin/mkesdb_static/citrus_db_hash.o Building /usr/obj/usr/src/usr.bin/mkesdb_static/citrus_lookup_factory.o Building /usr/obj/usr/src/usr.bin/mkesdb_static/lex.c Building /usr/obj/usr/src/usr.bin/mkesdb_static/lex.o /usr/src/usr.bin/mkesdb/lex.l:44:10: fatal error: 'yacc.h' file not found #include "yacc.h" ^~~~ 1 error generated. *** Error code 1 Stop. 
bmake[3]: stopped in /usr/src/usr.bin/mkesdb_static .ERROR_TARGET='lex.o' .ERROR_META_FILE='/usr/obj/usr/src/usr.bin/mkesdb_static/lex.o.meta' .MAKE.LEVEL='3' MAKEFILE='' .MAKE.MODE='meta missing-filemon=yes missing-meta=yes silent=yes verbose' .CURDIR='/usr/src/usr.bin/mkesdb_static' .MAKE='/usr/obj/usr/src/make.amd64/bmake' .OBJDIR='/usr/obj/usr/src/usr.bin/mkesdb_static' .TARGETS='build-tools' DESTDIR='' LD_LIBRARY_PATH='' MACHINE='amd64' MACHINE_ARCH='amd64' MAKEOBJDIRPREFIX='/usr/obj' MAKESYSPATH='/usr/src/share/mk' MAKE_VERSION='20161212' PATH='/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/bin:/sbin:/bin:/usr/sbin:/usr/bin' SRCTOP='/usr/src' OBJTOP='/usr/obj/usr/src' .MAKE.MAKEFILES='/usr/src/share/mk/sys.mk /usr/src/share/mk/local.sys.env.mk /usr/src/share/mk/src.sys.env.mk /etc/src-env.conf /usr/src/share/mk/bsd.mkopt.mk /usr/src/share/mk/bsd.suffixes.mk /etc/make.conf /usr/src/share/mk/local.sys.mk /usr/src/share/mk/src.sys.mk /usr/src/usr.bin/mkesdb_static/Makefile /usr/src/usr.bin/mkesdb/Makefile.inc /usr/src/tools/build/mk/bsd.prog.mk /usr/src/share/mk/bsd.prog.mk /usr/src/share/mk/bsd.init.mk /usr/src/share/mk/bsd.opts.mk /usr/src/share/mk/bsd.cpu.mk /usr/src/share/mk/local.init.mk /usr/src/share/mk/src.init.mk /usr/src/usr.bin/mkesdb_static/../Makefile.inc /usr/src/share/mk/bsd.own.mk /usr/src/share/mk/bsd.compiler.mk /usr/src/share/mk/bsd.compiler.mk /usr/src/share/mk/bsd.libnames.mk /usr/src/share/mk/src.libnames.mk /usr/src/share/mk/src.opts.mk /usr/src/share/mk/bsd.nls.mk /usr/src/share/mk/bsd.confs.mk /usr/src/share/mk/bsd.files.mk /usr/src/share/mk/bsd.incs.mk /usr/src/share/mk/bsd.links.mk /usr/src/share/mk/bsd.man.mk /usr/src/share/mk/bsd.dep.mk /usr/src/share/mk/bsd.clang-analyze.mk /usr/src/share/mk/bsd.obj.mk /usr/src/share/mk/bsd.subdir.mk /usr/src/share/mk/bsd.sys.mk /usr/src/tools/build/mk/Makefile.boot' .PATH='. 
/usr/src/usr.bin/mkesdb_static /usr/src/lib/libc/iconv /usr/src/usr.bin/mkesdb' *** Error code 1 I've done a "find /usr/obj -name \*.meta -print0 | xargs -0 rm" and am still waiting for that to complete, though it has passed the above failure point. -- Peter Jeremy signature.asc Description: PGP signature
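For anyone who would rather script the cleanup than run the find/xargs pipeline above, the following is a minimal sketch of the same idea in Python. The function name remove_meta_files is made up for illustration and is not part of the FreeBSD build system; unlike "find -name", it only considers regular directory entries returned as files by os.walk().

```python
import os


def remove_meta_files(root):
    """Delete every file under root whose name ends in ".meta".

    Hypothetical helper: roughly equivalent to
    find ROOT -name '*.meta' -print0 | xargs -0 rm
    Returns the number of files removed.
    """
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".meta"):
                os.unlink(os.path.join(dirpath, name))
                removed += 1
    return removed
```

Calling remove_meta_files("/usr/obj") would be the scripted counterpart of the command quoted above; as with the original, the objects themselves are left in place and only the meta-mode bookkeeping is discarded.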
Re: effect of strip(1) on du(1)
On 2017-Mar-02 22:19:10 -0800, "Rodney W. Grimes" wrote: >> du(1) is using fts_read(3), which is based on the stat(2) information. >> The OpenGroup defines st_blocksize as "Number of blocks allocated for >> this object." In the case of ZFS, a write(2) may return before any >> blocks are actually allocated. And thanks to compression, gang ... >My gut tells me that this is gona cause problems, is it ONLY >the st_blocksize data that is incorrect then not such a big >problem, or are we returning other meta data that is wrong? Note that it's st_blocks, not st_blocksize. I did an experiment, writing a (roughly) 113MB file (some data I had lying around), close()ing it and then stat()ing it in a loop. This is FreeBSD 10.3 with ZFS and lz4 compression. Over the 26ms following the close(), st_blocks gradually rose from 24169 to 51231. It then stayed stable until 4.968s after the close, when st_blocks again started increasing until it stabilized after a total of 5.031s at 87483. Based on this, st_blocks reflects the actual number of blocks physically written to disk. None of the other fields in the struct stat vary. The 5s delay is presumably the TXG delay (since this system is basically unloaded). I'm not sure why it writes roughly ½ the data immediately and the rest as part of the next TXG write. >My expectactions of executing a stat(2) call on a file would >be that the data returned is valid and stable. I think almost >any program would expect that. I think a case could be made that st_blocks is a valid representation of "the number of blocks allocated for this object" - with the number increasing as the data is physically written to disk. As for it being stable, consider a (hypothetical) filesystem that can transparently migrate data between different storage media, with different compression algorithms etc (ZFS will be able to do this once the mythical block rewrite code is written). -- Peter Jeremy signature.asc Description: PGP signature
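The experiment described above can be approximated with a short polling loop. This is only a sketch under stated assumptions: the original test program is not shown in the message, the function name sample_st_blocks and its parameters are invented here, and it relies on os.stat() exposing st_blocks (true on FreeBSD and other Unix-like systems).

```python
import os
import time


def sample_st_blocks(path, duration=0.2, interval=0.01):
    """Poll stat(2) on path and record (elapsed_seconds, st_blocks)
    each time the reported st_blocks value changes.

    Hypothetical helper for illustration; st_blocks is in the
    units reported by stat(2), not filesystem blocks.
    """
    samples = []
    start = time.monotonic()
    last = None
    while True:
        elapsed = time.monotonic() - start
        blocks = os.stat(path).st_blocks
        if blocks != last:
            # First pass always records, later passes only on change.
            samples.append((elapsed, blocks))
            last = blocks
        if elapsed >= duration:
            break
        time.sleep(interval)
    return samples
```

To cover the roughly 5s window mentioned above, the call would need a duration of at least 6 seconds and would have to start immediately after the close() of a freshly written file. On a filesystem that allocates synchronously, the returned list will most likely contain a single entry.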
Re: effect of strip(1) on du(1)
On 2017-Mar-02 22:29:46 +0300, Subbsd wrote: >During some interval after strip call, du will show 512B for any file. >If execute du(1) after strip(1) without delay, this behavior is reproduced >100%: What filesystem are you using? strip(1) rewrites the target file and du(1) reports the number of blocks reported by stat(2). It seems that you are hitting a situation where the file metadata isn't immediately updated. -- Peter Jeremy signature.asc Description: PGP signature
Re: removing SVR4 binary compatibilty layer
On 2017-Feb-14 10:32:32 -0800, Gleb Smirnoff wrote: > After some discussion on svn mailing list [1], there is intention >to remove SVR4 binary compatibilty layer from FreeBSD head, meaning >that FreeBSD 12.0-RELEASE, available in couple of years would >be shipped without it. There is no intention of merge of the removal. >The stable@ mailing list added for wider audience. Can I suggest that we put some warnings into the SVr4 image activation code and MFC that to at least 11 to try and smoke out anyone who might actually be using it. -- Peter Jeremy signature.asc Description: PGP signature
Re: Somethign missing in my environment?
On 2016-Aug-16 23:14:45 +0200, Willem Jan Withagen wrote: >And I'm running: >make -j8 buildworld >So getting a good target that give the error is hard. > >So I continued with make -DNOCLEAN -DNO_CLEAN buildworld. There's nothing immediately obvious. I suggest trying without the "-DNOCLEAN -DNO_CLEAN" - they are shortcuts that aren't guaranteed to work under all circumstances. And if that still fails, skip the '-j8' because it's possible there are still race conditions in buildworld (though that is very unlikely). -- Peter Jeremy signature.asc Description: PGP signature
Re: Somethign missing in my environment?
On 2016-Aug-16 20:31:57 +0200, Willem Jan Withagen wrote:
>I'm trying to compile world, but I keep getting:
>
>/usr/obj/usr/srcs/head/src/tmp/usr/lib/libgcc_s.so: undefined reference
>to `__gxx_personality_v0'
>cc: error: linker command failed with exit code 1 (use -v to see invocation)
>*** [h_raw.full] Error code 1
>
>Even after refetching the complete tree.

We need more context:
- What SVN revision of (presumably) -current is this?
- What architecture are you compiling on/for?
- What do you have in /etc/make.conf and /etc/src.conf?
- What is your current environment?
- What is the output leading up to that error (what is being built)?

-- Peter Jeremy signature.asc Description: PGP signature
Re: Mosh regression between 10.x and 11-stable
On 2016-Aug-11 10:06:35 -0700, Ngie Cooper wrote: > >> On Aug 11, 2016, at 09:30, John Hood wrote: >> >> I still can't reproduce this on 3 different 11.0-BETA4 servers and a >> variety of clients and networks. Can you try and identify a more >> portable repro or at least figure out why it fails on your system? >> >> Please try applying this patch, too. It's a shot in the dark, though. > >Dumb question: what ssh key type(s) (dsa, rsa, etc) are you using Peter :)? I'm using ECDSA for both the host and user keys. -- Peter Jeremy signature.asc Description: PGP signature
Re: Mosh regression between 10.x and 11-stable
On 2016-Aug-11 12:30:23 -0400, John Hood wrote: >I still can't reproduce this on 3 different 11.0-BETA4 servers and a >variety of clients and networks. Can you try and identify a more >portable repro or at least figure out why it fails on your system? > >Please try applying this patch, too. It's a shot in the dark, though. That patch seems to fix the problem I'm seeing. Not waiting for output to drain is consistent with the symptoms I'm seeing, though I have no idea why only my Linux client is affected. -- Peter Jeremy signature.asc Description: PGP signature
Re: Mosh regression between 10.x and 11-stable
On 2016-Aug-10 14:32:15 -0400, john hood wrote:
>On 8/10/16 4:18 AM, Peter Jeremy wrote:
>> I recently updated one of my VPS hosts from 10.3-RELEASE-p5 to 11.0-BETA4
>> r303811 and mosh to that host from my Linux laptop stopped working. All
>> I get on the laptop is:
>> $ mosh remotehost
>> Connection to remotehost closed.
>> /usr/bin/mosh: Did not find mosh server startup message.
>> 1) the "MOSH CONNECT" message isn't making it out of the local ssh process.
>
>Do you know if the message is getting out of mosh-server? into sshd?
>Do you know if mosh-server is actually running? (It will log utmp
>entries on startup.)

mosh-server is running - I can see it from another session and, redirecting verbose output into a file, I get:

mosh-server (mosh 1.2.5) [build mosh 1.2.5]
Copyright 2012 Keith Winstein
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

[mosh-server detached, pid = 4202]
Warning: termios IUTF8 flag not defined.
Character-erase of multibyte character sequence
probably does not work properly on this platform.

I can't tell if it's actually writing into the remote ssh process.

>> 2) it's racy because I can get it from "always fails" to "sometimes works".
>
>How do you get it there?

- Add '-v' to the local ssh command.
- ktrace the remote mosh-server process (this seems to make it consistently work).

-- Peter Jeremy signature.asc Description: PGP signature
Mosh regression between 10.x and 11-stable
I recently updated one of my VPS hosts from 10.3-RELEASE-p5 to 11.0-BETA4 r303811 and mosh to that host from my Linux laptop stopped working. All I get on the laptop is:
$ mosh remotehost
Connection to remotehost closed.
/usr/bin/mosh: Did not find mosh server startup message.

I've tried rebuilding mosh (and all dependencies) on the host to no avail. This isn't the DSA change that's been discussed elsewhere: I can SSH from my laptop to the host without problem. I can also manually invoke mosh-client and mosh-server and it works.

Unfortunately, mosh has no provision for debugging. I've tried hacking the mosh perl script to make it more verbose and that shows that:
1) the "MOSH CONNECT" message isn't making it out of the local ssh process.
2) it's racy because I can get it from "always fails" to "sometimes works".

My suspicion is that something has changed in either sshd or TCP that is resulting in the connection going away before the stdout from the remote mosh-server makes it out from the local ssh process. I've looked at tcpdumps of both successful and failed SSH sessions but don't see anything obviously different (encryption makes it difficult to decode the session).

Has anyone else seen this behaviour or have any ideas what might be causing it?

-- Peter Jeremy signature.asc Description: PGP signature
FreeBSD 11.0-BETA2 won't boot on an Acer Aspire 5560
I'm trying to boot the 11.0-BETA2/amd64 memory stick image and the kernel panics:
(Following copied by hand):
ACPI APIC Table: ...
acpi0: on motherboard
ACPI Error: Hardware did not change modes (20160527/hwacpi-160)
ACPI Error: Could not transition to ACPI mode (20160527/evxfevnt-105)
ACPI Warning: AcpiEnable failed (20160527/utxfinit-184)
acpi0: Could not enable ACPI: AE_NO_HARDWARE_RESPONSE
device_attach: acpi0 attach returned 6

Followed by a NULL dereference panic at nexus_acpi_attach+0x89

The system boots a 10.0-RELEASE/amd64 memstick (the only other image I have conveniently to hand) without problem.

-- Peter Jeremy signature.asc Description: PGP signature
Re: Recognizing SMR HDDs
On 2016-May-26 08:42:53 +0200, Gary Jennejohn wrote: >Now that ken@ has checked in the SMR code I'm wondering how I can see >whether it's having any effect. camcontrol(8) has been enhanced with SMR options and there's a new zonectl(8) command - these should be able to report whether the drive is recognized as a host-aware or host-managed SMR drive. I believe that drive-managed SMR drives don't admit to anything. >Does the fact that the drive appears as a /dev/daX play any role? USB drives are handled via the SCSI CAM layer rather than as SATA drives. It's possible that either the umass(4) driver or your USB to SATA adapter are not correctly handling the relevant commands. -- Peter Jeremy signature.asc Description: PGP signature
Re: qsort() documentation
On 2016-Apr-20 08:45:00 +0200, Hans Petter Selasky wrote:
>There is something which I don't understand. Why is quicksort falling
>back to insertion sort which is an O(N**2) algorithm, when there exist a
>O(log(N)*log(N)*N) algorithms, which I propose as a solution to the
>"bad" characteristics of qsort.

O() notation just describes the (normally, worst case) ratio of input size to runtime for a given algorithm: Increasing the input size by (say) 100× means an insertion sort will take about 10000× as long to run, whilst the "best" algorithms would take about 2000× as long. It says nothing about how long sorting (say) 1000 items takes with either sort or how they behave on "typical" inputs.

In general, the fancier algorithms might have better worst-case O() numbers but they have higher overheads and may not perform any better on typical inputs - so, for small inputs, insertion sort or bubble sort may be faster.

IMO:
- If you're only sorting a small number of items and/or doing it infrequently, the sort performance doesn't really matter and you can use any algorithm.
- If you're sorting lots of items and sort performance is a real issue, you need to examine the performance of a variety of algorithms on your input data and may need to roll your own implementation.

As long as qsort() behaves reasonably and its behaviour is documented sufficiently well that someone can decide whether or not to rule it out for their specific application, that is (IMHO) sufficient.

-- Peter Jeremy signature.asc Description: PGP signature
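The scaling argument can be checked numerically. The sketch below is illustrative only: the cost functions are idealised bounds with all constant factors and overheads dropped (which is exactly what O() notation hides), and the names growth_factor, quadratic, n_log_n and n_log2_n are invented for this example.

```python
import math


def growth_factor(cost, n, scale):
    """How many times larger cost() becomes when the input grows
    from n items to n * scale items."""
    return cost(n * scale) / cost(n)


def quadratic(n):
    # Idealised O(N**2) cost, e.g. worst-case insertion sort.
    return n * n


def n_log_n(n):
    # Idealised O(N*log(N)) cost.
    return n * math.log2(n)


def n_log2_n(n):
    # Idealised O(N*log(N)*log(N)) cost.
    return n * math.log2(n) ** 2


if __name__ == "__main__":
    for n in (10, 1000, 1000000):
        print(n,
              growth_factor(quadratic, n, 100),
              round(growth_factor(n_log_n, n, 100), 1),
              round(growth_factor(n_log2_n, n, 100), 1))
```

For the quadratic bound the factor is 10000 regardless of the starting size. For the log-based bounds the factor depends on the starting size N, so a single figure can only ever be a rough guide, and none of these numbers say which algorithm is actually faster at a given N.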
Re: gettimeofday((void *)-1, NULL) implicates core dump on recent FreeBSD 11-CURRENT
On 2015-Jul-08 12:22:03 -0700, Garrett Cooper wrote: >On Jul 8, 2015, at 12:17, Doug Rabson wrote: > >> As far as I can tell, POSIX doesn't require either EFAULT or any other >> behaviour - the text in http://www.open-std.org/jtc1/sc22/open/n4217.pdf >> just says, "No errors are defined". Our man page is wrong and any real >> program which relies on gettimeofday not faulting when given bad inputs is >> broken. > >I would suggest the following: >1. Document behavior in NOTES about gettimeofday returning EFAULT with the >specific scenarios kib mentioned, segfaulting otherwise (wordsmithing the >actual info of course). Otherwise, it might confuse people who look at the >manpage later. I would suggest adding a comment to intro(2) noting that not all functions listed in section 2 are necessarily system calls and may report error conditions (or maybe "perform argument validation") differently when implemented in userland. Note that the issues with gettimeofday() also apply to clock_gettime(). I'm not sure if we want to explicitly document the conditions under which gettimeofday() (or clock_gettime()) are implemented in userland vs syscalls because that is guaranteed to get stale over time. How about stating that these functions are implemented as syscalls only if the AT_TIMEKEEP value reported by "procstat -x" is NULL. -- Peter Jeremy pgpNkOswpFC0C.pgp Description: PGP signature
Re: Bug-report of sorts...
On 2015-Jan-30 22:24:50 +0000, Poul-Henning Kamp wrote: >But the point is I never get to the webpage, local_unbound just doesn't >seem to be able to resolve anything through the DHCP appointed server, >despite the fact that dig(1) does so just fine. How about some packet captures showing the request/response differences between dig(1) and local_unbound? -- Peter Jeremy pgphVJ2onIPFJ.pgp Description: PGP signature
Re: [CFT] Paravirtualized KVM clock
On 2015-Jan-04 11:56:14 -0600, Bryan Venteicher wrote: >For the last few weeks, I've been working on adding support for KVM clock >in the projects/paravirt branch. Currently, a KVM VM guest will end up >selecting either the HPET or ACPI as the timecounter source. Unfortunately, >this is very costly since every timecounter fetch causes a VM exit. KVM >clock allows the guest to use the TSC instead; it is very similar to the >existing Xen timer. A somewhat late response but have you looked at https://github.com/blitz/freebsd/commit/cdc5f872b3e48cc0dda031fc7d6bdedc65c3148f I've been running this[*] on a Google Compute Engine instance for about 6 months without problems. [*] I had to patch out the test for KVM_FEATURE_CLOCKSOURCE_STABLE_BIT but I think that's a GCE issue. -- Peter Jeremy pgpi9_M8QUFuE.pgp Description: PGP signature
Re: mk output during builds: duplicate script for target "...." ignored
On 2014-Sep-05 18:18:15 +0000, "Bjoern A. Zeeb" wrote: >Started the last 48 hours at some time: It's now fixed for me. I think the fix was r271168. -- Peter Jeremy pgpv2g5pS98PC.pgp Description: PGP signature
Re: keyboard break to debugger broken?
On 2014-Jul-04 02:28:48 -0700, John-Mark Gurney wrote: >So, I recently tried to break into the debugger w/ the various key >sequences that I know about, and none of them worked... I've tried >CTRL-ESC, ALT-ESC, CTRL-ALT-ESC, CTRL-PRTSCR, ALT-PRTSCR and >CTRL-ALT-PRTSCR, and many other different ones... I've verified that >I can sysctl debug.kdb.enter=1 to enter the debugger, and the >CTRL-ALT-PAUSE works to suspend the machine, and CTRL-ALT-DEL works >to reboot... > >Does anyone know if this works? It works for me on 10.0. Do you have debug.kdb.break_to_debugger=1 and hw.syscons.kbd_debug=1 (if you're using syscons)? -- Peter Jeremy pgpRWEUgfMxEM.pgp Description: PGP signature
Re: OpenSSL vs. LibreSSL (OpenBSD)
On 2014-Apr-25 05:00:38 -0400, Zack Gold wrote: >An important thing to note here is motive. The Linux Foundation is >housing this "Core Infrastructure Initiative" project, and so they are >the ones who get all the money. "The Initiative's funds will be >administered by the Linux Foundation and a steering group comprised of >backers of the project as well as key open source developers and other >industry stakeholders." So, it might be in the interest of these >people to not necessarily fix bugs. They might be interested in other >things, like ownership. Though, this may be a bit irrational. It has occurred to me that Linux (in general, not the Foundation) contains a number of religious zealots and the current OpenSSL license is not in keeping with their religion. And there have been previous cases where portable open source software has passed into the maintainership of Linux groups and had all the cross-platform code excised to make it Linux-only. -- Peter Jeremy pgpwNAwcA6h9m.pgp Description: PGP signature
Re: Import of DragonFly Mail Agent
On 2014-Feb-24 10:44:30 -0600, Bryan Drewery wrote: > >I have the Oreilly sendmail book here and it's thicker than The Design >and Implementation of the FreeBSD Operating System. That's quite an >application! More impressively, ISTR it's thicker than "The Magic Garden Explained" - which is the SVR4 internals. -- Peter Jeremy pgpXr6FrMeCfw.pgp Description: PGP signature
Re: ZFS command can block the whole ZFS subsystem!
On 2014-Jan-05 09:11:38 +0100, "O. Hartmann" wrote:
>On Sun, 5 Jan 2014 10:14:26 +1100
>Peter Jeremy wrote:
>
>> On 2014-Jan-04 23:26:42 +0100, "O. Hartmann"
>> wrote:
>> >zfs list -r BACKUP00
>> >NAME              USED  AVAIL  REFER  MOUNTPOINT
>> >BACKUP00         1.48T  1.19T   144K  /BACKUP00
>> >BACKUP00/backup  1.47T  1.19T  1.47T  /backup
>>
>> Well, that at least shows it's making progress - it's gone from 2.5T
>> to 1.47T used (though I gather that has taken several days). Can you
>> please post the result of
>> zfs get all BACKUP00/backup
>BACKUP00/backup  dedup  on  local

This is your problem. Before it can free any block, it has to check for other references to the block via the DDT and I suspect you don't have enough RAM to cache the DDT.

Your options are:
1) Wait until the delete finishes.
2) Destroy the pool with extreme prejudice: Forcibly export the pool (probably by booting to single user and not starting ZFS) and write zeroes to the first and last MB of ada3p1.

BTW, this problem will occur on any filesystem where you've ever enabled dedup - once there are any dedup'd blocks in a filesystem, all deletes need to go via the DDT.

-- Peter Jeremy pgp3MDihoDvIU.pgp Description: PGP signature
Re: ZFS command can block the whole ZFS subsystem!
On 2014-Jan-04 23:26:42 +0100, "O. Hartmann" wrote: >zfs list -r BACKUP00 >NAME USED AVAIL REFER MOUNTPOINT >BACKUP00 1.48T 1.19T 144K /BACKUP00 >BACKUP00/backup 1.47T 1.19T 1.47T /backup Well, that at least shows it's making progress - it's gone from 2.5T to 1.47T used (though I gather that has taken several days). Can you please post the result of zfs get all BACKUP00/backup -- Peter Jeremy
Re: ZFS command can block the whole ZFS subsystem!
On 2014-Jan-03 20:25:35 +0100, "O. Hartmann" wrote: >[~] zfs get all BACKUP00 >NAME PROPERTY VALUE SOURCE ... >BACKUP00 usedbysnapshots 0 - >BACKUP00 usedbydataset 144K - >BACKUP00 usedbychildren 2.53T - >BACKUP00 usedbyrefreservation 0 - >Funny, the disk is supposed to be "empty" ... but is marked as used by >2.5 TB ... That says there's another filesystem inside BACKUP00 which has 2.5TB used. What are the results of: zpool status -v BACKUP00 zfs list -r BACKUP00 -- Peter Jeremy
Re: PACKAGESITE spam
On 2013-Dec-22 11:53:17 -0800, Darren Pilgrim wrote: >Because of that deinstall log. When you use `pkg install` to upgrade a >port, you get something like this: > >Jul 10 23:06:40 chombo pkg-static: ca_root_nss-3.15.1 installed >Nov 29 15:04:52 chombo pkg: ca_root_nss reinstalled: 3.15.2_1 > >That information does not exist in the pkg database. I agree that's a serious bug/regression in the pkg database: With the old pkg system, I could tell when a port was installed by looking at the timestamps on the +COMMENT file. The install time is needed to answer questions like "does this entry in UPDATING affect me" (ie have I rebuilt the port since the entry date). It's something I used regularly and its absence is a PITA. I shouldn't need to rummage through /var/log/messages - and in any case, by default FreeBSD only keeps 500K of messages history (about a month in my case) so the information has probably rotated into the bit bucket. I agree that having a pkg audit trail would be useful. Unfortunately, what we have today is not an audit trail and isn't especially useful. -- Peter Jeremy pgpVS_m9BxiAC.pgp Description: PGP signature
Re: [Call For Help] Clang + OpenJDK + head + amd64 == cocktail of death (for clusters)
On 2013-Jul-25 10:39:17 +0200, Baptiste Daroussin wrote: >After some investigation we discover that blacklisting openjdk6 allows the >building process to go to completion again. ... >It seems to happen only on head amd64, so far we think it is only >happening when jdk is built with clang. This mail arrives at an opportune time. I've just discovered that if I build openjdk6 with clang (on head/amd64), the resultant jdk SEGV's if I again try to build openjdk6. If I build it with "USE_GCC=any" then the problem goes away. >I have no time, neither skill to investigate that, I don't have the time to investigate further but forcing the use of gcc instead of clang is at least a workaround. -- Peter Jeremy pgpDa0UXCa_Nr.pgp Description: PGP signature
Re: access to hard drives is "blocked" by writes to a flash drive
On 2013-Mar-03 23:12:40 -0800, Don Lewis wrote: >On 4 Mar, Konstantin Belousov wrote: >> It could be argued that the current typical value of 16MB for the >> hirunningbufspace is too low, but experiments with increasing it did >> not provided any measureable change in the throughput or latency for >> some loads. > >The correct value is probably proportional to the write bandwidth >available. The problem is that write bandwidth varies widely depending on the workload. For spinning rust, this will vary between maybe 64KBps (512B random writes) and 100-150MBps (single-threaded large sequential writes). The (low-end) SSD in my Netbook also has about 100:1 variance due to erase blocking. How do you tune hirunningbufspace in the face of 2 or 3 orders of magnitude variance in throughput? Especially since SSDs don't gradually degrade - they hit a brick wall. -- Peter Jeremy
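To put rough numbers on that variance, here is a back-of-the-envelope sketch (Python, purely illustrative) of how long a full 16MB hirunningbufspace would take to drain at the two extremes quoted above. It assumes binary units and a constant write rate for the whole buffer; neither assumption comes from the kernel or from the thread.

```python
# Rough drain-time estimate for a full hirunningbufspace.
# Assumptions (not taken from the kernel source): binary units and a
# constant write rate for the entire buffer.
HIRUNNINGBUFSPACE = 16 * 1024 * 1024  # the "typical" 16MB value quoted above


def drain_seconds(buffer_bytes, bytes_per_second):
    """Seconds needed to flush buffer_bytes at a steady bytes_per_second."""
    return buffer_bytes / bytes_per_second


slow = drain_seconds(HIRUNNINGBUFSPACE, 64 * 1024)          # 512B random writes
fast = drain_seconds(HIRUNNINGBUFSPACE, 100 * 1024 * 1024)  # large sequential writes

print(f"random writes:     {slow:.0f} s")
print(f"sequential writes: {fast:.2f} s")
print(f"ratio:             {slow / fast:.0f}x")
```

Under those assumptions the same buffer size means minutes of queued I/O in one case and a fraction of a second in the other, which is the tuning problem in a nutshell.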
Re: access to hard drives is "blocked" by writes to a flash drive
On 2013-Mar-02 18:29:54 +0100, deeptech71 wrote: >When one of my flash drives is being heavily written to; typically by >``svn update'' on /usr/src, located on the flash drive; the following >can be said about filesystem behavior: > >- ``svn update'' seems to be able to quickly update a bunch of files, > but is then unable to continue for a period of time. This behavior > is cyclical, and cycles several times, depending on the amount of > updating work to be done for a particular run of ``svn update''. This sounds like normal flash behaviour: You can only write to erased blocks. The SSD firmware attempts to keep a free pool of erased blocks but if you write too fast, you empty the free pool and need to wait for the wear-levelling algorithm to move blocks around and erase them. Enabling TRIM (the '-t' flag on tunefs) will help if the drive supports TRIM (if it doesn't, it'll probably just lock up). Otherwise, you need to either put up with it or upgrade to a better SSD. I run into this regularly with the low-end SuperTalent drive in my Netbook but have never seen it with the OCZ Agility4 that I use for L2ARC in my fileserver. -- Peter Jeremy
Re: No ZFS when loading modules from loeader prompt
On Wed, Feb 20, 2013 at 7:05 AM, O. Hartmann wrote: > At the loader prompt, I need to unload the buggy kernel and load the old > working one via > > load /boot/kernel.old/kernel > > Then I load also the ZFS related modules > > load /boot/kernel.old/opensolaris.ko > load /boot/kernel.old/zfs.ko > > Issuing boot at the end of that stage boots the kernel - the old one > -successfully - but there is no working ZFS and no ZFS volume gets > mounted although the rc.conf is executed correctly. > > What am I doing wrong at that point? Why isn't ZFS run and mount properly? Last time I ran into this problem, the issue was that "unload" also unloaded the zpool.cache file and the ZFS code relied on that to find the kernel. I don't recall what the workaround was. On 2013-Feb-20 08:17:46 -0800, Freddie Cash wrote: >Sounds like a perfect use case for Boot Environments. Create a new BE, >install the new kernel into it, set it as the default, reboot. If it >fails, you manually set the previous BE as the default, and reboot. That >way, your "known-good", working environment is never affected. How do you change your BE in the loader? Or how do you change your BE when you can't boot? -- Peter Jeremy pgpHx5Un14coz.pgp Description: PGP signature
Re: Zpool surgery
On 2013-Jan-27 14:31:56 -, Steven Hartland wrote: >- Original Message - >From: "Ulrich Spörlein" >> I want to transplant my old zpool tank from a 1TB drive to a new 2TB >> drive, but *not* use dd(1) or any other cloning mechanism, as the pool >> was very full very often and is surely severely fragmented. > >Cant you just drop the disk in the original machine, set it as a mirror >then once the mirror process has completed break the mirror and remove >the 1TB disk. That will replicate any fragmentation as well. "zfs send | zfs recv" is the only (current) way to defragment a ZFS pool. -- Peter Jeremy pgp7mByYv45q2.pgp Description: PGP signature
Re: Programmer dvorak layout for syscons
On 2012-Nov-20 02:42:50 +0200, mbsd wrote: >I've been using this layout for a long time in X and I create kbdmap for >syscons. > >Does it any chance to be put in source tree? So my question is, is it >worth. I suggest you write a PR that includes the keymap and an appropriate patch for /usr/share/syscons/keymaps/INDEX.keymaps as well as explaining how it differs from the 9 existing Dvorak keymaps. -- Peter Jeremy pgpSNEQbnvSGA.pgp Description: PGP signature
Re: HEADS UP: Forth Optimizations
On 2012-Nov-10 16:53:10 -0800, Devin Teske wrote: >Can someone help review this for the commit log? I've had a look through the proposed patch and my comments follow. Other than that, it looks good to me. >Index: menu-commands.4th >=== >--- menu-commands.4th (revision 242835) >+++ menu-commands.4th (working copy) ... >@@ -185,21 +240,21 @@ variable root_state ... > s" set kernel=${kernel_prefix}${kernel[N]}${kernel_suffix}" >-\ command to assemble full kernel-path >- -rot tuck 36 + c! swap\ replace 'N' with array index value >- evaluate \ sets $kernel to full kernel-path >+ 36 +c! \ replace 'N' with ASCII numeral >+ evaluate I think the "sets $kernel to full kernel-path" comment is worth keeping. > s" set root=${root_prefix}${root[N]}${root_suffix}" >-\ command to assemble root image-path >- -rot tuck 30 + c! swap\ replace 'N' with array index value >- evaluate \ sets $kernel to full kernel-path >+ 30 +c! \ replace 'N' with ASCII numeral >+ evaluate Likewise, this could do with a (corrected) comment that it sets $root to the full path to root. >Index: menu.4th >=== >--- menu.4th (revision 242835) >+++ menu.4th (working copy) >@@ -184,18 +223,15 @@ create init_text8 255 allot > > \ base name of environment variable > loader_color? if >- s" ansi_caption[x]" >+ dup ansi_caption[x] > else >- s" menu_caption[x]" >+ dup menu_caption[x] > then Could this be simplified to = dup = loader_color? if = ansi_caption[x] = else = menu_caption[x] = then Or, at a higher level, should this whole block be pulled into a new word (along with similar words for toggled_{ansi,text}[x] and {ansi,menu}_caption[x][y]? >@@ -227,36 +263,26 @@ create init_text8 255 allot ... > getenv dup -1 <> if > \ Assign toggled text to menu caption Some comments on stack contents around here would make it somewhat easier to follow what is going on. >@@ -329,19 +340,18 @@ create init_text8 255 allot ... 
> \ This is highly unlikely to occur, but to make > \ sure that things move along smoothly, allocate > \ a temporary NULL string > >+ drop ( getenv cruft ) > s" " > then > then Is this the memory leak? If so, can I suggest that this be committed separately since it is a simple change and is distinct from the other changes you are proposing. >@@ -357,14 +367,14 @@ create init_text8 255 allot > \ > \ Let's perform what we need to with the above. > >- \ base name of menuitem caption var >+ \ Assign array value text to menu caption >+ 4 pick According to the documentation just above this hunk, there are only 4 items on the stack, so "4 pick" seems wrong, though it is consistent with my understanding of the old code. The "2 pick [char] 0" you added earlier seems to similarly be out-by-one, though consistent. >@@ -521,17 +528,20 @@ create init_text8 255 allot > > \ If this is the ACPI menu option, act accordingly. > dup menuacpi @ = if >- acpimenuitem ( -- C-Addr/U | -1 ) >+ dup acpimenuitem ( n -- n n c-addr/u | n n -1 ) >+ dup -1 <> if >+ 13 +c! ( n n c-addr/u -- n ) \ replace 'x' I think the stack here should be ( n n c-addr/u -- n c-addr/u ) >@@ -950,100 +914,43 @@ create init_text8 255 allot > > 49 \ Iterator start (loop range 49 to 56; ASCII '1' to '8') > begin >- \ Unset variables in-order of appearance in menu.4th(8) Does the order matter? I notice you've changed it.
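As a sanity check on the hard-coded offsets in the menu-commands.4th hunk (36 for the kernel string, 30 for the root string), the following Python sketch confirms that they land on the 'N' placeholder and shows the substitution the "+c!" lines are described as performing. The helper name patch_index is made up for this illustration, and this is not a model of the Forth stack behaviour, only of the string edit.

```python
# Check that offsets 36 and 30 point at the 'N' placeholder in the two
# command templates quoted from the patch.
KERNEL_CMD = "set kernel=${kernel_prefix}${kernel[N]}${kernel_suffix}"
ROOT_CMD = "set root=${root_prefix}${root[N]}${root_suffix}"


def patch_index(template, offset, index):
    """Replace the placeholder at `offset` with the ASCII digit for `index`.

    Hypothetical helper for illustration; it mirrors "replace 'N' with
    ASCII numeral" from the patch comments.
    """
    if template[offset] != "N":
        raise ValueError(f"offset {offset} is {template[offset]!r}, not 'N'")
    return template[:offset] + chr(ord("0") + index) + template[offset + 1:]


print(KERNEL_CMD.index("N"), ROOT_CMD.index("N"))
print(patch_index(KERNEL_CMD, 36, 1))
print(patch_index(ROOT_CMD, 30, 2))
```

Both offsets check out against the strings as quoted, so any change to the variable names in those templates would need the constants updated to match.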
Re: [head tinderbox] failure on arm/arm
On 2012-Nov-10 09:16:32 +1100, Brett wrote: >Just an observation: a few years ago when I got sick of Linux's >"headlong rush" development model, I subscribed to various BSD >mailing lists to see what else was out there. I considered FreeBSD at >the time - there was a neverending avalanche of "[head tinderbox] >failure" messages. The Project tries to avoid it but occasional build failures on the development branch are very likely to occur. As a new user, you would be much better off starting with a release branch. >This told me that I would be more likely to be running code written >by people who knew what they were doing if I went with Open, Net, or >DragonflyBSD. I think that's being unfair. Do Open, Net or DFly have an equivalent to the tinderboxes that do automated test builds and report failures? And, since you have replied to an ARM failure, DragonflyBSD would not be an option since it doesn't support ARM. -- Peter Jeremy pgpggt7LmRYN1.pgp Description: PGP signature
Re: FORTRAN vs. Fortran (was: November 5th is Clang-Day)
On 2012-Nov-02 11:21:10 -0500, Brooks Davis wrote: >On Fri, Nov 02, 2012 at 10:21:19AM +, Anton Shterenlikht wrote: >> It's a shame though that, with LLVM as the >> default compiler, further development of >> FreeBSD/ia64 and FreeBSD/sparc64 >> will probably suffer and then stop altogether. > >If you read either my annoucment or the diff closly you will note that >the default it only changing for x86 architectures. Even with all the best of intentions, once the x86 architectures (which cover the bulk of the user and developer mass) migrate to a different toolchain, the risk of bitrot in the GNU toolchain becomes non-negligible. And once it breaks, there may not be the critical mass to repair it. This is basically what happened to the Alpha. -- Peter Jeremy
Re: memory warnings r240891 | dmesgg
On 2012-Oct-04 23:51:09 +0400, Sergey Kandaurov wrote: >On 4 October 2012 20:18, Darrel wrote: >> warning: total configured swap (2621440 pages) exceeds maximum >> recommended amount (1852656 pages). ... >This is because kernel needs some memory to manage swap too. >Currently for amd64 this roughly reduces to the following rule >(My apologies in advance for the extra simplification): > >100MB RAM per 800MB swap space. That is oversimplified to the point of being wrong. As of HEAD r239255 and 9-stable r240097, there's no longer a limit on amd64. The limit is still required on 32-bit architectures due to the limited KVA available. The actual KVA requirements (RAM is only allocated when the swap space is actually used) is about 5MB KVA per 1GB swap. The default swzone for i386 was 32MiB - which is sufficient for ~7GB swap (the 1852656 pages reported above) and was increased to 34.5MB for i386 in r239730 to support ~8GB swap (this is also in r240097). (It's all approximate because of the way swap space is allocated using struct swblock). See the thread starting http://lists.freebsd.org/pipermail/freebsd-current/2012-August/035839.html for more details. -- Peter Jeremy pgprxHjDiuWkT.pgp Description: PGP signature
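For anyone wanting to relate the page counts in the warning to the GB figures above, here is a small conversion sketch in Python. It assumes 4KiB pages (true for i386/amd64 but an assumption as far as this script is concerned) and simply divides the old 32MiB default swzone by the recommended maximum to see what per-GB cost that implies.

```python
# Convert the page counts from the boot-time warning into GiB.
# Assumption: 4 KiB pages, as on i386/amd64.
PAGE_SIZE = 4096


def pages_to_gib(pages):
    """Size in GiB of `pages` pages of PAGE_SIZE bytes."""
    return pages * PAGE_SIZE / 2**30


configured = pages_to_gib(2621440)   # "total configured swap"
recommended = pages_to_gib(1852656)  # "maximum recommended amount"

# KVA cost per GiB of swap implied by a 32 MiB swzone covering that maximum.
implied_mib_per_gib = 32 / recommended

print(f"configured:  {configured:.2f} GiB")
print(f"recommended: {recommended:.2f} GiB")
print(f"implied KVA: {implied_mib_per_gib:.2f} MiB per GiB of swap")
```

The implied figure is in line with the "about 5MB KVA per 1GB swap" estimate, allowing for the approximations already mentioned.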
Re: sysctl kern.ipc.somaxconn limit 65535 why?
On 2012-Oct-03 19:45:01 +0100, free...@chrysalisnet.org wrote: >In addition we had to migrate all our mysql servers from freebsd to debian >because they were hitting some arbitary OS limit but I could never figure >out what, sys% usage went through the roof when this limit was hit, issue >didnt occur on debian. Did you report this issue on any of the FreeBSD mailing lists? Reporting a problem doesn't guarantee that it will be fixed (unfortunately) but not reporting a problem makes it extremely unlikely that it will be fixed. > I feel recently freebsd is more focused on desktop's >and as such developer's never develop for a heavy server usage scenario, This isn't intentionally true but it's true that few developers run large servers so they may not run into some issues that only impact large systems. Again, it's up to people who do run such systems to provide feedback about bottlenecks & issues they hit so that they can be fixed. >I keep coming across hardcoded low limits. As rightly pointed out default There are lots of defaults that were set some time (potentially decades) ago and may no longer be optimal. It's unrealistic to expect that all the defaults are correct in all circumstances and this is one area where end users can help by flagging defaults that they find need tuning. >values now days are useless 128 for somaxconn? maybe ok for a desktop. But, as others have pointed out, this isn't one of them. Can you please provide more details on a use scenario where a listen(2) backlog exceeding 128 is reasonable. > I cant tell app developers to >fix their apps to work on FreeBSD, they dont care, if it works fine on >windows and linux then the app isnt broken as far as they are concerned. FreeBSD is not Windows or Linux and never will be. 
There are lots of grey areas in the various standards that *BSD, Linux, Solaris, Windows etc comply with and some OSs interpret these grey areas differently to others (in some areas, it seems Linux has deliberately done things differently to other Unices for no obvious reason, and the GNU embrace-and-extend philosophy doesn't help). Writing portable code takes more than adding some .ac/.am files to an arbitrary blob of code and just because a developer thinks their app isn't broken doesn't make them right. BTW, I note that this was sent to -current? Are you running HEAD on production servers? If so, your feedback on issues you encounter would be appreciated so that they can be corrected before they make it into a RELEASE. -- Peter Jeremy
Re: Shouldn't world be able to build without /usr/include?
No. The first stage of the buildworld is creating cross-tools - which run on the existing world (and hence need its include files and libs). -- Peter Jeremy pgpFV9rJata7v.pgp Description: PGP signature
Re: pkgng suggestion: renaming /usr/sbin/pkg to /usr/sbin/pkg-bootstrap
On 2012-Aug-26 12:27:41 -0700, Doug Barton wrote: >On 08/26/2012 12:08, Ian Lepore wrote: >> Maybe it could rename itself to /usr/local/sbin/pkg-bootstrap as part of >> replacing itself, so that you could re-bootstrap your way out of a >> problem later. > >That's certainly creative thinking, but I'm still queasy about 2 >commands with the same name that do 2 different things. And having it >rename itself adds to the confusion down the road. I also like the idea of a pkg-bootstrap command. Possibly a symlink from pkg to pkg-bootstrap, that gets removed as part of the bootstrap process, would help - but it should just tell you how to run pkg-bootstrap. I don't like the idea of pkg{-bootstrap} autonomously installing something I didn't ask for. And I don't like the idea that all pkg commands get bounced through a /usr/sbin/pkg once it has been bootstrapped. >Having a simple pkg bootstrapping tool in the base is a good idea. But >the functionality needs to be extremely limited so that we don't >increase the security exposure; and so that we don't end up in a >situation where a bug fix for something in the base limits our ability >to innovate with pkg in the ports tree. Agreed. BTW, one thing that needs to be considered is how to recover from the embedded public key needing to be invalidated (eg due to the private key being exposed). -- Peter Jeremy pgp6uilrjhsXu.pgp Description: PGP signature
Re: dhclient cause up/down cycle after 239356 ?
On 2012-Aug-22 15:35:01 -0400, John Baldwin wrote: >Hmm. Perhaps we could use a debouncer to ignore "short" link flaps? Kind of >gross (and OpenBSD doesn't do this). For now this change basically ignores >link up events if they occur with 5 seconds of the link down event. The 5 is >hardcoded which is kind of yuck. I'm also a bit concerned about this for similar reasons to adrian@. We need to distinguish between short link outages caused by (eg) a switch admin reconfiguring the switch (which needs the lease to be re-checked) and those caused by broken NICs which report link status changes when they are touched. Maybe an alternative is to just ignore link flaps when they occur within a few seconds of a script_go(). (And/or make the ignore timeout configurable). Apart from fxp(4), does anyone know how many NICs are similarly broken? Does anyone know why this issue doesn't bite OpenBSD? Does it have a work-around to avoid resetting the link, not report link status changes or just no-one has noticed the issue? BTW to jhb: Can you check your mailer's list configuration. You appear to be adding and leaving in the Cc list. -- Peter Jeremy pgp9SoqeQglFI.pgp Description: PGP signature
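To make the debouncing idea concrete, here is a minimal Python sketch of "ignore link-up events that follow a link-down within N seconds", with the hold-off made configurable as suggested above. The class and method names are invented for this illustration and are not taken from dhclient; timestamps are passed in explicitly so the behaviour is easy to follow.

```python
# Minimal link-flap debouncer sketch. Names are hypothetical and this is
# not dhclient's implementation, just the logic under discussion.
class LinkDebouncer:
    def __init__(self, holdoff=5.0):
        self.holdoff = holdoff  # seconds; 5 is the hardcoded value quoted above
        self.last_down = None   # time of the most recent link-down, if any

    def link_down(self, now):
        """Record a link-down event at time `now` (seconds)."""
        self.last_down = now

    def link_up(self, now):
        """Return True if the lease should be re-checked, False for a flap."""
        if self.last_down is None:
            return True  # no preceding down event: treat as a real link-up
        outage = now - self.last_down
        self.last_down = None
        return outage >= self.holdoff


d = LinkDebouncer()
d.link_down(0.0)
print(d.link_up(2.0))   # short flap, like the fxp DOWN/UP pair
d.link_down(10.0)
print(d.link_up(20.0))  # longer outage, e.g. a modem or switch reset
```

The weakness is the one raised above: a fixed window cannot tell a spurious NIC reset from a short but genuine outage, which is why tying the window to script_go() may be a better discriminator.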
Re: r239356: does it mean, that synchronous dhcp and dhcplcinet with disabled devd gone?
On 2012-Aug-21 17:25:23 -0400, John Baldwin wrote: >Ok, this is what I came up with, somewhat loosely based on OpenBSD's dhclient. >I tested that it survives the following: I've also done some limited testing on both bge and fxp NICs and haven't run into any problems. In particular the spurious link resets from fxp don't seem to cause any problems. -- Peter Jeremy pgp5gbqPFkDoz.pgp Description: PGP signature
Re: dhclient cause up/down cycle after 239356 ?
On 2012-Aug-21 19:42:17 +0300, Vitalij Satanivskij wrote: >Look's like dhclient do down/up sequence - Not intentionally. >Aug 21 19:21:00 home kernel: fxp0: link state changed to UP >Aug 21 19:21:01 home kernel: fxp0: link state changed to DOWN >Aug 21 19:21:01 home dhclient: New IP Address (fxp0): xx.xx.xx.xx >Aug 21 19:21:01 home dhclient: New Subnet Mask (fxp0): 255.255.255.0 >Aug 21 19:21:01 home dhclient: New Broadcast Address (fxp0): xx.xx.xx.xx >Aug 21 19:21:01 home dhclient: New Routers (fxp0): xx.xx.xx.xx >Aug 21 19:21:03 home kernel: fxp0: link state changed to UP I can reproduce this behaviour - but only on fxp (i82559 in my case) NICs. My bge (BCM5750) and rl (RTL8139) NICs do not report the spurious DOWN/UP. (I don't normally run DHCP on any fxp interfaces, so I didn't see it during my testing). The problem appears to be the $IFCONFIG $interface inet alias 0.0.0.0 netmask 255.0.0.0 broadcast 255.255.255.255 up executed by /sbin/dhclient-script during PREINIT. This is making the fxp NIC reset the link (actually, assigning _any_ IP address to an fxp NIC causes it to reset the link). The post r239356 dhclient detects the link going down and exits. >Before r239356 iface just doing down/up without dhclient exit and >everything work fine. For you, anyway. Failing to detect link down causes problems for me because my dhclient was not seeing my cable-modem resets and therefore failing to reacquire a DHCP lease. -- Peter Jeremy pgptb9EOcZ9Yg.pgp Description: PGP signature
Re: buildworld c++ internal error
On 2012-Aug-20 07:17:59 +0900, Randy Bush wrote: >the only thing a night's sleep got me was the idea of attaching an >external sata drive and putting swap on it. You can also swap to a file via NFS. -- Peter Jeremy pgp62N8KdUtmP.pgp Description: PGP signature
Re: Time to bump default VM_SWZONE_SIZE_MAX?
On 2012-Aug-12 15:44:07 -0700, Colin Percival wrote: >If I'm understanding things correctly, the "maxswzone" value -- set by the >kern.maxswzone loader tunable or to VM_SWZONE_SIZE_MAX by default -- should >be approximately 9 MiB per GiB of swap space. I'm not sure how you got that value. By default, struct swblock is 288 bytes (280 bytes on 32-bit archs) and can store up to 32 pages of swap (the comment in vm/swap_pager.c:swap_pager_swap_init() is wrong). For x86, this is 2.25 MiB per GiB (best case). >The current default for VM_SWZONE_SIZE_MAX was set in August 2002 to 32 MiB; >meaning that anyone who wants to use more than ~ 3.5 GB of swap space ought >to set kern.maxswzone in /boot/loader.conf. In practice, you can't fully populate each swblock. I did a test on my amd64 box by running multiple copies of a program that allocates and dirties a big chunk of RAM and then pause()s. That gave me a 90% swblock utilisation - which I suspect is higher than a typical scenario where memory pressure pushes more randomly unused pages out. Realistically, I'd say that the default VM_SWZONE_SIZE_MAX can handle about 9GB swap (at least, that was my experience). BTW, if you plan on allocating lots of swap, be aware that each swap device is limited to 32GiB - see vm/swap_pager.c:swaponsomething(). -- Peter Jeremy pgpwSk7xMhpGY.pgp Description: PGP signature
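The 2.25 MiB per GiB figure can be reproduced with a few lines of Python. The constants are the ones quoted above (288-byte struct swblock, 32 pages per swblock) plus an assumed 4KiB page size; the utilisation parameter is only there to show how partially filled swblocks inflate the cost.

```python
# Reproduce the swblock overhead estimate.
# Constants are the figures quoted above; 4 KiB pages are assumed (x86).
SWBLOCK_BYTES = 288
PAGES_PER_SWBLOCK = 32
PAGE_SIZE = 4096


def swzone_mib_per_gib(utilisation=1.0):
    """MiB of swblock storage needed per GiB of swap actually in use.

    `utilisation` is the average fraction of each swblock that is populated.
    """
    pages_per_gib = 2**30 // PAGE_SIZE
    swblocks = pages_per_gib / (PAGES_PER_SWBLOCK * utilisation)
    return swblocks * SWBLOCK_BYTES / 2**20


print(f"best case:       {swzone_mib_per_gib():.2f} MiB per GiB")
print(f"90% utilisation: {swzone_mib_per_gib(0.9):.2f} MiB per GiB")
```

Lower utilisation, which seems likely under real memory pressure, pushes the per-GiB cost up in inverse proportion.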
Re: [HEADSUP & CFT] pkg 1.0rc1 and schedule
On 2012-Jul-16 07:18:05 +0100, Matthew Seaman wrote: >No. Parallel installs will not work -- the first to start will lock the >DB, and the second won't be able to proceed. Good - it was the locking I was mostly concerned about. As long as the install is locked, it's safe to run multiple port installs on different terminals without them treading on each other. (Next step, outside pkgng, is to allow parallel builds). Thank you for all the answers. -- Peter Jeremy
Re: [HEADSUP & CFT] pkg 1.0rc1 and schedule
On 2012-Jul-12 10:01:10 +, Baptiste Daroussin wrote: >What is pkg >--- >pkg is a new package manager for FreeBSD. It is designed as a replacement for >the pkg_* tools, and as a full featured binary package manager. A couple of specific questions that I haven't seen answered during this thread or in the wiki: - Can pkgng cope with parallel installs? What happens if I simultaneously (attempt to) install conflicting packages? - If I use "pkg delete -f", what happens to packages that depended on the forcibly-deleted package? - What happens if I delete a package where I've modified one of the files managed by the package? - What facilities does it have for auditing and repairing the package database? (ie checking for inconsistencies between installed files and the content of the package database) - How does it handle the situation where I install a package that depends on foo version 1.2.3 but have foo version 1.2.4 (or 1.2.2) installed? What about if I have bar version 1.3, which is ABI- compatible with foo version 1.2.3, installed? - Will it detect that a package install would overwrite an existing file? What does it do in this case? - I gather it handles "update package" more intelligently than "uninstall old package, install new package". Will it avoid replacing an old file with an identical one in the new package? If so, what happens to the file metadata (particularly uid, gid and mtime)? - Can it track user-edited configuration files that are associated with packages? - Can it do 2- or 3-way merges of package configuration files? - The README states "Directory leftovers are automatically removed if they are not in the MTREE." How does this work for directories that are shared between multiple packages? Does this mean that if I add a file to a directory that was created by a package, that file will be deleted automatically if I delete the package? -- Peter Jeremy
Re: Use of C99 extra long double math functions after r236148
On 2012-Jul-13 11:58:05 -0400, David Schultz wrote: >I propose we set a timeframe for this, on the order of a few months. ... >If the schedule can't be met, then we can just import Cephes as an >interim solution without further ado. This provides Bruce and Steve >an opportunity to commit what they have been working on, without >forcing the rest of the FreeBSD community to wait indefinitely for >the pie in the sky. This sounds good to me as well and I'd be happy to help. -- Peter Jeremy pgpmY7CNvs676.pgp Description: PGP signature
Re: Use of C99 extra long double math functions after r236148
On 2012-Jul-11 15:32:47 -0700, Steve Kargl wrote: >I know an approach to implementing many of the missing >functions. Are you willing to share this insight so someone else could do the work? > When I do find >some free time, I look at what is missing and start to >put together a new function. At the moment, it seems >that it takes 3+ years to get a new function written, >tested, and committed. And, from what I can see, much of this is done quietly - which opens up the possibility that two people might both implement the same code or that people will avoid the area in fear of treading on someone else's toes. As I said previously, I believe the existing wiki page could be improved to form a central co-ordinating point to show what activity is (or isn't) occurring. >but most people seem to push the "easy button" and want >to grab either cephes or netlib's libm. There are >technical issues with this approach that I won't >rehash again. Doing it properly requires significant effort by people with fairly specialised skills. Whilst the project has several people with the skills, it appears that none of them currently have the time. In the meantime, FreeBSD is taking free kicks from other FOSS groups that have gone down the quick-and-dirty path. AFAIK, none of the relevant standards (POSIX, IEEE754) have any precision requirements for functions other than +-*/ and sqrt() - all of which we have correctly implemented. I therefore believe that, for the remaining missing functions, the Project would be best served by committing the best code that is currently available under a suitable license and cleaning it up over time (as was done for the current libm). -- Peter Jeremy
Re: Adding support for WC (write-combining) memory to bus_dma
On 2012-Jul-12 10:40:27 -0400, John Baldwin wrote: >contigmalloc(). In fact, even better is to call kmem_alloc_contig() directly >rather than using contigmalloc(). ... >Peter, this is somewhat orthognal (but related) to your bus_dma patch which is >what prompted me to post this. Overall, the change seems good to me. My sole thought on the API was whether the actual attribute should be passed, rather than having a couple of new BUS_DMA_ flags but you've addressed that in a followup. One change is that previously allocated memory was all charged to M_DEVBUF via the malloc_type_allocated() call in contigmalloc() whereas now only small allocations are counted. This would seem to indicate that large bus_dmamem_alloc() allocations won't be visible in (eg) "vmstat -m". -- Peter Jeremy pgpZoejmmJeAW.pgp Description: PGP signature
Re: Use of C99 extra long double math functions after r236148
On 2012-Jul-08 19:01:07 -0700, Steve Kargl wrote: >Well, on the most popular hardware (that being i386/amd64), >ld80 will use hardware fp instruction while ld128 must be >done completely in software. The speed difference is >significant. AFAIK, of the architectures that FreeBSD supports, only sparc64 defines ld128 in the architecture and I don't believe there are any SPARC chip implementations that implement ld128 math in hardware. For that matter, I don't believe anything except x86 provides full IEEE FP support in hardware - most architectures require software assistance for subnormals and some corner cases. If your application happens to hit those cases often, performance will also suffer. On 2012-Jul-08 20:05:04 -0700, Steve Kargl wrote: >AFAIK, neither gcc in base nor clang would be c99 complaint >even if all of the c99 math functions were available. That sort of argument can easily get circular. Let's get the C99 bits of libm out of the way and then we can have another bikeshed about the shortcomings of the compiler(s). On 2012-Jul-08 19:56:52 -0400, David Schultz wrote: >Yes, Bruce has ld128 versions, and clusteradm very kindly got us a >sparc64 machine to test on. That was about the time I ran out of time >to keep working on it. If someone wants to pick it up, that would be >great. I have access to a couple of SPARC systems as well and would be willing to help work on the missing bits. On 2012-Jul-10 18:58:01 -0400, David Schultz wrote: >On Tue, Jul 10, 2012, Rainer Hurling wrote: >> powl: src/extra/trio/triostr.c >> src/extra/trio/trio.c >> src/main/format.c > >It's hard to do a good job on powl(), but the simple approach >(exp(log(x)*y)) plus a few special cases may suffice for many uses. A simplistic exp(log(x)*y) throws away 15 bits of precision (size of the FP exponent field). cephes has a powl() that appears to do better or, alternatively, it shouldn't be too difficult to extend the approach used by __ieee754_pow() using long doubles. 
>> BTW: There seems to be a discrepancy about missing functions listed in >> http://wiki.freebsd.org/MissingMathStuff and in >> http://svnweb.freebsd.org/base/head/lib/msun/src/math.h?r1=227472&r2=236148&pathrev=236148. >> So the wiki is a bit outdated now? >My list: [elided] I was thinking that a wiki page would be a good spot to co-ordinate the work (as well as making it clear what is still to be done). The existing page needs some TLC to be useful. -- Peter Jeremy pgpJMDQgZRF8K.pgp Description: PGP signature
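On the exp(log(x)*y) point above: the loss is easy to demonstrate, at least in double precision. The Python sketch below compares the naive formula and the library pow() against a 60-digit reference computed with the decimal module. Python floats are doubles (11-bit exponent), not long doubles, so this only illustrates the mechanism, namely that a rounding error in y*log(x) becomes a relative error in the result scaled by the magnitude of y*log(x); it does not reproduce the 15-bit figure. The test values are arbitrary.

```python
# Illustrate the precision lost by computing pow(x, y) as exp(log(x) * y).
# Uses doubles, so it shows the mechanism rather than the long double case.
import math
from decimal import Decimal, getcontext

getcontext().prec = 60  # reference precision, far beyond a double


def naive_pow(x, y):
    """The 'simple approach': exp(log(x) * y), with no special cases."""
    return math.exp(math.log(x) * y)


def rel_error(approx, x, y):
    """Relative error of `approx` against a high-precision x**y."""
    exact = (Decimal(x).ln() * Decimal(y)).exp()
    return float(abs(Decimal(approx) - exact) / exact)


x, y = 1.0000001, 1e9  # arbitrary values giving a result near e**100

print("naive exp(log(x)*y):", rel_error(naive_pow(x, y), x, y))
print("library pow(x, y):  ", rel_error(math.pow(x, y), x, y))
```

The exact figures depend on the platform libm, so none are quoted here; the naive result would be expected to come out noticeably worse than pow() whenever y*log(x) is large in magnitude.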
RAM fragmentation problems
I am running into a problem with RAM fragmentation causing contigmalloc() failures and wonder if anyone has a tool that would allow me to identify the owner(s) of pages of RAM within a region on amd64. -- Peter Jeremy pgpJ5bQo0Tiwa.pgp Description: PGP signature
Re: Add new syscons font to FreeBSD current release
On 2012-Jun-20 17:38:36 +0430, Mohammad Shafiee wrote: >I've made a Persian font for FreeBSD syscons. >You can download the font from here: >http://sourceforge.net/projects/bsdpersiancons/ > >How can I add this font to FreeBSD current release? As a first step, I'd create a port for it. See http://www.freebsd.org/doc/en/books/porters-handbook/ -- Peter Jeremy pgprd7bzEzHR2.pgp Description: PGP signature
Re: Use of C99 extra long double math functions after r236148
On 2012-Jun-01 10:29:13 -0400, John Baldwin wrote: >On Friday, June 01, 2012 1:55:10 am Eitan Adler wrote: >> Also, are there BSD licensed naive implementations of these functions >> we can use? Would it be okay to have slow, but accurate versions of >> these functions as a stopgap? > >Peter Jeremy more or less has a stopgap already ready judging by the comments >in the thread thus far. There's probably an hour's work by either stephen@ or myself to adapt the work I did on cephes in Sage to a standalone FreeBSD port. Unfortunately, both stephen@ & I are currently otherwise occupied and other comments in this thread suggest that the inclusion of such a port would be strongly opposed. Note that cephes isn't "slow but accurate" - it's reasonably fast but naive and therefore dodgy in edge cases. -- Peter Jeremy pgpHAsPC0mWbI.pgp Description: PGP signature
Re: OptionalObsoleteFiles.inc completeness
On 2012-Jun-01 20:50:24 +0200, Ulrich Spörlein wrote: >Why is xargs even calling /bin/echo when "utility" is not specified. Because that's what it's documented as doing. >Shouldn't it just print a certain number of arguments (one in this >case)? The current approach is simpler - there's always "utility" and it defaults to "/bin/echo". Therefore xargs can just always fork/exec. I agree that special-casing the default to have xargs print the relevant number of arguments would be more efficient. -- Peter Jeremy pgpjWzNyZgd8T.pgp Description: PGP signature
Re: OptionalObsoleteFiles.inc completeness
On 2012-May-30 13:27:03 +1000, Peter Jeremy wrote: >On 2012-May-29 02:18:25 +0400, Dmitry Marakasov wrote: >>Then you should try to profile it - my script basically runs >>delete-old delete-old-libs for every knob (131 of them), and it >>hadn't taken more than 4 seconds even once. > >I've done some investigating and the problem is that "xargs -n1" >fork()/exec()s /bin/echo on each file (and there are 5538 files for >me). Changing this to "tr ' ' '\n'" reduces "make delete-old" runtime >to 1.75s - which is much nicer. I've checked a variety of other >systems running 8.x & 9.x and the 97s seems to be anomalously long so >I'll do some more investigating. I've tracked the problem down to excessive VM faults caused by jemalloc. Whilst executing /bin/echo, jemalloc mmap()s two 4MiB chunks of memory. Unless you build with MALLOC_PRODUCTION (which I hadn't), it then proceeds to verify that both blocks are zero-filled. This causes 2048 (unnecessary) page faults (out of a total of 2133). When I rebuilt jemalloc with MALLOC_PRODUCTION, this dropped to 87 page faults (cf 76 on 8.x and 62 on 9.x) and the elapsed time for "make delete-old" dropped to slightly more than 8.x & 9.x. "xargs -n1" is probably a worst case scenario for jemalloc but this probably similarly affects other short-lived processes (and the shell scripts that invoke them). It's a pity that this particular test is a compile-time option. I still think that saving 5500 fork()/exec() pairs is a good reason to switch from "xargs -n1" to "tr ' ' '\n'". -- Peter Jeremy pgp66hvYrS7pF.pgp Description: PGP signature
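The page-fault figure in the message above can be reproduced with a short calculation. This is a sketch under one assumption that the message does not state explicitly: a 4 KiB page size (the usual base page size on amd64). The variable and function names are made up for the example.

```python
MIB = 1024 * 1024
KIB = 1024

def pages_touched(chunks, chunk_bytes, page_bytes):
    """Number of pages faulted in when every byte of each chunk is read once."""
    return chunks * chunk_bytes // page_bytes

# Two 4 MiB chunks, each scanned to verify it is zero-filled.
# ASSUMPTION: 4 KiB pages.
zero_check_faults = pages_touched(chunks=2, chunk_bytes=4 * MIB, page_bytes=4 * KIB)

# Faults left over once the zero-fill check is removed from the reported total.
other_faults = 2133 - zero_check_faults

print(zero_check_faults)  # 2048
print(other_faults)       # 85
```

The remainder of 85 is close to the 87 faults reported after rebuilding with MALLOC_PRODUCTION, which is consistent with the zero-fill check accounting for nearly all of the difference.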
Re: OptionalObsoleteFiles.inc completeness
On 2012-May-29 02:18:25 +0400, Dmitry Marakasov wrote: >* Peter Jeremy (pe...@rulingia.com) wrote: >> My experience is that it now takes about 2½ minutes on 10.x with warm >> caches, compared to less than 1 second on 8.x. > >Now = after applying my patch or after changing system? Which knobs >were enabled? "Now" as in -current as against 8.x. But, that 2½ mins was wrong, sorry. I recalled "150s" but actually checking, it's really 1:50 (110s). It occurred to me that was an oldish -current (r235127) so I updated to r236183 and the time dropped to 107s. Since this is an oldish P4, I tried a UP kernel and that reduced it to 96s. Your patch made no noticeable change (ministat reported no difference with 95% confidence). The system is amd64 with no MK_* knobs defined. >Then you should try to profile it - my script basically runs >delete-old delete-old-libs for every knob (131 of them), and it >hadn't taken more than 4 seconds even once. I've done some investigating and the problem is that "xargs -n1" fork()/exec()s /bin/echo on each file (and there are 5538 files for me). Changing this to "tr ' ' '\n'" reduces "make delete-old" runtime to 1.75s - which is much nicer. I've checked a variety of other systems running 8.x & 9.x and the 97s seems to be anomalously long so I'll do some more investigating. -- Peter Jeremy pgp23vtZvpadf.pgp Description: PGP signature
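The equivalence of the two pipelines discussed above can be checked with a small sketch like this one. It assumes that xargs and tr are on the PATH, that the file names are separated by single spaces and contain no quotes or backslashes (the cases where the two commands would differ), and the three path names are placeholders rather than entries from the real obsolete-files list.

```python
import subprocess

def split_with_xargs(text):
    """One word per line via `xargs -n1`: the default utility (echo) is run once per word."""
    result = subprocess.run(["xargs", "-n1"], input=text,
                            capture_output=True, text=True, check=True)
    return result.stdout

def split_with_tr(text):
    """One word per line via `tr ' ' '\\n'`: a single process, no per-word fork/exec."""
    result = subprocess.run(["tr", " ", r"\n"], input=text,
                            capture_output=True, text=True, check=True)
    return result.stdout

# Placeholder file list; the real one in the thread had 5538 entries.
files = "usr/bin/a usr/lib/b usr/share/c\n"
print(split_with_xargs(files) == split_with_tr(files))
```

Both produce the same output for this kind of input, but the xargs form starts one echo process per file name while the tr form handles the whole list in one process, which is where the saving of roughly 5500 fork()/exec() pairs comes from.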
Re: Use of C99 extra long double math functions after r236148
On 2012-May-28 15:54:06 -0700, Steve Kargl wrote: >Given that cephes was written years before C99 was even >conceived, I suspect all functions are sub-standard. Well, most of cephes was written before C99. The C99 parts of cephes were written to turn it into a complete C99 implementation. > For >example, AFAIK, none of the long double functions are >appropriate for any platform that has an 128-bit long double; >as cephes was written for an Intel 80-bit format. FreeBSD currently supports: 64-bit long doubles on ARM, MIPS and PowerPC; 80-bit long doubles on amd64, i386 and ia64; 128-bit long doubles on SPARC. The lack of LD128 in cephes therefore only affects one (not widely used) platform. The lack of even de facto standards for long double means that any applications wanting to use them already need to cope with at least a 2:1 precision range. >If portmgr or a port maintainer wants to use a library with >untested implementations of missing libm functions, please do >not put it into /usr/local/lib and call it libm. There is some test code in cephes. Can you point me to a suitable test suite for LD80 and LD128? The reason for calling it libm is to avoid having to hack every consumer to add an additional library. On 2012-May-28 16:30:35 -0700, Steve Kargl wrote: >Who's writing the code to test the implementations? That is >better much the problem. Without testing, one might get an >implementation that appears to work until it doesn't! That is equally true of the rest of FreeBSD. The list of open PRs suggests that FreeBSD still has a fair way to go before reaching perfection. And, most of this thread has been about using this code in ports - where the bar is much lower. Who is writing the code to test all the other ports? What is so special about this particular proposed port that it needs to come with solid-gold credentials? > It took >me 3+ years to get sqrtl() into libm, but bde and das (and >myself) wanted to make sure the code worked. 
Last time I checked (a couple of years ago), FreeBSD was missing 65 C99 libm functions. At 3 years per function, we should have C99 support available early in the 23rd century - which may be a bit late. On 2012-May-28 22:03:43 -0500, Stephen Montgomery-Smith wrote: >1. By being so picky about being so precise, FreeBSD is behind the time >line in rolling out a usable set of C99 functions. And at the current rate, we'll all be long dead before they are available. Whilst I'd far prefer to have a properly verified library function, I think we are better off with an implementation that has some caveats regarding edge-case behaviour than having nothing. >In the end, I do think it is good to ultimately settle on good C99 >compliant code. But having something intermediate that mostly works is >better than nothing. Especially if it exists only in the ports, and not >in the base code. I agree with this sentiment. What do people do on other free OSs? Does a tested open source C99 libm exist anywhere? glibc implements cpow(x,y) as cexp(y*clog(x)) and cephes does better than that. Is FreeBSD wasting its time writing "correct" C99 code because all the libm consumers expect no better than what glibc offers? I agree that writing correct libm functions is hard. I think a lot of the problem is that it's a mix of lots of boilerplate code testing for special conditions and edge cases that is boring to write and fiddly to get right, together with a kernel that is a pile of polynomial evaluations full of magic numbers that needs specialist skills to write. If we could get someone with the relevant skills to formally list all the special conditions & edge cases for each function, it should be possible to generate both the library C code and test cases from that - which would remove a lot of the tedium. -- Peter Jeremy pgpUnZGDcc79l.pgp Description: PGP signature
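The idea of driving tests from a formal list of special conditions, as suggested at the end of the message above, might look something like the following table-driven sketch. It is only an illustration: the table is a small hand-picked subset of the C99 Annex F special cases for pow(), it is exercised against Python's math.pow rather than a C libm, and the names POW_SPECIAL_CASES, same_value and check are made up for the example.

```python
import math

# (x, y, expected) triples: a hand-picked subset of C99 Annex F cases for pow().
POW_SPECIAL_CASES = [
    (1.0, math.nan, 1.0),          # pow(1, y) is 1 for any y, even NaN
    (math.nan, 0.0, 1.0),          # pow(x, +0) is 1 for any x, even NaN
    (math.nan, -0.0, 1.0),         # pow(x, -0) is 1 for any x, even NaN
    (-1.0, math.inf, 1.0),         # pow(-1, +inf) is 1
    (-1.0, -math.inf, 1.0),        # pow(-1, -inf) is 1
    (0.5, math.inf, 0.0),          # |x| < 1, y = +inf gives +0
    (2.0, math.inf, math.inf),     # |x| > 1, y = +inf gives +inf
    (2.0, -math.inf, 0.0),         # |x| > 1, y = -inf gives +0
    (math.inf, -1.0, 0.0),         # pow(+inf, y < 0) gives +0
    (-math.inf, 3.0, -math.inf),   # pow(-inf, odd y > 0) gives -inf
    (math.nan, 1.0, math.nan),     # NaN propagates otherwise
]

def same_value(got, expected):
    """Compare floats, treating NaN as equal to NaN and distinguishing -0.0 from +0.0."""
    if math.isnan(expected):
        return math.isnan(got)
    return got == expected and math.copysign(1.0, got) == math.copysign(1.0, expected)

def check(func, cases):
    """Run func over the table and return the (x, y, expected, outcome) rows that fail."""
    failures = []
    for x, y, expected in cases:
        try:
            got = func(x, y)
        except (ValueError, OverflowError) as exc:
            failures.append((x, y, expected, repr(exc)))
            continue
        if not same_value(got, expected):
            failures.append((x, y, expected, got))
    return failures

print(check(math.pow, POW_SPECIAL_CASES))                                 # []
print(len(check(lambda x, y: math.exp(math.log(x) * y), POW_SPECIAL_CASES)) > 0)  # True
```

The same table could in principle feed both a test harness and the generated special-case boilerplate at the top of a C implementation; the second call shows a naive exp(log(x)*y) being caught by the table.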
Re: Use of C99 extra long double math functions after r236148
On 2012-May-28 13:31:59 -0700, Steve Kargl wrote: >On Mon, May 28, 2012 at 11:01:24AM -0500, Stephen Montgomery-Smith wrote: >> One thing that could be done is to have a "math/cephes" port that adds >> the extra C99 math functions. This is already done in the math/sage >> port, using a rather clever patch due to Peter Jeremy, that applies to >> the cephes code. ... >This is a horrible, horrible, horrible idea. Have you >looked at the cephes code, particularly the complex.h >functions? The cephes code is somewhat a mess layout-wise. Algorithmically, it seems somewhat variable - some functions are implemented (hopefully correctly) using semi-numerical techniques, whereas others just use mathematical identities which will result in precision loss - though most of the functions include accuracy information. I agree it would be far preferable to have a properly validated C99 libm with all functions having maximum errors of no more than a few LSB over their complete domain, as well as correct support for signed zeroes, infinities and signalling and non-signalling NaNs but that is a non-trivial undertaking. In the interim, how should FreeBSD handle apps that want a C99 libm? 1) Fail to build them 2) Provide possibly imperfect fallbacks for the unimplemented bits. If someone (I don't have the expertise) wants to identify the cephes functions that are sub-standard, we can include link-time warnings (as done for e.g. gets(3)) when they are used. -- Peter Jeremy pgpcG5SKNkFm9.pgp Description: PGP signature
Re: Use of C99 extra long double math functions after r236148
On 2012-May-28 11:01:24 -0500, Stephen Montgomery-Smith wrote: >One thing that could be done is to have a "math/cephes" port that adds >the extra C99 math functions. This is already done in the math/sage >port, using a rather clever patch due to Peter Jeremy, that applies to >the cephes code. > >What it would do is to create a /usr/local/lib/libm.so that would >provide the extra functions not currently included in /lib/libm.so, and >then link in /lib/libm.so as well. It would also create its own >/usr/local/include/math.h and /usr/local/include/complex.h as well. Basically, as long as the compiler searches /usr/local/{include,lib} before the base include/lib then <math.h>, <complex.h> and -lm give the application a complete C99 math implementation by using base functions where they exist and cephes functions where they don't. The patch I wrote for sage can be found at http://trac.sagemath.org/sage_trac/ticket/9543 If there's any interest, I could produce a port for this. Another option would be to import cephes into base and use it to provide the missing C99 functions. Cephes includes copyright notices but the closest I can find to a license is: " Some software in this archive may be from the book _Methods and Programs for Mathematical Functions_ (Prentice-Hall or Simon & Schuster International, 1989) or from the Cephes Mathematical Library, a commercial product. In either event, it is copyrighted by the author. What you see here may be used freely but it comes with no support or guarantee." -- Peter Jeremy pgpYmCz2gMd3i.pgp Description: PGP signature
Re: OptionalObsoleteFiles.inc completeness
On 2012-May-28 23:55:42 +0400, Dmitry Marakasov wrote: >* Peter Jeremy (pe...@rulingia.com) wrote: > >> >2) Is this ok to backport the list from current to stable branches? Pro >> >- it's really simple, con - it will contain files never installed with >> >this (old) branch. >> >> Another con: "make delete-old" on -current takes about 2 orders of >> magnitude longer to run than on 8.x. I would prefer to see some >> effort put into speeding it up before it was backported. > >Is that really a reason while it is still under 4 seconds and is not >usually run more often than updates (which take minutes if not hours)? My experience is that it now takes about 2½ minutes on 10.x with warm caches, compared to less than 1 second on 8.x. For most of that time, there's no output and there's no warning of the increased time. I actually wrote about the poor performance here a couple of weeks ago. -- Peter Jeremy pgpj1hAqZ4ktC.pgp Description: PGP signature
Re: OptionalObsoleteFiles.inc completeness
On 2012-May-27 18:05:41 +0400, Dmitry Marakasov wrote: >2) Is this ok to backport the list from current to stable branches? Pro >- it's really simple, con - it will contain files never installed with >this (old) branch. Another con: "make delete-old" on -current takes about 2 orders of magnitude longer to run than on 8.x. I would prefer to see some effort put into speeding it up before it was backported. -- Peter Jeremy pgptJtyQZ4Lv8.pgp Description: PGP signature
Re: UFS+J panics on HEAD
On 2012-May-24 12:04:21 +0400, Lev Serebryakov wrote: > I afraid, that after real hardware failure (like real HDD death, >not these pseudo-broken-hardware situations, when HDDs is perfectly >alive and in good condition), all data will be lost. I could restore >data from remains of FFS by hands (format is straightforward and >well-known), but ZFS is different story... If your disk dies then you need a redundant copy of your data - either via backups or via RAID. Normally, you'd run ZFS with some level of redundancy so that disk failures did not result in data loss. That said, ZFS is touchier about data - if it can't verify the checksums in your data, it will refuse to give it to you - whereas UFS will hand you back a pile of bytes that may or may not be the same as what you gave it to store. And you can't necessarily get _any_ data off a failed disk. > Yes, backups is solution, but I don't have money to buy (reliable) >hardware to backup 4Tb of data :( 4TB disks are available but not really economical at present. 2TB disks still seem to be the happy medium. If your data will compress down to 2TB then save it to a disk, otherwise split your backups across a pair of disks. A 2TB disk with enclosure is < I attended "Solaris internals" 5-days training four years ago (when I >worked for Sun Microsystems), and instructor says same words... I have had lots of problems at $work with Solaris UFS quietly corrupting data following crashes. At least with ZFS, you have a better chance of knowing when your data has been corrupted. -- Peter Jeremy pgpk4t2qrNnV7.pgp Description: PGP signature
Re: "make delete-old" performance.
On 2012-May-16 18:11:32 -0700, Devin Teske wrote: >Right now, I believe the most useful comparison between systems is >(assuming UFS is in play) the output of "tunefs -p" for the >filesystem that the slowness is appearing on. These systems all run ZFS and apart from the first run, there doesn't seem to be any disk activity at all. It looks like the kernel is the bottleneck. >SoftUpdates (and whether it's enabled or disabled) can play a huge >difference in how fast file-deletions are. I've already successfully run "make delete-old" so there are no actual file deletions. This is all just looking for files that aren't present. -- Peter Jeremy pgpI6smYwen8A.pgp Description: PGP signature