Re: ENOTCAPABLE returned without Capsicum
On 2021-May-16 11:48:24 +1000, Peter Jeremy via freebsd-stable wrote:
>I am running 13-stable from a couple of weeks ago, without Capsicum
>(neither CAPABILITY_MODE nor CAPABILITIES are specified in my kernel).
>Despite this, I am getting Capsicum-related errors. As an example:
>openat(AT_FDCWD, "/")
>will return ENOTCAPABLE.

Please ignore. I worked out I was misreading how O_RESOLVE_BENEATH worked.

--
Peter Jeremy
ENOTCAPABLE returned without Capsicum
I am running 13-stable from a couple of weeks ago, without Capsicum (neither CAPABILITY_MODE nor CAPABILITIES are specified in my kernel). Despite this, I am getting Capsicum-related errors. As an example:

    openat(AT_FDCWD, "/")

will return ENOTCAPABLE.

Rummaging around the sources, it seems that there's a non-trivial amount of code in kern/vfs_lookup.c that can return capability-related errors but isn't protected by CAPABILITY_MODE. This seems undesirable since it means that FreeBSD is defaulting to being locked down, but unless I build it with Capsicum, there's no way to change the process's capabilities.

--
Peter Jeremy
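(As the follow-up above notes, this turned out to be O_RESOLVE_BENEATH semantics rather than Capsicum. For anyone hitting the same surprise: O_RESOLVE_BENEATH requires the resolved path to stay beneath the directory passed to openat(2), and an absolute path never does, so the lookup fails with ENOTCAPABLE even on a kernel built without any Capsicum options. A minimal FreeBSD-only sketch:)

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	/*
	 * An absolute path can never resolve "beneath" the starting
	 * directory, so on FreeBSD this fails with ENOTCAPABLE even
	 * when the kernel has no Capsicum options compiled in.
	 */
	int fd = openat(AT_FDCWD, "/", O_RDONLY | O_RESOLVE_BENEATH);

	if (fd == -1)
		printf("openat: %s (errno %d)\n", strerror(errno), errno);
	else
		printf("openat unexpectedly succeeded\n");
	return (0);
}
```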
fileargs_init(3) doesn't work without CAPABILITIES (was: Re: tail(1) broken in 13-stable)
On 2021-May-06 19:07:23 -0400, monochrome wrote:
...
>On 5/6/21 7:49 AM, Peter Jeremy via freebsd-stable wrote:
...
>> server% tail /COPYRIGHT <&-
>> Assertion failed: (procfd > STDERR_FILENO), function service_clean, file
>> /usr/src/lib/libcasper/libcasper/service.c, line 394.
>> tail: unable to init casper: Socket is not connected
>I get a different error on a 13.0-RELEASE machine I converted from 12 to
>current about a year ago (bash and sh):
>
>$ tail /COPYRIGHT <&-
>tail: can't limit stdio rights: Bad file descriptor

I've done some more testing across a number of systems and narrowed the difference in behaviour down to the presence of the CAPABILITIES option in the kernel (it looks like I never added it to my kernel config on that system):

If CAPABILITIES is present then the cap_rights_limit(2) call for the closed FD fails, generating the "can't limit stdio rights" error. (Whether this behaviour is reasonable is a different issue - it was introduced in r348708, based on https://reviews.freebsd.org/D20393, and the issue of closed file descriptors doesn't seem to have been considered.)

If CAPABILITIES is not present then the cap_rights_limit() failure is (correctly) ignored but the subsequent fileargs_init(3) call gets upset at opening a FD <= 2. This behaviour seems wrong - if CAPABILITIES isn't present in the kernel then the userland behaviour should be the same as if WITHOUT_CASPER is specified. IMO, this is a bug in fileargs_init(3).

--
Peter Jeremy
Re: tail(1) broken in 13-stable
On 2021-May-06 12:59:54 +0200, Mariusz Zaborski wrote:
>Could you provide details how to reproduce this?
>
>On Thu, 6 May 2021 at 12:13, Peter Jeremy via freebsd-stable wrote:
>>
>> Since updating from 12-stable to 13-stable, I've found that tail(1)
>> crashes, reporting:
>> Assertion failed: (procfd > STDERR_FILENO), function service_clean, file
>> /usr/src/lib/libcasper/libcasper/service.c, line 394.
>> tail: unable to init casper: Socket is not connected
>> unless all three of stdin, stdout and stderr are open. Whilst it
>> probably doesn't make sense to call tail without stdout open, there's
>> no obvious reason to require that stdin or stderr must be open.

server% tail /COPYRIGHT <&-
Assertion failed: (procfd > STDERR_FILENO), function service_clean, file /usr/src/lib/libcasper/libcasper/service.c, line 394.
tail: unable to init casper: Socket is not connected

--
Peter Jeremy
tail(1) broken in 13-stable
Since updating from 12-stable to 13-stable, I've found that tail(1) crashes, reporting:

    Assertion failed: (procfd > STDERR_FILENO), function service_clean, file /usr/src/lib/libcasper/libcasper/service.c, line 394.
    tail: unable to init casper: Socket is not connected

unless all three of stdin, stdout and stderr are open. Whilst it probably doesn't make sense to call tail without stdout open, there's no obvious reason to require that stdin or stderr must be open.

--
Peter Jeremy
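The failure is easy to trigger from any shell that supports closing a descriptor with the `<&-` redirection. A sketch (the file name is arbitrary); on an affected 13-stable system the tail invocation aborts inside libcasper, while on a fixed system it simply prints the last line:

```shell
# Create a small test file, then run tail(1) with stdin closed.
printf 'first\nlast\n' > /tmp/tail-demo.txt
tail -n 1 /tmp/tail-demo.txt <&-
```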
Re: geli broken in 13.0-BETA4 and later on armv8
On 2021-Mar-06 10:39:02 -0800, Oleksandr Tymoshenko wrote:
>Peter Jeremy via freebsd-current (freebsd-curr...@freebsd.org) wrote:
>> [Adding arm@ and making it clearer that this is armv8-only]
>>
>> On 2021-Mar-06 20:26:19 +1100, Peter Jeremy wrote:
>> >On 2021-Mar-06 19:18:37 +1100, Peter Jeremy via freebsd-stable wrote:
>> >>Somewhere between 13.0-ALPHA2 (c256201-g02611ef8ee9) and 13.0-BETA4
>> >>(releng/13.0-n244592-e32bc253629), geli (at least on my RockPro64 -
>> >>RK3399, arm64) has changed so that a geli-encrypted partition (using
>> >>AES-XTS 128) that was readable on 13.0-ALPHA2 becomes garbage on
>> >>13.0-BETA4.
>> >
>> >I've confirmed that the problem is f76393a6305b - reverting that
>> >commit fixes the problem in releng/13.0.
>> >
>> >I've further verified that the bug is still present in main (14.x)
>> >at 028616d0dd69.
>
>Could you test this patch and let me know if it fixes the issue?
>
>https://people.freebsd.org/~gonzo/patches/armv8crypto-xts-fix.diff

Yes, it does. Thank you very much.

--
Peter Jeremy
Re: geli broken in 13.0-BETA4 and later on armv8
[Adding arm@ and making it clearer that this is armv8-only]

On 2021-Mar-06 20:26:19 +1100, Peter Jeremy wrote:
>On 2021-Mar-06 19:18:37 +1100, Peter Jeremy via freebsd-stable wrote:
>>Somewhere between 13.0-ALPHA2 (c256201-g02611ef8ee9) and 13.0-BETA4
>>(releng/13.0-n244592-e32bc253629), geli (at least on my RockPro64 -
>>RK3399, arm64) has changed so that a geli-encrypted partition (using
>>AES-XTS 128) that was readable on 13.0-ALPHA2 becomes garbage on
>>13.0-BETA4.
>
>I've confirmed that the problem is f76393a6305b - reverting that
>commit fixes the problem in releng/13.0.
>
>I've further verified that the bug is still present in main (14.x)
>at 028616d0dd69.

--
Peter Jeremy
Re: geli broken in 13.0-BETA4 and later
On 2021-Mar-06 19:18:37 +1100, Peter Jeremy via freebsd-stable wrote:
>Somewhere between 13.0-ALPHA2 (c256201-g02611ef8ee9) and 13.0-BETA4
>(releng/13.0-n244592-e32bc253629), geli (at least on my RockPro64 -
>RK3399, arm64) has changed so that a geli-encrypted partition (using
>AES-XTS 128) that was readable on 13.0-ALPHA2 becomes garbage on
>13.0-BETA4.

I've confirmed that the problem is f76393a6305b - reverting that commit fixes the problem in releng/13.0.

I've further verified that the bug is still present in main (14.x) at 028616d0dd69.

--
Peter Jeremy
geli broken in 13.0-BETA4 and later
Somewhere between 13.0-ALPHA2 (c256201-g02611ef8ee9) and 13.0-BETA4 (releng/13.0-n244592-e32bc253629), geli (at least on my RockPro64 - RK3399, arm64) has changed so that a geli-encrypted partition (using AES-XTS 128) that was readable on 13.0-ALPHA2 becomes garbage on 13.0-BETA4.

I've verified that the garbage seems consistent between reboots and isn't impacted by enabling the big cores in 7ba4d0f82955. There's nothing useful reported via geli debugging. I've tried updating to releng/13.0 60e8939aa85b and it's still broken.

My suspicion is f76393a6305b - whilst that commit just talks about AES-GCM, it does a reasonable job of roto-tilling the entire armv8crypto stack. I notice that there are fixes to f76393a6305b that don't seem to have made it into releng/13.0 and I will continue to investigate.

--
Peter Jeremy
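For anyone wanting to check a kernel for this class of regression without risking real data, a throwaway geli volume on a file-backed memory disk makes a convenient canary. This is only a sketch - the passphrase, unit number and sizes are arbitrary, and geli(8)'s convention of reading the passphrase from stdin when the passfile argument is "-" is assumed:

```shell
# Create a file-backed md(4) device and set up a keyed geli volume.
truncate -s 64m /var/tmp/geli-canary.img
mdconfig -a -t vnode -f /var/tmp/geli-canary.img -u 9
echo canary | geli init -J - -e AES-XTS -l 128 /dev/md9
echo canary | geli attach -j - /dev/md9

# Fill the plaintext device and record its checksum.  After updating
# the kernel, re-attach and re-run the sha256: a mismatch means the
# decrypted data has changed, i.e. the cipher code has regressed.
dd if=/dev/zero of=/dev/md9.eli bs=64k
sha256 /dev/md9.eli
```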
Re: lots of "no such file or directory" errors in zfs filesystem
On 2021-Feb-23 11:30:58 -0600, Chris Anderson wrote:
>nope, it led a pretty boring life. that zfs filesystem was created on that
>server and has been on the same two mirrored disks for its lifetime.

Does the server have ECC RAM? Possibly it's a bitflip somewhere before the data got to disk.

>prior to the upgrades) the server does have a relatively modest amount of
>ram (2GB). dunno if that makes it more likely that these kinds of issues
>get triggered.

Low amounts of RAM are going to increase the IO load but shouldn't otherwise impact the filesystem consistency. I have a FreeBSD test system that's been running ZFS in <1GB RAM and rebuilding itself daily for multiple years, and I haven't run into any ZFS corruption issues.

--
Peter Jeremy
Re: svn commit: r362848 - in stable/12/sys: net netinet sys
TL;DR: Ensure you explicitly destroy all ZFS labels on disused root pools.

On 2020-Jul-19 21:21:02 +1000, Peter Jeremy wrote:
>I'm sending this to -stable, rather than the src groups because I
>don't believe the problem is the commit itself, rather the commit
>has uncovered a latent problem elsewhere.
>
>On 2020-Jul-01 18:03:38 +, Michael Tuexen wrote:
>>Author: tuexen
>>Date: Wed Jul 1 18:03:38 2020
>>New Revision: 362848
>>URL: https://svnweb.freebsd.org/changeset/base/362848
>>
>>Log:
>> MFC r353480: Use event handler in SCTP
>
>I have no idea how, but this update breaks booting amd64 for me (r362847
>works and this doesn't). I have a custom kernel with ZFS but no SCTP so I
>have no real idea how this could break booting - presumably the
>eventhandler change has uncovered a bug somewhere else.

To close the loop on this, the problem was a combination of:
* changes in GEOM provider ordering;
* insufficient checks when ZFS is looking for the root pool;
* my system having remnants of a disused pool with the same name as the root pool.

It seems that the order of GEOM providers is relatively unstable - even compiling in a device that doesn't physically exist can change the provider order. Presumably r362848 also resulted in a change in order.

During a root-on-ZFS boot, the kernel scans all providers, looking for ZFS labels with a pool name matching the root pool. Only minimal checks are performed - in particular, there's no check that it's a valid pool - and the first such label found is assumed to describe the root pool.

In my case, some time ago, I'd moved things around on my boot disk. My old root pool went to the end of the physical disk, but I'd decided to shrink it and left some free space at the end of the disk. This meant that ZFS found one (out of 4) labels when it tasted the physical disk and, if GEOM sorted the physical disk prior to its partitions, ZFS would use the pool GUIDs from the stray label on the physical disk and then fail to find a usable pool matching those GUIDs.

My fix was to zero the end of my disk.

--
Peter Jeremy
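ZFS keeps four label copies per vdev - two at the front and two in the last 512KiB - which is why shrinking a pool can leave the trailing pair behind on the raw disk. A hedged sketch of cleaning up a disused provider (the device name is an example, and labelclear must be forced if the stray label still looks plausible):

```shell
# Show any ZFS label still present on the raw disk.
zdb -l /dev/ada0

# Remove all ZFS labels from the (disused!) provider.
zpool labelclear -f /dev/ada0

# Roughly equivalent low-level fix: zero the last 512KiB, where the
# two trailing label copies live (diskinfo field 3 is size in bytes).
size=$(diskinfo /dev/ada0 | awk '{print $3}')
dd if=/dev/zero of=/dev/ada0 bs=512k count=1 oseek=$(( size / 524288 - 1 ))
```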
Re: svn commit: r362848 - in stable/12/sys: net netinet sys
On 2020-Jul-21 00:47:23 +0300, Konstantin Belousov wrote:
>On Tue, Jul 21, 2020 at 07:20:44AM +1000, Peter Jeremy wrote:
>> On 2020-Jul-19 14:48:28 +0300, Konstantin Belousov wrote:
>> >On Sun, Jul 19, 2020 at 09:21:02PM +1000, Peter Jeremy wrote:
>> >> The symptoms are that I get:
>> >> Mounting from zfs:zroot/ROOT/r363310 failed with error 6; retrying for 3 more seconds
>> >> Mounting from zfs:zroot/ROOT/r363310 failed with error 6
>> >>
>> >> (r363310 is where I was trying to update to and I didn't change the BE
>> >> name as I was searching for the problem and error 6 is ENXIO).
>> >>
>> >> I tried to reproduce the problem with GENERIC but it hangs after
>> >> displaying the EFI framebuffer information (I've seen that before and
>> >> suspect it is a loader problem but haven't dug into it).
>>
>> I've confirmed that particular problem is bug 209821. I've disabled
>> EFI and GENERIC r362848 boots and runs successfully.
>Did you mis-typed the PR number ? The referenced bug talks about very
>early hang, while your report said that kernel boots up to the point of
>mounting root.

My failure was with a custom kernel. Once I narrowed the problem to a commit that seemed unrelated to my problem, I tried to boot a GENERIC kernel at r362848. The GENERIC kernel boot failed much earlier due to the EFI problem documented in PR 209821. When I disabled EFI, the GENERIC kernel worked, showing that my problem was due to my custom kernel.

>> Since GENERIC worked, I did some more experimenting and tracked the
>> problem down to a lack of "options ACPI_DMAR" in my kernel config.
>> That makes more sense, though I have no idea why it suddenly became
>> mandatory for my system.
>No, this does not make too much sense either, since DMAR is disabled
>by default. Did you enabled it ?

"options ACPI_DMAR" has been in GENERIC since you first submitted the DMAR code in r257251. I haven't ever set the hw.dmar.enable=1 loader tunable, but it's not at all obvious that a kernel built without "options ACPI_DMAR" is functionally equivalent to a kernel that has DMAR compiled in but disabled - there's a lot of IOMMU manipulation code that is purely conditional on ACPI_DMAR. That said, I'm not using virtualisation and haven't actually enabled DMAR in the loader, so I suspect that I've only masked the real issue. I currently have INVARIANTS and WITNESS but will look into some of the more extensive debugging options.

(It looks like I missed the addition of "options ACPI_DMAR" when I was updating my custom kernel config with the differences between r250963 and r259512 about 8 years ago, and it hasn't caused any obvious problems until now. Obviously, I need to do a more careful review of my custom kernel config against GENERIC/NOTES.)

>BTW, you are using stable, right ? There were some code reorganization
>commits in HEAD moving DMAR code around, but they were not merged to
>stable.

I'm using 12-STABLE.

--
Peter Jeremy
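Reviewing a custom config against GENERIC is easy to script. A sketch, assuming an amd64 config named MYKERNEL in the stock conf directory; it lists options present in GENERIC but absent from the custom config (include directives and NOTES-only options still need eyeballing):

```shell
cd /usr/src/sys/amd64/conf

# Options GENERIC has that the custom config lacks.
grep '^options' GENERIC  | sort > /tmp/generic.opts
grep '^options' MYKERNEL | sort > /tmp/custom.opts
comm -23 /tmp/generic.opts /tmp/custom.opts
```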
Re: svn commit: r362848 - in stable/12/sys: net netinet sys
On 2020-Jul-19 14:48:28 +0300, Konstantin Belousov wrote:
>On Sun, Jul 19, 2020 at 09:21:02PM +1000, Peter Jeremy wrote:
>> I'm sending this to -stable, rather than the src groups because I
>> don't believe the problem is the commit itself, rather the commit
>> has uncovered a latent problem elsewhere.
>>
>> On 2020-Jul-01 18:03:38 +, Michael Tuexen wrote:
>> >Author: tuexen
>> >Date: Wed Jul 1 18:03:38 2020
>> >New Revision: 362848
>> >URL: https://svnweb.freebsd.org/changeset/base/362848
>> >
>> >Log:
>> > MFC r353480: Use event handler in SCTP
>>
>> I have no idea how, but this update breaks booting amd64 for me (r362847
>> works and this doesn't). I have a custom kernel with ZFS but no SCTP so I
>> have no real idea how this could break booting - presumably the
>> eventhandler change has uncovered a bug somewhere else.
>>
>> The symptoms are that I get:
>> Mounting from zfs:zroot/ROOT/r363310 failed with error 6; retrying for 3 more seconds
>> Mounting from zfs:zroot/ROOT/r363310 failed with error 6
>>
>> (r363310 is where I was trying to update to and I didn't change the BE
>> name as I was searching for the problem and error 6 is ENXIO).
>>
>> I tried to reproduce the problem with GENERIC but it hangs after
>> displaying the EFI framebuffer information (I've seen that before and
>> suspect it is a loader problem but haven't dug into it).

I've confirmed that particular problem is bug 209821. I've disabled EFI and GENERIC r362848 boots and runs successfully.

>> Does anyone have any ideas?
>
>Did you checked that the physical devices where your ZFS pool is located,
>are detected, and that kernel messages for their drivers are as usual ?
>Overall, is there anything strange in the verbose dmesg ?

There's nothing obviously strange (in particular, I can see the physical boot/root disk) but the faulty kernel appears to have moved the msgbuf somewhere unexpected, so it's not saved across reboots and I'm limited to eyeballing the messages via DDB.

Since GENERIC worked, I did some more experimenting and tracked the problem down to a lack of "options ACPI_DMAR" in my kernel config. That makes more sense, though I have no idea why it suddenly became mandatory for my system.

--
Peter Jeremy
Re: svn commit: r362848 - in stable/12/sys: net netinet sys
I'm sending this to -stable, rather than the src groups, because I don't believe the problem is the commit itself; rather, the commit has uncovered a latent problem elsewhere.

On 2020-Jul-01 18:03:38 +, Michael Tuexen wrote:
>Author: tuexen
>Date: Wed Jul 1 18:03:38 2020
>New Revision: 362848
>URL: https://svnweb.freebsd.org/changeset/base/362848
>
>Log:
> MFC r353480: Use event handler in SCTP

I have no idea how, but this update breaks booting amd64 for me (r362847 works and this doesn't). I have a custom kernel with ZFS but no SCTP so I have no real idea how this could break booting - presumably the eventhandler change has uncovered a bug somewhere else.

The symptoms are that I get:

    Mounting from zfs:zroot/ROOT/r363310 failed with error 6; retrying for 3 more seconds
    Mounting from zfs:zroot/ROOT/r363310 failed with error 6

(r363310 is where I was trying to update to and I didn't change the BE name as I was searching for the problem; error 6 is ENXIO.)

I tried to reproduce the problem with GENERIC but it hangs after displaying the EFI framebuffer information (I've seen that before and suspect it is a loader problem but haven't dug into it).

Does anyone have any ideas?

--
Peter Jeremy
Re: swap space issues
On 2020-Jun-28 12:33:21 -0700, Donald Wilde wrote:
>On 6/28/20, Donald Wilde wrote:
>> On 6/27/20, Donald Wilde wrote:
>>> 'spinning rust' for a disk. My loader.conf has
>>> kern.maxswzone=420 and ccache is fully active and working for both
>>> root on tcsh and users on sh.

Based on my calculations, that maxswzone is good for just under 1GB swap. What do you see for vm.swap_maxpages and vm.swzone?

>> Synth is still crashing hard, same issue.
>An update. Synth still crashed with one swap zone of 16GB.

What do you mean by "swap zone"? Do you mean you have one 16GB swap device?

>stack overflow. As I say, there was no warning. Everything was fine,
>then memory usage went through the roof!

I've just tried building llvm80 via ports[1] on my laptop, using the same options as you. I have 4GB RAM and 4GB swap with system defaults and had no problems with an 8-way build. The highest swap usage I noticed was <500MB. I suspect your problems are related to either ccache or synth.

>The second one, hopefully, contains every log up to the one that
>crashed and hopefully also the beginning of that task. As I say, ONE
>builder and ONE task, after a reboot. LLVM80 was the only builder
>input.

"one builder and one task" - these are presumably synth terms, since they aren't standard ports-building terms. You should be able to do a single-threaded build of llvm80 in 4GB RAM without problems. That said, I notice that the first log file suggests you were building 3 ports in parallel, and each port build was running 3 jobs - that's 9 jobs in parallel on a low-spec CPU with 4 threads. You should limit the number of CPU-bound processes to the number of CPU threads you have.

[1] cd /usr/ports/devel/llvm80 && make

--
Peter Jeremy
Re: swap space issues
On 2020-Jun-25 11:30:31 -0700, Donald Wilde wrote:
>Here's 'pstat -s' on the i3 (which registers as cpu HAMMER):
>
>Device          1K-blocks     Used    Avail Capacity
>/dev/ada0s1b     33554432        0 33554432     0%
>/dev/ada0s1d     33554432        0 33554432     0%
>Total            67108864        0 67108864     0%

I strongly suggest you don't have more than one swap device on spinning rust - the VM system will stripe I/O across the available devices and that will give particularly poor results when it has to seek between the partitions.

Also, you can't actually use 64GB swap with 4GB RAM. If you look back through your boot messages, I expect you'll find messages like:

    warning: total configured swap (524288 pages) exceeds maximum recommended amount (498848 pages).
    warning: increase kern.maxswzone or reduce amount of swap.

or maybe:

    WARNING: reducing swap size to maximum of MB per unit

The absolute limit on swap space is vm.swap_maxpages pages but the realistic limit is about half that. By default the realistic limit is about 4×RAM (on 64-bit architectures), but this can be adjusted via kern.maxswzone (which defines the #bytes of RAM to allocate to swzone structures - the actual space allocated is vm.swzone).

As a further piece of arcana, vm.pageout_oom_seq is a count that controls the number of passes before the pageout daemon gives up and starts killing processes when it can't free up enough RAM. "out of swap space" messages generally mean that this number is too low, rather than there being a shortage of swap - particularly if your swap device is rather slow.

--
Peter Jeremy
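To put the page counts in that sample warning into bytes (assuming the usual 4KiB page size), shell arithmetic is enough:

```shell
# 524288 pages x 4096 bytes/page, expressed in MiB:
echo "configured swap:  $(( 524288 * 4096 / 1024 / 1024 )) MiB"   # 2048
echo "recommended max:  $(( 498848 * 4096 / 1024 / 1024 )) MiB"   # 1948
```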
Re: New Xorg - different key-codes
On 2020-Mar-11 10:29:08 +0100, Niclas Zeising wrote:
>This has to do with switching to using evdev to handle input devices on
>FreeBSD 12 and CURRENT. There's been several reports, and suggested
>solutions to this, as well as an UPDATING entry detailing the change.

The UPDATING entry says that it's switched from devd to udev. There's no mention of evdev or that the keycodes have been roto-tilled. It's basically a vanilla "things have been changed, see the documentation" entry. Given that entry, it's hardly surprising that people are confused.

--
Peter Jeremy
Re: ntp problems stratum 2 to 14?
Hi Dewayne,

Sorry for the delay. Unfortunately, I can't really suggest anything - it's not clear to me why ntpd would prefer a stratum 14 clock over a stratum 2 clock. Have you tried looking through the debugging hints page (https://www.eecis.udel.edu/~mills/ntp/html/debug.html)?

I haven't seen that problem, but I don't use the local clock. During startup, it would not seem unreasonable for the local clock to become valid first because it will have a lower jitter. But ntpd should switch to the stratum 2 clock and stay with it as the better time source. One problem is that if ntpd decides to switch away from that clock for any reason (eg a burst of jitter), it may get stuck on the local clock as it drifts further from "real" time.

--
Peter Jeremy
Re: ntp problems stratum 2 to 14?
On 2020-Feb-26 16:37:43 +1100, Dewayne Geraghty wrote:
>I usually run ntpd with both aslr and as user ntpd. While testing I
>noticed that my server with a direct network cable to my main time keeper,
>jumped from the expected stratum 2 to 14 as follows (I record the date so I
>can synch with the debug log, also below):
>
>vm.loadavg={ 0.09 0.10 0.18 }
>
>Wed 26 Feb 2020 15:16:38 AEDT
>     remote        refid          st t when poll reach   delay   offset   jitter
>==============================================================================
> 10.0.7.6      203.35.83.242      2 u   44   64  377    0.147  -227.12   33.560
>*127.127.1.1   .LOCL.            14 l   59  128  377    0.000    0.000    0.000
>26 Feb 15:03:40 ntpd[8772]: LOCAL(1) 901a 8a sys_peer <== bad

Why is this bad? You've specified that this is a valid clock source, so ntpd is free to use it if it decides it is the best source of time.

>server 127.127.1.1 minpoll 7 maxpoll 7
>fudge 127.127.1.1 stratum 14

Synchronizing to the local clock (ie using 127.127.1.x as a reference) is almost never correct. What external (to NTP) source is being used to synchronize the local clock?

>I'm also very surprised that the jitter on the server (under testing) is so
>poor. The internet facing time server is
>*x.y.z.t       .ATOM.             1 u   73  512    7   23.776   34.905   95.961
>but its very old and not running aslr.

The 23ms distance to the peer suggests that this is over the Internet. What sort of link do you have to the Internet and how heavily loaded is it? The NTP protocol includes the assumption that the client-server path delay is symmetric - this is often untrue for SOHO connections. And SOHO connections will often wind up saturated in one direction - which skews the apparent timestamps and shows up as high jitter values.

> /usr/local/sbin/ntpd -c /etc/ntp.conf -g -g -u ntpd --nofork
...
>I get similar results with /usr/sbin/ntpd, I've been testing both and
>happened to record details for the port ntpd.

It's probably not relevant, but it would be useful for you to say up front which ntpd you are having problems with and which version of the port you have installed.
--
Peter Jeremy
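If the goal of the 127.127.1.1 entry is simply "keep serving time when the upstream is unreachable", ntpd's orphan mode is a cleaner alternative to fudging the local clock. A hypothetical ntp.conf fragment (the stratum value is an arbitrary choice, well below any real upstream):

```
server 10.0.7.6 iburst

# Act as a last-resort stratum-12 source only when no real server
# is reachable, instead of synchronizing to a fudged local refclock.
tos orphan 12
```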
Re: linker.hints not being update for ARMs
On 2019-Nov-12 10:30:21 +0200, Daniel Braniss wrote:
> warning: KLD '/boot/kernel/wlan.ko' is newer than the linker.hints file
> warning: KLD '/boot/kernel/rtwn.ko' is newer than the linker.hints file
...
>the link.hints is indeed very old:
>neo-000# ls -ls /boot/kernel/linker.hints
>224 -rw-r--r-- 1 root wheel 228972 Jan 1 2010 /boot/kernel/linker.hints

Well, that's a nonsense timestamp because FreeBSD didn't support AllWinner in 2010. My guess is that your system clock was wrong.

>how can this be fixed?

Try rerunning kldxref (with the clock set correctly).

--
Peter Jeremy
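Concretely, something along these lines (the NTP server is a placeholder; any means of setting the clock correctly will do):

```shell
# Set the clock first, otherwise linker.hints gets another bogus
# timestamp; then rebuild the hints for the stock module directory.
ntpdate -b pool.ntp.org
kldxref /boot/kernel
ls -l /boot/kernel/linker.hints   # verify the new timestamp
```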
Re: `uname -a' can't display revision
On 2019-Aug-20 14:36:14 +0200, Trond Endrestøl wrote:
>Maybe NFS is to blame, particularly if file locks cannot be obtained.

Yes, it is. SVN tries to obtain locks, even for read-only commands like "svn info". My solution is to mount /usr/src with the option "nolockd".

--
Peter Jeremy
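For reference, the corresponding /etc/fstab entry might look like this (the server name is a placeholder; "nolockd" keeps fcntl(2) locks local instead of forwarding them to the NFS lock daemon):

```
server:/usr/src  /usr/src  nfs  rw,nolockd  0  0
```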
Re: Issues starting unbound on boot
On 2019-Apr-30 19:44:36 +, Markus Wipp wrote:
>I currently face an issue, where I don’t know further on why this happens and
>what I could do about it.
>I hope that this is the correct list to ask my question. If not please let me
>know where else I might try my luck.
>I installed unbound from ports, configured it and can start / stop it from
>command line with service unbound start without any problems.
>But whenever I reboot the machine it just doesn’t get started. The only
>information I was able to find out so far can be found in /var/log/messages:
>root: /etc/rc: WARNING: failed to start unbound

I have seen unbound fail to start for a variety of reasons, but in all cases it has written a useful hint to the console. Can you confirm that it's not writing anything to your console? Are you able to share your configuration?

--
Peter Jeremy
Re: about zfs and ashift and changing ashift on existing zpool
On 2019-Apr-07 16:36:40 +0100, tech-lists wrote:
>  storage         ONLINE       0     0     0
>    raidz1-0      ONLINE       0     0     0
>      replacing-0 ONLINE       0     0 1.65K
>        ada2      ONLINE       0     0     0
>        ada1      ONLINE       0     0     0  block size: 512B configured, 4096B native
>      ada3        ONLINE       0     0     0
>      ada4        ONLINE       0     0     0
>
>What I'd like to know is:
>
>1. is the above situation harmful to data

In general, no. The only danger is that ZFS is updating the uberblock replicas at the start and end of the volume assuming 512B sectors, which means you are at a higher risk of losing one of the replica sets if a power failure occurs during an uberblock update.

>2. given that vfs.zfs.min_auto_ashift=12, why does it still say 512B
> configured for ada1 which is the new disk, or..

The pool is configured with ashift=9.

>3. does "configured" pertain to the pool, the disk, or both

"configured" relates to the pool - all vdevs match the pool.

>4. what would be involved in making them all 4096B

Rebuild the pool - backup/destroy/create/restore.

>5. does a 512B disk wear out faster than 4096B (all other things being
> equal)

It shouldn't. It does mean that the disk is doing read/modify/write at the physical sector level, but that should be masked by the drive cache.

>Given that the machine and disks were new in 2016, I can't understand why zfs
>didn't default to 4096B on installation

I can't answer that easily. The current version of ZFS looks at the native disk blocksize to determine the pool ashift, but I'm not sure how things were in 2016. Possibilities include:
* The pool was built explicitly with ashift=9
* The initial disks reported 512B native (I think this is most likely)
* That version of ZFS was using logical, rather than native, blocksize.

My guess (given that only ada1 is reporting a blocksize mismatch) is that your disks reported a 512B native blocksize. In the absence of any override, ZFS will then build an ashift=9 pool.

--
Peter Jeremy
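A sketch of checking this on a live system - zdb can show the pool's ashift directly, and the sysctl only affects pools or vdevs created after it is set:

```shell
# Confirm what the pool is actually using (ashift 9 = 512B, 12 = 4096B).
zdb -C storage | grep ashift

# Ensure future pool/vdev creation rounds up to 4096B sectors.
sysctl vfs.zfs.min_auto_ashift=12

# Converting the existing pool means recreating it:
#   backup -> zpool destroy storage -> zpool create ... -> restore
```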
Re: dmesg submission service -- please submit today
On 2018-Oct-07 23:41:43 +, Roger Leigh wrote:
>Out of interest, has FreeBSD considered implementing an equivalent of
>Debian's "popularity-contest" package, which periodically submits
>anonymised lists of installed packages? On FreeBSD this could be from
>the pkg database, and could also include hardware information.

There's ports/sysutils/bsdstats but I'm not sure how popular that is.

--
Peter Jeremy
Re: FCP-0101: Deprecating most 10/100 Ethernet drivers
On 2018-Oct-04 08:44:11 +, Alexey Dokuchaev wrote:
>Looking at the commits they require near zero maintenance. What exactly
>is the burden here?

As various others have stated, this isn't true. All the code in FreeBSD has an ongoing maintenance cost and is an impediment to adding new features. There is no point in spending valuable developer effort to update drivers and test them with unusual/obsolete hardware unless those drivers are going to actually be used.

>Another question: why the fuck FreeBSD likes to kill
>non-broken, low-volatile and perfectly working stuff?

That language is uncalled for.

>We offer probably
>the best NIC driver support on the block, yet you're proposing to shrink
>one of the few areas where we shine. WTF?!

Supporting NICs that no-one uses doesn't benefit anyone. No-one is talking about removing NICs that are in active use.

>ae(4) was used in Asus EeePC 701/900 which are still popular among hackers.

Those netbooks are more than a decade old now and I don't expect many are still functional. Will people still expect to use them with FreeBSD 13 in 5 years time?

>As it can be seen this list tends to cover nearly all 100 cards, yet no
>one (pardon me if I missed those) asks for 10. So how about making this
>proposal cover only 10 cards,

What is the purpose in keeping unused FastEthernet cards in the tree?

>if you can't resist the itch to remove
>something from the tree?

Again, that language is uncalled for.

--
Peter Jeremy
ZFS+find(1) wiring all RAM
I've noticed that 11-stable/amd64 has been wiring seemingly excessive amounts of RAM for some time (the problem goes back at least 6 months). This extends to getting ENOMEM errors from g_io_deliver() and out-of-swap errors killing processes on a low-memory system. I'm not sure when it started, but it seems to have gotten worse between r331535 and r334494.

I can see the "excessive wired memory" on my main home system with 32GB RAM but haven't seen it completely run out of RAM. After some gentle use and a nightly run, there is 10GB more wired RAM than ARC.

My "low memory" system is a Google GCE f1-micro instance[1] (600MB RAM) with about 723k inodes used and the following ZFS tuning:

vfs.zfs.arc_max="128M"
vfs.zfs.arc_meta_limit="50M"
vfs.zfs.arc_min="25M"

The following numbers were gathered by looking at top(1). Running r334494, after booting to multi-user, the system has about 187MB wired (94MB ARC). If I then run /etc/periodic/security/100.chksetuid, wired RAM increases to about 580MB, with 380MB ARC, dropping to 467MB and 217MB ARC when the script exits (this is still nearly twice arc_max). Free memory can drop to <10KB whilst the find(1) is running.

I have several issues with this behaviour:
0) ARC usage can significantly exceed arc_max. I understand that arc_max is a soft limit but IMO 3x is unreasonable - especially when the system is under extreme memory pressure.
1) Significant amounts of wired memory are in use but I can't find anything in "vmstat -mz" that would explain where it's going.

Does anyone have any suggestions for digging into this?

[1] I get the same behaviour using a VBox instance with similar dimensioning and the same tuning.

--
Peter Jeremy
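For anyone wanting to compare the same numbers on their own system, the relevant counters are available via sysctl. A sketch (sysctl names as on 11-stable; a 4KiB page size is assumed):

```shell
# Wired memory (in pages) and ARC size (in bytes).
wired_pages=$(sysctl -n vm.stats.vm.v_wire_count)
arc_bytes=$(sysctl -n kstat.zfs.misc.arcstats.size)

# Wired-but-not-ARC, in MB: this is the figure that keeps growing.
echo $(( (wired_pages * 4096 - arc_bytes) / 1048576 ))
```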
Re: Problems building 11-stable/i386 with readonly /usr/src
On 2018-Feb-18 09:06:38 -0600, Kyle Evans <kev...@freebsd.org> wrote: >On Sun, Feb 18, 2018 at 3:12 AM, Peter Jeremy <pe...@rulingia.com> wrote: >> Sometime between r329122 and r329157, my 11-stable i386 box stopped >> being able to buildworld with a readonly /usr/src. I've been updating >> regularly but the problem still remains at r329450. I don't have any >> problems building the same tree on amd64 or building head on i386 or >> amd64. Does anyone have any ideas? >> >> Starting from an empty /usr/obj, the failure is: >> ... > >This would have come in with the recent MFC of imp@'s rototilling. I >seem to recall some build system funkiness that put .OBJDIR inside the >src tree inconsistently before recent-ish changes in head. CC'ing >bdrewery@ and imp@ in hopes they have an idea of how to handle this in >stable/11. The offending ln invocation would be this one: >https://svnweb.freebsd.org/base/stable/11/stand/defs.mk?view=markup#l178 Thanks for that. I added some debug code to stand/defs.mk and confirmed that in stand/efi, the ${_ILINKS} target is invoked in /usr/src/stand/efi, whereas in (eg) stand/zfs, it is invoked in /usr/obj/usr/src/stand/zfs. The main difference is that SUBDIR is empty on i386 but non-empty on amd64. If I add i386 to the main build list (see patch below) then it all works. I'm not sure why efi isn't built on i386 because boot1, libefi and loader all support i386. (This obviously is a work-around rather than a real fix but might be an option if the relevant head changes can't be MFC'd immediately). [Caution: copy and paste, tabs have been converted to spaces] Index: stand/efi/Makefile =================================================================== --- stand/efi/Makefile (revision 329477) +++ stand/efi/Makefile (working copy) @@ -14,7 +14,8 @@ .if ${MACHINE_CPUARCH} == "aarch64" || \ ${MACHINE_CPUARCH} == "amd64" || \ -${MACHINE_CPUARCH} == "arm" +${MACHINE_CPUARCH} == "arm" || \ +${MACHINE_CPUARCH} == "i386" SUBDIR+= libefi loader boot1 .endif -- Peter Jeremy signature.asc Description: PGP signature
Problems building 11-stable/i386 with readonly /usr/src
Sometime between r329122 and r329157, my 11-stable i386 box stopped being able to buildworld with a readonly /usr/src. I've been updating regularly but the problem still remains at r329450. I don't have any problems building the same tree on amd64 or building head on i386 or amd64. Does anyone have any ideas? Starting from an empty /usr/obj, the failure is: ... >>> stage 4.3: building everything ... ===> stand/zfs (all) Building /usr/obj/usr/src/stand/zfs/machine machine -> /usr/src/sys/i386/include Building /usr/obj/usr/src/stand/zfs/x86 x86 -> /usr/src/sys/x86/include Building /usr/obj/usr/src/stand/zfs/zfs.o Building /usr/obj/usr/src/stand/zfs/skein.o Building /usr/obj/usr/src/stand/zfs/skein_block.o Building /usr/obj/usr/src/stand/zfs/libzfsboot.a building static zfsboot library ===> stand/efi (all) machine -> /usr/src/sys/i386/include ln: machine: Read-only file system *** Error code 1 Stop. make[4]: stopped in /usr/src/stand/efi .ERROR_TARGET='machine' .ERROR_META_FILE='' .MAKE.LEVEL='4' MAKEFILE='' .MAKE.MODE='meta missing-filemon=yes missing-meta=yes silent=yes verbose' _ERROR_CMD='.PHONY' .CURDIR='/usr/src/stand/efi' .MAKE='make' .OBJDIR='/usr/src/stand/efi' .TARGETS='all' DESTDIR='/usr/obj/usr/src/tmp' LD_LIBRARY_PATH='' MACHINE='i386' MACHINE_ARCH='i386' MAKEOBJDIRPREFIX='/usr/obj' MAKESYSPATH='/usr/src/share/mk' MAKE_VERSION='20170720' PATH='/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/bin:/usr/obj/usr/src/tmp/usr/sbin:/usr/obj/usr/src/tmp/usr/bin:/sbin:/bin:/usr/sbin:/usr/bin' SRCTOP='/usr/src' OBJTOP='/usr/src' .MAKE.MAKEFILES='/usr/src/share/mk/sys.mk /usr/src/share/mk/local.sys.env.mk /usr/src/share/mk/src.sys.env.mk /etc/src-env.conf /usr/src/share/mk/bsd.mkopt.mk /etc/make.conf /usr/src/share/mk/local.sys.mk /usr/src/share/mk/src.sys.mk Makefile /usr/src/share/mk/bsd.init.mk /usr/src/share/mk/bsd.opts.mk /usr/src/share/mk/bsd.cpu.mk /usr/src/share/mk/local.init.mk 
/usr/src/share/mk/src.init.mk /usr/src/stand/efi/../Makefile.inc /usr/src/stand/efi/../defs.mk /usr/src/share/mk/src.opts.mk /usr/src/share/mk/bsd.own.mk /usr/src/share/mk/bsd.compiler.mk /usr/src/share/mk/bsd.compiler.mk /usr/src/share/mk/bsd.subdir.mk' .PATH='. /usr/src/stand/efi' *** Error code 1 -- Peter Jeremy signature.asc Description: PGP signature
Unkillable process in "vm map (user)"
I was experimenting with ports/devel/libmill (which is a library that provides Go-style functionality for C programs) and managed to create an unkillable process by spawning 100 "goroutines" (think very cheap "thread" or "coroutine") joined by "channels" (think message passing pipes). (The program ran basically instantaneously with 1 or 10 "goroutines", and the Go version has no problems with 100 goroutines on a much smaller system). According to SIGINFO, it's blocked on "vm map (user)" but I can't kill it. Can anyone suggest a way to unwedge it? This is on a system running FreeBSD/amd64 11.1-STABLE r324494. server% procstat -kk 452 PID TID COMM TDNAME KSTACK 452 102382 chain - mi_switch+0x17c sleepq_switch+0x118 sleepq_wait+0x43 _sx_slock_hard+0x34e _sx_slock+0xd4 vm_map_lookup+0xbd vm_fault_hold+0x194b vm_fault+0x75 trap_pfault+0x107 trap+0x382 calltrap+0x8 server% ps -wal -p 452 UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND 204 452 53567 0 20 0 244064932 2180 vm map ( DL+ 13 0:10.31 ./chain 100 server% cat src/mill/chain.c #include <stdio.h> #include <stdlib.h> #include <libmill.h> coroutine void f(chan left, chan right) { chs(left, int, 1 + chr(right, int)); } int main(int argc, char **argv) { int i, n = argv[1] ? atoi(argv[1]) : 1; chan leftmost = chmake(int, 0); chan left = NULL; chan right = leftmost; for (i = 0; i < n; i++) { left = right; right = chmake(int, 0); go(f(left, right)); } chs(right, int, 0); i = chr(leftmost, int); printf("result = %d\n", i); return 0; } server% -- Peter Jeremy signature.asc Description: PGP signature
Re: Unable to boot 11.0-release on Unisurf Notebook
On 2017-Mar-27 00:39:32 +1100, Ian Smith <smi...@nimnet.asn.au> wrote: >I did have a look at [1] http://unisurf.com.au/unisurf-14-Notebook.html >and wondered who really made them. INSYDE Corp., I see. Cutely, their >ACPI tables are mostly listed as 'INTEL INSYDE' :) and it's called a >'CherryTrail'; all news to me. Looks pretty well locked in to Windows. Well, ark.intel.com lists the Atom x5-Z8350 as "products formerly Cherry Trail". >I wondered what possessed you to buy it, going on specs and 'manual'? >and what the '32GB Storage' might denote. Good thing you could return >it. For reference, just how cheap are they in AU? Something cheap and cheerful as a portable to play on - there wasn't enough information available online to determine whether it would be usable outside Windows. Aldi's "no questions" returns policy was one of the things that swayed me to try it. It was AUD249 but I've found something by "Pendo" for AUD229 that looks like it came off the same production line. I hadn't realised just how weird the insides of some "PC compatible" computers had become. -- Peter Jeremy signature.asc Description: PGP signature
Re: Unable to boot 11.0-release on Unisurf Notebook
On 2017-Mar-25 10:19:57 +1100, Peter Jeremy <pe...@rulingia.com> wrote: >I've just bought a Unisurf Notebook[1] and am trying to boot it from a >FreeBSD-11.0-RELEASE-amd64-memstick.img. The boot starts OK but hangs >whilst probing devices. With safe and verbose enabled, it hangs after >complaining about ppc0 (see https://goo.gl/photos/3e7tLWygjsQ6ayBT9). >At this point neither Ctrl-Alt-Del nor Ctrl-Alt-Esc have any effect and >the only option is to hold the power button down until it powers off. For the record, I have both good and bad news: The good news is that setting hint.uart.1.disabled="1" let it boot. The bad news is that FreeBSD-11 (I didn't try head) can't see the eMMC flash. The worse news is that the WiFi adapter is attached to the SDIO. I've given up and returned it. If anyone's interested, I've posted dmesg and similar information from both FreeBSD-11 and xubuntu 16.04.2 at https://www.rulingia.com/~peter/unisurf/ -- Peter Jeremy signature.asc Description: PGP signature
Unable to boot 11.0-release on Unisurf Notebook
I've just bought a Unisurf Notebook[1] and am trying to boot it from a FreeBSD-11.0-RELEASE-amd64-memstick.img. The boot starts OK but hangs whilst probing devices. With safe and verbose enabled, it hangs after complaining about ppc0 (see https://goo.gl/photos/3e7tLWygjsQ6ayBT9). At this point neither Ctrl-Alt-Del nor Ctrl-Alt-Esc have any effect and the only option is to hold the power button down until it powers off. Does anyone have any suggestions on troubleshooting? [1] http://unisurf.com.au/unisurf-14-Notebook.html -- Peter Jeremy signature.asc Description: PGP signature
Re: removing SVR4 binary compatibilty layer
On 2017-Feb-14 10:32:32 -0800, Gleb Smirnoff <gleb...@freebsd.org> wrote: > After some discussion on svn mailing list [1], there is intention >to remove SVR4 binary compatibilty layer from FreeBSD head, meaning >that FreeBSD 12.0-RELEASE, available in couple of years would >be shipped without it. There is no intention of merge of the removal. >The stable@ mailing list added for wider audience. Can I suggest that we put some warnings into the SVr4 image activation code and MFC that to at least 11 to try and smoke out anyone who might actually be using it. -- Peter Jeremy signature.asc Description: PGP signature
Re: Help! two machines ran out of swap and corrupted their zpools!
On 2016-Nov-22 10:07:49 +, Pete French <petefre...@ingresso.co.uk> wrote: >to another machine and trying to import the pools causes an instant panic. Can you provide details of the panic, please. -- Peter Jeremy signature.asc Description: PGP signature
Re: zfs, a directory that used to hold lot of files and listing pause
Have you done any ZFS tuning? Could you try installing ports/sysutils/zfs-stats and posting the output from "zfs-stats -a". That might point to a bottleneck or poor cache tuning. -- Peter Jeremy signature.asc Description: PGP signature
Re: Reproducible panic - Going nowhere without my init!
On 2016-Oct-04 11:14:38 +1000, Andy Farkas <chuzzwa...@gmail.com> wrote: >Is it just me or > >Step 1: boot >Step 2: login as root >Step 3: type "w" * >Step 4: type "shutdown now; logout" >Step 5: press RETURN at the 'Enter full pathname of shell or RETURN for >/bin/sh:' prompt >Step 6: type "reboot" >Step 7: get a Panic: "Going nowhere without my init!" > >* The panic will not happen if you skip step 3. > >The panic will not happen if you type "sync; sync; sync" after step 5. > >The panic will not happen if you wait (an unknown amount of) some time >after step 5. I can reproduce this on the console of my GCE instance but the timing seems important. It doesn't seem to fail if I ssh in or if I pause between any of the commands. ... gce1# w 7:47PM up 38 secs, 1 users, load averages: 0.69, 0.22, 0.08 USER TTY FROM LOGIN@ IDLE WHAT root u0 - 7:47PM - w gce1# shutdown now;logout Shutdown NOW! shutdown: [pid 1071] Stopping cron. Stopping sshd. Stopping ntpd. Stopping local_unbound. Stopping devd. Writing entropy file:. Writing early boot entropy file:. Terminated . Oct 6 19:47:09 pflog0: promiscuous mode disabled Enter full pathname of shell or RETURN for /bin/sh: gce1# reboot Oct 6 19:47:17 init: single user shell terminated. init died (signal 0, exit 0) panic: Going nowhere without my init! Uptime: 55s Changing serial settings was 0/0 now 3/0 Start bios (version 1.7.2-20150226_170051-google) gce1$ uname -a FreeBSD gce1.rulingia.com 11.0-PRERELEASE FreeBSD 11.0-PRERELEASE #83 r306704M: Thu Oct 6 13:22:27 AEDT 2016 r...@gce1.rulingia.com:/usr/obj/usr/src/sys/GCE amd64 I haven't investigated the cause yet. -- Peter Jeremy signature.asc Description: PGP signature
FreeBSD 11.0-BETA2 won't boot on an Acer Aspire 5560
I'm trying to boot the 11.0-BETA2/amd64 memory stick image and the kernel panics: (Following copied by hand): ACPI APIC Table: ... acpi0: on motherboard ACPI Error: Hardware did not change modes (20160527/hwacpi-160) ACPI Error: Could not transition to ACPI mode (20160527/evxfevnt-105) ACPI Warning: AcpiEnable failed (20160527/utxfinit-184) acpi0: Could not enable ACPI: AE_NO_HARDWARE_RESPONSE device_attach: acpi0 attach returned 6 Followed by a NULL dereference panic at nexus_acpi_attach+0x89 The system boots a 10.0-RELEASE/amd64 memstick (the only other image I have conveniently to hand) without problem. -- Peter Jeremy signature.asc Description: PGP signature
Re: HAST, zfs and local mirroring
On 2016-Jun-03 22:12:55 +0700, Eugene Grosbein <eu...@grosbein.net> wrote: >> all your media content is valid. I've also had bad experiences with >> gmirror volumes silently getting out of sync on a crash. > > >gmirror or (gmirror+gjournal) ones? Plain gmirror. -- Peter Jeremy signature.asc Description: PGP signature
Re: HAST, zfs and local mirroring
On 2016-Jun-02 12:12:35 +0500, "Eugene M. Zheganin" <e...@norma.perm.ru> wrote: >differs a lot ? And why should I prefere this overcomplicated scheme >over the geom_mirror, which seems rather simple when comparing. Seems >like I can point HAST to /dev/mirror/whatever device, right ? Because using RAID of any sort under ZFS defeats a lot of the smarts in ZFS. In particular, you can no longer rely on scrub verifying that all your media content is valid. I've also had bad experiences with gmirror volumes silently getting out of sync on a crash. -- Peter Jeremy signature.asc Description: PGP signature
Re: 10-STABLE hangups frequently
On 2016-Feb-04 11:45:56 +1030, Shane Ambler <free...@shaneware.biz> wrote: >Going by figures shown in top, ARC is usually in the 1500M to 2000M >range but when wired gets over 6GB I often see ARC drop to 500MB which >I now realise matches arc_min. That's definitely abnormal. You might like to run "vmstat -mz" when the system is running normally and as the non-ARC wired memory increases to identify where the RAM is going. -- Peter Jeremy signature.asc Description: PGP signature
Re: 10-STABLE hangups frequently
On 2016-Feb-02 14:52:37 +0200, Konstantin Belousov <kostik...@gmail.com> wrote: >Please gather the information listed at >https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html Interestingly, as soon as I enable INVARIANTS, my kernel will no longer boot. All the other options listed there are OK. The console output ends: TSC timecounter discards lower 1 bit(s) Timecounter "TSC-low" frequency 1150026690 Hz quality 800 hwpc_core: unknown PMC architecture: 0 hwpmc: SOFT/16/64/0x67<INT,USR,SYS,REA,WRI> init_KVMCLOCK_tc: 0x000b KVM-style paravirtualized clock detected. Timecounter "KVMCLOCK" frequency 10 Hz quality 1000 WARNING: WITNESS option enabled, expect reduced performance. WARNING: DIAGNOSTIC option enabled, expect reduced performance. Trying to mount root from zfs:zroot []... <> Unfortunately, I don't have write access to the console so I can't do anything other than reboot at this point. (This is 10-stable/amd64 r295088). If I have some spare time, I'll try reproducing this in a local VBox over the next few days. -- Peter Jeremy signature.asc Description: PGP signature
Re: 10-STABLE hangups frequently
On 2016-Feb-03 18:23:13 +1030, Shane Ambler <free...@shaneware.biz> wrote: >Any chance you get high wired allocations? A high wired allocation is normal for ZFS - ARC shows up as "wired" memory. >Sometimes several times in a day I see the wired amount shown in top >rise to over 6GB (of 8GB) bringing the system to a crawl. When wired >gets over 7GB the system rarely recovers. The ARC limit defaults to 1GB less than physical RAM so 6GB wired on an 8GB system isn't unexpected (my home system currently has 30GB wired out of 32GB). If this is causing problems for your workload, it sounds like you may need to explicitly reduce vfs.zfs.arc_max (note that this is a soft limit). You might like to install sysutils/zfs-stats and do some ZFS tuning. -- Peter Jeremy signature.asc Description: PGP signature
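For completeness, capping the ARC is done from /boot/loader.conf. A sketch only; the 4G figure is an arbitrary example for an 8GB machine, not a recommendation, and as noted above arc_max is a soft limit:

```
# /boot/loader.conf -- illustrative ARC cap (choose values for your workload)
vfs.zfs.arc_max="4G"
# optionally also set a floor:
# vfs.zfs.arc_min="512M"
```

The new values take effect at the next boot.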
Re: 10-STABLE hangups frequently
On 2016-Feb-02 16:55:46 +0900, Hajimu UMEMOTO <u...@mahoroba.org> wrote: >I'm disturbed by a frequent hangup of my 10-STABLE boxes since this >year. It seems to occur during running the periodic daily scripts. >I've narrowed which commit causes this problem. It seems r292895 >causes it. I see many `Resource temporarily unavailable' message just >before hangup occurs. >Any idea? As others have said, you need to provide lots more detail on your configuration. That said, I'm seeing something potentially similar on a Google Compute Engine f1-micro instance (1 vCPU, 0.6GB RAM) that is running FreeBSD 10-stable/amd64 with ZFS but basically idle. (Yes, I realize that's very little RAM for ZFS but I previously had no problems with things like buildworld). There were no problems at r290231 but after I upgraded to r295005, I started seeing "out of swap" errors and hangs during the periodic daily runs. I'm not seeing this on 1GB instances - though they are all running UFS. Some experimentation suggested that just "find /" was enough to wedge my system. I did some experimenting and found that the following loader config was enough to prevent it hanging: vfs.zfs.arc_max="128M" vfs.zfs.arc_meta_limit="50M" vfs.zfs.arc_min="25M" (previously, I had no ZFS tuning at all). One oddity was that I would semi-regularly see: kernel: pid 67431 (ntpd), uid 0, was killed: out of swap space I haven't worked out why the OOM killer preferred ntpd to anything else - it didn't seem to be bigger. And I didn't see any signs that swap space was being consumed (though I haven't done a scientific examination). (Note that swap is on a raw partition). The behaviour is definitely a regression and my initial suspicion is ZFS, though I haven't identified any smoking gun. Unfortunately, GCE only offers read access to the console, so I can't use DDB to poke around after it wedges. -- Peter Jeremy signature.asc Description: PGP signature
Re: dev/random warning on 10-STABLE after r292122 up till r292855
On 2016-Jan-04 16:44:49 -0500, Mark Saad <nones...@longcount.org> wrote: >On boot dmesg logs the following warning not seen on 10.2-RELEASE amd64. > >random device not loaded; using insecure entropy When I first noticed this, I investigated and worked out that it's related to how the random device initialises itself and its data and entropy sources. In particular, it reflects the state of the random device at that point in time, not at any later point when random data is actually requested. I agree that the wording of this message could unnecessarily alarm a sysadmin and think it could be done better. IMHO, this sort of alarmist message should only be output if there is no decent entropy source available when the random device is unblocked. -- Peter Jeremy signature.asc Description: PGP signature
Re: Swap Usage
[reformatted] On 2015-Jul-29 17:41:33 -0700, Doug Hardie bc...@lafn.org wrote: I have several FreeBSD 9.3 systems that are using swap and I can’t figure out what is doing it. The key system has 6GB swap and currently it has over 2GB in use. Is the system currently paging (top(1) and systat -v will show this)? If not, this just means that at some time in the past, the system was under memory pressure and paged some process memory out. Since then, that memory hasn't been touched so the system hasn't paged it in. ps shows only a kernel module [intr] with a W status. 'W' means the whole process is 'swapped' out - this will only occur under severe RAM pressure. Normally, the system will just page out inactive parts of a process's address space - and none of the ps flags will show this. How do I figure out what that swap space is being used for? I don't think this can be trivially done. procstat -v will show the number of resident pages within each swap-backed region; any pages in that region that have been touched but are not resident are on the swap device but any pages that have never been touched aren't counted at all. -- Peter Jeremy pgp_HZj6rM0Rp.pgp Description: PGP signature
Re: Will 10.2 also ship with a very stale NTP?
On 2015-Jul-12 09:41:43 -0600, Ian Lepore i...@freebsd.org wrote: And let's all just hope that a week or two of testing is enough when jumping a major piece of software forward several years in its independent evolution. Whilst I support John's desire for NTP to be updated, I also do not think this is the appropriate time to do so. That said, the final decision is up to re@. The import of 4.2.8p2 several months ago resulted in complete failure of timekeeping on all my arm systems. Just last week I tracked it down to a kernel bug (which I haven't committed the fix for yet). While the bug has been in the kernel for years, it took a small change in ntpd behavior to trigger it. Granted it's an odd corner-case problem that won't affect most users because they just use the stock ntp.conf file (and it only affects systems that have a large time step due to no battery-backed clock). But it took me weeks to find enough time to track down the cause of the problem. I'm not using the stock ntp.conf on my RPis and didn't notice any NTP issues. Are you able to provide more details of either the ntp.conf options that trigger the bug or the kernel bug itself? A quick search failed to find anything. -- Peter Jeremy pgpN60GLK7zew.pgp Description: PGP signature
Re: Will 10.2 also ship with a very stale NTP?
On 2015-Jul-13 04:31:40 +1000, Peter Jeremy pe...@rulingia.com wrote: The import of 4.2.8p2 several months ago resulted in complete failure of timekeeping on all my arm systems. Just last week I tracked it down to a kernel bug (which I haven't committed the fix for yet). While the bug has been in the kernel for years, it took a small change in ntpd behavior to trigger it. Ah... I just saw r285424. -- Peter Jeremy pgppTLPyM4JSi.pgp Description: PGP signature
Re: Will 10.2 also ship with a very stale NTP?
On 2015-Jul-11 23:22:56 -0400, Chris Nehren cnehren+freebsd-sta...@pobox.com wrote: On Sat, Jul 11, 2015 at 09:58:11 +1000, John Marshall wrote: It's me again with my annual NTP whinge. The answer to the perennial will release $foo ship with old / insecure / otherwise deficient $bar? is still install $bar from ports. That's a non-answer. It just changes the question to why bother to include $bar in base when I need to install the port anyway. -- Peter Jeremy pgpKqgl4dmmHG.pgp Description: PGP signature
Re: dev.cpu.0.freq disappeared
On 2015-Mar-22 00:58:55 +0300, Dmitry Sivachenko trtrmi...@gmail.com wrote: I have a machine with the following processor: CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (2400.14-MHz K8-class CPU) Origin=GenuineIntel Id=0x206c2 Family=0x6 Model=0x2c Stepping=2 ... After I upgraded to 10.1-STABLE #0 r279956, this sysctl disappeared. % sysctl dev.cpu.0.freq sysctl: unknown oid 'dev.cpu.0.freq': No such file or directory % What OIDs do you have? Does dev.cpu.0 exist? How about dev.cpu? Can you set 'debug.cpufreq.verbose=1' in /boot/loader.conf and post (or make available) the dmesg from a verbose boot. -- Peter Jeremy pgp18RHROUkCO.pgp Description: PGP signature
Re: ZFS stalls -- and maybe we should be talking about defaults?
On 2013-Mar-04 16:48:18 -0600, Karl Denninger k...@denninger.net wrote: The subject machine in question has 12GB of RAM and dual Xeon 5500-series processors. It also has an ARECA 1680ix in it with 2GB of local cache and the BBU for it. The ZFS spindles are all exported as JBOD drives. I set up four disks under GPT, have a single freebsd-zfs partition added to them, are labeled and the providers are then geli-encrypted and added to the pool. What sort of disks? SAS or SATA? also known good. I began to get EXTENDED stalls with zero I/O going on, some lasting for 30 seconds or so. The system was not frozen but anything that touched I/O would lock until it cleared. Dedup is off, incidentally. When the system has stalled: - Do you see very low free memory? - What happens to all the different CPU utilisation figures? Do they all go to zero? Do you get high system or interrupt CPU (including going to 1 core's worth)? - What happens to interrupt load? Do you see any disk controller interrupts? Would you be able to build a kernel with WITNESS (and WITNESS_SKIPSPIN) and see if you get any errors when stalls happen. On 2013-Mar-05 14:09:36 -0800, Jeremy Chadwick j...@koitsu.org wrote: On Tue, Mar 05, 2013 at 01:09:41PM +0200, Andriy Gapon wrote: Completely unrelated to the main thread: on 05/03/2013 07:32 Jeremy Chadwick said the following: That said, I still do not recommend ZFS for a root filesystem Why? Too long a history of problems with it and weird edge cases (keep reading); the last thing an administrator wants to deal with is a system where the root filesystem won't mount/can't be used. It makes recovery or problem-solving (i.e. the server is not physically accessible given geographic distances) very difficult. I've had lots of problems with a gmirrored UFS root as well. The biggest issue is that gmirror has no audit functionality so you can't verify that both sides of a mirror really do have the same data. 
My point/opinion: UFS for a root filesystem is guaranteed to work without any fiddling about and, barring drive failures or controller issues, is (again, my opinion) a lot more risk-free than ZFS-on-root. AFAIK, you can't boot from anything other than a single disk (ie no graid). -- Peter Jeremy pgp7H3m449swl.pgp Description: PGP signature
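For reference, the WITNESS diagnostics suggested earlier in this thread are enabled with kernel configuration options. A sketch of the additions (the option names are the standard FreeBSD ones; rebuilding and installing the kernel is left to the reader):

```
# kernel configuration additions for lock-order diagnostics
options         WITNESS
options         WITNESS_SKIPSPIN
```

WITNESS_SKIPSPIN avoids the very large overhead of checking spin mutexes while still catching lock-order reversals on sleep locks.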
Re: Musings on ZFS Backup strategies
On 2013-Mar-01 08:24:53 -0600, Karl Denninger k...@denninger.net wrote: If I then restore the base and snapshot, I get back to where I was when the latest snapshot was taken. I don't need to keep the incremental snapshot for longer than it takes to zfs send it, so I can do: zfs snapshot pool/some-filesystem@unique-label zfs send -i pool/some-filesystem@base pool/some-filesystem@unique-label zfs destroy pool/some-filesystem@unique-label and that seems to work (and restore) just fine. This gives you an incremental since the base snapshot - which will probably grow in size over time. If you are storing the ZFS send streams on (eg) tape, rather than receiving them, you probably still want the Towers of Hanoi style backup hierarchy to control your backup volume. It's also worth noting that whilst the stream will contain the compression attributes of the filesystem(s) in it, the actual data in the stream is uncompressed. This in turn means that keeping more than two incremental dumps offline has little or no value; the second merely being taken to insure that there is always at least one that has been written to completion without error to apply on top of the base. This is quite a critical point with this style of backup: The ZFS send stream is not intended as an archive format. It includes error detection but no error correction and any error in a stream renders the whole stream unusable (you can't retrieve only part of a stream). If you go this way, you probably want to wrap the stream in a FEC container (eg based on ports/comms/libfec) and/or keep multiple copies. The recommended approach is to do zfs send | zfs recv and store a replica of your pool (with whatever level of RAID that meets your needs). This way, you immediately detect an error in the send stream and can repeat the send. You then use scrub to verify (and recover) the replica. (Yes, I know, I've been a ZFS resister ;-)) Resistance is futile. 
:-) On 2013-Mar-01 15:34:39 -0500, Daniel Eischen deisc...@freebsd.org wrote: It wasn't clear that snapshots were traversable as a normal directory structure. I was thinking it was just a blob that you had to roll back to in order to get anything out of it. Snapshots appear in a .zfs/snapshot/SNAPSHOT_NAME directory at each mountpoint and are accessible as a normal read-only directory hierarchy below there. OTOH, the send stream _is_ a blob. Am I correct in assuming that one could: # zfs send -R snapshot | dd obs=10240 of=/dev/rst0 to archive it to tape instead of another [system:]drive? Yes. The output from zfs send is a stream of bytes that you can treat as you would any other stream of bytes. But this approach isn't recommended. -- Peter Jeremy pgp61ijyBCuu8.pgp Description: PGP signature
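The send/receive replication cycle recommended in this thread can be sketched as a small script. The dataset, host, and snapshot names are placeholders, and the commands are echoed rather than executed so the sequence can be reviewed before anyone runs it for real:

```shell
# Sketch of the snapshot / incremental-replicate / prune cycle.
# All names below are placeholders; commands are echoed, not run.
run() { echo "$@"; }            # change to: run() { "$@"; }  to execute

fs=pool/some-filesystem
base=base                       # long-lived base snapshot
label=2013-03-01                # e.g. $(date +%Y-%m-%d)

run zfs snapshot "${fs}@${label}"
run "zfs send -i ${fs}@${base} ${fs}@${label} | ssh backuphost zfs recv -F backup/some-filesystem"
run zfs destroy "${fs}@${label}"
```

Receiving into a live pool (rather than archiving the raw stream) is what lets a transmission error be detected immediately, and lets scrub verify the replica afterwards.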
Re: Why can't gcc-4.2.1 build usable libreoffice?
On 2013-Feb-19 09:23:37 -0500, Mikhail T. mi+t...@aldan.algebra.com wrote: See, my understanding always was, the only possible reasons for a compiler to produce a non-starting executable are: 1. The code is buggy. 2. The compiler is buggy. 3. Both of the above. My question was, which is it? You left out: 4. Code relies on language features that are not supported by the compiler. (It's not a bug that gcc 4.2.1 (eg) doesn't support C++11) 5. Code relies on specific compiler features. Feel free to answer your own question if it's important to you. No-one else is particularly interested. Yes, 4.6 is supposed to work and is supported by the office@ team. My question was about 4.2.1, which happens to be the base cc/c++ in 8.x and in 9.x as well, if world was built WITHOUT_CLANG. I too observe the 4.2.1-compiled office die at start-up -- the splash screen starts nicely and exits after kicking off the actual soffice.bin which segfaults. As others have indicated, the toolchain provided in the base system is intended only for building the base system. If it works for you for other purposes, that's good. If you believe it has bugs, feel free to submit PRs. If the bugs don't affect the base system, they are unlikely to be fixed. -- Peter Jeremy pgpzlf8YtE7jb.pgp Description: PGP signature
Re: svn - but smaller?
On 2013-Jan-27 21:54:44 -0600, Stephen Montgomery-Smith step...@missouri.edu wrote: On 01/27/2013 09:24 PM, Isaac (.ike) Levy wrote: Thank you for adding the ctm bits in the page, I'm deeply intrigued by possibly solving this problem with bits *already* in base?!! Suppose you want to keep up with 9.x-stable. Then you look at the ftp site ftp://ftp.freebsd.org/pub/FreeBSD/CTM/src-9/, look at the latest xEmpty file, and fetch it. Then create an empty directory /usr/src, and then do cd /usr/src ctm the-xEmpty-file-you-downloaded. No need to decompress the file first. Then fetch from the same web site all the files whose number is greater than the xEmpty file you downloaded and do cd /usr/src ctm the-rest-of-the-files* I tracked the CVS repo for at least 10 years using a perl script I wrote. It checks the local .ctm_status and then fetches successive deltas until the fetch fails. A second script ran ctm on the downloaded deltas to update my local CVS repo. If there's sufficient interest, I could make the scripts available. At $ex-work, I had an email subscription and had a script set up to run the emails through gpg and feed them into ctm. Unfortunately, I can't distribute that script. Now, if you want something not offered by ctm (e.g. 8.2-release), then you need to use svn. Or freebsd-update. The biggest downside of CTM is that you can't pick arbitrary deltas - you can only fetch the head of pre-configured branches. The only way to get an older tree is to not apply deltas (ZFS snapshots are the best work-around here). -- Peter Jeremy pgpbZzqAVCO6F.pgp Description: PGP signature
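The catch-up loop that perl script implements can be sketched in plain sh. The .ctm_status layout assumed here (branch name followed by the last applied delta number) is what ctm(1) maintains; the file extension and the URL in the comment are illustrative, not exact:

```shell
# Sketch of a ctm catch-up loop. Assumes .ctm_status contains
# "<branch> <last-delta-number>". Extension and URL are illustrative.
statusfile=/tmp/.ctm_status
echo "src-9 1234" > "$statusfile"      # simulated state for illustration

read branch num < "$statusfile"
next=$((num + 1))
delta="${branch}.${next}.gz"
echo "next delta to fetch: $delta"
# a real script would then loop, letting ctm update .ctm_status:
#   while fetch "ftp://ftp.freebsd.org/pub/FreeBSD/CTM/${branch}/${delta}"; do
#       ctm "$delta" || break
#       next=$((next + 1)); delta="${branch}.${next}.gz"
#   done
```

Stopping at the first failed fetch is what makes the loop idempotent: rerunning it later simply picks up from wherever .ctm_status says it left off.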
Re: Svnsup architecture [was: Re: svn - but smaller?]
On 2013-Jan-25 13:42:19 +0100, Arrigo Marchiori ard...@yahoo.it wrote: The current svnsup design is composed of: 1- svnsup-distill: takes a revision from svn and creates a text file (called a delta) that represents it. It seems to be almost complete. 2- svnsup-apply: takes a delta generated by svnsup-distill and applies it to an existing source tree. It's currently a work in progress. 3- a server-side application that runs svnsup-distill and distributes the deltas (still to be developed). 4- a client-side application that fetches new deltas and runs svnsup-apply. New trees are bootstrapped from other sources, e.g. weekly tarballs (still to be developed). I think you've just re-invented CTM. Before spending too much more time on svnsup, I suggest you read ctm(1). -- Peter Jeremy pgpfk7Isksqu6.pgp Description: PGP signature
Re: svn - but smaller?
On 2013-Jan-23 15:40:50 +0100, Oliver Brandmueller o...@e-gitt.net wrote: in ancient times there was cvsup. cvsup was a PITA if you wanted (or needed) to install it via ports, the only reasonable way was to use pkg_add for that if you didn't want to pollute your system with otherwise unneeded software. There was also ctm(1). ctm is small, BSD-licensed and has been part of FreeBSD forever (almost). Thanks to stephen@, ctm deltas for various src trees, as well as the entire SVN repo are still available. c[v]sup can do things that aren't possible with ctm but I would expect that most people who currently use c[v]sup could readily migrate to using ctm. See http://www.freebsd.org/doc/handbook/ctm.html for details. Note that mirroring the actual SVN repo via ctm requires some patches. There is a README and patches in ftp://ftp.freebsd.org/pub/FreeBSD/CTM/svn-cur/ -- Peter Jeremy pgpTfeB8Ea6dH.pgp Description: PGP signature
Re: Does / Is anyone maintaining CVS for FreeBSD?
On 2013-Jan-01 21:30:14 -0800, Doug Hardie bc...@lafn.org wrote: Is the cvs code going away? There has been some discussion about removing CVS from the base system now that it is no longer used. No consensus was reached, so it's not going away immediately (and would not be removed from 9.x or earlier branches in any case). CVS is (and will remain) available in ports (devel/cvs). -- Peter Jeremy pgp0orRYshWTs.pgp Description: PGP signature
Re: Increasing the DMESG buffer....
On 2012-Nov-21 10:57:49 +0100, Willem Jan Withagen w...@digiware.nl wrote: Probably because the kernel buffer for it is too small. I know there used to be a kernel option to increase it. But I cannot find it with the setting in NOTES or any other place I looked # Size of the kernel message buffer. Should be N * pagesize. options MSGBUF_SIZE=40960 -- Peter Jeremy pgpwUscTboaAO.pgp Description: PGP signature
Re: Node conflicts in SVN
On 2012-Nov-19 23:12:33 -0500, Frank Seltzer fran...@bellsouth.net wrote: On Mon, 19 Nov 2012, Eitan Adler wrote: did you run svn checkout on a directory which wasn't controlled by svn (with stuff in it?). If so you need to remove those directories and run svn up. No, svn created the directory. I moved /usr/ports out of the way and ran 'svn co'. Can you give the exact steps (including commands) you performed? -- Peter Jeremy pgpkZK9jxVe7D.pgp Description: PGP signature
Re: ZFS corruption due to lack of space?
On 2012-Nov-02 09:30:04 -, Steven Hartland kill...@multiplay.co.uk wrote: From: Peter Jeremy pe...@rulingia.com Many years ago, I wrote a simple utility that fills a raw disk with a pseudo-random sequence and then verifies it. This sort of tool Sounds useful, got a link? Sorry, no. I never released it. But writing something like it is quite easy. -- Peter Jeremy pgpVnWRVGtD8V.pgp Description: PGP signature
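As Peter says, such a tool is easy to write. Below is a minimal sketch in C (illustrative only; this is not his unreleased utility, and all names are invented). It fills a file or raw device with a seeded xorshift64 stream, then regenerates the same stream and compares, reporting the byte offset of the first mismatch.

```c
#include <stdint.h>
#include <stdio.h>

/* xorshift64: cheap PRNG, reproducible from a (nonzero) seed. */
static uint64_t prng(uint64_t *s) {
    *s ^= *s << 13; *s ^= *s >> 7; *s ^= *s << 17;
    return *s;
}

/* Fill the file/device at `path` with `nblocks` 4096-byte blocks of
 * seeded pseudo-random data.  Returns 0 on success, -1 on I/O error. */
static int fill(const char *path, uint64_t nblocks, uint64_t seed) {
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    uint64_t buf[512];                      /* 512 * 8 = 4096 bytes */
    for (uint64_t b = 0; b < nblocks; b++) {
        for (size_t i = 0; i < 512; i++)
            buf[i] = prng(&seed);
        if (fwrite(buf, sizeof(buf), 1, f) != 1) {
            fclose(f);
            return -1;
        }
    }
    return fclose(f);
}

/* Regenerate the stream and compare.  Returns the byte offset of the
 * first mismatch (word-aligned), the offset where a short read stopped,
 * or -1 if the device read back verbatim. */
static int64_t verify(const char *path, uint64_t nblocks, uint64_t seed) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -2;
    uint64_t buf[512];
    for (uint64_t b = 0; b < nblocks; b++) {
        if (fread(buf, sizeof(buf), 1, f) != 1) {
            fclose(f);
            return (int64_t)(b * 4096);
        }
        for (size_t i = 0; i < 512; i++)
            if (buf[i] != prng(&seed)) {
                fclose(f);
                return (int64_t)(b * 4096 + i * 8);
            }
    }
    fclose(f);
    return -1;
}
```

Point it at a regular file first; pointing it at a raw disk device destroys the contents, which is the intended use for detecting silent corruption or address wraparound.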
Re: ZFS corruption due to lack of space?
On 2012-Nov-01 13:29:34 -, Steven Hartland kill...@multiplay.co.uk wrote: After destroying and re-creating the pool and then writing zeros to the disk in multiple files without filling the fs I've managed to reproduce the corruption again so we can rule out full disk as the cause. Many years ago, I wrote a simple utility that fills a raw disk with a pseudo-random sequence and then verifies it. This sort of tool can be useful for detecting the presence of silent data corruption (or disk address wraparound). Suspects: HW issues (memory, cables, MB, disks), driver issue (not used mfi on tbolt 2208 based cards before). There has been a recent thread about various strange behaviours from LSI controllers and it has been stated that (at least for the 2008) the card firmware _must_ match the FreeBSD driver version. See http://lists.freebsd.org/pipermail/freebsd-stable/2012-August/069205.html -- Peter Jeremy pgp0iCOscX7cA.pgp Description: PGP signature
Re: ZFS corruption due to lack of space?
On 2012-Oct-31 17:25:09 -, Steven Hartland ste...@multiplay.co.uk wrote: Been running some tests on new hardware here to verify all is good. One of the tests was to fill the zfs array which seems like it's totally corrupted the tank. I've accidentally filled a pool, and had multiple processes try to write to the full pool, without either emptying the free space reserve (so I could still delete the offending files) or corrupting the pool. Had you tried to read/write the raw disks before you tried the ZFS testing? Do you have compression and/or dedupe enabled on the pool? 1. Given the information it seems like the multiple writes filling the disk may have caused metadata corruption? I don't recall seeing this reported before. 2. Is there any way to stop the scrub? Other than freeing up some space, I don't think so. If this is a test pool that you don't need, you could try destroying it and re-creating it - that may be quicker and easier than recovering the existing pool. 3. Surely low space should never prevent stopping a scrub? As Artem noted, ZFS is a copy-on-write filesystem. It is supposed to reserve some free space to allow metadata updates (stop scrubs, delete files, etc) even when it is full but I have seen reports of this not working correctly in the past. A truncate-in-place may work. You could also try asking on zfs-disc...@opensolaris.org -- Peter Jeremy pgptbOF1VVAh4.pgp Description: PGP signature
Re: time keeps on slipping... slipping...
On 2012-Oct-10 23:30:30 -0700, John-Mark Gurney j...@funkthat.com wrote: kern.timecounter.tc.TSC-low.mask: 4294967295 kern.timecounter.tc.TSC-low.counter: 2854866610 kern.timecounter.tc.TSC-low.frequency: 10937740 kern.timecounter.tc.TSC-low.quality: 1000 ... Since I switch to HPET, it hasn't happened at all in the last 3 days.. That suggests that there's something peculiar about your TSC. There are a variety of possibilities... Does your CPU support multiple Cx states and are you using them (sysctl dev.cpu | grep cx_)? -- Peter Jeremy pgpeGw8Kk7Rhc.pgp Description: PGP signature
Re: Problem adding more than 8 network adapters
[Moving to -stable and adding jhb@ for his input] On 2012-Aug-29 11:32:44 +0200, Gustau Pérez i Querol gpe...@entel.upc.edu wrote: Al 29/08/2012 11:02, En/na Peter Jeremy ha escrit: On 2012-Aug-28 11:44:44 +0200, Gustau Pérez i Querol gpe...@entel.upc.edu wrote: I'm running FreeBSD 9.1 RC1/AMD64 with VirtualBox. The problem I'm facing is that I can't use more than 8 network adapters plugged to the virtual machine. ... I don't know if it's a net@ problem or maybe it is a problem with the emulated PCI-bridge and then stable@ should be contacted. Also, I'm not sure if a real machine would support more than 8 network adapters or not. Any hints would be appreciated. I don't think I've ever used more than 6 physical NICs in a host but don't know of any reason for 8 to not work. Can you please post a pciconf -lv from FreeBSD and the equivalent lspci from Linux. A FreeBSD verbose boot log might also help. Sure. I'm attaching them to this mail. I hope the mailing list doesn't eat them. If it does, I will post them online and send the URL to the mailing list. Ah.. lspci shows the 9th LANCE at 02:00.0. The verbose boot shows FreeBSD finds pcib2 (at pci0 device 25.0) but doesn't see anything on that bus. ISTR jhb@ will recognize that problem. Table 'FACP' at 0x3fff0110 Table 'APIC' at 0x3fff0280 APIC: Found table at 0x3fff0280 APIC: Using the MADT enumerator. MADT: Found CPU APIC ID 0 ACPI ID 0: enabled SMP: Added CPU 0 (AP) Copyright (c) 1992-2012 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 9.0-STABLE #0 r232547: Mon Mar 5 17:46:14 UTC 2012 root@hast1:/usr/obj/usr/src/sys/GENERIC amd64 Preloaded elf kernel /boot/kernel/kernel at 0x8168e000. Preloaded elf obj module /boot/kernel/zfs.ko at 0x8168e180. Preloaded elf obj module /boot/kernel/opensolaris.ko at 0x8168e828. 
Preloaded /boot/zfs/zpool.cache /boot/zfs/zpool.cache at 0x8168ee58. Preloaded elf obj module /boot/modules/virtio.ko at 0x8168eeb8. Preloaded elf obj module /boot/modules/virtio_pci.ko at 0x8168f420. Preloaded elf obj module /boot/modules/virtio_blk.ko at 0x8168f950. Preloaded elf obj module /boot/modules/if_vtnet.ko at 0x8168fec0. Calibrating TSC clock ... TSC clock: 2390626999 Hz CPU: Intel(R) Core(TM) i3 CPU M 370 @ 2.40GHz (2390.63-MHz K8-class CPU) Origin = GenuineIntel Id = 0x20655 Family = 6 Model = 25 Stepping = 5 Features=0x783fbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2 Features2=0x209SSE3,MON,SSSE3 AMD Features=0x28100800SYSCALL,NX,RDTSCP,LM AMD Features2=0x1LAHF real memory = 1073676288 (1023 MB) Physical memory chunk(s): 0x1000 - 0x0009bfff, 634880 bytes (155 pages) 0x0010 - 0x001f, 1048576 bytes (256 pages) 0x016bf000 - 0x3e18, 1017974784 bytes (248529 pages) avail memory = 1008267264 (961 MB) Event timer LAPIC quality 400 ACPI APIC Table: VBOX VBOXAPIC APIC: CPU 0 has ACPI ID 0 x86bios: IVT 0x00-0x0004ff at 0xfe00 x86bios: SSEG 0x001000-0x001fff at 0xff8000203000 x86bios: EBDA 0x09f000-0x09 at 0xfe09f000 x86bios: ROM 0x0a-0x0fefff at 0xfe0a ULE: setup cpu 0 ACPI: RSDP 0xe 00024 (v02 VBOX ) ACPI: XSDT 0x3fff0040 0004C (v01 VBOX VBOXXSDT 0001 ASL 0061) ACPI: FACP 0x3fff0110 000F4 (v04 VBOX VBOXFACP 0001 ASL 0061) ACPI: DSDT 0x3fff0530 01B96 (v01 VBOX VBOXBIOS 0002 INTL 20120816) ACPI: FACS 0x3fff0240 00040 ACPI: APIC 0x3fff0280 00054 (v02 VBOX VBOXAPIC 0001 ASL 0061) ACPI: HPET 0x3fff02e0 00038 (v01 VBOX VBOXHPET 0001 ASL 0061) ACPI: MCFG 0x3fff0320 0003C (v01 VBOX VBOXMCFG 0001 ASL 0061) ACPI: SSDT 0x3fff0360 001CC (v01 VBOX VBOXCPUT 0002 INTL 20120816) MADT: Found IO APIC ID 1, Interrupt 0 at 0xfec0 ioapic0: Routing external 8259A's - intpin 0 MADT: Interrupt override: source 0, irq 2 ioapic0: Routing IRQ 0 - intpin 2 MADT: Interrupt override: source 9, irq 9 ioapic0: intpin 9 trigger: level ioapic0 
Version 1.1 irqs 0-23 on motherboard cpu0 BSP: ID: 0x VER: 0x00050014 LDR: 0x DFR: 0x lint0: 0x00010700 lint1: 0x0400 TPR: 0x SVR: 0x01ff timer: 0x000100ef therm: 0x0001 err: 0x00f0 pmc: 0x00010400 wlan: 802.11 Link Layer snd_unit_init() u=0x00ff8000 [512] d=0x7c00 [32] c=0x03ff [1024] feeder_register: snd_unit=-1 snd_maxautovchans=16 latency=5 feeder_rate_min=1 feeder_rate_max=2016000 feeder_rate_round=25 kbd: new array size 4 kbd1 at kbdmux0 nfslock: pseudo-device mem: memory CPU supports MTRRs but not enabled io: I/O null: null device, zero device random: entropy source, Software, Yarrow hptrr: RocketRAID 17xx/2xxx SATA
Re: Removing CVS from base
On 2012-Aug-23 16:06:18 -0400, John Baldwin j...@freebsd.org wrote: On Thursday, August 23, 2012 3:41:03 pm Peter Wemm wrote: * Don't expect to see any 10.0-alpha/beta/rc/release/stable to *ever* make it to an official cvs tree. It's probably time to move a freebsd-ified cvs from head to ports. I think this is a bit premature. Just because we are moving away from using CVS as FreeBSD's scm doesn't mean CVS isn't a useful general-purpose tool still. For smaller repositories that don't need fancier things like branches, CVS is quite useful and far lighter weight. To me, this reads like the exact definition of a ports, not base use case. CVS (and RCS) are both GPL-licensed tools that (as of 10.x) no longer serve any purpose in the base system. I agree that they still serve a purpose (I use CVS as a SCM both at home and $work) but (IMHO) if they are not needed to support FreeBSD, they are not needed in the FreeBSD base. I could see moving csup out to ports, but not necessarily CVS. Ideally, csup would learn how to talk to a SVN repository so it can continue to be used to update a local src tree (without needing to install subversion). Failing that, csup should probably also go. -- Peter Jeremy pgpNb5AENPidH.pgp Description: PGP signature
Re: sh(1) exiting on SIGWINCH
On 2012-Jul-05 00:22:45 -0500, Brandon Gooch jamesbrandongo...@gmail.com wrote: Seems that the window resize is somehow causing sh(1) to receive an EOF while the shell is sitting at the prompt, which results in the shell exiting; haven't dug too deeply into the source yet, but can you try to run /bin/sh with the '-I' (that's capital letter 'i') and the shell shouldn't exit (but it will bark at you with a 'Use exit to leave shell.' message on each resize). Interesting. I hadn't tried '-I' but now also see that. I'm CC'ing jilles@ for any potential insight into the behavior of sh(1) (and perhaps this updated libedit snapshot). I would also welcome any insights jilles@ can offer. pfg@ (who shepherded the libedit update into the tree), David Shao (originator of kern/169603) and I have been investigating fixes to libedit but do not have a solution yet. There is a possibility that sh(1) is relying on bugs in the old libedit. At this stage, it seems likely that the libedit update (r237738) will be reverted for 9.1-RELEASE. -- Peter Jeremy pgpUCwQdtnSJm.pgp Description: PGP signature
sh(1) exiting on SIGWINCH
I've recently updated a box from 8-stable to 9-stable/amd64 (r237995), compiled with gcc, and now sh(1) exits if I change the window size (ssh'ing to the target system within an xterm). I don't recall ever seeing this sort of behaviour before and am still trying to track down the relevant code path. ktrace output looks like: 1766 sh GIO fd 2 wrote 2 bytes # 1766 sh RET write 2 1766 sh CALL ioctl(0,TIOCGETA,0x801020364) 1766 sh RET ioctl 0 1766 sh CALL ioctl(0,TIOCSETAW,0x801020338) 1766 sh RET ioctl 0 1766 sh CALL read(0,0x7fffda8f,0x1) 1766 sh RET read -1 errno 4 Interrupted system call 1766 sh PSIG SIGWINCH caught handler=0x417d10 mask=0x0 code=0x10006 1766 sh CALL sigreturn(0x7fffd600) 1766 sh RET sigreturn JUSTRETURN 1766 sh CALL ioctl(0,TIOCSETAW,0x80102030c) 1766 sh RET ioctl 0 1766 sh CALL setpgid(0,0x6e6) 1766 sh RET setpgid -1 errno 1 Operation not permitted 1766 sh CALL ioctl(0xa,TIOCSPGRP,0x7fffda74) 1766 sh RET ioctl 0 1766 sh CALL close(0xa) 1766 sh RET close 0 1766 sh CALL exit(0) Does this ring any bells with anyone? -- Peter Jeremy pgp5N9G6c8SWr.pgp Description: PGP signature
Re: sh(1) exiting on SIGWINCH
On 2012-Jul-04 20:03:32 +1000, Peter Jeremy pe...@server.rulingia.com wrote: I've recently updated a box from 8-stable to 9-stable/amd64 (r237995), compiled with gcc, and now sh(1) exits if I change the window size (ssh'ing to the target system within an xterm). I don't recall ever seeing this sort of behaviour before and am still trying to track down the relevant code path. Someone pointed me at kern/169603 and I can confirm that reverting r237738 (MFC of r237448) fixes the problem. Unfortunately, that is a fairly large patch and so I haven't investigated further. -- Peter Jeremy pgp3dL0c9CGxT.pgp Description: PGP signature
Re: Xorg in swwrt
On 2011-Feb-06 15:19:12 +1030, Daniel O'Connor dar...@dons.net.au wrote: I updated ports (portmaster -a basically) on this 8.2-PRE box and now I find X takes a long, long time to start up and uses lots of CPU. It shows the wchan as swwrt. FWIW, I've run into this a couple of times recently when logging out of X. This is with X.Org X Server 1.10.6 and an ATI Radeon HD 2400 Pro on 8-STABLE r235229. The problem seems to go away after a couple of hours. -- Peter Jeremy pgpiZRKfLAQrH.pgp Description: PGP signature
Re: Seeking 6.4 make source for ports
On 2012-Jun-20 14:56:18 -0400, Michael R. Wayne freebs...@wayne47.com wrote: Have the problem myself. There are ports with security vulnerabilities So, you're happy to have vulnerabilities in the base system but not in your ports? and the recent change broke make for ports. Older releases generally require special code within the ports tree and this code is removed once the relevant branch is no longer supported. So, any chance of getting a 6.4 make compiled for 6.3? Use your favourite source management tool (csup, cvsup, cvs or svn) to check out a copy of /usr/src/usr.bin/make (and any other build infrastructure it needs) for RELENG_6_4 or later and run make in that directory. You could even grab the files from the 6.4-RELEASE src install bundle on an FTP site. -- Peter Jeremy pgpF0FRK4GTke.pgp Description: PGP signature
Re: zfs, 1 gig of RAM and periodic weekly
On 2012-Feb-27 14:48:05 -0800, Freddie Cash fjwc...@gmail.com wrote: You can get away with 2 GB of RAM, if you spend a lot of time manually tuning things to prevent kmem exhaustion and prevent ZFS ARC from starving the rest of the system (especially on the network side of things). I run a system with ZFS and 2GB RAM (though only 40GB disk) without any major tuning (AFAIR, I've only adjusted vfs.zfs.arc_max). That said, more RAM would be better. Definitely go with a 64-bit install. Even with less than 4 GB of RAM, you'll benefit from the large kmem size and better auto-tuning. I'd strongly recommend against running ZFS on i386 as anything other than an experiment. -- Peter Jeremy pgpsJbLUQk3mc.pgp Description: PGP signature
Re: Another ZFS ARC memory question
On 2012-Feb-24 11:06:52 +, Luke Marsden luke-li...@hybrid-logic.co.uk wrote: We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines but have been having trouble with short spikes in application memory usage resulting in huge amounts of swapping, bringing the whole machine to its knees and crashing it hard. I suspect this is because when there is a sudden spike in memory usage the zfs arc reclaim thread is unable to free system memory fast enough. There were a large number of fairly serious ZFS bugs that have been fixed since 8.2-RELEASE and I would suggest you look at upgrading. That said, I haven't seen the specific problem you are reporting. * is this a known problem? I'm unaware of it specifically as it relates to ZFS. You don't mention how big the memory usage spike is but unless there is sufficient free + cache available to cope with a usage spike then you will have problems whether it's UFS or ZFS (though it's possibly worse with ZFS). FreeBSD is known not to cope well with running out of memory. * what is the community's advice for production machines running ZFS on FreeBSD, is manually limiting the ARC cache (to ensure that there's enough actually free memory to handle a spike in application memory usage) the best solution to this spike-in-memory-means-crash problem? Are you swapping onto a ZFS vdev? If so, change back to a raw (or geom) device - swapping to ZFS is known to be problematic. If you have very spiky memory requirements, increasing vm.v_cache_min and/or vm.v_free_reserved might give you better results. * has FreeBSD 9.0 / ZFS v28 solved this problem? The ZFS code is the same in 9.0 and 8.3. Since 8.3 is less of a jump, I'd recommend that you try 8.3-prerelease in a test box and see how it handles your load. Note that there's no need to upgrade your pools from v15 to v28 unless you want the ZFS features - the actual ZFS code is independent of pool version. 
* rather than setting a hard limit on the ARC cache size, is it possible to adjust the auto-tuning variables to leave more free memory for spiky memory situations? e.g. set the auto-tuning to make arc eat 80% of memory instead of ~95% like it is at present? Memory spikes are absorbed by vm.v_cache_min and vm.v_free_reserved in the first instance. The current vfs.zfs.arc_max default may be a bit high for some workloads but at this point in time, you will need to tune it manually. -- Peter Jeremy pgpb0kzq1SDsY.pgp Description: PGP signature
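To make the manual-limit advice concrete, the ARC cap is set as a loader tunable. A minimal /boot/loader.conf sketch follows; the 16G figure is purely an illustrative assumption for a 24GB machine, not a recommendation, and the size suffix is the form commonly used for loader tunables:

```
# /boot/loader.conf -- illustrative sketch only
# Cap the ARC well below physical RAM so application spikes have
# headroom (pick the value from your own worst-case spike, not this one)
vfs.zfs.arc_max="16G"
```

A reboot is required for the tunable to take effect; the current value can be checked afterwards with sysctl vfs.zfs.arc_max.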
Re: disk devices speed is ugly
On 2012-Feb-13 08:28:21 -0500, Gary Palmer gpal...@freebsd.org wrote: The filesystem is the *BEST* place to do caching. It knows what metadata is most effective to cache and what other data (e.g. file contents) doesn't need to be cached. Agreed. Any attempt to do this in layers between the FS and the disk won't achieve the same gains as a properly written filesystem. Agreed - but traditionally, Unix uses this approach via block devices. For various reasons, FreeBSD moved caching into UFS and removed block devices. Unfortunately, this means that any FS that wants caching has to implement its own - and currently only UFS and ZFS do. What would be nice is a generic caching subsystem that any FS can use - similar to the old block devices but with hooks to allow the FS to request read-ahead, advise of unwanted blocks and ability to flush dirty blocks in a requested order with the equivalent of barriers (request Y will not occur until preceding request X has been committed to stable media). This would allow filesystems to regain the benefits of block devices with minimal effort and then improve performance and cache efficiency with additional work. One downside of the each-FS-does-its-own-caching approach is that the caches are all separate and need careful integration into the VM subsystem to prevent starvation (eg past problems with UFS starving ZFS L2ARC). -- Peter Jeremy pgpa3o0LQ2kfG.pgp Description: PGP signature
Re: Can't boot 9.0-RELEASE on sparc64
On 2012-Jan-15 17:02:33 +0100, C. P. Ghost cpgh...@cordula.ws wrote: I'm trying to boot 9.0-RELEASE on sparc64, but I'm getting stuck at: panic: kmem_suballoc: bad status return of 3 cpuid = 0 KDB: stack backtrace: #0 0xc079841c at ??+0 #1 0xc04ca59c at ??+0 #2 0xc0487f90 at ??+0 #3 0xc0098028 at ??+0 I'm not able to break into the kernel debugger from there. This is a SunBlade 1500 with 2GB of RAM, booting from cdrom. For anyone following this on -stable only, it looks like the FreeBSD VM system doesn't like the RAM layout when a SB1500 has 2GB RAM. There's no problem with 1GB or 4GB RAM. The OP has created sparc64/164227 -- Peter Jeremy pgpg7mKRrjkxp.pgp Description: PGP signature
Re: Booting problem for FreeBSD SPARC64
On 2012-Jan-20 21:18:20 +0530, Desai, Kashyap kashyap.de...@lsi.com wrote: I am using below machine and seeing some basic installation problem with FreeBSD 8.2 and 9.0 ( I have not tried other releases) Sun SPARC Enterprise M3000 Server, using Domain console The new M-class machines are not listed in http://www.freebsd.org/platforms/sparc.html and it's likely that they aren't supported. Is there any work around/solution for this issue ? If you wanted to assist with support for the M3000, I suggest you start a thread on freebsd-sparc64. -- Peter Jeremy pgpsIPp38DaXh.pgp Description: PGP signature
Re: FLAME - security advisories on the 23rd ? uncool idea is uncool
On 2011-Dec-23 20:06:10 +0100, Lars Engels lars.eng...@0x20.net wrote: On Fri, Dec 23, 2011 at 06:30:59PM +0100, Bas Smeelen wrote: _but_ FreeBSD is not a distribution It is *a complete operating system* Happy holidays And the D in BSD is for? ;-) FreeBSD is a complete operating system _derived_from_ the Berkeley Software Distribution that used to be available from the now-defunct UCB CSRG. The BSD in FreeBSD acknowledges its roots. And on-topic - yes, the timing sucks (especially since I'm one of the people reading this on the Saturday commencing a long holiday period) but I think the SO made the right call. Hopefully, this was all that was holding up 9.0-RELEASE and RE will be giving us a more welcome Xmas present. -- Peter Jeremy pgpJ5YZU425S5.pgp Description: PGP signature
Re: FLAME - security advisories on the 23rd ? uncool idea is uncool
On 2011-Dec-23 23:40:10 +0200, George Kontostanos gkontos.m...@gmail.com wrote: In any case, and IMHO this was not the proper time for this kind of advisories considering the fact that many companies are in a freeze period. My honeypot logs suggest that the black hats aren't taking a holiday. As Colin posted, the SO had to decide between two unpalatable options and, IMHO, he made the correct decision. The details and fixes are now available - it's up to you to weigh up the risks of patching vs the risks of not patching. -- Peter Jeremy pgpwPaYsswqdf.pgp Description: PGP signature
Re: fsck_ufs out of swapspace
On 2011-Dec-19 22:27:49 +0100, Michiel Boland bolan...@xs4all.nl wrote: Problem solved - it was indeed an endian thing. The problem is that fsck uses a real_dev_bsize variable that is declared long, but the DIOCGSECTORSIZE ioctl takes an u_int argument. To be accurate, this isn't an endian problem, it's a general problem of passing a pointer to an incorrectly sized object. The bug is masked on amd64 and ia64 because real_dev_bsize is statically allocated and therefore initialised to zero. This means the failure to assign the top 32 bits in the ioctl doesn't affect the final result. A PR has been submitted. sparc64/163460 for the record. Thank you for tracking that down. -- Peter Jeremy pgp7m3HL1diGx.pgp Description: PGP signature
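This class of bug is easy to demonstrate in userland. Below is a hedged sketch: fake_ioctl is an invented stand-in for the kernel side of DIOCGSECTORSIZE (which stores exactly 32 bits through the caller's pointer), and query_buggy mimics fsck's long-sized receiver.

```c
#include <stdint.h>
#include <string.h>

/* Invented stand-in for the kernel handler of DIOCGSECTORSIZE: it
 * writes a 32-bit sector size through whatever pointer it is given. */
static void fake_ioctl(void *arg) {
    uint32_t secsize = 512;
    memcpy(arg, &secsize, sizeof(secsize));
}

/* Correct caller: the receiving object matches the ioctl's u_int. */
static uint32_t query_correct(void) {
    uint32_t v = 0;
    fake_ioctl(&v);
    return v;
}

/* The buggy pattern: a long (64-bit on amd64/sparc64) receives only a
 * 32-bit store.  Zero-initialisation (static storage in fsck) plus a
 * little-endian layout masks the bug, because the store lands in the
 * low half of the long.  On big-endian sparc64 the same store lands in
 * the HIGH half, so the "sector size" comes back as 512 << 32. */
static long query_buggy(void) {
    long real_dev_bsize = 0;
    fake_ioctl(&real_dev_bsize);
    return real_dev_bsize;
}
```

Running this on amd64 shows both callers agreeing, which is exactly why the bug survived until someone ran fsck on sparc64.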
Re: bad sector in gmirror HDD
On 2011-Aug-19 20:24:38 -0700, Jeremy Chadwick free...@jdc.parodius.com wrote: The reallocated LBA cannot be dealt with aside from re-creating the filesystem and telling it not to use the LBA. I see no flags in newfs(8) that indicate a way to specify LBAs to avoid. And we don't know what LBA it is so we can't refer to it right now anyway. As I said previously, I have no idea how UFS/FFS deals with this. It doesn't. UFS/FFS and ZFS expect and assume perfect media. It's up to the drive to transparently remap faulty sectors. UFS used to have support for visible bad sectors (and Solaris UFS still reserves space for this, though I don't know if it still works) but the code was removed from FreeBSD long ago. AFAIR, wd(4) supported bad sectors but it was removed long ago. -- Peter Jeremy pgpzqxeB9mDZP.pgp Description: PGP signature
Re: 32GB limit per swap device?
On 2011-Aug-18 12:16:44 +0400, Alexander V. Chernikov melif...@ipfw.ru wrote: The code should look like this: ... (move pages recalculation before b-list check) I notice a very similar patch has been applied to -current as r225076. For the archives, I've done some testing with this patch on a Sun V890 with 64GB RAM and two 64GB swap partitions. Prior to this patch, each swap partition was truncated to 32GB. With this patch, I have 128GB swap. I've tried filling the swap space to over 80GB and I am not seeing any corruption (allocate lots of memory and fill with a known pseudo-random pattern and then verify). -- Peter Jeremy pgpo8PkzVBfqo.pgp Description: PGP signature
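The allocate/fill/verify test described above can be sketched as follows (illustrative code, not the actual test program; in the real run the buffer was sized to force the region out through swap before the verify pass):

```c
#include <stddef.h>
#include <stdint.h>

/* xorshift64: a cheap PRNG whose stream is reproducible from a
 * (nonzero) seed, so fill and verify generate identical sequences. */
static uint64_t prng(uint64_t *s) {
    *s ^= *s << 13; *s ^= *s >> 7; *s ^= *s << 17;
    return *s;
}

/* Fill n words of buf with the seeded stream. */
static void pattern_fill(uint64_t *buf, size_t n, uint64_t seed) {
    for (size_t i = 0; i < n; i++)
        buf[i] = prng(&seed);
}

/* Regenerate the stream and count words that came back changed --
 * nonzero means the data was corrupted on its round trip via swap. */
static size_t pattern_verify(const uint64_t *buf, size_t n, uint64_t seed) {
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] != prng(&seed))
            bad++;
    return bad;
}
```

To actually exercise swap, the allocation has to exceed free RAM (several buffers across processes work too), with the verify pass run only after the pages have been swapped out and back in.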
Re: ZFS directory with a large number of files
On 2011-Aug-02 08:39:03 +0100, seanr...@gmail.com seanr...@gmail.com wrote: On my FreeBSD 8.2-S machine (built circa 12th June), I created a directory and populated it over the course of 3 weeks with about 2 million individual files. As you might imagine, a 'ls' of this directory took quite some time. The files were conveniently named with a timestamp in the filename (still images from a security camera, once per second) so I've since moved them all to timestamped directories (/MM/dd/hh/mm). What I found though was the original directory the images were in is still very slow to ls -- and it only has 1 file in it, another directory. I've also seen this behaviour on Solaris 10 after cleaning out a directory with a large number of files (though not as pathological as your case). I tried creating and deleting entries in an unsuccessful effort to trigger directory compaction. I wound up moving the remaining contents into a new directory, deleting the original one and renaming the new directory. It would appear to be a garbage collection bug in ZFS. On 2011-Aug-02 13:10:27 +0300, Daniel Kalchev dan...@digsys.bg wrote: On 02.08.11 12:46, Daniel O'Connor wrote: I am pretty sure UFS does not have this problem. i.e. once you delete/move the files out of the directory its performance would be good again. UFS would be the classic example of poor performance if you do this. Traditional UFS (including Solaris) behaves badly in this scenario but 4.4BSD derivatives will release unused space at the end of a directory and have smarts to more efficiently skip unused entries at the start of a directory. -- Peter Jeremy pgpmdeH6w8Ny5.pgp Description: PGP signature
Re: SATA 6g 4-port non-RAID controller ?
On 2011-Jul-28 17:57:52 +1000, Jan Mikkelsen j...@transactionware.com wrote: On 28/07/2011, at 2:55 AM, Jeremy Chadwick wrote: I can find you examples on Google of people who invested in Areca ARC-1220 cards (PCIe x8) only to find out that when inserted into one of their two PCIe x16 slots the mainboard wouldn't start (see above). I can also find you examples on Google of people with Intel 915GM chipsets whose user manuals explicitly state the PCIe x16 slot on their board is intended for use with graphics cards only. Just trying to understand; I think I can recall reading about issues with the 915 chipset. I agree a check, don't assume warning is reasonable. I have also run into problems (wouldn't POST from memory) trying to use a NIC in the x16 slot of Dell GX620 boxes, which use an i945 chipset. -- Peter Jeremy pgpxVKRjcgfQY.pgp Description: PGP signature
Re: Status of support for 4KB disk sectors
On 2011-Jul-19 10:54:38 -0700, Chuck Swiger cswi...@mac.com wrote: Unix operating systems like SunOS 3 and NEXTSTEP would happily run with a DEV_BSIZE of 1024 or larger-- they'd boot fine off of optical media using 2048-byte sectors, Actually, Sun used customised CD-ROM drives that faked 512-byte sectors to work around their lack of support for anything else. some of the early 1990's era SCSI hard drives supported low-level reformatting to a different sector size like 1024 or 2048 bytes. Did anyone actually do this? I wanted to but was warned against it by the local OS rep (this was a Motorola SVR2). -- Peter Jeremy pgp9GiYFCh7fP.pgp Description: PGP signature
Re: current status of digi driver
On 2011-Jun-23 17:55:15 -0400, David Boyd david.b...@insightbb.com wrote: It appears that there was also agreement that (at least) some of the drivers, digi included, would be converted soon after 8.0-RELEASE. That came down to developer time and it appeared that I was the only person interested in it. Is there any plan to bring digi forward? See kern/158086 (which updates digi(4)) and kern/152254 (which re- implements TTY functionality that was lost with TTYng). Both include patches that should work on either 8.x or -current. Of the two, the latter is more urgent because it impacts the KBI. We have about 55 modem ports over ten 8-port Xr cards (PCI) that connect remote sites via dial-up. I've only got access to PCI Xem cards that are used for serial console concentration so it would be useful for you to test both the Xr cards and dial-in support. -- Peter Jeremy pgpjTK58QmWt2.pgp Description: PGP signature
Re: Automatic reboot doesn't reboot
On 2011-May-02 16:32:30 +0200, Olaf Seibert o.seib...@cs.ru.nl wrote: However, it doesn't automatically reboot in 15 seconds, as promised. It just sits there the whole weekend, until I log onto the IPMI console and press the virtual reset button. Your reference to IPMI indicates this is not a consumer PC. Can you please provide some details of the hardware. Are you running ipmitools or similar? Does shutdown -r or reboot work normally? panic: kmem_alloc(131072): kmem_map too small: 3428782080 total allocated I suggest you have a read of the thread beginning http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010862.html (note that mailman has split it into at least 3 threads). -- Peter Jeremy pgpQGveibDZlq.pgp Description: PGP signature
Linker set issues with ath(4) HALs
I have an Atheros AR5424 and so, based on the 8.2-STABLE i386 NOTES and some rummaging in the sources, I tried to build a kernel with: device ath # Atheros pci/cardbus NIC's device ath_ar5212 # HAL for Atheros AR5212 and derived chips device ath_rate_sample # SampleRate tx rate control for ath and this died during the kernel linking with: linking kernel.debug ah.o(.text+0x23c): In function `ath_hal_rfprobe': /usr/src/sys/dev/ath/ath_hal/ah.c:142: undefined reference to `__start_set_ah_rfs' ah.o(.text+0x241):/usr/src/sys/dev/ath/ath_hal/ah.c:142: undefined reference to `__stop_set_ah_rfs' ah.o(.text+0x25a):/usr/src/sys/dev/ath/ath_hal/ah.c:142: undefined reference to `__stop_set_ah_rfs' Following a suggestion by a friend, I changed that to: device ath # Atheros pci/cardbus NIC's options AH_SUPPORT_AR5416 device ath_hal # Atheros HAL device ath_rate_sample # SampleRate tx rate control for ath and it worked. Normally, I would leave it at that but I'd like to understand what is actually going on... In both cases, ah.o contains the following 4 references: U __start_set_ah_chips U __start_set_ah_rfs U __stop_set_ah_chips U __stop_set_ah_rfs generated by: /* linker set of registered chips */ OS_SET_DECLARE(ah_chips, struct ath_hal_chip); /* linker set of registered RF backends */ OS_SET_DECLARE(ah_rfs, struct ath_hal_rf); These symbols do not appear in any other .o files, though there are a variety of other __{start,stop}_set_* symbols - all of which show up as 'A' (absolute) values in the final kernel. My questions are: How are these linker set references resolved? I can't find anything that defines these symbols - either in .o files or in ldscript files. In the first case, there are 2 pairs of undefined linker set variables but the linker only reports one pair as unresolved. Why don't both sets show up as resolved or unresolved? Why does using the generic ath_hal, rather than the hardware-specific HAL fix the problem? 
-- Peter Jeremy
Re: Linker set issues with ath(4) HALs
On 2011-Mar-05 11:48:54 +0200, Kostik Belousov kostik...@gmail.com wrote: On Sat, Mar 05, 2011 at 07:50:05PM +1100, Peter Jeremy wrote: I have an Atheros AR5424 and so, based on the 8.2-STABLE i386 NOTES and some rummaging in the sources, I tried to build a kernel with: device ath # Atheros pci/cardbus NIC's device ath_ar5212 # HAL for Atheros AR5212 and derived chips device ath_rate_sample # SampleRate tx rate control for ath ... These symbols do not appear in any other .o files, though there are a variety of other __{start,stop}_set_* symbols - all of which show up as 'A' (absolute) values in the final kernel. My questions are: How are these linker set references resolved? I can't find anything that defines these symbols - either in .o files or in ldscript files. ... Linker synthesizes the symbols assuming the following two conditions are met: - the symbols are referenced; - there exists an ELF section named `set_ah_rfs'. It assigns the (relocated) start of the section to __start_sectionname, and end to __stop_sectionname. Thank you for that. Looking through the output of 'objdump -h' showed that it was user error: when using device ath_ar, it looks like you need to include a device ath_rf as well. After a closer look at my dmesg and available options, I've added device ath_rf2425 and things seem much happier. -- Peter Jeremy
Re: system crash during make installworld
On 2011-Feb-21 08:04:00 +, David J Brooks freys...@comcast.net wrote: As the subject suggests, my laptop crashed during make installworld. The new kernel boots, but the ELF interpreter is not found and I cannot get to a single user prompt. What is the least painful way to proceed? My first suggestion would be to boot the previous kernel. If that doesn't help, try specifying /rescue/sh as the single-user shell. If neither of those work, please specify the exact error message you get and the point where you get it (if you don't have a serial console available, post a link to a picture of the screen showing the issue). -- Peter Jeremy
Re: classes and kernel_cookie was Re: Specifying root mount options on diskless boot.
On 2011-Jan-09 10:32:48 -0500, Daniel Feenberg feenb...@nber.org wrote: Daniel Braniss writes... I have it pxebooting nicely and running with an NFS root but it then reports locking problems: devd, syslogd, moused (and maybe Actually, that was me, not Daniel. Are you mounting /var via nfs? Yes. I'm using diskless in the traditional Sun workstation style - the system itself is running with a normal filesystem which is all NFS mounted from another (FreeBSD) server. I'm aware of the MFS-based read-only approach but didn't want to use it. I note that the response to your message from danny offers the ability to pass arguments to the nfs mount command, Actually, my original mail indicates that I'm aware you can pass options to the NFS mount command (passing nolockd will solve my problem). My issue is that there are several incompatible approaches and none of them work by default. but also seems to offer a fix for the fact that classes are not supported under PXE: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/90368 I wasn't previously aware of that PR but it is consistent with my findings. On 2011-Jan-10 10:52:34 +0200, Daniel Braniss da...@cs.huji.ac.il wrote: I'm willing to try and add the missing pieces, but I need some better explanation as to what they are, for example, I have no clue what the kernel_cookie is used for, nor what the ${class} is all about. I'm also happy to patch the code but feel that both PXE and BOOTP should be consistent and I'm not sure which is the correct approach. BTW, it would be kind if the line in the pxeboot(8): As PXE is still in its infancy ... can be changed :-) Well, there are still some issues with PXE booting FreeBSD - eg as discussed here. But, I agree, that comment can probably go. -- Peter Jeremy
Re: sed is broken under freebsd?
On 2011-Jan-12 02:32:52 +0100, Oliver Pinter oliver.p...@gmail.com wrote: The FreeBSD version of sed contains a bug/regression when substituting the \n char; gsed is not affected by this bug: gsed contains non-standard extensions and you have been suckered into using them. Try using 'gsed --posix' and/or setting POSIXLY_CORRECT. This is part of the GNU/FSF lock-in policy that encourages people to use their non-standard extensions to ensure that you don't have any choice other than to use their software. -- Peter Jeremy
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
On 2010-Dec-30 12:40:00 +0100, Damien Fleuriot m...@my.gd wrote: What are the steps for properly removing my drives from the zraid1 pool and inserting them in the zraid2 pool ? I've documented my experiences in migrating from a 3-way RAIDZ1 to a 6-way RAIDZ2 at http://bugs.au.freebsd.org/dokuwiki/doku.php/zfsraid Note that, even for a home system, backups are worthwhile. In my case, I back up onto a 2TB disk in an eSATA enclosure. That's currently (just) adequate but I'll soon need to identify data that I can leave off that backup. -- Peter Jeremy
Specifying root mount options on diskless boot.
[I'm not sure if -stable is the best list for this but anyway...] I'm trying to convert an old laptop running FreeBSD 8.0 into a diskless client (since its internal HDD is growing bad spots faster than I can repair them). I have it pxebooting nicely and running with an NFS root but it then reports locking problems: devd, syslogd, moused (and maybe others) lock their PID file to protect against multiple instances. Unfortunately, these daemons all start before statd/lockd and so the locking fails and reports "operation not supported". It's not practical to reorder the startup sequence to make lockd start early enough (I've tried). Since the filesystem is reserved for this client, there's no real need to forward lock requests across the wire and so specifying nolockd would be another solution. Looking through sys/nfsclient/bootp_subr.c, DHCP option 130 should allow NFS mount options to be specified (though it's not clear that the relevant code path is actually followed because I don't see the associated printf()s anywhere on the console). After getting isc-dhcpd to forward this option (made more difficult because its documentation is incorrect), it still doesn't work.
Understanding all this isn't helped by kenv(8) reporting three different sets of root filesystem options:
boot.nfsroot.path=/tank/m3
boot.nfsroot.server=192.168.123.200
dhcp.option-130=nolockd
dhcp.root-path=192.168.123.200:/tank/m3
vfs.root.mountfrom=nfs:server:/tank/m3
vfs.root.mountfrom.options=rw,tcp,nolockd
And the console also reports conflicting root definitions:
Trying to mount root from nfs:server:/tank/m3
NFS ROOT: 192.168.123.200:/tank/m3
Working through all these: boot.nfsroot.* appears to be initialised by sys/boot/i386/libi386/pxe.c but, whilst nfsclient/nfs_diskless.c can parse boot.nfsroot.options, there's no code to initialise that kenv name in pxe.c. dhcp.* appears to be initialised by lib/libstand/bootp.c - which does include code to populate boot.nfsroot.options (using vendor-specific DHCP option 20) but this code is not compiled in. Further study of bootp.c shows that it's possible to initialise arbitrary kenvs using DHCP options 246-254 - but the DHCPDISCOVER packets do not request these options so they don't work without special DHCP server configuration (to forward options that aren't requested). vfs.root.* is parsed out of /etc/fstab but, other than being reported in the console message above, it doesn't appear to be used in this environment (it looks like the root entry can be commented out of /etc/fstab without problem). My final solution was to specify 'boot.nfsroot.options=nolockd' in loader.conf - and this seems to actually work. It seems rather unfortunate that FreeBSD has code to allow NFS root mount options to be specified via DHCP (admittedly in several incompatible ways) but none actually work. A quick look at -current suggests that the situation there remains equally broken. Has anyone else tried to use any of this? And would anyone be interested in trying to make it actually work? -- Peter Jeremy
Re: slow ZFS on FreeBSD 8.1
On 2010-Dec-30 07:20:57 -0500, Dan Langille d...@langille.org wrote: The reason I've not installed ZFS on root is because of the added complications. I run the OS on ufs (with gmirror) and my data is on ZFS. We must be hanging out with different groups. Most of the people I know don't have ZFS on root. My primary system at home is set up this way - primarily because at the time I built it (Nov 2008), I felt ZFS was a bit immature and wanted to have src and obj on UFS so I could do a rebuild if I lost access to ZFS for some reason. My experience has been that the UFS root has caused me far more headaches than the ZFS parts. I've since done some reconfiguration and plan to switch to ZFS root soon. Based on my experiences at home, I converted my desktop at work to pure ZFS. The only issues I've run into have been programs that extensively use mmap(2) - which is a known issue with ZFS. -- Peter Jeremy
Re: slow ZFS on FreeBSD 8.1
On 2010-Dec-30 02:31:30 -0500, Adam Stylinski kungfujesu...@gmail.com wrote: I can tell you what the problem is right now, actually. ZFS performs very poorly on low performance CPUs (i.e. your Atom N330). I would disagree. In this case, the OP's most serious problem is a bug in sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:arc_memory_throttle() which is leading to ARC starvation. The direct effect of this is very poor ZFS I/O performance. It can be identified by very high inactive and possibly cache memory (as reported by 'systat -v' or top) as well as a very high kstat.zfs.misc.arcstats.memory_throttle_count. This bug was fixed in r210427 on -current, r211599 on 8.x and r211623 on 7.x. Try the same system with a different CPU and you'll get a different result. Not until the above bug is fixed. That said, ZFS is far more CPU intensive than UFS and a more powerful CPU may help - especially if you want gzip compression and/or sha256 checksumming. -- Peter Jeremy
Re: slow ZFS on FreeBSD 8.1
On 2010-Dec-31 15:47:47 -0800, Jeremy Chadwick free...@jdc.parodius.com wrote: Is your ZFS root filesystem associated with a pool that's mirrored or using raidzX? Currently, mirrored. I'm considering raidz1 at home. Note that my work system is a single pool, whereas I'll use a separate pool for root at home. What about mismatched /boot content (ZFS vs. UFS)? Can you give me an example of what you mean here? What about booting into single-user mode? I haven't run into any problems here, though I agree that starting ZFS in single-user mode is a lot messier than starting UFS. error/mistake). Is it worth the risk? Most administrators don't have the tolerance for stuff like that in the middle of a system upgrade or whatnot; they should be able to follow exactly what's in the handbook, to a tee. I've been using FreeBSD for long enough that I'm confident enough to upgrade or similar without blindly following a process. But I agree that FreeBSD should be usable without needing to be a guru. There's a link to www.dan.me.uk at the bottom of the above Wiki page that outlines the madness that's required to configure the setup, all of which has to be done by hand. I don't know many administrators who are going to tolerate this when deploying numerous machines, especially when compounded by the complexities mentioned above. Root on ZFS is still very bespoke. I agree there's no way you could roll it out across lots of machines at present but I'm happy to hand-craft installs on a few machines. Hopefully, son-of-sysinstall will support ZFS installs (one prerequisite is someone being willing to do the work). The mmap(2) and sendfile(2) complexities will bite a junior or mid-level SA in the butt too -- they won't know why software starts failing or behaving oddly (FreeBSD ftpd is a good example). It just so happens that Apache, out-of-the-box, comes with mmap and sendfile use disabled. mmap(2) is a design problem with ZFS - it's present on Solaris as well.
IMHO, it's the biggest flaw in ZFS. The sendfile(2) issues haven't bitten me so I haven't studied them as much but I'm aware that some fixes were committed recently. Oh, and one root-on-ZFS gotcha that I missed is the lack of gzip support. I spent about half a day tracking that down - not helped by the lack of any documentation or a useful error message (though there is a comment in the code when you eventually track it down). -- Peter Jeremy
Re: root mount error
On 2010-Dec-28 23:08:44 +0300, Michael BlackHeart amdm...@gmail.com wrote: I'm just trying to say that recent changes in the kernel or kernel modules broke my HDD support and I'd like the developers to check where the problem is. It doesn't work that way. The developers don't have a problem or it would have been fixed. You are going to need to provide more details and do some investigation yourself to help identify the problem. The loader at its own stage easily detects the HDD and root partition so I can just select the old kernel and boot up, but I'm not sure how it gains access to the HDD to make any conclusion - probably through BIOS interrupts, but that's beside the point. Yes. Until the kernel starts, all I/O is via the BIOS. And unfortunately I don't know how to dump dmesg without having any serial connection or usable disk drive - maybe to a flash drive, but I don't know how. And anyway there's no real kernel panic, it just asks for the root mountpoint. Best suggestion I can offer is to take photographs of the boot messages (you can use scroll-lock to let you scroll back) and post them somewhere. If you need any additional info I'll give it all, just ask. What is the SVN revision of a kernel that works? What is the SVN revision of a kernel that fails? Can you please post a verbose dmesg of a successful boot? Can you please post a dmesg of an unsuccessful boot (see above)? -- Peter Jeremy
Re: 8.1 livelock/hangup: possible actions
On 2010-Dec-11 18:14:28 +0500, Eugene M. Zheganin e...@norma.perm.ru wrote: I'm having problems with 8.1-REL/zfs/amd64. It's an IBM x3250 m2 system, 1GB RAM, dual-core Intel E3110, two bge(4) and an LSI1064e disk controller. 1GB RAM is really light for ZFS and there are some known ARC issues in 8.1 that can lead to free memory starvation. The most obvious indicator of this issue is that free memory reported by top or systat -v drops _very_ low although there is plenty of cache and inactive memory. If you can't update to 8-stable, try changing arc_memory_throttle() in /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c to have
uint64_t available_memory = ptoa((uintmax_t)cnt.v_free_count + cnt.v_cache_count);
instead of
uint64_t available_memory = ptoa((uintmax_t)cnt.v_free_count);
at the top of the function. This fixes the worst bug but there are lots of other fixes if you upgrade. -- Peter Jeremy
Re: idprio processes slowing down system
On 2010-Nov-28 02:24:21 -0600, Adam Vande More amvandem...@gmail.com wrote: On Sun, Nov 28, 2010 at 1:26 AM, Peter Jeremy peterjer...@acm.org wrote: Since all the boinc processes are running at i31, why are they impacting a buildkernel that runs with 0 nicety? With the setup you presented you're going to have a lot of context switches as the buildworld is going to give plenty of opportunities for boinc processes to get some time. Agreed. When it does switch out, the CPU cache is invalidated, then invalidated again when the buildworld preempts back. Not quite. The amd64 uses physically addressed caches (see [1] 7.6.1) so there's no need to flush the caches on a context switch. (Though the TLB _will_ need to be flushed since it does virtual-to-physical mapping (see [1] 5.5)). OTOH, whilst the boinc code is running, it will occupy space in the caches, thus reducing the effective cache size and presumably reducing the effective cache hit rate. This is what makes it slow. Unfortunately, I don't think this explains the difference. My system doesn't have hyperthreading so any memory stalls will block the affected core and the stall time will be added to the currently running process. My timing figures show that the user and system time is unaffected by boinc - which is inconsistent with the slowdown being due to the impact of boinc on caching. I've done some further investigations following a suggestion from a friend. In particular, an idprio process should only be occupying idle time so the time used by boinc and the system idle task whilst boinc is running should be the same as the system idle time whilst boinc is not running.
Re-running the tests and additionally monitoring process times gives me the following idle time stats:
x /tmp/boinc_running   + /tmp/boinc_stopped
    N      Min      Max   Median       Avg      Stddev
x   4    493.3   507.78   501.69   499.765   6.3722759
+   4   332.35   392.08   361.84   356.885   26.514364
Difference at 95.0% confidence
	-142.88 +/- 33.364
	-28.5894% +/- 6.67595%
	(Student's t, pooled s = 19.2823)
The numbers represent seconds of CPU time charged to [idle] (+) or [idle] and all boinc processes (x). This shows that when boinc is running, it is using time that would not otherwise be idle - which isn't what idprio processes should be doing. My suspicion is that idprio processes are not being preempted immediately a higher priority process becomes ready but are being allowed to continue to run for a short period (possibly until their current timeslice expires). Unfortunately, I haven't yet worked out how to prove or disprove this. I was hoping that someone more familiar with the scheduler behaviour would comment. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming http://support.amd.com/us/Processor_TechDocs/24593.pdf -- Peter Jeremy
idprio processes slowing down system
Since scheduler issues have been popular lately, I thought I'd investigate a ULE issue I've been aware of for a while... I normally have some boinc (ports/astro/boinc) applications running and I'd noticed that my nightly builds appear to end much sooner when there's no boinc work units (this has been common for setiathome). This morning, I timed 4 make -j3 KERNCONF=GENERIC buildkernel of 8-stable with the following results:
boinc running:
1167.839u 287.055s 18:45.69 129.2% 6140+1975k 1+0io 114pf+0w
1166.431u 288.265s 18:00.16 134.6% 6139+1975k 0+0io 106pf+0w
1168.490u 287.599s 17:52.24 135.7% 6137+1975k 0+0io 106pf+0w
1165.747u 287.641s 17:10.38 141.0% 6138+1975k 0+0io 106pf+0w
boinc stopped:
1165.052u 291.492s 15:54.72 152.5% 6125+1972k 0+0io 106pf+0w
1166.101u 290.305s 15:42.54 154.5% 6132+1973k 0+0io 106pf+0w
1165.248u 290.335s 15:35.93 155.5% 6132+1974k 0+0io 106pf+0w
1166.100u 289.749s 15:26.35 157.1% 6137+1974k 0+0io 106pf+0w
Since the results were all monotonically reducing in wallclock time, I decided to do a further 4 buildkernels with boinc running:
1168.242u 284.693s 17:33.05 137.9% 6140+1975k 0+0io 106pf+0w
1167.191u 285.332s 17:19.27 139.7% 6140+1976k 0+0io 106pf+0w
1224.813u 291.963s 20:14.90 124.8% 6121+1966k 0+0io 106pf+0w
1213.132u 294.564s 19:48.98 126.8% 6116+1967k 0+0io 106pf+0w
ministat(1) reports there is no statistical difference in the user or system time:
User time:
x boinc_running   + boinc_stopped
    N       Min       Max    Median        Avg      Stddev
x   8  1165.747  1224.813  1168.242  1180.2356    24.12896
+   4  1165.052  1166.101    1166.1  1165.6253  0.55457454
No difference proven at 95.0% confidence
System time:
x boinc_running   + boinc_stopped
    N      Min      Max   Median        Avg      Stddev
x   8  284.693  294.564  287.641    288.389   3.3142183
+   4  289.749  291.492  290.335  290.47025  0.73252412
No difference proven at 95.0% confidence
But there is a significant difference in the wallclock time:
x boinc_running   + boinc_stopped
    N      Min     Max   Median        Avg     Stddev
x   8  1030.38  1214.9  1080.16  1100.5838  69.364795
+   4   926.35  954.72   942.54    939.885  11.915879
Difference at 95.0% confidence
	-160.699 +/- 79.6798
	-14.6012% +/- 7.23977%
	(Student's t, pooled s = 58.4006)
Since all the boinc processes are running at i31, why are they impacting a buildkernel that runs with 0 nicety? System information: AMD Athlon(tm) Dual Core Processor 4850e (2511.45-MHz K8-class CPU) running FreeBSD/amd64 from just before 8.1-RELEASE with WITNESS and WITNESS_SKIPSPIN, 8GB RAM, src and obj are both on ZFS. -- Peter Jeremy
Re: ZFS backups: retrieving a few files?
On 2010-Nov-24 11:07:23 +0100, Alexander Leidinger alexan...@leidinger.net wrote: Quoting Peter Jeremy peterjer...@acm.org (from Wed, 24 Nov 2010 06:32:07 +1100): BTW, the entire export is performed at the current compression level - recompressing existing data. Are you sure the compression is done on the sending side, and not at the receiving side? I would expect the later (as I can specify a different compression level on an existing destination, if I remember correctly). Sorry, that was poorly worded. The actual send stream is not compressed but the entire filesystem stream will be re-compressed on the receive side as specified by the compression parameter on the sending filesystem. -- Peter Jeremy
Re: ZFS backups: retrieving a few files?
On 2010-Nov-23 23:45:43 +1100, Andrew Reilly arei...@bigpond.net.au wrote: zfs send -vR tank/h...@0 | zfs receive -d /backup/snapshots in order to experiment with this strategy. One would then become alarmed when one discovered that the receive mechanism also invoked the mountpoint= parameter of the source filesystem, and the zfs propensity for just doing stuff, and boom: you have a read-only version of your home directory mounted *on top of* your actual home directory... Been there, done that. The undocumented '-u' option to receive will prevent the receive side performing mounts. The poorly documented '-R' option on import allows you to specify an alternative root mountpoint. Once you have done the initial transfer, you can also set 'canmount=noauto' on each fileset (it isn't inherited) to prevent ZFS automounting them on import. BTW, the entire export is performed at the current compression level - recompressing existing data. -- Peter Jeremy