Re: Problems with pf + ftp-proxy on gateway
--- Renato Botelho [EMAIL PROTECTED] wrote:
I'm trying to use pf + ftp-proxy on a 6.1-PRERELEASE machine. I have this line in inetd.conf:

    ftp-proxy stream tcp nowait root /usr/libexec/ftp-proxy ftp-proxy -n

And these lines in pf.conf:

    rdr on $int_if proto tcp from any to any port ftp -> 127.0.0.1 port ftp-proxy
    pass in quick on $ext_if inet proto tcp from any port ftp-data to $ext_if:0 user proxy flags S/SA keep state

When a machine inside my network (e.g. 192.168.x.x) connects to an external FTP server (e.g. ftp.FreeBSD.org), the data connection doesn't work. The connection reaches my firewall and is accepted, but it is never established and stays like this:

    self tcp 200.x.x.x:57625 - 200.x.x.x:20 ESTABLISHED:FIN_WAIT_2

You need to decide whether you are working with passive FTP clients (probably), active ones, or both.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Script-friendly (parseble) ps(1) output?
Hello,

I need to write a CGI script which will print the output of ps(1) in an HTML table, so that the average operator can click on a KILL link and the CGI will send the selected signal. I need one ps field per column of the HTML table; however, I found ps(1) output too hard to parse. There is no separator. I believed \t was the separator, but it's not. The ps(1) command I need to use is:

    ps -ax -o pid -o user -o emul -o lstart -o lockname -o stat -o command

Since many of those fields contain [:space:] in their output, I cannot use [:space:] as a separator. Sadly, `-o field='value'` only formats the HEADER output, not the values. I've got no clue what to do; can someone enlighten me? Thank you all in advance.

--
===
Eduardo Meyer
personal: [EMAIL PROTECTED]
professional: [EMAIL PROTECTED]

Here is something simple, and you can wrap the HTML around it:

    poshta:$ps axuww | while read USER PID CPU MEM VSZ RSS TT STAT STARTED TIME COMMAND; do echo $PID $CPU $USER $COMMAND;done |head -3
    PID %CPU USER COMMAND
    11 89.6 root [idle]
    5127 2.9 qscand spamd child (perl5.8.8)

The read splits on any run of white space... and the last variable in that 'while read' will hold everything beyond it, i.e.:

    poshta:$ps axuww| while read USER PID CPU MEM VSZ RSS TT STAT STARTED TIME; do echo $PID $CPU $USER $TIME;done |head -3
    PID %CPU USER TIME COMMAND
    11 77.9 root 138080:11.91 [idle]
    13607 5.0 qscand 0:09.12 spamd child (perl5.8.8)

etc. etc...

]Peter[
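If the CGI is written in C rather than sh, the same `read` behaviour (split on runs of white space, last variable gets the rest of the line) can be reproduced with a small helper. This is a minimal sketch; `split_like_read` is a hypothetical name, not a library function. Like the shell trick, it only helps when the single column that may contain spaces is requested last (here `command`); a column such as `lstart`, which expands to several words in the middle of the line, still has to be accounted for explicitly.

```c
#include <ctype.h>
#include <string.h>

/*
 * Split LINE in place into at most N whitespace-separated fields, the way
 * the shell's `read f1 f2 ... fN` does: runs of blanks separate fields and
 * the last field keeps the rest of the line, embedded spaces included.
 * FIELDS must have room for N pointers. Returns the number of fields found.
 */
static int split_like_read(char *line, char **fields, int n)
{
    char *p = line;
    char *end = line + strlen(line);
    int count = 0;

    /* Drop trailing white space, including the newline. */
    while (end > line && isspace((unsigned char)end[-1]))
        *--end = '\0';

    while (count < n) {
        while (isspace((unsigned char)*p))      /* skip leading blanks */
            p++;
        if (*p == '\0')
            break;
        fields[count++] = p;
        if (count == n)
            break;                              /* last field keeps the remainder */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        if (*p != '\0')
            *p++ = '\0';                        /* terminate this field */
    }
    return count;
}
```

With the 11 columns of `ps axuww`, the eleventh pointer ends up at "spamd child (perl5.8.8)", ready to be HTML-escaped into its own table cell.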
stable/8/UPDATING - no mention of 8.0 release
Hi,

Been updating my src via svn and following stable/8. Looking at the UPDATING file, it does not mention '8.0-RELEASE'. Did it just not make it in there yet?

http://svn.freebsd.org/viewvc/base/stable/8/UPDATING?view=markup
vs.
http://svn.freebsd.org/viewvc/base/release/8.0.0/UPDATING?view=markup

It confused me for a while after I did an 'svn up' and did not see the release notes... [for the 7.X series, both stable/ and release/ mention the release in UPDATING]

]Peter[
panic: spin lock held too long 8.0-STABLE r207638
Hi,

I've got a system that panics, anywhere from 10 minutes to several days later, whenever I launch another VirtualBox instance. I was able to get this system to panic constantly yesterday by trying to install Windows 2008, and after it was eventually installed, it only ran for ~45 minutes before panicking. Some installs panicked at around 2% done, most at ~80%, and one actually completed. I'm trying to figure out whether it's my hardware or something else.

I used to have a Windows XP VM running alongside, and the panics happened about once a week; eventually I got lazy and didn't start the XP VM. [less load and no swap usage, and no panics for 1.5 months until yesterday] Yesterday I tried to install Windows 2008, and in the ~6 hours I was messing around, it panicked around 8 times [after random amounts of time]:

    panic: spin lock held too long

[cpuid either 2, 3, or 4 as far as I remember]

4GB of RAM
AMD X4 CPU
8.0-STABLE r207638 amd64
vbox 3.1.6
mostly GENERIC kernel [sched_ule], with firewall/altq compiled in and unnecessary hardware removed from the kernel config.

With just one VM [FreeBSD 8-stable, 2 CPU, 2GB RAM] the system runs fine for months. [It's also a file/print server.] As soon as I try to get a Windows VM running on there, and it starts using some swap, anywhere from 20MB to 150MB, it eventually panics [usually within an hour or so].

Any ideas? Hardware issues? Has anyone been successful in running several VMs on FreeBSD with VirtualBox? [overloading it?]

]Peter[
Re: Some questions about jails on FreeBSD9.0-RC1
On 10/26/2011 03:12 AM, Patrick Lamaiziere wrote:
On Tue, 25 Oct 2011 22:52:55 +0200, carlopmart carlopm...@gmail.com wrote:

Hello,

I have installed a FreeBSD 9.0-RC1 host to run different services (dns, smtp and www only) using jails. This host has two physical NICs: em0 and em1. em0 is assigned to the physical host, and I would like to assign em1 to the jails. But em0 and em1 are on different networks: em0 is on 192.168.1.0/24 and em1 on 192.168.2.0/29.

I have set up one jail using ezjail. My first surprise is that ezjail only installs -RELEASE versions and not RC versions. OK, I suppose that is normal. But my first question is: can I install a FreeBSD 8.2 jail under a FreeBSD 9.0 host?

You may run ports installed on 8.2 under 9.0 by using the port /usr/ports/misc/compat8x/, but I suggest upgrading the ports ASAP.

And the real question: how do I need to configure the network in this jail to access it? I have configured the ifconfig parameters for em1 in the host's rc.conf, but what about the default route in this jail? I thought of using pf rules, but I am not sure.

jail enforces the use of the jail IP address in the jail, but that's all. Just enable routing on the host.

But that is not possible. There is a firewall between the host and the jail... I can't do simple routing on the host. Maybe a possible solution is to use policy-based source routing?

--
CL Martinez
carlopmart {at} gmail {d0t} com

I'm using FIBs. The host is on a private network with a gateway of 192.168.1.1, and the jails are on a public network with their own real/public gateway.

FIBs work without the box becoming a gateway:

    %grep gateway /etc/rc.conf
    gateway_enable=NO

I have this in system startup to set up the public gateway for the jails:

    %cat /usr/local/etc/rc.d/0.setfib.sh
    #!/bin/sh
    echo setfib 1 for public jails
    /usr/sbin/setfib 1 /sbin/route add default 216.241.167.1

and in /usr/local/etc/ezjail/myjail I added this line to the end of the config:

    export jail_myjail_fib=1

[/usr/sbin/jail has FIB support built in, but at that time ezjail did not, so I had to add it to the config manually. Nowadays I believe ezjail supports FIBs natively, but the resulting config file is the same.]

The host uses NAT to get out via its private IP, and the jails are reachable via their public IPs. All the IPs are defined in rc.conf the normal _alias way. FIB support, as I remember, needs a custom kernel; not sure about 9, this is on 8.2.

I even run OpenBSD spamd on the host, using FIBs to start the spamd daemon via a 'setfib 1' wrapper script:

    %cat /usr/local/etc/rc.d/obspamdfib.sh
    #!/bin/sh
    #
    # this just calls the original file, but with setfib 1
    /usr/sbin/setfib 1 /usr/local/etc/rc.d.fib/obspamd $1

I moved the 'obspamd' startup script to rc.d.fib just so that the 'setfib 1' wrapper is what gets called.

]Peter[

FIBs are awesome when you don't have many public IPs and when the host is _only_ a jail host running no services.
Re: why is pkg_add on 9.0 stable still using packages from packages-9-current?
I've just updated from 9.0RC3 to 9.0-stable r229626 after a clean install. However, when trying to add the first packages, I noticed that pkg_add is installing packages from ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-9-current/ and not from packages-9-stable, which I would expect, particularly because the packages-9-stable directory now exists on the server.

Using Google, this appears to have to do with /usr/src/usr.sbin/pkg_install/add/main.c, in which the following revision appears to add the link:

http://svnweb.freebsd.org/base/head/usr.sbin/pkg_install/add/main.c?r1=225757&r2=225756&pathrev=225757

However, if I look at this specific source file on my system, line 98, which brings in the link to packages-9-stable, is missing (the file has packages-9.0-release, but not packages-9-stable).

Any suggestion on what I am doing wrong would be highly appreciated, as I don't understand this.

*) uname -a shows 9.0-stable r229626; the update to latest stable has been done according to the handbook. No variables that could affect this have been changed, as it was a clean install using the 9.0RC3 bootdisk.

Should have stayed at -release :)

I just had the same question, and after doing some further research, the issue is this:

http://www.freebsd.org/doc/en/books/porters-handbook/freebsd-versions.html

and

    pkbsdpkg:#pwd
    /usr/src/usr.sbin/pkg_install
    pkbsdpkg:#grep -R packages-9 *
    add/main.c:    { 90, 900499, "/packages-9.0-release" },
    add/main.c:    { 90, 999000, "/packages-9-current" },

They haven't updated main.c for -stable yet, and -stable is at 900500 per sys/sys/param.h [or URL]. I'm just setting PACKAGESITE=ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-9-stable/Latest/ manually.

]Peter[
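Why 900500 ends up in packages-9-current can be seen with a simplified model of that table. This is a hypothetical sketch, not the real pkg_install code: it models only the upper __FreeBSD_version bound from the grep output (the leading "90" column is left out), and it assumes the table is scanned top to bottom with the first covering row winning.

```c
#include <stddef.h>

/* Hypothetical, simplified model of the rows grepped from add/main.c. */
struct release_dir {
    int hiver;              /* highest __FreeBSD_version this row covers */
    const char *directory;  /* package subdirectory on the FTP mirror */
};

static const struct release_dir releases[] = {
    { 900499, "/packages-9.0-release" },
    { 999000, "/packages-9-current" },
};

/* Assumed lookup rule: the first row whose bound covers OSRELDATE wins.
 * Returns NULL if no row matches. */
static const char *pick_directory(int osreldate)
{
    for (size_t i = 0; i < sizeof releases / sizeof releases[0]; i++)
        if (osreldate <= releases[i].hiver)
            return releases[i].directory;
    return NULL;
}
```

Under this model a 9.0-RELEASE system (below 900500) matches the first row, while 9-STABLE at 900500 skips it and falls through to "/packages-9-current", since there is no packages-9-stable row in between. Setting PACKAGESITE by hand bypasses the table entirely.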
ZFS l2arc broken in 10.3
details to follow
[fixed] ZFS l2arc broken in 10.3
sendbug seems not to work anymore; I end up on websites with marketing-babble and finally get asked to provide some login and passwd. :( But the former mail seems to have come back to me, so it seems I'm still allowed to post here...

    *** sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c.orig  Wed Oct 12 21:07:25 2016
    --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c       Wed Oct 12 21:46:16 2016
    ***************
    *** 6508,6514 ****
                     */
                    buf_sz = hdr->b_size;
                    align = (size_t)1 << dev->l2ad_vdev->vdev_ashift;
    !               buf_a_sz = P2ROUNDUP(buf_sz, align);

                    if ((write_asize + buf_a_sz) > target_sz) {
                            full = B_TRUE;
    --- 6508,6514 ----
                     */
                    buf_sz = hdr->b_size;
                    align = (size_t)1 << dev->l2ad_vdev->vdev_ashift;
    !               buf_a_sz = P2ROUNDUP_TYPED(buf_sz, align, uint64_t);

                    if ((write_asize + buf_a_sz) > target_sz) {
                            full = B_TRUE;
Re: ZFS l2arc broken in 10.3
Details:

After upgrading 2 machines from 9.3 to 10.3-STABLE, on one of them the l2arc stays empty (capacity alloc = 0), although it is online and gets accessed. It worked well on 9.3.

I did the following tests:

* Create a zpool on a stick, with two volumes: one filesystem and one cache. The cache stays with alloc=0. Export it and move it into the other machine: the cache immediately fills. Move it back: the cache stays with alloc=0.
  -> this rules out all zpool/zfs get/set options, as they should travel with the pool.
* Boot the GENERIC kernel. l2arc stays with alloc=0.
  -> this rules out all my nonstandard kernel options.
* Boot in single user mode. l2arc stays with alloc=0.
  -> this rules out all /etc/* config files.
* Delete the zpool.cache and reimport the pools. l2arc stays with alloc=0.
* Copy the /boot/loader.conf settings to the other machine. The l2arc still works there.

I could not think of any remaining place where this could come from, except the kernel code itself. There, I found these counters nicely incrementing each second:

    kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 50758
    kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 27121
    kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 40589375488

But also this counter incrementing:

    kstat.zfs.misc.arcstats.l2_write_full: 14604

Then, with some printf in the code, I saw these values:

    buf_sz = hdr->b_size;
    align = (size_t)1 << dev->l2ad_vdev->vdev_ashift;
    buf_a_sz = P2ROUNDUP(buf_sz, align);
    if ((write_asize + buf_a_sz) > target_sz) {
            full = B_TRUE;
            mutex_exit(hash_lock);
            ARCSTAT_BUMP(arcstat_l2_write_full);
            break;
    }

    buf_sz = 1536
    align = 512
    buf_a_sz = 18446744069414585856
    write_asize = 0
    target_sz = 16777216

where buf_a_sz is obviously off by (2^64 - 2^32): 18446744069414585856 = 2^64 - 2^32 + 1536. Maybe this is an effect of cross-compiling i386 on amd64. But anyway, as long as i386 is still supported, it should not happen.

Now, my real concern is: if this really obvious ... made it undetected until 10.3, how many other missing typecasts are still in the code??
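The effect of the missing cast can be reproduced in user space. A minimal sketch, assuming the two macros match the OpenSolaris `sys/sysmacros.h` definitions (reproduced from memory here; check your own tree), with `uint32_t` standing in for the 32-bit `size_t` of i386. The function names are made up for the demo.

```c
#include <stdint.h>

/* Rounding macros as I recall them from OpenSolaris <sys/sysmacros.h>. */
#define P2ROUNDUP(x, align)             (-(-(x) & -(align)))
#define P2ROUNDUP_TYPED(x, align, type) (-(-(type)(x) & -(type)(align)))

/*
 * Emulate i386: buf_sz is 64 bits wide, but align is a size_t, which is
 * only 32 bits there. uint32_t stands in for that size_t.
 */
static uint64_t round_untyped(uint64_t buf_sz, uint32_t align)
{
    /* -(align) is computed in 32 bits (0xFFFFFE00 for 512) and then
     * zero-extended, so the mask clears the upper 32 bits of -(buf_sz);
     * the final negation turns them into 0xFFFFFFFF. */
    return P2ROUNDUP(buf_sz, align);
}

static uint64_t round_typed(uint64_t buf_sz, uint32_t align)
{
    /* Both operands are widened to uint64_t before negation. */
    return P2ROUNDUP_TYPED(buf_sz, align, uint64_t);
}
```

With buf_sz = 1536 and align = 512, `round_untyped` returns exactly the 18446744069414585856 from the printf output, while `round_typed` returns 1536. Since write_asize + buf_a_sz then exceeds target_sz (16777216) on the very first buffer, every feed stops immediately and bumps l2_write_full, which matches the counters above. On amd64, size_t is already 64 bits, so both operands have the same width and the untyped macro gives the right answer.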
Re: Nightly disk-related panic since upgrade to 10.3
Andrea Venturoli wrote:
Hello.
Last week I upgraded a 9.3/amd64 box to 10.3: since then, it has crashed and rebooted at least once every night.

Hi,

I have a quite similar issue, crash dumps every night, but my stack trace is different (crashing mostly in cam/scsi/scsi.c), and my environment is also quite different (old i386, individual disks, extensive use of ZFS), so there is very likely a different cause here. Also, the upgrade is not the only change here: I also recently replaced a burnt power supply and added an SSD cache.

Basically you have two options:

A) Fire up kgdb, go into the code and try to understand what exactly is happening. This depends on whether you have enough clue to go that way; I found "man 4 gdb" and especially the "Debugging Kernel Problems" PDF by Greg Lehey quite helpful.

B) Systematically change parameters. Start by figuring out from the logs the exact time of the crash and what was happening then, and try to reproduce that. Then change things and isolate the cause. Having a RAID controller is a bit ugly in this regard, as it is more or less a black box, and it is difficult to change parameters or swap components.

The only exception was on Friday, when it locked up without rebooting: it still answered ping requests and logins through HTTP would half work. I'm under the impression that the disk subsystem was hung, so ICMP would work since it does no I/O, and HTTP worked too as long as no disk access was required.

Yep. That tends to happen. It doesn't give much of a clue, except that there is a disk-related problem.
Re: zfs, a directory that used to hold lot of files and listing pause
Eugene M. Zheganin wrote:
Hi.

I have FreeBSD 10.2-STABLE r289293 (but I have observed this situation on different releases) and ZFS. I also have one directory that used to hold a lot of files (tens of thousands). It surely takes a lot of time to get a listing of it. But now I have 2 files and a couple dozen directories in it (I sorted the files into directories). Surprisingly, there's still a lag between "ls" and its output:

I see this on my pgsql_tmp dirs (where Postgres stores intermediate query data that gets too big for memory, usually lots of files): in normal operation these dirs are completely empty, but they cause heavy disk activity (even writing!) when doing ls. Seems normal; I don't care as long as the thing is stable. One would need to check how ZFS stores directories and what kind of fragmentation can happen there. Or wait for some future feature that would do housekeeping. ;)
10-STABLE zfs: strange memory stats
I observe a strange reading of the ZFS memory stats:

    Mem: 298M Active, 207M Inact, 446M Wired, 10M Cache, 91M Buf, 29M Free
    ARC: 339M Total, 8758K MFU, 43M MRU, 52K Anon, 35M Header, 40M Other
    Swap: 2441M Total, 402M Used, 2040M Free, 16% Inuse

Usually I observed the "Total" value to be approximately the sum of the other values. This is still the case right after system start, but after a day a significant difference appears, as shown above (40+35+43+9 = 127 << 339).

Also, the ARC seems reluctant to grow when free memory is available, nor does it shrink much while paging out.

The build is r309023M. The behaviour is definitely different from what I tried before (r306589:306943M), but that one was probably unstable, and I see a bunch of ZFS-related commits in between. Also, I now see some counts on "l2_cksum_bad" which weren't there before.

BTW: is there some specific mailing list where ZFS changes are announced?

The machine is i386 with 1GB of memory. The hardware is probably somewhat crappy, but at least the memory readings are difficult to explain by hardware weakness.
Config (in case it matters):

    vm.kmem_size="576M"
    vm.kmem_size_max="576M"
    vfs.zfs.arc_max="320M"
    vfs.zfs.arc_min="120M"
    vfs.zfs.vdev.cache.size="5M"
    vfs.zfs.prefetch_disable="0"
    vfs.zfs.l2arc_norw="0"
    vfs.zfs.l2arc_noprefetch="0"

    kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch: 1016019
    kstat.zfs.misc.arcstats.sync_wait_for_async: 1157
    kstat.zfs.misc.arcstats.arc_meta_min: 62914560
    kstat.zfs.misc.arcstats.arc_meta_max: 242711832
    kstat.zfs.misc.arcstats.arc_meta_limit: 83886080
    kstat.zfs.misc.arcstats.arc_meta_used: 133996612
    kstat.zfs.misc.arcstats.memory_throttle_count: 0
    kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 272242
    kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 489828
    kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 3372460809216
    kstat.zfs.misc.arcstats.l2_write_pios: 14313
    kstat.zfs.misc.arcstats.l2_write_buffer_iter: 122496
    kstat.zfs.misc.arcstats.l2_write_full: 177
    kstat.zfs.misc.arcstats.l2_write_not_cacheable: 4673385
    kstat.zfs.misc.arcstats.l2_write_io_in_progress: 925
    kstat.zfs.misc.arcstats.l2_write_in_l2: 93122523
    kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 196362282
    kstat.zfs.misc.arcstats.l2_write_passed_headroom: 57198
    kstat.zfs.misc.arcstats.l2_write_trylock_fail: 20575
    kstat.zfs.misc.arcstats.l2_padding_needed: 0
    kstat.zfs.misc.arcstats.l2_hdr_size: 33567112
    kstat.zfs.misc.arcstats.l2_asize: 4040757248
    kstat.zfs.misc.arcstats.l2_size: 4472570880
    kstat.zfs.misc.arcstats.l2_io_error: 0
    kstat.zfs.misc.arcstats.l2_cksum_bad: 61
    kstat.zfs.misc.arcstats.l2_abort_lowmem: 15
    kstat.zfs.misc.arcstats.l2_free_on_write: 26703
    kstat.zfs.misc.arcstats.l2_evict_l1cached: 0
    kstat.zfs.misc.arcstats.l2_evict_reading: 0
    kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
    kstat.zfs.misc.arcstats.l2_writes_lock_retry: 173
    kstat.zfs.misc.arcstats.l2_writes_error: 0
    kstat.zfs.misc.arcstats.l2_writes_done: 14313
    kstat.zfs.misc.arcstats.l2_writes_sent: 14313
    kstat.zfs.misc.arcstats.l2_write_bytes: 6030606336
    kstat.zfs.misc.arcstats.l2_read_bytes: 11140009984
    kstat.zfs.misc.arcstats.l2_rw_clash: 0
    kstat.zfs.misc.arcstats.l2_feeds: 122496
    kstat.zfs.misc.arcstats.l2_misses: 4370503
    kstat.zfs.misc.arcstats.l2_hits: 2932017
    kstat.zfs.misc.arcstats.mfu_ghost_evictable_metadata: 46062080
    kstat.zfs.misc.arcstats.mfu_ghost_evictable_data: 1047040
    kstat.zfs.misc.arcstats.mfu_ghost_size: 47109120
    kstat.zfs.misc.arcstats.mfu_evictable_metadata: 0
    kstat.zfs.misc.arcstats.mfu_evictable_data: 114688
    kstat.zfs.misc.arcstats.mfu_size: 9073664
    kstat.zfs.misc.arcstats.mru_ghost_evictable_metadata: 178836480
    kstat.zfs.misc.arcstats.mru_ghost_evictable_data: 86231040
    kstat.zfs.misc.arcstats.mru_ghost_size: 265067520
    kstat.zfs.misc.arcstats.mru_evictable_metadata: 5632
    kstat.zfs.misc.arcstats.mru_evictable_data: 1155072
    kstat.zfs.misc.arcstats.mru_size: 49945088
    kstat.zfs.misc.arcstats.anon_evictable_metadata: 0
    kstat.zfs.misc.arcstats.anon_evictable_data: 0
    kstat.zfs.misc.arcstats.anon_size: 53248
    kstat.zfs.misc.arcstats.other_size: 44759120
    kstat.zfs.misc.arcstats.metadata_size: 50840064
    kstat.zfs.misc.arcstats.data_size: 231464448
    kstat.zfs.misc.arcstats.hdr_size: 4830316
    kstat.zfs.misc.arcstats.overhead_size: 41351168
    kstat.zfs.misc.arcstats.uncompressed_size: 52131328
    kstat.zfs.misc.arcstats.compressed_size: 17729024
    kstat.zfs.misc.arcstats.size: 365461060
    kstat.zfs.misc.arcstats.c_max: 335544320
    kstat.zfs.misc.arcstats.c_min: 125829120
    kstat.zfs.misc.arcstats.c: 315017029
    kstat.zfs.misc.arcstats.p: 145334923
    kstat.zfs.misc.arcstats.hash_chain_max: 17
    kstat.zfs.misc.arcstats.hash_chains: 119135
    kstat.zfs.misc.arcstats.hash_collisions: 6453863
    kstat.zfs.misc.arcstats.hash_elements_max: 538227
    kstat.zfs.misc.arcstats.hash_elements: 525460
    kstat.zfs.misc.arcstats.evict_l2_skip: 4277
    kstat.zfs.misc.arcstats.evict_l2_ineligible: 7410790400
    kstat.zfs.misc.arcstats.evict_l2_eligible: 14946466816
    kstat.zfs.misc.arcstats.evict_l2_cached: 26608123904
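Plugging the posted counters into a few lines of C makes the mismatch concrete. In this dump, data_size + metadata_size + hdr_size + l2_hdr_size + other_size adds up exactly to arcstats.size (365461060), yet the MFU, MRU and anon states together hold only 59072000 bytes (about 56 MiB), leaving roughly 213 MiB of data and metadata that apparently is not attributed to any state. This is a minimal sketch; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* A few counters from the arcstats dump above, in bytes. */
struct arc_snapshot {
    uint64_t data_size, metadata_size, hdr_size, l2_hdr_size, other_size;
    uint64_t mfu_size, mru_size, anon_size;
    uint64_t size;
};

/* Sum of the components that make up arcstats.size. */
static uint64_t arc_components(const struct arc_snapshot *s)
{
    return s->data_size + s->metadata_size + s->hdr_size
         + s->l2_hdr_size + s->other_size;
}

/* Bytes attributed to the MFU, MRU and anonymous states. */
static uint64_t arc_states(const struct arc_snapshot *s)
{
    return s->mfu_size + s->mru_size + s->anon_size;
}
```

Filled with the values above, `arc_components()` equals `size`, while `data_size + metadata_size - arc_states()` comes to 223232512 bytes.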
Re: Dying jail
Eugene Grosbein wrote:
Hi!

Recently I upgraded one of my servers running 9.3-STABLE with a jail containing a 4.11-STABLE system. The host was source-upgraded to 10.3-STABLE first and then to 11.0-STABLE, and the jail configuration was migrated to /etc/jail.conf. The jail itself was kept intact. "service jail start" started the jail successfully, but "service jail restart" fails because the jail is stuck in the "dying" state for a long time: "jls" shows no running jails and "jls -d" shows the dying jail.

Same issue here. During the upgrade to 10 I wrote a proper jail.conf, and, as the handling is now much more transparent, I also began to start and stop my jails individually without rebooting. I found the same issue: often jails do not want to terminate fully, but stay in the "dying" state, sometimes for a minute or so, but sometimes for very long (indefinitely).

It seems this is not related to remaining processes or open files (there are none) but to network connections/sockets which are still present. These connections can probably be displayed with netstat, and netstat -x probably shows some decreasing counters associated with them. I have not yet had the opportunity to figure out what exactly they mean, but anyway it seems that long times may be involved (hours? forever?), unless one finds the proper connection and terminates both ends.

There seems to be no other way to deliberately "kill" such connections and thereby terminate the jail, so the proposal to let it have a new number might be the only feasible approach. (I don't like it; I got used to the numbers of my jails.)
Re: ZFS l2arc broken in 10.3
Pete French wrote:
OK, that's a bit worrying if true, but I can confirm that l2arc works fine under 10.3 on amd64, so what you say about cross-compiling might be true. I am taking an interest in this, as I have just deployed a lot of machines which are going to rely on l2arc working to get reasonable performance.

Sure, on my amd64 it also works fine. AFAIK such things are tolerated when compiling for 64-bit.

But in the meantime someone pointed out another thing to me: my source is from the STABLE branch; in 10.3-RELEASE the code is different. Obviously there were recent changes, and that explains why the problem was not detected yet.
Rel.10.3 zfs GEOM removal and memory leak
Question: how to get ZFS l2arc working on FBSD 10.3 (RELENG or STABLE)?

Problem using 10.3 RELENG:

When ZFS is called the first time after boot, it deletes all device nodes of the drive carrying the l2arc. ZFS itself accesses its slices by a "diskid/" string, but all other access is impossible; in particular, a swap space on the same drive (NOT under ZFS) fails to activate:

>   NAME                             STATE     READ WRITE CKSUM
>   gr                               ONLINE       0     0     0
>     raidz1-0                       ONLINE       0     0     0
>       da0s2                        ONLINE       0     0     0
>       da1s2                        ONLINE       0     0     0
>       da2s2                        ONLINE       0     0     0
>   cache
>     diskid/DISK-162020405512s1e    ONLINE       0     0     0

Here "diskid/DISK-162020405512s1e" is the same as ada3s1e, and trying to open a swap space on ada3s1b now fails, because that device is no longer present in /dev:

> root@edge:~ # gpart show ada3
> gpart: No such geom: ada3.

If we now remove the l2arc via "zpool remove gr diskid/DISK-162020405512s1e", the device nodes magically reappear, and we can activate the swap space. Afterwards we can add the l2arc again, and it will be shown correctly as "ada3s1e", but at the next boot the problem appears again.

This problem does not exist in 10.3 STABLE, but there is another one instead:

Problem using 10.3 STABLE:

There seems to be a memory leak: the ARC grows above its limits, while the space used is not accounted for in any of [MFU MRU Anon Header Other L2Hdr]. After some time MFU+MRU shrink to the bare minimum, and the system is fully busy with arc_reclaim. The behaviour seems to be triggered by writing to the l2arc.(*)

Any advice on how to proceed (or which supported version might work better)?

(*) Addendum: I tried to understand the phenomenon, and found this in arcstats:

    (metadata_size + data_size) + hdr_size + l2_hdr_size + other_size = size

and

    metadata_size + data_size = mfu_size + mru_size + anon_size + X

The X is the memory leak: it never shrinks, it does not disappear when all l2arc devices are removed, and while the l2arc is being written it grows continually (but not linearly) until the system is quite stuck and l2arc writing ceases.

Further investigation shows that the growth of X is synchronous with the growth of the kstat.zfs.misc.arcstats.l2_free_on_write figure.
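X as defined in the addendum can be computed straight from a counter dump, which makes it easy to sample repeatedly and watch whether it only ever grows. A minimal sketch, assuming the input is the plain `name: value` output of `sysctl kstat.zfs.misc.arcstats`; the helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Find counter NAME (last dotted component, e.g. "mfu_size") in DUMP.
 * Returns 1 and stores the value on success, 0 if it is not present. */
static int find_stat(const char *dump, const char *name, unsigned long long *out)
{
    char key[128];
    unsigned long long val;
    const char *line = dump;

    while (line != NULL && *line != '\0') {
        if (sscanf(line, " %127[^:\n]: %llu", key, &val) == 2) {
            const char *dot = strrchr(key, '.');
            const char *shortname = dot ? dot + 1 : key;
            if (strcmp(shortname, name) == 0) {
                *out = val;
                return 1;
            }
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;                 /* advance to the next line */
    }
    return 0;
}

/* X = (metadata_size + data_size) - (mfu_size + mru_size + anon_size).
 * Returns -1 if any of the counters is missing from DUMP. */
static long long arc_unaccounted(const char *dump)
{
    static const char *names[] = {
        "metadata_size", "data_size", "mfu_size", "mru_size", "anon_size"
    };
    unsigned long long v[5];

    for (int i = 0; i < 5; i++)
        if (!find_stat(dump, names[i], &v[i]))
            return -1;
    return (long long)(v[0] + v[1]) - (long long)(v[2] + v[3] + v[4]);
}
```

Matching on the last dotted component keeps lookalikes such as mfu_ghost_size from being mistaken for mfu_size. Fed the counters from the "10-STABLE zfs: strange memory stats" message, it yields X = 223232512 bytes.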
11.1-RELEASE: huge amount of l2_cksum_bad
After upgrading 11.0-RELEASE-p10 to 11.1-RELEASE, I suddenly see a huge number of kstat.zfs.misc.arcstats.l2_cksum_bad (nearly 2% of kstat.zfs.misc.arcstats.l2_hits). I have set

> vfs.zfs.compressed_arc_enabled="0"

in loader.conf. When I remove this, the errors are gone. It seems that option does not work well in 11.1-RELEASE.
11.1-RELEASE: new line containing garbage added to "top"
After upgrading to 11.1-RELEASE, a new line appears in the output of "top" which contains rubbish:

> last pid: 10789;  load averages:  5.75,  5.19,  3.89  up 0+00:34:46  03:23:51
> 1030 processes: 9 running, 1004 sleeping, 17 waiting
> CPU 0: 16.0% user,  0.0% nice, 78.7% system,  4.9% interrupt,  0.4% idle
> CPU 1:  8.0% user,  0.0% nice, 82.5% system,  9.1% interrupt,  0.4% idle
> Mem: 218M Active, 34M Inact, 105M Laundry, 600M Wired, 18M Buf, 34M Free
> ARC: 324M Total, 54M MFU, 129M MRU, 2970K Anon, 13M Header, 125M Other
> 136¿176M Compress185 194M Uncompressed361.94:1 Ratio
> Swap: 2441M Total, 277M Used, 2164M Free, 11% Inuse
>
>   PID USERNAME    PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> ..

That looks funny. But I don't like it. (Actually it looks like a wrong TERMCAP, but wasn't that ~20 years ago? Checking...)
Re: 11.1-RELEASE: new line containing garbage added to "top"
Glen Barber wrote:
On Fri, Jul 28, 2017 at 03:24:50PM +0200, Peter wrote:
After upgrading to 11.1-RELEASE, a new line appears in the output of "top" which contains rubbish:

    last pid: 10789;  load averages:  5.75,  5.19,  3.89  up 0+00:34:46  03:23:51
    1030 processes: 9 running, 1004 sleeping, 17 waiting
    CPU 0: 16.0% user,  0.0% nice, 78.7% system,  4.9% interrupt,  0.4% idle
    CPU 1:  8.0% user,  0.0% nice, 82.5% system,  9.1% interrupt,  0.4% idle
    Mem: 218M Active, 34M Inact, 105M Laundry, 600M Wired, 18M Buf, 34M Free
    ARC: 324M Total, 54M MFU, 129M MRU, 2970K Anon, 13M Header, 125M Other
    136¿176M Compress185 194M Uncompressed361.94:1 Ratio
    Swap: 2441M Total, 277M Used, 2164M Free, 11% Inuse

      PID USERNAME    PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
    ..

That looks funny. But I don't like it. (Actually it looks like a wrong TERMCAP, but wasn't that ~20 years ago? Checking...)

Do you mean the blank line between the 'Swap:' line and 'PID'? If so, that has been there as long as I can recall. It is used for things like killing processes, etc. (Hit 'k' when using top(1), and you will see a prompt for a PID to kill.)

Glen

No, I mean the line *above* the 'Swap:' line, which is new and *should* show compressed ARC stats. (What we actually see there is the printing of a random memory location; working on it...)
11.1-RELEASE: panic! acl_from_aces: a_type is 0x4000
This is mostly for the search engines, so that others running into it may find it easier to solve.

While updating some ports via "portupgrade", I got this panic:

    Panic String: acl_from_aces: a_type is 0x4000

The phenomenon was reproducible; it appeared while creating a backup package from the "glib" port. I checked the readability of all concerned files and did a scrub on the pool, but found no errors! As I was busy with other issues, I then neglected the matter and simply deleted and reinstalled that port.

A couple of days later, working on a different installation, I got the exact same panic at the exact same point, while updating the "glib" port. This time I looked closer into the matter.

According to "truss", the panic appears while "pkg" calls __acl_get_link() on a specific file. That file is readable. The directory tree can be searched. But it is not possible to do "ls -l" on the directory -> panic!

It is possible to send+recv the filesystem: the error gets transported to the new filesystem! (From the ZFS view it seems to be a legal payload; only from the FreeBSD file-handling view is it a reason to panic.)

Finally, the file can be copied, unlinked, and recreated. I did a thorough search and found a dozen other files on the system with the same issue.

REMEDY:
---
It seems that such flaws can lurk undetected on a system for an indefinite time. The only way to find them seems to be to read all inode data, via something like

    # find -x `mount -t zfs | awk '{print $3}'` -type d -exec ls -la {} \;

ROOT CAUSE:
---
Not fully clear. It may be related to hardware (memory) flaws.
errors from port make (analyzed: bug in pkg)
For a long time already, I have been getting these strange messages whenever building a port:

    pkg: Bad argument on pkg_set 2143284626

Today I looked into it, and found it is easily reproducible:

    # pkg audit whatever
    pkg: Bad argument on pkg_set 2143284618
    0 problem(s) in the installed packages found.
    #

Looking closer, I found this offending call in src/audit.c:exec_audit():

    pkg_set(pkg, PKG_UNIQUEID, name);

This goes into libpkg/pkg.c:pkg_vset(), but nothing there handles the UNIQUEID attribute, so its argument does not get fetched from the va_list.

It does not do any harm, but it is ugly. Please fix.
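The failure mode (a setter that skips an attribute without pulling its value off the va_list, so the value is then read as the next attribute code) can be shown with a simplified analogue. This is a hypothetical sketch, not the actual libpkg code: the attribute codes, `struct item` and `set_attrs()` are invented, and the values are ints so the demo stays well defined. In pkg the skipped value is presumably the `char *` name, which would explain why the "Bad argument" number looks like pointer bits. In general such a desynchronised walk can also run past the end of the arguments, which is undefined behaviour.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical attribute codes; every list ends with ATTR_END. */
enum { ATTR_END = 0, ATTR_FLATSIZE = 1, ATTR_UNIQUEID = 2 };

struct item {
    int flatsize;
};

/*
 * Walk (code, value) pairs. If FIXED is 0, ATTR_UNIQUEID is ignored without
 * consuming its value, mimicking the bug, so that value is then read as the
 * next attribute code. Returns the first unrecognised code, or 0.
 */
static int set_attrs_v(struct item *it, int fixed, va_list ap)
{
    int attr, bad = 0;

    while ((attr = va_arg(ap, int)) != ATTR_END) {
        switch (attr) {
        case ATTR_FLATSIZE:
            it->flatsize = va_arg(ap, int);
            break;
        case ATTR_UNIQUEID:
            if (fixed)
                (void)va_arg(ap, int);  /* the fix: consume and discard */
            break;
        default:
            fprintf(stderr, "Bad argument on set_attrs %d\n", attr);
            if (bad == 0)
                bad = attr;
            break;
        }
    }
    return bad;
}

static int set_attrs(struct item *it, int fixed, ...)
{
    va_list ap;
    int bad;

    va_start(ap, fixed);
    bad = set_attrs_v(it, fixed, ap);
    va_end(ap);
    return bad;
}
```

With the buggy path, `set_attrs(&it, 0, ATTR_UNIQUEID, 7777, ATTR_FLATSIZE, 5, ATTR_END)` complains about "argument" 7777; with the value consumed, the same call is silent. In this particular sequence, the walk resynchronises on the following attribute.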
Re: a strange and terrible saga of the cursed iSCSI ZFS SAN
Eugene M. Zheganin wrote: Hi, On 05.08.2017 22:08, Eugene M. Zheganin wrote:
  pool: userdata
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: none requested
config:
        NAME               STATE     READ WRITE CKSUM
        userdata           ONLINE       0     0  216K
          mirror-0         ONLINE       0     0  432K
            gpt/userdata0  ONLINE       0     0  432K
            gpt/userdata1  ONLINE       0     0  432K
That would be funny, if it weren't that sad, but while I was writing this message, the pool started to look like below (I just ran zpool status twice in a row; compare to what it was):
[root@san1:~]# zpool status userdata
  pool: userdata
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: none requested
config:
        NAME               STATE     READ WRITE CKSUM
        userdata           ONLINE       0     0  728K
          mirror-0         ONLINE       0     0 1,42M
            gpt/userdata0  ONLINE       0     0 1,42M
            gpt/userdata1  ONLINE       0     0 1,42M
errors: 4 data errors, use '-v' for a list
[root@san1:~]# zpool status userdata
  pool: userdata
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: none requested
config:
        NAME               STATE     READ WRITE CKSUM
        userdata           ONLINE       0     0  730K
          mirror-0         ONLINE       0     0 1,43M
            gpt/userdata0  ONLINE       0     0 1,43M
            gpt/userdata1  ONLINE       0     0 1,43M
errors: 4 data errors, use '-v' for a list
So, you see, the error rate is like the speed of light. And I'm not sure the data access rate is that enormous; it looks like the errors are increasing on their own. So maybe someone has an idea of what this really means.
It is remarkable that you always have the same error count on both sides of the mirror. From what I have seen, such a picture appears when an unrecoverable error (i.e. one that is present on both sides of the mirror) is read again and again. File number 0x1 is probably some important metadata, and since it is not readable it cannot be put into the ARC, so the read is retried over and over. An error that exists on only one side is counted only once, because it is then auto-corrected; in that case the figures of the two sides show some erratic deviations. Therefore it is worthwhile to remove the erroneous data soon, because as long as it exists, one does not get anything useful from the figures (like how many errors are actually appearing anew).
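The reasoning above (a block bad on both sides keeps counting on both sides, a block bad on one side is counted once and then healed) can be illustrated with a toy model. This is a hypothetical simplification, not ZFS code: it assumes every read tries side 0 first and falls back to side 1 only on a checksum failure.

```c
#include <assert.h>

/* Hypothetical model of one mirrored block: bad[i] != 0 means copy i is damaged. */
struct mirror_block { int bad[2]; };

/*
 * Read the block `reads` times, trying side 0 first, and add checksum errors
 * to err[]. If side 0 fails but side 1 is good, side 0 is rewritten (healed).
 * If both fail, nothing can be repaired and the next read fails again.
 */
static void model_reads(struct mirror_block *b, int reads, long err[2])
{
    for (int r = 0; r < reads; r++) {
        if (!b->bad[0])
            continue;            /* side 0 good: side 1 is never looked at */
        err[0]++;
        if (!b->bad[1]) {
            b->bad[0] = 0;       /* repaired from the good copy */
            continue;
        }
        err[1]++;                /* both copies bad: unrecoverable */
    }
}
```

In this model a block that is damaged on both sides produces identical, ever-growing counters on both disks, which matches the symmetric 1,42M/1,43M figures, while a one-sided error shows up exactly once.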
Re: 11.1-RELEASE: new line containing garbage added to "top"
Glen Barber wrote: On Fri, Jul 28, 2017 at 03:24:50PM +0200, Peter wrote: After upgrading to 11.1-RELEASE, a new line appears in the output of "top" which contains rubbish:
last pid: 10789; load averages: 5.75, 5.19, 3.89 up 0+00:34:46 03:23:51
1030 processes:9 running, 1004 sleeping, 17 waiting
CPU 0: 16.0% user, 0.0% nice, 78.7% system, 4.9% interrupt, 0.4% idle
CPU 1: 8.0% user, 0.0% nice, 82.5% system, 9.1% interrupt, 0.4% idle
Mem: 218M Active, 34M Inact, 105M Laundry, 600M Wired, 18M Buf, 34M Free
ARC: 324M Total, 54M MFU, 129M MRU, 2970K Anon, 13M Header, 125M Other
     136¿176M Compress185 194M Uncompressed361.94:1 Ratio
Swap: 2441M Total, 277M Used, 2164M Free, 11% Inuse
  PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
..
That looks funny. But I don't like it. It appears to be fixed in 11-STABLE (r321419). Glen I don't think so. At least there is nothing in the commit log. r318449 is the last commit in 11-STABLE for the respective file, and that's before the 11.1-RELEASE branch. The error is in the screen formatting in "top", and that error was already present back in 1997 (and probably earlier), and it is also present in HEAD. What "top" does is basically this:
> char *string = some_buffer_to_print;
> printf("%.5s", &string[-4]);
A negative index on a string usually yields a nullified area (except if otherwise *eg*). That's why we usually don't see the matter: null bytes are invisible on screen.
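Why a pointer into a zero-filled area prints nothing can be shown without the undefined negative index itself. This sketch only demonstrates the semantics of the "%.5s" conversion: output stops at the precision or at the first NUL byte, whichever comes first. fmt5() is a hypothetical helper, not code from display.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format at most five characters of `s`, as top does with "%.5s".
 * Conversion stops at the precision or at the first NUL byte,
 * so a pointer into a zero-filled area yields an empty string.
 */
static void fmt5(char *out, size_t outlen, const char *s)
{
    snprintf(out, outlen, "%.5s", s);
}
```

For example, with const char area[] = "\0\0\0\0Total", fmt5() on area produces "" (nothing visible), while on area + 4 it produces "Total". Only when the bytes in front of the buffer happen not to be zero does the garbage become visible, as in the "136¿176M" line.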
Fix is very simple:
Index: contrib/top/display.c
===
--- display.c (revision 321434)
+++ display.c (working copy)
@@ -1310,7 +1310,7 @@
 		cursor_on_line = Yes;
 		putchar(ch);
 		*old = ch;
-		lastcol = 1;
+		lastcol++;
 	    }
 	    old++;

Then, since I was at it, I decided to beautify the proc display as well, as I usually see >1000 procs:
--- display.c (revision 321434)
+++ display.c (working copy)
@@ -100,7 +100,7 @@
 int y_loadave = 0;
 int x_procstate = 0;
 int y_procstate = 1;
-int x_brkdn = 15;
+int x_brkdn = 16;
 int y_brkdn = 1;
 int x_mem = 5;
 int y_mem = 3;
@@ -373,9 +373,9 @@
     printf("%d processes:", total);
     ltotal = total;
 
-    /* put out enough spaces to get to column 15 */
+    /* put out enough spaces to get to column 16 */
     i = digits(total);
-    while (i++ < 4)
+    while (i++ < 5)
     {
 	putchar(' ');
     }
Then, concerning the complaint about the empty line (bug #220996), I couldn't really reproduce this. But it seems that specifically this issue was already fixed in HEAD by this one here: https://reviews.freebsd.org/D11693 Now, can anybody make the above snippets appear in HEAD and 11-STABLE?
Re: 11.1-RELEASE: new line containing garbage added to "top"
Glen Barber wrote: On Fri, Jul 28, 2017 at 07:04:51PM +0200, Peter wrote: Glen Barber wrote: On Fri, Jul 28, 2017 at 03:24:50PM +0200, Peter wrote: After upgrading to 11.1-RELEASE, a new line appears in the output of "top" which contains rubbish: [...] It appears to be fixed in 11-STABLE (r321419). Glen I don't think so. At least there is nothing in the commit log. r318449 is the last commit in 11-STABLE for the respective file, and that's before the 11.1-RELEASE branch. See r321419. Yes, that's the issue with the empty line when ZFS is *not* in use, which I mentioned below (bug #220996). For that, a fix is committed. The error is in the screen formatting in "top", and that error was already present back in 1997 (and probably earlier), and it is also present in HEAD. [...]
[...] Then, concerning the complaint about the empty line (bug #220996), I couldn't really reproduce this. But it seems that specifically this issue was already fixed in HEAD by this one here: https://reviews.freebsd.org/D11693 Now, can anybody make the above snippets appear in HEAD and 11-STABLE? I've CC'd allanjude, who has touched some of these in the past. Thanks a lot!
11.1-RELEASE: magic hosed, file recognition fails
Just found that my scripts, which detect image types by means of the "file" command, do not work anymore in RELEASE-11. :( What's happening in R11.1 is this:
$ scanimage > /tmp/SCAN
$ file /tmp/SCAN
/tmp/SCAN: data
While on R10 it looked this way, which appears slightly more useful:
$ scanimage > /tmp/SCAN
$ file /tmp/SCAN
/tmp/SCAN: Netpbm image data, size = 2480 x 3507, rawbits, greymap
Further investigation shows that the problem may have appeared with this update:
>r309847 | delphij | 2016-12-11 08:33:02 +0100 (Sun, 11 Dec 2016) | 2 lines
>
>MFC r308420: MFV r308392: file 5.29.
And that is contrib code; it seems the original comes from fishy penguins. So no proper repo, and doubtful whether anybody might be in charge, but instead some colorful pictures like this one: https://fossies.org/diffs/file/5.28_vs_5.29/magic/Magdir/images-diff.html
---
Looking closer, this is my file header:
pmc@disp:604:1/tmp$ hd SCAN |more
      50 35 0a 23 20 53 41 4e  45 20 64 61 74 61 20 66  |P5.# SANE data f|
0010  6f 6c 6c 6f 77 73 0a 32  34 38 30 20 33 35 30 37  |ollows.2480 3507|
0020  0a 32 35 35 0a 5f 58 56  4b 53 49 4b 52 54 50 51  |.255._XVKSIKRTPQ|
0030  4e 4c 52 5b 56 55 4c 47  4e 4f 4e 4d 53 54 53 4d  |NLR[VULGNONMSTSM|
0040  53 49 50 52 4c 51 4f 53  56 55 53 4d 55 4e 4e 4c  |SIPRLQOSVUSMUNNL|
0050  55 49 4d 50 52 4c 4e 50  4d 56 4e 51 52 4e 4e 50  |UIMPRLNPMVNQRNNP|
And this is the ruleset in the magic file:
# PBMPLUS images
# The next byte following the magic is always whitespace.
# strength is changed to try these patterns before "x86 boot sector"
0       name     netpbm
>3      regex/s  =[0-9]{1,50}\ [0-9]{1,50}  Netpbm image data
>>&0    regex    =[0-9]{1,50}  \b, size = %s x
>>>&0   regex    =[0-9]{1,50}  \b %s
0       string   P5
>0      regex/4  P5\\s
>>0     use      netpbm
>>>0    string   x  \b, rawbits, pixmap
!:strength + 45
!:mime image/x-portable-pixmap
The failing line is the one with the "regex/4" command, and I don't see why there is a *double* \ - but a single one doesn't work either. Using \n instead would work.
And what also works is this one:
>0      regex/4  P5[[:space:]]
To figure out the root cause, one would have to look into libmagic; maybe there is a misunderstanding between the design of that lib and the Linux guys maintaining the magic file?
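One plausible explanation, stated here only as an assumption that was not verified against libmagic: "\s" is not part of POSIX extended regular expressions, so whether regcomp() understands it depends on the regex implementation (glibc accepts it as an extension), while "[[:space:]]" is standard everywhere. A minimal sketch of the working pattern with the POSIX regex API; looks_like_p5() is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if `header` starts with "P5" followed by a whitespace character,
 * 0 if not, -1 if the pattern fails to compile. Uses the portable POSIX
 * class [[:space:]] instead of the non-standard \s.
 */
static int looks_like_p5(const char *header)
{
    regex_t re;
    int rc;

    if (regcomp(&re, "^P5[[:space:]]", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, header, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Against the SANE header above ("P5\n# SANE data follows\n...") this matches, because the newline after "P5" belongs to the [[:space:]] class.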
11.1-BETA1: lsof build failure
FYI, please check whether this is reproducible and/or an issue: Installed this from SVN & local build:
11.1-BETA1 FreeBSD 11.1-BETA1 #0 r319858:319867M ... amd64
Then tried to update lsof-4.90.f,8 and got this error:
cc -pipe -DNEEDS_BOOL_TYPEDEF -DHASTASKS -DHAS_PAUSE_SBT -DHASEFFNLINK=i_effnlink -DHASF_VNODE -DHAS_FILEDESCENT -DHAS_TMPFS -DHASWCTYPE_H -DHASSBSTATE -DHAS_KVM_VNODE -DHAS_UFS1_2 -DHAS_VM_MEMATTR_T -DHAS_CDEV2PRIV -DHAS_NO_SI_UDEV -DHAS_SYS_SX_H -DHASFUSEFS -DHAS_ZFS -DHAS_V_LOCKF -DHAS_LOCKF_ENTRY -DHAS_NO_6PORT -DHAS_NO_6PPCB -DNEEDS_BOOLEAN_T -DHAS_SB_CCC -DHAS_FDESCENTTBL -DFREEBSDV=11000 -DHASFDESCFS=2 -DHASPSEUDOFS -DHASNULLFS -DHASIPv6 -DHASUTMPX -DHAS_STRFTIME -DLSOF_VSTR="11.1-BETA1" -I/usr/src/sys -O2 -c dvch.c -o dvch.o
--- dnode.o ---
dnode.c:906:13: error: no member named 'i_dev' in 'struct inode'
if (i->i_dev
~ ^
dnode.c:916:27: error: no member named 'i_dev' in 'struct inode'
dev = Dev2Udev((KA_T)i->i_dev);
~ ^
2 errors generated.
*** [dnode.o] Error code 1
Re: 11.1-BETA1: lsof build failure
Larry Rosenman wrote: Current lsof is 4.90M. Ack, that does. Larry Sysutils/lsof maintainer On 6/14/17, 8:13 AM, "Peter" <owner-freebsd-sta...@freebsd.org on behalf of p...@citylink.dinoex.sub.org> wrote: FYI, please check if reproducible and/or issue: [...]
Security patch SA-18:03 removed from 11.2 - why?
Release/update 11.1-p8 introduced the so-called "mitigation for speculative execution vulnerabilities". In Release 11.2 these mitigations have been removed. What is the reason for the removal, and specifically, why is Security Advisory 18:03 still mentioned in the release notes?
Behaviour with 11.1-p8:
# sysctl hw.ibrs_disable
hw.ibrs_disable: 0
# sysctl hw.ibrs_active
hw.ibrs_active: 1
Behaviour with 11.2 with the same CPU and microcode:
# sysctl hw.ibrs_disable
hw.ibrs_disable: 0
# sysctl hw.ibrs_active
hw.ibrs_active: 0
Re: kern.sched.quantum: Creepy, sadistic scheduler
Eugene Grosbein wrote: I see no reason to use SCHED_ULE for such single-core systems and would use SCHED_4BSD. Nitpicking: it is not a single-core system, it's a dual that for now is equipped with only one chip; the other is on the shelf. But seriously, I am currently working my way through the design papers for SCHED_ULE and the SMP stuff, and I tend to agree with you and George that I do not really need these features. Nevertheless, I think the system should behave properly *by default*, or otherwise there should be a hint in the docs about what to do. That's the reason why I raise this issue: if the matter can be fixed, that's great, but if we come to the conclusion that small/single-core/CPU-bound/whatever systems are better off with SCHED_4BSD, then that's perfectly fine as well. Or maybe those systems should disable preemption? I currently don't know, but I hope we can figure this out, as the problem is clearly visible. P.
Re: more data: SCHED_ULE+PREEMPTION is the problem
Hi Stefan, I'm glad to see you're thinking along similar paths as I did. But let me first answer your question straight away, and sort out the remainder afterwards. > I'd be interested in your results with preempt_thresh set to a value > of e.g. 190. There is no difference. Any value above 7 shows the problem identically. I think this value (or preemption as a whole) is not the actual cause of the problem; it just changes some conditions that make the problem visible. So, trying to adjust preempt_thresh in order to fix the problem seems to be a dead end. Stefan Esser wrote: The critical use of preempt_thresh is marked above. If it is 0, no preemption will occur. On a single processor system, this should allow the CPU bound thread to run for as long as its quantum lasts. I would like to contradict here. From what I understand, preemption is *not* the basis of task switching. AFAIK preemption is an additional feature that allows switching threads while they execute in kernel mode. While executing in user mode, a thread can be interrupted and switched at any time, and that is how the traditional time-sharing systems did it. Traditionally a thread would execute in kernel mode only during interrupts and syscalls, and those last no longer than a few ms, so for a long time that was not an issue. Only when we got the fast interfaces (10Gbps etc.) and big monsters executing in kernel space (traffic shaper, ZFS, etc.) did that scheme become problematic, and preemption was invented. According to McKusick's book, the scheduler is two-fold: an outer logic runs a few times per second and calculates priorities, and an inner logic runs very often (at every interrupt?) and chooses the next runnable thread simply by priority. The meaning of the quantum is then: when it is used up, the thread is moved to the end of its queue, so that it may take a while until it runs again. This is for implementing round-robin behaviour within a single queue (= a single priority).
It should not prevent task switching as such. Let's have a look. sched_choose() seems to be the low-level scheduler function that decides which thread to run next. Let's create a log of its decisions.[1] With preempt_thresh >= 12 (kernel threads left out):
  PID COMMAND TIMESTAMP
18196 bash    1192.549
18196 bash    1192.554
18196 bash    1192.559
66683 lz4     1192.560
18196 bash    1192.560
18196 bash    1192.562
18196 bash    1192.563
18196 bash    1192.564
79496 ntpd    1192.569
18196 bash    1192.569
18196 bash    1192.574
18196 bash    1192.579
18196 bash    1192.584
18196 bash    1192.588
18196 bash    1192.589
18196 bash    1192.594
18196 bash    1192.599
18196 bash    1192.604
18196 bash    1192.609
18196 bash    1192.613
18196 bash    1192.614
18196 bash    1192.619
18196 bash    1192.624
18196 bash    1192.629
18196 bash    1192.634
18196 bash    1192.638
18196 bash    1192.639
18196 bash    1192.644
18196 bash    1192.649
18196 bash    1192.654
66683 lz4     1192.654
18196 bash    1192.655
18196 bash    1192.655
18196 bash    1192.659
The worker is indeed scheduled again only after about 94 ms (1192.560 to 1192.654). And with preempt_thresh < 8:
  PID COMMAND TIMESTAMP
18196 bash    1268.955
66683 lz4     1268.956
18196 bash    1268.956
66683 lz4     1268.956
18196 bash    1268.957
66683 lz4     1268.957
18196 bash    1268.957
66683 lz4     1268.958
18196 bash    1268.958
66683 lz4     1268.959
18196 bash    1268.959
66683 lz4     1268.959
18196 bash    1268.960
66683 lz4     1268.960
18196 bash    1268.961
66683 lz4     1268.961
18196 bash    1268.961
66683 lz4     1268.962
18196 bash    1268.962
Here we have roughly 3 context switches per millisecond. (The fact that the decisions are overall more frequent is easily explained: when lz4 gets to run, it will do disk I/O, which quickly returns and triggers new decisions.) In the second record, things are clear: while lz4 does disk I/O, the scheduler MUST run bash, because nothing else is there. But when data arrives, it runs lz4 again. But in the first record, why does the scheduler choose bash, although lz4 already has a much higher priority (52 versus 97, usually)?
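The two figures quoted for these logs (the long gap before lz4 runs again, and the context switches per millisecond) can be recomputed mechanically. A small sketch, assuming the log has already been parsed into (PID, timestamp in seconds) pairs as printed by the dtrace/awk pipeline; struct pick and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One line of the sched_choose() log: chosen PID and timestamp in seconds. */
struct pick { int pid; double t; };

/* Number of decisions that picked a different PID than the previous one. */
static size_t count_switches(const struct pick *log, size_t n)
{
    size_t sw = 0;
    for (size_t i = 1; i < n; i++)
        if (log[i].pid != log[i - 1].pid)
            sw++;
    return sw;
}

/* Switches per millisecond over the whole logged interval. */
static double switches_per_ms(const struct pick *log, size_t n)
{
    if (n < 2 || log[n - 1].t <= log[0].t)
        return 0.0;
    return (double)count_switches(log, n) / ((log[n - 1].t - log[0].t) * 1000.0);
}

/* Longest gap in ms between two consecutive picks of `pid` (0 if fewer than 2 picks). */
static double longest_gap_ms(const struct pick *log, size_t n, int pid)
{
    double best = 0.0, last = -1.0;
    for (size_t i = 0; i < n; i++) {
        if (log[i].pid != pid)
            continue;
        if (last >= 0.0 && (log[i].t - last) * 1000.0 > best)
            best = (log[i].t - last) * 1000.0;
        last = log[i].t;
    }
    return best;
}
```

For the second log this gives 18 switches in 7 ms, about 2.6 per millisecond; for lz4 (PID 66683) in the first log the gap between its two picks comes out at about 94 ms, close to kern.sched.quantum (94488 µs).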
A value of 120 (corresponding to PRI=20 in top) will allow the I/O bound thread to preempt any other thread with
Appendices - more data: SCHED_ULE+PREEMPTION is the problem
I forgot to attach the commands used to create the logs; they are ugly anyway:
[1] dtrace -q -n '::sched_choose:return { @[((struct thread *)arg1)->td_proc->p_pid, stringof(((struct thread *)arg1)->td_proc->p_comm), timestamp] = count(); } tick-1s { exit(0); }' | sort -nk 3 | awk '$1 > 27 {$3 = ($3/100)*1.0/1000; printf "%6d %20s %3.3f\n", $1, $2, $3 }'
[2] dtrace -q -n '::runq_choose_from:entry /arg1 == 0||arg1 == 32/ { @[arg1, timestamp] = count(); }' | sort -nk2
Found the issue! - SCHED_ULE+PREEMPTION is the problem
Results:
1. The tdq_ridx pointer
The perceived slow advance (of the tdq_ridx pointer into the circular array) is correct behaviour. McKusick writes: "The pointer is advanced once per system tick, although it may not advance on a tick until the currently selected queue is empty. Since each thread is given a maximum time slice and no threads may be added to the current position, the queue will drain in a bounded amount of time." Therefore, it is also normal that the process (the piglet in this case) runs until its time slice (aka quantum) is used up.
2. The influence of preempt_thresh
This can be found in tdq_runq_add(). A simplified description of the logic there is as follows:
  td_priority < 152 ?  -> add to realtime queue
  td_priority <= 223 ? -> add to timeshare queue
                          if preempted: circular_index = tdq_ridx
                          else:         circular_index = tdq_idx + td_priority
  else                 -> add to idle queue
If the thread had been preempted, it is reinserted at the current working position of the circular array; otherwise the position is calculated from the thread priority.
3. The quantum
Most of the task switches come from device interrupts. Those run at priority intr:8 or intr:12. So, as soon as preempt_thresh is at least that interrupt priority (8 or 12, depending on the machine), the piglet is almost always reinserted into the run queue due to preemption. And, as we see, in that case we do not get a real scheduling decision, we get a simple resume! A real scheduling decision happens only after the quantum is exhausted. Therefore, reducing the quantum helps.
4. History
This behaviour was deliberately introduced in r171713. In r220198 it was fixed, with a focus on CPU hogs and single-CPU systems. In r239157 the fix was undone due to performance considerations, with the focus on rescheduling only at the end of the time slice.
5. Conclusion
The current defaults seem not very well suited for certain CPU-intensive tasks. Possible solutions are one of:
* not using SCHED_ULE
* not using preemption
* changing kern.sched.quantum to a minimal value.
P.
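The simplified tdq_runq_add() logic from point 2 can be written out as a tiny C model to make the effect of the "preempted" flag visible. This follows the simplified description only, not the real kernel code (which maps the priority onto the array differently); the 64-slot array size and all names here are assumptions for illustration.

```c
#include <assert.h>

#define NQS 64  /* assumed number of slots in the circular timeshare array */

enum queue { Q_REALTIME, Q_TIMESHARE, Q_IDLE };

struct placement { enum queue q; int slot; };  /* slot is -1 outside the timeshare queue */

/* Hypothetical model of the placement rules described in point 2. */
static struct placement runq_add_model(int td_priority, int preempted,
                                       int tdq_idx, int tdq_ridx)
{
    struct placement p = { Q_IDLE, -1 };

    if (td_priority < 152) {
        p.q = Q_REALTIME;
    } else if (td_priority <= 223) {
        p.q = Q_TIMESHARE;
        /* a preempted thread goes straight back to the slot being drained */
        p.slot = preempted ? tdq_ridx : (tdq_idx + td_priority) % NQS;
    }
    return p;
}
```

With tdq_idx = 5 and tdq_ridx = 3, a priority-200 thread lands in slot 3 when it was preempted, i.e. it is simply resumed, but in slot 13 otherwise, i.e. it has to wait until the drain pointer gets there.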
Re: kern.sched.quantum: Creepy, sadistic scheduler
Hi Alban! Alban Hertroys wrote: Occasionally I noticed that the system would not quickly process the tasks I need done, but instead prefer other, long-running tasks. I figured it must be related to the scheduler, and decided it hates me. If it hated you, it would behave much worse. That's encouraging :) But I would say, running a job 100 times slower than expected is quite an amount of hate for my taste. A closer look shows the behaviour as follows (single CPU): A single CPU? That's becoming rare! Is that a VM? Old hardware? Something really specific? I don't plug in another CPU because there is no need to. Yes, it's old hardware: CPU: Intel Pentium III (945.02-MHz 686-class CPU) ACPI APIC Table: If I had bought new hardware, this one would now rot in Africa, and I would have new hardware idling along that is Spectre/Meltdown-affected nevertheless. Let's run an I/O-active task, e.g., postgres VACUUM that would And you're running a multi-process database server on it, no less. That is going to hurt, I'm running a lot more than only that on it. But it's all private use, idling most of the time. no matter how well the scheduler works. Maybe. But this post is not about my personal expectations of overall performance; it is about a specific behaviour that is not how a scheduler is expected to behave, no matter if we're on a PDP-11 or on a Kaby Lake. Now, as usual, the "root-cause" questions arise: What exactly does this "quantum" do? Is this solution a workaround, i.e. is actually something else wrong, and does it have tradeoffs in other situations? Or otherwise, why is such a default value chosen, which appears to be ill-conceived? The docs for the quantum parameter are a bit unsatisfying: they say it's the max number of ticks a process gets, and what happens when they're exhausted?
If by default the endless loop is actually allowed to continue running for 94k ticks (or 94ms, more likely) uninterrupted, then that explains the perceived behaviour - but that's certainly not what a scheduler should do when other procs are ready to run. I can answer this from the operating systems course I followed recently. This does not apply to FreeBSD specifically; it is general job scheduling theory. I still need to read up on SCHED_ULE to see how the details were implemented there. Or are you using the older SCHED_4BSD? I'm using the default scheduler, which is ULE. I would not go non-default without reason. (But it seems a reason is just appearing now.) Now, that would cause a much worse situation in your example case. The endless loop would keep running once it gets the CPU and would never release it. No other process would ever get a turn again. You wouldn't even be able to get into such a system in that state using remote ssh. That is why the scheduler has this "quantum", which limits the maximum time the CPU will be assigned to a specific job. Once the quantum has expired (with the job unfinished), the scheduler removes the job from the CPU, puts it back on the ready queue and assigns the next job from that queue to the CPU. That's why you seem to get better performance with a smaller value for the quantum; the endless loop gets forcibly interrupted more often. Good description. Only my (old-fashioned) understanding was that this is the purpose of the HZ value: to give control back to the kernel, so that a new decision can be made. So, I would not have been surprised to see 200 I/Os per second for postgres (kern.hz=200), but what I see is 9 I/Os per second (which roughly matches a "quantum" of 94ms). But then, we were able to do all this nicely on single-CPU machines for almost four decades. It does not make sense to me if we now state that we cannot do it anymore because single-CPU is uncommon today.
(Yes indeed, we also cannot fly to the moon anymore, because today nobody seems to recall how that stuff was built. *headbangwall*) This changing of the active job, however, involves a context switch for the CPU. Memory, registers, file handles, etc. that were required by the previous job need to be put aside and replaced by any such resources related to the new job to be run. That uses up time and does nothing to progress the jobs that are waiting for the CPU. Hence, you don't want the quantum to be too small either, or you'll end up spending significant time switching contexts. Yepp. My understanding was that I can influence this behaviour via the HZ value, so as to trade off responsiveness against performance. Obviously that was wrong. From your writing, it seems the "quantum" is indeed the correct place to tune this. (But I will still have to ponder a while about the knob mentioned by Stefan, concerning preemption, which seems to magically resolve the issue.) That said, SCHED_ULE (the default scheduler for quite a while now) was designed with multi-CPU configurations in mind and there are claims that SCHED_4BSD works better for single-CPU configurations. You may
Re: kern.sched.quantum: Creepy, sadistic scheduler
George Mitchell wrote: On 04/04/18 06:39, Alban Hertroys wrote: [...] That said, SCHED_ULE (the default scheduler for quite a while now) was designed with multi-CPU configurations in mind and there are claims that SCHED_4BSD works better for single-CPU configurations. You may give that a try, if you're not already on SCHED_4BSD. [...] A small, disgruntled community of FreeBSD users who have never seen proof that SCHED_ULE is better than SCHED_4BSD in any environment continue to regularly recompile with SCHED_4BSD. I dread the day when that becomes impossible, but at least it isn't here yet. -- George Yes *laugh*, I found a very lengthy and mind-boggling discussion from back in 2011. And I found that you made this statement somewhere there:
// With nCPU compute-bound processes running, with SCHED_ULE, any other
// process that is interactive (which to me means frequently waiting for
// I/O) gets ABYSMAL performance -- over an order of magnitude worse
// than it gets with SCHED_4BSD under the same conditions.
-- https://lists.freebsd.org/pipermail/freebsd-stable/2011-December/064984.html
And this describes quite exactly what I perceive. Now, I would like to ask: what has been done about this issue? P.
Re: Try setting kern.sched.preempt_thresh != 0
Stefan Esser wrote: I'm guessing that the problem is caused by kern.sched.preempt_thresh=0, which prevents preemption of low priority processes by interactive or I/O bound processes. For a quick test try: # sysctl kern.sched.preempt_thresh=1 Hi Stefan, thank you, that's an interesting knob! Only it is actually the other way round: it is not set to 0. My settings (the defaults) are:
kern.sched.steal_thresh: 2
kern.sched.steal_idle: 1
kern.sched.balance_interval: 127
kern.sched.balance: 1
kern.sched.affinity: 1
kern.sched.idlespinthresh: 157
kern.sched.idlespins: 1
kern.sched.static_boost: 152
kern.sched.preempt_thresh: 80
kern.sched.interact: 30
kern.sched.slice: 12
kern.sched.quantum: 94488
kern.sched.name: ULE
kern.sched.preemption: 1
kern.sched.cpusetsize: 4
But then, if I change kern.sched.preempt_thresh to 1 *OR* 0, things behave properly! More precisely, changing it from 8 down to 7 changes things completely:
>pool        alloc   free   read  write   read  write
>cache           -      -      -      -      -      -
>  ada1s4    7.08G  10.9G    927      0  7.32M      0
>  PID USERNAME  PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
> 1900 pgsql      82    0   618M 17532K RUN      0:53  34.90% postgres
> 1911 admin      81    0  7044K  2824K RUN      6:07  28.34% bash
(Notice the PRI values, which also look different now.) rgds, P.
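The kern.sched.slice and kern.sched.quantum values in the dump above fit together if the quantum is the statistics-clock period multiplied by the slice. That relation, and stathz = 127 (a common value on x86; "sysctl kern.clockrate" shows the actual one), are assumptions for this worked example. If the worker manages only one read per quantum, that caps it at roughly ten reads per second, in the same ballpark as the 9 I/Os per second reported earlier in the thread.

```c
#include <assert.h>

/*
 * ASSUMED relation: quantum (us) = (1000000 / stathz) * kern.sched.slice,
 * using integer division as kernel code would.
 */
static int quantum_us(int slice, int stathz)
{
    return (1000000 / stathz) * slice;
}

/* Upper bound on reads per second if the worker gets one read per quantum. */
static double reads_per_second(int quantum)
{
    return 1000000.0 / quantum;
}
```

quantum_us(12, 127) gives 7874 * 12 = 94488 µs, exactly the kern.sched.quantum shown above, and reads_per_second(94488) is about 10.6.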
more data: SCHED_ULE+PREEMPTION is the problem (was: kern.sched.quantum: Creepy, sadistic scheduler
Hi all, in the meantime I did some tests and found the following:
A. The Problem:
---
On a single CPU, there are -exactly- two runnable processes: One is doing mostly compute without I/O - this can be a compression job or similar; in the tests I simply used an endless loop. Let's call this the "piglet". The other is doing frequent file reads, but also some compute in between - this can be a backup job traversing the FS, or a postgres VACUUM, or some fast compressor like lz4. Let's call this the "worker". It then happens that the piglet gets 99% CPU, while the worker gets only 0.5% CPU and makes nearly no progress at all. Investigation shows that the worker makes precisely one I/O per timeslice (timeslice as defined in kern.sched.quantum) - or two I/Os on a mirrored ZFS.
B. Findings:
1. Filesystem
I could never reproduce this when reading from plain UFS; only when reading from ZFS (directly or via L2ARC).
2. Machine
The problem originally appeared on a Pentium III @ 1 GHz. I was able to reproduce it on an i5-3570T, given the following measures:
* configure the BIOS to use only one CPU
* reduce speed: "dev.cpu.0.freq=200"
I also saw the problem when running at full speed (which means it happens there too), but could not reproduce it reliably.
3. kern.sched.preempt_thresh
I could make the problem disappear by changing kern.sched.preempt_thresh from the default 80 to either 11 (i5-3570T) or 7 (P3) or smaller. This seems to correspond to the disk interrupt threads, which run at intr:12 (i5-3570T) or intr:8 (P3).
4. Dynamic behaviour
Here the piglet is already running as PID=2119. Then we can watch the dynamic behaviour as follows (on the i5-3570T @ 200 MHz):
a. with kern.sched.preempt_thresh=80
$ lz4 DATABASE_TEST_FILE /dev/null & while true; do
    ps -o pid,pri,"%cpu",command -p 2119,$!
    sleep 3
  done
[1] 6073
  PID PRI %CPU COMMAND
 6073  20  0.0 lz4 DATABASE_TEST_FILE /dev/null
 2119 100 91.0 -bash (bash)
  PID PRI %CPU COMMAND
 6073  76 15.0 lz4 DATABASE_TEST_FILE /dev/null
 2119  95 74.5 -bash (bash)
  PID PRI %CPU COMMAND
 6073  52 19.0 lz4 DATABASE_TEST_FILE /dev/null
 2119  94 71.5 -bash (bash)
  PID PRI %CPU COMMAND
 6073  52 16.0 lz4 DATABASE_TEST_FILE /dev/null
 2119  95 76.5 -bash (bash)
  PID PRI %CPU COMMAND
 6073  52 14.0 lz4 DATABASE_TEST_FILE /dev/null
 2119  96 80.0 -bash (bash)
  PID PRI %CPU COMMAND
 6073  52 12.5 lz4 DATABASE_TEST_FILE /dev/null
 2119  96 82.5 -bash (bash)
  PID PRI %CPU COMMAND
 6073  74 10.0 lz4 DATABASE_TEST_FILE /dev/null
 2119  98 86.5 -bash (bash)
  PID PRI %CPU COMMAND
 6073  52  8.0 lz4 DATABASE_TEST_FILE /dev/null
 2119  98 89.0 -bash (bash)
  PID PRI %CPU COMMAND
 6073  52  7.0 lz4 DATABASE_TEST_FILE /dev/null
 2119  98 90.5 -bash (bash)
  PID PRI %CPU COMMAND
 6073  52  6.5 lz4 DATABASE_TEST_FILE /dev/null
 2119  99 91.5 -bash (bash)
b. with kern.sched.preempt_thresh=11
  PID PRI %CPU COMMAND
 4920  21  0.0 lz4 DATABASE_TEST_FILE /dev/null
 2119 101 93.5 -bash (bash)
  PID PRI %CPU COMMAND
 4920  78 20.0 lz4 DATABASE_TEST_FILE /dev/null
 2119  94 70.5 -bash (bash)
  PID PRI %CPU COMMAND
 4920  82 34.5 lz4 DATABASE_TEST_FILE /dev/null
 2119  88 54.0 -bash (bash)
  PID PRI %CPU COMMAND
 4920  85 42.5 lz4 DATABASE_TEST_FILE /dev/null
 2119  86 45.0 -bash (bash)
  PID PRI %CPU COMMAND
 4920  85 43.5 lz4 DATABASE_TEST_FILE /dev/null
 2119  86 44.5 -bash (bash)
  PID PRI %CPU COMMAND
 4920  85 43.0 lz4 DATABASE_TEST_FILE /dev/null
 2119  85 45.0 -bash (bash)
  PID PRI %CPU COMMAND
 4920  85 43.0 lz4 DATABASE_TEST_FILE /dev/null
 2119  85 45.5 -bash (bash)
From this we can see that in case b. both processes balance out nicely and meet at equal CPU shares, whereas in case a., after about 10 seconds (the first 3 records), they move to opposite ends of the scale and stay there. From this I suppose that some kind of miscalculation or misadjustment of the task priorities is happening here. P.
Re: kern.sched.quantum: Creepy, sadistic scheduler
Julian Elischer wrote:
> for a single CPU you really should compile a kernel with SMP turned off and 4BSD scheduler. ULE is just trying too hard to do stuff you don't need.

Julian, if we agree on this, I am fine. (This implies that SCHED_4BSD will *not* be retired for the foreseeable future!)

I tested yesterday, and SCHED_4BSD doesn't show the annoying behaviour. SMP seems to be no problem (and I need that), but PREEMPTION is definitely related to the problem (see my other message, sent just now).

P.
Re: kern.sched.quantum: Creepy, sadistic scheduler
Andriy Gapon wrote:
> On 04/04/2018 03:52, Peter wrote:
>> Let's run an I/O-active task, e.g. postgres VACUUM that would continuously read from big files (while doing compute as well [1]):
>
> Not everyone has a postgres server and a suitable database.
> Could you please devise a test scenario that demonstrates the problem and that anyone could run?

Andriy,

and maybe nobody has such an old system anymore, one that is CPU-bound instead of I/O-bound. I'd rather think about reproducing it on my IvyBridge.

I know for sure that it is *not* specifically dependent on postgres.
What I posted was the case where an endless-loop piglet starves a postgres VACUUM - and there we see a very pronounced effect of almost a factor of 100.
When I first clearly discovered it (after a long time of having a gut feeling that something behaves strangely), it was postgres pg_dump (which does compression, i.e. is CPU-bound) as the piglet, starving a bacula-fd backup that would scan the filesystem.

So, there is a general rule: we have one process that is a CPU hog, and another process that does periodic I/O (but also *some* compute), and -important!- nothing else.

If we understand the logic of the scheduler, that information should already suffice for some logical verification *eg* - but I will see if I can get it reproduced on the IvyBridge machine and/or put a testcase together.
May take a while.

P.
Re: kern.sched.quantum: Creepy, sadistic scheduler
Andriy Gapon wrote:
> Not everyone has a postgres server and a suitable database.
> Could you please devise a test scenario that demonstrates the problem and that anyone could run?

Alright, simple things first: I can reproduce the effect without postgres, with regular commands. I run this on my database file:

# lz4 2058067.1 /dev/null

And have this as throughput:

pool        alloc   free   read  write   read  write
cache           -      -      -      -      -      -
  ada1s4    7.08G  10.9G    889      0  7.07M  42.3K

  PID USERNAME  PRI NICE   SIZE    RES STATE   TIME    WCPU COMMAND
51298 root       87    0 16184K  7912K RUN     1:00  51.60% lz4

I start the piglet:

$ while true; do :; done

And, same effect:

pool        alloc   free   read  write   read  write
cache           -      -      -      -      -      -
  ada1s4    7.08G  10.9G     10      0  82.0K      0

  PID USERNAME  PRI NICE   SIZE    RES STATE   TIME    WCPU COMMAND
 1911 admin      98    0  7044K  2860K RUN    65:48  89.22% bash
51298 root       52    0 16184K  7880K RUN     0:05   0.59% lz4

It does *not* happen with plain "cat" instead of "lz4".

What may or may not have an influence on it: the respective filesystem has block=8k, and is 100% resident in l2arc.

What is also interesting: I started trying this with "tar" (no effect, behaves properly), then with "tar --lz4". In the latter case "tar" starts "lz4" as a subprocess, so we have three processes in play - and in that case the effect happens, but to a lesser extent: about 75 I/Os per second.

So, it seems quite clear that this has something to do with the logic inside the scheduler.
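A quick back-of-the-envelope check supports the "one I/O per timeslice" observation. This is a sketch under stated assumptions, not a model of the real scheduler: it assumes the worker issues exactly one read and then has to wait a full default quantum of 94488 us (the kern.sched.quantum value posted in this thread) before it runs again, and that each read moves one 8 KiB filesystem block (block=8k). The function names are made up for illustration.

```c
/*
 * Back-of-the-envelope check for "one read per timeslice".
 * Assumptions (not taken from the kernel): the worker issues one read,
 * then waits one full quantum before running again; each read moves
 * exactly one filesystem block.
 */
double reads_per_second(double quantum_us)
{
    return 1e6 / quantum_us;    /* timeslices per second == reads per second */
}

double kib_per_second(double quantum_us, double block_kib)
{
    return reads_per_second(quantum_us) * block_kib;
}
```

With reads_per_second(94488.0) this gives about 10.6 reads/s, and kib_per_second(94488.0, 8.0) about 84.7 KiB/s, which is close to the 10 reads and 82.0K per second that zpool iostat reports while the piglet is running.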
Re: kern.sched.quantum: Creepy, sadistic scheduler
Hi all of You,

thank You very much for Your comments and reports!

From what I see, we have (at least) two rather different demands here: while George looks at the overall compute throughput, others are concerned about interactive response.

My own issue is again a little bit different: I am running this small single-CPU machine as my home-office router, and it also runs a backup service, which involves compressing big files and handling an outgrown database (but that does not need to happen fast, as it's just backup stuff). So, my demand is to maintain a good balance between realtime network activity being served immediately and low-priority batch compute jobs, while still staying responsive to shell commands - but the overall compute throughput is not important here.

But then, I find it very difficult to devise metrics by which such a demand could be properly measured, to get comparable figures.

George Mitchell wrote:
> I suspect my case (make buildworld while running misc/dnetc) doesn't qualify. However, I just completed a SCHED_ULE run with preempt_thresh set to 5, and "time make buildworld" reports:
> 7336.748u 677.085s 9:25:19.86 23.6% 27482+473k 42147+431581io 38010pf+0w
> Much closer to SCHED_4BSD! I'll try preempt_thresh=0 next, and I guess I'll at least try preempt_thresh=224 to see how that works for me.
> -- George

I found that preempt_thresh=0 cannot be used in practice: when I try this on my quad-core desktop, and then start four endless loops to keep the cores busy, the (internet) radio has a dropout every 2-3 seconds (and there is nothing else running, just a sleeping icewm and a mostly sleeping firefox)! So, the (SMP) system *depends* on preemption; it cannot handle streaming data without it. (@George: Your buildworld test is pure batch load, and may not be bothered by this effect.)

I think the problem is *not* to be solved by finding a good setting for preempt_thresh (or other tunables). I think the problem lies deeper, and these tunables only change its appearance.

I have worked out a writeup explaining my thoughts in detail, and I would be glad if You stay tuned and evaluate it.

P.
Re: kern.sched.quantum: Creepy, sadistic scheduler
EBFE via freebsd-stable wrote:
> On Tue, 17 Apr 2018 09:05:48 -0700
> Freddie Cash wrote:
>
>> # Tune for desktop usage
>> kern.sched.preempt_thresh=224
>>
>> Works quite nicely on a 4-core AMD Phenom-II X4 960T Processor (3010.09-MHz K8-class CPU) running KDE4 using an Nvidia 210 GPU.
>
> For interactive tasks, there is a "special" tunable:
> % sysctl kern.sched.interact
> kern.sched.interact: 10 # default is 30
> % sysctl -d kern.sched.interact
> kern.sched.interact: Interactivity score threshold
>
> reducing the value from 30 to 10-15 keeps your gui/system responsive, even under high load.

Yes, this may improve the "unresponsive desktop" problem, because threads that are scored as interactive are run as realtime threads, ahead of all regular workload queues.

But it will likely not solve the problem described by George, having two competing batch jobs. And for my problem as described at the beginning of the thread, I could probably tune things so far that my "worker" thread would be considered interactive, but then it would just toggle between the realtime and timesharing queues - and while this may make things better, it will probably not lead to smooth system behaviour.

P.
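To get a feeling for what an interactivity threshold of 30 versus 10 means, here is a rough model of the ULE interactivity score. It is a sketch under assumptions, not the kernel code: it assumes the formula from the ULE design, where a thread that sleeps more than it runs scores 50 * run / sleep, a thread that runs more than it sleeps scores 100 - 50 * sleep / run, and a thread counts as interactive when its score is below kern.sched.interact. The kernel works on scaled integer tick counts rather than doubles, and the helper names are hypothetical.

```c
/*
 * Rough model of the ULE interactivity score (0..100, lower means more
 * interactive). Assumption: mirrors the formula from the ULE design;
 * the kernel computes it from scaled integer tick counts, not doubles.
 */
double interact_score(double run, double sleep)
{
    const double half = 50.0;           /* half of the maximum score 100 */

    if (run > sleep)                    /* mostly running: 50 .. 100 */
        return half + (half - half * sleep / run);
    if (sleep > run)                    /* mostly sleeping: 0 .. 50 */
        return half * run / sleep;
    return run > 0.0 ? half : 0.0;      /* equal run and sleep time */
}

/* Hypothetical helper: would a thread with this history count as interactive? */
int is_interactive(double run, double sleep, int threshold)
{
    return interact_score(run, sleep) < threshold;
}
```

Under this model the piglet, which never sleeps, scores 100. A worker passes the default threshold of 30 only if it sleeps more than about 1.67 times as long as it runs, and with a threshold of 10 it would have to sleep more than 5 times as long as it runs.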
kern.sched.quantum: Creepy, sadistic scheduler
Occasionally I noticed that the system would not quickly process the tasks I need done, but would instead prefer other, long-running tasks. I figured it must be related to the scheduler, and decided it hates me.

A closer look shows the behaviour as follows (single CPU):

Let's run an I/O-active task, e.g. a postgres VACUUM that would continuously read from big files (while doing compute as well [1]):

>pool        alloc   free   read  write   read  write
>cache           -      -      -      -      -      -
>  ada1s4    7.08G  10.9G  1.58K      0  12.9M      0

Now start an endless loop:

# while true; do :; done

And the effect is:

>pool        alloc   free   read  write   read  write
>cache           -      -      -      -      -      -
>  ada1s4    7.08G  10.9G      9      0  76.8K      0

The VACUUM gets almost stuck! This matches the WCPU in "top":

>  PID USERNAME  PRI NICE   SIZE    RES STATE   TIME    WCPU COMMAND
>85583 root       99    0  7044K  1944K RUN     1:06  92.21% bash
>53005 pgsql      52    0   620M 91856K RUN     5:47   0.50% postgres

Hacking on kern.sched.quantum makes it quite a bit better:

# sysctl kern.sched.quantum=1
kern.sched.quantum: 94488 -> 7874

>pool        alloc   free   read  write   read  write
>cache           -      -      -      -      -      -
>  ada1s4    7.08G  10.9G    395      0  3.12M      0

>  PID USERNAME  PRI NICE   SIZE    RES STATE   TIME    WCPU COMMAND
>85583 root       94    0  7044K  1944K RUN     4:13  70.80% bash
>53005 pgsql      52    0   276M 91856K RUN     5:52  11.83% postgres

Now, as usual, the "root cause" questions arise: What exactly does this "quantum" do? Is this solution a workaround, i.e. is actually something else wrong, and does it have tradeoffs in other situations? Or otherwise, why was such a default value chosen, which appears to be ill-conceived?

The docs for the quantum parameter are a bit unsatisfying - they say it's the maximum number of ticks a process gets - and what happens when they're exhausted? If by default the endless loop is actually allowed to continue running for 94k ticks (or 94 ms, more likely) uninterrupted, then that explains the perceived behaviour - but that's certainly not what a scheduler should do when other procs are ready to run.

11.1-RELEASE-p7, kern.hz=200.
Switching tickless mode on or off does not influence the matter. Starting the endless loop with "nice" does not influence the matter either.

[1] A pure-I/O job without compute load, like "dd", does not show this behaviour. Also, when other tasks are running, the unjust behaviour is not as strongly pronounced.
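The two sysctl numbers above fit a simple conversion between the quantum in microseconds and a slice counted in statclock ticks. This is a sketch under assumptions, not the actual sched_ule code: it assumes one tick lasts 1000000 / stathz microseconds with stathz = 127 (inferred because 1000000 / 127 = 7874 in integer arithmetic, the value reported after writing 1), that the default slice is 12 ticks (kern.sched.slice: 12 in the sysctl listing posted in this thread), and that a written value is rounded to the nearest whole tick with a minimum of one. The function names are illustrative.

```c
/*
 * Sketch of how kern.sched.quantum (in microseconds) could map to a slice
 * measured in statclock ticks. Assumptions: one tick lasts 1000000 / stathz
 * microseconds, and a written value is rounded to the nearest whole tick,
 * with a minimum of one tick.
 */
int tick_us(int stathz)
{
    return 1000000 / stathz;                /* integer division, as in C */
}

int quantum_us(int stathz, int slice_ticks)
{
    return tick_us(stathz) * slice_ticks;
}

int slice_from_quantum(int stathz, int requested_us)
{
    int period = tick_us(stathz);
    int slice = (requested_us + period / 2) / period;   /* round to nearest */

    return slice < 1 ? 1 : slice;           /* never less than one tick */
}
```

Under these assumptions quantum_us(127, 12) yields 94488, so the default quantum is about 94.5 ms, i.e. 12 statclock ticks. Writing 1 rounds down to zero ticks, gets clamped to the one-tick minimum, and reads back as 7874.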
kernel crash from adjacent partitions (gpart, zfs)
Hi,

when creating partitions directly adjacent to each other, without some free space between them as a safety margin, the kernel may crash. Does anybody know how big that free space needs to be?

How I found out (and how to reproduce the crash):
https://forums.freebsd.org/threads/create-degraded-raid-5-with-2-disks-on-freebsd.70750/post-426756

OS concerned: 11.2, amd64 and i386.

Or, does anybody know if this is fixed in 12?
Rel. 11.3: Kernel doesn't compile anymore :(
Trying to compile my custom kernel in Rel. 11.3 results in this:

[code]--- kernel.full ---
linking kernel.full
atomic.o: In function `atomic_add_64':
/usr/obj/usr/src/sys/E1R11V1/./machine/atomic.h:629: multiple definition of `atomic_add_64'
opensolaris_atomic.o:/usr/src/sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S:71: first defined here
*** [kernel.full] Error code 1[/code]

The same config worked with 11.2.

The offending feature is either
  options ZFS
or
  device dtrace
(Adding either of these to the GENERIC config gives the same error.)

This happens only when building for i386. Building amd64 with these options works.
Re: Rel. 11.3: Kernel doesn't compile anymore (SVN-334762, please fix!)
> Trying to compile my custom kernel in Rel. 11.3 results in this:
>
> --- kernel.full ---
> linking kernel.full
> atomic.o: In function `atomic_add_64':
> /usr/obj/usr/src/sys/E1R11V1/./machine/atomic.h:629: multiple definition of `atomic_add_64'
> opensolaris_atomic.o:/usr/src/sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S:71: first defined here
> *** [kernel.full] Error code 1
>
> Same config worked with 11.2
>
> The offending feature is either
>    options ZFS
> or
>    device dtrace
> (Adding any of these to the GENERIC config gives the same error.)
>
> This happens only when building for i386. Building amd64 with these
> options works.

Trying to analyze the issue:

The problem appears with SVN 334762 in 11.3.

This change adds two new functions to sys/i386/include/atomic.h:
  atomic_add_64()
  atomic_subtract_64()
[I don't really understand why this goes into a header file, but, well, never mind]

Also, this change deactivates two functions (only in the *i386* case) in sys/cddl/compat/opensolaris/kern/opensolaris_atomic.c:
  atomic_add_64()
  atomic_del_64()
[Now, there seems to be a slight strangeness here: if we *deactivate* atomic_del_64() and *insert* atomic_subtract_64(), then these two names are not the same, and I might suppose that atomic_del_64() is then somehow missing. But, well, never mind]

Now, the strange thing: this file, sys/cddl/compat/opensolaris/kern/opensolaris_atomic.c, from which two functions now get excluded *only in the i386 case*, is not even compiled for i386:

>/usr/src/sys/conf$ grep opensolaris_atomic.c *
>files.arm:cddl/compat/opensolaris/kern/opensolaris_atomic.c      optional zfs | dtrace compile-with "${CDDL_C}"
>files.mips:cddl/compat/opensolaris/kern/opensolaris_atomic.c     optional zfs | dtrace compile-with "${CDDL_C}"
>files.powerpc:cddl/compat/opensolaris/kern/opensolaris_atomic.c  optional zfs powerpc | dtrace powerpc compile-with "${ZFS_C}"
>files.riscv:cddl/compat/opensolaris/kern/opensolaris_atomic.c    optional zfs | dtrace compile-with "${CDDL_C}"

[So maybe that's the reason why the now-missing atomic_del_64() doesn't cause a complaint? Or maybe it's not used, or maybe I didn't find some definition somewhere. Well, never mind]

Anyway, the actual name clash happens with sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S, because that one *is* compiled:

>/usr/src/sys/conf$ grep i386/opensolaris_atomic.S *
>files.i386:cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S  optional zfs | dtrace compile-with "${ZFS_S}"

I tried to back out the changes from SVN 334762. Sadly, that didn't work, because something already uses these atomic_add_64() functions. So instead, I did this:

--- sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S    (revision 350287)
+++ sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S    (working copy)
@@ -66,8 +66,7 @@
         * specific mapfile and remove the NODYNSORT attribute
         * from atomic_add_64_nv.
         */
-       ENTRY(atomic_add_64)
-       ALTENTRY(atomic_add_64_nv)
+       ENTRY(atomic_add_64_nv)
        pushl   %edi
        pushl   %ebx
        movl    12(%esp), %edi  // %edi = target address
@@ -87,7 +86,6 @@
        popl    %edi
        ret
        SET_SIZE(atomic_add_64_nv)
-       SET_SIZE(atomic_add_64)
 
        ENTRY(atomic_or_8_nv)
        movl    4(%esp), %edx   // %edx = target address

And at least it compiles now.
Whether it actually runs remains to be found out.

Bottom line: Please, please, please, sort this out and fix it.
Re: Rel. 11.3: Kernel doesn't compile anymore (SVN-334762, please fix!)
Hi Hans Petter,

glad to hear from You! :)

On Thu, Jul 25, 2019 at 09:39:26AM +0200, Hans Petter Selasky wrote:
! On 2019-07-25 01:00, Peter wrote:
! >> The offending feature is either
! >>   options ZFS
! >> or
! >>   device dtrace
! >> (Adding any of these to the GENERIC config gives the same error.)
!
! Can you attach your kernel configuration file?

Yes, but to what end? I can reproduce this with the GENERIC configuration by adding "options ZFS".

(My custom KERNCONF relates to my local patches, and is rather pointless without them. So at first I tried to reproduce the problem without my local patches and with minimal changes from the GENERIC config. And the minimal change is to add "options ZFS" to the GENERIC config.)

See here:

root@disp:/usr/src/sys/i386/compile/GENERIC # make
linking kernel.full
atomic.o: In function `atomic_add_64':
/usr/src/sys/i386/compile/GENERIC/./machine/atomic.h:629: multiple definition of `atomic_add_64'
opensolaris_atomic.o:/usr/src/sys/i386/compile/GENERIC/../../../cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S:71: first defined here
*** Error code 1

Stop.
make: stopped in /usr/src/sys/i386/compile/GENERIC
root@disp:/usr/src/sys/i386/compile/GENERIC #
root@disp:/usr/src/sys/i386/compile/GENERIC # cd ../../../..
root@disp:/usr/src # svn stat
M       sys/i386/conf/GENERIC
root@disp:/usr/src # svn diff
Index: sys/i386/conf/GENERIC
===================================================================
--- sys/i386/conf/GENERIC       (revision 350287)
+++ sys/i386/conf/GENERIC       (working copy)
@@ -1,3 +1,4 @@
+options ZFS
 #
 # GENERIC -- Generic kernel configuration file for FreeBSD/i386
 #
root@disp:/usr/src # svn info
Path: .
Working Copy Root Path: /usr/src
URL: https://svn0.us-east.freebsd.org/base/releng/11.3
Relative URL: ^/releng/11.3
Repository Root: https://svn0.us-east.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 350287
Node Kind: directory
Schedule: normal
Last Changed Author: gordon
Last Changed Rev: 350287
Last Changed Date: 2019-07-24 12:58:21 + (Wed, 24 Jul 2019)
Re: wrong value from DTRACE (uint32 for int64)
On Mon, 02 Dec 2019 21:58:36 +0100, Mark Johnston wrote:
> The DTRACE_PROBE* macros cast their parameters to uintptr_t, which will be 32 bits wide on i386. You might be able to work around the problem by casting arg0 to uint32_t in the script.

Thanks for the info - good that it has a logical explanation.
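The explanation quoted above accounts for the odd numbers in the original report: once a negative int64_t passes through a 32-bit uintptr_t, only its low 32 bits survive. The C below only models that arithmetic; it is not DTrace code. It assumes the true value fits into 32 bits, because only then can the sign be recovered from what is left. The helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Model of a negative int64_t probe argument passing through a 32-bit
 * uintptr_t (as on i386): only the low 32 bits survive.
 */
uint32_t truncate_to_32(int64_t value)
{
    return (uint32_t)(uint64_t)value;   /* well-defined modulo 2^32 */
}

/*
 * Reinterpret the surviving 32 bits as a two's-complement signed value.
 * Only correct if the original value fitted into an int32_t.
 */
int64_t sign_extend_32(uint32_t low)
{
    if (low & UINT32_C(0x80000000))
        return (int64_t)low - INT64_C(0x100000000);
    return (int64_t)low;
}
```

For example, an ARC value of -12288 turns into 4294955008 after truncation, which is the number printed by the "%d" probe output in the original report. Sign-extending it gives -12288 back, while positive values such as 81920 pass through unchanged.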
Re: Disabling speculative execution mitigations
On Fri, 06 Dec 2019 06:21:04 +0100, O'Connor, Daniel wrote:
> vm.pmap.pti="0"          # Disable page table isolation
> hw.ibrs_disable="1"      # Disable Indirect Branch Restricted Speculation
> hw.mds_disable="0"       # Disable Microarchitectural Data Sampling flush
> hw.vmm.vmx="1"           # Don't flush RSB on vmexit (presumably only affects bhyve etc)
> hw.lazy_fpu_switch="1"   # Lazily flush FPU
>
> Does anyone know of any others?

hw.spec_store_bypass_disable=2

I have that on 11.3 (no idea yet about 12).

And honestly, I have lost track of which of these should be on, off, automatic, opaque or elsewhere to achieve either performance or security (not to mention for which cores and under which circumstances it would matter, and what the impact might be), and my oracle says it will not end with these.
wrong value from DTRACE (uint32 for int64)
Hi @all,

I felt the need to look into my ZFS ARC, but DTrace provided misleading (i.e., wrong) output (on i386, 11.3-RELEASE):

# dtrace -Sn 'arc-available_memory { printf("%x %x", arg0, arg1); }'

DIFO 0x286450a0 returns D type (integer) (size 8)
OFF OPCODE      INSTRUCTION
00: 29010601    ldgs DT_VAR(262), %r1    ! DT_VAR(262) = "arg0"
01: 2301        ret %r1

NAME             ID   KND SCP FLAG TYPE
arg0             262  scl glb r    D type (integer) (size 8)

DIFO 0x286450f0 returns D type (integer) (size 8)
OFF OPCODE      INSTRUCTION
00: 29010701    ldgs DT_VAR(263), %r1    ! DT_VAR(263) = "arg1"
01: 2301        ret %r1

NAME             ID   KND SCP FLAG TYPE
arg1             263  scl glb r    D type (integer) (size 8)

dtrace: description 'arc-available_memory ' matched 1 probe
  0     14     none:arc-available_memory 2fb000 2
  0     14     none:arc-available_memory 4e000 2
  1     14     none:arc-available_memory b000 2
  1     14     none:arc-available_memory b000 2
  1     14     none:arc-available_memory b000 2
  1     14     none:arc-available_memory 19000 2
  0     14     none:arc-available_memory d38000 2

# dtrace -n 'arc-available_memory { printf("%d %d", arg0, arg1); }'
  1     14     none:arc-available_memory 81920 5
  1     14     none:arc-available_memory 69632 5
  1     14     none:arc-available_memory 4294955008 5
  1     14     none:arc-available_memory 4294955008 5

The arg0 variable is obviously shown here as an unsigned int32 value. But in fact, the probe in the source code in arc.c is a signed int64:

DTRACE_PROBE2(arc__available_memory, int64_t, lowest, int, r);

User @shkhin in the forum pointed me to check the bare dtrace program, unattached to the kernel code:
https://forums.freebsd.org/threads/dtrace-treats-int64_t-as-uint32_t-on-i386.73223/post-446517
And there everything appears correct.

So, two questions:
1. Can anybody check and confirm this happening?
2. Any idea what could be wrong here?

(The respective variable in arc.c holds the correct 64-bit negative value, I checked that - and otherwise the ARC couldn't shrink.)

rgds,
PMc
jedec_dimm fails to boot
I ran into an issue:

When I kldload jedec_dimm during runtime, it works just as expected, and the DIMM data appears in sysctl.

But when I
 * load jedec_dimm at the loader prompt, or
 * add it to loader.conf, or
 * compile it into a custom kernel,
it does not boot anymore.

My custom kernel just hangs somewhere while switching the screen, i.e. with no output. GENERIC reboots immediately during the device probe phase. So neither is suitable for gathering additional info in an easy way.

(And since my DIMMs appear to have neither thermal nor serial data, there is not much to gain for me here, so I will not pursue this further, at least not before switching to R.12.)

But I fear there are some general problems with sorting out the modules during system bring-up - see also my other message titled "panic: too many modules".

Some data for those interested:

FreeBSD 11.3-RELEASE-p6
CPU: Intel(R) Core(TM) i5-3570T CPU (IvyBridge)
Board: https://www.asus.com/Motherboards/P8B75V/specifications/

Config:
hint.jedec_dimm.0.at="smbus12"
hint.jedec_dimm.0.addr="0xa0"
hint.jedec_dimm.1.at="smbus12"
hint.jedec_dimm.1.addr="0xa2"
hint.jedec_dimm.2.at="smbus12"
hint.jedec_dimm.2.addr="0xa4"
hint.jedec_dimm.3.at="smbus12"
hint.jedec_dimm.3.addr="0xa6"

ichsmb0: port 0xf040-0xf05f mem 0xf7d15000-0xf7d150ff irq 18 at device 31.3 on pci0
smbus12: on ichsmb0
smb12: on smbus12

With GENERIC it becomes smbus0 (because drm2 is not loaded), and I need to load "smbus" and "ichsmb" beforehand.

Cheerio,
PMc
panic: too many modules
Up front: I do not like loadable modules. They are nice for trying something out, but when you start to depend on some dozen loaded modules, debugging becomes a living hell: say you hunt some spurious misbehaviour and compare logfiles with those from four weeks ago, you will not know exactly which modules were loaded at that time.

Compiling everything into the kernel has the advantage that 'uname' changes with every change and so precisely describes the running kernel.

So I came across the cc_vegas and cc_cdg modules, and these are not set up to be compiled into the kernel out of the box. But that should not be a big deal: just add some arbitrary new device to the KERNCONF, and then add the required files to sys/conf/files appropriately. Should work.

But it doesn't. Right after the startup message, before even probing devices, it says

panic: module_register_init: module named ertt not found

and a stacktrace from kern/init_main.c:mi_startup(). But h_ertt is definitely present in the kernel (I checked).

To have a closer look, I added VERBOSE_SYSINIT to the kernel, and - the panic is gone, everything works as expected. Without even activating the output from VERBOSE_SYSINIT.

Then, I moved netinet/khelp/h_ertt.c to the very end of sys/conf/files - and this also avoids the panic, and things do work. Even though this change does nothing but change the order in which the files are compiled (and probably linked).

I think this is not good. Everybody likes modules (although -see above- they come with a serious tradeoff in reproducibility). But if we now deliver components only as loadable modules because a compound kernel is no longer able to sort them out at boot, that's a more serious issue. I wouldn't complain if the module simply did not work (reproducibly) when compiled into the kernel - but this here appears to be a race, most likely a timing race. And that such a thing can happen at the point where the kernel sorts out its own components - oops, that does worry me indeed...

There also seems to be a desire for a *fast* system bring-up. I don't share that. I boot once a quarter, and if that takes an hour I don't mind. Maybe there is a need for an option, to give fast boot to those who want something like a gaming console, available immediately, and slow boot to those who want a reliable system in 24/7 operation?

Maybe I'll take a closer look at the issue after switching to R.12 (probably not this year). Or maybe somebody would like to point me to some paper describing how the module fabric is supposed to interface and by which steps the runtime linkage is achieved?

Platform: FreeBSD 11.3-RELEASE-p6, Intel(R) Core(TM) i5-3570T CPU (IvyBridge)

cheerio,
PMc
Re: jedec_dimm fails to boot
On Wed, Mar 04, 2020 at 11:41:22PM +0300, Yuri Pankov wrote:
! On 04.03.2020 19:09, Peter wrote:
! > When I kldload jedec_dimm during runtime, it works just as expected,
! > and the DIMM data appears in sysctl.
! >
! > But when I do
! > * load the jedec_dimm at the loader prompt, or
! > * add it to loader.conf, or
! > * compile it into a custom kernel,
! > it does not boot anymore.
!
! Could you try backporting r351604 and see if it helps?

Yep, that works. Thank You! :)
Re: ZFS and power management
On Wed, 18 Dec 2019 17:22:16 +0100, Karl Denninger wrote:
> I'm curious if anyone has come up with a way to do this...
>
> I have a system here that has two pools -- one comprised of SSD disks that are the "most commonly used" things including user home directories and mailboxes, and another that is comprised of very large things that are far less-commonly used (e.g. video data files, media, build environments for various devices, etc.)

I have been using such a configuration for more than 10 years already, and have not seen the problems You describe.
Disks are powered down with gstopd or other means, and they stay powered down until filesystems in the pool are actively accessed.

A difficulty for me was that postgres autovacuum must be completely disabled if there are tablespaces on the quiesced pools. Another thing that comes to mind is smartctl in daemon mode (but I never used that). There are probably a whole bunch more potential culprits, so I suggest You work through all the housekeeping stuff (daemons, cronjobs, etc.) to find it.
session mgmt: does POSIX indeed prohibit NOOP execution?
    pgrp = getpid();
    if(setpgid(0, pgrp) < 0)
        err(1, "setpgid");

This appears to me to be a program trying to daemonize itself (in the old style, from when there was only job control but no session management).

In case this program is already properly daemonized, e.g. by starting it from /usr/sbin/daemon, this code now fails, invoking the err() clause and thereby aborting.

From what I could find out, POSIX does not allow a session leader to do setpgid() on itself.

When a program is invoked via /usr/sbin/daemon, it should already be session leader AND group leader, and then the above code WOULD be a NOOP - unless POSIX requires setpgid() to fail and thereby the program to abort, which, btw, is NOT a NOOP :(

So, where is the mistake here?

Option 1: I have completely misunderstood something. Then please tell me what.
Option 2: The quoted code is bogus. Then why is it in base?
Option 3: The setpgid() behaviour is bogus. It may stop a session leader from executing it, but it should detect a NOOP and just go through with it. Then why don't we fix that?
Option 4: POSIX is bogus. Unlikely, because as far as I could find out, that part of it was written following the Berkeley implementation.
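One way to make the quoted snippet tolerate already being daemonized is to check first whether the process already leads its own process group, and to call setpgid() only otherwise. This is a minimal sketch of that idea, not the code actually in base; the function name ensure_pgrp_leader() is hypothetical. It relies on the fact that a session leader always leads a process group whose ID equals its own PID, so the check also covers the session-leader case, in which setpgid() would fail.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/*
 * Defensive variant of the quoted snippet (a sketch, not the base-system
 * code): become a process group leader, but skip setpgid() when we
 * already are one. A session leader always leads its own process group,
 * so this also avoids the call that POSIX requires to fail for it.
 */
int ensure_pgrp_leader(void)
{
    if (getpgrp() == getpid())
        return 0;               /* already group leader: nothing to do */
    return setpgid(0, 0);       /* pid 0 = self, pgid 0 = use own pid */
}
```

A caller would then replace the three quoted lines with a single check such as `if (ensure_pgrp_leader() < 0) err(1, "setpgid");`, which still reports genuine failures but turns the "already a leader" case into the no-op one would expect.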
Re: session mgmt: does POSIX indeed prohibit NOOP execution?
On Mon, 06 Jan 2020 01:10:57 +0100, Christoph Moench-Tegeder wrote:
> > When a program is invoked via /usr/sbin/daemon, it should already be session leader AND group leader, and then the above code WOULD be a NOOP, unless POSIX would require the setpgid() to fail and thereby the program to abort - which, btw, is NOT a NOOP :(
>
> https://pubs.opengroup.org/onlinepubs/9699919799/
> "The setpgid() function shall fail if: [...] The process indicated by the pid argument is a session leader."

Okay, so what You are saying is that the information I got was correct insofar as POSIX indeed demands the observed behaviour. Thanks for that confirmation.

Not much room to argue? Why is that? This is not about laws you have to follow blindly whether you understand them or not; this is all about an outcome - a working machine that should function properly. So either there are other positive aspects of this behaviour that weigh against the perceived malfunction, or the requirement is simply wrong. And the latter case should be all the argument that is needed.

I do not say disobey POSIX. I only say that one of the involved parts must certainly be wrong, and that should be fixed.

So if You are saying that the problem is in POSIX, but we are in the role of blind monkeys who have to follow that alien commandment by all means, no matter the outcome, then this does not seem acceptable to me.

Actually, as it seems to me, this whole session thing originally came out of Kirk McKusick's kitchen and made its way from there into POSIX, so if there is indeed a flaw in it, it should well be possible to fix it by going the same way.

In any case, this here (to be found in /etc/rc.d/kadmind) is a crappy workaround and not acceptable style:

command_args="$command_args &"

We aren't slaves, or are we?

I, for my part, came across this matter just accidentally, and as my stance is
1. the code has to be solid enough to stand the Jupiter mission, and therefore
2. do a root-cause analysis always, on every misbehaviour (and then fix it once and for all),
I figured that thing out.

rgds,
PMc
Fwd: Re: session mgmt: does POSIX indeed prohibit NOOP execution?
> Not much room to argue? Why that? This is not about laws you have to follow blindly whether you understand them or not, this is all about an Outcome - a working machine that should properly function. "Not much to argue about what behaviour is required by the standard". The standard could have been written to require different behaviour and most probably still make sense, but it wasn't; but at least it's unambiguous. After that, the discussion is rather... philosophical. It is not the standard that concerns me, it is *failure* that concerns me. When I try to run a daemon from the base OS (in the orderly way, via the daemon command), and it just DOES NOT WORK, and I have to dig into it to find out what's actually wrong, then for me that's not philosophy, that's a failure that takes some effort to fix. And I don't want such issues, and, more importantly, I don't want other people to run into the same issue again! (Not sure what is so difficult to understand about that.) In any case, either the base system has a flaw, or the syscall has a flaw, or POSIX has a flaw. I don't care which, You're free to choose. But if you instead think that flaws are not allowed to exist because POSIX is perfect, and therefore the much better solution is to just bully the people who happen to run into the flaws, well, that's also okay. rgds, PMc
12.2 cpuset behaves unexpected
After upgrading 11.4 -> 12.2, cpuset now behaves rather differently: # cpuset -C -p NNN 11.4: a new set is created with all CPUs enabled, and the process is moved into that set, with the thread mask unchanged. 12.2: nothing is done, but an error is raised if threadmask == setmask. # cpuset -l XX -C -p NNN 11.4: a new set is created with all CPUs enabled, and the process is moved into that set, with the thread mask changed to the -l parameter. 12.2: an error is raised if threadmask == setmask; otherwise the thread mask is changed to the -l parameter. It seems the -C option no longer works (except for producing errors that look somewhat bogus). PMc
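When reproducing the threadmask == setmask case, it helps to compare masks in a canonical form, because `cpuset -l` accepts ranges such as `0-3` while `cpuset -g` prints comma-separated lists. This is a minimal sketch with hypothetical helper names (`cpulist_expand`, `same_cpumask`). It assumes `cpuset -g -p PID` prints a line of the form `pid NNN mask: 0, 1, 2, 3`.

```sh
# cpulist_expand LIST  (hypothetical helper)
# Expands "0-3,6" or "0, 1, 2, 3" into a sorted, space-separated list.
cpulist_expand() {
    echo "$1" | tr -d ' ' | tr ',' '\n' | while IFS=- read -r _lo _hi; do
        [ -n "$_lo" ] || continue        # skip empty entries
        [ -n "$_hi" ] || _hi=$_lo        # single CPU, not a range
        _c=$_lo
        while [ "$_c" -le "$_hi" ]; do
            echo "$_c"
            _c=$((_c + 1))
        done
    done | sort -n -u | tr '\n' ' ' | sed 's/ $//'
}

# same_cpumask A B: succeeds when both lists name the same set of CPUs.
same_cpumask() {
    [ "$(cpulist_expand "$1")" = "$(cpulist_expand "$2")" ]
}

# Assumed usage (output format of cpuset -g is an assumption):
#   cur=$(cpuset -g -p NNN | sed -n 's/^pid [0-9]* mask: //p')
#   same_cpumask "$cur" "0-3" && echo "thread mask already equals 0-3"
```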
12.2 Firefox immediate crash "exiting due to channel error"
Hi all, I was forced to upgrade 11.4 -> 12.2, as QT5 requires OpenSSL 1.1.1. I did a full rebuild from source as of: 12.2-RC2 FreeBSD 12.2-RC2 #11 r366648M#N1055:1078 (local patches applied - some published via sendbug 10 or 12 years ago). I did a full rebuild of ALL ports from source, as of 2020Q4, Revision: 552058. I verified that all files in /usr/local were newly written. Then I removed COMPAT_FREEBSD11. Firefox (firefox-esr 78.3.1_3,1) reproducibly crashes immediately at startup with "exiting due to channel error". This is solved by putting COMPAT_FREEBSD11 back in (found after spending the better part of a day on kernel builds, halving the diffs between GENERIC and my config). I found some comments, but they do not elaborate on the issue, e.g.: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233028#c13 (that's two years old and concerns 12.0-PRERELEASE!) Finally I found this: https://reviews.freebsd.org/D23100 "The Rust ecosystem currently uses pre-ino64 syscalls, so building lang/rust without COMPAT_FREEBSD11 is not going to work." It seems that *RUNNING* Rust-built stuff without COMPAT_FREEBSD11 is also not going to work - and one wouldn't expect this (and would probably search for a long time), because removing compat options as a final step before rebooting, *AFTER* everything has been rebuilt and the installation verified, is just good practice. So, as a user I would expect to find this mentioned in the release notes. OTOH, Rust is an add-on, so one could take the position that base is not concerned. But then at least ports/UPDATING should mention it somehow.
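Before dropping COMPAT_FREEBSD11 (or any other compat option), it may help to check what the running kernel actually contains. On kernels built with `options INCLUDE_CONFIG_FILE` (GENERIC has it), `sysctl -n kern.conftxt` prints the kernel configuration. The sketch below assumes that text lists options as `options<TAB>NAME` lines. `has_kernel_option` is a hypothetical helper, not an existing tool.

```sh
# has_kernel_option NAME  (hypothetical helper)
# Reads kernel configuration text on stdin (e.g. "sysctl -n kern.conftxt")
# and succeeds if a line "options NAME" is present.
has_kernel_option() {
    awk -v opt="$1" '
        $1 == "options" && $2 == opt { found = 1 }
        END { exit !found }
    '
}

# Assumed usage:
#   sysctl -n kern.conftxt | has_kernel_option COMPAT_FREEBSD11 ||
#       echo "COMPAT_FREEBSD11 not in kernel: pre-ino64 (e.g. Rust-built) binaries may fail"
```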
Re: How to free used Swap-Space?
On Tue, Sep 22, 2020 at 12:33:19PM -0400, Mark Johnston wrote: ! On Tue, Sep 22, 2020 at 06:08:01PM +0200, Peter wrote: ! > my machine should use about 3-4, maybe 5 GB swapspace. Today I found ! > it suddenly uses 8 GB (which is worryingly near the configured 10G). ! > ! > I stopped all the big suckers - nothing found. ! > I stopped all the jails - no success. ! > I brought it down to singleuser: it tried to swapoff, but failed. ! > ! > I unmounted all filesystems, exported all pools, detached all geli, ! > and removed most of the netgraphs. Swap is still occupied. ! > ! > Machine is now running only the init and a shell processes, has ! > almost no filesystems mounted, has mostly native networks only, and ! > this still occupies 3 GB of swap which cannot be released. ! > ! > What is going on, what is doing this, and how can I get this swapspace ! > released?? ! ! Do you have any shared memory segments lingering? ipcs -a will show ! SysV shared memory usage. I have four small shmem segments from four postgres clusters running. These should cleanly disappear when the clusters are stopped, and they are very small. Shared Memory: T ID KEY MODEOWNERGROUPCREATOR CGROUP NATTCHSEGSZ CPID LPID ATIMEDTIMECTIME m65536 5432001 --rw--- postgres postgres postgres postgres 7 48 4793 4793 6:09:34 18:00:31 6:09:34 m655370 --rw--- postgres postgres postgres postgres 11 48 6268 6268 6:09:42 10:48:27 6:09:42 m655380 --rw--- postgres postgres postgres postgres 5 48 6968 6968 6:09:46 18:28:36 6:09:46 m655390 --rw--- postgres postgres postgres postgres 6 48 6992 6992 6:09:47 3:38:34 6:09:47 ! For POSIX shared memory, in 11.4 we do not ! have any good way of listing objects, but "vmstat -m | grep shmfd" will ! at least show whether any are allocated. There is something, and I don't know who owns that: $ vmstat -m | grep shmfd shmfd1314K - 473 64,256,1024,8192 But that doesn't look big either. 
Furthermore, this machine has been running for quite some time already; it was running as i386 (with ZFS) until very recently, and I know quite well what uses a lot of memory: these 3 GB were illegitimate; they came from nothing I installed. And they are new; this has not happened before. ! If those don't turn anything ! up then it's possible that there's a swap leak. Do you use any DRM ! graphics drivers on this system? Probably yes. No graphics are used at all; it just uses "device vt" in text mode, but it uses the i5-3570T CPU's integrated graphics (IvyBridge HD2500) for that, and the drivers are "drm2" and "i915drm" from /usr/src/sys (not those from ports). Not sure how that would account for 3 GB, unless there is indeed some leak. regards, PMc
Re: How to free used Swap-Space? (from errno=8)
I think I can reproduce the problem now. See below. On Tue, Sep 22, 2020 at 02:09:01PM -0400, Mark Johnston wrote: ! On Tue, Sep 22, 2020 at 07:31:07PM +0200, Peter wrote: ! > There is something, and I don't know who owns that: ! > $ vmstat -m | grep shmfd ! > shmfd1314K - 473 64,256,1024,8192 ! > ! > But that doesn't look big either. ! ! That is just the amount of kernel memory used to track a set of objects, ! not the actual object sizes. Unfortunately, in 11 I don't think there's ! any way to enumerate them other than running kgdb and examining the ! shm_dictionary hash table. One of the owners of this is also postgres (maybe among others). ! I think I see a possible problem in i915, though I'm not sure if you'd ! trigger it just by using vt(4). It should be fixed in later FreeBSD ! versions, but is still a problem in 11. Here's a (untested) patch: Thank You, I'll keep that one in store, just in case. But now I found something simpler, while tracking down error messages that caught my eye along the way: When patching to 11.4-p3, I had been reluctant to recompile lib32 and install it everywhere, and had removed it from the systems. And obviously, I had forgotten to recompile some of my old self-written binaries; they were still i386 and were called by various scripts. So what happens then is this: $ file scc.e scc.e: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 9.3 (903504), stripped $ ./scc.e ELF interpreter /libexec/ld-elf.so.1 not found, error 8 Abort trap And this costs some (hundred?) kB of swap space every time it happens. And that memory is never released again, nor can the affected jail fully die. So, maybe, when removing lib32 & friends from the system, one must also remove "options COMPAT_FREEBSD32" from the kernel, so that it does not try to run such a binary at all, and maybe that would avoid the issue. (But then, what if one uses lib32 only in *some* jails? Some evil user in another jail could then bring along an i386 binary and crash the system by bloating the memory.) Anyway, my problem is now solved, as I needed these binaries back in working order anyway. regards, PMc
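The root cause here was leftover i386 binaries after lib32 had been removed. A quick way to hunt for them is to filter file(1) output for 32-bit Intel 80386 ELF objects. This is a minimal sketch. `list_i386_elf` is a hypothetical helper. It assumes the usual `path: description` output format and does not handle paths that themselves contain ": ".

```sh
# list_i386_elf  (hypothetical helper)
# Reads file(1) output ("path: description") on stdin and prints the
# paths of 32-bit i386 ELF objects. These need COMPAT_FREEBSD32 in the
# kernel and, if dynamically linked, the lib32 runtime files.
list_i386_elf() {
    awk -F': ' '/ELF 32-bit/ && /Intel 80386/ { print $1 }'
}

# Assumed usage (directories are examples only):
#   find /usr/local/bin /usr/local/sbin -type f -exec file {} + | list_i386_elf
```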
Re: How to free used Swap-Space? (from errno=8)
On Wed, Sep 23, 2020 at 12:03:32AM +0300, Konstantin Belousov wrote: ! On Tue, Sep 22, 2020 at 09:11:49PM +0200, Peter wrote: ! > So what happens then is this: ! > ! > $ file scc.e ! > scc.e: ELF 32-bit LSB executable, Intel 80386, version 1 ! > (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, ! > for FreeBSD 9.3 (903504), stripped ! > ! > $ ./scc.e ! > ELF interpreter /libexec/ld-elf.so.1 not found, error 8 ! > Abort trap ! > ! > And this will cost about some (hundred?) kB of swapspace every time it ! > happens. And they do not go away again, neither can the concerned jail ! > do fully die again. ! In what sense it 'costs' ? Well, that amount of memory gets occupied. Forever, that is, until power-off/reset. ! Can you show exact sequence of commands and outputs that demostrate your ! point ? What type of filesystem the binaries live on ? Oh, I didn't pay attention to that. Originally on ZFS. When I tried to reproduce it, most likely on an NFSv4 share, as I didn't bother to put it anywhere special. ! I want to reproduce it locally. Yes, that's great! Let's see which info You are lacking. Here we are now on my desktop box (mostly the same machine and configuration: i5-3570, 11.4-p3, amd64). I explicitly removed all the files that do not get installed when /etc/src.conf contains "WITHOUT_LIB32=", but I still have COMPAT_FREEBSD32 in the kernel. Now I fetch such an old R9.3/i386 binary from my backups and drop it into some NFS filesystem: (That binary is only 4kB; I attach it here, so if you want to try, you can use that one straight away - in normal operation it just converts some words from stdin to stdout).
admin@disp:510:1/ext/Repos$ dir usr2sys -rwxr-xr-x 1 bin bin4316 Apr 7 2016 usr2sys admin@disp:511:1/ext/Repos$ file usr2sys usr2sys: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 9.3 (903504), stripped admin@disp:513:1/ext/Repos$ mount | grep Repos edge-e:/ext/Repos on /ext/Repos (nfs, nfsv4acls) admin@disp:514:1/ext/Repos$ top | cat Mem: 952M Active, 1687M Inact, 419M Laundry, 4423M Wired, 774M Buf, 348M Free ARC: 1940M Total, 1378M MFU, 172M MRU, 2492K Anon, 48M Header, 340M Other 1134M Compressed, 2749M Uncompressed, 2.43:1 Ratio Swap: 20G Total, 36M Used, 20G Free As we see, this machine has 8 Gig installed and currently about no swap used. Now watch what happens: epos$ ./usr2sys ELF interpreter /libexec/ld-elf.so.1 not found, error 8 Abort trap admin@disp:519:1/ext/Repos$ for i in `seq 1000` > do ./usr2sys > done ELF interpreter /libexec/ld-elf.so.1 not found, error 8 Abort trap ... admin@disp:514:1/ext/Repos$ top | cat Mem: 1010M Active, 1807M Inact, 419M Laundry, 4523M Wired, 774M Buf, 69M Free ARC: 1940M Total, 1383M MFU, 166M MRU, 2503K Anon, 48M Header, 340M Other 1134M Compressed, 2750M Uncompressed, 2.43:1 Ratio Swap: 20G Total, 36M Used, 20G Free The free memory has already disappeared! admin@disp:521:1/ext/Repos$ for i in `seq 5000`; do ./usr2sys ; done ... admin@disp:522:1/ext/Repos$ top | cat Mem: 2154M Active, 78M Inact, 787M Laundry, 4722M Wired, 774M Buf, 89M Free ARC: 1753M Total, 1273M MFU, 97M MRU, 2653K Anon, 39M Header, 340M Other 953M Compressed, 2445M Uncompressed, 2.56:1 Ratio Swap: 20G Total, 358M Used, 20G Free, 1% Inuse Now the swapspace starts filling. 
Let's see whether the filesystem the binary is placed on makes any difference, and move to UFS: admin@disp:525:1/ext/Repos$ su - Password: root@disp:~ # cp /ext/Repos/usr2sys /var root@disp:~ # dir /var/usr2sys -rwxr-xr-x 1 bin bin 4316 Sep 22 23:55 /var/usr2sys root@disp:~ # mount | grep /var /dev/ada0p5 on /var (ufs, local, soft-updates) admin@disp:527:1/var$ ./usr2sys ELF interpreter /libexec/ld-elf.so.1 not found, error 8 Abort trap admin@disp:521:1/ext/Repos$ for i in `seq 5000`; do ./usr2sys ; done ELF interpreter /libexec/ld-elf.so.1 not found, error 8 Abort trap ... Ahh, that runs a LOT faster now than on NFS! admin@disp:529:1/var$ top | cat Mem: 1546M Active, 67M Inact, 934M Laundry, 5121M Wired, 774M Buf, 161M Free ARC: 1646M Total, 1159M MFU, 107M MRU, 2686K Anon, 37M Header, 340M Other 849M Compressed, 2257M Uncompressed, 2.66:1 Ratio Swap: 20G Total, 1658M Used, 18G Free, 8% Inuse But the memory leakage is similar, or worse. admin@disp:530:1/var$ df tmp Filesystem 1K-blocks Used Avail Capacity Mounted on zdesk/var/tmp 24747504 231052 24516452 1% /var/tmp admin@disp:531:1/var$ cp usr2sys tmp admin@disp:532:1/var$ cd tmp admin@disp:533:1/var/tmp$ ./usr2sys ELF interpreter /libexec/ld-elf.so.1 not found, error 8 Abort trap admin@disp:534:1/var/tmp$ for i in `seq 5000`; do ./usr2sys ; done ... You can see this is now ZFS, and the behaviour is basically the same: Mem: 1497M Active, 5292K Inact, 803M Laundry, 5313M Wired, 774M Buf, 212M Free ARC: 1432M Total, 963M MFU, 105M MRU, 2511K Anon, 21M Header, 341M Other 6
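The reproduction compares the "Mem:" and "Swap:" lines of `top | cat` before and after each loop. To script that comparison, a single field can be pulled out of such a line and normalized to megabytes. This is a minimal sketch. `top_mb` is a hypothetical helper. It assumes the `<number><K|M|G> <Label>` layout shown in the output above.

```sh
# top_mb LINE LABEL  (hypothetical helper)
# Extracts the value labelled LABEL (e.g. "Free", "Used") from a top(1)
# summary line such as "Swap: 20G Total, 358M Used, 20G Free" and prints
# it in megabytes (K values are rounded down; % values are ignored).
top_mb() {
    echo "$1" | tr ',' '\n' | awk -v label="$2" '
        $NF == label {
            v = $(NF - 1)
            unit = substr(v, length(v))
            n = substr(v, 1, length(v) - 1) + 0
            if (unit == "G") n *= 1024
            else if (unit == "K") n = int(n / 1024)
            else if (unit != "M") next
            print n
            exit
        }
    '
}

# Assumed usage around a test loop:
#   before=$(top_mb "$(top | grep '^Swap:')" Used)
#   for i in $(seq 1000); do ./usr2sys; done
#   after=$(top_mb "$(top | grep '^Swap:')" Used)
#   echo "swap grew by $((after - before)) MB"
```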
How to free used Swap-Space?
Hi all, my machine should use about 3-4, maybe 5 GB of swap space. Today I found it suddenly uses 8 GB (which is worryingly near the configured 10G). I stopped all the big suckers - nothing found. I stopped all the jails - no success. I brought it down to single-user: it tried to swapoff, but failed. I unmounted all filesystems, exported all pools, detached all geli, and removed most of the netgraphs. Swap is still occupied. The machine is now running only init and a few shell processes, has almost no filesystems mounted, has mostly native networks only, and this still occupies 3 GB of swap which cannot be released. What is going on, what is doing this, and how can I get this swap space released?? It is 11.4-RELEASE-p3 amd64. Script started on Mon Sep 21 05:43:20 2020 root@edge# ps axlww UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND 0 0 0 0 -16 00 752 swapin DLs -291:32.41 [kernel] 0 1 0 0 20 0 5416 248 wait ILs - 0:00.22 /sbin/init -- 0 2 0 0 -16 00 16 ftcl DL- 0:00.00 [ftcleanup] 0 3 0 0 -16 00 16 crypto_w DL- 0:00.00 [crypto] 0 4 0 0 -16 00 16 crypto_r DL- 0:00.00 [crypto returns] 0 5 0 0 -16 00 32 -DL- 11:41.94 [cam] 0 6 0 0 -8 00 80 t->zthr_ DL- 13:07.13 [zfskern] 0 7 0 0 -16 00 16 waiting_ DL- 0:00.00 [sctp_iterator] 0 8 0 0 -16 00 16 -DL- 2:05.20 [rand_harvestq] 0 9 0 0 -16 00 16 -DL- 0:00.04 [soaiod1] 010 0 0 155 00 64 -RNL - 17115:06.48 [idle] 011 0 0 -52 00 352 -WL- 49:05.30 [intr] 012 0 0 -16 00 64 sleepDL- 16:28.51 [ng_queue] 013 0 0 -8 00 48 -DL- 23:10.60 [geom] 014 0 0 -16 00 16 seqstate DL- 0:00.00 [sequencer 00] 015 0 0 -68 00 160 -DL- 0:23.64 [usb] 016 0 0 -16 00 16 -DL- 0:00.04 [soaiod2] 017 0 0 -16 00 16 -DL- 0:00.04 [soaiod3] 018 0 0 -16 00 16 -DL- 0:00.04 [soaiod4] 019 0 0 -16 00 16 idle DL- 0:00.83 [enc_daemon0] 020 0 0 -16 00 48 psleep DL- 12:07.72 [pagedaemon] 021 0 0 20 00 16 psleep DL- 4:12.41 [vmdaemon] 022 0 0 155 00 16 pgzero DNL - 0:00.00 [pagezero] 023 0 0 -16 00 64 psleep DL- 0:23.50 [bufdaemon] 024 0 0 20 00 16 -DL- 0:04.21
[bufspacedaemon] 025 0 0 16 00 16 syncer DL- 0:32.48 [syncer] 026 0 0 -16 00 16 vlruwt DL- 0:02.31 [vnlru] 027 0 0 -16 00 16 -DL- 7:11.58 [racctd] 0 157 0 0 20 00 16 geli:w DL- 0:22.03 [g_eli[0] ada1p2] 0 158 0 0 20 00 16 geli:w DL- 0:22.77 [g_eli[1] ada1p2] 0 159 0 0 20 00 16 geli:w DL- 0:31.08 [g_eli[2] ada1p2] 0 160 0 0 20 00 16 geli:w DL- 0:29.41 [g_eli[3] ada1p2] 0 70865 1 0 20 0 7076 3104 wait Ss v0 0:00.21 -sh (sh) 0 71135 70865 0 20 0 6392 2308 select S+ v0 0:00.00 script 0 71136 71135 0 23 0 7076 3068 wait Ss0 0:00.00 /bin/sh -i 0 71142 71136 0 23 0 6928 2584 -R+0 0:00.00 ps axlww root@edge# df Filesystem 512-blocksUsed Avail Capacity Mounted on /dev/ada3p31936568 860864 92078448%/ devfs2 2 0 100%/dev procfs 8 8 0 100%/proc /dev/ada3p43099192 1184896 166636842%/usr /dev/ada3p5 5803448112 525808 2%/var root@edge# pstat -s Device 512-blocks UsedAvail Capacity /dev/ada1p2.eli 10485760 5839232 464652856% root@edge# top | cat last pid: 71147; load averages: 0.19, 0.08, 0.09 up 3+03:21:0005:44:12 5 processes:1 running, 4 sleeping Mem: 9732K Active, 10M Inact, 882M Laundry, 1920M Wired, 10M Buf, 1023M Free ARC: 335K Total, 16K MFU, 304K MRU, 15K Header 320K Compressed, 2944K Uncompressed, 9.20:1 Ratio Swap: 5120M Total, 2851M Used, 2269M Free, 55% Inuse PID USERNAMETHR PRI NICE SIZERES STATE C TIMEWCPU COMMAND 70865 root 1 200 7076K 3104K wait2 0:00 0.00% sh 71135 root 1 200 6392K 2308K select 1 0:00 0.00% script 71136 root 1 200 7076K 3068K wait2 0:00 0.00% sh 71146 root 1 200 7928K 2980K CPU00 0:00 0.00% top 71147 root 1 200 6300K 2088K piperd 1 0:00 0.00%
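As a cross-check of the numbers above: `pstat -s` reports 5839232 used blocks of 512 bytes, and 5839232 * 512 / 1048576 = 2851.19, i.e. about 2851 MB. That matches the "2851M Used" shown by top. The sketch below does the same sum over all swap devices. `swap_used_mb` is a hypothetical helper. It assumes the "512-blocks" header shown above; if BLOCKSIZE is set differently in your environment, the factor 512 must change accordingly.

```sh
# swap_used_mb  (hypothetical helper)
# Reads "pstat -s" / "swapinfo" output on stdin (sizes in 512-byte
# blocks), sums the Used column over all devices, prints megabytes.
swap_used_mb() {
    awk '
        $1 == "Device" { next }   # header line
        $1 == "Total"  { next }   # summary line printed for >1 device
        NF >= 4 { used += $3 }
        END { printf "%d\n", used * 512 / 1048576 }
    '
}

# Assumed usage:
#   pstat -s | swap_used_mb
```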
Help! 12.2 mem ctrl knob missing, might need 3 times more memory
Hiya, after upgrading 11.4 -> 12.2, I get this error: > sysctl: unknown oid 'vm.pageout_wakeup_thresh' at line 105 How do I adjust the paging now? The ARC is much too small: Mem: 1929M Active, 109M Inact, 178M Laundry, 1538M Wired, 37M Buf, 88M Free ARC: 729M Total, 428M MFU, 154M MRU, 196K Anon, 25M Header, 122M Other 118M Compressed, 533M Uncompressed, 4.52:1 Ratio Swap: 10G Total, 1672M Used, 8567M Free, 16% Inuse With 11.4 there was 200M active, 2500M wired, 4200M swap, and the ARC stayed filled to the configured arc_max. And not even all applications are loaded yet! Config: 4G RAM installed, application footprint ~11G. vm.pageout_wakeup_thresh=11000 # default 6886 vm.v_inactive_target=48000 # default 1.5x vm.v_free_target vfs.zfs.arc_grow_retry=6 # override shrink-event from pageout (every 10sec.) I did this intentionally: the RAM is overcommitted with applications. These applications are rarely accessed, but should respond to the network. So they are best accommodated in paging space - taking a few seconds for page-in at first access does not matter, and not many of them are accessed at the same time. So, I want the machine to page out *before* shrinking the ARC, because pageout is a normal occurrence in this layout. The above tuning achieved exactly that, but in 12.2 the knob seems to be gone. Without it I would need to install the full 12G of RAM, which is just a waste. How do I get this behaviour back with 12.2?
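One way to keep a single tuning file usable across releases is to apply each entry only if the running kernel knows the OID. This does not bring back the 11.4 paging behaviour; it only turns the hard "unknown oid" error into a warning. This is a minimal sketch, assuming the `name=value # comment` format shown above. The helper names `sysctl_conf_entries` and `apply_if_present` are hypothetical. `sysctl -N` (print names only) is used to probe whether an OID exists.

```sh
# sysctl_conf_entries  (hypothetical helper)
# Reads sysctl.conf-style lines on stdin and prints "name value" pairs,
# dropping comments and blank lines. Whitespace is removed everywhere,
# so this is only meant for numeric tunables like the ones above.
sysctl_conf_entries() {
    sed -n -e 's/#.*//' -e 's/[[:space:]]//g' \
        -e 's/^\([^=][^=]*\)=\(.*\)$/\1 \2/p'
}

# apply_if_present FILE  (hypothetical helper, must run as root)
# Sets each OID from FILE only if the running kernel knows it.
apply_if_present() {
    sysctl_conf_entries < "$1" | while read -r _name _val; do
        if sysctl -N "$_name" >/dev/null 2>&1; then
            sysctl "$_name=$_val"
        else
            echo "skipping unknown oid: $_name" >&2
        fi
    done
}
```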
Panic: 12.2 fails to use VIMAGE jails
After a clean upgrade (from source) from 11.4 to 12.2-p1, my jails no longer work correctly. Old-fashioned jails seem to work, but most are VIMAGE+NETGRAPH style, and these do not work properly. All of them worked flawlessly for nearly a year with Rel.11. If I start 2-3 jails and then stop them again, there is always a panic. This is also reproducible with the GENERIC kernel. Can this be fixed, or do I need to revert to 11.4? The backtrace looks like this: #4 0x810bbadf at trap_pfault+0x4f #5 0x810bb23f at trap+0x4cf #6 0x810933f8 at calltrap+0x8 #7 0x80cdd555 at _if_delgroup_locked+0x465 #8 0x80cdbfbe at if_detach_internal+0x24e #9 0x80ce305c at if_vmove+0x3c #10 0x80ce3010 at vnet_if_return+0x50 #11 0x80d0e696 at vnet_destroy+0x136 #12 0x80ba781d at prison_deref+0x27d #13 0x80c3e38a at taskqueue_run_locked+0x14a #14 0x80c3f799 at taskqueue_thread_loop+0xb9 #15 0x80b9fd52 at fork_exit+0x82 #16 0x8109442e at fork_trampoline+0xe This is my typical jail config, designed and tested with Rel.11: rail { jid = 10; devfs_ruleset = 11; host.hostname = "xxx.xxx.xxx.org"; vnet = "new"; sysvshm; $ifname1l = nge_${name}_1l; $ifname1l_mac = 00:1d:92:01:01:0a; vnet.interface = "$ifname1l"; exec.prestart = " echo -e \"mkpeer eiface crhook ether\nname .:crhook $ifname1l\" \ | /usr/sbin/ngctl -f - /usr/sbin/ngctl connect ${ifname1l}: svcswitch: ether link2 ifname=`/usr/sbin/ngctl msg ${ifname1l}: getifname | \ awk '$1 == \"Args:\" { print substr($2, 2, length($2)-2)}'` /sbin/ifconfig \$ifname name $ifname1l /sbin/ifconfig $ifname1l link $ifname1l_mac "; exec.poststart = " /usr/sbin/jexec $name /sbin/sysctl kern.securelevel=3 ; "; exec.poststop = "/usr/sbin/ngctl shutdown ${ifname1l}:"; }
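The awk expression in exec.prestart pulls the interface name out of the reply to `ngctl msg <node>: getifname`, which prints the name in quotes on an `Args:` line. Wrapping it in a function makes it easy to test on its own. `ng_getifname_parse` is a hypothetical name. The sample reply used for testing is an assumption based on what the awk expression expects.

```sh
# ng_getifname_parse  (hypothetical helper)
# Reads the reply of "ngctl msg NODE: getifname" on stdin and prints the
# interface name without quotes (same logic as the awk in exec.prestart).
ng_getifname_parse() {
    awk '$1 == "Args:" { print substr($2, 2, length($2) - 2) }'
}

# Assumed use inside a prestart script:
#   ifname=$(/usr/sbin/ngctl msg "${ifname1l}:" getifname | ng_getifname_parse)
#   /sbin/ifconfig "$ifname" name "$ifname1l"
```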
Analyzing kernel panic from VIMAGE/Netgraph takedown
Stopping a VIMAGE+Netgraph jail in 12.2 the same way that worked with Rel. 11.4 crashes the kernel after 2 or 3 start/stop iterations. Specifically, this does not work: exec.poststop = "/usr/sbin/ngctl shutdown ${ifname1l}:"; This new option from Rel.12 does not work either; it just gives a few more iterations: exec.release = "/usr/sbin/ngctl shutdown ${ifname1l}:"; What seems to work is adding a delay: exec.poststop = " sleep 2 ; /usr/sbin/ngctl shutdown ${ifname1l}: ; "; The big question now is: how long should the delay be? With this example, a test of 100 start/stop iterations ran through. But stopping a jail on a loaded machine after it has been running for a few months is an entirely different matter: in such a case the jail will spend hours in "dying" state, while in this test the jid became free for restart instantly. In any case, as all this worked flawlessly with Rel. 11.4, something is now broken in the code, and it should be fixed. PMc
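Instead of guessing a fixed `sleep`, a poststop script could poll for a condition with an upper bound. This is a minimal sketch. `wait_for` is a hypothetical helper. Which condition really signals that the vnet teardown has finished is an assumption here. One candidate is the eiface reappearing in the host, because the backtrace shows `vnet_if_return` moving interfaces back during `vnet_destroy`. This only bounds the wait; it does not fix the race itself.

```sh
# wait_for TIMEOUT COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND once per second until it succeeds or TIMEOUT seconds
# have passed. Returns 0 if COMMAND succeeded, 1 on timeout.
wait_for() {
    _timeout=$1
    shift
    _waited=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$_waited" -ge "$_timeout" ]; then
            return 1
        fi
        sleep 1
        _waited=$((_waited + 1))
    done
    return 0
}

# Hypothetical poststop script (the wait condition is an assumption):
#   ifname1l=$1
#   wait_for 30 /sbin/ifconfig "$ifname1l" ||
#       echo "warning: $ifname1l not back in host after 30s" >&2
#   /usr/sbin/ngctl shutdown "${ifname1l}:"
```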
Re: Panic: 12.2 fails to use VIMAGE jails
Hi Kristof, it's great to read You! On Mon, Dec 07, 2020 at 09:11:32PM +0100, Kristof Provost wrote: ! That smells a lot like the epair/vnet issues in bugs 238870, 234985, 244703, ! 250870. epair? No. It is purely Netgraph here. ! I pushed a fix for that in CURRENT in r368237. It’s scheduled to go into ! stable/12 sometime next week, but it’d be good to know that it fixes your ! problem too before I merge it. ! In other words: can you test a recent CURRENT? It’s likely fixed there, and ! if it’s not I may be able to fix it quickly. Oh my Gods. No offense meant, but this is not really a good time for that. This is the most horrible upgrade I have experienced in 25 years of FreeBSD (and it was prepared; 12.2 ran fine on the other machine). I have an issue with the memory config https://forums.freebsd.org/threads/fun-with-upgrading-sysctl-unknown-oid-vm-pageout_wakeup_thresh.77955/ I have an issue with a damaged filesystem, for no apparent reason https://forums.freebsd.org/threads/no-longer-fun-with-upgrading-file-offline.77959/ Then I have this issue here, which fortunately now has a workaround https://forums.freebsd.org/threads/panic-12-2-does-not-work-with-jails.77962/post-486365 and when I then dare to look at my applications, they are sheer horror, segfaults all over, and I don't even know where to begin with these. Other option: can you prepare this fix so that I can patch it into 12.2 source and just redeploy? I tried to apply the changes from r368237 to my 12.2 source, which seemed quite straightforward, but it doesn't work; jails fail to be removed entirely: # service jail stop rail Stopping jails: rail. # jexec rail jexec: jail "rail" not found -> it works once. # service jail start rail Starting jails: rail. # service jail stop rail Stopping jails: rail. # jexec rail root@rail:/ # ps ax ps: empty file: Invalid argument -> And here it doesn't work anymore, and leaves a skeleton of a jail that one cannot get rid of.
Cheerio, PMc
Re: Panic: 12.2 fails to use VIMAGE jails
On Tue, Dec 08, 2020 at 04:50:00PM +0100, Kristof Provost wrote: ! Yeah, the bug is not exclusive to epair but that’s where it’s most easily ! seen. Ack. ! Try http://people.freebsd.org/~kp/0001-if-Fix-panic-when-destroying-vnet-and-epair-simultan.patch Great, thanks a lot. Now I have bad news: when playing yo-yo with the next-best three application jails (with all their installed stuff), it took about ten ups and downs, and then I got this one: Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 02 fault virtual address = 0x10 fault code = supervisor read data, page not present instruction pointer = 0x20:0x80aad73c stack pointer = 0x28:0xfe003f80e810 frame pointer = 0x28:0xfe003f80e810 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 15486 (ifconfig) trap number = 12 panic: page fault cpuid = 1 time = 1607450838 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe003f80e4d0 vpanic() at vpanic+0x17b/frame 0xfe003f80e520 panic() at panic+0x43/frame 0xfe003f80e580 trap_fatal() at trap_fatal+0x391/frame 0xfe003f80e5e0 trap_pfault() at trap_pfault+0x4f/frame 0xfe003f80e630 trap() at trap+0x4cf/frame 0xfe003f80e740 calltrap() at calltrap+0x8/frame 0xfe003f80e740 --- trap 0xc, rip = 0x80aad73c, rsp = 0xfe003f80e810, rbp = 0xfe003f80e810 --- ng_eiface_mediastatus() at ng_eiface_mediastatus+0xc/frame 0xfe003f80e810 ifmedia_ioctl() at ifmedia_ioctl+0x174/frame 0xfe003f80e850 ifhwioctl() at ifhwioctl+0x639/frame 0xfe003f80e8d0 ifioctl() at ifioctl+0x448/frame 0xfe003f80e990 kern_ioctl() at kern_ioctl+0x275/frame 0xfe003f80e9f0 sys_ioctl() at sys_ioctl+0x101/frame 0xfe003f80eac0 amd64_syscall() at amd64_syscall+0x380/frame 0xfe003f80ebf0 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe003f80ebf0 --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800475b2a, rsp = 0x7fffe358, rbp = 0x7fffe450 --- Uptime: 9m51s
Dumping 899 out of 3959 MB: I decided to give it a second try, and this is what I did: root@edge:/var/crash # jls JID IP Address Hostname Path 1 1***gate.***.org /j/gate 3 1***raix.***.org /j/raix 4 oper.***.org /j/oper 5 admn.***.org /j/admn 6 data.***.org /j/data 7 conn.***.org /j/conn 8 kerb.***.org /j/kerb 9 tele.***.org /j/tele 10 rail.***.org /j/rail root@edge:/var/crash # service jail stop rail Stopping jails: rail. root@edge:/var/crash # service jail stop tele Stopping jails: tele. root@edge:/var/crash # service jail stop kerb Stopping jails: kerb. root@edge:/var/crash # jls JID IP Address Hostname Path 1 1***gate.***.org /j/gate 3 1***raix.***.org /j/raix 4 oper.***.org /j/oper 5 admn.***.org /j/admn 6 data.***.org /j/data 7 conn.***.org /j/conn root@edge:/var/crash # jls -d JID IP Address Hostname Path 1 1***gate.***.org /j/gate 3 1***raix.***.org /j/raix 4 oper.***.org /j/oper 5 admn.***.org /j/admn 6 data.***.org /j/data 7 conn.***.org /j/conn 9 tele.***.org /j/tele 10 rail.***.org /j/rail root@edge:/var/crash # service jail start kerb Starting jails:Fssh_packet_write_wait: Connection to 1*** port 22: Broken pipe Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 02 fault virtual address = 0x0 fault code = supervisor read instruction, page not present instruction pointer = 0x20:0x0 stack pointer = 0x28:0xfe00540ea658 frame pointer = 0x28:0xfe00540ea670 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 13420 (ifconfig) trap number = 12 panic: page fault cpuid = 1 time = 1607451910 KDB: stack backtrace: db_trace_self_wrapper() at
Re: Panic: 12.2 fails to use VIMAGE jails
Here is the next funny crash dump - I got this one twice, and the sysctl_rtsock() one again as well. I can reproduce this just by starting and stopping a minimal jail that only does exec.start = "/bin/sleep 4 &"; (And as usual, when I let it time out, nothing bad happens.) Fatal trap 9: general protection fault while in kernel mode cpuid = 1; apic id = 02 instruction pointer = 0x20:0x80a2ac45 stack pointer = 0x28:0xfe0047cf2890 frame pointer = 0x28:0xfe0047cf2890 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 13557 (ifconfig) trap number = 9 panic: general protection fault cpuid = 1 time = 1607469295 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0047cf25a0 vpanic() at vpanic+0x17b/frame 0xfe0047cf25f0 panic() at panic+0x43/frame 0xfe0047cf2650 trap_fatal() at trap_fatal+0x391/frame 0xfe0047cf26b0 trap() at trap+0x67/frame 0xfe0047cf27c0 calltrap() at calltrap+0x8/frame 0xfe0047cf27c0 --- trap 0x9, rip = 0x80a2ac45, rsp = 0xfe0047cf2890, rbp = 0xfe0047cf2890 --- strncmp() at strncmp+0x15/frame 0xfe0047cf2890 ifunit_ref() at ifunit_ref+0x59/frame 0xfe0047cf28d0 ifioctl() at ifioctl+0x427/frame 0xfe0047cf2990 kern_ioctl() at kern_ioctl+0x275/frame 0xfe0047cf29f0 sys_ioctl() at sys_ioctl+0x101/frame 0xfe0047cf2ac0 amd64_syscall() at amd64_syscall+0x380/frame 0xfe0047cf2bf0 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0047cf2bf0 --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800475b2a, rsp = 0x7fffe3b8, rbp = 0x7fffe450 --- Uptime: 8m54s Dumping 880 out of 3959 MB:
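Since a minimal jail that only runs `/bin/sleep 4 &` is enough to trigger this, the start/stop cycling itself can be scripted. The loop stops at the first failing command and reports how far it got. A pause parameter lets the immediate-stop case be compared with the "let it time out" case. This is a sketch with hypothetical names (`yoyo`, and the jail name `test` in the usage comment). The start and stop commands are passed in as strings so the loop logic can be exercised without jails.

```sh
# yoyo COUNT PAUSE START_CMD STOP_CMD  (hypothetical helper)
# Runs START_CMD, waits PAUSE seconds, runs STOP_CMD, COUNT times.
# Stops at the first failing command and prints how far it got.
yoyo() {
    _count=$1 _pause=$2 _start=$3 _stop=$4
    _i=0
    while [ "$_i" -lt "$_count" ]; do
        _i=$((_i + 1))
        # Commands are word-split on purpose, e.g. "service jail start test".
        $_start || { echo "start failed in iteration $_i"; return 1; }
        sleep "$_pause"
        $_stop || { echo "stop failed in iteration $_i"; return 1; }
    done
    echo "$_i iterations completed"
}

# Hypothetical usage with a minimal jail named "test":
#   yoyo 100 0 "service jail start test" "service jail stop test"
#   yoyo 100 5 "service jail start test" "service jail stop test"
```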
Re: Panic: 12.2 fails to use VIMAGE jails
On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote: ! > Sorry for the bad news. ! > ! You appear to be triggering two or three different bugs there. That is possible. Then there are two or three different bugs in the production code. In any case, my current workaround, i.e. delaying in the exec.poststop > exec.poststop = " >sleep 6 ; >/usr/sbin/ngctl shutdown ${ifname1l}: ; >"; helps with all of it and makes the system behave solidly. This is true with and without Your patch. ! Can you reduce your netgraph use case to a small test case that can trigger ! the problem? I'm sorry, I fear I don't get Your point. Assuming there are actually two or three bugs here, You are asking me to reduce the config so that it triggers only one of them? Is that correct? Then let me put this differently: assume this is the OS for the life support system of the manned Jupiter mission. Then, which one of the bugs do You want to get fixed, and which would You prefer to keep and have Your oxygen supply cut off? https://www.youtube.com/watch?v=BEo2g-w545A ! I’m not likely to be able to do anything unless I can reproduce ! the problem(s). I understand that. From Your previous mail I get the impression that you prefer to rely on tests. I consider this a bad habit[1] and prefer logical thinking. So let's try that: We know that there is a problem with taking down an interface from a VIMAGE, in the way it is done by "jail -r". We know this problem can be reliably worked around by delaying the interface takedown for a short time. Now with Your patch, we do not get the typical crash at interface takedown. Instead, all of a sudden, there are strange crashes from various other places. And, interestingly, we also get these when STARTING a jail. I think this is not an additional problem; it is instead valuable information (albeit not the kind You might like to get).
Furthermore, these new crashes are always invoked by "ifconfig", and they seem to have in common that somebody tries to obtain information about some interface configuration and receives bogus data. I might conclude, just from gut feeling and without looking into details, that either - your patch manages to garble some internal interface data instead of doing what it is intended to do, or - the original problem manages to garble internal interface data (leading to the usual crash), and your patch does not solve this, but only protects from the immediate consequence. It might also be worth considering that, while the problem may be easier to reproduce with epair, this effect may or may not be netgraph-specific[2]. Now let's keep in mind that a successful test means EXACTLY NOTHING. By which other means can we confirm that your patch fully achieves what it is intended for? (E.g. something like dumping and verifying the respective internal tables in vivo.) (Background: It is not that I would be unwilling to create clean and precisely reproducible scenarios. But one of my problems currently is that I only have two machines available: the graphical one where I'm just typing, and the backend server with the jails that does practically everything. Therefore, experimenting on either of them creates considerable pain. I'm working on that issue, trying to get a real server board for the backend so as to free the current one for testing - but what I would like to use, e.g. ASUS Z10PE+cores+regECC, is not something one would easily find at yard sales - and seldom for an acceptable price.) cheerio, PMc [1] Rationale: a failing test tells us that either the test or the application has a bug (50/50 chance). A succeeding test tells us that 1 equals 1, which we knew already before. In fact, tests tell us *nothing at all* about the state of our code, and specifically, 'successful' outcomes do NOT mean that things are all correct.
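Short of dumping kernel tables, one crude user-level check is to compare the interface list before a jail is started with the list after it has been stopped. The helper below is a hypothetical sketch and does not come from the thread. It only sees what "ifconfig -l" reports, not the kernel's internal state.

```sh
#!/bin/sh
# Hypothetical helper: compare two interface lists (as printed by
# "ifconfig -l"). Prints "+name" for interfaces that appeared and
# "-name" for interfaces that disappeared; prints nothing if equal.
iface_diff() {
    _before=$(echo $1)    # unquoted on purpose: collapse whitespace
    _after=$(echo $2)
    for _i in $_after; do
        case " $_before " in
            *" $_i "*) ;;
            *) echo "+$_i" ;;
        esac
    done
    for _i in $_before; do
        case " $_after " in
            *" $_i "*) ;;
            *) echo "-$_i" ;;
        esac
    done
}
```

Typical use: save before=$(ifconfig -l), start and stop the test jail, wait a few seconds, then run iface_diff "$before" "$(ifconfig -l)". Any output line names an interface that was left behind or has vanished.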
The only real use of tests is to protect against re-introducing a fault that was already fixed before, i.e. regressions. [2] My netgraph configuration consists of bringing up some bridges and then attaching the jails to them. Here is the bridge starter (only the relevant component; more of these are set up, but they probably do not influence the issue): #! /bin/sh # PROVIDE: netgraphs # REQUIRE: netwait # BEFORE: NETWORKING . /etc/rc.subr name="netgraphs" start_cmd="${name}_start" stop_cmd="${name}_stop" load_rc_config $name netgraphs_graphs="svc" netgraphs_svc_if1_name="nge_svc_1u" netgraphs_svc_if1_mac="00:1d:92:01:02:01" netgraphs_svc_if1_addr="***.***.***.***/29" netgraphs_svc_start() { local _ifname if ngctl info svcswitch: > /dev/null 2>&1; then netgraphs_svc_stop fi echo "Creating SVC Switch" ngctl -f - < /dev/null 2>&1; then $_cmd else echo "netgraphs-start: object $i not found" >&2 fi done
Re: Panic: 12.2 fails to use VIMAGE jails
On Tue, Dec 08, 2020 at 07:51:07PM -0600, Kyle Evans wrote: ! You seem to have misinterpreted this; he doesn't want to narrow it ! down to one bug, he wants simple steps that he can follow to reproduce Maybe I did misinterpret, but then I don't really understand it. I would suppose that, when testing a proposed fix, the fact that it breaks under exactly the same conditions as before is all the information needed at that point. Put in simple words: it does not work. ! any failure, preferably steps that can actually be followed by just ! about anyone and don't require immense amounts of setup time or ! additional hardware. Engineering does not normally work that way. I'll try to explain: when a bug is first encountered, it is necessary to isolate it to the point where somebody who knows the code can actually reproduce it, in order to look at it and analyze what causes the mishap. If a remedy is then devised and does not work as expected, the flaw is in the analysis, and we just start over from there. In fact, I would have expected somebody who is trying to fix this kind of bug to already have testing tools available and to tell me exactly which kind of data I should retrieve from the dumps. The open question now is: am I the only one seeing these failures? Might they be attributable to a faulty configuration or maybe hardware issues or whatever? We cannot know this; we can only watch what happens at other sites. And that is why I sent out all these backtraces - because they appear weird and might be difficult to associate with this issue. I don't think there is much more we can do at this point, unless we were willing to actually look into the details. Am I discouraging? Indeed, I think engineering is discouraging by its very nature, and that's the fun of it: to overcome the odds and finally maybe make things better. And when we start to forget about that, bad things begin to happen (anybody remember Apollo 13?).
But talking about discouragement: I usually try to track down defects I encounter and, if possible, do a proper root-cause analysis. I used to be very willing to share the outcomes and, if a solution arose, by all means get it back into the code base; but I found that even ready-made patches for easy matters would linger forever in the sendbug system without anybody caring, or, in more complex cases where I would need some feedback from the original author, if only to clarify the purpose of some defaults or to verify that an approach is viable, that communication was very difficult to establish. And that is what I would call discouraging, and I for my part have accepted just leaving the developers in their ivory tower and tending to my own business. cheerio, PMc
Re: Panic: 12.2 fails to use VIMAGE jails
On Wed, Dec 09, 2020 at 02:00:37PM +1100, Dewayne Geraghty wrote: ! On a jail with config: ! exec.start = "/bin/sh -x /etc/rc"; ! exec.stop = "/bin/sh /etc/rc.shutdown"; ! exec.clean; ! ! test_prod { jid=7; persist; ip4.addr = ! "10.0.7.96,10.0.5.96,127.0.5.96"; devfs_ruleset = "6"; ! host.hostuuid=---0001-0302; host.hostid=000302; } ! ! I successfully performed ! for i in `seq 10`; do jail -vc test_prod; sleep 3; jail -vr test_prod; done But this is not a VIMAGE jail, is it? Old-style jails are unaffected by this issue. Only VIMAGE jails, using epair or netgraph, might be affected. (In that case, you would not have an "ip4.addr" configured, but rather a "vnet.interface".) ! I think the normal use of jail.conf is to NOT explicitly use a jid in ! the definition, which may be why this may not have been picked up? ! (Maybe a clue). This is an interesting point. When you stop a jail, it may stay for a variable amount of time in a "dying" state (visible with "jls -d"), keeping the jid occupied. During that time, the jail cannot be restarted with that same jid. Some time ago, I read people complaining about this, and the advice was to just not define the jid in the definition, so that the jail can be restarted immediately (and will probably grab another jid). I did not find a solid explanation for what is happening in that "dying" state (and why it takes a variable amount of time), let alone an approach to fix that. I found some theories circulating on the net, but these don't really add up. So I would need to look into the source myself - and I have postponed that indefinitely. ;) But what I found out is that, with VIMAGE jails (those that can carry their own network interfaces), when you make a slight mistake in managing and handling the interfaces, the jail will stay in the dying state forever. If you don't make a mistake, it will eventually die after some time. So I decided to keep the jid, so that nothing misconfigured is allowed to linger unnoticed.
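Keeping a fixed jid means a restart has to wait until the old jail has left the dying state. Below is a minimal sketch of such a wait. It assumes that jls(8) exits non-zero when the jail named with -j is not found; with -d, jls also reports dying jails. The function name and the 60-second default are arbitrary and do not come from the thread.

```sh
#!/bin/sh
# Hypothetical helper: wait until jail $1 is no longer listed by
# "jls -d" (which includes dying jails). Returns 0 once the jid is
# free, or 1 if it is still present after $2 seconds (default 60).
wait_jail_gone() {
    _jid=$1
    _left=${2:-60}
    while jls -d -j "$_jid" >/dev/null 2>&1; do
        if [ "$_left" -le 0 ]; then
            echo "jail $_jid still present (stuck dying?)" >&2
            return 1
        fi
        sleep 1
        _left=$((_left - 1))
    done
    return 0
}
```

With Dewayne's jail this could be used as: jail -vr test_prod && wait_jail_gone 7 && jail -vc test_prod. A timeout would hint at a jail stuck in the dying state.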
(The tradeoff is obviously that one might have to wait before restarting.) cheerio, PMc P.S. 41 Celsius is fantastic! I envy you! :)
Re: 4.11 snapshots?
On Sun, 2006-May-21 13:20:24 -0600, Brett Glass wrote: Well, y'know, if they could release a FreeBSD 2.2.9 (as was done last month), it shouldn't be a problem to do a 4.12 release as a last gasp to tide us over Maybe for April 1st next year - though novel April Fools Day jokes are always much better. -- Peter Jeremy
Re: FreeBSD Security Survey
On Mon, 2006-May-22 15:20:11 -, FreeBSD User wrote: Since time is always an issue, if the system could by default (without an admin having to write scripts and/or apps, or manually update) update itself for both system and installed ports/packages, it likely would reduce security issues exponentially. I think it would substantially reduce the reliability and security. Firstly, automatically installing arbitrary fixes on a production system is almost always a bad idea. The release engineering and security teams do regression testing but can't test exactly your system configuration, and there's a non-trivial likelihood that installing patch X will break something that your configuration relies on. This can be mitigated by using a test system and rolling out the updates from it, but that negates the whole point. It's also likely to inconvenience users. Our ITS department take it upon themselves to automatically roll out (wintel) desktop updates. This almost always results in your desktop machine insisting that it needs to be rebooted immediately when you are in the middle of doing something crucial - thus breaking your concentration and potentially losing data (my manager managed to lose 3 man-hours' work once). I, for one, would hate it if my FreeBSD boxes started doing the same. Specific FreeBSD versions aren't maintained forever. An install-it-and-forget-it philosophy will increase the number of machines that aren't being patched because they are running unmaintained versions of FreeBSD. With the current approach, the sysadmin is aware that particular machines need to be updated to a newer version. If everything is automatic, the sysadmin will probably forget. Finally, it only takes one security failure in the update process for someone undesirable to own all the FreeBSD machines that have been left in this default mode. Despite the best efforts of FreeBSD developers, FreeBSD will always contain bugs and some of them will be security holes.
Any automatic update process needs to balance the benefits of reducing the number of unpatched boxes against the risks of the update system being subverted. -- Peter Jeremy
Re: Odd RS232 problem
On Tue, 2006-May-23 19:23:20 +0930, Daniel O'Connor wrote: I would hope that 9600 baud wouldn't be *too* fast for a 2GHz CPU :( That depends on what else is sharing the IRQ. PLIP can give you tens of msec of latency. PIO disks can also destroy latency, as can NE2000-style NICs. -- Peter Jeremy
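To put "tens of msec" in perspective, here is the arithmetic for 9600 baud. It assumes 8N1 framing (10 bits per character) and a 16550-style UART with a 16-byte receive FIFO; neither assumption comes from the thread.

```latex
t_{\text{char}} = \frac{10\ \text{bits}}{9600\ \text{bits/s}} \approx 1.04\ \text{ms},
\qquad
t_{\text{FIFO}} = 16 \times t_{\text{char}} \approx 16.7\ \text{ms}
```

Even if the FIFO is empty when the first byte arrives, an interrupt delayed by a few tens of milliseconds is enough to overrun the receiver. A higher FIFO trigger level leaves even less slack.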
Re: 6.1-stable crash
On Tue, 2006-May-23 19:04:18 +, Hunter Fuller wrote: Am I the only one who sees the oxymoronity in 6.1-stable crash? Hopefully. As is regularly pointed out, 'stable' refers primarily to the ABI. FreeBSD 6.1-stable is still under active development, though only code that has previously been tested in -current is supposed to be committed. There are also fairly regular postings pointing out that _all_ software has bugs. Some of these bugs will cause crashes. -- Peter Jeremy
Re: (bsd)tar is broken on 6.1
On Fri, 2006-May-26 11:34:43 +0200, Patrick M. Hausen wrote: -rw-r--r-- 1 jmz jmz 4312 Apr 16 1947 supclkrd.prg Since there cannot be a date before January 1st 1970, 0:00 on any Unix system, I guess there's something seriously broken here. Why do you say that? time_t is signed so it can represent a date prior to 1970. In theory, a file prepared on an earlier computer could have been transferred onto a Unix system whilst retaining its original modification time. In this particular case, the date seems unlikely. That said, it is a perfectly valid date and it would be nice if tar could support it - though tar(5) states that dates before the epoch are not handled consistently. -- Peter Jeremy
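A quick way to confirm that such a timestamp is representable is to create one. The sketch below uses only touch(1) and ls(1). The scratch directory is an assumption, and the file name is borrowed from the listing above. Archiving the file afterwards with tar cf and listing it with tar tvf would show how a particular tar copes with it, which tar(5) warns may vary.

```sh
#!/bin/sh
# Sketch: give a scratch file a pre-1970 modification time
# (16 Apr 1947, as in the listing above) and show that it is stored.
# Internally this is simply a negative time_t value.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch -t 194704160000 "$dir/supclkrd.prg"
ls -l "$dir/supclkrd.prg"
```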
Re: How can I know which files a proccess is accessing?
On Tue, 2006-Jun-06 18:16:39 -0300, Eduardo Meyer wrote: On 6/6/06, David Wolfskill [EMAIL PROTECTED] wrote: You may find the lsof port useful for answering such questions. I tried it, but it seems that I found some limitations: lsof: no local file space at PID 16543 I don't know that exact message but lsof needs to very closely match your running kernel: You should have the kernel sources installed when you build lsof. -- Peter Jeremy
Re: Error while 'make buildworld' in terminal.o
On Wed, 2006-Jun-07 10:03:12 +0900, ?? ??? wrote: After that, I re-synchronised /usr/src via cvsup with tag 'src-all'. Commands given: # make cleandepends # make cleanworld Possibly there's some buildworld output inside your /usr/src tree. Try deleting /usr/obj and then running make clean. -- Peter Jeremy
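One way to test the guess that build output has landed inside /usr/src is to look for object files there. This is a rough heuristic for illustration. The function name and file patterns are assumptions, and an empty result does not prove the tree is clean.

```sh
#!/bin/sh
# Sketch: list typical build artefacts (object files, static archives,
# .depend files) found inside a source tree; defaults to /usr/src.
find_stray_objects() {
    find "${1:-/usr/src}" -type f \
        \( -name '*.o' -o -name '*.a' -o -name '.depend' \) -print
}
```

If find_stray_objects prints anything, deleting /usr/obj and running make clean in /usr/src, as suggested above, should give buildworld a clean start.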
Re: unmounting a filesystem safely that doesn't exist anymore
On Sat, 2006-Jun-10 19:40:41 +0200, Björn König wrote: I made a mistake: I unplugged my digital camera accidentally before I unmounted the filesystem. *doh* This happens very often, because I'm very scatterbrained. =) Your best solution may be to use mtools (ports/emulators/mtools) rather than mounting the filesystem. changed ad hoc. I just want to know if somebody knows a workaround or small trick that prevents the other filesystems from being unclean on next boot-up. The only way to do this is to have all the other filesystems mounted read-only. The filesystem clean flag is part of the superblock and is cleared when a filesystem is mounted. It will be set only if the filesystem is cleanly unmounted. -- Peter Jeremy
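With mtools the camera's card is never mounted, so unplugging it cannot leave a filesystem behind. A minimal per-user configuration might look like this. The drive letter is arbitrary, and /dev/da0s1 is only an assumption about where the camera attaches (check dmesg).

```
# ~/.mtoolsrc (sketch): map drive letter c: to the camera's FAT slice
drive c: file="/dev/da0s1"
```

After that, "mdir c:" lists the card and "mcopy -s c:/DCIM ." copies the pictures. DCIM is the usual camera directory, but check yours.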
'make release' questions...
I have been mucking about w/ 'make release' for some time now (stripping out OpenSSH, sendmail, Heimdal, bits of BIND, etc.) and while I now have a working .iso image that will install and update, I have some questions that 'man release' just won't answer. :) First, is there any way to instruct 'make release' to just build certain packages (and their dependencies) for inclusion in the release instead of a blanket NOPORTS? There's no need for us to spend two/three days compiling all the various ports when we will only use a small handful of them on most of our boxes (and it would cut the time it takes to roll out a new release). ;) Second, is there a way to build/tell sysinstall so that, if NO_OPENSSH is set, it doesn't ask you whether you want to enable SSH logins? Thanks again in advance for any advice you can provide. Best Wishes - Peter -- [ http://www.plosh.net/ ] - Earth Halted: Please reboot to continue
Re: Powerd: Adaptive mode causing a hard hang
On Sat, 2006-Jun-17 08:43:07 -0500, Scot Hetzel wrote: I noticed a problem where my system would hard hang when powerd was enabled with no changes to the powerd flags in rc.conf. Yesterday, I tracked the problem down to powerd's Adaptive mode causing the hard hang. I reported exactly the same problem in mid-February on -amd64 with my HP nx6125. I found that it was random but could be triggered by changing the CPU clock using either cpufreq or raw ACPI frequency control. Unfortunately, I don't have a solution. I've taken to manually adjusting dev.cpu.0.freq based on what I'm doing and have only had a single hang in the past four months. -- Peter Jeremy
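Setting dev.cpu.0.freq by hand needs a valid value, which comes from dev.cpu.0.freq_levels (entries of the form frequency/power). The helper below is a hypothetical sketch for picking the lowest or highest level. The commented sysctl line shows how it would be used on the machine itself.

```sh
#!/bin/sh
# Hypothetical helper: print the lowest ("min") or highest ("max")
# frequency in MHz from a freq_levels list such as
# "2000/35000 1600/28000 800/14000".
pick_freq() {
    _mode=$1
    shift
    _best=
    for _lvl in $*; do            # unquoted: split the level list
        _f=${_lvl%%/*}            # keep the part before the "/"
        if [ -z "$_best" ] ||
           { [ "$_mode" = min ] && [ "$_f" -lt "$_best" ]; } ||
           { [ "$_mode" = max ] && [ "$_f" -gt "$_best" ]; }; then
            _best=$_f
        fi
    done
    echo "$_best"
}

# On the machine itself (not run here), e.g. before a long build:
#   sysctl dev.cpu.0.freq=$(pick_freq max $(sysctl -n dev.cpu.0.freq_levels))
```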
make rerelease broken at camcontrol...
Thanks to all who answered my 'make release' questions; now that I have done the initial release cut, I am trying out 'make rerelease', and it's bombing at stage 4.4: building everything. -=- ===> sbin/camcontrol (all) cc -O2 -fno-strict-aliasing -pipe -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wcast-align -Wunused-parameter -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -o camcontrol camcontrol.o util.o modeedit.o -lcam -lsbuf modeedit.o(.text+0xd14): In function `mode_edit': : undefined reference to `mode_sense' modeedit.o(.text+0xd5c): In function `mode_edit': : undefined reference to `mode_sense' modeedit.o(.text+0xdf9): In function `mode_edit': : undefined reference to `mode_sense' modeedit.o(.text+0xe86): In function `mode_edit': : undefined reference to `mode_select' modeedit.o(.text+0xebf): In function `mode_edit': : undefined reference to `mode_sense' modeedit.o(.text+0x11ec): In function `mode_list': : undefined reference to `mode_sense' *** Error code 1 Stop in /usr/src/sbin/camcontrol. *** Error code 1 Stop in /usr/src/sbin. *** Error code 1 Stop in /usr/src. *** Error code 1 Stop in /usr/src. *** Error code 1 Stop in /usr/src. + exit 1 + umount /dev *** Error code 1 (ignored) -=- Looking in CVS, modeedit.c hasn't changed in two years, so I am perplexed about what is going on here. Any ideas? The make rerelease command used is: make -i rerelease NODOC=YES NO_FLOPPIES=YES CHROOTDIR=/hog/release \ BUILDNAME=6.1-RELEASE-p2 CVSROOT=/hog/FreeBSD-CVS RELEASETAG=RELENG_6_1 (no optimizations, etc.) Thanks - Peter
Re: 'make release' questions...
Andrew Li wrote: First, is there any way to instruct 'make release' to just build certain packages (and their dependencies) for inclusion in the release instead of a blanket NOPORTS? There's no need for us to spend two/three days From my experience with playing with make release, you can do it. First build your packages with make package or pkg_create to get a package tarball. Then put the packages into your_path/release/R/cdrom/disc1, in a directory called packages. Create the package directory structure, like packages/All, packages/your_package_category, ... OK, did that (and generated a new INDEX file using scrubindex.pl). However, in sysinstall, after selecting the packages and selecting Install, it remarks: This is disc 1. Package $package_name is on Disc 0. Would you like to switch discs now? Disc 0? After that, modify the INDEX file so it only contains your packages (plus dependencies). Then run mkisoimages.sh to create your ISO. So I have used scrubindex.pl to generate an INDEX of just our packages; however, I have to generate a new master INDEX-6 file to account for the fact that we build our packages w/o x11 support and with GSSAPI (so scrubindex.pl can then strip the rest out). However, after a 'make index', things like mtr-nox11 are included in INDEX-6, but not openssh-gssapi? The only difference is that the -gssapi moniker is declared as a GSSAPI_SUFFIX make variable vs. the normal PKGNAMESUFFIX. I'll send a version of this email to ports@ to get their input. That's all from memory, may contain rough edges, hope it helps anyway. Thanks... The base release is done, it's just the pesky frills that take forever to resolve... :) Best Wishes - Peter
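The directory layout Andrew describes can be filled by a small helper. This sketch assumes the usual FreeBSD package-tree convention, where each category directory holds relative symlinks into packages/All. The function name is hypothetical, and its first argument stands for your_path/release/R/cdrom/disc1. The INDEX still has to be trimmed separately, e.g. with scrubindex.pl.

```sh
#!/bin/sh
# Hypothetical helper: copy a package tarball into <root>/packages/All
# and add a relative symlink to it in <root>/packages/<category>.
stage_package() {
    _root=$1 _category=$2 _pkg=$3
    _name=$(basename "$_pkg")
    mkdir -p "$_root/packages/All" "$_root/packages/$_category"
    cp "$_pkg" "$_root/packages/All/$_name"
    ln -sf "../All/$_name" "$_root/packages/$_category/$_name"
}
```

For example: stage_package your_path/release/R/cdrom/disc1 net mtr-nox11-<version>.tbz, repeated for each package and its dependencies.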
Re: Effects of changing tar's -b option.
On Sat, 2006-Jun-24 20:12:56 -0700, Nikolas Britton wrote: Test Setup: 250 50MB files (13068252KB) dd if=/dev/random of=testfile bs=1m count=50 Ethernet mtu=6500 Transferred files were wiped after every test with 'rm -r *'. Test: hostB: nc -4l port | tar xpbf n - hostA: date; tar cbf n - . | nc hostB port; date Test Results:
seconds  n (tar -b)
    645  1024
    670   512
    546   256
    503   128
    500   128 (control)
    515    96
    508    64
    501    20 (default)
Conclusions: Make your own. I don't think that's so unexpected. tar doesn't use multiple buffers, so filling and emptying the buffer is done serially. Once the buffer exceeds the space in the pipe buffer and the local TCP send buffer, the write from the hostA tar is delayed until the TCP buffer can drain. At the same time, the read from the hostB tar is blocked waiting for data from the network. Optimal throughput will depend on maximally overlapping the file reads on hostA with the network traffic and file writes on hostB. This, in turn, means you want to be able to hold at least a full buffer of data in the intervening processes and kernel buffers. Assuming that you aren't network bandwidth limited, you should look at increasing net.inet.tcp.sendspace and maybe net.inet.tcp.recvspace, or using an intervening program on hostA that does its own re-buffering. -- Peter Jeremy
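Because tar's -b n sets records of n × 512 bytes, converting the results into record sizes and throughput makes them easier to compare with the pipe and TCP buffer sizes Peter mentions. The sketch below uses awk and assumes the quoted 13068252KB means KiB. The function name is made up.

```sh
#!/bin/sh
# Read "seconds n" pairs and print: n, the record size (n * 512 bytes,
# shown in KiB) and the throughput in MiB/s for a transfer of $1 KiB.
tar_table() {
    awk -v total_kb="$1" '{
        printf "%5d %5d KiB %6.1f MiB/s\n", $2, $2 / 2, total_kb / $1 / 1024
    }'
}

tar_table 13068252 <<'EOF'
645 1024
670 512
546 256
503 128
500 128
515 96
508 64
501 20
EOF
```

On this data the default 10 KiB records and the 64 KiB (n=128) runs reach roughly 25.5 MiB/s. The 256 KiB and 512 KiB records drop to about 19 to 20 MiB/s.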
Re: RELENG_6_1 unstable also ...
On Sun, 2006-Jun-25 00:20:41 -0300, Marc G. Fournier wrote: cvsup'd both RELENG_6 and RELENG_6_1 ... both cause the server to crash ... in fact, barely get into a buildworld with RELENG_6_1 and get: internal:0: internal compiler error: Abort trap: 6 This tends to indicate a hardware problem. Check the usual suspects. but, just finished a clean buildworld with RELENG_6_1 sources on a 6.1-RC1 kernel installed from CD, and not a hiccup ... That doesn't rule out hardware. A slight change in the data or code layout can cause a pattern-sensitive fault to vanish or critical data to miss a marginal DRAM cell. -- Peter Jeremy
Re: What denotes a 'blocked' process?
Looking at the sources: The 'blocked' column in vmstat is the sum of (struct vmtotal).t_dw /* jobs in ``disk wait'' (neg priority) */ and (struct vmtotal).t_pw /* jobs in page wait */ 'systat -v' splits these into two fields (Proc:d and Proc:p), as does sysctl vm.vmtotal. It's difficult to map these counters onto ps output. States 'D' and 'W' should catch most of them. You might find it useful to look through the MWCHAN column for anything suspicious. -- Peter Jeremy
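Following Peter's suggestion, a rough filter over ps output can list candidates: processes whose state contains D or W, together with their MWCHAN. This is a sketch. The function name is hypothetical, and it assumes the column order produced by the ps command in the comment.

```sh
#!/bin/sh
# Hypothetical filter: from "ps -ax -o pid,state,mwchan,command" output,
# print processes whose state field contains D or W (header skipped).
blocked_procs() {
    awk 'NR > 1 && $2 ~ /[DW]/'
}

# On FreeBSD:  ps -ax -o pid,state,mwchan,command | blocked_procs
```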
Re: FreeBSD 6.x CVSUP today crashes with zero load ...
On Tue, 2006-Jun-27 00:01:08 +0300, Dmitry Pryanishnikov wrote: On Mon, 26 Jun 2006, Robert Watson wrote: I think this is a useful activity, especially if you've already run extensive memory testing on the box. If you haven't yet done that, I encourage you to take a break from buildworlds and make sure the memory tests pass. I spent several months on and off trying to track down a bug a few years ago, which turned out to be a one-bit error in memory on the box. It would appear and This is precisely the task which hardware ECC solves: to correct any single-bit memory error and to detect 2-bit and most multi-bit errors. Parity will detect any odd number of bits in error. ECC can typically correct one bit and detect 2 or any odd number of bits in error. Note that ECC only checks the path between the RAM and the DRAM controller (e.g. northbridge). You can also get errors between the northbridge and the CPU (including the cache). Some caches (e.g. Alpha) have parity to help here. Mainframes typically have ECC or parity on _all_ datapaths (including through the ALU) to catch those errors. -- Peter Jeremy
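Peter's point about parity can be checked by hand. A single parity bit changes for any odd number of bit errors but not for an even number. Below is a small illustrative sketch over one byte; the byte value is arbitrary.

```sh
#!/bin/sh
# Even parity of a non-negative integer: 1 if it has an odd number of
# 1 bits, 0 otherwise.
parity() {
    _v=$1 _p=0
    while [ "$_v" -gt 0 ]; do
        _p=$(( _p ^ (_v & 1) ))
        _v=$(( _v >> 1 ))
    done
    echo "$_p"
}

byte=90                               # 0x5A = 01011010, four 1 bits
stored=$(parity $byte)                # parity bit written with the data
one=$(parity $(( byte ^ 1 )))         # one bit flipped
two=$(parity $(( byte ^ 3 )))         # two bits flipped
echo "stored=$stored one-bit=$one two-bit=$two"
```

It prints stored=0 one-bit=1 two-bit=0. The single flip changes the parity and is caught, while the double flip leaves it unchanged and goes unnoticed.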
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
--- Fabian Keil [EMAIL PROTECTED] wrote: There was a request for Tor related problem reports a while ago, I couldn't find the message again, but I believe it was posted here. Is anyone on this list running a Tor node on FreeBSD 6.1-RELEASE or later with similar or higher load? I am still hitting the same issue, Fabian. I had that PR closed as "works for me" after only minimal testing. I am still crashing (as before), but maybe only once every week or two instead of every couple of hours with 6.1-RELEASE. The PR really should be reopened. A couple of other folks have emailed me with similar issues offline (and also spoke with me about it on IRC). I am still 99% sure this is NOT A TOR ISSUE!!! I have spoken with many Tor users on other platforms and the actual developers, and none of them see this. I can also recreate this crash NOT running Tor, just by generating a heavy load with freenet and i2p. My gut feeling is still a network code regression between 5.x and 6.x with the stack rewrite. I am at a loss as to how to troubleshoot this any further (as noted in the PR and my earlier email). I truly hope somebody (e.g. a developer) can shed some light on this issue or troubleshoot it.
Re: em device hangs on ifconfig alias ...
On Thu, 2006-Jun-29 17:30:07 +1000, Michael Vince wrote: For me, IP alias additions take 1 or maybe 2 secs; it is noticeable, but really isn't an issue for me. But it obviously is for Atanas, who has hundreds of aliases. As far as I have noticed, the em driver in 6.1 after being rebuilt is at its peak of driver quality, ... now I can get up to 850mbits on some and on the lowest side 500mbits/sec on others, which I suspect is due to cable quality etc. In other words, the shortcomings of the em device/driver aren't an issue for you. Other people have different requirements and em(4) is currently unsuitable for them. -- Peter Jeremy
6.1-RELEASE won't boot on Compaq 1580
I've just acquired a couple of old Compaq Armada laptops. I can successfully install 6.1-RELEASE but the system won't boot from the HDD: The MBR menu displays but pressing F1 (the FreeBSD partition) just beeps. Pressing F3 (the Compaq configuration partition) works. I've checked that the CHS parameters in the MBR match those reported by the BIOS. I've tried using boot0cfg (on the fixit CD) to switch between packet and non-packet mode, as well as trying a 4.9-RELEASE MBR. I've checked that the boot sectors look sane (and in the correct sectors) using dd. I've previously used similar models with 4.11 so I wasn't expecting any problems. Before I start re-writing boot0.S, does anyone have any suggestions on how to debug this or what I've missed? -- Peter Jeremy
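Part of the dd sanity check can be scripted. This sketch only verifies the 0x55 0xAA signature at the end of a 512-byte boot sector, not the boot code itself. It reads a whole sector first, since FreeBSD disk devices generally expect sector-sized reads. The function name and the device name in the comment are examples only.

```sh
#!/bin/sh
# Hypothetical check: succeed if the first 512-byte sector of $1 ends in
# the boot signature 0x55 0xAA. $1 may be a saved image, e.g. made with
# "dd if=/dev/ad0 of=mbr.bin bs=512 count=1", or (as root) the disk.
has_boot_signature() {
    _sig=$(dd if="$1" bs=512 count=1 2>/dev/null |
           od -An -tx1 -j 510 -N 2 | tr -d ' \n')
    [ "$_sig" = "55aa" ]
}
```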
Re: 6.1-RELEASE won't boot on Compaq 1580
On Sat, 2006-Jul-01 08:02:56 +1000, Peter Jeremy wrote: I've just acquired a couple of old Compaq Armada laptops. I can successfully install 6.1-RELEASE but the system won't boot from the HDD: The MBR menu displays but pressing F1 (the FreeBSD partition) just beeps. Pressing F3 (the Compaq configuration partition) works. For the archives: Rebuilding (and re-installing) boot0 from the sources on the 6.1-RELEASE CD using the default options (no /etc/make.conf) fixes the problem. I have no idea why the boot0 on the CD doesn't work for me (though the two boot0's are definitely different). -- Peter Jeremy
Re: FreeBSD 6.0-6.1 binary upgrade script
On Sun, 2006-Jul-09 00:42:31 -0700, Colin Percival wrote: I have written an automatic script for performing binary FreeBSD 6.0 - FreeBSD 6.1 upgrades. That sounds useful. Are you intending to provide this for future FreeBSD minor-revision releases? Naturally, the cryptographic hashes of all the files are verified against values stored in the script, so as long as you trust the FreeBSD Security Officer (and if you don't, why are you running FreeBSD?), the process is entirely secure. But how can I tell that the script came from the FreeBSD Security Officer? You have signed your mail with a key (ID 0xD09347FC) that claims to be a Colin Percival with an Oxford Uni address (whereas this mail has a freebsd.org address) but the key that I downloaded from a PGP keyserver has no other signatures. You don't have a key in the FreeBSD CVS repository that I can locate and I can't find any keys on www.daemonology.net. Basically, I only have your word that you are who you claim to be. (Of course, I still need to be able to trust the FreeBSD CVS repository but if I can't trust that, I can't trust my OS either). If you really are the FreeBSD Security Officer why can't I find copies of your key and FreeBSD SO key (0xCA6CDFB2) that are counter-signed by each other? -- Peter Jeremy
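Comparing each file's hash with a value stored in the script boils down to the check below. This is a sketch, not the actual upgrade script, and the function name is hypothetical. As Peter points out, such a check is only as trustworthy as the source of the expected digest.

```sh
#!/bin/sh
# Hypothetical check: succeed only if the SHA-256 of file $1 equals the
# expected lowercase hex digest $2. Uses FreeBSD's sha256(1) when it is
# available, otherwise GNU coreutils' sha256sum.
verify_sha256() {
    if command -v sha256 >/dev/null 2>&1; then
        _h=$(sha256 -q "$1")
    else
        _h=$(sha256sum "$1" | cut -d ' ' -f 1)
    fi
    [ "$_h" = "$2" ]
}
```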
Re: MySQL and default memory limits (mysqld: Out of memory)
On Sun, 2006-Jul-09 23:45:44 +0200, Mathieu Arnold wrote: +-On 09/07/2006 17:36 -0400, Mike Jakubik wrote: | Exactly, it's nice being able to see the current values. How else can I | see what the values are set to? As I previously said, it's 512M on i386, and 1G on 64 bit platforms. That doesn't answer Mike's question. The _default_ i386 size is 512M; the _current_ values can be found using ulimit (getrlimit(2)). Note that on non-PAE i386, the maximum process size is limited by the kernel size - there is a total of 4GB address space available and by default, the kernel has 2GB allocated to it. This isn't quite enough if you have 4GB RAM. -- Peter Jeremy
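To see the current values, as opposed to the defaults, something like the following can be run from the shell that starts mysqld. ulimit -d reports the data-segment limit in KB. The FreeBSD-specific sysctl kern.maxdsiz shows the system-wide default in bytes, which can be raised via the loader tunable of the same name in /boot/loader.conf. The function name is made up.

```sh
#!/bin/sh
# Sketch: show the data-segment limits inherited by processes started
# from this shell (KB or "unlimited"), plus FreeBSD's kern.maxdsiz.
show_datasize_limits() {
    echo "soft: $(ulimit -S -d)"
    echo "hard: $(ulimit -H -d)"
    # FreeBSD only; silently skipped where the sysctl does not exist.
    sysctl -n kern.maxdsiz 2>/dev/null || true
}
```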
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Hey Fabian, Do you have pf running? If so, can you turn it off for a bit and see if you still crash? On my box I was getting all sorts of witness/kdb backtraces on pf, and since turning pf off (maybe a week ago) I haven't crashed yet. Going to let it keep running unmetered for another 2 weeks and see if I crash or not. -Peter
Re: How to setup polling on 'bge' interface
On Wed, 2006-Jul-19 22:38:56 -0400, Ed Maste wrote: - You may have to adjust some parameters in the kern.polling sysctl tree - specifically, kern.polling.burst_max, kern.polling.each_burst and kern.polling.user_frac might need tweaking. Note that increasing kern.polling.burst_max and kern.polling.each_burst will also increase the number of soft interrupts. - The polling feedback algorithm does not work very well if your workload is focused largely on per-packet tasks (such as routing or bridging). You'll find that there is still idle CPU time at the point you start dropping packets. I have some work in progress to address this, but it's not yet committed. I thought setting kern.polling.idle_poll would allow the CPU to utilise all idle time. The downside is that the system always shows as 100% utilised so it's very difficult to know how busy the system actually is. - Polling's major advantage is the avoidance of livelock on UP systems, and not improved performance. The limited testing I've done on a Sun V20z at work suggests that you can get better routing throughput in interrupt mode than polling mode. YMMV and this is before tweaking the polling parameters. (My testing also suggests that I don't really need to do any tweaking because the limiting factor is the gigabit interfaces rather than the V20z). -- Peter Jeremy
Re: Xen dom0 support?
On Sat, 2006-Jul-22 06:56:44 -0500, Nikolas Britton wrote: Does FreeBSD support Xen 3 dom0 yet??? It looks like work is in progress. See http://www.fsmware.com/ What's the current status of domU support? See http://wiki.xensource.com/xenwiki/OSCompatibility -- Peter Jeremy
Re: Memory management
On Wed, 2006-Jul-26 11:26:34 +0200, Stephane Dupille wrote: As time passes, the memory fills up. When the machine starts, memory is 30% occupied, and after two or three weeks memory is 100% occupied and it begins to use swap. How are you monitoring memory usage? Do you mean 'swap' or 'page'? Some level of page-ins is normal, because the text and data areas of processes are loaded by paging them in. I was not able to find a correct definition of what inactive memory is. First, I would like to know what these kinds of pages are: wired, active, inactive, cache and free. Wired pages are pages that the kernel has wired to RAM so they cannot be paged out. Active pages are mapped by virtual memory and in use by running processes. Inactive pages are not currently mapped, but the kernel knows their contents and can re-map them without needing to retrieve them from disk - they may be dirty. Cache pages are similar to inactive pages but aren't dirty, and are higher-priority candidates for being freed. Free pages have no useful content and will be used to fulfil page-in requests. Is it normal that inactive memory usage grows? Yes. 'Free' memory is basically wasted, so the kernel tries to limit it, subject to keeping enough free memory to service page faults. Most of your RAM should be wired, active or inactive. Inactive memory will start at 0 and grow as active pages are released. What should I do? Nothing. Why do you think you have a problem? Do you have any tools to monitor memory usage of processes? ps(1) -- Peter Jeremy
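To see the page-queue sizes described above, a minimal sketch follows. It assumes a FreeBSD 6.x system exporting the vm.stats.vm.v_*_count and vm.stats.vm.v_page_size sysctls; the pages_to_mb helper is hypothetical and only does the unit conversion. On other systems only the helper is defined. The final ps(1) line lists the processes with the largest resident set sizes.

```shell
#!/bin/sh
# Sketch: summarise the FreeBSD VM page queues (wired, active, inactive,
# cache, free) in megabytes. Sysctl names are assumed to match FreeBSD 6.x.

# Convert a page count to whole megabytes.
# $1 = number of pages, $2 = page size in bytes (a multiple of 1024).
# The page size is divided first to avoid overflowing a 32-bit shell integer.
pages_to_mb() {
    echo $(( $1 * ($2 / 1024) / 1024 ))
}

if [ "$(uname -s)" = "FreeBSD" ]; then
    pagesize=$(sysctl -n vm.stats.vm.v_page_size)
    for q in wire active inactive cache free; do
        # Skip any queue counter this kernel does not export.
        count=$(sysctl -n "vm.stats.vm.v_${q}_count" 2>/dev/null) || continue
        printf '%-9s %6s MB\n' "$q" "$(pages_to_mb "$count" "$pagesize")"
    done
    # Per-process resident (rss) and virtual (vsz) sizes in kilobytes,
    # largest resident set first.
    ps -ax -o pid,rss,vsz,comm | sort -rn -k2 | head -n 10
fi
```

Expect inactive to be large on a long-running machine; as explained above, that is normal rather than a sign of a leak.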
Re: filesystem full error with inumber
On Wed, 2006-Jul-26 13:07:19 -0400, Sven Willenberger wrote: One of my machines that I recently upgraded to 6.1 (6.1-RELEASE-p3) is also exhibiting df reporting wrong data usage numbers. What did you upgrade from? Is this UFS1 or UFS2? Does a full fsck fix the problem? -- Peter Jeremy
Sound device reported but no devices created
I've been trying to get sound working on a Compaq Armada 1580 with a recent 6-STABLE. With devices 'sound' and 'snd_sbc' built into the kernel and no hints, I get:

    ESS0004: adding io range 0x250-0x257, size=0x8, align=0
    pnpbios: handle 16 device ID ESS0004 (04007316)
    ESS1878: adding io range 0x220-0x22f, size=0x10, align=0
    ESS1878: adding io range 0x388-0x38b, size=0x4, align=0
    ESS1878: adding io range 0x330-0x331, size=0x2, align=0
    ESS1878: adding irq mask 0x20
    ESS1878: adding dma mask 0x2
    ESS1878: adding dma mask 0x20
    pnpbios: handle 17 device ID ESS1878 (78187316)
    ...
    unknown: ESS0004 failed to probe at port 0x250-0x257 on isa0
    sbc0: ESS ES1878 at port 0x220-0x22f,0x388-0x38b,0x330-0x331 irq 5 drq 1,5 on isa0
    sbc0: [GIANT-LOCKED]
    unknown: PNP0e03 failed to probe at port 0x3e0-0x3e1 on isa0
    Device configuration finished.

Note that there's no pcm reported and the only sound-related device is /dev/sndstat. I get the same behaviour if I don't build sound into the kernel and kldload snd_sbc.ko (which autoloads sound.ko).

    laptop1# sysctl hw.snd.verbose=2
    hw.snd.verbose: 1 -> 2
    laptop1# sysctl hw.snd
    hw.snd.targetirqrate: 32
    hw.snd.report_soft_formats: 1
    hw.snd.verbose: 2
    hw.snd.unit: 0
    hw.snd.maxautovchans: 0
    laptop1# cat /dev/sndstat
    FreeBSD Audio Driver (newpcm)
    Installed devices:
    laptop1#

I have successfully used sound on a similar laptop with 4.9, where the probes looked like:

    sbc0: ESS ES1878 at port 0x220-0x22f,0x388-0x38b,0x330-0x331 irq 5 drq 1,5 on isa0
    pcm0: ESS 18xx DSP on sbc0

I suspect the problem is that sbc0 is not triggering attachment of pcm0 but I'm uncertain why. Does anyone have any suggestions? -- Peter Jeremy
Re: Sound device reported but no devices created
On Sun, 2006-Jul-30 12:10:54 +0800, Ariff Abdullah wrote: On Sun, 30 Jul 2006 13:43:52 +1000 Peter Jeremy [EMAIL PROTECTED] wrote: I've been trying to get sound working on a Compaq Armada 1580 with a recent 6-STABLE. With devices 'sound' and 'snd_sbc' built into the kernel and no hints, I get: ... I suspect the problem is that sbc0 is not triggering attachment of pcm0 but I'm uncertain why. Does anyone have any suggestions? Try disabling ACPI, or edit /usr/src/sys/dev/sound/isa/sbc.c and remove the acpi module dependency (line 794). The system is too old to support ACPI. I've removed the module dependency anyway, and it made no difference. -- Peter Jeremy
Re: Sound device reported but no devices created
On Sun, 2006-Jul-30 17:52:27 +1000, Ian Smith wrote: Peter, I don't know if this is likely to be helpful or not, but my Compaq is a 1500c with an ESS ES1869, reporting (on 5.4-RELEASE):

    sbc0: ESS ES1869 (Compaq OEM) at port 0x330-0x331,0x388-0x38b,0x220-0x22f irq 5 drq 5,1 on isa0
    pcm0: ESS 18xx DSP on sbc0

I too have in my kernel config:

    device sound
    device snd_sbc    # ES1869 (Compaq OEM)

but after much head-scratching it only finally worked after adding this to /boot/loader.conf:

    snd_ess_load="YES"    # this fixed it .. bridge driver for ESS

That worked, thank you. The man pages are not the clearest here. snd_ess(4) implies that all three drivers are needed, but snd_sbc(4) makes no reference to it - which is what confused me. -- Peter Jeremy
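For anyone hitting the same problem, a sketch of the two usual ways to get the ESS bridge driver in alongside sound and snd_sbc. The file locations are the standard FreeBSD ones; the built-in variant assumes `device snd_ess` is available in your kernel sources, as it is for the other snd_* drivers. Pick one approach, not both.

```
# Option 1: kernel has 'device sound' and 'device snd_sbc';
# load the ESS bridge as a module from /boot/loader.conf
snd_ess_load="YES"

# Option 2: build all three into the kernel configuration file
device sound
device snd_sbc
device snd_ess
```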