Re: Some questions about jails on FreeBSD9.0-RC1
> On 10/26/2011 03:12 AM, Patrick Lamaiziere wrote:
>> On Tue, 25 Oct 2011 22:52:55 +0200, carlopmart wrote:
>>
>>> Hello,
>>>
>>> I have installed one FreeBSD 9.0-RC1 host to run different services
>>> (dns, smtp and www only) using jails. This host has two physical
>>> nics: em0 and em1. em0 is assigned to the physical host, and I would
>>> like to assign em1 to the jails. But em0 and em1 are on different
>>> networks: em0 is on 192.168.1.0/24 and em1 is on 192.168.2.0/29.
>>>
>>> I have set up one jail using ezjail. My first surprise is that
>>> ezjail only installs -RELEASE versions and not RC versions. OK, I
>>> suppose that is normal. But my first question is: can I install a
>>> FreeBSD 8.2 jail under a FreeBSD 9.0 host?
>>
>> You may run 8.2-installed ports on 9.0 by using the port
>> /usr/ports/misc/compat8x/
>>
>> But I suggest upgrading the port ASAP.
>>
>>> And the real question: how do I need to configure the network under
>>> this jail to access it? I have configured the ifconfig parameter for
>>> em1 in the host's rc.conf, but what about the default route under
>>> this jail? I thought of using pf rules, but I am not sure.
>>
>> jail enforces the use of the jail IP address in the jail, but that's
>> all. Just enable routing on the host.
>
> But that is not possible. Between host and jail exists a firewall ... I
> can't do simple routing with the host. Maybe a possible solution is to
> use policy source routing?
>
> --
> CL Martinez
> carlopmart {at} gmail {d0t} com

I'm using FIBs. The host is on a private network with a gateway of 192.168.1.1, and the jails are on a public network with their own real/public gateway.
FIBs work without the box becoming a gateway:

%grep gateway /etc/rc.conf
gateway_enable="NO"

I have this in the system startup to set up the "public gateway" for the jails:

%cat /usr/local/etc/rc.d/0.setfib.sh
#!/bin/sh
echo setfib 1 for public jails
/usr/sbin/setfib 1 /sbin/route add default 216.241.167.1

and in /usr/local/etc/ezjail/myjail I added this line at the end of the config:

export jail_myjail_fib="1"

[/usr/sbin/jail has FIB support built in, but at that time ezjail did not, so I had to add it manually in the config - nowadays I believe ezjail has FIB support natively, but the resulting config file is the same]

The host uses NAT to get out via the private IP, and the jails are reachable via their public IPs. All the IPs are defined in rc.conf the normal _alias way.

FIB support, as I remember, needs a custom kernel - not sure about 9, this is on 8.2.

I even run OpenBSD spamd on the host, using FIBs to start the spamd daemon via a 'setfib 1' wrapper script:

%cat /usr/local/etc/rc.d/obspamdfib.sh
#!/bin/sh
#
# this just calls the original file, but with setfib 1
/usr/sbin/setfib 1 /usr/local/etc/rc.d.fib/obspamd $1

I had moved the 'obspamd' startup script to rc.d.fib just so the 'setfib 1' wrapper is called instead.

]Peter[

FIBs are awesome when you don't have many public IPs and when the host is _only_ a jail host running no services.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: why is pkg_add on 9.0 stable still using packages from packages-9-current?
> I've just updated from 9.0-RC3 to 9.0-stable r229626 after a clean
> install. However, when trying to add the first packages I noticed that
> pkg_add is installing packages from
> ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-9-current/ and
> not from packages-9-stable, which I would expect. In particular because
> the packages-9-stable directory does now exist on the server.
>
> Using Google, this appears to have to do with
> /usr/src/usr.sbin/pkg_install/add/main.c, in which the following
> revision appears to include the link:
> http://svnweb.freebsd.org/base/head/usr.sbin/pkg_install/add/main.c?r1=225757&r2=225756&pathrev=225757
> However, if I look at this specific source file on my system, line 98,
> which brings in the link to packages-9-stable, is missing. (The file
> has packages-9.0-release, but not packages-9-stable.)
>
> Any suggestion on what I am doing wrong would be highly appreciated, as
> I don't understand this.
>
> *) uname -a shows 9.0-stable r229626; the update to latest stable has
> been done according to the Handbook. No variables that could affect
> this have been changed, as it was a clean install using the 9.0-RC3
> bootdisk.

Should have stayed at -release :)

I just had the same question, and after doing some further research, the issue is this:

http://www.freebsd.org/doc/en/books/porters-handbook/freebsd-versions.html

and

pkbsdpkg:#pwd
/usr/src/usr.sbin/pkg_install
pkbsdpkg:#grep -R packages-9 *
add/main.c: { 90, 900499, "/packages-9.0-release" },
add/main.c: { 90, 999000, "/packages-9-current" },

They haven't updated main.c for -stable yet, and -stable is at 900500 per sys/sys/param.h [or URL].

I'm just setting
PACKAGESITE=ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-9-stable/Latest/
manually.

]Peter[
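The manual workaround above can be sketched as a small sh fragment (the arch and branch strings are just illustrative assumptions; adjust them for your platform and branch):

```shell
#!/bin/sh
# Point pkg_add at the -stable package directory explicitly,
# instead of the packages-9-current default baked into main.c.
ARCH="amd64"
BRANCH="packages-9-stable"
PACKAGESITE="ftp://ftp.freebsd.org/pub/FreeBSD/ports/${ARCH}/${BRANCH}/Latest/"
export PACKAGESITE
echo "${PACKAGESITE}"
# then e.g.: pkg_add -r <pkgname>
```

pkg_add honours PACKAGESITE from the environment, so this works per-shell without touching the system configuration.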
Re: Script-friendly (parseble) ps(1) output?
>Hello,
>
>I need to write a CGI script which will print the output from ps(1) in
>a table (HTML), so the average operator can click on a KILL link and
>the CGI will send the selected signal.
>
>I need to put one piece of ps information per column in a table (HTML);
>however, I found ps(1) output to be too hard to parse. There is no
>separator. I believed \t was the separator, but it's not.
>
>The ps(1) command I need to use is:
>
>ps -ax -o pid -o user -o emul -o lstart -o lockname -o stat -o command
>
>Since many of those args use [:space:] in the output, I can not use
>[:space:] as a separator.
>Sadly, `-o field='value'` will only format the HEADER output, not the
>values.
>I've got no clue what to do; can someone enlighten me?
>Thank you all in advance.
>--
>===
>Eduardo Meyer
>pessoal: [EMAIL PROTECTED]
>profissional: [EMAIL PROTECTED]

Here is something simple, and you can wrap the HTML around it:

poshta:$ps axuww | while read USER PID CPU MEM VSZ RSS TT STAT STARTED TIME COMMAND; do echo $PID $CPU $USER $COMMAND; done | head -3
PID %CPU USER COMMAND
11 89.6 root [idle]
5127 2.9 qscand spamd child (perl5.8.8)

The read ignores all white space... the last variable in that 'while read' will hold everything beyond it, i.e.:

poshta:$ps axuww | while read USER PID CPU MEM VSZ RSS TT STAT STARTED TIME; do echo $PID $CPU $USER $TIME; done | head -3
PID %CPU USER TIME COMMAND
11 77.9 root 138080:11.91 [idle]
13607 5.0 qscand 0:09.12 spamd child (perl5.8.8)

etc. etc...

]Peter[
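The trick rests on sh's read semantics: the last variable named in the read absorbs everything left on the line, embedded blanks included. A tiny self-contained sketch (the sample line is made up to stand in for real ps output):

```shell
#!/bin/sh
# A fake ps(1) line stands in for real output; read splits on blanks,
# and COMMAND (the last variable) keeps the rest of the line intact.
printf '%s\n' 'root 11 89.6 0.0 0 0 ?? RL 1Jan70 0:00.00 spamd child (perl5.8.8)' |
while read USER PID CPU MEM VSZ RSS TT STAT STARTED TIME COMMAND; do
    echo "$PID $CPU $USER $COMMAND"
done
# prints: 11 89.6 root spamd child (perl5.8.8)
```

For the CGI case, each of the leading variables then maps cleanly to one HTML table cell, with COMMAND as the final (space-containing) cell.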
stable/8/UPDATING - no mention of 8.0 release
iH,

Been updating my src via svn and following stable/8. Looking at the UPDATING file, it does not mention '8.0-RELEASE'. Did it just not make it in there yet?

http://svn.freebsd.org/viewvc/base/stable/8/UPDATING?view=markup
vs.
http://svn.freebsd.org/viewvc/base/release/8.0.0/UPDATING?view=markup

It just confused me for a while after I did an 'svn up' and did not see the release notes... [for the 7.X series, both stable/ and release/ mention the release in UPDATING]

]Peter[
panic: spin lock held too long 8.0-STABLE r207638
iH,

Got a system that, whenever I launch another VirtualBox instance, will panic anywhere from 10 minutes to several days later. I was able to get this system to panic constantly yesterday by trying to install Windows 2008; after it was eventually installed, it only ran for ~45 minutes before a panic. Some installs panic at around 2% done, most at ~80%, and one actually completed. Trying to figure out if it's my hardware or what.

I used to have a Windows XP VM running alongside, and the panics happened about once a week - eventually I got lazy and didn't start the XP VM. [less load and no swap usage, and no panics for 1.5 months until yesterday]

Yesterday I tried to install Windows 2008, and in the ~6 hours I was messing around, it panicked around 8 times [after random amounts of time]:

panic: spin lock held too long [cpuid either 2, 3, or 4 as far as I remember]

4GB of RAM
AMD X4 CPU
8.0-STABLE r207638 amd64
vbox 3.1.6
mostly GENERIC kernel [sched_ule], with firewall/altq compiled in and unnecessary hardware removed from the kernel config.

With just one VM [FreeBSD 8-stable, 2 CPU, 2GB RAM] the system runs fine for months. [it's also a file/print server] As soon as I try to get a Windows VM running on there, and it starts using some swap, anywhere from 20MB to 150MB, it eventually panics [usually within an hour or so].

Any ideas? Hardware issues? Anyone successful in running several VMs on FreeBSD -> VirtualBox? [overloading it?]

]Peter[
kernel crash from adjacent partitions (gpart, zfs)
Hi,

when partitions are created directly adjacent, without safety free space between them, the kernel may crash. Does anybody know how big that free space needs to be?

How I found out (and how to reproduce the crash):
https://forums.freebsd.org/threads/create-degraded-raid-5-with-2-disks-on-freebsd.70750/post-426756

OS concerned: 11.2, amd64 and i386. Or, does anybody know if this is fixed in 12?
Rel. 11.3: Kernel doesn't compile anymore :(
Trying to compile my custom kernel on Rel. 11.3 results in this:

[code]--- kernel.full ---
linking kernel.full
atomic.o: In function `atomic_add_64':
/usr/obj/usr/src/sys/E1R11V1/./machine/atomic.h:629: multiple definition of `atomic_add_64'
opensolaris_atomic.o:/usr/src/sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S:71: first defined here
*** [kernel.full] Error code 1[/code]

The same config worked with 11.2.

The offending feature is either
   options ZFS
or
   device dtrace
(Adding either of these to the GENERIC config gives the same error.)

This happens only when building for i386. Building amd64 with these options works.
Re: Rel. 11.3: Kernel doesn't compile anymore (SVN-334762, please fix!)
> Trying to compile my custom kernel in Rel. 11.3 results in this:
>
> --- kernel.full ---
> linking kernel.full
> atomic.o: In function `atomic_add_64':
> /usr/obj/usr/src/sys/E1R11V1/./machine/atomic.h:629: multiple definition of `atomic_add_64'
> opensolaris_atomic.o:/usr/src/sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S:71: first defined here
> *** [kernel.full] Error code 1
>
> Same config worked with 11.2
>
> The offending feature is either
>    options ZFS
> or
>    device dtrace
> (Adding any of these to the GENERIC config gives the same error.)
>
> This happens only when building for i386. Building amd64 with these
> options works.

Trying to analyze the issue: the problem appears with SVN r334762 in 11.3.

This change adds two new functions to sys/i386/include/atomic.h:

atomic_add_64()
atomic_subtract_64()

[I don't really understand why this goes into a header file, but, well, nevermind.]

Also, this change deactivates two functions (only in the *i386* case) in sys/cddl/compat/opensolaris/kern/opensolaris_atomic.c:

atomic_add_64()
atomic_del_64()

[Now, there seems to be a slight strangeness here: if we *deactivate* atomic_del_64() and *insert* atomic_subtract_64(), then these two names are not the same, and I might suppose that atomic_del_64() is then somehow missing. But, well, nevermind.]

Now, the strange thing: this file sys/cddl/compat/opensolaris/kern/opensolaris_atomic.c, from which two functions now get excluded *only in the i386 case*, is not even compiled for i386:

>/usr/src/sys/conf$ grep opensolaris_atomic.c *
>files.arm:cddl/compat/opensolaris/kern/opensolaris_atomic.c   optional zfs | dtrace compile-with "${CDDL_C}"
>files.mips:cddl/compat/opensolaris/kern/opensolaris_atomic.c  optional zfs | dtrace compile-with "${CDDL_C}"
>files.powerpc:cddl/compat/opensolaris/kern/opensolaris_atomic.c   optional zfs powerpc | dtrace powerpc compile-with "${ZFS_C}"
>files.riscv:cddl/compat/opensolaris/kern/opensolaris_atomic.c optional zfs | dtrace compile-with "${CDDL_C}"

[So maybe that's the reason why the now-missing atomic_del_64() is not complained about? Or maybe it's not used, or maybe I didn't find some definition wherever. Well, nevermind.]

Anyway, the actual name clash happens with sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S, because that one *is* compiled:

>/usr/src/sys/conf$ grep i386/opensolaris_atomic.S *
>files.i386:cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S   optional zfs | dtrace compile-with "${ZFS_S}"

I tried to back out the changes from SVN r334762. Sadly, that didn't work, because something already uses this atomic_add_64() stuff. So instead, I did this:

--- sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S (revision 350287)
+++ sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S (working copy)
@@ -66,8 +66,7 @@
 	 * specific mapfile and remove the NODYNSORT attribute
 	 * from atomic_add_64_nv.
 	 */
-	ENTRY(atomic_add_64)
-	ALTENTRY(atomic_add_64_nv)
+	ENTRY(atomic_add_64_nv)
 	pushl	%edi
 	pushl	%ebx
 	movl	12(%esp), %edi	// %edi = target address
@@ -87,7 +86,6 @@
 	popl	%edi
 	ret
 	SET_SIZE(atomic_add_64_nv)
-	SET_SIZE(atomic_add_64)
 
 	ENTRY(atomic_or_8_nv)
 	movl	4(%esp), %edx	// %edx = target address

And at least it compiles now. Whether it actually runs remains to be found out.

Bottom line: please, please, please, sort this out and fix it.
Re: Rel. 11.3: Kernel doesn't compile anymore (SVN-334762, please fix!)
Hi Hans Petter,

glad to read You! :)

On Thu, Jul 25, 2019 at 09:39:26AM +0200, Hans Petter Selasky wrote:
! On 2019-07-25 01:00, Peter wrote:
! >> The offending feature is either
! >>    options ZFS
! >> or
! >>    device dtrace
! >> (Adding any of these to the GENERIC config gives the same error.)
! Can you attach your kernel configuration file?

Yes, but to what point? I can reproduce this with the GENERIC configuration by adding "options ZFS".

(My custom KERNCONF relates to my local patches and is rather pointless without them. So at first I tried to reproduce without my local patches and with minimal changes from the GENERIC config. And the minimal change is to add "options ZFS" to the GENERIC conf.)

See here:

root@disp:/usr/src/sys/i386/compile/GENERIC # make
linking kernel.full
atomic.o: In function `atomic_add_64':
/usr/src/sys/i386/compile/GENERIC/./machine/atomic.h:629: multiple definition of `atomic_add_64'
opensolaris_atomic.o:/usr/src/sys/i386/compile/GENERIC/../../../cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S:71: first defined here
*** Error code 1

Stop.
make: stopped in /usr/src/sys/i386/compile/GENERIC
root@disp:/usr/src/sys/i386/compile/GENERIC #
root@disp:/usr/src/sys/i386/compile/GENERIC # cd ../../../..
root@disp:/usr/src # svn stat
M       sys/i386/conf/GENERIC
root@disp:/usr/src # svn diff
Index: sys/i386/conf/GENERIC
===================================================================
--- sys/i386/conf/GENERIC	(revision 350287)
+++ sys/i386/conf/GENERIC	(working copy)
@@ -1,3 +1,4 @@
+options 	ZFS
 #
 # GENERIC -- Generic kernel configuration file for FreeBSD/i386
 #
root@disp:/usr/src # svn info
Path: .
Working Copy Root Path: /usr/src
URL: https://svn0.us-east.freebsd.org/base/releng/11.3
Relative URL: ^/releng/11.3
Repository Root: https://svn0.us-east.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 350287
Node Kind: directory
Schedule: normal
Last Changed Author: gordon
Last Changed Rev: 350287
Last Changed Date: 2019-07-24 12:58:21 +0000 (Wed, 24 Jul 2019)
wrong value from DTRACE (uint32 for int64)
Hi @all,

I felt the need to look into my ZFS ARC, but DTrace provided misleading (i.e., wrong) output (on i386, 11.3-RELEASE):

# dtrace -Sn 'arc-available_memory { printf("%x %x", arg0, arg1); }'

DIFO 0x286450a0 returns D type (integer) (size 8)
OFF OPCODE    INSTRUCTION
00: 29010601  ldgs DT_VAR(262), %r1    ! DT_VAR(262) = "arg0"
01: 23000001  ret  %r1

NAME  ID  KND SCP FLAG TYPE
arg0  262 scl glb r    D type (integer) (size 8)

DIFO 0x286450f0 returns D type (integer) (size 8)
OFF OPCODE    INSTRUCTION
00: 29010701  ldgs DT_VAR(263), %r1    ! DT_VAR(263) = "arg1"
01: 23000001  ret  %r1

NAME  ID  KND SCP FLAG TYPE
arg1  263 scl glb r    D type (integer) (size 8)

dtrace: description 'arc-available_memory ' matched 1 probe
0  14  none:arc-available_memory  2fb000 2
0  14  none:arc-available_memory  4e000 2
1  14  none:arc-available_memory  b000 2
1  14  none:arc-available_memory  b000 2
1  14  none:arc-available_memory  b000 2
1  14  none:arc-available_memory  19000 2
0  14  none:arc-available_memory  d38000 2

# dtrace -n 'arc-available_memory { printf("%d %d", arg0, arg1); }'
1  14  none:arc-available_memory  81920 5
1  14  none:arc-available_memory  69632 5
1  14  none:arc-available_memory  4294955008 5
1  14  none:arc-available_memory  4294955008 5

The arg0 variable is obviously shown here as an unsigned int32 value. But in fact, the probe in the source code in arc.c is a signed int64:

DTRACE_PROBE2(arc__available_memory, int64_t, lowest, int, r);

User @shkhin in the forum pointed me to check the bare dtrace program, unattached to the kernel code:
https://forums.freebsd.org/threads/dtrace-treats-int64_t-as-uint32_t-on-i386.73223/post-446517
And there everything appears correct.

So two questions:
1. Can anybody check and confirm this happening?
2. Any idea what could be wrong here?

(The respective variable in arc.c bears the correct 64-bit negative value, I checked that - and otherwise the ARC couldn't shrink.)

rgds,
PMc
Re: wrong value from DTRACE (uint32 for int64)
On Mon, 02 Dec 2019 21:58:36 +0100, Mark Johnston wrote:

> The DTRACE_PROBE* macros cast their parameters to uintptr_t, which
> will be 32 bits wide on i386. You might be able to work around the
> problem by casting arg0 to uint32_t in the script.

Thanks for the info - good that it has a logical explanation.
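The numbers from the original post line up with this explanation. Plain sh arithmetic (64-bit in common shells such as sh, dash and bash) shows that 4294955008 is exactly the unsigned 32-bit image of -12288:

```shell
#!/bin/sh
# -12288 (a plausible int64_t value from arc.c) truncated to 32 bits,
# as happens when it passes through a 32-bit uintptr_t on i386:
printf '%s\n' "$(( -12288 & 0xffffffff ))"      # the kind of value dtrace printed
# and recovering the signed value from what dtrace showed:
printf '%s\n' "$(( 4294955008 - 4294967296 ))"  # subtract 2^32
```

So the probe argument is not corrupted at random; it is a clean two's-complement truncation, which is why a cast in the D script can undo it.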
Re: Disabling speculative execution mitigations
On Fri, 06 Dec 2019 06:21:04 +0100, O'Connor, Daniel wrote:

> vm.pmap.pti="0"        # Disable page table isolation
> hw.ibrs_disable="1"    # Disable Indirect Branch Restricted Speculation
> hw.mds_disable="0"     # Disable Microarchitectural Data Sampling flush
> hw.vmm.vmx="1"         # Don't flush RSB on vmexit (presumably only affects bhyve etc)
> hw.lazy_fpu_switch="1" # Lazily flush FPU
>
> Does anyone know of any others?

hw.spec_store_bypass_disable=2

I have that on 11.3 (no idea yet about 12).

And honestly, I lost track of which of these should be on, off, automatic, opaque or elsewhere to achieve either performance or security (not to mention for which cores and under which circumstances it would matter, and what the impact might be), and my oracle says this will not end with these.
session mgmt: does POSIX indeed prohibit NOOP execution?
pgrp = getpid();
if (setpgid(0, pgrp) < 0)
        err(1, "setpgid");

This appears to me to be a program trying to daemonize itself (in the old style, when there was only job control but no session management). In the case where this program is already properly daemonized, e.g. by starting it from /usr/sbin/daemon, this code now fails, invoking the err() clause and thereby aborting.

From what I could find out, POSIX does not allow a session leader to do setpgid() on itself. When a program is invoked via /usr/sbin/daemon, it should already be session leader AND group leader, and then the above code WOULD be a NOOP - unless POSIX would require the setpgid() to fail and thereby the program to abort - which, btw, is NOT a NOOP :(

So, where is the mistake here?
Option 1: I have completely misunderstood something. Then please tell me what.
Option 2: The quoted code is bogus. Then why is it in base?
Option 3: The setpgid() behaviour is bogus. It may stop a session leader from executing it, but it should detect a NOOP and just go through with it. Then why don't we fix that?
Option 4: POSIX is bogus. Unlikely, because as far as I could find out, that part of it was written following the Berkeley implementation.
Re: ZFS and power management
On Wed, 18 Dec 2019 17:22:16 +0100, Karl Denninger wrote:

> I'm curious if anyone has come up with a way to do this... I have a
> system here that has two pools -- one comprised of SSD disks that are
> the "most commonly used" things including user home directories and
> mailboxes, and another that is comprised of very large things that are
> far less-commonly used (e.g. video data files, media, build
> environments for various devices, etc.)

I'm using such a configuration for more than 10 years already, and didn't perceive the problems You describe. Disks are powered down with gstopd or other means, and they stay powered down until filesystems in the pool are actively accessed.

A difficulty for me was that postgres autovacuum must be completely disabled if there are tablespaces on the quiesced pools. Another thing that comes to mind is smartctl in daemon mode (but I never used that). There are probably a whole bunch more of potential culprits, so I suggest You work through all the housekeeping stuff (daemons, cronjobs, etc.) to find yours.
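For the PostgreSQL point above: the relevant knob is the cluster-wide autovacuum switch in postgresql.conf (a sketch under the assumption of a stock setup; this disables a maintenance feature, so routine VACUUM/ANALYZE then has to be scheduled manually at times when the disks are awake):

```
# postgresql.conf - keep the autovacuum launcher from periodically
# scanning tables whose tablespaces live on spun-down disks
autovacuum = off
```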
Re: session mgmt: does POSIX indeed prohibit NOOP execution?
On Mon, 06 Jan 2020 01:10:57 +0100, Christoph Moench-Tegeder wrote:

> > When a program is invoked via /usr/sbin/daemon, it should already be
> > session leader AND group leader, and then the above code WOULD be a
> > NOOP, unless POSIX would require the setpgid() to fail and thereby
> > the program to abort - which, btw, is NOT a NOOP :(
>
> https://pubs.opengroup.org/onlinepubs/9699919799/
> "The setpgid() function shall fail if: [...] The process indicated by
> the pid argument is a session leader."

Okay, so, what You are saying is that I got correct information insofar as POSIX indeed demands the perceived behaviour. Thanks for that confirmation.

Not much room to argue? Why that? This is not about laws you have to follow blindly whether you understand them or not; this is all about an Outcome - a working machine that should properly function. So either there are other positive aspects of this behaviour that weigh against the perceived malfunction, or the requirement is simply wrong. And the latter case should be all the argument that is needed.

I do not say disobey POSIX. I only say that one of the involved parts must certainly be wrong, and that should be fixed. So if You are saying the problem is in POSIX, but we are in the role of blind monkeys who have to follow that alien commandment by all means, no matter the outcome, then this does not seem acceptable to me.

Actually, as it seems to me, this whole session thing came originally out of Kirk McKusick's kitchen and made its way from there into POSIX; so, if there is indeed a flaw in it, it should well be possible to fix it going the same way.

In any case, this here (to be found in /etc/rc.d/kadmind) is a crappy workaround and not acceptable style:

command_args="$command_args &"

We aren't slaves, or, are we?

I for my part came just accidentally across this matter, and as my stance is
1. the code has to be solid enough to stand the Jupiter mission, and therefore
2. do a root cause analysis Always, on Every misbehaviour (and then fix it once and for all),
I figured that thing out.

rgds,
PMc
Fwd: Re: session mgmt: does POSIX indeed prohibit NOOP execution?
> > Not much room to argue? Why that? This is not about laws you have to
> > follow blindly whether you understand them or not, this is all about
> > an Outcome - a working machine that should properly function.
>
> "Not much to argue about what behaviour is required by the standard".
> The standard could have been written to require different behaviour
> and most probably still make sense, but it wasn't; but at least it's
> unambiguous. After that, the discussion is rather... philosophical.

It is not the standard that concerns me; it is *failure* that concerns me. When I try to run a daemon from the base OS (in the orderly way, via the daemon command), and it just DOES NOT WORK, and I need to find out and look into what's actually wrong, then for me that's not philosophy, that's a failure that needs some effort to fix. And I don't want such issues, and, more important, I don't want other people to run into the same issue again! (Not sure what is so difficult to understand about that.)

In any case: either the base system has a flaw, or the syscall has a flaw, or POSIX has a flaw. I don't care which; You're free to choose. But if you instead think that flaws are not allowed to exist because POSIX is perfect, and that therefore the much better solution is to just bully the people who happen to run into the flaws - well, that's also okay.

rgds,
PMc
jedec_dimm fails to boot
I met an issue: when I kldload jedec_dimm during runtime, it works just as expected, and the DIMM data appears in sysctl.

But when I do
* load the jedec_dimm at the loader prompt, or
* add it to loader.conf, or
* compile it into a custom kernel,
it does not boot anymore.

My custom kernel just hangs somewhere while switching the screen, i.e. no output. GENERIC does an immediate reboot during the device probe phase. So both are not suitable for gathering additional info in an easy way. (And since my DIMMs appear to have neither thermal sensors nor serials, there is not much to gain for me here, so I will not pursue this further, at least not before switching to R.12.)

But I fear there are some general problems with the sorting out of the modules during system bringup - see also my other message titled "panic: too many modules".

Some data for those interested:

FreeBSD 11.3-RELEASE-p6
CPU: Intel(R) Core(TM) i5-3570T CPU (IvyBridge)
Board: https://www.asus.com/Motherboards/P8B75V/specifications/

Config:
hint.jedec_dimm.0.at="smbus12"
hint.jedec_dimm.0.addr="0xa0"
hint.jedec_dimm.1.at="smbus12"
hint.jedec_dimm.1.addr="0xa2"
hint.jedec_dimm.2.at="smbus12"
hint.jedec_dimm.2.addr="0xa4"
hint.jedec_dimm.3.at="smbus12"
hint.jedec_dimm.3.addr="0xa6"

ichsmb0: port 0xf040-0xf05f mem 0xf7d15000-0xf7d150ff irq 18 at device 31.3 on pci0
smbus12: on ichsmb0
smb12: on smbus12

With GENERIC it becomes smbus0 (because drm2 is not loaded) and I need to load "smbus" and "ichsmb" up front.

Cheerio,
PMc
panic: too many modules
Front up: I do not like loadable modules. They are nice to try something out, but when you start to depend on some dozen loaded modules, debugging becomes a living hell: say you hunt some spurious misbehaviour and compare logfiles with those from four weeks ago - you will not know exactly which modules were loaded at that time. Compiling everything into the kernel has the advantage that 'uname' changes on every change and so precisely describes the running kernel.

So I came across the cc_vegas and cc_cdg modules, and they aren't provided to compile into the kernel straight away. But that should not be a big deal: just add some arbitrary new device to the KERNCONF, and then add the required files to sys/conf/files appropriately. Should work.

But it doesn't. Right after the startup message, before even probing devices, it says

panic: module_register_init: module named ertt not found

and gives a stacktrace from kern/init_main.c:mi_startup(). But the h_ertt is definitely present in the kernel (I checked).

To have a closer look, I added VERBOSE_SYSINIT to the kernel, and - the panic is gone, everything working as expected. Without even activating the output from VERBOSE_SYSINIT.

Then I moved netinet/khelp/h_ertt.c to the very end of sys/conf/files - and this also avoids the panic, and things do work. While this change does nothing but change the sequence in which the files are compiled (and probably linked).

I think this is not good. Everybody likes modules (although - see above - they come with a serious tradeoff on reproducibility). But if we now deliver components only as loadable modules because a compound kernel is no longer able to sort them out on boot, that's a more serious issue.

I wouldn't complain if the module would simply not work (reproducibly) when compiled into the kernel - but this here appears to be a race, most likely a timing race. And such a thing being possible at the point where the kernel sorts out its own components - oops, that does worry me indeed...

There seems also to be a desire for a *fast* system bringup. I don't share that. I boot once a quarter, and if that takes an hour I don't mind. Maybe there is need for an option: fast boot for those who want a gaming-console-alike to be available immediately, and slow boot for those who want a reliable system in 24/7 operation?

Maybe I'll take a closer look at the issue after switching to R.12 (probably not this year). Or, maybe somebody would like to point me to some paper describing how the module fabric is supposed to interface and by which steps the runtime linkage is achieved?

Platform: FreeBSD 11.3-RELEASE-p6, Intel(R) Core(TM) i5-3570T CPU (IvyBridge)

cheerio,
PMc
Re: jedec_dimm fails to boot
On Wed, Mar 04, 2020 at 11:41:22PM +0300, Yuri Pankov wrote:
! On 04.03.2020 19:09, Peter wrote:
! > When I kldload jedec_dimm during runtime, it works just as expected,
! > and the DIMM data appears in sysctl.
! >
! > But when I do
! > * load the jedec_dimm at the loader prompt, or
! > * add it to loader.conf, or
! > * compile it into a custom kernel,
! > it does not boot anymore.
!
! Could you try backporting r351604 and see if it helps?

Yepp, that works. Thank You! :)
How to free used Swap-Space?
Hi all,

my machine should use about 3-4, maybe 5 GB of swapspace. Today I found it suddenly uses 8 GB (which is worryingly near the configured 10G).

I stopped all the big suckers - nothing found.
I stopped all the jails - no success.
I brought it down to singleuser: it tried to swapoff, but failed.

I unmounted all filesystems, exported all pools, detached all geli, and removed most of the netgraphs. Swap is still occupied.

The machine is now running only init and a couple of shell processes, has almost no filesystems mounted, has mostly native networks only, and still occupies 3 GB of swap which cannot be released.

What is going on, what is doing this, and how can I get this swapspace released?? It is 11.4-RELEASE-p3 amd64.

Script started on Mon Sep 21 05:43:20 2020
root@edge# ps axlww
UID   PID  PPID CPU PRI NI  VSZ  RSS MWCHAN   STAT TT       TIME COMMAND
  0     0     0   0 -16  0    0  752 swapin   DLs   -   291:32.41 [kernel]
  0     1     0   0  20  0 5416  248 wait     ILs   -     0:00.22 /sbin/init --
  0     2     0   0 -16  0    0   16 ftcl     DL    -     0:00.00 [ftcleanup]
  0     3     0   0 -16  0    0   16 crypto_w DL    -     0:00.00 [crypto]
  0     4     0   0 -16  0    0   16 crypto_r DL    -     0:00.00 [crypto returns]
  0     5     0   0 -16  0    0   32 -        DL    -    11:41.94 [cam]
  0     6     0   0  -8  0    0   80 t->zthr_ DL    -    13:07.13 [zfskern]
  0     7     0   0 -16  0    0   16 waiting_ DL    -     0:00.00 [sctp_iterator]
  0     8     0   0 -16  0    0   16 -        DL    -     2:05.20 [rand_harvestq]
  0     9     0   0 -16  0    0   16 -        DL    -     0:00.04 [soaiod1]
  0    10     0   0 155  0    0   64 -        RNL   - 17115:06.48 [idle]
  0    11     0   0 -52  0    0  352 -        WL    -    49:05.30 [intr]
  0    12     0   0 -16  0    0   64 sleep    DL    -    16:28.51 [ng_queue]
  0    13     0   0  -8  0    0   48 -        DL    -    23:10.60 [geom]
  0    14     0   0 -16  0    0   16 seqstate DL    -     0:00.00 [sequencer 00]
  0    15     0   0 -68  0    0  160 -        DL    -     0:23.64 [usb]
  0    16     0   0 -16  0    0   16 -        DL    -     0:00.04 [soaiod2]
  0    17     0   0 -16  0    0   16 -        DL    -     0:00.04 [soaiod3]
  0    18     0   0 -16  0    0   16 -        DL    -     0:00.04 [soaiod4]
  0    19     0   0 -16  0    0   16 idle     DL    -     0:00.83 [enc_daemon0]
  0    20     0   0 -16  0    0   48 psleep   DL    -    12:07.72 [pagedaemon]
  0    21     0   0  20  0    0   16 psleep   DL    -     4:12.41 [vmdaemon]
  0    22     0   0 155  0    0   16 pgzero   DNL   -     0:00.00 [pagezero]
  0    23     0   0 -16  0    0   64 psleep   DL    -     0:23.50 [bufdaemon]
  0    24     0   0  20  0    0   16 -        DL    -     0:04.21 [bufspacedaemon]
  0    25     0   0  16  0    0   16 syncer   DL    -     0:32.48 [syncer]
  0    26     0   0 -16  0    0   16 vlruwt   DL    -     0:02.31 [vnlru]
  0    27     0   0 -16  0    0   16 -        DL    -     7:11.58 [racctd]
  0   157     0   0  20  0    0   16 geli:w   DL    -     0:22.03 [g_eli[0] ada1p2]
  0   158     0   0  20  0    0   16 geli:w   DL    -     0:22.77 [g_eli[1] ada1p2]
  0   159     0   0  20  0    0   16 geli:w   DL    -     0:31.08 [g_eli[2] ada1p2]
  0   160     0   0  20  0    0   16 geli:w   DL    -     0:29.41 [g_eli[3] ada1p2]
  0 70865     1   0  20  0 7076 3104 wait     Ss   v0     0:00.21 -sh (sh)
  0 71135 70865   0  20  0 6392 2308 select   S+   v0     0:00.00 script
  0 71136 71135   0  23  0 7076 3068 wait     Ss    0     0:00.00 /bin/sh -i
  0 71142 71136   0  23  0 6928 2584 -        R+    0     0:00.00 ps axlww
root@edge# df
Filesystem  512-blocks    Used   Avail Capacity  Mounted on
/dev/ada3p3    1936568  860864  920784    48%    /
devfs                2       2       0   100%    /dev
procfs               8       8       0   100%    /proc
/dev/ada3p4    3099192 1184896 1666368    42%    /usr
/dev/ada3p5     580344    8112  525808     2%    /var
root@edge# pstat -s
Device           512-blocks     Used    Avail Capacity
/dev/ada1p2.eli    10485760  5839232  4646528    56%
root@edge# top | cat
last pid: 71147;  load averages: 0.19, 0.08, 0.09  up 3+03:21:00  05:44:12
5 processes: 1 running, 4 sleeping

Mem: 9732K Active, 10M Inact, 882M Laundry, 1920M Wired, 10M Buf, 1023M Free
ARC: 335K Total, 16K MFU, 304K MRU, 15K Header
     320K Compressed, 2944K Uncompressed, 9.20:1 Ratio
Swap: 5120M Total, 2851M Used, 2269M Free, 55% Inuse

  PID USERNAME  THR PRI NICE  SIZE   RES STATE  C  TIME  WCPU COMMAND
70865 root        1  20    0 7076K 3104K wait   2  0:00 0.00% sh
71135 root        1  20    0 6392K 2308K select 1  0:00 0.00% script
71136 root        1  20    0 7076K 3068K wait   2  0:00 0.00% sh
71146 root        1  20    0 7928K 2980K CPU0   0  0:00 0.00% top
71147 root        1  20    0 6300K 2088K piperd 1  0:00 0.00% c
Re: How to free used Swap-Space?
On Tue, Sep 22, 2020 at 12:33:19PM -0400, Mark Johnston wrote:
! On Tue, Sep 22, 2020 at 06:08:01PM +0200, Peter wrote:
! > my machine should use about 3-4, maybe 5 GB swapspace. Today I found
! > it suddenly uses 8 GB (which is worryingly near the configured 10G).
! >
! > I stopped all the big suckers - nothing found.
! > I stopped all the jails - no success.
! > I brought it down to singleuser: it tried to swapoff, but failed.
! >
! > I unmounted all filesystems, exported all pools, detached all geli,
! > and removed most of the netgraphs. Swap is still occupied.
! >
! > Machine is now running only the init and a shell processes, has
! > almost no filesystems mounted, has mostly native networks only, and
! > this still occupies 3 GB of swap which cannot be released.
! >
! > What is going on, what is doing this, and how can I get this swapspace
! > released??
!
! Do you have any shared memory segments lingering?  ipcs -a will show
! SysV shared memory usage.

I have four small shmem segments from four postgres clusters running. These should cleanly disappear when the clusters are stopped, and they are very small.

Shared Memory:
T      ID      KEY MODE     OWNER     GROUP     CREATOR   CGROUP    NATTCH SEGSZ CPID LPID ATIME    DTIME    CTIME
m   65536  5432001 --rw---  postgres  postgres  postgres  postgres       7    48 4793 4793  6:09:34 18:00:31  6:09:34
m   65537        0 --rw---  postgres  postgres  postgres  postgres      11    48 6268 6268  6:09:42 10:48:27  6:09:42
m   65538        0 --rw---  postgres  postgres  postgres  postgres       5    48 6968 6968  6:09:46 18:28:36  6:09:46
m   65539        0 --rw---  postgres  postgres  postgres  postgres       6    48 6992 6992  6:09:47  3:38:34  6:09:47

! For POSIX shared memory, in 11.4 we do not
! have any good way of listing objects, but "vmstat -m | grep shmfd" will
! at least show whether any are allocated.

There is something, and I don't know who owns that:

$ vmstat -m | grep shmfd
     shmfd     1314K        -     473  64,256,1024,8192

But that doesn't look big either.
Furthermore, this machine has been running for quite some time already; it ran as i386 (with ZFS) until very recently, and I know quite well what uses much memory. These 3 GB were illegitimate; they came from nothing I installed. And they are new; this has not happened before.

! If those don't turn anything
! up then it's possible that there's a swap leak.  Do you use any DRM
! graphics drivers on this system?

Probably yes. There is no graphics used at all; it just uses "device vt" in text mode, but that uses the i5-3570T CPU (IvyBridge HD2500) graphics, and the drivers are "drm2" and "i915drm" from /usr/src/sys (not those from ports). Not sure how that would account for 3 GB, unless there is indeed some leak.

regards,
PMc
___
Re: How to free used Swap-Space? (from errno=8)
I think I can reproduce the problem now. See below.

On Tue, Sep 22, 2020 at 02:09:01PM -0400, Mark Johnston wrote:
! On Tue, Sep 22, 2020 at 07:31:07PM +0200, Peter wrote:
! > There is something, and I don't know who owns that:
! > $ vmstat -m | grep shmfd
! >      shmfd     1314K        -     473  64,256,1024,8192
! >
! > But that doesn't look big either.
!
! That is just the amount of kernel memory used to track a set of objects,
! not the actual object sizes.  Unfortunately, in 11 I don't think there's
! any way to enumerate them other than running kgdb and examining the
! shm_dictionary hash table.

One of the owners of this is also postgres (maybe among others).

! I think I see a possible problem in i915, though I'm not sure if you'd
! trigger it just by using vt(4).  It should be fixed in later FreeBSD
! versions, but is still a problem in 11.  Here's a (untested) patch:

Thank You, I'll keep that one in store, just in case. But now I found something simpler, while tracking error messages that caught my eye alongside:

When patching to 11.4-p3, I had been reluctant to recompile lib32 and install it everywhere, and had kicked it off the systems. And obviously, I had missed recompiling some of my old self-written binaries; they were still i386 and were called by various scripts. So what happens then is this:

$ file scc.e
scc.e: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 9.3 (903504), stripped

$ ./scc.e
ELF interpreter /libexec/ld-elf.so.1 not found, error 8
Abort trap

And this costs some (hundred?) kB of swapspace every time it happens. The space does not come back, nor can the concerned jail fully die again.

So, maybe, when removing lib32 & friends from the system, one must also remove "options COMPAT_FREEBSD32" from the kernel, so that it will not try to run such a binary, and maybe that would avoid the issue. (But then, what if one uses lib32 only in *some* jails?
Some evil user in another jail can then bring along an i386 binary and crash the system by bloating the memory.)

Anyway, my problem is now solved, as I needed these binaries back in working order anyway.

regards,
PMc
___
Re: How to free used Swap-Space? (from errno=8)
On Wed, Sep 23, 2020 at 12:03:32AM +0300, Konstantin Belousov wrote:
! On Tue, Sep 22, 2020 at 09:11:49PM +0200, Peter wrote:
! > So what happens then is this:
! >
! > $ file scc.e
! > scc.e: ELF 32-bit LSB executable, Intel 80386, version 1
! > (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1,
! > for FreeBSD 9.3 (903504), stripped
! >
! > $ ./scc.e
! > ELF interpreter /libexec/ld-elf.so.1 not found, error 8
! > Abort trap
! >
! > And this will cost about some (hundred?) kB of swapspace every time it
! > happens. And they do not go away again, neither can the concerned jail
! > fully die again.
!
! In what sense does it 'cost'?

Well, that amount of memory gets occupied. Forever - that is, until poweroff/reset.

! Can you show the exact sequence of commands and outputs that demonstrate your
! point? What type of filesystem do the binaries live on?

Oh, I didn't care. Originally on ZFS. When I tried to reproduce it, most likely on an NFS-4 share, as I didn't bother to put it anywhere special.

! I want to reproduce it locally.

Yes, that's great! Let's see which info You are lacking. Here we are now on my desktop box (mostly the same machine, same configuration: i5-3570, 11.4-p3, amd64). I explicitly removed all the files that do not get installed when /etc/src.conf contains "WITHOUT_LIB32=", but I still have COMPAT_FREEBSD32 in the kernel.

Now I fetch such an old R9.3/i386 binary from my backups, and drop it into some NFS filesystem. (That binary is only 4kB, I just attach it here; if you wanna try you can straightaway use that one - in normal operation it just converts some words stdin to stdout.)
admin@disp:510:1/ext/Repos$ dir usr2sys
-rwxr-xr-x  1 bin  bin  4316 Apr  7  2016 usr2sys
admin@disp:511:1/ext/Repos$ file usr2sys
usr2sys: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 9.3 (903504), stripped
admin@disp:513:1/ext/Repos$ mount | grep Repos
edge-e:/ext/Repos on /ext/Repos (nfs, nfsv4acls)
admin@disp:514:1/ext/Repos$ top | cat
Mem: 952M Active, 1687M Inact, 419M Laundry, 4423M Wired, 774M Buf, 348M Free
ARC: 1940M Total, 1378M MFU, 172M MRU, 2492K Anon, 48M Header, 340M Other
     1134M Compressed, 2749M Uncompressed, 2.43:1 Ratio
Swap: 20G Total, 36M Used, 20G Free

As we see, this machine has 8 Gig installed and currently uses about no swap. Now watch what happens:

epos$ ./usr2sys
ELF interpreter /libexec/ld-elf.so.1 not found, error 8
Abort trap
admin@disp:519:1/ext/Repos$ for i in `seq 1000`
> do ./usr2sys
> done
ELF interpreter /libexec/ld-elf.so.1 not found, error 8
Abort trap
...
admin@disp:514:1/ext/Repos$ top | cat
Mem: 1010M Active, 1807M Inact, 419M Laundry, 4523M Wired, 774M Buf, 69M Free
ARC: 1940M Total, 1383M MFU, 166M MRU, 2503K Anon, 48M Header, 340M Other
     1134M Compressed, 2750M Uncompressed, 2.43:1 Ratio
Swap: 20G Total, 36M Used, 20G Free

The free memory has already disappeared!

admin@disp:521:1/ext/Repos$ for i in `seq 5000`; do ./usr2sys ; done
...
admin@disp:522:1/ext/Repos$ top | cat
Mem: 2154M Active, 78M Inact, 787M Laundry, 4722M Wired, 774M Buf, 89M Free
ARC: 1753M Total, 1273M MFU, 97M MRU, 2653K Anon, 39M Header, 340M Other
     953M Compressed, 2445M Uncompressed, 2.56:1 Ratio
Swap: 20G Total, 358M Used, 20G Free, 1% Inuse

Now the swapspace starts filling.
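As an aside, hunting down such leftover 32-bit binaries does not even require file(1); checking the ELF ident bytes directly is enough (byte 5, EI_CLASS, is 1 for ELF32). A minimal sketch - `head -c` is not strictly POSIX but ubiquitous, and the find path below is just an example:

```shell
# Succeed if the file starts with the ELF magic (0x7f 'E' 'L' 'F')
# and EI_CLASS == 1, i.e. it is a 32-bit ELF object.
is_elf32() {
    head -c 5 "$1" 2>/dev/null | od -An -tu1 | tr -s ' ' \
        | grep -q '^ 127 69 76 70 1 *$'
}

# Example use: list 32-bit executables left under /usr/local/bin
# find /usr/local/bin -type f | while read -r f; do is_elf32 "$f" && echo "$f"; done
```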
Let's see if the placement filesystem makes any difference and go onto UFS:

admin@disp:525:1/ext/Repos$ su -
Password:
root@disp:~ # cp /ext/Repos/usr2sys /var
root@disp:~ # dir /var/usr2sys
-rwxr-xr-x  1 bin  bin  4316 Sep 22 23:55 /var/usr2sys
root@disp:~ # mount | grep /var
/dev/ada0p5 on /var (ufs, local, soft-updates)
admin@disp:527:1/var$ ./usr2sys
ELF interpreter /libexec/ld-elf.so.1 not found, error 8
Abort trap
admin@disp:521:1/ext/Repos$ for i in `seq 5000`; do ./usr2sys ; done
ELF interpreter /libexec/ld-elf.so.1 not found, error 8
Abort trap
...

Ahh, that runs a LOT faster now than on the NFS!

admin@disp:529:1/var$ top | cat
Mem: 1546M Active, 67M Inact, 934M Laundry, 5121M Wired, 774M Buf, 161M Free
ARC: 1646M Total, 1159M MFU, 107M MRU, 2686K Anon, 37M Header, 340M Other
     849M Compressed, 2257M Uncompressed, 2.66:1 Ratio
Swap: 20G Total, 1658M Used, 18G Free, 8% Inuse

But the memory leakage is similar, or worse.

admin@disp:530:1/var$ df tmp
Filesystem    1K-blocks   Used    Avail Capacity  Mounted on
zdesk/var/tmp  24747504 231052 24516452     1%    /var/tmp
admin@disp:531:1/var$ cp usr2sys tmp
admin@disp:532:1/var$ cd tmp
admin@disp:533:1/var/tmp$ ./usr2sys
ELF interpreter /libexec/ld-elf.so.1 not found, error 8
Abort trap
admin@disp:534:1/var/tmp$ for i in `seq 5000`; do ./usr2sys ; done
...

You can see this is now a ZFS, and the behaviour is basically the same:

Mem: 1497M Active, 5292K Inact, 803M Laundry, 5313M Wired, 774M Buf, 212M Free
ARC: 1432M Total, 963M MFU, 105M MRU, 2511K Anon, 21M
12.2 Firefox immediate crash "exiting due to channel error"
Hi all,

I was forced to upgrade 11.4 -> 12.2, as QT5 requires openssl 1.1.1. I did a full rebuild from source as of this:

12.2-RC2 FreeBSD 12.2-RC2 #11 r366648M#N1055:1078
(local patches applied - some published via sendbug 10 or 12 years ago)

I did a full rebuild of ALL ports from source, as of 2020Q4, Revision: 552058. I verified that all files in /usr/local were newly written. Then I removed COMPAT_FREEBSD11.

Firefox (firefox-esr 78.3.1_3,1) now reproducibly crashes immediately at startup with some "exiting due to channel error". This is solved by putting COMPAT_FREEBSD11 back in (after the better part of a day spent with kernel builds while halving the diffs between GENERIC and mine).

I found some comments, but they do not elaborate on the issue, e.g.:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233028#c13
(that's two years ago and concerns 12.0-PRERELEASE!)

Finally I found this:
https://reviews.freebsd.org/D23100
"The Rust ecosystem currently uses pre-ino64 syscalls, so building lang/rust without COMPAT_FREEBSD11 is not going to work."

It seems *RUNNING* rust-built stuff w/o COMPAT11 is also not going to work - and one wouldn't expect this (and would probably search for a long time), because removing compat switches only at the very end, before rebooting, *AFTER* everything was rebuilt and the installation verified, is just good practice.

So, as a user I would expect to find this mentioned in some release notes. OTOH, rust is an add-on, so one could take the position that base is not concerned. But then at least ports/UPDATING should somehow mention it.
___
12.2 cpuset behaves unexpectedly
After upgrading 11.4 -> 12.2, cpuset now behaves rather differently:

# cpuset -C -p NNN
11.4: a new set is created with all cpus enabled, and the process is moved into that set, with the thread mask unchanged.
12.2: nothing is done, but an error is raised if threadmask == setmask.

# cpuset -l XX -C -p NNN
11.4: a new set is created with all cpus enabled, and the process is moved into that set, with the thread mask changed to the -l parameter.
12.2: an error is raised if threadmask == setmask; otherwise the threadmask is changed to the -l parameter.

It seems the -C option no longer works (except for producing errors that appear somewhat bogus).

PMc
___
Help! 12.2 mem ctrl knob missing, might need 3 times more memory
Hiya,

after upgrading 11.4 -> 12.2, I get this error:
> sysctl: unknown oid 'vm.pageout_wakeup_thresh' at line 105

How do I adjust the paging now? The ARC is much too small:

Mem: 1929M Active, 109M Inact, 178M Laundry, 1538M Wired, 37M Buf, 88M Free
ARC: 729M Total, 428M MFU, 154M MRU, 196K Anon, 25M Header, 122M Other
     118M Compressed, 533M Uncompressed, 4.52:1 Ratio
Swap: 10G Total, 1672M Used, 8567M Free, 16% Inuse

With 11.4 there were 200M active, 2500M wired, 4200M swap, and the ARC stayed filled to the configured arc_max. And not all applications are even loaded yet!

Config: 4G RAM installed, application footprint ~11G.

vm.pageout_wakeup_thresh=11000   # default 6886
vm.v_inactive_target=48000       # default 1.5x vm.v_free_target
vfs.zfs.arc_grow_retry=6         # override shrink-event from pageout (every 10 sec.)

I did this intentionally: the RAM is over-committed with applications. These applications are rarely accessed, but should respond on the network. So they are best accommodated in paging space - taking a few seconds for page-in at first access does not matter, and not many of them are accessed at the same time. So I want the machine to page out *before* shrinking the ARC, because pageout is a normal event in this layout.

The above tuning achieved exactly that, but in 12.2 it seems to be missing. Without it I would need to install the full 12G of RAM, which is just a waste. How do I get this behaviour back with 12.2?
___
Panic: 12.2 fails to use VIMAGE jails
After a clean upgrade (from source) from 11.4 to 12.2-p1, my jails no longer work correctly. Old-fashioned jails seem to work, but most are VIMAGE+NETGRAPH style, and those do not work properly. All of them worked flawlessly for nearly a year with Rel. 11.

If I start 2-3 jails, and then stop them again, there is always a panic. This is also reproducible with the GENERIC kernel. Can this be fixed, or do I need to revert to 11.4?

The backtrace looks like this:

#4 0x810bbadf at trap_pfault+0x4f
#5 0x810bb23f at trap+0x4cf
#6 0x810933f8 at calltrap+0x8
#7 0x80cdd555 at _if_delgroup_locked+0x465
#8 0x80cdbfbe at if_detach_internal+0x24e
#9 0x80ce305c at if_vmove+0x3c
#10 0x80ce3010 at vnet_if_return+0x50
#11 0x80d0e696 at vnet_destroy+0x136
#12 0x80ba781d at prison_deref+0x27d
#13 0x80c3e38a at taskqueue_run_locked+0x14a
#14 0x80c3f799 at taskqueue_thread_loop+0xb9
#15 0x80b9fd52 at fork_exit+0x82
#16 0x8109442e at fork_trampoline+0xe

This is my typical jail config, designed and tested with Rel. 11:

rail {
        jid = 10;
        devfs_ruleset = 11;
        host.hostname = "xxx.xxx.xxx.org";
        vnet = "new";
        sysvshm;
        $ifname1l = nge_${name}_1l;
        $ifname1l_mac = 00:1d:92:01:01:0a;
        vnet.interface = "$ifname1l";
        exec.prestart = "
                echo -e \"mkpeer eiface crhook ether\nname .:crhook $ifname1l\" \
                    | /usr/sbin/ngctl -f -
                /usr/sbin/ngctl connect ${ifname1l}: svcswitch: ether link2
                ifname=`/usr/sbin/ngctl msg ${ifname1l}: getifname | \
                    awk '$1 == \"Args:\" { print substr($2, 2, length($2)-2)}'`
                /sbin/ifconfig \$ifname name $ifname1l
                /sbin/ifconfig $ifname1l link $ifname1l_mac
        ";
        exec.poststart = "
                /usr/sbin/jexec $name /sbin/sysctl kern.securelevel=3 ;
        ";
        exec.poststop = "/usr/sbin/ngctl shutdown ${ifname1l}:";
}
___
Analyzing kernel panic from VIMAGE/Netgraph takedown
Stopping a VIMAGE+Netgraph jail in 12.2 in the same way as worked with Rel. 11.4 crashes the kernel after 2 or 3 start/stop iterations.

Specifically, this does not work:
exec.poststop = "/usr/sbin/ngctl shutdown ${ifname1l}:";

This new option from Rel. 12 does not work either; it just allows a few more iterations:
exec.release = "/usr/sbin/ngctl shutdown ${ifname1l}:";

What does seem to work is adding a delay:
exec.poststop = "
        sleep 2 ;
        /usr/sbin/ngctl shutdown ${ifname1l}: ;
";

The big question now is: how long should the delay be? This survived a test of 100 start/stop iterations. But then, stopping a jail on a loaded machine after it has been running for a few months is an entirely different matter: in such a case the jail will spend hours in "dying" state, while in this test the jid became instantly free for restart.

In any case, as all this worked flawlessly with Rel. 11.4, something is now broken in the code, and it should be fixed.

PMc
___
Re: Panic: 12.2 fails to use VIMAGE jails
Hi Kristof,
it's great to read You!

On Mon, Dec 07, 2020 at 09:11:32PM +0100, Kristof Provost wrote:
! That smells a lot like the epair/vnet issues in bugs 238870, 234985, 244703,
! 250870.

epair? No. It is purely Netgraph here.

! I pushed a fix for that in CURRENT in r368237. It's scheduled to go into
! stable/12 sometime next week, but it'd be good to know that it fixes your
! problem too before I merge it.
! In other words: can you test a recent CURRENT? It's likely fixed there, and
! if it's not I may be able to fix it quickly.

Oh my Gods. No offense meant, but this is not really a good time for that. This is the most horrible upgrade I have experienced in 25 years of FreeBSD (and it was prepared - 12.2 did run fine on the other machine).

I have an issue with the mem config:
https://forums.freebsd.org/threads/fun-with-upgrading-sysctl-unknown-oid-vm-pageout_wakeup_thresh.77955/

I have an issue with a damaged filesystem, for no apparent reason:
https://forums.freebsd.org/threads/no-longer-fun-with-upgrading-file-offline.77959/

Then I have this issue here, which is now gladly worked around:
https://forums.freebsd.org/threads/panic-12-2-does-not-work-with-jails.77962/post-486365

And when I then dare to have a look at my applications, they look like sheer horror - segfaults all over - and I don't even know where to begin with those.

Other option: can you make this fix available so that I can patch it into the 12.2 source and just redeploy? I tried to apply the changes from r368237 to my 12.2 source, which seemed quite obvious, but it doesn't work; jails fail to be removed entirely:

# service jail stop rail
Stopping jails: rail.
# jexec rail
jexec: jail "rail" not found

-> it works once.

# service jail start rail
Starting jails: rail.
# service jail stop rail
Stopping jails: rail.
# jexec rail
root@rail:/ # ps ax
ps: empty file: Invalid argument

-> And here it doesn't work anymore, and leaves the skull of a jail that one cannot get rid of.
Cheerio,
PMc
___
Re: Panic: 12.2 fails to use VIMAGE jails
On Tue, Dec 08, 2020 at 04:50:00PM +0100, Kristof Provost wrote:
! Yeah, the bug is not exclusive to epair but that's where it's most easily
! seen.

Ack.

! Try http://people.freebsd.org/~kp/0001-if-Fix-panic-when-destroying-vnet-and-epair-simultan.patch

Great, thanks a lot.

Now I have bad news: when playing yoyo with the next-best three application jails (with all their installed stuff), it took about ten ups and downs, then I got this one:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x10
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x80aad73c
stack pointer           = 0x28:0xfe003f80e810
frame pointer           = 0x28:0xfe003f80e810
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 15486 (ifconfig)
trap number             = 12
panic: page fault
cpuid = 1
time = 1607450838
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe003f80e4d0
vpanic() at vpanic+0x17b/frame 0xfe003f80e520
panic() at panic+0x43/frame 0xfe003f80e580
trap_fatal() at trap_fatal+0x391/frame 0xfe003f80e5e0
trap_pfault() at trap_pfault+0x4f/frame 0xfe003f80e630
trap() at trap+0x4cf/frame 0xfe003f80e740
calltrap() at calltrap+0x8/frame 0xfe003f80e740
--- trap 0xc, rip = 0x80aad73c, rsp = 0xfe003f80e810, rbp = 0xfe003f80e810 ---
ng_eiface_mediastatus() at ng_eiface_mediastatus+0xc/frame 0xfe003f80e810
ifmedia_ioctl() at ifmedia_ioctl+0x174/frame 0xfe003f80e850
ifhwioctl() at ifhwioctl+0x639/frame 0xfe003f80e8d0
ifioctl() at ifioctl+0x448/frame 0xfe003f80e990
kern_ioctl() at kern_ioctl+0x275/frame 0xfe003f80e9f0
sys_ioctl() at sys_ioctl+0x101/frame 0xfe003f80eac0
amd64_syscall() at amd64_syscall+0x380/frame 0xfe003f80ebf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe003f80ebf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800475b2a, rsp = 0x7fffe358, rbp = 0x7fffe450 ---
Uptime: 9m51s
Dumping 899 out of 3959 MB:

I decided to give it a second try, and this is what I did:

root@edge:/var/crash # jls
 JID  IP Address  Hostname       Path
   1  1***        gate.***.org   /j/gate
   3  1***        raix.***.org   /j/raix
   4              oper.***.org   /j/oper
   5              admn.***.org   /j/admn
   6              data.***.org   /j/data
   7              conn.***.org   /j/conn
   8              kerb.***.org   /j/kerb
   9              tele.***.org   /j/tele
  10              rail.***.org   /j/rail
root@edge:/var/crash # service jail stop rail
Stopping jails: rail.
root@edge:/var/crash # service jail stop tele
Stopping jails: tele.
root@edge:/var/crash # service jail stop kerb
Stopping jails: kerb.
root@edge:/var/crash # jls
 JID  IP Address  Hostname       Path
   1  1***        gate.***.org   /j/gate
   3  1***        raix.***.org   /j/raix
   4              oper.***.org   /j/oper
   5              admn.***.org   /j/admn
   6              data.***.org   /j/data
   7              conn.***.org   /j/conn
root@edge:/var/crash # jls -d
 JID  IP Address  Hostname       Path
   1  1***        gate.***.org   /j/gate
   3  1***        raix.***.org   /j/raix
   4              oper.***.org   /j/oper
   5              admn.***.org   /j/admn
   6              data.***.org   /j/data
   7              conn.***.org   /j/conn
   9              tele.***.org   /j/tele
  10              rail.***.org   /j/rail
root@edge:/var/crash # service jail start kerb
Starting jails:Fssh_packet_write_wait: Connection to 1*** port 22: Broken pipe

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xfe00540ea658
frame pointer           = 0x28:0xfe00540ea670
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13420 (ifconfig)
trap number             = 12
panic: page fault
cpuid = 1
time = 1607451910
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_s
Re: Panic: 12.2 fails to use VIMAGE jails
Here is the next funny crashdump - I obtained this one twice, and also the sysctl_rtsock() one again. I can reproduce it by just starting and stopping a most simple jail that does only

exec.start = "/bin/sleep 4 &";

(And, as usual, when I let it time out, nothing bad happens.)

Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 02
instruction pointer     = 0x20:0x80a2ac45
stack pointer           = 0x28:0xfe0047cf2890
frame pointer           = 0x28:0xfe0047cf2890
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13557 (ifconfig)
trap number             = 9
panic: general protection fault
cpuid = 1
time = 1607469295
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0047cf25a0
vpanic() at vpanic+0x17b/frame 0xfe0047cf25f0
panic() at panic+0x43/frame 0xfe0047cf2650
trap_fatal() at trap_fatal+0x391/frame 0xfe0047cf26b0
trap() at trap+0x67/frame 0xfe0047cf27c0
calltrap() at calltrap+0x8/frame 0xfe0047cf27c0
--- trap 0x9, rip = 0x80a2ac45, rsp = 0xfe0047cf2890, rbp = 0xfe0047cf2890 ---
strncmp() at strncmp+0x15/frame 0xfe0047cf2890
ifunit_ref() at ifunit_ref+0x59/frame 0xfe0047cf28d0
ifioctl() at ifioctl+0x427/frame 0xfe0047cf2990
kern_ioctl() at kern_ioctl+0x275/frame 0xfe0047cf29f0
sys_ioctl() at sys_ioctl+0x101/frame 0xfe0047cf2ac0
amd64_syscall() at amd64_syscall+0x380/frame 0xfe0047cf2bf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0047cf2bf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800475b2a, rsp = 0x7fffe3b8, rbp = 0x7fffe450 ---
Uptime: 8m54s
Dumping 880 out of 3959 MB:
___
Re: Panic: 12.2 fails to use VIMAGE jails
On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote:
! > Sorry for the bad news.
! >
! You appear to be triggering two or three different bugs there.

That is possible. Then there are two or three different bugs in the production code. In any case, my current workaround, i.e. delaying in the exec.poststop

> exec.poststop = "
>     sleep 6 ;
>     /usr/sbin/ngctl shutdown ${ifname1l}: ;
> ";

helps against all of them and makes the system behave solidly. This is true with and without Your patch.

! Can you reduce your netgraph use case to a small test case that can trigger
! the problem?

I'm sorry, I fear I don't get Your point. Assuming there are actually two or three bugs here, You are asking me to reduce the config so that it will trigger only one of them? Is that correct?

Then let me put this differently: assume this is the OS for the life support system of the manned Jupiter mission. Which one of the bugs do You want to get fixed, and which would You prefer to keep and have cut off Your oxygen supply?
https://www.youtube.com/watch?v=BEo2g-w545A

! I'm not likely to be able to do anything unless I can reproduce
! the problem(s).

I understand that. From Your former mail I get the impression that You prefer to rely on tests. I consider this a bad habit[1] and prefer logical thinking. So let's try that:

We know that there is a problem with taking down an interface from a VIMAGE in the way it is done by "jail -r". We know this problem can be solidly worked around by delaying the interface takedown for a short time.

Now with Your patch, we do not get the typical crash at interface takedown. Instead, all of a sudden, there are strange crashes from various other places. And, interestingly, we get these also when STARTING a jail. I think this is not an additional problem; it is instead valuable information (albeit not the one You might like to get).
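For what it's worth, the fixed "sleep 6" in the workaround above could also be replaced by polling until the node is actually gone, capped by a timeout. A generic sketch - the ngctl invocations in the comment are only examples with a hypothetical node name; the helper itself is plain shell:

```shell
# Run the check command once a second until it fails or max seconds pass,
# then run the action command. Intended use from exec.poststop, e.g.:
#   wait_then "/usr/sbin/ngctl info nge_rail_1l:" \
#             "/usr/sbin/ngctl shutdown nge_rail_1l:" 10
wait_then() {
    check=$1; act=$2; max=$3; n=0
    while eval "$check" >/dev/null 2>&1 && [ "$n" -lt "$max" ]; do
        sleep 1
        n=$((n + 1))
    done
    eval "$act"
}
```

This only bounds the delay; it does not answer the underlying question of why the takedown races at all.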
Furthermore, we get these new crashes always invoked by "ifconfig", and they seem to have in common that somebody tries to obtain information about some interface configuration and receives bogus data. I might conclude, just from the gut and without looking into details, that either

 - Your patch manages to garble some internal interface data, instead of doing what it is intended to do, or
 - the original problem manages to garble internal interface data (leading to the usual crash), and Your patch does not solve this, but only protects from the immediate consequence.

It might also be worth considering that, while the problem may be easier to reproduce with epair, this effect may or may not be netgraph-specific[2].

Now let's keep in mind that a successful test means EXACTLY NOTHING. By which other means can we confirm that Your patch fully achieves what it is intended for? (E.g. something like dumping and verifying the respective internal tables in-vivo.)

(Background: it is not that I would be unwilling to create clean and precisely reproducible scenarios. But one of my problems is that I currently have only two machines available: the graphical one where I'm just typing, and the backend server with the jails, which does practically everything. Therefore, experimenting on either of them creates considerable pain. I'm working on that issue, trying to get a real server board for the backend so as to free the current one for testing - but what I would like to use, e.g. an ASUS Z10PE + cores + regECC, is not something one easily finds at yard sales - and seldom for an acceptable price.)

cheerio,
PMc

[1] Rationale: a failing test tells us that either the test or the application has a bug (50/50 chance). A succeeding test tells us that 1 equals 1, which we knew already. In fact, tests tell us *nothing at all* about the state of our code, and specifically, 'successful' outcomes do NOT mean that things are all correct.
The only true usefulness of tests is to protect against re-introducing a fault that was already fixed before, i.e. regressions.

[2] My netgraph configuration consists of bringing up some bridges and then attaching the jails to them. Here is the bridge starter (only the respective component; there are more of these populated, but probably not influencing the issue):

    #! /bin/sh

    # PROVIDE: netgraphs
    # REQUIRE: netwait
    # BEFORE: NETWORKING

    . /etc/rc.subr

    name="netgraphs"
    start_cmd="${name}_start"
    stop_cmd="${name}_stop"

    load_rc_config $name

    netgraphs_graphs="svc"
    netgraphs_svc_if1_name="nge_svc_1u"
    netgraphs_svc_if1_mac="00:1d:92:01:02:01"
    netgraphs_svc_if1_addr="***.***.***.***/29"

    netgraphs_svc_start()
    {
        local _ifname
        if ngctl info svcswitch: > /dev/null 2>&1; then
            netgraphs_svc_stop
        fi
        echo "Creating SVC Switch"
        ngctl -f - < /dev/null 2>&1; then
                $_cmd
            else
                echo "netgraphs-start: object $i not found" >&2
            fi
        done
    }
Re: Panic: 12.2 fails to use VIMAGE jails
On Tue, Dec 08, 2020 at 07:51:07PM -0600, Kyle Evans wrote:
! You seem to have misinterpreted this; he doesn't want to narrow it
! down to one bug, he wants simple steps that he can follow to reproduce

Maybe I did misinterpret, but then I don't really understand it. I would suppose that, when testing a proposed fix, the fact that it breaks under the exact same conditions as before is all the information needed at that point. Put in simple words: that it does not work.

! any failure, preferably steps that can actually be followed by just
! about anyone and don't require immense amounts of setup time or
! additional hardware.

Engineering does not normally work that way. I'll try to explain: when a bug is first encountered, it is necessary to isolate it insofar that somebody who is knowledgeable of the code can actually reproduce it, in order to have a look at it and analyze what causes the mishap. If a remedy is then devised, and it does not work as expected, then the flaw is in the analysis, and we just start over from there.

In fact, I would have expected somebody who is trying to fix such a kind of bug to already have testing tools available and to tell me exactly which kind of data I might retrieve from the dumps.

The open question now is: am I the only one seeing these failures? Might they be attributed to a faulty configuration or maybe hardware issues or whatever? We cannot know this; we can only watch out for what happens at other sites. And that is why I sent out all these backtraces - because they appear weird and might be difficult to associate with this issue.

I don't think there is much more we can do at this point, unless we were willing to actually look into the details.

Am I discouraging? Indeed, I think engineering is discouraging by its very nature, and that's the fun of it: to overcome odds and finally maybe make things better. And when we start to forget about that, bad things begin to happen (anybody remember Apollo 13?).
But talking about discouragement: I usually try to track down defects I encounter and, if possible, do a viable root-cause analysis. I tended to be very willing to share the outcomes and, if a solution arises, by all means get it back into the code base; but I found that even ready-made patches for easy matters would linger forever in the sendbug system without anybody caring, or, in more complex cases where I would need some feedback from the original writer, if only to clarify the purpose of some defaults or verify that an approach is viable, that communication is very difficult to establish. And that is what I would call discouraging, and I for my part have accepted to just leave the developers in their ivory tower and tend to my own business.

cheerio,
PMc

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Panic: 12.2 fails to use VIMAGE jails
On Wed, Dec 09, 2020 at 02:00:37PM +1100, Dewayne Geraghty wrote:
! On a jail with config:
! exec.start = "/bin/sh -x /etc/rc";
! exec.stop = "/bin/sh /etc/rc.shutdown";
! exec.clean;
!
! test_prod { jid=7; persist; ip4.addr =
! "10.0.7.96,10.0.5.96,127.0.5.96"; devfs_ruleset = "6";
! host.hostuuid=---0001-0302; host.hostid=000302; }
!
! I successfully performed
! for i in `seq 10`; do jail -vc test_prod; sleep 3; jail -vr test_prod; done

But this is not a VIMAGE jail, is it? Old-style jails are unaffected by this issue. Only VIMAGE jails, using epair or netgraph, might be affected. (In that case, you would not have an "ip4.addr" configured, but rather a "vnet.interface".)

! I think the normal use of jail.conf is to NOT explicitly use a jid in
! the definition, which may be why this may not have been picked up?
! (Maybe a clue).

This is an interesting point. When you stop a jail, it may stay for a more or less long time in a "dying" state (visible with "jls -d"), keeping the jid occupied. During that time, the jail cannot be restarted with that same jid. A while ago, I read people complaining about this, and the advice was to just not define the jid in the definition, so that the jail can be restarted immediately (and will probably grab another jid).

I did not find a solid explanation for what is happening in that "dying" state (and why it takes more or less long), even less an approach to fix it. I found some theories circling the net, but these don't really add up. So I would need to look into the source myself - and I have postponed that indefinitely. ;)

But what I found out with the VIMAGE jails (those that can carry their own network interfaces): when you make a slight mistake in managing and handling the interfaces, the jail will stay in the dying state forever. If you don't make a mistake, it will finally die within some time. So I decided to keep the jid, so that nothing misconfigured is allowed to linger unnoticed.
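For reference, a VIMAGE jail definition of the kind discussed here might look roughly like this - a sketch only; the path, interface name and jail name are hypothetical, and the fixed jid reflects the deliberate choice explained above:

```
# hypothetical VIMAGE jail -- name, path and interface are examples only
test_vnet {
    jid = 7;                    # fixed jid: a jail lingering in the
                                # "dying" state blocks restart, so a
                                # cleanup mistake gets noticed
    vnet;                       # own network stack instead of ip4.addr
    vnet.interface = "epair7b"; # epair (or netgraph) interface moved in
    path = "/jails/test_vnet";
    exec.start = "/bin/sh /etc/rc";
    exec.stop  = "/bin/sh /etc/rc.shutdown";
    exec.clean;
    persist;
}
```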
(The tradeoff is obviously that one might have to wait before restarting.)

cheerio,
PMc

P.S. 41 Celsius is fantastic! I envy You! :)
ZFS l2arc broken in 10.3
details to follow
Re: ZFS l2arc broken in 10.3
Details:

After upgrading 2 machines from 9.3 to 10.3-STABLE, on one of them the l2arc stays empty (capacity alloc = 0), although it is online and gets accessed. It did work well on 9.3.

I did the following tests:

* Create a zpool on a stick, with two vdevs: one filesystem and one cache. The cache stays with alloc=0. Export it and move it into the other machine. The cache immediately fills. Move it back; the cache stays with alloc=0.
  -> this rules out all zpool/zfs get/set options, as they should travel with the pool.
* Boot the GENERIC kernel. The l2arc stays with alloc=0.
  -> this rules out all my nonstandard kernel options.
* Boot in single user mode. The l2arc stays with alloc=0.
  -> this rules out all /etc/* config files.
* Delete the zpool.cache and reimport the pools. The l2arc stays with alloc=0.
* Copy the /boot/loader.conf settings to the other machine. The l2arc still works there.

I could not think of any remaining place where this could come from, except the kernel code itself. From there, I found these counters nicely incrementing each second:

kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 50758
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 27121
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 40589375488

But also this counter incrementing:

kstat.zfs.misc.arcstats.l2_write_full: 14604

Then with some printf in the code I saw these values provided:

    buf_sz = hdr->b_size;
    align = (size_t)1 << dev->l2ad_vdev->vdev_ashift;
    buf_a_sz = P2ROUNDUP(buf_sz, align);

    if ((write_asize + buf_a_sz) > target_sz) {
        full = B_TRUE;
        mutex_exit(hash_lock);
        ARCSTAT_BUMP(arcstat_l2_write_full);
        break;
    }

buf_sz = 1536
align = 512
buf_a_sz = 18446744069414585856
write_asize = 0
target_sz = 16777216

where buf_a_sz is obviously off by (2^64 - 2^32). Maybe this is an effect of crosscompiling i386 on amd64. But anyway, as long as i386 is still supported, it should not happen.

Now, my real concern is: if this really obvious ...
made it undetected until 10.3, how many other missing typecasts are still in the code??
[fixed] ZFS l2arc broken in 10.3
sendbug seems not to work anymore; I end up on websites with marketing-babble and finally get asked to provide some login and passwd. :(

But the former mail looks like it came back to me, so it seems I'm still allowed to post here...

*** sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c.orig   Wed Oct 12 21:07:25 2016
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c        Wed Oct 12 21:46:16 2016
***************
*** 6508,6514 ****
   */
  buf_sz = hdr->b_size;
  align = (size_t)1 << dev->l2ad_vdev->vdev_ashift;
! buf_a_sz = P2ROUNDUP(buf_sz, align);

  if ((write_asize + buf_a_sz) > target_sz) {
  full = B_TRUE;
--- 6508,6514 ----
   */
  buf_sz = hdr->b_size;
  align = (size_t)1 << dev->l2ad_vdev->vdev_ashift;
! buf_a_sz = P2ROUNDUP_TYPED(buf_sz, align, uint64_t);

  if ((write_asize + buf_a_sz) > target_sz) {
  full = B_TRUE;
Re: ZFS l2arc broken in 10.3
Pete French wrote:

> Ok, that's a bit worrying if true - but I can confirm that l2arc works
> fine under 10.3 on amd64, so what you say about cross-compiling might
> be true. Am taking an interest in this as I have just deployed a lot of
> machines which are going to be relying on l2arc working to get
> reasonable performance.

Sure, on my amd64 it also works fine. AFAIK such things are tolerated when compiling in 64bit.

But I was pointed to another point in the interim: my source is from the STABLE branch; in the 10.3 RELEASE the code is different. Obviously there were recent changes, and that explains why the problem was not detected earlier.
Re: Nightly disk-related panic since upgrade to 10.3
Andrea Venturoli wrote:

> Hello.
> Last week I upgraded a 9.3/amd64 box to 10.3: since then, it crashed
> and rebooted at least once every night.

Hi,

I have a quite similar issue, crash dumps every night, but my stacktrace is different (crashing mostly in cam/scsi/scsi.c), and my env is also quite different (old i386, individual disks, extensive use of ZFS), so there is very likely a different reason here. Also, here the upgrade is not the only change: I also replaced a burnt power supply recently and added an SSD cache.

Basically You have two options:

A) Fire up kgdb, go into the code and try to understand what exactly is happening. This depends on whether You have clue enough to go that way; I found "man 4 gdb" and especially the "Debugging Kernel Problems" PDF by Greg Lehey quite helpful.

B) Systematically change parameters. Start by figuring from the logs the exact time of the crash and what was happening then, and try to reproduce that. Then change things and isolate the cause. Having a RAID controller is a bit ugly in this regard, as it is more or less a black box, and it is difficult to change parameters or swap components.

> The only exception was on Friday, when it locked without rebooting: it
> still answered ping requests and logins through HTTP would half work;
> I'm under the impression that the disk subsystem was hung, so ICMP
> would work since it does no I/O, and HTTP too worked as far as no disk
> access was required.

Yep. That tends to happen. It doesn't give much clue, except that there is a disk-related problem.
Re: zfs, a directory that used to hold lot of files and listing pause
Eugene M. Zheganin wrote:

> Hi.
> I have FreeBSD 10.2-STABLE r289293 (but I have observed this situation
> on different releases) and a zfs. I also have one directory that used
> to have a lot of (tens of thousands) files. It surely takes a lot of
> time to get a listing of it. But now I have 2 files and a couple of
> dozen directories in it (I sorted the files into directories).
> Surprisingly, there's still a lag between "ls" and its output:

I see this on my pgsql_tmp dirs (where Postgres stores intermediate query data that gets too big for mem - usually lots of files) - in normal operation these dirs are completely empty, but they cause heavy disk activity (even writing!) when doing ls.

Seems normal; I don't care as long as the thing is stable. One would need to check how ZFS stores directories and what kind of fragmentation can happen there. Or wait for some future feature that would do the housekeeping. ;)
Re: Dying jail
Eugene Grosbein wrote:

> Hi!
>
> Recently I've upgraded one of my servers running 9.3-STABLE with a
> jail containing a 4.11-STABLE system. The host was source-upgraded up
> to 10.3-STABLE first and next to 11.0-STABLE, and the jail
> configuration migrated to /etc/jail.conf. The jail was kept intact.
>
> "service jail start" started the jail successfully, but "service jail
> restart" fails due to the jail being stuck in the "dying" state for a
> long time: "jls" shows no running jails and "jls -d" shows the dying
> jail.

Same issue here. During the upgrade to 10 I wrote a proper jail.conf, and, as this is now a much more transparent handling, I also began to start+stop my jails individually w/o reboot.

I found the same issue: often jails do not want to fully terminate, but stay in the "dying" state - sometimes for a minute or so, but sometimes very long (indefinitely). It seems this is not related to remaining processes or open files (there are none), but to network connections/sockets which are still present.

Probably these connections can be displayed with netstat, and probably netstat -x shows some decreasing counters associated with them - I have not yet found the opportunity to figure out what they exactly mean, but anyway it seems there may be long times involved (hours? forever?), unless one finds the proper connection and terminates both ends.

There seems to be no other way to deliberately "kill" such connections and thereby terminate the jail, so the proposal to let it have a new number might be the only feasible approach. (I don't like it; I got used to the numbers of my jails.)
10-STABLE zfs: strange memory stats
I observe a strange reading of the ZFS memory stats:

Mem: 298M Active, 207M Inact, 446M Wired, 10M Cache, 91M Buf, 29M Free
ARC: 339M Total, 8758K MFU, 43M MRU, 52K Anon, 35M Header, 40M Other
Swap: 2441M Total, 402M Used, 2040M Free, 16% Inuse

Usually I perceived the "Total" value as being approximately the sum of the other values. This is still the case after system start, but after a day the significant difference shown above appears. (40+35+43+9 = 127 << 339)

Also, it seems the ARC is reluctant to grow when free mem is available, nor does it shrink much while paging out.

The build is r309023M. Definitely the behaviour is different from what I tried before (r306589:306943M), but that one was probably unstable, and I see a bunch of ZFS-related commits in the interim. Also, I now see some counts on "l2_cksum_bad" which weren't there before.

BTW: is there some specific mailing list where ZFS changes are announced?

Machine is i386 with 1GB mem. Probably the hardware is somehow crappy, but at least the mem readings are difficult to explain by hardware weakness.
Config (in case it matters):

vm.kmem_size="576M"
vm.kmem_size_max="576M"
vfs.zfs.arc_max="320M"
vfs.zfs.arc_min="120M"
vfs.zfs.vdev.cache.size="5M"
vfs.zfs.prefetch_disable="0"
vfs.zfs.l2arc_norw="0"
vfs.zfs.l2arc_noprefetch="0"

kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch: 1016019
kstat.zfs.misc.arcstats.sync_wait_for_async: 1157
kstat.zfs.misc.arcstats.arc_meta_min: 62914560
kstat.zfs.misc.arcstats.arc_meta_max: 242711832
kstat.zfs.misc.arcstats.arc_meta_limit: 83886080
kstat.zfs.misc.arcstats.arc_meta_used: 133996612
kstat.zfs.misc.arcstats.memory_throttle_count: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 272242
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 489828
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 3372460809216
kstat.zfs.misc.arcstats.l2_write_pios: 14313
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 122496
kstat.zfs.misc.arcstats.l2_write_full: 177
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 4673385
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 925
kstat.zfs.misc.arcstats.l2_write_in_l2: 93122523
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 196362282
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 57198
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 20575
kstat.zfs.misc.arcstats.l2_padding_needed: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 33567112
kstat.zfs.misc.arcstats.l2_asize: 4040757248
kstat.zfs.misc.arcstats.l2_size: 4472570880
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 61
kstat.zfs.misc.arcstats.l2_abort_lowmem: 15
kstat.zfs.misc.arcstats.l2_free_on_write: 26703
kstat.zfs.misc.arcstats.l2_evict_l1cached: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_writes_lock_retry: 173
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_done: 14313
kstat.zfs.misc.arcstats.l2_writes_sent: 14313
kstat.zfs.misc.arcstats.l2_write_bytes: 6030606336
kstat.zfs.misc.arcstats.l2_read_bytes: 11140009984
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_feeds: 122496
kstat.zfs.misc.arcstats.l2_misses: 4370503
kstat.zfs.misc.arcstats.l2_hits: 2932017
kstat.zfs.misc.arcstats.mfu_ghost_evictable_metadata: 46062080
kstat.zfs.misc.arcstats.mfu_ghost_evictable_data: 1047040
kstat.zfs.misc.arcstats.mfu_ghost_size: 47109120
kstat.zfs.misc.arcstats.mfu_evictable_metadata: 0
kstat.zfs.misc.arcstats.mfu_evictable_data: 114688
kstat.zfs.misc.arcstats.mfu_size: 9073664
kstat.zfs.misc.arcstats.mru_ghost_evictable_metadata: 178836480
kstat.zfs.misc.arcstats.mru_ghost_evictable_data: 86231040
kstat.zfs.misc.arcstats.mru_ghost_size: 265067520
kstat.zfs.misc.arcstats.mru_evictable_metadata: 5632
kstat.zfs.misc.arcstats.mru_evictable_data: 1155072
kstat.zfs.misc.arcstats.mru_size: 49945088
kstat.zfs.misc.arcstats.anon_evictable_metadata: 0
kstat.zfs.misc.arcstats.anon_evictable_data: 0
kstat.zfs.misc.arcstats.anon_size: 53248
kstat.zfs.misc.arcstats.other_size: 44759120
kstat.zfs.misc.arcstats.metadata_size: 50840064
kstat.zfs.misc.arcstats.data_size: 231464448
kstat.zfs.misc.arcstats.hdr_size: 4830316
kstat.zfs.misc.arcstats.overhead_size: 41351168
kstat.zfs.misc.arcstats.uncompressed_size: 52131328
kstat.zfs.misc.arcstats.compressed_size: 17729024
kstat.zfs.misc.arcstats.size: 365461060
kstat.zfs.misc.arcstats.c_max: 335544320
kstat.zfs.misc.arcstats.c_min: 125829120
kstat.zfs.misc.arcstats.c: 315017029
kstat.zfs.misc.arcstats.p: 145334923
kstat.zfs.misc.arcstats.hash_chain_max: 17
kstat.zfs.misc.arcstats.hash_chains: 119135
kstat.zfs.misc.arcstats.hash_collisions: 6453863
kstat.zfs.misc.arcstats.hash_elements_max: 538227
kstat.zfs.misc.arcstats.hash_elements: 525460
kstat.zfs.misc.arcstats.evict_l2_skip: 4277
kstat.zfs.misc.arcstats.evict_l2_ineligible: 7410790400
kstat.zfs.misc.arcstats.evict_l2_eligible: 14946466816
kstat.zfs.misc.arcstats.evict_l2_cached: 26608123904
kstat.
Rel.10.3 zfs GEOM removal and memory leak
Question: how to get ZFS l2arc working on FBSD 10.3 (RELENG or STABLE)?

Problem using 10.3 RELENG:

When ZFS is called the first time after boot, it will delete all device nodes of the drive carrying the l2arc. ZFS itself will access its slices by a "diskid/" string, but all other access is impossible - especially, a swapspace on the same drive (NOT under ZFS) will fail to activate:

> NAME                            STATE  READ WRITE CKSUM
> gr                              ONLINE    0     0     0
>   raidz1-0                      ONLINE    0     0     0
>     da0s2                       ONLINE    0     0     0
>     da1s2                       ONLINE    0     0     0
>     da2s2                       ONLINE    0     0     0
> cache
>   diskid/DISK-162020405512s1e   ONLINE    0     0     0

Here "diskid/DISK-162020405512s1e" equals ada3s1e, and trying to open a swapspace on ada3s1b now fails, because that device is no longer present in /dev:

> root@edge:~ # gpart show ada3
> gpart: No such geom: ada3.

If we now remove the l2arc via "zpool remove gr diskid/DISK-162020405512s1e", the device nodes magically reappear, and we can activate the swapspace. Afterwards we can add the l2arc again, and it will be shown correctly as "ada3s1e" - but at the next boot the problem appears again.

This problem does not exist in 10.3 STABLE, but instead there is:

Problem using 10.3 STABLE:

Here seems to be a memory leak: the ARC grows above its limits, while the space used is not accounted for in any of [MFU MRU Anon Header Other L2Hdr]. After some time, the MFU+MRU shrink to the bare minimum, and the system is all busy with arc_reclaim. The behaviour seems to be triggered by writing to the l2arc.(*)

Any advice on how to proceed (or which supported version might work better)?

(*) Addendum: I tried to understand the phenomenon, and found this in the arcstats:

(metadata_size + data_size) + hdr_size + l2_hdr_size + other_size = size

and

metadata_size + data_size = mfu_size + mru_size + anon_size + X

The X is the memory leak; it never shrinks, does not disappear when all l2arc devices are removed, and while the l2arc is written to, it continually (but not linearly) grows until the system is quite stuck and l2arc writing ceases.
Further investigation shows that the growth of X is synchronous with the growth of the kstat.zfs.misc.arcstats.l2_free_on_write figure.
11.1-BETA1: lsof build failure
FYI, please check if reproducible and/or an issue:

Installed this from SVN & local build:
11.1-BETA1 FreeBSD 11.1-BETA1 #0 r319858:319867M ... amd64

Then tried to update lsof-4.90.f,8 and got this error:

cc -pipe -DNEEDS_BOOL_TYPEDEF -DHASTASKS -DHAS_PAUSE_SBT -DHASEFFNLINK=i_effnlink -DHASF_VNODE -DHAS_FILEDESCENT -DHAS_TMPFS -DHASWCTYPE_H -DHASSBSTATE -DHAS_KVM_VNODE -DHAS_UFS1_2 -DHAS_VM_MEMATTR_T -DHAS_CDEV2PRIV -DHAS_NO_SI_UDEV -DHAS_SYS_SX_H -DHASFUSEFS -DHAS_ZFS -DHAS_V_LOCKF -DHAS_LOCKF_ENTRY -DHAS_NO_6PORT -DHAS_NO_6PPCB -DNEEDS_BOOLEAN_T -DHAS_SB_CCC -DHAS_FDESCENTTBL -DFREEBSDV=11000 -DHASFDESCFS=2 -DHASPSEUDOFS -DHASNULLFS -DHASIPv6 -DHASUTMPX -DHAS_STRFTIME -DLSOF_VSTR="11.1-BETA1" -I/usr/src/sys -O2 -c dvch.c -o dvch.o
--- dnode.o ---
dnode.c:906:13: error: no member named 'i_dev' in 'struct inode'
        if (i->i_dev
            ~  ^
dnode.c:916:27: error: no member named 'i_dev' in 'struct inode'
        dev = Dev2Udev((KA_T)i->i_dev);
                             ~  ^
2 errors generated.
*** [dnode.o] Error code 1
Re: 11.1-BETA1: lsof build failure
Larry Rosenman wrote:
> Current lsof is 4.90M.

Ack, that does it.

> Larry
> Sysutils/lsof maintainer
>
> On 6/14/17, 8:13 AM, "Peter" wrote:
>> FYI, please check if reproducible and/or an issue: [...]
11.1-RELEASE: new line containing garbage added to "top"
After upgrading to 11.1-RELEASE, a new line appears in the output of "top" which contains rubbish:

> last pid: 10789; load averages: 5.75, 5.19, 3.89up 0+00:34:46 03:23:51
> 1030 processes:9 running, 1004 sleeping, 17 waiting
> CPU 0: 16.0% user, 0.0% nice, 78.7% system, 4.9% interrupt, 0.4% idle
> CPU 1: 8.0% user, 0.0% nice, 82.5% system, 9.1% interrupt, 0.4% idle
> Mem: 218M Active, 34M Inact, 105M Laundry, 600M Wired, 18M Buf, 34M Free
> ARC: 324M Total, 54M MFU, 129M MRU, 2970K Anon, 13M Header, 125M Other
> 136¿176M Compress185 194M Uncompressed361.94:1 Ratio
> Swap: 2441M Total, 277M Used, 2164M Free, 11% Inuse
>
> PID USERNAME PRI NICE SIZERES STATE C TIMEWCPU COMMAND
..

That looks funny. But I don't like it.

(Actually it looks like a wrong TERMCAP, but wasn't that ~20 years ago? checking...)
11.1-RELEASE: huge amount of l2_cksum_bad
After upgrading 11.0-RELEASE-p10 to 11.1-RELEASE, I suddenly see a huge amount of kstat.zfs.misc.arcstats.l2_cksum_bad (nearly 2% of kstat.zfs.misc.arcstats.l2_hits).

I have set

> vfs.zfs.compressed_arc_enabled="0"

in loader.conf. When removing this, the errors are gone. It seems that option is not working well in 11.1-RELEASE.
Re: 11.1-RELEASE: new line containing garbage added to "top"
Glen Barber wrote:
> On Fri, Jul 28, 2017 at 03:24:50PM +0200, Peter wrote:
>> After upgrading to 11.1-RELEASE, a new line appears in the output of
>> "top" which contains rubbish:
>> [...]
>
> Do you mean the blank line between the 'Swap:' line and 'PID'? If so,
> that has been there as long as I can recall. It is used for things
> like killing processes, etc. (Hit 'k' when using top(1), and you will
> see a prompt for a PID to kill.)
>
> Glen

No, I mean the line *above* the 'Swap:' line, which is new and *should* show compressed ARC stats. (What we actually see there is the printing of a random memory location - working on it...)
Re: 11.1-RELEASE: new line containing garbage added to "top"
Glen Barber wrote:
> On Fri, Jul 28, 2017 at 03:24:50PM +0200, Peter wrote:
>> [...]
>
> It appears to be fixed in 11-STABLE (r321419).
>
> Glen

I don't think so. At least there is nothing in the commitlog. r318449 is the last commit in 11-STABLE for the respective file, and that's before the 11.1-RELEASE branch.

The error is in the screen formatting in "top", and that error was already present back in 1997 (and probably earlier); it is also present in HEAD. What "top" does is basically this:

> char *string = some_buffer_to_print;
> printf("%.5s", &string[-4]);

A negative index on a string usually yields a nullified area. (Except if otherwise *eg*) That's why we usually don't see the matter - nullbytes are invisible on screen.
Fix is very simple:

Index: contrib/top/display.c
===================================================================
--- display.c   (revision 321434)
+++ display.c   (working copy)
@@ -1310,7 +1310,7 @@
            cursor_on_line = Yes;
            putchar(ch);
            *old = ch;
-           lastcol = 1;
+           lastcol++;
        }
        old++;

Then, since I was at it, I decided to beautify the proc display as well, as I usually see >1000 procs:

--- display.c   (revision 321434)
+++ display.c   (working copy)
@@ -100,7 +100,7 @@
 int y_loadave = 0;
 int x_procstate = 0;
 int y_procstate = 1;
-int x_brkdn = 15;
+int x_brkdn = 16;
 int y_brkdn = 1;
 int x_mem = 5;
 int y_mem = 3;
@@ -373,9 +373,9 @@
     printf("%d processes:", total);
     ltotal = total;

-    /* put out enough spaces to get to column 15 */
+    /* put out enough spaces to get to column 16 */
     i = digits(total);
-    while (i++ < 4)
+    while (i++ < 5)
     {
        putchar(' ');
     }

Then, concerning the complaint about the empty line (bug #220996), I couldn't really reproduce this. But it seems that specifically this issue was already fixed in HEAD by this one here:
https://reviews.freebsd.org/D11693

Now, can anybody make the above snippets appear in HEAD and 11-STABLE?
Re: 11.1-RELEASE: new line containing garbage added to "top"
Glen Barber wrote:
> On Fri, Jul 28, 2017 at 07:04:51PM +0200, Peter wrote:
>> Glen Barber wrote:
>>> It appears to be fixed in 11-STABLE (r321419).
>>
>> I don't think so. At least there is nothing in the commitlog. r318449
>> is the last commit in 11-STABLE for the respective file, and that's
>> before the 11.1-RELEASE branch.
>
> See r321419.

Yes, that's the issue with the empty line when ZFS is *not* in use, which I mentioned below (bug #220996). For that, a fix is committed.

>> The error is in the screen formatting in "top", and that error was
>> already present back in 1997 (and probably earlier); it is also
>> present in HEAD. [...]
Fix is very simple: Index: contrib/top/display.c === --- display.c (revision 321434) +++ display.c (working copy) @@ -1310,7 +1310,7 @@ cursor_on_line = Yes; putchar(ch); *old = ch; - lastcol = 1; + lastcol++; } old++; - Then, since I was at it, I decided to beautify the proc display as well, as I usually see >1000 procs: --- display.c (revision 321434) +++ display.c (working copy) @@ -100,7 +100,7 @@ int y_loadave = 0; int x_procstate = 0; int y_procstate = 1; -int x_brkdn = 15; +int x_brkdn = 16; int y_brkdn = 1; int x_mem = 5; int y_mem = 3; @@ -373,9 +373,9 @@ printf("%d processes:", total); ltotal = total; -/* put out enough spaces to get to column 15 */ +/* put out enough spaces to get to column 16 */ i = digits(total); -while (i++ < 4) +while (i++ < 5) { putchar(' '); } Then, concerning the complaint about the empty line (bug #220996), I couldn't really reproduce this. But it seems that specifically this issue was already fixed in HEAD by this one here: https://reviews.freebsd.org/D11693 Now, can anybody make the above snippets appear in HEAD and 11-STABLE? I've CC'd allanjude, who has touched some of these in the past. Thanks a lot! ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
11.1-RELEASE: panic! acl_from_aces: a_type is 0x4000
This is mostly for the search engines, so others running into it may find it easier to solve. While updating some ports via "portupgrade", I got this panic: Panic String: acl_from_aces: a_type is 0x4000 The phenomenon was reproducible; it appeared while creating a backup package from the "glib" port. I checked readability of all concerned files and did a scrub on the pool, but found no errors! As I was busy with other issues, I then neglected the matter and simply deleted and reinstalled that port. A couple of days later, working on a different installation, I got the exact same panic at the exact same point, while updating the "glib" port. This time I looked closer into the matter. According to "truss", the panic appears while "pkg" calls __acl_get_link() on a specific file. That file is readable. The directory tree can be searched. But it is not possible to do "ls -l" on the directory -> panic! It is possible to send+recv the filesystem: the error gets transported to the new filesystem! (From the ZFS view it seems to be legal payload; only from the FreeBSD file-handling view is it reason for panic.) Finally, the file can be copied, unlinked, and recreated. I did a thorough search and found a dozen other files on the system with the same issue.

REMEDY:
---
It seems that such flaws can lurk undetected on a system for an indefinite time. The only way to find them seems to be to read all inode data, via something like

# find -x `mount -t zfs | awk '{print $3}'` -type d -exec ls -la {} \;

ROOT CAUSE:
---
Not fully clear. It may be related to hardware (memory) flaws.
Re: a strange and terrible saga of the cursed iSCSI ZFS SAN
Eugene M. Zheganin wrote: Hi, On 05.08.2017 22:08, Eugene M. Zheganin wrote:

  pool: userdata
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: none requested
config:
        NAME               STATE  READ WRITE CKSUM
        userdata           ONLINE    0     0  216K
          mirror-0         ONLINE    0     0  432K
            gpt/userdata0  ONLINE    0     0  432K
            gpt/userdata1  ONLINE    0     0  432K

That would be funny, if it weren't that sad, but while writing this message the pool started to look like below (I just asked zpool status twice in a row, comparing to what it was):

[root@san1:~]# zpool status userdata
  pool: userdata
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: none requested
config:
        NAME               STATE  READ WRITE CKSUM
        userdata           ONLINE    0     0  728K
          mirror-0         ONLINE    0     0 1,42M
            gpt/userdata0  ONLINE    0     0 1,42M
            gpt/userdata1  ONLINE    0     0 1,42M
errors: 4 data errors, use '-v' for a list

[root@san1:~]# zpool status userdata
  pool: userdata
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: none requested
config:
        NAME               STATE  READ WRITE CKSUM
        userdata           ONLINE    0     0  730K
          mirror-0         ONLINE    0     0 1,43M
            gpt/userdata0  ONLINE    0     0 1,43M
            gpt/userdata1  ONLINE    0     0 1,43M
errors: 4 data errors, use '-v' for a list

So, you see, the error rate is like the speed of light. And I'm not sure the data access rate is that enormous; it looks like the errors are increasing on their own. So maybe someone has an idea of what this really means.
It is remarkable that You always have the same error count on both sides of the mirror. From what I have seen, such a picture appears when an unrecoverable error (i.e. one that is on both sides of the mirror) is read again and again. File number 0x1 is probably some important metadata, and since it is not readable it cannot be put into the ARC, so the read is retried over and over. An error that appears on only one side shows up only once, because it is then auto-corrected; in that case the figures have some erratic deviations. Therefore it is worthwhile to remove the erroneous data soon, because as long as it exists one does not get anything useful from the figures (like how many errors are actually appearing anew).
11.1-RELEASE: magic hosed, file recognition fails
Just found that my scripts that would detect image types by means of the "file" command do not work anymore in RELEASE-11. :( What's happening in R11.1 is this:

$ scanimage > /tmp/SCAN
$ file /tmp/SCAN
/tmp/SCAN: data

While on R10 it looked this way, which appears slightly more useful:

$ scanimage > /tmp/SCAN
$ file /tmp/SCAN
/tmp/SCAN: Netpbm image data, size = 2480 x 3507, rawbits, greymap

Further investigation shows the problem may have appeared with this update:

>r309847 | delphij | 2016-12-11 08:33:02 +0100 (Sun, 11 Dec 2016) | 2 lines
>
>MFC r308420: MFV r308392: file 5.29.

And that is a contrib; it seems the original comes from fishy penguins. So no proper repo, and doubtful if anybody might be in charge, but instead some colorful pictures like this one: https://fossies.org/diffs/file/5.28_vs_5.29/magic/Magdir/images-diff.html
---
Looking closer - this is my file header:

pmc@disp:604:1/tmp$ hd SCAN |more
0000  50 35 0a 23 20 53 41 4e 45 20 64 61 74 61 20 66  |P5.# SANE data f|
0010  6f 6c 6c 6f 77 73 0a 32 34 38 30 20 33 35 30 37  |ollows.2480 3507|
0020  0a 32 35 35 0a 5f 58 56 4b 53 49 4b 52 54 50 51  |.255._XVKSIKRTPQ|
0030  4e 4c 52 5b 56 55 4c 47 4e 4f 4e 4d 53 54 53 4d  |NLR[VULGNONMSTSM|
0040  53 49 50 52 4c 51 4f 53 56 55 53 4d 55 4e 4e 4c  |SIPRLQOSVUSMUNNL|
0050  55 49 4d 50 52 4c 4e 50 4d 56 4e 51 52 4e 4e 50  |UIMPRLNPMVNQRNNP|

And this is the ruleset in the magic file:

# PBMPLUS images
# The next byte following the magic is always whitespace.
# strength is changed to try these patterns before "x86 boot sector"
0 name netpbm
>3 regex/s =[0-9]{1,50}\ [0-9]{1,50} Netpbm image data
>>&0 regex =[0-9]{1,50}\b, size = %s x
>>>&0 regex =[0-9]{1,50}\b %s
0 string P5
>0 regex/4 P5\\s
>>0 use netpbm
>>>0 string x \b, rawbits, pixmap
!:strength + 45
!:mime image/x-portable-pixmap

The failing line is the one with the "regex/4" command, and I don't see why there is a *double* \ - but a single one doesn't work either. Using \n instead would work.
And what also works is this one:

>0 regex/4 P5[[:space:]]

Figuring out the root cause would mean looking into that libmagic, and maybe there is a misunderstanding between the design of that lib and the Linux guys maintaining the magic file?
errors from port make (analyzed: bug in pkg)
For a long time already, I get these strange messages whenever building a port: pkg: Bad argument on pkg_set 2143284626 Today I looked into it, and found it is easily reproducible:

# pkg audit whatever
pkg: Bad argument on pkg_set 2143284618
0 problem(s) in the installed packages found.
#

Looking closer, I found this offending call in src/audit.c:exec_audit(): pkg_set(pkg, PKG_UNIQUEID, name); This goes into libpkg/pkg.c:pkg_vset(), but nothing there handles a PKG_UNIQUEID attribute, so the parameter does not get fetched from the va_list. It does not do any harm, but it is ugly. Please fix.
kern.sched.quantum: Creepy, sadistic scheduler
Occasionally I noticed that the system would not quickly process the tasks I need done, but instead prefer other, long-running tasks. I figured it must be related to the scheduler, and decided it hates me. A closer look shows the behaviour as follows (single CPU): Let's run an I/O-active task, e.g. a postgres VACUUM that would continuously read from big files (while doing compute as well [1]):

>pool      alloc  free   read write  read write
>cache     -      -      -    -      -    -
>  ada1s4  7.08G  10.9G  1.58K    0  12.9M    0

Now start an endless loop: # while true; do :; done And the effect is:

>pool      alloc  free   read write  read write
>cache     -      -      -    -      -    -
>  ada1s4  7.08G  10.9G      9    0  76.8K    0

The VACUUM gets almost stuck! This figures with WCPU in "top":

>  PID USERNAME PRI NICE  SIZE    RES   STATE  TIME  WCPU COMMAND
>85583 root      99    0  7044K  1944K  RUN    1:06 92.21% bash
>53005 pgsql     52    0  620M  91856K  RUN    5:47  0.50% postgres

Hacking on kern.sched.quantum makes it quite a bit better:

# sysctl kern.sched.quantum=1
kern.sched.quantum: 94488 -> 7874

>pool      alloc  free   read write  read write
>cache     -      -      -    -      -    -
>  ada1s4  7.08G  10.9G    395    0  3.12M    0

>  PID USERNAME PRI NICE  SIZE    RES   STATE  TIME  WCPU COMMAND
>85583 root      94    0  7044K  1944K  RUN    4:13 70.80% bash
>53005 pgsql     52    0  276M  91856K  RUN    5:52 11.83% postgres

Now, as usual, the "root-cause" questions arise: What exactly does this "quantum" do? Is this solution a workaround, i.e. is actually something else wrong, and does it have a tradeoff in other situations? Or otherwise, why is such a default value chosen, which appears to be ill-conceived? The docs for the quantum parameter are a bit unsatisfying - they say it's the max number of ticks a process gets - and what happens when they're exhausted? If by default the endless loop is actually allowed to continue running for 94k ticks (or 94ms, more likely) uninterrupted, then that explains the perceived behaviour - but that's certainly not what a scheduler should do when other procs are ready to run. 11.1-RELEASE-p7, kern.hz=200.
Switching tickless mode on or off does not influence the matter. Starting the endless loop with "nice" does not influence the matter. [1] A pure-I/O job without compute load, like "dd", does not show this behaviour. Also, when other tasks are running, the unjust behaviour is not so strongly pronounced.
Re: kern.sched.quantum: Creepy, sadistic scheduler
George Mitchell wrote: On 04/04/18 06:39, Alban Hertroys wrote: [...] That said, SCHED_ULE (the default scheduler for quite a while now) was designed with multi-CPU configurations in mind and there are claims that SCHED_4BSD works better for single-CPU configurations. You may give that a try, if you're not already on SCHED_4BSD. [...] A small, disgruntled community of FreeBSD users who have never seen proof that SCHED_ULE is better than SCHED_4BSD in any environment continue to regularly recompile with SCHED_4BSD. I dread the day when that becomes impossible, but at least it isn't here yet. -- George Yes *laugh*, I found a very lengthy and mind-boggling discussion from back in 2011. And I found that You made this statement somewhere there: // With nCPU compute-bound processes running, with SCHED_ULE, any other // process that is interactive (which to me means frequently waiting for // I/O) gets ABYSMAL performance -- over an order of magnitude worse // than it gets with SCHED_4BSD under the same conditions. -- https://lists.freebsd.org/pipermail/freebsd-stable/2011-December/064984.html And this describes quite exactly what I perceive. Now, I would like to ask: what has been done about this issue? P.
Re: Try setting kern.sched.preempt_thresh != 0
Stefan Esser wrote: I'm guessing that the problem is caused by kern.sched.preempt_thresh=0, which prevents preemption of low priority processes by interactive or I/O bound processes. For a quick test try: # sysctl kern.sched.preempt_thresh=1 Hi Stefan, thank You, that's an interesting knob! Only it is actually the other way round: it is not set to 0. My settings (as default) are:

kern.sched.steal_thresh: 2
kern.sched.steal_idle: 1
kern.sched.balance_interval: 127
kern.sched.balance: 1
kern.sched.affinity: 1
kern.sched.idlespinthresh: 157
kern.sched.idlespins: 1
kern.sched.static_boost: 152
kern.sched.preempt_thresh: 80
kern.sched.interact: 30
kern.sched.slice: 12
kern.sched.quantum: 94488
kern.sched.name: ULE
kern.sched.preemption: 1
kern.sched.cpusetsize: 4

But then, if I change kern.sched.preempt_thresh to 1 *OR* 0, things behave properly! Precisely, changing from 8 down to 7 changes things completely:

>pool      alloc  free   read write  read write
>cache     -      -      -    -      -    -
>  ada1s4  7.08G  10.9G    927    0  7.32M    0

>  PID USERNAME PRI NICE  SIZE    RES   STATE  TIME  WCPU COMMAND
> 1900 pgsql     82    0  618M  17532K  RUN    0:53 34.90% postgres
> 1911 admin     81    0  7044K  2824K  RUN    6:07 28.34% bash

(Notice the PRI values, which also look different now.) rgds, P.
Re: kern.sched.quantum: Creepy, sadistic scheduler
Hi Alban! Alban Hertroys wrote: Occasionally I noticed that the system would not quickly process the tasks I need done, but instead prefer other, long-running tasks. I figured it must be related to the scheduler, and decided it hates me. If it hated you, it would behave much worse. That's encouraging :) But I would say, running a job 100 times slower than expected is quite an amount of hate for my taste. A closer look shows the behaviour as follows (single CPU): A single CPU? That's becoming rare! Is that a VM? Old hardware? Something really specific? I don't plug in another CPU because there is no need to. Yes, it's old hardware: CPU: Intel Pentium III (945.02-MHz 686-class CPU) ACPI APIC Table: If I had bought new hardware, this one would now rot in Africa, and I would have new hardware idling along that is Spectre/Meltdown-affected nevertheless. Let's run an I/O-active task, e.g. a postgres VACUUM that would And you're running a multi-process database server on it no less. That is going to hurt, I'm running a lot more than only that on it. But it's all private use, idling most of the time. no matter how well the scheduler works. Maybe. But this post is not about my personal expectations on over-all performance - it is about a specific behaviour that is not how a scheduler is expected to behave - no matter if we're on a PDP-11 or on a KabyLake. Now, as usual, the "root-cause" questions arise: What exactly does this "quantum" do? Is this solution a workaround, i.e. is actually something else wrong, and does it have a tradeoff in other situations? Or otherwise, why is such a default value chosen, which appears to be ill-conceived? The docs for the quantum parameter are a bit unsatisfying - they say it's the max number of ticks a process gets - and what happens when they're exhausted?
If by default the endless loop is actually allowed to continue running for 94k ticks (or 94ms, more likely) uninterrupted, then that explains the perceived behaviour - but that's certainly not what a scheduler should do when other procs are ready to run. I can answer this from the operating systems course I followed recently. This does not apply to FreeBSD specifically; it is general job scheduling theory. I still need to read up on SCHED_ULE to see how the details were implemented there. Or are you using the older SCHED_4BSD? I'm using the default scheduler, which is ULE. I would not go non-default without reason. (But it seems a reason is just appearing now.) Now, that would cause a much worse situation in your example case. The endless loop would keep running once it gets the CPU and would never release it. No other process would ever get a turn again. You wouldn't even be able to get into such a system in that state using remote ssh. That is why the scheduler has this "quantum", which limits the maximum time the CPU will be assigned to a specific job. Once the quantum has expired (with the job unfinished), the scheduler removes the job from the CPU, puts it back on the ready queue and assigns the next job from that queue to the CPU. That's why you seem to get better performance with a smaller value for the quantum; the endless loop gets forcibly interrupted more often. Good description. Only my (old-fashioned) understanding was that this is the purpose of the HZ value: to give control back to the kernel, so that a new decision can be made. So, I would not have been surprised to see 200 I/Os for postgres (kern.hz=200), but what I see is 9 I/Os (which indeed figures to a "quantum" of 94ms). But then, we were able to do all this nicely on single-CPU machines for almost four decades. It does not make sense to me if now we state that we cannot do it anymore because single-CPU is uncommon today.
(Yes indeed, we also cannot fly to the moon anymore, because today nobody seems to recall how that stuff was built. *headbangwall*) This changing of the active job, however, involves a context switch for the CPU. Memory, registers, file handles, etc. that were required by the previous job need to be put aside and replaced by any such resources related to the new job to be run. That uses up time and does nothing to progress the jobs that are waiting for the CPU. Hence, you don't want the quantum to be too small either, or you'll end up spending significant time switching contexts. Yepp. My understanding was that I can influence this behaviour via the HZ value, so as to trade off responsiveness against performance. Obviously that was wrong. From Your writing, it seems the "quantum" is indeed the correct place to tune this. (But I will still have to ponder a while about the knob mentioned by Stefan, concerning preemption, which seems to magically resolve the issue.) That said, SCHED_ULE (the default scheduler for quite a while now) was designed with multi-CPU configurations in mind and there are claims that SCHED_4BSD works better for single-CPU configurations. You may gi
Re: kern.sched.quantum: Creepy, sadistic scheduler
Andriy Gapon wrote: On 04/04/2018 03:52, Peter wrote: Let's run an I/O-active task, e.g. a postgres VACUUM that would continuously read from big files (while doing compute as well [1]): Not everyone has a postgres server and a suitable database. Could you please devise a test scenario that demonstrates the problem and that anyone could run? Andriy, and maybe nobody anymore has such an old system that is CPU-bound instead of IO-bound. I'd rather think about reproducing it on my IvyBridge. I know for sure that it is *not* specifically dependent on postgres. What I posted was the case of an endless-loop piglet starving a postgres VACUUM - and there we see a very pronounced effect of almost factor 100. When I first clearly discovered it (after a long time of a gut feeling that something behaves strangely), it was postgres pg_dump (which does compression, i.e. is CPU-bound) as the piglet starving a bacula-fd backup that would scan the filesystem. So, there is a general rule: we have one process that is a CPU hog, and another process that does periodic I/O (but also *some* compute), and -important!- nothing else. If we understand the logic of the scheduler, that information should already suffice for some logical verification *eg* - but I will see if I can get it reproduced on the IvyBridge machine and/or get a testcase together. May take a while. P.
Re: kern.sched.quantum: Creepy, sadistic scheduler
Andriy Gapon wrote: Not everyone has a postgres server and a suitable database. Could you please devise a test scenario that demonstrates the problem and that anyone could run? Alright, simple things first: I can reproduce the effect without postgres, with regular commands. I run this on my database file: # lz4 2058067.1 /dev/null And have this as throughput:

pool      alloc  free   read write  read write
cache     -      -      -    -      -    -
  ada1s4  7.08G  10.9G    889    0  7.07M 42.3K

  PID USERNAME PRI NICE  SIZE    RES   STATE  TIME  WCPU COMMAND
51298 root      87    0 16184K  7912K  RUN    1:00 51.60% lz4

I start the piglet: $ while true; do :; done And, same effect:

pool      alloc  free   read write  read write
cache     -      -      -    -      -    -
  ada1s4  7.08G  10.9G     10    0  82.0K    0

  PID USERNAME PRI NICE  SIZE    RES   STATE  TIME  WCPU COMMAND
 1911 admin     98    0  7044K  2860K  RUN   65:48 89.22% bash
51298 root      52    0 16184K  7880K  RUN    0:05  0.59% lz4

It does *not* happen with a plain "cat" instead of "lz4". What may or may not have an influence on it: the respective filesystem is block=8k, and is 100% resident in the l2arc. What is also interesting: I started trying this with "tar" (no effect, behaves properly), then with "tar --lz4". In the latter case "tar" starts "lz4" as a sub-process, so we have three processes in play - and in that case the effect happens, but to a lesser extent: about 75 I/Os per second. So, it seems quite clear that this has something to do with the logic inside the scheduler.
Re: kern.sched.quantum: Creepy, sadistic scheduler
Eugene Grosbein wrote: I see no reason to use SCHED_ULE for such single-core systems; use SCHED_4BSD. Nitpicking: it is not a single-core system, it's a dual that for now is equipped with only one chip; the other is on the shelf. But seriously, I am currently working myself through the design papers for SCHED_ULE and the SMP stuff, and I tend to agree with You and George, in that I do not really need these features. Nevertheless, I think the system should have proper behaviour *as default*, or otherwise there should be a hint in the docs what to do about it. That's the reason why I raise this issue - if the matter can be fixed, that's great, but if we come to the conclusion that small/single-core/CPU-bound/whatever systems are better off with SCHED_4BSD, then that's perfectly fine as well. Or maybe, that those systems should disable preemption? I currently don't know, but I hope we can figure this out, as the problem is clearly visible. P.
Re: kern.sched.quantum: Creepy, sadistic scheduler
Julian Elischer wrote: for a single CPU you really should compile a kernel with SMP turned off and 4BSD scheduler. ULE is just trying too hard to do stuff you don't need. Julian, if we agree on this, I am fine. (This implies that SCHED_4BSD will *not* be retired for an indefinite time!) I tested yesterday, and SCHED_4BSD doesn't show the annoying behaviour. SMP seems to be no problem (and I need that), but PREEMPTION is definitely related to the problem (see my other message sent now). P.
more data: SCHED_ULE+PREEMPTION is the problem (was: kern.sched.quantum: Creepy, sadistic scheduler)
Hi all, in the meantime I did some tests and found the following:

A. The Problem:
---
On a single CPU, there are -exactly- two processes runnable: One is doing mostly compute without I/O - this can be a compressing job or similar; in the tests I used simply an endless loop. Let's call this the "piglet". The other is doing frequent file reads, but also some compute in between - this can be a backup job traversing the FS, or a postgres VACUUM, or some fast compressor like lz4. Let's call this the "worker". It then happens that the piglet gets 99% CPU, while the worker gets only 0.5% CPU and makes nearly no progress at all. Investigation shows that the worker makes precisely one I/O per timeslice (timeslice as defined in kern.sched.quantum) - or two I/Os on a mirrored ZFS.

B. Findings:
1. Filesystem
I could never reproduce this when reading from plain UFS, only when reading from ZFS (direct or via l2arc).
2. Machine
The problem originally appeared on a pentium3@1GHz. I was able to reproduce it on an i5-3570T, given the following measures: * config in BIOS to use only one CPU * reduce speed: "dev.cpu.0.freq=200" I did see the problem also when running at full speed (which means it happens there also), but could not reproduce it well.
3. kern.sched.preempt_thresh
I could make the problem disappear by changing kern.sched.preempt_thresh from the default 80 to either 11 (i5-3570T) or 7 (p3) or smaller. This seems to correspond to the disk interrupt threads, which run at intr:12 (i5-3570T) or intr:8 (p3).
4. dynamic behaviour
Here the piglet is already running as PID=2119. Then we can watch the dynamic behaviour as follows (on i5-3570T@200MHz):
a. with kern.sched.preempt_thresh=80
$ lz4 DATABASE_TEST_FILE /dev/null & while true; do ps -o pid,pri,"%cpu",command -p 2119,$!
sleep 3
done
[1] 6073
 PID PRI %CPU COMMAND
6073  20  0.0 lz4 DATABASE_TEST_FILE /dev/null
2119 100 91.0 -bash (bash)
 PID PRI %CPU COMMAND
6073  76 15.0 lz4 DATABASE_TEST_FILE /dev/null
2119  95 74.5 -bash (bash)
 PID PRI %CPU COMMAND
6073  52 19.0 lz4 DATABASE_TEST_FILE /dev/null
2119  94 71.5 -bash (bash)
 PID PRI %CPU COMMAND
6073  52 16.0 lz4 DATABASE_TEST_FILE /dev/null
2119  95 76.5 -bash (bash)
 PID PRI %CPU COMMAND
6073  52 14.0 lz4 DATABASE_TEST_FILE /dev/null
2119  96 80.0 -bash (bash)
 PID PRI %CPU COMMAND
6073  52 12.5 lz4 DATABASE_TEST_FILE /dev/null
2119  96 82.5 -bash (bash)
 PID PRI %CPU COMMAND
6073  74 10.0 lz4 DATABASE_TEST_FILE /dev/null
2119  98 86.5 -bash (bash)
 PID PRI %CPU COMMAND
6073  52  8.0 lz4 DATABASE_TEST_FILE /dev/null
2119  98 89.0 -bash (bash)
 PID PRI %CPU COMMAND
6073  52  7.0 lz4 DATABASE_TEST_FILE /dev/null
2119  98 90.5 -bash (bash)
 PID PRI %CPU COMMAND
6073  52  6.5 lz4 DATABASE_TEST_FILE /dev/null
2119  99 91.5 -bash (bash)
b. with kern.sched.preempt_thresh=11
 PID PRI %CPU COMMAND
4920  21  0.0 lz4 DATABASE_TEST_FILE /dev/null
2119 101 93.5 -bash (bash)
 PID PRI %CPU COMMAND
4920  78 20.0 lz4 DATABASE_TEST_FILE /dev/null
2119  94 70.5 -bash (bash)
 PID PRI %CPU COMMAND
4920  82 34.5 lz4 DATABASE_TEST_FILE /dev/null
2119  88 54.0 -bash (bash)
 PID PRI %CPU COMMAND
4920  85 42.5 lz4 DATABASE_TEST_FILE /dev/null
2119  86 45.0 -bash (bash)
 PID PRI %CPU COMMAND
4920  85 43.5 lz4 DATABASE_TEST_FILE /dev/null
2119  86 44.5 -bash (bash)
 PID PRI %CPU COMMAND
4920  85 43.0 lz4 DATABASE_TEST_FILE /dev/null
2119  85 45.0 -bash (bash)
 PID PRI %CPU COMMAND
4920  85 43.0 lz4 DATABASE_TEST_FILE /dev/null
2119  85 45.5 -bash (bash)

From this we can see that in case b. both processes balance out nicely and meet at equal CPU shares, whereas in case a., after about 10 seconds (the first 3 records), they move to opposite ends of the scale and stay there. From this I might suppose that there is some kind of miscalculation or misadjustment of the task priorities happening. P.
Appendices - more data: SCHED_ULE+PREEMPTION is the problem
I forgot to attach the commands used to create the logs - they are ugly anyway: [1] dtrace -q -n '::sched_choose:return { @[((struct thread *)arg1)->td_proc->p_pid, stringof(((struct thread *)arg1)->td_proc->p_comm), timestamp] = count(); } tick-1s { exit(0); }' | sort -nk 3 | awk '$1 > 27 {$3 = ($3/100)*1.0/1000; printf "%6d %20s %3.3f\n", $1, $2, $3 }' [2] dtrace -q -n '::runq_choose_from:entry /arg1 == 0||arg1 == 32/ { @[arg1, timestamp] = count(); }' | sort -nk2
Re: more data: SCHED_ULE+PREEMPTION is the problem
Hi Stefan, I'm glad to see You're thinking along similar paths as I did. But let me first answer Your question straight away, and sort out the remainder afterwards. > I'd be interested in your results with preempt_thresh set to a value > of e.g. 190. There is no difference. Any value above 7 shows the problem identically. I think this value (or preemption as a whole) is not the actual cause of the problem; it just changes some conditions that make the problem visible. So, trying to adjust preempt_thresh in order to fix the problem seems to be a dead end. Stefan Esser wrote: The critical use of preempt_thresh is marked above. If it is 0, no preemption will occur. On a single processor system, this should allow the CPU bound thread to run for as long as its quantum lasts. I would like to contradict here. From what I understand, preemption is *not* the basis of task switching. AFAIK preemption is an additional feature that allows switching threads while they execute in kernel mode. While executing in user mode, a thread can be interrupted and switched at any time, and that is how the traditional time-sharing systems did it. Traditionally a thread would execute in kernel mode only during interrupts and syscalls, and those last no longer than a few ms, so for a long time that was not an issue. Only when we got the fast interfaces (10Gbps etc.) and got big monsters executing in kernel space (traffic shapers, ZFS, etc.) did that scheme become problematic, and preemption was invented. According to McKusick's book, the scheduler is two-fold: an outer logic runs a few times per second and calculates priorities, and an inner logic runs very often (at every interrupt?) and chooses the next runnable thread simply by priority. The meaning of the quantum is then: when it is used up, the thread is moved to the end of its queue, so that it may take a while until it runs again. This is for implementing round-robin behaviour within a single queue (= a single priority).
It should not prevent task switching as such. Let's have a look. sched_choose() seems to be the low-level scheduler function that decides which thread to run next. Let's create a log of its decisions.[1] With preempt_thresh >= 12 (kernel threads left out):

   PID              COMMAND TIMESTAMP
 18196                 bash 1192.549
 18196                 bash 1192.554
 18196                 bash 1192.559
 66683                  lz4 1192.560
 18196                 bash 1192.560
 18196                 bash 1192.562
 18196                 bash 1192.563
 18196                 bash 1192.564
 79496                 ntpd 1192.569
 18196                 bash 1192.569
 18196                 bash 1192.574
 18196                 bash 1192.579
 18196                 bash 1192.584
 18196                 bash 1192.588
 18196                 bash 1192.589
 18196                 bash 1192.594
 18196                 bash 1192.599
 18196                 bash 1192.604
 18196                 bash 1192.609
 18196                 bash 1192.613
 18196                 bash 1192.614
 18196                 bash 1192.619
 18196                 bash 1192.624
 18196                 bash 1192.629
 18196                 bash 1192.634
 18196                 bash 1192.638
 18196                 bash 1192.639
 18196                 bash 1192.644
 18196                 bash 1192.649
 18196                 bash 1192.654
 66683                  lz4 1192.654
 18196                 bash 1192.655
 18196                 bash 1192.655
 18196                 bash 1192.659

The worker is indeed chosen only after 95 ms. And with preempt_thresh < 8:

   PID              COMMAND TIMESTAMP
 18196                 bash 1268.955
 66683                  lz4 1268.956
 18196                 bash 1268.956
 66683                  lz4 1268.956
 18196                 bash 1268.957
 66683                  lz4 1268.957
 18196                 bash 1268.957
 66683                  lz4 1268.958
 18196                 bash 1268.958
 66683                  lz4 1268.959
 18196                 bash 1268.959
 66683                  lz4 1268.959
 18196                 bash 1268.960
 66683                  lz4 1268.960
 18196                 bash 1268.961
 66683                  lz4 1268.961
 18196                 bash 1268.961
 66683                  lz4 1268.962
 18196                 bash 1268.962

Here we have 3 context switches per millisecond. (The fact that the decisions are overall more frequent is easily explained: when lz4 gets to run, it will do disk I/O, which quickly returns and triggers new decisions.) In the second record, things are clear: while lz4 does disk I/O, the scheduler MUST run bash, because nothing else is there. But when data arrives, it runs lz4 again. But in the first record - why does the scheduler choose bash, although lz4 already has a much higher priority (52 versus 97, usually)?
A value of 120 (corresponding to PRI=20 in top) will allow the I/O bound thread to preempt any other thread with a lower priority (i.e. a numerically higher priority value).
Found the issue! - SCHED_ULE+PREEMPTION is the problem
Results:

1. The tdq_ridx pointer

The perceived slow advance (of the tdq_ridx pointer into the circular array) is correct behaviour. McKusick writes: "The pointer is advanced once per system tick, although it may not advance on a tick until the currently selected queue is empty. Since each thread is given a maximum time slice and no threads may be added to the current position, the queue will drain in a bounded amount of time." Therefore, it is also normal that the process (the piglet in this case) runs until its time slice (aka quantum) is used up.

2. The influence of preempt_thresh

This can be found in tdq_runq_add(). A simplified description of the logic there is as follows:

td_priority < 152?  -> add to realtime queue
td_priority <= 223? -> add to timeshare queue
                       if preempted:
                           circular_index = tdq_ridx
                       else:
                           circular_index = tdq_idx + td_priority
else                -> add to idle queue

If the thread had been preempted, it is reinserted at the current working position of the circular array; otherwise the position is calculated from the thread priority.

3. The quantum

Most of the task switches come from device interrupts, which run at priority intr:8 or intr:12. So, as soon as preempt_thresh is 12 or bigger, the piglet is almost always reinserted into the runqueue due to preemption. And, as we see, in that case we do not have a scheduling decision, we have a simple resume! A real scheduling decision happens only after the quantum is exhausted. Therefore, reducing the quantum helps.

4. History

This behaviour was deliberately introduced in r171713. In r220198 it was fixed, with a focus on CPU hogs and single-CPU systems. In r239157 the fix was undone due to performance considerations, with the focus on rescheduling only at the end of the time slice.

5. Conclusion

The current defaults seem not very well suited for certain CPU-intense tasks. Possible solutions are one of:

* don't use SCHED_ULE
* don't use preemption
* change kern.sched.quantum to the minimal value.

P.
___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: kern.sched.quantum: Creepy, sadistic scheduler
EBFE via freebsd-stable wrote:
> On Tue, 17 Apr 2018 09:05:48 -0700 Freddie Cash wrote:
>> # Tune for desktop usage
>> kern.sched.preempt_thresh=224
>>
>> Works quite nicely on a 4-core AMD Phenom-II X4 960T Processor
>> (3010.09-MHz K8-class CPU) running KDE4 using an Nvidia 210 GPU.
>
> For interactive tasks, there is a "special" tunable:
>
> % sysctl kern.sched.interact
> kern.sched.interact: 10 # default is 30
> % sysctl -d kern.sched.interact
> kern.sched.interact: Interactivity score threshold
>
> Reducing the value from 30 to 10-15 keeps your gui/system responsive,
> even under high load.

Yes, this may improve the "unresponsive desktop" problem, because threads that score as interactive are run as realtime threads, ahead of all regular workload queues.

But it will likely not solve the problem described by George, of two competing batch jobs. And for my problem as described at the beginning of the thread, I could probably tune so far that my "worker" thread would be considered interactive, but then it would just toggle between the realtime and timesharing queues - and while this may make things better, it will probably not lead to smooth system behaviour.

P.
Re: kern.sched.quantum: Creepy, sadistic scheduler
Hi all of You,

thank You very much for Your comments and reports!

From what I see, we have (at least) two rather different demands here: while George looks at the over-all speed of compute throughput, others are concerned about interactive response.

My own issue is again a little bit different: I am running this small single-CPU machine as my home-office router, and it also runs a backup service, which involves compressing big files and handling an outgrown database (but that does not need to happen fast, as it's just backup stuff). So, my demand is to maintain a good balance between realtime network activity being immediately served and low-priority batch compute jobs, while still staying responsive to shell commands - but the over-all compute throughput is not important here. But then, I find it very difficult to devise some metrics by which such a demand could be properly measured, to get comparable figures.

George Mitchell wrote:
> I suspect my case (make buildworld while running misc/dnetc) doesn't
> qualify. However, I just completed a SCHED_ULE run with preempt_thresh
> set to 5, and "time make buildworld" reports:
> 7336.748u 677.085s 9:25:19.86 23.6% 27482+473k 42147+431581io 38010pf+0w
> Much closer to SCHED_4BSD! I'll try preempt_thresh=0 next, and I guess
> I'll at least try preempt_thresh=224 to see how that works for me. -- George

I found that preempt_thresh=0 cannot be used in practice: when I try that on my quad-core desktop and then start four endless loops to get the cores busy, the (internet) radio will have a dropout every 2-3 seconds (and there is nothing else running, just a sleeping icewm and a mostly sleeping firefox)! So, the (SMP) system *depends* on preemption; it cannot handle streaming data without it. (@George: Your buildworld test is pure batch load, and may not be bothered by this effect.)

I think the problem is *not* to be solved by finding a good setting for preempt_thresh (or other tunables).
I think the problem lies deeper, and these tunables only change its appearance. I have worked out a writeup explaining my thoughts in detail, and I would be glad if You stay tuned and evaluate it.

P.
Security patch SA-18:03 removed from 11.2 - why?
Release/update 11.1-p8 introduced so-called "mitigations for speculative execution vulnerabilities". In Release 11.2 these mitigations have been removed. What is the reason for the removal, and specifically, why is Security Advisory 18:03 still mentioned in the release notes?

Behaviour with 11.1-p8:

# sysctl hw.ibrs_disable
hw.ibrs_disable: 0
# sysctl hw.ibrs_active
hw.ibrs_active: 1

Behaviour with 11.2 with the same CPU + microcode:

# sysctl hw.ibrs_disable
hw.ibrs_disable: 0
# sysctl hw.ibrs_active
hw.ibrs_active: 0
Re: Problems with pf + ftp-proxy on gateway
--- Renato Botelho <[EMAIL PROTECTED]> wrote:
> I'm trying to use pf + ftp-proxy on a 6.1-PRERELEASE machine.
>
> I have this line in inetd.conf:
>
> ftp-proxy stream tcp nowait root /usr/libexec/ftp-proxy ftp-proxy -n
>
> And these lines in pf.conf:
>
> rdr on $int_if proto tcp from any to any port ftp -> 127.0.0.1 port ftp-proxy
> pass in quick on $ext_if inet proto tcp from any port ftp-data to
> $ext_if:0 user proxy flags S/SA keep state
>
> When a machine inside my network (e.g. 192.168.x.x) connects to an
> external ftp server (e.g. ftp.FreeBSD.org), the data connection doesn't
> work.
>
> The connection comes to my firewall and is accepted, but the connection
> is not established and stays like this:
>
> self tcp 200.x.x.x:57625 <- 200.x.x.x:20 ESTABLISHED:FIN_WAIT_2

You need to decide whether you are working with passive ftp clients (probably), active, or both.
Problems with make world on latest cvsup RELENG_4
Hi,

I have a weird problem making world from the latest cvsup RELENG_4 (today) from cvsup2.nl.freebsd.org. I had no problems with several boxes, but this one gives a weird error. Do you guys maybe have a solution?

Here's the error (regenerated at the command prompt, but it's the same error as in make world):

su-2.05# cd /usr/src/sys/boot/i386/boot2
su-2.05# make
dd if=/dev/zero of=boot2.ldr bs=512 count=1 2>/dev/null
*** Error code 126

Stop in /usr/src/sys/boot/i386/boot2.
su-2.05# ls -la
total 36
drwxr-xr-x   2 root  wheel    512 Jul 11 02:10 .
drwxr-xr-x  11 root  wheel    512 Jul 11 02:10 ..
-rw-r--r--   1 root  wheel   2074 Jul  7  2000 Makefile
-rw-r--r--   1 root  wheel   9897 Jul  7  2000 boot1.s
-rw-r--r--   1 root  wheel  16632 Jul  7  2000 boot2.c
-rw-r--r--   1 root  wheel    679 Aug 28  1999 lib.h
-rw-r--r--   1 root  wheel   2205 Aug 28  1999 sio.s
su-2.05#

With kind regards,
Peter Batenburg
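For reference, "Error code 126" from make is the shell's conventional exit status for "command found but not executable" - for example a missing exec bit, a noexec-mounted filesystem, or a clobbered binary. A minimal sketch reproducing that status (the temp-file path is arbitrary):

```shell
# Exit status 126: the command exists but cannot be executed.
script=$(mktemp)
printf '#!/bin/sh\necho hi\n' > "$script"   # deliberately no chmod +x
"$script" || status=$?
echo "exit status: ${status}"
rm -f "$script"
```

So the question for the build failure above is which file dd (or the shell running it) could suddenly not execute.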
Re: classes and kernel_cookie was Re: Specifying root mount options on diskless boot.
On 2011-Jan-09 10:32:48 -0500, Daniel Feenberg wrote:
>Daniel Braniss writes...
>> I have it pxebooting nicely and running with an NFS root
>> but it then reports locking problems: devd, syslogd, moused (and maybe

Actually, that was me, not Daniel.

>Are you mounting /var via nfs?

Yes. I'm using "diskless" in the traditional Sun workstation style - the system itself is running with a normal filesystem which is all NFS-mounted from another (FreeBSD) server. I'm aware of the MFS-based read-only approach but didn't want to use it.

>I note that the response to your message from "danny" offers the ability
>to pass arguments to the nfs mount command,

Actually, my original mail indicates that I'm aware you can pass options to the NFS mount command (passing nolockd will solve my problem). My issue is that there are several incompatible approaches and none of them work by default.

>but also seems to offer a fix
>for the fact that "classes" are not supported under PXE:
>
>http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/90368

I wasn't previously aware of that PR, but it is consistent with my findings.

On 2011-Jan-10 10:52:34 +0200, Daniel Braniss wrote:
>I'm willing to try and add the missing pieces, but I need some better
>explanation as to what they are; for example, I have no clue what the
>kernel_cookie is used for, nor what the ${class} is all about.

I'm also happy to patch the code, but feel that both PXE and BOOTP should be consistent, and I'm not sure which is the correct approach.

>BTW, it would be kind if the line in pxeboot(8):
> As PXE is still in its infancy ...
>can be changed :-)

Well, there are still some issues with PXE booting FreeBSD - e.g. as discussed here. But, I agree, that comment can probably go.

-- Peter Jeremy
Re: sed is broken under freebsd?
On 2011-Jan-12 02:32:52 +0100, Oliver Pinter wrote:
>The FreeBSD version of sed contains a bug/regression when substituting
>the \n char; gsed is not affected by this bug:

gsed contains non-standard extensions and you have been suckered into using them. Try using 'gsed --posix' and/or setting POSIXLY_CORRECT. This is part of the GNU/FSF "lock-in" policy that encourages people to use their non-standard extensions, to ensure that you don't have any choice other than to use their software.

-- Peter Jeremy
Re: system crash during make installworld
On 2011-Feb-21 08:04:00 +, David J Brooks wrote:
>As the subject suggests, my laptop crashed during make installworld.
>The new kernel boots, but the ELF interpreter is not found and I
>cannot get to a single user prompt. What is the least painful way to
>proceed?

My first suggestion would be to boot the previous kernel. If that doesn't help, try specifying /rescue/sh as the single-user shell. If neither of those works, please specify the exact error message you get and the point where you get it (if you don't have a serial console available, post a link to a picture of the screen showing the issue).

-- Peter Jeremy
Linker set issues with ath(4) HALs
I have an Atheros AR5424, and so, based on the 8.2-STABLE i386 NOTES and some rummaging in the sources, I tried to build a kernel with:

device ath              # Atheros pci/cardbus NIC's
device ath_ar5212       # HAL for Atheros AR5212 and derived chips
device ath_rate_sample  # SampleRate tx rate control for ath

and this died during the kernel linking with:

linking kernel.debug
ah.o(.text+0x23c): In function `ath_hal_rfprobe':
/usr/src/sys/dev/ath/ath_hal/ah.c:142: undefined reference to `__start_set_ah_rfs'
ah.o(.text+0x241):/usr/src/sys/dev/ath/ath_hal/ah.c:142: undefined reference to `__stop_set_ah_rfs'
ah.o(.text+0x25a):/usr/src/sys/dev/ath/ath_hal/ah.c:142: undefined reference to `__stop_set_ah_rfs'

Following a suggestion by a friend, I changed that to:

device ath              # Atheros pci/cardbus NIC's
options AH_SUPPORT_AR5416
device ath_hal          # Atheros HAL
device ath_rate_sample  # SampleRate tx rate control for ath

and it worked. Normally, I would leave it at that, but I'd like to understand what is actually going on...

In both cases, ah.o contains the following 4 references:

U __start_set_ah_chips
U __start_set_ah_rfs
U __stop_set_ah_chips
U __stop_set_ah_rfs

generated by:

/* linker set of registered chips */
OS_SET_DECLARE(ah_chips, struct ath_hal_chip);
/* linker set of registered RF backends */
OS_SET_DECLARE(ah_rfs, struct ath_hal_rf);

These symbols do not appear in any other .o files, though there are a variety of other __{start,stop}_set_* symbols - all of which show up as 'A' (absolute) values in the final kernel.

My questions are:

How are these linker set references resolved? I can't find anything that defines these symbols - either in .o files or in ldscript files.

In the first case, there are 2 pairs of undefined linker set variables but the linker only reports one pair as unresolved. Why don't both sets show up as resolved or unresolved?

Why does using the generic "ath_hal", rather than the hardware-specific HAL, fix the problem?
-- Peter Jeremy
Re: Linker set issues with ath(4) HALs
On 2011-Mar-05 11:48:54 +0200, Kostik Belousov wrote:
>On Sat, Mar 05, 2011 at 07:50:05PM +1100, Peter Jeremy wrote:
>> I have an Atheros AR5424 and so, based on the 8.2-STABLE i386 NOTES
>> and some rummaging in the sources, I tried to build a kernel with:
>>
>> device ath              # Atheros pci/cardbus NIC's
>> device ath_ar5212       # HAL for Atheros AR5212 and derived chips
>> device ath_rate_sample  # SampleRate tx rate control for ath
...
>> These symbols do not appear in any other .o files, though there are a
>> variety of other __{start,stop}_set_* symbols - all of which show up
>> as 'A' (absolute) values in the final kernel.
>>
>> My questions are:
>> How are these linker set references resolved? I can't find anything
>> that defines these symbols - either in .o files or in ldscript files.
...
>The linker synthesizes the symbols assuming the following two conditions
>are met:
>- the symbols are referenced;
>- there exists an ELF section named `set_ah_rfs'.
>It assigns the (relocated) start of the section to __start_<section>,
>and the end to __stop_<section>.

Thank you for that. Looking through the output of 'objdump -h' showed that it was user error: when using a chip-specific "device ath_ar..." entry, it looks like you need to include a matching "device ath_rf..." entry as well. After a closer look at my dmesg and the available options, I've added "device ath_rf2425" and things seem much happier.

-- Peter Jeremy
t_delta too short while trying to enable C3/TurboBoost
Hello

I'm trying to enable C3 states to allow TurboBoost on RELENG_8_2, and dmesg is throwing a lot of "t_delta too short" messages while using boot -v. This platform is 2x Xeon E5620 Gulftown quad-core 2.4GHz CPUs on whatever boards Dell ships them on these days (probably an Intel X58 derivative.) Kernel is GENERIC for the most part (with network drivers stripped out).

Timecounter "i8254" frequency 1193182 Hz quality 0
Calibrating TSC clock ... TSC clock: 2394012372 Hz
TSC: P-state invariant
ACPI timer: 1/2 1/2 1/1 1/2 1/2 1/1 1/2 1/2 1/2 1/1 -> 10
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0

Here is my loader.conf:

hint.p4tcc.0.disabled=1
hint.acpi_throttle.0.disabled=1
kern.hz=100
hint.apic.0.clock=0
hint.atrtc.0.clock=0

rc.conf:

performance_cpu_freq="NONE"   # Online CPU frequency
economy_cpu_freq="NONE"       # Offline CPU frequency
performance_cx_lowest="C3"    # Online CPU idle state
economy_cx_lowest="C3"        # Offline CPU idle state

and here is sysctl dev.cpu:

dev.cpu.0.%desc: ACPI CPU
dev.cpu.0.%driver: cpu
dev.cpu.0.%location: handle=\_PR_.CPU1
dev.cpu.0.%pnpinfo: _HID=none _UID=0
dev.cpu.0.%parent: acpi0
dev.cpu.0.cx_supported: C1/1 C2/64 C3/96
dev.cpu.0.cx_lowest: C3
dev.cpu.0.cx_usage: 0.90% 0.45% 98.64% last 4096us
dev.cpu.1.%desc: ACPI CPU
dev.cpu.1.%driver: cpu
dev.cpu.1.%location: handle=\_PR_.CPU2
dev.cpu.1.%pnpinfo: _HID=none _UID=0
dev.cpu.1.%parent: acpi0
dev.cpu.1.cx_supported: C1/1 C2/64 C3/96
dev.cpu.1.cx_lowest: C3
dev.cpu.1.cx_usage: 0.68% 0.34% 98.96% last 2965us
dev.cpu.2.%desc: ACPI CPU
dev.cpu.2.%driver: cpu
dev.cpu.2.%location: handle=\_PR_.CPU3
dev.cpu.2.%pnpinfo: _HID=none _UID=0
dev.cpu.2.%parent: acpi0
dev.cpu.2.cx_supported: C1/1 C2/64 C3/96
dev.cpu.2.cx_lowest: C3
dev.cpu.2.cx_usage: 0.94% 0.66% 98.38% last 2081us
dev.cpu.3.%desc: ACPI CPU
dev.cpu.3.%driver: cpu
dev.cpu.3.%location: handle=\_PR_.CPU4
dev.cpu.3.%pnpinfo: _HID=none _UID=0
dev.cpu.3.%parent: acpi0
dev.cpu.3.cx_supported: C1/1 C2/64 C3/96
dev.cpu.3.cx_lowest: C3
dev.cpu.3.cx_usage: 0.81% 0.58% 98.59% last 4124us
dev.cpu.4.%desc: ACPI CPU
dev.cpu.4.%driver: cpu
dev.cpu.4.%location: handle=\_PR_.CPU5
dev.cpu.4.%pnpinfo: _HID=none _UID=0
dev.cpu.4.%parent: acpi0
dev.cpu.4.cx_supported: C1/1 C2/64 C3/96
dev.cpu.4.cx_lowest: C3
dev.cpu.4.cx_usage: 1.07% 0.68% 98.23% last 5046us
dev.cpu.5.%desc: ACPI CPU
dev.cpu.5.%driver: cpu
dev.cpu.5.%location: handle=\_PR_.CPU6
dev.cpu.5.%pnpinfo: _HID=none _UID=0
dev.cpu.5.%parent: acpi0
dev.cpu.5.cx_supported: C1/1 C2/64 C3/96
dev.cpu.5.cx_lowest: C3
dev.cpu.5.cx_usage: 3.01% 1.74% 95.24% last 4504us
dev.cpu.6.%desc: ACPI CPU
dev.cpu.6.%driver: cpu
dev.cpu.6.%location: handle=\_PR_.CPU7
dev.cpu.6.%pnpinfo: _HID=none _UID=0
dev.cpu.6.%parent: acpi0
dev.cpu.6.cx_supported: C1/1 C2/64 C3/96
dev.cpu.6.cx_lowest: C3
dev.cpu.6.cx_usage: 2.45% 1.89% 95.65% last 3506us
dev.cpu.7.%desc: ACPI CPU
dev.cpu.7.%driver: cpu
dev.cpu.7.%location: handle=\_PR_.CPU8
dev.cpu.7.%pnpinfo: _HID=none _UID=0
dev.cpu.7.%parent: acpi0
dev.cpu.7.cx_supported: C1/1 C2/64 C3/96
dev.cpu.7.cx_lowest: C3
dev.cpu.7.cx_usage: 1.21% 0.77% 98.00% last 4180us

Should I increase kern.hz until the "t_delta too short" messages go away (I hear that at kern.hz=1000, each core is woken so often by timer interrupts that it can never enter C3 state), or is there another knob I am supposed to tune? My goal is to see if I can get the box into TurboBoost as much as possible while idle.
Re: Automatic reboot doesn't reboot
On 2011-May-02 16:32:30 +0200, Olaf Seibert wrote:
>However, it doesn't automatically reboot in 15 seconds, as promised.
>It just sits there the whole weekend, until I log onto the IPMI console
>and press the virtual reset button.

Your reference to IPMI indicates this is not a consumer PC. Can you please provide some details of the hardware? Are you running ipmitool or similar? Does "shutdown -r" or "reboot" work normally?

>panic: kmem_alloc(131072): kmem_map too small: 3428782080 total allocated

I suggest you have a read of the thread beginning http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010862.html (note that mailman has split it into at least 3 threads).

-- Peter Jeremy
Re: current status of digi driver
On 2011-Jun-23 17:55:15 -0400, David Boyd wrote:
>It appears that there was also agreement that (at least) some of the
>drivers, digi included, would be converted soon after 8.0-RELEASE.

That came down to developer time, and it appeared that I was the only person interested in it.

>Is there any plan to bring digi forward?

See kern/158086 (which updates digi(4)) and kern/152254 (which re-implements TTY functionality that was lost with TTYng). Both include patches that should work on either 8.x or -current. Of the two, the latter is more urgent because it impacts the KBI.

>We have about 55 modem ports over ten 8-port Xr cards (PCI) that connect
>remote sites via dial-up.

I've only got access to PCI Xem cards that are used for serial console concentration, so it would be useful for you to test both the Xr cards and dial-in support.

-- Peter Jeremy
Re: scp: Write Failed: Cannot allocate memory
Hi all,

I noticed a similar problem last week. It is also very similar to one reported last year: http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058708.html

My server is a Dell T410 with the same bge card (the same pciconf -lvc output as described by Mahlon: http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058711.html). Yours, Scott, is an em(4).

Another similarity: in all cases we are using VirtualBox. I just want to mention it, in case it matters. I am still running VirtualBox 3.2.

Most of the time kstat.zfs.misc.arcstats.size was reaching vfs.zfs.arc_max then, but I could catch one or two cases when the value was still below. I added vfs.zfs.prefetch_disable=1 to sysctl.conf, but it does not help.

BTW: It looks as if ARC only gives back the memory when I destroy the ZFS (a cloned snapshot containing virtual machines). Even if nothing happens for hours, the buffer isn't released.

My machine was still running 8.2-PRERELEASE, so I am upgrading. I am happy to give information gathered on the old/new kernel if it helps.

Regards
Peter

Quoting "Scott Sipe" :

On Jul 2, 2011, at 12:54 AM, jhell wrote:

On Fri, Jul 01, 2011 at 03:22:32PM -0700, Jeremy Chadwick wrote:

On Fri, Jul 01, 2011 at 03:13:17PM -0400, Scott Sipe wrote:

I'm running 8.2-RELEASE and am having new problems with scp. When scping files to a ZFS directory on the FreeBSD server -- most notably large files -- the transfer frequently dies after just a few seconds. In my last test, I tried to scp an 800mb file to the FreeBSD system and the transfer died after 200mb. It completely copied the next 4 times I tried, and then died again on the next attempt.

On the client side: "Connection to home closed by remote host. lost connection"

In /var/log/auth.log: Jul 1 14:54:42 freebsd sshd[18955]: fatal: Write failed: Cannot allocate memory

I've never seen this before and have used scp before to transfer large files without problems.
This computer has been used in production for months and has a current uptime of 36 days. I have not been able to notice any problems copying files to the server via samba or netatalk, or any problems in apache.

Uname: FreeBSD xeon 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Sat Feb 19 01:02:54 EST 2011 root@xeon:/usr/obj/usr/src/sys/GENERIC amd64

I've attached my dmesg and the output of vmstat -z. I have not restarted the sshd daemon or rebooted the computer. Am glad to provide any other information or test anything else.

{snip vmstat -z and dmesg}

You didn't provide details about your networking setup (rc.conf, ifconfig -a, etc.). netstat -m would be useful too.

Next, please see this thread circa September 2010, titled "Network memory allocation failures": http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708

The user in that thread is using rsync, which relies on scp by default. I believe this problem is similar, if not identical, to yours.

Please also provide your output of ( /usr/bin/limits -a ) for the server end and the client.

I am not quite sure I agree with the need for ifconfig -a, but some information about the networking driver you're using for the interface would be helpful, plus the uptime of the boxes and the configuration of the pool, e.g. ( zpool status -a ; zfs get all ). You should probably prop this information up somewhere so you can reference it by URL whenever needed.

rsync(1) does not rely on scp(1) whatsoever, but rsync(1) can be made to use ssh(1) instead of rsh(1), and I believe that is what Jeremy is stating here, but correct me if I am wrong. It does use ssh(1) by default.

It's also a possibility that, if you are using tmpfs(5) or mdmfs(8) for /tmp-type filesystems, rsync(1) may just be filling up your temp RAM area and causing the connection abort, which would be expected. ( df -h ) would help here.

Hello,

I'm not using tmpfs/mdmfs at all. The clients yesterday were 3 different OSX computers (over gigabit).
The FreeBSD server has 12gb of ram and no bce adapter.

For what it's worth, the server is backed up remotely every night with rsync (the remote FreeBSD uses rsync to pull) to an offsite (slow cable connection) FreeBSD computer, and I have not seen any errors in the nightly rsync.

Sorry for the omission of networking info; here's the output of the requested commands and some that popped up in the other thread: http://www.cap-press.com/misc/

In rc.conf: ifconfig_em1="inet 10.1.1.1 netmask 255.255.0.0"

Scott
Re: scp: Write Failed: Cannot allocate memory
Hi all,

just as an addition: an upgrade to last Friday's FreeBSD-Stable and to VirtualBox 4.0.8 does not fix the problem. I will experiment a bit more tomorrow after hours and grab some statistics.

Regards
Peter

Quoting "Peter Ross" : [full quote of the earlier messages snipped]
Re: scp: Write Failed: Cannot allocate memory
Quoting "Jeremy Chadwick" : On Tue, Jul 05, 2011 at 01:03:20PM -0400, Scott Sipe wrote:

I'm running virtualbox 3.2.12_1 if that has anything to do with it.

sysctl vfs.zfs.arc_max: 62

While I'm trying to scp, kstat.zfs.misc.arcstats.size is hovering right around that value, sometimes above, sometimes below (that's as it should be, right?). I don't think that it dies when crossing over arc_max. I can run the same scp 10 times and it might fail 1-3 times, with no correlation to the arcstats.size being above/below arc_max that I can see.

Scott

On Jul 5, 2011, at 3:00 AM, Peter Ross wrote: [full quote of the earlier messages snipped]
Regards Peter Quoting "Scott Sipe" : On Jul 2, 2011, at 12:54 AM, jhell wrote: On Fri, Jul 01, 2011 at 03:22:32PM -0700, Jeremy Chadwick wrote: On Fri, Jul 01, 2011 at 03:13:17PM -0400, Scott Sipe wrote: I'm running 8.2-RELEASE and am having new problems with scp. When scping files to a ZFS directory on the FreeBSD server -- most notably large files -- the transfer frequently dies after just a few seconds. In my last test, I tried to scp an 800mb file to the FreeBSD system and the transfer died after 200mb. It completely copied the next 4 times I tried, and then died again on the next attempt. On the client side: "Connection to home closed by remote host. lost connection" In /var/log/auth.log: Jul 1 14:54:42 freebsd sshd[18955]: fatal: Write failed: Cannot allocate memory I've never seen this before and have used scp before to transfer large files without problems. This computer has been used in production for months and has a current uptime of 36 days. I have not been able to notice any problems copying files to the server via samba or netatalk, or any problems in apache. Uname: FreeBSD xeon 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Sat Feb 19 01:02:54 EST 2011 root@xeon:/usr/obj/usr/src/sys/GENERIC amd64 I've attached my dmesg and output of vmstat -z. I have not restarted the sshd daemon or rebooted the computer. Am glad to provide any other information or test anything else. {snip vmstat -z and dmesg} You didn't provide details about your networking setup (rc.conf, ifconfig -a, etc.). netstat -m would be useful too. Next, please see this thread circa September 2010, titled "Network memory allocation failures": http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708 The user in that thread is using rsync, which relies on scp by default. I believe this problem is similar, if not identical, to yours. Please also provide your output of ( /usr/bin/limits -a ) for the server end and the client. 
I am not quite sure I agree with the need for ifconfig -a but some information about the networking driver your using for the interface would be helpful, uptime of the boxes. And configuration of the pool. e.g. ( zpool status -a ;zfs get all ) You should probably prop this information up somewhere so you can reference by URL whenever needed. rsync(1) does not rely on scp(1) whatsoever but rsync(1) can be made to use ssh(1) instead of rsh(1) and I believe that is what Jeremy is stating here but correct me if I am wrong. It does use ssh(1) by default. Its a possiblity as well that if using tmpfs(5) or mdmfs(8) for /tmp type filesystems that rsync(1) may be just filling up your temp ram area and causing the connection abort which would be expected. ( df -h ) would help here. Hello, I'm not using tmpfs/mdmfs at all. The clients yesterday were 3 different OSX computers (over gigabit). The FreeBSD server
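Scott's comparison of kstat.zfs.misc.arcstats.size against vfs.zfs.arc_max can be scripted rather than checked by eye. A minimal sketch: the sysctl names come from the thread, but the helper function and its output format are my own.

```shell
#!/bin/sh
# arc_status CURRENT MAX -> one-line verdict on ARC size versus the cap.
# On FreeBSD you would invoke it as:
#   arc_status "$(sysctl -n kstat.zfs.misc.arcstats.size)" \
#              "$(sysctl -n vfs.zfs.arc_max)"
arc_status() {
    cur=$1
    max=$2
    if [ "$cur" -gt "$max" ]; then
        echo "ARC above arc_max by $((cur - max)) bytes"
    else
        echo "ARC within arc_max, $((max - cur)) bytes of headroom"
    fi
}
```

Running it in a `while : ; do ...; sleep 5; done` loop during a failing scp would show whether the failures correlate with crossing arc_max (Scott reports they do not).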
Re: Status of support for 4KB disk sectors
On 2011-Jul-19 10:54:38 -0700, Chuck Swiger wrote:
> Unix operating systems like SunOS 3 and NEXTSTEP would happily run with a DEV_BSIZE of 1024 or larger -- they'd boot fine off of optical media using 2048-byte sectors,

Actually, Sun used customised CD-ROM drives that faked 512-byte sectors to work around their lack of support for anything else.

> some of the early 1990's era SCSI hard drives supported low-level reformatting to a different sector size like 1024 or 2048 bytes.

Did anyone actually do this? I wanted to but was warned against it by the local OS rep (this was a Motorola SVR2).

--
Peter Jeremy
Re: scp: Write Failed: Cannot allocate memory - Problem found and solved (for me)
{snip quoted thread history}
Re: SATA 6g 4-port non-RAID controller ?
On 2011-Jul-28 17:57:52 +1000, Jan Mikkelsen wrote: >On 28/07/2011, at 2:55 AM, Jeremy Chadwick wrote: >> I can find you examples on Google of people who invested in Areca >> ARC-1220 cards (PCIe x8) only to find out that when inserted into one of >> their two PCIe x16 slots the mainboard wouldn't start (see above). I can >> also find you examples on Google of people with Intel 915GM chipsets >> whose user manuals explicitly state the PCIe x16 slot on their board is >> "intended for use with graphics cards only". > >Just trying to understand; I think I can recall reading about issues >with the 915 chipset. I agree a "check, don't assume" warning is >reasonable. I have also run into problems (wouldn't POST from memory) trying to use a NIC in the x16 slot of Dell GX620 boxes, which use an i945 chipset. -- Peter Jeremy
Re: ZFS directory with a large number of files
On 2011-Aug-02 08:39:03 +0100, "seanr...@gmail.com" wrote: >On my FreeBSD 8.2-S machine (built circa 12th June), I created a >directory and populated it over the course of 3 weeks with about 2 >million individual files. As you might imagine, a 'ls' of this >directory took quite some time. > >The files were conveniently named with a timestamp in the filename >(still images from a security camera, once per second) so I've since >moved them all to timestamped directories (/MM/dd/hh/mm). What I >found though was the original directory the images were in is still >very slow to ls -- and it only has 1 file in it, another directory. I've also seen this behaviour on Solaris 10 after cleaning out a directory with a large number of files (though not as pathological as your case). I tried creating and deleting entries in an unsuccessful effort to trigger directory compaction. I wound up moving the remaining contents into a new directory, deleting the original one and renaming the new directory. It would appear to be a garbage collection bug in ZFS.

On 2011-Aug-02 13:10:27 +0300, Daniel Kalchev wrote: >On 02.08.11 12:46, Daniel O'Connor wrote: >> I am pretty sure UFS does not have this problem. i.e. once you >> delete/move the files out of the directory its performance would be >> good again. > >UFS would be the classic example of poor performance if you do this. Traditional UFS (including Solaris UFS) behaves badly in this scenario, but 4.4BSD derivatives will release unused space at the end of a directory and have smarts to more efficiently skip unused entries at the start of a directory. -- Peter Jeremy
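The workaround described above (move the remaining contents into a new directory, delete the original, rename the new one back) can be sketched as a small sh function. refresh_dir is a hypothetical name, and find is used instead of a shell glob so that a directory with ~2 million entries does not overflow the argument-length limit:

```shell
#!/bin/sh
# refresh_dir DIR: recreate DIR so the filesystem allocates fresh
# directory metadata, discarding the bloated original.
refresh_dir() {
    olddir=$1
    tmpdir=${olddir}.new
    mkdir "$tmpdir"
    # Move entries one at a time; a single glob expansion would hit
    # ARG_MAX with millions of files.
    find "$olddir" -mindepth 1 -maxdepth 1 -exec mv {} "$tmpdir"/ \;
    rmdir "$olddir"
    mv "$tmpdir" "$olddir"
}
```

Note this is not atomic: anything still writing into the old directory during the move will race with it.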
Upgrade to 7.4
Hi All, I just ran freebsd-update to upgrade from 7.0 to 7.4 and figured everything went OK. This is what I did:

1. freebsd-update upgrade -r 7.4-RELEASE
2. freebsd-update install
3. shutdown -r now
4. freebsd-update install
5. shutdown -r now

The system came back up OK, but now if I run another freebsd-update fetch, I get the error below:

config_IDSIgnorePaths: not found
Error processing configuration file, line 26:
==> IDSIgnorePaths /usr/share/man/cat

Is this an error I need to worry about? How can I correct it if so? ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
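To see exactly what freebsd-update(8) is choking on, inspect the line the error names. A sketch, assuming the stock configuration path:

```shell
# Show line 26 of the freebsd-update configuration file
sed -n '26p' /etc/freebsd-update.conf

# If it is an IDSIgnorePaths directive that the installed
# freebsd-update script does not understand, commenting that line out
# should make fetch work again (BSD sed in-place syntax):
#   sed -i '' '26s/^/# /' /etc/freebsd-update.conf
```

Alternatively, merging a fresh /etc/freebsd-update.conf from the new release during the upgrade avoids the mismatch entirely.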
Re: bad sector in gmirror HDD
On 2011-Aug-19 20:24:38 -0700, Jeremy Chadwick wrote: >The reallocated LBA cannot be dealt with aside from re-creating the >filesystem and telling it not to use the LBA. I see no flags in >newfs(8) that indicate a way to specify LBAs to avoid. And we don't >know what LBA it is so we can't refer to it right now anyway. > >As I said previously, I have no idea how UFS/FFS deals with this. It doesn't. UFS/FFS and ZFS expect and assume "perfect" media. It's up to the drive to transparently remap faulty sectors. UFS used to have support for visible bad sectors (and Solaris UFS still reserves space for this, though I don't know if it still works) but the code was removed from FreeBSD long ago. AFAIR, wd(4) supported bad sectors but it was removed long ago. -- Peter Jeremy
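While newfs(8) cannot be told to avoid a given LBA, the drive itself can usually be made to remap it. A sketch using smartctl from sysutils/smartmontools (the device name ad4 is a placeholder, and the dd write destroys that sector's contents):

```shell
# Run the drive's long self-test; a failing test records the first bad LBA
smartctl -t long /dev/ad4

# After it completes, read the self-test log (LBA_of_first_error column)
smartctl -l selftest /dev/ad4

# Writing to the bad LBA typically forces the firmware to remap the
# sector from its spare pool -- destructive to that block:
#   dd if=/dev/zero of=/dev/ad4 bs=512 count=1 seek=<LBA>
```

With gmirror, the rewritten sector can then be repaired from the healthy mirror side by a rebuild.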
Re: 32GB limit per swap device?
On 2011-Aug-18 12:16:44 +0400, "Alexander V. Chernikov" wrote: >The code should look like this: ... >(move pages recalculation before b-list check) I notice a very similar patch has been applied to -current as r225076. For the archives, I've done some testing with this patch on a Sun V890 with 64GB RAM and two 64GB swap partitions. Prior to this patch, each swap partition was truncated to 32GB. With this patch, I have 128GB swap. I've tried filling the swap space to over 80GB and I am not seeing any corruption (allocate lots of memory and fill with a known pseudo-random pattern and then verify). -- Peter Jeremy pgpo8PkzVBfqo.pgp Description: PGP signature
How to disable ntpd on IPv6 addresses?
Hello! I hope this is the right list for this question. In FreeBSD 8.2, how do I make ntpd not open any IPv6 ports? I have searched man pages and google, but haven't found the answer. Some ntpd builds have the command-line option -4, but that doesn't seem to be the case with FreeBSD ntpd. The server runs IPv6, but ntpd will only ever be used with IPv4 servers, so I don't want any unnecessary open IPv6 ports for ntpd. "Use restrict" or "Use a firewall" is not the answer. I just don't want this junk in netstat -an:

udp6 0 0 fe80:3::1.123 *.*
udp6 0 0 ::1.123 *.*
udp6 0 0 x:x:x:x.123 *.*
udp6 0 0 fe80:2::219:bbff.123 *.*
udp6 0 0 fe80:1::219:bbff.123 *.*
udp6 0 0 *.123 *.*

Thanks! -- Peter Olsson p...@leissner.se
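One hedged option: ntpd 4.2.6 and later accept interface directives in ntp.conf, and many builds accept -4 on the command line. The ntpd bundled with 8.2 may predate both, in which case net/ntp from ports provides them. A sketch:

```shell
# In /etc/ntp.conf (ntpd 4.2.6 or later): refuse to bind any IPv6 address
#   interface ignore ipv6

# Or, where the binary supports it, run IPv4-only via /etc/rc.conf
# (flags shown include the usual pid/drift defaults):
#   ntpd_flags="-4 -p /var/run/ntpd.pid -f /var/db/ntpd.drift"
```

Verify with `netstat -an | grep udp6` after restarting ntpd.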
Re: zfs partition probing causing long delay at BTX loader
On 10/20/2011 07:23 PM, Steven Hartland wrote: > Installing a new machine here which has 10+ disks > we're seeing BTX loader take 50+ seconds to enumerate > the disks. I am running 8-STABLE. On my system with 22 disks, it took much longer than a minute (maybe 5 minutes... not sure, but overall boot was about 7 minutes). While this time is passing, I can watch the LEDs on the disks blink in order, many times in a loop. My IO card is an LSI SATA/SAS 9211-8i 6Gb/s. After I upgraded the firmware to version 11, it seems to take much less time, but I didn't time it. And watching the LEDs last time I rebooted, I didn't notice them all blinking the same way. Instead, all were solid for a second or two after the long wait, and then only the root disks. So if you have the same card, I suggest you update the firmware. (I updated for stability rather than boot speed, and it seemed stable until it froze today, after 2 weeks) > After doing some digging I found the following thread > on the forums which hinted that r198420 may be the > cause. > http://forums.freebsd.org/showthread.php?t=12705 > > A quick change to zfs.c reverting the change to > support 128 partitions back to 4 and BTX completes > instantly like it used to. > > svn commit which introduced this delay is:- > http://svnweb.freebsd.org/base?view=revision&revision=198420 > > the specific file in that changeset:- > http://svnweb.freebsd.org/base/head/sys/boot/zfs/zfs.c?r1=198420&r2=198419&pathrev=198420 > > > So the questions are:- > > 1. Can this be optimised so it doesn't have to test all > of the possible 128 GPT partitions? > > 2. If an optimisation isn't possible or is too complex to > achieve, would it be better to have the partitions defined > as an option which can be increased if needed, as I suspect > 99.99% if not 100% of users won't be making use of more > than 4 partitions even with GPT, such as what the attached > patch against 8.2-RELEASE achieves.
> >Regards >Steve -- Peter Maloney Brockmann Consult Max-Planck-Str. 2 21502 Geesthacht Germany Tel: +49 4152 889 300 Fax: +49 4152 889 333 E-mail: peter.malo...@brockmann-consult.de Internet: http://www.brockmann-consult.de
Re: zfs partition probing causing long delay at BTX loader
First post failed, because the attachment was too big. (Or maybe you got a copy anyway since you are also in the To) Here it is again: On 10/21/2011 01:32 PM, Peter Maloney wrote: > On 10/21/2011 01:04 PM, Steven Hartland wrote: >> - Original Message - From: "Peter Maloney" >> >> To: >> Sent: Friday, October 21, 2011 11:17 AM >> Subject: Re: zfs parition probing causing long delay at BTX loader >> >> >>> On 10/20/2011 07:23 PM, Steven Hartland wrote: >>>> Installing a new machine here which has 10+ disks >>>> we're seeing BTX loader take 50+ seconds to enumerate >>>> the disks. >>> I am running 8-STABLE. On my system with 22 disks, it took much longer >>> than a minute (maybe 5 minutes... not sure, but overall boot was >>> about 7 >>> minutes). While this time is passing, I can watch the leds on the disks >>> blink in order, many times in a loop. >>> >>> My IO card is a LSI SATA/SAS 9211-8i 6Gb/s. >> >> We are indeed using that 3 x 9211-8i's per chassis. >>> After I upgraded the firmware to version 11, it seems to take much less >>> time, but I didn't time it. And watching the LEDs last time I rebooted, >>> I don't notice them all blinking the same way. Instead, all were solid >>> for a second or two after the long wait, and then only the root disks. >> >> We are already running fw v11.00.00.00 but thanks for the heads up. >> > Are you running the IT or IR firmware version? I am running the IR one. > > And by the way, here is my uname -a: > # uname -a > FreeBSD bcnas1.bc.local 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Sep 29 > 15:06:03 CEST 2011 > r...@bcnas1.bc.local:/usr/obj/usr/src/sys/GENERIC amd64 > > And I installed 8-STABLE using cvsup using this date filter in my > cvsup file: > *default date=2011.09.27.00.00.00 > > And I remember one other thing I did since the firmware upgrade. I > booted off a Linux hard disk which I had to put in the first hard disk > bay, or it wouldn't boot from it. So I moved the root disk somewhere > else. 
FreeBSD still boots, so I left it where I moved it. I don't know
> if that changed the boot time.
>
>>> So if you have the same card, I suggest you update the firmware. (I
>>> updated for stability rather than boot speed, and it seemed stable
>>> until it froze today, after 2 weeks)
>>
>> Do you have any information about the hang?
> I decided to rename some of my replication snapshots to fill in gaps
> from daily snapshots (since I wasn't always doing them daily)... just
> so I could delete old replication snapshots.
>
> So I wrote a bash script to take the first replication snapshot per
> day and rename it (need to get bash from ports or hope it runs in sh):
>
> for day in {4..16}; do
>     if [ "$day" -lt 10 ]; then
>         day="0$day"
>     fi
>
>     firstSnapshotOfDay=$(zfs list -o name -t snapshot -r tank | grep -E "^tank@replication-201110${day}" | head -n1)
>
>     if [ "$firstSnapshotOfDay" = "" ]; then
>         continue
>     fi
>
>     time=$(echo ${firstSnapshotOfDay} | cut -d'-' -f2)
>     hour=${time:8:2}
>     minute=${time:10:2}
>     second=${time:12:2}
>
>     echo "="
>     echo $day $firstSnapshotOfDay $time $hour $minute $second
>     echo zfs rename -r "${firstSnapshotOfDay}" tank@daily-2011-10-${day}T${hour}:${minute}:${second}
> done
>
> And then I took the output from it, and started running it.
> For example:
>
> zfs rename -r tank@replication-20111004111436 tank@daily-2011-10-04T11:14:36
>
> I ran 8 of the commands like the above, which took about 1 second
> each. Then the 9th command froze.
>
> root@bcnas1:~/bin/zfs/snapshots# zfs rename -r tank@replication-2011101300 tank@daily-2011-10-13T00:00:00
> ^C
> load: 0.13 cmd: zfs 70731 [tx->tx_sync_done_cv)] 489.40r 0.00u 0.07s 0% 1760k
> load: 0.01 cmd: zfs 70731 [tx->tx_sync_done_cv)] 638.13r 0.00u 0.07s 0% 1328k
>
> I then tried other things in other windows (using screen).
Anything
> involving zpool or zfs would hang like this:
>
> root@bcnas1:~/bin/rsync# zpool status
> ^C^C
> load: 0.12 cmd: zpool 87352 [spa_namespace_lock] 479.77r 0.00u 0.00s 0% 1628k
> load: 0.01 cmd: zpool 87352 [spa_namespace_lock] 616.65r 0.00u 0.00s 0% 1288k
>
> Other attempts to read from the tank zpool worked fine. But maybe it
> was only reading from arc and l2arc. The system has 48 GB of memory.
> And my NFS mounts stopped worki
Re: zfs partition probing causing long delay at BTX loader
Dear Steven,

This script freezes ZFS on my 2 systems with replicated data, and on an independent VM I created. I believe it freezes on the rename line. This only happens if you have a zvol, so I will be removing my zvols, since I am not using them.

e.g. zfs create -V 10m tank/testzvol

then run the script:

#!/usr/local/bin/bash
#
# Author: Peter Maloney
# Purpose: try to crash ZFS by doing IO and renaming snapshots
#
# Result: it crashes every system I put it on, as long as there is a zvol in the pool

dataset=tank
count=0
nextprint=$(date +%s)

while true; do
    echo Snapshot
    zfs destroy -r ${dataset}@testcrashsnap >/dev/null 2>&1
    zfs snapshot -r ${dataset}@testcrashsnap || break

    current=""
    for next in 1 2 3 4 5; do
        echo Renaming from ${current} to ${next}
        zfs destroy -r ${dataset}@testcrashsnap${next} >/dev/null 2>&1
        zfs rename -r ${dataset}@testcrashsnap${current} ${dataset}@testcrashsnap${next} || break
        current=${next}
    done

    echo Destroy
    zfs destroy -r ${dataset}@testcrashsnap${current} || break

    let count++
    now=$(date +%s)
    if [ $now -gt $nextprint ]; then
        echo $count
        let nextprint+=1
    fi
done

I'll file a PR on Monday.

On 10/21/2011 01:37 PM, Peter Maloney wrote:
> First post failed, because the attachment was too big. (Or maybe you got
> a copy anyway since you are also in the To)
> Here it is again:
>
> On 10/21/2011 01:32 PM, Peter Maloney wrote:
>> On 10/21/2011 01:04 PM, Steven Hartland wrote:
>>> - Original Message - From: "Peter Maloney"
>>> To:
>>> Sent: Friday, October 21, 2011 11:17 AM
>>> Subject: Re: zfs partition probing causing long delay at BTX loader
>>>
>>>> On 10/20/2011 07:23 PM, Steven Hartland wrote:
>>>>> Installing a new machine here which has 10+ disks
>>>>> we're seeing BTX loader take 50+ seconds to enumerate
>>>>> the disks.
>>>> I am running 8-STABLE. On my system with 22 disks, it took much longer
>>>> than a minute (maybe 5 minutes... not sure, but overall boot was
>>>> about 7 minutes).
While this time is passing, I can watch the leds on the disks >>>> blink in order, many times in a loop. >>>> >>>> My IO card is a LSI SATA/SAS 9211-8i 6Gb/s. >>> We are indeed using that 3 x 9211-8i's per chassis. >>>> After I upgraded the firmware to version 11, it seems to take much less >>>> time, but I didn't time it. And watching the LEDs last time I rebooted, >>>> I don't notice them all blinking the same way. Instead, all were solid >>>> for a second or two after the long wait, and then only the root disks. >>> We are already running fw v11.00.00.00 but thanks for the heads up. >>> >> Are you running the IT or IR firmware version? I am running the IR one. >> >> And by the way, here is my uname -a: >> # uname -a >> FreeBSD bcnas1.bc.local 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Sep 29 >> 15:06:03 CEST 2011 >> r...@bcnas1.bc.local:/usr/obj/usr/src/sys/GENERIC amd64 >> >> And I installed 8-STABLE using cvsup using this date filter in my >> cvsup file: >> *default date=2011.09.27.00.00.00 >> >> And I remember one other thing I did since the firmware upgrade. I >> booted off a Linux hard disk which I had to put in the first hard disk >> bay, or it wouldn't boot from it. So I moved the root disk somewhere >> else. FreeBSD still boots, so I left it where I moved it. I don't know >> if that changed the boot time. >> >>>> So if you have the same card, I suggest you update the firmware. (I >>>> updated for stability rather than boot speed, and it seemed stable >>>> until >>>> it froze today, after 2 weeks) >>> Do you have any information about the hang? >> I decided to rename some of my replication snapshots to fill in gaps >> from daily snapshots (since I wasn't always doing them daily)... just >> so I could delete old replication snapshots. 
>> >> So I wrote a bash script to take the first replication snapshot per >> day and rename it >> (need to get bash from ports or hope it runs in sh): >> >> for day in {4..16}; do >> if [ "$day" -lt 10 ]; then >> day="0$day" >> fi >> >> firstSnapshotOfDay=$(zfs list -o name -t snapshot -r tank | grep >> -E "^tank@replication-201110${day}" | head -n1) >> >> if [ "$firstSnapshotOfDay" = "" ]; then >> con
Re: 8.1 xl + dual-speed Netgear hub = yoyo
On 21 October 2011 16:00, wrote: > > ...snip... > > Both connections were using the same (short) Cat5 cable, I tried two > different ports on the 10/100 hub, and other systems work OK on that > 10/100 hub. > > How do I get this interface to operate properly at 100MB? > > ...snip... "Auto-negotiation" is a nightmare, and *will* cause you problems. The best you can do is try to set every device using the switch to 100Mbps full duplex; if that doesn't work, buy a proper switch.
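On the FreeBSD side, forcing the media as suggested looks like this (xl0 is taken from the thread; the IP address is a placeholder). One caveat: a dual-speed hub is a shared, half-duplex medium, so against a hub the half-duplex media option is likely the correct forced setting; full-duplex is for switch ports:

```shell
# One-off, from the command line:
ifconfig xl0 media 100baseTX mediaopt half-duplex

# Persistent, in /etc/rc.conf:
#   ifconfig_xl0="inet 192.168.1.10 netmask 255.255.255.0 media 100baseTX mediaopt half-duplex"

# Verify the negotiated/forced media afterwards:
ifconfig xl0 | grep media
```

Whatever is chosen, both link partners must match: forcing one end while the other still auto-negotiates produces a duplex mismatch and the errors described above.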