Re: Mixing amd64 kernel with i386 world
On Sat, 2013-09-28 at 20:37 +1000, Peter Jeremy wrote: I have a system with 4GB RAM and hence need to use an amd64 kernel to use all the RAM (I can only access 3GB RAM with an i386 kernel). OTOH, amd64 processes are significantly (50-100%) larger than equivalent i386 processes and none of the applications I'll be running on the system need to be 64-bit. This implies that the optimal approach is an amd64 kernel with i386 userland (I'm ignoring PAE as a usable approach). I've successfully run i386 jails on amd64 systems so I know this mostly works. I also know that there are some gotchas: - kdump needs to match the kernel - anything accessing /dev/mem or /dev/kmem (which implies anything that uses libkvm) probably needs to match the kernel. Has anyone investigated this approach? Why are you ignoring PAE? It's been working for me for years. -- Ian ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Trying to use /bin/sh
On Sat, 2013-09-28 at 16:36 +, Teske, Devin wrote: On Sep 28, 2013, at 1:12 AM, Stefan Esser wrote: On 28.09.2013 00:14, Jilles Tjoelker wrote: sh's model of startup files (only login shells use startup files with fixed names, other interactive shells only use $ENV) assumes that every session will load /etc/profile and ~/.profile at some point. This includes graphical sessions. The ENV file typically contains only shell options, aliases, function definitions and unexported variables but no environment variables. Some graphical environments actually source shell startup files like ~/.profile when logging in. I remember this from CDE for example. It is important to have some rule where this should happen to avoid doing it twice or never in strange configurations. As a workaround, I made ~/.xsession a script interpreted by my login shell and source some startup files. A problem here is that different login shells have incompatible startup files. I used to modify Xsession to do the final exec with a forced login shell of the user. This worked for users of all shells. The script identified the shell to use and then used argv0 to start a login shell to execute the display manager. A simplified version of my Xsession script is:

--
#!/bin/sh
LIB=/usr/local/lib
SH=$SHELL
[ -n "$SH" ] || SH=/bin/sh
SHNAME=`basename $SH`
echo exec $LIB/xdm/Xsession.real $* | \
	/usr/local/bin/argv0 $SH -$SHNAME
--

The argv0 command is part of sysutils/ucspi-tcp, BTW. This script prepends a - to the name of the shell that is started to execute the real Xsession, which had been renamed to Xsession.real. I know that the script could be further simplified by using modern variable expansion/substitution commands, but this script was in use some 25 years ago on a variety of Unix systems (SunOS, Ultrix, HP-UX) and I only used the minimal set of Bourne Shell facilities, then.
You may want a command to source standard profiles or environment settings before the final exec, in case the user's shell does not load them. In my ~/.fvwm2rc file, this is how I launch an XTerm. This achieves the goal of sourcing my profile scripts like a normal login shell while launching XTerm(s) in the GUI.

DestroyFunc FvwmXTerm
AddToFunc FvwmXTerm
PipeRead '\
	cmd=/usr/bin/xterm; \
	[ -x ${cmd} ] || cmd=/usr/X11R6/bin/xterm; \
	[ -x ${cmd} ] || cmd=xterm; \
	cmd="${cmd} -sb -sl 400"; \
	cmd="${cmd} -ls"; \
	cmd="${cmd} -r -si -sk"; \
	cmd="${cmd} -fn \\"-misc-fixed-medium-r-*-*-15-*\\""; \
	echo + I Exec exec ${cmd}'

Essentially producing an XTerm invocation of:

xterm -sb -sl 400 -ls -r -si -sk -fn -misc-fixed-medium-r-*-*-15-*

And every time I launch an XTerm with that, I get my custom prompt set by ~/.bash_profile. Of course, I'm also a TCSH user, so when I flop over to tcsh, I also get my custom prompt set by ~/.tcshrc But failing that... you could actually make your XTerm a login shell with:

xterm -e login

But of course, then you're looking at having to enter credentials. Perhaps it's just a matter of getting your commands into the right file... .bash_profile for bash and .tcshrc for tcsh. For bash the solution I've been using for like 15 years is that my .bash_profile (used only for a login) contains simply:

if [ -f ~/.bashrc ]; then
	. ~/.bashrc
fi

And everything goes into .bashrc which runs on non-login shell invocation. I have a few lines of code in .bashrc that have to cope with things like not blindly adding something to PATH that's already there[1] but other than that I generally want all the same things to happen whether it's a login shell or not. I think the bourne-shell equivalent is to have a .profile that just sets ENV=~/.shrc or similar. (I think someone mentioned that earlier in the thread.)
[1] for example:

if [[ $PATH != *"$HOME/bin"* && -d $HOME/bin ]] ; then
	export PATH=$HOME/bin:$PATH
fi

-- Ian
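A minimal sketch of the two ideas above, for readers using plain /bin/sh rather than bash (file names are the conventional ones; the contents are illustrative, not anyone's actual dotfiles). The `case` form replaces the bash-only `[[ ]]` test in [1] with portable POSIX syntax:

```shell
# Hypothetical ~/.profile for /bin/sh: point ENV at a common rc file so
# interactive non-login shells read the same settings as login shells.
ENV="$HOME/.shrc"; export ENV

# Portable /bin/sh version of the PATH guard in [1]: prepend $HOME/bin
# only if the directory exists and is not already in PATH.
case ":$PATH:" in
*":$HOME/bin:"*) ;;                                  # already present
*) [ -d "$HOME/bin" ] && PATH="$HOME/bin:$PATH" ;;
esac
export PATH
```

Running the guard a second time leaves PATH unchanged, which is exactly the property wanted when the same file is sourced by both login and non-login shells.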
The right way to invoke sh from a freebsd makefile?
What's the right way to launch the bourne shell from a makefile? I had assumed the ${SHELL} variable would be set to the right copy of /bin/sh (like maybe the one in tmp or legacy at various stages). It appears that that's not the case, and ${SHELL} is whatever comes from the environment, which can lead to using csh or bash or whatever. I see some of our makefiles use just a bare sh which seems reasonable to me, but I don't want to glitch this in src/include/Makefile again. The goal is to run a script in src/include/Makefile by launching sh with the script name (as opposed to launching the script and letting the #! do its thing, which doesn't work if the source dir is mounted noexec). -- Ian
Re: The right way to invoke sh from a freebsd makefile?
On Sun, 2013-09-22 at 19:27 -0400, Glen Barber wrote: On Sun, Sep 22, 2013 at 05:18:25PM -0600, Ian Lepore wrote: What's the right way to launch the bourne shell from a makefile? I had assumed the ${SHELL} variable would be set to the right copy of /bin/sh (like maybe the one in tmp or legacy at various stages). It appears that that's not the case, and ${SHELL} is whatever comes from the environment, which can lead to using csh or bash or whatever. I see some of our makefiles use just a bare sh which seems reasonable to me, but I don't want to glitch this in src/include/Makefile again. The goal is to run a script in src/include/Makefile by launching sh with the script name (as opposed to launching the script and letting the #! do its thing, which doesn't work if the source dir is mounted noexec). I think BUILDENV_SHELL is what you are looking for. For this specific case, I think instead of '#!/bin/sh', maybe '#!/usr/bin/env sh' may be preferable. Glen No, BUILDENV_SHELL is a special thing... it's used when you make buildenv to chroot into a cross-build environment to work interactively. I added that long ago because I can't live in a csh shell (I mean, I can't do anything, I'm totally lost), and I wanted a way to have make buildenv put me right into bash (of course, you have to have bash in the chroot). The flavor of hashbang to use shouldn't matter, since what I'm after here is launching the shell to run the script without using the hashbang mechanism. -- Ian
Re: The right way to invoke sh from a freebsd makefile?
On Sun, 2013-09-22 at 19:45 -0400, Glen Barber wrote: On Sun, Sep 22, 2013 at 05:37:51PM -0600, Ian Lepore wrote: On Sun, 2013-09-22 at 19:27 -0400, Glen Barber wrote: On Sun, Sep 22, 2013 at 05:18:25PM -0600, Ian Lepore wrote: What's the right way to launch the bourne shell from a makefile? I had assumed the ${SHELL} variable would be set to the right copy of /bin/sh (like maybe the one in tmp or legacy at various stages). It appears that that's not the case, and ${SHELL} is whatever comes from the environment, which can lead to using csh or bash or whatever. I see some of our makefiles use just a bare sh which seems reasonable to me, but I don't want to glitch this in src/include/Makefile again. The goal is to run a script in src/include/Makefile by launching sh with the script name (as opposed to launching the script and letting the #! do its thing, which doesn't work if the source dir is mounted noexec). I think BUILDENV_SHELL is what you are looking for. For this specific case, I think instead of '#!/bin/sh', maybe '#!/usr/bin/env sh' may be preferable. Glen No, BUILDENV_SHELL is a special thing... it's used when you make buildenv to chroot into a cross-build environment to work interactively. I added that long ago because I can't live in a csh shell (I mean, I can't do anything, I'm totally lost), and I wanted a way to have make buildenv put me right into bash (of course, you have to have bash in the chroot). Ah, right. Thanks for the sanity check. The flavor of hashbang to use shouldn't matter, since what I'm after here is launching the shell to run the script without using the hashbang mechanism. You can hard-code /bin/sh directly, but what I was getting at with the '#!/usr/bin/env sh' is that the 'sh' interpreter of the build environment could be used (instead of /bin/sh directly). Then you don't need to worry about the path to sh(1). Glen My point is that the #! isn't used at all in this case, it doesn't matter what's there. Try this... 
echo 'echo foo' > /tmp/foo
sh /tmp/foo

Not only does it not need the hashbang, the script doesn't even have to be executable when you launch sh and name a script on the command line, which is just what's needed to run a script from a directory mounted with the noexec flag. -- Ian
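The point about the exec bit and the hashbang can be checked directly; a small demonstration (the path under /tmp and the bogus interpreter line are arbitrary):

```shell
# Write a script with a deliberately bogus interpreter line
# and no execute bit at all.
printf '#!/nonexistent/interpreter\necho foo\n' > /tmp/demo.sh
chmod a-x /tmp/demo.sh

# Direct execution would fail (no exec bit, bad #!), but naming the
# script as an argument to sh works: sh never consults the exec bit,
# and the #! line is just a comment to it.
sh /tmp/demo.sh        # prints: foo
```

This is exactly the situation on a noexec-mounted source tree: execve() on the script is forbidden, but `sh script` only open()s and reads it.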
Re: BUS_PROBE_NOWILDCARD behaviour doesn't seem to match DEVICE_PROBE(9)
On Thu, 2013-06-20 at 10:54 -0400, Ryan Stone wrote: http://www.freebsd.org/cgi/man.cgi?query=DEVICE_PROBE&apropos=0&sektion=0&manpath=FreeBSD%208.2-RELEASE&format=html DEVICE_PROBE(9) has this to say about BUS_PROBE_NOWILDCARD: The driver expects its parent to tell it which children to manage and no probing is really done. The device only matches if its parent bus specifically said to use this driver. I interpreted this as meaning that if BUS_ADD_CHILD() is called with the name parameter specifying a driver then if that driver's probe method returns BUS_PROBE_NOWILDCARD the driver will match that device. However the logic in subr_bus.c is more strict; it will only match if the unit number is also specified. This seems overly strict to me, and there appears to be at least one case in-tree where a driver will never match due to this behaviour: http://svnweb.freebsd.org/base/head/sys/dev/iicbus/iicsmb.c?revision=227843&view=markup The iicsmb driver calls BUS_ADD_CHILD() from its identify method with a wildcarded unit number (-1) but the driver specified. It then returns BUS_PROBE_NOWILDCARD from its probe method (intending that it only claim the device created in the identify method), but that won't match. I want to use the exact same pattern in a new driver. The following patch allows this to work:

diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c
index 1f3d4e8..7e48b0e 100644
--- a/sys/kern/subr_bus.c
+++ b/sys/kern/subr_bus.c
@@ -2015,7 +2015,7 @@ device_probe_child(device_t dev, device_t child)
 		 * in stone by the parent bus.
 		 */
 		if (result <= BUS_PROBE_NOWILDCARD &&
-		    child->flags & DF_WILDCARD)
+		    !(child->flags & DF_FIXEDCLASS))
 			continue;
 		best = dl;
 		pri = result;

This should be safe to do, as all devices that specified a unit number must have specified a driver, so this can't cause any devices to suddenly fail to match.
I supposed that it theoretically could cause a driver to match a device that previously it wouldn't have, but I'm having trouble seeing how somebody could add a device of type foo and not expect the foo driver to attach. Any objections if I commit this? I know this is pretty long after the fact, but it looks like this never got committed. I recently had to port some drivers written for freebsd 4 and 6 to 8.2, and some of them have no real probe mechanism and attached themselves to, like, *everything* (serial and parallel ports and so on). They're instantiated based on hints that are definitive, so I switched to returning BUS_PROBE_NOWILDCARD and sanity returned. Then I remembered this email, so I applied your patch and re-tested and everything still worked perfectly. Not exactly an exhaustive test, but at least a positive datapoint. -- Ian
Re: bin/176713: [patch] nc(1) closes network socket too soon
On Tue, 2013-07-23 at 16:48 -0700, Ronald F. Guilmette wrote: In message caj-vmonk-8v9ej0w4qycnnbkieoee9dl3btvp6vqipxkh2j...@mail.gmail.com Adrian Chadd adr...@freebsd.org wrote: Right, and your patch just stops the shutdown(), right? The shutdown that occurs when EOF is encountered on stdin, yes. Rather than teaching nc to correctly check BOTH socket states before deciding to close things. In effect, nc *is* currently checking both sockets and that is exactly the problem. It terminates (prematurely in some cases) whenever it sees an EOF _either_ from the remote host _or_ from stdin. My patch causes nc to wait for EOF from the remote server before exiting, EVEN IF prior to the time it sees that EOF from the remote server it sees an EOF (first) on stdin. This code change demonstrably makes the functionality of nc better and more pragmatically useful in typical use cases. You appear to be proposing something else, but I'm sorry to say that I cannot decipher what, exactly, you are attempting to propose. I have proposed specific code changes. If you have some different ones that you would like to propose, then I feel sure that everyone on the hackers list, including myself, would be interested to take a look at what you have in mind, and also what problem you are solving. I'd personally rather see nc taught to check to see whether it can possibly make ANY more progress before deciding to shut things down. I believe that that is exactly what the patch that I proposed does. I'm not sure why you feel otherwise. Look, there are only two scenarios... either (a) EOF arrives from stdin first or else (b) EOF arrives from the remote server first. I don't think this accurately summarizes things. The view of the remote server isn't just EOF arrives it's can't read anymore and can't write anymore which have to be handled separately. My patch causes nc to continue gathering data from the remote server (and copying it all to stdout) in case (a).
In case (b) there is no point in nc continuing to run (and/or continuing to read from stdin) if the remote server has shut down the connection. In this case, the data that nc might yet gather from its stdin channel has no place to go! So whenever nc has sensed an EOF from the remote server it can (and should) immediately shut down... and that is exactly what it is _already_ programmed to do. Here you seem to be talking about the inability to send more data to the remote side. If you exit immediately when that happens, even if you could still read from the remote side, then you may miss the incoming data that would tell you why you can't send anymore. In this case the thing to do would be to stop reading stdin, but continue to read the remote side and copy it to stdout until you get EOF reading the remote side. Conversely, you can't exit immediately when the remote side has no more to send you and shuts down that half of the connection, you still have to read from stdin and send it to the remote until EOF on stdin or the remote shuts down that half of the connection. How all this applies to netcat's ability to do connectionless (UDP) stuff probably makes the whole thing that much more interesting. BTW, earlier in the thread you asserted more or less that telnet is for interactive and nc for scripting. I virtually never use nc in any way except interactively, and I use it that way every day, all day long. -- Ian So, what problem do you want to solve that is not solved by the patch that I already proposed? Also, with respect, if you think there really is some other problem, then proposing actual concrete patches to solve that other problem would perhaps allow folks, including myself, to better understand what it is that you are driving at.
Regards, rfg
Re: rc.d scripts to control multiple instances of the same daemon?
On Tue, 2013-06-25 at 15:44 -0400, Garrett Wollman wrote: I'm in the process of (re)writing an rc.d script for kadmind (security/krb5). Unlike the main Kerberos daemon, kadmind needs to have a separate instance for each realm on the server -- it can't support multiple realms in a single process. What I need to be able to do: 1) Have different flags and pidfiles for each instance. 2) Be able to start, stop, restart, and status each individual instance by giving its name on the command line. 3) Have all instances start/stop automatically when a specific instance isn't specified. I've looked around for examples of good practice to emulate, and haven't found much. The closest to what I want looks to be vboxheadless, but I'm uncomfortable with the amount of mechanism from rc.subr that it needs to reimplement. Are there any better examples? The one like that I use the most is service netif restart fpx0 but I'm not sure the complex network stuff will be the cleanest example of anything except how to do complex network stuff. :) -- Ian
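The three requirements above can be sketched in plain sh without committing to any rc.subr mechanism. Everything here is hypothetical: the realm names, the pidfile layout, and the echo placeholders standing in for the real kadmind start/stop commands.

```shell
#!/bin/sh
# Sketch of per-instance control for a daemon like kadmind, without
# using rc.subr. A real script would run kadmind / kill instead of echo.
instances="EXAMPLE.COM TEST.ORG"        # all configured realms (made up)

one_instance() {    # $1 = action, $2 = instance (realm) name
    pidfile="/var/run/kadmind.$2.pid"   # requirement 1: per-instance pidfile
    case "$1" in
        start)  echo "starting kadmind for $2 (pidfile $pidfile)" ;;
        stop)   echo "stopping kadmind for $2" ;;
        status) echo "status of kadmind for $2" ;;
    esac
}

action="${1:-status}"
if [ $# -gt 0 ]; then shift; fi
if [ $# -gt 0 ]; then
    # requirement 2: instance name(s) on the command line act only on those
    for i in "$@"; do one_instance "$action" "$i"; done
else
    # requirement 3: no instance named, apply the action to all of them
    for i in $instances; do one_instance "$action" "$i"; done
fi
```

So `script start` iterates every realm while `script stop TEST.ORG` touches only that one, which is the dispatch shape a real rc.d script would wrap in rc.subr's run_rc_command.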
Re: Custom kernel under RPI
On Fri, 2013-03-15 at 18:21 +0100, Loïc BLOT wrote: Hi all, I don't know if this is the right list, but hackers for RPI, I think it's a good thing :D I have a little problem with a custom kernel on the RPI. I have modified the RPI-B config file to include the run/runfw driver, compiled the kernel and installed it (make buildkernel KERNCONF=RPI-B && make installkernel KERNCONF=RPI-B, from the RPI). The problem is at reboot. I can't boot on the RPI, because the kernel is frozen after those lines: Kernel entry at 0x100100 .. Kernel args: (null) Nothing after. Can someone tell me if I'm doing something wrong? Thanks in advance For arm-specific questions, the freebsd-arm list might be better (I've added it to the CC). The problem may be that it has no device-tree info. You can add fdt addr 0x100 to the /boot/loader.rc file to fix that. You can also enter it by hand at the loader prompt first to see if that helps... just hit a character (other than return) while it's loading the kernel, enter that command, then enter 'boot'. -- Ian
Re: rtprio_thread trouble
On Wed, 2013-03-06 at 09:17 -0500, John Baldwin wrote: On Thursday, February 28, 2013 2:59:16 pm Ian Lepore wrote: On Tue, 2013-02-26 at 15:29 -0500, John Baldwin wrote: On Friday, February 22, 2013 2:06:00 pm Ian Lepore wrote: I ran into some trouble with rtprio_thread() today. I have a worker thread that I want to run at idle priority most of the time, but if it falls too far behind I'd like to bump it back up to regular timeshare priority until it catches up. In a worst case, the system is continuously busy and something scheduled at idle priority is never going to run (for some definition of 'never'). What I found is that in this worst case, even after my main thread has used rtprio_thread() to change the worker thread back to RTP_PRIO_NORMAL, the worker thread never gets scheduled. This is with the 4BSD scheduler but it appears that the same would be the case with ULE, based on code inspection. I find that this fixes it for 4BSD, and I think the same would be true for ULE...

--- a/sys/kern/sched_4bsd.c	Wed Feb 13 12:54:36 2013 -0700
+++ b/sys/kern/sched_4bsd.c	Fri Feb 22 11:55:35 2013 -0700
@@ -881,6 +881,9 @@ sched_user_prio(struct thread *td, u_cha
 		return;
 	oldprio = td->td_user_pri;
 	td->td_user_pri = prio;
+	if (td->td_flags & TDF_BORROWING && td->td_priority <= prio)
+		return;
+
 	sched_priority(td, prio);
 }

 void

But I'm not sure if this would have any negative side effects, especially since in the ULE case there's a comment on this function that specifically notes that it changes the user priority without changing the current priority (but it doesn't say why that matters). Is this a reasonable way to fix this problem, or is there a better way? This will lose the priority boost afforded to interactive threads when they sleep in the kernel in the 4BSD scheduler. You aren't supposed to drop the user priority to lose this boost until userret().
You could perhaps try only altering the priority if the new user pri is lower than your current priority (and then you don't have to check TDF_BORROWING I believe):

	if (prio < td->td_priority)
		sched_priority(td, prio);

That's just the sort of insight I was looking for, thanks. That made me look at the code more and think harder about the problem I'm trying to solve, and I concluded that doing it within the scheduler is all wrong. That led me to look elsewhere, and I discovered the change you made in r228207, which does almost what I want, but your change does it only for realtime priorities, and I need a similar effect for idle priorities. What I came up with is a bit different than yours (attached below) and I'd like your thoughts on it. I start with the same test as yours: if sched_user_prio() didn't actually change the user priority (due to borrowing), do nothing. Then mine differs: call sched_prio() to effect the change only if either the old or the new priority class is not timeshare. My reasoning for the second half of the test is that if it's a change in timeshare priority then the scheduler is going to adjust that priority in a way that completely wipes out the requested change anyway, so what's the point? (If that's not true, then allowing a thread to change its own timeshare priority would subvert the scheduler's adjustments and let a cpu-bound thread monopolize the cpu; if allowed at all, that should require privileges.) On the other hand, if either the old or new priority class is not timeshare, then the scheduler doesn't make automatic adjustments, so we should honor the request and make the priority change right away.
The reason the old class gets caught up in this is the very reason I'm wanting to make a change: when thread A changes the priority of its child thread B from idle back to timeshare, thread B never actually gets moved to a timeshare-range run queue unless there are some idle cycles available to allow it to first get scheduled again as an idle thread. Finally, my change doesn't consider the td == curthread situation at all, because I don't see how that's germane. This is the thing I'm least sure of -- I don't at all understand why the old code (even before your changes) had that test. The old code had that flagged as XXX dubious (a comment a bit too cryptic to be useful). I think your change is correct. One style nit: please sort the order of variables (oldclass comes before oldpri). Thanks for the review. I've been running my change on one of our products in an 8.2 environment, and on some arm platforms running -current, and it seems to be working well. Alphabetizing: Grrr, yeah. I had it that way at first, but it just offended my sensibilities to separate related values
Re: rtprio_thread trouble
On Tue, 2013-02-26 at 15:29 -0500, John Baldwin wrote: On Friday, February 22, 2013 2:06:00 pm Ian Lepore wrote: I ran into some trouble with rtprio_thread() today. I have a worker thread that I want to run at idle priority most of the time, but if it falls too far behind I'd like to bump it back up to regular timeshare priority until it catches up. In a worst case, the system is continuously busy and something scheduled at idle priority is never going to run (for some definition of 'never'). What I found is that in this worst case, even after my main thread has used rtprio_thread() to change the worker thread back to RTP_PRIO_NORMAL, the worker thread never gets scheduled. This is with the 4BSD scheduler but it appears that the same would be the case with ULE, based on code inspection. I find that this fixes it for 4BSD, and I think the same would be true for ULE...

--- a/sys/kern/sched_4bsd.c	Wed Feb 13 12:54:36 2013 -0700
+++ b/sys/kern/sched_4bsd.c	Fri Feb 22 11:55:35 2013 -0700
@@ -881,6 +881,9 @@ sched_user_prio(struct thread *td, u_cha
 		return;
 	oldprio = td->td_user_pri;
 	td->td_user_pri = prio;
+	if (td->td_flags & TDF_BORROWING && td->td_priority <= prio)
+		return;
+
 	sched_priority(td, prio);
 }

 void

But I'm not sure if this would have any negative side effects, especially since in the ULE case there's a comment on this function that specifically notes that it changes the user priority without changing the current priority (but it doesn't say why that matters). Is this a reasonable way to fix this problem, or is there a better way? This will lose the priority boost afforded to interactive threads when they sleep in the kernel in the 4BSD scheduler. You aren't supposed to drop the user priority to lose this boost until userret().
You could perhaps try only altering the priority if the new user pri is lower than your current priority (and then you don't have to check TDF_BORROWING I believe):

	if (prio < td->td_priority)
		sched_priority(td, prio);

That's just the sort of insight I was looking for, thanks. That made me look at the code more and think harder about the problem I'm trying to solve, and I concluded that doing it within the scheduler is all wrong. That led me to look elsewhere, and I discovered the change you made in r228207, which does almost what I want, but your change does it only for realtime priorities, and I need a similar effect for idle priorities. What I came up with is a bit different than yours (attached below) and I'd like your thoughts on it. I start with the same test as yours: if sched_user_prio() didn't actually change the user priority (due to borrowing), do nothing. Then mine differs: call sched_prio() to effect the change only if either the old or the new priority class is not timeshare. My reasoning for the second half of the test is that if it's a change in timeshare priority then the scheduler is going to adjust that priority in a way that completely wipes out the requested change anyway, so what's the point? (If that's not true, then allowing a thread to change its own timeshare priority would subvert the scheduler's adjustments and let a cpu-bound thread monopolize the cpu; if allowed at all, that should require privileges.) On the other hand, if either the old or new priority class is not timeshare, then the scheduler doesn't make automatic adjustments, so we should honor the request and make the priority change right away.
The reason the old class gets caught up in this is the very reason I'm wanting to make a change: when thread A changes the priority of its child thread B from idle back to timeshare, thread B never actually gets moved to a timeshare-range run queue unless there are some idle cycles available to allow it to first get scheduled again as an idle thread. Finally, my change doesn't consider the td == curthread situation at all, because I don't see how that's germane. This is the thing I'm least sure of -- I don't at all understand why the old code (even before your changes) had that test. The old code had that flagged as XXX dubious (a comment a bit too cryptic to be useful). -- Ian

Index: sys/kern/kern_resource.c
===================================================================
--- sys/kern/kern_resource.c	(revision 247421)
+++ sys/kern/kern_resource.c	(working copy)
@@ -469,8 +469,7 @@ sys_rtprio(td, uap)
 int
 rtp_to_pri(struct rtprio *rtp, struct thread *td)
 {
-	u_char	newpri;
-	u_char	oldpri;
+	u_char	newpri, oldpri, oldclass;

 	switch (RTP_PRIO_BASE(rtp->type)) {
 	case RTP_PRIO_REALTIME:
@@ -493,11 +492,12 @@ rtp_to_pri(struct rtprio *rtp, struct thread *td)
 	}
 	thread_lock(td);
+	oldclass = td->td_pri_class;
 	sched_class(td, rtp->type);	/* XXX fix */
 	oldpri = td->td_user_pri;
 	sched_user_prio(td, newpri);
-	if (td->td_user_pri != oldpri && (td == curthread ||
-	    td
Re: TFTP single file kernel load
On Sat, 2013-02-23 at 16:28 +0100, Wojciech Puchar wrote: can it be done? converting ELF kernel (i don't use kld modules) to format that can be loaded directly over TFTP - without intermediate stages like loader(8)? just to have SINGLE FILE that tftp would load and run. no loader(8) etc. The kernel build process for arm and mips creates such a kernel as one of the standard outputs from buildkernel. That doesn't appear to be the case for x86 kernels, but you could use sys/conf/makefile.arm as a guide. Basically what needs doing is to link the kernel with a modified ldscript that doesn't add space for the program headers, and then run the output of that link through objcopy -S -O binary to create a kernel.bin file. That file can be directly loaded to the address it was linked for, and a jump to the load address launches the kernel. Whether the kernel runs properly when launched that way is a different question. An arm kernel will run that way because we haven't had the luxury of loader(8) in the arm world until recently. The x86 kernel may expect values in the environment that the loader obtained from the bios. Without a loader you may need to modify the kernel to get that information in some other way early in startup. -- Ian
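For concreteness, the two build steps described above amount to something like the following sketch. The ldscript and file names are illustrative only; the authoritative recipe is the kernel.bin rule in sys/conf/Makefile.arm:

```sh
# Link against an ldscript that reserves no space for ELF program
# headers (hypothetical script name; see sys/conf/Makefile.arm):
ld --script=ldscript.nohdr -o kernel.nohdr ${OBJS}

# Strip symbols and emit a raw memory image of the loadable segments:
objcopy -S -O binary kernel.nohdr kernel.bin
```

The resulting kernel.bin has no ELF metadata at all, which is why it can simply be copied to its link address and jumped to.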
Re: TFTP single file kernel load
On Sat, 2013-02-23 at 17:57 +0100, Wojciech Puchar wrote: Basically what needs doing is to link the kernel with a modified ldscript that doesn't add space for the program headers, and then run the output of that link through objcopy -S -O binary to create a kernel.bin file. That file can be directly loaded to the address it was linked for, and a jump to the load address launches the kernel. is btxld(8) a tool I have to use after making the kernel.bin file? what should I use for -b and -l? I've never heard of btxld before now, and from a quick look at its manpage it's not clear to me what it does. It may be a part of the x86 build process I've never noticed before. Whether the kernel runs properly when launched that way is a different question. An arm kernel will run that way because we haven't had the luxury of loader(8) in the arm world until recently. The x86 kernel may expect values in the environment that the loader obtained from the bios. it can be loaded without loader for now - if you press a key before loader(8) is loaded and enter the kernel image. at least it was like that. Oh, good point, maybe it'll just work fine (although it's been years since I last loaded an x86 kernel directly from boot2, way back before the days of acpi and smap data and all of that modern stuff). -- Ian
rtprio_thread trouble
I ran into some trouble with rtprio_thread() today. I have a worker thread that I want to run at idle priority most of the time, but if it falls too far behind I'd like to bump it back up to regular timeshare priority until it catches up. In a worst case, the system is continuously busy and something scheduled at idle priority is never going to run (for some definition of 'never'). What I found is that in this worst case, even after my main thread has used rtprio_thread() to change the worker thread back to RTP_PRIO_NORMAL, the worker thread never gets scheduled. This is with the 4BSD scheduler but it appears that the same would be the case with ULE, based on code inspection. I find that this fixes it for 4BSD, and I think the same would be true for ULE... --- a/sys/kern/sched_4bsd.c Wed Feb 13 12:54:36 2013 -0700 +++ b/sys/kern/sched_4bsd.c Fri Feb 22 11:55:35 2013 -0700 @@ -881,6 +881,9 @@ sched_user_prio(struct thread *td, u_char prio) return; oldprio = td->td_user_pri; td->td_user_pri = prio; + if (td->td_flags & TDF_BORROWING && td->td_priority <= prio) + return; + sched_priority(td, prio); } void But I'm not sure if this would have any negative side effects, especially since in the ULE case there's a comment on this function that specifically notes that it changes the user priority without changing the current priority (but it doesn't say why that matters). Is this a reasonable way to fix this problem, or is there a better way? -- Ian
why no per-thread scheduling niceness?
I'm curious why the concept of scheduling niceness applies only to an entire process, and it's not possible to have nice threads within a process. Is there any fundamental reason why it couldn't be supported with some extra bookkeeping to track niceness per thread? -- Ian
Re: Request for review, time_pps_fetch() enhancement
On Tue, 2013-02-12 at 22:34 +0200, Konstantin Belousov wrote: On Tue, Feb 12, 2013 at 09:03:39AM -0700, Ian Lepore wrote: On Sun, 2013-02-10 at 12:37 +0200, Konstantin Belousov wrote: On Sat, Feb 09, 2013 at 02:47:06PM +0100, Jilles Tjoelker wrote: On Wed, Feb 06, 2013 at 05:58:30PM +0200, Konstantin Belousov wrote: On Tue, Feb 05, 2013 at 09:41:38PM -0700, Ian Lepore wrote: I'd like feedback on the attached patch, which adds support to our time_pps_fetch() implementation for the blocking behaviors described in section 3.4.3 of RFC 2783. The existing implementation can only return the most recently captured data without blocking. These changes add the ability to block (forever or with timeout) until a new event occurs. Index: sys/kern/kern_tc.c === --- sys/kern/kern_tc.c (revision 246337) +++ sys/kern/kern_tc.c (working copy) @@ -1446,6 +1446,50 @@ * RFC 2783 PPS-API implementation. */ +static int +pps_fetch(struct pps_fetch_args *fapi, struct pps_state *pps) +{ [snip] + aseq = pps->ppsinfo.assert_sequence; + cseq = pps->ppsinfo.clear_sequence; + while (aseq == pps->ppsinfo.assert_sequence && + cseq == pps->ppsinfo.clear_sequence) { Note that compilers are allowed to optimize these accesses even over the sequence point, which is the tsleep() call. Only accesses to volatile objects are forbidden to be rearranged. I suggest to add volatile casts to pps in the loop condition. The memory pointed to by pps is global (other code may have a pointer to it); therefore, the compiler must assume that the tsleep() call (which invokes code in a different compilation unit) may modify it. Because volatile does not make concurrent access by multiple threads defined either, adding it here only seems to slow down the code (potentially). The volatile guarantees that the compiler indeed reloads the value on read access. Conceptually, the tsleep() does not modify or even access the checked fields, and the compiler is allowed to note this by whatever methods (LTO ?).
More, the standard says that an implementation is allowed to not evaluate part of the expression if no side effects are produced, even by calling a function. I agree that for practical means, the _currently_ used compilers should consider the tsleep() call as a sequence point. But then the volatile qualifier cast applied for the given access would not change the code as well. Doesn't this then imply that essentially every driver has this problem, and for that matter, every sequence of code anywhere in the base involving looping while repeatedly sleeping, then waking and checking the state of some data for changes? I sure haven't seen that many volatile qualifiers scattered around the code. No, it does not imply that every driver has this problem. A typical driver provides the mutual exclusion for access of the shared data, which means using locks. Locks include the necessary barriers to ensure the visibility of the changes, in particular the compiler barriers. Oh. I had never considered that using mutexes had other side effects. So is there a correct MI way to invoke the right barrier magic in a situation like this? -- Ian
Re: Request for review, time_pps_fetch() enhancement
On Sun, 2013-02-10 at 12:41 +0200, Konstantin Belousov wrote: On Fri, Feb 08, 2013 at 04:13:40PM -0700, Ian Lepore wrote: On Wed, 2013-02-06 at 17:58 +0200, Konstantin Belousov wrote: On Tue, Feb 05, 2013 at 09:41:38PM -0700, Ian Lepore wrote: I'd like feedback on the attached patch, which adds support to our time_pps_fetch() implementation for the blocking behaviors described in section 3.4.3 of RFC 2783. The existing implementation can only return the most recently captured data without blocking. These changes add the ability to block (forever or with timeout) until a new event occurs. -- Ian Index: sys/kern/kern_tc.c === --- sys/kern/kern_tc.c (revision 246337) +++ sys/kern/kern_tc.c (working copy) @@ -1446,6 +1446,50 @@ * RFC 2783 PPS-API implementation. */ +static int +pps_fetch(struct pps_fetch_args *fapi, struct pps_state *pps) +{ + int err, timo; + pps_seq_t aseq, cseq; + struct timeval tv; + + if (fapi->tsformat && fapi->tsformat != PPS_TSFMT_TSPEC) + return (EINVAL); + + /* + * If no timeout is requested, immediately return whatever values were + * most recently captured. If timeout seconds is -1, that's a request + * to block without a timeout. WITNESS won't let us sleep forever + * without a lock (we really don't need a lock), so just repeatedly + * sleep a long time. + */ Regarding no need for the lock, it would just move the implementation into the low quality one, for the case when one timestamp capture is lost and the caller of time_pps_fetch() sleeps until the next pps event is generated. I understand the desire to avoid the lock, esp. in the pps_event() called from the arbitrary driver context. But the race is also real. What race? A user of the pps interface understands that there is one event per second, and understands that if you ask to block until the next event at approximately the time that event is expected to occur, then it is ambiguous whether the call completes almost-immediately or in about 1 second.
Looking at it another way, if a blocking call is made right around the time of the PPS, the thread could get preempted before getting to the pps_fetch() function and not get control again until after the PPS has occurred. In that case it's going to block for about a full second, even though the call was made before top-of-second. That situation is exactly the same with or without locking, so what extra functionality is gained with locking? What guarantee does locking let us make to the caller that the lockless code doesn't? No guarantees, but I noted in the original reply that this is about the quality of the implementation and not about correctness. As I said there as well, I am not sure that any locking can be useful for the situation at all. Well then I guess I don't understand what you mean by the term quality. Apparently you use it as some form of jargon rather than its usual accepted meaning in everyday English? Or, more directly: are you implying something should be changed to make the code better? -- Ian
Re: Request for review, time_pps_fetch() enhancement
On Sun, 2013-02-10 at 12:37 +0200, Konstantin Belousov wrote: On Sat, Feb 09, 2013 at 02:47:06PM +0100, Jilles Tjoelker wrote: On Wed, Feb 06, 2013 at 05:58:30PM +0200, Konstantin Belousov wrote: On Tue, Feb 05, 2013 at 09:41:38PM -0700, Ian Lepore wrote: I'd like feedback on the attached patch, which adds support to our time_pps_fetch() implementation for the blocking behaviors described in section 3.4.3 of RFC 2783. The existing implementation can only return the most recently captured data without blocking. These changes add the ability to block (forever or with timeout) until a new event occurs. Index: sys/kern/kern_tc.c === --- sys/kern/kern_tc.c (revision 246337) +++ sys/kern/kern_tc.c (working copy) @@ -1446,6 +1446,50 @@ * RFC 2783 PPS-API implementation. */ +static int +pps_fetch(struct pps_fetch_args *fapi, struct pps_state *pps) +{ [snip] + aseq = pps->ppsinfo.assert_sequence; + cseq = pps->ppsinfo.clear_sequence; + while (aseq == pps->ppsinfo.assert_sequence && + cseq == pps->ppsinfo.clear_sequence) { Note that compilers are allowed to optimize these accesses even over the sequence point, which is the tsleep() call. Only accesses to volatile objects are forbidden to be rearranged. I suggest to add volatile casts to pps in the loop condition. The memory pointed to by pps is global (other code may have a pointer to it); therefore, the compiler must assume that the tsleep() call (which invokes code in a different compilation unit) may modify it. Because volatile does not make concurrent access by multiple threads defined either, adding it here only seems to slow down the code (potentially). The volatile guarantees that the compiler indeed reloads the value on read access. Conceptually, the tsleep() does not modify or even access the checked fields, and the compiler is allowed to note this by whatever methods (LTO ?).
More, the standard says that an implementation is allowed to not evaluate part of the expression if no side effects are produced, even by calling a function. I agree that for practical means, the _currently_ used compilers should consider the tsleep() call as a sequence point. But then the volatile qualifier cast applied for the given access would not change the code as well. Doesn't this then imply that essentially every driver has this problem, and for that matter, every sequence of code anywhere in the base involving looping while repeatedly sleeping, then waking and checking the state of some data for changes? I sure haven't seen that many volatile qualifiers scattered around the code. -- Ian + err = tsleep(pps, PCATCH, "ppsfch", timo); + if (err == EWOULDBLOCK && fapi->timeout.tv_sec == -1) { + continue; + } else if (err != 0) { + return (err); + } + } + } -- Jilles Tjoelker
Re: Reviewing a FAQ change about LORs
On Thu, 2013-02-07 at 19:32 -0500, Eitan Adler wrote: Does someone here mind reviewing http://www.freebsd.org/cgi/query-pr.cgi?pr=174226 for correctness. Please feel free to post alternate diffs as a reply as well. Does it make sense to reference a web page on LOR status that hasn't been updated in four years? -- Ian
Re: Request for review, time_pps_fetch() enhancement
On Wed, 2013-02-06 at 17:58 +0200, Konstantin Belousov wrote: On Tue, Feb 05, 2013 at 09:41:38PM -0700, Ian Lepore wrote: I'd like feedback on the attached patch, which adds support to our time_pps_fetch() implementation for the blocking behaviors described in section 3.4.3 of RFC 2783. The existing implementation can only return the most recently captured data without blocking. These changes add the ability to block (forever or with timeout) until a new event occurs. -- Ian Index: sys/kern/kern_tc.c === --- sys/kern/kern_tc.c (revision 246337) +++ sys/kern/kern_tc.c (working copy) @@ -1446,6 +1446,50 @@ * RFC 2783 PPS-API implementation. */ +static int +pps_fetch(struct pps_fetch_args *fapi, struct pps_state *pps) +{ + int err, timo; + pps_seq_t aseq, cseq; + struct timeval tv; + + if (fapi->tsformat && fapi->tsformat != PPS_TSFMT_TSPEC) + return (EINVAL); + + /* + * If no timeout is requested, immediately return whatever values were + * most recently captured. If timeout seconds is -1, that's a request + * to block without a timeout. WITNESS won't let us sleep forever + * without a lock (we really don't need a lock), so just repeatedly + * sleep a long time. + */ Regarding no need for the lock, it would just move the implementation into the low quality one, for the case when one timestamp capture is lost and the caller of time_pps_fetch() sleeps until the next pps event is generated. I understand the desire to avoid the lock, esp. in the pps_event() called from the arbitrary driver context. But the race is also real. What race? A user of the pps interface understands that there is one event per second, and understands that if you ask to block until the next event at approximately the time that event is expected to occur, then it is ambiguous whether the call completes almost-immediately or in about 1 second.
Looking at it another way, if a blocking call is made right around the time of the PPS, the thread could get preempted before getting to the pps_fetch() function and not get control again until after the PPS has occurred. In that case it's going to block for about a full second, even though the call was made before top-of-second. That situation is exactly the same with or without locking, so what extra functionality is gained with locking? What guarantee does locking let us make to the caller that the lockless code doesn't? + if (fapi->timeout.tv_sec || fapi->timeout.tv_nsec) { + if (fapi->timeout.tv_sec == -1) + timo = 0x7fffffff; + else { + tv.tv_sec = fapi->timeout.tv_sec; + tv.tv_usec = fapi->timeout.tv_nsec / 1000; + timo = tvtohz(&tv); + } + aseq = pps->ppsinfo.assert_sequence; + cseq = pps->ppsinfo.clear_sequence; + while (aseq == pps->ppsinfo.assert_sequence && + cseq == pps->ppsinfo.clear_sequence) { Note that compilers are allowed to optimize these accesses even over the sequence point, which is the tsleep() call. Only accesses to volatile objects are forbidden to be rearranged. I suggest to add volatile casts to pps in the loop condition. Thank you. I pondered volatility, but was under the impression that the function call took care of it. I'll fix that.
-- Ian + err = tsleep(pps, PCATCH, "ppsfch", timo); + if (err == EWOULDBLOCK && fapi->timeout.tv_sec == -1) { + continue; + } else if (err != 0) { + return (err); + } + } + } + + pps->ppsinfo.current_mode = pps->ppsparam.mode; + fapi->pps_info_buf = pps->ppsinfo; + + return (0); +} + int pps_ioctl(u_long cmd, caddr_t data, struct pps_state *pps) { @@ -1485,13 +1529,7 @@ return (0); case PPS_IOC_FETCH: fapi = (struct pps_fetch_args *)data; - if (fapi->tsformat && fapi->tsformat != PPS_TSFMT_TSPEC) - return (EINVAL); - if (fapi->timeout.tv_sec || fapi->timeout.tv_nsec) - return (EOPNOTSUPP); - pps->ppsinfo.current_mode = pps->ppsparam.mode; - fapi->pps_info_buf = pps->ppsinfo; - return (0); + return (pps_fetch(fapi, pps)); #ifdef FFCLOCK case PPS_IOC_FETCH_FFCOUNTER: fapi_ffc = (struct pps_fetch_ffc_args *)data; @@ -1540,7 +1578,7 @@ void pps_init(struct pps_state *pps) { - pps->ppscap |= PPS_TSFMT_TSPEC; + pps->ppscap |= PPS_TSFMT_TSPEC | PPS_CANWAIT; if (pps->ppscap & PPS_CAPTUREASSERT) pps->ppscap |= PPS_OFFSETASSERT; if (pps->ppscap & PPS_CAPTURECLEAR) @@ -1680,6 +1718,9 @@ hardpps(tsp, ts.tv_nsec
fcntl(2) F_READAHEAD set to zero doesn't work [patch]
I discovered today that fcntl(fd, F_READAHEAD, 0) doesn't work as advertised. It's supposed to disable readahead, but instead it restores the default readahead behavior (if it had previously been changed), and there is no way to disable readahead.[1] I think the attached patch fixes it, but it's not immediately clear from the patch why; here's the deal... The amount of readahead is calculated by sequential_heuristic() in vfs_vnops.c. If the FRDAHEAD flag is set on the file it uses the value stored in the file's f_seqcount, otherwise it calculates a value (and updates f_seqcount, which doesn't ever happen when FRDAHEAD is set). So the patch causes the FRDAHEAD flag to be set even in the case of the readahead amount being zero. Because it seems like a useful concept, it still allows the readahead to be restored to default behavior, now by passing a negative value. Does this look right to those of you who understand this part of the system better than I do? -- Ian [1] No way using F_READAHEAD; I know about POSIX_FADV_RANDOM. Index: sys/kern/kern_descrip.c === --- sys/kern/kern_descrip.c (revision 246337) +++ sys/kern/kern_descrip.c (working copy) @@ -776,7 +776,7 @@ } fhold(fp); FILEDESC_SUNLOCK(fdp); - if (arg != 0) { + if (arg >= 0) { vp = fp->f_vnode; error = vn_lock(vp, LK_SHARED); if (error != 0) { Index: lib/libc/sys/fcntl.2 === --- lib/libc/sys/fcntl.2 (revision 246337) +++ lib/libc/sys/fcntl.2 (working copy) @@ -28,7 +28,7 @@ .\" @(#)fcntl.2 8.2 (Berkeley) 1/12/94 .\" $FreeBSD$ .\" -.Dd July 27, 2012 +.Dd February 8, 2013 .Dt FCNTL 2 .Os .Sh NAME @@ -171,7 +171,7 @@ which is rounded up to the nearest block size. A zero value in .Fa arg -turns off read ahead. +turns off read ahead, a negative value restores the system default.
.It Dv F_RDAHEAD Equivalent to Darwin counterpart which sets read ahead amount of 128KB when the third argument,
Request for review, time_pps_fetch() enhancement
I'd like feedback on the attached patch, which adds support to our time_pps_fetch() implementation for the blocking behaviors described in section 3.4.3 of RFC 2783. The existing implementation can only return the most recently captured data without blocking. These changes add the ability to block (forever or with timeout) until a new event occurs. -- Ian Index: sys/kern/kern_tc.c === --- sys/kern/kern_tc.c (revision 246337) +++ sys/kern/kern_tc.c (working copy) @@ -1446,6 +1446,50 @@ * RFC 2783 PPS-API implementation. */ +static int +pps_fetch(struct pps_fetch_args *fapi, struct pps_state *pps) +{ + int err, timo; + pps_seq_t aseq, cseq; + struct timeval tv; + + if (fapi->tsformat && fapi->tsformat != PPS_TSFMT_TSPEC) + return (EINVAL); + + /* + * If no timeout is requested, immediately return whatever values were + * most recently captured. If timeout seconds is -1, that's a request + * to block without a timeout. WITNESS won't let us sleep forever + * without a lock (we really don't need a lock), so just repeatedly + * sleep a long time.
+ */ + if (fapi->timeout.tv_sec || fapi->timeout.tv_nsec) { + if (fapi->timeout.tv_sec == -1) + timo = 0x7fffffff; + else { + tv.tv_sec = fapi->timeout.tv_sec; + tv.tv_usec = fapi->timeout.tv_nsec / 1000; + timo = tvtohz(&tv); + } + aseq = pps->ppsinfo.assert_sequence; + cseq = pps->ppsinfo.clear_sequence; + while (aseq == pps->ppsinfo.assert_sequence && + cseq == pps->ppsinfo.clear_sequence) { + err = tsleep(pps, PCATCH, "ppsfch", timo); + if (err == EWOULDBLOCK && fapi->timeout.tv_sec == -1) { + continue; + } else if (err != 0) { + return (err); + } + } + } + + pps->ppsinfo.current_mode = pps->ppsparam.mode; + fapi->pps_info_buf = pps->ppsinfo; + + return (0); +} + int pps_ioctl(u_long cmd, caddr_t data, struct pps_state *pps) { @@ -1485,13 +1529,7 @@ return (0); case PPS_IOC_FETCH: fapi = (struct pps_fetch_args *)data; - if (fapi->tsformat && fapi->tsformat != PPS_TSFMT_TSPEC) - return (EINVAL); - if (fapi->timeout.tv_sec || fapi->timeout.tv_nsec) - return (EOPNOTSUPP); - pps->ppsinfo.current_mode = pps->ppsparam.mode; - fapi->pps_info_buf = pps->ppsinfo; - return (0); + return (pps_fetch(fapi, pps)); #ifdef FFCLOCK case PPS_IOC_FETCH_FFCOUNTER: fapi_ffc = (struct pps_fetch_ffc_args *)data; @@ -1540,7 +1578,7 @@ void pps_init(struct pps_state *pps) { - pps->ppscap |= PPS_TSFMT_TSPEC; + pps->ppscap |= PPS_TSFMT_TSPEC | PPS_CANWAIT; if (pps->ppscap & PPS_CAPTUREASSERT) pps->ppscap |= PPS_OFFSETASSERT; if (pps->ppscap & PPS_CAPTURECLEAR) @@ -1680,6 +1718,9 @@ hardpps(tsp, ts.tv_nsec + 1000000000 * ts.tv_sec); } #endif + + /* Wakeup anyone sleeping in pps_fetch(). */ + wakeup(pps); } /*
Re: Sockets programming question
On Mon, 2013-01-28 at 18:02 +0200, Konstantin Belousov wrote: On Mon, Jan 28, 2013 at 08:11:47AM -0700, Ian Lepore wrote: I've got a question that isn't exactly freebsd-specific, but implementation-specific behavior may be involved. I've got a server process that accepts connections from clients on a PF_LOCAL stream socket. Multiple clients can be connected at once; a list of them is tracked internally. The server occasionally sends data to each client. The time between messages to clients can range literally from milliseconds to months. Clients never send any data to the server, indeed they may shutdown that side of the connection and just receive data. The only way I can find to discover that a client has disappeared is by trying to send them a message and getting an error because they've closed the socket or died completely. At that point I can reap the resources and remove them from the client list. This is a problem because of the months between messages thing. A lot of clients can come and go during those months and I've got this ever-growing list of open socket descriptors because I never had anything to say the whole time they were connected. By trial and error I've discovered that I can sort of poll for their presence by writing a zero-length message. If the other end of the connection is gone I get the expected error and can reap the client, otherwise it appears to quietly write nothing and return zero and have no other side effects than polling the status of the server-to-client side of the pipe. My problem with this polling is that I can't find anything in writing that sanctions this behavior. Would this amount to relying on a non-portable accident of the current implementation? Also, am I missing something simple and there's a canonical way to handle this? In all the years I've done client/server stuff I've never had quite this type of interaction (or lack thereof) between client and server before. Check for the IN events as well.
I would not trust the mere presence of the IN in the poll result, but a subsequent read should return EOF and this is a good indicator of the dead client. You can't use EOF on a read() to determine client life when the nature of the client/server relationship is that clients are allowed to shutdown(fd, SHUT_WR) as soon as they connect because they expect to receive but never send any data. On the other hand, Alfred's suggestion of using poll(2) rather than select(2) worked perfectly. Polling with an events mask of zero results in it returning POLLHUP in revents if the client has closed the socket. -- Ian
Sockets programming question
I've got a question that isn't exactly freebsd-specific, but implementation-specific behavior may be involved. I've got a server process that accepts connections from clients on a PF_LOCAL stream socket. Multiple clients can be connected at once; a list of them is tracked internally. The server occasionally sends data to each client. The time between messages to clients can range literally from milliseconds to months. Clients never send any data to the server, indeed they may shutdown that side of the connection and just receive data. The only way I can find to discover that a client has disappeared is by trying to send them a message and getting an error because they've closed the socket or died completely. At that point I can reap the resources and remove them from the client list. This is a problem because of the months between messages thing. A lot of clients can come and go during those months and I've got this ever-growing list of open socket descriptors because I never had anything to say the whole time they were connected. By trial and error I've discovered that I can sort of poll for their presence by writing a zero-length message. If the other end of the connection is gone I get the expected error and can reap the client, otherwise it appears to quietly write nothing and return zero and have no other side effects than polling the status of the server-to-client side of the pipe. My problem with this polling is that I can't find anything in writing that sanctions this behavior. Would this amount to relying on a non-portable accident of the current implementation? Also, am I missing something simple and there's a canonical way to handle this? In all the years I've done client/server stuff I've never had quite this type of interaction (or lack thereof) between client and server before. -- Ian
Re: NMI watchdog functionality on Freebsd
On Wed, 2013-01-23 at 08:47 -0800, Matthew Jacob wrote: On 1/23/2013 7:25 AM, John Baldwin wrote: On Tuesday, January 22, 2013 5:40:55 pm Sushanth Rai wrote: Hi, Does freebsd have some functionality similar to Linux's NMI watchdog? I'm aware of the ichwd driver, but that depends on a WDT being available in the hardware. Even when it is available, the BIOS needs to support a mechanism to trigger an OS level recovery to get any useful information when the system is really wedged (with interrupts disabled) The principal purpose of a watchdog is to keep the system from hanging. Information is secondary. The ichwd driver can use the LPC part of ICH hardware that's been there since ICH version 4. I implemented this more fully at Panasas. The first importance is to keep the system from being hung. The next piece of information is to detect, on reboot, that a watchdog event occurred. Finally, trying to isolate why is good. This is equivalent to the tco_WDT stuff on Linux. It's not interrupt driven (it drives the reset line on the processor). I think there's value in the NMI watchdog idea, but unless you back it up with a real hardware watchdog you don't really have full watchdog functionality. If the NMI can get the OS to produce some extra info, that's great, and using an NMI gives you a good chance of doing that even if it is normal interrupt processing that has wedged the machine. But calling panic() invokes plenty of processing that can get wedged in other ways, so even an NMI-based watchdog isn't guaranteed to get the machine running again. But adding a real hardware watchdog that fires on a slightly longer timeout than the NMI watchdog gives you the best of everything: you get information if it's possible to produce it, and you get a real hardware reset shortly thereafter if producing the info fails. -- Ian
Re: IBM blade server abysmal disk write performances
On Fri, 2013-01-18 at 20:37 +0100, Wojciech Puchar wrote: disk would write data I suspect that I'm encountering situations right now at netflix where this advice is not true. I have drives that are seeing intermittent errors, then being forced into reset after a timeout, and then coming back up with filesystem problems. It's only a suspicion at this point, not a confirmed case. true. I just assumed that anywhere it matters one would use gmirror. As for myself - i always prefer to put different manufacturers' drives in a gmirror or at least - not manufactured at a similar time. That is good advice. I bought six 1TB drives at the same time a few years ago and received drives with consecutive serial numbers. They were all part of the same array, and they all failed (click of death) within a six-hour timespan of each other. Luckily I noticed the clicking right away and was able to get all the data copied to another array within a few hours, before they all died. -- Ian 2 fails at the same moment is rather unlikely. Of course - everything is possible so i do proper backups to remote sites. Remote means another city.
Re: IBM blade server abysmal disk write performances
On Fri, 2013-01-18 at 22:18 +0100, Wojciech Puchar wrote: and anyone who enabled SATA WC or complained about I/O slowness would be forced into Siberian salt mines for the remainder of their lives. so reserve a place for me there. Yeah, me too. I prefer to go for all-out performance with separate risk mitigation strategies. I wouldn't set up a client datacenter that way, but it's wholly appropriate for what I do with this machine. -- Ian
Re: Failsafe on kernel panic
On Thu, 2013-01-17 at 08:38 +0200, Sami Halabi wrote: btw: i don't see any options in my kernel config for KDB / Unattended, the only thing that mentions it is: device ukbd Sami I think if you don't have any kdb options turned on, then a panic should automatically store a crashdump to swap, then reboot the machine. If that's not working, perhaps it locks up trying to store the dump? If the hardware has a watchdog timer, enabling that might be the best way to ensure a reboot on any kind of crash or hang. -- Ian On Thu, Jan 17, 2013 at 6:45 AM, Sami Halabi sodyn...@gmail.com wrote: Its only a kernel option? There is no flag to pass to the loader? SAMI On Jan 17, 2013 05:18, Ian Lepore i...@freebsd.org wrote: On Wed, 2013-01-16 at 23:27 +0200, Sami Halabi wrote: Thank you for your response, very helpful. one question - how do i configure auto-reboot once a kernel panic occurs? Sami From src/sys/conf/NOTES, this may be what you're looking for... # # Don't enter the debugger for a panic. Intended for unattended operation # where you may want to enter the debugger from the console, but still want # the machine to recover from a panic. # options KDB_UNATTENDED But I think it only has meaning if you have option KDB in effect, otherwise it should just reboot itself after a 15 second pause. -- Ian On Wed, Jan 16, 2013 at 10:13 PM, John Baldwin j...@freebsd.org wrote: On Wednesday, January 16, 2013 2:25:33 pm Sami Halabi wrote: Hi everyone, I have a production box, in which I want to install a new kernel without any remote KVM. my problem is it's 2 hours away, and if a kernel panic occurs I've got a problem. I wonder if I can set a failsafe script to load the old kernel in case of panic.
man nextboot (if you are using UFS) -- John Baldwin -- Sami Halabi Information Systems Engineer NMS Projects Expert FreeBSD SysAdmin Expert
Re: Failsafe on kernel panic
On Wed, 2013-01-16 at 23:27 +0200, Sami Halabi wrote: Thank you for your response, very helpful. One question - how do I configure auto-reboot once a kernel panic occurs? Sami

From src/sys/conf/NOTES, this may be what you're looking for... # # Don't enter the debugger for a panic. Intended for unattended operation # where you may want to enter the debugger from the console, but still want # the machine to recover from a panic. # options KDB_UNATTENDED But I think it only has meaning if you have option KDB in effect; otherwise it should just reboot itself after a 15 second pause. -- Ian

On Wed, Jan 16, 2013 at 10:13 PM, John Baldwin j...@freebsd.org wrote: On Wednesday, January 16, 2013 2:25:33 pm Sami Halabi wrote: Hi everyone, I have a production box on which I want to install a new kernel without any remote KVM. My problem is it's 2 hours away, and if a kernel panic occurs I've got a problem. I wonder if I can set up a failsafe script to load the old kernel in case of panic.

man nextboot (if you are using UFS) -- John Baldwin
Re: Getting the current thread ID without a syscall?
On Tue, 2013-01-15 at 14:29 -0800, Alfred Perlstein wrote: On 1/15/13 1:43 PM, Konstantin Belousov wrote: On Tue, Jan 15, 2013 at 04:35:14PM -0500, Trent Nelson wrote: Luckily it's for an open source project (Python), so recompilation isn't a big deal. (I also check the intrinsic result versus the syscall result during startup to verify the same ID is returned, falling back to the syscall by default.)

For you, maybe. For your users, it definitely will be a problem. And worse, the problem will be blamed on the operating system and not on the broken application.

Anything we can do to avoid this would be best. The reason is that we are still dealing with an optimization that perl did: it reached inside of the opaque struct FILE to do nasty things. Now it is very difficult for us to fix struct FILE. We are still paying for this years later. Any way we can make this a supported interface? -Alfred

Re-reading the original question, I've got to ask why pthread_self() isn't the right answer? The requirement wasn't "I need to know what the OS calls me", it was "I need a unique ID per thread within a process". -- Ian
Re: kgzip(1) is broken
On Tue, 2013-01-15 at 13:27 -0800, dte...@freebsd.org wrote: Hello, I have been sad of-late because kgzip(1) no longer produces a usable kernel. All versions of 9.x suffer this. And somewhere between 8.3-RELEASE-p1 and 8.3-RELEASE-p5 this recently broke in the 8.x series. I haven't tried the 7 series lately, but if whatever is making the rounds gets MFC'd that far back, I expect the problem to percolate there too. The symptom is that the machine reboots immediately and unexpectedly the moment the kernel is executed by the loader. This is quite troubling and I am looking for someone to help find the culprit. I don't know where to start looking. Here are some possible candidates from the things that were MFC'd to 8 in that timeframe. I haven't looked at what these do, they're just changes that affect files related to booting. r233211 r233377 r233469 r234563 -- Ian
RE: kgzip(1) is broken
On Tue, 2013-01-15 at 16:10 -0800, Devin Teske wrote: -Original Message- From: Devin Teske [mailto:devin.te...@fisglobal.com] On Behalf Of dte...@freebsd.org Sent: Tuesday, January 15, 2013 3:10 PM To: 'Ian Lepore' Cc: freebsd-hackers@freebsd.org; dte...@freebsd.org Subject: RE: kgzip(1) is broken -Original Message- From: Ian Lepore [mailto:free...@damnhippie.dyndns.org] Sent: Tuesday, January 15, 2013 3:05 PM To: dte...@freebsd.org Cc: freebsd-hackers@freebsd.org Subject: Re: kgzip(1) is broken On Tue, 2013-01-15 at 13:27 -0800, dte...@freebsd.org wrote: Hello, I have been sad of-late because kgzip(1) no longer produces a usable kernel. All versions of 9.x suffer this. And somewhere between 8.3-RELEASE-p1 and 8.3-RELEASE-p5 this recently broke in the 8.x series. I haven't tried the 7 series lately, but if whatever is making the rounds gets MFC'd that far back, I expect the problem to percolate there too. The symptom is that the machine reboots immediately and unexpectedly the moment the kernel is executed by the loader. This is quite troubling and I am looking for someone to help find the culprit. I don't know where to start looking. Here are some possible candidates from the things that were MFC'd to 8 in that timeframe. I haven't looked at what these do, they're just changes that affect files related to booting. r233211 r233377 r233469 r234563 Thanks Ian! I'll test each one individually to see if regressing any one (or all) addresses the problem. Progress... Looks like I found the culprit. Turns out it's a back-ported bxe(4) driver (back-ported from 9 -- where kgzip seems to never work). I wonder why back-porting bxe(4) from stable/9 to releng/8.3 would cause kgzip to produce non-working kernels. Yeah, it'll be interesting to see how a device driver can lead to the machine reboots immediately and unexpectedly the moment the kernel is executed by the loader, which I took to mean before seeing the copyright or anything. 
I'm emailing the maintainers (davidch + other Broadcom folk) -- Ian
Re: IBM blade server abysmal disk write performances
On Tue, 2013-01-15 at 15:28 -0500, Karim Fodil-Lemelin wrote: On 15/01/2013 3:03 PM, Dieter BSD wrote: Disabling the disks' write cache is *required* for data integrity. One op per rev means write caching is disabled and no queueing. But dmesg claims Command Queueing enabled, so you should be getting more than one op per rev, and writes should be fast. Is this dd to the raw drive, or to a filesystem? (FFS? ZFS? other?) Are you running compression, encryption, or some other feature that might slow things down? Also, try dd with a larger block size, like bs=1m.

Hi, Thanks to everyone that answered so far. Here is a follow up. dd to the raw drive and no compression/encryption or some other features, just a naive boot off a live 9.1 CD then dd (see below). The following results have been gathered on the FreeBSD 9.1 system:

You say dd with a raw drive, but as several people have pointed out, linux dd doesn't go directly to the drive by default. It looks like you can make it do so with the direct option, which should make it behave the same as freebsd's dd behaves by default (I think, I'm no linux expert). For example, using a usb thumb drive:

th2 # dd if=/dev/sdb4 of=/dev/null count=100
100+0 records in
100+0 records out
51200 bytes (51 kB) copied, 0.0142396 s, 3.6 MB/s
th2 # dd if=/dev/sdb4 of=/dev/null count=100 iflag=direct
100+0 records in
100+0 records out
51200 bytes (51 kB) copied, 0.0628582 s, 815 kB/s

Hmm, just before hitting send I saw your other response that SAS drives behave badly, SATA are fine. That does seem to point away from dd behavior. It might still be interesting to see if the direct flag on linux drops performance into the same horrible range as freebsd with SAS. -- Ian
Re: Proper way to determine place of system sources in makefile?
On Sun, 2013-01-06 at 22:17 +0400, Lev Serebryakov wrote: Hello, Hackers. I'm writing some code which is built outside of the system sources but depends on them. I'm using the FreeBSD mk infrastructure. When the code is a kernel module (uses bsd.kmod.mk) there is the SYSDIR variable. But what is the proper way to refer to the system sources when the makefile is for a shared library (bsd.lib.mk) or a program (bsd.prog.mk)?

That may depend on what you mean by system sources. In particular, some header files which are generated during the build don't live under /usr/src/sys, they're in $OBJDIR/sys/kernconf/. I was struggling with how to include such a file (in a non-hacky way) while building a bootloader from sys/boot/arm the other day, and I never did come up with a clean answer. (I do understand why -- the header files I wanted have content that changes based on KERNCONF=, and sys/boot is built during buildworld, not buildkernel.) -- Ian
Re: Another WTF moment
On Sun, 2012-12-16 at 12:01 -0800, Ronald F. Guilmette wrote: I have two Seagate ST380011A drives, both in the same single system. On that system, I boot to the FreeBSD 9.1-RC3 LiveCD. The resulting dmesg messages indicate the following regarding the two drives:

ada0 at ata0 bus 0 scbus2 target 0 lun 0
ada0: ST380011A 3.54 ATA-6 device
ada0: 100.000MB/s transfers (UDMA5, PIO 8192bytes)
ada0: 76318MB (156299375 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad0
ada1 at ata0 bus 0 scbus2 target 1 lun 0
ada1: ST380011A 3.06 ATA-6 device
ada1: 100.000MB/s transfers (UDMA5, PIO 8192bytes)
ada1: 76319MB (156301488 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad1

So, um, WTF? One ST380011A is 156299375 sectors big, and the other one is 156301488 big. How exactly does this happen?

Assuming the 3.06 and 3.54 are firmware revision numbers, one might speculate that ongoing testing showed higher sector failure rates than initially expected, and thus newer firmware sets aside a few more sectors as spares. -- Ian
Re: [RFQ] make witness panic an option
On Wed, 2012-11-14 at 22:15 -0800, Adrian Chadd wrote: Hi all, When debugging and writing wireless drivers/stack code, I like to sprinkle lots of locking assertions everywhere. However, this does cause things to panic quite often during active development. This patch (against stable/9) makes the actual panic itself configurable. It still prints the message regardless. This has allowed me to sprinkle more locking assertions everywhere to investigate whether particular paths have been hit or not. I don't necessarily want those to panic the kernel. I'd like everyone to consider this for FreeBSD-HEAD. Thanks! I strongly support this, because I'm tired of having to hack it in by hand every time I need it. You can't boot an arm platform right now (on freebsd 8, 9, or 10) without a LOR very early in the boot. Once you get past that, 2 or 3 device drivers I use panic way before we even get to mounting root. Those panics can clearly be ignored, because we've been shipping products for years based on this code. (It's on my to-do list to fix them, but more pressing problems are higher on the list.) When a new problem crops up that isn't harmless, it totally sucks that I can't just turn on witness without first hacking the code to make the known problems non-panicky. -- Ian
Re: [RFQ] make witness panic an option
On Thu, 2012-11-15 at 17:47 +0000, Attilio Rao wrote: On 11/15/12, Ian Lepore free...@damnhippie.dyndns.org wrote: On Wed, 2012-11-14 at 22:15 -0800, Adrian Chadd wrote: Hi all, When debugging and writing wireless drivers/stack code, I like to sprinkle lots of locking assertions everywhere. However, this does cause things to panic quite often during active development. This patch (against stable/9) makes the actual panic itself configurable. It still prints the message regardless. This has allowed me to sprinkle more locking assertions everywhere to investigate whether particular paths have been hit or not. I don't necessarily want those to panic the kernel. I'd like everyone to consider this for FreeBSD-HEAD. Thanks!

I strongly support this, because I'm tired of having to hack it in by hand every time I need it. You can't boot an arm platform right now (on freebsd 8, 9, or 10) without a LOR very early in the boot. Once you get past that, 2 or 3 device drivers I use panic way before we even get to mounting root. Those panics can clearly be ignored, because we've been shipping products for years based on this code. (It's on my to-do list to fix them, but more pressing problems are higher on the list.)

This is a ridiculous motivation. What are the panics in question? Why are they not fixed yet? Without WITNESS_KDB you should not panic even in cases where WITNESS yells. So if you do, it means there is a more subtle breakage going on here. Do you really think that an abusable mechanism will help here rather than fixing the actual problems?

When a new problem crops up that isn't harmless, it totally sucks that I can't just turn on witness without first hacking the code to make the known problems non-panicky.

I really don't understand what these harmless problems are here. I just know one and it is between the dirhash lock and the bufwait lock for UFS, which is carefully documented in the code comments.
All the other cases haven't been analyzed deeply enough to quantify them as harmless. Can you please give real examples?

No. Since you've made it abundantly clear in this thread that you are not open to anyone else's opinion and won't change your mind, I'm not going to waste even 10 seconds explaining my perfectly valid needs. I'll just keep hacking the code up to not panic when I need to. -- Ian
Re: Give users a hint when their locate database is too small.
On Tue, 2012-11-13 at 11:05 -0500, Eitan Adler wrote: On 13 November 2012 10:58, Eitan Adler li...@eitanadler.com wrote: Okay... sorry for the spam. I remember there was a reason I used /etc/periodic/weekly/310.locate instead of /usr/libexec/locate.updatedb. The latter must not be run as root, and the former takes care of this work. Since the default is to enable weekly updates I am inclined to use the 310.locate script instead. Would it work to refer them to the locate.updatedb manpage (which references the periodic script, and presumably would be kept up to date with any script renaming/numbering)? -- Ian
Re: Memory reserves or lack thereof
On Mon, 2012-11-12 at 13:18 +0100, Andre Oppermann wrote: Well, what's the current set of best practices for allocating mbufs? If an allocation is driven by user space then you can use M_WAITOK. If an allocation is driven by the driver or kernel (callout and so on) you do M_NOWAIT and handle a failure by trying again later, either directly by rescheduling the callout or by the upper layer retransmit logic. On top of that, individual mbuf allocation or stitching mbufs and clusters together manually is deprecated. If at all possible you should use m_getm2().

root@pico:/root # man m_getm2
No manual entry for m_getm2

So when you say manually stitching mbufs together is deprecated, I take it you mean in the case where you're letting the mbuf routines allocate the actual buffer space for you? I've got an ethernet driver on an ARM SoC in which the hardware receives into a series of buffers fixed at 128 bytes. Right now the code is allocating a cluster and then looping using m_append() to reassemble these buffers back into a full contiguous frame in a cluster. I was going to have a shot at using MEXTADD() to manually string the series of hardware/dma buffers together without copying the data. Is that sort of usage still a good idea? (And would it actually be a performance win? If I hand it off to the net stack and an m_pullup() or similar is going to happen along the way anyway, I might as well do it at driver level.) -- Ian
Re: watchdogd, jemalloc, and mlockall
On Sat, 2012-11-03 at 12:50 -0600, Ian Lepore wrote: On Sat, 2012-11-03 at 20:41 +0200, Konstantin Belousov wrote: On Sat, Nov 03, 2012 at 12:38:39PM -0600, Ian Lepore wrote: In an attempt to un-hijack the thread about memory usage increase between 6.4 and 9.x, I'm starting a new thread here related to my recent discovery that watchdogd uses a lot more memory since it began using mlockall(2). I tried statically linking watchdogd and it made a small difference in RSS, presumably because it doesn't wire down all of libc and libm. VSZ RSS 10236 10164 Dynamic 8624 8636 Static Those numbers are from ps -u on an arm platform. I just updated the PR (bin/173332) with some procstat -v output comparing with/without mlockall(). It appears that the bulk of the new RSS bloat comes from jemalloc allocating vmspace in 8MB chunks. With mlockall(MCL_FUTURE) in effect that leads to wiring 8MB to satisfy what probably amounts to a few hundred bytes of malloc'd memory. It would probably also be a good idea to remove the floating point from watchdogd to avoid wiring all of libm. The floating point is used just to turn the timeout-in-seconds into a power-of-two-nanoseconds value. There's probably a reasonably efficient way to do that without calling log(), considering that it only happens once at program startup. No, I propose to add a switch to turn on/off the mlockall() call. I have no opinion on the default value of the suggested switch. In a patch I submitted along with the PR, I added code to query the vm.swap_enabled sysctl and only call mlockall() when swapping is enabled. Nobody yet has said anything about what seems to me to be the real problem here: jemalloc grabs 8MB at a time even if you only need to malloc a few bytes, and there appears to be no way to control that behavior. Or maybe there's a knob in there that didn't jump out at me on a quick glance through the header files. I finally found some time to pursue this further. 
A small correction to what I said earlier: it appears that jemalloc allocates chunks of 4MB at a time, not 8, but it also appears that it allocates at least 2 chunks, so the net effect is an 8MB default minimum allocation. I played with the jemalloc tuning option lg_chunk and with static versus dynamic linking, and came up with the numbers below, which were generated by ps -u on an ARM-based system with 64MB running -current from a couple weeks ago, but with the recent patch to watchdogd to eliminate the need for libm. I used lg_chunk:14 (16K chunks), the smallest value it would allow on this platform. For comparison I also include the numbers from a FreeBSD 8.2 ARM system (which would be dynamic linked and untuned, and also without any mlockall() calls).

Link      malloc    %MEM    VSZ    RSS
-------   -------   ----   -----  -----
dynamic   untuned   15.3   10040   9996
static    untuned   13.2    8624   8636
dynamic   tuned      2.8    1880   1836
static    tuned      0.8     480    492
[ freebsd 8.2 ]      1.1    1752    748

So it appears that using jemalloc's tuning in a daemon that uses mlockall(2) is a big win, especially if the daemon doesn't do much memory allocation (watchdogd allocates 2 things, 4k and 1280 bytes; if you use -e it also strdup()s the command string). It also seems that providing a build-time knob to control static linking would be valuable on platforms that are very memory limited and can't benefit from having all of libc wired. I haven't attached a patch because there appears to be no good way to actually achieve this in a platform-agnostic way. The jemalloc code enforces the lower range of the lg_chunk tuning value to be tied to the page size of the platform, and it rejects out-of-range values without changing the tuning. The code that works on an ARM with 4K page size, const char *malloc_conf = "lg_chunk:14"; would fail on a system that had bigger pages. The tuning must be specified with a compile-time constant like that, because it has to be tuned before the first allocation, which apparently happens before main() is entered.
It would be nice if jemalloc would clip the tuning to the lowest legal value instead of rejecting it, especially since the lowest legal value is calculated based not only on page size but on the value of other configurable values. There's another potential solution, but it strikes me as rather inelegant... jemalloc can also be tuned with the MALLOC_CONF env var. With the right rc-fu we could provide something like a watchdogd_memtune variable that you could set, and watchdogd would be invoked with MALLOC_CONF set to that in the environment. It still couldn't be set to a default value that was good for all platforms. It would also get passed through environment inheritance to any -e whatever command run by watchdogd, which isn't necessarily appropriate.
procstat -v question
In a line of procstat -v output such as this:

  PID      START        END PRT RES PRES REF SHD FL TP PATH
60065 0x200c1000 0x201c3000 r-x 182    0  17   8 CN vn /usr/lib/libstdc++.so.6

Does that 182 resident pages mean that the process being displayed is referencing that many pages itself, or does that represent how many pages are resident due to all the references from all the processes that have the library open? -- Ian
Re: watchdogd, jemalloc, and mlockall
On Sun, 2012-11-04 at 09:16 -0500, Dylan Cochran wrote: Have you already tried something like opt.lg_chunk? This, combined with other options for the library (man 3 jemalloc), should reduce the space from 8MB down to 16K or so (an approximation, I'm being liberal for jemalloc's internal bookkeeping size). For a special case like watchdogd, this would make more sense in general (given it should be designed to do no allocations/deletions during normal operation anyway). For other programs, this would be as unwise as statically linking them.

I had completely missed the fact that jemalloc had its own manpage, thank you. Given that new information I think the pieces are in place to put watchdogd on a memory diet. I'll work up a patch in the next couple of days. -- Ian

The 'perfect' solution would obviously be improving the library manager (rtld) to only mmap() function pages it needs, though I will admit I'm not sure if the ELF format is even capable of supporting something like that, what other problems it would cause down the road, or if it even attempts to do this already (I haven't looked at the runtime linker code since 7.0). By the way, remember that when you compare static v dynamic, the runtime linker does allocate private memory to handle the resolving of symbols to virtual memory addresses. That may skew your memory usage figures a bit.

On Sat, Nov 3, 2012 at 4:12 PM, Ian Lepore free...@damnhippie.dyndns.org wrote: On Sat, 2012-11-03 at 12:59 -0700, Xin Li wrote: On 11/3/12 11:38 AM, Ian Lepore wrote: In an attempt to un-hijack the thread about memory usage increase between 6.4 and 9.x, I'm starting a new thread here related to my recent discovery that watchdogd uses a lot more memory since it began using mlockall(2). I tried statically linking watchdogd and it made a small difference in RSS, presumably because it doesn't wire down all of libc and libm.
Speaking of this, the last time I brought this up, someone (can't remember, I think it was phk@) argued that the shared library would use only one copy of memory, while statically linked ones would be duplicated and thus use more memory. I haven't yet tried to prove or challenge that, though.

That sounds right to me... if 3 or 4 daemons were to eventually be statically linked because of mlockall(), then each of them would have its own private copy of strlen(), and malloc(), and so on; we'd be back to the bad old days before shared libs came along. Each program would contain its own copy of only the routines from the library that it uses, not the entire library in each program. On the other hand, if even one daemon linked with shared libc uses mlockall(), then all of libc gets wired. As I understand it, only one physical copy of libc would exist in memory, still shared by almost all running apps. The entire contents of the library would continuously occupy physical memory, even the parts that no apps are using. It's hard to know how to weigh the various tradeoffs. I suspect there's no one correct answer. -- Ian
Re: watchdogd, jemalloc, and mlockall
On Sun, 2012-11-04 at 09:36 -0700, Warner Losh wrote: On Nov 3, 2012, at 12:50 PM, Ian Lepore wrote: On Sat, 2012-11-03 at 20:41 +0200, Konstantin Belousov wrote: On Sat, Nov 03, 2012 at 12:38:39PM -0600, Ian Lepore wrote: In an attempt to un-hijack the thread about memory usage increase between 6.4 and 9.x, I'm starting a new thread here related to my recent discovery that watchdogd uses a lot more memory since it began using mlockall(2). I tried statically linking watchdogd and it made a small difference in RSS, presumably because it doesn't wire down all of libc and libm. VSZ RSS 10236 10164 Dynamic 8624 8636 Static Those numbers are from ps -u on an arm platform. I just updated the PR (bin/173332) with some procstat -v output comparing with/without mlockall(). It appears that the bulk of the new RSS bloat comes from jemalloc allocating vmspace in 8MB chunks. With mlockall(MCL_FUTURE) in effect that leads to wiring 8MB to satisfy what probably amounts to a few hundred bytes of malloc'd memory. It would probably also be a good idea to remove the floating point from watchdogd to avoid wiring all of libm. The floating point is used just to turn the timeout-in-seconds into a power-of-two-nanoseconds value. There's probably a reasonably efficient way to do that without calling log(), considering that it only happens once at program startup. No, I propose to add a switch to turn on/off the mlockall() call. I have no opinion on the default value of the suggested switch. In a patch I submitted along with the PR, I added code to query the vm.swap_enabled sysctl and only call mlockall() when swapping is enabled. Nobody yet has said anything about what seems to me to be the real problem here: jemalloc grabs 8MB at a time even if you only need to malloc a few bytes, and there appears to be no way to control that behavior. Or maybe there's a knob in there that didn't jump out at me on a quick glance through the header files. Isn't that only for non-production builds? 
Warner

I don't think so, I discovered this on my tflex unit running -current, and it's built with MALLOC_PRODUCTION defined because it doesn't have enough ram to boot without it defined. -- Ian
Re: watchdogd, jemalloc, and mlockall
On Sun, 2012-11-04 at 09:36 -0700, Warner Losh wrote: On Nov 3, 2012, at 12:50 PM, Ian Lepore wrote: On Sat, 2012-11-03 at 20:41 +0200, Konstantin Belousov wrote: On Sat, Nov 03, 2012 at 12:38:39PM -0600, Ian Lepore wrote: In an attempt to un-hijack the thread about memory usage increase between 6.4 and 9.x, I'm starting a new thread here related to my recent discovery that watchdogd uses a lot more memory since it began using mlockall(2). I tried statically linking watchdogd and it made a small difference in RSS, presumably because it doesn't wire down all of libc and libm. VSZ RSS 10236 10164 Dynamic 8624 8636 Static Those numbers are from ps -u on an arm platform. I just updated the PR (bin/173332) with some procstat -v output comparing with/without mlockall(). It appears that the bulk of the new RSS bloat comes from jemalloc allocating vmspace in 8MB chunks. With mlockall(MCL_FUTURE) in effect that leads to wiring 8MB to satisfy what probably amounts to a few hundred bytes of malloc'd memory. It would probably also be a good idea to remove the floating point from watchdogd to avoid wiring all of libm. The floating point is used just to turn the timeout-in-seconds into a power-of-two-nanoseconds value. There's probably a reasonably efficient way to do that without calling log(), considering that it only happens once at program startup. No, I propose to add a switch to turn on/off the mlockall() call. I have no opinion on the default value of the suggested switch. In a patch I submitted along with the PR, I added code to query the vm.swap_enabled sysctl and only call mlockall() when swapping is enabled. Nobody yet has said anything about what seems to me to be the real problem here: jemalloc grabs 8MB at a time even if you only need to malloc a few bytes, and there appears to be no way to control that behavior. Or maybe there's a knob in there that didn't jump out at me on a quick glance through the header files. Isn't that only for non-production builds? 
Warner

I just realized the implication of what you asked. I think it must be that jemalloc always allocates big chunks of vmspace at a time (unless tuned to do otherwise; I haven't looked into the tuning stuff yet), but when MALLOC_PRODUCTION isn't defined it also touches all the pages within that allocated space, presumably to lay in known byte patterns or other debugging info. -- Ian
Re: Threaded 6.4 code compiled under 9.0 uses a lot more memory?..
On Wed, 2012-10-31 at 13:38 -0700, Adrian Chadd wrote: On 31 October 2012 12:06, Konstantin Belousov kostik...@gmail.com wrote: Watchdogd was recently changed to mlock its memory. This is the cause of the RSS increase. If not wired, swapout might cause a delay of the next pat, leading to panic.

Right, but look at the virtual size of the 6.4 process. It's not 10 megabytes at all. Even if you wired all of that into memory, it wouldn't be 10 megabytes. Adrian

After gathering some more evidence, I agree that the huge increase I noticed in watchdogd is caused by a combo of jemalloc's behavior and the recent addition of mlockall(2) to watchdogd. Since this is only slightly tangentially related to the OP's questions as near as I can tell, I've entered a PR for it[1], and we can follow up with a separate discussion thread about that. While jemalloc can explain the growth in VSZ between 6.4 and 9.x, it doesn't look like mlockall() has anything to do with the original question of why the RSS got so much bigger. In other words, part of the original question is still unanswered. [1] http://www.freebsd.org/cgi/query-pr.cgi?pr=173332 -- Ian
watchdogd, jemalloc, and mlockall
In an attempt to un-hijack the thread about memory usage increase between 6.4 and 9.x, I'm starting a new thread here related to my recent discovery that watchdogd uses a lot more memory since it began using mlockall(2). I tried statically linking watchdogd and it made a small difference in RSS, presumably because it doesn't wire down all of libc and libm.

    VSZ    RSS
  10236  10164  Dynamic
   8624   8636  Static

Those numbers are from ps -u on an arm platform. I just updated the PR (bin/173332) with some procstat -v output comparing with/without mlockall(). It appears that the bulk of the new RSS bloat comes from jemalloc allocating vmspace in 8MB chunks. With mlockall(MCL_FUTURE) in effect that leads to wiring 8MB to satisfy what probably amounts to a few hundred bytes of malloc'd memory. It would probably also be a good idea to remove the floating point from watchdogd to avoid wiring all of libm. The floating point is used just to turn the timeout-in-seconds into a power-of-two-nanoseconds value. There's probably a reasonably efficient way to do that without calling log(), considering that it only happens once at program startup. -- Ian
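The startup computation described above can be done without libm: find the smallest power-of-two exponent whose value in nanoseconds covers the requested timeout. The following is a sketch, not watchdogd's actual code; seconds_to_pow2ns is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest e such that (1ULL << e) nanoseconds >= timeout_sec seconds.
 * Replaces a log()-based computation with integer ops; this runs once
 * at program startup, so the simple loop is plenty fast. */
static int
seconds_to_pow2ns(uint64_t timeout_sec)
{
	uint64_t ns = timeout_sec * 1000000000ULL;
	int e = 0;

	while (e < 63 && (1ULL << e) < ns)
		e++;
	return (e);
}
```

On FreeBSD the loop could also be replaced with a single flsll(3) call on the nanosecond count.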
Re: watchdogd, jemalloc, and mlockall
On Sat, 2012-11-03 at 20:41 +0200, Konstantin Belousov wrote: On Sat, Nov 03, 2012 at 12:38:39PM -0600, Ian Lepore wrote: In an attempt to un-hijack the thread about memory usage increase between 6.4 and 9.x, I'm starting a new thread here related to my recent discovery that watchdogd uses a lot more memory since it began using mlockall(2). I tried statically linking watchdogd and it made a small difference in RSS, presumably because it doesn't wire down all of libc and libm. VSZ RSS 10236 10164 Dynamic 8624 8636 Static Those numbers are from ps -u on an arm platform. I just updated the PR (bin/173332) with some procstat -v output comparing with/without mlockall(). It appears that the bulk of the new RSS bloat comes from jemalloc allocating vmspace in 8MB chunks. With mlockall(MCL_FUTURE) in effect that leads to wiring 8MB to satisfy what probably amounts to a few hundred bytes of malloc'd memory. It would probably also be a good idea to remove the floating point from watchdogd to avoid wiring all of libm. The floating point is used just to turn the timeout-in-seconds into a power-of-two-nanoseconds value. There's probably a reasonably efficient way to do that without calling log(), considering that it only happens once at program startup. No, I propose to add a switch to turn on/off the mlockall() call. I have no opinion on the default value of the suggested switch. In a patch I submitted along with the PR, I added code to query the vm.swap_enabled sysctl and only call mlockall() when swapping is enabled. Nobody yet has said anything about what seems to me to be the real problem here: jemalloc grabs 8MB at a time even if you only need to malloc a few bytes, and there appears to be no way to control that behavior. Or maybe there's a knob in there that didn't jump out at me on a quick glance through the header files. 
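The conditional-wiring idea from the patch mentioned above can be sketched like this. The function names are illustrative, not watchdogd's actual code; vm.swap_enabled is the FreeBSD sysctl being queried, and the non-FreeBSD fallback is an assumption for portability of the sketch.

```c
#include <assert.h>
#include <sys/mman.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Only wire the process's pages when swapping is possible; on a
 * swapless embedded box the 8MB jemalloc chunk can't be paged out
 * anyway, so mlockall() just wastes wired-page accounting. */
static int
swap_is_enabled(void)
{
#ifdef __FreeBSD__
	int enabled = 0;
	size_t len = sizeof(enabled);

	if (sysctlbyname("vm.swap_enabled", &enabled, &len, NULL, 0) == 0)
		return (enabled != 0);
#endif
	return (1);	/* unknown: assume swap exists and wire to be safe */
}

static int
maybe_mlockall(void)
{
	if (!swap_is_enabled())
		return (0);	/* nothing to guard against; skip wiring */
	return (mlockall(MCL_CURRENT | MCL_FUTURE));
}
```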
-- Ian
Re: watchdogd, jemalloc, and mlockall
On Sat, 2012-11-03 at 12:59 -0700, Xin Li wrote: On 11/3/12 11:38 AM, Ian Lepore wrote: In an attempt to un-hijack the thread about memory usage increase between 6.4 and 9.x, I'm starting a new thread here related to my recent discovery that watchdogd uses a lot more memory since it began using mlockall(2). I tried statically linking watchdogd and it made a small difference in RSS, presumably because it doesn't wire down all of libc and libm. Speaking of this, the last time I brought this up, someone (can't remember, I think it was phk@) argued that the shared library would use only one copy of memory, while statically linked ones would be duplicated and thus use more memory. I haven't yet tried to prove or challenge that, though. That sounds right to me... if 3 or 4 daemons were to eventually be statically linked because of mlockall(), then each of them would have its own private copy of strlen(), and malloc(), and so on; we'd be back to the bad old days before shared libs came along. Each program would contain its own copy of only the routines from the library that it uses, not the entire library in each program. On the other hand, if even one daemon linked with shared libc uses mlockall(), then all of libc gets wired. As I understand it, only one physical copy of libc would exist in memory, still shared by almost all running apps. The entire contents of the library would continuously occupy physical memory, even the parts that no apps are using. It's hard to know how to weigh the various tradeoffs. I suspect there's no one correct answer. -- Ian
Re: Threaded 6.4 code compiled under 9.0 uses a lot more memory?..
On Thu, 2012-11-01 at 10:12 +0800, David Xu wrote: On 2012/10/31 22:44, Karl Pielorz wrote: --On 31 October 2012 16:06 +0200 Konstantin Belousov kostik...@gmail.com wrote: Since you neglected to provide the verbatim output of procstat, nothing conclusive can be said. Obviously, you can make an investigation on your own. Sorry - when I ran it this morning the output was several hundred lines - I didn't want to post all of that to the list 99% of the lines are very similar. I can email it you off-list if having the whole lot will help? Then there's a bunch of 'large' blocks, e.g.:

   PID START    END      PRT RES PRES REF SHD FL TP PATH
  2010 0x801c0  0x80280  rw- 28690 4 0 df
  2010 0x80280  0x80340  rw- 18800 1 0

Most likely, these are malloc arenas. Ok, that's the heaviest usage. Then lots of 'little' blocks,

  2010 0x70161000 0x70181000 rw- 160 1 0 ---D df

And those are thread stacks. Ok, lots of those (lots of threads going on) - but they're all pretty small. Note that libc_r's thread stack is 64K, while libthr has 1M bytes per-thread. That would help explain the large increase in virtual size, but not the increase in resident size, right? In other words, there's nothing inherent in libthr that makes it use more stack, it just allocates more vmspace to allow greater potential growth? Hmmm, actually the chunks said to be thread stack above are neither 64K nor 1M, that's 128K. The malloc arenas are 12M, which seems like an unusual value. I haven't looked inside jemalloc at all, maybe that's normal. -- Ian
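If the per-thread vmspace reservation itself is the concern, the stack size can be set explicitly when each thread is created. A sketch follows; the 64K figure mirrors the libc_r number mentioned above and is an assumption, not a recommendation, and the function names are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static void *
worker(void *arg)
{
	(void)arg;
	return (NULL);
}

/* Create a thread with an explicit 64K stack instead of the library
 * default (1M under libthr).  The size must cover the thread's
 * deepest call chain and be at least PTHREAD_STACK_MIN. */
static int
create_small_stack_thread(pthread_t *tid)
{
	pthread_attr_t attr;
	int error;

	pthread_attr_init(&attr);
	error = pthread_attr_setstacksize(&attr, 64 * 1024);
	if (error == 0)
		error = pthread_create(tid, &attr, worker, NULL);
	pthread_attr_destroy(&attr);
	return (error);
}
```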
Re: Threaded 6.4 code compiled under 9.0 uses a lot more memory?..
On Wed, 2012-10-31 at 10:55 -0700, Adrian Chadd wrote: .. isn't the default thread stack size now really quite large? Like one megabyte large? That would explain a larger VSZ but the original post mentions that both virtual and resident sizes have grown by almost an order of magnitude. I think the same is true of the jemalloc aspect -- its design makes it use more virtual address space than phkmalloc when you've got lots of threads, but that shouldn't make it use so much more physical memory. I'm not positive of that, but I did notice when we upgraded from 6.x to 8.2 at work, our apps that have many dozens of threads use more virtual space, but not dramatically more physical memory as in the OP's case. I think there are some things we should be investigating about the growth of memory usage. I just noticed this:

FreeBSD 6.2 on an arm processor:

  369 root  1   8 -88  1752K   748K nanslp  3:00  0.00% watchdogd

FreeBSD 10.0 on the same system:

  367 root  1 -52 r0  10232K 10160K nanslp 10:04  0.00% watchdogd

The 10.0 system is built with MALLOC_PRODUCTION (without that defined the system won't even boot, it only has 64MB of ram). That's a crazy amount of growth for a relatively simple daemon. -- Ian
Re: Threaded 6.4 code compiled under 9.0 uses a lot more memory?..
On Tue, 2012-10-30 at 13:46 +0100, Fabian Keil wrote: Karl Pielorz kpielorz_...@tdx.co.uk wrote: Can anyone think of any quick pointers as to why some code originally written under 6.4 amd64 - when re-compiled under 9.0-stable amd64 takes up a *lot* more memory when running? 6.4 comes with phkmalloc while 9.0 uses jemalloc. Maybe you are allocating memory in a way that is less space-efficiently handled by jemalloc's default configuration. Fabian jemalloc is certainly the first thing that came to my mind. Does MALLOC_PRODUCTION need to be defined on a 9.0 system, or is that something that gets turned on automatically in an official release build? (I'm always working with non-release stuff so I'm not sure how that gets handled). -- Ian
Re: opensolaris B_TRUE and B_FALSE
On Mon, 2012-10-29 at 00:02 +0100, Erik Cederstrand wrote: Hello, I'm looking at this Clang analyzer report: http://scan.freebsd.your.org/freebsd-head/WORLD/2012-10-24-amd64/report-uH6BjZ.html.gz#EndPath Apart from the actual error, which is a false positive, it seems like Clang can't find the macro definitions for B_TRUE and B_FALSE (if it did, hovering over them would show the macro definition). These are defined in sys/cddl/compat/opensolaris/sys/types.h as an enum of type boolean_t as long as _KERNEL is not defined. The only definition for boolean_t I can find is in sys/sys/types.h but it's only defined if _KERNEL is defined. I'm sure that ZFS wouldn't work if B_TRUE or B_FALSE were undefined, I just can't figure out where it's happening. I need a hint :-) Thanks, Erik Look further up in sys/cddl/compat/opensolaris/sys/types.h, they're also defined (as macros rather than enum) in the _KERNEL case. They're also defined (as enum) in sys/gnu/fs/xfs/xfs_types.h. (Once again, SlickEdit pays for itself by answering with one right-click a question that would have been a pain to use grep for.) -- Ian
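For readers following along, the dual definition being described looks roughly like this. It is a paraphrase of the pattern, not a verbatim copy of sys/cddl/compat/opensolaris/sys/types.h.

```c
#include <assert.h>

#ifdef _KERNEL
#define	B_FALSE	0		/* plain macros in the kernel case */
#define	B_TRUE	1
#else
typedef enum { B_FALSE, B_TRUE } boolean_t;	/* enum otherwise */
#endif
```

Either way B_FALSE is 0 and B_TRUE is 1, which is why ZFS builds in both environments; what differs is whether a tool like Clang's analyzer sees a macro definition to display.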
Re: [CFT/RFC]: refactor bsd.prog.mk to understand multiple programs instead of a singular program
On Fri, 2012-10-26 at 08:27 -0600, Warner Losh wrote: On Oct 26, 2012, at 12:23 AM, Simon J. Gerraty wrote: In particular, why cannot the ':L' and ':U' support be added ? Because they already exist - with different meanings. They were added to NetBSD make over 10 years ago, from the OSF version of pmake. And we've had the :U and :L for a similar period of time as well. Arguing age here is an interesting historical footnote, but not a compelling argument to justify the pain to our users. In several areas the behavior of bmake has been changed to make it a drop in replacement for FreeBSD, but the above (not used at all in the FreeBSD base) are easier dealt with the other way. The :tl and :tu equivalents were added to FreeBSD make a while back to ease the transition. Why can't there be a make target that turns them on in FreeBSD compat mode. You could then just drop those into bsd.port.mk and be done with it? We already do this with the posix target, so there's precedent for it. I know you've objected to this as ugly, but as I pointed out when I mentioned it before, I think this is less ugly and less work and would offer a smoother transition than forcing all the scripts to change. Warner I second this concept. At work, we create dozens of products using literally hundreds of makefiles scattered throughout a huge source base. We have to be able to build the same source for multiple versions of freebsd, so even finding all the old :U and :L and any other incompatibilities and fixing them isn't an option because we'd just trade works in freebsd 10 for broken in every other environment. If there were some way to turn on a compatibility mode, we'd have a way to slowly transition to the newer stuff over the course of a couple OS versions. Eventually we'd reach the point where we no longer need to build products using an older version and we could update to the newer syntax and stop using compatibility mode. 
-- Ian
Re: [CFT/RFC]: refactor bsd.prog.mk to understand multiple programs instead of a singular program
On Fri, 2012-10-26 at 11:09 -0700, David O'Brien wrote: On Fri, Oct 26, 2012 at 09:41:36AM -0600, Ian Lepore wrote: We have to be able to build the same source for multiple versions of freebsd, so even finding all the old :U and :L and any other incompatibilities and fixing them isn't an option because we'd just trade "works in freebsd 10" for "broken in every other environment". Ian, If you're using FreeBSD 9 after 2012-06-14, or FreeBSD 8 or 7 after 2012-10-09 you can use the Bmake spelling of :U and :L (:tu/:tl). I am not arguing against you, just giving some information you may not be aware of. Yeah. And if I have to, I could modify all our makefiles to use the new syntax, then backport support for the new syntax to earlier freebsd make source in our local repos. But to give you some idea of what I've got to support... yesterday afternoon I was struggling with whether I can find the time in a release schedule to update an old product that needs a new feature from freebsd 6 to 8. The sad fact is that I can't, I'm going to have to do another freebsd 6-based release to meet the schedule. It's interesting having to work on a daily basis in everything between freebsd 6.2 and -current. -- Ian
Re: FreeBSD in Google Code-In 2012? You can help too!
On Tue, 2012-10-23 at 12:39 +0200, Erik Cederstrand wrote: Den 16/10/2012 kl. 12.19 skrev Wojciech A. Koszek wkos...@freebsd.org: (cross-posted message; please keep discussion on freebsd-hackers@) Those of you who have Wiki access, please spend 2 more minutes and submit straight to Wiki: http://wiki.freebsd.org/GoogleCodeIn/2012Tasks There are lots of smallish tasks in the code-quality department:

* Analyze and fix Clang Static Analyzer warnings
* Analyze and fix compiler warnings to increase WARNS level
* Write regression tests for src/tools/regression
* Run include-what-you-use to clean up header inclusion
* Verify bugs with patches

I think they're too open-ended to enter in the wiki as-is, but I'd also like to not spam the wiki with lots of almost-identical tasks. What's the best way to suggest them for CodeIn? Analyzing and fixing warnings is the last thing I'd assign to a young inexperienced programmer. It's far too easy (and tempting) to cast away warnings or otherwise treat the symptoms when what's really needed is to dig deeply into code (often including analyzing call chains) to evaluate the consequences of any changes. On the last 3 tasks in your list, I agree completely, just the sort of thing you'd assign to an intern or new junior engineer to get them started on a large existing project. -- Ian
Re: time_t when used as timedelta
On Tue, 2012-10-09 at 17:35 +0200, Erik Cederstrand wrote: Hi list, I'm looking at this possible divide-by-zero in dhclient: http://scan.freebsd.your.org/freebsd-head/WORLD/2012-10-07-amd64/report-nBhqE2.html.gz#EndPath In this specific case, it's obvious from the intention of the code that ip->client->interval is always > 0, but it's not obvious to me in the code. I could add an assert before the possible divide-by-zero: assert(ip->client->interval > 0); But looking at the code, I'm not sure it's very elegant. ip->client->interval is defined as time_t (see src/sbin/dhclient/dhcpd.h), which is a signed integer type, if I'm correct. However, some time_t members of struct client_state and struct client_config (see said header file) are assumed in the code to be positive and possibly non-null. Instead of plastering the code with asserts, is there something like a utime_t type? Or are there better ways to enforce the invariant? It looks to me like the place where enforcement is really needed is in parse_lease_time() which should ensure at the very least that negative values never get through, and in some cases that zeroes don't sneak in from config files. If it were ensured that ip->client->config->backoff_cutoff could never be less than 1 (and it appears any value less than 1 would be insane), then the division by zero case could never happen. However, at least one of the config statements handled by parse_lease_time() allows a value of zero. Since nothing seems to ensure that backoff_cutoff is non-zero, it seems like a potential source of div-by-zero errors too, in that same function. -- Ian
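The kind of enforcement suggested above could be as simple as clamping in one place when the value is parsed. A sketch with illustrative names (not dhclient's actual API):

```c
#include <assert.h>
#include <time.h>

/* Clamp a parsed lease-time value so negatives never get through,
 * and so fields later used as divisors (like backoff_cutoff) can be
 * given a minimum of 1, while fields that legitimately allow 0 can
 * pass a minimum of 0. */
static time_t
clamp_lease_time(time_t parsed, time_t minimum)
{
	return (parsed < minimum ? minimum : parsed);
}
```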
Re: syslog(3) issues
On Mon, 2012-09-03 at 00:35 +0100, Attilio Rao wrote: Hi, I was trying to use syslog(3) in a port application that uses threading, having all of them at the LOG_CRIT level. What I see is that when the logging gets massive (1000 entries) I cannot find some items within the /var/log/messages (I know because I started stamping also some sort of message ID in order to see what is going on). The missing items are in the order of 25% of what should really be there. Someone has a good idea on where I can start verifying for my syslogd system? I have really 0 experience with syslogd and maybe I could be missing something obvious. There's a chance this PR about syslogd incorrectly calculating socket receive buffer sizes is related and the patch attached to it could fix it... http://www.freebsd.org/cgi/query-pr.cgi?pr=1604331 I filed the PR long ago, if the patches have drifted out of date I'll be happy to re-work them. -- Ian
Re: syslog(3) issues
On Sun, 2012-09-02 at 19:50 -0600, Ian Lepore wrote: On Mon, 2012-09-03 at 00:35 +0100, Attilio Rao wrote: Hi, I was trying to use syslog(3) in a port application that uses threading, having all of them at the LOG_CRIT level. What I see is that when the logging gets massive (1000 entries) I cannot find some items within the /var/log/messages (I know because I started stamping also some sort of message ID in order to see what is going on). The missing items are in the order of 25% of what should really be there. Someone has a good idea on where I can start verifying for my syslogd system? I have really 0 experience with syslogd and maybe I could be missing something obvious. There's a chance this PR about syslogd incorrectly calculating socket receive buffer sizes is related and the patch attached to it could fix it... http://www.freebsd.org/cgi/query-pr.cgi?pr=1604331 I filed the PR long ago, if the patches have drifted out of date I'll be happy to re-work them. -- Ian Oops, I glitched the PR number when I pasted it, this one should be correct: http://www.freebsd.org/cgi/query-pr.cgi?pr=160433 -- Ian
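For anyone hitting the same symptom: the mechanism the PR is about is the receive buffer on syslogd's datagram sockets, and the general fix pattern is setsockopt(SO_RCVBUF) sized for the expected burst. This sketch shows the mechanism only, not syslogd's actual code; the function name is illustrative.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Ask the kernel for a larger receive buffer so a burst of log
 * datagrams isn't silently dropped before the daemon can read them.
 * The kernel may round or cap the requested size. */
static int
grow_rcvbuf(int fd, int bytes)
{
	return (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)));
}
```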
Re: any status of that project?
On Thu, 2012-08-16 at 14:03 +0200, Wojciech Puchar wrote: http://freebsdfoundation.blogspot.com/2012/03/new-project-nand-flash-support.html this would be a great thing to have working properly. any progress info? In the past few days I've tested the flash code in -current on a GlobalScale DreamPlug (arm platform), and confirmed that the low-level part of the code is working. I can read the flash on the unit and identify the existing partitions and data within them (but the main partition is formatted as UBI fs, so I've only looked at it with hexdump so far). I haven't tried the nandfs layer yet, or writing to the flash. -- Ian
Re: How to Expose Chip-level Ethernet Statistics?
On Sat, 2012-08-04 at 12:21 -0700, Tim Kientzle wrote: I believe that some of the issues I'm having with this Ethernet driver might be easier to diagnose if I could expose the chip-level statistics counters (especially queue overrun counts). Is there a standard way to do this? I've looked at systat, netstat, and ifconfig but haven't yet found a standard tool that queries this sort of information. (If I could find that, I could figure out which ioctl it used…) Pointers appreciated… In particular, if there's another Ethernet driver that does this well, I can use that for a reference. Tim I don't know if this is exactly what you mean, but have a look at src/tools/tools/ifinfo, and find some examples of drivers that fill in that info by grepping for ifmib_iso_8802_3. (I really know nothing about this stuff, except that your request triggered a memory that the atmel if_ate driver gathers some stats that I've not seen in most other drivers.) -- Ian
Re: newbus' ivar's limitation..
On Mon, 2012-07-30 at 17:06 -0400, John Baldwin wrote: On Tuesday, July 17, 2012 2:03:14 am Arnaud Lacombe wrote: Hi, On Fri, Jul 13, 2012 at 1:56 PM, Arnaud Lacombe lacom...@gmail.com wrote: Hi, On Thu, Jul 12, 2012 at 1:20 AM, Warner Losh i...@bsdimp.com wrote: [..] Honestly, though, I think you'll be more pissed when you find out that the N:1 interface that you want is being done in the wrong domain. But I've been wrong before and look forward to seeing your replacement. I will just pass function pointers for now, if things should be done dirty, let's be explicit about it. Now, the hinted device attachment did work quite smoothly, however, I would have a few suggestions: 1) add a call to bus_enumerate_hinted_children() before the DEVICE_IDENTIFY() call in bus_generic_driver_added() this is required to be able to support dynamic loading and attachment of hinted children. I'm not sure this is a feature we want to support (to date hinted children have only been created at boot time). It seems to me that the bus should be in control of calling bus_enumerate_hinted_children() at whatever time works best for it. Also, shouldn't it only ever be called once? The comment block for BUS_HINTED_CHILD in bus_if.h says "This method is only called in response to the parent bus asking for hinted devices to be enumerated." I think one of the implications of that is that any given bus may not call bus_enumerate_hinted_children() because it may not be able to do anything for hinted children. Adding a hint.somedev.0.at=somebus and then forcing the bus to enumerate hinted children amounts to forcing the bus to adopt a child it may not be able to provide resources for, which sounds like a panic or crash waiting to happen (or at best, no crash but nothing useful happens either). -- Ian
Re: kqueue periodic timer confusion
On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote: On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote: On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote: Hi, Sorry about this repost but I'm confused about the responses I received in my last post so I'm looking for some clarification. Specifically, I thought I could use the kqueue timer as essentially a drop in replacement for linuxfd_create/read, but was surprised that the accuracy of the kqueue timer is much less than what I need for my application. So my confusion at this point is whether this is considered to be a bug or feature? Here's some test code if you want to verify the problem:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

int main(void)
{
    int i, msec;
    int kq, nev;
    struct kevent inqueue;
    struct kevent outqueue;
    struct timeval start, end;

    if ((kq = kqueue()) == -1) {
        fprintf(stderr, "kqueue error!? errno = %s", strerror(errno));
        exit(EXIT_FAILURE);
    }
    EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);

    gettimeofday(&start, 0);
    for (i = 0; i < 50; i++) {
        if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == -1) {
            fprintf(stderr, "kevent error!? errno = %s", strerror(errno));
            exit(EXIT_FAILURE);
        } else if (outqueue.flags & EV_ERROR) {
            fprintf(stderr, "EV_ERROR: %s\n", strerror(outqueue.data));
            exit(EXIT_FAILURE);
        }
    }
    gettimeofday(&end, 0);

    msec = ((end.tv_sec - start.tv_sec) * 1000) +
        (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);
    printf("msec = %d\n", msec);

    close(kq);
    return EXIT_SUCCESS;
}

What you are seeing is just the way FreeBSD currently works. Sleeping (in most all of its various forms, and I've just looked at the kevent code to verify this is true there) is handled by converting the amount of time to sleep (usually specified in a timeval or timespec struct) to a count of timer ticks, using an internal routine called tvtohz() in kern/kern_time.c.
That routine rounds up by one tick to account for the current tick. Whether that's a good idea or not (it probably was once, and probably not anymore) it's how things currently work, and could explain the fairly consistent +1ms you're seeing. This is all true, but mostly irrelevant for his case. EVFILT_TIMER installs a periodic callout that executes KNOTE() and then resets itself (via callout_reset()) each time it runs. This should generally be closer to regularly spaced intervals than something that does: In what way is it irrelevant? That is, what did I miss? It appears to me that the next callout is scheduled by calling timertoticks() passing a count of milliseconds, that count is converted to a struct timeval and passed to tvtohz() which is where the +1 adjustment happens. If you ask for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms. There is some time, likely a small number of microseconds, that you've consumed of the current tick, and that's what the +1 in tvtohz() is supposed to account for according to the comments. The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick and then adds one tick on top of that. That seems not quite right to me, except that it is a way to g'tee that you don't return early, and that is the one promise made by sleep routines on any OS; those magical "at least" words always appear in the docs. Actually what I'm missing (that I know of) is how the scheduler works. Maybe the +1 adjustment to account for the fraction of the current tick you've already consumed is the right thing to do, even when that fraction is 1uS or less of a 1mS tick. That would depend on scheduler behavior that I know nothing about. -- Ian
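To make the arithmetic concrete: with 1 ms ticks, the rounding described above turns a 20 ms request into 21 ticks. The following is a sketch mirroring the described tvtohz() behavior, not its actual source.

```c
#include <assert.h>

/* Round a microsecond interval up to ticks, then add one more tick
 * for the partially-consumed current tick -- the "+1" discussed in
 * the thread.  tick_usec is the tick length (1000 when hz = 1000). */
static long
interval_to_ticks(long usec, long tick_usec)
{
	return ((usec + tick_usec - 1) / tick_usec + 1);
}
```

So a periodic 20 ms EVFILT_TIMER rescheduled through this path fires every 21 ticks, which matches the roughly +1 ms per period that the test program measures.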
Re: kqueue periodic timer confusion
On Thu, 2012-07-12 at 17:08 +0200, Davide Italiano wrote: On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin j...@freebsd.org wrote: On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote: On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote: On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote: On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote: Hi, Sorry about this repost but I'm confused about the responses I received in my last post so I'm looking for some clarification. Specifically, I thought I could use the kqueue timer as essentially a drop in replacement for linuxfd_create/read, but was surprised that the accuracy of the kqueue timer is much less than what I need for my application. So my confusion at this point is whether this is considered to be a bug or feature? Here's some test code if you want to verify the problem:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

int main(void)
{
    int i, msec;
    int kq, nev;
    struct kevent inqueue;
    struct kevent outqueue;
    struct timeval start, end;

    if ((kq = kqueue()) == -1) {
        fprintf(stderr, "kqueue error!? errno = %s", strerror(errno));
        exit(EXIT_FAILURE);
    }
    EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);

    gettimeofday(&start, 0);
    for (i = 0; i < 50; i++) {
        if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == -1) {
            fprintf(stderr, "kevent error!? errno = %s", strerror(errno));
            exit(EXIT_FAILURE);
        } else if (outqueue.flags & EV_ERROR) {
            fprintf(stderr, "EV_ERROR: %s\n", strerror(outqueue.data));
            exit(EXIT_FAILURE);
        }
    }
    gettimeofday(&end, 0);

    msec = ((end.tv_sec - start.tv_sec) * 1000) +
        (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);
    printf("msec = %d\n", msec);

    close(kq);
    return EXIT_SUCCESS;
}

What you are seeing is just the way FreeBSD currently works.
Sleeping (in most all of its various forms, and I've just looked at the kevent code to verify this is true there) is handled by converting the amount of time to sleep (usually specified in a timeval or timespec struct) to a count of timer ticks, using an internal routine called tvtohz() in kern/kern_time.c. That routine rounds up by one tick to account for the current tick. Whether that's a good idea or not (it probably was once, and probably not anymore) it's how things currently work, and could explain the fairly consistent +1ms you're seeing. This is all true, but mostly irrelevant for his case. EVFILT_TIMER installs a periodic callout that executes KNOTE() and then resets itself (via callout_reset()) each time it runs. This should generally be closer to regularly spaced intervals than something that does: In what way is it irrelevant? That is, what did I miss? It appears to me that the next callout is scheduled by calling timertoticks() passing a count of milliseconds, that count is converted to a struct timeval and passed to tvtohz() which is where the +1 adjustment happens. If you ask for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms. There is some time, likely a small number of microseconds, that you've consumed of the current tick, and that's what the +1 in tvtohz() is supposed to account for according to the comments. The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick and then adds one tick on top of that. That seems not quite right to me, except that it is a way to g'tee that you don't return early, and that is the one promise made by sleep routines on any OS; those magical "at least" words always appear in the docs. Actually what I'm missing (that I know of) is how the scheduler works. Maybe the +1 adjustment to account for the fraction of the current tick you've already consumed is the right thing to do, even when that fraction is 1uS or less of a 1mS tick.
That would depend on scheduler behavior that I know nothing about. Oh. My bad, sorry. You are correct. It is a bug to use +1 in this case. That is, the +1 makes sense when you are computing a one-time delta for things like nanosleep(). It is incorrect when computing a periodic delta such as for computing the interval for an itimer (setitimer) or EVFILT_TIMER(). Hah, setitimer()'s callout (realitexpire) uses
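The tick conversion discussed in this thread can be modeled in a few lines of C. This is a simplified sketch of the tvtohz()-style arithmetic described above, not the kernel's actual code, and the helper name is made up for illustration:

```c
#include <assert.h>

/*
 * Simplified model of the tvtohz()-style conversion discussed above:
 * round the requested interval up to whole ticks in the usual way,
 * then add one more tick to account for the partially-elapsed current
 * tick.  Hypothetical helper for illustration only.
 */
static long
interval_to_ticks(long usec, long tick_usec)
{
    long ticks = (usec + tick_usec - 1) / tick_usec; /* round up */
    return ticks + 1;                                /* +1 for the current tick */
}
```

With 1 ms ticks (HZ=1000), a requested 20 ms EVFILT_TIMER period comes out as 21 ticks, matching the consistent +1 ms per interval reported in the thread.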
Re: /proc filesystem
On Tue, 2012-06-19 at 06:47 +0200, Wojciech Puchar wrote: that is what i need. but still need some explanation after using it and reading manual say:

  PID       START         END PRT  RES PRES REF SHD FL  TP PATH
 1378    0x400000    0x5ac000 r-x  385  415   2   1 CN- vn /usr/local/bin/Xorg
 1378    0x7ab000    0x7bc000 rw-   17    0   1   0 C-- vn /usr/local/bin/Xorg
 1378    0x7bc000    0x800000 rw-   14    0   1   0 C-- df
 1378 0x8007ab000 0x8007c3000 r-x   24    0  32   0 CN- vn /libexec/ld-elf.so.1
 1378 0x8007c3000 0x8007f0000 rw-   43    0   1   0 C-- df
 1378 0x8007f0000 0x8007f2000 rw-    1    0   4   0 --- dv
 1378 0x8007f2000 0x8007f4000 rw-    2    0   4   0 --- dv
 1378 0x8007f4000 0x800874000 rw-   11    0   4   0 --- dv
 1378 0x800874000 0x800884000 rw-   16    0   4   0 --- dv
 1378 0x800884000 0x800895000 rw-   10    0   1   0 CN- df
 1378 0x8009c2000 0x8009c5000 rw-    3    0   1   0 C-- df

1) Xorg is mapped twice - IMHO first is text/rodata second is data. But what REF really means here and why it is 2 once and 1 second. 2) what really PRES (private resident) means? df (default) mappings are IMHO anonymous maps==private data of process. so why RES is nonzero while PRES is zero, while on shared code PRES is nonzero and large. what does it really means? thanks. I'm catching up on threads I was following before I went on vacation, and it looks like there was never a response to this. I'm interested in the answers to these questions too, so today I did some spelunking in the code to see what I could figure out. I don't think I really understand things too well, but I'll just say what I think I found and hopefully the experts will correct anything I get wrong. I think you're right about the first two mappings in that procstat output. The REF value is the reference count on the vm object (the vnode for the exe file, I presume). I think the reason the reference count is 2 is that one reference is the open file itself, and the other is the shadow object. I've always been a bit confused about the concept of shadow objects in freebsd's vm, but I think it's somehow related to the running processes that are based on that executable vnode.
For example, if another copy of Xorg were running, I think REF would be 3, and SHD would be 2. I don't know why there is no shadow object for the writable data mapping and why the refcount is only 1 for that. The PRES thing seemed simple when I first looked at the code, but the more I think about it in relation to other numbers the more confused I get. The logic in the code is if the shadow count is 1 then PRES is the resident size of the shadow object. This seems to be a measure of shared-code usage... any object which could be shared but isn't gets counted as private resident. The part that confuses me is how PRES can be larger than RES. The value for PRES is taken from the resident_page_count field of the shadow object. The RES value is calculated by walking each page of the map entry and calling pmap_mincore() to see if it's resident. So the number of resident pages is calculated to be fewer than the resident_page_count of the object the entry maps. I don't understand. Oh hmmm, wait a sec... could it be that read-ahead or relocation fixup or various other things caused lots of pages to be faulted in for the vnode object (so they're resident) but not all of those pages are mapped into the process because the path of execution has never referenced them and caused faults to map them into the process' vmspace? -- Ian ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: kqueue periodic timer confusion
On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote: Hi, Sorry about this repost but I'm confused about the responses I received in my last post so I'm looking for some clarification. Specifically, I thought I could use the kqueue timer as essentially a drop-in replacement for linuxfd_create/read, but was surprised that the accuracy of the kqueue timer is much less than what I need for my application. So my confusion at this point is whether this is considered to be a bug or a feature? Here's some test code if you want to verify the problem:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

int main(void) {
    int i, msec;
    int kq, nev;
    struct kevent inqueue;
    struct kevent outqueue;
    struct timeval start, end;

    if ((kq = kqueue()) == -1) {
        fprintf(stderr, "kqueue error!? errno = %s", strerror(errno));
        exit(EXIT_FAILURE);
    }

    EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);

    gettimeofday(&start, 0);
    for (i = 0; i < 50; i++) {
        if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == -1) {
            fprintf(stderr, "kevent error!? errno = %s", strerror(errno));
            exit(EXIT_FAILURE);
        } else if (outqueue.flags & EV_ERROR) {
            fprintf(stderr, "EV_ERROR: %s\n", strerror(outqueue.data));
            exit(EXIT_FAILURE);
        }
    }
    gettimeofday(&end, 0);

    msec = ((end.tv_sec - start.tv_sec) * 1000) +
        (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);

    printf("msec = %d\n", msec);

    close(kq);
    return EXIT_SUCCESS;
}

What you are seeing is just the way FreeBSD currently works. Sleeping (in most all of its various forms, and I've just looked at the kevent code to verify this is true there) is handled by converting the amount of time to sleep (usually specified in a timeval or timespec struct) to a count of timer ticks, using an internal routine called tvtohz() in kern/kern_time.c. That routine rounds up by one tick to account for the current tick.
Whether that's a good idea or not (it probably was once, and probably not anymore) it's how things currently work, and could explain the fairly consistent +1ms you're seeing. Another source of oversleeping is that the length of a tick in microseconds is simplistically calculated as 1000000 / hz on most hardware, so for HZ=1000, tick=1000. Unless the clock producing the tick interrupts is running at a frequency exactly divisible by 1000, that tick-length calculation has some rounding error in it, and it results in systematic oversleeping. On modern hardware with high-frequency clocks it's typically less than 1%. The routines for sleeping in the kernel take a count of ticks for how long to sleep, so when tvtohz() converts some number of microseconds to the corresponding number of ticks, any rounding error in the value for the length of a tick results in oversleeping by some small percentage of the time you wanted to sleep. Note that this rounding error in calculating the length of a tick does not result in a systematic skew in system timekeeping, because when each tick interrupt happens, the system reads a clock counter register that may or may not be related to the clock producing tick interrupts; the value in the register is full precision without the rounding error you get when counting ticks. It might be an interesting experiment to add kern.hz=1 to your /boot/loader.conf and see how that affects your test. -- Ian
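The tick-length rounding error described above is easy to see with integer arithmetic. The 1024 Hz rate below is a hypothetical example chosen to make the truncation visible, not a claim about any particular machine:

```c
#include <assert.h>

/*
 * Model of the systematic oversleep described above.  Suppose the
 * interrupt source really ticks at 1024 Hz, so a true tick is
 * 976.5625 us, but the tick length is computed as 1000000 / hz = 976 us
 * by integer division.  Counting ticks with the truncated value then
 * asks for more real ticks than the requested sleep needs.
 */
static long
ticks_for(long usec, long hz)
{
    long tick_usec = 1000000 / hz;                  /* truncates */
    return (usec + tick_usec - 1) / tick_usec + 1;  /* round up, +1 tick */
}
```

For hz=1024 a 20 ms sleep becomes 22 ticks; 22 real ticks at 976.5625 us each is about 21.5 ms, i.e. the small systematic oversleep percentage described above, on top of the +1-tick effect.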
Re: Interfacing devices with multiple parents within newbus
On Fri, 2012-07-06 at 16:45 -0400, Arnaud Lacombe wrote: Hi, On Fri, Jul 6, 2012 at 3:09 PM, Ian Lepore free...@damnhippie.dyndns.org wrote: On Fri, 2012-07-06 at 14:46 -0400, Arnaud Lacombe wrote: Hi, On Fri, Jul 6, 2012 at 11:33 AM, Arnaud Lacombe lacom...@gmail.com wrote: That's neither correct nor robust in a couple of way: 1) you have no guarantee a device unit will always give you the same resource. this raises the following question: how can a device, today, figure out which parent in a given devclass would give it access to resources it needs. Say, you have gpiobus0 provided by a superio and gpiobus1 provided by the chipset and a LED on the chipset's GPIO. Now, say gpiobus0 attachment is conditional to some BIOS setting. How can you tell gpioled(4) to attach on the chipset provided GPIO without hardcoding unit number either way ? AFAIK, you can not. Even hints provided layout description is defeated. Each device in a given devclass need to have a set of unique attribute to allow a child to distinguish it from other potential parent in the same devclass... - Arnaud Talking about a child being unable to choose the correct parent seems to indicate that this whole problem is turned upside-down somehow; children don't choose their parents. actually, I think I was wrong, I thought device were attached to a devclass, but they are truly attached to a given device. My mistake. Just blue-sky dreaming here on the fly... what we really have is a resource-management problem. A device comes along that needs a GPIO resource, how does it find and use that resource? Well, we have a resource manager, could that help somehow? Could a driver that provides access to GPIO somehow register its availability so that another driver can find and access it? The resource may be a callable interface, it doesn't really matter, I'm just wondering if the current rman stuff could be leveraged to help make the connection between unrelated devices. 
I think that implies that there would have to be something near the root of the hierarchy willing to be the owner/manager of dynamic resources. AFAIR, rman is mostly there to manage memory vs. i/o mapped resources. The more I think about it, the more FDT is the answer. The open question now being how to map a flexible device structure (FDT) to a less flexible structure (Newbus) :/ - Arnaud Memory- and IO-mapped regions and IRQs are the only current uses of rman (that I know of), but it was designed to be fairly agnostic about the resources it manages. It just works with ranges of values (that it really doesn't know how to interpret at all), leaving lots of room to define new types of things it can manage. The downside is that it's designed to be used hierarchically in the context of newbus, specifically to help parents manage the resources that they are able to provide to their children. Trying to use it in a way that allows devices which are hierarchically unrelated to allocate resources from each other may amount to a square-peg/round-hole situation. But the alternative is writing a new facility to allow registration and allocation of resources using some sort of symbolic method of representing the resources such that the new manager doesn't have to know much about what it's managing. I think it would be better to find a way to reuse what we've already got if that's possible. I think we have two semi-related aspects to this problem... How do we symbolically represent the resources that drivers can provide to each other? (FDT may be the answer; I don't know much about it.) How do devices use that symbolic representation to locate the provider of the resource, and how is the sharing of those resources managed? -- Ian
Re: Interfacing devices with multiple parents within newbus
On Fri, 2012-07-06 at 14:46 -0400, Arnaud Lacombe wrote: Hi, On Fri, Jul 6, 2012 at 11:33 AM, Arnaud Lacombe lacom...@gmail.com wrote: That's neither correct nor robust in a couple of way: 1) you have no guarantee a device unit will always give you the same resource. this raises the following question: how can a device, today, figure out which parent in a given devclass would give it access to resources it needs. Say, you have gpiobus0 provided by a superio and gpiobus1 provided by the chipset and a LED on the chipset's GPIO. Now, say gpiobus0 attachment is conditional to some BIOS setting. How can you tell gpioled(4) to attach on the chipset provided GPIO without hardcoding unit number either way ? AFAIK, you can not. Even hints provided layout description is defeated. Each device in a given devclass need to have a set of unique attribute to allow a child to distinguish it from other potential parent in the same devclass... - Arnaud Talking about a child being unable to choose the correct parent seems to indicate that this whole problem is turned upside-down somehow; children don't choose their parents. Just blue-sky dreaming here on the fly... what we really have is a resource-management problem. A device comes along that needs a GPIO resource, how does it find and use that resource? Well, we have a resource manager, could that help somehow? Could a driver that provides access to GPIO somehow register its availability so that another driver can find and access it? The resource may be a callable interface, it doesn't really matter, I'm just wondering if the current rman stuff could be leveraged to help make the connection between unrelated devices. I think that implies that there would have to be something near the root of the hiearchy willing to be the owner/manager of dynamic resources. 
-- Ian
Re: Pull in upstream before 9.1 code freeze?
On Wed, 2012-07-04 at 15:08 -0700, Doug Barton wrote: On 07/04/2012 15:01, Mike Meyer wrote: On Wed, 04 Jul 2012 14:19:38 -0700 Doug Barton do...@freebsd.org wrote: On 07/04/2012 11:51, Jason Hellenthal wrote: What would be really nice here is a command wrapper hooked into the shell so that when you type a command and it does not exist it presents you with a question for suggestions to install somewhat like Fedora has done. I would also like to see this feature, which is pretty much universal in linux at this point. It's very handy. I, on the other hand, count it as one of the many features of Linux that make me use FreeBSD. First, I agree that being able to turn it off should be possible. But I can't help being curious ... why would you *not* want a feature that tells you what to install if you type a command that doesn't exist on the system? Doug The only response I can think of is... If you can even ask that question, then there's no answer I could give that would make any sense to you. -- Ian
Re: /etc/resolv.conf getting over written with dhcp
On Wed, 2012-06-20 at 13:39 +0530, Varuna wrote: Ian Lepore wrote: Using the 'prepend' or 'supersede' keywords in /etc/dhclient.conf is pretty much the standard way of handling a mix of static and dhcp interfaces where the static config needs to take precedence. I'm not sure why you dismiss it as essentially good, but somehow not good enough. It's been working for me for years. -- Ian The issue that I had indicated that the issue with the /etc/resolv.conf is being caused by an error in /sbin/dhclient-script; hence, I am definitely not looking at solving the issue either with /etc/dhclient.conf or /etc/dhclient-exit-hooks configuration file. BTW, resolver(5) / resolv.conf(5) does not mention the usage of /etc/dhclient-exit-hooks file to protect the earlier contents of /etc/resolv.conf file. Will put this issue in the freebsd-doc mailing list. With regards, Varuna Eudaemonic Systems Simple, Specific Insightful I have re-read your original message and I think the confusion is here:

2*** # When resolv.conf is not changed actually, we don't
     # need to update it.
     # If /usr is not mounted yet, we cannot use cmp, then
     # the following test fails. In such case, we simply
     # ignore an error and do update resolv.conf.
3*** if cmp -s $tmpres /etc/resolv.conf; then
         rm -f $tmpres
         return 0
     fi 2>/dev/null

[...] I guess, the 1***, 3*** and 4*** is causing the recreation of /etc/resolv.conf. Is this correct? I did a small modification to 3*** which is:

     if !(cmp -s $tmpres /etc/resolv.conf); then
         rm -f $tmpres
         return 0
     fi 2>/dev/null

This seems to have solved the issue of /etc/resolv.conf getting overwritten with just: nameserver 192.168.98.4. This ensures that: If there is a difference between $tmpres and /etc/resolv.conf, then it exits post removal of $tmpres. If the execution of 3*** returns a 0, a new file gets created. I guess the modification get the intent of 3*** working. Have I barked up the wrong tree? I think yes, you have barked up the wrong tree.
The intent of the code at 3*** is not to exit if there is a difference, it is to exit if there is NO difference. In other words, if the old and new files are identical then there is no need to re-write the file, just cleanup and exit. If the files are different then replace the existing file with the new one. This is just the (sometimes annoying) way dhcp works. If the dhcp server provides new resolver info it completely replaces any existing resolver info unless you've configured your dhclient.conf to prevent it. It only does so if the interface being configured is the current default-route interface, or there is no current default-route interface. -- Ian
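The intended control flow is easier to see laid out as a standalone function. This is a sketch of the logic being discussed, with made-up file arguments, not the actual /sbin/dhclient-script:

```shell
#!/bin/sh
# Sketch of the resolv.conf update logic discussed above: rewrite the
# target only when the newly generated file actually differs from it.
# The function name and its arguments are illustrative stand-ins.
update_resolv_conf() {
    tmpres=$1
    target=$2
    if cmp -s "$tmpres" "$target"; then
        rm -f "$tmpres"        # identical: nothing to do, keep the old file
        return 0
    fi
    cat "$tmpres" > "$target"  # different: install the new contents
    rm -f "$tmpres"
}
```

Inverting the cmp test, as in the proposed modification, makes the script delete the new file and bail out precisely when an update is needed, which is why it only appears to fix the overwriting.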
Re: /etc/resolv.conf getting over written with dhcp
On Fri, 2012-06-15 at 23:02 +0530, Varuna wrote: Thanks for the pointers. Dima Panov wrote: From my /etc/dhclient.conf: interface lagg0 { send dhcp-lease-time 3600; prepend domain-name-servers 127.0.0.1, 4.4.4.4, 8.8.8.8; request subnet-mask, broadcast-address, time-offset, routers, domain-name, domain-name-servers; require subnet-mask, domain-name-servers; } And result is /etc/resolv.conf: # Generated by resolvconf nameserver 127.0.0.1 nameserver 4.4.4.4 nameserver 8.8.8.8 nameserver 192.168.1.1 True indeed this will work and I did have a look at dhclient.conf(5) to setup the freebsd8:/etc/dhclient.conf. This will still call /sbin/dhclient-script which will overwrite the configuration done to the /etc/resolv.conf each time the system power is recycled. As per /usr/src/include/resolv.h, the MAXNS is by default set to 3; which the default configuration user will not be aware of as the entire focus will be on the ifconfig related flags in /etc/rc.conf. BTW, the example indicated in dhclient.conf(5) has a typo which says /etc/dhclient-script instead of /sbin/dhclient-script, indeed the system does not fail if the typo exists in dhclient.conf. Eugene Grosbein wrote: There is simple solution: create file /etc/dhclient-enter-hooks and override add_new_resolv_conf() there to do nothing: add_new_resolv_conf() { return 0 } Works just fine for my systems. Indeed this is a good suggestion, and this is if the user is aware of what to look for and where in /sbin/dhclient-script it is documented. A general sysadmin would be aware of /etc/nsswitch.conf and /etc/resolv.conf for name resolution issues and I do not think they will be aware of so many possible ways to handle the issue of resolv.conf getting overwritten by the usage of dhcp. What would be the way out? Do you think it would be a good idea to push the nameserver configuration information into /etc/rc.conf which happens to be the single file that would handle the system configuration? 
With regards, Varuna Eudaemonic Systems Using the 'prepend' or 'supersede' keywords in /etc/dhclient.conf is pretty much the standard way of handling a mix of static and dhcp interfaces where the static config needs to take precedence. I'm not sure why you dismiss it as essentially good, but somehow not good enough. It's been working for me for years. -- Ian
Re: wired memory - again!
On Tue, 2012-06-12 at 23:45 +0300, Konstantin Belousov wrote: On Tue, Jun 12, 2012 at 08:51:34AM -0600, Ian Lepore wrote: On Sat, 2012-06-09 at 22:45 +0200, Wojciech Puchar wrote: First, all memory allocated by UMA and consequently malloc(9) is wired. In other words, almost all memory used by kernel is accounted as wired. yes i understand this. still i found no way how to find out what allocated that much. Second, the buffer cache wires the pages which are inserted into VMIO buffers. So your observation is basically right, cached buffers means what are exactly VMIO buffers. i understand that page must be wired WHEN doing I/O. But i have too much wired memory even when doing no I/O at all. I agree, this is The Big Question for me. Why does the system keep wired writable mappings of the buffers in kva after the IO operations are completed? Read about buffer cache, e.g. in the Design and Implementation of the FreeBSD OS book. If it did not do so, it would fix the instruction-cache-disabled bug that kills performance on VIVT cache architectures (arm and mips) and it would reduce the amount of wired memory (that apparently doesn't need to be wired, unless I've missed the implications of a previous reply in this thread). I have no idea what is the bug you are talking about. If my guess is right, and it specifically references unability of some processors to correctly handle several mappings of the same physical page into different virtual addresses due to cache tagging using virtual address instead of physical, then this is a hardware bug, not software. This bug: http://lists.freebsd.org/pipermail/freebsd-arm/2012-January/003288.html The bug isn't the VIVT cache hardware, it's the fact that the way we handle the requirements of the hardware has the side effect of leaving the instruction cache bit disabled on executable pages because the kernel keeps writable mappings of the pages even after the IO is done. 
AFAIR, at least HP PA and MIPS have different instantiation of this problem. Our kernel uses multi-mapping quite often, and buffers is only one example. Also, why do you think that the pages entered into buffers shall not be wired, it is completely beyond my understanding. What's beyond my understanding is why a page has to remain wired after the IO is complete. That question seems to me to be tangentially related to the above question of why the kernel needs to keep a writable mapping of the buffer after it's done writing into the page (either via DMA or via uiomove() depending on the direction of the IO). -- Ian
Re: FreeBSD Boot Times
On Wed, 2012-06-13 at 09:10 +0200, Wojciech Puchar wrote: Greetings, I was just wondering what it is that FreeBSD does that makes it take so long to boot. Booting into Ubuntu minimal or my own custom Linux distro, literally takes 0.5-2 seconds to boot up to shell, where FreeBSD takes about 10-20 seconds. I'm not sure if anything could be parallelized in the boot process, mostly kernel time. Note: This isn't really an issue, more so a curiosity. true. system that never crash are not often booted An embedded system may be booted or power-cycled dozens of times a day, and boot time can be VERY important. Don't assume that the way you use FreeBSD is the only way. -- Ian
Re: wired memory - again!
On Sat, 2012-06-09 at 22:45 +0200, Wojciech Puchar wrote: First, all memory allocated by UMA and consequently malloc(9) is wired. In other words, almost all memory used by kernel is accounted as wired. yes i understand this. still i found no way how to find out what allocated that much. Second, the buffer cache wires the pages which are inserted into VMIO buffers. So your observation is basically right, cached buffers means what are exactly VMIO buffers. i understand that page must be wired WHEN doing I/O. But i have too much wired memory even when doing no I/O at all. I agree, this is The Big Question for me. Why does the system keep wired writable mappings of the buffers in kva after the IO operations are completed? If it did not do so, it would fix the instruction-cache-disabled bug that kills performance on VIVT cache architectures (arm and mips) and it would reduce the amount of wired memory (that apparently doesn't need to be wired, unless I've missed the implications of a previous reply in this thread). -- Ian
Re: wired memory - again!
On Sat, 2012-06-09 at 09:21 +0200, Wojciech Puchar wrote: top reports wired memory 128MB WHERE it is used? below results of vmstat -m and vmstat -z values does not sum up even to half of it FreeBSD 9 - few days old. What i am missing and why there are SO MUCH wired memory on 1GB machine without X11 or virtualbox [vmstat output snipped] I have been struggling to answer the same question for about a week on our embedded systems (running 8.2). We have systems with 64MB ram which have 20MB wired, and I couldn't find any way to directly view what that wired memory is being used for. I also discovered that the vmstat output accounted for only a tiny fraction of the 20MB. What I eventually determined is that there is some sort of correlation between vfs buffer space and wired memory. Our embedded systems typically do very little disk IO, but during some testing we were spewing debug output to /var/log/messages at the rate of several lines per second for hours. Under these conditions the amount of wired memory would climb from its usual of about 8MB to around 20MB, and once it climbed that high it pretty much never went down, or only went down a couple MB. The resulting memory pressure caused our apps to get killed over and over again with out of swap space (we have no swap on these systems). The kernel auto-tunes the vfs buffer space using the formula "for the first 64 MB of ram use 1/4 for buffers, plus 1/10 of the ram over 64 MB". Using 16 of 64 MB of ram for buffer space seems insane to me, but maybe it makes sense on certain types of servers or something. I added option NBUF=128 to our kernel config and that dropped the buffer space to under 2 MB and since doing that I haven't seen the amount of wired memory ever go above 8 MB. I wonder whether my tuning of NBUF is affecting wired memory usage by indirectly tuning the 'nswbuf' value; I can't tune nswbuf directly because the embedded system is ARM-based and we have no loader(8) for setting tunables.
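The auto-tune rule quoted above works out as follows. This is a model of the described formula only, not the kernel's actual vfs_bio tuning code:

```c
#include <assert.h>

/*
 * Model of the buffer-space auto-tune rule described above: a quarter
 * of the first 64 MB of RAM, plus a tenth of everything beyond 64 MB.
 * Illustration only, not the kernel's real tuning code.
 */
static unsigned long
bufspace_mb(unsigned long ram_mb)
{
    unsigned long first = ram_mb < 64 ? ram_mb : 64;
    unsigned long rest = ram_mb > 64 ? ram_mb - 64 : 0;
    return first / 4 + rest / 10;
}
```

For the 64 MB embedded boards described above this gives 16 MB of buffer space (the "16 of 64 MB" figure), and the 1 GB machine that started the thread would get 16 + 96 = 112 MB.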
I'm not sure NBUF=128 is a good setting even for a system that doesn't do much IO, so I consider it experimental and we're testing under a variety of conditions to see if it leads to any unexpected behaviors. I'm certainly not suggesting anyone else rush to add this option to their kernel config. I am VERY curious about the nature of this correlation between vfs buffer space and wired memory. For the VM gurus: Is the behavior I'm seeing expected? Why would memory become wired and seemingly never get released back to one of the page queues after the IO is done? -- Ian
Re: Need to revert behavior of OpenSSH to the old key order ...
On Mon, 2012-05-21 at 14:26 -0700, Jason Usher wrote: --- On Mon, 5/21/12, Garance A Drosehn g...@freebsd.org wrote: But have you tried it in this order ? HostKey /usr/local/etc/ssh/ssh_host_key HostKey /usr/local/etc/ssh/ssh_host_dsa_key HostKey /usr/local/etc/ssh/ssh_host_rsa_key HostKey /usr/local/etc/ssh/ssh_host_ecdsa_key Which is to say, have your sshd_config file list multiple hostkeys, and then restart sshd after making that change? I tried a similar change and it seemed to have some effect on what clients saw when connecting, but I can't tell if it has the effect that you want. The order of HostKey directives in sshd_config does not change the actual order. In newer implementations, RSA is provided first, no matter how you configure the sshd_config. As I mentioned before, removing RSA completely is sort of a fix, but I can't do that because some people might actually be explicitly using RSA, and they would all break. Anyone ? After poking through the sshd code a bit, it looks to me like this is working as designed and it's the clients that are broken. For host key algorithm, and other things where both the server and the client side have a list of possibilities and have to agree on a match from those lists, the client side is completely in control of precedence, by design. The server has a list of things it can support, A,B,C,D. The client sends a list of things it desires, in order of preference, D,A,C. The server chooses a match as follows:

    for each client list item
        for each server list item
            if current-client-item matches current-server-item
                return current-client-item as the match
            end if
        end for
    end for

In your case it appears that the client sends rsa,dsa as the host key algorithm list. The server has dsa,rsa,maybe,other,stuff and since rsa is the client's first choice and exists in the server list, it gets used. Then the client rejects the rsa key because it was really only ever going to be happy with a dsa key.
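That nested loop can be written out concretely. This is a simplified stand-in for OpenSSH's match_list() behavior as described above, not its actual implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified model of the negotiation loop described above: walk the
 * client's preference list in order and return the first algorithm
 * the server also offers.  Illustration only, not OpenSSH's code.
 */
static const char *
choose_alg(const char *client[], const char *server[])
{
    size_t c, s;

    for (c = 0; client[c] != NULL; c++)
        for (s = 0; server[s] != NULL; s++)
            if (strcmp(client[c], server[s]) == 0)
                return client[c]; /* client order controls precedence */
    return NULL;
}
```

A client offering rsa,dsa against a server offering dsa,rsa gets rsa, exactly the behavior complained about in the thread; swapping the two arguments is what the patch at the end of this message does.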
IMO, this is a client-side bug; if it's only going to accept dsa (because that's the only thing in the known_hosts file) then it should only ask for that. So I think you have two choices... 1) Only offer a dsa key. It appears the right way to do this would be to have just one HostKey statement in the sshd config file that names your dsa key file. The presence of at least one HostKey statement will prevent the code from adding the default keyfile names internally, so you should end up with only a dsa key being offered. 2) Try the attached patch to violate the design and force the server's configuration order to override the precedence implied by the client's request list. Put HostKey statements the sshd_config file in the order you want enforced. I don't think #2 is a good option, but I know how it is in a production world... sometimes you've got to do things that you know are bad to keep the show running. Hopefully when you do such things it's just to buy some time to deploy a better fix (but it doesn't always work out that way; I still maintain horrible temporary hacks like this from years and years ago). Maybe option 1 would work okay for you in light of this info: When I look in the openssh source from freebsd 6.4, it appears that while an rsa hostkey was supported, it would not be added to the server config by default; it would only be used if you specifically configured it with a HostKey statement in sshd_config. So maybe you can safely assume that nobody was ever connecting to your freebsd 6.x machines using an rsa hostkey. Now for The Big Caveat: All of the above is based on code inspection. I haven't tested anything, including the attached patch. 
-- Ian

Index: crypto/openssh/kex.c
===================================================================
--- crypto/openssh/kex.c	(revision 235554)
+++ crypto/openssh/kex.c	(working copy)
@@ -371,7 +371,7 @@
 static void
 choose_hostkeyalg(Kex *k, char *client, char *server)
 {
-	char *hostkeyalg = match_list(client, server, NULL);
+	char *hostkeyalg = match_list(server, client, NULL);
 	if (hostkeyalg == NULL)
 		fatal("no hostkey alg");
 	k->hostkey_type = key_type_from_name(hostkeyalg);

___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Need to revert behavior of OpenSSH to the old key order ...
On Tue, 2012-05-22 at 09:59 -0700, Jason Usher wrote: Hi Ian, Thank you very much for taking a look at this, and for understanding what I'm talking about here. Comments inline, below... --- On Tue, 5/22/12, Ian Lepore free...@damnhippie.dyndns.org wrote: But have you tried it in this order ?

HostKey /usr/local/etc/ssh/ssh_host_key
HostKey /usr/local/etc/ssh/ssh_host_dsa_key
HostKey /usr/local/etc/ssh/ssh_host_rsa_key
HostKey /usr/local/etc/ssh/ssh_host_ecdsa_key

Which is to say, have your sshd_config file list multiple host keys, and then restart sshd after making that change? I tried a similar change and it seemed to have some effect on what clients saw when connecting, but I can't tell if it has the effect that you want. The order of HostKey directives in sshd_config does not change the actual order. In newer implementations, RSA is provided first, no matter how you configure the sshd_config. As I mentioned before, removing RSA completely is sort of a fix, but I can't do that because some people might actually be explicitly using RSA, and they would all break. Anyone ? After poking through the sshd code a bit, it looks to me like this is working as designed and it's the clients that are broken. For host key algorithm, and other things where both the server and the client side have a list of possibilities and have to agree on a match from those lists, the client side is completely in control of precedence, by design. OK. That's bad news, as I have no influence on the clients at all. In your case it appears that the client sends rsa,dsa as the host key algorithm list. The server has dsa,rsa,maybe,other,stuff and since rsa is the client's first choice and exists in the server list, it gets used. Then the client rejects the rsa key because it was really only ever going to be happy with a dsa key. IMO, this is a client-side bug; if it's only going to accept dsa (because that's the only thing in the known_hosts file) then it should only ask for that. Exactly.
It would be nice if the client at least tried the other algorithm to see if that does indeed match up with the public key it is sitting on ... breaking automation out in the field is really problematic. 1) Only offer a dsa key. It appears the right way to do this would be to have just one HostKey statement in the sshd config file that names your dsa key file. The presence of at least one HostKey statement will prevent the code from adding the default keyfile names internally, so you should end up with only a dsa key being offered. Ok, I did this - I explicitly defined a HostKey in sshd_config that happens to be my DSA key:

#HostKey for protocol version 1
#HostKey /etc/ssh/ssh_host_key
#HostKeys for protocol version 2
HostKey /etc/ssh/ssh_host_dsa_key

(note the last line is uncommented) and sshd does indeed just present the DSA key (to clients that were previously negotiating the RSA key, after the upgrade). So this is great... I was originally wary of forcing DSA only like this, since there might be clients out in the world that had somehow negotiated an RSA key, but based on your further comments, it sounds like that is not the case. So if everyone has DSA keys (we'll find out ...) then we are all set. Thank you very much for examining this issue - I hope the archives of this conversation will help others in the future. Seeing your example config with the commented-out HostKey lines made me realize that you probably want to have two HostKey lines, one for the protocol v1 key and another for the dsa key for v2. The 6.x server added the v1 key and the v2 dsa key by default, so you could have existing clients relying on a v1 key. Since you now have a HostKey statement the new server code won't add the v1 key by default so you'd need to be explicit about it.
Based on examining the code, I think this will be safe because the keys have different type-names (rsa1 vs rsa) so a client wanting to use a protocol v2 rsa key won't accidentally match the protocol v1 rsa key named in the config file (and it will still match the dsa key). -- Ian
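Ian's suggestion, explicit HostKey lines for both the protocol v1 key and the v2 DSA key, would amount to an sshd_config fragment like the following (file paths taken from the thread; comments are illustrative):

```
# Protocol v1 host key (type rsa1).  Must now be listed explicitly:
# once any HostKey line exists, the new server no longer adds the
# default key files on its own.
HostKey /etc/ssh/ssh_host_key

# Protocol v2 DSA host key - the only v2 key offered, so dsa-only
# clients cannot end up negotiating an rsa key they will then reject.
HostKey /etc/ssh/ssh_host_dsa_key
```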
Re: ARM + CACHE_LINE_SIZE + DMA
On Thu, 2012-05-17 at 15:20 +0200, Svatopluk Kraus wrote: Hi, I'm working on a DMA bus implementation for the ARM11mpcore platform. I've looked at the implementation in the ARM tree, but IMHO it only works with some assumptions. There is a problem with DMA on a memory block which is not aligned on CACHE_LINE_SIZE (start and end) if memory is not coherent. Let's have a buffer for DMA which is not aligned on CACHE_LINE_SIZE. Then the first cache line associated with the buffer can be divided into two parts, A and B, where A is memory we know nothing about and B is buffer memory. The same stands for the last cache line associated with the buffer. We have no problem if the memory is coherent. Otherwise it depends on memory attributes.

1. [no cache] attribute: No problem, as memory is coherent.

2. [write through] attribute: Part A can be invalidated without loss of any data. It's no problem either.

3. [write back] attribute: In general, there is no way to keep both parts consistent. At the start of a DMA transaction, the cache line is written back and invalidated. However, as we know nothing about the memory associated with part A of the cache line, the cache line can be filled again at any time, messing up the DMA transaction if flushed. Even if the cache line is only filled but not flushed during the DMA transaction, we must make it coherent with memory after that. There is a trick with saving part A of the line into a temporary buffer, invalidating the line, and restoring part A in the current ARM (MIPS) implementation. However, if somebody is writing to memory associated with part A of the line during this trick, part A will be messed up. Moreover, part A can be part of another DMA transaction.

To safely use DMA with non-coherent memory, memory with [no cache] or [write through] attributes can be used without problem. Memory with [write back] attribute must be aligned on CACHE_LINE_SIZE.
However, for an mbuf, for example, a buffer for DMA can be part of a structure which is aligned on CACHE_LINE_SIZE, but the buffer itself is not. We may know that nobody will write to the structure during the DMA transaction, so it's safe to use the buffer even if it's not aligned on CACHE_LINE_SIZE. So, in practice, if a DMA buffer is not aligned on CACHE_LINE_SIZE and we want to avoid bounce page overhead, we must supply additional information with the DMA transaction. It should be easy to supply that information for drivers' own data buffers. However, what about OS data buffers like the mentioned mbufs? The question is the following: is it, or can it be, guaranteed for all, or at least well-known, OS data buffers which can be part of DMA access that buffers not aligned on CACHE_LINE_SIZE are surrounded by data which belongs to the same object as the buffer, and that the data is not written by the OS while given to a driver? Any answer is appreciated. However, 'bounce pages' is not an answer. Thanks, Svata I'm adding freebsd-arm@ to the CC list; that's where this has been discussed before. Your analysis is correct... to the degree that it works at all right now, it's working by accident. At work we've been making the good accident a bit more likely by setting the minimum allocation size to arm_dcache_align in kern_malloc.c. This makes it somewhat less likely that unrelated objects in the kernel are sharing a cache line, but it also reduces the effectiveness of the cache somewhat. Another factor, not mentioned in your analysis, is the size of the IO operation. Even if the beginning of the DMA buffer is cache-aligned, if the size isn't exactly a multiple of the cache line size you still have the partial flush situation and all of its problems. It's not guaranteed that data surrounding a DMA buffer will be untouched during the DMA, even when that surrounding data is part of the same conceptual object as the IO buffer. It's most often true, but certainly not guaranteed.
In addition, as Mark pointed out in a prior reply, sometimes the DMA buffer is on the stack, and even returning from the function that starts the IO operation affects the cacheline associated with the DMA buffer. Consider something like this:

void do_io()
{
    int buffer;

    start_read(&buffer);
    // maybe do other stuff here
    wait_for_read_done();
}

start_read() gets some IO going, so before it returns a call has been made to bus_dmamap_sync(..., BUS_DMASYNC_PREREAD) and an invalidate gets done on the cacheline containing the variable 'buffer'. The act of returning from the start_read() function causes that cacheline to get reloaded, so now the stale pre-DMA value of the variable 'buffer' is in cache again. Right after that, the DMA completes so that ram has a newer value that belongs in the buffer variable and the copy in the cacheline is stale. Before control gets into the wait_for_read_done() routine that will attempt to handle the POSTREAD partial cacheline flush, another thread gets control and begins
Re: csh builtin command problems
On Wed, 2012-05-09 at 21:34 -0400, Robert Simmons wrote: I'm trying to use sysv style echo in /bin/csh and I've hit a wall as to how to get it to work. The following does not have the outcome that I'm looking for:

# echo_style=sysv
# echo test\ttest > test
# cat test
testttest

I want this:

# echo test\ttest > test
# cat test
test	test

Any thoughts? What I see on 8.3 is this:

% set echo_style=sysv
% echo test\ttest
testttest
% echo "test\ttest"
test	test
%

So it seems from this very minimal test that the implementation of echo is correct, but the parsing of the command line in csh requires that the \t in the arg be protected with quotes. (I don't normally spend any longer in csh than it takes for a .cshrc to launch bash, and even that's only on systems where I don't control /etc/passwd to just use bash directly.) -- Ian
Re: diagonising a overheating problem
On Mon, 2012-05-14 at 18:56 -0400, Aryeh Friedman wrote: On Mon, May 14, 2012 at 6:37 PM, Bartosz Fabianowski free...@chillt.de wrote: Try sysctl dev.cpu.0.temperature. I have a notoriously overheating Dell laptop and for me, this sysctl always reports the temperature. - Bartosz

~/Desktop aryeh@localhost% sysctl dev.cpu.0.temperature
sysctl: unknown oid 'dev.cpu.0.temperature'
~/Desktop aryeh@localhost% sysctl dev.cpu.0
dev.cpu.0.%desc: ACPI CPU
dev.cpu.0.%driver: cpu
dev.cpu.0.%location: handle=\_PR_.C000
dev.cpu.0.%pnpinfo: _HID=none _UID=0
dev.cpu.0.%parent: acpi0
dev.cpu.0.freq: 1500
dev.cpu.0.freq_levels: 1500/7260 1400/6056 1225/5299 1200/5125 1100/4500 1000/4095 900/3753 800/3468 700/3034 600/2601 500/2167 400/1734 300/1300 200/867 100/433
dev.cpu.0.cx_supported: C1/0 C2/100
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_usage: 100.00% 0.00% last 233us

dev.cpu.0.temperature is provided by the coretemp(4) driver; maybe you need to kldload it? -- Ian
Calling tsleep(9) with interrupts disabled
I just realized that I've accidentally coded a sequence similar to this in a driver:

s = intr_disable();
// do stuff here
tsleep(sc, 0, "twird", hz / 4);
// more stuff
intr_restore(s);

Much to my surprise this works, including waking up due to wakeup(sc) being called from an interrupt handler. So apparently tsleep() results in interrupts being re-enabled during the sleep, although nothing in the manpage says that will happen. Can I safely rely on this behavior, or is it working by accident? (Please no lectures on the evils of disabling interrupts... This is not a multi-GHz multi-core Xeon, it's a 180MHz embedded SoC with buggy builtin devices that will drop or corrupt data if an interrupt happens during the "do stuff here" part of the code.) -- Ian
Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool
On Wed, 2012-04-18 at 09:41 -0400, John Baldwin wrote: On Wednesday, April 18, 2012 2:02:22 am Andriy Gapon wrote: on 17/04/2012 23:43 John Baldwin said the following: On Tuesday, April 17, 2012 4:22:19 pm Andriy Gapon wrote: We already have a flag for ZFS (KARGS_FLAGS_ZFS, 0x4). So the new flag could be named something ZFS-specific (as silly as KARGS_FLAGS_ZFS2) or something more general such as KARGS_FLAGS_32_BYTES meaning that the total size of the arguments area is 32 bytes (as opposed to 24 previously). Does KARGS_FLAGS_GUID work? I think that's too terse; we already passed a pool guid via the existing argument space. So it should be something like KARGS_FLAGS_ZFS_FS_GUID or KARGS_FLAGS_ZFS_DS_GUID (DS - dataset). Ah. I do think the flag should indicate that the bootinfo structure is larger; I was assuming you were adding a new GUID field that didn't exist before. I can't think of something better than KARGS_FLAGS_32. What might be nice actually, is to add a new field to indicate the size of the argument area and to set a flag to indicate that the size field is present (KARGS_FLAGS_SIZE)? YES! A size field (preferably as the first field in the struct) along with a flag to indicate that it's a new-style boot info struct that starts with a size field, will allow future changes without a lot of drama. It can allow code that has to deal with the struct without interpreting it (such as trampoline code that has to copy it to a new stack or memory area as part of loading the kernel) to be immune to future changes. This probably isn't a big deal in the x86 world, but it can be important for embedded systems where a proprietary bootloader has to pass info to a proprietary board_init() type routine in the kernel using non-proprietary loader/trampoline code that's part of the base. We have a bit of a mess in this regard in the ARM world right now, and it would be a lot less messy if something like this had been in place.
-- Ian
Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool
On Wed, 2012-04-18 at 17:36 +0300, Andriy Gapon wrote: on 18/04/2012 17:22 Ian Lepore said the following: YES! A size field (preferably as the first field in the struct) along with a flag to indicate that it's a new-style boot info struct that starts with a size field, will allow future changes without a lot of drama. It can allow code that has to deal with the struct without interpreting it (such as trampoline code that has to copy it to a new stack or memory area as part of loading the kernel) to be immune to future changes. Yeah, placing the new field at front would immediately break compatibility and even access to the flags field :-) Code would only assume the new field was at the front of the struct if the new flag is set, otherwise it would use the historical struct layout. -- Ian
Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool
On Wed, 2012-04-18 at 17:36 +0300, Andriy Gapon wrote: on 18/04/2012 17:22 Ian Lepore said the following: YES! A size field (preferably as the first field in the struct) along with a flag to indicate that it's a new-style boot info struct that starts with a size field, will allow future changes without a lot of drama. It can allow code that has to deal with the struct without interpreting it (such as trampoline code that has to copy it to a new stack or memory area as part of loading the kernel) to be immune to future changes. Yeah, placing the new field at front would immediately break compatibility and even access to the flags field :-) Oh wait, is the flags field embedded in the struct? My bad, I didn't look. In the ARM code I'm used to working with, the flags are passed from the bootloader to the kernel entry point in a register; I don't know why I assumed that would be true on other platforms. -- Ian
Re: [GSoC] [ARM] arm cleanup - my own proposal
On Sun, 2012-04-01 at 20:19 +0200, Aleksander Dutkowski wrote: hello! after a few weeks of searching for an idea that interests me, I've decided to propose my own. It is already mentioned on the IdeasPage: - ARM cleanup. Why have I chosen this one? I am very interested in the embedded world. Now I am working on porting FBSD to the at91sam9g45 - I will be much more motivated working on an arm fbsd project than any other. Why should you let me do that project? While working on freebsd/arm I've noticed places that could be optimized or separated, i.e. at91_samsize() should be declared for each board separately - right now, this function has if-else checks for which board it is running on. I would like to identify and fix those bugs, so the code will be more efficient and clear. Moreover, I think there should be a tutorial/framework for adding new boards or SoCs, so it will be simpler. I am currently reading the code in sys/arm/at91 and searching for improvements, but I will be very pleased if you send me your insights. The first question is - should I clean up only the at91 branch or more? I am quite familiar with at91 right now. The second - how to test the code? Some boards could be tested in qemu; I could buy a board with an at91rm9200, for example, if I'm in. But maybe I will find people here with their own boards who could help me with testing? I have an sbc6045 board with the at91sam9g45 SoC, but it doesn't have fbsd support yet (I'm working on it now :) ) I also thought about reducing kernel size for embedded, if arm cleanup won't fit. I'm curious whether you ever got a reply to this privately, since nothing appeared on the list? I meant to reply and offer to do testing of at91 changes on rm9200 hardware, but I was on vacation when you posted originally, and I forgot to reply until just now. It's been my growing impression for about a year that the arm support in FreeBSD has atrophied to the point where it can barely be said that it's supported at all.
Now I see this morning that marius@ has committed a set of style cleanups to the at91 code (r234281), so maybe it's not quite as dead as I feared. At Symmetricom we build a variety of products based on the rm9200, and we're maintaining quite a set of diffs from stock FreeBSD. Some are bug fixes, some are enhancements such as allowing the master clock frequency to be changed during kernel init (instead of in the bootloader) and a hints-based system that allows the atmelarm bus to become aware of new child devices that aren't in the stock code and manage their resources. It sure would be nice if some of those diffs could get rolled back in; it would certainly make it easier for me to integrate things like Marius' style cleanups back into our repo. Anyway, if ongoing changes are going to be happening to the at91 code, I'm certainly interested in helping however I can. -- Ian
Re: Debugging zombies: pthread_sigmask and sigwait
On Wed, 2012-04-11 at 16:11 +0200, Mel Flynn wrote: Hi, I'm currently stuck on a bug in Zarafa-spooler that creates zombies, and am working around it by claiming that our pthread library isn't normal, which makes it use standard signals rather than a signal thread. My limited understanding of these facilities is however not enough to see the actual problem here, and reading of related manpages did not lead me to a solution either. A test case reproducing the problem is attached. What happens is that SIGCHLD is never received by the signal thread and the child processes turn to zombies. Signal counters never go up, not even for SIGINFO, which I added specifically to see if anything gets through at all. The signal thread shows being stuck in sigwait. It's reproducible on 8.3-PRERELEASE of a few days ago (r233768). I'm not able to test it on anything newer unfortunately, but I suspect this is a bug/linuxism in the code, not in FreeBSD. Thanks in advance for any insights. The signal mask for a new thread is inherited from the parent thread. In your example code, the signal handling thread inherits the blocked status of the signals as set up in main(). Try adding this line to signal_handler() before it goes into its while() loop: pthread_sigmask(SIG_UNBLOCK, &signal_mask, NULL); -- Ian
Re: Debugging zombies: pthread_sigmask and sigwait
On Wed, 2012-04-11 at 17:47 +0300, Konstantin Belousov wrote: On Wed, Apr 11, 2012 at 08:26:13AM -0600, Ian Lepore wrote: On Wed, 2012-04-11 at 16:11 +0200, Mel Flynn wrote: Hi, I'm currently stuck on a bug in Zarafa-spooler that creates zombies, and am working around it by claiming that our pthread library isn't normal, which makes it use standard signals rather than a signal thread. My limited understanding of these facilities is however not enough to see the actual problem here, and reading of related manpages did not lead me to a solution either. A test case reproducing the problem is attached. What happens is that SIGCHLD is never received by the signal thread and the child processes turn to zombies. Signal counters never go up, not even for SIGINFO, which I added specifically to see if anything gets through at all. The signal thread shows being stuck in sigwait. It's reproducible on 8.3-PRERELEASE of a few days ago (r233768). I'm not able to test it on anything newer unfortunately, but I suspect this is a bug/linuxism in the code, not in FreeBSD. Thanks in advance for any insights. The signal mask for a new thread is inherited from the parent thread. In your example code, the signal handling thread inherits the blocked status of the signals as set up in main(). Try adding this line to signal_handler() before it goes into its while() loop: pthread_sigmask(SIG_UNBLOCK, &signal_mask, NULL); This is completely wrong. sigwait(2) requires the waited signals to be blocked, so the code is right in this regard. Oops, sorry. The code that sets up our signal handling threads uses SIG_SETMASK rather than BLOCK/UNBLOCK, and my quick glance at it misinterpreted what it was doing.
-- Ian
Re: Regarding coredump and restart
On Fri, 2012-03-30 at 01:10 +0800, Mahesh Babu wrote: I am currently working on coredump and then restarting the process in FreeBSD 9. I have created the coredump file for a process using gcore of gdb. I am not able restart the process from the coredump file. Is there any ways to restart the process using gdb itself or any other ways to implement restarting of the process from the coredump file? Thanks, Mahesh A coredump does not contain the entire state of a process, it only contains the part of the state that is contained within memory belonging to the process. Other parts of the state can exist outside of that memory. For example, in open disk files, in the corresponding state of another process at the other end of a socket connection, and so on. Bringing back the memory image will not bring back the corresponding state in external resources. -- Ian
Re: Graphical Terminal Environment
On Tue, 2012-03-06 at 10:24 -0500, Brandon Falk wrote: On 3/6/2012 11:05 AM, per...@pluto.rain.com wrote: Brandon Falk bfalk_...@brandonfa.lk wrote: I haven't tried tmux yet, but on my system I'm only able to get 80x40 with vidcontrol on one monitor. But with xterm in xorg I can get 319x89 per monitor ... To get higher resolution than what vidcontrol provides, you'll most likely need to run the display in graphic mode (which is what X11 does) rather than in text mode. That means that you will need to either use, or reinvent, the lowest levels of X (display driver, window mapping) and at least part of the xterm/rxvt application (terminal emulation, font rasterizing, perhaps scrolling). You could, however, eliminate the X practice of using the network to connect the terminal emulator to the display; this would give you an architecture resembling SunView (and its predecessor, SunTools). I _think_ SunTools/SunView were proprietary, although it's possible that Sun released the source code at some point. You could try doing some research with Google and/or the Internet Archive. That's pretty much my plan. To write some lower level drivers to put the system in a graphics mode. I have 4 monitors and there is no other way to get multiple monitors without a GPU specific driver (at least from my VGA OSDev experience). My goal will be to make a driver that will be able to be runnable by any other driver easily. Instead of having to use Xorg, just calls to the video driver to set the mode to graphics, then some primitive functions to draw lines and dots. I don't see why Xorg should dominate the drivers completely, I really wish it was a matter of having an open, well documented, easy to use API that you can just give direct commands to. From my understanding, this is the current model:

[ Apps ]
   |
   v
[ Xorg ]
   |
   v
[ Driver ]
   |
   v
[ GPU ]

I think it should be the following:

[ Apps ]
   |
   v
[ Xorg ]   [ Apps ]
   |          |
   v          v
  [ Driver ]
      |
      v
   [ GPU ]

Does this make sense to anyone else?
I really want to get this idea across because I think it would be really beneficial. -Brandon With that model and your statement that the driver should support only primitive functions to draw lines and dots, that leaves the non-trivial problem of font rendering to the app. Given your original goal, font rendering is pretty much the bulk of what you want to do; is the app layer the right place for it? -- Ian
Re: mtree(8) reporting of file modes
On Tue, 2012-03-06 at 12:41 -0800, David Wolfskill wrote: As I mentioned in http://docs.FreeBSD.org/cgi/mid.cgi?20120306000520.GS1519, at work, we're trying to use mtree(8) to do reality checks on server configuration/provisioning. (We are not proposing the use of mtree to actually enforce a particular configuration -- we are only considering using it to generate specification files, then check a given system against those specification files.) I had thought it odd (after running mtree -c) that most of the entries in the resulting specification file failed to mention the mode of the file; this was the catalyst for the above-cited message. In the meantime, I started poking at the sources. Caveat: I'm not really a C programmer; the bulk of my background is in sysadmin-type positions (though I've been doing other stuff for the last 4 years). Anyway, I fairly quickly focused my attention on src/usr.sbin/mtree/create.c, in particular, on the statf() function therein. Most of this part of the code is barely changed since 4.4 Lite; the most recent change to the section in question (lines 207 - 208 from the version in head as of r232599) was made by rgrimes@ back in 1994. So I presume that there's something I'm overlooking or otherwise missing, since the folks who have been here before were certainly more clueful than I am. But the code in question:

...
206	}
207	if (keys & F_MODE && (p->fts_statp->st_mode & MBITS) != mode)
208		output(indent, &offset, "mode=%#o", p->fts_statp->st_mode & MBITS);
...

is what outputs the mode to standard output. Here is (the bulk of) what I found:

* The keys & F_MODE term merely tests to see if we are interested in reporting the file mode. (By default, we are.)
* p->fts_statp->st_mode refers to the st_mode returned from stat() for the file presently being examined.
* MBITS is a mask of mode bits about which we care; it is defined (in mtree.h) as (S_ISUID|S_ISGID|S_ISTXT|S_IRWXU|S_IRWXG|S_IRWXO). These are defined in sys/stat.h; MBITS, thus, works out to 07777. 
* mode is set to the (masked) mode of the (immediately) enclosing directory when it is visited in pre-order. (This is done in statd().) As a result, we only report the mode of a file if it differs from the mode of its parent directory. Huh??!? Maybe I'm confused, but certainly for my present purposes, and likely in general, I'd think it would make sense to just always report the file mode. A way to do that would be to change the above excerpt to read:

...
206	}
207	if (keys & F_MODE)
208		output(indent, &offset, "mode=%#o", p->fts_statp->st_mode & MBITS);
...

Another alternative, in case there are use cases for the existing behavior, would be to provide either another key or a command-line flag that says "give me all the modes". Am I the only one who would find such a change useful? Thanks for any reality checks. :-} Peace, david

At a glance I think the idea here is that when it outputs the directory entry it outputs a /set line that has the directory's mode in it, and then as it does the files in that directory it only needs to output a mode= clause for a file if it differs from the most recent /set line. (This is based on studying the code for about 30 seconds, so don't take it as gospel.) -- Ian
Re: How to access kernel memory from user space
On Wed, 2012-02-22 at 17:24 +, Svetlin Manavski wrote: Hi all, I have a very similar problem as described in this thread back in 2009: http://lists.freebsd.org/pipermail/freebsd-hackers/2009-January/027367.html I have a kernel module producing networking stats which I need to frequently read from the user space. A copy of the data structure would be too expensive, so I need to access the kernel data directly from user space. Unfortunately Alexej's code crashes in the following area:

vm_map_lookup(&kmem_map, addr, VM_PROT_ALL, &myentry, &myobject, &mypindex, &myprot, &mywired); /* OUT */
vm_map_lookup_done(kmem_map, myentry);

I am using 64bit FreeBSD 8.2 on Intel Xeon hardware. Any idea how to make a stable implementation on my platform? Thank you, Svetlin

I've never done this, but if I needed to, I think the first thing I'd try is to use an mmap(2) of /dev/kmem to map the memory you need into userspace (of course your userspace app will need to be running with root privs to do this). That leaves the interesting problem of locating what offset within the kernel virtual address space you need to map to get at your data. Two things come to mind... have your kernel module export the address in a sysctl (that feels kind of hack-ish but it should be quick and easy to do), or use libkvm's kvm_nlist() function to locate the symbol within your module (I think that should be possible; again I've never actually done any of this). -- Ian
Re: Parallels v4 regression (aka ada(4) oddity) in RELENG_9
On Mon, 2012-01-23 at 10:06 -0800, Devin Teske wrote: I have a Parallels virtual machine and it runs FreeBSD 4 through 8 just swimmingly. However, in RELENG_9 I notice something different. My once ad0 is now showing up as ada0. However, something even stranger is that devfs is providing both ad0 family devices AND ada0 family devices. What's worse is that I can't seem to partition the disk with MBR+disklabel scheme. My procedure goes something like this:

1. Boot from RELENG_9 LiveCD
2. Execute: sysctl -n kern.disks
3. Notice two items: cd0 ada0
4. Look in /dev
5. Notice several items: ad0 ad0p1 ad0p2 ad0p3 ada0 ada0p1 ada0p2 ada0p3
6. Wipe partition table by executing: dd if=/dev/zero of=/dev/ada0 bs=512k count=256
7. Look in /dev
8. Notice fewer items now: ad0 ada0
9. Execute: sysctl -n kern.disks
10. Notice nothing changed: cd0 ada0
11. Write out standard whole-disk MBR slice
12. Look in /dev
13. Notice that nothing changed: ad0 ada0 (NOTE: Where is ad0s1 or ada0s1?)
14. Use fdisk to make sure everything was written successfully
15. Notice everything looks good (slice 1 is of type FreeBSD; slices 2, 3, and 4 are unused)
16. Reboot
17. Boot back into RELENG_9 LiveCD
18. Look in /dev
19. Notice that the old devices are back!: ad0 ad0p1 ad0p2 ad0p3 ada0 ada0p1 ada0p2 ada0p3
20. Use fdisk to look at MBR partition table
21. Notice that things look good (with respect to fdisk'ing): slice 1 is FreeBSD; 2, 3, and 4 are still unused
22. Notice /dev still doesn't have ad0s1 or ada0s1
23. Use gpart to look at ada0
24. Notice GPT [CORRUPT] ... OK!?!?

Use same exact RELENG_9 LiveCD on either a physical machine or VMware virtual machine: SUCCESS!! Go back to Parallels 4: FAILURE!! Go back to RELENG_8 LiveCD with Parallels 4: SUCCESS!! What's going on here? I think ada(4) is my problem. Can someone please provide feedback? Willing to dig further and provide any/all feedback to help fix this regression. 
I've experienced the part of that scenario where changing a drive from gpt to mbr scheme results in all the gpt partitions reappearing after a reboot. I concluded (but didn't take time to be absolutely certain) that during boot the geom layer was seeing the backup gpt partition info at the end of the disk and concluding that it needed to ignore the mbr and use the backup gpt info instead. Once I quit using dd and similar tools and consistently used gpart destroy to wipe out the gpt before changing to mbr, it stopped happening. -- Ian
Re: Parallels v4 regression (aka ada(4) oddity) in RELENG_9
On Mon, 2012-01-23 at 10:15 -0800, Garrett Cooper wrote: On Mon, Jan 23, 2012 at 10:06 AM, Devin Teske devin.te...@fisglobal.com wrote: I have a Parallels virtual machine and it runs FreeBSD 4 through 8 just swimmingly. However, in RELENG_9 I notice something different. My once ad0 is now showing up as ada0. However, something even stranger is that devfs is providing both ad0 family devices AND ada0 family devices. What's worse is that I can't seem to partition the disk with MBR+disklabel scheme. My procedure goes something like this:

1. Boot from RELENG_9 LiveCD
2. Execute: sysctl -n kern.disks
3. Notice two items: cd0 ada0
4. Look in /dev
5. Notice several items: ad0 ad0p1 ad0p2 ad0p3 ada0 ada0p1 ada0p2 ada0p3
6. Wipe partition table by executing: dd if=/dev/zero of=/dev/ada0 bs=512k count=256
7. Look in /dev
8. Notice fewer items now: ad0 ada0
9. Execute: sysctl -n kern.disks
10. Notice nothing changed: cd0 ada0
11. Write out standard whole-disk MBR slice
12. Look in /dev
13. Notice that nothing changed: ad0 ada0 (NOTE: Where is ad0s1 or ada0s1?)
14. Use fdisk to make sure everything was written successfully
15. Notice everything looks good (slice 1 is of type FreeBSD; slices 2, 3, and 4 are unused)
16. Reboot
17. Boot back into RELENG_9 LiveCD
18. Look in /dev
19. Notice that the old devices are back!: ad0 ad0p1 ad0p2 ad0p3 ada0 ada0p1 ada0p2 ada0p3
20. Use fdisk to look at MBR partition table
21. Notice that things look good (with respect to fdisk'ing): slice 1 is FreeBSD; 2, 3, and 4 are still unused
22. Notice /dev still doesn't have ad0s1 or ada0s1
23. Use gpart to look at ada0
24. Notice GPT [CORRUPT] ... OK!?!?

Use same exact RELENG_9 LiveCD on either a physical machine or VMware virtual machine: SUCCESS!! Go back to Parallels 4: FAILURE!! Go back to RELENG_8 LiveCD with Parallels 4: SUCCESS!! What's going on here? I think ada(4) is my problem. Can someone please provide feedback? 
Willing to dig further and provide any/all feedback to help fix this regression. The 'bug' is in gpart/geom and the 'issue' is present in prior versions of FreeBSD. The backup partition is now more of a thorn in everyone's side than in previous versions. gpart delete'ing all the partitions, then doing gpart destroy is probably what you want (there isn't a simple one-liner that would do this). Thanks, -Garrett

'gpart destroy -F geom' should do it in one step. -- Ian
Re: Rebooting/Halting system from kernel module
On Sun, 2012-01-22 at 14:19 +0400, geoffrey levand wrote: Hi, how would I reboot/halt the system from a kernel module? regards -- Почта@Mail.Ru в твоем мобильном! Просто зайди с телефона на m.mail.ru

There is an undocumented (at least in terms of a manpage) function named shutdown_nice() in sys/kern/kern_shutdown.c that will send a signal to the init process if it's running or call boot(9) if not. Or maybe a direct call to boot(9) is what you're looking for, if bypassing the running of rc shutdown scripts and all is your goal. (There is a manpage for boot(9).) -- Ian
Re: FreeBSD has serious problems with focus, longevity, and lifecycle
On Tue, 2012-01-17 at 10:56 -0800, Julian Elischer wrote: If it came to that, maybe all the people who are currently saying they need better support of the 8.x branch could get together and, together, support someone to do that job for them. Would 1/5th of a person be too expensive for them? If not, what is a reasonable cost? Is it worth 1/20th of a person? Julian

I've got to say, this strikes me as the most interesting idea floated so far in this conversation. I've heard of many instances of sponsored projects; they almost always involve major new features or support for new hardware or technologies; paying someone for a specific small focused fix is also common. A sponsored branch is... well... just an interesting concept to me. Unlike most developers, I have little interest in creating new code from scratch to implement the fad of the week. (There's that whole other open-source OS if fad-of-the-week technology is your thing.) I live to find and fix bugs. Sometimes that means days of frustration to generate a one-line patch. Sometimes you find the problem in minutes but the fix means a painful redesign that touches 342 files and has the potential to ruin everyone's day when you get it wrong. But, for me at least, it's much more challenging and thus more rewarding when you get it right. Despite being a developer myself, I understand completely where John is coming from in opening this conversation, and I'm firmly in the "me too" camp because I'm also an end user of FreeBSD. I work at a company that creates embedded systems products with FreeBSD as our OS. In July we started the process of converting our products from 6.2 to 8.2. Out of sheer emergency necessity we shipped a product using 8.2 in October -- 6.2 was panicking and the customer was screaming, we had no choice; we've had to do several fix releases since then. It's only within the past couple weeks that I think we're finally ready to deploy 8.2 for all new products. 
More testing is needed before updating existing products in the field. It takes a long time for a business to vet a major release of an OS and deploy it. It costs a lot. Now, before we're even really completely up and running on 8.2 at work, 9.0 hits the street, and developers have moved on to working in the 10.0 world. What are the chances that any of the patches I've submitted for bugs we fixed in 8.x are ever going to get committed now that 8 is well on its way to becoming ancient history in developers' minds? So back to where I started this rambling... that concept of a sponsored branch, or maybe something along the lines of a long-lived stable branch supported by a co-op of interested users. Some co-op members may be able to provide developers or other engineering-related resources, some may just pay cash to help acquire those resources for various short-term or targeted needs along the way. I think it could work, and I think businesses that need such stability might find it easier to contribute something to a co-op than the current situation that requires a company such as ours to become, in effect, our own little FreeBSD Project Lite (if you think FreeBSD lacks manpower to do release engineering, imagine how hard it is for a small or medium-sized business). -- Ian
Re: FreeBSD has serious problems with focus, longevity, and lifecycle
On Wed, 2012-01-18 at 01:17 +0200, Andriy Gapon wrote: on 17/01/2012 23:46 Ian Lepore said the following: Now, before we're even really completely up and running on 8.2 at work, 9.0 hits the street, and developers have moved on to working in the 10.0 world. What are the chances that any of the patches I've submitted for bugs we fixed in 8.x are ever going to get committed now that 8 is well on its way to becoming ancient history in developers' minds? My opinion is that this will have more to do with your approach to pushing the patches (and your persistence) rather than with anything else. As long as stable/8 is still a supported branch or the bugs are reproducible in any of the supported branches.

Well I submitted a sort of random sample of the patches we're maintaining at work, 11 of them as formal PRs and 2 posted to the lists here recently. So far two have been committed (the most important one and the most trivial one, oddly enough). I'm not sure just how pushy one is supposed to be; I don't want to be a jerk. Not to mention that I wouldn't know who to push. That's actually why I'm now being active on the mailing lists: I figured maybe patches will be more accepted from someone the committers know rather than just as code out of the blue attached to a PR. I think it would be great if there were some developers (a team, maybe something not quite that formal) who concentrated on maintenance of older code for the user base who needs it. I'd be happy to contribute to that effort, both on my own time, and I have a commitment from management at work to allow me a certain amount of billable work hours to interface with the FreeBSD community, especially in terms of getting our work contributed back to the project (both to help the project, and to help us upgrade more easily in the future). I have no idea if there are enough developers who'd be interested in such a concept to make it work, co-op or otherwise. 
But I like the fact that users and developers are talking about their various needs and concerns without any degeneration into flame wars. It's cool that most of the focus here is centered on how to make things better for everyone. -- Ian
Re: BeagleBone?
On Sun, 2012-01-15 at 16:05 -0800, Tim Kientzle wrote: Just got a BeagleBone in the mail and so far, it seems like fun: * Under $100 * Relatively modern Cortex-A8 ARM CPU (TI AM3358) * Built-in Ethernet, USB console, etc. So far, I've gotten console access from my FreeBSD laptop and am starting to tinker with a nanobsd-like script to build a bootable SD image. (By copying the MLO and u-boot.img files; nothing FreeBSD-specific yet.) Next step: Compile the arm/uboot boot loader and see if I can get that to load and run. Anyone else tinkering with one of these? Any hints? ;-) Tim

The freebsd-arm list would be the place for info. There's still work to do to get FreeBSD running on a Cortex-A8, last I heard. -- Ian
Re: trouble with atrtc
On Thu, 2012-01-05 at 10:33 -0500, John Baldwin wrote: On Wednesday, January 04, 2012 5:22:29 pm Ian Lepore wrote: [...] Because atrtc.c has a long and rich history of modifications, some of them fairly recent, I thought it would be a good idea to toss out my ideas for changes here and solicit feedback up front, rather than just blindly posting a PR with a patch... It turns out to be very easy to probe for the latched-read behavior with just a few lines of code in atrtc_start(), so I'd propose doing that and setting a flag that the in/out code can use to disable the caching of the current register number on hardware that needs it. I'd like to add a new public function, atrtc_nmi_enable(int enable), that drivers can use to manipulate the NMI flag safely under clock_lock and playing nicely with the register number caching code. Completely unrelated but nice to have: I'd like to add a tuneable to control the use of inb(0x84) ops to insert delays after writing to 0x70 and 0x71. Modern hardware doesn't need this, so I think it should default to not inserting delays. I've done all these things in our local 8.2 code base and tested them on all the hardware I've got on hand. If these changes sound acceptable I'll prepare patches to -current as well. These changes all sound good to me. Here is the patch for -current and 9. I can provide a patch to 8-stable as well; it's essentially the same patch with small context differences. I've tested this using -current on several systems, recent and old hardware, including manually bumping up the quality score for the rtc event timer to force it to get used, and it seems to work without trouble (and of course I've been testing the same patch in 8.2 for a while on a bunch of different hardware). 
Index: sys/isa/rtc.h
===
RCS file: /local/base/FreeBSD-CVS/src/sys/isa/rtc.h,v
retrieving revision 1.16.2.1
diff -u -p -r1.16.2.1 rtc.h
--- sys/isa/rtc.h	23 Sep 2011 00:51:37 -	1.16.2.1
+++ sys/isa/rtc.h	9 Jan 2012 22:04:12 -
@@ -117,6 +117,7 @@ extern int atrtcclock_disable;
 int	rtcin(int reg);
 void	atrtc_restore(void);
 void	writertc(int reg, u_char val);
+void	atrtc_nmi_enable(int enable);
 #endif
 #endif /* _I386_ISA_RTC_H_ */
Index: sys/x86/isa/atrtc.c
===
RCS file: /local/base/FreeBSD-CVS/src/sys/x86/isa/atrtc.c,v
retrieving revision 1.13.2.1
diff -u -p -r1.13.2.1 atrtc.c
--- sys/x86/isa/atrtc.c	23 Sep 2011 00:51:37 -	1.13.2.1
+++ sys/x86/isa/atrtc.c	9 Jan 2012 22:04:12 -
@@ -55,28 +55,59 @@ __FBSDID($FreeBSD: src/sys/x86/isa/atrt
 #define	RTC_LOCK	mtx_lock_spin(&clock_lock)
 #define	RTC_UNLOCK	mtx_unlock_spin(&clock_lock)
 
+/* atrtcclock_disable is set to 1 by apm_attach() or by hint.atrtc.0.clock=0 */
 int	atrtcclock_disable = 0;
 
-static	int	rtc_reg = -1;
-static	u_char	rtc_statusa = RTCSA_DIVIDER | RTCSA_NOPROF;
-static	u_char	rtc_statusb = RTCSB_24HR;
+static	int	use_iodelay = 0;	/* set from hint.atrtc.0.use_iodelay */
+
+#define	RTC_REINDEX_REQUIRED	0xffU
+#define	NMI_ENABLE_BIT		0x80U
+
+static	u_char	nmi_enable;
+static	u_char	rtc_reg = RTC_REINDEX_REQUIRED;
+static	u_char	rtc_statusa = RTCSA_DIVIDER | RTCSA_NOPROF;
+static	u_char	rtc_statusb = RTCSB_24HR;
+
+/*
+ * Delay after writing to IO_RTC[+1] registers.  Modern hardware doesn't
+ * require this expensive delay, so it's a tuneable that's disabled by default.
+ */
+static __inline void
+rtc_iodelay(void)
+{
+	if (use_iodelay)
+		inb(0x84);
+}
 
 /*
  * RTC support routines
+ *
+ * Most rtc chipsets let you write a value into the index register and then each
+ * read of the IO register obtains a new value from the indexed location. Others
+ * behave as if they latch the indexed value when you write to the index, and
+ * repeated reads keep returning the same value until you write to the index
+ * register again.  atrtc_start() probes for this behavior and leaves rtc_reg
+ * set to RTC_REINDEX_REQUIRED if reads keep returning the same value.
  */
+static __inline void
+rtcindex(u_char reg)
+{
+	if (rtc_reg != reg) {
+		if (rtc_reg != RTC_REINDEX_REQUIRED)
+			rtc_reg = reg;
+		outb(IO_RTC, reg | nmi_enable);
+		rtc_iodelay();
+	}
+}
+
 int
 rtcin(int reg)
 {
 	u_char val;
 
 	RTC_LOCK;
-	if (rtc_reg != reg) {
-		inb(0x84);
-		outb(IO_RTC, reg);
-		rtc_reg = reg;
-		inb(0x84);
-	}
+	rtcindex(reg);
 	val = inb(IO_RTC + 1);
 	RTC_UNLOCK;
 	return (val);
@@ -87,14 +118,9 @@ writertc(int reg, u_char val)
 {
 	RTC_LOCK;
-	if (rtc_reg != reg) {
-		inb(0x84);
-		outb(IO_RTC, reg);
-		rtc_reg
Re: backup BIOS settings
On Tue, 2012-01-10 at 04:01 +0100, Łukasz Kurek wrote: Hi, is it possible to back up BIOS settings (CMOS configuration) to a file and restore those settings on another machine (same hardware configuration and same BIOS)? I tried to do it this way:

kldload nvram
dd if=/dev/nvram of=nvram.bin   (backup)
dd if=nvram.bin of=/dev/nvram   (restore)

but this way always loads the default BIOS settings, not mine (probably there is some kind of error).

Examine the contents of the nvram.bin file with hexdump. If every byte has the same value, I just posted a patch to this list earlier today (subject is "trouble with atrtc") that will fix the problem. Many new RTC chipsets have more than the original 114 bytes of nvram. The nvram driver doesn't currently provide access to the extra banks. I'm not sure whether the BIOS would store anything in those other banks, but if so, failing to save and restore those values might cause the behavior you see. Also, it's not directly related to your question, but I notice the nvram(4) manpage says the driver does nothing about the checksum, but looking at the driver code, it does recalculate the checksum when it writes to nvram. -- Ian
Re: backup BIOS settings
On Tue, 2012-01-10 at 04:01 +0100, Łukasz Kurek wrote: Hi, is it possible to back up BIOS settings (CMOS configuration) to a file and restore those settings on another machine (same hardware configuration and same BIOS)? I tried to do it this way: kldload nvram; dd if=/dev/nvram of=nvram.bin (backup); dd if=nvram.bin of=/dev/nvram (restore), but this way always loads the default BIOS settings, not mine (probably there is some kind of error).

Oh wait, the patch I posted can't help, because it fixes a problem that only happens when you read the same location repeatedly, and the nvram driver never does that. But it would still be interesting to examine the nvram.bin file and see if it looks reasonable. -- Ian