Re: Mixing amd64 kernel with i386 world

2013-09-28 Thread Ian Lepore
On Sat, 2013-09-28 at 20:37 +1000, Peter Jeremy wrote:
 I have a system with 4GB RAM and hence need to use an amd64 kernel to use
 all the RAM (I can only access 3GB RAM with an i386 kernel).  OTOH, amd64
 processes are significantly (50-100%) larger than equivalent i386 processes
 and none of the applications I'll be running on the system need to be
 64-bit.
 
 This implies that the optimal approach is an amd64 kernel with i386
 userland (I'm ignoring PAE as a usable approach).  I've successfully
 run i386 jails on amd64 systems so I know this mostly works.  I also
 know that there are some gotchas:
 - kdump needs to match the kernel
 - anything accessing /dev/mem or /dev/kmem (which implies anything that
   uses libkvm) probably needs to match the kernel.
 
 Has anyone investigated this approach?
 

Why are you ignoring PAE?  It's been working for me for years.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Trying to use /bin/sh

2013-09-28 Thread Ian Lepore
On Sat, 2013-09-28 at 16:36 +, Teske, Devin wrote:
 On Sep 28, 2013, at 1:12 AM, Stefan Esser wrote:
 
  On 28.09.2013 00:14, Jilles Tjoelker wrote:
  sh's model of startup files (only login shells use startup files with
  fixed names, other interactive shells only use $ENV) assumes that every
  session will load /etc/profile and ~/.profile at some point. This
  includes graphical sessions. The ENV file typically contains only shell
  options, aliases, function definitions and unexported variables but no
  environment variables.
  
  Some graphical environments actually source shell startup files like
  ~/.profile when logging in. I remember this from CDE for example. It is
  important to have some rule where this should happen to avoid doing it
  twice or never in strange configurations. As a workaround, I made
  ~/.xsession a script interpreted by my login shell and source some
  startup files. A problem here is that different login shells have
  incompatible startup files.
  
  I used to modify Xsession to do the final exec with a forced login
  shell of the user. This worked for users of all shells.
  
  The script identified the shell to use and then used argv0 to start
  a login shell to execute the display manager.
  
  A simplified version of my Xsession script is:
  
  --
  #!/bin/sh
  
  LIB=/usr/local/lib
  
   SH=$SHELL
   [ -n "$SH" ] || SH=/bin/sh
   SHNAME=`basename $SH`
   
   echo "exec $LIB/xdm/Xsession.real $*" | \
       /usr/local/bin/argv0 $SH -$SHNAME
  --
  
  The argv0 command is part of sysutils/ucspi-tcp, BTW.
  
  This script prepends a - to the name of the shell that is
  started to execute the real Xsession, which had been renamed
   to Xsession.real.
  
  I know that the script could be further simplified by using modern
  variable expansion/substitution commands, but this script was in use
  some 25 years ago on a variety of Unix systems (SunOS, Ultrix, HP-UX)
  and I only used the minimal set of Bourne Shell facilities, then.
  
  You may want a command to source standard profiles or environment
   settings before the final exec, in case the user's shell does not
  load them.
  
 
 In my ~/.fvwm2rc file, this is how I launch an XTerm. This achieves the
 goal of sourcing my profile scripts like a normal login shell while launching
 XTerm(s) in the GUI.
 
DestroyFunc FvwmXTerm
AddToFunc   FvwmXTerm
PipeRead '\
 cmd=/usr/bin/xterm;                             \
 [ -x "${cmd}" ] || cmd=/usr/X11R6/bin/xterm;    \
 [ -x "${cmd}" ] || cmd=xterm;                   \
 cmd="${cmd} -sb -sl 400";                       \
 cmd="${cmd} -ls";                               \
 cmd="${cmd} -r -si -sk";                        \
 cmd="${cmd} -fn \"-misc-fixed-medium-r-*-*-15-*\"";  \
 echo "+ I Exec exec ${cmd}"'
 
 Essentially producing an XTerm invocation of:
 
    xterm -sb -sl 400 -ls -r -si -sk -fn "-misc-fixed-medium-r-*-*-15-*"
 
 And every time I launch an XTerm with that, I get my custom prompt set
 by ~/.bash_profile.
 
 Of course, I'm also a TCSH user, so when I flop over to tcsh, I also get
 my custom prompt set by ~/.tcshrc
 
 But failing that... you could actually make your XTerm a login shell with:
 
   xterm -e login
 
 But of course, then you're looking at having to enter credentials.
 
 Perhaps it's just a matter of getting your commands into the right file...
 
 .bash_profile for bash and .tcshrc for tcsh.

For bash the solution I've been using for like 15 years is that
my .bash_profile (used only for a login) contains simply:

if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi

And everything goes into .bashrc which runs on non-login shell
invocation.  I have a few lines of code in .bashrc that have to cope
with things like not blindly adding something to PATH that's already
there[1] but other than that I generally want all the same things to
happen whether it's a login shell or not.

I think the Bourne-shell equivalent is to have a .profile that just sets
ENV=~/.shrc or similar.  (I think someone mentioned that earlier in the
thread.)
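
For completeness, a minimal sketch of that arrangement for plain sh (the
file name ~/.shrc is just the conventional choice, nothing requires it):

    # ~/.profile -- read only by login shells; point ENV at the per-shell file
    ENV=$HOME/.shrc; export ENV

    # ~/.shrc -- read by every interactive sh because ENV names it
    alias ll='ls -lF'
    PS1='$ '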

[1] for example:

 if [[ $PATH != *$HOME/bin* && -d $HOME/bin ]] ; then
export PATH=$HOME/bin:$PATH
 fi

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


The right way to invoke sh from a freebsd makefile?

2013-09-22 Thread Ian Lepore
What's the right way to launch the bourne shell from a makefile?  I had
assumed the ${SHELL} variable would be set to the right copy
of /bin/sh (like maybe the one in tmp or legacy at various stages).  It
appears that that's not the case, and ${SHELL} is whatever comes from
the environment, which can lead to using csh or bash or whatever.

I see some of our makefiles use just a bare sh which seems reasonable
to me, but I don't want to glitch this in src/include/Makefile again.
The goal is to run a script in src/include/Makefile by launching sh with
the script name (as opposed to launching the script and letting the #!
do its thing, which doesn't work if the source dir is mounted noexec).
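
For illustration, the pattern I have in mind is something like this (a
sketch only; the script name and target here are made up, not the real
ones in src/include/Makefile):

	# Run the generator with a bare sh so the script itself never needs
	# to be executable (works even when the source tree is mounted noexec).
	LINKS_SCRIPT=	${.CURDIR}/make-links.sh

	some-target:
		sh ${LINKS_SCRIPT} ${DESTDIR}${INCLUDEDIR}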

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: The right way to invoke sh from a freebsd makefile?

2013-09-22 Thread Ian Lepore
On Sun, 2013-09-22 at 19:27 -0400, Glen Barber wrote:
 On Sun, Sep 22, 2013 at 05:18:25PM -0600, Ian Lepore wrote:
  What's the right way to launch the bourne shell from a makefile?  I had
  assumed the ${SHELL} variable would be set to the right copy
  of /bin/sh (like maybe the one in tmp or legacy at various stages).  It
  appears that that's not the case, and ${SHELL} is whatever comes from
  the environment, which can lead to using csh or bash or whatever.
  
  I see some of our makefiles use just a bare sh which seems reasonable
  to me, but I don't want to glitch this in src/include/Makefile again.
  The goal is to run a script in src/include/Makefile by launching sh with
  the script name (as opposed to launching the script and letting the #!
  do its thing, which doesn't work if the source dir is mounted noexec).
  
 
 I think BUILDENV_SHELL is what you are looking for.  For this specific
 case, I think instead of '#!/bin/sh', maybe '#!/usr/bin/env sh' may be
 preferable.
 
 Glen
 

No, BUILDENV_SHELL is a special thing... it's used when you make
buildenv to chroot into a cross-build environment to work
interactively.  I added that long ago because I can't live in a csh
shell (I mean, I can't do anything, I'm totally lost), and I wanted a
way to have make buildenv put me right into bash (of course, you have
to have bash in the chroot).

The flavor of hashbang to use shouldn't matter, since what I'm after
here is launching the shell to run the script without using the hashbang
mechanism.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: The right way to invoke sh from a freebsd makefile?

2013-09-22 Thread Ian Lepore
On Sun, 2013-09-22 at 19:45 -0400, Glen Barber wrote:
 On Sun, Sep 22, 2013 at 05:37:51PM -0600, Ian Lepore wrote:
  On Sun, 2013-09-22 at 19:27 -0400, Glen Barber wrote:
   On Sun, Sep 22, 2013 at 05:18:25PM -0600, Ian Lepore wrote:
What's the right way to launch the bourne shell from a makefile?  I had
assumed the ${SHELL} variable would be set to the right copy
of /bin/sh (like maybe the one in tmp or legacy at various stages).  It
appears that that's not the case, and ${SHELL} is whatever comes from
the environment, which can lead to using csh or bash or whatever.

I see some of our makefiles use just a bare sh which seems reasonable
to me, but I don't want to glitch this in src/include/Makefile again.
The goal is to run a script in src/include/Makefile by launching sh with
the script name (as opposed to launching the script and letting the #!
do its thing, which doesn't work if the source dir is mounted noexec).

   
   I think BUILDENV_SHELL is what you are looking for.  For this specific
   case, I think instead of '#!/bin/sh', maybe '#!/usr/bin/env sh' may be
   preferable.
   
   Glen
   
  
  No, BUILDENV_SHELL is a special thing... it's used when you make
  buildenv to chroot into a cross-build environment to work
  interactively.  I added that long ago because I can't live in a csh
  shell (I mean, I can't do anything, I'm totally lost), and I wanted a
  way to have make buildenv put me right into bash (of course, you have
  to have bash in the chroot).
  
 
 Ah, right.  Thanks for the sanity check.
 
  The flavor of hashbang to use shouldn't matter, since what I'm after
  here is launching the shell to run the script without using the hashbang
  mechanism.
  
 
 You can hard-code /bin/sh directly, but what I was getting at with the
 '#!/usr/bin/env sh' is that the 'sh' interpreter of the build
 environment could be used (instead of /bin/sh directly).  Then you don't
 need to worry about the path to sh(1).
 
 Glen
 

My point is that the #! isn't used at all in this case; it doesn't
matter what's there.  Try this...

  echo echo foo > /tmp/foo
  sh /tmp/foo

Not only does it not need the hashbang, the script doesn't even have to
be executable when you launch sh and name a script on the command line,
which is just what's needed to run a script from a directory mounted
with the noexec flag.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: BUS_PROBE_NOWILDCARD behaviour doesn't seem to match DEVICE_PROBE(9)

2013-07-29 Thread Ian Lepore
On Thu, 2013-06-20 at 10:54 -0400, Ryan Stone wrote:
 http://www.freebsd.org/cgi/man.cgi?query=DEVICE_PROBE&apropos=0&sektion=0&manpath=FreeBSD%208.2-RELEASE&format=html
 
 DEVICE_PROBE(9) has this to say about BUS_PROBE_NOWILDCARD:
 
 The driver expects its parent to tell it which children to manage and no
 probing is really done. The device only matches if its parent bus
 specifically said to use this driver.
 
 
 I interpreted this as meaning that if BUS_ADD_CHILD() is called with the
 name parameter specifying a driver then if that driver's probe method
 returns BUS_PROBE_NOWILDCARD the driver will match that device.  However
 the logic in subr_bus.c is more strict; it will only match if the unit
 number is also specified.  This seems overly strict to me, and there
 appears to be at least one case in-tree where a driver will never match due
 to this behaviour:
 
 http://svnweb.freebsd.org/base/head/sys/dev/iicbus/iicsmb.c?revision=227843&view=markup
 
 The iicsmb driver calls BUS_ADD_CHILD() from its identify method with a
 wildcarded unit number (-1) but the driver specified.  It then returns
 BUS_PROBE_NOWILDCARD from its probe method (intending that it only claim
 the device created in the identify method), but that won't match.
 
 I want to use the exact same pattern in a new driver.  The following patch
 allows this to work:
 
 diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c
 index 1f3d4e8..7e48b0e 100644
 --- a/sys/kern/subr_bus.c
 +++ b/sys/kern/subr_bus.c
 @@ -2015,7 +2015,7 @@ device_probe_child(device_t dev, device_t child)
  * in stone by the parent bus.
  */
 if (result <= BUS_PROBE_NOWILDCARD &&
 -   child->flags & DF_WILDCARD)
 +   !(child->flags & DF_FIXEDCLASS))
 continue;
 best = dl;
 pri = result;
 
 This should be safe to do, as all devices that specified a unit number must
 have specified a driver, so this can't cause any devices to suddenly fail
 to match.  I suppose that it theoretically could cause a driver to match a
 device that previously it wouldn't have, but I'm having trouble seeing how
 somebody could add a device of type "foo" and not expect the "foo" driver
 to attach.
 
 Any objections if I commit this?

I know this is pretty long after the fact, but it looks like this never
got committed.  I recently had to port some drivers written for freebsd
4 and 6 to 8.2, and some of them have no real probe mechanism and
attached themselves to, like, *everything* (serial and parallel ports
and so on).  They're instantiated based on hints that are definitive, so
I switched to returning BUS_PROBE_NOWILDCARD and sanity returned.

Then I remembered this email, so I applied your patch and re-tested and
everything still worked perfectly.  Not exactly an exhaustive test, but
at least a positive datapoint.
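
For anyone finding this in the archives later, the identify/probe pattern
under discussion looks roughly like this (a sketch with made-up names, not
the iicsmb code or the drivers I ported):

	#include <sys/param.h>
	#include <sys/kernel.h>
	#include <sys/module.h>
	#include <sys/bus.h>

	/* "mydev" and its parent bus are hypothetical names. */
	static void
	mydev_identify(driver_t *driver, device_t parent)
	{
		/* Add our child with a wildcard unit (-1) but a fixed driver name. */
		if (device_find_child(parent, "mydev", -1) == NULL)
			BUS_ADD_CHILD(parent, 0, "mydev", -1);
	}

	static int
	mydev_probe(device_t dev)
	{
		device_set_desc(dev, "example device");
		/* Only match the device our identify method created. */
		return (BUS_PROBE_NOWILDCARD);
	}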

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: bin/176713: [patch] nc(1) closes network socket too soon

2013-07-29 Thread Ian Lepore
On Tue, 2013-07-23 at 16:48 -0700, Ronald F. Guilmette wrote:
 In message 
 caj-vmonk-8v9ej0w4qycnnbkieoee9dl3btvp6vqipxkh2j...@mail.gmail.com
 Adrian Chadd adr...@freebsd.org wrote:
 
 Right, and your patch just stops the shutdown(), right?
 
 The shutdown that occurs when EOF is encountered on stdin, yes.
 
 Rather than
 teaching nc to correctly check BOTH socket states before deciding to
 close things.
 
 In effect, nc *is* currently checking both sockets and that is exactly
 the problem.  It terminates (prematurely in some cases) whenever it sees
 an EOF _either_ from the remote host _or_ from stdin.
 
 My patch causes nc to wait for EOF from the remote server before exiting,
 EVEN IF prior to the time it sees that EOF from the remote server it sees
 an EOF (first) on stdin.
 
 This code change demonstrably makes the functionality of nc better and
 more pragmatically useful in typical use cases.
 
 You appear to be proposing something else, but I'm sorry to say that
 I cannot decipher what, exactly, you are attempting to propose.
 
 I have proposed specific code changes.  If you have some different ones
 that you would like to propose, then I feel sure that everyone on the
 hackers list, including myself, would be interested to take a look at
 what you have in mind, and also what problem you are solving.
 
 I'd personally rather see nc taught to check to see whether it can
 possibly make ANY more progress before deciding to shut things down.
 
 I believe that that is exactly what the patch that I proposed does.
 I'm not sure why you feel otherwise.
 
 Look, there are only two scenarios... either (a) EOF arrives from stdin
 first or else (b) EOF arrives from the remote server first.
 

I don't think this accurately summarizes things.  The view of the remote
server isn't just "EOF arrives"; it's "can't read anymore" and "can't
write anymore", which have to be handled separately.

 My patch causes nc to continue gathering data from the remote server
 (and copying it all to stdout) in case (a).
 
 In case (b) there is no point in nc continuing to run (and/or continuing
 to read from stdin) if the remote server has shut down the connection.
 In this case, the data that nc might yet gather from its stdin channel
 has no place to go!  So whenever nc has sensed an EOF from the remote
 server it can (and should) immediately shut down... and that is exactly
 what it is _already_ programmed to do.
 

Here you seem to be talking about the inability to send more data to the
remote side.  If you exit immediately when that happens, even if you
could still read from the remote side, then you may miss the incoming
data that would tell you why you can't send anymore.  In this case the
thing to do would be to stop reading stdin, but continue to read the
remote side and copy it to stdout until you get EOF reading the remote
side.

Conversely, you can't exit immediately when the remote side has no more
to send you and shuts down that half of the connection; you still have
to read from stdin and send it to the remote until EOF on stdin or the
remote shuts down that half of the connection.
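
To make the two cases concrete, here's a rough sketch (standalone code,
not the actual nc source) of a relay loop that treats the two halves
independently: EOF on stdin half-closes our send side but keeps reading,
and EOF from the remote stops reading but keeps forwarding stdin:

	#include <poll.h>
	#include <unistd.h>
	#include <sys/socket.h>

	static void
	relay(int net)	/* 'net' is assumed to be a connected TCP socket */
	{
		char buf[8192];
		ssize_t n;
		int stdin_open = 1;	/* still reading stdin */
		int net_rd_open = 1;	/* remote may still send us data */

		while (stdin_open || net_rd_open) {
			struct pollfd pfd[2];

			pfd[0].fd = net_rd_open ? net : -1;	/* poll ignores fd -1 */
			pfd[0].events = POLLIN;
			pfd[1].fd = stdin_open ? STDIN_FILENO : -1;
			pfd[1].events = POLLIN;
			if (poll(pfd, 2, -1) < 0)
				break;
			if (net_rd_open && (pfd[0].revents & (POLLIN | POLLHUP))) {
				n = read(net, buf, sizeof(buf));
				if (n <= 0)
					net_rd_open = 0;	/* remote is done sending */
				else
					write(STDOUT_FILENO, buf, n);
			}
			if (stdin_open && (pfd[1].revents & (POLLIN | POLLHUP))) {
				n = read(STDIN_FILENO, buf, sizeof(buf));
				if (n <= 0) {
					shutdown(net, SHUT_WR); /* half-close our send side */
					stdin_open = 0;
				} else if (write(net, buf, n) < 0) {
					stdin_open = 0;		/* remote won't take more */
				}
			}
		}
	}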

How all this applies to netcat's ability to do connectionless (UDP)
stuff probably makes the whole thing that much more interesting.

BTW, earlier in the thread you asserted more or less that telnet is for
interactive use and nc for scripting.  I virtually never use nc in any way
except interactively, and I use it that way every day, all day long.

-- Ian

 So, what problem do you want to solve that is not solved by the patch
 that I already proposed?
 
 Also, with respect, if you think there really is some other problem,
 then proposing actual concrete patches to solve that other problem
 would perhaps allow folks, including myself, to better understand what
 it is that you are driving at.
 
 
 Regards,
 rfg
 ___
 freebsd-hackers@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
 To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: rc.d scripts to control multiple instances of the same daemon?

2013-06-25 Thread Ian Lepore
On Tue, 2013-06-25 at 15:44 -0400, Garrett Wollman wrote:
 I'm in the process of (re)writing an rc.d script for kadmind
 (security/krb5).  Unlike the main Kerberos daemon, kadmind needs to
 have a separate instance for each realm on the server -- it can't
 support multiple realms in a single process.  What I need to be able
 to do:
 
 1) Have different flags and pidfiles for each instance.
 2) Be able to start, stop, restart, and status each individual
 instance by giving its name on the command line.
 3) Have all instances start/stop automatically when a specific
 instance isn't specified.
 
 I've looked around for examples of good practice to emulate, and
 haven't found much.  The closest to what I want looks to be
 vboxheadless, but I'm uncomfortable with the amount of mechanism from
 rc.subr that it needs to reimplement.  Are there any better examples?

The one like that I use the most is "service netif restart fxp0", but I'm
not sure the complex network stuff will be the cleanest example of
anything except how to do complex network stuff. :)
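
If it helps, a bare-bones sketch of the per-instance idea (all names here
are made up and I haven't run this against rc.subr's corner cases) might
look like:

	#!/bin/sh
	# PROVIDE: exampled
	# REQUIRE: DAEMON

	. /etc/rc.subr

	name=exampled
	rcvar=exampled_enable
	command=/usr/local/sbin/exampled

	load_rc_config $name

	# "service exampled start foo" operates on just the "foo" instance;
	# with no instance named, loop over ${exampled_instances} from rc.conf.
	instance=$2
	if [ -n "$instance" ]; then
		pidfile=/var/run/${name}.${instance}.pid
		eval command_args=\"\$${name}_${instance}_flags\"
		run_rc_command "$1"
	else
		for instance in ${exampled_instances}; do
			/usr/local/etc/rc.d/${name} "$1" "$instance"
		done
	fi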

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Custom kernel under RPI

2013-03-15 Thread Ian Lepore
On Fri, 2013-03-15 at 18:21 +0100, Loïc BLOT wrote:
 Hi all,
 I don't know if this is the right list, but for the RPI I think hackers is
 a good fit :D
 
 I have a little problem with a custom kernel on the RPI. I modified the
 RPI-B config file to include the run/runfw driver, compiled the kernel and
 installed it (make buildkernel KERNCONF=RPI-B && make installkernel
 KERNCONF=RPI-B, from the RPI). The problem is at reboot: I can't boot the
 RPI, because the kernel freezes after these lines:
 
 Kernel entry at 0x100100 ..
 Kernel args: (null)
 
 Nothing after.
 Can someone tell me if I'm doing something wrong?
 Thanks in advance

For arm-specific questions, the freebsd-arm list might be better (I've
added it to the CC).

The problem may be that it has no device-tree info.  You can add "fdt
addr 0x100" to the /boot/loader.rc file to fix that.  You can also enter
it by hand at the loader prompt first to see if that helps... just hit a
character (other than return) while it's loading the kernel, enter that
command, then enter 'boot'.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: rtprio_thread trouble

2013-03-06 Thread Ian Lepore
On Wed, 2013-03-06 at 09:17 -0500, John Baldwin wrote:
 On Thursday, February 28, 2013 2:59:16 pm Ian Lepore wrote:
  On Tue, 2013-02-26 at 15:29 -0500, John Baldwin wrote:
   On Friday, February 22, 2013 2:06:00 pm Ian Lepore wrote:
I ran into some trouble with rtprio_thread() today.  I have a worker
thread that I want to run at idle priority most of the time, but if it
falls too far behind I'd like to bump it back up to regular timeshare
priority until it catches up.  In a worst case, the system is
continuously busy and something scheduled at idle priority is never
going to run (for some definition of 'never').  

What I found is that in this worst case, even after my main thread has
used rtprio_thread() to change the worker thread back to
RTP_PRIO_NORMAL, the worker thread never gets scheduled.  This is with
the 4BSD scheduler but it appears that the same would be the case with
ULE, based on code inspection.  I find that this fixes it for 4BSD, and
I think the same would be true for ULE...

--- a/sys/kern/sched_4bsd.c Wed Feb 13 12:54:36 2013 -0700
+++ b/sys/kern/sched_4bsd.c Fri Feb 22 11:55:35 2013 -0700
@@ -881,6 +881,9 @@ sched_user_prio(struct thread *td, u_cha
 return;
 oldprio = td->td_user_pri;
 td->td_user_pri = prio;
+   if (td->td_flags & TDF_BORROWING && td->td_priority <= prio)
+       return;
+   sched_priority(td, prio);
 }
 
 void

But I'm not sure if this would have any negative side effects,
especially since in the ULE case there's a comment on this function that
specifically notes that it changes the user priority without
changing the current priority (but it doesn't say why that matters).

Is this a reasonable way to fix this problem, or is there a better way?
   
    This will lose the priority boost afforded to interactive threads when
    they sleep in the kernel in the 4BSD scheduler.  You aren't supposed to
    drop the user priority to lose this boost until userret().  You could
    perhaps try only altering the priority if the new user pri is lower than
    your current priority (and then you don't have to check TDF_BORROWING I
    believe):
   
  if (prio < td->td_priority)
 sched_priority(td, prio);
   
  
  That's just the sort of insight I was looking for, thanks.  That made me
  look at the code more and think harder about the problem I'm trying to
  solve, and I concluded that doing it within the scheduler is all wrong.
  
  That led me to look elsewhere, and I discovered the change you made in
  r228207, which does almost what I want, but your change does it only for
  realtime priorities, and I need a similar effect for idle priorities.
  What I came up with is a bit different than yours (attached below) and
  I'd like your thoughts on it.
  
  I start with the same test as yours: if sched_user_prio() didn't
  actually change the user priority (due to borrowing), do nothing.  Then
  mine differs:  call sched_prio() to effect the change only if either the
  old or the new priority class is not timeshare.
  
  My reasoning for the second half of the test is that if it's a change in
  timeshare priority then the scheduler is going to adjust that priority
  in a way that completely wipes out the requested change anyway, so
  what's the point?  (If that's not true, then allowing a thread to change
  its own timeshare priority would subvert the scheduler's adjustments and
  let a cpu-bound thread monopolize the cpu; if allowed at all, that
   should require privileges.)
  
  On the other hand, if either the old or new priority class is not
  timeshare, then the scheduler doesn't make automatic adjustments, so we
  should honor the request and make the priority change right away.  The
  reason the old class gets caught up in this is the very reason I'm
  wanting to make a change:  when thread A changes the priority of its
  child thread B from idle back to timeshare, thread B never actually gets
  moved to a timeshare-range run queue unless there are some idle cycles
  available to allow it to first get scheduled again as an idle thread.
  
  Finally, my change doesn't consider the td == curthread situation at
  all, because I don't see how that's germane.  This is the thing I'm
  least sure of -- I don't at all understand why the old code (even before
   your changes) had that test.  The old code had that flagged as "XXX
   dubious" (a comment a bit too cryptic to be useful).
 
 I think your change is correct.  One style nit: please sort the order of
 variables (oldclass comes before oldpri).
 

Thanks for the review.  I've been running my change on one of our
products in an 8.2 environment, and on some arm platforms running
-current, and it seems to be working well.

Alphabetizing:  Grrr, yeah.  I had it that way at first, but it just
offended my sensibilities to separate related values

Re: rtprio_thread trouble

2013-02-28 Thread Ian Lepore
On Tue, 2013-02-26 at 15:29 -0500, John Baldwin wrote:
 On Friday, February 22, 2013 2:06:00 pm Ian Lepore wrote:
  I ran into some trouble with rtprio_thread() today.  I have a worker
  thread that I want to run at idle priority most of the time, but if it
  falls too far behind I'd like to bump it back up to regular timeshare
  priority until it catches up.  In a worst case, the system is
  continuously busy and something scheduled at idle priority is never
  going to run (for some definition of 'never').  
  
  What I found is that in this worst case, even after my main thread has
  used rtprio_thread() to change the worker thread back to
  RTP_PRIO_NORMAL, the worker thread never gets scheduled.  This is with
  the 4BSD scheduler but it appears that the same would be the case with
  ULE, based on code inspection.  I find that this fixes it for 4BSD, and
  I think the same would be true for ULE...
  
  --- a/sys/kern/sched_4bsd.c Wed Feb 13 12:54:36 2013 -0700
  +++ b/sys/kern/sched_4bsd.c Fri Feb 22 11:55:35 2013 -0700
  @@ -881,6 +881,9 @@ sched_user_prio(struct thread *td, u_cha
   return;
   oldprio = td->td_user_pri;
   td->td_user_pri = prio;
   +   if (td->td_flags & TDF_BORROWING && td->td_priority <= prio)
   +       return;
   +   sched_priority(td, prio);
   }
   
   void
  
  But I'm not sure if this would have any negative side effects,
  especially since in the ULE case there's a comment on this function that
   specifically notes that it changes the user priority without
  changing the current priority (but it doesn't say why that matters).
  
  Is this a reasonable way to fix this problem, or is there a better way?
 
 This will lose the priority boost afforded to interactive threads when they 
 sleep in the kernel in the 4BSD scheduler.  You aren't supposed to drop the
 user priority to lose this boost until userret().  You could perhaps try
 only altering the priority if the new user pri is lower than your current
 priority (and then you don't have to check TDF_BORROWING I believe):
 
   if (prio < td->td_priority)
   sched_priority(td, prio);
 

That's just the sort of insight I was looking for, thanks.  That made me
look at the code more and think harder about the problem I'm trying to
solve, and I concluded that doing it within the scheduler is all wrong.

That led me to look elsewhere, and I discovered the change you made in
r228207, which does almost what I want, but your change does it only for
realtime priorities, and I need a similar effect for idle priorities.
What I came up with is a bit different than yours (attached below) and
I'd like your thoughts on it.

I start with the same test as yours: if sched_user_prio() didn't
actually change the user priority (due to borrowing), do nothing.  Then
mine differs:  call sched_prio() to effect the change only if either the
old or the new priority class is not timeshare.

My reasoning for the second half of the test is that if it's a change in
timeshare priority then the scheduler is going to adjust that priority
in a way that completely wipes out the requested change anyway, so
what's the point?  (If that's not true, then allowing a thread to change
its own timeshare priority would subvert the scheduler's adjustments and
let a cpu-bound thread monopolize the cpu; if allowed at all, that
should require privileges.)

On the other hand, if either the old or new priority class is not
timeshare, then the scheduler doesn't make automatic adjustments, so we
should honor the request and make the priority change right away.  The
reason the old class gets caught up in this is the very reason I'm
wanting to make a change:  when thread A changes the priority of its
child thread B from idle back to timeshare, thread B never actually gets
moved to a timeshare-range run queue unless there are some idle cycles
available to allow it to first get scheduled again as an idle thread.

Finally, my change doesn't consider the td == curthread situation at
all, because I don't see how that's germane.  This is the thing I'm
least sure of -- I don't at all understand why the old code (even before
your changes) had that test.  The old code had that flagged as "XXX
dubious" (a comment a bit too cryptic to be useful).

-- Ian

Index: sys/kern/kern_resource.c
===
--- sys/kern/kern_resource.c	(revision 247421)
+++ sys/kern/kern_resource.c	(working copy)
@@ -469,8 +469,7 @@ sys_rtprio(td, uap)
 int
 rtp_to_pri(struct rtprio *rtp, struct thread *td)
 {
-	u_char	newpri;
-	u_char	oldpri;
+	u_char  newpri, oldpri, oldclass;
 
 	switch (RTP_PRIO_BASE(rtp->type)) {
 	case RTP_PRIO_REALTIME:
@@ -493,11 +492,12 @@ rtp_to_pri(struct rtprio *rtp, struct thread *td)
 	}
 
 	thread_lock(td);
+	oldclass = td->td_pri_class;
 	sched_class(td, rtp->type);	/* XXX fix */
 	oldpri = td->td_user_pri;
 	sched_user_prio(td, newpri);
-	if (td->td_user_pri != oldpri && (td == curthread ||
-	td

Re: TFTP single file kernel load

2013-02-23 Thread Ian Lepore
On Sat, 2013-02-23 at 16:28 +0100, Wojciech Puchar wrote:
 can it be done?
 
 converting an ELF kernel (I don't use kld modules) to a format that can be
 loaded directly over TFTP - without intermediate stages like loader(8)?
 
 just to have a SINGLE FILE that tftp would load and run. no loader(8) etc.

The kernel build process for arm and mips creates such a kernel as one of
the standard outputs from buildkernel.  That doesn't appear to be the
case for x86 kernels, but you could use sys/conf/makefile.arm as a
guide.

Basically what needs doing is to link the kernel with a modified
ldscript that doesn't add space for the program headers, and then run
the output of that link through objcopy -S -O binary to create a
kernel.bin file.  That file can be directly loaded to the address it was
linked for, and a jump to the load address launches the kernel.
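
Roughly, the final step looks like this (illustrative commands only; see
sys/conf/Makefile.arm for the real ldscript and kernel.bin handling):

	# After linking with an ldscript that leaves out space for program
	# headers, strip everything and emit a raw memory image:
	objcopy -S -O binary kernel kernel.bin
	# kernel.bin can then be tftp'd to the link address and jumped to.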

Whether the kernel runs properly when launched that way is a different
question.  An arm kernel will run that way because we haven't had the
luxury of loader(8) in the arm world until recently.  The x86 kernel may
expect values in the environment that the loader obtained from the bios.
Without a loader you may need to modify the kernel to get that
information in some other way early in startup.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: TFTP single file kernel load

2013-02-23 Thread Ian Lepore
On Sat, 2013-02-23 at 17:57 +0100, Wojciech Puchar wrote:
 
  Basically what needs doing is to link the kernel with a modified
  ldscript that doesn't add space for the program headers, and then run
  the output of that link through objcopy -S -O binary to create a
  kernel.bin file.  That file can be directly loaded to the address it was
  linked for, and a jump to the load address launches the kernel.
 
 is btxld(8) a tool I have to use after making the kernel.bin file?
 
 what should I use for -b and -l?
 

I've never heard of btxld before now, and from a quick look at its
manpage it's not clear to me what it does.  It may be a part of the x86
build process I've never noticed before.

 
 
  Whether the kernel runs properly when launched that way is a different
  question.  An arm kernel will run that way because we haven't had the
  luxury of loader(8) in the arm world until recently.  The x86 kernel may
  expect values in the environment that the loader obtained from the bios.
 
 it can be loaded without loader for now - if you press a key before 
 loader(8) is loaded and enter the kernel image name.
 
 at least it was like that.

Oh, good point, maybe it'll just work fine (although it's been years
since I last loaded an x86 kernel directly from boot2, way back before
the days of acpi and smap data and all of that modern stuff).

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


rtprio_thread trouble

2013-02-22 Thread Ian Lepore
I ran into some trouble with rtprio_thread() today.  I have a worker
thread that I want to run at idle priority most of the time, but if it
falls too far behind I'd like to bump it back up to regular timeshare
priority until it catches up.  In a worst case, the system is
continuously busy and something scheduled at idle priority is never
going to run (for some definition of 'never').  
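
(For reference, the switching itself is nothing fancy; a sketch of the
userland side, not my exact code -- the worker records its own thread id
at startup so the main thread can retarget it:)

	#include <sys/types.h>
	#include <sys/rtprio.h>
	#include <pthread.h>
	#include <pthread_np.h>

	static lwpid_t worker_tid;

	static void
	worker_started(void)
	{
		worker_tid = pthread_getthreadid_np(); /* called by the worker */
	}

	static int
	set_worker_idle(int idle)
	{
		struct rtprio rtp;

		if (idle) {
			rtp.type = RTP_PRIO_IDLE;
			rtp.prio = RTP_PRIO_MAX;	/* lowest idle priority */
		} else {
			rtp.type = RTP_PRIO_NORMAL;
			rtp.prio = 0;			/* ordinary timeshare */
		}
		return (rtprio_thread(RTP_SET, worker_tid, &rtp));
	}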

What I found is that in this worst case, even after my main thread has
used rtprio_thread() to change the worker thread back to
RTP_PRIO_NORMAL, the worker thread never gets scheduled.  This is with
the 4BSD scheduler but it appears that the same would be the case with
ULE, based on code inspection.  I find that this fixes it for 4BSD, and
I think the same would be true for ULE...

--- a/sys/kern/sched_4bsd.c Wed Feb 13 12:54:36 2013 -0700
+++ b/sys/kern/sched_4bsd.c Fri Feb 22 11:55:35 2013 -0700
@@ -881,6 +881,9 @@ sched_user_prio(struct thread *td, u_cha
 return;
 oldprio = td->td_user_pri;
 td->td_user_pri = prio;
+   if (td->td_flags & TDF_BORROWING && td->td_priority <= prio)
+       return;
+   sched_priority(td, prio);
 }
 
 void

But I'm not sure if this would have any negative side effects,
especially since in the ULE case there's a comment on this function that
specifically notes that it changes the user priority without
changing the current priority (but it doesn't say why that matters).

Is this a reasonable way to fix this problem, or is there a better way?

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


why no per-thread scheduling niceness?

2013-02-22 Thread Ian Lepore
I'm curious why the concept of scheduling niceness applies only to an
entire process, and it's not possible to have nice threads within a
process.  Is there any fundamental reason why it couldn't be supported
with some extra bookkeeping to track niceness per thread?

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Request for review, time_pps_fetch() enhancement

2013-02-13 Thread Ian Lepore
On Tue, 2013-02-12 at 22:34 +0200, Konstantin Belousov wrote:
 On Tue, Feb 12, 2013 at 09:03:39AM -0700, Ian Lepore wrote:
  On Sun, 2013-02-10 at 12:37 +0200, Konstantin Belousov wrote:
   On Sat, Feb 09, 2013 at 02:47:06PM +0100, Jilles Tjoelker wrote:
On Wed, Feb 06, 2013 at 05:58:30PM +0200, Konstantin Belousov wrote:
 On Tue, Feb 05, 2013 at 09:41:38PM -0700, Ian Lepore wrote:
  I'd like feedback on the attached patch, which adds support to our
  time_pps_fetch() implementation for the blocking behaviors 
  described in
  section 3.4.3 of RFC 2783.  The existing implementation can only 
  return
  the most recently captured data without blocking.  These changes 
  add the
  ability to block (forever or with timeout) until a new event occurs.

  Index: sys/kern/kern_tc.c
  ===
  --- sys/kern/kern_tc.c  (revision 246337)
  +++ sys/kern/kern_tc.c  (working copy)
  @@ -1446,6 +1446,50 @@
* RFC 2783 PPS-API implementation.
*/
   
  +static int
  +pps_fetch(struct pps_fetch_args *fapi, struct pps_state *pps)
  +{
  [snip]
   +   aseq = pps->ppsinfo.assert_sequence;
   +   cseq = pps->ppsinfo.clear_sequence;
   +   while (aseq == pps->ppsinfo.assert_sequence &&
   +       cseq == pps->ppsinfo.clear_sequence) {
 Note that compilers are allowed to optimize these accesses even over
 the sequential point, which is the tsleep() call. Only accesses to
 volatile objects are forbidden to be rearranged.

 I suggest to add volatile casts to pps in the loop condition.

The memory pointed to by pps is global (other code may have a pointer to
it); therefore, the compiler must assume that the tsleep() call (which
invokes code in a different compilation unit) may modify it.

Because volatile does not make concurrent access by multiple threads
defined either, adding it here only seems to slow down the code
(potentially).
   The volatile guarantees that the compiler indeed reloads the value on
   read access. Conceptually, the tsleep() does not modify or even access
   the checked fields, and compiler is allowed to note this by whatever
   methods (LTO ?).
   
   More, the standard says that an implementation is allowed to not evaluate
   part of the expression if no side effects are produced, even by calling
   a function.
   
   I agree that for practical means, the _currently_ used compilers should
   consider the tsleep() call as the sequential point. But then the volatile
   qualifier cast applied for the given access would not change the code as
   well.
   
  
  Doesn't this then imply that essentially every driver has this problem,
  and for that matter, every sequence of code anywhere in the base
  involving a loop that repeatedly sleeps, then wakes and checks the
  state of some data for changes?  I sure haven't seen that many volatile
  qualifiers scattered around the code.
 
 No, it does not imply that every driver has this problem.
 A typical driver provides the mutual exclusion for access of
 the shared data, which means using locks. Locks include the necessary
 barriers to ensure the visibility of the changes, in particular the
 compiler barriers.

Oh.  I had never considered that using mutexes had other side
effects.  So is there a correct MI way to invoke the right barrier magic
in a situation like this?

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Request for review, time_pps_fetch() enhancement

2013-02-12 Thread Ian Lepore
On Sun, 2013-02-10 at 12:41 +0200, Konstantin Belousov wrote:
 On Fri, Feb 08, 2013 at 04:13:40PM -0700, Ian Lepore wrote:
  On Wed, 2013-02-06 at 17:58 +0200, Konstantin Belousov wrote:
   On Tue, Feb 05, 2013 at 09:41:38PM -0700, Ian Lepore wrote:
I'd like feedback on the attached patch, which adds support to our
time_pps_fetch() implementation for the blocking behaviors described in
section 3.4.3 of RFC 2783.  The existing implementation can only return
the most recently captured data without blocking.  These changes add the
ability to block (forever or with timeout) until a new event occurs.

-- Ian

   
Index: sys/kern/kern_tc.c
===
--- sys/kern/kern_tc.c  (revision 246337)
+++ sys/kern/kern_tc.c  (working copy)
@@ -1446,6 +1446,50 @@
  * RFC 2783 PPS-API implementation.
  */
 
+static int
+pps_fetch(struct pps_fetch_args *fapi, struct pps_state *pps)
+{
+   int err, timo;
+   pps_seq_t aseq, cseq;
+   struct timeval tv;
+
+   if (fapi->tsformat && fapi->tsformat != PPS_TSFMT_TSPEC)
+   return (EINVAL);
+
+   /*
+* If no timeout is requested, immediately return whatever 
values were
+* most recently captured.  If timeout seconds is -1, that's a 
request
+* to block without a timeout.  WITNESS won't let us sleep 
forever
+* without a lock (we really don't need a lock), so just 
repeatedly
+* sleep a long time.
+*/
   Regarding no need for the lock, it would just move the implementation into
   the low quality one, for the case when one timestamp capture is lost
   and caller of time_pps_fetch() sleeps until next pps event is generated.
   
   I understand the desire to avoid lock, esp. in the pps_event() called
   from the arbitrary driver context. But the race is also real.
   
  
  What race?  A user of the pps interface understands that there is one
  event per second, and understands that if you ask to block until the
  next event at approximately the time that event is expected to occur,
  then it is ambiguous whether the call completes almost-immediately or in
  about 1 second.
  
  Looking at it another way, if a blocking call is made right around the
  time of the PPS, the thread could get preempted before getting to
  pps_fetch() function and not get control again until after the PPS has
  occurred.  In that case it's going to block for about a full second,
  even though the call was made before top-of-second.  That situation is
  exactly the same with or without locking, so what extra functionality is
  gained with locking?  What guarantee does locking let us make to the
  caller that the lockless code doesn't?
 
 No guarantees, but I noted in the original reply that this is about the
 quality of the implementation and not about correctness.
 
 As I said there as well, I am not sure that any locking can be useful
 for the situation at all.

Well then I guess I don't understand what you mean by the term
"quality".  Apparently you use it as some form of jargon rather than its
usual accepted meaning in everyday English?

Or, more directly:  are you implying something should be changed to make
the code better?

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Request for review, time_pps_fetch() enhancement

2013-02-12 Thread Ian Lepore
On Sun, 2013-02-10 at 12:37 +0200, Konstantin Belousov wrote:
 On Sat, Feb 09, 2013 at 02:47:06PM +0100, Jilles Tjoelker wrote:
  On Wed, Feb 06, 2013 at 05:58:30PM +0200, Konstantin Belousov wrote:
   On Tue, Feb 05, 2013 at 09:41:38PM -0700, Ian Lepore wrote:
I'd like feedback on the attached patch, which adds support to our
time_pps_fetch() implementation for the blocking behaviors described in
section 3.4.3 of RFC 2783.  The existing implementation can only return
the most recently captured data without blocking.  These changes add the
ability to block (forever or with timeout) until a new event occurs.
  
Index: sys/kern/kern_tc.c
===
--- sys/kern/kern_tc.c  (revision 246337)
+++ sys/kern/kern_tc.c  (working copy)
@@ -1446,6 +1446,50 @@
  * RFC 2783 PPS-API implementation.
  */
 
+static int
+pps_fetch(struct pps_fetch_args *fapi, struct pps_state *pps)
+{
[snip]
+   aseq = pps->ppsinfo.assert_sequence;
+   cseq = pps->ppsinfo.clear_sequence;
+   while (aseq == pps->ppsinfo.assert_sequence &&
+       cseq == pps->ppsinfo.clear_sequence) {
   Note that compilers are allowed to optimize these accesses even over
   the sequential point, which is the tsleep() call. Only accesses to
   volatile objects are forbidden to be rearranged.
  
   I suggest to add volatile casts to pps in the loop condition.
  
  The memory pointed to by pps is global (other code may have a pointer to
  it); therefore, the compiler must assume that the tsleep() call (which
  invokes code in a different compilation unit) may modify it.
  
  Because volatile does not make concurrent access by multiple threads
  defined either, adding it here only seems to slow down the code
  (potentially).
 The volatile guarantees that the compiler indeed reloads the value on
 read access. Conceptually, the tsleep() does not modify or even access
 the checked fields, and compiler is allowed to note this by whatever
 methods (LTO ?).
 
 More, the standard says that an implementation is allowed to not evaluate
 part of the expression if no side effects are produced, even by calling
 a function.
 
 I agree that for practical means, the _currently_ used compilers should
 consider the tsleep() call as the sequential point. But then the volatile
 qualifier cast applied for the given access would not change the code as
 well.
 

Doesn't this then imply that essentially every driver has this problem,
and for that matter, every sequence of code anywhere in the base
involving a loop that repeatedly sleeps, then wakes and checks the
state of some data for changes?  I sure haven't seen that many volatile
qualifiers scattered around the code.

-- Ian

  
+   err = tsleep(pps, PCATCH, "ppsfch", timo);
+   if (err == EWOULDBLOCK && fapi->timeout.tv_sec == -1) {
+   continue;
+   } else if (err != 0) {
+   return (err);
+   }
+   }
+   }
  -- 
  Jilles Tjoelker


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Reviewing a FAQ change about LORs

2013-02-08 Thread Ian Lepore
On Thu, 2013-02-07 at 19:32 -0500, Eitan Adler wrote:
 Does someone here mind reviewing
 http://www.freebsd.org/cgi/query-pr.cgi?pr=174226 for correctness.
 
 Please feel free to post alternate diffs as a reply as well.
 

Does it make sense to reference a web page on LOR status that hasn't
been updated in four years?

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Request for review, time_pps_fetch() enhancement

2013-02-08 Thread Ian Lepore
On Wed, 2013-02-06 at 17:58 +0200, Konstantin Belousov wrote:
 On Tue, Feb 05, 2013 at 09:41:38PM -0700, Ian Lepore wrote:
  I'd like feedback on the attached patch, which adds support to our
  time_pps_fetch() implementation for the blocking behaviors described in
  section 3.4.3 of RFC 2783.  The existing implementation can only return
  the most recently captured data without blocking.  These changes add the
  ability to block (forever or with timeout) until a new event occurs.
  
  -- Ian
  
 
  Index: sys/kern/kern_tc.c
  ===
  --- sys/kern/kern_tc.c  (revision 246337)
  +++ sys/kern/kern_tc.c  (working copy)
  @@ -1446,6 +1446,50 @@
* RFC 2783 PPS-API implementation.
*/
   
  +static int
  +pps_fetch(struct pps_fetch_args *fapi, struct pps_state *pps)
  +{
  +   int err, timo;
  +   pps_seq_t aseq, cseq;
  +   struct timeval tv;
  +
  +   if (fapi->tsformat && fapi->tsformat != PPS_TSFMT_TSPEC)
  +   return (EINVAL);
  +
  +   /*
  +* If no timeout is requested, immediately return whatever values were
  +* most recently captured.  If timeout seconds is -1, that's a request
  +* to block without a timeout.  WITNESS won't let us sleep forever
  +* without a lock (we really don't need a lock), so just repeatedly
  +* sleep a long time.
  +*/
 Regarding no need for the lock, it would just move the implementation into
 the low quality one, for the case when one timestamp capture is lost
 and caller of time_pps_fetch() sleeps until next pps event is generated.
 
 I understand the desire to avoid lock, esp. in the pps_event() called
 from the arbitrary driver context. But the race is also real.
 

What race?  A user of the pps interface understands that there is one
event per second, and understands that if you ask to block until the
next event at approximately the time that event is expected to occur,
then it is ambiguous whether the call completes almost-immediately or in
about 1 second.

Looking at it another way, if a blocking call is made right around the
time of the PPS, the thread could get preempted before getting to
pps_fetch() function and not get control again until after the PPS has
occurred.  In that case it's going to block for about a full second,
even though the call was made before top-of-second.  That situation is
exactly the same with or without locking, so what extra functionality is
gained with locking?  What guarantee does locking let us make to the
caller that the lockless code doesn't?

  +   if (fapi->timeout.tv_sec || fapi->timeout.tv_nsec) {
  +   if (fapi->timeout.tv_sec == -1)
  +   timo = 0x7fffffff;
  +   else {
  +   tv.tv_sec = fapi->timeout.tv_sec;
  +   tv.tv_usec = fapi->timeout.tv_nsec / 1000;
  +   timo = tvtohz(&tv);
  +   }
  +   aseq = pps->ppsinfo.assert_sequence;
  +   cseq = pps->ppsinfo.clear_sequence;
  +   while (aseq == pps->ppsinfo.assert_sequence &&
  +   cseq == pps->ppsinfo.clear_sequence) {
 Note that compilers are allowed to optimize these accesses even over
 the sequential point, which is the tsleep() call. Only accesses to
 volatile objects are forbidden to be rearranged.
 
 I suggest to add volatile casts to pps in the loop condition.
 

Thank you.  I pondered volatility, but was under the impression that the
function call took care of it.  I'll fix that.

-- Ian

  +   err = tsleep(pps, PCATCH, "ppsfch", timo);
  +   if (err == EWOULDBLOCK && fapi->timeout.tv_sec == -1) {
  +   continue;
  +   } else if (err != 0) {
  +   return (err);
  +   }
  +   }
  +   }
  +
  +   pps->ppsinfo.current_mode = pps->ppsparam.mode;
  +   fapi->pps_info_buf = pps->ppsinfo;
  +
  +   return (0);
  +}
  +
   int
   pps_ioctl(u_long cmd, caddr_t data, struct pps_state *pps)
   {
  @@ -1485,13 +1529,7 @@
  return (0);
  case PPS_IOC_FETCH:
  fapi = (struct pps_fetch_args *)data;
  -   if (fapi->tsformat && fapi->tsformat != PPS_TSFMT_TSPEC)
  -   return (EINVAL);
  -   if (fapi->timeout.tv_sec || fapi->timeout.tv_nsec)
  -   return (EOPNOTSUPP);
  -   pps->ppsinfo.current_mode = pps->ppsparam.mode;
  -   fapi->pps_info_buf = pps->ppsinfo;
  -   return (0);
  +   return (pps_fetch(fapi, pps));
   #ifdef FFCLOCK
  case PPS_IOC_FETCH_FFCOUNTER:
  fapi_ffc = (struct pps_fetch_ffc_args *)data;
  @@ -1540,7 +1578,7 @@
   void
   pps_init(struct pps_state *pps)
   {
  -   pps->ppscap |= PPS_TSFMT_TSPEC;
  +   pps->ppscap |= PPS_TSFMT_TSPEC | PPS_CANWAIT;
  if (pps->ppscap & PPS_CAPTUREASSERT)
  pps->ppscap |= PPS_OFFSETASSERT;
  if (pps->ppscap & PPS_CAPTURECLEAR)
  @@ -1680,6 +1718,9 @@
  hardpps(tsp, ts.tv_nsec

fcntl(2) F_READAHEAD set to zero doesn't work [patch]

2013-02-08 Thread Ian Lepore
I discovered today that fcntl(fd, F_READAHEAD, 0) doesn't work as
advertised.  It's supposed to disable readahead, but instead it restores
the default readahead behavior (if it had previously been changed), and
there is no way to disable readahead.[1]  I think the attached patch
fixes it, but it's not immediately clear from the patch why; here's the
deal...

The amount of readahead is calculated by sequential_heuristic() in
vfs_vnops.c.  If the FRDAHEAD flag is set on the file it uses the value
stored in the file's f_seqcount, otherwise it calculates a value (and
updates f_seqcount, which doesn't ever happen when FRDAHEAD is set).

So the patch causes the FRDAHEAD flag to be set even in the case of the
readahead amount being zero.  Because it seems like a useful concept, it
still allows the readahead to be restored to default behavior, now by
passing a negative value.
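
Under the patched semantics a caller would use it like this (illustrative
only; 'fd' is any open file descriptor on a regular file):

	#include <fcntl.h>

	static void
	tune_readahead(int fd)
	{
		fcntl(fd, F_READAHEAD, 0);		/* disable readahead entirely */
		fcntl(fd, F_READAHEAD, 128 * 1024);	/* explicit 128k readahead */
		fcntl(fd, F_READAHEAD, -1);		/* back to the default heuristic */
	}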

Does this look right to those of you who understand this part of the
system better than I do?

-- Ian

[1] No way using F_READAHEAD; I know about POSIX_FADV_RANDOM.
Index: sys/kern/kern_descrip.c
===
--- sys/kern/kern_descrip.c	(revision 246337)
+++ sys/kern/kern_descrip.c	(working copy)
@@ -776,7 +776,7 @@
 		}
 		fhold(fp);
 		FILEDESC_SUNLOCK(fdp);
-		if (arg != 0) {
+		if (arg >= 0) {
 			vp = fp->f_vnode;
 			error = vn_lock(vp, LK_SHARED);
 			if (error != 0) {
Index: lib/libc/sys/fcntl.2
===
--- lib/libc/sys/fcntl.2	(revision 246337)
+++ lib/libc/sys/fcntl.2	(working copy)
@@ -28,7 +28,7 @@
 .\" @(#)fcntl.2	8.2 (Berkeley) 1/12/94
 .\" $FreeBSD$
 .\"
-.Dd July 27, 2012
+.Dd February 8, 2013
 .Dt FCNTL 2
 .Os
 .Sh NAME
@@ -171,7 +171,7 @@
 which is rounded up to the nearest block size.
 A zero value in
 .Fa arg
-turns off read ahead.
+turns off read ahead, a negative value restores the system default.
 .It Dv F_RDAHEAD
 Equivalent to Darwin counterpart which sets read ahead amount of 128KB
 when the third argument,
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Request for review, time_pps_fetch() enhancement

2013-02-05 Thread Ian Lepore
I'd like feedback on the attached patch, which adds support to our
time_pps_fetch() implementation for the blocking behaviors described in
section 3.4.3 of RFC 2783.  The existing implementation can only return
the most recently captured data without blocking.  These changes add the
ability to block (forever or with timeout) until a new event occurs.
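
For reference, a caller using the new blocking mode would look roughly
like this (error handling trimmed; 'handle' comes from time_pps_create()
on an open PPS-capable device):

	#include <sys/time.h>
	#include <sys/timepps.h>
	#include <stdio.h>

	static void
	wait_for_pulse(pps_handle_t handle)
	{
		pps_info_t info;
		struct timespec timeout;

		timeout.tv_sec = 2;	/* give up after 2 seconds... */
		timeout.tv_nsec = 0;	/* ...or use tv_sec = -1 to wait forever */

		if (time_pps_fetch(handle, PPS_TSFMT_TSPEC, &info, &timeout) == 0)
			printf("assert #%lu at %ld.%09ld\n",
			    (unsigned long)info.assert_sequence,
			    (long)info.assert_timestamp.tv_sec,
			    info.assert_timestamp.tv_nsec);
	}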

-- Ian

Index: sys/kern/kern_tc.c
===
--- sys/kern/kern_tc.c	(revision 246337)
+++ sys/kern/kern_tc.c	(working copy)
@@ -1446,6 +1446,50 @@
  * RFC 2783 PPS-API implementation.
  */
 
+static int
+pps_fetch(struct pps_fetch_args *fapi, struct pps_state *pps)
+{
+	int err, timo;
+	pps_seq_t aseq, cseq;
+	struct timeval tv;
+
+	if (fapi->tsformat && fapi->tsformat != PPS_TSFMT_TSPEC)
+		return (EINVAL);
+
+	/*
+	 * If no timeout is requested, immediately return whatever values were
+	 * most recently captured.  If timeout seconds is -1, that's a request
+	 * to block without a timeout.  WITNESS won't let us sleep forever
+	 * without a lock (we really don't need a lock), so just repeatedly
+	 * sleep a long time.
+	 */
+	if (fapi->timeout.tv_sec || fapi->timeout.tv_nsec) {
+		if (fapi->timeout.tv_sec == -1)
+			timo = 0x7fffffff;
+		else {
+			tv.tv_sec = fapi->timeout.tv_sec;
+			tv.tv_usec = fapi->timeout.tv_nsec / 1000;
+			timo = tvtohz(&tv);
+		}
+		aseq = pps->ppsinfo.assert_sequence;
+		cseq = pps->ppsinfo.clear_sequence;
+		while (aseq == pps->ppsinfo.assert_sequence &&
+		    cseq == pps->ppsinfo.clear_sequence) {
+			err = tsleep(pps, PCATCH, "ppsfch", timo);
+			if (err == EWOULDBLOCK && fapi->timeout.tv_sec == -1) {
+continue;
+			} else if (err != 0) {
+return (err);
+			}
+		}
+	}
+
+	pps->ppsinfo.current_mode = pps->ppsparam.mode;
+	fapi->pps_info_buf = pps->ppsinfo;
+
+	return (0);
+}
+
 int
 pps_ioctl(u_long cmd, caddr_t data, struct pps_state *pps)
 {
@@ -1485,13 +1529,7 @@
 		return (0);
 	case PPS_IOC_FETCH:
 		fapi = (struct pps_fetch_args *)data;
-		if (fapi->tsformat && fapi->tsformat != PPS_TSFMT_TSPEC)
-			return (EINVAL);
-		if (fapi->timeout.tv_sec || fapi->timeout.tv_nsec)
-			return (EOPNOTSUPP);
-		pps->ppsinfo.current_mode = pps->ppsparam.mode;
-		fapi->pps_info_buf = pps->ppsinfo;
-		return (0);
+		return (pps_fetch(fapi, pps));
 #ifdef FFCLOCK
 	case PPS_IOC_FETCH_FFCOUNTER:
 		fapi_ffc = (struct pps_fetch_ffc_args *)data;
@@ -1540,7 +1578,7 @@
 void
 pps_init(struct pps_state *pps)
 {
-	pps->ppscap |= PPS_TSFMT_TSPEC;
+	pps->ppscap |= PPS_TSFMT_TSPEC | PPS_CANWAIT;
 	if (pps->ppscap & PPS_CAPTUREASSERT)
 		pps->ppscap |= PPS_OFFSETASSERT;
 	if (pps->ppscap & PPS_CAPTURECLEAR)
@@ -1680,6 +1718,9 @@
 		hardpps(tsp, ts.tv_nsec + 1000000000 * ts.tv_sec);
 	}
 #endif
+
+	/* Wakeup anyone sleeping in pps_fetch().  */
+	wakeup(pps);
 }
 
 /*
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: Sockets programming question

2013-01-29 Thread Ian Lepore
On Mon, 2013-01-28 at 18:02 +0200, Konstantin Belousov wrote:
 On Mon, Jan 28, 2013 at 08:11:47AM -0700, Ian Lepore wrote:
  I've got a question that isn't exactly freebsd-specific, but
   implementation-specific behavior may be involved.
  
  I've got a server process that accepts connections from clients on a
  PF_LOCAL stream socket.  Multiple clients can be connected at once; a
  list of them is tracked internally.  The server occasionally sends data
  to each client.  The time between messages to clients can range
  literally from milliseconds to months.  Clients never send any data to
  the server, indeed they may shutdown that side of the connection and
  just receive data.
  
  The only way I can find to discover that a client has disappeared is by
  trying to send them a message and getting an error because they've
  closed the socket or died completely.  At that point I can reap the
   resources and remove them from the client list.  This is a problem because
  of the months between messages thing.  A lot of clients can come and
  go during those months and I've got this ever-growing list of open
  socket descriptors because I never had anything to say the whole time
  they were connected.
  
  By trial and error I've discovered that I can sort of poll for their
  presence by writing a zero-length message.  If the other end of the
  connection is gone I get the expected error and can reap the client,
  otherwise it appears to quietly write nothing and return zero and have
  no other side effects than polling the status of the server-client side
  of the pipe.
  
  My problem with this polling is that I can't find anything in writing
  that sanctions this behavior.  Would this amount to relying on a
  non-portable accident of the current implementation?  
  
   Also, am I missing something simple and there's a canonical way to
  handle this?  In all the years I've done client/server stuff I've never
  had quite this type of interaction (or lack thereof) between client and
  server before.
 
 Check for the IN events as well. I would not trust the mere presence
 of the IN in the poll result, but consequent read should return EOF
 and this is good indicator of the dead client.

You can't use EOF on a read() to determine client life when the nature
of the client/server relationship is that clients are allowed to
shutdown(fd, SHUT_WR) as soon as they connect because they expect to
receive but never send any data.

On the other hand, Alfred's suggestion of using poll(2) rather than
select(2) worked perfectly.  Polling with an events mask of zero results
in it returning POLLHUP in revents if the client has closed the socket.
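
Roughly, the check boils down to this (just a sketch; the surrounding client
list bookkeeping is assumed to live elsewhere):

#include <poll.h>

/* Returns non-zero if the peer on 'fd' appears to have gone away. */
static int
client_is_gone(int fd)
{
	struct pollfd pfd;

	pfd.fd = fd;
	pfd.events = 0;		/* no I/O interest, just connection status */
	pfd.revents = 0;
	if (poll(&pfd, 1, 0) < 0)
		return (1);	/* treat a poll failure as a dead client */
	return ((pfd.revents & (POLLHUP | POLLERR)) != 0);
}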

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Sockets programming question

2013-01-28 Thread Ian Lepore
I've got a question that isn't exactly freebsd-specific, but
implementation-specific behavior may be involved.

I've got a server process that accepts connections from clients on a
PF_LOCAL stream socket.  Multiple clients can be connected at once; a
list of them is tracked internally.  The server occasionally sends data
to each client.  The time between messages to clients can range
literally from milliseconds to months.  Clients never send any data to
the server, indeed they may shutdown that side of the connection and
just receive data.

The only way I can find to discover that a client has disappeared is by
trying to send them a message and getting an error because they've
closed the socket or died completely.  At that point I can reap the
resources and remove them from the client list.  This is a problem because
of the months between messages thing.  A lot of clients can come and
go during those months and I've got this ever-growing list of open
socket descriptors because I never had anything to say the whole time
they were connected.

By trial and error I've discovered that I can sort of poll for their
presence by writing a zero-length message.  If the other end of the
connection is gone I get the expected error and can reap the client,
otherwise it appears to quietly write nothing and return zero and have
no other side effects than polling the status of the server-client side
of the pipe.
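
(Concretely, the probe amounts to something like the sketch below, though
here it uses send(2) with MSG_NOSIGNAL instead of a bare write(2) so that a
dead peer doesn't raise SIGPIPE.)

#include <sys/socket.h>
#include <errno.h>

/* Probe a client socket with a zero-length write; non-zero means it's dead. */
static int
probe_client(int fd)
{
	if (send(fd, "", 0, MSG_NOSIGNAL) == -1)
		return (errno == EPIPE || errno == ECONNRESET ||
		    errno == ENOTCONN);
	return (0);
}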

My problem with this polling is that I can't find anything in writing
that sanctions this behavior.  Would this amount to relying on a
non-portable accident of the current implementation?  

Also, am I missing something simple and there's a canonical way to
handle this?  In all the years I've done client/server stuff I've never
had quite this type of interaction (or lack thereof) between client and
server before.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: NMI watchdog functionality on Freebsd

2013-01-23 Thread Ian Lepore
On Wed, 2013-01-23 at 08:47 -0800, Matthew Jacob wrote:
 On 1/23/2013 7:25 AM, John Baldwin wrote:
  On Tuesday, January 22, 2013 5:40:55 pm Sushanth Rai wrote:
  Hi,
 
  Does freebsd have some functionality similar to  Linux's NMI watchdog ? I'm
  aware of ichwd driver, but that depends to WDT to be available in the
  hardware. Even when it is available, BIOS needs to support a mechanism to
  trigger a OS level recovery to get any useful information when system is
  really wedged (with interrupt disabled)
 The principal purpose of a watchdog is to keep the system from hanging.
 Information is secondary. The ichwd driver can use the LPC part of ICH 
 hardware that's been there since ICH version 4. I implemented this more 
 fully at Panasas. The first importance is to keep the system from being 
 hung. The next piece of information is to detect, on reboot, that a 
 watchdog event occurred. Finally, trying to isolate why is good.
 
 This is equivalent to the tco_WDT stuff on Linux. It's not interrupt 
 driven (it drives the reset line on the processor).
 

I think there's value in the NMI watchdog idea, but unless you back it
up with a real hardware watchdog you don't really have full watchdog
functionality.  If the NMI can get the OS to produce some extra info,
that's great, and using an NMI gives you a good chance of doing that
even if it is normal interrupt processing that has wedged the machine.
But calling panic() invokes plenty of processing that can get wedged in
other ways, so even an NMI-based watchdog isn't guaranteed to get the
machine running again.

But adding a real hardware watchdog that fires on a slightly longer
timeout than the NMI watchdog gives you the best of everything: you get
information if it's possible to produce it, and you get a real hardware
reset shortly thereafter if producing the info fails.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: IBM blade server abysmal disk write performances

2013-01-18 Thread Ian Lepore
On Fri, 2013-01-18 at 20:37 +0100, Wojciech Puchar wrote:
  disk would write data
 
 
  I suspect that I'm encountering situations right now at netflix where this 
  advice is not true.  I have drives that are seeing intermittent errors, 
  then being forced into reset after a timeout, and then coming back up with 
  filesystem problems.  It's only a suspicion at this point, not a confirmed 
  case.
 true. I just assumed that anywhere it matters one would use gmirror.
 As for myself - i always prefer to put different manufacturers drives for 
 gmirror or at least - not manufactured at similar time.
 

That is good advice.  I bought six 1TB drives at the same time a few
years ago and received drives with consecutive serial numbers.  They
were all part of the same array, and they all failed (click of death)
within a six hour timespan of each other.  Luckily I noticed the
clicking right away and was able to get all the data copied to another
array within a few hours, before they all died.

-- Ian

 2 fails at the same moment is rather unlikely. Of course - everything is 
 possible so i do proper backups to remote sites. Remote means another 
 city.


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: IBM blade server abysmal disk write performances

2013-01-18 Thread Ian Lepore
On Fri, 2013-01-18 at 22:18 +0100, Wojciech Puchar wrote:
  and anyone who enabled SATA WC or complained about I/O slowness
  would be forced into Siberian salt mines for the remainder of their lives.
 
 so reserve a place for me there.

Yeah, me too.  I prefer to go for all-out performance with separate risk
mitigation strategies.  I wouldn't set up a client datacenter that way,
but it's wholly appropriate for what I do with this machine.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Failsafe on kernel panic

2013-01-17 Thread Ian Lepore
On Thu, 2013-01-17 at 08:38 +0200, Sami Halabi wrote:
 btw: i don't see any options in my kernel config for KBD / Unattended, the
 only thing that mentions it
 is: device ukbd
 
 Sami

I think if you don't have any kdb options turned on, then a panic should
automatically store a crashdump to swap, then reboot the machine.  If
that's not working, perhaps it locks up trying to store the dump?  

If the hardware has a watchdog timer, enabling that might be the best
way to ensure a reboot on any kind of crash or hang.
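
As a sketch, the rc.conf side of that (assuming a stock kernel) is just:

# /etc/rc.conf: write the dump to swap, recover it with savecore(8) at boot
dumpdev="AUTO"
dumpdir="/var/crash"
# and, if the hardware has a usable watchdog, pat it from userland
watchdogd_enable="YES"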

-- Ian


 On Thu, Jan 17, 2013 at 6:45 AM, Sami Halabi sodyn...@gmail.com wrote:
 
  Its only a kernel option? There is no flag to pass to the loader?
 
  SAMI
   17  2013 05:18,  Ian Lepore i...@freebsd.org:
 
  On Wed, 2013-01-16 at 23:27 +0200, Sami Halabi wrote:
   Thank you for your response, very helpful.
   one question - how do i configure auto-reboot once kernel panic occurs?
  
   Sami
  
 
  From src/sys/conf/NOTES, this may be what you're looking for...
 
  #
  # Don't enter the debugger for a panic. Intended for unattended operation
  # where you may want to enter the debugger from the console, but still
  want
  # the machine to recover from a panic.
  #
  options KDB_UNATTENDED
 
  But I think it only has meaning if you have option KDB in effect,
  otherwise it should just reboot itself after a 15 second pause.
 
  -- Ian
 
 
 
 
 
 
  
   On Wed, Jan 16, 2013 at 10:13 PM, John Baldwin j...@freebsd.org wrote:
  
On Wednesday, January 16, 2013 2:25:33 pm Sami Halabi wrote:
 Hi everyone,
 I have a production box, in which I want to install new kernel
  without
any
  remote KVM.
  my problem is it's 2 hours away, and if a kernel panic occurs I got a
  problem.
  I wonder if I can set a failsafe script to load the old kernel in case
   of
  panic.
   
man nextboot (if you are using UFS)
   
--
John Baldwin
   
  
  
  
 
 
 
 
 
 -- 
 Sami Halabi
 Information Systems Engineer
 NMS Projects Expert
 FreeBSD SysAdmin Expert
 ___
 freebsd-hackers@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
 To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Failsafe on kernel panic

2013-01-16 Thread Ian Lepore
On Wed, 2013-01-16 at 23:27 +0200, Sami Halabi wrote:
 Thank you for your response, very helpful.
 one question - how do i configure auto-reboot once kernel panic occurs?
 
 Sami
 

From src/sys/conf/NOTES, this may be what you're looking for...

#
# Don't enter the debugger for a panic. Intended for unattended operation
# where you may want to enter the debugger from the console, but still want
# the machine to recover from a panic.
#
options KDB_UNATTENDED

But I think it only has meaning if you have option KDB in effect,
otherwise it should just reboot itself after a 15 second pause.

-- Ian






 
 On Wed, Jan 16, 2013 at 10:13 PM, John Baldwin j...@freebsd.org wrote:
 
  On Wednesday, January 16, 2013 2:25:33 pm Sami Halabi wrote:
   Hi everyone,
   I have a production box, in which I want to install new kernel without
  any
   remote KVM.
   my problem is it's 2 hours away, and if a kernel panic occurs I got a
   problem.
   I wonder if I can set a failsafe script to load the old kernel in case of
   panic.
 
  man nextboot (if you are using UFS)
 
  --
  John Baldwin
 
 
 
 


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Getting the current thread ID without a syscall?

2013-01-15 Thread Ian Lepore
On Tue, 2013-01-15 at 14:29 -0800, Alfred Perlstein wrote:
 On 1/15/13 1:43 PM, Konstantin Belousov wrote:
  On Tue, Jan 15, 2013 at 04:35:14PM -0500, Trent Nelson wrote:
 
   Luckily it's for an open source project (Python), so recompilation
   isn't a big deal.  (I also check the intrinsic result versus the
   syscall result during startup to verify the same ID is returned,
   falling back to the syscall by default.)
  For you, may be. For your users, it definitely will be a problem.
  And worse, the problem will be blamed on the operating system and not
  to the broken application.
 
 Anything we can do to avoid this would be best.
 
 The reason is that we are still dealing with an optimization that perl 
 did, it reached inside of the opaque struct FILE to do nasty things.  
 Now it is very difficult for us to fix struct FILE.
 
 We are still paying for this years later.
 
 Any way we can make this a supported interface?
 
 -Alfred

Re-reading the original question, I've got to ask why pthread_self()
isn't the right answer?  The requirement wasn't "I need to know what the
OS calls me", it was "I need a unique ID per thread within a process".
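
A sketch of what I mean (pthread_self() already yields a value that's unique
per thread within the process; if a small integer is really needed, I believe
pthread_getthreadid_np(3) from <pthread_np.h> is the FreeBSD-specific way):

#include <pthread.h>
#include <stdio.h>

static void *
worker(void *arg)
{
	/* The pthread_t itself is the per-thread unique ID. */
	printf("worker thread id %p\n", (void *)pthread_self());
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, worker, NULL);
	pthread_create(&t2, NULL, worker, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return (0);
}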

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: kgzip(1) is broken

2013-01-15 Thread Ian Lepore
On Tue, 2013-01-15 at 13:27 -0800, dte...@freebsd.org wrote:
 Hello,
 
 I have been sad of-late because kgzip(1) no longer produces a usable kernel.
 
 All versions of 9.x suffer this.
 
 And somewhere between 8.3-RELEASE-p1 and 8.3-RELEASE-p5 this recently broke in
 the 8.x series.
 
 I haven't tried the 7 series lately, but if whatever is making the rounds gets
 MFC'd that far back, I expect the problem to percolate there too.
 
 The symptom is that the machine reboots immediately and unexpectedly the 
 moment
 the kernel is executed by the loader.
 
 This is quite troubling and I am looking for someone to help find the 
 culprit. I
 don't know where to start looking.

Here are some possible candidates from the things that were MFC'd to 8
in that timeframe.  I haven't looked at what these do, they're just
changes that affect files related to booting.

r233211
r233377
r233469
r234563

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


RE: kgzip(1) is broken

2013-01-15 Thread Ian Lepore
On Tue, 2013-01-15 at 16:10 -0800, Devin Teske wrote:
 
  -Original Message-
  From: Devin Teske [mailto:devin.te...@fisglobal.com] On Behalf Of
  dte...@freebsd.org
  Sent: Tuesday, January 15, 2013 3:10 PM
  To: 'Ian Lepore'
  Cc: freebsd-hackers@freebsd.org; dte...@freebsd.org
  Subject: RE: kgzip(1) is broken
  
  
  
   -Original Message-
   From: Ian Lepore [mailto:free...@damnhippie.dyndns.org]
   Sent: Tuesday, January 15, 2013 3:05 PM
   To: dte...@freebsd.org
   Cc: freebsd-hackers@freebsd.org
   Subject: Re: kgzip(1) is broken
  
   On Tue, 2013-01-15 at 13:27 -0800, dte...@freebsd.org wrote:
Hello,
   
I have been sad of-late because kgzip(1) no longer produces a usable
 kernel.
   
All versions of 9.x suffer this.
   
And somewhere between 8.3-RELEASE-p1 and 8.3-RELEASE-p5 this recently
   broke in
the 8.x series.
   
I haven't tried the 7 series lately, but if whatever is making the 
rounds
  gets
MFC'd that far back, I expect the problem to percolate there too.
   
The symptom is that the machine reboots immediately and unexpectedly the
   moment
the kernel is executed by the loader.
   
This is quite troubling and I am looking for someone to help find the
  culprit. I
don't know where to start looking.
  
   Here are some possible candidates from the things that were MFC'd to 8
   in that timeframe.  I haven't looked at what these do, they're just
   changes that affect files related to booting.
  
   r233211
   r233377
   r233469
   r234563
  
  
  Thanks Ian!
  
  I'll test each one individually to see if regressing any one (or all)
 addresses
  the problem.
 
 Progress...
 
 Looks like I found the culprit.
 
 Turns out it's a back-ported bxe(4) driver (back-ported from 9 -- where kgzip
 seems to never work).
 
 I wonder why back-porting bxe(4) from stable/9 to releng/8.3 would cause kgzip
 to produce non-working kernels.
 

Yeah, it'll be interesting to see how a device driver can lead to "the
machine reboots immediately and unexpectedly the moment the kernel is
executed by the loader", which I took to mean before seeing the
copyright or anything.

 I'm emailing the maintainers (davidch + other Broadcom folk)

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: IBM blade server abysmal disk write performances

2013-01-15 Thread Ian Lepore
On Tue, 2013-01-15 at 15:28 -0500, Karim Fodil-Lemelin wrote:
 On 15/01/2013 3:03 PM, Dieter BSD wrote:
  Disabling the disks's write cache is *required* for data integrity.
  One op per rev means write caching is disabled and no queueing.
   But dmesg claims "Command Queueing enabled", so you should be
 getting
  more than one op per rev, and writes should be fast.
  Is this dd to the raw drive, to a filesystem? (FFS? ZFS? other?)
  Are you running compression, encryption, or some other feature
  that might slow things down? Also, try dd with a larger block size,
  like bs=1m.
 Hi,
 
 Thanks to everyone that answered so far. Here is a follow up.  dd to
 the 
 raw drive and no compression/encryption or some other features, just
 a 
 naive boot off a live 9.1 CD then dd (see below). The following
 results 
 have been gathered on the FreeBSD 9.1 system:

You say dd with a raw drive, but as several people have pointed out,
linux dd doesn't go directly to the drive by default.  It looks like you
can make it do so with the direct option, which should make it behave
the same as freebsd's dd behaves by default (I think, I'm no linux
expert).

For example, using a usb thumb drive:

th2 # dd if=/dev/sdb4 of=/dev/null count=100 
100+0 records in
100+0 records out
51200 bytes (51 kB) copied, 0.0142396 s, 3.6 MB/s

th2 # dd if=/dev/sdb4 of=/dev/null count=100 iflag=direct
100+0 records in
100+0 records out
51200 bytes (51 kB) copied, 0.0628582 s, 815 kB/s

Hmm, just before hitting send I saw your other response that SAS drives
behave badly, SATA are fine.  That does seem to point away from dd
behavior.  It might still be interesting to see if the direct flag on
linux drops performance into the same horrible range as freebsd with
SAS.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Proper way to determine place of system sources in makefile?

2013-01-06 Thread Ian Lepore
On Sun, 2013-01-06 at 22:17 +0400, Lev Serebryakov wrote:
 Hello, Hackers.
 
  I'm  writing  some code, which is built outside of system sources but
 depends on them.
 
  I'm using FreeBSD mk infrastructure.
 
  When code is kernel module (uses bsd.kmod.mk) here is SYSDIR
 variable.
 
   But which is proper way to refer to system sources when makefile is
 prepared for shared library (bsd.lib.mk) or program (bsd.prog.mk)?
 

That may depend on what you mean by system sources.  In particular,
some header files which are generated during the build don't live
under /usr/src/sys, they're in $OBJDIR/sys/kernconf/.  I was
struggling with how to include such a file (in a non-hacky way) while
building a bootloader from sys/boot/arm the other day, and I never did
come up with a clean answer.  (I do understand why -- the header files I
wanted have content that changes based on KERNCONF=, and sys/boot is
built during buildworld, not buildkernel.)

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Another WTF moment

2012-12-16 Thread Ian Lepore
On Sun, 2012-12-16 at 12:01 -0800, Ronald F. Guilmette wrote:
 
 I have two Seagate ST380011A drives, both in the same single system.
 
 On that system, I boot to the FreeBSD 9.1-RC3 LiveCD.
 
 The resulting dmesg messages indicate the following regarding the two drives:
 
 ada0 at ata0 bus 0 scbus2 target 0 lun 0
 ada0: ST380011A 3.54 ATA-6 device
 ada0: 100.000MB/s transfers (UDMA5, PIO 8192bytes)
 ada0: 76318MB (156299375 512 byte sectors: 16H 63S/T 16383C)
 ada0: Previously was known as ad0
 ada1 at ata0 bus 0 scbus2 target 1 lun 0
 ada1: ST380011A 3.06 ATA-6 device
 ada1: 100.000MB/s transfers (UDMA5, PIO 8192bytes)
 ada1: 76319MB (156301488 512 byte sectors: 16H 63S/T 16383C)
 ada1: Previously was known as ad1
 
 
 So, um, WTF?  One ST380011A is 156299375 sectors big, and the other one
 is 156301488 big.
 
 How exactly does this happen?

Assuming the 3.06 and 3.54 are firmware revision numbers, one might
speculate that ongoing testing showed higher sector failure rates than
initially expected, and thus newer firmware sets aside a few more sectors
as spares.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [RFQ] make witness panic an option

2012-11-15 Thread Ian Lepore
On Wed, 2012-11-14 at 22:15 -0800, Adrian Chadd wrote:
 Hi all,
 
 When debugging and writing wireless drivers/stack code, I like to
 sprinkle lots of locking assertions everywhere. However, this does
 cause things to panic quite often during active development.
 
 This patch (against stable/9) makes the actual panic itself
 configurable. It still prints the message regardless.
 
 This has allowed me to sprinkle more locking assertions everywhere to
 investigate whether particular paths have been hit or not. I don't
 necessarily want those to panic the kernel.
 
 I'd like everyone to consider this for FreeBSD-HEAD.
 
 Thanks!

I strongly support this, because I'm tired of having to hack it in by
hand every time I need it.

You can't boot an arm platform right now (on freebsd 8, 9, or 10)
without a LOR very early in the boot.  Once you get past that, 2 or 3
device drivers I use panic way before we even get to mounting root.
Those panics can clearly be ignored, because we've been shipping
products for years based on this code.  (It's on my to-do list to fix
them, but more pressing problems are higher on the list.)

When a new problem crops up that isn't harmless, it totally sucks that I
can't just turn on witness without first hacking the code to make the
known problems non-panicky.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [RFQ] make witness panic an option

2012-11-15 Thread Ian Lepore
On Thu, 2012-11-15 at 17:47 +, Attilio Rao wrote:
 On 11/15/12, Ian Lepore free...@damnhippie.dyndns.org wrote:
  On Wed, 2012-11-14 at 22:15 -0800, Adrian Chadd wrote:
  Hi all,
 
  When debugging and writing wireless drivers/stack code, I like to
  sprinkle lots of locking assertions everywhere. However, this does
  cause things to panic quite often during active development.
 
  This patch (against stable/9) makes the actual panic itself
  configurable. It still prints the message regardless.
 
  This has allowed me to sprinkle more locking assertions everywhere to
  investigate whether particular paths have been hit or not. I don't
  necessarily want those to panic the kernel.
 
  I'd like everyone to consider this for FreeBSD-HEAD.
 
  Thanks!
 
  I strongly support this, because I'm tired of having to hack it in by
  hand every time I need it.
 
  You can't boot an arm platform right now (on freebsd 8, 9, or 10)
  without a LOR very early in the boot.  Once you get past that, 2 or 3
  device drivers I use panic way before we even get to mounting root.
  Those panics can clearly be ignored, because we've been shipping
  products for years based on this code.  (It's on my to-do list to fix
  them, but more pressing problems are higher on the list.)
 
  This is a ridiculous motivation.
 What are the panics in question? Why they are not fixed yet?
 Without WITNESS_KDB you should not panic even in cases where WITNESS
  yells. So if you do, it means there is a more insidious breakage going
 on here.
 
 Do you really think that an abusable mechanism will help here rather
 than fixing the actual problems?
 
  When a new problem crops up that isn't harmless, it totally sucks that I
  can't just turn on witness without first hacking the code to make the
  known problems non-panicky.
 
 I really don't understand what are these harmless problems here.
 I just know one and it is between the dirhash lock and the bufwait
 lock for UFS, which is carefully documented in the code comments. All
 the others cases haven't been analyzed deeply enough to quantify them
 as harmless.
 
 Can you please make real examples?
 

No. 

Since you've made it abundantly clear in this thread that you are not
open to anyone else's opinion and won't change your mind, I'm not going
to waste even 10 seconds explaining my perfectly valid needs.

I'll just keep hacking the code up to not panic when I need to.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Give users a hint when their locate database is too small.

2012-11-13 Thread Ian Lepore
On Tue, 2012-11-13 at 11:05 -0500, Eitan Adler wrote:
 On 13 November 2012 10:58, Eitan Adler li...@eitanadler.com wrote:
 
 Okay... sorry for the spam. I remember there was a reason I used
 /etc/periodic/weekly/310.locate instead of /usr/libexec/locate.updatedb.
 The latter must not be run as root, and the former takes care of this work.
 
 Since the default is to enable weekly updates I am inclined to use the
 310.locate script instead.
 
 
 

Would it work to refer them to the locate.updatedb manpage (which
references the periodic script, and presumably would be kept up to date
with any script renaming/numbering)?

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Memory reserves or lack thereof

2012-11-12 Thread Ian Lepore
On Mon, 2012-11-12 at 13:18 +0100, Andre Oppermann wrote:
  Well, what's the current set of best practices for allocating mbufs?
 
 If an allocation is driven by user space then you can use M_WAITOK.
 
 If an allocation is driven by the driver or kernel (callout and so on)
 you do M_NOWAIT and handle a failure by trying again later either
 directly by rescheduling the callout or by the upper layer retransmit
 logic.
 
 On top of that individual mbuf allocation or stitching mbufs and
 clusters together manually is deprecated.  If every possible you
 should use m_getm2().

root@pico:/root # man m_getm2
No manual entry for m_getm2

So when you say "manually stitching mbufs together is deprecated", I take
it you mean the case where you're letting the mbuf routines allocate the
actual buffer space for you?

I've got an ethernet driver on an ARM SoC in which the hardware receives
into a series of buffers fixed at 128 bytes.  Right now the code is
allocating a cluster and then looping using m_append() to reassemble
these buffers back into a full contiguous frame in a cluster.  I was
going to have a shot at using MEXTADD() to manually string the series of
hardware/dma buffers together without copying the data.  Is that sort of
usage still a good idea?  (And would it actually be a performance win?
If I hand it off to the net stack and an m_pullup() or similar is going
to happen along the way anyway, I might as well do it at driver level.)

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: watchdogd, jemalloc, and mlockall

2012-11-10 Thread Ian Lepore
On Sat, 2012-11-03 at 12:50 -0600, Ian Lepore wrote:
 On Sat, 2012-11-03 at 20:41 +0200, Konstantin Belousov wrote:
  On Sat, Nov 03, 2012 at 12:38:39PM -0600, Ian Lepore wrote:
   In an attempt to un-hijack the thread about memory usage increase
   between 6.4 and 9.x, I'm starting a new thread here related to my recent
   discovery that watchdogd uses a lot more memory since it began using
   mlockall(2).
   
   I tried statically linking watchdogd and it made a small difference in
   RSS, presumably because it doesn't wire down all of libc and libm.
   
VSZ   RSS
   10236 10164  Dynamic
8624  8636  Static
   
   Those numbers are from ps -u on an arm platform.  I just updated the PR
   (bin/173332) with some procstat -v output comparing with/without
   mlockall().
   
   It appears that the bulk of the new RSS bloat comes from jemalloc
   allocating vmspace in 8MB chunks.  With mlockall(MCL_FUTURE) in effect
   that leads to wiring 8MB to satisfy what probably amounts to a few
   hundred bytes of malloc'd memory.
   
   It would probably also be a good idea to remove the floating point from
   watchdogd to avoid wiring all of libm.  The floating point is used just
   to turn the timeout-in-seconds into a power-of-two-nanoseconds value.
   There's probably a reasonably efficient way to do that without calling
   log(), considering that it only happens once at program startup.
  
  No, I propose to add a switch to turn on/off the mlockall() call.
  I have no opinion on the default value of the suggested switch.
 
 In a patch I submitted along with the PR, I added code to query the
 vm.swap_enabled sysctl and only call mlockall() when swapping is
 enabled.  
 
 Nobody yet has said anything about what seems to me to be the real
 problem here:  jemalloc grabs 8MB at a time even if you only need to
 malloc a few bytes, and there appears to be no way to control that
 behavior.  Or maybe there's a knob in there that didn't jump out at me
 on a quick glance through the header files.

I finally found some time to pursue this further.  A small correction to
what I said earlier: it appears that jemalloc allocates chunks of 4MB at
a time, not 8, but it also appears that it allocates at least 2 chunks
so the net effect is an 8MB default minimum allocation.

I played with the jemalloc tuning option lg_chunk and with static versus
dynamic linking, and came up with the numbers below, which were
generated by ps -u on an ARM-based system with 64MB running -current
from a couple weeks ago, but with the recent patch to watchdogd to
eliminate the need for libm.  I used lg_chunk:14 (16K chunks), the
smallest value it would allow on this platform.  For comparison I also
include the numbers from a FreeBSD 8.2 ARM system (which would be
dynamic linked and untuned, and also without any mlockall() calls).

Link      malloc    %MEM    VSZ    RSS
---------------------------------------
dynamic   untuned   15.3  10040   9996
static    untuned   13.2   8624   8636
dynamic   tuned      2.8   1880   1836
static    tuned      0.8    480    492

[ freebsd 8.2 ]      1.1   1752    748

So it appears that using jemalloc's tuning in a daemon that uses
mlockall(2) is a big win, especially if the daemon doesn't do much
memory allocation (watchdogd allocates 2 things, 4k and 1280 bytes; if
you use -e it also strdup()s the command string).  It also seems that
providing a build-time knob to control static linking would be valuable
on platforms that are very memory limited and can't benefit from having
all of libc wired.

I haven't attached a patch because there appears to be no good way to
actually achieve this in a platform-agnostic way.  The jemalloc code
enforces the lower range of the lg_chunk tuning value to be tied to the
page size of the platform, and it rejects out of range values without
changing the tuning.  The code that works on an ARM with 4K page size,

const char *malloc_conf = "lg_chunk:14";

would fail on a system that had bigger pages.  The tuning must be
specified with a compile-time constant like that, because it has to be
tuned before the first allocation, which apparently happens before
main() is entered.  It would be nice if jemalloc would clip the tuning
to the lowest legal value instead of rejecting it, especially since the
lowest legal value is calculated based not only on page size but on the
value of other configurable values.

There's another potential solution, but it strikes me as rather
inelegant... jemalloc can also be tuned with the MALLOC_CONF env var.
With the right rc-fu we could provide something like a watchdogd_memtune
variable that you could set and watchdogd would be invoked with
MALLOC_CONF set to that in the environment.  It still couldn't be set to
a default value that was good for all platforms.  It would also get
passed through environment inheritence to any -e whatever command run
by watchdogd, which isn't necessarily appropriate

procstat -v question

2012-11-05 Thread Ian Lepore
In a line of procstat -v output such as this:

  PID      START        END PRT  RES PRES REF SHD FL TP PATH
60065 0x200c1000 0x201c3000 r-x  182    0  17   8 CN vn /usr/lib/libstdc++.so.6

Does that 182 resident pages mean that the process being displayed is
referencing that many pages itself, or does that represent how many
pages are resident due to all the references from all the processes that
have the library open?

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: watchdogd, jemalloc, and mlockall

2012-11-04 Thread Ian Lepore
On Sun, 2012-11-04 at 09:16 -0500, Dylan Cochran wrote:
 Have you already tried something like opt.lg_chunk? This, combined
 with other options for the library (man 3 jemalloc), should reduce the
 space from 8MB down to 16K, or so. (approximation, I'm being liberal
 for jemalloc's internal bookkeeping size). For a special case like
 watchdogd, this would make more sense in general (given it should be
 designed to do no allocations/deletions during normal operation
 anyway). For other programs, this would be as unwise as statically
 linking them.
 

I had completely missed the fact that jemalloc had its own manpage,
thank you.  

Given that new information I think the pieces are in place to put
watchdogd on a memory diet.  I'll work up a patch in the next couple
days

-- Ian

 The 'perfect' solution would obviously be improving the library
 manager (rtld) to only mmap() function pages it needs, though I will
 admit I'm not sure if the ELF format is even capable of supporting
 something like that, what other problems it would cause down the road,
 or if it even attempts to do this already (I haven't looked at the
 runtime linker code since 7.0).
 
 By the way, remember that when you compare static v dynamic, that the
 runtime linker does allocate private memory to handle the resolving of
 symbols to virtual memory addresses. That may skew your memory usage
 figures a bit.
 
 On Sat, Nov 3, 2012 at 4:12 PM, Ian Lepore
 free...@damnhippie.dyndns.org wrote:
 
  On Sat, 2012-11-03 at 12:59 -0700, Xin Li wrote:
   -BEGIN PGP SIGNED MESSAGE-
   Hash: SHA256
  
   On 11/3/12 11:38 AM, Ian Lepore wrote:
In an attempt to un-hijack the thread about memory usage increase
between 6.4 and 9.x, I'm starting a new thread here related to my
recent discovery that watchdogd uses a lot more memory since it
began using mlockall(2).
   
I tried statically linking watchdogd and it made a small difference
in RSS, presumably because it doesn't wire down all of libc and
libm.
  
   Speaking for this, the last time I brought this up, someone (can't
   remember, I think it was phk@) argued that the shared library would
   use only one copy of memory, while statically linked ones would be
   duplicated and thus use more memory.  I haven't yet tried to prove or
   challenge that, though.
 
  That sounds right to me... if 3 or 4 daemons were to eventually be
  statically linked because of mlockall(), then each of them would have
  its own private copy of strlen(), and malloc(), and so on; we'd be back
  to the bad old days before shared libs came along.  Each program would
  contain its own copy of only the routines from the library that it uses,
  not the entire library in each program.
 
  On the other hand, if even one daemon linked with shared libc uses
  mlockall(), then all of libc gets wired.  As I understand it, only one
  physical copy of libc would exist in memory, still shared by almost all
  running apps.  The entire contents of the library would continuously
  occupy physical memory, even the parts that no apps are using.
 
  It's hard to know how to weigh the various tradeoffs.  I suspect there's
  no one correct answer.
 
  -- Ian
 
 
  ___
  freebsd-hackers@freebsd.org mailing list
  http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
  To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
 ___
 freebsd-hackers@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
 To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: watchdogd, jemalloc, and mlockall

2012-11-04 Thread Ian Lepore
On Sun, 2012-11-04 at 09:36 -0700, Warner Losh wrote:
 On Nov 3, 2012, at 12:50 PM, Ian Lepore wrote:
 
  On Sat, 2012-11-03 at 20:41 +0200, Konstantin Belousov wrote:
  On Sat, Nov 03, 2012 at 12:38:39PM -0600, Ian Lepore wrote:
  In an attempt to un-hijack the thread about memory usage increase
  between 6.4 and 9.x, I'm starting a new thread here related to my recent
  discovery that watchdogd uses a lot more memory since it began using
  mlockall(2).
  
  I tried statically linking watchdogd and it made a small difference in
  RSS, presumably because it doesn't wire down all of libc and libm.
  
  VSZ   RSS
  10236 10164  Dynamic
  8624  8636  Static
  
  Those numbers are from ps -u on an arm platform.  I just updated the PR
  (bin/173332) with some procstat -v output comparing with/without
  mlockall().
  
  It appears that the bulk of the new RSS bloat comes from jemalloc
  allocating vmspace in 8MB chunks.  With mlockall(MCL_FUTURE) in effect
  that leads to wiring 8MB to satisfy what probably amounts to a few
  hundred bytes of malloc'd memory.
  
  It would probably also be a good idea to remove the floating point from
  watchdogd to avoid wiring all of libm.  The floating point is used just
  to turn the timeout-in-seconds into a power-of-two-nanoseconds value.
  There's probably a reasonably efficient way to do that without calling
  log(), considering that it only happens once at program startup.
  
  No, I propose to add a switch to turn on/off the mlockall() call.
  I have no opinion on the default value of the suggested switch.
  
  In a patch I submitted along with the PR, I added code to query the
  vm.swap_enabled sysctl and only call mlockall() when swapping is
  enabled.  
  
  Nobody yet has said anything about what seems to me to be the real
  problem here:  jemalloc grabs 8MB at a time even if you only need to
  malloc a few bytes, and there appears to be no way to control that
  behavior.  Or maybe there's a knob in there that didn't jump out at me
  on a quick glance through the header files.
 
 Isn't that only for non-production builds?
 
 Warner

I don't think so, I discovered this on my tflex unit running -current,
and it's built with MALLOC_PRODUCTION defined because it doesn't have
enough ram to boot without it defined.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: watchdogd, jemalloc, and mlockall

2012-11-04 Thread Ian Lepore
On Sun, 2012-11-04 at 09:36 -0700, Warner Losh wrote:
 On Nov 3, 2012, at 12:50 PM, Ian Lepore wrote:
 
  On Sat, 2012-11-03 at 20:41 +0200, Konstantin Belousov wrote:
  On Sat, Nov 03, 2012 at 12:38:39PM -0600, Ian Lepore wrote:
  In an attempt to un-hijack the thread about memory usage increase
  between 6.4 and 9.x, I'm starting a new thread here related to my recent
  discovery that watchdogd uses a lot more memory since it began using
  mlockall(2).
  
  I tried statically linking watchdogd and it made a small difference in
  RSS, presumably because it doesn't wire down all of libc and libm.
  
  VSZ   RSS
  10236 10164  Dynamic
  8624  8636  Static
  
  Those numbers are from ps -u on an arm platform.  I just updated the PR
  (bin/173332) with some procstat -v output comparing with/without
  mlockall().
  
  It appears that the bulk of the new RSS bloat comes from jemalloc
  allocating vmspace in 8MB chunks.  With mlockall(MCL_FUTURE) in effect
  that leads to wiring 8MB to satisfy what probably amounts to a few
  hundred bytes of malloc'd memory.
  
  It would probably also be a good idea to remove the floating point from
  watchdogd to avoid wiring all of libm.  The floating point is used just
  to turn the timeout-in-seconds into a power-of-two-nanoseconds value.
  There's probably a reasonably efficient way to do that without calling
  log(), considering that it only happens once at program startup.
  
  No, I propose to add a switch to turn on/off the mlockall() call.
  I have no opinion on the default value of the suggested switch.
  
  In a patch I submitted along with the PR, I added code to query the
  vm.swap_enabled sysctl and only call mlockall() when swapping is
  enabled.  
  
  Nobody yet has said anything about what seems to me to be the real
  problem here:  jemalloc grabs 8MB at a time even if you only need to
  malloc a few bytes, and there appears to be no way to control that
  behavior.  Or maybe there's a knob in there that didn't jump out at me
  on a quick glance through the header files.
 
 Isn't that only for non-production builds?
 
 Warner

I just realized the implication of what you asked.  I think it must be
that jemalloc always allocates big chunks of vmspace at a time (unless
tuned to do otherwise; I haven't looked into the tuning stuff yet), but
when MALLOC_PRODUCTION isn't defined it also touches all the pages
within that allocated space, presumably to lay in known byte patterns or
other debugging info.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Threaded 6.4 code compiled under 9.0 uses a lot more memory?..

2012-11-03 Thread Ian Lepore
On Wed, 2012-10-31 at 13:38 -0700, Adrian Chadd wrote:
 On 31 October 2012 12:06, Konstantin Belousov kostik...@gmail.com wrote:
 
  Watchdogd was recently changed to mlock its memory. This is the cause
  of the RSS increase.
 
  If not wired, swapout might cause a delay of the next pat, leading to
  panic.
 
 Right, but look at the virtual size of the 6.4 process. It's not 10
 megabytes at all. Even if you wired all of that into memory, it
 wouldn't be 10 megabytes.
 
 
 
 Adrian

After gathering some more evidence, I agree that the huge increase I
noticed in watchdogd is caused by a combo of jemalloc's behavior and the
recent addition of mlockall(2) to watchdogd.  Since this is only
slightly tangentially related to the OP's questions as near as I can
tell, I've entered a PR for it[1], and we can follow up with a separate
discussion thread about that.

While jemalloc can explain the growth in VSZ between 6.4 and 9.x, it
doesn't look like mlockall() has anything to do with the original
question of why the RSZ got so much bigger.  In other words, part of the
original question is still unanswered.

[1]  http://www.freebsd.org/cgi/query-pr.cgi?pr=173332

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


watchdogd, jemalloc, and mlockall

2012-11-03 Thread Ian Lepore
In an attempt to un-hijack the thread about memory usage increase
between 6.4 and 9.x, I'm starting a new thread here related to my recent
discovery that watchdogd uses a lot more memory since it began using
mlockall(2).

I tried statically linking watchdogd and it made a small difference in
RSS, presumably because it doesn't wire down all of libc and libm.

 VSZ   RSS
10236 10164  Dynamic
 8624  8636  Static

Those numbers are from ps -u on an arm platform.  I just updated the PR
(bin/173332) with some procstat -v output comparing with/without
mlockall().

It appears that the bulk of the new RSS bloat comes from jemalloc
allocating vmspace in 8MB chunks.  With mlockall(MCL_FUTURE) in effect
that leads to wiring 8MB to satisfy what probably amounts to a few
hundred bytes of malloc'd memory.

It would probably also be a good idea to remove the floating point from
watchdogd to avoid wiring all of libm.  The floating point is used just
to turn the timeout-in-seconds into a power-of-two-nanoseconds value.
There's probably a reasonably efficient way to do that without calling
log(), considering that it only happens once at program startup.
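
For example, something along these lines does the conversion with integer
math only (a sketch of the idea, not the actual watchdogd code):

#include <stdint.h>

/*
 * Return the smallest exponent e such that 2^e nanoseconds covers the
 * requested timeout in seconds; that's the power-of-two-nanoseconds value
 * the WDIOCPATPAT ioctl wants, computed without log() from libm.
 */
static unsigned int
timeout_to_pow2ns(unsigned int seconds)
{
	uint64_t ns;
	unsigned int e;

	ns = (uint64_t)seconds * 1000000000ULL;
	for (e = 0; e < 63 && ((uint64_t)1 << e) < ns; e++)
		;
	return (e);
}

(flsll(3), where available, gives the same answer without the loop.)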

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: watchdogd, jemalloc, and mlockall

2012-11-03 Thread Ian Lepore
On Sat, 2012-11-03 at 20:41 +0200, Konstantin Belousov wrote:
 On Sat, Nov 03, 2012 at 12:38:39PM -0600, Ian Lepore wrote:
  In an attempt to un-hijack the thread about memory usage increase
  between 6.4 and 9.x, I'm starting a new thread here related to my recent
  discovery that watchdogd uses a lot more memory since it began using
  mlockall(2).
  
  I tried statically linking watchdogd and it made a small difference in
  RSS, presumably because it doesn't wire down all of libc and libm.
  
   VSZ   RSS
  10236 10164  Dynamic
   8624  8636  Static
  
  Those numbers are from ps -u on an arm platform.  I just updated the PR
  (bin/173332) with some procstat -v output comparing with/without
  mlockall().
  
  It appears that the bulk of the new RSS bloat comes from jemalloc
  allocating vmspace in 8MB chunks.  With mlockall(MCL_FUTURE) in effect
  that leads to wiring 8MB to satisfy what probably amounts to a few
  hundred bytes of malloc'd memory.
  
  It would probably also be a good idea to remove the floating point from
  watchdogd to avoid wiring all of libm.  The floating point is used just
  to turn the timeout-in-seconds into a power-of-two-nanoseconds value.
  There's probably a reasonably efficient way to do that without calling
  log(), considering that it only happens once at program startup.
 
 No, I propose to add a switch to turn on/off the mlockall() call.
 I have no opinion on the default value of the suggested switch.

In a patch I submitted along with the PR, I added code to query the
vm.swap_enabled sysctl and only call mlockall() when swapping is
enabled.  

Nobody yet has said anything about what seems to me to be the real
problem here:  jemalloc grabs 8MB at a time even if you only need to
malloc a few bytes, and there appears to be no way to control that
behavior.  Or maybe there's a knob in there that didn't jump out at me
on a quick glance through the header files.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: watchdogd, jemalloc, and mlockall

2012-11-03 Thread Ian Lepore
On Sat, 2012-11-03 at 12:59 -0700, Xin Li wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256
 
 On 11/3/12 11:38 AM, Ian Lepore wrote:
  In an attempt to un-hijack the thread about memory usage increase 
  between 6.4 and 9.x, I'm starting a new thread here related to my
  recent discovery that watchdogd uses a lot more memory since it
  began using mlockall(2).
  
  I tried statically linking watchdogd and it made a small difference
  in RSS, presumably because it doesn't wire down all of libc and
  libm.
 
 Speaking for this, the last time I brought this up, someone (can't
 remember, I think it was phk@) argued that the shared library would
 use only one copy of memory, while statically linked ones would be
 duplicated and thus use more memory.  I haven't yet tried to prove or
 challenge that, though.

That sounds right to me... if 3 or 4 daemons were to eventually be
statically linked because of mlockall(), then each of them would have
its own private copy of strlen(), and malloc(), and so on; we'd be back
to the bad old days before shared libs came along.  Each program would
contain its own copy of only the routines from the library that it uses,
not the entire library in each program.

On the other hand, if even one daemon linked with shared libc uses
mlockall(), then all of libc gets wired.  As I understand it, only one
physical copy of libc would exist in memory, still shared by almost all
running apps.  The entire contents of the library would continuously
occupy physical memory, even the parts that no apps are using.

It's hard to know how to weigh the various tradeoffs.  I suspect there's
no one correct answer.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Threaded 6.4 code compiled under 9.0 uses a lot more memory?..

2012-11-01 Thread Ian Lepore
On Thu, 2012-11-01 at 10:12 +0800, David Xu wrote:
 On 2012/10/31 22:44, Karl Pielorz wrote:
 
 
  --On 31 October 2012 16:06 +0200 Konstantin Belousov
  kostik...@gmail.com wrote:
 
  Since you neglected to provide the verbatim output of procstat, nothing
  conclusive can be said. Obviously, you can make an investigation on your
  own.
 
  Sorry - when I ran it this morning the output was several hundred lines
  - I didn't want to post all of that to the list 99% of the lines are
  very similar. I can email it you off-list if having the whole lot will
  help?
 
  Then there's a bunch of 'large' blocks e.g..
 
   PID  STARTEND PRT  RES PRES REF SHD  FL TP
   PATH 20100x801c00x80280 rw- 28690   4   0
   df 20100x802800x80340 rw- 18800   1   0
 
  Most likely, these are malloc arenas.
 
  Ok, that's the heaviest usage.
 
  Then lots of 'little' blocks,
 
  2010 0x70161000 0x70181000 rw-   160   1   0 ---D df
 
  And those are thread stacks.
 
  Ok, lots of those (lots of threads going on) - but they're all pretty
  small.
 
 Note that libc_r's thread stack is 64K, while libthr has 1M bytes
 per-thread.

That would help explain the large increase in virtual size, but not the
increase in resident size, right?  In other words, there's nothing
inherent in libthr that makes it use more stack, it just allocates more
vmspace to allow greater potential growth?

Hmmm, actually the chunks said to be thread stack above are neither 64K
nor 1M, that's 128K.  The malloc arenas are 12M, which seems like an
unusual value.  I haven't looked inside jemalloc at all, maybe that's
normal.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Threaded 6.4 code compiled under 9.0 uses a lot more memory?..

2012-10-31 Thread Ian Lepore
On Wed, 2012-10-31 at 10:55 -0700, Adrian Chadd wrote:
 .. isn't the default thread stack size now really quite large?
 
 Like one megabyte large?

That would explain a larger VSZ but the original post mentions that both
virtual and resident sizes have grown by almost an order of magnitude. 

I think the same is true of the jemalloc aspect -- its design makes it
use more virtual address space than phkmalloc when you've got lots of
threads, but that shouldn't make it use so much more physical memory.
I'm not positive of that, but I did notice when we upgraded from 6.x to
8.2 at work, our apps that have many dozens of threads use more virtual
space, but not dramatically as much more physical memory as in the OP's
case.

I think there are some things we should be investigating about the
growth of memory usage.  I just noticed this:

Freebsd 6.2 on an arm processor:

  369 root 1   8  -88  1752K   748K nanslp   3:00  0.00% watchdogd

Freebsd 10.0 on the same system:

  367 root 1 -52   r0 10232K 10160K nanslp  10:04  0.00% watchdogd

The 10.0 system is built with MALLOC_PRODUCTION (without that defined
the system won't even boot, it only has 64MB of ram).  That's a crazy
amount of growth for a relatively simple daemon.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Threaded 6.4 code compiled under 9.0 uses a lot more memory?..

2012-10-30 Thread Ian Lepore
On Tue, 2012-10-30 at 13:46 +0100, Fabian Keil wrote:
 Karl Pielorz kpielorz_...@tdx.co.uk wrote:
 
  Can anyone think of any quick pointers as to why some code originally 
  written under 6.4 amd64 - when re-compiled under 9.0-stable amd64 takes
  up a *lot* more memory when running?
 
 6.4 comes with phkmalloc while 9.0 uses jemalloc. Maybe you are
 allocating memory in a way that is less space-efficiently handled by
 jemalloc's default configuration.
 
 Fabian

jemalloc is certainly the first thing that came to my mind.  Does
MALLOC_PRODUCTION need to be defined on a 9.0 system, or is that
something that gets turned on automatically in an official release
build? (I'm always working with non-release stuff so I'm not sure how
that gets handled).

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: opensolaris B_TRUE and B_FALSE

2012-10-28 Thread Ian Lepore
On Mon, 2012-10-29 at 00:02 +0100, Erik Cederstrand wrote:
 Hello,
 
 I'm looking at this Clang analyzer report: 
 http://scan.freebsd.your.org/freebsd-head/WORLD/2012-10-24-amd64/report-uH6BjZ.html.gz#EndPath
 Apart from the actual error, which is a false positive, it seems like Clang
 can't find the macro definitions for B_TRUE and B_FALSE (if it did, hovering 
 over them would show the macro definition).
 
 These are defined in sys/cddl/compat/opensolaris/sys/types.h as an enum of 
 type boolean_t as long as _KERNEL is not defined. The only definition for 
 boolean_t I can find is in sys/sys/types.h but it's only defined if _KERNEL 
 is defined.
 
 I'm sure that ZFS wouldn't work if B_TRUE or B_FALSE were undefined, I just 
 can't figure out where it's happening. I need a hint :-)
 
 Thanks,
 Erik

Look further up in sys/cddl/compat/opensolaris/sys/types.h, they're also
defined (as macros rather than enum) in the KERNEL case.  They're also
defined (as enum) in sys/gnu/fs/xfs/xfs_types.h.  (Once again, SlickEdit
pays for itself by answering with one right-click a question that would
have been a pain to use grep for.)

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [CFT/RFC]: refactor bsd.prog.mk to understand multiple programs instead of a singular program

2012-10-26 Thread Ian Lepore
On Fri, 2012-10-26 at 08:27 -0600, Warner Losh wrote:
 On Oct 26, 2012, at 12:23 AM, Simon J. Gerraty wrote:
 
  In particular, why cannot the ':L' and ':U' support be added ?
  
  Because they already exist - with different meanings.
  They were added to NetBSD make over 10 years ago, from the OSF version
  of pmake.
 
 And we've had the :U and :L for a similar period of time as well.  Arguing 
 age here is an interesting historical footnote, but not a compelling argument 
 to justify the pain to our users.
 
  In several areas the behavior of bmake has been changed to make it a
  drop in replacement for FreeBSD, but the above (not used at all in the
  FreeBSD base) are easier dealt with the other way.  The :tl and :tu
  equivalents were added to FreeBSD make a while back to ease the
  transition.
 
 Why can't there be a make target that turns them on in FreeBSD compat mode.  
 You could then just drop those into bsd.port.mk and be done with it?  We 
 already do this with the posix target, so there's precedent for it.
 
 I know you've objected to this as ugly, but as I pointed out when I mentioned 
 it before, I think this is less ugly and less work and would offer a smoother 
 transition than forcing all the scripts to change.
 
 Warner

I second this concept.  At work, we create dozens of products using
literally hundreds of makefiles scattered throughout a huge source base.
We have to be able to build the same source for multiple versions of
freebsd, so even finding all the old :U and :L and any other
incompatibilities and fixing them isn't an option because we'd just
trade "works in freebsd 10" for "broken in every other environment".

If there were some way to turn on a compatibility mode, we'd have a way
to slowly transition to the newer stuff over the course of a couple OS
versions.  Eventually we'd reach the point where we no longer need to
build products using an older version and we could update to the newer
syntax and stop using compatibility mode.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [CFT/RFC]: refactor bsd.prog.mk to understand multiple programs instead of a singular program

2012-10-26 Thread Ian Lepore
On Fri, 2012-10-26 at 11:09 -0700, David O'Brien wrote:
 On Fri, Oct 26, 2012 at 09:41:36AM -0600, Ian Lepore wrote:
  We have to be able to build the same source for multiple versions of
  freebsd, so even finding all the old :U and :L and any other
  incompatibilities and fixing them isn't an option because we'd just
  trade "works in freebsd 10" for "broken in every other environment".
 
 Ian,
 If you're using FreeBSD 9 after 2012-06-14, or FreeBSD 8 or 7 after
 2012-10-09 you can use the Bmake spelling of :U and :L (:tu/:tl).
 
 I am not arguing against you, just giving some information you may not
 be aware of.
 

Yeah.  And if I have to, I could modify all our makefiles to use the new
syntax, then backport support for the new syntax to earlier freebsd make
source in our local repos.  But to give you some idea of what I've got
to support... yesterday afternoon I was struggling with whether I can
find the time in a release schedule to update an old product that needs
a new feature from freebsd 6 to 8.  The sad fact is that I can't, I'm
going to have to do another freebsd 6-based release to meet the
schedule.  

It's interesting having to work on a daily basis in everything between
freebsd 6.2 and -current.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: FreeBSD in Google Code-In 2012? You can help too!

2012-10-23 Thread Ian Lepore
On Tue, 2012-10-23 at 12:39 +0200, Erik Cederstrand wrote:
 Den 16/10/2012 kl. 12.19 skrev Wojciech A. Koszek wkos...@freebsd.org:
 
  (cross-posted message; please keep discussion on freebsd-hackers@)
  
  Those of you who have Wiki access, please spent 2 more minutes and submit
  straight to Wiki:
  
  http://wiki.freebsd.org/GoogleCodeIn/2012Tasks
 
 
 There are lots of smallish tasks in the code-quality department:
 
 * Analyze and fix Clang Static Analyzer warnings
 * Analyze and fix compiler warnings to increase WARNS level
 * Write regression tests for src/tools/regression
 * Run include-what-you-use to clean up header inclusion
 * Verify bugs with patches
 
 I think they're too open-ended to enter in the wiki as-is, but I'd also like 
 to not spam the wiki with lots of almost-identical tasks. What's the best way 
 to suggest them for CodeIn?
 

Analyzing and fixing warnings is the last thing I'd assign to a young
inexperienced programmer.  It's far too easy (and tempting) to cast away
warnings or otherwise treat the symptoms when what's really needed is to
dig deeply into code (often including analyzing call chains) to evaluate
the consequences of any changes.

On the last 3 tasks in your list, I agree completely, just the sort of
thing you'd assign to an intern or new junior engineer to get them
started on a large existing project.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: time_t when used as timedelta

2012-10-09 Thread Ian Lepore
On Tue, 2012-10-09 at 17:35 +0200, Erik Cederstrand wrote:
 Hi list,
 
 I'm looking at this possible divide-by zero in dhclient: 
 http://scan.freebsd.your.org/freebsd-head/WORLD/2012-10-07-amd64/report-nBhqE2.html.gz#EndPath
 
 In this specific case, it's obvious from the intention of the code that
 ip->client->interval is always > 0, but it's not obvious to me in the code. I
 could add an assert before the possible divide-by-zero:
 
 assert(ip->client->interval > 0);
 
 But looking at the code, I'm not sure it's very elegant. ip->client->interval
 is defined as time_t (see src/sbin/dhclient/dhcpd.h), which is a signed 
 integer type, if I'm correct. However, some time_t members of struct 
 client_state and struct client_config (see said header file) are assumed in 
 the code to be positive and possibly non-null. Instead of plastering the code 
 with asserts, is there something like a utime_t type? Or are there better
 ways to enforce the invariant?
 

It looks to me like the place where enforcement is really needed is in
parse_lease_time() which should ensure at the very least that negative
values never get through, and in some cases that zeroes don't sneak in
from config files.  If it were ensured that
ip-client-config-backoff_cutoff could never be less than 1 (and it
appears any value less than 1 would be insane), then the division by
zero case could never happen.  However, at least one of the config
statements handled by parse_lease_time() allows a value of zero.

Since nothing seems to ensure that backoff_cutoff is non-zero, it seems
like a potential source of div-by-zero errors too, in that same
function.
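
As a rough sketch of the kind of check being suggested (illustrative only,
not the actual dhclient source; the names value, parse_warn and
default_lease_time are assumptions here):

    /* In parse_lease_time(), reject nonsensical values up front. */
    if (value <= 0) {
            parse_warn("lease time must be positive");
            value = default_lease_time;
    }

Validating once at parse time would remove the need to assert before every
later division.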

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: syslog(3) issues

2012-09-02 Thread Ian Lepore
On Mon, 2012-09-03 at 00:35 +0100, Attilio Rao wrote:
 Hi,
 I was trying to use syslog(3) in a port application that uses
 threading, having all of them at the LOG_CRIT level. What I see is
 that when the logging gets massive (1000 entries) I cannot find some
 items within the /var/log/messages (I know because I started stamping
 also some sort of message ID in order to see what is going on). The
 missing items are in the order of 25% of what should really be there.
 
 Someone has a good idea on where I can start verifying for my syslogd
 system? I have really 0 experience with syslogd and maybe I could be
 missing something obvious.

There's a chance this PR about syslogd incorrectly calculating socket
receive buffer sizes is related and the patch attached to it could fix
it...

http://www.freebsd.org/cgi/query-pr.cgi?pr=1604331

I filed the PR long ago, if the patches have drifted out of date I'll be
happy to re-work them.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: syslog(3) issues

2012-09-02 Thread Ian Lepore
On Sun, 2012-09-02 at 19:50 -0600, Ian Lepore wrote:
 On Mon, 2012-09-03 at 00:35 +0100, Attilio Rao wrote:
  Hi,
  I was trying to use syslog(3) in a port application that uses
   threading, having all of them at the LOG_CRIT level. What I see is
  that when the logging gets massive (1000 entries) I cannot find some
  items within the /var/log/messages (I know because I started stamping
  also some sort of message ID in order to see what is going on). The
   missing items are in the order of 25% of what should really be there.
  
  Someone has a good idea on where I can start verifying for my syslogd
  system? I have really 0 experience with syslogd and maybe I could be
  missing something obvious.
 
 There's a chance this PR about syslogd incorrectly calculating socket
 receive buffer sizes is related and the patch attached to it could fix
 it...
 
 http://www.freebsd.org/cgi/query-pr.cgi?pr=1604331
 
 I filed the PR long ago, if the patches have drifted out of date I'll be
 happy to re-work them.
 
 -- Ian
 

Oops, I glitched the PR number when I pasted it, this one should be
correct:

http://www.freebsd.org/cgi/query-pr.cgi?pr=160433

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: any status of that project?

2012-08-16 Thread Ian Lepore
On Thu, 2012-08-16 at 14:03 +0200, Wojciech Puchar wrote:
 http://freebsdfoundation.blogspot.com/2012/03/new-project-nand-flash-support.html
 
 this would be great thing to have it working properly. any progress info?

In the past few days I've tested the flash code in -current on a
GlobalScale DreamPlug (arm platform), and confirmed that the low-level
part of the code is working.  I can read the flash on the unit and
identify the existing partitions and data within them (but the main
partition is formatted as UBI fs, so I've only looked at it with hexdump
so far).  I haven't tried the nandfs layer yet, or writing to the flash.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: How to Expose Chip-level Ethernet Statistics?

2012-08-04 Thread Ian Lepore
On Sat, 2012-08-04 at 12:21 -0700, Tim Kientzle wrote:
 I believe that some of the issues I'm having with this
 Ethernet driver might be easier to diagnose if I could
 expose the chip-level statistics counters (especially queue
 overrun counts).
 
 Is there a standard way to do this?
 
 I've looked at systat, netstat, and ifconfig but haven't
 yet found a standard tool that queries this sort of
 information.  (If I could find that, I could figure out
 which ioctl it used…)
 
 Pointers appreciated…  In particular, if there's another
 Ethernet driver that does this well, I can use that for a
 reference.
 
 Tim

I don't know if this is exactly what you mean, but have a look at
src/tools/tools/ifinfo, and find some examples of drivers that fill in
that info by grepping for ifmib_iso_8802_3.

(I really know nothing about this stuff, except that your request
triggered a memory that the atmel if_ate driver gathers some stats that
I've not seen in most other drivers.)
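
For what it's worth, the basic pattern ifinfo uses is the ifmib(4) sysctl
tree; something along these lines (a from-memory sketch, so check the
ifmib(4) man page for the exact struct and MIB names):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/sysctl.h>
    #include <net/if.h>
    #include <net/if_mib.h>
    #include <err.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char **argv)
    {
            struct ifmibdata ifmd;
            size_t len = sizeof(ifmd);
            int name[6];

            name[0] = CTL_NET;
            name[1] = PF_LINK;
            name[2] = NETLINK_GENERIC;
            name[3] = IFMIB_IFDATA;
            name[4] = (argc > 1) ? atoi(argv[1]) : 1;   /* interface index */
            name[5] = IFDATA_GENERAL;

            if (sysctl(name, 6, &ifmd, &len, NULL, 0) == -1)
                    err(1, "sysctl");
            printf("%s: ipackets %lu ierrors %lu iqdrops %lu\n",
                ifmd.ifmd_name,
                (u_long)ifmd.ifmd_data.ifi_ipackets,
                (u_long)ifmd.ifmd_data.ifi_ierrors,
                (u_long)ifmd.ifmd_data.ifi_iqdrops);
            return (0);
    }

The driver fills in those counters through its if_data, and any 802.3-style
chip counters come back the same way with IFDATA_LINKSPECIFIC.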

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: newbus' ivar's limitation..

2012-07-30 Thread Ian Lepore
On Mon, 2012-07-30 at 17:06 -0400, John Baldwin wrote:
 On Tuesday, July 17, 2012 2:03:14 am Arnaud Lacombe wrote:
  Hi,
  
  On Fri, Jul 13, 2012 at 1:56 PM, Arnaud Lacombe lacom...@gmail.com wrote:
   Hi,
  
   On Thu, Jul 12, 2012 at 1:20 AM, Warner Losh i...@bsdimp.com wrote:
   [..]
   Honestly, though, I think you'll be more pissed when you find out that 
 the N:1 interface that you want is being done in the wrong domain.  But I've 
 been wrong before and look forward to seeing your replacement.
  
   I will just pass function pointers for now, if things should be done
   dirty, let's be explicit about it.
  
   Now, the hinted device attachment did work quite smoothly, however, I
   would have a few suggestion:
 1) add a call to bus_enumerate_hinted_children() before the
   DEVICE_IDENTIFY() call in bus_generic_driver_added()
  
   this is required to be able to support dynamic loading and attachment
   of hinted children.
 
 I'm not sure this is a feature we want to support (to date hinted children
 have only been created at boot time). 

It seems to me that the bus should be in control of calling
bus_enumerate_hinted_children() at whatever time works best for it.
Also, shouldn't it only ever be called once?

The comment block for BUS_HINTED_CHILD in bus_if.h says "This method is
only called in response to the parent bus asking for hinted devices to
be enumerated."  I think one of the implications of that is that any
given bus may not call bus_enumerate_hinted_children() because it may
not be able to do anything for hinted children.  Adding a
hint.somedev.0.at=somebus and then forcing the bus to enumerate hinted
children amounts to forcing the bus to adopt a child it may not be able
to provide resources for, which sounds like a panic or crash waiting to
happen (or at best, no crash but nothing useful happens either).
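
For illustration, a bus that does want to pick up hints typically does
something like this (a fragment with made-up names, not a complete driver):

    #include <sys/param.h>
    #include <sys/bus.h>
    #include <machine/resource.h>

    /* Called once per matching hint line, when the bus asks for it. */
    static void
    mybus_hinted_child(device_t bus, const char *dname, int dunit)
    {
            device_t child;
            long maddr;
            int irq;

            child = BUS_ADD_CHILD(bus, 0, dname, dunit);
            if (resource_long_value(dname, dunit, "maddr", &maddr) == 0)
                    bus_set_resource(child, SYS_RES_MEMORY, 0, maddr, 0x1000);
            if (resource_int_value(dname, dunit, "irq", &irq) == 0)
                    bus_set_resource(child, SYS_RES_IRQ, 0, irq, 1);
    }

and it is the bus's own attach routine that calls
bus_enumerate_hinted_children() when it knows it can honor the hints --
which is exactly the control I'm arguing the bus should keep.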

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: kqueue periodic timer confusion

2012-07-12 Thread Ian Lepore
On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
 On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
  On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
   Hi,
   
   Sorry about this repost but I'm confused about the responses I received
   in my last post so I'm looking for some clarification.
   
    Specifically, I thought I could use the kqueue timer as essentially a
   drop in replacement for linuxfd_create/read, but was surprised that
   the accuracy of the kqueue timer is much less than what I need for my
   application.
   
   So my confusion at this point is whether this is consider to be a bug or
   feature?
   
   Here's some test code if you want to verify the problem:
   
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <unistd.h>
   #include <errno.h>
   #include <sys/types.h>
   #include <sys/event.h>
   #include <sys/time.h>
   
   int
   main(void)
   {
           int i, msec;
           int kq, nev;
           struct kevent inqueue;
           struct kevent outqueue;
           struct timeval start, end;
   
           if ((kq = kqueue()) == -1) {
                   fprintf(stderr, "kqueue error!? errno = %s", strerror(errno));
                   exit(EXIT_FAILURE);
           }
           EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);
   
           gettimeofday(&start, 0);
           for (i = 0; i < 50; i++) {
                   if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == -1) {
                           fprintf(stderr, "kevent error!? errno = %s", strerror(errno));
                           exit(EXIT_FAILURE);
                   } else if (outqueue.flags & EV_ERROR) {
                           fprintf(stderr, "EV_ERROR: %s\n", strerror(outqueue.data));
                           exit(EXIT_FAILURE);
                   }
           }
           gettimeofday(&end, 0);
   
           msec = ((end.tv_sec - start.tv_sec) * 1000) +
               (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);
   
           printf("msec = %d\n", msec);
   
           close(kq);
           return EXIT_SUCCESS;
   }
   
   
  
  What you are seeing is just the way FreeBSD currently works.  
  
  Sleeping (in most all of its various forms, and I've just looked at the
  kevent code to verify this is true there) is handled by converting the
  amount of time to sleep (usually specified in a timeval or timespec
  struct) to a count of timer ticks, using an internal routine called
  tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
  account for the current tick.  Whether that's a good idea or not (it
  probably was once, and probably not anymore) it's how things currently
  work, and could explain the fairly consistant +1ms you're seeing.
 
 This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
 installs a periodic callout that executes KNOTE() and then resets itself (via 
 callout_reset()) each time it runs.  This should generally be closer to
 regulary spaced intervals than something that does:
 

In what way is it irrelevant?  That is, what did I miss?  It appears to
me that the next callout is scheduled by calling timertoticks() passing
a count of milliseconds, that count is converted to a struct timeval and
passed to tvtohz() which is where the +1 adjustment happens.  If you ask
for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
There is some time, likely a small number of microseconds, that you've
consumed of the current tick, and that's what the +1 in tvtohz() is
supposed to account for according to the comments.  

The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick
and then adds one tick on top of that.  That seems not quite right to
me, except that it is a way to g'tee that you don't return early, and
that is the one promise made by sleep routines on any OS; those magical
"at least" words always appear in the docs.

Actually what I'm missing (that I know of) is how the scheduler works.
Maybe the +1 adjustment to account for the fraction of the current tick
you've already consumed is the right thing to do, even when that
fraction is 1uS or less of a 1mS tick.  That would depend on scheduler
behavior that I know nothing about.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: kqueue periodic timer confusion

2012-07-12 Thread Ian Lepore
On Thu, 2012-07-12 at 17:08 +0200, Davide Italiano wrote:
 On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin j...@freebsd.org wrote:
  On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
  On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
   On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
 Hi,

 Sorry about this repost but I'm confused about the responses I 
 received
 in my last post so I'm looking for some clarification.

  Specifically, I thought I could use the kqueue timer as essentially a
 drop in replacement for linuxfd_create/read, but was surprised that
 the accuracy of the kqueue timer is much less than what I need for my
 application.

 So my confusion at this point is whether this is consider to be a 
 bug or
 feature?

 Here's some test code if you want to verify the problem:

 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 #include <errno.h>
 #include <sys/types.h>
 #include <sys/event.h>
 #include <sys/time.h>

 int
 main(void)
 {
         int i, msec;
         int kq, nev;
         struct kevent inqueue;
         struct kevent outqueue;
         struct timeval start, end;

         if ((kq = kqueue()) == -1) {
                 fprintf(stderr, "kqueue error!? errno = %s", strerror(errno));
                 exit(EXIT_FAILURE);
         }
         EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);

         gettimeofday(&start, 0);
         for (i = 0; i < 50; i++) {
                 if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == -1) {
                         fprintf(stderr, "kevent error!? errno = %s", strerror(errno));
                         exit(EXIT_FAILURE);
                 } else if (outqueue.flags & EV_ERROR) {
                         fprintf(stderr, "EV_ERROR: %s\n", strerror(outqueue.data));
                         exit(EXIT_FAILURE);
                 }
         }
         gettimeofday(&end, 0);

         msec = ((end.tv_sec - start.tv_sec) * 1000) +
             (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);

         printf("msec = %d\n", msec);

         close(kq);
         return EXIT_SUCCESS;
 }


   
What you are seeing is just the way FreeBSD currently works.
   
Sleeping (in most all of its various forms, and I've just looked at the
kevent code to verify this is true there) is handled by converting the
amount of time to sleep (usually specified in a timeval or timespec
struct) to a count of timer ticks, using an internal routine called
tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
account for the current tick.  Whether that's a good idea or not (it
probably was once, and probably not anymore) it's how things currently
work, and could explain the fairly consistant +1ms you're seeing.
  
   This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
   installs a periodic callout that executes KNOTE() and then resets itself 
   (via
   callout_reset()) each time it runs.  This should generally be closer to
   regulary spaced intervals than something that does:
  
 
  In what way is it irrelevant?  That is, what did I miss?  It appears to
  me that the next callout is scheduled by calling timertoticks() passing
  a count of milliseconds, that count is converted to a struct timeval and
  passed to tvtohz() which is where the +1 adjustment happens.  If you ask
  for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
  There is some time, likely a small number of microseconds, that you've
  consumed of the current tick, and that's what the +1 in tvtohz() is
  supposed to account for according to the comments.
 
  The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick
  and then adds one tick on top of that.  That seems not quite right to
  me, except that it is a way to g'tee that you don't return early, and
  that is the one promise made by sleep routines on any OS; those magical
  "at least" words always appear in the docs.
 
  Actually what I'm missing (that I know of) is how the scheduler works.
  Maybe the +1 adjustment to account for the fraction of the current tick
  you've already consumed is the right thing to do, even when that
  fraction is 1uS or less of a 1mS tick.  That would depend on scheduler
  behavior that I know nothing about.
 
  Oh.  My bad, sorry.  You are correct.  It is a bug to use +1 in this
  case.  That is, the +1 makes sense when you are computing a one-time delta
  for things like nanosleep().  It is incorrect when computing a periodic
  delta such as for computing the interval for an itimer (setitimer) or
  EVFILT_TIMER().
 
  Hah, setitimer()'s callout (realitexpire) uses

Re: /proc filesystem

2012-07-12 Thread Ian Lepore
On Tue, 2012-06-19 at 06:47 +0200, Wojciech Puchar wrote:
 that is what i need.
 
 but still need some explanation after using it and reading manual
 
 say:
    PID       START         END PRT  RES PRES REF SHD  FL TP PATH
   1378    0x400000    0x5ac000 r-x  385  415   2   1 CN- vn /usr/local/bin/Xorg
   1378    0x7ab000    0x7bc000 rw-   17    0   1   0 C-- vn /usr/local/bin/Xorg
   1378    0x7bc000    0x800000 rw-   14    0   1   0 C-- df
   1378 0x8007ab000 0x8007c3000 r-x   24    0  32   0 CN- vn /libexec/ld-elf.so.1
   1378 0x8007c3000 0x8007f0000 rw-   43    0   1   0 C-- df
   1378 0x8007f0000 0x8007f2000 rw-    1    0   4   0 --- dv
   1378 0x8007f2000 0x8007f4000 rw-    2    0   4   0 --- dv
   1378 0x8007f4000 0x800874000 rw-   11    0   4   0 --- dv
   1378 0x800874000 0x800884000 rw-   16    0   4   0 --- dv
   1378 0x800884000 0x800895000 rw-   10    0   1   0 CN- df
   1378 0x8009c2000 0x8009c5000 rw-    3    0   1   0 C-- df
 
 
 1) Xorg is mapped twice - IMHO first is text/rodata second is data. But 
 what REF really means here and why it is 2 once and 1 second.
 
 2) what really PRES (private resident) means? df (default) mappings are 
 IMHO anonymous maps==private data of process. so why RES is nonzero while 
 PRES is zero, while on shared code PRES is nonzero and large. what does it 
 really means?
 
 thanks.
 

I'm catching up on threads I was following before I went on vacation,
and it looks like there was never a response to this.  I'm interested in
the answers to these questions too, so today I did some spelunking in
the code to see what I could figure out.  I don't think I really
understand things too well, but I'll just say what I think I found and
hopefully the experts will correct anything I get wrong.

I think you're right about the first two mappings in that procstat
output.  The REF value is the reference count on the vm object (the
vnode for the exe file, I presume).  I think the reason the reference
count is 2 is that one reference is the open file itself, and the other
is the shadow object.  I've always been a bit confused about the concept
of shadow objects in freebsd's vm, but I think it's somehow related to
the running processes that are based on that executable vnode.  For
example, if another copy of Xorg were running, I think REF would be 3,
and SHD would be 2.

I don't know why there is no shadow object for the writable data mapping
and why the refcount is only 1 for that.

The PRES thing seemed simple when I first looked at the code, but the
more I think about it in relation to other numbers the more confused I
get.  The logic in the code is if the shadow count is 1 then PRES is
the resident size of the shadow object.  This seems to be a measure of
shared-code usage... any object which could be shared but isn't gets
counted as private resident.

The part that confuses me is how PRES can be larger than RES.  The value
for PRES is taken from the resident_page_count field of the shadow
object.  The RES value is calculated by walking each page of the map
entry and calling pmap_mincore() to see if it's resident.  So the number
of resident pages is calculated to be fewer than the resident_page_count
of the object the entry maps.  I don't understand.

Oh hmmm, wait a sec... could it be that read-ahead or relocation fixup
or various other things caused lots of pages to be faulted in for the
vnode object (so they're resident) but not all of those pages are mapped
into the process because the path of execution has never referenced them
and caused faults to map them into the process' vmspace?

-- Ian

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: kqueue periodic timer confusion

2012-07-11 Thread Ian Lepore
On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
 Hi,
 
 Sorry about this repost but I'm confused about the responses I received
 in my last post so I'm looking for some clarification.
 
  Specifically, I thought I could use the kqueue timer as essentially a
 drop in replacement for linuxfd_create/read, but was surprised that
 the accuracy of the kqueue timer is much less than what I need for my
 application.
 
 So my confusion at this point is whether this is consider to be a bug or
 feature?
 
 Here's some test code if you want to verify the problem:
 
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 #include <errno.h>
 #include <sys/types.h>
 #include <sys/event.h>
 #include <sys/time.h>
 
 int
 main(void)
 {
         int i, msec;
         int kq, nev;
         struct kevent inqueue;
         struct kevent outqueue;
         struct timeval start, end;
 
         if ((kq = kqueue()) == -1) {
                 fprintf(stderr, "kqueue error!? errno = %s", strerror(errno));
                 exit(EXIT_FAILURE);
         }
         EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);
 
         gettimeofday(&start, 0);
         for (i = 0; i < 50; i++) {
                 if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == -1) {
                         fprintf(stderr, "kevent error!? errno = %s", strerror(errno));
                         exit(EXIT_FAILURE);
                 } else if (outqueue.flags & EV_ERROR) {
                         fprintf(stderr, "EV_ERROR: %s\n", strerror(outqueue.data));
                         exit(EXIT_FAILURE);
                 }
         }
         gettimeofday(&end, 0);
 
         msec = ((end.tv_sec - start.tv_sec) * 1000) +
             (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);
 
         printf("msec = %d\n", msec);
 
         close(kq);
         return EXIT_SUCCESS;
 }
 
 

What you are seeing is just the way FreeBSD currently works.  

Sleeping (in most all of its various forms, and I've just looked at the
kevent code to verify this is true there) is handled by converting the
amount of time to sleep (usually specified in a timeval or timespec
struct) to a count of timer ticks, using an internal routine called
tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
account for the current tick.  Whether that's a good idea or not (it
probably was once, and probably not anymore) it's how things currently
work, and could explain the fairly consistant +1ms you're seeing.

Another source of oversleeping is that the length of a tick in
microseconds is simplistically calculated as 1000000 / hz on most
hardware, so for HZ=1000, tick=1000.  Unless the clock producing the
tick interrupts is running at a frequency exactly divisible by 1000,
that tick-length calculation has some rounding error in it, and it
results in systematic oversleeping.  On modern hardware with
high-frequency clocks it's typically less than 1%.  

The routines for sleeping in the kernel take a count of ticks for how
long to sleep, so when tvtohz() converts some number of microseconds to
the corresponding number of ticks, any rounding error in the value for
the length of a tick results in oversleeping by some small percentage of
the time you wanted to sleep.  Note that this rounding error in
calculating the length of a tick does not result in a systematic skew in
system timekeeping, because when each tick interrupt happens, the system
reads a clock counter register that may or may not be related to the
clock producing tick interrupts; the value in the register is full
precision without the rounding error you get when counting ticks.
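
To make that concrete, here's a toy model of the arithmetic described above
(not the kernel's actual tvtohz() code, just the behavior as I described it):

    #include <stdio.h>

    int
    main(void)
    {
            int hz = 1000;
            int tick = 1000000 / hz;        /* length of a tick in microseconds */
            int interval = 20 * 1000;       /* a requested 20 ms, in microseconds */
            int ticks;

            ticks = (interval + tick - 1) / tick;   /* round up to whole ticks: 20 */
            ticks += 1;                             /* "+1 for the current tick": 21 */
            printf("%d ms requested -> %d ticks -> %d ms between wakeups\n",
                interval / 1000, ticks, ticks * tick / 1000);
            return (0);
    }

which comes out to 21 ms for a 20 ms request -- the same +1ms the test
program shows.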

It might be an interesting experiment to add kern.hz=10000 to
your /boot/loader.conf and see how that affects your test.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Interfacing devices with multiple parents within newbus

2012-07-07 Thread Ian Lepore
On Fri, 2012-07-06 at 16:45 -0400, Arnaud Lacombe wrote:
 Hi,
 
 On Fri, Jul 6, 2012 at 3:09 PM, Ian Lepore
 free...@damnhippie.dyndns.org wrote:
  On Fri, 2012-07-06 at 14:46 -0400, Arnaud Lacombe wrote:
  Hi,
 
  On Fri, Jul 6, 2012 at 11:33 AM, Arnaud Lacombe lacom...@gmail.com wrote:
   That's neither correct nor robust in a couple of way:
1) you have no guarantee a device unit will always give you the same 
   resource.
  this raises the following question: how can a device, today, figure
  out which parent in a given devclass would give it access to resources
  it needs.
 
  Say, you have gpiobus0 provided by a superio and gpiobus1 provided by
  the chipset and a LED on the chipset's GPIO. Now, say gpiobus0
  attachment is conditional to some BIOS setting. How can you tell
  gpioled(4) to attach on the chipset provided GPIO without hardcoding
  unit number either way ?
 
  AFAIK, you can not.
 
  Even hints provided layout description is defeated. Each device in a
  given devclass need to have a set of unique attribute to allow a child
  to distinguish it from other potential parent in the same devclass...
 
   - Arnaud
 
  Talking about a child being unable to choose the correct parent seems to
  indicate that this whole problem is turned upside-down somehow; children
  don't choose their parents.
 
 actually, I think I was wrong, I thought device were attached to a
 devclass, but they are truly attached to a given device. My mistake.
 
  Just blue-sky dreaming here on the fly... what we really have is a
  resource-management problem.  A device comes along that needs a GPIO
  resource, how does it find and use that resource?
 
  Well, we have a resource manager, could that help somehow?  Could a
  driver that provides access to GPIO somehow register its availability so
  that another driver can find and access it?  The resource may be a
  callable interface, it doesn't really matter, I'm just wondering if the
  current rman stuff could be leveraged to help make the connection
  between unrelated devices.   I think that implies that there would have
   to be something near the root of the hierarchy willing to be the
  owner/manager of dynamic resources.
 
 AFAIR, rman is mostly there to manage memory vs. i/o mapped resources.
 The more I think about it, the more FTD is the answer. The open
 question now being how to map a flexible device structure (FTD) to a
 less flexible structure (Newbus) :/
 
  - Arnaud

Memory- and IO-mapped regions and IRQs are the only current uses of rman
(that I know of), but it was designed to be fairly agnostic about the
resources it manages.  It just works with ranges of values (that it
really doesn't know how to interpret at all), leaving lots of room to
define new types of things it can manage.
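
Just to illustrate the "ranges of values" point, the generic rman(9) setup
pattern looks roughly like this (a sketch; the "GPIO pins" type is made up,
and exact prototypes vary a little between FreeBSD versions):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/kernel.h>
    #include <sys/bus.h>
    #include <sys/rman.h>

    static struct rman gpio_rman;

    static int
    gpio_rman_setup(void)
    {
            gpio_rman.rm_type = RMAN_ARRAY;
            gpio_rman.rm_descr = "GPIO pins";
            if (rman_init(&gpio_rman) != 0)
                    return (ENXIO);
            /* rman doesn't care what 0..31 means; here they'd be pin numbers. */
            return (rman_manage_region(&gpio_rman, 0, 31));
    }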

The downside is that it's designed to be used hierarchically in the
context of newbus, specifically to help parents manage the resources
that they are able to provide to their children.  Trying to use it in a
way that allows devices which are hierarchically unrelated to allocate
resources from each other may amount to a square-peg/round-hole
situation.  But the alternative is writing a new facility to allow
registration and allocation of resources using some sort symbolic method
of representing the resources such that the new manager doesn't have to
know much about what it's managing.  I think it would be better to find
a way to reuse what we've already got if that's possible.

I think we have two semi-related aspects to this problem... 

How do we symbolically represent the resources that drivers can provide
to each other?   (FDT may be the answer; I don't know much about it.)

How do devices use that symbolic representation to locate the provider
of the resource, and how is the sharing of those resources managed?

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Interfacing devices with multiple parents within newbus

2012-07-06 Thread Ian Lepore
On Fri, 2012-07-06 at 14:46 -0400, Arnaud Lacombe wrote:
 Hi,
 
 On Fri, Jul 6, 2012 at 11:33 AM, Arnaud Lacombe lacom...@gmail.com wrote:
  That's neither correct nor robust in a couple of way:
   1) you have no guarantee a device unit will always give you the same 
  resource.
 this raises the following question: how can a device, today, figure
 out which parent in a given devclass would give it access to resources
 it needs.
 
 Say, you have gpiobus0 provided by a superio and gpiobus1 provided by
 the chipset and a LED on the chipset's GPIO. Now, say gpiobus0
 attachment is conditional to some BIOS setting. How can you tell
 gpioled(4) to attach on the chipset provided GPIO without hardcoding
 unit number either way ?
 
 AFAIK, you can not.
 
 Even hints provided layout description is defeated. Each device in a
 given devclass need to have a set of unique attribute to allow a child
 to distinguish it from other potential parent in the same devclass...
 
  - Arnaud

Talking about a child being unable to choose the correct parent seems to
indicate that this whole problem is turned upside-down somehow; children
don't choose their parents.

Just blue-sky dreaming here on the fly... what we really have is a
resource-management problem.  A device comes along that needs a GPIO
resource, how does it find and use that resource?  

Well, we have a resource manager, could that help somehow?  Could a
driver that provides access to GPIO somehow register its availability so
that another driver can find and access it?  The resource may be a
callable interface, it doesn't really matter, I'm just wondering if the
current rman stuff could be leveraged to help make the connection
between unrelated devices.   I think that implies that there would have
to be something near the root of the hierarchy willing to be the
owner/manager of dynamic resources.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Pull in upstream before 9.1 code freeze?

2012-07-04 Thread Ian Lepore
On Wed, 2012-07-04 at 15:08 -0700, Doug Barton wrote:
 On 07/04/2012 15:01, Mike Meyer wrote:
  On Wed, 04 Jul 2012 14:19:38 -0700
  Doug Barton do...@freebsd.org wrote:
  On 07/04/2012 11:51, Jason Hellenthal wrote:
  What would be really nice here is a command wrapper hooked into the
  shell so that when you type a command and it does not exist it presents
  you with a question for suggestions to install somewhat like Fedora has
  done.
  I would also like to see this feature, which is pretty much universal in
  linux at this point. It's very handy.
  
  I, on the other hand, count it as one of the many features of Linux
  that make me use FreeBSD.
 
 First, I agree that being able to turn it off should be possible. But I
 can't help being curious ... why would you *not* want a feature that
 tells you what to install if you type a command that doesn't exist on
 the system?
 
 Doug
 

The only response I can think of is... If you can even ask that
question, then there's no answer I could give that would make any sense
to you.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: /etc/resolv.conf getting over written with dhcp

2012-06-27 Thread Ian Lepore
On Wed, 2012-06-20 at 13:39 +0530, Varuna wrote:
 Ian Lepore wrote:
  
  Using the 'prepend' or 'supersede' keywords in /etc/dhclient.conf is
  pretty much the standard way of handling a mix of static and dhcp
  interfaces where the static config needs to take precedence.  I'm not
  sure why you dismiss it as essentially good, but somehow not good
  enough.  It's been working for me for years.
  
  -- Ian
  
 The issue that I had indicated that the issue with the /etc/resolv.conf is 
 being 
 caused by an error in /sbin/dhclient-script; hence, I am definitely not 
 looking 
 at solving the issue either with /etc/dhclient.conf or 
 /etc/dhclient-exit-hooks 
 configuration file.
 
 BTW, resolver(5) / resolv.conf(5) does not mention the usage of 
 /etc/dhclient-exit-hooks file to protect the earlier contents of 
 /etc/resolv.conf file.  Will put this issue in the freebsd-doc mailing list.
 
 With regards,
 Varuna
 Eudaemonic Systems
 Simple, Specific  Insightful

I have re-read your original message and I think the confusion is here:


 2***# When resolv.conf is not changed actually, we don't
  # need to update it.
  # If /usr is not mounted yet, we cannot use cmp, then
  # the following test fails.  In such case, we simply
  # ignore an error and do update resolv.conf.
 3***if cmp -s $tmpres /etc/resolv.conf; then
  rm -f $tmpres
  return 0
 fi 2>/dev/null
 [...]
 I guess, the 1***, 3*** and 4*** is causing the recreation of 
 /etc/resolv.conf. 
   Is this correct? I did a small modification to 3*** which is:
  if !(cmp -s $tmpres /etc/resolv.conf); then
  rm -f $tmpres
  return 0
  fi 2>/dev/null
 This seems to have solved the issue of /etc/resolv.conf getting overwritten 
 with 
 just: nameserver 192.168.98.4.  This ensures that: If there is a difference 
 between $tmpres and /etc/resolv.conf, then it exits post removal of $tmpres.  
 If 
 the execution of 3*** returns a 0, a new file gets created.  I guess the 
 modification get the intent of 3*** working.
 
 Have I barked up the wrong tree?

I think yes, you have barked up the wrong tree.  The intent of the code
at 3*** is not to exit if there is a difference, it is to exit if there
is NO difference.  In other words, if the old and new files are
identical then there is no need to re-write the file, just cleanup and
exit.  If the files are different then replace the existing file with
the new one.

This is just the (sometimes annoying) way dhcp works.  If the dhcp
server provides new resolver info it completely replaces any existing
resolver info unless you've configured your dhclient.conf to prevent it.
It only does so if the interface being configured is the current
default-route interface, or there is no current default-route interface.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: /etc/resolv.conf getting over written with dhcp

2012-06-15 Thread Ian Lepore
On Fri, 2012-06-15 at 23:02 +0530, Varuna wrote:
 Thanks for the pointers.
 
 Dima Panov wrote:
   From my /etc/dhclient.conf:
  
  interface lagg0 {
  send dhcp-lease-time 3600;
  prepend domain-name-servers 127.0.0.1, 4.4.4.4, 8.8.8.8;
  request subnet-mask, broadcast-address, time-offset, routers,
  domain-name, domain-name-servers;
  require subnet-mask, domain-name-servers;
  }
  
  And result is /etc/resolv.conf:
  # Generated by resolvconf
  nameserver 127.0.0.1
  nameserver 4.4.4.4
  nameserver 8.8.8.8
  nameserver 192.168.1.1
 True indeed this will work and I did have a look at dhclient.conf(5) to setup 
 the freebsd8:/etc/dhclient.conf.  This will still call /sbin/dhclient-script 
 which will overwrite the configuration done to the /etc/resolv.conf each time 
 the system power is recycled.  As per /usr/src/include/resolv.h, the MAXNS is 
 by 
 default set to 3; which the default configuration user will not be aware of 
 as 
 the entire focus will be on the ifconfig related flags in /etc/rc.conf.  BTW, 
 the example indicated in dhclient.conf(5) has a typo which says 
 /etc/dhclient-script instead of /sbin/dhclient-script, indeed the system does 
 not fail if the typo exists in dhclient.conf.
 
 
 Eugene Grosbein wrote:
  There is simple solution: create file /etc/dhclient-enter-hooks
  and override add_new_resolv_conf() there to do nothing:
 
  add_new_resolv_conf() {
return 0
  }
 
  Works just fine for my systems.
 Indeed this is a good suggestion, and this is if the user is aware of what to 
 look for and where in /sbin/dhclient-script it is documented.
 
 A general sysadmin would be aware of /etc/nsswitch.conf and /etc/resolv.conf 
 for 
 name resolution issues and I do not think they will be aware of so many 
 possible 
 ways to handle the issue of resolv.conf getting overwritten by the usage of 
 dhcp.
 
 What would be the way out? Do you think it would be a good idea to push the 
 nameserver configuration information into /etc/rc.conf which happens to be 
 the 
 single file that would handle the system configuration?
 
 With regards,
 Varuna
 Eudaemonic Systems
 Simple, Specific  Insightful
 
 IT Consultants, Continued Education  Systems Distribution
 +91-88-92-47-62-63
 http://www.eudaemonicsystems.net
 http://enquiry.eudaemonicsystems.net
 
 --
 This email is confidential, and may be legally privileged.  If you
 are not the intended recipient, you must not use or disseminate
 this information in any format.  If you have received this email in
 error, please do delete it along with copies of it existing in any
 other format, and notify the sender immediately.  The sender of this
 email believes it is virus free, and does not accept any liability
 for any errors or omissions arising thereof.
 

Using the 'prepend' or 'supersede' keywords in /etc/dhclient.conf is
pretty much the standard way of handling a mix of static and dhcp
interfaces where the static config needs to take precedence.  I'm not
sure why you dismiss it as essentially good, but somehow not good
enough.  It's been working for me for years.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: wired memory - again!

2012-06-13 Thread Ian Lepore
On Tue, 2012-06-12 at 23:45 +0300, Konstantin Belousov wrote:
 On Tue, Jun 12, 2012 at 08:51:34AM -0600, Ian Lepore wrote:
  On Sat, 2012-06-09 at 22:45 +0200, Wojciech Puchar wrote:
   
First, all memory allocated by UMA and consequently malloc(9) is
wired. In other words, almost all memory used by kernel is accounted
as wired.
   
   yes i understand this. still i found no way how to find out what 
   allocated 
   that much.
   
   
Second, the buffer cache wires the pages which are inserted into VMIO
buffers. So your observation is basically right, cached buffers means
   
   what are exactly VMIO buffers. i understand that page must be wired 
   WHEN 
   doing I/O.
   But i have too much wired memory even when doing no I/O at all.
  
  I agree, this is The Big Question for me.  Why does the system keep
  wired writable mappings of the buffers in kva after the IO operations
  are completed?  
 Read about buffer cache, e.g. in the Design and Implementation of
 the FreeBSD OS book.
 
  
  If it did not do so, it would fix the instruction-cache-disabled bug
  that kills performance on VIVT cache architectures (arm and mips) and it
  would reduce the amount of wired memory (that apparently doesn't need to
  be wired, unless I've missed the implications of a previous reply in
  this thread).
 
 I have no idea what is the bug you are talking about. If my guess is
 right, and it specifically references unability of some processors
 to correctly handle several mappings of the same physical page into
 different virtual addresses due to cache tagging using virtual address
 instead of physical, then this is a hardware bug, not software.
 

This bug:

http://lists.freebsd.org/pipermail/freebsd-arm/2012-January/003288.html

The bug isn't the VIVT cache hardware, it's the fact that the way we
handle the requirements of the hardware has the side effect of leaving
the instruction cache bit disabled on executable pages because the
kernel keeps writable mappings of the pages even after the IO is done.

 AFAIR, at least HP PA and MIPS have different instantiation of this problem.
 Our kernel uses multi-mapping quite often, and buffers is only one example.
 
 Also, why do you think that the pages entered into buffers shall not be
 wired, it is completely beyond my understanding.

What's beyond my understanding is why a page has to remain wired after
the IO is complete.  That question seems to me to be tangentially
related to the above question of why the kernel needs to keep a writable
mapping of the buffer after it's done writing into the page (either via
DMA or via uiomove() depending on the direction of the IO).

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: FreeBSD Boot Times

2012-06-13 Thread Ian Lepore
On Wed, 2012-06-13 at 09:10 +0200, Wojciech Puchar wrote:
  Greetings,
 
  I was just wondering what it is that FreeBSD does that makes it take so 
  long 
  to boot. Booting into Ubuntu minimal or my own custom Linux distro, 
  literally 
  takes 0.5-2 seconds to boot up to shell, where FreeBSD takes about 10-20 
  seconds. I'm not sure if anything could be parallelized in the boot process,
 
 mostly kernel time. 
  Note: This isn't really an issue, moreso a curiosity.
 
 true. system that never crash are not often booted

An embedded system may be booted or powered cycled dozens of times a
day, and boot time can be VERY important.  Don't assume that the way you
use FreeBSD is the only way.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: wired memory - again!

2012-06-12 Thread Ian Lepore
On Sat, 2012-06-09 at 22:45 +0200, Wojciech Puchar wrote:
 
  First, all memory allocated by UMA and consequently malloc(9) is
  wired. In other words, almost all memory used by kernel is accounted
  as wired.
 
 yes i understand this. still i found no way how to find out what allocated 
 that much.
 
 
  Second, the buffer cache wires the pages which are inserted into VMIO
  buffers. So your observation is basically right, cached buffers means
 
 what are exactly VMIO buffers. i understand that page must be wired WHEN 
 doing I/O.
 But i have too much wired memory even when doing no I/O at all.

I agree, this is The Big Question for me.  Why does the system keep
wired writable mappings of the buffers in kva after the IO operations
are completed?  

If it did not do so, it would fix the instruction-cache-disabled bug
that kills performance on VIVT cache architectures (arm and mips) and it
would reduce the amount of wired memory (that apparently doesn't need to
be wired, unless I've missed the implications of a previous reply in
this thread).

-- Ian

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: wired memory - again!

2012-06-09 Thread Ian Lepore
On Sat, 2012-06-09 at 09:21 +0200, Wojciech Puchar wrote:
 top reports wired memory 128MB
 
 
 WHERE it is used? below results of vmstat -m and vmstat -z
 values does not sum up even to half of it
 FreeBSD 9 - few days old.
 
 What i am missing and why there are SO MUCH wired memory on 1GB machine 
 without X11 or virtualbox
 
  [vmstat output snipped]
 


I have been struggling to answer the same question for about a week on
our embedded systems (running 8.2).  We have systems with 64MB ram which
have 20MB wired, and I couldn't find any way to directly view what that
wired memory is being used for.  I also discovered that the vmstat
output accounted for only a tiny fraction of the 20MB.

What I eventually determined is that there is some sort of correlation
between vfs buffer space and wired memory.  Our embedded systems
typically do very little disk IO, but during some testing we were
spewing debug output to /var/log/messages at the rate of several lines
per second for hours.  Under these conditions the amount of wired memory
would climb from its usual of about 8MB to around 20MB, and once it
climbed that high it pretty much never went down, or only went down a
couple MB.  The resulting memory pressure caused our apps to get killed
over and over again with "out of swap space" (we have no swap on these
systems).

The kernel auto-tunes the vfs buffer space using the formula "for the
first 64 MB of ram use 1/4 for buffers, plus 1/10 of the ram over 64
MB".  Using 16 of 64 MB of ram for buffer space seems insane to me, but
maybe it makes sense on certain types of servers or something.  I added
"option NBUF=128" to our kernel config and that dropped the buffer space
to under 2 MB and since doing that I haven't seen the amount of wired
memory ever go above 8 MB.  I wonder whether my tuning of NBUF is
affecting wired memory usage by indirectly tuning the 'nswbuf' value; I
can't tune nswbuf directly because the embedded system is ARM-based and
we have no loader(8) for setting tunables.
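
As a back-of-the-envelope check on that formula (a toy model, not the actual
kernel tuning code):

    #include <stdio.h>

    static long
    approx_bufspace_mb(long ram_mb)
    {
            /* "1/4 of the first 64 MB, plus 1/10 of the ram over 64 MB" */
            if (ram_mb <= 64)
                    return (ram_mb / 4);
            return (64 / 4 + (ram_mb - 64) / 10);
    }

    int
    main(void)
    {
            printf("64 MB ram -> ~%ld MB buffers\n", approx_bufspace_mb(64));
            printf("1 GB ram  -> ~%ld MB buffers\n", approx_bufspace_mb(1024));
            return (0);
    }

which gives the 16 MB figure for our 64 MB boxes, and roughly 112 MB on a
1 GB machine.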

I'm not sure NBUF=128 is a good setting even for a system that doesn't
do much IO, so I consider it experimental and we're testing under a
variety of conditions to see if it leads to any unexpected behaviors.
I'm certainly not suggesting anyone else rush to add this option to
their kernel config.

I am VERY curious about the nature of this correlation between vfs
buffer space and wired memory.  For the VM gurus:  Is the behavior I'm
seeing expected?   Why would memory become wired and seemingly never get
released back to one of the page queues after the IO is done?

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Need to revert behavior of OpenSSH to the old key order ...

2012-05-22 Thread Ian Lepore
On Mon, 2012-05-21 at 14:26 -0700, Jason Usher wrote:
 
 --- On Mon, 5/21/12, Garance A Drosehn g...@freebsd.org wrote:
  
 But have you tried it in this order ?
  
 HostKey /usr/local/etc/ssh/ssh_host_key
 HostKey
  /usr/local/etc/ssh/ssh_host_dsa_key
 HostKey
  /usr/local/etc/ssh/ssh_host_rsa_key
 HostKey
  /usr/local/etc/ssh/ssh_host_ecdsa_key
  
  Which is to say, have your sshd_config file list multiple
  hostkey's, and then restart sshd after making that change?
  I tried a similar change and it seemed to have some effect
  on what clients saw when connecting, but I can't tell if
  it has the effect that you want.
 
 
 The order of HostKey directives in sshd_config does not change the actual 
 order.  In newer implementations, RSA is provided first, no matter how you 
 configure the sshd_config.
 
 As I mentioned before, removing RSA completely is sort of a fix, but I can't 
 do that because some people might actually be explicitly using RSA, and they 
 would all break.
 
 Anyone ?

After poking through the sshd code a bit, it looks to me like this is
working as designed and it's the clients that are broken.  For host key
algorithm, and other things where both the server and the client side
have a list of possibilities and have to agree on a match from those
lists, the client side is completely in control of precedence, by
design.

The server has a list of things it can support, A,B,C,D.  The client
sends a list of things it desires, in order of preference, D,A,C.  The
server chooses a match as follows:

for each client list item
for each server list item
if current-client-item matches current-server-item
return current-client-item as the match
end if
end for
end for
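
A toy C version of that selection rule, just to make the point explicit
(this is an illustration, not OpenSSH's actual match_list()):

    #include <stdio.h>
    #include <string.h>

    static const char *
    choose(const char *client[], int nc, const char *server[], int ns)
    {
            int i, j;

            /* First client-side preference that the server also has wins. */
            for (i = 0; i < nc; i++)
                    for (j = 0; j < ns; j++)
                            if (strcmp(client[i], server[j]) == 0)
                                    return (client[i]);
            return (NULL);
    }

    int
    main(void)
    {
            const char *client[] = { "ssh-rsa", "ssh-dss" };
            const char *server[] = { "ssh-dss", "ssh-rsa" };

            /* Prints ssh-rsa: the server's own ordering never matters. */
            printf("%s\n", choose(client, 2, server, 2));
            return (0);
    }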

In your case it appears that the client sends "rsa,dsa" as the host key
algorithm list.  The server has "dsa,rsa,maybe,other,stuff" and since
rsa is the client's first choice and exists in the server list, it gets
used.  Then the client rejects the rsa key because it was really only
ever going to be happy with a dsa key.  IMO, this is a client-side bug;
if it's only going to accept dsa (because that's the only thing in the
known_hosts file) then it should only ask for that.

So I think you have two choices...

 1) Only offer a dsa key.  It appears the right way to do this would be
to have just one HostKey statement in the sshd config file that names
your dsa key file.  The presence of at least one HostKey statement will
prevent the code from adding the default keyfile names internally, so
you should end up with only a dsa key being offered.

 2) Try the attached patch to violate the design and force the server's
configuration order to override the precedence implied by the client's
request list.  Put HostKey statements the sshd_config file in the order
you want enforced.

I don't think #2 is a good option, but I know how it is in a production
world... sometimes you've got to do things that you know are bad to keep
the show running.  Hopefully when you do such things it's just to buy
some time to deploy a better fix (but it doesn't always work out that
way; I still maintain horrible temporary hacks like this from years
and years ago).

Maybe option 1 would work okay for you in light of this info:  When I
look in the openssh source from freebsd 6.4, it appears that while an
rsa hostkey was supported, it would not be added to the server config by
default; it would only be used if you specifically configured it with a
HostKey statement in sshd_config.  So maybe you can safely assume that
nobody was ever connecting to your freebsd 6.x machines using an rsa
hostkey.

Now for The Big Caveat:  All of the above is based on code inspection.
I haven't tested anything, including the attached patch.

-- Ian

Index: crypto/openssh/kex.c
===
--- crypto/openssh/kex.c	(revision 235554)
+++ crypto/openssh/kex.c	(working copy)
@@ -371,7 +371,7 @@
 static void
 choose_hostkeyalg(Kex *k, char *client, char *server)
 {
-	char *hostkeyalg = match_list(client, server, NULL);
+	char *hostkeyalg = match_list(server, client, NULL);
 	if (hostkeyalg == NULL)
 		fatal("no hostkey alg");
 	k->hostkey_type = key_type_from_name(hostkeyalg);
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: Need to revert behavior of OpenSSH to the old key order ...

2012-05-22 Thread Ian Lepore
On Tue, 2012-05-22 at 09:59 -0700, Jason Usher wrote:
 Hi Ian,
 
 Thank you very much for taking a look at this, and for understanding what I'm 
 talking about here.
 
 Comments inline, below...
 
 
 --- On Tue, 5/22/12, Ian Lepore free...@damnhippie.dyndns.org wrote:
 

  But have you tried it in this order?

     HostKey /usr/local/etc/ssh/ssh_host_key
     HostKey /usr/local/etc/ssh/ssh_host_dsa_key
     HostKey /usr/local/etc/ssh/ssh_host_rsa_key
     HostKey /usr/local/etc/ssh/ssh_host_ecdsa_key

  Which is to say, have your sshd_config file list multiple hostkey's, and
  then restart sshd after making that change?  I tried a similar change and
  it seemed to have some effect on what clients saw when connecting, but I
  can't tell if it has the effect that you want.
 
   The order of HostKey directives in sshd_config does not change the
   actual order.  In newer implementations, RSA is provided first, no
   matter how you configure the sshd_config.
 
   As I mentioned before, removing RSA completely is sort of a fix, but I
   can't do that because some people might actually be explicitly using
   RSA, and they would all break.
 
   Anyone ?
 
  After poking through the sshd code a bit, it looks to me like this is
  working as designed and it's the clients that are broken.  For host key
  algorithm, and other things where both the server and the client side
  have a list of possibilities and have to agree on a match from those
  lists, the client side is completely in control of precedence, by
  design.
 
 
 OK.  That's bad news, as I have no influence on the clients at all.
 
 
 
  In your case it appears that the client sends rsa,dsa as the host key
  algorithm list.  The server has dsa,rsa,maybe,other,stuff and since
  rsa is the client's first choice and exists in the server list, it gets
  used.  Then the client rejects the rsa key because it was really only
  ever going to be happy with a dsa key.  IMO, this is a client-side bug;
  if it's only going to accept dsa (because that's the only thing in the
  known_hosts file) then it should only ask for that.
 
 
 Exactly.  It would be nice if the client at least tried the other algorithm
 to see if that does indeed match up with the public key it is sitting on ...
 breaking automation out in the field is really problematic.
 
 
 
   1) Only offer a dsa key.  It appears the right way to do this would be
  to have just one HostKey statement in the sshd config file that names
  your dsa key file.  The presence of at least one HostKey statement will
  prevent the code from adding the default keyfile names internally, so
  you should end up with only a dsa key being offered.
 
 
 Ok, I did this - I explicitly defined a HostKey in sshd_config that happens 
 to be my DSA key:
 
 #HostKey for protocol version 1
 #HostKey /etc/ssh/ssh_host_key
 #HostKeys for protocol version 2
 HostKey /etc/ssh/ssh_host_dsa_key
 
 (note the last line is uncommented)
 
 and sshd does indeed just present the DSA key (to clients that were 
 previously negotiating the RSA key, after the upgrade).
 
 So this is great... I was originally wary of forcing DSA only like this, 
 since there might be clients out in the world that had somehow negotiated an 
 RSA key, but based on your further comments, it sounds like that is not the 
 case.
 
 So if everyone has DSA keys (we'll find out ...) then we are all set.
 
 Thank you very much for examining this issue - I hope the archives of this 
 conversation will help others in the future.

Seeing your example config with the commented-out HostKey lines made me
realize that you probably want to have two HostKey lines, one for the
protocol v1 key and another for the dsa key for v2.  The 6.x server
added the v1 key and the v2 dsa key by default, so you could have
existing clients relying on a v1 key.  Since you now have a HostKey
statement the new server code won't add the v1 key by default so you'd
need to be explicit about it.  
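
For example, something like this (using the paths from your quoted config;
untested on my end, like everything else in this thread):

    # protocol v1 key, which the 6.x server offered by default
    HostKey /etc/ssh/ssh_host_key
    # protocol v2 dsa key
    HostKey /etc/ssh/ssh_host_dsa_key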

Based on examining the code, I think this will be safe because the keys
have different type-names (rsa1 vs rsa) so a client wanting to use a
protocol v2 rsa key won't accidentally match the protocol v1 rsa key
named in the config file (and it will still match the dsa key).

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ARM + CACHE_LINE_SIZE + DMA

2012-05-21 Thread Ian Lepore
On Fri, 2012-05-18 at 16:13 +0200, Svatopluk Kraus wrote:
 On Thu, May 17, 2012 at 10:07 PM, Ian Lepore
 free...@damnhippie.dyndns.org wrote:
  On Thu, 2012-05-17 at 15:20 +0200, Svatopluk Kraus wrote:
  Hi,
 
  I'm working on DMA bus implementation for ARM11mpcore platform. I've
  looked at implementation in ARM tree, but IMHO it only works with some
  assumptions. There is a problem with DMA on memory block which is not
  aligned on CACHE_LINE_SIZE (start and end) if memory is not coherent.
 
  Let's have a buffer for DMA which is no aligned on CACHE_LINE_SIZE.
  Then first cache line associated with the buffer can be divided into
  two parts, A and B, where A is a memory we know nothing about it and B
  is buffer memory. The same stands for last cache line associatted with
  the buffer. We have no problem if a memory is coherent. Otherwise it
  depends on memory attributes.
 
  1. [no cache] attribute
  No problem as memory is coherent.
 
  2. [write throught] attribute
  The part A can be invalidated without loss of any data. It's not problem 
  too.
 
  3. [write back] attribute
  In general, there is no way how to keep both parts consistent. At the
  start of DMA transaction, the cache line is written back and
  invalidated. However, as we know nothing about memory associated with
  part A of the cache line, the cache line can be filled again at any
  time and messing up DMA transaction if flushed. Even if the cache line
  is only filled but not flushed during DMA transaction, we must make it
  coherent with memory after that. There is a trick with saving part A
  of the line into temporary buffer, invalidating the line, and
  restoring part A in current ARM (MIPS) implementation. However, if
  somebody is writting to memory associated with part A of the line
  during this trick, the part A will be messed up. Moreover, the part A
  can be part of another DMA transaction.
 
  To safely use DMA with no coherent memory, a memory with [no cache] or
  [write throught] attributes can be used without problem. A memory with
  [write back] attribute must be aligned on CACHE_LINE_SIZE.
 
  However, for example mbuf, a buffer for DMA can be part of a structure
  which can be aligned on CACHE_LINE_SIZE, but not the buffer itself. We
  can know that nobody will write to the structure during DMA
  transaction, so it's safe to use the buffer event if it's not aligned
  on CACHE_LINE_SIZE.
 
  So, in practice, if DMA buffer is not aligned on CACHE_LINE_SIZE and
  we want to avoid bounce pages overhead, we must support additional
  information to DMA transaction. It should be easy to support the
  information about drivers data buffers. However, what about OS data
  buffers like mentioned mbufs?
 
  The question is following. Is or can be guaranteed for all or at least
  well-known OS data buffers which can be part of DMA access that the
  not CACHE_LINE_SIZE aligned buffers are surrounded by data which
  belongs to the same object as the buffer and the data is not written
  by OS when given to a driver?
 
  Any answer is appreciated. However, 'bounce pages' is not an answer.
 
  Thanks, Svata
 
  I'm adding freebsd-arm@ to the CC list; that's where this has been
  discussed before.
 
  Your analysis is correct... to the degree that it works at all right
  now, it's working by accident.  At work we've been making the good
  accident a bit more likely by setting the minimum allocation size to
  arm_dcache_align in kern_malloc.c.  This makes it somewhat less likely
  that unrelated objects in the kernel are sharing a cache line, but it
  also reduces the effectiveness of the cache somewhat.
 
  Another factor, not mentioned in your analysis, is the size of the IO
  operation.  Even if the beginning of the DMA buffer is cache-aligned, if
  the size isn't exactly a multiple of the cache line size you still have
  the partial flush situation and all of its problems.
 
  It's not guaranteed that data surrounding a DMA buffer will be untouched
  during the DMA, even when that surrounding data is part of the same
  conceptual object as the IO buffer.  It's most often true, but certainly
  not guaranteed.  In addition, as Mark pointed out in a prior reply,
  sometimes the DMA buffer is on the stack, and even returning from the
  function that starts the IO operation affects the cacheline associated
  with the DMA buffer.  Consider something like this:
 
 void do_io()
 {
 int buffer;
 start_read(&buffer);
 // maybe do other stuff here
 wait_for_read_done();
 }
 
  start_read() gets some IO going, so before it returns a call has been
  made to bus_dmamap_sync(..., BUS_DMASYNC_PREREAD) and an invalidate gets
  done on the cacheline containing the variable 'buffer'.  The act of
  returning from the start_read() function causes that cacheline to get
  reloaded, so now the stale pre-DMA value of the variable 'buffer' is in
  cache again.  Right after that, the DMA completes so that ram

Re: ARM + CACHE_LINE_SIZE + DMA

2012-05-17 Thread Ian Lepore
On Thu, 2012-05-17 at 15:20 +0200, Svatopluk Kraus wrote:
 Hi,
 
 I'm working on a DMA bus implementation for the ARM11mpcore platform. I've
 looked at the implementation in the ARM tree, but IMHO it only works under
 some assumptions. There is a problem with DMA on a memory block which is not
 aligned on CACHE_LINE_SIZE (start and end) if memory is not coherent.
 
 Let's have a buffer for DMA which is not aligned on CACHE_LINE_SIZE.
 Then the first cache line associated with the buffer can be divided into
 two parts, A and B, where A is memory we know nothing about and B
 is buffer memory. The same holds for the last cache line associated with
 the buffer. We have no problem if the memory is coherent. Otherwise it
 depends on memory attributes.
 
 1. [no cache] attribute
 No problem, as the memory is coherent.
 
 2. [write through] attribute
 Part A can be invalidated without loss of any data. It's not a problem either.
 
 3. [write back] attribute
 In general, there is no way to keep both parts consistent. At the
 start of a DMA transaction, the cache line is written back and
 invalidated. However, as we know nothing about the memory associated with
 part A of the cache line, the cache line can be filled again at any
 time and can mess up the DMA transaction if it is flushed. Even if the
 cache line is only filled but not flushed during the DMA transaction, we
 must make it coherent with memory after that. There is a trick of saving
 part A of the line into a temporary buffer, invalidating the line, and
 restoring part A in the current ARM (MIPS) implementation. However, if
 somebody is writing to memory associated with part A of the line
 during this trick, part A will be messed up. Moreover, part A
 can be part of another DMA transaction.
 
 To safely use DMA with no coherent memory, a memory with [no cache] or
 [write through] attributes can be used without problem. A memory with
 [write back] attribute must be aligned on CACHE_LINE_SIZE.
 
 However, in the case of an mbuf for example, a buffer for DMA can be part
 of a structure which can be aligned on CACHE_LINE_SIZE, but not the buffer
 itself. We can know that nobody will write to the structure during the DMA
 transaction, so it's safe to use the buffer even if it's not aligned
 on CACHE_LINE_SIZE.
 
 So, in practice, if a DMA buffer is not aligned on CACHE_LINE_SIZE and
 we want to avoid bounce page overhead, we must supply additional
 information to the DMA transaction. It should be easy to supply that
 information for a driver's own data buffers. However, what about OS data
 buffers like the mentioned mbufs?
 
 The question is the following: is it, or can it be, guaranteed for all (or
 at least the well-known) OS data buffers which can be part of a DMA access
 that buffers which are not CACHE_LINE_SIZE aligned are surrounded by data
 which belongs to the same object as the buffer, and that this data is not
 written by the OS while the buffer is given to a driver?
 
 Any answer is appreciated. However, 'bounce pages' is not an answer.
 
 Thanks, Svata

I'm adding freebsd-arm@ to the CC list; that's where this has been
discussed before.

Your analysis is correct... to the degree that it works at all right
now, it's working by accident.  At work we've been making the good
accident a bit more likely by setting the minimum allocation size to
arm_dcache_align in kern_malloc.c.  This makes it somewhat less likely
that unrelated objects in the kernel are sharing a cache line, but it
also reduces the effectiveness of the cache somewhat.

Another factor, not mentioned in your analysis, is the size of the IO
operation.  Even if the beginning of the DMA buffer is cache-aligned, if
the size isn't exactly a multiple of the cache line size you still have
the partial flush situation and all of its problems.
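
To put a number on it, here's a trivial sketch (assuming a 32-byte line
purely for illustration) of how a flush/invalidate range gets rounded out
to whole cache lines; the bytes between the rounded and unrounded
boundaries are exactly the neighboring data that's at risk:

#include <stdint.h>

#define CACHE_LINE_SIZE 32u   /* assumed value, for illustration only */

/* Round an address down/up to a cache line boundary. */
uintptr_t
line_floor(uintptr_t addr)
{
    return (addr & ~((uintptr_t)CACHE_LINE_SIZE - 1));
}

uintptr_t
line_ceil(uintptr_t addr)
{
    return (line_floor(addr + CACHE_LINE_SIZE - 1));
}

A buffer starting at 0x1008 with a length of 0x30 becomes a maintenance
operation on 0x1000-0x1040, and whatever happens to live at 0x1000-0x1007
and 0x1038-0x103f goes along for the ride.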

It's not guaranteed that data surrounding a DMA buffer will be untouched
during the DMA, even when that surrounding data is part of the same
conceptual object as the IO buffer.  It's most often true, but certainly
not guaranteed.  In addition, as Mark pointed out in a prior reply,
sometimes the DMA buffer is on the stack, and even returning from the
function that starts the IO operation affects the cacheline associated
with the DMA buffer.  Consider something like this:

void do_io()
{
int buffer;
start_read(&buffer);
// maybe do other stuff here
wait_for_read_done();
}

start_read() gets some IO going, so before it returns a call has been
made to bus_dmamap_sync(..., BUS_DMASYNC_PREREAD) and an invalidate gets
done on the cacheline containing the variable 'buffer'.  The act of
returning from the start_read() function causes that cacheline to get
reloaded, so now the stale pre-DMA value of the variable 'buffer' is in
cache again.  Right after that, the DMA completes so that ram has a
newer value that belongs in the buffer variable and the copy in the
cacheline is stale.  

Before control gets into the wait_for_read_done() routine that will
attempt to handle the POSTREAD partial cacheline flush, another thread
gets control and begins 

Re: csh builtin command problems

2012-05-14 Thread Ian Lepore
On Wed, 2012-05-09 at 21:34 -0400, Robert Simmons wrote:
 I'm trying to use sysv style echo in /bin/csh and I've hit a wall as
 to how to get it to work.
 
 The following does not have the outcome that I'm looking for:
 
 # echo_style=sysv
 # echo test\ttest > test
 # cat test
 testttest
 
 I want this:
 
 # echo test\ttest > test
 # cat test
 test	test
 
 Any thoughts?

What I see on 8.3 is this:

% set echo_style=sysv
% echo test\ttest
testttest
% echo "test\ttest"
test	test
% 

So it seems from this very minimal test that the implementation of echo
is correct, but the parsing of the command line in csh requires that the
\t in the arg be protected with quotes.  (I don't normally spend any
longer in csh than it takes for a .cshrc to launch bash, and even that's
only on systems where I don't control /etc/passwd to just use bash
directly.)

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: diagonising a overheating problem

2012-05-14 Thread Ian Lepore
On Mon, 2012-05-14 at 18:56 -0400, Aryeh Friedman wrote:
 On Mon, May 14, 2012 at 6:37 PM, Bartosz Fabianowski free...@chillt.de 
 wrote:
  Try sysctl dev.cpu.0.temperature. I have a notoriously overheating Dell
  laptop and for me, this sysctl always reports the temperature.
 
  - Bartosz
 
 ~/Desktop aryeh@localhost% sysctl dev.cpu.0.temperature
 sysctl: unknown oid 'dev.cpu.0.temperature'
 ~/Desktop aryeh@localhost% sysctl dev.cpu.0
 dev.cpu.0.%desc: ACPI CPU
 dev.cpu.0.%driver: cpu
 dev.cpu.0.%location: handle=\_PR_.C000
 dev.cpu.0.%pnpinfo: _HID=none _UID=0
 dev.cpu.0.%parent: acpi0
 dev.cpu.0.freq: 1500
 dev.cpu.0.freq_levels: 1500/7260 1400/6056 1225/5299 1200/5125
 1100/4500 1000/4095 900/3753 800/3468 700/3034 600/2601 500/2167
 400/1734 300/1300 200/867 100/433
 dev.cpu.0.cx_supported: C1/0 C2/100
 dev.cpu.0.cx_lowest: C1
 dev.cpu.0.cx_usage: 100.00% 0.00% last 233us

dev.cpu.0.temperature is provided by the coretemp(4) driver, maybe you
need to kldload it?

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Calling tsleep(9) with interrupts disabled

2012-05-08 Thread Ian Lepore
I just realized that I've accidentally coded a sequence similar to this
in a driver:

   s = intr_disable();
   // do stuff here
   tsleep(sc, 0, "twird", hz / 4);
   // more stuff
   intr_restore(s);

Much to my surpise this works, including waking up due to wakeup(sc)
being called from an interrupt handler.  So apparently tsleep() results
in interrupts being re-enabled during the sleep, although nothing in the
manpage says that will happen.

Can I safely rely on this behavior, or is it working by accident?

(Please no lectures on the evils of disabling interrupts...  This is not
a multi-GHz multi-core Xeon, it's a 180mhz embedded SoC with buggy
builtin devices that will drop or corrupt data if an interrupt happens
during the "do stuff here" part of the code.)

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool

2012-04-18 Thread Ian Lepore
On Wed, 2012-04-18 at 09:41 -0400, John Baldwin wrote:
 On Wednesday, April 18, 2012 2:02:22 am Andriy Gapon wrote:
  on 17/04/2012 23:43 John Baldwin said the following:
   On Tuesday, April 17, 2012 4:22:19 pm Andriy Gapon wrote:
   We already have a flag for ZFS (KARGS_FLAGS_ZFS, 0x4).  So the new flag 
   could be
   named something ZFS-specific (as silly as KARGS_FLAGS_ZFS2) or something 
   more
   general such as KARGS_FLAGS_32_BYTES meaning that the total size of 
   arguments
   area is 32 bytes (as opposed to 24 previously).
   
   Does KARGS_FLAGS_GUID work?
   
  
  I think that's too terse, we already passed a pool guid via the existing
  argument space.  So it should be something like KARGS_FLAGS_ZFS_FS_GUID or
  KARGS_FLAGS_ZFS_DS_GUID (DS - dataset).
 
 Ah.  I do think the flag should indicate that the bootinfo structure is 
 larger,
 I was assuming you were adding a new GUID field that didn't exist before.
 I can't think of something better than KARGS_FLAGS_32.  What might be nice
 actually, is to add a new field to indicate the size of the argument area and
 to set a flag to indicate that the size field is present (KARGS_FLAGS_SIZE)?


YES!  A size field (preferably as the first field in the struct) along
with a flag to indicate that it's a new-style boot info struct that
starts with a size field, will allow future changes without a lot of
drama.  It can allow code that has to deal with the struct without
interpreting it (such as trampoline code that has to copy it to a new
stack or memory area as part of loading the kernel) to be immune to
future changes.
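
Roughly what I'm picturing, as a sketch with made-up names (and assuming
the "new-style" flag itself travels outside the struct, e.g. in a register
the way the ARM bootloaders I work with pass it):

#include <stdint.h>
#include <string.h>

struct bootinfo_sized {
    uint32_t    bi_size;    /* total size of this struct, in bytes */
    /* ...all existing fields follow unchanged; new ones get appended... */
};

/* Trampoline-style relocation: copy the args without interpreting them. */
void
copy_bootinfo(void *dst, const struct bootinfo_sized *src)
{
    memcpy(dst, src, src->bi_size);
}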

This probably isn't a big deal in the x86 world, but it can be important
for embedded systems where a proprietary bootloader has to pass info to
a proprietary board_init() type routine in the kernel using
non-proprietary loader/trampoline code that's part of the base.

We have a bit of a mess in this regard in the ARM world right now, and
it would be a lot less messy if something like this had been in place.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool

2012-04-18 Thread Ian Lepore
On Wed, 2012-04-18 at 17:36 +0300, Andriy Gapon wrote:
 on 18/04/2012 17:22 Ian Lepore said the following:
  YES!  A size field (preferably as the first field in the struct) along
  with a flag to indicate that it's a new-style boot info struct that
  starts with a size field, will allow future changes without a lot of
  drama.  It can allow code that has to deal with the struct without
  interpretting it (such as trampoline code that has to copy it to a new
  stack or memory area as part of loading the kernel) to be immune to
  future changes.
 
 Yeah, placing the new field at front would immediately break compatibility and
 even access to the flags field :-)
 

Code would only assume the new field was at the front of the struct if
the new flag is set, otherwise it would use the historical struct
layout.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool

2012-04-18 Thread Ian Lepore
On Wed, 2012-04-18 at 17:36 +0300, Andriy Gapon wrote:
 on 18/04/2012 17:22 Ian Lepore said the following:
  YES!  A size field (preferably as the first field in the struct) along
  with a flag to indicate that it's a new-style boot info struct that
  starts with a size field, will allow future changes without a lot of
  drama.  It can allow code that has to deal with the struct without
  interpretting it (such as trampoline code that has to copy it to a new
  stack or memory area as part of loading the kernel) to be immune to
  future changes.
 
 Yeah, placing the new field at front would immediately break compatibility and
 even access to the flags field :-)
 

Oh wait, is the flags field embedded in the struct?  My bad, I didn't
look.  In the ARM code I'm used to working with, the flags are passed
from the bootloader to the kernel entry point in a register; I don't
know why I assumed that would be true on other platforms.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [GSoC] [ARM] arm cleanup - my own proposal

2012-04-14 Thread Ian Lepore
On Sun, 2012-04-01 at 20:19 +0200, Aleksander Dutkowski wrote:
 hello!
 
 after a few weeks of searching for an interesting idea, I've decided to
 propose my own. It is already mentioned on the IdeasPage:
 - ARM cleanup
 
 Why have I chosen this one? I am very interested in the embedded world.
 Now I am working on porting FBSD to the at91sam9g45 - I will be much more
 motivated working on an arm fbsd project than any other.
 
 Why should you let me do that project? While working on freebsd/arm
 I've noticed places that could be optimized, or separated, i.e.
 at91_samsize() should be declared for each board separately - now,
 this function has if-else checks for which board it is running on.
 
 I would like to identify and fix those bugs, so the code will be more
 efficient and clear. Moreover, I think there should be a
 tutorial/framework for adding new boards or SoCs, so it will be
 simpler. I am currently reading the code in sys/arm/at91 and
 searching for improvements but I will be very pleased, if you send me
 your insights.
 
 The first question is - should I clean up only the at91 branch or more? I
 am quite familiar with at91 right now.
 The second - how to test the code? Some of the boards could be tested in
 qemu; I could buy a board with an at91rm9200 for example, if I'm in. But
 maybe I will find people here with their own boards who could help
 me with testing? I have an sbc6045 board with an at91sam9g45 SoC but it
 doesn't have fbsd support yet (I'm working on it now :) )
 
 I also thought about reducing kernel size for embedded, if arm cleanup
 won't fit.
 
 

I'm curious whether you ever got a reply to this privately, since
nothing appeared on the list?  I meant to reply and offer to do testing
of at91 changes on rm9200 hardware, but I was on vacation when you
posted originally, and I forgot to reply until just now.

It's been my growing impression for about a year that the arm support in
FreeBSD has atrophied to the point where it can barely be said that it's
supported at all.  Now I see this morning that marius@ has committed a
set of style cleanups to the at91 code (r234281), so maybe it's not
quite as dead as I feared.

At Symmetricom we build a variety of products based on the rm9200, and
we're maintaining quite a set of diffs from stock FreeBSD.  Some are bug
fixes, some are enhancements such as allowing the master clock frequency
to be changed during kernel init (instead of in the bootloader) and a
hints-based system that allows the atmelarm bus to become aware of new
child devices that aren't in the stock code and manage their resources.
It sure would be nice if some of those diffs could get rolled back in;
it would certainly make it easier for me to integrate things like
Marius' style cleanups back into our repo.

Anyway, if ongoing changes are going to be happening to the at91 code,
I'm certainly interested in helping however I can.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Debugging zombies: pthread_sigmask and sigwait

2012-04-11 Thread Ian Lepore
On Wed, 2012-04-11 at 16:11 +0200, Mel Flynn wrote:
 Hi,
 
 I'm currently stuck on a bug in Zarafa-spooler that creates zombies, and am
 working around it by claiming that our pthread library isn't normal,
 which makes it use standard signals rather than a signal thread.
 
 My limited understanding of these facilities is however not enough to
 see the actual problem here and reading of related manpages did not lead
 me to a solution either. A test case reproducing the problem is attached.
 
 What happens is that SIGCHLD is never received by the signal thread and
 the child processes turn to zombies. Signal counters never go up, not
 even for SIGINFO, which I added specifically to see if anything gets
 through at all.
 
 The signal thread shows being stuck in sigwait. It's reproducible on
 8.3-PRERELEASE of a few days ago (r233768). I'm not able to test it on
 anything newer unfortunately, but I suspect this is a bug/linuxism in
 the code not in FreeBSD.
 
 Thanks in advance for any insights.
 ___
 freebsd-hackers@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
 To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

The signal mask for a new thread is inherited from the parent thread.
In your example code, the signal handling thread inherits the blocked
status of the signals as set up in main().  Try adding this line to
signal_handler() before it goes into its while() loop:

 pthread_sigmask(SIG_UNBLOCK, &signal_mask, NULL);

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Debugging zombies: pthread_sigmask and sigwait

2012-04-11 Thread Ian Lepore
On Wed, 2012-04-11 at 17:47 +0300, Konstantin Belousov wrote:
 On Wed, Apr 11, 2012 at 08:26:13AM -0600, Ian Lepore wrote:
  On Wed, 2012-04-11 at 16:11 +0200, Mel Flynn wrote:
   Hi,
   
   I'm currently stuck on a bug in Zarafa-spooler that creates zombies. and
   working around it by claiming that our pthread library isn't normal
   which uses standard signals rather then a signal thread.
   
   My limited understanding of these facilities is however not enough to
   see the actual problem here and reading of related manpages did not lead
   me to a solution either. A test case reproducing the problem is attached.
   
   What happens is that SIGCHLD is never received by the signal thread and
   the child processes turn to zombies. Signal counters never go up, not
   even for SIGINFO, which I added specifically to see if anything gets
   through at all.
   
   The signal thread shows being stuck in sigwait. It's reproducible on
   8.3-PRERELEASE of a few days ago (r233768). I'm not able to test it on
   anything newer unfortunately, but I suspect this is a bug/linuxism in
   the code not in FreeBSD.
   
   Thanks in advance for any insights.
   ___
   freebsd-hackers@freebsd.org mailing list
   http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
   To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
  
  The signal mask for a new thread is inherited from the parent thread.
  In your example code, the signal handling thread inherits the blocked
  status of the signals as set up in main().  Try adding this line to
  signal_handler() before it goes into its while() loop:
  
   pthread_sigmask(SIG_UNBLOCK, &signal_mask, NULL);
 
 This is completely wrong. sigwait(2) requires the waited signals to be
 blocked, so the code is right in this regard.
 

Ooops, sorry.  The code that sets up our signal handling threads uses
SIG_SETMASK rather than BLOCK/UNBLOCK, and my quick glance at it
misinterpreted what it was doing.
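
For the archives, the pattern Konstantin describes looks roughly like this
(a from-scratch sketch, not Mel's code): the signals stay blocked in every
thread, and only the dedicated thread collects them, synchronously, with
sigwait().

#include <pthread.h>
#include <signal.h>
#include <stdio.h>

static sigset_t sigs;

static void *
signal_thread(void *arg)
{
    int signo;

    (void)arg;
    for (;;) {
        /* The waited-for signals must stay blocked; sigwait() takes them
         * synchronously instead of running an async handler. */
        if (sigwait(&sigs, &signo) == 0)
            printf("got signal %d\n", signo);
    }
    return (NULL);
}

int
main(void)
{
    pthread_t tid;

    sigemptyset(&sigs);
    sigaddset(&sigs, SIGCHLD);

    /* Block before creating any threads, so they all inherit the mask. */
    pthread_sigmask(SIG_BLOCK, &sigs, NULL);

    pthread_create(&tid, NULL, signal_thread, NULL);
    pthread_join(tid, NULL);
    return (0);
}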

-- Ian



___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Regarding coredump and restart

2012-03-29 Thread Ian Lepore
On Fri, 2012-03-30 at 01:10 +0800, Mahesh Babu wrote:
 I am currently working on coredump and then restarting the process in FreeBSD 
 9.
 
 
 I have created the coredump file for a process using gcore of gdb.
 
 I am not able to restart the process from the coredump file.
 
 Is there any way to restart the process using gdb itself, or any other way
 to implement restarting of the process from the coredump file?
 
 
 Thanks,
 Mahesh

A coredump does not contain the entire state of a process, it only
contains the part of the state that is contained within memory belonging
to the process.  Other parts of the state can exist outside of that
memory.   For example, in open disk files, in the corresponding state of
another process at the other end of a socket connection, and so on.
Bringing back the memory image will not bring back the corresponding
state in external resources.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Graphical Terminal Environment

2012-03-06 Thread Ian Lepore
On Tue, 2012-03-06 at 10:24 -0500, Brandon Falk wrote:
 On 3/6/2012 11:05 AM, per...@pluto.rain.com wrote:
  Brandon Falk bfalk_...@brandonfa.lk wrote:
 
  I havent tried tmux yet, but on my system im only able to get
  80x40 with vidcontrol on one monitor. But with xterm in xorg
  i can get 319x89 per monitor ...
 
  To get higher resolution than what vidcontrol provides, you'll most
  likely need to run the display in graphic mode (which is what X11
  does) rather than in text mode.  That means that you will need to
  either use, or reinvent, the lowest levels of X (display driver,
  window mapping) and at least part of the xterm/rxvt application
  (terminal emulation, font rasterizing, perhaps scrolling).  You
  could, however, eliminate the X practice of using the network to
  connect the terminal emulator to the display; this would give you
  an architecture resembling SunView (and its predecessor, SunTools).
 
  I _think_ SunTools/SunView were proprietary, although it's possible
  that Sun released the source code at some point.  You could try
  doing some research with Google and/or the Internet Archive.
 
 That's pretty much my plan: to write some lower level drivers to put the
 system in a graphics mode. I have 4 monitors and there is no other way to get
 multiple monitors without a GPU specific driver (at least from my VGA OSDev
 experience).
 My goal will be to make a driver that can easily be used by any other
 driver. Instead of having to use Xorg, there would just be calls to the video
 driver to set the mode to graphics, then some primitive functions to draw
 lines and dots.
 
 I don't see why Xorg should dominate the drivers completely; I really wish it
 were a matter of having an open, well documented, easy to use API that you can
 just give direct commands to.
 
 From my understanding, this is the current model:
 
 [  Apps   ]
 |
 v
 [  Xorg   ]
 |
 v
 [  Driver ]
 |
 v
 [  GPU]
 
 I think it should be the following:
 
 [ Apps ]
|
v
 [ Xorg ]   [ Apps ]
|  |
v  v
 [Driver  ]
|
v
 [  GPU   ]
 
 Does this make sense to anyone else? I really want to get this idea across
 because I think it would be really beneficial.
 
 -Brandon

With that model and your statement that the driver should support only
primitive functions to draw lines and dots, that leaves the non-trivial
problem of font rendering to the app.  Given your original goal, font
rendering is pretty much the bulk of what you want to do; is the app
layer the right place for it?

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: mtree(8) reporting of file modes

2012-03-06 Thread Ian Lepore
On Tue, 2012-03-06 at 12:41 -0800, David Wolfskill wrote:
 As I mentioned in
 http://docs.FreeBSD.org/cgi/mid.cgi?20120306000520.GS1519, at work,
 we're trying to use mtree(8) to do reality checks on server
 configuration/provisioning.  (We are not proposing the use of mtree to
 actually enforce a particular configuration -- we are only considering
 using it to generate specification files, then check a given system
 against those specification files.)
 
 I had thought it odd (after running mtree -c) that most of the entries
 in the resulting specification file failed to mention the mode of the
 file; this was the catalyst for the above-cited message.
 
 In the mean time, I started poking at the sources.
 
 Caveat: I'm not really a C programmer; the  bulk of my background is in
 sysadmin-type positions (though I've been doing other stuff for the last
 4 years).
 
 Anyway, I fairly quickly focused my attention on
 src/usr.sbin/mtree/create.c, in particular, on the statf() function
 therein.
 
 Most of this part of the code is barely changed since 4.4 Lite; the most
 recent change to the section in question (lines 207 - 208 from the
 version in head as of r232599) was made by rgrimes@ back in 1994.
 
 So I presume that there's something I'm overlooking or otherwise
 missing, since the folks who have been here before were certainly more
 clueful than I am.
 
 But the code in question:
 
 ...
 206 }
 207 if (keys & F_MODE && (p->fts_statp->st_mode & MBITS) != mode)
 208 output(indent, &offset, "mode=%#o", p->fts_statp->st_mode & MBITS);
 ...
 
 is what outputs the mode to standard output.
 
 Here is (the bulk of) what I found:
 
 * The keys & F_MODE term merely tests to see if we are interested
   in reporting the file mode.  (By default, we are.)
 
 * p->fts_statp->st_mode refers to the st_mode returned from stat()
   for the file presently being examined.
 
 * MBITS is a mask of mode bits about which we care; it is defined
   (in mtree.h) as (S_ISUID|S_ISGID|S_ISTXT|S_IRWXU|S_IRWXG|S_IRWXO).
   These are defined in sys/stat.h; MBITS, thus, works out to 07777.
 
 * mode is set to the (masked) mode of the (immediately) enclosing
   directory when it is visited in pre-order.  (This is done in statd().)
 
 As a result, we only report the mode of a file if it differs from the
 mode of its parent directory.
 
 Huh??!?
 
 
 Maybe I'm confused, but certainly for my present purposes, and likely in
 general, I'd think it would make sense to just always report the file
 mode.
 
 A way to do that would be to change the above excerpt to read:
 
 ...
 206 }
 207 if (keys & F_MODE)
 208 output(indent, &offset, "mode=%#o", p->fts_statp->st_mode & MBITS);
 ...
 
 
 Another alternative, in case there are use cases for the existing
 behavior, would be to provide either another key or a command-line
 flag that says "give me all the modes".
 
 Am I the only one who would find such a change useful?
 
 Thanks for any reality checks. :-}
 
 Peace,
 david

At a glance I think the idea here is that when it outputs the directory
entry it outputs a /set line that has the directory's mode in it, and
then as it does the files in that directory it only needs to output a
mode= clause for a file if it differs from the most recent /set line.
(This is based on studying the code for about 30 seconds, so don't take
it as gospel.)
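
In other words, the spec comes out looking roughly like this (a made-up
excerpt, not real mtree -c output), where a file only gets its own mode=
clause when it doesn't match the most recent /set:

/set type=file uid=0 gid=0 mode=0755
.           type=dir
    # matches the /set value, so no mode= clause:
    build.sh
    # differs from the /set value, so it gets spelled out:
    README      mode=0644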

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: How to access kernel memory from user space

2012-02-22 Thread Ian Lepore
On Wed, 2012-02-22 at 17:24 +, Svetlin Manavski wrote:
 Hi all,
 I have a very similar problem as described in this thread back in 2009:
 
 http://lists.freebsd.org/pipermail/freebsd-hackers/2009-January/027367.html
 
 I have a kernel module producing networking stats which I need to
 frequently read from the user space. A copy of the data structure would be
 too expensive so I need to access the kernel data directly from the user
 space.
 
 Unfortunately Alexej's code crashes in the following area:
 
 vm_map_lookup(&kmem_map, addr, VM_PROT_ALL, &myentry, &myobject, &mypindex,
 &myprot, &mywired); /* OUT */
 vm_map_lookup_done(kmem_map, myentry);
 I am using 64bit FreeBSD 8.2 on Intel Xeon hardware.
 Any idea how to make a stable implementation on my platform?
 
 Thank you,
 Svetlin
 ___
 freebsd-hackers@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
 To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

I've never done this, but if I needed to, I think the first thing I'd
try is to use an mmap(2) of /dev/kmem to map the memory you need into
userspace (of course your userspace app will need to be running with
root privs to do this).  

That leaves the interesting problem of locating what offset within the
kernel virtual address space you need to map to get at your data.  Two
things come to mind... have your kernel module export the address in a
sysctl (that feels kind of hack-ish but it should be quick and easy to
do), or use libkvm's kvm_nlist() function to locate the symbol within
your module (I think that should be possible; again I've never actually
done any of this).
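
If it helps, the shape of what I'm imagining is something like this
(completely untested, and the "mymod.stats_addr" sysctl name is made up;
your module would have to export the kernel address under some name):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/mman.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    unsigned long kva;      /* kernel virtual address exported by the module */
    size_t len = sizeof(kva);
    long pagesz;
    off_t pgoff;
    void *map;
    int fd;

    if (sysctlbyname("mymod.stats_addr", &kva, &len, NULL, 0) != 0)
        err(1, "sysctlbyname");

    if ((fd = open("/dev/kmem", O_RDONLY)) < 0)
        err(1, "open /dev/kmem");

    /* mmap offsets must be page-aligned; for kmem the offset is a kernel VA. */
    pagesz = sysconf(_SC_PAGESIZE);
    pgoff = kva & (unsigned long)(pagesz - 1);
    map = mmap(NULL, (size_t)pagesz, PROT_READ, MAP_SHARED, fd,
        (off_t)(kva - pgoff));
    if (map == MAP_FAILED)
        err(1, "mmap");

    printf("stats structure mapped at %p\n", (void *)((char *)map + pgoff));
    /* ...read the stats structure through that pointer... */
    return (0);
}

(That maps a single page; a structure that spans pages would need a bigger
length, but you get the idea.)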

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Parallels v4 regression (aka ada(4) oddity) in RELENG_9

2012-01-23 Thread Ian Lepore
On Mon, 2012-01-23 at 10:06 -0800, Devin Teske wrote:
 I have a Parallels virtual machine and it runs FreeBSD 4 through 8 just
 swimmingly.
 
 However, in RELENG_9 I notice something different. My once ad0 is now 
 showing
 up as ada0. However, something even stranger is that devfs is providing both
 ad0 family devices AND ada0 family devices.
 
 What's worse is that I can't seem to partition the disk with MBR+disklabel
 scheme.
 
 My procedure goes something like this:
 
 1. Boot from RELENG_9 LiveCD
 2. Execute: sysctl -n kern.disks
 3. Notice two items: cd0 ada0
 4. Look in /dev
 5. Notice several items: ad0 ad0p1 ad0p2 ad0p3 ada0 ada0p1 ada0p2 ada0p3
 6. Wipe partition table by executing: dd if=/dev/zero of=/dev/ada0 bs=512k
 count=256
 7. Look in /dev
 8. Notice less items now: ad0 ada0
 9. Execute: sysctl -n kern.disks
 10. Notice nothing changed: cd0 ada0
 11. Write out standard whole disk MBR slice
 12. Look in /dev
 13. Notice that nothing changed: ad0 ada0
 NOTE: Where is ad0s1 or ada0s1?
 14. Use fdisk to make sure everything was written successfully
 15. Notice everything looks good (slice 1 is of type FreeBSD, slice 2, 3, and 
 4
 are unused)
 16. Reboot
 17. Boot back into RELENG_9 LiveCD
 18. Look in /dev
 19. Notice that the old devices are back!: ad0 ad0p1 ad0p2 ad0p3 ada0 ada0p1
 ada0p2 ada0p3
 20. Use fstab to look at MBR partition table
 21. Notice that things look good (with respect to fdisk'ing): slice 1 is
 FreeBSD, 2, 3, and 4 are still unused
 22. Notice /dev still doesn't have ad0s1 or ada0s1
 23. Use gpart to look at ada0
 24. Notice GPT [CORRUPT]
 
 ...
 
 OK!?!?
 
 ...
 
 Use same exact RELENG_9 LiveCD on either a physical machine or VMware Virtual
 machine.
 
 SUCCESS!!
 
 Go back to Parallels 4
 
 FAILURE!!
 
 Go back to RELENG_8 LiveCD with Parallels 4
 
 SUCCESS!!
 
 What's going on here? I think ada(4) is my problem. Can someone please provide
 feedback? Willing to dig further and provide any/all feedback to help fix this
 regression.

I've experienced the part of that scenario where changing a drive from
gpt to mbr scheme results in all the gpt partitions reappearing after a
reboot.  I concluded (but didn't take time to be absolutely certain)
that during boot the geom layer was seeing the backup gpt partition info
at the end of the disk and concluding that it needed to ignore the mbr
and use the backup gpt info instead.  Once I quit using dd and similar
tools and consistently used gpart destroy to wipe out the gpt before
changing to mbr, it stopped happening.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Parallels v4 regression (aka ada(4) oddity) in RELENG_9

2012-01-23 Thread Ian Lepore
On Mon, 2012-01-23 at 10:15 -0800, Garrett Cooper wrote:
 On Mon, Jan 23, 2012 at 10:06 AM, Devin Teske devin.te...@fisglobal.com 
 wrote:
  I have a Parallels virtual machine and it runs FreeBSD 4 through 8 just
  swimmingly.
 
  However, in RELENG_9 I notice something different. My once ad0 is now 
  showing
  up as ada0. However, something even stranger is that devfs is providing 
  both
  ad0 family devices AND ada0 family devices.
 
  What's worse is that I can't seem to partition the disk with MBR+disklabel
  scheme.
 
  My procedure goes something like this:
 
  1. Boot from RELENG_9 LiveCD
  2. Execute: sysctl -n kern.disks
  3. Notice two items: cd0 ada0
  4. Look in /dev
  5. Notice several items: ad0 ad0p1 ad0p2 ad0p3 ada0 ada0p1 ada0p2 ada0p3
  6. Wipe partition table by executing: dd if=/dev/zero of=/dev/ada0 bs=512k
  count=256
  7. Look in /dev
  8. Notice less items now: ad0 ada0
  9. Execute: sysctl -n kern.disks
  10. Notice nothing changed: cd0 ada0
  11. Write out standard whole disk MBR slice
  12. Look in /dev
  13. Notice that nothing changed: ad0 ada0
  NOTE: Where is ad0s1 or ada0s1?
  14. Use fdisk to make sure everything was written successfully
  15. Notice everything looks good (slice 1 is of type FreeBSD, slice 2, 3, 
  and 4
  are unused)
  16. Reboot
  17. Boot back into RELENG_9 LiveCD
  18. Look in /dev
  19. Notice that the old devices are back!: ad0 ad0p1 ad0p2 ad0p3 ada0 ada0p1
  ada0p2 ada0p3
  20. Use fstab to look at MBR partition table
  21. Notice that things look good (with respect to fdisk'ing): slice 1 is
  FreeBSD, 2, 3, and 4 are still unused
  22. Notice /dev still doesn't have ad0s1 or ada0s1
  23. Use gpart to look at ada0
  24. Notice GPT [CORRUPT]
 
  ...
 
  OK!?!?
 
  ...
 
  Use same exact RELENG_9 LiveCD on either a physical machine or VMware 
  Virtual
  machine.
 
  SUCCESS!!
 
  Go back to Parallels 4
 
  FAILURE!!
 
  Go back to RELENG_8 LiveCD with Parallels 4
 
  SUCCESS!!
 
  What's going on here? I think ada(4) is my problem. Can someone please 
  provide
  feedback? Willing to dig further and provide any/all feedback to help fix 
  this
  regression.
 
 The 'bug' is in gpart/geom and the 'issue' is present in prior
 versions of FreeBSD. The backup partition is now more of a thorn in
 everyone's side than previous versions. gpart delete'ing all the
 partitions, then doing gpart destroy is probably what you want (there
 isn't a simple one-liner that would do this).
 Thanks,
 -Garrett


'gpart destroy -F <geom>' should do it in one step.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Rebooting/Halting system from kernel module

2012-01-22 Thread Ian Lepore
On Sun, 2012-01-22 at 14:19 +0400, geoffrey levand wrote:
 Hi,
 
 how would i reboot/halt the system from a kernel module ?
 
 regards
 
 --
 Почта@Mail.Ru on your mobile!
 Just go to m.mail.ru from your phone

There is an undocumented (at least in terms of a manpage) function named
shutdown_nice() in sys/kern/kern_shutdown.c that will send a signal to
the init process if it's running or call boot(9) if not.  Or maybe a
direct call to boot(9) is what you're looking for, if bypassing the
running of rc shutdown scripts and all is your goal.  (There is a manpage
for boot(9)).
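
Something along these lines is what I have in mind (an untested sketch;
double-check which kernel header declares shutdown_nice() in your tree, and
see sys/reboot.h for the RB_* howto values):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/reboot.h>

/* Ask for a clean reboot: signals init if it's running, else calls boot(9). */
static void
my_request_reboot(void)
{
    shutdown_nice(RB_AUTOBOOT);
}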

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: FreeBSD has serious problems with focus, longevity, and lifecycle

2012-01-17 Thread Ian Lepore
On Tue, 2012-01-17 at 10:56 -0800, Julian Elischer wrote:
 If it came to that, maybe all the people who are currently saying they
 need better support of the 8.x branch could get together and, together,
 support someone to do that job for them... would 1/5th of a person be too
 expensive for them?
 
 If not, what is a reasonable cost?  Is it worth 1/20th of a person?
 
 
 Julian
 

I've got to say, this strikes me as the most interesting idea floated so
far in this conversation.  I've heard of many instances of sponsored
projects; they almost always involve major new features or support for
new hardware or technologies; paying someone for a specific small
focused fix is also common.  

A sponsored branch is... well... just an interesting concept to me. 

Unlike most developers, I have little interest in creating new code from
scratch to implement the fad of the week.  (There's that whole other
opensource OS if fad of the week technology is your thing.)  I live to
find and fix bugs.  Sometimes that means days of frustration to generate
a one-line patch.  Sometimes you find the problem in minutes but the fix
means a painful redesign that touches 342 files and has the potential to
ruin everyone's day when you get it wrong.  But, for me at least, it's
much more challenging and thus more rewarding when you get it right.

Despite being a developer myself, I understand completely where John is
coming from in opening this conversation, and I'm firmly in the me too
camp because I'm also an end user of FreeBSD.  I work at a company that
creates embedded systems products with FreeBSD as our OS. 

In July we started the process of converting our products from 6.2 to
8.2.  Out of sheer emergency necessity we shipped a product using 8.2 in
October -- 6.2 was panicking and the customer was screaming, we had no
choice; we've had to do several fix releases since then.  It's only
within the past couple weeks that I think we're finally ready to deploy
8.2 for all new products.  More testing is needed before updating
existing products in the field.  It takes a long time for a business to
vet a major release of an OS and deploy it.  It costs a lot.

Now, before we're even really completely up and running on 8.2 at work,
9.0 hits the street, and developers have moved on to working in the 10.0
world.  What are the chances that any of the patches I've submitted for
bugs we fixed in 8.x are ever going to get commited now that 8 is well
on its way to becoming ancient history in developers' minds?

So back to where I started this rambling... that concept of a sponsored
branch, or maybe something along the lines of a long-lived stable branch
supported by a co-op of interested users.  Some co-op members may be
able to provide developers or other engineering-related resources, some
may just pay cash to help acquire those resources for various short-term
or targeted needs along the way.  I think it could work, and I think
businesses that need such stability might find it easier to contribute
something to a co-op than the current situation that requires a company
such as ours to become, in effect, our own little FreeBSD Project
Lite (if you think FreeBSD lacks manpower to do release engineering,
imagine how hard it is for a small or medium sized business).

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: FreeBSD has serious problems with focus, longevity, and lifecycle

2012-01-17 Thread Ian Lepore
On Wed, 2012-01-18 at 01:17 +0200, Andriy Gapon wrote:
 on 17/01/2012 23:46 Ian Lepore said the following:
  Now, before we're even really completely up and running on 8.2 at work,
  9.0 hits the street, and developers have moved on to working in the 10.0
  world.  What are the chances that any of the patches I've submitted for
  bugs we fixed in 8.x are ever going to get commited now that 8 is well
  on its way to becoming ancient history in developers' minds?
 
 My opinion is that this will have more to do with your approach to pushing the
 patches (and your persistence) rather than with anything else.  As long as
 stable/8 is still a supported branch or the bugs are reproducible in any of 
 the
 supported branches.

Well I submitted a sort of random sample of the patches we're
maintaining at work, 11 of them as formal PRs and 2 posted to the lists
here recently.  So far two have been committed (the most important one
and the most trivial one, oddly enough).  I'm not sure just how pushy
one is supposed to be, I don't want to be a jerk.  Not to mention that I
wouldn't know who to push.  That's actually why I'm now being active on
the mailing lists, I figured maybe patches will be more accepted from
someone the committers know rather than just as code out of the blue
attached to a PR.

I think it would be great if there were some developers (a team, maybe
something not quite that formal) who concentrated on maintenance of
older code for the user base who needs it.  I'd be happy to contribute
to that effort, both on my own time, and I have a commitment from
management at work to allow me a certain amount of billable work hours
to interface with the FreeBSD community, especially in terms of getting
our work contributed back to the project (both to help the project, and
to help us upgrade more easily in the future).

I have no idea if there are enough developers who'd be interested in
such a concept to make it work, co-op or otherwise.  But I like the fact
that users and developers are talking about their various needs and
concerns without any degeneration into flame wars.  It's cool that most
of the focus here is centered on how to make things better for everyone.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: BeagleBone?

2012-01-15 Thread Ian Lepore
On Sun, 2012-01-15 at 16:05 -0800, Tim Kientzle wrote:
 Just got a BeagleBone in the mail and so far, it seems like fun:
  * Under $100
  * Relatively modern Cortex-A8 ARM CPU (TI AM3358)
  * Built-in Ethernet, USB console, etc.
 
 So far,  I've gotten console access from my FreeBSD
 laptop and am starting to tinker with a nanobsd-like
 script to build a bootable SD image.  (By copying the
 MLO and u-boot.img files; nothing FreeBSD-specific yet.)
 
 Next step:  Compile the arm/uboot boot loader and
 see if I can get that to load and run.
 
 Anyone else tinkering with one of these?  Any
 hints?  ;-)
 
 Tim

The freebsd-arm list would be the place for info.  There's still work to
do to get FreeBSD running on a Cortex-A8, last I heard.

-- Ian

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: trouble with atrtc

2012-01-09 Thread Ian Lepore
On Thu, 2012-01-05 at 10:33 -0500, John Baldwin wrote:
 On Wednesday, January 04, 2012 5:22:29 pm Ian Lepore wrote:
  [...]
  Because atrtc.c has a long and rich history of modifications, some of
  them fairly recent, I thought it would be a good idea to toss out my
  ideas for changes here and solicit feedback up front, rather than just
  blindly posting a PR with a patch...
  
  It turns out to be very easy to probe for the latched-read behavior with
  just a few lines of code in atrtc_start(), so I'd propose doing that and
  setting a flag that the in/out code can use to disable the caching of
  the current register number on hardware that needs it.
  
  I'd like to add a new public function, atrtc_nmi_enable(int enable) that
  drivers can use to manipulate the NMI flag safely under clock_lock and
  playing nicely with the register number caching code.
  
  Completely unrelated but nice to have: I'd like to add a tuneable to
  control the use of inb(0x84) ops to insert delays after writing to 0x70
  and 0x71.  Modern hardware doesn't need this, so I think it should
  default to not inserting delays.
  
  I've done all these things in our local 8.2 code base and tested them on
  all the hardware I've got on hand.  If these changes sound acceptable
  I'll prepare patches to -current as well.
 
 These changes all sound good to me.
 

Here is the patch for -current and 9.  I can provide a patch to 8-stable
as well; it's essentially the same patch with small context differences.

I've tested this using -current on several systems, recent and old
hardware, including manually bumping up the quality score for the rtc
event timer to force it to get used, and it seems to work without
trouble (and of course I've been testing the same patch in 8.2 for a
while on a bunch of different hardware).

Index: sys/isa/rtc.h
===
RCS file: /local/base/FreeBSD-CVS/src/sys/isa/rtc.h,v
retrieving revision 1.16.2.1
diff -u -p -r1.16.2.1 rtc.h
--- sys/isa/rtc.h   23 Sep 2011 00:51:37 -  1.16.2.1
+++ sys/isa/rtc.h   9 Jan 2012 22:04:12 -
@@ -117,6 +117,7 @@ extern  int atrtcclock_disable;
 int	rtcin(int reg);
 void   atrtc_restore(void);
 void   writertc(int reg, u_char val);
+void   atrtc_nmi_enable(int enable);
 #endif
 
 #endif /* _I386_ISA_RTC_H_ */
Index: sys/x86/isa/atrtc.c
===
RCS file: /local/base/FreeBSD-CVS/src/sys/x86/isa/atrtc.c,v
retrieving revision 1.13.2.1
diff -u -p -r1.13.2.1 atrtc.c
--- sys/x86/isa/atrtc.c 23 Sep 2011 00:51:37 -  1.13.2.1
+++ sys/x86/isa/atrtc.c 9 Jan 2012 22:04:12 -
@@ -55,28 +55,59 @@ __FBSDID($FreeBSD: src/sys/x86/isa/atrt
 #define	RTC_LOCK	mtx_lock_spin(&clock_lock)
 #define	RTC_UNLOCK	mtx_unlock_spin(&clock_lock)
 
+/* atrtcclock_disable is set to 1 by apm_attach() or by hint.atrtc.0.clock=0 */
 int	atrtcclock_disable = 0;
 
-static int rtc_reg = -1;
-static u_char  rtc_statusa = RTCSA_DIVIDER | RTCSA_NOPROF;
-static u_char  rtc_statusb = RTCSB_24HR;
+static int use_iodelay = 0; /* set from hint.atrtc.0.use_iodelay */
+
+#define RTC_REINDEX_REQUIRED  0xffU
+#define NMI_ENABLE_BIT        0x80U
+
+static u_char nmi_enable;
+static u_char rtc_reg = RTC_REINDEX_REQUIRED;
+static u_char rtc_statusa = RTCSA_DIVIDER | RTCSA_NOPROF;
+static u_char rtc_statusb = RTCSB_24HR;
+
+/*
+ * Delay after writing to IO_RTC[+1] registers.  Modern hardware doesn't
+ * require this expensive delay, so it's a tuneable that's disabled by default.
+ */
+static __inline void
+rtc_iodelay(void)
+{
+   if (use_iodelay)
+   inb(0x84);
+}
 
 /*
  * RTC support routines
+ *
+ * Most rtc chipsets let you write a value into the index register and then each
+ * read of the IO register obtains a new value from the indexed location. Others
+ * behave as if they latch the indexed value when you write to the index, and
+ * repeated reads keep returning the same value until you write to the index
+ * register again.  atrtc_start() probes for this behavior and leaves rtc_reg
+ * set to RTC_REINDEX_REQUIRED if reads keep returning the same value.
  */
 
+static __inline void
+rtcindex(u_char reg)
+{
+   if (rtc_reg != reg) {
+   if (rtc_reg != RTC_REINDEX_REQUIRED)
+   rtc_reg = reg;
+   outb(IO_RTC, reg | nmi_enable);
+   rtc_iodelay();
+   }
+}
+
 int
 rtcin(int reg)
 {
u_char val;
 
RTC_LOCK;
-   if (rtc_reg != reg) {
-   inb(0x84);
-   outb(IO_RTC, reg);
-   rtc_reg = reg;
-   inb(0x84);
-   }
+   rtcindex(reg);
val = inb(IO_RTC + 1);
RTC_UNLOCK;
return (val);
@@ -87,14 +118,9 @@ writertc(int reg, u_char val)
 {
 
RTC_LOCK;
-   if (rtc_reg != reg) {
-   inb(0x84);
-   outb(IO_RTC, reg);
-   rtc_reg

Re: backup BIOS settings

2012-01-09 Thread Ian Lepore
On Tue, 2012-01-10 at 04:01 +0100, Łukasz Kurek wrote:
 Hi,
 Is it possible to back up BIOS settings (CMOS configuration) to a file and 
 restore those settings on another machine (the same hardware configuration 
 and the same BIOS)?
 
 I tried to do it this way:
 
 kldload nvram
 
 dd if=/dev/nvram of=nvram.bin   (backup)
 
 dd if=nvram.bin of=/dev/nvram   (restore)
 
 
 but this way always loads the default BIOS settings, not mine (probably there 
 is some kind of error).

Examine the contents of the nvram.bin file with hexdump.  If every byte
has the same value, I just posted a patch to this list earlier today
(subject is trouble with atrtc) that will fix the problem.

Many new RTC chipsets have more than the original 114 bytes of nvram.
The nvram driver doesn't currently provide access to the extra banks.
I'm not sure whether the BIOS would store anything in those other banks,
but if so, failing to save and restore those values might cause the
behavior you see.

Also, it's not directly related to your question, but I notice the
nvram(4) manpage says the driver does nothing about the checksum, but
looking at the driver code, it does recalculate the checksum when it
writes to nvram.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: backup BIOS settings

2012-01-09 Thread Ian Lepore
On Tue, 2012-01-10 at 04:01 +0100, Łukasz Kurek wrote:
 Hi,
 Is it possible to back up BIOS settings (CMOS configuration) to a file and 
 restore those settings on another machine (the same hardware configuration 
 and the same BIOS)?
 
 I tried to do it this way:
 
 kldload nvram
 
 dd if=/dev/nvram of=nvram.bin   (backup)
 
 dd if=nvram.bin of=/dev/nvram   (restore)
 
 
 but this way always loads the default BIOS settings, not mine (probably there 
 is some kind of error).

Oh wait, the patch I posted can't help, because it fixes a problem that
only happens when you read the same location repeatedly, and the nvram
driver never does that.  But it would still be interesting to examine
the nvram.bin file and see if it looks reasonable.

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

