Re: general Darwin imports (was Re: Darwin cmd import?)

2004-06-02 Thread Robert Watson
On Wed, 2 Jun 2004, Michael W. Lucas wrote:

 On Sat, May 29, 2004 at 07:55:21PM -0400, Robert Watson wrote:
  The FreeBSD Core Team took a look at the APSL a while back, and decided
  that similar to LGPL/GPL, it was an acceptable license for use in
  userspace for stand-alone tools, but that similar protections to LGPL/GPL
  would be required for kernel code (not built by default, carefully marked,
  etc).  That said, Apple tends to release only code they've heavily
  rewritten or created from scratch under APSL; code they modify tends to
  remain under the existing license (CMU, BSD, etc).  Generally they're
  careful to label the license on the download page.
 
 I'm writing an article about Apple's licensing and returning code to the
 community, but if you want to become a committer read this: 
 
 Apple has made a lot of improvements to various FreeBSD utilities, and
 re-released them under the original licensing.  This provides an
 excellent source of patches. 
 
 People may gripe about Apple not returning stuff to the open source
 community.  The truth is, they have.  They aren't responsible for
 converting what they return into a format we can use, but they haven't
 deliberately obfuscated their code.  Sorting out the diffs would be a
 pain, but not horribly difficult. 
 
 According to Jordan Hubbard, the best source of low-hanging fruit is
 their modified libc.  They've had people work out all sorts of bugs,
 clean up functions, improve performance, etc.  Libc changes require
 extensive testing.  They also have wide-reaching benefits.  It's still
 BSDL'd, so we can take back whatever we want.
 
 If you want a commit bit, go and pick some of this fruit and send-pr it. 

I would also add that Apple has worked hard to improve their interaction
on the open source licensing front.  APSLv2 is a dramatic improvement over
APSLv1.  They've also been working internally to improve their ability to
return changes under non-APSL licenses, and recently released several new
components in the new Darwin drop under the Berkeley license at my request.  There
are some areas where I don't think we'll see any license movement (HFS+,
for one thing), but there are other areas where (at least from the
outside) it appears Apple recognizes the benefit of widespread use of the
code, community participation, etc.  And I'm happy for us to prove Apple
right by adopting their pieces in sensible ways, improving them, and
pointing them at the improvements.  :-) 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Darwin cmd import?

2004-05-29 Thread Robert Watson

On Fri, 28 May 2004, Cyrille Lefevre wrote:

 regarding the APSL (http://www.opensource.apple.com/apsl/), do you think
 it is possible to import some darwin commands w/ mods? 
 
 for instance, I'm thinking of decomment and relpath from bootstrap_cmds,
 sadc and sar from system_cmds, and maybe some others in the future. 
 
 also, how about importing NetBSD shlock? and CMU md (a sort of mkdep in
 C)? 
 
 PS: decomment and relpath only need some mods, while sadc would need a
 large amount of mods; don't know about sar. 

The FreeBSD Core Team took a look at the APSL a while back, and decided
that similar to LGPL/GPL, it was an acceptable license for use in
userspace for stand-alone tools, but that similar protections to LGPL/GPL
would be required for kernel code (not built by default, carefully marked,
etc).  That said, Apple tends to release only code they've heavily
rewritten or created from scratch under APSL; code they modify tends to
remain under the existing license (CMU, BSD, etc).  Generally they're
careful to label the license on the download page.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: implications of SMP kernel on UP

2004-04-02 Thread Robert Watson

On Thu, 1 Apr 2004, Bjoern A. Zeeb wrote:

 what are the implications of running an SMP enabled kernel on a UP
 machine? 
 
 I first thought of things like:
 - performance (most likely not worth the discussion?)
 - additional locking problems?
 - ... ?
 
 Or asked the other way round: why would I want to disable SMP on a
 kernel that is going to run on a UP machine? 

I've observed substantial performance overhead from enabling SMP on UP
boxes.  However, increasing numbers of UP boxes ship with HTT, blurring
the picture a little.  There are at least two issues associated with
enabling SMP on UP boxes:

- First, we use the IO APIC, which has caused compatibility problems with
  some systems (likely actually ACPI problems?), as well as a slightly
  higher cost to interrupt handling.

- Second, we use locked operations for locks, increasing their cost.

I've spent a bit of time trimming some gratuitous locking from the system
call path, so it should actually be a bit better than it was previously,
but the upshot is that if you want optimal performance on UP, you should
compile out both apic and SMP.  Peter and I have had conversations about
creating HAL modules that plug and play locking operations, optimized
copies, and so on for the kernel, and improving the run-time pluggability
of SMP (et al), but haven't made any progress.

It's worth noting, FYI, that we always compile modules with locked atomic
operations so that one module will work on UP and SMP...
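
To make the point about locked operations concrete, here is a minimal
userland sketch, assuming a C11 compiler with <stdatomic.h>.  It is not
kernel code, and the function and counter names are invented for
illustration.  It contrasts a plain increment with an atomic
read-modify-write, which on x86 typically compiles to a lock-prefixed
instruction:

```c
#include <stdatomic.h>

/* Hypothetical counters, for illustration only. */
static unsigned long plain_counter;
static atomic_ulong locked_counter;

/* Plain increment: cheap, but only safe when nothing else can race with it. */
unsigned long
bump_plain(unsigned long n)
{
	for (unsigned long i = 0; i < n; i++)
		plain_counter++;
	return (plain_counter);
}

/*
 * Atomic increment: safe across CPUs.  On x86 this is typically a
 * lock-prefixed instruction, which costs more per operation.
 */
unsigned long
bump_locked(unsigned long n)
{
	for (unsigned long i = 0; i < n; i++)
		atomic_fetch_add(&locked_counter, 1);
	return (atomic_load(&locked_counter));
}
```

Timing the two functions with a large n gives a rough feel for the
per-operation overhead; the actual numbers will vary by CPU and compiler.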

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: UT2004?

2004-04-02 Thread Robert Watson

On Wed, 31 Mar 2004, Kris Kennaway wrote:

 On Thu, Apr 01, 2004 at 03:26:27PM +0930, Daniel O'Connor wrote:
  Has anyone got it installed under FreeBSD?
  I got the demo to run and install pretty well (for some reason I can't play it 
  in KDE, I have to drop back to twm otherwise my system hangs), but the full 
  game doesn't install :(
  
  I have tried both the DVD edition and the 6 CD version.. It doesn't appear to 
  detect that I have mounted a new disk and so I can't get past installing the 
  first disk's worth of stuff.
  
  I run the installer like so 
  sudo /compat/linux/bin/sh /cdrom/linux-installer.sh
  
  and pick /usr/local/ut2004 as the place to install it.
  
  I have ktrace'd it and when I click 'Yes' on the CDROM prompt it only seems to 
  try and open fstab and mtab. It ends up with a FreeBSD fstab 
  and /compat/linux/etc/mtab which is a zero length file.
 
 Is it expecting /compat/linux/etc/mtab to be updated somehow when you
 mount the new disk? 

linprocfs exports an mtab file from the kernel that's appropriate for use
as a substitute, I believe.  You can try symlinking etc/mtab to
procfs/mtab in the linux namespace.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: implications of SMP kernel on UP

2004-04-02 Thread Robert Watson

On Thu, 1 Apr 2004, Thierry Herbelot wrote:

 On Thursday 01 April 2004 09:10, Bjoern A. Zeeb wrote:
  Hi,
 
  what are the implications of running an SMP enabled kernel on a UP
  machine?
 
  I first thought of things like:
  - performance (most likely not worth the discussion?)
 
 I got an improvement by a factor of ten between an SMP and a UP kernel
 on an HTT-enabled P4/2.6GHz/800MHz FSB on network transfers (with gigabit
 Ethernet boards: SMP gives about 6MB/s for FTP transfer rate, and UP
 gives up to 75MB/s) 
 
 So: as long as the network stack is not fully locked (this is coming -
 perhaps for 5.3), a server should definitely run a UP kernel. 

I would instead phrase this as "A kernel-bound network server may benefit
from running a UP kernel."  For compute-bound tasks, running SMP has
pretty dramatic effects :-).  It's also worth pointing out that in many
existing configurations, even with Giant over the network stack, we
already see performance benefits running 5.x with SMP over 4.x with SMP.
BTW, look for network locking patches coming to the arch@ mailing list in
the next couple of days to try out (subject to limitations).

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: Question regarding shell user creation at login time

2004-03-28 Thread Robert Watson

On Mon, 29 Mar 2004, Ganbold wrote:

 Hi,
 
 I traced sshd using ktrace and it says:
 ..
   10198 new  CALL  setuid(0)
   10198 new  RET   setuid -1 errno 1 Operation not permitted
   10198 new  CALL  execve(0x80485d0,0xbfbfed8c,0xbfbfed94)
   10198 new  NAMI  /home/new/new.pl
   10198 new  RET   execve -1 errno 13 Permission denied
   10198 new  CALL  exit(0x)
 .

Don't you mean to be running /home/new/new instead?  new.pl isn't world
readable/executable. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research


 
 My C program is:
 
 #include <unistd.h>
 
 main(ac, av)
  char **av;
 {
  setuid(0);
  execv("/home/new/new.pl", av);
 }
 
 Directory:
 
 public# ls -la ~new
 total 46
 drwxr-xr-x  2 root  wheel512 Mar 29 09:10 .
 drwxr-xr-x  8 root  wheel512 Mar 25 15:28 ..
 -r--r-  1 root  new  767 Mar 24 17:43 .cshrc
 -r--r-  1 root  new  248 Mar 26 12:32 .login
 -r--r-  1 root  new  158 Mar 24 17:43 .login_conf
 -r--r-  1 root  new  373 Mar 24 17:43 .mail_aliases
 -r--r-  1 root  new  331 Mar 24 17:43 .mailrc
 -r--r-  1 root  new  797 Mar 24 17:43 .profile
 -r--r-  1 root  new  276 Mar 24 17:43 .rhosts
 -r--r-  1 root  new  975 Mar 24 17:43 .shrc
 -rwsr-x---  1 root  new 4651 Mar 26 08:47 new
 --  1 root  wheel 94 Mar 26 08:47 new.c
 -r-x--  1 root  wheel  15430 Mar 25 15:16 new.pl
 -rw-r--r--  1 root  wheel 52 Mar 25 16:52 new.sh
 
 
 Can somebody tell me the reason why it failed?
 
 Thanks in advance,
 
 Ganbold
 


Re: usermode linux on BSD?

2004-03-10 Thread Robert Watson

On Wed, 10 Mar 2004, David Gilbert wrote:

 Has anyone made an attempt to run usermode linux on FreeBSD?  Is the
 issue-list long? 

There was a neat paper at BSDCon 2003 discussing running usermode FreeBSD
on Linux, and it talked about what would be necessary to make usermode
FreeBSD run on FreeBSD.  You can find the paper off the USENIX web site,
or perhaps via Google.  I think it was a relatively small set of changes.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research


 
 Dave.
 
 -- 
 
 |David Gilbert, Independent Contractor.   | Two things can only be |
 |Mail:   [EMAIL PROTECTED]|  equal if and only if they |
 |http://daveg.ca  |   are precisely opposite.  |


Re: a serious error in sched_ule.c?

2004-03-09 Thread Robert Watson

On 9 Mar 2004, Bin Ren wrote:

 Hi, all:
 
 I've been reading sched_ule.c and seem to find a serious error:
 
 in 'sched_slice()':
 
  * Rationale:
  * KSEs in interactive ksegs get the minimum slice so that we
  * quickly notice if it abuses its advantage.
 
 Then, there is:
 
 if (!SCHED_INTERACTIVE(kg)) {
 .
 .
 } else
 ke->ke_slice = SCHED_SLICE_INTERACTIVE;
 
 Then, at the beginning of the file, there is:
 
 #define SCHED_SLICE_INTERACTIVE (slice_max)
 
 
 (slice_max) for interactive KSEs?  Either this is a serious mistake or 
 I'm seriously missing something here.

I believe the comment and the code have simply fallen out of sync.
The code was changed to give interactive applications the maximum slice
because non-CPU-intensive X11 applications are marked as interactive, and
with a short slice their redraws got interrupted.  When the change went in
to increase the time slice, I saw an observable improvement in the redraws
of X11 apps under load.
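
As a simplified model of the behaviour described here (not the actual
sched_ule.c code), the sketch below gives interactive groups the maximum
slice and scales everyone else down linearly.  The constants, the linear
scaling, and the function name are all assumptions made for illustration.

```c
/* Hypothetical tunables; the real values in sched_ule.c depend on hz. */
#define	SLICE_MIN	1
#define	SLICE_MAX	14
#define	NICE_STEPS	20	/* assumed number of nice levels to scale over */

/*
 * Simplified model of slice selection.  Interactive groups get the
 * maximum slice (matching the code, not the stale comment); others are
 * scaled down linearly as `nice_offset` grows from 0 to NICE_STEPS.
 */
int
model_slice(int interactive, int nice_offset)
{
	if (interactive)
		return (SLICE_MAX);
	if (nice_offset < 0)
		nice_offset = 0;
	if (nice_offset > NICE_STEPS)
		nice_offset = NICE_STEPS;
	return (SLICE_MAX -
	    (nice_offset * (SLICE_MAX - SLICE_MIN)) / NICE_STEPS);
}
```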

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research





Re: Strange problem with vnodes and sockets

2004-03-07 Thread Robert Watson

On Sun, 7 Mar 2004, Kiss Tibor wrote:

 I want to create a small kernel module which logs socket operations.
 So in my module I have a socket structure, and I want to know which
 process (thread) owns it. I tried to solve the problem this way: 

Sockets, as with files, can be referenced by more than one process at a
time.  While there is only one process that has created any given socket,
references to the socket can be inherited by processes forked from it, as
well as passed using UNIX domain sockets.  As such, there really isn't a
notion of owner.  so_cred is a cached reference to the process
credential of the process that created the socket...
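
A small userland sketch of the inheritance case, assuming a POSIX system:
the parent creates a socket pair, forks, and the child writes through the
descriptor it inherited.  The function name is invented for the example.
After the fork, two processes hold references to a socket that only one of
them created.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a socket pair, fork, and let the child write through its
 * inherited reference.  Returns 1 if the parent reads the child's byte,
 * 0 if not, -1 on setup failure.
 */
int
socket_shared_after_fork(void)
{
	int sv[2];
	char c = 0;
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return (-1);
	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	if (pid == 0) {
		/* Child: sv[1] refers to the socket the parent created. */
		if (write(sv[1], "x", 1) != 1)
			_exit(1);
		_exit(0);
	}
	close(sv[1]);	/* parent drops its reference; the child still has one */
	if (read(sv[0], &c, 1) != 1)
		c = 0;
	close(sv[0]);
	(void)waitpid(pid, NULL, 0);
	return (c == 'x');
}
```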

 So how can the v_type be 2048? v_type is an enum (vnode.h) with 10
 options:  enum vtype { VNON, VREG, VDIR, VBLK, VCHR, VLNK, VSOCK,
 VFIFO, VBAD }; 
 
 And the real problem is: why doesn't that code find any VSOCK type vnode
 in the active process list? And how can I find the proc struct for a
 socket? :) 

VSOCK vnodes are rendezvous points for UNIX domain socket communication,
not the actual communication vehicles themselves.  Very few UNIX domain
sockets are used in normal operation, but you might take a look at
/var/run/log, and at the file descriptors in various processes that
reference sockets connected to the log subsystem.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: Standard sbc and pcm support in GENERIC kernel?

2004-03-04 Thread Robert Watson

On Wed, 3 Mar 2004, Randy Pratt wrote:

 On Wed,  3 Mar 2004 17:03:40 +0100 (CET) you wrote:
 
  I've been on the question list for some time, and I have noticed
  that many people do not know how to get sound support up and
  running in FreeBSD 5.X. I know that re-compiling the kernel is easy
  enough, but there are still people not willing to do so, as I have
  noticed on the list. Therefore I thought it might be an idea to put
  sound support in the GENERIC kernel configuration, so that newbies
  will no longer find themselves stuck with that.
 
 I think I've read more than one time about problems fitting the
 installation on the 1.44M floppies. 
 
 Definitely a bikeshed discussion but adding to the documentation
 regarding kldload or a knob in sysinstall to turn on all sound modules
 is preferable to adding to the kernel. 

Actually, pcm was in GENERIC for a while, but was removed because it
caused hangs on boot with a common line of Dell Latitude notebooks at the
time.  The problem is likely now fixed, and I'd certainly not object to it
being in GENERIC as long as there are no similar widespread issues now.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: Looking for static analysis tool to generate call graphs

2004-03-03 Thread Robert Watson

On Wed, 3 Mar 2004, Zajcev Evgeny wrote:

 Robert Watson [EMAIL PROTECTED] writes:
 
  Well, using a scary combination of grep, awk, a long list of "omit this"
  regexps, and prcc from cflow, I got the following: 
 
  http://www.watson.org/~robert/freebsd/20040302-sockets.ps
 
 Actually it looks kind of a mess.  Maybe use dot's clustering or ranking to
 organize callgraph a little? 

Part of that is because things are somewhat convoluted :-).

I've applied some of your suggestions, as well as a bit more
noise-trimming and clarification, to:

  http://www.watson.org/~robert/freebsd/20040303-sockets.ps

Hopefully this is somewhat of an improvement.  I also added some more of
the socketvar.h macros to my hinted edges -- apparently prcc doesn't
really do much with our large macros, and so I was missing some edges.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: Looking for static analysis tool to generate call graphs

2004-03-03 Thread Robert Watson

On Wed, 3 Mar 2004, Dag-Erling Smørgrav wrote:

 Robert Watson [EMAIL PROTECTED] writes:
  Well, using a scary combination of grep, awk, a long list of "omit this"
  regexps, and prcc from cflow, I got the following: 
 
  http://www.watson.org/~robert/freebsd/20040302-sockets.ps
 
  Duck and cover. 
 
 Hmm, is there any way you can try to group functions with similar names
 together?  For instance, functions whose names match /^fd.*/ call mostly
 each other, and the graph would be a lot cleaner if they were placed
 close together. 

In the most recent revision, I've tried to assign the same rank and color
to certain classes of functions: 

  System Calls (accept, bind, close, connect, dup, ...)

  Protocol Switch (pru_accept, pru_attach, pru_bind, pr_ctloutput, ...)

  File Descriptor Switch (fo_read, fo_write, fo_poll, ...)

  Socket File Descriptor Functions (soo_read, soo_write, ...)

In addition, I assigned the same color to certain classes of functions:

  Almost System Calls (kern_bind, kern_connect, accept1, ...)

  Protocol Upcalls to Socket Layer (soisconnected, soisdisconnected, ...)

I'm going to experiment with grouping later today.
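
One way to automate part of the classification above is to key it on name
prefixes.  The sketch below is only an illustration: the table, the colors,
and the function name are invented here and are not taken from the scripts
that produced the graph.  The returned string could then be emitted as a
Graphviz node color.

```c
#include <string.h>

/* Hypothetical prefix-to-color table; the colors are arbitrary. */
static const struct {
	const char *prefix;
	const char *color;
} name_classes[] = {
	{ "pru_",  "blue"   },	/* protocol switch */
	{ "pr_",   "blue"   },
	{ "fo_",   "green"  },	/* file descriptor switch */
	{ "soo_",  "orange" },	/* socket file descriptor functions */
	{ "kern_", "gray"   },	/* "almost" system calls */
	{ "sois",  "red"    },	/* protocol upcalls to the socket layer */
};

/* Return the color for `func`, or "black" if no class matches. */
const char *
node_color(const char *func)
{
	size_t i;

	for (i = 0; i < sizeof(name_classes) / sizeof(name_classes[0]); i++) {
		if (strncmp(func, name_classes[i].prefix,
		    strlen(name_classes[i].prefix)) == 0)
			return (name_classes[i].color);
	}
	return ("black");
}
```

System calls such as accept and bind share no prefix, so they would still
need an explicit list.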

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research



Re: Looking for static analysis tool to generate call graphs

2004-03-02 Thread Robert Watson
On Mon, 1 Mar 2004, Robert Watson wrote:

 I'd like to generate static call graphs from sections of src/sys/kern,
 src/sys/net, and src/sys/netinet, and ideally, get an output that looks
 pretty when printed to a (perhaps large) piece of paper.  It doesn't
 need to be able to handle function pointer magic in structures (vnode
 operations, socket operations, file descriptor operations, sysinits,
 etc); I just want a fairly high-level graph to get a feel for particular
 chunks of code spanning a couple of C files.  Anyone have any
 recommendations? Preferably something that can actually parse the
 variant of C we use in our kernel :-).

Well, using a scary combination of grep, awk, a long list of "omit this"
regexps, and prcc from cflow, I got the following: 

http://www.watson.org/~robert/freebsd/20040302-sockets.ps

Duck and cover. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Looking for static analysis tool to generate call graphs

2004-03-01 Thread Robert Watson

I'd like to generate static call graphs from sections of src/sys/kern,
src/sys/net, and src/sys/netinet, and ideally, get an output that looks
pretty when printed to a (perhaps large) piece of paper.  It doesn't need
to be able to handle function pointer magic in structures (vnode
operations, socket operations, file descriptor operations, sysinits, etc); 
I just want a fairly high-level graph to get a feel for particular chunks
of code spanning a couple of C files.  Anyone have any recommendations? 
Preferably something that can actually parse the variant of C we use in
our kernel :-). 

Thanks,

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research



RE: em0, polling performance, P4 2.8ghz FSB 800mhz

2004-02-29 Thread Robert Watson

On Sun, 29 Feb 2004, Mike Silbersack wrote:

 On Sat, 28 Feb 2004, Don Bowman wrote:
 
  this would only allow 2 concurrent TCP sessions per unique
  source address. Depends on the syn flood you are expecting
  to experience. You could also use dummynet to shape syn
  traffic to a fixed level i suppose.
 
 Does that really help?  If so, we need to optimize the syncache. :(

Given that we have syncookie support, the other thing we could consider
doing under high syn load is simply to drop the syncache from the loop
entirely.  The syncache provides us with the ability to gracefully
degrade as the syn rate goes up, but the FIFO cache bucket overflow
handling means we pay the cost of syncache entry allocation even in the
high load situation.  It might be interesting to measure when syncache
overflow is taking place, and simply drop it from the loop under a rate
known to exceed the syncache capacity, then re-enable it again once the
rate drops.  This would remove a memory allocation, queue walking, and in
the case of an SMP system, locking, from the syn handling path.
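
The enable/disable idea can be sketched as a small hysteresis policy.  This
is a userland model only: the thresholds, the structure, and the function
name are hypothetical, and a real implementation would need a measured
overflow rate rather than made-up constants.

```c
/* Hypothetical thresholds in SYNs per second; real values would be measured. */
#define	BYPASS_ABOVE	20000
#define	RESUME_BELOW	10000

struct syn_policy {
	int bypass;	/* nonzero: skip the syncache, rely on syncookies only */
};

/*
 * Update the policy from the most recently measured SYN rate and return
 * nonzero if the syncache should be used for new SYNs.
 */
int
syn_use_syncache(struct syn_policy *p, unsigned int syn_rate)
{
	if (!p->bypass && syn_rate > BYPASS_ABOVE)
		p->bypass = 1;
	else if (p->bypass && syn_rate < RESUME_BELOW)
		p->bypass = 0;
	return (!p->bypass);
}
```

Using two thresholds rather than one keeps the policy from flapping when
the rate hovers around the syncache's capacity.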

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research



Re: Accessing sysctls from kernel

2004-02-26 Thread Robert Watson

On Thu, 26 Feb 2004, Bruce M Simpson wrote:

 On Thu, Feb 26, 2004 at 02:10:40PM +0100, Ivan Voras wrote:
  In sys/sys/sysctl.h I see function kernel_sysctlbyname() that looks (to 
  me) to be intended for accessing sysctl values from kernel, but for its 
  first parameter it requires a struct thread *td.
  
  What should I pass to it? (I'm calling it from inside a screensaver module)
 
 You could try lying about which thread you are, when you aren't in a
 userland thread: 
 
 Cscope tag: kernel_sysctlbyname
    #   line  filename / context / line
    1    728  /sys/dev/vinum/vinumio.c vinum_scandisk
              error = kernel_sysctlbyname(&thread0, "kern.disks", NULL,
    2    741  /sys/dev/vinum/vinumio.c vinum_scandisk
              kernel_sysctlbyname(&thread0, "kern.disks", devicename,
    3    305  /sys/i386/i386/elan-mmcr.c init_AMD_Elan_sc520
              i = kernel_sysctlbyname(&thread0, "machdep.i8254_freq",

FWIW, the thread exists in the context of a sysctl for several reasons --
one is to provide access to the requesting process's address space,
another is the credential authorizing the change.  While there are calls
kernel_sysctl() and kernel_sysctlbyname(), those are generally intended
for consumption on behalf of a user process.  My general preference
would be to offer an in-kernel API to manage whatever service is being
accessed if it's being done in the kernel on behalf of the kernel,
rather than trying to force the access through the current sysctl MIB.
That way you don't find unnecessary references to thread0, etc, which have
some dubious locking properties, as well as abuse of credentials, etc,
that may have unexpected side effects with less traditional security
models. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: 5.2.1 make installword trashed my system

2004-02-26 Thread Robert Watson

On Thu, 26 Feb 2004, db wrote:

 I've been running 5.2 on my laptop (i386 Acer)  for some time now. A few
 hours ago I downloaded the 5.2.1-release source, ran buildworld, buildkernel,
 installkernel, but after a few minutes of installworld my system froze.
 Now when I try to boot I get: 

Ouch.

 Mounting root from ufs: /dev/ad0s1a
 WARNING: / was not properly dismounted
 exec /sbin/init: error 8
 /sbin/sysctl: 1: Syntax error: ; unexpected
 /sbin/sysctl: 1: Syntax error: ; unexpected
 
 Thu Feb 26 16:24:26 CET 2004
 Feb 26 16:24:26 init: can't exec getty '/usr/libexec/getty' for port
 /dev/ttyv0: No such file or directory
 and so on
 
 So the question is: Now what? I can boot in single user mode and get a
 shell, but very few programs work and I can't mount anything? 

It seems there might be three fairly straight-forward choices, not sure
which are options for you: 

(1) Download the 5.2.1 ISO, and do a binary update, which will simply slap
    down the 5.2.1 binaries over whatever is on the disk (a pretty blend
    of 5.2 and 5.2.1, no doubt). 

(2) In general, the 5.2 and 5.2.1 binaries are about the same in userspace
    -- most changes were to the kernel, and no ABIs were changed.  So it
    sounds like maybe init got toasted, a few shared libraries, etc.  Boot
    to single-user mode using /rescue/init and/or /rescue/sh, and manually
    update binaries from your object tree until you can successfully kick
    off an installworld.  You can use /rescue/cp to do most of this.  I'd
    start by copying /usr/obj/usr/src/sbin/init to /sbin/init, and hitting
    a couple of the key libraries (libc, libutil, for example).  As soon
    as you can boot normally to single user mode and use the existing
    tools, restart installworld.  By looking at the dates in /bin, /sbin,
    etc, you can probably figure out where it gave up, and what only got
    partially completed.

(3) If you can NFS boot the system, perhaps using PXE, you can do an
    installworld over the network.  This is generally easy if you already
    have a PXE setup, but otherwise hard as you have to figure out PXE.

None of this addresses the hang you saw -- once you've gotten the system
up and running properly, if you are still experiencing the hang, we should
see if we can figure out what it is that's hanging.  However, that will be
very hard to do in the partially updated configuration, so I think the
best bet is to try and get the update finished.

Good luck!

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research





Re: use after free bugs

2004-02-21 Thread Robert Watson
On Fri, 20 Feb 2004, John Baldwin wrote:

 On Thursday 19 February 2004 08:43 pm, Ted Unangst wrote:
  Hi.  These are some bugs found by Coverity in a static analysis run on the
  FreeBSD kernel.  All these are use after free bugs.
 
 Thanks for the excellent bug reports! 

I wonder if the same approach relating to memory allocation and free
checking via static analysis could be applied to locking and unlocking of
locks?  I.e.:

- We don't release locks more than once.

- We don't forget to unlock.

- We hold a lock before accessing certain fields (defined by annotation)
  of a structure.
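
To pin down what the three checks mean, here is a toy run-time version of
them.  It is explicitly not static analysis and not a real mutex: the
structure and function names are invented for this sketch, which only
records the same three kinds of misuse that a static checker would try to
find without running the code.

```c
/* A toy lock that records misuse instead of blocking; not a real mutex. */
struct checked_lock {
	int held;		/* 1 while locked */
	int double_unlocks;	/* unlocks attempted while not held */
	int unlocked_accesses;	/* guarded-field writes without the lock */
	int protected_field;	/* the field the lock is supposed to guard */
};

void
cl_lock(struct checked_lock *l)
{
	l->held = 1;
}

void
cl_unlock(struct checked_lock *l)
{
	if (!l->held)
		l->double_unlocks++;	/* released more than once */
	l->held = 0;
}

/* Every write to the guarded field goes through here so it can be checked. */
void
cl_set(struct checked_lock *l, int v)
{
	if (!l->held)
		l->unlocked_accesses++;	/* field touched without the lock */
	l->protected_field = v;
}

/* Nonzero if a code path ended with the lock still held. */
int
cl_leaked(const struct checked_lock *l)
{
	return (l->held);
}
```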

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: malloc backed md/mfs filesystem swapped?

2004-02-13 Thread Robert Watson

On Fri, 13 Feb 2004, Andrew J Caines wrote:

 After R'ing the various FMs including, but not limited to, mdmfs(8),
 mdconfig(8), and malloc(9), I am unclear whether the memory used by md
 of type MD_MALLOC is kernel memory which will not be swapped. 
 
 On the same subject, does the MD_SWAP backed device simply use
 swappable userland VM or does it specifically use a piece of the
 (presumably) disk backed swap partition? 
 
 FYI, the relevant fstab entries for a malloc backed disk having a UFS2
 with softupdates and async would look like: 

Malloc-backed md devices will be backed by unpageable kernel address
space, and doing this with anything but a very small virtual disk will
result in a kernel panic once the pages are allocated and the rest of the
kernel runs out of address space and memory.  Swap-backed md devices will
be backed by pageable memory, but I'm not sure what the practical limits
(if any) are for address space concerns.  In general, I use malloc-backed
disks only for diskless systems, and then, only in a sparing way.  If you
have swap available, you pretty much always want to use swap-backing for
memory disks -- if there's room in memory they will run as fast as
malloc-backed, but you don't have to be as worried about the "Oh shoot,
I'm out of room" case.  I use a pretty large swap-backed file system for
/tmp on almost all of my production systems, since swap is cheap, and most
of the time so is memory. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research

 
 md  /tmp      mfs  rw,-M,-s128m,async  2  0
 md  /var/run  mfs  rw,-M,-s1m,async    2  0
 
 
 -Andrew-
 -- 
  ___
 | -Andrew J. Caines-   Unix Systems Engineer   [EMAIL PROTECTED]  |
 | They that can give up essential liberty to obtain a little temporary |
 |  safety deserve neither liberty nor safety - Benjamin Franklin, 1759 |


Re: malloc backed md/mfs filesystem swapped?

2004-02-13 Thread Robert Watson

On Sat, 14 Feb 2004, Colin Percival wrote:

 At 00:56 14/02/2004, Robert Watson wrote:
 If you
 have swap available, you pretty much always want to use swap-backing for
 memory disks -- if there's room in memory they will run as fast as
 malloc-backed, but you don't have to be as worried about the "Oh shoot,
 I'm out of room" case.
 
 Actually, there is one consideration: swap-backed memory disks have a
 sector size equal to the machine page size.  This will result in some
 inflation in memory usage, and can confuse programs which expect a sector
 size of 512 bytes (for example, dd, which I plan on fixing but I haven't
 gotten around to yet). 

One such application is Vinum, actually, which does not like using
swap-backed storage nodes, although maybe I fixed that. 
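
As a worked example of the sector-size point quoted above: storage is
consumed in whole sectors, so a small object costs one full sector no
matter how little of it is used.  The helper below is plain arithmetic with
a made-up name.  The 4096-byte figure in the note that follows assumes an
i386-style page size, and a real file system adds its own block and
fragment rounding on top.

```c
#include <stddef.h>

/* Bytes of backing store consumed when `len` bytes occupy whole sectors. */
size_t
sectors_used(size_t len, size_t sector_size)
{
	return (((len + sector_size - 1) / sector_size) * sector_size);
}
```

With 512-byte sectors, 100 bytes of data occupy 512 bytes; with a 4096-byte
page as the sector, the same data occupies 4096.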

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: kernel threads

2004-01-29 Thread Robert Watson

On Wed, 28 Jan 2004, Julian Elischer wrote:

 the KSE stuff requires too much assistance from the Userland Thread
 scheduler. 
 
 HOWEVER it is possible that kthreads may one day be implemented as
 multiple threads of a single kernel process..  (but not yet) 

John has been talking about doing this for a while -- clustering the
kernel threads into a smaller number of kernel processes or a single
kernel process.  This is the approach Darwin takes as well, FWIW -- they
have a kernel_task in which all the various kernel threads hang out, which
avoids the overhead of full processes, as well as the emotional baggage. 
I think I saw John put it on his TODO list in Perforce, so maybe it's
coming soon :-). 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: FreeBSD Status Report for Oct-Dec 2003

2004-01-29 Thread Robert Watson

On Thu, 29 Jan 2004, Danny Braniss wrote:

 thanks!  with so much garbage/software/noise around it's difficult to
 see the gems.  and hearing from first hand is very important.  true also
 that google hit it first, but you provided the missing link. 

If you want to peruse the FreeBSD perforce server, you can visit:

 http://perforce.freebsd.org/

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research


 
 danny
 
  www.perforce.com
  
  Simply put, Perforce is a source control management tool
  that is very oriented towards easily managing multiple development
  streams and easily integrating changes between them.  Whereas branching
  in CVS is expensive and hard to manage, Perforce makes it very, very
  easy.  So it's an ideal tool for managing lots of parallel projects
  that may or may not be related.
  
  Scott
  
 
 


Re: kernel threads

2004-01-27 Thread Robert Watson

On Tue, 27 Jan 2004, Renaud Molla Wanadoo wrote:

 I'm trying to use the kthread library under 5.2-RELEASE but can't
 compile my program (which actually only tries to create a thread). 
 
 I've read that there is now KSE to create kernel threads, but i am
 wondering if it could be used within the kernel code. 

I'm left a little unclear by your message what it is you're trying to do. 
In traditional parlance, a kernel thread is a thread executing kernel
code in the kernel.  These are created using the kthread(9) API, which is
available both to kernel modules and code compiled directly into the
kernel.  You can see examples of kthread use (both compiled in and in
modules) by grepping in the src/sys/kern and src/sys/dev/* trees.  The
only real caveat here that I know of is that you need to grab the Giant
lock if your thread will use it, since kthreads don't start holding Giant,
and that if you call kthread_exit(), you will need to grab Giant before
that. 

A use of the term "kernel thread" popularized by Linux is the idea of
userspace threads that are each backed by a kernel-schedulable thread, as
opposed to multiple userspace threads being mapped onto a single thread
making up a single process.  In FreeBSD 5.x, the libc_r library provides
multiple user threads multiplexed onto a single kernel-visible
thread/process.  libkse and libthr provide M:N and 1:1 models,
respectively.  By linking your application against libkse or libthr and
using the pthreads API, you will automatically get parallelism and latency
improvements over libc_r.
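
For the userspace case, a minimal sketch of the pthreads API might look
like the following.  It uses only pthread_create() and pthread_join(); the
struct, the worker function and NWORKERS are invented for the example, and
which thread library it ends up running on depends purely on what you link
against.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NWORKERS 4

struct job {
    int id;      /* input: worker number */
    int result;  /* output: written by the worker */
};

/* Thread body: each worker squares its own id. */
static void *
worker(void *arg)
{
    struct job *j = arg;

    assert(j != NULL);
    j->result = j->id * j->id;
    return NULL;
}

/* Start NWORKERS threads, wait for all of them, and return the sum of
 * their results.  Returns -1 if a thread could not be created or joined. */
int
run_workers(void)
{
    pthread_t tid[NWORKERS];
    struct job jobs[NWORKERS];
    int i, started, sum = 0, failed = 0;

    for (started = 0; started < NWORKERS; started++) {
        jobs[started].id = started;
        jobs[started].result = 0;
        if (pthread_create(&tid[started], NULL, worker,
            &jobs[started]) != 0) {
            failed = 1;
            break;
        }
    }
    /* Join only the threads that were actually started. */
    for (i = 0; i < started; i++) {
        if (pthread_join(tid[i], NULL) != 0)
            failed = 1;
        sum += jobs[i].result;
    }
    return failed ? -1 : sum;
}
```

Each worker writes only to its own struct job, so no locking is needed, and
the results are read only after the corresponding pthread_join() returns.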

Hope this helps. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research



Re: XL driver checksum producing corrupted but checksum-correct packets

2004-01-25 Thread Robert Watson

On Sun, 25 Jan 2004, Mike Silbersack wrote:

 On Sat, 24 Jan 2004, Robert Watson wrote:
 
  To pick up the corrupted packet on the machine where the corruption is
  occurring, you might want to try hooking up the UDP checksum drop case to
  BPF_MTAP() for a special BPF device or rule, or have it spit them into a
  raw socket (probably easier).
 
 He said that the packet's checksum passes, but it is corrupt, so this
 won't work. 

I may have misread: my reading was that the if_xl card marks the packet as
having passed the checksum test, but if you let the OS do the checksum,
the checksum fails.  I.e., either the hardware checksumming is broken, or
the data is corrupted between when the hardware does the checksum and when
it reaches the OS buffer.  As such, Sam's patch works because it tells the
OS to ignore the checksum results from the hardware (although it doesn't
disable the checking of checksums), causing the OS to recalculate the
checksums and drop the packets rather than accepting them.  The goal of
the change I suggested would be to also do the checksums in the OS, which
allows you to detect the bad packets, but instead of dropping them, funnel
them aside for later analysis.  However, if I've misread, sorry for the
confusion!
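
As a userspace illustration of "checksum in software, then set aside rather
than drop", here is a sketch of the RFC 1071 Internet checksum with a
verdict function.  This is not the kernel's code path: the names are
invented, and it checksums a plain buffer without the UDP pseudo-header, so
treat it as a model of the idea only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: the one's-complement of the one's-complement
 * sum of the data taken as big-endian 16-bit words.  Intended for
 * packet-sized buffers; very large inputs could overflow the 32-bit sum. */
uint16_t
inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)                /* odd trailing byte, padded with zero */
        sum += (uint32_t)data[i] << 8;
    while (sum >> 16)           /* fold carries back into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

enum verdict { PKT_ACCEPT, PKT_DIVERT };

/* A buffer that already contains its checksum field sums to zero when
 * intact.  Instead of dropping a failure, report PKT_DIVERT so the caller
 * can copy the packet somewhere for later analysis. */
enum verdict
classify(const uint8_t *pkt, size_t len)
{
    return inet_checksum(pkt, len) == 0 ? PKT_ACCEPT : PKT_DIVERT;
}
```

The caller decides what "divert" means; in the suggestion above that would
be a BPF tap or a raw socket.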

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research



Re: XL driver checksum producing corrupted but checksum-correct packets

2004-01-24 Thread Robert Watson

On Fri, 23 Jan 2004, Matthew Dillon wrote:

 I tracked down an occasional buildworld failure on DragonFly to my
 XL driver, which is synchronized to 4.x's XL driver.

It would be very helpful if you could do the following:

(1) See if you can reproduce this using something other than NFS --
perhaps netperf using UDP_STREAM or the like, between that machine and
another machine.  This would give us a more reproducible workload
than builds, and hopefully one that is less sensitive to things like
context switching, etc.

(2) See if you can reproduce this with a stock 4.9-RELEASE kernel (or
4-STABLE).  While the drivers are similar between 4.x and DFBSD, there
are actually quite a few structural changes in the DFBSD version.
Maybe it would make sense to try backing out the local DFBSD changes
to the base FreeBSD version, even if not trying a completely FreeBSD
system, to see if they are the cause.  It's difficult to diff the two
because of reorganization and style changes.
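
If netperf itself can't tell you which bytes were damaged, one option is a
small test sender and receiver whose payload is self-describing.  The sketch
below is hypothetical (it is not how netperf works, and the function names
and pattern are invented): the sender fills each datagram from a sequence
number, and the receiver reports the first offset that differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a payload with bytes derived from a sequence number and the byte
 * position, so the receiver can recompute what every byte should be. */
void
fill_pattern(uint8_t *buf, size_t len, uint32_t seq)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++)
        buf[i] = (uint8_t)(seq * 31u + i);
}

/* Return the offset of the first byte that differs from the expected
 * pattern for `seq`, or -1 if the payload is intact. */
long
first_mismatch(const uint8_t *buf, size_t len, uint32_t seq)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        if (buf[i] != (uint8_t)(seq * 31u + i))
            return (long)i;
    }
    return -1;
}
```

Logging the mismatch offset along with the datagram size would show whether
the damage clusters at particular offsets or sizes.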

 [EMAIL PROTECTED]:6:0:   class=0x02 card=0x764610b7 chip=0x764610b7 rev=0x30 
 hdr=0x00

Does this card have a product name, or is it one of those chips embedded
in a motherboard without a separate name?

I took a look through the xl cards/chips on my various machines, and was
unable to find anything that had remotely the same card or chip ID.  I did
some high-volume packet flows between them with hardware checksumming
disabled and didn't see any corrupted UDP packets, but the workloads I'm
using sound pretty different.  Knowing it could be reproduced using a
simpler workload (and the specifics) would be good.

FYI, I checked the Linux driver for these cards, and didn't see mention of
any quirks for the particular chips/card you're using.  The only thing of
note in the Linux driver was the following:

/* Check the PCI latency value.  On the 3c590 series the latency timer
   must be set to the maximum value to avoid data corruption that occurs
   when the timer expires during a transfer.  This bug exists the Vortex
   chip only. */
if (pdev) {
    u8 pci_latency;
    u8 new_latency = (drv_flags & IS_VORTEX) ? 248 : 32;

    pci_read_config_byte(pdev, PCI_LATENCY_TIMER, &pci_latency);
    if (pci_latency < new_latency) {
        printk(KERN_INFO "%s: Overriding PCI latency"
               " timer (CFLT) setting of %d, new value is %d.\n",
               dev->name, pci_latency, new_latency);
        pci_write_config_byte(pdev, PCI_LATENCY_TIMER, new_latency);
    }
}

The rate at which you have failures sounds like it could be a similar
issue, however -- an occasional collision between a timer and DMA.  NFS is
often a mix of small RPCs handling lookups and attributes, and larger RPCs
carrying data.  Using netperf or a related tool might help you identify if
one of those is more likely to cause the failure. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research



Re: XL driver checksum producing corrupted but checksum-correct packets

2004-01-24 Thread Robert Watson

On Sat, 24 Jan 2004, Max Laier wrote:

 On Saturday 24 January 2004 17:06, Robert Watson wrote:
  On Fri, 23 Jan 2004, Matthew Dillon wrote:
   I tracked down an occasional buildworld failure on DragonFly to
   my XL driver, which is synchronized to 4.x's XL driver.
 
 FYI: This was reproduced on OpenBSD as well (w/ ftp and scp): 
 http://marc.theaimsgroup.com/?l=openbsd-tech&m=107494884327698&w=2

Two thoughts on other things to try, with that in mind:

(1) Linux on the same hardware, see if whatever set of XL workarounds they
have addresses this specific problem.

(2) Try the NDIS driver with the NDIS-u-lator on FreeBSD 5.x and see if
that also has the problem.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: XL driver checksum producing corrupted but checksum-correct packets

2004-01-24 Thread Robert Watson

On Sat, 24 Jan 2004, Luigi Rizzo wrote:

 On Sat, Jan 24, 2004 at 01:38:37PM -0500, Robert Watson wrote:
 ...
  (2) Try the NDIS driver with the NDIS-u-lator on FreeBSD 5.x and see if
  that also has the problem.
 
 but going this way you have no idea on what the driver does, including
 enabling hw checksums. This looks like a useless test at least for the
 purpose of finding out what is going wrong

Actually, I'm more curious about whether it's a known erratum/misbehavior
for the card that 3Com's drivers work around, or not.  The problem could
well be completely unrelated to hardware checksumming per se -- the
corruption might well be taking place as the buffer is moved from the
card's buffer to the operating system managed buffer.  If the NDIS driver
doesn't illustrate the same problem, it tells us that by frobbing
appropriately, this problem can be worked around.  It also tells us that
by looking a bit harder at what the driver is doing (i.e., how it frobs
the hardware), we can learn something about the appropriate workaround. 
If it's a delay/timing issue, it's less likely we can learn something, but
if the NDIS driver is simply disabling hardware checksumming for specific
chipsets, that's something we should be able to figure out.  On the other
hand, if the NDIS driver shows the exact same problem, this might not be
an issue known to the vendor.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: XL driver checksum producing corrupted but checksum-correct packets

2004-01-24 Thread Robert Watson

On Sat, 24 Jan 2004, Luigi Rizzo wrote:

 On Sat, Jan 24, 2004 at 02:12:12PM -0500, Robert Watson wrote:
 ...
   but going this way you have no idea on what the driver does, including
   enabling hw checksums. This looks like a useless test at least for the
   purpose of finding out what is going wrong
  
  Actually, I'm more curious about whether it's a known erratum/misbehavior
  for the card that 3Com's drivers work around, or not.  The problem could
  well be completely unrelated to hardware checksumming per se -- the
  corruption might well be taking place as the buffer is moved from the
  card's buffer to the operating system managed buffer.  If the NDIS driver
  doesn't illustrate the same problem, it tells us that by frobbing
  appropriately, this problem can be worked around.  It also tells us that
  by looking a bit harder at what the driver is doing (i.e., how it frobs
  the hardware), we can learn something about the appropriate workaround. 
 
 yes, but how would you know that, short of reverse engineering the
 driver, or tracing I/O accesses to the hardware ?  It really looks like
 an overkill effort... I'd rather just try to debug the issue working on
 an open source driver, or dump the hardware altogether and replace it
 with something known to work... 

My understanding is that NDIS drivers rely on the HAL provided by NT to
perform hardware access, so you can generate I/O traces with relative
ease.  Decoding and following the HAL traces during card setup is probably
relatively straightforward, since presumably most of the I/O transactions
will match the documented services of the card.  It might be useful to add
some KTR support to Bill's NDIS pieces for this very purpose, if there's
interest.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: XL driver checksum producing corrupted but checksum-correct packets

2004-01-24 Thread Robert Watson

On Sat, 24 Jan 2004, Matthew Dillon wrote:

 Well, I tried to tcpdump a session.  I managed to hit the error three
 times but in all three cases the tcpdump on the server dropped the
 particular packet I was looking for.  I'm only able to get a 70%
 retention rate in the tcpdump output on the server... it's just trying
 to record too much for the machine to handle at the rate the NFS requests
 are coming in.

To pick up the corrupted packet on the machine where the corruption is
occurring, you might want to try hooking up the UDP checksum drop case to
BPF_MTAP() for a special BPF device or rule, or have it spit them into a
raw socket (probably easier).

Problem is, the context switching does in BPF, so if you can get another
machine onto the segment without it being excessively switched (perhaps on
a monitor port), using a third machine to grab the on-the-wire packets
might work best.  That way you can compare pre-corruption and
post-corruption.

 I'm going to give up trying to characterize the corruption for now.
 It could very well be the PCI latency timer as previously discussed
 but I can't test that right now.

If it is the problem, it may be easier to do this and see if it works than
to track down the packet :-).

good luck...

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research



Re: GEOM + Vinum

2004-01-21 Thread Robert Watson
On Wed, 21 Jan 2004, Lukas Ertl wrote:

  I step in. I complained bitterly about the rip-it-off-plans.

s/plans/proposal/

  I'm currently not able to help out coding, but I would gladly
  supply remote console access to a box suitable for vinum testing.
  (Including access to a local cvsup-/cvs-server, backup space etc.)
 
 Thanks for these offers!
 
 FWIW, there's now a new mailing list, freebsd-geom@, and we should move
 this thread over there. 

I'm really glad someone has picked up on this.  No one wants to see Vinum
users left behind, it's simply been a question of finding someone to bring
Vinum forward :-). 

(And once I remembered I had to use MailMan to subscribe to freebsd-geom
and not Majordomo, I was much happier :-).

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: shutdown -p now

2004-01-21 Thread Robert Watson

On Wed, 21 Jan 2004, Liam Foy wrote:

 shutdown -p now is dependent upon hardware, and I am 100% sure my hardware
 supports this; yet it still does not work. Must I have anything added to
 my kernel configuration or anything? 

What version of FreeBSD are you using?  Do you have ACPI enabled, if on
5.x?

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: ip_input - chksum - why is it done so early in ip_input?

2004-01-17 Thread Robert Watson

On Sat, 17 Jan 2004, Andre Oppermann wrote:

  Besides that i'd like to add that FreeBSD has the fastest forwarding engine
  i've seen on any free OS. It's in my opinion a very suitable OS for
  routing/forwarding.
 
 We are working on it to make it even faster.  If you are using 5.2 or
 -current you get the first step of it by enabling
 net.inet.ip.fastforwarding.  This is a newly written fast path for packet
 forwarding. (Do not do this on 4.9 because that is the old ip_flow
 code). 

You can also enable debug.mpsafenet, which disables holding the Giant lock
over the forwarding path for supported ethernet drivers.  Unfortunately,
this option can't be used with KAME IPSEC or IPv6 yet, but can be used
with FAST_IPSEC.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: 5.1-5.2

2004-01-15 Thread Robert Watson

On Thu, 15 Jan 2004, Matt Freitag wrote:

 Building 5.2-RELEASE from 5.1-RELEASE-p10 w/ipf+ipfw+ipfw6+dummynet, 5.1
 Compiled fine with this setup.  I need ipfilter as it's doing my source
 routing for ipv6 (multiple transits) since ip6fw doesn't support fwd. (I
 just use ip6fw for filtering, and ipf for forwarding to the correct
 interface according to source)  Am I just being stupid here somehow? 

IPFILTER now relies on the PFIL_HOOKS kernel option; this is something
that is somewhat poorly documented, and we should add it to the errata, I
suspect.  If you add "options PFIL_HOOKS" to your kernel config, it should
work.  Moving to PFIL_HOOKS for all the funky IP input/output features is
a goal for 5.3 (in fact, I believe Sam has it almost entirely done in one
of his development branches), and should both simplify the input/output
paths, and also simplify locking for the IP stack.  So the change is for a
good cause :-).

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research


 
 snip
 cc -c -O -pipe -mcpu=pentiumpro -Wall -Wredundant-decls -Wnested-externs 
 -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline 
 -Wcast-qual  -fformat-extensions -std=c99 -g -nostdinc -I-  -I. 
 -I../../.. -I../../../contrib/dev/acpica -I../../../contrib/ipfilter 
 -I../../../contrib/dev/ath -I../../../contrib/dev/ath/freebsd 
 -I../../../contrib/ngatm -D_KERNEL -include opt_global.h -fno-common 
 -finline-limit=15000 -fno-strict-aliasing  -mno-align-long-strings 
 -mpreferred-stack-boundary=2 -ffreestanding -Werror  
 ../../../contrib/ipfilter/netinet/ip_fil.c
 ../../../contrib/ipfilter/netinet/ip_fil.c: In function `fr_check_wrapper':
 ../../../contrib/ipfilter/netinet/ip_fil.c:319: `PFIL_OUT' undeclared 
 (first use in this function)
 ../../../contrib/ipfilter/netinet/ip_fil.c:319: (Each undeclared 
 identifier is reported only once
 ../../../contrib/ipfilter/netinet/ip_fil.c:319: for each function it 
 appears in.)
 ../../../contrib/ipfilter/netinet/ip_fil.c: In function `fr_check_wrapper6':
 ../../../contrib/ipfilter/netinet/ip_fil.c:329: `PFIL_OUT' undeclared 
 (first use in this function)
 cc1: warnings being treated as errors
 ../../../contrib/ipfilter/netinet/ip_fil.c: In function `iplattach':
 ../../../contrib/ipfilter/netinet/ip_fil.c:376: warning: unused variable 
 `ph_inet'
 ../../../contrib/ipfilter/netinet/ip_fil.c:378: warning: unused variable 
 `ph_inet6'
 machine/in_cksum.h: At top level:
 ../../../contrib/ipfilter/netinet/ip_fil.c:317: warning: 
 `fr_check_wrapper' defined but not used
 ../../../contrib/ipfilter/netinet/ip_fil.c:327: warning: 
 `fr_check_wrapper6' defined but not used
 *** Error code 1
 
 Stop in /usr/src/sys/i386/compile/funk.
 
 snip
 
 -mpf
 
 


Re: 5.1-5.2

2004-01-15 Thread Robert Watson

On Thu, 15 Jan 2004, Eric Masson wrote:

  Robert == Robert Watson [EMAIL PROTECTED] writes:
 
  Robert Moving to PFIL_HOOKS for all the funky IP input/output
 
 Will all available packet filters, including ipfw rely on PFIL_HOOKS or
 not ? 

Yes; we plan to make it so that ipfw will also rely on PFIL_HOOKS to integrate
with the IP stack, greatly reducing the quantity of #ifdef FOO in
ip_input() and ip_output().

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: p_[usi]ticks from userland without kvm and procfs?

2004-01-14 Thread Robert Watson

On Wed, 14 Jan 2004, Ryan Beasley wrote:

 I'm poring over some code that uses the p_[usi]ticks counters inside of
 struct proc.  This is fine under 4.x where kinfo_proc includes a copy of
 proc, but is broken under 5.x since a commit 3 years ago that
 reorganized kinfo_proc. 
 
 So, outside of kvm and procfs, is there any user-kernel interface for
 getting to struct proc or just those counters?  (getrusage is kinda
 close except one can't lookup info about another process.  :|. ) 

libkvm uses two back-ends to retrieve information from the kernel: it can
either retrieve it using sysctl() on a live kernel, or using kvm access on
/dev/kmem or a core file.  Generally, using sysctl() is preferred for a
live kernel, as it requires no special privilege, and also lets the kernel
decide what data is revealed to the user application (i.e., hide processes
owned by other users).  The kernel function that generally exports process
information for userspace access is sysctl_out_proc() in
src/sys/kern/kern_proc.c, which calls fill_kinfo_proc() or
fill_kinfo_thread(), depending on a flag passed to sysctl. 

Those fields are now part of the thread definition as opposed to the proc
definition, and don't appear in the externalized structure in -current
(as far as I can tell).  A lot of process accounting and measurement changed
with the introduction of M:N threads (KSE), and some of the details
haven't yet been sorted out as part of the dust settling.  It could well
be that the fields are not currently maintained properly, and that the
functionality in the kernel needs to be fixed to measure them again
properly. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: Filesystem marker.

2004-01-14 Thread Robert Watson

On Wed, 14 Jan 2004, David Gilbert wrote:

 Is there a set of bytes at some offset in a block that is common to any
 instance of a BSD ufs filesystem?  I ask because recently my home
 machine erased its fdisk block _and_ the bsdlabel with it.  It
 certainly didn't have time to erase the whole disk, but I'm having
 trouble guessing where the partitions are. 
 
 /usr/ports/sysutils/gpart will look for partitions on a disk ... but it
 only knows to look for bsd disklabels ... not bsd filesystems.  Ideally,
 I'd like to make a bsd filesystem module for gpart with some pointers
 from the group. 

I ported OpenBSD's scan_ffs to FreeBSD.  However, it
only speaks UFS1:

  http://www.watson.org/~robert/freebsd/scan_ffs_freebsd4/

It might also require tweaking to even build on -CURRENT, as I haven't
lost any file systems recently enough to have needed to test.  One of the
nice things about this tool is that it can generate output that can then
be fed into disklabel to write the disklabel you need back to disk.
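
On the original question of a recognizable marker: the superblock carries a
magic number, and the core of a scanner can be sketched as below.  This is
not the scan_ffs source.  The constants are assumptions recalled from
FreeBSD's sys/ufs/ffs/fs.h (UFS1 magic 0x011954, with fs_magic at byte
offset 1372 of the superblock) and should be checked against your tree, and
the code assumes a little-endian file system such as one written on i386.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UFS1_MAGIC   0x00011954u  /* assumed value of FS_UFS1_MAGIC */
#define MAGIC_OFFSET 1372u        /* assumed offset of fs_magic in struct fs */
#define SECTOR       512u

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t
le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Scan `img` one sector at a time, starting at the first sector boundary
 * at or after byte offset `from`, for a candidate superblock.  Returns the
 * byte offset of the first sector carrying the magic, or (size_t)-1. */
size_t
find_superblock(const uint8_t *img, size_t len, size_t from)
{
    size_t off;

    assert(img != NULL);
    for (off = (from + SECTOR - 1) / SECTOR * SECTOR;
        off + MAGIC_OFFSET + 4 <= len; off += SECTOR) {
        if (le32(img + off + MAGIC_OFFSET) == UFS1_MAGIC)
            return off;
    }
    return (size_t)-1;
}
```

A hit is only a candidate: a real tool would need to sanity-check other
superblock fields before trusting it.  If the usual UFS1 layout holds (the
primary superblock 8192 bytes into the partition, again an assumption to
verify), the partition would start 8192 bytes before the hit.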

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: Gratituous ARP and the em driver

2004-01-13 Thread Robert Watson

On -1 xxx -1, Nielsen wrote:

 When I change IP addresses on my 'em' gigabit NIC, ARP isn't sent
 properly. This appears to be the problem in the following bug report,
 however I'm using the 'fixed' version of the em driver (in FreeBSD 4.9). 
 
 http://www.freebsd.org/cgi/query-pr.cgi?pr=54488
 
 Does anyone have any tips on how to get around this? 
 
 I'm building new systems with gigabit ethernet support and this problem
 keeps cropping up. I have a failover system, and when moving an IP alias
 between machines, the em NIC driver doesn't properly send out gratuitous
 ARP, resulting in the IP being inaccessible. 
 
 - The problem does not occur when plugged into a 100BaseTX switch
 - FreeBSD 4.9p1 / em version 1.7.16
 - Tried various gigabit switches.
 - One other odd thing is that when configuring the NIC (ifconfig) the
   machine locks up for several seconds. 

If you run tcpdump on the machine to sniff the interface in question
looking for arp packets, does tcpdump see the gratuitous arp?  I'm
guessing that it does, and the lack of sending the arp is a result of
delays in negotiating on the wire.  Does this problem turn up only the
first time you raise the interface, or every time you change the IP
address on the interface? 
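
If you end up picking through captured frames by hand, a gratuitous ARP is
easy to recognize: the sender and target IP addresses are the same.  A
small sketch, assuming you have already stripped the 14-byte Ethernet header
and are looking at the 28-byte Ethernet/IPv4 ARP payload (the function name
is invented):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARP_LEN     28  /* Ethernet/IPv4 ARP payload size */
#define ARP_SPA_OFF 14  /* sender protocol (IP) address */
#define ARP_TPA_OFF 24  /* target protocol (IP) address */

/* Return 1 if the ARP payload announces the sender's own address
 * (sender IP == target IP), 0 otherwise. */
int
is_gratuitous_arp(const uint8_t *arp, size_t len)
{
    assert(arp != NULL);
    if (len < ARP_LEN)
        return 0;
    /* hardware type 1 (Ethernet), protocol 0x0800 (IPv4), sizes 6 and 4 */
    if (arp[0] != 0 || arp[1] != 1 || arp[2] != 0x08 || arp[3] != 0x00 ||
        arp[4] != 6 || arp[5] != 4)
        return 0;
    return memcmp(arp + ARP_SPA_OFF, arp + ARP_TPA_OFF, 4) == 0;
}
```

The opcode is deliberately not checked here, since gratuitous ARPs may be
sent as either requests or replies.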

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: Future of RAIDFrame and Vinum (was: Future of RAIDFrame)

2004-01-12 Thread Robert Watson

On Mon, 12 Jan 2004, Mark Linimon wrote:

  If nothing happens, vinum is going to break even more very soon.
 
 No ... if you do a commit that changes the code assumptions upon which
 vinum was built, vinum will break.  vinum is not going to magically
 break by itself. 
 
 This gets back to a problem with the FreeBSD development model:  people
 who commit changes that break things in other parts of the system do not
 automatically get assigned the responsibility to fix them.  Now, there's
 no way to impose something like that requirement on a cooperative
 anarchy, so I am not playing the "let's reorganize" card -- I think
 most of us would agree that "that dog won't hunt" as we say down around
 these parts. 
 
 But, in the real world of software engineering, He Who Breaketh It, Must
 Fixeth It. 

Well, actually, it's not quite that way.  The reality is that every major
component in a system needs someone with expertise in it, who is willing
and available to maintain the system across infrastructural changes. Vinum
has made it over smaller API bumps without a maintainer: the move to
devfs, etc.  However, to make it speak GEOM requires someone highly
familiar with Vinum, and with the time available to do it.  If we want to
enhance the architecture of FreeBSD for improvements in performance,
stability, and long-term maintainability, there will necessarily be
structural changes that require a distributed update of the system. 
FreeBSD is of sufficient size that no one person will be able to make this
sort of change alone, which is one of the important reasons to have a
software maintenance model that reflects that.  Our notion of software
maintenance could certainly use some further evolution, but I think there
are some existing intuitions.  Vinum is a highly complex software module,
and *must* have an active maintainer in order to survive structural
operating system changes. 

Greg has recently posted to arch@ saying "What's the future of Vinum?",
indicating an intent to continue to enhance Vinum, and received a number
of e-mails regarding how to adapt Vinum for GEOM.  I sent him an e-mail in
which I laid out a spectrum of possibilities, ranging from the minimalist
to a complete transformation of Vinum into a GEOM module.  Greg has
indicated he plans to work on Vinum further, so I think the best we can do
is provide support and encouragement.  The minimalist approach appears to
be viable (although there are some risks), and someone highly familiar
with Vinum (such as Greg) can probably make the changes in short order.
He's currently at a conference, but my hope is that when he gets back,
he'll evaluate some of the approaches we've described, and pick one.  I
think the right strategy is to follow the minimalist approach now (adopt
the disk(9) API, rather than having Vinum generate character devices) so
that swap works on Vinum again, and so that when UFS moves to speaking
GEOM there's no loss of functionality.  If we want to completely
reimplement Vinum, we should do that separately so as to avoid loss of
functionality during structural changes.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: diskless problems

2004-01-11 Thread Robert Watson

On Sun, 11 Jan 2004, Danny Braniss wrote:

 while the subject is being revived, there are some changes/additions I
 made to libstand/bootp.c, it exports all the dhcp tags so that they are
 available to rc.diskless? or rc.d/initdiskless via kenv.  check out
   ftp://ftp.cs.huji.ac.il/users/danny/freebsd/diskless-boot/
 
 these are a bit dated, but the up-to-date stuff is actively being used
 here, so if there is some interest i could update it. 

Sounds very interesting indeed.  Could you:

(1) Update it to bootp.c:1.5; this just removed 'register', and it looks
like you've already done that.

(2) Restore the original file style -- right now, it's a very hard to read
diff because you use different tabbing, function prototypes, etc, so
it's hard to isolate and read the changes.  There's probably some room
for style convergence (new function prototypes), but we have a
long-standing tradition of committing style and functional changes
separately so cvs diff is maximally useful between revisions.  It
looks like the main difference is that you use four space tabs, and
the original file uses real tab characters. 

Then if you could file a PR and drop me the PR number by e-mail, that
would be great.  I can do the style stuff if necessary, but I figured
since you're much more familiar with the changes, it might not be a bad
idea if you did it :-).

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research



Re: MD(4) cleanups and unload lesson.

2004-01-11 Thread Robert Watson

On Sun, 11 Jan 2004, Pawel Jakub Dawidek wrote:

 With attached patch unloading md(4) module is possible.  It also cleans
 up big part of code according to style(9). 

Could you separate this into a functional diff and a style diff?  There's
a general preference to not combine them, as it means cvs diff between
revisions isn't useful for identifying functional changes (i.e., reviewing
for bugs when back-tracking, etc). 

Thanks,

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: diskless problems

2004-01-10 Thread Robert Watson
On Sat, 10 Jan 2004, Dag-Erling Smørgrav wrote:

 I'm trying to set up a VIA C3-based mini-ITX box for diskless boot using
 isc-dhcpd 3.0 from ports.  The kernel and modules load fine, but
 isc-dhcpd doesn't seem to answer the kernel's DHCP discover message. 
 
 The following is a tcpdump of the traffic the DHCP server sees.  I've
 removed the timestamps for legibility.  The DHCP server is on 10.0.0.6,
 the TFTP and NFS server is on 10.0.0.4, and the client is on 10.0.0.9. 
 
 0.0.0.0.68 > 255.255.255.255.67:  xid:0x64c4603d secs:4 flags:0x8000 [|bootp]
 0.0.0.0.68 > 255.255.255.255.67:  xid:0x64c4603d secs:4 flags:0x8000 [|bootp]
 arp who-has 10.0.0.4 tell 10.0.0.9
 10.0.0.9.68 > 255.255.255.255.67:  xid:0x3d60c464 file [|bootp]
 10.0.0.6.67 > 10.0.0.9.68:  xid:0x3d60c464 Y:10.0.0.9 S:10.0.0.4 file [|bootp] [tos 0x10]
 10.0.0.9.68 > 255.255.255.255.67:  xid:0x3d60c464 file [|bootp]
 10.0.0.6.67 > 10.0.0.9.68:  xid:0x3d60c464 Y:10.0.0.9 S:10.0.0.4 [|bootp] [tos 0x10]

Can you send tcpdump -e output?

 at this point the kernel boots and prints
 
 Sending DHCP Discover packet from interface vr0 (00:40:63:c4:60:3d)

What kernel configuration are you using?  Are there multiple ethernet
devices in the system?  Normally if you're using pxeboot for diskless
booting, there's no need for the kernel or userspace to use DHCP: they
inherit the DHCP settings provided by the pxeboot loader using the kernel
environment.  When using PXE, there's no need for any special kernel
options, etc, you should just be able to use GENERIC. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research



Re: diskless problems

2004-01-10 Thread Robert Watson

On Sat, 10 Jan 2004, Dag-Erling Smørgrav wrote:

 Robert Watson [EMAIL PROTECTED] writes:
  Can you send tcpdump -e output?
 
 22:18:14.884745 0:40:63:c4:60:3d ff:ff:ff:ff:ff:ff 0800 590: 0.0.0.0.68 > 255.255.255.255.67:  xid:0x64c4603d secs:4 flags:0x8000 [|bootp]
 22:18:16.911162 0:40:63:c4:60:3d ff:ff:ff:ff:ff:ff 0800 590: 0.0.0.0.68 > 255.255.255.255.67:  xid:0x64c4603d secs:4 flags:0x8000 [|bootp]
 22:18:16.919251 0:40:63:c4:60:3d ff:ff:ff:ff:ff:ff 0806 60: arp who-has 10.0.0.4 tell 10.0.0.9
 22:18:17.134219 0:40:63:c4:60:3d ff:ff:ff:ff:ff:ff 0800 590: 10.0.0.9.68 > 255.255.255.255.67:  xid:0x3d60c464 file [|bootp]
 22:18:17.135119 8:0:2b:86:88:55 0:40:63:c4:60:3d 0800 348: 10.0.0.6.67 > 10.0.0.9.68:  xid:0x3d60c464 Y:10.0.0.9 S:10.0.0.4 file [|bootp] [tos 0x10]
 22:18:17.135621 0:40:63:c4:60:3d ff:ff:ff:ff:ff:ff 0800 590: 10.0.0.9.68 > 255.255.255.255.67:  xid:0x3d60c464 file [|bootp]
 22:18:17.136477 8:0:2b:86:88:55 0:40:63:c4:60:3d 0800 348: 10.0.0.6.67 > 10.0.0.9.68:  xid:0x3d60c464 Y:10.0.0.9 S:10.0.0.4 [|bootp] [tos 0x10]
 22:18:38.239936 0:40:63:c4:60:3d ff:ff:ff:ff:ff:ff 0800 1502: 0.0.0.0.68 > 255.255.255.255.67:  xid:0x0001 flags:0x8000 [|bootp] [ttl 1]
 
  What kernel configuration are you using?  Are there multiple ethernet
  devices in the system?
 
 I followed the advice from the diskless(8) man page.  There's only one
 interface, and tcpdump clearly shows that the DHCP server receives a
 request but does not answer. 

I was a bit surprised to see 'vr0', since PXE is almost always used with
fxp drivers.

  Normally if you're using pxeboot for diskless
  booting, there's no need for the kernel or userspace to use DHCP: they
  inherit the DHCP settings provided by the pxeboot loader using the kernel
  environment.  When using PXE, there's no need for any special kernel
  options, etc, you should just be able to use GENERIC.
 
 I'll try again without the BOOTP options... 

Yeah.  Our PXE booting support isn't really the same as the traditional
diskless booting environment.  If we don't have a PXE manpage, we probably
should have one, since it's actually pretty easy to use.  I use PXE
booting extensively in my test environment, and it makes life much, much
easier.  I'm sure we have some worked examples posted around, but if not,
I can post the details of my configuration.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: diskless problems

2004-01-10 Thread Robert Watson

On Sat, 10 Jan 2004, Dag-Erling Smørgrav wrote:

 Robert Watson [EMAIL PROTECTED] writes:
  On Sat, 10 Jan 2004, Dag-Erling Smørgrav wrote:
   I'll try again without the BOOTP options... 
  Yeah.  Our PXE booting support isn't really the same as the traditional
  diskless booting environment.
 
 It works fine without the BOOTP options... 

Yeah, makes sense, although I sort of feel as though it should have worked
either way.  I've just committed some changes to the diskless(8) man page
to indicate those options aren't needed with PXE. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research



Re: Future of RAIDFrame

2004-01-10 Thread Robert Watson

On Sat, 10 Jan 2004, Scott Long wrote:

 I started RAIDframe three years ago with the hope of bringing a proven
 and extensible RAID stack to FreeBSD.  Unfortunately, while it was made
 to work pretty well on 4.x, it has never been viable on 5.x; it never
 survived the introduction of GEOM and removal of the old disk layer. 
 I'm coming to the conclusion that I really don't have the time to work
 on it in my spare time.  Also, I've seen next to zero interest in it
 from others, except for the occasional reminder that it doesn't work. 
 
 I still believe in it, and I still believe that it can be integrated
 into GEOM and become the all-singing-all-dancing raid engine for the OS. 
 It will probably never be an LVM stack, but I've also always believed
 that LVM and RAID are related but separate layers.  It can certainly
 build upon whatever LVM layer appears in GEOM.  All it needs is one or
 two other people to share some of the work and testing with me. 
 
 I have a Work-In-Progress for converting and integrating it into GEOM on
 my home Perforce server.  It hasn't been touched in several months and I
 really don't see myself being able to finish it alone in the near
 future.  Since it's been hanging over my head for so long, I'm very,
 very close to just removing it and moving on.  If anyone has the
 interest AND time available to help out with keeping it, please let me
 know ASAP. 

While I recognize the reality of time constraints and developers, I think
it might not be a bad idea (regardless of the outcome here) to import the
RAIDFrame bits into the FreeBSD Perforce server, so that it's available
for reference should anyone pick this up now (or in the future). 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: Where is FreeBSD going?

2004-01-07 Thread Robert Watson

On Wed, 7 Jan 2004, Roman Neuhauser wrote:

 [1] has core@ considered subversion (devel/subversion)?

Everyone has their eyes wide open looking for a revision control
alternative, but last time it was discussed in detail (a few months ago?)
it seemed there still wasn't a viable alternative.  On the src tree side,
FreeBSD committers are making extensive use of a Perforce repository
(which supports lightweight branching, etc, etc), but there's a strong
desire to maintain the base system on a purely open source revision
control system, and migrating your data is no lightweight proposition. 
Likewise, you really want to trust your data only to tried and true
solutions, I think -- we want to build an OS, not a version control
system, if at all possible :-).  Subversion seems to be the current
favorite to keep an eye on, but the public release seemed not to have
realized the promise of the design (i.e., no three-way merges, etc).  You
can peruse the FreeBSD Perforce repository via the web using
http://perforce.FreeBSD.org/ -- it contains a lot of personal and small
project sandboxes that might be of interest. For example, we do all the
primary TrustedBSD development in Perforce before merging it to the main
CVS repository. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




RE: switching between groups

2004-01-07 Thread Robert Watson

On Wed, 7 Jan 2004, Adil Katchi wrote:

 Unfortunately, newgrp(1) would not work, because it calls setgroups,
 which for some weird reason, needs the caller to be a superuser.  Isn't
 there a function that sets the groups (like setgroups) of the current
 process where you don't have to be a superuser?  To maintain security,
 that function could just check that the groups being set by setgroups
 are a subset of the caller's set.  Does a function like that already
 exist?  If not, how come? 

Groups are sometimes used for negative access control rights: i.e.,
permissions are set on a file so that users who should not be able to read
the file are in a group, and the group rights are less than the 'other'
rights.  If users can drop arbitrary groups, they can leave the group
excluding the rights.  This problem is more or less pronounced with ACLs,
depending on who you speak to: using negative rights is often a workaround
for not having ACLs, but with ACLs, you can add more than one group to a
file, and don't have to be a member of the group to add it... 
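
To make the negative-rights case concrete, here is a minimal sketch of the
classic owner/group/other check, assuming a simplified model with no
superuser bypass and no ACLs.  The function name may_read() and the IDs
used in the note below are hypothetical; this is not the kernel's actual
access-check code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Simplified model of the traditional UNIX read-permission check.
 * Exactly one class applies: owner first, then group, then other.
 * A group match with fewer rights than "other" therefore denies access.
 * Illustrative only; superuser and ACL handling are deliberately omitted.
 */
bool
may_read(uid_t uid, const gid_t *groups, size_t ngroups,
    uid_t file_uid, gid_t file_gid, mode_t mode)
{
	size_t i;

	if (uid == file_uid)
		return ((mode & S_IRUSR) != 0);
	for (i = 0; i < ngroups; i++) {
		if (groups[i] == file_gid)
			return ((mode & S_IRGRP) != 0);
	}
	return ((mode & S_IROTH) != 0);
}
```

With mode 0604 on a file whose group is 200, a user who holds group 200 is
denied read access, while the same user without group 200 falls through to
the 'other' bits and is allowed.  That is why letting users drop groups at
will defeats this kind of exclusion.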

It does strike me that newgrp(1) seems less than useful without the setuid
bit... 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research


 
 Thanks,
 
 Adil
 
 -Original Message-
 From: Bruce M Simpson [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, January 06, 2004 1:12 PM
 To: Adil Katchi
 Cc: '[EMAIL PROTECTED]'
 Subject: Re: switching between groups
 
 
 On Tue, Jan 06, 2004 at 11:14:06AM -0500, Adil Katchi wrote:
  I was just wondering if anyone has any ideas how it's possible for a user
  that belongs to multiple groups to somehow limit his or her own
  capabilities by using only one of the n groups that they belong to and be
  able to switch between these groups?  For example, if userA belongs to
  groupA, groupB and groupC, can userA enter a mode that would force it to
  only belong to groupA (or groupB, or groupC)?  UserA would be able to
  switch between these groups and back to normal (i.e., belong to all
  groups).
 
 newgrp(1) could be hacked to do this fairly easily. Currently it preserves
 supplemental group memberships. An option to discard supplementals could
 be added.
 
 Or just call setgroups() with a no-op group-list vector and then setgid()/
 setegid() from within your application.
 
 BMS
 



Re: Where is FreeBSD going?

2004-01-06 Thread Robert Watson

On Tue, 6 Jan 2004, Paul Robinson wrote:

 And therein lies a problem. The only thing any of the committers cares
 about is what they think. Got a problem? Submit a patch. Don't like the
 way things are done? Submit a patch. Don't like how such-and-such a util
 works? Submit a patch. 

While it's clearly the case that many people have met with the "submit a
patch" response, that's probably more a property of time constraints on
developers than a lack of desire to work with users to produce a system
users want.  Many FreeBSD developers find FreeBSD of particular appeal
because it gives them a chance to produce a system they've always wanted
to use: one that addresses the frustrations of many other systems out
there.  For example, a fair number of FreeBSD developers have their time
funded by Internet Service Providers who appreciate the scalability,
performance, and manageability of FreeBSD when deployed on tens of
thousands of machines.  They bring changes to FreeBSD regularly reflecting
those needs.  Many FreeBSD developers do hang out in the public IRC
channels and try to answer questions, hang out on questions@, stable@,
etc.  Sometimes, you post a question and get the answer "That doesn't work
yet, but we're looking for a few good developers...", but frequently, you
also get a patch and "If you could try this and see if it helps with your
problem..."  Obviously, the harder the question you ask, the more likely
you'll get "We're looking for a few good developers..." :-). 

The marketing department of Microsoft may be able to keep their less
user-friendly developers from talking to their users, but many people
would argue that one of the greatest benefits of open source is increasing
that communication, even if it means the unwashed developers talk to real
people once in a while.  A great many developers pick FreeBSD to work on
because they're quite aware of what users of other systems have to deal
with, and want to produce a system people can use.  But no one is paying
the bills for hand-holding, so unless people step up to do the hand
holding (thanks greatly to those who do!) it's not going to happen.  We'd
appreciate your help in making it happen, if that's something that strikes
you as done wrong or poorly.  As with any commercial software development
enterprise, we also have limited resources, but unlike a commercial
software development enterprise, we can help involve a much larger
community in building and supporting a product.

 Personally, unless the madness around SMP, the 5- branch and various
 other bits are ironed out, I can see my next server deployment making
 use of DragonFly. At least they listen to people who don't submit
 patches due to the limitations of time/skill/whatever. No, I'm not a
 Matt fan - I like and respect most on -core and others. I just think 5-
 has got... well, it's all a bit out of hand really, isn't it? 

The reality is that operating system development takes a lot of time,
energy, and expertise.  We can't pull a next generation operating system
out of hats overnight -- it takes literally hundreds of man years of work
to do.  It's not something one, three, or even ten people can do alone. 
FreeBSD 5.x remains a work in progress, but has made a lot of progress in
the right direction.  I think what you think of as "madness" is a
necessary step on the path of a major engineering project.  I can't think
of any major project I've seen where at some point, people haven't taken a
pause for a breather saying "Oh my god -- what have we gotten ourselves
into."  On the other hand, I think referring to it as "madness" dismisses
years of hard work by a great many competent and dedicated developers.

A year ago, M:N threading was extremely far from production quality --
today, it's on the cusp of being there, with higher performance and
increasingly high reliability.  It's almost ready for 5-STABLE.  There's
substantial on-going work on SMP, with a huge investment of time and
energy into the network stack, VM system, VFS, process support,
scheduling, etc.  These are areas where the primary feedback today is
going to be stability and performance, and believe me, we're listening. 
All the FreeBSD developers I correspond with regularly run FreeBSD 5 on
their desktops, on their servers, in their appliances, etc, to make sure
we keep shaking out problems.  Many companies have production products
based on 5.x, and their feedback (and contributions) have been valuable.
We've also invested substantial efforts in areas like compiler toolchains,
standards compliance, not to mention new features. 

5.x is, at long last, starting to land; it will take about one more minor
version number to get there, we believe, but it is in dramatically better
shape than it was a year or two ago.  As I said above: writing operating
systems isn't a small task.  Companies invest tens (hundreds) of millions
of dollars writing and maintaining operating systems, and (net across
developers, if you actually bill for the volunteer 

Re: pciconf -lv - /dev/pci error

2003-12-31 Thread Robert Watson

On Wed, 31 Dec 2003, William Michael Grim wrote:

 I have 5.1-RELEASE installed on my system, and I've never needed to do a
 pciconf -lv to probe the system before.  However, I tried doing it
 earlier today after logging in through SSH and doing su - to become
 superuser.  I received this error: 
 
 [EMAIL PROTECTED] 09:12:42 root]# pciconf -lv
 pciconf: /dev/pci: Operation not permitted
 
 [EMAIL PROTECTED] 09:15:41 root]# ls -l /dev/pci
 crw-r--r--  1 root  wheel  251,   0 Nov  2 05:09 /dev/pci
 
 So, as you can see, the permissions are correct.  Perhaps I don't have
 something compiled into my kernel?  I can attach a dmesg and kernel
 config if it's necessary. 

pciconf -lv appears to cause pciconf to open /dev/pci writable:

   731 pciconf  CALL  open(0x8049a55,0x2,0)
   731 pciconf  NAMI  /dev/pci
   731 pciconf  RET   open -1 errno 13 Permission denied

And, of course, it's not writable by non-root.  The attached patch causes
pciconf to open /dev/pci read-only when listing devices (apply to
usr.sbin/pciconf/pciconf.c): 

Index: pciconf.c
===================================================================
RCS file: /home/ncvs/src/usr.sbin/pciconf/pciconf.c,v
retrieving revision 1.19
diff -u -r1.19 pciconf.c
--- pciconf.c	20 Jun 2003 23:59:25 -0000	1.19
+++ pciconf.c	31 Dec 2003 17:58:45 -0000
@@ -165,7 +165,7 @@
 	if (verbose)
 		load_vendors();
 
-	fd = open(_PATH_DEVPCI, O_RDWR, 0);
+	fd = open(_PATH_DEVPCI, O_RDONLY, 0);
 	if (fd < 0)
 		err(1, "%s", _PATH_DEVPCI);
 


The pci_user.c code in the kernel requires that the caller hold a writable
file descriptor for most of the ioctls; the exception is PCIOCGETCONF,
which is the only ioctl pciconf's list_devs() uses.  We can probably just
go ahead and commit this patch, I think.  The reason a check was added to
the kernel pci ioctl code is that unaligned writes to /dev/pci can cause
faults, I believe... 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: pciconf -lv - /dev/pci error

2003-12-31 Thread Robert Watson

On Wed, 31 Dec 2003, John Baldwin wrote:

 History is in PR 32677.  I do think your patch might be ok if it only
 applies to the -l case.  If so, then it should probably be committed and
 MFC'd (along with the kernel pci_user.c change) so the PR can be closed. 

Well, this patch changes only the user code for pciconf, which doesn't run
with privilege, not the kernel code implementing the protections.  pciconf
appears only to require the PCIOCGETCONF ioctl to implement -l[v], and all
this patch does is make pciconf ask for a read-only file descriptor for
-l[v].  This patch doesn't fix pciconf with securelevels, since we
still prevent acquiring an open file descriptor when the securelevel is > 0.
I think a better answer would be to expose the PCI stuff using a sysctl
MIB rather than an ioctl, since file descriptors to /dev/pci are
multi-purpose, and imply the ability to read/write the register space,
etc.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: patch: portable dirhash

2003-12-17 Thread Robert Watson

On Wed, 17 Dec 2003, Alexander Kabaev wrote:

 On Tue, 16 Dec 2003 22:12:08 -0500 (EST)
 Ted Unangst [EMAIL PROTECTED] wrote:
 
  can somebody please review/commit this to freebsd?  it is most of the 
  differences to permit openbsd to use the code.  it should not change
  the code in any functional way.
 
 I do not think there is any point in this code ever hitting FreeBSD CVS
 repository. Rather, OpenBSD should just take cleaned-out copy of this
 code and be done with it. 

While it's true the #ifdef OpenBSD's probably don't help the readability
of our code, abstracting a step by using macros to wrap specific locking
primitives is a widely used approach in the FreeBSD tree, especially where
it's not clear a final locking strategy has been developed due to a lack
of profiling.  For example, in both the network code and process
management code, we wrap mutexes/sxlocks in macros to avoid committing to
either, and to make changing the strategy easier.  I wouldn't object to
our adopting the macro wrapping, which would have the side effect of
helping the OpenBSD patch size a lot also, even leaving out the #ifdef's.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: patch: portable dirhash

2003-12-17 Thread Robert Watson

On Wed, 17 Dec 2003, Robert Watson wrote:

 On Wed, 17 Dec 2003, Alexander Kabaev wrote:
 
  On Tue, 16 Dec 2003 22:12:08 -0500 (EST)
  Ted Unangst [EMAIL PROTECTED] wrote:
  
   can somebody please review/commit this to freebsd?  it is most of the 
   differences to permit openbsd to use the code.  it should not change
   the code in any functional way.
  
  I do not think there is any point in this code ever hitting FreeBSD CVS
  repository. Rather, OpenBSD should just take cleaned-out copy of this
  code and be done with it. 
 
 While it's true the #ifdef OpenBSD's probably don't help the readability
 of our code, abstracting a step by using macros to wrap specific locking
 primitives is a widely used approach in the FreeBSD tree, especially
 where it's not clear a final locking strategy has been developed due to
 a lack of profiling.  For example, in both the network code and process
 management code, we wrap mutexes/sxlocks in macros to avoid committing
 to either, and to make changing the strategy easier.  I wouldn't object
 to our adopting the macro wrapping, which would have the side effect of
 helping the OpenBSD patch size a lot also, even leaving out the
 #ifdef's. 

That said, LOCK() is a terrible name for a macro. :-)  If anything, it
should be DIRHASH_LOCK() or the like. 
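
As an illustration of the macro-wrapping approach with a subsystem-specific
name, a minimal userspace sketch follows.  It assumes POSIX pthread mutexes
in place of the kernel's mutex primitives, and the struct, field, and
function names are hypothetical, not taken from the dirhash code.

```c
#include <stddef.h>
#include <pthread.h>

/*
 * Hypothetical object protected by a lock.  Callers only ever use the
 * DIRHASH_* macros, so the primitive behind them can be swapped in one
 * place without touching any call site.
 */
struct dirhash_sketch {
	pthread_mutex_t	dh_mtx;		/* assumed primitive: pthread mutex */
	int		dh_refcount;	/* example field guarded by dh_mtx */
};

#define	DIRHASH_LOCK_INIT(dh)		pthread_mutex_init(&(dh)->dh_mtx, NULL)
#define	DIRHASH_LOCK(dh)		pthread_mutex_lock(&(dh)->dh_mtx)
#define	DIRHASH_UNLOCK(dh)		pthread_mutex_unlock(&(dh)->dh_mtx)
#define	DIRHASH_LOCK_DESTROY(dh)	pthread_mutex_destroy(&(dh)->dh_mtx)

/* Take a reference under the lock and return the new count. */
int
dirhash_sketch_ref(struct dirhash_sketch *dh)
{
	int count;

	DIRHASH_LOCK(dh);
	count = ++dh->dh_refcount;
	DIRHASH_UNLOCK(dh);
	return (count);
}
```

If profiling later suggested a different primitive, only the four macro
definitions would need to change; callers such as dirhash_sketch_ref()
would stay untouched.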

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: general load balancing issues

2003-12-15 Thread Robert Watson

On Mon, 15 Dec 2003, Matthew Seaman wrote:

 On Mon, Dec 15, 2003 at 12:46:52PM +0100, Bogdan TARU wrote:
   Right now I am considering a setup with one common NFS repository for
   the configuration files, Apache binaries, Web content and temp
   directory for PHP, NFS resource which will be mounted on all the
   'front' webservers. I am wondering, though, if I will be able (by
   having one common temp directory for PHP) to load-balance the domains
   involving sessions: will the sessions be lost when consecutive hits
   go to different webservers, or not? 
 
 The canonical answer to this is to store the session data in the
 back-end database, so that it's accessible to all of your servers. 
 
 See the PHP docs for session_set_save_handler(). There's an example of
 how to do this in the O'Reilly Platypus book Web Database Applications
 with PHP and MySQL, or contact me off list and I can send you some
 sample code.  Probably a good idea to take this off-list anyhow, as it's
 not really [EMAIL PROTECTED] material. 

Another approach I've seen is to avoid the use of state as much as
possible, but when the user starts accessing a stateful service, to
redirect them from the load balancer to one of the back end servers
directly.  This assumes that the majority of content generating load is
static, of course (which may well not be the case because dynamic content
generates much more load than static content in many installations).

Another approach is, if there is little state being used, to store the
state in the client via URL lines or cookies.  This can be especially
effective if you use a keyed hash with expiry as part of the cookie or URL
data so that you can trust the state.

When setting up load balancing with state, one of the hardest things is
making sure the solution isn't slower than the original, and the details
of the local installation are often relevant.  If there are frequent state
queries, going to a backend database can make things slower.  If they're
infrequent, and enough of the work can happen on the web server, it can
make things a lot faster (and it's much easier to manage than many other
solutions, since it just works). 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: howto upgrade 4.8 to 4.9 without cdrom or floppy?

2003-12-11 Thread Robert Watson

On Fri, 12 Dec 2003, paul van den bergen wrote:

 on freebsd-hackers, Alfred Perlstein posted a method that allows
 boot-disk-less installation... but it requires mdconfig, a 5.1
 utility... 
 
 is there a method to do this under 4.8? 
 
 it seems to me that the job performed by md0 could be done with vn0 e.g. 

If you're willing to build from the source tree, the
buildworld/buildkernel/installkernel/reboot/installworld/mergemaster route
is actually quite reliable.  I just upgraded a 4.6 box to 4.9 a couple of
days ago, remotely with no serial console, cdrom, or floppy, without a
hitch.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research



 
 do 
 # ls /dev/vn*
 if empty do
 # cd /dev
 # ./MAKEDEV vn0
 # ./MAKEDEV vn1
 # vnconfig vn0 /path/to/freebsd/4.9.iso
 # mount_cd9660 /dev/vn0c /path/to/freebsd4.9
 
 or however you access the freebsd install iso disk
 the point being to get access to the /floppies/boot.flp image on the cdrom
 
 # vnconfig vn1 /path/to/freebsd4.9/floppies/boot.flp
 # mkdir /bootfloppy
 # mount_mfs /dev/vn1c /bootfloppy/
 # cp /bootfloppy/kernel.gz /ikernel.gz
 # cp /bootfloppy/mfsroot.gz /mfsroot.gz
 
 then reboot as described...
 
 I am about to try this out... wish me luck!
 
 
 On Mon, 1 Dec 2003 07:18 pm, Alfred Perlstein wrote:
  I have a mini-HOWTO here that could possibly be automated.
 
  Basically we're going to install FreeBSD over FreeBSD without
  a floppy, cdrom or pxe.
 
  This depends on a loader that's compatible with your kernel
  so if really weird lockups happen, you might not be compatible.
 
 
  Anyhow, here we go:
 
 
  Download the boot.flp from the release you want to install.
 
  Mount it like so:
  mdconfig -a -t vnode -f boot.flp
  # should output something like 'md0'
  mkdir -p /mnt
  mount /dev/md0 /mnt
 
  Copy the yummy bits from the install image to your root:
  cp /mnt/kernel.gz /ikernel.gz
  cp /mnt/mfsroot.gz /mfsroot.gz
 
  Now reboot and interrupt the loader when it counts down the boot.
 
  Then type these commands into the loader:
  unload kernel
  load /ikernel
  load -t mfs_root /mfsroot
  set vfs.root.mountfrom
  boot
 
  Now cross your fingers once you wipe the partitions out to reinstall...
 
 
  It would be cool if this could be automated[1], perhaps by setting
  the boot partition to the swap partition and setting it up temporarily
  as a ufs filesystem and then... oh... well...
 
  [1] http://www.jerkcity.com/jerkcity1426.html
 
 
 
 -- 
 Dr Paul van den Bergen
 Centre for Advanced Internet Architectures
 caia.swin.edu.au
 [EMAIL PROTECTED]
 IM:bulwynkl2002
 And some run up hill and down dale, knapping the chucky stones 
 to pieces wi' hammers, like so many road makers run daft. 
 They say it is to see how the world was made.
 Sir Walter Scott, St. Ronan's Well 1824 
 
 



Re: adding more ram

2003-12-10 Thread Robert Watson

On Wed, 10 Dec 2003, Dan Nelson wrote:

 In the last episode (Dec 10), [EMAIL PROTECTED] said:
  I have a server with 1GB of RAM and a 2GB swap partition.  I will
  upgrade the server's memory to 2GB, so my question is:
  
  Should I resize the swap partition to 4GB?
 
 Depends.  Have you ever used up that 2gb of swap?  If not, you'll
 probably never consume 4gb either :)  If this is a database server, or
 something similar where a few processes allocate large amounts of
 memory, you don't need much swap anyway, since if any of those processes
 actually has to swap, you end up thrashing the system as it tries to
 swap 500mb processes in and out of memory.  I really can't think of a
 system that would still perform well with 2 or 3GB of process space in
 swap.  At the 2gb RAM point, you usually have a system where any
 swapping == bad news. 

Actually, the thing I use swap for most now is to make sure I can allocate
large temporary file systems without consuming excessive kernel address
space.  I.e., I'll often create a 512mb swap-backed md device for /tmp,
and make sure I have enough swap to fully back it and everything else,
even though the chances are I won't touch it in normal operation.  I just
don't want to run out in the event something does need it...

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: Sharing data between user space and kernel

2003-12-07 Thread Robert Watson

On Sun, 7 Dec 2003, Anand Subramanian wrote:

 A look at the copyin() code in the kernel reveals that all the kernel
 needs to do to access the data(address space) of a user process is

Note that the copyin/copyout implementation is machine-dependent (MD) and so
while this is true on i386, it may not be true on other systems.  The
fuword/suword/copyin/copyout/uio code is intentionally designed to avoid
the assumption that userspace pointers are directly dereferenceable by
kernel code.  One important example of a situation where this difference
has to be maintained is implementing 32-bit emulation on 64-bit platforms. 
On amd64, you can't just dereference a 32-bit pointer when the kernel is
running in 64-bit mode. 
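
To illustrate why the accessors exist, here is a minimal user-space model
in C.  It is only a sketch: model_uspace and model_copyin are hypothetical
names, and the real copyin() is machine-dependent and also has to cope
with page faults.  The point is that the caller hands over an opaque user
address and a length, and the accessor validates the range before touching
it instead of dereferencing the pointer directly.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one process's user address space. */
struct model_uspace {
	uint64_t       base;	/* lowest valid "user" address */
	size_t         len;	/* number of valid bytes */
	unsigned char *backing;	/* where those bytes really live */
};

/*
 * Model of a copyin()-style accessor: validate the user range, then copy.
 * Returns 0 on success or EFAULT for a bad range, as copyin() does.
 */
static int
model_copyin(const struct model_uspace *us, uint64_t uaddr, void *kaddr,
    size_t len)
{
	/* Written so that the offset + length check cannot overflow. */
	if (uaddr < us->base || len > us->len ||
	    uaddr - us->base > us->len - len)
		return (EFAULT);
	memcpy(kaddr, us->backing + (uaddr - us->base), len);
	return (0);
}
```

Code written against an interface like this keeps working when the user
address cannot be dereferenced by the kernel at all, which is the case the
32-bit emulation example above is about.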

 1.  Get the current thread, which I saw is done using the PCPU_GET macro.
 So I suppose this is always preserved upon a system call.
 
 2. Set the segment register for the user process correctly.
 
 And magically, all the user process's data can now be accessed by the
 kernel directly. 
 
 Is that correct? In the event of which, it would become really easy for
 a user process to allocate a chunk of memory and all a kernel module
 needs to do to implement shared memory is do the steps 1 & 2 and
 access the data.
 
 Of course there is the question that the user process is swapped out
 after the system call and some other thread starts running in between in
 which case curthread should point to some other thread and not the one
 that issued the system call. But then, isn't this what happens upon
 every system call normally, when the kernel does the steps 1 & 2 to
 obtain the data arguments which are passed to the system call. So this
 is hardly a problem. So, can shared memory be implemented this way
 instead of the more traditional pseudo-device way? 
 
 Appreciate any comments on this(please do a CC to my email address, in
 case you choose to respond). 

An additional issue is that user pages are pageable to disk, so may not be
in memory.  If you're holding any mutexes/etc in kernel when you touch one
of those pages, the page fault has to be processed, and you risk (a)
holding the locks for a long time, and (b) lock order problems.  This is
one reason why copyin()/copyout() have to be used very carefully, and this
would apply also to any code replicating that functionality.  If you take
a look at the sysctl() code, you'll see that it wires userspace pages into
memory to avoid the risk of sleeping.  What you probably want to do is
actually allocate wired kernel pages and export them to userspace.  Take a
look at the GEOM gstat(8) implementation, which does exactly that. 
However, you have to make sure that if you ever decide to reuse that
kernel memory for something else (i.e., free it back to the allocator),
you've GC'd all userspace references to it. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research



Re: Reward for fixing keyboard support in FreeBSD, apply within

2003-12-07 Thread Robert Watson

On Sun, 7 Dec 2003, Mathew Kanner wrote:

 On Dec 07, Mathew Kanner wrote:
  The way I see it, FreeBSD needs serious hacking to have
  multiple concurrent keyboards support without serious hacking.
 
   ugh, you know what I mean. 

With mouse support, we have a layer of indirection with moused that
combines input from various mouse devices into a single event stream via
/dev/sysmouse.  While I don't think we want a keyboard daemon at this
point, we might well need to add a similar abstraction in the kernel so
that different keyboard sources can be combined.  Just plugging in a USB
mouse and having it just work is extremely beneficial, and I agree we
need to be able to support the same with keyboards.
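
As a rough sketch of the sort of combining a moused-like layer does, the
following user-space C function merges several input descriptors into one
output stream using poll(2).  This is not moused or any kernel code;
mux_once and MAX_SOURCES are names made up for illustration, and a real
keyboard multiplexer would forward keyboard events rather than raw bytes.

```c
#include <poll.h>
#include <unistd.h>

#define MAX_SOURCES 8

/*
 * Wait for any source to become readable and forward its bytes to out_fd.
 * Returns the number of bytes forwarded, 0 on timeout, or -1 on error.
 */
static ssize_t
mux_once(const int *src_fds, int nsrc, int out_fd, int timeout_ms)
{
	struct pollfd pfd[MAX_SOURCES];
	char buf[64];
	ssize_t total = 0;
	int i, ready;

	if (nsrc < 1 || nsrc > MAX_SOURCES)
		return (-1);
	for (i = 0; i < nsrc; i++) {
		pfd[i].fd = src_fds[i];
		pfd[i].events = POLLIN;
		pfd[i].revents = 0;
	}
	ready = poll(pfd, (nfds_t)nsrc, timeout_ms);
	if (ready <= 0)
		return (ready);	/* 0 on timeout, -1 on error */
	for (i = 0; i < nsrc; i++) {
		ssize_t n;

		if ((pfd[i].revents & POLLIN) == 0)
			continue;
		n = read(pfd[i].fd, buf, sizeof(buf));
		if (n < 0 || write(out_fd, buf, (size_t)n) != n)
			return (-1);
		total += n;
	}
	return (total);
}
```

A consumer then reads only from the other end of out_fd and does not need
to know how many sources exist or when one is plugged in.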

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research



Re: IPFW and the IP stack

2003-12-04 Thread Robert Watson

On Thu, 4 Dec 2003, Devon H.O'Dell wrote:

 This is obviously the most logical explanation. There's a good bit of
 questioning for PFIL_HOOKS to be enabled in generic to allow ipf to be
 loaded as a module as well. If this is the case, we'll have two
 firewalls that have their hooks compiled in by default allowing for them
 both to be loaded as modules. (Is this still scheduled for 5.2?) 
 
 But at this point, there's no way to allow one to turn the IPFW hooks
 *off*. Is there a reason for this? 
 
 Would it be beneficial (or possible) to hook ipfw into pfil(9)? This
 way, we could allow the modules to be loaded by default for both and
 also allow for the total absence of both in the kernel. Sorry if I've
 missed discussions on this and am being redundant. 

Sam Leffler has done a substantial amount of work to push all of the
various hacks (features?) behind PFIL_HOOKS, and I anticipate we'll
ship PFIL_HOOKS enabled in GENERIC in 5.3 and use it to plug in most of
these services.  This also means packages like IPFilter and PF will work
out of the box without a kernel recompile, not to mention offering
substantial architectural cleanup. 
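
For readers who have not looked at the hook approach, the idea can be
modelled in a few lines of user-space C.  This is an illustrative sketch
only and not the pfil(9) API: hook_head, hook_add, hook_remove, hook_run
and drop_short are hypothetical names.  The stack calls one hook point,
and packet filters register or unregister themselves at run time, so an
empty chain costs little and a filter can be turned off again.

```c
#include <stddef.h>

#define MAX_HOOKS 4

/* A filter returns 0 to pass the packet on, nonzero to drop it. */
typedef int (*filter_fn)(const unsigned char *pkt, size_t len);

/* Hypothetical hook point: an ordered list of registered filters. */
struct hook_head {
	filter_fn hooks[MAX_HOOKS];
	int       nhooks;
};

/* Register a filter at the end of the chain. */
static int
hook_add(struct hook_head *hh, filter_fn fn)
{
	if (hh->nhooks >= MAX_HOOKS)
		return (-1);
	hh->hooks[hh->nhooks++] = fn;
	return (0);
}

/* Unregister a filter, closing the gap it leaves in the chain. */
static int
hook_remove(struct hook_head *hh, filter_fn fn)
{
	for (int i = 0; i < hh->nhooks; i++) {
		if (hh->hooks[i] != fn)
			continue;
		for (int j = i; j < hh->nhooks - 1; j++)
			hh->hooks[j] = hh->hooks[j + 1];
		hh->nhooks--;
		return (0);
	}
	return (-1);
}

/* Run the packet through every filter; the first nonzero verdict wins. */
static int
hook_run(const struct hook_head *hh, const unsigned char *pkt, size_t len)
{
	for (int i = 0; i < hh->nhooks; i++) {
		int verdict = hh->hooks[i](pkt, len);

		if (verdict != 0)
			return (verdict);
	}
	return (0);
}

/* Example filter: drop anything shorter than a minimal IPv4 header. */
static int
drop_short(const unsigned char *pkt, size_t len)
{
	(void)pkt;
	return (len < 20 ? 1 : 0);
}
```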

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research



Re: ifconfig(8) refactoring -- YACC grammar now online

2003-11-30 Thread Robert Watson

On Sun, 30 Nov 2003, Bruce M Simpson wrote:

 On Sun, Nov 30, 2003 at 01:12:42PM +0100, Andre Oppermann wrote:
  What I've been thinking about a lot is to make the networking system and
  ifconfig sort of class-based like newbus and geom.
 
 Look at: http://people.freebsd.org/~bms/dump/nifconfig/nifconfig-design.txt
 
 There is a pending change to if_gre to enable it to be easily classified
 in this way; ifconfig would simply query the interface for its if_type.
 This is one way to do it without having to change struct ifnet. We could
 add a new field, but avoiding changing the ABI is a Good Thing. 

if_type seems like it will work for high level classes of interfaces, but
something more fine-grained will be required for interfaces that implement
multiple classes or subclasses (i.e., 802 generally, and also 802.11b). 
Or likewise, tap interfaces might implement 802 generally, but also
if_tap-specific primitives.  Do we need to probe by-name for capabilities
using interface ioctls, or return a list of implemented
interfaces/classes to allow things to be a bit more multidimensional? 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research


Re: ifconfig(8) refactoring -- YACC grammar now online

2003-11-30 Thread Robert Watson

On Sun, 30 Nov 2003, Bruce M Simpson wrote:

 On Sun, Nov 30, 2003 at 02:20:50PM -0500, Robert Watson wrote:
  if_type seems like it will work for high level classes of interfaces, but
  something more fine-grained will be required for interfaces that implement
  multiple classes or subclasses (i.e., 802 generally, and also 802.11b).
 
 The idea just now is we look at if_media if we need to get specific with
 physical interfaces. 
 
 tap would seem to be missing from my list, actually; I note it's used to
 provide VMware support in the absence of Netgraph, amongst other things.

if_tap is actually quite useful, and in the same general class of
synthetic interfaces as if_tun.  I've used both in building tunneling and
topology-manipulation tools, as well as for debugging routing, etc. 
if_tap simulates an 802 device, and if_tun simulates a point-to-point
device.  VMware is the only application I know of using if_tap, although I
have a fair amount of my own code that uses it.  Userland ppp uses if_tun,
as do some of the third party crypto tunneling tools. 

  Or likewise, tap interfaces might implement 802 generally, but also
  if_tap-specific primitives.  Do we need to probe by-name for capabilities
  using interface ioctls, or return a list of implemented
  interfaces/classes to allow things to be a bit more multidimensional?
 
 That might work well, actually -- I already added a MIB to rtsock to
 deal with our lack of reporting multicast group memberships, I see no
 reason not to add one to enumerate loaded interface classes. 
 
 OTOH, for the 'could load kld' case, this falls down, until the instance
 is created, either through cloning or completing ifattach() for a
 physical interface -- but if CREATE is a separate operation this isn't a
 problem, it is a problem if we want to say something like this in one
 go:-
 
   ifconfig gif0 create tunnel 1.2.3.4 5.6.7.8 10.0.0.1 10.0.0.2
 
 Then you do need a means for the ifconfig instance to ask gif0 if it
 speaks 'tunnel-ese' once it's loaded. 
 
 I have to find an abstraction to comfortably deal with this stacking of
 properties/methods, simple polymorphism (a la Java 'implements
 interface')  springs to mind. 

I think that would be a reasonable approach, although it seems to me that
both the inheritance and implements models might apply in looking at
sets of protocol relationships.  A tap interface is a synthetic interface,
it implements synthetic interface controls, as well as implementing 802.
However, it might be neat to hook up 802.11 to a tap-like interface
sometime as well.  Question: does 802.11 imply 802?  If so, a notion of
inheritance might be quite useful for driver implementors.
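
To make the two models concrete, here is a small C sketch that combines
them.  Everything in it is hypothetical (the IFC_* bits, struct if_class,
if_class_caps, if_class_speaks), and it assumes for the sake of the
example that 802.11 does imply 802, which is exactly the open question
above.  A class inherits from at most one parent, and may additionally
declare that it implements other classes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical class identifiers, one bit each. */
#define IFC_SYNTHETIC	(1u << 0)
#define IFC_802		(1u << 1)
#define IFC_80211	(1u << 2)
#define IFC_TAP		(1u << 3)

struct if_class {
	const char            *name;
	unsigned               self;		/* bit naming this class */
	unsigned               implements;	/* extra classes it implements */
	const struct if_class *parent;		/* inherited class, or NULL */
};

/* Collect every class reachable through inheritance and "implements". */
static unsigned
if_class_caps(const struct if_class *c)
{
	unsigned caps = 0;

	for (; c != NULL; c = c->parent)
		caps |= c->self | c->implements;
	return (caps);
}

/* True if the class speaks all of the requested class bits. */
static bool
if_class_speaks(const struct if_class *c, unsigned want)
{
	return ((if_class_caps(c) & want) == want);
}
```

With a tap class declared as implementing IFC_SYNTHETIC | IFC_802 and an
802.11 class whose parent is the 802 class, a tool could ask either one
whether it speaks IFC_802 without knowing how it got there.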

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research


Re: freebsd smp - linux up

2003-11-27 Thread Robert Watson
On Wed, 26 Nov 2003, Anthony Schneider wrote:

 sadly, all ktrace shows is ktrace launching vmware (from 'ktrace
 vmware', shows sh reading and executing, and then ends with the vmware
 fork). 
 
 is there a special way to ktrace linux binaries that i'm not aware of? 

ktrace should work fine, but you need to make sure you use the linux_kdump
port so that the system call trace is interpreted correctly when converted
to text.  As DES points out, make sure you have the right flags to the
ktrace command so tracing is inherited across fork and exec.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories


 
 -Anthony.
 
 On Tue, Nov 25, 2003 at 07:32:35PM +0100, Dag-Erling Smørgrav wrote:
  Anthony Schneider [EMAIL PROTECTED] writes:
   is there a way to have linux emulation report that its kernel is running
   on a UP system even though the freebsd box it's running on is SMP?  i
   would like to get vmware running on my smp -current box, but vmmon_smp.ko
   is broken, and with vmmon_up.ko loaded i get a message about needing to
   be running on an smp linux kernel version 2.0 (2.2) or higher, even though
   linux emulation reports a 2.4 kernel.
  
  It would be interesting to know exactly what it needs that we don't
  provide.  I suspect it's something really trivial...  do you see any
  messages in syslog about unimplemented syscalls?  Could you get a
  ktrace or something?
  
  DES
  -- 
  Dag-Erling Smørgrav - [EMAIL PROTECTED]
 


Re: Help request: problems with a 5.1 server and large numbers of ssh users.

2003-11-20 Thread Robert Watson

On Wed, 19 Nov 2003, Len Sassaman wrote:

 It is my intuition from this behavior that the sshd master process
 listening for connections is unable to spawn a new process to complete
 the authentication step, and thus the connection is being dropped. There
 is no information of use in dmesg, nor in the system logs. (I've cranked
 up LogLevel to DEBUG3 in sshd_config). 
 
 I have a RedHat Linux server running the 2.4.18-3smp kernel on a dual
 Athlon MP 1800+ and 2048MB RAM that is known to handle 1000 users
 without issue -- so I have to believe the FreeBSD box, though not as
 beefy hardware-wise, should be able to do better than a few hundred
 users. I believe this to be some sort of resource limit issue, but I
 have addressed everything I could think of. 

Hmm.  Well, it certainly sounds like a resource limit to me, especially if
it's a nice round number like 150 or 300.  However, I'm also having a
bit of trouble seeing, off the top of my head, which limit it might be. 
It sounds like you've got the ones I would think of.  A quick skim of
sshd.c suggests that it is pretty careful to document various failure
modes in debugging output.  There are one or two failures where it does
not log, and they include the call to pipe() in the server loop -- if that
fails, it bails without an error, which is a little surprising.  Could you
post server debug output for the first connection to the server that
fails?  This would let us see how far it got...  In particular, whether
it did spawn a child process, etc.
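
One cheap thing to rule out is the per-process limits that sshd actually
inherits.  The following C sketch prints two of them with getrlimit(2);
print_limit and print_login_limits are names I've made up here, and note
that this only covers per-process limits, not system-wide ones such as
the kern.maxproc or kern.maxfiles sysctls.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print one limit value, spelling out RLIM_INFINITY. */
static void
print_value(const char *tag, rlim_t v)
{
	if (v == RLIM_INFINITY)
		printf(" %s=unlimited", tag);
	else
		printf(" %s=%llu", tag, (unsigned long long)v);
}

/* Print the soft and hard values of one resource; 0 on success. */
static int
print_limit(const char *label, int resource)
{
	struct rlimit rl;

	if (getrlimit(resource, &rl) != 0)
		return (-1);
	printf("%-8s", label);
	print_value("soft", rl.rlim_cur);
	print_value("hard", rl.rlim_max);
	printf("\n");
	return (0);
}

/* The two limits most likely to matter when forking per-login children. */
static int
print_login_limits(void)
{
	if (print_limit("nproc", RLIMIT_NPROC) != 0)
		return (-1);
	return (print_limit("nofile", RLIMIT_NOFILE));
}
```

Run from the same environment that starts sshd, this shows what the
master process has to work with when it forks a child per connection.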

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: Help request: problems with a 5.1 server and large numbers of ssh users.

2003-11-20 Thread Robert Watson

On Thu, 20 Nov 2003, Ken Smith wrote:

 On Thu, Nov 20, 2003 at 10:56:08AM -0500, Robert Watson wrote:
 
  Hmm.  Well, it certainly sounds like a resource limit to me, especially if
  it's a nice round number like 150 or 300.
 
 One possibility might be running out of pseudo-terminals to support the
 login sessions.  pty's are created as needed I think, and the code that
 handles it is in sys/kern/tty_pty.c.  The limits on it appear to be 256
 ptys: 

I thought about that, but the submitter indicated that pty's were not
being allocated.  However, that would be a really good thing to verify,
since the numbers come out right...
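
For reference, the 256 figure falls straight out of the naming scheme in
the tty_pty.c comment quoted at the end of this message: 8 bank letters
times 32 unit characters.  The C sketch below just does that arithmetic;
pty_max and pty_name are hypothetical helpers, and the index-to-name
mapping (bank = n / 32, unit = n % 32) is my assumption about how the
names are laid out.

```c
#include <stdio.h>
#include <string.h>

static const char pty_banks[] = "pqrsPQRS";
static const char pty_units[] = "0123456789abcdefghijklmnopqrstuv";

/* Number of distinct names the two-character scheme can express. */
static size_t
pty_max(void)
{
	return (strlen(pty_banks) * strlen(pty_units));
}

/* Write "/dev/ptyXY" for index n into buf; 0 on success, -1 otherwise. */
static int
pty_name(size_t n, char *buf, size_t buflen)
{
	size_t nunits = strlen(pty_units);
	int w;

	if (n >= pty_max())
		return (-1);
	w = snprintf(buf, buflen, "/dev/pty%c%c",
	    pty_banks[n / nunits], pty_units[n % nunits]);
	return ((w < 0 || (size_t)w >= buflen) ? -1 : 0);
}
```

Index 0 comes out as /dev/ptyp0 and index 255 as /dev/ptySv; index 256
has no name under this scheme, so a longer bank string alone only helps
if the minor number encoding has room for it as well.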

I should really clean up and commit my pty cleanup at some point, as well
as support for forkpty()/openpty()/etc that avoid the sort of code found
below.  Presumably that would be a 5.3 thing. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories


 
 /*
  * This function creates and initializes a pts/ptc pair
  *
  * pts == /dev/tty[pqrsPQRS][0123456789abcdefghijklmnopqrstuv]
  * ptc == /dev/pty[pqrsPQRS][0123456789abcdefghijklmnopqrstuv]
  *
  * XXX: define and add mapping of upper minor bits to allow more
  *  than 256 ptys.
  */
 
 I don't know if simply changing the :
 
   static char *names = "pqrsPQRS";
 
 to something longer is all that would be required or if there are
 other factors involved.
 
 -- 
   Ken Smith
 - From there to here, from here to  |   [EMAIL PROTECTED]
   there, funny things are everywhere.   |
   - Theodore Geisel |
 


Re: 4.9 KLDload error

2003-11-08 Thread Robert Watson

On Fri, 7 Nov 2003, Jin Guojun [NCS] wrote:

 A KLD module ncs_time_ctl.ko compiled on both 4.8 and 4.9 hosts can be
 loaded by kldload on any 4.8 machine. But neither .ko files can be
 loaded on a 4.9 machine.  The error is: 
 
 4.9 # kldload -v ./ncs_time_ctl.ko
 kldload: can't load ./ncs_time_ctl.ko: Exec format error
 
 kldload should give more error information on what function it failed to load.
 
 Is this possibly a 4.9 bug in kldload, or has some KLD mechanism been
 changed in 4.9-RELEASE?  Is there any way to analyze what is wrong
 in the 4.9 KLD system? 

Unfortunately, the UNIX errno mechanism isn't very expressive.  However,
the kernel linker will send debugging output to the system console.  Check
dmesg and see if there's more information there.  Typically, this error
will be the result of a failure to link symbols in the module: either due
to a symbol already present, or a missing dependency.  To debug this
further, look at the console output, and also compare the output of nm 
on the .ko built on 4.8 and 4.9 to see if its dependencies or exposed
symbols have changed.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories


Re: sending messages, user process -- kernel module

2003-11-07 Thread Robert Watson

On Fri, 7 Nov 2003, Jerry Toung wrote:

 I am trying to do asynchronous send/receive between a user process that
 I am writing and a kernel module that I am also writing.  I thought
 about implementing something similar to unix routing socket, but I will
 have to define a new domain and protosw.  Beside that idea, what else
 would you suggest? 

This is actually somewhat of a FAQ, since it comes up with relative
frequency.  I should dig up my most recent answer and forward that to you,
but the quicky answers off the top of my head are:

(1) One frequent answer is a pseudo-device -- for example, /dev/log
buffers kernel log output for syslogd to pick up asynchronously.  Arla
and Coda both use pseudo-devices as a channel for local procedure
calls to/from userspace to support their respective file systems using
userspace cache managers.

(2) Have the kernel open a file system FIFO and have the process read from that
FIFO.  The client-side NFS locking code uses /var/run/lock to ship
locking events to a userspace rpc.lockd.  However, responses from
rpc.lockd are then delivered to the kernel using a system call
synchronously from the user process, as opposed to via a FIFO.

(3) The routing socket approach can work quite well, especially if you
need multicast semantics for messages, not to mention well-defined
APIs for managing buffer size, etc. Another instance of this approach
is PF_KEY, used for IPsec key management.  As you point out, it
requires digging into other code and a fair amount of implementation
overhead.

(4) You can have kernel code create and listen on sockets in existing
domains, including UNIX domain sockets and TCP/IP sockets.  The NFS
client and server code both make use of sockets directly in the
kernel for RPCs.

One particularly nice benefit of (2) and (4) is that it's easy to
implement userspace test code, since the fifo/socket is just used as a
rendezvous and doesn't care if the other end is in kernel or not. 
Likewise, the blocking/buffering/... semantics are quite well defined,
which means you won't be tracking down wakeups, select semantics, thread
behavior and synchronization, etc, as you might do in (1).
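
As a sketch of what such userspace test code might look like, the C
fragment below uses socketpair(2) as the rendezvous, with one end
standing in for the kernel side.  The message layout (struct event_msg)
and the helper names are hypothetical; a real design would define its own
wire format.  SOCK_DGRAM is used so that message boundaries are kept.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical fixed-size message exchanged over the rendezvous. */
struct event_msg {
	unsigned type;
	unsigned arg;
};

/* Create a connected pair; in a test, both ends live in userspace. */
static int
make_rendezvous(int fds[2])
{
	return (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds));
}

/* Send one whole message; 0 on success, -1 on error or short write. */
static int
send_event(int fd, const struct event_msg *m)
{
	return (write(fd, m, sizeof(*m)) == (ssize_t)sizeof(*m) ? 0 : -1);
}

/* Receive one whole message; 0 on success, -1 on error or short read. */
static int
recv_event(int fd, struct event_msg *m)
{
	return (read(fd, m, sizeof(*m)) == (ssize_t)sizeof(*m) ? 0 : -1);
}
```

The daemon under test only ever sees a descriptor, so the same
send_event()/recv_event() paths are exercised whether the peer is a test
harness or the kernel.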

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: Update: Debox sendfile modifications

2003-11-05 Thread Robert Watson
On Wed, 5 Nov 2003, Igor Sysoev wrote:

 As to worker kthreads I think it's better to queue aio operation as it
 was made in src/sys/kern/vfs_aio.c:aio_qphysio(). 

One of the things that worries me about the proposal to use kernel worker
threads to perform the I/O is that this can place a fairly low upper bound
on effective parallelism, unless the kernel threads themselves can issue
the I/O's asynchronously.  In the network stack itself, we are event and
queue driven without blocking--if we can maintain the apparent semantics
to the application, it would be very nice to be able to handle that at the
socket layer itself.  I.e., not waste a thread + stack per in-progress 
operation, and instead have a worker or two that simply propel operations
up and down the stack (similar to geom_up and geom_down). 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: Update: Debox sendfile modifications

2003-11-05 Thread Robert Watson

On Wed, 5 Nov 2003, Igor Sysoev wrote:

 On Wed, 5 Nov 2003, Robert Watson wrote:
 
  On Wed, 5 Nov 2003, Igor Sysoev wrote:
  
   As to worker kthreads I think it's better to queue aio operation as it
   was made in src/sys/kern/vfs_aio.c:aio_qphysio(). 
  
  One of the things that worries me about the proposal to use kernel worker
  threads to perform the I/O is that this can place a fairly low upper bound
  on effective parallelism, unless the kernel threads themselves can issue
  the I/O's asynchronously.  In the network stack itself, we are event and
  queue driven without blocking--if we can maintain the apparent semantics
  to the application, it would be very nice to be able to handle that at the
  socket layer itself.  I.e., not waste a thread + stack per in-progress 
  operation, and instead have a worker or two that simply propel operations
  up and down the stack (similar to geom_up and geom_down). 
 
 As far as I understand src/sys/kern/vfs_aio.c:aio_qphysio() (that
 handles AIO on raw disks) does not use kthreads and simply queues
 operations. 

I think it sounds like we're actually agreeing with each other. 
Currently, AIO does use threads for non-character devices, so in the
socket case it will be using a worker thread. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: Experimental FreeBSD and Linux kernel source cross reference web site

2003-10-30 Thread Robert Watson

On Thu, 30 Oct 2003, Hiten Pandya wrote:

   Thank you very very much! ;-)
 
   At last, someone got to it.  I have been wanting to setup LXR for
   DragonFly for quite some time now, but did not have enough time
   on my hands to mess with it.  Does it require any sort of
   patching for it to work on FreeBSD ?  I recall it requires MySQL
   and some other stuff..

I'm actually using an older version of the lxr software, 0.3.1, which
doesn't make use of a back-end SQL database, rather, some simple db-based
data stores and glimpse for searches.  It was a lot easier to set up, once
I fixed some rather critical bugs :-).

I've gone ahead and dropped a snapshot of the DFBSD sys tree on fxr as
well, and am currently cvsuping opendarwin source to drop a recent
snapshot of xnu.  I'm not sure if there are any DFBSD tags worth using
other than HEAD, so I just used a timestamp for the checkout.  The
rearrangement of the DFBSD tree makes diffing between FreeBSD and DFBSD
bits a little more difficult, but in many cases it's fairly feasible. I've
been trying to decide how to improve diffability between the FreeBSD and
Darwin trees: most FreeBSD bits compare directly with xnu/bsd/..., not
xnu/..., and lxr isn't very flexible about how it sets up diff
comparisons. 

I've also noticed that lxr is currently unwilling to index macros as
identifiers when they're generated at compile-time, which means (for
example) that you have to use freetext searches to find vnode operation
macro use.  I'm not sure how much more time I'm willing to invest in
further refining lxr itself, but I'll keep the source code snapshots
up-to-date and bring in new sources of kernel source code as appropriate. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories


Experimental FreeBSD and Linux kernel source cross reference web site

2003-10-29 Thread Robert Watson

In the past when browsing the Linux source code, I've made extensive use
of the Linux Cross-Reference (LXR) hosted at lxr.linux.no.  This web site
provides a cross-referenced and searchable HTML interface to the Linux
source code; you can perform freetext and identifier searches, check
differences between revisions, etc.  For FreeBSD, we provide a cvsweb
interface that is extremely useful for tracking changes, but a little less
useful for raw browsing when you're looking for use of an identifier. In
the past, CMU's PDL (and possibly others) have provided FreeBSD
cross-reference web pages, but I was unable to find one once that site
went down.  As such, I've experimentally set up the LXR software with
access to several branches of the FreeBSD source code, as well as 2.4 and
2.6 Linux kernels at: 

http://fxr.watson.org/

This is experimental, but I've found it to be quite useful for my own
work.  I'm intermittently synchronizing the checked out snapshots to CVS. 
LXR is a useful piece of software, but not designed to handle multiple
source code collections so well (i.e., currently isn't a good candidate
for all of src).  On the other hand, making the source code easier to
search and browse is a very useful thing, so feel free to give it a spin
:-).  I'll probably keep tweaking and playing with the configuration, as
well as put more revisions of the Linux source online, probably drop in an
OpenBSD, NetBSD, or DFBSD snapshot or two, etc, soon also.  I don't
promise it will be there tomorrow, but if it proves useful and
interesting, it probably will be :-). 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories


Re: Experimental FreeBSD and Linux kernel source cross reference web site

2003-10-29 Thread Robert Watson
FYI, lxr's C parsing code appears to dislike some of our C constructs.  I
haven't had a chance to dig in much yet, but this is a warning that there
are some glitches (for example, kern_prot.c seems to be improperly parsed
in RELENG_4).  Also, the identifier database seems somewhat prone to
corruption if aborted part way through processing; the identifier database
for HEAD appears currently to be corrupted so I'm rebuilding it.  So if an
identifier search fails unexpectedly, or you notice that a C file is not
highlighted with cross-reference links for important identifiers, that's
probably why: try again in ten minutes.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories


Re: FreeBSD mail list etiquette

2003-10-25 Thread Robert Watson

On Sat, 25 Oct 2003, Matthew Dillon wrote:

 It's a lot easier lockup path than the direction 5.x is going, and
 a whole lot more maintainable IMHO because most of the coding doesn't
 have to worry about mutexes or LORs or anything like that.  

You still have to be pretty careful, though, with relying on implicit
synchronization, because while it works well deep in a subsystem, it can
break down on subsystem boundaries.  One of the challenges I've been
bumping into recently when working with Darwin has been the split between
their Giant kernel lock, and their network lock.  To give a high level
summary of the architecture, basically they have two Funnels, which behave
similarly to the Giant lock in -STABLE/-CURRENT: when you block, the lock
is released, allowing other threads to enter the kernel, and regained when
the thread starts to execute again. They then have fine-grained locking
for the Mach-derived components, such as memory allocation, VM, et al. 

Deep in a particular subsystem -- say, the network stack, all works fine. 
The problem is at the boundaries, where structures are shared between
multiple compartments.  I.e., process credentials are referenced by both
halves of the Darwin BSD kernel code, and are insufficiently protected
in the current implementation (they have a write lock, but no read lock,
so it looks like it should be possible to get stale references with
pointers accessed in a read form under two different locks). Similarly,
there's the potential for serious problems at the surprisingly frequently
occurring boundaries between the network subsystem and remainder of the
kernel: file descriptor related code, fifos, BPF, et al.  By making use of
two large subsystem locks, they do simplify locking inside the subsystem,
but it's based on a web of implicit assumptions and boundary
synchronization that carries most of the risks of explicit locking.

It's also worth noting that there have been some serious bugs associated
with a lack of explicit synchronization in the non-concurrent kernel model
used in RELENG_4 (and a host of other early UNIX systems relying on a
single kernel lock).  These have to do with unexpected blocking deep in a
function call stack, where it's not anticipated by a developer writing
source code higher in the stack, resulting in race conditions.  In the
past, there have been a number of exploitable security vulnerabilities due
to races opened up in low memory conditions, during paging, etc.  One
solution I was exploring was using the compiler to help track the
potential for functions to block, similar to the const qualifier, combined
with blocking/non-blocking assertions evaluated at compile-time.  However,
some of our current APIs (M_NOWAIT, M_WAITOK, et al) make that approach
somewhat difficult to apply, and would have to be revised to use a
compiler solution.  These potential weaknesses very much exist in an
explicit model, but with explicit locking, we have a clearer notion of how
to express assertions.

In -CURRENT, we make use of thread-based serialization in a number of
places to avoid explicit synchronization costs (such as in GEOM for
processing work queues), and we should make more use of this practice. 
I'm particularly interested in the use of interface interrupt threads
performing direct dispatch as a means to maintain interface ordering of
packets coming in network interfaces while allowing parallelism in network
processing (you'll find this in use in Sam's netperf branch currently).

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories


Synchronization philosophy (was: Re: FreeBSD mail list etiquette)

2003-10-25 Thread Robert Watson

(Subject changed to reflect the fact that it contains useful technical
content and banter, resulting in a hijacking of the thread; hope no one
minds)

On Sat, 25 Oct 2003, Matthew Dillon wrote:

 Yes.  I'm not worried about BPF, and ucred is easy since it is
 already 95% of the way there, though messing with ucred's ref count
 will require a mutex or an atomic bus-locked instruction even in 
 DragonFly!  The route table is our big issue.  TCP caches routes so we
 can still BGL the route table and achieve 85% of the scaleable
 performance so I am not going to worry about the route table initially.
 
 An example with ucred would be to passively queue it to a particular cpu
 for action.  Lets say instead of using an atomic bus-locked instruction
 to manipulate ucred's ref count, we instead send a passive IPI to the
 cpu 'owning' the ucred, and that ucred is otherwise read-only.  A 
 passive IPI, which I haven't implemented yet, is simply queueing an
 IPI message but not actually generating an interrupt on the target cpu
 unless the CPU-CPU software IPI message FIFO is full, so it doesn't
 actually waste any cpu cycles and multiple operations can be executed
 in-batch by the target.  Passive IPIs can be used for things
 that do not require instantaneous action and both bumping and releasing
 ref counts can take advantage of it.  I'm not saying that is how
 we will deal with ucred, but it is a definite option.

Actually, the problem isn't so much the data referenced by ucred, but the
references themselves.  Part of the issue in Darwin is that ucred
references are always gained using the p_ucred pointer in the proc
structure.  The proc structure is read and dereferenced fairly deep in the
network code (network funnel), and also in the remainder of the kernel
(kernel funnel).  In addition, there's a lock used to serialize writes to
p->p_ucred, but not to protect against reads of stale data.  Shared
structures, such as these, occur in pretty large quantity in BSD code, and
will be a problem no matter what approach to synchronization is taken. 
Moving towards message passing helps to structure the code to avoid
sharing of this sort, although it's not the only way to motivate that sort
of change.  I'm a big fan of the change in -CURRENT to use td->td_cred as
a read-only thread-local credential reference and avoid synchronization on
the credential reference--it nicely addresses the requirements for
consistency in the referenced data for the read-only cases (which are the
vast majority of uses of a credential).
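
A minimal userspace sketch of the thread-local credential reference,
assuming a reference-counted credential that is immutable once published.
The thread takes one reference when it notices the process credential has
changed, and afterwards reads through its private pointer with no further
synchronization.  The types cred, proc_like and thread_like and all
function names are hypothetical stand-ins, not the FreeBSD structures; in a
kernel the pointer comparison and hold would happen under the process lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct cred {
    atomic_int refs;        /* shared, so updated atomically */
    int uid;                /* never modified after publication */
};

struct proc_like {
    struct cred *p_cred;    /* writers swap this pointer under a lock */
};

struct thread_like {
    struct cred *td_cred;   /* private snapshot, read without locking */
};

static struct cred *cred_alloc(int uid)
{
    struct cred *c = malloc(sizeof(*c));

    if (c == NULL)
        return NULL;
    atomic_init(&c->refs, 1);
    c->uid = uid;
    return c;
}

static struct cred *cred_hold(struct cred *c)
{
    assert(c != NULL);
    atomic_fetch_add(&c->refs, 1);
    return c;
}

/* Drop one reference; returns 1 if it was the last and the cred was freed. */
static int cred_free(struct cred *c)
{
    if (atomic_fetch_sub(&c->refs, 1) == 1) {
        free(c);
        return 1;
    }
    return 0;
}

/* On kernel entry: refresh the snapshot only if the process cred changed. */
static void thread_sync_cred(struct thread_like *td, struct proc_like *p)
{
    if (td->td_cred == p->p_cred)
        return;
    if (td->td_cred != NULL)
        cred_free(td->td_cred);
    td->td_cred = cred_hold(p->p_cred);
}
```

Between two calls to thread_sync_cred() the thread sees one consistent
credential even if the process credential is replaced underneath it.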

There are a number of cases where moving towards a message passing
philosophy would really clean up the synchronization and parallelism
issues in FreeBSD: for example, even the relatively simple accounting file
rotation would benefit from queue-like operation to serialize the
accounting data/event stream and rotation events.  Using locks and
condition variables to perform serialization as is currently done in the
accounting code is unwieldy and bug-prone.  However, when moving to
event/message queuing, you also have to be very careful with data
ownership and referencing, as well as proper error-handling.  With
accounting, most scheduled vnode operations are asynchronous and have no
need for substantial error handling (when a process performs execve(),
regardless of whether accounting of that operation succeeds or fails,
execve() continues).  The start/stop operation, however, is intended to be
synchronous.  Happily, in the accounting case, all necessary error
checking can be performed in advance of the handoff to the accounting
thread from the user thread, but that won't always be the case...
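
As a rough sketch of the queue-based alternative, assume accounting records
and rotation requests travel through one FIFO to a single worker.  Errors
are checked on the submitting side before the handoff, and only the worker
touches the current target file.  Everything here (acct_queue, acct_submit,
acct_drain, the integer file ids) is hypothetical, and the sketch is
single-threaded: a real queue shared between threads would need its own
lock or lock-free protocol.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum acct_ev_type { ACCT_RECORD, ACCT_ROTATE };

struct acct_event {
    enum acct_ev_type type;
    int value;                /* record payload, or id of the new file */
};

#define ACCT_QLEN 8

struct acct_queue {
    struct acct_event ev[ACCT_QLEN];
    size_t head, count;
    int cur_file;             /* owned by the worker alone */
    int written[ACCT_QLEN];   /* file id used for each record, for the demo */
    size_t nwritten;
};

/* Submitter: validate first, so the worker never has to report errors. */
static int acct_submit(struct acct_queue *q, enum acct_ev_type type, int value)
{
    assert(q != NULL);
    if (type == ACCT_ROTATE && value < 0)
        return EINVAL;        /* rejected synchronously, before handoff */
    if (q->count == ACCT_QLEN)
        return ENOBUFS;
    q->ev[(q->head + q->count) % ACCT_QLEN] = (struct acct_event){ type, value };
    q->count++;
    return 0;
}

/* Worker: events are applied strictly in submission order. */
static void acct_drain(struct acct_queue *q)
{
    while (q->count > 0) {
        struct acct_event e = q->ev[q->head];

        q->head = (q->head + 1) % ACCT_QLEN;
        q->count--;
        if (e.type == ACCT_ROTATE)
            q->cur_file = e.value;
        else if (q->nwritten < ACCT_QLEN)
            q->written[q->nwritten++] = q->cur_file;
    }
}
```

Because rotation is just another event in the stream, a record submitted
before a rotation always lands in the old file and one submitted after it
lands in the new file.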

One of the other risks that has worried me about this approach is that
explicit locking has some nice benefits from the perspective of
deadlocking and lock order management: monitoring for deadlocks and lock
orders is a well-understood topic, and the tools for tracking deadlocks
and wait conditions, as well as for performing locking and waiting safely,
are mature.  As with the older BSD sleeping interfaces, such as
tsleep(), synchronous waits on messages are harder to mechanically track,
and resulting deadlocks resemble resource deadlocks more than lock
deadlocks...  On the other hand, some forms of tracing may be made easier. 
I've had some pretty nasty experiences trying to track deadlocks between
cooperating threads due to message waits, and found tools such as WITNESS
much easier to work with. 
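
To illustrate why lock orders are mechanically checkable, here is a much
simplified sketch of the kind of bookkeeping a lock order verifier does.
It is not WITNESS itself: it assumes small integer lock ids, tracks one
thread, and only catches direct reversals between pairs of locks.  The
names order_checker, oc_acquire and oc_release are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAXLOCKS 8

struct order_checker {
    /* before[a][b] != 0: lock a was held while lock b was acquired */
    unsigned char before[MAXLOCKS][MAXLOCKS];
    int held[MAXLOCKS];
    int nheld;
};

static void oc_init(struct order_checker *oc)
{
    memset(oc, 0, sizeof(*oc));
}

/* Record an acquisition; returns 1 if it reverses an order seen earlier. */
static int oc_acquire(struct order_checker *oc, int lock)
{
    int reversal = 0;

    assert(lock >= 0 && lock < MAXLOCKS && oc->nheld < MAXLOCKS);
    for (int i = 0; i < oc->nheld; i++) {
        int h = oc->held[i];

        if (oc->before[lock][h])
            reversal = 1;         /* earlier: lock, then h; now: h, then lock */
        oc->before[h][lock] = 1;
    }
    oc->held[oc->nheld++] = lock;
    return reversal;
}

static void oc_release(struct order_checker *oc, int lock)
{
    for (int i = 0; i < oc->nheld; i++) {
        if (oc->held[i] == lock) {
            oc->held[i] = oc->held[--oc->nheld];
            return;
        }
    }
}
```

A wait on a message has no comparable "held" set to record, which is part
of why such waits are harder to track mechanically.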

In some work we're doing for one of our customers, we make extensive use
of handoff between various submitting threads and a serializing kernel
thread making use of thread-local storage to avoid explicit
synchronization.  Having dealt both with lower level lock/cv primitives
for event passing, and message passing, I have to say I'm leaning far more
towards the message passing.  However, it benefits particularly from the
approach due to its 

Re: Is socket buffer locking as questionable as it seems?

2003-10-04 Thread Robert Watson

On Sat, 4 Oct 2003, Brian Fundakowski Feldman wrote:

 I keep getting these panics on my SMP box (no backtrace or DDB or crash
 dump of course, because panic() == hang to FreeBSD these days):  panic:
 receive: m == 0 so->so_rcv.sb_cc == 52 From what I can tell, all sorts
 of socket-related calls are MP-safe and yet never even come close to
 locking the socket buffer.  From what I can tell, the easiest way for
 this to occur would be sbrelease() being called from somewhere that it's
 supposed to, but doesn't, have sblock().  Has anyone seen these, or a
 place to start looking?  Maybe a way to get panics to stop hanging the
 machine?  TIA if anyone has some enlightenment. 

The system calls are marked MPSAFE in the case of the socket calls because
the grabbing of Giant has been pushed down into the system call, as
opposed to Giant being grabbed by the system call code itself.  Giant
should be held across all the relevant socket-related events -- if you
find a place where it's not, send some details :-).  As you observe, there
is currently no socket locking in the source tree, although I'm hopeful
that will be remedied in the next couple of months.  The lower levels of
the IP stack can be run Giant-free at this point, although my local
patches to run multiple input paths in parallel run into a panic due to
insufficient locking in ip_forward() (bug report already filed with Sam). 

One of the conclusions from the recent developer summit was that a big
focus needs to be placed on interrupt processing latency and device driver
improvements so that we get the benefits of fine-grained locking.
Peter has picked up the task of doing a driver API sweep to provide
better facilities for doing this.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: 4.8-stable kernel panic

2003-09-15 Thread Robert Watson
If one of you has had a chance to test this properly, please go ahead and
commit.  I don't have remote -STABLE development boxes, so haven't been
able to do any -STABLE merging since I went to BSDCon.  I did get RE
permission to MFC this change.

FYI, I have a bunch more related changes in a patch that I can dig up once
I'm caught up on work re-mail.  There are a number of M_TRYWAIT scenarios
where we don't test the return value -- some easier to fix than others. 
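
The pattern at issue, sketched as a userspace analogue: a prepend operation
that can fail and, like M_PREPEND(), consumes the packet and yields NULL
when it does, so the caller has to test the result before touching the
header.  The names pkt, pkt_new, pkt_prepend and raw_output, and the
fail_alloc switch used to simulate exhaustion, are hypothetical and are not
the mbuf API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct pkt {
    size_t len;
    unsigned char *data;
};

static struct pkt *pkt_new(const void *payload, size_t len)
{
    struct pkt *p = malloc(sizeof(*p));

    assert(p != NULL);
    p->data = malloc(len ? len : 1);
    assert(p->data != NULL);
    memcpy(p->data, payload, len);
    p->len = len;
    return p;
}

/*
 * Grow the packet by hdrlen zeroed bytes at the front.  On failure the
 * packet is freed and NULL is returned.
 */
static struct pkt *pkt_prepend(struct pkt *p, size_t hdrlen, int fail_alloc)
{
    unsigned char *n = fail_alloc ? NULL : malloc(p->len + hdrlen);

    if (n == NULL) {
        free(p->data);
        free(p);
        return NULL;
    }
    memset(n, 0, hdrlen);
    memcpy(n + hdrlen, p->data, p->len);
    free(p->data);
    p->data = n;
    p->len += hdrlen;
    return p;
}

/* Output path: returns 0 and hands back the packet, or ENOBUFS. */
static int raw_output(struct pkt *p, int fail_alloc, struct pkt **out)
{
    p = pkt_prepend(p, 20, fail_alloc);
    if (p == NULL)
        return ENOBUFS;     /* without this test the next line derefs NULL */
    p->data[0] = 0x45;      /* IPv4, five 32-bit words of header */
    *out = p;
    return 0;
}
```

Each unchecked call site is a place where resource exhaustion turns into a
NULL dereference instead of an error return.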

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories

On Mon, 15 Sep 2003, Maxim Konovalov wrote:

 On Sun, 14 Sep 2003, 23:05-0500, Mike Silbersack wrote:
 
 
  On Sun, 14 Sep 2003 [EMAIL PROTECTED] wrote:
 
   Hello,
  
   It's been almost a month now since I posted the original message on the
   list and I'm wondering about the progress on resolving this problem.
  
   I still can reproduce the panics after cvs-supping to RELENG_4 ~ 23:00 EDT
   today.
  
   Thanks much.
 
  Ooops, I forgot to follow up on this.
 
  Ok, a few questions:
 
  1.  Can you compile INVARIANTS and INVARIANT_SUPPORT into your kernel?
  That might help us track down the problem.
 
  2.  What does your network setup look like?  Are you using divert sockets,
  is there ppp in action, etc.
 
  I believe that I tried out your script at the time, and I couldn't find it
  to cause any problems on my system.
 
 rwatson has fixed this panic in rev. 1.115 in -current:
 
 revision 1.115
 date: 2003/08/26 14:11:48;  author: rwatson;  state: Exp;  lines: +2 -0
 M_PREPEND() with an argument of M_TRYWAIT can fail, meaning the
 returned mbuf can be NULL.  Check for NULL in rip_output() when
 prepending an IP header.  This prevents mbuf exhaustion from
 causing a local kernel panic when sending raw IP packets.
 
 PR: kern/55886
 Reported by:Pawel Malachowski [EMAIL PROTECTED]
 MFC after:  3 days
 
 and haven't MFCed it yet.  Here is a patch for -stable:
 
 Index: sys/netinet/raw_ip.c
 ===
 RCS file: /home/ncvs/src/sys/netinet/raw_ip.c,v
 retrieving revision 1.64.2.17
 diff -u -r1.64.2.17 raw_ip.c
 --- sys/netinet/raw_ip.c  9 Sep 2003 19:09:22 -   1.64.2.17
 +++ sys/netinet/raw_ip.c  15 Sep 2003 04:21:59 -
 @@ -257,6 +257,8 @@
   return(EMSGSIZE);
   }
   M_PREPEND(m, sizeof(struct ip), M_WAIT);
 + if (m == NULL)
 + return(ENOBUFS);
   ip = mtod(m, struct ip *);
   ip->ip_tos = inp->inp_ip_tos;
   ip->ip_off = 0;
 %%%
 
 -- 
 Maxim Konovalov, [EMAIL PROTECTED], [EMAIL PROTECTED]
 


Reminder: BSDCon next week in San Mateo!

2003-09-04 Thread Robert Watson

This is just a friendly reminder e-mail that the BSD Conference is taking
place in San Mateo next week, and that if you're planning to attend and
haven't yet registered, you might want to.  Or, just turn up and register
at the door.

There's a really strong lineup of FreeBSD-related papers, especially
relating to new features in the 5-CURRENT development line. I've attached
a list of just some of the interesting things that will be going on there:
they include a number of tutorials relating to development and
administration, technical session presentations relating to the
development of FreeBSD, development of products using FreeBSD, and the
deployment of FreeBSD-based systems.  And, as always, there will be a
variety of invited talks, BoFs and work-in-progress sessions. 

USENIX has extended their early registration pricing, and also (I believe) 
has an online registration discount.  Multi-employee discounts are also
available for companies sending more than one employee.  You can find out
more about the location, schedule of events, etc, at: 

  http://www.usenix.org/events/bsdcon03/

I look forward to seeing you there!

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories


  Several excellent tutorials including one on developing storage
extensions using GEOM

  Keynote: Computing Fallacies (or, What Is the World Coming To?)

  Reasoning about SMP in FreeBSD

  devd - A Device Configuration Daemon

  ULE: A Modern Scheduler for FreeBSD

  An Automated Binary Security Update System for FreeBSD

  Building a High-performance Computing Cluster Using FreeBSD

  build.sh: Cross-building NetBSD

  Invited Talk: Long Range 802.11 WANs

  BSD Status Reports

  GBDE - GEOM Based Disk Encryption

  Cryptographic Device Support for FreeBSD

  Enhancements to the Fast Filesystem to Support Multi-Terabyte Storage
Systems

  Invited Talk: Social and Technical Implications of Nonproprietary
Software

  Running BSD Kernels as User Processes by Partial Emulation and Rewriting
of Machine Instructions

  A Digital Preservation Network Appliance Based on OpenBSD

  Using FreeBSD to Render Realtime Localized Audio and Video

  Work in Progress Reports (WiPs)

  Tagging Data in the Network Stack: mbuf_tags

  Fast IPSec: A High-Performance IPsec Implementation

  The WHBA Project: Experiences deeply embedding NetBSD

  Invited Talk: Post-Digital Possibilities



Re: Ugly Huge BSD Monster

2003-09-01 Thread Robert Watson

On Mon, 1 Sep 2003, Denis Troshin wrote:

 Almost every package I install requires a few other packages. This 'idea
 of using dependent packages' turns FreeBSD (and other unix-systems) to
 an ugly monster. 
 
 For example, I don't need Perl or Python but a few packages I install
 require them. 
 
 Does programming under unix exist without these dependencies? 
 
 P.S.  Under Windows it is possible to write not bad applications which
 depend just on libraries (KERNEL32, USER32, GDI32).  And these libs
 exist on every base system!!! 
 
 Is it possible in unix? 
 
 Before, I thought that unix programs were very compact, but they are huge! 

You've already got a boatload of responses, but I figured I'd throw in
mine: it depends on the application.  If applications require a scripting
language, by virtue of what they do or how they are written, well, you get
a scripting language in the dependencies.  To get a Windows-like
environment on FreeBSD, you need to layer the X server and then a
toolkit/windowing environment on top -- my personal leaning right now is
to stick Qt/KDE on top.  Once you have those pieces in place, you have a
lot of what you need to write general-purpose applications interacting
with users, the network, multimedia, etc.

If you look at some of the key UNIX software packages, however, you'll
see that they tend not to have a lot of dependencies -- Apache, Postgres,
MySQL, etc.  These applications avoid dependencies through less reliance
on scripting, GUI elements, etc.  One of the upsides, and downsides, of
the open source world is a strong dependence on scripting, and the
resulting diversification of scripting languages and rapid prototyping
tools.  This occurs in the Windows world also, though -- if you rely on
Java, you need the JVM.  If you have TCL applications, you need the TCL
environment as well.  Many web sites running on Windows use Perl for CGI
just as they do in UNIX, in which case you need Perl... 

One of the nice things about this package-oriented approach is that the
dependencies are generally very explicit: you want to write a GUI app, so
you need the GUI pieces.  Your application requires a back-end database,
so a database dependency is introduced.  In Windows, you have a larger
base but less ability to decompose as a result.  I'm also a bit alarmed
when I install a new application and pick up two new scripting languages
along the way -- I tend to avoid installing applications that pull in
scripting as a dependency.  However, sometimes that's unavoidable.  In
Windows, I think you'll find applications depend on more in the way of
libraries than you think, though...  Upgrades to system DLLs when you
build and install applications are not infrequent -- application vendors
tend to bundle all the dependent runtime components and quietly
install them.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories


Re: Minimalist FreeBSD 4.8

2003-08-27 Thread Robert Watson
As has been mentioned, the FreeBSD source tree as shipped isn't configured
for minimization without a fair amount of effort.  However, there are a
number of larger components, typically maintained by third parties, that
are build-time removable, and are typically arguments to the build
specified in make.conf.  Here are the components people like to disable
with relative frequency, found by grepping for '^#NO' in
/usr/share/examples/etc/make.conf, plus a little trimming for entries that
have to do with compile flags on -CURRENT:

#NO_CVS=         true    # do not build CVS
#NO_CXX=         true    # do not build C++ and friends
#NO_BIND=        true    # do not build BIND
#NO_FORTRAN=     true    # do not build g77 and related libraries
#NO_GDB=         true    # do not build GDB
#NO_I4B=         true    # do not build isdn4bsd package
#NO_IPFILTER=    true    # do not build IP Filter package
#NO_KERBEROS=    true    # do not build and install Kerberos 5 (KTH Heimdal)
#NO_LPR=         true    # do not build lpr and related programs
#NO_MAILWRAPPER= true    # do not build the mailwrapper(8) MTA selector
#NO_MODULES=     true    # do not build modules with the kernel
#NO_OBJC=        true    # do not build Objective C support
#NO_OPENSSH=     true    # do not build OpenSSH
#NO_OPENSSL=     true    # do not build OpenSSL (implies NO_KERBEROS and
#NO_SENDMAIL=    true    # do not build sendmail and related programs
#NO_SHAREDOCS=   true    # do not build the 4.4BSD legacy docs
#NO_TCSH=        true    # do not build and install /bin/csh (which is tcsh)
#NO_X=           true    # do not compile in XWindows support (e.g. doscmd)
#NOCRYPT=        true    # do not build any crypto code
#NOGAMES=        true    # do not build games (games/ subdir)
#NOINFO=         true    # do not make or install info files
#NOLIBC_R=       true    # do not build libc_r (re-entrant version of libc)
#NOMAN=          true    # do not build manual pages
#NOPROFILE=      true    # Avoid compiling profiled libraries
#NOSHARE=        true    # do not go into the share subdir

On 4.x-STABLE, the set is slightly different as Kerberos5 isn't built by
default, UUCP is included in the source tree, etc.  I don't think we
currently have a NO_GCC flag or NO_BINUTILS to avoid installing the
compiler and related tools, but I imagine those would be fairly
straightforward to add.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories


Re: Minimalist FreeBSD 4.8

2003-08-27 Thread Robert Watson

On Wed, 27 Aug 2003, Brian Reichert wrote:

 On Wed, Aug 27, 2003 at 07:26:10AM +1000, John Birrell wrote:
  One way to do this initially is to install a full FreeBSD system on one
  disk partition and use a second partition for a trial install. FreeBSD's
  boot manager will let you boot into each.
 
 As I'm pursuing these matters as well, I've found that mucking with
 jails is faster, for a lot of bulk work.  Starting/stopping a jail is
 _much_ quicker than reboots.  (And it's a lot easier to reset a jail to
 a prior state.)  This won't exercise the rc* scripts, but will let you
 quickly test for dependencies elsewhere. 

Actually, I tend to boot my jails using the existing rc pieces -- I skip
some of the hardware-esque things (network interface configuration, file
system mounting), but do use the rc stuff to start daemons. 

 And whatever you find for dependencies, please document them somewhere; 
 I still have a fantasy of 'deconstructing' FreeBSD into finer-grained
 packages...

One of the big problems with that process has been that people who've
attempted it (perhaps rationally) get caught up in combining
compartmentalization of the build and compartmentalization of the
delivery.  I.e., they sit there and try to figure out how to break out
libraries, utilities, etc, and get caught up in building the end-all to
package building infrastructure.  Something a little lower-hanging would
go a long way... 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories


Re: Kernel Panic on FreeBSD-5.1-p2 with SMP support

2003-08-24 Thread Robert Watson
On Sun, 24 Aug 2003, Markus Paluschek wrote:

 I have Compaq Proliant Server with 2 Pentium Xeon IV 2,4GHz processors. 
 After installing FreeBSD-5.1, upgrading to FreeBSD-5.1-p2 by cvsup and
 recompiling kernel with SMP support I've downloaded ircd-hybrid-7 sources
 and installed on user account.  After running it and writing /restart
 my.ircd.server I'm getting a kernel panic; then the system locks and I
 need to reset.  What to do for fixing that? 

There's a pretty useful chapter in the FreeBSD Developer's Handbook on
kernel debugging: the starting point for debugging a panic is to get a
stack trace and post it.  I would actually suggest updating to
FreeBSD 5-CURRENT, since some pretty large bugfixes have gone into the
tree since the release, and they may well have fixed the problem you're
bumping into. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: GEOM Gate.

2003-08-15 Thread Robert Watson

On Fri, 15 Aug 2003, Pawel Jakub Dawidek wrote:

 On Thu, Aug 14, 2003 at 09:48:57PM +0200, Attila Nagy wrote:
 + Bruce M Simpson wrote:
 + Whatever next? PCI-over-IP?
 + Collecting cheap on board serial lines to make a big terminal server 
 + makes sense to me :)
 + 
 + BTW, Pawel's stuff would be even more interesting if it would be 
 + possible to mount the same filesystem on more than one machine.
 
 It'll be, but probably in read-write mode on one machine and read-only
 mode on the rest of the machines, because you don't export file systems here, but
 disk devices. 

In order to do this, you need a file system capable of multi-node
consistency, and a medium capable of supporting the consistency
mechanisms.  Since we can't handle mounting the same file system
read-write and read-only in multiple places from the same block device
without a likely panic, I expect much the same results with a distributed
block device.  Multiple read-only mounts should work OK, but you don't
want to violate the assumptions of the read-only mounts by introducing a
read-write mount.  File systems can be written that do synchronization
using a protocol of some sort when talking to a common block device, but
that will keep you busy for a while, I expect :-). 

That said, I think the geom gate stuff looks very cool :-).  You might be
able to run some interesting performance numbers comparing NFS and UFS
over a remote block device.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: Missing system call in linux emulation ( patch )

2003-08-14 Thread Robert Watson

On Thu, 14 Aug 2003, Steven Hartland wrote:

 I've created a patch for the linux emulation which adds a dummy for the
 exit_group syscall along with defining all functions up to 252; this
 fixes a crash in the BattleField 1942 server. How do I go about getting
 this into the various FreeBSD streams so others can benefit? 

File a PR, please.  BTW, I saw this on the OpenBSD source-changes list
recently and remembered the missing system call thread here:

  CVSROOT:/cvs
  Module name:src
  Changes by: [EMAIL PROTECTED]   2003/08/14 12:34:15

  Modified files:
  sys/compat/linux: syscalls.master 

  Log message:
  add more syscalls. implement exit_group (which is actually an alias for
  sys_exit), needed for newer glibc's binaries.
  from marius aamodt eriksen marius at monkey dot org

Assuming that this is the right approach to solving the problem, we could
probably pull that change into our linux emulator.
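
A hedged sketch of what "alias" means at the system call table level: two
table slots pointing at the same handler, with every other slot answering
ENOSYS.  The numbers 1 and 252 are the Linux/i386 values for exit and
exit_group; the table layout, sy_call_t signature and handler names are
simplified inventions for illustration and do not match the real
syscalls.master machinery.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define LINUX_SYS_exit         1
#define LINUX_SYS_exit_group   252
#define LINUX_SYS_MAXSYSCALL   253

typedef int (*sy_call_t)(int arg);

static int last_exit_status = -1;

/* Stand-in for sys_exit: records the status instead of terminating. */
static int sys_exit_stub(int status)
{
    last_exit_status = status;
    return 0;
}

/* Default handler for numbers the emulator does not implement. */
static int nosys(int arg)
{
    (void)arg;
    return ENOSYS;
}

static sy_call_t sysent[LINUX_SYS_MAXSYSCALL];

static void sysent_init(void)
{
    for (size_t i = 0; i < LINUX_SYS_MAXSYSCALL; i++)
        sysent[i] = nosys;
    sysent[LINUX_SYS_exit] = sys_exit_stub;
    sysent[LINUX_SYS_exit_group] = sys_exit_stub;   /* the alias */
}

static int dispatch(int code, int arg)
{
    if (code < 0 || code >= LINUX_SYS_MAXSYSCALL)
        return ENOSYS;
    assert(sysent[code] != NULL);   /* sysent_init() must have run */
    return sysent[code](arg);
}
```

Filling every slot up to the highest known number also means an unknown
call returns an error rather than indexing past the end of the table.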

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: possible deadlocks?

2003-08-07 Thread Robert Watson

On Wed, 6 Aug 2003, Ted Unangst wrote:

 My advisor Dawson Engler has written a deadlock detector, and we'd like
 some verification. They look like bugs, unless there is some other
 reason why two call chains cannot happen at the same time. 

Neat -- sounds like two good catches given the responses so far.  Can we
expect more such reports forthcoming?  This kind of help will be
invaluable in finishing up the fine-grained locking work.  Alternatively,
do you plan to post the software?  Is this static or dynamic analysis? 
etc, etc?  :-)

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: Communications kernel - userland

2003-07-22 Thread Robert Watson

On Tue, 22 Jul 2003, Adam Migus wrote:

 Perhaps I'm not understanding you right but I think Pawel's idea is
 cool.  It seems to fulfill your requirements (except being network
 specific).  I suppose if it were network specific we could optimize it
 for packet streams and if we made it complicated enough it would require
 quite an elaborate synchronization and notification mechanism.  Is that
 closer to what you have in mind? 

Well, the case I had particularly in mind was the rapid flow of packets
from the kernel to the user process; Pawel's suggestion handles the flow
of new data from the user process to the kernel well, and has substantial
similarity to some of the IO Lite mechanisms I pointed at (and hopefully
with many of the same performance benefits).  In the kernel-to-userspace
case, we want to avoid the copy of what is originally kernel-owned memory
(from the mbuf allocator) to the user process memory.  If you didn't care
about stuff like confidentiality of kernel memory, etc, the simplest
approach would be to actually map the mbuf memory (and possibly cluster)
into userspace, and then notify the user process in some form of the new
mapping.  However, because mbufs and their meta-data aren't page aligned
(etc, etc, etc), you really don't want to do it explicitly that way, I
suspect. 

By synchronization, I had in mind a mechanism by which the process and
kernel would communicate about memory ownership in the shared memory
space: "I'm done with this packet", "I'm done with these packets", "I want
to continue delivery of that packet", "I modified this packet", "I'm
inserting a new packet here", "I'm dropping this packet", all without
extensive memory copying, and with a moderate amount of asynchrony (and
possibly parallelism).  In terms of functionality, it might be similar to
some of the current services that forward between IPDIVERT in and out 
(such as natd), or between BPF pseudo-devices.  This sounds like something
that likely exists in a few commercial products already, so my question to
Terry was whether he knew of any in the literature.  IO Lite is the
closest I know of, as it supports the zero-copy page and memory ownership
bits, although I don't know if they allowed it to handle packets, perhaps
just datagrams and streams.
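
One way to picture the ownership protocol is a ring of packet slots in the
shared region, each tagged with its current owner and, on return, a
verdict.  The sketch below is a single-threaded model under that
assumption; the names slot, ring, ring_deliver, ring_next, ring_return and
ring_reap are hypothetical, and a real shared mapping would also need
memory barriers or atomics on the owner field and a notification path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum slot_owner { OWNER_KERNEL, OWNER_USER };
enum verdict { V_NONE, V_ACCEPT, V_DROP, V_MODIFIED };

struct slot {
    enum slot_owner owner;
    enum verdict verdict;   /* V_NONE while the slot is free or in flight */
    size_t len;
    unsigned char data[64];
};

#define NSLOTS 4

struct ring {
    struct slot slots[NSLOTS];
    size_t k_next;          /* next slot the kernel will fill */
    size_t u_next;          /* next slot the user process will look at */
};

/* Kernel: place a packet in the next free slot and hand it to the user. */
static int ring_deliver(struct ring *r, const void *pkt, size_t len)
{
    struct slot *s = &r->slots[r->k_next];

    if (s->owner != OWNER_KERNEL || s->verdict != V_NONE ||
        len > sizeof(s->data))
        return -1;
    memcpy(s->data, pkt, len);
    s->len = len;
    s->owner = OWNER_USER;  /* from here on the kernel leaves it alone */
    r->k_next = (r->k_next + 1) % NSLOTS;
    return 0;
}

/* User: fetch the next delivered slot, or NULL if none is pending. */
static struct slot *ring_next(struct ring *r)
{
    struct slot *s = &r->slots[r->u_next];

    if (s->owner != OWNER_USER)
        return NULL;
    r->u_next = (r->u_next + 1) % NSLOTS;
    return s;
}

/* User: "I'm done with this packet", with a verdict attached. */
static void ring_return(struct slot *s, enum verdict v)
{
    assert(s->owner == OWNER_USER && v != V_NONE);
    s->verdict = v;
    s->owner = OWNER_KERNEL;
}

/* Kernel: collect verdicts, free the slots, count packets to forward. */
static size_t ring_reap(struct ring *r)
{
    size_t forwarded = 0;

    for (size_t i = 0; i < NSLOTS; i++) {
        struct slot *s = &r->slots[i];

        if (s->owner != OWNER_KERNEL || s->verdict == V_NONE)
            continue;
        if (s->verdict != V_DROP)
            forwarded++;
        s->verdict = V_NONE;
    }
    return forwarded;
}
```

No packet data is copied after delivery; only the owner and verdict fields
change hands.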

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: Communications kernel - userland

2003-07-21 Thread Robert Watson

On Mon, 21 Jul 2003, Terry Lambert wrote:

 Robert Watson wrote:
  Of these approaches, my favorites are writing directly to a file, and using
  a pseudo-device, depending on the requirements.  They have fairly
  well-defined security semantics (especially if you properly cache the
  open-time credentials in the file case).  I don't really like the Fifo
  case as it has to re-look-up the fifo each time, and has some odd blocking
  semantics.  Sockets, as I said, involve a lot of special casing, so unless
  you're already dealing with network code, you probably don't want to drag
  it into the mix.  If you're creating big new infrastructure for a feature,
  I suppose you could also hook it up as a first class object at the file
  descriptor level, in the style of kqueue.  If it's relatively minor event
  data, you could hook up a new kqueue event type.  You could also just use
  a special-purpose system call or sysctl if you don't mind a lot of context
  switching and lack of buffering.
 
 I like setting the PG_G bit on the page involved, which maps it into the
 address space of all processes.  8-). 

For one of our research projects, here at NAI, we did a fair amount of
userland network code prototyping.  We started out with IPDIVERT, then
pushed down to BPF using a partial network stack in userspace.  We've
found it's a lot easier on competent network developers who are unfamiliar
with the FreeBSD kernel code, not to mention easier on debugging.  We
never got so far on that project as to do shared memory between the kernel
and userspace, but I know that that's been done by at least a couple of
companies at various points to reduce copying and context switch costs for
userspace test frameworks.  One of the things I'd really like to see is
some decent "throw packets between kernel and userspace" primitive bits,
such that the kernel has a useful and logical way to expose buffer data
into directly mapped user pages, and an appropriate notification and
management system to reuse memory, etc.  Something that looks a bit like
the relationship between kernel device drivers and devices when it comes
to DMA management.  Do you know if any such framework exists? 
(Specifically targeted at exposing network packets...)  (Ideally not
requiring privilege in the user process, nor involving nasty integrity or
confidentiality problems :-)

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: Communications kernel - userland

2003-07-21 Thread Robert Watson

On Mon, 21 Jul 2003, Pawel Jakub Dawidek wrote:

 For example syscall is marking some range with mark() function.  From now
 on this range isn't accessible from userland. If process will try to
 write to this page, page is copied (copy-on-write).  If this page will
 be modified by kernel it will be marked as MODIFIED.  Now when syscall
 will call unmark() on this range we could get two scenarios: 
 
   1. Page is marked as MODIFIED (by kernel) so userland copy
      of this page (if it exists of course) is destroyed and
      this page will be put in its place.
      This is replacement for copyin() and then copyout() or
      just copyout().
   2. Page isn't marked as MODIFIED, so kernel version of page
      is destroyed (if there is userland version).
      This is replacement for just copyin().
 
 There could be other ways. Thread/process could be locked if it is
 trying to access memory marked with mark() function. And this, I think,
 doesn't hit performance, because this happens really rarely. So maybe it
 is better to lock thread for a moment instead of duplicating page, but I
 don't think so. 

This sounds a bit like some of the IO Lite stuff -- moving to a
page-centric model for IO interfaces to avoid copy operations, in many
cases able to share pages between applications, buffer cache, network
buffers, etc. Take a look at:

  http://www.cs.princeton.edu/~vivek/

For some details.  Some of the benefits of this approach are captured in
the common case through sendfile(), in practice, but it's definitely worth
a read.

I guess what I had in mind was something more network-specific, with
interfaces optimized for memory mapped network packet streams.  In the
simplest case, something like memory-mapping the BPF buffer from kernel
space to userspace, with some sort of simple stream synchronization so
that the user application could notify the kernel as to when it could
reuse bits of the buffer, but avoiding copy operations and lots of context
switching. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: Communications kernel - userland

2003-07-20 Thread Robert Watson

On Sat, 19 Jul 2003, Pawel Jakub Dawidek wrote:

 Your choices are:
 - device,
 - sysctl,
 - syscall.

There are actually a few other more obscure ways to push information from
the kernel to userspace, depending on what you want to accomplish.

Write directly to a file from the kernel.  ktrace, system accounting, and
ktr with alq all stream data directly to a file provided by an authorized
user process.  quotas and UFS1 extended attribute data are also written
directly to a file.  On other operating systems, audit implementations
frequently take the same approach -- when the goal is long term storage of
data in a user-accessible form, but you don't want to stream it through a
user process live, this is usually the preference.  Typically, when taking
this approach, a special system call is used to notify the kernel of the
target file to write to -- the file is created by the user process with
appropriate protections.  Often, but not always, the system call is
non-blocking and simply returns once the file is hooked up as a target, and
continues until another system call cancels delivery, or switches it to a
new target.

Stream it through a device node.  If you need only one or a small number
of processes to listen for events from the kernel, a common approach
is a pseudo-device that acts like a file.  For example, syslogd listens
on /dev/klog for log events from the kernel; some audit implementations
also take this approach.  Our devd, usbd, and others similarly listen
for system events that are exposed to user processes as data on a
blocking pseudo-device.  One nice thing about this approach is that you
can combine it with select(), kqueue(), et al, to do centralized event
management in the application.  BPF also does this.  Both Arla and
Coda take this approach for LPC'ing to userspace to request events
as a result of VFS operations by processes.
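
The select()/kqueue() point above can be sketched in a few lines.  This
is only an illustration, not code from any of the daemons mentioned: the
function name drain_events is made up, and in the usage note a pipe
stands in for a pseudo-device such as /dev/klog.  Python's selectors
module picks kqueue on BSD and epoll or poll elsewhere, so the same loop
works for any descriptor that supports readiness notification.

```python
import os
import selectors


def drain_events(fd, timeout=1.0, max_reads=16):
    """Wait for fd to become readable and return the bytes collected.

    fd is assumed to behave like a blocking pseudo-device: it becomes
    readable when the producer has queued data.  The name and the
    parameters are illustrative only.
    """
    sel = selectors.DefaultSelector()    # kqueue on BSD, epoll/poll elsewhere
    sel.register(fd, selectors.EVENT_READ)
    chunks = []
    try:
        for _ in range(max_reads):
            if not sel.select(timeout):  # nothing became ready in time
                break
            data = os.read(fd, 4096)
            if not data:                 # producer closed its end
                break
            chunks.append(data)
    finally:
        sel.unregister(fd)
        sel.close()
    return b"".join(chunks)
```

To try it without a real device, create a pipe with os.pipe(), write a
line to the write end, and pass the read end to drain_events().  An
application with several event sources would register them all on one
selector, which is the centralized event management described above.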

Expose it using a special socket type.  We expose routing data and
network stack administrative controls as special reads, writes, and
ioctls on various socket types.  I'm not a big fan of this approach,
as it special cases a lot of bits, and requires you to get caught
up in socket semantics.  However, one advantage of this approach is
it makes the notion of multicast of events to multiple listeners easier
to deal with, since each socket endpoint has automatic message buffering.

There are some other odd cases in use as well.  The NFS locking code
opens a specially named fifo (/var/run/lock) and writes messages to
it, which are picked up by rpc.lockd.  The lock daemon pushes events
back into the kernel using a special system call.  I don't really
like this approach, as it has some odd semantics -- especially since
it reopens the fifo for each operation, and there are credential/
file system namespace inconsistencies.

Of these approaches, my favorites are writing directly to a file, and using
a pseudo-device, depending on the requirements.  They have fairly
well-defined security semantics (especially if you properly cache the
open-time credentials in the file case).  I don't really like the Fifo
case as it has to re-look-up the fifo each time, and has some odd blocking
semantics.  Sockets, as I said, involve a lot of special casing, so unless
you're already dealing with network code, you probably don't want to drag
it into the mix.  If you're creating big new infrastructure for a feature,
I suppose you could also hook it up as a first class object at the file
descriptor level, in the style of kqueue.  If it's relatively minor event
data, you could hook up a new kqueue event type.  You could also just use
a special-purpose system call or sysctl if you don't mind a lot of context
switching and lack of buffering. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories





Re: running 5.1-RELEASE with no procfs mounted (lockups?)

2003-07-17 Thread Robert Watson

On Tue, 15 Jul 2003, Josh Brooks wrote:

 I have loaded two 5.1-RELEASE systems, both of them have PROCFS and
 PSEUDOFS in the kernel, and yet neither of them have a procfs mounted. 
 
 There is no procfs line in /etc/fstab by default, and no procfs is
 mounted on the system in any way. 
 
 
 Question 1:  Is this intentional ?  Is it no longer needed/recommended to
  run a procfs ?

Most system functionality that relied on procfs has been rewritten to rely
on other mechanisms.  In general, I advise against running procfs--it's
interesting, but conceptually it's very risky.  If you look at the history
of security advisories on systems that supported procfs (FreeBSD, Linux,
Solaris), you'll get a sense of why: procfs represents processes as files,
and the semantics of processes and of files are very different.  For
example, with processes, there are notions of revoked access; processes
are reused to hold several programs often running with different
credentials.

The behaviors I'm aware of that currently rely on procfs and have not yet
been adapted to use ptrace() or sysctl() are:

ps -e   Relies on groping around in the address space of each
process to display environmental variables.

truss   Relies on the event model of procfs; there have been some
initial patches and discussion of migrating truss to ptrace() but
I don't think we have anything very usable yet.  I'd be happy to
be corrected on this. :-)

Also, linprocfs, which offers many of the functions of procfs, relies on
pseudofs, and is required to run many Linux emulated programs.  Often for
rather bizarre reasons (retrieving command line arguments from the
per-process cmdline file...).

 Question 2:  Is this because I am running without procfs ?  Or have these
  type of problems been seen in 5.1-RELEASE by other causes ? 

This is most likely unrelated. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories




Re: current state of the art / best practice for devfs in a jail ?

2003-07-03 Thread Robert Watson

On Thu, 3 Jul 2003, Joshua Oreman wrote:

 On Thu, Jul 03, 2003 at 04:00:46AM -0700 or thereabouts, Josh Brooks wrote:
  
  I have been researching the various ways people add devfs to a jail to
  give the jail certain /dev devices necessary to function ...
 
 Well, all I did was test your research :-)

Gordon Tetlow (victim CC'd) was, I believe, working on changes to rc.d to
allow automatic construction of jails at boot, and part of that was some
best practice devfs rules for jail.  Perhaps he could chime in now? :-)

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: per-directory quotas possible on 5.x ?

2003-06-30 Thread Robert Watson

On Sun, 29 Jun 2003, Josh Brooks wrote:

 Normally, quotas work on a per-user, per-filesystem basis - so if a user
 has a home directory and other processes _not owned by that user_ are
 placing files and using up space into that directory, it will not count
 toward the quota (unless they get chowned/chgrpd to that user/group). 
 
 Is there any way to enforce a quota on a directory, regardless of what
 ownership or group ownership the files and dirs inside the directory -
 that is to say, take directory X, located at an arbitrary spot on the
 system, I want it to grow no larger than size Y. 
 
 I know this can be done by creating a lot of little partitions - maybe
 even vn-backed partition-on-file, but that seems like a hack, as they
 would be hard to resize. 
 
 I am looking for a way to force a changeable quota on a directory,
 regardless of what gets put in it, or who owns what gets put in it. 
 
 Any hacks/suggestions/comments of any kind are very appreciated. 

Unfortunately, the UFS file system model makes it difficult to implement
this sort of feature.  One major part of this is that files can exist in
more than one directory at a time, by virtue of hard links; this in turn
is relied on for file system checking, where a file may end up linked to
more than one directory when certain failure modes occur and are recovered
from.  Another part of the problem is that the internals of UFS really
disassociate the namespace from the storage mechanism, and since such a
directory based quota system would determine the relationship between
files based on the namespace and not a per-inode attribute, this also
makes implementing such a system on a UFS file system difficult.  FWIW,
you can sometimes get similar semantics using group quotas and the fact
that, on BSD, entries created in directories have the group of the parent
directory in which they are created...

Most of the systems I've seen that do quotas on a large scale do basically
follow the many volumes model -- for example, large AFS cells may have
tens or hundreds of thousands of volumes, and use volume size to impose
quotas, which sounds like what you're looking for.  When I've seen things
like this done on UFS, it's usually been as a weak consistency accounting
mechanism -- measure the size of various trees at intervals and bill based
on the sampled size, rather than block allocation.
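
The sampling approach in the paragraph above can be sketched roughly as
follows.  This is an illustration under assumptions, not a description
of any particular site's tooling: the names tree_usage and over_budget
are made up, sizes are taken from st_size rather than allocated blocks,
and each inode is counted once so that hard links (the complication
mentioned earlier) are not billed twice.

```python
import os
import stat


def tree_usage(root):
    """Return the total st_size of regular files under root.

    Each (device, inode) pair is counted once, so extra hard links to
    the same file do not inflate the total.
    """
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue                  # vanished while we were walking
            if not stat.S_ISREG(st.st_mode):
                continue                  # skip symlinks, fifos, devices
            key = (st.st_dev, st.st_ino)
            if key in seen:               # another link to a counted file
                continue
            seen.add(key)
            total += st.st_size
    return total


def over_budget(root, limit_bytes):
    """True when the sampled size of root exceeds limit_bytes."""
    return tree_usage(root) > limit_bytes
```

Run from cron at whatever interval the billing model needs.  Because it
samples, a tree can exceed its limit between runs, which is the weak
consistency trade-off described above.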

As you may have noticed in trying the vn-backed mechanism, there are some
inefficiencies that turn up in FreeBSD when you have large numbers of
pseudo-devices, etc.  The resizing problem is real, also, since we don't
have online file system resizing.  FWIW, a file system like HFS+ (which
has a much more strict directory hierarchy) would lend itself to directory
quotas much more.  A port of HFS+ to FreeBSD was recently posted to
freebsd-fs.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories




Re: Suid and gid files

2003-06-23 Thread Robert Watson

On Mon, 23 Jun 2003, Socketd wrote:

 I just installed FreeBSD 5.1 release and ran a find / -perm +4000 and
 find / -perm +2000. My question is: are any of these files used by the
 system, in a way that prevents me from making them non-executable to the
 world?  I have no shell users and don't use sendmail. 

Setuid can be turned off on pretty much all of the binaries; however, as
you turn off setuid bits, more and more things will not work for
unprivileged users.  During normal system operation, privileges are
usually dropped as opposed to acquired, so the exceptions are usually
for access to raw sockets, system devices, etc.  I recently removed the
setuid bit from the quota command in -CURRENT, and am in the throes of
reviewing the remaining setuid/setgid pieces as part of developing our
Security Architecture document. 

The one potentially problematic case that comes to mind is mail submission
by sendmail; mechanisms such as cron, at, etc, expect to be able to
generate mail from unprivileged users and that may break if you use
sendmail as the MTA but without setuid.  There are mail systems that don't
require setuid, instead relying on LMTP, which might be preferable in your
environment.  I also find su very helpful, FWIW :-). 

 Btw why is /usr/sbin/ppp world readable? (not that is matters) 

sproing:/usr/sbin ls -l ppp
-r-sr-xr--  1 root  network  367304 May  8 15:16 ppp*

Yeah, that is a little inconsistent, although not harmful as far as I can
tell.  I'll remove the read bit in -CURRENT and we'll see if anyone
complains :-). 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories




Re: struct ipc_perm

2003-06-20 Thread Robert Watson

On Wed, 18 Jun 2003, Dmitry Sivachenko wrote:

   Is there any reason why struct ipc_perm is not protected by #ifdef _KERNEL
   in ipc.h?  Is it supposed to be used from userland?
  
  It's needed by ipcs.
 
 Ah, I see.  It is visible via struct msqid_ds. 
 
 I developed a patch which requires addition of custom field to ipc_perm. 
 I am trying to imagine which problems can it cause to userland programs. 

We have local changes in the TrustedBSD development trees to extend all
the structures in the kernel without modifying the ABI.  We needed this to
put labels in the various System V IPC object structures.  We're not ready
to merge them yet, but it will probably happen in the next month or so.
If you'd like early access to the patch, we can drop you a copy.  We'll
merge it into the MAC tree in about a week.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: kqueue alternative?

2003-06-15 Thread Robert Watson

On Sun, 15 Jun 2003, Matthew Hagerty wrote:

 I'm writing a little application that needs to watch a file that another
 process is writing to, think 'tail -F'.  kqueue and kevent are going to
 do it for me on *BSD, but I'm also trying to support *cough* linux and
 other UN*X types OSes. 
 
 From what I can find on google, the linux community seems very opposed
 to kqueue and has not yet implemented it (they say: blah blah blah,
 aio_*, blah blah balh.)  What alternatives do I have with OSes that
 don't support kqueue?  I'd really hate to poll with stat(), but do I
 have any other choices? 

I was recently told about a library named libevent from Niels Provos,
which abstracts a variety of underlying event mechanisms behind a common
API.  You can learn a bit more about it here: 

  http://www.monkey.org/~provos/libevent/

It doesn't appear to support /dev/poll yet, but the web page suggests such
support is planned.  If it's not already a port, we should create one. 
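
For platforms with no usable event mechanism, the stat() polling
fallback mentioned in the question might look roughly like this.  It is
a sketch only and is unrelated to libevent's API: the class name
PollingTail and its method are invented for the example.  It starts
over when the file is replaced or truncated, which is approximately
what tail -F does.

```python
import os


class PollingTail:
    """Follow a growing file by polling os.stat()."""

    def __init__(self, path):
        self.path = path
        self.offset = 0       # bytes of the current file already returned
        self.ident = None     # (device, inode) of the file last read

    def read_new(self):
        """Return bytes appended since the previous call, or b"" if none."""
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return b""                    # not created yet, or rotated away
        ident = (st.st_dev, st.st_ino)
        if ident != self.ident or st.st_size < self.offset:
            # Different file under the same name, or truncated: restart.
            self.ident = ident
            self.offset = 0
        if st.st_size == self.offset:
            return b""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        self.offset += len(data)
        return data
```

A caller would loop over read_new() with a short sleep between calls.
On systems where kqueue or a similar facility exists, the sleep can be
replaced by waiting for a write event on the file, with the rest of the
logic unchanged.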

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: Jdk13/14 still hangs in 4.8 Prerelease. Outstanding Fix need (fwd)

2003-02-25 Thread Robert Watson

Per Martin's request, I'm forwarding this response to the broader group
involved in this thread.  Basically, I think broadening the scope of
processes permitted to make the scheduler call is fine, but you don't want
to use the CANSIGNAL() code that's currently present for several reasons.
The simplest solution might be to only allow the scheduler change if the
requesting process is targeting itself. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories

-- Forwarded message --
Date: Tue, 25 Feb 2003 12:53:53 -0500 (EST)
From: Robert Watson [EMAIL PROTECTED]
To: Martin Blapp [EMAIL PROTECTED]
Subject: Re: Jdk13/14 still hangs in 4.8 Prerelease. Outstanding Fix need (fwd)


On Tue, 25 Feb 2003, Martin Blapp wrote:

 Basically, it changes p31b_proc() to not always return an error for
 non-root.  If rwatson@ signs off on the security implications (should be
 minimal, basically means that you can change your own scheduling params
 and can change the params of other processes you own) then I would
 prefer this patch. 

Hmm.  I think the check there is a bit on the unsafe side, which could be
why it was disabled.  Basically, it permits the scheduler change in the
following four circumstances:

(0) Superuser always wins
(1) Subject real uid is object real uid

E.g., any process I should randomly start or own

(2) Subject effective uid is object real uid

If a tool is temporarily switched to my uid to exercise my
privileges, sounds OK.

(3) Subject real uid is object effective uid (uh oh)
(4) Subject effective uid is object effective uid (uh oh)

The reason (3) and (4) are problems is that they affect daemons
temporarily switching to a user's privileges to carry out a task -- such
as mail delivery, or a userland NFS server or the like.  It could be that
these are a poor attempt to handle the loopback case, wherein a process
can always modify its own scheduling.  Take a look at p_cansched() in 5.x
for a bit more of what I think the check should be.  In summary, the rules
are:

(0) You can always reschedule the current process.
(1) If you're in a different jail, deny.
(2) Optionally call out to MAC.
(3) If the seeotheruids support says you can't see the other process,
you can't reschedule it either, regardless of uids.
(4) If the real uids are the same, it's OK -- i.e., any arbitrary shell
process (setuid or otherwise).
(5) If the subject effective uid is the same as the object real uid -- if
temporarily adopting a user's privileges, we can reschedule the
processes they own. 
(6) Superuser always wins (subject to 0, 1, 2, 3).
(7) Deny
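
The ordering of rules (0) through (7) can be written out as a small
model.  This is not the kernel's p_cansched() and makes no claim about
its actual source; Proc, can_resched, see_other_uids and mac_check are
names invented for the sketch.  Rule (1) is read here as "a jailed
subject may not touch a process outside its own jail", which is an
assumption about what "a different jail" means for unjailed subjects.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Proc:
    pid: int
    ruid: int                    # real uid
    euid: int                    # effective uid
    jail: Optional[int] = None   # None means "not jailed" (assumption)


def can_resched(subj: Proc, obj: Proc, *, see_other_uids: bool = True,
                mac_check: Optional[Callable[[Proc, Proc], bool]] = None) -> bool:
    """Toy model of the rule order listed above; not kernel code."""
    if subj.pid == obj.pid:                  # (0) always allowed on yourself
        return True
    if subj.jail is not None and subj.jail != obj.jail:
        return False                         # (1) different jail
    if mac_check is not None and not mac_check(subj, obj):
        return False                         # (2) optional MAC veto
    if not see_other_uids and subj.ruid != obj.ruid:
        return False                         # (3) can't see it, can't touch it
    if subj.ruid == obj.ruid:                # (4) same real uid
        return True
    if subj.euid == obj.ruid:                # (5) temporarily adopted uid
        return True
    if subj.euid == 0:                       # (6) superuser, after (0)-(3)
        return True
    return False                             # (7) default deny
```

Note what is absent: nothing compares against the object's effective
uid, so a daemon that has temporarily switched its euid to a user's is
not exposed to that user, which was the concern with cases (3) and (4)
of the existing check.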

 I don't know why the check was turned off.  The entire #if 0 / #else /
 #endif seems to have been around since revision 1.1.

It's probably because whoever wrote it realized that it was moderately
suspect.  I would oppose simply enabling the current CANSIGNAL check -- it
has serious problems.  On the other hand, putting in a refined check
sounds reasonable and I'd be happy to review such a patch.  Although the
code from 5.x won't instantly work with 4.x without substantial
modification, it might make a good starting point.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message


Re: Monitoring changes in extended attributes?

2003-02-24 Thread Robert Watson

On Wed, 12 Feb 2003, Kevin Fogleman wrote:

 Is there an existing way to monitor the entire filesystem for changes to
 any file, particularly changes in extended attributes? 
 
 I'm looking to write a program that builds an index of all
 user-accessible extended attributes for every file in the filesystem and
 then updates that index in real time according to modifications to
 existing files' attributes, creation of new files and deletion of files. 
  I've read over the documentation for kqueue, but some things were left
 unclear.  For example, it appears that kqueue needs a file descriptor
 for each file that one would want to monitor, making any large-scale
 file monitoring impractical.  Is there any other way in FreeBSD to be
 notified of file modifications in a way that would allow one to monitor
 the whole file system or large portions of it?  Also, I'm not very
 knowledgeable about file system conventions, so I'm wondering how one
 would detect the creation of new files?  I don't really need to know
 whether a particular attribute changed, but rather just whether any of
 them changed. 
 
 BTW, I have posted this question earlier to freebsd-questions, but
 nobody answered and, judging by the content of the other questions on
 that list, I thought that my question would be more appropriate here. 

Currently, you can monitor particular files for meta-data changes, which
include extended attribute modifications, and you can monitor directories
for changes, which might include the addition of a new name (and hence
possibly a file).  However, currently there's no way to monitor at the
granularity of a file system for events such as "Some EA changed" or "A
new file was allocated".  I guess such primitives haven't generally been
needed in the past, although I can certainly imagine scenarios where they
might be used.  Kqueue is the vehicle the two events I identified above
can be monitored with, and it's certainly possible to imagine adding new
event categories to monitor a file system for global events, assuming it's
a local file system.  However, then the question becomes "Once I know that
a file has been added, how do I find it?", which I would guess generally
results in a recursive search, at which point I suspect you might as well
just re-search the entire fs once in a while anyway.  The functionality
you're looking for sounds a bit more database-esque than in line with a
traditional file store.
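
The "re-search the entire fs once in a while" option amounts to taking
snapshots and comparing them.  A rough sketch follows, with made-up
names (snapshot, diff_snapshots).  It records the inode number, inode
change time and size for each path.  Whether an extended attribute
write moves the change time on a given file system is an assumption
that should be checked before relying on it.

```python
import os


def snapshot(root):
    """Map every path below root to (inode, ctime_ns, size)."""
    snap = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue                  # vanished while we were walking
            snap[path] = (st.st_ino, st.st_ctime_ns, st.st_size)
    return snap


def diff_snapshots(old, new):
    """Return (created, deleted, changed) sets of paths."""
    old_paths, new_paths = set(old), set(new)
    created = new_paths - old_paths
    deleted = old_paths - new_paths
    changed = {p for p in old_paths & new_paths if old[p] != new[p]}
    return created, deleted, changed
```

An indexer would keep the previous snapshot, take a new one at each
interval, and re-read attributes only for the created and changed
paths.  The cost is a full tree walk per interval, which is why this
suits occasional re-indexing rather than real-time updates.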

FWIW, Apple has a searchfs() system call and vnode operation to permit
more efficient meta-data searches on HFS+; this makes some sense for HFS+
because it has a notion of a centralized meta-data store, whereas ours is
laid out pretty sparsely over the tree and works a bit differently.  They
don't support generalized meta-data extended attributes right now, though,
although they do have a few specific attributes beyond the standard set. 
Well, we actually have local patches to add EA's to their UFS file system
that would probably work on HFS+, but they aren't in the central Darwin
tree.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories






Re: Multi-level jailing.

2003-02-20 Thread Robert Watson

On Mon, 17 Feb 2003, Pawel Jakub Dawidek wrote:

 I have prepared patch for jail functionality against FreeBSD
 5.0-CURRENT.  It provides multi-level jailing and multiple ips for
 jails. 

Sounds cool, although I haven't had a chance to read the patch yet.
Question: how did you handle the problem (if at all) that INADDR_ANY
doesn't perform a wildcard binding with multiple IPs in the same jail?
It's not strictly required that it be handled, but it was always one of
the semantic problems I bumped into when I experimented with more IPs.  A
single-IP jail works because it maps INADDR_ANY into the only IP
available.  I'll try to get a box up and running with these changes in the
next few days and give them a spin.
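
The INADDR_ANY question can be made concrete with a toy model.  This is
not how the jail code is written; jail_bind_addresses is a made-up name,
and the function only shows which addresses a wildcard bind would have
to cover.  With one address the rewrite is unambiguous.  With several, a
single socket would have to be reachable on all of them, which is the
semantic problem described above.

```python
import ipaddress

INADDR_ANY = ipaddress.IPv4Address("0.0.0.0")


def jail_bind_addresses(requested, jail_ips):
    """Return the list of addresses a bind() inside the jail must cover."""
    requested = ipaddress.IPv4Address(requested)
    allowed = [ipaddress.IPv4Address(ip) for ip in jail_ips]
    if requested != INADDR_ANY:
        if requested not in allowed:
            raise PermissionError(f"{requested} is not assigned to this jail")
        return [requested]
    # Wildcard bind: one address is a simple rewrite; more than one means
    # the socket needs wildcard-like behaviour limited to the jail's set.
    return allowed
```

For a single-IP jail the result has exactly one element, matching the
existing mapping of INADDR_ANY onto the only available address.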

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories






Re: max simultaneous TCP connections (32,763)?

2003-01-26 Thread Robert Watson

On Sun, 26 Jan 2003, Sam Tannous wrote:

 I have two freebsd boxes (back to back) and I've been playing with a
 simple server on one machine and client on the other machine (this was
 simply an exercise with playing with kqueue).  Both the server and the
 client are single processes and the client seems to stop at 32,763
 connections. 
 
 I've modified the port range, tcp keepalive, kern.ipc.somaxconn,
 maxfiles, maxsockets, nmbclusters.  I even tried
 net.inet.tcp.tcbhashsize (up to 1024). 
 
 Is there some other parameter I'm missing?  Or is this a known
 limitation/bug? 

Some of this has to do with limits on the available ancillary ports for
out-going connections.  Try adding additional IP addresses to the client
machine, and forcing your client software to use specific IP addresses. 
TCP uniquely identifies connections by the pair of port numbers and IP
addresses, so assuming unconstrained use of the outgoing port space on a
particular IP, TCP/IP can in theory support up to (approx) 64k
outgoing connections.  In practice, we only allocate out of specific
ranges.  By adding additional IP addresses for outgoing connections, you
increase the number of potential connections to a particular remote
IP/port tuple.  However, if you're not specifying a local IP address, the
stack will pick the most appropriate local address for the route, which
is probably the first IP address on the interface associated with the
route to the other endpoint.  Hard-coding local addresses in your
application overrides that.  I've never tried this (i.e., using multiple
IPs to get around the TCP/IP limit), so if it doesn't work, let me know.
In theory, it should.
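
The arithmetic and the "hard-code the local address" step can be
sketched as follows.  The function names are invented, and the port
range in the usage note is the IANA dynamic range, used purely as an
example; it is not a statement about FreeBSD's defaults.  The bound
applies per remote IP/port pair, since the remote half of the tuple is
fixed.

```python
import socket


def outgoing_capacity(local_ips, first_port, last_port):
    """Upper bound on simultaneous connections to ONE remote (IP, port)."""
    return len(local_ips) * (last_port - first_port + 1)


def connect_from(local_ip, remote):
    """Open a TCP connection whose source address is pinned to local_ip."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((local_ip, 0))     # port 0: let the stack pick the source port
        s.connect(remote)
    except OSError:
        s.close()
        raise
    return s
```

With one local address and ports 49152 through 65535,
outgoing_capacity() gives 16384; with four local addresses it gives
65536.  A client would cycle through its configured addresses, calling
connect_from() with each in turn once the previous one runs out of
source ports.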

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories






Re: umount of procfs fails

2003-01-26 Thread Robert Watson

On Sun, 26 Jan 2003, Tim Kientzle wrote:

 Experimenting with 'mount' and stumbled across the following oddity: 
 
 mount -t procfs proc /mnt
 umount -t /mnt

You're missing the proc after -t here, right?

 results in procfs still mounted on /mnt but no longer mounted on /proc. 
 It appears that a umount of procfs is unmounting the most recently
 mounted instance rather than the instance mounted at the specified
 location. 
 
 I haven't checked to see if other filesystem types have this problem. 

I experimented a bit, and found the following:

If I unmount /mnt using simply umount /mnt on -CURRENT or -STABLE, /mnt
is unmounted.  If I specify -t procfs, the same thing happens.  If I
mis-specify the mount type as -t proc, then it silently fails, and
neither is unmounted.  Which isn't to say that it didn't happen for you,
but you should probably provide a bit more information.  First, could you
identify the version of FreeBSD you're running?  Second, can you include
script output of the shell session in which you mount /proc, /mnt, run
mount to confirm they are both mounted, then umount one, run mount to show
the wrong one unmounted? 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories






Re: NFS ACLS's ?

2002-12-29 Thread Robert Watson

On Fri, 27 Dec 2002, joe mcguckin wrote:

 Are there any strange interactions between NFS and filesystems that are
 not UFS? E.g. UFS2? Does NFS support new features that these fs's may
 implement? 

NFS can represent many but not all of the services found in UFS1 and UFS2. 
Among things it doesn't support are the retrieval and manipulation of BSD
file user flags, system flags, extended attributes, and access control
lists (ACLs). However, NFSv3 does correctly handle enforcement with these
features because clients rely on the server to evaluate protections on
file system objects using an ACCESS RPC.  NFSv2 evaluates protections on
the client (if I recall correctly) so may not behave properly.  There are
RPC extensions to NFSv3 to retrieve and manipulate ACLs on Solaris, IRIX,
et al, but we don't currently implement those extensions.  Likewise, NFSv4
supports ACL management, but we don't yet implement NFSv4.  It shouldn't
be too hard to dig up information on the NFSv3 ACL RPC extensions and
implement them on FreeBSD 5, since the semantics of our ACLs are highly
compatible with Solaris and IRIX.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories







Re: panic: icmp_error: bad length

2002-12-11 Thread Robert Watson

BTW, if this bug exists in 5.0 for the same reasons (or even different
ones), we should try to generate a fix ASAP and get it committed.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories

On Thu, 12 Dec 2002, Ian Dowse wrote:

 In message [EMAIL PROTECTED], Luigi Rizzo writes:
 the diagnosis looks reasonable, though i do not remember changing
 anything related to this between 4.6 and 4.7 so i wonder why the
 error did not appear in earlier versions of the code.
 
 Yes strange - actually, it looks like the "THERE IS NO FUNCTIONAL
 OR EXTERNAL API CHANGE IN THIS COMMIT" commit may be to blame :-)
 Some fragments below.
 
 Ian
 
 bridge.c 1.16.2.2:
 +#ifdef PFIL_HOOKS
 ...
 -* before calling the firewall, swap fields the same as IP does.
 -* here we assume the pkt is an IP one and the header is contiguous
 ...
 -   ip = mtod(m0, struct ip *);
 -   NTOHS(ip->ip_len);
 -   NTOHS(ip->ip_off);
 
 ip_fw.c 1.131.2.34:
 -   if (0 && BRIDGED) { /* not yet... */
 -   offset = (ntohs(ip->ip_off) & IP_OFFMASK);
 +   if (BRIDGED) { /* bridged packets are as on the wire */
 +   ip_off = ntohs(ip->ip_off);
 ip_len = ntohs(ip->ip_len);
   } else {
 
 



Re: kernel/userland ssh filesystem for FreeBSD?

2002-12-11 Thread Robert Watson

On Wed, 11 Dec 2002, Marco Molteni wrote:

 as you might know, both kde (via kio-fish) and gnome (via gnome virtual
 file system) provide a userland filesystem-like API that allows to
 mount a remote filesystem using ssh. What I don't like about those
 solutions is that they require the application to use a particular API
 (kio slave or gnome vfs). 
 
 Another approach, that provides a real filesystem interface, is the
 Linux Userspace File System. 
 
 Quoting from http://lufs.sourceforge.net/lufs/intro.html: 
 
 LUFS is a hybrid userspace filesystem framework supporting an indefinite
 number of filesystems transparently for any application. It consists of
 a kernel module and an userspace daemon. Basically it delegates most of
 the VFS calls to a specialized daemon which handles them. 
 
 Now the question: if I wanted to do something similar for FreeBSD, how
 would I do it? Any high-level hints? 

FreeBSD actually includes a module for this very purpose to support the
Coda file system, which uses a userspace cache manager to interact with
directory services, manage the on-disk local cache, etc.  I actually
slightly prefer the Arla XFS kernel module, which behaves in an almost
identical manner.  Both create /dev nodes and communicate their needs via
what are effectively RPC upcalls.  They both follow the model that a
daemon exists in userspace to support a file system mount, and will update
the kernel with namespace information, as well as providing references to
cache files locally.  Usually the userland daemon is threaded, and matches
worker threads with kernel threads/processes currently blocked in file
system activity.  I know there was discussion of getting the XFS module to
support more than one mountpoint at a time, but I'm not sure if that
happened or not.  The Arla code is separately distributed from FreeBSD,
but there's a port I believe.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories






Re: Some problems about KSE

2002-12-11 Thread Robert Watson
A commit was made to correct the KSE crash shortly after 5.0-RC1.  You can
cvsup forward to a newer revision, or wait for RC2.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories

On Wed, 11 Dec 2002, ouyang kai wrote:

 Hi, everybody,
   I want to make sure whether we can program the multi-thread code based on 
 KSE in FreeBSD5.0 RC-1.
   I have make in '/usr/src/lib/libpthread', I found some new things in 
 '/usr/lib' as follow:
 lrwxr-xr-x   1 root  wheel       11 Dec 11 16:04 libkse.so -> libkse.so.1
 -r--r--r--   1 root  wheel    68780 Dec 11 16:04 libkse.so.1
 -r--r--r--   1 root  wheel   164448 Dec 11 16:04 libkse_p.a
 -r--r--r--   1 root  wheel   153854 Dec 11 16:04 libkse.a
   So if I program. How can I use the kse?
   I can use pthread(3) as traditional manner, only using '-lpthread' instead 
 of '-pthread' in my makefile, right?
   when I use /usr/src/tools/KSE/ksetest/ksetest program , it always cause my 
 box crash. I have report this issue to Julian.
   I am seeing KSE(2), I have some puzzles about that.
   1. upcall is really means what? Does it represent through 'km_func'? if it 
 were true, the 'km_func' is indicated by whom? UTS, Kernel, or user program, 
 I do not know.
   2. When one process has more than one KSEG, the signal should be delivered 
 to which KSEG? The manual said it is indeterminate. I do not know how the 
 signal could be delivered to the special KSEG exactly?
 
 Thank you!
 Best Regards
   Ouyang Kai
 
 
 
 