Re: Needed: suid library calls (was Re: cvs commit: src/crypto/openssh sshd_config)

2000-05-25 Thread Ville-Pertti Keinonen

[EMAIL PROTECTED] (Nick Sayer) writes:

 What we _really_ need is some mechanism to recognize the difference
 between a user program and a system library, with an eye towards
 granting privileges to trusted libraries without letting those privileges
 leak past the library in question.
 
 I don't claim that this is an _easy_ thing to do, nor that it is a particularly
 standard thing to do.

(Shared) libraries are currently a userland concept.  Doing what
you're suggesting would require a special kind of library, controlled
by the kernel and called through the kernel.

In order to protect from threads and other means of sharing memory,
the library would have to use its own memory for everything writable,
protected against access by the unprivileged parts of the user
program.

This would effectively create a new ring of protection somewhere in
between the "userland" and "kernel" privileges - a MULTICS concept, as
Matthew Dillon noted - with its own stack and memory.

On architectures that only implement user/supervisor modes of
execution and don't provide segments or other kludges, such library
calls would, in a sense, be executed in different processes
(protection would require a separate address space - assuming that the
library calls wouldn't be running in supervisor mode, in which case
the entire mechanism would basically be per-process loadable system
calls, also not an acceptable solution).

 But the mechanism of having some sort of daemon or service whose
 job it is to just do !strcmp(pw->pw_passwd,crypt(foo,pw->pw_passwd))
 is, I think, kind of overkill.

It would also have to open the password database using the appropriate
privileges...which in the case of a privileged library and
multithreaded programs (or just rfork) is unsafe because other threads
could also access the database while the library has the file handle
open.

IMHO a "privileged library" would, to be safe, have to be so well
isolated from the rest of the program that the functionality might as
well be in a separate process.
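
A minimal sketch of the separate-process alternative, not taken from the
thread: a forked helper is the only code that loads the password database,
and the rest of the program can only ask it yes/no questions over a socket.
The names start_checker, load_database and hash_password are hypothetical,
and PBKDF2 stands in for crypt(3) only because it ships with Python.

```python
import hashlib
import hmac
import os
import socket


def hash_password(password: bytes, salt: bytes) -> bytes:
    # Stand-in for crypt(3); chosen only because it is in the standard library.
    return hashlib.pbkdf2_hmac("sha256", password, salt, 10_000)


def start_checker(load_database):
    """Fork a helper that alone calls load_database() and answers yes/no.

    load_database() must return {user: (salt, digest)} with bytes keys.
    Returns (check, stop) callables for the unprivileged side.
    """
    ours, theirs = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:  # helper process: the only one that reads the database
        try:
            ours.close()
            database = load_database()
            with theirs.makefile("rwb") as chan:
                for line in chan:
                    user, _, password = line.rstrip(b"\n").partition(b"\0")
                    salt, digest = database.get(user, (b"\0", b""))
                    ok = hmac.compare_digest(hash_password(password, salt), digest)
                    chan.write(b"1\n" if ok else b"0\n")
                    chan.flush()
        finally:
            os._exit(0)

    theirs.close()
    chan = ours.makefile("rwb")

    def check(user: str, password: str) -> bool:
        # One request per line: user, NUL, password. This toy framing assumes
        # that neither field contains a newline or a NUL byte.
        chan.write(user.encode() + b"\0" + password.encode() + b"\n")
        chan.flush()
        return chan.readline() == b"1\n"

    def stop() -> None:
        chan.close()
        ours.close()
        os.waitpid(pid, 0)

    return check, stop
```

In a real design the helper would be a separate privileged program rather
than a fork of the caller; the fork here only keeps the sketch self-contained.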


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: ipsec 'replay' syslog error messages after reboot of one host

2000-05-11 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Matthew Dillon) writes:

 The question is:  What am I forgetting to do?  Or is this a bug in our
 IPSEC implementation?

AFAIK this is more or less how it's supposed to work.  IPsec is a
mess.  Security associations are not stateless: ESP provides replay
protection using a sequence number.  Replay prevention is, however,
optional, and the setkey manual page claims it is off by default,
so it could be a bug...you might want to try specifying -r 0
explicitly.
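
A minimal sketch of a sequence-number replay window, assuming the usual
sliding-bitmap scheme; it is not the KAME code, and ReplayWindow is a
hypothetical name.  It shows one plausible reason for the log messages: with
a fixed security association, a rebooted sender starts counting from 1 again
while the receiver still remembers a much higher sequence number.

```python
class ReplayWindow:
    """Receiver-side anti-replay check over a sliding window of sequence numbers."""

    def __init__(self, size=32):
        self.size = size
        self.highest = 0   # highest sequence number accepted so far
        self.bitmap = 0    # bit i set means (highest - i) was already seen

    def check_and_update(self, seq):
        if seq == 0:
            return False                      # sequence numbers start at 1
        if seq > self.highest:                # new packet ahead of the window
            shift = seq - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False                      # too old: falls behind the window
        if self.bitmap & (1 << offset):
            return False                      # duplicate: already seen
        self.bitmap |= 1 << offset            # late but new packet
        return True
```

After the receiver has accepted packets 1 through 1000, a packet carrying
sequence number 1 is rejected, whether it is a real replay or the first
packet from a peer that has just rebooted.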


Re: ipsec 'replay' syslog error messages after reboot of one host

2000-05-11 Thread Ville-Pertti Keinonen


 IPSec isn't well documented, but once I figured out the config
 file it didn't seem too bad.  I am guessing that replay prevention

Reading the RFCs might be more helpful than most of the KAME
documentation.  There's also a lot of undocumented stuff for which the
sources seem to be the only source of information (e.g. how PF_KEY v2
differs from the standard).

 I had to fix up /etc/rc.network a little to load the ipsec rules
 at the appropriate point (just after the interface and ipfw setup,
 but before any services (like NFS) are run).  I am going to put the
 (relatively simple) patch for rc.network up for a quick review and
 then commit it along with an example file and a reference to the
 example file in the man page.

Fixed security associations with an infinite lifetime are certainly
not the ideal way of using IPsec.  Examples of setups like this should
be provided with the appropriate warnings.


Re: Why this works?

2000-05-11 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (FengYue) writes:

 I've 3 small programs.  First one writes 4K of data contains 'A's into a
 file /tmp/pagetest and then lseek() to the begin of the file.
 Second one writes 4K of 'Z' into the same file /tmp/pagetest and
 then lseek() to the begin of the file.  They both do that in a tight
 loop.  Now, the third program reads 4K of data from /tmp/pagetest
 and exit if the 4K data does not contain all 'A's nor 'Z's.  3 programs
 run concurrently on the same machine (3.4).  No lock in the code whatsoever,
 and all 3 programs use pure write() and read().  I thought the third
 program would exit pretty quickly since the data in the file may contain
 mixed of 'A's and 'Z's, but it has been running for hours and nothing
 happened.  Could someone kindly explain this?  I was told that this is
 because the pagesize is 4096 in the kernel, so that read()/write() 4K of
 data will not get context switched until the call is completed.
 Is that right?

Not quite.  If FreeBSD didn't perform locking, operations affecting
single filesystem blocks would probably be atomic (as long as the
userland buffer is in memory).

However, FreeBSD does perform locking in read(2) and write(2) for
local files, so your third program should never fail and exit.

Note that the system call interface does not guarantee reads or writes
to be atomic; this just happens to be how it is implemented at the
moment.
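
A minimal sketch of how the three test programs could avoid relying on that
implementation detail, assuming all of them agree to take a flock(2) lock
around each 4K transfer.  The helper names are hypothetical.

```python
import fcntl
import os

BLOCK = 4096


def write_block(path, byte):
    """Overwrite the first 4K of the file with `byte` under an exclusive lock."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        os.pwrite(fd, byte * BLOCK, 0)
    finally:
        os.close(fd)  # closing the descriptor also releases the lock


def read_block(path):
    """Read the first 4K of the file under a shared lock."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH)
        return os.pread(fd, BLOCK, 0)
    finally:
        os.close(fd)


def is_uniform(block):
    """True if the block is entirely 'A's or entirely 'Z's."""
    return block in (b"A" * BLOCK, b"Z" * BLOCK)
```

With the locks in place the reader sees a uniform block because the writers
cooperate, not because of how read(2) and write(2) happen to be implemented.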


Re: [OT] Finding people with GSM phones (was Re: GPS heads up )

2000-05-08 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Olaf Hoyer) writes:

 Well, thats reality.
 Sometimes the mobile telco hotlines are so overloaded, you cannot even tell
 them that your phone was stolen. (Talk about service-but you get what you
 pay for)
 In germany, there is some list, where every cell phone can be entered with
 its IMEI-number (thats like the MAC on an ethernet card). So theoretically
 you simply enter them and make them useless for the thief. 

In Finland, somebody is apparently doing something to track down
stolen phones, rather than block their use.

One Saturday morning I got a call from someone at some agency (I
couldn't quite make out what it was, it sounded like customs but that
would seem odd) accusing me of stealing the GSM phone I was using.  It
turned out that he had one digit wrong (presumably of either the
IMEI-number or just the MSISDN).  I wonder what he was trying to
accomplish by calling the supposedly stolen phone.

This was last month, but not on April 1...  ;--)


Re: What are the best gcc optimization options for Pentium 200 MMX

2000-04-10 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Kris Kennaway) writes:

 Can you say "gimmick"? :-) gcc often produces demonstrably broken code for
 optimisation levels higher than -O.

That -O is safe seems to be a persistent myth.  GCC also produces
broken code for -O and no optimization in some cases, sometimes while
producing working code for higher optimization levels...  I wouldn't
state e.g. that -O2 produces broken code any more often than -O; this
may have been true for version X.Y.Z but is certainly not universally
true.

I believe that the reasons the FreeBSD build uses -O are that,
especially with older versions of gcc, -O2 slowed down compilation
considerably for little noticeable performance improvement (as for -O3,
automatic inlining is generally undesirable), and that it is always best
to only have to test the system with a single set of flags.


Re: Shared /bin and /sbin

2000-03-30 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Warner Losh) writes:

 I have a system that has one file system on it (eg everything is on
 /).  I'm finding that a lot of space is wasted on the multiple static
 copies of libc in /sbin and /bin.  I was thinking about building, for
 this system only, /bin and /sbin dynamic.  Has anybody ever done this?
 What are the implications of doing this.  I can't think of anything
 that would stop this from working, but I thought I'd run it by people
 here.

I've done this, and did manage to get an almost complete system into a
reasonably small space.  It was 2.2.x, but I wouldn't expect any
special new requirements with more current versions.  IIRC it didn't
require much more than fixing the appropriate Makefile.incs in the
source tree.


Re: UVM vs FreeBSD VM system

2000-01-19 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Jonas Bulow) writes:

 How does the UVM system compare to the VM system in FreeBSD?  Are there
 any benchmark tests or research results in this area?

The dissertation paper on UVM describes the differences (and is
reasonably objective).  It can be found on the UVM pages
(http://www.ccrc.wustl.edu/pub/chuck/tech/uvm/).

I'm not aware of any benchmarks, but would expect UVM to be somewhat
slower doing most things (not by design, just the current
implementation).


Re: vmnet (was: Linux ioctl not implemented error)

1999-12-07 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Vladimir N. Silyaev) writes:

 are need to have steel nerves. I feel that, at the time when I was porting
 vmware. I have too much hours of very interested work - load driver, launch
 vmware and then looking into the DDB double fault screen. Reload box,
 and then again.

I suppose that the incomplete virtualization of the x86 prevents you
from running vmware on FreeBSD on vmware on Linux for debugging?


Re: journaling UFS and LFS

1999-10-31 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Don) writes:

  and the next question: now that LFS starts to get usable in NetBSD 
  - has anybody started to look at getting it working again in
  FreeBSD too (maybe matt ?) or has it on the TODO list
 LFS is being considered as a starting point for this project. The goal is
 to build an extensible file system with features such as the ability to
 grow and shrink partitions, acl's journaling etc.

There is a difference between a log-structured filesystem and a
journaling filesystem...

 XFS is also being considered as a feature reference.

*Very* different from LFS.  (What are features?  "Has files and
directories"?  Time-complexity?  Implementation details?  Buzzwords?)

This seems a bit hard to believe (must check freebsd-fs to see if
people are actually *seriously* considering LFS as a starting
point...).


Re: X11/C++ question

1999-10-27 Thread Ville-Pertti Keinonen

[EMAIL PROTECTED] (Chuck Robey) writes:

 Boy, I sure wish Java compiled and ran natively.  I'd stop using C++
 forever.

gcc-2.95.1 + libgcj already works, at least for simple programs.  On
FreeBSD 3.x programs seem to work as long as you use statically linked
libraries (shared libraries cause the garbage collector to dump core).

There already seems to be some awt code in libgcj, I have no idea
whether it's actually functional.

And the speed isn't quite comparable to what you can achieve with
lower-level languages (pretty close to the equivalent C++ code with
all methods virtual, heavy use of rtti and common-base-class-based
containers), but probably good enough for a lot of things.


Re: --enable-haifa

1999-10-14 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (W Gerald Hicks) writes:

 I don't have a shiny new K7 yet, where I might expect the haifa
 build to make more of a difference than my crusty old Pentium...

Processors with out-of-order execution benefit *less* from scheduling
than non-OOO superscalar processors.


Re: Multiple routes to the same destination

1999-09-20 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Zhihui Zhang) writes:

 As said by the 4.4 BSD book (page 423), 4.4 BSD does not support multiple
 routes to the same destination (identical key and mask). Does the radix
 tree code in FreeBSD - 4.0 has the same limitation?  I am wondering if
 there is already a solution for this? 

How would the routing code use multiple routes?  You'd need additional
rules to determine how to use them (e.g. round-robin for load
balancing).

In some cases where you want something unusual, you can use different
net sizes for the same net.  The code selects the route with the
smallest net (or at least used to - I don't know whether this is
documented behavior).

Note that the destination presumably means the destination where the
data being routed should end up, not the gateway it is sent to.
Multiple routes referring to the same gateway are obviously supported.
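
A minimal sketch of the "smallest net wins" selection described above, i.e.
longest-prefix matching.  It is a linear scan for illustration only, not the
kernel's radix tree, and the route table layout and lookup name are
hypothetical.

```python
import ipaddress


def lookup(routes, destination):
    """Return the gateway of the most specific route containing destination.

    routes is a list of (network, gateway) pairs, e.g. ("10.1.0.0/16", "gw-b").
    Returns None when no route matches.
    """
    addr = ipaddress.ip_address(destination)
    best = None
    for network, gateway in routes:
        net = ipaddress.ip_network(network)
        # A longer prefix means a smaller net, which takes precedence.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway)
    return None if best is None else best[1]
```

With routes for 10.0.0.0/8 and 10.1.0.0/16, a packet for 10.1.2.3 follows the
/16 route and a packet for 10.2.0.1 falls back to the /8 route, which is how
overlapping nets of different sizes can steer traffic differently.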


Re: aio_*

1999-09-14 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Jayson Nordwick) writes:

 While reading through (at least trying to... I wish there was some sort of
 kernel documentation available, the entry fee is very high) the aio_* calls,
 I had a few questions to clear up my understanding: 

 1) Do they only work on files?  The only implementation I see is in 
 the VFS layer.

AIO is not in the VFS layer.  The source file containing the
implementation is improperly named.

It works on any file descriptor that you could do the equivalent
read(2)/write(2) calls on.

 2) It is my understanding that it uses an aio daemon running as a kernel
 thread (the aio_daemon() call kind of give that one away).  It seems as
 if this can be almost entirely done in user space.  More important to what
 I am trying to do, it seems as if aio_* does not give peak latency 
 or throughput performance, since the aio_daemon has to compete for resources
 along with all other processes.

 Should aio_* be used for applications that have high performance
 requirements?  What does aio_* get you above having a separate
 thread pumping in/out data?

The implementation in FreeBSD probably isn't a particularly efficient
one.

It should be faster than threads, though.  You'll need fewer switches
between user and kernel mode and synchronization is simpler.


Re: Proposal: Add generic username for 3rd-party MTA's

1999-09-02 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Sheldon Hearn) writes:

 Actually, not. The postfix and exim ports, at least, would be taught to
 use the new UID when it became available in STABLE. I'm pretty sure
 smail and others would follow suit. Remember, _we_ control the ports and
 can have packages install for whatever ID we please.

The transition could be much quicker if the uid was added by a port
(e.g. mail/smtp-user) and the ports that wanted to use it depended on
that port.


Re: Sharing file descriptors

1999-08-31 Thread Ville-Pertti Keinonen

bri...@wintelcom.net (Alfred Perlstein) writes:

 1) file descriptor passing (described in Unix Network Programming Vol I)

Or just read recv(2), search for SCM_RIGHTS.

 2) shared address fork (should be on http://lt.tar.com)

Or just read rfork(2), and you don't need to share the address space.

The general idea of software server redundancy seems a bit odd,
though; debugging the software carefully and automatically restarting
it on failures is generally a better idea.
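
A minimal sketch of the SCM_RIGHTS mechanism mentioned above, using
sendmsg/recvmsg on a Unix-domain socket.  The helper names send_fd and
recv_fd are hypothetical, and error handling is kept to a minimum.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one open descriptor over a Unix-domain socket."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    # At least one byte of ordinary data must accompany the control message.
    sock.sendmsg([b"F"], ancillary)


def recv_fd(sock):
    """Receive one descriptor; the kernel installs it as a new fd in this process."""
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole int.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no descriptor received")
```

The received number usually differs from the one that was sent, but both
refer to the same open file, so data written through one end is visible
through the other.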


Re: locking revisited

1999-08-30 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Greg Lehey) writes:

 All systems which do more than one thing at a time need file locking
 at some time or another.  Since it involves cooperation between
 potentially unrelated processes, it's an obvious kernel function.  Any
 "solution" requiring cooperation between processes isn't really a
 solution.  As a result, I don't consider advisory locking to be real
 locking: it's just a kludge.

But strict explicit locks (I'll avoid the term "mandatory" to avoid
confusion) are only better than advisory locks in some restricted
cases.

You haven't commented on my previous critiques here.  Nobody has
disputed them, other than by saying that mandatory locking does not
mean locking where you have to actually call a function to apply the
lock, but that doesn't appear to be what *you* mean by mandatory
locking.

As the most explicit critique was not Cc'd to you (I lose fields
because I read mailing lists through list-news gateways), and you may
have missed it in all of the noise on the list, I'll quote myself
here:

The most significant advantage I see with mandatory locking over
advisory locking is guaranteeing atomicity for things done by
programs that do use locking.  This only protects the data when
programs that access the same data without locking don't need
locking, which generally means that they either don't need to
modify the data or that there can't be multiple instances of those
other programs *and* the modifications made are themselves atomic
(can't be read-modify-write, or even multiple writes if
consistency is required).

This is a somewhat limited set of cases.  If anyone can come up
with a counter-example, please present it.

Locking entire files, in addition to ranges, would seem to me to
be of further benefit, as it would allow properly locking programs
to fully protect against any single non-locking program which,
like Greg's cat example, would presumably be run interactively and
thus would require explicit stupidity to create additional races.

Note that protection is *at best* against a single program and at
worst only against other cooperative programs, as with advisory locks.

I can understand the aesthetic point of preferring that when you say
"lock the file", other processes don't get to do things to the file,
but the practical value of being able to do this is not *that*
significant, since the use of any non-cooperative programs makes
advisory and explicit locks equally useless with the exception of
cases mentioned above.

 FreeBSD is one of the few operating systems which doesn't have
 kernel-level locking.  If we want to emulate other systems correctly,
 we *must* have advisory locking.  This includes SCO UNIX, System V.4

How is FreeBSD's flock/fcntl advisory locking not kernel-level
locking?

 All this doesn't leave too much room for arguments about whether
 locking works or not: it works on all platforms except FreeBSD, and
 that's only because FreeBSD doesn't implement locking.

FreeBSD implements advisory locking.  I would expect most people to
consider it locking.  It's sufficient for a lot of things.  There are
cases where some other kind of locking would be better, but then you
may say that any voluntary locking scheme is useless.

  - System V style.  We need this for compatibility with System V.  The
choice of mandatory or advisory locking depends on the file
permissions.

  - Only mandatory locking.  fcntl works as before, but locks are
always mandatory, not advisory.  I'm sure that this won't be
popular, at least initially, but if you don't like it, you don't
have to use it.

Perhaps both of the above, all with on/off sysctls.  I would suggest
implementing both range and file locking.

Implementing mandatory locking as Terry Lambert defines it might also
be worth considering, but it shouldn't be made easy to turn on by
accident.

And for all additions, things should be properly documented, since
they might not guarantee what people would expect them to.
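
A minimal sketch of the advisory range locking that fcntl(2) already offers,
assuming POSIX record-lock semantics (locks belong to a process, so the probe
has to run in a child).  The helper names are hypothetical.

```python
import fcntl
import os


def try_range_lock(path, start, length):
    """Return True if an exclusive lock on [start, start + length) can be taken."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)
        return True
    except OSError:
        return False          # another process holds a conflicting lock
    finally:
        os.close(fd)          # also drops any lock taken above


def probe_from_child(path, start, length):
    """Run try_range_lock in a forked child and report its result."""
    pid = os.fork()
    if pid == 0:
        code = 2
        try:
            code = 0 if try_range_lock(path, start, length) else 1
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0
```

If the parent holds a lock on bytes 0 to 9, a child probing the same range is
refused while a probe of bytes 50 to 59 succeeds.  Nothing here stops a
process that writes without asking for a lock first.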


Re: Mandatory locking?

1999-08-27 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Terry Lambert) writes:

 I think this has been the basis of your objection so far.  If so,
 it's a fundamental misunderstanding of "mandatory".  In this context

What I was objecting to were some of the arguments made by Greg Lehey
and Wes Peters, both of whom explicitly stated that opening does not
block.

It had nothing to do with mandatory locking beyond that (quite
possibly flawed) interpretation.

 By your definition of explicit, no.  However, explicit locking is
 voluntary, just as advisory locking is voluntary, in terms of
 whether programs participate (or not).

 This pretty much means that explicit locking degrades to advisory
 locking, in the presence of (un)intentionally non-participatory
 programs.

That's basically what my objection was.

The "deadlock prone" objection made by others applies more strongly to
implicit locking, and is also valid.  It can take quite a bit of care
to ensure that there is always a maintenance path to the system that
allows a sufficient environment to be used without blocking on locked
files to allow root to get in and kill any processes causing problems.


Re: Mandatory locking?

1999-08-26 Thread Ville-Pertti Keinonen


 Not to jump down your throat, or anything, but you seem to be
 perpetuating some incorrect assumptions about both effect and
 proposed implementation details, and they must be stomped.  8-).

I was assuming that mandatory locking, in the context of this
discussion, does not mean automatic, forced exclusion on open, but
rather explicit locks, applied by calls similar to those used for
advisory locking, that are enforced by the kernel.

The arguments presented by most people seem to rely on such an
interpretation.

To avoid confusion, I'll refer to the possible locking methods as
advisory locking, explicit locking and implicit locking.

 Advisory locking lacks coherency for a NetWare, SMB, AppleTalk, or
 other file server running under FreeBSD as a hosted OS.

 It also has the problem that the hosted OS semantics, if they
 include mandatory locking, are not enforced against other
 processes, e.g. between an SMB server and an AppleTalk server
 running on the same machine, or between an SMB server and a UNIX
 program both needing access to the same database.

Yes, if file service protocols don't provide locking (or if one of the
operating systems involved doesn't provide locking) they obviously
can't benefit from any locking that isn't done implicitly.

 Also, I believe your example is flawed.  If a file is opened by a

Not for explicit locking, I hope.

 process that requires mandatory locking, no process that does not
 also open the file with mandatory locking turned on can access the
 file.  Neither can a program that requires mandatory locking
 semantics open the file if it is open by a process not using those
 same semantics.

So if a process wishes to use explicit locking calls, it indicates
that intent when opening the file - otherwise, the open implicitly
locks the file.  So multiple writers, or simultaneous readers and
writers are only permitted for programs that indicate that they are
going to use explicit locks on the file.

This could actually make sense.  But I don't think that is what is
being suggested.

 Mandatory locking for things like database files is necessary,
 unless the underlying FS supports records (in which case, like
 FILES-11, it most likely supports record locking anyway, and
 may only decide not to support them if it separately implements
 a transaction facility).

 You have to have mandatory locking to implement transactions...
 like updating the parity bits on a RAID 5 stripe.

But you certainly don't want to use open/read/write/close cycles for
such a purpose.

 This is why so many _real_ UNIX databases like to squat on their
 own raw disk partition.

  Locking entire files, in addition to ranges, would seem to me to be of
  further benefit, as it would allow properly locking programs to fully
  protect against any single non-locking program which, like Greg's cat
  example, would presumably be run interactively and thus would require
  explicit stupidity to create additional races.

 This is already possible, using O_EXCL.  Likewise, it doesn't

I think you mean O_EXLOCK.  It sets an advisory lock, it does not help
against programs that don't use locks.
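
To illustrate the point, here is a minimal sketch in Python rather than C. It assumes a system where flock(2) locks are purely advisory, and it uses flock(2) because those are the semantics O_EXLOCK gives you at open time. The function name and the temporary file are made up for the example.

```python
import fcntl
import os
import tempfile


def write_despite_lock():
    """Hold an exclusive advisory lock, then write through an unlocked descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        # The well-behaved program takes an exclusive advisory lock.
        fcntl.flock(fd, fcntl.LOCK_EX)

        # The cat-like program never asks for a lock, so nothing stops it.
        with open(path, "ab") as intruder:
            written = intruder.write(b"unlocked write\n")

        fcntl.flock(fd, fcntl.LOCK_UN)
        with open(path, "rb") as check:
            return written, check.read()
    finally:
        os.close(fd)
        os.unlink(path)
```

The second open and the write go through even though the lock is still held, which is all "advisory" means here.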

 apply to device files, and can not be applied (via fcntl(2)) to
 any files whose vnodes indirect through other than the vfsops
 version of "struct fileops".

It doesn't depend on the struct fileops selected, fcntl checks
explicitly that f_type is DTYPE_VNODE before assuming that f_data
points to a vnode.

 For SVR4 semantics, you can set the suid/sgid permission bits on a
 non-executable file.

The document describing "mandatory" locking in Linux seems to indicate
that setting sgid changes the behavior of locking calls to apply
explicit locks rather than merely advisory ones, and that this is what
is done by other operating systems as well.

Actually an implementation could still use the existing (advisory)
locks internally, but apply advisory locks in the kernel for the
duration of operations that need them (read/write/some cases of open).

 The act of opening the file for O_RDONLY sets a read lock on the
 entire file, which allows multiple readers, and the act of opening
 the file O_RDWR sets a write lock on the file, which allows a single
 writer.

I'm fairly certain this is not what is being discussed.  Certainly not
by more than half of the participants in the discussion.  ;--)

 Again, there are no issues with badly behaved processes.  There
 is no such thing as a badly behaved process.

I agree, for implicit locks there isn't.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: Mandatory locking?

1999-08-25 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Wes Peters) writes:

 And how many programmers with nearly (or more than) two decades of UNIX
 experience it takes to convince someone it really is useful.

It should only take one, as long as the arguments made are not bogus.

IMHO Greg made some very silly arguments (or at least used some very
stupid examples) for mandatory locking and never answered my points
regarding them.  (The arguments of some of the ones opposing mandatory
locking have been equally silly.)

I *do* agree that mandatory locking *can* be useful, but the
usefulness is not nearly as broad as some people seem to be implying,
and advisory locking is not as useless as some claim.

The most significant advantage I see with mandatory locking over
advisory locking is guaranteeing atomicity for things done by programs
that do use locking.  This only protects the data when programs that
access the same data without locking don't need locking, which
generally means that they either don't need to modify the data or that
there can't be multiple instances of those other programs *and* the
modifications made are themselves atomic (can't be read-modify-write,
or even multiple writes if consistency is required).

This is a somewhat limited set of cases.  If anyone can come up with a
counter-example, please present it.

Locking entire files, in addition to ranges, would seem to me to be of
further benefit, as it would allow properly locking programs to fully
protect against any single non-locking program which, like Greg's cat
example, would presumably be run interactively and thus would require
explicit stupidity to create additional races.

Locking entire files is also the only way to ensure that non-locking
programs can even see the file in a consistent state.

As a special case, mandatory locking could also be useful in ensuring
long-term exclusive access to some set of data, but this seems like
something that should be done using file permissions.



Re: Mandatory locking?

1999-08-24 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Chuck Robey) writes:

 On 23 Aug 1999, Ville-Pertti Keinonen wrote:

  And even without otherwise incorrect behavior, if you have a program
  that doesn't use any locking and another one that uses mandatory
  locking to prevent races with the non-locking program, the mere
  existence of the locking program does not prevent multiple non-locking
  programs from generating similar conditions.
 
 That's very odd, I thought the idea behind mandatory locking was to
 completely eliminate the possibility that a program could do what you're
 saying; all programs would *mandatorily* be forced to do locking to
 access the resource.

I don't know what the textbook definition for mandatory locking is,
but was assuming (particularly considering the proposal to use a fcntl
interface) that by mandatory locking, Greg was referring to a "harder"
lock than current advisory locking, one that had to be instantiated
explicitly but would not only lock out other attempts to lock, but all
other attempts to access the file.

The further messages in this thread seem to indicate that
different individuals have different definitions for mandatory
locking...  I'd still assume that marking a file to be accessible by
only one process at a time is *not* what is being discussed.
Particularly since it is not even clear what this would mean for
forked processes, dup, sending file descriptors over local sockets
etc.

Note that my arguments earlier don't apply in a case where you might
want to e.g. ensure consistency for non-locking programs with
read-only access, with the only program with privileges to modify the
data making the data inaccessible during updates.  This is a scenario
where it would, IMHO, actually be quite useful to have mandatory
locking.

In any case, if shared (open) access is allowed, such a feature can
introduce semantic changes to read/write system calls - normally,
read/write can never return EAGAIN or block for unlimited amounts of
time on regular, local files.  EAGAIN is not that much of a problem,
as it requires explicitly setting O_NONBLOCK, but blocking can
introduce new deadlocks.
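
For reference, the current behavior can be checked with a small sketch like the one below. It is Python rather than C, the function name is made up, and it only shows today's semantics without any mandatory locking: an empty pipe in non-blocking mode fails with EAGAIN, while a regular local file opened with O_NONBLOCK just returns its data.

```python
import errno
import os
import tempfile


def nonblocking_reads():
    """Compare an O_NONBLOCK read on an empty pipe with one on a regular file."""
    r, w = os.pipe()
    os.set_blocking(r, False)
    try:
        os.read(r, 16)
        pipe_errno = None
    except BlockingIOError as exc:
        pipe_errno = exc.errno            # EAGAIN: nothing to read yet
    finally:
        os.close(r)
        os.close(w)

    fd, path = tempfile.mkstemp()
    os.write(fd, b"data")
    os.close(fd)
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        file_data = os.read(fd, 16)       # regular file: returns at once
    finally:
        os.close(fd)
        os.unlink(path)
    return pipe_errno, file_data
```

With mandatory locks on shared files, the regular-file case would have to either block or start returning EAGAIN as well, which is the semantic change I mean.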



Re: Mandatory locking?

1999-08-24 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (John-Mark Gurney) writes:

 Ville-Pertti Keinonen scribbled this message on Aug 24:

  cat writes part of oldmail to /var/mail/grog
  sendmail locks /var/mail/grog
  (cat may try to write more to /var/mail/grog but blocks)
  sendmail delivers new mail
  sendmail unlocks /var/mail/grog
  cat writes the rest of oldmail to /var/mail/grog
  
  You'll still probably end up with a broken mailbox.
 
 what you do is this:
 lockf -k $mailfile cat ${mailtmp} >> $mailfile

Which doesn't support Greg's arguments for mandatory locking, as
you're now doing locking in both programs.
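
What the lockf wrapper amounts to is every writer cooperating, roughly as in this sketch. It is Python rather than shell, locked_append and demo are hypothetical names, and flock(2) stands in for whatever lock the delivering program actually takes.

```python
import fcntl
import os
import tempfile


def locked_append(path, data):
    """Append data while holding an exclusive advisory lock on the file."""
    with open(path, "ab") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # blocks until other lock holders are done
        try:
            f.write(data)
            f.flush()                   # push the data out before unlocking
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def demo():
    """Two cooperating writers append whole messages one after the other."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        locked_append(path, b"old mail\n")
        locked_append(path, b"new mail\n")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)
```

This works with plain advisory locks precisely because both sides take the lock; nothing mandatory is involved.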



Re: Mandatory locking?

1999-08-24 Thread Ville-Pertti Keinonen

g...@lemis.com (Greg Lehey) writes:

 an agreement of some kind.  But what if I want to merge the contents
 of another mail folder:

   cat oldmail >> /var/mail/grog

 That works, but it's playing with fire: if sendmail is delivering a
 message at the same time, it won't see me, and my cat doesn't get a
 lock beforehand, so both an incoming message and part of my mail
 folder could end up getting written to the same location.  With
 mandatory locking, it would work, transparently.

Certainly not with range-locking rather than file-locking.  cat is
certainly not guaranteed to be atomic, and while you shouldn't end up
writing things in the same location, what might happen unless you are
preventing multiple openers is:

cat writes part of oldmail to /var/mail/grog
sendmail locks /var/mail/grog
(cat may try to write more to /var/mail/grog but blocks)
sendmail delivers new mail
sendmail unlocks /var/mail/grog
cat writes the rest of oldmail to /var/mail/grog

You'll still probably end up with a broken mailbox.
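
The schedule above can be replayed deterministically. This is only a toy model in Python: the mailbox is a list of strings, and deliver and interleaved_merge are made-up names, not anything sendmail or cat actually does.

```python
def deliver(mailbox, message):
    """Append one complete message, as sendmail would while holding its lock."""
    mailbox.append(message)


def interleaved_merge(old_chunks, new_message):
    """Replay the schedule: part of cat, then sendmail, then the rest of cat."""
    mailbox = []
    mailbox.append(old_chunks[0])        # cat writes part of oldmail
    deliver(mailbox, new_message)        # sendmail locks, delivers, unlocks
    mailbox.extend(old_chunks[1:])       # cat resumes and writes the rest
    return "".join(mailbox)
```

Every individual write is protected, yet the new message still lands in the middle of the old one.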



Re: anybody love qsort.c?

1999-08-23 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (John-Mark Gurney) writes:

 Christopher Seiwald scribbled this message on Aug 18:
  It's a pretty straightforward change to bypass the insertion sort for
  large subsets of the data.  If no one has a strong love for qsort, I'll
  educate myself on how to make and contribute this change.
 
 why don't you implement this w/ the 5 element median selection qsort
 algorithm?  my professor for cis413 talked about this algorithm and
 that it really is the fastest qsort algorithm...   I don't have any
 pointers to a paper on this... but I might be able to dig some info up
 on it if you are interested...

I don't think the point is eliminating worst-cases, but optimizing
common cases, which in this case caused more worst-cases and thus
needs fixing.  Besides, the median selection chooses among more than 3
elements already (but only when the data set is large enough).

For fixing worst cases, an introspective sort might be a good idea,
i.e. do a quick sort but fall back to heap sort if a certain depth is
exceeded (you know you're losing when the depth exceeds log n).  This
also has another advantage - if you limit the depth of the sort, you
don't need to use the cpu stack for state, you can allocate a
fixed-size array for the purpose.  This probably isn't a real
performance advantage for a C qsort implementation because of the
overhead of calling cmp.  It does, however, guarantee that sorting
uses a reasonable amount of stack.  Such an assumption isn't portable
when using qsort(3), though.  Expect to die if you do large qsorts
from threads with small thread stacks.
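
The fallback idea can be sketched in a few lines. This is an illustration in Python, not the libc qsort: the names introsort and partition, the middle-element pivot and the depth limit of about 2 * log2(n) are my assumptions, and the heap sort step uses heapq on a copy of the range instead of sorting in place.

```python
import heapq


def partition(items, lo, hi):
    """Lomuto partition around the middle element; returns the pivot's final index."""
    mid = (lo + hi) // 2
    items[mid], items[hi] = items[hi], items[mid]
    pivot = items[hi]
    store = lo
    for i in range(lo, hi):
        if items[i] < pivot:
            items[i], items[store] = items[store], items[i]
            store += 1
    items[store], items[hi] = items[hi], items[store]
    return store


def introsort(items):
    """Sort a list in place: quicksort with a heap sort fallback past a depth limit."""
    n = len(items)
    if n < 2:
        return items
    depth_limit = 2 * n.bit_length()     # roughly 2 * log2(n)
    # Explicit stack of (lo, hi, depth) ranges instead of recursion.
    stack = [(0, n - 1, 0)]
    while stack:
        lo, hi, depth = stack.pop()
        while lo < hi:
            if depth >= depth_limit:
                # Quicksort is degenerating on this range: heap sort it instead.
                heap = items[lo:hi + 1]
                heapq.heapify(heap)
                for i in range(lo, hi + 1):
                    items[i] = heapq.heappop(heap)
                break
            p = partition(items, lo, hi)
            depth += 1
            # Defer the larger half and keep working on the smaller one.
            if p - lo < hi - p:
                stack.append((p + 1, hi, depth))
                hi = p - 1
            else:
                stack.append((lo, p - 1, depth))
                lo = p + 1
    return items
```

A list of identical keys is a handy test: this partition peels off one element per pass, so the depth limit trips and the heap sort branch finishes the job.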



Re: Mandatory locking?

1999-08-23 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Greg Lehey) writes:

 Again, if we have two concurrent transactions, we stand to gain money:
 the updated balance is likely not to know about the other transaction,
 and will thus "forget" one of the deductions.

 Now I suppose you're going to come and say that this is bad
 programming, and advisory locking would do the job if the software is
 written right.  Correct.  You could also use the same argument to say
 that memory protection isn't necessary, because a correctly written
 program doesn't overwrite other processes' address space.  It's the

The difference is that if a program has privileges to screw up
whatever you are protecting, it can do so even if you do have
mandatory locking, simply by functioning incorrectly when it does gain
access to the data.

And even without otherwise incorrect behavior, if you have a program
that doesn't use any locking and another one that uses mandatory
locking to prevent races with the non-locking program, the mere
existence of the locking program does not prevent multiple non-locking
programs from generating similar conditions.
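
The condition in question, two programs doing unlocked read-modify-write on the same balance, looks like this when replayed step by step. It is a toy model in Python with made-up names and numbers, not a claim about any particular program.

```python
def lost_update(balance, deduction_a, deduction_b):
    """Replay two unlocked read-modify-write transactions that overlap."""
    read_a = balance                     # program A reads the balance
    read_b = balance                     # program B reads it before A writes back
    balance = read_a - deduction_a       # A writes its result
    balance = read_b - deduction_b       # B overwrites it, forgetting A's deduction
    return balance
```

Starting from 100 with deductions of 30 and 20, the result is 80 instead of 50. A third program holding a mandatory lock at some other moment does nothing to change that.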

(I'm not opposed to mandatory locking in principle, but I don't find
your reasoning very convincing.)



Re: BSD-XFS Update

1999-08-12 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Alton, Matthew) writes:

 I am currently researching methods for implementing the 64-bit
 syscalls stat64(), fstat64(), lseek64() etc.  delineated in the
 SGI design doc _64 Bit File Access_  by Adam Sweeney.

Do the design docs indicate how inode numbers should interact with
userland APIs?

IIRC, inode numbers are 64-bit numbers in XFS.  Since ino_t, st_ino of
struct stat and d_fileno of struct dirent are only 32 bits, inode
numbers may be truncated and not appear unique to userland.  This
would break the assumptions of some code (e.g. getcwd(3), when not
using the kernel extension).
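
A quick way to see the problem, sketched in Python with made-up inode values: truncate_ino models storing a 64-bit number in a 32-bit field, and find_collisions is a hypothetical helper, not part of any API.

```python
def truncate_ino(ino64):
    """Keep only the low 32 bits, as a 32-bit ino_t field would."""
    return ino64 & 0xFFFFFFFF


def find_collisions(inodes):
    """Group 64-bit inode numbers that become indistinguishable after truncation."""
    seen = {}
    for ino in inodes:
        seen.setdefault(truncate_ino(ino), []).append(ino)
    return {low: group for low, group in seen.items() if len(group) > 1}
```

Two inodes such as 0x100000080 and 0x200000080 both show up as 0x80, so code that identifies a directory entry by comparing inode numbers can pick the wrong one.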



Re: libcompat proposition

1999-08-12 Thread Ville-Pertti Keinonen

ch...@calldei.com (Chris Costello) writes:

 I'm in favor of a libgnucompat rather than gnu functions in
 libcompat.

And how would a libgnucompat be different from libiberty?  Except of
course that it would be maintained by the FreeBSD folks...  Or that it
would be maintained at all.  ;--)



Re: BSD voice synthesis

1999-08-04 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Dag-Erling Smorgrav) writes:

 Ville-Pertti Keinonen [EMAIL PROTECTED] writes:
  I certainly don't expect any of the available voices to be able to
  pronounce Finnish names correctly, even with phonetic specifications.

 If the software were *designed* to speak Finnish, I'd expect it to
 cope with Finnish much better than it currently does with English,
 seeing as you guys have nearly phonetic spelling.

Festival is basically language-independent; each voice is associated
with a specific language, so with a Finnish voice it should be able to
pronounce Finnish reasonably.

Since the English voices have dictionaries for pronunciation, anyhow,
a Finnish voice wouldn't necessarily do a better job in terms of
pronunciation, although a Finnish voice should require fewer distinct
phonemes.

Creating voices does seem to involve quite a bit of work, though.



Re: BSD voice synthesis

1999-08-04 Thread Ville-Pertti Keinonen

w...@softweyr.com (Wes Peters) writes:

 available for home computers decades ago.  (Anyone else here ever use
 SAM the Software Automated Mouth for the Atari 800 or Commodore 64?)

Yes.

It's almost surprising how little speech synthesis has improved, at
least judging from the festival demos (it is, of course, better than
SAM, but apparently the data and processing requirements are several
orders of magnitude greater).  I haven't downloaded all of the
required stuff, yet, so I don't know how good or bad it actually might
be.

I certainly don't expect any of the available voices to be able to
pronounce Finnish names correctly, even with phonetic specifications.



Re: Documenting writev(2) ENOBUFS error

1999-08-01 Thread Ville-Pertti Keinonen


   :  [ENOBUFS] Insufficient system buffer space exists to complete the
   :operation.
   :
   :Do you know what kind of circumstances that error *really* occurs
   :under?
  
  So you can get ENOBUFS not related to mbufs for UDP/local datagram
  sockets, but you should never get ENOBUFS from write for TCP sockets
  or local stream sockets.
 
 So, do you want to enumerate the cases in which this error can occur in the
 man page?  This is not generally done, now that we have verified it is
 possible for the system to generate ENOBUFS on a writev.  I think the text
 stands as it is.

It should probably mention that it doesn't occur for most files (or
that it only occurs for datagram sockets - although it probably
applies to some types of raw sockets, too, and possibly
non-PF_INET/PF_UNIX sockets) to avoid people doing unnecessary
checking or implementing kernel code that bails out when it shouldn't.

It should be a fair requirement that the kernel continue to never fail
with ENOBUFS for a write to a reliable stream (local file, fifo, pipe
or stream socket) and that such cases be treated as bugs.  I would
assume that this corresponds to how other systems operate, as well.
Of course I'm no authority on this, and I'm not sure about NFS writes.
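
For userland this means the check only belongs on datagram-style sends. A minimal sketch in Python, assuming a caller that is happy to retry a few times: send_datagram, its retry count and its delay are hypothetical, and nothing here is specific to writev.

```python
import errno
import time


def send_datagram(sock, payload, retries=5, delay=0.01):
    """Send one datagram, retrying a few times if the kernel reports ENOBUFS."""
    for _ in range(retries):
        try:
            return sock.send(payload)
        except OSError as exc:
            if exc.errno != errno.ENOBUFS:
                raise                     # anything else is not ours to handle
            time.sleep(delay)             # give the queues a moment to drain
    raise OSError(errno.ENOBUFS, "datagram still refused after retries")
```

Code writing to a stream socket, pipe or local file should not need anything like this, which is the reason to say so in the man page.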



Re: Documenting writev(2) ENOBUFS error

1999-07-31 Thread Ville-Pertti Keinonen


 :[EMAIL PROTECTED] (Wes Peters) writes:
 :
 :  [ENOBUFS] Insufficient system buffer space exists to complete the
 :  operation.
 :
 :Do you know what kind of circumstances that error *really* occurs
 :under?
 :
 :If it happened with files, that would be a bug and should be fixed.
 :The call is supposed to block to wait for writes to be possible.  This
 
 I am almost certain that this error can only occur when writing to
 sockets, and only then if the network mbuf pool is completely exhausted.
 UDP is probably the most vulnerable.

It looks to me like it can't happen to stream sockets using
write/writev.

As far as I can tell, the ENOBUFS error can occur internally for sends
if:

- There is a shortage of mbufs at a low level (at higher levels,
code either blocks or panics)
- A network interface has a lot of packets queued (this is done at
an IP level)
- The socket buffer of a local datagram socket is full (the receiving
socket, not the one the send occurred on)

The TCP layer doesn't let ENOBUFS from low-level calls get through,
but returns success. A TCP socket is prepared to resend the data at a
higher level, anyhow, so the data is not lost and an error doesn't
need to be returned.

OOB data or implicit connections can return ENOBUFS for TCP sends, but
they are activated by parameters only available through the
send/sendto API.

So you can get ENOBUFS not related to mbufs for UDP/local datagram
sockets, but you should never get ENOBUFS from write for TCP sockets
or local stream sockets.
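
A minimal sketch of what this means for a userland program, assuming a datagram socket: ENOBUFS from send(2) is treated as a transient condition and retried a bounded number of times. The helper name send_dgram_retry, the retry count and the 1 ms back-off are illustrative assumptions, not anything taken from the FreeBSD sources.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <time.h>

/*
 * Hypothetical helper: send one datagram, retrying up to max_tries
 * times (max_tries should be at least 1) while the kernel reports
 * ENOBUFS, e.g. because the receiving local datagram socket's buffer
 * or an interface queue is full.
 * Returns the number of bytes sent, or -1 with errno set.
 */
ssize_t
send_dgram_retry(int fd, const void *buf, size_t len, int max_tries)
{
	struct timespec delay = { 0, 1000000 };	/* 1 ms, arbitrary */
	int tries;

	for (tries = 0; tries < max_tries; tries++) {
		ssize_t n = send(fd, buf, len, 0);

		if (n >= 0)
			return n;
		if (errno != ENOBUFS)
			return -1;	/* not the transient case */
		nanosleep(&delay, NULL);	/* back off, then retry */
	}
	errno = ENOBUFS;
	return -1;
}
```

For a stream socket the same loop should be unnecessary if the behaviour described above holds, since write/writev is expected to block rather than fail with ENOBUFS.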



Re: Documenting writev(2) ENOBUFS error

1999-07-30 Thread Ville-Pertti Keinonen

w...@softweyr.com (Wes Peters) writes:

  [ENOBUFS] Insufficient system buffer space exists to complete the
  operation.

Do you know what kind of circumstances that error *really* occurs
under?

If it happened with files, that would be a bug and should be fixed.
The call is supposed to block to wait for writes to be possible.  This
applies to stream sockets in most cases, as well.  Based on a quick
look at the code, out-of-band TCP data seems to be the only case where
ENOBUFS might be returned for streams, and that obviously doesn't
apply to write/writev.

As I mentioned to Nik in private mail, for datagram sockets, the
description in send(2) more or less applies.  Programs should
generally use send rather than write for such objects, anyhow.



Re: speed of file(1)

1999-07-21 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Peter Jeremy) writes:

 "Leif Neland" [EMAIL PROTECTED] wrote:
 My 60MHz Pentium, FreeBSD
 
 time file /usr/home/leif/vnc-3.3.2r
 /usr/home/leif/vnc-3.3.2r3_unixsrc.tgz: gzip compressed data, deflated,
 original filename, last modified: Thu Jan 21 19:23:21 1999
 
 real    0m1.237s
 user    0m0.758s
 sys     0m0.394s

 I can't believe these figures.

Hmm, a 200 MHz Pentium (MMX), 3.2-RELEASE, everything in cache:

$ /usr/bin/time file twofish.tar.gz
twofish.tar.gz: gzip compressed data, deflated, last modified: Mon Jun 15 02:40:53 1998, os: Unix
0.35 real 0.24 user 0.10 sys

I'd say that considering that things are cached (cpu-bound), it's very
accurately proportional to Leif's time.  Variances can be accounted
for by the slight implementation differences (the MMX version has a
bigger L1 and better branch prediction).

It's also reasonably proportional to a 400 MHz PII (0.09/0.08/0.01
running 3.2 -- 0.06/0.04/0.01 running 2.2.8, BTW).  Considering the
completely different core, this is also quite close to what you might
expect.

 I can't reproduce the complaint using a 64MB PII-266 running -CURRENT -
 there's no evidence of lack of speed, and profiling file(1) doesn't
 show any anomalies.

What are your results, then?



Re: Overcommit and calloc()

1999-07-20 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Dag-Erling Smorgrav) writes:

 "Kelly Yancey" [EMAIL PROTECTED] writes:
Ahh...but wouldn't the bzero() touch all of the memory just allocated
  functionally making it non-overcommit?
 
 No. If it were a "non-overcommitting malloc", it would return NULL and
 set errno to ENOMEM, instead of dumping core.

It won't dump core.  If it isn't the biggest process, it'll simply
succeed, but somebody else is killed.  If it's the biggest process,
it'll die with SIGKILL without dumping core.

There *are* systems that kill "random" processes when swap runs out,
presumably when they need to actually get pages that aren't available.
FreeBSD is not one of them.

Overcommit still has nothing to do with malloc.  Either the *system*
is overcommitted or it isn't - per-process overcommitment is
irrelevant, as is the way memory has become overcommitted.



Re: Replacement for grep(1) (part 2)

1999-07-16 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Chris G. Demetriou) writes:

 Matthew Dillon [EMAIL PROTECTED] writes:
  The text size of a program is irrelevant, because swap is never
  allocated for it.  The data and BSS are only relevant when they

No, you can mprotect read-only vnode mappings to writable.  Most
things wouldn't be hurt badly if this changed, though, and I suspect
that this already varies between operating systems.

  are modified.
  
  The only thing swap is ever used for is the dynamic allocation of memory.
  There are three ways to do it:  sbrk(), mmap(... MAP_ANON), or
  mmap(... MAP_PRIVATE).

 yup, almost: not all MAP_PRIVATE mappings need backing store, only
 MAP_PRIVATE and writeable mappings.  (MAP_PRIVATE does _not_ guarantee
 that you won't see modifications made via other MAP_SHARED mappings.)

...but in *this* case, you certainly shouldn't allow mprotect to fail
(with what, ENOMEM?).

It's certainly counterintuitive to me that mprotect could fail due to
a resource shortage.
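
A minimal sketch of the mprotect case, assuming a POSIX system: a file is mapped read-only and MAP_PRIVATE, the mapping is upgraded to writable with mprotect, and a page is dirtied. The write goes to a private copy, which is the point at which backing store is needed. The function name is hypothetical, and the file must be at least one byte long.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map len bytes of path read-only and private, upgrade the mapping to
 * writable with mprotect(), and modify the first byte.
 * Returns 0 if the file's first byte is unchanged afterwards (the write
 * went to a private copy), 1 if the file changed, -1 on error.
 */
int
private_write_after_mprotect(const char *path, size_t len)
{
	char before, after;
	char *p;
	int fd, rv = -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (pread(fd, &before, 1, 0) != 1)
		goto out;
	p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		goto out;
	/* The step under discussion: read-only becomes writable. */
	if (mprotect(p, len, PROT_READ | PROT_WRITE) == 0) {
		p[0] = (char)(before ^ 1);	/* triggers copy-on-write */
		if (pread(fd, &after, 1, 0) == 1)
			rv = (after == before) ? 0 : 1;
	}
	munmap(p, len);
out:
	close(fd);
	return rv;
}
```

A system that refuses to overcommit would have to either reserve swap for such mappings when they are created or let the mprotect call fail, which is the API change objected to above.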

 Actually, only now have you brought that up.  And, that's very system
 dependent.  On NetBSD/i386 the default is 2MB, and, it's worth noting
 that you only need to reserve as much as the current stack limit
 allows (after that, you're going to get a signal anyway, and if more

So what setrlimit accepts depends on how much memory is available?

Ok, programs changing their stack limit are rare, but this would still
be another API change.



Re: Replacement for grep(1) (part 2)

1999-07-16 Thread Ville-Pertti Keinonen

jul...@whistle.com (Julian Elischer) writes:

 If you wanted to fix this, you could add a patch to malloc that touched
 every page that it handed to the application. (and trapped sig11s)

How would you expect that to work?

Several misunderstandings seem to be common regarding this issue (most
not directed at you):

 - malloc almost never fails with NULL.  This is not true: if resource
limits are set properly, any one program using huge amounts of memory
is going to hit them long before swap space is exhausted.

 - The program currently trying to get the page is the one that is
killed.

 - Actually paging in all memory is going to protect a program from
getting killed.  In fact, this makes it *more likely* to be killed.

 - Not overcommitting doesn't consume huge amounts of reserve space
unless programs do something special.
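
The first point in the list can be sketched as follows, assuming a POSIX system. A child process lowers its soft address-space limit and then asks malloc for more than the limit allows. RLIMIT_AS is used here only because current allocators tend to obtain large blocks with mmap; the limit that mattered for an sbrk-based malloc would have been RLIMIT_DATA. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * In a child process, lower the soft address-space limit to limit
 * bytes and try to malloc request bytes.
 * Returns 1 if malloc returned NULL, 0 if it succeeded, -1 on error.
 */
int
malloc_fails_under_limit(rlim_t limit, size_t request)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		struct rlimit rl;
		void *volatile p;	/* volatile: keep the call from being optimized away */

		if (getrlimit(RLIMIT_AS, &rl) != 0)
			_exit(2);
		if (rl.rlim_max != RLIM_INFINITY && limit > rl.rlim_max)
			limit = rl.rlim_max;	/* cannot exceed the hard limit */
		rl.rlim_cur = limit;
		if (setrlimit(RLIMIT_AS, &rl) != 0)
			_exit(2);
		p = malloc(request);
		_exit(p == NULL ? 1 : 0);
	}
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	status = WEXITSTATUS(status);
	return status == 2 ? -1 : status;
}
```

With a 64 MB limit and a 1 GB request the allocation is refused up front, whether or not the system as a whole overcommits.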

A rough sum of memory usage can be computed by summing up all of the
process VSZs plus your stack limit times the number of processes.  How
many of you would be willing to configure that much swap space?
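
As a worked example of that estimate, here is a small sketch. The function name and the numbers in the note below are made up for illustration, not measurements from any real system.

```c
#include <stddef.h>

/*
 * Rough worst-case reservation, in kilobytes, following the estimate
 * above: the sum of every process's VSZ plus one full stack limit per
 * process.
 */
unsigned long
no_overcommit_reserve_kb(const unsigned long *vsz_kb, size_t nproc,
    unsigned long stack_limit_kb)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < nproc; i++)
		total += vsz_kb[i];
	return total + stack_limit_kb * nproc;
}
```

For three hypothetical processes with VSZs of 2000, 12000 and 40000 KB and a 64 MB (65536 KB) stack limit, this gives 54000 + 3 * 65536 = 250608 KB, most of which is stack that will never be touched.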

If you really wanted to run without overcommit, you'd only run
statically linked binaries and set your stack limits to small values.
This could be desirable for some (but not general-purpose) systems; an
option for doing this wouldn't be entirely bogus.



Re: a BSD identd

1999-07-13 Thread Ville-Pertti Keinonen

gr...@freebsd.org (Brian F. Feldman) writes:

 It's out with the bad, in with the good. Pidentd code is pretty terrible.
 The only security concerns with my code were wrt FAKEID, and those were
 mostly fixed (mostly meaning that a symlink _may_ be opened, but it won't
 be read.) If anyone wants to audit my code for security, I invite them to.

Did you mean to avoid reading through symlinks using the open + fstat
method mentioned earlier in the thread?

I thought I'd misunderstood, that you had to be discussing something
else, since you and whoever else was involved both agreed that open +
fstat is sufficient, and I thought that several people can't possibly
be so completely confused.

If you really want to avoid reading through symlinks, you need to
lstat, open and fstat (the order doesn't really matter).
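
A minimal sketch of the lstat + open + fstat sequence, assuming a POSIX system. The function name is hypothetical and this is not the identd code under discussion. lstat shows whether the name itself is a symlink, and comparing the device and inode numbers from fstat shows whether the object that was opened is the same one lstat examined.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open path read-only, refusing to read through a symlink.
 * Returns a file descriptor, or -1 if the path is a symlink, is not a
 * regular file, or was replaced between the lstat() and the open().
 */
int
open_regular_nofollow(const char *path)
{
	struct stat lst, fst;
	int fd;

	/* lstat() describes the name itself, not what it points to. */
	if (lstat(path, &lst) != 0 || !S_ISREG(lst.st_mode))
		return -1;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* fstat() describes the object that was actually opened. */
	if (fstat(fd, &fst) != 0 ||
	    fst.st_dev != lst.st_dev || fst.st_ino != lst.st_ino) {
		close(fd);
		return -1;
	}
	return fd;
}
```

With open + fstat alone, fstat only ever sees the target of the link, so a symlink to a regular file is indistinguishable from the file itself.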



Re: Wrong comment in VM code?

1999-07-09 Thread Ville-Pertti Keinonen

[EMAIL PROTECTED] (Zhihui Zhang) writes:

 At the beginning of the file vm_object.c, we have the following comment:
 
 The only items within the object structure which are modified after time
 of creation are:
 
 reference count locked by object's lock
 pager routine   locked by object's lock 
 
 But at the end of vnode_pager_setsize(), we modify the size field.  So at
 least three items can be modified after creation.  Am I right?

The comment is wrong (it's probably supposed to mean something other
than it seems to); the only field in a vm_object that *isn't* modified
after creation is 'id'.

The comment is also wrong in that there are no vm_object locks in
FreeBSD (they've been ripped out).



Re: Rewriting pca(4) using finetimer(9) (was: Re: MPU401 now works under New Midi Driver Framework with a Fine Timer)

1999-07-09 Thread Ville-Pertti Keinonen

p...@critter.freebsd.dk (Poul-Henning Kamp) writes:

 Somebody should study the abilities of the on-cpu APIC for this
 for pentium ff. machines.

The local APIC would work very nicely, but I'm not sure that you can
enable it reliably in a non-SMP configuration.  AFAIK most BIOSes
don't provide an MP config at all unless you have multiple CPUs
present.  If you don't have an MP config, you can't set up the
redirection tables.

And if you have a non-SMP chipset, you can't route interrupts at all,
since you won't have an APIC bus on your motherboard or an I/O APIC
for the real interrupts.

It's been a while since I looked at the documentation, but it *might*
be possible that the local APIC timers would work without using APIC
interrupt routing.  IIRC the timers are simply programmed with the IDT
vector number to generate as an interrupt.
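
To illustrate the last point, here is a sketch of how the value for the local APIC's LVT Timer register is composed. The layout is as I understand it from Intel's documentation: vector in bits 0-7, mask in bit 16, periodic mode in bit 17, register at offset 0x320 from the APIC base. The helper and macro names are hypothetical, and this is not FreeBSD code. Actually storing the value requires kernel mode and a mapped local APIC, which the sketch does not attempt.

```c
#include <stdint.h>

#define LAPIC_LVT_TIMER_OFFSET	0x320u		/* LVT Timer register */
#define LVT_MASKED		(1u << 16)	/* interrupt masked */
#define LVT_PERIODIC		(1u << 17)	/* periodic, not one-shot */

/*
 * Compose the LVT Timer value: the low byte is simply the IDT vector
 * the timer raises when it expires.
 */
uint32_t
lvt_timer_value(uint8_t vector, int periodic, int masked)
{
	uint32_t v = vector;

	if (periodic)
		v |= LVT_PERIODIC;
	if (masked)
		v |= LVT_MASKED;
	return v;
}
```

Nothing in this value refers to the I/O APIC or to the redirection tables, which is why the timer might work without APIC interrupt routing.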



Re: Rewriting pca(4) using finetimer(9) (was: Re: MPU401 now works under New Midi Driver Framework with a Fine Timer)

1999-07-09 Thread Ville-Pertti Keinonen

p...@critter.freebsd.dk (Poul-Henning Kamp) writes:

 But shouldn't you still be able to use the timer in the local apic ?

Did you read the last paragraph in my message?

Here it is again:

 It's been a while since I looked at the documentation, but it *might*
 be possible that the local APIC timers would work without using APIC
 interrupt routing.  IIRC the timers are simply programmed with the IDT
 vector number to generate as an interrupt.

I haven't tried it, I don't know what would happen.  If someone else
knows (or has a chance to try it soon), please comment.

Even if it works, using the feature would probably have to rely on
undocumented behavior.



Re: Bursting at the seams (was: Heh heh, humorous lockup)

1999-07-08 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Patryk Zadarnowski) writes:

  You can't extend the address space that way, segments are all parts of
  the single 4GB address space described by the page mapping.

 True, but you can reserve a part of the 4GB address space (say 128MB of it)
 for partitioning into tiny (say 8MB) address spaces (which are still flat,
 just small), for use by small processes, the idea being that all those small
 processes will then share a single page table without compromising on memory
 protection (the GDT is under full OS's control anyway), or the simplicity of
 a flat address space (virtual addresses still start at 0 and continue till
 the top of address space; the scheme is totally transparent.)

Yeah, I know, I've read Liedtke's original paper where he described
the optimization in L3, that's fine for that specific purpose, but
that wasn't what the thread was about.  Unless I totally missed the
point.



Re: Bursting at the seams (was: Heh heh, humorous lockup)

1999-07-08 Thread Ville-Pertti Keinonen

jul...@whistle.com (Julian Elischer) writes:

 we already use the gs register for SMP now..
 what about the fs register?
 I vaguely remember that the different segments could be used to achieve
 this (%fs points to user space or something)

You can't extend the address space that way; segments are all parts of
the single 4GB address space described by the page mapping.



Re: Heh heh, humorous lockup

1999-07-08 Thread Ville-Pertti Keinonen

dil...@apollo.backplane.com (Matthew Dillon) writes:

 pare down the fields in both structures.  For example, the vnode structure
 contains a lot of temporary clustering fields that could be removed
 entirely if clustering operations are done at the time of the actual I/O
 rather than beforehand (which leads to other problems related to
 low-memory deadlocks :-(... but assuming that could be fixed...).

Actually the vnode structure can be reduced in size quite a bit
without affecting behavior.  I analyzed this in a private mail to phk
a few months ago; I can get the list of necessary changes out again if
anyone is actually interested.

The idea was to reduce the size to 128 bytes (on i386) so that the
kernel malloc would do a reasonable job allocating the vnodes without
too much overhead.  IIRC it was very close.

I had written code that allocated and deallocated vnodes dynamically
(see http://www.hut.fi/~will/freebsd_vnfree0.diff for a non-malloc
version with parameters adjusted to exercise the behavior quite
heavily).  It didn't seem like a very useful feature, though, because
of fragmentation (even with the 'optimizing' zone allocator in the
patch).  Even if the kernel malloc would be usable, the only other
common object that would typically use that memory would be ffs
inodes, which are allocated and deallocated along with vnodes...

This reminds me - the small patch I submitted to fix the v_id
references still hasn't been committed (phk, if you're reading this, is
there any specific reason for this?).



Re: Overwrite an executable file that is running

1999-07-07 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Zhihui Zhang) writes:

 For a big executable file that is being run by the OS, all its contents
 may not be loaded into the memory.  At the same time, the developer gets
 impatient and wants to create a new version of the same file.  He could
 modify the makefile to output the new version to a different file name,
 but this is tedious. This new version should not overwrite the older
 version of the file being run. My question is how FreeBSD prevents this
 from happening?  Can anyone point out for me where in the source code this

It is prevented by not allowing it.  A file cannot simultaneously be
executing and opened for writing.

To find the relevant bits in the sources, try:

grep ETXTBSY /sys/kern/*



TCP input processing bug

1999-06-22 Thread Ville-Pertti Keinonen

I think I've located a problem in TCP input processing...and it has
been there for quite a while.  It breaks half-open connection
discovery for many cases since version 1.15 of netinet/tcp_input.c
(committed by Garrett Wollman, which is why this is Cc'd to him),
although that isn't where the (presumably) incorrect behavior was
introduced.

The half-open connection discovery problem can be reproduced easily,
the conditions required are:

 - Machine A thinks it has an established connection with machine B

 - Machine B disagrees (it has crashed, the network has been down,
   or it has recently been assigned the IP of another machine that
   disconnected uncleanly; a lot of conditions can cause this)

 - Machine B tries to connect to machine A using the same source
   port number as the half-open connection

 - Machine B selects a sequence number below the current window
   expected by machine A

Machine B sends a SYN, but gets nothing as a reply (it should be
getting an ACK), no matter how many times it tries.  Machine A will
keep the connection in an established state until it tries to send
data (depending on the application, this may never happen) or is timed
out by keepalives.

This is particularly nasty if the boot procedure of machine B
establishes a TCP connection to machine A - after a crash, it'll
always try to use the same port number and never succeed.

Basically, in the tcp_input function, just before ACK processing, when
'goto drop' is done if ACK isn't set, TF_ACKNOW might be set in
tp->t_flags, but the ACK is never sent because tcp_output is never
called.

This can be fixed by checking for TF_ACKNOW in the drop: case and
calling tcp_output if it is set.  However, such a modification can
change the behavior of a considerable number of cases so I think it
needs careful verification.
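
The shape of the proposed change can be modelled outside the kernel. The following is a toy sketch, not the actual tcp_input code: struct toy_tcpcb, toy_tcp_output and toy_drop are hypothetical stand-ins, and the flag value is only illustrative.

```c
#define TF_ACKNOW	0x0001	/* illustrative value: "ACK peer immediately" */

/* Stand-in for the TCP control block; only what the model needs. */
struct toy_tcpcb {
	int t_flags;
	int acks_sent;
};

/* Stand-in for tcp_output(): "sends" the pending ACK and clears the flag. */
static void
toy_tcp_output(struct toy_tcpcb *tp)
{
	if (tp->t_flags & TF_ACKNOW) {
		tp->acks_sent++;
		tp->t_flags &= ~TF_ACKNOW;
	}
}

/*
 * Model of the drop: path with the proposed check added.  Without the
 * check, a segment that set TF_ACKNOW and then reached drop: would
 * never produce the ACK.
 */
void
toy_drop(struct toy_tcpcb *tp)
{
	if (tp->t_flags & TF_ACKNOW)
		toy_tcp_output(tp);
	/* the real code would free the mbuf and return here */
}
```

The model only shows that a pending ACK is flushed exactly once; it says nothing about the other paths that reach drop:, which is where the careful verification mentioned above would be needed.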

Anyone who knows the TCP code, please comment!



Re: vinum performance

1999-06-17 Thread Ville-Pertti Keinonen

cro...@cs.rpi.edu (David E. Cross) writes:

 I have a drive that is rated at ~16 Meg/second, and indeed it delivers on the
 order of 15+ Meg/second.  If I use Vinum to create a concatenated device
 of 2 such units performance drops to 2.5 Meg/sec.  This seems like a 
 drastic drop in performance.  Any ideas what I am doing incorrectly?

You've accidentally striped subdisks on the same drive?  ;--)

Like Greg Lehey said, you haven't really provided enough details.  The
minimum info required would be:

 - Is this read or write performance?

Many disks are shipped with write caching disabled, and write
performance can be significantly worse than read performance.  It
shouldn't be quite *that* bad, though; I get better write performance
with slower disks, write caching disabled and mirroring (with the
default 3.2 vinum - which has debugging compiled in, look at the bss
size...).

 - Are you testing through the filesystem?  (How are you testing?)

Maybe you're doing a dd test and accessing /dev/vinum/vol/* rather
than /dev/vinum/rvol/*...



Re: vinum performance

1999-06-17 Thread Ville-Pertti Keinonen

g...@lemis.com (Greg Lehey) writes:

  You've accidentally striped subdisks on the same drive?  ;--)
 
  Like Greg Lehey said, you haven't really provided enough details.
 
 He did provide one detail, though; this is a concatenated plex, not a
 striped one.

Or he at least *thinks* it's concatenated.  ;--)  I didn't miss that -
but given numbers that bad, it sounds like there might be some really
silly mistake involved.

  Many disks are shipped with write caching disabled, and write
  performance can be significantly worse than read performance.
 
 Not if it works without Vinum.

My thought was that he might be comparing read performance and write
performance (they are often pretty close).  But even so, the
difference shouldn't be *that* big.



Re: coarse vs fine-grained locking in SMP systems

1999-06-15 Thread Ville-Pertti Keinonen

m...@servo.ccr.org (Mike O'Dell) writes:

 very fine-grain-locked systems often display convoying and
 are prone to priority inversion problems.  coarse-grained

Priority inversion problems are design flaws.  Depending on the type
of locks, they may not even be possible.  Spin locks held for short
periods of time (typical for very fine-grained systems) can't cause
priority inversion because the process holding the lock can't block.

 we published the best Unix SMP paper I've ever seen in Computing
 Systems - from the Amdahl guys who did an SMP version of the kernel
 by very clever hacks on SPLx() macros to make them spin locks and
 a bit of other clever trickery on the source.  they could take a stock

An approach like that can't possibly be sufficient if code has been
written with the assumption that only interrupt-like events or
blocking calls can change things from under it.  There is quite a bit
of code in FreeBSD that relies on this.



Re: symlink question

1999-06-15 Thread Ville-Pertti Keinonen

dsche...@enteract.com (David Scheidt) writes:

 First try:  Suppose foo depends on /usr/local/etc/foo.conf.
 /usr/local/etc is a link to /usr/local/${ARCH}/etc.  User does
 export ARCH=../../home/user, so /usr/local/etc/foo.conf is now in
 their home directory.  Depending on how poorly written foo is

Eww, I don't like the idea of using environment variables this way.
The kernel shouldn't rely on them; they are a userland thing except
during execve.  Environment variables aren't even visible to the
kernel in the process that sets them.

Variant symlinks don't need to be controlled through environment
variables.  If there is a specific use in mind for variant symlinks,
the mechanism for configuring them should be chosen with consideration
for that.  (Even if variant symlinks could be driven by environment
variables, there should also be ones based on hard-wired info, and
system-wide variant symlinks should only use environment variables
when user-modifiability is specifically desirable.  Your example is
obviously a case of improper use.)

If there is no specific use in mind for variant symlinks, other than
to have fun magic thingies around to play with that *can* be used for
such-and-such, then implementing them is not a particularly good idea.

For example, Lites had variant symlinks with keywords that were
internally resolved to the architecture/system name or the name of the
system being emulated.
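
To make the idea concrete, here is a small sketch of keyword expansion driven by a hard-wired table rather than by the environment.  The "@arch" and "@emul" keywords, the table contents and the function name variant_expand are all assumptions made up for illustration; this is not the syntax Lites actually used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hard-wired keyword table; values are illustrative only. */
struct variant {
    const char *key;
    const char *value;
};

static const struct variant variants[] = {
    { "@arch", "i386" },
    { "@emul", "linux" },
};

/*
 * Expand every known keyword in 'target' into 'out' (size 'outlen').
 * Returns 0 on success, -1 if the result does not fit.
 * Matching is a plain prefix match with no delimiter handling, and
 * anything that is not a known keyword is copied through unchanged.
 */
int variant_expand(const char *target, char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return -1;
    while (*target != '\0') {
        const char *piece = target;     /* default: copy one character */
        size_t piece_len = 1, skip = 1, i;

        for (i = 0; i < sizeof(variants) / sizeof(variants[0]); i++) {
            size_t klen = strlen(variants[i].key);

            if (strncmp(target, variants[i].key, klen) == 0) {
                piece = variants[i].value;
                piece_len = strlen(piece);
                skip = klen;
                break;
            }
        }
        if (used + piece_len >= outlen)
            return -1;                  /* leave room for the NUL */
        memcpy(out + used, piece, piece_len);
        used += piece_len;
        target += skip;
    }
    out[used] = '\0';
    return 0;
}
```

Nothing in the lookup depends on anything the user can set, which is the property that matters for system-wide links.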

For Lites, this was much better than something equivalent to FreeBSD's
/compat hacks, because emulated systems were equal, and the root
partition could be shared with the real system.  For FreeBSD, the
current approach is probably better, because emulated systems are
optional exceptions.



Re: oops, here's the patch

1999-06-15 Thread Ville-Pertti Keinonen

dil...@apollo.backplane.com (Matthew Dillon) writes:

 However, if the inside of the first conditional generates an error, the vp
 may be vput twice.  What I recommend is this for the last bit:

That can't happen (the attributes are straight from VATTR_NULL along
that path) - if it could, the file could also be truncated...

   if (vap->va_size != -1) {
       ...
       if (error) {
           vput(vp);
           vp = NULL;   /* my addition */
       }
   }
   if (eexistdebug && vp)   /* also check vp != NULL */
       vput(vp);

 It would be good if someone else could look over this routine and
 double-check David's find and his solution with my modification.  Have
 we handled all the cases?

Yes, for that code path.

Here's a simpler virtual unified diff that does the same thing as
David's patch.  (You don't need an 'eexistdebug' variable.)

    if (vap->va_size != -1) {
        ...
-       if (error)
-           vput(vp);
    }
+   if (error)
+       vput(vp);

You can add a check for 'error == 0' in addition to
'vap->va_size != -1' but that shouldn't have any effect.
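
For anyone who wants to convince themselves, here is a userland mock of the shape of the fixed code.  Everything in it is a stand-in: mock_vnode and mock_vput are not the kernel's types, finish_create is a made-up name, and since the body of the conditional is elided above it is replaced by a hypothetical setattr_error parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-in; the real struct vnode and vput() live in the kernel. */
struct mock_vnode {
    int refs;
};

void mock_vput(struct mock_vnode *vp)
{
    assert(vp->refs > 0);   /* a second release would trip this */
    vp->refs--;
}

/*
 * 'size' mimics vap->va_size (-1 means "not set"), 'error' is whatever
 * error was already pending on entry, and 'setattr_error' stands in for
 * the result of the elided work inside the conditional.
 */
int finish_create(struct mock_vnode *vp, long size, int error,
    int setattr_error)
{
    if (size != -1) {
        /* ... elided in the original ... */
        error = setattr_error;
    }
    if (error)              /* single release point, as in the diff */
        mock_vput(vp);
    return error;
}
```

With one release point the vnode is released exactly once on every error path, whether or not the conditional was entered.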



Re: FS tuning (Was: File system gets too fragmented ???)

1999-05-31 Thread Ville-Pertti Keinonen

jo...@gnu.org (Joel Ray Holveck) writes:

 As we all know, tunefs -o space will hurt write performance.  Will it
 hurt read performance?  If I don't care about install-time speed, but
 do care about run-time speed and free space, should I populate my
 filesystems at install time with space tuning?

-o space should have very little effect at install time.

The space vs. time optimization parameter only has any effect when
files are extended by small amounts (the fragment at the end is
reallocated).  This seldom happens for most files.  Log files and
interactively edited files are probably (just guessing) most likely to
make use of fragment reallocations.
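
A simplified model of when this comes into play, as a sketch only: it ignores indirect blocks and everything else the real allocator does, and the function names and the 8192/1024 sizes used when trying it out are assumptions for the example, not taken from the FFS sources.

```c
#include <assert.h>

/* Fragments occupied by the partial block at the end of a file. */
long tail_frags(long size, long bsize, long fsize)
{
    long tail = size % bsize;           /* bytes past the last full block */

    return (tail + fsize - 1) / fsize;  /* rounded up to whole fragments */
}

/*
 * Returns 1 if growing a file from old_size to new_size forces an
 * existing partial tail to occupy more fragments (the case where a
 * fragment reallocation may be needed), else 0.
 */
int append_grows_tail(long old_size, long new_size, long bsize, long fsize)
{
    long old_frags = tail_frags(old_size, bsize, fsize);

    if (old_frags == 0 || old_frags == bsize / fsize)
        return 0;       /* no partial tail, or it already fills a block */
    if (new_size / bsize != old_size / bsize)
        return 1;       /* tail has to become a full block */
    return tail_frags(new_size, bsize, fsize) > old_frags;
}
```

A file written in one go never has an existing tail to grow, while a log file appended to in small pieces hits the growing case over and over.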



Re: question about vnode and inode locking

1999-05-31 Thread Ville-Pertti Keinonen

zzh...@cs.binghamton.edu (Zhihui Zhang) writes:

 It seems to me that we can lock at the vnode layer AND at the inode layer. 

No, the inode lock is, in most cases, the vnode layer lock.  It isn't
obvious because the code assumes that any filesystem using vop_stdlock
has a 'struct lock' as the first entry of the internal data pointed to
by v_data.

Very ugly.
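
A userland mock of the layout assumption, to show why it works at all.  The types here (mock_lock, mock_inode, mock_vnode) and mock_stdlock are stand-ins invented for the example; the real struct lock, struct inode and vop_stdlock are more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Mock types; the kernel's own structures differ. */
struct mock_lock {
    int holder;                 /* 0 = unlocked */
};

struct mock_inode {
    struct mock_lock i_lock;    /* must stay first for the cast below */
    long i_number;
};

struct mock_vnode {
    void *v_data;               /* filesystem-private data */
};

/*
 * Generic lock routine: blindly treats whatever v_data points to as a
 * lock.  Returns 0 on success, -1 if the lock is already held.
 */
int mock_stdlock(struct mock_vnode *vp, int who)
{
    struct mock_lock *lk = (struct mock_lock *)vp->v_data;

    if (lk->holder != 0)
        return -1;
    lk->holder = who;
    return 0;
}
```

Move i_lock away from the start of the structure and the generic routine silently scribbles over some other field, and nothing in the types will warn about it.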



Re: A bug in namei cache? (stale entries)

1999-05-28 Thread Ville-Pertti Keinonen

 Suppose, you have a directory hierarchy a -> b -> c.  In each of a, b, and
 c, we have the following files:
 
 a: ., .., a1, a2, a3, b   (a1, a2, a3 are not directory files)
 b: ., .., b1, b2, b3, c   (b1, b2, b3 are not directory files)
 
 If I do a mv a a_new, then cache entries for a, a1, a2, a3, b will be
 purged from the cache. Although b is purged from the namecache, we can
 still find it by other means (e.g. ufs_ihashget() called by ffs_vget()). 
 So the entries for b1, b2, b3, c are still useful.  So the namei cache
 will not contain any stale entries. 
 
 Am I right? 

Yes, except that the other means for finding b are more commonly by
holding a reference to the vnode (open file handle, current
directory) or just by searching the directory again.



Re: A bug in namei cache?

1999-05-27 Thread Ville-Pertti Keinonen

zzh...@cs.binghamton.edu (Zhihui Zhang) writes:

 Suppose you want to mv a directory file (with subdirectories) to another
 name (it is like grafting a subtree to another point), the namecache
 associated with the source directory file will be purged by calling
 cache_purge() (done in ufs_rename()?).  However, the routine cache_purge() 
 does not purge cache entries recursively down the subtree.  Will this
 result in a lot of stale entries in the namecache? FreeBSD 3.1 no longer

The name cache only caches component names, not paths, so the entries
are still valid.
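
A toy model of a component cache may make this clearer.  It is only a sketch: the table layout, the integer ids standing in for vnodes, and the nc_* names are all invented for the example and have nothing to do with the actual vfs_cache.c data structures.

```c
#include <assert.h>
#include <string.h>

#define NC_MAX     16
#define NC_NAMELEN 32

/* One cached component: (parent directory id, name) -> child id. */
struct nc_entry {
    int  used;
    int  parent;
    char name[NC_NAMELEN];
    int  child;
};

struct nc_entry nc_table[NC_MAX];

/* Returns 0 on success, -1 if the name is too long or the table is full. */
int nc_enter(int parent, const char *name, int child)
{
    int i;

    if (strlen(name) >= NC_NAMELEN)
        return -1;
    for (i = 0; i < NC_MAX; i++) {
        if (!nc_table[i].used) {
            nc_table[i].used = 1;
            nc_table[i].parent = parent;
            nc_table[i].child = child;
            strcpy(nc_table[i].name, name);
            return 0;
        }
    }
    return -1;
}

/* Returns the child id, or -1 on a cache miss. */
int nc_lookup(int parent, const char *name)
{
    int i;

    for (i = 0; i < NC_MAX; i++)
        if (nc_table[i].used && nc_table[i].parent == parent &&
            strcmp(nc_table[i].name, name) == 0)
            return nc_table[i].child;
    return -1;
}

/* Drop every entry that names 'id' or that lives directly in 'id'. */
void nc_purge(int id)
{
    int i;

    for (i = 0; i < NC_MAX; i++)
        if (nc_table[i].used &&
            (nc_table[i].parent == id || nc_table[i].child == id))
            nc_table[i].used = 0;
}
```

Entries further down the tree are keyed by their own parent directory, not by a path string, so renaming an ancestor does not make them wrong.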



Re: Repeatable kernel panic for 3.2-RELEASE NFS server

1999-05-18 Thread Ville-Pertti Keinonen

cro...@cs.rpi.edu (David E. Cross) writes:

 One of our users was able to reliably crash an NFS server 3 times today.
 I have since copied his program and have reliably crashed a separate and
 unloaded machine with the exact same panic, lockmgr: locking against
 myself.  I check the recent DG patches that went in after -RELEASE and they

Are you sure this is NFS related?

I can certainly reliably reproduce that and other panics (reported in
kern/11629, includes a fix).

