Re: Needed: suid library calls (was Re: cvs commit: src/crypto/openssh sshd_config)
[EMAIL PROTECTED] (Nick Sayer) writes:

What we _really_ need is some mechanism to recognize the difference between a user program and a system library, with an eye towards granting privileges to trusted libraries without letting those privileges leak past the library in question. I don't claim that this is an _easy_ thing to do, nor that it is a particularly standard thing to do.

(Shared) libraries are currently a userland concept. Doing what you're suggesting would require a special kind of library, controlled by the kernel and called through the kernel. To protect against threads and other means of sharing memory, the library would have to use its own memory for everything writable, protected against access by the unprivileged parts of the user program. This would effectively create a new ring of protection somewhere in between the "userland" and "kernel" privileges - a MULTICS concept, as Matthew Dillon noted - with its own stack and memory.

On architectures that only implement user/supervisor modes of execution and don't provide segments or other kludges, such library calls would, in a sense, be executed in different processes (protection would require a separate address space - assuming that the library calls wouldn't be running in supervisor mode, in which case the entire mechanism would basically be per-process loadable system calls, also not an acceptable solution).

But the mechanism of having some sort of daemon or service whose job it is to just do !strcmp(pw->pw_passwd,crypt(foo,pw->pw_passwd)) is, I think, kind of overkill.

It would also have to open the password database using the appropriate privileges... which in the case of a privileged library and multithreaded programs (or just rfork) is unsafe, because other threads could also access the database while the library has the file handle open.

IMHO a "privileged library" would, to be safe, have to be so well isolated from the rest of the program that the functionality might as well be in a separate process.

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
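To make the "separate process" alternative concrete, here is a minimal sketch in which the check runs in a child process and only a yes/no answer (the exit status) crosses back to the caller. The names verify_in_helper() and check_secret() are hypothetical, and check_secret() is a plain string comparison standing in for the crypt()-based test quoted above. A real helper would presumably be a separate privileged binary started with exec, which this sketch does not attempt.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the real check. In the scenario discussed in
 * this thread it would look up the password entry and compare
 * crypt(secret, pw->pw_passwd) against pw->pw_passwd.
 */
static int check_secret(const char *user, const char *secret)
{
    return strcmp(user, "demo") == 0 && strcmp(secret, "letmein") == 0;
}

/*
 * Run the check in a child process. Returns 1 on match, 0 on mismatch,
 * -1 if the helper could not be run. Nothing the child reads or computes
 * is visible to the parent except its exit status.
 */
int verify_in_helper(const char *user, const char *secret)
{
    pid_t pid;
    int status;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(check_secret(user, secret) ? 0 : 1);  /* child */

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

The isolation comes from the address-space boundary rather than from anything the library does, which is the point being made above.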
Re: ipsec 'replay' syslog error messages after reboot of one host
[EMAIL PROTECTED] (Matthew Dillon) writes:

The question is: What am I forgetting to do? Or is this a bug in our IPSEC implementation?

AFAIK this is more or less how it's supposed to work. IPsec is a mess. Security associations are not stateless: ESP provides replay protection using a sequence number. Replay prevention is, however, optional, and the setkey manual page claims it to be off by default, so it could be a bug... you might want to try specifying -r 0 explicitly.
Re: ipsec 'replay' syslog error messages after reboot of one host
IPSec isn't well documented, but once I figured out the config file it didn't seem too bad. I am guessing that replay prevention

Reading the RFCs might be more helpful than most of the KAME documentation. There's also a lot of undocumented stuff for which the sources seem to be the only source of information (e.g. how PF_KEY v2 differs from the standard).

I had to fix up /etc/rc.network a little to load the ipsec rules at the appropriate point (just after the interface and ipfw setup, but before any services (like NFS) are run). I am going to put the (relatively simple) patch for rc.network up for a quick review and then commit it along with an example file and a reference to the example file in the man page.

Fixed security associations with an infinite lifetime are certainly not the ideal way of using IPsec. Examples of setups like this should be provided with the appropriate warnings.
Re: Why this works?
[EMAIL PROTECTED] (FengYue) writes:

I have 3 small programs. The first one writes 4K of data containing 'A's into a file /tmp/pagetest and then lseek()s to the beginning of the file. The second one writes 4K of 'Z's into the same file /tmp/pagetest and then lseek()s to the beginning of the file. They both do that in a tight loop. The third program reads 4K of data from /tmp/pagetest and exits if the 4K of data contains neither all 'A's nor all 'Z's. The 3 programs run concurrently on the same machine (3.4). There is no locking in the code whatsoever, and all 3 programs use plain write() and read(). I thought the third program would exit pretty quickly, since the data in the file may contain a mix of 'A's and 'Z's, but it has been running for hours and nothing has happened. Could someone kindly explain this? I was told that this is because the page size is 4096 in the kernel, so that a read()/write() of 4K of data will not get context switched until the call is completed. Is that right?

Not quite. If FreeBSD didn't perform locking, operations affecting single filesystem blocks would probably be atomic (as long as the userland buffer is in memory). However, FreeBSD does perform locking in read(2) and write(2) for local files, so your third program should never fail and exit. Note that the system call interface does not guarantee reads or writes to be atomic; this just happens to be how it is implemented at the moment.
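For reference, the third program's check can be sketched roughly as follows. This is a reconstruction from the description above, not the poster's actual code, and the names is_torn() and check_once() are made up for illustration. A block counts as "torn" when it is neither all 'A' nor all 'Z'.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK 4096

/* Returns 1 if all len bytes of buf equal c, 0 otherwise. */
static int all_bytes_equal(const unsigned char *buf, size_t len, unsigned char c)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (buf[i] != c)
            return 0;
    return 1;
}

/* Returns 1 if buf mixes data from both writers, 0 if it is uniform. */
int is_torn(const unsigned char *buf, size_t len)
{
    return !all_bytes_equal(buf, len, 'A') && !all_bytes_equal(buf, len, 'Z');
}

/*
 * Rewind fd, read one block and classify it.
 * Returns 1 if torn, 0 if uniform, -1 on error or short read.
 */
int check_once(int fd)
{
    unsigned char buf[BLOCK];

    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    if (read(fd, buf, sizeof buf) != (ssize_t)sizeof buf)
        return -1;
    return is_torn(buf, sizeof buf);
}
```

A reader loop would call check_once() repeatedly and exit on a return value of 1. As noted above, never seeing that happen is a property of the current implementation, not something the interface promises.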
Re: [OT] Finding people with GSM phones (was Re: GPS heads up )
[EMAIL PROTECTED] (Olaf Hoyer) writes:

Well, that's reality. Sometimes the mobile telco hotlines are so overloaded, you cannot even tell them that your phone was stolen. (Talk about service - but you get what you pay for.) In Germany, there is some list where every cell phone can be entered with its IMEI number (that's like the MAC on an ethernet card). So theoretically you simply enter them and make them useless for the thief.

In Finland, somebody is apparently doing something to track down stolen phones, rather than block their use. One Saturday morning I got a call from someone at some agency (I couldn't quite make out what it was, it sounded like customs but that would seem odd) accusing me of stealing the GSM phone I was using. It turned out that he had one digit wrong (presumably of either the IMEI number or just the MSISDN). I wonder what he was trying to accomplish by calling the supposedly stolen phone. This was last month, but not on April 1... ;--)
Re: What are the best gcc optimization options for Pentium 200 MMX
[EMAIL PROTECTED] (Kris Kennaway) writes:

Can you say "gimmick"? :-) gcc often produces demonstrably broken code for optimisation levels higher than -O.

That -O is safe seems to be a persistent myth. GCC also produces broken code for -O and for no optimization in some cases, sometimes while producing working code for higher optimization levels... I wouldn't state e.g. that -O2 produces broken code any more often than -O; this may have been true for version X.Y.Z but is certainly not universally true.

I believe that the reasons the FreeBSD build uses -O are that, especially with older versions of gcc, -O2 slowed down compilation considerably for little noticeable performance improvement (as for -O3, automatic inlining is generally undesirable), and that it is always best to only have to test the system with a single set of flags.
Re: Shared /bin and /sbin
[EMAIL PROTECTED] (Warner Losh) writes:

I have a system that has one file system on it (e.g. everything is on /). I'm finding that a lot of space is wasted on the multiple static copies of libc in /sbin and /bin. I was thinking about building, for this system only, /bin and /sbin dynamic. Has anybody ever done this? What are the implications of doing this? I can't think of anything that would stop this from working, but I thought I'd run it by people here.

I've done this, and did manage to get an almost complete system into a reasonably small space. It was 2.2.x, but I wouldn't expect any special new requirements with more current versions. IIRC it didn't require much more than fixing the appropriate Makefile.incs in the source tree.
Re: UVM vs FreeBSD VM system
[EMAIL PROTECTED] (Jonas Bulow) writes:

How does the UVM system compare to the VM system in FreeBSD? Are there any benchmark tests or research results in this area?

The dissertation paper on UVM describes the differences (and is reasonably objective). It can be found on the UVM pages (http://www.ccrc.wustl.edu/pub/chuck/tech/uvm/). I'm not aware of any benchmarks, but would expect UVM to be somewhat slower doing most things (not by design, just the current implementation).
Re: vmnet (was: Linux ioctl not implemented error)
[EMAIL PROTECTED] (Vladimir N. Silyaev) writes:

are need to have steel nerves. I felt that at the time when I was porting vmware. I had too many hours of very interesting work: load the driver, launch vmware, and then look at the DDB double fault screen. Reboot the box, and then again.

I suppose that the incomplete virtualization of the x86 prevents you from running vmware on FreeBSD on vmware on Linux for debugging?
Re: journaling UFS and LFS
[EMAIL PROTECTED] (Don) writes:

and the next question: now that LFS starts to get usable in NetBSD - has anybody started to look at getting it working again in FreeBSD too (maybe matt?) or has it on the TODO list

LFS is being considered as a starting point for this project. The goal is to build an extensible file system with features such as the ability to grow and shrink partitions, ACLs, journaling etc.

There is a difference between a log-structured filesystem and a journaling filesystem...

XFS is also being considered as a feature reference.

*Very* different from LFS. (What are features? "Has files and directories"? Time-complexity? Implementation details? Buzzwords?) This seems a bit hard to believe (must check freebsd-fs to see if people are actually *seriously* considering LFS as a starting point...).
Re: X11/C++ question
[EMAIL PROTECTED] (Chuck Robey) writes:

Boy, I sure wish Java compiled and ran natively. I'd stop using C++ forever.

gcc-2.95.1 + libgcj already works, at least for simple programs. On FreeBSD 3.x programs seem to work as long as you use statically linked libraries (shared libraries cause the garbage collector to dump core). There already seems to be some awt code in libgcj; I have no idea whether it's actually functional. And the speed isn't quite comparable to what you can achieve with lower-level languages (pretty close to the equivalent C++ code with all methods virtual, heavy use of rtti and common-base-class-based containers), but probably good enough for a lot of things.
Re: --enable-haifa
[EMAIL PROTECTED] (W Gerald Hicks) writes:

I don't have a shiny new K7 yet, where I might expect the haifa build to make more of a difference than my crusty old Pentium...

Processors with out-of-order execution benefit *less* from scheduling than non-OOO superscalar processors.
Re: Multiple routes to the same destination
[EMAIL PROTECTED] (Zhihui Zhang) writes:

As said by the 4.4 BSD book (page 423), 4.4 BSD does not support multiple routes to the same destination (identical key and mask). Does the radix tree code in FreeBSD 4.0 have the same limitation? I am wondering if there is already a solution for this?

How would the routing code use multiple routes? You'd need additional rules to determine how to use them (e.g. round-robin for load balancing).

In some cases where you want something unusual, you can use different net sizes for the same net. The code selects the matching route with the smallest net, i.e. the most specific one (or at least used to - I don't know whether this is documented behavior).

Note that the destination presumably means the destination where the data being routed should end up, not the gateway it is sent to. Multiple routes referring to the same gateway are obviously supported.
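The "smallest net wins" selection can be illustrated with a small sketch. This is a linear scan over a table and deliberately not the kernel's radix tree. The struct layout, the gateway_id field and the name route_lookup() are all invented for the example, and addresses are plain 32-bit values in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

struct route {
    uint32_t net;        /* network address, host byte order */
    uint8_t  prefix;     /* prefix length in bits, 0..32 */
    int      gateway_id; /* hypothetical identifier for the next hop */
};

/* Netmask for a prefix length; a shift by 32 bits would be undefined. */
static uint32_t prefix_mask(uint8_t prefix)
{
    return prefix == 0 ? 0 : (~(uint32_t)0) << (32 - prefix);
}

/*
 * Return the matching route with the longest prefix (the smallest net),
 * or NULL if nothing matches.
 */
const struct route *route_lookup(const struct route *tbl, size_t n, uint32_t dst)
{
    const struct route *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(tbl[i].prefix);

        if ((dst & mask) != (tbl[i].net & mask))
            continue;
        if (best == NULL || tbl[i].prefix > best->prefix)
            best = &tbl[i];
    }
    return best;
}
```

With entries for 10.0.0.0/8 and 10.1.0.0/16 pointing at different gateways, traffic to 10.1.2.3 takes the /16 route while the rest of 10/8 takes the other, which is the "different net sizes for the same net" trick described above.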
Re: aio_*
[EMAIL PROTECTED] (Jayson Nordwick) writes:

While reading through (at least trying to... I wish there was some sort of kernel documentation available, the entry fee is very high) the aio_* calls, I had a few questions to clear up my understanding:

1) Do they only work on files? The only implementation I see is in the VFS layer.

AIO is not in the VFS layer. The source file containing the implementation is improperly named. It works on any file descriptor that you could do the equivalent read(2)/write(2) calls on.

2) It is my understanding that it uses an aio daemon running as a kernel thread (the aio_daemon() call kind of gives that one away). It seems as if this can be almost entirely done in user space. More important to what I am trying to do, it seems as if aio_* does not give peak latency or throughput performance, since the aio_daemon has to compete for resources along with all other processes. Should aio_* be used for applications that have high performance requirements? What does aio_* get you above having a separate thread pumping in/out data?

The implementation in FreeBSD probably isn't a particularly efficient one. It should be faster than threads, though. You'll need fewer switches between user and kernel mode and synchronization is simpler.
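For readers unfamiliar with the interface, a minimal sketch of one request through the POSIX aio_* calls might look like this. The helper names aio_read_blocking() and read_file_prefix() are invented for the example, and the code simply waits for completion, so it shows the call sequence rather than any performance benefit. On some systems the program may need to be linked with -lrt.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Queue one asynchronous read and wait for it to finish.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t aio_read_blocking(int fd, void *buf, size_t len, off_t off)
{
    struct aiocb cb;
    const struct aiocb *list[1];

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;

    if (aio_read(&cb) != 0)
        return -1;

    /* A real program would do other work here instead of waiting. */
    list[0] = &cb;
    while (aio_error(&cb) == EINPROGRESS)
        (void)aio_suspend(list, 1, NULL);

    if (aio_error(&cb) != 0)
        return -1;
    return aio_return(&cb);
}

/* Read up to len bytes from the start of path; -1 on error. */
ssize_t read_file_prefix(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = aio_read_blocking(fd, buf, len, 0);
    close(fd);
    return n;
}
```

The control block must stay valid until aio_return() has been called, which is why the sketch does not return while the request is still in progress.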
Re: Proposal: Add generic username for 3rd-party MTA's
[EMAIL PROTECTED] (Sheldon Hearn) writes:

Actually, not. The postfix and exim ports, at least, would be taught to use the new UID when it became available in STABLE. I'm pretty sure smail and others would follow suit. Remember, _we_ control the ports and can have packages install for whatever ID we please.

The transition could be much quicker if the uid was added by a port (e.g. mail/smtp-user) and the ports that wanted to use it depended on that port.
Re: Sharing file descriptors
bri...@wintelcom.net (Alfred Perlstein) writes:

1) file descriptor passing (described in Unix Network Programming Vol I)

Or just read recv(2), search for SCM_RIGHTS.

2) shared address fork (should be on http://lt.tar.com)

Or just read rfork(2), and you don't need to share the address space.

The general idea of software server redundancy seems a bit odd, though; debugging the software carefully and automatically restarting it on failures is generally a better idea.
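As a rough illustration of what recv(2) describes under SCM_RIGHTS, here is a sketch of passing one descriptor over a connected Unix-domain socket. The helper names send_fd() and recv_fd() are made up, and one dummy data byte is sent along with the control message on the assumption that a zero-length message is not reliably portable.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send fd over a connected Unix-domain socket. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;                 /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cm;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor. Returns the new fd, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS || cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number referring to the same open file, much as if it had been dup()ed across the process boundary.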
Re: locking revisited
[EMAIL PROTECTED] (Greg Lehey) writes:

All systems which do more than one thing at a time need file locking at some time or another. Since it involves cooperation between potentially unrelated processes, it's an obvious kernel function. Any "solution" requiring cooperation between processes isn't really a solution. As a result, I don't consider advisory locking to be real locking: it's just a kludge.

But strict explicit locks (I'll avoid the term "mandatory" to avoid confusion) are only better than advisory locks in some restricted cases. You haven't commented on my previous critiques here. Nobody has disputed them, other than by saying that mandatory locking does not mean locking where you have to actually call a function to apply the lock, but that doesn't appear to be what *you* mean by mandatory locking. As the most explicit critique was not Cc'd to you (I lose fields because I read mailing lists through list-news gateways), and you may have missed it in all of the noise on the list, I'll quote myself here:

The most significant advantage I see with mandatory locking over advisory locking is guaranteeing atomicity for things done by programs that do use locking. This only protects the data when programs that access the same data without locking don't need locking, which generally means that they either don't need to modify the data or that there can't be multiple instances of those other programs *and* the modifications made are themselves atomic (can't be read-modify-write, or even multiple writes if consistency is required). This is a somewhat limited set of cases. If anyone can come up with a counter-example, please present it.

Locking entire files, in addition to ranges, would seem to me to be of further benefit, as it would allow properly locking programs to fully protect against any single non-locking program which, like Greg's cat example, would presumably be run interactively and thus would require explicit stupidity to create additional races.

Note that protection is *at best* against a single program and at worst only against other cooperative programs, as with advisory locks. I can understand the aesthetic point of preferring that when you say "lock the file", other processes don't get to do things to the file, but the practical value of being able to do this is not *that* significant, since the use of any non-cooperative programs makes advisory and explicit locks equally useless, with the exception of the cases mentioned above.

FreeBSD is one of the few operating systems which doesn't have kernel-level locking. If we want to emulate other systems correctly, we *must* have advisory locking. This includes SCO UNIX, System V.4

How is FreeBSD's flock/fcntl advisory locking not kernel-level locking?

All this doesn't leave too much room for arguments about whether locking works or not: it works on all platforms except FreeBSD, and that's only because FreeBSD doesn't implement locking.

FreeBSD implements advisory locking. I would expect most people to consider it locking. It's sufficient for a lot of things. There are cases where some other kind of locking would be better, but then you may say that any voluntary locking scheme is useless.

- System V style. We need this for compatibility with System V. The choice of mandatory or advisory locking depends on the file permissions.
- Only mandatory locking. fcntl works as before, but locks are always mandatory, not advisory. I'm sure that this won't be popular, at least initially, but if you don't like it, you don't have to use it.

Perhaps both of the above, all with on/off sysctls. I would suggest implementing both range and file locking. Implementing mandatory locking as Terry Lambert defines it might also be worth considering, but it shouldn't be made easy to turn on by accident. And for all additions, things should be properly documented, since they might not guarantee what people would expect them to.
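The distinction being argued over can be seen in a small sketch using flock(2). A second open of the same file that asks for the lock is refused, while the same descriptor can still write without asking, which is exactly what makes the lock advisory. The function name advisory_lock_is_bypassable() is invented for the example, and the sketch assumes an ordinary local file system.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Returns 1 if a second, independent open of path is refused the lock
 * but can still write to the file, 0 if not, -1 on setup error.
 */
int advisory_lock_is_bypassable(const char *path)
{
    int holder, other, refused, wrote;

    holder = open(path, O_RDWR | O_CREAT, 0600);
    if (holder < 0)
        return -1;
    other = open(path, O_RDWR);
    if (other < 0) {
        close(holder);
        return -1;
    }
    if (flock(holder, LOCK_EX) != 0) {
        close(other);
        close(holder);
        return -1;
    }

    /* A cooperating program asks for the lock and is turned away. */
    refused = flock(other, LOCK_EX | LOCK_NB) != 0 && errno == EWOULDBLOCK;

    /* A non-cooperating program just writes; nothing stops it. */
    wrote = write(other, "x", 1) == 1;

    flock(holder, LOCK_UN);
    close(other);
    close(holder);
    return refused && wrote;
}
```

Under a kernel-enforced ("explicit" or mandatory) scheme the write would block or fail instead, but as argued above that only helps in a limited set of cases.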
Re: Mandatory locking?
[EMAIL PROTECTED] (Terry Lambert) writes:

I think this has been the basis of your objection so far. If so, it's a fundamental misunderstanding of "mandatory". In this context

What I was objecting to were some of the arguments made by Greg Lehey and Wes Peters, both of whom explicitly stated that opening does not block. It had nothing to do with mandatory locking beyond that (quite possibly flawed) interpretation.

By your definition of explicit, no. However, explicit locking is voluntary, just as advisory locking is voluntary, in terms of whether programs participate (or not). This pretty much means that explicit locking degrades to advisory locking, in the presence of (un)intentionally non-participatory programs.

That's basically what my objection was. The "deadlock prone" objection made by others applies more strongly to implicit locking, and is also valid. It can take quite a bit of care to ensure that there is always a maintenance path into the system, i.e. an environment sufficient for root to get in and kill any processes causing problems that can be used without blocking on locked files.
Re: Mandatory locking?
Not to jump down your throat, or anything, but you seem to be perpetuating some incorrct assumptions about both effect and proposed implementation details, and they must be stomped. 8-). I was assuming that mandatory locking, in the context of this discussion, does not mean automatic, forced exclusion on open, but rather explicit locks, applied by calls similar to those used for advisory locking, that are enforced by the kernel. The arguments presented by most people seem to rely on such an interpretation. To avoid confusion, I'll refer to the possible locking methods as advisory locking, explicit locking and implicit locking. Advisory locking lacks coherency for a NetWare, SMB, AppleTalk, or other file server running under FreeBSD as a hosted OS. It also has the problem that the hosted OS semantics, if they include mandatory locking, are not enforced against other processes, e.g. between an SMB server and an AppleTalk server running on the same machine, or beteen an SMB server and a UNIX program both needing access to the same database. Yes, if file service protocols don't provide locking (or if one of the operating systems involved doesn't provide locking) they obviously can't benefit from any locking that isn't done implicitly. Also, I believe your example is flawed. If a file is opened by a Not for explicit locking, I hope. process that requires mandatory locking, no process that does not also open the file with mandatory locking turned on can access the file. Neither can a program that requires mandatory locking semantics open the file if it is open by a process not using those same semantics. So if a process wishes to use explicit locking calls, it indicates that intent when opening the file - otherwise, the open implicitly locks the file. So multiple writers, or simultaneous readers and writers are only permitted for programs that indicate that they are going to use explicit locks on the file. This could actually make sense. 
But I don't think that is what is being suggested. Mandatory locking for things like database files is necessary, unless the underlying FS supports records (in which case, like FILES-11, it most likely supports record locking anyway, and may only decide not to support them if it seperately implements a transaction facility). You have to have mandatory locking to implement transactions... like updating the parity bits on a RAID 5 stripe. But you certainly don't want to use open/read/write/close cycles for such a purpose. This is why so many _real_ UNIX databases like to squat on their own raw disk partition. Locking entire files, in addition to ranges, would seem to me to be of further benefit, as it would allow properly locking programs to fully protect against any single non-locking program which, like Greg's cat example, would presumably be run interactively and thus would require explicit stupidity to create additional races. This is already possible, using O_EXCL. Likewise, it doesn't I think you mean O_EXLOCK. It sets an advisory lock, it does not help against programs that don't use locks. apply to device files, and can not be applied (via fcntl(2)) to any files whose vnodes indirect through other than the vfsops version of "struct fileops". It doesn't depend on the struct fileops selected, fcntl checks explicitly that f_type is DTYPE_VNODE before assuming that f_data points to a vnode. For SVR4 semantics, you can set the suid/sgid permission bits on a non-executable file. The document describing "mandatory" locking in Linux seems to indicate that setting sgid changes the behavior of locking calls to apply explicit locks rather than merely advisory ones, and that this is what is done by other operating systems as well. Actually an implementation could still use the existing (advisory) locks internally, but apply advisory locks in the kernel for the duration of operations that need them (read/write/some cases of open). 
The act of opening the file for O_RDONLY sets a read lock on the entire file, which allows multiple readers, and the act of opening the file O_RDWR sets a write lock on the file, which allows a single writer.

I'm fairly certain this is not what is being discussed. Certainly not by more than half of the participants in the discussion. ;--)

Again, there are no issues with badly behaved processes. There is no such thing as a badly behaved process.

I agree, for implicit locks there isn't.
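The point that an advisory lock "does not help against programs that don't use locks" can be shown with a small sketch. It uses Python's `fcntl.flock`, which takes the same kind of whole-file advisory lock that O_EXLOCK requests at open time on BSD. The helper name `advisory_lock_demo` is made up for illustration, and the sketch assumes a local filesystem with working flock(2) semantics.

```python
import fcntl
import os
import tempfile


def advisory_lock_demo(path):
    """Return (second_lock_refused, unlocked_write_succeeded) for one file."""
    holder = os.open(path, os.O_RDWR)  # the well-behaved, locking opener
    other = os.open(path, os.O_RDWR)   # a second, independent open of the file
    try:
        fcntl.flock(holder, fcntl.LOCK_EX)  # advisory exclusive lock
        try:
            # A cooperating program asks for the lock and is turned away.
            fcntl.flock(other, fcntl.LOCK_EX | fcntl.LOCK_NB)
            refused = False
        except OSError:
            refused = True
        # A non-locking program never asks, so nothing stops its write.
        written = os.write(other, b"x")
        return refused, written == 1
    finally:
        os.close(other)
        os.close(holder)


with tempfile.NamedTemporaryFile() as tmp:
    print(advisory_lock_demo(tmp.name))
```

Under explicit (kernel-enforced) locking as discussed above, the final write would presumably block or fail instead of succeeding.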
Re: Mandatory locking?
[EMAIL PROTECTED] (Wes Peters) writes: And how many programmers with nearly (or more than) two decades of UNIX experience it takes to convince someone it really is useful. It should only take one, as long as the arguments made are not bogus. IMHO Greg made some very silly arguments (or at least used some very stupid examples) for mandatory locking and never answered my points regarding them. (The arguments of some of the ones opposing mandatory locking have been equally silly.) I *do* agree that mandatory locking *can* be useful, but the usefulness is not nearly as broad as some people seem to be implying, and advisory locking is not as useless as some claim. The most significant advantage I see with mandatory locking over advisory locking is guaranteeing atomicity for things done by programs that do use locking. This only protects the data when programs that access the same data without locking don't need locking, which generally means that they either don't need to modify the data or that there can't be multiple instances of those other programs *and* the modifications made are themselves atomic (can't be read-modify-write, or even multiple writes if consistency is required). This is a somewhat limited set of cases. If anyone can come up with a counter-example, please present it. Locking entire files, in addition to ranges, would seem to me to be of further benefit, as it would allow properly locking programs to fully protect against any single non-locking program which, like Greg's cat example, would presumably be run interactively and thus would require explicit stupidity to create additional races. Locking entire files is also the only way to ensure that non-locking programs can even see the file in a consistent state. As a special case, mandatory locking could also be useful in ensuring long-term exclusive access to some set of data, but this seems like something that should be done using file permissions. 
Re: Mandatory locking?
[EMAIL PROTECTED] (Chuck Robey) writes:

On 23 Aug 1999, Ville-Pertti Keinonen wrote:

And even without otherwise incorrect behavior, if you have a program that doesn't use any locking and another one that uses mandatory locking to prevent races with the non-locking program, the mere existence of the locking program does not prevent multiple non-locking programs from generating similar conditions.

That's very odd, I thought the idea behind mandatory locking was to completely eliminate the possibility that a program could do what you're saying; all programs would *mandatorily* be forced to do locking to access the resource.

I don't know what the textbook definition for mandatory locking is, but was assuming (particularly considering the proposal to use a fcntl interface) that by mandatory locking, Greg was referring to a "harder" lock than current advisory locking, one that had to be instantiated explicitly but would not only lock out other attempts to lock, but all other attempts to access the file.

The further messages in this thread seem to indicate that different individuals have different definitions for mandatory locking... I'd still assume that marking a file to be accessible by only one process at a time is *not* what is being discussed. Particularly since it is not even clear what this would mean for forked processes, dup, sending file descriptors over local sockets etc.

Note that my arguments earlier don't apply in a case where you might want to e.g. ensure consistency for non-locking programs with read-only access, with the only program with privileges to modify the data making the data inaccessible during updates. This is a scenario where it would, IMHO, actually be quite useful to have mandatory locking.

In any case, if shared (open) access is allowed, such a feature can introduce semantic changes to read/write system calls - normally, read/write can never return EAGAIN or block for unlimited amounts of time on regular, local files. EAGAIN is not that much of a problem, as it requires explicitly setting O_NONBLOCK, but blocking can introduce new deadlocks.
Re: Mandatory locking?
[EMAIL PROTECTED] (John-Mark Gurney) writes:

Ville-Pertti Keinonen scribbled this message on Aug 24:

cat writes part of oldmail to /var/mail/grog
sendmail locks /var/mail/grog
(cat may try to write more to /var/mail/grog but blocks)
sendmail delivers new mail
sendmail unlocks /var/mail/grog
cat writes the rest of oldmail to /var/mail/grog

You'll still probably end up with a broken mailbox.

what you do is this:

lockf -k $mailfile cat ${mailtmp} >> $mailfile

Which doesn't support Greg's arguments for mandatory locking, as you're now doing locking in both programs.
Re: Mandatory locking?
g...@lemis.com (Greg Lehey) writes:

an agreement of some kind. But what if I want to merge the contents of another mail folder:

cat oldmail >> /var/mail/grog

That works, but it's playing with fire: if sendmail is delivering a message at the same time, it won't see me, and my cat doesn't get a lock beforehand, so both an incoming message and part of my mail folder could end up getting written to the same location. With mandatory locking, it would work, transparently.

Certainly not with range-locking rather than file-locking. cat is certainly not guaranteed to be atomic, and while you shouldn't end up writing things in the same location, what might happen unless you are preventing multiple openers is:

cat writes part of oldmail to /var/mail/grog
sendmail locks /var/mail/grog
(cat may try to write more to /var/mail/grog but blocks)
sendmail delivers new mail
sendmail unlocks /var/mail/grog
cat writes the rest of oldmail to /var/mail/grog

You'll still probably end up with a broken mailbox.
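The interleaving listed above can be replayed deterministically. The following is only a toy model, not real file I/O: the mailbox is a Python list, `merge_mailbox` is a hypothetical helper, and the assumption is that sendmail becomes ready to deliver right after cat's first write. With range locking only, the new message lands in the middle of the old folder; if the whole file is held for the duration of the merge, it lands at the end.

```python
def merge_mailbox(old_chunks, new_mail, whole_file_exclusion):
    """Model cat appending old_chunks while sendmail delivers new_mail.

    old_chunks must be non-empty; each chunk stands for one write(2) by cat.
    """
    mailbox = []
    pending = list(old_chunks)
    mailbox.append(pending.pop(0))  # cat writes part of oldmail
    if whole_file_exclusion:
        # sendmail cannot get in until cat has finished with the file
        mailbox.extend(pending)
        pending = []
    mailbox.append(new_mail)        # sendmail locks, delivers, unlocks
    mailbox.extend(pending)         # cat writes the rest of oldmail
    return mailbox


print(merge_mailbox(["old-part-1", "old-part-2"], "new-mail", False))
print(merge_mailbox(["old-part-1", "old-part-2"], "new-mail", True))
```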
Re: anybody love qsort.c?
[EMAIL PROTECTED] (John-Mark Gurney) writes: Christopher Seiwald scribbled this message on Aug 18: It's a pretty straightforward change to bypass the insertion sort for large subsets of the data. If no one has a strong love for qsort, I'll educate myself on how to make and contribute this change. why don't you implement this w/ the 5 element median selection qsort algorithm? my professor for cis413 talked about this algorithm and that it really is the fastest qsort algorithm... I don't have any pointers to a paper on this... but I might be able to dig some info up on it if you are interested... I don't think the point is eliminating worst-cases, but optimizing common cases, which in this case caused more worst-cases and thus needs fixing. Besides, the median selection chooses among more than 3 elements already (but only when the data set is large enough). For fixing worst cases, an introspective sort might be a good idea, i.e. do a quick sort but fall back to heap sort if a certain depth is exceeded (you know you're losing when the depth exceeds log n). This also has another advantage - if you limit the depth of the sort, you don't need to use the cpu stack for state, you can allocate a fixed-size array for the purpose. This probably isn't a real performance advantage for a C qsort implementation because of the overhead of calling cmp. It does, however, guarantee that sorting uses a reasonable amount of stack. Such an assumption isn't portable when using qsort(3), though. Expect to die if you do large qsorts from threads with small thread stacks. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
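A minimal sketch of the introspective sort idea described above, written in Python for brevity rather than as a patch to qsort.c. The cutoff of `2 * log2(n) + 1` is an assumed constant (the message only says "log n"), the middle-element pivot is a placeholder for the real median selection, and the name `introsort` and its `max_depth` parameter are illustrative. The explicit stack never grows beyond a small multiple of the depth limit, which is what would let a C version use a fixed-size array instead of recursion.

```python
import heapq
import math


def introsort(items, max_depth=None):
    """Sort a list in place: quicksort first, heap sort once too deep."""
    n = len(items)
    if n < 2:
        return items
    if max_depth is None:
        max_depth = 2 * int(math.log2(n)) + 1  # assumed cutoff
    stack = [(0, n - 1, 0)]  # explicit (low, high, depth) entries
    while stack:
        lo, hi, depth = stack.pop()
        if lo >= hi:
            continue
        if depth > max_depth:
            # Quicksort is losing on this range: heap sort it instead.
            chunk = items[lo:hi + 1]
            heapq.heapify(chunk)
            items[lo:hi + 1] = [heapq.heappop(chunk) for _ in range(len(chunk))]
            continue
        pivot = items[(lo + hi) // 2]  # placeholder pivot choice
        i, j = lo, hi
        while i <= j:  # Hoare-style partition around the pivot value
            while items[i] < pivot:
                i += 1
            while items[j] > pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        stack.append((lo, j, depth + 1))
        stack.append((i, hi, depth + 1))
    return items


print(introsort([5, 2, 9, 1, 5, 6]))
```

Passing `max_depth=0` forces the heap sort path after a single partition, which is a cheap way to exercise the fallback.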
Re: Mandatory locking?
[EMAIL PROTECTED] (Greg Lehey) writes: Again, if we have two concurrent transactions, we stand to gain money: the updated balance is likely not to know about the other transaction, and will thus "forget" one of the deductions. Now I suppose you're going to come and say that this is bad programming, and advisory locking would do the job if the software is written right. Correct. You could also use the same argument to say that memory protection isn't necessary, because a correctly written program doesn't overwrite other processes address space. It's the The difference is that if a program has privileges to screw up whatever you are protecting, it can do so even if you do have mandatory locking, simply by functioning incorrectly when it does gain access to the data. And even without otherwise incorrect behavior, if you have a program that doesn't use any locking and another one that uses mandatory locking to prevent races with the non-locking program, the mere existence of the locking program does not prevent multiple non-locking programs from generating similar conditions. (I'm not opposed to mandatory locking in principle, but I don't find your reasoning very convincing.) To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
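The "forgotten deduction" in the quoted bank example is a plain lost update. A toy replay, with no real files or locks involved: `run_schedule` is a hypothetical helper that either serialises the read-modify-write transactions (what any working lock, advisory or mandatory, achieves) or lets every transaction read the balance before any of them writes it back.

```python
def run_schedule(balance, deductions, locked):
    """Apply each deduction as a read-modify-write transaction."""
    if locked:
        # Serialised: each transaction reads the result of the previous one.
        for amount in deductions:
            seen = balance
            balance = seen - amount
        return balance
    # Racy: all transactions read the same starting balance first...
    snapshots = [balance for _ in deductions]
    # ...then write back in turn, each overwriting the previous result.
    for seen, amount in zip(snapshots, deductions):
        balance = seen - amount
    return balance


print(run_schedule(100, [30, 20], locked=True))
print(run_schedule(100, [30, 20], locked=False))
```

In the racy schedule only the last deduction survives, which is the case where "we stand to gain money".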
Re: BSD-XFS Update
[EMAIL PROTECTED] (Alton, Matthew) writes: I am currently researching methods for implementing the 64-bit syscalls stat64(), fstat64(), lseek64() etc. delineated in the SGI design doc _64 Bit File Access_ by Adam Sweeney. Do the design docs indicate how inode numbers should interact with userland APIs? IIRC, inode numbers are 64-bit numbers in XFS. Since ino_t, st_ino of struct stat and d_fileno of struct dirent are only 32 bits, inode numbers may be truncated and not appear unique to userland. This would break the assumptions of some code (e.g. getcwd(3), when not using the kernel extension). To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
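The truncation concern can be illustrated with two made-up inode numbers. The sketch assumes truncation simply keeps the low 32 bits; the values and the helper `truncate_ino` are hypothetical and not taken from XFS.

```python
def truncate_ino(ino64):
    """Keep only the low 32 bits, as a 32-bit ino_t would."""
    return ino64 & 0xFFFFFFFF


first = 0x0000000100000002   # two distinct 64-bit inode numbers (made up)
second = 0x0000000200000002

print(first != second)                               # distinct on disk
print(truncate_ino(first) == truncate_ino(second))   # identical in userland
```

Code that compares st_ino with d_fileno to identify a directory entry, as getcwd(3) does when walking up the tree, could then match the wrong entry.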
Re: libcompat proposition
ch...@calldei.com (Chris Costello) writes: I'm in favor of a libgnucompat rather than gnu functions in libcompat. And how would a libgnucompat be different from libiberty? Except of course that it would be maintained by the FreeBSD folks... Or that it would be maintained at all. ;--) To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: BSD voice synthesis
[EMAIL PROTECTED] (Dag-Erling Smorgrav) writes: Ville-Pertti Keinonen [EMAIL PROTECTED] writes: I certainly don't expect any of the available voices to be able to pronounce Finnish names correctly, even with phonetic specifications. If the software were *designed* to speak Finnish, I'd expect it to cope with Finnish much better than it currently does with English, seeing as you guys have nearly phonetic spelling. Festival is basically language-independent, each voice is associated with a specific language, so with a Finnish voice it should be able to pronounce Finnish reasonably. Since the English voices have dictionaries for pronunciation, anyhow, a Finnish voice wouldn't necessarily do a better job in terms of pronunciation, although a Finnish voice should require fewer distinct phonemes. Creating voices does seem to involve quite a bit of work, though. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: BSD voice synthesis
w...@softweyr.com (Wes Peters) writes: available for home computers decades ago. (Anyone else here ever use SAM the Software Automated Mouth for the Atari 800 or Commodore 64?) Yes. It's almost surprising how little speech synthesis has improved, at least judging from the festival demos (it is, of course, better than SAM, but apparently the data and processing requirements are several orders of magnitude greater). I haven't downloaded all of the required stuff, yet, so I don't know how good or bad it actually might be. I certainly don't expect any of the available voices to be able to pronounce Finnish names correctly, even with phonetic specifications. To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: Documenting writev(2) ENOBUFS error
: [ENOBUFS] Insufficient system buffer space exists to complete the operation.
:
:Do you know what kind of circumstances that error *really* occurs
:under?

So you can get ENOBUFS not related to mbufs for UDP/local datagram sockets, but you should never get ENOBUFS from write for TCP sockets or local stream sockets.

So, do you want to enumerate the cases in which this error can occur in the man page? This is not generally done, now that we have verified it is possible for the system to generate ENOBUFS on a writev. I think the text stands as it is.

It should probably mention that it doesn't occur for most files (or that it only occurs for datagram sockets - although it probably applies to some types of raw sockets, too, and possibly non-PF_INET/PF_UNIX sockets) to avoid people doing unnecessary checking or implementing kernel code that bails out when it shouldn't.

It should be a fair requirement that the kernel continue to never fail with ENOBUFS for a write to a reliable stream (local file, fifo, pipe or stream socket) and that such cases be treated as bugs. I would assume that this corresponds to how other systems operate, as well. Of course I'm no authority on this, and I'm not sure about NFS writes.
Re: Documenting writev(2) ENOBUFS error
:[EMAIL PROTECTED] (Wes Peters) writes:
:
: [ENOBUFS] Insufficient system buffer space exists to complete the operation.
:
:Do you know what kind of circumstances that error *really* occurs
:under?
:
:If it happened with files, that would be a bug and should be fixed.
:The call is supposed to block to wait for writes to be possible. This

I am almost certain that this error can only occur when writing to sockets, and only then if the network mbuf pool is completely exhausted. UDP is probably the most vulnerable.

It looks to me like it can't happen to stream sockets using write/writev. As far as I can tell, the ENOBUFS error can occur internally for sends if:

- There is a shortage of mbufs at a low level (at higher levels, code either blocks or panics)
- A network interface has a lot of packets queued (this is done at an IP level)
- The socket buffer of a local datagram socket is full (the receiving socket, not the one the send occurred on)

The TCP layer doesn't let ENOBUFS from low-level calls get through, but returns success. A TCP socket is prepared to resend the data at a higher level, anyhow, so the data is not lost and an error doesn't need to be returned. OOB data or implicit connections can return ENOBUFS for TCP sends, but they are activated by parameters only available through the send/sendto API.

So you can get ENOBUFS not related to mbufs for UDP/local datagram sockets, but you should never get ENOBUFS from write for TCP sockets or local stream sockets.
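For the datagram case, where ENOBUFS is a transient condition rather than a bug, a caller might retry a few times before giving up. This is a sketch under assumptions, not a recommendation from the thread: the helper name `send_datagram`, the retry count and the delay are all made up, and `sock` is anything with a `send` method, such as a connected UDP or local datagram socket.

```python
import errno
import time


def send_datagram(sock, payload, retries=3, delay=0.01):
    """Send one datagram, retrying only when the kernel reports ENOBUFS."""
    for attempt in range(retries + 1):
        try:
            return sock.send(payload)
        except OSError as exc:
            # Any other error, or ENOBUFS on the last attempt, is passed on.
            if exc.errno != errno.ENOBUFS or attempt == retries:
                raise
            time.sleep(delay)  # give the queue or receiver time to drain
```

By the reasoning above, the same check around write/writev on a file, pipe or stream socket would be dead code.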
Re: Documenting writev(2) ENOBUFS error
w...@softweyr.com (Wes Peters) writes:
[ENOBUFS] Insufficient system buffer space exists to complete the operation.
Do you know what kind of circumstances that error *really* occurs under?
If it happened with files, that would be a bug and should be fixed. The call is supposed to block to wait for writes to be possible. This applies to stream sockets in most cases, as well. Based on a quick look at the code, out-of-band TCP data seems to be the only case where ENOBUFS might be returned for streams, and that obviously doesn't apply to write/writev.
As I mentioned to Nik in private mail, for datagram sockets, the description in send(2) more or less applies. Programs should generally use send rather than write for such objects, anyhow.
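For the datagram case, a program that uses send can treat ENOBUFS as a transient condition. The following is only a minimal sketch of that idea: the helper name send_retry_enobufs, the retry count and the 1 ms back-off are assumptions made for illustration, not anything prescribed by send(2).

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: send one datagram on a connected socket and retry
 * a limited number of times when the kernel reports ENOBUFS.
 * max_tries should be at least 1.
 * Returns the number of bytes sent, or -1 with errno set.
 */
ssize_t
send_retry_enobufs(int s, const void *buf, size_t len, int max_tries)
{
	int i;

	for (i = 0; i < max_tries; i++) {
		ssize_t n = send(s, buf, len, 0);

		if (n >= 0)
			return n;
		if (errno != ENOBUFS)
			return -1;	/* some other error, don't retry */
		/* Arbitrary back-off: sleep about 1 ms before retrying. */
		(void)poll(NULL, 0, 1);
	}
	errno = ENOBUFS;		/* every attempt hit ENOBUFS */
	return -1;
}
```

Whether retrying is appropriate depends on the protocol; for many datagram applications simply dropping the packet is just as reasonable.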
Re: speed of file(1)
[EMAIL PROTECTED] (Peter Jeremy) writes:
"Leif Neland" [EMAIL PROTECTED] wrote:
My 60MHz Pentium, FreeBSD
time file /usr/home/leif/vnc-3.3.2r
/usr/home/leif/vnc-3.3.2r3_unixsrc.tgz: gzip compressed data, deflated, original filename, last modified: Thu Jan 21 19:23:21 1999
real 0m1.237s
user 0m0.758s
sys 0m0.394s
I can't believe these figures.
Hmm, a 200 MHz Pentium (MMX), 3.2-RELEASE, everything in cache:
$ /usr/bin/time file twofish.tar.gz
twofish.tar.gz: gzip compressed data, deflated, last modified: Mon Jun 15 02:40:53 1998, os: Unix
0.35 real 0.24 user 0.10 sys
I'd say that, considering that things are cached (CPU-bound), it's very accurately proportional to Leif's time. Variances can be accounted for by the slight implementation differences (the MMX version has a bigger L1 and better branch prediction).
It's also reasonably proportional to a 400 MHz PII (0.09/0.08/0.01 running 3.2 -- 0.06/0.04/0.01 running 2.2.8, BTW). Considering the completely different core, this is also quite close to what you might expect.
I can't reproduce the complaint using a 64MB PII-266 running -CURRENT - there's no evidence of lack of speed, and profiling file(1) doesn't show any anomalies.
What are your results, then?
Re: Overcommit and calloc()
[EMAIL PROTECTED] (Dag-Erling Smorgrav) writes:
"Kelly Yancey" [EMAIL PROTECTED] writes:
Ahh...but wouldn't the bzero() touch all of the memory just allocated functionally making it non-overcommit?
No. If it were a "non-overcommitting malloc", it would return NULL and set errno to ENOMEM, instead of dumping core.
It won't dump core. If it isn't the biggest process, it'll simply succeed, but somebody else is killed. If it's the biggest process, it'll die with SIGKILL without dumping core. There *are* systems that kill "random" processes when swap runs out, presumably when they need to actually get pages that aren't available. FreeBSD is not one of them.
Overcommit still has nothing to do with malloc. Either the *system* is overcommitted or it isn't - per-process overcommitment is irrelevant, as is the way memory has become overcommitted.
Re: Replacement for grep(1) (part 2)
[EMAIL PROTECTED] (Chris G. Demetriou) writes:
Matthew Dillon [EMAIL PROTECTED] writes:
The text size of a program is irrelevant, because swap is never allocated for it. The data and BSS are only relevant when they
No, you can mprotect read-only vnode mappings to writable. Most things wouldn't be hurt badly if this changed, though; I suspect that this already varies between operating systems.
are modified. The only thing swap is ever used for is the dynamic allocation of memory. There are three ways to do it: sbrk(), mmap(... MAP_ANON), or mmap(... MAP_PRIVATE).
yup, almost: not all MAP_PRIVATE mappings need backing store, only MAP_PRIVATE and writeable mappings. (MAP_PRIVATE does _not_ guarantee that you won't see modifications made via other MAP_SHARED mappings.)
...but in *this* case, you certainly shouldn't allow mprotect to fail (with what, ENOMEM?). It's certainly counterintuitive to me that mprotect could fail due to a resource shortage.
Actually, only now have you brought that up. And, that's very system dependent. On NetBSD/i386 the default is 2MB, and, it's worth noting that you only need to reserve as much as the current stack limit allows (after that, you're going to get a signal anyway, and if more
So what setrlimit accepts depends on how much memory is available? Ok, programs changing their stack limit are rare, but this would still be another API change.
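The mprotect point can be tried out from userland. The sketch below is an illustration under assumptions (the function name and the temporary file under /tmp are made up): it maps a file read-only and MAP_PRIVATE, upgrades the mapping to writable with mprotect, and checks that the write lands in a private copy rather than in the file.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map a file read-only and private, upgrade the mapping to writable with
 * mprotect() and modify it.  Returns 0 if the write went to a private
 * copy (the file still holds its original contents), -1 on any failure.
 */
int
private_mapping_upgrade_demo(void)
{
	char path[] = "/tmp/mapdemoXXXXXX";
	char check[4];
	size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
	char *p;
	int fd, ret = -1;

	if ((fd = mkstemp(path)) == -1)
		return -1;
	unlink(path);			/* the file lives on through fd */
	if (write(fd, "abcd", 4) != 4)
		goto out;

	p = mmap(NULL, pagesize, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		goto out;

	/*
	 * A system that refuses to overcommit would have to reserve
	 * backing store at this point, so this is where a failure
	 * would have to be reported.
	 */
	if (mprotect(p, pagesize, PROT_READ | PROT_WRITE) == 0) {
		p[0] = 'X';		/* copy-on-write, private to us */
		if (pread(fd, check, 4, 0) == 4 &&
		    memcmp(check, "abcd", 4) == 0 && p[0] == 'X')
			ret = 0;
	}
	munmap(p, pagesize);
out:
	close(fd);
	return ret;
}
```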
Re: Replacement for grep(1) (part 2)
jul...@whistle.com (Julian Elischer) writes:
If you wanted to fix this, you could add a patch to malloc that touched every page that it handed to the application. (and trapped sig11s)
How would you expect that to work?
Several misunderstandings seem to be common regarding this issue (most not directed at you):
 - malloc almost never fails with NULL. This is not true; if resource limits are set properly, any one program using huge amounts of memory is going to hit them long before swap space is exhausted.
 - The program currently trying to get the page is the one that is killed.
 - Actually paging in all memory is going to protect a program from getting killed. This is going to make it *more likely* for it to be killed.
 - Not overcommitting doesn't consume huge amounts of reserve space unless programs do something special. A rough estimate of the required reserve can be computed by summing up all of the process VSZs plus your stack limit times the number of processes. How many of you would be willing to configure that much swap space?
If you really wanted to run without overcommit, you'd only run statically linked binaries and set your stack limits to small values. This could be desirable for some (but not general-purpose) systems; an option for doing this wouldn't be entirely bogus.
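The first point is easy to check. The sketch below is a hedged illustration, not a statement about any particular malloc implementation: it lowers a resource limit in a child process and reports whether malloc returned NULL. The function name is made up, and RLIMIT_AS is an assumption chosen because current systems provide it; on a system of that era RLIMIT_DATA would be the limit to set.

```c
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * In a child process, lower the address-space limit to limit_bytes and
 * try to malloc() request_bytes.  Returns 1 if malloc() returned NULL,
 * 0 if it succeeded, -1 if the experiment itself could not be run.
 */
int
malloc_fails_under_limit(size_t limit_bytes, size_t request_bytes)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0) {
		struct rlimit rl;
		void *volatile p;	/* volatile: keep the call from being optimized away */

		rl.rlim_cur = rl.rlim_max = (rlim_t)limit_bytes;
		if (setrlimit(RLIMIT_AS, &rl) == -1)
			_exit(2);
		p = malloc(request_bytes);
		_exit(p == NULL ? 1 : 0);
	}
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	switch (WEXITSTATUS(status)) {
	case 0:
		return 0;
	case 1:
		return 1;
	default:
		return -1;
	}
}
```

Running the attempt in a child keeps the lowered limit from affecting the calling process.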
Re: a BSD identd
gr...@freebsd.org (Brian F. Feldman) writes:
It's out with the bad, in with the good. Pidentd code is pretty terrible. The only security concerns with my code were wrt FAKEID, and those were mostly fixed (mostly meaning that a symlink _may_ be opened, but it won't be read.) If anyone wants to audit my code for security, I invite them to.
Did you mean to avoid reading through symlinks using the open + fstat method mentioned earlier in the thread? I thought I'd misunderstood, that you had to be discussing something else, since you and whoever else was involved both agreed that open + fstat is sufficient, and I thought that several people can't possibly be so completely confused.
If you really want to avoid reading through symlinks, you need to lstat, open and fstat (the order doesn't really matter).
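A minimal sketch of the lstat + open + fstat sequence follows. The function name is hypothetical, and this version only protects the final path component; symlinks in the leading directories are still followed. The lstat result says whether the name is a symlink, and comparing device and inode numbers with the fstat result confirms that the opened file is the one lstat looked at.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open path for reading, but refuse to read through a symlink.
 * Returns a file descriptor, or -1 if the final component is a symlink,
 * was replaced between the checks, or cannot be opened.
 */
int
open_not_through_symlink(const char *path)
{
	struct stat lsb, fsb;
	int fd;

	/* lstat() describes the name itself, not what it points to. */
	if (lstat(path, &lsb) == -1 || S_ISLNK(lsb.st_mode))
		return -1;
	if ((fd = open(path, O_RDONLY)) == -1)
		return -1;
	/* fstat() describes what was actually opened; it must match. */
	if (fstat(fd, &fsb) == -1 ||
	    fsb.st_dev != lsb.st_dev || fsb.st_ino != lsb.st_ino) {
		close(fd);
		return -1;
	}
	return fd;
}
```

With open + fstat alone there is nothing to compare against, since fstat always describes the target of the link.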
Re: Wrong comment in VM code?
[EMAIL PROTECTED] (Zhihui Zhang) writes:
At the beginning of the file vm_object.c, we have the following comment:
The only items within the object structure which are modified after time of creation are:
reference count locked by object's lock
pager routine locked by object's lock
But at the end of vnode_pager_setsize(), we modify the size field. So at least three items can be modified after creation. Am I right?
The comment is wrong (it's probably supposed to mean something other than it seems to); the only field in a vm_object that *isn't* modified after creation is 'id'.
The comment is also wrong in that there are no vm_object locks in FreeBSD (they've been ripped out).
Re: Rewriting pca(4) using finetimer(9) (was: Re: MPU401 now works under New Midi Driver Framework with a Fine Timer)
p...@critter.freebsd.dk (Poul-Henning Kamp) writes:
Somebody should study the abilities of the on-cpu APIC for this for pentium ff. machines.
The local APIC would work very nicely, but I'm not sure that you can enable it reliably in a non-SMP configuration. AFAIK most BIOSes don't provide an MP config at all unless you have multiple CPUs present. If you don't have an MP config, you can't set up the redirection tables. And if you have a non-SMP chipset, you can't route interrupts at all, since you won't have an APIC bus on your motherboard or an I/O APIC for the real interrupts.
It's been a while since I looked at the documentation, but it *might* be possible that the local APIC timers would work without using APIC interrupt routing. IIRC the timers are simply programmed with the IDT vector number to generate as an interrupt.
Re: Rewriting pca(4) using finetimer(9) (was: Re: MPU401 now works under New Midi Driver Framework with a Fine Timer)
p...@critter.freebsd.dk (Poul-Henning Kamp) writes:
But shouldn't you still be able to use the timer in the local apic ?
Did you read the last paragraph in my message? Here it is again:
It's been a while since I looked at the documentation, but it *might* be possible that the local APIC timers would work without using APIC interrupt routing. IIRC the timers are simply programmed with the IDT vector number to generate as an interrupt.
I haven't tried it, so I don't know what would happen. If someone else knows (or has a chance to try it soon), please comment. Even if it works, using the feature would probably have to rely on undocumented behavior.
Re: Bursting at the seams (was: Heh heh, humorous lockup)
[EMAIL PROTECTED] (Patryk Zadarnowski) writes:
You can't extend the address space that way, segments are all parts of the single 4GB address space described by the page mapping.
True, but you can reserve a part of the 4GB address space (say 128MB of it) for partitioning into tiny (say 8MB) address spaces (which are still flat, just small), for use by small processes, the idea being that all those small processes will then share a single page table without compromising on memory protection (the GDT is fully under the OS's control anyway), or the simplicity of a flat address space (virtual addresses still start at 0 and continue till the top of address space; the scheme is totally transparent.)
Yeah, I know, I've read Liedtke's original paper where he described the optimization in L3. That's fine for that specific purpose, but that wasn't what the thread was about. Unless I totally missed the point.
Re: Bursting at the seams (was: Heh heh, humorous lockup)
jul...@whistle.com (Julian Elischer) writes:
we already use the gs register for SMP now.. what about the fs register? I vaguely remember that the different segments could be used to achieve this (%fs points to user space or something)
You can't extend the address space that way; segments are all parts of the single 4GB address space described by the page mapping.
Re: Heh heh, humorous lockup
dil...@apollo.backplane.com (Matthew Dillon) writes:
pare down the fields in both structures. For example, the vnode structure contains a lot of temporary clustering fields that could be removed entirely if clustering operations are done at the time of the actual I/O rather than beforehand ( which leads to other problems related to low-memory deadlocks :-(... but assuming that could be fixed... ).
Actually the vnode structure can be reduced in size quite a bit without affecting behavior. I analyzed this in a private mail to phk a few months ago; I can get the list of necessary changes out again if anyone is actually interested.
The idea was to reduce the size to 128 bytes (on i386) so that the kernel malloc would do a reasonable job allocating the vnodes without too much overhead. IIRC it was very close.
I had written code that allocated and deallocated vnodes dynamically (see http://www.hut.fi/~will/freebsd_vnfree0.diff for a non-malloc version with parameters adjusted to exercise the behavior quite heavily). It didn't seem like a very useful feature, though, because of fragmentation (even with the 'optimizing' zone allocator in the patch). Even if the kernel malloc would be usable, the only other common object that would typically use that memory would be ffs inodes, which are allocated and deallocated along with vnodes...
This reminds me - the small patch I submitted to fix the v_id references still hasn't been committed (phk, if you're reading this, is there any specific reason for this?).
Re: Overwrite an executable file that is running
[EMAIL PROTECTED] (Zhihui Zhang) writes:
For a big executable file that is being run by the OS, all its contents may not be loaded into the memory. At the same time, the developer gets impatient and wants to create a new version of the same file. He could modify the makefile to output the new version to a different file name, but this is tedious. This new version should not overwrite the older version of the file being run. My question is how FreeBSD prevents this from happening? Can anyone point out for me where in the source code this
It is prevented by not allowing it. A file cannot simultaneously be executing and opened for writing.
To find the relevant bits in the sources, try:
grep ETXTBSY /sys/kern/*
TCP input processing bug
I think I've located a problem in TCP input processing...and it has been there for quite a while. It breaks half-open connection discovery for many cases since version 1.15 of netinet/tcp_input.c (committed by Garrett Wollman, which is why this is Cc'd to him), although that isn't where the (presumably) incorrect behavior was introduced.
The half-open connection discovery problem can be reproduced easily. The conditions required are:
 - Machine A thinks it has an established connection with machine B
 - Machine B disagrees (it has crashed, the network has been down, it has recently been assigned the IP of another machine that disconnected uncleanly, etc.; there are a lot of conditions that can cause this)
 - Machine B tries to connect to machine A using the same source port number as the half-open connection
 - Machine B selects a sequence number below the current window expected by machine A
Machine B sends a SYN, but gets nothing as a reply (it should be getting an ACK), no matter how many times it tries. Machine A will keep the connection in an established state until it tries to send data (depending on the application, this may never happen) or the connection is timed out by keepalives.
This is particularly nasty if the boot procedure of machine B establishes a TCP connection to machine A - after a crash, it'll always try to use the same port number and never succeed.
Basically, in the tcp_input function, just before ACK processing, when 'goto drop' is done if ACK isn't set, TF_ACKNOW might be set in tp->t_flags, but the ACK is never sent because tcp_output is never called. This can be fixed by checking for TF_ACKNOW in the drop: case and calling tcp_output if it is set. However, such a modification can change the behavior of a considerable number of cases, so I think it needs careful verification.
Anyone who knows the TCP code, please comment!
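The control flow described above can be modelled in a few lines. This is a toy userland model, not kernel code and not a patch: the toy_ names are invented, and only TF_ACKNOW, tcp_output and the drop path correspond to things named in the description (the flag value here is arbitrary).

```c
#include <stdbool.h>

#define TF_ACKNOW 0x0001	/* same name as the kernel flag; value arbitrary */

struct toy_tcpcb {
	int  t_flags;
	bool ack_sent;
};

/* Stand-in for tcp_output(): sends the pending ACK and clears the flag. */
static void
toy_tcp_output(struct toy_tcpcb *tp)
{
	if (tp->t_flags & TF_ACKNOW) {
		tp->ack_sent = true;
		tp->t_flags &= ~TF_ACKNOW;
	}
}

/*
 * Model of the end of input processing for a segment without ACK set,
 * such as the out-of-window SYN in the scenario above.  Earlier
 * processing has already requested an immediate ACK (TF_ACKNOW); the
 * segment is then dropped.  With with_fix == false the drop path returns
 * without calling output; with with_fix == true it checks TF_ACKNOW
 * first.  Returns true if an ACK went out.
 */
bool
toy_input_drop(bool with_fix)
{
	struct toy_tcpcb tcb = { TF_ACKNOW, false };

	/* ... 'goto drop' taken here because the segment carries no ACK ... */
	if (with_fix && (tcb.t_flags & TF_ACKNOW))
		toy_tcp_output(&tcb);
	return tcb.ack_sent;
}
```

In the model, toy_input_drop(false) leaves the ACK unsent, which corresponds to machine B never getting a reply.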
Re: vinum performance
cro...@cs.rpi.edu (David E. Cross) writes:
I have a drive that is rated at ~16 Meg/second, and indeed it delivers on the order of 15+ Meg/second. If I use Vinum to create a concatenated device of 2 such units performance drops to 2.5 Meg/sec. This seems like a drastic drop in performance. Any ideas what I am doing incorrectly?
You've accidentally striped subdisks on the same drive? ;--)
Like Greg Lehey said, you haven't really provided enough details. The minimum info required would be:
 - Is this read or write performance? Many disks are shipped with write caching disabled, and write performance can be significantly worse than read performance. It shouldn't be quite *that* bad, though; I get better write performance with slower disks, write caching disabled and mirroring (with the default 3.2 vinum - which has debugging compiled in, look at the bss size...).
 - Are you testing through the filesystem? (How are you testing?) Maybe you're doing a dd test and accessing /dev/vinum/vol/* rather than /dev/vinum/rvol/*...
Re: vinum performance
g...@lemis.com (Greg Lehey) writes:
You've accidentally striped subdisks on the same drive? ;--) Like Greg Lehey said, you haven't really provided enough details.
He did provide one detail, though; this is a concatenated plex, not a striped one.
Or he at least *thinks* it's concatenated. ;--) I didn't miss that - but given numbers that bad, it sounds like there might be some really silly mistake involved.
Many disks are shipped with write caching disabled, and write performance can be significantly worse than read performance.
Not if it works without Vinum.
My thought was that he might be comparing read performance and write performance (they are often pretty close). But even so, the difference shouldn't be *that* big.
Re: coarse vs fine-grained locking in SMP systems
m...@servo.ccr.org (Mike O'Dell) writes:
very fine-grain-locked systems often display convoying and are prone to priority inversion problems. coarse-grained
Priority inversion problems are design flaws. Depending on the type of locks, they may not even be possible. Spin locks held for short periods of time (typical for very fine-grained systems) can't cause priority inversion because the process holding the lock can't block.
we published the best Unix SMP paper I've ever seen in Computing Systems - from the Amdahl guys who did an SMP version of the kernel by very clever hacks on SPLx() macros to make them spin locks and a bit of other clever trickery on the source. they could take a stock
An approach like that can't possibly be sufficient if code has been written with the assumption that only interrupt-like events or blocking calls can change things from under it. There is quite a bit of code in FreeBSD that relies on this.
Re: symlink question
dsche...@enteract.com (David Scheidt) writes:
First try: Suppose foo depends on /usr/local/etc/foo.conf. /usr/local/etc is a link to /usr/local/${ARCH}/etc. User does export ARCH=../../home/user, so /usr/local/etc/foo.conf is now in their home directory. Depending on how poorly written foo is
Eww, I don't like the idea of using environment variables this way. The kernel shouldn't rely on them; they are a userland thing except during execve. Environment variables aren't even visible to the kernel in the process that sets them.
Variant symlinks don't need to be controlled through environment variables. If there is a specific use in mind for variant symlinks, the mechanism for configuring them should be chosen with consideration for that. (Even if variant symlinks could use environment variables, there should be ones that are based on some hard-wired info, and system-wide variant symlinks should only use environment variables when user-modifiability is specifically desirable. Your example is obviously a case of improper use.)
If there is no specific use in mind for variant symlinks, other than to have fun magic thingies around to play with that *can* be used for such-and-such, then implementing them is not a particularly good idea.
For example, Lites had variant symlinks with keywords that were internally resolved to the architecture/system name or the name of the system being emulated. For Lites, this was much better than something equivalent to FreeBSD's /compat hacks, because emulated systems were equal, and the root partition could be shared with the real system. For FreeBSD, the current approach is probably better, because emulated systems are optional exceptions.
Re: oops, here's the patch
dil...@apollo.backplane.com (Matthew Dillon) writes:
However, if the inside of the first conditional generates an error, the vp may be vput twice. What I recommend is this for the last bit:
That can't happen (the attributes are straight from VATTR_NULL along that path) - if it could, the file could also be truncated...
	if (vap->va_size != -1) {
		...
		if (error) {
			vput(vp);
			vp = NULL;	/* my addition */
		}
	}
	if (eexistdebug && vp)		/* also check vp != NULL */
		vput(vp);
It would be good if someone else could look over this routine and double-check David's find and his solution with my modification. Have we handled all the cases?
Yes, for that code path. Here's a simpler virtual unified diff that does the same thing as David's patch. (You don't need an 'eexistdebug' variable.)
	if (vap->va_size != -1) {
		...
-		if (error)
-			vput(vp);
	}
+	if (error)
+		vput(vp);
You can add a check for 'error == 0' in addition to 'vap->va_size != -1' but that shouldn't have any effect.
Re: FS tuning (Was: File system gets too fragmented ???)
jo...@gnu.org (Joel Ray Holveck) writes:
As we all know, tunefs -o space will hurt write performance. Will it hurt read performance? If I don't care about install-time speed, but do care about run-time speed and free space, should I populate my filesystems at install time with space tuning?
-o space should have very little effect at install time.
The space vs. time optimization parameter only has any effect when files are extended by small amounts (the fragment at the end is reallocated). This seldom happens for most files. Log files and interactively edited files are probably (just guessing) most likely to make use of fragment reallocations.
Re: question about vnode and inode locking
zzh...@cs.binghamton.edu (Zhihui Zhang) writes:
It seems to me that we can lock at the vnode layer AND at the inode layer.
No, the inode lock is, in most cases, the vnode layer lock. It isn't obvious because the code assumes that any filesystem using vop_stdlock has a 'struct lock' as the first entry of the internal data pointed to by v_data. Very ugly.
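The "first entry" convention can be shown with toy structures. Everything named toy_ below is a made-up stand-in for illustration (the real struct lock, vnode and inode are much larger and are not reproduced here); only the v_data member name comes from the description above. C guarantees that a pointer to a structure, suitably converted, points to its first member, which is what makes the cast work.

```c
#include <stddef.h>

/* Toy stand-ins for the kernel structures. */
struct toy_lock {
	int lk_count;
};

struct toy_vnode {
	void *v_data;			/* filesystem-private data */
};

struct toy_inode {
	struct toy_lock i_lock;		/* must stay the first member */
	unsigned long   i_number;
};

/*
 * What a generic lock routine can do if every filesystem follows the
 * convention: treat v_data as a pointer to the lock, without knowing
 * anything else about the private structure.
 */
struct toy_lock *
toy_vnode_lock(struct toy_vnode *vp)
{
	return (struct toy_lock *)vp->v_data;
}
```

Nothing checks the convention: if a filesystem moved the lock away from the start of its private data, the cast would silently refer to the wrong memory, which is presumably why it is called ugly above.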
Re: A bug in namei cache? (stale entries)
Suppose you have a directory hierarchy a -> b -> c. In each of a, b, and c, we have the following files:
a: ., .., a1, a2, a3, b (a1, a2, a3 are not directory files)
b: ., .., b1, b2, b3, c (b1, b2, b3 are not directory files)
If I do a mv a a_new, then cache entries for a, a1, a2, a3, b will be purged from the cache. Although b is purged from the namecache, we can still find it by other means (e.g. ufs_ihashget() called by ffs_vget()). So the entries for b1, b2, b3, c are still useful. So the namei cache will not contain any stale entries. Am I right?
Yes, except that the other means for finding b are more commonly by holding a reference to the vnode (open file handle, current directory) or just by searching the directory again.
Re: A bug in namei cache?
zzh...@cs.binghamton.edu (Zhihui Zhang) writes:
Suppose you want to mv a directory file (with subdirectories) to another name (it is like grafting a subtree to another point), the namecache associated with the source directory file will be purged by calling cache_purge() (done in ufs_rename()?). However, the routine cache_purge() does not purge cache entries recursively down the subtree. Will this result in a lot of stale entries in the namecache? FreeBSD 3.1 no longer
The name cache only caches component names, not paths, so the entries are still valid.
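A toy model makes it clearer why a non-recursive purge is enough. The code below is an illustration under assumptions, not the kernel's name cache: the nc_ names, the fixed-size table and the integer vnode ids are all invented. Each entry maps (directory vnode, component name) to a vnode, so entries for names inside a subdirectory are keyed by that subdirectory's vnode and do not depend on what its parent is called.

```c
#include <string.h>

#define NC_SIZE 16

/* One toy cache entry: (directory vnode id, component name) -> vnode id. */
struct nc_entry {
	int  dir;		/* id of the directory being searched */
	char name[16];		/* a single path component, never a path */
	int  target;		/* id the name resolves to; 0 = unused slot */
};

static struct nc_entry nc[NC_SIZE];

/* Add an entry; returns 0 on success, -1 if the table is full. */
int
nc_enter(int dir, const char *name, int target)
{
	int i;

	for (i = 0; i < NC_SIZE; i++) {
		if (nc[i].target == 0) {
			nc[i].dir = dir;
			strncpy(nc[i].name, name, sizeof(nc[i].name) - 1);
			nc[i].name[sizeof(nc[i].name) - 1] = '\0';
			nc[i].target = target;
			return 0;
		}
	}
	return -1;
}

/* Look up one component in one directory; returns 0 on a miss. */
int
nc_lookup(int dir, const char *name)
{
	int i;

	for (i = 0; i < NC_SIZE; i++)
		if (nc[i].target != 0 && nc[i].dir == dir &&
		    strcmp(nc[i].name, name) == 0)
			return nc[i].target;
	return 0;
}

/*
 * Non-recursive purge: drop entries that resolve to vp and entries for
 * names looked up inside vp, but nothing further down the tree.
 */
void
nc_purge(int vp)
{
	int i;

	for (i = 0; i < NC_SIZE; i++)
		if (nc[i].target == vp || nc[i].dir == vp)
			nc[i].target = 0;
}
```

With ids 1 for the parent of a, 2 for a and 3 for b, purging 2 removes (1, "a") and (2, "b") but leaves (3, "b1") in place, and that entry is still correct once b has been found again by some other means.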
Re: Repeatable kernel panic for 3.2-RELEASE NFS server
cro...@cs.rpi.edu (David E. Cross) writes:
One of our users was able to reliably crash an NFS server 3 times today. I have since copied his program and have reliably crashed a separate and unloaded machine with the exact same panic, lockmgr: locking against myself. I checked the recent DG patches that went in after -RELEASE and they
Are you sure this is NFS related? I can certainly reliably reproduce that and other panics (reported in kern/11629, includes a fix).