Re: syscalls implementation
Mmaist wrote: Hi! I was wondering where the syscall implementations are in the FreeBSD source tree. I would like to know, especially, where int kldload(const char *); is located. sys/kern/kern_linker.c contains int kldload(struct thread *, struct kldload_args *), and I need to see what is called between them. This is probably the wrong list. You probably want -questions. Here's the simplest answer you will probably understand, given that the question you are really asking is about understanding how system calls work, and where baby system calls come from. 8-). The system calls are in code that's generated from the file /usr/src/sys/kern/syscalls.master by /usr/src/sys/kern/makesyscalls.sh, which is an awk script encapsulated in a shell script. In user space, the system calls are stubs in a library that trap into the vector code generated from syscalls.master in the kernel. This code is located in /usr/src/lib/libc. In the case of system calls that are unwrapped (like kldload(2)), the calls are generated assembly code that comes from running a script against the same syscalls.master file described earlier; see src/lib/libc/sys/Makefile.inc for the exact place in the source tree where these stubs are generated. The way a system call happens is that the arguments are pushed on the stack (or put into registers, depending on the architecture), and then a trap is issued by attempting to execute a supervisor-only instruction in user code. This effectively generates a fault, which is serviced by a fault handler in the kernel that recognizes the particular trap as special and treats it as a system call. When the special trap code in the kernel is activated, it packages up the arguments in a structure of a known size (known to the code in init_sysent.c, which is generated from syscalls.master). For the Intel version of the trap code, see /usr/src/sys/i386/i386/trap.c and the related assembly code there.
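To make the path above a little more concrete, here is a toy user-space model of the table-driven dispatch: an array indexed by syscall number whose entries take a thread pointer plus a pointer to the packed arguments, and a "stub" that packs its argument and turns an error number into errno. Everything here (struct thread, toy_sysent, toy_syscall, toy_kldload, syscall number 0, the fake return value) is made up for illustration and only loosely modeled on the generated init_sysent.c; the real trap entry is machine-dependent assembly plus trap.c.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's per-thread context. */
struct thread {
    long td_retval[2];               /* where a handler leaves its return value */
};

/* Packed arguments for the toy kldload: one pointer-sized slot. */
struct toy_kldload_args {
    const char *file;
};

typedef int (*sy_call_t)(struct thread *, void *);

/* "Kernel side": same shape as int kldload(struct thread *, struct kldload_args *). */
static int toy_kldload(struct thread *td, void *v)
{
    struct toy_kldload_args *uap = v;

    if (uap->file == NULL)
        return EFAULT;               /* error number goes back to the stub */
    td->td_retval[0] = (long)strlen(uap->file);  /* pretend "file id" */
    return 0;
}

/* Hypothetical dispatch table, indexed by syscall number. */
static const struct {
    int       sy_narg;
    sy_call_t sy_call;
} toy_sysent[] = {
    { 1, toy_kldload },              /* toy syscall number 0 */
};

/* Models the trap handler: look up the entry and call it with the current
 * thread and a pointer to the already-packed arguments. */
static int toy_syscall(struct thread *td, size_t code, void *args)
{
    if (code >= sizeof(toy_sysent) / sizeof(toy_sysent[0]))
        return ENOSYS;
    return toy_sysent[code].sy_call(td, args);
}

/* Models the libc stub for int kldload(const char *). */
int toy_kldload_stub(const char *file)
{
    struct thread td = { { 0, 0 } };
    struct toy_kldload_args a = { file };
    int error = toy_syscall(&td, 0, &a);

    if (error != 0) {
        errno = error;               /* stubs report failure as -1 plus errno */
        return -1;
    }
    return (int)td.td_retval[0];
}
```

Calling toy_kldload_stub("if_foo.ko") walks the same shape of path as the real call: the stub packs the pointer, the "trap" looks up the table entry, and the handler receives the thread plus the packed arguments.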
The current thread and the packaged argument pointer are passed to the system call. Technically, passing the current thread into the system call as part of its context, rather than obtaining it when needed (and caching it locally, if necessary), makes a lot of things difficult; but since it's handy at the time, it's passed in. On architectures that are not register-poor, the current thread is usually kept in a register, so the cost of obtaining it later is actually less than the memory dereference of a computed stack offset that it is/was (depending on the version, architecture, etc.) in FreeBSD because it's being passed in. One of the major things this makes difficult is accurate proxy credential representation at various points in the kernel (see the NFS server source code for examples of the contorted logic this makes necessary). The packaged argument structures are defined in the file /usr/src/sys/sys/sysproto.h (also generated from syscalls.master). For the kldload system call, this looks like:

    struct kldload_args {
        char file_l_[PADL_(const char *)]; const char * file; char file_r_[PADR_(const char *)];
    };

This argument structure is actually a descriptor. A descriptor is used to ensure packing and alignment, so that the user stack save area can be coerced directly to the structure type and dereferenced in the function. The descriptor contains the information necessary to line the structure contents up with the user stack area and/or register spill area, where the arguments were stored in user space before the trap. Mostly, you can just look at the middle element of each line to know what the argument is. In this case, it's a const char * file. This matches what it was in user space when you made the call.
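Here is a minimal sketch of how such padding descriptors can line arguments up, assuming PAD_/PADL_/PADR_ macros of the form shown (modeled on sysproto.h, with reg_t standing in for register_t). Like the struct quoted above when a pad is zero, it relies on the GNU C zero-length-array extension. Each argument ends up in exactly one register-sized slot, and byte order decides which side of the slot gets the pad. The struct demo_two_args is a hypothetical two-argument call.

```c
#include <stddef.h>   /* offsetof, for checking the layouts */

typedef long reg_t;   /* assumption: stand-in for register_t */

/* Bytes needed to widen type t to a full register slot. */
#define PAD_(t)  (sizeof(reg_t) <= sizeof(t) ? 0 : sizeof(reg_t) - sizeof(t))

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define PADL_(t) PAD_(t)   /* big-endian: narrow value sits in the high-address bytes */
#define PADR_(t) 0
#else
#define PADL_(t) 0         /* little-endian: narrow value sits in the low-address bytes */
#define PADR_(t) PAD_(t)
#endif

/* Same shape as the kldload_args quoted above. */
struct demo_kldload_args {
    char file_l_[PADL_(const char *)]; const char *file; char file_r_[PADR_(const char *)];
};

/* Hypothetical two-argument call: an int, then a pointer. */
struct demo_two_args {
    char fd_l_[PADL_(int)];           int fd;          char fd_r_[PADR_(int)];
    char buf_l_[PADL_(const void *)]; const void *buf; char buf_r_[PADR_(const void *)];
};
```

On an LP64 little-endian machine, for example, the int gets 4 bytes of right padding, so buf starts exactly one register slot in. This is why the save area can be cast straight to the structure.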
So the function you are seeing in the kernel: int kldload(struct thread *, struct kldload_args *); *is* the system call you saw in user space: int kldload(const char *); with the trap-added thread pointer, and a pointer to the packed-up save area containing the same char * value you passed in user space. Note that since the user and kernel address spaces are not necessarily in core at the same time (maybe the pointer you passed points into a page that was swapped out), you have to use copyin(9)/uiomove(9) (or copyinstr(9), for a NUL-terminated string such as a path) in the kernel function to copy the path in before it can be used in the kernel address space. You probably should not be hacking in this area until you understand the code's operation a little better since, unless you know what you are doing, most changes you could possibly make will leave you with a dead system. -- Terry ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: api for sharing memory from kernel to userspace?
Alfred Perlstein [EMAIL PROTECTED] writes: I need to share about 100megs of memory between kernel and userspace. The memory can not be paged and should appear contiguous in the process's address space. Any suggestions? I need a way to either: map user memory into the kernel's address space, or map kernel memory into the user's address space. I was looking at pmap_qenter() but it didn't seem attractive because it's for short-term mappings; this mapping will exist for quite a while. Given your non-paged requirement, the allocation needs to take place in the kernel. Should it be visible in a single process, or in all of them? If all of them, set the PG_U bit; it will have the same address in user space as it does in the kernel, and be visible to all processes. If only one of them, the best general solution is to use a pseudo-device and support mmap() on it. -- Terry
Re: FreeBSD VFS System?
Ryan Sommers wrote: Are there any good web resources or books on the VFS system that the FreeBSD kernel uses? I'm guessing it might have originated from the 4.4BSD(?) interface. I've been attempting to read through the source code for different system calls (i.e. mkdir, rmdir, mount/umount) and haven't been able to get very far because of the substantial layers of indirection involved. For this reason I was looking at picking apart the different subsystems involved and was wondering if there was anything more annotated than the source code itself. The VFS stacking code came from the FICUS project out of UCLA. Here are all their papers on FS stacking:
ftp://ftp.cs.ucla.edu/pub/ficus/usenix_summer_90.ps.gz
ftp://ftp.cs.ucla.edu/pub/ficus/OLD_TECHREPORTS/ucla_csd_900044.ps.gz
ftp://ftp.cs.ucla.edu/pub/ficus/OLD_TECHREPORTS/ucla_csd_910007.ps.gz
ftp://ftp.cs.ucla.edu/pub/ficus/WorkObjOrOpSys_90.ps.gz
ftp://ftp.cs.ucla.edu/pub/ficus/heidemann_thesis.ps.gz
ftp://ftp.cs.ucla.edu/pub/ficus/ucla_csd_930019.ps.gz
-- Terry
Re: Machines with >= 4GB of RAM
Andrew Kinney wrote: On 17 Dec 2003 at 15:44, Julian Elischer wrote: [snip] options KVA_PAGES=512 may be a start, but is it still required, and do I have to change anything else to match it? (Where does the Makefile work out where to link the kernel for?) Is a value of 512 enough for a machine with 16GB of RAM? Any hints (even a better Google search string) appreciated. We have a 4GB machine running 4.8-RELEASE, so we aren't using PAE, but we had to make changes similar to what you're asking about for a different reason. Your requirements will vary depending on the version of FreeBSD you're running, but in general, increasing KVA_PAGES will help considerably with stability on large-memory machines. It should be noted that releases prior to 4.8 required more changes than just KVA_PAGES, but the documentation is a bit muddied on that subject. In general, the kmem_map size and other kernel memory usage, including the page tables necessary to reference the full memory, end up taking more than 1G, so the 3G user:1G kernel ratio that's the default for older FreeBSD won't work at all. I usually recommend that people make it 1G user:3G kernel; you can get away with 2G:2G if you aren't going to be allocating lots of mbufs or supporting lots of open sockets, etc., but in general most people throw 4G+ into a box because they plan on building a network server and then throwing some serious load on it. I don't know if it is required, but we rebuilt the world after changing KVA_PAGES just to make sure that any hidden dependencies on that value were handled in things other than the kernel. Depends. The normal case where this will be required is for prebuilt kernel modules.
The only user space code I'm aware of that cares is the Linux threads package (and anything that links against it), since the thread mailboxes are in a fixed location known a priori to both the kernel module and the threads library, and that location has to change when the KVA space changes, since it assumes 3:1 or whatever split was in effect when it was compiled. As far as 512 being a large enough setting for a 16GB machine, that depends entirely on what you plan to do with the machine and its usage pattern of various system resources. In my personal experience, the kernel and its data structures consume over 1G in a 4G box, due to the auto-tuning cruft trying to be smarter than it actually is and making bad decisions. In the 4.7/4.8 time frame, this was catastrophic with more than 4G, since it didn't stop scaling at 4G (the kernel can only address 4G without PAE, no matter what, since pointers are 32 bits). Scaling above that point tries to allocate more memory for the kernel than the kernel is capable of addressing. For instance, our 4GB machine does a lot of heavy web serving, databases, and email. We needed the 2GB KVA on that machine because of large numbers of files, large network buffers, and some weirdness relating to Apache and pv entries. If your usage patterns were similar and you wanted to make full use of the 16GB without getting trap 12 panics, then 2GB KVA may be inadequate. Older boxes won't even boot with 3:1 if you jam 4G in them, period. -- Terry
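The arithmetic behind the KVA_PAGES numbers in this thread can be sketched as follows. It assumes the i386 convention that KVA_PAGES counts page-directory entries reserved for the kernel. Each entry maps 4 MB without PAE (1024 PTEs x 4 KB) and 2 MB with PAE (512 PTEs x 4 KB). Under that assumption, 256 gives the 3G:1G split (as far as I know, the non-PAE default), 512 gives 2G:2G, and a PAE kernel needs twice the value for the same split. The helper names are illustrative.

```c
#include <stdint.h>

#define PAGE_BYTES 4096ULL        /* i386 base page size */
#define ADDR_SPACE (4ULL << 30)   /* 32-bit virtual address space: 4 GB */

/* Kernel virtual address space reserved by a KVA_PAGES value.
 * Assumes one page-directory entry per unit: 1024 PTEs without PAE,
 * 512 (8-byte) PTEs with PAE. */
static uint64_t kva_bytes(unsigned kva_pages, int pae)
{
    uint64_t ptes_per_pde = pae ? 512 : 1024;
    return (uint64_t)kva_pages * ptes_per_pde * PAGE_BYTES;
}

/* Whatever the kernel does not reserve is left for each user process. */
static uint64_t user_bytes(unsigned kva_pages, int pae)
{
    uint64_t kva = kva_bytes(kva_pages, pae);
    return kva >= ADDR_SPACE ? 0 : ADDR_SPACE - kva;
}
```

Under this model, KVA_PAGES=512 on a 16 GB PAE machine reserves only 1 GB of KVA. The 2 GB KVA described for the 4 GB non-PAE box corresponds to 1024 on a PAE kernel.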
Re: question about _exit() function
rmkml wrote: Is the _exit() function safe for a thread? My program uses vfork() and then execve() in a thread context. The documentation mentions that the process has to call _exit() in case of failure. But is this _exit() really safe for the parent thread? The behaviour is undefined in the failure case, but only if you have established a pthread_atfork() handler that does anything in the child. In general, the more correct approach is to use posix_spawn(), but FreeBSD doesn't support posix_spawn() (funny; this has now come up twice within 60 messages of each other, while there was a very long time when it was not pertinent at all...). POSIX also seriously discourages the use of vfork() in threaded programs, and recommends fork() instead. Note that fork() only *ever* creates a single thread, so it will only replicate the currently running thread and its address space, rather than all currently running threads in the parent. You said in another message that this is on 4.8. I think that the behaviour will not be quite what you expect in that case, and that it'll be better in -current, but it might still not be what you expect there, either (depending on what you are expecting). See also: http://www.opengroup.org/onlinepubs/007904975/functions/fork.html -- Terry
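Here is a minimal sketch of the fork()-then-execve() pattern, with _exit() in the child if the exec fails. The helper name and the 127 exit status (the shell's conventional "could not execute" code) are my choices, not anything mandated. On systems that provide posix_spawn(), that is the more direct route.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run path with argv/envp in a child process and return its exit status,
 * or -1 if the fork or wait fails or the child did not exit normally. */
static int run_and_wait(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork();              /* duplicates only the calling thread */

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);
        /* Only reached if execve() failed: leave without running atexit()
         * handlers or flushing stdio buffers inherited from the parent. */
        _exit(127);
    }

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, running /bin/sh with the arguments -c "exit 3" returns 3, and a nonexistent path returns 127 from the child's _exit().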
Re: question about _exit() function
rmkml wrote: Thanks a lot for the answer. I will replace vfork() with fork(). Another question: the man page of vfork() mentions that fork() has to use _exit(0) too when something goes wrong with the execve()! I can see how you might read it this way, but that's not really correct. The purpose of _exit() instead of exit() is to avoid calling the atexit() handlers, the per-thread data cleanup handlers, and the cancellation routines. In the case of vfork(), it's undefined whether you will be operating against resources currently allocated and appropriately reference-counted to the child, or against those of the parent. In the case of fork(), you are guaranteed to operate in the context of the child. Which you should use depends on your application. The normal operation of fork() in a threaded program is to duplicate only the calling thread. If you have registered atexit() handlers, per-thread data cleanup handlers, or cancellation routines (or, in some cases, signal handlers), then when you call exit(), these things will be invoked. Consider that the forking thread perhaps has the responsibility of cleaning up a shared memory segment created by a task. You do *not* want to do this cleanup (which happens on an interface whose resources you are tracking manually) in the child process, since it could rip the shared memory segment out from under other processes that are using it. Rather than calling _exit() to avoid this, you probably want to call exit()... however, you must deal with detaching your registered handlers and avoiding your manual cleanup process. The correct way to do this is to disable them in the child by using pthread_atfork() to disentangle them at fork time. See: http://www.opengroup.org/onlinepubs/007904975/functions/pthread_atfork.html Is the child a real process or, because of the thread context, a part of the parent process, i.e. a new thread?
It is a real process; the only thread running in this real process is a copy of the thread that was running at the time of the fork(). This is the primary reason that the POSIX documentation cautions against the use of vfork(), why it permits only an _exit() or an execve() in the child, and why you should avoid vfork(). In this case a pthread_exit() may be a better solution. Is that point of view completely wrong? You likely do not want to use pthread_exit() in this case, since you are the only thread in a process. If you were to use it with vfork(), the combination would likely cause a program malfunction. If you were to use it with fork(), the combination may or may not be a problem. In the second case, the reason is that there exists the possibility that, when you create the new process, it will link with object files with statically declared instances whose .init routines create worker threads. For example, the Netscape LDAP library is not thread-safe, so at one point I wrote a wrapper library that queued requests to a single worker thread that was created at the time the library reference was instanced via a .init section. Calls into the library queued requests, which were then serialized to the Netscape library, and responses were sent back out. The result looked like an LDAP client library that could be reentered, but which serialized work to the non-thread-reentrant library under the covers. So the reason pthread_exit() should not be used is that you may not be the only thread running, and if so, there will be no pthread_join() called to reap your thread, and the other threads will continue indefinitely: you can't depend on not creating threads as a side effect of using various libraries or shared objects. Currently, in some indeterminate cases, part of my program freezes just after the vfork(). So, I am trying to understand what may cause the calling thread of vfork() to freeze...
Without more detail, this is hard to pinpoint, since I don't really know what you mean by "freeze", and there appears to be a language barrier to a precise explanation. Most likely, this is an interaction between the user space scheduler and vfork(). Realize that in the 4.x series of FreeBSD, pthreads are implemented with a user space scheduler. This means that following a vfork(), since there is only one schedulable entity (the process), all threads are suspended until your _exit(0) or execve() call (assuming that these do the right thing in the vfork() case when interacting with threads; POSIX says it's undefined whether this will happen). If you want other threads in your main application to run concurrently with the child, you *must* use fork() instead of vfork(). -- Terry
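The pthread_atfork() approach suggested in this thread is to disarm the parent's cleanup in the child and then use a plain exit(). A minimal sketch follows, assuming a single process-wide cleanup registered with atexit(). A "child" atfork handler clears a flag, so exit() in the forked child skips the cleanup, while the parent's own exit still runs it. All names, and the pipe used only to observe whether the cleanup ran, are illustrative.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile int cleanup_enabled = 1;  /* cleared in the child by the atfork hook */
static int notify_fd = -1;                /* demo only: makes the cleanup observable */

/* Stand-in for "tear down the shared memory segment": must only run in
 * the process that owns the resource. */
static void shared_cleanup(void)
{
    if (cleanup_enabled && notify_fd >= 0) {
        ssize_t n = write(notify_fd, "C", 1);
        (void)n;
    }
}

/* pthread_atfork() "child" handler: runs in the child right after fork(). */
static void disable_cleanup_in_child(void)
{
    cleanup_enabled = 0;
}

static void register_handlers_once(void)
{
    static int done;

    if (!done) {
        atexit(shared_cleanup);
        pthread_atfork(NULL, NULL, disable_cleanup_in_child);
        done = 1;
    }
}

/* Fork a child that calls plain exit(). Returns the number of bytes the
 * child's cleanup wrote (0 means it was disarmed), or -1 on error. */
static int child_exit_runs_cleanup(void)
{
    int fds[2];
    char buf[8];

    register_handlers_once();
    if (pipe(fds) == -1)
        return -1;
    notify_fd = fds[1];
    fflush(NULL);                 /* don't let the child re-flush our stdio buffers */

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        notify_fd = -1;
        return -1;
    }
    if (pid == 0)
        exit(0);                  /* atexit handlers run, but cleanup is disarmed */

    close(fds[1]);                /* parent keeps only the read end */
    notify_fd = -1;
    waitpid(pid, NULL, 0);
    ssize_t n = read(fds[0], buf, sizeof buf);
    close(fds[0]);
    return (int)n;
}
```

The flag lives in ordinary process memory, so clearing it in the child cannot affect the parent. That separation is what makes this safe with fork(), and exactly what vfork() does not guarantee.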
Re: getpwnam with md5 encrypted passwds
Clifton Royston wrote: If you will need to do authentication after your program drops privileges, your best course is probably to go through PAM, to install a separate daemon which implements a PAM-supported protocol and which runs with privileges, and then to enable that protocol as a PAM authentication method for your application. [ ... RADIUS example with LDAP mention ... ] Sounds like a good approach, though I'll point out that had you tried LDAP, you would have been hard-put to use LDAP as a proxy protocol to another authentication base (a PAM backend for an LDAP server, while not quite impossible, would be very hard). How did you avoid the recursion problem of the RADIUS server trying to authenticate via pam_radius to the RADIUS server trying to authenticate...? -- Terry
Re: getpwnam with md5 encrypted passwds
Peter Pentchev wrote: On Wed, Nov 26, 2003 at 02:21:04PM +0100, Kai Mosebach wrote: Looks interesting... is this method also usable when I have dropped my privs? I think Terry meant pam_authenticate() (not pan), but to answer your question: no, when you drop your privileges, you do not have access to at least the system's password database (/etc/spwd.db, generated from /etc/passwd and /etc/master.passwd by pwd_mkdb(8)). If this will be any consolation, getpwnam() won't return a password field when you have dropped root privileges either. Peter is correct on both counts. If I had not seen his reply first, I would have made the same reply. You cannot crypt something you cannot read. -- Terry
Re: NFS Flags Oddity
Kris Kirby wrote: FreeBSD (4.9-RC) doesn't appear to export schg flags over NFS. You've got to shell in locally to the machine to remove the schg flags; ls -lao doesn't report them over NFS, but does list them locally. Non-local flags are not defined, so they are not permitted to be exported over NFS. You'll find the same thing with the number of bits in major and minor numbers, etc. For a long time (until Julian added the first devfs to FreeBSD), it was not possible to NFS-boot a FreeBSD box off of, e.g., an Alpha running TRU64 UNIX. -- Terry
Re: getpwnam with md5 encrypted passwds
[EMAIL PROTECTED] wrote: I am trying to validate a given user password against my local passwd file with this piece of code:

    if (!(pwd = getpwnam(user))) {
        log(ERROR, "User %s not known", user);
        stat = NOUSER;
    }
    if (!strcmp(crypt(pass, pwd->pw_name), pwd->pw_passwd)) {
        log(DEBUG|MISC, "HURRAY : %s authenticated\n", user);
        stat = AUTHED;
    }

I know you have the fix for the crypt of the wrong field, but the proper thing to do is probably to use pan_authenticate() so that you are insensitive to the authentication method being used, rather than crypting and comparing it yourself. -- Terry
Re: secure file flag?
Wes Peters wrote: On Tuesday 18 November 2003 16:31, Rayson Ho wrote: e.g. when deleting a secure file, the OS will overwrite the file with random data. Better to overwrite it with a more secure pattern. See ports/sysutils/obliterate for references. It has been mentioned before that this could be done in the kernel, obliterating blocks in the VM rather than zeroing them. I hadn't thought of applying it at the file or filesystem level. The DOD requires a specific pattern before it considers a deletion secure. The closest we have is the 'rm -P' command and the above-mentioned obliterate command. The overwrite pattern used in 'rm -P' is not likely to be effective against a dedicated inspection of the disk; the one in obliterate somewhat more so. On most modern drives, nothing is likely to be effective without OS support all the way down to the driver and hardware flags level. The DOD-specified pattern is only effective if you have separate control of the seek and the write. The reason for this is that you must take head hysteresis into account. If you did a seek in for the initial write and a seek out for the erase, you will end up with a small strip of readable bits, even if it is much narrower than a standard track width, because the head position jitters depending on which side of the track you are coming from. So in reality, you also need to control sector sparing and write caching, to avoid track caching (if possible) and the seeks for sector sparing that are hidden from the OS trying to invoke the write pattern: you need to turn both of these off, if you can. If you can't, you need to buy a different disk and turn both of these off on it, if you can. If you can't do that either, you are going to need to write your own disk firmware.
You also need to deal with not writing to tracks at one end or the other of the disk, since you can't seek to them from the opposite direction, which means you have no way to write the pattern you are expected to write. This generally means that the end tracks need to be treated as scratch landing zones: you only ever write pattern data to them, and then only because that's the way to get the disk head onto the track so you can seek back to the track that you really want to erase. In a track-caching world, this tends not to be useful unless you can determine the physical geometry of the disk and treat tracks as separate entities. Finally, if you have a track-caching disk, it's likely that the way it operates is to just seek in and start writing. That means that, to avoid leaving a thin stripe of your old bits, you have to treat each track as a single entity. If a track holds data for several files and you want to scribble effectively over one of them, you have to scribble over everything on that track, and then put back the data for the file(s) you didn't want to erase. This sounds like an interesting file flag. Would you expect the process to block on the unlink(2) call while the overwrite takes place, or for this to happen in a kernel thread? The former seems pretty straightforward, hacking at ffs_blkfree. The latter I really wouldn't know how to begin without (a lot) more study. You would have to do the former, or you would not pass Common Criteria evaluation, if that's what you are aiming for. The normal way this is handled in government secure facilities is a 2U rack unit containing thermite charges. The normal way this is handled in a commercial secure facility is mostly to put the disks in a crusher. If this is something other than that, I doubt anyone would be willing to spend US$60,000/MB to have someone recover your porn. You are likely safe enough with PHK's somewhat insecure disk encryption thingy.
As an overall note, you might want to contact Michal Serrachio off list; he has a solution to this problem which he might be willing to license to you for a fee. -- Terry
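For reference, here is a sketch of the kind of in-place overwrite that 'rm -P' performs, done synchronously in the caller as an unlink-time flag would have to be. The three-pass 0xff/0x00/0xff pattern is what older FreeBSD rm(1) manual pages describe for -P; check it against your version. As explained above, this only rewrites the file's logical contents through the filesystem. It does nothing about head positioning, sector sparing, or drive write caching, so it is not a secure erase. The function names are illustrative.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite the whole file three times (0xff, 0x00, 0xff), calling fsync()
 * after each pass so each pattern is at least handed to the device.
 * Returns 0 on success, -1 on error. */
static int overwrite_passes(int fd)
{
    static const unsigned char patterns[] = { 0xff, 0x00, 0xff };
    unsigned char block[4096];
    struct stat st;

    if (fstat(fd, &st) == -1)
        return -1;
    for (size_t p = 0; p < sizeof patterns; p++) {
        memset(block, patterns[p], sizeof block);
        for (off_t off = 0; off < st.st_size; ) {
            off_t left = st.st_size - off;
            size_t want = left < (off_t)sizeof block ? (size_t)left : sizeof block;
            ssize_t n = pwrite(fd, block, want, off);
            if (n <= 0)
                return -1;
            off += n;
        }
        if (fsync(fd) == -1)
            return -1;
    }
    return 0;
}

/* Block the caller until the overwrite is done, then remove the name. */
static int overwrite_and_unlink(const char *path)
{
    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;
    int rc = overwrite_passes(fd);
    if (close(fd) == -1)
        rc = -1;
    return rc == 0 ? unlink(path) : -1;
}
```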
Re: Library libgcc_pic.a missing on 5.1?
Jim Durham wrote: Is libgcc_pic.a not supposed to be on 5.1? Is the one in /usr/lib not good enough for you? 8-) 8-). -- Terry
Re: Conflict between sys/sysproto.h stdio.h ... ?
lucy loo wrote: I am writing a kernel loadable module to reimplement some system calls. I have included sys/sysproto.h, sys/systm.h, etc., which are very standard header files for a KLD implementation. So far... I also want to do file I/O in this module, therefore I need to include stdio.h. But it obviously conflicts with those sys/... headers, and make won't pass. Does anyone know how to fix this? You cannot use libc functions in the kernel. The kernel does not link against libc. It is not an application; it is a kernel. There are some libc functions which are provided in the kernel; there are other libc functions for which there are similar kernel functions of the same name (e.g. printf); and there are some "libc functions" (quoted because they aren't really there, but you can use them) that are inlined by the compiler. Programming in the kernel environment is not the same as programming in the normal application environment (the POSIX environment). -- Terry
Re: non-root process and PID files
Jos Backus wrote: On Fri, Nov 14, 2003 at 01:45:45AM -0800, Terry Lambert wrote: OK. We already have one of those. We call it init. 8-). Feature-wise, init and svscan/supervise don't quite match; svscan has more features, one of them being that it doesn't use a single control file that, if you screw it up, can render your system unusable. Even SysV init has more (useful) features than ours. Also, init is kinda hard to use for non-root users for that reason. People have been known to successfully replace init with svscan, btw. The main feature svscan lacks for me is the right license. 8-). Practically, init does what you want: it monitors and restarts things that die. This whole discussion, though, is pretty stupid, since the things you are worrying about keeping running should be written not to die in the first place, and if they *are* dying, it's generally dumb to restart them thinking that they will magically not die again since, having been restarted, they are now among the blessed. 8-) 8-). This is really off-topic. But sure, there are instances where this approach is less than robust. So what? Using pidfiles doesn't solve this problem either. Not using PID files won't fix it either. So portraying PID files as a bad way to solve this problem (which is what people were originally complaining about) is really dumb: PID files work very well for resolving the problem that they were intended to resolve. You just need to create them when you are first started, which is done by root, unless you are SUID to some other UID, in which case it's done by that ID instead. After that, the same UID has privileges on the file, and the problem is entirely solved. [ ... ] Also, software can fail in ways unaccounted for. But this is really off-topic. It can't fail that way if you account for it failing that way. 8-). But see the above: restarting sshd because it core dumps when you try to log in using PuTTY isn't going to magically make logging in via PuTTY stop core dumping it.
Anyway, FreeBSD has steadfastly disliked the concept of a registry ever since Microsoft implemented it in Windows 95; it's one of the biggest NIH items of all time. I think it would help if config files had a common structure and could be queried for interesting service metadata like dependencies. See the Arusha project for an example. This is totally bogus. Data interfaces are the same things that screw us over with "proc size mismatch" messages every time the proc structure changes. The only reasonable interface is one that provides procedural accessor/mutator functions to abstract the format of the interned vs. the externed data, so that it then becomes *impossible* to get a proc size mismatch. FreeBSD continues to have the proc size mismatch problem because it insists on continuing to use data interfaces. FreeBSD continues to use data interfaces in this area because it cannot use procedural interfaces to operate against system dumps. FreeBSD can't use procedural interfaces to operate against system dumps because it does not take advantage of the ELF format to add an ELF section to the kernel image containing the shared library for the procedural interface to use against the kernel (with an internal, and therefore hidden, data interface), so that there is never a discrepancy. Anyway, we were talking about the use of pidfiles versus using a process monitor. I'm simply claiming that using a process monitor is far superior. And I'm simply claiming that they solve different problems. Complaining that nothing solves the problem a process monitor solves (to wit: restarting buggy programs that should not be crashing in the first place) is OK. Complaining that PID files don't solve that same problem is incredibly bogus. They solve different problems, and you can't simply replace PID files with process monitors and continue to solve the problems that PID files solve and process monitors don't.
-- Terry
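The difference between a data interface and a procedural one can be shown on a hypothetical process-information record. With a data interface, the consumer compiles in the record layout and must reject any size it does not recognize, which is the "size mismatch" failure. With a procedural interface, the consumer only calls accessor and mutator functions, so the provider can change the layout freely. Here is a minimal sketch of the procedural side. All names are invented for the example, and in real use the struct definition would live only in the provider's source file, behind an opaque `struct procinfo;` declaration.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* ---- provider side: the layout is private and may change ---- */
struct procinfo {
    pid_t pid;
    char  comm[20];
    /* fields can be added here without breaking any consumer */
};

struct procinfo *procinfo_new(pid_t pid, const char *comm)
{
    struct procinfo *p = calloc(1, sizeof *p);

    if (p != NULL) {
        p->pid = pid;
        strncpy(p->comm, comm, sizeof p->comm - 1);  /* calloc left it NUL-filled */
    }
    return p;
}

/* ---- accessors/mutators: the only thing consumers ever see ---- */
pid_t procinfo_pid(const struct procinfo *p)        { return p->pid; }
const char *procinfo_comm(const struct procinfo *p) { return p->comm; }

void procinfo_set_comm(struct procinfo *p, const char *comm)
{
    memset(p->comm, 0, sizeof p->comm);
    strncpy(p->comm, comm, sizeof p->comm - 1);      /* always NUL-terminated */
}

void procinfo_free(struct procinfo *p) { free(p); }
```

Because consumers never call sizeof on the struct, a newer provider with extra fields links against old consumers without any mismatch check to fail.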
Re: non-root process and PID files
Jos Backus wrote: On Thu, Nov 13, 2003 at 02:45:18AM -0800, Terry Lambert wrote: Why use pid files at all if you could be using a process supervisor instead? Who supervises the supervisor? Heh. The supervisor should be small and robust, like init. Has init died on you recently? Do you want to solve this problem or find Nirvana? If the latter, don't use computers. OK. We already have one of those. We call it init. 8-). There are also the small issues of ordering (the reason you can't just run everything out of /etc/ttys via init in the first place), Sure. Hard to get right but not unsolvable. No reason you can't use process monitoring with something like rcNG. We tried very hard to do this on the InterJet. We still ended up shooting most things in the head with large caliber bullets each time the dial-on-demand interface went up or down, because we did not have the idea of hard and soft dependencies. Even if we had had them, though, we would still have been SOL, since many of the Open Source programs we used cached information when they started. Because of this, the data could get stale. For example, say I ran sendmail and bound it to the external port (or INADDR_ANY). What host name should I claim to the remote host when I answer with the 220 greeting? What should I use as the argument to HELO or EHLO for outbound SMTP connections, so that the name I use matches the name the remote host gets on its crosscheck for the canonical name of the machine contacting it via gethostbyaddr(getpeername())? Basically, you end up with a system where you either can't cache data, or where the cache has to be shared, or where you implement a generic notification mechanism. No matter how you slice it, though, you're talking about rewriting millions of lines of code. Caching Considered Harmful. multiple instances, /service/smtpd.{external,internal} Yeah, we did this, so that we could shoot only half the processes in the head on link up/down. It sucked.
We almost shipped a product that wouldn't have worked when we did the DNS split, because the dependency graph had to be managed manually, and wasn't. and removing human error from adding and removing new things to be monitored. That's a generic problem with any type of change management. Not really. If your configuration changes all happened in a centralized data repository, and nobody cached anything but got their information from that central repository, and the interface to the repository was a system interface (so the system could cache on your behalf and performance didn't degrade unbearably), THEN you might have something. After you rewrote millions of lines of Open Source code to use your registry instead of working the way it currently works, which is that everyone has their own poop files. If you are lucky, hitting them over the head with a shovel (SIGHUP) works, and you don't have to kill and resurrect them (you just have to wait a long time before the services become usable again, e.g. DNS reading its config files). Anyway, FreeBSD has steadfastly disliked the concept of a registry ever since Microsoft implemented it in Windows 95; it's one of the biggest NIH items of all time. -- Terry
Re: non-root process and PID files
Jos Backus wrote: On Mon, Oct 27, 2003 at 10:31:18AM -0500, Dan Langille wrote: If a process starts up and does a setuid, should it be writing the PID file before or after the setuid? Two methods exist AFAIK: (1) write your PID immediately, and the file is chowned root:wheel; (2) write your PID to /var/run/myapp/myapp.pid, where /var/run/myapp/ is chowned myapp:myapp. Of the two, I think #1 is cleaner as it does not require another directory with special permissions. Any suggestions? Why use pid files at all if you could be using a process supervisor instead? Who supervises the supervisor? Sure, you can take the English Bobby approach (init dies, the kernel yells "Help me, human, or I shall yell 'Help me Human!' again", and tries over and over to start software that will never start), but that solves nothing; you would be amazed at the number of people who want Mac OS X to try to restart init, instead of panicking, when init can't be started in the first place, or won't stay running if it was. So this doesn't solve the origin of authority problem. The problem being solved is avoiding running multiple instances of roles... so actually, it would be better if the file were named e.g. smtp.pid, rather than sendmail.pid, which would step on the toes of everyone who wanted to use their program name as part of the file name to make it harder to use someone else's software to replace their software. There are also the small issues of ordering (the reason you can't just run everything out of /etc/ttys via init in the first place), multiple instances, and removing human error from adding and removing new things to be monitored. -- Terry
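The "avoid running multiple instances of a role" check that PID files exist for usually boils down to reading the recorded PID and probing it with kill(pid, 0). A minimal sketch, assuming a role-named file such as smtp.pid as suggested above; the helper names are hypothetical. The probe is inherently racy (a PID can be recycled, and two instances can both see the role as free), which is part of why the thread is skeptical of the whole approach.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read a decimal PID from 'path'.
 * Returns the PID, or -1 if the file is missing or malformed. */
static pid_t read_pidfile(const char *path)
{
    FILE *fp = fopen(path, "r");
    long pid = -1;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &pid) != 1 || pid <= 0)
        pid = -1;
    fclose(fp);
    return (pid_t)pid;
}

/* Returns 1 if the PID recorded in 'path' names a live process,
 * 0 otherwise.  kill(pid, 0) delivers no signal; it only performs the
 * existence and permission checks.  EPERM means the process exists but
 * belongs to another user, so it still counts as alive. */
static int pidfile_owner_alive(const char *path)
{
    pid_t pid = read_pidfile(path);

    if (pid == -1)
        return 0;
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;
}

/* Record our own PID, replacing any stale file.  Note the window
 * between a liveness check and this write: two instances starting at
 * the same moment can both decide the role is free. */
static int write_pidfile(const char *path)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;
    fprintf(fp, "%ld\n", (long)getpid());
    return fclose(fp) == 0 ? 0 : -1;
}
```

A daemon would call pidfile_owner_alive() at startup, refuse to run if it returns 1, and otherwise call write_pidfile(); the file name, not the program name, identifies the role.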
Re: kqueue, NOTE_EOF
Jaromir Dolecek wrote: marius aamodt eriksen wrote: in order to be able to preserve consistent semantics across poll, select, and kqueue (EVFILT_READ), i propose the following change: on EVFILT_READ, add an fflag NOTE_EOF which will return when the file pointer *is* at the end of the file (effectively always returning on EVFILT_READ, but setting the NOTE_EOF flag when it is at the end). specifically, this allows libevent[1] to behave consistently across underlying polling infrastructures (this has become a practical issue). I'm not sure I understand what the exact issue is. Why would this be necessary, or what exactly does this solve? AFAIK poll() doesn't set any flags in this case either, so I don't see how this is inconsistent. BTW, shouldn't the EOF flag be cleared when the file is extended? It solves the half-close-on-socket issue, which arises on an HTTP/1.0 request, or on a non-pipelined or terminal HTTP/1.1 request, on most HTTP connections: the client closes its side of the socket connection, but the server is still expected to provide a response to the request on the same socket. You need this to be able to distinguish, after getting an EOF, whether you need to provide a response to the request or drop it, based on additional processing you choose to do in your application. Doing it this way saves you 33%, 50%, or 66% of the system calls required to detect the edge conditions, depending on your I/O model and when they hit. It is useful for HTTP servers, HTTP proxy servers, L7 load balancers, load balancers that implement Direct Server Return for requests, and in a number of other common networking cases (e.g. transcoding proxies for cell phones or other users requiring it, FTP control vs. data channels in the non-passive case, ssh/rcmd/etc., to name just a few).
I rather expect that it would be singularly useless for socketpair(), pipe(), named pipe (FIFO), or file operations, but that's not what we are talking about when we talk about event libraries that deal with input sources/sinks. -- Terry
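The half-close case described above can be reproduced with plain POSIX sockets: after the client calls shutdown(SHUT_WR), the server's read() returns 0, yet the server can still write its response on the same connection. A minimal sketch over TCP loopback, assuming 127.0.0.1 is usable on the host; it uses poll() and read() rather than the proposed NOTE_EOF flag, to show the extra call an application needs today to discover the EOF edge. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* HTTP/1.0-style exchange: the client sends a request and half-closes;
 * the server reads to EOF, then still answers on the same socket.
 * Returns the number of response bytes the client received, or -1.
 * Error paths leak descriptors; this is a demonstration only. */
static int half_close_demo(void)
{
    static const char req[] = "GET / HTTP/1.0\r\n\r\n";
    static const char resp[] = "HTTP/1.0 200 OK\r\n\r\n";
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    char buf[256];
    int lfd, cfd, sfd, got = 0;
    ssize_t n;

    /* Listening socket on 127.0.0.1 with a kernel-chosen port. */
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 ||
        bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&sin, &len) < 0)
        return -1;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
        return -1;
    if ((sfd = accept(lfd, NULL, NULL)) < 0)
        return -1;

    /* Client: send the request, then close only its sending direction. */
    if (write(cfd, req, sizeof(req) - 1) != (ssize_t)(sizeof(req) - 1) ||
        shutdown(cfd, SHUT_WR) < 0)
        return -1;

    /* Server: poll() only reports "readable"; a read() returning 0 is
     * what reveals that the peer has half-closed. */
    for (;;) {
        struct pollfd pfd = { .fd = sfd, .events = POLLIN };

        if (poll(&pfd, 1, 1000) <= 0)
            return -1;
        if ((n = read(sfd, buf, sizeof(buf))) < 0)
            return -1;
        if (n == 0)
            break;
    }

    /* Half-closed is not closed: the response still reaches the client. */
    if (write(sfd, resp, sizeof(resp) - 1) != (ssize_t)(sizeof(resp) - 1))
        return -1;
    close(sfd);

    while ((n = read(cfd, buf, sizeof(buf))) > 0)
        got += (int)n;
    close(cfd);
    close(lfd);
    return got;
}
```

An application that cannot tell "readable because data arrived" from "readable because the peer half-closed" has to make the extra read() each time; that is the kind of system call an EOF indication on the event itself is meant to save.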
Re: Confused about HyperThreading and Performance
Daniel Ellard wrote: Can someone point me at some non-marketing documentation about hyperthreading on the latest Intel chips? I'm seeing some strange performance measurements and I would like to figure out what they mean. Go out to the developer section of Intel's web site, and look for SMT. There is a lot of literature. The reason you are seeing a performance drop is contention for shared resources that the scheduler doesn't know are shared. SPECmark and similar benchmarks tend to get worse numbers on every OS when SMT is enabled, due to contention. The only model which will work is a hierarchical affinity and negaffinity model, and I am not aware of any OS that works this way other than those that also treat the hardware as NUMA (and none of those run on Intel chips; mostly, they run on real NUMA hardware). -- Terry
Re: Kylix in FreeBSD
Rod Person wrote: On Thursday 06 November 2003 09:09 am, It was written: If you futz with getting Kylix to run under FreeBSD, don't forget the special glibc requirements that some versions of Kylix have. Maybe you should simply replace the entire /compat userland with the userland of a distro that Kylix supports, _with_ Kylix's extra patches installed? Have you tried this? Since Kylix came out I have tried to get it to run on FreeBSD and various Linux distros. A few days ago I got Kylix to run on SuSE 8.2 (from the Kylix newsgroups, this seems to be the best distro for it). NetBSD's Linux emulation is based on SuSE, isn't it? But I found no postings related to Kylix on NetBSD. My next question is whether NetBSD's Linux emulation would run under FreeBSD, and whether that would run Kylix. Since all new development in Kylix is apparently officially stalled, now would be a good time to do the porting work, since it's no longer a moving target... http://www.linuxworld.com.au/nindex.php?id=122384005eid=-50 -- Terry
Re: FreeBSD 5.1-p10 reproducible crash with Apache2
Mike Silbersack wrote: On Wed, 5 Nov 2003, Branko F. Gračnar wrote: I tried today with yesterday's -CURRENT. Same symptoms. No kernel panic, just lockup. Ok, submit a PR with clear details on how to recreate the problem, and we'll see if someone can take a look into it. I'm too busy to look at it, but at least putting it in a PR will ensure that it doesn't get too lost. Once the PR is filed, you might want to try asking on the freebsd-threads list; it sounds like the issue might be thread-related. (Note that your original e-mail might contain enough detail, I'm not certain; I just skimmed it. Filing a good PR is important either way; mailing list messages easily get lost.) Is gdb good enough in FreeBSD that you can break to the kernel debugger with GDB enabled, and dump out the stacks for all threads currently in the kernel for all processes? The way to find this, if it's a threads-related issue, is to do exactly that, and then look to see if there's something like a close() in one thread of an fd that is being used in a blocking operation in another thread. -- Terry
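The pattern worth looking for is one thread calling close() on a descriptor another thread is blocked on; whether the blocked call wakes up at all varies by OS, and the descriptor number can be reused by a later open() underneath it. A minimal sketch of a safer sequence, assuming POSIX threads and a socket (the names are hypothetical): shutdown() makes the blocked recv() return 0, and close() happens only after the reader thread has been joined.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct reader_arg {
    int fd;
    ssize_t result;   /* what the blocked recv() eventually returned */
};

static void *reader(void *p)
{
    struct reader_arg *ra = p;
    char buf[64];

    /* Blocks until data arrives, the socket is shut down, or an error. */
    ra->result = recv(ra->fd, buf, sizeof(buf), 0);
    return NULL;
}

/* Returns what the reader's recv() returned: 0 (EOF) is expected,
 * -2 means the demo itself could not be set up. */
static ssize_t wake_blocked_reader_demo(void)
{
    int sv[2];
    pthread_t tid;
    struct reader_arg ra;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -2;
    ra.fd = sv[0];
    ra.result = -2;
    if (pthread_create(&tid, NULL, reader, &ra) != 0)
        return -2;

    /* Risky alternative: close(sv[0]) here.  The reader may stay blocked,
     * or the fd number may be handed out again before the reader uses it.
     * Instead, shut the socket down so the blocked recv() returns 0,
     * wait for the reader, and only then release the descriptor. */
    shutdown(sv[0], SHUT_RDWR);
    pthread_join(tid, NULL);

    close(sv[0]);
    close(sv[1]);
    return ra.result;
}
```

When inspecting kernel stacks for a lockup like the one reported, a thread sleeping in a read or accept on a descriptor that another thread has already closed is the signature this sequence avoids.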
Re: spambouncer tags much freebsd list mail as spam
C. Kukulies wrote: I installed the spambouncer.org procmail script, and before switching the behaviour from SILENT to COMPLAIN I took a look at my spam.incoming folder and found a lot of messages from freebsd-bugs and freebsd-mobile in there. Neither list is directed to a folder before spambouncer comes into effect, so they are trapped by spambouncer, and I suspect that other freebsd lists would be trapped as well. Has anyone experienced something similar? The spambouncer script makes the same incorrect assumption that the EarthLink SPAM stuff makes, which is that any mail not explicitly addressed to you is SPAM. In other words, they expect mailing lists to violate the draft RFC that prohibits header rewriting by mailing lists, and they expect all mailing list servers to eat the overhead of expanding each message to a single-recipient message, instead of sending the messages in bulk if the destination domain is identical. At least the spambouncer script can be modified to respect the Sender: header, which EarthLink fails to respect in their list of allowed senders. Respecting Sender: is pretty much how you should modify the spambouncer code to handle mailing lists (and how EarthLink should modify their anti-SPAM system, as well, but probably won't). -- Terry
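As a rough illustration of respecting the Sender: header, here is a hypothetical C function (spambouncer itself is a procmail script, and nothing here is taken from it) that treats a message as mailing-list traffic when a Sender: header names an address on a whitelist of list senders. It assumes the header block is a single NUL-terminated string with LF or CRLF line endings, and it ignores folded continuation lines; the whitelist address is a placeholder.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Case-insensitive search for 'needle' within the n bytes at 's'. */
static int contains_ci(const char *s, size_t n, const char *needle)
{
    size_t k = strlen(needle);

    for (size_t i = 0; k <= n && i <= n - k; i++)
        if (strncasecmp(s + i, needle, k) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: returns 1 if any "Sender:" header line mentions
 * one of the NULL-terminated 'list_senders', 0 otherwise.  Header names
 * compare case-insensitively; scanning stops at the blank line that ends
 * the headers, so a "Sender:" line in the body is ignored. */
static int is_list_mail(const char *headers, const char *const *list_senders)
{
    const char *line = headers;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len == 0 || (len == 1 && line[0] == '\r'))
            break;                          /* end of headers */

        if (len > 7 && strncasecmp(line, "Sender:", 7) == 0) {
            for (size_t i = 0; list_senders[i] != NULL; i++)
                if (contains_ci(line + 7, len - 7, list_senders[i]))
                    return 1;
        }
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return 0;
}

/* Example whitelist; the address is a placeholder, not a real list. */
static const char *const demo_list_senders[] = {
    "owner-list@example.org",
    NULL
};

static int demo_list_check(const char *headers)
{
    return is_list_mail(headers, demo_list_senders);
}
```

A filter would run this check before the "not explicitly addressed to you" test, so list traffic is never judged by its To: or Cc: headers.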
Re: O_NOACCESS?
andi payn wrote: Now hold on. The standard (by which you mean POSIX? or one of the UNIX standards?) doesn't say that you can't have an additional flag called O_NOACCESS with whatever value and meaning you want. A strictly conforming implementation cannot expose things into the namespace that are not defined by the standard to be in the namespace, when a manifest constant specifying a conformance level is in scope. This is the main reason for things like _types.h. Obviously, code that relies on such a flag will be non-portable, since no standard defines such a flag, but that's fine, since the intended uses (writing a FreeBSD-specific backend for fam, for example) aren't expected to be portable anyway. Not just non-portable; it fails to conform to standards. If O_NOACCESS happens to be == O_ACCMODE on FreeBSD--just as it is on Linux--and if that happens to also be == O_WRONLY | O_RDWR (with no other flags set), I don't see how that changes anything. Other than the security issues it raises, you mean, right? [ ... ] In which case, your example is (O_RDWR|O_WRONLY) == O_RDWR. The standard does not indicate whether the implementation is to use bits, or sequential manifest constants, only that the bits that make up the constants be in the range covered by O_ACCMODE. First, again, this is intended to be used for non-portable code, and therefore, the fact that this happens not to be true on FreeBSD means it's irrelevant that it could be true elsewhere. Especially since, if O_NOACCESS were added to FreeBSD, it would still fail to exist entirely on other platforms, which means it matters little what value it might have if it did exist--code written to use O_NOACCESS won't compile on platforms without O_NOACCESS. You need to look at the implementation of VOP_OPEN in FreeBSD; specifically, you need to look at the fact that fp->f_flags is passed as one of its parameters, and that the FS is permitted to interpret these flags in an FS-dependent fashion.
Then you need to look at the fact that FreeBSD supports loadable FS types, and that there are third-party FSes that proxy operations over the network, which can include a network version of the flags, and conversion back and forth could therefore end up being ambiguous. Really, it needs to be a bitmap internally in FreeBSD, as well, but that's a big step. Second, any platform that defines O_NOACCESS could do so differently. On FreeBSD, as on Linux, the most sensible definition is O_NOACCESS == O_WRONLY | O_RDWR == 3. On a platform that defined O_RDONLY as 1 and O_WRONLY as 2, the most sensible definition would be O_NOACCESS == 0. I pray this flag never gets adopted outside of Linux... In fact, externally, they are bits, but internally, in the kernel, they are manifest constants. Yes, FFLAGS and OFLAGS convert between the two. If you look at how this works in the Linux kernel, you'll see that O_RDONLY (0) converts to FREAD (1); O_WRONLY (1) to FWRITE (2); O_RDWR (2) to FREAD | FWRITE (3); and O_NOACCESS (3) to 0. This could be done the same way in FreeBSD.* * Actually, this is a tiny lie; Linux has a 2-bit internal access flags value which it derives in this way, and uses the original passed-in flags for everything except access. FreeBSD instead just adds 1, relying on the fact that the lower 2 bits will never be 3, and therefore all of the other bits will stay the same. This means that enabling this value would make the FFLAGS and OFLAGS macros slightly more complicated on FreeBSD. It would be more useful to intern them as a bitmap, IMO, and get rid of the conversion. The problem is compatibility with historical source code passing literal constants instead of manifest values. The most useful thing you could do with this, IMO, is open a directory for fchdir(). Except that you can already do exactly this with chdir().
But I can see that you might at some point want to check the directory before chdir'ing to it, or pass an fd down into some function instead of a string, and this would be useful in such a case. Or deal with issues of privilege granted merely by open. For example, on FreeBSD, an implementation of this would permit any normal user to do INB/OUTB to any I/O port on any hardware on the machine. This is a can of worms. Of course, allowing this on directories for which you are normally denied read/write permissions would be a neat way to escape from chroot'ed environments and compromise a host system... How would it allow that? If you can open files outside your chroot environment--even files you would otherwise have read access to--it's not much of a chroot! Mounted procfs within a chrooted environment. Admittedly, FreeBSD is moving away from procfs, but on Linux, it's a serious issue, since such basic utilities as ps and so on won't work without it. Having O_NOACCESS would be useful for the fam port, for porting pieces of lilo, and probably for other things I
Re: O_NOACCESS?
M. Warner Losh wrote: Rewind units on tape drives? If there's no access check done, and I open the rewind unit as joe-smoe? The close code is what does the rewind, and you don't have enough knowledge there to know whether the tape was opened r/w. Which brings up the idea of passing fp->fd_flags to VOP_CLOSE()... -- Terry
Re: O_NOACCESS?
andi payn wrote: As far as I can tell, FreeBSD doesn't have anything equivalent to Linux's O_NOACCESS (which is not in any of the standard headers, but it's equal to O_WRONLY | O_RDWR, or O_ACCMODE). In Linux, this can be used to say "give me an fd for this file, but don't try to open it for reading or writing or anything else." The standard does not permit this. First off, O_ACCMODE is a bitmask, and is guaranteed to be inclusive of the bits for O_RDONLY, O_WRONLY, and O_RDWR, but *not* guaranteed to not be inclusive of additional bits, reserved or locally defined but outside the _POSIX_SOURCE namespace. By using this value as a parameter, you could very well be setting many more bits, and you could be setting bits for local implementation options that you really, really do not want to set. Second, the standard is ambiguous as to how O_RDWR is defined; it is perfectly permissible to define these values as:

    #define O_RDONLY 1                    /* the read bit */
    #define O_WRONLY 2                    /* the write bit */
    #define O_RDWR   (O_RDONLY|O_WRONLY)  /* read + write */

In which case, your example is (O_RDWR|O_WRONLY) == O_RDWR. The standard does not indicate whether the implementation is to use bits, or sequential manifest constants, only that the bits that make up the constants be in the range covered by O_ACCMODE. In fact, externally, they are bits, but internally, in the kernel, they are manifest constants. This allows you to get an fd to pass to fcntl (e.g., for dnotify), or call ioctls on, etc.--even if you don't have either read or write access to the file. The obvious question is, Why should this ever be allowed? Well, if you can stat the file, why can't you, e.g., ask kevent to monitor it? The most useful thing you could do with this, IMO, is open a directory for fchdir(). Of course, allowing this on directories for which you are normally denied read/write permissions would be a neat way to escape from chroot'ed environments and compromise a host system...
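Opening a directory to get a descriptor for fchdir() already works without any new flag, provided you have read permission on the directory. A minimal sketch, assuming a POSIX system (the function name is hypothetical): save the current directory as a descriptor, chdir() elsewhere, and come back with fchdir(). The descriptor is also what you would pass down to another function instead of a path string.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

/* Saves the current directory as a descriptor, chdir()s to 'elsewhere',
 * then returns with fchdir().  Returns 1 if we ended up back where we
 * started, 0 otherwise. */
static int fchdir_roundtrip(const char *elsewhere)
{
    char before[PATH_MAX], after[PATH_MAX];
    int ok = 0;
    int dfd;

    if (getcwd(before, sizeof(before)) == NULL)
        return 0;

    /* O_RDONLY on a directory requires read permission on it: exactly
     * the restriction the O_NOACCESS proposal wants to bypass. */
    dfd = open(".", O_RDONLY);
    if (dfd < 0)
        return 0;

    if (chdir(elsewhere) == 0 && fchdir(dfd) == 0 &&
        getcwd(after, sizeof(after)) != NULL)
        ok = strcmp(before, after) == 0;

    close(dfd);
    return ok;
}
```

The security argument in the thread is about the case this sketch cannot handle: a directory you may search but not read, where handing out a descriptor anyway is what would need new kernel support.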
In FreeBSD, this doesn't work; you just get EINVAL. Having O_NOACCESS would be useful for the fam port, for porting pieces of LILO, and probably for other things I haven't thought of yet. (I believe that either this was added to Linux to support LILO, or the open syscall just happened to work this way, and once the LILO developers discovered this and took advantage of it, it's been retained that way ever since to keep LILO working.) The latter is most likely. In any case, this would not be allowed by GEOM for the purpose to which LILO is trying to put it, unless you were to modify GEOM to add a control path for parents of already opened devices. If you did this, you might as well just add a proper set of abstract fcntls to GEOM, get rid of all the raw disk crap in user space, and unbreak disklabel and the other stuff that GEOM broke when it went in. On the other hand, BSD has done without it for many years, and there's probably a good reason it's never been added. So, what is that good reason?

    fcntl.h: #define FFLAGS(oflags) ((oflags) + 1)

I don't think there's a backwards-compatibility issue. Unfortunately, yes, there is. The values are not bits, internally to the kernel. The conversion to internal form merely adds 1, it doesn't shift the values. -- Terry
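The arithmetic behind the conversions argued about in this thread can be checked directly. A minimal sketch using local copies of the constants (the DEMO_ and ALT_ names are hypothetical, chosen so nothing clashes with real headers): with the sequential encoding O_RDONLY=0, O_WRONLY=1, O_RDWR=2, FreeBSD's FFLAGS adds 1, while the Linux-style conversion described above adds 1 and keeps only the two access bits, which is what lets 3 map to 0. It also shows the alternative bit encoding, under which O_RDWR|O_WRONLY collapses to O_RDWR.

```c
/* Sequential encoding, as used by FreeBSD and Linux. */
#define DEMO_O_RDONLY   0
#define DEMO_O_WRONLY   1
#define DEMO_O_RDWR     2
#define DEMO_O_ACCMODE  3

/* Kernel-internal access bits, as in BSD's FREAD/FWRITE. */
#define DEMO_FREAD      1
#define DEMO_FWRITE     2

/* FreeBSD-style FFLAGS/OFLAGS: add or subtract 1.  This is safe only
 * because the access field is never 3; 3 + 1 == 4 would carry into
 * bit 2, which on FreeBSD is O_NONBLOCK. */
static int demo_fflags_bsd(int oflags) { return oflags + 1; }
static int demo_oflags_bsd(int fflags) { return fflags - 1; }

/* Linux-style: add 1 and keep only the two access bits, so the value 3
 * becomes 0 ("no access").  The other flag bits are taken from the
 * original open flags, not from this result. */
static int demo_fmode_linux(int oflags)
{
    return (oflags + 1) & DEMO_O_ACCMODE;
}

/* A bit encoding the standard would also permit: read and write are
 * independent bits, and O_RDWR is their union. */
#define ALT_O_RDONLY 1
#define ALT_O_WRONLY 2
#define ALT_O_RDWR   (ALT_O_RDONLY | ALT_O_WRONLY)

/* "O_WRONLY | O_RDWR" is 3 (the Linux O_NOACCESS value) under the
 * sequential encoding, but is simply O_RDWR under the bit encoding. */
static int demo_noaccess_seq(void) { return DEMO_O_WRONLY | DEMO_O_RDWR; }
static int demo_noaccess_alt(void) { return ALT_O_WRONLY | ALT_O_RDWR; }
```

The carry from 3 to 4 is the concrete reason enabling an O_NOACCESS value would make FreeBSD's FFLAGS and OFLAGS macros more complicated than a plain add and subtract.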
Re: kevent and related stuff
andi payn wrote: First, let me mention that I'm not nearly as experienced coding for *BSD as for Linux, so I may ask some stupid questions. I've been looking at the fam port, and this has brought up a whole slew of questions. I'm not sure if all of them are appropriate to this list, but I don't know who else to ask, so... First, some background: On Irix and Linux, fam works by asking the kernel to send it a signal whenever the specified accesses occur. On FreeBSD, since there is no imon interface and no dnotify fcntl, it instead works by periodically stat()ing all of the files it's watching--which is obviously not as good. The fam FAQ suggests that FreeBSD users should adapt fam to use the kevent interface. Yes. The file access monitor tool is the classic argument. I looked into kevent, and it seems like there are a number of problems that lead me to suspect that this is a really stupid idea. And yet, I'd assume that someone on the fam team at SGI and/or one of the outside fam developers would know FreeBSD at least as well as me. Therefore, I'm guessing I'm missing something here. So, any ideas anyone can offer would be very helpful. So, here are the questions I have: * I think (but I'm not sure) that kevent doesn't notify at all if the only change to a file is its ATIME. If I'm right, this makes kevent completely useless for fam. Adding a NOTE_ACCESS or something similar would fix this. Since I'm pretty new to FreeBSD: What process do I go through to figure out whether anyone else wants this, whether the interface I've come up with is acceptable, etc.? And, once I write the code, do I submit it as a PR? You add it, submit it as a PR (if send-pr works properly from your machine), discuss it on the lists, and if someone with a commit bit has the time and likes the idea, it will be committed. * The kevent mechanism doesn't monitor directories in a sufficient way to make fam happy.
If you change a file in a directory that you're watching, unlike imon or dnotify, kevent won't see anything worth reporting at all. This means that for directory monitoring, kevent is useless as-is. Again, if I wanted to patch kevent to provide this additional notification, would others want this? I'm not sure that this is correct, unless you are talking about monitoring all files in a directory by merely monitoring the directory. If you make a modification to the file metadata (e.g. add a link or rename it), then you will be notified that the directory has changed. The argument against subhierarchy monitoring is that it will, by definition, stop at the directory level, and it cannot be successfully implemented for all FS types. * When two equivalent events appear in the queue, kevent aggregates them. This means that if there are two updates to a file since the last time you checked, you'll only see the most recent one. For some uses of fam (keeping a folder window up to date), this is what you want; for others (keeping track of how often a file is read), this is useless. The only solution I can think of is to add an additional flag, or some other way to specify that you want duplicated events. This is the classic edge-triggered vs. level-triggered argument that Linux people bring up every time someone suggests they implement kqueue in Linux. This is easily fixable: you separate the flag from the data, adding an additional argument to KNOTE(). This also has the side effect of removing the restriction on the PID size, which is imposed by the limited number of bits left over for representing the PID. This is a trivial change, and I've done it several times. The way this works is that you establish, via definition of the udata argument, a contract between the kernel and the user space over what the udata means.
The additional argument to KNOTE can then be used by the per-event note-handling code in the kernel to fill out a udata structure with as much data as you want to give it, and to identify the place in user space to copy it out to. For example, you could set up an accept filter to accept up to 10 connections at a time, return the fds into the user space structure's int[10] array, and fill out the int count value with how many were returned. For your case, you could use it to copy out each and every event instance, rather than aggregating the events. * Unlike imon and dnotify, kevent doesn't provide any kind of callback mechanism; instead, you have to poll the queue for events. Would it be useful to specify another flag/parameter that would tell the kernel to signal the monitoring process whenever an event is available? (It would certainly make the fam code easier to write--but if it's not useful anywhere else, that's probably not enough.) You can SIGPOLL on the event descriptor returned by kqueue(). You can use it in a select() or poll() call. You can pass it to another kqueue() as an EVFILT_READ event. Sending signals (callbacks) is probably the
Re: non-root process and PID files
Nielsen wrote: Christopher Vance wrote: May I suggest a different feature: the ability to mark an open file (not just its fd) 'remove on close', with permission checked at mark time rather than close time (this status forgotten if not permitted when set) and the unlink actually done at close time only if the file has exactly one link and one open file instance at that time. WinNT (2K etc...) has this capability. Not saying that this makes it a good idea though. In all Windows-supported FSes, there is no separation between the inode and the directory entry referencing the inode, so you'd expect them to be able to do this from an open file reference, since they can always get the directory entry back. In response to another post: saving the path at startup time is no good, since if someone moves the file (if it's open) or removes it preemptively (and it's closed), then there's a race window in which another instance of the program may start, and the program exiting removes the wrong file (or gets a permission denied error). All this hacking to solve the problem of harmless old files lying around in /var/run is fraught with peril... -- Terry
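The objection to saving the path at startup can be made concrete. A minimal sketch (hypothetical helper names, POSIX calls only) of roughly the best a user-space "remove on close" can do: before unlinking the saved path, compare its device and inode numbers with the still-open descriptor, and refuse if they differ or the file has extra links. This narrows the window but does not close it; the name can still be swapped between the lstat() and the unlink(), which is exactly the race described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: unlink 'path' only if it still names the file
 * open on 'fd' and that file has exactly one link, then close 'fd'.
 * Returns 1 if the file was removed, 0 if left alone, -1 on error. */
static int close_and_remove_if_same(int fd, const char *path)
{
    struct stat by_fd, by_path;
    int removed = 0;

    if (fstat(fd, &by_fd) < 0) {
        close(fd);
        return -1;
    }
    /* lstat(): do not follow a symlink someone may have planted. */
    if (lstat(path, &by_path) == 0 &&
        by_path.st_dev == by_fd.st_dev &&
        by_path.st_ino == by_fd.st_ino &&
        by_fd.st_nlink == 1) {
        /* RACE: the name can still be replaced right here. */
        removed = unlink(path) == 0;
    }
    close(fd);
    return removed;
}

/* Demo: create a PID-style file; optionally simulate someone moving it
 * aside and a second instance creating a fresh file under the same
 * name.  Returns what close_and_remove_if_same() reported. */
static int remove_on_close_demo(int replaced)
{
    const char *p = "/tmp/gr_demo_roc.pid";
    const char *q = "/tmp/gr_demo_roc.old";
    int fd, other = -1, r;

    fd = open(p, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    if (replaced) {
        if (rename(p, q) < 0) {
            close(fd);
            return -1;
        }
        other = open(p, O_CREAT | O_EXCL | O_WRONLY, 0644);
        if (other < 0) {
            close(fd);
            return -1;
        }
    }
    r = close_and_remove_if_same(fd, p);
    if (replaced) {
        close(other);
        unlink(p);
        unlink(q);
    }
    return r;
}
```

In the replaced case the check correctly leaves the second instance's file alone; what no user-space check can do is make the comparison and the unlink atomic.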
Re: freeing data segment
Vinod R. Kashyap wrote: I have this huge data structure in the data segment of my SCSI driver. This data structure is initialized at driver build time, and is used only during driver initialization. I am trying to find out if I can free up the memory it occupies, once I am done with the driver initialization. Does anyone know how to do this? You can either implement it in a separate ELF section that's marked init-only (this is a defined ELF attribute), and then fix FreeBSD to honor discarding of such sections (FreeBSD doesn't implement very much of the capabilities of ELF), or... You can make two drivers, make the init driver depend on the other driver, load the init driver, have its init routine call an entry point in the other driver to give it a callback into itself, do the callback to do the actual initialization, and then unload the second driver. Convoluted, but it works (I used it for a firmware download in a GigE driver at one point). -- Terry
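The first option starts with getting the init-only data into a section of its own. A minimal user-space sketch, assuming GCC or Clang and a GNU-compatible linker (GNU ld, gold, or lld), which synthesize __start_<name> and __stop_<name> symbols for any section whose name is a valid C identifier; FreeBSD's linker sets in <sys/linker_set.h> are built on the same trick. The section and table names are hypothetical, and nothing here frees memory: actually discarding the section after initialization is the part the kernel's loader would have to learn.

```c
#include <stddef.h>

/* Hypothetical init-time table, e.g. register/value pairs. */
struct init_entry {
    int reg;
    int value;
};

/* Place the table in its own "drvinit" section; 'used' stops the
 * compiler from discarding it as unreferenced. */
static const struct init_entry init_table[]
    __attribute__((section("drvinit"), used)) = {
    { 0x10, 1 },
    { 0x14, 2 },
    { 0x18, 3 },
};

/* Synthesized by the linker; they bracket the whole section. */
extern const struct init_entry __start_drvinit[];
extern const struct init_entry __stop_drvinit[];

/* Number of entries, computed from the section bounds alone. */
static size_t init_table_count(void)
{
    return (size_t)(__stop_drvinit - __start_drvinit);
}

/* Walks the section; the sum stands in for "program the hardware
 * from each entry". */
static int init_table_sum(void)
{
    int sum = 0;

    for (const struct init_entry *e = __start_drvinit; e < __stop_drvinit; e++)
        sum += e->value;
    return sum;
}
```

Because the init code only ever reaches the data through the section bounds, a loader that unmapped or reused the section's pages after initialization would not need to know anything about the table's layout.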
Re: non-root process and PID files
Christopher Vance wrote: You can already mark an fd 'close on exec'. May I suggest a different feature: the ability to mark an open file (not just its fd) 'remove on close', with permission checked at mark time rather than close time (this status forgotten if not permitted when set) and the unlink actually done at close time only if the file has exactly one link and one open file instance at that time. If all you have is an fd, you cannot get from an fd to a path without an exhaustive search of the disk, in most FSes. Also, leaving the path present permits someone to hard-link it to some other file, to make it stay around. Since /var has a /var/tmp, this would be a real danger, I think. -- Terry
Re: Fine grained locking (was: FreeBSD mail list etiquitte)
Robert Watson wrote: On Sat, 25 Oct 2003, Matthew Dillon wrote: It's a lot easier lockup path than the direction 5.x is going, and a whole lot more maintainable IMHO because most of the coding doesn't have to worry about mutexes or LORs or anything like that. You still have to be pretty careful, though, with relying on implicit synchronization, because while it works well deep in a subsystem, it can break down on subsystem boundaries. One of the challenges I've been bumping into recently when working with Darwin has been the split between their Giant kernel lock, and their network lock. To give a high-level summary of the architecture, basically they have two Funnels, which behave similarly to the Giant lock in -STABLE/-CURRENT: when you block, the lock is released, allowing other threads to enter the kernel, and regained when the thread starts to execute again. They then have fine-grained locking for the Mach-derived components, such as memory allocation, VM, et al. Deep in a particular subsystem -- say, the network stack -- all works fine. The problem is at the boundaries, where structures are shared between multiple compartments. For example, process credentials are referenced by both halves of the Darwin BSD kernel code, and are insufficiently protected in the current implementation (they have a write lock, but no read lock, so it looks like it should be possible to get stale references with pointers accessed in a read form under two different locks). Similarly, there's the potential for serious problems at the surprisingly frequently occurring boundaries between the network subsystem and the remainder of the kernel: file-descriptor-related code, fifos, BPF, et al. By making use of two large subsystem locks, they do simplify locking inside the subsystem, but it's based on a web of implicit assumptions and boundary synchronization that carries most of the risks of explicit locking. Are you maybe looking at the Jaguar (10.2) code base here?
I personally fixed a number of these types of issues. Mostly, these types of issues come down to the fact that funnels, like the BGL in FreeBSD, are actually code locks, not data locks. FWIW, it looks like the 10.3 (Panther) code is available from the Darwin (7.0) project now: http://developer.apple.com/darwin/ Mixing code and data locks is where most problems happen; it's one of the reasons that the BGL and the push-down to fine-grained locking in FreeBSD is, I think, the wrong way to go. The destination is right, but the path is probably wrong, and things will get much worse with the FreeBSD approach before they get any better. Part of the problem is that FreeBSD's mutex implementation is complicated, and permits recursion (thereby tacitly encouraging recursion). I personally think that should result in a panic -- though you could maybe make an argument for it on the basis of pool mutexes, where the same mutex pool is used for all your locks. But FreeBSD doesn't really do that effectively, so it's kind of a lose/lose situation. It's really, really hard to put multiple code locks into a kernel, and get things right. That's the problem with the FreeBSD approach: having a BGL in the first place implicitly turns all your data locks that have to coexist with the BGL code lock into code locks as well. I think the right approach would have been to start with a single pool mutex implementation, allowing recursion, and a macro wrapper for grabbing/releasing locks, so that the pool locks could later be replaced with per-data-item locks, and the per-data-item locks would *disallow* recursion. A smart approach to that would be to have a structure prefix on all kernel structures, such that you could abuse a cast and use a single mutex implementation for all data items.
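The suggested path (one pool of mutexes first, hidden behind a macro wrapper, so individual data items can later get their own locks) might look like this in user space. A minimal sketch assuming POSIX threads; the names, pool size, and address hash are hypothetical, and pthread recursive mutexes stand in for pool locks that allow recursion. Call sites only use OBJ_LOCK()/OBJ_UNLOCK() on an object pointer, so migrating to per-item locks means changing the macros and the common header, not the callers.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>

#define LOCK_POOL_SIZE 64   /* hypothetical; must be a power of two */

static pthread_mutex_t lock_pool[LOCK_POOL_SIZE];
static pthread_once_t lock_pool_once = PTHREAD_ONCE_INIT;

static void lock_pool_init(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    /* Pool locks tolerate recursion: two unrelated objects may hash to
     * the same mutex and be locked by the same thread. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    for (int i = 0; i < LOCK_POOL_SIZE; i++)
        pthread_mutex_init(&lock_pool[i], &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Map an object's address to a pool slot. */
static pthread_mutex_t *pool_lock_for(const void *obj)
{
    uintptr_t a = (uintptr_t)obj;

    pthread_once(&lock_pool_once, lock_pool_init);
    return &lock_pool[(a >> 4) & (LOCK_POOL_SIZE - 1)];
}

/* Common prefix for every lockable structure.  Unused while locks come
 * from the pool; later it can hold a per-item, non-recursive mutex. */
struct obj_hdr {
    int reserved;
};

#define OBJ_LOCK(p)   pthread_mutex_lock(pool_lock_for(&(p)->hdr))
#define OBJ_UNLOCK(p) pthread_mutex_unlock(pool_lock_for(&(p)->hdr))

/* Example object: the header is the first member, so a pointer to the
 * object can be cast to struct obj_hdr * by generic code. */
struct counter {
    struct obj_hdr hdr;
    long value;
};

static void *bump(void *p)
{
    struct counter *c = p;

    for (int i = 0; i < 100000; i++) {
        OBJ_LOCK(c);
        c->value++;
        OBJ_UNLOCK(c);
    }
    return NULL;
}

/* Four threads increment one counter; returns the final value. */
static long pool_lock_demo(void)
{
    static struct counter c;
    pthread_t t[4];

    c.value = 0;
    for (int i = 0; i < 4; i++)
        if (pthread_create(&t[i], NULL, bump, &c) != 0)
            return -1;
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return c.value;
}
```

The design point is that the recursion tolerance lives only in the pool; once a structure gets its own mutex in obj_hdr, that mutex can be created non-recursive so that accidental re-entry shows up as an error rather than silently succeeding.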
It's also worth noting that there have been some serious bugs associated with a lack of explicit synchronization in the non-concurrent kernel model used in RELENG_4 (and a host of other early UNIX systems relying on a single kernel lock). These have to do with unexpected blocking deep in a function call stack, where it's not anticipated by a developer writing source code higher in the stack, resulting in race conditions. In the past, there have been a number of exploitable security vulnerabilities due to races opened up in low memory conditions, during paging, etc. I think this is a general problem. I noted early on that FreeBSD was going to have issues in this regard once there was real kernel threading support. The KSE model is much less prone to triggering the race conditions, but any model where the same process can be in the kernel multiple times is problematic. Particularly since the BSD code doesn't really support the concept of cancellation of a blocking system call in progress (you don't have a means of doing the wakeup to fail the tsleep's with a recognizable error code that could mean cancellation to back out state). It's
Re: non-root process and PID files
Leo Bicknell wrote: Dan Langille wrote: Any suggestions? Here's a slightly backwards concept. We're all familiar with how you can open a file, remove it from the directory, and not have it go away until the application closes it. Well, extend those semantics to the namespace. That is, have a directory where any name that does not exist can be opened RW, any name that does exist can be opened RO. A file is automatically removed when no one has an open descriptor to it anymore. This is a somewhat neat idea. However, it would open a pretty big race window, and you could denial-of-service a server by creating a PID file belonging to some server, and leaving it there with a bogus PID in it, and anything that was watching the file R/O to kill -0 it to check if the process needs to be restarted would always think the process needs to be restarted. 8-). Basically, all your processes would end up needing to be SUID root, at least initially, which would mean breaking most mail server software. They'd need that so that you could deny any create except by root to keep ordinary users from DoS'ing a daemon. -- Terry
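The existing semantics Leo builds on are easy to demonstrate: once a file is unlinked, its name is gone, but the open descriptor keeps the inode alive, with a link count of zero, until the last close. A minimal sketch, assuming POSIX and a writable /tmp; the file name and function name are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if, after unlink(), the name is gone, the link count is 0,
 * and the data is still readable through the descriptor; 0 otherwise. */
static int unlinked_but_open_demo(void)
{
    const char *path = "/tmp/gr_demo_unlinked";
    const char msg[] = "12345\n";           /* e.g. a PID */
    char buf[sizeof(msg)] = {0};
    struct stat st;
    int ok = 0;
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

    if (fd < 0)
        return 0;
    if (write(fd, msg, sizeof(msg) - 1) == (ssize_t)(sizeof(msg) - 1) &&
        unlink(path) == 0 &&
        access(path, F_OK) != 0 &&           /* the name is gone */
        fstat(fd, &st) == 0 && st.st_nlink == 0 &&
        pread(fd, buf, sizeof(msg) - 1, 0) == (ssize_t)(sizeof(msg) - 1))
        ok = memcmp(buf, msg, sizeof(msg) - 1) == 0;

    close(fd);                               /* storage is released here */
    return ok;
}
```

Note what the demo also shows about Terry's objection: an unlinked file has no name, so nothing else can find it to read the PID, while a name that stays visible is exactly what lets an unprivileged user squat on it.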
Re: Some mmap observations compared to Linux 2.6/OpenBSD
Ted Unangst wrote: On Fri, 24 Oct 2003, Michel TALON wrote: What is more interesting is to look at the actual benchmark results in http://bulk.fefe.de/scalability/, in particular the section about mmap benchmarks, the only one where OpenBSD shines. However as soon as touching pages is benchmarked OpenBSD fails very much. look closer. openbsd's touch page times are identical to what you'd expect a disk access to be. the pages aren't cached, they're read from disk. so compared to systems that don't read from disk, it looks pretty bad. a 5 line patch to fix the benchmark so that the file actually is cached on openbsd results in performance much in line with freebsd/linux. Why does the benchmark need to be fixed for OpenBSD and not for any other platform? My point here is that a benchmark measures what it measures, and if you don't like what it measures, making it measure something else is not a fix for the problem that it was originally intended to show. Microbenchmarks are pretty dumb, in general, because they are not representative of real world performance on a given fixed load, and are totally useless for predicting performance under a mixed load. That said, if this microbenchmark bothers you, fix OpenBSD. I know that Linux has some very good scores on LMBench, and that optimizing the code to produce good numbers on that test set has pessimized performance in a number of areas under real world conditions. Unless there's an obvious win to be had without additional cost, it's best to take the numbers with a grain of salt. THAT said, it's probably a good idea for the other BSD's to use the red/black code from OpenBSD as a guide for their own code. -- Terry
Re: FreeBSD mail list etiquette
John-Mark Gurney wrote: Wes Peters wrote this message on Thu, Oct 23, 2003 at 01:43 -0700: Kip Macy, other DragonFlyBSD developers, and anyone else wishing to contribute are invited to join and participate in the open FreeBSD mail lists, sharing code, design information, research and test results, etc. according to their own will. We welcome input from everyone, including constructive criticism of weaknesses or flaws in FreeBSD. And patches (against FreeBSD) are highly encouraged. It rarely helps to simply point out flaws (or showing how X OS runs soo much better than FreeBSD, why are you guys even running FreeBSD?) w/o showing code to fix it. Note: I am not speaking as an official representative of FreeBSD, just as a developer who has too little time to try to code up a patch for code I haven't seen. And considering that DragonFlyBSD is based upon FreeBSD, coming up with said patches should be trivial. First off, I really appreciate the mmap() discussion which has taken place. Someone has done a lot of work to create benchmarks, which, while being microbenchmarks, are a hell of a lot more useful than most of their kind. Further, they've pointed out where to get code to get comparable results in FreeBSD, licensed under a two clause BSD license, which means the only issue facing anyone is one of trivial integration. Second, Kip Macy and Matt Dillon have done some excellent work on the checkpointing code. It's basically ELF-based, and requires only small changes to the exec to set up the process for being able to be checkpointed and restarted. Again, the license is a two clause BSD license, and again, the only work necessary to get this over to FreeBSD is integration. When someone offers you a gift, you don't jump down their throat with jack-boots on, complaining about how the gift is wrapped or what color it is; you shut the hell up about any complaints and say "Thank you." If the wrapping bothers you, well, you're going to remove it anyway.
If the color bothers you, wait until they leave and paint the damn thing. If they come for a visit, they will be much more likely to be happy that you put it on display on the mantel than unhappy that you changed its color. Frankly, FreeBSD has too many cooks, and not enough bottle washers; this is a euphemism for saying that all anyone with a commit bit seems to want to do any more is write new code, and no one is willing to take on the integration and maintenance tasks. In Linux, this work is done by Linus, Alan Cox, and a couple of other people. People get commit bits so that they can do integration, and so that patches don't sit in bug databases for 6 years unintegrated. The problem with this imbalance is that you seem to be unwilling to hire bottle washers, and people willing to wash bottles when there are no clean bottles left are never given any respect, and certainly not the level of respect accorded to cooks. You guys need to get your heads out, and give out some commit bits to some people willing to do the dirty work of integration of the code people are donating, and of closing out bug database entries where code is provided, and writing code that demonstrates the bug database problem and coming up with a fix and integrating *that*, where patches aren't provided. -- Terry
Re: Benchmarking kqueue() performance?
Lev Walkin wrote: One of the most comprehensive sites about that problem is: http://www.kegel.com/c10k.html That's about scaling to a large number of connections, not about kqueue() vs. select performance. The biggest problem with a large number of connections, at least as far as FreeBSD is concerned, is the TCP timer implementation using a callout wheel, since any expiring timer has to traverse every bucket in the chain, instead of stopping at the first one that's unexpired (see the BSD 4.2/4.3 timers for an example of the right way to do it). FWIW: I've had a FreeBSD box with a static page server on it up to 1.6M simultaneous connections with very little work, so 10K is pretty trivial in comparison. For doing real work, and giving 1G to a server process and 512M to caching, this number drops to ~250K connections, but that's still 25 times what he claims is some insurmountable barrier. BTW, the company for which I did this work is still shipping real product that handles those loads on a FreeBSD box, FWIW. -- Terry
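The "stop at the first unexpired timer" approach can be sketched in C. This sketch assumes a single list sorted by expiry in which each entry stores its time *relative to the entry before it*, so a clock tick decrements only the head and the expiry scan stops as soon as it reaches an entry that has not run out. The names (`struct callout`, `calltodo`, `timeout_insert`, `softclock_tick`) are illustrative only; this is not the historical 4.3BSD source. Returning the expired chain, rather than calling the handlers inline, is a choice made here for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* One pending timeout; delta is in ticks relative to the previous entry. */
struct callout {
    struct callout *next;
    int delta;
    void (*func)(void *);      /* handler the caller runs on expiry */
    void *arg;
};

/* Head of the delta-sorted list of pending timeouts. */
static struct callout *calltodo;

/* Schedule c to expire after `ticks` clock ticks (minimum 1). */
void timeout_insert(struct callout *c, int ticks)
{
    struct callout **pp = &calltodo;

    assert(c != NULL);
    if (ticks < 1)
        ticks = 1;
    /* Walk past entries expiring no later than c, turning ticks into a
     * delta relative to the entry we end up behind. */
    while (*pp != NULL && (*pp)->delta <= ticks) {
        ticks -= (*pp)->delta;
        pp = &(*pp)->next;
    }
    c->delta = ticks;
    c->next = *pp;
    if (c->next != NULL)
        c->next->delta -= ticks;   /* successor is now relative to c */
    *pp = c;
}

/* Called once per clock tick.  Only the head is decremented, and the
 * scan stops at the first entry that has not yet expired.  Returns the
 * expired entries, in expiry order, for the caller to run. */
struct callout *softclock_tick(void)
{
    struct callout *expired = NULL, **tail = &expired;

    if (calltodo == NULL)
        return NULL;
    calltodo->delta--;
    while (calltodo != NULL && calltodo->delta <= 0) {
        struct callout *c = calltodo;

        calltodo = c->next;
        c->next = NULL;
        *tail = c;
        tail = &c->next;
    }
    return expired;
}
```

The trade-off is that insertion is linear in the number of pending timers, while the per-tick cost is proportional only to the number of timers that actually expire.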
Re: GEOM Gate.
Wilko Bulte wrote: On Tue, Oct 14, 2003 at 10:44:14PM +0200, Oldach, Helge wrote: From: Richard Tobin [mailto:[EMAIL PROTECTED] Ok, GEOM Gate is ready for testing. For those who don't know what it is, they can read README: Aaargh! It's the return of nd(4) from SunOS. Excuse me? # uname -a SunOS galaxy 4.1.4 18 sun4m Too new.. Yeah... Think Sun2 systems http://www.netbsd.org/Documentation/network/netboot/nd.html -- Terry
Re: Dynamic reads without locking.
Harti Brandt wrote: You need to lock when reading if you insist on consistent data. Even a simple read may be non-atomic (this should be the case for 64bit operations on all our platforms). So you need to do mtx_lock(foo_mtx); bar = foo; mtx_unlock(foo_mtx); if foo is a datatype that is not guaranteed to be read atomically. For 8-bit data you should be safe without the lock on any architecture. I'm not sure for 16 and 32 bit, but for 64-bit you need the lock for all our architectures, I think. If you don't care about occasionally reading false data (for statistics or such stuff) you can go without the lock. For a read-before-write type read, this is always true. For an interlock read, this is also true, if you intend to use the data you read as a timing-dependent one-shot (e.g. for a condition variable, etc.). For certain uses, however, it's safe to not lock before the read *on Intel architectures*. This can go out the window on SPARC or PPC, or any architecture where there is no guarantee that there won't be speculative execution or out-of-order execution without explicit synchronization. For instance, on a PPC, there are two instructions that are used to implement a mutex, isync and dsync, one which is used to check it, and the other which is used to take it. Using these, it's a lot less expensive to implement a mutex, and you are guaranteed that after you take a mutex, there are no speculative execution coherency issues left outstanding. For things with an explicit periodicity, with no explicit guarantee of order of operation, reading without a lock will do no damage.
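In user space the same trade-off can be sketched with POSIX threads and C11 atomics. This is a minimal sketch, not kernel code: `pthread_mutex_t` stands in for the kernel's mtx(9) mutex, and the variable names are hypothetical. One caveat the C11 memory model adds: a plain unsynchronized read that races with a write is undefined behavior, so the lock-free path for a statistics counter is written here as a relaxed atomic load, which cannot tear but may return a slightly stale value.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* A 64-bit datum that a 32-bit CPU may read in two halves. */
static uint64_t foo;
static pthread_mutex_t foo_mtx = PTHREAD_MUTEX_INITIALIZER;

void foo_write(uint64_t v)
{
    pthread_mutex_lock(&foo_mtx);
    foo = v;
    pthread_mutex_unlock(&foo_mtx);
}

/* Consistent read: the lock guarantees we never see half an update. */
uint64_t foo_read(void)
{
    uint64_t bar;

    pthread_mutex_lock(&foo_mtx);
    bar = foo;
    pthread_mutex_unlock(&foo_mtx);
    return bar;
}

/* Statistics counter: occasional staleness is acceptable, so no lock. */
static _Atomic uint64_t drops;

void drops_inc(void)
{
    atomic_fetch_add_explicit(&drops, 1, memory_order_relaxed);
}

uint64_t drops_read(void)
{
    return atomic_load_explicit(&drops, memory_order_relaxed);
}
```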
For example, consider the case of a push-model migration of processes from one CPU to another, and a (in the general case) lockless implementation of a per CPU scheduler queue with explicit migration requirements -- this avoids the FreeBSD issue of having to take a lock each time you enter the scheduler, and avoids the IPI that results, as well as implying implicit CPU affinity; here's how it works:

Each CPU has 3 values associated with it:

    int   load
    ptr   migration queue
    ptr   run queue

When it's time to schedule a process on a given CPU, here is the order of operation:

1) Compare the migration queue ptr to NULL; this is a lockless operation.
2) If the ptr is non-NULL,
    a) take a lock on the migration queue.
    b) move items on it to the per CPU run queue -- LIFO; writing the per CPU run queue is a lockless operation, since no other CPU is permitted access to it (no pull of work for migration).
    c) NULL the migration queue head of the now empty migration queue.
    d) drop the migration queue lock.
3) Take the entry off the top of my run queue; this is what I'm going to run next.
4) Compare my 'load' value against that of other CPUs; since a CPU is the only one permitted to write its own load value, this is a lockless operation for the read of all other CPUs' 'load' values, so long as each value is accessible in a single bus/memory cycle.
5) If my load is significantly higher than the lowest of all other CPUs', consider migrating work to the CPU with the lowest load:
    a) Locate any candidates for migration; this is the current top of my own run queue *after* the removal of the next thing I'm going to run, *without* the don't migrate bit set.
    b) If a candidate exists,
        i) Remove the item from the local run queue (lockless).
        ii) Take the lock on the target CPU migration queue.
        iii) Put the item on the target migration queue.
        iv) Drop the migration queue lock.
6) Recalculate my local 'load' value for the benefit of other CPUs.
7) Start executing the item obtained in step #3.
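The seven steps above can be sketched in C11 as a single-file illustration, not FreeBSD code. The names (`struct cpu`, `schedule`, `runq_push`), the `MIGRATE_MARGIN` threshold standing in for "significantly higher", the use of run queue length as the 'load' value, and the small spinlock standing in for a kernel mutex are all hypothetical. The only shared writes are the lock-protected migration queue and each CPU's own load value; the run queue is touched only by its owner.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NCPU           4
#define MIGRATE_MARGIN 2   /* hypothetical "significantly higher" margin */

struct task {
    struct task *next;
    int no_migrate;                    /* the "don't migrate" bit */
    int id;
};

struct cpu {
    _Atomic int load;                  /* written only by the owning CPU */
    struct task *_Atomic migq;         /* peeked locklessly, changed under miglock */
    atomic_flag miglock;               /* stand-in for a kernel mutex */
    struct task *runq;                 /* private to the owning CPU */
    int runq_len;
};

static struct cpu cpus[NCPU];

static void lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire)) {
        /* spin */
    }
}

static void unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

void cpus_init(void)
{
    for (int i = 0; i < NCPU; i++) {
        atomic_init(&cpus[i].load, 0);
        atomic_init(&cpus[i].migq, NULL);
        atomic_flag_clear(&cpus[i].miglock);
        cpus[i].runq = NULL;
        cpus[i].runq_len = 0;
    }
}

/* Owner-only, lockless insert at the head of a CPU's run queue. */
void runq_push(int self, struct task *t)
{
    t->next = cpus[self].runq;
    cpus[self].runq = t;
    cpus[self].runq_len++;
}

struct task *schedule(int self)
{
    struct cpu *me = &cpus[self];
    struct task *next, *t;
    int target = -1, min_load;

    /* 1-2) Lockless peek; lock only if another CPU pushed work to us. */
    if (atomic_load_explicit(&me->migq, memory_order_acquire) != NULL) {
        lock(&me->miglock);
        for (t = atomic_load_explicit(&me->migq, memory_order_relaxed); t != NULL; ) {
            struct task *n = t->next;
            runq_push(self, t);        /* LIFO onto our run queue */
            t = n;
        }
        atomic_store_explicit(&me->migq, NULL, memory_order_relaxed);
        unlock(&me->miglock);
    }
    /* 3) Take the next thing to run off the top of the run queue. */
    if ((next = me->runq) != NULL) {
        me->runq = next->next;
        me->runq_len--;
        next->next = NULL;
    }
    /* 4) Lockless reads of the other CPUs' load values. */
    min_load = me->runq_len;
    for (int i = 0; i < NCPU; i++) {
        int l = atomic_load_explicit(&cpus[i].load, memory_order_relaxed);
        if (i != self && l < min_load) {
            min_load = l;
            target = i;
        }
    }
    /* 5) Push the new top of our run queue if it may migrate. */
    t = me->runq;
    if (target >= 0 && me->runq_len - min_load >= MIGRATE_MARGIN &&
        t != NULL && !t->no_migrate) {
        struct cpu *tc = &cpus[target];
        me->runq = t->next;
        me->runq_len--;
        lock(&tc->miglock);
        t->next = atomic_load_explicit(&tc->migq, memory_order_relaxed);
        atomic_store_explicit(&tc->migq, t, memory_order_release);
        unlock(&tc->miglock);
    }
    /* 6) Publish our load for the other CPUs; 7) caller runs `next`. */
    atomic_store_explicit(&me->load, me->runq_len, memory_order_relaxed);
    return next;
}
```

Note that the only lock ever contended is a target CPU's migration-queue lock, and it is taken only when work is actually being pushed; the common path through `schedule()` is one lockless pointer check and a few relaxed loads.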
Because the operation occurs periodically, and because you skip items that are marked non-migrate (you could have a bounded depth to the search, if so desired), there is the ability to implement explicit CPU affinity, and there is also the ability to implement implicit CPU affinity (since there is hysteresis in the act of deciding to push work off to another CPU). The purpose of LIFO ordering, and of examining the migration queue before examining the local run queue *after* the LIFO ordering of inserts of work from other CPUs, is that the work the other CPU pushed to you was at the top of its run queue, and this avoids penalizing migrated objects with an extra scheduler cycle of latency. Note that all this works, even in a decoherent system on the migration queue examination, since the worst case is you delay a migration for up to two scheduling cycles (one for the 'load' value on the
Re: Dynamic reads without locking.
Frank Mayhar wrote: The other thing is that the unlocked reads about which I assume Jeffrey Hsu was speaking can only be used in very specific cases, where one has control over both the write and the read. If you have to handle unmodified third-party modules, you have no choice but to do locking, for the reason you indicate. On the other hand, you can indeed make such a rule as you describe: For this global datum, we always either write or read it in a single operation, we never do a write/read/modify/write. Hey, if it's your kernel, you can make the rules you want to make! But it's better to not allow third-party modules to use those global data. I'm pretty sure that Jeffrey is aware of read-modify-write issues with atomic vs. idempotent multi-instruction operations, since he generally knows what he's doing. 8-). Probably the most interesting cases, from his perspective in the network stack, are queue insertions and removals for singly linked queues, where the insertion OR removal can be done atomically without taking locks (but not both, except on fully MESI coherent systems without speculative execution). FreeBSD's use of queue structures is sometimes overkill, and the queue macros, as they are currently defined, use an order of operations that prevents non-locking operations, even where they would be safe if the order of operation were reversed (e.g. for a simple singly linked tail queue). -- Terry
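The point about insertion *or* removal being safe without a lock can be sketched with C11 atomics. This is an illustrative sketch, not FreeBSD's queue(3) macros, and the type and function names are hypothetical: any number of producers push onto a singly linked list with a compare-and-swap, while a consumer removes the whole chain with one atomic exchange. Removing individual elements with compare-and-swap while pushes are in flight would run into the ABA problem, which is why the sketch only ever detaches the entire chain. Note the order of operations in the push: the new node's link is written *before* the node is published as the head.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
    struct node *next;
    int val;
};

struct lfstack {
    struct node *_Atomic head;
};

/* Lock-free insertion at the head; safe with many concurrent producers. */
void lf_push(struct lfstack *s, struct node *n)
{
    struct node *old = atomic_load_explicit(&s->head, memory_order_relaxed);

    do {
        n->next = old;   /* link first ... */
    } while (!atomic_compare_exchange_weak_explicit(&s->head, &old, n,
                 memory_order_release, memory_order_relaxed)); /* ... then publish */
}

/* Lock-free removal of the whole chain (newest first) in one exchange. */
struct node *lf_take_all(struct lfstack *s)
{
    return atomic_exchange_explicit(&s->head, NULL, memory_order_acquire);
}
```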
Re: Recovery from mbuf cluster exhaustion
Peter Bozarov wrote: [ ... ] What I can't seem to figure out is how to flush out the stale mbufs/clusters. I can close down all network interfaces, and kill/restart most of the processes that I presume use up the mbufs. At a given point, there can't possibly be any processes that are hogging the mbuf clusters. Yet, a while later, this is what the pool looks like:

[grid] ~ $ netstat -m
4305/4944/18240 mbufs in use (current/peak/max):
        4305 mbufs allocated to data
4304/4560/4560 mbuf clusters in use (current/peak/max)
10356 Kbytes allocated to network (75% of mb_map in use)
8832 requests for memory denied
1 requests for memory delayed
0 calls to protocol drain routines
[grid] ~ $

A few clusters have been freed. But not much. Now, if (presumably) no clusters are being used by a process, should they not be released by the kernel? Alternatively, how can I enforce this (short of rebooting the machine, which is *not* the solution I'm looking for)? Wait for 2*MSL for the network connections to go away. Assuming the other end is still there, and not some network loading device that effectively SYN-floods and establishes real connections (e.g. a Web Avalanche or similar product). Doing a netstat -a will show you a list of active connections, of which I'm sure you have more than a few hanging around, even though you killed the process that opened them. You will see a number of bytes in the receive queue or transmit queue columns, and these will indicate the amount of data that you have pending in queues that's either not being read by your application, or that your application has written, but which cannot be sent because the other side of the connection has been shut off, lost network connectivity, died, or intentionally started a transfer with no intention of actually reading the data you were going to send (e.g. the Microsoft WAST web tool for benchmarking does this, and so does httpload). Probably, you need more mbuf clusters, and therefore more mbufs as well.
-- Terry
Re: HZ = 1000 slows down application
Luigi Rizzo wrote: On Tue, Oct 07, 2003 at 06:17:04PM -0400, [EMAIL PROTECTED] wrote: We did some intensive profiling of our application. It does not seem like we are depending on clock ticks for any calculations. On the other hand we notice that our slow iterations happen almost at the same instant as "microuptime went backward" messages in the system log. If this is the case, probably your code at some point computes a time difference which turns out negative (or if it is unsigned, it becomes very very large) upon those events, thus causing some loop to explode. It should be easy to check if this is the case, and just ignore those outliers rather than trying to figure out why the clock goes backward. I used to see the same "microuptime went backwards" msg on some of my 400MHz boxes, even without NTP enabled. Maybe a buggy timer, not sure which timecounter was used on that box (I read some time ago that the cpu on the soekris4801 has a weird TSC implementation where the upper 32 bits change when the lower 32 bits are 0xfffd, who knows what other bugs might be in other hardware...) FWIW: Internally, MacOS X supports monotime, which is a monotonically increasing time counter, guaranteed to not go backwards. That avoids problems exactly like what you are describing. FreeBSD should consider supporting a monotime. -- Terry
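Luigi's suggestion of ignoring backwards steps, and the point about a monotonic clock, can both be shown in a short C sketch. It assumes a POSIX system that provides `clock_gettime()` with `CLOCK_MONOTONIC` (which may not have been available on the FreeBSD versions discussed here); the helper names are hypothetical. Keeping the difference signed makes a backwards step show up as a negative value that can be clamped, rather than as the huge unsigned number that makes a loop explode.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to signed nanoseconds. */
static int64_t ts_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Elapsed time between two samples; a backwards step is clamped to 0
 * instead of being allowed to wrap to a huge unsigned value. */
int64_t elapsed_ns(const struct timespec *start, const struct timespec *end)
{
    int64_t d = ts_ns(end) - ts_ns(start);

    return d < 0 ? 0 : d;
}

/* Sample a clock that POSIX specifies cannot be set backwards. */
int monotonic_now(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}
```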
Re: netisr
Giovanni P. Tirloni wrote: I'm studying the network stack and now I'm confronted with something called netisr. It seems ether_demux puts the packet in a netisr queue instead of passing it directly to ip_input (if that was the packet's type). Is this derived from LRP? No. NETISR is a software interrupt that runs when software interrupts run, which is to say, when the SPL is lowered as a result of returning from the last nested hardware interrupt, which means on hardware and clock interrupts. It is completely antithetical to LRP, which, despite the name Lazy Receiver Processing, is mostly only lazy about direct dispatch. I've read their paper and it looks like their network channel (ni) but I'm not sure. No. There are two LRP papers from Rice University. The first was against FreeBSD 3.2, and dealt with just the idea of LRP. The second makes things much more complicated than they need to be, in order to introduce a concept called ResCon, or resource containers. Neither set of code is really production quality, because it uses an alternate protocol family definition and a separately hacked uipc_socket.c and uipc_socket2.c, along with a totally parallel TCP/IP stack. The closest you can come to LRP in FreeBSD 5.x is to use the sysctls for direct dispatch, which will, in fact, directly call ip_input() from interrupt. This isn't a full LRP, since it doesn't add a receive buffer into the proc structure for per-process enqueuing of packets. When I implemented the 3.x version of Rice's LRP in FreeBSD 4.3, I avoided this hack. The main reason for the hack was to deal with accepting connections, since at interrupt, without a proc structure, there was no way to deal with the socket creation for the accept, due to a lack of an appropriate credential. The sneaky approach I used for this was to create the accept socket using the cred that was present on the listen socket on which the connection had come in.
For this to be at all useful, you need to extend kevent accept filters to allow creation of accepted descriptor instances in the process context, and throw them onto the kqueue that was set up against the listen socket. I recommend that if you want to play with LRP, you add an attribute flag to the protocol stack structure to indicate an LRP capable vs. an LRP incapable stack, and then implement it inline, rather than as a separate thing. I also recommend that if you do this, you do it using the Rice 3.x code, and ignore the ResCon stuff, which I think is an interesting idea, but which adds very little in the way of real value to things (though it does add overhead). Where can I find more information on this? If you are asking about NETISR, then I recommend W. Richard Stevens's books, specifically TCP/IP Illustrated, Volume 2: The Implementation. If you are asking about LRP, then any search engine search for Lazy Receiver Processing should turn up the two Rice University references and the one Duke University reference, as well as dozens of other references, including the one to the Lucent/IBM work on QLinux (which has some other neat information on things like WFS Queueing and other things that are actually necessary to avoid the potential for livelock at the user/kernel boundary). FWIW: Interrupt threads, as they are used in 5.x, are pretty much antithetical to an LRP implementation, since you can still end up live-locked under heavy load (or denial of service attack), which is why you wanted LRP in the first place: to make progress under a load heavier than you could possibly respond to in a reasonable time. The problem is that the requeuing to the interrupt thread adds back in the same type of transition boundary you were trying to take out by getting rid of NETISR in the first place. -- Terry
Re: Changing the NAT IP on demand?
Julian Elischer wrote: On Mon, 6 Oct 2003, Leo Bicknell wrote: In a message written on Sun, Oct 05, 2003 at 08:11:05PM -0600, Nick Rogness wrote: In addition to keeping your NAT translations (as suggested by Wes), you need to also keep routes for those entries as well, so that preserved traffic remains to route out the right ISP even if a switch occurs. You're right, however I would go with a different mechanism, but one I've also never tried to do. What you want is routing based on the source address of the packet, not the destination as per usual. You want to be able to say source a.a.a.a goes out link A. I've never tried to do it on FreeBSD (it's easy on say Cisco's, with a bit of a performance hit on some platforms). this is very easy using the ipfw 'fwd' rule.. Actually, it's very hard; the 'fwd' rule doesn't quite cut it for things like, for example, NFS. It also fails to work with aliases, when you want the packet sent from a specific IP on a given interface. There are some workarounds, like binding it locally to an IP, but that's not so good when you want to be able to change IP addresses, as in the case in point. Cisco really does routing differently. One thing that would be handy is a socket type that was a TCP stream socket, but which was bound to an interface, rather than to a specific IP address. This only solves some of the problems, like not having to restart your already listening servers when the IP address changes out from under them (e.g. the "kick sendmail in the head" issue we had with the InterJet I's and II's when they were running dialup in dial-on-demand mode). Really, you need to have routing implemented like it's implemented in Cisco's, and associate the reverse route with the last packet you received from a given IP address (you always have this because you had to have it to do the handshake). The FreeBSD routing code tends to get the routing wrong some of the time, particularly in picking the local address to send a packet from.
The problem with the ipfw forwarding is that you don't know a priori who's going to be talking to you, so you can't really make pre-established rules for the forwarding for every possible IP address that's non-local and across one link or the other. I suppose you could establish rules when you saw packets, but that would require running as root, and hacking all your servers to do the right thing anytime you got a connection. -- Terry
Re: Why is PCE not set in CR4?
Bruce M Simpson wrote: Now that I think on this a bit more, a sysctl might be a better place to put this, but it seemed to belong with the i386_vm86() bits, rather than polluting initcpu.c right away. The important thing is to allow the kernel to intermediate and control allocation of counters to applications, so where you put it is less important than that it be a procedural interface. A sysctl can be a procedural interface, but it's kind of ugly. -- Terry
Re: Why is PCE not set in CR4?
Bruce M Simpson wrote: On Wed, Oct 01, 2003 at 11:39:36AM +0200, Grumble wrote: However, I am not allowed to use the RDPMC instruction from ring 3 because the PCE (Performance-monitoring Counters Enable) bit is not set. You can do it with /dev/perfmon. man 4 perfmon. I have read the perfmon documentation and source code. For several reasons, I do not think it is totally adequate in my situation. [ ... ] This is an extension to the i386_vm86() syscall which will let you turn PCE on and off if you're the superuser. I like this a lot better. To answer the inevitable question of why: PCE counters are a scarce resource, and the kernel needs to run interference on their allocation and deallocation by user space applications, to avoid collisions between applications; this is the same reason we have AGP and sound card device drivers in the kernel. I'm not sure if restricting this to root users is exactly necessary, but it can't hurt, given that there is a performance denial of service possible otherwise. -- Terry
Re: has anyone installed 5.1 from a SCSI CD?
Peter Jeremy wrote: On Sun, Sep 28, 2003 at 06:14:25PM -0400, Sergey Babkin wrote: BTW, I have another related issue too: since at least 4.7 all the disk device nodes have character device entries in /dev. 'block' vs 'character' has nothing to do with random or sequential access and any application that thinks it does is broken. Any application that directly accesses devices must understand all the various quirks - ability to seek, block size(s) supported, side-effects of operations etc. As opposed to the kernel understanding them, and representing the classes of devices uniformly to the user level software. Yes, block devices must be random access, but character devices can be either random or sequential-only depending on the physical device. But character devices can't be random-only. Therefore, you can assume the ability to perform random access on block devices, and you can assume character devices require sequential access, and your software will just work, without a lot of buffer copying around in user space. The only purpose for block devices was to provide a cache for disk devices. It makes far more sense for this caching to be tightly coupled into the filesystem code where the cache characteristics can be better controlled. Actually, there are a number of other uses for this. The number one use is for devices whose physical block size is not an even power of two less than or equal to the page size. The primary place you see this is in reading audio tracks directly off CD's. Another place this is useful is in the UDF file system that Apple was prepared to donate to the FreeBSD project several years ago. DVD's are recorded in two discrete areas, one of which is an index section, recorded as ISO9660, and one of which is recorded as UDF. By providing two distinct devices to access the drive, it was possible to mount the character device as ISO9660, and then access the UDF data via the block device.
Again, we are talking about physical block sizes of which the page size is not an even power of 2 multiple. Another use for these devices is to avoid the need for some form of intermediary blocking program (e.g. dd, etc.) for accessing historical archives on tape devices. Traditional blocking on tape devices is 20K, and by enforcing this at the block device layer, it's possible to deal with streaming of tape devices without adding an unacceptable CPU overhead. Another issue is Linux emulation; Linux itself has only block devices, not character, and when things are the right size and alignment, the block devices pass through and act like character devices. However... this means that Linux software which depends on this behaviour will not run on FreeBSD under emulation. Finally, block devices serve the function of double-buffering a device for unaligned and/or non-physical block sized accesses. The benefit to this is that you do not need to replicate code in *every single user space program* in order to deal with buffering issues. There has historically been a lot of pain involved in maintaining disk utilities, and in porting new FS's to FreeBSD, as a result of the lack of block devices to deal with issues like this. I'll agree that the change has been mostly harmless -- additional pain, rather than actually obstructing code from being written (except that Apple didn't donate the UDF code and it took years to reproduce it; of course, FreeBSD doesn't appear to have suffered anything other than a migration of FS developers to other platforms). On the other hand, a lot of the promised benefits of this change never really materialized; for example, even though it's more efficient in theory, Linux performance still surpasses FreeBSD performance when it comes to raw device I/O (and Linux has only *block* devices).
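The double-buffering job described above is what every program must do for itself when only raw, sector-granular access is available. The following is a minimal sketch, assuming a 512-byte physical block size and a descriptor that accepts only whole, aligned sectors into an aligned buffer; the function name is hypothetical, and it issues a single `pread()` without retrying short reads. It reads the sectors covering the requested range into an aligned bounce buffer and copies out just the bytes asked for.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR 512   /* assumed physical block size */

/* Read len bytes at an arbitrary byte offset from a device that only
 * accepts sector-aligned, sector-sized transfers. */
ssize_t unaligned_pread(int fd, void *dst, size_t len, off_t off)
{
    off_t start = off - off % SECTOR;                               /* round down */
    off_t end = (off + (off_t)len + SECTOR - 1) / SECTOR * SECTOR;  /* round up */
    size_t span = (size_t)(end - start);
    size_t skip = (size_t)(off - start);
    size_t n;
    void *bounce;
    ssize_t got;

    assert(off >= 0 && span % SECTOR == 0);
    if (posix_memalign(&bounce, SECTOR, span) != 0)
        return -1;
    got = pread(fd, bounce, span, start);   /* aligned, whole-sector transfer */
    if (got < 0) {
        free(bounce);
        return -1;
    }
    n = (size_t)got > skip ? (size_t)got - skip : 0;   /* bytes past off */
    if (n > len)
        n = len;
    memcpy(dst, (char *)bounce + skip, n);
    free(bounce);
    return (ssize_t)n;
}
```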
We still have to use a hack (foot shooting) to allow us to edit disklabels, rather than using an ioctl() to retrieve them or rewrite them as necessary, etc., and thus use user space utilities to do the work that belongs below an abstract layer in the kernel. I'm not saying that FreeBSD should switch to the Linux model -- though doing so would benefit Linux emulation, and, as Linux demonstrates, it does not have to mean a loss of performance -- but to paint it as something everyone agreed upon or even something everyone has grown to accept is also not a fair characterization. -- Terry
Re: user malloc from kernel
earthman wrote: how to allocate some memory chunk in user space memory from kernel code? how to do it correctly? If your intent is to allocate a chunk of memory which is shared between your kernel and a single process in user space, the normal way of doing this is to allocate the memory to a device driver in the kernel, and then support mmap() on it to establish a user space mapping for the kernel allocated memory. In general, you must do this so that the memory is wired down in the kernel address space, so that if you attempt to access it in the kernel while the process you are interested in sharing with is swapped out, you do not segfault and trap-12 (page not present) panic your kernel. If your intent is to share memory with every process in user space (e.g. similar to what some OS's use to implement zero system call gettimeofday() functions, etc.), then you want to allocate the memory in kernel space (still), make sure it's on a page boundary, and set the PG_G and PG_U bits on the page(s) in question. -- Terry
Re: user malloc from kernel
Pawel Jakub Dawidek wrote: On Mon, Sep 29, 2003 at 06:56:13PM +0300, Peter Pentchev wrote: I mean, won't the application's memory manager attempt to allocate the next chunk of memory right over the region that you have stolen with this brk(2) invocation? Thus, when the application tries to write into its newly-allocated memory, it will overwrite the data that the kernel has placed there, and any attempt to access the kernel's data later will fail in wonderfully unpredictable ways :) I'm not sure if newly allocated memory will overwrite memory allocated in kernel, but for sure process is able to write to this memory. Some time ago I proposed a model which would allow removing all copyin(9) calls and many copyout(9) calls, but I'm not skilled enough in VM to implement it. You probably need two pages; one R/O in user space and R/W in kernel space, and one R/W in both user and kernel space. The copyin() elimination would use the R/W page. Frankly, I have to say that you aren't saving much by eliminating copyin() this way, and most of your overhead is going to be data copies with pointers, and it doesn't really matter where you get the pointers into the kernel; the bummer is going to be copying around the data pointed to by the pointers. For the copyout, you'd probably get a rather larger benefit if you could implement getpid(), getuid(), getgid(), getppid(), and so on, in user space entirely, just by referencing the common read-only page. You could probably also benefit significantly by deobfuscating the timer code and using a flip-flop timer and externalizing the calibration information in a single globally read-only page (PG_G, PG_U, R/O mapping one place, kernel-only R/W mapping another), and then using it to implement a zero system call gettimeofday() operation (there's really no need to have a huge list of timers, if updates are effectively atomic at the clock interrupt, and you use a flip-flop pointer to only two contexts instead of a huge number of them).
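The flip-flop timer idea can be sketched in C11. This is an illustration under stated assumptions, not FreeBSD's timecounter code: a single writer (the clock interrupt) fills in whichever of two contexts is *not* current and then publishes it by bumping a generation counter, while readers (which, in the shared-page design, would run entirely in user space) use the generation's low bit to pick a context and retry if the generation changed while they were reading. The structure layout, field names, and the nanoseconds-per-count calibration are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* One timekeeping context: enough to extrapolate from a raw counter. */
struct timectx {
    _Atomic uint64_t base_ns;      /* time of day at base_count */
    _Atomic uint64_t base_count;   /* raw counter value at last tick */
    _Atomic uint64_t ns_per_count; /* calibration */
};

/* What would live on the globally readable page. */
struct timepage {
    _Atomic uint64_t gen;          /* low bit selects the current context */
    struct timectx ctx[2];
};

/* Writer: clock interrupt only.  Fill the idle context, then flip. */
void tick_update(struct timepage *tp, uint64_t now_ns, uint64_t count,
                 uint64_t ns_per_count)
{
    uint64_t gen = atomic_load_explicit(&tp->gen, memory_order_relaxed);
    struct timectx *idle = &tp->ctx[(gen + 1) & 1];

    assert(ns_per_count > 0);
    /* Order the previous flip before these stores, so a reader that
     * observes any of them also observes a changed generation. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&idle->base_ns, now_ns, memory_order_relaxed);
    atomic_store_explicit(&idle->base_count, count, memory_order_relaxed);
    atomic_store_explicit(&idle->ns_per_count, ns_per_count, memory_order_relaxed);
    atomic_store_explicit(&tp->gen, gen + 1, memory_order_release);
}

/* Reader: no system call and no lock, just a retry if a flip happened. */
uint64_t read_time_ns(struct timepage *tp, uint64_t count)
{
    uint64_t gen, base_ns, base_count, npc;

    do {
        gen = atomic_load_explicit(&tp->gen, memory_order_acquire);
        struct timectx *c = &tp->ctx[gen & 1];
        base_ns = atomic_load_explicit(&c->base_ns, memory_order_relaxed);
        base_count = atomic_load_explicit(&c->base_count, memory_order_relaxed);
        npc = atomic_load_explicit(&c->ns_per_count, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
    } while (atomic_load_explicit(&tp->gen, memory_order_relaxed) != gen);
    return base_ns + (count - base_count) * npc;
}
```

With only two contexts, a reader can be disturbed only if a flip lands during its handful of loads, and the generation check turns that case into a retry rather than a torn value.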
Specifically, you could find yourself with a huge performance improvement in anything that has to log in the Apache/Squid style, which requires a *lot* of logging, which in turn means a *lot* of system calls. You could also use a knote for this, which is only returned when other knotes are returned, and not otherwise, but that would be a lot less friendly to third-party source code that was not specifically adulterated for FreeBSD. -- Terry ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: [PATCH] : libc_r/uthread/uthread_write.c
Daniel Eischen wrote: If you are using libkse or libthr, you will get a partial byte count and not zero, because the tape driver returns the (partial) bytes written. So exiting the loop in libc_r and returning 0 would only seem to correct the problem for libc_r. If there is a difference, it could be because libc_r is using non-blocking IO behind the scenes, and sa(4) may be returning a partial byte count in the non-blocking case and 0 (or -1 and ENOSPC) in the blocking case (which is what you'd get using libkse/libthr). I would think that for writes that are not a multiple of the block size and/or not block-aligned, there's no way to avoid the fault-in penalty imposed by the need to do read-before-write, so there will always be some unavoidable stalls. -- Terry
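A short sketch of a write loop that copes with the partial counts described above, in plain C against POSIX write(2). The function name is hypothetical; the design point is that a 0 return is treated as "the device took nothing" and ends the loop instead of retrying forever, and a partial count is reported in preference to a late error.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write up to `len` bytes, continuing after partial
 * writes.  Returns the number of bytes written, or -1 if an error occurred
 * before anything was written.
 */
static ssize_t write_fully(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n > 0) {
            done += (size_t)n;          /* partial write: keep going */
            continue;
        }
        if (n == 0)
            break;                      /* device accepted nothing: don't spin */
        if (errno == EINTR)
            continue;                   /* interrupted before any data moved */
        if (done == 0)
            return -1;                  /* e.g. ENOSPC with nothing written */
        break;                          /* report the partial count instead */
    }
    return (ssize_t)done;
}
```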
Re: TCP information
Dan Nelson wrote: These types of statistics aren't kept. They usually do not make it into commercial product distributions for performance reasons, and because every byte added to a tcpcb structure is one byte less that can be used for something else. In practice, adding 134 bytes of statistics to a tcpcb would double its size and halve the number of simultaneous connections you would be able to support with the same amount of RAM in a given machine (as one example), if all of that memory had to come out of the same space, all other things being equal. tcpcb is currently 236 bytes, though, and I don't imagine adding another 8 bytes for an unsigned long dropped-packets counter is going to kill him. 236 is too large. We do stupid things like not compressing the state. For example, there is state that is unique to a listen socket and state that is unique to a connecting socket: this state should be in a union, so that tcpcbs are smaller. The kqueue bloat, particularly that for accept filters, is another issue. So is the bloated credential and other information, most of which belongs in application-specific extension data chains that are *only* used when the application is active vs. the TCP connection (e.g. when IPSec is active, when kqueues have been registered, etc.). In 4.x, the structure size was 134 bytes (maybe 136; it depends on which 4.x, I guess). The extra 100 bytes are cruft. Removing the cruft and compressing the state with a union would get you just under 128 bytes, so the current structure is almost 100% additional bloat for features that are rarely used, or that are used but are generally only in effect on a small number of the open sockets you are dealing with; very, very annoying. Deepak: if you really want stats, try adding a struct tcpstat to tcpcb and hack all the netinet/tcp* code to update those whenever the global tcpstat gets updated. You'll get all the info that netstat -s prints, for each socket.
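To make the union suggestion concrete, here is a sketch with invented field sets (these are not the real `struct tcpcb` members): listen-only and connection-only state share storage because a socket is never both at once, and `t_state` tells the code which member is live.

```c
#include <stdint.h>

/* State only a listening socket needs (invented fields). */
struct listen_state {
    uint32_t backlog_limit;
    uint32_t pending_conns;
    uint32_t accept_filter_id;
};

/* State only a connecting or connected socket needs (invented fields). */
struct conn_state {
    uint32_t snd_una, snd_nxt, snd_wnd;
    uint32_t rcv_nxt, rcv_wnd;
    uint32_t srtt, rttvar;
    uint16_t maxseg;
};

/* Uncompressed: every control block carries both sets of state. */
struct tcpcb_flat {
    uint16_t t_state;
    struct listen_state l;
    struct conn_state c;
};

/* Compressed: a socket is never listening and connected at the same time,
 * so the two sets can overlap; t_state says which member is valid. */
struct tcpcb_union {
    uint16_t t_state;
    union {
        struct listen_state l;
        struct conn_state c;
    } u;
};
```

The union version costs the size of the larger member plus the header instead of the sum of both; exact byte counts depend on the ABI's padding rules.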
*That* will definitely double the size of struct tcpcb :) The statistics gathering really should be macroized, with a macro declaration added for this. You could then make it a compile-time option whether or not you gather the stats (default to off!). That assumes some FreeBSD committer is willing to stick the macros in the headers and at the instrumentation points. If you did the extension structure chaining trick noted above, you could even make it runtime-adjustable; however, you would need to (1) add a timestamp to the structure to indicate the start time for statistics gathering and (2) walk the list of open sockets to add an extension for each of the already open sockets in the system. You could even have a separate set of commands (I would suggest a pseudo-device driver for doing it) to enable/start/stop/disable, so you can leave dormant extension structures lying around to control sample intervals separated by non-sample intervals of indeterminate length. Either way, though, I think you would want it to be off by default, just like you want IPSEC to be off by default, given that IPSEC soaks up a huge default object per socket just by being compiled in, even if the socket never actually uses the feature. 8-(. -- Terry
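A sketch of macroized, default-off statistics with a runtime-attached extension. `TCP_PERCONN_STATS`, `TCPCB_STAT_INC`, and the structures are hypothetical names invented for illustration: with the option undefined the macro compiles to nothing, and with it defined a connection pays only a NULL check until an extension carrying its start timestamp is attached.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical per-connection statistics, chained off the control block. */
struct tcpcb_stats {
    time_t   started;    /* when gathering began for this connection */
    uint64_t rexmt;      /* segments retransmitted */
    uint64_t rcvdrop;    /* received segments dropped */
};

/* Stand-in for the real control block; only the extension pointer matters. */
struct tcpcb_ext {
    struct tcpcb_stats *stats;   /* NULL while statistics are disabled */
};

/* Compile-time switch: off unless the (hypothetical) option is defined. */
#ifdef TCP_PERCONN_STATS
#define TCPCB_STAT_INC(tp, field)                 \
    do {                                          \
        if ((tp)->stats != NULL)                  \
            (tp)->stats->field++;                 \
    } while (0)
#else
#define TCPCB_STAT_INC(tp, field) do { (void)(tp); } while (0)
#endif

/* Runtime enable for one already-open connection: attach a zeroed extension. */
static int tcpcb_stats_attach(struct tcpcb_ext *tp, time_t now)
{
    if (tp->stats != NULL)
        return 0;                          /* already gathering */
    tp->stats = calloc(1, sizeof *tp->stats);
    if (tp->stats == NULL)
        return -1;
    tp->stats->started = now;
    return 0;
}

/* Runtime disable: drop the extension and go back to zero cost. */
static void tcpcb_stats_detach(struct tcpcb_ext *tp)
{
    free(tp->stats);
    tp->stats = NULL;
}
```

Walking the already-open sockets to call the attach function, and the pseudo-device that would drive enable/start/stop/disable, are left out.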
Re: TCP information
Deepak Jain wrote: If the tcpcb struct were expanded/changed and the various increments were added in the appropriate packet pushing code, this would work right? Is there something non-obvious that one would need to worry about to undertake such a project? Your overhead would be slightly higher when doing statistics, because you would need to store more information for each of the statistics you wanted to gather. The main reasonable objections to doing this by default would be (1) added overhead and (2) increased per-connection costs. The first of these is an issue for everybody. The second of these would be an issue for connections which remain idle for significant fractions of time. I'd call 20% or more of the time idle significant for this purpose, so you could include FTP control channels, HTTP persistent connections, IMAP4 connections, database entry screens, and pretty much anything else that was client/server, had a slow human on one end of the client, and a persistent connection to the server on the other. See my other posting on how to do this at a slightly higher cost, but only when it's enabled, via a pointer indirection, or at equal cost without one, as a compile-time option. I think that approach would be better for your purposes, particularly if you wanted to offload the code maintenance, rather than reintegrating a lot of patches for each release you wanted to run on. -- Terry
Re: TCP information
Deepak Jain wrote: Is there a utility/hack/patch that would allow a diligent sysadmin to obtain which specific TCP connections are generating retransmits and receiving packet drops? netstat will show me drops on an interface, but not on a specific source/dest pair? I am guessing something like a netstat -n, but instead of showing send/rec queues it shows retransmit or packet drops? Would there be much interest in this feature if we were to build it ourselves? These types of statistics aren't kept. Generally, they are used only by network researchers, who hack their stacks to get them. They usually do not make it into commercial product distributions for performance reasons, and because every byte added to a tcpcb structure is one byte less that can be used for something else. In practice, adding 134 bytes of statistics to a tcpcb would double its size and halve the number of simultaneous connections you would be able to support with the same amount of RAM in a given machine (as one example), if all of that memory had to come out of the same space, all other things being equal. -- Terry
Re: Machine wedges solid after one serial-port source-line addition...
Barry Bouwsma wrote: You see, what I'm attempting to do, without knowing what I'm doing, is to implement the TIOCMIWAIT ioctl that apparently exists in Linux, to notify a userland program that there's been a status change on one or more of the modem status lines, and eliminate the need to poll the status line in question, cutting that program's cost to run by a factor of about 20 in the testing I did before the machine would wedge. I did all this offline, with no examples to follow, but now I have something to look at and see if I have the general idea. So I should probably shut up and study it. So, since a printf() is right out, is it safe for me (as a non-programmer, so forgive my ignorance of the basics) to simply use little more than a wakeup() in its place? Or does that, or the corresponding tsleep(), need some sort of careful handling to avoid the lockups I've experienced? I remember wakeup() being bad. Taking any time to do anything at all more than just queueing data and going away is probably bad. If it were my project, I'd mirror the values out to a status structure that's only written at interrupt time, and only read and reset at software interrupt time. The soft interrupt handler would then raise the signals/send the wakeup/whatever, and reset the flag bits to zero via a call down that synchronizes the way a baud rate or FIFO depth change does (e.g. the way the mouse line discipline sets the FIFO depth to avoid jerky mouse movement). Bruce Evans is the authority in this area; you would be well advised to consult him directly. He may even already have code to do something similar to what you want (I think I remember code to signal a program on an RI going high, but I could be mistaken). -- Terry
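The write-at-interrupt, read-and-reset-at-soft-interrupt scheme can be sketched in C11 as below. The bit names and functions are hypothetical, and the atomic exchange is a plain substitute for the synchronizing call-down described above; a real driver would run the collect side from a software interrupt and deliver the wakeup() or signal there.

```c
#include <stdatomic.h>

/* Hypothetical modem-status change bits. */
enum {
    MSC_DCD = 0x01,
    MSC_CTS = 0x02,
    MSC_DSR = 0x04,
    MSC_RI  = 0x08
};

/* Written only at interrupt time; read and reset only at soft-interrupt time. */
static _Atomic unsigned pending_msc;

/* Hard-interrupt side: note which lines changed and return at once.
 * No printf(), no wakeup(), no sleeping. */
static void msc_hardintr_record(unsigned changed)
{
    atomic_fetch_or_explicit(&pending_msc, changed, memory_order_release);
}

/* Soft-interrupt side: take all accumulated bits and clear them in one step.
 * A real driver would deliver the wakeup()/signal here, based on the result. */
static unsigned msc_softintr_collect(void)
{
    return atomic_exchange_explicit(&pending_msc, 0u, memory_order_acquire);
}
```

Repeated transitions of the same line between soft interrupts collapse into one bit, which is usually what a "something changed, go look" notification wants.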
Re: Any workarounds for Verisign .com/.net highjacking?
Clifton Royston wrote: For those who don't know what I'm talking about, try executing host thisdomainhasneverexistedandneverwill.com, or any other domain you'd care to make up in .com or .net. Verisign has abused the trust placed in them to operate a root name server, by creating wildcard A records directly under .com and .net, which point to Verisign's search website. If you get their A record in your resolver, pretend you got the standard "no such domain" error instead. It's a really easy resolver hack. -- Terry
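A user-space sketch of the resolver hack, assuming the wildcard address(es) are supplied by the caller rather than hard-coded: `getaddrinfo_nowild` (a hypothetical wrapper) calls getaddrinfo(3) and reports EAI_NONAME when every answer matches one of the blocked IPv4 addresses. Only IPv4 answers are compared in this sketch; `<arpa/inet.h>` is included so callers can build the block list with inet_pton(3).

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* True if `sa` is an IPv4 address that appears in the caller's block list. */
static int addr_is_blocked(const struct sockaddr *sa,
                           const struct in_addr *block, size_t nblock)
{
    if (sa == NULL || sa->sa_family != AF_INET)
        return 0;
    const struct sockaddr_in *sin = (const struct sockaddr_in *)(const void *)sa;
    for (size_t i = 0; i < nblock; i++)
        if (memcmp(&sin->sin_addr, &block[i], sizeof block[i]) == 0)
            return 1;
    return 0;
}

/*
 * Hypothetical wrapper around getaddrinfo(3): if every answer is a
 * known wildcard address, pretend the name does not exist.
 */
static int getaddrinfo_nowild(const char *host, const char *serv,
                              const struct addrinfo *hints, struct addrinfo **res,
                              const struct in_addr *block, size_t nblock)
{
    int err = getaddrinfo(host, serv, hints, res);
    if (err != 0)
        return err;
    for (const struct addrinfo *ai = *res; ai != NULL; ai = ai->ai_next)
        if (!addr_is_blocked(ai->ai_addr, block, nblock))
            return 0;                   /* at least one genuine answer */
    freeaddrinfo(*res);
    *res = NULL;
    return EAI_NONAME;                  /* only wildcard answers: "no such name" */
}
```

Fill the block list with whatever address the wildcard record actually returns; an answer set that mixes real and wildcard addresses is passed through unchanged.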
Re: pppoe - nmap - No buffer space available
[EMAIL PROTECTED] wrote: sendto in send_tcp_raw: sendto(3, packet, 40, 0, X.X.X.X, 16) = No buffer space available Your interface is down. This happens all the time. If you use PPP on a dialup modem with a normal net connection, and unplug the modem while you are doing a ping, you will see the same thing. The easiest fix is: don't send packets out routes that transit interfaces which are not up. -- Terry
Re: tty layer and lbolt sleeps
Mike Durian wrote: I'm trying to implement a serial protocol that is timing sensitive. I'm noticing things like drains and reads blocking until the next kernel tick. I believe this is due to the lbolt sleeps in the tty.c code. It looks like I can avoid these sleeps if isbackground() returns false, however I can't figure out how to make this happen. The process is running in the foreground and my attempts to play with the process group haven't helped. Can anyone explain what is happening and nudge me towards a fix? You need your process to become a process group leader, and then you need the serial port you are interested in to become the controlling tty for your process. The first is accomplished with setpgid(2); the second is accomplished with setsid(2) and open(2) (the open must not specify O_NOCTTY). You can change which process group is in the foreground after that by calling tcsetpgrp(3). You can only have one controlling tty per process, so if you wanted, for example, to have a terminal emulation program that would quit when you turned off your terminal (on-to-off transition of DTR) *and* you *also* wanted it to receive SIGHUP when you got an on-to-off DCD transition from a modem, you would need two processes. See also the source code for getty(8) and the library utility function login_tty(3). -- Terry
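A sketch of the attach step in C, modeled loosely on what login_tty(3) does. Two details differ from the wording above and are worth checking against your release's manual pages: setsid(2) already makes the caller the leader of a new process group and fails with EPERM if the caller is already a group leader, so this sketch calls it in place of setpgid(2), typically in a freshly forked child; and on BSD-derived kernels the terminal becomes the controlling tty through the TIOCSCTTY ioctl rather than as a side effect of open(2). The function name is hypothetical.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: make the terminal at `path` the controlling tty of
 * the calling process.  Call it in a process that is not already a process
 * group leader (e.g. a freshly forked child), or setsid() will fail.
 * Returns the open descriptor, or -1 on failure.
 */
static int acquire_controlling_tty(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;                      /* nothing has been changed yet */

    /* New session; the caller becomes session and process group leader. */
    if (setsid() < 0) {
        close(fd);
        return -1;
    }
    /* Attach the terminal explicitly, as login_tty(3) does. */
    if (ioctl(fd, TIOCSCTTY, 0) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Once this succeeds, tcsetpgrp(3) on the returned descriptor can move the foreground process group as needed.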
Re: Machine wedges solid after one serial-port source-line addition...
Barry Bouwsma wrote: Would anyone care to explain why the following simple patch could be enough to wedge my machine solid? (My original hack-patches without any console printf() debuggery did the same thing within seconds, as well...) All it does is notify the console whenever a serial port DCD PPS signal transition is detected, as follows (patch against 4.foo; I haven't tried this with 5.bar or later -- also, not a real patch as I've included context and snipped my comments) : [ ... ] I'm wondering if it's something really blindingly obvious that I should be but am not aware of, or something I gotta work on to track down. You are calling printf() from a fast interrupt handler. You shouldn't call printf() from any interrupt handler, and particularly you shouldn't call it from something that can and will have a FIFO overrun well before the printf() gets back. If you need to communicate information to a console log (or wherever), then you should enqueue the information on the status change, and wake up some thread to do the actual processing of the information out to the console. -- Terry
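Enqueue-and-wake can be sketched as a single-producer, single-consumer ring in C11. The event layout, queue size, and function names are hypothetical; the interrupt side does a constant amount of work and never blocks or prints, while the consumer thread (woken by whatever mechanism the driver uses, not shown) is the one that calls printf().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define EVQ_SIZE 64u   /* must be a power of two */

/* Hypothetical record of one DCD transition. */
struct dcd_event {
    uint32_t timestamp;   /* e.g. a cycle count captured at interrupt time */
    uint8_t  level;       /* new DCD level, 0 or 1 */
};

struct evq {
    struct dcd_event slot[EVQ_SIZE];
    _Atomic uint32_t head;      /* next slot to write; producer only */
    _Atomic uint32_t tail;      /* next slot to read; consumer only */
    _Atomic uint32_t dropped;   /* events lost because the ring was full */
};

/* Producer (interrupt handler): O(1), never blocks, never prints. */
static bool evq_put(struct evq *q, struct dcd_event ev)
{
    uint32_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t t = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (h - t == EVQ_SIZE) {
        atomic_fetch_add_explicit(&q->dropped, 1, memory_order_relaxed);
        return false;
    }
    q->slot[h & (EVQ_SIZE - 1)] = ev;
    atomic_store_explicit(&q->head, h + 1, memory_order_release);
    return true;
}

/* Consumer (logging thread): drains at leisure and may printf() freely. */
static bool evq_get(struct evq *q, struct dcd_event *out)
{
    uint32_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t h = atomic_load_explicit(&q->head, memory_order_acquire);

    if (t == h)
        return false;
    *out = q->slot[t & (EVQ_SIZE - 1)];
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return true;
}
```

Dropping and counting on overflow, instead of waiting, keeps the interrupt path bounded; the counter lets the logger report that events were lost.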
Re: 20TB Storage System (fsck????)
Geoff Buckingham wrote: On Thu, Sep 04, 2003 at 01:12:45AM -0700, Terry Lambert wrote: Yes. Limit the number of CG bitmaps you examine simultaneously, and make multiple passes over the disk. This is not that hard a modification to fsck, and it can be done fairly quickly by anyone who understands the code. The time to fsck the disk will go up in inverse proportion to the amount of RAM it's allowed to use, which is limited to the UVA size minus the fsck program size itself, and the fsck buffers used for things like FS metadata for a given file/directory. Pardon my ignorance, but does the number of inodes in the filesystem have a significant impact on the memory requirement of fsck? I can't answer empirically, but extrapolating from the empirical data that I *do* have, the time is going to go up proportional to the number of blocks in use, and the number of blocks in use is going to equal the average number of blocks per file times the number of files, and given that there is one inode per file, you will bound the number of blocks by bounding the number of inodes. This makes the answer yes, indirectly. What passes get run really depends on how your FS is configured. By default, a background fsck will only check for blocks that are marked as used in the CG bitmaps that are not actually used; so this is a CG bitmap vs. all inodes direct and indirect block lists consistency check only. Most of the incremental or multipass techniques I've discussed on the mailing list assume either a full fsck, or that you are able to lock individual CGs, or at least ranges on the disk, if you wish to do a BG check; or that you mount the entire disk read-only until you are done, and maintain a list of needs-update items (this can be very compact, since it can be run-length encoded or otherwise highly compressed).
If you read the fsck manual page and understand what it means, you can get some idea of what parameters affect it in what phases: Inconsistencies checked are as follows: 1. Blocks claimed by more than one inode or the free map. Every inode needs to be scanned to see what blocks in the free map should not be in the free map. The free map is the set of bits in the set of all cylinder group bitmaps. Cross-checking multiple references for directories is a process of combinatorial math. You take N inodes 2 at a time and compare them. A trick that is possible, if you are willing to rebuild the CG bitmaps in core, or are willing to double the space for the set you are examining, would be to examine a range, and at the same time keep a shadow. Zero the shadow, and pass the list of inodes once, setting bits in the shadow. If you go to set a bit and it's already set, *then* you go back and find out who it was who had the bit set. This is probably an OK trade-off, particularly if you maintain a list of this-file-this-suspect-bit, and then pass the FS again (large numbers of cross-linked blocks are rare). The second of these operations is as expensive as: #inodes_used*(#inodes_used-1)*(#indirect_blocks**2-1) 2. Blocks claimed by an inode outside the range of the filesystem. What this really should say is which are outside the range. In other words, bogus block numbers. This is a compare that can be made during a direct linear search. 3. Incorrect link counts. This is a directory entry vs. inode count. The expense of this operation depends on whether your pass is directory-entry-major or inode-major, and on the relative number of directory entries vs. inodes. For most FS's, the number of entries is going to be ~15% higher than the number of inodes; this is because of the hard links to directories from their parents, and to parent directories from their child directories. This number could be much, much higher on an FS with a large number of hard links per file.
The thing you have to worry about is tracking the number of hard links per inode, and whether you can do this all in memory (e.g. with a linear array of integers of the same type size as the link count, whose length is equal to the number of inodes available in the system), or whether you have to break the job up and pass over the directory structure multiple times. If you can't keep all the items in memory, and must make multiple passes, then it's better to be inode-major; otherwise, it's better to be directory-major. 4. Size checks: Directory size not a multiple of DIRBLKSIZ. Simple check; can be done during one of the single linear passes. Partially truncated file. Also a linear check, but somewhat harder to handle. 5. Bad inode format. Self-inconsistent contents on inodes. 6. Blocks not accounted for anywhere. Every block that should be in the free map but isn't, because it's not claimed by any directory or inode. The free map is the set of bits in the set of all cylinder group bitmaps. This is the background fsck
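The shadow-bitmap trick from item 1 and the linear link-count array from item 3 can be sketched in C as follows. The flattened `struct inode_blocks` view, the function names, and the 16-bit count type are assumptions for illustration; a real fsck reads these from cylinder groups, inodes, and directory blocks, and would repeat `find_dup_blocks` once per block range when memory is tight.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flattened view of one in-use inode's block list. */
struct inode_blocks {
    uint32_t ino;
    const uint32_t *blk;   /* direct and indirect block numbers */
    size_t nblk;
};

struct dup_hit {           /* one "this file, this suspect block" entry */
    uint32_t ino;
    uint32_t blk;
};

/*
 * Item 1: one pass over all inodes for the block range [lo, hi).
 * `shadow` must hold at least (hi - lo + 7) / 8 bytes.  Only the later
 * claimant of a block is recorded; another pass finds the earlier one.
 * Returns the number of hits found (may exceed maxhits).
 */
static size_t find_dup_blocks(const struct inode_blocks *in, size_t nin,
                              uint32_t lo, uint32_t hi, uint8_t *shadow,
                              struct dup_hit *hits, size_t maxhits)
{
    size_t nhits = 0;

    memset(shadow, 0, (hi - lo + 7) / 8);          /* zero the shadow */
    for (size_t i = 0; i < nin; i++) {
        for (size_t j = 0; j < in[i].nblk; j++) {
            uint32_t b = in[i].blk[j];
            if (b < lo || b >= hi)
                continue;                          /* left for another range pass */
            uint32_t off = b - lo;
            uint8_t mask = (uint8_t)(1u << (off & 7));
            if (shadow[off / 8] & mask) {
                if (nhits < maxhits)
                    hits[nhits] = (struct dup_hit){ in[i].ino, b };
                nhits++;
            } else {
                shadow[off / 8] |= mask;
            }
        }
    }
    return nhits;
}

/*
 * Item 3: count directory references per inode in a linear array and
 * compare with the link count stored in each inode.  Returns the number of
 * mismatching inodes; the first `maxbad` of them are stored in `bad`.
 */
static size_t check_link_counts(const uint32_t *dirent_ino, size_t ndirent,
                                const uint16_t *nlink, uint32_t ninodes,
                                uint16_t *counted, uint32_t *bad, size_t maxbad)
{
    size_t nbad = 0;

    memset(counted, 0, ninodes * sizeof *counted);
    for (size_t i = 0; i < ndirent; i++)
        if (dirent_ino[i] < ninodes)               /* skip bogus inode numbers */
            counted[dirent_ino[i]]++;
    for (uint32_t ino = 0; ino < ninodes; ino++) {
        if (counted[ino] != nlink[ino]) {
            if (nbad < maxbad)
                bad[nbad] = ino;
            nbad++;
        }
    }
    return nbad;
}
```

The shadow costs one bit per block in the range, so halving the range halves the memory and doubles the number of passes over the inode list.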
Re: 20TB Storage System
David Gilbert wrote: Poul-Henning == Poul-Henning Kamp [EMAIL PROTECTED] writes: Poul-Henning I am not sure I would advocate 64k blocks yet. Poul-Henning I tend to stick with 32k block, 4k fragment myself. That reminds me... has anyone thought of designing the system to have more than 8 frags per block? Increasingly, for large file performance, we're pushing up the block size dramatically. This is with the assumption that large disks will contain large files. My assumptions on the previous two statements by Poul are: 1) You cannot trust that a short will be treated as an unsigned 16 bit value in all cases, so values between 32768 and 65535 may be treated incorrectly. 2) A fully populated block bitmap byte, which means a divide by 8, is necessary to avoid potential division errors. In other words, he's afraid that the sign bit and/or the block size bitmap used by frags may be treated incorrectly. I have to agree with both those observations. A number of people have, historically, reported issues with a divisor other than 8, and the worry about the sign bit is common sense, given the many historical issues faced by other OS's when it comes to 64K block sizes. -- Terry
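The two constraints discussed above (at most 8 fragments per block, so one bitmap byte covers a block, and sizes that survive a signed 16-bit field) can be written as simple checks. The helper names are hypothetical, and these are not newfs(8)'s actual validation routines.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_FRAGS_PER_BLOCK 8   /* one bitmap byte describes one block */

/* Geometry in the style argued for above: fragments per block must be
 * a nonzero power of two no larger than 8. */
static bool frag_geometry_ok(uint32_t bsize, uint32_t fsize)
{
    if (fsize == 0 || bsize % fsize != 0)
        return false;
    uint32_t frags = bsize / fsize;
    return frags != 0 && (frags & (frags - 1)) == 0 &&
           frags <= MAX_FRAGS_PER_BLOCK;
}

/* True if a value survives being stored in a signed 16-bit field;
 * 32768..65535 do not, which is the sign-bit worry about 64K blocks. */
static bool fits_int16(uint32_t v)
{
    return v <= (uint32_t)INT16_MAX;
}
```

By these checks, 32K blocks with 4K fragments (8 per block) pass, while 64K blocks with 4K fragments (16 per block) do not.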
Re: 20TB Storage System (fsck????)
Max Clark wrote: Ohh, that's an interesting snag. I was under the impression that 5.x w/ PAE could address more than 4GB of Ram. The kernel being able to address the RAM does not mean that the KVA+UVA space is larger than 4G. At best, you could take the uiomove/copyin/copyout performance hit, and move both of these to 4G, each, rather than 4G total. That still limits you to 4G. If fsck requires 700K for each 1GB of Disk, we are talking about 7GB of Ram for 10TB of disk. Is this correct? Will PAE not function correctly to give me 8GB of Ram? To check 10TB of disk? No, it will not. Is there any way to bypass this requirement and split fsck into smaller chunks? Being able to fsck my disk is kinda important. Yes. Limit the number of CG bitmaps you examine simultaneously, and make multiple passes over the disk. This is not that hard a modification to fsck, and it can be done fairly quickly by anyone who understands the code. The time to fsck the disk will go up in inverse proportion to the amount of RAM it's allowed to use, which is limited to the UVA size minus the fsck program size itself, and the fsck buffers used for things like FS metadata for a given file/directory. I have zero experience with either Itanium or Opteron. What is the current status of support for these processors in FreeBSD? What would the preferred CPU be? Will there be PCI cards that I would not be able to use in either of these systems? I have no idea whether these systems support a larger UVA size, or how much memory you could jam into them... -- Terry
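Using the figures from this thread (700 KB of fsck memory per GB of disk, and treating 10 TB as 10,240 GB), the requirement is 7,168,000 KB, a little under 7 GB. Below is a sketch of the pass count a multipass fsck would need under a fixed memory budget; the function is hypothetical and ignores the program's own size and its metadata buffers.

```c
#include <stdint.h>

/*
 * Passes needed if each pass may use at most `budget_kb` of working set.
 * All sizes are in KB; the per-GB figure is an input, not a measurement.
 * Returns 0 if the budget is zero (the check cannot run at all).
 */
static uint64_t fsck_passes(uint64_t disk_gb, uint64_t kb_per_gb, uint64_t budget_kb)
{
    uint64_t need_kb = disk_gb * kb_per_gb;

    if (budget_kb == 0)
        return 0;
    return (need_kb + budget_kb - 1) / budget_kb;   /* ceiling division */
}
```

With a 3 GB budget (3,145,728 KB) this gives 3 passes, and with 8 GB it fits in one; the pass count, and with it the run time, scales inversely with the memory fsck is allowed to use.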
Re: Ugly Huge BSD Monster
Denis Troshin wrote: Almost every package I install requires a few other packages. This 'idea of using dependent packages' turns FreeBSD (and other unix-systems) into an ugly monster. You're right. The authors of the offending software packages should not do that. It's going to be incredibly hard to get the FSF to quit using libiberty, getline, gdb, etc., though. For example, I don't need Perl or Python but a few packages I install require them. Don't install those packages? Provide patches that remove the dependencies, if they are trivial? Rewrite the software from scratch, if the dependencies turn out to be non-trivial? Is it possible to program under unix without these dependencies? Sure. Anything you are willing to write that doesn't do that. P.S. Under Windows it is possible to write decent applications which depend just on libraries (KERNEL32, USER32, GDI32). And these libs exist on every base system!!! I beg to differ. InstallShield has a tendency to install the NT version of CTL3D.DLL over top of the Windows 95/98 version, breaking things utterly (as one example). Also, CRTL32.DLL no longer ships with the base system, but it is required for a lot of runtime executable code. It was left out of the base system in order to force people to distribute it, and that was done to impose license restrictions on where the resulting code can be run (i.e. it's free to redistribute with your applications, so long as you only run them on a Microsoft OS -- see the VisualDevStudio license next time you get a chance). Is it possible in unix? I used to think that unix programs were very compact, but they are huge! You're using the wrong programs. I'm going to guess you are installing Gnome or KDE or something like that, which has a huge dependency list because it wants to have a huge feature list. -- Terry
Re: Looking for detailed documentation: Install to existing filesystem
Charles Howse wrote: I'm a hobbyist, and for my personal education, I would like to learn how to install FBSD from an existing filesystem, rather than from FTP or CD. My intention is to copy the files to a directory on the second HDD of my present FBSD system, and point sysinstall to that partition/directory during the install. This is fairly easy. What you do is copy the files to a directory on a second disk drive on the machine, and point sysinstall to that partition/directory during the install. 8-). I can't seem to find any detailed instructions. The handbook just glosses over it, saying follow the instructions on the screen in sysinstall. I've Googled for days and can only find other people asking the same question and talking about their failures. The easiest thing to do is follow the instructions in sysinstall. The reason the handbook says this is that the order of dialogs, etc., in sysinstall varies from version to version of FreeBSD. Here is an example of detailed instructions for a particular version of FreeBSD. If you have a different version, you will need to use instructions matching the version of sysinstall you happen to be using: C return 13*cursor down space 7*cursor up space 7 (selects option 6 from the Choose Installation Media dialog because sysinstall is stupid) return Enter the pathname of the directory containing the installation files return Q cursor right (Cancel) return U (for Upgrade) return (disclaimer screen) return return (to begin upgrade) I need to know: 1) What files do I need to have on the partition from which I will be installing? Everything from the first CDROM, in the same directory hierarchy. It's easiest if you just mount a CDROM, or mount a vnconfig'ed CDROM ISO image as a cd9660 FS. 2) ftp address and directory where I can find those files. ftp://ftp.freebsd.org/ The exact subdirectory depends on which particular version you are trying to upgrade to. 3) Can FBSD install from the .iso files?
Yes, if you vnconfig them and mount them up as FS's so that to the system they appear as CDROMs, instead of ISOs. 4) A link to a tutorial or howto would really be nice. If none exists, I might consider writing one once I figure out how to do it properly. It's at least slightly different for every version of the system (see above). That's also why you didn't find one. PS: You should also vnconfig the floppy image on the vnconfig'ed ISO image, and pull sysinstall off there, instead of using your local copy. The sysinstall program has some string lists which are hard-coded, and may also vary from version to version. As it is a crunchgen'ed file, you will need to name it sysinstall for it to work. I suggest copying it to /tmp, and running it from there. PPS: I've posted detailed instructions on doing this at least three times in the past three years, since I needed to upgrade over NFS to a machine without anything but a local copy of FreeBSD that could be booted; start looking around June of 2001 in the -current and -hackers archives. -- Terry
Re: GEOM Gate.
Attila Nagy wrote: Terry Lambert wrote: It works on firewire and it works on a dual port RAID array (as a separate box containing the RAID array). What does 'it' mean? I guess it's not UFS, but the pure ability of sharing a device on a bus, connected to more than one adapter. The 'it' was the subject of the previous sentence, which you diked out; that's how prepositional phrases work in English. 8-). In other words, multiple access to the same device from one or more SCSI controllers. SAN and NAS are also options, but of course, you still have to have an FS that can deal with it, and an external locking protocol. Right, we were talking about FreeBSD, which lacks such a filesystem :( I've said it before, and I'll say it again: porting GFS would be a really trivial amount of work, taking almost no creativity to do; the last time this subject came up and Sistina was offering to change their license, I ported all the user space utilities in under a day. I didn't finish off the whole FS port because I lacked the necessary disk drives and FreeBSD lacked the necessary controller driver for those disk drives, and the active maintainers claimed that they had a port in progress. These types of things are primarily busy-work and a way to spend money on hardware I'll likely never use in a production environment to end up with code under a license that prevents me from using it in a commercial product. That makes doing the work very uninteresting. -- Terry
Re: GEOM Gate.
Attila Nagy wrote: Pawel Jakub Dawidek wrote: It'll be, but probably in read-write mode on one machine and read-only mode on the rest of the machines, because you don't export file systems here, but disk devices. This doesn't work on a shared SCSI bus, so I suspect sharing the device on the net won't help. It works on firewire and it works on a dual port RAID array (as a separate box containing the RAID array). It's supposed to work on SCSI III, if the vendors can quit their arguing and jockeying for advantage long enough to approve the range locking specification (which is why GFS uses a network daemon). SAN and NAS are also options, but of course, you still have to have an FS that can deal with it, and an external locking protocol. -- Terry
Re: a.out binaries
S.Gopinath wrote: I'm required to run a.out binaries like foxplus on recent Intel-based hardware. I have chosen FreeBSD 5.1 and successfully installed it. But I could not run a.out binaries like Foxplus. I tried loading the ibcs and aout modules from the /boot/kernel directory. My foxplus did not work. I require your suggestions regarding this. I cannot use FreeBSD 2.1, as I require a driver for the Adaptec 7902 (Ultra Wide SCSI 320). The a.out file format is a file format, not an ABI definition. In the particular case you are talking about, the a.out file you are trying to run is an SCO Xenix ABI binary, rather than a 386BSD/FreeBSD 1.x/2.x binary. In order to support this, you would need to support the Xenix system call entry points, and to recognize the file as a Xenix a.out binary, and to trap to a Xenix system call entry table, rather than a FreeBSD or some other entry point table. This is actually not a very difficult thing to do (for example, you do not have to worry about shared libraries), but it requires kernel modifications in order to support it. This is not supported at this time. If you have a Xenix system with development tools on it, a competent kernel programmer who knew both platforms and had access to the tools could probably have you up and running in a few weeks, at most. -- Terry
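Recognizing the binary and picking a system-call table can be sketched as a magic-number dispatch. Everything here is hypothetical scaffolding rather than FreeBSD's image-activator API, and the tables are left empty. The FreeBSD a.out ZMAGIC (0413) and QMAGIC (0314) values are the traditional octal magics; the 0x0206 value used for Xenix x.out is an assumption that should be checked against real Xenix headers.

```c
#include <stddef.h>
#include <stdint.h>

typedef long (*syscall_fn)(void *td, void *args);   /* hypothetical handler type */

/* Hypothetical per-ABI description: the table a trap handler would index. */
struct abi_sysvec {
    const char *name;
    const syscall_fn *sysent;
    size_t nsysent;
};

static const struct abi_sysvec freebsd_aout_abi = { "FreeBSD a.out", NULL, 0 };
static const struct abi_sysvec xenix_abi = { "SCO Xenix x.out", NULL, 0 };

struct exec_class {
    uint16_t magic;                  /* little-endian 16-bit value at offset 0 */
    const struct abi_sysvec *abi;
};

static const struct exec_class exec_classes[] = {
    { 0413,   &freebsd_aout_abi },   /* ZMAGIC */
    { 0314,   &freebsd_aout_abi },   /* QMAGIC */
    { 0x0206, &xenix_abi },          /* assumed Xenix x.out magic: verify */
};

/* Pick the ABI for an image header, or NULL to let another loader try. */
static const struct abi_sysvec *classify_exec(const unsigned char *hdr, size_t len)
{
    if (len < 2)
        return NULL;
    uint16_t magic = (uint16_t)(hdr[0] | (hdr[1] << 8));
    for (size_t i = 0; i < sizeof exec_classes / sizeof exec_classes[0]; i++)
        if (exec_classes[i].magic == magic)
            return exec_classes[i].abi;
    return NULL;
}
```

Once the image is classified, the process would be tagged with the chosen table so that every later trap dispatches through the Xenix entry points, with unsupported calls emulated and supported ones stubbed through with constant translation.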
Re: IP Network Multipathing failover on FreeBSD..??
maillist bsd wrote: Is there IP Network Multipathing failover on FreeBSD? How do I set it up? Look for VRRP in /usr/ports/net. -- Terry
Re: your mail
Kris Kennaway wrote: On Thu, Aug 14, 2003 at 03:14:27PM +0530, S.Gopinath wrote: I'm required to run a.out binaries like foxplus on recent Intel-based hardware. I have chosen FreeBSD 5.1 and successfully installed it. But I could not run a.out binaries like Foxplus. I tried loading the ibcs and aout modules from the /boot/kernel directory. My foxplus did not work. Do you have the FreeBSD 2.x/3.x/4.x compatibility libraries installed (via sysinstall, or by setting the appropriate COMPAT* variable in /etc/make.conf)? If so, then please post an exact description of the commands you have run and the errors you receive. FoxPro is the later version of FoxBase, which was a Xenix clone of the Ashton-Tate dBase III software. The a.out binaries he's talking about are for one of the Xenix platforms that FoxPro ran on. The only ones I'm aware of that it ran on that were a.out were SCO Xenix 2.1.[23], 3.x, and Altos Xenix (386 platforms: Altos 686 and 886). He might also mean the Intel 320s running Intel Xenix, but I'm pretty sure that FoxBase was the only thing that ran on those platforms. -- Terry
Re: Fw: your mail
S.Gopinath wrote:
$ foxplus
/usr/lib/foxplus/no87: 1: Syntax error: newline unexpected (expecting ))
/usr/lib/foxplus/foxplus.pr: 1: Syntax error: word unexpected (expecting ))
$ file /usr/lib/foxplus/no87
/usr/lib/foxplus/no87: Microsoft a.out separate pure segmented word-swapped V2.3 V3.0 286 small model executable Large Text Large Data
$ file /usr/lib/foxplus/foxplus.pr
/usr/lib/foxplus/foxplus.pr: Microsoft a.out separate pure segmented word-swapped not-stripped V2.3 V3.0 386 small model executable not stripped
This is not a Microsoft Xenix binary, contrary to what file claims. This is an SCO Xenix binary. It's not going to work unless you make an execution class loader for the a.out magic number, add the system call entry point mechanism for it, and then emulate the system calls we don't support, and provide stub-calls for the ones we do support, with appropriate translation of any manifest constant values that differ between FreeBSD and Xenix. Do you have a Xenix system with the development kit? This would be a trivial project for someone who knew FreeBSD and had access to a Xenix development system. -- Terry
Re: Ultra ATA card doesn't seem to provide Ultra speeds.
John-Mark Gurney wrote: Ruben de Groot wrote this message on Fri, Aug 01, 2003 at 10:15 +0200: On Fri, Aug 01, 2003 at 04:33:08AM +0200, mh typed: The following comparison is probably bogus, but can anybody explain the huge difference? It's called micro-optimization. Linux feels the need to special case /dev/zero to /dev/null, and instead of even reading/writing the data, it just ignores the user request (or does something like set the pages in user space to be zeroed). Also, dual procs won't help your performance when you run a single process like this. They will if you interleave the page zeroing they do on both CPUs... 8^p. -- Terry
Re: [patch] Re: getfsent(3) and spaces in fstab
Simon Barner wrote: The attached patch will allow blanks and tabs for file systems and path names, as long as they are protected by a '\'. For the old fstab style, blanks and tabs are not allowed as delimiters (as it was in the old implementation). You need to add '\\' to the delimited list, so that it is not skipped. You need to add '\\' to the list of characters that can be escaped, or you've just traded the inability to specify '\t' or ' ' for an inability to specify '\\'. -- Terry
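The escaping rule asked for above can be sketched as a small tokenizer. This is a hypothetical helper (next_field() is not the getfsent(3) code from the patch), assuming fields are split on unescaped blanks and tabs; it treats a backslash followed by a blank, a tab, or another backslash as an escape, so '\\' itself stays expressible:

```c
#include <stddef.h>

/*
 * Hypothetical tokenizer: return the next blank/tab-delimited field of
 * *cursor, removing escaping backslashes in place.  "\ ", "\<tab>" and
 * "\\" become a literal blank, tab and backslash.  Returns NULL when no
 * fields remain.
 */
char *next_field(char **cursor)
{
    char *src = *cursor;
    char *start, *dst;

    while (*src == ' ' || *src == '\t')     /* skip unescaped delimiters */
        src++;
    if (*src == '\0') {
        *cursor = src;
        return NULL;
    }
    start = dst = src;
    while (*src != '\0' && *src != ' ' && *src != '\t') {
        if (*src == '\\' &&
            (src[1] == ' ' || src[1] == '\t' || src[1] == '\\'))
            src++;                          /* drop the escaping backslash */
        *dst++ = *src++;                    /* copy the (possibly escaped) char */
    }
    if (*src != '\0')
        src++;                              /* step past the ending delimiter */
    *dst = '\0';                            /* terminate the unescaped field */
    *cursor = src;
    return start;
}
```

Given the line my\ disk /mnt/a\\b ufs, successive calls return my disk, /mnt/a\b and ufs, then NULL. A backslash before any other character is kept literally here; rejecting it instead would be an equally valid policy.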
Re: Assembly Syscall Question
Matthew Dillon wrote: I think the ultimate performance solution is to have some explicitly shared memory between kernelland and userland and store the arguments, error code, and return value there. Being a fairly small package of memory, multi-threading would not be an issue as each thread would have its own slot. You need 8K for this, minimally: 4K that's RO user/RW kernel and 4K that's RW user/RW kernel. You can use it for things like zero-system-call getpid/getuid/etc. It's also worth a page doubly mapped, with the second mapping with the PG_G bit set on it (to make it RO visible to user space at the same place in all programs) to hold the timecounter information; the current timecounter implementation, with a scad of structures, is both wasteful and unnecessary, given that pointer assigns are atomic, so you can implement it with only two, which only take a small part of the page. Doing this, you can use a pointer reference, a structure assign, and a compare-pointer-afterwards to make a zero-system-call gettimeofday() and other calls (consider the benefits to Apache, Squid, and other programs that have hard logging with timestamp requirements). I've also been toying with the idea of putting environp ** in a COW page, and dealing with changes as a fixup operation in the fault handler (really, environp needs to die, to make way for logical name tables; it persists because POSIX and SuS demand that it persist). So, Matt... how does the modified message-based system call interface fare in a before-and-after with lmbench? 8-) 8-). -- Terry
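The two-structure timecounter idea (pointer reference, structure assign, compare-pointer-afterwards) can be sketched in user-space C11. Everything here is hypothetical: struct tc_snapshot, tc_publish() and tc_read() stand in for a kernel writer and a user-space reader sharing a page, and are not FreeBSD's timecounter code. The writer fills the slot readers are not using, then publishes it with one atomic pointer store; the reader retries if the pointer moved while it was copying:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout of one published time snapshot. */
struct tc_snapshot {
    int64_t  sec;
    int64_t  nsec;
    uint64_t gen;           /* bumped on every publish, for illustration */
};

static struct tc_snapshot tc_buf[2];                     /* only two slots */
static _Atomic(struct tc_snapshot *) tc_current = &tc_buf[0];

/* Writer: fill the idle slot, then publish it with a single pointer store. */
void tc_publish(int64_t sec, int64_t nsec)
{
    struct tc_snapshot *cur = atomic_load(&tc_current);
    struct tc_snapshot *next = (cur == &tc_buf[0]) ? &tc_buf[1] : &tc_buf[0];

    next->sec = sec;
    next->nsec = nsec;
    next->gen = cur->gen + 1;
    atomic_store(&tc_current, next);
}

/* Reader: pointer load, structure copy, then check the pointer is unchanged. */
struct tc_snapshot tc_read(void)
{
    struct tc_snapshot *p, copy;

    do {
        p = atomic_load(&tc_current);
        copy = *p;
    } while (atomic_load(&tc_current) != p);
    return copy;
}
```

Caveat: the pointer comparison only detects a single flip. If the writer can update twice while a reader is mid-copy, the reader sees the same pointer again over a torn copy, and in strict C11 the plain structure copy racing the writer is a data race. The sketch therefore assumes updates are far apart relative to the copy; a production version would typically add a sequence counter.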
Re: Assembly Syscall Question
Ryan Sommers wrote: When making a system call to the kernel why is it necessary to push the syscall value onto the stack when you don't call another function? The stack is visible in both user space and kernel space; in general, the register space won't be, unless you are on an architecture with an abundance of registers that doesn't do a save/restore on trap entries. By pushing it onto the stack, you are *positive* that the value is visible. There is also the (small) possibility that the C compiler will take advantage of the calling conventions to assume that a value will not change over a system call. Short of declaring that all registers are volatile, you can't really guarantee that the registers pushed in will have the values after the call that they had before the call, unless you save and restore all of them (which is more expensive than the copyin, for system calls with 3 arguments or fewer -- which is most of them; cost, of course, will vary by architecture). Personally, I like to look at the Linux register-based passing mechanism in the same light that they look at the FreeBSD use of the MMU hardware to assist VM, at the cost of increased FreeBSD VM system complexity (i.e. they think our VM is too convoluted, and we think their system calls are too convoluted). -- Terry
Re: getfsent(3) and spaces in fstab
Chris BeHanna wrote: What about test%201 /mnt/test%201 ufs ro 0 0 ? Ugly, yes, but that's how a lot of tools escape spaces. % is almost infinitely more likely in a path than \; better to use the \ than the % mechanism. Also, the parser can be LALR single token lookahead, whereas % requires pushing state (2 token lookahead). -- Terry
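To make the lookahead point concrete, here is a hypothetical %XX decoder (percent_decode() is illustrative, not taken from any fstab patch). A backslash escape is resolved by the single character after it, whereas '%' commits the decoder to consuming two more characters, plus an error path if either is not a hex digit:

```c
/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical decoder: rewrite %XX escapes in place.  Returns 0 on
 * success, -1 on a malformed or truncated escape (the buffer may then be
 * partly rewritten).
 */
int percent_decode(char *s)
{
    char *dst = s;

    while (*s != '\0') {
        if (*s == '%') {
            /* Two characters of lookahead are needed before emitting. */
            int hi = hexval((unsigned char)s[1]);
            int lo = (hi < 0) ? -1 : hexval((unsigned char)s[2]);

            if (lo < 0)
                return -1;
            *dst++ = (char)(hi * 16 + lo);
            s += 3;
        } else {
            *dst++ = *s++;
        }
    }
    *dst = '\0';
    return 0;
}
```

With it, test%201 decodes to test 1, and a truncated escape such as bad%2 is rejected.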
Re: usb (ucom) driver code comments..?
M. Warner Losh wrote: In message: [EMAIL PROTECTED] JacobRhoden [EMAIL PROTECTED] writes: : I am trying to get a device working which uses ucom, and the ucom code has no : comments whatsoever, I am able to work bits out, I was wondering if there was : any sort of documentation whatsoever on this area? Use the source, Luke. However, you'll need to look closely at sys/kern/tty* as well. Looking at something like umodem that implements a ucom plug in might be useful too. You might check out the handbook too, but I think that the usb docs there don't specifically cover usb. There was also a patch posted to the -current or -hackers mailing list around about the end of May that switched 0 and 1 and made this sort of thing work for another USB device. -- Terry
Re: gcc segfault on -CURRENT (cvs yesterday)
Kai Mosebach wrote: Trying to compile sapdb fails on a -CURRENT system built yesterday. On a system from 22 July it compiled fine. Any ideas? This is pretty ugly, but put a space before the ::'s on that line. -- Terry
Re: Console serial speed
Russell Cattelan wrote: On Sat, 2003-07-26 at 07:12, Daniel Lang wrote: Bruce M Simpson wrote on Sat, Jul 26, 2003 at 10:06:36AM +0100: On Fri, Jul 25, 2003 at 01:06:28PM -0500, Russell Cattelan wrote: How does one set the serial speed of the console? Does specifying BOOT_COMCONSOLE_SPEED=57600 not work? No, I've experienced the same problem years ago. The funny thing is, that it worked on some machines, while it didn't on others. I worked around the problem by putting machdep.conspeed=38400 in /etc/sysctl.conf, so the speed is reset to the right speed, once the system is up. Of course this doesn't work for boot2, loader or the kernel itself. These three components seem to set their console speed in some cases arbitrarily. Yes this seems to be the case. No matter how hard I try to convince sio.c to default to 57600 (I even went as far as setting static volatile speed_t comdefaultrate = CONSPEED; to 57600) so there would be no confusion. I still end up with random console speed each time. The boot loader did pick up the speed from /etc/make.conf and does come up every time at 57600. You also need to modify the /etc/ttys entry to specify std.57600 (see the examples for /dev/ttyd[0-3]) and to set: options CONSPEED=57600 in your config file. -- Terry
Re: vpo in ECP
Paulo Roberto wrote: Sorry hackers, I have posted this to [EMAIL PROTECTED], but got no answer... I did set my mainboard BIOS to use ECP transfer mode (dma 3 irq 7). I edited my kernel to:
device ppc0 at isa? flags 0x8 irq 7
(is there a way to declare the dma I want to use? config complains if I add dma xyz) and when I boot I get:
Not sure if it's actually meaningful for the driver, but the config file syntax is:
device ppc0 at isa? flags 0x8 irq 7 drq 3
Jul 1 10:36:42 delta /kernel: ppc0: Parallel port at port 0x378-0x37f irq 7 flags 0x8 on isa0
Jul 1 10:36:42 delta /kernel: ppc0: Generic chipset (ECP-only) in ECP mode
Jul 1 10:36:42 delta /kernel: ppc0: FIFO with 16/16/16 bytes threshold
Jul 1 10:36:42 delta /kernel: ppi0: Parallel I/O on ppbus0
Jul 1 10:36:42 delta /kernel: imm0: NIBBLE mode unavailable!
and my zip-100 drive does not get recognized. Is it possible to use vpo in ECP mode?? EPP and compatible modes are just too damn slow. man ppc. The answer is it depends on your chipset; clearly, no one has written explicit support for it yet. The manual page goes into some good detail on how to fix this; specifically, you should look at the "Adding support to a new chipset" section. See also man vpo; it discusses how to force specific modes for your given chipset. Probably, you should add specific support for your chipset instead, so that other people who end up with the same chipset don't end up having to repeat your problems. -- Terry
Re: BCM4401 Support for FreeBSD
Joe Marcus Clarke wrote: On Mon, 2003-07-28 at 12:18, Aeefyu wrote: i.e. Broadcom 440x NIC support for FreeBSD 4.x and 5.x (as found on latest Dell's Notebooks - mine is a 8500) Would anyone be so kind to enlighten me on the current status? Last I heard of developments being made was end of June. This was forwarded to me from Greg Lehey. The dcm driver works okay for me, but I had to hack it for some new bus dma changes. I have noticed a few issues of slowness with it when using it in a more normal sense (i.e. using it to read mail, ssh to machines, etc.). Most likely, the interactive performance problem is a result of the RX packet coalescing, with a timer set to too long an interval. In general, it's probably an easy adjustment to make. I notice that he also says he didn't know what to add into some of the timer and media routines, so that could be part of the issue as well (e.g. if it's getting wedged and having to unwedge itself). BTW: As a rule of thumb, if you don't know what to put in a timer or media routine, put a printf() there. It will likely annoy someone into fixing it. 8-). -- Terry
Re: Probing for devices
Geoff Glasson wrote: I'm trying to port the Linux i810 Direct Rendering Interface ( DRI ) kernel module to FreeBSD. I have reached the point where the thing compiles, and I can load it as a kernel module, but it can't find the graphics device. Through a process of elimination I have come to the conclusion that once the AGP kernel module probes and attaches to the i810 graphics device, nothing else can attach to it. When I read the section on PCI devices it implied ( to me at least ) that multiple kernel modules should be able to attach to the same device. I have tried to get it to work without any success. How would you expect the hardware to act if both drivers were being accessed simultaneously? In general, multiple device attaches are only possible on multifunction devices, where the driver claims by function, rather than merely by PCI ID. You will likely need to have the two drivers intentionally cooperate with each other on the management of the device, in order to accomplish your goal. If your driver is an actual port of the Linux driver, rather than a rewrite, licensing dictates that it talk to the FreeBSD driver and ask it to step out of the way so it can do what it wants to do, and that it act gracefully if the FreeBSD driver refuses because someone is using it. Ideally, your driver would leave the FreeBSD driver in place, and ask it to do some of the device management functions you need on behalf of your driver, rather than your driver attempting to do them directly. If your driver is in the exact same ecological niche (e.g. it provides the same AGP services the FreeBSD driver does), then you will need to ensure that only one driver is ever loaded into the kernel at a time. -- Terry
Re: Where / how to begin the FreeBSD development journey?
Shawn wrote: On Tue, 2003-07-22 at 02:04, Terry Lambert wrote: There's a wide range of options, from the expensive to the free online stuff. At the high end, we have: $1300 https://www.mckusick.com/courses/introorderform.html $1500 https://www.mckusick.com/courses/advorderform.html Sadly, he doesn't teach these classes anymore (due to lack of students), and I don't live anywhere near there. Also, the price is indeed prohibitive. It's the price for the video tapes of the class. I agree that it's prohibitive. He states at the link, though, that he gives a 50% discount to students, and is willing to work with hardship cases to negotiate a fair price. No harm in asking him about it. http://www.vsi.ru/library/Programmer/fbsdkern/ Nice to know, the information in the Developer's Handbook seems to be far more recent and thorough although admittedly PCI DMA information is currently out of date according to the document. The steady state of all documentation is "needs to be updated"... -- Terry
Re: Where / how to begin the FreeBSD development journey?
Shawn wrote: I did peruse the bugs list at the FreeBSD web site curious as to what the current outstanding issue list was, and felt compelled to see if there was anything left open that I might put my hand to and felt a bit overwhelmed. I noticed that there are over 2,000 some entries with some dating as far back as 1996. So, I wasn't exactly sure where one would begin there either. There's a wide range of options, from the expensive to the free online stuff. At the high end, we have: $1300 https://www.mckusick.com/courses/introorderform.html $1500 https://www.mckusick.com/courses/advorderform.html At the low end, we have things like: http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/index.html http://www.vsi.ru/library/Programmer/fbsdkern/ You could also look at the Blue Prints column at the web site http://www.daemonnews.org/ http://search.atomz.com/search/?sp-q=blueprints&sp-k=Monthly+Ezine&sp-a=sp10015f36 There is also a new users section there: http://www.daemonnews.org/new2bsd/ And there are always the mailing list archives: http://docs.freebsd.org/mail/ -- Terry
Re: Communications kernel - userland
Robert Watson wrote: Of these approaches, my favorite are writing directly to a file, and using a pseudo-device, depending on the requirements. They have fairly well-defined security semantics (especially if you properly cache the open-time credentials in the file case). I don't really like the Fifo case as it has to re-look-up the fifo each time, and has some odd blocking semantics. Sockets, as I said, involve a lot of special casing, so unless you're already dealing with network code, you probably don't want to drag it into the mix. If you're creating big new infrastructure for a feature, I suppose you could also hook it up as a first class object at the file descriptor level, in the style of kqueue. If it's relatively minor event data, you could hook up a new kqueue event type. You could also just use a special-purpose system call or sysctl if you don't mind a lot of context switching and lack of buffering. I like setting the PG_G bit on the page involved, which maps it into the address space of all processes. 8-). -- Terry
Re: running 5.1-RELEASE with no procfs mounted (lockups?)
Pawel Jakub Dawidek wrote: + truss Relies on the event model of procfs; there have been some + initial patches and discussion of migrating truss to ptrace() but + I don't think we have anything very usable yet. I'd be happy to + be corrected on this. :-) Hmm, why to change this behaviour? Is there any functionality that ktrace(1) doesn't provide? It can interactively run in another window, giving you realtime updates on what's happening up to the point of a kernel crash. With ktrace, you are relatively screwed. Another good example is that it can dump out information that ktrace can't, because of where it synchronizes. Some people recently have been seeing EAGAIN when they haven't expected it, with the process exiting immediately after that, with no real clue as to where in the code it's happening (e.g. which system call); truss will show this, if run in another terminal window, but ktrace will not (yes, I know it should; it doesn't. If you can't reconcile this with how you think ktrace should work, then fix it). -- Terry
Re: complicated downgrade
Valentin Nechayev wrote: I need to downgrade a remote FreeBSD system from 5.1-release to 4.8-release remotely without any local help (except possible hitting Reset). Don't ask why the collocation provider is too ugly and too far from me; it's given and unchangeable. This system never was 4.* (began from 5.0-DP2). I have done this before. The best way to deal with this is to create a 5.1 system locally, and deal with all the problems that come up there, transplanting the resulting scripts to the system in question. Your biggest problems are going to be the creation of /dev, which will need to occur in an rc.local on reboot, replacing the disklabel boot code, and the changes to the conf file for ssh to operate correctly (you will likely need to regenerate keys). If you can't remotely NFS mount a CDROM, a lot of the work is going to be getting access to installation media. -- Terry
Re: MSDOSFS patch of dirty flag (Darwin Import)
Jun Su wrote: I began to import some code from Darwin msdosfs. Here is my first patch about the dirty flag. I patched the msdosfs kernel module and fsck_msdos to enable the flag. Can someone test it and checked in? Must I submit a PR? From my own option, the new features of Darwin's msdosfs are dirty flag, adv_lock and unicode name. I will check them in the next week. Do these features have chance to commit? Not if you never get them published. If you want to send attachments to the mailing list and have them get through, you need to send them as text/plain. The best way to see why your patches are not making it to the mailing list is to look at your last patch posting, and see what's different between your signature and your file attachments, and make your file attachments look like your signature attachment (since it got through). On a side note, you probably do not want to crosspost between the -hackers and -current lists so much. A lot of us are subscribed to both of them, so we get two copies. For some of us, this triggers SPAM filtering, and we never see your posts, unless we save SPAM to a folder instead of deleting it, and go look at it occasionally. -- Terry
Re: Communications kernel - userland
Marc Ramirez wrote: I asked this in -questions, but got no response; sorry for the repost. I have a device driver that needs to make requests for data from a userland daemon. What's the preferred method for doing this in 4.8R and 5.1R? I'm assuming the answer is Unix-domain sockets... It depends on the application. In most cases these are set up as request/response protocols. In that case, the best method is to use an ioctl() or fcntl() (which you use depends on what in the kernel is talking to userland), and then return to user space with the request. The userland then makes another call back down with the response, and the next wait-for-request. This saves you fully 50% of the protection domain crossing system calls from an ordinary callback, and it saves you 300% of the protection domain crossings of what you would need for a pipe/FIFO/unix-domain-socket. E.g.:
    user                            kernel
    ----                            ------
                                    REQ1  make_req()
                                          sleep_waiting_for_available()
    ioctl(fd, MY_GETREQ, req)
                                          sleep_waiting_for_req()
                                          copyout()
                                          sleep_waiting_for_rsp()
    ioctl(fd, MY_RSPREQ, req)
                                          sleep_waiting_for_req()
                                          copyin()
    ...
                                    REQ2  make_req()
                                          copyout()
                                          sleep_waiting_for_rsp()
    ioctl(fd, MY_RSPREQ, req)
                                          sleep_waiting_for_req()
                                          copyin()
    ...
    ...
-- Terry
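A userland daemon following the request/response scheme above might look like the sketch below. The device, struct my_req, and the ioctl numbers are hypothetical (MY_GETREQ and MY_RSPREQ are the names from the diagram, not a real driver's API); the point is that after the first MY_GETREQ, a single MY_RSPREQ call both hands back the response and blocks for the next request:

```c
#include <sys/types.h>
#include <sys/ioctl.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request record shared with a hypothetical driver. */
struct my_req {
    uint32_t id;            /* request tag chosen by the driver */
    int32_t  status;        /* result filled in by the daemon */
    char     data[128];     /* request payload in, response payload out */
};

/* Hypothetical ioctl numbers; a real driver would export these in a header. */
#define MY_GETREQ _IOR('M', 1, struct my_req)   /* block until a request arrives */
#define MY_RSPREQ _IOWR('M', 2, struct my_req)  /* post response, block for next */

/* Service one request; this stand-in just reports the payload length. */
void handle_request(struct my_req *req)
{
    const char *nul = memchr(req->data, '\0', sizeof(req->data));

    req->status = nul != NULL ? (int32_t)(nul - req->data)
                              : (int32_t)sizeof(req->data);
}

/* Daemon main loop; only returns (with -1) when an ioctl fails. */
int daemon_loop(int fd)
{
    struct my_req req;

    memset(&req, 0, sizeof(req));
    if (ioctl(fd, MY_GETREQ, &req) == -1)   /* fetch the first request */
        return -1;
    for (;;) {
        handle_request(&req);
        /* One call returns this response and waits for the next request. */
        if (ioctl(fd, MY_RSPREQ, &req) == -1)
            return -1;
    }
}
```

A real driver would put struct my_req and the _IOR/_IOWR definitions in a shared header, and the daemon would open the driver's device node before calling daemon_loop().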
Re: running 5.1-RELEASE with no procfs mounted (lockups?)
John Baldwin wrote: Since ktrace logs all syscall entries and exits, it should seem that a kdump after the process had exited would show which syscall returned EAGAIN quite easily. This works if the process exits after the EAGAIN; that would only work for the specific error that people are seeing currently. If the process does what it's supposed to do when it sees EAGAIN, and repeats the call, you could get in a tight loop. The ktrace output could be examined after killing the process, but until the process exits, there's really no output that can be examined using kdump. The problem is that ktrace/kdump rendezvous at a file; truss does not, so it has some capabilities that ktrace does not. In some circumstances (e.g. a system crash, where kdump doesn't get a chance to get at the file, because it's cleaned up and not even fully written, when it's not cleaned up) ktrace loses utterly. This is not to say that it's not a useful tool (I use it myself); just that truss has some utility in situations where a tighter coupling between the tracing and the display of the trace information is useful. My second example is a much better case; my first one was mostly designed for a current discussion about EAGAIN, whereas the most utility for truss over ktrace involves an actual system crash, and/or an application that doesn't exit [ab]normally, thus giving you a synchronized trace file to play with. It's really all about loosely coupled synchronization (ktrace) vs. tightly coupled synchronization (truss). With truss, you can even expect that in many circumstances, you will at least get boundary information, even in the face of a system crash -- this is a situation that ktrace would lose for sure, if the crash couldn't sync. -- Terry
Re: USB device programming with ugen [Solved]
Martin wrote: Seems it was my fault. I found a solution in forums. I had to try out many things until someone has pointed me to BIOS settings and assigning interrupt to USB. I noticed it was off and enabled it. It works now. :) (E.a.: no delays while accessing ugen0 and no freeze with X11.) Martin PS.: perhaps someone should mention it somewhere in the FAQ. The attach should have failed, if it didn't have an interrupt. -- Terry
Re: portupgrade
Andrew Konstantinov wrote: I've written a simple script to make my life easier, but there is a problem with that script and I can't figure out the source of that problem. [ ... ] The problem is simple. Whenever this script confronts a program which needs to be upgraded, portupgrade removes the old version and then script terminates without any error messages, while several instances of bash and linker continue to run in the background for several seconds, and then also terminate. Try using the real shell instead of bash, and let us know if the problem still happens. -- Terry
Re: What ever happened with this? eXperimental bandwidthdelayproduct code
Dan Nelson wrote: In the last episode (Jul 09), Max Clark said: 6000/8*.220 = 165 Kbytes, or 1.32 Mbit. I understand the BDP concept and the calculation to then generate the tcp window sizes. What I don't understand is this... How in the world is a windows 2000 box running commercial software able to push this link to 625KByte/s (5Mbit/s)? Perhaps it defaults to a larger window size? You can easily verify this with tcpdump or ethereal. It's a guarantee that they default to a smaller MSL than the standard permits. It's smaller by a factor of 10; to get the same effect in FreeBSD: sysctl net.inet.tcp.msl=3000 I *do not* recommend mucking with this timer in order to reduce latency; there are a number of nasty session restart and other attacks you can do using this and taking advantage of intimate knowledge of the TCP state machine implementation of state transitions, and it's easier to DoS attack your machine because it's a 10 times shorter trip to run you up past your tolerable latency. I would much rather recommend the approaches referenced by my other posting, and listed in the recent FreeBSD-performance mailing list discussion. Note that one of the things Microsoft is specifically required to do when running certain benchmarks is set their registry values to push the MSL back to the standards mandated value. -- Terry
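The bandwidth-delay arithmetic in the quoted question can be checked with two hypothetical helpers. The window is a byte count per round trip, not a rate: 6,000,000 bit/s / 8 × 0.220 s = 165,000 bytes. By the same formula, sustaining 625 KByte/s (5 Mbit/s) over a 220 ms path needs about 137,500 bytes in flight, and an unscaled 65,535-byte window caps one connection at roughly 298 KByte/s on that path:

```c
/* Bytes that must be in flight to keep a link busy: bandwidth x delay. */
double bdp_bytes(double link_bits_per_sec, double rtt_sec)
{
    return link_bits_per_sec / 8.0 * rtt_sec;
}

/* Upper bound on one connection's throughput: one window per round trip. */
double max_throughput_bytes_per_sec(double window_bytes, double rtt_sec)
{
    return window_bytes / rtt_sec;
}
```

This covers only the window side of the question; it says nothing about the MSL behavior discussed in the reply.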
Re: What ever happened with this? eXperimental bandwidthdelayproduct code
Max Clark wrote: :) hehe... Okay, let's say how do I force my machine to think it doesn't have any latency and saturate a 6Mbit/s link even though the link has 220ms latency? See the recent discussion on the FreeBSD-performance mailing list. -- Terry
Re: raw socket programming SOLVED
David A. Gobeille wrote: Shouldn't the #included files themselves #include headers they are dependent on? With the use of #ifndef and #define in the headers to keep them from being #included more than once? It seems silly (more work) for the programmer to have to arrange everything in a specific order. The order is documented. Maybe we could have one big header file that declares everything in the right order so that no one has to think at all? 8-) 8-). In general, FreeBSD headers are independent. Unless you have compiler support for precompiled header files, there is a significant reduction in compile time for not having promiscuous headers (headers which include other headers). There are also issues having to do with whether or not certain functions or manifest constants end up in the namespace, and conflict with user header definitions. By including only a minimal set of the headers needed to compile the program, you reduce the possibility of such conflicts occurring. As another point: the craftsman should know his tools; in other words, you should be aware of which headers are necessary, because you know what you are doing (or are able to figure it out, and remember it for the next time you need the information). Finally, POSIX compliance -- strict compliance -- requires that certain headers *not* define values into the implementation space. This is particularly problematic when it comes to inline functions wrapped by macros (for example) or compiler built-ins (for another example) when you want to get a specific implementation, or when the compiler built-in is the only way to do something correctly, as in the recent issues FreeBSD had with alloca() and stack alignment with certain cputype values. -- Terry
Re: 5 Advanced networking questions
Socketd wrote: Ok, any way to prevent sending ICMP's when ttl = 0? Or do I need a firewall? I guess you want to do this so that you can break path MTU discovery and fail to properly exchange packets with the DF bit set in the headers, and which don't take into account intermediate links with smaller MTUs, like VPNs or PPPoE links? What exactly are you getting from disabling ICMP, besides a broken network connection to some systems you may wish to be able to exchange packets with? -- Terry
Re: kernel hacking
Sandeep Kumar Davu wrote: I was making changes to 4.5 source code. I tried to recompile the kernel. it compiles well but is not able to link it. I used the function inet_aton in uipc_socket.c This is the error i got. uipc_socket.o(.text+0xid8): undefined refernce to '__inet_aton' I added all the header files that were required. Can anyone tell what is missing. You are trying to call a libc function from within the kernel. In general, you cannot use /usr/include headers in kernel code, only /usr/include/sys headers. This is because the kernel is not linked against libc. Mostly, you should not be dealing with strings in the kernel; if you are adding a kernel entry point to be called from user space, you should convert the address into a sockaddr_in before you pass it into the kernel, instead. -- Terry
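The user-space side of that advice might look like this: parse the string with inet_aton(3) before entering the kernel, and hand the kernel a filled-in struct sockaddr_in. make_sockaddr_in() is a hypothetical helper; sin_len is set only on FreeBSD, whose sockaddrs carry their own length:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/*
 * Hypothetical helper: build a sockaddr_in from a dotted-quad string in
 * user space, so the kernel never has to parse text.  Returns 0 on
 * success, -1 if the string is not a valid IPv4 address.
 */
int make_sockaddr_in(const char *text, unsigned short port,
    struct sockaddr_in *sin)
{
    memset(sin, 0, sizeof(*sin));
#ifdef __FreeBSD__
    sin->sin_len = sizeof(*sin);            /* BSD sockaddrs carry their length */
#endif
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);            /* network byte order */
    if (inet_aton(text, &sin->sin_addr) == 0)   /* 0 means parse failure */
        return -1;
    return 0;
}
```

The kernel entry point then receives a fixed-size binary structure (typically brought in with copyin(9)) rather than a string it would have to parse.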
Re: 5 Advanced networking questions
Socketd wrote: I guess you want to do this so that you can break path MTU discovery and fail to properly exchange packets with the DF bit set in the headers, and which don't take into account intermediate links with smaller MTUs, like VPNs or PPPOE links? What exactly are you getting from disabling ICMP, besides a broken network connection to some systems you may wish to be able to exchange packets with? I don't want to disable ICMP, just don't want to respond when ttl=0, meaning when my firewall/gateway is on a traceroute path. You should specifically modify the ICMP code to not respond to echo datagrams, or when ttl == 0, then, and work it that way. In other words, it's time to hack your network stack to specifically add that feature. -- Terry
Re: Adding second-level resolution to cron(8).
Rich Morin wrote: I have a project for which I need a generalized time-based scheduling daemon. cron(8) is almost ideal, but it only has minute-level resolution. So, I'm thinking about modifying cron to add second-level resolution. Before I start, I thought I'd ask a few questions: * Has someone already done this? There are a couple of programs that can do this, but they aren't cron, per se. * Is it a Really Bad Idea for some reason? One really bad thing about using cron as your starting point is that each time it wakes up, it stats the crontab file to see if it has changed (historical cron implementations needed to be sent a SIGHUP instead). It's also lazy about time changes. The net effect of the first is that it wildly spins updating atime on the file; this is bad from a number of perspectives, but the worst thing about it is that there are a limited number of write cycles on things like flash memory ATA devices that you would use in an embedded system, so you end up having to do a lot of work to get a copy of the crontab into a ramdisk. The net effect of the second is that cron would, in effect, need to increase its rate of stat to once a second, which means an MTBF sixty times shorter, even if we are talking about a disk inode write, rather than a flash device. Again, the fix would have to be to move the crontab somewhere else. Really, this type of thing needs a timer set to go off on or after a specific time, rather than using an interval timer, or the code needs to get a lot smarter about calculating when the next work needs to be done, and use a kqueue to watch the file for modifications instead, so that a single wait can implement both the watching and the interval timer. It also needs to be smarter about not rereading the file every time it needs to run, and using a cached copy instead. Unfortunately, cron is not a happy program. 8-(.
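The "calculate when the next work needs to be done" half can be sketched in a few lines. This is a hypothetical second-resolution job table, not cron's data structures: `struct sec_job`, its `sec_mask` field (bit s set means "run at second s of every minute"), and `next_fire` are invented names, and seconds-of-minute are taken as `t % 60` on the epoch clock. A daemon built this way would sleep until the returned absolute time (for example with clock_nanosleep() and TIMER_ABSTIME where the system provides it) and watch the crontab with a kqueue EVFILT_VNODE event, instead of waking every second to stat() the file.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical job entry: bit s of sec_mask set = run at second s (0..59). */
struct sec_job {
    uint64_t sec_mask;
};

/* Return the earliest whole second t > now at which any job is due, or
 * (time_t)-1 if no job has any valid bit set. Scanning 60 seconds ahead
 * covers every second-of-minute exactly once. */
static time_t next_fire(time_t now, const struct sec_job *jobs, size_t njobs)
{
    for (time_t t = now + 1; t <= now + 60; t++) {
        unsigned s = (unsigned)(t % 60);
        for (size_t i = 0; i < njobs; i++)
            if (jobs[i].sec_mask & ((uint64_t)1 << s))
                return t;
    }
    return (time_t)-1;
}
```

The daemon wakes once per due job rather than once per second, which is what keeps the stat()/atime traffic down.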
-- Terry
Re: 5 Advanced networking questions
Socketd wrote: On Tue, 08 Jul 2003 04:17:04 -0700 Terry Lambert [EMAIL PROTECTED] wrote: I don't want to disable ICMP, just don't want to respond when ttl=0, meaning when my firewall/gateway is on a traceroute path. You should specifically modify the ICMP code to not respond to echo datagrams, or when ttl == 0, then, and work it that way. In other words, it's time to hack your network stack to specifically add that feature. Hmm, why not just use a firewall? Because most firewalls, even commercial ones, don't block the ICMP messages you appear to be interested in blocking. You appeared to want to turn your FreeBSD box into what's normally called a stealth system: one that doesn't respond at all to external probe attempts. So it looked like you were trying to *write* a firewall, or at least find a set of rules that would let your FreeBSD box act as a stealth one. The current FreeBSD doesn't support stealth; it's generally something you do to stop network finger-printing and/or to use as a base for launching your own attacks and/or in an attempt to protect a Windows box that can't protect itself very well. If you want the feature in FreeBSD, you are going to need to hack some code. If you are willing to go out and spend money on a stealth firewall box, well, you should feel free to do that, too; if you do, I recommend SunScreen from Sun Microsystems, though in general, I don't recommend using stealth firewalls, since they break a number of end-to-end guarantees: http://wwws.sun.com/software/securenet/index.html If you want a real firewall, I recommend the Cisco PIX: http://www.cisco.com/warp/public/cc/pd/fw/sqfw500/ I also recommend reading about the drawbacks of using stealth firewalls, to help decide whether you want to avoid attackers by hiding from them, or avoid attackers by having working firewall software which has been usefully audited, instead. 8-).
http://web.proetus.com/reference/stealthfw/ If you just want to avoid ICMP echo datagrams, I'd change my filter criteria from what you are asking for (TTL==0) to ICMP type, and filter packets of type 11 (time exceeded, which is what traceroute relies on) and type 0 (echo reply) using the ipfw icmptypes option on your filter rule. It's not the same thing as a stealth firewall, but it is good enough to handle your initial complaint, which was the ability to traceroute. Then you wouldn't need to buy another machine. -- Terry
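The rule itself would be something along the lines of `ipfw add deny icmp from me to any out icmptypes 0,11`, but check ipfw(8) on your release for the exact syntax. The filter criterion behind it is just set membership on the ICMP type, modeled below as a bitmask with one bit per type. This is an illustrative model, not ipfw's internals: `icmp_typeset`, `typeset_of`, and `drop_outbound_icmp` are invented names, and only types 0..31 are representable here.

```c
#include <stdbool.h>
#include <stdint.h>

/* ICMP type numbers (RFC 792). */
enum {
    ICMP_T_ECHOREPLY = 0,   /* answer to a ping */
    ICMP_T_UNREACH   = 3,   /* destination unreachable */
    ICMP_T_ECHO      = 8,   /* ping request */
    ICMP_T_TIMXCEED  = 11   /* TTL expired in transit: what traceroute sees */
};

/* A set of ICMP types 0..31, one bit per type. */
typedef uint32_t icmp_typeset;

static icmp_typeset typeset_of(const int *types, int n)
{
    icmp_typeset set = 0;
    for (int i = 0; i < n; i++)
        if (types[i] >= 0 && types[i] < 32)
            set |= (icmp_typeset)1 << types[i];
    return set;
}

/* Filter predicate: drop an outbound ICMP packet whose type is in `blocked`. */
static bool drop_outbound_icmp(uint8_t type, icmp_typeset blocked)
{
    return type < 32 && (blocked & ((icmp_typeset)1 << type)) != 0;
}
```

Blocking {0, 11} keeps the box out of pings and traceroute hop lists while still letting "fragmentation needed" (type 3) through, so path MTU discovery keeps working.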
Re: how to call a syscall in a kernel module?
zhuyi wrote: Dear all: How to call a syscall in a kernel module? In Linux, you can add two lines into your source code. #define __KERNEL_SYSCALLS__ #include <linux/unistd.h> Most system calls call copyin, copyinstr, or uiomove, which assume that the data is in the user process that is active at the time of the system call. This basically boils down to UIO_SYSSPACE vs. UIO_USERSPACE: whether the buffer being described lives in kernel or user address space. Because of this, it's generally not possible to call system calls from within the kernel. It would probably be reasonable to turn all the system calls into wrappers; there would be an additional function call worth of overhead for all system calls, but the benefit would be to enable calling of all calls from kernel space. The overhead is probably worth it (AIX works this way, and it doesn't seem to hurt them at all). -- Terry
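The wrapper idea can be modeled in user space. Everything below is hypothetical: `enum addr_space` stands in for the UIO_USERSPACE/UIO_SYSSPACE distinction, `model_copyinstr` stands in for copyinstr(), and `sys_doit`/`kern_doit` are an invented system call and its common body (the real work is elided; it just copies the path into a kernel buffer). The point is the shape: the system call entry becomes a one-line wrapper that says "this pointer is in user space", and kernel code calls the body directly with a kernel pointer.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the UIO_USERSPACE / UIO_SYSSPACE flag. */
enum addr_space { AS_USER, AS_KERNEL };

/* Stand-in for copyinstr(): in this user-space model both "address spaces"
 * are the same memory, so it only does a bounded string copy. A real
 * copyinstr() also validates the user address and catches faults. */
static int model_copyinstr(const char *uaddr, char *kbuf, size_t len)
{
    size_t n = strnlen(uaddr, len);
    if (n == len)
        return ENAMETOOLONG;
    memcpy(kbuf, uaddr, n + 1);
    return 0;
}

/* Common body, callable from anywhere: the caller says where `path` lives. */
static int kern_doit(const char *path, enum addr_space where,
                     char *kbuf, size_t len)
{
    if (where == AS_USER)
        return model_copyinstr(path, kbuf, len);
    /* Already a kernel pointer: no copyin, just a bounded copy. */
    size_t n = strlen(path);
    if (n >= len)
        return ENAMETOOLONG;
    memcpy(kbuf, path, n + 1);
    return 0;
}

/* The system call entry point becomes a thin wrapper: one extra call. */
struct doit_args { const char *path; };

static int sys_doit(struct doit_args *uap, char *kbuf, size_t len)
{
    return kern_doit(uap->path, AS_USER, kbuf, len);
}
```

A kernel module would call `kern_doit(path, AS_KERNEL, ...)` directly, which is the extra function call of overhead mentioned above.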
Re: recovering data from a truncated vn-file possible?
Josh Brooks wrote: Long story short, I have a 4gig vn-backed filesystem. The file backing it is now missing the last 750megs ... I can vnconfig it, but when I fsck it I see: Probably the first thing you'll want to do is write a small program to open the file and write a zero at the last byte of its original 4 GB size, putting the missing 750 MB back as zero-filled space so that the device is the right size again. Most of the recovery tools, including fsck, go into convulsions if the device size shrinks on them. So the first thing you want to do is change the size back to what it should be. -- Terry
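A minimal sketch of such a program, assuming you know the backing file's exact original size (the name `extend_to_size` is made up). It seeks to one byte before the target size and writes a single zero; the gap reads back as zeros, and the existing data is not touched. It deliberately refuses to write anything if the file is already at least that long, so it can never clobber real data.

```c
#define _FILE_OFFSET_BITS 64    /* 64-bit off_t on 32-bit glibc; FreeBSD already has it */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Grow the file at `path` to `size` bytes by writing one zero byte at
 * offset size - 1. Never shrinks or overwrites: if the file is already at
 * least `size` bytes, nothing is written. Returns 0 on success, -1 on
 * error (errno set by the failing call). */
static int extend_to_size(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;

    int rc = 0;
    struct stat st;
    if (fstat(fd, &st) == -1) {
        rc = -1;
    } else if (st.st_size < size) {
        const char zero = 0;
        if (lseek(fd, size - 1, SEEK_SET) == (off_t)-1 ||
            write(fd, &zero, 1) != 1)
            rc = -1;
    }
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```

For a 4 GB image that would be a call like `extend_to_size("fs.img", (off_t)4 << 30)`, with the real file name and size substituted; then vnconfig and fsck it again.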
Re: recovering data from a truncated vn-file possible?
Paul Schenkeveld wrote: $ truncate -s original_size file Bah! Why use a utility, when you can write a program? -- Terry
Re: non-32bit-relocation
Andrew wrote: I wrote a small assembly program to send a string to the floppy. I'm not familiar with nasm, the assembly language compiler I'm using and even less familiar with the as program. basically, other code aside, the nasm compiler says, when using -f elf, that it does not support non-32-bit relocations. Ok, I'm not an assembly expert and less familiar with 'elf' so I need some help. There are a number of assembly language resources for FreeBSD; here are a couple of the better ones: http://www.int80h.org/bsdasm/ http://linuxassembly.org/resources.html The instructions were mov es, seg buffer mov bx, [buffer] int 13h. Can anyone tell me how to do this on a FreeBSD machine? You either don't do this at all, or you use the vm86() system call to do the work in a virtual machine. Even then, in general this will not work for this particular use, since INT 0x13 is used to do disk I/O through the BIOS, and the BIOS does not own the disk; the OS disk driver does. To do what you want to do, you should probably call the open(2) system call on the disk device from assembly, and then call the write(2) system call from assembly, instead of trying to use INT 0x13. Here's a little program to write some garbage to /dev/fd0; I call it foo.s. Don't run it against a floppy you care about. Compile it with cc -o foo foo.s; it will create a dynamically linked program, unless you tell it not to (e.g. cc -static -o foo foo.s). If you want to avoid crt0 entirely, instead of using it as this example does so that C and assembly can be mixed, then you should follow the first link above, to the first tutorial.
Here's the code:

```asm
#
# Assembly program to write a string to the floppy
#

#
# Here are my strings
#
        .section .rodata
# /dev/fd0 + NUL
.device_to_open:
        .byte   0x2f,0x64,0x65,0x76,0x2f,0x66,0x64,0x30,0x0
# my string
.string_to_write:
        .byte   0x6d,0x79,0x20,0x73,0x74,0x72,0x69,0x6e,0x67

#
# Code goes in text section
#
        .text
        .p2align 2,0x90

# main() -- called by crt0's _main() function
        .globl  main
        .type   main,@function
main:
        pushl   %ebp                    # main is a function; this is the
        movl    %esp,%ebp               # entry preamble for a function with
        subl    $8,%esp                 # no parameters

        addl    $-4,%esp
        # fd = open(char *path, int flags, int mode)
        pushl   $0                      # mode 0 (not used; callee might care though)
        pushl   $2                      # flags; 2 = O_RDWR
        pushl   $.device_to_open        # path string; must be NUL terminated
        call    open                    # call open function
        addl    $16,%esp
        movl    %eax,%eax
        movl    %eax,fd                 # system call return is in %eax (-1 == error)

        addl    $-4,%esp
        # write(int fd, char *buf, int len)
        pushl   $9                      # len = 9 bytes
        pushl   $.string_to_write       # buf; does not need NUL termination
        movl    fd,%eax                 # fd
        pushl   %eax
        call    write                   # call write function
        addl    $16,%esp

        addl    $-12,%esp
        # close(fd)
        movl    fd,%eax                 # fd
        pushl   %eax
        call    close                   # call close function
        addl    $16,%esp

        # just return... we could have called exit, but _main will do that
        leave
        ret
.finish:
        .size   main,.finish-main       # size of main, for the symbol table
        .comm   fd,4,4                  # global variable 'fd'
#
# done.
#
```

-- Terry
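For comparison, here is roughly the same program in C, sketched with a hypothetical helper `write_to_device` that takes the path as a parameter so it can be tried on an ordinary file first. It makes the same three calls as foo.s (open with O_RDWR, write, close); unlike the assembly, it checks open()'s return value before using the descriptor.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* C version of foo.s: open `path` read/write, write `len` bytes from `buf`,
 * close. Returns the number of bytes written, or -1 on error. */
static ssize_t write_to_device(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_RDWR);    /* foo.s pushes flags 2 (O_RDWR), mode 0 */
    if (fd == -1)
        return -1;
    ssize_t n = write(fd, buf, len);
    if (close(fd) == -1)
        n = -1;
    return n;
}
```

`write_to_device("/dev/fd0", "my string", 9)` mirrors foo.s, with the same warning about floppies you care about.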
Re: TODO list?
Simon L. Nielsen wrote: On 2003.06.27 16:10:13 -0700, Joshua Oreman wrote: I currently have a lot of free time and I was wondering whether there was a TODO list of some sort for bugs that need fixing in FreeBSD. I really want to help the project, and I think such a list would make it much easier to do so. If there's no official TODO list, could someone point out some things? I know C/C++, but I'm very unfamiliar with the kernel. Great :-) There is always plenty to do. I would suggest looking at the PR system and at the 'Contributing to FreeBSD' article which can be found at http://www.freebsd.org/doc/en_US.ISO8859-1/articles/contributing/index.html Hope you find something interesting to spend some time on. Give him a commit bit, and he can quickly grind through all the PRs that already have diffs attached to them, and have just sat there forever. All he'd need to do is verify that there was a problem being fixed, and that the code didn't look like it would cause damage. If it ends up causing damage anyway, the fix can always be backed out later. Making send-pr actually result in code changes would probably be the most valuable thing anyone could do for the project, and it would give him a chance to read and understand a lot of diverse code in the process, to get up to speed on writing his own fixes for PRs without fixes attached. Just my $0.02... -- Terry
Re: Page Coloring Defines in vm_page.h
Bruce M Simpson wrote: Something occurred to me whilst I was re-reading the 'Design Elements' article over the weekend; our page coloring algorithm, as it stands, might not be optimal for non-Intel CPUs. Actually, it does very well for most architectures, and was originally developed into product for SPARC and MIPS (by Sun and MIPS Inc., respectively). There are 29 research papers that specifically mention page coloring; here is the search that lists them: http://citeseer.nj.nec.com/cs?q=page+coloring&cs=1 I'm going to claim that this is one of the better ones (;^)), mostly because it has some nice real statistics, and represents a rather fun approach to the problem, which doesn't actually involve page coloring, and touches on the rather neat idea of doing cache affinity as part of the scheduler operation: http://citeseer.nj.nec.com/410298.html Also, it has a couple of nice references to related works you aren't going to find online: you will need to find yourself a technical library; references 1, 6, 8, 9, and 12 especially. - Why is this important? The vm code as it stands assumes that we colour for L2 cache. This might not be optimal for all architectures, or, indeed, forthcoming x86 chips. The code previously colored for both L1 and L2. It turns out that the penalty you pay for L1 set overlap is relatively low, given how much smaller L1 is than L2. See also: http://citeseer.nj.nec.com/kessler92page.html - The defines in vm_page.h seem to assume a 4-way set associative cache organization. Yes. This was the most common L2 cache hardware arrangement at the time. There were a couple of good postings on page coloring on the FreeBSD lists back in the mid-1990s by John Dyson, who was the original implementor of both the page coloring code and the unified VM and buffer cache code, which Poul was complaining was incomplete, in the recent FS stacking thread on -arch.
You can probably find them in the archives on Minnie (Warren Toomey's archival machine). - If someone could fill me in on how the primes are arrived at that would be very helpful. It's an attempt to get a perfect hash without collision; page coloring relies on avoiding cache line overlap, if at all possible (sometimes it isn't, which is why the page coloring compiler work and the cache affinity scheduler are such intriguing ideas, to me at least, though the compiler work would probably be incredibly hard to tune in any useful fashion to work across an entire CPU family, rather than specific CPU instances). -- Terry
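The arithmetic behind "colors" can be sketched as follows. This is illustrative only, not the code in vm_page.h: it assumes a physically indexed cache and power-of-two sizes, and the function names are invented. Pages whose frame numbers are congruent modulo the color count compete for the same cache sets; a 256 KB 4-way cache with 4 KB pages has 256 KB / (4 KB * 4) = 16 colors. One way to see why a prime helps with the "perfect hash without collision": if successive objects get starting colors by stepping a stride modulo the color count, a stride sharing no factor with that count (as any odd prime does when the count is a power of two) visits every color once before repeating.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of page colors for a physically indexed cache. */
static size_t page_colors(size_t cache_bytes, size_t page_bytes, size_t ways)
{
    return cache_bytes / (page_bytes * ways);
}

/* Color of a physical page frame number: frames with the same color map to
 * the same cache sets and can evict each other. */
static size_t page_color(size_t pfn, size_t ncolors)
{
    return pfn % ncolors;
}

/* True if stepping a starting color by `stride` (mod ncolors) visits every
 * color before repeating, i.e. gcd(stride, ncolors) == 1. */
static bool stride_covers_all(size_t stride, size_t ncolors)
{
    size_t a = stride % ncolors, b = ncolors;
    while (a != 0) {            /* Euclid's algorithm; b ends as the gcd */
        size_t t = b % a;
        b = a;
        a = t;
    }
    return b == 1;
}
```

With 16 colors, strides of 23 or 31 cover every color, while a stride of 8 would bounce between just two.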