Re: FreeBSD 5/6/7 kernel emulator for NetBSD 2.x
On Thu, Oct 27, 2005 at 04:53:35PM -0400, der Mouse wrote:

- Implements FreeBSD's devfs on NetBSD.

In the past, we (NetBSD folks) have talked about a devfs. [...persistence...]

FreeBSD 5+ has /etc/devfs.conf and /etc/devfs.rules [...]. Is that what you're looking for?

I didn't write what you're responding to, but, speaking personally: if making changes to /dev with chmod/chown/mv/etc. results in those files getting rewritten to match, it's fine. If not, it's not.

I think that having those changes propagated to the userland config files would be very good. I'd really want NetBSD to do it, and I bet it could be a useful feature for FreeBSD too.

Take care,

Bill
Re: FreeBSD 5/6/7 kernel emulator for NetBSD 2.x
On Mon, Oct 24, 2005 at 10:35:47PM +0200, Hans Petter Selasky wrote:

Main features:
- Implements FreeBSD's devfs on NetBSD.

In the past, we (NetBSD folks) have talked about a devfs. One issue that has come up (I'll be honest, I've raised it a lot) is a desire to retain permission changes across boots, and to tie devices (when possible) to a device-specific attribute rather than to probe order.

Does FreeBSD's devfs support locators and persistent information? If not, are there plans to support something like that?

Take care,

Bill
Re: kqueue, NOTE_EOF
On Wed, Nov 12, 2003 at 09:58:15AM +0100, Jaromir Dolecek wrote:

marius aamodt eriksen wrote:

hi - in order to be able to preserve consistent semantics across poll, select, and kqueue (EVFILT_READ), i propose the following change: on EVFILT_READ, add an fflag NOTE_EOF which will return when the file pointer *is* at the end of the file (effectively always returning on EVFILT_READ, but setting the NOTE_EOF flag when it is at the end).

specifically, this allows libevent[1] to behave consistently across underlying polling infrastructures (this has become a practical issue).

I'm not sure I understand what the exact issue is. I'm only responding to the notes also. Why would this be necessary, or what exactly does this solve? AFAIK poll() doesn't set any flags in this case either, so I don't see how this is inconsistent.

I think the difference is in the default behavior. When you're at EOF, I know that poll() will give you a read-availability event, so you'll read the EOF. Will kqueue?

BTW, shouldn't the EOF flag be cleared when the file is extended?

Probably.

Take care,

Bill
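Bill's point about the default behavior can be checked with plain POSIX calls. POSIX specifies that regular files always poll as readable, so poll() reports POLLIN even when the offset is already at EOF, and the subsequent read() returns 0. The following is a minimal sketch under that assumption: the function name and the /tmp path in the usage note are made up for illustration, and it exercises only poll(), not kqueue or the proposed NOTE_EOF flag.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Create a one-byte regular file at `path`, leaving the offset at EOF, then
 * check that poll() reports POLLIN and that read() returns 0 (EOF).
 * Returns 1 if both hold, 0 if not, -1 on setup failure.
 */
static int poll_readable_at_eof(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    int result = -1;
    if (write(fd, "x", 1) == 1) {            /* offset is now at end of file */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        char c;
        int ready = poll(&pfd, 1, 0);        /* timeout 0: do not block */
        result = ready == 1 && (pfd.revents & POLLIN) &&
                 read(fd, &c, 1) == 0;       /* 0 bytes read means EOF */
    }
    close(fd);
    unlink(path);
    return result;
}
```

Calling poll_readable_at_eof("/tmp/poll_eof_sketch.tmp") on a POSIX system should yield 1. For contrast, the kqueue(2) manual describes EVFILT_READ on a vnode as returning only when the file pointer is not at the end of the file, which is the asymmetry the NOTE_EOF proposal is aimed at.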
Re: Technical Differences of *BSD and Linux
On Fri, 24 Jan 2003, arief_mulya wrote:

Dear all,

I apologize if this thread has existed before, and if this is very off-topic and tiresome for most of you here. I'm a newbie, just about to get my feet wet in the kernel code. I have been using (GNU/)Linux (or whatever the name is; I personally don't really care, I care most about technical excellence) for the last two years, and I personally think it's a toupper(great); system. But after recently reviewing some BSD-based systems, I began to wonder. These are my questions (I'm trying to avoid flaming and trolling here, so if any of my questions are not on a technical basis, or I'm being a jerk troll, please just filter my name and email address to the trash):

Evidently others opted not to pursue that option.

1. In what technical areas of the kernel do Linux and *BSD differ?
2. How do they differ? What is the technical reasoning behind the decisions?

They differ in most technical areas, mainly because the *BSD kernels were derived from 4.4BSD-Lite, while Linux was, as I understand it, written from scratch, though it was originally developed on Minix. The differences grew because the systems were developed by different groups of people.

Within the BSDs, the main focus of each one is different. To put it in terms of sound bites: FreeBSD wants to make kick-ass servers, NetBSD wants to support lots and lots of hardware, and OpenBSD is all about security. That doesn't mean that the others ignore those areas; all three are interested in security and in being servers, and they all run on more than one platform.

There is also a lot of cross-pollination between the BSDs. Things will show up in one and then get ported to another.

3. Is there any group of developers from each project that reviews each other's changes and tries to make the best code, or are the issues very system-specific (something that works best on Linux might not work as well on FreeBSD, NetBSD, or OpenBSD)?

Sometimes changes will apply to all, and a comparable fix will happen in each. This usually shows up in dealing with security advisories, but happens in other places too. For the most part, though, what the BSDs need is different from what Linux needs, or at least the expertise doesn't overlap.

4. Any chance of merging the very best parts of each kernel?
5. Or is it possible to do so?

No, I don't foresee merging. der Mouse pointed out the GPL issue, which is one where I think the BSD and Linux folks will just agree to disagree.

Take care,

Bill

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: Temperature
On Wed, 29 Dec 1999, Ted Sikora wrote:

It is HOTTER under FreeBSD. Immediately upon boot-up it's 26F hotter under FreeBSD than under Linux. This started sometime between 3.4-RC and now (I follow the stable branch via CVSup). Under 3.3-STABLE the temperature was always the same as under Linux... cool, averaging 89F for the CPUs. Now it's over 113F under FreeBSD only. I know it's weird, but the machine does not lie. Under Linux it's the same as before, 87-89F.

The big question, of course, is: are you sure the machine's not lying? I agree with you that it's unlikely to be a fundamental hardware problem (like getting no air flow) if Linux still reports sane temperatures. But it seems quite reasonable that temperature reading somehow broke for your hardware when you upgraded.

There are two easy ways to settle the issue. One is to get a thermocouple thermometer, put the thermocouple in the case, and see exactly what happens to the case temperature. The other is to look at the voltages being returned by the temperature sensors, as opposed to the reported temperatures. If the voltages are the same under the two OSes, then it's definitely a reporting error. :-)

Take care,

Bill
Re: Portable way to compare struct stat's?
On Mon, 15 Nov 1999, Kelly Yancey wrote:

Is there a portable method for determining if the contents of two struct stat's are identical? I believe there is not. The problem is that while [...]

What exactly are you trying to do? i.e. why are you comparing the struct stat's?

Take care,

Bill
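Bill's question matters because "identical" can mean different things. One portable approach, sketched here under the assumption that only the standard POSIX members matter, is to compare members individually instead of using memcmp(), which would also compare padding bytes and any implementation-specific fields. The function names are hypothetical.

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Compare two struct stat values member by member.  st_atime is skipped
 * on purpose, since merely reading a file can change it; adjust the list
 * to whatever "identical" means for the caller.
 */
static bool stat_equal(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev &&
           a->st_ino == b->st_ino &&
           a->st_mode == b->st_mode &&
           a->st_nlink == b->st_nlink &&
           a->st_uid == b->st_uid &&
           a->st_gid == b->st_gid &&
           a->st_size == b->st_size &&
           a->st_mtime == b->st_mtime &&
           a->st_ctime == b->st_ctime;
}

/* If the real question is "do these name the same file?", dev+ino suffices. */
static bool same_file(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}
```

Choosing the member list is the real design decision: a cache-invalidation check wants size and times, while an "is this the same file" check needs only st_dev and st_ino.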
Re: Status of UMAPFS
On Sat, 16 Oct 1999, Zhihui Zhang wrote:

On Fri, 15 Oct 1999, Zhihui Zhang wrote:

Is UMAPFS working? I added "options UMAPFS" to the configuration file of FreeBSD 3.3-RELEASE and rebuilt the kernel. I got the following errors:

    loading kernel
    umap_vnops.o: In function `umap_lock':
    umap_vnops.o(.text+0x568): undefined reference to `null_bypass'
    umap_vnops.o: In function `umap_unlock':
    umap_vnops.o(.text+0x58e): undefined reference to `null_bypass'
    *** Error code 1
    Stop.

I found out that you must also include NULLFS in the kernel for it to compile. I have tested NULLFS and UMAPFS with some trivial commands. Both work.

In NetBSD, we changed these two references to be to umap_bypass since it is, after all, umapfs. :-)

Take care,

Bill
Re: mounting a partition more than once
On Mon, 13 Sep 1999, Tony Finch wrote:

Well, in the absence of any comments I hacked around a bit and ended up with the following patch (against 3.3-RC), which permits the same block device to be mounted read-only more than once. The motivation for this is to permit multiple chrooted environments to share the same /usr partition.

Wouldn't it be much cleaner to use nullfs? This application is what it's good at.

Take care,

Bill
Re: The meaning of LK_INTERLOCK
On Wed, 13 Oct 1999, Zhihui Zhang wrote:

The comments say that the flag LK_INTERLOCK means "unlock passed simple lock after getting lk_interlock". Under what circumstances do we need two simple locks (releasing the first one after getting the second one)? I cannot understand this easily from the source code. Any help is appreciated.

The idea is that the other interlock protects something whose value determines whether we want to grab the lock. For example, vn_lock() grabs the vnode interlock and looks at v_flag. If VXLOCK is clear, we then call VOP_LOCK with LK_INTERLOCK. Because the lock manager takes its own lk_interlock before releasing the vnode interlock, no one can get in and modify the flags between our check and our entry into the lock manager.

Take care,

Bill
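The hand-off can be modeled in user space. This is a single-file sketch, not kernel code: the simple lock is a C11 atomic spin lock, the lock manager is reduced to an exclusive lock that fails with EBUSY instead of sleeping, and the flag values and function names are illustrative assumptions. What it does show is the ordering described above: the lock manager acquires its own lk_interlock before releasing the caller's vnode interlock, so there is no moment between checking v_flag and entering the lock manager when neither interlock is held.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Spin lock standing in for the kernel's simple lock. */
struct simplelock { atomic_bool held; };

static void simple_lock_init(struct simplelock *l) { atomic_init(&l->held, false); }
static bool simple_lock_try(struct simplelock *l)
{
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}
static void simple_lock(struct simplelock *l)
{
    while (!simple_lock_try(l))
        ;                                   /* spin */
}
static void simple_unlock(struct simplelock *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}

#define LK_INTERLOCK 0x1                    /* illustrative value */
#define VXLOCK       0x1                    /* illustrative value */

/* Toy lock manager: exclusive only, returns EBUSY instead of sleeping. */
struct lock { struct simplelock lk_interlock; int lk_exclusivecount; };

static int lockmgr_sketch(struct lock *lkp, int flags, struct simplelock *interlkp)
{
    simple_lock(&lkp->lk_interlock);
    if (flags & LK_INTERLOCK)
        simple_unlock(interlkp);            /* caller's lock dropped only now */
    int error = 0;
    if (lkp->lk_exclusivecount > 0)
        error = EBUSY;
    else
        lkp->lk_exclusivecount = 1;
    simple_unlock(&lkp->lk_interlock);
    return error;
}

struct vnode { struct simplelock v_interlock; int v_flag; struct lock v_lock; };

static void vnode_init(struct vnode *vp)
{
    simple_lock_init(&vp->v_interlock);
    simple_lock_init(&vp->v_lock.lk_interlock);
    vp->v_flag = 0;
    vp->v_lock.lk_exclusivecount = 0;
}

/* vn_lock() pattern: test v_flag under the interlock, then hand it off. */
static int vn_lock_sketch(struct vnode *vp)
{
    simple_lock(&vp->v_interlock);
    if (vp->v_flag & VXLOCK) {              /* vnode is being cleaned out */
        simple_unlock(&vp->v_interlock);
        return ENOENT;
    }
    return lockmgr_sketch(&vp->v_lock, LK_INTERLOCK, &vp->v_interlock);
}
```

The real vn_lock() sleeps while VXLOCK is set before failing, and the real lockmgr() sleeps rather than returning EBUSY; both are simplified here so the ordering stays easy to see.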
Re: The usage of MNT_RELOAD
On Wed, 8 Sep 1999, Zhihui Zhang wrote:

Does fsck have to run on a MOUNTED filesystem? If so, your answer makes sense to me: if fsck modifies the on-disk copy of the superblock, it does not have to unmount and then remount the filesystem; it only needs to reload the superblock from disk.

I think it's more for the case where fsck has to run on a filesystem which is mounted. It's better to fsck unmounted filesystems, but you don't always have that option (say you want to fsck the fs with fsck on it :-)

Take care,

Bill
Re: BSD XFS Port BSD VFS Rewrite
On Wed, 18 Aug 1999, Terry Lambert wrote:

Right. That exported struct lock * makes locking down to the lowest-level file easy - you just feed it to the lock manager, and you're locking the same lock the lowest-level fs uses. You then lock all vnodes stacked over this one at the same time. Otherwise, you just call VOP_LOCK below and then lock yourself.

I think this defeats the purpose of the stacking architecture; I think that if you look at an unadulterated NULLFS, you'll see what I mean.

Please be more precise. I have looked at an unadulterated NULLFS, and found it lacking. I don't see how this change breaks stacking.

Intermediate FSs should not trap VOPs that are not applicable to them.

True. But VOP_LOCK is applicable to layered fs's. :-)

One of the purposes of doing a VOP_LOCK on intermediate vnodes that aren't backing objects is to deal with the global vnode pool management. I'd really like FSs to own their vnode pools, but even without that, you don't need the locking, since you only need to flush data on vnodes that are backing objects.

If we look at a stack of FSs with intermediate exposure into the namespace, then it's clear that the issue is really only applicable to objects that act as a backing store:

    FS               Exposed in hierarchy   Backing object
    ---------------  ---------------------  --------------
    top              yes                    no
    intermediate_1   no                     no
    intermediate_2   no                     yes
    intermediate_3   yes                    no
    bottom           no                     yes

So when we lock "top", we only lock in intermediate_2 and in bottom.

No. One of the things Heidemann notes in his dissertation is that to prevent deadlock, you have to lock the whole stack of vnodes at once, not bit by bit, i.e. there is one lock for the whole thing.

Actually, isn't the only problem when you have vnode fan-in (union FS)? i.e. a plain compressing layer should not introduce vnode locking problems.

If it's a block compression layer, it will. Also a translation layer; consider a pure Unicode system that wants to remotely mount an FS from a legacy system. To do this, it needs to expand the pages from the legacy system [only it can, since the legacy system doesn't know about Unicode] in a 2:1 ratio. Now consider doing a byte-range lock on a file on such a system. To propagate the lock, you have to do an arithmetic conversion at the translation layer. This gets worse if the lower-end FS is exposed in the namespace as well.

Wait. Byte-range locking is different from vnode locking. I've been talking about vnode locking, which is different from the byte-range locking you're discussing above.

Nope.

The problem is that while stacking (null, umap, and overlay fs's) works, we don't have the coherency issues worked out so that upper layers can cache data, i.e. so that the lower fs knows it has to ask the upper layers to give pages back. :-) But multiple ls -lR's work fine. :-)

With UVM in NetBSD, this is (supposedly) not an issue.

UBC. UVM is a new memory manager. UBC unifies the buffer cache with the VM system.

You could actually think of it this way, as well: only FSs that contain vnodes that provide backing should implement VOP_GETPAGES and VOP_PUTPAGES, and all I/O should be done through paging.

Right. That's part of UBC. :-)

Take care,

Bill
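The shared-lock arrangement in this exchange (a leaf file system exports a pointer to its lock, and every layer stacked on top uses that same lock, so the whole stack has one lock as Heidemann's argument requires) can be sketched as follows. The structures are simplified stand-ins, the name v_vnlock is borrowed from BSD vnodes for flavor only, and locking is a non-blocking try rather than a sleeping lock manager call.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct lock: a non-blocking exclusive flag. */
struct vlock { atomic_bool held; };

struct vnode {
    struct vlock  v_lock;    /* storage for this vnode's own lock */
    struct vlock *v_vnlock;  /* the lock actually used for this vnode */
    struct vnode *v_lower;   /* vnode below this one; NULL for a leaf fs */
};

/* Leaf fs: the vnode uses, and exports, its own lock. */
static void leaf_vnode_init(struct vnode *vp)
{
    atomic_init(&vp->v_lock.held, false);
    vp->v_lower = NULL;
    vp->v_vnlock = &vp->v_lock;
}

/* Layer fs: reuse the lock exported from below, so however many layers
   are stacked, they all share the leaf's single lock. */
static void layer_vnode_init(struct vnode *vp, struct vnode *lower)
{
    atomic_init(&vp->v_lock.held, false);   /* unused while sharing */
    vp->v_lower = lower;
    vp->v_vnlock = lower->v_vnlock;
}

static bool vop_lock_try(struct vnode *vp)
{
    return !atomic_exchange_explicit(&vp->v_vnlock->held, true,
                                     memory_order_acquire);
}

static void vop_unlock(struct vnode *vp)
{
    atomic_store_explicit(&vp->v_vnlock->held, false, memory_order_release);
}
```

Because every layer dereferences the same pointer, locking the top vnode also makes the bottom one unavailable, and there is no per-layer lock ordering to get wrong.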
Re: BSD XFS Port BSD VFS Rewrite
On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

Yes, but we need subsecond resolution in the filesystems. Think about make(1) on a blindingly fast machine...

Oh yes, I realize that. :-) It's just that I thought you were at one point suggesting having 128 bits to the left of the decimal point (128 bits' worth of seconds). I was trying to say that'd be a bit much. :-)

Take care,

Bill
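The make(1) concern is easy to illustrate: make rebuilds a target when a prerequisite is newer, and if a source file is edited within the same second that its object was built, whole-second timestamps tie and the stale object looks up to date. A minimal sketch comparing the two resolutions; it uses C11's struct timespec only as a container for the timestamps, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <time.h>

/* Is a strictly newer than b?  Compare seconds first, then nanoseconds. */
static bool timespec_newer(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec > b->tv_sec;
    return a->tv_nsec > b->tv_nsec;
}

/* The same test when only whole seconds are recorded. */
static bool seconds_newer(const struct timespec *a, const struct timespec *b)
{
    return a->tv_sec > b->tv_sec;
}
```

With an object built at t = 1000.1 s and its source edited at t = 1000.2 s, seconds_newer() reports the source as not newer, so make would skip the rebuild, while timespec_newer() correctly reports it as newer.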
Re: Need some advice regarding portable user IDs
On Tue, 17 Aug 1999, Brian C. Grayson wrote:

On Tue, Aug 17, 1999 at 07:17:45PM -0700, Wilfredo Sanchez wrote:

A group of us at Apple are trying to figure out how to handle situations where a filesystem with "foreign" user IDs is present.

Have you looked at mount_umap(8)? I (naively) think it would solve most of your concerns.

I don't think so. umap is for translating credentials between domains. I think what Fred wants to do is different, and that is to ignore the credentials on the system.

Fred, right now what happens in MacOS when I take a disk which has sharing credentials set up and hook it into another machine? How are the credentials handled there?

Also, one of the problems which has been brought up in the thread is that umap needs to know what credentials to translate to. For that, we'd need to stash the credentials on the drive.

Take care,

Bill
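Bill's last point follows from how umap works: it translates IDs through a table, so someone has to supply the "to" side of each mapping. A minimal sketch of such a lookup; the table layout, the linear search, and the default ID for unmapped entries are illustrative assumptions, not mount_umap(8)'s actual data structures or mapfile format.

```c
#include <stddef.h>
#include <sys/types.h>

/* One entry of a hypothetical umap-style translation table. */
struct id_map { uid_t from; uid_t to; };

/*
 * Translate `id` through the table.  IDs with no entry map to a
 * caller-chosen default, e.g. an unprivileged "nobody" ID.
 */
static uid_t map_id(const struct id_map *table, size_t n, uid_t id, uid_t dflt)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].from == id)
            return table[i].to;
    return dflt;
}
```

Mapping a chosen user to root for access on the filesystem, one of the options raised in this thread, amounts to a single entry whose to field is 0.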
Re: Need some advice regarding portable user IDs
On Wed, 18 Aug 1999, Wilfredo Sanchez wrote:

I think Mac OS 8 will forget about the credentials. I don't actually know much about how sharing works. But the current file sharing behaviour is not entirely useful to think about, because it doesn't affect the local permissions (much), and the local permissions are what I'm worried about. Exported filesystems are another story, and I don't want to complicate things too much by worrying about that right now.

My thought here was more that this was the closest thing to prior art that MacOS has, and that it might be a good user experience to emulate. ;-)

Probably the thing to do is either to have options to the mount call which make the mounting user own everything, or to set up a umap which maps the desired user to root for access on the filesystem.

Take care,

Bill
Re: BSD XFS Port BSD VFS Rewrite
On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

In message pine.sol.3.96.990816105106.27345h-100...@marcy.nas.nasa.gov, Bill Studenmund writes:

On Sat, 14 Aug 1999, Terry Lambert wrote:

Matt doesn't represent the FreeBSD project, and even if he rewrites the VFS subsystem so he can understand it, his rewrite would face considerable resistance on its way into FreeBSD. I don't think there is reason to rewrite it, but there certainly are areas that need fixing.

Whew! That's reassuring. I agree there are things which need fixing. It'd be nice if both NetBSD and FreeBSD could fix things in the same way.

The use of vfs_default to make unimplemented VOPs fall through to code which implements function, while well intentioned, is misguided.

I beg to differ. The only difference is that we pass through multiple layers before we hit the bottom of the stack. There is no loss of functionality, but a significant gain in clarity and modularity.

If I understood the issue, it is that the leaf fs's (the bottom ones) would use a default routine for non-error functionality. I think Terry's point (which I agree with) was that a leaf fs's default routine should only return errors.

3. The filesystem itself is broken for Y2038.

One other suggestion I've heard is to split the 64 bits we have for time into 44 bits for seconds and 20 bits for microseconds. That's more than enough modification resolution, and it also pushes things past the year 500,000 AD. Versioning the inode would cover this easily.

This would be misguided, and given the current speed of evolution would lead to other problems long before 2038. Both struct timespec and struct timeval are major mistakes; they make arithmetic on timestamps an expensive operation. Timestamps should be stored as integers using a fixed-point notation, for instance 64 bits with 32 bits of fractional seconds (the NTP timestamp), or in the future 128/48.

I like that idea. One thing I should probably mention is that I'm not suggesting we ever do arithmetic on the 44/20 number, just that we store it that way. struct inode would contain time fields in whatever format the host prefers, with the 44/20 stuff only being in struct dinode. Converting from 44/20 would only happen on initial read. Math would happen on the host format version. :-)

If time structures go to 64/32 fixed-point math, then my suggestion can be rephrased as storing 44.20 worth of that number in the on-disk inode.

Extending from 64 to 128 bits would be a cheap shift, and increased precision and range could go hand in hand.

I doubt we need more than 64-bit times. 2^63 seconds works out to 292,279,025,208 years, or 292 (American) billion years. Current theories put the age of the universe at, I think, 12 to 16 billion years. So 64-bit signed times in seconds will cover from before the big bang to way past any time we'll be caring about. :-)

Take care,

Bill
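Two pieces of the timestamp discussion can be made concrete. With 44 bits of unsigned seconds the range is 2^44 s, roughly 557,000 years at 365.24 days per year, which is where the "past the year 500,000 AD" figure for a 1970 epoch comes from; 20 bits hold any microsecond count, since 2^20 = 1,048,576. The sketch below packs and unpacks that on-disk layout and converts to an NTP-style 32.32 fixed-point value, where a difference is a single integer subtraction with no borrow between fields. The function and macro names are hypothetical, and the microsecond-to-fraction conversion truncates.

```c
#include <stdint.h>

/* On-disk 44/20 layout: high 44 bits seconds, low 20 bits microseconds. */
#define USEC_BITS 20
#define USEC_MASK ((UINT64_C(1) << USEC_BITS) - 1)

/* Assumes sec < 2^44 and usec < 1000000 (which fits in 20 bits). */
static uint64_t pack_44_20(uint64_t sec, uint32_t usec)
{
    return (sec << USEC_BITS) | usec;
}
static uint64_t unpack_sec(uint64_t t)  { return t >> USEC_BITS; }
static uint32_t unpack_usec(uint64_t t) { return (uint32_t)(t & USEC_MASK); }

/* NTP-style 32.32 fixed point: high 32 bits seconds, low 32 bits fraction. */
static uint64_t to_fixed_32_32(uint32_t sec, uint32_t usec)
{
    /* fraction = usec / 10^6 of a second, scaled by 2^32 and truncated */
    return ((uint64_t)sec << 32) | (((uint64_t)usec << 32) / 1000000u);
}
```

This matches the split proposed in the thread: the packed form is only a storage format, and any arithmetic happens after converting to the host's in-core representation.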
Re: BSD XFS Port BSD VFS Rewrite
On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

In message pine.sol.3.96.990818101005.14430b-100...@marcy.nas.nasa.gov, Bill Studenmund writes:

Whew! That's reassuring. I agree there are things which need fixing. It'd be nice if both NetBSD and FreeBSD could fix things in the same way.

Well, that still remains to be seen... :-)

I doubt we need more than 64-bit times. 2^63 seconds works out to 292,279,025,208 years, or 292 (American) billion years. Current theories put the age of the universe at, I think, 12 to 16 billion years. So 64-bit signed times in seconds will cover from before the big bang to way past any time we'll be caring about. :-)

I was unclear. I was referring to the seconds side of things. Sub-second resolution would need other bits.

Take care,

Bill
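The 292-billion-year figure is consistent with a year of 365.24 days. That year length is an assumption; using 365.2425 days gives about 292,277 million years instead, so the conclusion is unchanged:

```latex
\frac{2^{63}\,\mathrm{s}}{365.24 \times 86400\,\mathrm{s/yr}}
  = \frac{9\,223\,372\,036\,854\,775\,808}{31\,556\,736}\,\mathrm{yr}
  \approx 2.9228 \times 10^{11}\,\mathrm{yr}
```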
Re: BSD XFS Port BSD VFS Rewrite
On Wed, 18 Aug 1999, Poul-Henning Kamp wrote: Yes, but we need subsecond in the filesystems. Think about make(1) on a blindingly fast machine... Oh yes, I realize that. :-) It's just that I thought you were at one point suggesting having 128 bits to the left of the decimal point (128 bits worth of seconds). I was trying to say that'd be a bit much. :-) Take care, Bill
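The make(1) problem can be shown with POSIX struct timespec (tv_sec plus tv_nsec). This is a minimal sketch. The helpers ts_newer() and sec_newer() are hypothetical, and both assume normalized values (0 <= tv_nsec < 1000000000). If a filesystem keeps only whole seconds, two files written within the same second compare as equal, so make would wrongly treat the target as up to date.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Return true if a is strictly later than b, using subsecond precision.
 * Assumes both timestamps are normalized: 0 <= tv_nsec < 1000000000. */
bool ts_newer(const struct timespec *a, const struct timespec *b)
{
    assert(a->tv_nsec >= 0 && a->tv_nsec < 1000000000L);
    assert(b->tv_nsec >= 0 && b->tv_nsec < 1000000000L);
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec > b->tv_sec;
    return a->tv_nsec > b->tv_nsec;
}

/* The only answer a seconds-only filesystem can give. */
bool sec_newer(const struct timespec *a, const struct timespec *b)
{
    return a->tv_sec > b->tv_sec;
}
```

Suppose a source file is modified 0.3 s after its object file was built, both within the same second. With whole seconds only, the rebuild is skipped.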
Re: Need some advice regarding portable user IDs
On Tue, 17 Aug 1999, Brian C. Grayson wrote: On Tue, Aug 17, 1999 at 07:17:45PM -0700, Wilfredo Sanchez wrote: A group of us at Apple are trying to figure out how to handle situations where a filesystem with foreign user ID's is present. Have you looked at mount_umap(8)? I (naively) think it would solve most of your concerns. I don't think so. umap is for translating credentials between domains. I think what Fred wants to do is different, and that is to ignore the credentials on the system. Fred, right now what happens in MacOS when I take a disk which has sharing credentials set up, and hook it into another machine? How are the credentials handled there? Also, one of the problems which has been brought up in the thread is that umap needs to know what credentials to translate to. For that, we'd need to stash the credentials on the drive. Take care, Bill
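Translating credentials between domains, as umap does, boils down to a table lookup. This is a rough sketch of that idea. It assumes a hypothetical table of (foreign, local) uid pairs and a fallback uid chosen by the caller for unmapped IDs. It is not mount_umap(8)'s actual data structure or default policy. The sketch also shows why the table has to come from somewhere, whether stashed on the drive or supplied at mount time.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* One entry of a hypothetical uid translation table. */
struct uid_map_entry {
    uid_t foreign;   /* uid as stored on the foreign filesystem */
    uid_t local;     /* uid to present on this system */
};

/*
 * Translate a foreign uid to a local one. Unmapped uids get the
 * caller-supplied fallback (e.g. an unprivileged "nobody" uid).
 */
uid_t map_uid(const struct uid_map_entry *tab, size_t n, uid_t foreign,
              uid_t fallback)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tab[i].foreign == foreign)
            return tab[i].local;
    return fallback;
}
```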
Re: Need some advice regarding portable user IDs
On Wed, 18 Aug 1999, Wilfredo Sanchez wrote: I think Mac OS 8 will forget about the credentials. I don't actually know much about how sharing works. But the current file sharing behaviour is not entirely useful to think about, because it doesn't affect the local permissions (much), and the local permissions are what I'm worried about. Exported filesystems are another story, and I don't want to complicate things too much by worrying about that right now. My thought here was more that this was the closest thing to prior art that MacOS has, and that that might be a good user experience to emulate. ;-) Probably the thing to do is either have options to the mount call which have the mounting user own everything, or to set up a umap which maps the desired user to root for access on the filesystem. Take care, Bill
Re: Need some advice regarding portable user IDs
On Wed, 18 Aug 1999, Chris Dillon wrote: I'm probably being extremely naive myself, but I just envisioned a scenario like this (pardon me if someone else has already suggested this): When a filesystem is mounted as foreign (HOW that is determined I won't talk about), every file in the filesystem has its credentials mapped to that of the mountpoint. File mode bits are not remapped in any way. New files gain the credentials of their _foreign_ parent. That's the skinny. Now I'll give a (much longer) example to clarify. Sounds fine, except I'd have the owner and group passed in at the initial mount, rather than taken from the mount point. :-) Take care, Bill
Re: BSD XFS Port BSD VFS Rewrite
On Tue, 17 Aug 1999, Terry Lambert wrote: 2. Advisory locks are hung off private backing objects. I'm not sure. The struct lock * is only used by layered filesystems, so they can keep track both of the underlying vnode lock, and if needed their own vnode lock. For advisory locks, would we want to keep track both of locks on our layer and the layer below? Don't we want either one or the other? i.e., layers bypass to the one below, or deal with it all themselves. I think you want the lock on the intermediate layer: basically, on every vnode that has data associated with it that is unique to a layer. Let's not forget, also, that you can expose a layer into the namespace in one place, and expose it covered under another layer, at another. If you locked down to the backing object, then the only issue you would be left with is one or more intermediate backing objects. Right. That exported struct lock * makes locking down to the lowest-level file easy - you just feed it to the lock manager, and you're locking the same lock the lowest level fs uses. You then lock all vnodes stacked over this one at the same time. Otherwise, you just call VOP_LOCK below and then lock yourself. For a layer with an intermediate backing object, I'm prepared to declare it "special", and proxy the operation down to any inferior backing object (e.g. a union FS that adds files from two FS's together, rather than just directory entry lists). I think such layers are the exception, not the rule. Actually isn't the only problem when you have vnode fan-in (union FS)? i.e., a plain compressing layer should not introduce vnode locking problems. I think that export policies are the realm of /etc/exports. The problem with each FS implementing its own policy is that this is another place that copyinstr() gets called, when it shouldn't. Well, my thought was that, like with current code, most every fs would just call vfs_export() when it's presented an export operation.
But by retaining the option of having the fs do its own thing, we can support different export semantics if desired. Right. The "covering" operation is not the same as the "marking as covered" operation. Both need to be at the higher level. Not really. Julian Elischer had code that mounted a /devfs under / automatically, before the user was ever allowed to see /. As a result, the FS that you were left with was indistinguishable from what I describe. The only real difference is that, as a translucent mount over /devfs, the one I describe would be capable of implementing persistent changes to the /devfs, as whiteouts. I don't think this is really that desirable, but some people won't accept a devfs that doesn't have traditional persistence semantics (e.g. "chmod" vs. modifying a well-known kernel data structure as an administrative operation). That wouldn't be hard to do. :-) I guess the other difference is that you don't have to worry about large minor numbers when you are bringing up a new platform via NFS from an old platform that can't support large minors in its FS at all. ;-). True. :-) I would resolve this by passing a standard option to the mount code in user space. For root mounts, a vnode is passed down. For other mounts, the vnode is parsed and passed if the option is specified. Or maybe add a field to vfsops. This info says what the mount call will expect (I want a block device, a regular file, a directory, etc), so it fits. :-) Also, if we leave it to userland, what happens if someone writes a program which calls sys_mount with something the fs doesn't expect? :-) I think that you will only be able to find rare examples of FS's that don't take device names as arguments. But for those, you don't specify the option, and it gets "NULL", and whatever local options you specify. I agree I can't see a leaf fs not taking a device node. But layered fs's certainly will want something else.
:-) The point is that, for FS's that can be both root and sub-root, the mount code doesn't have to make the decision; it can be punted to higher level code, in one place, where the code can be centrally maintained and kept from getting "stale" when things change out from under it. True. And with good comments we can catch the times when a change to the centrally located code breaks an assumption made by the fs. :-) Except for a minor buglet with device nodes, stacking works in NetBSD at present. :-) Have you tried Heidemann's student's stacking layers? There is one encryption, and one per-file compression with namespace hiding, that I think it would be hard pressed to keep up with. But I'll give it the benefit of the doubt. 8-). Nope. The problem is that while stacking (null, umap, and overlay fs's) works, we don't have the coherency issues worked out so that upper layers can cache data. i.e., so that the lower fs knows it has to ask the upper layers to give pages back. :-) But multiple ls -lR's work fine. :-)
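The idea of a vfsops field declaring what the mount call expects can be sketched as follows. All names here are hypothetical: the enums, struct vfsops_sketch and check_mount_arg() are not the real struct vfsops. The generic mount code looks up what the filesystem wants and rejects a mismatched source node before the filesystem's own mount routine runs. A program that calls mount(2) with something unexpected therefore fails in one central place.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Kinds of node a mount source can be (simplified vnode types). */
enum node_type { NODE_BLK, NODE_REG, NODE_DIR };

/* What a filesystem says it needs as its mount source. */
enum mount_arg_kind {
    MNTARG_NONE,     /* takes no source node (e.g. a synthetic fs) */
    MNTARG_BLKDEV,   /* a block device: most leaf filesystems */
    MNTARG_DIR       /* a directory: layered filesystems like nullfs */
};

/* Hypothetical slice of a per-filesystem operations vector. */
struct vfsops_sketch {
    const char *name;
    enum mount_arg_kind arg_kind;
};

/*
 * Central check done by the generic mount code. src is NULL when no
 * source node was given. Returns 0 or an errno value.
 */
int check_mount_arg(const struct vfsops_sketch *ops, const enum node_type *src)
{
    assert(ops != NULL);
    switch (ops->arg_kind) {
    case MNTARG_NONE:
        return src == NULL ? 0 : EINVAL;
    case MNTARG_BLKDEV:
        if (src == NULL)
            return EINVAL;
        return *src == NODE_BLK ? 0 : ENOTBLK;
    case MNTARG_DIR:
        if (src == NULL)
            return EINVAL;
        return *src == NODE_DIR ? 0 : ENOTDIR;
    }
    return EINVAL;
}
```

Keeping the check in the generic code means each filesystem only declares its expectation once and never parses the source itself.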
Re: BSD XFS Port BSD VFS Rewrite
On Tue, 17 Aug 1999, Michael Hancock wrote: As I recall most of FBSD's default routines are also error routines; if the exceptions were a problem it would be trivial to fix. I think fixing resource allocation/deallocation for things like vnodes, cnbufs, and locks is a higher priority for now. There are examples such as in detached threading where it might make sense for the detached child to be responsible for releasing resources allocated to it by the parent, but in stacking this model is very messy and unnatural. This is why the purpose of VOP_ABORTOP appears to be to release cnbufs, but this is really just an ugly side effect. With stacking the code that allocates should be the code that deallocates. Substitute "layer" for "code" to be more correct. I fixed a lot of the vnode and locking cases; unfortunately the ones that remain are probably ugly cases where you have to reacquire locks that had to be unlocked somewhere in the executing layer. See VOP_RENAME for an example. Compare the number of WILLRELEs in vnode_if.src in FreeBSD and NetBSD; ideally there'd be none. I've compared the two, and making the NetBSD number match the FreeBSD number is one of my goals. :-) Any suggestions, or just plodfix? Take care, Bill
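The rule stated above, that the layer which allocates should be the layer which deallocates, can be illustrated with a toy lookup. This is a minimal sketch under stated assumptions. struct pathbuf_sketch, do_lookup() and the toy layer ops are hypothetical stand-ins for a componentname and its path buffer. A counter tracks outstanding allocations so that a leak or double free would show up. No layer below ever frees the buffer, and the caller frees it on every return path, success or error, so there is no WILLRELE-style obligation to get wrong.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static int outstanding;   /* live path buffers, for leak checking */

/* Hypothetical stand-in for a componentname with its path buffer. */
struct pathbuf_sketch {
    char *buf;
};

static int pathbuf_alloc(struct pathbuf_sketch *pb, const char *path)
{
    pb->buf = malloc(strlen(path) + 1);
    if (pb->buf == NULL)
        return ENOMEM;
    strcpy(pb->buf, path);
    outstanding++;
    return 0;
}

static void pathbuf_free(struct pathbuf_sketch *pb)
{
    free(pb->buf);
    pb->buf = NULL;
    outstanding--;
}

/* A layer operation: it may read the buffer but never frees it. */
typedef int (*layer_op)(const struct pathbuf_sketch *pb);

/* Caller side: allocate, call down, and free on every return path. */
int do_lookup(layer_op op, const char *path)
{
    struct pathbuf_sketch pb;
    int error = pathbuf_alloc(&pb, path);

    if (error)
        return error;
    error = op(&pb);
    pathbuf_free(&pb);
    assert(outstanding >= 0);
    return error;
}

/* Two toy layer ops: one succeeds on non-empty paths, one always fails. */
int op_ok(const struct pathbuf_sketch *pb)   { return pb->buf[0] ? 0 : ENOENT; }
int op_fail(const struct pathbuf_sketch *pb) { (void)pb; return EXDEV; }

int outstanding_buffers(void) { return outstanding; }
```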
Re: BSD XFS Port BSD VFS Rewrite
On Wed, 18 Aug 1999, Michael Hancock wrote: Interesting, have you read the Heidemann paper that outlines a solution that uses a cache manager? You can probably find it somewhere here, http://www.isi.edu/~johnh/SOFTWARE/UCLA_STACKING/ Nope. I've read his dissertation, and his discussion of the lock management inspired the struct lock * work I did for NetBSD (we use the address of the lock, not the vnode, but other than that it's the same). Thanks for the ref! Take care, Bill
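The struct lock * arrangement discussed in this thread can be sketched in user space. Each vnode exports a pointer to the lock it uses, and a layer reuses the pointer from the vnode below, so the whole stack shares one lock, keyed by the lock's address. Everything here is hypothetical: struct vnode_sketch, v_lockp and the trivial non-blocking lock are simplifications, not NetBSD's struct vnode or lockmgr(9). Because every layer locks the same address, locking any layer locks the whole stack at once, and no partial stack is ever held.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Trivial non-blocking lock standing in for a lock manager lock. */
struct lock_sketch {
    int held;
};

static int lock_try(struct lock_sketch *lk)
{
    if (lk->held)
        return EBUSY;
    lk->held = 1;
    return 0;
}

static void lock_release(struct lock_sketch *lk)
{
    assert(lk->held);
    lk->held = 0;
}

/* Hypothetical vnode: a leaf owns v_lock; v_lockp says which lock to use. */
struct vnode_sketch {
    struct lock_sketch v_lock;     /* only meaningful for the bottom layer */
    struct lock_sketch *v_lockp;   /* lock actually taken for this vnode */
    struct vnode_sketch *lower;    /* NULL for the bottom layer */
};

/* A leaf vnode uses its own embedded lock. */
void vn_init_leaf(struct vnode_sketch *vp)
{
    vp->v_lock.held = 0;
    vp->v_lockp = &vp->v_lock;
    vp->lower = NULL;
}

/* Stack vp over lower and share the lower vnode's exported lock. */
void vn_init_layer(struct vnode_sketch *vp, struct vnode_sketch *lower)
{
    vp->v_lock.held = 0;
    vp->lower = lower;
    vp->v_lockp = lower->v_lockp;   /* one lock for the whole stack */
}

int vn_lock_sketch(struct vnode_sketch *vp)    { return lock_try(vp->v_lockp); }
void vn_unlock_sketch(struct vnode_sketch *vp) { lock_release(vp->v_lockp); }
```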
Re: BSD XFS Port BSD VFS Rewrite
On Sat, 14 Aug 1999, Terry Lambert wrote: I am currently conducting a thorough study of the VFS subsystem in preparation for an all-out effort to port SGI's XFS filesystem to FreeBSD 4.x at such time as SGI gives up the code. Matt Dillon has written in hackers- that the VFS subsystem is presently not well understood by any of the active kernel code contributors and that it will be rewritten later this year. This is obviously of great concern to me in this port. It is of great concern to me that a rewrite, apparently because of non-understanding, is taking place at all. That concerns me too. Many aspects of the 4.4 vnode interface were there for specific reasons. Even if they were hack solutions, to re-write them because of a lack of understanding is dangerous as the new code will likely run into the same problems as before. :-) Also, it behooves all the *BSD's to not get too divergent. Sharing code between us all helps all. Given that I'm working on the kernel side of a data migration file system using NetBSD, I can assure you there are things which FreeBSD would get access to more easily the more similar the two VFS interfaces are. :-) I would suggest that anyone planning on this rewrite should talk, in depth, with John Heidemann prior to engaging in such activity. John is very approachable, and is a deep thinker. Any rewrite that does not meet his original design goals for his stacking architecture is, I think, a Very Bad Idea(tm). I greatly appreciate all assistance in answering the following questions:
1) What are the perceived problems with the current VFS?
2) What options are available to us as remedies?
3) To what extent will existing FS code require revision in order to be useful after the rewrite?
4) Will Chapters 6, 7, 8 and 9 of "The Design and Implementation of the 4.4BSD Operating System" still pertain after the rewrite?
5) How important are questions 3 and 4 in the design of the new VFS?
I believe that the VFS is conceptually sound and that the existing semantics should be strictly retained in the new code. Any new functionality should be added in the form of entirely new kernel routines and system calls, or possibly by such means as converting the existing routines to the vararg format etc. Here are some of the problems I'm aware of, and my suggested remedies: 1. The interface is not reflexive, with regard to cn_pnbuf. Specifically, path buffers are allocated by the caller, but not freed by the caller, and various routines in each FS implementation are expected to deal with this. Each FS duplicates code, and such duplication is subject to error. Not to mention that it makes your kernel fat. Yep, that's not good. 2. Advisory locks are hung off private backing objects. Advisory locks are passed into VOP_ADVLOCK in each FS instance, and then each FS applies this by hanging the locks off a list on a private backing object. For FFS, this is the in-core inode. A more correct approach would be to hang the lock off the vnode. This effectively obviates the need for having a VOP_ADVLOCK at all, except for the NFS client FS, which will need to propagate lock requests across the net. The most efficient mechanism for this would be to institute a pass/fail response for VOP_ADVLOCK calls, with a default of "pass", and an actual implementation of the operand only in the NFS client FS. I agree that it's better for all fs's to share this functionality as much as possible. I'd vote against your implementation suggestion for VOP_ADVLOCK on an efficiency concern. If we actually make a VOP call, that should be the end of the story. I.e., either add a vnode flag to indicate pass/fail-ness, or add a genfs/std call to handle the problem. I'd actually vote for the latter. Hang the byte-range locking off of the vnode, and add a genfs_advlock() or vop_stdadvlock() routine (depending on OS flavor) to handle the call.
That way all fs's that can will share code, and the callers need only call VOP_ADVLOCK() - no other logic. NetBSD actually needs this to get unionfs to work. Do you want to talk privately about it? Again, each FS must duplicate the advisory locking code, at present, and such duplication is subject to error. Agreed. 3. Object locks are implemented locally in many FS's. The VOP_LOCK interface is implemented via vop_stdlock() calls in many FS's. This is done using the "vfs_default" mechanism. In other FS's, it's implemented locally. The intent of the VOP_LOCK mechanism being implemented as a VOP at all was to allow it to be proxied to another machine over a network, using the original Heidemann design. This is also the reason for the use of descriptors for all VOP arguments, since they can be opaquely proxied
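The proposal above has three parts: byte-range locks hung off the vnode, one shared genfs_advlock()/vop_stdadvlock()-style routine that most filesystems plug into their operations vector, and an NFS client that supplies its own entry point. All names below are hypothetical, and the model is greatly simplified. It has a fixed-size lock table per vnode, exclusive locks only, and a stubbed "remote" call. The NFS-style routine follows the sequence described later in this thread: assert the lock locally, assert it remotely, and revert the local assertion if the remote one fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAXLOCKS 8

/* One exclusive byte-range lock: [start, end) held by owner. */
struct range_lock {
    long start, end;
    int owner;
};

/* Hypothetical vnode carrying its own advisory lock list. */
struct vnode_al {
    struct range_lock locks[MAXLOCKS];
    int nlocks;
    bool (*remote_lock)(long start, long end);   /* used by NFS only */
};

/* Generic routine any filesystem can plug into its advlock slot. */
int generic_advlock(struct vnode_al *vp, long start, long end, int owner)
{
    assert(start < end);
    for (int i = 0; i < vp->nlocks; i++) {
        const struct range_lock *l = &vp->locks[i];
        if (l->owner != owner && start < l->end && l->start < end)
            return EAGAIN;          /* overlaps another owner's lock */
    }
    if (vp->nlocks == MAXLOCKS)
        return ENOLCK;
    vp->locks[vp->nlocks++] = (struct range_lock){ start, end, owner };
    return 0;
}

/* NFS-client style entry: local first, then remote, undo on failure. */
int nfs_advlock(struct vnode_al *vp, long start, long end, int owner)
{
    int error = generic_advlock(vp, start, end, owner);

    if (error)
        return error;
    if (!vp->remote_lock(start, end)) {
        vp->nlocks--;               /* revert the local assertion */
        return EAGAIN;
    }
    return 0;
}

/* The per-filesystem choice of which routine fills the advlock slot. */
typedef int (*advlock_fn)(struct vnode_al *, long, long, int);
advlock_fn ffs_advlock = generic_advlock;
advlock_fn nfsclient_advlock = nfs_advlock;

/* Stub lock servers that grant or refuse every request. */
bool remote_grant(long s, long e)  { (void)s; (void)e; return true; }
bool remote_refuse(long s, long e) { (void)s; (void)e; return false; }
```

With the lock list kept in generic code, only the NFS client carries extra logic. A real implementation would also need shared/exclusive modes, lock splitting and coalescing, and blocking waits, none of which this sketch attempts.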
Re: BSD XFS Port BSD VFS Rewrite
On Mon, 16 Aug 1999, Terry Lambert wrote: 2. Advisory locks are hung off private backing objects. I'd vote against your implementation suggestion for VOP_ADVLOCK on an efficiency concern. If we actually make a VOP call, that should be the end of the story. I.e., either add a vnode flag to indicate pass/fail-ness, or add a genfs/std call to handle the problem. I'd actually vote for the latter. Hang the byte-range locking off of the vnode, and add a genfs_advlock() or vop_stdadvlock() routine (depending on OS flavor) to handle the call. That way all fs's that can will share code, and the callers need only call VOP_ADVLOCK() - no other logic. OK. Here's the problem with that: NFS client locks in a stacked FS on top of the NFS client FS. Ahh, but it'd be the fs's decision to map genfs_advlock()/vop_stdadvlock() to its vop_advlock_desc entry or not. In this case, NFS wouldn't want to do that. Though it would mean growing the fs footprint. Specifically, you need to separate the idea of asserting a lock against the local vnode, asserting the lock via NFS locking, and coalescing the local lock list, after both have succeeded, or reverting the local assertion, should the remote assertion fail. Right. But my thought was that you'd be calling an NFS routine, so it could do the right thing. NetBSD actually needs this to get unionfs to work. Do you want to talk privately about it? If you want. FreeBSD needs it for unionfs and nullfs, so it's something that would be worth airing. I think you could say that no locking routine was an approval of the upper level lock. This lets you bail on all FS's except NFS, where you have to deal with the approve/reject from the remote host. The problem with this on FreeBSD is the VFS_default stuff, which puts a non-NULL interface on all FS's for all VOP's. I'm not familiar with the VFS_default stuff. All the vop_default_desc routines in NetBSD point to error routines. Yes, this NULL is the same NULL I suggested for advisory locks, above. I'm not sure.
The struct lock * is only used by layered filesystems, so they can keep track both of the underlying vnode lock, and if needed their own vnode lock. For advisory locks, would we want to keep track both of locks on our layer and the layer below? Don't we want either one or the other? i.e., layers bypass to the one below, or deal with it all themselves. 5. The idea of "root" vs. "non-root" mounts is inherently bad. You forgot: 5) Update export lists. If you call the mount routine with no device name (args.fspec == 0) and with MNT_UPDATE, you get routed to the vfs_export routine. This must be the job of the upper level code, so that there is a single control point for export information, instead of spreading it throughout each FS's mount entry point. I agree it should be detangled, but think it should remain the fs's job to choose to call vfs_export. Otherwise an fs can't implement its own export policies. :-) I thought it was? Admittedly the only reference code I have is the ntfs code in the NetBSD kernel. But given how full of #ifdef (__FreeBSD__)'s it is, I thought it'd be an ok reference. No. We've lost the context, but what I was trying to say was that I thought the marking-the-vnode-as-mounted-on bit was done in the mount syscall at present. At least that's what http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_syscalls.c?rev=1.130 seems to be doing. Basically, what you would have is the equivalent of a variable length "mounted volume" table, from which mappings (and exports, based on the mappings) are externalized into the namespace. Ahh, sounds like you're talking about a new formalism. Right. It should just have a "mount" entry point, and the rest of the stuff moves to higher level code, called by the mount system call, and the mountroot stuff during boot, to externalize the root volume at the top of the hierarchy. An ideal world would mount a / that had a /dev under it, and then do transparent mounts over top of that.
That would be quite a different place than we have now. ;-) The conversion of the root device into a vnode pointer, or a path to a device into a vnode pointer, is the job of upper level code -- specifically, the mount system call, and the common code for booting. My one concern about this is you've assumed that the user is mounting a device onto a filesystem. No. Vnode, not bdevvp. The bdevvp stuff is for the boot time stuff in the upper level code, and only applies to the root volume. Maybe I mis-parsed. I thought you were talking about parsing the first mount option (in mount /dev/disk there, the /dev/disk option) into a vnode. The concern below is that different fs's have different ideas as to what that node should be. Some want it a device node which no one else is using (most leaf fs's), while some others want
Re: BSD XFS Port BSD VFS Rewrite
On Sat, 14 Aug 1999, Terry Lambert wrote: I am currently conducting a thorough study of the VFS subsystem in preparation for an all-out effort to port SGI's XFS filesystem to FreeBSD 4.x at such time as SGI gives up the code. Matt Dillon has written in hackers- that the VFS subsystem is presently not well understood by any of the active kernel code contributers and that it will be rewritten later this year. This is obviously of great concern to me in this port. It is of great concern to me that a rewrite, apparently because of non-understanding, is taking place at all. That concerns me too. Many aspects of the 4.4 vnode interface were there for specific reasons. Even if they were hack solutions, to re-write them because of a lack of understanding is dangerous as the new code will likely run into the same problems as before. :-) Also, it behooves all the *BSD's to not get too divergent. Sharing code between us all helps all. Given that I'm working on the kernel side of a data migration file system using NetBSD, I can assure you there are things which FreeBSD would get access to more easily the more-similar the two VFS interface are. :-) I would suggest that anyone planning on this rewrite should talk, in depth, with John Heidemann prior to engaging in such activity. John is very approachable, and is a deep thinker. Any rewrite that does not meet his original design goals for his stacking architecture is, I think, a Very Bad Idea(tm). I greatly appreciate all assistance in answering the following questions: 1) What are the perceived problems with the current VFS? 2) What options are available to us as remedies? 3) To what extent will existing FS code require revision in order to be useful after the rewrite? 4) Will Chapters 6,7,8 9 of The Design and Implementation of the 4.4BSD Operating System still pertain after the rewrite? 5) How important are questions 3 4 in the design of the new VFS? 
I believe that the VFS is conceptually sound and that the existing semantics should be strictly retained in the new code. Any new functionality should be added in the form of entirely new kernel routines and system calls, or possibly by such means as converting the existing routines to the vararg format etc. Here some of the problems I'm aware of, and my suggested remedies: 1.The interface is not reflexive, with regard to cn_pnbuf. Specifically, path buffers are allocated by the caller, but not freed by the caller, and various routines in each FS implementation are expected to deal with this. Each FS duplicates code, and such duplication is subject to error. Not to mention that it makes your kernel fat. Yep, that's not good. 2.Advisory locks are hung off private backing objects. Advisory locks are passed into VOP_ADVLOCK in each FS instance, and then each FS applies this by hanging the locks off a list on a private backing object. For FFS, this is the in core inode. A more correct approach would be to hang the lock off the vnode. This effectively obviates the need for having a VOP_ADVLOCK at all, except for the NFS client FS, which will need to propagate lock requests across the net. The most efficient mechanism for this would be to institute a pass/fail response for VOP_ADVLOCK calls, with a default of pass, and an actual implementation of the operand only in the NFS client FS. I agree that it's better for all fs's to share this functionality as much as possible. I'd vote againsts your implimentation suggestion for VOP_ADVLOCK on an efficiency concern. If we actually make a VOP call, that should be the end of the story. I.e either add a vnode flag to indicate pas/fail-ness, or add a genfs/std call to handle the problem. I'd actually vote for the latter. Hang the byte-range locking off of the vnode, and add a genfs_advlock() or vop_stdadvlock() routine (depending on OS flavor) to handle the call. 
That way all fs's that can share code, and the callers need only call VO_ADVLOCK() - no other logic. NetBSD actually needs this to get unionfs to work. Do you want to talk privately about it? Again, each FS must duplicate the advisory locking code, at present, and such duplication is subject to error. Agreed. 3.Object locks are implemented locally in many FS's. The VOP_LOCK interface is implemented via vop_stdlock() calls in many FS's. This is done using the vfs_default mechanism. In other FS's, it's implemented locally. The intent of the VOP_LOCK mechanism being implemented as a VOP at all was to allow it to be proxied to another machine over a network, using the original Heidemann design. This is also the reason for the use of descriptors for all VOP arguments, since they can be opaquely proxied
Re: BSD XFS Port BSD VFS Rewrite
On Mon, 16 Aug 1999, Terry Lambert wrote: 2.Advisory locks are hung off private backing objects. I'd vote againsts your implimentation suggestion for VOP_ADVLOCK on an efficiency concern. If we actually make a VOP call, that should be the end of the story. I.e either add a vnode flag to indicate pas/fail-ness, or add a genfs/std call to handle the problem. I'd actually vote for the latter. Hang the byte-range locking off of the vnode, and add a genfs_advlock() or vop_stdadvlock() routine (depending on OS flavor) to handle the call. That way all fs's that can share code, and the callers need only call VO_ADVLOCK() - no other logic. OK. Here's the problem with that: NFS client locks in a stacked FS on top the the NFS client FS. Ahh, but it'd be the fs's decision to map genfs_advlock()/vop_stdadvlock() to its vop_advlock_desc entry or not. In this case, NFS wouldn't want to do that. Though it would mean growing the fs footprint. Specifically, you need to seperate the idea of asserting a lock against the local vnode, asserting the lock via NFS locking, and coelescing the local lock list, after both have succeeded, or reverting the local assertion, should the remote assertion fail. Right. But my thought was that you'd be calling an NFS routine, so it could do the right thing. NetBSD actually needs this to get unionfs to work. Do you want to talk privately about it? If you want. FreeBSD needs it for unionfs and nullfs, so it's something that would be worth airing. I think you could say that no locking routine was an approval of the uuper level lock. This lets you bail on all FS's except NFS, where you have to deal with the approve/reject from the remote host. The problem with this on FreeBSD is the VFS_default stuff, which puts a non-NULL interface on all FS's for all VOP's. I'm not familiar with the VFS_default stuff. All the vop_default_desc routines in NetBSD point to error routines. Yes, this NULL is the same NULL I suggested for advisory locks, above. I'm not sure. 
The struct lock * is only used by layered filesystems, so they can keep track both of the underlying vnode lock, and if needed their own vnode lock. For advisory locks, would we want to keep track both of locks on our layer and the layer below? Don't we want either one or the other? i.e. layers bypass to the one below, or deal with it all themselves. 5.The idea of root vs. non-root mounts is inherently bad. You forgot: 5) Update export lists If you call the mount routine with no device name (args.fspec == 0) and with MNT_UPDATE, you get routed to the vfs_export routine This must be the job of the upper level code, so that there is a single control point for export information, instead of spreading it throughout ead FS's mount entry point. I agree it should be detangled, but think it should remain the fs's job to choose to call vfs_export. Otherwise an fs can't impliment its own export policies. :-) I thought it was? Admitedly the only reference code I have is the ntfs code in the NetBSD kernel. But given how full of #ifdef (__FreeBSD__)'s it is, I thought it'd be an ok reference. No. We've lost the context, but what I was trying to say was that I thought the marking-the-vnode-as-mounted-on bit was done in the mount syscall at present. At least that's what http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_syscalls.c?rev=1.130 seems to be doing. Basically, what you would have is the equivalent of a variable length mounted volume table, from which mappings (and exports, based on the mappings) are externalized into the namespace. Ahh, sounds like you're talking about a new formalism.. Right. It should just have a mount entry point, and the rest of the stuff moves to higher level code, called by the mount system call, and the mountroot stuff during boot, to externalize the root volume at the top of the hierarchy. An ideal world would mount a / that had a /dev under it, and then do transparent mounts over top of that. 
That would be quite a different place than we have now. ;-) The conversion of the root device into a vnode pointer, or a path to a device into a vnode pointer, is the job of upper level code -- specifically, the mount system call, and the common code for booting. My one concern about this is you've assumed that the user is mounting a device onto a filesystem. No. Vnode, not bdevvp. The bdevvp stuff is for the boot time stuff in the upper level code, and only applies to the root volume. Maybe I mis-parsed. I thought you were talking about parsing the first mount option (in mount /dev/disk there, the /dev/disk option) into a vnode. The concern below is that different fs's have different ideas as to what that node should be. Some want it a device node which no one else is using (most leaf fs's), while some others want a
Re: BSD XFS Port & BSD VFS Rewrite
On Fri, 13 Aug 1999, Terry Lambert wrote: Has anyone mentioned to them that they will be unable to incorporate changes made to the GPL'ed version of XFS back into the IRIX version of XFS, without IRIX becoming GPL'ed? Given that they say they're dropping IRIX and going with Linux, I don't think it'll be a problem. Take care, Bill To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: Improving the Unix API
On Mon, 28 Jun 1999, Francois-Rene Rideau wrote: On Sun, Jun 27, 1999 at 12:58:05PM -0400, der Mouse wrote: See NetBSD (and presumably other BSD) mount -o update,rdonly and/or umount -f. (Last I tried, the latter didn't work as it should, but that's a matter of fixing bugs rather than introducing new features.) If you re-read the original message, the problem is what to do about processes with open file descriptors on the partition: stop them at once? stop them at first file access? block them instead? kill them? Will you do it atomically? How will you allow for such large table-walking to be compatible with real-time kernel response? [Hint: either use incremental data-structures, or don't be atomic and be interruptible instead.] umount -f is more intended for oh-sh*t situations. So harshness is ok. The way it's done is that all of the vnodes in that fs's vnode list get either vgone'd or vcleaned (in the -f case). This will have the effect of mapping them to deadfs vnodes, so all future access will either fail or do nothing (close works, read returns an error). There aren't any big table walks. :-) Take care, Bill