Re: hfs support for blocksize != 512
Hi,

On Thu, 31 Aug 2000, Alexander Viro wrote:
> Go ahead, write it. IMNSHO it's going to be much more complicated and
> race-prone, but code talks. If you will manage to write it in clear and
> race-free way - fine. Frankly, I don't believe that it's doable.

It will be more complicated insofar as I want to use a more complex state machine than "locked <-> unlocked"; on the other hand I can avoid such funny constructions as triple_down() and obscure locking order rules. At any time the object will be either locked or in a well-defined state, and at any time only a single object is locked by a thread. (I hope some pseudocode will do for the beginning, too?)

Most namespace operations work simply like a semaphore:

restart:
    lock(dentry);
    if (dentry is busy) {
        unlock(dentry);
        sleep();
        goto restart;
    }
    dentry->state = busy;
    unlock(dentry);

When the operation is finished, the state is reset and everyone sleeping is woken up. Ok, let's come to the most interesting operation - rename():

restart:
    lock(olddentry);
    if (olddentry is busy) {
        unlock(olddentry);
        sleep();
        goto restart;
    }
    olddentry->state = moving;
    unlock(olddentry);

restart2:
    lock(newdentry);
    if (newdentry->state == moving) {
        lock(renamelock);
        if (olddentry->state == deleted) {
            unlock(renamelock);
            unlock(newdentry);
            sleep();
            goto restart;
        }
        newdentry->state = deleted;
        unlock(renamelock);
    } else if (newdentry is busy) {
        unlock(newdentry);
        sleep();
        goto restart2;
    } else
        newdentry->state = deleted;
    unlock(newdentry);

    if (!rename_valid(olddentry, newdentry)) {
        lock(newdentry);
        newdentry->state = idle;
        unlock(newdentry);
        lock(olddentry);
        olddentry->state = idle;
        unlock(olddentry);
        wakeup_sleepers();
        return;
    }

    if (newdentry exists)
        unlink(newdentry);
    do_rename(olddentry, newdentry);

    lock(newdentry);
    newdentry->state = idle;
    unlock(newdentry);
    lock(olddentry);
    olddentry->state = deleted;
    unlock(olddentry);
    wakeup_sleepers();
    return;

Note that I don't touch any inode here, everything happens in the dcache.
That means I move the complete inode locking into the fs; all I do here is make sure that while operation("foo") is busy, no other operation will use "foo".

IMO this should work. I tried it with a rename("foo", "bar") racing against a rename("bar", "foo"):

case 1: one rename gets both dentries busy; the other rename will wait till it's finished.

case 2: both mark their old dentry as moving and find the new dentry also moving. To make the rename atomic the global rename lock is needed: one rename will find that its old dentry isn't moving anymore and has to restart and wait; the other rename will complete.

Other operations keep only one dentry busy, so I don't see a problem there. If you don't find any major problem here, I'm going to try this, since if it works, it will have some other advantages:

- A user space fs becomes possible that can't even deadlock the system. The first restart loop can easily be made interruptible, so it can be safely killed. (I don't really want to know what a triple_down_interruptible() looks like, not to mention the other three locks (+ BKL) taken during a rename.)

- I can imagine better support for hfs. It can access the other fork without excessive locking (I think currently it doesn't even try to). The order in which the forks can be created can change then, too.

> BTW, I really wonder what kind of locks are you going to have on _blocks_
> (you've mentioned that, unless I've misparsed what you've said). IMO that
> way lies the horror, but hey, code talks.

I thought about catching a bread, but while thinking about it, there should also be other ways. But that's fs specific; let's concentrate on the generic part first.

> You claim that it's doable. I seriously doubt it. Nobody knows your ideas
> better than you do, so... come on, demonstrate the patch.

I think the above example should do basically the same as a do-nothing patch within affs.
I hope that example shows two important ideas (no idea if they will save the world, but I'm willing to learn):

- I use the dcache instead of the inode to synchronize namespace operations, which IMO makes quite a lot of sense, since it is our (cached) representation of the fs.

- Using states instead of a semaphore makes it easily possible to detect e.g. a rename loop.

bye, Roman
Re: hfs support for blocksize != 512
[snip the plans for AFFS]

You know what? Try it. If your scheme is doable at all (I _very_ seriously doubt it, since I've seen similar attempts on FAT-derived filesystems and I remember very well what horror it was) it is doable with private locks. Just take your locks always after the VFS is done with getting its locks and you can forget about the locking done in VFS - the only effect will be that you will see (possibly) fewer simultaneous calls. Which should reduce the pressure on your mechanisms, so if they can work by themselves - they will work.

Go ahead, write it. IMNSHO it's going to be much more complicated and race-prone, but code talks. If you will manage to write it in clear and race-free way - fine. Frankly, I don't believe that it's doable. Several things to watch for:

* opened unlinked files should remain available until the last process closes the file.
* if foo and bar exist there should be no interval during the rename(foo, bar) when open(bar, ...) would fail.
* busy directories can be removed.
* ... and that includes rename() over them.
* large intervals when power-off would lead to an unrecoverable fs are bad. I'm not talking about full protection, but several seconds of inactivity (i.e. no new requests being submitted) should be enough even on floppies. You will get a dirty fs, indeed, but it shouldn't be in a catastrophically bad state.

BTW, I really wonder what kind of locks you are going to have on _blocks_ (you've mentioned that, unless I've misparsed what you've said). IMO that way lies the horror, but hey, code talks.

Right now the thing doesn't even work reliably. If you claim that your design will reduce the contention once the VFS gets out of the way - better yet, but let's see first if it will work and will be readable. Allocation problems are not going to enter the game - on AFFS you've got no sparse files and thus all allocation is process-synchronous.
Moreover, you can count on the fact that truncate and allocation attempts on a file are not going to clash (that includes the lack of clashes between allocations).

You claim that it's doable. I seriously doubt it. Nobody knows your ideas better than you do, so... come on, demonstrate the patch.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: hfs support for blocksize != 512
Hi,

> > - get dentry foo
> > - get dentry baz
>
> How? OK, you've found block of baz. You know the name, all right.

Links are chained together and all point back to the original, so if you remove the original, you have quite something to do with lots of links.

> Now
> you've got to do the full tracing all the way back to root.

All file headers have a pointer to the dir header, so it's not that difficult - but that's what makes links to directories so interesting. :)

Anyway, I'd better try to describe the idea more generally. The basic idea is to introduce transient states to the vfs and to move the locking into the fs, which probably knows better what needs to be protected. This would avoid the current locking overkill.

Let's take a rename. First we mark the object as to-be-moved; no need to keep it locked after this. An open on this object would either fail or have to wait (on a separate queue). Next we mark the destination dir as not removable. This is basically the job of the vfs; the next steps happen in the fs. (I use affs here as an example.) First we lock the source dir, remove the object from the chain and unlock the dir. Now I can lock the destination, insert the object there and unlock the dir. (Back to the vfs.) All we have to do now is restore the state of the destination dir and the object, and wake up anyone who's waiting.

Back to the original example of removing a file with links: I have to get the dentry of baz, as I have to prevent a lookup of that link while I'm modifying its block. But I think it's enough to lock that block and check only the cached aliases. Then I can modify that block and unlock it again.

> > - update file header baz from file header foo
>
> If it would be that simple... Extent blocks refer to foo, unfortunately.
> Yes, copying the thing would be easier. Too bad, data structure prohibits
> that.

Which data structure prohibits that?
Updating the extent blocks isn't that difficult, as the back links are not needed for general operation; it's just wasted I/O. A bit more problematic are concurrent readers of foo, so I can't simply trash the buffer of foo's file header, but I can simply keep it allocated till the file is closed (which also keeps the inode number constant and unique).

> Well, consider rename over the primary link and there you go... Keep in
> mind that extent blocks contain the reference to header block, so unless
> you want to update them all you've got to move the header into donor's
> chain ;-/

Oops, I just read rename(2) and noticed that I forgot about a small detail. Ok, the above rename operation gets slightly more difficult. Basically it's only a variation of the unlink problem: I first unlink the old file and then insert the new file. As I do less locking, I shouldn't have a locking problem - or what do I miss? I might have to update lots of back links, but that is not a critical part.

[I can skip the affs history part, I see you already got a better answer than I could give.]

bye, Roman
Re: hfs support for blocksize != 512
On Thu, 31 Aug 2000, J. Dow wrote:

> > being a jaded bastard I suspect that Commodore PHBs decided to save a
> > bit on floppy controller price and did it well after the initial design
>
> Comododo PHBs had nothing to do with it. And the Commododo floppy
> disk format is quite literally unreadable with a PC style controller. It was
> not an economic decision. If you are going to carp please do so from a
> basis of real knowledge, Alexander. (The REAL blame for the disk fiasco
> goes to the people at Metacrap^H^H^H^HComCo.)

Hey, I've clearly said that I don't know which idiot was responsible for that fsckup.

> > was done and so close to release that redesign was impossible for
> > schedule reasons, but it might be something else. We'll probably never
> > know unless somebody who had been in the original design team will leak
> > it. But whatever reasons were behind that decision, OFS was either blindly
> > copied without a single thought about a very serious design factor _or_
> > had been crippled at some point before the release. If it's the latter - I
> > commiserate with their fs folks. If it's the former... well, I think that
> > it says quite a few things about their clue level.
>
> Metacomco designed it based on their TripOS. OFS is very good for
> repairing the filesystem in the event of a problem, although the so-called
> DiskDoctor they provided quickly earned the name DiskDestroyer.
> Metacomco and BSTRINGS and BPOINTERS and all that nonsense
> entered the picture when it was decided the originally planned OS
> would take too long to develop. So what Metacomco had was grafted
> onto what the old Amiga Inc had done, resulting in a hodgepodge
> mess.

Umm... Interesting. Could somebody familiar with TripOS tell what size sectors it had? IOW, did it keep the metadata out-of-band or not?

[snip]

> old cruft is preserved for reading old disks. Later on DirCache was added,
> principally for floppy disks. About that time Randall added both so-called
> soft links and hard links. For what it is worth it took a long long time and a
> series of modifications before either of them worked adequately.

Egads... Please, pass him my compliments - one has to be _really_ perverted to do hardlinks that way. Even the QNX way of handling that (move them into a magical place after the first rename()/link() and leave the dud in the old place) is much saner.

> > And let's not go into the links to directories, implemented well
> > after it became painfully obvious that they were an invitation for
> > troubles (from looking into Amiga newsgroups it seems that miracle
> > didn't happen - I've seen quite a few complaints about fs breakage
> > answered with "don't use links to directories, they are broken").
>
> They MAY be fixed in the OS3.5 BoingBag 2 (service pack 2 with a
> cutsiepie name.) Heinz has committed yet another rewrite.

Ouch... Why did he do them (links to directories, that is) in the first place?

> > Anyway, it's all history. We can't unroll the kludge, no matter
> > what we do. We've got what we've got. And I'm not too interested in
> > distribution of the blame between the people in a team that seems to have
> > dissolved years ago. I consider the AFFS we have to deal with a poor excuse
> > of a design and I think that it gives more than enough reasons for that.
> > In an alternative history it might be better. So might many other things.
>
> Indeed, poor or not it exists and we live with it in the Amiga community.
> (Um, I wonder if I could talk Hendrix into a copy of the source for SFS so
> it could be ported to Linux. These days I prefer it to FFS. {^_-})

Hmm... What, the format description is not available?

> If you want I can bend your ear on things Amiga for longer than your
> patience stretches, I suspect. (I've been following the threads.)

Discussions of that kind? alt.folklore.computers is -> that way ;-) Let's take it there...

> because there is a project I'd like to port from NT to Linux that just ain't
> gonna make it until some nice threads are added and latencies drop
> dramatically. RT_Linux may be overkill. But as it sits today Linux is
> underkill when you need 1/4 frame and less timing latencies on Show
> Control operations. )

ObWTF: WTF did these guys drop QNX for when they clearly wanted an RTOS? Do they have somebody who a) knew the difference between RT and TS and b) knew that Linux is TS?
Re: hfs support for blocksize != 512
Quoth a misinformed Alexander Viro re AFFS,

> As for the silliness of the OFS... I apologize for repeating the
> story if you know it already, but anyway: OFS looks awfully similar to
> Alto filesystem. With one crucial difference: Alto kept the header/footer
> equivalents in the sector framing. No silly 400-odd byte sectors for them.
> That layout made a lot of sense - you could easily recover from many disk
> faults, yodda, yodda, _without_ sacrificing performance. The whole design
> relied on ability to put pieces of metadata in the sector framing. Take
> that away and you've lost _very_ large part of the benefits. So large that
> the whole design ought to be rethought - tradeoffs change big way.
>
> OFS took that away. Mechanically. It just stuffed the headers into
> the data part of sectors. I don't know the story behind that decision -
> being a jaded bastard I suspect that Commodore PHBs decided to save a
> bit on floppy controller price and did it well after the initial design

Comododo PHBs had nothing to do with it. And the Commododo floppy disk format is quite literally unreadable with a PC style controller. It was not an economic decision. If you are going to carp please do so from a basis of real knowledge, Alexander. (The REAL blame for the disk fiasco goes to the people at Metacrap^H^H^H^HComCo.)

> was done and so close to release that redesign was impossible for
> schedule reasons, but it might be something else. We'll probably never
> know unless somebody who had been in the original design team will leak
> it. But whatever reasons were behind that decision, OFS was either blindly
> copied without a single thought about very serious design factor _or_
> had been crippled at some point before the release. If it's the latter - I
> commiserate with their fs folks. If it's the former... well, I think that
> it says quite a few things about their clue level.

Metacomco designed it based on their TripOS. OFS is very good for repairing the filesystem in the event of a problem, although the so-called DiskDoctor they provided quickly earned the name DiskDestroyer. Metacomco and BSTRINGS and BPOINTERS and all that nonsense entered the picture when it was decided the originally planned OS would take too long to develop. So what Metacomco had was grafted onto what the old Amiga Inc had done, resulting in a hodgepodge mess.

> AFFS took the headers out of the data sectors. But that killed the
> whole reason behind having them anywhere - if you can't tell data blocks
> from the rest, what's the point of marking free and metadata ones?
> Now, links were total lossage - I think that even if you have some

Kemo Sabe, links never existed UNTIL the Amiga FFS was developed, redeveloped, and redeveloped again.

> doubts about that now, you will lose them when you will write down the
> operations needed for rename(). And I mean pure set of on-disk changes -
> forget about dentries, inodes and other in-core data.
>
> Why did they do it that way? Beats me. AmigaOS is a microkernel,
> so replacing fs driver should be very easy. It ought to be easier than in
> Linux. And they've pulled out the change from OFS to AFFS, so the
> filesystem conversion was not an issue. Dunno how about UNIX-friendliness,
> but their implementation of links definitely was not friendly to their own
> OS.

As it turns out many of the recovery tools people built worked remarkably well on FFS when it was introduced, with little modification. (Most of the time tracing the actual data blocks was not necessary for rebuilding the disk. Thus the data-block metadata loss was not crippling.) FFS appeared in its first versions with AmigaDOS 1.3. (Er, if you want a copy of some of the earliest versions sent to developers for testing I can arrange something in that regard. I believe I still have most of that "stuff".) It underwent several rewrites as successive developers and demands were placed on it.

One major change is evidenced in the hash algorithm used for the original OFS and FFS: it fails to treat international characters correctly when removing case. The international version corrected this deficiency. The old cruft is preserved for reading old disks. Later on DirCache was added, principally for floppy disks. About that time Randall added both so-called soft links and hard links. For what it is worth it took a long long time and a series of modifications before either of them worked adequately.

> And let's not go into the links to directories, implemented well
> after it became painfully obvious that they were an invitation for
> troubles (from looking into Amiga newsgroups it seems that miracle
> didn't happen - I've seen quite a few complaints about fs breakage
> answered with "don't use links to directories, they are broken").

They MAY be fixed in the OS3.5 BoingBag 2 (service pack 2 with a cutsiepie name.) Heinz has committed yet another rewrite.

> Anyway, it's all history. We can't unroll the kludge, no matter
> what we do. We've got what we've got.
Re: hfs support for blocksize != 512
On Thu, 31 Aug 2000, Roman Zippel wrote:

> Disclaimer: I know that the following doesn't match the current
> implementation, it's just how I would intuitively do it:
>
> - get dentry foo
> - get dentry baz

How? OK, you've found the block of baz. You know the name, all right. Now you've got to do the full tracing all the way back to root. During that tracing you've got to do interesting things - essentially that's what Neil and Roman are trying to do with the fh_to_dentry patches, and it's _not_ simple. Moreover, it's even worse than the current code wrt the amount of IO _and_ seeks. OK, nevermind, let's say you've done that.

> - lock inode foo
> - mark dentry foo as deleted
> - getblk file header foo
> - mark file header foo as deleted

?

> - getblk file header baz

You'll have to do that way before - how else would you find out that it was called baz in the first place?

> - update file header baz from file header foo

If it would be that simple... Extent blocks refer to foo, unfortunately. Yes, copying the thing would be easier. Too bad, the data structure prohibits that.

> > On that specific operation. When you are done with
> > that, I have a rename() for you, but I think that even simpler example
> > (unlink()) will be sufficient.
>
> Please post it, I know there are some interesting examples, but I don't
> have them at hand. Although I wanted to keep that flamewar for later, but
> if we're already in it...

Well, consider rename over the primary link and there you go... Keep in mind that extent blocks contain the reference to the header block, so unless you want to update them all, you've got to move the header into the donor's chain ;-/

> > Again, we are talking about the data structure and operations it has to
> > deal with _according to its designers_. I claim that due to a bad data
> > structure design (single-linked lists in hash chains, requirement to have
> > all entries belonging to some chain) unlink() (one of the operations it
> > was designed to deal with) becomes very complicated and requires rather
> > hairy exclusion rules. On Amiga. Linux has nothing to do with the problem.
>
> To be fair it should be mentioned that links were added later to affs.

Well, but we've got to deal with the result, not with AFFS-without-links. I certainly agree that most of the blame for the bad data structure design falls on the folks who added that kludge for pseudo-links, but that's a purely historical question. The result is ugly.

As for the silliness of the OFS... I apologize for repeating the story if you know it already, but anyway: OFS looks awfully similar to the Alto filesystem. With one crucial difference: Alto kept the header/footer equivalents in the sector framing. No silly 400-odd byte sectors for them. That layout made a lot of sense - you could easily recover from many disk faults, yodda, yodda, _without_ sacrificing performance. The whole design relied on the ability to put pieces of metadata in the sector framing. Take that away and you've lost a _very_ large part of the benefits. So large that the whole design ought to be rethought - the tradeoffs change in a big way.

OFS took that away. Mechanically. It just stuffed the headers into the data part of the sectors. I don't know the story behind that decision - being a jaded bastard I suspect that Commodore PHBs decided to save a bit on floppy controller price and did it well after the initial design was done and so close to release that redesign was impossible for schedule reasons, but it might be something else. We'll probably never know unless somebody who had been on the original design team leaks it. But whatever reasons were behind that decision, OFS was either blindly copied without a single thought about a very serious design factor _or_ had been crippled at some point before the release. If it's the latter - I commiserate with their fs folks. If it's the former... well, I think that it says quite a few things about their clue level.

AFFS took the headers out of the data sectors. But that killed the whole reason behind having them anywhere - if you can't tell data blocks from the rest, what's the point of marking the free and metadata ones?

Now, links were total lossage - I think that even if you have some doubts about that now, you will lose them when you write down the operations needed for rename(). And I mean the pure set of on-disk changes - forget about dentries, inodes and other in-core data.

Why did they do it that way? Beats me. AmigaOS is a microkernel, so replacing the fs driver should be very easy. It ought to be easier than in Linux. And they've pulled off the change from OFS to AFFS, so the filesystem conversion was not an issue. Dunno about UNIX-friendliness, but their implementation of links definitely was not friendly to their own OS.

And let's not go into the links to directories, implemented well after it became painfully obvious that they were an invitation for trouble (from looking into Amiga newsgroups it seems that miracle didn't happen - I've seen quite a few complaints about fs breakage answered with "don't use links to directories, they are broken").
Re: hfs support for blocksize != 512
Hi,

On Wed, 30 Aug 2000, Alexander Viro wrote:
> c) ->i_sem on pageout? When?

For 2.2.16:

filemap_write_page() <- filemap_swapout() <- try_to_swap_out() <- ... <- swap_out() <- do_try_to_free_pages() <- kswapd()

filemap_write_page() takes i_sem and calls do_write_page(). What did I miss?

> BKL matters only in the areas where you do not block. Moreover,
> fs code is still under the BKL, so it's totally moot.

Let me state it differently; what I'm trying to say is:

Past: Lots of filesystem code wasn't designed/written with multiple threads in mind. The result is lots of races.

Future: We want to experiment with a preempting kernel. Maybe that experiment will fail, but I'm certainly interested in it. But the result here will be a wonderful world of new races, and I'm pretty sure your ext2 fixes will break here - one more reason I'm so keen to use semaphores.

All I wanted to say is that the level of threading is changing. How that is visible in the fs layer is a different problem.

> > > Wrong. As the matter of fact, we could trivially get rid of _any_ use of
> > > bread() and friends on ext2.
> >
> > Excuse my stupidity, but could you please outline me how?
>
> Using kiovec, for one thing.

Huh? You said "trivially".

> One thing that became really obvious is that current documentation
> is either not enough or not read. Hell knows what to do about the latter,
> but the former can be helped.

Documentation is one (good) thing (I really tried to find as much as possible), but my point is that I tried to discuss design issues. I didn't want to know how it works now (for that I can and do read the source); I want to discuss the possibility of alternative solutions - is that really impossible? Anyway, after I discussed that enough with myself, I think I can try to code up something as soon as I find the time for it.

bye, Roman
Re: hfs support for blocksize != 512
Hi,

(Sorry for the previous empty mail, I was a bit too fast with sending and couldn't stop it completely.)

On Wed, 30 Aug 2000, Alexander Viro wrote:

I'll concentrate on the most interesting part:

> As for AFFS directory format - fine, please describe the data
> manipulations required by unlink("foo"); done after the
> link("foo","bar/baz");. Both operations are supported on AmigaOS, so
> references to UNIX are utterly irrelevant. On the block level, please.
> Only for directory blocks. Now, tell me what kind of protection (pageout
> has nothing to do with directories, so all async problems are irrelevant)
> you would provide. Or what protection should VFS/core kernel/exec/whatever
> provide to the filesystem.

Disclaimer: I know that the following doesn't match the current implementation, it's just how I would intuitively do it:

- get dentry foo
- get dentry baz
- lock inode foo
- mark dentry foo as deleted
- getblk file header foo
- mark file header foo as deleted
- getblk file header baz
- update file header baz from file header foo
- brelse file header baz
- update inode foo
- unlock inode foo
- put dentry baz
- lock foo's parent
- getblk and update dir header parent
- getblk file headers from foo's chain until file header of predecessor of foo found
- update predecessor to point to successor of foo
- brelse everything
- unlock foo's parent
- put and invalidate dentry foo
- last user of foo frees file header foo in bitmap

I probably forgot something, but you will surely tell me. Two things I want to mention anyway. First, I only lock something when needed, which of course breaks with current conventions. Second (and most important), I use the dentry to block a possible lookup of an inode, so no one can open or create foo or do anything else with it. A rename would work similarly, only that the new dentry would be marked as not complete yet.

> On that specific operation. When you are done with
> that, I have a rename() for you, but I think that even simpler example
> (unlink()) will be sufficient.

Please post it. I know there are some interesting examples, but I don't have them at hand. Although I wanted to keep that flamewar for later - but if we're already in it...

> Again, we are talking about the data structure and operations it has to
> deal with _according to its designers_. I claim that due to a bad data
> structure design (single-linked lists in hash chains, requirement to have
> all entries belonging to some chain) unlink() (one of the operations it
> was designed to deal with) becomes very complicated and requires rather
> hairy exclusion rules. On Amiga. Linux has nothing to do with the problem.

To be fair it should be mentioned that links were added later to affs.

bye, Roman
Re: hfs support for blocksize != 512
Alexander Viro wrote:
> On Wed, 30 Aug 2000, Roman Zippel wrote:
>
> > > What? You've proposed locking on pageout. If _that_ isn't the fast path...
> >
> > No, I suggested a lock (not necessarily the inode lock) during allocation
> > of indirect blocks (and defer truncation of them).
>
> Which means pageout when you are dealing with sparse files. You
> don't have them - fine, then you can take such lock right now.
> [...]
>
> > Sorry, but from time to time I prefer _first_ to think about a problem and
> > I try to understand it. One way to do this is to post questions and/or
> > suggestions to a mailing list (at least I thought so). If you have an
> > other suggestion please enlighten me.
>
> No problem with _that_.
>
> How about we all calm down and do something more useful than this pissing
> match? One thing that became really obvious is that current documentation
> is either not enough or not read. Hell knows what to do about the latter,
> but the former can be helped. We have several pieces of it - Richard's one
> in the tree, Daniel's postings on fsdevel

Funny you should mention that - I was just reading this thread and thinking "now, how the heck am I going to make some sense of the locking rules in the new VFS?". I'm getting to the point where I have to deal with some subtle issues in my own code and I thought I'd approach this by writing down the locking rules. Then I realized that since I don't have a clue where to start, I'd better do some deep breathing, relax and think about it. Here's where I am now:

1) I want to think about what the absolute minimal level of locking for FS ops could be. This is the same as asking what the maximum parallelism could be. This is not necessarily going to resemble the current arrangement very much, and it might give shivers to some fs programmers that are used to being able to count on certain traditional regions of mutual exclusion.

2) Then I have to go look at the current practice, and get it down in some sort of notation that's easy to understand.

3) At this point I'd have the two endpoints of a migration path: where we are (on the road away from the BKL) and where we're going (towards the tightest, most parallel fs you ever did see :-). This should be useful in assessing how long that road is, and hence just how far we are from having the locking rules settle down.

4) Then post the draft, hopefully attracting some of the usual flamage. In other words, trial by fire.

> and several parts written by
> various folks. This stuff needs to be merged (and corrected where needed).
> I volunteer to do that - I've spent quite a while dealing with the code,
> so I at least know what _is_ there. I would be really grateful if
> * folks who have writeups would post URLs to them (or texts
> themselves, if they are small enough). Preferably to fsdevel, but private
> email will also go.

Please cross-post to [EMAIL PROTECTED] and [EMAIL PROTECTED] as well, on the theory that having more copies of documentation is always better than less.

> * people would comment after the result will be posted. Especially
> about the missing / hard-to-understand pieces of text.
> * somebody helped to turn the result into decent English text.

There are a number of native English speakers hanging on the linux-doc list, just waiting to be asked.

--
Daniel
Re: hfs support for blocksize != 512
Hi, On Wed, 30 Aug 2000, Alexander Viro wrote: c) ->i_sem on pageout? When? For 2.2.16: filemap_write_page() <- filemap_swapout() <- try_to_swap_out() <- ... <- swap_out() <- do_try_to_free_pages() <- kswapd(). filemap_write_page() takes i_sem and calls do_write_page(). What did I miss? BKL matters only in the areas where you do not block. Moreover, fs code is still under the BKL, so it's totally moot. Let me state it differently, what I'm trying to say: Past: Lots of filesystem code wasn't designed/written with multiple threads in mind. The result is lots of races. Future: We want to experiment with a preempting kernel. Maybe that experiment will fail, but I'm certainly interested in it. But the result here will be a wonderful world of new races, and I'm pretty sure your ext2 fixes will break here - one more reason I'm so keen to use semaphores. All I wanted to say is that the level of threading is changing. How that is visible in the fs layer is a different problem. Wrong. As a matter of fact, we could trivially get rid of _any_ use of bread() and friends on ext2. Excuse my stupidity, but could you please outline for me how? Using kiovec, for one thing. Huh? You said "trivially". One thing that became really obvious is that the current documentation is either not enough or not read. Hell knows what to do about the latter, but the former can be helped. Documentation is one (good) thing (I really tried to find as much as possible), but my point is that I tried to discuss design issues. I didn't want to know how it works now (for that I can and do read the source), I want to discuss the possibility of alternative solutions - is that really impossible? Anyway, after I discussed that enough with myself, I think I can try to code up something as soon as I find the time for it. bye, Roman
Re: hfs support for blocksize != 512
On Thu, 31 Aug 2000, Roman Zippel wrote: Disclaimer: I know that the following doesn't match the current implementation, it's just how I would intuitively do it: - get dentry foo - get dentry baz How? OK, you've found the block of baz. You know the name, all right. Now you've got to do the full tracing all the way back to root. During that tracing you've got to do interesting things - essentially that's what Neil and Roman are trying to do with the fh_to_dentry patches, and it's _not_ simple. Moreover, it's even worse than the current code wrt the amount of IO _and_ seeks. OK, never mind, let's say you've done that. - lock inode foo - mark dentry foo as deleted - getblk file header foo - mark file header foo as deleted ? - getblk file header baz You'll have to do it way before - how else would you find out that it was called baz in the first place? - update file header baz from file header foo If only it were that simple... Extent blocks refer to foo, unfortunately. Yes, copying the thing would be easier. Too bad, the data structure prohibits that. On that specific operation. When you are done with that, I have a rename() for you, but I think that even a simpler example (unlink()) will be sufficient. Please post it, I know there are some interesting examples, but I don't have them at hand. Although I wanted to keep that flamewar for later, but if we're already in it... Well, consider rename over the primary link and there you go... Keep in mind that extent blocks contain the reference to the header block, so unless you want to update them all you've got to move the header into the donor's chain ;-/ Again, we are talking about the data structure and the operations it has to deal with _according to its designers_. I claim that due to bad data structure design (single-linked lists in hash chains, the requirement to have all entries belonging to some chain) unlink() (one of the operations it was designed to deal with) becomes very complicated and requires rather hairy exclusion rules. On Amiga.
Linux has nothing to do with the problem. To be fair, it should be mentioned that links were added later to affs. Well, but we've got to deal with the result, not with the AFFS-without-links. I certainly agree that most of the blame for the bad data structure design falls on the folks who added that kludge for pseudo-links, but that's a purely historical question. The result is ugly. As for the silliness of the OFS... I apologize for repeating the story if you know it already, but anyway: OFS looks awfully similar to the Alto filesystem. With one crucial difference: Alto kept the header/footer equivalents in the sector framing. No silly 400-odd byte sectors for them. That layout made a lot of sense - you could easily recover from many disk faults, yodda, yodda, _without_ sacrificing performance. The whole design relied on the ability to put pieces of metadata in the sector framing. Take that away and you've lost a _very_ large part of the benefits. So large that the whole design ought to be rethought - the tradeoffs change in a big way. OFS took that away. Mechanically. It just stuffed the headers into the data part of sectors. I don't know the story behind that decision - being a jaded bastard I suspect that Commodore PHBs decided to save a bit on floppy controller price and did it well after the initial design was done and so close to release that a redesign was impossible for schedule reasons, but it might be something else. We'll probably never know unless somebody who was in the original design team leaks it. But whatever the reasons behind that decision, OFS was either blindly copied without a single thought about a very serious design factor _or_ had been crippled at some point before the release. If it's the latter - I commiserate with their fs folks. If it's the former... well, I think that it says quite a few things about their clue level. AFFS took the headers out of the data sectors.
But that killed the whole reason behind having them anywhere - if you can't tell data blocks from the rest, what's the point of marking the free and metadata ones? Now, links were total lossage - I think that even if you have some doubts about that now, you will lose them when you write down the operations needed for rename(). And I mean the pure set of on-disk changes - forget about dentries, inodes and other in-core data. Why did they do it that way? Beats me. AmigaOS is a microkernel, so replacing an fs driver should be very easy. It ought to be easier than in Linux. And they've pulled off the change from OFS to AFFS, so filesystem conversion was not an issue. Dunno about UNIX-friendliness, but their implementation of links definitely was not friendly to their own OS. And let's not go into the links to directories, implemented well after it became painfully obvious that they were an invitation for trouble (from looking into Amiga newsgroups it seems that the miracle didn't happen - I've seen quite a few complaints about fs breakage answered with "don't use links to directories, they are broken").
Re: hfs support for blocksize != 512
On Thu, 31 Aug 2000, J. Dow wrote: being a jaded bastard I suspect that Commodore PHBs decided to save a bit on floppy controller price and did it well after the initial design Comododo PHBs had nothing to do with it. And the Commododo floppy disk format is quite literally unreadable with a PC style controller. It was not an economic decision. If you are going to carp please do so from a basis of real knowledge, Alexander. (The REAL blame for the disk fiasco goes to the people at Metacrap^H^H^H^HComCo.) Hey, I've clearly said that I don't know which idiot was responsible for that fsckup. was done and so close to release that redesign was impossible for schedule reasons, but it might be something else. We'll probably never know unless somebody who had been in the original design team will leak it. But whatever reasons were behind that decision, OFS was either blindly copied without a single thought about very serious design factor _or_ had been crippled at some point before the release. If it's the latter - I commiserate with their fs folks. If it's the former... well, I think that it says quite a few things about their clue level. Metacomco designed it based on their TripOS. OFS is very good for repairing the filesystem in the event of a problem, although the so-called DiskDoctor they provided quickly earned the name DiskDestroyer. Metacomco and BSTRINGS and BPOINTERS and all that nonsense entered the picture when it was decided the originally planned OS would take too long to develop. So what Metacomco had was grafted onto what the old Amiga Inc had done, resulting in a hodgepodge mess. Umm... Interesting. Could somebody familiar with TripOS tell what sector size it had? IOW, did it keep the metadata out-of-band or not? [snip] old cruft is preserved for reading old disks. Later on DirCache was added principally for floppy disks. About that time Randall added both so-called soft links and hard links.
For what it is worth it took a long long time and a series of modifications before either of them worked adequately. Egads... Please, pass him my compliments - one has to be _really_ perverted to do hardlinks that way. Even the QNX way of handling that (move them into a magical place after the first rename()/link() and leave the dud in the old place) is much saner. And let's not go into the links to directories, implemented well after it became painfully obvious that they were an invitation for troubles (from looking into Amiga newsgroups it seems that miracle didn't happen - I've seen quite a few complaints about fs breakage answered with "don't use links to directories, they are broken"). They MAY be fixed in the OS3.5 BoingBag 2 (service pack 2 with a cutsiepie name.) Heinz has committed yet another rewrite. Ouch... Why did he do them (links to directories, that is) in the first place? Anyway, it's all history. We can't unroll the kludge, no matter what we do. We've got what we've got. And I'm not too interested in the distribution of blame within a team that seems to have dissolved years ago. I consider the AFFS we have to deal with a poor excuse of a design, and I think that it gives more than enough reasons for that. In an alternative history it might be better. So might many other things. Indeed, poor or not it exists and we live with it in the Amiga community. (Um, I wonder if I could talk Hendrix into a copy of the source for SFS so it could be ported to Linux. These days I prefer it to FFS. {^_-}) Hmm... What, a format description is not available? If you want I can bend your ear on things Amiga for longer than your patience stretches, I suspect. (I've been following the threads - alt.folklore.computers is that way ;-) Let's take it there... because there is a project I'd like to port from NT to Linux that just ain't gonna make it until some nice threads are added and latencies drop dramatically. RT_Linux may be overkill.
But as it sits today Linux is underkill when you need 1/4-frame and smaller timing latencies on Show Control operations. (petasigh) ObWTF: WTF did these guys drop QNX when they clearly wanted an RTOS? Did they have somebody who a) knew the difference between RT and TS and b) knew that Linux is TS?
Re: hfs support for blocksize != 512
Hi, - get dentry foo - get dentry baz How? OK, you've found block of baz. You know the name, all right. Links are chained together and all point back to the original, so if you remove the original, you have quite something to do with lots of links. Now you've got to do the full tracing all the way back to root. All file headers have a pointer to the dir header, so it's not that difficult, but that's what makes links to directories so interesting. :) Anyway, I'd better try to describe the idea more generally: The basic idea is to introduce transient states to the vfs and to move the locking into the fs, which probably knows better what needs to be protected. This would avoid the current locking overkill. Let's take a rename: first we mark the object as to-be-moved; no need to keep it locked after this. An open on this object would either fail or have to wait (on a separate queue). Next we mark the destination dir as not removable. This is basically the job of the vfs so far; the next steps happen in the fs. (I use affs here as an example.) First we lock the source dir, remove the object from the chain and unlock the dir. Now I can lock the destination, insert the object here and unlock the dir. (back to vfs) All we have to do now is restore the state of the destination dir and the object, and we have to wake up anyone who's waiting. Back to the original example of removing a file with links. I have to get the dentry of baz, as I have to prevent a lookup of that link while I'm modifying its block. But I think it's enough to lock that block and check only the cached aliases. Then I can modify that block and unlock it again. - update file header baz from file header foo If only it were that simple... Extent blocks refer to foo, unfortunately. Yes, copying the thing would be easier. Too bad, the data structure prohibits that. Which data structure prohibits that? Updating the extent blocks isn't that difficult, as the back links are not needed for general operation, it's just wasted I/O.
A bit more problematic are concurrent readers of foo, so I can't simply trash the buffer of foo's file header, but I can simply keep it allocated till the file is closed (which also keeps the inode number constant and unique). Well, consider rename over the primary link and there you go... Keep in mind that extent blocks contain the reference to the header block, so unless you want to update them all you've got to move the header into the donor's chain ;-/ Oops, I just read rename(2) and noticed that I forgot about a small detail. Ok, the above rename operation gets slightly more difficult. Basically it's only a variation of the unlink problem: I first unlink the old file and then insert the new file. As I do less locking, I shouldn't have a locking problem - or what am I missing? I just might have to update lots of back links, but that is not a critical part. [I can skip the affs history part, I just see you already got a better answer than I could give.] bye, Roman
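The transient-state scheme Roman sketches here (and in the pseudo code at the top of the thread) can be modeled in userspace. This is only a sketch under assumptions: the names (obj, obj_mark, obj_release) are invented, and a pthread mutex/condvar pair stands in for the kernel's per-object lock and wait queue.

```c
/* Hypothetical model of "transient states instead of long-held locks":
   the lock is held only to move an object between well-defined states;
   while an operation is in flight the object is merely marked, not locked. */
#include <pthread.h>

enum obj_state { OBJ_IDLE, OBJ_BUSY, OBJ_MOVING, OBJ_DELETED };

struct obj {
    pthread_mutex_t lock;   /* held only for state transitions */
    pthread_cond_t  wakeup; /* the "separate queue" sleepers wait on */
    enum obj_state  state;
};

void obj_init(struct obj *o)
{
    pthread_mutex_init(&o->lock, NULL);
    pthread_cond_init(&o->wakeup, NULL);
    o->state = OBJ_IDLE;
}

/* "restart: lock; if busy, sleep and retry; else mark" - the pattern
   most namespace operations would use, like a semaphore. */
void obj_mark(struct obj *o, enum obj_state new_state)
{
    pthread_mutex_lock(&o->lock);
    while (o->state != OBJ_IDLE)
        pthread_cond_wait(&o->wakeup, &o->lock); /* sleep(); goto restart */
    o->state = new_state;
    pthread_mutex_unlock(&o->lock);
}

/* Operation finished: reset the state and wake every sleeper. */
void obj_release(struct obj *o)
{
    pthread_mutex_lock(&o->lock);
    o->state = OBJ_IDLE;
    pthread_cond_broadcast(&o->wakeup);
    pthread_mutex_unlock(&o->lock);
}
```

Note the design point under debate: between obj_mark() and obj_release() no lock is held at all, so a rename can take the source dir lock, drop it, then take the destination dir lock, without ever holding two locks at once.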
Re: hfs support for blocksize != 512
[snip the plans for AFFS] You know what? Try it. If your scheme is doable at all (I _very_ seriously doubt it, since I've seen similar attempts on FAT-derived filesystems and I remember very well what horror it was) it is doable with private locks. Just take your locks always after the VFS is done with getting its locks and you can forget about the locking done in the VFS - the only effect will be that you will see (possibly) fewer simultaneous calls. Which should reduce the pressure on your mechanisms, so if they can work by themselves - they will work. Go ahead, write it. IMNSHO it's going to be much more complicated and race-prone, but code talks. If you manage to write it in a clear and race-free way - fine. Frankly, I don't believe that it's doable. Several things to watch for: * opened unlinked files should remain available until the last process closes the file. * if foo and bar exist there should be no interval during rename(foo, bar) when open(bar,...) would fail. * busy directories can be removed. * ... and that includes rename() over them. * large intervals when power-off would lead to an unrecoverable fs are bad. I'm not talking about full protection, but several seconds of inactivity (i.e. no new requests being submitted) should be enough even on floppies. You will get a dirty fs, indeed, but it shouldn't be in a catastrophically bad state. BTW, I really wonder what kind of locks you are going to have on _blocks_ (you've mentioned that, unless I've misparsed what you've said). IMO that way lies the horror, but hey, code talks. Right now the thing doesn't even work reliably. If you claim that your design will reduce the contention if the VFS gets out of the way - better yet, but let's see first if it will work and will be readable. Allocation problems are not going to enter the game - on AFFS you've got no sparse files and thus all allocation is process-synchronous.
Moreover, you can count on the fact that truncate and allocation attempts on a file are not going to clash (that includes the lack of clashes between allocations). You claim that it's doable. I seriously doubt it. Nobody knows your ideas better than you do, so... come on, demonstrate the patch.
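The first invariant on that list - opened unlinked files stay available until the last close - is easy to check from userspace on any Unix. A minimal sketch (the temp path and function name are made up for illustration):

```c
/* Userspace check of one invariant from the list above: data written
   before unlink() must still be readable through the open fd afterwards.
   Plain POSIX semantics; nothing filesystem-specific assumed. */
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

/* Returns 1 if the opened-but-unlinked file is still usable. */
int unlinked_file_still_usable(void)
{
    char buf[5] = {0};
    int fd = open("/tmp/gr_unlink_demo", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return 0;
    write(fd, "data", 4);
    unlink("/tmp/gr_unlink_demo");   /* the name is gone ... */
    lseek(fd, 0, SEEK_SET);
    read(fd, buf, 4);                /* ... but the file is not */
    close(fd);                       /* only now may the blocks be freed */
    return strcmp(buf, "data") == 0;
}
```

An fs driver whose private locking scheme breaks this (e.g. by freeing the header block at unlink time) would fail the check.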
Re: hfs support for blocksize != 512
On Wed, 30 Aug 2000, Roman Zippel wrote: > > Your repeated claims of VFS becoming more multi-threaded in ways > > that are not transparent to fs drivers wrt locking are false. > > For example the usage of the inode lock changed pretty much and was partly > replaced with the page lock? I can still remember times where all of the > fs stuff happened under the BKL; for me that means only a _single_ thread a) fs methods _are_ called under the BKL. b) The BKL has nothing to do with single-processor races. Proof: the definition of lock_kernel() on non-SMP builds. You sleep - you lose the BKL. schedule() drops it (and restores it when your process gets a timeslice again, but whatever happens in the meanwhile - happens). c) ->i_sem on pageout? When? > of execution could be busy in the whole fs layer. IMHO that's not really a > prime example of multi-threaded programming; if you have a different > definition please let me know. BKL matters only in the areas where you do not block. Moreover, fs code is still under the BKL, so it's totally moot. > > What? You've proposed locking on pageout. If _that_ isn't the fast path... > > No, I suggested a lock (not necessarily the inode lock) during allocation > of indirect blocks (and defer truncation of them). Which means pageout when you are dealing with sparse files. You don't have them - fine, then you can take such a lock right now. > > > The major problem right now is that writepage() is supposed to be > > > asynchronous especially for kswapd, but the fs might have to > > > synchronize something _internal_. I think one problem here is that we > > > still have a synchronous buffer API, which makes it very hard to > > > implement an asynchronous interface. That's why I suggested an I/O > > > > Wrong. As a matter of fact, we could trivially get rid of _any_ use of > > bread() and friends on ext2. > > Excuse my stupidity, but could you please outline for me how? Using kiovec, for one thing. > > _One_ thread? For the whole fs?
So you would pass the dirty pages from > > kswapd to that guy. Fine. It attempts to acquire the inode semaphore (in > > your proposal, as far as I could parse it). It blocks. kswapd keeps > > pumping dirty pages into the queue of that thread. Wonderful... > > Sorry, but did you read my mail? The purpose of that thread is to sleep > and to get woken up to continue the IO. Not very much changes, except that > this thread can safely sleep, whereas kswapd can't. > Excuse my ignorance, but what currently stops kswapd from starting lots of > IO? Filesystems, actually. The problem is not a burst of IO (it will not happen - your thread is locked), but a completely unnecessary interlock between the output on different files. > > b) doesn't help AFFS directory problems > > Why the hell do you always come with this, I _never_ mentioned it. Let me put it that way: it will not help with anything except a very specific problem with sparse files. You've mentioned handling of HFS. Guess what, there your suggestion gives zero. Why? Because pageout on HFS never has a chance to allocate anything, so no matter what/how you lock on allocation, kswapd doesn't enter the picture. At all. > > Talk is cheap. If you can show the patch that would simplify ext2, > > I'm sure that Ted will be glad to see it. Same for maintainers of other > > filesystems. The only requirement is that it should work. Excuse me, but > > the longer I read your postings the more it looks like you have no idea of > > the things you are talking about. I would be glad to be proven wrong on > > that one too ;-/ > > I'm very sorry to waste your precious time, but your fscking arrogance > makes me sick. What's your problem? Shall I first worship you as our fs > god who saved us from all races? Huh??? > Sorry, but from time to time I prefer _first_ to think about a problem and > try to understand it. One way to do this is to post questions and/or > suggestions to a mailing list (at least I thought so).
If you have > another suggestion please enlighten me. No problem with _that_. How about we all calm down and do something more useful than this pissing match? One thing that became really obvious is that the current documentation is either not enough or not read. Hell knows what to do about the latter, but the former can be helped. We have several pieces of it - Richard's one in the tree, Daniel's postings on fsdevel and several parts written by various folks. This stuff needs to be merged (and corrected where needed). I volunteer to do that - I've spent quite a while dealing with the code, so I at least know what _is_ there. I would be really grateful if * folks who have writeups would post URLs to them (or the texts themselves, if they are small enough). Preferably to fsdevel, but private email will also go. * people would comment after the result is posted. Especially about the missing / hard-to-understand pieces of text. * somebody helped to turn the result into decent English text.
Re: hfs support for blocksize != 512
Hi, > Show me these removed locks. The only polite explanation I see is > that you have serious reading comprehension problems. Let me say it once > more, hopefully that will sink in: > > Your repeated claims of VFS becoming more multi-threaded in ways > that are not transparent to fs drivers wrt locking are false. For example the usage of the inode lock changed pretty much and was partly replaced with the page lock? I can still remember times where all of the fs stuff happened under the BKL; for me that means only a _single_ thread of execution could be busy in the whole fs layer. IMHO that's not really a prime example of multi-threaded programming; if you have a different definition please let me know. > What? You've proposed locking on pageout. If _that_ isn't the fast path... No, I suggested a lock (not necessarily the inode lock) during allocation of indirect blocks (and defer truncation of them). > > The major problem right now is that writepage() is supposed to be > > asynchronous especially for kswapd, but the fs might have to > > synchronize something _internal_. I think one problem here is that we > > still have a synchronous buffer API, which makes it very hard to > > implement an asynchronous interface. That's why I suggested an I/O > > Wrong. As a matter of fact, we could trivially get rid of _any_ use of > bread() and friends on ext2. Excuse my stupidity, but could you please outline for me how? > _One_ thread? For the whole fs? So you would pass the dirty pages from > kswapd to that guy. Fine. It attempts to acquire the inode semaphore (in > your proposal, as far as I could parse it). It blocks. kswapd keeps > pumping dirty pages into the queue of that thread. Wonderful... Sorry, but did you read my mail? The purpose of that thread is to sleep and to get woken up to continue the IO. Not very much changes, except that this thread can safely sleep, whereas kswapd can't. Excuse my ignorance, but what currently stops kswapd from starting lots of IO?
> b) doesn't help AFFS directory problems Why the hell do you always come with this, I _never_ mentioned it. > Talk is cheap. If you can show the patch that would simplify ext2, > I'm sure that Ted will be glad to see it. Same for maintainers of other > filesystems. The only requirement is that it should work. Excuse me, but > the longer I read your postings the more it looks like you have no idea of > the things you are talking about. I would be glad to be proven wrong on > that one too ;-/ I'm very sorry to waste your precious time, but your fscking arrogance makes me sick. What's your problem? Shall I first worship you as our fs god who saved us from all races? Sorry, but from time to time I prefer _first_ to think about a problem and try to understand it. One way to do this is to post questions and/or suggestions to a mailing list (at least I thought so). If you have another suggestion please enlighten me. bye, Roman
Re: hfs support for blocksize != 512
Hi, > It sounds to me like different FSes have different needs. Maybe the best > approach is to have two or three fs APIs, according to the needs of the > fs. No, having several fs APIs is a maintenance nightmare, I think that's something everyone agrees on. What is needed is to modify the API to meet all the requirements of the vfs and the needs of the fs. (The problem is we don't agree on what the fs needs...) bye, Roman
Re: hfs support for blocksize != 512
On Tue, 29 Aug 2000, Jeff V. Merkey wrote: > I concur with this appraisal from Al Viro. Single threading the VFS is > going backwards -- not a good idea. It sounds to me like different FSes have different needs. Maybe the best approach is to have two or three fs APIs, according to the needs of the fs. One could be a pure vnode interface, simple, serene, which puts the locking in the driver by whatever means it chooses. Lookup for NFS would be on the vnode number, which would be kept in a kernel table until the file was closed. One could be the current multi-threaded arrangement. Finally, one might add a single-threaded-per-filesystem-instance method for filesystems that don't thread well. It just seems to me that this sort of thing need not be an either-or situation. Comments? David
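The "kernel table keyed by vnode number" idea above can be sketched as a small fixed-size table. Purely illustrative: every name here (vtable, vtable_get, vtable_lookup) is invented, and a linear scan stands in for whatever hashing a real implementation would use.

```c
/* Hypothetical sketch of an open-file table where NFS-style lookups go
   by vnode number, not by name; an entry stays pinned until release. */
#include <stddef.h>

#define VTABLE_SIZE 64

struct vnode { unsigned long vno; int in_use; };

static struct vnode vtable[VTABLE_SIZE];

/* Insert on open: keep the vnode in the table until the file is closed. */
struct vnode *vtable_get(unsigned long vno)
{
    for (size_t i = 0; i < VTABLE_SIZE; i++) {
        if (!vtable[i].in_use) {
            vtable[i].vno = vno;
            vtable[i].in_use = 1;
            return &vtable[i];
        }
    }
    return NULL; /* table full */
}

/* Lookup by number, as an NFS file handle would supply it. */
struct vnode *vtable_lookup(unsigned long vno)
{
    for (size_t i = 0; i < VTABLE_SIZE; i++)
        if (vtable[i].in_use && vtable[i].vno == vno)
            return &vtable[i];
    return NULL;
}
```

The point of the table is that a stale NFS handle simply misses the lookup instead of requiring a name-based revalidation.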
Re: hfs support for blocksize != 512
Hi, Tony Mantler wrote: > For those of you who would rather not have read through this entire email, > here's the condensed version: VFS is inherently a wrong-level API, QNX does > it much better. Flame on. :) VFS isn't really wrong; the problem is that it moved from an almost single-threaded API to a multithreaded API, and that development isn't complete yet. I don't really expect that fs programming becomes easier, but it should stay sane. For example I want to protect certain state changes properly, and not that insane "check all possible states at all possible times and before and after every change" that Al is currently doing in ext2. bye, Roman
Re: hfs support for blocksize != 512
Hi, > Yes? And it will become simpler if you put each and every locking > scheme into the API? No, I didn't say that. I want the API to be less restrictive and to make the job for the fs a bit easier. IMO the current API is inconsistent and/or incomplete, and I'm still trying to find out what exactly is missing. The VFS is becoming more and more multithreaded, locks are (re)moved, but nothing was added for the fs. > We have ext2 with indirect blocks, inode bitmaps and block bitmaps, one > per cylinder group + counters in each cylinder group. Should VFS know > about the internal locking rules? Should it be aware of the fact that > inodes foo and bar belong to the same cylinder group and if we remove them > we will need to protect the bitmap for a while? Ok, let's take ext2 as an example. Of course the vfs should only be the abstraction layer, but it shouldn't enforce locking rules like you added them in ext2. I know the races have existed for longer, so you don't have to argue about that, but earlier I suggested a simpler solution; the problem is that it requires holding an exclusive lock while it would sleep. It wouldn't even be in the fast path and would only affect write access to the indirect blocks of a single file; it doesn't affect reads and it doesn't affect access to other files - that really shouldn't be a problem even for a multi-threaded environment. But currently this is not possible, and all I'm trying now is to explore possibilities to make that possible, as it would make life for ext2 and every other fs a lot easier. > We have AFFS with totally fscked directory structures. Sorry? Why is that? Because it's not UNIX friendly? It was designed for a completely different OS and is very simple. The problems I know are mostly shared with every other fs that has a more dynamic directory structure than ext2. > It's insane - protection of purely internal data structures belongs to the > module that knows about them. I absolutely don't argue against that!
Anyway, somehow you skipped a lot of my mail, so it seems I have to continue to discuss that with myself (hopefully without permanent damage). The major problem right now is that writepage() is supposed to be asynchronous, especially for kswapd, but the fs might have to synchronize something _internal_. I think one problem here is that we still have a synchronous buffer API, which makes it very hard to implement an asynchronous interface. That's why I suggested an I/O thread, which can sleep for the caller. Another possibility is to make the already existing asynchronous interface in buffer.c available to the fs. Anyway, if we want an asynchronous fs interface, we need an asynchronous buffer interface, so that e.g. writepage() in ext2 can lock the indirect block, start the I/O and get called back later; another writepage() call in the same area has to detect that lock (with a simple down_trylock()) and schedule the complete I/O for later. With some help from the buffer interface it should be pretty easy, and ext2 would actually become much simpler again. Something like this would also be great for real AIO support in userspace with good latencies. bye, Roman
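The down_trylock() idea in that paragraph - a writer that cannot take the per-area lock defers the work instead of blocking - can be modeled in userspace. A sketch under assumptions: pthread_mutex_trylock stands in for down_trylock(), the counters stand in for a real deferred-I/O queue, and all names are invented.

```c
/* Model of an asynchronous writepage() that never sleeps on the area
   lock: if the lock is busy, the I/O is queued for later instead. */
#include <pthread.h>

static pthread_mutex_t area_lock = PTHREAD_MUTEX_INITIALIZER;
static int deferred;   /* writepage() calls pushed to the "later" queue */
static int written;    /* pages whose I/O was started immediately */

/* The caller (think kswapd) must never block here. */
void writepage_async(void)
{
    if (pthread_mutex_trylock(&area_lock) != 0) {
        deferred++;          /* lock busy: schedule the complete I/O for later */
        return;
    }
    written++;               /* got the lock: start the I/O now */
    pthread_mutex_unlock(&area_lock);
}
```

The design point is exactly the one argued over in this thread: the pageout path stays non-blocking, and the contention is resolved by whoever drains the deferred queue once the lock's holder finishes.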
Re: hfs support for blocksize != 512
On Wed, 30 Aug 2000, Albert D. Cahalan wrote: > Ext2, XFS, Reiserfs, NWFS, and JFS need a multi-threaded VFS. > Do we really need a screaming fast multi-threaded AFFS driver? Erm... Roman seems to complain about VFS/VM not locking hard enough to make protection of private fs data structures unnecessary. > Tell me who is doing SPECweb99 benchmarks on AFFS. > I'd trade away some NTFS performance for a bug reduction. > Perhaps the trade would be OK for most single-user filesystems. > Somebody was doing a Commodore 64 filesystem. That just isn't > going to be mounted on /tmp except as a joke. > > Yeah, I know about the Coda interface and all. People like the > ease-of-use and reliability offered by in-kernel filesystems. What? You are trying to say that debugging kernel code is easier than doing that in userland? Could you pass me this reefer? Seems to be some fairly strong stuff in there... There are reasons to write a kernel fs, but reliability is _not_ one of them; debugging will be harder, with all the usual consequences. > Having a complex-to-simple VFS adapter would make this guy happy. > You don't have to write it or use it. Albert, care to look at the API someday? Areas of major suckage: ->revalidate() ->truncate() ->readdir() It's too fscking close to 2.4 for further cleanups in these places. The rest is in a funny state - simple, but badly documented. It's much simpler than it used to be and I really wonder what simplification you would propose. Full lock on all operations? But one can do that right now - it's not worth a special translator... The only thing to watch for: ->writepage() (i.e. pageout) can happen anywhere below ->i_size and you can't use blocking exclusion against that. Rationale: trivial deadlocks upon memory pressure. If the fs has no holes (AFFS, HFS, etc. qualify) - no block allocation on pageout, so your life is relatively simple. And the rest of the file-modifying stuff is a) process-synchronous b) called with ->i_sem held.
For files-with-holes you have to step carefully. Fixed that in ext2, but expanding to the rest will take a while. Fortunately they are relatively sane in other respects and happen to be very similar... [VFS comparison] > > Plan 9 is nice and easy. Without mmap(), > > without link(), without truncate(), without cross-directory rename() and > No link() and no cross-directory rename()... how in hell??? > They what, move via copy and delete? So do we, if the target is on a different filesystem... So does AFS, for that matter (no cross-directory rename). I can understand them very well - full-blown rename() is _hell_ to get right. BT, DT in VFS, got the nausea. Grep for "Hastur" in fs/namei.c and read the comments. Fortunately, these days fs side of ->rename() is mostly painless (compared to what had been there; there _is_ some crap, but that's what you get from the fscked semantics of the operation). 'sides, you had been one of the most vocal link(2)-haters, hadn't you? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: hfs support for blocksize != 512
Alexander Viro writes: > On Wed, 30 Aug 2000, Roman Zippel wrote: >> The point is: the thing I like about Linux is its simple interfaces, it's >> the basic idea of unix - keep it simple. That is true for most parts - the >> basic idea is simple and the real complexity is hidden behind it. But >> that's currently not true for vfs interface, a fs maintainer has to fight >> right now with fscking complex vfs interface and with a possible fscking > > Yes? And it will become simpler if you will put each and every locking > scheme into the API? > > Look: we have Hans with his trees-all-over-the-place + journal. Mmmm, isn't it just _one_ big tree with different types of nodes? > We have AFFS with totally fscked directory structures. Do you propose to ... > Then check what's left after that locking - e.g. can two > processes access the same fs at the same time or not? ... > Making VFS single-threaded will not fly. If you can show simpler MT one - Ext2, XFS, Reiserfs, NWFS, and JFS need a multi-threaded VFS. Do we really need a screaming fast multi-threaded AFFS driver? Tell me who is doing SPECweb99 benchmarks on AFFS. I'd trade away some NTFS performance for a bug reduction. Perhaps the trade would be OK for most single-user filesystems. Somebody was doing a Commodore 64 filesystem. That just isn't going to be mounted on /tmp except as a joke. Yeah, I know about the Coda interface and all. People like the ease-of-use and reliability offered by in-kernel filesystems. Having a complex-to-simple VFS adapter would make this guy happy. You don't have to write it or use it. > Plan 9 is nice and easy. Without mmap(), > without link(), without truncate(), without cross-directory rename() and No link() and no cross-directory rename()... how in hell??? They what, move via copy and delete?
Re: hfs support for blocksize != 512
Hi, Tony Mantler wrote: For those of you who would rather not have read through this entire email, here's the condensed version: VFS is inherently a wrong-level API, QNX does it much better. Flame on. :) VFS isn't really wrong, the problem is that it moved from an almost single-threaded API to a multithreaded API and that development isn't complete yet. I don't really expect that fs programming becomes easier, but it should stay sane. For example I want to protect certain state changes properly, and not that insane "check all possible states at all possible times and before and after every change" which Al is currently doing in ext2. bye, Roman
Re: hfs support for blocksize != 512
On Tue, 29 Aug 2000, David A. Gatwood wrote: > Indeed, that's what a VFS layer should do -- abstract away all physical > structure, inodes, etc., leaving only the file abstraction. I've read It does. That leaves caring about the internal structures to fs - you don't want fscked block bitmap on ext2, you've got to protect it yourself. Sorry. > that the BSD-derived OSes have vnode interfaces that are remarkably > similar to what you're describing, i.e. the concept isn't restricted to > RTOSes. That's what had been done. BTW, pure vnode interface leaves all namespace-related race-prevention to fs writer. And they tend to fsck up. "They" include Kirk, so... I wouldn't call it simple. Moreover, tons of the code are duplicated (with slight variations in the set of present bugs) in all filesystems. > Note that I haven't touched the Linux VFS layer since 2.0.xx, so I'm not > in a position to comment on the current state of the code. :-) It got much simpler.
Re: hfs support for blocksize != 512
On Tue, 29 Aug 2000, Tony Mantler wrote: > (Obligatory disclaimer: QNX is an embedded operating system, both its > architecture and target market are considerably different from Linux's) > > QNX's filesystem interfaces make it so painfully easy to write a filesystem > that it puts everything else to shame. You can easily write a fully > functioning, race-free, completely coherent filesystem in less than a week, > it's that simple. I'd interject that it's not a very fair comparison between the kernel complexity of an RTOS and a full-fledged traditional OS, but go on ;-) > Now, let's say you do an 'ls' on the FOO directory. The FS api would tap > your filesystem on the shoulder and ask "Hey you, what's in the FOO > directory?". Your filesystem would reply "BAR and BAZ". It might also reply with a stat structure, depending on the implementation, but otherwise, yeah, this is a good model to move towards. > So what does it all mean? Basically, if you want hugely complex dentries, > and inodes as big as your head, you can do that. If you don't, more power > to you. It's all entirely contained inside your specific FS code, the FS > api doesn't care one bit. It just asks you for files. Indeed, that's what a VFS layer should do -- abstract away all physical structure, inodes, etc., leaving only the file abstraction. I've read that the BSD-derived OSes have vnode interfaces that are remarkably similar to what you're describing, i.e. the concept isn't restricted to RTOSes. Note that I haven't touched the Linux VFS layer since 2.0.xx, so I'm not in a position to comment on the current state of the code. :-) Later, David - A brief Haiku: Microsoft is bad. It seems secure at first glance. Then you read your mail.
Re: hfs support for blocksize != 512
I concur with this appraisal from Al Viro. Single threading the VFS is going backwards -- not a good idea. :-) Jeff Alexander Viro wrote: > [...]
Re: hfs support for blocksize != 512
On Wed, 30 Aug 2000, Roman Zippel wrote: > > > hfs. For example reading from a file might require a read from a btree > > > file (extent file), with what another file write can be busy with (e.g. > > > reordering the btree nodes). > > > > And? > > The point is: the thing I like about Linux is its simple interfaces, it's > the basic idea of unix - keep it simple. That is true for most parts - the > basic idea is simple and the real complexity is hidden behind it. But > that's currently not true for vfs interface, a fs maintainer has to fight > right now with fscking complex vfs interface and with a possible fscking Yes? And it will become simpler if you will put each and every locking scheme into the API? Look: we have Hans with his trees-all-over-the-place + journal. He has a very legitimate need to protect the internal data structures of Reiserfs and do it without changing the VFS<->reiserfs interaction whenever he decides to change purely internal structures. We have ext2 with indirect blocks, inode bitmaps and block bitmaps, one per cylinder group + counters in each cylinder group. Should VFS know about the internal locking rules? Should it be aware of the fact that inodes foo and bar belong to the same cylinder group and if we remove them we will need to protect the bitmap for a while? We have FAT32 where we've got a nasty allocation data with rather interesting locking rules. Should it be protected by VFS? If it should - well, I have bad news for you: write() on a file will lock the whole filesystem until write() completes. Don't like it for every fs? Tough, it will mean that VFS will not protect the thing and fs will have to do it itself. We have AFFS with totally fscked directory structures. Do you propose to make unlink() block all directory operations on the whole fs? No? Too bad, because only AFFS knows enough to protect its data structures without _that_ locking. 
Sorry, the only rule that would not require the knowledge of layout and would be strong enough to protect is "no directory access while unlink() is in progress". Yup, on the whole fs. Hardly acceptable even for one filesystem, but try to impose that on everyone and see how long you will survive. JPEGs of the murder scene would be appreciated, BTW. We have HFS with the data structures of its own. You want locking in VFS that would protect the things VFS doesn't know about and has no business to meddle with? Fine, post the locking rules. It's insane - protection of purely internal data structures belongs to the module that knows about them. Generic stuff can, should be and _is_ protected. Private one _can't_ be protected without either horribly crippled system (see above) or putting the knowledge of each data structure into the generic layer. And the latter will be on the author of filesystem anyway, because only he knows what rules he needs. Please, propose your magical locking scheme that will protect everything on every fs. And let maintainers of filesystems tell you whether it is sufficient. Then check what's left after that locking - e.g. can two processes access the same fs at the same time or not? If you are complaining about the fact that maintaining complex data structures in multithreaded program (which kernel is) may be, well, complex - welcome to reality. It had been that way since the very beginning on _all_ MT projects, Linux included. You have complex private data - you may be in for pain protecting yourself from races. Protection of the public structures is there, so life became easier than it used to be back in 2.0/2.1/2.2 days. Making VFS single-threaded will not fly. If you can show simpler MT one - do it and a lot of people will be extremely grateful. 4.4BSD and SunOS ones are more complex and make the life harder for filesystem writers. Check yourself. OSF/1 is _much_ more complex.
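[Editorial aside: Viro's point - the lock lives next to the private data it protects, invisible to the generic layer - can be sketched in miniature. This is a userspace analogy with invented names (a real fs would use its own kernel semaphore, and the bitmap would live on disk), not anyone's actual code:]

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define NBLOCKS 64

/* Toy analogue of an ext2-style cylinder-group bitmap: the lock lives
 * inside the fs-private structure, next to the data it protects.
 * The VFS never sees either of them. */
struct toy_sb {
    pthread_mutex_t bitmap_lock;          /* fs-private lock */
    unsigned char   block_bitmap[NBLOCKS];
    unsigned        free_count;
};

static void toy_sb_init(struct toy_sb *sb)
{
    pthread_mutex_init(&sb->bitmap_lock, NULL);
    memset(sb->block_bitmap, 0, sizeof sb->block_bitmap);
    sb->free_count = NBLOCKS;
}

/* Returns a block number, or -1 if the fs is full. */
static int toy_alloc_block(struct toy_sb *sb)
{
    int i, ret = -1;

    pthread_mutex_lock(&sb->bitmap_lock);
    for (i = 0; i < NBLOCKS; i++) {
        if (!sb->block_bitmap[i]) {
            sb->block_bitmap[i] = 1;
            sb->free_count--;
            ret = i;
            break;
        }
    }
    pthread_mutex_unlock(&sb->bitmap_lock);
    return ret;
}

static void toy_free_block(struct toy_sb *sb, int blk)
{
    pthread_mutex_lock(&sb->bitmap_lock);
    sb->block_bitmap[blk] = 0;
    sb->free_count++;
    pthread_mutex_unlock(&sb->bitmap_lock);
}
```

Only this module knows that the bitmap and the free count must change together under one lock; a generic layer could not guess that rule without knowing the layout, which is exactly the argument above.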
Hell knows what NT has, but filesystem glue there looks absolutely horrible - compared to them we are angels in that respect. v7 was simpler, sure enough. Without mmap(), rename() and truncate() _and_ with only one fs type - why not? Too bad that it was racey as hell... Plan 9 is nice and easy. Without mmap(), without link(), without truncate(), without cross-directory rename() and without support of crazy abortions from hell a-la AFFS. 2.0 and 2.2 are _way_ more complex, just compare filesystem code size in 2.4 with them and you will see. And yes, races in question are not new. I can reproduce them on 2.0.9 box. Single-processor one - nothing fancy and SMP-related. If you have a way to simplify VFS and/or filesystems - by all means, post it on fsdevel/l-k. Just tell what locking warranties you provide. Current ones are documented in the tree, so it will be very easy to compare. I'm not saying that they are ideal (check the documentation in question - I'm saying the opposite in quite a few cases). They _can_ be made better. But if you are saying that you know how to protect purely internal data structures without losing MT
Re: hfs support for blocksize != 512
At 8:09 PM -0500 8/29/2000, Roman Zippel wrote: >So let's get back to the vfs interface Yes, let's do that. Every time I hear someone talking about implementing a filesystem, the words "you are doomed" are usually to be heard somewhere along the lines. Now, the bits on disk aren't usually the part that kills you - heck, I repaired an HFS drive with a hex editor once (don't try that at home, kids) - it's the evil and miserable FS driver APIs that get you. Big ugly structs, coherency problems with layers upon layers of xyz-cache, locking nightmares etc. So, when my boss dropped a multiple-compressed-backed ramdisk filesystem in my lap and said "make it use less memory", the words "I am doomed" floated through my head. Thankfully for the sake of both myself and my sanity, the platform of choice was QNX 4. (Obligatory disclaimer: QNX is an embedded operating system, both its architecture and target market are considerably different from Linux's) QNX's filesystem interfaces make it so painfully easy to write a filesystem that it puts everything else to shame. You can easily write a fully functioning, race-free, completely coherent filesystem in less than a week, it's that simple. When I wanted to make my compressed-backed ramdisk filesystem attach to multiple points in the namespace with separate and multiple backings on each point, in only a single instance of the driver, it was as easy as changing 10 lines of code. Now, for those of you who don't have convenient access to QNX4 or QNX Neutrino (which has an even nicer interface, mostly cleaning up on the QNX4 stuff), here's the disneyfied version of how it all works: When your filesystem starts up it tells the FS api "hey you, fs api. if someone needs something under directory FOO, call me". Your filesystem then wanders off and sleeps in the background 'till someone needs it. Now, let's say you do an 'ls' on the FOO directory.
The FS api would tap your filesystem on the shoulder and ask "Hey you, what's in the FOO directory?". Your filesystem would reply "BAR and BAZ". Now you do 'cat FOO/BAZ >/dev/null', the FS api taps your filesystem on the shoulder and says "someone wants to open FOO/BAZ". Your filesystem replies "Yeah, got it open, here's an FD for you". The FS layer then comes back again and says "I'll take blocks x, y and z from the file on this FD", to which your filesystem replies "Ok, here it is". Etc etc, you get the point. So what does it all mean? Basically, if you want hugely complex dentries, and inodes as big as your head, you can do that. If you don't, more power to you. It's all entirely contained inside your specific FS code, the FS api doesn't care one bit. It just asks you for files. It also means that you can do cute things like use the exact same API for block/char/random devices as you do for filesystems. No big fuss over special files, procfs, devfs, or dead chickens, your device driver just calls up the FS api and says "hey, I'm /dev/dsp" or "hey, I'll be taking care of /proc/cpuinfo" and it all "just works". Also, it means that if you want to represent your multiforked filesystem as files-as-directories, (can-o-worms: open) you can just do it. No changes to the FS api, no other filesystems break, etc. Everything "just works". If someone, ANYONE, could bring this kind of painfully simple FS api to linux, and make it work, not only would I be eternally in their debt, I would personally send them a box of genuine Canadian maple-sugar candies as a small token of my infinite thanks. Even failing that, I urge anyone who would want to look at (re)designing any filesystem API to look at how QNX does it. It's really a beautiful thing. Further reading can be found in "Getting Started with QNX Neutrino 2: A Guide for Realtime Programmers", ISBN 0968250114. I should apologise here for this email being particularly fluffy.
It's getting a bit late here, and I don't want to switch my brain on again before I go to sleep. For those of you who would rather not have read through this entire email, here's the condensed version: VFS is inherently a wrong-level API, QNX does it much better. Flame on. :) Cheers - Tony 'Nicoya' Mantler :) -- Tony "Nicoya" Mantler - Renaissance Nerd Extraordinaire - [EMAIL PROTECTED] Winnipeg, Manitoba, Canada -- http://nicoya.feline.pp.se/
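[Editorial aside: the QNX model Tony describes - register a prefix with the FS api, then answer "what's in FOO?" style messages - can be caricatured in a few lines of C. This is a toy sketch with an invented message set and names; real QNX does this via kernel message passing and its resource-manager framework:]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The FS api "taps you on the shoulder" with a message; you fill in a reply. */
enum msg_type { MSG_READDIR, MSG_OPEN, MSG_READ };

struct fs_msg {
    enum msg_type type;
    const char   *path;
    char          reply[128];
};

/* One registered filesystem: "if someone needs something under
 * this prefix, call this handler". */
struct fs_handler {
    const char *prefix;
    void      (*handle)(struct fs_msg *m);
};

/* Our toy filesystem: directory FOO contains BAR and BAZ. */
static void foofs_handle(struct fs_msg *m)
{
    switch (m->type) {
    case MSG_READDIR:
        snprintf(m->reply, sizeof m->reply, "BAR BAZ");
        break;
    case MSG_OPEN:
        snprintf(m->reply, sizeof m->reply, "fd:%s", m->path);
        break;
    case MSG_READ:
        snprintf(m->reply, sizeof m->reply, "data from %s", m->path);
        break;
    }
}

static struct fs_handler registry[] = {
    { "/FOO", foofs_handle },
};

/* The "FS api": route a message to whoever claimed that part of the
 * namespace, or fail if nobody did. */
static int fs_dispatch(struct fs_msg *m)
{
    size_t i;

    for (i = 0; i < sizeof registry / sizeof registry[0]; i++) {
        size_t n = strlen(registry[i].prefix);
        if (strncmp(m->path, registry[i].prefix, n) == 0) {
            registry[i].handle(m);
            return 0;
        }
    }
    return -1;
}
```

Note what the dispatcher does *not* know: no dentries, no inodes, no locking rules - the whole fs lives behind one callback. That is the appeal; the counter-argument upthread is that coherency between callers then becomes the filesystem's problem entirely.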
Re: hfs support for blocksize != 512
Hi, > > hfs. For example reading from a file might require a read from a btree > > file (extent file), with what another file write can be busy with (e.g. > > reordering the btree nodes). > > And? The point is: the thing I like about Linux is its simple interfaces, it's the basic idea of unix - keep it simple. That is true for most parts - the basic idea is simple and the real complexity is hidden behind it. But that's currently not true for the vfs interface, a fs maintainer has to fight right now with a fscking complex vfs interface and with a possibly fscking complex fs implementation. E2fs or affs have a pretty simple structure and I believe you that it's not that hard to fix, maybe there is also a simple solution for hfs. But I'd like you to forget about that and think about the big picture (as Linus nicely states it). What we should aim at with the vfs interface is simplicity, I want to use a fscking simple semaphore to protect something, like anywhere else; I don't want to juggle with lots of blocks which have to be updated atomically. Maybe you get it right once, but it will follow you as a nightmare: you add one feature (e.g. quota), you add another feature (like btrees) - are you still damned fscking sure of getting and keeping it right? So and? What I'd really like to see from you is to be a bit more supportive of other people's problems, I really don't expect you to solve these problems, but if someone approaches a different solution, you're pretty quick to refuse it. So let's get back to the vfs interface, fs currently have to do pretty much all their changes atomically, they have to grab all the buffers they need and do all changes at once. How can you be sure that this is possible for every possible fs? How do you make sure you don't create other problems like livelocks? We currently have the problem that things like kswapd require an asynchronous interface, but fs prefer to synchronize it.
Currently you're pushing all the burden of an asynchronous interface into the fs, which would rather avoid that. Why don't you think for a moment in the other direction? Currently I'm playing with the idea of a kernel thread for asynchronous io (maybe one per fs): that thread takes the io requests e.g. from kswapd, and the io thread can safely sleep on it while kswapd can continue its job. But I don't know yet where to put it, whether in the fs-specific part or whether it can be made generic enough to be put into the generic part. Can we please think for a moment in that direction? At some point you have to synchronize the io anyway (at latest when it hits the device), but I would pretty much prefer if a fs would get some help at some earlier point. (Anyway, I need some sleep now as well... :) ) bye, Roman
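[Editorial aside: Roman's async-io-thread idea can be sketched in userspace with pthreads (all names invented here; a kernel version would use a kernel thread and wait queues, not pthreads). The kswapd-like caller queues a request and returns immediately; only the per-fs io thread ever sleeps on the "device":]

```c
#include <assert.h>
#include <pthread.h>

#define QMAX 64

struct io_queue {
    pthread_mutex_t lock;
    pthread_cond_t  more;
    int  req[QMAX];
    int  head, tail;   /* sketch: assumes at most QMAX pending requests */
    int  done;         /* completed requests */
    int  shutdown;
};

static void ioq_init(struct io_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->more, NULL);
    q->head = q->tail = q->done = q->shutdown = 0;
}

/* Called by the kswapd-like producer: queue the request and return at
 * once - the producer never sleeps on the actual I/O. */
static void ioq_submit(struct io_queue *q, int blk)
{
    pthread_mutex_lock(&q->lock);
    q->req[q->tail++ % QMAX] = blk;
    pthread_cond_signal(&q->more);
    pthread_mutex_unlock(&q->lock);
}

/* The per-fs io thread: the only place that may sleep on the device. */
static void *io_thread(void *arg)
{
    struct io_queue *q = arg;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->head == q->tail && !q->shutdown)
            pthread_cond_wait(&q->more, &q->lock);
        if (q->head == q->tail && q->shutdown)
            break;
        int blk = q->req[q->head++ % QMAX];
        pthread_mutex_unlock(&q->lock);
        (void)blk;   /* the (possibly sleeping) device I/O would go here */
        pthread_mutex_lock(&q->lock);
        q->done++;
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Drain the queue, then let the io thread exit. */
static void ioq_shutdown(struct io_queue *q, pthread_t t)
{
    pthread_mutex_lock(&q->lock);
    q->shutdown = 1;
    pthread_cond_signal(&q->more);
    pthread_mutex_unlock(&q->lock);
    pthread_join(t, NULL);
}
```

The deadlock argument upthread is visible here in reverse: because the producer only touches the queue lock and never the device, memory pressure can always make forward progress even while an earlier request is still sleeping in the io thread.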
Re: hfs support for blocksize != 512
On Tue, 29 Aug 2000, Roman Zippel wrote: > hfs. For example reading from a file might require a read from a btree > file (extent file), with what another file write can be busy with (e.g. > reordering the btree nodes). And? > I really would prefer that a fs could sleep _and_ can use semaphores, > that would keep locking simple, otherwise it gets only a fscking mess. WTF? HFS does not allow holes. _ALL_ allocations there are process synchronous. What's the problem? Pageout on HFS can not allocate blocks and that's the only process-async method. If you want to sleep at completely arbitrary moments while you are modifying the btree (i.e. in the moments when it's in the inconsistent state and hfs_get_block() would fail) - too bad, you are going to have problems. And not from me - power failure will take care of making your life _very_ painful.
Re: hfs support for blocksize != 512
Hi, > Darnit, documentation on filesystem locking is there for purpose. First > folks complain about its absence, then they don't bother to read the > bloody thing once it is there. Furrfu... It's great that it's there, but still doesn't tell you everything. > Said that, handling of indirect blocks used to be badly b0rken on all > normal filesystems and it had been fixed only on ext2, so I wouldn't be > amazed if regular files were bad on B-tree style filesystems. Directories > are easy - all requests are process-synchronous (no pageout), no > truncate() in sight, so the life is better. I don't think that files are that easy, at least from what I know now from hfs. For example reading from a file might require a read from a btree file (extent file), with what another file write can be busy with (e.g. reordering the btree nodes). I really would prefer that a fs could sleep _and_ can use semaphores, that would keep locking simple, otherwise it gets only a fscking mess. bye, Roman
Re: hfs support for blocksize != 512
On Tue, 29 Aug 2000, Matthew Wilcox wrote: > On Tue, Aug 29, 2000 at 06:08:04PM +0200, Roman Zippel wrote: > > Anyway, I'm happy about any bug reports, that you can't reproduce with > > hfs on a drive with 512 byte sectors (for that I still trying to fully > > understand hfs btrees :-) ). I don't think this patch should be included > last time i looked (somewhere around 2.3.4x), all the B-tree directory > implementations in the kernel were broken. That's HFS, HPFS and NTFS. > None of them consider the race where an insert occurs into the tree > while you're doing a readdir. I thought about how to fix it for ext2 > btrees but I haven't come up with a satisfactory solution yet. readdir() holds both ->i_sem and ->i_zombie, so I'm not sure what other exclusion you need. Darnit, documentation on filesystem locking is there for a purpose. First folks complain about its absence, then they don't bother to read the bloody thing once it is there. Furrfu... Said that, handling of indirect blocks used to be badly b0rken on all normal filesystems and it had been fixed only on ext2, so I wouldn't be amazed if regular files were bad on B-tree style filesystems. Directories are easy - all requests are process-synchronous (no pageout), no truncate() in sight, so the life is better.
Re: hfs support for blocksize != 512
On Tue, Aug 29, 2000 at 06:08:04PM +0200, Roman Zippel wrote: > Anyway, I'm happy about any bug reports that you can't reproduce with > hfs on a drive with 512 byte sectors (for that I'm still trying to fully > understand hfs btrees :-) ). I don't think this patch should be included last time i looked (somewhere around 2.3.4x), all the B-tree directory implementations in the kernel were broken. That's HFS, HPFS and NTFS. None of them consider the race where an insert occurs into the tree while you're doing a readdir. I thought about how to fix it for ext2 btrees but I haven't come up with a satisfactory solution yet.
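[Editorial aside: the readdir-vs-insert race Matthew describes comes from resuming a directory scan by position while the tree reshuffles underneath. One common fix - sketched here on a sorted array standing in for a B-tree leaf, with invented names - is to make the readdir cursor the last key returned rather than an offset, so a concurrent insert cannot cause skips or duplicates:]

```c
#include <assert.h>
#include <string.h>

#define MAXENT 16

/* A "directory" kept sorted by name, like a B-tree leaf. */
struct dir {
    const char *name[MAXENT];
    int n;
};

/* Sorted insert: shifts entries, exactly the operation that
 * invalidates any offset-based readdir cursor. */
static void dir_insert(struct dir *d, const char *name)
{
    int i = d->n++;

    while (i > 0 && strcmp(d->name[i - 1], name) > 0) {
        d->name[i] = d->name[i - 1];
        i--;
    }
    d->name[i] = name;
}

/* Key-based cursor: return the first entry whose name sorts after
 * `last` ("" starts the scan), or NULL at end.  Because it searches by
 * key, an insert between calls never makes it skip or repeat an entry
 * that existed when the scan began. */
static const char *dir_next(const struct dir *d, const char *last)
{
    int i;

    for (i = 0; i < d->n; i++)
        if (strcmp(d->name[i], last) > 0)
            return d->name[i];
    return NULL;
}
```

The trade-off is that each step is a fresh lookup (cheap in a B-tree, where it is just a descent by key) instead of an array index, and entries inserted *behind* the cursor are simply not reported in that scan, which is the usual readdir contract anyway.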
hfs support for blocksize != 512
Hi, Here is a patch for anyone who needs to access HFS on e.g. an MO drive. It's only for 2.2.16, but I was able to do that as part of my job as we need that functionality. Anyway, I've also read a bit through the HFS+ spec and IMO basically most of the current hfs needs to be rewritten for 2.4, e.g. its special files should better go into the page cache, and hfs basically assumes 512 byte blocks everywhere, which isn't true anymore with hfs+. This 512 byte block problem is also the reason that the performance of this patch will suck badly on MOs, since _every_ write (of a 512 byte block) requires a read (of a 1024 byte sector). Anyway, I'm happy about any bug reports that you can't reproduce with hfs on a drive with 512 byte sectors (for that I'm still trying to fully understand hfs btrees :-) ). I don't think this patch should be included into standard 2.2, but on the other hand it also shouldn't make anything worse than it already is. bye, Roman hfs1024.diff.gz