Re: Getting FS access events

2001-05-23 Thread Stephen C. Tweedie

Hi,

On Tue, May 15, 2001 at 04:37:01PM +1200, Chris Wedgwood wrote:
> On Sun, May 13, 2001 at 08:39:23PM -0600, Richard Gooch wrote:
> 
> Yeah, we need a decent unfragmenter. We can do that now with
> bmap().
> 
> SCT wrote a defragger for ext2 but it only handles 1k blocks :(

Actually, I wrote it for extfs, and Alexey Vovenko ported it to ext2.
Extfs *really* needed a defragmenter, because it had weird behaviour
patterns which included, at times, allocating all of the blocks of a
file in descending disk-block order.

Cheers,
 Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-23 Thread Stephen C. Tweedie

Hi,

On Fri, May 18, 2001 at 09:55:14AM +0200, Rogier Wolff wrote:

> The "boot quickly" was an example. "Load netscape quickly" on some
> systems is done by dd-ing the binary to /dev/null. 

This is one of the reasons why some filesystems use extent maps
instead of inode indirection trees.  The problem of caching the
metadata basically just goes away if your mapping information is a few
bytes saying "this file is an extent of a hundred blocks at offset FOO
followed by fifty blocks at offset BAR."

If the mapping metadata is _that_ compact, then your binaries are
almost guaranteed to be either mapped in the inode or in a single
mapping block, so the problem of seeking between indirect blocks
basically just goes away.  You still have to do things like prime the
inode/indirect cache before the first data access if you want
directory scans to go fast, and you still have to preload data pages
for readahead, of course.  

If the objective is "start netscape faster", then the cost of having
to do one synchronous IO to pull in a single indirect extent map block
is going to be negligible next to the other costs.

(Extent maps have their own problems, especially when it comes to
dealing with holes, but that's a different story...)

--Stephen



Re: Getting FS access events

2001-05-23 Thread Stephen C. Tweedie

Hi,

On Sat, May 19, 2001 at 12:47:15PM -0700, Linus Torvalds wrote:
> 
> On Sat, 19 May 2001, Pavel Machek wrote:
> > 
> > > Don't get _too_ hung up about the power-management kind of "invisible
> > > suspend/resume" sequence where you resume the whole kernel state.
> > 
> > Ugh. Now I'm confused. How do you do useful resume from disk when you
> > don't restore complete state? Do you propose something like "write
> > only pagecache to disk"?
> 
> Go back to the original _reason_ for this whole discussion. 
> 
> It's not really a "resume" event, it's a "populate caches really
> efficiently at boot" event.

Then you'd better be sure that the cache (or at least, the saved
image) only contains data which is guaranteed not to be written
between successive restores from the same image.  The big advantage of
just resuming from the state of the previous shutdown (whether it's
cache or the whole kernel state) is that you've got a much higher
expectation that nothing on disk got modified between the save and the
restore.

--Stephen



Re: Getting FS access events

2001-05-20 Thread Alan Cox

> I'm confused. I've always wondered that before you suspend the state
> of a machine to disk, why we just don't throw away unnecessary data
> like anything not actively referenced.

swsusp does exactly that.



Re: Getting FS access events

2001-05-19 Thread Linus Torvalds


On Sat, 19 May 2001, Pavel Machek wrote:
> 
> > Don't get _too_ hung up about the power-management kind of "invisible
> > suspend/resume" sequence where you resume the whole kernel state.
> 
> Ugh. Now I'm confused. How do you do useful resume from disk when you
> don't restore complete state? Do you propose something like "write
> only pagecache to disk"?

Go back to the original _reason_ for this whole discussion. 

It's not really a "resume" event, it's a "populate caches really
efficiently at boot" event. But the two are basically the same problem,
it's only a matter of how much you populate (do you populate _everything_
or do you populate just disk caches. Populating just the caches is the
smaller and simpler problem, that only solves the "fast boot" issue).

Linus




Re: Getting FS access events

2001-05-19 Thread Pavel Machek

Hi!

> > resume from disk is actually pretty hard to do in a way it is read linearly.
> > 
> > While playing with swsusp patches (== suspend to disk) I found out that
> > it was slow. It needs to do atomic snapshot, and only reasonable way to
> > do that is free half of RAM, cli() and copy.
> 
> Note that "resume from disk" does _not_ have to necessarily resume kernel
> data structures. It is enough if it just resumes the caches etc. 

> Don't get _too_ hung up about the power-management kind of "invisible
> suspend/resume" sequence where you resume the whole kernel state.

Ugh. Now I'm confused. How do you do usefull resume from disk when you
don't restore complete state? Do you propose something like "write
only pagecache to disk"?
Pavel
-- 
The best software in life is free (not shareware)!  Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+



Re: Getting FS access events

2001-05-19 Thread Linus Torvalds


On Tue, 15 May 2001, Pavel Machek wrote:
> 
> resume from disk is actually pretty hard to do in a way it is read linearly.
> 
> While playing with swsusp patches (== suspend to disk) I found out that
> it was slow. It needs to do atomic snapshot, and only reasonable way to
> do that is free half of RAM, cli() and copy.

Note that "resume from disk" does _not_ have to necessarily resume kernel
data structures. It is enough if it just resumes the caches etc. 

Don't get _too_ hung up about the power-management kind of "invisible
suspend/resume" sequence where you resume the whole kernel state.

Linus




Re: Getting FS access events

2001-05-18 Thread Rogier Wolff

Linus Torvalds wrote:
> I'm really serious about doing "resume from disk". If you want a fast
> boot, I will bet you a dollar that you cannot do it faster than by loading
> a contiguous image of several megabytes contiguously into memory. There is
> NO overhead, you're pretty much guaranteed platter speeds, and there are
> no issues about trying to order accesses etc. There are also no issues
> about messing up any run-time data structures.

Linus, 

The "boot quickly" was an example. "Load netscape quickly" on some
systems is done by dd-ing the binary to /dev/null. 

Now, you're going to say again that this won't work because of
buffer-cache/page-cache incoherency.  That is NOT the point. The point
is that the fun about a cache is that it's just a cache. It speeds
things up transparently. 

If I need a new "prime-the-cache" program to mmap the files, and
trigger a page-in in the right order, then that's fine with me.

The fun about doing these tricks is that it works, and keeps on
working (functionally) even if it stops working (fast).

Yes, there is a way to boot even faster: preloading memory. Fine. But
this doesn't allow me to load netscape quicker.

Roger. 

-- 
** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots. 
* There are also old, bald pilots. 



Re: Getting FS access events

2001-05-17 Thread Pavel Machek

Hi!

> Besides, just how often do you reboot the box? If that's the hotspot for
> you - when the hell does the poor beast find time to do something useful?

Ten times a day?

But booting is a special case: You can read your mail while compiling a kernel, 
but try to read your mail while your machine is booting.

What's worse, boot time tends to be time critical, as in "I need to find that 
mail that tells me where I'm expected to be half an hour from now. Ouch. It's 
going to take 40 minutes to get there."
Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.




Re: Getting FS access events

2001-05-17 Thread Pavel Machek

Hi!

> And because your suspend/resume idea isn't really going to help me
> much. That's because my boot scripts have the notion of
> "personalities" (change the boot configuration by asking the user
> early on in the boot process). If I suspend after I've got XDM
> running, it's too late.

Why not e2defrag so that everything needed for bootup is linear on the
start of disk? Use strace to collect statistics of what happens during 
bootup. [strace should be good enough. If not, uml is.]
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.




Re: Getting FS access events

2001-05-17 Thread Pavel Machek

Hi!

> I'm really serious about doing "resume from disk". If you want a fast
> boot, I will bet you a dollar that you cannot do it faster than by loading
> a contiguous image of several megabytes contiguously into memory. There is
> NO overhead, you're pretty much guaranteed platter speeds, and there are
> no issues about trying to order accesses etc. There are also no issues
> about messing up any run-time data structures.

resume from disk is actually pretty hard to do in a way it is read linearly.

While playing with swsusp patches (== suspend to disk) I found out that
it was slow. It needs to do atomic snapshot, and only reasonable way to
do that is free half of RAM, cli() and copy.

-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.




Re: Getting FS access events

2001-05-16 Thread H. Peter Anvin

Anton Altaparmakov wrote:
> 
> True, but I was under the impression that Linus' master plan was that the
> two would be in entirely separate name spaces using separate cached copies
> of the device blocks.
> 

Nothing was said about the superblock at all.

-hpa

-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-16 Thread Anton Altaparmakov

At 02:30 16/05/2001, H. Peter Anvin wrote:
>Anton Altaparmakov wrote:
> > And how are you thinking of this working "without introducing new
> > interfaces" if the caches are indeed incoherent? Please correct me if I
> > understand wrong, but when two caches are incoherent, I thought it means
> > that the above _would_ screw up unless protected by exclusive write locking
> > as I suggested in my previous post with the side effect that you can't
> > write the boot block without unmounting the filesystem or modifying some
> > interface somewhere.
>
>Not if direct device acess and the superblock exist in the same mapping
>space, OR an explicit interface to write the boot block is created.

True, but I was under the impression that Linus' master plan was that the 
two would be in entirely separate name spaces using separate cached copies 
of the device blocks. Putting them into the same cache would make things 
work of course, although direct access would probably give you a view of an 
inconsistent file system if the fs was writing around the page cache at the 
time (unless the fs and direct accesses lock every page on write access, 
perhaps by zeroing the uptodate flag on the page).

An explicit interface for the boot block would be interesting. AFAICS it 
would have to call down into the file system driver itself (a 
read/write_boot_block method in super_operations perhaps?) due to the 
differences in how the boot block is stored on different filesystems 
(thinking of the "boot block is a file" NTFS case).

Best regards,

 Anton


-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://sourceforge.net/projects/linux-ntfs/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/




Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

Anton Altaparmakov wrote:
> 
> And how are you thinking of this working "without introducing new
> interfaces" if the caches are indeed incoherent? Please correct me if I
> understand wrong, but when two caches are incoherent, I thought it means
> that the above _would_ screw up unless protected by exclusive write locking
> as I suggested in my previous post with the side effect that you can't
> write the boot block without unmounting the filesystem or modifying some
> interface somewhere.
> 

Not if direct device access and the superblock exist in the same mapping
space, OR an explicit interface to write the boot block is created.

-hpa

-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread Anton Altaparmakov

At 23:35 15/05/2001, H. Peter Anvin wrote:
>"Albert D. Cahalan" wrote:
> > H. Peter Anvin writes:
> > > This would leave no way (without introducing new interfaces) to write,
> > > for example, the boot block on an ext2 filesystem.  Note that the
> > > bootblock (defined as the first 1024 bytes) is not actually used by
> > > the filesystem, although depending on the block size it may share a
> > > block with the superblock (if blocksize > 1024).
> >
> > The lack of coherency would screw this up anyway, doesn't it?
> > You have a block device, soon to be in the page cache, and
> > a superblock, also soon to be in the page cache. LILO writes to
> > the block device, while the ext2 driver updates the superblock.
> > Whatever gets written out last wins, and the other is lost.
>
>Albert, I *did* say "this better work or we have a problem."

And how are you thinking of this working "without introducing new 
interfaces" if the caches are indeed incoherent? Please correct me if I 
understand wrong, but when two caches are incoherent, I thought it means 
that the above _would_ screw up unless protected by exclusive write locking 
as I suggested in my previous post with the side effect that you can't 
write the boot block without unmounting the filesystem or modifying some 
interface somewhere.

As not all filesystems are like ext2, perhaps it would be better to fix 
ext2 and not the cache coherency? If ext2 is claiming ownership of a 
device, then it should do so in its entirety IMHO. You could always extend 
ext2 to use the NTFS approach where the bootsector is nothing more than a 
file which happens to exist on sector(s) zero (and following) of the 
device... (just a thought)

Best regards,

Anton


-- 
Anton Altaparmakov  (replace at with @)
Linux NTFS Maintainer / WWW: http://sourceforge.net/projects/linux-ntfs/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/




Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

"Albert D. Cahalan" wrote:
> 
> H. Peter Anvin writes:
> 
> > This would leave no way (without introducing new interfaces) to write,
> > for example, the boot block on an ext2 filesystem.  Note that the
> > bootblock (defined as the first 1024 bytes) is not actually used by
> > the filesystem, although depending on the block size it may share a
> > block with the superblock (if blocksize > 1024).
> 
> The lack of coherency would screw this up anyway, doesn't it?
> You have a block device, soon to be in the page cache, and
> a superblock, also soon to be in the page cache. LILO writes to
> the block device, while the ext2 driver updates the superblock.
> Whatever gets written out last wins, and the other is lost.
> 

Albert, I *did* say "this better work or we have a problem."

-hpa

-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread Albert D. Cahalan

H. Peter Anvin writes:

> This would leave no way (without introducing new interfaces) to write,
> for example, the boot block on an ext2 filesystem.  Note that the
> bootblock (defined as the first 1024 bytes) is not actually used by
> the filesystem, although depending on the block size it may share a
> block with the superblock (if blocksize > 1024).

The lack of coherency would screw this up anyway, doesn't it?
You have a block device, soon to be in the page cache, and
a superblock, also soon to be in the page cache. LILO writes to
the block device, while the ext2 driver updates the superblock.
Whatever gets written out last wins, and the other is lost.




Re: Getting FS access events

2001-05-15 Thread Jan Harkes

On Tue, May 15, 2001 at 02:02:29PM -0700, Linus Torvalds wrote:
> In article <[EMAIL PROTECTED]>,
> Alexander Viro  <[EMAIL PROTECTED]> wrote:
> >On Tue, 15 May 2001, H. Peter Anvin wrote:
> >
> >> Alexander Viro wrote:
> >> > >
> >> > > None whatsoever.  The one thing that matters is that no one starts making
> >> > > the assumption that mapping->host->i_mapping == mapping.

Don't worry too much about that, that relationship has been false for
Coda ever since i_mapping was introduced.

The only problem that is still lingering is related to i_size. Writes
update inode->i_mapping->host->i_size, and stat reads inode->i_size,
which are not the same.

I sent a small patch to stat.c for this a long time ago (Linux
2.3.99-pre6-7), which made the assumption in stat that i_mapping->host
was an inode. (effectively tmp.st_size = inode->i_mapping->host->i_size)

Other solutions were to finish the getattr implementation, or keep a
small Coda-specific wrapper for generic_file_write around.

> >> > One actually shouldn't assume that mapping->host is an inode.
> >> 
> >> What else could it be, since it's a "struct inode *"?  NULL?
> >
> >struct block_device *, for one thing. We'll have to do that as soon
> >as we do block devices in pagecache.
> 
> No, Al. It's an inode. It was a major mistake to ever think anything
> else.

So is anyone interested in a small patch for stat.c? It fixes, as far as
I know, the last place that 'assumes' that inode->i_mapping->host is
identical to the inode itself.

Jan




Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

Alexander Viro wrote:
> 
> void *.
> 
Look, methods of your address_space certainly know what the hell they
> are dealing with. Just as autofs_root_readdir() knows what inode->u.generic_ip
> really points to.
> 
> Anybody else has no business to care about the contents of ->host.
> 

Why do we need a ->host at all, then?  Why not simply make it a private
pointer?

-hpa

-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Alexander Viro  <[EMAIL PROTECTED]> wrote:
>> 
>> How would you know what datatype it is?  A union?  Making "struct
>> block_device *" a "struct inode *" in a nonmounted filesystem?  In a
>> devfs?  (Seriously.  Being able to do these kinds of data-structural
>> equivalence is IMO the nice thing about devfs & co...)
>
>void *.

No. It used to be that way, and it was a horrible mess.

We _need_ to know that it's an inode, because the generic mapping
functions basically need to do things like

mark_inode_dirty_pages(mapping->host);

which in turn needs the host to be an inode (otherwise you don't know
how and where to write the dang things back again).

There's no question that you can avoid it being an inode by virtualizing
more of it, and adding more virtual functions to the mapping operations
(right now the only one you'd HAVE to add is the "mark_page_dirty()"
operation), but the fact is that code gets really ugly by doing things
like that.

It was an absolute pleasure to remove all the casts of "mapping->host".
With "void *" it needed to be cast to the right type (and you had to be
able to _prove_ that you knew what the right type was). With "inode *",
the type is statically known, and you don't actually lose anything (at
worst, you'd have a virtual inode and then do an extra layer of
indirection there).

I really don't think we want to go back to "void *". 

Linus



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On Tue, 15 May 2001, Alexander Viro wrote:

> On 15 May 2001, Kai Henningsen wrote:
> 
> > [EMAIL PROTECTED] (Alexander Viro)  wrote on 15.05.01 in 
><[EMAIL PROTECTED]>:
> > 
> > > ... and Multics had all access to files through equivalent of mmap()
> > > in 60s. "Segments" in ls(1) got that name for a good reason.
> > 
> > Where's something called "segments" connected with ls(1)? I can't seem to  
> > find the reference.
> 
> ls == list segments. Name came from Multics.

Basically, they had the whole address space consisting of mmapped files.
The address was (segment << 18) + offset (both up to 18 bits) and the
primitive was "attach segment (== file) to address space". Each segment
had its own page table, BTW. Directories were special segments and
contained references to other segments (both files and directories).
Root had a fixed ID. You could look up a segment by name.




Re: Getting FS access events

2001-05-15 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Alexander Viro  <[EMAIL PROTECTED]> wrote:
>On Tue, 15 May 2001, H. Peter Anvin wrote:
>
>> Alexander Viro wrote:
>> > >
> >> > > None whatsoever.  The one thing that matters is that no one starts making
>> > > the assumption that mapping->host->i_mapping == mapping.
>> > 
>> > One actually shouldn't assume that mapping->host is an inode.
>> > 
>> 
>> What else could it be, since it's a "struct inode *"?  NULL?
>
>struct block_device *, for one thing. We'll have to do that as soon
>as we do block devices in pagecache.

No, Al. It's an inode. It was a major mistake to ever think anything
else.

I see your problem, but it's not a real problem.  What you do for block
devices (or anything like that where you might have _multiple_ inodes
pointing to the same thing) is to just create a "virtual inode", and
have THAT be the one that the mapping is associated with.  Basically
each "struct block_device *" would have an inode associated with it, to
act as an anchor for things like this. 

What is "struct inode", after all? It's just the virtual representation
of a "entity". The inodes associated with /dev/hda are not the inodes
associated with the actual _device_ - they are just on-disk "links" to
the physical device. 

[ Aside: there are good arguments to _not_ embed "struct inode" into
  "struct block_device", but instead do it the other way around - the
  same way we have filesystem-specific inode data inside "struct inode"
  we can easily have device-type specific data there.  And it makes a
  whole lot more sense to attach a mount to an inode than it makes to
  attach a mount to a "struct block_device".

  Done right, we could eventually get rid of "loopback block devices".
  They'd just be inodes that aren't of type "struct block_device", and
  the index to "struct buffer_head" would not be , but . See? The added level of indirection
  is one that we actually already _use_, it's just that we have this
  loopback device special case for it..

  In a "perfect" setup you could actually do "mount -t ext2 file /mnt/x"
  without having _any_ loopback setup or anything like that, simply
  because you don't _need_ it. It would be automatic. ]

Linus



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On 15 May 2001, Kai Henningsen wrote:

> [EMAIL PROTECTED] (Alexander Viro)  wrote on 15.05.01 in 
><[EMAIL PROTECTED]>:
> 
> > ... and Multics had all access to files through equivalent of mmap()
> > in 60s. "Segments" in ls(1) got that name for a good reason.
> 
> Where's something called "segments" connected with ls(1)? I can't seem to  
> find the reference.

ls == list segments. Name came from Multics.




Re: Getting FS access events

2001-05-15 Thread Kai Henningsen

[EMAIL PROTECTED] (Alexander Viro)  wrote on 15.05.01 in 
<[EMAIL PROTECTED]>:

> ... and Multics had all access to files through equivalent of mmap()
> in 60s. "Segments" in ls(1) got that name for a good reason.

Where's something called "segments" connected with ls(1)? I can't seem to  
find the reference.


MfG Kai



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On Tue, 15 May 2001, H. Peter Anvin wrote:

> Alexander Viro wrote:
> > >
> > > What else could it be, since it's a "struct inode *"?  NULL?
> > 
> > struct block_device *, for one thing. We'll have to do that as soon
> > as we do block devices in pagecache.
> > 
> 
> How would you know what datatype it is?  A union?  Making "struct
> block_device *" a "struct inode *" in a nonmounted filesystem?  In a
> devfs?  (Seriously.  Being able to do these kinds of data-structural
> equivalence is IMO the nice thing about devfs & co...)

void *.

Look, methods of your address_space certainly know what the hell they
are dealing with. Just as autofs_root_readdir() knows what inode->u.generic_ip
really points to.

Anybody else has no business to care about the contents of ->host.




Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On Tue, 15 May 2001, H. Peter Anvin wrote:

> Alexander Viro wrote:
> > >
> > > None whatsoever.  The one thing that matters is that no one starts making
> > > the assumption that mapping->host->i_mapping == mapping.
> > 
> > One actually shouldn't assume that mapping->host is an inode.
> > 
> 
> What else could it be, since it's a "struct inode *"?  NULL?

struct block_device *, for one thing. We'll have to do that as soon
as we do block devices in pagecache.




Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

Alexander Viro wrote:
> >
> > What else could it be, since it's a "struct inode *"?  NULL?
> 
> struct block_device *, for one thing. We'll have to do that as soon
> as we do block devices in pagecache.
> 

How would you know what datatype it is?  A union?  Making "struct
block_device *" a "struct inode *" in a nonmounted filesystem?  In a
devfs?  (Seriously.  Being able to do these kinds of data-structural
equivalence is IMO the nice thing about devfs & co...)

-hpa

-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On Tue, 15 May 2001, H. Peter Anvin wrote:

> Alexander Viro wrote:
> > 
> > On 15 May 2001, H. Peter Anvin wrote:
> > 
> > > isofs wouldn't be too bad as long as struct mapping:struct inode is a
> > > many-to-one mapping.
> > 
> > Erm... What's wrong with inode->u.isofs_i.my_very_own_address_space ?
> > 
> 
> None whatsoever.  The one thing that matters is that no one starts making
> the assumption that mapping->host->i_mapping == mapping.

One actually shouldn't assume that mapping->host is an inode.




Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

Alexander Viro wrote:
> >
> > None whatsoever.  The one thing that matters is that no one starts making
> > the assumption that mapping->host->i_mapping == mapping.
> 
> One actually shouldn't assume that mapping->host is an inode.
> 

What else could it be, since it's a "struct inode *"?  NULL?

-hpa

-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

Alexander Viro wrote:
> 
> On 15 May 2001, H. Peter Anvin wrote:
> 
> > isofs wouldn't be too bad as long as struct mapping:struct inode is a
> > many-to-one mapping.
> 
> Erm... What's wrong with inode->u.isofs_i.my_very_own_address_space ?
> 

None whatsoever.  The one thing that matters is that no one starts making
the assumption that mapping->host->i_mapping == mapping.

-hpa

-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On 15 May 2001, H. Peter Anvin wrote:

> isofs wouldn't be too bad as long as struct mapping:struct inode is a
> many-to-one mapping.

Erm... What's wrong with inode->u.isofs_i.my_very_own_address_space ?




Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

Followup to:  <[EMAIL PROTECTED]>
By author: Anton Altaparmakov <[EMAIL PROTECTED]>
In newsgroup: linux.dev.kernel
> 
> They shouldn't, but maybe some stupid utility or a typo will do it, creating 
> two incoherent copies of the same block on the device. -> Bad Things can 
> happen.
> 
> Can't we simply stop people from doing it by, say, having mount lock the 
> device from further opens (and vice versa, of course: doing a "dd" should 
> result in a lock on the device, preventing a mount for the duration of the 
> "dd")? Wouldn't this be a good thing, guaranteeing that problems cannot 
> happen while not incurring any overhead except on device open/close? Or is 
> this a matter of "give the user enough rope"? If proper rw locking is 
> implemented, it could allow a simultaneous -o ro mount with a dd from the 
> device but do exclusive write locking, for example, for maximum flexibility.
> 

This would leave no way (without introducing new interfaces) to write,
for example, the boot block on an ext2 filesystem.  Note that the
bootblock (defined as the first 1024 bytes) is not actually used by
the filesystem, although depending on the block size it may share a
block with the superblock (if blocksize > 1024).

-hpa

-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

Followup to:  <[EMAIL PROTECTED]>
By author: Alexander Viro <[EMAIL PROTECTED]>
In newsgroup: linux.dev.kernel
> 
> UNIX-like ones (and that includes QNX) are easy. HFS is hopeless - it won't
> be fixed unless authors will do it. Tigran will probably fix BFS just as a
> learning experience ;-) ADFS looks tolerably easy to fix. AFFS... directories
> will be pure hell - blocks jump from directory to directory at zero notice.
> NTFS and HPFS will win from switch (esp. NTFS). FAT is not a problem, if we
> are willing to break CVF and let author fix it. Reiserfs... Dunno. They've
> got a private (slightly mutated) copy of ~60% of fs/buffer.c. UDF should be
> OK. ISOFS... ask Peter. JFFS - dunno.
> 

isofs wouldn't be too bad as long as struct mapping:struct inode is a
many-to-one mapping.

-hpa
-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread Craig Milo Rogers

>And because your suspend/resume idea isn't really going to help me
>much. That's because my boot scripts have the notion of
>"personalities" (change the boot configuration by asking the user
>early on in the boot process). If I suspend after I've got XDM
>running, it's too late.

Preface: As has been mentioned on this discussion thread, some
disk devices maintain a cache of their own, running on a small (by
today's standards) CPU.  These caches are probably sector oriented,
not block oriented, but are almost certainly not page oriented or
filesystem oriented.  Well, OK, some might have DOS filesystem
knowledge built-in, I suppose... yuck!

Anyway, although there may be slight differences, they are
effectively block-oriented caches.  As long as they are write-through
(and/or there are cache flushing commands, etc.), they are reasonably
coherent with the operating system's main cache, and they meet the
expectations of database programs, etc., that want stable storage.

In terms of efficiency, there are questions about read-ahead,
write-behind, write-through with invalidation or write-through with
cache update -- the usual stuff.  I leave it as an exercise for the
reader to decide how to best tune their system, and merely assert that
it can be done.

Imagine, as a mental exercise, that you move this
block-oriented cache out of the disk drive, and into the main CPU and
operating system, say roughly at the disk driver level.  We lose the
efficiency of having the small CPU do the block lookups, but a hashed
block lookup is rather cheap nowadays, wouldn't you say?  Ignoring
issues of, "What if the disk drive fails independently of the main
CPU, or vice versa?", the transplanted block cache should operate
pretty much as it did in the disk drive.

In particular, it should continue to operate properly with the
main CPU's main page cache.

Conclusion: a page cache can successfully run over an
appropriately designed block cache.  QED.

What's the hitch?  It's the "appropriately designed"
constraint.  It is quite possible that the Linux block cache is not
designed (data structures and code paths considered together) in a way
that allows it to mimic a simple disk drive's block cache.  I assume
that there's some impediment, or this discussion wouldn't have lasted
so long -- the idea of using the Linux block cache to model a disk
drive's block cache is pretty obvious, after all.

>So what I want is a solution that will keep the kernel clean (believe
>me, I really do want to keep it clean), but gives me a fast boot too.
>And I believe the solution is out there. We just haven't found it yet.

Well, if you want a fast boot *on a single type of disk
drive*, and the existing Linux block cache doesn't work, you could
extend the driver for that hardware with an optional block cache,
independently of Linux' block cache, along with an appropriate
interface to populate it with boot-time blocks, and to flush it when
no longer needed.  That's not exactly clean, though, is it?

You could extend the md (or LVM) drivers, or create a new
driver similar to one of them, that incorporates a simple block cache,
with appropriate mechanisms for populating and flushing it.  Clean?
er, no, rather muddy, in fact.

You might want to lock down the pages that you've
prepopulated, rather than let them be discarded before they're needed.
This could be designed into a new block cache, but you might need to
play some accounting games to get it right with the existing block
cache.

Finally, there's Linus' offer for a preread call, to
prepopulate the page cache.  By virtue of your knowledge of the
underlying implementation of the system, you could preload the file
system index pages into the block cache, and load the data pages into
the page cache.  Clean!  Sewer-like!

Craig Milo Rogers




Re: Getting FS access events

2001-05-15 Thread Chris Mason



On Tuesday, May 15, 2001 04:33:57 AM -0400 Alexander Viro
<[EMAIL PROTECTED]> wrote:

> 
> 
> On Tue, 15 May 2001, Linus Torvalds wrote:
> 
>> Looks like there are 19 filesystems that use the buffer cache right now:
>> 
>>  grep -l bread fs/*/*.c | cut -d/ -f2 | sort -u | wc
>> 
>> So quite a bit of work involved.
> 
> Reiserfs... Dunno. They've got a private (slightly mutated) copy of
> ~60% of fs/buffer.c. 

But, putting the log and the metadata in the page cache makes memory
pressure and such cleaner, so this is one of my goals for 2.5.  reiserfs
will still have alias issues due to the packed tails (one copy in the
btree, another in the page), but it will be no worse than it is now.

-chris




Re: Getting FS access events

2001-05-15 Thread Daniel Phillips

On Tuesday 15 May 2001 12:44, Alexander Viro wrote:
> On Tue, 15 May 2001, Daniel Phillips wrote:
> > That's because you left out his invalidate:
> >
> > * create an instance in pagecache
> > * start reading into buffer cache (doesn't invalidate, right?)
> > * start writing using pagecache (invalidate buffer copy)
>
> Bzzert. You have a race here. Let's make it explicit:
>
> start writing
> put write request in queue
> block on that
>   start reading into buffer cache
>   put read request into queue
>   read from media
> write to media
>
> And no, we can't invalidate from IO completion hook.
>
> > * lose the page
> > * try to read it (via pagecache)
> >
> > Everthing ok.
>
> Nope.

The problem is that we have two IO operations on the same physical 
block in the queue at the same time, and we don't know it.  Maybe we 
should know it.

For your specific example we are ok if we do:

 * create an instance in pagecache
 * start reading into buffer cache (doesn't invalidate, right?)
 * start writing using pagecache (invalidate buffer copy)
 * lose the page (invalidate buffer copy)
 * try to read it (via pagecache)

We are also ok if we follow my suggested optimization and move the page 
to the buffer cache instead of just losing it.

We are not ok if we do:

 * try to read it (via buffercache)

because its copy is out of date, but this can be fixed by enforcing 
coherency in the request queue. 

1) Why should the request queue not be coherent?

2) Can we stop talking about buffer cache here and start talking about 
blocks mapped into a separate address space in the page cache?  From 
Linus's previous comments in this thread we are going to have that 
anyway, and your race also applies there.

I'd like to call that separate address space a 'block cache'.

--
Daniel



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On Tue, 15 May 2001, Daniel Phillips wrote:

> That's because you left out his invalidate:
> 
>   * create an instance in pagecache
>   * start reading into buffer cache (doesn't invalidate, right?)
>   * start writing using pagecache (invalidate buffer copy)

Bzzert. You have a race here. Let's make it explicit:

start writing
put write request in queue
block on that
start reading into buffer cache
put read request into queue
read from media
write to media

And no, we can't invalidate from IO completion hook.

>   * lose the page
>   * try to read it (via pagecache)
> 
> Everything ok.

Nope.




Re: Getting FS access events

2001-05-15 Thread Daniel Phillips

On Tuesday 15 May 2001 08:57, Alexander Viro wrote:
> On Tue, 15 May 2001, Richard Gooch wrote:
> > > What happens if you create a buffer cache entry? Does that
> > > invalidate the page cache one? Or do you just allow invalidates
> > > one way, and not the other? And why=
> >
> > I just figured on one way invalidates, because that seems cheap and
> > easy and has some benefits. Invalidating the other way is costly,
> > so don't bother, even if there were some benefits.
>
> Cute.
>   * create an instance in pagecache
>   * start reading into buffer cache (doesn't invalidate, right?)
>   * start writing using pagecache
>   * lose the page
>   * try to read it (via pagecache)
> Woops - just found a copy in buffer cache, let's pick data from it.
> Pity that said data is obsolete...

That's because you left out his invalidate:

* create an instance in pagecache
* start reading into buffer cache (doesn't invalidate, right?)
* start writing using pagecache (invalidate buffer copy)
* lose the page
* try to read it (via pagecache)

Everything ok.  As an optimization, instead of 'lose the page', do 'move 
page blocks to buffer cache'.

--
Daniel



Re: Getting FS access events

2001-05-15 Thread David Woodhouse


[EMAIL PROTECTED] said:
> JFFS - dunno.

Bah. JFFS doesn't use any of those horrible block device thingies.

--
dwmw2





Re: Getting FS access events

2001-05-15 Thread Anton Altaparmakov

At 08:13 15/05/01, Linus Torvalds wrote:
>On Tue, 15 May 2001, Richard Gooch wrote:
> > So what happens if I dd from the block device and also from a file on
> > the mounted FS, where that file overlaps the bnums I dd'ed? Do we get
> > two copies in the page cache? One for the block device access, and one
> > for the file access?
>
>Yup. And never the two shall meet.
>
>Why should they? Why would you ever do something like that, or care about
>the fact?

They shouldn't, but maybe some stupid utility or a typo will do it creating 
two incoherent copies of the same block on the device. -> Bad Things can 
happen.

Can't we simply stop people from doing it by say having mount lock the 
device from further opens (and vice versa of course, doing a "dd" should 
result in lock of device preventing a mount during the duration of "dd"). - 
Wouldn't this be a good thing, guaranteeing that problems cannot happen 
while not incurring any overhead except on device open/close? Or is this a 
matter of "give the user enough rope"? - If proper rw locking is 
implemented it could allow simultaneous -o ro mount with a dd from the 
device but do exclusive write locking, for example, for maximum flexibility.
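The open-time rw locking suggested above can be mimicked with a toy gate: mounts and read-only opens share the device, while any writer open is exclusive. This is purely illustrative (class and method names are made up, not kernel interfaces):

```python
# Toy reader/writer gate for a block device: ro opens share, rw opens
# are exclusive. Illustrative only - not how the kernel gates bdev opens.
import threading

class BlockDevGate:
    def __init__(self):
        self._lock = threading.Lock()
        self._readers = 0          # ro mounts + ro opens (dd, etc.)
        self._writer = False

    def open(self, write=False):
        with self._lock:
            if write:
                if self._writer or self._readers:
                    raise OSError("EBUSY: device in use")
                self._writer = True
            else:
                if self._writer:
                    raise OSError("EBUSY: device open for writing")
                self._readers += 1

    def close(self, write=False):
        with self._lock:
            if write:
                self._writer = False
            else:
                self._readers -= 1

gate = BlockDevGate()
gate.open()                  # -o ro mount
gate.open()                  # simultaneous dd from the device: fine
try:
    gate.open(write=True)    # e.g. mkfs while mounted
except OSError as e:
    print(e)                 # -> EBUSY: device in use
```

The overhead is confined to open/close, as the mail argues; readers never contend with each other.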

Just my 2p.

Anton


-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://sourceforge.net/projects/linux-ntfs/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-15 Thread Lars Brinkhoff

Alan Cox <[EMAIL PROTECTED]> writes:
> > Larry, go read up on TOPS-20. :-) SunOS did give unix mmap(), but it
> > did not come up the idea.
> Seems to be TOPS-10 
> http://www.opost.com/dlm/tenex/fjcc72/ 

TENEX is not TOPS-10.  TOPS-10 didn't get virtual memory until around
1974.  By then, TENEX had been around for years.

TOPS-20 was developed from TENEX starting around 1973.

-- 
http://lars.nocrew.org/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On Tue, 15 May 2001, Linus Torvalds wrote:

> Looks like there are 19 filesystems that use the buffer cache right now:
> 
>   grep -l bread fs/*/*.c | cut -d/ -f2 | sort -u | wc
> 
> So quite a bit of work involved.

UNIX-like ones (and that includes QNX) are easy. HFS is hopeless - it won't
be fixed unless authors will do it. Tigran will probably fix BFS just as a
learning experience ;-) ADFS looks tolerably easy to fix. AFFS... directories
will be pure hell - blocks jump from directory to directory at zero notice.
NTFS and HPFS will win from switch (esp. NTFS). FAT is not a problem, if we
are willing to break CVF and let author fix it. Reiserfs... Dunno. They've
got a private (slightly mutated) copy of ~60% of fs/buffer.c. UDF should be
OK. ISOFS... ask Peter. JFFS - dunno.

So probably we'll have to keep the buffer cache (AFFS looks like a real
killer), but we will be able to do pagecache-only versions of a_ops methods.
If fs has no metadata in buffer cache we can drop unmap_underlying_metadata()
for it.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-15 Thread Linus Torvalds



On Tue, 15 May 2001, Chris Wedgwood wrote:
>
> On Tue, May 15, 2001 at 12:13:13AM -0700, Linus Torvalds wrote:
>
> We should not create crap code just because we _can_.
>
> How about removing code?

Absolutely. It's not all that often that we can do it, but when we can,
it's the best thing in the world.

> In 2.5.x is we move fs metadata into the pagecache, do we even need a
> buffer cache anymore? Can't we just ditch it completely and make all
> device access raw?

Yes and no.

Yes, it would be nice.

But no, I doubt we'll move _all_ metadata into the page-cache. I doubt,
for example, that we'll find people re-doing all the other filesystems. So
even if ext2 was page-cache only, what about all the 35 other filesystems
out there in the standard sources, never mind others that haven't been
integrated (XFS, ext3 etc..).

Yeah, I know. Some of them already do not use the buffer cache at all (the
network filesystems come to mind ;), but even so..

Looks like there are 19 filesystems that use the buffer cache right now:

grep -l bread fs/*/*.c | cut -d/ -f2 | sort -u | wc

So quite a bit of work involved.

But on the whole I'm definitely hoping that yes, we'll relegate the
"buffer_head" to be mainly just for IO, and not be a first-class caching
entity at all. It's just that I think it will take a _long_ time until
we actually reach that noble goal completely.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-15 Thread Linus Torvalds


On Tue, 15 May 2001, Richard Gooch wrote:
> > 
> > What happens if you create a buffer cache entry? Does that
> > invalidate the page cache one? Or do you just allow invalidates one
> > way, and not the other? And why?
> 
> I just figured on one way invalidates, because that seems cheap and
> easy and has some benefits. Invalidating the other way is costly, so
> don't bother, even if there were some benefits.

Ahh..

Well, excuse me while I puke all over your shoes.

Why don't you go hack the NT kernel, or something like that? I have some
taste, and part of that is having this silly notion of "Things should make
sense".

We should not create crap code just because we _can_. Sure, it's easy to
write the code you suggest. Do you really want a system like that? A
system where you have rules that make no sense, except "it was easy to
invalidate one way, so let's do that, and never mind that it makes no
logical sense at all?".

> > Ehh.. And then you'll be unhappy _again_, when we early in 2.5.x
> > start using the page cache for block device accesses. Which we
> > _have_ to do if we want to be able to mmap block devices. Which we
> > _do_ want to do (hint: DVD's etc).
> 
> So what happens if I dd from the block device and also from a file on
> the mounted FS, where that file overlaps the bnums I dd'ed? Do we get
> two copies in the page cache? One for the block device access, and one
> for the file access?

Yup. And never the two shall meet.

Why should they? Why would you ever do something like that, or care about
the fact? Why would you design a system around a perversity, slowing down
(and uglifying) the sane and common case?

> And because your suspend/resume idea isn't really going to help me
> much. That's because my boot scripts have the notion of
> "personalities" (change the boot configuration by asking the user
> early on in the boot process). If I suspend after I've got XDM
> running, it's too late.

Note that I never said "suspend". I said _resume_. You would create the
resume-image once, and you'd create it not at shutdown time, but at the
point you want to resume from.

You don't want to ever suspend the dang thing - just shut it down, and
reboot it quickly by resuming from the snapshot. So you just create a
simple resume snapshot. Which is easy to do, with the exact same tools
that you've been talking about all the time.

What you do is:
 - trace what pages get loaded off the disk
 - create a snapshot of the contents of those pages
 - archive it all up (may I suggest compressing it at the same time?)
 - the "resume" function is just a "uncompress and populate the virtual
   caches with the contents" action.
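The four steps above can be sketched as a toy snapshot/restore pair: record which blocks a traced boot touched, archive their contents compressed, and later prime a cold cache straight from the archive. Everything here is illustrative (fake block device, made-up helper names):

```python
# Minimal sketch of the "resume image" idea: snapshot the traced blocks,
# compress the snapshot, and later populate a cold cache from it.
import pickle, zlib

def make_resume_image(disk, traced_blocks):
    """Snapshot the contents of the traced blocks and compress them."""
    snapshot = {blk: disk[blk] for blk in traced_blocks}
    return zlib.compress(pickle.dumps(snapshot))

def prime_cache(cache, image):
    """'Resume': uncompress and populate the cache. The bytes need not
    come from the real filesystem - they only have to match it."""
    cache.update(pickle.loads(zlib.decompress(image)))

disk = {blk: bytes([blk]) * 512 for blk in range(64)}   # fake block device
image = make_resume_image(disk, traced_blocks=[3, 7, 42])

cache = {}                   # cold cache at "boot"
prime_cache(cache, image)
print(sorted(cache))         # -> [3, 7, 42]: those boot reads now hit
```

Because the image is a separate blob, it can be laid out (and read back) sequentially regardless of where the blocks live on the real filesystem, which is the point made at the end of the mail.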

Note that the "uncompress and populate" doesn't actually have to use the
_real_ disk contents of the file. A byte is a byte is a byte, and it
doesn't actually need to come from the actual filesystem the system
_thinks_ it comes from. Once it is loaded into memory, it's just a
value. You've "primed" your caches, so when you actually run the bootup
scripts, you'll have some random hit-rate (say, 98%), and improve the
bootup immensely that way.

Another way of saying this: Imagine that you "tar" up and compress the
files you need for booting. You then uncompress and untar the archive, but
instead of untar'ing onto a filesystem, you _just_ populate the caches. 

This is how some CPU's bootstrap themselves: they fill their icache from a
serial rom (at least some alpha chips did this). Never mind that they
didn't actually get that initial state from the _real_ backing store (RAM,
or in the hypothetical "resume" case, the filesystem off disk). There's no
way to tell, if your cached copies have the same data as the data on
disk. Never mind that the data _got_ there a strange way.

(And yes, your "cache priming" had better prime the cache with the same
stuff that _is_ on the real filesystem, otherwise you'd obviously get
strange behaviour with the caches not actually matching what the
filesystem contents are. But that's simple to do, and it's easy enough to
boot up in safe mode without a cache priming stage).

One of the advantages of "resuming" (or "priming the cache", or whatever
you want to call it) is that you're free to lay out the resume/cache image
any way you want on disk, as it has nothing to do with the actual
filesystem - except for the fact of sharing some of the same data. Which
means that you can really read it in efficiently.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On Tue, 15 May 2001, Richard Gooch wrote:

> > What happens if you create a buffer cache entry? Does that
> > invalidate the page cache one? Or do you just allow invalidates one
> > way, and not the other? And why=
> 
> I just figured on one way invalidates, because that seems cheap and
> easy and has some benefits. Invalidating the other way is costly, so
> don't bother, even if there were some benefits.

Cute.
* create an instance in pagecache
* start reading into buffer cache (doesn't invalidate, right?)
* start writing using pagecache
* lose the page
* try to read it (via pagecache)
Woops - just found a copy in buffer cache, let's pick data from it.
Pity that said data is obsolete...
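The failure mode above can be mimicked with a toy model of the one-way-invalidation scheme: two independent caches index the same block, and page-cache writes deliberately leave the buffer-cache copy alone. Names are illustrative, not kernel APIs:

```python
# Toy model of one-way invalidation: a page-cache write does NOT
# invalidate the buffer-cache copy, so a later miss can resurrect
# obsolete data.
disk = {7: b"old"}          # block number -> contents
page_cache = {}
buffer_cache = {}

def read_via_pagecache(blk):
    if blk not in page_cache:
        # Miss: reuse a buffer-cache copy if one exists (possibly stale).
        page_cache[blk] = buffer_cache.get(blk, disk[blk])
    return page_cache[blk]

def read_via_buffercache(blk):
    buffer_cache.setdefault(blk, disk[blk])
    return buffer_cache[blk]

def write_via_pagecache(blk, data):
    page_cache[blk] = data
    disk[blk] = data            # eventually written back
    # One-way scheme: buffer_cache[blk] is deliberately left alone.

read_via_pagecache(7)            # create an instance in pagecache
read_via_buffercache(7)          # buffer-cache copy appears, no invalidate
write_via_pagecache(7, b"new")   # write goes through pagecache only
del page_cache[7]                # "lose the page" under memory pressure
print(read_via_pagecache(7))     # -> b'old': obsolete buffer-cache data
```

The disk holds the new data, yet the re-read returns the shadowing stale copy - exactly the sequence sketched in the mail.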

> So what happens if I dd from the block device and also from a file on
> the mounted FS, where that file overlaps the bnums I dd'ed? Do we get
> two copies in the page cache? One for the block device access, and one
> for the file access?

Yes.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-15 Thread Richard Gooch

Linus Torvalds writes:
> 
> On Tue, 15 May 2001, Richard Gooch wrote:
> > 
> > However, what about simply invalidating an entry in the buffer cache
> > when you do a write from the page cache?
> 
> And how do you do the invalidate the other way, pray tell?
> 
> What happens if you create a buffer cache entry? Does that
> invalidate the page cache one? Or do you just allow invalidates one
> way, and not the other? And why?

I just figured on one way invalidates, because that seems cheap and
easy and has some benefits. Invalidating the other way is costly, so
don't bother, even if there were some benefits.

> > Actually, I'd kind of like it if the page cache steals from the buffer
> > cache on read. The buffer cache is mostly populated by fsck. Once I've
> > done the fsck, those buffers are useless to me. They might be useful
> > again if they are steal-able by the page cache.
> 
> Ehh.. And then you'll be unhappy _again_, when we early in 2.5.x
> start using the page cache for block device accesses. Which we
> _have_ to do if we want to be able to mmap block devices. Which we
> _do_ want to do (hint: DVD's etc).

So what happens if I dd from the block device and also from a file on
the mounted FS, where that file overlaps the bnums I dd'ed? Do we get
two copies in the page cache? One for the block device access, and one
for the file access?

> Face it. What you ask for is stupid and fundamentally unworkable. 
> 
> Tell me WHY you are completely ignoring my arguments, when I (a)
> tell you why your way is bad and stupid (and when you ignore the
> arguments, don't complain when I call you stupid) and (b) I give you
> alternate ways to do the same thing, except my suggestion is
> _faster_ and has none of the downside yours has.
> 
> WHY?

Because I like to understand completely all the different options
before giving up on any. That in itself is a good enough reason, IMO.

Because I've found that when arguing about this kind of stuff, even if
the other person asks for something that is "wrong" or "stupid" from
your own point of view, if you respect their intelligence, then maybe
you can together find an alternative solution that solves the
underlying problem but does it cleanly.

I've been on the other side of this with a friend and colleague. We
used to have healthy arguments that lasted all afternoon. He'd ask for
something that was unclean and didn't fit into the structure or the
philosophy. But I respected his intelligence, skill and his need for a
solution. In the end, we'd come up with a better way than either one
would have proposed. We had a dialogue.

And because your suspend/resume idea isn't really going to help me
much. That's because my boot scripts have the notion of
"personalities" (change the boot configuration by asking the user
early on in the boot process). If I suspend after I've got XDM
running, it's too late.

So what I want is a solution that will keep the kernel clean (believe
me, I really do want to keep it clean), but gives me a fast boot too.
And I believe the solution is out there. We just haven't found it yet.

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-15 Thread Linus Torvalds


On Tue, 15 May 2001, Richard Gooch wrote:
> 
> However, what about simply invalidating an entry in the buffer cache
> when you do a write from the page cache?

And how do you do the invalidate the other way, pray tell?

What happens if you create a buffer cache entry? Does that invalidate the
page cache one? Or do you just allow invalidates one way, and not the
other? And why?

> Actually, I'd kind of like it if the page cache steals from the buffer
> cache on read. The buffer cache is mostly populated by fsck. Once I've
> done the fsck, those buffers are useless to me. They might be useful
> again if they are steal-able by the page cache.

Ehh.. And then you'll be unhappy _again_, when we early in 2.5.x start
using the page cache for block device accesses. Which we _have_ to do if
we want to be able to mmap block devices. Which we _do_ want to do (hint:
DVD's etc).

Face it. What you ask for is stupid and fundamentally unworkable. 

Tell me WHY you are completely ignoring my arguments, when I (a) tell you
why your way is bad and stupid (and when you ignore the arguments, don't
complain when I call you stupid) and (b) I give you alternate ways to do
the same thing, except my suggestion is _faster_ and has none of the
downside yours has.

WHY?

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-15 Thread Richard Gooch

Linus Torvalds writes:
> You could choose to do "partial coherency", ie be coherent only one
> way, for example. That would make the coherency overhead much less,
> but would also make the caches basically act very unpredictably -
> you might have somebody write through the page cache yet on a read
> actually not _see_ what he wrote, because it got written out to disk
> and was shadowed by cached data in the buffer cache that didn't get
> updated.

OK, I see your concern. And the old way of doing things, placing a
copy in the buffer cache when the page cache does a write, will eat
away performance.

However, what about simply invalidating an entry in the buffer cache
when you do a write from the page cache? By the time you get ready to
do the I/O, you have the device bnum, so then isn't it a trivial
operation to index into the buffer cache and invalidate that block?

Is there some other subtlety I'm missing here?

Actually, I'd kind of like it if the page cache steals from the buffer
cache on read. The buffer cache is mostly populated by fsck. Once I've
done the fsck, those buffers are useless to me. They might be useful
again if they are steal-able by the page cache.

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-15 Thread Richard Gooch

Linus Torvalds writes:
 You could choose to do partial coherency, ie be coherent only one
 way, for example. That would make the coherency overhead much less,
 but would also make the caches basically act very unpredictably -
 you might have somebody write through the page cache yet on a read
 actually not _see_ what he wrote, because it got written out to disk
 and was shadowed by cached data in the buffer cache that didn't get
 updated.

OK, I see your concern. And the old way of doing things, placing a
copy in the buffer cache when the page cache does a write, will eat
away performance.

However, what about simply invalidating an entry in the buffer cache
when you do a write from the page cache? By the time you get ready to
do the I/O, you have the device bnum, so then isn't it a trivial
operation to index into the buffer cache and invalidate that block?

Is there some other subtlety I'm missing here?

Actually, I'd kind of like it if the page cache steals from the buffer
cache on read. The buffer cache is mostly populated by fsck. Once I've
done the fsck, those buffers are useless to me. They might be useful
again if they are steal-able by the page cache.

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-15 Thread Linus Torvalds


On Tue, 15 May 2001, Richard Gooch wrote:
 
 However, what about simply invalidating an entry in the buffer cache
 when you do a write from the page cache?

And how do you do the invalidate the other way, pray tell?

What happens if you create a buffer cache entry? Does that invalidate the
page cache one? Or do you just allow invalidates one way, and not the
other? And why=

 Actually, I'd kind of like it if the page cache steals from the buffer
 cache on read. The buffer cache is mostly populated by fsck. Once I've
 done the fsck, those buffers are useless to me. They might be useful
 again if they are steal-able by the page cache.

Ehh.. And then you'll be unhappy _again_, when we early in 2.5.x start
using the page cache for block device accesses. Which we _have_ to do if
we want to be able to mmap block devices. Which we _do_ want to do (hint:
DVD's etc).

Face it. What you ask for is stupid and fundamentally unworkable. 

Tell me WHY you are completely ignoring my arguments, when I (a) tell you
why your way is bad and stupid (and when you ignore the arguments, don't
complain when I call you stupid) and (b) I give you alternate ways to do
the same thing, except my suggestion is _faster_ and has none of the
downside yours has.

WHY?

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-15 Thread Richard Gooch

Linus Torvalds writes:
 
 On Tue, 15 May 2001, Richard Gooch wrote:
  
  However, what about simply invalidating an entry in the buffer cache
  when you do a write from the page cache?
 
 And how do you do the invalidate the other way, pray tell?
 
 What happens if you create a buffer cache entry? Does that
 invalidate the page cache one? Or do you just allow invalidates one
 way, and not the other? And why=

I just figured on one way invalidates, because that seems cheap and
easy and has some benefits. Invalidating the other way is costly, so
don't bother, even if there were some benefits.

  Actually, I'd kind of like it if the page cache steals from the buffer
  cache on read. The buffer cache is mostly populated by fsck. Once I've
  done the fsck, those buffers are useless to me. They might be useful
  again if they are steal-able by the page cache.
 
 Ehh.. And then you'll be unhappy _again_, when we early in 2.5.x
 start using the page cache for block device accesses. Which we
 _have_ to do if we want to be able to mmap block devices. Which we
 _do_ want to do (hint: DVD's etc).

So what happens if I dd from the block device and also from a file on
the mounted FS, where that file overlaps the bnums I dd'ed? Do we get
two copies in the page cache? One for the block device access, and one
for the file access?

 Face it. What you ask for is stupid and fundamentally unworkable. 
 
 Tell me WHY you are completely ignoring my arguments, when I (a)
 tell you why your way is bad and stupid (and when you ignore the
 arguments, don't complain when I call you stupid) and (b) I give you
 alternate ways to do the same thing, except my suggestion is
 _faster_ and has none of the downside yours has.
 
 WHY?

Because I like to understand completely all the different options
before giving up on any. That in itself is a good enough reason, IMO.

Because I've found that when arguing about this kind of stuff, even if
the other person asks for something that is wrong or stupid from
your own point of view, if you respect their intelligence, then maybe
you can together find an alternative solution that solves the
underlying problem but does it cleanly.

I've been on the other side of this with a friend and colleague. We
used to have healthy arguments that lasted all afternoon. He'd ask for
something that was unclean and didn't fit into the structure or the
philosophy. But I respected his intelligence, skill and his need for a
solution. In the end, we'd come up with a better way than either one
would have proposed. We had a dialogue.

And because your suspend/resume idea isn't really going to help me
much. That's because my boot scripts have the notion of
personalities (change the boot configuration by asking the user
early on in the boot process). If I suspend after I've got XDM
running, it's too late.

So what I want is a solution that will keep the kernel clean (believe
me, I really do want to keep it clean), but gives me a fast boot too.
And I believe the solution is out there. We just haven't found it yet.

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On Tue, 15 May 2001, Richard Gooch wrote:

  What happens if you create a buffer cache entry? Does that
  invalidate the page cache one? Or do you just allow invalidates one
  way, and not the other? And why=
 
 I just figured on one way invalidates, because that seems cheap and
 easy and has some benefits. Invalidating the other way is costly, so
 don't bother, even if there were some benefits.

Cute.
* create an instance in pagecache
* start reading into buffer cache (doesn't invalidate, right?)
* start writing using pagecache
* lose the page
* try to read it (via pagecache)
Woops - just found a copy in buffer cache, let's pick data from it.
Pity that said data is obsolete...

 So what happens if I dd from the block device and also from a file on
 the mounted FS, where that file overlaps the bnums I dd'ed? Do we get
 two copies in the page cache? One for the block device access, and one
 for the file access?

Yes.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-15 Thread Linus Torvalds


On Tue, 15 May 2001, Richard Gooch wrote:
  
  What happens if you create a buffer cache entry? Does that
  invalidate the page cache one? Or do you just allow invalidates one
  way, and not the other? And why=
 
 I just figured on one way invalidates, because that seems cheap and
 easy and has some benefits. Invalidating the other way is costly, so
 don't bother, even if there were some benefits.

Ahh..

Well, excuse me while I puke all over your shoes.

Why don't you go hack the NT kernel, or something like that? I have some
taste, and part of that is having this silly notion of Things should make
sense.

We should not create crap code just because we _can_. Sure, it's easy to
write the code you suggest. Do you really want a system like that? A
system where you have rules that make no sense, except it was easy to
invlidate one way, so let's do that, and never mind that it makes no
logical sense at all?.

  Ehh.. And then you'll be unhappy _again_, when we early in 2.5.x
  start using the page cache for block device accesses. Which we
  _have_ to do if we want to be able to mmap block devices. Which we
  _do_ want to do (hint: DVD's etc).
 
 So what happens if I dd from the block device and also from a file on
 the mounted FS, where that file overlaps the bnums I dd'ed? Do we get
 two copies in the page cache? One for the block device access, and one
 for the file access?

Yup. And never the two shall meet.

Why should they? Why would you ever do something like that, or care about
the fact? Why would you design a system around a perversity, slowing down
(and uglifying) the sane and common case?

 And because your suspend/resume idea isn't really going to help me
 much. That's because my boot scripts have the notion of
 personalities (change the boot configuration by asking the user
 early on in the boot process). If I suspend after I've got XDM
 running, it's too late.

Note that I never said suspend. I said _resume_. You would create the
resume-image once, and you'd create it not at shutdown time, but at the
point you want to resume from.

You don't want to ever suspend the dang thing - just shut it down, and
reboot it quickly by resuming from the snapshot. So you just create a
simple resume snapshot. Which is easy to do, with the exact same tools
that you've been talking about all the time.

What you do is:
 - trace what pages get loaded off the disk
 - create a snapshot of the contents of those pages
 - archive it all up (may I suggest compressing it at the same time?)
 - the resume function is just a uncompress and populate the virtual
   caches with the contents action.

Note that the uncompress and populate doesn't actually have to use the
_real_ disk contents of the file. A byte is a byte is a byte, and it
doesn't actually need to come from the actual filesystem the system
_thinks_ it comes from. Once it is loaded into memory, it's just a
value. You'e primed your caches, so when you actually run the bootup
scripts, you'll have some random hit-rate (say, 98%), and improve the
bootup immensely that way.

Another way of saying this: Imagine that you tar up and compress the
files you need for booting. You then uncompress and untar the archive, but
instead of untar'ing onto a filesystem, you _just_ populate the caches. 

This is how some CPU's bootstrap themselves: they fill their icache from a
serial rom (at least some alpha chips did this). Never mind that they
didn't actually get that initial state from the _real_ backing store (RAM,
or in the hypothetical resume case, the filesystem off disk). There's no
way to tell, if your cached copies have the same data as the data on
disk. Never mind that the data _got_ there a strange way.

(And yes, your cache priming had better prime the cache with the same
stuff that _is_ on the real filesystem, otherwise you'd obviously get
strange behaviour with the caches not actually matching what the
filesystem contents are. But that's simple to do, and it's easy enough to
boot up in safe mode without a cache priming stage).

One of the advantages of resuming (or priming the cache, or whatever
you want to call it) is that you're free to lay out the resume/cache image
any way you want on disk, as it has nothing to do with the actual
filesystem - except for the fact of sharing some of the same data. Which
means that you can really read it in efficiently.

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-15 Thread Linus Torvalds



On Tue, 15 May 2001, Chris Wedgwood wrote:

 On Tue, May 15, 2001 at 12:13:13AM -0700, Linus Torvalds wrote:

 We should not create crap code just because we _can_.

 How about removing code?

Absolutely. It's not all that often that we can do it, but when we can,
it's the best thing in the world.

 In 2.5.x is we move fs metadata into the pagecache, do we even need a
 buffer cache anymore? Can't we just ditch it completely and make all
 device access raw?

Yes and no.

Yes, it would be nice.

But no, I doubt we'll move _all_ metadata into the page-cache. I doubt,
for example, that we'll find people re-doing all the other filesystems. So
even if ext2 was page-cache only, what about all the 35 other filesystems
out there in the standard sources, never mind others that haven't been
integrated (XFS, ext3 etc..).

Yeah, I know. Some of them already do not use the buffer cache at all (the
network filesystems come to mind ;), but even so..

Looks like there are 19 filesystems that use the buffer cache right now:

grep -l bread fs/*/*.c | cut -d/ -f2 | sort -u | wc

So quite a bit of work involved.

But on the whole I'm definitely hoping that yes, we'll relegate the
buffer_head to be mainly just for IO, and not be a first-class caching
entity at all. It's just that I think it will take a _lng_ time until
we actually reach that noble goal completely.

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On Tue, 15 May 2001, Linus Torvalds wrote:

 Looks like there are 19 filesystems that use the buffer cache right now:
 
   grep -l bread fs/*/*.c | cut -d/ -f2 | sort -u | wc
 
 So quite a bit of work involved.

UNIX-like ones (and that includes QNX) are easy. HFS is hopeless - it won't
be fixed unless authors will do it. Tigran will probably fix BFS just as a
learning experience ;-) ADFS looks tolerably easy to fix. AFFS... directories
will be pure hell - blocks jump from directory to directory at zero notice.
NTFS and HPFS will win from switch (esp. NTFS). FAT is not a problem, if we
are willing to break CVF and let author fix it. Reiserfs... Dunno. They've
got a private (slightly mutated) copy of ~60% of fs/buffer.c. UDF should be
OK. ISOFS... ask Peter. JFFS - dunno.

So probably we'll have to keep the buffer cache (AFFS looks like a real
killer), but we will be able to do pagecache-only versions of a_ops methods.
If fs has no metadata in buffer cache we can drop unmap_underlying_metadata()
for it.




Re: Getting FS access events

2001-05-15 Thread Anton Altaparmakov

At 08:13 15/05/01, Linus Torvalds wrote:
On Tue, 15 May 2001, Richard Gooch wrote:
  So what happens if I dd from the block device and also from a file on
  the mounted FS, where that file overlaps the bnums I dd'ed? Do we get
  two copies in the page cache? One for the block device access, and one
  for the file access?

Yup. And never the two shall meet.

Why should they? Why would you ever do something like that, or care about
the fact?

They shouldn't, but maybe some stupid utility or a typo will do it creating 
two incoherent copies of the same block on the device. - Bad Things can 
happen.

Can't we simply stop people from doing it by, say, having mount lock the
device from further opens (and vice versa, of course: a dd should lock the
device, preventing a mount for the duration of the dd)? Wouldn't this be a
good thing, guaranteeing that problems cannot happen while not incurring any
overhead except on device open/close? Or is this a matter of "give the user
enough rope"? If proper rw locking were implemented, it could allow a
simultaneous -o ro mount with a dd from the device but do exclusive write
locking, for example, for maximum flexibility.

Just my 2p.

Anton


-- 
Anton Altaparmakov aia21 at cam.ac.uk (replace at with @)
Linux NTFS Maintainer / WWW: http://sourceforge.net/projects/linux-ntfs/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/




Re: Getting FS access events

2001-05-15 Thread David Woodhouse


[EMAIL PROTECTED] said:
 JFFS - dunno.

Bah. JFFS doesn't use any of those horrible block device thingies.

--
dwmw2





Re: Getting FS access events

2001-05-15 Thread Daniel Phillips

On Tuesday 15 May 2001 08:57, Alexander Viro wrote:
 On Tue, 15 May 2001, Richard Gooch wrote:
   What happens if you create a buffer cache entry? Does that
   invalidate the page cache one? Or do you just allow invalidates
  one way, and not the other? And why?
 
  I just figured on one way invalidates, because that seems cheap and
  easy and has some benefits. Invalidating the other way is costly,
  so don't bother, even if there were some benefits.

 Cute.
   * create an instance in pagecache
   * start reading into buffer cache (doesn't invalidate, right?)
   * start writing using pagecache
   * lose the page
   * try to read it (via pagecache)
 Woops - just found a copy in buffer cache, let's pick data from it.
 Pity that said data is obsolete...

That's because you left out his invalidate:

* create an instance in pagecache
* start reading into buffer cache (doesn't invalidate, right?)
* start writing using pagecache (invalidate buffer copy)
* lose the page
* try to read it (via pagecache)

Everything ok.  As an optimization, instead of 'lose the page', do 'move
page blocks to buffer cache'.

--
Daniel



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On Tue, 15 May 2001, Daniel Phillips wrote:

 That's because you left out his invalidate:
 
   * create an instance in pagecache
   * start reading into buffer cache (doesn't invalidate, right?)
   * start writing using pagecache (invalidate buffer copy)

Bzzert. You have a race here. Let's make it explicit:

start writing
put write request in queue
block on that
start reading into buffer cache
put read request into queue
read from media
write to media

And no, we can't invalidate from IO completion hook.

   * lose the page
   * try to read it (via pagecache)
 
 Everything ok.

Nope.




Re: Getting FS access events

2001-05-15 Thread Daniel Phillips

On Tuesday 15 May 2001 12:44, Alexander Viro wrote:
 On Tue, 15 May 2001, Daniel Phillips wrote:
  That's because you left out his invalidate:
 
  * create an instance in pagecache
  * start reading into buffer cache (doesn't invalidate, right?)
  * start writing using pagecache (invalidate buffer copy)

 Bzzert. You have a race here. Let's make it explicit:

 start writing
 put write request in queue
 block on that
   start reading into buffer cache
   put read request into queue
   read from media
 write to media

 And no, we can't invalidate from IO completion hook.

  * lose the page
  * try to read it (via pagecache)
 
  Everything ok.

 Nope.

The problem is that we have two IO operations on the same physical 
block in the queue at the same time, and we don't know it.  Maybe we 
should know it.

For your specific example we are ok if we do:

 * create an instance in pagecache
 * start reading into buffer cache (doesn't invalidate, right?)
 * start writing using pagecache (invalidate buffer copy)
 * lose the page (invalidate buffer copy)
 * try to read it (via pagecache)

We are also ok if we follow my suggested optimization and move the page 
to the buffer cache instead of just losing it.

We are not ok if we do:

 * try to read it (via buffercache)

because its copy is out of date, but this can be fixed by enforcing 
coherency in the request queue. 

1) Why should the request queue not be coherent?

2) Can we stop talking about buffer cache here and start talking about 
blocks mapped into a separate address space in the page cache?  From 
Linus's previous comments in this thread we are going to have that 
anyway, and your race also applies there.

I'd like to call that separate address space a 'block cache'.

--
Daniel



Re: Getting FS access events

2001-05-15 Thread Chris Mason



On Tuesday, May 15, 2001 04:33:57 AM -0400 Alexander Viro
[EMAIL PROTECTED] wrote:

 
 
 On Tue, 15 May 2001, Linus Torvalds wrote:
 
 Looks like there are 19 filesystems that use the buffer cache right now:
 
  grep -l bread fs/*/*.c | cut -d/ -f2 | sort -u | wc
 
 So quite a bit of work involved.
 
 Reiserfs... Dunno. They've got a private (slightly mutated) copy of
 ~60% of fs/buffer.c. 

But, putting the log and the metadata in the page cache makes memory
pressure and such cleaner, so this is one of my goals for 2.5.  reiserfs
will still have alias issues due to the packed tails (one copy in the
btree, another in the page), but it will be no worse than it is now.

-chris




Re: Getting FS access events

2001-05-15 Thread Craig Milo Rogers

And because your suspend/resume idea isn't really going to help me
much. That's because my boot scripts have the notion of
personalities (change the boot configuration by asking the user
early on in the boot process). If I suspend after I've got XDM
running, it's too late.

Preface: As has been mentioned on this discussion thread, some
disk devices maintain a cache of their own, running on a small (by
today's standards) CPU.  These caches are probably sector oriented,
not block oriented, but are almost certainly not page oriented or
filesystem oriented.  Well, OK, some might have DOS filesystem
knowledge built-in, I suppose... yuck!

Anyway, although there may be slight differences, they are
effectively block-oriented caches.  As long as they are write-through
(and/or there are cache flushing commands, etc.), they are reasonably
coherent with the operating system's main cache, and they meet the
expectations of database programs, etc. that want stable storage.

In terms of efficiency, there are questions about read-ahead,
write-behind, write-through with invalidation or write-through with
cache update -- the usual stuff.  I leave it as an exercise for the
reader to decide how to best tune their system, and merely assert that
it can be done.

Imagine, as a mental exercise, that you move this
block-oriented cache out of the disk drive, and into the main CPU and
operating system, say roughly at the disk driver level.  We lose the
efficiency of having the small CPU do the block lookups, but a hashed
block lookup is rather cheap nowadays, wouldn't you say?  Ignoring
issues of "What if the disk drive fails independently of the main
CPU, or vice versa?", the transplanted block cache should operate
pretty much as it did in the disk drive.

In particular, it should continue to operate properly with the
main CPU's main page cache.

Conclusion: a page cache can successfully run over an
appropriately designed block cache.  QED.

What's the hitch?  It's the "appropriately designed"
constraint.  It is quite possible that the Linux block cache is not
designed (data structures and code paths considered together) in a way
that allows it to mimic a simple disk drive's block cache.  I assume
that there's some impediment, or this discussion wouldn't have lasted
so long -- the idea of using the Linux block cache to model a disk
drive's block cache is pretty obvious, after all.

So what I want is a solution that will keep the kernel clean (believe
me, I really do want to keep it clean), but gives me a fast boot too.
And I believe the solution is out there. We just haven't found it yet.

Well, if you want a fast boot *on a single type of disk
drive*, and the existing Linux block cache doesn't work, you could
extend the driver for that hardware with an optional block cache,
independently of Linux' block cache, along with an appropriate
interface to populate it with boot-time blocks, and to flush it when
no longer needed.  That's not exactly clean, though, is it?

You could extend the md (or LVM) drivers, or create a new
driver similar to one of them, that incorporates a simple block cache,
with appropriate mechanisms for populating and flushing it.  Clean?
er, no, rather muddy, in fact.

You might want to lock down the pages that you've
prepopulated, rather than let them be discarded before they're needed.
This could be designed into a new block cache, but you might need to
play some accounting games to get it right with the existing block
cache.

Finally, there's Linus' offer for a preread call, to
prepopulate the page cache.  By virtue of your knowledge of the
underlying implementation of the system, you could preload the file
system index pages into the block cache, and load the data pages into
the page cache.  Clean!  Sewer-like!

Craig Milo Rogers




Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

Followup to:  [EMAIL PROTECTED]
By author:Alexander Viro [EMAIL PROTECTED]
In newsgroup: linux.dev.kernel
 
 UNIX-like ones (and that includes QNX) are easy. HFS is hopeless - it won't
 be fixed unless authors will do it. Tigran will probably fix BFS just as a
 learning experience ;-) ADFS looks tolerably easy to fix. AFFS... directories
 will be pure hell - blocks jump from directory to directory at zero notice.
 NTFS and HPFS will win from switch (esp. NTFS). FAT is not a problem, if we
 are willing to break CVF and let author fix it. Reiserfs... Dunno. They've
 got a private (slightly mutated) copy of ~60% of fs/buffer.c. UDF should be
 OK. ISOFS... ask Peter. JFFS - dunno.
 

isofs wouldn't be too bad as long as struct mapping:struct inode is a
many-to-one mapping.

-hpa
-- 
[EMAIL PROTECTED] at work, [EMAIL PROTECTED] in private!
Unix gives you enough rope to shoot yourself in the foot.
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

Followup to:  [EMAIL PROTECTED]
By author:Anton Altaparmakov [EMAIL PROTECTED]
In newsgroup: linux.dev.kernel
 
 They shouldn't, but maybe some stupid utility or a typo will do it creating 
 two incoherent copies of the same block on the device. - Bad Things can 
 happen.
 
 Can't we simply stop people from doing it by say having mount lock the 
 device from further opens (and vice versa of course, doing a dd should 
 result in lock of device preventing a mount during the duration of dd). - 
 Wouldn't this be a good thing, guaranteeing that problems cannot happen 
 while not incurring any overhead except on device open/close? Or is this a 
 matter of give the user enough rope? - If proper rw locking is 
 implemented it could allow simultaneous -o ro mount with a dd from the 
 device but do exclusive write locking, for example, for maximum flexibility.
 

This would leave no way (without introducing new interfaces) to write,
for example, the boot block on an ext2 filesystem.  Note that the
bootblock (defined as the first 1024 bytes) is not actually used by
the filesystem, although depending on the block size it may share a
block with the superblock (if blocksize > 1024).

-hpa

-- 
[EMAIL PROTECTED] at work, [EMAIL PROTECTED] in private!
Unix gives you enough rope to shoot yourself in the foot.
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On 15 May 2001, H. Peter Anvin wrote:

 isofs wouldn't be too bad as long as struct mapping:struct inode is a
 many-to-one mapping.

Erm... What's wrong with inode->u.isofs_i.my_very_own_address_space ?




Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

Alexander Viro wrote:
 
 On 15 May 2001, H. Peter Anvin wrote:
 
  isofs wouldn't be too bad as long as struct mapping:struct inode is a
  many-to-one mapping.
 
 Erm... What's wrong with inode->u.isofs_i.my_very_own_address_space ?
 

None whatsoever.  The one thing that matters is that no one starts making
the assumption that mapping->host->i_mapping == mapping.

-hpa

-- 
[EMAIL PROTECTED] at work, [EMAIL PROTECTED] in private!
Unix gives you enough rope to shoot yourself in the foot.
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

Alexander Viro wrote:
 
  None whatsoever.  The one thing that matters is that no one starts making
  the assumption that mapping->host->i_mapping == mapping.
 
 One actually shouldn't assume that mapping->host is an inode.
 

What else could it be, since it's a struct inode *?  NULL?

-hpa

-- 
[EMAIL PROTECTED] at work, [EMAIL PROTECTED] in private!
Unix gives you enough rope to shoot yourself in the foot.
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On Tue, 15 May 2001, H. Peter Anvin wrote:

 Alexander Viro wrote:
  
  On 15 May 2001, H. Peter Anvin wrote:
  
   isofs wouldn't be too bad as long as struct mapping:struct inode is a
   many-to-one mapping.
  
  Erm... What's wrong with inode->u.isofs_i.my_very_own_address_space ?
  
 
 None whatsoever.  The one thing that matters is that no one starts making
 the assumption that mapping->host->i_mapping == mapping.

One actually shouldn't assume that mapping->host is an inode.




Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

Alexander Viro wrote:
 
  What else could it be, since it's a struct inode *?  NULL?
 
 struct block_device *, for one thing. We'll have to do that as soon
 as we do block devices in pagecache.
 

How would you know what datatype it is?  A union?  Making struct
block_device * a struct inode * in a nonmounted filesystem?  In a
devfs?  (Seriously.  Being able to do these kinds of data-structural
equivalence is IMO the nice thing about devfs & co...)

-hpa

-- 
[EMAIL PROTECTED] at work, [EMAIL PROTECTED] in private!
Unix gives you enough rope to shoot yourself in the foot.
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On Tue, 15 May 2001, H. Peter Anvin wrote:

 Alexander Viro wrote:
  
   None whatsoever.  The one thing that matters is that no one starts making
   the assumption that mapping->host->i_mapping == mapping.
  
  One actually shouldn't assume that mapping->host is an inode.
  
 
 What else could it be, since it's a struct inode *?  NULL?

struct block_device *, for one thing. We'll have to do that as soon
as we do block devices in pagecache.




Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On Tue, 15 May 2001, H. Peter Anvin wrote:

 Alexander Viro wrote:
  
   What else could it be, since it's a struct inode *?  NULL?
  
  struct block_device *, for one thing. We'll have to do that as soon
  as we do block devices in pagecache.
  
 
 How would you know what datatype it is?  A union?  Making struct
 block_device * a struct inode * in a nonmounted filesystem?  In a
 devfs?  (Seriously.  Being able to do these kinds of data-structural
 equivalence is IMO the nice thing about devfs & co...)

void *.

Look, methods of your address_space certainly know what the hell they
are dealing with. Just as autofs_root_readdir() knows what inode->u.generic_ip
really points to.

Anybody else has no business to care about the contents of ->host.




Re: Getting FS access events

2001-05-15 Thread Kai Henningsen

[EMAIL PROTECTED] (Alexander Viro)  wrote on 15.05.01 in 
[EMAIL PROTECTED]:

 ... and Multics had all access to files through equivalent of mmap()
 in 60s. Segments in ls(1) got that name for a good reason.

Where's something called segments connected with ls(1)? I can't seem to  
find the reference.


MfG Kai



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On 15 May 2001, Kai Henningsen wrote:

 [EMAIL PROTECTED] (Alexander Viro)  wrote on 15.05.01 in 
[EMAIL PROTECTED]:
 
  ... and Multics had all access to files through equivalent of mmap()
  in 60s. Segments in ls(1) got that name for a good reason.
 
 Where's something called segments connected with ls(1)? I can't seem to  
 find the reference.

ls == list segments. Name came from Multics.




Re: Getting FS access events

2001-05-15 Thread Linus Torvalds

In article [EMAIL PROTECTED],
Alexander Viro  [EMAIL PROTECTED] wrote:
On Tue, 15 May 2001, H. Peter Anvin wrote:

 Alexander Viro wrote:
  
    None whatsoever.  The one thing that matters is that no one starts making
    the assumption that mapping->host->i_mapping == mapping.
   
   One actually shouldn't assume that mapping->host is an inode.
  
 
 What else could it be, since it's a struct inode *?  NULL?

struct block_device *, for one thing. We'll have to do that as soon
as we do block devices in pagecache.

No, Al. It's an inode. It was a major mistake to ever think anything
else.

I see your problem, but it's not a real problem.  What you do for block
devices (or anything like that where you might have _multiple_ inodes
pointing to the same thing) is to just create a virtual inode, and
have THAT be the one that the mapping is associated with.  Basically
each struct block_device * would have an inode associated with it, to
act as an anchor for things like this.

What is struct inode, after all? It's just the virtual representation
of an entity. The inodes associated with /dev/hda are not the inodes
associated with the actual _device_ - they are just on-disk links to
the physical device.

[ Aside: there are good arguments to _not_ embed struct inode into
  struct block_device, but instead do it the other way around - the
  same way we have filesystem-specific inode data inside struct inode
  we can easily have device-type specific data there.  And it makes a
  whole lot more sense to attach a mount to an inode than it makes to
  attach a mount to a struct block_device.

  Done right, we could eventually get rid of loopback block devices.
  They'd just be inodes that aren't of type struct block_device, and
  the index to struct buffer_head would not be (block_device *, blknr,
  size), but (inode *, blknr, size). See? The added level of indirection
  is one that we actually already _use_, it's just that we have this
  loopback device special case for it..

  In a perfect setup you could actually do "mount -t ext2 file /mnt/x"
  without having _any_ loopback setup or anything like that, simply
  because you don't _need_ it. It would be automatic. ]

Linus



Re: Getting FS access events

2001-05-15 Thread Linus Torvalds

In article [EMAIL PROTECTED],
Alexander Viro  [EMAIL PROTECTED] wrote:
 
 How would you know what datatype it is?  A union?  Making struct
 block_device * a struct inode * in a nonmounted filesystem?  In a
 devfs?  (Seriously.  Being able to do these kinds of data-structural
 equivalence is IMO the nice thing about devfs & co...)

void *.

No. It used to be that way, and it was a horrible mess.

We _need_ to know that it's an inode, because the generic mapping
functions basically need to do things like

mark_inode_dirty_pages(mapping->host);

which in turn needs the host to be an inode (otherwise you don't know
how and where to write the dang things back again).

There's no question that you can avoid it being an inode by virtualizing
more of it, and adding more virtual functions to the mapping operations
(right now the only one you'd HAVE to add is the mark_page_dirty()
operation), but the fact is that code gets really ugly by doing things
like that.

It was an absolute pleasure to remove all the casts of mapping->host.
With void * it needed to be cast to the right type (and you had to be
able to _prove_ that you knew what the right type was). With inode *,
the type is statically known, and you don't actually lose anything (at
worst, you'd have a virtual inode and then do an extra layer of
indirection there).

I really don't think we want to go back to void *. 

Linus



Re: Getting FS access events

2001-05-15 Thread Alexander Viro



On Tue, 15 May 2001, Alexander Viro wrote:

 On 15 May 2001, Kai Henningsen wrote:
 
  [EMAIL PROTECTED] (Alexander Viro)  wrote on 15.05.01 in 
[EMAIL PROTECTED]:
  
   ... and Multics had all access to files through equivalent of mmap()
   in 60s. Segments in ls(1) got that name for a good reason.
  
  Where's something called segments connected with ls(1)? I can't seem to  
  find the reference.
 
 ls == list segments. Name came from Multics.

Basically, they had the whole address space consisting of mmaped files.
address was (segment << 18) + offset (both up to 18 bits) and primitive
was attach segment (== file) to address space. Each segment had its
own page table, BTW. Directories were special segments and contained
references to other segments (both files and directories). Root had fixed
ID. You could lookup segment by name.




Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

Alexander Viro wrote:
 
 void *.
 
 Look, methods of your address_space certainly know what the hell they
 are dealing with. Just as autofs_root_readdir() knows what inode->u.generic_ip
 really points to.
 
 Anybody else has no business to care about the contents of ->host.
 

Why do we need a ->host at all, then?  Why not simply make it a private
pointer?

-hpa

-- 
[EMAIL PROTECTED] at work, [EMAIL PROTECTED] in private!
Unix gives you enough rope to shoot yourself in the foot.
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread Jan Harkes

On Tue, May 15, 2001 at 02:02:29PM -0700, Linus Torvalds wrote:
 In article [EMAIL PROTECTED],
 Alexander Viro  [EMAIL PROTECTED] wrote:
 On Tue, 15 May 2001, H. Peter Anvin wrote:
 
  Alexander Viro wrote:
   
None whatsoever.  The one thing that matters is that no one starts making
the assumption that mapping->host->i_mapping == mapping.

Don't worry too much about that, that relationship has been false for
Coda ever since i_mapping was introduced.

The only problem that is still lingering is related to i_size. Writes
update inode->i_mapping->host->i_size, and stat reads inode->i_size,
which are not the same.

I sent a small patch to stat.c for this a long time ago (Linux
2.3.99-pre6-7), which made the assumption in stat that i_mapping->host
was an inode. (effectively tmp.st_size = inode->i_mapping->host->i_size)

Other solutions were to finish the getattr implementation, or keep a
small Coda-specific wrapper for generic_file_write around.

   One actually shouldn't assume that mapping->host is an inode.
  
  What else could it be, since it's a struct inode *?  NULL?
 
 struct block_device *, for one thing. We'll have to do that as soon
 as we do block devices in pagecache.
 
 No, Al. It's an inode. It was a major mistake to ever think anything
 else.

So is anyone interested in a small patch for stat.c? It fixes, as far as
I know, the last place that 'assumes' that inode->i_mapping->host is
identical to inode.

Jan




Re: Getting FS access events

2001-05-15 Thread Albert D. Cahalan

H. Peter Anvin writes:

 This would leave no way (without introducing new interfaces) to write,
 for example, the boot block on an ext2 filesystem.  Note that the
 bootblock (defined as the first 1024 bytes) is not actually used by
 the filesystem, although depending on the block size it may share a
  block with the superblock (if blocksize > 1024).

The lack of coherency would screw this up anyway, wouldn't it?
You have a block device, soon to be in the page cache, and
a superblock, also soon to be in the page cache. LILO writes to
the block device, while the ext2 driver updates the superblock.
Whatever gets written out last wins, and the other is lost.




Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

Albert D. Cahalan wrote:
 
 H. Peter Anvin writes:
 
  This would leave no way (without introducing new interfaces) to write,
  for example, the boot block on an ext2 filesystem.  Note that the
  bootblock (defined as the first 1024 bytes) is not actually used by
  the filesystem, although depending on the block size it may share a
   block with the superblock (if blocksize > 1024).
 
 The lack of coherency would screw this up anyway, doesn't it?
 You have a block device, soon to be in the page cache, and
 a superblock, also soon to be in the page cache. LILO writes to
 the block device, while the ext2 driver updates the superblock.
 Whatever gets written out last wins, and the other is lost.
 

Albert, I *did* say "this better work or we have a problem."

-hpa

-- 
[EMAIL PROTECTED] at work, [EMAIL PROTECTED] in private!
Unix gives you enough rope to shoot yourself in the foot.
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread Anton Altaparmakov

At 23:35 15/05/2001, H. Peter Anvin wrote:
> Albert D. Cahalan wrote:
> > H. Peter Anvin writes:
> > > This would leave no way (without introducing new interfaces) to write,
> > > for example, the boot block on an ext2 filesystem.  Note that the
> > > bootblock (defined as the first 1024 bytes) is not actually used by
> > > the filesystem, although depending on the block size it may share a
> > > block with the superblock (if blocksize > 1024).
> >
> > The lack of coherency would screw this up anyway, wouldn't it?
> > You have a block device, soon to be in the page cache, and
> > a superblock, also soon to be in the page cache. LILO writes to
> > the block device, while the ext2 driver updates the superblock.
> > Whatever gets written out last wins, and the other is lost.
>
> Albert, I *did* say this better work or we have a problem.

And how are you thinking of this working without introducing new 
interfaces if the caches are indeed incoherent? Please correct me if I 
understand wrong, but when two caches are incoherent, I thought it means 
that the above _would_ screw up unless protected by exclusive write locking 
as I suggested in my previous post with the side effect that you can't 
write the boot block without unmounting the filesystem or modifying some 
interface somewhere.

As not all filesystems are like ext2, perhaps it would be better to fix 
ext2 and not the cache coherency? If ext2 is claiming ownership of a 
device, then it should do so in its entirety IMHO. You could always extend 
ext2 to use the NTFS approach where the bootsector is nothing more than a 
file which happens to exist on sector(s) zero (and following) of the 
device... (just a thought)

Best regards,

Anton


-- 
Anton Altaparmakov aia21 at cam.ac.uk (replace at with @)
Linux NTFS Maintainer / WWW: http://sourceforge.net/projects/linux-ntfs/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/




Re: Getting FS access events

2001-05-15 Thread H. Peter Anvin

Anton Altaparmakov wrote:
> 
> And how are you thinking of this working without introducing new
> interfaces if the caches are indeed incoherent? Please correct me if I
> understand wrong, but when two caches are incoherent, I thought it means
> that the above _would_ screw up unless protected by exclusive write locking
> as I suggested in my previous post with the side effect that you can't
> write the boot block without unmounting the filesystem or modifying some
> interface somewhere.
> 

Not if direct device access and the superblock exist in the same mapping
space, OR an explicit interface to write the boot block is created.

-hpa

-- 
[EMAIL PROTECTED] at work, [EMAIL PROTECTED] in private!
Unix gives you enough rope to shoot yourself in the foot.
http://www.zytor.com/~hpa/puzzle.txt



Re: Getting FS access events

2001-05-15 Thread Lars Brinkhoff

Alan Cox [EMAIL PROTECTED] writes:
> > Larry, go read up on TOPS-20. :-) SunOS did give unix mmap(), but it
> > did not come up with the idea.
> Seems to be TOPS-10
> http://www.opost.com/dlm/tenex/fjcc72/

TENEX is not TOPS-10.  TOPS-10 didn't get virtual memory until around
1974.  By then, TENEX had been around for years.

TOPS-20 was developed from TENEX starting around 1973.

-- 
http://lars.nocrew.org/



Re: Getting FS access events

2001-05-14 Thread Richard Gooch

Linus Torvalds writes:
> 
> On Mon, 14 May 2001, Richard Gooch wrote:
> > 
> > Is there some fundamental reason why a buffer cache can't ever be
> > fast?
> 
> Yes.
> 
> Or rather, there is a fundamental reason why we must NEVER EVER look at
> the buffer cache: it is not coherent with the page cache. 
> 
> And keeping it coherent would be _extremely_ expensive. How do we
> know? Because we used to do that. Remember the small mindcraft
> benchmark? Yup. Double copies all over the place, double lookups, double
> everything.
> 
> You could think: "oh, we only need to look up the buffer cache when we
> create a new page cache mapping, so..".
> 
> You'd be wrong. We'd need to go the other way too: every time we create a
> new buffer cache entry, we'd need to make sure that it isn't mapped
> somewhere in the page cache (impossible), or otherwise we'd do the wrong
> thing sometimes (ie we might have two dirty copies, and we wouldn't know
> _which_ one is valid etc).
> 
> Aliasing is bad. Don't do it.

OK, this (combined with the other message) explains why we want to
keep away from the buffer cache. Thanks.

> You know, the mark of intelligence is realizing when you're making
> the same mistake over and over and over again, and not hitting your
> head in the wall five hundred times before you understand that it's
> not a clever thing to do.

But you didn't have to add this. Please note that I asked why not use
the buffer cache. I didn't proclaim that it was the ideal solution. I
did say what benefits it had, but I didn't assert that the benefits
outweighed the disadvantages.

> Please show some intelligence.

Well, frankly, I think I have. Things are obvious when you know them
already. Even if I'm ignorant, I'm not stupid!

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]



Re: Getting FS access events

2001-05-14 Thread Alexander Viro



On Mon, 14 May 2001, David S. Miller wrote:

> 
> Larry McVoy writes:
>  > Hell, that's the OS that gave us mmap, remember that?  
> 
> Larry, go read up on TOPS-20. :-) SunOS did give unix mmap(), but it
> did not come up with the idea.

s/TOPS-20/Multics/




Re: Getting FS access events

2001-05-14 Thread Alexander Viro



On Mon, 14 May 2001, Linus Torvalds wrote:

> The current page cache is completely non-coherent (with _anything_: it's
> not coherent with other files using a page cache because they have a
> different index, and it's not coherent with the buffer cache because that
> one isn't even in the same name space).

Unfortunately, we have cases where a disk block migrates from the buffer
cache to the page cache. That is a source of serious PITA and (IMO) the
only serious reason to take indirect blocks into the page cache.




Re: Getting FS access events

2001-05-14 Thread David S. Miller


Larry McVoy writes:
 > Hell, that's the OS that gave us mmap, remember that?  

Larry, go read up on TOPS-20. :-) SunOS did give unix mmap(), but it
did not come up with the idea.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: Getting FS access events

2001-05-14 Thread Alexander Viro



On Mon, 14 May 2001, Larry McVoy wrote:

> Hell, that's the OS that gave us mmap, remember that?  

"I got it from Agnes..."

Don't get me wrong, SunOS 4 was probably the nicest thing Sun had ever
released and I love it, but mmap(2) was _not_ the best of ideas. Files
as streams of bytes and files as persistent segments really do not mix
well. If you still have their source, check the effects of write() from
an mmapped area, especially when you play with unaligned stuff.

That said, in all sane cases we want indexing by (vnode, offset), not by
(device, block number). We _certainly_ don't want uncontrolled readahead
on the block level. E.g. because we might have just allocated a new block
and are busy filling it with data we want to write. The last thing we
want is some fsckwit overwriting it with crap we have on disk. And that's
what such readahead is.

Besides, just how often do you reboot the box? If that's the hotspot for
you - when the hell does the poor beast find time to do something useful?



