Re: [RFC] Parallelize IO for e2fsck

2008-01-17 Thread Valerie Henson
On Jan 17, 2008 5:15 PM, David Chinner <[EMAIL PROTECTED]> wrote:
> On Wed, Jan 16, 2008 at 01:30:43PM -0800, Valerie Henson wrote:
> > Hi y'all,
> >
> > This is a request for comments on the rewrite of the e2fsck IO
> > parallelization patches I sent out a few months ago.  The mechanism is
> > totally different.  Previously IO was parallelized by issuing IOs from
> > multiple threads; now a single thread issues fadvise(WILLNEED) and
> > then uses read() to complete the IO.
>
> Interesting.
>
> We ultimately rejected a similar patch to xfs_repair (pre-populating
> the kernel block device cache) mainly because of low memory
> performance issues and it doesn't really enable you to do anything
> particularly smart with optimising I/O patterns for larger, high
> performance RAID arrays.
>
> The low memory problems were particularly bad; the readahead
> thrashing caused a slowdown of 2-3x compared to the baseline and
> often it was due to the repair process requiring all of memory
> to cache stuff it would need later. IIRC, multi-terabyte ext3
> filesystems have similar memory usage problems to XFS, so there's
> a good chance that this patch will see the same sorts of issues.

That was one of my first concerns - how to avoid overflowing memory?
Whenever I screw it up on e2fsck, it does go, oh, 2 times slower due
to the minor detail of every single block being read from disk twice.
:)

I have a partial solution that sort of blindly manages the buffer
cache.  First, the user passes e2fsck a parameter saying how much
memory is available as buffer cache.  The readahead thread reads
things in and immediately throws them away so they are only in buffer
cache (no double-caching).  Then readahead and e2fsck work together so
that readahead only reads in new blocks when the main thread is done
with earlier blocks.  The already-used blocks get kicked out of buffer
cache to make room for the new ones.
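
A minimal sketch of the budgeted readahead described above, written in Python rather than e2fsck's C. It assumes the hints are issued with posix_fadvise(WILLNEED) and that consumed blocks are dropped with posix_fadvise(DONTNEED); the message does not say how used blocks are evicted, so that part is a guess. The class name, the extent list, and the budget parameter are invented for illustration and are not taken from the patches.

```python
import os
from collections import deque

# os.posix_fadvise exists only on some Unix platforms; fall back to a no-op.
_fadvise = getattr(os, "posix_fadvise", None)
_WILLNEED = getattr(os, "POSIX_FADV_WILLNEED", 0)
_DONTNEED = getattr(os, "POSIX_FADV_DONTNEED", 0)


def _advise(fd, offset, length, advice):
    if _fadvise is not None:
        _fadvise(fd, offset, length, advice)


class ReadaheadWindow:
    """Hypothetical helper: keep at most `budget` bytes hinted but unread."""

    def __init__(self, fd, extents, budget):
        self.fd = fd
        self.pending = deque(extents)  # (offset, length) not yet hinted
        self.inflight = deque()        # hinted, waiting for the main thread
        self.used = 0
        self.budget = budget

    def fill(self):
        """Hint extents until the next one would exceed the budget."""
        while self.pending:
            offset, length = self.pending[0]
            # Always allow one extent so an oversized extent cannot stall us.
            if self.inflight and self.used + length > self.budget:
                break
            self.pending.popleft()
            _advise(self.fd, offset, length, _WILLNEED)
            self.inflight.append((offset, length))
            self.used += length

    def consume(self):
        """Read the oldest hinted extent, drop it from cache, then refill."""
        self.fill()
        if not self.inflight:
            return None
        offset, length = self.inflight.popleft()
        data = os.pread(self.fd, length, offset)
        _advise(self.fd, offset, length, _DONTNEED)
        self.used -= length
        self.fill()
        return data
```

The main thread would call consume() once per extent it checks. New hints go out only as earlier extents are finished, which is the coupling between readahead and the checker that the message describes.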

What would be nice is to take into account the current total memory
usage of the whole fsck process and factor that in.  I don't think it
would be hard to add to the existing cache management framework.
Thoughts?

> Promising results, though

Thanks!  It's solving a rather simpler problem than XFS check/repair. :)

-VAL
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)

2008-01-16 Thread Valerie Henson
On Jan 16, 2008 3:49 AM, Pavel Machek <[EMAIL PROTECTED]> wrote:
>
> ext3's "let's fsck every 20 mounts" is a good idea, but it can be
> annoying when developing. Having the option to fsck while the
> filesystem is online takes that annoyance away.

I'm sure everyone on cc: knows this, but for the record you can change
ext3's fsck on N mounts or every N days to something that makes sense
for your use case.  Usually I just turn it off entirely and run fsck
by hand when I'm worried:

# tune2fs -c 0 -i 0 /dev/whatever

-VAL


Re: [RFD] Incremental fsck

2008-01-08 Thread Valerie Henson
On Jan 8, 2008 8:40 PM, Al Boldi <[EMAIL PROTECTED]> wrote:
> Rik van Riel wrote:
> > Al Boldi <[EMAIL PROTECTED]> wrote:
> > > Has there been some thought about an incremental fsck?
> > >
> > > You know, somehow fencing a sub-dir to do an online fsck?
> >
> > Search for "chunkfs"
>
> Sure, and there is TileFS too.
>
> But why wouldn't it be possible to do this on the current fs infrastructure,
> using just a smart fsck, working incrementally on some sub-dir?

Several data structures are file system wide and require finding every
allocated file and block to check that they are correct.  In
particular, block and inode bitmaps can't be checked per subdirectory.

http://infohost.nmt.edu/~val/review/chunkfs.pdf
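
A toy illustration of why the block bitmap cannot be checked one subdirectory at a time. The file names, block numbers, and helper function are made up for the example; a real checker such as e2fsck works on on-disk structures, not Python dictionaries.

```python
def check_block_bitmap(bitmap, files):
    """Compare the allocation bitmap with the blocks that files reference.

    Returns (marked_but_unreferenced, referenced_but_unmarked).
    """
    referenced = set()
    for blocks in files.values():
        referenced.update(blocks)
    marked = {n for n, bit in enumerate(bitmap) if bit}
    return marked - referenced, referenced - marked


# Toy file system: block 6 is marked in use but no file owns it.
files = {"/home/a": [1, 2], "/home/b": [3], "/var/log": [5]}
bitmap = [0, 1, 1, 1, 0, 1, 1, 0]

# Walking only /home cannot tell that block 5 belongs to /var/log,
# so it is reported as leaked along with the real leak, block 6.
home_only = {path: blocks for path, blocks in files.items()
             if path.startswith("/home")}
partial = check_block_bitmap(bitmap, home_only)
full = check_block_bitmap(bitmap, files)
```

Only the full walk separates the genuine leak from blocks that are simply owned by files outside the subdirectory.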

-VAL



Re: [ANNOUNCE] ebizzy 0.2 released

2007-10-04 Thread Valerie Henson
On Sun, Sep 30, 2007 at 05:27:03PM -0700, David Miller wrote:
> From: Valerie Henson <[EMAIL PROTECTED]>
> Date: Wed, 22 Aug 2007 19:06:26 -0600
> 
> > ebizzy is designed to generate a workload resembling common web
> > application server workloads.
> 
> I downloaded this only to be basically disappointed.
> 
> Any program which claims to generate workloads "resembling common web
> application server workloads", and yet does zero network activity and
> absolutely nothing with sockets is so far disconnected from reality
> that I truly question how useful it really is even in the context it
> was designed for.
> 
> Please describe this program differently, "a threaded cpu eater", "a
> threaded memory scanner", "a threaded hash lookup", or something
> suitably matching what it really does.
> 
> I'm sure there are at least 10 or even more programs in LTP that one
> could run under "time" and get the same exact functionality.

You're right, that part of the description is misleading. (I've even
had people ask me if it's a file systems benchmark!)

Ebizzy is based on a real web application server and does do things
that are fairly common in such applications (multithreaded memory
allocation and memory access), but it ignores networking for two
reasons: the network stack was not the bottleneck for this workload,
the VM was, and really good network benchmarks already exist. :)
ebizzy is not useful to networking (or file systems) developers, but it
has been used to improve malloc() behavior in glibc and to test VMA
handling optimizations.

In general, I try to make the source of a benchmark clear because it's
so tempting to optimize for completely artificial benchmarks.  The
trick is to do this without misleading the reader (or breaking my NDA).

ebizzy
------

ebizzy is a workload that stresses memory allocation and the virtual
memory subsystem.  It was initially written to model the local
computation portion of a web application server running a large
internet commerce site.  ebizzy is highly threaded, has a large
in-memory working set with poor locality, and allocates and
deallocates memory frequently.  When running most efficiently, ebizzy
will max out the CPU.  When running inefficiently, it will be blocked
much of the time.

-VAL


[ANNOUNCE] ebizzy 0.2 released

2007-08-22 Thread Valerie Henson
ebizzy is designed to generate a workload resembling common web
application server workloads.  It is especially useful for testing
changes to memory management, and whenever a highly threaded
application with a large working set and many vmas is needed.

This is release 0.2 of ebizzy.  It reports a rate of transactions per
second, compiles on Solaris, and scales better.  Thanks especially to
Rodrigo Rubira Branco, Brian Twichell, and Yong Cai for their work on
this release.

Available for download at the fancy new Sourceforge site:

http://sourceforge.net/projects/ebizzy/

ChangeLog below.

-VAL

2007-08-15 Valerie Henson <[EMAIL PROTECTED]>

* Release 0.2.

* Started reporting a rate of transactions per second rather than
just measuring the time.

* Solaris compatibility, thanks to Rodrigo Rubira Branco
<[EMAIL PROTECTED]> for frequent patches and testing.

* rand() was limiting scalability, use cheap dumb inline "random"
function to avoid that.  Thanks to Brian Twichell
<[EMAIL PROTECTED]> for finding it and Yong Cai
<[EMAIL PROTECTED]> for testing.
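
A sketch of the kind of cheap per-thread generator the last ChangeLog entry refers to. A library rand() commonly keeps one global state that every thread contends on, whereas a generator whose state is a local variable needs no sharing at all. The constants are the well-known linear congruential example from the C standard; this is an assumption for illustration, not ebizzy's actual function, and the names are invented.

```python
MODULUS = 2 ** 31


def cheap_random(state):
    """Advance a linear congruential generator by one step."""
    return (state * 1103515245 + 12345) % MODULUS


def random_indexes(seed, count, limit):
    """Yield `count` pseudo-random indexes below `limit` from private state."""
    state = seed
    for _ in range(count):
        state = cheap_random(state)
        # Use the high bits; the low bits of an LCG have short periods.
        yield (state >> 16) % limit
```

Each thread would call random_indexes() with its own seed, so no state is shared between threads.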


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-11 Thread Valerie Henson
On Wed, Aug 08, 2007 at 05:54:57PM -0700, Martin Bligh wrote:
> Andrew Morton wrote:
> >On Wed, 08 Aug 2007 14:10:15 -0700
> >"Martin J. Bligh" <[EMAIL PROTECTED]> wrote:
> >
> >>Why isn't this easily fixable by just adding an additional dirty
> >>flag that says atime has changed? Then we only cause a write
> >>when we remove the inode from the inode cache, if only atime
> >>is updated.
> >
> >I think that could be made to work, and it would fix the performance
> >issue.
> >
> >It is a behaviour change.  At present ext3 (for example) commits everything
> >every five seconds.  After a change like this, a crash+recovery could cause
> >a file's atime to go backwards by an arbitrarily large time interval - it
> >could easily be months.
> 
> A second pdflush / workqueue at a slower rate would alleviate that.

This becomes delayed atime writes.  I'm not sure whether it's better to
batch up the writes and do them all in one big seeky go, or to trickle
them out as they are done.  Best of all is not to do them at all.

Note, when talking about saving up atime updates to write out later,
that the final write is going to be sloow.  Inodes are typically 128
bytes, and you may have to do a seek between every one.  Current disks
can do on the order of 100 seeks a second.  So do a find on 1000 files
and you've just created 10 seconds of I/O hanging out in memory.
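
The arithmetic above, as a small sketch. The worst case uses the figures from the message (one seek per inode, about 100 seeks a second). The best case assumes 4096-byte blocks with perfectly packed dirty inodes; that block size is an assumption, not something the message states.

```python
import math


def flush_seconds(dirty_inodes, seeks_per_second=100, inodes_per_seek=1):
    """Time to write back dirty inodes when every write costs one seek."""
    seeks = math.ceil(dirty_inodes / inodes_per_seek)
    return seeks / seeks_per_second


worst_case = flush_seconds(1000)  # one seek per inode, as in the message
# Assumption: 4096-byte blocks, so 4096 // 128 = 32 inodes share a block.
best_case = flush_seconds(1000, inodes_per_seek=4096 // 128)
```

A find over a real directory tree would fall somewhere between the two, depending on how the touched inodes are scattered across inode table blocks.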

-VAL


Re: [TULIP] Need new maintainer

2007-07-30 Thread Valerie Henson
On Mon, Jul 30, 2007 at 03:31:58PM -0400, Kyle McMartin wrote:
> On Mon, Jul 30, 2007 at 01:04:13PM -0600, Valerie Henson wrote:
> > The Tulip network driver needs a new maintainer!  I no longer have
> > time to maintain the Tulip network driver and I'm stepping down.  Jeff
> > Garzik would be happy to get volunteers.
> > 
> 
> Since I already take care of a major consumer of these devices (parisc,
> which pretty much all have tulip) I'm willing to take care of this.
> Alternately, Grant is probably willing.

And I coulda handed you a suitcase full of cards and I missed my
chance!

It's fine by me, although Jeff is the final arbiter.

Thanks!

-VAL


[PATCH] tulip: Remove tulip maintainer

2007-07-30 Thread Valerie Henson
Remove Val Henson as tulip maintainer and let her roam free, FREE!

Signed-off-by: Val Henson <[EMAIL PROTECTED]>

--- linux-2.6.orig/MAINTAINERS
+++ linux-2.6/MAINTAINERS
@@ -3569,11 +3569,9 @@ W:   http://www.auk.cx/tms380tr/
 S: Maintained
 
 TULIP NETWORK DRIVER
-P: Valerie Henson
-M: [EMAIL PROTECTED]
 L: [EMAIL PROTECTED]
 W: http://sourceforge.net/projects/tulip/
-S: Maintained
+S: Orphan
 
 TUN/TAP driver
 P: Maxim Krasnyansky


[TULIP] Need new maintainer

2007-07-30 Thread Valerie Henson
The Tulip network driver needs a new maintainer!  I no longer have
time to maintain the Tulip network driver and I'm stepping down.  Jeff
Garzik would be happy to get volunteers.

The only current major outstanding patch I know of is Grant's shutdown
race patch, which was incorrectly dropped as obsoleted from -mm (my
fault, I was moving at the time):

http://www.mail-archive.com/[EMAIL PROTECTED]/msg12161.html

I have a very much non-working patch to do it with the preferred
order; ask me for it and I'll see if I can dig it up.  It's unpleasant
partly because it pointed out a lot of latent bugs (e.g.,
del_timer_sync() in interrupt context).

Also, someone is working on support for an emulated Tulip card (yes,
Tulip will _never_ die), so expect possible patches for that.

-VAL


Cross-chunk reference checking time estimates

2007-05-31 Thread Valerie Henson
Hey all,

I altered Karuna's cref tool to print the number of seconds it would
take to check the cross-references for a chunk.  The results look good
for chunkfs: on my laptop /home file system and a 1 GB chunk size, the
per-chunk cross-reference check time would be an average of 5 seconds
and a max of 160 seconds in 2013.  This is calculated assuming average
seek time and rotational latency delay for every cross-reference
checked; some simple batching of I/Os could significantly improve
that.

The tool is a little dodgy on error handling and other edge cases ATM,
but for now, here's the results and the code (attached):

[EMAIL PROTECTED]:~/chunkfs/cref_new$ sudo ./cref.sh /dev/hda3 dump /home 1024
Total size = 19535040 KB
Total data stored = 13998240 KB
Number of files = 445406
Number of directories = 31836
Number of special files = 12156
Size of block groups = 1048576 KB
Inodes per block group = 130304
Intra-file cross references = 63167
Directory-subdirectory references = 429
Directory-file references = 2381
Total directory cross references = 2810
Total cross references = 65977
Total cross references = 65977
Average cross references per group = 439
Maximum cross references in a group = 13997
Max group is 4 (0:3, 1:46, 2:282, 3:4996, 5:8445, 6:2, 7:1, 8:27, 9:1, 10:2, 12:1, 13:51, 14:32, 15:99, 16:2, 17:5, 18:2, )
Average additional time to check cross references = 6.77 s
Max additional time to check cross references = 215.55 s
2013 average additional time to check cross references = 4.93 s
2013 max additional time to check cross references = 156.77 s
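
The estimate amounts to charging one seek plus rotational delay per cross-reference. As a rough sketch, the per-reference costs below are back-derived from the "Max" lines of the output (215.55 s and 156.77 s over 13997 references); they are inferred values, not constants read from the cref source, and the function name is made up.

```python
def check_seconds(cross_refs, ms_per_ref):
    """Cost of following each cross-reference with one seek plus rotation."""
    return cross_refs * ms_per_ref / 1000.0


MAX_REFS = 13997           # "Maximum cross references in a group"
MS_PER_REF_NOW = 15.4      # inferred: 215.55 s / 13997 refs
MS_PER_REF_2013 = 11.2     # inferred: 156.77 s / 13997 refs

now = round(check_seconds(MAX_REFS, MS_PER_REF_NOW), 2)
later = round(check_seconds(MAX_REFS, MS_PER_REF_2013), 2)
```

Because the cost is linear in the number of references, batching several references into one I/O would reduce the total in proportion to the batch size.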

Questions?  Come talk on #linuxfs at irc.oftc.net.

-VAL


cref_new.tar.gz
Description: GNU Zip compressed data


[PATCH] Update tulip maintainer email address

2007-05-30 Thread Valerie Henson
I've quit Intel and gone into business as a Linux consultant.  Update
my email address in MAINTAINERS.

Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>

--- laptop-2.6.orig/MAINTAINERS
+++ laptop-2.6/MAINTAINERS
@@ -3497,7 +3497,7 @@ S: Maintained

 TULIP NETWORK DRIVER
 P: Valerie Henson
-M: [EMAIL PROTECTED]
+M: [EMAIL PROTECTED]
 L: [EMAIL PROTECTED]
 W: http://sourceforge.net/projects/tulip/
 S: Maintained


Re: ChunkFS - measuring cross-chunk references

2007-05-06 Thread Valerie Henson
On Mon, Apr 23, 2007 at 02:05:47AM +0530, Karuna sagar K wrote:
> Hi,
> 
> The attached code contains a program to estimate the cross-chunk
> references for ChunkFS file system (idea from Valh). Below are the
> results:

Nice work!  Thank you very much for doing this!

-VAL


Re: ChunkFS - measuring cross-chunk references

2007-05-06 Thread Valerie Henson
On Mon, Apr 23, 2007 at 02:53:33PM -0600, Andreas Dilger wrote:
> 
> Also, is it considered a cross-chunk reference if a directory entry is
> referencing an inode in another group?  Should there be a continuation
> inode in the local group, or is the directory entry itself enough?

(Sorry for the delay; just moved to Portland these last couple of
weeks.)

It is a cross-chunk reference - we can't calculate the correct link
count for the target file unless we have a quick way to get all the
directory entries pointing to an inode.  My current scheme is to
create a continuation inode for the directory in the chunk containing
the inode (if the chunk containing the inode is full, create new
continuation inodes for both in a new chunk).
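
A toy model of the scheme just described, under the assumption that a directory entry is stored in the chunk holding its target inode (in a continuation inode of the directory when the chunks differ). All names are hypothetical, and the full-chunk case is left out. The point is only that link counts can then be verified one chunk at a time.

```python
from collections import defaultdict


class Chunk:
    """Toy chunk: inode link counts plus the directory entries stored here."""

    def __init__(self):
        self.link_count = {}              # inode number -> stored link count
        self.entries = defaultdict(list)  # directory key -> [target inode]


def add_entry(chunks, dir_chunk, dir_ino, target_chunk, target_ino):
    """Record a directory entry in the chunk that holds the target inode."""
    chunk = chunks[target_chunk]
    # When the chunks differ, this key stands for the directory's
    # continuation inode inside the target's chunk.
    chunk.entries[(dir_chunk, dir_ino)].append(target_ino)
    chunk.link_count[target_ino] = chunk.link_count.get(target_ino, 0) + 1


def bad_link_counts(chunk):
    """Check link counts using only data stored in this one chunk."""
    seen = defaultdict(int)
    for targets in chunk.entries.values():
        for ino in targets:
            seen[ino] += 1
    return {ino: (stored, seen[ino])
            for ino, stored in chunk.link_count.items()
            if stored != seen[ino]}
```

bad_link_counts() never looks outside its own chunk, which is the property the continuation inode is meant to provide.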

-VAL


Re: Ext3 vs NTFS performance

2007-05-04 Thread Valerie Henson
On Fri, May 04, 2007 at 08:23:08AM -0400, Theodore Tso wrote:
> On Thu, May 03, 2007 at 02:14:52PM -0700, Valerie Henson wrote:
> 
> > I'd really like to see a generic VFS-level detection of
> > read()/write()/creat()/mkdir()/etc. patterns which could detect things
> > like "Oh, this file is likely to be deleted immediately, wait and see
> > if it goes away and don't bother sending it on to the FS immediately"
> > or "Looks like this file will grow pretty big, let's go pre-allocate
> > some space for it."  This is probably best done as a set of helper
> > functions in the usual way.
> 
> What patterns do you think mean things like "this file is likely to
> be deleted immediately", or "this file will grow pretty big"?  I don't
> think there are any that would be generally valid.

I wouldn't have guessed that either, but it turns out there are:

http://www.eecs.harvard.edu/~ellard/pubs/able-usenix04.pdf

We present evidence that attributes that are known to
the file system when a file is created, such as its name,
permission mode, and owner, are often strongly related
to future properties of the file such as its ultimate size,
lifespan, and access pattern. More importantly, we show
that we can exploit these relationships to automatically
generate predictive models for these properties, and that
these predictions are sufficiently accurate to enable
optimizations.

For example, lock files have predictable names and permissions, and
live for a fraction of a second in most cases.  Files which are appended
a few hundred bytes at a time are probably log files and will continue
to grow in this manner.  Some of their predictions were 98% accurate!

In any case, any predictive algorithms we already do at the file
system level can be done at the VFS level, and shared between file
systems, instead of being reimplemented over and over again.  Just
food for thought.
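
A minimal sketch of what such a shared helper might look like, written
in Python for brevity rather than as kernel code.  The name patterns,
thresholds, and function names are all hypothetical; the paper derives
its models from traces and also uses permission mode and owner, which
this sketch ignores.

```python
from fnmatch import fnmatchcase

# Hypothetical hand-written rules; the paper derives its models from traces.
LOCK_PATTERNS = ("*.lock", "*.lck", "lock.*")


def predict_at_create(name):
    """Guess a file's likely behaviour from its name alone."""
    if any(fnmatchcase(name, pattern) for pattern in LOCK_PATTERNS):
        return "short-lived"
    return "unknown"


def looks_like_log(writes, max_append=512, min_writes=4):
    """True if every write so far appended a small record at end of file.

    writes is a list of (offset, length) pairs in the order issued.
    """
    end = 0
    for offset, length in writes:
        if offset != end or length > max_append:
            return False
        end = offset + length
    return len(writes) >= min_writes
```

A file system could consult helpers of this kind when deciding whether
to delay allocation or to reserve space ahead of a growing file.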

-VAL


Re: Ext3 vs NTFS performance

2007-05-03 Thread Valerie Henson
On Thu, May 03, 2007 at 01:44:14AM +1000, David Chinner wrote:
> On Tue, May 01, 2007 at 01:43:18PM -0700, Cabot, Mason B wrote:
> > Hello all,
> > 
> > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > video workloads. The Windows CIFS client will attempt a poor-man's
> > pre-allocation of the file on the server by sending 1-byte writes at
> > 128K-byte strides, breaking block allocation on ext3 and leading to
> > fragmentation and poor performance. This will happen for many
> > applications (including iTunes) as the CIFS client issues these
> > pre-allocates under the application layer.
> > 
> > I've posted a brief paper on Intel's OSS website
> > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> > it a read and let me know what you think. In particular, I'd like to
> > arrive at the right place to fix this problem: is it in the filesystem,
> > VFS, or Samba?
> 
> As I commented on IRC to Val Henson - the XFS performance indicates
> that it is not a VFS or Samba problem.

In terms of what piece of code we can swap out and get good
performance, the problem is indeed in ext3 - it's clear that the cause
of the bad performance is the 1-byte writes resulting in ext3
fragmenting the on-disk layout of the file, and replacing it with XFS
results in nice, clean, unfragmented files.
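
For anyone who wants to reproduce the write pattern locally, here is a
rough sketch in Python.  The report only says "1-byte writes at
128K-byte strides"; writing the last byte of each stride is an
assumption, and poor_mans_preallocate() is a made-up name, not anything
from the CIFS client or Samba.

```python
STRIDE = 128 * 1024  # 128K-byte stride, as described in the report


def poor_mans_preallocate(path, size, stride=STRIDE):
    """Write one byte per stride up to size; return the offsets written."""
    # Assumption: the byte written is the last one in each stride.
    offsets = list(range(stride - 1, size, stride))
    with open(path, "wb") as f:
        for offset in offsets:
            f.seek(offset)
            f.write(b"\0")
    return offsets
```

Running this against a file on the file system under test, then writing
the real data over it, should make it possible to compare the resulting
on-disk layout with that of a plain sequential write.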

But in terms of what we should do to fix it, there is the possibility
of some debate.  In general, I think there is a lot of code stuck down
in individual file systems - especially in XFS - that could be
usefully hoisted up to a higher level as generic helper functions.
For example, we've got at least two implementations of reservations,
one in XFS and one in ext3/4.  At least some of the code could be
generic - both file systems want to reserve long contiguous extents -
with the actual mechanics of looking up and reserving free blocks
implemented in per-fs code.

I'd really like to see a generic VFS-level detection of
read()/write()/creat()/mkdir()/etc. patterns which could detect things
like "Oh, this file is likely to be deleted immediately, wait and see
if it goes away and don't bother sending it on to the FS immediately"
or "Looks like this file will grow pretty big, let's go pre-allocate
some space for it."  This is probably best done as a set of helper
functions in the usual way.

For this particular case, Ted is probably right and the only place
we'll ever see this insane poor man's pre-allocate pattern is from the
Windows CIFS client, in which case fixing this in Samba makes sense -
although I'm a bit horrified by the idea of writing 128K of zeroes to
pre-allocate... oh well, it's temporary, and what we care about here
is the read performance, more than the write performance.

-VAL


Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

2007-05-01 Thread Valerie Henson
On Fri, Apr 27, 2007 at 11:06:47AM -0400, Jeff Dike wrote:
> On Thu, Apr 26, 2007 at 09:58:25PM -0700, Valerie Henson wrote:
> > Here's an example, spelled out:
> > 
> > Allocate file 1 in chunk A.
> > Grow file 1.
> > Chunk A fills up.
> > Allocate continuation inode for file 1 in chunk B.
> > Chunk A gets some free space.
> > Chunk B fills up.
> > Pick chunk A for allocating next block of file 1.
> > Try to look up a continuation inode for file 1 in chunk A.
> > Continuation inode for file 1 found in chunk A!
> > Attach newly allocated block to existing inode for file 1 in chunk A.
> 
> So far, so good (and the slides are helpful, tx!).  What happens when
> file 1 keeps growing and chunk A fills up (and chunk B is still full)?
> Can the same continuation inode also point at chunk C, where the file
> is going to grow to?

You allocate a new continuation inode in chunk C.  The rule is that
only inodes inside a chunk can point to blocks inside the chunk, so
you need an inode in C if you want to allocate blocks from C.

-VAL


Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

2007-04-28 Thread Valerie Henson
On Fri, Apr 27, 2007 at 12:53:34PM +0200, Jörn Engel wrote:
> 
> All this would get easier if continuation inodes were known to be rare.
> You can ditch the doubly-linked list in favor of a pointer to the main
> inode then - traversing the list again is cheap, after all.  And you can
> just try to read the same block once for every continuation inode.
> 
> If those lists can get long and you need a mapping from offset to
> continuation inode on the medium, you are basically fscked.  Storing the
> mapping requires space.  You need the mapping only when space (in some
> chunk) gets tight and you allocate continuation inodes.  So either you
> don't need the mapping or you don't have a good place to put it.

Any mapping structure will have to be pre-allocated.

> Having a mapping in memory is also questionable.  Either you scan the
> whole file on first access and spend a long time for large files.  Or
> you create the mapping on the fly.  In that case the page cache will
> already give you a 90% solution for free.

So in my secret heart of hearts, I do indeed hope that cnodes are rare
enough that we don't actually have to do anything smart to make them
go fast.  Either having no fast lookup structure or creating it in
memory as needed would be the nicest solution.  However, since I can't
guarantee this will be the case, it's nice to have some idea of what
we'll do if this does become important.

> You should spend a lot of effort trying to minimize cnodes. ;)

Yep.  It's much better to optimize away most cnodes instead of trying
to make them go fast.

-VAL


Re: ZFS with Linux: An Open Plea

2007-04-26 Thread Valerie Henson
On Wed, Apr 18, 2007 at 01:25:19PM -0400, Lennart Sorensen wrote:
> 
> Does it matter that google's recent report on disk failures indicated
> that SMART never predicted anything useful as far as they could tell?
> Certainly none of my drive failures ever had SMART make any kind of
> indication that anything was wrong.

I saw that talk, and that's not what I got out of it.  They found that
SMART error reports _did_ correlate with drive failure.  See page 8
of:

http://www.usenix.org/events/fast07/tech/full_papers/pinheiro/pinheiro.pdf

(If you're not a USENIX member, you may be able to find a free
download copy elsewhere.)

However, they found that the correlation was not strong enough to make
it economically feasible to replace disks reporting SMART failures,
since something like 70% of disks were still working a year after the
first failure report.  Also, they found that some disks failed without
any SMART error reports.

Now, Google keeps multiple copies (3 in GoogleFS, last I heard) of
data, so for them, "economically feasible" means something different
than for my personal laptop hard drive.  I have twice had my laptop
hard drive start spitting SMART errors and then die within a week.  It
is economically quite sensible for me to replace my laptop drive once
it has an error, since I don't carry around 3 laptops everywhere I go.

-VAL


Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

2007-04-26 Thread Valerie Henson
On Thu, Apr 26, 2007 at 10:47:38AM +0200, Jan Kara wrote:
>   Do I get it right that you just have in each cnode a pointer to the
> previous & next cnode? But then if two consecutive cnodes get corrupted,
> you have no way to connect the chain, do you? If each cnode contained
> some unique identifier of the file and a number identifying position of
> cnode,  then there would be at least some way (though expensive) to
> link them together correctly...

You're right, it's easy to add a little more redundancy that would
make it possible to recover from two consecutive nodes being
corrupted.  Keeping a parent inode id in each continuation inode is
definitely a smart thing to do.

Some minor side notes: Continuation inodes aren't really in any
defined order - if you look at Jeff's ping-pong chunk allocation
example, you'll see that the data in each continuation inode won't be
in linearly increasing order.  Also, while the current implementation
is a simple doubly-linked list, this may not be the best solution
long-term.  What's important is that each continuation inode have a
back pointer to the parent and that there is some structure for
quickly looking up the continuation inode for a given file offset.
Suggestions for data structures that work well in this situation are
welcome. :)
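
As a starting point for discussion, here is one possible in-memory
shape, sketched in Python.  It is not the current implementation: the
Cnode fields, the OffsetIndex class, and the assumption that extents
never overlap are all invented for illustration.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Cnode:
    parent_ino: int  # back pointer to the parent inode
    chunk: int       # chunk that holds this continuation inode
    extents: list    # (start_offset, length) byte ranges it owns


class OffsetIndex:
    """Map a file offset to the continuation inode that owns it."""

    def __init__(self, cnodes):
        # Flatten every extent and sort by start offset only.
        self._ranges = sorted(
            ((start, start + length, cnode)
             for cnode in cnodes
             for (start, length) in cnode.extents),
            key=lambda r: r[0],
        )
        self._starts = [r[0] for r in self._ranges]

    def lookup(self, offset):
        # Find the last extent starting at or before offset.
        i = bisect.bisect_right(self._starts, offset) - 1
        if i >= 0:
            start, end, cnode = self._ranges[i]
            if offset < end:
                return cnode
        return None  # hole in a sparse file, or past the end
```

Lookups cost O(log n) in the number of extents, and the index can be
rebuilt from the cnodes alone, so nothing extra has to live on disk.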

-VAL


Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

2007-04-26 Thread Valerie Henson
On Thu, Apr 26, 2007 at 12:05:04PM -0400, Jeff Dike wrote:
> 
> No, I'm referring to a different file.  The scenario is that you have
> a growing file in a nearly full disk with files being deleted (and
> thus space being freed) such that allocations for the growing file
> bounce back and forth between chunks.

This is an excellent question.  I call this the ping-pong problem.
The solution is as Amit describes: You have a maximum of one
continuation inode per file per chunk, and you require sparse files.
Here's an example, spelled out:

Allocate file 1 in chunk A.
Grow file 1.
Chunk A fills up.
Allocate continuation inode for file 1 in chunk B.
Chunk A gets some free space.
Chunk B fills up.
Pick chunk A for allocating next block of file 1.
Try to look up a continuation inode for file 1 in chunk A.
Continuation inode for file 1 found in chunk A!
Attach newly allocated block to existing inode for file 1 in chunk A.

This is why the file format inside each chunk needs to support sparse
files.
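
The steps above can be replayed with a toy model, sketched here in
Python.  This is not chunkfs code: the File class, the free-block
counters, and the caller choosing the chunk are assumptions made for
illustration, and the space taken by the continuation inode itself is
ignored.

```python
class File:
    def __init__(self, ino, home_chunk):
        self.ino = ino
        # chunk -> file block numbers held by this file's inode there
        self.inodes = {home_chunk: []}


def allocate_block(file, chunk, free, file_block):
    """Attach file_block to the file's inode in chunk.

    Returns True if a new continuation inode had to be created.
    """
    if free[chunk] == 0:
        raise RuntimeError("chunk %r is full" % (chunk,))
    created = chunk not in file.inodes
    if created:
        file.inodes[chunk] = []  # at most one inode per file per chunk
    file.inodes[chunk].append(file_block)
    free[chunk] -= 1
    return created
```

Replaying the example leaves the inode in chunk A holding blocks 0, 1
and 3 with a hole at block 2, which lives in chunk B.  That hole is the
reason the per-chunk format has to handle sparse files.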

I have a presentation that has a series of slides on problems and
potential resolutions that might help:

http://infohost.nmt.edu/~val/review/chunkfs_presentation.pdf

-VAL


Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

2007-04-25 Thread Valerie Henson
On Wed, Apr 25, 2007 at 05:38:34AM -0600, Andreas Dilger wrote:
> 
> The case where only a fsck of the corrupt chunk is done would not find the
> cnode references.  Maybe there needs to be per-chunk info which contains
> a list/bitmap of other chunks that have cnodes shared with each chunk?

Yes, exactly.  One might almost think you had solved this problem
before. :):):)

-VAL


Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

2007-04-25 Thread Valerie Henson
On Wed, Apr 25, 2007 at 08:54:34PM +1000, David Chinner wrote:
> On Tue, Apr 24, 2007 at 04:53:11PM -0500, Amit Gud wrote:
> > 
> > The structure looks like this:
> > 
> >  ----------       ----------
> > | cnode 0  |---->| cnode 0  |----> to another cnode or NULL
> >  ----------       ----------
> > | cnode 1  |---  | cnode 1  |---
> >  ----------    |  ----------    |
> > | cnode 2  |-  | | cnode 2  |-  |
> >  ----------  | |  ----------  | |
> > | cnode 3  | | | | cnode 3  | | |
> >  ----------  | |  ----------  | |
> >      |       | |      |       | |
> > 
> >     inodes           inodes or NULL
> 
> How do you recover if fsfuzzer takes out a cnode in the chain? The
> chunk is marked clean, but clearly corrupted and needs fixing and
> you don't know what it was pointing at.  Hence you have a pointer to
> a trashed cnode *somewhere* that you need to find and fix, and a
> bunch of orphaned cnodes that nobody points to *somewhere else* in
> the filesystem that you have to find. That's a full scan fsck case,
> isn't it?

Excellent question.  This is one of the trickier aspects of chunkfs -
the orphan inode problem (tricky, but solvable).  The problem is what
if you smash/lose/corrupt an inode in one chunk that has a
continuation inode in another chunk?  A back pointer does you no good
if the back pointer is corrupted.

What you do is keep tabs on whether you see damage that looks like
this has occurred - e.g., inode use/free counts wrong, you had to zero
a corrupted inode - and when this happens, you do a scan of all
continuation inodes in chunks that have links to the corrupted chunk.
What you need to make this go fast is (1) a pre-made list of which
chunks have links with which other chunks, (2) a fast way to read all
of the continuation inodes in a chunk (ignoring chunk-local inodes).
This stage is O(fs size) approximately, but it should be quite swift.
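
A rough sketch of that scan in Python, to make the two ingredients
concrete.  Everything here is hypothetical: the ChunkLinks class stands
in for the pre-made link list, and cnodes_by_chunk stands in for a fast
per-chunk listing of continuation inodes, each assumed to record the
chunk and inode number of its parent.

```python
from collections import defaultdict


class ChunkLinks:
    """Pre-made record of which chunks share continuation inodes."""

    def __init__(self):
        self._links = defaultdict(set)

    def add_link(self, a, b):
        self._links[a].add(b)
        self._links[b].add(a)

    def chunks_to_scan(self, damaged):
        return sorted(self._links.get(damaged, ()))


def find_orphans(damaged, links, cnodes_by_chunk, live_inodes):
    """List (chunk, cnode_id) whose parent in the damaged chunk is gone.

    cnodes_by_chunk: chunk -> {cnode_id: (parent_chunk, parent_ino)}
    live_inodes: inode numbers that survived in the damaged chunk
    """
    orphans = []
    for chunk in links.chunks_to_scan(damaged):
        for cnode_id, parent in cnodes_by_chunk.get(chunk, {}).items():
            parent_chunk, parent_ino = parent
            if parent_chunk == damaged and parent_ino not in live_inodes:
                orphans.append((chunk, cnode_id))
    return orphans
```

Only chunks linked to the damaged one are visited, and within those only
the continuation inodes are read.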

> It seems that any sort of damage to the underlying storage (e.g.
> media error, I/O error or user brain explosion) results in the need
> to do a full fsck and hence chunkfs gives you no benefit in this
> case.

I worry about this but so far haven't found something which couldn't
be cut down significantly with just a little extra work.  It might be
helpful to look at an extreme case.

Let's say we're incredibly paranoid.  We could be justified in running
a full fsck on the entire file system in between every single I/O.
After all, something *might* have been silently corrupted.  But this
would be ridiculously slow.  We could instead never check the file
system.  But then we would end up panicking and corrupting the file
system a lot.  So what's a good compromise?

In the chunkfs case, here are my rules of thumb so far:

1. Detection: All metadata has magic numbers and checksums.
2. Scrubbing: Random check of chunks when possible.
3. Repair: When we detect corruption, whether through a checksum error, a
   file system code assertion failure, or the hardware telling us we have
   a bug, check the chunk containing the error and any outside-chunk
   information that could be affected by it.
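
Rule 1 can be sketched roughly as follows. This is only an illustration
under assumptions: struct meta_block, META_MAGIC, and the simple
Fletcher-style checksum are all hypothetical, not the chunkfs on-disk
format, and a real file system would more likely use a CRC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define META_MAGIC 0x43484b46u	/* hypothetical magic value */

/* Hypothetical metadata block; not the chunkfs layout. */
struct meta_block {
	uint32_t magic;
	uint32_t csum;		/* checksum over payload[] */
	uint8_t  payload[56];
};

/* Simple Fletcher-style checksum over bytes, chosen only for brevity. */
static uint32_t meta_csum(const uint8_t *p, size_t len)
{
	uint32_t a = 0, b = 0;

	assert(p != NULL);
	for (size_t i = 0; i < len; i++) {
		a = (a + p[i]) % 65535u;
		b = (b + a) % 65535u;
	}
	return (b << 16) | a;
}

/* Stamp the magic number and checksum before the block is written out. */
static void meta_seal(struct meta_block *m)
{
	m->magic = META_MAGIC;
	m->csum = meta_csum(m->payload, sizeof(m->payload));
}

/* Returns 1 if the block looks intact, 0 if it should trigger a check
 * of the chunk that contains it. */
static int meta_ok(const struct meta_block *m)
{
	return m->magic == META_MAGIC &&
	       m->csum == meta_csum(m->payload, sizeof(m->payload));
}
```

A failed meta_ok() is the "checksum error" trigger in rule 3: it tells
you which chunk to check without scanning anything else first.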

-VAL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

2007-04-25 Thread Valerie Henson
On Wed, Apr 25, 2007 at 03:34:03PM +0400, Nikita Danilov wrote:
> 
> What is more important, design puts (as far as I can see) no upper limit
> on the number of continuation inodes, and hence, even if _average_ fsck
> time is greatly reduced, occasionally it can take more time than ext2 of
> the same size. This is clearly unacceptable in many situations (HA,
> etc.).

Actually, there is an upper limit on the number of continuation
inodes.  Each file can have a maximum of one continuation inode per
chunk. (This is why we need to support sparse files.)

-VAL


Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

2007-04-25 Thread Valerie Henson
On Tue, Apr 24, 2007 at 11:34:48PM +0400, Nikita Danilov wrote:
> 
> Maybe I failed to describe the problem precisely.
> 
> Suppose that all chunks have been checked. After that, for every inode
> I0 having continuations I1, I2, ... In, one has to check that every
> logical block is present in at most one of these inodes. For this one
> has to read I0, with all its indirect (double-indirect, triple-indirect)
> blocks, then read I1 with all its indirect blocks, etc. And to repeat
> this for every inode with continuations.
> 
> In the worst case (every inode has a continuation in every chunk) this
> obviously is as bad as un-chunked fsck. But even in the average case,
> total amount of io necessary for this operation is proportional to the
> _total_ file system size, rather than to the chunk size.

Fsck in chunkfs is still going to have an element that is proportional
to the file system size for certain cases.  However, that element will
be a great deal smaller than in a regular file system, except in the
most pathological cases.  If those pathological cases happen often,
then it's back to the drawing board.  My hunch is that they won't be
common.

-VAL


Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

2007-04-25 Thread Valerie Henson
On Wed, Apr 25, 2007 at 05:38:34AM -0600, Andreas Dilger wrote:
 
> The case where only a fsck of the corrupt chunk is done would not find the
> cnode references.  Maybe there needs to be per-chunk info which contains
> a list/bitmap of other chunks that have cnodes shared with each chunk?

Yes, exactly.  One might almost think you had solved this problem
before. :):):)

-VAL


Repair-driven file system design (was Re: ZFS with Linux: An Open Plea)

2007-04-16 Thread Valerie Henson
On Mon, Apr 16, 2007 at 01:07:05PM +1000, David Chinner wrote:
> On Sun, Apr 15, 2007 at 08:50:25PM -0400, Rik van Riel wrote:
>
> > IMHO chunkfs could provide a much more promising approach.
> 
> Agreed, that's one method of compartmentalising the problem.

Agreed, the chunkfs design is only one way to implement repair-driven
file system design - designing your file system to make file system
check and repair fast and easy.  I've written a paper on this idea,
which includes some interesting projections estimating that fsck will
take 10 times as long on the 2013 equivalent of a 2006 file system,
due entirely to changes in disk hardware.  So if your server currently
takes 2 hours to fsck, an equivalent server in 2013 will take about 20
hours.  Eek!  Paper here:

http://infohost.nmt.edu/~val/review/repair.pdf

While I'm working on chunkfs, I also think that all file systems
should strive for repair-driven design.  XFS has already made big
strides in this area (multi-threading fsck for multi-disk file
systems, for example) and I'm excited to see what comes next.

-VAL


[patch 1/4] [TULIP] fix for Lite-On 82c168 PNIC

2007-03-12 Thread Valerie Henson
From: Guido Classen <[EMAIL PROTECTED]>

This small patch fixes two issues with the Lite-On 82c168 PNIC adapters.
I've tested it with two cards in different machines, both chip rev 17.

The first is the wrong register address, CSR6, for writing the MII register;
the correct address is 0xB8 (this may get a symbol too?) (see similar existing
code at line 437) in tulip_core.c.

[Double-checked by Val Henson; yes, 0xB8 is correct register for
autonegotiate on this card.]

At least on my cards, bit 31 of the MII register seems to be
somewhat unstable. This results in reading wrong values from the PHY registers
and prevents the card from initializing correctly. I've added a little delay
and a second test of the bit. If the bit is still cleared, the read/write
process has definitely finished.

[Original patch slightly massaged by Val Henson]

Signed-off-by: Val Henson <[EMAIL PROTECTED]>
Cc: Guido Classen <[EMAIL PROTECTED]>
Signed-off-by: Grant Grundler <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>

---
 drivers/net/tulip/media.c      |   31 +++++++++++++++++++++++++++----
 drivers/net/tulip/tulip_core.c |    4 ++--
 2 files changed, 29 insertions(+), 6 deletions(-)

--- tulip-2.6-mm-linux.orig/drivers/net/tulip/tulip_core.c
+++ tulip-2.6-mm-linux/drivers/net/tulip/tulip_core.c
@@ -1701,8 +1701,8 @@ static int __devinit tulip_init_one (str
 			tp->nwayset = 0;
 			iowrite32(csr6_ttm | csr6_ca, ioaddr + CSR6);
 			iowrite32(0x30, ioaddr + CSR12);
-			iowrite32(0x0001F078, ioaddr + CSR6);
-			iowrite32(0x0201F078, ioaddr + CSR6); /* Turn on autonegotiation. */
+			iowrite32(0x0001F078, ioaddr + 0xB8);
+			iowrite32(0x0201F078, ioaddr + 0xB8); /* Turn on autonegotiation. */
 		}
 		break;
 	case MX98713:
--- tulip-2.6-mm-linux.orig/drivers/net/tulip/media.c
+++ tulip-2.6-mm-linux/drivers/net/tulip/media.c
@@ -76,8 +76,20 @@ int tulip_mdio_read(struct net_device *d
 		ioread32(ioaddr + 0xA0);
 		while (--i > 0) {
 			barrier();
-			if ( ! ((retval = ioread32(ioaddr + 0xA0)) & 0x80000000))
-				break;
+			if ( ! ((retval = ioread32(ioaddr + 0xA0))
+				& 0x80000000)) {
+				/*
+				 * Possible bug in 82c168 rev 17 -
+				 * sometimes bit 31 is unstable and
+				 * clears before actually finished.
+				 * Delay and check if bit 31 is still
+				 * cleared before believing it.
+				 */
+				udelay(10);
+				if ( ! ((retval = ioread32(ioaddr + 0xA0))
+					& 0x80000000))
+					break;
+			}
 		}
 		spin_unlock_irqrestore(&tp->mii_lock, flags);
 		return retval & 0xFFFF;
@@ -136,8 +148,19 @@ void tulip_mdio_write(struct net_device 
 		iowrite32(cmd, ioaddr + 0xA0);
 		do {
 			barrier();
-			if ( ! (ioread32(ioaddr + 0xA0) & 0x80000000))
-				break;
+			if ( ! (ioread32(ioaddr + 0xA0) & 0x80000000)) {
+				/*
+				 * Possible bug in 82c168 rev 17 -
+				 * sometimes bit 31 is unstable and
+				 * clears before actually finished.
+				 * Delay and check if bit 31 is still
+				 * cleared before believing it.
+				 */
+				udelay(10);
+				if ( ! (ioread32(ioaddr + 0xA0)
+					& 0x80000000))
+					break;
+			}
 		} while (--i > 0);
 		spin_unlock_irqrestore(&tp->mii_lock, flags);
 		return;

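The double-check added by this patch can be tried outside the kernel.
What follows is a minimal user-space sketch, assuming a caller-supplied
read function in place of ioread32(ioaddr + 0xA0). The names
poll_until_stable_clear, reg_read_fn, fake_reg and fake_read are
hypothetical, the udelay(10) between the two reads is left out, and the
timeout return value is a choice made for this sketch rather than what
the driver does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MII_BUSY 0x80000000u	/* bit 31 of the MII register */

/* Stand-in for ioread32(ioaddr + 0xA0); hypothetical, for testing. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/* Poll until bit 31 reads clear on two consecutive reads.  Returns the
 * second read's value, or a value with MII_BUSY set if the poll budget
 * runs out (the driver itself has no such error return). */
static uint32_t poll_until_stable_clear(reg_read_fn rd, void *ctx, int budget)
{
	uint32_t val = MII_BUSY;

	assert(rd != NULL);
	while (--budget > 0) {
		val = rd(ctx);
		if (!(val & MII_BUSY)) {
			val = rd(ctx);	/* second look before believing it */
			if (!(val & MII_BUSY))
				return val;
		}
	}
	return val | MII_BUSY;
}

/* Fake register that replays a fixed sequence of values, then repeats
 * the last one forever. */
struct fake_reg {
	const uint32_t *seq;
	int len;
	int pos;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;
	uint32_t v = r->seq[r->pos];

	if (r->pos < r->len - 1)
		r->pos++;
	return v;
}
```

Feeding fake_read a sequence where bit 31 clears for one read and then
sets again shows the point of the second test: a single-read loop would
stop at the glitch, while this one keeps polling until the bit stays
clear.
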


[patch 4/4] [TULIP] Rev tulip version

2007-03-12 Thread Valerie Henson
Rev tulip version... things have changed since 2002!

Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>

---
 drivers/net/tulip/tulip_core.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- tulip-2.6-mm-linux.orig/drivers/net/tulip/tulip_core.c
+++ tulip-2.6-mm-linux/drivers/net/tulip/tulip_core.c
@@ -17,11 +17,11 @@
 
 #define DRV_NAME	"tulip"
 #ifdef CONFIG_TULIP_NAPI
-#define DRV_VERSION	"1.1.14-NAPI" /* Keep at least for test */
+#define DRV_VERSION	"1.1.15-NAPI" /* Keep at least for test */
 #else
-#define DRV_VERSION	"1.1.14"
+#define DRV_VERSION	"1.1.15"
 #endif
-#define DRV_RELDATE	"May 11, 2002"
+#define DRV_RELDATE	"Feb 27, 2007"
 
 
 #include <linux/module.h>



[patch 0/4] [TULIP] Tulip updates

2007-03-12 Thread Valerie Henson
This patch set includes a fix for Lite-on from Guido Classen, some
minor debugging/typo fixes, and a long-needed rev to the version (the
last time this was done was 2002!).

-VAL



[patch 2/4] [TULIP] Quiet down tulip_stop_rxtx

2007-03-12 Thread Valerie Henson
Only print out debugging info for tulip_stop_rxtx if debug is on.
Many cards (including at least two of my own) fail to stop properly
during initialization according to this test with no apparent ill
effects.  Worse, it tends to spam logs when the driver doesn't work.

Signed-off-by: Val Henson <[EMAIL PROTECTED]>
Signed-off-by: Grant Grundler <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>

---
 drivers/net/tulip/tulip.h |2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

--- tulip-2.6-mm-linux.orig/drivers/net/tulip/tulip.h
+++ tulip-2.6-mm-linux/drivers/net/tulip/tulip.h
@@ -481,7 +481,7 @@ static inline void tulip_stop_rxtx(struc
 		while (--i && (ioread32(ioaddr + CSR5) & (CSR5_TS|CSR5_RS)))
 			udelay(10);
 
-		if (!i)
+		if (!i && (tulip_debug > 1))
 			printk(KERN_DEBUG "%s: tulip_stop_rxtx() failed"
 					" (CSR5 0x%x CSR6 0x%x)\n",
 					pci_name(tp->pdev),



[patch 3/4] [TULIP] Fix SytemError typo

2007-03-12 Thread Valerie Henson
Fix an annoying typo - SytemError -> SystemError

Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>

---
 drivers/net/tulip/interrupt.c   |4 ++--
 drivers/net/tulip/tulip.h   |2 +-
 drivers/net/tulip/winbond-840.c |2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

--- tulip-2.6-mm-linux.orig/drivers/net/tulip/interrupt.c
+++ tulip-2.6-mm-linux/drivers/net/tulip/interrupt.c
@@ -675,7 +675,7 @@ irqreturn_t tulip_interrupt(int irq, voi
 				if (tp->link_change)
 					(tp->link_change)(dev, csr5);
 			}
-			if (csr5 & SytemError) {
+			if (csr5 & SystemError) {
 				int error = (csr5 >> 23) & 7;
 				/* oops, we hit a PCI error.  The code produced corresponds
 				 * to the reason:
@@ -745,7 +745,7 @@ irqreturn_t tulip_interrupt(int irq, voi
 			  TxFIFOUnderflow |
 			  TxJabber |
 			  TPLnkFail |
-			  SytemError )) != 0);
+			  SystemError )) != 0);
 #else
 	} while ((csr5 & (NormalIntr|AbnormalIntr)) != 0);
 
--- tulip-2.6-mm-linux.orig/drivers/net/tulip/tulip.h
+++ tulip-2.6-mm-linux/drivers/net/tulip/tulip.h
@@ -132,7 +132,7 @@ enum pci_cfg_driver_reg {
 /* The bits in the CSR5 status registers, mostly interrupt sources. */
 enum status_bits {
 	TimerInt = 0x800,
-	SytemError = 0x2000,
+	SystemError = 0x2000,
 	TPLnkFail = 0x1000,
 	TPLnkPass = 0x10,
 	NormalIntr = 0x10000,
--- tulip-2.6-mm-linux.orig/drivers/net/tulip/winbond-840.c
+++ tulip-2.6-mm-linux/drivers/net/tulip/winbond-840.c
@@ -1148,7 +1148,7 @@ static irqreturn_t intr_handler(int irq,
 	}
 
 	/* Abnormal error summary/uncommon events handlers. */
-	if (intr_status & (AbnormalIntr | TxFIFOUnderflow | SytemError |
+	if (intr_status & (AbnormalIntr | TxFIFOUnderflow | SystemError |
 			   TimerInt | TxDied))
 		netdev_error(dev, intr_status);
 



Re: Documenting MS_RELATIME

2007-02-12 Thread Valerie Henson
On Mon, Feb 12, 2007 at 09:53:18PM +0200, Petri Kaukasoina wrote:
> On Mon, Feb 12, 2007 at 06:49:39PM +0100, Jan Engelhardt wrote:
> > >The one problem with noatime is that mutt's 'new mail arrived' breaks
> > 
> > Just why does not it use mtime then to check for New Mail Arrived, like 
> 
> I have always used:
> 
>   --enable-buffy-size   Use file size attribute instead of access time
> 
> Support was there at least in 1998, maybe before.

Good point.  However, this works for mutt because new mail is an
append-only operation.  Other apps don't have the guarantee that file
modifications that they care about will change the file size.
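
To make that concrete, here is a small sketch comparing a size-based
check with an mtime-based one. It is a toy model under assumptions, not
mutt's code: struct file_snap and the helper functions are hypothetical,
and the two fields just stand in for what stat() would report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the stat() fields a checker might remember. */
struct file_snap {
	int64_t size;
	int64_t mtime;
};

/* Size-based check, in the spirit of --enable-buffy-size. */
static int changed_by_size(struct file_snap before, struct file_snap after)
{
	return after.size != before.size;
}

/* mtime-based check: notices an ordinary write whatever it does to the
 * file size. */
static int changed_by_mtime(struct file_snap before, struct file_snap after)
{
	return after.mtime != before.mtime;
}

/* Model an append of `len` bytes at time `now`. */
static struct file_snap snap_append(struct file_snap s, int64_t len, int64_t now)
{
	assert(len > 0);
	s.size += len;
	s.mtime = now;
	return s;
}

/* Model an in-place overwrite that leaves the length alone. */
static struct file_snap snap_overwrite(struct file_snap s, int64_t now)
{
	s.mtime = now;
	return s;
}
```

An append trips both checks, but an overwrite that keeps the length the
same is invisible to changed_by_size(), which is the case other apps
have to worry about.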

-VAL


Re: Documenting MS_RELATIME

2007-02-12 Thread Valerie Henson
On Mon, Feb 12, 2007 at 10:40:10AM -0500, Dave Jones wrote:
> 
> The one problem with noatime is that mutt's 'new mail arrived' breaks
> as you mentioned in the relatime changelog, so I'm surprised that
> they turned it on by default.  With relatime fixing that however,
> I'm also unaware of anything that breaks.   I'd be curious to
> do a Fedora test release with relatime, but I know the answer I'll
> get when I recommend we add it to our generated fstabs..
> 
> "If it's good enough, why isn't it the kernel default"
> 
> Hence my current line of questioning ;-)

Okay, I have to admit I used the normal atime semantics, exactly once.
Someone hacked my laptop about 4 years ago (back when I didn't have a
firewall and a remotely exploitable samba server was on by default in
some Red Hat install).  I pulled the plug on the network (no wireless
either) and figured out which files the attacker read, which gave me
some peace of mind. :)

Personally, I'd trade that for the performance/battery life/etc. of
relatime.

-VAL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Documenting MS_RELATIME

2007-02-12 Thread Valerie Henson
On Mon, Feb 12, 2007 at 10:40:10AM -0500, Dave Jones wrote:
 
 The one problem with noatime is that mutt's 'new mail arrived' breaks
 as you mentioned in the relatime changelog, so I'm surprised that
 they turned it on by default.  With relatime fixing that however,
 I'm also unaware of anything that breaks.   I'd be curious to
 do a Fedora test release with relatime, but I know the answer I'll
 get when I recommend we add it to our generated fstabs..
 
 If it's good enough, why isn't it the kernel default
 
 Hence my current line of questioning ;-)

Okay, I have to admit I used the normal atime semantics, exactly once.
Someone hacked my laptop about 4 years ago (back when I didn't have a
firewall and a remotely exploitable samba server was on by default in
some Red Hat install).  I pulled the plug on the network (no wireless
either) and figured out which files the attacker read, which gave me
some peace of mind. :)

Personally, I'd trade that for the performance/battery life/etc. of
relatime.

-VAL
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Documenting MS_RELATIME

2007-02-12 Thread Valerie Henson
On Mon, Feb 12, 2007 at 09:53:18PM +0200, Petri Kaukasoina wrote:
 On Mon, Feb 12, 2007 at 06:49:39PM +0100, Jan Engelhardt wrote:
  The one problem with noatime is that mutt's 'new mail arrived' breaks
  
  Just why does not it use mtime then to check for New Mail Arrived, like 
 
 I have always used:
 
   --enable-buffy-sizeUse file size attribute instead of access time
 
 Support was there at least in 1998, maybe before.

Good point.  However, this works for mutt because new mail is an
append-only operation.  Other apps don't have the guarantee that file
modifications that they care about will change the file size.

-VAL


Re: Documenting MS_RELATIME

2007-02-11 Thread Valerie Henson
On Sat, Feb 10, 2007 at 07:54:00PM -0500, Dave Jones wrote:
> 
> Whilst on the subject of RELATIME, is there any good reason why
> not to make this a default mount option ?

Ubuntu has been shipping with noatime as the default for some time
now, with no obvious problems (I'm running Ubuntu).  I see relatime as
an improvement on noatime.

-VAL


Re: Documenting MS_RELATIME

2007-02-11 Thread Valerie Henson
On Sat, Feb 10, 2007 at 09:56:07AM -0800, Michael Kerrisk wrote:
> Val,
> 
> I'm just updating the mount(2) man page for MS_RELATIME, and this is the
> text I've come up with:
> 
>    MS_RELATIME (Since Linux 2.6.20)
>           When a file on this file system is accessed, only
>           update the file's last accessed time (atime) if
>           the current value of atime is less than or equal
>           to the file's last modified (mtime) or last status
>           change time (ctime).  This option is useful for
>           programs, such as mutt(1), that need to know when
>           a file has been read since it was last modified.
> 
> This text is based on your comments accompanying the various patches, but
> it differs in a respect.  Your comments said that the atime would only be
> updated if the atime is older than mtime/ctime.  However, what the code
> actually does is update atime if it is <= mtime/ctime -- i.e., atime is
> older than *or equal to* mtime/ctime.
> 
> I'm sure that the code implements your intention, but before incorporating
> the above text I thought I just better check, since the code differs from
> your comment.  Can you just confirm that the proposed man page text is okay.

That's correct, yes.  Thanks!
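
For readers who want the rule spelled out, here is a minimal userspace sketch of the decision described in the proposed man page text. It is not the kernel code; ts_compare() and relatime_needs_update() are hypothetical names, with ts_compare() standing in for the kernel's timespec_compare().

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Returns <0, 0 or >0, in the spirit of the kernel's timespec_compare(). */
static int ts_compare(const struct timespec *a, const struct timespec *b)
{
    assert(a != NULL && b != NULL);
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_nsec != b->tv_nsec)
        return a->tv_nsec < b->tv_nsec ? -1 : 1;
    return 0;
}

/*
 * Relative atime rule as worded in the man page text: update atime
 * only if it is older than or equal to mtime or ctime.
 */
static bool relatime_needs_update(const struct timespec *atime,
                                  const struct timespec *mtime,
                                  const struct timespec *ctime)
{
    return ts_compare(atime, mtime) <= 0 ||
           ts_compare(atime, ctime) <= 0;
}
```

The "or equal" case matters when a file is written and read within the same timestamp granule: with a strict comparison that read would never be recorded.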

-VAL


Re: Relative atime (was Re: What's in ocfs2.git)

2006-12-08 Thread Valerie Henson
On Tue, Dec 05, 2006 at 08:58:02PM -0800, Andrew Morton wrote:
> That's the easy part.   How are we going to get mount(8) patched?

Karel, interested in taking a look at the following patch?  The kernel
bits are in -mm currently.

-VAL

Add the "relatime" (relative atime) option support to mount.  Relative
atime only updates the atime if the previous atime is older than the
mtime or ctime.  Like noatime, but useful for applications like mutt
that need to know when a file has been read since it was last
modified.

Cc: Adrian Bunk <[EMAIL PROTECTED]>
Cc: Al Viro <[EMAIL PROTECTED]>
Cc: Karel Zak <[EMAIL PROTECTED]>

Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>

---
 mount/mount.8           |    7 +++++++
 mount/mount.c           |    6 ++++++
 mount/mount_constants.h |    4 ++++
 3 files changed, 17 insertions(+)

--- util-linux-2.13-pre7.orig/mount/mount.8
+++ util-linux-2.13-pre7/mount/mount.8
@@ -586,6 +586,13 @@ access on the news spool to speed up new
 .B nodiratime
 Do not update directory inode access times on this filesystem.
 .TP
+.B relatime
+Update inode access times relative to modify or change time.  Access
+time is only updated if the previous access time was earlier than the
+current modify or change time. (Similar to noatime, but doesn't break
+mutt or other applications that need to know if a file has been read
+since the last time it was modified.)
+.TP
 .B noauto
 Can only be mounted explicitly (i.e., the
 .B \-a
--- util-linux-2.13-pre7.orig/mount/mount.c
+++ util-linux-2.13-pre7/mount/mount.c
@@ -164,6 +164,12 @@ static const struct opt_map opt_map[] = 
   { "diratime",0, 1, MS_NODIRATIME },  /* Update dir access times */
   { "nodiratime", 0, 0, MS_NODIRATIME },/* Do not update dir access times */
 #endif
+#ifdef MS_RELATIME
+  { "relatime", 0, 0, MS_RELATIME },   /* Update access times relative to
+  mtime/ctime */
+  { "norelatime", 0, 1, MS_RELATIME }, /* Update access time without regard
+  to mtime/ctime */
+#endif
   { NULL,  0, 0, 0 }
 };
 
--- util-linux-2.13-pre7.orig/mount/mount_constants.h
+++ util-linux-2.13-pre7/mount/mount_constants.h
@@ -57,6 +57,10 @@ if we have a stack or plain mount - moun
 #ifndef MS_VERBOSE
 #define MS_VERBOSE 0x8000  /* 32768 */
 #endif
+#ifndef MS_RELATIME
+#define MS_RELATIME   0x200000 /* 200000: Update access times relative
+                                  to mtime/ctime */
+#endif
 /*
  * Magic mount flag number. Had to be or-ed to the flag values.
  */
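
The opt_map entries in the patch pair each option name with an "invert" column, so that "relatime" sets the flag and "norelatime" clears it. The following is a rough, self-contained sketch of how such a table can be applied. The struct layout, field names and apply_option() are assumptions for illustration and not mount's real code; the value 1 << 21 is taken from the kernel's MS_RELATIME definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MS_RELATIME (1UL << 21)  /* same bit as the kernel's MS_RELATIME */

/* Hypothetical mirror of an opt_map row: name, skip, invert, mask. */
struct opt_entry {
    const char *name;
    int skip;            /* unused in this sketch */
    int invert;          /* nonzero: the option clears the bit */
    unsigned long mask;
};

static const struct opt_entry sketch_opts[] = {
    { "relatime",   0, 0, SKETCH_MS_RELATIME },
    { "norelatime", 0, 1, SKETCH_MS_RELATIME },
    { NULL,         0, 0, 0 }
};

/* Apply one option string to *flags; returns 0 if known, -1 otherwise. */
static int apply_option(const char *name, unsigned long *flags)
{
    assert(name != NULL && flags != NULL);
    for (const struct opt_entry *e = sketch_opts; e->name != NULL; e++) {
        if (strcmp(e->name, name) == 0) {
            if (e->invert)
                *flags &= ~e->mask;
            else
                *flags |= e->mask;
            return 0;
        }
    }
    return -1;
}
```

Keeping both spellings in one table means a later "norelatime" in an option string can override an earlier "relatime" without any special-case code.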


Re: [PATCH] drivers/net/tulip/: fix for Lite-On 82c168 PNIC (2.6.11)

2006-12-07 Thread Valerie Henson
Hi there, Guido,

Jeff resurrected this patch from the misty depths of the past.  I
double-checked the docs and the first bug fix is definitely correct.
The second part isn't in the docs, but seems reasonable.  Is this
still the patch you are using?  Any comments you want to add?

-VAL

> From: Guido Classen <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED], [EMAIL PROTECTED],
>   linux-net@vger.kernel.org, linux-kernel@vger.kernel.org
> Date: Fri, 01 Apr 2005 22:21:44 +0200
> Subject: [PATCH] drivers/net/tulip/: fix for Lite-On 82c168 PNIC (2.6.11)
> 
> Hi,
> 
> this small patch fixes two issues with the Lite-On 82c168 PNIC adapters.
> I've tested it with two cards in different machines, both chip rev 17.
> 
> The first is the wrong register address CSR6 for writing the MII register,
> which instead is 0xB8 (this may get a symbol too?) (see similar existing
> code at line 437 in tulip_core.c).
> 
> At least on my cards, bit 31 of the MII register seems to be
> somewhat unstable. This results in reading wrong values from the
> PHY registers and prevents the card from initializing correctly. I've
> added a little delay and a second test of the bit. If the bit is still
> cleared, the read/write process has definitely finished.
> 
> Cheers
>   Guido
> 
> Signed-off-by: Guido Classen <[EMAIL PROTECTED]>
> 
> diff -ru linux-2.6.11-org/drivers/net/tulip/tulip_core.c linux-2.6.11.2-pentium/drivers/net/tulip/tulip_core.c
> --- linux-2.6.11-org/drivers/net/tulip/tulip_core.c	2005-04-01 22:10:03.0 +0200
> +++ linux-2.6.11.2-pentium/drivers/net/tulip/tulip_core.c	2005-03-31 23:14:11.0 +0200
> @@ -1701,8 +1701,8 @@
>  			tp->nwayset = 0;
>  			iowrite32(csr6_ttm | csr6_ca, ioaddr + CSR6);
>  			iowrite32(0x30, ioaddr + CSR12);
> -			iowrite32(0x0001F078, ioaddr + CSR6);
> -			iowrite32(0x0201F078, ioaddr + CSR6); /* Turn on autonegotiation. */
> +			iowrite32(0x0001F078, ioaddr + 0xB8);
> +			iowrite32(0x0201F078, ioaddr + 0xB8); /* Turn on autonegotiation. */
>  		}
>  		break;
>  	case MX98713:
> diff -ru linux-2.6.11-org/drivers/net/tulip/media.c linux-2.6.11.2-pentium/drivers/net/tulip/media.c
> --- linux-2.6.11-org/drivers/net/tulip/media.c	2005-04-01 22:10:03.0 +0200
> +++ linux-2.6.11.2-pentium/drivers/net/tulip/media.c	2005-04-01 22:05:31.0 +0200
> @@ -74,8 +74,17 @@
>  			ioread32(ioaddr + 0xA0);
>  			while (--i > 0) {
>  				barrier();
> -				if ( ! ((retval = ioread32(ioaddr + 0xA0)) & 0x80000000))
> -					break;
> +				if ( ! ((retval = ioread32(ioaddr + 0xA0))
> +					& 0x80000000)) {
> +					/* bug in 82c168 rev 17?
> +					 * wait a little while and check if
> +					 * bit 31 is still cleared */
> +					udelay(10);
> +					if ( ! ((retval = ioread32(ioaddr + 0xA0))
> +						& 0x80000000)) {
> +						break;
> +					}
> +				}
>  			}
>  			spin_unlock_irqrestore(&tp->mii_lock, flags);
>  			return retval & 0xffff;
> @@ -153,8 +162,16 @@
>  		iowrite32(cmd, ioaddr + 0xA0);
>  		do {
>  			barrier();
> -			if ( ! (ioread32(ioaddr + 0xA0) & 0x80000000))
> -				break;
> +			if ( ! (ioread32(ioaddr + 0xA0) & 0x80000000)) {
> +				/* bug in 82c168 rev 17?
> +				 * wait a little while and check if
> +				 * bit 31 is still cleared */
> +				udelay(10);
> +				if ( ! (ioread32(ioaddr + 0xA0)
> +					& 0x80000000)) {
> +					break;
> +				}
> +			}
>  		} while (--i > 0);
>  		spin_unlock_irqrestore(&tp->mii_lock, flags);
>  		return;
> 


Re: Relative atime (was Re: What's in ocfs2.git)

2006-12-06 Thread Valerie Henson
On Tue, Dec 05, 2006 at 08:58:02PM -0800, Andrew Morton wrote:
> > On Mon, 4 Dec 2006 16:36:20 -0800 Valerie Henson <[EMAIL PROTECTED]> wrote:
> > Add "relatime" (relative atime) support.  Relative atime only updates
> > the atime if the previous atime is older than the mtime or ctime.
> > Like noatime, but useful for applications like mutt that need to know
> > when a file has been read since it was last modified.
> 
> That seems like a good idea.
> 
> I found touch_atime() to be rather putrid, so I hacked it around a bit.  The
> end result:

I like that rather better - add my:

Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>

> That's the easy part.   How are we going to get mount(8) patched?

Well, the nodiratime documentation got in. (I was going to add that as
part of this patch, but lo and behold.)

-VAL


Re: Relative atime (was Re: What's in ocfs2.git)

2006-12-04 Thread Valerie Henson
On Mon, Dec 04, 2006 at 04:36:20PM -0800, Valerie Henson wrote:
> On Mon, Dec 04, 2006 at 04:10:07PM -0800, Mark Fasheh wrote:
> > Hi Steve,
> > 
> > On Mon, Dec 04, 2006 at 10:54:53AM +0000, Steven Whitehouse wrote:
> > > > In the future, I'd like to see a "relative atime" mode, which functions
> > > > in the manner described by Valerie Henson at:
> > > > 
> > > > http://lkml.org/lkml/2006/8/25/380
> > > > 
> > > I'd like to second that. [adding Val Henson to the "to"] What (if
> > > anything) remains to be done before the relative atime patch is ready to
> > > go upstream? I'm happy to help out here if required,
> > Last time I looked at them, things seemed to be in pretty good shape - it
> > wasn't a very large patch series.

And the userland part.

-VAL

Add the "relatime" (relative atime) option support to mount.  Relative
atime only updates the atime if the previous atime is older than the
mtime or ctime.  Like noatime, but useful for applications like mutt
that need to know when a file has been read since it was last
modified.

Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>

---
 mount/mount.8           |    7 +++++++
 mount/mount.c           |    6 ++++++
 mount/mount_constants.h |    4 ++++
 3 files changed, 17 insertions(+)
--- util-linux-2.13-pre7.orig/mount/mount.8
+++ util-linux-2.13-pre7/mount/mount.8
@@ -586,6 +586,13 @@ access on the news spool to speed up new
 .B nodiratime
 Do not update directory inode access times on this filesystem.
 .TP
+.B relatime
+Update inode access times relative to modify or change time.  Access
+time is only updated if the previous access time was earlier than the
+current modify or change time. (Similar to noatime, but doesn't break
+mutt or other applications that need to know if a file has been read
+since the last time it was modified.)
+.TP
 .B noauto
 Can only be mounted explicitly (i.e., the
 .B \-a
--- util-linux-2.13-pre7.orig/mount/mount.c
+++ util-linux-2.13-pre7/mount/mount.c
@@ -164,6 +164,12 @@ static const struct opt_map opt_map[] =
   { "diratime",0, 1, MS_NODIRATIME },  /* Update dir access times */
   { "nodiratime", 0, 0, MS_NODIRATIME },/* Do not update dir access times */
 #endif
+#ifdef MS_RELATIME
+  { "relatime", 0, 0, MS_RELATIME },   /* Update access times relative to
+  mtime/ctime */
+  { "norelatime", 0, 1, MS_RELATIME }, /* Update access time without regard
+  to mtime/ctime */
+#endif
   { NULL,  0, 0, 0 }
 };

--- util-linux-2.13-pre7.orig/mount/mount_constants.h
+++ util-linux-2.13-pre7/mount/mount_constants.h
@@ -57,6 +57,10 @@ if we have a stack or plain mount - moun
 #ifndef MS_VERBOSE
 #define MS_VERBOSE 0x8000  /* 32768 */
 #endif
+#ifndef MS_RELATIME
+#define MS_RELATIME   0x200000 /* 200000: Update access times relative
+                                  to mtime/ctime */
+#endif
 /*
  * Magic mount flag number. Had to be or-ed to the flag values.
  */


Relative atime (was Re: What's in ocfs2.git)

2006-12-04 Thread Valerie Henson
On Mon, Dec 04, 2006 at 04:10:07PM -0800, Mark Fasheh wrote:
> Hi Steve,
> 
> On Mon, Dec 04, 2006 at 10:54:53AM +0000, Steven Whitehouse wrote:
> > > In the future, I'd like to see a "relative atime" mode, which functions
> > > in the manner described by Valerie Henson at:
> > > 
> > > http://lkml.org/lkml/2006/8/25/380
> > > 
> > I'd like to second that. [adding Val Henson to the "to"] What (if
> > anything) remains to be done before the relative atime patch is ready to
> > go upstream? I'm happy to help out here if required,
> Last time I looked at them, things seemed to be in pretty good shape - it
> wasn't a very large patch series.

Yep, the relative atime patch is tiny and pretty much done - just
needs some soak time in -mm and a little more review (cc'd Viro and
fsdevel).  Kernel patch against 2.6.18-rc4 appended, patch to mount
following. (Note that my web server suffered a RAID failure and my
patches page is unavailable till the restore finishes.)

-VAL

Add "relatime" (relative atime) support.  Relative atime only updates
the atime if the previous atime is older than the mtime or ctime.
Like noatime, but useful for applications like mutt that need to know
when a file has been read since it was last modified.

Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>

---
 fs/inode.c            |   11 ++++++++++-
 fs/namespace.c        |    5 ++++-
 include/linux/fs.h    |    1 +
 include/linux/mount.h |    1 +
 4 files changed, 16 insertions(+), 2 deletions(-)

--- linux-2.6.18-rc4-relatime.orig/fs/inode.c
+++ linux-2.6.18-rc4-relatime/fs/inode.c
@@ -1200,7 +1200,16 @@ void touch_atime(struct vfsmount *mnt, s
 		return;

 	now = current_fs_time(inode->i_sb);
-	if (!timespec_equal(&inode->i_atime, &now)) {
+	if (timespec_equal(&inode->i_atime, &now))
+		return;
+	/*
+	 * With relative atime, only update atime if the previous
+	 * atime is earlier than either the ctime or mtime.
+	 */
+	if (!mnt ||
+	    !(mnt->mnt_flags & MNT_RELATIME) ||
+	    (timespec_compare(&inode->i_atime, &inode->i_mtime) < 0) ||
+	    (timespec_compare(&inode->i_atime, &inode->i_ctime) < 0)) {
 		inode->i_atime = now;
 		mark_inode_dirty_sync(inode);
 	}
--- linux-2.6.18-rc4-relatime.orig/fs/namespace.c
+++ linux-2.6.18-rc4-relatime/fs/namespace.c
@@ -376,6 +376,7 @@ static int show_vfsmnt(struct seq_file *
 		{ MNT_NOEXEC, ",noexec" },
 		{ MNT_NOATIME, ",noatime" },
 		{ MNT_NODIRATIME, ",nodiratime" },
+		{ MNT_RELATIME, ",relatime" },
 		{ 0, NULL }
 	};
 	struct proc_fs_info *fs_infop;
@@ -1413,9 +1414,11 @@ long do_mount(char *dev_name, char *dir_
 		mnt_flags |= MNT_NOATIME;
 	if (flags & MS_NODIRATIME)
 		mnt_flags |= MNT_NODIRATIME;
+	if (flags & MS_RELATIME)
+		mnt_flags |= MNT_RELATIME;

 	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
-		   MS_NOATIME | MS_NODIRATIME);
+		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME);

 	/* ... and get the mountpoint */
 	retval = path_lookup(dir_name, LOOKUP_FOLLOW, &nd);
--- linux-2.6.18-rc4-relatime.orig/include/linux/fs.h
+++ linux-2.6.18-rc4-relatime/include/linux/fs.h
@@ -119,6 +119,7 @@ extern int dir_notify_enable;
 #define MS_PRIVATE	(1<<18)	/* change to private */
 #define MS_SLAVE	(1<<19)	/* change to slave */
 #define MS_SHARED	(1<<20)	/* change to shared */
+#define MS_RELATIME	(1<<21)	/* Update atime relative to mtime/ctime. */
 #define MS_ACTIVE	(1<<30)
 #define MS_NOUSER	(1<<31)

--- linux-2.6.18-rc4-relatime.orig/include/linux/mount.h
+++ linux-2.6.18-rc4-relatime/include/linux/mount.h
@@ -27,6 +27,7 @@ struct namespace;
 #define MNT_NOEXEC	0x04
 #define MNT_NOATIME	0x08
 #define MNT_NODIRATIME	0x10
+#define MNT_RELATIME	0x20

 #define MNT_SHRINKABLE	0x100

