Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-16 Thread Dave Jones
On Sun, Aug 05, 2007 at 11:00:29AM -0400, Theodore Tso wrote:

 > P.S.  Yet another alternative is to specify noatime on an individual
 > file/directory basis.  We've had this capability for a *long* time,
 > and if a distro were to set noatime for all files in certain
 > hierarchies (i.e., /usr/include) and certain top-level directories
 > (since the chattr +A flag is inherited)

This crossed my mind again earlier, and I went digging.
Can you explain how this works?

I've eyeballed the ext2/ext3 code, and feel like I'm missing something obvious.
I'm guessing that for, e.g., /usr/include/stdio.h, we check the inodes
for all four parts of the path, and if any of them are +A we avoid the
atime update?  If so, where does that inheritance happen in the code?

Dave

-- 
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-16 Thread Helge Hafting

Andi Kleen wrote:

> I always thought the right solution would be to just sync atime only
> very very lazily. This means if an inode is only dirty because of an
> atime update put it on a "only write out when there is nothing to do
> or the memory is really needed" list.

Seems like a good idea.  atimes will then be written only by
memory pressure - or umount.  The atimes could be wrong after
a crash, but losing atimes only is not something
I'd worry about.

Helge Hafting


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-14 Thread Helge Hafting

Christoph Hellwig wrote:

> Umm, no f**king way.  atime selection is 100% policy and belongs in
> userspace.  Add to that the problem that we can't actually re-enable
> atimes because of the way the vfs-level mount flags API is designed.
> Instead of doing such a fugly kernel patch just talk to the handful
> of distributions that matter to update their defaults.

Indeed.  Just change /bin/mount so it defaults to "noatime"
unless there is an explicit "atime". Similar for diratime.
Problem solved.
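
A minimal sketch of that defaulting rule, assuming mount options arrive as a comma-separated string. The function name resolve_atime_options is hypothetical and is not part of /bin/mount; only the option names atime, noatime, diratime and nodiratime come from mount(8).

```python
def resolve_atime_options(options):
    """Add noatime/nodiratime unless the caller asked for the opposite.

    Hypothetical helper for illustration; not an actual /bin/mount function.
    """
    opts = [o for o in options.split(",") if o]
    # Default to noatime unless atime or noatime was given explicitly.
    if "atime" not in opts and "noatime" not in opts:
        opts.append("noatime")
    # Same rule for directory access times.
    if "diratime" not in opts and "nodiratime" not in opts:
        opts.append("nodiratime")
    return ",".join(opts)


print(resolve_atime_options("rw"))        # rw,noatime,nodiratime
print(resolve_atime_options("rw,atime"))  # rw,atime,nodiratime
```

An explicit "atime" or "diratime" always wins, so existing fstab entries that rely on access times would keep working under this scheme.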

Helge Hafting


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-14 Thread Brice Figureau
On Tue, 2007-08-14 at 04:25 +0200, Andi Kleen wrote:
> On Tue, Aug 14, 2007 at 11:44:56AM +1000, Stewart Smith wrote:
> > > Since the database fits in RAM, the only kind of access Mysql is doing
> > > is writing to the innodb log, the mysql binlog and finally to the innodb
> > > database files.
> > > There are certainly a whole lot of fsync'ing happening.
> > 
> > yes. Keep in mind that the binlog grows in file size too... so this has
> > to sync all the metadata as well (ick, i know).

Back in the first days of my original bug report I moved the binlogs to
another disk and it didn't change anything about my issue.

On Tue, 2007-08-14 at 04:25 +0200, Andi Kleen wrote:
> It might be an interesting experiment to see if it still happens
> with the file system remounted as ext2. ext2 has a much more 
> benign fsync than ext3.

Is it possible to perform a live remount of the fs as ext2?

Besides that, the RAID card has battery-backed RAM in write-back mode,
and I was told that fsync doesn't really hurt in this case (moreover the fs is
mounted in journal=writeback mode).

I'll soon post blktrace files in the original bug report; these will show
exactly what the disk workload is in the baseline case _and_ in the
underload atypical case. Maybe that will help to shed some light on the
issue?

Anyway, thanks,
-- 
Brice Figureau <[EMAIL PROTECTED]>



Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-13 Thread Andi Kleen
On Tue, Aug 14, 2007 at 11:44:56AM +1000, Stewart Smith wrote:
> > Since the database fits in RAM, the only kind of access Mysql is doing
> > is writing to the innodb log, the mysql binlog and finally to the innodb
> > database files.
> > There are certainly a whole lot of fsync'ing happening.
> 
> yes. Keep in mind that the binlog grows in file size too... so this has
> to sync all the metadata as well (ick, i know).

It might be an interesting experiment to see if it still happens
with the file system remounted as ext2. ext2 has a much more 
benign fsync than ext3.

-Andi


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-13 Thread Stewart Smith
On Mon, 2007-08-06 at 10:40 +0200, Brice Figureau wrote:
> Mysql accesses its database files in O_DIRECT mode.

binlog is written using buffered IO.

for InnoDB, binlog is synced first, then innodb log. on restart (in 5.0)
these are synced back up so you don't get inconsistencies.

and from a quick look at the innobase source, only data file is using
O_DIRECT.

> Since the database fits in RAM, the only kind of access Mysql is doing
> is writing to the innodb log, the mysql binlog and finally to the innodb
> database files.
> There are certainly a whole lot of fsync'ing happening.

yes. Keep in mind that the binlog grows in file size too... so this has
to sync all the metadata as well (ick, i know).
-- 
Stewart Smith, Senior Software Engineer
MySQL AB, www.mysql.com
Office: +14082136540 Ext: 6616
VoIP: [EMAIL PROTECTED]
Mobile: +61 4 3 8844 332

Jumpstart your cluster:
http://www.mysql.com/consulting/packaged/cluster.html




Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-11 Thread Valerie Henson
On Wed, Aug 08, 2007 at 05:54:57PM -0700, Martin Bligh wrote:
> Andrew Morton wrote:
> >On Wed, 08 Aug 2007 14:10:15 -0700
> >"Martin J. Bligh" <[EMAIL PROTECTED]> wrote:
> >
> >>Why isn't this easily fixable by just adding an additional dirty
> >>flag that says atime has changed? Then we only cause a write
> >>when we remove the inode from the inode cache, if only atime
> >>is updated.
> >
> >I think that could be made to work, and it would fix the performance
> >issue.
> >
> >It is a behaviour change.  At present ext3 (for example) commits everything
> >every five seconds.  After a change like this, a crash+recovery could cause
> >a file's atime to go backwards by an arbitrarily large time interval - it
> >could easily be months.
> 
> A second pdflush / workqueue at a slower rate would alleviate that.

This becomes delayed atime writes.  I'm not sure whether it's better to
batch up the writes and do them all in one big seeky go, or to trickle
them out as they are done.  Best of all is not to do them at all.

Note when talking about saving up atime updates to write out that the
final write is going to be sloow.  Inodes are typically 128 bytes,
and you may have to do a seek between every one.  Current disks can
do on the order of 100 seeks a second.  So do a find on 1000 files and
you've just created 10 seconds of I/O hanging out in memory.
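
The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 128-byte inode size and the 100 seeks per second are the rough figures quoted above, the worst case of one seek per inode is assumed, and the name flush_cost is made up for the example.

```python
INODE_BYTES = 128        # typical on-disk inode size quoted above
SEEKS_PER_SECOND = 100   # rough seek rate quoted above


def flush_cost(dirty_inodes, seeks_per_second=SEEKS_PER_SECOND,
               inode_bytes=INODE_BYTES):
    """Return (seconds, payload_bytes), assuming one seek per dirty inode."""
    seconds = dirty_inodes / seeks_per_second
    payload = dirty_inodes * inode_bytes
    return seconds, payload


seconds, payload = flush_cost(1000)
print(f"{seconds:.0f} s of seeking to write {payload} bytes")
# prints "10 s of seeking to write 128000 bytes"
```

The payload is only about 125 KiB, so under these assumptions the cost is almost entirely seek time rather than data transfer.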

-VAL


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-10 Thread Valdis . Kletnieks
On Fri, 10 Aug 2007 00:04:45 EDT, Bill Davidsen said:

> > I never imagined that it was the 20%+ hit that is being described, and
> > with so little impact, or I would have switched to it across the board
> > years ago.
> >
> To get that magnitude you need a slow disk with a very fast CPU. It helps
> most on systems where the disk hardware is marginal or worse for the i/o
> load. Don't take that as typical.

I suspect that almost every single laptop with a Core2 Duo in it falls into
that classification, and it's getting worse every year, as we see more
disparity between CPU speeds (increasing) and disk seek times (basically nailed
to the floor for the last decade).





Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-10 Thread pointman
Updating the manual page mount(8) with an expanded description of atime/noatime
and adding nodiratime and data=<option> seems much more reasonable than hacking
the kernel because you want others to run their systems the way you think they
should.

Almost every web search for "linux fast disk" (or related words) references
noatime, and many ext3 specific documents explain the caching options.


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-09 Thread Bill Davidsen

Andi Kleen wrote:
> richard kennedy <[EMAIL PROTECTED]> writes:
>
> > This is on a standard desktop machine so there are lots of other
> > processes running on it, and although there is a degree of variability
> > in the numbers, they are very repeatable and your patch always
> > outperforms the stock mm2.
> > looks good to me
>
> iirc the goal of this is less to get better performance, but to avoid
> long user visible latencies.  Of course if it's faster it's great too,
> but that's only secondary.

What a trade-off, if you want to get rid of long latency you have to
live with better throughput. I can live with that. ;-)

Your point well taken, not the intent of the patch, but it may indicate
where a performance bottleneck happens as well.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-09 Thread Bill Davidsen

[EMAIL PROTECTED] wrote:
> On Sun, 5 Aug 2007, Diego Calleja wrote:
>
> > On Sun, 5 Aug 2007 09:13:20 +0200, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >
> > > Measurements show that noatime helps 20-30% on regular desktop
> > > workloads, easily 50% for kernel builds and much more than that (in
> > > excess of 100%) for file-read-intense workloads. We cannot just walk
> >
> > And as everybody knows, on servers it is a popular practice to disable it.
> > According to an interview with the kernel.org admins
> >
> > "Beyond that, Peter noted, "very little fancy is going on, and that is
> > good because fancy is hard to maintain." He explained that the only fancy
> > thing being done is that all filesystems are mounted noatime meaning that
> > the system doesn't have to make writes to the filesystem for files which
> > are simply being read, "that cut the load average in half."
> >
> > I bet that some people would consider such a performance hit a bug...
>
> actually, it's popular practice to disable it by people who know how big
> a hit it is and know how few programs use it.
>
> i've been a linux sysadmin for 10 years, and have known about noatime
> for at least 7 years, but I always thought of it in the category of 'use
> it only on your performance critical machines where you are trying to
> extract every ounce of performance, and keep an eye out for things
> misbehaving'
>
> I never imagined that it was the 20%+ hit that is being described, and
> with so little impact, or I would have switched to it across the board
> years ago.

To get that magnitude you need a slow disk with a very fast CPU. It helps
most on systems where the disk hardware is marginal or worse for the i/o
load. Don't take that as typical.

> I'll bet there are a lot of admins out there in the same boat.
>
> adding an option in the kernel to change the default sounds like a very
> good first step, even if the default isn't changed today.




--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot



Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-09 Thread Bill Davidsen

Andrew Morton wrote:
> On Wed, 08 Aug 2007 14:10:15 -0700
> "Martin J. Bligh" <[EMAIL PROTECTED]> wrote:
>
> > Why isn't this easily fixable by just adding an additional dirty
> > flag that says atime has changed? Then we only cause a write
> > when we remove the inode from the inode cache, if only atime
> > is updated.
>
> I think that could be made to work, and it would fix the performance
> issue.
>
> It is a behaviour change.  At present ext3 (for example) commits everything
> every five seconds.  After a change like this, a crash+recovery could cause
> a file's atime to go backwards by an arbitrarily large time interval - it
> could easily be months.

I would think that (really) updating atime on open would be enough,
hopefully without being too much. The "lazyatime" thing I was playing
with only updated on open, final close, write, and fork.

I like the idea of updating once in a while, but one of the benefits of
noatime is allowing drives to spin down via inactivity. If something
does get done in the area of less but non-zero atime tracking, perhaps
that could be taken into account. I have to check what "laptop_mode"
actually does, since my laptops are old installs.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-09 Thread Diego Calleja
On Thu, 09 Aug 2007 11:02:38 -0400, Chuck Ebbert <[EMAIL PROTECTED]> wrote:

> NT maintains atimes by default, at least up to XP. You have to edit the
> registry to turn them off, and it is a single global switch -- not per
> mountpoint like Unix.
> 
> And it makes a huge difference there, too.

In Windows Vista they've disabled atime updates by default.

And XP maintains atimes, but it uses a trick to avoid the performance
penalty we suffer in Linux, similar to what Andi Kleen suggested: they
keep atime updates in memory for one hour, and only sync to disk after
that time - of course they also sync it if there's an opportunity to do it, like
when updating mtime.
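
The behaviour described above can be modelled with a small toy class. This is a sketch of the idea only, not of how NTFS or any Linux filesystem implements it: the class name DeferredAtime, its methods and the write counter are all invented for illustration, and times are plain seconds.

```python
class DeferredAtime:
    """Toy model: atime stays in memory and reaches disk at most once per max_delay."""

    def __init__(self, max_delay=3600.0):
        self.max_delay = max_delay
        self.disk_atime = {}    # path -> atime as last written to disk
        self.pending = {}       # path -> (newest atime, time it first became dirty)
        self.inode_writes = 0   # number of simulated inode writes

    def read(self, path, now):
        # Record the access in memory only.
        _, dirty_since = self.pending.get(path, (None, now))
        self.pending[path] = (now, dirty_since)
        # Write it out once the inode has been atime-dirty for max_delay.
        if now - dirty_since >= self.max_delay:
            atime, _ = self.pending.pop(path)
            self.disk_atime[path] = atime
            self.inode_writes += 1

    def write(self, path, now):
        # An mtime update writes the inode anyway, so a pending atime rides along.
        if path in self.pending:
            atime, _ = self.pending.pop(path)
            self.disk_atime[path] = atime
        self.inode_writes += 1


fs = DeferredAtime()
for t in range(0, 3600, 60):            # one read per minute for an hour
    fs.read("/usr/include/stdio.h", t)
print(fs.inode_writes)                  # 0: nothing has reached the disk yet
fs.read("/usr/include/stdio.h", 3600)
print(fs.inode_writes)                  # 1: a single write after one hour
```

In this model sixty-one reads cost one inode write instead of sixty-one, at the price that a crash can lose up to an hour of atime history.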


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-09 Thread Chuck Ebbert
On 08/09/2007 02:25 AM, Lionel Elie Mamane wrote:
> 
>> yeah, it's really ugly. But otherwise i've got no real complaint
>> about ext3 - with the obligatory qualification that
>> "noatime,nodiratime" in /etc/fstab is a must. This speeds up things
>> very visibly (...). So for most file workloads we give Windows a
>> 20%-30% performance edge, for almost nothing.
> 
> It has been years since I used MS Windows much, but from my memories
> of those days, I was under the impression that it (at least the NT
> line, the only surviving line these days) also maintained "last
> accessed" times. Except I only ever saw it at "right now" because the
> file explorer ... accesses the file before getting this metadata or
> something like that (when you right-click on a file and ask for its
> properties). It has creation and last modification time, too.
> 

NT maintains atimes by default, at least up to XP. You have to edit the
registry to turn them off, and it is a single global switch -- not per
mountpoint like Unix.

And it makes a huge difference there, too.


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-09 Thread Lionel Elie Mamane
On Sat, Aug 04, 2007 at 06:37:33PM +0200, Ingo Molnar wrote:
> * Linus Torvalds <[EMAIL PROTECTED]> wrote:

>> The fact is, ext3 *sucks* at fsync. I hate hate hate it. It's
>> totally unusable, imnsho.

> yeah, it's really ugly. But otherwise i've got no real complaint
> about ext3 - with the obligatory qualification that
> "noatime,nodiratime" in /etc/fstab is a must. This speeds up things
> very visibly (...). So for most file workloads we give Windows a
> 20%-30% performance edge, for almost nothing.

It has been years since I used MS Windows much, but from my memories
of those days, I was under the impression that it (at least the NT
line, the only surviving line these days) also maintained "last
accessed" times. Except I only ever saw it at "right now" because the
file explorer ... accesses the file before getting this metadata or
something like that (when you right-click on a file and ask for its
properties). It has creation and last modification time, too.

So, if my memories are correct, there is no performance edge to be
conceded by having atime (but one to be gained by not having atime).

-- 
Lionel


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-09 Thread Lionel Elie Mamane
On Sat, Aug 04, 2007 at 06:37:33PM +0200, Ingo Molnar wrote:
 * Linus Torvalds [EMAIL PROTECTED] wrote:

 The fact is, ext3 *sucks* at fsync. I hate hate hate it. It's
 totally unusable, imnsho.

 yeah, it's really ugly. But otherwise i've got no real complaint
 about ext3 - with the obligatory qualification that
 noatime,nodiratime in /etc/fstab is a must. This speeds up things
 very visibly (...). So for most file workloads we give Windows a
 20%-30% performance edge, for almost nothing.

It has been years since I used MS Windows much, but from my memories
of my these days, I was under the impression that it (at least the NT
line, the only surviving line these days) also maintained last
accessed times. Except I only ever saw it at right now because the
file explorer ... accesses the file before getting this metadata or
something like that (when you right-click on a file and ask for its
properties). It has creation and last modification time, too.

So, if my memories are correct, there is no performance edge to be
conceded by having atime (but one to be gained by not having atime).

-- 
Lionel
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-09 Thread Chuck Ebbert
On 08/09/2007 02:25 AM, Lionel Elie Mamane wrote:
 
 yeah, it's really ugly. But otherwise i've got no real complaint
 about ext3 - with the obligatory qualification that
 noatime,nodiratime in /etc/fstab is a must. This speeds up things
 very visibly (...). So for most file workloads we give Windows a
 20%-30% performance edge, for almost nothing.
 
 It has been years since I used MS Windows much, but from my memories
 of my these days, I was under the impression that it (at least the NT
 line, the only surviving line these days) also maintained last
 accessed times. Except I only ever saw it at right now because the
 file explorer ... accesses the file before getting this metadata or
 something like that (when you right-click on a file and ask for its
 properties). It has creation and last modification time, too.
 

NT maintains atimes by default, at least up to XP. You have to edit the
registry to turn them off, and it is a single global switch -- not per
mountpoint like Unix.

And it makes a huge difference there, too.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-09 Thread Diego Calleja
El Thu, 09 Aug 2007 11:02:38 -0400, Chuck Ebbert [EMAIL PROTECTED] escribió:

 NT maintains atimes by default, at least up to XP. You have to edit the
 registry to turn them off, and it is a single global switch -- not per
 mountpoint like Unix.
 
 And it makes a huge difference there, too.

In windows Vista they've disabled atime updates by default.

And XP maintains atimes, but it uses a trick to avoid the performance
penalty we suffer in linux, similar to what Andi Kleen suggested: they
keep atime updates in memory for one hour, and only sync to disk after
that time - of course they also sync it if there's a oportunity to do it, like
when updating mtime.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-09 Thread Bill Davidsen

Andrew Morton wrote:

On Wed, 08 Aug 2007 14:10:15 -0700
Martin J. Bligh [EMAIL PROTECTED] wrote:


Why isn't this easily fixable by just adding an additional dirty
flag that says atime has changed? Then we only cause a write
when we remove the inode from the inode cache, if only atime
is updated.


I think that could be made to work, and it would fix the performance
issue.

It is a behaviour change.  At present ext3 (for example) commits everything
every five seconds.  After a change like this, a crash+recovery could cause
a file's atime to go backwards by an arbitrarily large time interval - it
could easily be months.

I would think that (really) updating atime on open would be enough, 
hopefully without being too much. The lazyatime thing I was playing 
with only updated on open, final close, write, and fork.


I like the idea of updating once in a while, but one of the benefits of 
noatime is allowing drives to spin down via inactivity. If something 
does get done in the area of less but non-zero atime tracking, perhaps 
that could be taken into account. I have to check what laptop_mode 
actually does, since my laptops are old installs.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-09 Thread Bill Davidsen

Andi Kleen wrote:

richard kennedy [EMAIL PROTECTED] writes:

This is on a standard desktop machine so there are lots of other
processes running on it, and although there is a degree of variability
in the numbers,they are very repeatable and your patch always out
performs the stock mm2.
looks good to me


iirc the goal of this is less to get better performance, but to avoid long
user visible latencies.  Of course if it's faster it's great too, but that's
only secondary.

What a trade-off, if you want to get rid of long latency you have to 
live with better throughput. I can live with that. ;-)


Your point is well taken: that is not the intent of the patch, but it may
indicate where a performance bottleneck happens as well.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-09 Thread Bill Davidsen

[EMAIL PROTECTED] wrote:

On Sun, 5 Aug 2007, Diego Calleja wrote:


On Sun, 5 Aug 2007 09:13:20 +0200, Ingo Molnar [EMAIL PROTECTED] wrote:


Measurements show that noatime helps 20-30% on regular desktop
workloads, easily 50% for kernel builds and much more than that (in
excess of 100%) for file-read-intense workloads. We cannot just walk



And as everybody knows, on servers it is popular practice to disable it.
According to an interview with the kernel.org admins:

Beyond that, Peter noted, very little fancy is going on, and that is good
because fancy is hard to maintain. He explained that the only fancy thing
being done is that all filesystems are mounted noatime meaning that the
system doesn't have to make writes to the filesystem for files which are
simply being read, that cut the load average in half.

I bet that some people would consider such performance hit a bug...



actually, it's popular practice to disable it by people who know how big 
a hit it is and know how few programs use it.


i've been a linux sysadmin for 10 years, and have known about noatime
for at least 7 years, but I always thought of it in the category of 'use
it only on your performance critical machines where you are trying to
extract every ounce of performance, and keep an eye out for things
misbehaving'


I never imagined that it was the 20%+ hit that is being described, and
with so little impact, or I would have switched to it across the board
years ago.


To get that magnitude you need a slow disk with a very fast CPU. It helps
most on systems where the disk hardware is marginal or worse for the i/o
load. Don't take that as typical.



I'll bet there are a lot of admins out there in the same boat.

adding an option in the kernel to change the default sounds like a very 
good first step, even if the default isn't changed today.




--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot



Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread david

On Sat, 4 Aug 2007, Ray Lee wrote:


On 8/4/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

On Sat, 4 Aug 2007, Ingo Molnar wrote:


At least on a surface level, your report has some similarities to
http://lkml.org/lkml/2007/5/21/84 . In that message, John Miller
mentions several things he tried without effect:

< - I increased the max allowed receive buffer through
< proc/sys/net/core/rmem_max and the application calls the right
< syscall. "netstat -su" does not show any "packet receive errors".


mercury1:/proc/sys/net/core# cat rmem_*
124928
131071
mercury1:/proc/sys/net/core# netstat -su
Udp:
697853177 packets received
10025642 packets to unknown port received.
191726680 packet receive errors
63194 packets sent
RcvbufErrors: 191726680
UdpLite:
mercury1:/proc/sys/net/core# echo "512000" >rmem_max


< - After getting "kernel: swapper: page allocation failure.
< order:0, mode:0x20", I increased /proc/sys/vm/min_free_kbytes


I have not seen any similar errors


< - ixgb.txt in kernel network documentation suggests to increase
< net.core.netdev_max_backlog to 30. This did not help.


mercury1:/proc/sys/net/core# cat netdev_*
300
1000
mercury1:/proc/sys/net/core# echo "30" >netdev_max_backlog


< - I also had to increase net.core.optmem_max, because the default
< value was too small for 700 multicast groups.


I'm not running multicast.


As they're all pretty simple to test, it may be worthwhile to give
them a shot just to rule things out.


unfortunately the load is not high enough right now to see a real
difference (it's only doing ~1400 logs/sec). I'll catch it at a higher load
point to see if these make any difference.


David Lang


Ray




Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Andi Kleen
Greg Trounson <[EMAIL PROTECTED]> writes:

> mount [fs] -o remount,noatime,nodiratime

nodiratime is implied in noatime.

> I get a compile time of 1m23.368s, a mere 6% improvement.

6% is nothing to sneeze at. A lot of optimizations would kill for less

-Andi


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread david

On Thu, 9 Aug 2007, Greg Trounson wrote:


 Measurements show that noatime helps 20-30% on regular desktop workloads,
 easily 50% for kernel builds and much more than that (in excess of 100%)
 for file-read-intense workloads. We cannot just walk past such a _huge_
 performance impact so easily without even reacting to the performance
 arguments, and i'm happy Ubuntu picked up noatime,nodiratime and is
 whipping up the floor with Fedora on the desktop.



Sorry I'm just not seeing those gains here.  With my filesystems mounted with 
atime defaults the Quake sources build in 1m28.856s.  A test with ls -ltu 
verifies that atime is working as expected.  When I remount my filesystems 
with:

mount [fs] -o remount,noatime,nodiratime
I get a compile time of 1m23.368s, a mere 6% improvement.

This is on a dual-core Athlon 4200+ box running 2.6.21, so I would have 
thought this to be close to a best-case file I/O test.


what sort of disks does this box have? and what filesystem? slower 
disks/filesystems can result in this showing a larger difference.


however 6% is a fairly significant gain.

David Lang


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Greg Trounson

Ingo Molnar wrote:

* Alan Cox <[EMAIL PROTECTED]> wrote:

People just need to know about the performance differences - very 
few realise its more than a fraction of a percent. I'm sure Gentoo 
will use relatime the moment anyone knows its > 5% 8)
noatime,nodiratime gave 50% of wall-clock kernel rpm build 
performance improvement for Dave Jones, on a beefy box. Unless i 
misunderstood what you meant under 'fraction of a percent' your 
numbers are _WAY_ off.

What numbers - I didn't quote any performance numbers ?


ok, i misunderstood your "very few realise its more than a fraction of a 
percent" sentence, i thought you were saying it's a fraction of a 
percent.


Measurements show that noatime helps 20-30% on regular desktop 
workloads, easily 50% for kernel builds and much more than that (in 
excess of 100%) for file-read-intense workloads. We cannot just walk 
past such a _huge_ performance impact so easily without even reacting to 
the performance arguments, and i'm happy Ubuntu picked up 
noatime,nodiratime and is whipping up the floor with Fedora on the 
desktop.




Sorry I'm just not seeing those gains here.  With my filesystems mounted with atime 
defaults the Quake sources build in 1m28.856s.  A test with ls -ltu verifies that atime is 
working as expected.  When I remount my filesystems with:

mount [fs] -o remount,noatime,nodiratime
I get a compile time of 1m23.368s, a mere 6% improvement.

This is on a dual-core Athlon 4200+ box running 2.6.21, so I would have thought this to be 
close to a best-case file I/O test.
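
For reference, the 6% figure follows directly from the two wall-clock times quoted above. A minimal Python sketch of that arithmetic (the helper name to_seconds is made up for illustration, and it assumes the stamps are in the usual "XmY.ZZZs" form printed by time):

```python
def to_seconds(stamp):
    """Convert a time(1)-style stamp such as '1m28.856s' to seconds."""
    minutes, rest = stamp.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


atime_build = to_seconds("1m28.856s")    # filesystems mounted with atime defaults
noatime_build = to_seconds("1m23.368s")  # remounted noatime,nodiratime

saved = atime_build - noatime_build
improvement = 100.0 * saved / atime_build

print(f"saved {saved:.3f}s, {improvement:.1f}% less wall-clock time")
```

That works out to roughly 5.5 seconds saved on an 89 second build, which is where the "6%" comes from.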


Greg


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Martin Bligh

Andrew Morton wrote:

On Wed, 08 Aug 2007 14:10:15 -0700
"Martin J. Bligh" <[EMAIL PROTECTED]> wrote:


Why isn't this easily fixable by just adding an additional dirty
flag that says atime has changed? Then we only cause a write
when we remove the inode from the inode cache, if only atime
is updated.


I think that could be made to work, and it would fix the performance
issue.

It is a behaviour change.  At present ext3 (for example) commits everything
every five seconds.  After a change like this, a crash+recovery could cause
a file's atime to go backwards by an arbitrarily large time interval - it
could easily be months.


A second pdflush / workqueue at a slower rate would alleviate that.

Yes, it's a semantic change ... but only in an incredibly small
corner-case ?



Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Alan Cox
On Wed, 08 Aug 2007 15:39:52 -0400
Jeff Garzik <[EMAIL PROTECTED]> wrote:

> Bill Davidsen wrote:
> > Being standards compliant is not an argument it's a design goal, a 
> > requirement. Standards compliance is like being pregnant, you are or you're 
> 
> Linux history says different.  There was always the "final 1%" of 
> compliance that required silliness we really did not want to bother with.

This isn't about the 1% however. Its about API and ABI. Changing the
default is a fairly evil ABI change. Telling everyone relatime is cool on
desktops and defaulting it in the distro is not an ABI change and is very
sensible


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Andrew Morton
On Wed, 08 Aug 2007 14:10:15 -0700
"Martin J. Bligh" <[EMAIL PROTECTED]> wrote:

> Why isn't this easily fixable by just adding an additional dirty
> flag that says atime has changed? Then we only cause a write
> when we remove the inode from the inode cache, if only atime
> is updated.

I think that could be made to work, and it would fix the performance
issue.

It is a behaviour change.  At present ext3 (for example) commits everything
every five seconds.  After a change like this, a crash+recovery could cause
a file's atime to go backwards by an arbitrarily large time interval - it
could easily be months.




Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Martin J. Bligh

Christoph Hellwig wrote:

On Sat, Aug 04, 2007 at 09:42:59PM +0200, Jörn Engel wrote:


On Sat, 4 August 2007 21:26:15 +0200, Jörn Engel wrote:


Given the choice between only "atime" and "noatime" I'd agree with you.
Heck, I use it myself.  But "relatime" seems to combine the best of both
worlds.  It currently just suffers from mount not supporting it in any
relevant distro.
  

And here is a completely untested patch to enable it by default.  Ingo,
can you see how good this fares compared to "atime" and
"noatime,nodiratime"?



Umm, no f**king way.  atime selection is 100% policy and belongs into
userspace.  Add to that the problem that we can't actually re-enable
atimes because of the way the vfs-level mount flags API is designed.
Instead of doing such a fugly kernel patch just talk to the handful
of distributions that matter to update their defaults.
  


From what I've seen the problem seems to be that the inode
gets marked dirty when we update atime.

Why isn't this easily fixable by just adding an additional dirty
flag that says atime has changed? Then we only cause a write
when we remove the inode from the inode cache, if only atime
is updated.

Unlike relatime, there's no user-visible change (unless the
machine crashes without clean unmount, but not sure anyone
cares that much about that cornercase). Atime changes are
thus kept in-ram until umount / inode reclaim.
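
The separate-dirty-flag idea above can be sketched as a toy model. This is only an illustration in Python, not kernel code: the class and field names (Inode, ToyInodeCache, atime_dirty) are invented here, and real write-back is far more involved.

```python
class Inode:
    """Toy in-memory inode; field names are illustrative only."""

    def __init__(self):
        self.atime = 0
        self.mtime = 0
        self.dirty = False        # real changes that periodic write-back must flush
        self.atime_dirty = False  # only atime changed; the write can wait


class ToyInodeCache:
    def __init__(self):
        self.disk_writes = 0

    def read_file(self, inode, now):
        # A read only touches atime, so set the lazy flag instead of
        # marking the whole inode dirty.
        inode.atime = now
        inode.atime_dirty = True

    def write_file(self, inode, now):
        inode.mtime = now
        inode.dirty = True

    def periodic_writeback(self, inode):
        # Regular write-back skips inodes that are only atime-dirty.
        if inode.dirty:
            self._flush(inode)

    def evict(self, inode):
        # On inode reclaim or umount the pending atime finally goes to disk.
        if inode.dirty or inode.atime_dirty:
            self._flush(inode)

    def _flush(self, inode):
        self.disk_writes += 1
        inode.dirty = False
        inode.atime_dirty = False


cache = ToyInodeCache()
inode = Inode()
for tick in range(1000):
    cache.read_file(inode, now=tick)
    cache.periodic_writeback(inode)
writes_before_evict = cache.disk_writes
cache.evict(inode)
print(writes_before_evict, cache.disk_writes)
```

In this model a thousand reads cause no disk writes until the inode is evicted. It also shows the crash trade-off: anything held only in atime_dirty is lost if the machine goes down before eviction.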



Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Bill Davidsen

Jeff Garzik wrote:

Bill Davidsen wrote:
Being standards compliant is not an argument it's a design goal, a 
requirement. Standards compliance is like being pregnant, you are or you're 


Linux history says different.  There was always the "final 1%" of 
compliance that required silliness we really did not want to bother with. 


This is not 1%, this is a user-visible change in behavior, relative to 
all previous Linux versions. There has been a way for ages to trade 
performance for standards for users or distributions, and standards have 
been chosen. Given that there is now a way to get virtually all of the 
performance without giving up atime completely, why the sudden attempt 
to change to a less satisfactory default?


I could understand a push to quickly get relatime with a few
enhancements (the functionality if not the exact code) into
distributions, even as a default, but forcing user or distribution
changes just to retain the same behavior doesn't seem reasonable. It
assumes that vendors and users are so stupid they can't understand why
benchmark results are more important than standards. People who run
servers are smart enough to decide if their application will run as
expected without atime.


People have lived with this compromise for a very long time, and it 
seems that a far more balanced solution will be in the kernel soon.


--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Jeff Garzik

Bill Davidsen wrote:
Being standards compliant is not an argument it's a design goal, a 
requirement. Standards compliance is like being pregnant, you are or you're 


Linux history says different.  There was always the "final 1%" of 
compliance that required silliness we really did not want to bother with.


Jeff




Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Bill Davidsen

Ingo Molnar wrote:

|| ...For me, I would say 50% is not enough to describe the _visible_ 
|| benefits... Not talking any specific number but past 10sec-1min+ 
|| lagging in X is history, it's gone and I really don't miss it that 
|| much... :-) Cannot reproduce even a second long delay anymore in 
|| window focusing under considerable load as it's basically 
|| instantaneous (I can see that it's loaded but doesn't affect the 
|| feeling of responsiveness I'm now getting), even on some loads that I 
|| couldn't previously even dream of... [...]


we really have to ask ourselves whether the "process" is correct if 
advantages to the user of this order of magnitude can be brushed aside 
with simple "this breaks binary-only HSM" and "it's not standards 
compliant" arguments.


Being standards compliant is not an argument, it's a design goal, a
requirement. Standards compliance is like being pregnant, you are or you're
not. And to deliberately ignore standards for speed is saying "it's too 
hard to do it right, I'll do it wrong and it will be faster." The answer 
is to do it smarter, with solutions like relatime (which can be enhanced 
as Linus noted) which provide performance benefits without ignoring 
standards, or use of a filesystem which does a better job. But when it 
goes in the kernel the choice of having per-filesystem behavior either 
vanishes or becomes an exercise in complex and as-yet unwritten mount 
options.


There are certainly ways to improve ext3, not journaling atime updates 
would certainly be one, less frequent updates of dirty inodes, whatever. 
But if a user wants to give up standards compliance it should be a 
deliberate choice, not something which the average user will not 
understand or learn to do.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Bill Davidsen

Alan Cox wrote:
However, relatime has the POSIX behavior without the overhead. Therefore 


No. relatime has approximately SuS behaviour. Its not the same as
"correct" behaviour.

Actually correct, but in terms of what can or does break, relatime seems 
a lot closer than noatime, I can't (personally) come up with any 
scenario where real applications would see something which would change 
behavior adversely.


Making noatime a default in the kernel requiring a boot option to 
restore current behavior seems to be a turn toward the "it doesn't 
really work right but it's *fast*" model. If vendors wanted noatime they 
are smart enough to enable it. Now with relatime giving most of the
benefits and few (if any) of the side effects, I would expect a change.


By all means relatime by default in FC8, but not noatime, and let those 
who find some measurable benefit from noatime use it.
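
As a rough illustration of why relatime keeps most of what applications rely on, here is a minimal Python sketch of a relatime-style decision. It assumes the rule is "skip the atime update only when the stored atime is already newer than both mtime and ctime", which is my understanding of the mount option in kernels of this period; the function name is invented and later kernels may add further conditions.

```python
def relatime_needs_update(atime, mtime, ctime):
    """Return True if a read should update atime under a relatime-style rule.

    The update is skipped only when the stored atime is already newer than
    both the last modification (mtime) and the last status change (ctime).
    """
    return atime <= mtime or atime <= ctime


# File written at t=100 and last read at t=50: the read must update atime,
# so "has this been read since it was last written?" still works.
first_read = relatime_needs_update(atime=50, mtime=100, ctime=100)

# After that update atime is 120; further reads cost no write.
second_read = relatime_needs_update(atime=120, mtime=100, ctime=100)

print(first_read, second_read)
```

Under this rule only the first read after each modification causes an inode write, which is why repeated reads of unchanged files become cheap.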


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Andi Kleen
richard kennedy <[EMAIL PROTECTED]> writes:
> 
> This is on a standard desktop machine so there are lots of other
> processes running on it, and although there is a degree of variability
> in the numbers,they are very repeatable and your patch always out
> performs the stock mm2.
> looks good to me

iirc the goal of this is less to get better performance, but to avoid long
user visible latencies.  Of course if it's faster it's great too, but that's
only secondary.

-Andi


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread richard kennedy
On Fri, 2007-08-03 at 14:37 +0200, Peter Zijlstra wrote:
> Per device dirty throttling patches
> 
> These patches aim to improve balance_dirty_pages() and directly address three
> issues:
>   1) inter device starvation
>   2) stacked device deadlocks
>   3) inter process starvation

Hi Peter,
I've been testing your patch with a simple test case that copies a 3GB
file from sda -> sda, and copies a 1GB file from sda -> sdb.
the script is roughly this :-

dd bs=64k if=[sda]/data3g of=[sda]/temp_data3g &
sleep 60
dd bs=64k if=[sda]/data1g of=[sdb]/temp_data1g &
wait
sleep 200

This is on my amd64x2 desktop machine, where sda is a SATA 250 GB drive and
sdb is an IDE 300 GB drive.

Running this test 5 times gives
2.6.23-rc1-mm2
1GB copy MB/s   3GB copy MB/s
16.2            16.1
15.2            14.6
17.3            14.6
18.0            14.5
19.0            14.6

2.6.23-rc1-mm2+pddt_patch
1GB copy MB/s   3GB copy MB/s
23.0            14.7
24.0            14.6
20.4            14.8
22.6            14.5
23.2            14.5

This is on a standard desktop machine so there are lots of other
processes running on it, and although there is a degree of variability
in the numbers, they are very repeatable and your patch always
outperforms the stock mm2.
Looks good to me.
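
To put a number on "always outperforms", the runs above can be averaged. A small Python sketch, assuming each row of the two tables reads as a (1GB copy, 3GB copy) pair in MB/s; the variable and function names are only for illustration:

```python
from statistics import mean

# MB/s per run as (1GB copy, 3GB copy), taken from the tables above.
stock = [(16.2, 16.1), (15.2, 14.6), (17.3, 14.6), (18.0, 14.5), (19.0, 14.6)]
patched = [(23.0, 14.7), (24.0, 14.6), (20.4, 14.8), (22.6, 14.5), (23.2, 14.5)]


def column_means(runs):
    """Return (mean 1GB MB/s, mean 3GB MB/s) over all runs."""
    one_gb, three_gb = zip(*runs)
    return mean(one_gb), mean(three_gb)


stock_1g, stock_3g = column_means(stock)
patched_1g, patched_3g = column_means(patched)

print(f"1GB copy: {stock_1g:.2f} -> {patched_1g:.2f} MB/s "
      f"({100 * (patched_1g / stock_1g - 1):+.1f}%)")
print(f"3GB copy: {stock_3g:.2f} -> {patched_3g:.2f} MB/s "
      f"({100 * (patched_3g / stock_3g - 1):+.1f}%)")
```

On these five runs the sda -> sdb copy improves by roughly a third on average, while the sda -> sda copy stays within about 2% of the stock kernel.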

Richard
  








Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Karel Zak
On Sat, Aug 04, 2007 at 07:17:24PM +0200, Ingo Molnar wrote:
> 
> * Diego Calleja <[EMAIL PROTECTED]> wrote:
> 
> > > On Sat, 4 Aug 2007 18:37:33 +0200, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> > 
> > > thousands of applications. So for most file workloads we give 
> > > Windows a 20%-30% performance edge, for almost nothing. (for 
> > > RAM-starved kernel builds the performance difference between atime 
> > > and noatime+nodiratime setups is more on the order of 40%)
> > 
> > Just curious - do you have numbers with relatime?
> 
> nope. Stupid question, i just tried it and got this:
> 
>  EXT3-fs: Unrecognized mount option "relatime" or missing value
> 
> i've got util-linux-2.13-0.46.fc6 and 2.6.22 on that box, shouldnt that 

 The relatime patch has been applied to util-linux-ng-2.13 (now -rc3),
 you will see it in Fedora 8 (and probably in other distros).

Karel

-- 
 Karel Zak  <[EMAIL PROTECTED]>


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Bill Davidsen

Ingo Molnar wrote:

|| ...For me, I would say 50% is not enough to describe the _visible_ 
|| benefits... Not talking any specific number but past 10sec-1min+ 
|| lagging in X is history, it's gone and I really don't miss it that 
|| much... :-) Cannot reproduce even a second long delay anymore in 
|| window focusing under considerable load as it's basically 
|| instantaneous (I can see that it's loaded but doesn't affect the 
|| feeling of responsiveness I'm now getting), even on some loads that I 
|| couldn't previously even dream of... [...]


we really have to ask ourselves whether the process is correct if 
advantages to the user of this order of magnitude can be brushed aside 
with simple this breaks binary-only HSM and it's not standards 
compliant arguments.


Being standards compliant is not an argument it's a design goal, a 
requirement. Standards compliance is like pregant, you are or you're 
not. And to deliberately ignore standards for speed is saying it's too 
hard to do it right, I'll do it wrong and it will be faster. The answer 
is to do it smarter, with solutions like relatime (which can be enhanced 
as Linus noted) which provide performance benefits without ignoring 
standards, or use of a filesystem which does a better job. But when it 
goes in the kernel the choice of having per-filesystem behavior either 
vanishes or becomes an exercise in complex and as-yet unwritten mount 
options.


There are certainly ways to improve ext3, not journaling atime updates 
would certainly be one, less frequent updates of dirty inodes, whatever. 
But if a user wants to give up standards compliance it should be a 
deliberate choice, not something which the average user will not 
understand or learn to do.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Jeff Garzik

Bill Davidsen wrote:
> Being standards compliant is not an argument it's a design goal, a
> requirement. Standards compliance is like pregnant, you are or you're


Linux history says different.  There was always the final 1% of 
compliance that required silliness we really did not want to bother with.


Jeff




Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Bill Davidsen

Jeff Garzik wrote:

> Bill Davidsen wrote:
>> Being standards compliant is not an argument it's a design goal, a
>> requirement. Standards compliance is like pregnant, you are or you're
>
> Linux history says different.  There was always the final 1% of
> compliance that required silliness we really did not want to bother with.


This is not 1%, this is a user-visible change in behavior, relative to
all previous Linux versions. There has been a way for ages for users or
distributions to trade standards compliance for performance, and standards
have been chosen. Given that there is now a way to get virtually all of the
performance without giving up atime completely, why the sudden attempt
to change to a less satisfactory default?


I could understand a push to quickly get relatime with a few
enhancements (the functionality if not the exact code) into
distributions, even as a default, but forcing user or distribution
changes just to retain the same behavior doesn't seem reasonable. It
assumes that vendors and users are so stupid they can't understand why
benchmark results are more important than standards. People who run
servers are smart enough to decide if their application will run as
expected without atime.


People have lived with this compromise for a very long time, and it 
seems that a far more balanced solution will be in the kernel soon.


--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Martin J. Bligh

Christoph Hellwig wrote:

> On Sat, Aug 04, 2007 at 09:42:59PM +0200, Jörn Engel wrote:
>> On Sat, 4 August 2007 21:26:15 +0200, Jörn Engel wrote:
>>> Given the choice between only atime and noatime I'd agree with you.
>>> Heck, I use it myself.  But relatime seems to combine the best of both
>>> worlds.  It currently just suffers from mount not supporting it in any
>>> relevant distro.
>>
>> And here is a completely untested patch to enable it by default.  Ingo,
>> can you see how good this fares compared to atime and
>> noatime,nodiratime?
>
> Umm, no f**king way.  atime selection is 100% policy and belongs into
> userspace.  Add to that the problem that we can't actually re-enable
> atimes because of the way the vfs-level mount flags API is designed.
> Instead of doing such a fugly kernel patch just talk to the handful
> of distributions that matter to update their defaults.


From what I've seen the problem seems to be that the inode
gets marked dirty when we update atime.

Why isn't this easily fixable by just adding an additional dirty
flag that says atime has changed? Then we only cause a write
when we remove the inode from the inode cache, if only atime
is updated.

Unlike relatime, there's no user-visible change (unless the
machine crashes without a clean unmount, but I'm not sure anyone
cares that much about that corner case). Atime changes are
thus kept in-ram until umount / inode reclaim.
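
To make the proposal concrete, here is a minimal userspace sketch in Python (not kernel code). The names `ToyInode`, `touch_atime`, `evict` and `crash` are hypothetical and exist only for illustration. The point is that an atime-only update sets a separate flag and costs no I/O until the inode leaves the cache.

```python
class ToyInode:
    """Toy model of an inode with a separate 'atime dirty' flag (hypothetical names)."""

    def __init__(self, atime):
        self.disk_atime = atime    # what survives a crash
        self.mem_atime = atime     # what stat() would report while cached
        self.atime_dirty = False   # set by reads; does not trigger writeback
        self.writes = 0            # number of simulated disk writes

    def touch_atime(self, now):
        # A read only updates the in-memory copy and sets the lazy flag.
        self.mem_atime = now
        self.atime_dirty = True

    def evict(self):
        # Inode reclaim or clean umount: write the atime back exactly once.
        if self.atime_dirty:
            self.disk_atime = self.mem_atime
            self.atime_dirty = False
            self.writes += 1

    def crash(self):
        # Unclean shutdown: the in-memory atime is lost.
        self.mem_atime = self.disk_atime
        self.atime_dirty = False


inode = ToyInode(atime=1000)
for t in (2000, 3000, 4000):
    inode.touch_atime(t)       # three reads, no disk writes yet
reads_cost = inode.writes
inode.evict()                  # one write carries the latest atime
```

If `crash()` were called in place of `evict()`, the reported atime would fall back to 1000, which is the unclean-shutdown case mentioned above.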



Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Andrew Morton
On Wed, 08 Aug 2007 14:10:15 -0700
Martin J. Bligh <[EMAIL PROTECTED]> wrote:

> Why isn't this easily fixable by just adding an additional dirty
> flag that says atime has changed? Then we only cause a write
> when we remove the inode from the inode cache, if only atime
> is updated.

I think that could be made to work, and it would fix the performance
issue.

It is a behaviour change.  At present ext3 (for example) commits everything
every five seconds.  After a change like this, a crash+recovery could cause
a file's atime to go backwards by an arbitrarily large time interval - it
could easily be months.




Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Alan Cox
On Wed, 08 Aug 2007 15:39:52 -0400
Jeff Garzik <[EMAIL PROTECTED]> wrote:

> Bill Davidsen wrote:
>> Being standards compliant is not an argument it's a design goal, a
>> requirement. Standards compliance is like pregnant, you are or you're
>
> Linux history says different.  There was always the final 1% of
> compliance that required silliness we really did not want to bother with.

This isn't about the 1%, however. It's about API and ABI. Changing the
default is a fairly evil ABI change. Telling everyone relatime is cool on
desktops and defaulting it in the distro is not an ABI change and is very
sensible.


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Martin Bligh

Andrew Morton wrote:

> On Wed, 08 Aug 2007 14:10:15 -0700
> Martin J. Bligh <[EMAIL PROTECTED]> wrote:
>
>> Why isn't this easily fixable by just adding an additional dirty
>> flag that says atime has changed? Then we only cause a write
>> when we remove the inode from the inode cache, if only atime
>> is updated.
>
> I think that could be made to work, and it would fix the performance
> issue.
>
> It is a behaviour change.  At present ext3 (for example) commits everything
> every five seconds.  After a change like this, a crash+recovery could cause
> a file's atime to go backwards by an arbitrarily large time interval - it
> could easily be months.


A second pdflush / workqueue at a slower rate would alleviate that.

Yes, it's a semantic change ... but only in an incredibly small
corner-case ?
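
A rough way to see how a second, slower flusher would bound the loss: the sketch below is plain Python, not kernel code. The function name `atime_on_disk_after_crash` and the 3600-second interval are made up for illustration, and flushes are assumed to happen at exact multiples of the interval.

```python
def atime_on_disk_after_crash(initial_atime, accesses, crash_time, flush_interval=None):
    """Return the atime that survives a crash at crash_time.

    accesses: timestamps (seconds) at which the file was read.
    flush_interval: period of the hypothetical slow flusher; None means
    atime is only written at umount or inode reclaim, so nothing is flushed.
    """
    if flush_interval is None:
        return initial_atime
    # The last flush that completed before the crash.
    last_flush = (crash_time // flush_interval) * flush_interval
    flushed = [t for t in accesses if t <= last_flush]
    return max(flushed, default=initial_atime)


accesses = [10, 3000, 7000]
no_flusher = atime_on_disk_after_crash(0, accesses, crash_time=7100)
hourly = atime_on_disk_after_crash(0, accesses, crash_time=7100, flush_interval=3600)
```

Without the flusher every access since the inode was cached is lost. With it, only accesses made after the last completed flush (here the one at 7000) can go missing.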



Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Greg Trounson

Ingo Molnar wrote:

> * Alan Cox <[EMAIL PROTECTED]> wrote:
>
>>>> People just need to know about the performance differences - very
>>>> few realise its more than a fraction of a percent. I'm sure Gentoo
>>>> will use relatime the moment anyone knows its > 5% 8)
>>> noatime,nodiratime gave 50% of wall-clock kernel rpm build
>>> performance improvement for Dave Jones, on a beefy box. Unless i
>>> misunderstood what you meant under 'fraction of a percent' your
>>> numbers are _WAY_ off.
>>
>> What numbers - I didn't quote any performance numbers ?
>
> ok, i misunderstood your "very few realise its more than a fraction of a
> percent" sentence, i thought you were saying it's a fraction of a
> percent.
>
> Measurements show that noatime helps 20-30% on regular desktop
> workloads, easily 50% for kernel builds and much more than that (in
> excess of 100%) for file-read-intense workloads. We cannot just walk
> past such a _huge_ performance impact so easily without even reacting to
> the performance arguments, and i'm happy Ubuntu picked up
> noatime,nodiratime and is whipping up the floor with Fedora on the
> desktop.




Sorry I'm just not seeing those gains here.  With my filesystems mounted with atime 
defaults the Quake sources build in 1m28.856s.  A test with ls -ltu verifies that atime is 
working as expected.  When I remount my filesystems with:

mount [fs] -o remount,noatime,nodiratime
I get a compile time of 1m23.368s, a mere 6% improvement.

This is on a dual-core Athlon 4200+ box running 2.6.21, so I would have thought this to be 
close to a best-case file I/O test.
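
For anyone checking the arithmetic, a small Python sketch using only the two build times quoted above. The helper names are mine. Note that "percent improvement" can mean either less wall-clock time or more throughput, and the two readings give slightly different figures.

```python
def to_seconds(minutes, seconds):
    """Convert a time(1)-style 'XmY.YYYs' reading to seconds."""
    return minutes * 60 + seconds


atime_build = to_seconds(1, 28.856)     # default atime mount
noatime_build = to_seconds(1, 23.368)   # noatime,nodiratime remount

# Fraction of wall-clock time saved.
time_saved = (atime_build - noatime_build) / atime_build
# Throughput gain: how much more work fits into the same time.
throughput_gain = atime_build / noatime_build - 1

print(f"time saved:      {time_saved:.1%}")
print(f"throughput gain: {throughput_gain:.1%}")
```

Either way the result is a little over 6%, which matches the figure above. A claim such as "in excess of 100%" only makes sense under the throughput reading.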


Greg


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread david

On Thu, 9 Aug 2007, Greg Trounson wrote:

>> Measurements show that noatime helps 20-30% on regular desktop workloads,
>> easily 50% for kernel builds and much more than that (in excess of 100%)
>> for file-read-intense workloads. We cannot just walk past such a _huge_
>> performance impact so easily without even reacting to the performance
>> arguments, and i'm happy Ubuntu picked up noatime,nodiratime and is
>> whipping up the floor with Fedora on the desktop.
>
> Sorry I'm just not seeing those gains here.  With my filesystems mounted with
> atime defaults the Quake sources build in 1m28.856s.  A test with ls -ltu
> verifies that atime is working as expected.  When I remount my filesystems
> with:
>
> mount [fs] -o remount,noatime,nodiratime
> I get a compile time of 1m23.368s, a mere 6% improvement.
>
> This is on a dual-core Athlon 4200+ box running 2.6.21, so I would have
> thought this to be close to a best-case file I/O test.


what sort of disks does this box have? and what filesystem? slower 
disks/filesystems can result in this showing a larger difference.


however 6% is a fairly significant gain.

David Lang


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread Andi Kleen
Greg Trounson <[EMAIL PROTECTED]> writes:

> mount [fs] -o remount,noatime,nodiratime

nodiratime is implied in noatime.
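
A sketch in Python of how the atime-related mount options combine. `atime_policy` is a hypothetical helper, not part of mount or the kernel, and it only knows the option names that appear in this thread.

```python
def atime_policy(mount_options):
    """Map a comma-separated mount option string to atime behaviour.

    Returns a (files, directories) tuple; each entry is one of
    "always", "relatime" or "never".
    """
    opts = set(mount_options.split(","))
    if "noatime" in opts:
        # noatime already covers directories, so nodiratime adds nothing.
        return ("never", "never")
    files = "relatime" if "relatime" in opts else "always"
    dirs = "never" if "nodiratime" in opts else files
    return (files, dirs)


same = atime_policy("rw,noatime") == atime_policy("rw,noatime,nodiratime")
```

Here `same` is true: adding nodiratime to noatime changes nothing, while nodiratime on its own still leaves file atimes enabled.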

> I get a compile time of 1m23.368s, a mere 6% improvement.

6% is nothing to sneeze at. A lot of optimizations would kill for less.

-Andi


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread david

On Sat, 4 Aug 2007, Ray Lee wrote:

> On 8/4/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> On Sat, 4 Aug 2007, Ingo Molnar wrote:
>
> At least on a surface level, your report has some similarities to
> http://lkml.org/lkml/2007/5/21/84 . In that message, John Miller
> mentions several things he tried without effect:
>
>> - I increased the max allowed receive buffer through
>> proc/sys/net/core/rmem_max and the application calls the right
>> syscall. netstat -su does not show any packet receive errors.


mercury1:/proc/sys/net/core# cat rmem_*
124928
131071
mercury1:/proc/sys/net/core# netstat -su
Udp:
697853177 packets received
10025642 packets to unknown port received.
191726680 packet receive errors
63194 packets sent
RcvbufErrors: 191726680
UdpLite:
mercury1:/proc/sys/net/core# echo 512000 > rmem_max
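
For reference, "the right syscall" in the quoted report is presumably setsockopt() with SO_RCVBUF; that is an assumption, since the quoted message does not name it. A minimal Python sketch that requests a receive buffer and reads back what the kernel granted (`request_rcvbuf` is a hypothetical helper):

```python
import socket


def request_rcvbuf(nbytes):
    """Ask for a UDP receive buffer of nbytes and return what the kernel granted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


granted = request_rcvbuf(512000)
print("requested 512000, granted", granted)
```

On Linux the request is limited by rmem_max for unprivileged processes, and the value read back is normally double what was actually accepted, so comparing the two numbers shows whether the new rmem_max took effect.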


>> - After getting kernel: swapper: page allocation failure.
>> order:0, mode:0x20, I increased /proc/sys/vm/min_free_kbytes

I have not seen any similar errors

>> - ixgb.txt in kernel network documentation suggests to increase
>> net.core.netdev_max_backlog to 30. This did not help.

mercury1:/proc/sys/net/core# cat netdev_*
300
1000
mercury1:/proc/sys/net/core# echo 30 > netdev_max_backlog

>> - I also had to increase net.core.optmem_max, because the default
>> value was too small for 700 multicast groups.

I'm not running multicast.

> As they're all pretty simple to test, it may be worthwhile to give
> them a shot just to rule things out.

unfortunately the load is not high enough right now to see a real
difference (it's only doing ~1400 logs/sec). I'll catch it at a higher load
point to see if these make any difference.

David Lang

> Ray




Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-07 Thread Bill Davidsen

Claudio Martins wrote:

> On Saturday 04 August 2007, Alan Cox wrote:
>> Linux has never been a "surprise your kernel interfaces all just changed
>> today" kernel, nor a "gosh you upgraded and didn't notice your backups
>> broke" kernel.
>
>  Can you give examples of backup solutions that rely on atime being updated?
> I can understand backup tools using mtime/ctime for incremental backups (like
> tar + Amanda, etc), but I'm having trouble figuring out why someone would
> want to use atime for that.



Programs which migrate unused files or delete them are the usual cases.
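
This is roughly the kind of check a tmpwatch-style cleaner or a file-migration tool makes. It is a sketch in Python, not the code of any real tool, and `files_unused_since` is a hypothetical name.

```python
import os
import time


def files_unused_since(root, max_idle_days, now=None):
    """Return paths under root whose atime is older than max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # st_atime is only meaningful if the filesystem keeps it current.
            if os.stat(path).st_atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

On a noatime mount the atime never advances, so a file that is read every day would still end up in this list once it is old enough.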

--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-07 Thread Alan Cox
> However, relatime has the POSIX behavior without the overhead. Therefore 

No. relatime has approximately SuS behaviour. It's not the same as
"correct" behaviour.



Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-07 Thread Bill Davidsen

Alan Cox wrote:
>> i cannot over-emphasise how much of a deal it is in practice. Atime
>> updates are by far the biggest IO performance deficiency that Linux has
>> today. Getting rid of atime updates would give us more everyday Linux
>> performance than all the pagecache speedups of the past 10 years,
>> _combined_.
>>
>> it's also perhaps the most stupid Unix design idea of all times. Unix is
>> really nice and well done, but think about this a bit:
>
> Think about the user for a moment instead.
>
> Do things right. The job of the kernel is not to "correct" for
> distribution policy decisions. The distributions need to change policy.
> You do that by showing the distributions the numbers.
>
> With a Red Hat on if we can move from /dev/hda to /dev/sda in FC7 then we
> can move from atime to noatime by default on FC8 with appropriate release
> note warnings and having a couple of betas to find out what other than
> mutt goes boom.


Is there really enough benefit between relatime and noatime to justify 
that? If atime doesn't get updated at all it *will* impact operations, 
and unless there's a real performance gain the path which provides at 
least nominal POSIX compliance seems best.


Plauger's law of least astonishment.

--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-07 Thread Bill Davidsen

Jeff Garzik wrote:

> Alan Cox wrote:
>> In some setups it will and in others it won't. Nor is it the only
>> application that has this requirement. Ext3 currently is a standards
>> compliant file system. Turn off atime and its very non standards
>> compliant, turn to relatime and its not standards compliant but nobody
>> will break (which is good)
>
> Linux has always been a "POSIX unless its stupid" type of system.  For
> the upstream kernel, we should do the right thing -- noatime by default
> -- but allow distros and people that care about rigid compliance to
> easily change the default.


However, relatime has the POSIX behavior without the overhead. Therefore 
that (and maybe reldiratime?) is a far better choice. I don't see a big
problem with some version of utils not supporting it, since it can be in
the kernel and will be in the utils soon enough. We have lived without
it this long; it sounds as if we could live a bit longer.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot



Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-07 Thread Ingo Molnar

* Chuck Ebbert <[EMAIL PROTECTED]> wrote:

> > Ingo's latest 'not quite noatime' seems to cure mutt/tmpwatch so it 
> > might finally make sense to do so.
> 
> Do we report max(ctime, mtime) as the atime by default when noatime is 
> set or do we still need that to be done?

noatime is unchanged by my patch (it is not the same as the 'improved 
relatime' mode my patch activates), but it would make sense to do your 
change, independently.
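
A sketch of the relatime rule under discussion, in Python rather than kernel C: atime is rewritten only when the stored atime is not newer than mtime or ctime. The optional age clause stands in for the "improved relatime" idea; the threshold used by the patch is not given in this thread, so `max_age` and the one-day value in the usage note are assumptions.

```python
DAY = 24 * 60 * 60


def relatime_needs_update(atime, mtime, ctime, now, max_age=None):
    """Decide whether a read at time `now` should update the stored atime."""
    if atime <= mtime or atime <= ctime:
        # The file changed since the last recorded access.
        return True
    if max_age is not None and now - atime >= max_age:
        # Assumed 'improved relatime' clause: refresh a stale atime anyway.
        return True
    return False
```

With `max_age=None`, repeated reads of an unmodified file never move atime forward. With `max_age=DAY`, the stored atime would be at most about a day behind the most recent read.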

Ingo


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Miklos Szeredi
> Per device dirty throttling patches

Andrew, may I inquire about your plans with this?

> These patches aim to improve balance_dirty_pages() and directly address three
> issues:
>   1) inter device starvation
>   2) stacked device deadlocks

This one interests me most, due to various real life, reported
problems with fuse filesystems.  For this reason I'd really like to
get this or a subset of it into mainline as soon as possible.

This patchset (or rather the -v7 version) has been running on my
laptop for a couple of weeks without problems.  I've also verified
that it solves the fuse and loop issues.

I have some qualms about the complexity of various parts though.
Especially the "proportions" library, which I'm having problems
understanding.  I'm not sure that this level of sophistication is
really needed to solve the issues with the old code.

Miklos


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Chuck Ebbert
On 08/06/2007 03:37 PM, Alan Cox wrote:
>> We already tried that here. The response: "If noatime is so great, why
>> isn't it the default in the kernel?"
> 
> Ok so we have a pile of people @redhat.com sitting on linux-kernel
> complaining about Red Hat distributions not taking it up. Guys - can
> we just fix it internally please like sensible folk ?
> 
> Ingo's latest 'not quite noatime' seems to cure mutt/tmpwatch so it might
> finally make sense to do so.

Do we report max(ctime, mtime) as the atime by default when noatime
is set or do we still need that to be done?



Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Alan Cox
> We already tried that here. The response: "If noatime is so great, why
> isn't it the default in the kernel?"

Ok so we have a pile of people @redhat.com sitting on linux-kernel
complaining about Red Hat distributions not taking it up. Guys - can
we just fix it internally please like sensible folk ?

Ingo's latest 'not quite noatime' seems to cure mutt/tmpwatch so it might
finally make sense to do so.

Alan


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Jeff Garzik

Chuck Ebbert wrote:

> On 08/05/2007 04:36 PM, Christoph Hellwig wrote:
>> Umm, no f**king way.  atime selection is 100% policy and belongs into
>> userspace.  Add to that the problem that we can't actually re-enable
>> atimes because of the way the vfs-level mount flags API is designed.
>> Instead of doing such a fugly kernel patch just talk to the handful
>> of distributions that matter to update their defaults.
>
> We already tried that here. The response: "If noatime is so great, why
> isn't it the default in the kernel?"


Yes, and around and around we go :/

Jeff




Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Chuck Ebbert
On 08/05/2007 04:36 PM, Christoph Hellwig wrote:
> 
> Umm, no f**king way.  atime selection is 100% policy and belongs into
> userspace.  Add to that the problem that we can't actually re-enable
> atimes because of the way the vfs-level mount flags API is designed.
> Instead of doing such a fugly kernel patch just talk to the handful
> of distributions that matter to update their defaults.
> 

We already tried that here. The response: "If noatime is so great, why
isn't it the default in the kernel?"


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Ingo Molnar

* Dave Jones <[EMAIL PROTECTED]> wrote:

>  > does it work with the "atime on steroids" patch below? (no need to 
>  > configure anything, just apply the patch and go.)
> 
> people have reported that relatime does work, but my util-linux isn't 
> new enough to support it, so I've never got it to work. I'll give your 
> diff a try later, though as it seems to be equivalent I expect it'll 
> work.

would still be nice if you could test it and report back :)

Ingo


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Dave Jones
On Mon, Aug 06, 2007 at 08:39:09AM +0200, Ingo Molnar wrote:
 > 
 > * Dave Jones <[EMAIL PROTECTED]> wrote:
 > 
 > >  > btw., Mutt does not go boom, i use it myself. It works just fine 
 > >  > and notices new mails even on a noatime,nodiratime filesystem.
 > >  
 > > It still fails miserably for me.
 > > 
 > > If I hit 'C' and '?' I get a list of my mail folders, with some of 
 > > them marked 'N' if they have new mail.  Without atime, those N's never 
 > > show up and every mbox looks like it has no new mail.
 > 
 > does it work with the "atime on steroids" patch below? (no need to 
 > configure anything, just apply the patch and go.)

people have reported that relatime does work, but my util-linux
isn't new enough to support it, so I've never got it to work.
I'll give your diff a try later, though as it seems to be
equivalent I expect it'll work.

Dave

-- 
http://www.codemonkey.org.uk


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Chris Mason
On Sun, 5 Aug 2007 11:00:29 -0400
Theodore Tso <[EMAIL PROTECTED]> wrote:

> On Sun, Aug 05, 2007 at 02:26:53AM +0200, Andi Kleen wrote:
> > I always thought the right solution would be to just sync atime only
> > very very lazily. This means if a inode is only dirty because of an
> > atime update put it on a "only write out when there is nothing to do
> > or the memory is really needed" list.
> 
> As I've mentioned earlier, the memory balancing issues that arise when
> we add an "atime dirty" bit scare me a little.  It can be addressed,
> obviously, but at the cost of more code complexity.

ext3 and reiser both use a dirty_inode method to make sure that we
don't actually have dirty inodes.  This way, kswapd doesn't get stuck
on the log and is able to do real work.

It would be interesting to see a comparison of relatime with a kinoded
that is willing to get stuck on the log.  The FS would need a few
tweaks so that write_inode() could know if it really needed to log or
not, but for testing you could just drop ext3_dirty_inode and have
ext3_write_inode do real work.

Then just change kswapd to kick a new kinoded and benchmark away.  A
real patch would have to look for places where mark_inode_dirty was
used and expected the dirty_inode callback to log things right away,
but for testing it's good enough.

-chris


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Willy Tarreau
On Mon, Aug 06, 2007 at 08:57:12AM +0200, Ingo Molnar wrote:
> 
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> 
> > In your example above, maybe it's the opposite, users know they can 
> > keep a file in /tmp one more week by simply cat'ing it.
> 
> sure - and i'm not arguing that noatime should be the kernel-wide default.
> In every single patch i sent it was a .config option (and a boot option
> _and_ a sysctl option that i think you missed) that a user/distro
> enables or disables. But i think the /tmp argument is not very strong:
> /tmp is fundamentally volatile, and you can grow dependencies on pretty
> much _any_ aspect of the kernel. So the question isn't "is there impact"
> (there is, at least for noatime), the question is "is it still worth
> doing it".
> 
> > Changing the kernel in a non-easily reversible way is not kind to the 
> > users.
> 
> none of my patches did any of that...

I did not notice you talked about a sysctl. A sysctl provides the ability
to switch the behaviour without rebooting, while both the config option
and the command line require a reboot.

> anyway, my latest patch doesn't do noatime, it does the "more intelligent
> relatime" approach.

... which is not equivalent to noatime in the initial example.

Regards,
Willy



Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Andi Kleen
On Sun, Aug 05, 2007 at 09:41:12PM +0100, Christoph Hellwig wrote:
> On Sun, Aug 05, 2007 at 02:26:53AM +0200, Andi Kleen wrote:
> > I always thought the right solution would be to just sync atime only
> > very very lazily. This means if an inode is only dirty because of an
> > atime update put it on a "only write out when there is nothing to do
> > or the memory is really needed" list.
> 
> Which is the policy I implemented for XFS a while ago.

How would that work? I didn't think XFS had separate inode lists.
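
To make the quoted proposal concrete, here is a toy userspace model of the policy: an inode that is dirty only because of an atime update goes on a lazy list and is written back only when the system is idle or memory is needed. The flag names, the list names and both helper functions are made up for illustration; this is not how XFS or the VFS implements it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical dirty bits; real kernels track inode state differently. */
enum {
    DIRTY_DATA  = 1 << 0,   /* data or non-atime metadata changed */
    DIRTY_ATIME = 1 << 1    /* only the access time changed */
};

/* Hypothetical writeback lists. */
enum wb_list { WB_CLEAN, WB_NORMAL, WB_LAZY };

/* Decide which writeback list an inode with these dirty bits belongs on. */
static enum wb_list classify_inode(unsigned int dirty)
{
    if (dirty == 0)
        return WB_CLEAN;
    if (dirty == DIRTY_ATIME)
        return WB_LAZY;     /* atime is the only reason it is dirty */
    return WB_NORMAL;
}

/* Lazy inodes wait until there is nothing else to do or memory is needed. */
static bool should_write_now(enum wb_list list, bool idle, bool memory_needed)
{
    assert(list == WB_CLEAN || list == WB_NORMAL || list == WB_LAZY);
    if (list == WB_NORMAL)
        return true;
    if (list == WB_LAZY)
        return idle || memory_needed;
    return false;
}
```

In this model a read-mostly workload leaves its inodes on the lazy list, so normal writeback never touches them; the cost is that a crash can lose recent atimes.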

-Andi


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Brice Figureau
Hi Andi,

On Mon, 2007-08-06 at 00:17 +0200, Andi Kleen wrote:
> Brice Figureau <[EMAIL PROTECTED]> writes:
> > 
> >  2) I _still_ don't get the "performance" of 2.6.17, but since that's the
> > best combination I could get, I think there is IMHO progress in the right
> > direction (compared to no progress since 2.6.18, that's better :-)).
> 
> If you could characterize your workload well (e.g. how many disks,
> what file systems, what load on mysql) perhaps it would be possible
> to reproduce the problem with a test program or a mysql driver.
> Then it could be bisected.

My server is a Dell PowerEdge 2850 (dual Xeon EM64T 3GHz running without
HT, 4GB of RAM), with a Perc 4/Di (an LSI megaraid with a BBU of 256MB).
The hardware RAID card has 2 channels. One is connected to two 10k RPM
146GB SCSI disks that are mirrored in a RAID 1 array on which the system
resides (/dev/sda). The second channel is connected to four 10k RPM 146GB
disks in a RAID 10 array which contains the database files and database
logs (/dev/sdb).

The kernel and userspace are 64-bit.
Above the hardware RAID arrays there is LVM2 with two physical groups
(one per array). The RAID10 has only one logical volume.

The database volume (the RAID10) is an ext3 volume mounted with
rw,noexec,nosuid,nodev,noatime,data=writeback.

The I/O scheduler on all arrays is deadline.

/proc knobs with values other than defaults are:
/proc/sys/vm/swappiness = 2
/proc/sys/vm/dirty_background_ratio = 1
/proc/sys/vm/dirty_ratio = 2
/proc/sys/vm/vfs_cache_pressure = 1

The only thing running on the server is mysql. 
Mysql memory footprint is about 90% of physical RAM. Mysql is configured
to use exclusively InnoDB.

Mysql accesses its database files in O_DIRECT mode.
Since the database fits in RAM, the only kind of access Mysql is doing
is writing to the innodb log, the mysql binlog and finally to the innodb
database files.
There are certainly a whole lot of fsync'ing happening.
All the database reads are done from the innodb in-RAM cache.

During all my kernel tests (see the original bug report) the machine was
not swapping (so that's not the cause of the stuttering).

If that helps:
db1:~# cat /proc/meminfo 
MemTotal:           4052420 kB
MemFree:              23972 kB
Buffers:              54420 kB
Cached:              168096 kB
SwapCached:         1541744 kB
Active:             3723468 kB
Inactive:            157180 kB
SwapTotal:         11863960 kB
SwapFree:          10193064 kB
Dirty:                  320 kB
Writeback:                0 kB
AnonPages:          3657744 kB
Mapped:               20508 kB
Slab:                119964 kB
SReclaimable:        103564 kB
SUnreclaim:           16400 kB
PageTables:            9408 kB
NFS_Unstable:             0 kB
Bounce:                   0 kB
CommitLimit:       13890168 kB
Committed_AS:       3826764 kB
VmallocTotal:   34359738367 kB
VmallocUsed:         268604 kB
VmallocChunk:   34359469435 kB
HugePages_Total:          0
HugePages_Free:           0
HugePages_Rsvd:           0
Hugepagesize:          2048 kB

A typical iostat (taken every 2s under light load):
Device:  rrqm/s  wrqm/s    r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    2.00   0.00    3.50    0.00    44.00    12.57     0.00   0.00   0.00   0.00
sdb        0.00    9.00   0.50   27.00    4.00   288.00    10.62     0.01   0.36   0.36   1.00

Device:  rrqm/s  wrqm/s    r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00   0.00    0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdb        0.00  223.50   7.50  185.50   60.00  5964.00    31.21     0.15   0.78   0.56  10.80

Device:  rrqm/s  wrqm/s    r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    1.00   0.00    1.00    0.00    15.92    16.00     0.00   0.00   0.00   0.00
sdb        0.00  198.01  19.90  156.22  159.20  2833.83    16.99     0.04   0.24   0.20   3.58

Device:  rrqm/s  wrqm/s    r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00   0.00    0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdb        0.00    5.00   0.50   17.00    4.00   176.00    10.29     0.01   0.69   0.69   1.20

Would it help if I ran blktrace on this server to capture the I/O?
I enabled it when compiling the kernel, but I don't yet know how to use it.
Any pointers on how to activate it and capture useful information?

Many thanks,
-- 
Brice Figureau <[EMAIL PROTECTED]>



Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Ingo Molnar

* Diego Calleja <[EMAIL PROTECTED]> wrote:

> > Measurements show that noatime helps 20-30% on regular desktop 
> > workloads, easily 50% for kernel builds and much more than that (in 
> > excess of 100%) for file-read-intense workloads. We cannot just walk
> 
> And as everybody knows, disabling it is a popular practice on servers.
> According to an interview with the kernel.org admins:

yeah - but i'd be surprised if more than 1% of all Linux servers out 
there had noatime.

> "Beyond that, Peter noted, "very little fancy is going on, and that is 
> good because fancy is hard to maintain." He explained that the only 
> fancy thing being done is that all filesystems are mounted noatime 
> meaning that the system doesn't have to make writes to the filesystem 
> for files which are simply being read, "that cut the load average in 
> half."

nice quote :-)

> I bet that some people would consider such a performance hit a bug...

yeah.

Ingo


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Ingo Molnar

* Willy Tarreau <[EMAIL PROTECTED]> wrote:

> In your example above, maybe it's the opposite, users know they can 
> keep a file in /tmp one more week by simply cat'ing it.

sure - and i'm not arguing that noatime should be the kernel-wide default.
In every single patch i sent it was a .config option (and a boot option
_and_ a sysctl option that i think you missed) that a user/distro
enables or disables. But i think the /tmp argument is not very strong:
/tmp is fundamentally volatile, and you can grow dependencies on pretty
much _any_ aspect of the kernel. So the question isn't "is there impact"
(there is, at least for noatime), the question is "is it still worth
doing it".

> Changing the kernel in a non-easily reversible way is not kind to the 
> users.

none of my patches did any of that...

anyway, my latest patch doesn't do noatime, it does the "more intelligent
relatime" approach.

Ingo


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Ingo Molnar

* [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> i've been a linux sysadmin for 10 years, and have known about noatime
> for at least 7 years, but I always thought of it in the category of
> 'use it only on your performance critical machines where you are
> trying to extract every ounce of performance, and keep an eye out for
> things misbehaving'
>
> I never imagined that it was the 20%+ hit that is being described, and
> with so little impact, or I would have switched to it across the board
> years ago.
> 
> I'll bet there are a lot of admins out there in the same boat.
> 
> adding an option in the kernel to change the default sounds like a 
> very good first step, even if the default isn't changed today.

yep - but note that this was a gradual effect over the years, today the
asymmetry between CPU performance and disk-seek performance is
proportionally larger than 10 years ago. Today CPUs are nearly 100 times
faster than 10 years ago, but disk seeks got only 2-3 times faster. (and
even that only if you have a high rpm disk - most desktops don't.)

10 years ago noatime was a nifty hack that made a difference if you had
lots of files. But it still was a problem with no immediate easy
solution and people developed their counter-arguments. Today the same
counter-arguments are used, but the situation has evolved a lot.

and note that often this has a bigger everyday effect than the tweaking 
of CPU scheduling, IO scheduling or swapping behavior (!). My desktop 
systems rarely swap, have plenty of CPU power to spare, but atime 
updates still have a noticeable latency impact, regardless of the memory 
pressure. Linux has _lots_ of "performance reserves", so people don't
normally notice when comparing it to other OSs, but still we should not
be so wasteful with our IO performance, for such a fundamental thing as 
reading files.

Ingo


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Ingo Molnar

* Dave Jones <[EMAIL PROTECTED]> wrote:

>  > btw., Mutt does not go boom, i use it myself. It works just fine 
>  > and notices new mails even on a noatime,nodiratime filesystem.
>  
> It still fails miserably for me.
> 
> If I hit 'C' and '?' I get a list of my mail folders, with some of 
> them marked 'N' if they have new mail.  Without atime, those N's never 
> show up and every mbox looks like it has no new mail.

does it work with the "atime on steroids" patch below? (no need to 
configure anything, just apply the patch and go.)
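
For context, a "new mail" flag on an mbox folder is commonly derived by comparing the file's mtime with its atime. The sketch below shows that comparison, on the assumption that the mail reader uses this classic heuristic; it is not Mutt's code, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* New mail is assumed when the file was written after it was last read. */
static int modified_since_read(time_t mtime, time_t atime)
{
    return mtime > atime;
}

/* Returns 1 for "new mail", 0 for "no new mail", -1 if stat() fails. */
static int mbox_has_new_mail(const char *path)
{
    struct stat sb;

    assert(path != NULL);
    if (stat(path, &sb) != 0)
        return -1;
    return modified_since_read(sb.st_mtime, sb.st_atime);
}
```

A check like this depends on the kernel recording a fresh atime when a folder is read after new mail has been delivered to it, which is the case the mtime test in the patch below is meant to preserve.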

Ingo

--->
Subject: [patch] [patch] implement smarter atime updates support
From: Ingo Molnar <[EMAIL PROTECTED]>

change relatime updates to be performed once per day. This makes
relatime a compatible solution for HSM, mailer-notification and
tmpwatch applications too.

also add the CONFIG_DEFAULT_RELATIME kernel option, which makes
"relatime" the default for all mounts without an extra kernel
boot option.

add the "default_relatime=0" boot option to turn this off.

also add the /proc/sys/kernel/default_relatime flag which can be changed
runtime to modify the behavior of subsequent new mounts.

tested by moving the date forward:

   # date
   Sun Aug  5 22:55:14 CEST 2007
   # date -s "Tue Aug  7 22:55:14 CEST 2007"
   Tue Aug  7 22:55:14 CEST 2007

access to a file did not generate disk IO before the date was set, and
it generated exactly one IO after the date was set.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 Documentation/kernel-parameters.txt |8 +
 fs/Kconfig  |   22 ++
 fs/inode.c  |   53 +++-
 fs/namespace.c  |   24 
 include/linux/mount.h   |3 ++
 kernel/sysctl.c |   17 +++
 6 files changed, 114 insertions(+), 13 deletions(-)

Index: linux/Documentation/kernel-parameters.txt
===
--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -525,6 +525,10 @@ and is between 256 and 4096 characters. 
This is a 16-member array composed of values
ranging from 0-255.
 
+   default_relatime=
+   [FS] mount all filesystems with relative atime
+   updates by default.
+
default_utf8=   [VT]
Format=<0|1>
Set system-wide default UTF-8 mode for all tty's.
@@ -1468,6 +1472,10 @@ and is between 256 and 4096 characters. 
Format: <reboot_mode>[,<reboot_mode2>[,...]]
See arch/*/kernel/reboot.c or arch/*/kernel/process.c   

 
+   relatime_interval=
+   [FS] relative atime update frequency, in seconds.
+   (default: 1 day: 86400 seconds)
+
reserve=[KNL,BUGS] Force the kernel to ignore some iomem area
 
reservetop= [X86-32]
Index: linux/fs/Kconfig
===
--- linux.orig/fs/Kconfig
+++ linux/fs/Kconfig
@@ -2060,6 +2060,28 @@ config 9P_FS
 
 endmenu
 
+config DEFAULT_RELATIME
+   bool "Mount all filesystems with relatime by default"
+   default y
+   help
+ If you say Y here, all your filesystems will be mounted
+ with the "relatime" mount option. This eliminates many atime
+ ('file last accessed' timestamp) updates (which otherwise
+ is performed on every file access and generates a write
+ IO to the inode) and thus speeds up IO. Atime is still updated,
+ but only once per day.
+
+ The mtime ('file last modified') and ctime ('inode last changed')
+ timestamp are unaffected by this change.
+
+ Use the "norelatime" kernel boot option to turn off this
+ feature.
+
+config DEFAULT_RELATIME_VAL
+   int
+   default "1" if DEFAULT_RELATIME
+   default "0"
+
 if BLOCK
 menu "Partition Types"
 
Index: linux/fs/inode.c
===
--- linux.orig/fs/inode.c
+++ linux/fs/inode.c
@@ -1162,6 +1162,41 @@ sector_t bmap(struct inode * inode, sect
 }
 EXPORT_SYMBOL(bmap);
 
+/*
+ * Relative atime updates frequency (default: 1 day):
+ */
+int relatime_interval __read_mostly = 24*60*60;
+
+/*
+ * With relative atime, only update atime if the
+ * previous atime is earlier than either the ctime or
+ * mtime.
+ */
+static int relatime_need_update(struct inode *inode, struct timespec now)
+{
+   /*
+* Is mtime younger than atime? If yes, update atime:
+*/
+   if (timespec_compare(&inode->i_mtime, &inode->i_atime) >= 0)
+   return 1;
+   /*
+* Is ctime younger than atime? If yes, update atime:
+*/
+   if (timespec_compare(&inode->i_ctime, &inode->i_atime) >= 0)
+   return 1;
+
+   /*

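The diff is cut off above after the second check. As a reading aid, here is a userspace sketch of the decision the changelog describes: update atime when mtime or ctime is not older than atime, or when the recorded atime is at least one interval old. The third rule is an assumption based on the changelog and the relatime_interval variable, not on code visible in this message. The struct, the secs_t type and the use of whole seconds instead of struct timespec are simplifications made for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef long long secs_t;    /* whole seconds; the kernel compares struct timespec */

struct inode_times {         /* hypothetical stand-in for the inode fields */
    secs_t accessed;         /* i_atime */
    secs_t modified;         /* i_mtime */
    secs_t changed;          /* i_ctime */
};

/* Default from the patch: one day, in seconds. */
static const secs_t relatime_interval = 24 * 60 * 60;

/* Return 1 if an access at time `now` should write a new atime, else 0. */
static int relatime_need_update(const struct inode_times *t, secs_t now)
{
    assert(t != NULL);

    /* mtime not older than atime: file was modified since the last update */
    if (t->modified >= t->accessed)
        return 1;
    /* ctime not older than atime: inode was changed since the last update */
    if (t->changed >= t->accessed)
        return 1;
    /* Assumed rule: the recorded atime is at least one interval old */
    if (now - t->accessed >= relatime_interval)
        return 1;
    return 0;
}
```

This matches the test in the changelog: repeated reads within a day cause no atime write, and the first read after the clock is moved two days forward causes exactly one.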
Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Ingo Molnar

* Dave Jones [EMAIL PROTECTED] wrote:

   btw., Mutt does not go boom, i use it myself. It works just fine 
   and notices new mails even on a noatime,nodiratime filesystem.
  
 It still fails miserably for me.
 
 If I hit 'C' and '?' I get a list of my mail folders, with some of 
 them marked 'N' if they have new mail.  Without atime, those N's never 
 show up and every mbox looks like it has no new mail.

does it work with the atime on steroids patch below? (no need to 
configure anything, just apply the patch and go.)

Ingo

---
Subject: [patch] [patch] implement smarter atime updates support
From: Ingo Molnar [EMAIL PROTECTED]

change relatime updates to be performed once per day. This makes
relatime a compatible solution for HSM, mailer-notification and
tmpwatch applications too.

also add the CONFIG_DEFAULT_RELATIME kernel option, which makes
norelatime the default for all mounts without an extra kernel
boot option.

add the default_relatime=0 boot option to turn this off.

also add the /proc/sys/kernel/default_relatime flag which can be changed
runtime to modify the behavior of subsequent new mounts.

tested by moving the date forward:

   # date
   Sun Aug  5 22:55:14 CEST 2007
   # date -s Tue Aug  7 22:55:14 CEST 2007
   Tue Aug  7 22:55:14 CEST 2007

access to a file did not generate disk IO before the date was set, and
it generated exactly one IO after the date was set.

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
---
 Documentation/kernel-parameters.txt |8 +
 fs/Kconfig  |   22 ++
 fs/inode.c  |   53 +++-
 fs/namespace.c  |   24 
 include/linux/mount.h   |3 ++
 kernel/sysctl.c |   17 +++
 6 files changed, 114 insertions(+), 13 deletions(-)

Index: linux/Documentation/kernel-parameters.txt
===
--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -525,6 +525,10 @@ and is between 256 and 4096 characters. 
This is a 16-member array composed of values
ranging from 0-255.
 
+   default_relatime=
+   [FS] mount all filesystems with relative atime
+   updates by default.
+
default_utf8=   [VT]
Format=0|1
Set system-wide default UTF-8 mode for all tty's.
@@ -1468,6 +1472,10 @@ and is between 256 and 4096 characters. 
Format: reboot_mode[,reboot_mode2[,...]]
See arch/*/kernel/reboot.c or arch/*/kernel/process.c   

 
+   relatime_interval=
+   [FS] relative atime update frequency, in seconds.
+   (default: 1 day: 86400 seconds)
+
reserve=[KNL,BUGS] Force the kernel to ignore some iomem area
 
reservetop= [X86-32]
Index: linux/fs/Kconfig
===
--- linux.orig/fs/Kconfig
+++ linux/fs/Kconfig
@@ -2060,6 +2060,28 @@ config 9P_FS
 
 endmenu
 
+config DEFAULT_RELATIME
+   bool Mount all filesystems with relatime by default
+   default y
+   help
+ If you say Y here, all your filesystems will be mounted
+ with the relatime mount option. This eliminates many atime
+ ('file last accessed' timestamp) updates (which otherwise
+ is performed on every file access and generates a write
+ IO to the inode) and thus speeds up IO. Atime is still updated,
+ but only once per day.
+
+ The mtime ('file last modified') and ctime ('file created')
+ timestamp are unaffected by this change.
+
+ Use the norelatime kernel boot option to turn off this
+ feature.
+
+config DEFAULT_RELATIME_VAL
+   int
+   default 1 if DEFAULT_RELATIME
+   default 0
+
 if BLOCK
 menu Partition Types
 
Index: linux/fs/inode.c
===
--- linux.orig/fs/inode.c
+++ linux/fs/inode.c
@@ -1162,6 +1162,41 @@ sector_t bmap(struct inode * inode, sect
 }
 EXPORT_SYMBOL(bmap);
 
+/*
+ * Relative atime updates frequency (default: 1 day):
+ */
+int relatime_interval __read_mostly = 24*60*60;
+
+/*
+ * With relative atime, only update atime if the
+ * previous atime is earlier than either the ctime or
+ * mtime.
+ */
+static int relatime_need_update(struct inode *inode, struct timespec now)
+{
+   /*
+* Is mtime younger than atime? If yes, update atime:
+*/
+   if (timespec_compare(inode-i_mtime, inode-i_atime) = 0)
+   return 1;
+   /*
+* Is ctime younger than atime? If yes, update atime:
+*/
+   if (timespec_compare(inode-i_ctime, inode-i_atime) = 0)
+   return 1;
+
+   

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Ingo Molnar

* [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 i've been a linux sysadmin for 10 years, and have known about noatime 
 for at least 7 years, but I always thought of it in the catagory of 
 'use it only on your performance critical machines where you are 
 trying to extract every ounce of performance, and keep an eye out for 
 things misbehaving'
 
 I never imagined that itwas the 20%+ hit that is being described, and 
 with so little impact, or I would have switched to it across the board 
 years ago.
 
 I'll bet there are a lot of admins out there in the same boat.
 
 adding an option in the kernel to change the default sounds like a 
 very good first step, even if the default isn't changed today.

yep - but note that this was a gradual effect along the years, today the 
assymetry between CPU performance and disk-seek performance is 
proportionally larger than 10 years ago. Today CPUs are nearly 100 times 
faster than 10 years ago, but disk seeks got only 2-3 times faster. (and 
even that only if you have a high rpm disk - most desktops dont.)

10 years ago noatime was a nifty hack that made a difference if you had 
lots of files. But it still was a problem with no immediate easy 
solution and people developed their counter-arguments. Today the same 
counter-arguments are used, but the situation has evolved alot.

and note that often this has a bigger everyday effect than the tweaking 
of CPU scheduling, IO scheduling or swapping behavior (!). My desktop 
systems rarely swap, have plenty of CPU power to spare, but atime 
updates still have a noticeable latency impact, regardless of the memory 
pressure. Linux has _lots_ of performance reserves, so people dont 
normally notice when comparing it to other OSs, but still we should not 
be so wasteful with our IO performance, for such a fundamental thing as 
reading files.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Ingo Molnar

* Willy Tarreau [EMAIL PROTECTED] wrote:

 In your example above, maybe it's the opposite, users know they can 
 keep a file in /tmp one more week by simply cat'ing it.

sure - and i'm not arguing that noatime should the kernel-wide default. 
In every single patch i sent it was a .config option (and a boot option 
_and_ a sysctl option that i think you missed) that a user/distro 
enables or disabled. But i think the /tmp argument is not very strong: 
/tmp is fundamentally volatile, and you can grow dependencies on pretty 
much _any_ aspect of the kernel. So the question isnt is there impact 
(there is, at least for noatime), the question is is it still worth 
doing it.

 Changing the kernel in a non-easily reversible way is not kind to the 
 users.

none of my patches did any of that...

anyway, my latest patch doesnt do noatime, it does the more intelligent 
relatime approach.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Ingo Molnar

* Diego Calleja [EMAIL PROTECTED] wrote:

  Measurements show that noatime helps 20-30% on regular desktop 
  workloads, easily 50% for kernel builds and much more than that (in 
  excess of 100%) for file-read-intense workloads. We cannot just walk
 
 And as everybody knows in servers is a popular practice to disable it. 
 According to an interview to the kernel.org admins

yeah - but i'd be surprised if more than 1% of all Linux servers out 
there had noatime.

 Beyond that, Peter noted, very little fancy is going on, and that is 
 good because fancy is hard to maintain. He explained that the only 
 fancy thing being done is that all filesystems are mounted noatime 
 meaning that the system doesn't have to make writes to the filesystem 
 for files which are simply being read, that cut the load average in 
 half.

nice quote :-)

 I bet that some people would consider such performance hit a bug...

yeah.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Brice Figureau
Hi Andi,

On Mon, 2007-08-06 at 00:17 +0200, Andi Kleen wrote:
 Brice Figureau [EMAIL PROTECTED] writes:
  
   2) I _still_ don't get the performances of 2.6.17, but since that's the
  better combination I could get, I think there is IMHO progress in the right
  direction (to be compared to no progress since 2.6.18, that's better :-)).
 
 If you could characterize your workload well (e.g. how many disks,
 what file systems, what load on mysql) perhaps it would be possible
 to reproduce the problem with a test program or a mysql driver.
 Then it could be bisected.

My server is a Dell Poweredge 2850 (bi-Xeon EM64T 3GHz running without
HT, 4GB of RAM), with a Perc 4/Di (a LSI megaraid with a BBU of 256MB). 
The hardware RAID card has 2 channels, one is connected to 2 10k RPM
146GB SCSI disk that are mirrored in a RAID 1 array on which the system
resides (/dev/sda). The second channel is connected to 4 10k RPM 146GB
disks, on a RAID 10 array which contains the database files and database
logs (/dev/sdb).

The kernel and userspace are 64bits.
Above the hardware RAID arrays there is LVM2 with two physical groups
(one per array). The RAID10 has only one logical volume.

The database volume (the RAID10) is an ext3 volume mounted with
rw,noexec,nosuid,nodev,noatime,data=writeback.

The I/O scheduler on all arrays is deadline.

/proc knobs with values other than defaults are:
/proc/sys/vm/swappiness = 2
/proc/sys/vm/dirty_background_ratio = 1
/proc/sys/vm/dirty_ratio = 2
/proc/sys/vm/vfs_cache_pressure = 1

The only thing running on the server is mysql. 
Mysql memory footprint is about 90% of physical RAM. Mysql is configured
to use exclusively InnoDB.

Mysql accesses its database files in O_DIRECT mode.
Since the database fits in RAM, the only kind of access Mysql is doing
is writing to the innodb log, the mysql binlog and finally to the innodb
database files.
There are certainly a whole lot of fsync'ing happening.
All the database reads are done from the innodb in-RAM cache.

During all my kernel tests (see the original bug report) the machine was
not swapping (so that's not the reason of the stuttering).

If that helps:
db1:~# cat /proc/meminfo 
MemTotal:  4052420 kB
MemFree: 23972 kB
Buffers: 54420 kB
Cached: 168096 kB
SwapCached:1541744 kB
Active:3723468 kB
Inactive:   157180 kB
SwapTotal:11863960 kB
SwapFree: 10193064 kB
Dirty: 320 kB
Writeback:   0 kB
AnonPages: 3657744 kB
Mapped:  20508 kB
Slab:   119964 kB
SReclaimable:   103564 kB
SUnreclaim:  16400 kB
PageTables:   9408 kB
NFS_Unstable:0 kB
Bounce:  0 kB
CommitLimit:  13890168 kB
Committed_AS:  3826764 kB
VmallocTotal: 34359738367 kB
VmallocUsed:268604 kB
VmallocChunk: 34359469435 kB
HugePages_Total: 0
HugePages_Free:  0
HugePages_Rsvd:  0
Hugepagesize: 2048 kB

An typical iostat (taken every 2s under light load):
Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz 
avgqu-sz   await  svctm  %util
sda   0.00 2.000.003.50 0.0044.0012.57 
0.000.00   0.00   0.00
sdb   0.00 9.000.50   27.00 4.00   288.0010.62 
0.010.36   0.36   1.00

Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz 
avgqu-sz   await  svctm  %util
sda   0.00 0.000.000.00 0.00 0.00 0.00 
0.000.00   0.00   0.00
sdb   0.00   223.507.50  185.5060.00  5964.0031.21 
0.150.78   0.56  10.80

Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz 
avgqu-sz   await  svctm  %util
sda   0.00 1.000.001.00 0.0015.9216.00 
0.000.00   0.00   0.00
sdb   0.00   198.01   19.90  156.22   159.20  2833.8316.99 
0.040.24   0.20   3.58

Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz 
avgqu-sz   await  svctm  %util
sda   0.00 0.000.000.00 0.00 0.00 0.00 
0.000.00   0.00   0.00
sdb   0.00 5.000.50   17.00 4.00   176.0010.29 
0.010.69   0.69   1.20

Would it help if I try blktrace on this server to capture the I/O ?
I enabled it while compiling the kernel, but I don't know yet how to use
it:
any pointer on how to activate it and capture useful information?

Many thanks,
-- 
Brice Figureau [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Andi Kleen
On Sun, Aug 05, 2007 at 09:41:12PM +0100, Christoph Hellwig wrote:
 On Sun, Aug 05, 2007 at 02:26:53AM +0200, Andi Kleen wrote:
  I always thought the right solution would be to just sync atime only
  very very lazily. This means if a inode is only dirty because of an
  atime update put it on a only write out when there is nothing to do
  or the memory is really needed list.
 
 Which is the policy I implemented for XFS a while ago.

How would that work? I didn't think XFS had separate inode lists.

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Willy Tarreau
On Mon, Aug 06, 2007 at 08:57:12AM +0200, Ingo Molnar wrote:
 
 * Willy Tarreau [EMAIL PROTECTED] wrote:
 
  In your example above, maybe it's the opposite, users know they can 
  keep a file in /tmp one more week by simply cat'ing it.
 
 sure - and i'm not arguing that noatime should the kernel-wide default. 
 In every single patch i sent it was a .config option (and a boot option 
 _and_ a sysctl option that i think you missed) that a user/distro 
 enables or disabled. But i think the /tmp argument is not very strong: 
 /tmp is fundamentally volatile, and you can grow dependencies on pretty 
 much _any_ aspect of the kernel. So the question isnt is there impact 
 (there is, at least for noatime), the question is is it still worth 
 doing it.
 
  Changing the kernel in a non-easily reversible way is not kind to the 
  users.
 
 none of my patches did any of that...

I did not notice you talked about a sysctl. A sysctl provides the ability
to switch the behaviour without rebooting, while both the config option
and the command line require a reboot.

 anyway, my latest patch doesnt do noatime, it does the more intelligent 
 relatime approach.

... which is not equivalent noatime in the initial example.

Regards,
Willy

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Chris Mason
On Sun, 5 Aug 2007 11:00:29 -0400
Theodore Tso [EMAIL PROTECTED] wrote:

 On Sun, Aug 05, 2007 at 02:26:53AM +0200, Andi Kleen wrote:
  I always thought the right solution would be to just sync atime only
  very very lazily. This means if a inode is only dirty because of an
  atime update put it on a only write out when there is nothing to do
  or the memory is really needed list.
 
 As I've mentioend earlier, the memory balancing issues that arise when
 we add an atime dirty bit scare me a little.  It can be addressed,
 obviously, but at the cost of more code complexity.

ext3 and reiser both use a dirty_inode method to make sure that we
don't actually have dirty inodes.  This way, kswapd doesn't get stuck
on the log and is able to do real work.

It would be interesting to see a comparison of relatime with a kinoded
that is willing to get stuck on the log.  The FS would need a few
tweaks so that write_inode() could know if it really needed to log or
not, but for testing you could just drop ext3_dirty_inode and have
ext3_write_inode do real work.

Then just change kswapd to kick a new kinoded and benchmark away.  A
real patch would have to look for places where mark_inode_dirty was
used and expected the dirty_inode callback to log things right away,
but for testing it's good enough.

-chris


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Dave Jones
On Mon, Aug 06, 2007 at 08:39:09AM +0200, Ingo Molnar wrote:
  
  * Dave Jones [EMAIL PROTECTED] wrote:
  
 btw., Mutt does not go boom, i use it myself. It works just fine 
 and notices new mails even on a noatime,nodiratime filesystem.

   It still fails miserably for me.
   
   If I hit 'C' and '?' I get a list of my mail folders, with some of 
   them marked 'N' if they have new mail.  Without atime, those N's never 
   show up and every mbox looks like it has no new mail.
  
  does it work with the "atime on steroids" patch below? (no need to 
  configure anything, just apply the patch and go.)

people have reported that relatime does work, but my util-linux
isn't new enough to support it, so I've never got it to work.
I'll give your diff a try later, though as it seems to be
equivalent I expect it'll work.
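
For context, the new-mail flag on an mbox folder is commonly described as a comparison of the file's modification time against its access time. A minimal sketch of that heuristic, assuming this is the check involved here; the function name is hypothetical and this is not mutt's actual code:

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: treat a non-empty mbox as holding new mail when
 * it was modified (mail delivered) after it was last accessed (read).
 */
bool mbox_looks_new(const struct stat *sb)
{
    return sb->st_size > 0 && sb->st_mtime > sb->st_atime;
}
```

A caller would fill the struct with stat() on the folder path and show the 'N' marker when the helper returns true.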

Dave

-- 
http://www.codemonkey.org.uk


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Ingo Molnar

* Dave Jones [EMAIL PROTECTED] wrote:

   does it work with the "atime on steroids" patch below? (no need to 
   configure anything, just apply the patch and go.)
 
 people have reported that relatime does work, but my util-linux isn't 
 new enough to support it, so I've never got it to work. I'll give your 
 diff a try later, though as it seems to be equivalent I expect it'll 
 work.

would still be nice if you could test it and report back :)

Ingo


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Chuck Ebbert
On 08/05/2007 04:36 PM, Christoph Hellwig wrote:
 
 Umm, no f**king way.  atime selection is 100% policy and belongs in
 userspace.  Add to that the problem that we can't actually re-enable
 atimes because of the way the vfs-level mount flags API is designed.
 Instead of doing such a fugly kernel patch just talk to the handful
 of distributions that matter to update their defaults.
 

We already tried that here. The response: If noatime is so great, why
isn't it the default in the kernel?


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Jeff Garzik

Chuck Ebbert wrote:

On 08/05/2007 04:36 PM, Christoph Hellwig wrote:

Umm, no f**king way.  atime selection is 100% policy and belongs in
userspace.  Add to that the problem that we can't actually re-enable
atimes because of the way the vfs-level mount flags API is designed.
Instead of doing such a fugly kernel patch just talk to the handful
of distributions that matter to update their defaults.



We already tried that here. The response: If noatime is so great, why
isn't it the default in the kernel?


Yes, and around and around we go :/

Jeff




Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Alan Cox
 We already tried that here. The response: If noatime is so great, why
 isn't it the default in the kernel?

Ok so we have a pile of people @redhat.com sitting on linux-kernel
complaining about Red Hat distributions not taking it up. Guys - can
we just fix it internally please like sensible folk ?

Ingo's latest 'not quite noatime' seems to cure mutt/tmpwatch so it might
finally make sense to do so.

Alan


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Chuck Ebbert
On 08/06/2007 03:37 PM, Alan Cox wrote:
 We already tried that here. The response: If noatime is so great, why
 isn't it the default in the kernel?
 
 Ok so we have a pile of people @redhat.com sitting on linux-kernel
 complaining about Red Hat distributions not taking it up. Guys - can
 we just fix it internally please like sensible folk ?
 
 Ingo's latest 'not quite noatime' seems to cure mutt/tmpwatch so it might
 finally make sense to do so.

Do we report max(ctime, mtime) as the atime by default when noatime
is set or do we still need that to be done?
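
To make the question concrete, here is a rough sketch of what reporting max(ctime, mtime) as the atime could look like. The struct, the function name, and the noatime flag are all hypothetical; this illustrates the idea being asked about and is not a description of what any kernel does.

```c
#include <stdbool.h>

/* Timestamps in whole seconds; a simplified stand-in for inode times. */
typedef long long secs_t;

struct file_times {
    secs_t atime;  /* stored access time (stale under noatime) */
    secs_t mtime;  /* last data modification */
    secs_t ctime;  /* last status change */
};

/*
 * Hypothetical helper: the atime value handed back to userspace.
 * With noatime set, substitute the later of ctime and mtime for the
 * stored atime; otherwise return the stored atime unchanged.
 */
secs_t reported_atime(const struct file_times *t, bool noatime)
{
    if (!noatime)
        return t->atime;
    return t->ctime > t->mtime ? t->ctime : t->mtime;
}
```

With this substitution the reported atime can never be older than mtime, so any tool that compares the two would see the file as already read.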



Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-06 Thread Miklos Szeredi
 Per device dirty throttling patches

Andrew, may I inquire about your plans with this?

 These patches aim to improve balance_dirty_pages() and directly address three
 issues:
   1) inter device starvation
   2) stacked device deadlocks

This one interests me most, due to various real life, reported
problems with fuse filesystems.  For this reason I'd really like to
get this or a subset of it into mainline as soon as possible.

This patchset (or rather the -v7 version) has been running on my
laptop for a couple of weeks without problems.  I've also verified
that it solves the fuse and loop issues.

I have some qualms about the complexity of various parts though.
Especially the proportions library, which I'm having problems
understanding.  I'm not sure that this level of sophistication is
really needed to solve the issues with the old code.

Miklos

