Re: do_generic_mapping_read performance issue

2007-03-13 Thread Jan Kara
On Tue 13-03-07 14:43:47, Ashif Harji wrote:
> 
> >>I would like to submit a patch to fix the performance problem.  The
> >>simplest solution is to remove the check.  Even in the situation where an
> >>application does not read in PAGE_SIZE multiples as described above, if
> >>the page is accessed frequently it should remain in the cache.  However, I
> >>am open to suggestions for a more sophisticated scheme.
> >
> > Well, yes, that's an obvious solution. I just wanted to make sure that
> >such change won't break some other load. But so far it seems it won't...
> 
> I would like to find out if the change I am suggesting will cause any 
> other problems.  Is there someone else I should contact before submitting 
> a patch?  Or should I just wait another day or two for comments?
  I don't think you'll get more feedback now. Just write the patch,
benchmark it with your workload and also with some others (basically, there
should be no difference), and then submit the patch together with the
performance numbers and some rationale (if it's your first submission, see
Documentation/SubmittingPatches). Also CC Andrew <[EMAIL PROTECTED]>
and ask him to put it into -mm kernels for testing.

Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: do_generic_mapping_read performance issue

2007-03-13 Thread Ashif Harji



> > I would like to submit a patch to fix the performance problem.  The
> > simplest solution is to remove the check.  Even in the situation where an
> > application does not read in PAGE_SIZE multiples as described above, if
> > the page is accessed frequently it should remain in the cache.  However, I
> > am open to suggestions for a more sophisticated scheme.
>
> Well, yes, that's an obvious solution. I just wanted to make sure that
> such change won't break some other load. But so far it seems it won't...


I would like to find out if the change I am suggesting will cause any 
other problems.  Is there someone else I should contact before submitting 
a patch?  Or should I just wait another day or two for comments?


ashif.


Re: do_generic_mapping_read performance issue

2007-03-12 Thread Jan Kara
On Mon 12-03-07 13:05:17, Ashif Harji wrote:
> 
> On Mon, 12 Mar 2007, Jan Kara wrote:
> 
> > On Mon 12-03-07 15:39:00, Nick Piggin wrote:
> > > On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
> > > >   Hi,
> > > >
> > > > > Hi, I am encountering a performance problem, which I have tracked into the
> > > > > Linux kernel. The problem occurs with my experimental web server that uses
> > > > > sendfile to repeatedly transmit files.  The files are based on the static
> > > > > portion of the SPECweb99 fileset and range in size to model a reasonable
> > > > > workload.  With this workload, a significant number of the requests are
> > > > > for files of size 4 KB or less.
> > > > >
> > > > > I have determined that the performance problem occurs in the function
> > > > > do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
> > > > > Here is the specific code fragment:
> > > > >
> > > > > /*
> > > > >  * When (part of) the same page is read multiple times
> > > > >  * in succession, only mark it as accessed the first time.
> > > > >  */
> > > > > if (prev_index != index)
> > > > >     mark_page_accessed(page);
> > > >   Actually, the code is like that certainly for two years :).
> > >
> > > Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
> > > stage (ie. read from the start of a page -- obviously that also has holes).
> >   Yes, at least in 2.6.12-rc5 which is the first one in git :).
> >
> > > > > I was wondering if anyone could explain why the call to mark_page_accessed
> > > > > is conditional? That is, what problem it is trying to solve. It would seem
> > > > > that in many scenarios, if the same page is accessed repeatedly, then it
> > > > > would be appropriate to keep that page cached.
> > > >   I also don't know why the condition is there but it's there at least
> > > > for two years so I'm not sure anybody remembers ;). Nick, do you have
> > > > an idea?
> > >
> > > Yeah it is there because that is basically how our "use once" detection
> > > handles the case where an app does not read in chunks that are PAGE_SIZE
> > > multiples and PAGE_SIZE aligned.
> >   OK, I see. Then I'm not sure the check does more good than bad. Because
> > if we happen to reread the same chunk several times, then the check does a
> > wrong thing...
> 
> I would like to submit a patch to fix the performance problem.  The 
> simplest solution is to remove the check.  Even in the situation where an 
> application does not read in PAGE_SIZE multiples as described above, if 
> the page is accessed frequently it should remain in the cache.  However, I 
> am open to suggestions for a more sophisticated scheme.
  Well, yes, that's an obvious solution. I just wanted to make sure that
such a change won't break some other load. But so far it seems it won't...

Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs


Re: do_generic_mapping_read performance issue

2007-03-12 Thread Ashif Harji


On Mon, 12 Mar 2007, Jan Kara wrote:

> On Mon 12-03-07 15:39:00, Nick Piggin wrote:
> > On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
> > >   Hi,
> > >
> > > > Hi, I am encountering a performance problem, which I have tracked into the
> > > > Linux kernel. The problem occurs with my experimental web server that uses
> > > > sendfile to repeatedly transmit files.  The files are based on the static
> > > > portion of the SPECweb99 fileset and range in size to model a reasonable
> > > > workload.  With this workload, a significant number of the requests are
> > > > for files of size 4 KB or less.
> > > >
> > > > I have determined that the performance problem occurs in the function
> > > > do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
> > > > Here is the specific code fragment:
> > > >
> > > > /*
> > > >  * When (part of) the same page is read multiple times
> > > >  * in succession, only mark it as accessed the first time.
> > > >  */
> > > > if (prev_index != index)
> > > >     mark_page_accessed(page);
> > >   Actually, the code is like that certainly for two years :).
> >
> > Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
> > stage (ie. read from the start of a page -- obviously that also has holes).
>   Yes, at least in 2.6.12-rc5 which is the first one in git :).
>
> > > > I was wondering if anyone could explain why the call to mark_page_accessed
> > > > is conditional? That is, what problem it is trying to solve. It would seem
> > > > that in many scenarios, if the same page is accessed repeatedly, then it
> > > > would be appropriate to keep that page cached.
> > >   I also don't know why the condition is there but it's there at least
> > > for two years so I'm not sure anybody remembers ;). Nick, do you have
> > > an idea?
> >
> > Yeah it is there because that is basically how our "use once" detection
> > handles the case where an app does not read in chunks that are PAGE_SIZE
> > multiples and PAGE_SIZE aligned.
>   OK, I see. Then I'm not sure the check does more good than bad. Because
> if we happen to reread the same chunk several times, then the check does a
> wrong thing...


Thanks for providing me with additional information.

I would like to submit a patch to fix the performance problem.  The 
simplest solution is to remove the check.  Even in the situation where an 
application does not read in PAGE_SIZE multiples as described above, if 
the page is accessed frequently it should remain in the cache.  However, I 
am open to suggestions for a more sophisticated scheme.


ashif.


Re: do_generic_mapping_read performance issue

2007-03-12 Thread Ashif Harji



> > The implication of this code is that for files of size less than or equal
> > to a single page, the page associated with such a file is likely to get
> > evicted from the cache regardless of how frequently it is accessed.  The
> > reason is that after the first access, prev_index is always zero and index
> > can only be zero. Hence, mark_page_accessed is never called after the
> > first time the file is requested.  As a result, the page is evicted from
> > the cache no matter how frequently it is used.  By changing the kernel to
> > always call mark_page_accessed for these files, the server throughput is
> > increased by as much as 20%.
>
>   Your analysis seems to be right. But to observe this behaviour you have
> to have the file open and just always reread it using the same file
> descriptor, don't you? That's probably not too common...


Yes, this problem requires the file to be accessed via the same file 
descriptor.  I suspect this behaviour is common for a certain class of 
applications, e.g., web servers.  However, any application that reads from
the same page repeatedly would also experience this problem.  It is more
likely to occur with small files, but not exclusively.


ashif.
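
The access pattern described above can be approximated from user space.
The following is only a sketch under stated assumptions, not code from
this thread: reread_same_fd() is a hypothetical helper, and pread() stands
in for the web server's sendfile calls on the assumption that both go
through the same page-cache read path. The point is that every read goes
through one long-lived descriptor and always lands on page index 0.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: reread the first `len` bytes of `path`
 * `iterations` times through ONE file descriptor.
 * Returns the total number of bytes read, or -1 on error.
 */
long reread_same_fd(const char *path, size_t len, int iterations)
{
    char buf[4096];          /* one page, assuming 4 KB pages */
    long total = 0;
    int fd;

    assert(len <= sizeof(buf));

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        /* Offset 0 every time, so each read hits page index 0. */
        ssize_t n = pread(fd, buf, len, 0);
        if (n < 0) {
            close(fd);
            return -1;
        }
        total += n;
    }

    close(fd);
    return total;
}
```

Opening and closing the file around every read would give each read a
fresh descriptor, which is the case Jan refers to as not triggering the
behaviour.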


Re: do_generic_mapping_read performance issue

2007-03-12 Thread Jan Kara
On Mon 12-03-07 15:39:00, Nick Piggin wrote:
> On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
> >   Hi,
> > 
> > > Hi, I am encountering a performance problem, which I have tracked into the
> > > Linux kernel. The problem occurs with my experimental web server that uses
> > > sendfile to repeatedly transmit files.  The files are based on the static
> > > portion of the SPECweb99 fileset and range in size to model a reasonable
> > > workload.  With this workload, a significant number of the requests are
> > > for files of size 4 KB or less.
> > >
> > > I have determined that the performance problem occurs in the function
> > > do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
> > > Here is the specific code fragment:
> > >
> > > /*
> > >  * When (part of) the same page is read multiple times
> > >  * in succession, only mark it as accessed the first time.
> > >  */
> > > if (prev_index != index)
> > >     mark_page_accessed(page);
> >   Actually, the code is like that certainly for two years :).
> 
> Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
> stage (ie. read from the start of a page -- obviously that also has holes).
  Yes, at least in 2.6.12-rc5 which is the first one in git :).

> > > I was wondering if anyone could explain why the call to mark_page_accessed
> > > is conditional? That is, what problem it is trying to solve. It would seem
> > > that in many scenarios, if the same page is accessed repeatedly, then it
> > > would be appropriate to keep that page cached.
> >   I also don't know why the condition is there but it's there at least
> > for two years so I'm not sure anybody remembers ;). Nick, do you have
> > an idea?
> 
> Yeah it is there because that is basically how our "use once" detection
> handles the case where an app does not read in chunks that are PAGE_SIZE
> multiples and PAGE_SIZE aligned.
  OK, I see. Then I'm not sure the check does more good than bad. Because
if we happen to reread the same chunk several times, then the check does a
wrong thing...

Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs


Re: do_generic_mapping_read performance issue

2007-03-12 Thread Nick Piggin
On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
>   Hi,
> 
> > Hi, I am encountering a performance problem, which I have tracked into the 
> > Linux kernel. The problem occurs with my experimental web server that uses 
> > sendfile to repeatedly transmit files.  The files are based on the static 
> > portion of the SPECweb99 fileset and range in size to model a reasonable 
> > workload.  With this workload, a significant number of the requests are 
> > for files of size 4 KB or less.
> > 
> > I have determined that the performance problem occurs in the function
> > do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
> > Here is the specific code fragment:
> > 
> > /*
> >  * When (part of) the same page is read multiple times
> >  * in succession, only mark it as accessed the first time.
> >  */
> > if (prev_index != index)
> >     mark_page_accessed(page);
>   Actually, the code is like that certainly for two years :).

Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
stage (ie. read from the start of a page -- obviously that also has holes).

> > The implication of this code is that for files of size less than or equal 
> > to a single page, the page associated with such a file is likely to get 
> > evicted from the cache regardless of how frequently it is accessed.  The 
> > reason is that after the first access, prev_index is always zero and index 
> > can only be zero. Hence, mark_page_accessed is never called after the 
> > first time the file is requested.  As a result, the page is evicted from 
> > the cache no matter how frequently it is used.  By changing the kernel to 
> > always call mark_page_accessed for these files, the server throughput is 
> > increased by as much as 20%.
>   Your analysis seems to be right. But to observe this behaviour you have
> to have the file open and just always reread it using the same file
> descriptor, don't you? That's probably not too common...
> 
> > I was wondering if anyone could explain why the call to mark_page_accessed 
> > is conditional? That is, what problem it is trying to solve. It would seem 
> > that in many scenarios, if the same page is accessed repeatedly, then it 
> > would be appropriate to keep that page cached.
>   I also don't know why the condition is there but it's there at least
> for two years so I'm not sure anybody remembers ;). Nick, do you have
> an idea?

Yeah it is there because that is basically how our "use once" detection
handles the case where an app does not read in chunks that are PAGE_SIZE
multiples and PAGE_SIZE aligned.
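
To illustrate the case the check is meant to handle, here is a small
user-space model. It is a sketch, not kernel code: count_marks(),
MODEL_PAGE_SIZE and the "no previous page" starting value are assumptions
made for the example. It counts how many times a page would be marked
accessed during one sequential pass over a file, with and without the
prev_index comparison.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL   /* assumed page size for the model */

/*
 * Model one sequential pass over `file_size` bytes in `chunk`-byte reads.
 * If with_check is non-zero, a page is only marked when it differs from
 * the previously read page, mirroring "if (prev_index != index)".
 * Returns the number of mark operations.
 */
unsigned long count_marks(unsigned long file_size, unsigned long chunk,
                          int with_check)
{
    unsigned long prev_index = (unsigned long)-1; /* assumed: no previous page */
    unsigned long marks = 0;
    unsigned long pos = 0;

    assert(chunk > 0);

    while (pos < file_size) {
        unsigned long end = pos + chunk < file_size ? pos + chunk : file_size;
        unsigned long last = (end - 1) / MODEL_PAGE_SIZE;

        /* Walk every page touched by this read. */
        for (unsigned long index = pos / MODEL_PAGE_SIZE; index <= last; index++) {
            if (!with_check || prev_index != index)
                marks++;
            prev_index = index;
        }
        pos = end;
    }
    return marks;
}
```

Under these assumptions, a single pass over an 8192-byte file in 1024-byte
reads gives 8 marks without the check and 2 with it, so with the check a
page read once in small pieces is counted as touched once rather than
several times.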



Re: do_generic_mapping_read performance issue

2007-03-12 Thread Jan Kara
  Hi,

> Hi, I am encountering a performance problem, which I have tracked into the 
> Linux kernel. The problem occurs with my experimental web server that uses 
> sendfile to repeatedly transmit files.  The files are based on the static 
> portion of the SPECweb99 fileset and range in size to model a reasonable 
> workload.  With this workload, a significant number of the requests are 
> for files of size 4 KB or less.
> 
> I have determined that the performance problem occurs in the function
> do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
> Here is the specific code fragment:
> 
> /*
>  * When (part of) the same page is read multiple times
>  * in succession, only mark it as accessed the first time.
>  */
> if (prev_index != index)
>     mark_page_accessed(page);
  Actually, the code is like that certainly for two years :).

> The implication of this code is that for files of size less than or equal 
> to a single page, the page associated with such a file is likely to get 
> evicted from the cache regardless of how frequently it is accessed.  The 
> reason is that after the first access, prev_index is always zero and index 
> can only be zero. Hence, mark_page_accessed is never called after the 
> first time the file is requested.  As a result, the page is evicted from 
> the cache no matter how frequently it is used.  By changing the kernel to 
> always call mark_page_accessed for these files, the server throughput is 
> increased by as much as 20%.
  Your analysis seems to be right. But to observe this behaviour you have
to have the file open and just always reread it using the same file
descriptor, don't you? That's probably not too common...

> I was wondering if anyone could explain why the call to mark_page_accessed 
> is conditional? That is, what problem it is trying to solve. It would seem 
> that in many scenarios, if the same page is accessed repeatedly, then it 
> would be appropriate to keep that page cached.
  I also don't know why the condition is there but it's there at least
for two years so I'm not sure anybody remembers ;). Nick, do you have
an idea?

Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs
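
The small-file case analysed above can be modelled in user space. This is
a simplified sketch, not the kernel implementation: struct model_page,
model_mark_page_accessed() and reread_small_file() are hypothetical names,
and the model assumes that a first touch sets a "referenced" flag, a
second touch promotes the page to the active list, and prev_index starts
out as "no previous page" for a newly opened file.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified page state for the model. */
struct model_page {
    bool referenced;
    bool active;
};

/* Assumed two-touch rule: first touch sets referenced, second promotes. */
static void model_mark_page_accessed(struct model_page *page)
{
    if (!page->active && page->referenced) {
        page->active = true;
        page->referenced = false;
    } else if (!page->referenced) {
        page->referenced = true;
    }
}

/*
 * Reread a single-page file (page index 0) `reads` times through one
 * descriptor. Returns how many times the page was marked accessed.
 */
int reread_small_file(struct model_page *page, int reads, bool with_check)
{
    unsigned long prev_index = (unsigned long)-1; /* assumed initial value */
    const unsigned long index = 0;                /* only page of the file */
    int marks = 0;

    assert(reads >= 0);

    for (int i = 0; i < reads; i++) {
        if (!with_check || prev_index != index) {
            model_mark_page_accessed(page);
            marks++;
        }
        prev_index = index;  /* stays 0 after the first read */
    }
    return marks;
}
```

In this model the page is marked once with the check, however many reads
follow, and so is never promoted, whereas without the check the second
read already promotes it.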

