Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Rik van Riel

On Wed, 30 May 2001, Andrea Arcangeli wrote:
> On Wed, May 30, 2001 at 08:57:50PM +0200, Yoann Vandoorselaere wrote:
> I remember the 2.3.51 kernel as the most usable kernel I ever used,
> as far as the VM is concerned.
> 
> I also don't remember anything strange about the VM in that kernel (I
> do, however, remember the VM breakage introduced in 2.3.99-pre).
> 
> Regardless of what 2.3.51 was doing, the falling back into the lower
> zones before starting the balancing is fine.

The problem with 2.3.51 was that it started balancing
the HIGHMEM zone before falling back.

On a 1GB system this led not only to the system starting
to swap as soon as the 128MB highmem zone was filled up,
it also resulted in the other 900MB being essentially
unused.

Having your 1GB system running as if it had 128MB definitely
can be classified as Not Fun.
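The fallback-before-balancing behaviour under discussion can be sketched
roughly as follows (illustrative C with made-up names, not the actual
__alloc_pages() code):

```c
#include <stddef.h>

/* Illustrative stand-in for the kernel's zone structure. */
typedef struct zone {
    unsigned long free_pages;
    unsigned long pages_low;
} zone_t;

/* Fallback before balancing: walk the zonelist (e.g. HIGHMEM ->
 * NORMAL -> DMA) and take the first zone comfortably above its
 * pages_low watermark; only if *every* zone is low does the caller
 * have to start reclaim.  2.3.51 instead balanced (reclaimed from)
 * the first zone as soon as it dropped low, before falling back. */
zone_t *pick_zone(zone_t **zonelist)
{
    int i;

    for (i = 0; zonelist[i] != NULL; i++)
        if (zonelist[i]->free_pages > zonelist[i]->pages_low)
            return zonelist[i];
    return NULL;  /* all zones below pages_low: time to balance */
}
```

With a 128MB HIGHMEM zone exhausted and 900MB of NORMAL still free, the
sketch above falls back to NORMAL instead of swapping.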

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Andrea Arcangeli

On Wed, May 30, 2001 at 08:57:50PM +0200, Yoann Vandoorselaere wrote:
> I remember the 2.3.51 kernel as the most usable kernel I ever used,
> as far as the VM is concerned.

I also don't remember anything strange about the VM in that kernel (I
do, however, remember the VM breakage introduced in 2.3.99-pre).

Regardless of what 2.3.51 was doing, the falling back into the lower
zones before starting the balancing is fine.

Andrea



Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Andrea Arcangeli

On Wed, May 30, 2001 at 03:42:51PM -0300, Rik van Riel wrote:
> On Wed, 30 May 2001 [EMAIL PROTECTED] wrote:
> 
> > btw, I think such a heuristic is horribly broken ;), the highmem zone
> > simply needs to be balanced if it is under the pages_low mark, just
> > skipping it and falling back into the normal zone that happens to be
> > above the low mark is the wrong thing to do.
> 
> 2.3.51 did this, we all know the result.

I've no idea what 2.3.51 did, but I was obviously wrong about
that. Forget what I said above.

Andrea



Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Yoann Vandoorselaere

Rik van Riel <[EMAIL PROTECTED]> writes:

> On Wed, 30 May 2001 [EMAIL PROTECTED] wrote:
> 
> > btw, I think such a heuristic is horribly broken ;), the highmem zone
> > simply needs to be balanced if it is under the pages_low mark, just
> > skipping it and falling back into the normal zone that happens to be
> > above the low mark is the wrong thing to do.
> 
> 2.3.51 did this, we all know the result.

Just a note:
I remember the 2.3.51 kernel as the most usable kernel I ever used,
as far as the VM is concerned.

-- 
Yoann Vandoorselaere | C makes it easy to shoot yourself in the foot. C++ makes
MandrakeSoft | it harder, but when you do, it blows away your whole
 | leg. - Bjarne Stroustrup



Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Rik van Riel

On Wed, 30 May 2001, Jens Axboe wrote:

> You are right, this is definitely something that needs checking. I
> really want this to work though. Rik, Andrea? Will the balancing
> handle the extra zone?

Insofar as it handles balancing the current zones,
it'll also work with one more. In places where it's
currently broken it will probably also break with one
extra zone, though the fact that the DMA32 zone takes
the pressure off the NORMAL zone might actually help.

regards,

Rik




Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Rik van Riel

On Wed, 30 May 2001 [EMAIL PROTECTED] wrote:

> btw, I think such a heuristic is horribly broken ;), the highmem zone
> simply needs to be balanced if it is under the pages_low mark, just
> skipping it and falling back into the normal zone that happens to be
> above the low mark is the wrong thing to do.

2.3.51 did this, we all know the result.

Rik




Re: [patch] 4GB I/O, cut three

2001-05-30 Thread andrea

On Wed, May 30, 2001 at 11:59:50AM +0100, Mark Hemment wrote:
>   Now, when HIGHMEM allocations come in (for page cache pages), they
>   skip the HIGH zone and use the NORMAL zone (as it now has plenty
>   of free pages) - the code at the top of __alloc_pages(), which
>   checks against ->pages_low.

btw, I think such a heuristic is horribly broken ;), the highmem zone
simply needs to be balanced if it is under the pages_low mark, just
skipping it and falling back into the normal zone that happens to be
above the low mark is the wrong thing to do.

>   Also, the problem isn't as bad as it first looks - HIGHMEM page-cache
> pages do get "recycled" (reclaimed), but there is a slight imbalance.

there will always be some imbalance unless all allocations were
capable of using highmem (which will never happen). The only thing we
can do is optimize the zone usage so we won't run out of normal pages
unless there was a good reason. Once we run out of normal pages we'll
simply return NULL and the reserved pool of highmem bounce pages will
be used instead (other callers will behave differently).
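A rough sketch of the fallback described above (all names here are
illustrative stand-ins, not the real kernel API):

```c
#include <stddef.h>

#define POOL_SIZE 32

static void *reserved_pool[POOL_SIZE];
static int pool_top;

/* Stand-in for the real page allocator: pretend ZONE_NORMAL is
 * exhausted and the allocation returns NULL. */
static void *alloc_normal_page(void)
{
    return NULL;
}

static void *alloc_bounce_page(void)
{
    void *page = alloc_normal_page();

    if (page)
        return page;                       /* common case: bounce from NORMAL */
    if (pool_top > 0)
        return reserved_pool[--pool_top];  /* emergency reserved pool */
    return NULL;                           /* caller must throttle and retry */
}
```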

Andrea



Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Jens Axboe

On Wed, May 30 2001, Andrea Arcangeli wrote:
> > >   I did change the patch so that bounce-pages always come from the NORMAL
> > > zone, hence the ZONE_DMA32 zone isn't needed.  I avoided the new zone, as
> > > I'm not 100% sure the VM is capable of keeping the zones it already has
> > > balanced - and adding another one might break the camel's back.  But as the
> > > test box has 4GB, it wasn't bouncing anyway.
> > 
> > You are right, this is definitely something that needs checking. I
> > really want this to work though. Rik, Andrea? Will the balancing handle
> > the extra zone?
> 
> The bounces can come from ZONE_NORMAL without problems, however the

Of course

> ZONE_DMA32 way is fine too, but yes, it probably isn't needed in real
> life unless you do a huge amount of I/O at the same time. If you want

It's not strictly needed, but it does buy us 3 extra gigabytes to do
I/O from on a PAE-enabled x86.

> to reduce the amount of changes you can defer the zone_dma32 patch and
> possibly plug it in later.

Yes, I kept the patches modular for this reason.

Thanks!

-- 
Jens Axboe




Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Andrea Arcangeli

[ my usual email is offline at the moment, please CC to [EMAIL PROTECTED]
  for anything urgent until the problem is fixed ]

On Wed, May 30, 2001 at 11:55:38AM +0200, Jens Axboe wrote:
> On Wed, May 30 2001, Mark Hemment wrote:
> > Hi Jens,
> > 
> >   I ran this (well, cut-two) on a 4-way box with 4GB of memory and a
> > modified qlogic fibre channel driver with 32 disks hanging off it, without
> > any problems.  The test used was SpecFS 2.0.
> 
> Cool, could you send me the qlogic diff? It's the one-liner can_dma32
> change I'm interested in, I'm just not sure what driver you used :-)
> I'll add that to the patch then. Basically all the PCI cards should
> work, I'm just being cautious and only enabling highmem I/O to the ones
> that have been tested.
> 
> >   Performance is definitely up - but I can't give an exact number, as the
> > run with this patch was compiled with no-omit-frame-pointer for debugging
> > any probs.
> 
> Good
> 
> >   I did change the patch so that bounce-pages always come from the NORMAL
> > zone, hence the ZONE_DMA32 zone isn't needed.  I avoided the new zone, as
> > I'm not 100% sure the VM is capable of keeping the zones it already has
> > balanced - and adding another one might break the camel's back.  But as the
> > test box has 4GB, it wasn't bouncing anyway.
> 
> You are right, this is definitely something that needs checking. I
> really want this to work though. Rik, Andrea? Will the balancing handle
> the extra zone?

The bounces can come from ZONE_NORMAL without problems, however the
ZONE_DMA32 way is fine too, but yes, it probably isn't needed in real
life unless you do a huge amount of I/O at the same time. If you want
to reduce the amount of changes you can defer the zone_dma32 patch and
possibly plug it in later.

Andrea



Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Jens Axboe

On Wed, May 30 2001, Mark Hemment wrote:
> On Wed, 30 May 2001, Jens Axboe wrote:
> > On Wed, May 30 2001, Mark Hemment wrote:
> > >   This can lead to attempt_merge() releasing the embedded request
> > > structure (which, as an exact copy, has the ->q set, so to
> > > blkdev_release_request() it looks like a request which originated from
> > > the block layer).  This isn't too healthy.
> > > 
> > >   The fix here is to add a check in __scsi_merge_requests_fn() to check
> > > for ->special being non-NULL.
> > 
> > How about just adding 
> > 
> > if (req->cmd != next->cmd
> > || req->rq_dev != next->rq_dev
> > || req->nr_sectors + next->nr_sectors > q->max_sectors
> > || next->sem || req->special)
> > return;
> > 
> > ie check for special too, that would make sense to me. Either way would
> > work, but I'd rather make this explicit in the block layer that 'not
> > normal' requests are left alone. That includes stuff with the sem set,
> > or special.
> 
> 
>   Yes, that is an equivalent fix.
> 
>   In the original patch I wanted to keep the change local (ie. in the SCSI
> layer).  Pushing the check up the generic block layer makes sense.

Ok, so we agree.

>   Are you going to push this change to Linus, or should I?
>   I'm assuming the other scsi-layer changes in Alan's tree will eventually
> be pushed.

I'll push it, I'll do the end_that_request_first thing too.

-- 
Jens Axboe




Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Mark Hemment


On Wed, 30 May 2001, Jens Axboe wrote:
> On Wed, May 30 2001, Mark Hemment wrote:
> >   This can lead to attempt_merge() releasing the embedded request
> > structure (which, as an exact copy, has the ->q set, so to
> > blkdev_release_request() it looks like a request which originated from
> > the block layer).  This isn't too healthy.
> > 
> >   The fix here is to add a check in __scsi_merge_requests_fn() to check
> > for ->special being non-NULL.
> 
> How about just adding 
> 
>   if (req->cmd != next->cmd
>   || req->rq_dev != next->rq_dev
>   || req->nr_sectors + next->nr_sectors > q->max_sectors
>   || next->sem || req->special)
> return;
> 
> ie check for special too, that would make sense to me. Either way would
> work, but I'd rather make this explicit in the block layer that 'not
> normal' requests are left alone. That includes stuff with the sem set,
> or special.


  Yes, that is an equivalent fix.

  In the original patch I wanted to keep the change local (ie. in the SCSI
layer).  Pushing the check up the generic block layer makes sense.

  Are you going to push this change to Linus, or should I?
  I'm assuming the other scsi-layer changes in Alan's tree will eventually
be pushed.

Mark




Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Jens Axboe

On Wed, May 30 2001, Mark Hemment wrote:
> Hi again, :)
> 
> On Tue, 29 May 2001, Jens Axboe wrote:
> > Another day, another version.
> > 
> > Bugs fixed in this version: none
> > Known bugs in this version: none
> > 
> > In other words, it's perfect of course.
> 
>   With the scsi-high patch, I'm not sure about the removal of the line
> from __scsi_end_request();
> 
>   req->buffer = bh->b_data;

Why?

>   A requeued request is not always processed immediately, so new
> buffer-heads arriving at the block-layer can be merged against it.  A
> requeued request is placed at the head of a request list, so
> nothing can merge with it - but what if multiple requests are
> requeued on the same queue?

You forget that SCSI is not head-active, so there can indeed be merges
against a request that was re-added to the queue list.

>   When processing the completion of a SCSI request in a bottom-half,
> __scsi_end_request() can find all the buffers associated with the request
> haven't been completed (ie. leftovers).
> 
>   One question is; can this ever happen?

Yes it can happen.

>   The request is re-queued to the block layer via 
> scsi_queue_next_request(), which uses the "special" pointer in the request
> structure to remember the Scsi_Cmnd associated with the request.  The SCSI
> request function is then called, but doesn't guarantee to immediately
> process the re-queued request even though it was added at the head (say,
> the queue has become plugged).  This can trigger two possible bugs.
> 
>   The first is that __scsi_end_request() doesn't decrement the
> hard_nr_sectors count in the request.  As the request is back on the
> queue, it is possible for newly arriving buffer-heads to merge with the
> heads already hanging off the request.  This merging uses the
> hard_nr_sectors when calculating both the merged hard_nr_sectors and
> nr_sectors counts.

Right, that looks like a bug. I would prefer SCSI using
end_that_request_first here actually.

>   As the request is at the head, only back-merging can occur, but if
> __scsi_end_request() triggers another uncompleted request to be re-queued,
> it is possible to get front merging as well.

There can be front merges too. If a head is active, then no merging
can occur. But for SCSI, the front request must always be in a sane
state, or bad things can happen, like you describe.

>   The merging of a re-queued request looks safe, except for the
> hard_nr_sectors.  This patch corrects the hard_nr_sectors accounting.

Right

>   The second bug is from request merging in attempt_merge().
> 
>   For a re-queued request, the request structure is the one embedded in
> the Scsi_Cmnd (which is a copy of the request taken in the 
> scsi_request_fn).
>   In attempt_merge(), q->merge_requests_fn() is called to see if the requests
> are allowed to merge.  __scsi_merge_requests_fn() checks the number of
> segments, etc, but doesn't check if one of the requests is a re-queued one
> (ie. no test against ->special).
>   This can lead to attempt_merge() releasing the embedded request
> structure (which, as an exact copy, has the ->q set, so to
> blkdev_release_request() it looks like a request which originated from
> the block layer).  This isn't too healthy.
> 
>   The fix here is to add a check in __scsi_merge_requests_fn() to check
> for ->special being non-NULL.

How about just adding 

if (req->cmd != next->cmd
|| req->rq_dev != next->rq_dev
|| req->nr_sectors + next->nr_sectors > q->max_sectors
|| next->sem || req->special)
return;

ie check for special too, that would make sense to me. Either way would
work, but I'd rather make this explicit in the block layer that 'not
normal' requests are left alone. That includes stuff with the sem set,
or special.

-- 
Jens Axboe




Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Mark Hemment

Hi again, :)

On Tue, 29 May 2001, Jens Axboe wrote:
> Another day, another version.
> 
> Bugs fixed in this version: none
> Known bugs in this version: none
> 
> In other words, it's perfect of course.

  With the scsi-high patch, I'm not sure about the removal of the line
from __scsi_end_request();

req->buffer = bh->b_data;

  A requeued request is not always processed immediately, so new
buffer-heads arriving at the block-layer can be merged against it.  A
requeued request is placed at the head of a request list, so
nothing can merge with it - but what if multiple requests are
requeued on the same queue?

  In Linus's tree, requests requeued via the SCSI layer can cause problems
(corruption).  I sent out a patch to cover this a few months back, which
got picked up by Alan (it's in the -ac series - see the changes to
scsi_lib.c and scsi_merge.c) but no one posted any feedback.
  I've included some of the original message below.

Mark


--
From [EMAIL PROTECTED] Sat Mar 31 16:07:14 2001 +0100
Date: Sat, 31 Mar 2001 16:07:13 +0100 (BST)
From: Mark Hemment <[EMAIL PROTECTED]>
Subject: [PATCH] Possible SCSI + block-layer bugs

Hi,

  I've never seen these trigger, but they look theoretically possible.

  When processing the completion of a SCSI request in a bottom-half,
__scsi_end_request() can find all the buffers associated with the request
haven't been completed (ie. leftovers).

  One question is; can this ever happen?
  If it can't then the code should be removed from __scsi_end_request(),
if it can happen then there appears to be a few problems;

  The request is re-queued to the block layer via 
scsi_queue_next_request(), which uses the "special" pointer in the request
structure to remember the Scsi_Cmnd associated with the request.  The SCSI
request function is then called, but doesn't guarantee to immediately
process the re-queued request even though it was added at the head (say,
the queue has become plugged).  This can trigger two possible bugs.

  The first is that __scsi_end_request() doesn't decrement the
hard_nr_sectors count in the request.  As the request is back on the
queue, it is possible for newly arriving buffer-heads to merge with the
heads already hanging off the request.  This merging uses the
hard_nr_sectors when calculating both the merged hard_nr_sectors and
nr_sectors counts.
  As the request is at the head, only back-merging can occur, but if
__scsi_end_request() triggers another uncompleted request to be re-queued,
it is possible to get front merging as well.

  The merging of a re-queued request looks safe, except for the
hard_nr_sectors.  This patch corrects the hard_nr_sectors accounting.
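The accounting fix can be sketched like this (hypothetical field and
function names, not the actual scsi_lib.c change):

```c
/* Hypothetical sketch of the fix described above: when
 * __scsi_end_request() finds leftover buffers and requeues the
 * request, the hard_* counts must shrink by the sectors already
 * completed, so that a later back-merge computes correct totals. */
struct request_counts {
    unsigned long nr_sectors;
    unsigned long hard_nr_sectors;
    unsigned long hard_sector;      /* start sector of the request */
};

void account_partial_completion(struct request_counts *req,
                                unsigned long done_sectors)
{
    req->nr_sectors      -= done_sectors;
    req->hard_nr_sectors -= done_sectors;   /* the missing decrement */
    req->hard_sector     += done_sectors;
}
```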


  The second bug is from request merging in attempt_merge().

  For a re-queued request, the request structure is the one embedded in
the Scsi_Cmnd (which is a copy of the request taken in the 
scsi_request_fn).
  In attempt_merge(), q->merge_requests_fn() is called to see if the requests
are allowed to merge.  __scsi_merge_requests_fn() checks the number of
segments, etc, but doesn't check if one of the requests is a re-queued one
(ie. no test against ->special).
  This can lead to attempt_merge() releasing the embedded request
structure (which, as an exact copy, has the ->q set, so to
blkdev_release_request() it looks like a request which originated from
the block layer).  This isn't too healthy.

  The fix here is to add a check in __scsi_merge_requests_fn() to check
for ->special being non-NULL.
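The proposed check might look roughly like this (field names loosely
follow the 2.4 block layer, but this is an illustration of the
->special test, not the actual patch):

```c
#include <stddef.h>

/* Stripped-down stand-in for the block layer's request structure. */
struct request {
    int cmd;
    int rq_dev;
    unsigned long nr_sectors;
    void *special;      /* set for requeued SCSI commands, NULL otherwise */
};

int scsi_requests_may_merge(const struct request *req,
                            const struct request *next,
                            unsigned long max_sectors)
{
    if (req->cmd != next->cmd)
        return 0;
    if (req->rq_dev != next->rq_dev)
        return 0;
    if (req->nr_sectors + next->nr_sectors > max_sectors)
        return 0;
    /* the fix: leave requeued (embedded) requests alone */
    if (req->special || next->special)
        return 0;
    return 1;
}
```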




Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Mark Hemment

On Wed, 30 May 2001, Jens Axboe wrote:
> On Wed, May 30 2001, Mark Hemment wrote:
> > Hi Jens,
> > 
> >   I ran this (well, cut-two) on a 4-way box with 4GB of memory and a
> > modified qlogic fibre channel driver with 32 disks hanging off it, without
> > any problems.  The test used was SpecFS 2.0.
> 
> Cool, could you send me the qlogic diff? It's the one-liner can_dma32
> change I'm interested in, I'm just not sure what driver you used :-)

  The qlogic driver is the one from;
http://www.feral.com/isp.html
I find this much more stable than the one already in the kernel.
  It did just need the one-liner change, but as the driver isn't in the
kernel there isn't much point adding the change to your patch. :)


> >   I did change the patch so that bounce-pages always come from the NORMAL
> > zone, hence the ZONE_DMA32 zone isn't needed.  I avoided the new zone, as
> > I'm not 100% sure the VM is capable of keeping the zones it already has
> > balanced - and adding another one might break the camel's back.  But as the
> > test box has 4GB, it wasn't bouncing anyway.
> 
> You are right, this is definitely something that needs checking. I
> really want this to work though. Rik, Andrea? Will the balancing handle
> the extra zone?

  In theory it should - ie. there isn't anything to stop it.

  With NFS loads, over a ported VxFS filesystem, I do see some problems
between the NORMAL and HIGH zones.  Thinking about it, ZONE_DMA32
shouldn't make this any worse.

  Rik, Andrea, quick description of a balancing problem;
Consider a VM which is under load (but not stressed), such that
all zone free-page pools are between their MIN and LOW marks, with
pages in the inactive_clean lists.

The NORMAL zone has non-zero page order allocations thrown at
it.  This causes __alloc_pages() to reap pages from the NORMAL
inactive_clean list until the required buddy is built.  The blind
reaping causes the NORMAL zone to have a large number of free pages
(greater than ->pages_low).

Now, when HIGHMEM allocations come in (for page cache pages), they
skip the HIGH zone and use the NORMAL zone (as it now has plenty
of free pages) - the code at the top of __alloc_pages(), which
checks against ->pages_low.

But the NORMAL zone is usually under more pressure than the HIGH
zone, as many more allocations need ready-mapped memory.  This
causes the page-cache pages from the NORMAL zone to come under
more pressure, and to be "re-cycled" quicker than page-cache pages
in the HIGHMEM zone.

  OK, we shouldn't be throwing too many non-zero order page allocations
at __alloc_pages(), but it does happen.
  Also, the problem isn't as bad as it first looks - HIGHMEM page-cache
pages do get "recycled" (reclaimed), but there is a slight imbalance.

Mark




Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Jens Axboe

On Wed, May 30 2001, Mark Hemment wrote:
> Hi Jens,
> 
>   I ran this (well, cut-two) on a 4-way box with 4GB of memory and a
> modified qlogic fibre channel driver with 32 disks hanging off it, without
> any problems.  The test used was SpecFS 2.0.

Cool, could you send me the qlogic diff? It's the one-liner can_dma32
change I'm interested in, I'm just not sure what driver you used :-)
I'll add that to the patch then. Basically all the PCI cards should
work, I'm just being cautious and only enabling highmem I/O to the ones
that have been tested.

>   Performance is definitely up - but I can't give an exact number, as the
> run with this patch was compiled with no-omit-frame-pointer for debugging
> any probs.

Good

>   I did change the patch so that bounce-pages always come from the NORMAL
> zone, hence the ZONE_DMA32 zone isn't needed.  I avoided the new zone, as
> I'm not 100% sure the VM is capable of keeping the zones it already has
> balanced - and adding another one might break the camel's back.  But as the
> test box has 4GB, it wasn't bouncing anyway.

You are right, this is definitely something that needs checking. I
really want this to work though. Rik, Andrea? Will the balancing handle
the extra zone?

-- 
Jens Axboe




Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Mark Hemment

Hi Jens,

  I ran this (well, cut-two) on a 4-way box with 4GB of memory and a
modified qlogic fibre channel driver with 32 disks hanging off it, without
any problems.  The test used was SpecFS 2.0.

  Performance is definitely up - but I can't give an exact number, as the
run with this patch was compiled with no-omit-frame-pointer for debugging
any probs.

  I did change the patch so that bounce-pages always come from the NORMAL
zone, hence the ZONE_DMA32 zone isn't needed.  I avoided the new zone, as
I'm not 100% sure the VM is capable of keeping the zones it already has
balanced - and adding another one might break the camel's back.  But as the
test box has 4GB, it wasn't bouncing anyway.

Mark


On Tue, 29 May 2001, Jens Axboe wrote:
> Another day, another version.
> 
> Bugs fixed in this version: none
> Known bugs in this version: none
> 
> In other words, it's perfect of course.
> 
> Changes:
> 
> - Added ide-dma segment coalescing
> - Only print highmem I/O enable info when HIGHMEM is actually set
> 
> Please give it a test spin, especially if you have 1GB of RAM or more.
> You should see something like this when booting:
> 
> hda: enabling highmem I/O
> ...
> SCSI: channel 0, id 0: enabling highmem I/O
> 
> depending on drive configuration etc.
> 
> Plea to maintainers of the different architectures: could you please add
> the arch parts to support this? This includes:
> 
> - memory zoning at init time
> - page_to_bus
> - pci_map_page / pci_unmap_page
> - set_bh_sg
> - KM_BH_IRQ (for HIGHMEM archs)
> 
> I think that's it, feel free to send me questions and (even better)
> patches.




Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Mark Hemment

Hi Jens,

  I ran this (well, cut-two) on a 4-way box with 4GB of memory and a
modified qlogic fibre channel driver with 32disks hanging off it, without
any problems.  The test used was SpecFS 2.0

  Peformance is definitely up - but I can't give an exact number, as the
run with this patch was compiled with no-omit-frame-pointer for debugging
any probs.

  I did change the patch so that bounce-pages always come from the NORMAL
zone, hence the ZONE_DMA32 zone isn't needed.  I avoided the new zone, as
I'm not 100% sure the VM is capable of keeping the zones it already has
balanced - and adding another one might break the camels back.  But as the
test box has 4GB, it wasn't bouncing anyway.

Mark


On Tue, 29 May 2001, Jens Axboe wrote:
 Another day, another version.
 
 Bugs fixed in this version: none
 Known bugs in this version: none
 
 In other words, it's perfect of course.
 
 Changes:
 
 - Added ide-dma segment coalescing
 - Only print highmem I/O enable info when HIGHMEM is actually set
 
 Please give it a test spin, especially if you have 1GB of RAM or more.
 You should see something like this when booting:
 
 hda: enabling highmem I/O
 ...
 SCSI: channel 0, id 0: enabling highmem I/O
 
 depending on drive configuration etc.
 
 Plea to maintainers of the different architectures: could you please add
 the arch parts to support this? This includes:
 
 - memory zoning at init time
 - page_to_bus
 - pci_map_page / pci_unmap_page
 - set_bh_sg
 - KM_BH_IRQ (for HIGHMEM archs)
 
 I think that's it, feel free to send me questions and (even better)
 patches.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Jens Axboe

On Wed, May 30 2001, Mark Hemment wrote:
 Hi Jens,
 
   I ran this (well, cut-two) on a 4-way box with 4GB of memory and a
 modified qlogic fibre channel driver with 32disks hanging off it, without
 any problems.  The test used was SpecFS 2.0

Cool, could you send me the qlogic diff? It's the one-liner can_dma32
chance I'm interested in, I'm just not sure what driver you used :-)
I'll add that to the patch then. Basically all the PCI cards should
work, I'm just being cautious and only enabling highmem I/O to the ones
that have been tested.

   Peformance is definitely up - but I can't give an exact number, as the
 run with this patch was compiled with no-omit-frame-pointer for debugging
 any probs.

Good

   I did change the patch so that bounce-pages always come from the NORMAL
 zone, hence the ZONE_DMA32 zone isn't needed.  I avoided the new zone, as
 I'm not 100% sure the VM is capable of keeping the zones it already has
 balanced - and adding another one might break the camels back.  But as the
 test box has 4GB, it wasn't bouncing anyway.

You are right, this is definitely something that needs checking. I
really want this to work though. Rik, Andrea? Will the balancing handle
the extra zone?

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Mark Hemment

On Wed, 30 May 2001, Jens Axboe wrote:
> On Wed, May 30 2001, Mark Hemment wrote:
> > Hi Jens,
> > 
> >   I ran this (well, cut-two) on a 4-way box with 4GB of memory and a
> > modified qlogic fibre channel driver with 32 disks hanging off it, without
> > any problems.  The test used was SpecFS 2.0.
> 
> Cool, could you send me the qlogic diff? It's the one-liner can_dma32
> change I'm interested in, I'm just not sure what driver you used :-)

  The qlogic driver is the one from:
http://www.feral.com/isp.html
I find this much more stable than the one already in the kernel.
  It did just need the one-liner change, but as the driver isn't in the
kernel there isn't much point adding the change to your patch. :)


> >   I did change the patch so that bounce-pages always come from the NORMAL
> > zone, hence the ZONE_DMA32 zone isn't needed.  I avoided the new zone, as
> > I'm not 100% sure the VM is capable of keeping the zones it already has
> > balanced - and adding another one might break the camel's back.  But as the
> > test box has 4GB, it wasn't bouncing anyway.
> 
> You are right, this is definitely something that needs checking. I
> really want this to work though. Rik, Andrea? Will the balancing handle
> the extra zone?

  In theory it should do - i.e. there isn't anything to stop it.

  With NFS loads, over a ported VxFS filesystem, I do see some problems
between the NORMAL and HIGH zones.  Thinking about it, ZONE_DMA32
shouldn't make this any worse.

  Rik, Andrea, a quick description of a balancing problem:
Consider a VM which is under load (but not stressed), such that
all zone free-page pools are between their MIN and LOW marks, with
pages in the inactive_clean lists.

    The NORMAL zone has non-zero-order page allocations thrown at
    it.  This causes __alloc_pages() to reap pages from the NORMAL
    inactive_clean list until the required buddy is built.  The blind
    reaping causes the NORMAL zone to have a large number of free pages
    (greater than ->pages_low).

    Now, when HIGHMEM allocations come in (for page cache pages), they
    skip the HIGH zone and use the NORMAL zone (as it now has plenty
    of free pages) - the code at the top of __alloc_pages(), which
    checks against ->pages_low.

    But the NORMAL zone is usually under more pressure than the HIGH
    zone - as many more allocations need ready-mapped memory.  This
    causes the page-cache pages from the NORMAL zone to come under
    more pressure, and to be recycled quicker than page-cache pages
    in the HIGHMEM zone.

  OK, we shouldn't be throwing too many non-zero-order allocations at
__alloc_pages(), but it does happen.
  Also, the problem isn't as bad as it first looks - HIGHMEM page-cache
pages do get recycled (reclaimed), but there is a slight imbalance.

Mark




Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Andrea Arcangeli

[ my usual email is offline at the moment, please CC to [EMAIL PROTECTED]
  for anything urgent until the problem is fixed ]

On Wed, May 30, 2001 at 11:55:38AM +0200, Jens Axboe wrote:
> On Wed, May 30 2001, Mark Hemment wrote:
> > Hi Jens,
> > 
> >   I ran this (well, cut-two) on a 4-way box with 4GB of memory and a
> > modified qlogic fibre channel driver with 32 disks hanging off it, without
> > any problems.  The test used was SpecFS 2.0.
> 
> Cool, could you send me the qlogic diff? It's the one-liner can_dma32
> change I'm interested in, I'm just not sure what driver you used :-)
> I'll add that to the patch then. Basically all the PCI cards should
> work, I'm just being cautious and only enabling highmem I/O to the ones
> that have been tested.
> 
> >   Performance is definitely up - but I can't give an exact number, as the
> > run with this patch was compiled with no-omit-frame-pointer for debugging
> > any probs.
> 
> Good
> 
> >   I did change the patch so that bounce-pages always come from the NORMAL
> > zone, hence the ZONE_DMA32 zone isn't needed.  I avoided the new zone, as
> > I'm not 100% sure the VM is capable of keeping the zones it already has
> > balanced - and adding another one might break the camel's back.  But as the
> > test box has 4GB, it wasn't bouncing anyway.
> 
> You are right, this is definitely something that needs checking. I
> really want this to work though. Rik, Andrea? Will the balancing handle
> the extra zone?

The bounces can come from ZONE_NORMAL without problems, however the
ZONE_DMA32 way is fine too, but yes, probably it isn't needed in real
life unless you do a huge amount of I/O at the same time. If you want
to reduce the amount of changes you can defer the zone_dma32 patch and
possibly plug it in later.

Andrea



Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Jens Axboe

On Wed, May 30 2001, Andrea Arcangeli wrote:
> > >   I did change the patch so that bounce-pages always come from the NORMAL
> > > zone, hence the ZONE_DMA32 zone isn't needed.  I avoided the new zone, as
> > > I'm not 100% sure the VM is capable of keeping the zones it already has
> > > balanced - and adding another one might break the camel's back.  But as the
> > > test box has 4GB, it wasn't bouncing anyway.
> > 
> > You are right, this is definitely something that needs checking. I
> > really want this to work though. Rik, Andrea? Will the balancing handle
> > the extra zone?
> 
> The bounces can come from ZONE_NORMAL without problems, however the

Of course

> ZONE_DMA32 way is fine too, but yes, probably it isn't needed in real
> life unless you do a huge amount of I/O at the same time. If you want

It's not strictly needed, but it does buy us 3 extra gigs to do I/O
from on a PAE-enabled x86.

> to reduce the amount of changes you can defer the zone_dma32 patch and
> possibly plug it in later.

Yes, I did modular patches for this reason.

Thanks!

-- 
Jens Axboe




Re: [patch] 4GB I/O, cut three

2001-05-30 Thread andrea

On Wed, May 30, 2001 at 11:59:50AM +0100, Mark Hemment wrote:
>    Now, when HIGHMEM allocations come in (for page cache pages), they
>    skip the HIGH zone and use the NORMAL zone (as it now has plenty
>    of free pages) - the code at the top of __alloc_pages(), which
>    checks against ->pages_low.

btw, I think such a heuristic is horribly broken ;), the highmem zone
simply needs to be balanced if it is under the pages_low mark; just
skipping it and falling back into the normal zone that happens to be
above the low mark is the wrong thing to do.

>   Also, the problem isn't as bad as it first looks - HIGHMEM page-cache
> pages do get recycled (reclaimed), but there is a slight imbalance.

There will always be some imbalance unless all allocations were capable
of using highmem (which will never happen). The only thing we can do
is to optimize the zone usage so we won't run out of normal pages unless
there was a good reason. Once we run out of normal pages we'll simply
return NULL and the reserved pool of highmem bounces will be used
instead (other callers will behave differently).

Andrea



Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Rik van Riel

On Wed, 30 May 2001 [EMAIL PROTECTED] wrote:

> btw, I think such a heuristic is horribly broken ;), the highmem zone
> simply needs to be balanced if it is under the pages_low mark, just
> skipping it and falling back into the normal zone that happens to be
> above the low mark is the wrong thing to do.

2.3.51 did this, we all know the result.

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)




Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Rik van Riel

On Wed, 30 May 2001, Jens Axboe wrote:

> You are right, this is definitely something that needs checking. I
> really want this to work though. Rik, Andrea? Will the balancing
> handle the extra zone?

Insofar as it handles balancing the current zones,
it'll also work with one more. In places where it's
currently broken it will probably also break with one
extra zone, though the fact that the DMA32 zone takes
the pressure off the NORMAL zone might actually help.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)




Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Andrea Arcangeli

On Wed, May 30, 2001 at 03:42:51PM -0300, Rik van Riel wrote:
> On Wed, 30 May 2001 [EMAIL PROTECTED] wrote:
> 
> > btw, I think such a heuristic is horribly broken ;), the highmem zone
> > simply needs to be balanced if it is under the pages_low mark, just
> > skipping it and falling back into the normal zone that happens to be
> > above the low mark is the wrong thing to do.
> 
> 2.3.51 did this, we all know the result.

I've no idea what 2.3.51 does, but I was obviously wrong about
that. Forget what I said above.

Andrea



Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Yoann Vandoorselaere

Rik van Riel [EMAIL PROTECTED] writes:

> On Wed, 30 May 2001 [EMAIL PROTECTED] wrote:
> 
> > btw, I think such a heuristic is horribly broken ;), the highmem zone
> > simply needs to be balanced if it is under the pages_low mark, just
> > skipping it and falling back into the normal zone that happens to be
> > above the low mark is the wrong thing to do.
> 
> 2.3.51 did this, we all know the result.

Just a note, 
I remember the 2.3.51 kernel as the most usable kernel I ever used 
talking about VM.

-- 
Yoann Vandoorselaere | C makes it easy to shoot yourself in the foot. C++ makes
MandrakeSoft | it harder, but when you do, it blows away your whole
 | leg. - Bjarne Stroustrup



Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Andrea Arcangeli

On Wed, May 30, 2001 at 08:57:50PM +0200, Yoann Vandoorselaere wrote:
> I remember the 2.3.51 kernel as the most usable kernel I ever used
> talking about VM.

I also don't remember anything strange in that kernel about the VM (I
instead remember well the VM breakage introduced in 2.3.99-pre).

Regardless of what 2.3.51 was doing, the falling back into the lower
zones before starting the balancing is fine.

Andrea



Re: [patch] 4GB I/O, cut three

2001-05-30 Thread Rik van Riel

On Wed, 30 May 2001, Andrea Arcangeli wrote:
> On Wed, May 30, 2001 at 08:57:50PM +0200, Yoann Vandoorselaere wrote:
> > I remember the 2.3.51 kernel as the most usable kernel I ever used
> > talking about VM.
> 
> I also don't remember anything strange in that kernel about the VM (I
> instead remember well the VM breakage introduced in 2.3.99-pre).
> 
> Regardless of what 2.3.51 was doing, the falling back into the lower
> zones before starting the balancing is fine.

The problem with 2.3.51 was that it started balancing
the HIGHMEM zone before falling back.

On a 1GB system this led not only to the system starting
to swap as soon as the 128MB highmem zone was filled up,
it also resulted in the other 900MB being essentially
unused.

Having your 1GB system running as if it had 128MB definitely
can be classified as Not Fun.

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)




[patch] 4GB I/O, cut three

2001-05-29 Thread Jens Axboe

Hi,

Another day, another version.

Bugs fixed in this version: none
Known bugs in this version: none

In other words, it's perfect of course.

Changes:

- Added ide-dma segment coalescing
- Only print highmem I/O enable info when HIGHMEM is actually set

Please give it a test spin, especially if you have 1GB of RAM or more.
You should see something like this when booting:

hda: enabling highmem I/O
...
SCSI: channel 0, id 0: enabling highmem I/O

depending on drive configuration etc.

Plea to maintainers of the different architectures: could you please add
the arch parts to support this? This includes:

- memory zoning at init time
- page_to_bus
- pci_map_page / pci_unmap_page
- set_bh_sg
- KM_BH_IRQ (for HIGHMEM archs)

I think that's it, feel free to send me questions and (even better)
patches.

-- 
Jens Axboe



