On Thu, Jan 27, 2005 at 07:12:29AM -0600, Robin Holt wrote:
> > Some architectures tend to have spare DMA engines lying around. There's
> > no need to use the CPU for zeroing pages. How feasible would it be for
> > scrubd to use these?
>
> An earlier proposal that Christoph pushed would have used the
Christoph Lameter writes:
> scrubd clears pages of orders 7-4 by default. That means 2^4 to 2^7
> pages are cleared at once.
So are you saying that clearing an order 4 page will take measurably
less time than clearing 16 order 0 pages? I find that hard to
believe.
Paul.
On Fri, 4 Feb 2005, Paul Mackerras wrote:
> > Yes but it's a short burst that only occurs very infrequently and it takes
>
> It occurs just as often as we clear pages in the page fault handler.
> We aren't clearing any fewer pages by prezeroing, we are just clearing
> them a bit earlier.
scrubd clears
> > advantage of all the optimizations that modern memory subsystems have for
> > linear accesses. And if hardware exists that can offload that from the cpu
> > then the cpu caches are only minimally affected.
>
> I can believe that prezeroing could provide a benefit on some
> machines, but I don't think
Christoph Lameter writes:
> If the program does not use these cache lines then you have wasted time
> in the page fault handler allocating and handling them. That is what
> prezeroing does for you.
The program is going to access at least one cache line of the new
page. On my G5, it takes _less_
On Fri, 4 Feb 2005, Nick Piggin wrote:
> If you have got to the stage of doing "real world" tests, I'd be
> interested to see results of tests that best highlight the improvements.
I am trying to figure out which tests to use right now.
> I imagine many general purpose server things wouldn't be
On Thu, 2005-02-03 at 22:26 -0800, Christoph Lameter wrote:
> On Fri, 4 Feb 2005, Paul Mackerras wrote:
>
> > As has my scepticism about pre-zeroing actually providing any benefit
> > on ppc64. Nevertheless, the only definitive answer is to actually
> > measure the performance both ways.
>
> Of course.
On Fri, 4 Feb 2005, Paul Mackerras wrote:
> The dcbz instruction on the G5 (PPC970) establishes the new cache line
> in the L2 cache and doesn't disturb the L1 cache (except to invalidate
> the line in the L1 data cache if it is present there). The L2 cache
> is 512kB and 8-way set associative
Christoph Lameter writes:
> You need to think about this in a different way. Prezeroing only makes
> sense if it can avoid using cache lines that the zeroing in the
> hot paths would have to use since it touches all cachelines on
> the page (the ppc instruction is certainly nice and avoids a
On Fri, 4 Feb 2005, Paul Mackerras wrote:
> On my G5 it takes ~200 cycles to zero a whole page. In other words it
> takes about the same time to zero a page as to bring in a single cache
> line from memory. (PPC has an instruction to establish a whole cache
> line of zeroes in modified state
Rik van Riel writes:
> I'm not convinced. Zeroing a page takes 2000-4000 CPU
> cycles, while faulting the page from RAM into cache takes
> 200-400 CPU cycles per cache line, or 6000-12000 CPU
> cycles.
On my G5 it takes ~200 cycles to zero a whole page. In other words it
takes about the same
On Wed, 2 Feb 2005, Marcelo Tosatti wrote:
> Someone should try implementing the zeroing driver for a fast x86 PCI
> device. :)
I'm not convinced. Zeroing a page takes 2000-4000 CPU
cycles, while faulting the page from RAM into cache takes
200-400 CPU cycles per cache line, or 6000-12000 CPU
cycles.
On Wed, 2 Feb 2005, Marcelo Tosatti wrote:
> > Nope the BTE is a block transfer engine. Its an inter numa node DMA thing
> > that is being abused to zero blocks.
> Ah, OK.
> Is there a driver for normal BTE operation or is it not kernel-controlled?
There is a function bte_copy in the ia64 arch.
On Wed, 2005-02-02 at 14:31 -0200, Marcelo Tosatti wrote:
> Someone should try implementing the zeroing driver for a fast x86 PCI
> device. :)
The BT848/BT878 seems like an ideal candidate. That kind of abuse is
probably only really worth it on an architecture with cache-coherent DMA
though. If
On Wed, 2005-02-02 at 21:00, Maciej W. Rozycki wrote:
> E.g. the Broadcom's MIPS64-based SOCs have four general purpose DMA
> engines onchip which can transfer data to/from the memory controller in
> 32-byte chunks over the 256-bit internal bus. We have hardly any use for
> these
On Wed, Feb 02, 2005 at 11:05:14AM -0800, Christoph Lameter wrote:
> On Wed, 2 Feb 2005, Marcelo Tosatti wrote:
>
> > Sounds like a very interesting idea to me. Guess it depends on whether the cost of
> > DMA write for memory zeroing, which is memory architecture/DMA engine
> > dependent,
> > offsets
On Wed, 2 Feb 2005, Marcelo Tosatti wrote:
> > Some architectures tend to have spare DMA engines lying around. There's
> > no need to use the CPU for zeroing pages. How feasible would it be for
> > scrubd to use these?
[...]
> I suppose you are talking about DMA engines which are not being driven
On Wed, 2 Feb 2005, Marcelo Tosatti wrote:
> Sounds like a very interesting idea to me. Guess it depends on whether the cost of
> DMA write for memory zeroing, which is memory architecture/DMA engine
> dependent,
> offsets the cost of CPU zeroing.
>
> Do you have any thoughts on that?
>
> I wonder if
On Thu, Jan 27, 2005 at 12:15:24PM, David Woodhouse wrote:
> On Fri, 2005-01-21 at 12:29 -0800, Christoph Lameter wrote:
> > Adds management of ZEROED and NOT_ZEROED pages and a background daemon
> > called scrubd. scrubd is disabled by default but can be enabled
> > by writing an order
On Thu, 27 Jan 2005, David Woodhouse wrote:
> On Thu, 2005-01-27 at 07:12 -0600, Robin Holt wrote:
> > An earlier proposal that Christoph pushed would have used the BTE on
> > sn2 for this. Are you thinking of using the BTE on sn0/sn1 mips?
>
> I wasn't being that specific. There's spare DMA
On Thu, 2005-01-27 at 07:12 -0600, Robin Holt wrote:
> An earlier proposal that Christoph pushed would have used the BTE on
> sn2 for this. Are you thinking of using the BTE on sn0/sn1 mips?
I wasn't being that specific. There's spare DMA engines on a lot of
PPC/ARM/FRV/SH/MIPS and other machines,
On Fri, 2005-01-21 at 12:29 -0800, Christoph Lameter wrote:
> Adds management of ZEROED and NOT_ZEROED pages and a background daemon
> called scrubd. scrubd is disabled by default but can be enabled
> by writing an order number to /proc/sys/vm/scrub_start. If a page
> is coalesced of that order or higher then the scrub daemon will
> start zeroing until all pages of
Hello Christoph,
In this part of your patch:
[...]
Index: linux-2.6.10/include/linux/gfp.h
===================================================================
--- linux-2.6.10.orig/include/linux/gfp.h	2005-01-21 10:43:59.000000000 -0800
+++ linux-2.6.10/include/linux/gfp.h	2005-01-21
51 matches