Re: A scrub daemon (prezeroing)

2005-02-08 Thread Ralf Baechle
On Thu, Jan 27, 2005 at 07:12:29AM -0600, Robin Holt wrote:
> > Some architectures tend to have spare DMA engines lying around. There's
> > no need to use the CPU for zeroing pages. How feasible would it be for
> > scrubd to use these?
>
> An earlier proposal that Christoph pushed would have used the

Re: A scrub daemon (prezeroing)

2005-02-04 Thread Paul Mackerras
Christoph Lameter writes:
> scrubd clears pages of orders 7-4 by default. That means 2^4 to 2^7
> pages are cleared at once.
So are you saying that clearing an order 4 page will take measurably less time than clearing 16 order 0 pages? I find that hard to believe.
Paul.
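
As a reading aid for the exchange above, here is a minimal sketch of the page-order arithmetic: an order-n block is 2^n contiguous base pages, so orders 4 through 7 cover 16 to 128 pages. The 4 KiB page size and the helper names are assumptions for illustration only; the thread does not state a page size, and some of the machines discussed use larger pages.

```python
# Assumption: 4 KiB base pages. The thread does not give a page size.
PAGE_SIZE = 4096


def pages_in_order(order: int) -> int:
    """Number of contiguous base pages in a block of the given order (2**order)."""
    return 1 << order


def block_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size in bytes of one block of the given order."""
    return pages_in_order(order) * page_size


# Orders 4 to 7 are the scrubd defaults quoted in the message above.
for order in range(4, 8):
    print(f"order {order}: {pages_in_order(order)} pages, "
          f"{block_bytes(order) // 1024} KiB")
```

Under these assumptions an order 4 block and 16 order 0 pages contain the same number of bytes, which is the point of the question: any difference would have to come from how the memory is cleared, not from how much.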

Re: A scrub daemon (prezeroing)

2005-02-04 Thread Christoph Lameter
On Fri, 4 Feb 2005, Paul Mackerras wrote:
> > Yes but it's a short burst that only occurs very infrequently and it takes
>
> It occurs just as often as we clear pages in the page fault handler.
> We aren't clearing any fewer pages by prezeroing, we are just clearing
> them a bit earlier.
scrubd clears

Re: A scrub daemon (prezeroing)

2005-02-04 Thread Andi Kleen
> > advantage of all the optimizations that modern memory subsystems have for > > linear accesses. And if hardware exists that can offload that from the cpu > > then the cpu caches are only minimally affected. > > I can believe that prezeroing could provide a benefit on some > machines, but I

Re: A scrub daemon (prezeroing)

2005-02-04 Thread Paul Mackerras
Christoph Lameter writes:
> If the program does not use these cache lines then you have wasted time
> in the page fault handler allocating and handling them. That is what
> prezeroing does for you.
The program is going to access at least one cache line of the new page. On my G5, it takes _less_

Re: A scrub daemon (prezeroing)

2005-02-03 Thread Christoph Lameter
On Fri, 4 Feb 2005, Nick Piggin wrote:
> If you have got to the stage of doing "real world" tests, I'd be
> interested to see results of tests that best highlight the improvements.
I am trying to figure out which tests to use right now.
> I imagine many general purpose server things wouldn't be

Re: A scrub daemon (prezeroing)

2005-02-03 Thread Nick Piggin
On Thu, 2005-02-03 at 22:26 -0800, Christoph Lameter wrote:
> On Fri, 4 Feb 2005, Paul Mackerras wrote:
>
> > As has my scepticism about pre-zeroing actually providing any benefit
> > on ppc64. Nevertheless, the only definitive answer is to actually
> > measure the performance both ways.
>
> Of course.

Re: A scrub daemon (prezeroing)

2005-02-03 Thread Christoph Lameter
On Fri, 4 Feb 2005, Paul Mackerras wrote:
> The dcbz instruction on the G5 (PPC970) establishes the new cache line
> in the L2 cache and doesn't disturb the L1 cache (except to invalidate
> the line in the L1 data cache if it is present there). The L2 cache
> is 512kB and 8-way set associative

Re: A scrub daemon (prezeroing)

2005-02-03 Thread Paul Mackerras
Christoph Lameter writes:
> You need to think about this in a different way. Prezeroing only makes
> sense if it can avoid using cache lines that the zeroing in the
> hot paths would have to use since it touches all cachelines on
> the page (the ppc instruction is certainly nice and avoids a

Re: A scrub daemon (prezeroing)

2005-02-03 Thread Christoph Lameter
On Fri, 4 Feb 2005, Paul Mackerras wrote:
> On my G5 it takes ~200 cycles to zero a whole page. In other words it
> takes about the same time to zero a page as to bring in a single cache
> line from memory. (PPC has an instruction to establish a whole cache
> line of zeroes in modified state

Re: A scrub daemon (prezeroing)

2005-02-03 Thread Paul Mackerras
Rik van Riel writes:
> I'm not convinced. Zeroing a page takes 2000-4000 CPU
> cycles, while faulting the page from RAM into cache takes
> 200-400 CPU cycles per cache line, or 6000-12000 CPU
> cycles.
On my G5 it takes ~200 cycles to zero a whole page. In other words it takes about the same time

Re: A scrub daemon (prezeroing)

2005-02-02 Thread Rik van Riel
On Wed, 2 Feb 2005, Marcelo Tosatti wrote:
> Someone should try implementing the zeroing driver for a fast x86 PCI device. :)
I'm not convinced. Zeroing a page takes 2000-4000 CPU cycles, while faulting the page from RAM into cache takes 200-400 CPU cycles per cache line, or 6000-12000 CPU cycles.
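
The cycle estimates in the message above can be cross-checked with a short worked calculation. This is only a sketch of the arithmetic: the cycle counts are the estimates quoted in the message, not measurements, and the figure of about 30 cache lines per page is inferred from them rather than stated anywhere in the thread. The function names are made up for illustration.

```python
# Estimates quoted in the message above, as (low, high) ranges in CPU cycles.
ZERO_PAGE_CYCLES = (2000, 4000)        # zeroing one page
FAULT_CYCLES_PER_LINE = (200, 400)     # bringing one cache line in from RAM
FAULT_PAGE_CYCLES = (6000, 12000)      # bringing a whole page in from RAM


def implied_lines(page_cycles: int, per_line_cycles: int) -> int:
    """Cache lines per page implied by a whole-page cost and a per-line cost."""
    return page_cycles // per_line_cycles


def fault_to_zero_ratio(fault_cycles: int, zero_cycles: int) -> float:
    """How many times more expensive faulting a page in is than zeroing it."""
    return fault_cycles / zero_cycles


lines_low = implied_lines(FAULT_PAGE_CYCLES[0], FAULT_CYCLES_PER_LINE[0])
lines_high = implied_lines(FAULT_PAGE_CYCLES[1], FAULT_CYCLES_PER_LINE[1])
print(lines_low, lines_high)  # 30 30

ratio_low = fault_to_zero_ratio(FAULT_PAGE_CYCLES[0], ZERO_PAGE_CYCLES[0])
ratio_high = fault_to_zero_ratio(FAULT_PAGE_CYCLES[1], ZERO_PAGE_CYCLES[1])
print(ratio_low, ratio_high)  # 3.0 3.0
```

On these numbers, pulling a freshly DMA-zeroed page into cache would cost roughly three times as much as zeroing it with the CPU, which is the basis of the objection.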

Re: A scrub daemon (prezeroing)

2005-02-02 Thread Christoph Lameter
On Wed, 2 Feb 2005, Marcelo Tosatti wrote:
> > Nope the BTE is a block transfer engine. It's an inter-NUMA-node DMA thing
> > that is being abused to zero blocks.
> Ah, OK.
> Is there a driver for normal BTE operation or is it not kernel-controlled?
There is a function bte_copy in the ia64 arch. See

Re: A scrub daemon (prezeroing)

2005-02-02 Thread David Woodhouse
On Wed, 2005-02-02 at 14:31 -0200, Marcelo Tosatti wrote:
> Someone should try implementing the zeroing driver for a fast x86 PCI
> device. :)
The BT848/BT878 seems like an ideal candidate. That kind of abuse is probably only really worth it on an architecture with cache-coherent DMA though. If you

Re: A scrub daemon (prezeroing)

2005-02-02 Thread David Woodhouse
On Wed, 2005-02-02 at 21:00 +, Maciej W. Rozycki wrote:
> E.g. the Broadcom's MIPS64-based SOCs have four general purpose DMA
> engines onchip which can transfer data to/from the memory controller in
> 32-byte chunks over the 256-bit internal bus. We have hardly any use for
> these devices

Re: A scrub daemon (prezeroing)

2005-02-02 Thread Marcelo Tosatti
On Wed, Feb 02, 2005 at 11:05:14AM -0800, Christoph Lameter wrote:
> On Wed, 2 Feb 2005, Marcelo Tosatti wrote:
>
> > Sounds like a very interesting idea to me. Guess it depends on whether the cost of
> > DMA write for memory zeroing, which is memory architecture/DMA engine
> > dependent, offsets the cost of

Re: A scrub daemon (prezeroing)

2005-02-02 Thread Maciej W. Rozycki
On Wed, 2 Feb 2005, Marcelo Tosatti wrote:
> > Some architectures tend to have spare DMA engines lying around. There's
> > no need to use the CPU for zeroing pages. How feasible would it be for
> > scrubd to use these?
[...]
> I suppose you are talking about DMA engines which are not being driven by

Re: A scrub daemon (prezeroing)

2005-02-02 Thread Christoph Lameter
On Wed, 2 Feb 2005, Marcelo Tosatti wrote:
> Sounds like a very interesting idea to me. Guess it depends on whether the cost of
> DMA write for memory zeroing, which is memory architecture/DMA engine
> dependent, offsets the cost of CPU zeroing.
>
> Do you have any thoughts on that?
>
> I wonder if such

Re: A scrub daemon (prezeroing)

2005-02-02 Thread Marcelo Tosatti
On Thu, Jan 27, 2005 at 12:15:24PM +, David Woodhouse wrote:
> On Fri, 2005-01-21 at 12:29 -0800, Christoph Lameter wrote:
> > Adds management of ZEROED and NOT_ZEROED pages and a background daemon
> > called scrubd. scrubd is disabled by default but can be enabled
> > by writing an order number to

Re: A scrub daemon (prezeroing)

2005-01-27 Thread Christoph Lameter
On Thu, 27 Jan 2005, David Woodhouse wrote:
> On Thu, 2005-01-27 at 07:12 -0600, Robin Holt wrote:
> > An earlier proposal that Christoph pushed would have used the BTE on
> > sn2 for this. Are you thinking of using the BTE on sn0/sn1 mips?
>
> I wasn't being that specific. There's spare DMA engines on

Re: A scrub daemon (prezeroing)

2005-01-27 Thread David Woodhouse
On Thu, 2005-01-27 at 07:12 -0600, Robin Holt wrote:
> An earlier proposal that Christoph pushed would have used the BTE on
> sn2 for this. Are you thinking of using the BTE on sn0/sn1 mips?
I wasn't being that specific. There's spare DMA engines on a lot of PPC/ARM/FRV/SH/MIPS and other machines,

Re: A scrub daemon (prezeroing)

2005-01-27 Thread Robin Holt
On Thu, Jan 27, 2005 at 12:15:24PM +, David Woodhouse wrote:
> On Fri, 2005-01-21 at 12:29 -0800, Christoph Lameter wrote:
> > Adds management of ZEROED and NOT_ZEROED pages and a background daemon
> > called scrubd. scrubd is disabled by default but can be enabled
> > by writing an order number to

Re: A scrub daemon (prezeroing)

2005-01-27 Thread David Woodhouse
On Fri, 2005-01-21 at 12:29 -0800, Christoph Lameter wrote:
> Adds management of ZEROED and NOT_ZEROED pages and a background daemon
> called scrubd. scrubd is disabled by default but can be enabled
> by writing an order number to /proc/sys/vm/scrub_start. If a page
> is coalesced of that order or

RE: A scrub daemon (prezeroing)

2005-01-22 Thread Joel Soete
Hello Christoph,
In this part of your patch:
[...]
Index: linux-2.6.10/include/linux/gfp.h
===
--- linux-2.6.10.orig/include/linux/gfp.h 2005-01-21 10:43:59.0 -0800
+++ linux-2.6.10/include/linux/gfp.h 2005-01-21

A scrub daemon (prezeroing)

2005-01-21 Thread Christoph Lameter
Adds management of ZEROED and NOT_ZEROED pages and a background daemon called scrubd. scrubd is disabled by default but can be enabled by writing an order number to /proc/sys/vm/scrub_start. If a page is coalesced of that order or higher then the scrub daemon will start zeroing until all pages of
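
The enable step described above amounts to writing an order number to a sysctl file. The following is a minimal sketch of that step, assuming a kernel that actually carries this patch; on any other kernel the file will not exist. The path is the one named in the patch description. The function name, the path parameter, and the rejection of negative orders are assumptions for illustration, since the message does not say which values the file accepts.

```python
from pathlib import Path

# Path named in the patch description; present only on a kernel with this patch.
SCRUB_START = "/proc/sys/vm/scrub_start"


def enable_scrubd(order: int, path: str = SCRUB_START) -> None:
    """Write a page order to the scrub_start file to enable scrubd.

    Hypothetical helper: the patch description only says that writing an
    order number enables the daemon. Rejecting negative values is an
    assumption, not something the message specifies.
    """
    if order < 0:
        raise ValueError("page order must be non-negative")
    Path(path).write_text(f"{order}\n")
```

On a patched kernel this would be called as root, for example `enable_scrubd(4)`. The `path` parameter exists only so the helper can be exercised against an ordinary file.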
