Re: A scrub daemon (prezeroing)
On Thu, Jan 27, 2005 at 07:12:29AM -0600, Robin Holt wrote:

> > Some architectures tend to have spare DMA engines lying around. There's
> > no need to use the CPU for zeroing pages. How feasible would it be for
> > scrubd to use these?
>
> An earlier proposal that Christoph pushed would have used the BTE on
> sn2 for this. Are you thinking of using the BTE on sn0/sn1 mips?

On BCM1250 SOCs we've gone a step beyond that and use the Data Mover to
clear_page(), see arch/mips/mm/pg-sb1.c. It's roughly comparable to the
SN0 BTE. Broadcom has measured a quite large performance win for such a
small code change.

  Ralf

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
  the body of a message to [EMAIL PROTECTED]
  More majordomo info at http://vger.kernel.org/majordomo-info.html
  Please read the FAQ at http://www.tux.org/lkml/
Re: A scrub daemon (prezeroing)
Christoph Lameter writes:

> scrubd clears pages of orders 7-4 by default. That means 2^4 to 2^7
> pages are cleared at once.

So are you saying that clearing an order 4 page will take measurably
less time than clearing 16 order 0 pages? I find that hard to believe.

Paul.
Re: A scrub daemon (prezeroing)
On Fri, 4 Feb 2005, Paul Mackerras wrote:

> > Yes but it's a short burst that only occurs very infrequently and it takes
>
> It occurs just as often as we clear pages in the page fault handler.
> We aren't clearing any fewer pages by prezeroing, we are just clearing
> them a bit earlier.

scrubd clears pages of orders 7-4 by default. That means 2^4 to 2^7 pages
are cleared at once.
Re: A scrub daemon (prezeroing)
> > advantage of all the optimizations that modern memory subsystems have for
> > linear accesses. And if hardware exists that can offload that from the cpu
> > then the cpu caches are only minimally affected.
>
> I can believe that prezeroing could provide a benefit on some
> machines, but I don't think it will provide any on ppc64.

On modern x86 clears can be done quite quickly (no memory read access)
with write combining writes. The problem is just that this will force
the page out of cache. If there is any chance that the CPU will be
accessing the data soon it's better to do the slower cached RMW clear.

-Andi
Re: A scrub daemon (prezeroing)
Christoph Lameter writes:

> If the program does not use these cache lines then you have wasted time
> in the page fault handler allocating and handling them. That is what
> prezeroing does for you.

The program is going to access at least one cache line of the new page.
On my G5, it takes _less_ time to clear the whole page and pull in one
cache line from L2 cache to L1 than it does to pull in that same cache
line from memory.

> Yes but it's a short burst that only occurs very infrequently and it takes

It occurs just as often as we clear pages in the page fault handler.
We aren't clearing any fewer pages by prezeroing, we are just clearing
them a bit earlier.

> advantage of all the optimizations that modern memory subsystems have for
> linear accesses. And if hardware exists that can offload that from the cpu
> then the cpu caches are only minimally affected.

I can believe that prezeroing could provide a benefit on some machines,
but I don't think it will provide any on ppc64.

Paul.
Re: A scrub daemon (prezeroing)
On Fri, 4 Feb 2005, Nick Piggin wrote:

> If you have got to the stage of doing "real world" tests, I'd be
> interested to see results of tests that best highlight the improvements.

I am trying to figure out which tests to use right now.

> I imagine many general purpose server things wouldn't be helped much,
> because they'll typically have little free memory, and will be
> continually working and turning things over.

These things are helped because zapping memory is very fast. Continual
turning things over results in zapping of large memory areas once in a
while, which even speeds up a (sparsely accessing) benchmark. Read my
earlier posts on the subject.

There is of course an issue if the system is continuously low on memory.
In that case the buddy allocator may not generate large enough orders of
free pages to make it worth zapping them.
Re: A scrub daemon (prezeroing)
On Thu, 2005-02-03 at 22:26 -0800, Christoph Lameter wrote:
> On Fri, 4 Feb 2005, Paul Mackerras wrote:
>
> > As has my scepticism about pre-zeroing actually providing any benefit
> > on ppc64. Nevertheless, the only definitive answer is to actually
> > measure the performance both ways.
>
> Of course. The optimization depends on the type of load. If you use a
> benchmark that writes to all cache lines in a page then you will see no
> benefit at all. For a kernel compile you will see a slight benefit. For
> processing of a sparse matrix (page tables are one example) a significant
> benefit can be obtained.

If you have got to the stage of doing "real world" tests, I'd be
interested to see results of tests that best highlight the improvements.

I imagine many general purpose server things wouldn't be helped much,
because they'll typically have little free memory, and will be
continually working and turning things over.

A kernel compile on a newly booted system? Well that is a valid test.
It is great that performance doesn't *decrease* in that case :P

Of course HPC things may be a different story. It would be good to
see your gross improvement on typical types of workloads that can best
leverage this - and not just initial ramp up phases while memory is
being faulted in, but the full run time.

Thanks,
Nick
Re: A scrub daemon (prezeroing)
On Fri, 4 Feb 2005, Paul Mackerras wrote:

> The dcbz instruction on the G5 (PPC970) establishes the new cache line
> in the L2 cache and doesn't disturb the L1 cache (except to invalidate
> the line in the L1 data cache if it is present there). The L2 cache
> is 512kB and 8-way set associative (LRU). So zeroing a page is
> unlikely to disturb the cache lines that the page fault handler is
> using. Then, when the page fault handler returns to the user program,
> any cache lines that the program wants to touch are available in 12
> cycles (L2 hit latency) instead of 200 - 300 (memory access latency).

If the program does not use these cache lines then you have wasted time
in the page fault handler allocating and handling them. That is what
prezeroing does for you.

> > cpu caches) is extraordinarily fast and the zeroing of large portions of
> > memory is so too. That is why the impact of scrubd is negligible since
> > it's extremely fast.
>
> But that also disturbs cache lines that may well otherwise be useful.

Yes but it's a short burst that only occurs very infrequently and it takes
advantage of all the optimizations that modern memory subsystems have for
linear accesses. And if hardware exists that can offload that from the cpu
then the cpu caches are only minimally affected.

> As has my scepticism about pre-zeroing actually providing any benefit
> on ppc64. Nevertheless, the only definitive answer is to actually
> measure the performance both ways.

Of course. The optimization depends on the type of load. If you use a
benchmark that writes to all cache lines in a page then you will see no
benefit at all. For a kernel compile you will see a slight benefit. For
processing of a sparse matrix (page tables are one example) a significant
benefit can be obtained.
Re: A scrub daemon (prezeroing)
Christoph Lameter writes:

> You need to think about this in a different way. Prezeroing only makes
> sense if it can avoid using cache lines that the zeroing in the
> hot paths would have to use since it touches all cachelines on
> the page (the ppc instruction is certainly nice and avoids a cacheline
> read but it still uses a cacheline!). The zeroing in itself (within the

The dcbz instruction on the G5 (PPC970) establishes the new cache line
in the L2 cache and doesn't disturb the L1 cache (except to invalidate
the line in the L1 data cache if it is present there). The L2 cache
is 512kB and 8-way set associative (LRU). So zeroing a page is
unlikely to disturb the cache lines that the page fault handler is
using. Then, when the page fault handler returns to the user program,
any cache lines that the program wants to touch are available in 12
cycles (L2 hit latency) instead of 200 - 300 (memory access latency).

> cpu caches) is extraordinarily fast and the zeroing of large portions of
> memory is so too. That is why the impact of scrubd is negligible since
> it's extremely fast.

But that also disturbs cache lines that may well otherwise be useful.

> The point is to save activating cachelines not the time zeroing in itself
> takes. This only works if only parts of the page are needed immediately
> after the page fault. All of that has been documented in earlier posts on
> the subject.

As has my scepticism about pre-zeroing actually providing any benefit
on ppc64. Nevertheless, the only definitive answer is to actually
measure the performance both ways.

Paul.
Re: A scrub daemon (prezeroing)
On Fri, 4 Feb 2005, Paul Mackerras wrote:

> On my G5 it takes ~200 cycles to zero a whole page. In other words it
> takes about the same time to zero a page as to bring in a single cache
> line from memory. (PPC has an instruction to establish a whole cache
> line of zeroes in modified state without reading anything from
> memory.)
>
> Thus I can't see how prezeroing can ever be a win on ppc64.

You need to think about this in a different way. Prezeroing only makes
sense if it can avoid using cache lines that the zeroing in the
hot paths would have to use since it touches all cachelines on
the page (the ppc instruction is certainly nice and avoids a cacheline
read but it still uses a cacheline!). The zeroing in itself (within the
cpu caches) is extraordinarily fast and the zeroing of large portions of
memory is so too. That is why the impact of scrubd is negligible since
it's extremely fast.

The point is to save activating cachelines not the time zeroing in itself
takes. This only works if only parts of the page are needed immediately
after the page fault. All of that has been documented in earlier posts on
the subject.
Re: A scrub daemon (prezeroing)
Rik van Riel writes:

> I'm not convinced. Zeroing a page takes 2000-4000 CPU
> cycles, while faulting the page from RAM into cache takes
> 200-400 CPU cycles per cache line, or 6000-12000 CPU
> cycles.

On my G5 it takes ~200 cycles to zero a whole page. In other words it
takes about the same time to zero a page as to bring in a single cache
line from memory. (PPC has an instruction to establish a whole cache
line of zeroes in modified state without reading anything from
memory.)

Thus I can't see how prezeroing can ever be a win on ppc64.

Regards,
Paul.
Re: A scrub daemon (prezeroing)
On Wed, 2 Feb 2005, Marcelo Tosatti wrote:

> Someone should try implementing the zeroing driver for a fast x86 PCI
> device. :)

I'm not convinced. Zeroing a page takes 2000-4000 CPU
cycles, while faulting the page from RAM into cache takes
200-400 CPU cycles per cache line, or 6000-12000 CPU
cycles.

If the page is being used immediately after it is
allocated, it may be faster to prezero the page on
the fly. On some CPUs these writes bypass the "read
from RAM" stage and allow things to just live in cache
completely.

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
Re: A scrub daemon (prezeroing)
On Wed, 2 Feb 2005, Marcelo Tosatti wrote:

> > Nope the BTE is a block transfer engine. It's an inter numa node DMA thing
> > that is being abused to zero blocks.
> Ah, OK.
> Is there a driver for normal BTE operation or is it not kernel-controlled?

There is a function bte_copy in the ia64 arch. See

arch/ia64/sn/kernel/bte.c

> I wonder what has to be done to have active DMA engines be abused for
> zeroing when idle and what are the implications of that. Some kind of
> notification mechanism is necessary to inform idleness?
>
> Someone should try implementing the zeroing driver for a fast x86 PCI
> device. :)

Sure but I am on ia64 not i386. Find your own means to abuse your own
chips ... ;-)
Re: A scrub daemon (prezeroing)
On Wed, 2005-02-02 at 14:31 -0200, Marcelo Tosatti wrote:
> Someone should try implementing the zeroing driver for a fast x86 PCI
> device. :)

The BT848/BT878 seems like an ideal candidate. That kind of abuse is
probably only really worth it on an architecture with cache-coherent DMA
though. If you have to flush the cache anyway, you might as well just
zero it from the CPU.

-- 
dwmw2
Re: A scrub daemon (prezeroing)
On Wed, 2005-02-02 at 21:00 +, Maciej W. Rozycki wrote:
> E.g. the Broadcom's MIPS64-based SOCs have four general purpose DMA
> engines onchip which can transfer data to/from the memory controller in
> 32-byte chunks over the 256-bit internal bus. We have hardly any use for
> these devices and certainly not for all four of them.

On machines like the Ocelot, I keep intending to abuse one of the DMA
engines for access to the DiskOnChip. Really must dig the Ocelot out of
the dusty pile of toys... :)

-- 
dwmw2
Re: A scrub daemon (prezeroing)
On Wed, Feb 02, 2005 at 11:05:14AM -0800, Christoph Lameter wrote:
> On Wed, 2 Feb 2005, Marcelo Tosatti wrote:
>
> > Sounds like a very interesting idea to me. Guess it depends on whether
> > the cost of DMA write for memory zeroing, which is memory
> > architecture/DMA engine dependent, offsets the cost of CPU zeroing.
> >
> > Do you have any thoughts on that?
> >
> > I wonder if such a thing (using unrelated devices' DMA engines for
> > zeroing) has ever been done on other OSes?
> >
> > AFAIK SGI's BTE is special purpose hardware for memory zeroing.
>
> Nope the BTE is a block transfer engine. It's an inter numa node DMA thing
> that is being abused to zero blocks.

Ah, OK.
Is there a driver for normal BTE operation or is it not kernel-controlled?

> The same can be done with most DMA chips (I have done so on some other
> platforms not on i386)

Nice! What kind of DMA chip was that and through which kind of bus was it
connected to the CPU?

I wonder what has to be done to have active DMA engines be abused for
zeroing when idle and what are the implications of that. Some kind of
notification mechanism is necessary to inform idleness?

Someone should try implementing the zeroing driver for a fast x86 PCI
device. :)
Re: A scrub daemon (prezeroing)
On Wed, 2 Feb 2005, Marcelo Tosatti wrote:

> > Some architectures tend to have spare DMA engines lying around. There's
> > no need to use the CPU for zeroing pages. How feasible would it be for
> > scrubd to use these?
[...]
> I suppose you are talking about DMA engines which are not being driven
> by any driver?

E.g. the Broadcom's MIPS64-based SOCs have four general purpose DMA
engines onchip which can transfer data to/from the memory controller in
32-byte chunks over the 256-bit internal bus. We have hardly any use for
these devices and certainly not for all four of them.

> Sounds like a very interesting idea to me. Guess it depends on whether
> the cost of DMA write for memory zeroing, which is memory
> architecture/DMA engine dependent, offsets the cost of CPU zeroing.

I suppose so, at least with the Broadcom's chips you avoid cache
thrashing, yet you don't need to care about stale data as coherency
between CPUs and the onchip memory controller is maintained automatically
by hardware.

  Maciej
Re: A scrub daemon (prezeroing)
On Wed, 2 Feb 2005, Marcelo Tosatti wrote:

> Sounds very interesting idea to me. Guess it depends on whether the
> cost of DMA write for memory zeroing, which is memory
> architecture/DMA engine dependent, offsets the cost of CPU zeroing.
>
> Do you have any thoughts on that?
>
> I wonder if such thing (using unrelated devices' DMA engines for
> zeroing) ever been done on other OSes?
>
> AFAIK SGI's BTE is special purpose hardware for memory zeroing.

Nope the BTE is a block transfer engine. It's an inter numa node DMA
thing that is being abused to zero blocks.

The same can be done with most DMA chips (I have done so on some other
platforms not on i386)
Re: A scrub daemon (prezeroing)
On Thu, Jan 27, 2005 at 12:15:24PM +, David Woodhouse wrote:
> On Fri, 2005-01-21 at 12:29 -0800, Christoph Lameter wrote:
> > Adds management of ZEROED and NOT_ZEROED pages and a background daemon
> > called scrubd. scrubd is disabled by default but can be enabled
> > by writing an order number to /proc/sys/vm/scrub_start. If a page
> > is coalesced of that order or higher then the scrub daemon will
> > start zeroing until all pages of order /proc/sys/vm/scrub_stop and
> > higher are zeroed and then go back to sleep.
>
> Some architectures tend to have spare DMA engines lying around. There's
> no need to use the CPU for zeroing pages. How feasible would it be for
> scrubd to use these?

Hi David,

I suppose you are talking about DMA engines which are not being driven
by any driver?

Sounds very interesting idea to me. Guess it depends on whether the cost
of DMA write for memory zeroing, which is memory architecture/DMA engine
dependent, offsets the cost of CPU zeroing.

Do you have any thoughts on that?

I wonder if such thing (using unrelated devices' DMA engines for
zeroing) ever been done on other OSes?

AFAIK SGI's BTE is special purpose hardware for memory zeroing.

BTW, Andrew noted on lkml some time ago that disabling caches before
doing zeroing could enhance overall system performance by decreasing
cache thrashing. What are the conclusions about that?
Re: A scrub daemon (prezeroing)
On Wed, 2005-02-02 at 21:00 +, Maciej W. Rozycki wrote:
> E.g. the Broadcom's MIPS64-based SOCs have four general purpose DMA
> engines onchip which can transfer data to/from the memory controller in
> 32-byte chunks over the 256-bit internal bus. We have hardly any use for
> these devices and certainly not for all four of them.

On machines like the Ocelot, I keep intending to abuse one of the DMA
engines for access to the DiskOnChip. Really must dig the Ocelot out of
the dusty pile of toys... :)

--
dwmw2
Re: A scrub daemon (prezeroing)
On Wed, 2005-02-02 at 14:31 -0200, Marcelo Tosatti wrote:
> Someone should try implementing the zeroing driver for a fast x86 PCI
> device. :)

The BT848/BT878 seems like an ideal candidate. That kind of abuse is
probably only really worth it on an architecture with cache-coherent DMA
though. If you have to flush the cache anyway, you might as well just
zero it from the CPU.

--
dwmw2
Re: A scrub daemon (prezeroing)
On Wed, 2 Feb 2005, Marcelo Tosatti wrote:

> > Nope the BTE is a block transfer engine. It's an inter numa node DMA
> > thing that is being abused to zero blocks.
>
> Ah, OK.
> Is there a driver for normal BTE operation or is not kernel-controlled?

There is a function bte_copy in the ia64 arch. See

arch/ia64/sn/kernel/bte.c

> I wonder what has to be done to have active DMA engines be abused for
> zeroing when idle and what are the implications of that. Some kind of
> notification mechanism is necessary to inform idleness?
>
> Someone should try implementing the zeroing driver for a fast x86 PCI
> device. :)

Sure but I am on ia64 not i386. Find your own means to abuse your own
chips ... ;-)
Re: A scrub daemon (prezeroing)
On Wed, 2 Feb 2005, Marcelo Tosatti wrote:

> Someone should try implementing the zeroing driver for a fast x86 PCI
> device. :)

I'm not convinced. Zeroing a page takes 2000-4000 CPU cycles, while
faulting the page from RAM into cache takes 200-400 CPU cycles per cache
line, or 6000-12000 CPU cycles.

If the page is being used immediately after it is allocated, it may be
faster to prezero the page on the fly. On some CPUs these writes bypass
the read from RAM stage and allow things to just live in cache
completely.

--
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it. - Brian W. Kernighan
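The per-page figure in the message above follows from multiplying the
per-line miss cost by the number of cache lines in a page. The message
states neither the page size nor the cache line size, so the sketch
below assumes a 4 KiB page and 128-byte lines (32 lines per page), which
lands close to the quoted 6000-12000 range. The function names are made
up for illustration.

```c
#include <assert.h>

/* Number of cache lines that make up one page. */
static unsigned long lines_per_page(unsigned long page_bytes,
                                    unsigned long line_bytes)
{
    assert(line_bytes != 0 && page_bytes % line_bytes == 0);
    return page_bytes / line_bytes;
}

/*
 * Cycles needed to pull a whole page from RAM into cache, assuming one
 * miss per cache line and no overlap between misses.
 */
static unsigned long page_fill_cycles(unsigned long page_bytes,
                                      unsigned long line_bytes,
                                      unsigned long cycles_per_line)
{
    return lines_per_page(page_bytes, line_bytes) * cycles_per_line;
}
```

With these assumed sizes, page_fill_cycles(4096, 128, 200) gives 6400
and page_fill_cycles(4096, 128, 400) gives 12800, i.e. roughly three
times the 2000-4000 cycles quoted for zeroing the page from the CPU.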
Re: A scrub daemon (prezeroing)
On Thu, 27 Jan 2005, David Woodhouse wrote:

> On Thu, 2005-01-27 at 07:12 -0600, Robin Holt wrote:
> > An earlier proposal that Christoph pushed would have used the BTE on
> > sn2 for this. Are you thinking of using the BTE on sn0/sn1 mips?
>
> I wasn't being that specific. There's spare DMA engines on a lot of
> PPC/ARM/FRV/SH/MIPS and other machines, to name just the ones sitting
> around my desk.

If you look at the patch you will find a function call to register a
hardware driver for zeroing. I did not include the driver in this patch
because there was no change. Look at my other posts regarding
prezeroing.
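A registration hook of the kind mentioned above, combined with the point
raised elsewhere in the thread that offloading only pays off with
cache-coherent DMA, might look roughly like the user-space sketch below.
Nothing here is taken from the patch: struct zero_engine,
register_zero_engine(), zero_block() and demo_engine_zero() are
hypothetical names, and the "engine" is simulated with memset().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a hardware zeroing engine. */
struct zero_engine {
    const char *name;
    int cache_coherent;                  /* non-zero if DMA is coherent */
    int (*zero)(void *addr, size_t len); /* returns 0 on success */
};

/* At most one engine is registered in this simplified model. */
static const struct zero_engine *registered_engine;

/* Register an engine; fails if the entry is invalid or one is present. */
static int register_zero_engine(const struct zero_engine *e)
{
    if (!e || !e->zero || registered_engine)
        return -1;
    registered_engine = e;
    return 0;
}

/*
 * Zero a block. Uses the engine only when it is cache coherent, since
 * otherwise the CPU would have to flush the cache anyway; falls back to
 * the CPU in every other case. Returns 1 if the engine did the work,
 * 0 if the CPU did.
 */
static int zero_block(void *addr, size_t len)
{
    const struct zero_engine *e = registered_engine;

    if (e && e->cache_coherent && e->zero(addr, len) == 0)
        return 1;
    memset(addr, 0, len);
    return 0;
}

/* Stand-in for a real DMA engine, for demonstration only. */
static int demo_engine_zero(void *addr, size_t len)
{
    memset(addr, 0, len);
    return 0;
}
```

Before any engine is registered, zero_block() clears the buffer on the
CPU and returns 0. After registering an engine whose cache_coherent
field is set and whose zero callback is demo_engine_zero, the same call
returns 1.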
Re: A scrub daemon (prezeroing)
On Thu, 2005-01-27 at 07:12 -0600, Robin Holt wrote:
> An earlier proposal that Christoph pushed would have used the BTE on
> sn2 for this. Are you thinking of using the BTE on sn0/sn1 mips?

I wasn't being that specific. There's spare DMA engines on a lot of
PPC/ARM/FRV/SH/MIPS and other machines, to name just the ones sitting
around my desk.

--
dwmw2
Re: A scrub daemon (prezeroing)
On Thu, Jan 27, 2005 at 12:15:24PM +, David Woodhouse wrote:
> On Fri, 2005-01-21 at 12:29 -0800, Christoph Lameter wrote:
> > Adds management of ZEROED and NOT_ZEROED pages and a background daemon
> > called scrubd. scrubd is disabled by default but can be enabled
> > by writing an order number to /proc/sys/vm/scrub_start. If a page
> > is coalesced of that order or higher then the scrub daemon will
> > start zeroing until all pages of order /proc/sys/vm/scrub_stop and
> > higher are zeroed and then go back to sleep.
>
> Some architectures tend to have spare DMA engines lying around. There's
> no need to use the CPU for zeroing pages. How feasible would it be for
> scrubd to use these?

An earlier proposal that Christoph pushed would have used the BTE on
sn2 for this. Are you thinking of using the BTE on sn0/sn1 mips?

Robin
Re: A scrub daemon (prezeroing)
On Fri, 2005-01-21 at 12:29 -0800, Christoph Lameter wrote:
> Adds management of ZEROED and NOT_ZEROED pages and a background daemon
> called scrubd. scrubd is disabled by default but can be enabled
> by writing an order number to /proc/sys/vm/scrub_start. If a page
> is coalesced of that order or higher then the scrub daemon will
> start zeroing until all pages of order /proc/sys/vm/scrub_stop and
> higher are zeroed and then go back to sleep.

Some architectures tend to have spare DMA engines lying around. There's
no need to use the CPU for zeroing pages. How feasible would it be for
scrubd to use these?

--
dwmw2
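The start/stop behaviour described in the quoted patch summary can be
modelled roughly as follows. This is a user-space sketch, not code from
the patch: struct scrub_model, model_page_coalesced(),
model_scrubd_run() and the MAX_ORDER value of 11 are assumptions made
for illustration. Only the two thresholds correspond to the
/proc/sys/vm/scrub_start and /proc/sys/vm/scrub_stop files named above.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER 11 /* assumption: buddy orders 0..10 */

/* Simplified model of the scrub daemon's state. */
struct scrub_model {
    unsigned int scrub_start;            /* models /proc/sys/vm/scrub_start */
    unsigned int scrub_stop;             /* models /proc/sys/vm/scrub_stop */
    unsigned long not_zeroed[MAX_ORDER]; /* unzeroed free blocks per order */
    bool awake;                          /* daemon has been woken up */
};

/* A free block of the given order was coalesced and is not zeroed. */
static void model_page_coalesced(struct scrub_model *m, unsigned int order)
{
    assert(order < MAX_ORDER);
    m->not_zeroed[order]++;
    if (order >= m->scrub_start)
        m->awake = true; /* wake the daemon */
}

/*
 * One daemon run: zero every block of order >= scrub_stop, highest
 * order first, then go back to sleep. Returns the number of base pages
 * zeroed.
 */
static unsigned long model_scrubd_run(struct scrub_model *m)
{
    unsigned long pages = 0;

    if (!m->awake)
        return 0;
    for (unsigned int order = MAX_ORDER; order-- > m->scrub_stop; ) {
        pages += m->not_zeroed[order] << order;
        m->not_zeroed[order] = 0;
    }
    m->awake = false;
    return pages;
}
```

With scrub_start = 7 and scrub_stop = 4 (the order range Christoph
mentions elsewhere in the thread), coalescing blocks of order 3 and 5
leaves the modelled daemon asleep; a block of order 7 wakes it, and the
run then zeroes the order 5 and order 7 blocks but leaves the order 3
block alone.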
RE: A scrub daemon (prezeroing)
Hello Christoph,

In this part of your patch:

[...]
Index: linux-2.6.10/include/linux/gfp.h
===================================================================
--- linux-2.6.10.orig/include/linux/gfp.h 2005-01-21 10:43:59.0 -0800
+++ linux-2.6.10/include/linux/gfp.h 2005-01-21 11:56:07.0 -0800
@@ -131,4 +131,5 @@ extern void FASTCALL(free_cold_page(stru
 void page_alloc_init(void);
+void prep_zero_page(struct page *, unsigned int order);
 #endif /* __LINUX_GFP_H */

- IMHO would be better:

+void prep_zero_page(struct page *page, unsigned int order, unsigned int gfp_flags);

hth,
Joel