On Mon, Aug 13, 2012 at 07:04:02PM +0200, Borislav Petkov wrote:
> On Mon, Aug 13, 2012 at 02:43:34PM +0300, Kirill A. Shutemov wrote:
> > $ cat test.c
> > #include
> > #include
> >
> > #define SIZE 1024*1024*1024
> >
> > void clear_page_nocache_sse2(void *page) __attribute__((regparm(1)));
> >
On Mon, Aug 13, 2012 at 02:43:34PM +0300, Kirill A. Shutemov wrote:
> $ cat test.c
> #include
> #include
>
> #define SIZE 1024*1024*1024
>
> void clear_page_nocache_sse2(void *page) __attribute__((regparm(1)));
>
> int main(int argc, char** argv)
> {
> char *p;
> unsigned long
> Moving 64 bytes per cycle is faster on Sandy Bridge, but slower on
> Westmere. Any preference? ;)
You have to be careful with these benchmarks.
- You need to make sure the data is cache cold; cache-hot numbers are misleading.
- The numbers can change if you have multiple CPUs doing this in parallel.
-A
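
The two cautions above can be sketched as a small user-space timing harness. This is only an illustration, not the test.c quoted in this thread: PAGE_BYTES, EVICT_BYTES, evict_caches(), time_clear_ns() and clear_page_memset() are hypothetical names, the 64 MiB eviction buffer is assumed to be larger than the last-level cache, and memset() merely stands in for whichever clear routine is being measured. It is single-threaded, so it does not address the multi-CPU point.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PAGE_BYTES  4096u                   /* assumed page size */
#define EVICT_BYTES (64u * 1024u * 1024u)   /* assumed to exceed the last-level cache */

typedef void (*clear_fn)(void *page);

/* Stand-in clear routine; swap in the routine under test. */
static void clear_page_memset(void *page)
{
    memset(page, 0, PAGE_BYTES);
}

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Touch every 64th byte of a large buffer so older data is likely evicted. */
static void evict_caches(volatile unsigned char *scratch)
{
    for (size_t i = 0; i < EVICT_BYTES; i += 64)
        scratch[i]++;
}

/*
 * Time a single call of fn(page).
 * cold != 0: walk the scratch buffer first, so the page is likely not cached.
 * cold == 0: clear the page once beforehand, so the timed call runs cache hot.
 */
static uint64_t time_clear_ns(clear_fn fn, unsigned char *page,
                              unsigned char *scratch, int cold)
{
    assert(fn != NULL && page != NULL && scratch != NULL);

    if (cold)
        evict_caches(scratch);
    else
        fn(page);

    uint64_t start = now_ns();
    fn(page);
    return now_ns() - start;
}
```

A caller would allocate one PAGE_BYTES page and one EVICT_BYTES scratch buffer, then compare time_clear_ns(..., 1) against time_clear_ns(..., 0) over many repetitions. A single page is far too short an interval for a reliable number on its own.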
>>> On 13.08.12 at 13:43, "Kirill A. Shutemov"
>>> wrote:
> On Thu, Aug 09, 2012 at 04:22:04PM +0100, Jan Beulich wrote:
>> >>> On 09.08.12 at 17:03, "Kirill A. Shutemov"
>> >>> wrote:
>
> ...
>
>> > ---
>> > arch/x86/include/asm/page.h |2 ++
>> > arch/x86/include/asm/string_3
On Thu, Aug 09, 2012 at 04:22:04PM +0100, Jan Beulich wrote:
> >>> On 09.08.12 at 17:03, "Kirill A. Shutemov"
> >>> wrote:
...
> > ---
> > arch/x86/include/asm/page.h |2 ++
> > arch/x86/include/asm/string_32.h |5 +
> > arch/x86/include/asm/string_64.h |5
>>> On 09.08.12 at 17:03, "Kirill A. Shutemov"
>>> wrote:
> From: Andi Kleen
>
> Add a cache-avoiding version of clear_page. Straightforward integer variant
> of the existing 64-bit clear_page, for both 32-bit and 64-bit.
While on 64-bit this is fine, I fail to see how you avoid using the
SSE2 i
On 08/09/2012 08:03 AM, Kirill A. Shutemov wrote:
From: Andi Kleen
Add a cache-avoiding version of clear_page. Straightforward integer variant
of the existing 64-bit clear_page, for both 32-bit and 64-bit.
Also add the necessary glue for highmem including a layer that non cache
coherent architec
From: Andi Kleen
Add a cache-avoiding version of clear_page. Straightforward integer variant
of the existing 64-bit clear_page, for both 32-bit and 64-bit.
Also add the necessary glue for highmem including a layer that non cache
coherent architectures that use the virtual address for flushing can
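
For readers unfamiliar with the idea of a cache-avoiding clear, here is a rough user-space sketch using non-temporal integer stores. It is not the code from the patch, which is kernel code: clear_page_nocache_sketch() and PAGE_BYTES are hypothetical names, the 4096-byte page size is an assumption, and the compiler intrinsic _mm_stream_si32() (which emits movnti) is used in place of hand-written assembly. When SSE2 is not available at compile time, the sketch falls back to a plain memset().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32, _mm_sfence */
#endif

#define PAGE_BYTES 4096u   /* assumed page size */

/* Zero one page; with SSE2, use non-temporal stores that bypass the cache. */
static void clear_page_nocache_sketch(void *page)
{
    /* 32-bit stores need 4-byte alignment; real pages are page aligned. */
    assert(((uintptr_t)page & 3u) == 0);

#if defined(__SSE2__)
    int *p = page;

    for (size_t i = 0; i < PAGE_BYTES / sizeof(int); i++)
        _mm_stream_si32(&p[i], 0);   /* non-temporal 32-bit store */

    /* Order the weakly ordered streaming stores before later stores. */
    _mm_sfence();
#else
    memset(page, 0, PAGE_BYTES);     /* fallback: ordinary cached stores */
#endif
}
```

Because movnti is itself an SSE2 instruction, the compile-time guard here only shows that the dependency exists. It does not show how a kernel would detect or handle CPUs without SSE2 at run time.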