Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-30 Thread Dirk Brandewie
On 05/30/2014 01:07 PM, Tim Chen wrote:
On Fri, 2014-05-30 at 12:38 -0700, Dirk Brandewie wrote:
Dirk,
Thanks for checking things out.
I tested on a Haswell system, and I see that the frequency can dip below the max even when I set the min_perf_pct to 100. Let me know if you want to log on

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-30 Thread Tim Chen
On Fri, 2014-05-30 at 12:38 -0700, Dirk Brandewie wrote:
> > Dirk,
> >
> > Thanks for checking things out.
> >
> > I tested on a Haswell system, and I see that the frequency
> > can dip below the max even when I set the min_perf_pct to 100.
> > Let me know if you want to log on to my system and

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-30 Thread Dirk Brandewie
On 05/30/2014 12:32 PM, Tim Chen wrote:
On Fri, 2014-05-30 at 11:45 -0700, Dirk Brandewie wrote:
With turbostat from rc7.
[root@echolake turbostat]# ./turbostat
Core CPU Avg_MHz %Busy Bzy_MHz TSC_MHz SMI CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-30 Thread Tim Chen
On Fri, 2014-05-30 at 11:45 -0700, Dirk Brandewie wrote:
>
> With turbostat from rc7.
> [root@echolake turbostat]# ./turbostat
> Core CPU Avg_MHz %Busy Bzy_MHz TSC_MHz SMI CPU%c1 CPU%c3
> CPU%c6 CPU%c7 CoreTmp PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt
> CorWatt GFXWatt
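The frequency columns in that turbostat header are derived from per-CPU counter deltas. As a rough sketch, assuming the column definitions in the turbostat man page (Avg_MHz is cycles executed divided by elapsed time, %Busy is the share of the interval spent in C0, Bzy_MHz is the average clock while in C0), they can be computed from APERF, MPERF and TSC deltas as below. The struct and function names are hypothetical. One consequence is that Avg_MHz = %Busy / 100 * Bzy_MHz, so a low Avg_MHz can reflect idle time rather than a reduced clock.

```c
#include <stdint.h>

/* Hypothetical container for counter deltas over one sampling interval. */
struct freq_sample {
	uint64_t aperf;   /* actual core cycles counted while in C0 */
	uint64_t mperf;   /* reference (TSC-rate) cycles counted while in C0 */
	uint64_t tsc;     /* TSC ticks over the whole interval */
	double seconds;   /* wall-clock length of the interval */
};

struct freq_report {
	double avg_mhz;   /* cycles executed / elapsed time */
	double busy_pct;  /* share of the interval spent in C0 */
	double bzy_mhz;   /* average clock while in C0 */
	double tsc_mhz;   /* TSC rate over the interval */
};

static struct freq_report freq_derive(const struct freq_sample *s)
{
	struct freq_report r;

	r.tsc_mhz = (double)s->tsc / s->seconds / 1e6;
	r.avg_mhz = (double)s->aperf / s->seconds / 1e6;
	r.busy_pct = s->tsc ? 100.0 * (double)s->mperf / (double)s->tsc : 0.0;
	/* APERF/MPERF scales the reference clock to the clock actually run while busy. */
	r.bzy_mhz = s->mperf ? r.tsc_mhz * (double)s->aperf / (double)s->mperf : 0.0;
	return r;
}
```

For example, a CPU that is busy for half of a one-second interval at 3900 MHz, with a 3400 MHz TSC, would show Avg_MHz of about 1950 even though it never ran below 3900 MHz while busy.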

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-30 Thread Dirk Brandewie
On 05/30/2014 10:56 AM, Tim Chen wrote:
> On Thu, 2014-05-29 at 21:16 -0400, Dave Jones wrote:
>> On Thu, May 29, 2014 at 06:07:16PM -0700, Tim Chen wrote:
>> > On Thu, 2014-05-29 at 19:54 -0400, George Spelvin wrote:
>> > > Sorry for the delay; my Ivy Bridge test machine isn't in my
>> > >

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-30 Thread Tim Chen
On Thu, 2014-05-29 at 21:16 -0400, Dave Jones wrote:
> On Thu, May 29, 2014 at 06:07:16PM -0700, Tim Chen wrote:
> > On Thu, 2014-05-29 at 19:54 -0400, George Spelvin wrote:
> > > Sorry for the delay; my Ivy Bridge test machine isn't in my
> > > office and getting to the console to tweak the

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-30 Thread Tim Chen
On Fri, 2014-05-30 at 12:52 -0400, George Spelvin wrote:
> > That's very small (less than 0.2%) so I think it's acceptable.
>
> Thank you! May I take this as an Acked-by?

Yes, with the caveat that you still have a v3 of this patch that reorganizes the K table to rodata.

Tim

>
> I'll work on

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-30 Thread George Spelvin
> That's very small (less than 0.2%) so I think it's acceptable.

Thank you! May I take this as an Acked-by?

I'll work on some performance improvements, but they probably won't be ready for the 3.16 merge window.

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-30 Thread Tim Chen
On Fri, 2014-05-30 at 01:25 -0400, George Spelvin wrote:
>
> Averaging the 8K bytes per update, I do see an average of 3.2 cycles per
> operation (that is, per 8K of data processed) lost, or about 1 cycle per
> (3K or less) block processed. I'm hoping the reduced D-cache pollution
> makes it up
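The per-block figure and the "less than 0.2%" remark are simple division. A sketch, assuming each 8K update is split into blocks of at most 3K (so ceil(8192 / 3072) = 3 blocks), and reading "less than 0.2%" as the 3.2 lost cycles being under 0.2% of the total cycles per update; the helper names are hypothetical.

```c
#include <stddef.h>

/* Blocks of at most block_len bytes needed to cover len bytes (ceiling division). */
static size_t blocks_needed(size_t len, size_t block_len)
{
	return (len + block_len - 1) / block_len;
}

/* Spread a per-update cycle cost evenly over the blocks of that update. */
static double cycles_per_block(double extra, size_t len, size_t block_len)
{
	return extra / (double)blocks_needed(len, block_len);
}

/* Baseline (cycles per update) at which `extra` is exactly `fraction` of it;
   any larger baseline keeps the overhead below that fraction. */
static double min_baseline_cycles(double extra, double fraction)
{
	return extra / fraction;
}
```

Under those assumptions, 3.2 cycles over 3 blocks is about 1.07 cycles per block, and staying below 0.2% implies the unpatched code spends more than 1600 cycles per 8K update.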

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-29 Thread George Spelvin
Okay, recompiled with the acpi-cpufreq driver, so the performance governor actually works, pegging the frequency at 3900 MHz.

Existing (old) code:
[ 455.641397]
[ 455.641397] testing speed of crc32c
[ 455.641403] test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 73
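Pinning the clock matters because the speed test reports cycle counts. Assuming those counts come from a constant-rate TSC (as on these Intel CPUs), a core running below its nominal clock takes more TSC ticks for the same work, so the figures are only comparable at a fixed frequency. Converting a cycle count to throughput then also assumes a steady clock. A minimal sketch of that conversion; the helper names and the 100-cycle input are hypothetical, not taken from the log above.

```c
/* Cycles per byte from one measured operation. */
static double cycles_per_byte(double cycles, double bytes)
{
	return cycles / bytes;
}

/* Throughput in MB/s (10^6 bytes per second) at a fixed clock given in MHz:
   (10^6 cycles/s) / (cycles/byte) = 10^6 bytes/s. */
static double mb_per_sec(double clock_mhz, double cpb)
{
	return clock_mhz / cpb;
}
```

For instance, 100 cycles for a 16-byte update is 6.25 cycles per byte, or 624 MB/s at a steady 3900 MHz.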

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-29 Thread George Spelvin
> This is odd. On my Ivy Bridge system the CPU speed from /proc/cpuinfo
> is at max freq once I set the performance governor.
> The numbers above almost look like
> the cpu frequency is fluctuating and an average is taken.
> What version of the kernel are you running? Is
>
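One quick check of the suspicion that the frequency is fluctuating is to read the "cpu MHz" lines of /proc/cpuinfo a few times and look at the spread. A minimal sketch, assuming the usual x86 "cpu MHz : <value>" line format; the function name is hypothetical, and a single snapshot can still miss short dips.

```c
#include <stdio.h>

/* Scan cpuinfo-format text for "cpu MHz" lines and report the lowest and
   highest values. Returns the number of matching lines; *min_mhz and
   *max_mhz are left unchanged when none are found. */
static int cpu_mhz_range(FILE *f, double *min_mhz, double *max_mhz)
{
	char line[256];
	int n = 0;

	while (fgets(line, sizeof(line), f)) {
		double mhz;

		/* Whitespace in the format matches any run of spaces or tabs,
		   so "cpu MHz\t\t: 3900.000" parses; "cpu family" does not. */
		if (sscanf(line, "cpu MHz : %lf", &mhz) != 1)
			continue;
		if (n == 0 || mhz < *min_mhz)
			*min_mhz = mhz;
		if (n == 0 || mhz > *max_mhz)
			*max_mhz = mhz;
		n++;
	}
	return n;
}
```

In practice the stream would come from fopen("/proc/cpuinfo", "r"), sampled repeatedly while the benchmark runs.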

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-29 Thread Dave Jones
On Thu, May 29, 2014 at 06:07:16PM -0700, Tim Chen wrote:
> On Thu, 2014-05-29 at 19:54 -0400, George Spelvin wrote:
> > Sorry for the delay; my Ivy Bridge test machine isn't in my
> > office and getting to the console to tweak the BIOS is a
> > bit of a bother.
> >
> > Anyway, i7-4930K,

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-29 Thread Tim Chen
On Thu, 2014-05-29 at 19:54 -0400, George Spelvin wrote:
> Sorry for the delay; my Ivy Bridge test machine isn't in my
> office and getting to the console to tweak the BIOS is a
> bit of a bother.
>
> Anyway, i7-4930K, turbo boost & hyperthreading disabled,
> $ cat

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-29 Thread George Spelvin
Sorry for the delay; my Ivy Bridge test machine isn't in my office and getting to the console to tweak the BIOS is a bit of a bother.

Anyway, i7-4930K, turbo boost & hyperthreading disabled,
$ cat /sys/devices/system/cpu/cpu?/cpufreq/scaling_governor
performance
performance
performance

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-29 Thread Jan Beulich
>>> "George Spelvin" 05/28/14 11:47 PM >>>
>Jan Beulich wrote:
>> "George Spelvin" 05/28/14 4:40 PM
>>> Jan: Is support for SLE10's pre-2.18 binutils still required?
>>> Your PEXTRD fix was only a year ago, so I expect, but I wanted to ask.
>
>> I'd much appreciate if I would be able to build

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-28 Thread Tim Chen
On Wed, 2014-05-28 at 19:01 -0400, George Spelvin wrote:
> Thanks for the reply!
>
> > Changing from the aligned move (movdqa) to unaligned move and zeroing
> > (pmovzxdq), is going to make things slower. If the table is aligned
> > on 8 byte boundary, some of the table can span 2 cache lines,

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-28 Thread George Spelvin
Thanks for the reply!

> Changing from the aligned move (movdqa) to unaligned move and zeroing
> (pmovzxdq), is going to make things slower. If the table is aligned
> on 8 byte boundary, some of the table can span 2 cache lines, which
> can slow things further.

Um, two notes:
1) This load is
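Whether an individual table load can straddle a cache line is a matter of arithmetic. A sketch, assuming 64-byte cache lines: an 8-byte load from an 8-byte-aligned address never crosses a line boundary, because 64 is a multiple of 8, and only a misaligned base (for example, offset 4) produces split loads. The helper names are hypothetical; this addresses single loads, not how many lines the table as a whole occupies.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumed line size in bytes */

/* True if a load of len bytes starting at addr touches two cache lines. */
static bool crosses_line(size_t addr, size_t len)
{
	return (addr % CACHE_LINE) + len > CACHE_LINE;
}

/* Count the entries of a table (entry_size bytes each, starting at base)
   whose load would be split across two cache lines. */
static size_t count_split_loads(size_t base, size_t entry_size, size_t n)
{
	size_t split = 0;

	for (size_t i = 0; i < n; i++)
		if (crosses_line(base + i * entry_size, entry_size))
			split++;
	return split;
}
```

With 128 entries of 8 bytes, an aligned base gives no split loads, while a base offset of 4 splits every eighth load (16 of 128).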

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-28 Thread Tim Chen
On Wed, 2014-05-28 at 10:40 -0400, George Spelvin wrote:
> While following a number of tangents in the code (I was figuring out
> how to edit lib/Kconfig; don't ask), I came across a table of 256 64-bit
> words, all of which had the high half set to zero.
>
> Since the code depends on both

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-28 Thread George Spelvin
Jan Beulich wrote:
> "George Spelvin" 05/28/14 4:40 PM
>> Jan: Is support for SLE10's pre-2.18 binutils still required?
>> Your PEXTRD fix was only a year ago, so I expect, but I wanted to ask.
> I'd much appreciate if I would be able to build the kernel that way for
> another while.

Does it

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-28 Thread Jan Beulich
>>> "George Spelvin" 05/28/14 4:40 PM >>>
>Jan: Is support for SLE10's pre-2.18 binutils still required?
>Your PEXTRD fix was only a year ago, so I expect, but I wanted to ask.

I'd much appreciate if I would be able to build the kernel that way for another while.

>Two other minor additional
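If the assembler in use does not recognize an instruction, one workaround is to emit its encoding as .byte data. As an illustration only, assuming the SDM encoding of PMOVZXDQ xmm, xmm/m64 (66 0F 38 35 /r) and restricting operands to xmm0 through xmm7 with a plain register-indirect base, so that no REX, SIB or displacement byte is needed, a hypothetical helper producing those bytes might look like this. It is not how the kernel's PEXTRD workaround is implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Emit the bytes of "pmovzxdq (%base), %xmmN" into out[0..4].
   Base register numbers: rax=0, rcx=1, rdx=2, rbx=3, rsi=6, rdi=7.
   Returns the number of bytes written, or 0 for unsupported operands. */
static size_t encode_pmovzxdq_mem(unsigned xmm, unsigned base, uint8_t out[5])
{
	/* xmm8+ and r8+ need a REX prefix; rm=4 (rsp) needs a SIB byte,
	   and rm=5 with mod=00 means RIP-relative in 64-bit mode. */
	if (xmm > 7 || base > 7 || base == 4 || base == 5)
		return 0;
	out[0] = 0x66;	/* operand-size prefix selects the SSE (xmm) form */
	out[1] = 0x0F;
	out[2] = 0x38;
	out[3] = 0x35;	/* PMOVZXDQ opcode */
	out[4] = (uint8_t)((xmm << 3) | base);	/* ModRM: mod=00, reg=xmm, rm=base */
	return 5;
}
```

Under these assumptions, pmovzxdq (%rdi), %xmm1 would be written as .byte 0x66, 0x0f, 0x38, 0x35, 0x0f.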

Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-28 Thread George Spelvin
Um, yeah, I just noticed the problem with that patch: half of the numbers in that table are 33 bits, and cause a pile of warnings (not errors, unfortunately!) from gas that scrolled by when I wasn't looking. Logically, there should be no need for 33-bit values; they should all be reducible modulo
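The difference between forcing a 33-bit value into a 32-bit slot and reducing it can be sketched in C. Assuming the modulus meant is the CRC32C generator polynomial, written here in plain (non-reflected) bit order as 0x11EDC6F41, a value below 2^33 is reduced with at most one XOR, whereas truncation simply drops bit 32. The kernel's table may use a bit-reflected layout that this sketch does not model, and the test value is hypothetical.

```c
#include <stdint.h>

/* CRC32C (Castagnoli) generator including the x^32 term, plain bit order. */
#define CRC32C_POLY_33 0x11EDC6F41ULL

/* Reduce a polynomial of degree <= 32 (v < 2^33) modulo the degree-32
   generator. Over GF(2) subtraction is XOR, so one conditional XOR
   clears bit 32 and leaves an equivalent 32-bit remainder. */
static uint32_t reduce33(uint64_t v)
{
	if (v & (1ULL << 32))
		v ^= CRC32C_POLY_33;
	return (uint32_t)v;
}

/* What storing the value in a 32-bit slot does: bit 32 is simply lost. */
static uint32_t truncate33(uint64_t v)
{
	return (uint32_t)v;
}
```

The two agree only when bit 32 is clear; otherwise truncation yields a different polynomial, not an equivalent one.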

[RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

2014-05-28 Thread George Spelvin
While following a number of tangents in the code (I was figuring out how to edit lib/Kconfig; don't ask), I came across a table of 256 64-bit words, all of which had the high half set to zero.

Since the code depends on both pclmulq and crc32, SSE 4.1 is obviously present, so it could use pmovzxdq
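The idea in the patch can be illustrated in C with SSE4.1 intrinsics. A sketch, assuming each table entry is a pair of constants whose high halves are zero, stored as two 32-bit words: _mm_loadl_epi64 fetches 8 bytes and _mm_cvtepu32_epi64 (the intrinsic for PMOVZXDQ) zero-extends them into two 64-bit lanes, the same register contents a 16-byte movdqa of the wide table would produce. Storing 256 words as 32 bits instead of 64 halves the table from 2048 to 1024 bytes. The function name is hypothetical, and a scalar fallback is used when the compiler is not targeting SSE4.1.

```c
#include <stdint.h>
#ifdef __SSE4_1__
#include <smmintrin.h>
#endif

/* Expand one entry stored as two 32-bit words into two zero-extended
   64-bit values, as PMOVZXDQ does (low dword goes to the low qword). */
static void load_entry_zx(const uint32_t entry[2], uint64_t out[2])
{
#ifdef __SSE4_1__
	/* 8-byte load, then zero-extend each dword to a qword. */
	__m128i v = _mm_cvtepu32_epi64(_mm_loadl_epi64((const __m128i *)entry));
	_mm_storeu_si128((__m128i *)out, v);
#else
	/* Scalar fallback: unsigned widening is already zero extension. */
	out[0] = entry[0];
	out[1] = entry[1];
#endif
}
```

Because the extension is unsigned, an entry of 0xFFFFFFFF becomes 0x00000000FFFFFFFF rather than being sign-extended.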
