I just pushed cache line detection and prefetching, so you'll have to rerun configure.
The cache detection probably only works correctly on Linux or with glibc: it tries to use sysconf with _SC_LEVEL1_DCACHE_*, then falls back on running getconf(1P) with these arguments. I know how to do it generically on x86/x64, but that requires inline assembly (you have to be able to call CPUID), which I don't know how to do portably.

The defaults are fairly conservative: 32KiB L1D, 32-byte cache lines, 2-way associativity (though currently only the line length is being used). For comparison, Core 2 has (32KiB, 64, 8-way) and AMD K10 has (64KiB, 64, 2-way).

If the script is getting these wrong, you can set them manually with

  --known-level1-dcache-size
  --known-level1-dcache-linesize
  --known-level1-dcache-assoc

but also let me know so we can figure out how to autodetect it.

And of course, yell at me if I've broken your build.

Jed
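
For anyone curious, the lookup order described above (sysconf first, then getconf, then the conservative defaults) can be sketched roughly as follows in Python. This is only an illustration, not the code that was pushed: the function names and the DEFAULTS table are hypothetical, and it assumes a glibc-style getconf that accepts the LEVEL1_DCACHE_* names without the _SC_ prefix.

```python
import os
import subprocess

# Conservative defaults quoted in the message: 32 KiB L1D, 32-byte lines, 2-way.
# (Hypothetical table; the key names mirror the glibc sysconf/getconf names.)
DEFAULTS = {
    "LEVEL1_DCACHE_SIZE": 32 * 1024,
    "LEVEL1_DCACHE_LINESIZE": 32,
    "LEVEL1_DCACHE_ASSOC": 2,
}


def query_sysconf(name):
    """Ask sysconf for SC_<name>; return a positive int, or None if unavailable."""
    try:
        value = os.sysconf("SC_" + name)
    except (ValueError, OSError, AttributeError):
        # Unknown name, unsupported on this host, or no os.sysconf at all.
        return None
    # Some systems report 0 or -1 when the value is not known.
    return value if value > 0 else None


def query_getconf(name):
    """Fall back on running getconf; return a positive int, or None on failure."""
    try:
        proc = subprocess.run(
            ["getconf", name], capture_output=True, text=True, check=True
        )
        value = int(proc.stdout.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        # getconf missing, name rejected, or output not a number.
        return None
    return value if value > 0 else None


def detect_l1_dcache():
    """Try sysconf, then getconf, then fall back to the conservative default."""
    result = {}
    for name, default in DEFAULTS.items():
        result[name] = query_sysconf(name) or query_getconf(name) or default
    return result


if __name__ == "__main__":
    for key, value in detect_l1_dcache().items():
        print(key, value)
```

Running it on a machine where neither lookup works should simply print the defaults, which is the behaviour the manual --known-level1-dcache-* options are meant to override.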
