Hello All,

My second (and hopefully last) question after returning to the GIMPS fold. I've never unsub'd from the mailing list, though, and haven't seen this question come across....
I have a Quad Core QX6700, which is my first multi-core system and my first overclocked system. It has just finished up P-1 stage 2 factoring and begun LL tests on each of its cores.

The benchmark page at http://www.mersenne.org/bench.htm says that a Core 2 Quad QX6700 should be running about 0.0569 seconds per iteration on a 2560K FFT. I assume that's at default clocks. My system seems to be running at less than half that speed, between 0.105 and 0.131 seconds per iteration depending on the core, and I can't figure out why. What should take 24 days is going to take 60 days at this rate.

The following may be irrelevant, but I'm wondering....

The native 2.66 GHz CPU is overclocked to 3.47 GHz, running Vista 64 (and the 64-bit version of Prime95) so that I have full access to all 4 GB of very fast RAM. The RAM is Corsair Dominator PC2-8500 (800 MHz native, 5-5-5-18-2T), and for extra memory bandwidth I've overclocked it to 1000 MHz.

The CPU overclocking is all done via multiplier (10X -> 13X) and voltage. The RAM overclock is done by configuring the motherboard's RAM speed to be unlinked from the FSB, then manually increasing the RAM speed to its maximum stable level of 1000 MHz. I did not alter the FSB speed at all, since my RAID controller doesn't like FSB speed tweaks. Nor did I alter the memory timings, leaving them at the native 5-5-5-18-2T.

The system is sufficiently cooled (liquid, sustaining 59C-63C at 100% load on all 4 cores) and passed overnight torture testing without throwing any errors. With the above tweaks, the system posts 12500 3DMarks and scores an overall 5.9 on Vista's Experience Rating (the best one can score). It was scoring 5.8 before I increased the memory speed from 800 MHz to 1000 MHz, so my bump should have boosted memory bandwidth.

Alas, matching 1:1 and running the memory at 1066 MHz (the native FSB speed of the system) is not stable. I can't quite push the memory that fast; 1000 MHz is where stability tops out.
At 1066 MHz memory speed, I start getting rounding errors after 3-4 hours of torture testing.

Now, here's what I'm wondering. Is it possible that the source of these slower timings is that tiny discrepancy between the FSB speed and the RAM speed? Would running the memory just SLIGHTLY slower than the FSB introduce timing delays that P95 doesn't much like, given its use of RAM for the lookup tables?

Or is it simpler than that? Am I perhaps bumping up against L2 cache thrashing, something that might be common on Intel multi-core machines that share a single L2 cache like this? (Nehalem, where art thou?) Is the listed benchmark simply what one core would do if it had exclusive use of the L2 cache while the other cores were idle, and are the slower iteration times to be expected when all four cores are each working on their respective exponents, contending for a single L2 cache?

This last theory seems to be supported by a test I ran: I paused all but one core and let that one core iterate with near-exclusive use of the otherwise idle system. The iteration times fell dramatically, to 0.050 sustained, and the "best time" from the Options->Benchmark menu was 0.048. That's more in line with what I'd expect, given that the benchmark page shows a stock-clocked QX6700 cranking at 0.0569, combined with my overclock.

So, to cut to the chase, the benchmarks seem to be geared to exclusive use of the L2 cache, but in the real world that's not how I'd imagine most GIMPS users run P95 on a multi-core system. If my hypothesis is correct, wouldn't it be better to post separate benchmarks for "one core in use" versus "all cores in use", so that people's expectations aren't skewed by a benchmark table that doesn't represent typical use?

Any guidance or experience would be welcome. Thanks!

Jeff Woods
Reading, PA

_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
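
As a rough sanity check on the figures quoted in this message, the Python sketch below compares a clock-scaled estimate of the benchmark time with the observed timings, and compares total throughput for one worker versus four. It is only an illustration under stated assumptions: per-iteration time is assumed to scale linearly with core clock, the four workers are assumed to sit two at each end of the observed 0.105-0.131 range, and the names `scaled_time` and `throughput` are made up for this example.

```python
# Figures quoted in the post (seconds per iteration, 2560K FFT).
STOCK_BENCH = 0.0569            # benchmark page, QX6700 at stock clocks
STOCK_GHZ, OC_GHZ = 2.66, 3.47  # stock and overclocked core clocks
ONE_CORE = 0.050                # sustained, other three workers paused
ALL_CORES = (0.105, 0.131)      # per-core range with all four workers running


def scaled_time(stock_time, stock_ghz, oc_ghz):
    """Per-iteration time if speed scaled linearly with core clock (an assumption)."""
    return stock_time * stock_ghz / oc_ghz


def throughput(per_iter_times):
    """Aggregate iterations per second across workers running concurrently."""
    return sum(1.0 / t for t in per_iter_times)


expected = scaled_time(STOCK_BENCH, STOCK_GHZ, OC_GHZ)
single = throughput([ONE_CORE])
# Assumption: two workers at each end of the observed range.
quad = throughput([ALL_CORES[0]] * 2 + [ALL_CORES[1]] * 2)

print(f"clock-scaled estimate: {expected:.4f} s/iter")
print(f"one worker:   {single:.1f} iter/s")
print(f"four workers: {quad:.1f} iter/s")
```

On these numbers the clock-scaled estimate is about 0.0436 s per iteration, close to the single-worker timings, and four contending workers still complete roughly 1.7 times as many iterations per second in total as one worker alone, even though each individual test takes longer.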
