On 25.08.2015 at 20:09, Richard Henderson wrote:
On 08/25/2015 07:37 AM, Dennis Luehring wrote:
> On 25.08.2015 at 16:25, Richard Henderson wrote:
>> Er, no, it should. The primary vector by which I expect improvement is via
>> not encoding dmmu.mmu_primary_context into the TB flags. I.e. ASI_DMMU, which
>> sun4u certainly uses.
>>
>> The fact that the patch _also_ fixes a sun4v problem is secondary.
>
> please, can you (or someone else) give me feedback about my tests/numbers -
> and their relevance - the stream benchmark results seem to be worse
> than before and the compile speed is just a little bit better - so i don't
> understand (at all) what problems are fixed or what is improved now
The fact that stream degraded means that stream is unreliable as a benchmark.
I suspect that if you simply run it N times with the exact same setup you'll
see a very large variance in its runtime.
This particular patch cannot possibly have degraded performance, as it could
only result in a reduction, not expansion, of the number of TBs created.
As to why stream should be unreliable, I have no clue.
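The "reduction, not expansion" argument can be illustrated with a toy model. This is only a sketch, not QEMU code: it assumes translated blocks are cached under a key made of the guest pc plus flag bits, and the function and variable names are invented for the illustration. Dropping a field from the key can merge entries but can never split them.

```python
# Toy model only, not QEMU code. All names here are hypothetical.

def count_tbs(executions, context_in_flags):
    """Count distinct translation blocks created for a stream of
    (pc, mmu_context) executions under a given keying scheme."""
    tb_cache = set()
    for pc, mmu_context in executions:
        # The key always contains the guest pc. When the MMU context is
        # folded into the flags, it becomes part of the key as well.
        key = (pc, mmu_context) if context_in_flags else (pc,)
        tb_cache.add(key)  # a new key stands in for a new translation
    return len(tb_cache)

# The same 4 code addresses executed under 3 different contexts.
executions = [(pc, ctx)
              for ctx in (1, 2, 3)
              for pc in (0x1000, 0x1040, 0x1080, 0x10C0)]

with_ctx = count_tbs(executions, context_in_flags=True)
without_ctx = count_tbs(executions, context_in_flags=False)
print(with_ctx, without_ctx)  # prints "12 4"
```

In this model the count without the context in the key is always less than or equal to the count with it, which is the shape of the claim above. It says nothing about how much time the real translator saves.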
6 runs - 6 times nearly the same result (and the stream benchmark itself
does not seem to be an obscure one: https://www.cs.virginia.edu/stream/ -
it measures sustainable memory bandwidth vs. FPU performance)
run 1#
Function   Best Rate MB/s   Avg time   Min time   Max time
Copy:               278.3   0.576045   0.574946   0.581186
Scale:              181.5   0.888582   0.881669   0.900648
Add:                217.6   1.109354   1.102955   1.123495
Triad:              167.7   1.440939   1.430755   1.463517
run 2#
Function   Best Rate MB/s   Avg time   Min time   Max time
Copy:               277.8   0.577607   0.575970   0.582532
Scale:              181.4   0.909480   0.882134   1.058552
Add:                217.5   1.110417   1.103327   1.122539
Triad:              167.5   1.444383   1.432864   1.477904
run 3#
Function   Best Rate MB/s   Avg time   Min time   Max time
Copy:               278.3   0.586721   0.574839   0.655187
Scale:              181.7   0.889060   0.880544   0.898155
Add:                217.3   1.115113   1.104248   1.146618
Triad:              167.6   1.480999   1.432066   1.748302
run 4#
Function   Best Rate MB/s   Avg time   Min time   Max time
Copy:               276.7   0.580837   0.578262   0.585253
Scale:              180.6   0.891853   0.885707   0.895370
Add:                216.5   1.116623   1.108630   1.126520
Triad:              167.1   1.444834   1.435996   1.451557
run 5#
Function   Best Rate MB/s   Avg time   Min time   Max time
Copy:               278.3   0.593767   0.574839   0.689366
Scale:              182.0   0.897183   0.879005   0.938262
Add:                217.7   1.132244   1.102195   1.203082
Triad:              167.4   1.444530   1.434112   1.487601
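To put a number on "nearly the same result", here is a small sketch that computes the run-to-run spread of the Best Rate column for the five runs listed above. The values are copied from the tables. The helper name and the choice of (max - min) / mean as the spread measure are my own assumptions, not part of STREAM. Note that STREAM derives Best Rate from the Min time column, so outliers visible in Max time do not show up here.

```python
from statistics import mean

# Best Rate (MB/s) per kernel, one value per listed run, copied from above.
best_rate = {
    "Copy":  [278.3, 277.8, 278.3, 276.7, 278.3],
    "Scale": [181.5, 181.4, 181.7, 180.6, 182.0],
    "Add":   [217.6, 217.5, 217.3, 216.5, 217.7],
    "Triad": [167.7, 167.5, 167.6, 167.1, 167.4],
}

def relative_spread(values):
    """Return (max - min) / mean, a crude run-to-run variation measure."""
    return (max(values) - min(values)) / mean(values)

spread = {name: relative_spread(vals) for name, vals in best_rate.items()}

for name, vals in best_rate.items():
    print(f"{name:6s} mean={mean(vals):7.2f} MB/s  spread={spread[name]:.2%}")
```

For these numbers every kernel stays below 1% spread between runs on the same setup.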
> - the compilation test is still 180 times slower than on my host
I'll have to compare that test vs an Alpha guest and see what I get. I only
remember one factor of 10, not two...
But you're right, it would be nice to put together a coherent set of
benchmarks. Ideally, a guest kernel plus minimal ramdisk with the tests
pre-loaded so that we can boot and run ./benchmark at the prompt. That's
the sort of thing we can easily upload to the wiki and share.
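As a rough idea of what such a ./benchmark entry point could do, here is a minimal sketch that runs one command several times and summarizes the wall-clock times. It assumes a Python interpreter is available in the guest image, which may not hold for a minimal ramdisk. The function names are hypothetical, and the command to run is whatever test gets pre-loaded.

```python
import statistics
import subprocess
import sys
import time

def time_runs(cmd, runs=5):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken run is not timed.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations

def summarize(durations):
    """Return min, mean and sample standard deviation of the durations."""
    return {
        "min": min(durations),
        "mean": statistics.mean(durations),
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }
```

A call such as `summarize(time_runs(["./stream"], runs=6))` would then give one line per test to compare before and after a patch; the `./stream` path is only a placeholder.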
any idea what memory bandwidth benchmark i could use?
something on this list http://lbs.sourceforge.net/ ?