David - 

Thank you for the speedy reply.

> ...like 11.5% of the time, the value is not in the cache and must be 
> retrieved from main memory.
That is my understanding too. I do not know how much gain I can get by 
eliminating those 11.5% of misses, but I would like to try.

> ...dereferencing the variable named right immediately after the prefetch, so 
> the CPU will stall 11.5% of the time.
Ok…
Do I understand correctly that the 11.5% misses at the point of dereferencing 
the variable named right mean UNDOUBTEDLY that the variable right itself is 
not in L1 at that moment?
(In other words, that the 64 bytes at $1 = (IndexValue &) @0x7ffff5ad13e0: have 
not been prefetched?)
Also, do I understand correctly that if right had already been prefetched, 
issuing the additional prefetch instruction would NOT have stalled the CPU?
Also, if - for testing purposes - I put some code between 
>>                  __builtin_prefetch(&right);
and 
>> __builtin_prefetch(*((void **)&right));  //?!! Why are D1mr observed there?

it should eliminate the stall. (Not improving overall performance, just 
eliminating the stall, i.e. the D1mr misses, at that point…)
I tried that, inserting filler code of various lengths… but I still get the 
same 11.5% at the same dereferencing point… 
That is where I get puzzled… ;)
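For reference, the test looked roughly like this (unrelated_work() is only a 
stand-in for the filler code I inserted, not a real function from the project):
"
  static SListOp compare(const IndexValue& left, IndexValue& right, ulong ity) {
      __builtin_prefetch(&right);              // ask for the cache line holding the IndexValue itself
      unrelated_work();                        // filler that does not touch `right`
      __builtin_prefetch(*((void **)&right));  // the 11.5% D1mr are still reported on this line
      int cmp = left.key.cmp(right.key, (TREE_KT)ity);
      return cmp < 0 ? SLO_LT : cmp > 0 ? SLO_GT : SLO_NOOP;
  }
"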

Finally, I modified the code like this:
"
…
  static SListOp compare(const IndexValue& left, IndexValue& right, ulong ity) {
      __builtin_prefetch(*((void **)&right)); //?!! Why are D1mr observed there?
      int cmp = left.key.cmp(right.key, (TREE_KT)ity);
      return cmp < 0 ? SLO_LT : cmp > 0 ? SLO_GT : SLO_NOOP;
  }
…
"
and put __builtin_prefetch(&right) before the call to compare(), further up the 
call stack. I even tried different `distances` from the compare() call… but I am 
still stuck with 11.5% at the same point. 
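Schematically, the caller ended up looking something like this (names and the 
surrounding code are simplified; only the placement of the prefetch relative 
to the call matters):
"
  // ... somewhere up the call stack, some 'distance' before compare() is reached ...
  __builtin_prefetch(&right);                              // issued well ahead of the call
  // ... the caller's own work in between (the 'distance' I varied) ...
  SListOp op = CmpIndexValue::compare(left, right, ity);   // still 11.5% D1mr inside
"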

And that is where I really get lost. ;) 


> Only a cache-friendly reorganization of your data or access patterns can 
> reduce this…
I understand that. The problem is that I do not understand what is happening 
with the cache at the point of dereferencing, and that is why it is difficult 
to adjust the code. 
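Just to check that I understand the direction: one reorganization I could 
imagine - purely as an illustration, not the real IndexValue layout - would be 
to keep short keys inline in the node, so that compare() never has to chase 
key.ptr.p at all:
"
  // Illustration only - NOT the actual IndexValue layout:
  struct InlineIndexValue {
      char          keyInline[24];   // short keys stored directly in the node
      void         *keyOverflow;     // out-of-line buffer used only for longer keys
      unsigned long keyLen;
  };
  // compare() would then read keyInline from the same cache line as the node,
  // so there is no second, dependent load left to miss on.
"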


> ...if your working set size exceeds the cache size,

I do not think that is the case with this program.

Thanks for your help. I'm still trying to 'get control' over those D1mr.
Michael.

On 2012-05-12, at 12:43 AM, David Chapman <dcchap...@acm.org> wrote:

> On 5/11/2012 8:04 PM, Michael Andronov wrote:
>> 
>> I'm looking for some help/hint to explain the results I'm observing with the 
>> following piece of code:
>> "
>> …
>> struct CmpIndexValue {
>>              static SListOp compare(const IndexValue& left,IndexValue& 
>> right,ulong ity) {
>>                  __builtin_prefetch(&right);
>>                  __builtin_prefetch(*((void **)&right));  //?!! Why D1mr are 
>> observed there?
>>                  int cmp=left.key.cmp(right.key,(TREE_KT)ity); return 
>> cmp<0?SLO_LT:cmp>0?SLO_GT:SLO_NOOP;
>>              }
>>      };
>> …
>> "
>> where `right` is a reference to a structure like:
>> (gdb) print right
>> $1 = (IndexValue &) @0x7ffff5ad13e0: {key = {ptr = {p = 0x7ffff5ad13d8, l = 5}, ... }
>> (gdb) x /4x (void **)&right
>> 0x7ffff5ad13e0:      0xf5ad13d8      0x00007fff      0x00000005      0x00000000
>> (gdb) x /4x *(void **)&right
>> 0x7ffff5ad13d8:      0x048056a1      0x01000016      0xf5ad13d8      0x00007fff
>> (gdb) 
>> 
>> The expected behaviour:
>> - the line __builtin_prefetch(*((void **)&right)); should trigger a prefetch 
>> (64 bytes on my machine) starting from the address {key = {ptr = {p = 
>> 0x7ffff5ad13d8… .
>> - when the actual access to right.key.ptr.p occurs - within left.key.cmp() - 
>> the values should already be in D1…
>> 
>> The observed behaviour is different, however:
>> - kcachegrind is reporting a significant 11.5% of D1mr misses at the 
>> __builtin_prefetch(*((void **)&right)); line, which looks a bit strange and 
>> unexpected to me...
>> 
>> Actually, if that line is commented out, approximately the same number of 
>> D1mr misses shows up later, within the left.key.cmp() function, at the 
>> attempt to access the right.key.ptr.p values.
>> So, to a certain degree, the __builtin_prefetch() call is doing its job and 
>> performing the prefetch… 
>> But the idea was to eliminate the D1mr misses, not to 'move' them around 
>> the code. ;)
>> 
>> I would be very grateful for any hint or direction that would help me 
>> understand, and/or an explanation of where my expectations are wrong or 
>> what I'm doing wrong.
>> 
>> 
> 
> The goal of prefetching is to get the data you are going to need while your 
> code is doing something else.  You have told the compiler that a specific 
> value is going to be used very soon, so it should ask the CPU to ensure that 
> the value is in the cache.  It sounds like 11.5% of the time, the value is 
> not in the cache and must be retrieved from main memory.  Only a 
> cache-friendly reorganization of your data or access patterns can reduce 
> this, and if your working set size exceeds the cache size, you're more or 
> less stuck with that miss rate.  You could move the misses around in the 
> code, but you can't get rid of them by prefetching.
> 
> Alternatively, you could rewrite the code to do more work between the 
> prefetch request and the actual use of the data.   Right now you are 
> dereferencing the variable named right immediately after the prefetch, so the 
> CPU will stall 11.5% of the time.  If there was more work to do, the CPU 
> would not be idle while the cache line was being loaded.
> -- 
>     David Chapman      dcchap...@acm.org
>     Chapman Consulting -- San Jose, CA
>     Software Development Done Right.
>     www.chapman-consulting-sj.com
