Attached.

From: Ye, Mei
Sent: Thursday, June 02, 2011 12:52 PM
To: 'Fred Chow'
Cc: open64-devel@lists.sourceforge.net
Subject: RE: [Open64-devel] code review - fix for Bug #778

Hi Fred

The overflow situation is rare for 64-bit archs, but not so for 32-bit ones.  
We have seen two benchmarks fail because of this (459.GemsFDTD and cyflow).
"end-const2" has a chance to underflow if trip count is 1.  In the attached new 
patch, I add a check to ensure that the transformation is applied to constant 
trip counts that are at least 2.

I agree with you that LFTR is unsafe.  I incline to think that  our current 
implementation is actually incorrect without taking care of all boundary 
conditions.
Since LFTR is on for O2 and beyond, it is a bit scary that we get incorrect 
codes silently by default.  I have been thinking of disabling it for 32-bit 
arches unless it can be caught by my changes.
What do you think?

-Mei

From: Fred Chow [mailto:frdc...@gmail.com]
Sent: Thursday, June 02, 2011 8:16 AM
To: Ye, Mei
Cc: open64-devel@lists.sourceforge.net
Subject: Re: [Open64-devel] code review - fix for Bug #778

Hi Mei,

So in your original example, before your fix, "end" is at or a little smaller 
than 0xffffffff, and sym11 in the last iteration is less than "end", but after 
adding const2, it wraps around.  I would say this situation is extremely rare, 
while your fix is creating maintenance overhead because of such an extremely 
rare situation.  Is it worth it?

As to your expectation that "end - const2 will never be below 0", I still do 
not accept it.  I can have a loop where "end" is very close to 0 and will 
iterate only once because the termination test will be false the first time.  
After your fix, "end - const2" becomes a very large unsigned number and the 
loop will behave differently.

Fred

On 05/31/2011 11:05 AM, Ye, Mei wrote:
Hi Fred,

"end" is greater than "const2".   "const2" is the stride of memory access.    
"end" is the new loop upper-bound, which is: "beginning_memory_address  +  
const2 * upperbound_of_original_IV".
So "end - const2" will never be below 0.

Sorry that my explanation of "end" in my 1st Email may be a bit confusing.


-Mei

From: Fred Chow [mailto:frdc...@gmail.com]
Sent: Saturday, May 28, 2011 10:12 AM
To: Ye, Mei
Cc: 
open64-devel@lists.sourceforge.net<mailto:open64-devel@lists.sourceforge.net>
Subject: Re: [Open64-devel] code review - fix for Bug #778

Hi Mei,

I buy your point, but then I can apply my argument to (end - const2) in the 
loop termination test.  If end is close to 0, (end - const2) can become a very 
large unsigned number.  That will also cause the termination test to evaluate 
wrong.

Whenever a wraparound occurs in the code formed by LFTR, the iteration behavior 
of the loop will change.  No matter how you add constant adjustments here and 
there, wraparound can still occur at runtime because we don't know the actual 
address value at compilation time.

Fred

On 05/28/2011 08:49 AM, Ye, Mei wrote:
Hi Fred,

Thanks much  for your comments.  When sym3 is just above 0, Even though 
"sym11v3= const1 - const2 + sym3" may produce  a negative number, but "sym11v5 
= sym11v4 + const2" should bring the value back to positive.
So the underflow won't happen.  Yes, the transformation creates an overhead.  
But it is needed to preserve correctness.

-Mei


From: Fred Chow [mailto:frdc...@gmail.com]
Sent: Friday, May 27, 2011 7:42 PM
To: 
open64-devel@lists.sourceforge.net<mailto:open64-devel@lists.sourceforge.net>
Subject: Re: [Open64-devel] code review - fix for Bug #778

Mei,

The problem you described is the notarious wraparound issue when performing 
LFTR (that's why we provide the -OPT:wrap_around_unsafe_op" flag to help 
diagnose such problems when optimization changes a program's behavior).  But 
your claim that your change makes this problem "less likely" to occur is 
elusive.  Generally speaking, the value of the address

sym3

is not known until run-time.  If it is just above 0, it would not have any 
problem before your change, but after your change, the "- const2" may create 
underflow in sym11, making it becomes a very large value, which then causes the 
comparison (sym11 <= end) to give different result.

Your change just shifts the occurrence of the wraparound arbitrarily to 
different range of values.  Statistically speaking, the probability of 
wraparound stays the same.  You may claim that for your system/processor 
combination, your change justifies because the value of sym3 is more likely to 
be close to 0xffffffff than 0x00000000.  But this is only for your system, not 
in general.

On the other hand, your change introduces a small overhead due to the 
additional (- const2), resulting in small performance degradation.

Fred

On 05/23/2011 05:53 PM, Ye, Mei wrote:

My earlier check-in r3502 forces loop end-test comparison to be "unsigned" in 
the Linear Function Test Replacement (LFTR) phase.  This fixes an overflow 
problem when memory address crosses the boundary of 0x80000000 in 32-bit 
architectures. The fix exposes another bug in Linear Function Test Replacement, 
which is explained below using the following pseudo-codes, where "sym7" is the 
original loop index, "sym11" is the memory address that replaces the original 
loop index as the induction variable, and "end" is the loop upper bound.  If 
the value of the memory address is very closed to 0xffffffff, adding a positive 
constant can overflow and produces a small positive result for "sym11v5", which 
then causes "sym11v5 <= end" to be evaluated as "TRUE", and the loop is 
mistakenly executed many more times than it should.  This leads to seg faults.

sym7v3 = const1
sym11v3 = sym7v3 + sym3

LABEL:
sym11v4 = phi(sym11v3, sym11v5)
*sym11v4 = ...
sym11v5 = sym11v4 + const2
if (sym11v5 <= end)
   goto LABEL

To fix this bug, we transform the above code into the following.

sym7v3 = const1
sym11v3 = const1 - const2 + sym3 // The result of 'const1 - const2' should use 
signed type.
LABEL:
sym11v4 = phi(sym11v3, sym11v5)
sym11v5 = sym11v4 + const2
*sym11v5  = ...
if (sym11v5 <= end - const2)
   goto LABEL


This transformation uses pre-increment instead of post-increment for the 
induction variable update and replaces the use of "sym11v4" with "sym11v5".

My implementation replaces CR operands associated with "sym11v4" without 
rehashing since none of these expressions appear outside of the loop.  It is 
also impractical to rehash these expressions at this point since doing so will 
burn compilation time to find each occurrence from all the worklsts.  I also 
avoid the situations that the transformation will create new expressions having 
CSE occurrences outside of the loop.  This is to avoid changing CODEREPs 
unintentionally.

Although this work still will not make LFTR 100% safe, but it should cover a 
vast majority.





------------------------------------------------------------------------------

vRanger cuts backup time in half-while increasing security.

With the market-leading solution for virtual backup and recovery,

you get blazing-fast, flexible, and affordable data protection.

Download your free trial now.

http://p.sf.net/sfu/quest-d2dcopy1





_______________________________________________

Open64-devel mailing list

Open64-devel@lists.sourceforge.net<mailto:Open64-devel@lists.sourceforge.net>

https://lists.sourceforge.net/lists/listinfo/open64-devel



Attachment: odc
Description: odc

------------------------------------------------------------------------------
Simplify data backup and recovery for your virtual environment with vRanger.
Installation's a snap, and flexible recovery options mean your data is safe,
secure and there when you need it. Discover what all the cheering's about.
Get your free trial download today. 
http://p.sf.net/sfu/quest-dev2dev2 
_______________________________________________
Open64-devel mailing list
Open64-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/open64-devel

Reply via email to