Hi Mei,
You said it's a new patch but I found it is identical to the patch
you provided in your first message of this thread.
I have provided my opinions and said enough of this subject. I'm
withholding my judgement. Go ahead with this fix if no other gatekeeper
objects.
Fred
On 06/02/2011 12:53 PM, Ye, Mei wrote:
Attached.
*From:*Ye, Mei
*Sent:* Thursday, June 02, 2011 12:52 PM
*To:* 'Fred Chow'
*Cc:* open64-devel@lists.sourceforge.net
*Subject:* RE: [Open64-devel] code review - fix for Bug #778
Hi Fred
The overflow situation is rare for 64-bit archs, but not so for 32-bit
ones. We have seen two benchmarks fail because of this (459.GemsFDTD
and cyflow).
"end-const2" has a chance to underflow if trip count is 1. In the
attached new patch, I add a check to ensure that the transformation is
applied to constant trip counts that are at least 2.
I agree with you that LFTR is unsafe. I incline to think that our
current implementation is actually incorrect without taking care of
all boundary conditions.
Since LFTR is on for O2 and beyond, it is a bit scary that we get
incorrect codes silently by default. I have been thinking of
disabling it for 32-bit arches unless it can be caught by my changes.
What do you think?
-Mei
*From:*Fred Chow [mailto:frdc...@gmail.com]
*Sent:* Thursday, June 02, 2011 8:16 AM
*To:* Ye, Mei
*Cc:* open64-devel@lists.sourceforge.net
*Subject:* Re: [Open64-devel] code review - fix for Bug #778
Hi Mei,
So in your original example, before your fix, "end" is at or a little
smaller than 0xffffffff, and sym11 in the last iteration is less than
"end", but after adding const2, it wraps around. I would say this
situation is extremely rare, while your fix is creating maintenance
overhead because of such an extremely rare situation. Is it worth it?
As to your expectation that "end - const2 will never be below 0", I
still do not accept it. I can have a loop where "end" is very close
to 0 and will iterate only once because the termination test will be
false the first time. After your fix, "end - const2" becomes a very
large unsigned number and the loop will behave differently.
Fred
On 05/31/2011 11:05 AM, Ye, Mei wrote:
Hi Fred,
"end" is greater than "const2". "const2" is the stride of memory
access. "end" is the new loop upper-bound, which is:
"beginning_memory_address + const2 * upperbound_of_original_IV".
So "end -- const2" will never be below 0.
Sorry that my explanation of "end" in my 1^st Email may be a bit
confusing.
-Mei
*From:*Fred Chow [mailto:frdc...@gmail.com]
*Sent:* Saturday, May 28, 2011 10:12 AM
*To:* Ye, Mei
*Cc:* open64-devel@lists.sourceforge.net
<mailto:open64-devel@lists.sourceforge.net>
*Subject:* Re: [Open64-devel] code review - fix for Bug #778
Hi Mei,
I buy your point, but then I can apply my argument to (end - const2)
in the loop termination test. If end is close to 0, (end - const2)
can become a very large unsigned number. That will also cause the
termination test to evaluate wrong.
Whenever a wraparound occurs in the code formed by LFTR, the iteration
behavior of the loop will change. No matter how you add constant
adjustments here and there, wraparound can still occur at runtime
because we don't know the actual address value at compilation time.
Fred
On 05/28/2011 08:49 AM, Ye, Mei wrote:
Hi Fred,
Thanks much for your comments. When sym3 is just above 0, Even
though "sym11v3= const1 -- const2 + sym3" may produce a negative
number, but "sym11v5 = sym11v4 + const2" should bring the value back
to positive.
So the underflow won't happen. Yes, the transformation creates an
overhead. But it is needed to preserve correctness.
-Mei
*From:*Fred Chow [mailto:frdc...@gmail.com]
*Sent:* Friday, May 27, 2011 7:42 PM
*To:* open64-devel@lists.sourceforge.net
<mailto:open64-devel@lists.sourceforge.net>
*Subject:* Re: [Open64-devel] code review - fix for Bug #778
Mei,
The problem you described is the notarious wraparound issue when
performing LFTR (that's why we provide the -OPT:wrap_around_unsafe_op"
flag to help diagnose such problems when optimization changes a
program's behavior). But your claim that your change makes this
problem "less likely" to occur is elusive. Generally speaking, the
value of the address
sym3
is not known until run-time. If it is just above 0, it would not have
any problem before your change, but after your change, the "- const2"
may create underflow in sym11, making it becomes a very large value,
which then causes the comparison (sym11 <= end) to give different result.
Your change just shifts the occurrence of the wraparound arbitrarily
to different range of values. Statistically speaking, the probability
of wraparound stays the same. You may claim that for your
system/processor combination, your change justifies because the value
of sym3 is more likely to be close to 0xffffffff than 0x00000000. But
this is only for your system, not in general.
On the other hand, your change introduces a small overhead due to the
additional (- const2), resulting in small performance degradation.
Fred
On 05/23/2011 05:53 PM, Ye, Mei wrote:
My earlier check-in r3502 forces loop end-test comparison to be
"unsigned" in the Linear Function Test Replacement (LFTR) phase. This
fixes an overflow problem when memory address crosses the boundary of
0x80000000 in 32-bit architectures. The fix exposes another bug in
Linear Function Test Replacement, which is explained below using the
following pseudo-codes, where "sym7" is the original loop index,
"sym11" is the memory address that replaces the original loop index as
the induction variable, and "end" is the loop upper bound. If the
value of the memory address is very closed to 0xffffffff, adding a
positive constant can overflow and produces a small positive result
for "sym11v5", which then causes "sym11v5 <= end" to be evaluated as
"TRUE", and the loop is mistakenly executed many more times than it
should. This leads to seg faults.
sym7v3 = const1
sym11v3 = sym7v3 + sym3
LABEL:
sym11v4 = phi(sym11v3, sym11v5)
*sym11v4 = ...
sym11v5 = sym11v4 + const2
if (sym11v5 <= end)
goto LABEL
To fix this bug, we transform the above code into the following.
sym7v3 = const1
sym11v3 = const1 - const2 + sym3 // The result of 'const1 - const2'
should use signed type.
LABEL:
sym11v4 = phi(sym11v3, sym11v5)
sym11v5 = sym11v4 + const2
*sym11v5 = ...
if (sym11v5 <= end - const2)
goto LABEL
This transformation uses pre-increment instead of post-increment for
the induction variable update and replaces the use of "sym11v4" with
"sym11v5".
My implementation replaces CR operands associated with "sym11v4"
without rehashing since none of these expressions appear outside of
the loop. It is also impractical to rehash these expressions at this
point since doing so will burn compilation time to find each
occurrence from all the worklsts. I also avoid the situations that
the transformation will create new expressions having CSE occurrences
outside of the loop. This is to avoid changing CODEREPs unintentionally.
Although this work still will not make LFTR 100% safe, but it should
cover a vast majority.
------------------------------------------------------------------------------
vRanger cuts backup time in half-while increasing security.
With the market-leading solution for virtual backup and recovery,
you get blazing-fast, flexible, and affordable data protection.
Download your free trial now.
http://p.sf.net/sfu/quest-d2dcopy1
_______________________________________________
Open64-devel mailing list
Open64-devel@lists.sourceforge.net <mailto:Open64-devel@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/open64-devel
------------------------------------------------------------------------------
Simplify data backup and recovery for your virtual environment with vRanger.
Installation's a snap, and flexible recovery options mean your data is safe,
secure and there when you need it. Discover what all the cheering's about.
Get your free trial download today.
http://p.sf.net/sfu/quest-dev2dev2
_______________________________________________
Open64-devel mailing list
Open64-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/open64-devel