--- Comment #33 from eyal at geomage dot com 2008-02-13 16:06 ---
Hi All,
I've made some changes that hopefully prevent memory from being a
performance bottleneck. I see a perf gain of ~10%. However, the compiler still
gives me the warnings in comment #19 -
Test.cpp:24: note
--- Comment #30 from eyal at geomage dot com 2008-02-12 08:43 ---
Hi,
Thanks a lot for the input about a potential memory bottleneck. I was indeed
under the impression that once I got the loop vectorized, I'd immediately see a
performance boost.
I would appreciate, however
--- Comment #32 from eyal at geomage dot com 2008-02-12 11:28 ---
(In reply to comment #31)
I would appreciate, however, a further explanation about this issue.
The explanation has to deal with CPU architecture and is not related to
compilers. In case of cache miss the memory load
--- Comment #27 from eyal at geomage dot com 2008-02-11 14:00 ---
Hi,
I am a bit lost and would appreciate your guidance. Up till now, after all those
emails, I still have no clue as to why such a simple test case doesn't work. As
far as I understood the vectorization should have shown
--- Comment #21 from eyal at geomage dot com 2008-02-10 13:48 ---
(In reply to comment #14)
Giving it another thought, this is not necessarily an alias analysis issue, even
though it fails to tell that the pointers do not alias. Since in this case the
pointers do differ, the runtime test
--- Comment #23 from eyal at geomage dot com 2008-02-10 15:47 ---
(In reply to comment #22)
1. It looks like the vectorizer was enabled in both cases, since -O3 enables the
vectorizer by default. You need to add -fno-tree-vectorize to disable it
explicitly.
2. To get better results
--- Comment #19 from eyal at geomage dot com 2008-02-10 07:42 ---
Hi,
This is the simplest test I have.
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
typedef float ARRTYPE;
int main ( int argc, char *argv[] )
{
int m_nSamples = atoi( argv[1] );
int itBegin
--- Comment #20 from eyal at geomage dot com 2008-02-10 07:56 ---
Hi,
I've tried putting the loop to be vectorized in a different method and the
compiler output looks better, but the performance is still the same as the
non-vectorized code.
#include <iostream>
#include <stdio.h>
#include
--- Comment #17 from eyal at geomage dot com 2008-02-08 08:58 ---
Using malloc instead of new does generate better code and improves performance
slightly for me, admittedly not as much as we would like; the kernel becomes:
(using only -O3 -S -m64 -maltivec)
.L29:
lvx 13,7,9
--- Comment #16 from eyal at geomage dot com 2008-02-08 08:55 ---
Thanks a lot Ira, I appreciate it.
If you need the full test code with .vect file and makefiles, please let me
know.
Thanks,
eyal
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117
at geomage dot com
GCC build triplet: gcc (GCC) 4.3.0 20071124 (experimental)
GCC host triplet: PowerPC
GCC target triplet: PowerPC
--- Comment #2 from eyal at geomage dot com 2008-02-07 10:36 ---
Yes, the loop is vectorized. What do you mean by memory bound? Don't you think
that vectorization can help here? I see around 20% performance gain in the real
application.
Below is the compiler output:
Eyal.cpp:34: note
--- Comment #5 from eyal at geomage dot com 2008-02-07 10:43 ---
(In reply to comment #3)
I think this is a dup of another bug I filed with respect to the builtin
operator new getting the malloc attribute.
Are you referring to using malloc instead of new?
Using malloc didn't make
--- Comment #7 from eyal at geomage dot com 2008-02-07 11:06 ---
(In reply to comment #6)
(In reply to comment #2)
Yes the loop is vectorized.
...
Eyal.cpp:34: note: created 9 versioning for alias checks.
Eyal.cpp:34: note: LOOP VECTORIZED.(get_loop_exit_condition
--- Comment #8 from eyal at geomage dot com 2008-02-07 12:16 ---
Hi Ira,
Here is the compiler output for the real code.
Crs/CEE_CRE_2DSearch.cpp:1285: note: create runtime check for data references
*D.86651_134 and *D.8_160
Crs/CEE_CRE_2DSearch.cpp:1285: note: create runtime check
--- Comment #10 from eyal at geomage dot com 2008-02-07 12:58 ---
(In reply to comment #9)
(In reply to comment #8)
{
float *pTempSumPhase_Temp_cre_angle = (float*) malloc (sizeof(float)
*m_nSamples);
float *pTempSum2Phase_Temp_cre_angle = (float*) malloc
--- Comment #12 from eyal at geomage dot com 2008-02-07 13:07 ---
(In reply to comment #11)
(In reply to comment #10)
Is there some pragma or a coding convention I can use to make the compiler
understand that those pointers have nothing to do with each other?
There is __restrict__
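For illustration only, here is a minimal sketch of how the __restrict__ qualifier (a GCC extension) might be applied to a loop of this kind. The function name add_arrays and the helper disjoint are hypothetical and do not come from the test case in this bug. The qualifier is a promise made by the programmer, so the sketch checks that promise with an assert in debug builds. Whether it removes the runtime alias checks reported by the vectorizer depends on the compiler version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical debug helper: true when the n-float ranges starting at
// p and q do not overlap in memory.
static bool disjoint(const float *p, const float *q, std::size_t n)
{
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t b = reinterpret_cast<std::uintptr_t>(q);
    const std::uintptr_t bytes = n * sizeof(float);
    return a + bytes <= b || b + bytes <= a;
}

// Hypothetical kernel. __restrict__ tells GCC that, inside this function,
// the memory written through dst is not accessed through a or b.
// Passing overlapping buffers breaks that promise and is undefined behavior.
void add_arrays(float *__restrict__ dst,
                const float *__restrict__ a,
                const float *__restrict__ b,
                std::size_t n)
{
    // Check the no-overlap promise in debug builds (compiled out with NDEBUG).
    assert(disjoint(dst, a, n) && disjoint(dst, b, n));

    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Compiling such a function with the vectorizer's verbose output enabled should show whether the "versioning for alias checks" notes are still emitted for the loop.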