On Wed, Mar 20, 2013 at 3:18 PM, Pekka Jääskeläinen <
[email protected]> wrote:

> On 03/20/2013 07:48 PM, Erik Schnetter wrote:
>
>> I think I found the problems. The C++ compiler does not know that long and
>> double are to be supported, since the C++ code does not include types.h.
>> Therefore, only round(float) is generated, and not round(double).
>> Presumably, round(double) is then taken from somewhere else. Also, the C++
>> compiler doesn't seem to see the optimization settings, so it produces
>> unoptimized code, so that the calls to memcpy remain, and the call chain
>> within VML is not inlined.
>>
>
> Hmm. I wonder whether the "merging" of a float2 arg into a double in the
> calling convention could mess this up somehow. What if it ends up calling
> round(double) when it should call round(float2), and that round(double) is
> actually a libm scalar round instead of a vector round? Just shooting in
> the dark here...
>

> You should look at the final parallel.bc in the kernel temp dir
> if you want to see whether the memcpys are optimized away. It has all the
> optimizations applied after fully linking and aggressively inlining
> everything. The per-module clang++ optimizations should not matter much
> here.


I'll have a look at the parallel.bc file then.
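
A minimal sketch of how such a check could be scripted, assuming the bitcode
has first been disassembled to textual IR with llvm-dis. The file name
parallel.ll and the helper find_memcpy_calls are hypothetical names chosen
for illustration, not part of pocl:

```python
import re
import sys

# Matches call/invoke instructions whose callee is the libc memcpy or an
# llvm.memcpy.* intrinsic. Declarations ("declare ...") are not matched.
MEMCPY_CALL = re.compile(
    r'\b(?:call|invoke)\b.*@(?:llvm\.memcpy\.[\w.]+|memcpy)\s*\('
)

def find_memcpy_calls(ir_text):
    """Return (line_number, line) pairs for every memcpy call in textual LLVM IR."""
    hits = []
    for number, line in enumerate(ir_text.splitlines(), start=1):
        if MEMCPY_CALL.search(line):
            hits.append((number, line.strip()))
    return hits

if __name__ == "__main__":
    # Assumed input: the output of llvm-dis on parallel.bc (hypothetical path).
    path = sys.argv[1] if len(sys.argv) > 1 else "parallel.ll"
    with open(path) as f:
        for number, line in find_memcpy_calls(f.read()):
            print(f"{path}:{number}: {line}")
```

If this prints nothing for the fully linked and inlined module, the memcpy
calls were presumably optimized away. Any remaining hits show where to look.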

-erik

-- 
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
AIM: eschnett247, Skype: eschnett, Google Talk: [email protected]