On 03/20/2013 05:49 PM, Erik Schnetter wrote:

> With gcc, memcpy is completely optimized away. With clang as well -- I
> am using memcpy internally e.g. to convert doubles into integers to
> access certain bits, and this translates to no instruction at all,
> things are just kept in the same register. I would therefore hope that
> the pocl->vecmathlib transition would be similarly ideal.

Let's hope so. Anyway, the generic type version is useful for ensuring
portability to other targets. After all, the most important thing is to
have an inlineable math library. Other optimizations are secondary
at this point.
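
For reference, the memcpy idiom Erik describes can be sketched roughly as
below. This is only an illustration, not code taken from vecmathlib: the
function names (as_bits, from_bits, sign_bit) are made up here, and the
sign-bit test assumes that double is a 64-bit IEEE 754 value. Optimizing
compilers typically turn such a fixed-size memcpy into a plain register
move or nothing at all, which is the behaviour discussed above.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the bits of a double as a 64-bit
// unsigned integer without violating aliasing rules.
inline std::uint64_t as_bits(double x) {
  static_assert(sizeof(std::uint64_t) == sizeof(double),
                "double is assumed to be 64 bits wide");
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);  // fixed-size copy, usually optimized away
  return bits;
}

// Hypothetical helper: the inverse conversion, from bits back to a double.
inline double from_bits(std::uint64_t bits) {
  double x;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// Read the sign bit directly (assumes IEEE 754 binary64 layout, where
// the sign is the most significant bit).
inline bool sign_bit(double x) {
  return (as_bits(x) >> 63) != 0;
}
```

Looking at the generated assembly for sign_bit (e.g. with -O2 -S) is an
easy way to check whether a given compiler really drops the copy.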

I cannot compile vecmathlib separately to produce the 'test' binary:

[  2%] Building CXX object CMakeFiles/bench.dir/bench.cc.o
make[2]: clang++-mp-3.3: Command not found
make[2]: *** [CMakeFiles/bench.dir/bench.cc.o] Error 127
make[1]: *** [CMakeFiles/bench.dir/all] Error 2

But I know it uses the SSE2-optimized headers, as I inserted a
#warning at the point where they are included. I do not have AVX.
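
A similar check can be done without editing the headers, along the lines
of the sketch below. It relies on the __AVX__ and __SSE2__ macros that
GCC and Clang predefine when the corresponding instruction sets are
enabled for the compilation. The function name simd_level is made up for
this example, and I am not claiming this is how vecmathlib itself selects
its headers.

```cpp
// Hypothetical helper: report which SIMD level the compiler was
// targeting when this translation unit was built. The result reflects
// the compiler flags, not what the CPU supports at run time.
inline const char* simd_level() {
#if defined(__AVX__)
  return "AVX";
#elif defined(__SSE2__)
  return "SSE2";
#else
  return "generic";
#endif
}
```

Printing simd_level() from a small test program shows the path chosen at
compile time, which can then be compared against the cpuinfo flags.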

cat /proc/cpuinfo
...
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx 
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology 
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 
ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt aes lahf_lm ida arat dtherm 
tpr_shadow vnmi flexpriority ept vpid


-- 
Pekka

