On 03/20/2013 05:49 PM, Erik Schnetter wrote:
> With gcc, memcpy is completely optimized away. With clang as well -- I
> am using memcpy internally e.g. to convert doubles into integers to
> access certain bits, and this translates to no instruction at all,
> things are just kept in the same register. I would therefore hope that
> the pocl->vecmathlib transition would be similarly ideal.

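For readers following along, the memcpy idiom described above could look roughly like the sketch below. The helper names as_bits and sign_bit are hypothetical and not taken from vecmathlib, and the sketch assumes that double is a 64-bit IEEE 754 type. Whether the copy disappears entirely depends on the compiler and the optimization level.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; not vecmathlib's actual API.

// Reinterpret the bits of a double as a 64-bit unsigned integer.
// memcpy avoids the aliasing problems of a pointer cast, and optimizing
// compilers typically reduce it to a plain register move or to nothing.
static inline std::uint64_t as_bits(double x) {
    static_assert(sizeof(std::uint64_t) == sizeof(double),
                  "this sketch assumes a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Read the sign bit (bit 63 under the assumed IEEE 754 binary64 layout).
static inline bool sign_bit(double x) {
    return (as_bits(x) >> 63) != 0;
}
```

Looking at the generated assembly (for example with -O2 -S) is one way to check whether the copy was really optimized away on a given target.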
Let's hope so. Anyway, the generic type version is useful to ensure
portability to other targets. After all, the most important thing is to
have an inlineable math library. Other optimizations are secondary at
this point.

I cannot compile vecmathlib separately to produce the 'test' binary:

[  2%] Building CXX object CMakeFiles/bench.dir/bench.cc.o
make[2]: clang++-mp-3.3: Command not found
make[2]: *** [CMakeFiles/bench.dir/bench.cc.o] Error 127
make[1]: *** [CMakeFiles/bench.dir/all] Error 2

But I know it uses the SSE2-optimized headers, because I inserted a
#warning where it includes them. I do not have AVX.

cat /proc/cpuinfo
...
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt aes lahf_lm ida arat
dtherm tpr_shadow vnmi flexpriority ept vpid

-- 
Pekka
