On 03/19/2013 08:24 PM, Erik Schnetter wrote:
> I can reproduce test suite failures on my system (OSX, x86_64), but no
> hangs. Also, vecmathlib does not have loops that could lead to a hang.

Stack corruption can also lead to hangs, e.g., if a memcpy/memset goes
over the bounds of a stack object.

But that doesn't seem to be the case here. There is actually an unconditional
jmp to itself where it gets stuck:

=> 0x00007ffff5dc7136 <_test_convert_type+263318>: jmp 0x7ffff5dc7136 <_test_convert_type+263318>

I can see it in the final parallel.bc:

...
@.str1047 = private unnamed_addr constant [27 x i8] c"convert_char_sat((double))\00", align 1
...

...
if.then.i422340.wi_0_0_0.i:                       ; preds = %for.body.i422334.wi_0_0_0.i
   %conv.i422335.i = sext i8 %11703 to i32, !wi !1, !wi_counter !47768
   %conv2.i422336.i = sext i8 %11704 to i32, !wi !1, !wi_counter !47769
   %call.i422339.i = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([71 x i8]* @.str, i64 0, i64 0), i8* getelementptr inbounds ([27 x i8]* @.str1047, i64 0, i64 0), i32 0, i32 0, i32 %conv.i422335.i, i32$
   br label %tailrecurse.i, !wi !1, !wi_counter !47771

tailrecurse.i:                                    ; preds = %tailrecurse.i, %if.then.i422340.wi_0_0_0.i, %for.cond.i422329.wi_0_0_0.i
   br label %tailrecurse.i
}


So probably something accidentally ends up calling itself, leading to
infinite recursion that gets optimized into an empty infinite loop. It happens
after the printf that prints "convert_char_sat((double))".
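To illustrate the kind of cycle I mean, here is a minimal sketch in plain C.
The names frontend_round and lib_round are hypothetical stand-ins, not the
actual pocl or VML functions, and the depth guard exists only so the demo
terminates. Without the guard the two functions call each other forever,
and after inlining and tail-call elimination an optimizer may reduce that
to a block that just branches to itself, like tailrecurse.i above.

```c
/* Hypothetical names; NOT the actual pocl or vecmathlib functions. */
int call_depth = 0;            /* counts how often the cycle is re-entered */
enum { MAX_DEPTH = 5 };        /* guard so that this demo terminates */

double lib_round(double x);    /* forward declaration */

/* Front end: meant to hand the work over to the library implementation. */
double frontend_round(double x)
{
    return lib_round(x);
}

/* Library side: instead of computing a result, it falls back to the
   front end, which closes the cycle. */
double lib_round(double x)
{
    if (++call_depth >= MAX_DEPTH)   /* remove this guard to get the hang */
        return x;
    return frontend_round(x);
}
```

Calling frontend_round once re-enters lib_round until the guard trips, so
call_depth ends up at MAX_DEPTH.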

In the kernel source:

...
compare_char_elements("convert_char_sat((double))", i, expected.raw, actual.raw, 1);
expected.value = ((char((char)double_rounded_values_rte[i]));
actual.value = convert_char_rte((double)double_values[i]);

Looking up convert_char_rte in kernel_linked.bc shows that it calls
_cl_round(double), which calls VML routines in a longer chain that might
cause the recursion (I didn't track the calls to the end).

Indeed: I tried using pocl's round.cl instead of VML's and it no longer
hung. It prints the verification errors and finishes.

I saw you have a version that uses the __ext_vector_type__ attribute for
the internal vector types instead of __m128 (the SSE-optimized version).
It didn't compile when I tried to enable it, but maybe that's something to
try next: use the exact same internal data type for storage in VML as
pocl does.

Even if the issue we are seeing wasn't caused by the memset/memcpy
conversion directly, it seems a waste of cycles to convert back and
forth this way just because of the VML interfacing, if that can be avoided.
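A rough sketch of what I mean by the conversion overhead, in plain C with
the GCC/Clang vector_size extension. The types vec4_a and vec4_b and all
function names are hypothetical stand-ins for the two internal
representations, not the real pocl or VML types. A compiler may well
optimize the copies away, so this shows the pattern rather than a measured
cost.

```c
#include <string.h>

/* Hypothetical stand-ins for the two internal representations. */
typedef float vec4_a __attribute__((vector_size(16)));  /* GCC/Clang extension */
typedef struct { float s[4]; } vec4_b;

/* Conversion through memory: needed when the two sides disagree on the
   storage type, even though the bits are identical. */
vec4_b a_to_b(vec4_a v) { vec4_b r; memcpy(&r, &v, sizeof r); return r; }
vec4_a b_to_a(vec4_b v) { vec4_a r; memcpy(&r, &v, sizeof r); return r; }

/* Element-wise add implemented on representation B, called from code
   that uses representation A: two copies in, one copy out. */
vec4_a add_via_b(vec4_a x, vec4_a y)
{
    vec4_b bx = a_to_b(x), by = a_to_b(y), br;
    for (int i = 0; i < 4; ++i)
        br.s[i] = bx.s[i] + by.s[i];
    return b_to_a(br);
}

/* Same operation when both sides share one type: no copies at all. */
vec4_a add_direct(vec4_a x, vec4_a y)
{
    return x + y;
}
```

Both functions return the same values; the second one simply has no
interfacing layer to go wrong.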

> I notice that all failures are for vector datatypes that are 32 bytes
> long (short16, int8, long4, float8, double4). This is very suspicious,
> since the implementation of convert_type does not depend on this size. I
> also notice that the AVX vector instructions provide 32-byte vectors,
> but are very difficult to use for integer vectors. Could this be an LLVM
> code generation issue?

I doubt it, but I'll try with LLVM trunk next.

-- 
Pekka

_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
