On Fri, Jul 2, 2010 at 2:59 PM, Hakan Ardo <[email protected]> wrote:
> Hi,
> we got the simplest possible interpreter-level implementation of an
> array-like object running (in the interplevel-array branch) and it
> executes my previous example about 2 times slower than optimized C.
> Attached is the trace generated by the following example:
>
>     img=array(640*480); l=0; i=0;
>     while i<640*480:
>         l+=img[i]
>         i+=1
>
> A simplified version of that trace is:
>
>  1. [p0, p1, p2, p3, i4, p5, p6, p7, p8, p9, p10, f11, i]
>  2. i14 = int_lt(i, 307200)
>  3. guard_true(i14, descr=<Guard1>)
>  4. guard_nonnull_class(p10, 145745952, descr=<Guard2>)
>  5. img = getfield_gc(p10, descr=<GcPtrFieldDescr 8>)
>  6. f17 = getarrayitem_gc(img, i, descr=<FloatArrayDescr>)
>  7. f18 = float_add(f11, f17)
>  8. i20 = int_add_ovf(i, 1)
>  9. guard_no_overflow(, descr=<Guard3>)
>     #
> 10. i23 = getfield_raw(149604768, descr=<SignedFieldDescr 0>)
> 11. i25 = int_add(i23, 1)
> 12. setfield_raw(149604768, i25, descr=<SignedFieldDescr 0>)
> 13. i28 = int_and(i25, -2131755008)
> 14. i29 = int_is_true(i28)
> 15. guard_false(i29, descr=<Guard4>)
> 16. jump(p0, p1, p2, p3, 27, ConstPtr(ptr31), ConstPtr(ptr32),
>          ConstPtr(ptr33), p8, p9, p10, f18, i20)
>
> Do these operations more or less correspond to assembler
> instructions? I guess that the extra overhead here as compared to
> the C version would be lines 4, 5, 9 and 10-15. What's 10-15 all about?
> I guess that most of these additional operations would not affect the
> performance of more complicated loops, as they will only occur once per
> loop (although combining the guard on line 9 with line 3 might be a
> possible optimization)? Line 4 will appear once for each array used in
> the loop and line 5 once for every array access, right?
>
> Can the array implementation be designed in some way that would not
> generate line 5 above? Or would it be possible to get rid of it by
> some optimization?
>
> --
> Håkan Ardö
>
> _______________________________________________
> [email protected]
> http://codespeak.net/mailman/listinfo/pypy-dev

In addition to the things you noted, I guess the int overflow check
(lines 8-9 of the trace) can be optimized out, since i+=1 can never
overflow given that i is bounded by 640*480. In general, I suppose that
would require more dataflow analysis.

Alex

-- 
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me
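
To make the bounds argument concrete, here is a minimal sketch of the
kind of interval reasoning it relies on. This is only an illustration,
not the actual PyPy optimizer: IntBound, make_lt, add_const and
overflow_guard_needed are hypothetical names, and the machine-word
limits are assumed to be those of the host Python (sys.maxsize).

```python
import sys

# Assumed machine-word limits (hypothetical; taken from the host Python).
MAXINT = sys.maxsize
MININT = -sys.maxsize - 1


class IntBound:
    """Closed interval [lower, upper] known to contain an integer value."""

    def __init__(self, lower=MININT, upper=MAXINT):
        self.lower = lower
        self.upper = upper

    def make_lt(self, const):
        """Narrow the interval after a guard has established value < const."""
        self.upper = min(self.upper, const - 1)

    def add_const(self, const):
        """Interval of value + const, computed with unbounded Python ints."""
        return IntBound(self.lower + const, self.upper + const)

    def fits_machine_word(self):
        return MININT <= self.lower and self.upper <= MAXINT


def overflow_guard_needed(bound, const):
    """True if value + const could leave the machine-word range."""
    return not bound.add_const(const).fits_machine_word()


# Nothing is known about i yet: the add may overflow, so the guard stays.
i = IntBound()
unbounded_needs_guard = overflow_guard_needed(i, 1)

# After int_lt(i, 307200) + guard_true (lines 2-3), i is at most 307199,
# so int_add_ovf(i, 1) on line 8 cannot overflow.
i.make_lt(307200)
bounded_needs_guard = overflow_guard_needed(i, 1)

print(unbounded_needs_guard, bounded_needs_guard)
```

The sketch only covers a single comparison against a constant followed
by adding a constant; bounds that depend on other variables or that
must be carried across loop iterations are where the extra dataflow
analysis would come in.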
