A little extra info to the previous message.

On Sun, Sep 18, 2022 at 04:44:03PM +0200, Waldek Hebisch wrote:
> I took a look at Poplog (in)efficiency. My starting point
> was example from HELP * EXTERNAL about thresholding.
<snip>
> Below
> are results on 3GHz Intel i-7 (all times real time in
> microseconds):
>
<snip>
> 1000x100, Pop11 intvec fast_subscrintvec => 1239
<snip>
> I also tried popc adding version in Syspop11:
>
> 1000x100, Pop11 fi_< => 1591
> 1000x100, Pop11 fast_subscrintvec => 702
> 1000x100, Pop11 intvec, Syspop => 91

I looked at the difference between the incremental compiler and popc.
One difference is that popc can directly use the machine call
instruction, while the incremental compiler first loads the address of
the routine into a register and then does an indirect call.  This seems
to be unavoidable, because popc can assume that relative locations
of routines are fixed, while code generated by the incremental compiler
can be moved by the garbage collector.  But this is probably a small
difference.

There is a second difference: in the popc version fi_< is done via
inline code, while the incremental compiler uses a function call.  This
is probably the main source of the difference in runtime.  Interestingly,
the incremental compiler contains an optimization that should convert
fi_< to inline code.  In more detail, the main loop is:

    fast_for i from 1 to ii do
        if fast_subscrintvec(i, av) fi_< limit then
            0 -> fast_subscrintvec(i, av)
        endif
    endfor;

Using LIB showcode I see that this produces the following sequence of
operations for the Poplog VM:

    PUSHQ   1
    POP     i
    GOTO    label_10
label_8:
    PUSH    i
    PUSH    av
    CALL    fast_subscrintvec
    PUSH    limit
    CALL    fi_<
    IFNOT   label_12
    PUSHQ   0
    PUSH    i
    PUSH    av
    UCALL   fast_subscrintvec
label_12:
label_11:
label_9:
    PUSH    i
    PUSHQ   1
    CALL    fi_+
    POP     i
label_10:
    PUSH    i
    PUSH    ii
    CALL    fi_>
    IFNOT   label_8
label_7:

CALL fi_> + IFNOT (which comes from fast_for) gets converted to
inline code.  CALL fi_< + IFNOT, which could be handled in a similar
way, for some reason leads to an actual function call.

I also tried a Pop11 full vector.  Using the incremental compiler this
gives:

1000x100, Pop11 full vector => 635

Using popc:

1000x100, Pop11 full vector => 118

With a full vector, array indexing and the fi_< comparison are optimized
to inline code.  As one can see, the popc version is almost as good as
the Syspop one.  The incremental compiler is much worse here: while it
generates inline code, this code performs reads and stores to the Poplog
user stack, and that may be the main reason for the slowdown.  There is
also another possibility: popc puts the code checking for traps/stack
overflow in a different place.
So it is possible that the version produced by the incremental compiler
suffers due to pathological jump behaviour.

Concerning array indexing, AFAICS the reasons for the performance
difference are clear: Poplog has an optimization for access to full
vectors, but lacks a similar optimization for access to specialized
vectors.  So it would be good to add an extra inline optimization.

To eliminate access to the Poplog user stack, the incremental compiler
should probably use a method similar to popc's.  In fact, it would
probably be much better to have a single compiler that operates in two
modes (incremental or batch) instead of the current two compilers.

--
                              Waldek Hebisch
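
P.S. For anyone wanting to repeat the comparison, here is a minimal sketch of the two loop variants wrapped as procedures. The intvec loop body is the one quoted in this message. The procedure names threshold_intvec and threshold_vector, the argument order, and the use of fast_subscrv for the full-vector case are assumptions made for illustration, and so are the sizes in the usage lines. This is not the exact benchmark code behind the timings.

```pop11
;;; Sketch only: names and argument order are hypothetical.

;;; Specialized (integer) vector variant; the loop body is the one
;;; shown in the message.
define threshold_intvec(av, ii, limit);
    lvars av, ii, limit, i;
    fast_for i from 1 to ii do
        if fast_subscrintvec(i, av) fi_< limit then
            0 -> fast_subscrintvec(i, av)
        endif
    endfor
enddefine;

;;; Full vector variant.  ASSUMPTION: fast_subscrv is used as the
;;; accessor; the message does not say which one was used.
;;; The vector must contain only small integers, because the fast
;;; procedures do no type checking.
define threshold_vector(av, ii, limit);
    lvars av, ii, limit, i;
    fast_for i from 1 to ii do
        if fast_subscrv(i, av) fi_< limit then
            0 -> fast_subscrv(i, av)
        endif
    endfor
enddefine;

;;; Example use with an illustrative size and limit.
vars iv = initintvec(1000);
threshold_intvec(iv, 1000, 5);
```

Loading the file with LIB showcode active should let one compare the VM code planted for the two procedures.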