On Thu, Nov 22, 2018 at 11:46:44AM +0100, felix.winkelm...@bevuta.com wrote:
> This patch adds an additional optimization pass to the "lfa2"
> compiler stage, which attempts to remove unnecessary
> boxing and unboxing of floating point numbers. Specifically,
> calls to floating point inline operations that have a variant that
> accepts unboxed arguments are replaced with a faster version,
> omitting the unboxing of arguments, and possibly also the
> boxing of results.
Hi Felix,
I had a look and it looks quite simple but effective. I have made a few
small modifications to your patch:
- "utype" was no longer used in c-backend.scm after removing the old
unboxing code, so I've removed that procedure as well.
- The patch introduced several lines with trailing whitespace, I've
cleaned it up a bit so git doesn't complain as much when applying.
- I noticed you introduced a second entry for fix_to_flo, in the
constructor map, so I've removed that. There were also some
additional entries with the (invalid) "flonum" type that you
missed, so I've updated those too.
- In lfa2, sub-boxed was always called just before calling extinguish!
to decrement variable nodes in floatvars, but this seems not entirely
justified; extinguish! will do some checks to see if it is droppable
at all, and only drop it in that case. Additionally, it will traverse
the sub-node tree first to see if it can drop any of those.
Therefore I think it is better to move the sub-boxed calls into
drop!, to ensure the counts will match up with the actual number of
nodes that remain; if there are multiple references to the variable
in the sub-nodes and the entire node is dropped, you want to decrement
the counters by the total number of (sub)nodes referring to the
variable that were dropped, not just once for the main node.
I kept the one remaining call to sub-boxed in the case where we look
up the call in +ffi-type-check-map+ to "raise" the subexpression, but
I'm not 100% sure it is correct; if we raise the subexpression, the
sub-expressions remain the same, no? So, for example, when we make
this replacement:
(##core#inline "C_i_foreign_flonum_argumentp" foo) => foo
AFAIK that does not remove any references to variables in the foo
sub-expression, so if I understand it correctly, it should not be
decremented from the list of floatvars. I've kept it because I'm
not certain enough about this to remove it, but if you agree, please
remove that call after applying this modified patch.
- Finally: there are still quite some remnants of the old boxing/unboxing
code around to mark variables as 'boxed, and there's still ##core#box
and ##core#unbox in the intermediate language.
Is that still relevant, or can we delete that too? As far as I can
tell, that code is still active and used; could you tell me more about
how it works and how it relates (or not) to the lfa2 boxing and
unboxing step, especially why the patch introduces a new box_float
operation rather than re-using the old intermediate language box/unbox
operations?
See attachment for the updated patch.
Cheers,
Peter
From 226623a7b9fb5c3f083fa1245f656cd909c751c8 Mon Sep 17 00:00:00 2001
From: felix
Date: Wed, 21 Nov 2018 18:48:48 +0100
Subject: [PATCH] Add unboxing pass to lfa2
After the lfa2 pass another pass is executed to eliminate unnecessary
boxing + unboxing of floating point intermediate values. The process
is roughly this: identify variables that are unassigned and are known
to contain flonums, count all accesses, then count all accesses of these
variables that are in direct operator position of an intrinsic that has
an unboxed variant and, if the number of accesses in unboxed position
is the same as the number of total accesses, then the variable
can be let-bound using a specialized construct (##core#let_float)
and all accesses be direct accesses (without any boxing/unboxing).
Results of unboxable intrinsics are boxed automatically
(using ##core#box_float), uses of ##core#inline_allocate on
unboxable intrinsics are converted to ##core#inline forms.
The lfa2 pass is now enabled at optimization levels 2 or higher.
Signed-off-by: Peter Bex
---
NEWS | 4 +
batch-driver.scm | 9 +-
c-backend.scm | 57
chicken.h | 18 ++-
chicken.scm | 7 +-
core.scm | 56 +---
lfa2.scm | 331 --
manual/Using the compiler | 2 +-
support.scm | 4 +-
9 files changed, 347 insertions(+), 141 deletions(-)
diff --git a/NEWS b/NEWS
index f59d72ec..f3be786d 100644
--- a/NEWS
+++ b/NEWS
@@ -59,6 +59,10 @@
with the same version of the compiler.
- the "-consult-type-file" and "-emit-type-file" options have been renamed
to "-consult-types-file" and "-emit-types-file", respectively.
+ - Added an