An update on this. I've committed some changes that seem to have solved
the problem. It's probably of general interest so I'll provide a bit of
background.
It has to do with the optimisation of closures. In general an ML
function needs to be represented by a closure, an item on the heap
consisting of a pointer to the code and copies of the free variables
needed by the function. A naive implementation would build a closure
for each function as it was declared but it's possible to do much better
than that. If we can be sure that a function is only ever called and
not returned or assigned to a reference we don't need to put the closure
on the heap. Poly/ML has used a couple of different techniques but more
recently it has handled these cases by building the closure on the
stack. A stack closure looks just like a heap closure so the calling
conventions are the same. That means it can be used both when we can
identify all the call sites of the function but also if it is passed
into a function such as map or fold. (Actually, List.map and
List.foldl/r are small enough that they're inlined but the principle is
the same). The disadvantage is that a call to a function with a
stack-closure cannot be tail-recursive because the closure needs to stay
on the stack until the function returns. This is what went wrong with
this example.
Another solution, which works when all the call sites of a function can
be identified but not when a function is passed as an argument, is
lambda-lifting. This involves adding the free variables as extra
arguments to the function. The resulting function does not need a
closure. I've implemented this for local functions where all the call
sites can be found and left the stack closures for the other cases. It
is this change that has fixed the problem since lambda-lifted functions
can be tail-recursive.
It may be that lambda-lifted functions are more efficient for other
reasons. Functions without closures can be code-generated using calls
or jumps directly to the function rather than requiring an indirection
through the closure. This may fit better with the pre-fetching of
modern hardware.
David
On 05/02/2015 05:00, Michael Norrish wrote:
(I'll attempt to attach the tgz file to this message, but if 817KB is too big
and this bounces or the attachment is dropped, I'll make it available
separately.)
If compiled with Poly/ML, the attached program performs abysmally on the
provided testcase.sml input file. Within seconds of beginning, it attempts to
allocate more memory than my machine has (16GB), and basically brings it to its
knees. If compiled with mlton, the testcase doesn't seem to cause any obvious
memory allocation, and finishes in less than a second.
You can also compile with Moscow ML, though I haven’t included the build
instructions in the Makefile. The execution of the mosml executable doesn't
seem to allocate any memory either, and it terminates cleanly (after 17s).
Though slow this is still preferable to Poly/ML's behaviour.
This is with Poly/ML 5.5.2 on
Linux telemachus 3.2.0-75-generic #110-Ubuntu SMP Tue Dec 16 19:11:55 UTC 2014
x86_64 x86_64 x86_64 GNU/Linux
The Makefile generates two executables (if you have mlton and poly in your
PATH): pholdeptool and mholdeptool. You can run them on the testcase, e.g.:
pholdeptool testcase.sml
The lexer code looks to me to be tail-recursive, and the program only
accumulates a binary tree containing 15 strings. So it really does seem as if
Poly/ML is doing something buggy.
(The program is lexing the test-case (which uncompresses to a 30MB file) in an
extremely simple way, but performing a useful analysis.)
Michael
_______________________________________________
polyml mailing list
[email protected]
http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
_______________________________________________
polyml mailing list
[email protected]
http://lists.inf.ed.ac.uk/mailman/listinfo/polyml