From the start int in Poly/ML has been arbitrary precision. After
around thirty years I think it's time to make a change. The current
plan is to introduce a fixed precision integer type and use that as the
default int. Arbitrary precision will remain as LargeInt.int/IntInf.int.
The reason for the change is that it is becoming clear that having
arbitrary precision int as the only integer type imposes restrictions on
the code both as regards the integer type itself but also more widely.
The arbitrary precision type in Poly/ML is implemented using a tagged
representation. Small values that will fit in 31/63 bits are
represented as the value shifted one bit left and with the bottom bit
set. Larger values are represented as pointers to arrays of bytes,
either in the GMP format or Poly's own format. Since pointers are
always word-aligned it's possible to distinguish the two forms by
looking at the bottom bit. The "tag-bit" also serves another purpose.
The garbage-collector can use it to distinguish pointers from
non-pointers so all non-pointer values, e.g. bool, char and word, are
represented using the tagged form.
Arbitrary precision operations are handled directly by the
code-generator. The idea is that when the values are small, as they
usually are, the machine can execute the appropriate instruction
directly. For example, addition is generated as an add instruction. It
is preceded by a test on the tags and followed by a test for overflow.
If the arguments were not tagged or the result overflowed there is a
call into the run-time system. This emulates the instruction and
returns with the result in the correct register.
Emulation has limitations. Currently only (in)equality, comparison,
addition and subtraction are emulated. Other operations such as
multiplication and division involve calls to assembly code. This means
that code that makes significant use of arithmetic incurs overheads even
if it only ever uses small values. There is also a more subtle cost.
Because emulating addition and subtraction could involve allocating
memory on the heap for the long-format result every addition and
subtraction could result in a garbage-collection. This means that the
registers, which could be holding intermediate results, have to be
treated as the roots for the GC. That in turn means that they must
always contain either valid pointers or tagged values. This sometimes
involves adding extra instructions simply to ensure this is the case.
It also means that whenever the code returns from the run-time system to
ML all the registers have to be loaded with known values, adding to the
cost of a RTS call.
The plan is to retain the existing arbitrary precision using the current
tagged values but remove emulation. All arbitrary precision operations
will involve an ML function call to the assembly code. This will
compute the result directly if it is short and call into the run-time
system if it is long. This is actually likely to faster than the
current scheme if the values are long. Some cases, for example equality
between an arbitrary precision value and a short constant, can continue
to be handled by the code-generator since they do not require emulation.
Fixed precision int will be based on the current short format i.e. 31
bits on 32-bit machines, 63 bits on 63-bit machines. Operations on
these can be handled directly by the code-generator.
These changes are currently being tested in the FixedPrecisionInt
branch. This requires at least one, preferably two, calls to "make
compiler" to build a version of the compiler that generates fixed
precision operations. For backwards compatibility it does not yet gain
all the advantages of the change; in particular it still clears all the
registers on return from the run-time system.
_______________________________________________
polyml mailing list
[email protected]
http://lists.inf.ed.ac.uk/mailman/listinfo/polyml