[polyml] Fixed precision int

David Matthews Wed, 10 Feb 2016 00:56:49 -0800

From the start int in Poly/ML has been arbitrary precision. After around thirty years I think it's time to make a change. The current plan is to introduce a fixed precision integer type and use that as the default int. Arbitrary precision will remain as LargeInt.int/IntInf.int.

The reason for the change is that it is becoming clear that having arbitrary precision int as the only integer type imposes restrictions on the code, both as regards the integer type itself and more widely.

The arbitrary precision type in Poly/ML is implemented using a tagged representation. Small values that will fit in 31/63 bits are represented as the value shifted one bit left and with the bottom bit set. Larger values are represented as pointers to arrays of bytes, either in the GMP format or Poly's own format. Since pointers are always word-aligned it's possible to distinguish the two forms by looking at the bottom bit. The "tag-bit" also serves another purpose. The garbage-collector can use it to distinguish pointers from non-pointers so all non-pointer values, e.g. bool, char and word, are represented using the tagged form.
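
As a rough illustration of the tagging described above, here is a small Python model. It is only a sketch: the names (WORD_BITS, fits_short, tag, untag, is_tagged) are made up for the example and are not Poly/ML identifiers, and a machine word is modelled as a signed Python integer.

```python
WORD_BITS = 64  # assumed word size; pass 32 to model a 32-bit build

def fits_short(n, word_bits=WORD_BITS):
    """True if n fits in the (word_bits - 1)-bit signed short form."""
    limit = 1 << (word_bits - 2)
    return -limit <= n < limit

def is_tagged(w):
    """Bottom bit set: a tagged value. Bottom bit clear: a word-aligned pointer."""
    return (w & 1) == 1

def tag(n):
    """Encode a small integer: shift left one bit and set the bottom bit."""
    assert fits_short(n)
    return (n << 1) | 1

def untag(w):
    """Decode a tagged word with an arithmetic right shift."""
    assert is_tagged(w)
    return w >> 1
```

For example, tag(5) gives 11 (binary 1011), while a word-aligned address such as 0x7f00 has its bottom bit clear and so is not mistaken for a tagged value.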

Arbitrary precision operations are handled directly by the code-generator. The idea is that when the values are small, as they usually are, the machine can execute the appropriate instruction directly. For example, addition is generated as an add instruction. It is preceded by a test on the tags and followed by a test for overflow. If the arguments were not tagged or the result overflowed there is a call into the run-time system. This emulates the instruction and returns with the result in the correct register.
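
The shape of that addition sequence can be sketched in Python as follows. This is a model, not the generated code: Long and rts_add are hypothetical stand-ins for a heap-allocated long-format value and for the run-time system call, a plain Python int stands for a tagged short value, and the sketch assumes that a long result which fits the short range is returned in short form.

```python
WORD_BITS = 64  # assumed word size
SHORT_MIN = -(1 << (WORD_BITS - 2))
SHORT_MAX = (1 << (WORD_BITS - 2)) - 1

class Long:
    """Hypothetical stand-in for a heap-allocated long-format integer."""
    def __init__(self, value):
        self.value = value

def is_short(x):
    """Models the tag test: plain ints play the role of tagged values."""
    return isinstance(x, int)

def rts_add(a, b):
    """Slow path: hypothetical stand-in for the call into the run-time system."""
    va = a if is_short(a) else a.value
    vb = b if is_short(b) else b.value
    total = va + vb
    # Assumption: results that fit are returned in short form.
    return total if SHORT_MIN <= total <= SHORT_MAX else Long(total)

def add(a, b):
    if is_short(a) and is_short(b):          # test on the tags
        total = a + b                        # the add instruction itself
        if SHORT_MIN <= total <= SHORT_MAX:  # test for overflow
            return total
    return rts_add(a, b)                     # untagged argument or overflow
```

Here add(2, 3) stays entirely on the fast path, while add(SHORT_MAX, 1) overflows and falls through to rts_add, which returns a Long.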

Emulation has limitations. Currently only (in)equality, comparison, addition and subtraction are emulated. Other operations such as multiplication and division involve calls to assembly code. This means that code that makes significant use of arithmetic incurs overheads even if it only ever uses small values. There is also a more subtle cost. Because emulating addition and subtraction could involve allocating memory on the heap for the long-format result, every addition and subtraction could result in a garbage-collection. This means that the registers, which could be holding intermediate results, have to be treated as the roots for the GC. That in turn means that they must always contain either valid pointers or tagged values. This sometimes involves adding extra instructions simply to ensure this is the case. It also means that whenever the code returns from the run-time system to ML all the registers have to be loaded with known values, adding to the cost of a RTS call.

The plan is to retain the existing arbitrary precision using the current tagged values but remove emulation. All arbitrary precision operations will involve an ML function call to the assembly code. This will compute the result directly if it is short and call into the run-time system if it is long. This is actually likely to be faster than the current scheme if the values are long. Some cases, for example equality between an arbitrary precision value and a short constant, can continue to be handled by the code-generator since they do not require emulation.

Fixed precision int will be based on the current short format, i.e. 31 bits on 32-bit machines, 63 bits on 64-bit machines. Operations on these can be handled directly by the code-generator.
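
To make the ranges concrete, the following Python sketch computes the bounds of the short format for a given word size and models a fixed precision addition. The function names are invented for the example. Raising an error on overflow is an assumption that follows the usual Standard ML behaviour for fixed precision int (the Overflow exception); Python's OverflowError stands in for it here.

```python
def fixed_range(word_bits):
    """Bounds of a signed (word_bits - 1)-bit integer: one bit goes to the tag."""
    bits = word_bits - 1
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

def fixed_add(a, b, word_bits=64):
    """Add two fixed precision values, failing instead of going long."""
    lo, hi = fixed_range(word_bits)
    total = a + b
    if not lo <= total <= hi:
        # Assumed behaviour, modelled on SML's Overflow exception.
        raise OverflowError("fixed precision int overflow")
    return total
```

On a 32-bit machine this gives a range of -1073741824 to 1073741823; on a 64-bit machine, -4611686018427387904 to 4611686018427387903.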

These changes are currently being tested in the FixedPrecisionInt branch. This requires at least one, preferably two, calls to "make compiler" to build a version of the compiler that generates fixed precision operations. For backwards compatibility it does not yet gain all the advantages of the change; in particular it still clears all the registers on return from the run-time system.


_______________________________________________
polyml mailing list
[email protected]
http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
