---
Regarding the specification of floating point numbers with radix ~= 10,
and in particular the question raised by James Foster (GemStone).
What follows summarizes the comments from James Foster, Nicolas Collier,
and Prof. Stef Ducasse, followed by some arguments (with a lot of hand
waving) in support of what Nicolas said at the outset:
this deserves a new syntax.
---
Question: How should it work?
The literal 16rFF is 255 (aSmallInteger). Pharo permits lowercase
hexadecimal digits, so the literal 16rff is also taken to be 255.
The literal 1.23e3 is 1230.0 (aSmallDouble in GemStone). Pharo also
permits floating point numbers to have a radix, so that, e.g., both
2r1.111e3 and 2r1111 denote fifteen (the former as a Float, the latter
as an integer).
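The arithmetic behind that fifteen can be checked exactly. The sketch below assumes, as the example implies, that the exponent counts powers of the radix, so 2r1.111e3 is binary 1.111 (= 1.875) scaled by 2^3. `radix_value` is a hypothetical helper for illustration, not a Pharo method.

```python
from fractions import Fraction


def radix_value(radix: int, integer_digits: str, fraction_digits: str = "",
                exponent: int = 0) -> Fraction:
    """Exact value of <radix>r<integer_digits>.<fraction_digits>e<exponent>.

    Assumption: the exponent scales by powers of the radix (Pharo-style).
    """
    # Read all digits as one integer mantissa, then shift the radix point:
    # each fraction digit divides by the radix, each exponent step multiplies.
    mantissa = int(integer_digits + fraction_digits, radix)
    scale = exponent - len(fraction_digits)
    return mantissa * Fraction(radix) ** scale


print(radix_value(2, "1", "111", 3))  # 15   (2r1.111e3)
print(radix_value(2, "1111"))         # 15   (2r1111)
print(radix_value(10, "1", "23", 3))  # 1230 (1.23e3)
```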
This makes certain grammars for numbers ambiguous, because upon
encountering either an $E or an $e during the parse of a number, we find
two possible interpretations:
Is this a hexadecimal digit or an exponent marker?
James identified four possible solutions:
1) Distinguish by letter case: uppercase $E is a hexadecimal digit,
lowercase $e is a marker signifying an exponent.
2) Allow exponents on base ten numbers only.
3) Distinguish by range:
radix >= 15 ifTrue: [$e is a hexadecimal digit] ifFalse: [$e is a
marker signifying an exponent]
4) Develop a new syntax for floats that does not use either the letter
$e or the letter $E to mark an exponent.
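To see how solutions 1) through 3) disagree, here is a minimal sketch of a radix-literal reader with a selectable policy. Several simplifications are assumptions of this sketch, not statements about any dialect: only integer mantissas are handled, digits are otherwise case-insensitive, and the exponent always scales by powers of the radix (the Pharo reading; other dialects may differ). `split_exponent` and `parse` are hypothetical helpers.

```python
from fractions import Fraction


def split_exponent(radix: int, body: str, policy: str):
    """Split the text after 'r' into (digits, exponent_text or None).

    policy 'case'   -> solution 1: only lowercase 'e' marks an exponent
    policy 'base10' -> solution 2: 'e'/'E' marks an exponent only in radix 10
    policy 'range'  -> solution 3: 'e'/'E' is a digit whenever radix >= 15
    """
    if policy == "case":
        markers = "e"
    elif policy == "base10":
        markers = "eE" if radix == 10 else ""
    elif policy == "range":
        markers = "" if radix >= 15 else "eE"
    else:
        raise ValueError(f"unknown policy: {policy}")
    for i, ch in enumerate(body):
        if i > 0 and ch in markers:  # a marker needs at least one digit before it
            return body[:i], body[i + 1:]
    return body, None


def parse(literal: str, policy: str) -> Fraction:
    """Exact value of an integer-mantissa literal such as '16r1e3'."""
    radix_text, body = literal.split("r", 1)  # the radix itself is decimal
    radix = int(radix_text)
    digits, exponent = split_exponent(radix, body, policy)
    value = Fraction(int(digits, radix))      # raises ValueError on a bad digit
    if exponent is not None:
        value *= Fraction(radix) ** int(exponent)  # assumed: powers of the radix
    return value


for policy in ("case", "base10", "range"):
    print(policy, parse("16r1e3", policy), parse("16r1E3", policy))
# case 4096 483
# base10 483 483
# range 483 483
```

Only the "case" policy gives 16r1e3 and 16r1E3 different values, and under "base10" a literal such as 2r1e3 is rejected outright, because 'e' is not a binary digit.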
Many implementations of Smalltalk use solution #1, whereas Pharo
currently uses solution #3.
The result is that any number expressed in other dialects will port to
Pharo without issue, while certain expressions for numbers that are
recognized in Pharo become ambiguous when ported to other dialects.
The most practical “fix” is for Pharo to adopt the more ‘popular’ solution.
<tl;dr><outcry> Even so, I submit that this is not the right thing to
do. But only because it is a hack, on top of a hack, on top of a design
error that went unnoticed for far too long, and really ought to be
corrected, for a number of reasons, viz.:
1) The root cause of the problem is an ambiguous grammar.
2) This ambiguity is unique to Smalltalk. It does not occur in any other
language, as far as I know.
3) The source of the ambiguity is the design decision which introduced a
consistent syntax for expressing numbers in different bases by directly
specifying the desired radix, instead of choosing from the very limited
sets of special cases provided by other languages.
One is usually limited to binary, octal, decimal, and hexadecimal, with
a unique syntactic form required for expression in each base. We get
B’01’ for binary, \001 for octal, #01 or %01 for hex, and the unadorned
1, left for (the most privileged) decimal form.
Introducing a consistent form for specifying alternate bases was itself
a great design decision.
At the same time, however, a change was introduced which impacted some
very long standing properties of numeric representations, and the effect
of that change was perhaps not fully considered.
As we go about the task of correcting such a latent error, we should
take enough time to more fully consider the particulars that brought us
here.
4) Smalltalk is a spectacularly consistent design and a spectacularly
consistent language to work in. Increasing the consistency of such a
language is arguably the right thing to do at every opportunity. This is
not always the most popular thing to do, but it is usually the most
honorable. And the most useful, in the long term. Practically speaking.
5) Of the four solutions (or cases), we can arguably eliminate three.
In other domains and languages, numerical values are specified with
digits only. In such contexts, using a letter as a syntactic marker is
reasonable.
Once we adopt the specifiable-radix form (radix)r(digits), in which
numerical values are expressed using digits AND letters, it becomes far
less reasonable to use letters as a marker.
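To make this concrete: assuming the usual convention that digits past 9 are the letters a, b, c, and so on (so radices run up to 36), each increase in radix removes one more letter from the pool that could serve as a marker. `free_marker_letters` is a hypothetical helper for illustration.

```python
import string


def free_marker_letters(radix: int) -> str:
    """Lowercase letters that are NOT digits in the given radix (2..36),
    i.e. the letters still available to act as syntactic markers."""
    if not 2 <= radix <= 36:
        raise ValueError("radix must be between 2 and 36")
    used = max(0, radix - 10)  # digit values 10, 11, ... consume a, b, ...
    return string.ascii_lowercase[used:]


for radix in (10, 16, 36):
    print(radix, free_marker_letters(radix) or "(none)")
# 10 abcdefghijklmnopqrstuvwxyz
# 16 ghijklmnopqrstuvwxyz
# 36 (none)
```

The letter e drops out of the pool at radix 15, which is exactly the threshold solution (3) uses.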
Case (1):
Differentiation based on the case of letters is fine where the use of
letters is pervasive and capitalization is itself generically
meaningful, e.g. certain shorthand notations used in regular expressions
(%h signifying a match of any lowercase hexadecimal, with %H signifying
a match of any uppercase hexadecimal).
By contrast, using capitalization to distinguish ‘a value’ from ‘a
syntactic marker’ is a very poor use of character classes, of pixels,
and of synaptic gaps, because the association is made without mnemonic
support of any kind. Such ‘rules’ require rote memorization, i.e. perfect
recall of an arbitrary fact. The hidden assumption, that any ‘skill’
involved is both ubiquitous and evenly distributed, is, alas, unfounded.
Case (2):
Disallowing exponents for all bases other than 10 is a) inconsistent,
b) contrary to the point of consistently specifying the desired radix,
and c) lazy.
Case (3):
Differentiating meaning based on a particular range of values makes for a
great explanation of the ‘discovered’ effect, but is somewhat
frightening to consider using *on purpose*. If we were to adopt anything
of this ilk, a better crossover of ranges would be:
radix <= 10: Values are confined to the set of digits (ASCII 16r30-16r39),
and $e and $E are exponent markers. See also $s, $d, and $q.
radix >= 11: Letters are required for use as extended values, as
determined by the radix. We cannot imagine using non-letter characters
for this case. Therefore, no exponents for bases above 10.
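The crossover rule just described can be sketched as a per-character classifier. This is a minimal illustration under stated assumptions: `classify` and `MARKERS` are hypothetical names, only lowercase $s, $d, $q are treated as the "other" markers, letters map to digit values 10 to 35 case-insensitively, and the radix point is out of scope.

```python
# Letters that act as markers when radix <= 10, per the rule above.
# Treating only lowercase s, d, q as markers is an assumption of this sketch.
MARKERS = {"e": "exponent", "E": "exponent",
           "s": "marker", "d": "marker", "q": "marker"}


def classify(radix: int, ch: str) -> str:
    """Role of `ch` in the body of a <radix>r... literal under the crossover
    rule: radix <= 10 -> letters are markers; radix >= 11 -> letters are
    digits and no exponent is allowed."""
    if not 2 <= radix <= 36:
        raise ValueError("radix must be between 2 and 36")
    if ch.isascii() and ch.isalnum():
        value = int(ch, 36)  # '0'-'9' -> 0..9, 'a'/'A'..'z'/'Z' -> 10..35
        if ch.isdigit() or radix >= 11:
            if value >= radix:
                raise ValueError(f"{ch!r} is not a digit in radix {radix}")
            return "digit"
    if radix <= 10 and ch in MARKERS:
        return MARKERS[ch]
    raise ValueError(f"{ch!r} is not allowed in a radix-{radix} literal")


for radix, ch in [(10, "e"), (2, "E"), (16, "e"), (16, "E"), (10, "q")]:
    print(radix, ch, classify(radix, ch))
# 10 e exponent
# 2 E exponent
# 16 e digit
# 16 E digit
# 10 q marker
```

Under this rule $e is never ambiguous: it is a marker below radix 11 and either a digit or an error from radix 11 up, so 16r1e3 can only be the integer 483.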
This leaves us with solution (4): create a new syntax for marking the
exponent, because the other solutions are hacks. Practical, sure. But
hacks, nonetheless. Abominable.
6) As Nicolas pointed out, this issue deserves a new syntax.
The moment we adopted the specifiable-radix solution, we needed to also
abandon the use of the letter $e for marking exponents.
Now is our chance to make it right.
</tl;dr></outcry>
Ideas?
-Jim Sawyer