Jim,

You’ve done an excellent job of summarizing the issues and providing strong arguments.
For my part, while consistency is important, simplicity is also important. To that end, a new syntax would need to bring a lot of value to justify the additional cognitive load. So unless you can get `^` to act as the exponent prefix, I’d be skeptical of a new syntax. For me, the idea that an Integer (and not a Float) can have a radix is sufficient and consistent.

James

> On Sep 9, 2019, at 3:13 PM, Jim Sawyer <[email protected]> wrote:
>
> ---
> Regarding the specification of floating point numbers with radix ~= 10,
> and in particular the question raised by James Foster (GemStone).
>
> Summarizing the comments from James Foster, Nicolas Collier, Prof Stef Ducasse;
> followed by some arguments with a lot of hand waving,
> in order to agree with and support what Nicolas said at the outset:
> this deserves a new syntax.
> ---
> Question: How should it work?
>
> The literal 16rFF is 255 (aSmallInteger).
> Pharo permits lowercase hexadecimal digits,
> so the literal 16rff is also taken to be 255.
> The literal 1.23e3 is 1230.0 (aSmallDouble in GemStone).
> Pharo also permits floating point numbers to have a radix,
> so that e.g. both 2r1.111e3 and 2r1111 are taken to be 15.
> This makes certain grammars for numbers ambiguous, because
> upon encountering either an $E or an $e during the parse
> of a number, we find two possible interpretations:
>     Is this a hexadecimal digit
>     or an exponent marker?
>
> James identified four possible solutions:
>
>   1) Distinguish by letter case
>        uppercase $E is a hexadecimal digit
>        lowercase $e is a marker signifying an exponent.
>   2) Allow exponents on base ten numbers only
>   3) Distinguish by range
>        radix >= 15
>            ifTrue: [$e is the hexadecimal digit]
>            ifFalse: [$e is a marker signifying exponent]
>   4) Develop a new syntax for floats that does not use
>        either the letter $e or the letter $E to mark an exponent
>
> Many implementations of Smalltalk use solution #1,
> whereas Pharo currently uses solution #3.
>
> The result is that any number expressed in other dialects
> will port to Pharo without issue, while certain expressions for
> numbers that are recognized in Pharo become ambiguous
> when ported to other dialects.
> The most practical “fix” is for Pharo to adopt
> the more ‘popular’ solution.
>
> <tl;dr><outcry>
>
> Even so, I submit that this is not the right thing to do.
> But only because it is a hack, on top of a hack, on top of a
> design error that went unnoticed for far too long, and really
> ought to be corrected, for a number of reasons, viz.:
>
> 1) The root cause of the problem is an ambiguous grammar.
>
> 2) This ambiguity is unique to Smalltalk.
>    It does not occur in any other language, as far as I know.
>
> 3) The source of the ambiguity is the design decision which
>    introduced a consistent syntax for expressing numbers in
>    different bases by directly specifying the desired radix,
>    instead of choosing from the very limited sets of special cases
>    provided by other languages.
>    One is usually limited to binary, octal, decimal, and hexadecimal,
>    with a unique syntactic form required for expression in each base.
>    We get B’01’ for binary, \001 for octal, #01 or %01 for hex,
>    and the unadorned 1, left for (the most privileged) decimal form.
>    Introducing a consistent form for specifying alternate bases
>    was itself a great design decision.
>    At the same time, however, a change was introduced which impacted
>    some very long standing properties of numeric representations, and
>    the effect of that change was perhaps not fully considered.
>    As we go about the task of correcting such a latent error, we should
>    take enough time to more fully consider the particulars that brought us here.
>
> 4) Smalltalk is a spectacularly consistent design and a spectacularly consistent
>    language to work in. Increasing the consistency of such a language is arguably
>    the right thing to do at every opportunity.
>    This is not always the most popular thing
>    to do, but it is usually the most honorable.
>    And the most useful, in the long term.
>    Practically speaking.
>
> 5) Of the four solutions (or cases), we can arguably eliminate three.
>
>      1) Distinguish by letter case
>           uppercase $E is a hexadecimal digit
>           lowercase $e is a marker signifying an exponent.
>      2) Allow exponents on base ten numbers only
>      3) Distinguish by range
>           radix >= 15
>               ifTrue: [$e is the hexadecimal digit]
>               ifFalse: [$e is a marker signifying exponent]
>      4) Develop a new syntax for floats that does not use
>           either the letter $e or the letter $E to mark an exponent
>
>    In other domains and languages, numerical values are specified
>    with digits only.
>    In such contexts, using a letter as a syntactic marker is reasonable.
>    Once we adopt the specifiable-radix form
>        (radix)r(digits)
>    in which numerical values are expressed using digits AND letters,
>    it becomes far less reasonable to use letters as a marker.
>
>    Case (1):
>    Differentiation based on the case of letters is fine where the use of
>    letters is pervasive and capitalization is itself generically meaningful,
>    e.g. certain shorthand notations used in regular expressions
>    (%h signifying a match of any lowercase hexadecimal, with
>    %H signifying a match of any uppercase hexadecimal).
>    Whereas using capitalization to distinguish ‘a value’ from ‘a syntactic marker’
>    is a very poor use of character classes, of pixels, and of synaptic gaps, because
>    the association is made without mnemonic support of any kind. Such ‘rules’
>    require rote memorization, i.e. perfect match of an arbitrary fact. The hidden
>    assumption, that any ‘skill’ involved is both ubiquitous and evenly distributed,
>    is, alas, unfounded.
>
>    Case (2):
>    Disallowing exponents for all bases other than 10 is
>      a) inconsistent
>      b) contrary to the point of consistently specifying the desired radix.
>      c) lazy
>
>    Case (3):
>    Differentiating meaning based on a particular range of values
>    makes for a great explanation of the ‘discovered’ effect, but is somewhat
>    frightening to consider using *on purpose*. If we were to adopt anything of this ilk,
>    a better crossover of ranges would be
>      radix <= 10   Values are confined to the set of digits (ascii 16r30-16r39),
>                    and $e and $E are exponent markers. See also $s, $d, and $q.
>      radix >= 11   Letters are required for use as extended values
>                    as determined by the radix.
>                    We cannot imagine using non-letter characters for this case.
>                    Therefore, no exponents for bases above 10.
>
>    This leaves us with solution (4), create a new syntax for marking the exponent.
>    Because the other solutions are hacks. Practical, sure. But hacks, nonetheless.
>    Abominable.
>
> 7) As Nicolas pointed out, this issue deserves a new syntax.
>    The moment we adopted the specifiable-radix solution,
>    we needed to also abandon the use of the letter $e for marking exponents.
>    Now is our chance to make it right.
> </tl;dr></outcry>
>
> Ideas?
>
> -Jim Sawyer
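
As a rough illustration of the ambiguity discussed above, here is a minimal Python sketch of solutions #1 and #3 applied to the same literal. It is not taken from any Smalltalk implementation: the names `digit_value`, `split_exponent` and `parse` are hypothetical, and it assumes that the exponent is written in decimal and scales the mantissa by a power of the radix (which is what makes 2r1.111e3 come out as 15).

```python
import string

# Digit characters in ascending value: '0'-'9' followed by 'A'-'Z'.
DIGITS = string.digits + string.ascii_uppercase


def digit_value(ch):
    """Return the value of ch as a digit (case-insensitive), or -1."""
    return DIGITS.find(ch.upper())


def split_exponent(body, radix, rule):
    """Split the digits of a literal into (mantissa, exponent_or_None).

    rule='case'  sketches solution #1: 'e' marks an exponent, 'E' is a digit.
    rule='range' sketches solution #3: 'e'/'E' is a digit when radix >= 15,
                 otherwise it marks an exponent.
    """
    for i, ch in enumerate(body):
        if ch not in "eE":
            continue
        if rule == "case":
            is_marker = ch == "e"
        elif rule == "range":
            is_marker = radix < 15
        else:
            raise ValueError(f"unknown rule {rule!r}")
        if is_marker:
            return body[:i], body[i + 1:]
    return body, None


def parse(literal, rule):
    """Evaluate a literal such as '16r1e2' or '1.23e3' under one rule."""
    radix_text, sep, body = literal.partition("r")
    if sep:
        radix = int(radix_text)
    else:
        # No radix prefix: treat the whole literal as base ten.
        radix, body = 10, literal
    mantissa, exponent = split_exponent(body, radix, rule)
    whole, _, frac = mantissa.partition(".")
    value = 0
    for ch in whole + frac:
        d = digit_value(ch)
        if not 0 <= d < radix:
            raise ValueError(f"{ch!r} is not a digit in radix {radix}")
        value = value * radix + d
    # Assumption: the exponent is decimal and counts powers of the radix.
    power = (int(exponent) if exponent else 0) - len(frac)
    return value * radix ** power
```

With this sketch, `parse('16r1e2', 'case')` gives 256 (1 times 16 squared), while `parse('16r1e2', 'range')` gives 482 (the three hex digits 1, E, 2), so the same text denotes two different numbers depending on which rule the dialect follows.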
