Jim,

You’ve done an excellent job of summarizing the issues and providing strong 
arguments.

For my part, while consistency is important, simplicity is also important. To 
that end, a new syntax would need to bring a lot of value to justify the 
additional cognitive load. So unless you can get `^` to act as the exponent 
prefix, I’d be skeptical of a new syntax. For me, the idea that an Integer (and 
not a Float) can have a radix is sufficient and consistent.
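
To make the ambiguity concrete, here is a minimal Python sketch (not Pharo or GemStone code) of how one literal reads under solutions #1 and #3 from the quoted message below. The names `mantissa` and `parse` and the rule labels `"case"` and `"range"` are hypothetical. The sketch also assumes the exponent is written in decimal and scales by a power of the radix, which matches the 2r1.111e3 example but is otherwise an assumption.

```python
from fractions import Fraction
import string

DIGITS = string.digits + string.ascii_uppercase  # digit values 0..35


def mantissa(text, radix):
    """Exact value of a digit string with an optional '.', in the given radix."""
    whole, _, frac = text.partition(".")
    value = 0
    for ch in whole + frac:
        d = DIGITS.index(ch.upper())  # raises ValueError for non-digit characters
        if d >= radix:
            raise ValueError(f"{ch!r} is not a digit in radix {radix}")
        value = value * radix + d
    return Fraction(value, radix ** len(frac))


def parse(literal, rule):
    """Read '<radix>r<body>' under one of two hypothetical rule labels."""
    radix_text, _, body = literal.partition("r")
    radix = int(radix_text)
    if rule == "case":
        # Solution 1: lowercase 'e' always marks an exponent; 'E' is a digit.
        marker = body.find("e")
    elif rule == "range":
        # Solution 3: 'e' is a digit once the radix is 15 or more.
        marker = -1 if radix >= 15 else body.find("e")
    else:
        raise ValueError(f"unknown rule {rule!r}")
    if marker < 0:
        return mantissa(body, radix)
    exponent = int(body[marker + 1:])  # assumed to be written in decimal
    return mantissa(body[:marker], radix) * Fraction(radix) ** exponent


print(parse("16r1e2", "case"))      # 256: 1 times 16 squared
print(parse("16r1e2", "range"))     # 482: the hexadecimal digits 1, e, 2
print(parse("2r1.111e3", "range"))  # 15
```

The same characters, 16r1e2, give two different numbers depending on the rule. If only Integers carry a radix, every letter after the `r` is a digit and the question never comes up.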

James

> On Sep 9, 2019, at 3:13 PM, Jim Sawyer <[email protected]> wrote:
> 
> 
> ---
> Regarding the specification of floating point numbers with radix ~= 10,
> and in particular the question raised by James Foster (GemStone).
> 
> Summarizing the comments from James Foster, Nicolas Collier, and Prof. Stef Ducasse,
> followed by some arguments with a lot of hand waving,
> in order to agree with and support what Nicolas said at the outset:
> this deserves a new syntax.
> ---
> Question: How should it work?
>  
> The literal 16rFF is 255 (aSmallInteger).
> Pharo permits lowercase hexadecimal digits,
> so the literal 16rff is also taken to be 255.
> The literal 1.23e3 is 1230.0 (aSmallDouble in GemStone).
> Pharo also permits floating point numbers to have a radix,
> so that e.g. both 2r1.111e3 and 2r1111 are taken to be 15.
> This makes certain grammars for numbers ambiguous, because
> upon encountering either an $E or an $e during the parse
> of a number, we find two possible interpretations:
>     Is this a hexadecimal digit or an exponent marker?
> 
> James identified four possible solutions:
> 
>     1) Distinguish by letter case
>            uppercase $E is a hexadecimal digit
>            lowercase $e is a marker signifying an exponent.
>     2) Allow exponents on base ten numbers only
>     3) Distinguish by range
>            radix >= 15
>                ifTrue: [$e is the hexadecimal digit]
>                ifFalse: [$e is a marker signifying exponent]
>     4) Develop a new syntax for floats that does not use
>        either the letter $e or the letter $E to mark an exponent
> Many implementations of Smalltalk use solution #1,
> whereas Pharo currently uses solution #3.
> 
> The result is that any number expressed in other dialects
> will port to Pharo without issue, while certain expressions for
> numbers that are recognized in Pharo become ambiguous
> when ported to other dialects.
> The most practical “fix” is for Pharo to adopt
> the more ‘popular’ solution.
> 
> <tl;dr><outcry>
> 
> Even so, I submit that this is not the right thing to do.
> But only because it is a hack, on top of a hack, on top of a
> design error that went unnoticed for far too long, and really
> ought to be corrected, for a number of reasons, viz.:
> 1) The root cause of the problem is an ambiguous grammar.
> 2) This ambiguity is unique to Smalltalk.
>     It does not occur in any other language, as far as I know.
> 3) The source of the ambiguity is the design decision which
>    introduced a consistent syntax for expressing numbers in
>    different bases by directly specifying the desired radix,
>    instead of choosing from the very limited sets of special cases
>    provided by other languages.
>    One is usually limited to binary, octal, decimal, and hexadecimal,
>    with a unique syntactic form required for expression in each base.
>    We get B’01’ for binary, \001 for octal, #01 or %01 for hex,
>    and the unadorned 1, left for (the most privileged) decimal form.
>    Introducing a consistent form for specifying alternate bases
>    was itself a great design decision.
>    At the same time, however, a change was introduced which impacted
>    some very long standing properties of numeric representations, and
>    the effect of that change was perhaps not fully considered.
>    As we go about the task of correcting such a latent error, we should
>    take enough time to more fully consider the particulars that brought us here.
> 
> 4) Smalltalk is a spectacularly consistent design and a spectacularly consistent
>    language to work in.  Increasing the consistency of such a language is arguably
>    the right thing to do at every opportunity.  This is not always the most popular
>    thing to do, but it is usually the most honorable.
>    And the most useful, in the long term.
>    Practically speaking.
> 
> 5) Of the four solutions (or cases), we can arguably eliminate three.
> 
>     1) Distinguish by letter case
>            uppercase $E is a hexadecimal digit
>            lowercase $e is a marker signifying an exponent.
>     2) Allow exponents on base ten numbers only
>     3) Distinguish by range
>            radix >= 15
>                ifTrue: [$e is the hexadecimal digit]
>                ifFalse: [$e is a marker signifying exponent]
>     4) Develop a new syntax for floats that does not use
>        either the letter $e or the letter $E to mark an exponent
> 
>     In other domains and languages, numerical values are specified with digits only.
>     In such contexts, using a letter as a syntactic marker is reasonable.
>     Once we adopt the specifiable-radix form
>             (radix)r(rigits)
>     in which numerical values are expressed using digits AND letters,
>     it becomes far less reasonable to use letters as a marker.
> 
> Case (1):
> 
>       Differentiation based on the case of letters is fine where the use of
>       letters is pervasive and capitalization is itself generically meaningful,
>       e.g. certain shorthand notations used in regular expressions
>                (%h signifying a match of any lowercase hexadecimal, with
>                 %H signifying a match of any uppercase hexadecimal).
>       Whereas using capitalization to distinguish ‘a value’ from ‘a syntactic marker’
>       is a very poor use of character classes, of pixels, and of synaptic gaps, because
>       the association is made without mnemonic support of any kind. Such ‘rules’
>       require rote memorization, i.e. perfect match of an arbitrary fact.
>       The hidden assumption, that any ‘skill’ involved is both ubiquitous and evenly
>       distributed, is, alas, unfounded.
> 
> Case (2):
>       Disallowing exponents for all bases other than 10 is
>               a) inconsistent
>               b) contrary to the point of consistently specifying the desired radix
>               c) lazy
> 
> Case (3):
>    Differentiating meaning based on a particular range of values
>    makes for a great explanation of the ‘discovered’ effect, but is somewhat
>    frightening to consider using *on purpose*.  If we were to adopt anything of this ilk,
>    a better crossover of ranges would be
>         radix <= 10   Values are confined to the set of digits (ASCII 16r30-16r39),
>                       and $e and $E are exponent markers.  See also $s, $d, and $q.
>         radix >= 11   Letters are required for use as extended values
>                       as determined by the radix.
>                       We cannot imagine using non-letter characters for this case.
>                       Therefore, no exponents for bases above 10.
> 
> This leaves us with solution (4), create a new syntax for marking the exponent.
> Because the other solutions are hacks.  Practical, sure.  But hacks, nonetheless.
> Abominable.
>  
> 6) As Nicolas pointed out, this issue deserves a new syntax.
>       The moment we adopted the specifiable-radix solution,
>       we needed to also abandon the use of the letter $e for marking exponents.
>       Now is our chance to make it right.
>       </tl;dr></outcry>
> 
>     Ideas?
> 
> 
> -Jim Sawyer