Re: Raw string literals and Unicode escapes

2018-02-27 Thread Remi Forax
> From: "Guy Steele" > To: "John Rose" > Cc: "amber-spec-experts" > Sent: Tuesday, 27 February 2018 22:12:14 > Subject: Re: Raw string literals and Unicode escapes >> On Feb 27, 2018, at 4:20 PM, John Rose < [ mailto:john.r.r...@oracle.com | &g

Re: Raw string literals and Unicode escapes

2018-02-27 Thread Guy Steele
> On Feb 27, 2018, at 4:33 PM, Brian Goetz wrote: > > >> Which leads us to the following theoretical result: the mechanism does >> not require you to grub around in the interior of the string AT ALL if you >> don’t want to. All you need to know is the length. If the length of the >> r

Re: Raw string literals and Unicode escapes

2018-02-27 Thread Brian Goetz
Which leads us to the following theoretical result: the mechanism does not require you to grub around in the interior of the string AT ALL if you don’t want to.  All you need to know is the length.  If the length of the raw string is n, and it does not begin or end with ` (a necessary ch
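One way to read the length argument above, sketched in Java: a run of backticks inside a body of length n can be at most n characters long, so a delimiter of n + 1 backticks cannot occur inside it and no scan of the body is needed. The class and method names below are hypothetical and exist only for illustration, and the n + 1 bound is an assumption about where the truncated message was heading, not a quotation of it.

```java
import java.util.Arrays;

// Sketch only: RawDelimiter is a hypothetical helper, not part of any JDK API
// and not taken from the raw string literal proposal itself.
final class RawDelimiter {
    private RawDelimiter() {}

    /** Builds a string of n backticks. */
    static String ticks(int n) {
        char[] run = new char[n];
        Arrays.fill(run, '`');
        return new String(run);
    }

    /** Length-only choice: n + 1 backticks can never occur inside a body of length n. */
    static String byLength(String body) {
        return ticks(body.length() + 1);
    }

    /** Tighter choice: one more backtick than the longest run found by scanning the body. */
    static String byLongestRun(String body) {
        int longest = 0;
        int current = 0;
        for (int i = 0; i < body.length(); i++) {
            current = (body.charAt(i) == '`') ? current + 1 : 0;
            longest = Math.max(longest, current);
        }
        return ticks(longest + 1);
    }
}
```

The length-only variant trades a much longer delimiter for never having to look inside the pasted text; the scanning variant gives the shortest safe delimiter.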

Re: Raw string literals and Unicode escapes

2018-02-27 Thread Guy Steele
> On Feb 27, 2018, at 4:20 PM, John Rose wrote: > > On Feb 27, 2018, at 11:48 AM, Brian Goetz > wrote: >> >>> >>> So after this length instead of having the probability to see a character >>> to be virtually 1, you have the opposite effect, because programming

Re: Raw string literals and Unicode escapes

2018-02-27 Thread John Rose
On Feb 27, 2018, at 11:48 AM, Brian Goetz wrote: > >> >> So after this length instead of having the probability to see a character to >> be virtually 1, you have the opposite effect, because programming languages >> (a human construct) are very regular in the set of chars they use. So you do

Re: Raw string literals and Unicode escapes

2018-02-27 Thread Guy Steele
> On Feb 27, 2018, at 2:48 PM, Brian Goetz wrote: > > >> So after this length instead of having the probability to see a character to >> be virtually 1, you have the opposite effect, because programming languages >> (a human construct) are very regular in the set of chars they use. So you do

Re: Raw string literals and Unicode escapes

2018-02-27 Thread Brian Goetz
So after this length instead of having the probability to see a character to be virtually 1, you have the opposite effect, because programming languages (a human construct) are very regular in the set of chars they use. So you do not need a repetition of a character to avoid a statistical

Re: Raw string literals and Unicode escapes

2018-02-27 Thread Maurizio Cimadamore
On 27/02/18 08:16, fo...@univ-mlv.fr wrote: Hi John, see below. - Original Message - From: "John Rose" To: "Remi Forax" Cc: "amber-spec-experts" Sent: Monday, 26 February 2018 21:17:13 Subject: Re: Raw string literals and Unicode escapes On Feb 26, 2018, at

Re: Raw string literals and Unicode escapes

2018-02-27 Thread forax
Hi John, see below. - Original Message - > From: "John Rose" > To: "Remi Forax" > Cc: "amber-spec-experts" > Sent: Monday, 26 February 2018 21:17:13 > Subject: Re: Raw string literals and Unicode escapes > On Feb 26, 2018, at 10:43 AM, Alex B

Re: Raw string literals and Unicode escapes

2018-02-26 Thread Maurizio Cimadamore
I stand corrected - repeated underscores are allowed - but Josh's example reminded me of the state of affairs with raw strings. Maurizio On 26/02/18 22:57, Maurizio Cimadamore wrote: At least there were other cases where we found a different trade-off between expressiveness and practicality - s

Re: Raw string literals and Unicode escapes

2018-02-26 Thread Maurizio Cimadamore
Of course - delimiters are not part of the string length - I see now why you can have (in theory) an unbounded prefix/suffix. Personally, I find the argument - "because you can have unlimited-length identifiers" - not a great fit. From a lexer writer's perspective, I can see that it is used as a candida

Re: Raw string literals and Unicode escapes

2018-02-26 Thread John Rose
On Feb 26, 2018, at 1:29 PM, Maurizio Cimadamore wrote: > > On 26/02/18 20:17, John Rose wrote: >> Any *finite choice* of end-quotes has the same problem, with >> a non-zero probability that decreases (but does not vanish) >> with the number of available end-quotes. The only way to >> break out

Re: Raw string literals and Unicode escapes

2018-02-26 Thread Jim Laskey
Why introduce an artificial limit? Identifiers don’t have a limit. 3.8. Identifiers An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter. — Jim > On Feb 26, 2018, at 5:29 PM, Maurizio Cimadamore > wrote: > > > > On 26/02/18

Re: Raw string literals and Unicode escapes

2018-02-26 Thread Maurizio Cimadamore
On 26/02/18 20:17, John Rose wrote: Any *finite choice* of end-quotes has the same problem, with a non-zero probability that decreases (but does not vanish) with the number of available end-quotes. The only way to break out of the box is to allow the user an unlimited range of successively "st

Re: Raw string literals and Unicode escapes

2018-02-26 Thread John Rose
On Feb 26, 2018, at 10:43 AM, Alex Buckley wrote: > > On 2/25/2018 4:19 AM, Remi Forax wrote: >> I'm late in the game but why not use the same system as Perl, PHP, >> Ruby to solve the Lts [1], i.e. >> you have a sequence that says this is the start of a raw string (%Q, >> qq, m) then a charact

Re: Raw string literals and Unicode escapes

2018-02-26 Thread Alex Buckley
On 2/25/2018 4:19 AM, Remi Forax wrote: I'm late in the game but why not use the same system as Perl, PHP, Ruby to solve the Lts [1], i.e. you have a sequence that says this is the start of a raw string (%Q, qq, m) then a character (in a predefined list), the raw string and at the end of the ra

Re: Raw string literals and Unicode escapes

2018-02-25 Thread Remi Forax
> To: "Alex Buckley" > Cc: "amber-spec-experts" > Sent: Wednesday, 14 February 2018 23:46:54 > Subject: Re: Raw string literals and Unicode escapes > On Feb 14, 2018, at 2:42 PM, Alex Buckley < [ mailto:alex.buck...@oracle.com | > alex.buck...@oracle.com ]

Re: Raw string literals and Unicode escapes

2018-02-24 Thread Brian Goetz
And, I am very happy that, in lengthening the opening and closing quotes, we are making it possible to paste an arbitrary sequence of unicode without having to hunt around inside the sequence to find stuff that needs extra quoting, as is the case with today's strings. That's the high order bit h

Re: Raw string literals and Unicode escapes

2018-02-23 Thread John Rose
On Feb 23, 2018, at 1:00 PM, Brian Goetz wrote: > >> >> >> However, since the JEP's goal is to allow copy-paste of arbitrary text >> without interpretation, I think the RawSP trick of assigning meaning to >> whitespace is out of place. To most people, the raw string literal: >> >> ` and `

Re: Raw string literals and Unicode escapes

2018-02-23 Thread Guy Steele
+200. Or even String s = "`" + `a raw string` + "`"; It’s perfectly okay to use both kinds of string in one expression. > On Feb 23, 2018, at 4:00 PM, Brian Goetz wrote: > > > >> >> However, since the JEP's goal is to allow copy-paste of arbitrary text >> without interpretatio

Re: Raw string literals and Unicode escapes

2018-02-23 Thread Brian Goetz
However, since the JEP's goal is to allow copy-paste of arbitrary text without interpretation, I think the RawSP trick of assigning meaning to whitespace is out of place. To most people, the raw string literal:   ` and ` denotes a perfectly good five-character string that will probably be

Re: Raw string literals and Unicode escapes

2018-02-14 Thread John Rose
On Feb 14, 2018, at 2:42 PM, Alex Buckley wrote: > > Also, the inclusion of RawSP makes the lexing of RawStringLiteral ambiguous, > since RawStringBody allows opening and closing whitespace. No doubt this can > be fixed with rules involving "If the first character after RawSP is a > backtick .

Re: Raw string literals and Unicode escapes

2018-02-14 Thread Alex Buckley
On 2/14/2018 1:48 PM, John Rose wrote: P.S. I posted another version that takes a slightly different tack on the restriction of "cannot begin with a backquote". It basically lifts the whole design of Markdown code quotes. http://cr.openjdk.java.net/~jrose/jls/raw-string-pages-v5.pdf The inclus

Re: Raw string literals and Unicode escapes

2018-02-14 Thread John Rose
On Feb 14, 2018, at 1:43 PM, Alex Buckley wrote: > > Strictly speaking, the semantic rule is unnecessary because InputCharacter is > DEFINED to exclude the CR and LF line terminators! But the semantic rule > makes the intent very very clear. Writing rules in this form also prevents > the spec

Re: Raw string literals and Unicode escapes

2018-02-14 Thread Alex Buckley
On 2/14/2018 12:42 PM, John Rose wrote: On Feb 14, 2018, at 12:24 PM, Alex Buckley <alex.buck...@oracle.com> wrote: There is plenty of precedent for semantic rules In my draft version this is done with "where" clauses on the grammar rules: RawStringLiteral: RawQuote RawStringBody

Re: Raw string literals and Unicode escapes

2018-02-14 Thread John Rose
On Feb 14, 2018, at 12:24 PM, Alex Buckley wrote: > > There is plenty of precedent for semantic rules In my draft version this is done with "where" clauses on the grammar rules: > > RawStringLiteral: > > RawQuote RawStringBody RawQuote > where the two raw-quotes are constrained to be ide
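The constraint that the two raw-quotes match can be sketched as a small scanner in Java. This is an illustration under stated assumptions, not the grammar from the draft: it assumes a raw quote is a run of one or more backticks, that the literal ends at the first later run of exactly the same length, and that runs of any other length belong to the body. RawStringScanner, scanBody, and runLength are hypothetical names.

```java
// Sketch only: a hypothetical scanner, not code from javac or from the draft spec.
final class RawStringScanner {
    private RawStringScanner() {}

    /**
     * Returns the body of the raw string literal starting at index start,
     * or null if src has no backtick there or the literal is unterminated.
     */
    static String scanBody(String src, int start) {
        int open = runLength(src, start);      // length of the opening raw quote
        if (open == 0) {
            return null;
        }
        int bodyStart = start + open;
        int i = bodyStart;
        while (i < src.length()) {
            if (src.charAt(i) != '`') {
                i++;
                continue;
            }
            int run = runLength(src, i);
            if (run == open) {                 // closing raw quote: same length as the opener
                return src.substring(bodyStart, i);
            }
            i += run;                          // a run of any other length is body text
        }
        return null;                           // reached end of input without a closing quote
    }

    /** Counts consecutive backticks in s beginning at index from. */
    static int runLength(String s, int from) {
        int i = from;
        while (i < s.length() && s.charAt(i) == '`') {
            i++;
        }
        return i - from;
    }
}
```

Under these assumptions, the input ``a`b`` (two backticks on each side) yields the three-character body a`b, because the single backtick in the middle does not match the two-backtick opener.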

Re: Raw string literals and Unicode escapes

2018-02-14 Thread Alex Buckley
On 2/13/2018 2:19 PM, Jim Laskey wrote: 10a. String s = `abc`; 10b. String s = \u0060abc`; ... So, change the scanner to A) Peek back to make sure the first open backtick was exactly a backtick. B) Turn off Unicode escapes immediately so that only backtick characters can be part of the delimiter

Re: Raw string literals and Unicode escapes

2018-02-14 Thread Alex Buckley
On 2/13/2018 2:11 PM, John Rose wrote: On Feb 13, 2018, at 9:58 AM, Alex Buckley <alex.buck...@oracle.com> wrote: I suspect the trickiest part of specifying raw string literals will be the lexer's modal behavior for Unicode escapes. As such, I am going to put the behavior under the micro

Re: Raw string literals and Unicode escapes

2018-02-13 Thread John Rose
On Feb 13, 2018, at 2:19 PM, Jim Laskey wrote: > > So, change the scanner to > > A) Peek back to make sure the first open backtick was exactly a backtick. > B) Turn off Unicode escapes immediately so that only backtick characters can > be part of the delimiter. > C) Turn on Unicode escapes only

Re: Raw string literals and Unicode escapes

2018-02-13 Thread Jim Laskey
10a. String s = `abc`; 10b. String s = \u0060abc`; As it stands both are legal. This decision has been mostly taken away from us because the lookahead of the previous token has "consumed" the character. There is little hope of finding out which form the backtick was derived from. Not technically tru
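The 10a/10b pair has a counterpart in existing Java that can be compiled today. Unicode escapes are translated before the input is split into tokens, so a string delimiter written as an escape is indistinguishable from one typed directly. The sketch below uses ordinary double-quoted literals rather than the proposed backtick syntax, and UnicodeEscapeDemo is a hypothetical class name.

```java
// Sketch only: demonstrates escape translation with today's string literals,
// as an analogy for the backtick case discussed in the thread.
final class UnicodeEscapeDemo {
    private UnicodeEscapeDemo() {}

    // Both delimiters are written as the Unicode escape for U+0022 (the double quote).
    // The compiler translates the escapes before it forms tokens, so this is an
    // ordinary string literal.
    static final String ESCAPED = \u0022abc\u0022;

    // The same literal with the delimiters typed directly.
    static final String PLAIN = "abc";
}
```

Both fields hold the same three-character string, which is why a scanner that only sees translated characters cannot tell the two spellings of a delimiter apart.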

Re: Raw string literals and Unicode escapes

2018-02-13 Thread John Rose
On Feb 13, 2018, at 9:58 AM, Alex Buckley wrote: > > I suspect the trickiest part of specifying raw string literals will be the > lexer's modal behavior for Unicode escapes. As such, I am going to put the > behavior under the microscope. For an approach to this see: http://cr.openjdk.java.ne

Raw string literals and Unicode escapes

2018-02-13 Thread Alex Buckley
I suspect the trickiest part of specifying raw string literals will be the lexer's modal behavior for Unicode escapes. As such, I am going to put the behavior under the microscope. Here is what the JEP has to say: - Unicode escapes, in the form \u, are processed as part of character in