… response inline

> On May 2, 2016, at 2:23 PM, John Holdsworth <[email protected]> wrote:
> 
>> 
>>> I'm having trouble getting the `e` modifier to work as advertised, at least 
>>> for the sequence `\\`. For example, `print(e"\\\\")` prints two 
>>> backslashes, and `print(e"\\\")` seems to try to escape the string literal. 
>>> I'm currently envisioning `e` as disabling *all* backslash escapes, so 
>>> these behaviors wouldn't be appropriate. It also looks like interpolation 
>>> is still enabled in `e` strings.
>>> 
>>> Since other things like `print(e"\w+")` work just fine, I'm guessing this 
>>> is a bug in the proposal's sketches (not being clear enough about the 
>>> expected behavior), not your code.
>>> 
>>> I've written a gist with some tests to show how I expect things to work:
>>> 
>>>     https://gist.github.com/brentdax/be3c032bc7e0c101d7ba8b72cd1a692e
>> The problem here is that I’ve not implemented unescaped literals fully, as
>> it would require changes outside the lexer. This is because the string is
>> first lexed and tokenised by one piece of code, Lexer::lexStringLiteral,
>> but later on, in the code generation phase, the actual literal is generated
>> by a function Lexer::getEncodedStringSegment. This is passed the same
>> string from the source file but does not know what modifiers should be
>> applied. As a result, normal escapes are still processed. All the “e” flag
>> does is silence the error for invalid escapes during tokenising.
> 
> The lexer just lays ropes around certain areas to tell what's where.
> Sometimes this is not enough for extra semantics. This is the reason why I
> went down the path of a custom string_multiline_literal token. It looks like
> you might want to consider that path too. If you do, you might consider the
> merits of suggesting that half the work be put in place now, allowing both
> our experiments (and other, more sophisticated ones) to lean on it, as an
> alternative to just directly adding extra conditional code in the default
> lexer code.

Not sure what you mean here. It’s the modifiers that have the greater effect on
lexing, not whether a string is multi-line. IMO it’s probably best to avoid
creating a separate string_multiline_literal token, as that would require
visiting the grammar everywhere a string could occur. If you want to see what
I mean, I’ve committed a change which adds 3 extra bits to the Token structure
to carry the modifiers from the lexing stage through to code generation, so
non-escaping strings can finally be handled correctly.
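
As a rough illustration of the idea only (this is not the code in the pull
request), the sketch below shows a token carrying three modifier bits from a
lexing step to a later literal-building step. Every name in it (the flag
names, `Token` layout, `lexStringLiteralSketch`, `encodeSegmentSketch`) is a
hypothetical stand-in, and it ignores interpolation and all escapes other than
`\n`, `\t` and `\\`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical modifier flags; names and bit assignments are assumptions.
enum StringModifier : std::uint8_t {
    ModNone      = 0,
    ModNoEscapes = 1 << 0, // e"...": every backslash is kept literally
    ModRegexLike = 1 << 1, // r"...": unknown escapes are kept verbatim
    ModReserved  = 1 << 2  // third bit, unused in this sketch
};

// Simplified stand-in for a lexer token.
struct Token {
    std::string text;       // raw source characters between the quotes
    unsigned modifiers : 3; // carried from lexing to literal generation
};

// Lexing stage: remember which prefix modifier was seen.
Token lexStringLiteralSketch(char prefix, const std::string &body) {
    Token tok;
    tok.text = body;
    tok.modifiers = ModNone;
    if (prefix == 'e') tok.modifiers = ModNoEscapes;
    else if (prefix == 'r') tok.modifiers = ModRegexLike;
    return tok;
}

// Later stage: build the literal's value, consulting the modifier bits
// instead of re-deriving them from the source text.
std::string encodeSegmentSketch(const Token &tok) {
    if (tok.modifiers & ModNoEscapes)
        return tok.text; // no escape processing at all

    std::string out;
    for (std::size_t i = 0; i < tok.text.size(); ++i) {
        char c = tok.text[i];
        if (c != '\\' || i + 1 == tok.text.size()) {
            out += c;
            continue;
        }
        char next = tok.text[++i];
        switch (next) {
        case 'n':  out += '\n'; break;
        case 't':  out += '\t'; break;
        case '\\': out += '\\'; break;
        default:
            // Unknown escape: with the r modifier keep the backslash.
            // Without it this sketch just drops the backslash, where a
            // real compiler would report an error.
            if (tok.modifiers & ModRegexLike) out += '\\';
            out += next;
        }
    }
    return out;
}
```

With this shape, the literal-building step never has to guess: an `e` token
comes back byte-for-byte unchanged, while an `r` token keeps `\w` and `\d` but
still turns `\n` into a newline.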

https://github.com/apple/swift/pull/2275
new toolchain: http://johnholdsworth.com/swift-LOCAL-2016-05-04-a-osx.tar.gz

The following now holds:

        assert( e"\w\d+\(author)\n" == "\\w\\d+\\(author)\\n" );
        assert( r"\w\d+\(author)\n" == "\\w\\d+\(author)\n" ); // previous implementation

John


_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution
