> On May 4, 2016, at 1:51 PM, John Holdsworth <[email protected]> wrote:
>
> … response inline
>
>>> On May 2, 2016, at 2:23 PM, John Holdsworth <[email protected]> wrote:
>>>
>>>
>>>> I'm having trouble getting the `e` modifier to work as advertised, at
>>>> least for the sequence `\\`. For example, `print(e"\\\\")` prints two
>>>> backslashes, and `print(e"\\\")` seems to try to escape the string
>>>> literal. I'm currently envisioning `e` as disabling *all* backslash
>>>> escapes, so these behaviors wouldn't be appropriate. It also looks like
>>>> interpolation is still enabled in `e` strings.
>>>>
>>>> Since other things like `print(e"\w+")` work just fine, I'm guessing this
>>>> is a bug in the proposal's sketches (not being clear enough about the
>>>> expected behavior), not your code.
>>>>
>>>> I've written a gist with some tests to show how I expect things to work:
>>>>
>>>> https://gist.github.com/brentdax/be3c032bc7e0c101d7ba8b72cd1a692e
>>>
>>> The problem here is that I’ve not implemented unescaped literals fully, as
>>> it would require changes outside the lexer. This is because the string is
>>> first lexed and tokenised by one piece of code, Lexer::lexStringLiteral, but
>>> later, in the code generation phase, the actual literal is generated in a
>>> function, Lexer::getEncodedStringSegment. This is passed the same string from
>>> the source file but does not know what modifiers should be applied. As a
>>> result, normal escapes are still processed. All the “e” flag does is silence
>>> the error for invalid escapes during tokenising.
>>
>> The lexer just lays ropes around certain areas to tell what's where. Sometimes
>> this is not enough for extra semantics. This is the reason why I went down
>> the path of a custom string_multiline_literal token. It looks like you might
>> want to consider that path too. If you do, you might consider the merits of
>> suggesting that half the work be put in place now, allowing both our
>> experiments (and other, more sophisticated ones) to lean on it, as an
>> alternative to just directly adding extra conditional code in the default
>> lexer code.
>
> Not sure what you mean here. It’s the modifiers that have a greater effect on
> lexing, not whether a string is multi-line. IMO it’s probably best to avoid
> creating a separate string_multiline_literal token, as that would require
> visiting the grammar everywhere a string could occur. If you want to see what
> I mean, I’ve committed a change which adds 3 extra bits to the Token structure
> to carry modifiers applied from the lexing stage to code generation, so
> non-escaping strings can finally be handled correctly.
>
> https://github.com/apple/swift/pull/2275
> new toolchain: http://johnholdsworth.com/swift-LOCAL-2016-05-04-a-osx.tar.gz
>
> The following now holds
>
> assert( e"\w\d+\(author)\n" == "\\w\\d+\\(author)\\n" );
> assert( r"\w\d+\(author)\n" == "\\w\\d+\(author)\n" ); // previous implementation
>
> John
>
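For anyone following along, the approach John describes above (a few modifier bits on the token, set while lexing and consulted again when the literal is encoded) can be sketched roughly as follows. This is a toy in C++, not the code in the pull request: `StringModifier`, `Token`, `toyLexStringLiteral` and `toyEncodeSegment` are hypothetical names, and interpolation and the `r` modifier are not modeled at all.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical modifier flags; names and bit layout are illustrative only.
enum StringModifier : std::uint8_t {
  ModNone = 0,
  ModNoEscape = 1 << 0  // stands in for the `e` prefix
};

// Toy token: the raw text between the quotes plus a 3-bit modifier field
// that the lexing stage fills in and the encoding stage reads back.
struct Token {
  std::string rawText;
  unsigned modifiers : 3;
};

// Lexing stage: note an optional `e` prefix and keep the raw body.
// Assumes `source` has the shape "..." or e"..." (no validation beyond length).
Token toyLexStringLiteral(const std::string &source) {
  Token tok;
  tok.modifiers = ModNone;
  std::string::size_type start = 0;
  if (!source.empty() && source[0] == 'e') {
    tok.modifiers = ModNoEscape;
    start = 1;
  }
  if (source.size() < start + 2)
    return tok;  // too short to hold both quotes; leave the body empty
  tok.rawText = source.substr(start + 1, source.size() - start - 2);
  return tok;
}

// Encoding stage: sees the same raw text, but now knows the modifiers.
std::string toyEncodeSegment(const Token &tok) {
  if (tok.modifiers & ModNoEscape)
    return tok.rawText;  // every backslash stays literal
  std::string out;
  for (std::string::size_type i = 0; i < tok.rawText.size(); ++i) {
    char c = tok.rawText[i];
    if (c != '\\' || i + 1 == tok.rawText.size()) {
      out += c;
      continue;
    }
    char next = tok.rawText[++i];
    switch (next) {
      case 'n': out += '\n'; break;
      case 't': out += '\t'; break;
      case '\\': out += '\\'; break;
      case '"': out += '"'; break;
      default:  // toy behaviour: keep unknown escapes as written
        out += '\\';
        out += next;
        break;
    }
  }
  return out;
}
```

With this sketch, `toyEncodeSegment(toyLexStringLiteral(R"(e"\w\d+\(author)\n")"))` returns the body unchanged, while the same call on `R"("a\nb")"` turns `\n` into a real newline. The point is only that the second stage no longer has to guess which rules apply.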
Cool job! Yup, you proceed by "widening the existing holes" to carry the
missing info (e.g. modifiers). Making direct changes to lexCharacter() is a step
I thought might be a bit premature, considering nothing is carved in stone yet.
I was trying to advocate for a clean boundary between the current behavior and
the new ones, so that we, as well as others, could try alternative syntaxes by
changing the content of clearly identified methods (as opposed to starting
their own integration from scratch each time, or having to unstitch parts of
several already not-so-simple methods). I guess I am also extra cautious in my
own coding because this is a lexer, and the more paths there are through
something like lexCharacter() or getEncodedStringSegment(), the more difficult
it might be to prove that all of them have been identified and exercised.
Thanks for inspiring my experiments.
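To illustrate what I mean by a clean boundary, here is a rough C++ sketch. Everything in it is hypothetical: the `TokenKind` values, the function names, and especially the toy multi-line rule (dropping one leading newline), which is just a placeholder for whatever syntax is being tried. The existing path stays in one function, and each experiment gets its own.

```cpp
#include <cassert>
#include <string>

// Hypothetical token kinds; the second mirrors the custom
// string_multiline_literal token discussed in this thread.
enum class TokenKind { StringLiteral, StringMultilineLiteral };

struct Token {
  TokenKind kind;
  std::string rawText;
};

// Current behaviour lives here and is not touched by experiments.
// (A stand-in: the real encoding work is far more involved.)
std::string encodeDefaultLiteral(const std::string &raw) {
  return raw;
}

// Experimental behaviour lives in its own clearly identified function.
// Toy rule, assumed for illustration: drop a single leading newline.
std::string encodeMultilineLiteral(const std::string &raw) {
  if (!raw.empty() && raw[0] == '\n')
    return raw.substr(1);
  return raw;
}

// The only shared code is the dispatch, so each path can be
// enumerated and exercised on its own.
std::string encode(const Token &tok) {
  switch (tok.kind) {
    case TokenKind::StringLiteral:
      return encodeDefaultLiteral(tok.rawText);
    case TokenKind::StringMultilineLiteral:
      return encodeMultilineLiteral(tok.rawText);
  }
  return tok.rawText;  // unreachable for valid kinds
}
```

Trying a different multi-line syntax would then mean swapping the body of `encodeMultilineLiteral` only. The trade-off John points out still applies: a separate token kind has to be recognized everywhere the grammar accepts a string.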
Very Best
(From mobile)
_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution