Re: [swift-evolution] multi-line string literals.

L Mihalkovic via swift-evolution Sun, 01 May 2016 10:56:42 -0700

Hi John,

In truth, your work is what gave me the idea of creating a separate parsing 
path now (what I called lexStringMultilineLiteral()), precisely because of the 
two points you make:


- it is possible, even straightforward, to ‘tweak’ the current lexing code to 
  get something going, as you proved
- using the default string literal recognition mechanism forces us to ‘wait’ a 
  bit before we can figure out which kind of string literal we are dealing 
  with (single-line or multi-line)
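To make the second point concrete, here is a minimal Swift sketch (a toy model, not the compiler's C++ lexer) of how a `"`-opened literal gets classified under the continuation-quote rules. It assumes the lexer can only call a literal multi-line once it reaches a newline, or a `\` right before a newline, while the literal is still open; `LiteralKind` and `classify` are hypothetical names.

```swift
// Hypothetical sketch: classify a `"`-opened literal and report how far the
// scanner had to read before it knew whether the literal is multi-line.
enum LiteralKind: Equatable {
    case singleLine
    case multiLine
}

/// Returns nil if the input does not start with `"` or ends before a decision.
func classify(_ source: String) -> (kind: LiteralKind, charactersScanned: Int)? {
    let chars = Array(source)
    guard chars.first == "\"" else { return nil }
    var i = 1
    while i < chars.count {
        let c = chars[i]
        if c == "\\" {
            // A backslash right before a newline (line continuation) already
            // tells us the literal spans several lines.
            if i + 1 < chars.count, chars[i + 1].isNewline {
                return (.multiLine, i + 2)
            }
            i += 2          // skip an ordinary escaped character such as \"
            continue
        }
        if c == "\"" { return (.singleLine, i + 1) }   // closed on the first line
        if c.isNewline { return (.multiLine, i + 1) }   // line ended while still open
        i += 1
    }
    return nil
}
```

By contrast, an opener such as `_"` would settle the question after two characters, which is the trade-off being weighed in this exchange.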

I described that poorly and apologize for it. There is no doubt that, with 
enough small alterations, the current parser can be made better, smarter, and 
so on. My point is that this may not be desirable, and may in fact be a siren 
song calling us in the wrong direction.

Taking the step today of splitting the parsing into two separate paths has 
implications for the features that can be devised in the future, IMO without 
ever having to revisit this fundamental split. However, I will also be the 
first to recognize that if there is no long-term appeal for content-type 
tagging, macros, or custom formatters/verifiers, then there is indeed no need 
to split the path today, or ever for that matter.

But if these make sense for Swift 3, or more likely after Swift 3, then one can 
easily see today that a split is likely to make these features a lot easier to 
implement then. And if it is the right path then, and we can see it today, IMO 
it stands to reason to build it in today (per the team’s development doctrine 
that features be split, as much as possible, between the preliminary 
enablement steps and the specifics of the feature), provided that it does not 
commit us to anything that has not been, and cannot yet be, decided. It would 
also make more exploratory work in the community possible with minimal burden 
on the core team. Considering the length of this thread and the sophistication 
of some of the contributions, I have little doubt that consensus will be hard 
to build, which IMO only increases the value of having a common extension 
ground for prototypes.
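A toy Swift model of the proposed split, as a sketch under stated assumptions rather than a patch to the real `Lexer`: `ToyLexer`, the `Token` cases (with `multilineStringLiteral` standing in for the proposed string_multiline_token) and the Swift versions of `lexStringLiteral()`/`lexStringMultilineLiteral()` are hypothetical, escapes are simplified, and a multi-line body simply runs to the first `"_`. What it shows is that the `_"` opener selects the new path with two characters of lookahead, leaving the existing path untouched.

```swift
// Hypothetical sketch of a lexer with two isolated string-literal paths.
enum Token: Equatable {
    case stringLiteral(String)
    case multilineStringLiteral(String)   // stand-in for string_multiline_token
    case other(Character)
}

struct ToyLexer {
    let chars: [Character]
    var pos = 0

    init(_ source: String) { chars = Array(source) }

    mutating func next() -> Token? {
        guard pos < chars.count else { return nil }
        // The only shared decision point: two characters of lookahead pick the path.
        if chars[pos] == "_", pos + 1 < chars.count, chars[pos + 1] == "\"" {
            return lexStringMultilineLiteral()
        }
        if chars[pos] == "\"" { return lexStringLiteral() }
        defer { pos += 1 }
        return .other(chars[pos])
    }

    /// Existing path (simplified): the body ends at the next unescaped `"`.
    mutating func lexStringLiteral() -> Token {
        pos += 1                                    // skip opening "
        var body = ""
        while pos < chars.count, chars[pos] != "\"" {
            // Drop the backslash and keep the escaped character as-is.
            if chars[pos] == "\\", pos + 1 < chars.count { pos += 1 }
            body.append(chars[pos])
            pos += 1
        }
        pos += 1                                    // skip closing " (if any)
        return .stringLiteral(body)
    }

    /// New, isolated path: the body runs to the first `"_`.
    mutating func lexStringMultilineLiteral() -> Token {
        pos += 2                                    // skip opening _"
        var body = ""
        while pos < chars.count {
            if chars[pos] == "\"", pos + 1 < chars.count, chars[pos + 1] == "_" {
                pos += 2                            // skip closing "_
                break
            }
            body.append(chars[pos])
            pos += 1
        }
        return .multilineStringLiteral(body)
    }
}
```

Because the new path is isolated, this sketch cannot express `"_` inside a multi-line body; whatever escaping rule is chosen for that would live entirely inside the new method.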

The names I proposed are only the result of trying to keep the additions 
consistent with the surrounding code, while minimizing the overlap between 
current and new code. To that effect, my ongoing implementation is based on a 
couple of changes in lexImpl(), with the rest isolated inside new methods. But 
fundamentally the question I am raising is about:

- the long-term usability of a content-tagging capability (it does not have to 
  be done now, but will IMO be easier this way)
- the practical, workload-management benefit of a simple base split today, 
  allowing parallel explorations and easier merges along the way
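To illustrate what content tagging could enable, the following Swift sketch splits an optional `[TYPEID]` tag off a literal body and hands the rest to a verifier registered for that tag. All of it is hypothetical: the `Verifier` type, the `verifiers` table, the policy that untagged bodies pass and unknown tags fail, and the use of Foundation's `JSONSerialization` as a stand-in JSON checker.

```swift
import Foundation

// Hypothetical sketch: dispatch a tagged literal body to a verifier.

/// Hypothetical signature for a checker attached to a content tag.
typealias Verifier = (String) -> Bool

/// Hypothetical registry; a real design might let libraries or plugins add entries.
let verifiers: [String: Verifier] = [
    "json": { body in
        (try? JSONSerialization.jsonObject(with: Data(body.utf8))) != nil
    }
]

/// Splits an optional leading `[TYPEID]` tag off a literal body.
func splitTag(_ body: String) -> (tag: String?, content: String) {
    guard body.hasPrefix("["), let close = body.firstIndex(of: "]") else {
        return (nil, body)
    }
    let tag = String(body[body.index(after: body.startIndex)..<close])
    let content = String(body[body.index(after: close)...])
    return (tag, content)
}

/// Untagged bodies pass; tagged bodies must satisfy their verifier.
/// Unknown tags are rejected in this sketch.
func verify(_ body: String) -> Bool {
    let (tag, content) = splitTag(body)
    guard let tag = tag else { return true }
    guard let check = verifiers[tag] else { return false }
    return check(content)
}
```

Note that the `[json]` examples in the quoted message below omit the commas between members, so a strict checker like this one would reject them; that kind of early feedback is what tagging could provide.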

The short-term “how” is only one of the possible ways, and it only makes sense 
in light of the long-term direction string literals will take.

Best regards, and thank you John for showing me where to start.

LM/


> On May 1, 2016, at 7:15 PM, John Holdsworth <[email protected]> wrote:
> 
> Thanks Brent for pulling together the proposal and summarising this thread.
> 
> I have to say I still feel most drawn to your “continuation quotes” idea, and
> after some thought the “_” modifier for _"strings"with"quotes"_ also seems
> sensible. Most of all, for me the appeal of these approaches is their absolute
> simplicity. My only reservation is what external editors will make of these
> strings, as there is no precedent in another programming language I am aware of.
> 
> I’ve updated the "reference toolchain" and PR for testing and review.
> 
> http://johnholdsworth.com/swift-LOCAL-2016-05-01-a-osx.tar.gz
> https://github.com/apple/swift/pull/2275
> 
> This implementation still contains the “e” modifier as an example of how
> modifiers would be lexed (which I’ll remove before submission, as it is outside
> the scope of this proposal), plus one new feature: a \ before a newline causes
> the newline to be ignored. In this implementation, modifiers can only be
> applied to the first segment of the literal.
> 
> This makes the following strings valid to my mind:
> 
>         let xml = "\
>             "<?xml version=\"1.0\"?>
>             "<catalog>
>             "   <book id=\"bk101\" empty=\"\">
>             "       <author>\(author)</author>
>             "       <title>XML Developer's Guide</title>
>             "       <genre>Computer</genre>
>             "       <price>44.95</price>
>             "       <publish_date>2000-10-01</publish_date>
>             "       <description>An in-depth look at creating \
>                         "applications with XML.</description>
>             "   </book>
>             "</catalog>
>             ""
>         print(xml)
> 
>         assert( xml == _"<?xml version="1.0"?>
>             "<catalog>
>             "   <book id="bk101" empty="">
>             "       <author>\(author)</author>
>             "       <title>XML Developer's Guide</title>
>             "       <genre>Computer</genre>
>             "       <price>44.95</price>
>             "       <publish_date>2000-10-01</publish_date>
>             "       <description>An in-depth look at creating applications with XML.</description>
>             "   </book>
>             "</catalog>
>             ""_ )
> 
>         try! NSRegularExpression(pattern: e"<([a-zA-Z][\w]*)", options: [])
>             .enumerateMatches(in: xml, options: [], range: NSMakeRange(0, xml.utf16.count)) {
>                 (result, flags, stop) in
>                 print((xml as NSString).substring( with:result!.range(at: 1)))
>         }
> 
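As a rough aid for experimenting with these rules without building the toolchain, here is a Swift model of how the continuation lines of the first example could be assembled into one string. It approximates the behaviour described above and is not code from the PR: `assembleContinuationLiteral` is a hypothetical name, each input line (including the opening `"\` line) must start with `"` after its indentation, only `\"` and `\\` are decoded, other escapes such as `\(author)` are kept verbatim, and a trailing `\` suppresses the newline.

```swift
// Hypothetical sketch of continuation-quote assembly.
func assembleContinuationLiteral(_ lines: [String]) -> String? {
    var result = ""
    for line in lines {
        // Drop indentation, then require the leading continuation quote.
        var chars = Array(line.drop(while: { $0 == " " || $0 == "\t" }))
        guard chars.first == "\"" else { return nil }
        chars.removeFirst()
        var i = 0
        var joinNext = false
        while i < chars.count {
            let c = chars[i]
            if c == "\\" {
                if i + 1 == chars.count {       // `\` at end of line: no newline
                    joinNext = true
                    break
                }
                let next = chars[i + 1]
                if next == "\"" || next == "\\" {
                    result.append(next)         // decode \" and \\
                } else {
                    result.append(c)            // keep other escapes, e.g. \(author)
                    result.append(next)
                }
                i += 2
                continue
            }
            if c == "\"" { return result }      // unescaped quote closes the literal
            result.append(c)
            i += 1
        }
        if !joinNext { result.append("\n") }
    }
    return nil                                  // no closing quote found
}
```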
> I wouldn’t create a lexStringMultilineLiteral() function just yet, as the
> changes are still minor (as you can see from the PR), and because you can’t
> determine whether a string is multiline until you are halfway through it.
> Excuse the “bottom-up” approach, but the reasoning is that if the lexer and any
> changes to it are kept minimal, it will be no accident that the feature is easy
> to document and use.
> 
> John
> 
> 
>> On 1 May 2016, at 13:04, L Mihalkovic <[email protected]> wrote:
>> 
>> [a couple of minutes’ read]
>> 
>> I read this thread with great attention, trying to see it from the
>> implementation viewpoint (I know that the compiler structure should not
>> drive the language features). I also revisited the how-to-contribute notes
>> as well as the dev-process description. One of the ideas that stood out in
>> my mind was that, when looking at an implementation, enablement changes
>> should be separated from the bulk of the feature, so that reviews are
>> easier.
>> 
>> So I tried to elevate this to the rank of a hidden mandatory requirement for
>> anything related to this feature. It led me to a staged approach to this
>> feature that would allow a lot of things to be done, OVER TIME.
>> 
>> When distilling this feature down to the smallest enabler that would have to
>> be added to the compiler, I came up with the following short list:
>> 
>> - add a string_multiline_token to the lexer
>>   I realize that the current lexer can be tweaked to work (as per John’s PR),
>>   but IMO adding a dedicated "hole" in the parsing code is what will give
>>   something working today (no difference from current compiler behavior)
>>   while allowing all future changes to be cleanly isolated from everything
>>   around them
>> - if one accepts the idea of a hole created by the token, then it stands to
>>   reason to have delimiters around it. Looking at the structure of the
>>   grammar, I came to the conclusion that _" and "_ were an easy, unambiguous
>>   choice (I believe """ and """ looked like an equally easy and unambiguous
>>   choice)
>> - the next step should be the creation of lexStringMultilineLiteral() and
>>   lexMultilineCharacter() methods in the Lexer. Again… bear with me, I do
>>   believe it is relevant to what everyone wants this feature to be… The
>>   latter method should contain only extensions specific to multiline
>>   literals, delegating common use cases to lexCharacter()
>> 
>> The main point of following this route (or any equivalent) is that:
>> - it represents a very clear commitment to multiline string literals
>> - it ensures that there is no strong commitment to feature details, while
>>   allowing many future scenarios
>> - it will remain backward compatible with enhancements to the current string
>>   literal syntax (translation?)
>> - external contributors will be able to prototype while making sure we stay
>>   within strict boundaries for integration with the compiler
>> 
>> The next equally small step would be to describe the minimal changes required
>> in the Parser, a step I do not want to take now if the compiler experts
>> see no merit at all in the proposed staged approach.
>> 
>> 
>> 
>> A thought experiment pushing further down this path shows that the following
>> would be equally possible language features (with roughly equivalent
>> implementation cost):
>> 
>> let whyOwhy = """\
>>     !!    Can't understand what improvements it truly delivers
>>     !!        It basically removes a handful of characters
>>     !!    It works today
>>     !!        But I don't see it as a likable foundations for adding in future enhancements
>>     !!\
>>     !!    I don't envy the people who will have to support it outside of xcode
>>     !!        Or even in xcode (considering how it currently struggles with indents/formatting
>>     !!    As for elegance, beauty is in the eye of the beholder, they say.
>> """
>> var json1 = _"[json]\
>>     !!{
>>     !!  "file" : "\(wishIhadPlaceholders)_000.md"
>>     !!  "desc" : "and why are all examples in xml, i thought it died a while ago ;-)"
>>     !!  "rational" : [
>>     !!          "Here we go again"
>>     !!          "How will xcode help make these workable"
>>     !!       ]
>>     !!}
>> "_
>> var json2 = _"[json]\
>> {
>>   "file" : "\(wishIhadPlaceholders)_000.md"
>>   "desc" : "and why are all examples in xml, i thought it died a while ago ;-)"
>>   "rational" : [
>>           "Here we go again"
>>           "How will xcode help make these workable"
>>        ]
>> }
>> "_
>> 
>>  [_"]  --> start string
>>  [_"\] --> start line + ignore spaces until eol (basically swallow \r\n)
>>  [!!\] --> ignore everything until eol... basically the gap does not exist
>>  ["_]  --> terminate string
>>  [_"[TYPEID]\] --> start a string, signalling that a verifier or formatter
>> (or a chain of them) that understands TYPEID can syntax-check it, format it, etc.
>> 
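One possible reading of the `!!` margin rules in the examples above, sketched in Swift for the body lines between the opener and the closing delimiter: indentation up to and including `!!` is dropped, a `!!\` line contributes nothing, and every other line keeps its text plus a newline. Both this interpretation and the name `stripMargins` are assumptions, since the legend does not spell out every case.

```swift
// Hypothetical sketch of one interpretation of the `!!` margin markers.
func stripMargins(_ lines: [String]) -> String? {
    var out = ""
    for line in lines {
        let trimmed = line.drop(while: { $0 == " " || $0 == "\t" })
        guard trimmed.hasPrefix("!!") else { return nil }   // every line needs the margin
        let rest = trimmed.dropFirst(2)
        if rest.hasPrefix("\\") { continue }                // `!!\`: line contributes nothing
        out.append(contentsOf: rest)
        out.append("\n")
    }
    return out
}
```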
>> 
>> IMO splitting these expressions from the current lexing/parsing has other
>> long-term benefits when coupled with the aforementioned idea of content
>> tagging:
>> - allowing dedicated external formatters to be created in any editor
>>   supporting Swift
>> - allowing external validators (including in the form of compiler plugins)
>> - opening a door to an equivalent of Scala's macros for contents marked as
>>   [swift]
>> 
>> Once again, I fully appreciate that implementation should not drive language
>> design, but considering the flurry of great ideas, I thought it might in
>> this instance be useful to identify a minimal, noncommittal direction
>> common to many scenarios, so that a step can be taken that will neither
>> favor nor prohibit any of the proposals, but simply enable them all.
>> 
>> Thank you for your patience
>> Regards
>> 
>> PS: I am working on a rudimentary implementation that I hope could help 
>> people test all the ideas floating in this list. 
>> 
>> 
>>> On Apr 26, 2016, at 8:04 AM, Chris Lattner via swift-evolution <[email protected]> wrote:
>>> 
>>> On Apr 25, 2016, at 5:22 PM, Brent Royal-Gordon <[email protected]> wrote:
>>>>>> 3. It might be useful to make multiline `"` strings trim trailing 
>>>>>> whitespace and comments like Perl's `/x` regex modifier does.
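As a rough illustration of such a modifier, here is a hypothetical Swift function in the spirit of Perl's `/x`: on each line it drops everything from an unescaped `#` onward and then trims trailing spaces and tabs. The `#` comment syntax, the name `applyXModifier`, and the rule that a backslash makes the next character literal (and is itself dropped) are assumptions of the sketch.

```swift
// Hypothetical sketch of an `x`-style string modifier.
func applyXModifier(_ text: String) -> String {
    let lines = text.split(separator: "\n", omittingEmptySubsequences: false)
    let cleaned = lines.map { line -> String in
        var kept = ""
        var escaped = false
        for c in line {
            if escaped {
                kept.append(c)          // escaped character is kept literally
                escaped = false
                continue
            }
            if c == "\\" { escaped = true; continue }   // backslash itself is dropped
            if c == "#" { break }                        // comment runs to end of line
            kept.append(c)
        }
        // Trim trailing spaces and tabs.
        while let last = kept.last, last == " " || last == "\t" {
            kept.removeLast()
        }
        return kept
    }
    return cleaned.joined(separator: "\n")
}
```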
>>>>> 
>>>>> If you have modifier characters already, it is easy to build a small zoo 
>>>>> full of these useful beasts.
>>>> 
>>>> Modifiers are definitely a workable alternative, and can be quite 
>>>> flexible, particularly if a future macro system can let you create new 
>>>> modifiers.
>>> 
>>> Right. I consider modifiers to be highly precedented in other languages, 
>>> and therefore proven to work.  If we go this way, I greatly prefer prefix 
>>> to postfix modifiers.
>>> 
>>>>>> * Alternative delimiters: If a string literal starts with three, or 
>>>>>> five, or seven, or etc. quotes, that is the delimiter, and fewer quotes 
>>>>>> than that in a row are simply literal quote marks. Four, six, etc. 
>>>>>> quotes is a quote mark abutting the end of the literal.
>>>>>> 
>>>>>>  let xml: String = """<?xml version="1.0"?>
>>>>>>                          """<catalog>
>>>>>>                          """\t<book id="bk101" empty="">
>>>>>>                          """\t\t<author>\(author)</author>
>>>>>>                          """\t</book>
>>>>>>                          """</catalog>"""
>>>>>> 
>>>>>> You can't use this syntax to express an empty string, or a string 
>>>>>> consisting entirely of quote marks, but `""` handles empty strings 
>>>>>> adequately, and escaping can help with quote marks. (An alternative 
>>>>>> would be to remove the abutting rule and permit `""""""` to mean "empty 
>>>>>> string", but abutting quotes seem more useful than long-delimiter empty 
>>>>>> strings.)
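The delimiter rule above is precise enough to sketch. The Swift function below is a hypothetical model covering a single-line literal only (not the per-line `"""` continuation in the example): the opening run must be an odd number of quotes, at least three; shorter runs inside are literal quote marks; and, as an assumption generalizing the four/six rule, any longer run ends the literal with its surplus quotes kept as content.

```swift
// Hypothetical sketch of the odd-length quote-delimiter rule.
// Returns the literal's content, or nil if the opener is not 3, 5, 7, ... quotes
// or the literal is unterminated.
func lexQuoteStackedLiteral(_ source: String) -> String? {
    let chars = Array(source)
    let n = chars.prefix(while: { $0 == "\"" }).count
    guard n >= 3, n % 2 == 1 else { return nil }
    var content = ""
    var i = n
    while i < chars.count {
        guard chars[i] == "\"" else {
            content.append(chars[i])
            i += 1
            continue
        }
        // Measure the whole run of quote marks starting here.
        var run = 0
        while i + run < chars.count, chars[i + run] == "\"" { run += 1 }
        if run >= n {
            // Surplus quotes abut the end of the literal.
            content += String(repeating: "\"", count: run - n)
            return content
        }
        // Shorter runs are literal quote marks.
        content += String(repeating: "\"", count: run)
        i += run
    }
    return nil
}
```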
>>>>> 
>>>>> I agree that there is a need to support alternative delimiters, but 
>>>>> subjectively, I find this to be pretty ugly.  It is also a really 
>>>>> unfortunate degenerate case for “I just want a large blob of XML” because 
>>>>> you’d end up using `"""` almost all the time, and you have to use it on 
>>>>> every line.
>>>> 
>>>> On the other hand, the `"""` does form a much larger, more obvious 
>>>> continuation indicator. It is *extremely* obvious that the above line is 
>>>> not Swift code, but something else embedded in it. It's also extremely 
>>>> obvious what its extent is: when you stop seeing `"""`, you're back to 
>>>> normal Swift code.
>>> 
>>> Right, but it is also heavyweight and ugly.  In your previous email you 
>>> said about the single quote approach: "The quotation marks on the left end 
>>> up forming a column that marks the lines as special”, so I don’t see a need 
>>> for a triple quote syntax to solve this specific problem.
>>> 
>>>> I *really* don't like the idea of our only alternatives being "one 
>>>> double-quote mark with backslashing" or "use an entire heredoc". Heredocs 
>>>> have their place, but they are a *very* heavyweight quoting mechanism, and 
>>>> relatively short strings with many double-quotes are pretty common. 
>>>> (Consider, for instance, strings containing unparsed JSON.) I think we 
>>>> need *some* alternative to double-quotes, either single-quotes (with the 
>>>> same semantics, just as an alternative) or this kind of quote-stacking.
>>> 
>>> I agree that this is a real problem that would be great to solve.
>>> 
>>> If I step back and look at the string literal space we’re discussing, I 
>>> feel like there are three options:
>>> 
>>> 1) single and simple multiline strings, using `"`
>>> 2) your triple quote sort of string, specifically tuned to avoid having to 
>>> escape `"` when it occurs once or twice in sequence.
>>> 3) heredoc, which is a very general (but also very heavyweight) solution 
>>> to quoting problems.
>>> 
>>> I’m trying to eliminate the middle one, so we only have to have "two 
>>> things”.  Here are some alternative ways to solve the problem, which might 
>>> have less of an impact on the language:
>>> 
>>> A) Introduce single quoted string literals to avoid double quote problems 
>>> specifically, e.g.:   `'look "here" I say!'`.  This is another form of #2 
>>> which is less ugly.  It also doesn’t help you if you have both `"` and `'` in 
>>> your string.
>>> 
>>> B) Introduce a modifier character that requires a more complex closing 
>>> sequence to close off the string, see C++ raw string literals for prior art 
>>> on this approach.  Perhaps something like:
>>> 
>>>      Rxxx"look " here " I can use quotes "xxx
>>> 
>>> That said, I still prefer C) “ignore this issue for now”.  In other words, 
>>> I wouldn’t want to block progress on improving the string literal situation 
>>> overall on this issue, because anything we do here is a further extension 
>>> to a proposal that doesn’t solve this problem.
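Option B can be sketched in the shape written above, where `R<tag>"` opens the literal and `"<tag>` closes it (C++11 itself spells this `R"tag(...)tag"`). The Swift function below is a hypothetical model, not a proposed implementation; it requires a non-empty tag, processes no escapes, and returns the body up to the first `"` that is followed by the tag.

```swift
// Hypothetical sketch of option B: `R<tag>"` opens, `"<tag>` closes.
func lexRawLiteral(_ source: String) -> String? {
    let chars = Array(source)
    guard chars.first == "R", let quote = chars.firstIndex(of: "\"") else { return nil }
    let tag = Array(chars[1..<quote])
    guard !tag.isEmpty else { return nil }          // assumption: a tag is required
    let terminator: [Character] = ["\""] + tag
    var i = quote + 1
    while i + terminator.count <= chars.count {
        if Array(chars[i..<(i + terminator.count)]) == terminator {
            return String(chars[(quote + 1)..<i])   // body between opener and terminator
        }
        i += 1
    }
    return nil                                       // no terminator found
}
```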
>>> 
>>>> 
>>>>> For cases like this, I think it would be reasonable to have a “heredoc” 
>>>>> like scheme, which does not allow leading indentation, and does work with 
>>>>> all the same modifier characters above.  I do not have a preference on a 
>>>>> particular syntax, and haven’t given it any thought, but this would allow 
>>>>> you to do things like:
>>>>> 
>>>>>   let str = <<EOF
>>>>> <?xml version="1.0"?>
>>>>> <catalog>
>>>>> \t<book id="bk101" empty="">
>>>>> \t\t<author>\(author)</author>
>>>>> \t</book>
>>>>> </catalog>
>>>>> EOF
>>>>> 
>>>>> for example.  You could then turn off escaping and other knobs using the 
>>>>> modifier character (somehow, it would have to be incorporated into the 
>>>>> syntax of course).
>>>> 
>>>> There are two questions and a suggestion I have whenever heredoc syntax 
>>>> comes up.
>>>> 
>>>> Q1: Does the heredoc begin immediately, at the next line, or at the next 
>>>> valid place for a statement to start? Heredocs traditionally take the 
>>>> second approach.
>>>> 
>>>> Q2: Do you permit heredocs to stack—that is, for a single line to specify 
>>>> multiple heredocs?
>>>> 
>>>> S: During the Perl 6 redesign, they decided to use the delimiter's 
>>>> indentation to determine the base indentation for the heredoc:
>>>> 
>>>>    func x() -> String {
>>>>            return <<EOF
>>>>            <?xml version="1.0"?>
>>>>            <catalog>
>>>>            \t<book id="bk101" empty="">
>>>>            \t\t<author>\(author)</author>
>>>>            \t</book>
>>>>            </catalog>
>>>>            EOF
>>>>    }
>>>> 
>>>> Does that seem like a good approach?
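The rule in S can be stated as a small algorithm: take the whitespace in front of the closing delimiter and remove exactly that prefix from every body line. A minimal Swift sketch, assuming the lexer has already split out the body lines and the terminator line, that whitespace-only lines become empty, and that a less-indented line is an error (reported as nil). `dedentHeredoc` is a hypothetical name, and escapes such as `\t` are left undecoded.

```swift
// Hypothetical sketch of delimiter-based heredoc indentation removal.
func dedentHeredoc(_ lines: [String], terminatorLine: String) -> String? {
    let isIndent: (Character) -> Bool = { $0 == " " || $0 == "\t" }
    let indent = String(terminatorLine.prefix(while: isIndent))
    var out: [String] = []
    for line in lines {
        if line.allSatisfy(isIndent) {
            out.append("")                      // whitespace-only line
            continue
        }
        // A non-blank line indented less than the delimiter is an error here.
        guard line.hasPrefix(indent) else { return nil }
        out.append(String(line.dropFirst(indent.count)))
    }
    // This sketch joins lines without adding a trailing newline.
    return out.joined(separator: "\n")
}
```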
>>> 
>>> I think that either approach could work, that you have a lot more 
>>> experience on these topics than I do, and I would expect a vigorous 
>>> community debate about these topics. :-)
>>> 
>>> That said, if you look at what we’re discussing:
>>> 
>>> 1. “Continuation” string literals, to allow a multi-line string literal.  
>>> You and I appear to completely agree about this.
>>> 2. Heredoc: You and I seem to agree that they are a good “fully general” 
>>> solution to have, but there are the details you outline above to iron out.
>>> 3. Modifier characters:  I’m in favor, but I don’t know where you stand.  
>>> There is also still much to iron out here (such as the specific characters).
>>> 4. A way to avoid having to escape `"` in a non-heredoc literal.  I’m still 
>>> unconvinced, and think that any solution to this problem will be orthogonal 
>>> to the problems solved by 1-3 (and therefore can be added after getting 
>>> experience with the other parts).
>>> 
>>> If you agree that these are all orthogonal pieces, then treat them as such: 
>>> I’d suggest that you provide a proposal that just tackles the continuation 
>>> string literals.  This seems simple, and possible to get in for Swift 3.  
>>> After that, we can discuss heredoc and modifiers (if you think they’re a 
>>> good solution) on their own threads.  If those turn out to be 
>>> uncontroversial, then perhaps they can get in too.
>>> 
>>> On the heredoc aspects specifically, unless others chime in with strong 
>>> opinions about the topics you brought up, I’d suggest that you craft a 
>>> proposal for adding them with your preferred solution to these.  You can 
>>> mention the other answers (along with their tradeoffs and rationale for why 
>>> you picked whatever you think is right) in the proposal, and we can help 
>>> the community hash it out.
>>> 
>>> What do you think?
>>> 
>>> -Chris
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> swift-evolution mailing list
>>> [email protected]
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>> 
>
