Agreed. Taking this offlist :)
On Thu, Sep 22, 2016 at 9:01 PM, Michael Gottesman <[email protected]> wrote:

> On Sep 22, 2016, at 6:11 PM, Xiaodi Wu <[email protected]> wrote:
>
> On Thu, Sep 22, 2016 at 7:44 PM, Michael Gottesman <[email protected]>
> wrote:
>
>> On Sep 22, 2016, at 5:09 PM, Xiaodi Wu <[email protected]> wrote:
>>
>> On Thu, Sep 22, 2016 at 6:54 PM, Michael Gottesman <[email protected]>
>> wrote:
>>
>>> On Sep 22, 2016, at 4:19 PM, Xiaodi Wu <[email protected]> wrote:
>>>
>>> You mean values of type String?
>>>
>>> I was speaking solely of constant strings.
>>>
>>> I would want those to be exactly what I say they are; NFC normalization
>>> is available, if I recall, as part of Foundation, but by no means should
>>> my String values be silently changed!
>>>
>>> Why?
>>
>> For one, I don't want to pay the computational cost of normalization at
>> runtime unless necessary.
>>
>> This would only happen with strings that are known to be constant at
>> compile time (and as such the transformation would occur at compile time).
>> There would be no runtime cost.
>
> Yes, for constant strings only there would be no runtime cost.
>
>> For another, I expect to be able to round-trip user input.
>>
>> String checks for canonical equivalence, IIRC.
>
> Sure, but I'm not talking about using comparison operators here. I mean
> that if we have `let str = "[some non-NFC string]"`, I should be able to
> write that out to a file with all the non-canonical glyphs intact.
>
> I would argue that for most people that is not an interesting distinction.
> Naturally there would be a way to escape such canonicalization to get the
> non-canonicalized String.
>
> There are known issues with NFC that are acceptable for normalizing Swift
> identifiers but make it unsuitable for general use.
> For example, the normalized form of Greek ano teleia is middle dot, but
> these two glyphs are rendered differently in many fonts, and substituting
> a middle dot in place of the Greek punctuation mark is actually quite
> inadequate for Greek text (ano teleia is supposed to be around x-height;
> middle dot is not). Even for constant strings, it is essential that one
> can output ano teleia when it is specified rather than middle dot.
> However, Unicode normalization algorithms guarantee stability and will
> forever require swapping the former for the latter. I understand that
> other such problematic characters exist.
>
> I would argue that that is a problem with the unicode standard and with
> the fonts. This is not a problem for Swift to solve.
>
>> Normalization is not lossless and cannot be reversed. Finally, if I want
>> to use normalization form D (NFD), your proposal would make it
>> impossible, because (IIUC) serial NFC + NFD normalization can produce
>> different output than NFD normalization alone.
>>
>> Why would you want to do this/care about this? I.e. what is the use case?
>
> Use cases for NFD include searching, where you'd find substrings
> considered "compatible." For instance, the fi ligature is considered
> compatible with the letters f and i, but they are not equal. If you've
> ever successfully searched for a word like "finance" in a PDF document
> that's been typeset with ligatures, you've benefited from NFD. Roughly
> speaking (IIUC), the difference between searching NFC-normalized strings
> and NFD-normalized strings is analogous to the difference between a
> case-sensitive and a case-insensitive search. Therefore, given a string x,
> it's sometimes important to be able to obtain NFD(x). If every string x is
> now automatically NFC(x), then the best one can do is NFD(NFC(x)), which
> is not guaranteed equal to NFD(x) even with canonical comparison (i.e.
> NFC(NFD(NFC(x))) != NFC(NFD(x)) for all x).
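
The ano teleia and fi ligature points above are easy to check by hand. What follows is only a sketch in Python, using the standard library's `unicodedata.normalize`; the constant and helper names (`ANO_TELEIA`, `scalars`, and so on) are made up for illustration and are not part of anyone's proposal. On the strings tried here, the fi ligature is left alone by canonical decomposition (NFD) and is only expanded by the compatibility form (NFKD), so the search use case described above appears to rest on the compatibility forms. The last check compares NFD(NFC(x)) with NFD(x) for a handful of samples only and says nothing about strings in general.

```python
import unicodedata

# Illustrative constants; the names are assumptions made for this sketch.
ANO_TELEIA = "\u0387"   # GREEK ANO TELEIA
MIDDLE_DOT = "\u00b7"   # MIDDLE DOT
FI_LIGATURE = "\ufb01"  # LATIN SMALL LIGATURE FI


def scalars(text):
    """Return the code points of `text` formatted as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


def nfc(text):
    return unicodedata.normalize("NFC", text)


def nfd(text):
    return unicodedata.normalize("NFD", text)


def nfkd(text):
    return unicodedata.normalize("NFKD", text)


# NFC replaces ano teleia with middle dot; the original code point is gone
# afterwards, which is the round-trip concern raised in the thread.
ano_after_nfc = nfc(ANO_TELEIA)

# Canonical decomposition leaves the ligature untouched; only the
# compatibility decomposition expands it to the letters f and i.
fi_after_nfd = nfd(FI_LIGATURE)
fi_after_nfkd = nfkd(FI_LIGATURE)

# Compare NFD(NFC(x)) with NFD(x) on a few samples (not a general proof).
samples = [ANO_TELEIA, FI_LIGATURE, "e\u0301", "\u00e9"]
nfd_agrees = [nfd(nfc(s)) == nfd(s) for s in samples]

print("NFC(ano teleia):", scalars(ano_after_nfc))
print("NFD(fi ligature):", scalars(fi_after_nfd))
print("NFKD(fi ligature):", scalars(fi_after_nfkd))
print("NFD(NFC(x)) == NFD(x) per sample:", nfd_agrees)
```

Printing code points rather than the strings themselves matters here, since the two dots can look identical in a terminal font.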
>
> There are issues here related to String design. For instance, one could
> make an argument that such searching is really only interesting for a
> "Text" use case which is different from a String use case. That being
> said, I don't want to argue about this here since we are hijacking this
> thread ;).
>
>> As an aside, I am not formally proposing this. I am just discussing
>> potential opportunities for optimization given that we would need (as a
>> part of this proposal) to add knowledge of unicode to the compiler which
>> would allow for compile time transformations.
>
> I'd be interested to know what performance gains you're envisioning with
> such an optimization of constant strings at compile time.
>
> I would have to measure such wins to say anything concrete.
> Algorithmically one would be able to avoid normalization during common
> unicode operations when you know you are using constant strings. Even
> though this may provide a runtime win, the major win from teaching the
> compiler about unicode would be in terms of applying unicode operations
> such as encoding/decoding to constant strings.
>
> That being said, this is not the proposal that is being discussed here or
> even being proposed here. [i.e. let's stop hijacking this thread ;)]
>
>>> On Thu, Sep 22, 2016 at 6:10 PM, Michael Gottesman <[email protected]>
>>> wrote:
>>>
>>>> > On Sep 22, 2016, at 10:50 AM, Joe Groff via swift-evolution
>>>> > <[email protected]> wrote:
>>>> >
>>>> >> On Jul 26, 2016, at 12:26 PM, Xiaodi Wu via swift-evolution
>>>> >> <[email protected]> wrote:
>>>> >>
>>>> >> +1. Even if it's too late for Swift 3, though, I'd argue that it's
>>>> >> highly unlikely to be code-breaking in practice. Any existing code
>>>> >> that would get tripped up by this normalization is arguably broken
>>>> >> already.
>>>> >
>>>> > I'm inclined to agree.
>>>> > To be paranoid about perfect compatibility, we could conceivably
>>>> > allow existing code with differently-normalized identifiers with a
>>>> > warning based on Swift version, but it's probably not worth it. It'd
>>>> > be interesting to data-mine Github or the iOS Swift Playgrounds app
>>>> > and see if this breaks any Swift 3 code in practice.
>>>>
>>>> As an additional interesting point here, we could in general normalize
>>>> unicode strings. This could potentially reduce the size of unicode
>>>> characters or allow us to constant propagate certain unicode algorithms
>>>> in the optimizer.
>>>>
>>>> > -Joe
>>>> > _______________________________________________
>>>> > swift-evolution mailing list
>>>> > [email protected]
>>>> > https://lists.swift.org/mailman/listinfo/swift-evolution
