on Tue May 30 2017, Jordan Rose <[email protected]> wrote:

>> On May 30, 2017, at 16:13, Dave Abrahams <[email protected]> wrote:
>>
>>
>> on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:
>>
>>>> On May 30, 2017, at 14:53, Dave Abrahams <[email protected]> wrote:
>>>>
>>>>
>>>> on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:
>>>>
>>>>> My knee-jerk reaction is to say it's too late in Swift 4 for this kind
>>>>> of change, but with that out of the way, I'm most concerned about what
>>>>> it means to have, say, a UTF-8 index that's not on a UTF-16 boundary.
>>>>>
>>>>> let str = "言"
>>>>> let oneUnitIn = str.utf8.index(after: str.utf8.startIndex)
>>>>> let trailingBytes = str.utf8[oneUnitIn...]
>>>>
>>>> This is not new; it exists today.
>>>
>>> Yes, I think that’s valuable. What’s different is that it’s not a
>>> String.Index.
>>>
>>>>
>>>>> What can I do with 'oneUnitIn'?
>>>>
>>>> All the usual stuff; we're not proposing to change what you can do with
>>>> it.
>>>
>>> By changing the type, you have increased the scope of where an index
>>> can be used. What happens when I use it in one of the other views and
>>> it’s not on a boundary?
>>>
>>> (I suspect the answer is “it traps” but the proposal should spell that
>>> out explicitly.)
>>
>> Sorry, I mistakenly limited the “rounding down” behavior to slicing and
>> range replacement. The index would be rounded down to the previous
>> boundary, and then used as ever.
>
> Makes sense!
>
>>
>>>
>>>>
>>>>> How do I test to see if it's on a Character boundary or a
>>>>> UnicodeScalar boundary?
>>>>
>>>> as noted,
>>>>
>>>>   Replacing the failable APIs listed [above](#motivation) that detect
>>>>   whether an index represents a valid position in a given view, and
>>>>   enhancement that explicitly round index positions to nearby boundaries
>>>>   in a given view, are left to a later proposal. For now, we do not
>>>>   propose to remove the existing index conversion APIs.
>>>>
>>>> That means you can use oneUnitIn.samePosition(in: str) or
>>>> oneUnitIn.samePosition(in: str.unicodeScalars) to find out if it's on a
>>>> character or unicode scalar boundary.
>>>
>>> I’m sorry, I completely missed that. This part of the question is withdrawn.
>>>
>>> I’m also concerned about putting “UTF-16” in the documentation for
>>> encodedOffset. Either it’s a ‘utf16Offset’ or it isn’t
>>
>> It is today; hopefully it won't be someday
>>
>>> ; if it’s an opaque value then it should be treated as such.
>>
>> Today a String has underlying UTF-16-compatible storage and that's
>> documented as such, but we intend to lift that restriction and don't
>> want the names to lock us into semantics.
>
> I don’t think you should promise that about new APIs, then, or someone
> will start relying on it.

Okay, we could leave it out of this doc comment. But as long as
something documents that Strings are stored as UTF-16 (e.g. we say you
get random-access performance for the utf16 view when Foundation is
loaded), the implication is there.

>>> (It’s also a little disturbing that round-tripping through
>>> encodedOffset isn’t guaranteed to give you the same index back.)
>>
>> Define “same.”
>>
>> The encodedOffset is not the full value of an *arbitrary* index, and
>> doesn't claim to be. The indices that can be serialized and
>> reconstructed exactly using encodedOffset are those that fall on code
>> unit boundaries. Today, that means everything but UTF-8 indices. We
>> could consider exposing the transcodedOffset (offset within the UTF8
>> encoding of the scalar) as well, but I want to be conservative.
>
> I’m not sure it’s clear from the name “encodedOffset” that this is a
> lossy conversion.

It's not a conversion :-)

> I’d say it should be an optional property, but that’s probably too
> annoying in the invalid case. Maybe it should trap.

I really don't think so; IMO that would be inconsistent with the
“rounding down” behavior proposed. I think either all misaligned
accesses should trap or they should do something lenient. I proposed
lenience, but trapping is still an option.

-- 
-Dave

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution
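
For readers following the thread, here is a minimal sketch of the boundary check mentioned in the quoted text (using samePosition(in:) to see whether a UTF-8 index sits on a Character or Unicode scalar boundary). It reuses the example string from the thread and assumes a toolchain where the failable samePosition(in:) conversions on String.Index are available; the names onScalarBoundary and onCharacterBoundary are illustrative only and are not part of the proposal.

```swift
// Example string from the thread: one Character, one Unicode scalar,
// encoded as three UTF-8 code units.
let str = "言"

// An index one UTF-8 code unit past the start, i.e. in the middle of the scalar.
let oneUnitIn = str.utf8.index(after: str.utf8.startIndex)

// samePosition(in:) is failable: it yields nil when the index does not
// correspond exactly to a position in the target view.
// (Illustrative names, not from the proposal.)
let onScalarBoundary = oneUnitIn.samePosition(in: str.unicodeScalars) != nil
let onCharacterBoundary = oneUnitIn.samePosition(in: str) != nil

// For comparison, the start index is on a boundary in every view.
let startOnScalarBoundary =
    str.utf8.startIndex.samePosition(in: str.unicodeScalars) != nil

print(onScalarBoundary, onCharacterBoundary, startOnScalarBoundary)
```

This covers only the existing failable conversions. The lenient "rounding down" behavior discussed above is a separate question and is not shown here.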
