on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:

>> On May 30, 2017, at 14:53, Dave Abrahams <[email protected]> wrote:
>>
>> on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:
>>
>>> My knee-jerk reaction is to say it's too late in Swift 4 for this kind
>>> of change, but with that out of the way, I'm most concerned about what
>>> it means to have, say, a UTF-8 index that's not on a UTF-16 boundary.
>>>
>>>     let str = "言"
>>>     let oneUnitIn = str.utf8.index(after: str.utf8.startIndex)
>>>     let trailingBytes = str.utf8[oneUnitIn...]
>>
>> This is not new; it exists today.
>
> Yes, I think that’s valuable. What’s different is that it’s not a
> String.Index.
>
>>
>>> What can I do with 'oneUnitIn'?
>>
>> All the usual stuff; we're not proposing to change what you can do with
>> it.
>
> By changing the type, you have increased the scope of where an index
> can be used. What happens when I use it in one of the other views and
> it’s not on a boundary?
>
> (I suspect the answer is “it traps” but the proposal should spell that
> out explicitly.)
Sorry, I mistakenly limited the “rounding down” behavior to slicing and
range replacement. The index would be rounded down to the previous
boundary, and then used as ever.

>
>>
>>> How do I test to see if it's on a Character boundary or a
>>> UnicodeScalar boundary?
>>
>> as noted,
>>
>>     Replacing the failable APIs listed [above](#motivation) that detect
>>     whether an index represents a valid position in a given view, and
>>     enhancements that explicitly round index positions to nearby
>>     boundaries in a given view, are left to a later proposal. For now, we
>>     do not propose to remove the existing index conversion APIs.
>>
>> That means you can use oneUnitIn.samePosition(in: str) or
>> oneUnitIn.samePosition(in: str.unicodeScalars) to find out if it's on a
>> character or Unicode scalar boundary.
>
> I’m sorry, I completely missed that. This part of the question is withdrawn.
>
> I’m also concerned about putting “UTF-16” in the documentation for
> encodedOffset. Either it’s a ‘utf16Offset’ or it isn’t

It is today; hopefully it won't be someday.

> ; if it’s an opaque value then it should be treated as such.

Today a String has underlying UTF-16-compatible storage, and that's
documented as such, but we intend to lift that restriction and don't want
the names to lock us into semantics.

> (It’s also a little disturbing that round-tripping through
> encodedOffset isn’t guaranteed to give you the same index back.)

Define “same.” The encodedOffset is not the full value of an *arbitrary*
index, and doesn't claim to be. The indices that can be serialized and
reconstructed exactly using encodedOffset are those that fall on code unit
boundaries. Today, that means everything but UTF-8 indices. We could
consider exposing the transcodedOffset (the offset within the UTF-8
encoding of the scalar) as well, but I want to be conservative.
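
For illustration, a minimal sketch of the boundary tests using oneUnitIn from the example in this thread. It assumes a Swift 5 or later toolchain, where String storage is UTF-8 rather than the UTF-16-compatible storage discussed here, but samePosition(in:) still exists. The helper roundedDown(_:in:isBoundary:) is hypothetical: it is not a standard-library API or the proposal's implementation, just one way to express “round down to the previous boundary” by stepping backward through the UTF-8 view until a boundary test accepts the index.

```swift
// Hypothetical helper, not a standard-library API: steps backward one
// UTF-8 code unit at a time until `isBoundary` accepts the index,
// approximating the "round down to the previous boundary" rule.
func roundedDown(_ i: String.Index, in s: String,
                 isBoundary: (String.Index) -> Bool) -> String.Index {
    var j = i
    // startIndex is a boundary in every view, so stop there at the latest.
    while j > s.startIndex && !isBoundary(j) {
        j = s.utf8.index(before: j)
    }
    return j
}

let str = "言"  // U+8A00: three UTF-8 code units, one UTF-16 code unit
let oneUnitIn = str.utf8.index(after: str.utf8.startIndex)

// samePosition(in:) returns nil when the index is not a valid position
// in the target view, so it doubles as a boundary test.
let onCharacter = oneUnitIn.samePosition(in: str) != nil
let onScalar = oneUnitIn.samePosition(in: str.unicodeScalars) != nil

// Round the mid-scalar UTF-8 index down to the previous scalar boundary.
let toScalar = roundedDown(oneUnitIn, in: str) {
    $0.samePosition(in: str.unicodeScalars) != nil
}
print(onCharacter, onScalar, toScalar == str.startIndex)  // false false true
```

Passing `{ $0.samePosition(in: str) != nil }` as the boundary test instead would round down to a Character boundary.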
-- 
-Dave

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution
