On Wed, Sep 28, 2016 at 10:34 PM, Xiaodi Wu <[email protected]> wrote:
> On Wed, Sep 28, 2016 at 10:23 PM, Charles Srstka via swift-evolution <
> [email protected]> wrote:
>
>> On Sep 28, 2016, at 9:57 PM, Erica Sadun via swift-evolution <
>> [email protected]> wrote:
>>
>> D'erp. I missed that. And that's an unambiguous answer.
>>
>> So let me move on to part B of the pitch: I think CharacterSets are
>> broken.
>>
>> Xiaodi Wu: "isn't the problem you're presenting really an argument that
>> the type should be fleshed out to handle characters (grapheme clusters)
>> containing more than one Unicode scalar?"
>>
>> It seems that it already does handle such characters:
>>
>> (done in Objective-C so we can log the length of the range as a count of
>> UTF-16 code units)
>>
>> #import <Foundation/Foundation.h>
>>
>> int main(int argc, char *argv[]) {
>>     @autoreleasepool {
>>         NSCharacterSet *bikeSet = [NSCharacterSet
>>             characterSetWithCharactersInString:@"🚲"];
>>         NSString *str = @"foo🚲bar";
>>
>>         NSRange range = [str rangeOfCharacterFromSet:bikeSet];
>>
>>         NSLog(@"location: %lu length: %lu", range.location, range.length);
>>     }
>> }
>>
>> - - - - - - -
>>
>> *2016-09-28 22:20:00.622471 test[15577:2433912] location: 3 length: 2*
>> *Program ended with exit code: 0*
>>
>> - - - - - - -
>>
>> As we can see, the character from the set is recognized as consisting of
>> two code units. There are a few bugs in the system, though. See the
>> cocoa-dev thread “Where is my bicycle?” from about a year ago:
>> http://prod.lists.apple.com/archives/cocoa-dev/2015/Apr/msg00074.html
>>
>
> The bike emoji might be two code units, but it is one Unicode scalar
> (U+1F6B2). However, the Canadian flag emoji, for instance, is two Unicode
> scalars (U+1F1E8 U+1F1E6) but nonetheless one character.
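
The counts mentioned in the quoted paragraph can be checked with the standard library alone. A minimal sketch, assuming Swift 4 or later (where String is itself a collection of Character); the `describe` helper is hypothetical, made up for this illustration:

```swift
// Hypothetical helper: the length of a string at three different levels.
func describe(_ s: String) -> (utf16: Int, scalars: Int, characters: Int) {
    return (s.utf16.count, s.unicodeScalars.count, s.count)
}

let bike = describe("\u{1F6B2}")           // 🚲 bicycle
let flag = describe("\u{1F1E8}\u{1F1E6}")  // 🇨🇦 regional indicators C + A

// bike: 2 UTF-16 code units, 1 Unicode scalar, 1 character
// flag: 4 UTF-16 code units, 2 Unicode scalars, 1 character
print(bike)
print(flag)
```

So the bike only shows that code units above the BMP are handled; the flag is the case that separates scalars from characters.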
>

To illustrate in code how CharacterSet doesn't actually handle characters
made up of multiple Unicode scalars:

```
import Foundation

let str1 = "🇦🇩"
let first = CharacterSet(charactersIn: str1) // this actually crashes corelibs-foundation
let str2 = "🇦🇺"
let second = CharacterSet(charactersIn: str2)
let intersection = first.intersection(second)
print(intersection.isEmpty) // actual output: false
// obviously, if we were really dealing with characters, the intersection should be empty
```

> Charles
>>
>>
>> _______________________________________________
>> swift-evolution mailing list
>> [email protected]
>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>
>>
>
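
For contrast, here is a sketch of the same comparison done with plain standard-library sets instead of CharacterSet, once per Unicode scalar and once per Character. This is not how Foundation implements anything; it only shows where the shared element comes from. The names (`scalars1`, `chars1`, and so on) are made up for this example, and it assumes Swift 4 or later, where String is a collection of Character:

```swift
let str1 = "\u{1F1E6}\u{1F1E9}"  // 🇦🇩 regional indicators A + D
let str2 = "\u{1F1E6}\u{1F1FA}"  // 🇦🇺 regional indicators A + U

// Scalar level: each flag contributes its two regional indicator scalars.
let scalars1 = Set(str1.unicodeScalars)
let scalars2 = Set(str2.unicodeScalars)
let sharedScalars = scalars1.intersection(scalars2)

// Character level: each flag is a single grapheme cluster.
let chars1 = Set(str1)
let chars2 = Set(str2)
let sharedCharacters = chars1.intersection(chars2)

// The scalar sets overlap in U+1F1E6 (regional indicator A) ...
print(sharedScalars.map { String($0.value, radix: 16, uppercase: true) })
// ... while the character sets have nothing in common.
print(sharedCharacters.isEmpty)
```

The scalar-level result matches what the CharacterSet example reports, which suggests CharacterSet is behaving like a set of Unicode scalars rather than a set of characters.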
