Re: Swift: How to determine if a Character represents whitespace?
I have no idea how a linguistic tagger determines whitespace, or whether it uses the same definition of whitespace as NSCharacterSet does. Given that it's multi-language-aware, I wouldn't be shocked to find it uses some entirely different way of enumerating textual elements.

On 6 Apr 2015, at 20:29, Gerriet M. Denkmann <gerri...@icloud.com> wrote:

> On 4 Apr 2015, at 16:13, cocoa-dev-requ...@lists.apple.com wrote:
>
>> ok here’s my try, assuming NSLinguisticTagger knows what it’s doing. And yes it’s a bit stupid to use a linguistic tagger to do something like that but .. whatever
>
> Linguistic Tagger should use the same definition for white as NSCharacterSet.whitespaceCharacterSet.
> If this is so, this would work for all characters (even if their Unicode code point does NOT fit into an unsigned short):
>
>     import Cocoa
>
>     let whiteSet = NSCharacterSet.whitespaceCharacterSet()
>     let testString = ...
>     var i : Int = 0
>     for scalar in testString.unicodeScalars {
>         let uChar : UTF32Char = scalar.value
>         let isWhite = whiteSet.longCharacterIsMember(uChar)
>         let note = isWhite ? "whiteSpace" : "non white"
>         var stringWithScalar = ""
>         stringWithScalar.append(scalar)
>         let indexFormated = NSString(format: "%2d", i++)
>         let codePoint = scalar.value // UInt32
>         let hexFormated = NSString(format: "%#07x", codePoint)
>         println("codePoint[" + indexFormated + "] = " + hexFormated + note + stringWithScalar)
>     }
>
> Gerriet.

___
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
Please do not post admin requests or moderator comments to the list. Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com
This email sent to arch...@mail-archive.com
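The scalar-by-scalar check quoted above can be sketched in current Swift naming, where `CharacterSet.contains(_:)` takes a `Unicode.Scalar` directly. This is only an illustration under assumptions: it uses the post-Swift-3 Foundation names rather than the Swift 1.x API from the thread, and `describeScalars` and the sample string are made up for the example.

```swift
import Foundation

// Hypothetical helper (not from the thread): pair each Unicode scalar of
// `text` with a flag saying whether it is in CharacterSet.whitespaces.
func describeScalars(in text: String) -> [(scalar: Unicode.Scalar, isWhite: Bool)] {
    let whiteSet = CharacterSet.whitespaces
    return text.unicodeScalars.map { (scalar: $0, isWhite: whiteSet.contains($0)) }
}

// Sample input: a letter, a space, and a scalar outside the 16-bit range.
for (index, entry) in describeScalars(in: "a \u{1F600}").enumerated() {
    let hex = String(entry.scalar.value, radix: 16, uppercase: true)
    let note = entry.isWhite ? "whitespace" : "non-white"
    print("codePoint[\(index)] = U+\(hex) \(note)")
}
```

Because the test is made per scalar, a code point that needs two UTF-16 units is still examined as one value.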
Re: Swift: How to determine if a Character represents whitespace?
I imagine you’re right, that they’re NSString indexes packaged up into a frustrating return type.

After sleeping on it, though, I imagined that even if complex grapheme clusters WERE to make count( attrStr.string ) return a different result than attrStr.length, it would probably never be due to whitespace.

So if I go back to Charles Srstka’s original suggestion, where you pull off one character at a time, convert it to a 1-Character string, and then test for whitespace membership, I should be able to count leading and trailing whitespace characters and then do math based on attrStr.length to create the range.

Here’s my current playground:

    import Cocoa

    extension Character {
        func isMemberOfSet( set:NSCharacterSet ) -> Bool {
            // The for loop only executes once;
            // its purpose is to convert Character to a type
            // you can actually do something with
            for char in String( self ).utf16 {
                if set.characterIsMember( char ) {
                    return true
                }
            }
            return false
        }
    }

    var result:NSRange
    let whitespace = NSCharacterSet.whitespaceAndNewlineCharacterSet()
    let attrStr = NSAttributedString( string: "Fourscore and seven years ago... \n\n \t\t " )
    let str = attrStr.string

    var headCount = 0
    var tailCount = 0
    var startIx = str.startIndex
    var endIx = str.endIndex

    while endIx > startIx && str[ endIx.predecessor() ].isMemberOfSet( whitespace ) {
        ++tailCount
        endIx = endIx.predecessor()
    }

    if endIx > startIx {
        while str[ startIx ].isMemberOfSet( whitespace ) {
            ++headCount
            startIx = startIx.successor()
        }
        let length = attrStr.length - ( headCount + tailCount )
        result = NSRange( location:headCount, length:length )
    } else {
        // String was empty or all whitespace
        result = NSRange( location:0, length:0 )
    }

    let resultString = attrStr.attributedSubstringFromRange( result )

-- Charles

On April 2, 2015 at 11:16:52 PM, Quincey Morris (quinceymor...@rivergatesoftware.com) wrote:

> On Apr 2, 2015, at 19:28 , Charles Jenkins <cejw...@gmail.com> wrote:
>
>> I can indeed call attrStr.string.rangeOfCharacterFromSet(). But in typical Swift string fashion, the return type is as unfriendly as possible: Range<String.Index>? -- as if the NSString were a Swift string.
>
> I finally read the whole of what you said here, and I had to run to a playground to check:
>
>     import Cocoa
>
>     var strA = "Hello?, String"
>     var strB = "Hello?, String" as NSString
>     var strC = "Hello\u{1f650}, String"
>     var strD = "Hello\u{1f650}, NSString" as NSString
>
>     var rangeA = strA.rangeOfCharacterFromSet(NSCharacterSet.whitespaceCharacterSet()) // {Some "7..<8"}
>     var rangeB = strB.rangeOfCharacterFromSet(NSCharacterSet.whitespaceCharacterSet()) // (7,1)
>     var rangeC = strC.rangeOfCharacterFromSet(NSCharacterSet.whitespaceCharacterSet()) // {Some "8..<9"}
>     var rangeD = strD.rangeOfCharacterFromSet(NSCharacterSet.whitespaceCharacterSet()) // (8,1)
>
> So, yes, these are NSString indexes all the way, even if the result is packaged as a Range<String.Index>.
Re: Swift: How to determine if a Character represents whitespace?
>     extension Character {
>         func isMemberOfSet( set:NSCharacterSet ) -> Bool {
>             // The for loop only executes once;
>             // its purpose is to convert Character to a type
>             // you can actually do something with
>             for char in String( self ).utf16 {
>                 if set.characterIsMember( char ) {
>                     return true
>                 }
>             }
>             return false
>         }
>     }

I believe your comment that the loop executes once is incorrect. It may execute more than once when the Character is a composed character that maps to multiple UTF-16 code units.

Example (stolen from this link):
http://stackoverflow.com/questions/27697508/nscharacterset-characterismember-with-swifts-character-type

    let acuteA: Character = "\u{e1}"                // An a with an accent
    let acuteAComposed: Character = "\u{61}\u{301}" // Also an a with an accent

Both are a single Character. The latter will cause the loop to iterate twice.

Marc
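The point about iteration counts can be checked directly by counting UTF-16 code units per Character. The following is a small sketch in current Swift syntax; the helper name `utf16UnitCount` and the extra non-BMP example are assumptions added for illustration.

```swift
// Hypothetical helper: number of UTF-16 code units needed for one Character,
// i.e. how many times a `for char in String(c).utf16` loop would iterate.
func utf16UnitCount(_ c: Character) -> Int {
    return String(c).utf16.count
}

let acuteA: Character = "\u{E1}"                 // precomposed a-acute
let acuteAComposed: Character = "\u{61}\u{301}"  // "a" + combining acute accent
let nonBMP: Character = "\u{1F600}"              // one scalar, needs a surrogate pair

print(utf16UnitCount(acuteA))
print(utf16UnitCount(acuteAComposed))
print(utf16UnitCount(nonBMP))
print(acuteA == acuteAComposed)  // Swift compares Characters by canonical equivalence
```

So two Characters that compare equal can still drive a UTF-16 loop a different number of times, and a single code point outside the 16-bit range also produces two iterations.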
Re: Swift: How to determine if a Character represents whitespace?
On Apr 3, 2015, at 04:00 , Charles Jenkins <cejw...@gmail.com> wrote:

>     for char in String( self ).utf16 {
>         if set.characterIsMember( char ) {
>             return true
>         }

Now we’re back to the place we started. This code is wrong. It fails for any code point that isn’t representable as a single UTF-16 code value, and it fails for any grapheme that isn’t representable as a single code point.

This is what I would do (playground version):

    import Cocoa

    let notWhitespace = NSCharacterSet.whitespaceAndNewlineCharacterSet().invertedSet
    let attrStr = NSAttributedString( string: "Fourscore and seven years ago... \n\n \t\t " )
    let str = attrStr.string as NSString

    let startRange = str.rangeOfCharacterFromSet(notWhitespace, options: NSStringCompareOptions.allZeros)
    let endRange = str.rangeOfCharacterFromSet(notWhitespace, options: NSStringCompareOptions.BackwardsSearch)

    let startIndex = startRange.length != 0 ? startRange.location : 0
    let endIndex = endRange.length != 0 ? endRange.location + 1 : str.length

    let resultRange = NSRange (location: startIndex, length: endIndex - startIndex)
    let resultStr = attrStr.attributedSubstringFromRange (resultRange)

It’s the Obj-C code, just written in Swift. The ‘as NSString’ in the 3rd line makes it work.

The practical difficulty in your original approach is that (e.g.) String.rangeOfCharacterFromSet returns a Range<String.Index>, but AFAICT that isn’t convertible back to a NSRange, or even just integer indexes. At the same time, AFAICT it isn’t useful with a String because it doesn’t contain Character indexes, just unichar indexes, which have no meaning for a String in general.

Actually, my testing is with Swift 1.1, since I’m not in a position to move to Xcode 6.3 yet. It’s possible that the results are different in Swift 1.2. However, in the section of the release notes that talks about bridging between String and NSString, it says:

> Note that these Cocoa types in Objective-C headers are still automatically bridged to their corresponding Swift type

so I suspect the results would be the same in 1.2.

It seems to me there is an actual bug here:

“String methods corresponding to NSString methods that return NSRange values actually return Range<String.Index> values, but these are not valid ranges, either for String objects (they represent UTF-16 code value positions, not Character positions) or for NSString objects (they’re not convertible back to NSRange). The String methods ought to return NSRange values just like their NSString counterparts.”
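For readers on a later toolchain, the same NSString-based trimming can be sketched with the renamed Foundation API (`rangeOfCharacter(from:options:)`, `attributedSubstring(from:)`). This is an illustration, not the code from the thread: `trimmedRange(of:)` is a made-up helper name, the sample string is invented, and the end of the range is taken from `NSMaxRange` of the last match in case that match spans more than one UTF-16 unit.

```swift
import Foundation

// Hypothetical helper: the NSRange of `attrStr` with leading and trailing
// whitespace and newlines excluded, computed entirely in NSString (UTF-16) terms.
func trimmedRange(of attrStr: NSAttributedString) -> NSRange {
    let notWhitespace = CharacterSet.whitespacesAndNewlines.inverted
    let str = attrStr.string as NSString

    // First non-whitespace character, searching forwards.
    let first = str.rangeOfCharacter(from: notWhitespace)
    guard first.location != NSNotFound else {
        return NSRange(location: 0, length: 0)  // empty or all whitespace
    }

    // Last non-whitespace character, searching backwards.
    let last = str.rangeOfCharacter(from: notWhitespace, options: .backwards)
    return NSRange(location: first.location,
                   length: NSMaxRange(last) - first.location)
}

let attrStr = NSAttributedString(string: "  Fourscore and seven years ago... \n\n \t\t ")
let trimmed = attrStr.attributedSubstring(from: trimmedRange(of: attrStr))
print(trimmed.string)
```

Only two character-set searches are made, and the attributed string is copied once, when the substring is extracted.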
Re: Swift: How to determine if a Character represents whitespace?
On Apr 3, 2015, at 11:04 AM, Quincey Morris <quinceymor...@rivergatesoftware.com> wrote:

> On Apr 3, 2015, at 04:00 , Charles Jenkins <cejw...@gmail.com> wrote:
>
>>     for char in String( self ).utf16 {
>>         if set.characterIsMember( char ) {
>>             return true
>>         }
>
> Now we’re back to the place we started. This code is wrong. It fails for any code point that isn’t representable a single UTF-16 code value, and it fails for any grapheme that isn’t representable as a single code point.

No it doesn't. Give it a test.

    let acuteA: Character = "\u{e1}"                // An a with an accent
    let acuteAComposed: Character = "\u{61}\u{301}" // Also an a with an accent

    func howManyChars( c: Character) -> Int {
        var count = 0
        for char in String( c ).utf16 {
            count += 1
        }
        return count
    }

    howManyChars(acuteA)         // returns 1
    howManyChars(acuteAComposed) // returns 2

The original code will return true only if all code points map to white space.

Marc
Re: Swift: How to determine if a Character represents whitespace?
On Apr 3, 2015, at 11:19 , Marco S Hyman <m...@snafu.org> wrote:

> The original code will return true only if all code points map to white space.

The “failure” I was talking about is something a bit different. It has two problems:

1. For Unicode code points that are represented by 2 code values, it tests the code values, not the code points. That’s wrong.

2. For graphemes that are represented by 2 or more code points, it still tests the code values, of which there could be 4 or more per grapheme. That’s also wrong.

With the ‘for char in String (self)’ code, if you tested whether a decomposed acuteA was in the (7-bit) ASCII character set, you’d get the answer “YES”.

You could mitigate #1 by using UTF-32 code values instead of UTF-16, but that wouldn’t help with #2.
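One way to see the difference between testing code values and testing code points is to require that every Unicode scalar of a Character be in the set. The sketch below uses current Swift (`Character.unicodeScalars`, `allSatisfy`), which postdates the thread; `isComposedEntirely(of:)` is a made-up name. It addresses problem #1 only. Whether "all scalars are members" is the right rule for a multi-scalar grapheme (problem #2) remains a judgment call.

```swift
import Foundation

extension Character {
    // Hypothetical helper (not from the thread): true only when every
    // Unicode scalar in this grapheme cluster belongs to `set`.
    func isComposedEntirely(of set: CharacterSet) -> Bool {
        return unicodeScalars.allSatisfy { set.contains($0) }
    }
}

// 7-bit ASCII as a CharacterSet, built from an explicit scalar range.
let firstASCII: Unicode.Scalar = "\u{0}"
let lastASCII: Unicode.Scalar = "\u{7F}"
let ascii = CharacterSet(charactersIn: firstASCII...lastASCII)

let decomposedAcuteA: Character = "\u{61}\u{301}"  // "a" + combining acute accent
let crlf: Character = "\r\n"                        // one Character, two scalars

// U+0301 is outside ASCII, so the decomposed a-acute is rejected.
print(decomposedAcuteA.isComposedEntirely(of: ascii))
// CR and LF are both in the whitespace-and-newline set.
print(crlf.isComposedEntirely(of: .whitespacesAndNewlines))
```

Swift 5 also provides `Character.isWhitespace` in the standard library, which may be simpler when the only question is whitespace.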
Re: Swift: How to determine if a Character represents whitespace?
So my Character.isMemberOfSet() is a poor general-purpose method, and I need to ditch it.

I like your code. I had to modify it a bit so it wouldn’t fail on strings composed entirely of whitespace:

    let testString = " \n\n \t\t "
    let attrStr = NSAttributedString( string:testString )
    let str = attrStr.string as NSString

    let notWhitespace = NSCharacterSet.whitespaceAndNewlineCharacterSet().invertedSet

    var resultRange:NSRange
    let startRange = str.rangeOfCharacterFromSet( notWhitespace, options:NSStringCompareOptions.allZeros )
    if startRange.length > 0 {
        let endRange = str.rangeOfCharacterFromSet( notWhitespace, options:NSStringCompareOptions.BackwardsSearch )
        let startIndex = startRange.location
        let endIndex = endRange.location + endRange.length
        resultRange = NSRange( location:startIndex, length:endIndex - startIndex )
    } else {
        // String is empty or all whitespace
        resultRange = NSRange( location:0, length:0 )
    }

    let resultStr = attrStr.attributedSubstringFromRange( resultRange )

So, even though attrStr.string returns an NSString, you have to use the “as” to explicitly keep the type and be able to do math on range indexes. Lacking that cast is what made rangeOfCharacterFromSet() useless to me yesterday.

Your code seems way better. But is there a way in the playground for us to test addresses to make sure attrStr.string as NSString doesn’t perform a copy?

-- Charles

On April 3, 2015 at 2:04:01 PM, Quincey Morris (quinceymor...@rivergatesoftware.com) wrote:

> On Apr 3, 2015, at 04:00 , Charles Jenkins <cejw...@gmail.com> wrote:
>
>>     for char in String( self ).utf16 {
>>         if set.characterIsMember( char ) {
>>             return true
>>         }
>
> Now we’re back to the place we started. This code is wrong. It fails for any code point that isn’t representable a single UTF-16 code value, and it fails for any grapheme that isn’t representable as a single code point.
Re: Swift: How to determine if a Character represents whitespace?
On Apr 3, 2015, at 13:18 , Charles Jenkins <cejw...@gmail.com> wrote:

> is there a way in the playground for use to test addresses to make sure attrStr.string as NSString doesn’t perform a copy?

I doubt it. This is the best I could come up with in a couple of minutes:

    import Cocoa

    let notWhitespace = NSCharacterSet.whitespaceAndNewlineCharacterSet().invertedSet
    let count = 50
    let aString: String = String (count: count, repeatedValue: Character ("A"))
    let aNSString: NSString = ("" as NSString).stringByPaddingToLength (count, withString: "A", startingAtIndex: 0)

    let date1 = NSDate ()
    let bString: String = aNSString as String
    let date2 = NSDate ()
    let time2 = date2.timeIntervalSinceDate(date1)

    let date3 = NSDate ()
    let bNSString: NSString = aString as NSString
    let date4 = NSDate ()
    let time4 = date4.timeIntervalSinceDate(date3)

    let attrStr = NSAttributedString (string: aNSString)

    let date5 = NSDate ()
    let range5 = attrStr.string.rangeOfCharacterFromSet(notWhitespace, options: NSStringCompareOptions.allZeros)
    let date6 = NSDate ()
    let time6 = date6.timeIntervalSinceDate(date5)

    let date7 = NSDate ()
    let range7 = (attrStr.string as NSString).rangeOfCharacterFromSet(notWhitespace, options: NSStringCompareOptions.allZeros)
    let date8 = NSDate ()
    let time8 = date8.timeIntervalSinceDate(date7)

Playground results:

    time2: 0.3328
    time4: 0.1817
    time6: 0.0022
    time8: 0.0017

Since the “rangeOfCharacter” scans terminate at the first character, this suggests that there’s no real conversion in the last case, which is the one you’re interested in. (Also, time6 and time8 don’t vary with the value of ‘count’.)

However, generalizing from this seems treacherous. And I may have just Done It Wrong™.
Re: Swift: How to determine if a Character represents whitespace?
ok here’s my try, assuming NSLinguisticTagger knows what it’s doing. And yes it’s a bit stupid to use a linguistic tagger to do something like that but .. whatever

    var str = "Some String WIth Whitespace "
    var lt = NSLinguisticTagger( tagSchemes: [NSLinguisticTagSchemeTokenType], options: 0 )
    lt.string = str
    var endsWithWhitespace = ( lt.tagAtIndex( (str as NSString).length-1, scheme: NSLinguisticTagSchemeTokenType, tokenRange: nil, sentenceRange: nil ) == NSLinguisticTagOtherWhitespace )
Re: Swift: How to determine if a Character represents whitespace?
On Apr 2, 2015, at 19:28 , Charles Jenkins <cejw...@gmail.com> wrote:

> I can indeed call attrStr.string.rangeOfCharacterFromSet(). But in typical Swift string fashion, the return type is as unfriendly as possible: Range<String.Index>? -- as if the NSString were a Swift string.

I finally read the whole of what you said here, and I had to run to a playground to check:

    import Cocoa

    var strA = "Hello?, String"
    var strB = "Hello?, String" as NSString
    var strC = "Hello\u{1f650}, String"
    var strD = "Hello\u{1f650}, NSString" as NSString

    var rangeA = strA.rangeOfCharacterFromSet(NSCharacterSet.whitespaceCharacterSet()) // {Some "7..<8"}
    var rangeB = strB.rangeOfCharacterFromSet(NSCharacterSet.whitespaceCharacterSet()) // (7,1)
    var rangeC = strC.rangeOfCharacterFromSet(NSCharacterSet.whitespaceCharacterSet()) // {Some "8..<9"}
    var rangeD = strD.rangeOfCharacterFromSet(NSCharacterSet.whitespaceCharacterSet()) // (8,1)

So, yes, these are NSString indexes all the way, even if the result is packaged as a Range<String.Index>.
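The gap between UTF-16 positions and Character positions in the playground above can be made explicit on a later toolchain, where Foundation provides `NSRange(_:in:)` for converting a `Range<String.Index>` into an `NSRange`. That initializer did not exist in the Swift 1.x releases discussed in this thread, so the following is a sketch of how the same experiment looks in current Swift, not a description of 2015 behaviour.

```swift
import Foundation

// "Hello", one code point outside the 16-bit range, then ", String".
let strC = "Hello\u{1F650}, String"

// Offsets of the first whitespace character, measured two ways.
var utf16Location = NSNotFound   // NSString-style (UTF-16 code units)
var characterOffset = -1         // Swift-style (Characters / graphemes)

if let range = strC.rangeOfCharacter(from: .whitespaces) {
    // Convert the Swift range into an NSRange over the same string.
    let nsRange = NSRange(range, in: strC)
    utf16Location = nsRange.location
    characterOffset = strC.distance(from: strC.startIndex, to: range.lowerBound)
}

print("UTF-16 offset: \(utf16Location), Character offset: \(characterOffset)")
```

The two offsets differ by one because U+1F650 occupies two UTF-16 code units but counts as a single Character.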
Re: Swift: How to determine if a Character represents whitespace?
On Apr 2, 2015, at 19:28 , Charles Jenkins <cejw...@gmail.com> wrote:

> So after doing two anchored searches, one at the beginning and one at the end of the string, if I get two different ranges, I’m stuck with two values that aren’t subtractable to determine the length of the NSRange I need in a call to attributedSubstringFromRange().

Not at all. All of this API is *NSString* API, even if the instance happens to be String instead of NSString, so the ranges are NSString-compatible ranges (i.e. UTF-16 code value ranges), so you can just do the subtraction and use the result in attributedSubstringFromRange.

> I think the safest thing for me to do for attributed string compatibility is give up on Swift purity and put my range-trimming function in an Objective-C file.

Again, it’s all NSString API, so the results are what the Obj-C API would return. Otherwise, interoperability wouldn’t work.

If, additionally, you cast any String-returning result ‘as’ NSString, then you literally are doing Obj-C, though it happens to be created by the Swift compiler. That is to say, it’s going to make Obj-C-style method calls with an Obj-C-NSString-style object as receiver, so the source language is irrelevant. (!)

That, or I’ve run wildly off the rails.
Re: Swift: How to determine if a Character represents whitespace?
Amen, brother.

Given my attributed string “attrStr,” I can indeed call attrStr.string.rangeOfCharacterFromSet(). But in typical Swift string fashion, the return type is as unfriendly as possible: Range<String.Index>? -- as if the NSString were a Swift string.

So after doing two anchored searches, one at the beginning and one at the end of the string, if I get two different ranges, I’m stuck with two values that aren’t subtractable to determine the length of the NSRange I need in a call to attributedSubstringFromRange().

I think the safest thing for me to do for attributed string compatibility is give up on Swift purity and put my range-trimming function in an Objective-C file.

-- Charles

On April 2, 2015 at 2:15:07 PM, Quincey Morris (quinceymor...@rivergatesoftware.com) wrote:

> On Apr 2, 2015, at 04:54 , Charles Jenkins <cejw...@gmail.com> wrote:
>
>> Swift has a built-in func stringByTrimmingCharactersInSet(set: NSCharacterSet) -> String
>
> There is something wacky going on here, or not. (I know you don’t want to use this particular method, but I’m just using it as an example.)
>
> First of all, String and NSString are different classes, for real. Quoting a god-like personage, in a recent thread:
>
>> On Mar 23, 2015, at 13:52 , Greg Parker <gpar...@apple.com> wrote:
>>
>> Most of NSString's methods are re-implemented in a Swift extension on class String. You get this extension when you `import Cocoa`.
>
> And indeed, if you try this in a playground:
>
>     let strA = "Hello, string"
>     let strB = "Hello, NSString" as NSString
>     let a = strA.characterAtIndex (6) // line 3
>     let b = strB.characterAtIndex (6) // line 4
>
> you get an error at line 3, as you would expect/hope (since Strings aren’t “made of” unichars), but no error in line 4 (since NSStrings are).
>
> So it’s not odd that String.stringByTrimmingCharactersInSet would return a String. What’s very odd is that *in Swift* NSString.stringByTrimmingCharactersInSet returns a String, not a NSString, as does NSAttributedString.string, or apparently any Cocoa API that would return a NSString in Obj-C. This means it’s not possible *in Swift* to apply NSString methods to a NSString and stay entirely within the NSString world without casting/converting.
>
> *That’s* wacky, given that String and NSString are different classes with different (though very similar) APIs.
>
> The only way to un-wack this, that I can think of right now, would be for expressions like ‘someNSString.stringByTrimmingCharactersInSet (…) as NSString’ to involve only a cheap or free conversion from String to NSString. However there is no API contract to this effect AFAIK.
>
> Therefore:
>
> 1. We need a god-like personage to step in and un-wack this for real.
>
> 2. Subject to the outcome of #1, you can approach this entirely in the NSString world, in which case I like Uli’s suggestion, applied to ‘yourAttributedString.string as NSString’. You’d have to verify by performance testing that massive conversions aren’t being made.
Re: Swift: How to determine if a Character represents whitespace?
I kept my original question as brief as I could, but let me tell you what problem I’m trying to solve, and maybe someone will have good advice I haven’t yet considered.

I’m trying to code in pure Swift. I have an NSAttributedString which can potentially be very large, and I want to save off the attributedSubstringFromRange: which represents the string with leading and trailing whitespace trimmed. I’m trying to avoid copying the giant string merely to determine the proper substring range for copying it again.

Swift has a built-in func stringByTrimmingCharactersInSet(set: NSCharacterSet) -> String which won’t help me because using it would copy the string and discard the attributes. Even using it for length-testing wouldn’t work, because I have no way to know how many characters were trimmed off the head versus the tail of the string.

What would be nice is a way to count leading and trailing characters in place while the thing is still an NSAttributedString--without using NSAttributedString.string to convert to a Swift string in the first place. If there were no conversion to the unicode-compliant and amazingly difficult-to-do-anything-with-it Swift string, I’d be more confident that the shrunken range I calculate would be apples to apples.

-- Charles

On April 2, 2015 at 01:25:40, Quincey Morris (quinceymor...@rivergatesoftware.com) wrote:

> On Apr 1, 2015, at 21:17 , Charles Jenkins <cejw...@gmail.com> wrote:
>
>>     for ch in String(char).utf16 {
>>         if !set.characterIsMember(ch) { found = false }
>>     }
>
> Except that this code can’t possibly be right, in general.
>
> 1. A ‘unichar’ is a UTF-16 code value, but it’s not a Unicode code point. Some UTF-16 code values have no meaning as “characters” by themselves. I think you could mitigate this problem by using ‘longCharacterIsMember’, which takes a UTF-32 code value instead (and enumerating the string as UTF-32 instead of UTF-16).
>
> 2. A Swift ‘Character’ isn’t a Unicode code point, but rather a grapheme. That is, it might be a sequence of code points (and I mean code points, not code values). It might be such a sequence either because there’s no way of representing the grapheme by a single code point, or because it’s a composed character made up of a base code point and some combining characters.
>
> In this case, you can’t validly test the individual code points for membership of the character set. I’m not sure, but I suspect the underlying obstacle is that NSCharacterSet is at best a set of code points, and you cannot test a grapheme for membership of a set of code points.
>
> In your particular application, if it’s true that all** Unicode whitespace characters are represented as a single code point (via a single UTF-32 code value), or a single UTF-16 code value, then you can get away with one of the above solutions. Otherwise you’re going to need a more complex solution, that doesn’t involve NSCharacterSet at all.
>
> ** Or at least the ones you happen to care about, but ignoring the others may be a perilous proceeding.
Re: Swift: How to determine if a Character represents whitespace?
On 02 Apr 2015, at 13:54, Charles Jenkins <cejw...@gmail.com> wrote:

> What would be nice is a way to count leading and trailing characters in place while the thing is still an NSAttributedString--without using NSAttributedString.string to convert to a Swift string in the first place. If there were no conversion to the unicode-compliant and amazingly difficult-to-do-anything-with-it Swift string, I’d be more confident that the shrunken range I calculate would be apples to apples.

Does Swift have an equivalent to rangeOfCharacterFromSet:options: or would that require converting it to NSString? Because you could just generate the inverse NSCharacterSet to the whitespace character set, and then look for the first (NSAnchoredSearch) and last (NSAnchoredSearch | NSBackwardsSearch) non-whitespace character, and then extract only the range between those two offsets.

Wildly guessing,
-- Uli Kusterer
http://stacksmith.org
Re: Swift: How to determine if a Character represents whitespace?
On Apr 2, 2015, at 6:54 AM, Charles Jenkins <cejw...@gmail.com> wrote:

> What would be nice is a way to count leading and trailing characters in place while the thing is still an NSAttributedString--without using NSAttributedString.string to convert to a Swift string in the first place.

NSAttributedString.string does not involve a conversion. The underlying string is part of NSAttributedString's data model. The documentation for the method explicitly says, "For performance reasons, this property returns the current backing store of the attributed string object."

I don't know if there's a conversion to create a Swift string from that, but you don't have to. I believe you can work with NSString in Swift.

Regards,
Ken
Re: Swift: How to determine if a Character represents whitespace?
On Apr 2, 2015, at 4:54 AM, Charles Jenkins <cejw...@gmail.com> wrote:

> What would be nice is a way to count leading and trailing characters in place while the thing is still an NSAttributedString--without using NSAttributedString.string to convert to a Swift string in the first place. If there were no conversion to the unicode-compliant and amazingly difficult-to-do-anything-with-it Swift string, I’d be more confident that the shrunken range I calculate would be apples to apples.

Use NSString.rangeOfCharacterFromSet() on the attributed string’s underlying NSString. Don’t use any native Swift String character accessors, because the character positions aren’t going to agree with NSString since they use different interpretations of Unicode.

--Jens
Re: Swift: How to determine if a Character represents whitespace?
The documentation certainly says that, Ken, but stick this code in a playground and see that you can’t examine the characters via index no matter whether you assume it to be String or NSString:

    let whitespaceSet = NSCharacterSet.whitespaceAndNewlineCharacterSet()
    let attrStr = NSAttributedString( string: "Fourscore and seven years ago \n\n\n\t\t\t" )
    let str = attrStr.string

    var head = 0
    let tooFar = attrStr.length
    while head < tooFar {
        if whitespaceSet.characterIsMember( str.characterAtIndex( head ) ) {
            // Skip -- I did it this way so the error message received from the above line will be clear
        } else {
            break;
        }
        ++head
    }

    var headIx = str.startIndex
    let tooFarIx = str.endIndex
    while headIx < tooFarIx {
        if whitespaceSet.characterIsMember( str[ headIx ] ) {
            // Skip
        } else {
            break;
        }
        headIx = headIx.successor()
    }

characterAtIndex() doesn’t work because it’s not available in Swift. If you replace str.characterAtIndex( head ) with str[ head ], you get the same error as in the version below it that incorrectly assumes it’s a Swift string: “Could not find overload of 'subscript' that accepts the supplied arguments.”

Now, I did just type this out on a computer running Xcode 6.2. At home I’m using 6.3 beta, so it’s possible I’ll get home and find one of these versions works as expected, even though I’m sure I tried both ways last night when I first hit the roadblock…

I’m now guessing that converting from NSString to String and examining characters via one of the UTF views might not involve a copy. But then how do I decide which view I should be using...

-- Charles

On April 2, 2015 at 08:44:52, Ken Thomases (k...@codeweavers.com) wrote:

> On Apr 2, 2015, at 6:54 AM, Charles Jenkins cejw...@gmail.com wrote:
>
>> What would be nice is a way to count leading and trailing characters in place while the thing is still an NSAttributedString--without using NSAttributedString.string to convert to a Swift string in the first place.
>
> NSAttributedString.string does not involve a conversion. The underlying string is part of NSAttributedString's data model. The documentation for the method explicitly says, "For performance reasons, this property returns the current backing store of the attributed string object." I don't know if there's a conversion to create a Swift string from that, but you don't have to. I believe you can work with NSString in Swift.
>
> Regards, Ken
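On the question of which view to use: one possible answer, sketched below in current Swift, is the `utf16` view, because its offsets count the same units as `NSAttributedString.length` and `NSRange`. The function names are hypothetical, and the whitespace test uses the standard library's Unicode `White_Space` property as a stand-in for `NSCharacterSet.whitespaceAndNewlineCharacterSet()`; the two definitions are assumed, not guaranteed, to agree for the characters of interest.

```swift
/// Hypothetical helper: counts how many UTF-16 code units at the start of
/// `units` are whitespace. Surrogate halves are never whitespace, so the
/// scan stops at the first one.
func whitespacePrefixLength<S: Sequence>(_ units: S) -> Int where S.Element == UInt16 {
    var count = 0
    for unit in units {
        guard let scalar = Unicode.Scalar(unit), scalar.properties.isWhitespace else { break }
        count += 1
    }
    return count
}

/// Hypothetical helper: location and length, in UTF-16 units, of the text
/// left after leading and trailing whitespace is excluded.
func trimmedUTF16Range(of text: String) -> (location: Int, length: Int) {
    let total = text.utf16.count
    let head = whitespacePrefixLength(text.utf16)
    if head == total { return (0, 0) }   // empty or all whitespace
    let tail = whitespacePrefixLength(text.utf16.reversed())
    return (head, total - head - tail)
}

let str = " \t Fourscore and seven years ago \n\n\n\t\t\t"
let range = trimmedUTF16Range(of: str)
print(range.location, range.length)
```

The resulting pair can be turned into an `NSRange(location:length:)` without any further conversion, which is the property the `Character`-based loop in the message above cannot offer.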
Re: Swift: How to determine if a Character represents whitespace?
Oops. My documentation viewer was set up wrong. characterAtIndex() is indeed supposed to be available in Swift. Don’t know what I’ve done wrong that I can’t use it in a playground.

-- Charles

On April 2, 2015 at 10:18:00, Charles Jenkins (cejw...@gmail.com) wrote:

> characterAtIndex() doesn’t work because it’s not available in Swift.
Re: Swift: How to determine if a Character represents whitespace?
On Apr 2, 2015, at 04:54 , Charles Jenkins cejw...@gmail.com wrote:

> Swift has a built-in func stringByTrimmingCharactersInSet(set: NSCharacterSet) -> String

There is something wacky going on here, or not. (I know you don’t want to use this particular method, but I’m just using it as an example.)

First of all, String and NSString are different types, for real. Quoting a god-like personage, in a recent thread:

On Mar 23, 2015, at 13:52 , Greg Parker gpar...@apple.com wrote:

> Most of NSString's methods are re-implemented in a Swift extension on class String. You get this extension when you `import Cocoa`.

And indeed, if you try this in a playground:

    let strA = "Hello, string"
    let strB = "Hello, NSString" as NSString
    let a = strA.characterAtIndex (6) // line 3
    let b = strB.characterAtIndex (6) // line 4

you get an error at line 3, as you would expect/hope (since Strings aren’t “made of” unichars), but no error in line 4 (since NSStrings are).

So it’s not odd that String.stringByTrimmingCharactersInSet would return a String. What’s very odd is that *in Swift* NSString.stringByTrimmingCharactersInSet returns a String, not a NSString, as does NSAttributedString.string, or apparently any Cocoa API that would return a NSString in Obj-C.

This means it’s not possible *in Swift* to apply NSString methods to a NSString and stay entirely within the NSString world without casting/converting. *That’s* wacky, given that String and NSString are different types with different (though very similar) APIs.

The only way to un-wack this, that I can think of right now, would be for expressions like ‘someNSString.stringByTrimmingCharactersInSet (…) as NSString’ to involve only a cheap or free conversion from String to NSString. However there is no API contract to this effect AFAIK.

Therefore:

1. We need a god-like personage to step in and un-wack this for real.

2. Subject to the outcome of #1, you can approach this entirely in the NSString world, in which case I like Uli’s suggestion, applied to 'yourAttributedString.string as NSString’. You’d have to verify by performance testing that massive conversions aren’t being made.
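The behaviour described above can be reproduced with a short sketch in current Swift, where `characterAtIndex` is spelled `character(at:)` and `stringByTrimmingCharactersInSet` is spelled `trimmingCharacters(in:)`. The variable names are illustrative only. It shows that NSString is indexed by UTF-16 code unit, that a Swift String needs its `utf16` view to do the same thing, and that trimming an NSString still hands back a Swift String.

```swift
import Foundation

let swiftString = "Hello, string"
let nsString = NSString(string: "Hello, NSString")

// NSString is "made of" UTF-16 code units, so an integer index works.
let unitFromNSString = nsString.character(at: 6)

// String has no integer subscript; the nearest equivalent goes through
// the utf16 view and an explicit index computation.
let utf16 = swiftString.utf16
let unitFromString = utf16[utf16.index(utf16.startIndex, offsetBy: 6)]

// Calling an NSString method from Swift still returns a Swift String,
// so staying in the NSString world needs an explicit conversion back.
let padded = NSString(string: "  padded  ")
let trimmed: String = padded.trimmingCharacters(in: .whitespacesAndNewlines)
let roundTrip = NSString(string: trimmed)

print(unitFromNSString, unitFromString, trimmed, roundTrip.length)
```

On Apple platforms the conversions written here as `NSString(string:)` are usually spelled `as NSString`. Whether that bridge copies the storage is not something this sketch can demonstrate; that still calls for the performance testing suggested above.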
Re: Swift: How to determine if a Character represents whitespace?
On Apr 1, 2015, at 21:17 , Charles Jenkins cejw...@gmail.com wrote:

>     for ch in String(char).utf16 {
>         if !set.characterIsMember(ch) { found = false }
>     }

Except that this code can’t possibly be right, in general.

1. A ‘unichar’ is a UTF-16 code unit, but it’s not a Unicode code point. Some UTF-16 code units (the surrogates) have no meaning as “characters” by themselves. I think you could mitigate this problem by using ‘longCharacterIsMember’, which takes a UTF-32 code unit instead (and enumerating the string as UTF-32 instead of UTF-16).

2. A Swift ‘Character’ isn’t a Unicode code point, but rather a grapheme. That is, it might be a sequence of code points (and I mean code points, not code units). It might be such a sequence either because there’s no way of representing the grapheme by a single code point, or because it’s a composed character made up of a base code point and some combining characters. In this case, you can’t validly test the individual code points for membership of the character set.

I’m not sure, but I suspect the underlying obstacle is that NSCharacterSet is at best a set of code points, and you cannot test a grapheme for membership of a set of code points.

In your particular application, if it’s true that all** Unicode whitespace characters are represented as a single code point (via a single UTF-32 code unit), or a single UTF-16 code unit, then you can get away with one of the above solutions. Otherwise you’re going to need a more complex solution that doesn’t involve NSCharacterSet at all.

** Or at least the ones you happen to care about, but ignoring the others may be a perilous proceeding.
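The three levels distinguished above (grapheme, code point, UTF-16 code unit) can be made concrete with a small sketch in current Swift. The helper name `unitCounts` and the sample strings are chosen for illustration only.

```swift
/// Hypothetical helper: reports how many graphemes (Swift Characters),
/// Unicode code points (scalars) and UTF-16 code units a string contains.
func unitCounts(_ s: String) -> (graphemes: Int, scalars: Int, utf16: Int) {
    return (s.count, s.unicodeScalars.count, s.utf16.count)
}

let composed = "e\u{301}"   // "e" followed by a combining acute accent
let emoji = "\u{1F600}"     // one code point outside the 16-bit range
let crlf = "\r\n"           // two code points that form a single Character

for sample in [composed, emoji, crlf] {
    print(unitCounts(sample))
}

// Point 1 above: half of a surrogate pair is not a code point at all,
// so the failable conversion to a scalar returns nil.
let firstUnit = emoji.utf16.first!
let loneSurrogate = Unicode.Scalar(firstUnit)
print(loneSurrogate == nil)
```

The composed sample is one Character built from two code points, which is the situation point 2 warns about; the emoji is one code point built from two code units, which is the situation point 1 warns about.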
Re: Swift: How to determine if a Character represents whitespace?
On Apr 1, 2015, at 8:14 PM, Charles Jenkins cejw...@gmail.com wrote:

> Given this code:
>
>     let someCharacter = str[str.endIndex.predecessor()]
>
> How can I determine if someCharacter is whitespace?

    import Foundation

    func isChar(char: Character, inSet set: NSCharacterSet) -> Bool {
        // this function is from an answer on StackOverflow:
        // http://stackoverflow.com/questions/27697508/nscharacterset-characterismember-with-swifts-character-type
        var found = true
        for ch in String(char).utf16 {
            if !set.characterIsMember(ch) { found = false }
        }
        return found
    }

    let str = "foo "
    let chr = str[str.endIndex.predecessor()]
    let isWhitespace = isChar(chr, inSet: NSCharacterSet.whitespaceAndNewlineCharacterSet()) // true

Charles
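For readers on a newer toolchain, here is a sketch of the same idea in current Swift. It is an adaptation, not the StackOverflow answer itself: the function name `isCharacter(_:in:)` is hypothetical, and it walks the Character's Unicode scalars instead of its UTF-16 code units so that surrogate halves are never tested on their own. It still treats a multi-scalar Character as a member only when every scalar is in the set, which is a policy choice rather than a Unicode rule.

```swift
import Foundation

/// Hypothetical helper: true when every Unicode scalar that makes up `ch`
/// belongs to `set`.
func isCharacter(_ ch: Character, in set: CharacterSet) -> Bool {
    return ch.unicodeScalars.allSatisfy { set.contains($0) }
}

let str = "foo "
if let last = str.last {
    let isWhitespace = isCharacter(last, in: .whitespacesAndNewlines)
    print(isWhitespace)
}
```

The Swift standard library has since gained a `Character.isWhitespace` property, which may be enough when no particular `CharacterSet` is required.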
Re: Swift: How to determine if a Character represents whitespace?
Thank you very much. :-) I had been trying to figure out how to use NSCharacterSet, but I didn’t know the bit about converting to UTF-16 string first.

-- Charles

On April 1, 2015 at 9:52:47 PM, Charles Srstka (cocoa...@charlessoft.com) wrote:

> On Apr 1, 2015, at 8:14 PM, Charles Jenkins cejw...@gmail.com wrote:
>
>> Given this code: let someCharacter = str[str.endIndex.predecessor()] How can I determine if someCharacter is whitespace?