Re: How to convert a UTF-8 byte offset into an NSString character offset?

2014-05-07 Thread Uli Kusterer
On 06 May 2014, at 20:12, Quincey Morris quinceymor...@rivergatesoftware.com wrote:
> FWIW, my opinion is that if your library clients are specifying UTF-8 sequences at the API, and expect byte offsets into those sequences to be meaningful, you might well be forced to maintain the original

Re: How to convert a UTF-8 byte offset into an NSString character offset?

2014-05-06 Thread Jens Alfke
On May 5, 2014, at 10:19 PM, Stephen J. Butler stephen.but...@gmail.com wrote:
> What's your next step after doing the UTF8 to UTF16 range conversion? If it's just going to be -[NSString substringWithRange:] then I'd strongly suggest just doing -[NSString initWithBytes:length:encoding:] on

Re: How to convert a UTF-8 byte offset into an NSString character offset?

2014-05-06 Thread Mark Munz
No, it would probably be to highlight that range of the string in a text view, which does require knowing the character range. Maybe you could take each of the ranges returned and create a string from the UTF8 byte stream; search for it in the original string; the results giving you the range

Re: How to convert a UTF-8 byte offset into an NSString character offset?

2014-05-06 Thread Quincey Morris
On May 5, 2014, at 12:06, Jens Alfke j...@mooseyard.com wrote:
> How can I map a byte offset in a UTF-8 string back to the corresponding character offset in the NSString it came from?
I’ve been thinking about this since your original question, and it seems to me that this is a subtler problem

Re: How to convert a UTF-8 byte offset into an NSString character offset?

2014-05-06 Thread Jens Alfke
On May 6, 2014, at 11:12 AM, Quincey Morris quinceymor...@rivergatesoftware.com wrote:
> I’ve been thinking about this since your original question, and it seems to me that this is a subtler problem than it seems:
No offense, but I think you’re overanalyzing it. Remember I said that the UTF-8

How to convert a UTF-8 byte offset into an NSString character offset?

2014-05-05 Thread Jens Alfke
How can I map a byte offset in a UTF-8 string back to the corresponding character offset in the NSString it came from? I’m writing an Objective-C wrapper around a C text-tokenizer API that takes a UTF-8 string as input, and as part of its output returns byte ranges of words that it found. So

Re: How to convert a UTF-8 byte offset into an NSString character offset?

2014-05-05 Thread Charles Srstka
On May 5, 2014, at 2:06 PM, Jens Alfke j...@mooseyard.com wrote:
> How can I map a byte offset in a UTF-8 string back to the corresponding character offset in the NSString it came from? I’m writing an Objective-C wrapper around a C text-tokenizer API that takes a UTF-8 string as input, and

Re: How to convert a UTF-8 byte offset into an NSString character offset?

2014-05-05 Thread Stephen J. Butler
What's your next step after doing the UTF8 to UTF16 range conversion? If it's just going to be -[NSString substringWithRange:] then I'd strongly suggest just doing -[NSString initWithBytes:length:encoding:] on the UTF8 string. At least profile it and see what the penalty is. You've already paid
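
For readers landing on this thread, here is a minimal sketch of the mapping the original question asks about, written in plain C so it can sit next to the tokenizer call. It is not taken from any of the posts above. The function name utf16_offset_from_utf8 is hypothetical, and the sketch assumes the buffer is valid UTF-8 produced from the NSString without any normalization, and that the byte offset falls on a character boundary. It relies on one fact about the encodings: a 1-, 2-, or 3-byte UTF-8 sequence becomes one UTF-16 code unit, and a 4-byte sequence becomes two (a surrogate pair).

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): returns how many UTF-16 code
 * units are encoded by the first byte_offset bytes of utf8.
 * Assumes valid UTF-8 and that byte_offset lies on a character boundary. */
size_t utf16_offset_from_utf8(const unsigned char *utf8, size_t byte_offset)
{
    size_t units = 0;
    for (size_t i = 0; i < byte_offset; i++) {
        unsigned char b = utf8[i];
        if ((b & 0xC0) == 0x80) {
            continue;                  /* continuation byte, counted with its lead byte */
        }
        units += (b >= 0xF0) ? 2 : 1;  /* 4-byte sequence needs a surrogate pair */
    }
    return units;
}
```

Calling it once for the start and once for the end of each byte range gives the two values needed for an NSRange. The scan is linear in the offset, so for many ranges over the same string it would probably be worth walking the buffer once and converting the ranges in ascending order.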