Re: NSXML and invalid UTF8 characters

2010-01-31 Thread Andrew Thompson
I'm a little surprised that no one else mentioned it, but are you sure that you actually want to strip the characters? As Sixten Otto said: For what it's worth, another common cause of problems with stuff pasted from Word (at least on the web) is Word docs that contain characters from the

Re: NSXML and invalid UTF8 characters

2010-01-31 Thread Jens Alfke
On Jan 31, 2010, at 9:42 AM, Andrew Thompson wrote: 0x80 to 0x9F in codepage 1252 includes the Euro sign, the bullet (option-8 on the Mac), the en-dash and em-dash... i.e. all things that will be found even in English text. (Reference: http://msdn.microsoft.com/en-us/goglobal/cc305145.aspx)
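To illustrate the point about codepage 1252, here is a minimal sketch in plain C (which also compiles as Objective-C) of remapping code units in the 0x80 to 0x9F range to the characters Windows-1252 assigns to them, instead of stripping them. The table name `kCP1252High` and the helper `remap_c1_as_cp1252` are hypothetical names invented for this example, not anything posted in the thread, and the approach assumes the text really was mis-decoded Windows-1252.

```c
#include <stdint.h>

/* Unicode code points for Windows-1252 bytes 0x80..0x9F.
   0 marks the five positions that code page 1252 leaves undefined. */
static const uint16_t kCP1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Hypothetical helper: if `unit` lies in 0x80..0x9F and Windows-1252
   defines a character there, return that character's code point;
   otherwise return `unit` unchanged. */
static uint16_t remap_c1_as_cp1252(uint16_t unit)
{
    if (unit >= 0x80 && unit <= 0x9F) {
        uint16_t mapped = kCP1252High[unit - 0x80];
        if (mapped != 0) {
            return mapped;
        }
    }
    return unit;
}
```

With this table, 0x80 becomes the Euro sign (U+20AC), 0x95 the bullet (U+2022), and 0x96 and 0x97 the en-dash and em-dash (U+2013, U+2014). Whether remapping is appropriate depends on where the text came from, so treat it as one option rather than a drop-in fix.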

Re: NSXML and invalid UTF8 characters

2010-01-29 Thread Keith Blount
--- On Fri, 1/29/10, at 3:23 AM, Jens Alfke wrote: On Jan 28, 2010, at 3:47 PM, Keith Blount wrote

Re: NSXML and invalid UTF8 characters

2010-01-29 Thread Jens Alfke
This code looks good. Just a few possible improvements, in the spirit of code-review: On Jan 29, 2010, at 4:00 AM, Keith Blount wrote: NSMutableCharacterSet *XMLCharacterSet = [[NSMutableCharacterSet alloc] init]; Variable names shouldn't start with an uppercase letter — the

Re: NSXML and invalid UTF8 characters

2010-01-29 Thread Keith Blount
and all the best, Keith --- On Fri, 1/29/10, at 5:05 PM, Jens Alfke wrote: This code looks good

Re: NSXML and invalid UTF8 characters

2010-01-29 Thread Jens Alfke
On Jan 29, 2010, at 11:16 AM, Keith Blount wrote: A habit from my fear that the compiler will get even fussier (for instance it is these days fussier about conditional expressions). The compiler will never complain about that. It's a basic tenet of object-oriented programming that an

NSXML and invalid UTF8 characters

2010-01-28 Thread Keith Blount
Hello, I am using the NSXML classes to generate and parse my own XML files. Sometimes these files store strings of text that has been brought in from other applications (for instance, there might be a plain text representation of some text the user has pasted in from Word). In some instances

Re: NSXML and invalid UTF8 characters

2010-01-28 Thread Sixten Otto
On Thu, Jan 28, 2010 at 6:16 PM, Keith Blount keithblo...@yahoo.com wrote: I am using the NSXML classes to generate and parse my own XML files. Sometimes these files store strings of text that has been brought in from other applications (for instance, there might be a plain text

Re: NSXML and invalid UTF8 characters

2010-01-28 Thread Jens Alfke
On Jan 28, 2010, at 3:16 PM, Keith Blount wrote: So, my question is, what is the best way for me to filter out these invalid characters from my NSString before I pass it into NSXMLElement's -initWithName:stringValue: or similar methods, to avoid creating XML documents that won't open?
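As a rough sketch of what "invalid" means here: the XML 1.0 specification defines the legal characters with its `Char` production, and a filter can be built around a predicate for it. The function below is plain C (usable from Objective-C); the name `xml10_is_valid_char` is made up for this example and is not an NSXML API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the code point is allowed by the XML 1.0 Char production:
   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
static bool xml10_is_valid_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

Note that this takes a full code point, so a caller working with `unichar` values would first have to combine surrogate pairs.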

Re: NSXML and invalid UTF8 characters

2010-01-28 Thread Keith Blount
it seems simple to those more grounded in C. Thanks again. All the best, Keith --- On Thu, 1/28/10, Sixten Otto wrote:

Re: NSXML and invalid UTF8 characters

2010-01-28 Thread Keith Blount
On Thursday, January 28, 2010, at 11:40 PM, Jens Alfke wrote: On Jan 28, 2010, at 3:16 PM, Keith Blount wrote: So, my question is, what is the best

Re: NSXML and invalid UTF8 characters

2010-01-28 Thread Keith Blount
As an update, I tried this, which seems to partially work:
- (NSString *)stringCleanedForXML // in an NSString category
{
    unichar character;
    NSInteger index, len = [self length];
    NSMutableString *cleanedString = [[NSMutableString alloc] init];
    for (index =

Re: NSXML and invalid UTF8 characters

2010-01-28 Thread Graham Cox
On 29/01/2010, at 11:29 AM, Keith Blount wrote: As an update, I tried this, which seems to partially work: - (NSString *)stringCleanedForXML // in an NSString category { unichar character; [...] Using this saved my XML strings in such a way as they didn't produce errors on loading,

Re: NSXML and invalid UTF8 characters

2010-01-28 Thread Graham Cox
On 29/01/2010, at 11:34 AM, Graham Cox wrote: 0x10 are (at least) 20 bit constants 24-bits in this case (misread it). --Graham

Re: NSXML and invalid UTF8 characters

2010-01-28 Thread Jens Alfke
On Jan 28, 2010, at 3:47 PM, Keith Blount wrote: Many thanks for your reply. Wouldn't using these methods be a lot more expensive (and slower) than going through using -characterAtIndex: or something similar, accessing the characters directly, though? No, because it's more efficient to let

Re: NSXML and invalid UTF8 characters

2010-01-28 Thread Jens Alfke
On Jan 28, 2010, at 4:29 PM, Keith Blount wrote: [cleanedString appendFormat:@"%C", character]; If you're worried about efficiency, format conversions like this are particularly slow; so is building up an NSString a character at a time. It's more efficient to allocate a
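One way to read the buffer suggestion, sketched in plain C under some assumptions: copy the string's UTF-16 code units into one array, filter them into a second preallocated array in a single pass, and build the result string once at the end. The function name `xml10_filter_utf16` is hypothetical. In Cocoa the input could plausibly come from `-getCharacters:range:` and the output be wrapped with `+stringWithCharacters:length:`, but that wiring is not shown here and is not taken from the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies `len` UTF-16 code units from `src` to `dst`, dropping anything that
   does not form a character allowed by XML 1.0. `dst` must have room for
   `len` units. Returns the number of units written. */
static size_t xml10_filter_utf16(const uint16_t *src, size_t len, uint16_t *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t u = src[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: keep it only as part of a well-formed pair,
               which always encodes a code point in U+10000..U+10FFFF. */
            if (i + 1 < len && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
                dst[out++] = u;
                dst[out++] = src[++i];
            }
        } else if (u == 0x9 || u == 0xA || u == 0xD
                   || (u >= 0x20 && u <= 0xD7FF)
                   || (u >= 0xE000 && u <= 0xFFFD)) {
            dst[out++] = u;
        }
        /* Everything else is dropped: other control characters,
           unpaired surrogates, U+FFFE and U+FFFF. */
    }
    return out;
}
```

The output never exceeds the input length, so a single allocation of `len` units is enough and no per-character string appends are needed.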