Re: Am I Reinventing the Wheel? (Part I)
Jens is right, and I was quite mistaken. The disaster wasn’t a problem with XMLParser’s deserialization at all.

When writing out my paragraphs, I create one p node for each and put the full paragraph text in it. My mistake was believing XMLParser would work the same way in reverse, reading the full text of each p node at once. So every time my delegate’s parser:foundCharacters: was called, I treated the delivered string as a full paragraph and added newlines. In fact, parser:foundCharacters: may be called repeatedly to deliver a paragraph’s text in chunks. In my case, ampersands and curled quotes were delivered in their own chunks, and I was the one botching the result by inserting extraneous newlines.

I have no excuse, because Apple’s documentation says three times that the string received by parser:foundCharacters: may be incomplete.

— Charles Jenkins

On Thursday, January 8, 2015 at 12:30 PM, Jens Alfke wrote:

On Jan 8, 2015, at 4:43 AM, Charles Jenkins cejw...@gmail.com wrote:

I'm writing data to XML. When you create a node and set its string contents, the node will happily accept whatever string you give and allow you to serialize information XML deserialization cannot then recreate. In my case, the string in question contained curled quotes. I could serialize and save the data—and if I remember correctly* the output looked good when I inspected the file on disk—but reading it back and deserializing it led to disaster!

No, it's fine for XML text to contain non-ASCII Unicode characters.

—Jens

___
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com
This email sent to arch...@mail-archive.com
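A minimal sketch of the pattern Charles arrived at, assuming a SAX-style parser that may split one element's text across several callbacks: accumulate every chunk and treat the paragraph as complete only at the closing tag. In Cocoa the equivalent would be appending to an NSMutableString in parser:foundCharacters: and consuming it in parser:didEndElement:namespaceURI:qualifiedName:. The code below is plain C (also valid Objective-C) so it runs without Foundation; ParagraphBuffer, paragraph_start, paragraph_chars and paragraph_end are hypothetical names standing in for the delegate methods, and the fixed-size buffer is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator mirroring the NSXMLParser delegate pattern:
   paragraph_start ~ parser:didStartElement:...,
   paragraph_chars ~ parser:foundCharacters:,
   paragraph_end   ~ parser:didEndElement:... */

#define PARA_CAP 4096

typedef struct {
    char text[PARA_CAP];  /* characters of the current <p> seen so far */
    size_t len;
} ParagraphBuffer;

/* Start of a <p> element: discard anything left from the previous one. */
void paragraph_start(ParagraphBuffer *pb) {
    assert(pb != NULL);
    pb->len = 0;
    pb->text[0] = '\0';
}

/* May be called many times per element, each time with only a piece of the
   text, so just append it; never add a newline here. Input that would
   overflow the fixed buffer is truncated (a real version would grow). */
void paragraph_chars(ParagraphBuffer *pb, const char *chunk) {
    assert(pb != NULL && chunk != NULL);
    size_t n = strlen(chunk);
    size_t room = PARA_CAP - 1 - pb->len;
    if (n > room) n = room;
    memcpy(pb->text + pb->len, chunk, n);
    pb->len += n;
    pb->text[pb->len] = '\0';
}

/* End of </p>: only now is the paragraph complete. Copies it into out
   followed by exactly one newline and returns the number of bytes written
   (excluding the NUL), or 0 if out is too small. */
size_t paragraph_end(const ParagraphBuffer *pb, char *out, size_t out_size) {
    assert(pb != NULL && out != NULL);
    if (out_size < pb->len + 2) return 0;
    memcpy(out, pb->text, pb->len);
    out[pb->len] = '\n';
    out[pb->len + 1] = '\0';
    return pb->len + 1;
}
```

For example, feeding the chunks "Salt ", "&" and " pepper" through paragraph_chars between paragraph_start and paragraph_end yields the single line "Salt & pepper\n", rather than three separate lines as with a newline-per-callback approach.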
Am I Reinventing the Wheel? (Part I)
When you try to reinvent the wheel, most often what you end up with is a flat tire. I need to deal with two issues that are probably already handled in some Cocoa API I just haven't found yet. This email asks about the first of these issues.

I'm writing data to XML. When you create a node and set its string contents, the node will happily accept whatever string you give and allow you to serialize information XML deserialization cannot then recreate. In my case, the string in question contained curled quotes. I could serialize and save the data—and if I remember correctly* the output looked good when I inspected the file on disk—but reading it back and deserializing it led to disaster!

Right now I'm using NSString stringByAddingPercentEncoding: and having no further problems with curled quotes, but I'm sure that's a poor long-term solution.

*I encountered this problem a few weeks ago and put off a final solution by using the percent encoding.

Is there already a Cocoa API call that would convert a string to use HTML entities so I could safely put any string into an XML node?

— Charles
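In XML, the counterpart of HTML entities is the numeric character reference, for example &#x201C; for a left curled quote, which any conforming parser turns back into the original character. If an ASCII-only file were actually wanted, a hand-rolled escaper along these lines would produce one. This is a hypothetical helper in C (also valid Objective-C), not a Cocoa API call; it assumes the input is already valid UTF-8 and does no validation of its own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Cocoa API): escapes UTF-8 text for use as XML
   element content and produces pure-ASCII output. '&', '<' and '>' become
   predefined entities; each non-ASCII code point becomes a numeric character
   reference such as &#x201C;. Assumes valid UTF-8 input (no validation).
   Returns the number of bytes written (excluding the NUL terminator), or
   (size_t)-1 if out_size is too small. */
size_t xml_escape_ascii(const char *in, char *out, size_t out_size) {
    const unsigned char *p = (const unsigned char *)in;
    size_t used = 0;
    assert(in != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    while (*p) {
        char piece[16];   /* longest piece is 10 chars, e.g. "&#x10FFFF;" */
        if (*p < 0x80) {
            switch (*p) {
            case '&': strcpy(piece, "&amp;"); break;
            case '<': strcpy(piece, "&lt;"); break;
            case '>': strcpy(piece, "&gt;"); break; /* optional, but harmless */
            default: piece[0] = (char)*p; piece[1] = '\0'; break;
            }
            p++;
        } else {
            /* Decode one 2-, 3- or 4-byte UTF-8 sequence into a code point. */
            unsigned long cp;
            int extra;
            if (*p >= 0xF0)      { cp = *p & 0x07; extra = 3; }
            else if (*p >= 0xE0) { cp = *p & 0x0F; extra = 2; }
            else                 { cp = *p & 0x1F; extra = 1; }
            p++;
            for (; extra > 0 && *p; extra--, p++)
                cp = (cp << 6) | (*p & 0x3F);
            snprintf(piece, sizeof piece, "&#x%lX;", cp);
        }
        size_t n = strlen(piece);
        if (used + n + 1 > out_size) return (size_t)-1;
        memcpy(out + used, piece, n + 1);   /* copies the NUL too */
        used += n;
    }
    return used;
}
```

With this helper, the string “Tom & Jerry” (curled quotes, UTF-8) becomes &#x201C;Tom &amp; Jerry&#x201D;. Double and single quotes are left alone, so the output is meant for element text, not attribute values.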
Re: Am I Reinventing the Wheel? (Part I)
I am not familiar with the API you are using (I use my own XML generator/parser), but it may be worth noting something about XML. XML files are implicitly Unicode and generally UTF-8, so you cannot put an arbitrary sequence of bytes into XML as a string. A curly quote is not in the low Latin (<=127) range, so in UTF-8 it must be a multibyte value.

Clearly there are different API approaches possible on encoding:
- convert an input encoding to UTF-8
- accept and write UTF-8 with validation, rejecting bad UTF-8 sequences
- accept and write UTF-8 with validation, silently converting bad UTF-8 sequences to something else
- accept and write UTF-8 without validation, potentially writing malformed XML

Parsers have similar choices to make. But anyway, if your data is not valid UTF-8, that would explain why you get disastrous results.

XML has no standard representation for anything other than Unicode strings, so storing other data (such as arbitrary bytes) requires symmetric encoding/decoding, following your own invention or some extension to basic XML. A low-level XML API cannot be expected to offer this, especially one intended to write XML for consumption by other software. (This is in addition to the five characters that are prohibited in strings because they are XML markup.)

On Thu, Jan 8, 2015 at 12:43 PM, Charles Jenkins cejw...@gmail.com wrote:

I'm writing data to XML. When you create a node and set its string contents, the node will happily accept whatever string you give and allow you to serialize information XML deserialization cannot then recreate. In my case, the string in question contained curled quotes. I could serialize and save the data—and if I remember correctly* the output looked good when I inspected the file on disk—but reading it back and deserializing it led to disaster! Right now I'm using NSString stringByAddingPercentEncoding: and having no further problems with curled quotes, but I'm sure that's a poor long-term solution.
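To make the encoding point concrete: U+201C and U+201D (the curled double quotes) are stored in UTF-8 as the three-byte sequences E2 80 9C and E2 80 9D, while Windows-1252 stores them as the single bytes 0x93 and 0x94, which are not valid UTF-8 on their own. The function below is a hypothetical validator in C, sketching the "validate and reject" option from the list above; it is not part of any Cocoa API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical validator: returns 1 if buf[0..len) is well-formed UTF-8,
   0 otherwise. Rejects stray continuation bytes, truncated sequences,
   overlong encodings, UTF-16 surrogates (U+D800..U+DFFF) and code points
   above U+10FFFF. */
int is_valid_utf8(const unsigned char *buf, size_t len) {
    size_t i = 0;
    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char b = buf[i];
        size_t need;          /* continuation bytes expected after the lead */
        unsigned long cp;     /* decoded code point */
        if (b < 0x80) { i++; continue; }                 /* plain ASCII */
        else if ((b & 0xE0) == 0xC0) { need = 1; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { need = 2; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { need = 3; cp = b & 0x07; }
        else return 0;        /* 0x80..0xBF as a lead byte, or 0xF8..0xFF */
        if (len - i - 1 < need) return 0;                /* truncated */
        for (size_t k = 1; k <= need; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80) return 0;            /* not 10xxxxxx */
            cp = (cp << 6) | (c & 0x3F);
        }
        /* Overlong forms: each sequence length has a minimum code point. */
        if ((need == 1 && cp < 0x80) || (need == 2 && cp < 0x800) ||
            (need == 3 && cp < 0x10000)) return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;      /* surrogates */
        if (cp > 0x10FFFF) return 0;                     /* beyond Unicode */
        i += need + 1;
    }
    return 1;
}
```

Bytes copied in from a Windows-1252 source, such as {0x93, 'h', 'i', 0x94}, are rejected, while the UTF-8 form of the same text passes.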
Re: Am I Reinventing the Wheel? (Part I)
Do you absolutely _require_ the use of Cocoa to process your XML? There are oodles of Open Source XML libraries. I myself have had great success with Xerces-C (actually C++).

Michael David Crawford, Consulting Software Engineer
mdcrawf...@gmail.com
http://www.warplife.com/mdc/
Available for Software Development in the Portland, Oregon Metropolitan Area.

On Thu, Jan 8, 2015 at 5:27 AM, Aandi Inston aa...@quite.com wrote:

I am not familiar with the API you are using (I use my own XML generator/parser), but it may be worth noting something about XML. XML files are implicitly Unicode and generally UTF-8, so you cannot put an arbitrary sequence of bytes into XML as a string. A curly quote is not in the low Latin (<=127) range, so in UTF-8 it must be a multibyte value.

Clearly there are different API approaches possible on encoding:
- convert an input encoding to UTF-8
- accept and write UTF-8 with validation, rejecting bad UTF-8 sequences
- accept and write UTF-8 with validation, silently converting bad UTF-8 sequences to something else
- accept and write UTF-8 without validation, potentially writing malformed XML

Parsers have similar choices to make. But anyway, if your data is not valid UTF-8, that would explain why you get disastrous results.

XML has no standard representation for anything other than Unicode strings, so storing other data (such as arbitrary bytes) requires symmetric encoding/decoding, following your own invention or some extension to basic XML. A low-level XML API cannot be expected to offer this, especially one intended to write XML for consumption by other software. (This is in addition to the five characters that are prohibited in strings because they are XML markup.)

On Thu, Jan 8, 2015 at 12:43 PM, Charles Jenkins cejw...@gmail.com wrote:

I'm writing data to XML. When you create a node and set its string contents, the node will happily accept whatever string you give and allow you to serialize information XML deserialization cannot then recreate. In my case, the string in question contained curled quotes. I could serialize and save the data--and if I remember correctly* the output looked good when I inspected the file on disk--but reading it back and deserializing it led to disaster! Right now I'm using NSString stringByAddingPercentEncoding: and having no further problems with curled quotes, but I'm sure that's a poor long-term solution.
Re: Am I Reinventing the Wheel? (Part I)
Fantastic! I can't wait to get home and try it!

— Charles

On Thursday, January 8, 2015 at 11:08, Keary Suska wrote:

NSDictionary *documentAttributes = @{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType};
NSData *htmlData = [s dataFromRange:NSMakeRange(0, s.length) documentAttributes:documentAttributes error:NULL];
NSString *htmlString = [[NSString alloc] initWithData:htmlData encoding:NSUTF8StringEncoding];
Re: Am I Reinventing the Wheel? (Part I)
On Jan 8, 2015, at 4:43 AM, Charles Jenkins cejw...@gmail.com wrote:

I'm writing data to XML. When you create a node and set its string contents, the node will happily accept whatever string you give and allow you to serialize information XML deserialization cannot then recreate. In my case, the string in question contained curled quotes. I could serialize and save the data—and if I remember correctly* the output looked good when I inspected the file on disk—but reading it back and deserializing it led to disaster!

No, it's fine for XML text to contain non-ASCII Unicode characters. The problem in your case was probably that the doctype string at the start of the document didn't properly declare the text encoding. What you want to do is write the XML as UTF-8 and add the proper annotation to that effect in the doctype. (Sorry, it's been years since I worked with XML so I don't remember the exact syntax for doing this.)

The only characters that MUST be escaped in XML text are < and &.

—Jens
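A minimal sketch of what Jens describes. Note that the encoding is declared in the XML declaration on the first line, <?xml version="1.0" encoding="UTF-8"?>, rather than in the doctype, and that UTF-8 is the default when the file has neither an encoding declaration nor a byte-order mark. The writer below is hypothetical C (also valid Objective-C), not Cocoa's API; it escapes only & and < in character data (plus the > of a literal "]]>", which the XML specification also requires to be escaped in text) and passes every other byte, including the UTF-8 bytes of curled quotes, through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simple bounded output buffer; sets overflow instead of writing past cap. */
typedef struct {
    char *data;
    size_t cap;
    size_t len;
    int overflow;
} OutBuf;

static void put(OutBuf *o, const char *s, size_t n) {
    if (o->overflow || o->len + n + 1 > o->cap) { o->overflow = 1; return; }
    memcpy(o->data + o->len, s, n);
    o->len += n;
    o->data[o->len] = '\0';
}

static void puts_buf(OutBuf *o, const char *s) { put(o, s, strlen(s)); }

/* Escapes character data: '&' and '<' always; '>' only where it would
   complete a literal "]]>". All other bytes are copied unchanged. */
static void put_text(OutBuf *o, const char *text) {
    for (const char *p = text; *p; p++) {
        if (*p == '&') puts_buf(o, "&amp;");
        else if (*p == '<') puts_buf(o, "&lt;");
        else if (*p == '>' && p - text >= 2 && p[-1] == ']' && p[-2] == ']')
            puts_buf(o, "&gt;");
        else put(o, p, 1);
    }
}

/* Hypothetical writer: a UTF-8 XML declaration, then one <p> per paragraph.
   Returns 0 on success, -1 if the buffer was too small. */
int write_document(char *buf, size_t cap, const char *const *paras, size_t count) {
    OutBuf o = { buf, cap, 0, 0 };
    assert(buf != NULL && cap > 0);
    buf[0] = '\0';
    puts_buf(&o, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<doc>\n");
    for (size_t i = 0; i < count; i++) {
        puts_buf(&o, "<p>");
        put_text(&o, paras[i]);
        puts_buf(&o, "</p>\n");
    }
    puts_buf(&o, "</doc>\n");
    return o.overflow ? -1 : 0;
}
```

For instance, the paragraphs "Salt & pepper" and "a]]>b" come out as <p>Salt &amp; pepper</p> and <p>a]]&gt;b</p>, while curled quotes appear in the file as their raw UTF-8 bytes.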
Re: Am I Reinventing the Wheel? (Part I)
Aandi Inston wrote:

(This is in addition to the five characters prohibited in strings because they are XML markup).

Minor nit. There are only 2 prohibited characters in XML, whether in a string or out.
Re: Am I Reinventing the Wheel? (Part I)
On Jan 8, 2015, at 5:43 AM, Charles Jenkins cejw...@gmail.com wrote:

I need to deal with two issues that are probably already handled in some Cocoa API I just haven't found yet. This email asks about the first of these issues.

I'm writing data to XML. When you create a node and set its string contents, the node will happily accept whatever string you give and allow you to serialize information XML deserialization cannot then recreate. In my case, the string in question contained curled quotes. I could serialize and save the data—and if I remember correctly* the output looked good when I inspected the file on disk—but reading it back and deserializing it led to disaster!

Right now I'm using NSString stringByAddingPercentEncoding: and having no further problems with curled quotes, but I'm sure that's a poor long-term solution.

*I encountered this problem a few weeks ago and put off a final solution by using the percent encoding.

Is there already a Cocoa API call that would convert a string to use HTML entities so I could safely put any string into an XML node?

You can apparently route through NSAttributedString (found via StackOverflow):

// Wrap the plain string in an attributed string with no attributes.
NSAttributedString *s = [[NSAttributedString alloc] initWithString:sourceString];
// Ask for the whole range to be exported as an HTML document.
NSDictionary *documentAttributes = @{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType};
NSData *htmlData = [s dataFromRange:NSMakeRange(0, s.length) documentAttributes:documentAttributes error:NULL];
// Decode the exported bytes as UTF-8 to get the HTML text.
NSString *htmlString = [[NSString alloc] initWithData:htmlData encoding:NSUTF8StringEncoding];

HTH,

Keary Suska
Esoteritech, Inc.
Demystifying technology for your home or business