Re: Am I Reinventing the Wheel? (Part I)

2015-01-08 Thread Charles Jenkins
Jens is right, and I was quite mistaken. The disaster wasn’t a problem with 
XMLParser’s deserialization at all.  

When writing out my paragraphs, I create one p node for each and put the full 
paragraph text in. My mistake was believing XMLParser would operate the same 
way, reading the full text of the p node at once. So every time my delegate’s 
parser:foundCharacters: was called, I treated the delivered string as a full 
paragraph and added newlines.

In fact, parser:foundCharacters: may be called repeatedly to deliver paragraph 
text in chunks. In my case, ampersands and curled quotes were delivered in 
their own chunks, and it was me botching the result up by inserting extraneous 
newlines. I have no excuse, because Apple’s documentation says three times that 
the string received by parser:foundCharacters: may be incomplete.
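
For anyone hitting the same thing, here is a minimal sketch of the usual fix: accumulate the chunks and only treat the text as a finished paragraph when the element closes. It assumes ARC and a document whose paragraphs are p elements; the ParagraphCollector class and the ParagraphsFromXMLData helper are hypothetical names for illustration, not code from this thread.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: collects the full text of each <p> element.
@interface ParagraphCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, readonly) NSMutableArray *paragraphs;
@end

@implementation ParagraphCollector {
    NSMutableString *_currentText; // non-nil only while inside a <p> element
}

- (instancetype)init {
    if ((self = [super init])) {
        _paragraphs = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"p"]) {
        _currentText = [NSMutableString string]; // start a fresh buffer
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // May be called several times per element; just append the chunk.
    // Outside a <p>, _currentText is nil and this message is a no-op.
    [_currentText appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"p"] && _currentText != nil) {
        // Only now is the paragraph complete.
        [_paragraphs addObject:[_currentText copy]];
        _currentText = nil;
    }
}
@end

// Hypothetical convenience wrapper: returns nil if parsing fails.
static NSArray *ParagraphsFromXMLData(NSData *data) {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    ParagraphCollector *collector = [[ParagraphCollector alloc] init];
    parser.delegate = collector;
    return [parser parse] ? [collector.paragraphs copy] : nil;
}
```

Any newline between paragraphs would then be added once per entry in the resulting array, not once per foundCharacters: callback.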

—

Charles Jenkins


On Thursday, January 8, 2015 at 12:30 PM, Jens Alfke wrote:

  
  On Jan 8, 2015, at 4:43 AM, Charles Jenkins cejw...@gmail.com wrote:
  I'm writing data to XML. When you create a node and set its string 
  contents, the node will happily accept whatever string you give and allow 
  you to serialize information XML deserialization cannot then recreate. In 
  my case, the string in question contained curled quotes. I could serialize 
  and save the data—and if I remember correctly* the output looked good when 
  I inspected the file on disk—but reading it back and deserializing it led 
  to disaster!
 No, it's fine for XML text to contain non-ASCII Unicode characters.
  
 —Jens  

___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com

Am I Reinventing the Wheel? (Part I)

2015-01-08 Thread Charles Jenkins
When you try to reinvent the wheel, most often what you end up with is a flat 
tire.

I need to deal with two issues that are probably already handled in some Cocoa 
API I just haven't found yet. This email asks about the first of these issues.  

I'm writing data to XML. When you create a node and set its string contents, 
the node will happily accept whatever string you give and allow you to 
serialize information XML deserialization cannot then recreate. In my case, the 
string in question contained curled quotes. I could serialize and save the 
data—and if I remember correctly* the output looked good when I inspected the 
file on disk—but reading it back and deserializing it led to disaster! Right 
now I'm using NSString stringByAddingPercentEncoding: and having no further 
problems with curled quotes, but I'm sure that's a poor long-term solution.

*I encountered this problem a few weeks ago and put off a final solution by 
using the percent encoding.

Is there already a Cocoa API call that would convert a string to use HTML 
entities so I could safely put any string into an XML node?

—  

Charles


Re: Am I Reinventing the Wheel? (Part I)

2015-01-08 Thread Aandi Inston
I am not familiar with the API you are using (I use my own XML
generator/parser), but it may be worth noting something about XML. XML
files are implicitly Unicode and generally UTF-8, so you cannot put an
arbitrary sequence of bytes into XML as a string. A curly quote is not in
the low Latin (<=127) range, so it must be a multibyte value.

Clearly there are different API approaches possible on encoding:
- convert an input encoding to UTF-8
- accept and write UTF-8 with validation, rejecting bad UTF-8 sequences
- accept and write UTF-8 with validation, converting bad UTF-8  sequences
silently to something else
- accept and write UTF-8 without validation, potentially writing malformed
XML
Parsers have similar choices to make. But anyway, if your data is not valid
UTF-8, it would explain why you get disastrous results.
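
As a rough illustration of the validation option in Cocoa terms, the sketch below checks whether a buffer decodes as UTF-8 before it goes anywhere near an XML writer. It relies on -initWithData:encoding: returning nil when the bytes are not valid for the requested encoding; DataIsValidUTF8 is a hypothetical helper name, and ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the bytes form a valid UTF-8 sequence.
// -initWithData:encoding: returns nil when decoding fails.
static BOOL DataIsValidUTF8(NSData *data) {
    NSString *decoded = [[NSString alloc] initWithData:data
                                              encoding:NSUTF8StringEncoding];
    return decoded != nil;
}
```

A left curly quote (U+201C) is the three bytes E2 80 9C in UTF-8, which passes this check; the single byte 0x93 that Windows-1252 uses for the same character does not.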

XML has no standard binary representation for anything other than Unicode
strings, so symmetric encoding/decoding of such data, following your own
invention or some extension to basic XML, is the only way. A low level XML
API cannot be expected to offer this, especially one intended to write XML
for consumption by other software.

(This is in addition to the five characters prohibited in strings because
they are XML markup).


On Thu, Jan 8, 2015 at 12:43 PM, Charles Jenkins cejw...@gmail.com wrote:


 I'm writing data to XML. When you create a node and set its string
 contents, the node will happily accept whatever string you give and allow
 you to serialize information XML deserialization cannot then recreate. In
 my case, the string in question contained curled quotes. I could serialize
 and save the data—and if I remember correctly* the output looked good when
 I inspected the file on disk—but reading it back and deserializing it led
 to disaster! Right now I'm using NSString stringByAddingPercentEncoding:
 and having no further problems with curled quotes, but I'm sure that's a
 poor long-term solution.



Re: Am I Reinventing the Wheel? (Part I)

2015-01-08 Thread Michael Crawford
Do you absolutely _require_ the use of Cocoa to process your XML?

There are oodles of Open Source XML libraries.  I myself have had
great success with Xerces-C (actually C++).
Michael David Crawford, Consulting Software Engineer
mdcrawf...@gmail.com
http://www.warplife.com/mdc/

   Available for Software Development in the Portland, Oregon Metropolitan
Area.


On Thu, Jan 8, 2015 at 5:27 AM, Aandi Inston aa...@quite.com wrote:
 I am not familiar with the API you are using (I use my own XML
 generator/parser), but it may be worth noting something about XML. XML
 files are implicitly Unicode and generally UTF-8, so you cannot put an
 arbitrary sequence of bytes into XML as a string. A curly quote is not in
 the low Latin (<=127) range, so it must be a multibyte value.

 Clearly there are different API approaches possible on encoding:
 - convert an input encoding to UTF-8
 - accept and write UTF-8 with validation, rejecting bad UTF-8 sequences
 - accept and write UTF-8 with validation, converting bad UTF-8  sequences
 silently to something else
 - accept and write UTF-8 without validation, potentially writing malformed
 XML
 Parsers have similar choices to make. But anyway, if your data is not valid
 UTF-8, it would explain why you get disastrous results.

 XML has no standard binary representation for anything other than Unicode
 strings, so symmetric encoding/decoding of such data, following your own
 invention or some extension to basic XML, is the only way. A low level XML
 API cannot be expected to offer this, especially one intended to write XML
 for consumption by other software.

 (This is in addition to the five characters prohibited in strings because
 they are XML markup).


 On Thu, Jan 8, 2015 at 12:43 PM, Charles Jenkins cejw...@gmail.com wrote:


 I'm writing data to XML. When you create a node and set its string
 contents, the node will happily accept whatever string you give and allow
 you to serialize information XML deserialization cannot then recreate. In
 my case, the string in question contained curled quotes. I could serialize
 and save the data--and if I remember correctly* the output looked good when
 I inspected the file on disk--but reading it back and deserializing it led
 to disaster! Right now I'm using NSString stringByAddingPercentEncoding:
 and having no further problems with curled quotes, but I'm sure that's a
 poor long-term solution.



Re: Am I Reinventing the Wheel? (Part I)

2015-01-08 Thread Charles Jenkins
Fantastic! I can't wait to get home and try it!  

—  

Charles


On Thursday, January 8, 2015 at 11:08, Keary Suska wrote:

 NSDictionary *documentAttributes = @{NSDocumentTypeDocumentAttribute: 
 NSHTMLTextDocumentType};
 NSData *htmlData = [s dataFromRange:NSMakeRange(0, s.length) 
 documentAttributes:documentAttributes error:NULL];
 NSString *htmlString = [[NSString alloc] initWithData:htmlData 
 encoding:NSUTF8StringEncoding];



Re: Am I Reinventing the Wheel? (Part I)

2015-01-08 Thread Jens Alfke

 On Jan 8, 2015, at 4:43 AM, Charles Jenkins cejw...@gmail.com wrote:
 
 I'm writing data to XML. When you create a node and set its string contents, 
 the node will happily accept whatever string you give and allow you to 
 serialize information XML deserialization cannot then recreate. In my case, 
 the string in question contained curled quotes. I could serialize and save 
 the data—and if I remember correctly* the output looked good when I inspected 
 the file on disk—but reading it back and deserializing it led to disaster!

No, it's fine for XML text to contain non-ASCII Unicode characters. The problem 
in your case was probably that the doctype string at the start of the document 
didn't properly declare the text encoding. 

What you want to do is write the XML as UTF-8 and add the proper annotation to 
that effect in the doctype. (Sorry, it's been years since I worked with XML so 
I don't remember the exact syntax for doing this.)
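
For reference, the encoding is normally stated in the XML declaration on the first line of the file, e.g. <?xml version="1.0" encoding="UTF-8"?>, rather than in the DOCTYPE. A minimal sketch of asking NSXMLDocument (OS X only) to write UTF-8 follows. It assumes ARC and assumes the tree is built with NSXMLElement; the doc and p element names and the XMLDataForParagraph helper are made up for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: wraps one paragraph of text in <doc><p>...</p></doc>
// and serializes it with the version and encoding set explicitly.
static NSData *XMLDataForParagraph(NSString *text) {
    NSXMLElement *p = [[NSXMLElement alloc] initWithName:@"p" stringValue:text];
    NSXMLElement *root = [[NSXMLElement alloc] initWithName:@"doc"];
    [root addChild:p];

    NSXMLDocument *doc = [[NSXMLDocument alloc] initWithRootElement:root];
    doc.version = @"1.0";
    doc.characterEncoding = @"UTF-8"; // recorded in the XML declaration
    return [doc XMLData];
}
```

The text is passed in as an ordinary NSString, curled quotes and all; no percent encoding or HTML entities should be needed on the way in.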

The only characters that MUST be escaped in XML text are < and &.

—Jens

Re: Am I Reinventing the Wheel? (Part I)

2015-01-08 Thread Greg Weston
Aandi Inston wrote:
 (This is in addition to the five characters prohibited in strings because
 they are XML markup).

Minor nit. There are only 2 prohibited characters in XML, whether in a string 
or out.
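
If the XML is being written by hand rather than through a node API, escaping for element text can be sketched as below. EscapeXMLText is a hypothetical helper and ARC is assumed. It escapes & and <, and also > as a defensive extra, since > only strictly needs escaping when it follows ]] in content. Attribute values would additionally need their quote character escaped.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: escapes a string for use as XML element text.
static NSString *EscapeXMLText(NSString *text) {
    // Replace & first so the entities added below are not escaped twice.
    NSString *s = [text stringByReplacingOccurrencesOfString:@"&"
                                                  withString:@"&amp;"];
    s = [s stringByReplacingOccurrencesOfString:@"<" withString:@"&lt;"];
    // Optional: only required for the sequence "]]>", harmless elsewhere.
    s = [s stringByReplacingOccurrencesOfString:@">" withString:@"&gt;"];
    return s;
}
```

Non-ASCII characters such as curled quotes pass through untouched, which is fine as long as the file really is written in the encoding it declares.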

Re: Am I Reinventing the Wheel? (Part I)

2015-01-08 Thread Keary Suska
On Jan 8, 2015, at 5:43 AM, Charles Jenkins cejw...@gmail.com wrote:

 I need to deal with two issues that are probably already handled in some 
 Cocoa API I just haven't found yet. This email asks about the first of these 
 issues.  
 
 I'm writing data to XML. When you create a node and set its string contents, 
 the node will happily accept whatever string you give and allow you to 
 serialize information XML deserialization cannot then recreate. In my case, 
 the string in question contained curled quotes. I could serialize and save 
 the data—and if I remember correctly* the output looked good when I inspected 
 the file on disk—but reading it back and deserializing it led to disaster! 
 Right now I'm using NSString stringByAddingPercentEncoding: and having no 
 further problems with curled quotes, but I'm sure that's a poor long-term 
 solution.
 
 *I encountered this problem a few weeks ago and put off a final solution by 
 using the percent encoding.
 
 Is there already a Cocoa API call that would convert a string to use HTML 
 entities so I could safely put any string into an XML node?

You can apparently route through NSAttributedString (found via StackOverflow):

NSAttributedString *s = [[NSAttributedString alloc] initWithString:sourceString];
NSDictionary *documentAttributes = @{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType};
NSData *htmlData = [s dataFromRange:NSMakeRange(0, s.length)
                 documentAttributes:documentAttributes
                              error:NULL];
NSString *htmlString = [[NSString alloc] initWithData:htmlData encoding:NSUTF8StringEncoding];

HTH,

Keary Suska
Esoteritech, Inc.
Demystifying technology for your home or business

