I had a similar thing happen. We were getting some accented characters
   (above 0x7F/127) and the parser would sometimes jerk to a halt. Our
   issue was that the XML was actually written in ISO-8859-1 encoding,
   but the encoding declaration in the XML said UTF-8:
   <?xml version="1.0" encoding="utf-8" ?>.

   I think the way it works is that UTF-8 encodes characters 0000-007F as
   a single byte, for ASCII compatibility, and everything above that as a
   sequence of two or more bytes. This is fine as long as all the
   characters are at or below 007F (127). In our case, when the parser
   got to the accented character it would throw its hands in the air
   (like it did in fact care) saying "okay - you've told me it's utf-8,
   but this byte should have another byte following it, and it doesn't -
   what is going on!?"
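
   To illustrate the point above, here is a minimal Python sketch (not
   the U2 parser, just a stand-in) showing that an accented character
   stored as a single ISO-8859-1 byte is not a valid UTF-8 sequence. The
   sample text and the try_decode helper are made up for illustration.

```python
# "cafe au lait" with an accented e (U+00E9), stored as ISO-8859-1:
# the accented character becomes the single byte 0xE9.
latin1_bytes = "caf\u00e9 au lait".encode("iso-8859-1")


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid for that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# In UTF-8, 0xE9 announces a multi-byte sequence, but the next byte is a
# plain space rather than a continuation byte, so decoding fails.
as_utf8 = try_decode(latin1_bytes, "utf-8")

# Read as ISO-8859-1, every byte maps directly to one character.
as_latin1 = try_decode(latin1_bytes, "iso-8859-1")

# The same character properly encoded as UTF-8 takes two bytes.
utf8_length = len("\u00e9".encode("utf-8"))
```

   Here as_utf8 comes back as None, while as_latin1 holds the original
   text, which is the same mismatch the parser was tripping over.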

   Check that your encoding declaration is correct. If the declaration
   says UTF-8 and the text is actually in an 8-bit encoding such as
   ISO-8859-1, then the parser may be having difficulty when it "sees" a
   byte over 127 and expects a valid UTF-8 multi-byte sequence. Try
   changing the declared encoding to "ISO-8859-1" (or something suitable)
   and see what happens.

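   As a rough way to check this outside of U2, the sketch below uses
   Python's standard xml.etree.ElementTree parser as a stand-in. It
   parses the same ISO-8859-1 bytes twice, once with the wrong
   declaration and once with the corrected one. The <name> element and
   the parse_text helper are hypothetical, and a different parser may
   fail in a different way (ours halted; this one raises an error).

```python
import xml.etree.ElementTree as ET

# Element content containing an accented e (U+00E9), written out as ISO-8859-1.
body = "<name>Jos\u00e9</name>".encode("iso-8859-1")

# Same bytes, two different encoding declarations.
wrong = b'<?xml version="1.0" encoding="utf-8"?>' + body
fixed = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body


def parse_text(doc):
    """Return the root element's text, or None if the document fails to parse."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None


wrong_result = parse_text(wrong)  # declaration does not match the bytes
fixed_result = parse_text(fixed)  # declaration matches the bytes
```

   With the mismatched declaration the parse fails (wrong_result is
   None); with the declaration corrected, the accented character comes
   through intact in fixed_result.
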
   Stuart

   ______________________________________________________________________

   some instances where particular elements have foreign language
   characters in (over char 127). This seems to be giving me segmentation
   faults in the open stage, if I use an EXT that tries to extract data
   from that element or any elements after the element in question.

-------
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/