Bartolomeo,

CDATA just means character data; a CDATA section demarcates a block of character data. That character data still has to be in some character encoding. If your XML declaration has an encoding attribute that tells the parser which character encoding your document uses, the parser is going to use it (unless it was given other metadata before parsing the XML, e.g., HTTP headers that specify a different encoding). It is not reasonable to expect the parser to switch from the encoding you declared to a different encoding just because you put some block of characters inside a CDATA section.

- Wing Yew
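A side note on the immediate failure: as far as I know, the XML 1.0 Char production only permits #x9, #xA, #xD and the ranges #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF, and the content of a CDATA section is still made of Char. That would mean 0x1c and 0x1d are rejected whatever encoding is declared. Below is a minimal Java sketch, assuming it is acceptable to drop such characters from the String before parsing. The names XmlCharFilter, isXml10Char and stripInvalidXmlChars are made up for illustration and are not part of XMLBeans.

```java
// Hypothetical helper (not an XMLBeans API): removes code points that
// the XML 1.0 Char production does not allow, such as 0x1c and 0x1d.
final class XmlCharFilter {

    // XML 1.0 Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
    //                | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns a copy of the input with every disallowed code point dropped.
    // Unpaired surrogates are dropped too, since they are outside Char.
    static String stripInvalidXmlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            if (isXml10Char(codePoint)) {
                out.appendCodePoint(codePoint);
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample document with 0x1c and 0x1d inside a CDATA section.
        String raw = "<a><![CDATA[denominaciones" + (char) 0x1c
                + " de origen" + (char) 0x1d + "]]></a>";
        System.out.println(stripInvalidXmlChars(raw));
    }
}
```

The cleaned string could then be handed to the same parse(String) call that currently fails. Dropping characters silently changes the supplier's data, so this is at best a stopgap until the supplier stops emitting them.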
-----Original Message-----
From: Bartolomeo Nicolotti [mailto:bnicolo...@siapcn.it]
Sent: Tuesday, August 25, 2009 2:15 AM
To: user@xmlbeans.apache.org
Subject: Re: Illegal XML character: 0x1c inside CDATA

Hi,

I've seen that link already but, as you can see from the previous attachment, the 0x1c character is in a [CDATA[...]] section, and the XML specification says:

    Within a CDATA section, only the CDEnd string is recognized as markup

http://www.w3.org/TR/2008/REC-xml-20081126/#sec-cdata-sect

So you're saying that inside the CDATA section one ALSO has to use the same encoding as stated in the XML declaration:

<?xml version="1.0" encoding="ISO-8859-1" ?>

I think our supplier is aware that he has two different encodings and uses the [CDATA[]] section in order to put UTF-8 bytes inside an ISO-8859-1 encoded XML document.

Could anyone help?

Many thanks

Best regards

On Tue, 25/08/2009 at 01:35 -0700, Jacob Danner wrote:
> Ahh, I just re-looked at your old post, clicked on the link, and ended
> up at the following page, which I think might explain some of your
> issue.
> http://www.w3schools.com/xmL/xml_encoding.asp
>
> On Mon, Aug 24, 2009 at 11:47 PM, Bartolomeo Nicolotti
> <bnicolo...@siapcn.it> wrote:
>     Hi,
>
>     If you open the attached file with an editor that lets you see
>     the hex code of the file, for example ghex2 on Linux, you'll see
>     that before the string
>
>     "denominaciones de origen espa"
>
>     there's a 0x1c byte, which is the one that causes the exception.
>     Once this byte is removed, there's a further failure due to 0x1d bytes.
>
>     I have the XML in a Java String and I use the method
>     parse(String), which fails because of the 0x1c and 0x1d bytes inside
>     the string. No HTTP header is involved, as I have the XML in
>     memory as a Java String.
>
>     Many thanks
>
>     Best regards.
>
> On Mon, 24/08/2009 at 11:39 -0700, Jacob Danner wrote:
>
> > Can you properly parse the XMLObject when the value you are trying to
> > parse comes from a file?
> >
> > Again, I do not think this error is caused by an entry in the CDATA of
> > an element, but rather by the HTTP content. When I received this
> > error before, I found the issue was in some data that I received before
> > I had even received the XML PI.
> > Also, what are you doing with the HTTP headers, since they occur
> > before the payload?
> >
> > -jacobd
> >
> > On Mon, Aug 24, 2009 at 12:25 AM, Bartolomeo Nicolotti
> > <bnicolo...@siapcn.it> wrote:
> >     Hi,
> >
> >     We do the same: we have the attached file in a string, having
> >     POSTed it with
> >
> >     int org.apache.commons.httpclient.HttpClient.executeMethod(HttpMethod method) throws IOException, HttpException
> >
> >     and then we call XMLObject.parse, as you can also see from the
> >     call stack:
> >
> >     org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:208)
> >
> >     Our problem is that the XML itself contains a 0x1c character
> >     that causes the parser to fail with
> >
> > > e.toString():org.apache.xmlbeans.XmlException: error: Illegal XML character: 0x1c
> > > org.apache.xmlbeans.impl.piccolo.io.IllegalCharException: Illegal XML character: 0x1c
> >
> >     I think the parser ignores that inside CDATA there could be 0x1c
> >     characters due to a different encoding.
> >
> >     This is a big problem, isn't it? Especially because the parse
> >     fails completely!
> >
> >     Many thanks
> >
> >     Bye
> >
> >     On Fri, 21/08/2009 at 14:10 -0700, Jacob Danner wrote:
> >
> > > I've seen similar when working with content retrieved from URLs.
> > > What I found was that the problem wasn't in the content of the XML, but in
> > > some additional data that was passed along prior to the XML payload I
> > > wanted. My workaround was to use some IO stream APIs to read
> > > the content into a string and then parse the data.
> > >
> > > Out of curiosity, if you save the payload to a file, can you read it
> > > with XMLBeans (i.e., XMLObject.parse(...))?
> > >
> > > HTH,
> > > -jacobd
> > >
> > > On Fri, Aug 21, 2009 at 10:03 AM, Bartolomeo Nicolotti
> > > <bnicolo...@siapcn.it> wrote:
> > >     Hi,
> > >
> > >     We're receiving XML from a supplier encoded in ISO-8859-1, but the
> > >     bodies of some tags are encoded in UTF-8. They are surrounded with
> > >     CDATA, so strange encodings, like the 0x1c character, shouldn't be
> > >     a problem for the parser, as said here:
> > >
> > >     http://www.w3schools.com/xmL/xml_cdata.asp
> > >
> > >     We've built a parser with the latest stable version of XMLBeans, but
> > >     the parser complains about this 0x1c character; see the attachment near:
> > >
> > >     ...
> > >     "denominaciones de origen espa"
> > >     ...
> > >
> > >     Fri Aug 21 16:14:39 CEST 2009:class
> > >     com.siap.DPKWebServices.Util.OTA_literal_HttpPost.queryHttp
> > >     caught an exception: 29047814 org.apache.xmlbeans.XmlException
> > >     e.toString():org.apache.xmlbeans.XmlException: error: Illegal XML character: 0x1c
> > >     org.apache.xmlbeans.impl.piccolo.io.IllegalCharException: Illegal XML character: 0x1c
> > >         at org.apache.xmlbeans.impl.piccolo.xml.XMLReaderReader.read(XMLReaderReader.java:169)
> > >         at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yy_refill(PiccoloLexer.java:3474)
> > >         at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yynextChar(PiccoloLexer.java:3721)
> > >         at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseCdataSection(PiccoloLexer.java:2671)
> > >         at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4850)
> > >         at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
> > >         at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
> > >         at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
> > >         at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3439)
> > >         at org.apache.xmlbeans.impl.store.Locale.parse(Locale.java:706)
> > >         at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:690)
> > >         at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:677)
> > >         at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:208)
> > >         at com.siap.TransHotel.GetAvailAccomDocument$Factory.parse(Unknown Source)
> > >         at com.siap.DPKWebServices.Util.TransHotelUtil.validateRS(TransHotelUti
> > >
> > >     Is there a way to work around this problem?
> > >
> > >     Many thanks
> > >
> > >     Best regards
> > >
> > >     Bartolomeo
> > >
> > >     --
> > >     Bartolomeo Nicolotti
> > >     SIAP s.r.l.
> > >     www.siapcn.it
> > >     v.S.Albano 13 12049
> > >     Trinità(CN) Italy
> > >     ph:+39 0172 652553
> > >     switchboard: +39 0172 652511
> > >     fax: +39 0172 652519
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: user-unsubscr...@xmlbeans.apache.org
> > > For additional commands, e-mail: user-h...@xmlbeans.apache.org