Thanks for all the responses!

For the problem at hand, given the string "<tag><![CDATA[text &
markup]]></tag>", XmlBeans generates
<tag>&lt;![CDATA[text & markup]]>lt;/tag>. I'm making it work by replacing
the '&lt;!' with '<!'. What's more, XmlBeans replaces markup characters
that occur within the CDATA section with their references (it doesn't
understand a CDATA section). It seems I'd have to post-process the
generated XML to resolve all the references within the CDATA section.

Thanks,
Radhakrishnan

-----Original Message-----
From: Andy Clark [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 23, 2004 1:44 AM
To: [EMAIL PROTECTED]
Subject: Re: SAX Parse error while parsing CDATA element


Bob Foster wrote:
> A CDATA section can be (mostly) included as CDATA by breaking up the
> ]]> delimiter. For example,
> 
> <![CDATA[text & markup]]>
> 
> Can be wrapped as,
> 
> <![CDATA[<![CDATA[text & markup]]]>]>
> 
> with characters ]> outside any CDATA section as plain text. This 
> technique will work for any sequence of mixed text and markup by 
> escaping out of and back into CDATA sections as many times as
> necessary.

Interesting trick. For those that are interested, here are
the SAX callbacks that Xerces generates for the above example:

   startCDATA()
    characters(text="<![CDATA[text & markup")
    characters(text="]")
   endCDATA()
   characters(text="]")
   characters(text=">")

Simply ignoring the CDATA section boundaries produces a set
of characters that represents the CDATA section as text. Nice.
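
For anyone who wants to try this, here is a minimal sketch in Java,
assuming the JAXP parser bundled with the JDK (which is Xerces-based, but
may chunk the characters() calls differently from the trace above). The
class and method names are made up for illustration. The handler notes the
CDATA boundaries through the SAX2 lexical-handler property and then
ignores them, appending all character data to one buffer:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class CdataAsText {
    /** Parses xml and returns all character data, ignoring CDATA boundaries. */
    public static String textOf(String xml) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void startCDATA() { /* boundary seen, deliberately ignored */ }
            @Override public void endCDATA() { /* boundary seen, deliberately ignored */ }
            @Override public void characters(char[] ch, int start, int length) {
                // Called for text both inside and outside CDATA sections.
                text.append(ch, start, length);
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // Standard SAX2 property for receiving startCDATA()/endCDATA().
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<tag><![CDATA[<![CDATA[text & markup]]]>]></tag>";
        System.out.println(textOf(xml)); // prints <![CDATA[text & markup]]>
    }
}
```

The lexical handler is optional here, since characters() is called either
way. It is registered only to show where the boundaries would be detected
if you needed them.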

However, care still needs to be taken when encoding the embedded
content. Using SAX or DOM, you can detect CDATA section boundaries
and properly encode them. Dumping the contents of an XML file
directly into a CDATA section, however, can produce an ill-formed
document.

And simply searching for the text "]]>" within the stream that
is being embedded is not easy due to character encoding issues.
This means that you should use SAX or DOM to parse the embedded
document in order to properly encode its contents.

All of this work can be avoided, of course, if the document to
be embedded is generated and you can guarantee that it does not
contain the "]]>" sequence.
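
If the content is already available as a decoded Java String (so the
encoding question is out of the way), the technique Bob describes can be
sketched as below. This is only an illustration with hypothetical names,
not something XmlBeans or Xerces provides. Each embedded "]]>" is split so
that one "]" stays inside the section and "]>" is emitted as plain text
before a new section is opened:

```java
public class CdataWrap {
    /** Wraps already-decoded text in CDATA, breaking up any embedded "]]>". */
    public static String wrap(String text) {
        // "]]>" becomes: "]" (kept inside the section), "]]>" (section end),
        // "]>" (plain text outside any section), "<![CDATA[" (section reopened).
        String escaped = text.replace("]]>", "]]]>]><![CDATA[");
        return "<![CDATA[" + escaped + "]]>";
    }

    public static void main(String[] args) {
        System.out.println(wrap("<![CDATA[text & markup]]>"));
        // prints <![CDATA[<![CDATA[text & markup]]]>]><![CDATA[]]>
    }
}
```

Compared with the hand-written example, the output ends with an empty
CDATA section, which is harmless and keeps the code to a single replace
call.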

-- 
Andy Clark * [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
