> Hi,
>
> Could any of the developers be kind enough to read this email and
> do the needful? I was only trying to help the development team in
> creating a good product.
> Should any one of you feel that it is incorrect, please say so.
>
> Regards,
> -sripathy
>
>
> -----Original Message-----
> From: Sripathy Subramania
> Sent: Wednesday, April 04, 2001 2:03 PM
> To: '[email protected]'
> Cc: '[EMAIL PROTECTED]'
> Subject: BaseMarkupSerializer bug
>
> Hi,
>
> In xerces-1_1_3, BaseMarkupSerializer.characters(char[], int, int)
> inserts the escape sequence "]]><![CDATA[" for the embedded string
> pattern "]]>" at the wrong location.
> This results in incorrect XML data serialization from the DOM.
>
> I have proposed a fix in this mail.
>
> Xerces version : 1.1.3
> JDK version : 1.3
>
> I had a requirement of serializing a DOM conforming to the
> following DTD.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!ELEMENT Sample (Id, Messages+)>
> <!ELEMENT Id (#PCDATA)>
> <!ELEMENT Messages (MsgId, MsgDesc?, Msg)>
> <!ELEMENT MsgId (#PCDATA)>
> <!ELEMENT MsgDesc (#PCDATA)>
> <!ELEMENT Msg (#PCDATA)>
>
> An XML file conforming to this DTD may be
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE Sample SYSTEM "Sample.dtd">
> <Sample>
>   <Id>Doc 1</Id>
>   <Messages>
>     <MsgId>Msg 1</MsgId>
>     <MsgDesc>Testing document</MsgDesc>
>     <Msg><![CDATA[This is a test message having patterns ]]>. This message
> may contain multiple occurrences of patterns ]]>. The End]]></Msg>
>   </Messages>
> </Sample>
>
> In the above-mentioned DTD, the 'Msg' element value will be a
> CDATA section. This element value may contain the string "]]>"
> embedded in it (as shown in the sample XML document above).
> BaseMarkupSerializer identifies this pattern and
> escapes it by inserting the string "]]><![CDATA[" between "]]" and ">".
> But the code logic for escaping seems to have a bug.
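For reference, here is a minimal sketch of the output one would expect for such a value. It assumes the usual technique of ending the section after "]]" and reopening it before ">". The class and method names are made up for illustration and are not part of Xerces, and it works on a whole String rather than on the char[] buffers the serializer receives.

```java
// Hypothetical helper, not part of Xerces: shows the text a serializer
// would be expected to emit for a value that contains "]]>".
final class CDataEscapeSketch {

    // Wraps text in a CDATA section, splitting every "]]>" so that "]]"
    // ends one section and ">" starts the next one.
    static String toCData(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        // prints <![CDATA[patterns ]]]]><![CDATA[>. The End]]>
        System.out.println(toCData("patterns ]]>. The End"));
    }
}
```

A parser reading this back sees two adjacent CDATA sections whose contents concatenate to the original text, including the "]]>".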
>
> Original source from
> Xerces-1_1_3\src\org\apache\xml\serialize\BaseMarkupSerializer
> (Lines 457~491)
> *********************************************************
>     public void characters( char[] chars, int start, int length )
>     {
>         ElementState state;
>
>         state = content();
>         // Check if text should be print as CDATA section or unescaped
>         // based on elements listed in the output format (the element
>         // state) or whether we are inside a CDATA section or entity.
>
>         if ( state.inCData || state.doCData ) {
>             int saveIndent;
>
>             // Print a CDATA section. The text is not escaped, but ']]>'
>             // appearing in the code must be identified and dealt with.
>             // The contents of a text node is considered space
>             // preserving.
>             if ( ! state.inCData ) {
>                 _printer.printText( "<![CDATA[" );
>                 state.inCData = true;
>             }
>             saveIndent = _printer.getNextIndent();
>             _printer.setNextIndent( 0 );
>             for ( int index = 0 ; index < length ; ++index ) {
>                 if ( index + 2 < length && chars[ index ] == ']' &&
>                      chars[ index + 1 ] == ']' &&
>                      chars[ index + 2 ] == '>') {
>
>                     printText( chars, start, index + 2, true, true );
>                     _printer.printText( "]]><![CDATA[" );
>                     start += index + 2;
>                     length -= index + 2;
>                     index = 0;
>                 }
>             }
>             if ( length > 0 )
>                 printText( chars, start, length, true, true );
>             _printer.setNextIndent( saveIndent );
> *************************************************************
> Proposed changes for the above block
>
>     public void characters( char[] chars, int start, int length )
>     {
>         ElementState state;
>
>         state = content();
>         // Check if text should be print as CDATA section or unescaped
>         // based on elements listed in the output format (the element
>         // state) or whether we are inside a CDATA section or entity.
>
>         if ( state.inCData || state.doCData ) {
>             int saveIndent;
>             int index = 0;
>             int endIndex = 0;
>
>             // Print a CDATA section. The text is not escaped, but ']]>'
>             // appearing in the code must be identified and dealt with.
>             // The contents of a text node is considered space
>             // preserving.
>             if ( ! state.inCData ) {
>                 _printer.printText( "<![CDATA[" );
>                 state.inCData = true;
>             }
>             saveIndent = _printer.getNextIndent();
>             _printer.setNextIndent( 0 );
>             endIndex = start + length;
>             for ( index = start ; index < endIndex ; ++index ) {
>                 if ( index + 2 < endIndex && chars[ index ] == ']' &&
>                      chars[ index + 1 ] == ']' &&
>                      chars[ index + 2 ] == '>') {
>
>                     printText( chars, start, index + 2 - start,
>                                true, true);
>                     _printer.printText( "]]><![CDATA[" );
>                     start = index + 2;
>                     index = start;
>                 }
>             }
>             if ( index > start )
>                 printText( chars, start, index-start, true, true );
>             _printer.setNextIndent( saveIndent );
> ********************************************************************
>
> NOTE : However, this fix does not handle the case where the string
> pattern "]]>" is split across a buffer boundary.
> This might require more changes.
>
> I checked the source for Xerces-1_2_3 and observed that this bug is
> not fixed yet. Moreover, I couldn't find mails discussing this
> problem/fix in the 'xerces-j-dev'/'xerces-j-user' mailing lists.
> I don't know whether this bug has already been identified by the
> development team or not.
>
> I would appreciate it if someone familiar with the code could verify
> the bug and baseline the changes. I would be glad to provide more
> information in this regard.
>
> Thanks,
> -sripathy
>
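To try the proposed loop outside Xerces, here is a sketch that ports it to a standalone method. It is only a stand-in under stated assumptions: the class and method names are hypothetical, a StringBuilder replaces _printer and printText, and the closing "]]>" is appended here just so the result is a complete section (the quoted excerpt does not show where the serializer closes it).

```java
// Hypothetical test harness, not Xerces code: the loop from the proposed
// change, with output collected in a StringBuilder instead of _printer.
final class CDataLoopSketch {

    static String writeCData(char[] chars, int start, int length) {
        StringBuilder out = new StringBuilder("<![CDATA[");
        int endIndex = start + length;
        int index;
        for (index = start; index < endIndex; ++index) {
            if (index + 2 < endIndex && chars[index] == ']'
                    && chars[index + 1] == ']' && chars[index + 2] == '>') {
                // Emit everything up to and including "]]", then end the
                // current section and open a new one before the ">".
                out.append(chars, start, index + 2 - start);
                out.append("]]><![CDATA[");
                start = index + 2;
                index = start; // the loop's ++index then steps past ">"
            }
        }
        // Emit whatever is left after the last match (or the whole range).
        if (index > start) {
            out.append(chars, start, index - start);
        }
        return out.append("]]>").toString();
    }

    public static void main(String[] args) {
        char[] buf = "xx a]]>b".toCharArray();
        // Non-zero start, which is where the original indexing went wrong.
        // prints <![CDATA[a]]]]><![CDATA[>b]]>
        System.out.println(writeCData(buf, 3, 5));
    }
}
```

As the NOTE in the mail says, a "]]>" that is split across two buffers is still not detected by this loop; handling that would need some state carried over between calls.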
