I managed to solve these problems. A patch against Xerces 1.3 is attached.
I hope it can be integrated into the current repository soon.
Sebastien
Sebastien Ponce wrote:
> I'm trying to serialize a tree that was built using Xerces. The problem is
> that this tree has many ignorable whitespace nodes in it (basically one
> every two nodes).
>
> When I try to serialize with the options setPreserveSpace(false) and
> setIndenting(true), the indentation is not done correctly. Basically, no
> line break is emitted. After looking at
> org.apache.xml.serialize.BaseMarkupSerializer and
> org.apache.xml.serialize.XMLSerializer, it appears that there are two main
> problems:
> - The content() method in BaseMarkupSerializer is called for a text node
> and changes the element state from empty to non-empty. Thus, when the
> first children of an element are an ignorable text node followed by an
> element, the state becomes non-empty while the ignorable text node is
> serialized, and the element does not print a line break because the state
> is neither empty nor afterElement when it arrives.
> - In the same way, content() sets state.afterElement to false. So for the
> sequence element, ignorable text node, element, the two elements end up
> on the same line.
>
> Finally, comments are treated as text, so no line break is printed before
> or after them. So if your XML has a comment every two lines explaining
> what the data are, everything is serialized on a single line...
>
> Sebastien
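
To make the state problem described above concrete, here is a minimal, self-contained sketch. It does not use the Xerces classes: IndentSketch, Node, State and serializeChildren are hypothetical names that only model the empty / afterElement / afterComment flags, under the assumption that content() clears those flags as described. With patched set to false it mimics the reported behaviour; with patched set to true it mimics what the attached patch aims for.

```java
import java.util.List;

public class IndentSketch {
    enum Kind { ELEMENT, COMMENT, TEXT }

    // Hypothetical stand-in for a DOM child node.
    static final class Node {
        final Kind kind;
        final String value;
        Node(Kind kind, String value) { this.kind = kind; this.value = value; }
    }

    // Hypothetical stand-in for ElementState: only the flags discussed here.
    static final class State {
        boolean empty = true;
        boolean afterElement;
        boolean afterComment;
    }

    // Models content(): any content makes the parent non-empty and clears the flags.
    static void content(State state) {
        state.empty = false;
        state.afterElement = false;
        state.afterComment = false;
    }

    static String serializeChildren(List<Node> children, boolean patched) {
        StringBuilder out = new StringBuilder();
        State state = new State();
        for (Node n : children) {
            switch (n.kind) {
                case TEXT:
                    if (patched && n.value.trim().isEmpty()) {
                        break; // patched: ignorable whitespace leaves the state untouched
                    }
                    content(state);
                    out.append(n.value.trim());
                    break;
                case COMMENT:
                    if (patched) {
                        out.append("\n  "); // patched: comments start on their own line
                    }
                    content(state);
                    out.append("<!--").append(n.value).append("-->");
                    state.afterComment = patched; // unpatched: a comment is just text
                    break;
                case ELEMENT:
                    if (state.empty || state.afterElement || state.afterComment) {
                        out.append("\n  ");
                    }
                    content(state);
                    out.append('<').append(n.value).append("/>");
                    state.afterElement = true;
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Node> children = List.of(
            new Node(Kind.TEXT, "\n  "), new Node(Kind.ELEMENT, "a"),
            new Node(Kind.TEXT, "\n  "), new Node(Kind.COMMENT, " note "),
            new Node(Kind.TEXT, "\n  "), new Node(Kind.ELEMENT, "b"));
        System.out.println("unpatched:" + serializeChildren(children, false));
        System.out.println("patched:" + serializeChildren(children, true));
    }
}
```

In the unpatched mode everything lands on one line, because each whitespace node resets the flags before the next element is reached. In the patched mode each element and comment gets its own line.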
Index: BaseMarkupSerializer.java
===================================================================
RCS file: /home/cvspublic/xml-xerces/java/src/org/apache/xml/serialize/BaseMarkupSerializer.java,v
retrieving revision 1.21
diff -r1.21 BaseMarkupSerializer.java
361a362
> state.afterComment = false;
601c602,607
< _printer.indent();
---
> // Indent this element on a new line if the first
> // content of the parent element or immediately
> // following an element.
> if ( _indenting && ! state.preserveSpace)
> _printer.breakLine();
> _printer.indent();
603c609
< _printer.unindent();
---
> _printer.unindent();
604a611,612
> state.afterComment = true;
> state.afterElement = false;
886c894,895
< characters( node.getNodeValue() );
---
> if ( !_indenting || getElementState().preserveSpace || !(text.replace('\n',' ').trim() != ""))
> characters( node.getNodeValue() );
1036a1046,1049
> // Except for one content type, all of them
> // are not last comment. That one content
> // type will take care of itself.
> state.afterComment = false;
1368a1382
> state.afterComment = false;
Index: ElementState.java
===================================================================
RCS file: /home/cvspublic/xml-xerces/java/src/org/apache/xml/serialize/ElementState.java,v
retrieving revision 1.6
diff -r1.6 ElementState.java
114a115,120
> * True if the last serialized node was a comment node.
> */
> boolean afterComment;
>
>
> /**
Index: XMLSerializer.java
===================================================================
RCS file: /home/cvspublic/xml-xerces/java/src/org/apache/xml/serialize/XMLSerializer.java,v
retrieving revision 1.17
diff -r1.17 XMLSerializer.java
217c217
< // following an element.
---
> // following an element or a comment
219c219
< ( state.empty || state.afterElement ) )
---
> ( state.empty || state.afterElement || state.afterComment ) )
335c335
< if ( _indenting && ! state.preserveSpace && state.afterElement )
---
> if ( _indenting && ! state.preserveSpace && (state.afterElement || state.afterComment) )
344a345
> state.afterComment = false;
391c392
< ( state.empty || state.afterElement ) )
---
> ( state.empty || state.afterElement || state.afterComment ) )
585c586
< ( state.empty || state.afterElement ) )
---
> ( state.empty || state.afterElement || state.afterComment ) )
647a649
> state.afterComment = false;
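
A note on the text-node hunk in BaseMarkupSerializer.java: it compares the trimmed string with "" using !=, which in Java compares object references rather than contents. Assuming the intent of that hunk is to drop whitespace-only text nodes only when indenting is on and space preservation is off, a content-based check could look like the sketch below. WhitespaceCheck, isWhitespaceOnly and shouldEmitText are hypothetical names, not part of Xerces.

```java
public class WhitespaceCheck {
    // True if the text contains nothing but whitespace. String.trim() strips
    // all characters up to U+0020, so '\n' needs no separate handling.
    static boolean isWhitespaceOnly(String text) {
        return text.trim().isEmpty();
    }

    // Hypothetical helper expressing the assumed intent of the hunk:
    // always emit text unless we are indenting, not preserving space,
    // and the text is whitespace only.
    static boolean shouldEmitText(boolean indenting, boolean preserveSpace, String text) {
        return !indenting || preserveSpace || !isWhitespaceOnly(text);
    }

    public static void main(String[] args) {
        System.out.println(shouldEmitText(true, false, "\n   "));
        System.out.println(shouldEmitText(true, false, " data "));
        System.out.println(shouldEmitText(false, false, "\n   "));
    }
}
```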