Comment:

I'd hate to see any implementation of FormatPrettyPrint go in that does not take the DTD/Schema into account. Without the DTD/Schema one cannot make any safe assumptions about the content of the document (e.g. ignoring whitespace nodes, inserting new lines, etc.).

Maybe there need to be two features: one that only works if a DTD/Schema is available, and one that takes a reasonable 'punt' at what is OK (such as the rules below) whether or not a DTD/Schema is present?
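As a rough illustration of the two-feature idea, the C++ sketch below isolates the one decision where the modes differ: whether the serializer may add or drop whitespace among an element's children. `PrettyPrintMode`, `ContentModels` and `mayReformatChildren` are hypothetical names, not Xerces-C++ API, and the content-model table is assumed to have been summarized from the DTD/Schema beforehand.

```cpp
#include <map>
#include <string>

// Hypothetical option names; Xerces-C++ defines no such enum.
enum class PrettyPrintMode {
    SchemaAware,  // reformat only where the DTD/Schema shows whitespace is insignificant
    Heuristic     // reformat everywhere (the 'punt', e.g. the rules proposed below)
};

// Assumed summary of the DTD, keyed by element name:
//   true  = mixed content,        e.g. <!ELEMENT s (#PCDATA | s1 | s2)* >
//   false = element-only content, e.g. a hypothetical <!ELEMENT node (s)* >
using ContentModels = std::map<std::string, bool>;

// Returns true if the serializer may drop or insert whitespace-only text
// among the children of `element`.
bool mayReformatChildren(PrettyPrintMode mode,
                         const ContentModels& models,
                         const std::string& element) {
    if (mode == PrettyPrintMode::Heuristic) {
        return true;  // assume whitespace between tags is only formatting
    }
    auto it = models.find(element);
    if (it == models.end()) {
        return false;  // undeclared element: no safe assumption can be made
    }
    return !it->second;  // only element-only content may be reformatted
}
```

With `s` declared mixed, as in `<!ELEMENT s (#PCDATA | s1 | s2)* >`, the schema-aware mode leaves the children of `s` alone, while the heuristic mode reformats them regardless.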


Example:
<node>
        <s><s1/><s2>more text</s2></s>
</node>

<node>
        <s>
                <s1/>
                <s2>more text</s2>
        </s>
</node>

Do these represent the same data content? You can't tell without the DTD/Schema.
If element s is defined as mixed (i.e. it can contain both text and elements), then the two are potentially critically different.

i.e. if element s must conform to:
<!ELEMENT s (#PCDATA | s1 | s2)* >

How would you format:
<node>
        <s><s1/><s2>more text</s2>trailing text</s>
</node>
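To make the difference concrete, here is a small C++ sketch over a toy node model (`Node`, `makeElem`, `makeText` and `textContent` are illustrative names, not the Xerces-C++ DOM API). It builds the `s` element from the last example twice, once compact and once in one plausible indented layout (assumed here), and compares the concatenated character data. Because `s` is mixed, the inserted newlines and spaces are real text once the output is re-parsed, so the data content changes.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy node model (hypothetical; this is not the Xerces-C++ DOM API).
struct Node {
    std::string name;                             // element name; empty for a text node
    std::string text;                             // character data of a text node
    std::vector<std::shared_ptr<Node>> children;  // children in document order
};

std::shared_ptr<Node> makeElem(std::string name,
                               std::vector<std::shared_ptr<Node>> kids = {}) {
    return std::make_shared<Node>(Node{std::move(name), "", std::move(kids)});
}

std::shared_ptr<Node> makeText(std::string data) {
    return std::make_shared<Node>(Node{"", std::move(data), {}});
}

// Concatenated character data of all descendant text nodes, in document
// order (the same idea as the DOM Level 3 textContent attribute).
std::string textContent(const Node& n) {
    if (n.name.empty()) return n.text;
    std::string out;
    for (const auto& c : n.children) out += textContent(*c);
    return out;
}

// <s><s1/><s2>more text</s2>trailing text</s>
std::shared_ptr<Node> compactS() {
    return makeElem("s", {makeElem("s1"),
                          makeElem("s2", {makeText("more text")}),
                          makeText("trailing text")});
}

// Assumed indented layout: each child on its own line, two spaces deep.
// The added newlines and spaces end up as text children of s.
std::shared_ptr<Node> indentedS() {
    return makeElem("s", {makeText("\n  "), makeElem("s1"),
                          makeText("\n  "),
                          makeElem("s2", {makeText("more text")}),
                          makeText("\n  trailing text\n")});
}
```

The compact form yields `"more texttrailing text"`, while the indented form yields the same words interleaved with newlines and spaces, which an application reading `s` as mixed content would see as different data.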




Gareth Reakes <[EMAIL PROTECTED]>

21/11/2002 09:56 PM
Please respond to xerces-c-dev

       
        To:        [EMAIL PROTECTED]
        cc:        
        Subject:        Re: FormatPrettyPrint implementation


Hi,
Nice work :). The spec does not say how pretty printing should
work, so if no one has any objections to the way you have done it, I
suggest we commit it and see what users say. I know many people ask for
this functionality.

Gareth



On Thu, 21 Nov 2002, Kevin King wrote:

> Hello,
>
> I should probably first issue the disclaimer that, as of a few days ago, I
> did not know any details about XML, nor had I even heard of Xerces. I have,
> however, been able to very quickly integrate Xerces-C++ into my application
> and get some basic XML functionality working using the DOM API.
>
> I obtained unexpected results when serializing, with FormatPrettyPrint set, a
> document that was created from scratch within my application. A quick
> examination of the DOMWriter implementation and a search through the
> archives of the xerces-c-dev list confirmed that this was not user error
> and that it was functioning as designed. So this evening I modified DOMWriter
> to format its output "Pretty".
>
> The fact that I was able to get basic XML working within my application,
> and even add some functionality, in a matter of a couple of days is a testament
> to everyone who has worked on this project. I was very impressed at how
> easy it was to write code based on the provided samples and even edit the
> source. Everything is very well organized and documented.
>
> My implementation of PrettyPrint seems to work with some random XML files I
> was able to find.  But I will not begin to suggest it is a complete or
> working solution since my knowledge of XML is minimal.
>
> I came up with a few rules, added to DOMWriterImpl::processNode(), which
> seem to do the trick when PrettyPrint is enabled:
>
>          1) All text nodes that contain ONLY whitespace are ignored
>
>          2) Each tag begins on a new line, indented a variable amount based
> on its level.  A level is defined as how many generations removed from the
> root element it is.
>
>          3) Closing tags for Element nodes are printed on the same line as
> the opening tag if no newlines have been output as the result of any
> children. Otherwise closing tags are printed on a new line, indented to the
> same level as the opening tag.
>
>          4) An empty line is printed just before the tag for each child
> of the root node.
>
>
> Currently the amount of indentation is hard-coded to two spaces per
> level. This should be user-configurable in a final implementation.
>
> Now my concern is that rule #1 may not fly. I do not know enough about XML
> to know whether it will incorrectly ignore some valid data. In all the XML
> samples I could find, the only time a text node contained only
> whitespace was when it sat between an element's close tag and the next
> element's open tag, purely to provide a readable layout. I decided that it is
> best to ignore all existing formatting when FormatPrettyPrint is enabled,
> as any attempt to combine the two would be too complex and produce
> unpredictable output.
>
> Rules 2, 3, and 4 are just my own preference in what I think looks good,
> and they were very easy to implement.
>
> I do not know if anyone was working on this, but the following thread seemed
> to indicate it was not, as the only more recent discussions were from people
> reporting that FormatPrettyPrint produced unexpected results.
>          http://marc.theaimsgroup.com/?l=xerces-c-dev&m=102760381301304&w=2
>
>
> I would like to hear any comments on the above, and would also welcome
> some sample XML to run through DOMWriter to see how it handles it
> with FormatPrettyPrint on. I am more than willing to share these
> changes and to fix any oversights I have made.
>
> -Kevin King
>
>
> Sample output using the "personal.xml" file provided in the samples.  I
> removed 3 of the users for a briefer sample:
>          "domprint.exe -wfpp=on personal.xml"
>
> <?xml version="1.0" encoding="iso-8859-1" standalone="no" ?>
> <!DOCTYPE personnel>
> <!-- @version: -->
>
>
>
> <personnel>
>    <person id="Big.Boss">
>      <name>
>        <family>Mr Boss</family>
>        <given>Big</given>
>      </name>
>      <email>[EMAIL PROTECTED]</email>
>      <link subordinates="one.worker two.worker"/>
>    </person>
>
>    <person id="one.worker">
>      <name>
>        <family>Worker</family>
>        <given>One</given>
>      </name>
>      <email>[EMAIL PROTECTED]</email>
>      <link manager="Big.Boss"/>
>    </person>
>
>    <person id="two.worker">
>      <name>
>        <family>Worker</family>
>        <given>Two</given>
>      </name>
>      <email>[EMAIL PROTECTED]</email>
>      <link manager="Big.Boss"/>
>    </person>
>
> </personnel>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
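The four rules in Kevin's message can be sketched in C++ roughly as follows. This is a hypothetical reconstruction over a toy node model, not the actual change to DOMWriterImpl::processNode(): attributes, escaping, comments and the XML declaration are omitted, non-whitespace text is written inline, and the indent width is a constructor parameter rather than hard-coded. The rule 1 test uses the XML 1.0 definition of whitespace (space, tab, carriage return, line feed), which is narrower than std::isspace.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy node model (hypothetical; this is not the Xerces-C++ DOM API).
struct Node {
    std::string name;                             // element name; empty for a text node
    std::string text;                             // character data of a text node
    std::vector<std::shared_ptr<Node>> children;  // children in document order
};

std::shared_ptr<Node> makeElem(std::string name,
                               std::vector<std::shared_ptr<Node>> kids = {}) {
    return std::make_shared<Node>(Node{std::move(name), "", std::move(kids)});
}

std::shared_ptr<Node> makeText(std::string data) {
    return std::make_shared<Node>(Node{"", std::move(data), {}});
}

// Rule 1 test: XML 1.0 whitespace (the S production) is only #x20, #x9,
// #xD and #xA; std::isspace would also accept '\v' and '\f'.
bool isXmlWhitespaceOnly(const std::string& s) {
    for (char c : s)
        if (c != ' ' && c != '\t' && c != '\r' && c != '\n') return false;
    return true;
}

class PrettyPrinter {
public:
    explicit PrettyPrinter(std::size_t indentWidth = 2) : indentWidth_(indentWidth) {}

    std::string print(const Node& root) {
        out_.clear();
        printElement(root, 0, true);
        out_ += '\n';
        return out_;
    }

private:
    std::size_t indentWidth_;
    std::string out_;

    void newline(std::size_t level) {
        out_ += '\n';
        out_.append(level * indentWidth_, ' ');
    }

    void printElement(const Node& n, std::size_t level, bool isRoot) {
        // Rule 1: ignore text nodes that contain only whitespace.
        std::vector<const Node*> kids;
        for (const auto& c : n.children)
            if (!c->name.empty() || !isXmlWhitespaceOnly(c->text))
                kids.push_back(c.get());

        if (kids.empty()) {
            out_ += "<" + n.name + "/>";
            return;
        }
        out_ += "<" + n.name + ">";
        bool emittedNewline = false;
        for (const Node* c : kids) {
            if (c->name.empty()) {     // remaining text is written inline, unescaped
                out_ += c->text;
                continue;
            }
            if (isRoot) out_ += '\n';  // Rule 4: empty line before each child of the root
            newline(level + 1);        // Rule 2: each tag on a new line, indented by depth
            printElement(*c, level + 1, false);
            emittedNewline = true;
        }
        // Rule 3: closing tag on its own line only if the children emitted newlines.
        if (emittedNewline) newline(level);
        out_ += "</" + n.name + ">";
    }
};
```

For a root `r` containing `<a>hi</a>` and `<b><c/></b>` separated by whitespace-only text, the default printer drops that whitespace, puts an empty line before `<a>` and `<b>`, keeps `<a>hi</a>` on one line, and puts `</b>` on its own line because `<c/>` forced a newline. Feeding it the mixed `s` example from the comment at the top shows that concern directly: rule 2 moves `<s1/>` and `<s2>` onto new lines, adding text to `s`.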

--
Gareth Reakes, Head of Product Development  
DecisionSoft Ltd.            http://www.decisionsoft.com
Office: +44 (0) 1865 203192


