Re: FormatPrettyPrint implementation

Gareth Reakes Wed, 27 Nov 2002 01:23:22 -0800

Hi,

On Wed, 27 Nov 2002 [EMAIL PROTECTED] wrote:


> wrt fitting in with spec.
> If an element is defined as mixed by a DTD then you can't insert or remove 
> white space for the purposes of being 'human readable' because if the 
> whitespace is important then what the human reads (after pretty printing) 
> will be different to what a machine reads.

Agreed, but the rules are more complex when we take schemas into account.
Adding whitespace may not be allowed even in mixed content.
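
To make the concern concrete, here is a minimal sketch. It is not Xerces
code: Python's xml.dom.minidom stands in for the DOM implementation, and
text_content is a helper made up for this illustration. It serialises a
small document with indentation, reparses it, and compares the character
data a program would see before and after.

```python
from xml.dom import minidom, Node


def text_content(node):
    """Concatenate the data of every Text node below `node`, in document order."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return "".join(text_content(child) for child in node.childNodes)


source = "<p><b>bold</b><i>italic</i></p>"
original = minidom.parseString(source)

# toprettyxml() adds newlines and indentation without consulting any grammar.
pretty = original.toprettyxml(indent="  ")
reparsed = minidom.parseString(pretty)

before = text_content(original.documentElement)  # "bolditalic"
after = text_content(reparsed.documentElement)   # now contains whitespace as well

print(repr(before))
print(repr(after))
```

If <p> were declared as mixed content, the whitespace in the second
string would be significant, so the reparsed document no longer carries
the same data as the original.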

> I agree that the spec is somewhat grey.  However there is no implication 
> that the transformation is allowed to make the document invalid - so 
> should you play it safe?

The fact that the spec itself states that it does not define this
suggests to me that we can use either approach.

> 
> wrt the difficulty.
> Agree it could be hard.  However, seeing as the parser can determine the 
> result for DOMText::getIsWhitespaceInElementContent, why can't the parser 
> figure out if a Text node can be added with this attribute set?  Is that a 
> naive question ?;-).

It's a matter of schema validation as well. We cannot currently perform
validation on a DOM tree; we have to serialise and reparse. Then we have
problems like creating invalid documents.


> I would have thought this sort of thing would have to be sorted with 
> canonicalisation anyway ...

I know more about the schema stuff so I will take it from that
perspective. Each of the schema types (and therefore derived types etc.)
does have a defined canonical representation. If we implemented the
ability to print this out using a method like printCanonicalRepresentation,
then I don't see why we could not also write a method (that may take some
params) called prettyPrintCanonicalRepresentation. Implementing this, in
addition to binding the schema type info to the elements/attributes
(something I am currently working on) and then having the SchemaGrammar
available to find the validators, would, I think, be sufficient. This is
some work, and we are still left with problems such as what to do with
manipulated trees that are now invalid.
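
As a rough illustration of what "canonical representation" means for two
of the built-in types, here is a sketch in Python. It is not the Xerces
API: canonical_boolean and canonical_decimal are hypothetical names, the
decimal rules follow my reading of XML Schema 1.0 Part 2 (no leading "+",
a mandatory decimal point, at least one digit on each side, no other
leading or trailing zeros), and printing zero without a sign is an
assumption of the sketch.

```python
import re

# Lexical form of xs:decimal: optional sign, digits, optional fraction.
_DECIMAL = re.compile(r"([+-]?)(\d*)(?:\.(\d*))?")


def canonical_boolean(lexical):
    """Map the four legal xs:boolean literals onto the canonical pair."""
    text = lexical.strip()
    if text in ("true", "1"):
        return "true"
    if text in ("false", "0"):
        return "false"
    raise ValueError("not an xs:boolean literal: %r" % lexical)


def canonical_decimal(lexical):
    """Return the XML Schema 1.0 style canonical form of an xs:decimal literal."""
    text = lexical.strip()
    match = _DECIMAL.fullmatch(text)
    # At least one digit must be present somewhere in the literal.
    if not match or not (match.group(2) or match.group(3)):
        raise ValueError("not an xs:decimal literal: %r" % lexical)
    sign = match.group(1)
    integer = match.group(2).lstrip("0") or "0"
    fraction = (match.group(3) or "").rstrip("0") or "0"
    # Assumption of this sketch: zero is always printed without a sign.
    negative = sign == "-" and not (integer == "0" and fraction == "0")
    return ("-" if negative else "") + integer + "." + fraction


print(canonical_decimal("+001.50"))  # 1.5
print(canonical_decimal("3"))        # 3.0
print(canonical_boolean("1"))        # true
```

A real implementation would get the type from the schema information
bound to each element or attribute instead of being told which function
to call.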

> 
> wrt two features.
> Feature "format-pretty-print", that plays it safe and maintains validity.
> Feature 
> "http://apache.org/xml/features/format-pretty-print-no-grammar-check" that 
> takes a punt according to rules such as those below.

Send a message suggesting this to [EMAIL PROTECTED]. They do reply. They
have probably discussed this. I would be interested in the response.

> 2./
> If people do want particular formatting then maybe it is not a big deal for 
> them to iterate over the DOM and insert text nodes (i.e. ignorable 
> whitespace) as appropriate for their particular needs.

You would have to clone the tree first because you may want to use it 
afterwards.
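
A minimal sketch of that clone-then-indent approach, again using Python's
xml.dom.minidom as a stand-in for the Xerces DOM. The names indented_copy
and _indent are made up for this example. With no grammar available it
falls back on a heuristic: an element is indented only if all of its
children are elements, and anything with a text child is treated as mixed
content and left alone.

```python
from xml.dom import minidom, Node


def indented_copy(element, indent="  "):
    """Return a deep clone of `element` with indentation text nodes inserted."""
    clone = element.cloneNode(True)  # deep copy; the caller's tree is not modified
    _indent(clone, 0, indent)
    return clone


def _indent(element, depth, indent):
    doc = element.ownerDocument
    children = list(element.childNodes)  # snapshot, because we insert while looping
    # Heuristic only: skip empty elements and anything that already has a
    # non-element child (text, comment, ...), since that may be mixed content.
    if not children or any(c.nodeType != Node.ELEMENT_NODE for c in children):
        return
    for child in children:
        element.insertBefore(doc.createTextNode("\n" + indent * (depth + 1)), child)
        _indent(child, depth + 1, indent)
    element.appendChild(doc.createTextNode("\n" + indent * depth))


doc = minidom.parseString("<a><b><c/></b><p>text <i>x</i></p></a>")
pretty = indented_copy(doc.documentElement)

print(pretty.toxml())               # indented copy
print(doc.documentElement.toxml())  # original, unchanged
```

The heuristic cannot tell whether an element that happens to contain only
child elements was declared mixed, which is exactly the case where a
grammar check would be needed.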

> 3./
> Maybe having format-canonical implemented would suit people who want 
> 'human readability' but need to maintain validity.  'Fraid I'm not up on 
> the Canonical XML spec and/or status.

I think what you suggest definitely has benefits; I just don't think it
fits with the normal pretty print. I agree with you that it might be nice
to have 2 features. Let's see what the DOM WG says.

Gareth


-- 
Gareth Reakes, Head of Product Development  
DecisionSoft Ltd.            http://www.decisionsoft.com
Office: +44 (0) 1865 203192


