Hussein Shafie wrote:
> Benoit Maisonny wrote:
>   
>> I believe I've found a bug in whitespace handling. I've read some posts
>> about whitespace handling on the archives but I don't think this was
>> addressed already.
>>
>> In short, if an element such as XHTML's p contains a text node with
>> leading/trailing whitespace, that text node is trimmed when opening the
>> document.
>>
>> Steps to reproduce:
>>
>>   1. File, New, XHTML Strict/Page.
>>   2. In the p element, type "test: " (note the ending space).
>>   3. Save and close the document. The file is saved with <p>test: </p>
>>   4. Open the document in XXE: notice that the ending space has
>>      disappeared from the view. The file on disk hasn't been modified yet.
>>   5. Type something in the title element (i.e. the <p> is not affected)
>>   6. Save the document. The file is saved with <p>test:</p>, so the
>>      ending space has been removed.
>>
>> From my understanding of XML whitespace handling, I think XXE can format
>> that whitespace however it wants (add a new line, put several
>> spaces...), but it may not remove it. Or, so to speak, xml:space=default
>> does not mean "trim text nodes".
>>
>> Note, this issue does not arise if there is an element following that
>> space, inside the <p>. As in <p>test: <b>bold</b></p>. Likewise, it does
>> not happen for <div>test: <p>paragraph</p></div>.
>>
>>     
>
> I'm sorry, but xml:space="default" means do whatever you want with it.
>
> http://www.w3.org/TR/xml/#sec-white-space
> ---
> The value "default" signals that applications' default white-space
> processing modes are acceptable for this element; the value "preserve"
> indicates the intent that applications preserve all the white space.
> ---
>
> * In XXE, whitespace trimming only occurs when the document is opened.
>
> * XXE behavior is pretty reasonable:
> ---
> <p>
> Some text.
> </p>
> ---
>
> is loaded as:
>
> ---
> <p>Some text.</p>
> ---
>
> While:
> ---
> <p>
> Some text:
>
>
>
> <b>HERE!</b>
> </p>
> ---
>
> is loaded as:
>
> ---
> <p>Some text: <b>HERE!</b></p>
> ---
>
> If some spaces are important for you, please use &nbsp; (i.e.
> non-breaking spaces) and not plain space characters. With XXE, you can
> type &nbsp; by pressing Ctrl-SPACE.
>   
So you're right, XXE can define its own "whitespace processing mode", 
according to the XML spec.

What you find reasonable is to process whitespace differently depending
on whether it is at the beginning of a text node or in the middle, and
on whether an opening tag follows the text node.

I would have found it more logical to treat all whitespace in a given
text node the same way.
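
To check that I understand the behavior you describe, here is a minimal
Python sketch of the rule as I infer it from your two examples. It is my
guess, not XXE's actual code: the names collapse, normalize and load are
made up, and it only looks at the content of a single element.

```python
import re
import xml.etree.ElementTree as ET

# XML whitespace characters: space, tab, carriage return, line feed.
_WS_RUN = re.compile(r"[ \t\r\n]+")

def collapse(text):
    """Replace each run of XML whitespace with a single space."""
    return _WS_RUN.sub(" ", text or "")

def normalize(elem):
    """Guessed rule: collapse runs, then trim the start and end of elem's content."""
    children = list(elem)
    # First text node: collapse, then drop whitespace at the very start.
    elem.text = collapse(elem.text).lstrip(" ")
    # Text following each child element: collapse only.
    for child in children:
        child.tail = collapse(child.tail)
    # Last text node: drop whitespace at the very end.
    if children:
        children[-1].tail = children[-1].tail.rstrip(" ")
    else:
        elem.text = elem.text.rstrip(" ")
    return elem

def load(xml_source):
    """Parse, apply the guessed rule to the root element, and serialize again."""
    return ET.tostring(normalize(ET.fromstring(xml_source)), encoding="unicode")

print(load("<p>test: </p>"))             # <p>test:</p>
print(load("<p>test: <b>bold</b></p>"))  # <p>test: <b>bold</b></p>
```

This guess reproduces the cases from my first message and your two
examples, but it does not explain the case below.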

I found another special case:
<body>
    <div><p>test: </p><p>test</p></div>
  </body>
In this case, XXE keeps the space in the first p.
But not in the following:
<body>
    <p>test: </p>

    <p>test</p>
  </body>

What is the difference? Do you look at the content model of the text
node's grandparent (mixed or not) to decide whether to trim?

I would need to know:
1. What decision process determines whether whitespace is trimmed.
2. Whether your method is a widely accepted way of processing whitespace
in "default" mode.
3. How you decided on this method rather than not trimming, as I expected.
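
As one data point for question 2, Python's standard
xml.etree.ElementTree does no trimming in this case: it reports the
trailing space unchanged and writes it back on save. A minimal sketch
(round_trip is a made-up helper, and this says nothing about XXE's
internals):

```python
import xml.etree.ElementTree as ET

def round_trip(xml_source):
    """Parse and re-serialize without any application-level whitespace processing."""
    root = ET.fromstring(xml_source)
    return root.text, ET.tostring(root, encoding="unicode")

text, saved = round_trip("<p>test: </p>")
print(repr(text))  # 'test: '
print(saved)       # <p>test: </p>
```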


Thanks,
Benoit




-- 
Benoit Maisonny                benoit at synclude.com
Director & Consultant          http://synclude.com
Synclude

