Re: [Pharo-dev] How to get rid of empty XML nodes?

Norbert Hartl Fri, 08 Dec 2017 07:03:57 -0800


> On 08.12.2017 at 14:21, Stephane Ducasse <[email protected]> wrote:
> 
> Hi monty
> 
> 
>> On Fri, Dec 8, 2017 at 9:03 AM, monty <[email protected]> wrote:
>> By "empty XML nodes," do you mean whitespace-only string nodes?
> 
> Yes
> 
>> Those are included because all in-element whitespace is assumed significant 
>> by the spec: https://www.w3.org/TR/xml/#sec-white-space
> 
> I know. There was a discussion a while ago. I just lost a couple of
> hours understanding that :(
> 
> But this is a super super super annoying practice.
> We had to test each node to see if it is an empty node, so it makes
> everything a lot more complex without real justification,
> besides the fact that these standardizers probably never implemented
> some real cases.
> This standard is really out of touch with reality from that perspective.


Are you sure you are not oversimplifying things? XML would be even more complex if 
these cases were in the standard. It is not easy to decide whether a whitespace 
is important or not.
Where does the whitespace in your case come from? Most probably the XML 
is pretty printed, which inserts whitespace into the serialized text. So 
why not just stop pretty printing, and your problem is gone. 
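
For illustration, a minimal sketch of where such nodes come from. It assumes the XMLParser package (XMLDOMParser) is loaded and uses made-up documents: the same elements are parsed once pretty printed and once on a single line, and the whitespace-only string nodes directly under the root are counted with the same kind of test used in the original message.

```smalltalk
"Sketch: indentation from a pretty printer shows up as whitespace-only string nodes.
 Assumes XMLDOMParser is available; the documents are made-up examples without a DTD."
| pretty compact prettyDoc compactDoc whitespaceCount |
pretty := '<one>
  <two/>
  <three/>
</one>'.
compact := '<one><two/><three/></one>'.
prettyDoc := XMLDOMParser parse: pretty.
compactDoc := XMLDOMParser parse: compact.

"Count the whitespace-only string nodes that are direct children of the root element."
whitespaceCount := [:doc |
    (doc root nodes select: [:each |
        each isStringNode and: [each isWhitespace]]) size].

Transcript
    show: 'pretty printed: ', (whitespaceCount value: prettyDoc) printString; cr;
    show: 'compact: ', (whitespaceCount value: compactDoc) printString; cr.
```

With no DTD, the pretty-printed version should report whitespace string nodes between the elements and the compact one none, which is the point above.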

Norbert
> 
>> The exception is if the element is declared in the DTD as only having 
>> element children ("element content"): 
>> https://www.w3.org/TR/xml/#dt-elemcontent
> 
> Well, the XML files that I had (I did not choose XML; I would
> have preferred JSON :) ) had no DTD :(
> 
> So at the end of the day, this wonderful standard puts all the stress
> and burden on people.
> 
>> 
>> For example, if you declare an element like this:
>> 
>> <!ELEMENT one (two,three*,four?)>
>> 
>> Any whitespace around a "two," "three," or "four" element child of a "one" 
>> element is insignificant and ignored (unless #preservesIgnorableWhitespace: 
>> is true). Other parsers, like LibXML2 and Xerces, behave the same way.
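
A minimal sketch of that DTD case, assuming XMLDOMParser with its default settings reads the internal DTD subset. The declaration for "one" mirrors the example above; the EMPTY declarations for the children are added only so the made-up document is valid. Parsed normally, the whitespace between the children of "one" is expected to be dropped; with preservesIgnorableWhitespace: true it should be kept.

```smalltalk
"Sketch: element content declared in an internal DTD subset makes the
 whitespace between the children of <one> ignorable.
 Assumes XMLDOMParser; the document and the EMPTY declarations are made up."
| xml default preserving |
xml := '<!DOCTYPE one [
  <!ELEMENT one (two,three*,four?)>
  <!ELEMENT two EMPTY>
  <!ELEMENT three EMPTY>
  <!ELEMENT four EMPTY>
]>
<one>
  <two/>
  <three/>
</one>'.

"Default settings: ignorable whitespace is not added to the DOM."
default := XMLDOMParser parse: xml.

"Opting back in keeps the ignorable whitespace as string nodes."
preserving := (XMLDOMParser on: xml)
    preservesIgnorableWhitespace: true;
    parseDocument.

Transcript
    show: 'children by default: ', default root nodes size printString; cr;
    show: 'children when preserving: ', preserving root nodes size printString; cr.
```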
>> 
>> I'll see if I can come up with some easier way to deal with this, like an 
>> optional parser setting, new enumeration methods, or maybe a tree 
>> transformation.
> 
> It would be A HUGE PLUS!!!!!!!!!!!!!!!!!!
> 
> 
> Because the reality is that people have XML files with just nodes and no
> empty nodes, and they are forced to
> Let me know, because I could try.
> 
> I was showing Pharo learners how to use Pharo to import code, and
> this was a big drag.
> 
> Stef
> 
> 
> I tried to set some values in the parser, but it did not work.
> BTW, I saw that the configuration logic forces you to write the following:
> 
> | parser doc visitor |
> parser := XMLDOMParser new
>   on: self xmlContents;
>   preservesIgnorableWhitespace: true.
> 
> and not:
> 
> | parser doc visitor |
> parser := XMLDOMParser new
>    preservesIgnorableWhitespace: true;
>    on: self xmlContents.
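
In Smalltalk a period ends a statement, while a cascade (;) sends each further message to the same receiver as the first one. A minimal sketch that avoids the ordering question altogether: create the parser on its input with the class-side XMLDOMParser on:, then configure and parse it in one cascade. It assumes the XMLParser package is loaded, and the XML string is a made-up example. Note that preservesIgnorableWhitespace: true keeps ignorable whitespace, so for getting rid of it the default (false) is what you want, and it only has an effect when a DTD declares element content.

```smalltalk
"Sketch: build the parser on its input first, then configure and parse in one cascade.
 Every message after ';' goes to the parser answered by 'XMLDOMParser on:', and the
 whole expression answers the result of the last message, #parseDocument."
| xmlString doc |
xmlString := '<one>
  <two/>
</one>'.
doc := (XMLDOMParser on: xmlString)
    preservesIgnorableWhitespace: false; "the default; only matters for DTD-declared element content"
    parseDocument.
```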
> 
> 
>> 
>>> Sent: Tuesday, December 05, 2017 at 8:29 AM
>>> From: "Stephane Ducasse" <[email protected]>
>>> To: "Pharo Development List" <[email protected]>
>>> Subject: [Pharo-dev] How to get rid of empty XML nodes?
>>> 
>>> Hi
>>> 
>>> We are manipulating an XML document, and I would like to get rid of the
>>> spurious empty strings.
>>> We saw that the GT panes are doing it:
>>> 
>>> (aNodeWithElements isStringNode
>>>     and: [aNodeWithElements isEmpty
>>>         or: [aNodeWithElements isWhitespace]])
>>> 
>>> Is there a way not to produce empty nodes?
>>> Is there a simple way not to have to handle them?
>>> 
>>> Now, each time we are dealing with a node, we have to check.
>>> 
>>> Stef
>>> 
>>> 
>>
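
Short of a parser setting, one workaround is a cleanup pass right after parsing that applies the whitespace test from the original message to every node and removes the matches. This is a minimal sketch, not an XMLParser feature; it assumes XMLParser's allNodesDo:, parent and removeNode: behave as in current versions (check yours), and the document is a made-up example.

```smalltalk
"Sketch: parse, then strip every string node that is empty or whitespace-only,
 using the same test as in the original message.
 Assumes XMLDOMParser plus #allNodesDo:, #parent and #removeNode: from XMLParser."
| doc formattingNodes |
doc := XMLDOMParser parse: '<one>
  <two>text</two>
  <three/>
</one>'.

"Collect first, then remove, so the tree is not modified while it is being enumerated."
formattingNodes := OrderedCollection new.
doc allNodesDo: [:each |
    (each isStringNode and: [each isEmpty or: [each isWhitespace]])
        ifTrue: [formattingNodes add: each]].
formattingNodes do: [:each | each parent removeNode: each].
```

Non-whitespace text such as the "text" inside two is left alone, since the test only matches empty or whitespace-only strings.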
