Re: [Pharo-dev] How to get rid of empty XML nodes?

Norbert Hartl Sun, 10 Dec 2017 11:53:39 -0800

Sure it can get quite annoying. It would be good to have a switch to prevent 
the creation of whitespace-only nodes at parse time.
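
Until such a switch exists, one workaround is a single cleanup pass right 
after parsing, so the visitor itself never has to check or mutate anything. 
A minimal sketch, not a tested solution: it reuses the selectors from the 
code quoted below (nodes, removeNodes:, isStringNode, isEmpty, isWhitespace) 
and assumes that XMLDOMParser class>>parse: and allElementsDo: are available 
in the installed XMLParser version. The sample document is made up.

```smalltalk
"Sketch only: strip whitespace-only string nodes once, right after parsing.
 allElementsDo: and the class-side parse: are assumptions about the
 XMLParser API; the sample XML is hypothetical."
| doc |
doc := XMLDOMParser parse: '<one>
  <two/>
  <three/>
</one>'.

"Collect the nodes to drop first, then remove them, so no collection
 is modified while it is being enumerated."
doc allElementsDo: [ :element |
	element removeNodes: (element nodes select: [ :node |
		node isStringNode and: [ node isEmpty or: [ node isWhitespace ] ] ]) ].

"Inspect the result to check that only element children remain."
doc root nodes
```

This only makes sense when whitespace between elements carries no meaning 
in the document at hand, which is exactly the decision the spec leaves open.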


Norbert
> On 10.12.2017 at 08:42, Stephane Ducasse <[email protected]> wrote:
> 
> Norbert
> 
> Should I tell the tool generating the XML that it is an idiot? I cannot
> even do that. It is a tool I do not control.
> I have no control over what I get.
> Now why can we not decide that it does not matter whether people add a
> line return or not?
> Why can I not be in charge of deciding? I take the risk of the
> interpretation, but right now
> the "standard" does not help me at all. It just tells me what is good for me.
> 
> In the past I implemented "standards" like XMI, only to find that there
> were bugs in the spec.
> 
> In the end, each time I visit a node I have to check:
> 
> visitNodeWithElements: aNodeWithElements
>   | currentNode |
>   currentNode := OkStubNode new.
>   self cleanNode: aNodeWithElements.
>   aNodeWithElements hasChildren
>        ifTrue: [ | tokenNode |
>                    self cleanNode: aNodeWithElements nodes first.
>                    tokenNode := self visitElement: aNodeWithElements nodes first.
>                    self assert: tokenNode isToken.
>                    currentNode addChild: tokenNode.
>                    aNodeWithElements nodes allButFirst
>                        do: [ :each | currentNode addChild: (self visitNodeWithElements: each) ] ].
>   ^ currentNode
> 
> And I do not like to modify a structure while I'm visiting it.
> 
> 
> cleanNode: aNodeWithElements
>      aNodeWithElements removeNodes: (aNodeWithElements nodes select: [ :e |
>           e isStringNode and: [ e isEmpty or: [ e isWhitespace ] ] ])
> 
> So I understand why people are going away from XML.
> 
> Stef
> 
>> On Fri, Dec 8, 2017 at 4:02 PM, Norbert Hartl <[email protected]> wrote:
>> 
>> 
>>> On 08.12.2017 at 14:21, Stephane Ducasse <[email protected]> wrote:
>>> 
>>> Hi monty
>>> 
>>> 
>>>> On Fri, Dec 8, 2017 at 9:03 AM, monty <[email protected]> wrote:
>>>> By "empty XML nodes," do you mean whitespace-only string nodes?
>>> 
>>> Yes
>>> 
>>>> Those are included because all in-element whitespace is assumed 
>>>> significant by the spec: https://www.w3.org/TR/xml/#sec-white-space
>>> 
>>> I know. There was a discussion a while ago. I just lost a couple of
>>> hours understanding that :(
>>> 
>>> But this is a super super super annoying practice.
>>> We have to test each node to see if it is an empty node, so it makes
>>> everything a lot more complex without real justification
>>> besides the fact that these standardizers probably never implemented
>>> any real cases.
>>> This standard is really out of touch with reality from that perspective.
>> 
>> Are you sure you are not oversimplifying things? XML would be even more 
>> complex if these cases were in the standard. It is not easy to decide 
>> whether whitespace is important or not.
>> Where does this whitespace come from in your case? Most probably the 
>> XML is pretty printed, which inserts whitespace into the serialized 
>> text. So why not just stop pretty printing, and your problem is gone?
>> 
>> Norbert
>>> 
>>>> The exception is if the element is declared in the DTD as only having 
>>>> element children ("element content"): 
>>>> https://www.w3.org/TR/xml/#dt-elemcontent
>>> 
>>> Well, the XML files that I had (I did not choose XML because I would
>>> have preferred JSON :) ) had no DTD :(
>>> 
>>> So at the end of the day, this wonderful standard puts all the stress
>>> and burden on people.
>>> 
>>>> 
>>>> For example, if you declare an element like this:
>>>> 
>>>> <!ELEMENT one (two,three*,four?)>
>>>> 
>>>> Any whitespace around a "two," "three," or "four" element child of a "one" 
>>>> element is insignificant and ignored (unless 
>>>> #preservesIgnorableWhitespace: is true). Other parsers, like LibXML2 and 
>>>> Xerces, behave the same way.
>>>> 
>>>> I'll see if I can come up with some easier way to deal with this, like an 
>>>> optional parser setting, new enumeration methods, or maybe a tree 
>>>> transformation.
>>> 
>>> It would be A HUGE PLUS!!!!!!!!!!!!!!!!!!
>>> 
>>> 
>>> Because the reality is that people have XML files with just nodes and no
>>> empty nodes and they are forced to
>>> Let me know because I could try.
>>> 
>>> I was showing Pharo learners how to use Pharo to import code and
>>> this was a big drag.
>>> 
>>> Stef
>>> 
>>> 
>>> I tried to set some values in the parser but it did not work.
>>> BTW I saw that the configuration logic forces one to write the following
>>> 
>>> | parser doc visitor |
>>> parser := XMLDOMParser new
>>>  on: self xmlContents;
>>>  preservesIgnorableWhitespace: true.
>>> 
>>> and not
>>> 
>>> | parser doc visitor |
>>> parser := XMLDOMParser new
>>>   preservesIgnorableWhitespace: true;
>>>   on: self xmlContents.
>>> 
>>> 
>>>> 
>>>>> Sent: Tuesday, December 05, 2017 at 8:29 AM
>>>>> From: "Stephane Ducasse" <[email protected]>
>>>>> To: "Pharo Development List" <[email protected]>
>>>>> Subject: [Pharo-dev] How to get rid of empty XML nodes?
>>>>> 
>>>>> Hi
>>>>> 
>>>>> We are manipulating an XML document and I would like to get rid of the
>>>>> spurious empty strings.
>>>>> We saw that the gt panes are doing it:
>>>>> 
>>>>> (aNodeWithElements isStringNode
>>>>> and: [aNodeWithElements isEmpty
>>>>> or: [aNodeWithElements isWhitespace]])
>>>>> 
>>>>> Is there a way not to produce empty nodes?
>>>>> Is there a simple way not to have to handle them?
>>>>> 
>>>>> Now each time we are dealing with a node we have to check.
>>>>> 
>>>>> Stef
>>>>> 
>>>>> 
>>>> 
>> 
> <Original-java.xml>
