Sure, it can get quite annoying. It would be good to have a switch to prevent the creation of whitespace-only nodes at parse time.
Norbert
> On 10.12.2017 at 08:42, Stephane Ducasse <[email protected]> wrote:
>
> Norbert
>
> Should I tell the tool generating the XML that it is an idiot? I
> cannot even do that. It is a tool I do not control.
> I have no control over what I get.
> Now, why can we not declare that whether people add a line return or
> not does not matter?
> Why can I not be in charge of deciding? I take the risk of the
> interpretation, but now
> the "standard" does not help me at all. It just tells me that it is good for me.
>
> In the past I implemented "standards" like XMI only to find that there
> were bugs in the spec.
>
> In the end, each time I visit a node I have to check:
>
> visitNodeWithElements: aNodeWithElements
>     | currentNode |
>     currentNode := OkStubNode new.
>     self cleanNode: aNodeWithElements.
>     aNodeWithElements hasChildren
>         ifTrue: [ | tokenNode |
>             self cleanNode: aNodeWithElements nodes first.
>             tokenNode := self visitElement: aNodeWithElements nodes first.
>             self assert: tokenNode isToken.
>             currentNode addChild: tokenNode.
>             aNodeWithElements nodes allButFirst
>                 do: [ :each | currentNode addChild: (self visitNodeWithElements: each) ] ].
>     ^ currentNode
>
> And I do not like to modify a structure while I'm visiting it.
>
>
> cleanNode: aNodeWithElements
>     aNodeWithElements removeNodes: (aNodeWithElements nodes select:
>         [ :e | e isStringNode and: [ e isEmpty or: [ e isWhitespace ] ] ])
>
> So I understand why people are moving away from XML.
>
> Stef
>
>> On Fri, Dec 8, 2017 at 4:02 PM, Norbert Hartl <[email protected]> wrote:
>>
>>
>>> On 08.12.2017 at 14:21, Stephane Ducasse <[email protected]> wrote:
>>>
>>> Hi monty
>>>
>>>
>>>> On Fri, Dec 8, 2017 at 9:03 AM, monty <[email protected]> wrote:
>>>> By "empty XML nodes," do you mean whitespace-only string nodes?
>>>
>>> Yes
>>>
>>>> Those are included because all in-element whitespace is assumed
>>>> significant by the spec: https://www.w3.org/TR/xml/#sec-white-space
>>>
>>> I know.
>>> There was a discussion a while ago. I just lost a couple of
>>> hours understanding that :(
>>>
>>> But this is a super super super annoying practice.
>>> We had to test each node to see if it is an empty node, so it makes
>>> everything a lot more complex without real justification,
>>> besides the fact that these standardizers probably never implemented
>>> any real cases.
>>> This standard is really out of touch with reality from that perspective.
>>
>> Are you sure you are not oversimplifying things? XML would be even more complex
>> if these cases were in the standard. It is not easy to decide whether
>> whitespace is important or not.
>> Where does the whitespace in your case come from? Most probably the
>> XML is pretty-printed, which inserts whitespace into the serialized
>> text. So why not just stop pretty-printing, and your problem is gone.
>>
>> Norbert
>>>
>>>> The exception is if the element is declared in the DTD as only having
>>>> element children ("element content"):
>>>> https://www.w3.org/TR/xml/#dt-elemcontent
>>>
>>> Well, the XML files that I had (I did not choose XML; I would
>>> have preferred JSON :) ) had no DTD :(
>>>
>>> So at the end of the day, this wonderful standard puts all the stress
>>> and burden on people.
>>>
>>>>
>>>> For example, if you declare an element like this:
>>>>
>>>> <!ELEMENT one (two,three*,four?)>
>>>>
>>>> Any whitespace around a "two," "three," or "four" element child of a "one"
>>>> element is insignificant and ignored (unless
>>>> #preservesIgnorableWhitespace: is true). Other parsers, like LibXML2 and
>>>> Xerces, behave the same way.
>>>>
>>>> I'll see if I can come up with some easier way to deal with this, like an
>>>> optional parser setting, new enumeration methods, or maybe a tree
>>>> transformation.
>>>
>>> It would be A HUGE PLUS!!!!!!!!!!!!!!!!!!
>>>
>>>
>>> Because reality is that people have XML files with just nodes and no
>>> empty nodes and they are forced to
>>> Let me know because I could try.
>>>
>>> I was showing Pharo learners how to use Pharo to import code, and
>>> this was a big drag.
>>>
>>> Stef
>>>
>>>
>>> I tried to set some values in the parser but it did not work.
>>> BTW, I saw that the configuration logic forces me to write the following
>>>
>>> | parser doc visitor |
>>> parser := XMLDOMParser new
>>>     on: self xmlContents;
>>>     preservesIgnorableWhitespace: true;
>>>     yourself.
>>>
>>> and not
>>>
>>> | parser doc visitor |
>>> parser := XMLDOMParser new
>>>     preservesIgnorableWhitespace: true;
>>>     on: self xmlContents;
>>>     yourself.
>>>
>>>
>>>>
>>>>> Sent: Tuesday, December 05, 2017 at 8:29 AM
>>>>> From: "Stephane Ducasse" <[email protected]>
>>>>> To: "Pharo Development List" <[email protected]>
>>>>> Subject: [Pharo-dev] How to get rid of empty XML nodes?
>>>>>
>>>>> Hi
>>>>>
>>>>> we are manipulating an XML document and I would like to get rid of the
>>>>> spurious empty strings.
>>>>> We saw that the gt panes are doing it.
>>>>>
>>>>> (aNodeWithElements isStringNode
>>>>>     and: [ aNodeWithElements isEmpty
>>>>>         or: [ aNodeWithElements isWhitespace ] ])
>>>>>
>>>>> Is there a way not to produce empty nodes?
>>>>> Is there a simple way not to have to handle them?
>>>>>
>>>>> Now each time we are dealing with a node we have to check.
>>>>>
>>>>> Stef
>>>>>
>>>>>
>>>>
>>
>
<Original-java.xml>
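A minimal sketch, in Pharo Smalltalk, of how the visitor from the thread could skip whitespace-only string nodes without calling removeNodes: on the tree it is walking. It assumes the node protocol already used in the thread (nodes, isStringNode, isEmpty, isWhitespace) and the thread's own OkStubNode, visitElement: and isToken. The selectors isIgnorableStringNode: and significantNodesOf: are hypothetical helper names, not part of the XMLParser API.

```smalltalk
isIgnorableStringNode: aNode
	"Hypothetical helper: answer true for string nodes that are empty or whitespace-only."
	^ aNode isStringNode and: [ aNode isEmpty or: [ aNode isWhitespace ] ]

significantNodesOf: aNodeWithElements
	"Hypothetical helper: answer a new Array of the children worth visiting.
	asArray copies the child list first, so aNodeWithElements itself is never modified."
	^ aNodeWithElements nodes asArray
		reject: [ :each | self isIgnorableStringNode: each ]

visitNodeWithElements: aNodeWithElements
	"Same shape as the cleanNode: version, but iterates over a filtered copy
	instead of removing nodes from the tree being visited."
	| currentNode children tokenNode |
	currentNode := OkStubNode new.
	children := self significantNodesOf: aNodeWithElements.
	children ifEmpty: [ ^ currentNode ].
	"The first significant child is expected to be the token element."
	tokenNode := self visitElement: children first.
	self assert: tokenNode isToken.
	currentNode addChild: tokenNode.
	children allButFirst do: [ :each |
		currentNode addChild: (self visitNodeWithElements: each) ].
	^ currentNode
```

Because significantNodesOf: answers a fresh Array, the parsed document stays exactly as the parser produced it. In this sketch visitElement: would also have to go through the same helper wherever it walks children, since the cleanNode: version got that for free by cleaning the first child before visiting it.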
