Norbert
Should I tell the tool generating the XML that it is an idiot? I cannot even
do that: it is a tool I do not control.
I have no control over what I get.
So why can't we simply state that it does not matter whether people add a
line break or not?
Why can't I be the one who decides? I would take the risk of the
interpretation, but as it is,
the "standard" does not help me at all. It just tells me that it is good for me.
In the past I implemented "standards" like XMI, only to find that there
were bugs in the spec.
In the end, each time I visit a node I have to check:
visitNodeWithElements: aNodeWithElements
	| currentNode |
	currentNode := OkStubNode new.
	self cleanNode: aNodeWithElements.
	aNodeWithElements hasChildren
		ifTrue: [ | tokenNode |
			self cleanNode: aNodeWithElements nodes first.
			tokenNode := self visitElement: aNodeWithElements nodes first.
			self assert: tokenNode isToken.
			currentNode addChild: tokenNode.
			aNodeWithElements nodes allButFirst
				do: [ :each | currentNode addChild: (self visitNodeWithElements: each) ] ].
	^ currentNode
And I do not like to modify a structure while I'm visiting it.
cleanNode: aNodeWithElements
	aNodeWithElements removeNodes: (aNodeWithElements nodes select: [ :e |
		e isStringNode and: [ e isEmpty or: [ e isWhitespace ] ] ])
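Because `cleanNode:` removes nodes from the tree while it is being visited, one alternative is to filter each element's children with `reject:` and leave the DOM untouched. Below is a minimal Playground sketch, assuming a Pharo image with the XMLParser package loaded. It reuses the test from `cleanNode:` (`isStringNode`, `isEmpty`, `isWhitespace`, `nodes`); `XMLDOMParser parse:`, `root`, `name` and `attributeAt:` are assumed XMLParser selectors, and `isFormatting`, `walk` and `texts` are illustrative names only:

```smalltalk
"Sketch: walk the DOM while skipping formatting-only string nodes,
 instead of removing them as cleanNode: does."
| isFormatting doc walk texts |
"Same test as in cleanNode:, kept as a block instead of a mutation."
isFormatting := [ :node |
	node isStringNode and: [ node isEmpty or: [ node isWhitespace ] ] ].
doc := XMLDOMParser parse: '<childElement>
  <token text="NAME"/>
  <childElement>
    <token text="Original"/>
  </childElement>
</childElement>'.
texts := OrderedCollection new.
walk := [ :element |
	"reject: answers a filtered copy, so the tree itself is never modified"
	(element nodes reject: isFormatting) do: [ :child |
		child name = 'token'
			ifTrue: [ texts add: (child attributeAt: 'text') ]
			ifFalse: [ walk value: child ] ] ].
walk value: doc root.
texts asArray.
```

In `visitNodeWithElements:` the same idea would mean computing the filtered children once per node and using their `first` and `allButFirst` instead of calling `cleanNode:`. The copy costs one extra collection per node, but the visit stays read-only.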
So I understand why people are moving away from XML.
Stef
On Fri, Dec 8, 2017 at 4:02 PM, Norbert Hartl wrote:
>
>
>> On 08.12.2017 at 14:21, Stephane Ducasse wrote:
>>
>> Hi monty
>>
>>
>>> On Fri, Dec 8, 2017 at 9:03 AM, monty wrote:
>>> By "empty XML nodes," do you mean whitespace-only string nodes?
>>
>> Yes
>>
>>> Those are included because all in-element whitespace is assumed significant
>>> by the spec: https://www.w3.org/TR/xml/#sec-white-space
>>
>> I know. There was a discussion about it a while ago. I just lost a couple of
>> hours figuring that out :(
>>
>> But this is a super, super, super annoying practice.
>> We had to test each node to see whether it is an empty node, which makes
>> everything a lot more complex without real justification,
>> besides the fact that these standardizers probably never implemented
>> any real cases.
>> From that perspective, this standard is really out of touch with reality.
>
> Are you sure you are not oversimplifying things? XML would be even more complex
> if these cases were in the standard. It is not easy to decide whether
> whitespace is important or not.
> Where does the whitespace in your case come from? Most probably the
> XML is pretty-printed, which inserts whitespace into the serialized
> text. So why not just stop pretty-printing, and your problem is gone.
>
> Norbert
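Norbert's explanation can be checked by parsing the same elements with and without indentation. A minimal Playground sketch, assuming XMLParser is loaded and no DTD is present; `formattingCount` and `counts` are illustrative names, and `XMLDOMParser parse:` and `root` are assumed XMLParser selectors:

```smalltalk
"Sketch: count whitespace-only string nodes directly under the root
 element, for a compact and a pretty-printed version of the same tree."
| formattingCount counts |
formattingCount := [ :xmlString |
	((XMLDOMParser parse: xmlString) root nodes
		select: [ :node | node isStringNode and: [ node isWhitespace ] ]) size ].
counts := {
	formattingCount value: '<a><b/><c/></a>'.
	formattingCount value: '<a>
  <b/>
  <c/>
</a>' }.
counts.
```

The compact form should yield no whitespace-only string nodes, while each indentation run in the pretty-printed form becomes one. This only helps when you control the serializer, which Stef says is not his case.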
>>
>>> The exception is if the element is declared in the DTD as only having
>>> element children ("element content"):
>>> https://www.w3.org/TR/xml/#dt-elemcontent
>>
>> Well, the XML files that I had (I did not choose XML; I would have
>> preferred JSON :) ) had no DTD :(
>>
>> So at the end of the day, this wonderful standard puts all the stress
>> and burden on people.
>>
>>>
>>> For example, if you declare an element like this:
>>>
>>> <!ELEMENT one (two,three*,four?)>
>>>
>>> Any whitespace around a "two," "three," or "four" element child of a "one"
>>> element is insignificant and ignored (unless #preservesIgnorableWhitespace:
>>> is true). Other parsers, like LibXML2 and Xerces, behave the same way.
>>>
>>> I'll see if I can come up with some easier way to deal with this, like an
>>> optional parser setting, new enumeration methods, or maybe a tree
>>> transformation.
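To see the rule monty describes, one can declare the element content in an internal DTD subset and compare the two parser settings. A minimal Playground sketch, assuming XMLParser reads the internal subset; `childCount` and `counts` are illustrative names, `parseDocument` and `root` are assumed XMLParser selectors, and `on:` and `preservesIgnorableWhitespace:` are the ones used elsewhere in this thread:

```smalltalk
"Sketch: <one> is declared with element content, so the whitespace
 between its children is ignorable unless the parser is told to keep it."
| xml childCount counts |
xml := '<!DOCTYPE one [
<!ELEMENT one (two,three*,four?)>
<!ELEMENT two EMPTY>
<!ELEMENT three EMPTY>
<!ELEMENT four EMPTY>
]>
<one>
  <two/>
  <three/>
</one>'.
"Answer how many child nodes the root element ends up with."
childCount := [ :parser | parser parseDocument root nodes size ].
counts := {
	childCount value: (XMLDOMParser new on: xml; yourself).
	childCount value: (XMLDOMParser new
		on: xml;
		preservesIgnorableWhitespace: true;
		yourself) }.
counts.
```

Per monty's description, the default setting should leave only the two element children, while `preservesIgnorableWhitespace: true` keeps the whitespace runs as well. Files without a DTD, like the ones Stef had, never trigger this rule.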
>>
>> It would be A HUGE PLUS!!!!!!!!!!!!!!!!!!
>>
>>
>> Because the reality is that people have XML files with just nodes and no
>> empty nodes, and they are forced to
>> Let me know, because I could try.
>>
>> I was showing Pharo learners how to use Pharo to import code, and
>> this was a big drag.
>>
>> Stef
>>
>>
>> I tried to set some values in the parser but it did not work.
>> BTW I saw that the configuration logic forces you to write the following
>>
>> | parser doc visitor |
>> parser := XMLDOMParser new
>> on: self xmlContents;
>> preservesIgnorableWhitespace: true.
>>
>> and not
>>
>> | parser doc visitor |
>> parser := XMLDOMParser new
>> preservesIgnorableWhitespace: true;
>> on: self xmlContents.
>>
>>
>>>
>>>> Sent: Tuesday, December 05, 2017 at 8:29 AM
>>>> From: "Stephane Ducasse"
>>>> To: "Pharo Development List"
>>>> Subject: [Pharo-dev] How to get rid of empty XML nodes?
>>>>
>>>> Hi
>>>>
>>>> We are manipulating an XML document and I would like to get rid of the
>>>> spurious empty strings.
>>>> We saw that the gt panes are doing it:
>>>>
>>>> (aNodeWithElements isStringNode
>>>> and: [aNodeWithElements isEmpty
>>>> or: [aNodeWithElements isWhitespace]])
>>>>
>>>> Is there a way not to produce empty nodes?
>>>> Is there a simple way to avoid having to handle them?
>>>>
>>>> Now each time we deal with a node we have to check.
>>>>
>>>> Stef
>>>>
>>>>
>>>
>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<childElement>
<token column="-1" index="-1" line="0" text="COMPILATION_UNIT" type="104"/>
<childElement>
<token column="-1" index="-1" line="0" text="PACKAGE_DECL" type="106"/>
<childElement>
<token column="-1" index="-1" line="0" text="CONCRETE_UNIT_DECL" type="107"/>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="13" index="4" line="1" text="Original" type="149"/>
</childElement>
</childElement>
<childElement>
<token column="22" index="6" line="1" text="{" type="23"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="FUNCTION_DECL" type="115"/>
<childElement>
<token column="-1" index="-1" line="0" text="MODIFIER_LIST" type="139"/>
<childElement>
<token column="1" index="12" line="3" text="public" type="87"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="TYPE" type="119"/>
<childElement>
<token column="8" index="14" line="3" text="void" type="101"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="13" index="16" line="3" text="sumProd" type="149"/>
</childElement>
</childElement>
<childElement>
<token column="20" index="17" line="3" text="FORMAL_PARAM_LIST" type="116"/>
<childElement>
<token column="-1" index="-1" line="0" text="PARAMETER_DECL" type="117"/>
<childElement>
<token column="-1" index="-1" line="0" text="MODIFIER_LIST" type="139"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="TYPE" type="119"/>
<childElement>
<token column="21" index="18" line="3" text="int" type="79"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="25" index="20" line="3" text="n" type="149"/>
</childElement>
</childElement>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="BLOCK_SCOPE" type="109"/>
<childElement>
<token column="-1" index="-1" line="0" text="SEPARATOR" type="147"/>
<childElement>
<token column="28" index="23" line="3" text="{" type="23"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="VAR_DECL" type="111"/>
<childElement>
<token column="-1" index="-1" line="0" text="MODIFIER_LIST" type="139"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="TYPE" type="119"/>
<childElement>
<token column="2" index="28" line="4" text="float" type="72"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="ASSIGNMENT_STATEMENT" type="143"/>
<childElement>
<token column="-1" index="-1" line="0" text="ASSIGN_OPERATOR" type="144"/>
<childElement>
<token column="12" index="32" line="4" text="=" type="6"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="8" index="30" line="4" text="sum" type="149"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="VALUE" type="140"/>
<childElement>
<token column="-1" index="-1" line="0" text="CONST" type="141"/>
<childElement>
<token column="14" index="34" line="4" text="0.0" type="153"/>
</childElement>
</childElement>
</childElement>
</childElement>
</childElement>
</childElement>
<childElement>
<token column="17" index="35" line="4" text=";" type="44"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="VAR_DECL" type="111"/>
<childElement>
<token column="-1" index="-1" line="0" text="MODIFIER_LIST" type="139"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="TYPE" type="119"/>
<childElement>
<token column="2" index="38" line="5" text="float" type="72"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="ASSIGNMENT_STATEMENT" type="143"/>
<childElement>
<token column="-1" index="-1" line="0" text="ASSIGN_OPERATOR" type="144"/>
<childElement>
<token column="13" index="42" line="5" text="=" type="6"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="8" index="40" line="5" text="prod" type="149"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="VALUE" type="140"/>
<childElement>
<token column="-1" index="-1" line="0" text="CONST" type="141"/>
<childElement>
<token column="15" index="44" line="5" text="1.0" type="153"/>
</childElement>
</childElement>
</childElement>
</childElement>
</childElement>
</childElement>
<childElement>
<token column="18" index="45" line="5" text=";" type="44"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="LOOP_STATEMENT" type="126"/>
<childElement>
<token column="-1" index="-1" line="0" text="KEYWORD" type="148"/>
<childElement>
<token column="2" index="50" line="6" text="for" type="73"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="INIT" type="127"/>
<childElement>
<token column="-1" index="-1" line="0" text="VAR_DECL" type="111"/>
<childElement>
<token column="-1" index="-1" line="0" text="MODIFIER_LIST" type="139"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="TYPE" type="119"/>
<childElement>
<token column="7" index="53" line="6" text="int" type="79"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="ASSIGNMENT_STATEMENT" type="143"/>
<childElement>
<token column="-1" index="-1" line="0" text="ASSIGN_OPERATOR" type="144"/>
<childElement>
<token column="13" index="57" line="6" text="=" type="6"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="11" index="55" line="6" text="i" type="149"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="VALUE" type="140"/>
<childElement>
<token column="-1" index="-1" line="0" text="CONST" type="141"/>
<childElement>
<token column="15" index="59" line="6" text="1" type="152"/>
</childElement>
</childElement>
</childElement>
</childElement>
</childElement>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="CONDITION" type="125"/>
<childElement>
<token column="-1" index="-1" line="0" text="COMPARISON_OPERATOR" type="146"/>
<childElement>
<token column="20" index="64" line="6" text="<=" type="24"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="18" index="62" line="6" text="i" type="149"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="23" index="66" line="6" text="n" type="149"/>
</childElement>
</childElement>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="STEP" type="128"/>
<childElement>
<token column="-1" index="-1" line="0" text="ASSIGNMENT_STATEMENT" type="143"/>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="26" index="69" line="6" text="i" type="149"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="OPERATOR" type="145"/>
<childElement>
<token column="27" index="70" line="6" text="++" type="21"/>
</childElement>
</childElement>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="BLOCK_SCOPE" type="109"/>
<childElement>
<token column="-1" index="-1" line="0" text="SEPARATOR" type="147"/>
<childElement>
<token column="31" index="73" line="6" text="{" type="23"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="ASSIGNMENT_STATEMENT" type="143"/>
<childElement>
<token column="-1" index="-1" line="0" text="ASSIGN_OPERATOR" type="144"/>
<childElement>
<token column="7" index="81" line="7" text="=" type="6"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="3" index="79" line="7" text="sum" type="149"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="VALUE" type="140"/>
<childElement>
<token column="-1" index="-1" line="0" text="OPERATOR" type="145"/>
<childElement>
<token column="13" index="85" line="7" text="+" type="38"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="9" index="83" line="7" text="sum" type="149"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="15" index="87" line="7" text="i" type="149"/>
</childElement>
</childElement>
</childElement>
</childElement>
</childElement>
</childElement>
<childElement>
<token column="16" index="88" line="7" text=";" type="44"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="ASSIGNMENT_STATEMENT" type="143"/>
<childElement>
<token column="-1" index="-1" line="0" text="ASSIGN_OPERATOR" type="144"/>
<childElement>
<token column="8" index="96" line="8" text="=" type="6"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="3" index="94" line="8" text="prod" type="149"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="VALUE" type="140"/>
<childElement>
<token column="-1" index="-1" line="0" text="OPERATOR" type="145"/>
<childElement>
<token column="15" index="100" line="8" text="*" type="49"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="10" index="98" line="8" text="prod" type="149"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="17" index="102" line="8" text="i" type="149"/>
</childElement>
</childElement>
</childElement>
</childElement>
</childElement>
</childElement>
<childElement>
<token column="18" index="103" line="8" text=";" type="44"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="FUNCTION_CALL" type="120"/>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="3" index="109" line="9" text="foo" type="149"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="ARGUMENT_LIST" type="121"/>
<childElement>
<token column="-1" index="-1" line="0" text="SEPARATOR" type="147"/>
<childElement>
<token column="6" index="110" line="9" text="(" type="29"/>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="ARGUMENT" type="122"/>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="7" index="111" line="9" text="sum" type="149"/>
</childElement>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="ARGUMENT" type="122"/>
<childElement>
<token column="-1" index="-1" line="0" text="NAME" type="118"/>
<childElement>
<token column="12" index="114" line="9" text="prod" type="149"/>
</childElement>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="SEPARATOR" type="147"/>
<childElement>
<token column="16" index="115" line="9" text=")" type="43"/>
</childElement>
</childElement>
</childElement>
</childElement>
<childElement>
<token column="17" index="116" line="9" text=";" type="44"/>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="SEPARATOR" type="147"/>
<childElement>
<token column="2" index="121" line="10" text="}" type="42"/>
</childElement>
</childElement>
</childElement>
</childElement>
<childElement>
<token column="-1" index="-1" line="0" text="SEPARATOR" type="147"/>
<childElement>
<token column="1" index="125" line="11" text="}" type="42"/>
</childElement>
</childElement>
</childElement>
</childElement>
<childElement>
<token column="0" index="128" line="12" text="}" type="42"/>
</childElement>
</childElement>
</childElement>
</childElement>