Re: [Pharo-dev] How to get rid of empty XML nodes?

monty Thu, 25 Jan 2018 23:38:41 -0800

See #removeAllFormattingNodes and its comment in the latest version.

And instances of SAXHandler and subclasses are meant to be created with #on: 
(or another "instance creation" message), _not #new_, otherwise they won't be 
properly initialized. The class comment is clear about this, but I should have 
overridden #new to raise an error like Stream does. Your misuse was helpful in 
bringing this to my attention, and I added a Stream-like #new implementation to 
SAXHandler.
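
A minimal sketch of both points, for a Pharo playground with XMLParser loaded. The sample XML is made up for illustration, and sending #removeAllFormattingNodes to the parsed document node is an assumption here; check the method comment for the exact receiver and for which whitespace nodes it removes.

```smalltalk
| xml doc |
"Hypothetical sample input: no DTD, so the whitespace between elements is kept."
xml := '<one>
  <two/>
  <three/>
</one>'.

"Create the parser with the class-side #on: rather than #new,
 so that it is properly initialized."
doc := (XMLDOMParser on: xml) parseDocument.

"Inspect the child nodes before cleanup; whitespace-only string nodes
 are expected among them."
doc root nodes inspect.

"Assumption: the document node understands #removeAllFormattingNodes."
doc removeAllFormattingNodes.

"Inspect again to see which nodes are left."
doc root nodes inspect.
```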


> Sent: Friday, December 08, 2017 at 9:21 AM
> From: "Stephane Ducasse" <[email protected]>
> To: "Pharo Development List" <[email protected]>
> Subject: Re: [Pharo-dev] How to get rid of empty XML nodes?
>
> Hi monty
> 
> 
> On Fri, Dec 8, 2017 at 9:03 AM, monty <[email protected]> wrote:
> > By "empty XML nodes," do you mean whitespace-only string nodes?
> 
> Yes
> 
> > Those are included because all in-element whitespace is assumed significant 
> > by the spec: https://www.w3.org/TR/xml/#sec-white-space
> 
> I know. There was a discussion a while ago. I just lost a couple of
> hours understanding that :(
> 
> But this is a super super super annoying practice.
> We had to test each node to see if it is an empty node, so it makes
> everything a lot more complex without real justification
> besides the fact that these standardizers probably never implemented
> some real cases.
> This standard is really out of touch with reality from that perspective.
> 
> > The exception is if the element is declared in the DTD as only having 
> > element children ("element content"): 
> > https://www.w3.org/TR/xml/#dt-elemcontent
> 
> Well, the XML files that I had (I did not choose XML, because I would
> have preferred JSON :) ) had no DTD :(
> 
> So at the end of the day, this wonderful standard puts all the stress
> and burden on people.
> 
> >
> > For example, if you declare an element like this:
> >
> > <!ELEMENT one (two,three*,four?)>
> >
> > Any whitespace around a "two," "three," or "four" element child of a "one" 
> > element is insignificant and ignored (unless #preservesIgnorableWhitespace: 
> > is true). Other parsers, like LibXML2 and Xerces, behave the same way.
> >
> > I'll see if I can come up with some easier way to deal with this, like an 
> > optional parser setting, new enumeration methods, or maybe a tree 
> > transformation.
> 
> It would be A HUGE PLUS!!!!!!!!!!!!!!!!!!
> 
> 
> Because reality is that people have XML files with just nodes and no
> empty nodes and they are forced to
> Let me know because I could try.
> 
> I was showing how to use Pharo to import code to pharo learners and
> this was a big drag.
> 
> Stef
> 
> 
> I tried to set some values in the parser but it did not work.
> BTW I saw that the configuration logic forces one to write the following
> 
> | parser doc visitor |
> parser := XMLDOMParser new
>    on: self xmlContents;
>    preservesIgnorableWhitespace: true.
> 
> and not
> 
> | parser doc visitor |
> parser := XMLDOMParser new
>     preservesIgnorableWhitespace: true;
>     on: self xmlContents.
> 
> 
> >
> >> Sent: Tuesday, December 05, 2017 at 8:29 AM
> >> From: "Stephane Ducasse" <[email protected]>
> >> To: "Pharo Development List" <[email protected]>
> >> Subject: [Pharo-dev] How to get rid of empty XML nodes?
> >>
> >> Hi
> >>
> >> we are manipulating an XML document and I would like to get rid of the
> >> spurious empty strings.
> >> We saw that the gt panes are doing it.
> >>
> >> (aNodeWithElements isStringNode
> >> and: [aNodeWithElements isEmpty
> >> or: [aNodeWithElements isWhitespace]])
> >>
> >> Is there a way not to produce empty nodes?
> >> Is there a simple way not to have to handle them?
> >>
> >> Now each time we are dealing with a node we have to check.
> >>
> >> Stef
> >>
> >>
> >
> 
>
