See #removeAllFormattingNodes and its comment in the latest version. And instances of SAXHandler and subclasses are meant to be created with #on: (or another "instance creation" message), _not #new_, otherwise they won't be properly initialized. The class comment is clear about this, but I should have overridden #new to raise an error like Stream does. Your misuse was helpful in bringing this to my attention, and I added a Stream-like #new implementation to SAXHandler.
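A minimal sketch of that usage, for a Pharo playground. It assumes:

- an XML-Parser version recent enough to have #removeAllFormattingNodes;
- the class-side #on: accepts a string of XML;
- #parseDocument runs the parse;
- #removeAllFormattingNodes strips the whitespace-only string nodes discussed below.

The sample document is made up. The method comment of #removeAllFormattingNodes is the authority on exactly which nodes it removes.

```smalltalk
| parser doc |
"Instance creation goes through the class-side #on:, never #new,
 so the parser is properly initialized."
parser := XMLDOMParser on: '<one> <two/> <three/> </one>'.
"Options are set on the parser returned by #on:, before parsing.
 #preservesIgnorableWhitespace: only matters for elements that a DTD
 declares with element content; this sample has no DTD."
parser preservesIgnorableWhitespace: false.
doc := parser parseDocument.
"Without a DTD, the whitespace around the child elements is kept as
 string nodes, so strip those formatting nodes from the whole tree."
doc removeAllFormattingNodes.
"Inspect the root's children: only the two elements should remain."
doc root nodes inspect.
```

The same configure-then-parse order also works as a cascade on the parenthesized result of #on:.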
> Sent: Friday, December 08, 2017 at 9:21 AM
> From: "Stephane Ducasse"
> To: "Pharo Development List"
> Subject: Re: [Pharo-dev] How to get rid of empty XML nodes?
>
> Hi monty
>
> On Fri, Dec 8, 2017 at 9:03 AM, monty wrote:
> > By "empty XML nodes," do you mean whitespace-only string nodes?
>
> Yes
>
> > Those are included because all in-element whitespace is assumed significant
> > by the spec: https://www.w3.org/TR/xml/#sec-white-space
>
> I know. There was a discussion a while ago. I just lost a couple of
> hours understanding that :(
>
> But this is a super super super annoying practice.
> We had to test each node to see if it is an empty node, so it makes
> everything a lot more complex without real justification,
> besides the fact that these standardizers probably never implemented
> some real cases.
> This standard is really out of touch with reality from that perspective.
>
> > The exception is if the element is declared in the DTD as only having
> > element children ("element content"):
> > https://www.w3.org/TR/xml/#dt-elemcontent
>
> Well, the XML files that I had (I did not choose XML, because I would
> have preferred JSON :) ) had no DTD :(
>
> So at the end of the day, this wonderful standard puts all the stress
> and burden on people.
>
> >
> > For example, if you declare an element like this:
> >
> > <!ELEMENT one (two,three*,four?)>
> >
> > Any whitespace around a "two," "three," or "four" element child of a "one"
> > element is insignificant and ignored (unless #preservesIgnorableWhitespace:
> > is true). Other parsers, like LibXML2 and Xerces, behave the same way.
> >
> > I'll see if I can come up with some easier way to deal with this, like an
> > optional parser setting, new enumeration methods, or maybe a tree
> > transformation.
>
> It would be A HUGE PLUS!!!!!!!!!!!!!!!!!!
>
> Because reality is that people have XML files with just nodes and no
> empty nodes and they are forced to
> Let me know, because I could try.
>
> I was showing how to use Pharo to import code to Pharo learners, and
> this was a big drag.
>
> Stef
>
>
> I tried to set some values in the parser but it did not work.
> BTW, I saw that the configuration logic forces me to write the following
>
> | parser doc visitor |
> parser := XMLDOMParser new
>     on: self xmlContents;
>     preservesIgnorableWhitespace: true.
>
> and not
>
> | parser doc visitor |
> parser := XMLDOMParser new
>     preservesIgnorableWhitespace: true;
>     on: self xmlContents.
>
> >> Sent: Tuesday, December 05, 2017 at 8:29 AM
> >> From: "Stephane Ducasse"
> >> To: "Pharo Development List"
> >> Subject: [Pharo-dev] How to get rid of empty XML nodes?
> >>
> >> Hi
> >>
> >> We are manipulating an XML document and I would like to get rid of the
> >> spurious empty strings.
> >> We saw that the GT panes are doing it:
> >>
> >> (aNodeWithElements isStringNode
> >>     and: [aNodeWithElements isEmpty
> >>         or: [aNodeWithElements isWhitespace]])
> >>
> >> Is there a way not to produce empty nodes?
> >> Is there a simple way not to have to handle them?
> >>
> >> Now each time we are dealing with a node, we have to check.
> >>
> >> Stef
