Thanks for the link. In-place parsing is a non-starter because it means storing the entire input as a string in memory, so you could only parse files that fit in Pharo's address space. The multi-gigabyte OpenStreetMap docs the article mentions would be unparsable, even with SAX, in a 32-bit VM.
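
To make the memory point concrete, here is a rough sketch of the streaming alternative. It is in Python rather than Pharo, purely for illustration, using the standard library's incremental SAX interface. The ElementCounter handler, the count_elements helper, the chunk size and the sample data are all made up for this example; none of them come from the article or from XMLParser.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts start tags and keeps no document text."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(stream, chunk_size=64 * 1024):
    """Feed the parser one chunk at a time instead of loading the whole input."""
    handler = ElementCounter()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)  # only the current chunk is handed to the parser
    parser.close()
    return handler.counts


# Made-up OSM-like sample; a real run would pass an open file object instead.
sample = io.BytesIO(b"<osm><node id='1'/><node id='2'/><way id='3'/></osm>")
counts = count_elements(sample, chunk_size=16)
print(counts)
```

With this approach, peak memory depends on the chunk size and on whatever the handler chooses to keep, not on the size of the document, which is exactly what an in-place parser has to give up.
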
Storing child nodes in linked lists is common (LibXML2 and Xerces do it) and gives constant-time insertion and sibling access, but arrays/vectors (Arrays/OrderedCollections) are more cache-friendly and faster for sequential access, and in Pharo they are almost always the correct choice.

There is always the option of an FFI-based parser, but it shouldn't be a hybrid like Python's minidom (FFI Expat with a Python DOM implementation). Something like that already exists in Smalltalk/X (FFI Expat with a Smalltalk DOM), and it was slower than a St/X port of XMLParser in my tests (I assume due to the FFI overhead), so it's probably not worth it. A non-hybrid parser with everything (including the DOM) done in C should definitely be faster, though.

> Sent: Wednesday, July 13, 2016 at 10:27 AM
> From: stepharo <[email protected]>
> To: "Pharo Development List" <[email protected]>
> Subject: [Pharo-dev] tricks for XML parsing.
>
> Hi guys
>
> these free books may be interesting for you
>
> http://aosabook.org/
>
> http://aosabook.org/en/posa/parsing-xml-at-the-speed-of-light.html
>
> stef
