Re: [Pharo-dev] tricks for XML parsing.

stepharo Thu, 14 Jul 2016 01:05:23 -0700

Thanks for the analysis :)

I just skimmed over the table of contents and this caught my eye.




On 14/7/16 at 01:58, monty wrote:

Thanks for the link.

In-place parsing is a non-starter because it means storing the entire input as
a string in memory, so you could only parse files that fit in Pharo's address
space. The multi-gigabyte OpenStreetMap docs the article mentions would then be
unparsable in a 32-bit VM, even with SAX, which normally never needs the whole
document in memory at once.
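For contrast, a streaming (SAX-style) parser only needs the current chunk of input plus its own state. The sketch below is in Python, not Pharo, and uses the stdlib Expat binding (`xml.parsers.expat`) purely as an illustration. It feeds the parser fixed-size chunks and counts start tags. The function name `count_elements_streaming` and the `chunk_size` parameter are hypothetical, chosen for this example.

```python
import io
import xml.parsers.expat


def count_elements_streaming(stream, chunk_size=64 * 1024):
    """Count start tags per element name, reading the input chunk by chunk.

    Only one chunk (plus Expat's internal state) is held at a time, so the
    input size is not bounded by memory the way an in-place parser's is.
    """
    counts = {}

    def start_element(name, attrs):
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            parser.Parse(b"", True)  # tell Expat the input is finished
            break
        parser.Parse(chunk, False)  # chunks may split tags; Expat buffers them
    return counts


if __name__ == "__main__":
    doc = io.BytesIO(b"<osm><node/><node/><way/></osm>")
    print(count_elements_streaming(doc, chunk_size=4))
```

A tiny `chunk_size` like 4 bytes is used above only to show that tags split across chunk boundaries are handled. For real files you would pass an opened file object and a much larger chunk.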

Linked lists for storing child nodes are common (LibXML2 and Xerces use them)
and provide constant-time insertion and sibling access, but arrays/vectors
(Arrays/OrderedCollections) are more cache-friendly and faster for sequential
access, and in Pharo they are almost always the correct choice.
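The two child-storage layouts can be sketched side by side. This is a Python illustration of the data structures only: the class and method names (`LinkedNode`, `ArrayNode`, `insert_after`) are hypothetical, not LibXML2, Xerces, or Pharo XMLParser APIs. Because Python lists hold pointers, it does not reproduce the cache effects, only the difference in operation costs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class LinkedNode:
    """Children chained through next_sibling pointers (linked-list layout)."""
    name: str
    first_child: Optional["LinkedNode"] = None
    last_child: Optional["LinkedNode"] = None
    next_sibling: Optional["LinkedNode"] = None

    def append(self, child):
        # O(1): the tail pointer avoids walking the sibling chain.
        if self.last_child is None:
            self.first_child = child
        else:
            self.last_child.next_sibling = child
        self.last_child = child

    def insert_after(self, prev, child):
        # O(1) given the predecessor: only two links change.
        child.next_sibling = prev.next_sibling
        prev.next_sibling = child
        if self.last_child is prev:
            self.last_child = child

    def children(self):
        # Sequential access follows pointers from node to node.
        node = self.first_child
        while node is not None:
            yield node
            node = node.next_sibling


@dataclass(eq=False)
class ArrayNode:
    """Children kept in one growable array (like an OrderedCollection)."""
    name: str
    children: List["ArrayNode"] = field(default_factory=list)

    def append(self, child):
        self.children.append(child)  # amortized O(1)

    def insert(self, index, child):
        self.children.insert(index, child)  # O(n): later children shift over


if __name__ == "__main__":
    linked = LinkedNode("root")
    a, c = LinkedNode("a"), LinkedNode("c")
    linked.append(a)
    linked.append(c)
    linked.insert_after(a, LinkedNode("b"))
    print([n.name for n in linked.children()])

    array = ArrayNode("root")
    array.append(ArrayNode("a"))
    array.append(ArrayNode("c"))
    array.insert(1, ArrayNode("b"))
    print([n.name for n in array.children])
```

Parsers mostly append children in document order and then iterate over them. That pattern is cheap for the array layout, which is why the mid-list insertion advantage of linked lists rarely pays off.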

There is always the option of an FFI-based parser, but it shouldn't be a hybrid
like Python's minidom (C Expat with a Python DOM implementation). Something like
that already exists in Smalltalk/X (FFI Expat with a Smalltalk DOM), and in my
tests it was slower than a St/X port of XMLParser, I assume because of the FFI
overhead, so it's probably not worth it. A non-hybrid parser that does
everything in C, including building the DOM, should definitely be faster,
though.
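Python happens to ship both designs, so the hybrid versus all-C distinction can be measured directly there. `xml.dom.minidom` builds its DOM from Python code driven by Expat callbacks (the hybrid case). `xml.etree.ElementTree` in CPython normally uses its C accelerator, so both tokenizing and tree building happen in C. The sketch below times the two approaches. The document generator and function names are hypothetical. No timings are quoted here, since they depend on the machine and Python version.

```python
import timeit
import xml.dom.minidom
import xml.etree.ElementTree as ET


def make_doc(n):
    # Hypothetical flat test document with n <item> children.
    return "<root>" + "<item id='x'>text</item>" * n + "</root>"


def count_minidom(doc):
    # Hybrid: Expat (C) tokenizes, Python code builds the DOM nodes.
    return len(xml.dom.minidom.parseString(doc).getElementsByTagName("item"))


def count_etree(doc):
    # CPython's ElementTree normally uses the C accelerator, so the
    # tree is built in C as well.
    return len(ET.fromstring(doc).findall("item"))


if __name__ == "__main__":
    doc = make_doc(20_000)
    for fn in (count_minidom, count_etree):
        seconds = timeit.timeit(lambda: fn(doc), number=5)
        print(f"{fn.__name__}: {seconds:.3f}s for 5 parses")
```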

Sent: Wednesday, July 13, 2016 at 10:27 AM
From: stepharo <[email protected]>
To: "Pharo Development List" <[email protected]>
Subject: [Pharo-dev] tricks for XML parsing.

Hi guys

These free books may be interesting for you:

      http://aosabook.org/

http://aosabook.org/en/posa/parsing-xml-at-the-speed-of-light.html

stef
