Re: [Pharo-dev] tricks for XML parsing.

monty Wed, 13 Jul 2016 17:00:06 -0700

Thanks for the link.

In-place parsing is a non-starter because it means storing the entire input as
a string in memory, so you could only parse files that fit in Pharo's address
space. Under that constraint, the multi-gigabyte OpenStreetMap documents the
article mentions would be unparsable on a 32-bit VM, even with SAX.
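To make the memory argument concrete, here is a minimal sketch of the streaming
alternative. It is in Python rather than Pharo, purely for illustration. The
input is fed to Expat in fixed-size chunks, so memory use is bounded by the
chunk size and by whatever state the handlers keep, not by the size of the
document. The function name `count_elements` and the 64 KiB default chunk size
are arbitrary choices made for this example.

```python
import io
import xml.parsers.expat


def count_elements(stream, tag, chunk_size=64 * 1024):
    """Count occurrences of `tag` without holding the whole document in memory."""
    parser = xml.parsers.expat.ParserCreate()
    count = 0

    def start_element(name, attrs):
        nonlocal count
        if name == tag:
            count += 1

    parser.StartElementHandler = start_element
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Feed one chunk; Expat keeps its state between calls,
        # so chunks may split tags anywhere.
        parser.Parse(chunk, False)
    parser.Parse(b"", True)  # signal end of input
    return count


doc = b"<osm>" + b'<node id="1"/>' * 1000 + b"</osm>"
print(count_elements(io.BytesIO(doc), "node", chunk_size=100))  # 1000
```

Only one chunk is alive at a time, which is what lets a SAX-style parser handle
inputs far larger than the address space.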


Using linked lists to store child nodes is common (LibXML2 and Xerces do it)
and provides constant-time insertion and sibling access, but arrays/vectors
(Arrays/OrderedCollections) are more cache-friendly and faster for sequential
access, and in Pharo they are almost always the correct choice.
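The two child-storage layouts can be sketched roughly as below, using
hypothetical `LinkedNode` and `ArrayNode` classes that are not taken from
LibXML2, Xerces or Pharo. The linked form is a simplified, singly linked version
of the idea. Python cannot demonstrate the cache effects, but the sketch does
show the trade-off in access patterns: the linked form gives O(1) insertion next
to a known sibling and O(1) next-sibling access, while the array form keeps the
child references in one contiguous list, giving O(1) indexed access and a plain
sequential scan at the cost of O(n) insertion in the middle.

```python
class LinkedNode:
    """Children stored as first_child/next_sibling links (singly linked for brevity)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.first_child = None
        self.last_child = None
        self.next_sibling = None

    def append_child(self, child):
        # O(1): link after the current last child
        child.parent = self
        if self.last_child is None:
            self.first_child = child
        else:
            self.last_child.next_sibling = child
        self.last_child = child

    def insert_after(self, ref, child):
        # O(1), given a reference to an existing child
        child.parent = self
        child.next_sibling = ref.next_sibling
        ref.next_sibling = child
        if self.last_child is ref:
            self.last_child = child

    def children(self):
        # Sequential access follows one pointer per child
        node = self.first_child
        while node is not None:
            yield node
            node = node.next_sibling


class ArrayNode:
    """Children stored in a contiguous, growable array of references."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.child_nodes = []

    def append_child(self, child):
        # Amortized O(1) append
        child.parent = self
        self.child_nodes.append(child)

    def insert_at(self, index, child):
        # O(n): later references are shifted over
        child.parent = self
        self.child_nodes.insert(index, child)

    def children(self):
        return iter(self.child_nodes)


root = LinkedNode("root")
a, c = LinkedNode("a"), LinkedNode("c")
root.append_child(a)
root.append_child(c)
root.insert_after(a, LinkedNode("b"))
print([n.name for n in root.children()])  # ['a', 'b', 'c']
```

Since XML trees are built mostly by appending and then read mostly by scanning
children in order, the array form's weaknesses rarely matter in practice.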

There is always the option of an FFI-based parser, but it shouldn't be a hybrid
like Python's minidom (FFI Expat with a Python DOM implementation). Something
like that already exists in Smalltalk/X (FFI Expat with a Smalltalk DOM), and
in my tests it was slower than a St/X port of XMLParser (I assume due to the
FFI overhead), so a hybrid is probably not worth it. A non-hybrid parser with
everything (including the DOM) done in C should definitely be faster, though.
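A hybrid of this kind can be sketched with Python's standard
`xml.parsers.expat` module. Expat tokenizes in C, and every start tag, end tag
and text run is delivered as a callback into the host language, where the tree
is built. That per-event crossing of the language boundary is where the
suspected overhead would come from. A non-hybrid parser would instead build the
whole tree on the C side and cross the boundary once. The `Element` class and
`build_tree` function are hypothetical names for this sketch, not minidom's API.

```python
import xml.parsers.expat


class Element:
    """Minimal host-language DOM node (hypothetical, for illustration)."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = attrs
        self.children = []  # Element instances or text strings


def build_tree(data):
    """Tokenize with Expat (C) and build the tree in Python callbacks."""
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # coalesce adjacent text into one callback
    stack = [Element("#document", {})]
    events = 0  # number of C-to-Python callback crossings

    def start(name, attrs):
        nonlocal events
        events += 1
        node = Element(name, attrs)
        stack[-1].children.append(node)
        stack.append(node)

    def end(name):
        nonlocal events
        events += 1
        stack.pop()

    def text(content):
        nonlocal events
        events += 1
        stack[-1].children.append(content)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text
    parser.Parse(data, True)
    return stack[0], events


doc, events = build_tree(b"<a x='1'><b>hi</b><c/></a>")
root = doc.children[0]
print(root.name, root.attrs, [ch.name for ch in root.children])
print(events)  # 7 callbacks for this tiny document
```

Even this three-element document costs seven boundary crossings, and the count
grows with every tag and text run in the input.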

> Sent: Wednesday, July 13, 2016 at 10:27 AM
> From: stepharo <[email protected]>
> To: "Pharo Development List" <[email protected]>
> Subject: [Pharo-dev] tricks for XML parsing.
>
> Hi guys
> 
> these free books may be interesting for you
> 
>      http://aosabook.org/
> 
> http://aosabook.org/en/posa/parsing-xml-at-the-speed-of-light.html
> 
> 
> stef
