On Thu, 2016-07-14 at 01:58 +0200, monty wrote:
> Thanks for the link.
>
> In-place parsing is a non-starter because it means storing the entire
> input as a string in memory, so you could only parse files that fit
> in Pharo's address space. The multi-gigabyte OpenStreetMap docs the
> article mentions would be unparsable with SAX in a 32-bit VM.

I do not understand. I only know expat, which, AFAIK, does in-place
parsing and surely does not need the whole input in memory.

> There is always the option of an FFI-based parser, but it shouldn't
> be a hybrid like Python's minidom (FFI Expat with a Python DOM
> implementation),
> because something like that already exists in Smalltalk/X (FFI Expat
> with a Smalltalk DOM)

I guess you refer to the implementation I did ages ago.

> and it was slower than a St/X port of XMLParser in my tests (I assume
> due to the FFI overhead), so it's probably not worth it.

Very, very interesting. Where can I find the benchmarks?

I just ran a very simple benchmark on a 112 MB document
(http://www.xml-benchmark.org/downloads.html) and the results are
quite the opposite:

Benchmark result:
Generated at: 14-07-2016 07:32:25 AM

BenchmarkXML
Benchmark        Execution Time [ms]  # of M&S GCs [1]  # of newspace GCs [1]  Parameters
SAX - VW                       93418                 0                   2060
SAX - XMLSuite                  9921                 0                    410

As you can see, the latter is roughly 10 times faster.

I agree that my implementation, which uses Expat, is clearly suboptimal
and needs to be improved. For example, it does not use an ILC-based
send to the driver, so there are a lot of cache misses, and it does a
lot of unnecessary memcpy()s. But this can easily be improved.

Jan

> But a non-hybrid parser with everything (including the DOM) done in C
> should definitely be faster.
>
> > Sent: Wednesday, July 13, 2016 at 10:27 AM
> > From: stepharo <[email protected]>
> > To: "Pharo Development List" <[email protected]>
> > Subject: [Pharo-dev] tricks for XML parsing.
> >
> > Hi guys
> >
> > these free books may be interesting for you
> >
> > http://aosabook.org/
> >
> > http://aosabook.org/en/posa/parsing-xml-at-the-speed-of-light.html
> >
> >
> > stef
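On the memory question raised in the thread: expat is a push parser, so the caller hands it the input piece by piece and it keeps its own state between pieces. The input never has to exist as one string in memory. Below is a minimal sketch of that incremental mode using Python's standard-library binding to expat (`xml.parsers.expat`). It is not the St/X XMLSuite code. The helper name `count_elements` and the chunk sizes are illustrative choices only.

```python
import xml.parsers.expat
from io import BytesIO


def count_elements(stream, chunk_size=64 * 1024):
    """Hypothetical helper: count start tags by feeding expat one chunk at a time.

    Only about `chunk_size` bytes of input are read into memory per step
    (plus expat's internal buffering of incomplete tokens), so the whole
    document never has to be loaded as a single string.
    """
    counts = {}

    def start_element(name, attrs):
        # SAX-style callback, invoked by expat for every start tag.
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element

    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)  # isfinal=False: more input will follow
    parser.Parse(b"", True)         # isfinal=True: signal end of input
    return counts


if __name__ == "__main__":
    doc = b"<site>" + b"<item><name>x</name></item>" * 1000 + b"</site>"
    # A deliberately tiny chunk size makes chunk boundaries fall in the
    # middle of tags; expat resumes parsing across them.
    print(count_elements(BytesIO(doc), chunk_size=7))
    # prints {'site': 1, 'item': 1000, 'name': 1000}
```

With a file object in place of `BytesIO`, the same loop would stream a multi-gigabyte document with roughly constant input memory. It says nothing about DOM construction, which would still have to keep the resulting tree.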
