On Thu, 2016-07-14 at 01:58 +0200, monty wrote:
> Thanks for the link.
> 
> In-place parsing is a non-starter because it means storing the entire
> input as a string in memory, so you could only parse files that fit
> in Pharo's address space. The multi-gigabyte OpenStreetMap docs the
> article mentions would be unparsable with SAX in a 32-bit VM.

I do not understand. I only know Expat, which does (AFAIK) in-place
parsing and surely does not need the whole input in memory.
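
As an illustration of feeding Expat incrementally, here is a minimal
sketch using Python's standard xml.parsers.expat module, chosen only
because it is a readily available Expat binding. The helper name
count_elements, the chunk size and the sample document are made up for
this example; it says nothing about how the Smalltalk/X binding works.

```python
import io
import xml.parsers.expat


def count_elements(stream, chunk_size=64 * 1024):
    """Feed `stream` to Expat chunk by chunk and count start tags.

    Only one chunk of input is handed to the parser at a time, so the
    document never has to be loaded as a single string by the caller.
    """
    counts = {}

    def on_start(name, attrs):
        # SAX-style callback: invoked once per opening tag.
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start

    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # isfinal=False: more input will follow.
        parser.Parse(chunk, False)
    # An empty final call tells Expat that the input is complete.
    parser.Parse(b"", True)
    return counts


# Hypothetical sample input; a real run would open a large file instead.
sample = io.BytesIO(b"<osm>" + b"<node id='1'/>" * 1000 + b"</osm>")
result = count_elements(sample, chunk_size=16)
```

The deliberately tiny chunk size splits tags in the middle, which the
parser copes with because it keeps its own state between calls.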

> There is always the option of an FFI-based parser, but it shouldn't
> be a hybrid like Python's minidom (FFI Expat with a Python DOM
> implementation), 
> because something like that already exists in Smalltalk/X (FFI Expat
> with a Smalltalk DOM) 

I guess you refer to the implementation I did ages ago. 

> and it was slower than a St/X port of XMLParser in my tests (I assume
> due to the FFI overhead), so it's probably not worth it. 

Very, very interesting. Where can I find the benchmarks? 

I just ran a very simple benchmark on a 112MB document
(http://www.xml-benchmark.org/downloads.html) and the results are quite
the opposite:

Benchmark result:
Generated at: 14-07-2016 07:32:25 AM

Benchmark         Execution Time [ms]   # of M&S GCs [1]   # of newspace GCs [1]   Parameters
BenchmarkXML
  SAX - VW                      93418                  0                    2060
  SAX - XMLSuite                 9921                  0                     410

As you can see, the latter is roughly 10 times faster. 

I agree that my implementation, which uses Expat, is clearly suboptimal
and needs to be improved (for example, it does not use an ILC-based
send to the driver, so you get a lot of cache misses, and it does a lot
of unnecessary memcpy()s, but this can be easily improved).

Jan
  

> But a non-hybrid parser with everything (including the DOM) done in C
> should definitely be faster.
> 
> > Sent: Wednesday, July 13, 2016 at 10:27 AM
> > From: stepharo <[email protected]>
> > To: "Pharo Development List" <[email protected]>
> > Subject: [Pharo-dev] tricks for XML parsing.
> > 
> > Hi guys
> > 
> > these free books may be interesting for you
> > 
> >      http://aosabook.org/
> > 
> > http://aosabook.org/en/posa/parsing-xml-at-the-speed-of-light.html
> > 
> > 
> > stef
> > 
> > 
> > 
> 
