Re: [racket-users] Re: note about parsing speed of xml vs sxml?

2020-06-28 Thread Alex Harsanyi
I suggested using `string-append` because, in my own performance investigations reading 100MB+ CSV files, constructing short tokens using `string-append` is faster than using a string port -- perhaps there is a fixed overhead with using string ports which makes `string-append` faster for

Re: [racket-users] Re: note about parsing speed of xml vs sxml?

2020-06-28 Thread Ryan Culpepper
Thanks Alex for pointing out the use of list->string. I've created a PR (https://github.com/racket/racket/pull/3275) that changes that code to use string ports instead (similar to Hendrik's suggestion, but the string port handles resizing automatically). Could someone (John?) with some large XML

Re: [racket-users] Re: note about parsing speed of xml vs sxml?

2020-06-28 Thread Hendrik Boom
On Sun, Jun 28, 2020 at 11:30:27PM +0200, Ryan Culpepper wrote: > Thanks Alex for pointing out the use of list->string. I've created a PR ( > https://github.com/racket/racket/pull/3275) that changes that code to use > string ports instead (similar to Hendrik's suggestion, but the string port >

Re: [racket-users] Re: note about parsing speed of xml vs sxml?

2020-06-28 Thread Alex Harsanyi
I tested your string port version and I also wrote a "string-append" version of the xml reader, and they are both slower by about 10-15% on my machine when compared to the current read-xml implementation, which uses `list->string`. It looks like `list->string` is not the bottleneck here.
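
For readers following along, here is a minimal sketch of the three token-building strategies compared in this thread. It is not the code from `read-xml` or from the PR. The helper names (`token/list`, `token/port`, `token/append`) and the `stop?` predicate are hypothetical and exist only for illustration. The list version conses and reverses, which is an assumption about one reasonable way to write it rather than a copy of the library's approach.

```racket
#lang racket/base
;; Hypothetical helpers for illustration only; none of these come from the xml library.
;; Each reads characters from `in` until `stop?` matches or input ends,
;; and returns the characters read as a string.

;; 1. Accumulate characters in a list, then convert once with list->string.
(define (token/list in stop?)
  (let loop ([acc '()])
    (define c (peek-char in))
    (if (or (eof-object? c) (stop? c))
        (list->string (reverse acc))
        (loop (cons (read-char in) acc)))))

;; 2. Write characters into an output string port, which grows as needed.
(define (token/port in stop?)
  (define out (open-output-string))
  (let loop ()
    (define c (peek-char in))
    (unless (or (eof-object? c) (stop? c))
      (write-char (read-char in) out)
      (loop)))
  (get-output-string out))

;; 3. Grow the result with string-append, one character at a time.
(define (token/append in stop?)
  (let loop ([s ""])
    (define c (peek-char in))
    (if (or (eof-object? c) (stop? c))
        s
        (loop (string-append s (string (read-char in)))))))

;; Usage: read an attribute name up to the "=" sign.
(define (equals-sign? c) (char=? c #\=))
(token/list (open-input-string "name=value") equals-sign?) ; => "name"
```

All three return the same string for the same input, so any speed difference comes from allocation and port overhead. Timing them on representative files, as done above, is the only reliable way to pick one.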

Re: [racket-users] Re: note about parsing speed of xml vs sxml?

2020-06-28 Thread Hendrik Boom
On Sat, Jun 27, 2020 at 05:16:34PM -0700, Alex Harsanyi wrote: > Looking at the source for `read-xml`, it seems to be using `list->string` > in several places. That is, it reads characters one-by-one and constructs > a list by appending a character to the end of it, then calls `list->string` >

Re: [racket-users] Re: note about parsing speed of xml vs sxml?

2020-06-28 Thread Neil Van Dyke
If anyone wants to optimize `read-xml` for particular classes of use, without changing the interface, it might be very helpful to run your representative tests using the statistical profiler. The profiler text report takes a little while of tracing through manually to get a feel for how to