Re: [racket-users] Re: note about parsing speed of xml vs sxml?

2020-06-28 Thread Alex Harsanyi
I tested your string port version, and I also wrote a "string-append" 
version of the xml reader; both are about 10-15% slower on my machine 
than the current read-xml implementation, which uses `list->string`.  It 
looks like `list->string` is not the bottleneck here.

There are some small improvements to be had from micro-optimizations.  
For example, I changed `name-char?` to check for all the characters 
directly instead of calling `name-start?`, changed `lex-name` to construct 
its character list in reverse and use `(list->string (reverse chars))`, and 
reordered the cond clauses to check the common case first (that the next 
character is a name-char? and not a 'special' one).  However, this gave 
only about a 5-10% speed improvement, nowhere near the 4x speedup with 
`sxml` that John reported.
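
For illustration, the `lex-name` change follows the usual 
accumulate-in-reverse pattern; this is only a sketch with made-up names 
(`read-name-like`, `char-ok?`), not the actual read-xml code:

;; Sketch: cons each character onto the front of the list, then reverse
;; once at the end, instead of appending to the tail for every character.
(define (read-name-like in char-ok?)
  (let loop ([chars '()])
    (define c (peek-char in))
    (if (and (char? c) (char-ok? c))
        (begin
          (read-char in)              ; consume the peeked character
          (loop (cons c chars)))
        (list->string (reverse chars)))))

For example, (read-name-like (open-input-string "foo bar") char-alphabetic?) 
returns "foo".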

In the end, it may well be that speeding up `read-xml` can only be done 
through these kinds of micro-optimizations.  Another thing I looked into was 
the "pattern" used for reading: all the `read-xml` code peeks at the next 
character, decides whether it is acceptable, and then reads it.  This is 
much slower than just reading the characters directly.  These are the 
results from just reading a 14 MB XML file:

read-char only:  cpu time: 312 real time: 307 gc time: 0
read-char-or-special only:  cpu time: 750 real time: 741 gc time: 0
peek-char then read-char:  cpu time: 1234 real time: 1210 gc time: 0
peek-char-or-special then read-char-or-special:  cpu time: 1688 real time: 1690 gc time: 0

Using this code:

#lang racket

(define file-name "your-test-file-here.xml")

(printf "read-char only~%")
(collect-garbage 'major)
(time
 (call-with-input-file file-name
   (lambda (in)
     ;; read every character, one at a time
     (let loop ([c (read-char in)])
       (if (eof-object? c)
           (void)
           (loop (read-char in)))))))

(printf "read-char-or-special only~%")
(collect-garbage 'major)
(time
 (call-with-input-file file-name
   (lambda (in)
     (let loop ([c (read-char-or-special in)])
       (if (eof-object? c)
           (void)
           (loop (read-char-or-special in)))))))

(printf "peek-char then read-char~%")
(collect-garbage 'major)
(time
 (call-with-input-file file-name
   (lambda (in)
     ;; peek first, then consume, as read-xml does
     (let loop ([c (peek-char in)])
       (if (eof-object? c)
           (void)
           (begin
             (void (read-char in))
             (loop (peek-char in))))))))

(printf "peek-char-or-special then read-char-or-special~%")
(collect-garbage 'major)
(time
 (call-with-input-file file-name
   (lambda (in)
     (let loop ([c (peek-char-or-special in)])
       (if (eof-object? c)
           (void)
           (begin
             (void (read-char-or-special in))
             (loop (peek-char-or-special in))))))))

Alex.

On Monday, June 29, 2020 at 5:30:43 AM UTC+8 rmculp...@gmail.com wrote:

> Thanks Alex for pointing out the use of list->string. I've created a PR (
> https://github.com/racket/racket/pull/3275) that changes that code to use 
> string ports instead (similar to Hendrik's suggestion, but the string port 
> handles resizing automatically). Could someone (John?) with some large XML 
> files lying around try the changes and see if they help?
>
> Ryan

Re: [racket-users] Re: note about parsing speed of xml vs sxml?

2020-06-28 Thread Alex Harsanyi
I suggested using `string-append` because, in my own performance 
investigations reading 100 MB+ CSV files, constructing short tokens using 
string-append was faster than using a string port -- perhaps there is a 
fixed overhead to string ports that makes `string-append` faster for short 
strings, but I don't know at what string length string ports become faster.

I think using string ports will definitely be faster than using 
`list->string`, but for the difference between `string-append` and string 
ports, some performance measurement might be needed.
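
If someone wants to measure that, a rough micro-benchmark along these lines 
would do (the token size and repeat count here are made up; this is not my 
original CSV test):

#lang racket
;; Rough comparison of building many short tokens two ways.
(define n 1000000)
(define piece "ab")   ; short fragment, appended a few times per token

(collect-garbage 'major)
(printf "string-append: ")
(time
 (for/fold ([total 0]) ([i (in-range n)])
   (+ total (string-length (string-append piece piece piece)))))

(collect-garbage 'major)
(printf "string port:   ")
(time
 (for/fold ([total 0]) ([i (in-range n)])
   (let ([out (open-output-string)])
     (write-string piece out)
     (write-string piece out)
     (write-string piece out)
     (+ total (string-length (get-output-string out))))))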

Thanks for looking into this,
Alex.

On Monday, June 29, 2020 at 5:30:43 AM UTC+8 rmculp...@gmail.com wrote:

> Thanks Alex for pointing out the use of list->string. I've created a PR (
> https://github.com/racket/racket/pull/3275) that changes that code to use 
> string ports instead (similar to Hendrik's suggestion, but the string port 
> handles resizing automatically). Could someone (John?) with some large XML 
> files lying around try the changes and see if they help?
>
> Ryan


Re: [racket-users] Re: note about parsing speed of xml vs sxml?

2020-06-28 Thread Hendrik Boom
On Sun, Jun 28, 2020 at 11:30:27PM +0200, Ryan Culpepper wrote:
> Thanks Alex for pointing out the use of list->string. I've created a PR (
> https://github.com/racket/racket/pull/3275) that changes that code to use
> string ports instead (similar to Hendrik's suggestion, but the string port
> handles resizing automatically). Could someone (John?) with some large XML
> files lying around try the changes and see if they help?

I'm currently using sxml to read the current OpenGL specification: 
2,564,178 bytes in one file. 
(a) Is that relevant to your request?
(b) What do I do to have a choice of which sxml to use?
(c) Do I need to figure out using xml as well?
(d) And what do you want the test to do?

-- hendrik



Re: [racket-users] Re: note about parsing speed of xml vs sxml?

2020-06-28 Thread Ryan Culpepper
Thanks Alex for pointing out the use of list->string. I've created a PR (
https://github.com/racket/racket/pull/3275) that changes that code to use
string ports instead (similar to Hendrik's suggestion, but the string port
handles resizing automatically). Could someone (John?) with some large XML
files lying around try the changes and see if they help?
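
As a rough sketch of the pattern (made-up helper name, not the actual code 
in the PR), the idea is to write characters into a string port and pull the 
final string out once:

;; Sketch only: accumulate characters into a string port, which resizes
;; its internal buffer automatically.
(define (read-while in char-ok?)
  (define out (open-output-string))
  (let loop ()
    (define c (peek-char in))
    (when (and (char? c) (char-ok? c))
      (write-char (read-char in) out)
      (loop)))
  (get-output-string out))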

Ryan





Re: [racket-users] Re: note about parsing speed of xml vs sxml?

2020-06-28 Thread Neil Van Dyke
If anyone wants to optimize `read-xml` for particular classes of use, 
without changing the interface, it might be very helpful to run your 
representative tests using the statistical profiler.


The profiler's text report takes a little while of manual tracing-through 
before you get a feel for how to read and use it, but it can be 
tremendously useful, and it is worth learning to do if you need performance.
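
For a first pass, something along these lines is enough to get a profile of 
`read-xml` over one large file (untested sketch; the file name is a 
placeholder):

#lang racket
;; Untested sketch: run the statistical profiler over a single read-xml call.
(require profile xml)

(define file-name "your-test-file-here.xml")

(profile-thunk
 (lambda ()
   (call-with-input-file file-name read-xml)))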


After a first pass with that, you might also want to look at how costly 
allocations/GC are, and maybe do some controlled experiments around 
that.  For example, force a few GC cycles, run your workload under the 
profiler, check GC time during the run, and check forced-GC time after.  If 
you're dealing with very large graphs coming out of the parser, I don't 
know whether those are enough to matter with the current GC mechanism, but 
maybe also check GC time while you're holding onto large graphs, when 
you release them, and after they've been collected.  At some point, GC 
gets hard for at least me to reason about, but some things make sense, 
and for other things you decide when to stop digging. :)  If you record all 
your measurements, you can compare empirically how different changes 
to the code affect things, hopefully in representative situations.
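
A crude version of that kind of controlled experiment might look like this 
(sketch only, untested):

#lang racket
;; Sketch: force collections around the workload and watch cumulative GC time.
(require xml)

(define file-name "your-test-file-here.xml")

(collect-garbage 'major)                      ; settle the heap first
(define gc-before (current-gc-milliseconds))
(define doc (time (call-with-input-file file-name read-xml)))
(printf "gc during parse: ~a ms~%" (- (current-gc-milliseconds) gc-before))

(collect-garbage 'major)                      ; GC while the graph is still live
(set! doc #f)                                 ; release the graph
(collect-garbage 'major)                      ; GC after it can be collected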


I went through a lot of these exercises to optimize a large system, and 
sped up dynamic Web page loads dramatically in the usual case (to the 
point we were then mainly limited by PostgreSQL query cost, not much by 
the application code in Scheme, nor our request&response network I/O), 
and also greatly reduced the pain of intermittent request latency spikes 
due to GC.


For one of the hotspots, I tried half a dozen very different 
implementations, including a C extension, and found that an old-school pure 
Scheme implementation was fastest.  I compared the performance of the 
implementations using something like `shootout`, but there might be 
better ways now in Racket. https://www.neilvandyke.org/racket/shootout/  
I also found we could be much faster if we changed what the algorithm 
guarantees, since the guarantee was more of a consistency check that 
turned out to be very expensive and very redundant, given all the ways 
that utility code ended up being used.


In addition to contrived experiments, I also rigged up a runtime option 
so that the server would save data from the statistical profiler for 
each request the Web server handled in production.  That was tremendously 
useful, since it gave us real-world examples that were also difficult to 
synthesize (e.g., complex dynamic queries), and we could go from Web 
logs and user feedback to exactly what happened.


(In that system I optimized, we used Oleg's SXML tools very heavily 
throughout the system, plus some bespoke SXML tools for HTML and XML.  
There was one case in which someone had accidentally used the `xml` 
module, not knowing it was incompatible with the rest of the system, 
which caused some strange failures (no static checking) before it was 
discovered, and we changed that code to use SXML.)




Re: [racket-users] Re: note about parsing speed of xml vs sxml?

2020-06-28 Thread Hendrik Boom
On Sat, Jun 27, 2020 at 05:16:34PM -0700, Alex Harsanyi wrote:
> Looking at the source for `read-xml`, it seems to be using `list->string` 
> in several places.  That is, it reads characters one-by-one, constructs 
> a list by appending each character to the end of it, then calls `list->string` 
> to produce the string.  I suspect read-xml could be made faster by using 
> `string-append` in these cases.

So you would be copying and reallocating strings instead of cons-cells?

The way to eliminate all that allocation is to use a mutable string buffer 
that is likely big enough and insert characters into it (one at a time, if 
I read you correctly) without allocating new storage each time (unless the 
buffer turns out to be too small, in which case, double its size).

Then allocate the right amount of space for a string once and copy 
the buffer into it when the string has been completely read in.
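
A rough sketch of that buffer approach (made-up helper name, not tested):

;; Doubling string buffer: insert characters without a fresh allocation per
;; character, grow by doubling when full, copy out the exact string at the end.
(define (read-token in char-ok?)
  (let loop ([buf (make-string 32)] [used 0])
    (define c (peek-char in))
    (cond
      [(and (char? c) (char-ok? c))
       (read-char in)
       (if (< used (string-length buf))
           (begin
             (string-set! buf used c)
             (loop buf (add1 used)))
           ;; buffer full: double it and copy the old contents over
           (let ([bigger (make-string (* 2 (string-length buf)))])
             (string-copy! bigger 0 buf)
             (string-set! bigger used c)
             (loop bigger (add1 used))))]
      [else (substring buf 0 used)])))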

-- hendrik
