Re: [racket-users] note about parsing speed of xml vs sxml?

2020-07-01 Thread 'John Clements' via Racket Users
Ryan, I just tested your pull request, and… it doesn’t make much difference in 
my example.

One important thing that I realize I *totally neglected* to mention is that 
I’m running Racket CS here, not BC. Based on my experiments, it appears that 

1) CS is much faster than BC for both xml (read-xml) and sxml (ssax:xml->sxml), 
and
2) CS speeds up sxml more dramatically.

Here are the results of running my tests with Ryan’s PR:

pajaro2:/tmp clements> racketcs zz.rkt
cpu time: 12858 real time: 15642 gc time: 4242
ssax:warn: warning at position 150: DOCTYPE DECL plist 
http://www.apple.com/DTDs/PropertyList-1.0.dtd found and skipped
cpu time: 2157 real time: 2342 gc time: 332
pajaro2:/tmp clements> racketcs zz.rkt
cpu time: 10518 real time: 11248 gc time: 3544
ssax:warn: warning at position 150: DOCTYPE DECL plist 
http://www.apple.com/DTDs/PropertyList-1.0.dtd found and skipped
cpu time: 2183 real time: 2327 gc time: 305
pajaro2:/tmp clements> racketcs zz.rkt
cpu time: 10162 real time: 10706 gc time: 3363
ssax:warn: warning at position 150: DOCTYPE DECL plist 
http://www.apple.com/DTDs/PropertyList-1.0.dtd found and skipped
cpu time: 2188 real time: 2325 gc time: 328

(So actually, the first of these was pretty bad… I suspect that’s a rare 
occurrence.)

This broadly matches my first set of timings, which suggests that in Racket CS, 
parsing an 18-megabyte XML file generated by Apple Music “Export Library…” is 
about four times faster in sxml than in xml.

In BC, by the way, parsing using xml takes about 14 seconds, and parsing using 
sxml takes about seven.
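
As a rough check on the ratios quoted above, the cpu times from the three CS runs can be divided pairwise. This is only a sketch in Python: the variable names are illustrative, it assumes the first "cpu time" line of each run is the xml parse and the second (printed after the ssax warning) is the sxml parse, and the BC figures are the approximate 14 and 7 seconds mentioned above rather than values taken from a log.

```python
from statistics import median

# cpu times in milliseconds, copied from the three `racketcs zz.rkt` runs above.
# Assumption: the first timing line of each run is xml, the second is sxml.
xml_cpu_ms = [12858, 10518, 10162]
sxml_cpu_ms = [2157, 2183, 2188]

# Per-run ratio of xml parse time to sxml parse time under CS.
cs_ratios = [x / s for x, s in zip(xml_cpu_ms, sxml_cpu_ms)]

# Approximate BC figures quoted in the message, converted to milliseconds.
bc_xml_ms, bc_sxml_ms = 14_000, 7_000
bc_ratio = bc_xml_ms / bc_sxml_ms

# How much faster CS is than BC for each parser, using the median CS run.
xml_cs_speedup = bc_xml_ms / median(xml_cpu_ms)
sxml_cs_speedup = bc_sxml_ms / median(sxml_cpu_ms)

for run, ratio in enumerate(cs_ratios, start=1):
    print(f"CS run {run}: xml/sxml = {ratio:.2f}")
print(f"BC (approximate): xml/sxml = {bc_ratio:.1f}")
print(f"CS vs BC speedup: xml {xml_cs_speedup:.2f}x, sxml {sxml_cs_speedup:.2f}x")
```

On these three runs the CS ratio by cpu time lands between roughly 4.6 and 6, and the CS-over-BC speedup is noticeably larger for sxml than for xml, consistent with point 2 above.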

So really, I think the side takeaway here is this: CS is much faster than BC 
in this case.

John

> On Jun 28, 2020, at 17:30, Ryan Culpepper wrote:
> 
> Thanks Alex for pointing out the use of list->string. I've created a PR 
> (https://github.com/racket/racket/pull/3275) that changes that code to use 
> string ports instead (similar to Hendrik's suggestion, but the string port 
> handles resizing automatically). Could someone (John?) with some large XML 
> files lying around try the changes and see if they help?
> 
> Ryan
> 
> 
> On Sun, Jun 28, 2020 at 9:56 PM Neil Van Dyke wrote:
> If anyone wants to optimize `read-xml` for particular classes of use, 
> without changing the interface, it might be very helpful to run your 
> representative tests using the statistical profiler.
> 
> The profiler text report takes a little while of tracing through 
> manually to get a feel for how to read and use it, but it can be 
> tremendously useful, and is worth learning to do if you need performance.
> 
> After a first pass with that, you might also want to look at how costly 
> allocations/GC are, and maybe do some controlled experiments around 
> that.  For example, force a few GC cycles, run your workload under 
> profiler, check GC time during, and forced time after.  If you're 
> dealing with very large graphs coming out of the parser, I don't know 
> whether those are enough to matter with the current GC mechanism, but 
> maybe also check GC time while you're holding onto large graphs, when 
> you release them, and after they've been collected.  At some point, GC 
> gets hard for at least me to reason about, but some things make sense, 
> and other things you decide when to stop digging. :)  If you record all 
> your measurements, you can compare empirically how different changes 
> to the code affect things, hopefully in representative situations.
> 
> I went through a lot of these exercises to optimize a large system, and 
> sped up dynamic Web page loads dramatically in the usual case (to the 
> point we were then mainly limited by PostgreSQL query cost, not much by 
> the application code in Scheme, nor our request network I/O), 
> and also greatly reduced the pain of intermittent request latency spikes 
> due to GC.
> 
> For one of the hotspots, I did half a dozen very different implementations, 
> including a C extension, and found an old-school pure Scheme 
> implementation was fastest.  I compared the performance of the 
> implementation using something like `shootout`, but there might be 
> better ways now in Racket. https://www.neilvandyke.org/racket/shootout/  
> I also found we could be much faster if we made a change to what the 
> algorithm guarantees, since it was more of a consistency check that 
> turned out to be very expensive and very redundant, due to all the ways 
> that utility code ended up being used.
> 
> In addition to contrived experiments, I also rigged up a runtime option 
> so that the server would save data from the statistical profiler for 
> each request a Web server handled in production.  Which was tremendously 
> useful, since it gave us real-world examples that were also difficult to 
> synthesize (e.g., complex dynamic queries), and we could go from Web 
> logs and user feedback, to exactly what happened.
> 
> (In that system I optimized, we used Oleg's SXML tools very heavily 
> 

Re: [racket-users] note about parsing speed of xml vs sxml?

2020-06-29 Thread Bonface M. K.
Neil Van Dyke writes:

> I think anyone using XML or HTML seriously with Racket should probably at
> least be told of the SXML family of tools.  And warned about the
> compatibility problems.
>
> Though not tell them *everywhere* XML appears in the docs.  For example, I
> figure a tutorial for Racket Web Server shouldn't distract readers with that.
>
> As you know, :) there are some useful tools using SXML, and Oleg's SSAX parser
> has some different properties than core Racket's XML parser.
>
> Complication: The incompatibility between SXML and core Racket's
> representations of XML is an unfortunate accident of parallel invention, and
> I think will tend to be confusing to new people.  I once tried to address the
> confusion in the `sxml-intro` documentation package,
> "https://www.neilvandyke.org/racket/sxml-intro/", and I'm unhappy with the
> result.  The details in my document say more than perhaps anyone will ever
> want to know, and, "optics"-wise, make the situation look worse than it
> actually is in practice.  I think you could do a more graceful job of this.
>
> (Someday, someone might undertake the large task of SXML-ifying all the many
> non-SXML bits of Racket, and incidentally reunite Racket with the rest of the
> Scheme community in that regard.  I started, with one piece, but got
> interrupted. "https://www.neilvandyke.org/racket/rws-html-template/"  :)

Thanks for this! Tbh, I never knew of this.

-- 
Bonface M. K. (https://www.bonfacemunyoki.com)
One Divine Emacs To Rule Them All
GPG key = D4F09EB110177E03C28E2FE1F5BBAE1E0392253F

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-users/86sgeeqik0.fsf%40gmail.com.


Re: [racket-users] note about parsing speed of xml vs sxml?

2020-06-26 Thread Neil Van Dyke
I think anyone using XML or HTML seriously with Racket should probably 
at least be told of the SXML family of tools.  And warned about the 
compatibility problems.


Though not tell them *everywhere* XML appears in the docs.  For example, I 
figure a tutorial for Racket Web Server shouldn't distract readers with 
that.


As you know, :) there are some useful tools using SXML, and Oleg's SSAX 
parser has some different properties than core Racket's XML parser.


Complication: The incompatibility between SXML and core Racket's 
representations of XML is an unfortunate accident of parallel 
invention, and I think will tend to be confusing to new people.  I once 
tried to address the confusion in the `sxml-intro` documentation 
package, "https://www.neilvandyke.org/racket/sxml-intro/", and I'm 
unhappy with the result.  The details in my document say more than 
perhaps anyone will ever want to know, and, "optics"-wise, make the 
situation look worse than it actually is in practice.  I think you could 
do a more graceful job of this.


(Someday, someone might undertake the large task of SXML-ifying all the 
many non-SXML bits of Racket, and incidentally reunite Racket with the 
rest of the Scheme community in that regard.  I started, with one piece, 
but got interrupted. 
"https://www.neilvandyke.org/racket/rws-html-template/"  :)

