On Wed, Jun 5, 2019 at 12:26 PM James Geddes <james.ged...@gmail.com> wrote:

> Annoyingly, my colleague had got the same speedup by using Numpy, but that
> does require a change to the way one arranges the calculation (and the
> change is unnatural, to my eye).
>

Also, that's cheating! Numpy is mostly written in C, with all of the safety
issues that entails. Numpy is a great, well-maintained library, but exactly
these kinds of issues came up just this week in another library I depend on.

The JSON parser and writer in Python's standard library are not especially
fast,* so there are a lot of JSON packages that do what you need to do to
write fast Python modules: write them in C. At Digital Ricoeur, we are
using spaCy, a Python NLP library. They've vendored a fork of one of these
C-level JSON libraries and some similar libraries with the goal of making
maintenance easier. This week, someone found some memory leaks
<https://github.com/explosion/srsly/issues/4>, because manual memory
management is hard. The reporter said, "*Days* were spent on this issue,
because I would never suspect the JSON library to be at fault, but it is."
It turns out the upstream library has had an open PR
<https://github.com/esnme/ultrajson/pull/270> to patch these leaks since
June 2017. The creator of spaCy commented:

> I have to say, I feel pretty sheepish about this. The fact that ujson had
> that PR open for a while was one of the things that made us think that
> vendoring our own fork might not be such a bad idea. But...then I got
> started doing srsly, and I forgot all about that PR :(. So I never ended up
> actually integrating the patch!
>

With Racket, you are getting this speed while working in a safe language,
with C-level bits only where they really do make sense.

*I'd love to see benchmarks of Python's stock JSON parser vs. Racket's,
especially since Matthew's optimizations in Racket 7.3.

On your broader questions:

> However, I'm now unsure how to continue. How should I think about making
> speed improvements?


If you want to optimize Mandelbrot, it happens to be worked through as an
example in The Racket Guide section on parallelism with futures:
https://docs.racket-lang.org/guide/parallelism.html

In general, though, I think the fundamental question is what kind of
programs you're going to write and where the hot spots are likely to be. I
personally don't do much pure number-crunching, but I think that is an area
where Typed Racket can help: it can effectively translate `+` into
`unsafe-fl+` because the type checker has proven that the arguments are
flonums.
For more general advice, the Guide has a section on performance:
https://docs.racket-lang.org/guide/performance.html
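
As a minimal sketch of the Typed Racket point (my own illustration, not
taken from the Guide): the function below counts Mandelbrot iterations with
every floating-point value annotated as `Flonum`. The name `escape-count`
and its parameters are hypothetical. Because the types show that `+`, `-`,
and `*` only ever see flonums here, the Typed Racket optimizer is free to
specialize them to flonum-specific operations like `unsafe-fl+`.

```racket
#lang typed/racket/base

;; Hypothetical helper: how many iterations of z <- z^2 + c are needed
;; before |z|^2 exceeds 4.0, capped at max-iter.
(: escape-count (-> Flonum Flonum Integer Integer))
(define (escape-count cr ci max-iter)
  (let loop : Integer ([zr : Flonum 0.0]
                       [zi : Flonum 0.0]
                       [i : Integer 0])
    (cond
      ;; Stop at the iteration cap or once the point has escaped.
      [(or (>= i max-iter)
           (> (+ (* zr zr) (* zi zi)) 4.0))
       i]
      [else
       ;; All arithmetic on zr, zi, cr, ci is Flonum-typed.
       (loop (+ (- (* zr zr) (* zi zi)) cr)
             (+ (* 2.0 zr zi) ci)
             (+ i 1))])))
```

Whether this beats a hand-tuned untyped version will depend on the program,
so it is worth measuring rather than assuming.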

Two other miscellaneous points that come to mind:

   - `match` performs optimizations similar to Typed Racket's: for example,
   it can use `unsafe-car` and `unsafe-cdr` once it knows it has already done
   a `pair?` check.
   - Many sequence constructors like `in-list` document that they "can
   provide better performance for … iteration when [they] appear[] directly in
   a `for`
   <https://docs.racket-lang.org/reference/for.html?q=in-list#%28form._%28%28lib._racket%2Fprivate%2Fbase..rkt%29._for%29%29>
   clause." The difference is significant: with `in-list`, `in-vector`, etc.,
   the `for`-like forms are generally as fast as hand-written loops. I'm in
   the habit of using these constructors all the time.
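
To make both points concrete, here is a small sketch; the function names
are hypothetical and chosen only for illustration. All three functions sum
a list of numbers. `sum-fast` puts `in-list` directly in the `for/sum`
clause, `sum-generic` leaves it out, and `sum-match` uses a `cons` pattern,
which is the kind of place where `match` only needs one `pair?` check before
taking the list apart.

```racket
#lang racket/base
(require racket/match)

;; `in-list` appears directly in the clause, so the loop can be
;; specialized for lists when it is expanded.
(define (sum-fast lst)
  (for/sum ([x (in-list lst)]) x))

;; Same result, but the sequence is a plain expression, so `for/sum`
;; has to dispatch on the kind of sequence at run time.
(define (sum-generic lst)
  (for/sum ([x lst]) x))

;; The `cons` pattern checks for a pair and then binds both fields.
(define (sum-match lst)
  (match lst
    ['() 0]
    [(cons x more) (+ x (sum-match more))]))
```

All three return the same answer for the same list; the difference is only
in how much work the loop does per element.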

-Philip
