Alright, I re-discovered Ryan Culpepper's talk, "The Cost of Sugar," from
the RacketCon 2018 video stream and made some progress by following along.

Here are the .zo files larger than 100K:

993K ./vector/compiled/tests_rkt.zo
830K ./scribblings/compiled/glm_scrbl.zo
328K ./vector/compiled/relational_rkt.zo
295K ./vec4/compiled/bool_rkt.zo
291K ./vec4/compiled/int_rkt.zo
290K ./vec4/compiled/uint_rkt.zo
290K ./vec4/compiled/double_rkt.zo
289K ./vec4/compiled/float_rkt.zo
280K ./vec3/compiled/bool_rkt.zo
276K ./vec3/compiled/int_rkt.zo
275K ./vec3/compiled/uint_rkt.zo
275K ./vec3/compiled/double_rkt.zo
274K ./vec3/compiled/float_rkt.zo
262K ./vec2/compiled/bool_rkt.zo
258K ./vec2/compiled/uint_rkt.zo
258K ./vec2/compiled/int_rkt.zo
258K ./vec2/compiled/double_rkt.zo
257K ./vec2/compiled/float_rkt.zo
213K ./vec1/compiled/bool_rkt.zo
210K ./vec1/compiled/uint_rkt.zo
210K ./vec1/compiled/int_rkt.zo
210K ./vec1/compiled/double_rkt.zo
209K ./vec1/compiled/float_rkt.zo
102K ./compiled/main_rkt.zo
101K ./compiled/vector_rkt.zo

I'm pretty sure that's a lot of big files. It's for a port of GLM, a 
graphics math library that implements (among other things) fixed-length 
vectors of up to 4 components over 5 distinct scalar types, for a total of 
20 distinct type-length combinations with many small variations in their 
APIs and implementations.

The variations I'm targeting either require a macro or, if implemented with
plain functions instead, add developer or run-time overhead. For example,
the base component accessors for a four-component vector of doubles are:

   dvec4-x  dvec4-y  dvec4-z  dvec4-w

Each of the "xyzw" components has two aliases, one from "rgba" and another
from "stpq". Each accessor also has a corresponding mutator; for example,
dvec4-g pairs with set-dvec4-g!.
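For illustration, the accessor/alias/mutator pattern is the kind of boilerplate a small macro can stamp out. This is a minimal sketch, not racket-glm's actual code: it assumes a dvec4 is a struct wrapping a four-element flvector, and "define-dvec4-component" is a hypothetical name. Each use defines an accessor and a mutator for every alias of one slot.

```racket
#lang racket/base
;; Hypothetical sketch: a dvec4 is ASSUMED to be a struct wrapping a
;; 4-element flvector; racket-glm's real representation may differ.
(require racket/flonum
         (for-syntax racket/base racket/syntax))

(struct dvec4 (data)) ; data : flvector of length 4

;; (define-dvec4-component index name ...) defines, for each name, an
;; accessor dvec4-<name> and a mutator set-dvec4-<name>!, all of which
;; read or write the same slot `index`.
(define-syntax (define-dvec4-component stx)
  (syntax-case stx ()
    [(_ index name ...)
     (with-syntax ([((getter setter) ...)
                    ;; Phase 1: build the two identifiers for each alias.
                    (for/list ([n (in-list (syntax->list #'(name ...)))])
                      (list (format-id n "dvec4-~a" n)
                            (format-id n "set-dvec4-~a!" n)))])
       #'(begin
           (define (getter v) (flvector-ref (dvec4-data v) index))
           ...
           (define (setter v x) (flvector-set! (dvec4-data v) index x))
           ...))]))

;; One line per slot: the "xyzw" name plus its "rgba" and "stpq" aliases.
(define-dvec4-component 0 x r s)
(define-dvec4-component 1 y g t)
(define-dvec4-component 2 z b p)
(define-dvec4-component 3 w a q)
```

Four short macro uses produce 24 definitions here, which hints at why the real code base grows so quickly once every type and length gets the same treatment.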

For another example, whereas adding two dvec4s sums four components,

   (dvec4 (fl+ (dvec4-x v1) (dvec4-x v2))
          (fl+ (dvec4-y v1) (dvec4-y v2))
          (fl+ (dvec4-z v1) (dvec4-z v2))
          (fl+ (dvec4-w v1) (dvec4-w v2)))

the same operation on dvec2s sums only the first two components.

Furthermore, the sheer volume of the target code base makes writing 
everything out by hand a mind-numbing exercise in frustration, and that's 
when looking at a mere 20% of the pile. It's going to get much worse very 
quickly. To add fixed-length matrices up to shape 4x4 over the same scalar 
types, I'm looking at 16x5 = 80 more distinct type-shape combinations!

Getting back to the .zo files, I had no luck running "raco macro-profiler" 
on the top end of the list. It appears to diverge. My dev laptop probably 
doesn't have enough RAM, so I'll have to try again on a bigger machine.

Here's an excerpt from a file on the bottom end:

[eric@walden racket-glm]$ raco macro-profiler glm/vec4/double
profiling (lib "glm/vec4/double.rkt")
Initial code size: 87
Final code size  : 86531
Phase 0
the-template (defined as the-template.1 in glm/vector/template)
  total: 31536, mean: 31536
  direct: 2054, mean: 2054, count: 1, stddev: 0
define-dvec4-unop (defined in "this module")
  total: 7300, mean: 730
  direct: 7480, mean: 748, count: 10, stddev: 0
define/contract (defined in racket/contract/region)
  total: 6666, mean: 44
  direct: 3572, mean: 23, count: 153, stddev: 1.48
define-dvec4-binop (defined in "this module")
  total: 6200, mean: 620
  direct: 6380, mean: 638, count: 10, stddev: 0

Phase 1
for/list (defined in racket/private/for)
  total: 6558, mean: 273
  direct: 2274, mean: 95, count: 24, stddev: 14.94
for/fold/derived/final (defined in racket/private/for)
  total: 4332, mean: 180
  direct: 336, mean: 14, count: 24, stddev: 0
for/fold/derived (defined in racket/private/for)
  total: 4284, mean: 178
  direct: 240, mean: 10, count: 24, stddev: 0
for/foldX/derived (defined in racket/private/for)
  total: 3996, mean: 24
  direct: 3164, mean: 19, count: 170, stddev: 48.16

Wow, is that nearly 1000x compression (86531 / 87 ≈ 995)? Three orders of
magnitude seems right, given what I know about how these macros interact.

The "the-template" macro is defined inside a module generated by my custom 
#%module-begin. It defines 4 type-agnostic, fixed-length module templates 
(e.g., glm/vec4/template), which are instantiated once for each of the 5 
scalar types. Those fixed-length module templates are based, in turn, on 
another module template (glm/vector/template) that takes a length argument 
and uses the other profiled macros (define-dvec4-unop, define/contract, 
define-dvec4-binop) to create 20 component-wise operations per instance. 
All together, that should inflate the size of the output to somewhere near
the middle of the interval 4x20x5x[1,4] = [400, 1600], whose midpoint is 1000.

At phase 1, the comprehension forms are busy churning out component aliases 
and unrolling component-wise operations at "compile" time. I'm reluctant to 
anti-inline these because they keep the written code small and the 
generated code fast.
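As a rough illustration of that kind of phase-1 unrolling, here is a minimal sketch, not the actual racket-glm macros. It assumes vectors are plain flvectors, and "define-unrolled-binop" is a hypothetical name. A for/list running at expansion time builds one fl+ expression per component, so the generated function is straight-line code, and one macro covers both the four-component and two-component cases by varying the length.

```racket
#lang racket/base
;; Hypothetical sketch: unroll a component-wise binary operation at
;; expansion time. Vectors are ASSUMED to be plain flvectors here.
(require racket/flonum
         (for-syntax racket/base))

;; (define-unrolled-binop name len op) defines (name a b), which applies
;; `op` to each of the first `len` slots of `a` and `b` without a loop.
(define-syntax (define-unrolled-binop stx)
  (syntax-case stx ()
    [(_ name len op)
     (with-syntax ([(elem ...)
                    ;; Phase 1: for/list emits one expression per index.
                    (for/list ([i (in-range (syntax-e #'len))])
                      #`(op (flvector-ref a #,i) (flvector-ref b #,i)))])
       #'(define (name a b) (flvector elem ...)))]))

;; flvec4+ expands to (flvector (fl+ (flvector-ref a 0) (flvector-ref b 0)) ...)
;; with four elements; flvec2+ gets only two.
(define-unrolled-binop flvec4+ 4 fl+)
(define-unrolled-binop flvec2+ 2 fl+)
```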

I guess the next step is to anti-inline define-dvec4-unop and
define-dvec4-binop, maybe eliminate some define/contract forms, and re-profile.
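For reference, a minimal sketch of what anti-inlining a unop macro could look like, under stated assumptions: vectors are plain flvectors, and "flvec-map" and "define-small-unop" are hypothetical names, not racket-glm's. The idea is to move the work out of the macro template into a helper function defined once, so each macro use expands to a one-line wrapper instead of a full copy of the body.

```racket
#lang racket/base
;; Hypothetical sketch of anti-inlining. Vectors are ASSUMED to be
;; plain flvectors; names here are illustrative only.
(require racket/flonum)

;; Shared runtime helper, compiled once: applies `f` to every slot of `v`.
(define (flvec-map f v)
  (define n (flvector-length v))
  (define out (make-flvector n))
  (for ([i (in-range n)])
    (flvector-set! out i (f (flvector-ref v i))))
  out)

;; Each use expands to a small definition that calls the helper,
;; rather than to its own copy of the loop or of an unrolled body.
(define-syntax-rule (define-small-unop name f)
  (define (name v) (flvec-map f v)))

(define-small-unop flvec-abs flabs)
(define-small-unop flvec-sqrt flsqrt)
```

The cost is the trade-off already mentioned: the helper runs a loop with an indirect call per component where the unrolled version was straight-line code, so this seems most attractive where expansion size matters more than speed.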


On Friday, March 13, 2020 at 6:20:47 PM UTC-7, Eric Griffis wrote:
> Hello, 
> I've got a package that generates (i.e., expands into) a ridiculous 
> amount of Racket code. I'd like to generate an unbelievable amount of 
> code, but things have already slowed down a lot. 
> At this point, I'm generating 20% of a massive code base and it takes 
> 4 minutes to compile (i.e., raco make) it all. If those numbers scale 
> linearly, I'm looking at 20 minutes to generate and compile the full 
> code base. Realistically, that's an optimistic lower bound. 
> How might I go about profiling something that does most of its work at 
> expansion time? 
> Should I be concerned about knocking over or clogging the package 
> repository when checking in highly "compressed" meta-programs that 
> unfurl at compile time? 
> Thanks! 
> Eric 

You received this message because you are subscribed to the Google Groups 
"Racket Users" group.