On Sun, Aug 7, 2022 at 7:05 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Even on a modern Linux:
>
> $ size src/backend/parser/gram.o
>    text    data     bss     dec     hex filename
>  656568       0       0  656568   a04b8 src/backend/parser/gram.o
> $ size src/interfaces/ecpg/preproc/preproc.o
>    text    data     bss     dec     hex filename
>  912005     188    7348  919541   e07f5 src/interfaces/ecpg/preproc/preproc.o
>
> So there's something pretty bloated there.  It doesn't seem like
> ecpg's additional productions should justify a nigh 50% code
> size increase.

Comparing gram.o with preproc.o:

$ objdump -t src/backend/parser/gram.o | grep yy | grep -v UND |
    awk '{print $5, $6}' | sort -r | head -n3
000000000003a24a yytable
000000000003a24a yycheck
0000000000013672 base_yyparse

$ objdump -t src/interfaces/ecpg/preproc/preproc.o | grep yy | grep -v UND |
    awk '{print $5, $6}' | sort -r | head -n3
000000000004d8e2 yytable
000000000004d8e2 yycheck
000000000002841e base_yyparse

The largest lookup tables are about a third bigger (other tables are
trivial in comparison), and the function base_yyparse is about double
the size, most of which is a giant switch statement with 2510 / 3912
cases, respectively. That difference does seem excessive. I've long
wondered if it would be possible / feasible to have stricter
separation among C, ECPG commands, and SQL. That sounds like a huge
amount of work, though.
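
To make the hex sizes easier to compare, here is a small Python sketch
that converts them to decimal and computes the growth relative to
gram.o. The numbers are copied from the objdump output above; the
dictionary layout and the helper name growth() are just for
illustration.

```python
# Symbol sizes in bytes (hex), copied from the objdump output above.
# Each entry is (size in gram.o, size in preproc.o).
sizes = {
    "yytable": (0x3a24a, 0x4d8e2),
    "yycheck": (0x3a24a, 0x4d8e2),
    "base_yyparse": (0x13672, 0x2841e),
}


def growth(gram, preproc):
    """Percentage increase from the gram.o size to the preproc.o size."""
    return (preproc - gram) / gram * 100


for name, (gram, preproc) in sizes.items():
    pct = growth(gram, preproc)
    print(f"{name:13} {gram:7d} -> {preproc:7d} bytes (+{pct:.0f}%)")
```

With these inputs the two tables grow from 238154 to 317666 bytes
(about +33%) and base_yyparse grows from 79474 to 164894 bytes (about
+107%).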

Playing around with the compiler flags on preproc.c, I get these
compile times, gcc memory usage as reported by /usr/bin/time -v, and
symbol sizes (non-debug build):

-O2:
time 8.0s
Maximum resident set size (kbytes): 255884

-O1:
time 6.3s
Maximum resident set size (kbytes): 170636
000000000004d8e2 yytable
000000000004d8e2 yycheck
00000000000292de base_yyparse

-O0:
time 2.9s
Maximum resident set size (kbytes): 153148
000000000004d8e2 yytable
000000000004d8e2 yycheck
000000000003585e base_yyparse

Note that -O0 bloats the binary, probably because it no longer uses a
jump table. -O1 might be worth it just to reduce build times for
slower buildfarm animals, even though Noah reported upthread that it
didn't help with the issue. I suspect it wouldn't slow down production
use much, since the output needs to be compiled anyway.
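
For a rough sense of the trade-off, this Python sketch expresses the
measurements above as savings relative to -O2. The figures are the
ones reported in this message; the runs table and the saving() helper
are only illustrative, and no base_yyparse size is filled in for -O2
because none was listed for that run.

```python
# Per flag: (compile time in s, max RSS in kB, base_yyparse bytes or None)
runs = {
    "-O2": (8.0, 255884, None),  # no symbol sizes were listed for -O2
    "-O1": (6.3, 170636, 0x292de),
    "-O0": (2.9, 153148, 0x3585e),
}


def saving(value, baseline):
    """Percentage reduction of value relative to baseline."""
    return (baseline - value) / baseline * 100


base_time, base_rss, _ = runs["-O2"]
for flag in ("-O1", "-O0"):
    secs, rss, _ = runs[flag]
    print(f"{flag}: time -{saving(secs, base_time):.0f}%, "
          f"RSS -{saving(rss, base_rss):.0f}%")

# Code-size cost of dropping from -O1 to -O0 for base_yyparse.
o1_parse, o0_parse = runs["-O1"][2], runs["-O0"][2]
bloat = (o0_parse - o1_parse) / o1_parse * 100
print(f"base_yyparse at -O0 vs -O1: +{bloat:.0f}%")
```

By this arithmetic -O1 saves roughly 21% of the compile time and 33%
of the peak memory compared with -O2, while -O0 saves about 64% and
40% but makes base_yyparse about 30% larger than at -O1.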

-- 
John Naylor
EDB: http://www.enterprisedb.com

