Re: COBOL: Hoping for insight with middle-end computation time.
On Sat, Mar 21, 2026 at 7:49 PM Robert Dubner wrote: > > > > > -Original Message- > > From: Jakub Jelinek > > Sent: Saturday, March 21, 2026 05:25 > > To: Robert Dubner > > Cc: Richard Biener ; David Malcolm > > ; H. J. Lu ; [email protected]; > James > > K. Lowden > > Subject: Re: COBOL: Hoping for insight with middle-end computation time. > > > > On Fri, Mar 20, 2026 at 10:17:50PM -0500, Robert Dubner wrote: > > > 10,000 repeats of that code in the C++ program compiles in 1.36 > seconds. > > > 20,000 repeats 3.18 > seconds. > > > 40,000 repeats 7.92 > seconds. > > > > > > 10,000 repeats in the COBOL program 16.76 > seconds. > > > 20,000 repeats in the COBOL program 97.40 > seconds. > > > 10,000 repeats in the COBOL program 551.56 > seconds. > > > > Perhaps also look at -fdump-tree-ssa-vops dump differences too, that > will > > make it clearer if there aren't differences in what is TREE_ADDRESSABLE > and > > what is not, or what is a global var and what could have been rewritten > into > > SSA form. > > > > Jakub > > Thank you very much. I will do that. > > Is there something special that the front end needs to do with GLOBAL or > static variables? For globals/statics you are eventually expected to call rest_of_decl_compilation, but that's something you already do. Richard.
Re: COBOL: Hoping for insight with middle-end computation time.
On Sat, Mar 21, 2026 at 8:15 PM Robert Dubner wrote: > > > > > -Original Message- > > From: Richard Biener > > Sent: Saturday, March 21, 2026 06:42 > > To: Jakub Jelinek > > Cc: Robert Dubner ; David Malcolm > > ; H. > > J. Lu ; [email protected]; James K. Lowden > > > > Subject: Re: COBOL: Hoping for insight with middle-end computation time. > > > > On Sat, Mar 21, 2026 at 10:25 AM Jakub Jelinek wrote: > > > > > > On Fri, Mar 20, 2026 at 10:17:50PM -0500, Robert Dubner wrote: > > > > 10,000 repeats of that code in the C++ program compiles in 1.36 > > > > seconds. > > > > 20,000 repeats 3.18 > > > > seconds. > > > > 40,000 repeats 7.92 > > > > seconds. > > > > > > > > 10,000 repeats in the COBOL program 16.76 > > > > seconds. > > > > 20,000 repeats in the COBOL program 97.40 > > > > seconds. > > > > 10,000 repeats in the COBOL program 551.56 > > > > seconds. > > > > > > Perhaps also look at -fdump-tree-ssa-vops dump differences too, that > > > will > > > make it clearer if there aren't differences in what is TREE_ADDRESSABLE > > > and > > > what is not, or what is a global var and what could have been rewritten > > > into > > > SSA form. > > > > Looking at -gimple there's a single scope block with try/finally and all > > clobbers in the finally block. I suppose given that cobol emits a single > > function only we could elide the end-of-function clobbers for it. > > > > The D.nnn are registers, not memory AFAICS. > > > > What's a bit odd is that there seems to be global variables > > called __gg__treeplet_{1,2,3}{f,o,s} where we store addresses > > of aaa, etc, into: > > > > __gg__treeplet_1f.58_103 = __gg__treeplet_1f; > > _104 = 0; > > _105 = __gg__treeplet_1f.58_103 + _104; > > *_105 = &aaa.73.0; > > > > and the actual computation happens in > > > > __gg__add_fixed_phase1 (2, 2, 0, 0, > > __gg__arithmetic_rounds.64_121, 0, &D.321); > > > > I assume which seems to indicate that argumets get passed via global > > variables > > rather than formal arguments. That might in practice help code > > generation at -O0 > > though. > > > > I suspect it's simply very many life variables (the D.) that need > > stack > > space and make the x86 stack var analysis code slow. > > > > I do wonder about > > > > _intermediate__stack327_329.0.0.data = &_stack327_data_330.0; > > _intermediate__stack327_329.0.0.capacity = 16; > > _intermediate__stack327_329.0.0.allocated = 16; > > _intermediate__stack327_329.0.0.offset = 0; > > _intermediate__stack327_329.0.0.name = &"_stack327"[0]; > > _intermediate__stack327_329.0.0.picture = &""[0]; > > _intermediate__stack327_329.0.0.initial = 0B; > > _intermediate__stack327_329.0.0.parent = 0B; > > _intermediate__stack327_329.0.0.occurs_lower = 0; > > _intermediate__stack327_329.0.0.occurs_upper = 0; > > _intermediate__stack327_329.0.0.attr = 4160; > > _intermediate__stack327_329.0.0.type = 6; > > _intermediate__stack327_329.0.0.level = 0; > > _intermediate__stack327_329.0.0.digits = 37; > > _intermediate__stack327_329.0.0.rdigits = 0; > > _intermediate__stack327_329.0.0.encoding = 1; > > _intermediate__stack327_329.0.0.alphabet = 0; > > __gg__initialize_variable_clean (&_intermediate__stack327_329.0.0, > > 32768); > > > > so why do we have inline initialization of the variable but also a call to > > apparently do sth similar? That seems to be abstraction that is at least > > oddly designed. > > > > With all of the above it would help if those temporary variables like > > _intermediate__stack327_329.0.0 or the D.nnn would be wrapped > > in their own scope block as to limit their lifetime given at least > > the stack ones are address-taken. So in C terms, have > > > > { > > compute ddd = aaa + bbb > > } > > { > > compute ddd = aaa + bbb > > } > > ... > > > > so the frontend emits those temporaries into its own scope wrapping > > the computes to limit their lifetime.
RE: COBOL: Hoping for insight with middle-end computation time.
> -Original Message- > From: Richard Biener > Sent: Saturday, March 21, 2026 06:42 > To: Jakub Jelinek > Cc: Robert Dubner ; David Malcolm > ; H. > J. Lu ; [email protected]; James K. Lowden > > Subject: Re: COBOL: Hoping for insight with middle-end computation time. > > On Sat, Mar 21, 2026 at 10:25 AM Jakub Jelinek wrote: > > > > On Fri, Mar 20, 2026 at 10:17:50PM -0500, Robert Dubner wrote: > > > 10,000 repeats of that code in the C++ program compiles in 1.36 > > > seconds. > > > 20,000 repeats 3.18 > > > seconds. > > > 40,000 repeats 7.92 > > > seconds. > > > > > > 10,000 repeats in the COBOL program 16.76 > > > seconds. > > > 20,000 repeats in the COBOL program 97.40 > > > seconds. > > > 10,000 repeats in the COBOL program 551.56 > > > seconds. > > > > Perhaps also look at -fdump-tree-ssa-vops dump differences too, that > > will > > make it clearer if there aren't differences in what is TREE_ADDRESSABLE > > and > > what is not, or what is a global var and what could have been rewritten > > into > > SSA form. > > Looking at -gimple there's a single scope block with try/finally and all > clobbers in the finally block. I suppose given that cobol emits a single > function only we could elide the end-of-function clobbers for it. > > The D.nnn are registers, not memory AFAICS. > > What's a bit odd is that there seems to be global variables > called __gg__treeplet_{1,2,3}{f,o,s} where we store addresses > of aaa, etc, into: > > __gg__treeplet_1f.58_103 = __gg__treeplet_1f; > _104 = 0; > _105 = __gg__treeplet_1f.58_103 + _104; > *_105 = &aaa.73.0; > > and the actual computation happens in > > __gg__add_fixed_phase1 (2, 2, 0, 0, > __gg__arithmetic_rounds.64_121, 0, &D.321); > > I assume which seems to indicate that argumets get passed via global > variables > rather than formal arguments. That might in practice help code > generation at -O0 > though. > > I suspect it's simply very many life variables (the D.) that need > stack > space and make the x86 stack var analysis code slow. > > I do wonder about > > _intermediate__stack327_329.0.0.data = &_stack327_data_330.0; > _intermediate__stack327_329.0.0.capacity = 16; > _intermediate__stack327_329.0.0.allocated = 16; > _intermediate__stack327_329.0.0.offset = 0; > _intermediate__stack327_329.0.0.name = &"_stack327"[0]; > _intermediate__stack327_329.0.0.picture = &""[0]; > _intermediate__stack327_329.0.0.initial = 0B; > _intermediate__stack327_329.0.0.parent = 0B; > _intermediate__stack327_329.0.0.occurs_lower = 0; > _intermediate__stack327_329.0.0.occurs_upper = 0; > _intermediate__stack327_329.0.0.attr = 4160; > _intermediate__stack327_329.0.0.type = 6; > _intermediate__stack327_329.0.0.level = 0; > _intermediate__stack327_329.0.0.digits = 37; > _intermediate__stack327_329.0.0.rdigits = 0; > _intermediate__stack327_329.0.0.encoding = 1; > _intermediate__stack327_329.0.0.alphabet = 0; > __gg__initialize_variable_clean (&_intermediate__stack327_329.0.0, > 32768); > > so why do we have inline initialization of the variable but also a call to > apparently do sth similar? That seems to be abstraction that is at least > oddly designed. > > With all of the above it would help if those temporary variables like > _intermediate__stack327_329.0.0 or the D.nnn would be wrapped > in their own scope block as to limit their lifetime given at least > the stack ones are address-taken. So in C terms, have > > { > compute ddd = aaa + bbb > } > { > compute ddd = aaa + bbb > } > ... > > so the frontend emits those temporaries into its own scope wrapping > the computes to limit their lifetime. Scopes in GENERIC are BLOCKs > and in the IL you'd have BIND_EXPRs. > > Richard. The reason for __gg__initialize_variable_clean is two-fold. 1) COBOL variables can have a hierarchical structure, where the storage for multiple variables exist in a single memory block. The initialize function establishes the data area for each variable by adding the offset to the base. (This could be done in GENERIC, and maybe I will.) But 2) COBOL has the INITIALIZE statement, which can set the data area to a number of different possibilities, including the original val
RE: COBOL: Hoping for insight with middle-end computation time.
> -Original Message- > From: Richard Biener > Sent: Saturday, March 21, 2026 03:33 > To: Robert Dubner > Cc: David Malcolm ; H. J. Lu ; > [email protected]; James K. Lowden > Subject: Re: COBOL: Hoping for insight with middle-end computation time. > > > > > Am 21.03.2026 um 04:17 schrieb Robert Dubner : > > > > Let's try another line of query. > > > > Maybe here are things I should be doing that I am not, or things that I > > am > > doing that I shouldn't. > > > > As described earlier, I created a version of the gcobol/cobol1 compiler > > that > > produced a stripped-down GENERIC that resulted in this repeated pattern > > in > > the -fdump-tree-original output. > > > > _intermediate__stack1_3.0.0.data = &_stack1_data_17.0; > > _intermediate__stack1_3.0.0.capacity = 16; > > _intermediate__stack1_3.0.0.allocated = 16; > > _intermediate__stack1_3.0.0.offset = 0; > > _intermediate__stack1_3.0.0.name = &"_stack1"[0]; > > _intermediate__stack1_3.0.0.picture = &""[0]; > > _intermediate__stack1_3.0.0.initial = 0B; > > _intermediate__stack1_3.0.0.parent = 0B; > > _intermediate__stack1_3.0.0.occurs_lower = 0; > > _intermediate__stack1_3.0.0.occurs_upper = 0; > > _intermediate__stack1_3.0.0.attr = 4160; > > _intermediate__stack1_3.0.0.type = 6; > > _intermediate__stack1_3.0.0.level = 0; > > _intermediate__stack1_3.0.0.digits = 37; > > _intermediate__stack1_3.0.0.rdigits = 0; > > _intermediate__stack1_3.0.0.encoding = 1; > > _intermediate__stack1_3.0.0.alphabet = 0; > > D.311 = 0; > > D.312 = 0; > > _13 = D.311 & 18; > > if (_13 != 0) goto ; else goto ; > > : > > goto ; > > : > > D.314 = 0; > > ..pa_erf.19_14 = ..pa_erf; > > D.312 = D.312 | ..pa_erf.19_14; > > if (D.312 != 0) goto ; else goto ; > > : > > goto ; > > : > > : > > : > > > > I have experimented with a C program until I came up with the same > > .gimple: > > > > typedef struct strct > > { > > int a0; int a1; int a2; int a3; int a4; int a5; int a6; int a7; int a8; > > int a9; > > }strct; > > > > int > > main(int argc, char **argv) > > { > > strct strc0{0,1,2,3,4,5,6,7,8,9}; > > int x0=0; > > int y0=0; > > int z0; > > y0=x0&18; > > if(y0!=0) > >{} > > else > >{ > >z0=z0|pa_erf; > >if(z0!=0) > > {} > >} > > } > > > > Here is a side-by-side comparison of the repetitive .gimple generated by > > the > > two compilations: > > > > From the gcobol compilation From the xg++ > > compilation > > > - > > D.322 = 0; x3 = 0; > > D.323 = 0; y3 = 0; > > _17 = D.322 & 18;y3 = x3 & 18; > > if (_17 != 0) goto ; else goto ; if (y3 != 0) goto > > ; > > else goto ; > > : : > > goto ;goto ; > > : : > > D.324 = 0; > > ..pa_erf.25_18 = ..pa_erf; pa_erf.3_4 = pa_erf; > > D.323 = D.323 | ..pa_erf.25_18; z3 = z3 | pa_erf.3_4; > > if (D.323 != 0) goto ; else goto ; if (z3 != 0) goto > > ; > > else goto ; > > : : > > goto ;goto ; > > : : > > : : > > : : > > You can see they are essentially identical. > > Could it be your D.324 is global? Comparing the GIMPLE would tell. > Possibly > also BIND_EXPRs matter. -blocks dumps (some) also dump the scope tree. > > > However! > > > > 10,000 repeats of that code in the C++ program compiles in 1.36 seconds. > > 20,000 repeats 3.18 seconds. > > 40,000 repeats 7.92 seconds. > > > > 10,000 repeats in the COBOL program 16.76 seconds. > > 20,000 repeats in the COBOL program 97.40 seconds. > > 10,000 repeats in
RE: COBOL: Hoping for insight with middle-end computation time.
> -Original Message- > From: Jakub Jelinek > Sent: Saturday, March 21, 2026 05:25 > To: Robert Dubner > Cc: Richard Biener ; David Malcolm > ; H. J. Lu ; [email protected]; James > K. Lowden > Subject: Re: COBOL: Hoping for insight with middle-end computation time. > > On Fri, Mar 20, 2026 at 10:17:50PM -0500, Robert Dubner wrote: > > 10,000 repeats of that code in the C++ program compiles in 1.36 seconds. > > 20,000 repeats 3.18 seconds. > > 40,000 repeats 7.92 seconds. > > > > 10,000 repeats in the COBOL program 16.76 seconds. > > 20,000 repeats in the COBOL program 97.40 seconds. > > 10,000 repeats in the COBOL program 551.56 seconds. > > Perhaps also look at -fdump-tree-ssa-vops dump differences too, that will > make it clearer if there aren't differences in what is TREE_ADDRESSABLE and > what is not, or what is a global var and what could have been rewritten into > SSA form. > > Jakub Thank you very much. I will do that. Is there something special that the front end needs to do with GLOBAL or static variables?
Re: COBOL: Hoping for insight with middle-end computation time.
On Sat, Mar 21, 2026 at 10:25 AM Jakub Jelinek wrote:
>
> On Fri, Mar 20, 2026 at 10:17:50PM -0500, Robert Dubner wrote:
> > 10,000 repeats of that code in the C++ program compiles in 1.36 seconds.
> > 20,000 repeats 3.18 seconds.
> > 40,000 repeats 7.92 seconds.
> >
> > 10,000 repeats in the COBOL program 16.76 seconds.
> > 20,000 repeats in the COBOL program 97.40 seconds.
> > 10,000 repeats in the COBOL program 551.56 seconds.
>
> Perhaps also look at -fdump-tree-ssa-vops dump differences too, that will
> make it clearer if there aren't differences in what is TREE_ADDRESSABLE and
> what is not, or what is a global var and what could have been rewritten into
> SSA form.
Looking at -gimple there's a single scope block with try/finally and all
clobbers in the finally block. I suppose given that cobol emits a single
function only we could elide the end-of-function clobbers for it.
The D.nnn are registers, not memory AFAICS.
What's a bit odd is that there seems to be global variables
called __gg__treeplet_{1,2,3}{f,o,s} where we store addresses
of aaa, etc, into:
__gg__treeplet_1f.58_103 = __gg__treeplet_1f;
_104 = 0;
_105 = __gg__treeplet_1f.58_103 + _104;
*_105 = &aaa.73.0;
and the actual computation happens in
__gg__add_fixed_phase1 (2, 2, 0, 0,
__gg__arithmetic_rounds.64_121, 0, &D.321);
I assume which seems to indicate that argumets get passed via global variables
rather than formal arguments. That might in practice help code
generation at -O0
though.
I suspect it's simply very many life variables (the D.) that need stack
space and make the x86 stack var analysis code slow.
I do wonder about
_intermediate__stack327_329.0.0.data = &_stack327_data_330.0;
_intermediate__stack327_329.0.0.capacity = 16;
_intermediate__stack327_329.0.0.allocated = 16;
_intermediate__stack327_329.0.0.offset = 0;
_intermediate__stack327_329.0.0.name = &"_stack327"[0];
_intermediate__stack327_329.0.0.picture = &""[0];
_intermediate__stack327_329.0.0.initial = 0B;
_intermediate__stack327_329.0.0.parent = 0B;
_intermediate__stack327_329.0.0.occurs_lower = 0;
_intermediate__stack327_329.0.0.occurs_upper = 0;
_intermediate__stack327_329.0.0.attr = 4160;
_intermediate__stack327_329.0.0.type = 6;
_intermediate__stack327_329.0.0.level = 0;
_intermediate__stack327_329.0.0.digits = 37;
_intermediate__stack327_329.0.0.rdigits = 0;
_intermediate__stack327_329.0.0.encoding = 1;
_intermediate__stack327_329.0.0.alphabet = 0;
__gg__initialize_variable_clean (&_intermediate__stack327_329.0.0, 32768);
so why do we have inline initialization of the variable but also a call to
apparently do sth similar? That seems to be abstraction that is at least
oddly designed.
With all of the above it would help if those temporary variables like
_intermediate__stack327_329.0.0 or the D.nnn would be wrapped
in their own scope block as to limit their lifetime given at least
the stack ones are address-taken. So in C terms, have
{
compute ddd = aaa + bbb
}
{
compute ddd = aaa + bbb
}
...
so the frontend emits those temporaries into its own scope wrapping
the computes to limit their lifetime. Scopes in GENERIC are BLOCKs
and in the IL you'd have BIND_EXPRs.
Richard.
>
> Jakub
>
Re: COBOL: Hoping for insight with middle-end computation time.
On Fri, Mar 20, 2026 at 10:17:50PM -0500, Robert Dubner wrote: > 10,000 repeats of that code in the C++ program compiles in 1.36 seconds. > 20,000 repeats 3.18 seconds. > 40,000 repeats 7.92 seconds. > > 10,000 repeats in the COBOL program 16.76 seconds. > 20,000 repeats in the COBOL program 97.40 seconds. > 10,000 repeats in the COBOL program 551.56 seconds. Perhaps also look at -fdump-tree-ssa-vops dump differences too, that will make it clearer if there aren't differences in what is TREE_ADDRESSABLE and what is not, or what is a global var and what could have been rewritten into SSA form. Jakub
Re: COBOL: Hoping for insight with middle-end computation time.
> Am 21.03.2026 um 04:17 schrieb Robert Dubner :
>
> Let's try another line of query.
>
> Maybe here are things I should be doing that I am not, or things that I am
> doing that I shouldn't.
>
> As described earlier, I created a version of the gcobol/cobol1 compiler that
> produced a stripped-down GENERIC that resulted in this repeated pattern in
> the -fdump-tree-original output.
>
> _intermediate__stack1_3.0.0.data = &_stack1_data_17.0;
> _intermediate__stack1_3.0.0.capacity = 16;
> _intermediate__stack1_3.0.0.allocated = 16;
> _intermediate__stack1_3.0.0.offset = 0;
> _intermediate__stack1_3.0.0.name = &"_stack1"[0];
> _intermediate__stack1_3.0.0.picture = &""[0];
> _intermediate__stack1_3.0.0.initial = 0B;
> _intermediate__stack1_3.0.0.parent = 0B;
> _intermediate__stack1_3.0.0.occurs_lower = 0;
> _intermediate__stack1_3.0.0.occurs_upper = 0;
> _intermediate__stack1_3.0.0.attr = 4160;
> _intermediate__stack1_3.0.0.type = 6;
> _intermediate__stack1_3.0.0.level = 0;
> _intermediate__stack1_3.0.0.digits = 37;
> _intermediate__stack1_3.0.0.rdigits = 0;
> _intermediate__stack1_3.0.0.encoding = 1;
> _intermediate__stack1_3.0.0.alphabet = 0;
> D.311 = 0;
> D.312 = 0;
> _13 = D.311 & 18;
> if (_13 != 0) goto ; else goto ;
> :
> goto ;
> :
> D.314 = 0;
> ..pa_erf.19_14 = ..pa_erf;
> D.312 = D.312 | ..pa_erf.19_14;
> if (D.312 != 0) goto ; else goto ;
> :
> goto ;
> :
> :
> :
>
> I have experimented with a C program until I came up with the same .gimple:
>
> typedef struct strct
> {
> int a0; int a1; int a2; int a3; int a4; int a5; int a6; int a7; int a8;
> int a9;
> }strct;
>
> int
> main(int argc, char **argv)
> {
> strct strc0{0,1,2,3,4,5,6,7,8,9};
> int x0=0;
> int y0=0;
> int z0;
> y0=x0&18;
> if(y0!=0)
>{}
> else
>{
>z0=z0|pa_erf;
>if(z0!=0)
> {}
>}
> }
>
> Here is a side-by-side comparison of the repetitive .gimple generated by the
> two compilations:
>
> From the gcobol compilation From the xg++ compilation
> --- -
> D.322 = 0; x3 = 0;
> D.323 = 0; y3 = 0;
> _17 = D.322 & 18;y3 = x3 & 18;
> if (_17 != 0) goto ; else goto ; if (y3 != 0) goto ;
> else goto ;
> : :
> goto ;goto ;
> : :
> D.324 = 0;
> ..pa_erf.25_18 = ..pa_erf; pa_erf.3_4 = pa_erf;
> D.323 = D.323 | ..pa_erf.25_18; z3 = z3 | pa_erf.3_4;
> if (D.323 != 0) goto ; else goto ; if (z3 != 0) goto ;
> else goto ;
> : :
> goto ;goto ;
> : :
> : :
> : :
> You can see they are essentially identical.
Could it be your D.324 is global? Comparing the GIMPLE would tell. Possibly
also BIND_EXPRs matter. -blocks dumps (some) also dump the scope tree.
> However!
>
> 10,000 repeats of that code in the C++ program compiles in 1.36 seconds.
> 20,000 repeats 3.18 seconds.
> 40,000 repeats 7.92 seconds.
>
> 10,000 repeats in the COBOL program 16.76 seconds.
> 20,000 repeats in the COBOL program 97.40 seconds.
> 10,000 repeats in the COBOL program 551.56 seconds.
>
> Those times are -O0 compilations from the same build of gcc, invoked in the
> same ways.
>
> So. I am obviously doing something different in gcobol than whatever is
> happening in xg++.
>
> The GENERIC I am creating is apparently about right. So, now I have to ask
> a question that I probably should have asked years ago:
>
> What do I need to do before creating the GENERIC for a function?
>
> What do I need to do afterward?
>
> Right now I don't think I am doing anything special before generating any
> GENERIC.
>
> At the end of a source code module, I call
>
>cgraph_node::finalize_function(function_decl, true);
>
> for each function I created in that module.
>
> What am I missing? What could I be doing, or not doing, that's causing that
> N-squared behavior?
I don’t think you are doing anything obviously wrong. With cobol you are going
to have a single large function which is somewhat special and more prone to run
into such bugs where people were lazy and never imagined there’s so large data.
We’ll have to fix those algorithms eventually.
> I am feeling like somebody who has found a car, and figured out how to get
> it started and get it moving in first gear, but hasn'
RE: COBOL: Hoping for insight with middle-end computation time.
Let's try another line of query.
Maybe here are things I should be doing that I am not, or things that I am
doing that I shouldn't.
As described earlier, I created a version of the gcobol/cobol1 compiler that
produced a stripped-down GENERIC that resulted in this repeated pattern in
the -fdump-tree-original output.
_intermediate__stack1_3.0.0.data = &_stack1_data_17.0;
_intermediate__stack1_3.0.0.capacity = 16;
_intermediate__stack1_3.0.0.allocated = 16;
_intermediate__stack1_3.0.0.offset = 0;
_intermediate__stack1_3.0.0.name = &"_stack1"[0];
_intermediate__stack1_3.0.0.picture = &""[0];
_intermediate__stack1_3.0.0.initial = 0B;
_intermediate__stack1_3.0.0.parent = 0B;
_intermediate__stack1_3.0.0.occurs_lower = 0;
_intermediate__stack1_3.0.0.occurs_upper = 0;
_intermediate__stack1_3.0.0.attr = 4160;
_intermediate__stack1_3.0.0.type = 6;
_intermediate__stack1_3.0.0.level = 0;
_intermediate__stack1_3.0.0.digits = 37;
_intermediate__stack1_3.0.0.rdigits = 0;
_intermediate__stack1_3.0.0.encoding = 1;
_intermediate__stack1_3.0.0.alphabet = 0;
D.311 = 0;
D.312 = 0;
_13 = D.311 & 18;
if (_13 != 0) goto ; else goto ;
:
goto ;
:
D.314 = 0;
..pa_erf.19_14 = ..pa_erf;
D.312 = D.312 | ..pa_erf.19_14;
if (D.312 != 0) goto ; else goto ;
:
goto ;
:
:
:
I have experimented with a C program until I came up with the same .gimple:
typedef struct strct
{
int a0; int a1; int a2; int a3; int a4; int a5; int a6; int a7; int a8;
int a9;
}strct;
int
main(int argc, char **argv)
{
strct strc0{0,1,2,3,4,5,6,7,8,9};
int x0=0;
int y0=0;
int z0;
y0=x0&18;
if(y0!=0)
{}
else
{
z0=z0|pa_erf;
if(z0!=0)
{}
}
}
Here is a side-by-side comparison of the repetitive .gimple generated by the
two compilations:
>From the gcobol compilation From the xg++ compilation
--- -
D.322 = 0; x3 = 0;
D.323 = 0; y3 = 0;
_17 = D.322 & 18;y3 = x3 & 18;
if (_17 != 0) goto ; else goto ; if (y3 != 0) goto ;
else goto ;
: :
goto ;goto ;
: :
D.324 = 0;
..pa_erf.25_18 = ..pa_erf; pa_erf.3_4 = pa_erf;
D.323 = D.323 | ..pa_erf.25_18; z3 = z3 | pa_erf.3_4;
if (D.323 != 0) goto ; else goto ; if (z3 != 0) goto ;
else goto ;
: :
goto ;goto ;
: :
: :
: :
You can see they are essentially identical.
However!
10,000 repeats of that code in the C++ program compiles in 1.36 seconds.
20,000 repeats 3.18 seconds.
40,000 repeats 7.92 seconds.
10,000 repeats in the COBOL program 16.76 seconds.
20,000 repeats in the COBOL program 97.40 seconds.
10,000 repeats in the COBOL program 551.56 seconds.
Those times are -O0 compilations from the same build of gcc, invoked in the
same ways.
So. I am obviously doing something different in gcobol than whatever is
happening in xg++.
The GENERIC I am creating is apparently about right. So, now I have to ask
a question that I probably should have asked years ago:
What do I need to do before creating the GENERIC for a function?
What do I need to do afterward?
Right now I don't think I am doing anything special before generating any
GENERIC.
At the end of a source code module, I call
cgraph_node::finalize_function(function_decl, true);
for each function I created in that module.
What am I missing? What could I be doing, or not doing, that's causing that
N-squared behavior?
I am feeling like somebody who has found a car, and figured out how to get
it started and get it moving in first gear, but hasn't figured out how to
get to any of the higher gears.
Re: COBOL: Hoping for insight with middle-end computation time.
On Fri, Mar 20, 2026 at 8:09 PM David Malcolm via Gcc wrote: > > On Fri, 2026-03-20 at 10:55 -0500, Robert Dubner wrote: > > > > > > > -Original Message- > > > From: David Malcolm > > > Sent: Friday, March 20, 2026 10:10 > > > To: Robert Dubner ; [email protected] > > > Subject: Re: COBOL: Hoping for insight with middle-end computation > > > time. > > > > > > On Thu, 2026-03-19 at 18:22 -0500, Robert Dubner wrote: > > > > It happens that COBOL has the COMPUTE statement. It takes the > > > > form > > > > of, > > > > for example, COMPUTE DDD = AAA + BBB. > > > > > > > > We implement that by creating a temporary variable, using that as > > > > the > > > > target of an addition of AAA and BBB, and then doing an > > > > assignment to > > > > DDD. > > > > (Recall that COBOL variables can be quite complex, so we are a > > > > long > > > > way > > > > from being able to do this with an ADD_EXPR.) > > > > > > > > We have determined that the way I've been producing GENERIC for > > > > that > > > > results in N-squared computation time somewhere in the middle > > > > end. > > > > > > > > I have been tearing my hair out trying to figure out what's > > > > causing > > > > that > > > > N-squared behavior. I commented away the assignment, and I got > > > > rid > > > > of the > > > > arithmetic. All that's left is the creation of the temporary, > > > > and > > > > some IF > > > > statements that are generated to test for errors along the way of > > > > the > > > > computation. (COBOL has a very rich error-detection and > > > > exception-generating facility. > > > > > > > > The remaining GIMPLE for a single iteration (as shown by > > > > -fdump-tree-gimple) is shown below. The "phase opt and generate" > > > > times > > > > for repetitions of that GIMPLE are shown here: > > > > > > > > phase optFactor > > > > Repeats & generate > > > > 1,000 0.17 > > > > 2,000 0.49 2.9 > > > > 4,000 1.55 3.2 > > > > 8,000 7.56 4.9 > > > > 16,000 49.31 6.5 > > > > 32,000 281.29 5.7 > > > > > > > > I have been struggling with this for days. Is there an > > > > explanation > > > > for > > > > why the following GIMPLE is resulting in that N-squared behavior? > > > > > > Hi Bob. You cite the overall "phase opt and generate" times, but > > > no > > > data on how this is spread across the various optimization passes. > > > > > > What's the output of -ftime-report on your workload? > > > > > > In particular, are there any specific passes that are responsible > > > for > > > the growth in time (and thus where we can pinpoint a bug), or is > > > the > > > time evenly distributed across all of them? > > > > > > Sorry if this is a silly question > > > Dave > > > > The only silly thing is that in an egregious display of monumental > > arrogance > > and ignorance, some years back I decided to take on the problem of > > code > > generation for a new front end without having had any prior > > experience with > > GCC internals or compiler theory, and without access to anybody who > > actually > > knew anything. > > > > I hope you've seen my response to Richard. For a compilation with > > > > phase opt and generate : 14.95 ( 92%) 259M ( 88%) > > > > the next big component is > > > > thread pro- & epilogue : 10.45 ( 64%) 4104 ( 0%) > > > > What that means is yet one more mystery to me. > > 64% of the wallclock time is being accounted to this timing item. > Looking in timevar.def, I see that this timing item is: > > DEFTIMEVAR (TV_THREAD_PROLOGUE_AND_EPILOGUE, "thread pro- & epilogue") > > Grepping for TV_THREAD_PROLOGUE_AND_EPILOGUE, I see two passes in > function.cc that account their time to this timevar: > pass_thread_prologue_and_epilogue and > pass_late_thread_prologue_and_epilogue. > > Both of these passes ultimately call the function > rest_of_handle_thread_prologue_and_ep
Re: COBOL: Hoping for insight with middle-end computation time.
On Fri, Mar 20, 2026 at 9:05 PM Richard Biener wrote: > > On Fri, Mar 20, 2026 at 8:58 PM Richard Biener > wrote: > > > > On Fri, Mar 20, 2026 at 8:09 PM David Malcolm via Gcc > > wrote: > > > > > > On Fri, 2026-03-20 at 10:55 -0500, Robert Dubner wrote: > > > > > > > > > > > > > -Original Message- > > > > > From: David Malcolm > > > > > Sent: Friday, March 20, 2026 10:10 > > > > > To: Robert Dubner ; [email protected] > > > > > Subject: Re: COBOL: Hoping for insight with middle-end computation > > > > > time. > > > > > > > > > > On Thu, 2026-03-19 at 18:22 -0500, Robert Dubner wrote: > > > > > > It happens that COBOL has the COMPUTE statement. It takes the > > > > > > form > > > > > > of, > > > > > > for example, COMPUTE DDD = AAA + BBB. > > > > > > > > > > > > We implement that by creating a temporary variable, using that as > > > > > > the > > > > > > target of an addition of AAA and BBB, and then doing an > > > > > > assignment to > > > > > > DDD. > > > > > > (Recall that COBOL variables can be quite complex, so we are a > > > > > > long > > > > > > way > > > > > > from being able to do this with an ADD_EXPR.) > > > > > > > > > > > > We have determined that the way I've been producing GENERIC for > > > > > > that > > > > > > results in N-squared computation time somewhere in the middle > > > > > > end. > > > > > > > > > > > > I have been tearing my hair out trying to figure out what's > > > > > > causing > > > > > > that > > > > > > N-squared behavior. I commented away the assignment, and I got > > > > > > rid > > > > > > of the > > > > > > arithmetic. All that's left is the creation of the temporary, > > > > > > and > > > > > > some IF > > > > > > statements that are generated to test for errors along the way of > > > > > > the > > > > > > computation. (COBOL has a very rich error-detection and > > > > > > exception-generating facility. > > > > > > > > > > > > The remaining GIMPLE for a single iteration (as shown by > > > > > > -fdump-tree-gimple) is shown below. The "phase opt and generate" > > > > > > times > > > > > > for repetitions of that GIMPLE are shown here: > > > > > > > > > > > > phase optFactor > > > > > > Repeats & generate > > > > > > 1,000 0.17 > > > > > > 2,000 0.49 2.9 > > > > > > 4,000 1.55 3.2 > > > > > > 8,000 7.56 4.9 > > > > > > 16,000 49.31 6.5 > > > > > > 32,000 281.29 5.7 > > > > > > > > > > > > I have been struggling with this for days. Is there an > > > > > > explanation > > > > > > for > > > > > > why the following GIMPLE is resulting in that N-squared behavior? > > > > > > > > > > Hi Bob. You cite the overall "phase opt and generate" times, but > > > > > no > > > > > data on how this is spread across the various optimization passes. > > > > > > > > > > What's the output of -ftime-report on your workload? > > > > > > > > > > In particular, are there any specific passes that are responsible > > > > > for > > > > > the growth in time (and thus where we can pinpoint a bug), or is > > > > > the > > > > > time evenly distributed across all of them? > > > > > > > > > > Sorry if this is a silly question > > > > > Dave > > > > > > > > The only silly thing is that in an egregious display of monumental > > > > arrogance > > > > and ignorance, some years back I decided to take on the problem of > > > > code > > > > generation for a new front end without having had any prior > > > > experience with > > > > GCC internals
Re: COBOL: Hoping for insight with middle-end computation time.
On Fri, Mar 20, 2026 at 8:58 PM Richard Biener wrote: > > On Fri, Mar 20, 2026 at 8:09 PM David Malcolm via Gcc wrote: > > > > On Fri, 2026-03-20 at 10:55 -0500, Robert Dubner wrote: > > > > > > > > > > -Original Message- > > > > From: David Malcolm > > > > Sent: Friday, March 20, 2026 10:10 > > > > To: Robert Dubner ; [email protected] > > > > Subject: Re: COBOL: Hoping for insight with middle-end computation > > > > time. > > > > > > > > On Thu, 2026-03-19 at 18:22 -0500, Robert Dubner wrote: > > > > > It happens that COBOL has the COMPUTE statement. It takes the > > > > > form > > > > > of, > > > > > for example, COMPUTE DDD = AAA + BBB. > > > > > > > > > > We implement that by creating a temporary variable, using that as > > > > > the > > > > > target of an addition of AAA and BBB, and then doing an > > > > > assignment to > > > > > DDD. > > > > > (Recall that COBOL variables can be quite complex, so we are a > > > > > long > > > > > way > > > > > from being able to do this with an ADD_EXPR.) > > > > > > > > > > We have determined that the way I've been producing GENERIC for > > > > > that > > > > > results in N-squared computation time somewhere in the middle > > > > > end. > > > > > > > > > > I have been tearing my hair out trying to figure out what's > > > > > causing > > > > > that > > > > > N-squared behavior. I commented away the assignment, and I got > > > > > rid > > > > > of the > > > > > arithmetic. All that's left is the creation of the temporary, > > > > > and > > > > > some IF > > > > > statements that are generated to test for errors along the way of > > > > > the > > > > > computation. (COBOL has a very rich error-detection and > > > > > exception-generating facility. > > > > > > > > > > The remaining GIMPLE for a single iteration (as shown by > > > > > -fdump-tree-gimple) is shown below. The "phase opt and generate" > > > > > times > > > > > for repetitions of that GIMPLE are shown here: > > > > > > > > > > phase optFactor > > > > > Repeats & generate > > > > > 1,000 0.17 > > > > > 2,000 0.49 2.9 > > > > > 4,000 1.55 3.2 > > > > > 8,000 7.56 4.9 > > > > > 16,000 49.31 6.5 > > > > > 32,000 281.29 5.7 > > > > > > > > > > I have been struggling with this for days. Is there an > > > > > explanation > > > > > for > > > > > why the following GIMPLE is resulting in that N-squared behavior? > > > > > > > > Hi Bob. You cite the overall "phase opt and generate" times, but > > > > no > > > > data on how this is spread across the various optimization passes. > > > > > > > > What's the output of -ftime-report on your workload? > > > > > > > > In particular, are there any specific passes that are responsible > > > > for > > > > the growth in time (and thus where we can pinpoint a bug), or is > > > > the > > > > time evenly distributed across all of them? > > > > > > > > Sorry if this is a silly question > > > > Dave > > > > > > The only silly thing is that in an egregious display of monumental > > > arrogance > > > and ignorance, some years back I decided to take on the problem of > > > code > > > generation for a new front end without having had any prior > > > experience with > > > GCC internals or compiler theory, and without access to anybody who > > > actually > > > knew anything. > > > > > > I hope you've seen my response to Richard. For a compilation with > > > > > > phase opt and generate : 14.95 ( 92%) 259M ( 88%) > > > > > > the next big component is > > > > > > thread pro- & epilogue : 10.45 ( 64%) 4104 ( 0%) > > > > > > What th
Re: COBOL: Hoping for insight with middle-end computation time.
On Fri, 20 Mar 2026 15:07:38 -0400 David Malcolm via Gcc wrote: > Grepping for TV_THREAD_PROLOGUE_AND_EPILOGUE, I see two passes in > function.cc that account their time to this timevar: > pass_thread_prologue_and_epilogue and > pass_late_thread_prologue_and_epilogue. > > Both of these passes ultimately call the function > rest_of_handle_thread_prologue_and_epilogue. > > So the slowdown is presumably somewhere inside there. When I compiled Bob's example program under Linux perf and examined it with hotspot, it showed about 1/2 the time is spent in lra_process_new_insns. 100% of that time, 5 levels down, is bitmap_list_find_element. A little higher up, 73% overall is spent in the caller, do_reload. I suspect that's too late in the game. I'm observing the optimizer doing its job, not why it has any job to do. But I was wrong once, not very long ago, actually, and not the first time then, either. So I pass it along. --jkl
Re: COBOL: Hoping for insight with middle-end computation time.
On Fri, 2026-03-20 at 10:46 -0500, Robert Dubner wrote: > > -Original Message- > > From: Richard Biener > > Sent: Friday, March 20, 2026 03:24 > > To: Robert Dubner > > Cc: [email protected] > > Subject: Re: COBOL: Hoping for insight with middle-end computation > > time. > > > > On Fri, Mar 20, 2026 at 12:23 AM Robert Dubner > > wrote: > > > > > > It happens that COBOL has the COMPUTE statement. It takes the > > > form of, > > > for example, COMPUTE DDD = AAA + BBB. > > > > > > We implement that by creating a temporary variable, using that as > > > the > > > target of an addition of AAA and BBB, and then doing an > > > assignment to > > > DDD. > > > (Recall that COBOL variables can be quite complex, so we are a > > > long way > > > from being able to do this with an ADD_EXPR.) > > > > > > We have determined that the way I've been producing GENERIC for > > > that > > > results in N-squared computation time somewhere in the middle > > > end. > > > > > > I have been tearing my hair out trying to figure out what's > > > causing that > > > N-squared behavior. I commented away the assignment, and I got > > > rid of > > > the > > > arithmetic. All that's left is the creation of the temporary, > > > and some > > > IF > > > statements that are generated to test for errors along the way of > > > the > > > computation. (COBOL has a very rich error-detection and > > > exception-generating facility. > > > > > > The remaining GIMPLE for a single iteration (as shown by > > > -fdump-tree-gimple) is shown below. The "phase opt and generate" > > > times > > > for repetitions of that GIMPLE are shown here: > > > > > > phase opt Factor > > > Repeats & generate > > > 1,000 0.17 > > > 2,000 0.49 2.9 > > > 4,000 1.55 3.2 > > > 8,000 7.56 4.9 > > > 16,000 49.31 6.5 > > > 32,000 281.29 5.7 > > > > > > I have been struggling with this for days. Is there an > > > explanation for > > > why the following GIMPLE is resulting in that N-squared behavior? > > > > > > I agree that the conditionals are a bit convoluted, and Jim and I > > > are > > > working to do COMPUTE in a much improved way. But I still feel a > > > need > > > to > > > understand what's going on! > > > > Can you share a COBOL testcase that shows what you observe above? > > Maybe two of them, so I can see how one does the "repeats"? > > > > It's always a bug in the middle-end unless the size of the initial > > GIMPLE > > code also grows squared. > > The program is simple enough: > ~ > program-id. prog. > data division. > working-storage section. > 01 aaa pic 999. > 01 bbb pic 999. > 01 ddd pic 999. > procedure division. > compute ddd = aaa + bbb > *> Repeat as necessary > compute ddd = aaa + bbb > goback. > ~ > > > > With 500 repeats of "compute ddd = aaa + bbb", -fdump-tree-gimple > produced a > file of 2.4Mbytes bytes. 1,000 repeats gives a file of 4.9Mbytes > > The respective -ftime-reports start off with > > Time variable wall GGC > phase parsing : 0.05 ( 3%) 4838k ( 8%) > phase opt and generate : 1.70 ( 97%) 53M ( 92%) > ... > thread pro- & epilogue : 0.91 ( 52%) 4120 ( 0%) > > and > > Time variable wall GGC > phase parsing : 0.07 ( 1%) 9523k ( 8%) > phase opt and generate : 6.25 ( 99%) 107M ( 92%) > ... > thread pro- & epilogue : 4.38 ( 71%) 4120 ( 0%) > > Doubling it again to 2,000 COMPUTE DDD = AAA + BBB produces a > 9.8Mbyte > gimple file, with these timings > > Time variable wall GGC > phase parsing : 0.14 ( 0%) 18M ( 8%) > phase opt and generate : 31.55 ( 99%) 213M ( 92%) > ... > thread pro- & epilogue : 27.55 ( 87%) 4120 ( 0%) > >
Re: COBOL: Hoping for insight with middle-end computation time.
On Fri, 2026-03-20 at 10:55 -0500, Robert Dubner wrote: > > > > -Original Message- > > From: David Malcolm > > Sent: Friday, March 20, 2026 10:10 > > To: Robert Dubner ; [email protected] > > Subject: Re: COBOL: Hoping for insight with middle-end computation > > time. > > > > On Thu, 2026-03-19 at 18:22 -0500, Robert Dubner wrote: > > > It happens that COBOL has the COMPUTE statement. It takes the > > > form > > > of, > > > for example, COMPUTE DDD = AAA + BBB. > > > > > > We implement that by creating a temporary variable, using that as > > > the > > > target of an addition of AAA and BBB, and then doing an > > > assignment to > > > DDD. > > > (Recall that COBOL variables can be quite complex, so we are a > > > long > > > way > > > from being able to do this with an ADD_EXPR.) > > > > > > We have determined that the way I've been producing GENERIC for > > > that > > > results in N-squared computation time somewhere in the middle > > > end. > > > > > > I have been tearing my hair out trying to figure out what's > > > causing > > > that > > > N-squared behavior. I commented away the assignment, and I got > > > rid > > > of the > > > arithmetic. All that's left is the creation of the temporary, > > > and > > > some IF > > > statements that are generated to test for errors along the way of > > > the > > > computation. (COBOL has a very rich error-detection and > > > exception-generating facility. > > > > > > The remaining GIMPLE for a single iteration (as shown by > > > -fdump-tree-gimple) is shown below. The "phase opt and generate" > > > times > > > for repetitions of that GIMPLE are shown here: > > > > > > phase opt Factor > > > Repeats & generate > > > 1,000 0.17 > > > 2,000 0.49 2.9 > > > 4,000 1.55 3.2 > > > 8,000 7.56 4.9 > > > 16,000 49.31 6.5 > > > 32,000 281.29 5.7 > > > > > > I have been struggling with this for days. Is there an > > > explanation > > > for > > > why the following GIMPLE is resulting in that N-squared behavior? > > > > Hi Bob. You cite the overall "phase opt and generate" times, but > > no > > data on how this is spread across the various optimization passes. > > > > What's the output of -ftime-report on your workload? > > > > In particular, are there any specific passes that are responsible > > for > > the growth in time (and thus where we can pinpoint a bug), or is > > the > > time evenly distributed across all of them? > > > > Sorry if this is a silly question > > Dave > > The only silly thing is that in an egregious display of monumental > arrogance > and ignorance, some years back I decided to take on the problem of > code > generation for a new front end without having had any prior > experience with > GCC internals or compiler theory, and without access to anybody who > actually > knew anything. > > I hope you've seen my response to Richard. For a compilation with > > phase opt and generate : 14.95 ( 92%) 259M ( 88%) > > the next big component is > > thread pro- & epilogue : 10.45 ( 64%) 4104 ( 0%) > > What that means is yet one more mystery to me. 64% of the wallclock time is being accounted to this timing item. Looking in timevar.def, I see that this timing item is: DEFTIMEVAR (TV_THREAD_PROLOGUE_AND_EPILOGUE, "thread pro- & epilogue") Grepping for TV_THREAD_PROLOGUE_AND_EPILOGUE, I see two passes in function.cc that account their time to this timevar: pass_thread_prologue_and_epilogue and pass_late_thread_prologue_and_epilogue. Both of these passes ultimately call the function rest_of_handle_thread_prologue_and_epilogue. So the slowdown is presumably somewhere inside there. Dave > > Here's the entire output of -ftime-report-details: > > Time variable wall GGC > phase setup : 0.01 ( 0%) 150k ( 0%) > phase parsing : 1.33 ( 8%) 36M ( 12%) > phase opt and generate : 14.95 ( 92%) 259M ( 88%) > phase last asm : 0.02 ( 0%) 377k ( 0%) > phase finalize : 0.01 ( 0%) 0 ( 0%)
RE: COBOL: Hoping for insight with middle-end computation time.
> -Original Message- > From: David Malcolm > Sent: Friday, March 20, 2026 10:10 > To: Robert Dubner ; [email protected] > Subject: Re: COBOL: Hoping for insight with middle-end computation time. > > On Thu, 2026-03-19 at 18:22 -0500, Robert Dubner wrote: > > It happens that COBOL has the COMPUTE statement. It takes the form > > of, > > for example, COMPUTE DDD = AAA + BBB. > > > > We implement that by creating a temporary variable, using that as the > > target of an addition of AAA and BBB, and then doing an assignment to > > DDD. > > (Recall that COBOL variables can be quite complex, so we are a long > > way > > from being able to do this with an ADD_EXPR.) > > > > We have determined that the way I've been producing GENERIC for that > > results in N-squared computation time somewhere in the middle end. > > > > I have been tearing my hair out trying to figure out what's causing > > that > > N-squared behavior. I commented away the assignment, and I got rid > > of the > > arithmetic. All that's left is the creation of the temporary, and > > some IF > > statements that are generated to test for errors along the way of the > > computation. (COBOL has a very rich error-detection and > > exception-generating facility. > > > > The remaining GIMPLE for a single iteration (as shown by > > -fdump-tree-gimple) is shown below. The "phase opt and generate" > > times > > for repetitions of that GIMPLE are shown here: > > > > phase opt Factor > > Repeats & generate > > 1,000 0.17 > > 2,000 0.49 2.9 > > 4,000 1.55 3.2 > > 8,000 7.56 4.9 > > 16,000 49.31 6.5 > > 32,000 281.29 5.7 > > > > I have been struggling with this for days. Is there an explanation > > for > > why the following GIMPLE is resulting in that N-squared behavior? > > Hi Bob. You cite the overall "phase opt and generate" times, but no > data on how this is spread across the various optimization passes. > > What's the output of -ftime-report on your workload? > > In particular, are there any specific passes that are responsible for > the growth in time (and thus where we can pinpoint a bug), or is the > time evenly distributed across all of them? > > Sorry if this is a silly question > Dave The only silly thing is that in an egregious display of monumental arrogance and ignorance, some years back I decided to take on the problem of code generation for a new front end without having had any prior experience with GCC internals or compiler theory, and without access to anybody who actually knew anything. I hope you've seen my response to Richard. For a compilation with phase opt and generate : 14.95 ( 92%) 259M ( 88%) the next big component is thread pro- & epilogue : 10.45 ( 64%) 4104 ( 0%) What that means is yet one more mystery to me. Here's the entire output of -ftime-report-details: Time variable wall GGC phase setup: 0.01 ( 0%) 150k ( 0%) phase parsing : 1.33 ( 8%)36M ( 12%) phase opt and generate : 14.95 ( 92%) 259M ( 88%) phase last asm : 0.02 ( 0%) 377k ( 0%) phase finalize : 0.01 ( 0%) 0 ( 0%) garbage collection : 0.09 ( 1%) 0 ( 0%) callgraph construction : 0.13 ( 1%)12M ( 4%) `- CFG verifier: 0.02 ( 0%) 0 ( 0%) `- tree STMT verifier : 0.16 ( 1%) 0 ( 0%) `- symout : 0.01 ( 0%)10M ( 4%) `- tree SSA verifier : 0.04 ( 0%) 0 ( 0%) `- garbage collection : 0.01 ( 0%) 0 ( 0%) callgraph optimization : 0.04 ( 0%) 0 ( 0%) `- dominance computation : 0.01 ( 0%) 0 ( 0%) `- CFG verifier: 0.01 ( 0%) 0 ( 0%) `- tree STMT verifier : 0.21 ( 1%) 0 ( 0%) `- tree SSA verifier : 0.06 ( 0%) 0 ( 0%) `- garbage collection : 0.03 ( 0%) 0 ( 0%) callgraph ipa passes : 1.14 ( 7%)35M ( 12%) ipa function summary : 0.00 ( 0%) 1832 ( 0%) `- tree SSA verifier : 0.01 ( 0%) 0 ( 0%) `- tree STMT verifier : 0.02 ( 0%) 0 ( 0%) ipa inlining heuristics: 0.00 ( 0%) 0 ( 0%) `- tree SSA verifier : 0.01 ( 0%) 0 ( 0%) `- tree STMT verifier : 0.02 ( 0%) 0
RE: COBOL: Hoping for insight with middle-end computation time.
> -Original Message- > From: Richard Biener > Sent: Friday, March 20, 2026 03:24 > To: Robert Dubner > Cc: [email protected] > Subject: Re: COBOL: Hoping for insight with middle-end computation time. > > On Fri, Mar 20, 2026 at 12:23 AM Robert Dubner wrote: > > > > It happens that COBOL has the COMPUTE statement. It takes the form of, > > for example, COMPUTE DDD = AAA + BBB. > > > > We implement that by creating a temporary variable, using that as the > > target of an addition of AAA and BBB, and then doing an assignment to > > DDD. > > (Recall that COBOL variables can be quite complex, so we are a long way > > from being able to do this with an ADD_EXPR.) > > > > We have determined that the way I've been producing GENERIC for that > > results in N-squared computation time somewhere in the middle end. > > > > I have been tearing my hair out trying to figure out what's causing that > > N-squared behavior. I commented away the assignment, and I got rid of > > the > > arithmetic. All that's left is the creation of the temporary, and some > > IF > > statements that are generated to test for errors along the way of the > > computation. (COBOL has a very rich error-detection and > > exception-generating facility. > > > > The remaining GIMPLE for a single iteration (as shown by > > -fdump-tree-gimple) is shown below. The "phase opt and generate" times > > for repetitions of that GIMPLE are shown here: > > > > phase opt Factor > > Repeats & generate > > 1,000 0.17 > > 2,000 0.49 2.9 > > 4,000 1.55 3.2 > > 8,000 7.56 4.9 > > 16,000 49.31 6.5 > > 32,000 281.29 5.7 > > > > I have been struggling with this for days. Is there an explanation for > > why the following GIMPLE is resulting in that N-squared behavior? > > > > I agree that the conditionals are a bit convoluted, and Jim and I are > > working to do COMPUTE in a much improved way. But I still feel a need > > to > > understand what's going on! > > Can you share a COBOL testcase that shows what you observe above? > Maybe two of them, so I can see how one does the "repeats"? > > It's always a bug in the middle-end unless the size of the initial GIMPLE > code also grows squared. The program is simple enough: ~ program-id. prog. datadivision. working-storage section. 01 aaa pic 999. 01 bbb pic 999. 01 ddd pic 999. procedure division. compute ddd = aaa + bbb *> Repeat as necessary compute ddd = aaa + bbb goback. ~ With 500 repeats of "compute ddd = aaa + bbb", -fdump-tree-gimple produced a file of 2.4Mbytes bytes. 1,000 repeats gives a file of 4.9Mbytes The respective -ftime-reports start off with Time variable wall GGC phase parsing : 0.05 ( 3%) 4838k ( 8%) phase opt and generate : 1.70 ( 97%)53M ( 92%) ... thread pro- & epilogue : 0.91 ( 52%) 4120 ( 0%) and Time variable wall GGC phase parsing : 0.07 ( 1%) 9523k ( 8%) phase opt and generate : 6.25 ( 99%) 107M ( 92%) ... thread pro- & epilogue : 4.38 ( 71%) 4120 ( 0%) Doubling it again to 2,000 COMPUTE DDD = AAA + BBB produces a 9.8Mbyte gimple file, with these timings Time variable wall GGC phase parsing : 0.14 ( 0%)18M ( 8%) phase opt and generate : 31.55 ( 99%) 213M ( 92%) ... thread pro- & epilogue : 27.55 ( 87%) 4120 ( 0%) That's the full case that got me started investigating. I think you'll find that the gimple from the full compilation is more complicated than necessary. If you apply this patch, you'll find it carves away a lot of generic that is irrelevant to this discussion: diff --git a/gcc/cobol/genapi.cc b/gcc/cobol/genapi.cc index 4f71f9b1152..5f43e759b9f 100644 --- a/gcc/cobol/genapi.cc +++ b/gcc/cobol/genapi.cc @@ -5867,13 +5867,13 @@ parser_assign( size_t nC, cbl_num_result_t *C, TREEPLET tsource; treeplet_fill_source(tsource, sourceref); static bool check_for_error = true; -move_helper(erf, -destref, -sourceref, -tsource, -rounded, -check_for_error, -
Re: COBOL: Hoping for insight with middle-end computation time.
On Thu, 2026-03-19 at 18:22 -0500, Robert Dubner wrote: > It happens that COBOL has the COMPUTE statement. It takes the form > of, > for example, COMPUTE DDD = AAA + BBB. > > We implement that by creating a temporary variable, using that as the > target of an addition of AAA and BBB, and then doing an assignment to > DDD. > (Recall that COBOL variables can be quite complex, so we are a long > way > from being able to do this with an ADD_EXPR.) > > We have determined that the way I've been producing GENERIC for that > results in N-squared computation time somewhere in the middle end. > > I have been tearing my hair out trying to figure out what's causing > that > N-squared behavior. I commented away the assignment, and I got rid > of the > arithmetic. All that's left is the creation of the temporary, and > some IF > statements that are generated to test for errors along the way of the > computation. (COBOL has a very rich error-detection and > exception-generating facility. > > The remaining GIMPLE for a single iteration (as shown by > -fdump-tree-gimple) is shown below. The "phase opt and generate" > times > for repetitions of that GIMPLE are shown here: > > phase opt Factor > Repeats & generate > 1,000 0.17 > 2,000 0.49 2.9 > 4,000 1.55 3.2 > 8,000 7.56 4.9 > 16,000 49.31 6.5 > 32,000 281.29 5.7 > > I have been struggling with this for days. Is there an explanation > for > why the following GIMPLE is resulting in that N-squared behavior? Hi Bob. You cite the overall "phase opt and generate" times, but no data on how this is spread across the various optimization passes. What's the output of -ftime-report on your workload? In particular, are there any specific passes that are responsible for the growth in time (and thus where we can pinpoint a bug), or is the time evenly distributed across all of them? Sorry if this is a silly question Dave > > I agree that the conditionals are a bit convoluted, and Jim and I are > working to do COMPUTE in a much improved way. But I still feel a > need to > understand what's going on! > > Thanks so much for any insight into what might be happening here. > > _intermediate__stack501_503.0.0.data = &_stack501_data_517.0; > _intermediate__stack501_503.0.0.capacity = 16; > _intermediate__stack501_503.0.0.allocated = 16; > _intermediate__stack501_503.0.0.offset = 0; > _intermediate__stack501_503.0.0.name = &"_stack501"[0]; > _intermediate__stack501_503.0.0.picture = &""[0]; > _intermediate__stack501_503.0.0.initial = 0B; > _intermediate__stack501_503.0.0.parent = 0B; > _intermediate__stack501_503.0.0.occurs_lower = 0; > _intermediate__stack501_503.0.0.occurs_upper = 0; > _intermediate__stack501_503.0.0.attr = 4160; > _intermediate__stack501_503.0.0.type = 6; > _intermediate__stack501_503.0.0.level = 0; > _intermediate__stack501_503.0.0.digits = 37; > _intermediate__stack501_503.0.0.rdigits = 0; > _intermediate__stack501_503.0.0.encoding = 1; > _intermediate__stack501_503.0.0.alphabet = 0; > D.2812 = 0; > D.2813 = 0; > _1013 = D.2812 & 18; > if (_1013 != 0) goto ; else goto ; > : > goto ; > : > D.2814 = 0; > ..pa_erf.1519_1014 = ..pa_erf; > D.2813 = D.2813 | ..pa_erf.1519_1014; > if (D.2813 != 0) goto ; else goto ; > : > goto ; > : > : > : > >
Re: COBOL: Hoping for insight with middle-end computation time.
On Fri, Mar 20, 2026 at 12:23 AM Robert Dubner wrote: > > It happens that COBOL has the COMPUTE statement. It takes the form of, > for example, COMPUTE DDD = AAA + BBB. > > We implement that by creating a temporary variable, using that as the > target of an addition of AAA and BBB, and then doing an assignment to DDD. > (Recall that COBOL variables can be quite complex, so we are a long way > from being able to do this with an ADD_EXPR.) > > We have determined that the way I've been producing GENERIC for that > results in N-squared computation time somewhere in the middle end. > > I have been tearing my hair out trying to figure out what's causing that > N-squared behavior. I commented away the assignment, and I got rid of the > arithmetic. All that's left is the creation of the temporary, and some IF > statements that are generated to test for errors along the way of the > computation. (COBOL has a very rich error-detection and > exception-generating facility. > > The remaining GIMPLE for a single iteration (as shown by > -fdump-tree-gimple) is shown below. The "phase opt and generate" times > for repetitions of that GIMPLE are shown here: > > phase opt Factor > Repeats & generate > 1,000 0.17 > 2,000 0.49 2.9 > 4,000 1.55 3.2 > 8,000 7.56 4.9 > 16,000 49.31 6.5 > 32,000 281.29 5.7 > > I have been struggling with this for days. Is there an explanation for > why the following GIMPLE is resulting in that N-squared behavior? > > I agree that the conditionals are a bit convoluted, and Jim and I are > working to do COMPUTE in a much improved way. But I still feel a need to > understand what's going on! Can you share a COBOL testcase that shows what you observe above? Maybe two of them, so I can see how one does the "repeats"? It's always a bug in the middle-end unless the size of the initial GIMPLE code also grows squared. > > Thanks so much for any insight into what might be happening here. > > _intermediate__stack501_503.0.0.data = &_stack501_data_517.0; > _intermediate__stack501_503.0.0.capacity = 16; > _intermediate__stack501_503.0.0.allocated = 16; > _intermediate__stack501_503.0.0.offset = 0; > _intermediate__stack501_503.0.0.name = &"_stack501"[0]; > _intermediate__stack501_503.0.0.picture = &""[0]; > _intermediate__stack501_503.0.0.initial = 0B; > _intermediate__stack501_503.0.0.parent = 0B; > _intermediate__stack501_503.0.0.occurs_lower = 0; > _intermediate__stack501_503.0.0.occurs_upper = 0; > _intermediate__stack501_503.0.0.attr = 4160; > _intermediate__stack501_503.0.0.type = 6; > _intermediate__stack501_503.0.0.level = 0; > _intermediate__stack501_503.0.0.digits = 37; > _intermediate__stack501_503.0.0.rdigits = 0; > _intermediate__stack501_503.0.0.encoding = 1; > _intermediate__stack501_503.0.0.alphabet = 0; > D.2812 = 0; > D.2813 = 0; > _1013 = D.2812 & 18; > if (_1013 != 0) goto ; else goto ; > : > goto ; > : > D.2814 = 0; > ..pa_erf.1519_1014 = ..pa_erf; > D.2813 = D.2813 | ..pa_erf.1519_1014; > if (D.2813 != 0) goto ; else goto ; > : > goto ; > : > : > : > >
