Re: Parallelization of shell scripts for 'configure' etc.

2022-06-13 Thread Alex Ameen
You can try to use the `requires` toposort routine to identify "Strongly
Connected Sub-Components", which is where I imagine you'll get the
best results. What you'll need to watch out for is undeclared ordering
requirements that parallelism would break.
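
As a minimal sketch of the kind of hidden edge I mean (the MY_* macro
names and `libfoo` are hypothetical, not from any real package):

    # One check records its result in an ordinary shell variable.
    AC_DEFUN([MY_PROBE_LIBFOO],
      [AC_CHECK_LIB([foo], [foo_init],
         [my_have_libfoo=yes], [my_have_libfoo=no])])

    # This macro silently reads $my_have_libfoo but never declares the
    # dependency, so the `requires` toposort can't see the edge and a
    # parallel scheduler is free to run it first or concurrently.
    AC_DEFUN([MY_USE_LIBFOO_BAD],
      [AS_IF([test "$my_have_libfoo" = yes],
         [AC_DEFINE([HAVE_LIBFOO], [1], [Define if libfoo works.])])])

    # Adding AC_REQUIRE makes the ordering explicit, so both checks
    # land in the same ordered group and stay serialized.
    AC_DEFUN([MY_USE_LIBFOO_GOOD],
      [AC_REQUIRE([MY_PROBE_LIBFOO])
       AS_IF([test "$my_have_libfoo" = yes],
         [AC_DEFINE([HAVE_LIBFOO], [1], [Define if libfoo works.])])])

Only checks whose requirements are all declared like that can be
regrouped safely; anything else has to stay in source order.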

The `m4sh` and `m4sugar` source code is documented in a lot of detail. The
manuals exclude that type of documentation because it's internal, but you
could keep yourself occupied for at least a month or two before you ran out
of topics to explore.

On Mon, Jun 13, 2022, 8:45 PM Dale R. Worley  wrote:

> Paul Eggert  writes:
> > In many GNU projects, the 'configure' script is the biggest barrier to
> > building because it takes so long to run. Is there some way that we
> > could improve its performance without completely reengineering it, by
> > improving Bash so that it can parallelize 'configure' scripts?
>
> It seems to me that bash provides the needed tools -- "( ... ) &" lets
> you run things in parallel.  Similarly, if you've got a lot of small
> tasks with a complex web of dependencies, you can encode that in a
> "makefile".
>
> It seems to me that the heavy work is rebuilding how "configure" scripts
> are constructed based on which items can be run in parallel.  I've never
> seen any "metadocumentation" that laid out how all that worked.
>
> Dale
>
>


Re: Parallelization of shell scripts for 'configure' etc.

2022-06-13 Thread Alex Ameen
Yeah honestly splitting most of the `configure` checks into multiple
threads is definitely possible.

Caching between projects is even a straightforward extension with systems
like `Nix`.

The "gotcha" here in both cases is that existing scripts that are living in
source tarballs are not feasible to "regenerate" in the general case. You
could have this ship out with future projects though if project authors
updated to new versions of Autoconf.


If you have a particularly slow package, you can optimize it in a few
hours. Largely this means identifying which tests match the standard
implementation of a check 100%, in which case you can fill in a cached
value. But what I think y'all are asking about is "can I safely use a cache
from one project in another project?", and the answer there is "no, not
really - and please don't, because it will be a nightmare to debug".
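
Within a single project, the supported way to "fill in a cached value" is
Autoconf's own cache machinery. A minimal sketch (the site file name is
made up; the two ac_cv_* names are real Autoconf cache variables, but which
values are safe to pin is package-specific):

    # Preseed results you have verified by hand so configure skips
    # those probes entirely.
    printf '%s\n' \
      'ac_cv_c_bigendian=no' \
      'ac_cv_func_malloc_0_nonnull=yes' > my-config.site

    # configure sources the file named by CONFIG_SITE before running
    # its checks; --cache-file keeps the remaining results around for
    # later re-runs of the same package.
    CONFIG_SITE=$PWD/my-config.site ./configure --cache-file="$PWD/config.cache"

That stays safe because the preseeded values never leave the package they
were verified against.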

The nasty part about trying to naively share caches is that it will
probably work fine ~90% of the time. The problem is that the ~10% that
misbehave are at high risk of undefined behavior. My concern is the 0.5%
that appear to work fine, but "whoops, we didn't know project X extended a
macro without changing the name - and now an ABI conflict in `gpg` appears
on the third Sunday of every October, causing it to skip encryption
silently" or some absurd edge case.


I think optimizing "freshly generated" scripts is totally doable, though.

On Mon, Jun 13, 2022, 5:40 PM Paul Eggert  wrote:

> In many GNU projects, the 'configure' script is the biggest barrier to
> building because it takes so long to run. Is there some way that we
> could improve its performance without completely reengineering it, by
> improving Bash so that it can parallelize 'configure' scripts?
>
> For ideas about this, please see PaSh-JIT:
>
> Kallas K, Mustafa T, Bielak J, Karnikis D, Dang THY, Greenberg M,
> Vasilakis N. Practically correct, just-in-time shell script
> parallelization. Proc OSDI 22. July 2022.
> https://nikos.vasilak.is/p/pash:osdi:2022.pdf
>
> I've wanted something like this for *years* (I assigned a simpler
> version to my undergraduates but of course it was too much to expect
> them to implement it) and I hope some sort of parallelization like this
> can get into production with Bash at some point (or some other shell if
> Bash can't use this idea).
>
>


Re: Parallelization of shell scripts for 'configure' etc.

2022-06-13 Thread Paul Eggert

On 6/13/22 18:25, Dale R. Worley wrote:

It seems to me that bash provides the needed tools -- "( ... ) &" lets
you run things in parallel.  Similarly, if you've got a lot of small
tasks with a complex web of dependencies, you can encode that in a
"makefile".

It seems to me that the heavy work is rebuilding how "configure" scripts
are constructed based on which items can be run in parallel.


Yes, all that could be done in theory, but it'd take a lot of hacking,
and after decades it still hasn't happened.


I'd rather have shell scripts "just work" in parallel with a minimum of 
fuss.





Re: Parallelization of shell scripts for 'configure' etc.

2022-06-13 Thread Dale R. Worley
Paul Eggert  writes:
> In many GNU projects, the 'configure' script is the biggest barrier to
> building because it takes so long to run. Is there some way that we
> could improve its performance without completely reengineering it, by 
> improving Bash so that it can parallelize 'configure' scripts?

It seems to me that bash provides the needed tools -- "( ... ) &" lets
you run things in parallel.  Similarly, if you've got a lot of small
tasks with a complex web of dependencies, you can encode that in a
"makefile".

It seems to me that the heavy work is rebuilding how "configure" scripts
are constructed based on which items can be run in parallel.  I've never
seen any "metadocumentation" that laid out how all that worked.

Dale



Parallelization of shell scripts for 'configure' etc.

2022-06-13 Thread Paul Eggert
In many GNU projects, the 'configure' script is the biggest barrier to
building because it takes so long to run. Is there some way that we
could improve its performance without completely reengineering it, by 
improving Bash so that it can parallelize 'configure' scripts?


For ideas about this, please see PaSh-JIT:

Kallas K, Mustafa T, Bielak J, Karnikis D, Dang THY, Greenberg M, 
Vasilakis N. Practically correct, just-in-time shell script 
parallelization. Proc OSDI 22. July 2022. 
https://nikos.vasilak.is/p/pash:osdi:2022.pdf


I've wanted something like this for *years* (I assigned a simpler 
version to my undergraduates but of course it was too much to expect 
them to implement it) and I hope some sort of parallelization like this 
can get into production with Bash at some point (or some other shell if 
Bash can't use this idea).