Re: [PATCH] Cygwin: Speed up mkimport

2020-12-16 Thread Jon Turney

On 26/11/2020 09:56, Mark Geisert wrote:
> Cut mkimport elapsed time in half by forking each iteration of the two
> time-consuming loops within.  Only do this if more than one CPU is
> present.  In the second loop, combine the two 'objdump' calls into one
> system() invocation to avoid a system() invocation per iteration.

Nice.  Thanks for looking into this.

> @@ -86,8 +94,18 @@ for my $f (keys %text) {
>      if (!$text{$f}) {
>          unlink $f;
>      } else {
> -        system $objcopy, '-R', '.text', $f and exit 1;
> -        system $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
> +        if ($forking && fork) {
> +            # Testing shows parent does need to sleep a short time here,
> +            # otherwise system is inundated with hundreds of objcopy processes
> +            # and the forked perl processes that launched them.
> +            my $delay = 0.01; # NOTE: Slower systems may need to raise this
> +            select(undef, undef, undef, $delay); # Supports fractional seconds
> +        } else {
> +            # Do two objcopy calls at once to avoid one system() call overhead
> +            system '(', $objcopy, '-R', '.text', $f, ')', '||',
> +                $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
> +            exit 0 if $forking;
> +        }
>      }
>  }

Hmm... not so sure about this.  This seems racy, as nothing ensures that
these objcopies have finished before we combine all the produced .o
files into a library.

I'm pretty sure with more understanding, this whole thing could be done
better:  For example, from a brief look, it seems that the t-*.o files
are produced by gas, and then we remove .bss and .data sections.  Could
we not arrange to assemble these objects without those sections in the
first place?


Re: [PATCH] Cygwin: Speed up mkimport

2020-11-28 Thread Achim Gratz
Achim Gratz writes:
> That actually works, but the speedup is quite modest on my system
> (4C/8T) even though I've allowed it to use unlimited resources.  So
> forking basically takes longer than each of the invocations runs.
> Some more speedup can be had if the assembler is run on actual files in
> the same way, but the best I've come up with goes from 93s to 47s and
> runs at 150% CPU (up from 85%).  Most of that time is spent in system,
> so forking and I/O.

Not that I really know what I'm doing, but creating a single .s file and
running 'as' just once gets mkimport down to 21s / 110% CPU.  Now the
resulting library doesn't actually link, because somehow the information
ends up in the wrong place…


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Wavetables for the Terratec KOMPLEXER:
http://Synth.Stromeko.net/Downloads.html#KomplexerWaves


Re: [PATCH] Cygwin: Speed up mkimport

2020-11-28 Thread Achim Gratz
Achim Gratz writes:
> b) Open up two pipes to an "xargs -P $ncpu/2 -L 1 …" and feed in the file
> names.

That actually works, but the speedup is quite modest on my system
(4C/8T) even though I've allowed it to use unlimited resources.  So
forking basically takes longer than each of the invocations runs.
Some more speedup can be had if the assembler is run on actual files in
the same way, but the best I've come up with goes from 93s to 47s and
runs at 150% CPU (up from 85%).  Most of that time is spent in system,
so forking and I/O.


Regards,
Achim.


Re: [PATCH] Cygwin: Speed up mkimport

2020-11-27 Thread Brian Inglis

On 2020-11-27 11:37, Achim Gratz wrote:
> Mark Geisert writes:
>> Still faster than two system commands :-).  But thanks for the
>> comment;
>
> It still seems you are barking up the wrong tree.
>
>> I thought I was merely grouping args, to get around Perl's
>> greedy arg list building for the system command.
>
> Wot?  It just takes a list which you can build any which way you desire.
> The other option is to give it the full command line in a string, which
> does work for this script (but not on Windows).  If it finds shell
> metacharacters in the arguments it'll run a shell, otherwise the forked
> perl just does an execve.
>
> If it's really the forking that is causing the slowdown, why not do
> either of those things:
>
> a) Generate a complete shell script and fork once to run that.
>
> b) Open up two pipes to an "xargs -P $ncpu/2 -L 1 …" and feed in the file
> names.
>
> Getting the error codes back to the script and handling the error is
> left as an exercise for the reader.


Use explicit binary paths to avoid path search overhead; for portability: /bin/ 
for base system, dir, file, and net utils including compressors, grep, and sed; 
/usr/bin/ otherwise; {/usr,}/sbin/ for some admin utils not elsewhere.


--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]


Re: [PATCH] Cygwin: Speed up mkimport

2020-11-27 Thread Achim Gratz
Mark Geisert writes:
> Still faster than two system commands :-).  But thanks for the
> comment;

It still seems you are barking up the wrong tree.

> I thought I was merely grouping args, to get around Perl's
> greedy arg list building for the system command.

Wot?  It just takes a list which you can build any which way you desire.
The other option is to give it the full command line in a string, which
does work for this script (but not on Windows).  If it finds shell
metacharacters in the arguments it'll run a shell, otherwise the forked
perl just does an execve.
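
To make the two calling forms concrete, here is a small sketch based on how perlfunc documents system: a list with more than one element is executed directly, with every further element passed as a literal argument, while a single string is scanned for shell metacharacters and handed to the shell if any are found. The helper name stdout_of_system and the echo commands are illustrative assumptions, not part of mkimport.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

# Hypothetical helper: run system() with the given arguments and return
# whatever the command wrote to stdout.  STDOUT is pointed at a temp
# file for the duration of the call and restored afterwards.
sub stdout_of_system {
    my @args = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    STDOUT->flush;
    open(my $saved, '>&', \*STDOUT) or die "dup STDOUT: $!";
    open(STDOUT, '>&', $tmp)        or die "redirect STDOUT: $!";
    system @args;
    open(STDOUT, '>&', $saved)      or die "restore STDOUT: $!";
    close $tmp;
    open(my $in, '<', $path) or die "read $path: $!";
    local $/;                       # slurp the whole file
    my $out = <$in>;
    return defined $out ? $out : '';
}

# List form: 'echo' is executed directly and '||' is just another argument.
my $list_form = stdout_of_system('echo', 'one', '||', 'echo', 'two');

# String form: the '||' is a metacharacter, so a shell interprets the line.
my $string_form = stdout_of_system('echo one || echo two');

print "list form:   $list_form";
print "string form: $string_form";
```

In the list form the '||' reaches echo as plain text, so the output is "one || echo two"; in the string form the shell runs only the first echo, because it succeeds.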

If it's really the forking that is causing the slowdown, why not do
either of those things:

a) Generate a complete shell script and fork once to run that.

b) Open up two pipes to an "xargs -P $ncpu/2 -L 1 …" and feed in the file
names.

Getting the error codes back to the script and handling the error is
left as an exercise for the reader.
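
Option b) could look roughly like the sketch below, which shows a single pipe (the suggestion above would open one per objcopy invocation). It assumes an xargs that supports -P, which GNU and BSD xargs do but POSIX does not require; the function name run_parallel and the use of 'true' as a stand-in for the real command are illustrative assumptions only.

```perl
use strict;
use warnings;

# Hypothetical helper: feed file names, one per line, to
# "xargs -P JOBS -L 1 CMD...", so xargs runs up to JOBS copies of CMD
# in parallel with one file name each.  Returns 1 on success, 0 otherwise.
sub run_parallel {
    my ($jobs, $cmd, @files) = @_;
    # List-form pipe open: xargs is started directly, without a shell.
    open(my $xargs, '|-', 'xargs', '-P', $jobs, '-L', '1', @$cmd)
        or die "cannot start xargs: $!";
    print {$xargs} "$_\n" for @files;
    # close() waits for xargs to finish and returns false if it exited
    # with a non-zero status; that status is then available in $?.
    return close($xargs) ? 1 : 0;
}

my @files = qw(a.o b.o c.o);
my $ok = run_parallel(2, ['true'], @files);
print $ok ? "all commands succeeded\n" : "xargs reported failure: $?\n";
```

This only reports whether any invocation failed, not which one, so per-file error handling would still need more work. File names containing whitespace would also need different handling.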


Regards,
Achim.


Re: [PATCH] Cygwin: Speed up mkimport

2020-11-27 Thread Mark Geisert

Jon Turney wrote:
> On 26/11/2020 09:56, Mark Geisert wrote:
>> @@ -86,8 +94,18 @@ for my $f (keys %text) {
>>      if (!$text{$f}) {
>>          unlink $f;
>>      } else {
>> -        system $objcopy, '-R', '.text', $f and exit 1;
>> -        system $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
>> +        if ($forking && fork) {
>> +            # Testing shows parent does need to sleep a short time here,
>> +            # otherwise system is inundated with hundreds of objcopy processes
>> +            # and the forked perl processes that launched them.
>> +            my $delay = 0.01; # NOTE: Slower systems may need to raise this
>> +            select(undef, undef, undef, $delay); # Supports fractional seconds
>> +        } else {
>> +            # Do two objcopy calls at once to avoid one system() call overhead
>> +            system '(', $objcopy, '-R', '.text', $f, ')', '||',
>> +                $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
>> +            exit 0 if $forking;
>> +        }
>>      }
>>  }
>
> Hmm... not so sure about this.  This seems racy, as nothing ensures that these
> objcopies have finished before we combine all the produced .o files into a library.


Good point.  I've added a hash to track the forked pids, and after each of these 
two time-consuming loops finishes I loop over the pids list doing waitpid() on 
each pid.
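
A minimal sketch of that pattern, not the code from the revised patch: each child's pid is recorded in a hash, and once the loop is done the parent calls waitpid() on every recorded pid, so nothing proceeds until all children have exited. The names process_file and run_forked are hypothetical, and process_file only pretends to do the objcopy work.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the per-file work (the objcopy calls in
# mkimport).  Returns an exit status: non-zero for names containing "bad".
sub process_file {
    my ($f) = @_;
    return $f =~ /bad/ ? 1 : 0;
}

# Fork one child per file, remember every pid, then reap them all.
# Returns the number of children that exited with a non-zero status.
sub run_forked {
    my @files = @_;
    my %pids;
    for my $f (@files) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: do the work and exit with its status.
            exit process_file($f);
        }
        # Parent: record which file this child is handling.
        $pids{$pid} = $f;
    }
    my $failed = 0;
    for my $pid (keys %pids) {
        waitpid($pid, 0);          # blocks until this child has exited
        $failed++ if $? != 0;
    }
    return $failed;
}

my $failed = run_forked(qw(a.o b.o bad.o));
print "failed children: $failed\n";
```

Checking $? after each waitpid() also gives the parent a place to notice a failed child, which the original "and exit 1" in the child could not propagate on its own.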


> I'm pretty sure with more understanding, this whole thing could be done better:
> For example, from a brief look, it seems that the t-*.o files are produced by gas,
> and then we remove .bss and .data sections.  Could we not arrange to assemble
> these objects without those sections in the first place?


I looked over as's options in its man page but could not see anything obvious.  I 
wonder if defining the sections explicitly as zero-length somehow in mkimport's 
assembler snippets would accomplish the same thing.  I'll try this next.


Note that mkimport operates both on those tiny object files it creates with as, 
but also on the object files created by the whole Cygwin build.  So adjusting the 
latter object files would need to be done somewhere else.

Thanks,

..mark



Re: [PATCH] Cygwin: Speed up mkimport

2020-11-27 Thread Mark Geisert

Achim Gratz wrote:
> Mark Geisert writes:
>> +   # Do two objcopy calls at once to avoid one system() call overhead
>> +   system '(', $objcopy, '-R', '.text', $f, ')', '||',
>> +   $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
>
> That doesn't do what you think it does.  It in fact increases the
> overhead since it'll start a shell that runs those two commands and
> will even needlessly start the first objcopy in a subshell.

Still faster than two system commands :-).  But thanks for the comment; I thought
I was merely grouping args, to get around Perl's greedy arg list building for the
system command.  After more experimenting I ended up with:

    system '/bin/true', '||', $objcopy, '-R', '.text', $f, '||',
        $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;

Kind of ugly, but better?  It obviates the need for parent to pace itself so the
enclosing loop runs a bit faster.


..mark


Re: [PATCH] Cygwin: Speed up mkimport

2020-11-26 Thread Achim Gratz
Mark Geisert writes:
> + # Do two objcopy calls at once to avoid one system() call overhead
> + system '(', $objcopy, '-R', '.text', $f, ')', '||',
> + $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;

That doesn't do what you think it does.  It in fact increases the
overhead since it'll start a shell that runs those two commands and
will even needlessly start the first objcopy in a subshell.


Regards,
Achim.


Re: [PATCH] Cygwin: Speed up mkimport

2020-11-26 Thread Mark Geisert

Previously, Mark Geisert wrote:
> Cut mkimport elapsed time in half by forking each iteration of the two
> time-consuming loops within.  Only do this if more than one CPU is
> present.  In the second loop, combine the two 'objdump' calls into one
                                                 ^^^^^^^
That should say objcopy.  The code is correct though.

..mark