Re: [PATCH] Cygwin: Speed up mkimport

2020-12-16 Thread Jon Turney

On 26/11/2020 09:56, Mark Geisert wrote:

Cut mkimport elapsed time in half by forking each iteration of the two
time-consuming loops within.  Only do this if more than one CPU is
present.  In the second loop, combine the two 'objdump' calls into one
system() invocation to avoid a system() invocation per iteration.


Nice.  Thanks for looking into this.


@@ -86,8 +94,18 @@ for my $f (keys %text) {
  if (!$text{$f}) {
unlink $f;
  } else {
-   system $objcopy, '-R', '.text', $f and exit 1;
-   system $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
+   if ($forking && fork) {
+   # Testing shows parent does need to sleep a short time here,
+   # otherwise system is inundated with hundreds of objcopy processes
+   # and the forked perl processes that launched them.
+   my $delay = 0.01; # NOTE: Slower systems may need to raise this
+   select(undef, undef, undef, $delay); # Supports fractional seconds
+   } else {
+   # Do two objcopy calls at once to avoid one system() call overhead
+   system '(', $objcopy, '-R', '.text', $f, ')', '||',
+   $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
+   exit 0 if $forking;
+   }
  }
  }
  


Hmm... not so sure about this.  This seems racy, as nothing ensures that 
these objcopies have finished before we combine all the produced .o 
files into a library.


I'm pretty sure with more understanding, this whole thing could be done 
better:  For example, from a brief look, it seems that the t-*.o files 
are produced by gas, and then we remove .bss and .data sections.  Could 
we not arrange to assemble these objects without those sections in the 
first place?


Re: [PATCH] Cygwin: Speed up mkimport

2020-11-28 Thread Achim Gratz
Achim Gratz writes:
> That actually works, but the speedup is quite modest on my system
> (4C/8T) even though I've allowed it to use unlimited resources.  So it
> basically forks slower than the runtime for each of the invocations is.
> Some more speedup can be had if the assembler is run on actual files in
> the same way, but the best I've come up with goes from 93s to 47s and
> runs at 150% CPU (up from 85%).  Most of that time is spent in system,
> so forking and I/O.

Not that I really know what I'm doing, but creating a single .s file and
running as just once gets mkimport down to 21s / 110%.  Now the
resulting library doesn't actually link, because somehow the information
ends up in the wrong place…


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Wavetables for the Terratec KOMPLEXER:
http://Synth.Stromeko.net/Downloads.html#KomplexerWaves


Re: [PATCH] Cygwin: Speed up mkimport

2020-11-28 Thread Achim Gratz
Achim Gratz writes:
> b) Open up two pipes to an "xargs -P $ncpu/2 L 1 …" and feed in the file
> names.

That actually works, but the speedup is quite modest on my system
(4C/8T) even though I've allowed it to use unlimited resources.  So it
basically forks slower than the runtime for each of the invocations is.
Some more speedup can be had if the assembler is run on actual files in
the same way, but the best I've come up with goes from 93s to 47s and
runs at 150% CPU (up from 85%).  Most of that time is spent in system,
so forking and I/O.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds


Re: [PATCH] Cygwin: Speed up mkimport

2020-11-27 Thread Brian Inglis

On 2020-11-27 11:37, Achim Gratz wrote:

Mark Geisert writes:

Still faster than two system commands :-).  But thanks for the
comment;


It still seems you are barking up the wrong tree.


I thought I was merely grouping args, to get around Perl's
greedy arg list building for the system command.


Wot?  It just takes a list which you can build any which way you desire.
The other option is to give it the full command line in a string, which
does work for this script (but not on Windows).  If it finds shell
metacharacters in the arguments it'll run a shell, otherwise the forked
perl just does an execve.

If it's really the forking that is causing the slowdown, why not do
either of those things:

a) Generate a complete shell script and fork once to run that.

b) Open up two pipes to an "xargs -P $ncpu/2 L 1 …" and feed in the file
names.

Getting the error codes back to the script and handling the error is
left as an exercise for the reader.


Use explicit binary paths to avoid path search overhead; for portability: /bin/ 
for base system, dir, file, and net utils including compressors, grep, and sed; 
/usr/bin/ otherwise; {/usr,}/sbin/ for some admin utils not elsewhere.


--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]


Re: [PATCH] Cygwin: Speed up mkimport

2020-11-27 Thread Achim Gratz
Mark Geisert writes:
> Still faster than two system commands :-).  But thanks for the
> comment;

It still seems you are barking up the wrong tree.

> I thought I was merely grouping args, to get around Perl's
> greedy arg list building for the system command.

Wot?  It just takes a list which you can build any which way you desire.
The other option is to give it the full command line in a string, which
does work for this script (but not on Windows).  If it finds shell
metacharacters in the arguments it'll run a shell, otherwise the forked
perl just does an execve.
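To make that distinction concrete, here is a minimal sketch (not from mkimport) of the two forms of Perl's system: the list form, which forks and execs the program directly, and the string form, which goes through a shell when metacharacters are present.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# List form: no shell is involved.  Perl forks and execs the program
# directly, so '&&' here is just a literal argument, not an operator.
system('echo', 'A', '&&', 'echo', 'B') == 0
    or die "list form failed: $?";
# prints: A && echo B   (one echo; metacharacters taken literally)

# String form: the metacharacters make Perl hand the line to /bin/sh -c,
# so '&&' really chains two commands.
system('echo A && echo B') == 0
    or die "string form failed: $?";
# prints: A   then   B   (two echos)
```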

If it's really the forking that is causing the slowdown, why not do
either of those things:

a) Generate a complete shell script and fork once to run that.

b) Open up two pipes to an "xargs -P $ncpu/2 L 1 …" and feed in the file
names.

Getting the error codes back to the script and handling the error is
left as an exercise for the reader.
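A rough sketch of option (b), assuming GNU xargs; 'echo' stands in for the real objcopy command line, and the file names are made up:

```sh
#!/bin/sh
# Run up to half the available CPUs' worth of jobs in parallel, one
# file name per invocation (-n 1).  xargs does all the forking, so the
# calling script only pays for writing names down a pipe.
ncpu=$(nproc 2>/dev/null || echo 2)
jobs=$((ncpu / 2))
[ "$jobs" -ge 1 ] || jobs=1

printf '%s\n' one.o two.o three.o |
    xargs -P "$jobs" -n 1 echo objcopy -R .text
```

GNU xargs exits non-zero (123) if any invocation fails, which is one crude way to get at the error handling left as an exercise above.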


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf Q+, Q and microQ:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds


Re: [PATCH] Cygwin: Speed up mkimport

2020-11-27 Thread Mark Geisert

Jon Turney wrote:

On 26/11/2020 09:56, Mark Geisert wrote:

@@ -86,8 +94,18 @@ for my $f (keys %text) {
  if (!$text{$f}) {
  unlink $f;
  } else {
-    system $objcopy, '-R', '.text', $f and exit 1;
-    system $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
+    if ($forking && fork) {
+    # Testing shows parent does need to sleep a short time here,
+    # otherwise system is inundated with hundreds of objcopy processes
+    # and the forked perl processes that launched them.
+    my $delay = 0.01; # NOTE: Slower systems may need to raise this
+    select(undef, undef, undef, $delay); # Supports fractional seconds
+    } else {
+    # Do two objcopy calls at once to avoid one system() call overhead
+    system '(', $objcopy, '-R', '.text', $f, ')', '||',
+    $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
+    exit 0 if $forking;
+    }
  }
  }


Hmm... not so sure about this.  This seems racy, as nothing ensures that these 
objcopies have finished before we combine all the produced .o files into a library.


Good point.  I've added a hash to track the forked pids, and after each of these 
two time-consuming loops finishes I loop over the pids list doing waitpid() on 
each pid.
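That pattern, sketched minimally (the per-file work and file names are stand-ins, not mkimport's actual commands):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %pid2file;                     # forked children still outstanding
for my $f (qw(a.o b.o c.o)) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: stand-in for the objcopy work done on $f.
        exit 0;
    }
    $pid2file{$pid} = $f;         # parent records the child
}

# Barrier after the loop: nothing may bundle the .o files into a
# library until every child has been reaped and checked.
for my $pid (keys %pid2file) {
    waitpid $pid, 0;
    die "work on $pid2file{$pid} failed" if $? != 0;
}
print "all children finished\n";
```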


I'm pretty sure with more understanding, this whole thing could be done better:  
For example, from a brief look, it seems that the t-*.o files are produced by gas, 
and then we remove .bss and .data sections.  Could we not arrange to assemble 
these objects without those sections in the first place?


I looked over as's options in its man page but could not see anything obvious.  I 
wonder if defining the sections explicitly as zero-length somehow in mkimport's 
assembler snippets would accomplish the same thing.  I'll try this next.


Note that mkimport operates both on those tiny object files it creates with as, 
but also on the object files created by the whole Cygwin build.  So adjusting the 
latter object files would need to be done somewhere else.

Thanks,

..mark



Re: [PATCH] Cygwin: Speed up mkimport

2020-11-27 Thread Mark Geisert

Achim Gratz wrote:

Mark Geisert writes:

+   # Do two objcopy calls at once to avoid one system() call overhead
+   system '(', $objcopy, '-R', '.text', $f, ')', '||',
+   $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;


That doesn't do what you think it does.  It in fact increases the
overhead since it'll start a shell that runs those two commands and
will even needlessly start the first objcopy in a subshell.


Still faster than two system commands :-).  But thanks for the comment; I thought 
I was merely grouping args, to get around Perl's greedy arg list building for the 
system command.  After more experimenting I ended up with:

system '/bin/true', '||', $objcopy, '-R', '.text', $f, '||',
$objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;

Kind of ugly, but better?  It obviates the need for the parent to pace itself,
so the enclosing loop runs a bit faster.


..mark


Re: [PATCH] Cygwin: Speed up mkimport

2020-11-26 Thread Achim Gratz
Mark Geisert writes:
> + # Do two objcopy calls at once to avoid one system() call overhead
> + system '(', $objcopy, '-R', '.text', $f, ')', '||',
> + $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;

That doesn't do what you think it does.  It in fact increases the
overhead since it'll start a shell that runs those two commands and
will even needlessly start the first objcopy in a subshell.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf rackAttack:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds


Re: [PATCH] Cygwin: Speed up mkimport

2020-11-26 Thread Mark Geisert

Previously, Mark Geisert wrote:

Cut mkimport elapsed time in half by forking each iteration of the two
time-consuming loops within.  Only do this if more than one CPU is
present.  In the second loop, combine the two 'objdump' calls into one

 ^^^
That should say objcopy.  The code is correct though.

..mark


[PATCH] Cygwin: Speed up mkimport

2020-11-26 Thread Mark Geisert
Cut mkimport elapsed time in half by forking each iteration of the two
time-consuming loops within.  Only do this if more than one CPU is
present.  In the second loop, combine the two 'objdump' calls into one
system() invocation to avoid a system() invocation per iteration.

---
 winsup/cygwin/mkimport | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/winsup/cygwin/mkimport b/winsup/cygwin/mkimport
index 2b08dfe3d..919dc305b 100755
--- a/winsup/cygwin/mkimport
+++ b/winsup/cygwin/mkimport
@@ -47,6 +47,9 @@ for my $sym (keys %replace) {
 $import{$fn} = $imp_sym;
 }
 
+my $ncpus = `grep -c ^processor /proc/cpuinfo`;
+my $forking = $ncpus > 1; # Decides if loops below should fork() each iteration
+
 for my $f (keys %text) {
 my $imp_sym = delete $import{$f};
 my $glob_sym = $text{$f};
@@ -56,25 +59,30 @@ for my $f (keys %text) {
$text{$f} = 0;
 } else {
$text{$f} = 1;
-   open my $as_fd, '|-', $as, '-o', "$dir/t-$f", "-";
-   if ($is64bit) {
-   print $as_fd <

[PATCH 3/3] Cygwin: Speed up dumper

2020-07-21 Thread Jon Turney
Stop after we've written the dump in response to the initial breakpoint
EXCEPTION_DEBUG_EVENT we receive for attaching to the process.

(rather than bogusly sitting there for 20 seconds waiting for more debug
events from a stopped process after we've already written the dump).
---
 winsup/utils/dumper.cc | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/winsup/utils/dumper.cc b/winsup/utils/dumper.cc
index ace752464..e80758e0c 100644
--- a/winsup/utils/dumper.cc
+++ b/winsup/utils/dumper.cc
@@ -615,8 +615,6 @@ out:
 int
 dumper::collect_process_information ()
 {
-  int exception_level = 0;
-
   if (!sane ())
 return 0;
 
@@ -631,7 +629,7 @@ dumper::collect_process_information ()
 
   while (1)
 {
-  if (!WaitForDebugEvent (&current_event, 20000))
+  if (!WaitForDebugEvent (&current_event, INFINITE))
return 0;
 
   deb_printf ("got debug event %d\n", current_event.dwDebugEventCode);
@@ -675,12 +673,6 @@ dumper::collect_process_information ()
 
case EXCEPTION_DEBUG_EVENT:
 
- exception_level++;
- if (exception_level == 2)
-   break;
- else if (exception_level > 2)
-   return 0;
-
  collect_memory_sections ();
 
  /* got all info. time to dump */
@@ -697,6 +689,9 @@ dumper::collect_process_information ()
  goto failed;
};
 
+ /* We're done */
+ goto failed;
+
  break;
 
default:
-- 
2.27.0



Cygwin64 vs. Cygwin (speed)

2015-11-03 Thread Jim Reisert AD1C
In another thread, I wrote (and Corinna replied):

>> I have tried 64-bit Cygwin in the past.  I do a lot of file I/O and
>> sorting/searching on largish test-based data sets, and 64-bit was
>> noticeably slower than 32-bit Cygwin,
>
> Hmm, I usually have the opposite impression...

I'm using malloc/calloc, is there a different memory allocator I
should be using for Cygwin64?

Thanks - Jim

-- 
Jim Reisert AD1C, , http://www.ad1c.us

--
Problem reports:   http://cygwin.com/problems.html
FAQ:   http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple



Re: Cygwin64 vs. Cygwin (speed)

2015-11-03 Thread Corinna Vinschen
On Nov  3 07:59, Jim Reisert AD1C wrote:
> In another thread, I wrote (and Corinna replied):
> 
> >> I have tried 64-bit Cygwin in the past.  I do a lot of file I/O and
> >> sorting/searching on largish test-based data sets, and 64-bit was
> >> noticeably slower than 32-bit Cygwin,
> >
> > Hmm, I usually have the opposite impression...
> 
> I'm using malloc/calloc, is there a different memory allocator I
> should be using for Cygwin64?

There is none.  While Cygwin's malloc is slow in multi-threading
scenarios due to dumb locking (which I really hope to fix at one point),
it shouldn't be any slower on 64 bit compared to 32 bit.


Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat




Re: Cygwin speed difference on multiple cores -vs- single-core?

2010-08-14 Thread mike marchywka
On 8/13/10, Andy Nicholas  wrote:

 The scripts we're using form the basis of a build system to invoke GCC and
 an
 assembler lots of times throughout a directory tree of a few thousand items.

You can end up spending all your time chasing include paths; that isn't
hard to do.

 to use multi-threaded builds. When running each testing method, the CPUs are
 barely loaded at all (10%, maybe) and there's almost no I/O that registers.
Is the disk light on?  That is almost always an indication of blocking
on something.  Check task manager page faults, for example.


 Btw, I don't think the issue is I/O. The disk I'm using is an SSD (OCZ
 Vertex
 2) which is fairly fast. But, the results repeat even if I try a regular
 7200
 RPM hard drive.

I should add this to my manifesto against adjectives, along with the
"fast snail" comparisons.  7200 rpm = 7200/60 = 120 revolutions per
second, so one rotation takes about 8.3 ms; rotational latency and
track-to-track seeks both sit in the millisecond range.  Memory, of
course, is in the nanosecond range, about 1e6 times faster.
The issue is more likely
to be buffering strategies and syncing.



 Yeah, weird.

 andy










-- 
marchy...@gmail.com
note new address 2009-12-16:
Mike Marchywka
1975 Village Round
Marietta GA 30064
415-264-8477 (w)- use this
404-788-1216 (C)- leave message
989-348-4796 (P)- emergency only
marchy...@hotmail.com
Note: If I am asking for free stuff, I normally use for hobby/non-profit
information but may use in investment forums, public and private.
Please indicate any concerns if applicable.
Note: hotmail is censoring incoming mail using random criteria beyond
my control and often hangs my browser
but all my subscriptions are here..., try also marchy...@yahoo.com




Cygwin speed difference on multiple cores -vs- single-core?

2010-08-13 Thread Andy Nicholas
Hi Folks,

When using cygwin, I've noticed that there seems to be a large speed 
difference when I boot my windows 7 (32-bit) machine in single-core mode 
versus the regular number of cores (4, Core i7-930).

I've read through the FAQ and didn't notice anything about this issue.

Normally, I would expect nearly no speed difference based on the Windows 
environment... but after some extensive timing tests it seems like the single-
core machine is usually at least 2x faster than using the same machine setup 
in multi-core mode. I limit the number of cores using MSCONFIG, advanced boot 
options.

We have some simple script and more complex scripts which show this behavior. 
The simple scripts do straightforward things like rm -rf over some directory 
trees. Even the simple scripts run slowly when the PC is booted with multiple 
cores.

Is this known behavior? Is there some way to work around it so I can boot my 
PC, use all the cores with other apps, and continue run cygwin 2x faster?

Thanks much,

andy






Re: Cygwin speed difference on multiple cores -vs- single-core?

2010-08-13 Thread Tim Prince

On 8/13/2010 5:37 PM, Andy Nicholas wrote:

Hi Folks,

When using cygwin, I've noticed that there seems to be a large speed
difference when I boot my windows 7 (32-bit) machine in single-core mode
versus the regular number of cores (4, Core i7-930).

I've read through the FAQ and didn't notice anything about this issue.

Normally, I would expect nearly no speed difference based on the Windows
environment... but after some extensive timing tests it seems like the single-
core machine is usually at least 2x faster than using the same machine setup
in multi-core mode. I limit the number of cores using MSCONFIG, advanced boot
options.

We have some simple script and more complex scripts which show this behavior.
The simple scripts do straightforward things like rm -rf over some directory
trees. Even the simple scripts run slowly when the PC is booted with multiple
cores.

Is this known behavior? Is there some way to work around it so I can boot my
PC, use all the cores with other apps, and continue run cygwin 2x faster?

   


Several possibilities which you haven't addressed may affect this.
Are you comparing the performance of a single thread when locked to a 
single core, compared to when it is permitted to rotate among cores, 
with or without HyperThread enabled?
I've never run into anyone running win7 32-bit; it may have more such 
issues than the more common 64-bit.


--
Tim Prince





Re: Cygwin speed difference on multiple cores -vs- single-core?

2010-08-13 Thread mike marchywka
On 8/13/10, Andy Nicholas  wrote:
 Hi Folks,

 When using cygwin, I've noticed that there seems to be a large speed
 difference when I boot my windows 7 (32-bit) machine in single-core mode
 versus the regular number of cores (4, Core i7-930).

 I've read through the FAQ and didn't notice anything about this issue.

 Normally, I would expect nearly no speed difference based on the Windows
 environment... but after some extensive timing tests it seems like the
 single-
 core machine is usually at least 2x faster than using the same machine setup
 in multi-core mode. I limit the number of cores using MSCONFIG, advanced
 boot
 options.

 We have some simple script and more complex scripts which show this
 behavior.
 The simple scripts do straightforward things like rm -rf over some
 directory
 trees. Even the simple scripts run slowly when the PC is booted with
 multiple
 cores.

 Is this known behavior? Is there some way to work around it so I can boot my
 PC, use all the cores with other apps, and continue run cygwin 2x faster?

 Thanks much,

 andy



You want to look at details before concluding anything but if it is
real and you blame memory thrashing, I'd be curious to know about it.
This is hardly cygwin specific but people here may be interested.
Usually memory bottleneck kills you first and more processors can just
thrash. At least take a look at task manager and get some idea what
may be going on. Off hand it sounds like it may have more to do with
the file system details based on test you mention. Disk IO and
buffering and syncing can be an issue I would guess.

http://archives.free.net.ph/message/20081115.133519.47f76485.el.html


http://spectrum.ieee.org/computing/hardware/multicore-is-bad-news-for-supercomputers




Re: Cygwin speed difference on multiple cores -vs- single-core?

2010-08-13 Thread Andy Nicholas
Tim Prince n8tm at aol.com writes:

 Several possibilities which you haven't addressed may affect this.
 Are you comparing the performance of a single thread when locked to a 
 single core, compared to when it is permitted to rotate among cores, 
 with or without HyperThread enabled?
 I've never run into anyone running win7 32-bit; it may have more such 
 issues than the more common 64-bit.

The scripts we're using form the basis of a build system to invoke GCC and an 
assembler lots of times throughout a directory tree of a few thousand items. 
The tree itself on the file-system is not gigantic. I've tried to make sure 
that the environment has all the usual suspects disabled (virus-checking 
disabled, paging completely disabled for all disks, nothing else running in 
the background) before comparing anything.

I've been comparing using 2 different methods, one is the time to clean the 
tree using rm -rf via a makefile on empty directories and the other is to do 
a full build on a clean tree. When running make we don't use the -j option 
to use multi-threaded builds. When running each testing method, the CPUs are 
barely loaded at all (10%, maybe) and there's almost no I/O that registers.

Hyperthreading is disabled. I've tried comparisons when configuring the PC 
using msconfig to present 1 core, 2 cores, and 4 cores. The difference between 
1-core and 2 or 4 cores is dramatic with 1-core running 2x+ faster. There's 
almost no difference in speed between 2 cores and 4 cores. The disk is an SSD.

I've recently tried launching the original command-line window with its 
affinity locked to core0 and priority set to realtime. I've inspected the 
results using SysInternals' Process Explorer and spawned processes appear to 
be locked to core0. I made sure that the non-spawned processes 
like conhost.exe also had their affinities set and their priority raised to 
realtime. There's no difference in processing speed though.

Btw, I don't think the issue is I/O. The disk I'm using is an SSD (OCZ Vertex 
2) which is fairly fast. But, the results repeat even if I try a regular 7200 
RPM hard drive.

Yeah, weird.

andy









Re: Cygwin speed

2007-10-05 Thread Oleg Volkov
I ran into the following problem: after upgrading Cygwin from
version 1.5.19-4 to 1.5.24-2, my application (a relational database)
began to run several times slower.  The cause was a slowdown in a
function that rebalances a table index tree; this function calls
write() very many times, updating 4 bytes of the index file on each
call (the row number in a tree node).  To test the performance of
write() I created the following test program:

#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
char chunk[64] = "";
int i, fd;
if ((fd = open("tst_chunks.bin",
 O_CREAT|O_WRONLY|O_TRUNC,
 0666)) < 0) return 1;
for (i = 0; i < 1000000; i++) /* loop bound garbled in the archive; a large count is implied by the timings below */
if (write(fd, chunk, sizeof(chunk)) != sizeof(chunk)) return 1;
close(fd);
return 0;
}

When launched on a 1.3 GHz Celeron via time -p, it takes:

  on 1.5.24-2 : 48 seconds;
  on 1.5.19-4 : 18 seconds.

After investigating differences between 1.5.24-2 and 1.5.19-4 I have found
out, that the problem is in function sig_dispatch_pending(), which is
called in the beginning of writev() function, which is called from write().
In function sig_dispatch_pending() the following has been changed:

void __stdcall
sig_dispatch_pending (bool fast)
{
  if (exit_state || _my_tls == _sig_tls || !sigq.start.next)  // version 1.5.19-4
  // if (exit_state || _my_tls == _sig_tls)                   // version 1.5.24-2
    {
      //...
      return;
    }

  //...
  sig_send (myself, fast ? __SIGFLUSHFAST : __SIGFLUSH);
}

When I make this modification in the 1.5.24-2 sources and rebuild
cygwin1.dll, my test program runs as fast as on 1.5.19-4.  In message

  http://cygwin.com/ml/cygwin-developers/2006-07/msg00034.html

Brian Ford pointed to the following description of a change between
1.5.19-4 and 1.5.24-2:

  2006-02-24  Christopher Faylor  <cgf at timesys dot com>

	* sigproc.cc (sigheld): Define new variable.
	(sig_dispatch_pending): Don't check sigq since that's racy.
	(sig_send): Set sigheld flag if __SIGHOLD is specified, reset it if
	__SIGNOHOLD is specified.  Ignore flush signals if we're holding
	signals.

I think the check of sigq may be a little bit racy, but it turns out
that getting rid of such a cheap check results in a great slowdown of
sig_dispatch_pending() for most calls, namely when there are no pending
signals.

Maybe introducing a critical section or some other synchronization
mechanism would be a solution.

Oleg Volkov






Re: Cygwin speed

2007-03-19 Thread Eric Blake

According to Eric Blake on 3/7/2007 2:25 PM:
 Cygwin's sigprocmask() unconditionally calls sig_dispatch_pending().
 
 And if cgf decides not to patch cygwin in this manner

Fortunately, snapshots are patched now.

, I can at least try to 
 patch bash to not call sigprocmask() if it knows the mask is not changing.

Unfortunately, it turned out to be harder than I expected to try and make
bash work around this issue - both readline and bash call sigprocmask, and
since they are not in the same binary, there is no way to make them share
state short of adding an API to readline.  Without remembering state, I
can't avoid the overhead of a context swap (even calling
sigprocmask(SIG_SETMASK,NULL,set) was unnecessarily swapping).  But I
don't want to add an API to readline to remember state when the next
release of cygwin already has a working sigprocmask.  So the upshot is
that bash builtins on cygwin 1.5.24 will remain slower than strictly
necessary.  Here's hoping that 1.7.0 isn't too far away!

--
Don't work too hard, make some time for fun as well!

Eric Blake [EMAIL PROTECTED]




Re: Cygwin speed

2007-03-07 Thread Christian Franke

Christopher Layne wrote:

On Fri, Mar 02, 2007 at 11:11:54AM -0800, Brian Dessent wrote:

Vinod Gupta wrote:


Cygwin was slower by a factor of 3x. Is that normal?

Yes.  Emulation of POSIX functions which do not exist on Windows is
expensive.  Fork is especially bad, which is all you're really testing
there.


Where is the *continual* fork in his script btw?


There is no fork at all, the script uses only builtin shell commands.

This command prints the fork() count of a script on Cygwin:

$ strace bash ./script.sh | grep -c 'fork: 0 = fork()'


One reason for the slow execution of the script is the 800 context
switches done by Cygwin.


Bash calls sigprocmask() before starting each command, even for builtin 
commands.

Cygwin's sigprocmask() unconditionally calls sig_dispatch_pending().
This is necessary because POSIX requires that at least one pending 
signal is dispatched by sigprocmask().
sig_dispatch_pending() sends a __SIGFLUSH* to self, and this causes 2
thread context switches: main -> sig -> main.


With the attached patch, sigprocmask() does nothing if the signal mask 
is not changed.

This reduces the context switches to 5000.
(The patch is only intended for testing; it at least breaks the POSIX rule
above.)


I've run 4 tests scripts on 5 platforms:

Test 1: Original script, but with [[...]] instead of [...]:

i=100
while [[ $i -gt 0 ]]; do
j=$(((i/3+i*3)**3))
i=$((i-1))
done

Test 2: Original script unchanged:

i=100
while [ $i -gt 0 ]; do
...

Test 3: Original script with 1/100 of the iterations, using the command
version of [ (test):


i=1
while /usr/bin/[ $i -gt 0 ]; do
...

Test 4: A real world ./configure script


Results on the same AMD64 3200+ @ 2 GHz, XP SP2:

                 | Runtime (seconds) of test
                 |  1    2    3    4
-----------------+--------------------
Cygwin 1.5.24-2  | 77   84  138   33
Cygwin + patch   | 38   46  138   33
Linux on Virt.PC | 49   57   62   22
Linux on VMware  | 29   34   23   20
Linux native     | 23   29    7    6

(Linux = grml 0.9 live CD)

Observations:

- Shell scripts with many builtin commands would benefit from a Cygwin
  optimization preventing unnecessary context switches ...
- ... but this might not help for most real world scripts.
- fork() on Linux is also considerably slower when running in a VM on
  Windows.
- Bash's builtin [[...]] is faster than [...].
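The last observation is easy to reproduce with a small bash timing harness (iteration count arbitrary; only the relative times matter):

```bash
#!/bin/bash
# Compare bash's builtin [[ ]] test against the builtin [ test and the
# external /usr/bin/[ binary.
n=10000

time for ((i = 0; i < n; i++)); do [[ $i -gt -1 ]]; done
time for ((i = 0; i < n; i++)); do [ "$i" -gt -1 ]; done

# The external command pays a fork+exec per iteration, so loop 100x less.
if [ -x /usr/bin/'[' ]; then
    time for ((i = 0; i < n / 100; i++)); do /usr/bin/'[' "$i" -gt -1 ]; done
fi
```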


Christian

diff -up cygwin-1.5.24-2.orig/winsup/cygwin/signal.cc 
cygwin-1.5.24-2/winsup/cygwin/signal.cc
--- cygwin-1.5.24-2.orig/winsup/cygwin/signal.cc2006-07-05 
01:57:43.00100 +0200
+++ cygwin-1.5.24-2/winsup/cygwin/signal.cc 2007-03-07 19:23:27.59375 
+0100
@@ -153,7 +153,6 @@ sigprocmask (int how, const sigset_t *se
 int __stdcall
 handle_sigprocmask (int how, const sigset_t *set, sigset_t *oldset, sigset_t &opmask)
 {
-  sig_dispatch_pending ();
   /* check that how is in right range */
   if (how != SIG_BLOCK && how != SIG_UNBLOCK && how != SIG_SETMASK)
 {
@@ -171,7 +170,8 @@ handle_sigprocmask (int how, const sigse
 
   if (set)
 {
-  sigset_t newmask = opmask;
+  sigset_t oldmask = opmask;
+  sigset_t newmask = oldmask;
   switch (how)
{
case SIG_BLOCK:
@@ -187,7 +187,11 @@ handle_sigprocmask (int how, const sigse
  newmask = *set;
  break;
}
-  set_signal_mask (newmask, opmask);
+  if (oldmask != newmask)
+    {
+      sig_dispatch_pending ();
+      set_signal_mask (newmask, opmask);
+    }
 }
   return 0;
 }


Re: Cygwin speed

2007-03-07 Thread Eric Blake
Christian Franke Christian.Franke at t-online.de writes:

 Cygwin's sigprocmask() unconditionally calls sig_dispatch_pending().
 This is necessary because POSIX requires that at least one pending 
 signal is dispatched by sigprocmask().

Actually, POSIX requires "If there are any pending unblocked signals after the
call to sigprocmask(), at least one of those signals shall be delivered before
the call to sigprocmask() returns."

And the way I see it, if the mask is unchanged, then any signal that was 
unblocked before calling sigprocmask() should have already fired.  In other 
words, the only signals that sigprocmask() HAS to worry about are signals that 
just changed to unmasked; and if the mask isn't changing, then there is no need 
to flush the signal queue.

 
 With the attached patch, sigprocmask() does nothing if the signal mask 
 is not changed.
 This reduces the context switches to 5000.
 (Patch is only intended for testing, it at least breaks above POSIX rule)

I think your patch is still within the spirit of POSIX - I don't see the rule 
being broken.  I'll defer to cgf's judgment on this; but it sounds like a 
worthwhile patch to apply, even if it doesn't help the common case of non-
builtins.

And if cgf decides not to patch cygwin in this manner, I can at least try to 
patch bash to not call sigprocmask() if it knows the mask is not changing.

-- 
Eric Blake
volunteer cygwin bash maintainer






Re: Cygwin speed

2007-03-07 Thread Christopher Faylor
On Wed, Mar 07, 2007 at 09:13:33PM +0100, Christian Franke wrote:
 Christopher Layne wrote:
  On Fri, Mar 02, 2007 at 11:11:54AM -0800, Brian Dessent wrote:
   Vinod Gupta wrote:
    Cygwin was slow by a factor of 3x. Is that normal?
   Yes.  Emulation of POSIX functions which do not exist on Windows is
   expensive.  Fork is especially bad, which is all you're really testing
   there.
  
  Where is the *continual* fork in his script btw?
 
 There is no fork at all, the script uses only builtin shell commands.
 
 This command prints the fork() count of a script on Cygwin:
 
 $ strace bash ./script.sh | grep -c 'fork: 0 = fork()'
 
 One reason for the slow execution of the script is the 800 context
 switches done by Cygwin.
 
 Bash calls sigprocmask() before starting each command, even for builtin
 commands.
 Cygwin's sigprocmask() unconditionally calls sig_dispatch_pending().
 This is necessary because POSIX requires that at least one pending
 signal is dispatched by sigprocmask().
 sig_dispatch_pending() sends a __SIGFLUSH* to self and this causes 2
 thread context switches: main->sig->main.
 
 With the attached patch, sigprocmask() does nothing if the signal mask
 is not changed.
 This reduces the context switches to 5000.
 (Patch is only intended for testing, it at least breaks above POSIX rule)

I removed the sig_dispatch_pending from handle_sigprocmask.  I don't see
any need for extra logic beyond that since you're doing tests that are
already being done in set_signal_mask.

I'll generate a snapshot with these changes for testing.

Thanks for the patch.

cgf




Re: Cygwin speed

2007-03-07 Thread Christian Franke

Eric Blake wrote:

...
And the way I see it, if the mask is unchanged, then any signal that was 
unblocked before calling sigprocmask() should have already fired.  In other 
words, the only signals that sigprocmask() HAS to worry about are signals that 
just changed to unmasked;...
  


To handle this case, wouldn't it be necessary to call 
sig_dispatch_pending() *after* set_signal_mask() has unblocked the signal?


Christian





Re: Cygwin speed

2007-03-07 Thread Brian Ford
On Wed, 7 Mar 2007, Christopher Faylor wrote:

 I removed the sig_dispatch_pending from handle_sigprocmask.

Would now be a good time to ask this question again?

http://cygwin.com/ml/cygwin-developers/2006-07/msg00029.html

I assume the answer is still the same, though ;-(.

-- 
Brian Ford
Lead Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
"the best safety device in any aircraft is a well-trained crew..."

 





Re: Cygwin speed

2007-03-04 Thread Christopher Layne
On Fri, Mar 02, 2007 at 11:11:54AM -0800, Brian Dessent wrote:
 Vinod Gupta wrote:
 
  Cygwin was slow by a factor of 3x. Is that normal?
 
 Yes.  Emulation of POSIX functions which do not exist on Windows is
 expensive.  Fork is especially bad, which is all you're really testing
 there.

Where is the *continual* fork in his script btw?

-cl




Cygwin speed

2007-03-02 Thread Vinod Gupta

I ran the following loop under bash on three different machines:

i=100
while [ $i -gt 0 ]; do
 j=$(((i/3+i*3)**3))
 i=$((i-1))
done

Here is how long it took:

CPU               OS              Time (secs)
---               --              -----------
P4/3.2GHz         Linux RHEL4      41
Core Duo/2.2GHz   Mac OSX 10.4     43
Core Duo/2.4GHz   WinXP+Cygwin    107

Cygwin was slow by a factor of 3x. Is that normal?

Vinod





Re: Cygwin speed

2007-03-02 Thread Brian Dessent
Vinod Gupta wrote:

 Cygwin was slow by a factor of 3x. Is that normal?

Yes.  Emulation of POSIX functions which do not exist on Windows is
expensive.  Fork is especially bad, which is all you're really testing
there.

Brian

--
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple
Problem reports:   http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ:   http://cygwin.com/faq/



Re: Cygwin speed

2007-03-02 Thread Thorsten Kampe
* Brian Dessent (Fri, 02 Mar 2007 11:11:54 -0800)
 Vinod Gupta wrote:
 Cygwin was slow by a factor of 3x. Is that normal?
 
 Yes.

Actually no. The standard approximate guess is a factor of two, which 
corresponds to Vinod's tests.

