Re: [PATCH] Cygwin: Speed up mkimport
On 26/11/2020 09:56, Mark Geisert wrote:
> Cut mkimport elapsed time in half by forking each iteration of the two
> time-consuming loops within. Only do this if more than one CPU is
> present. In the second loop, combine the two 'objdump' calls into one
> system() invocation to avoid a system() invocation per iteration.

Nice. Thanks for looking into this.

> @@ -86,8 +94,18 @@ for my $f (keys %text) {
>      if (!$text{$f}) {
>          unlink $f;
>      } else {
> -        system $objcopy, '-R', '.text', $f and exit 1;
> -        system $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
> +        if ($forking && fork) {
> +            # Testing shows parent does need to sleep a short time here,
> +            # otherwise system is inundated with hundreds of objcopy processes
> +            # and the forked perl processes that launched them.
> +            my $delay = 0.01; # NOTE: Slower systems may need to raise this
> +            select(undef, undef, undef, $delay); # Supports fractional seconds
> +        } else {
> +            # Do two objcopy calls at once to avoid one system() call overhead
> +            system '(', $objcopy, '-R', '.text', $f, ')', '||',
> +                $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
> +            exit 0 if $forking;
> +        }
>      }
>  }

Hmm... not so sure about this. This seems racy, as nothing ensures that
these objcopies have finished before we combine all the produced .o
files into a library.

I'm pretty sure with more understanding, this whole thing could be done
better: For example, from a brief look, it seems that the t-*.o files
are produced by gas, and then we remove .bss and .data sections. Could
we not arrange to assemble these objects without those sections in the
first place?
Re: [PATCH] Cygwin: Speed up mkimport
Achim Gratz writes:
> That actually works, but the speedup is quite modest on my system
> (4C/8T) even though I've allowed it to use unlimited resources. So it
> basically forks slower than the runtime for each of the invocations is.
> Some more speedup can be had if the assembler is run on actual files in
> the same way, but the best I've come up with goes from 93s to 47s and
> runs at 150% CPU (up from 85%). Most of that time is spent in system,
> so forking and I/O.

Not that I really know what I'm doing, but creating a single .s file and
running as just once gets mkimport down to 21s / 110%. Now the resulting
library doesn't actually link, because somehow the information ends up
in the wrong place…

Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Wavetables for the Terratec KOMPLEXER:
http://Synth.Stromeko.net/Downloads.html#KomplexerWaves
Re: [PATCH] Cygwin: Speed up mkimport
Achim Gratz writes:
> b) Open up two pipes to an "xargs -P $ncpu/2 -L 1 …" and feed in the file
> names.

That actually works, but the speedup is quite modest on my system
(4C/8T) even though I've allowed it to use unlimited resources. So it
basically forks slower than the runtime for each of the invocations is.
Some more speedup can be had if the assembler is run on actual files in
the same way, but the best I've come up with goes from 93s to 47s and
runs at 150% CPU (up from 85%). Most of that time is spent in system,
so forking and I/O.

Regards,
Achim.
Re: [PATCH] Cygwin: Speed up mkimport
On 2020-11-27 11:37, Achim Gratz wrote:
> Mark Geisert writes:
>> Still faster than two system commands :-). But thanks for the
>> comment;
>
> It still seems you are barking up the wrong tree.
>
>> I thought I was merely grouping args, to get around Perl's
>> greedy arg list building for the system command.
>
> Wot? It just takes a list which you can build any which way you desire.
> The other option is to give it the full command line in a string, which
> does work for this script (but not on Windows). If it finds shell
> metacharacters in the arguments it'll run a shell, otherwise the forked
> perl just does an execve.
>
> If it's really the forking that is causing the slowdown, why not do
> either of those things:
>
> a) Generate a complete shell script and fork once to run that.
>
> b) Open up two pipes to an "xargs -P $ncpu/2 -L 1 …" and feed in the file
> names.
>
> Getting the error codes back to the script and handling the error is
> left as an exercise for the reader.

Use explicit binary paths to avoid path search overhead; for portability:
/bin/ for base system, dir, file, and net utils including compressors,
grep, and sed; /usr/bin/ otherwise; {/usr,}/sbin/ for some admin utils
not elsewhere.

--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]
Re: [PATCH] Cygwin: Speed up mkimport
Mark Geisert writes:
> Still faster than two system commands :-). But thanks for the
> comment;

It still seems you are barking up the wrong tree.

> I thought I was merely grouping args, to get around Perl's
> greedy arg list building for the system command.

Wot? It just takes a list which you can build any which way you desire.
The other option is to give it the full command line in a string, which
does work for this script (but not on Windows). If it finds shell
metacharacters in the arguments it'll run a shell, otherwise the forked
perl just does an execve.

If it's really the forking that is causing the slowdown, why not do
either of those things:

a) Generate a complete shell script and fork once to run that.

b) Open up two pipes to an "xargs -P $ncpu/2 -L 1 …" and feed in the file
names.

Getting the error codes back to the script and handling the error is
left as an exercise for the reader.

Regards,
Achim.
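Option (b) from this message might look roughly like the sketch below. It is only an illustration under stated assumptions: the file names are made up, `cp` is a placeholder for the real per-file objcopy call, and `-P` is a GNU/BSD xargs extension rather than something POSIX guarantees.

```shell
#!/bin/sh
# Sketch of option (b): feed file names to xargs and let it run the
# per-file command in parallel. All names here are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in inputs; in mkimport these would be the generated .o files.
for n in 1 2 3 4 5 6 7 8; do
    echo "data $n" > "f$n.o"
done

# Number of parallel jobs; fall back to 2 if getconf cannot tell us.
ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)

# -P: run up to $ncpu commands at once (GNU/BSD extension).
# -L 1: start one invocation per input line, i.e. per file name.
# 'cp' is a placeholder for the real per-file command.
printf '%s\n' f*.o | xargs -P "$ncpu" -L 1 sh -c 'cp "$1" "$1.done"' sh
status=$?

# xargs exits non-zero if any invocation failed, which gives the
# caller one place to check for errors.
done_count=$(ls f*.o.done | wc -l | tr -d ' ')
echo "xargs status: $status, files processed: $done_count"
```

Checking the single xargs exit status is one simple way to get "the error codes back to the script", at the cost of not knowing which file failed.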
Re: [PATCH] Cygwin: Speed up mkimport
Jon Turney wrote:
> On 26/11/2020 09:56, Mark Geisert wrote:
>> @@ -86,8 +94,18 @@ for my $f (keys %text) {
>>      if (!$text{$f}) {
>>          unlink $f;
>>      } else {
>> -        system $objcopy, '-R', '.text', $f and exit 1;
>> -        system $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
>> +        if ($forking && fork) {
>> +            # Testing shows parent does need to sleep a short time here,
>> +            # otherwise system is inundated with hundreds of objcopy processes
>> +            # and the forked perl processes that launched them.
>> +            my $delay = 0.01; # NOTE: Slower systems may need to raise this
>> +            select(undef, undef, undef, $delay); # Supports fractional seconds
>> +        } else {
>> +            # Do two objcopy calls at once to avoid one system() call overhead
>> +            system '(', $objcopy, '-R', '.text', $f, ')', '||',
>> +                $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
>> +            exit 0 if $forking;
>> +        }
>>      }
>>  }
>
> Hmm... not so sure about this. This seems racy, as nothing ensures that
> these objcopies have finished before we combine all the produced .o
> files into a library.

Good point. I've added a hash to track the forked pids, and after each
of these two time-consuming loops finishes I loop over the pids list
doing waitpid() on each pid.

> I'm pretty sure with more understanding, this whole thing could be done
> better: For example, from a brief look, it seems that the t-*.o files
> are produced by gas, and then we remove .bss and .data sections. Could
> we not arrange to assemble these objects without those sections in the
> first place?

I looked over as's options in its man page but could not see anything
obvious. I wonder if defining the sections explicitly as zero-length
somehow in mkimport's assembler snippets would accomplish the same
thing. I'll try this next.

Note that mkimport operates both on those tiny object files it creates
with as, but also on the object files created by the whole Cygwin build.
So adjusting the latter object files would need to be done somewhere
else.

Thanks,

..mark
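The fix described in this message (remember every forked pid, then wait on each one before using the results) has a direct shell analogue, sketched below. This is not mkimport's Perl code; `work` and the file names are hypothetical stand-ins for the per-file objcopy step.

```shell
#!/bin/sh
# Sketch: start jobs in the background, record their pids, and wait
# for every one of them before touching the outputs.
outdir=$(mktemp -d)

work() {
    # Placeholder for the real per-file command; $1 is a made-up name.
    echo "done $1" > "$outdir/$1.out"
}

pids=""
for f in a b c d; do
    work "$f" &
    pids="$pids $!"
done

# Without this loop the script could go on to combine the outputs
# while some jobs are still running; that is the race in question.
failed=0
for pid in $pids; do
    wait "$pid" || failed=1
done

finished=$(ls "$outdir" | wc -l | tr -d ' ')
echo "failed=$failed finished=$finished"
```

Waiting on each pid individually, rather than issuing one bare `wait`, also lets the parent notice a failed child through its exit status.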
Re: [PATCH] Cygwin: Speed up mkimport
Achim Gratz wrote:
> Mark Geisert writes:
>> +        # Do two objcopy calls at once to avoid one system() call overhead
>> +        system '(', $objcopy, '-R', '.text', $f, ')', '||',
>> +            $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
>
> That doesn't do what you think it does. It in fact increases the
> overhead since it'll start a shell that runs those two commands and
> will even needlessly start the first objcopy in a subshell.

Still faster than two system commands :-). But thanks for the comment; I
thought I was merely grouping args, to get around Perl's greedy arg list
building for the system command. After more experimenting I ended up
with:

    system '/bin/true', '||', $objcopy, '-R', '.text', $f, '||',
        $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;

Kind of ugly, but better? It obviates the need for parent to pace itself
so the enclosing loop runs a bit faster.

..mark
Re: [PATCH] Cygwin: Speed up mkimport
Mark Geisert writes:
> +        # Do two objcopy calls at once to avoid one system() call overhead
> +        system '(', $objcopy, '-R', '.text', $f, ')', '||',
> +            $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;

That doesn't do what you think it does. It in fact increases the
overhead since it'll start a shell that runs those two commands and will
even needlessly start the first objcopy in a subshell.

Regards,
Achim.
Re: [PATCH] Cygwin: Speed up mkimport
Previously, Mark Geisert wrote:
> Cut mkimport elapsed time in half by forking each iteration of the two
> time-consuming loops within. Only do this if more than one CPU is
> present. In the second loop, combine the two 'objdump' calls into one
                                              ^^^
That should say objcopy. The code is correct though.

..mark
[PATCH] Cygwin: Speed up mkimport
Cut mkimport elapsed time in half by forking each iteration of the two
time-consuming loops within. Only do this if more than one CPU is
present. In the second loop, combine the two 'objdump' calls into one
system() invocation to avoid a system() invocation per iteration.
---
 winsup/cygwin/mkimport | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/winsup/cygwin/mkimport b/winsup/cygwin/mkimport
index 2b08dfe3d..919dc305b 100755
--- a/winsup/cygwin/mkimport
+++ b/winsup/cygwin/mkimport
@@ -47,6 +47,9 @@ for my $sym (keys %replace) {
     $import{$fn} = $imp_sym;
 }

+my $ncpus = `grep -c ^processor /proc/cpuinfo`;
+my $forking = $ncpus > 1; # Decides if loops below should fork() each iteration
+
 for my $f (keys %text) {
     my $imp_sym = delete $import{$f};
     my $glob_sym = $text{$f};
@@ -56,25 +59,30 @@ for my $f (keys %text) {
         $text{$f} = 0;
     } else {
         $text{$f} = 1;
-        open my $as_fd, '|-', $as, '-o', "$dir/t-$f", "-";
-        if ($is64bit) {
-            print $as_fd <
[PATCH 3/3] Cygwin: Speed up dumper
Stop after we've written the dump in response to the initial breakpoint
EXCEPTION_DEBUG_EVENT we receive for attaching to the process.

(rather than bogusly sitting there for 20 seconds waiting for more debug
events from a stopped process after we've already written the dump).
---
 winsup/utils/dumper.cc | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/winsup/utils/dumper.cc b/winsup/utils/dumper.cc
index ace752464..e80758e0c 100644
--- a/winsup/utils/dumper.cc
+++ b/winsup/utils/dumper.cc
@@ -615,8 +615,6 @@ out:
 int
 dumper::collect_process_information ()
 {
-  int exception_level = 0;
-
   if (!sane ())
     return 0;

@@ -631,7 +629,7 @@ dumper::collect_process_information ()

   while (1)
     {
-      if (!WaitForDebugEvent (&current_event, 20000))
+      if (!WaitForDebugEvent (&current_event, INFINITE))
         return 0;

       deb_printf ("got debug event %d\n", current_event.dwDebugEventCode);
@@ -675,12 +673,6 @@ dumper::collect_process_information ()

         case EXCEPTION_DEBUG_EVENT:

-          exception_level++;
-          if (exception_level == 2)
-            break;
-          else if (exception_level > 2)
-            return 0;
-
           collect_memory_sections ();

           /* got all info. time to dump */
@@ -697,6 +689,9 @@ dumper::collect_process_information ()
               goto failed;
             };

+          /* We're done */
+          goto failed;
+
           break;

         default:
--
2.27.0
Cygwin64 vs. Cygwin (speed)
In another thread, I wrote (and Corinna replied):

>> I have tried 64-bit Cygwin in the past. I do a lot of file I/O and
>> sorting/searching on largish test-based data sets, and 64-bit was
>> noticeably slower than 32-bit Cygwin,
>
> Hmm, I usually have the opposite impression...

I'm using malloc/calloc, is there a different memory allocator I
should be using for Cygwin64?

Thanks - Jim

--
Jim Reisert AD1C, http://www.ad1c.us

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Re: Cygwin64 vs. Cygwin (speed)
On Nov 3 07:59, Jim Reisert AD1C wrote:
> In another thread, I wrote (and Corinna replied):
>
> >> I have tried 64-bit Cygwin in the past. I do a lot of file I/O and
> >> sorting/searching on largish test-based data sets, and 64-bit was
> >> noticeably slower than 32-bit Cygwin,
> >
> > Hmm, I usually have the opposite impression...
>
> I'm using malloc/calloc, is there a different memory allocator I
> should be using for Cygwin64?

There is none. While Cygwin's malloc is slow in multi-threading
scenarios due to dumb locking (which I really hope to fix at one point),
it shouldn't be any slower on 64 bit compared to 32 bit.

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
Re: Cygwin speed difference on multiple cores -vs- single-core?
On 8/13/10, Andy Nicholas wrote:
> The scripts we're using form the basis of a build system to invoke GCC
> and an assembler lots of times throughout a directory tree of a few
> thousand items.

You can end up spending all your time chasing include paths; that isn't
hard to do.

> to use multi-threaded builds. When running each testing method, the
> CPUs are barely loaded at all (10%, maybe) and there's almost no I/O
> that registers.

Is the disk light on? This is almost always an indication of blocking
for something. Check task manager page faults for example.

> Btw, I don't think the issue is I/O. The disk I'm using is an SSD (OCZ
> Vertex 2) which is fairly fast. But, the results repeat even if I try a
> regular 7200 RPM hard drive.

I should add this to my manifesto against adjectives along with "fast
snail" comments. 7200 rpm = 7200/60 revs per second = 120 rps. This puts
rotation time (1/120 s, about 8.3 ms) in the millisecond range. Track to
track seeks won't subtract from that. Memory of course is in the
nanosecond range, 1e6 times faster. The issue is likely to be buffering
strategies and syncing.

> Yeah, weird.
>
> andy

--
Mike Marchywka
Cygwin speed difference on multiple cores -vs- single-core?
Hi Folks,

When using cygwin, I've noticed that there seems to be a large speed
difference when I boot my windows 7 (32-bit) machine in single-core mode
versus the regular number of cores (4, Core i7-930).

I've read through the FAQ and didn't notice anything about this issue.

Normally, I would expect nearly no speed difference based on the Windows
environment... but after some extensive timing tests it seems like the
single-core machine is usually at least 2x faster than using the same
machine setup in multi-core mode. I limit the number of cores using
MSCONFIG, advanced boot options.

We have some simple scripts and more complex scripts which show this
behavior. The simple scripts do straightforward things like rm -rf over
some directory trees. Even the simple scripts run slowly when the PC is
booted with multiple cores.

Is this known behavior? Is there some way to work around it so I can
boot my PC, use all the cores with other apps, and continue to run
cygwin 2x faster?

Thanks much,

andy
Re: Cygwin speed difference on multiple cores -vs- single-core?
On 8/13/2010 5:37 PM, Andy Nicholas wrote:
> Hi Folks,
>
> When using cygwin, I've noticed that there seems to be a large speed
> difference when I boot my windows 7 (32-bit) machine in single-core
> mode versus the regular number of cores (4, Core i7-930).
>
> I've read through the FAQ and didn't notice anything about this issue.
>
> Normally, I would expect nearly no speed difference based on the
> Windows environment... but after some extensive timing tests it seems
> like the single-core machine is usually at least 2x faster than using
> the same machine setup in multi-core mode. I limit the number of cores
> using MSCONFIG, advanced boot options.
>
> We have some simple script and more complex scripts which show this
> behavior. The simple scripts do straightforward things like rm -rf
> over some directory trees. Even the simple scripts run slowly when the
> PC is booted with multiple cores.
>
> Is this known behavior? Is there some way to work around it so I can
> boot my PC, use all the cores with other apps, and continue run cygwin
> 2x faster?

Several possibilities which you haven't addressed may affect this. Are
you comparing the performance of a single thread when locked to a single
core, compared to when it is permitted to rotate among cores, with or
without HyperThread enabled? I've never run into anyone running win7
32-bit; it may have more such issues than the more common 64-bit.

--
Tim Prince
Re: Cygwin speed difference on multiple cores -vs- single-core?
On 8/13/10, Andy Nicholas wrote:
> Hi Folks,
>
> When using cygwin, I've noticed that there seems to be a large speed
> difference when I boot my windows 7 (32-bit) machine in single-core
> mode versus the regular number of cores (4, Core i7-930).
>
> I've read through the FAQ and didn't notice anything about this issue.
>
> Normally, I would expect nearly no speed difference based on the
> Windows environment... but after some extensive timing tests it seems
> like the single-core machine is usually at least 2x faster than using
> the same machine setup in multi-core mode. I limit the number of cores
> using MSCONFIG, advanced boot options.
>
> We have some simple script and more complex scripts which show this
> behavior. The simple scripts do straightforward things like rm -rf
> over some directory trees. Even the simple scripts run slowly when the
> PC is booted with multiple cores.
>
> Is this known behavior? Is there some way to work around it so I can
> boot my PC, use all the cores with other apps, and continue run cygwin
> 2x faster?
>
> Thanks much,
>
> andy

You want to look at details before concluding anything but if it is real
and you blame memory thrashing, I'd be curious to know about it. This is
hardly cygwin specific but people here may be interested. Usually memory
bottleneck kills you first and more processors can just thrash. At least
take a look at task manager and get some idea what may be going on.

Off hand it sounds like it may have more to do with the file system
details based on test you mention. Disk IO and buffering and syncing can
be an issue I would guess.

http://archives.free.net.ph/message/20081115.133519.47f76485.el.html

http://spectrum.ieee.org/computing/hardware/multicore-is-bad-news-for-supercomputers
Re: Cygwin speed difference on multiple cores -vs- single-core?
Tim Prince <n8tm at aol.com> writes:
> Several possibilities which you haven't addressed may affect this. Are
> you comparing the performance of a single thread when locked to a
> single core, compared to when it is permitted to rotate among cores,
> with or without HyperThread enabled? I've never run into anyone running
> win7 32-bit; it may have more such issues than the more common 64-bit.

The scripts we're using form the basis of a build system to invoke GCC
and an assembler lots of times throughout a directory tree of a few
thousand items. The tree itself on the file-system is not gigantic.

I've tried to make sure that the environment has all the usual suspects
disabled (virus-checking disabled, paging completely disabled for all
disks, nothing else running in the background) before comparing
anything.

I've been comparing using 2 different methods, one is the time to clean
the tree using rm -rf via a makefile on empty directories and the other
is to do a full build on a clean tree. When running make we don't use
the -j option to use multi-threaded builds. When running each testing
method, the CPUs are barely loaded at all (10%, maybe) and there's
almost no I/O that registers.

Hyperthreading is disabled. I've tried comparisons when configuring the
PC using msconfig to present 1 core, 2 cores, and 4 cores. The
difference between 1-core and 2 or 4 cores is dramatic with 1-core
running 2x+ faster. There's almost no difference in speed between 2
cores and 4 cores. The disk is an SSD.

I've recently tried launching the original command-line window with its
affinity locked to core0 and priority set to realtime. I've inspected
the results using SysInternals' Process Explorer and spawned processes
appear to be locked to core0. I made sure that the non-spawned processes
like conhost.exe also had their affinities set and their priority raised
to realtime. There's no difference in processing speed though.

Btw, I don't think the issue is I/O. The disk I'm using is an SSD (OCZ
Vertex 2) which is fairly fast. But, the results repeat even if I try a
regular 7200 RPM hard drive.

Yeah, weird.

andy
Re: Cygwin speed
I came across the following problem: after upgrading Cygwin from version
1.5.19-4 to 1.5.24-2, my application (a relational database) began to
run several times slower. The reason was a slowdown of a function which
rebalances the table index tree; this function calls write() very many
times, and each call updates 4 bytes in the index file (the row number
in a tree node). To test the performance of write() I created the
following test program:

    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        char chunk[64] = "";
        int i, fd;

        if ((fd = open("tst_chunks.bin", O_CREAT|O_WRONLY|O_TRUNC, 0666)) < 0)
            return 1;
        for (i = 0; i < 100; i++)
            if (write(fd, chunk, sizeof(chunk)) != sizeof(chunk))
                return 1;
        close(fd);
        return 0;
    }

When launched on a Celeron 1.3GHz via time -p, it takes:

    on 1.5.24-2 : 48 seconds;
    on 1.5.19-4 : 18 seconds.

After investigating the differences between 1.5.24-2 and 1.5.19-4 I
found that the problem is in the function sig_dispatch_pending(), which
is called at the beginning of writev(), which in turn is called from
write(). In sig_dispatch_pending() the following has been changed:

    void __stdcall
    sig_dispatch_pending (bool fast)
    {
      if (exit_state || &_my_tls == _sig_tls || !sigq.start.next) // version 1.5.19-4
      // if (exit_state || &_my_tls == _sig_tls) // version 1.5.24-2
        {
          //...
          return;
        }

      //...
      sig_send (myself, fast ? __SIGFLUSHFAST : __SIGFLUSH);
    }

When I make this modification in the sources for 1.5.24-2 and rebuild
cygwin1.dll, my test program runs as fast as on 1.5.19-4.

In message http://cygwin.com/ml/cygwin-developers/2006-07/msg00034.html
Brian Ford pointed to the following description of a change between
1.5.19-4 and 1.5.24-2:

    2006-02-24  Christopher Faylor  <cgf at timesys dot com>

        * sigproc.cc (sigheld): Define new variable.
    -   (sig_dispatch_pending): Don't check sigq since that's racy.
        (sig_send): Set sigheld flag if __SIGHOLD is specified, reset it
        if __SIGNOHOLD is specified. Ignore flush signals if we're
        holding signals.

I think that checking sigq may be a little bit racy, but it turns out
that getting rid of such a cheap check results in a great slowdown of
sig_dispatch_pending() for most calls, when there are no pending
signals. Maybe introducing a critical section or some other
synchronization mechanism would be a solution.

Oleg Volkov

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
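The access pattern in this report (a long run of 64-byte write() calls on one descriptor) can be approximated from the shell as a quick check, as sketched below. This is not the poster's program: it assumes that dd issues one write() per output block, which is how common implementations behave, and the block count is arbitrary.

```shell
#!/bin/sh
# Sketch: approximate the small-write test with dd. With bs=64, dd is
# expected to issue one 64-byte write() per block (an assumption about
# the dd implementation, not something measured in the original post).
out=$(mktemp)
count=100000          # arbitrary; raise it for a longer run

start=$(date +%s)
dd if=/dev/zero of="$out" bs=64 count="$count" 2>/dev/null
end=$(date +%s)

size=$(wc -c < "$out" | tr -d ' ')
echo "wrote $size bytes in $((end - start)) s"
rm -f "$out"
```

Running the same command under two Cygwin versions gives a rough comparison of per-call write() overhead, although process startup and dd's own read() calls add noise that the C program does not have.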
Re: Cygwin speed
According to Eric Blake on 3/7/2007 2:25 PM:
>> Cygwin's sigprocmask() unconditionally calls sig_dispatch_pending().
>
> And if cgf decides not to patch cygwin in this manner

Fortunately, snapshots are patched now.

> , I can at least try to patch bash to not call sigprocmask() if it
> knows the mask is not changing.

Unfortunately, it turned out to be harder than I expected to try and
make bash work around this issue - both readline and bash call
sigprocmask, and since they are not in the same binary, there is no way
to make them share state short of adding an API to readline. Without
remembering state, I can't avoid the overhead of a context swap (even
calling sigprocmask(SIG_SETMASK,NULL,set) was unnecessarily swapping).
But I don't want to add an API to readline to remember state when the
next release of cygwin already has a working sigprocmask. So the upshot
is that bash builtins on cygwin 1.5.24 will remain slower than strictly
necessary. Here's hoping that 1.7.0 isn't too far away!

--
Don't work too hard, make some time for fun as well!

Eric Blake
Re: Cygwin speed
Christopher Layne wrote:
> On Fri, Mar 02, 2007 at 11:11:54AM -0800, Brian Dessent wrote:
>> Vinod Gupta wrote:
>>> Cygwin was slow by a factor of 3x. Is that normal?
>>
>> Yes. Emulation of POSIX functions which do not exist on Windows is
>> expensive. Fork is especially bad, which is all you're really testing
>> there.
>
> Where is the *continual* fork in his script btw?

There is no fork at all, the script uses only builtin shell commands.

This command prints the fork() count of a script on Cygwin:

$ strace bash ./script.sh | grep -c 'fork: 0 = fork()'

One reason for the slow execution of the script is the 800 context
switches done by Cygwin.

Bash calls sigprocmask() before starting each command, even for builtin
commands. Cygwin's sigprocmask() unconditionally calls
sig_dispatch_pending(). This is necessary because POSIX requires that at
least one pending signal is dispatched by sigprocmask().
sig_dispatch_pending() sends a __SIGFLUSH* to self and this causes 2
thread context switches: main -> sig -> main.

With the attached patch, sigprocmask() does nothing if the signal mask
is not changed. This reduces the context switches to 5000. (Patch is
only intended for testing, it at least breaks above POSIX rule)

I've run 4 test scripts on 5 platforms:

Test 1: Original script, but with [[...]] instead of [...]:

i=100
while [[ $i -gt 0 ]]; do
  j=$(((i/3+i*3)**3))
  i=$((i-1))
done

Test 2: Original script unchanged:

i=100
while [ $i -gt 0 ]; do
  ...

Test 3: Original script with /100 iterations and using command version
of [ (test):

i=1
while /usr/bin/[ $i -gt 0 ]; do
  ...

Test 4: A real world ./configure script

Results on same AMD64 3200+ @2GHz, XP SP2:

                  | Runtime (seconds) of test
                  |    1     2     3     4
------------------+--------------------------
Cygwin 1.5.24-2   |   77    84   138    33
Cygwin +patch     |   38    46   138    33
Linux on Virt.PC: |   49    57    62    22
Linux on VMware:  |   29    34    23    20
Linux native:     |   23    29     7     6

(Linux = grml 0.9 live CD)

Observations:
- Shell scripts with many builtin commands would benefit from a Cygwin
  optimization preventing unnecessary context switches ...
- ... but this might not help for most real world scripts.
- fork() on Linux is also considerably slower when running in a VM on
  Windows.
- Bash's builtin [[...]] is faster than [...]

Christian

diff -up cygwin-1.5.24-2.orig/winsup/cygwin/signal.cc cygwin-1.5.24-2/winsup/cygwin/signal.cc
--- cygwin-1.5.24-2.orig/winsup/cygwin/signal.cc    2006-07-05 01:57:43.00100 +0200
+++ cygwin-1.5.24-2/winsup/cygwin/signal.cc 2007-03-07 19:23:27.59375 +0100
@@ -153,7 +153,6 @@ sigprocmask (int how, const sigset_t *se
 int __stdcall
 handle_sigprocmask (int how, const sigset_t *set, sigset_t *oldset, sigset_t& opmask)
 {
-  sig_dispatch_pending ();
   /* check that how is in right range */
   if (how != SIG_BLOCK && how != SIG_UNBLOCK && how != SIG_SETMASK)
     {
@@ -171,7 +170,8 @@ handle_sigprocmask (int how, const sigse

   if (set)
     {
-      sigset_t newmask = opmask;
+      sigset_t oldmask = opmask;
+      sigset_t newmask = oldmask;
       switch (how)
         {
         case SIG_BLOCK:
@@ -187,7 +187,11 @@ handle_sigprocmask (int how, const sigse
           newmask = *set;
           break;
         }
-      set_signal_mask (newmask, opmask);
+      if (oldmask != newmask)
+        {
+          sig_dispatch_pending();
+          set_signal_mask (newmask, opmask);
+        }
     }
   return 0;
 }
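The gap between the builtin tests and the external-command test in this message (Tests 1 and 2 versus Test 3) can be reproduced in miniature with the sketch below. It is an illustration only: the iteration count is arbitrary, `env test` is used here as a portable way to force an external program instead of the `/usr/bin/[` path from the post, and one-second `date` resolution makes the timings coarse.

```shell
#!/bin/sh
# Sketch: the same countdown loop with the builtin test and with an
# external test program started through env (one fork/exec per pass).
n=200                       # arbitrary iteration count

i=$n; builtin_passes=0
t0=$(date +%s)
while [ "$i" -gt 0 ]; do
    i=$((i - 1)); builtin_passes=$((builtin_passes + 1))
done
t1=$(date +%s)

i=$n; external_passes=0
while env test "$i" -gt 0; do
    i=$((i - 1)); external_passes=$((external_passes + 1))
done
t2=$(date +%s)

echo "builtin:  $builtin_passes passes, $((t1 - t0)) s"
echo "external: $external_passes passes, $((t2 - t1)) s"
```

Both loops do the same arithmetic; only the second pays for a new process on every pass, which is the cost that dominates on Cygwin.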
Re: Cygwin speed
Christian Franke <Christian.Franke at t-online.de> writes:
> Cygwin's sigprocmask() unconditionally calls sig_dispatch_pending().
> This is necessary because POSIX requires that at least one pending
> signal is dispatched by sigprocmask().

Actually, POSIX requires "If there are any pending unblocked signals
after the call to sigprocmask(), at least one of those signals shall be
delivered before the call to sigprocmask() returns."

And the way I see it, if the mask is unchanged, then any signal that was
unblocked before calling sigprocmask() should have already fired. In
other words, the only signals that sigprocmask() HAS to worry about are
signals that just changed to unmasked; and if the mask isn't changing,
then there is no need to flush the signal queue.

> With the attached patch, sigprocmask() does nothing if the signal mask
> is not changed. This reduces the context switches to 5000. (Patch is
> only intended for testing, it at least breaks above POSIX rule)

I think your patch is still within the spirit of POSIX - I don't see the
rule being broken. I'll defer to cgf's judgment on this; but it sounds
like a worthwhile patch to apply, even if it doesn't help the common
case of non-builtins.

And if cgf decides not to patch cygwin in this manner, I can at least
try to patch bash to not call sigprocmask() if it knows the mask is not
changing.

--
Eric Blake
volunteer cygwin bash maintainer
Re: Cygwin speed
On Wed, Mar 07, 2007 at 09:13:33PM +0100, Christian Franke wrote:
>Christopher Layne wrote:
>>On Fri, Mar 02, 2007 at 11:11:54AM -0800, Brian Dessent wrote:
>>>Vinod Gupta wrote:
>>>>Cygwin was slow by a factor of 3x. Is that normal?
>>>
>>>Yes. Emulation of POSIX functions which do not exist on Windows is
>>>expensive. Fork is especially bad, which is all you're really testing
>>>there.
>>
>>Where is the *continual* fork in his script btw?
>
>There is no fork at all, the script uses only builtin shell commands.
>
>This command prints the fork() count of a script on Cygwin:
>
>$ strace bash ./script.sh | grep -c 'fork: 0 = fork()'
>
>One reason for the slow execution of the script are 800 context
>switches done by Cygwin.
>
>Bash calls sigprocmask() before starting each command, even for builtin
>commands. Cygwin's sigprocmask() unconditionally calls
>sig_dispatch_pending(). This is necessary because POSIX requires that
>at least one pending signal is dispatched by sigprocmask().
>sig_dispatch_pending() sends a __SIGFLUSH* to self and this causes 2
>thread context switches: main -> sig -> main.
>
>With the attached patch, sigprocmask() does nothing if the signal mask
>is not changed. This reduces the context switches to 5000. (Patch is
>only intended for testing, it at least breaks above POSIX rule)

I removed the sig_dispatch_pending from handle_sigprocmask. I don't see
any need for extra logic beyond that since you're doing tests that are
already being done in set_signal_mask.

I'll generate a snapshot with these changes for testing.

Thanks for the patch.

cgf
Re: Cygwin speed
Eric Blake wrote:
> ...
> And the way I see it, if the mask is unchanged, then any signal that
> was unblocked before calling sigprocmask() should have already fired.
> In other words, the only signals that sigprocmask() HAS to worry about
> are signals that just changed to unmasked;...

To handle this case, wouldn't it be necessary to call sig_dispatch_pending() *after* set_signal_mask() has unblocked the signal?

Christian
Re: Cygwin speed
On Wed, 7 Mar 2007, Christopher Faylor wrote:
> I removed the sig_dispatch_pending from handle_sigprocmask.

Would now be a good time to ask this question again?

http://cygwin.com/ml/cygwin-developers/2006-07/msg00029.html

I assume the answer is still the same, though ;-(.

--
Brian Ford
Lead Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
the best safety device in any aircraft is a well-trained crew...
Re: Cygwin speed
On Fri, Mar 02, 2007 at 11:11:54AM -0800, Brian Dessent wrote:
> Vinod Gupta wrote:
>> Cygwin was slow by a factor of 3x. Is that normal?
>
> Yes. Emulation of POSIX functions which do not exist on Windows is
> expensive. Fork is especially bad, which is all you're really testing
> there.

Where is the *continual* fork in his script btw?

-cl
Cygwin speed
I ran the following loop under bash on three different machines:

    i=100
    while [ $i -gt 0 ]; do
        j=$(((i/3+i*3)**3))
        i=$((i-1))
    done

Here is how long it took:

    CPU               OS             Time (secs)
    ---               --             -----------
    P4/3.2GHz         Linux RHEL4    41
    Core Duo/2.2GHz   Mac OSX 10.4   43
    Core Duo/2.4GHz   WinXP+Cygwin   107

Cygwin was slow by a factor of 3x. Is that normal?

Vinod
Re: Cygwin speed
Vinod Gupta wrote:
> Cygwin was slow by a factor of 3x. Is that normal?

Yes. Emulation of POSIX functions which do not exist on Windows is expensive. Fork is especially bad, which is all you're really testing there.

Brian
Re: Cygwin speed
* Brian Dessent (Fri, 02 Mar 2007 11:11:54 -0800)
> Vinod Gupta wrote:
>> Cygwin was slow by a factor of 3x. Is that normal?
>
> Yes.

Actually no. The usual rough estimate is a factor of two, which corresponds to Vinod's tests.