[Bug libstdc++/47433] libstdc++ parallel mode calls std::swap explicitely
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47433 --- Comment #5 from Johannes Singler singler at kit dot edu 2011-01-24 13:19:22 UTC --- What are you proposing for a fix? Omitting std::? Using std::iter_swap where appropriate, like stl_algo.h mostly does? The latter would be more consistent. Johannes
[Bug libstdc++/47433] libstdc++ parallel mode calls std::swap explicitely
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47433 --- Comment #6 from Johannes Singler singler at kit dot edu 2011-01-24 13:23:26 UTC --- Taking __key by value in (some variants of) __delete_min_insert makes sense, since it is also used as a buffer for storing the current loser. Having a local variable that is initialized from the const ref would have the same effect, would it not? Johannes
[Bug libstdc++/47433] libstdc++ parallel mode calls std::swap explicitely
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47433 --- Comment #9 from Johannes Singler singler at kit dot edu 2011-01-24 13:55:33 UTC --- I have made the attached minimal patch. Use std::iter_swap where possible, use swap for _Tp, and leave std::swap for built-in types. I will test and then submit the patch. Johannes
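The three cases named in the comment above can be sketched roughly as follows. This is only an illustration, not the attached patch: the namespace demo, the type Widget and the helpers generic_swap, swap_ends and swap_usage_example are hypothetical names made up for this example.

```cpp
#include <algorithm>  // std::iter_swap
#include <cassert>
#include <utility>    // std::swap
#include <vector>

namespace demo {  // hypothetical user namespace

struct Widget {
  int value;
  static int custom_swaps;  // counts calls to the user-provided swap
};
int Widget::custom_swaps = 0;

// User-provided swap. A qualified call to std::swap never reaches it;
// an unqualified call finds it through argument-dependent lookup.
void swap(Widget& a, Widget& b) {
  ++Widget::custom_swaps;
  std::swap(a.value, b.value);  // built-in type: std::swap is fine here
}

}  // namespace demo

// Case "swap for _Tp": unqualified call, with std::swap as the fallback.
template <typename Tp>
void generic_swap(Tp& a, Tp& b) {
  using std::swap;
  swap(a, b);
}

// Case "iter_swap where possible": swap the first and last element of a
// non-empty range given by bidirectional iterators.
template <typename It>
void swap_ends(It first, It last) {
  if (first != last) std::iter_swap(first, --last);
}

// Small self-check of the helpers above.
void swap_usage_example() {
  demo::Widget a{1}, b{2};
  generic_swap(a, b);
  assert(a.value == 2 && b.value == 1);
  assert(demo::Widget::custom_swaps == 1);  // the user-provided swap was used

  std::vector<int> v{1, 2, 3};
  swap_ends(v.begin(), v.end());
  assert(v.front() == 3 && v.back() == 1);
}
```

With a qualified std::swap(a, b) in generic_swap, the counter in Widget would stay at zero, which is the behaviour this bug report is about.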
[Bug libstdc++/47433] libstdc++ parallel mode calls std::swap explicitely
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47433 --- Comment #10 from Johannes Singler singler at kit dot edu 2011-01-24 13:57:16 UTC --- Created attachment 23098 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=23098 Minimal patch: avoid std::swap on template types.
[Bug libstdc++/47437] New: libstdc++ parallel mode: multiway_merge does not compile
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47437

           Summary: libstdc++ parallel mode: multiway_merge does not compile
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: blocker
          Priority: P3
         Component: libstdc++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: sing...@kit.edu
                CC: sing...@kit.edu, paolo.carl...@oracle.com, manuel.holtgr...@fu-berlin.de
        Depends on: 47433

Mainline currently fails to compile the parallel mode multiway merge facilities because of a spurious mutable qualifier on a reference member. This is a serious regression. The fix is easy, though, and is currently undergoing testing.
[Bug libstdc++/47437] libstdc++ parallel mode: multiway_merge does not compile
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47437 --- Comment #1 from Johannes Singler singler at kit dot edu 2011-01-24 14:48:42 UTC --- Created attachment 23101 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=23101 Remove mutable qualifier from reference member.
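For readers wondering why the qualifier has to go: C++ does not allow mutable on a reference member, and it is not needed either, because a const member function may still modify the object the reference refers to. A minimal sketch, with a hypothetical type Counter that is not taken from the parallel mode sources:

```cpp
#include <cassert>

// Hypothetical stand-in for a class holding a reference member.
struct Counter {
  int& hits;  // plain reference member; "mutable int& hits;" is ill-formed

  explicit Counter(int& h) : hits(h) {}

  // const only protects the members of Counter itself. The int that
  // hits refers to lives elsewhere and may still be modified.
  void record() const { ++hits; }
};

// Small self-check of the type above.
void counter_example() {
  int n = 0;
  const Counter c(n);
  c.record();
  assert(n == 1);
}
```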
[Bug libstdc++/47437] libstdc++ parallel mode: multiway_merge does not compile
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47437

Johannes Singler singler at kit dot edu changed:

                 What|Removed                           |Added
------------------------------------------------------------------------------
               Status|UNCONFIRMED                       |NEW
     Last reconfirmed|                                  |2011.01.24 14:49:38
           AssignedTo|unassigned at gcc dot gnu.org     |singler at kit dot edu
       Ever Confirmed|0                                 |1
[Bug libstdc++/47437] [4.6 Regression] libstdc++ parallel mode: multiway_merge does not compile
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47437

Johannes Singler singler at kit dot edu changed:

                 What|Removed                           |Added
------------------------------------------------------------------------------
               Status|NEW                               |RESOLVED
           Resolution|                                  |FIXED

--- Comment #4 from Johannes Singler singler at kit dot edu 2011-01-24 16:52:37 UTC ---
Fixed by above commit.
[Bug libstdc++/47433] libstdc++ parallel mode calls std::swap explicitely
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47433

Johannes Singler singler at kit dot edu changed:

                 What|Removed                           |Added
------------------------------------------------------------------------------
               Status|NEW                               |RESOLVED
           Resolution|                                  |FIXED

--- Comment #13 from Johannes Singler singler at kit dot edu 2011-01-24 17:10:05 UTC ---
Fixed in mainline.
[Bug libgomp/43706] scheduling two threads on one core leads to starvation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706

--- Comment #26 from Johannes Singler singler at kit dot edu 2010-11-15 08:53:12 UTC ---
(In reply to comment #25)
> You might have misread what I wrote. I did not mention 35 tests; I mentioned that a test became slower by 35%. The total number of different tests was 4 (and each was invoked multiple times per spincount setting, indeed). One out of four stayed 35% slower until I increased GOMP_SPINCOUNT to 20.

Sorry, I got that wrong.

> This makes some sense, but the job of an optimizing compiler and runtime libraries is to deliver the best performance they can even with somewhat non-optimal source code.

I agree with that in principle. But please be reminded that as is, there is the very simple testcase posted, which takes a serious performance hit. And repeated parallel loops like the one in the test program certainly appear very often in real applications. BTW: How does the testcase react to this change on your machine?

> There are plenty of real-world cases where spending time on application redesign for speed is unreasonable or can only be completed at a later time - yet it is desirable to squeeze a little bit of extra performance out of the existing code. There are also cases where more efficient parallelization - implemented at a higher level to avoid frequent switches between parallel and sequential execution - makes the application harder to use. To me, one of the very reasons to use OpenMP was to avoid/postpone that redesign and the user-visible complication for now. If I went for a more efficient higher-level solution, I would not need OpenMP in the first place.

OpenMP should not be regarded as only good for loop parallelization. With the new task construct, it is a fully-fledged parallelization substrate.

> > So I would suggest a threshold of 10 for now.
>
> My suggestion is 25.

Well, that's already much better than staying with 20,000,000, so I agree.

> > IMHO, something should really happen to this problem before the 4.6 release.
>
> Agreed. It'd be best to have a code fix, though.

IMHO, there is no obvious way to fix this in principle. There will always be a compromise between busy waiting and giving back control to the OS. Jakub, what do you plan to do about this problem?
[Bug libgomp/43706] scheduling two threads on one core leads to starvation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706 --- Comment #24 from Johannes Singler singler at kit dot edu 2010-11-12 08:15:56 UTC --- If only one out of 35 tests becomes slower, I would rather blame it on this one (probably badly parallelized) application, not on the OpenMP runtime system. So I would suggest a threshold of 10 for now. IMHO, something should really happen to this problem before the 4.6 release.
[Bug libgomp/43706] scheduling two threads on one core leads to starvation
--- Comment #20 from singler at kit dot edu 2010-08-30 08:41 ---
Maybe we could agree on a compromise for a start. Alexander, what are the corresponding results for GOMP_SPINCOUNT=10?

--

singler at kit dot edu changed:

                 What|Removed                           |Added
------------------------------------------------------------------------------
                   CC|                                  |singler at kit dot edu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706
[Bug libgomp/43706] scheduling two threads on one core leads to starvation
--- Comment #16 from singler at kit dot edu 2010-08-13 15:48 ---
I would really like to see this bug tackled. It has been confirmed two more times. Fixing it is easily done by lowering the spin count as proposed. Otherwise, please show cases where a low spin count hurts performance. In general, for a tuning parameter, a good-natured value should be preferred over one that gives the best results in one case but very bad ones in another.

--

singler at kit dot edu changed:

                 What|Removed                           |Added
------------------------------------------------------------------------------
               Status|UNCONFIRMED                       |NEW
       Ever Confirmed|0                                 |1
Last reconfirmed date|0000-00-00 00:00:00               |2010-08-13 15:48:18
     Target Milestone|4.4.5                             |4.5.2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706
[Bug bootstrap/44439] New: Configure states wrong required versions for GMP, MPFR, and MPC
In case of wrong prerequisite versions, configure says

  Building GCC requires GMP 4.2+, MPFR 2.3.1+ and MPC 0.8.0+

but the prerequisites page http://gcc.gnu.org/install/prerequisites.html says

  GNU Multiple Precision Library (GMP) version 4.3.2 (or later)
  MPFR Library version 2.4.2 (or later)
  MPC Library version 0.8.1 (or later)

which is apparently more correct. This could cause some installation headaches.

--
           Summary: Configure states wrong required versions for GMP, MPFR, and MPC
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: bootstrap
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: singler at kit dot edu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44439
[Bug libstdc++/44417] make check-target-libstdc++-v3 fails due to undefined ptrdiff_t
--- Comment #11 from singler at kit dot edu 2010-06-07 09:35 ---
Obviously, I'm not the only one having this problem; Jason has patched libstdc++-v3/testsuite/util/testsuite_abi.h in the meantime.

r160313 | jason | 2010-06-05 15:13:46 +0200 (Sat, 05 Jun 2010) | 1 line

* testsuite/util/testsuite_abi.h: Work around glibc BZ 9694.

+// Include stddef now to work around glibc BZ 9694
+#include <stddef.h>

related to http://sourceware.org/bugzilla/show_bug.cgi?id=9694

This solves the problem for testsuite_abi.h, but then the next test fails, namely testsuite_allocator.h. I have tested all that using a different user ID, with a completely clean environment and a new checkout, but the problem persists.

--

singler at kit dot edu changed:

                 What|Removed                           |Added
------------------------------------------------------------------------------
                   CC|                                  |jason at redhat dot com
               Status|RESOLVED                          |UNCONFIRMED
           Resolution|WORKSFORME                        |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44417
[Bug libstdc++/44417] New: make check-target-libstdc++-v3 fails due to undefined ptrdiff_t
make check-target-libstdc++-v3 fails because ptrdiff_t is undefined. std::ptrdiff_t works. Maybe this bug is related to the Linux system it is run on. I have openSuse 11.1 running.

configure --enable-languages=c,c++ --program-suffix=-rep --prefix=$HOME/gcc/install_trunk_1

In file included from /home/singler/gcc/trunk_1/libstdc++-v3/testsuite/util/testsuite_abi.h:27:0,
                 from /home/singler/gcc/trunk_1/libstdc++-v3/testsuite/util/testsuite_abi.cc:23:
/home/singler/gcc/trunk_1/libstdc++-v3/libsupc++/cxxabi.h:371:5: error: 'ptrdiff_t' does not name a type
/home/singler/gcc/trunk_1/libstdc++-v3/libsupc++/cxxabi.h:447:23: error: 'ptrdiff_t' has not been declared
/home/singler/gcc/trunk_1/libstdc++-v3/libsupc++/cxxabi.h:459:18: error: 'ptrdiff_t' has not been declared
/home/singler/gcc/trunk_1/libstdc++-v3/libsupc++/cxxabi.h:469:26: error: 'ptrdiff_t' has not been declared
/home/singler/gcc/trunk_1/libstdc++-v3/libsupc++/cxxabi.h:495:18: error: 'ptrdiff_t' has not been declared
/home/singler/gcc/trunk_1/libstdc++-v3/libsupc++/cxxabi.h:501:26: error: 'ptrdiff_t' has not been declared
/home/singler/gcc/trunk_1/libstdc++-v3/libsupc++/cxxabi.h:540:18: error: 'ptrdiff_t' has not been declared
/home/singler/gcc/trunk_1/libstdc++-v3/libsupc++/cxxabi.h:546:26: error: 'ptrdiff_t' has not been declared
/home/singler/gcc/trunk_1/libstdc++-v3/libsupc++/cxxabi.h:566:4: error: 'ptrdiff_t' has not been declared
compiler exited with status 1

Any ideas and/or a workaround?

--
           Summary: make check-target-libstdc++-v3 fails due to undefined ptrdiff_t
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: singler at kit dot edu
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44417
[Bug libstdc++/44417] make check-target-libstdc++-v3 fails due to undefined ptrdiff_t
--- Comment #2 from singler at kit dot edu 2010-06-04 14:16 ---
I had cleaned the builddir already. Adding #include <cstddef> solves the problem. The crucial file seems to be lib/gcc/x86_64-unknown-linux-gnu/4.6.0/include/stddef.h. ptrdiff_t is defined in the global scope only if this file is (indirectly) included. Maybe on other systems, ptrdiff_t is also declared somewhere else, so the problem does not appear.

--

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44417
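The portable way to avoid depending on such indirect includes is to include <cstddef> and spell the type as std::ptrdiff_t; whether the unqualified global name is also declared by <cstddef> is left to the implementation. A minimal sketch, where distance_between is a hypothetical helper written only for this illustration:

```cpp
#include <cassert>
#include <cstddef>  // guarantees std::ptrdiff_t; ::ptrdiff_t is not guaranteed

// Number of elements between two pointers into the same array.
std::ptrdiff_t distance_between(const int* first, const int* last) {
  assert(first <= last);  // precondition of this sketch
  return last - first;
}
```

Code that needs the global name ptrdiff_t should include <stddef.h> directly instead of relying on another header pulling it in.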
[Bug middle-end/44416] [4.6 regression] Failed to build 447.dealII in SPEC CPU 2006
--- Comment #3 from singler at kit dot edu 2010-06-04 14:19 ---
Bug 44417 is very likely to have the same cause, but there, we can reproduce it more easily, using the testsuite.

*** This bug has been marked as a duplicate of 44417 ***

--

singler at kit dot edu changed:

                 What|Removed                           |Added
------------------------------------------------------------------------------
               Status|UNCONFIRMED                       |RESOLVED
           Resolution|                                  |DUPLICATE

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44416
[Bug libstdc++/44417] make check-target-libstdc++-v3 fails due to undefined ptrdiff_t
--- Comment #3 from singler at kit dot edu 2010-06-04 14:19 ---
*** Bug 44416 has been marked as a duplicate of this bug. ***

--

singler at kit dot edu changed:

                 What|Removed                           |Added
------------------------------------------------------------------------------
                   CC|                                  |hjl dot tools at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44417
[Bug libgomp/43706] scheduling two threads on one core leads to starvation
--- Comment #13 from singler at kit dot edu 2010-04-23 14:17 ---
The default spin count is not 2,000,000 cycles, but even 20,000,000. As commented in libgomp/env.c, this is supposed to correspond to 200ms. The timings we see here are even larger, but the number of cycles is just a rough estimate.

Throttling the spin count when too many threads are detected is a good idea, but it is just a heuristic. If there are other processes, the cores might be loaded anyway, and libgomp has little chance of figuring that out. This gets even more difficult when multiple programs use libgomp at the same time.

So I would like the non-throttling value to be chosen more conservatively, better balancing worst-case behavior in difficult situations against best-case behavior on an unloaded machine. There are algorithms in the libstdc++ parallel mode that show speedups for less than 1ms of sequential running time (when taking threads from the pool), so users will accept a parallelization overhead for such small computing times. However, if they are then hit by a 200ms penalty, this results in catastrophic slowdowns. Calling such short-lived parallel regions several times will make this very noticeable, although it need not be.

So IMHO, by default, the spinning should take about as long as rescheduling a thread (that was already migrated to another core) takes, thereby making things at most twice as bad as in the best case. From my experience, this is a matter of a few milliseconds, so I propose to lower the default spin count to something like 10,000, at most 100,000. I think that spinning for even longer than a usual time slice, as is done now, is questionable anyway.

Are nested threads taken into account when deciding whether to throttle or not?

--

singler at kit dot edu changed:

                 What|Removed                           |Added
------------------------------------------------------------------------------
               Status|RESOLVED                          |UNCONFIRMED
           Resolution|FIXED                             |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706
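To make the trade-off discussed above concrete, here is a minimal sketch of a bounded spin followed by a blocking wait. It is not libgomp's implementation; Gate, WaitResult, wait_for_gate and release are hypothetical names, and the spin limit is simply a parameter standing in for the spin count. The idea is that a waiter spins for at most spin_limit polls and only then hands control back to the OS, so the cost of a wrong guess is bounded by the spin budget.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical one-shot gate that waiting threads can spin or block on.
struct Gate {
  std::atomic<bool> open{false};
  std::mutex m;
  std::condition_variable cv;
};

// Which path the waiter took.
enum class WaitResult { Spun, Blocked };

// Poll the flag up to spin_limit times, then fall back to a blocking wait.
WaitResult wait_for_gate(Gate& g, unsigned long spin_limit) {
  for (unsigned long i = 0; i < spin_limit; ++i) {
    if (g.open.load(std::memory_order_acquire)) return WaitResult::Spun;
  }
  // Spin budget used up: take the blocking path. The predicate form of
  // wait returns immediately if the gate has been opened in the meantime.
  std::unique_lock<std::mutex> lock(g.m);
  g.cv.wait(lock, [&g] { return g.open.load(std::memory_order_acquire); });
  return WaitResult::Blocked;
}

// Open the gate and wake all blocked waiters.
void release(Gate& g) {
  {
    std::lock_guard<std::mutex> lock(g.m);
    g.open.store(true, std::memory_order_release);
  }
  g.cv.notify_all();
}

// Single-threaded self-check: with the gate already open, a positive spin
// budget returns from the spin loop, a zero budget takes the blocking path.
void gate_example() {
  Gate g;
  release(g);
  assert(wait_for_gate(g, 10) == WaitResult::Spun);
  assert(wait_for_gate(g, 0) == WaitResult::Blocked);
}
```

If spin_limit is chosen so that spinning costs about as much as one reschedule, the waiter pays at most roughly twice the cost of the better of the two strategies, which is the argument made in the comment.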
[Bug middle-end/39154] Miscompilation of VLAs in nested parallel regions
--- Comment #3 from singler at kit dot edu 2010-04-20 14:04 --- Can this old bug be closed? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39154
[Bug libstdc++/33485] parallel v3: do not use __builtin_alloca, use VLA
--- Comment #17 from singler at kit dot edu 2010-02-09 10:49 --- The actual problem has vanished, but maybe it would still be nice to use VLAs in the appropriate places. We can close the bug as fixed/invalid, or reprioritize it as an enhancement and leave it open. Either is fine with me. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33485
[Bug libstdc++/42712] search_n/iterator.cc times out in parallel-mode
--- Comment #3 from singler at kit dot edu 2010-01-18 09:08 --- Paolo, you were right, it was just the fallback switch missing for this case. And since this specific test issues many thousands of calls with very small input, the overhead was very noticeable. Patch upcoming... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42712
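The kind of fallback switch meant here can be sketched as a size check in front of the parallel code path. This is an illustration only, not the upcoming patch: kMinParallelSize, Path and search_n_with_fallback are hypothetical names, the threshold value is arbitrary, and the parallel branch is just a placeholder that calls the sequential algorithm.

```cpp
#include <algorithm>  // std::search_n
#include <cassert>
#include <cstddef>
#include <iterator>   // std::distance

// Hypothetical threshold below which going parallel does not pay off.
const std::size_t kMinParallelSize = 1000;

// Which code path was chosen, reported for illustration.
enum class Path { Sequential, Parallel };

template <typename It, typename Size, typename T>
It search_n_with_fallback(It first, It last, Size count, const T& value,
                          Path* taken) {
  assert(taken != nullptr);
  const std::size_t n = static_cast<std::size_t>(std::distance(first, last));
  if (n < kMinParallelSize) {
    // Small input: thread startup would dominate, so stay sequential.
    *taken = Path::Sequential;
    return std::search_n(first, last, count, value);
  }
  *taken = Path::Parallel;
  // Placeholder: a real implementation would dispatch to the parallel
  // algorithm here. This sketch reuses the sequential one.
  return std::search_n(first, last, count, value);
}
```

Without such a check, a test that issues many thousands of calls on tiny ranges pays the parallelization overhead on every single call.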
[Bug libstdc++/42624] libstdc++ parallel mode deadlocks in barrier
--- Comment #16 from singler at kit dot edu 2010-01-15 14:29 ---
First, let's remove the superfluous #pragma omp single in two occurrences, to make things simpler (see attached patch for trunk). The problem still persists: the program deadlocks. When dropping in some prints (see attached patch), the log ends like this:

find going parallel, requesting 2 thread
thread 0 of 2 starts
thread 0 finished
thread 1 of 2 starts
thread 1 finished
successful join
find going parallel, requesting 2 thread
thread 0 of 2 starts
thread 0 finished

Analysis: In the last region, thread 1 never starts (or at least does not reach the first printf). In general, for more threads, only thread 0 starts. This obviously leads to the deadlock. So at first sight, I would blame it on the OpenMP implementation. Maybe there is still some interference with the pthreads. Any other explanations?

--

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42624
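The print instrumentation described above can be sketched like this. It is not the attached patch: RegionTrace and traced_region are hypothetical names, and the region does no real work. The sketch assumes compilation with -fopenmp; without it the pragmas are ignored and the region runs on a single thread.

```cpp
#include <cassert>
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Counts how many threads entered and left the parallel region.
struct RegionTrace {
  int started;
  int finished;
};

RegionTrace traced_region(int requested_threads) {
  assert(requested_threads > 0);
  RegionTrace trace{0, 0};
  std::printf("going parallel, requesting %d threads\n", requested_threads);

  #pragma omp parallel num_threads(requested_threads)
  {
#ifdef _OPENMP
    const int id = omp_get_thread_num();
    const int n = omp_get_num_threads();
#else
    const int id = 0, n = 1;  // pragmas ignored: one thread only
#endif
    std::printf("thread %d of %d starts\n", id, n);
    #pragma omp atomic
    ++trace.started;

    // The actual work of the region would go here.

    #pragma omp atomic
    ++trace.finished;
    std::printf("thread %d finished\n", id);
  }

  std::printf("successful join\n");
  return trace;
}
```

In a healthy run, every "starts" line has a matching "finished" line and the join message follows. A thread that never prints its "starts" line, as in the log above, points at the runtime rather than at the code inside the region.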
[Bug libstdc++/42624] libstdc++ parallel mode deadlocks in barrier
--- Comment #17 from singler at kit dot edu 2010-01-15 14:30 --- Created an attachment (id=19616) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19616&action=view) Removes superfluous pragma omp single twice -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42624
[Bug libstdc++/42624] libstdc++ parallel mode deadlocks in barrier
--- Comment #18 from singler at kit dot edu 2010-01-15 14:30 --- Created an attachment (id=19617) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19617&action=view) Add printf debug statements. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42624
[Bug libstdc++/42624] libstdc++ parallel mode deadlocks in barrier
--- Comment #14 from singler at kit dot edu 2010-01-13 13:53 ---
(In reply to comment #13)
> This code is compiled with -fno-exceptions, could that be a problem?

No, that should rather help. Still, it is very difficult to debug this. Is there at least a way to access clamd's stdout and/or stderr?

--

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42624
[Bug libstdc++/42624] libstdc++ parallel mode deadlocks in barrier
--- Comment #6 from singler at kit dot edu 2010-01-12 12:36 ---
Can I get this thing to run without actually installing it into the system?

5. clamd/clamd -c etc/clamd.conf
LibClamAV Error: cl_load(): Can't get status of /usr/local/share/clamav
ERROR: Can't get file status

Please enter the GCC version into the Reported against field.
What happens for OMP_NUM_THREADS=1?
I will look thoroughly into the find implementation in the meantime.

--

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42624
[Bug libstdc++/42624] libstdc++ parallel mode deadlocks in barrier
--- Comment #10 from singler at kit dot edu 2010-01-12 14:35 ---
Can reproduce deadlock now.

--

singler at kit dot edu changed:

                 What|Removed                           |Added
------------------------------------------------------------------------------
           AssignedTo|unassigned at gcc dot gnu dot org |singler at kit dot edu
               Status|UNCONFIRMED                       |ASSIGNED
       Ever Confirmed|0                                 |1
Last reconfirmed date|0000-00-00 00:00:00               |2010-01-12 14:35:01

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42624
[Bug libstdc++/42624] libstdc++ parallel mode deadlocks in barrier
--- Comment #11 from singler at kit dot edu 2010-01-12 14:35 ---
(In reply to comment #9)
> Could this bug be related to this one: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36242#c4

This bug is invalid for GCC 4.4.

> Clamd creates threads using pthread_create, std::find is called from those threads. There are also threads that only poll/dispatch, and never use the STL (hence never uses openmp). However the gcc manual doesn't mention incompatibility between pthread_create and openmp (or libstdc++ parallel mode).

It should work nevertheless.

--

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42624
[Bug libstdc++/42624] libstdc++ parallel mode deadlocks in barrier
--- Comment #12 from singler at kit dot edu 2010-01-12 17:42 --- Thread 1 waits for its colleagues, but where have they gone? Is it possible that an exception is thrown inside find (by means of the value type or the predicate)? I don't fully trust gdb in this case, but it shows that an iterator range of (NULL, NULL) had to be searched. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42624
[Bug libstdc++/42712] search_n/iterator.cc times out in parallel-mode
--- Comment #1 from singler at kit dot edu 2010-01-12 17:43 ---
Maybe rather an endless loop.

--

singler at kit dot edu changed:

                 What|Removed                           |Added
------------------------------------------------------------------------------
           AssignedTo|unassigned at gcc dot gnu dot org |singler at kit dot edu
               Status|UNCONFIRMED                       |ASSIGNED
       Ever Confirmed|0                                 |1
Last reconfirmed date|0000-00-00 00:00:00               |2010-01-12 17:43:17

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42712