bug#7489: [coreutils] over aggressive threads in sort

2018-10-30 Thread Assaf Gordon
(triaging old bugs) Hello, This long thread ( http://bugs.gnu.org/7489 ) deals with multiple parallel-sort bugs, resulting in many commits: 1d0a12037 Paul Eggert 2010-12-22 sort: minor performance tweak with num_processors 41159f960 Pádraig Brady 2010-12-20 maint: fix a typo in sort

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-12-07 Thread Jim Meyering
Chen Guo wrote: ... I've attached the patch (inlined at the bottom). Here's the GDB crash, with backtrace. I also printed node->queued in GDB, so it's evident that it's accessible. (gdb) run --parallel 2 rec_1M /dev/null Starting program: /data/chen/Coding/Coreutils/test/sort-mutex

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-12-07 Thread Jim Meyering
Chen Guo wrote: Hi Professor Eggert, On Sun, Dec 5, 2010 at 11:01 PM, Paul Eggert egg...@cs.ucla.edu wrote: On 12/05/2010 09:16 PM, Chen Guo wrote: Before saying anything else, I should note that for mutexes, on 4 threads 20% of the time there's a segfault on a seemingly innocuous line in

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-12-07 Thread Chen Guo
Hi Jim, On Tue, Dec 7, 2010 at 3:24 AM, Jim Meyering j...@meyering.net wrote: Hi Chen, Thanks.  What does your input file look like? I've been unable to reproduce the failure using the output of seq 100.  I've tried a few different -S ... options, in case the amount of available memory

bug#7489: [coreutils] over aggressive threads in sort

2010-12-07 Thread Jim Meyering
Chen Guo wrote: ... I've attached the patch (inlined at the bottom). Here's the GDB crash, with backtrace. I also printed node->queued in GDB, so it's evident that it's accessible. (gdb) run --parallel 2 rec_1M /dev/null Starting program: /data/chen/Coding/Coreutils/test/sort-mutex

bug#7489: [coreutils] over aggressive threads in sort

2010-12-06 Thread Chen Guo
Hi Professor Eggert, On Sun, Dec 5, 2010 at 11:01 PM, Paul Eggert egg...@cs.ucla.edu wrote: On 12/05/2010 09:16 PM, Chen Guo wrote: Before saying anything else, I should note that for mutexes, on 4 threads 20% of the time there's a segfault on a seemingly innocuous line in queue_insert ():  

bug#7489: [coreutils] over aggressive threads in sort

2010-12-06 Thread Paul Eggert
On 12/05/10 03:21, Jim Meyering wrote: seq -w 20 exp tac exp in PATH=.:$PATH ./sort --compress-program=dzip -S 1k in out That gets stuck in waitpid (from sort.c's reap), waiting for a dzip invocation that appears will never terminate. This is also on that same 4-core system,

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-12-05 Thread Jim Meyering
Paul Eggert wrote: On 11/29/2010 02:46 PM, Paul Eggert wrote: My current guess, by the way, is that it's not a bug that can be triggered: it's merely useless code that is harmless and can safely be removed. I removed it as part of the following series of cleanup patches. These are intended

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-12-05 Thread Chen Guo
Hi Professor Eggert, On Fri, Dec 3, 2010 at 1:10 PM, Paul Eggert egg...@cs.ucla.edu wrote: On 12/03/10 12:18, Chen Guo wrote: Either option (either switch to mutexes everywhere, or have the top-level merge go to memory) should work.  Perhaps we should try both and benchmark them. Test

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-12-05 Thread Paul Eggert
On 12/05/2010 09:16 PM, Chen Guo wrote: Before saying anything else, I should note that for mutexes, on 4 threads 20% of the time there's a segfault on a seemingly innocuous line in queue_insert (): node->queued = true It does sound like mutexes are the way to go, and that this bug needs to

bug#7489: [coreutils] over aggressive threads in sort

2010-12-05 Thread Jim Meyering
Paul Eggert wrote: On 11/29/2010 02:46 PM, Paul Eggert wrote: My current guess, by the way, is that it's not a bug that can be triggered: it's merely useless code that is harmless and can safely be removed. I removed it as part of the following series of cleanup patches. These are intended

bug#7489: [coreutils] over aggressive threads in sort

2010-12-05 Thread Chen Guo
Hi Professor Eggert, On Fri, Dec 3, 2010 at 1:10 PM, Paul Eggert egg...@cs.ucla.edu wrote: On 12/03/10 12:18, Chen Guo wrote: Either option (either switch to mutexes everywhere, or have the top-level merge go to memory) should work.  Perhaps we should try both and benchmark them. Test

bug#7489: [coreutils] over aggressive threads in sort

2010-12-05 Thread Paul Eggert
On 12/05/2010 09:16 PM, Chen Guo wrote: Before saying anything else, I should note that for mutexes, on 4 threads 20% of the time there's a segfault on a seemingly innocuous line in queue_insert (): node->queued = true It does sound like mutexes are the way to go, and that this bug needs to

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-12-04 Thread Paul Eggert
On 11/29/2010 02:46 PM, Paul Eggert wrote: My current guess, by the way, is that it's not a bug that can be triggered: it's merely useless code that is harmless and can safely be removed. I removed it as part of the following series of cleanup patches. These are intended merely to refactor

bug#7489: [coreutils] over aggressive threads in sort

2010-12-04 Thread Paul Eggert
On 11/29/2010 02:46 PM, Paul Eggert wrote: My current guess, by the way, is that it's not a bug that can be triggered: it's merely useless code that is harmless and can safely be removed. I removed it as part of the following series of cleanup patches. These are intended merely to refactor

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-12-03 Thread Chen Guo
Thanks Jim, that helped a lot. I'll try out Professor Eggert's suggestion, of switching to mutexes only at the top level merge. Of the following approaches, which would you guys consider better practice? 1) void pointer, cast as either mutex or spinlock in lock function 2) union of mutex and

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-12-03 Thread Paul Eggert
On 12/03/10 12:18, Chen Guo wrote: I'll try out Professor Eggert's suggestion, of switching to mutexes only at the top level merge. I'm having second thoughts about that. Yes, that'll prevent the top-level merge (which is generating the actual output) from chewing up CPU time. But it already

bug#7489: [coreutils] over aggressive threads in sort

2010-12-03 Thread Chen Guo
Thanks Jim, that helped a lot. I'll try out Professor Eggert's suggestion, of switching to mutexes only at the top level merge. Of the following approaches, which would you guys consider better practice? 1) void pointer, cast as either mutex or spinlock in lock function 2) union of mutex and

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-12-02 Thread Chen Guo
Hi Professor Eggert, On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert egg...@cs.ucla.edu wrote: (for i in $(seq 12); do read line; echo $i; sleep .1; done; cat > /dev/null) < fifo & (ulimit -t 1; ./sort in > fifo || echo killed via $(env kill -l $(expr $? - 128))) I ran this 10 times or so on

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-12-02 Thread Paul Eggert
On 12/02/10 02:22, Chen Guo wrote: On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert egg...@cs.ucla.edu wrote: (for i in $(seq 12); do read line; echo $i; sleep .1; done; cat > /dev/null) < fifo & (ulimit -t 1; ./sort in > fifo || echo killed via $(env kill -l $(expr $? - 128))) I ran this 10

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-12-02 Thread Jim Meyering
Chen Guo wrote: Hi Professor Eggert, On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert egg...@cs.ucla.edu wrote: (for i in $(seq 12); do read line; echo $i; sleep .1; done; cat > /dev/null) < fifo & (ulimit -t 1; ./sort in > fifo || echo killed via $(env kill -l $(expr $? - 128))) I ran this

bug#7489: [coreutils] over aggressive threads in sort

2010-12-02 Thread Chen Guo
Hi Professor Eggert, On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert egg...@cs.ucla.edu wrote: (for i in $(seq 12); do read line; echo $i; sleep .1; done; cat > /dev/null) < fifo & (ulimit -t 1; ./sort in > fifo || echo killed via $(env kill -l $(expr $? - 128))) I ran this 10 times or so on

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-12-01 Thread Jim Meyering
Jim Meyering wrote: Paul Eggert wrote: Could you please try this little patch? It should fix your problem. I came up with this fix in my sleep (literally! I woke up this morning and the patch was in my head), but haven't had time to look at the code in this area to see if it's the best

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-12-01 Thread Jim Meyering
Paul Eggert wrote: On 11/29/2010 08:32 PM, Chen Guo wrote: Hi guys, Is something up with Savannah? I just tried a git clone and got a connection timeout; I can't even reach git.sv.gnu.org via ping. There was a break-in, which led to leaking of encrypted account passwords, some of them

bug#7489: [coreutils] over aggressive threads in sort

2010-11-30 Thread Chen Guo
On Tue, Nov 30, 2010 at 2:22 PM, Paul Eggert egg...@cs.ucla.edu wrote: Hi Professor Eggert, Anyway, perhaps Chen can review them (I don't have time to test them right now). I'll look at it as soon as Savannah's back up; I never actually pulled from coreutils after the original patch was

bug#7489: [coreutils] over aggressive threads in sort

2010-11-30 Thread Paul Eggert
On 11/30/10 13:41, Jim Meyering wrote: Is there anything you'd like to add? No, thanks, that looks good. I have some other patches to clean things up in this area, but they can wait. I hate to tease, so here is a draft of the cleanup patches. Most of this stuff is cleanup, but the first line of

bug#7489: [coreutils] over aggressive threads in sort

2010-11-30 Thread Paul Eggert
On 11/29/2010 08:32 PM, Chen Guo wrote: Hi guys, Is something up with Savannah? I just tried a git clone and got a connection timeout; I can't even reach git.sv.gnu.org via ping. There was a break-in, which led to leaking of encrypted account passwords, some of them discovered via a

bug#7489: [coreutils] over aggressive threads in sort

2010-11-30 Thread Chen Guo
Hi guys, Is something up with Savannah? I just tried a git clone and got a connection timeout; I can't even reach git.sv.gnu.org via ping. I'll try again at work tomorrow.

bug#7489: [coreutils] over aggressive threads in sort

2010-11-30 Thread Jim Meyering
Jim Meyering wrote: Paul Eggert wrote: Could you please try this little patch? It should fix your problem. I came up with this fix in my sleep (literally! I woke up this morning and the patch was in my head), but haven't had time to look at the code in this area to see if it's the best

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-11-29 Thread Paul Eggert
On 11/28/10 23:14, DJ Lucas wrote: http://lists.gnu.org/archive/html/coreutils/2010-11/msg00124.html Ah, sorry, I didn't understand that message and thought Pádraig had handled it. On an 8-core RHEL 5.5 x86-64 host I reproduced the problem with the stated test case: (for i in $(seq 12); do

Re: bug#7489: [coreutils] over aggressive threads in sort

2010-11-29 Thread Jim Meyering
Paul Eggert wrote: Could you please try this little patch? It should fix your problem. I came up with this fix in my sleep (literally! I woke up this morning and the patch was in my head), but haven't had time to look at the code in this area to see if it's the best fix. Clearly there's

bug#7489: [coreutils] over aggressive threads in sort

2010-11-29 Thread Pádraig Brady
On 29/11/10 07:14, DJ Lucas wrote: On 11/27/2010 08:18 PM, DJ Lucas wrote: lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ cat /lfs-source-archive/cracklib-words-20080507 | sort -u > /dev/null; echo $? 0 lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ Appears to work as expected.

bug#7489: [coreutils] over aggressive threads in sort

2010-11-29 Thread Paul Eggert
On 11/28/10 23:14, DJ Lucas wrote: http://lists.gnu.org/archive/html/coreutils/2010-11/msg00124.html Ah, sorry, I didn't understand that message and thought Pádraig had handled it. On an 8-core RHEL 5.5 x86-64 host I reproduced the problem with the stated test case: (for i in $(seq 12); do

bug#7489: [coreutils] over aggressive threads in sort

2010-11-29 Thread Chen Guo
Hi all, On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert egg...@cs.ucla.edu wrote: entirely and use mutexes instead.  Perhaps a better fix would be to use mutexes at the top level (where threads can write to a file and therefore can wait) and to use spin locks at lower levels (where threads are

bug#7489: [coreutils] over aggressive threads in sort

2010-11-29 Thread Paul Eggert
On 11/29/10 16:34, Chen Guo wrote: The only way this would work is if, when a struct is locked via mutex the only threads trying to acquire the struct are trying to do so via mutex, and no threads are looking to lock via spinlock. Yes, that's definitely the idea. Under either of my

bug#7489: [coreutils] over aggressive threads in sort

2010-11-28 Thread DJ Lucas
On 11/27/2010 08:18 PM, DJ Lucas wrote: lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ cat /lfs-source-archive/cracklib-words-20080507 | sort -u > /dev/null; echo $? 0 lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ Appears to work as expected. Thanks for jumping on this so

bug#7489: [coreutils] over aggressive threads in sort

2010-11-27 Thread Paul Eggert
On 11/26/2010 06:52 PM, Pádraig Brady wrote: Hmm, seems like multiple threads are racing to update the static saved variable in write_unique() ? I don't think it's as simple as that. write_unique is generating output, and when it is run it is supposed to have exclusive access to the output

bug#7489: [coreutils] over aggressive threads in sort

2010-11-27 Thread Paul Eggert
Following up on my previous email, it appears to me that the following line in mergelines_node is weird: node->dest -= lo_orig - node->lo + hi_orig - node->hi; Surely there should be a * in front of that line? (This does not fix the bug; perhaps it is a different bug?)

bug#7489: [coreutils] over aggressive threads in sort

2010-11-27 Thread Paul Eggert
Could you please try this little patch? It should fix your problem. I came up with this fix in my sleep (literally! I woke up this morning and the patch was in my head), but haven't had time to look at the code in this area to see if it's the best fix. Clearly there's at least one more bug as

bug#7489: [coreutils] over aggressive threads in sort

2010-11-26 Thread DJ Lucas
Sent to bug-coreutils too (no bug id currently AFAICT). Bug only affects multi-byte locales. Take the following samples: bash-4.1# zcat cracklib-words-20080507.gz | sort -u --debug file echo $? sort: using `en_US.UTF-8' sorting rules Segmentation fault bash-4.1# echo $? 139 bash-4.1#
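
The exit status 139 in the report above is the shell's way of saying the command was killed by signal 11 (139 - 128), which is the same arithmetic the `kill -l $(expr $? - 128)` idiom used elsewhere in this thread relies on. A minimal sketch of that decoding follows; the helper name `signal_from_status` is hypothetical and not part of the thread or of coreutils, and the 128 offset is an assumption that holds for POSIX-style shells such as bash and dash (some other shells encode signals differently).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): turn an exit status into the
# name of the signal that terminated the command, if there was one.
signal_from_status() {
  status=$1
  if [ "$status" -gt 128 ]; then
    # Assumption: statuses above 128 mean "killed by signal (status - 128)".
    kill -l "$((status - 128))"
  else
    echo "exited normally with status $status"
  fi
}

# Decode the status from the bug report.
signal_from_status 139
```

On a typical Linux system this should report SEGV for 139, matching the "Segmentation fault" message in the report.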

bug#7489: [coreutils] over aggressive threads in sort

2010-11-26 Thread Paul Eggert
Thanks for the bug report. Unfortunately, I cannot reproduce the problem with coreutils 8.7, either on RHEL 5.5 x86-64 or on Ubuntu 10.10 x86. Which version of coreutils are you running? And on what platform? How did you build it? Can you reproduce it with --parallel=2? If not, which value

bug#7489: [coreutils] over aggressive threads in sort

2010-11-26 Thread DJ Lucas
On 11/26/2010 05:24 PM, Paul Eggert wrote: Thanks for the bug report. Unfortunately, I cannot reproduce the problem with coreutils 8.7, either on RHEL 5.5 x86-64 or on Ubuntu 10.10 x86. Which version of coreutils are you running? 8.7. Haven't tested on 8.6 or 8.5. 8.4 worked correctly,

bug#7489: [coreutils] over aggressive threads in sort

2010-11-26 Thread Pádraig Brady
On 26/11/10 18:01, DJ Lucas wrote: Sent to bug-coreutils too (no bug id currently AFAICT). Bug only affects multi-byte locales. Take the following samples: bash-4.1# zcat cracklib-words-20080507.gz | sort -u --debug file echo $? sort: using `en_US.UTF-8' sorting rules