(triaging old bugs)
Hello,
This long thread ( http://bugs.gnu.org/7489 )
deals with multiple parallel-sort bugs, resulting in many commits:
1d0a12037 Paul Eggert 2010-12-22 sort: minor performance tweak with
num_processors
41159f960 Pádraig Brady 2010-12-20 maint: fix a typo in sort
Chen Guo wrote:
...
I've attached the patch (inlined at the bottom). Here's the GDB
crash, with backtrace. I also printed node->queued in GDB, so it's
evident that it's accessible.
(gdb) run --parallel 2 rec_1M /dev/null
Starting program: /data/chen/Coding/Coreutils/test/sort-mutex
Chen Guo wrote:
Hi Professor Eggert,
On Sun, Dec 5, 2010 at 11:01 PM, Paul Eggert egg...@cs.ucla.edu wrote:
On 12/05/2010 09:16 PM, Chen Guo wrote:
Before saying anything else, I should note that for mutexes, on 4
threads 20% of the time there's a segfault on a seemingly innocuous
line in
Hi Jim,
On Tue, Dec 7, 2010 at 3:24 AM, Jim Meyering j...@meyering.net wrote:
Hi Chen,
Thanks. What does your input file look like?
I've been unable to reproduce the failure using the output of
seq 100. I've tried a few different -S ... options, in case
the amount of available memory
Hi Professor Eggert,
On Sun, Dec 5, 2010 at 11:01 PM, Paul Eggert egg...@cs.ucla.edu wrote:
On 12/05/2010 09:16 PM, Chen Guo wrote:
Before saying anything else, I should note that for mutexes, on 4
threads 20% of the time there's a segfault on a seemingly innocuous
line in queue_insert ():
On 12/05/10 03:21, Jim Meyering wrote:
seq -w 20 exp tac exp in
PATH=.:$PATH ./sort --compress-program=dzip -S 1k in out
That gets stuck in waitpid (from sort.c's reap), waiting for a
dzip invocation that appears will never terminate. This is also
on that same 4-core system,
Paul Eggert wrote:
On 11/29/2010 02:46 PM, Paul Eggert wrote:
My current guess, by the way,
is that it's not a bug that can be triggered: it's merely
useless code that is harmless and can safely be removed.
I removed it as part of the following series of cleanup
patches. These are intended
Hi Professor Eggert,
On Fri, Dec 3, 2010 at 1:10 PM, Paul Eggert egg...@cs.ucla.edu wrote:
On 12/03/10 12:18, Chen Guo wrote:
Either option (either switch to mutexes everywhere, or have the top-level
merge go to memory) should work. Perhaps we should try both and benchmark
them.
Test
On 12/05/2010 09:16 PM, Chen Guo wrote:
Before saying anything else, I should note that for mutexes, on 4
threads 20% of the time there's a segfault on a seemingly innocuous
line in queue_insert ():
node->queued = true
It does sound like mutexes are the way to go, and that this bug
needs to
On 11/29/2010 02:46 PM, Paul Eggert wrote:
My current guess, by the way,
is that it's not a bug that can be triggered: it's merely
useless code that is harmless and can safely be removed.
I removed it as part of the following series of cleanup
patches. These are intended merely to refactor
Thanks Jim, that helped a lot.
I'll try out Professor Eggert's suggestion, of switching to mutexes
only at the top level merge. Of the following approaches, which would
you guys consider better practice?
1) void pointer, cast as either mutex or spinlock in lock function
2) union of mutex and
On 12/03/10 12:18, Chen Guo wrote:
I'll try out Professor Eggert's suggestion, of switching to mutexes
only at the top level merge.
I'm having second thoughts about that. Yes, that'll prevent the
top-level merge (which is generating the actual output) from chewing
up CPU time. But it already
Hi Professor Eggert,
On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert egg...@cs.ucla.edu wrote:
(for i in $(seq 12); do read line; echo $i; sleep .1; done
cat > /dev/null) < fifo &
(ulimit -t 1; ./sort in > fifo \
|| echo killed via $(env kill -l $(expr $? - 128)))
I ran this 10 times or so on
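As a side note on the "$? - 128" arithmetic in the test case: many shells, bash among them, report a command killed by signal N with exit status 128 + N. A small sketch of the same decoding in C follows; the function name is hypothetical and is not part of coreutils.

```c
/* Hypothetical helper: given a shell exit status, return the signal
   number it encodes under the 128 + N convention, or 0 if the status
   does not look like a death by signal.  */
int
signal_from_shell_status (int status)
{
  return status > 128 ? status - 128 : 0;
}
```

Under that convention, the status 139 reported elsewhere in this thread next to "Segmentation fault" decodes to 11, which is SIGSEGV on Linux.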
On 12/02/10 02:22, Chen Guo wrote:
On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert egg...@cs.ucla.edu wrote:
(for i in $(seq 12); do read line; echo $i; sleep .1; done
cat > /dev/null) < fifo &
(ulimit -t 1; ./sort in > fifo \
|| echo killed via $(env kill -l $(expr $? - 128)))
I ran this 10
Chen Guo wrote:
Hi Professor Eggert,
On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert egg...@cs.ucla.edu wrote:
(for i in $(seq 12); do read line; echo $i; sleep .1; done
cat > /dev/null) < fifo &
(ulimit -t 1; ./sort in > fifo \
|| echo killed via $(env kill -l $(expr $? - 128)))
I ran this
Hi Professor Eggert,
On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert egg...@cs.ucla.edu wrote:
(for i in $(seq 12); do read line; echo $i; sleep .1; done
cat > /dev/null) < fifo &
(ulimit -t 1; ./sort in > fifo \
|| echo killed via $(env kill -l $(expr $? - 128)))
I ran this 10 times or so on
Jim Meyering wrote:
Paul Eggert wrote:
Could you please try this little patch? It should fix your
problem. I came up with this fix in my sleep (literally!
I woke up this morning and the patch was in my head), but
haven't had time to look at the code in this area to see
if it's the best
Paul Eggert wrote:
On 11/29/2010 08:32 PM, Chen Guo wrote:
Hi guys,
Is something up with Savannah? I just tried a git clone and got
connection time out; I can't even reach git.sv.gnu.org via ping.
There was a breakin, which led to leaking of encrypted account
passwords, some of them
On Tue, Nov 30, 2010 at 2:22 PM, Paul Eggert egg...@cs.ucla.edu wrote:
Hi Professor Eggert,
Anyway, perhaps Chen can review them (I don't have time
to test them right now).
I'll look at it as soon as Savannah's back up; I never actually pulled
from coreutils after the original patch was
On 11/30/10 13:41, Jim Meyering wrote:
Is there anything you'd like to add?
No, thanks, that looks good. I have some other patches
to clean things up in this area, but they can wait.
I hate to tease, so here is a draft of the cleanup patches.
Most of this stuff is cleanup, but the first line of
On 11/29/2010 08:32 PM, Chen Guo wrote:
Hi guys,
Is something up with Savannah? I just tried a git clone and got
connection time out; I can't even reach git.sv.gnu.org via ping.
There was a breakin, which led to leaking of encrypted account
passwords, some of them discovered via a
Hi guys,
Is something up with Savannah? I just tried a git clone and got
connection time out; I can't even reach git.sv.gnu.org via ping.
I'll try again at work tomorrow.
On 11/28/10 23:14, DJ Lucas wrote:
http://lists.gnu.org/archive/html/coreutils/2010-11/msg00124.html
Ah, sorry, I didn't understand that message and thought Pádraig
had handled it. On an 8-core RHEL 5.5 x86-64 host I reproduced
the problem with the stated test case:
(for i in $(seq 12); do
Paul Eggert wrote:
Could you please try this little patch? It should fix your
problem. I came up with this fix in my sleep (literally!
I woke up this morning and the patch was in my head), but
haven't had time to look at the code in this area to see
if it's the best fix.
Clearly there's
On 29/11/10 07:14, DJ Lucas wrote:
On 11/27/2010 08:18 PM, DJ Lucas wrote:
lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ cat
/lfs-source-archive/cracklib-words-20080507 | sort -u /dev/null; echo $?
0
lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$
Appears to work as expected.
Hi all,
On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert egg...@cs.ucla.edu wrote:
entirely and use mutexes instead. Perhaps a better fix would be to
use mutexes at the top level (where threads can write to a file and
therefore can wait) and to use spin locks at lower levels (where
threads are
On 11/29/10 16:34, Chen Guo wrote:
The only way this would work is if, when a struct is locked via mutex the only
threads trying to acquire the struct are trying to do so via mutex,
and no threads
are looking to lock via spinlock.
Yes, that's definitely the idea. Under either of my
On 11/27/2010 08:18 PM, DJ Lucas wrote:
lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ cat
/lfs-source-archive/cracklib-words-20080507 | sort -u /dev/null; echo $?
0
lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$
Appears to work as expected. Thanks for jumping on this so
On 11/26/2010 06:52 PM, Pádraig Brady wrote:
Hmm, seems like multiple threads are racing to update the
static saved variable in write_unique() ?
I don't think it's as simple as that. write_unique
is generating output, and when it is run it is supposed
to have exclusive access to the output
Following up on my previous email, it appears to me that
the following line in mergelines_node is weird:
node->dest -= lo_orig - node->lo + hi_orig - node->hi;
Surely there should be a * in front of that line?
(This does not fix the bug; perhaps it is a different bug?)
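To make the question about the missing * concrete, here is a small sketch of the difference, assuming dest is a pointer to a line pointer (struct line **). The struct and function names are hypothetical stand-ins, not the real sort.c definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions live in sort.c.  */
struct line { int key; };
struct node_sketch { struct line **dest; };

/* With the leading '*': move the line pointer that dest refers to
   back by N elements.  dest itself is unchanged.  */
void
rewind_target (struct node_sketch *node, size_t n)
{
  *node->dest -= n;
}

/* Without the '*': move dest itself back by N slots, so that it now
   refers to a different line pointer.  The old target is unchanged.  */
void
rewind_slot (struct node_sketch *node, size_t n)
{
  node->dest -= n;
}
```

Both forms compile when dest has this type, so the compiler would not flag a dropped *.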
Could you please try this little patch? It should fix your
problem. I came up with this fix in my sleep (literally!
I woke up this morning and the patch was in my head), but
haven't had time to look at the code in this area to see
if it's the best fix.
Clearly there's at least one more bug as
Sent to bug-coreutils too (no bug id currently AFAICT).
Bug only affects multi-byte locales. Take the following samples:
bash-4.1# zcat cracklib-words-20080507.gz | sort -u --debug file
echo $?
sort: using `en_US.UTF-8' sorting rules
Segmentation fault
bash-4.1# echo $?
139
bash-4.1#
Thanks for the bug report. Unfortunately,
I cannot reproduce the problem with coreutils 8.7, either on
RHEL 5.5 x86-64 or on Ubuntu 10.10 x86.
Which version of coreutils are you running? And on what
platform? How did you build it?
Can you reproduce it with --parallel=2? If not, which value
On 11/26/2010 05:24 PM, Paul Eggert wrote:
Thanks for the bug report. Unfortunately,
I cannot reproduce the problem with coreutils 8.7, either on
RHEL 5.5 x86-64 or on Ubuntu 10.10 x86.
Which version of coreutils are you running?
8.7. Haven't tested on 8.6 or 8.5. 8.4 worked correctly,
On 26/11/10 18:01, DJ Lucas wrote:
Sent to bug-coreutils too (no bug id currently AFAICT).
Bug only affects multi-byte locales. Take the following samples:
bash-4.1# zcat cracklib-words-20080507.gz | sort -u --debug file
echo $?
sort: using `en_US.UTF-8' sorting rules