Multithreaded sort hangs on Solaris
I have come across some odd results regarding the sort utility in coreutils version 8.20. I've looked through the archives and don't see any similar issues, so it may be something specific to our systems.

System: SunOS 5.10 Generic_147440-26 sun4u sparc SUNW,Sun-Fire-V890

Issue: When running sort on a 22.5 GB file, I found that about 1 out of 10 times the process seems to hang (out of 100+ tests). The process is still running, but the temp files are no longer changing, and the final file either has not been created or is a 0-byte file. When this happens, the temp files are never in the exact same state as in a previous run. On this machine a complete sort normally takes about 20 minutes; on one occasion the process hung for over 48 hours before I killed it. Running top shows no significant load on the system.

Command run:

  ./sort -t\n -S 256M --batch-size=100 \
    -T /disk/craiwk01/prod/SORTWK -T /disk/craiwk02/prod/SORTWK \
    -T /disk/craiwk03/prod/SORTWK -T /disk/craiwk04/prod/SORTWK \
    -T /disk/craiwk06/prod/SORTWK \
    -k1.1,1.10 infile -o infile.sorted

  : ps
    PID TTY      TIME CMD
  16328 pts/3    5:06 sort
  12697 pts/3    0:00 ps

  : sudo truss -rall -wall -f -p 16328
  16328: lwp_park(0x, 0)  (sleeping...)
  : sudo pstack 16328
  16328: /usr/local/abacus/etsort/sort -tn -S 295063 --batch-size=100 -T /disk/
  ----------------- lwp# 1 / thread# 1 -----------------
   7d4d8818 lwp_park (0, 0, 0)
   00019c74 sortlines (111b56580, 111c56080, 7fffeab0, 10012a321, 7fffead0, 10012a328) + 514
   0001a5cc sortlines (111558380, 2, 7fffeab0, 1121765e0, 0, 7fffeab0) + e6c
   0001a5cc sortlines (111956f80, 4, 7fffeab0, 112176420, 0, 7fffeab0) + e6c
   0001a5cc sortlines (112154760, 8, 7fffeab0, 1121760a0, 1, 7fffeab0) + e6c
   0001c070 sort (10012a740, 0, 7fffead0, 23, 10012cddd, 112154760) + 350
   0001e6e8 main (13, 7148, 0, 10012c220, fffd, 10012b1e0) + 1ee8
   000141bc _start (0, 0, 0, 0, 0, 0) + 7c
  ----------------- lwp# 240 / thread# 240 -----------------
   0001a600 sortlines_thread(), exit value = 0x
          ** zombie (exited, not detached, not yet joined) **
  ----------------- lwp# 241 / thread# 241 -----------------
   0001a600 sortlines_thread(), exit value = 0x
          ** zombie (exited, not detached, not yet joined) **
  ----------------- lwp# 242 / thread# 242 -----------------
   0001a600 sortlines_thread(), exit value = 0x
          ** zombie (exited, not detached, not yet joined) **

If I change the sort to run as a single-threaded process (adding --parallel=1 to the above command), it doesn't hang. This makes me think it's most likely a threading issue. I ran the same tests on a Linux machine and it did not have the same hanging issue, so it's most likely a Solaris-only problem.

I initially found this issue using coreutils 8.9 and changed to 8.20 to see if a fix had been made, but no luck.

Is this a known issue? Are there any additional tests I should run to further narrow it down?

Thanks,
Jeff
[PATCH] cp: do not create empty dst file if failed to make reflink
If making a reflink across devices, or if the filesystem does not support reflink, we want it to return an error and do nothing else. However, it currently creates a new empty file at the dst. Fix it.

Test case in an ext4 filesystem:

  $ ls
  foo
  $ cp --reflink foo bar
  cp: failed to clone `bar': Inappropriate ioctl for device
  $ ls
  foo  bar
  $

Signed-off-by: Guangyu Sun <guangyu@oracle.com>
---
 src/copy.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/copy.c b/src/copy.c
index 5c0ee1e..b323876 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1176,6 +1176,9 @@ close_src_and_dst_desc:
       error (0, errno, _("failed to close %s"), quote (dst_name));
       return_val = false;
     }
+  if (! return_val && *new_dst)
+    if (unlink (dst_name))
+      error (0, errno, _("cannot remove %s"), quote (dst_name));
 close_src_desc:
   if (close (source_desc) < 0)
     {
-- 
1.7.9.5
Re: [PATCH] shuf: use reservoir-sampling when possible
Hello,

Pádraig Brady wrote, On 03/07/2013 06:26 PM:
> On 03/07/2013 07:32 PM, Assaf Gordon wrote:
>> Pádraig Brady wrote, On 03/06/2013 08:24 PM:
>>> On 03/06/2013 11:50 PM, Assaf Gordon wrote:
>>>> Attached is a suggestion to implement reservoir-sampling in shuf:
>>>> When the expected number of output lines is known, it will not load
>>>> the entire file into memory - allowing shuffling very large inputs.
>>>>
>>>>   static size_t
>>>>   read_input_reservoir_sampling (FILE *in, char eolbyte, char ***pline,
>>>>                                  size_t k, struct randint_source *s)
>>>>   ...
>>>>     struct linebuffer *rsrv = XCALLOC (k, struct linebuffer); /* init reservoir */
>>>
>>> Since this change is mainly about efficient mem usage, we should
>>> probably handle the case where we have small inputs but large k.
>>> This will allocate (and zero) memory up front.  The zeroing will
>>> defeat any memory overcommit configured on the system, but it's
>>> probably better to avoid the large initial commit and realloc as
>>> required (not per line, but per 1K lines maybe).

Attached is an updated version (mostly a re-write of the memory allocation part), as per the comment above. It also includes a very_expensive valgrind test to exercise the new code. (The other patch is the uniform-distribution randomness test.)

 -gordon

From 0ff2403dde869af3f9a44dd7418aae3082d8c0aa Mon Sep 17 00:00:00 2001
From: Assaf Gordon <assafgor...@gmail.com>
Date: Thu, 7 Mar 2013 01:57:57 -0500
Subject: [PATCH 1/2] shuf: add (expensive) test for randomness

To run manually:
  make check TESTS=tests/misc/shuf-randomness.sh \
       SUBDIRS=. RUN_VERY_EXPENSIVE_TESTS=yes

* tests/misc/shuf-randomness.sh: Run 'shuf' repeatedly, and check
  whether the output is uniformly distributed enough.
* tests/local.mk: Add new test script.
---
 tests/local.mk                |    1 +
 tests/misc/shuf-randomness.sh |  187 ++++++++++++++++++++++++++++++++++
 2 files changed, 188 insertions(+), 0 deletions(-)
 create mode 100755 tests/misc/shuf-randomness.sh

diff --git a/tests/local.mk b/tests/local.mk
index 607ddc4..d3923f8 100644
--- a/tests/local.mk
+++ b/tests/local.mk
@@ -313,6 +313,7 @@ all_tests = \
   tests/misc/shred-passes.sh \
   tests/misc/shred-remove.sh \
   tests/misc/shuf.sh \
+  tests/misc/shuf-randomness.sh \
   tests/misc/sort.pl \
   tests/misc/sort-benchmark-random.sh \
   tests/misc/sort-compress.sh \
diff --git a/tests/misc/shuf-randomness.sh b/tests/misc/shuf-randomness.sh
new file mode 100755
index 000..c0b9e2e
--- /dev/null
+++ b/tests/misc/shuf-randomness.sh
@@ -0,0 +1,187 @@
+#!/bin/sh
+# Test shuf for somewhat uniform randomness
+
+# Copyright (C) 2013 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
+print_ver_ shuf
+getlimits_
+
+# Don't run these tests by default.
+very_expensive_
+
+# Number of trials
+T=1000
+
+# Number of categories
+N=100
+REQUIRED_CHI_SQUARED=200  # Be extremely lenient:
+                          # don't require great goodness-of-fit,
+                          # even for our assumed 99 degrees of freedom.
+
+# K - when testing reservoir-sampling, print K lines
+K=20
+REQUIRED_CHI_SQUARED_K=50 # Be extremely lenient:
+                          # don't require great goodness-of-fit,
+                          # even for our assumed 19 degrees of freedom.
+
+
+# The input: many zeros followed by 1 one
+(yes 0 | head -n $((N-1)) ; echo 1) > in || framework_failure_
+
+
+is_uniform()
+{
+  # Input is assumed to be a string of $T space-separated values
+  # between 1 and $N.
+  LINES=$1
+
+  # Convert spaces to newlines
+  LINES=$(echo "$LINES" | tr ' ' '\n' | sed '/^$/d') || framework_failure_
+
+  # Require exactly $T values
+  COUNT=$(echo "$LINES" | wc -l)
+  test "$COUNT" -eq "$T" || framework_failure_
+
+  # HIST is the histogram of counts per category
+  # (categories are between 1 and $N)
+  HIST=$(echo "$LINES" | sort -n | uniq -c)
+
+  #DEBUG
+  #echo "HIST=$HIST" 1>&2
+
+  ## Calculate Chi-Squared
+  CHI=$(echo "$HIST" |
+        awk -v n=$N -v t=$T '{ counts[$2] = $1 }
+            END {
+              exptd = ((1.0)*t)/n
+              chi = 0
+              for (i=1; i<=n; ++i)
+                {
+                  if (i in counts)
+
bug#13927: stat --printf %t, %T flags (major and minor device types) don't work on mount points
If I run stat --printf='%D' /mountpoint, the result is 10ca70, which is correct. However, if I run stat --printf='%t %T' /mountpoint, the result is erroneously "0 0". If I instead run stat against the device directly (stat --printf='%t %T' /dev/xvdx), I get the correct result of "ca 170".

I believe the proper fix is to replace (in stat.c):

  out_uint_x (pformat, prefix_len, major (statbuf->st_rdev));

with:

  out_uint_x (pformat, prefix_len, major (statbuf->st_dev));

That is, use statbuf->st_dev instead of st_rdev, which is what the %d and %D directives use.

I'm using coreutils 8.9, compiled from source, and this is the output of uname -a:

  Linux ip-10-39-122-238 2.6.32-276.el6.x86_64 #1 SMP Tue May 29 17:38:19 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

Thanks for your time.

- Tyler