Multithreaded sort hangs on Solaris

2013-03-11 Thread McFarland, Jeffrey
I have come across some odd results regarding the sort utility in coreutils 
version 8.20.  I've looked through the archives and don't see any similar 
issues so it may be something specific to our systems.

System:  SunOS 5.10 Generic_147440-26 sun4u sparc SUNW,Sun-Fire-V890

Issue:  When running sort on a 22.5 GB file, roughly 1 run in 10 hangs (observed 
over 100+ tests).  The process is still running, but the temp files stop changing 
and the final output file either has not been created or is a 0-byte file.  When 
this happens the temp files are never in exactly the same state as in a previous 
run.  On this machine a complete sort normally takes about 20 minutes; on one 
occasion the process hung for over 48 hours before I killed it.  Running top 
shows no significant load on the system.


Command run:

./sort -t\n -S 256M --batch-size=100 -T /disk/craiwk01/prod/SORTWK -T 
/disk/craiwk02/prod/SORTWK -T /disk/craiwk03/prod/SORTWK -T 
/disk/craiwk04/prod/SORTWK -T /disk/craiwk06/prod/SORTWK -k1.1,1.10 infile -o 
infile.sorted



: ps
   PID TTY TIME CMD
 16328 pts/3   5:06 sort
 12697 pts/3   0:00 ps

: sudo truss -rall -wall -f -p 16328
16328:  lwp_park(0x, 0) (sleeping...)


: sudo pstack 16328
16328:  /usr/local/abacus/etsort/sort -tn -S 295063 --batch-size=100 -T /disk/
-  lwp# 1 / thread# 1  
7d4d8818 lwp_park (0, 0, 0)
00019c74 sortlines (111b56580, 111c56080, 7fffeab0, 10012a321, 7fffead0, 10012a328) + 514
0001a5cc sortlines (111558380, 2, 7fffeab0, 1121765e0, 0, 7fffeab0) + e6c
0001a5cc sortlines (111956f80, 4, 7fffeab0, 112176420, 0, 7fffeab0) + e6c
0001a5cc sortlines (112154760, 8, 7fffeab0, 1121760a0, 1, 7fffeab0) + e6c
0001c070 sort (10012a740, 0, 7fffead0, 23, 10012cddd, 112154760) + 350
0001e6e8 main (13, 7148, 0, 10012c220, fffd, 10012b1e0) + 1ee8
000141bc _start (0, 0, 0, 0, 0, 0) + 7c
-  lwp# 240 / thread# 240  
0001a600 sortlines_thread(), exit value = 0x
** zombie (exited, not detached, not yet joined) **
-  lwp# 241 / thread# 241  
0001a600 sortlines_thread(), exit value = 0x
** zombie (exited, not detached, not yet joined) **
-  lwp# 242 / thread# 242  
0001a600 sortlines_thread(), exit value = 0x
** zombie (exited, not detached, not yet joined) **

If I change the sort to run single-threaded (adding --parallel=1 to the above 
command), it does not hang, which makes me think this is a threading issue.  I 
ran the same tests on a Linux machine and did not see the hang there, so the 
problem appears to be specific to Solaris.
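
For context, the pstack output above has the classic shape of a lost
condition-variable wakeup: the worker LWPs have already exited as unjoined
zombies while the main thread stays parked inside sortlines.  The following is
only a minimal, hypothetical C illustration of that general pattern, not
coreutils code: if the waiter does not re-check a shared predicate in a loop,
a signal delivered before the wait starts is simply lost and the waiter parks
forever.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static bool done = false;

static void *
worker (void *arg)
{
  (void) arg;
  /* ... real work would happen here ... */
  pthread_mutex_lock (&lock);
  done = true;
  pthread_cond_signal (&done_cond);   /* may fire before main() starts waiting */
  pthread_mutex_unlock (&lock);
  return NULL;                        /* stays a zombie until joined */
}

int
main (void)
{
  pthread_t tid;
  pthread_create (&tid, NULL, worker, NULL);

  sleep (1);   /* let the worker finish (and signal) first */

  pthread_mutex_lock (&lock);
  /* BUGGY: waits unconditionally.  The signal already happened, so this
     call parks forever (lwp_park on Solaris).  Correct code would loop:
         while (!done)
           pthread_cond_wait (&done_cond, &lock);                        */
  pthread_cond_wait (&done_cond, &lock);
  pthread_mutex_unlock (&lock);

  pthread_join (tid, NULL);
  puts ("never reached in the buggy version");
  return 0;
}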

I initially found this issue with coreutils 8.9 and upgraded to 8.20 to see if a 
fix had been made, but no luck.

Is this a known issue?  Are there any additional tests I should run to further 
narrow down this issue?

Thanks,

Jeff





[PATCH] cp: do not create empty dst file if failed to make reflink

2013-03-11 Thread Guangyu Sun

When making a reflink across devices, or when the filesystem does not support
reflink, cp should return an error and do nothing else. Currently, however, it
creates a new empty file at the destination. Fix that.

test case in an ext4 filesystem:
$ ls
foo
$ cp --reflink foo bar
cp: failed to clone `bar': Inappropriate ioctl for device
$ ls
foo bar
$

Signed-off-by: Guangyu Sun guangyu@oracle.com
---
 src/copy.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/copy.c b/src/copy.c
index 5c0ee1e..b323876 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1176,6 +1176,9 @@ close_src_and_dst_desc:
       error (0, errno, _("failed to close %s"), quote (dst_name));
       return_val = false;
     }
+  if (! return_val && *new_dst)
+    if (unlink (dst_name))
+      error (0, errno, _("cannot remove %s"), quote (dst_name));
 close_src_desc:
   if (close (source_desc) < 0)
     {
--
1.7.9.5
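
For context, a minimal standalone sketch of the clone-then-clean-up behaviour
the patch aims for.  This is hypothetical illustration code, not the coreutils
implementation (copy.c wraps the ioctl in its own helper), and it assumes the
Linux FICLONE ioctl; older kernels and headers expose the same operation as
BTRFS_IOC_CLONE.

#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>     /* FICLONE (BTRFS_IOC_CLONE on older headers) */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Clone src into a newly created dst; remove dst again if the clone fails,
   so a failed reflink does not leave a stray empty file behind.  */
static int
clone_file (const char *src, const char *dst)
{
  int sfd = open (src, O_RDONLY);
  if (sfd < 0)
    return -1;

  /* Creating dst here is what leaves the empty file when the clone fails.  */
  int dfd = open (dst, O_WRONLY | O_CREAT | O_EXCL, 0644);
  if (dfd < 0)
    {
      close (sfd);
      return -1;
    }

  /* Fails with ENOTTY or EXDEV when the filesystem cannot reflink.  */
  int ret = ioctl (dfd, FICLONE, sfd);
  int saved_errno = errno;

  close (dfd);
  close (sfd);

  if (ret != 0)
    {
      unlink (dst);     /* the point of the patch: undo the file creation */
      errno = saved_errno;
    }
  return ret;
}

int
main (int argc, char **argv)
{
  if (argc != 3)
    {
      fprintf (stderr, "usage: %s SRC DST\n", argv[0]);
      return 2;
    }
  if (clone_file (argv[1], argv[2]) != 0)
    {
      perror ("clone failed");
      return 1;
    }
  return 0;
}

The "Inappropriate ioctl for device" message in the test case above is ENOTTY
from exactly such an ioctl call on a filesystem without reflink support.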



Re: [PATCH] shuf: use reservoir-sampling when possible

2013-03-11 Thread Assaf Gordon
Hello,

Pádraig Brady wrote, On 03/07/2013 06:26 PM:
> On 03/07/2013 07:32 PM, Assaf Gordon wrote:
>> Pádraig Brady wrote, On 03/06/2013 08:24 PM:
>>> On 03/06/2013 11:50 PM, Assaf Gordon wrote:
>>>> Attached is a suggestion to implement reservoir-sampling in shuf:
>>>> When the expected number of output lines is known, it will not load the
>>>> entire file into memory - allowing shuffling of very large inputs.
>
> static size_t
> read_input_reservoir_sampling (FILE *in, char eolbyte, char ***pline,
>                                size_t k, struct randint_source *s)
> ...
>   struct linebuffer *rsrv = XCALLOC (k, struct linebuffer); /* init reservoir */
>
> Since this change is mainly about efficient mem usage we should probably handle
> the case where we have small inputs but large k.  This will allocate (and zero)
> memory up front. The zeroing will defeat any memory overcommit configured on the
> system, but it's probably better to avoid the large initial commit and realloc
> as required (not per line, but per 1K lines maybe).
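
As a rough illustration of the allocation strategy suggested above, and not the
proposed shuf code itself: the sketch below grows the reservoir in 1024-line
chunks, capped at K, so a short input never triggers a large zero-filled
allocation up front.  Error handling is omitted and a toy RNG stands in for
gnulib's randint.

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

enum { CHUNK = 1024 };   /* grow the reservoir this many slots at a time */

int
main (int argc, char **argv)
{
  size_t k = (argc > 1) ? strtoul (argv[1], NULL, 10) : 10;  /* sample size */
  char **rsrv = NULL;     /* the reservoir, grown lazily */
  size_t alloced = 0;     /* slots allocated so far */
  size_t n = 0;           /* input lines seen so far */
  char *line = NULL;
  size_t cap = 0;

  srand ((unsigned) time (NULL));   /* toy RNG only */

  while (getline (&line, &cap, stdin) != -1)
    {
      if (n < k)
        {
          /* Fill phase: allocate in CHUNK-sized steps, capped at k, so a
             short input never forces a huge up-front allocation.  */
          if (n == alloced)
            {
              alloced = (alloced + CHUNK < k) ? alloced + CHUNK : k;
              rsrv = realloc (rsrv, alloced * sizeof *rsrv);
            }
          rsrv[n] = strdup (line);
        }
      else
        {
          /* Replacement phase (Algorithm R): keep the new line with
             probability k / (n + 1) by overwriting a random slot.  */
          size_t j = (size_t) rand () % (n + 1);
          if (j < k)
            {
              free (rsrv[j]);
              rsrv[j] = strdup (line);
            }
        }
      n++;
    }

  /* Emit the sample (its order is not separately shuffled in this sketch).  */
  for (size_t i = 0; i < (n < k ? n : k); i++)
    fputs (rsrv[i], stdout);
  return 0;
}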



Attached is an updated version (mostly a rewrite of the memory-allocation part),
as per the comment above.
It also includes a very_expensive valgrind test to exercise the new code.
(The other patch is the uniform-distribution randomness test.)

-gordon
From 0ff2403dde869af3f9a44dd7418aae3082d8c0aa Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Thu, 7 Mar 2013 01:57:57 -0500
Subject: [PATCH 1/2] shuf: add (expensive) test for randomness

To run manually:
  make check TESTS=tests/misc/shuf-randomness.sh \
 SUBDIRS=. RUN_VERY_EXPENSIVE_TESTS=yes

* tests/misc/shuf-randomness.sh: run 'shuf' repeatedly, and check if the
output is uniformly distributed enough.
* tests/local.mk: add new test script.
---
 tests/local.mk|1 +
 tests/misc/shuf-randomness.sh |  187 +
 2 files changed, 188 insertions(+), 0 deletions(-)
 create mode 100755 tests/misc/shuf-randomness.sh

diff --git a/tests/local.mk b/tests/local.mk
index 607ddc4..d3923f8 100644
--- a/tests/local.mk
+++ b/tests/local.mk
@@ -313,6 +313,7 @@ all_tests =	\
   tests/misc/shred-passes.sh			\
   tests/misc/shred-remove.sh			\
   tests/misc/shuf.sh\
+  tests/misc/shuf-randomness.sh			\
   tests/misc/sort.pl\
   tests/misc/sort-benchmark-random.sh		\
   tests/misc/sort-compress.sh			\
diff --git a/tests/misc/shuf-randomness.sh b/tests/misc/shuf-randomness.sh
new file mode 100755
index 000..c0b9e2e
--- /dev/null
+++ b/tests/misc/shuf-randomness.sh
@@ -0,0 +1,187 @@
+#!/bin/sh
+# Test shuf for somewhat uniform randomness
+
+# Copyright (C) 2013 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. ${srcdir=.}/tests/init.sh; path_prepend_ ./src
+print_ver_ shuf
+getlimits_
+
+# Don't run these tests by default.
+very_expensive_
+
+# Number of trials
+T=1000
+
+# Number of categories
+N=100
+REQUIRED_CHI_SQUARED=200 # Be extremely lenient:
+ # don't require great goodness of fit
+ # even for our assumed 99 degrees of freedom
+
+# K - when testing reservoir-sampling, print K lines
+K=20
+REQUIRED_CHI_SQUARED_K=50 # Be extremely lenient:
+  # don't require great goodness of fit
+  # even for our assumed 19 degrees of freedom
+
+
+
+# The input: many zeros followed by a single 1
+(yes 0 | head -n $((N-1)) ; echo 1 ) > in || framework_failure_
+
+
+is_uniform()
+{
+  # Input is assumed to be a string of $T space-separated values
+  # between 1 and $N
+  LINES=$1
+
+  # Convert spaces to new-lines
+  LINES=$(echo $LINES | tr ' ' '\n' | sed '/^$/d') || framework_failure_
+
+  # Require exactly $T values
+  COUNT=$(echo $LINES | wc -l)
+  test $COUNT -eq $T || framework_failure_
+
+  # HIST is the histogram of counts per categories
+  #  ( categories are between 1 and $N )
+  HIST=$(echo $LINES | sort -n | uniq -c)
+
+  #DEBUG
+  #echo HIST=$HIST 1>&2
+
+  ## Calculate Chi-Squared
+  CHI=$( echo $HIST |
+ awk -v n=$N -v t=$T '{ counts[$2] = $1 }
+  END {
+  exptd = ((1.0)*t)/n
+  chi = 0
+  for (i=1;i<=n;++i)
+  {
+if (i in counts)
+   
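
[The quoted script is cut off here in the archive.  For reference only, a small
standalone sketch of the chi-squared goodness-of-fit computation the test relies
on (hypothetical C, not the test's awk code): with T trials spread over N
categories, the expected count per category is T/N, and chi^2 is the sum of
(observed - expected)^2 / expected, compared against a lenient threshold such as
200 for the assumed 99 degrees of freedom.]

#include <stdio.h>

int
main (void)
{
  /* Example sizes matching the test: T = 1000 trials over N = 100 categories. */
  enum { N = 100, T = 1000 };
  long observed[N];
  double expected = (double) T / N;
  double chi2 = 0.0;

  /* Read one count per category from stdin (toy input handling).  */
  for (int i = 0; i < N; i++)
    if (scanf ("%ld", &observed[i]) != 1)
      observed[i] = 0;

  for (int i = 0; i < N; i++)
    {
      double d = observed[i] - expected;
      chi2 += d * d / expected;
    }

  /* The test passes when chi2 stays below the (very lenient) threshold.  */
  printf ("chi-squared = %g\n", chi2);
  return 0;
}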

bug#13927: stat --printf %t, %T flags (major and minor device types) don't work on mount points

2013-03-11 Thread Tyler Hobbs
If I run stat --printf='%D' /mountpoint, the result is 10ca70, which is correct.
However, if I run stat --printf='%t %T' /mountpoint, the result is
erroneously 0 0.  If I instead run stat against the device directly (stat
--printf='%t %T' /dev/xvdx), I get the correct result of ca 170.

I believe the proper fix is to replace (in stat.c):

  out_uint_x (pformat, prefix_len, major (statbuf->st_rdev));

with:

  out_uint_x (pformat, prefix_len, major (statbuf->st_dev));

That is, use statbuf->st_dev instead of st_rdev, which is what the %d and
%D directives use.
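
For reference, a small hypothetical helper (not stat.c code) that prints both
fields side by side: st_rdev is the device an inode represents, so it is only
non-zero for character/block special files such as /dev/xvdx, while st_dev is
the device containing the file, which is what %d/%D report.

#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */

int
main (int argc, char **argv)
{
  for (int i = 1; i < argc; i++)
    {
      struct stat st;
      if (stat (argv[i], &st) != 0)
        {
          perror (argv[i]);
          continue;
        }
      printf ("%-20s st_dev = %x:%x   st_rdev = %x:%x\n", argv[i],
              major (st.st_dev), minor (st.st_dev),
              major (st.st_rdev), minor (st.st_rdev));
    }
  return 0;
}

Run on the examples above, this should show st_rdev of 0:0 for the mount point
and ca:170 for /dev/xvdx; note that the mount point's %D value of 10ca70 decodes
to major ca, minor 170 under the Linux dev_t encoding, consistent with the
numbers reported.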


I'm using coreutils 8.9, compiled from source, and this is the output of
uname -a:

Linux ip-10-39-122-238 2.6.32-276.el6.x86_64 #1 SMP Tue May 29 17:38:19 EDT
2012 x86_64 x86_64 x86_64 GNU/Linux

Thanks for your time.
- Tyler