Re: Generate random numbers with shuf

2013-07-10 Thread Pádraig Brady
On 07/05/2013 10:43 PM, Assaf Gordon wrote:
> On 07/05/2013 12:12 PM, Pádraig Brady wrote:
>> On 07/05/2013 07:04 PM, Assaf Gordon wrote:
>>> Hello,
>>>
>>>>> Regarding old discussion here:
>>>>> http://lists.gnu.org/archive/html/coreutils/2011-02/msg00030.html
>>>>>
>>>>> Attached is a patch which adds --repetition option to shuf, enabling
>>>>> random number generation with repetitions.
>>>>
>>>> I like this.
>>>> --repetition seems to be a very good interface too,
>>>> since it aligns with standard math nomenclature in regard to permutations.
>>>>
>>>> I'd prefer to generalize it though, to supporting stdin as well as -i.
>>>
>>> Attached is an updated patch, supporting --repetitions with STDIN/FILE/-e
>>> (using the naive implementation ATM).
>>> e.g.
>>> $ shuf --repetitions --head-count=100 --echo Head Tail
>>> or
>>> $ shuf -r -n100 -e Head Tail
>>
>> Excellent thanks.
>>
>>> But the code is getting a bit messy, I guess from evolving features over
>>> time.
>>> I'd like to re-organize it a bit, re-factor some functions and make the
>>> code clearer - what do you think?
>>> it will make the code slightly more verbose (and slightly bigger), but
>>> shouldn't change the running performance.
>>
>> If you're getting your head around the code enough to refactor,
>> then it would be great if you could handle the TODO: item in shuf.c
>
> Attached is an updated patch, with some code cleanups (not including said
> TODO item yet).
>
> -gordon

I've split to two patches.
1. Unrelated test improvements.
2. All the rest

Note in both patches I made adjustments to the tests like

-c=$(cat exp | wc -l) || framework_failure_
+c=$(wc -l < exp) || framework_failure_

-c=$(cat exp | sort -nu | fmt ) || framework_failure_
+c=$(sort -nu exp | paste -s -d ' ') || framework_failure_

I.E. avoid cat unless needed, and paste is more general than fmt in this usage.
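
The two idioms can be tried outside the test suite. The following is only a minimal sketch: the file name exp follows the test script, but its contents are made up for illustration, and an explicit "-" operand is passed to paste so that it reads stdin on implementations that require a file operand.

```shell
#!/bin/sh
# Sample data standing in for the test's "exp" file (contents are illustrative).
printf '%s\n' 3 1 2 3 1 > exp

# Redirect instead of piping through cat: wc reads the file on stdin,
# so it prints only the count, with no file name attached.
c=$(wc -l < exp)

# Sort numerically, drop duplicates, then join everything onto one line
# with single spaces between the values.
joined=$(sort -nu exp | paste -s -d ' ' -)

echo "lines=$c unique=$joined"
rm -f exp
```

Unlike fmt, paste -s does not wrap long output at a column width, which is presumably what "more general" refers to here.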

Also I simplified the --help a little like:

-  -r, --repetitions         output COUNT values, with repetitions.\n\
-                            with -iLO-HI, output random numbers.\n\
-                            with -e, stdin or FILE, output random lines.\n\
-                            count defaults to 1 if -n COUNT is not used.\n\
+  -r, --repetitions         output COUNT items, allowing repetition.\n\
+                              -n 1 is implied if not specified.\n\

I'll push the 2 attached patches soon.

thanks!
Pádraig.
From f20a3407a8ae8488b2e7434f75738b219a2320be Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Fri, 5 Jul 2013 14:59:44 -0600
Subject: [PATCH 1/2] tests: add more tests for shuf option combinations

* tests/misc/shuf.sh: Add tests for erroneous conditions
like multiple '-o' and '--random-source'.
---
 tests/misc/shuf.sh |   29 +
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/tests/misc/shuf.sh b/tests/misc/shuf.sh
index 3e33b61..492fd41 100755
--- a/tests/misc/shuf.sh
+++ b/tests/misc/shuf.sh
@@ -65,4 +65,33 @@ if ! test -r unreadable; then
   shuf -n1 unreadable && fail=1
 fi
 
+# Multiple -n is accepted, should use the smallest value
+shuf -n10 -i0-9 -n3 -n20 > exp || framework_failure_
+c=$(wc -l < exp) || framework_failure_
+test "$c" -eq 3 || { fail=1; echo "Multiple -n failed">&2 ; }
+
+# Test error conditions
+
+# -i and -e must not be used together
+: | shuf -i -e A B &&
+  { fail=1; echo "shuf did not detect erroneous -e and -i usage.">&2 ; }
+# Test invalid value for -n
+: | shuf -nA &&
+  { fail=1; echo "shuf did not detect erroneous -n usage.">&2 ; }
+# Test multiple -i
+shuf -i0-9 -n10 -i8-90 &&
+  { fail=1; echo "shuf did not detect multiple -i usage.">&2 ; }
+# Test invalid range
+for ARG in '1' 'A' '1-' '1-A'; do
+  shuf -i$ARG &&
+    { fail=1; echo "shuf did not detect erroneous -i$ARG usage.">&2 ; }
+done
+
+# multiple -o are forbidden
+shuf -i0-9 -o A -o B &&
+  { fail=1; echo "shuf did not detect erroneous multiple -o usage.">&2 ; }
+# multiple random-sources are forbidden
+shuf -i0-9 --random-source A --random-source B &&
+  { fail=1; echo "shuf did not detect multiple --random-source usage.">&2 ; }
+
 Exit $fail
-- 
1.7.7.6
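
Two of the conditions exercised by the tests above can be checked by hand, outside the coreutils test framework. This is a rough sketch only: it assumes a GNU shuf is on PATH, uses a scratch directory created with mktemp (not part of the patch), and the variable names are made up for illustration.

```shell
#!/bin/sh
# Work in a scratch directory so stray output files are easy to clean up.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Several -n options: the smallest count is expected to win.
c=$(shuf -n10 -i0-9 -n3 -n20 | wc -l)
echo "lines with -n10 -n3 -n20: $c"

# Repeated -o is expected to be rejected; record the outcome
# instead of aborting the script.
if shuf -i0-9 -o A -o B 2>/dev/null; then
  multi_o=accepted
else
  multi_o=rejected
fi
echo "multiple -o: $multi_o"

cd / && rm -rf "$dir"
```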


From 349eda8cb0765621979d8fd8b58c21e9c5d49073 Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Thu, 4 Jul 2013 13:26:45 -0600
Subject: [PATCH 2/2] shuf: add --repetition to support repetition in output

main(): Process new option.  Replace input_numbers_option_used()
with a local variable.  Re-organize argument processing.
usage(): Describe the new option.
(write_random_numbers): A new function to generate a
permutation of the specified input range with repetition.
(write_random_lines): Likewise for stdin and --echo.
(write_permuted_numbers):  New function refactored from
write_permuted_output().
(write_permuted_lines): Likewise.
* tests/misc/shuf.sh: Add tests for --repetitions option.
* doc/coreutils.texi: Mention --repetitions, add examples.
* TODO: Mention an optimization to avoid needing to
read all of the input into memory with --repetitions.
* NEWS: Mention new shuf option.
---
 NEWS   |3 +
 doc/coreutils.texi |   37 

Re: Generate random numbers with shuf

2013-07-10 Thread Assaf Gordon

On 07/10/2013 09:20 AM, Pádraig Brady wrote:
> I've split to two patches.
> 1. Unrelated test improvements.
> 2. All the rest

...

> Note in both patches I made adjustments to the tests [...]

...

> I.E. avoid cat unless needed, and paste is more general than fmt in this usage.

...

> Also I simplified the --help a little [...]

Indeed, looks more concise and much better. I keep on learning...

> I'll push the 2 attached patches soon.

Thanks!
 -gordon





Re: Generate random numbers with shuf

2013-07-10 Thread Pádraig Brady
On 07/11/2013 12:54 AM, Assaf Gordon wrote:
> On 07/10/2013 09:20 AM, Pádraig Brady wrote:
>> I've split to two patches.
>> 1. Unrelated test improvements.
>> 2. All the rest
>
> ...
>
>> Note in both patches I made adjustments to the tests [...]
> ...
>> I.E. avoid cat unless needed, and paste is more general than fmt in this
>> usage.
> ...
>> Also I simplified the --help a little [...]
>
> Indeed, looks more concise and much better. I keep on learning...
>
>> I'll push the 2 attached patches soon.


pushed,

thanks!
Pádraig.




Re: Generate random numbers with shuf

2013-07-05 Thread Assaf Gordon

Hello,

On 07/04/2013 05:40 PM, Pádraig Brady wrote:
> On 07/04/2013 09:41 PM, Assaf Gordon wrote:
>> Regarding old discussion here:
>> http://lists.gnu.org/archive/html/coreutils/2011-02/msg00030.html
>>
>> Attached is a patch which adds --repetition option to shuf, enabling random
>> number generation with repetitions.
>
> I like this.
> --repetition seems to be a very good interface too,
> since it aligns with standard math nomenclature in regard to permutations.
>
> I'd prefer to generalize it though, to supporting stdin as well as -i.


Attached is an updated patch, supporting --repetitions with STDIN/FILE/-e 
(using the naive implementation ATM).
e.g.
  $ shuf --repetitions --head-count=100 --echo Head Tail
or
  $ shuf -r -n100 -e Head Tail
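
As a rough illustration of what the Head/Tail example produces, the sketch below tallies the output. It assumes a shuf build that already includes this patch's -r option. The short form is used because the long spelling changes within this thread (--repetition, then --repetitions), and -n is always given explicitly. The variable names are made up for illustration.

```shell
#!/bin/sh
# Simulate 100 coin flips by sampling two items with repetition.
flips=$(shuf -r -n100 -e Head Tail)

total=$(printf '%s\n' "$flips" | wc -l)
heads=$(printf '%s\n' "$flips" | grep -c '^Head$')
tails=$(printf '%s\n' "$flips" | grep -c '^Tail$')
echo "total=$total heads=$heads tails=$tails"

# Without -r the output is a permutation of the input, so asking for
# 100 lines from a two-item input still yields only two lines.
plain=$(shuf -n100 -e Head Tail | wc -l)
echo "without -r: $plain lines"
```

The heads/tails split varies from run to run; only the total is fixed by -n.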


But the code is getting a bit messy, I guess from evolving features over time.
I'd like to re-organize it a bit, re-factor some functions and make the code 
clearer - what do you think?
it will make the code slightly more verbose (and slightly bigger), but 
shouldn't change the running performance.

-gordon



From 9e14bf963eb27faed847a979677fb5f344c27362 Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Fri, 5 Jul 2013 11:58:16 -0600
Subject: [PATCH 0/7] *** SUBJECT HERE ***

*** BLURB HERE ***

Assaf Gordon (7):
  shuf: add --repetition to generate random numbers
  shuf: add tests for --repetition option
  shuf: mention new --repetition option in NEWS
  shuf: document new --repetition option
  shuf: enable --repetition on stdin/FILE/-e input
  shuf: add tests for --repetition with STDIN
  shuf: document new --repetitions option

 NEWS   |  3 +++
 doc/coreutils.texi | 37 ++
 src/shuf.c | 66 --
 tests/misc/shuf.sh | 63 +++
 4 files changed, 162 insertions(+), 7 deletions(-)

-- 
1.8.3.2

From c41160016ed36fe5b4e2b3d03cde34e0dcec84b6 Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Thu, 4 Jul 2013 13:26:45 -0600
Subject: [PATCH 1/7] shuf: add --repetition to generate random numbers

* src/shuf.c: new option (-r,--repetition), generate random numbers.
main(): process new option.
usage(): mention new option.
write_random_numbers(): generate random numbers.
---
 src/shuf.c | 50 ++
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/src/shuf.c b/src/shuf.c
index 0fabb0b..cdc3151 100644
--- a/src/shuf.c
+++ b/src/shuf.c
@@ -76,6 +76,9 @@ Write a random permutation of the input lines to standard output.\n\
   -n, --head-count=COUNT    output at most COUNT lines\n\
   -o, --output=FILE         write result to FILE instead of standard output\n\
       --random-source=FILE  get random bytes from FILE\n\
+  -r, --repetition          used with -iLO-HI, output COUNT random numbers\n\
+                            between LO and HI, with repetitions.\n\
+                            count defaults to 1 if -n COUNT is not used.\n\
   -z, --zero-terminated     end lines with 0 byte, not newline\n\
 "), stdout);
   fputs (HELP_OPTION_DESCRIPTION, stdout);
@@ -104,6 +107,7 @@ static struct option const long_opts[] =
   {"head-count", required_argument, NULL, 'n'},
   {"output", required_argument, NULL, 'o'},
   {"random-source", required_argument, NULL, RANDOM_SOURCE_OPTION},
+  {"repetition", no_argument, NULL, 'r'},
   {"zero-terminated", no_argument, NULL, 'z'},
   {GETOPT_HELP_OPTION_DECL},
   {GETOPT_VERSION_OPTION_DECL},
@@ -328,6 +332,23 @@ write_permuted_output (size_t n_lines, char *const *line, size_t lo_input,
   return 0;
 }
 
+static int
+write_random_numbers (struct randint_source *s, size_t count,
+                      size_t lo_input, size_t hi_input, char eolbyte)
+{
+  size_t i;
+  const randint range = hi_input - lo_input + 1;
+
+  for (i = 0; i < count; i++)
+    {
+      randint j = lo_input + randint_choose (s, range);
+      if (printf ("%lu%c", j, eolbyte) < 0)
+        return -1;
+    }
+
+  return 0;
+}
+
 int
 main (int argc, char **argv)
 {
@@ -340,6 +361,7 @@ main (int argc, char **argv)
   char eolbyte = '\n';
   char **input_lines = NULL;
   bool use_reservoir_sampling = false;
+  bool repetition = false;
 
   int optc;
   int n_operands;
@@ -348,7 +370,7 @@ main (int argc, char **argv)
   char **line = NULL;
   struct linebuffer *reservoir = NULL;
   struct randint_source *randint_source;
-  size_t *permutation;
+  size_t *permutation = NULL;
   int i;
 
   initialize_main (argc, argv);
@@ -424,6 +446,10 @@ main (int argc, char **argv)
         random_source = optarg;
         break;
 
+      case 'r':
+        repetition = true;
+        break;
+
       case 'z':
         eolbyte = '\0';
         break;
@@ -454,9 +480,19 @@ main (int argc, char **argv)
         }
       n_lines = hi_input - lo_input + 1;
       line = NULL;
+
+      /* When generating random numbers with repetitions,
+         the default count is one, unless

Re: Generate random numbers with shuf

2013-07-05 Thread Pádraig Brady
On 07/05/2013 07:04 PM, Assaf Gordon wrote:
> Hello,
>
> On 07/04/2013 05:40 PM, Pádraig Brady wrote:
>> On 07/04/2013 09:41 PM, Assaf Gordon wrote:
>>> Regarding old discussion here:
>>> http://lists.gnu.org/archive/html/coreutils/2011-02/msg00030.html
>>>
>>> Attached is a patch which adds --repetition option to shuf, enabling
>>> random number generation with repetitions.
>>
>> I like this.
>> --repetition seems to be a very good interface too,
>> since it aligns with standard math nomenclature in regard to permutations.
>>
>> I'd prefer to generalize it though, to supporting stdin as well as -i.
>
> Attached is an updated patch, supporting --repetitions with STDIN/FILE/-e
> (using the naive implementation ATM).
> e.g.
>   $ shuf --repetitions --head-count=100 --echo Head Tail
> or
>   $ shuf -r -n100 -e Head Tail

Excellent thanks.

> But the code is getting a bit messy, I guess from evolving features over time.
> I'd like to re-organize it a bit, re-factor some functions and make the code
> clearer - what do you think?
> it will make the code slightly more verbose (and slightly bigger), but
> shouldn't change the running performance.

If you're getting your head around the code enough to refactor,
then it would be great if you could handle the TODO: item in shuf.c
That would handle a performance regression in the common case
with reservoir sampling, and would be a good fit for the
upcoming release, given its performance theme.

cheers,
Pádraig.



Re: Generate random numbers with shuf

2013-07-05 Thread Assaf Gordon


On 07/05/2013 12:12 PM, Pádraig Brady wrote:
> On 07/05/2013 07:04 PM, Assaf Gordon wrote:
>> Hello,
>>
>>>> Regarding old discussion here:
>>>> http://lists.gnu.org/archive/html/coreutils/2011-02/msg00030.html
>>>>
>>>> Attached is a patch which adds --repetition option to shuf, enabling random
>>>> number generation with repetitions.
>>>
>>> I like this.
>>> --repetition seems to be a very good interface too,
>>> since it aligns with standard math nomenclature in regard to permutations.
>>>
>>> I'd prefer to generalize it though, to supporting stdin as well as -i.
>>
>> Attached is an updated patch, supporting --repetitions with STDIN/FILE/-e
>> (using the naive implementation ATM).
>> e.g.
>>    $ shuf --repetitions --head-count=100 --echo Head Tail
>> or
>>    $ shuf -r -n100 -e Head Tail
>
> Excellent thanks.
>
>> But the code is getting a bit messy, I guess from evolving features over time.
>> I'd like to re-organize it a bit, re-factor some functions and make the code
>> clearer - what do you think?
>> it will make the code slightly more verbose (and slightly bigger), but
>> shouldn't change the running performance.
>
> If you're getting your head around the code enough to refactor,
> then it would be great if you could handle the TODO: item in shuf.c


Attached is an updated patch, with some code cleanups (not including said TODO 
item yet).

-gordon


From 5ba2828e72f6d276fc349f69824cd6cb626053a4 Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Fri, 5 Jul 2013 15:41:17 -0600
Subject: [PATCH 00/14] *** SUBJECT HERE ***

*** BLURB HERE ***

Assaf Gordon (14):
  shuf: add --repetition to generate random numbers
  shuf: add tests for --repetition option
  shuf: mention new --repetition option in NEWS
  shuf: document new --repetition option
  shuf: enable --repetition on stdin/FILE/-e input
  shuf: add tests for --repetition with STDIN
  shuf: document new --repetitions option
  shuf: code-cleanup
  shuf: add more tests
  shuf: refactor --repetition with stdin
  shuf: refactor write_permuted_output()
  shuf: code cleanup
  shuf: code clean-up
  shuf: add tests for more erroneous usage

 NEWS   |   3 +
 doc/coreutils.texi |  37 +++
 src/shuf.c | 192 +
 tests/misc/shuf.sh |  92 +
 4 files changed, 268 insertions(+), 56 deletions(-)

-- 
1.8.3.2

From c41160016ed36fe5b4e2b3d03cde34e0dcec84b6 Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Thu, 4 Jul 2013 13:26:45 -0600
Subject: [PATCH 01/14] shuf: add --repetition to generate random numbers

* src/shuf.c: new option (-r,--repetition), generate random numbers.
main(): process new option.
usage(): mention new option.
write_random_numbers(): generate random numbers.
---
 src/shuf.c | 50 ++
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/src/shuf.c b/src/shuf.c
index 0fabb0b..cdc3151 100644
--- a/src/shuf.c
+++ b/src/shuf.c
@@ -76,6 +76,9 @@ Write a random permutation of the input lines to standard output.\n\
   -n, --head-count=COUNT    output at most COUNT lines\n\
   -o, --output=FILE         write result to FILE instead of standard output\n\
       --random-source=FILE  get random bytes from FILE\n\
+  -r, --repetition          used with -iLO-HI, output COUNT random numbers\n\
+                            between LO and HI, with repetitions.\n\
+                            count defaults to 1 if -n COUNT is not used.\n\
   -z, --zero-terminated     end lines with 0 byte, not newline\n\
 "), stdout);
   fputs (HELP_OPTION_DESCRIPTION, stdout);
@@ -104,6 +107,7 @@ static struct option const long_opts[] =
   {"head-count", required_argument, NULL, 'n'},
   {"output", required_argument, NULL, 'o'},
   {"random-source", required_argument, NULL, RANDOM_SOURCE_OPTION},
+  {"repetition", no_argument, NULL, 'r'},
   {"zero-terminated", no_argument, NULL, 'z'},
   {GETOPT_HELP_OPTION_DECL},
   {GETOPT_VERSION_OPTION_DECL},
@@ -328,6 +332,23 @@ write_permuted_output (size_t n_lines, char *const *line, size_t lo_input,
   return 0;
 }
 
+static int
+write_random_numbers (struct randint_source *s, size_t count,
+                      size_t lo_input, size_t hi_input, char eolbyte)
+{
+  size_t i;
+  const randint range = hi_input - lo_input + 1;
+
+  for (i = 0; i < count; i++)
+    {
+      randint j = lo_input + randint_choose (s, range);
+      if (printf ("%lu%c", j, eolbyte) < 0)
+        return -1;
+    }
+
+  return 0;
+}
+
 int
 main (int argc, char **argv)
 {
@@ -340,6 +361,7 @@ main (int argc, char **argv)
   char eolbyte = '\n';
   char **input_lines = NULL;
   bool use_reservoir_sampling = false;
+  bool repetition = false;
 
   int optc;
   int n_operands;
@@ -348,7 +370,7 @@ main (int argc, char **argv)
   char **line = NULL;
   struct linebuffer *reservoir = NULL;
   struct randint_source *randint_source;
-  size_t *permutation;
+  size_t *permutation = NULL;
   int i;
 
   initialize_main (argc, argv);
@@ -424,6 +446,10 @@ main

Generate random numbers with shuf

2013-07-04 Thread Assaf Gordon

Hello,

Regarding old discussion here:
http://lists.gnu.org/archive/html/coreutils/2011-02/msg00030.html

Attached is a patch which adds --repetition option to shuf, enabling random 
number generation with repetitions.

Example:

to generate 50 values between 0 and 9:
  $ shuf --rep -i0-9 -n50

Comments are welcomed,
 -gordon
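
For readers who want to check the behaviour described above, here is a small sketch that draws 50 values from 0-9 and verifies the count and range. It assumes a shuf build that includes the -r option from this patch. The short option is used rather than the --rep abbreviation, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Draw 50 values from the range 0..9, allowing repeats.
nums=$(shuf -r -i0-9 -n50)

count=$(printf '%s\n' "$nums" | wc -l)

# Count lines that are NOT a single digit 0-9; this should be zero.
bad=$(printf '%s\n' "$nums" | grep -c -v '^[0-9]$')

# With repetition allowed, at most 10 distinct values can appear.
distinct=$(printf '%s\n' "$nums" | sort -nu | wc -l)

echo "count=$count out_of_range=$bad distinct=$distinct"
```

Without -r, the same command could emit at most 10 lines, since the input range only holds 10 values.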

From 12ca3d6d5b8591e7bd424ff264b9f26cc2f31b90 Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Thu, 4 Jul 2013 14:40:15 -0600
Subject: [PATCH 0/4] *** SUBJECT HERE ***

*** BLURB HERE ***

Assaf Gordon (4):
  shuf: add --repetition to generate random numbers
  shuf: add tests for --repetition option
  shuf: mention new --repetition option in NEWS
  shuf: document new --repetition option

 NEWS   |  3 +++
 doc/coreutils.texi | 23 +++
 src/shuf.c | 50 ++
 tests/misc/shuf.sh | 29 +
 4 files changed, 101 insertions(+), 4 deletions(-)

-- 
1.8.3.2

From 2c09d46ebeee61e2e46633dc8b9158edba1eaa8b Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Thu, 4 Jul 2013 13:26:45 -0600
Subject: [PATCH 1/4] shuf: add --repetition to generate random numbers

* src/shuf.c: new option (-r,--repetition), generate random numbers.
main(): process new option.
usage(): mention new option.
write_random_numbers(): generate random numbers.
---
 src/shuf.c | 50 ++
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/src/shuf.c b/src/shuf.c
index 0fabb0b..cdc3151 100644
--- a/src/shuf.c
+++ b/src/shuf.c
@@ -76,6 +76,9 @@ Write a random permutation of the input lines to standard output.\n\
   -n, --head-count=COUNT    output at most COUNT lines\n\
   -o, --output=FILE         write result to FILE instead of standard output\n\
       --random-source=FILE  get random bytes from FILE\n\
+  -r, --repetition          used with -iLO-HI, output COUNT random numbers\n\
+                            between LO and HI, with repetitions.\n\
+                            count defaults to 1 if -n COUNT is not used.\n\
   -z, --zero-terminated     end lines with 0 byte, not newline\n\
 "), stdout);
   fputs (HELP_OPTION_DESCRIPTION, stdout);
@@ -104,6 +107,7 @@ static struct option const long_opts[] =
   {"head-count", required_argument, NULL, 'n'},
   {"output", required_argument, NULL, 'o'},
   {"random-source", required_argument, NULL, RANDOM_SOURCE_OPTION},
+  {"repetition", no_argument, NULL, 'r'},
   {"zero-terminated", no_argument, NULL, 'z'},
   {GETOPT_HELP_OPTION_DECL},
   {GETOPT_VERSION_OPTION_DECL},
@@ -328,6 +332,23 @@ write_permuted_output (size_t n_lines, char *const *line, size_t lo_input,
   return 0;
 }
 
+static int
+write_random_numbers (struct randint_source *s, size_t count,
+                      size_t lo_input, size_t hi_input, char eolbyte)
+{
+  size_t i;
+  const randint range = hi_input - lo_input + 1;
+
+  for (i = 0; i < count; i++)
+    {
+      randint j = lo_input + randint_choose (s, range);
+      if (printf ("%lu%c", j, eolbyte) < 0)
+        return -1;
+    }
+
+  return 0;
+}
+
 int
 main (int argc, char **argv)
 {
@@ -340,6 +361,7 @@ main (int argc, char **argv)
   char eolbyte = '\n';
   char **input_lines = NULL;
   bool use_reservoir_sampling = false;
+  bool repetition = false;
 
   int optc;
   int n_operands;
@@ -348,7 +370,7 @@ main (int argc, char **argv)
   char **line = NULL;
   struct linebuffer *reservoir = NULL;
   struct randint_source *randint_source;
-  size_t *permutation;
+  size_t *permutation = NULL;
   int i;
 
   initialize_main (argc, argv);
@@ -424,6 +446,10 @@ main (int argc, char **argv)
         random_source = optarg;
         break;
 
+      case 'r':
+        repetition = true;
+        break;
+
       case 'z':
         eolbyte = '\0';
         break;
@@ -454,9 +480,19 @@ main (int argc, char **argv)
         }
       n_lines = hi_input - lo_input + 1;
       line = NULL;
+
+      /* When generating random numbers with repetitions,
+         the default count is one, unless specified by the user */
+      if (repetition && head_lines == SIZE_MAX)
+        head_lines = 1 ;
     }
   else
     {
+      if (repetition)
+        {
+          error (0, 0, _("--repetition requires --input-range"));
+          usage (EXIT_FAILURE);
+        }
       switch (n_operands)
         {
         case 0:
@@ -488,10 +524,12 @@ main (int argc, char **argv)
         }
     }
 
-  head_lines = MIN (head_lines, n_lines);
+  if (!repetition)
+    head_lines = MIN (head_lines, n_lines);
 
   randint_source = randint_all_new (random_source,
-                                    use_reservoir_sampling ? SIZE_MAX :
+                                    (use_reservoir_sampling||repetition)?
+                                    SIZE_MAX:
                                     randperm_bound (head_lines, n_lines));
   if (! randint_source)
     error (EXIT_FAILURE, errno, "%s