Re: Generate random numbers with shuf
On 07/05/2013 10:43 PM, Assaf Gordon wrote:
> On 07/05/2013 12:12 PM, Pádraig Brady wrote:
>> On 07/05/2013 07:04 PM, Assaf Gordon wrote:
>>> Hello,
>>>
>>>>> Regarding old discussion here:
>>>>> http://lists.gnu.org/archive/html/coreutils/2011-02/msg00030.html
>>>>>
>>>>> Attached is a patch which adds a --repetition option to shuf,
>>>>> enabling random number generation with repetitions.
>>>>
>>>> I like this.
>>>> --repetition seems to be a very good interface too,
>>>> since it aligns with standard math nomenclature in regard to permutations.
>>>>
>>>> I'd prefer to generalize it though, to supporting stdin as well as -i.
>>>
>>> Attached is an updated patch, supporting --repetitions with STDIN/FILE/-e
>>> (using the naive implementation ATM).
>>> e.g.
>>> $ shuf --repetitions --head-count=100 --echo Head Tail
>>> or
>>> $ shuf -r -n100 -e Head Tail
>>
>> Excellent thanks.
>>
>>> But the code is getting a bit messy, I guess from evolving features over time.
>>> I'd like to re-organize it a bit, re-factor some functions and make the code
>>> clearer - what do you think?
>>> It will make the code slightly more verbose (and slightly bigger), but
>>> shouldn't change the running performance.
>>
>> If you're getting your head around the code enough to refactor,
>> then it would be great if you could handle the TODO: item in shuf.c
>
> Attached is an updated patch, with some code cleanups
> (not including said TODO item yet).
>
> -gordon

I've split to two patches.
1. Unrelated test improvements.
2. All the rest

Note in both patches I made adjustments to the tests like

-c=$(cat exp | wc -l) || framework_failure_
+c=$(wc -l < exp) || framework_failure_

-c=$(cat exp | sort -nu | fmt ) || framework_failure_
+c=$(sort -nu exp | paste -s -d ' ') || framework_failure_

I.E. avoid cat unless needed, and paste is more general than fmt in this usage.

Also I simplified the --help a little like:

-  -r, --repetitions         output COUNT values, with repetitions.\n\
-                            with -iLO-HI, output random numbers.\n\
-                            with -e, stdin or FILE, output random lines.\n\
-                            count defaults to 1 if -n COUNT is not used.\n\
+  -r, --repetitions         output COUNT items, allowing repetition.\n\
+                            -n 1 is implied if not specified.\n\

I'll push the 2 attached patches soon.

thanks!
Pádraig.

From f20a3407a8ae8488b2e7434f75738b219a2320be Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Fri, 5 Jul 2013 14:59:44 -0600
Subject: [PATCH 1/2] tests: add more tests for shuf option combinations

* test/misc/shuf.sh: Add tests for erroneous conditions
like multiple '-o' and '--random-source'.
---
 tests/misc/shuf.sh | 29 +
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/tests/misc/shuf.sh b/tests/misc/shuf.sh
index 3e33b61..492fd41 100755
--- a/tests/misc/shuf.sh
+++ b/tests/misc/shuf.sh
@@ -65,4 +65,33 @@ if ! test -r unreadable; then
   shuf -n1 unreadable && fail=1
 fi
 
+# Multiple -n is accepted, should use the smallest value
+shuf -n10 -i0-9 -n3 -n20 > exp || framework_failure_
+c=$(wc -l < exp) || framework_failure_
+test $c -eq 3 || { fail=1; echo "Multiple -n failed" >&2 ; }
+
+# Test error conditions
+
+# -i and -e must not be used together
+: | shuf -i -e A B &&
+  { fail=1; echo "shuf did not detect erroneous -e and -i usage." >&2 ; }
+# Test invalid value for -n
+: | shuf -nA &&
+  { fail=1; echo "shuf did not detect erroneous -n usage." >&2 ; }
+# Test multiple -i
+shuf -i0-9 -n10 -i8-90 &&
+  { fail=1; echo "shuf did not detect multiple -i usage." >&2 ; }
+# Test invalid range
+for ARG in '1' 'A' '1-' '1-A'; do
+    shuf -i$ARG &&
+    { fail=1; echo "shuf did not detect erroneous -i$ARG usage." >&2 ; }
+done
+
+# multiple -o are forbidden
+shuf -i0-9 -o A -o B &&
+  { fail=1; echo "shuf did not detect erroneous multiple -o usage." >&2 ; }
+# multiple random-sources are forbidden
+shuf -i0-9 --random-source A --random-source B &&
+  { fail=1; echo "shuf did not detect multiple --random-source usage." >&2 ; }
+
 Exit $fail
-- 
1.7.7.6

From 349eda8cb0765621979d8fd8b58c21e9c5d49073 Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Thu, 4 Jul 2013 13:26:45 -0600
Subject: [PATCH 2/2] shuf: add --repetition to support repetition in output

main(): Process new option. Replace input_numbers_option_used()
with a local variable. Re-organize argument processing.
usage(): Describe the new option.
(write_random_numbers): A new function to generate a
permutation of the specified input range with repetition.
(write_random_lines): Likewise for stdin and --echo.
(write_permuted_numbers): New function refactored from
write_permuted_output().
(write_permuted_lines): Likewise.
* tests/misc/shuf.sh: Add tests for --repetitions option.
* doc/coreutils.texi: Mention --repetitions, add examples.
* TODO: Mention an optimization to avoid needing to
read all of the input into memory with --repetitions.
* NEWS: Mention new shuf option.
---
 NEWS |3 +
 doc/coreutils.texi | 37
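
A minimal sketch of the "this command must fail" idiom that the test patch above relies on. The helper name expect_failure is hypothetical and is not part of the coreutils test framework; the shuf invocations are the ones listed in the patch.

```shell
# Sketch only: `expect_failure` is a hypothetical helper, not a coreutils test function.
fail=0

expect_failure() {
  # Run the command with stdin closed off and output discarded;
  # a zero exit status means the erroneous usage went undetected.
  if "$@" </dev/null >/dev/null 2>&1; then
    echo "unexpected success: $*" >&2
    fail=1
  fi
}

# Invalid value for -n (from the patch).
expect_failure shuf -nA

# Malformed input ranges (from the patch's loop).
for ARG in '1' 'A' '1-' '1-A'; do
  expect_failure shuf -i"$ARG"
done

echo "fail=$fail"
```

The patch itself spells this out inline as `cmd && { fail=1; echo ... >&2 ; }`; wrapping it in a function is only a readability choice for this sketch.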
Re: Generate random numbers with shuf
On 07/10/2013 09:20 AM, Pádraig Brady wrote:
> I've split to two patches.
> 1. Unrelated test improvements.
> 2. All the rest
> ...
> Note in both patches I made adjustments to the tests [...]
> ...
> I.E. avoid cat unless needed, and paste is more general than fmt in this usage.
> ...
> Also I simplified the --help a little [...]

Indeed, looks more concise and much better. I keep on learning...

> I'll push the 2 attached patches soon.

Thanks!
-gordon
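
A small sketch of the two test idioms quoted above, run against throwaway data. The temporary file and its contents are assumptions made for illustration; only the `wc -l <` and `sort -nu | paste -s -d ' '` forms come from the thread.

```shell
# Throwaway data standing in for the test's `exp` file (contents are made up).
tmp=$(mktemp) || exit 1
printf '%s\n' 3 1 2 3 1 > "$tmp"

# Redirecting into wc yields just the count; `wc -l FILE` would also print the file name.
c=$(wc -l < "$tmp")

# sort -nu sorts numerically and drops duplicates; paste -s joins the lines with spaces.
uniq_vals=$(sort -nu "$tmp" | paste -s -d ' ' -)

echo "lines=$c unique=$uniq_vals"
rm -f "$tmp"
```

The explicit `-` operand to paste is an addition here for portability; the thread's version omits it.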
Re: Generate random numbers with shuf
On 07/11/2013 12:54 AM, Assaf Gordon wrote:
> On 07/10/2013 09:20 AM, Pádraig Brady wrote:
>> I've split to two patches.
>> 1. Unrelated test improvements.
>> 2. All the rest
>> ...
>> Note in both patches I made adjustments to the tests [...]
>> ...
>> I.E. avoid cat unless needed, and paste is more general than fmt in this usage.
>> ...
>> Also I simplified the --help a little [...]
>
> Indeed, looks more concise and much better. I keep on learning...
>
>> I'll push the 2 attached patches soon.

pushed,

thanks!
Pádraig.
Re: Generate random numbers with shuf
Hello,

On 07/04/2013 05:40 PM, Pádraig Brady wrote:
> On 07/04/2013 09:41 PM, Assaf Gordon wrote:
>> Regarding old discussion here:
>> http://lists.gnu.org/archive/html/coreutils/2011-02/msg00030.html
>>
>> Attached is a patch which adds a --repetition option to shuf,
>> enabling random number generation with repetitions.
>
> I like this.
> --repetition seems to be a very good interface too,
> since it aligns with standard math nomenclature in regard to permutations.
>
> I'd prefer to generalize it though, to supporting stdin as well as -i.

Attached is an updated patch, supporting --repetitions with STDIN/FILE/-e
(using the naive implementation ATM).
e.g.
$ shuf --repetitions --head-count=100 --echo Head Tail
or
$ shuf -r -n100 -e Head Tail

But the code is getting a bit messy, I guess from evolving features over time.
I'd like to re-organize it a bit, re-factor some functions and make the code
clearer - what do you think?
It will make the code slightly more verbose (and slightly bigger), but
shouldn't change the running performance.

-gordon

From 9e14bf963eb27faed847a979677fb5f344c27362 Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Fri, 5 Jul 2013 11:58:16 -0600
Subject: [PATCH 0/7] *** SUBJECT HERE ***

*** BLURB HERE ***

Assaf Gordon (7):
  shuf: add --repetition to generate random numbers
  shuf: add tests for --repetition option
  shuf: mention new --repetition option in NEWS
  shuf: document new --repetition option
  shuf: enable --repetition on stdin/FILE/-e input
  shuf: add tests for --repetition with STDIN
  shuf: document new --repetitions option

 NEWS               |  3 +++
 doc/coreutils.texi | 37 ++
 src/shuf.c         | 66 --
 tests/misc/shuf.sh | 63 +++
 4 files changed, 162 insertions(+), 7 deletions(-)

-- 
1.8.3.2

From c41160016ed36fe5b4e2b3d03cde34e0dcec84b6 Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Thu, 4 Jul 2013 13:26:45 -0600
Subject: [PATCH 1/7] shuf: add --repetition to generate random numbers

* src/shuf.c: new option (-r,--repetition), generate random numbers.
main(): process new option.
usage(): mention new option.
write_random_numbers(): generate random numbers.
---
 src/shuf.c | 50 ++
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/src/shuf.c b/src/shuf.c
index 0fabb0b..cdc3151 100644
--- a/src/shuf.c
+++ b/src/shuf.c
@@ -76,6 +76,9 @@ Write a random permutation of the input lines to standard output.\n\
   -n, --head-count=COUNT    output at most COUNT lines\n\
   -o, --output=FILE         write result to FILE instead of standard output\n\
       --random-source=FILE  get random bytes from FILE\n\
+  -r, --repetition          used with -iLO-HI, output COUNT random numbers\n\
+                            between LO and HI, with repetitions.\n\
+                            count defaults to 1 if -n COUNT is not used.\n\
   -z, --zero-terminated     end lines with 0 byte, not newline\n\
 "), stdout);
       fputs (HELP_OPTION_DESCRIPTION, stdout);
@@ -104,6 +107,7 @@ static struct option const long_opts[] =
   {"head-count", required_argument, NULL, 'n'},
   {"output", required_argument, NULL, 'o'},
   {"random-source", required_argument, NULL, RANDOM_SOURCE_OPTION},
+  {"repetition", no_argument, NULL, 'r'},
   {"zero-terminated", no_argument, NULL, 'z'},
   {GETOPT_HELP_OPTION_DECL},
   {GETOPT_VERSION_OPTION_DECL},
@@ -328,6 +332,23 @@ write_permuted_output (size_t n_lines, char *const *line, size_t lo_input,
   return 0;
 }
 
+static int
+write_random_numbers (struct randint_source *s, size_t count,
+                      size_t lo_input, size_t hi_input, char eolbyte)
+{
+  size_t i;
+  const randint range = hi_input - lo_input + 1;
+
+  for (i = 0; i < count; i++)
+    {
+      randint j = lo_input + randint_choose (s, range);
+      if (printf ("%lu%c", j, eolbyte) < 0)
+        return -1;
+    }
+
+  return 0;
+}
+
 int
 main (int argc, char **argv)
 {
@@ -340,6 +361,7 @@ main (int argc, char **argv)
   char eolbyte = '\n';
   char **input_lines = NULL;
   bool use_reservoir_sampling = false;
+  bool repetition = false;
 
   int optc;
   int n_operands;
@@ -348,7 +370,7 @@ main (int argc, char **argv)
   char **line = NULL;
   struct linebuffer *reservoir = NULL;
   struct randint_source *randint_source;
-  size_t *permutation;
+  size_t *permutation = NULL;
   int i;
 
   initialize_main (argc, argv);
@@ -424,6 +446,10 @@ main (int argc, char **argv)
         random_source = optarg;
         break;
 
+      case 'r':
+        repetition = true;
+        break;
+
       case 'z':
         eolbyte = '\0';
         break;
@@ -454,9 +480,19 @@ main (int argc, char **argv)
         }
       n_lines = hi_input - lo_input + 1;
       line = NULL;
+
+      /* When generating random numbers with repetitions,
+         the default count is one, unless
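
A rough sketch of the contract that write_random_numbers() in the patch above is meant to provide: COUNT integers, each independently drawn from LO..HI inclusive. It assumes an installed shuf that already has the -r option, and it always passes -n explicitly rather than relying on any default count. The variable names are illustrative.

```shell
# Assumes a shuf with -r available; lo, hi and count are illustrative values.
lo=0
hi=9
count=50

out=$(shuf -r -i "$lo-$hi" -n "$count")

# Number of values produced.
n=$(printf '%s\n' "$out" | wc -l)

# Number of values falling outside [lo, hi]; expected to be zero.
bad=$(printf '%s\n' "$out" | awk -v lo="$lo" -v hi="$hi" '$1 < lo || $1 > hi' | wc -l)

echo "drawn=$n out_of_range=$bad"
```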
Re: Generate random numbers with shuf
On 07/05/2013 07:04 PM, Assaf Gordon wrote:
> Hello,
>
> On 07/04/2013 05:40 PM, Pádraig Brady wrote:
>> On 07/04/2013 09:41 PM, Assaf Gordon wrote:
>>> Regarding old discussion here:
>>> http://lists.gnu.org/archive/html/coreutils/2011-02/msg00030.html
>>>
>>> Attached is a patch which adds a --repetition option to shuf,
>>> enabling random number generation with repetitions.
>>
>> I like this.
>> --repetition seems to be a very good interface too,
>> since it aligns with standard math nomenclature in regard to permutations.
>>
>> I'd prefer to generalize it though, to supporting stdin as well as -i.
>
> Attached is an updated patch, supporting --repetitions with STDIN/FILE/-e
> (using the naive implementation ATM).
> e.g.
> $ shuf --repetitions --head-count=100 --echo Head Tail
> or
> $ shuf -r -n100 -e Head Tail

Excellent thanks.

> But the code is getting a bit messy, I guess from evolving features over time.
> I'd like to re-organize it a bit, re-factor some functions and make the code
> clearer - what do you think?
> It will make the code slightly more verbose (and slightly bigger), but
> shouldn't change the running performance.

If you're getting your head around the code enough to refactor,
then it would be great if you could handle the TODO: item in shuf.c
That would handle a performance regression in the common case
with reservoir sampling, and would be a good fit for the
upcoming release, given its performance theme.

cheers,
Pádraig.
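
For readers unfamiliar with the reservoir sampling mentioned above, here is a minimal sketch of the general technique (Algorithm R) written as a shell function around awk. It is not shuf's implementation and says nothing about the TODO item itself; reservoir_sample is a hypothetical name.

```shell
# reservoir_sample K: print K lines chosen uniformly, without repetition, from stdin
# while holding at most K lines in memory. Hypothetical helper, not taken from shuf.c.
reservoir_sample() {
  awk -v k="$1" '
    BEGIN { srand() }
    NR <= k { r[NR] = $0; next }      # the first k lines fill the reservoir
    {
      j = int(rand() * NR) + 1        # uniform integer in 1..NR
      if (j <= k) r[j] = $0           # keep this line with probability k/NR
    }
    END {
      n = (NR < k) ? NR : k           # short input: print everything that was read
      for (i = 1; i <= n; i++) print r[i]
    }
  '
}

seq 1 1000 | reservoir_sample 5
```

The selected lines come out in reservoir-slot order rather than in random order, so a caller wanting a shuffled sample would still need to permute the result.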
Re: Generate random numbers with shuf
On 07/05/2013 12:12 PM, Pádraig Brady wrote:
> On 07/05/2013 07:04 PM, Assaf Gordon wrote:
>> Hello,
>>
>>>> Regarding old discussion here:
>>>> http://lists.gnu.org/archive/html/coreutils/2011-02/msg00030.html
>>>>
>>>> Attached is a patch which adds a --repetition option to shuf,
>>>> enabling random number generation with repetitions.
>>>
>>> I like this.
>>> --repetition seems to be a very good interface too,
>>> since it aligns with standard math nomenclature in regard to permutations.
>>>
>>> I'd prefer to generalize it though, to supporting stdin as well as -i.
>>
>> Attached is an updated patch, supporting --repetitions with STDIN/FILE/-e
>> (using the naive implementation ATM).
>> e.g.
>> $ shuf --repetitions --head-count=100 --echo Head Tail
>> or
>> $ shuf -r -n100 -e Head Tail
>
> Excellent thanks.
>
>> But the code is getting a bit messy, I guess from evolving features over time.
>> I'd like to re-organize it a bit, re-factor some functions and make the code
>> clearer - what do you think?
>> It will make the code slightly more verbose (and slightly bigger), but
>> shouldn't change the running performance.
>
> If you're getting your head around the code enough to refactor,
> then it would be great if you could handle the TODO: item in shuf.c

Attached is an updated patch, with some code cleanups
(not including said TODO item yet).

-gordon

From 5ba2828e72f6d276fc349f69824cd6cb626053a4 Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Fri, 5 Jul 2013 15:41:17 -0600
Subject: [PATCH 00/14] *** SUBJECT HERE ***

*** BLURB HERE ***

Assaf Gordon (14):
  shuf: add --repetition to generate random numbers
  shuf: add tests for --repetition option
  shuf: mention new --repetition option in NEWS
  shuf: document new --repetition option
  shuf: enable --repetition on stdin/FILE/-e input
  shuf: add tests for --repetition with STDIN
  shuf: document new --repetitions option
  shuf: code-cleanup
  shuf: add more tests
  shuf: refactor --repetition with stdin
  shuf: refactor write_permuted_output()
  shuf: code cleanup
  shuf: code clean-up
  shuf: add tests for more erroneous usage

 NEWS               |   3 +
 doc/coreutils.texi |  37 +++
 src/shuf.c         | 192 +
 tests/misc/shuf.sh |  92 +
 4 files changed, 268 insertions(+), 56 deletions(-)

-- 
1.8.3.2

From c41160016ed36fe5b4e2b3d03cde34e0dcec84b6 Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Thu, 4 Jul 2013 13:26:45 -0600
Subject: [PATCH 01/14] shuf: add --repetition to generate random numbers

* src/shuf.c: new option (-r,--repetition), generate random numbers.
main(): process new option.
usage(): mention new option.
write_random_numbers(): generate random numbers.
---
 src/shuf.c | 50 ++
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/src/shuf.c b/src/shuf.c
index 0fabb0b..cdc3151 100644
--- a/src/shuf.c
+++ b/src/shuf.c
@@ -76,6 +76,9 @@ Write a random permutation of the input lines to standard output.\n\
   -n, --head-count=COUNT    output at most COUNT lines\n\
   -o, --output=FILE         write result to FILE instead of standard output\n\
       --random-source=FILE  get random bytes from FILE\n\
+  -r, --repetition          used with -iLO-HI, output COUNT random numbers\n\
+                            between LO and HI, with repetitions.\n\
+                            count defaults to 1 if -n COUNT is not used.\n\
   -z, --zero-terminated     end lines with 0 byte, not newline\n\
 "), stdout);
       fputs (HELP_OPTION_DESCRIPTION, stdout);
@@ -104,6 +107,7 @@ static struct option const long_opts[] =
   {"head-count", required_argument, NULL, 'n'},
   {"output", required_argument, NULL, 'o'},
   {"random-source", required_argument, NULL, RANDOM_SOURCE_OPTION},
+  {"repetition", no_argument, NULL, 'r'},
   {"zero-terminated", no_argument, NULL, 'z'},
   {GETOPT_HELP_OPTION_DECL},
   {GETOPT_VERSION_OPTION_DECL},
@@ -328,6 +332,23 @@ write_permuted_output (size_t n_lines, char *const *line, size_t lo_input,
   return 0;
 }
 
+static int
+write_random_numbers (struct randint_source *s, size_t count,
+                      size_t lo_input, size_t hi_input, char eolbyte)
+{
+  size_t i;
+  const randint range = hi_input - lo_input + 1;
+
+  for (i = 0; i < count; i++)
+    {
+      randint j = lo_input + randint_choose (s, range);
+      if (printf ("%lu%c", j, eolbyte) < 0)
+        return -1;
+    }
+
+  return 0;
+}
+
 int
 main (int argc, char **argv)
 {
@@ -340,6 +361,7 @@ main (int argc, char **argv)
   char eolbyte = '\n';
   char **input_lines = NULL;
   bool use_reservoir_sampling = false;
+  bool repetition = false;
 
   int optc;
   int n_operands;
@@ -348,7 +370,7 @@ main (int argc, char **argv)
   char **line = NULL;
   struct linebuffer *reservoir = NULL;
   struct randint_source *randint_source;
-  size_t *permutation;
+  size_t *permutation = NULL;
   int i;
 
   initialize_main (argc, argv);
@@ -424,6 +446,10 @@ main
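
A short sketch built on the coin-flip example quoted in the message above: draw 100 items with repetition from two --echo operands and tally them. It assumes a shuf that supports -r with -e; the tally variables are illustrative.

```shell
# 100 draws with repetition from the operands Head and Tail (example from the thread).
flips=$(shuf -r -n 100 -e Head Tail)

# Tally each outcome; the split varies from run to run.
heads=$(printf '%s\n' "$flips" | grep -c '^Head$')
tails=$(printf '%s\n' "$flips" | grep -c '^Tail$')

echo "heads=$heads tails=$tails total=$((heads + tails))"
```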
Generate random numbers with shuf
Hello,

Regarding old discussion here:
http://lists.gnu.org/archive/html/coreutils/2011-02/msg00030.html

Attached is a patch which adds a --repetition option to shuf,
enabling random number generation with repetitions.

Example: to generate 50 values between 0 and 9:
$ shuf --rep -i0-9 -n50

Comments are welcomed,
-gordon

From 12ca3d6d5b8591e7bd424ff264b9f26cc2f31b90 Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Thu, 4 Jul 2013 14:40:15 -0600
Subject: [PATCH 0/4] *** SUBJECT HERE ***

*** BLURB HERE ***

Assaf Gordon (4):
  shuf: add --repetition to generate random numbers
  shuf: add tests for --repetition option
  shuf: mention new --repetition option in NEWS
  shuf: document new --repetition option

 NEWS               |  3 +++
 doc/coreutils.texi | 23 +++
 src/shuf.c         | 50 ++
 tests/misc/shuf.sh | 29 +
 4 files changed, 101 insertions(+), 4 deletions(-)

-- 
1.8.3.2

From 2c09d46ebeee61e2e46633dc8b9158edba1eaa8b Mon Sep 17 00:00:00 2001
From: Assaf Gordon assafgor...@gmail.com
Date: Thu, 4 Jul 2013 13:26:45 -0600
Subject: [PATCH 1/4] shuf: add --repetition to generate random numbers

* src/shuf.c: new option (-r,--repetition), generate random numbers.
main(): process new option.
usage(): mention new option.
write_random_numbers(): generate random numbers.
---
 src/shuf.c | 50 ++
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/src/shuf.c b/src/shuf.c
index 0fabb0b..cdc3151 100644
--- a/src/shuf.c
+++ b/src/shuf.c
@@ -76,6 +76,9 @@ Write a random permutation of the input lines to standard output.\n\
   -n, --head-count=COUNT    output at most COUNT lines\n\
   -o, --output=FILE         write result to FILE instead of standard output\n\
       --random-source=FILE  get random bytes from FILE\n\
+  -r, --repetition          used with -iLO-HI, output COUNT random numbers\n\
+                            between LO and HI, with repetitions.\n\
+                            count defaults to 1 if -n COUNT is not used.\n\
   -z, --zero-terminated     end lines with 0 byte, not newline\n\
 "), stdout);
       fputs (HELP_OPTION_DESCRIPTION, stdout);
@@ -104,6 +107,7 @@ static struct option const long_opts[] =
   {"head-count", required_argument, NULL, 'n'},
   {"output", required_argument, NULL, 'o'},
   {"random-source", required_argument, NULL, RANDOM_SOURCE_OPTION},
+  {"repetition", no_argument, NULL, 'r'},
   {"zero-terminated", no_argument, NULL, 'z'},
   {GETOPT_HELP_OPTION_DECL},
   {GETOPT_VERSION_OPTION_DECL},
@@ -328,6 +332,23 @@ write_permuted_output (size_t n_lines, char *const *line, size_t lo_input,
   return 0;
 }
 
+static int
+write_random_numbers (struct randint_source *s, size_t count,
+                      size_t lo_input, size_t hi_input, char eolbyte)
+{
+  size_t i;
+  const randint range = hi_input - lo_input + 1;
+
+  for (i = 0; i < count; i++)
+    {
+      randint j = lo_input + randint_choose (s, range);
+      if (printf ("%lu%c", j, eolbyte) < 0)
+        return -1;
+    }
+
+  return 0;
+}
+
 int
 main (int argc, char **argv)
 {
@@ -340,6 +361,7 @@ main (int argc, char **argv)
   char eolbyte = '\n';
   char **input_lines = NULL;
   bool use_reservoir_sampling = false;
+  bool repetition = false;
 
   int optc;
   int n_operands;
@@ -348,7 +370,7 @@ main (int argc, char **argv)
   char **line = NULL;
   struct linebuffer *reservoir = NULL;
   struct randint_source *randint_source;
-  size_t *permutation;
+  size_t *permutation = NULL;
   int i;
 
   initialize_main (argc, argv);
@@ -424,6 +446,10 @@ main (int argc, char **argv)
         random_source = optarg;
         break;
 
+      case 'r':
+        repetition = true;
+        break;
+
       case 'z':
         eolbyte = '\0';
         break;
@@ -454,9 +480,19 @@ main (int argc, char **argv)
         }
       n_lines = hi_input - lo_input + 1;
       line = NULL;
+
+      /* When generating random numbers with repetitions,
+         the default count is one, unless specified by the user */
+      if (repetition && head_lines == SIZE_MAX)
+        head_lines = 1 ;
     }
   else
     {
+      if (repetition)
+        {
+          error (0, 0, _("--repetition requires --input-range"));
+          usage (EXIT_FAILURE);
+        }
       switch (n_operands)
         {
         case 0:
@@ -488,10 +524,12 @@ main (int argc, char **argv)
         }
     }
 
-  head_lines = MIN (head_lines, n_lines);
+  if (!repetition)
+    head_lines = MIN (head_lines, n_lines);
 
   randint_source = randint_all_new (random_source,
-                                    use_reservoir_sampling ? SIZE_MAX :
+                                    (use_reservoir_sampling||repetition)?
+                                    SIZE_MAX:
                                     randperm_bound (head_lines, n_lines));
   if (! randint_source)
     error (EXIT_FAILURE, errno, "%s
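
The last hunk above stops clamping head_lines to n_lines when repetition is enabled. A minimal sketch of the visible effect, assuming an installed shuf that supports -r: without -r the output can never exceed the size of the input range, with -r it can. The variable names are illustrative.

```shell
# Without -r the requested count is capped at the 10 values in the range 0-9.
capped=$(shuf -i 0-9 -n 50 | wc -l)

# With -r the cap no longer applies, so all 50 requested values are produced.
uncapped=$(shuf -r -i 0-9 -n 50 | wc -l)

echo "without -r: $capped lines; with -r: $uncapped lines"
```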