Re: cut -DF
Hello, Here's an updated patch for "cut -DF". Since it's a new code path, it opens the possibility of finally supporting multibyte characters with "cut -c". comments very welcomed, - assaf [PATCH 01/18] cut: set-fields: add no-sort options [PATCH 02/18] cut: iniitial -D implmentation, currently only with [PATCH 03/18] tests: add 'cut -D' tests [PATCH 04/18] cut: extract 'cut -D -f' to a separate function [PATCH 05/18] cut: implement -D with -b [PATCH 06/18] tests: add 'cut -D -b' tests [PATCH 07/18] cut: add -O short-option for --output-delimiter [PATCH 08/18] cut: implement -F [PATCH 09/18] tests: add 'cut -F' tests [PATCH 10/18] cut: extract cut-fields into separate functions [PATCH 11/18] cut: implement multibyte -c/--characters [PATCH 12/18] cut: change -F regex syntax to BRE [PATCH 13/18] cut: change -D long-option equivalent [PATCH 14/18] doc: mention 'cut -D' in NEWS [PATCH 15/18] doc: mention 'cut -F' in NEWS [PATCH 16/18] doc: mention 'cut -O' in NEWS [PATCH 17/18] doc: mention multibyte 'cut -c' in NEWS [PATCH 18/18] doc: expand 'cut' section From 2557ced8cb30655ef55c8532d814798172b5c392 Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Wed, 5 Jan 2022 13:03:39 -0700 Subject: [PATCH 01/18] cut: set-fields: add no-sort options --- src/set-fields.c | 27 +++ src/set-fields.h | 4 +++- 2 files changed, 18 insertions(+), 13 deletions(-) diff --git a/src/set-fields.c b/src/set-fields.c index e3cce30d9..5e4ee6715 100644 --- a/src/set-fields.c +++ b/src/set-fields.c @@ -279,22 +279,25 @@ set_fields (char const *fieldstr, unsigned int options) ? _("missing list of byte/character positions") : _("missing list of fields")); - qsort (frp, n_frp, sizeof (frp[0]), compare_ranges); - - /* Merge range pairs (e.g. `2-5,3-4' becomes `2-5'). */ - for (size_t i = 0; i < n_frp; ++i) + if (!(options & SETFLD_NO_SORT)) { - for (size_t j = i + 1; j < n_frp; ++j) + qsort (frp, n_frp, sizeof (frp[0]), compare_ranges); + + /* Merge range pairs (e.g. `2-5,3-4' becomes `2-5'). 
*/ + for (size_t i = 0; i < n_frp; ++i) { - if (frp[j].lo <= frp[i].hi) + for (size_t j = i + 1; j < n_frp; ++j) { - frp[i].hi = MAX (frp[j].hi, frp[i].hi); - memmove (frp + j, frp + j + 1, (n_frp - j - 1) * sizeof *frp); - n_frp--; - j--; + if (frp[j].lo <= frp[i].hi) +{ + frp[i].hi = MAX (frp[j].hi, frp[i].hi); + memmove (frp + j, frp + j + 1, (n_frp - j - 1) * sizeof *frp); + n_frp--; + j--; +} + else +break; } - else -break; } } diff --git a/src/set-fields.h b/src/set-fields.h index 7bc9b3afe..9127d9957 100644 --- a/src/set-fields.h +++ b/src/set-fields.h @@ -34,8 +34,10 @@ enum { SETFLD_ALLOW_DASH = 0x01, /* allow single dash meaning 'all fields' */ SETFLD_COMPLEMENT = 0x02, /* complement the field list */ - SETFLD_ERRMSG_USE_POS = 0x04 /* when reporting errors, say 'position' instead + SETFLD_ERRMSG_USE_POS = 0x04, /* when reporting errors, say 'position' instead of 'field' (used with cut -b/-c) */ + SETFLD_NO_SORT= 0x08 /* Do not sort the fields; keep duplicated + and overlapped fields */ }; /* allocates and initializes the FRP array and N_FRP count */ -- 2.30.2 From 6db6c47aabe5c0ba194cecb1f8f24957b65e1556 Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Wed, 5 Jan 2022 13:04:08 -0700 Subject: [PATCH 02/18] cut: iniitial -D implmentation, currently only with "-f" --- src/cut.c | 161 -- 1 file changed, 156 insertions(+), 5 deletions(-) diff --git a/src/cut.c b/src/cut.c index 5143c8bd9..84caad091 100644 --- a/src/cut.c +++ b/src/cut.c @@ -20,7 +20,9 @@ /* POSIX changes, bug fixes, long-named options, and cleanup by David MacKenzie . - Rewrite cut_fields and cut_bytes -- Jim Meyering. */ + Rewrite cut_fields and cut_bytes -- Jim Meyering. + + Match toybox's -D,-F,-O options -- Assaf Gordon. */ #include @@ -43,7 +45,8 @@ #define AUTHORS \ proper_name ("David M. 
Ihnat"), \ proper_name ("David MacKenzie"), \ - proper_name ("Jim Meyering") + proper_name ("Jim Meyering"), \ + proper_name ("Assaf Gordon") #define FATAL_ERROR(Message) \ do \ @@ -113,6 +116,15 @@ static char *output_delimiter_string; /* True if we have ever read standard input. */ static bool have_read_stdin; +/* If true use different (but less optimized) code, + Used with -F and/or -D. */ +static bool adv_mode; + +/* True if -D is used: allow duplicate
Re: Compilations warnings-as-errors when building from git
follow-up: On 2022-01-13 11:22 p.m., Assaf Gordon wrote: I'm getting few warnings-as-errors when building the latest version from git (using Debian 10 amd64 with gcc 8.3.0). with clang-14 ( Debian clang version 14.0.0-++20211220125923+c79a67196828-1~exp1~20211220130019.184 ) I'm seeing the following warnings: --- src/uptime.c:75:47: warning: implicit conversion from 'time_t' (aka 'long') to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Wimplicit-cons t-int-float-conversion] uptime = (0 <= upsecs && upsecs < TYPE_MAXIMUM (time_t) ~ ^ ./lib/intprops.h:57:4: note: expanded from macro 'TYPE_MAXIMUM' ((t) (! TYPE_SIGNED (t) \ --- src/ls.c:2287:33: warning: result of comparison of constant 9223372036854775807 with expression of type 'unsigned short' is always true [-Wtautological-constant-out-of-range-compare] linelen = ws.ws_col <= MIN (PTRDIFF_MAX, SIZE_MAX) ? ws.ws_col : 0; ~ ^ ~~~ 1 warning generated. src/sort.c:1414:21: warning: implicit conversion from 'unsigned long' to 'double' changes value from 18446744073709551615 to 18446744073709551616 [-Wimplicit-const-int-float-conversion] if (mem < UINTMAX_MAX) ~ ^~~ /usr/include/stdint.h:202:24: note: expanded from macro 'UINTMAX_MAX' # define UINTMAX_MAX(__UINT64_C(18446744073709551615)) ^~~~ /usr/include/stdint.h:107:25: note: expanded from macro '__UINT64_C' # define __UINT64_C(c) c ## UL ^~~ :21:1: note: expanded from here 18446744073709551615UL ^~ 1 warning generated. - Also, when compiling gnulib modules there is this: warning: unknown warning option '-Wno-unsuffixed-float-constants' [-Wunknown-warning-option] Which, I see was removed from gnulib in 2011, and reinstated just now in https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=0c8a563f65d44752b33aec42cceec25bd485f2d5 ---
Compilations warnings-as-errors when building from git
Hi all, I'm getting few warnings-as-errors when building the latest version from git (using Debian 10 amd64 with gcc 8.3.0). I can send a patch for the "malloc" one, but not sure about the intricates of intprops.h . - assaf lib/randperm.c: In function 'sparse_new': lib/randperm.c:111:1: error: function might be candidate for attribute 'malloc' if it is known to return normally [-Werror=suggest-attribute=malloc] sparse_new (size_t size_hint) ^~ src/stat.c: In function 'default_format': src/stat.c:1653:1: error: function might be candidate for attribute 'malloc' if it is known to return normally [-Werror=suggest-attribute=malloc] default_format (bool fs, bool terse, bool device) ^~ cc1: all warnings being treated as errors ( This failure is in pinky.c and the same in csplit.c ) In file included from ./lib/xalloc.h:27, from src/system.h:244, from src/pinky.c:25: src/pinky.c: In function 'create_fullname': ./lib/intprops.h:44:55: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits] #define EXPR_SIGNED(e) (_GL_INT_NEGATE_CONVERT (e, 1) < 0) ^ ./lib/intprops.h:407:42: note: in expansion of macro 'EXPR_SIGNED' ((!_GL_SIGNED_TYPE_OR_EXPR (*(r)) && EXPR_SIGNED (a) && EXPR_SIGNED (b) \ ^~~ src/pinky.c:115:11: note: in expansion of macro 'INT_MULTIPLY_WRAPV' if (INT_MULTIPLY_WRAPV (ulen, ampersands - 1, ) ^~ ./lib/intprops.h:44:55: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits] #define EXPR_SIGNED(e) (_GL_INT_NEGATE_CONVERT (e, 1) < 0) ^ ./lib/intprops.h:407:61: note: in expansion of macro 'EXPR_SIGNED' ((!_GL_SIGNED_TYPE_OR_EXPR (*(r)) && EXPR_SIGNED (a) && EXPR_SIGNED (b) \ ^~~ src/pinky.c:115:11: note: in expansion of macro 'INT_MULTIPLY_WRAPV' if (INT_MULTIPLY_WRAPV (ulen, ampersands - 1, ) ^~ ./lib/intprops.h:588:8: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits] ((b) < 0 \ ^ ./lib/intprops.h:408:10: note: in expansion of macro '_GL_INT_MULTIPLY_RANGE_OVERFLOW' && 
_GL_INT_MULTIPLY_RANGE_OVERFLOW (a, b, 0, (__typeof__ (*(r))) -1)) \ ^~~ src/pinky.c:115:11: note: in expansion of macro 'INT_MULTIPLY_WRAPV' if (INT_MULTIPLY_WRAPV (ulen, ampersands - 1, ) ^~ ./lib/intprops.h:589:11: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits] ? ((a) < 0 \ ^ ./lib/intprops.h:408:10: note: in expansion of macro '_GL_INT_MULTIPLY_RANGE_OVERFLOW' && _GL_INT_MULTIPLY_RANGE_OVERFLOW (a, b, 0, (__typeof__ (*(r))) -1)) \ ^~~ src/pinky.c:115:11: note: in expansion of macro 'INT_MULTIPLY_WRAPV' if (INT_MULTIPLY_WRAPV (ulen, ampersands - 1, ) ^~ ./lib/intprops.h:44:55: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits] #define EXPR_SIGNED(e) (_GL_INT_NEGATE_CONVERT (e, 1) < 0) ^ ./lib/intprops.h:590:10: note: in expansion of macro 'EXPR_SIGNED' ? (EXPR_SIGNED (_GL_INT_CONVERT (tmax, b)) \ ^~~ ./lib/intprops.h:408:10: note: in expansion of macro '_GL_INT_MULTIPLY_RANGE_OVERFLOW' && _GL_INT_MULTIPLY_RANGE_OVERFLOW (a, b, 0, (__typeof__ (*(r))) -1)) \ ^~~ src/pinky.c:115:11: note: in expansion of macro 'INT_MULTIPLY_WRAPV' if (INT_MULTIPLY_WRAPV (ulen, ampersands - 1, ) ^~ ./lib/intprops.h:44:55: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits] #define EXPR_SIGNED(e) (_GL_INT_NEGATE_CONVERT (e, 1) < 0) ^ ./lib/intprops.h:597:10: note: in expansion of macro 'EXPR_SIGNED' ? (EXPR_SIGNED (a) \ ^~~ ./lib/intprops.h:408:10: note: in expansion of macro '_GL_INT_MULTIPLY_RANGE_OVERFLOW' && _GL_INT_MULTIPLY_RANGE_OVERFLOW (a, b, 0, (__typeof__ (*(r))) -1)) \ ^~~ src/pinky.c:115:11: note: in expansion of macro 'INT_MULTIPLY_WRAPV' if (INT_MULTIPLY_WRAPV (ulen, ampersands - 1, ) ^~ ./lib/intprops.h:603:11: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits] : ((a) < 0 \ ^ ./lib/intprops.h:408:10: note: in expansion of macro '_GL_INT_MULTIPLY_RANGE_OVERFLOW' && _GL_INT_MULTIPLY_RANGE_OVERFLOW (a, b, 0, (__typeof__ (*(r))) -1)) \
Re: cut -DF
Hello, On 2022-01-06 7:35 a.m., Pádraig Brady wrote: Thanks for taking the time to consolidate options/functionality across different implementations. This is important for users. Some notes below... On 05/01/2022 16:23, Rob Landley wrote: Around 5 years ago toybox added the -D, -F, and -O options to cut: -D Don't sort/collate selections or match -fF lines without delimiter -F Select fields separated by DELIM regex -O Output delimiter (default one space for -F, input delim for -f) As I see it, the main functionalities added here: - reordering of selected fields - adjusted suppression of lines without matching fields - regex delimiter support I see regex support as less important, but still useful. Attached is a suggestion for initial implementation of "cut -FDO". It's split into smaller steps to ease review. The main issue is that the current "cut_fields" and "cut_bytes" are highly optimized for speed, so I left them as-is and created a secondary set of 'cut' functions - slower but with additional options. If this is acceptable, I'll go on to clean up the patches, add more tests and write documentation. There are likely some edge-cases regarding regex matching that need to be decided upon (e.g. BRE or ERE, what about BOL/EOL anchors, groups, etc.). Comments and feedback very welcomed, regards, - assaf >From dbfdef9a720c8ea9ed1a90a4e4c66aa7e0ed3e1f Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Wed, 5 Jan 2022 13:03:39 -0700 Subject: [PATCH 1/9] cut: set-fields: add no-sort options --- src/set-fields.c | 27 +++ src/set-fields.h | 4 +++- 2 files changed, 18 insertions(+), 13 deletions(-) diff --git a/src/set-fields.c b/src/set-fields.c index e3cce30d9..5e4ee6715 100644 --- a/src/set-fields.c +++ b/src/set-fields.c @@ -279,22 +279,25 @@ set_fields (char const *fieldstr, unsigned int options) ? _("missing list of byte/character positions") : _("missing list of fields")); - qsort (frp, n_frp, sizeof (frp[0]), compare_ranges); - - /* Merge range pairs (e.g. 
`2-5,3-4' becomes `2-5'). */ - for (size_t i = 0; i < n_frp; ++i) + if (!(options & SETFLD_NO_SORT)) { - for (size_t j = i + 1; j < n_frp; ++j) + qsort (frp, n_frp, sizeof (frp[0]), compare_ranges); + + /* Merge range pairs (e.g. `2-5,3-4' becomes `2-5'). */ + for (size_t i = 0; i < n_frp; ++i) { - if (frp[j].lo <= frp[i].hi) + for (size_t j = i + 1; j < n_frp; ++j) { - frp[i].hi = MAX (frp[j].hi, frp[i].hi); - memmove (frp + j, frp + j + 1, (n_frp - j - 1) * sizeof *frp); - n_frp--; - j--; + if (frp[j].lo <= frp[i].hi) +{ + frp[i].hi = MAX (frp[j].hi, frp[i].hi); + memmove (frp + j, frp + j + 1, (n_frp - j - 1) * sizeof *frp); + n_frp--; + j--; +} + else +break; } - else -break; } } diff --git a/src/set-fields.h b/src/set-fields.h index 7bc9b3afe..9127d9957 100644 --- a/src/set-fields.h +++ b/src/set-fields.h @@ -34,8 +34,10 @@ enum { SETFLD_ALLOW_DASH = 0x01, /* allow single dash meaning 'all fields' */ SETFLD_COMPLEMENT = 0x02, /* complement the field list */ - SETFLD_ERRMSG_USE_POS = 0x04 /* when reporting errors, say 'position' instead + SETFLD_ERRMSG_USE_POS = 0x04, /* when reporting errors, say 'position' instead of 'field' (used with cut -b/-c) */ + SETFLD_NO_SORT= 0x08 /* Do not sort the fields; keep duplicated + and overlapped fields */ }; /* allocates and initializes the FRP array and N_FRP count */ -- 2.20.1 >From d5d58eeb0bf5a399b2d65e174c72d0f8c11b2c01 Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Wed, 5 Jan 2022 13:04:08 -0700 Subject: [PATCH 2/9] cut: iniitial -D implmentation, currently only with "-f" --- src/cut.c | 161 -- 1 file changed, 156 insertions(+), 5 deletions(-) diff --git a/src/cut.c b/src/cut.c index 5143c8bd9..84caad091 100644 --- a/src/cut.c +++ b/src/cut.c @@ -20,7 +20,9 @@ /* POSIX changes, bug fixes, long-named options, and cleanup by David MacKenzie . - Rewrite cut_fields and cut_bytes -- Jim Meyering. */ + Rewrite cut_fields and cut_bytes -- Jim Meyering. + + Match toybox's -D,-F,-O options -- Assaf Gordon. 
*/ #include @@ -43,7 +45,8 @@ #define AUTHORS \ proper_name ("David M. Ihnat"), \ proper_name ("David MacKenzie"), \ - proper_name ("Jim Meyering") + proper_name ("Jim Meyering"), \ + proper_name ("Assaf Gordon") #define FATAL_ERROR(Message)
Re: cut -DF
Hello Rob and all,

On 2022-01-05 9:23 a.m., Rob Landley wrote:
Around 5 years ago toybox added the -D, -F, and -O options to cut:

  -D Don't sort/collate selections or match -fF lines without delimiter
  -F Select fields separated by DELIM regex
  -O Output delimiter (default one space for -F, input delim for -f)

[...]
Elliott Hughes (the Android base OS maintainer) asked if I could get the feature more widely adopted: your non-POSIX cut(1) extension covers 80% of the in-the-wild use of awk anyway :-)
[...]
This is working and in use in Android, and now in busybox, and it would simplify my regression test suite if coreutils was in sync, so I thought I'd ask if you were interested.

I personally like the idea (at the very least "-D" will indeed replace awk for many simple use-cases).

I'm working on a proof-of-concept (will share later today for feedback and comments).

Do you mind sharing your test suite?

-assaf
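To illustrate why "-D" is attractive, here is a small sketch of how cut behaves today, without any of the proposed options: the field list is sorted and merged, so listing field 3 before field 1 still prints them in input order, and reordering needs something like awk. The sample data is made up for illustration.

```shell
#!/bin/sh
# Existing behaviour (no -D): cut sorts and merges the field list.

# Fields come out in input order, even though 3 was listed first.
printf 'a,b,c\n' | cut -d, -f3,1                          # prints: a,c

# Overlapping ranges are merged: 2-5,3-4 is treated as 2-5.
printf '1,2,3,4,5,6\n' | cut -d, -f2-5,3-4                # prints: 2,3,4,5

# Reordering currently needs a tool such as awk.
printf 'a,b,c\n' | awk -F, -v OFS=, '{ print $3, $1 }'    # prints: c,a
```

The last command is the kind of awk one-liner that a non-sorting cut mode would make unnecessary.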
bug#49741: basenc --base64url decoding bug
tag 49741 fixed close 49741 stop On 2021-08-22 4:15 p.m., Assaf Gordon wrote: Attached a suggested fix. pushed in: https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=709d1f8253072804cc27189a6f2b873d8d563399
bug#50151: Coreutils, aarch64 and chroot
tag 50151 notabug
close 50151
stop

On 2021-08-25 12:54 p.m., Frans de Boer wrote:
On 8/25/21 10:16 AM, Assaf Gordon wrote:

  qemu-aarch64 -strace -L /newroot \
      /newroot/usr/sbin/chroot /newroot /usr/bin/env --version 2>&1 \
      | tee log.txt

@assaf: your suggestions no. 1 and 2 had the predicted results. Thus, suggestion no. 3 failed because of suggestion no. 2. I then followed suggestion 4 and attached the strace output to this message. It seems that chroot is working as expected, only env seems to fail with an error.

Not exactly: the 'chroot' system call *seems* to succeed, followed by a failed "execve(2)" system call to execute another binary. That "execve" call fails, so it is not 'env' per se; it is any program that tries to execute another aarch64 binary.

Learning that, searching for "qemu-user", "chroot" and "architecture" leads to several web pages detailing similar errors (and a few suggested solutions):

https://wiki.gentoo.org/wiki/Crossdev_qemu-static-user-chroot
https://newbedev.com/how-can-i-chroot-into-a-filesystem-with-a-different-architechture
https://ownyourbits.com/2018/06/13/transparently-running-binaries-from-any-architecture-in-linux-with-qemu-and-binfmt_misc/

I hope you have some clue of what is going wrong.

With the above information, we can conclude this is not a bug in coreutils; it is a limitation of the linux+qemu-user setup. So I'm closing this item and marking it as "not a bug", but discussion can continue by replying to this thread.

regards,
- assaf
bug#50151: Coreutils, aarch64 and chroot
Hello,

On 2021-08-24 2:39 a.m., Paul Eggert wrote:
However, I think it'll be a better use of our time for you to debug this one yourself. It doesn't sound like a Coreutils problem; it sounds like a problem in your virtual machine setup, and you're the best expert on that setup.

A few suggestions to check, that might help you and us to troubleshoot:

1. Ensure the binaries are indeed for aarch64:

  file /newroot/usr/sbin/chroot
  file /newroot/usr/bin/env
  file /newroot/usr/bin/bash

It should say something like "ELF 64-bit LSB pie executable, ARM aarch64" for all of them.

2. Ensure each binary works by itself:

  qemu-aarch64 -L /newroot /newroot/usr/sbin/chroot --version
  qemu-aarch64 -L /newroot /newroot/usr/bin/env --version
  qemu-aarch64 -L /newroot /newroot/usr/bin/bash --version

(The actual version doesn't matter here; the main thing is that the qemu user-mode emulator was able to run the binaries.)

On 2021-08-21 4:33 a.m., Frans de Boer wrote:
Running 'qemu-aarch64 -L /newroot /newroot/usr/bin/bash -c /usr/bin/env --help' does show the env help text. So, I guess chroot is to blame?

Note that the above command runs your *host's* /usr/bin/env because chroot is not used: the binary under qemu (/newroot/usr/bin/bash) sees your host's file system. Observe with:

  qemu-aarch64 -L /newroot /newroot/usr/bin/bash -c '/bin/uname -m'
  qemu-aarch64 -L /newroot /newroot/usr/bin/env /bin/uname -m

I'm guessing you will see "x86_64", not "aarch64".

3. What you should try is:

  qemu-aarch64 -L /newroot \
      /newroot/usr/bin/bash -c '/newroot/usr/bin/env --version'

and:

  qemu-aarch64 -L /newroot \
      /newroot/usr/bin/env /newroot/usr/bin/bash --version

In both cases, one aarch64 binary will try to execute another aarch64 binary. Do these work for you, or are you seeing an error?

4. Use qemu's "-strace" to see the syscalls; hopefully that will help pinpoint the cause:

  qemu-aarch64 -strace -L /newroot \
      /newroot/usr/sbin/chroot /newroot /usr/bin/env --version 2>&1 \
      | tee log.txt

If the command results in an error, the "log.txt" file will show more details about what failed. If you're not familiar with 'strace' output, post it here as an email attachment.

Hope this helps,
- assaf

P.S.
On 2021-08-24 2:39 a.m., Paul Eggert wrote:
A complete set of instructions for an outsider to reproduce the problem from scratch. Assume the outsider is running Fedora 34 x86-64 (since that's what I'm running :-).

I'm not familiar with Fedora, but on Debian/x86_64 the following works:

  apt-get install qemu-user
  apt-get install crossbuild-essential-arm64 libc6-arm64-cross
  cd coreutils
  ./configure --host=aarch64-linux-gnu
  make

then:

  $ qemu-aarch64 -L /usr/aarch64-linux-gnu/ ./src/uname -m
  aarch64

Somewhat related:

  $ qemu-aarch64 -L /usr/aarch64-linux-gnu/ ./src/env ./src/uname -m
  /lib/ld-linux-aarch64.so.1: No such file or directory

This fails because once "inside" qemu, the aarch64 binary searches for "/lib/ld-linux-aarch64.so.1" but the file is in "/usr/aarch64-linux-gnu/lib/ld-linux-aarch64.so.1". One possible work-around is to build static binaries. I don't want to assume that is the culprit for Frans, so we'll wait for the logs...
bug#49741: basenc --base64url decoding bug
On 2021-08-17 3:37 a.m., Jim Meyering wrote:
On Tue, Aug 17, 2021 at 2:02 AM Pádraig Brady wrote:
On 16/08/2021 22:17, Assaf Gordon wrote:
Attached a suggested fix.
minor nit in NEWS:
a nit in the commit log:

Thanks, attached updated patch. Will push this week if there are no other comments.

-assaf

From 090663068a23662b36ddc0603fc1c2c752b6aff1 Mon Sep 17 00:00:00 2001
From: Assaf Gordon
Date: Mon, 16 Aug 2021 15:03:36 -0600
Subject: [PATCH] basenc: fix bug49741: using wrong decoding buffer length

Emil Lundberg reports in https://bugs.gnu.org/49741 about a
'basenc --base64 -d' decoding bug.
The input buffer length was not divisible by 3, resulting in
decoding errors.

* NEWS: Mention fix.
* src/basenc.c (DEC_BLOCKSIZE): Change from 1024*5 to 4200 (35*3*5*8)
which is divisible by 3,4,5,8 - satisfying both base32 and base64;
Use compile-time verify() macro to enforce the above.
* tests/misc/basenc.pl: Add test.
---
 NEWS                 | 4 ++++
 src/basenc.c         | 4 +++-
 tests/misc/basenc.pl | 9 +++++++++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index ddec56bdf..efdb1450e 100644
--- a/NEWS
+++ b/NEWS
@@ -60,6 +60,10 @@ GNU coreutils NEWS    -*- outline -*-
   invalid combinations of case character classes.
   [bug introduced in coreutils-8.6]

+  basenc --base64 --decode no longer silently discards decoded characters
+  on (1024*5) buffer boundaries
+  [bug introduced in coreutils-8.31]
+
 ** Changes in behavior

   cp and install now default to copy-on-write (COW) if available.
diff --git a/src/basenc.c b/src/basenc.c
index 5c97a3652..2ffdb2d27 100644
--- a/src/basenc.c
+++ b/src/basenc.c
@@ -213,7 +213,9 @@ verify (DEC_BLOCKSIZE % 12 == 0); /* So complete encoded blocks are used. */
 /* Note that increasing this may decrease performance if --ignore-garbage
    is used, because of the memmove operation below. */
-# define DEC_BLOCKSIZE (1024*5)
+# define DEC_BLOCKSIZE (4200)
+verify (DEC_BLOCKSIZE % 40 == 0); /* complete encoded blocks for base32 */
+verify (DEC_BLOCKSIZE % 12 == 0); /* complete encoded blocks for base64 */
 static int (*base_length) (int i);
 static bool (*isbase) (char ch);
diff --git a/tests/misc/basenc.pl b/tests/misc/basenc.pl
index 3383aaeef..ac5394731 100755
--- a/tests/misc/basenc.pl
+++ b/tests/misc/basenc.pl
@@ -37,6 +37,13 @@ my $base64url_out_nl = $base64url_out;
 $base64url_out_nl =~ s/(..)/\1\n/g; # add newline every two characters
+# Bug 49741:
+# The input is 'abc' in base64, in an 8K buffer (larger than 1024*5,
+# the buffer size which caused the bug).
+my $base64_bug49741_in = "YWJj" x 2000 ;
+my $base64_bug49741_out = "abc" x 2000 ;
+
+
 my $base32_in = "\xfd\xd8\x07\xd1\xa5";
 my $base32_out = "7XMAPUNF";
 my $x = $base32_out;
@@ -111,6 +118,8 @@ my @Tests =
  ['b64u_7', '--base64url -d', {IN=>$base64_out},
   {EXIT=>1}, {ERR=>"$prog: invalid input\n"}],
+ ['b64_bug49741', '--base64 -d', {IN=>$base64_bug49741_in},
+  {OUT=>$base64_bug49741_out}],
--
2.20.1
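The arithmetic behind the two verify() lines in the patch can be checked quickly in the shell. This is only a sketch of the divisibility test; the function name check_blocksize is made up for illustration and is not part of coreutils.

```shell
#!/bin/sh
# Report whether a decode buffer size holds only whole encoded blocks,
# mirroring the patch's two verify() checks (12 for base64, 40 for base32).
# check_blocksize is a hypothetical helper, not a coreutils function.
check_blocksize() {
    size=$1
    if [ $((size % 12)) -eq 0 ] && [ $((size % 40)) -eq 0 ]; then
        echo "$size: ok"
    else
        echo "$size: not a whole number of blocks (mod 12 = $((size % 12)), mod 40 = $((size % 40)))"
    fi
}

check_blocksize $((1024 * 5))   # old value: 5120 leaves 8 bytes over for base64
check_blocksize 4200            # new value: divisible by both 12 and 40
```

The old size passes the base32 check but not the base64 one, which matches the report that only base64 decoding was affected.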
bug#49741: basenc --base64url decoding bug
Hello Emil and all, Thanks for the clear and easily reproducible bug report. Attached a suggested fix. Comments very welcomed, - Assaf >From 11330058443e7cc92b4a53322d810725d42b4e34 Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Mon, 16 Aug 2021 15:03:36 -0600 Subject: [PATCH] basenc: fix bug49741: using wrong decoding buffer length Emil Lundberg reports in https://bugs.gnu.org/49741 about a 'basenc --base64 -d' decoding bug. The input buffer was not divisible by 3, resulting in decoding errors. * NEWS: Mention fix. * src/basenc.c (DEC_BLOCKSIZE): Change from 1024*5 to 4200 (35*3*5*8) which is divisible by 3,4,5,8 - satisfying both base32 and base64; Use compile-time verify() macro to enforce the above. * tests/misc/basenc.pl: Add test. --- NEWS | 4 src/basenc.c | 4 +++- tests/misc/basenc.pl | 9 + 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/NEWS b/NEWS index ddec56bdf..d490ed101 100644 --- a/NEWS +++ b/NEWS @@ -60,6 +60,10 @@ GNU coreutils NEWS-*- outline -*- invalid combinations of case character classes. [bug introduced in coreutils-8.6] + basenc --base64 --decode no longer silently discard decoded characters + on (1024*5) buffer boundaries + [bug introduced in coreutils-8.31] + ** Changes in behavior cp and install now default to copy-on-write (COW) if available. diff --git a/src/basenc.c b/src/basenc.c index 5c97a3652..2ffdb2d27 100644 --- a/src/basenc.c +++ b/src/basenc.c @@ -213,7 +213,9 @@ verify (DEC_BLOCKSIZE % 12 == 0); /* So complete encoded blocks are used. */ /* Note that increasing this may decrease performance if --ignore-garbage is used, because of the memmove operation below. 
*/ -# define DEC_BLOCKSIZE (1024*5) +# define DEC_BLOCKSIZE (4200) +verify (DEC_BLOCKSIZE % 40 == 0); /* complete encoded blocks for base32 */ +verify (DEC_BLOCKSIZE % 12 == 0); /* complete encoded blocks for base64 */ static int (*base_length) (int i); static bool (*isbase) (char ch); diff --git a/tests/misc/basenc.pl b/tests/misc/basenc.pl index 3383aaeef..ac5394731 100755 --- a/tests/misc/basenc.pl +++ b/tests/misc/basenc.pl @@ -37,6 +37,13 @@ my $base64url_out_nl = $base64url_out; $base64url_out_nl =~ s/(..)/\1\n/g; # add newline every two characters +# Bug 49741: +# The input is 'abc' in base64, in an 8K buffer (larger than 1024*5, +# the buffer size which caused the bug). +my $base64_bug49741_in = "YWJj" x 2000 ; +my $base64_bug49741_out = "abc" x 2000 ; + + my $base32_in = "\xfd\xd8\x07\xd1\xa5"; my $base32_out = "7XMAPUNF"; my $x = $base32_out; @@ -111,6 +118,8 @@ my @Tests = ['b64u_7', '--base64url -d', {IN=>$base64_out}, {EXIT=>1}, {ERR=>"$prog: invalid input\n"}], + ['b64_bug49741', '--base64 -d', {IN=>$base64_bug49741_in}, + {OUT=>$base64_bug49741_out}], -- 2.20.1
bug#49741: basenc --base64url decoding bug
Hi, I will also work on it this weekend. -assaf On 2021-08-12 7:37 p.m., Paul Eggert wrote: Simon, this looks like some sort of minor buffering problem in 'basenc --base64', since plain 'base64' works correctly. Is this something you have time to look into? https://bugs.gnu.org/49741
Re: question
Hello,

On 2021-04-29 12:34 p.m., steve.lowder.ctr--- via GNU coreutils General Discussion wrote:
Can you tell me in which version of GNU coreutils the od command added the -endian option?

Looking at the "NEWS" file ( https://git.savannah.gnu.org/cgit/coreutils.git/tree/NEWS#n1135 ), "--endian" was added in version 8.23, released in July 2014 ( https://git.savannah.gnu.org/cgit/coreutils.git/tree/NEWS#n1026 ).

regards,
- assaf
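For reference, a minimal sketch of what the option does: the same four input bytes are printed as one 32-bit hexadecimal word in each byte order. It assumes GNU od from coreutils 8.23 or later; the input bytes are arbitrary.

```shell
#!/bin/sh
# Dump four bytes (01 02 03 04) as a single 32-bit hex word.
# -An suppresses the address column; -tx4 selects 4-byte hex units.
printf '\001\002\003\004' | od -An -tx4 --endian=big      # 01020304
printf '\001\002\003\004' | od -An -tx4 --endian=little   # 04030201
```

Without --endian, od uses the byte order of the machine it runs on.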
Re: [PATCH] wc: Add AVX2 optimization when counting only lines
Hello, On 2021-03-29 7:21 a.m., Pádraig Brady wrote: On 28/03/2021 18:29, Kristoffer Brånemyr via GNU coreutils General I wanted to practice some more using vector intrinsics, so I made a small AVX2 optimization for wc -l. Depending on line length it is about 2-5x faster than previous version. (Well, only looking at user time it is much faster than that even.) Excellent results. I'll review this very soon. I'm attaching the patch (copied from the Github's pull-request), hopefully we can continue the discussion here on the mailing list. -assaf >From 462386ea5aad1b1673f7c1bc51983374aad325a8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kristoffer=20Br=C3=A5nemyr?= Date: Sat, 20 Feb 2021 12:27:17 +0100 Subject: [PATCH] wc: Add AVX2 optimization when counting only lines --- configure.ac | 46 ++ po/POTFILES.in | 1 + src/local.mk | 9 +++ src/wc.c | 162 - src/wc_avx2.c | 115 +++ 5 files changed, 290 insertions(+), 43 deletions(-) create mode 100644 src/wc_avx2.c diff --git a/configure.ac b/configure.ac index 7fbecbf8d..8186b88f1 100644 --- a/configure.ac +++ b/configure.ac @@ -575,6 +575,52 @@ AM_CONDITIONAL([USE_PCLMUL_CRC32], test "x$pclmul_intrinsic_exists" = "xyes"]) CFLAGS=$ac_save_CFLAGS +AC_MSG_CHECKING([if __get_cpuid_count exists]) +AC_COMPILE_IFELSE( + [AC_LANG_SOURCE([[ +#include + +int main(void) +{ + unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0; + __get_cpuid_count(7, 0, , , , ); + return 1; +} + ]]) + ],[ +AC_MSG_RESULT([yes]) +get_cpuid_count_exists=yes + ],[ +AC_MSG_RESULT([no]) + ]) + +CFLAGS="-mavx2 $CFLAGS" +AC_MSG_CHECKING([if avx2 intrinstics exists]) +AC_COMPILE_IFELSE( + [AC_LANG_SOURCE([[ +#include + +int main(void) +{ + __m256i a, b; + a = _mm256_sad_epu8(a, b); + return 1; +} + ]]) + ],[ +AC_MSG_RESULT([yes]) +AC_DEFINE([HAVE_AVX2_INTRINSIC], [1], [avx2 intrinsics exists]) +avx2_intrinsic_exists=yes + ],[ +AC_MSG_RESULT([no]) + ]) +if test "x$get_cpuid_count_exists" = "xyes" && test "x$avx2_intrinsic_exists" = "xyes"; then + 
AC_DEFINE([USE_AVX2_WC_LINECOUNT], [1], [Counting lines with AVX2 enabled]) +fi +AM_CONDITIONAL([USE_AVX2_WC_LINECOUNT], [test "x$get_cpuid_count_exists" = "xyes" && test "x$avx2_intrinsic_exists" = "xyes"]) + +CFLAGS=$ac_save_CFLAGS + dnl Autogenerated by the 'gen-lists-of-programs.sh' auxiliary script. diff --git a/po/POTFILES.in b/po/POTFILES.in index b5f5bbff1..dc80762db 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -142,6 +142,7 @@ src/unlink.c src/uptime.c src/users.c src/wc.c +src/wc_avx2.c src/who.c src/whoami.c src/yes.c diff --git a/src/local.mk b/src/local.mk index 8c8479a53..c6555dafb 100644 --- a/src/local.mk +++ b/src/local.mk @@ -427,6 +427,15 @@ src_basenc_CPPFLAGS = -DBASE_TYPE=42 $(AM_CPPFLAGS) src_expand_SOURCES = src/expand.c src/expand-common.c src_unexpand_SOURCES = src/unexpand.c src/expand-common.c +src_wc_SOURCES = src/wc.c +if USE_AVX2_WC_LINECOUNT +noinst_LIBRARIES += src/libwc_avx2.a +src_libwc_avx2_a_SOURCES = src/wc_avx2.c +wc_avx2_ldadd = src/libwc_avx2.a +src_wc_LDADD += $(wc_avx2_ldadd) +src_libwc_avx2_a_CFLAGS = -mavx2 $(AM_CFLAGS) +endif + # Ensure we don't link against libcoreutils.a as that lib is # not compiled with -fPIC which causes issues on 64 bit at least src_libstdbuf_so_LDADD = $(LIBINTL) diff --git a/src/wc.c b/src/wc.c index 5216db189..1ecec0d83 100644 --- a/src/wc.c +++ b/src/wc.c @@ -37,6 +37,9 @@ #include "safe-read.h" #include "stat-size.h" #include "xbinary-io.h" +#ifdef USE_AVX2_WC_LINECOUNT +#include +#endif #if !defined iswspace && !HAVE_ISWSPACE # define iswspace(wc) \ @@ -53,6 +56,15 @@ /* Size of atomic reads. 
*/ #define BUFFER_SIZE (16 * 1024) +static +bool wc_lines(const char *file, int fd, uintmax_t *lines_out, uintmax_t *bytes_out); +#ifdef USE_AVX2_WC_LINECOUNT +/* From wc_avx2.c */ +bool wc_lines_avx2(const char *file, int fd, uintmax_t *lines_out, uintmax_t *bytes_out); +#endif +bool (*wc_lines_p)(const char *file, int fd, uintmax_t *lines_out, uintmax_t *bytes_out) = wc_lines; + + /* Cumulative number of lines, words, chars and bytes in all files so far. max_line_length is the maximum over all files processed so far. */ static uintmax_t total_lines; @@ -108,6 +120,41 @@ static struct option const longopts[] = {NULL, 0, NULL, 0} }; +#ifdef USE_AVX2_WC_LINECOUNT +static bool +avx2_supported(void) +{ + unsigned int eax = 0; + unsigned int ebx = 0; + unsigned int ecx = 0; + unsigned int edx = 0; + + if (! __get_cpuid(1, , , , )) +{ + return false; +} + + if (! (ecx & bit_OSXSAVE)) +{ + return false; +} + + eax = ebx = ecx = edx = 0; + + if (! __get_cpuid_count(7, 0, , , , )) +{ + return
bug#44704: uniq: replace repeated lines with a message about how many repeated lines
tag 44704 notabug
severity 44704 wishlist
stop

Hello,

On 2020-11-17 6:32 a.m., Brian J. Murrell wrote:
It would be a useful enhancement to uniq to replace all lines considered non-uniq (i.e. those that would be removed from the output) with a message about how many times the previous line was repeated. I.e. $ cat < [...]

uniq supports the "--group" option, which adds a blank line after each group of identical lines; this can be used downstream to process groups in any way you want.

Example:

  $ cat <<EOF > in
  first line
  second line
  repeated line
  repeated line
  repeated line
  repeated line
  repeated line
  third line
  EOF

  $ cat in | uniq --group=append
  first line

  second line

  repeated line
  repeated line
  repeated line
  repeated line
  repeated line

  third line


  $ cat in | uniq --group=append \
      | awk '$0=="" { print "do something after group" ; next } ; 1 { print }'
  first line
  do something after group
  second line
  do something after group
  repeated line
  repeated line
  repeated line
  repeated line
  repeated line
  do something after group
  third line
  do something after group

And with counting:

  $ cat in | uniq --group=append \
      | awk 'BEGIN { c = 0 } ; $0=="" { print "Group has " c " lines" ; c=0 ; next } ; 1 { print ; c++ }'
  first line
  Group has 1 lines
  second line
  Group has 1 lines
  repeated line
  repeated line
  repeated line
  repeated line
  repeated line
  Group has 5 lines
  third line
  Group has 1 lines

Hope this helps.

More information about "uniq --group=X" is here:
https://www.gnu.org/software/coreutils/manual/html_node/uniq-invocation.html

I'm marking this as "notabug/wishlist", but will likely close soon as "wontfix" unless we come up with a convincing argument why "--group" is not sufficient for your use case. Regardless of the status, discussion can continue by replying to this thread.

regards,
- assaf
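For the exact output requested (each line once, followed by a note on how often it was repeated), a similar pipeline can be built on "uniq -c" instead. This is only a sketch: the function name collapse_repeats and the wording of the message are invented here.

```shell
#!/bin/sh
# Print each run of identical lines once, followed by a note when the
# line was repeated. "uniq -c" prefixes every line with a right-aligned
# count and a single space. collapse_repeats is a hypothetical name.
collapse_repeats() {
    uniq -c | awk '{
        count = $1
        sub(/^ *[0-9]+ /, "")      # strip the count prefix, keep the line
        print
        if (count > 1)
            print "(previous line repeated " (count - 1) " more times)"
    }'
}

printf '%s\n' 'first line' 'repeated line' 'repeated line' \
    'repeated line' 'third line' | collapse_repeats
```

Like uniq itself, this only collapses adjacent duplicates, so unsorted input keeps its original order.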
Re: Enhancement Request: sort: skip table caption (or just a specified number of lines)
Hello,

On 2020-11-05 10:23 a.m., Michael Mess wrote:
I have a feature request for the sort command: I would like to sort a table but do not want to sort the column names at the top. Thus the column names or a specified number of lines should be just given out as they are, unsorted.

This has been discussed a few times in the past; please see the discussion (with further links to other relevant postings) at:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22057

I know there is a workaround, but this is not so handy/comfortable and has other disadvantages:
for a in ColumnName 4 2 3 1 ; do echo $a ;done | ( sed -u 1q ; sort -n )

As for handy/comfortable, you can create a shell alias or a shell function, e.g.:

$ alias sortcap="(sed -u 1q ; sort)"
$ printf "%s\n" ColumnName 4 2 3 1 | sortcap
ColumnName
1
2
3
4

But also (shameless plug), long ago I wrote a perl wrapper for 'sort' that hides these details and accepts the same parameters as 'sort' with the addition of a "--header N" argument, similar to your requested "--caption".
https://github.com/agordon/bin_scripts/blob/master/scripts/sort-header.pl

Hope this helps.
- Assaf
bug#43684: Problem with numerical splitting with files > 90*l
On 29/09/2020 02:18, ned haughton wrote:
When splitting with -d, the numbering screws up after 89:

In addition to Pádraig's explanation, please see the previous similar discussion here:
https://lists.gnu.org/archive/html/bug-coreutils/2017-02/msg00050.html
http://bugs.gnu.org/25832

regards,
- assaf
bug#42340: Fwd: bug#42340: "join" reports that "sort"ed input is not sorted
Hello,

On 2020-07-15 2:12 p.m., Beth Andres-Beck wrote:
If that is the intended behavior, the bug is that:
printf '12,\n1,\n' | sort -t, -k1 -s
1,
12,
does _not_ take the remainder of the line into account, and only sorts on the initial field, prioritizing length. It is at the very least unexpected that adding an `a` to the end of both lines would change the sort order of those lines:
printf '12,a\n1,a\n' | sort -t, -k1 -s
12,a
1,a

Not a bug, just an incomplete usage :)

sort's -k/--key parameter takes two values (the second being optional): the first and last column to use as the key. If the second value is omitted (as in your case), then the key is taken from the first field to the end of the line.

And so:
"sort -k1,1" means take the first *and only the first* field as the key.
"sort -k1" means take the first field until the end of the line as the key.
"sort -k1,3" means take the first, second and third fields as the single key.
"sort -k1,1 -k2,2 -k3,3" means take the first field as the first key, second field as the second key, and third field as the third key.

---

The "--debug" option can help illustrate what sort is doing, by adding underscore characters to show which characters are being used as keys in each line.

Consider the following:

$ printf '12,\n1,\n' | sort -t, -k1 -s --debug
sort: using ‘en_CA.utf8’ sorting rules
1,
__
12,
___

$ printf '12,\n1,\n' | sort -t, -k1,1 -s --debug
sort: using ‘en_CA.utf8’ sorting rules
1,
_
12,
__

In the first example, the "-k1" means from first field till end of line, the underscore includes the "," characters. In the second example, the "-k1,1" means only the first field, and the comma is not used.
Now consider your second case of adding an "a" at the end of each line:

$ printf '12,a\n1,a\n' | sort -t, -k1 -s --debug
sort: using ‘en_CA.utf8’ sorting rules
12,a
____
1,a
___

$ printf '12,a\n1,a\n' | sort -t, -k1,1 -s --debug
sort: using ‘en_CA.utf8’ sorting rules
1,a
_
12,a
__

In the first example, "-k1" means: from first field until the end of the line, and so the entire string "12,a" is compared against "1,a". **AND**, because the locale is a "utf-8" locale, punctuation characters are ignored (as mentioned in the previous email in this thread). So effectively the compared strings are "12a" vs "1a". The ASCII value of "2" is smaller than the ASCII value of "a", and therefore "12a" appears before "1a".

If we force C locale, then the order is reversed:

$ printf '12,a\n1,a\n' | LC_ALL=C sort -t, -k1 -s --debug
sort: using simple byte comparison
1,a
___
12,a
____

Because now punctuation characters are used, and the ASCII value of "," is smaller than the ASCII value of "2".

**HOWEVER**, this result of using "LC_ALL=C" together with "-k1" is only correct by a happy accident :) it is still very likely that "-k1" is not what you wanted - you probably meant to do "-k1,1".

---

Lastly, the "-s/--stable" option in the above contrived examples is superfluous - it doesn't affect the output order because there are no equal field values (i.e. "1" vs "12").

A slightly better example to illustrate how "-s" affects ordering is this:

$ printf "2,x\n1,a\n2,b\n" | sort -t, -k1,1
1,a
2,b
2,x

$ printf "2,x\n1,a\n2,b\n" | sort -t, -k1,1 -s
1,a
2,x
2,b

Here, "1" comes before "2" - that's obvious. But should "2,b" come before "2,x" ?

If we do not use "-s/--stable", then "sort" ALSO does one additional comparison of the entire line as a last step (hence "sort --help" says "[disable] last-resort comparison" about "-s/--stable"). The substring ",b" comes before ",x" - therefore "2,b" appears first.
If we add "-s/--stable", the last comparison step of the entire line is skipped, and the lines of "2" appear in the order they were in the input (hence - "stable").

By using "--debug" we can see the additional comparison step (indicated by additional underscore lines):

$ printf "2,x\n1,a\n2,b\n" | sort -t, -k1,1 --debug
sort: using ‘en_CA.utf8’ sorting rules
1,a
_
___
2,b
_
___
2,x
_
___

$ printf "2,x\n1,a\n2,b\n" | sort -t, -k1,1 -s --debug
sort: using ‘en_CA.utf8’ sorting rules
1,a
_
2,x
_
2,b
_

---

Hope this helps.

regards,
- assaf
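The difference between "-k1" and "-k1,1" can also be checked with a small, locale-independent sketch. The two function names below are hypothetical; LC_ALL=C and "-s" are used so that the result depends only on how far the key extends.

```shell
# Hypothetical helpers: both sort comma-separated lines from STDIN using
# plain byte comparison (LC_ALL=C) and no last-resort comparison (-s).

# Key is field 1 only; lines with equal keys keep their input order.
sort_field1_only() { LC_ALL=C sort -t, -k1,1 -s; }

# Key runs from the start of field 1 to the end of the line.
sort_field1_to_eol() { LC_ALL=C sort -t, -k1 -s; }

# Both input lines have "1" as their first field.
printf '1,b\n1,a\n' | sort_field1_only     # keys are equal: 1,b stays first
printf '1,b\n1,a\n' | sort_field1_to_eol   # "1,a" < "1,b": 1,a comes first
```

With "-k1,1" the two keys are identical, so the stable sort leaves the lines alone; with "-k1" the rest of the line takes part in the comparison and the order changes.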
bug#42340: "join" reports that "sort"ed input is not sorted
tags 42340 notabug
close 42340
stop

Hello,

On 2020-07-12 5:57 p.m., Beth Andres-Beck wrote:
In trying to use `join` with `sort` I discovered odd behavior: even after running a file through `sort` using the same delimiter, `join` would still complain that it was out of order. [...]

Here is a way to reproduce the problem:
printf '1.1.1,2\n1.1.12,2\n1.1.2,1' | sort -t, > a.txt
printf '1.1.12,a\n1.1.1,b\n1.1.21,c' | sort -t, > b.txt
join -t, a.txt b.txt
join: b.txt:2: is not sorted: 1.1.1,b

The expected behavior would be that if a file has been sorted by "sort" it will also be considered sorted by join. [...] I traced this back to what I believe to be a bug in sort.c

This is not a bug in sort or join, just a side-effect of the locale on your system on the sorting results.

By forcing a C locale with "LC_ALL=C" (meaning simple ASCII order), the files are ordered in the same way 'join' expected them to be:

$ printf '1.1.1,2\n1.1.12,2\n1.1.2,1' | LC_ALL=C sort -t, > a.txt
$ printf '1.1.12,a\n1.1.1,b\n1.1.21,c' | LC_ALL=C sort -t, > b.txt
$ join -t, a.txt b.txt
1.1.1,2,b
1.1.12,2,a

---

More details:

I'm going to assume your system uses some locale based on UTF-8. You can check it by running 'locale', e.g. on my system:

$ locale
LANG=en_CA.utf8
LANGUAGE=en_CA:en
LC_CTYPE="en_CA.utf8"
..
..

Under most UTF-8 locales, punctuation characters are *ignored* in the compared input lines. This might be confusing and non-intuitive, but that's the way most systems have been working for many years (locale ordering is defined in the GNU C Library, and coreutils has no way to change it).

Observe the following:

$ printf '12,a\n1,b\n' | LC_ALL=en_CA.utf8 sort
12,a
1,b

$ printf '12,a\n1,b\n' | LC_ALL=C sort
1,b
12,a

With a UTF-8 locale, the comma character is ignored, and then "12a" appears before "1b" (since the character '2' comes before the character 'b').
With "C" locale, forcing ASCII or "byte comparison", punctuation characters are not ignored, and "1,b" appears before "12,a" (because the ASCII value of the comma ',' is 44, which is smaller than the ASCII value of the digit '2').

---

Somewhat related: Your sort command defines the delimiter ("-t,") but does not define which columns to sort by; sort then uses the entire input line - and there's no need to specify a delimiter at all.

---

As such, I'm closing this as "not a bug", but discussion can continue by replying to this thread.

regards,
- assaf
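Putting the two points above together (same locale for both tools, and an explicit key for sort), a rough sketch of a wrapper could look like the following. The function name "join_csv_c" and its two-file interface are assumptions made for illustration; it relies on mktemp being available.

```shell
# Hypothetical helper: join two comma-separated files on their first field.
# Sorting and joining both run under LC_ALL=C, so 'sort' and 'join' agree on
# the ordering, and "-k1,1" restricts the sort key to the join field.
join_csv_c() {
  a_sorted=$(mktemp) || return
  b_sorted=$(mktemp) || return
  LC_ALL=C sort -t, -k1,1 "$1" > "$a_sorted"
  LC_ALL=C sort -t, -k1,1 "$2" > "$b_sorted"
  LC_ALL=C join -t, "$a_sorted" "$b_sorted"
  status=$?
  # Clean up the temporary files and report join's exit status.
  rm -f "$a_sorted" "$b_sorted"
  return "$status"
}
```

For example, "join_csv_c a.txt b.txt" with the unsorted data from the report should print the same two joined lines as the LC_ALL=C commands shown earlier.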
Re: [PATCH] maint: add sm3sum based on OSCCA SM3 secure hash
Hello, On 2020-06-09 12:23 a.m., Tianjia Zhang wrote: Add message digest program sm3sum, it use OSCCA SM3 secure hash (OSCCA GM/T 0004-2012 SM3) generic hash transformation. There has already been a discussion about adding SM3 to coreutils three years ago, and it was decided against adding it: https://lists.gnu.org/archive/html/coreutils/2017-10/msg00043.html regards, - assaf
Re: [PATCH] md5sum: add an option to change directory
Hello,

On 2020-05-30 3:59 p.m., Bertrand Jacquin wrote:
[...] This definitely make sense

$ sha256sum -C /etc fstab
b5d6c0e5e6bc419b134478ad7b3e7c8cc628049876a7772cea469e81e4b0e0e5 fstab

The net effect is that just the output has changed to omit the path name. Maybe this wants to be a --strip or -p option like with diff or patch, or --basename-only to strip a variable number of components, leaving only the last.

This seems to be a better approach indeed. I just sent a new patch using base_name from coreutils itself.

The GNU Datamash program can do basename and dirname on a column of a text file, producing the wanted results (and more):

$ md5sum /etc/fstab world.txt | datamash -W --full basename 2 dirname 2
b50f98cdf2d6e26a99040ad5386b0884 /etc/fstab fstab /etc
b1946ac92492d2347c6235b4d2611184 world.txt world.txt .

And this will work on any input without the need to duplicate functionality in multiple programs.

-assaf
Re: Extend uniq to support unsorted list based on hashtable
Hello,

On 2020-05-29 10:16 p.m., Yair Lenga wrote:
Wanted to suggest that the team will look (again) at implementing --unsorted option for 'uniq'. The idea was proposed (and rejected) about 10 years ago (https://lists.gnu.org/archive/html/coreutils/2011-11/msg00016.html). Lot of things have changed from the past. [...] Can you advise/provide feedback. I'm sure that there will be many volunteers (me included) to contribute to such important improvement.

"uniq" is standardized by POSIX to work by "comparing adjacent lines" (from: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/uniq.html ) - hence the requirement to pre-sort the input.

While it could be extended with a completely different hash-based implementation, I don't think this is likely to happen.

As an alternative (and a shameless plug), allow me to point to GNU Datamash ( https://www.gnu.org/software/datamash/ ).

On one hand, it already has a hash-based implementation to remove duplicated fields (called "rmdup"). Consider the following contrived example:

$ (printf "%s\t%s\n" 9 B 3 A ; seq 10 | paste - -) | datamash rmdup 1
9 B
3 A
1 2
5 6
7 8

And on the other hand, because 'datamash' is non-standard, there's less of a problem in adding new functionality (i.e. "bloat" is not as big a concern as it is for coreutils).

Hope this helps.

regards,
- assaf
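If the goal is simply to drop repeated lines from unsorted input, a hash-based filter can also be sketched with awk, which is already installed on most systems. The function name "uniq_unsorted" is made up for illustration; note that this approach keeps every distinct line in memory, so it is a sketch for moderately sized inputs rather than a replacement for "sort | uniq".

```shell
# Hypothetical helper: print each distinct input line once, keeping the
# first occurrence and the original order. No pre-sorting is required.
# 'seen' is an awk associative array (hash) keyed by the whole line; the
# pattern is true only the first time a given line is encountered.
uniq_unsorted() {
  awk '!seen[$0]++' "$@"
}

printf '%s\n' b a b c a | uniq_unsorted   # prints b, a, c (one per line)
```

Like uniq, it reads STDIN when no file arguments are given.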
Re: [PATCH] md5sum: add an option to change directory
Hello,

On 2020-05-20 3:15 p.m., Bertrand Jacquin wrote:
In the fashion of make and git, add the ability for all sum tools to change directory before reading a file. [...]

$ sha256sum -C /etc fstab
b5d6c0e5e6bc419b134478ad7b3e7c8cc628049876a7772cea469e81e4b0e0e5 fstab

I'm not entirely sure what the use case is, but GNU "env(1)" already has a '-C/--chdir' option which does exactly what you want (since version 8.28, released in 2017):

env -C /etc sha256sum fstab

Or the (longer) shell construct:

(cd /etc && sha256sum fstab)

regards,
- assaf
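The subshell construct can be turned into a small reusable function. This is only a sketch: the name "sum_in_dir" is hypothetical, and sha256sum is hard-coded although any of the checksum tools would work the same way.

```shell
# Hypothetical helper: sum_in_dir DIR FILE...
# Runs sha256sum on FILE... relative to DIR, so the printed names carry no
# directory prefix. The subshell keeps the caller's working directory intact.
sum_in_dir() {
  dir=$1
  shift
  (cd "$dir" && sha256sum "$@")
}
```

For example, "sum_in_dir /etc fstab" prints the checksum followed by just "fstab", the same output as the "env -C" form.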
Re: suggestion: /etc/dd.conf
Hello,

On 2020-04-28 3:14 a.m., turgut kalfaoğlu wrote:
I would like to suggest and in fact volunteer to create a conf file option to 'dd'.

Adding to others' replies: similar suggestions for Coreutils configuration files have been discussed in the past, and rejected:
https://www.gnu.org/software/coreutils/rejected_requests.html#gnuconf

regards,
- assaf
Re: decorate - new sorting-helper program (experimental)
Hello,

Just a quick note that the "decorate" program (explained below) was just released as part of GNU datamash 1.7:
https://lists.gnu.org/archive/html/info-gnu/2020-04/msg00011.html

Comments, suggestions and feedback are very welcomed.

On 2020-04-13 1:14 p.m., Assaf Gordon wrote:
[...]
Re: decorate - new sorting-helper program (experimental)
Hello Bernhard,

Thanks for the feedback and thanks for trying it (or trying to try it :) ).

On 2020-04-14 12:51 a.m., Bernhard Voelker wrote:
On 2020-04-13 21:14, Assaf Gordon wrote:
I'm happy to announce the first experimental release of the "decorate" program. The program is part of the "datamash" package, and available here:
https://alpha.gnu.org/gnu/datamash/datamash-1.5.17-735b.tar.gz

I'm a bit confused. I've just pulled from 'git://git.sv.gnu.org/datamash.git', but the decorate sources are not there yet, but instead, there's a 'v1.6' tag which doesn't fit into above's "1.5.17-735b" versioning. Do you push somewhere else?

Sorry about that, it's in such a preliminary state that I didn't want to push it yet (certainly not to the master branch).

I have now added a new "decorate" branch, which contains the (still messy) code. It was branched off a version prior to 1.6, hence the version issue.

To test it please try:

git clone -b decorate git://git.sv.gnu.org/datamash.git

Once it stabilizes I will of course clean it, squash it, and push it to the "master" branch.

regards,
- assaf
decorate - new sorting-helper program (experimental)
Hello,

I'm happy to announce the first experimental release of the "decorate" program.

'decorate' works in tandem with coreutils' sort(1) to allow new sorting methods (e.g. IP addresses, roman numerals, string lengths).

This is a new program but an old idea, suggested by Pádraig here:
https://lists.gnu.org/r/bug-coreutils/2015-06/msg00076.html

---

The program is part of the "datamash" package, and available here:
https://alpha.gnu.org/gnu/datamash/datamash-1.5.17-735b.tar.gz

"./configure && make" should give you the "decorate" executable.

The rest of this (long) email shows usage information and examples.

This is an experimental version, and everything could still change. Comments, suggestions and feedback are *very* welcomed.

regards,
- assaf

General Usage

The general idea is:
1. convert a field of an input file to a format that can be easily sorted by sort(1), e.g., converting roman numerals to their decimal equivalent or IPv4 addresses to 32 bit hex value.
2. Pass this converted (=decorated) input to sort
3. remove (=undecorate) the converted fields.

Example 1:

### convert roman-numerals, add new field
$ printf "%s\n" C V III IX XI | ./decorate -k1,1:roman --decorate
00100 C
00005 V
00003 III
00009 IX
00011 XI

### combine decorate-sort-undecorate
$ printf "%s\n" C V III IX XI \
    | ./decorate -k1,1:roman --decorate \
    | sort -k1,1 \
    | ./decorate --undecorate 1
III
V
IX
XI
C

Easy/automatic 'decorate-sort-undecorate' method

Since the decorate-sort-undecorate pattern is repetitive, the "decorate" program can execute 'decorate + sort + undecorate' automatically (forking + piping to sort and back). This is done when "--decorate" and "--undecorate" arguments are *not* specified (i.e. - decorate is used as a 'sort' wrapper):

$ printf "%s\n" C V III IX XI | ./decorate -k1,1:roman
III
V
IX
XI
C

Conversions Syntax

The -k/--key specification follows sort(1), with the addition of allowing a conversion function name following ":" (colons).
Examples:

$ printf "MMXX III\n" | ./decorate --decorate -k1,1:roman
02020 MMXX III

$ printf "MMXX III\n" | ./decorate --decorate -k1.2,1:roman
01020 MMXX III

$ printf "MMXX III\n" | ./decorate --decorate -k1,1:strlen
4 MMXX III

$ printf "MMXX III\n" | ./decorate --decorate -k1:strlen
8 MMXX III

The "r" (=reverse) flag can also be used:

$ printf "%s\n" X I IV IX VI | ./decorate -k1,1:roman
I
IV
VI
IX
X

$ printf "%s\n" X I IV IX VI | ./decorate -k1,1r:roman
X
IX
VI
IV
I

Available conversions methods:

  as-is     copy as-is
  roman     roman numerals
  strlen    length (in bytes) of the specified field
  ipv4      dotted-decimal IPv4 addresses
  ipv6      IPv6 addresses
  ipv4inet  number-and-dots IPv4 addresses (incl. octal, hex values)

Examples:

$ printf "%s\n" 10.2.3.4 8.9.7.3 | ./decorate --decorate -k1,1:ipv4
0A020304 10.2.3.4
08090703 8.9.7.3

$ printf "%s\n" 10.010.0x10.10 192.168 \
    | ./decorate --decorate -k1,1:ipv4inet
0A08100A 10.010.0x10.10
C00000A8 192.168

$ printf "%s\n" :: 2000::1234 :::192.168.1.42 \
    | ./decorate --decorate -k1,1:ipv6
0000:0000:0000:0000:0000:0000:0000:0000 ::
2000:0000:0000:0000:0000:0000:0000:1234 2000::1234
0000:0000:0000:0000:0000:0000:C0A8:012A :::192.168.1.42

Mixing -k/--key for decorating and sorting

When 'decorate' automatically runs sort(1), any keys that are not used for decoration are passed to 'sort' (after being adjusted for the right column).
Example:

$ printf "%-2s %d\n" C 4 IC 1 I 107 II 4 C 31 I 19 \
    | ./decorate -k1,1:roman -k2nr,2
I  107
I  19
II 4
IC 1
C  31
C  4

$ printf "%-2s %d\n" C 4 IC 1 I 107 II 4 C 31 I 19 \
    | ./decorate -k2n,2 -k1,1:roman
IC 1
II 4
C  4
I  19
C  31
I  107

To better understand what parameters are passed to sort(1), use "--print-sort-args" (which only prints the arguments to be used with sort(1) but does not decorate or sort the input):

Here, "decorate" knows that a new field will be added (the converted roman numerals), and so the "-k2nr,2" is adjusted to be "-k3,3nr":

$ ./decorate --print-sort-args -k1,1:roman -k2nr,2
sort -k1,1 -k3,3nr

Here, "decorate" will add two fields (first ipv4 from field 2, and roman numerals from field 3). The "-k5,5V" is adjusted to be "-k7,7V":

$ ./decorate --print-sort-args -k5,5V -k2,2:ipv4 -k3,3:roman
sort -k7,7V -k1,1 -k2,2

Other sort(1) parameters

When 'decorate' automatically runs sort(1), several common sort(1) options are accepted and passed as-is to sort.

Example:

$ ./decorate --print-sort-args -k2,2:ipv4 \
    --stable \
bug#40530: feature proposal: coreutils -> sort: adding sorting ability for Hebrew numerals
Hello,

> On Apr 9, 2020, at 3:23 PM, Zeev Pekar wrote:
>
> it would be nice to be able to sort (coreutils -> sort) Hebrew numerals:

An interesting idea, but I think it is a bit too niche to be included in the coreutils “sort” program (tradeoff of usefulness vs bloat).

However, such functionality is very suitable to an old idea of an auxiliary “decorate” program that will allow many more sorting options when used in tandem with “sort”.

I’ve started writing such a program some time ago, based on Pádraig's idea (never completed, but perhaps these days are the perfect opportunity to complete it):
https://lists.gnu.org/archive/html/coreutils/2019-03/msg00056.html

Would you like to try your hand at coding the sorting rules for such a Hebrew-numerals sort?

regards,
- Assaf
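To make the "decorate" idea concrete, here is a minimal sketch of the same decorate/sort/undecorate steps using only standard tools, with a much simpler key (line length) than Hebrew numerals. This is not the datamash "decorate" program; the function name "sort_by_length" is made up for illustration.

```shell
# Hypothetical helper: sort lines from STDIN by their length.
# Step 1 (decorate):   awk prefixes each line with its length and a TAB.
# Step 2 (sort):       sort orders numerically on the added first field;
#                      -s keeps the input order of equally long lines.
# Step 3 (undecorate): cut removes the added field again.
sort_by_length() {
  awk '{ print length($0) "\t" $0 }' |
    LC_ALL=C sort -n -s -k1,1 |
    cut -f2-
}

printf '%s\n' ccc a bb dd | sort_by_length   # prints a, bb, dd, ccc
```

A Hebrew-numerals sort would follow the same shape, with the awk step replaced by a conversion from the numeral to its decimal value.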
Re: altchars for base64
Hello,

On 2020-03-15 12:12 a.m., Kaz Kylheku (Coreutils) wrote:
On 2020-03-14 22:20, Peng Yu wrote:
Python base64 decoder has the altchars option. [...] But I don't see such an option in coreutils' base64. Can this option be added? Thanks.

# use %* instead of +/:
base64 whatever | tr '+/' '%*'

The reason for alternative characters is typically to use them in URLs, where "/" and "+" are problematic.

A new command "basenc" was introduced in coreutils version 8.31 (released last year) which supports multiple encodings. One of these is a "web-safe" variant of base64, as defined in RFC4648 section 5:

$ printf '\376\117\202' | basenc --base64
/k+C

$ printf '\376\117\202' | basenc --base64url
_k-C

regards,
- assaf

P.S. The other supported encodings are (basenc --help):

  --base64     same as 'base64' program (RFC4648 section 4)
  --base64url  file- and url-safe base64 (RFC4648 section 5)
  --base32     same as 'base32' program (RFC4648 section 6)
  --base32hex  extended hex alphabet base32 (RFC4648 section 7)
  --base16     hex encoding (RFC4648 section 8)
  --base2msbf  bit string with most significant bit (msb) first
  --base2lsbf  bit string with least significant bit (lsb) first
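On systems whose coreutils predates "basenc", the same URL-safe alphabet can be approximated by combining the "tr" suggestion with the RFC4648 section 5 mapping ("+" becomes "-", "/" becomes "_"). The function names below are hypothetical, and this sketch keeps any "=" padding as-is.

```shell
# Hypothetical helpers: URL-safe base64 built from 'base64' and 'tr'.

# Encode STDIN; line wrapping added by base64 is removed first.
# The tr sets are ordered so that neither starts with '-'.
b64url_encode() {
  base64 | tr -d '\n' | tr '/+' '_-'
}

# Decode STDIN by mapping the URL-safe characters back before decoding.
b64url_decode() {
  tr '_-' '/+' | base64 -d
}

printf '\376\117\202' | b64url_encode   # prints _k-C
```

The encoded value matches the "basenc --base64url" output shown above for the same three input bytes.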
Re: Suggestion: Keep headings when sorted
Hello,

On 2020-01-21 2:14 a.m., Mattias Johansson wrote:
I often find that I want to keep one or a few lines untouched by sort, and end up using something like this:

$ awk 'NR == 1; NR > 1 { print $0 | "sort" }'

It would be handy if sort had an option for 'number of heading lines' or similar! I imagine something like this:

$ sort -H # keeps first line in place while sorting the rest

Adding "skip-header" support to GNU sort has been requested and discussed several times in the past (including by me, seven years ago...). The decision was that such functionality can be easily achieved using existing tools.

For some more details and past discussions, please see:
https://www.gnu.org/software/coreutils/rejected_requests.html#sort
https://lists.gnu.org/archive/html/coreutils/2013-01/msg00027.html
https://lists.gnu.org/archive/html/coreutils/2014-11/msg00022.html
https://lists.gnu.org/archive/html/coreutils/2015-10/msg00102.html

---

The simplest method is:

$ ANY-PROGRAM | ( sed -u 1q ; sort )

This is slightly simpler and shorter than the above "awk" method. It requires GNU sed for the "-u/--unbuffered" option.

The above sed+sort invocation can be made into a shell function:

sorth() { sed -u 1q ; sort "$@"; }

And then use "sorth" instead of "sort" (noting that the main difference is that "sort" can take input files on the command line, while "sorth" must take the input from STDIN).

---

Change "1q" to "3q" or other values to keep more than one line of headers at the top of the input.

The above shell function can be improved into:

sorth() { num=$1 ; shift ; sed -u ${num}q ; sort "$@"; }

This accepts the number of header lines as a (required) first parameter, e.g. the following will keep the first three values intact and randomize the remaining 7:

seq 10 | sorth 3 -R

---

If all else fails, and such a sort-header program is still needed, I can offer my own attempt at such a perl-wrapper script, which I wrote before knowing about the "sed/sort" method:
https://github.com/agordon/bin_scripts/blob/master/scripts/sort-header.pl

Hope this helps,

regards,
Assaf
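Where GNU sed's "-u" is not available, a similar effect can be sketched with the shell's own "read", which consumes its input one line at a time and leaves the rest of STDIN for sort. The name "sorth_portable" is hypothetical, and the loop is slower than sed when many header lines are kept.

```shell
# Hypothetical helper: sorth_portable N [sort options...]
# Copies the first N lines of STDIN unchanged, then sorts the remainder.
sorth_portable() {
  n=$1
  shift
  # Pass the header lines through untouched, one at a time.
  while [ "$n" -gt 0 ] && IFS= read -r line; do
    printf '%s\n' "$line"
    n=$((n - 1))
  done
  # Everything left on STDIN goes to sort, with any extra options.
  sort "$@"
}

printf '%s\n' ColumnName 4 2 3 1 | sorth_portable 1 -n
```

As with "sorth", the input must come from STDIN; file arguments would bypass the header handling.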
Re: base64 utilty Question
Hello,

On 2020-01-03 1:00 p.m., Bahubali Y wrote:
I have question about base64. If I have "LF" as line terminator will that me converted to "CRLF" in base64 encoding ?.

Generally no. GNU base64 preserves the input exactly.

Example:

$ printf "hello\n" | base64 | base64 -d | od -tx1c -An
 68 65 6c 6c 6f 0a
  h  e  l  l  o \n

I observed above case in my usage

Perhaps another part of your processing converts the newline characters? Especially if you are using Windows.

If you can provide a succinct reproducible example, that would help in diagnosing the issue.

regards,
- assaf
Re: [PATCH] ls: support --time=creation to show/sort birth time
Hello,

On 2020-01-02 2:01 p.m., Pádraig Brady wrote:
On 02/01/2020 20:29, Assaf Gordon wrote:
Regarding "fall back to mtime", I'm seeing the following results on some systems - not necessarily a bug, but perhaps it's worth knowing what to expect:

* Debian 10/x86_64, Linux Kernel 4.19.0, glibc 2.28-10, with ext2 file system (not supporting birthtime):

$ ./src/ls -l --time=birth /tmp/dummy-ext2/2
-rw-r--r-- 1 root root 0 Dec 31 1969 /tmp/dummy-ext2/2

Hmm. That suggests that STATX_BTIME is set in the returned statx mask, but populated with 0 in the structure ({-1,-1} would have printed as '?'). Though you say src/stat prints '-' in all cases, and the logic should be much the same. Could you confirm the birth time significant returns for this case. epoch isn't a bad time to output in this case, but it would be good to be consistent.

Indeed, the returned "btime" is zero:

$ strace -v -e trace=statx ./src/stat /tmp/dummy-ext2/2
statx(AT_FDCWD, "/tmp/dummy-ext2/2", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_blksize=1024, stx_attributes=0, stx_nlink=1, stx_uid=0, stx_gid=0, stx_mode=S_IFREG|0644, stx_ino=12, stx_size=0, stx_blocks=0, stx_attributes_mask=STATX_ATTR_COMPRESSED|STATX_ATTR_IMMUTABLE|STATX_ATTR_APPEND|STATX_ATTR_NODUMP|STATX_ATTR_ENCRYPTED, stx_atime={tv_sec=1577995860, tv_nsec=0} /* 2020-01-02T13:11:00-0700 */, stx_btime={tv_sec=0, tv_nsec=0}, stx_ctime={tv_sec=1577995860, tv_nsec=0} /* 2020-01-02T13:11:00-0700 */, stx_mtime={tv_sec=1577995860, tv_nsec=0} /* 2020-01-02T13:11:00-0700 */, stx_rdev_major=0, stx_rdev_minor=0, stx_dev_major=7, stx_dev_minor=0}) = 0
  File: /tmp/dummy-ext2/2
  Size: 0 Blocks: 0 IO Block: 1024 regular empty file
Device: 700h/1792d Inode: 12 Links: 1
Access: (0644/-rw-r--r--) Uid: (0/root) Gid: (0/root)
Access: 2020-01-02 13:11:00.000000000 -0700
Modify: 2020-01-02 13:11:00.000000000 -0700
Change: 2020-01-02 13:11:00.000000000 -0700
 Birth: -
+++ exited with 0 +++

$ strace -v -e trace=statx ./src/ls -l --time=birth /tmp/dummy-ext2/2
statx(AT_FDCWD, "/tmp/dummy-ext2/2", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_MODE|STATX_NLINK|STATX_UID|STATX_GID|STATX_SIZE|STATX_BTIME, {stx_mask=STATX_BASIC_STATS, stx_blksize=1024, stx_attributes=0, stx_nlink=1, stx_uid=0, stx_gid=0, stx_mode=S_IFREG|0644, stx_ino=12, stx_size=0, stx_blocks=0, stx_attributes_mask=STATX_ATTR_COMPRESSED|STATX_ATTR_IMMUTABLE|STATX_ATTR_APPEND|STATX_ATTR_NODUMP|STATX_ATTR_ENCRYPTED, stx_atime={tv_sec=1577995860, tv_nsec=0} /* 2020-01-02T13:11:00-0700 */, stx_btime={tv_sec=0, tv_nsec=0}, stx_ctime={tv_sec=1577995860, tv_nsec=0} /* 2020-01-02T13:11:00-0700 */, stx_mtime={tv_sec=1577995860, tv_nsec=0} /* 2020-01-02T13:11:00-0700 */, stx_rdev_major=0, stx_rdev_minor=0, stx_dev_major=7, stx_dev_minor=0}) = 0
-rw-r--r-- 1 root root 0 Dec 31 1969 /tmp/dummy-ext2/2
+++ exited with 0 +++

Looking closer at the new ls.c code (** are added for emphasis):

---
do_statx (int fd, const char *name, struct stat *st, int flags,
          unsigned int mask)
{
  struct statx stx;
**bool want_btime = mask & STATX_BTIME;
  int ret = statx (fd, name, flags, mask, &stx);
  if (ret >= 0)
    {
      statx_to_stat (&stx, st);
      /* Since we only need one timestamp type,
         store birth time in st_mtim.  */
**    if (mask & STATX_BTIME)
        st->st_mtim = statx_timestamp_to_timespec (stx.stx_btime);
**    else if (want_btime)
        st->st_mtim.tv_sec = st->st_mtim.tv_nsec = -1;
    }
  return ret;
---

Wouldn't "mask & STATX_BTIME" always be the same as "want_btime", resulting in the "else if" part never being executed? IIUC, "mask" is the requested bitmask.

Comparing with "stat.c:do_stat()", I see:

  ...
  statx_to_stat (&stx, &st);
  if (stx.stx_mask & STATX_BTIME)
    pa.btime = statx_timestamp_to_timespec (stx.stx_btime);
  ...

Which I reckon is the returned bitmask (vs the requested bitmask).

Could that be the issue ?

-assaf
Re: [PATCH] ls: support --time=creation to show/sort birth time
Hello Pádraig and all,

On 2020-01-02 10:48 a.m., Pádraig Brady wrote:
+ ls now supports the --time=birth option to display and sort by
+ file creation time, where available.

+1

Patch looks good, builds and passes the test on Debian 10/x86_64, OpenBSD 6.6, FreeBSD 12.1, Alpine Linux, and Cygwin-10/64bit on Windows7/NTFS.

A suggestion:

 static char const *const time_args[] =
 {
-  "atime", "access", "use", "ctime", "status", NULL
+  "atime", "access", "use",
+  "ctime", "status",
+  "birth", "creation",
+  NULL
 };
 static enum time_type const time_types[] =
 {
-  time_atime, time_atime, time_atime, time_ctime, time_ctime
+  time_atime, time_atime, time_atime,
+  time_ctime, time_ctime,
+  time_btime, time_btime,
 };

Perhaps add "btime" and "crtime" as aliases to birth time? "btime" is for completeness with atime/ctime. "crtime" is used/mentioned in some contexts (e.g. in "debugfs").

+/* Return the platform birthtime member of the stat structure,
+   or fallback to the mtime member, which we have populated
+   from the statx structure where supported.  */

Regarding "fall back to mtime", I'm seeing the following results on some systems - not necessarily a bug, but perhaps it's worth knowing what to expect:

* Debian 10/x86_64, Linux Kernel 4.19.0, glibc 2.28-10, with ext2 file system (not supporting birthtime):

$ ./src/ls -l --time=birth /tmp/dummy-ext2/2
-rw-r--r-- 1 root root 0 Dec 31 1969 /tmp/dummy-ext2/2

(I guess this is the unix epoch adjusted for my local time zone)

* Alpine Linux, Kernel 4.19.80, musl-libc 1.1.22:

$ ./src/ls -l --time=birth README
-rw-r--r-- 1 miles miles 10778? README

* OpenBSD 6.6 on "ffs" type file system:

$ ./src/ls -l --time=birth README
-rw-r--r-- 1 miles miles 10778? README

On all the above systems, running "./src/stat" correctly shows "birth: -" .

regards,
- assaf
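From a script, a comparable "birth time, else mtime" fallback can be sketched with GNU stat, whose "%W" format prints the birth time in seconds since the epoch (0 if unknown) and "%Y" the modification time. The function name "birth_or_mtime" is made up; note that, like the ext2 case above, a reported birth time of exactly 0 is treated here as "unknown".

```shell
# Hypothetical helper: print FILE's birth time (seconds since the epoch),
# falling back to its mtime when the birth time is not available.
# Assumes GNU stat ("-c", "%W" and "%Y").
birth_or_mtime() {
  birth=$(stat -c %W -- "$1") || return
  # "%W" yields 0 when the birth time is unknown; any non-numeric output
  # fails the numeric test and takes the fallback branch as well.
  if [ "$birth" -gt 0 ] 2>/dev/null; then
    printf '%s\n' "$birth"
  else
    stat -c %Y -- "$1"
  fi
}
```

For example, "birth_or_mtime README" prints a single integer either way, which can then be formatted with "date -d @N".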
Re: Decimal time support in 'date'
Hello,

On Thu, Dec 12, 2019 at 6:57 PM za3k--- via GNU coreutils General Discussion wrote:
> I am interested in adding support for decimal time to 'date', but before
> I dive into writing a patch, I wanted to ask whether the patch has a
> chance of being accepted--this may just be too obscure.

Thank you for the suggestion and for checking in first - that's an excellent approach.

> In decimal time, 2019-12-12.75 would represent 2019-12-12T18:00:00.
> Decimal time in the modern era is mainly used in timekeeping (to track
> employee or contracting hours) and in scientific recording (to make
> drawing graphs easy). Astronomers use another form of decimal time on
> their own calendar and would not be supported.

This is an interesting idea, certainly worth discussing. When such a format is used by time-keepers or in scientific recording, is it being used on the command line or from a shell script? Or is this more commonly done in a higher-level programming language? Can you expand on the other format used by astronomers?

---

Before going further, please be aware that in order for such a patch to be accepted (or even evaluated), we'll need a copyright assignment from you (and, potentially, from your employer or university, if you implement it as part of a work/school project).

To learn more, please see here:
https://www.gnu.org/licenses/why-assign.en.html

To start the process, please fill in the following form and send it to ass...@gnu.org :
https://git.savannah.gnu.org/cgit/gnulib.git/tree/doc/Copyright/request-assign.future
(for the program/package question, please fill in both "coreutils" and "gnulib")

---

On the technical side, I expect such a patch to modify mainly gnulib's nstrftime.c module:
https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/nstrftime.c

If we consider adding a new letter operator (e.g. "%X"), we should make sure it does not conflict with any existing letters, including on non-GNU implementations (e.g. on BSDs).

regards,
 - assaf
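Until something like this exists in 'date' itself, the format from the quoted example can be approximated in a script. The following is only a sketch, assuming GNU date and timestamps after 1970; the function name decimal_date and the two-digit precision are choices made for this illustration, not a proposed interface.

```shell
# Hypothetical helper: print DATE as YYYY-MM-DD.FF, where FF is the
# fraction of the (UTC) day that has elapsed, in hundredths.
decimal_date() {
    # $1: a timestamp GNU date can parse, interpreted as UTC here.
    secs=$(date --utc --date="$1" +%s) || return 1
    day=$(date --utc --date="$1" +%Y-%m-%d) || return 1
    # Hundredths of a day since midnight (truncated, not rounded).
    frac=$(( (secs % 86400) * 100 / 86400 ))
    printf '%s.%02d\n' "$day" "$frac"
}

decimal_date '2019-12-12 18:00:00'   # prints 2019-12-12.75
```

18:00 is 64800 seconds into the day, and 64800/86400 = 0.75, which matches the example quoted above.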
bug#38003: date --date=-1month gives same month today
tag 38003 notabug
close 38003
stop

Hello,

On 2019-10-31 2:34 a.m., Ilja Honkonen wrote:
> Please CC me as I'm not on this list. Running date (GNU coreutils) 8.26
> on fedora 30 today (date --utc -I: 2019-10-31) with --date=-1month gives
> the same month which doesn't make sense:
>
> $ date --utc -I --date=-1month
> 2019-10-01

date gained a "--debug" option that helps diagnose the issue:

  $ date --utc -I --debug --date=-1month
  date: parsed relative part: -1 month(s)
  [...]
  date: using current date as starting value: '(Y-M-D) 2019-10-31'
  [...]
  date: warning: when adding relative months/years, it is recommended to specify the 15th of the months <
  date: after date adjustment (+0 years, -1 months, +0 days),
  date: new date/time = '(Y-M-D) 2019-10-01 17:29:20'
  date: warning: month/year adjustment resulted in shifted dates:
  date: adjusted Y M D: 2019 09 31 <
  date: normalized Y M D: 2019 10 01 <
  [...]
  date: final: (Y-M-D) 2019-10-01 17:29:20 (UTC)
  2019-10-01

--

Subtracting 1 month from October 31st results in September 31st. Since that date doesn't exist, it is normalized: September 31st is "one day after September 30th", which results in October 1st.

The "--debug" option also warns: when subtracting months, it is recommended to specify the 15th (middle) of the month, exactly to avoid such issues.

  $ date --utc -I --date="2019-10-15 -1month"
  2019-09-15

regards,
 - assaf
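The "use the 15th" advice can be wrapped in a small helper when a script only needs the previous month. This is a sketch, assuming GNU date; prev_month is a name invented for this example and is not part of coreutils.

```shell
# Hypothetical helper: print the month (YYYY-MM) before the one
# containing the given date. Anchoring on the 15th avoids the
# end-of-month normalization shown in the --debug output.
prev_month() {
    ym=$(date --utc --date="$1" +%Y-%m) || return 1
    date --utc --date="${ym}-15 -1month" +%Y-%m
}

prev_month 2019-10-31   # prints 2019-09
```

The same call with a date in January yields the previous year's December, since the relative adjustment is applied to a day that exists in every month.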
Re: How to implement the V comparsion used by sort in python?
Hello,

On 2019-10-28 3:00 a.m., Florian Weimer wrote:
> * Assaf Gordon:
>> On Oct 26, 2019, at 5:05 PM, Peng Yu wrote:
>>> Are you sure they are 100% compatible with V? I don’t want to use them
>>> just later find they are not 100% compatible.
>> There are no such guarantees, especially not with free software.
> I don't know why you say that.

Perhaps my writing wasn't clear enough. What I meant was: *I* cannot provide any such guarantees (since the question was "are *you* sure"). I can't speak for other coreutils maintainers (or the people who wrote the gnulib version-compare module), but I highly suspect that they will also not be willing to guarantee such 100% compatibility.

As for the "free software" part - (almost?) every free software license explicitly mentions that the software comes with no warranty whatsoever. Typically the license includes the phrase "[no] FITNESS FOR A PARTICULAR PURPOSE" - meaning that even if there is some implied purpose (such as sorting 'naturally' for "sort -V"), there is no guarantee it is even fit for that purpose.

In practice, it means that even if I (or others) took a cursory look at both "sort -V" and the mentioned python package and deemed them "compatible", there are still *no* guarantees they are actually 100% compatible. There could always be a bug or an unexpected result.

It seemed to me the OP wanted some very strong guarantees regarding that code that would save them time and effort, without investing time or other resources to do the testing themselves. To that, my answer was "no such guarantees". If my previous answer was too brief, I hope this clarifies it.

> But someone certainly has to do this work.

I completely agree. If the OP wants reasonable assurance that they are compatible, they can read the details about "sort -V" and invest the time and effort in comparing it to the python package algorithm. Or, for stronger guarantees, perhaps they can consider hiring someone to do a very thorough investigation and provide them with some concrete guarantees.

regards,
 - assaf
Re: How to implement the V comparsion used by sort in python?
Hello,

> On Oct 26, 2019, at 5:05 PM, Peng Yu wrote:
>
> Are you sure they are 100% compatible with V? I don’t want to use them just
> later find they are not 100% compatible.

There are no such guarantees, especially not with free software.

The details I previously sent to you ( https://lists.gnu.org/archive/html/coreutils/2019-10/msg2.html ) explain any differences between “sort -V” and Debian’s dpkg/apt algorithm, which is what the mentioned python package implements. You’ll have to do some work yourself to determine whether these differences affect your desired outcome.

regards,
 - Assaf
Re: How to implement the V comparsion used by sort in python?
Hello, > On Oct 25, 2019, at 8:00 PM, Peng Yu wrote: > > > I'd like to mimic the V sort order in python. Is there any easy to use > comparison available in python? A simple online search will show several python packages that can do it. For example: https://deb-pkg-tools.readthedocs.io/en/latest/api.html#module-deb_pkg_tools.version -assaf
Re: Does head util cause SIGPIPE?
Hello, The question "does head cause SIGPIPE" is seemingly simple, and the answer is "yes" - but there are some nuances that might cause unexpected results. More specifically, 1. The "head" process terminates when all requested lines have been printed (e.g. one line with "head -n1"). 2. The STDIN of the 'head' process is closed, which corresponds to the STDOUT of the preceding process ('find' in your case). 3. *IF* the 'find' process tries to write again to its STDOUT (which is now a closed pipe), then a SIGPIPE will be raised and 'find' will terminate. On 2019-10-25 1:56 a.m., Ray Satiro wrote: Recently I tracked down a delay in some scripts to this line: find / -name filename* 2>/dev/null | head -n 1 [...] owner@ubuntu1604-x64-vm:~$ ( trap '' pipe; find / -name initrd* 2>/dev/null | strace -e 'trace=!all' head -n 1) /initrd.img +++ exited with 0 +++ (few seconds wait) In your case, I can guess that there is only a single file matching your predicate 'initrd*'. The 'head' indeed terminates, and the pipe is closed. But if 'find' doesn't find any more matching files, it doesn't try to print anything more, and SIGPIPE is never raised. Note the manual page of pipe(7) says: "If all file descriptors referring to the read end of a pipe have been closed, then a write(2) will cause a SIGPIPE signal to be generated for the calling process." So if no further files were found, 'find' continues (slowly scanning the disk) until it finishes. Since we only need the first line I can just use find options -print -quit and skip piping to head. But say we needed the first n results, how would I do that with head and get find to terminate rather than continue searching? That's an interesting question, but perhaps better answered in bug-findut...@gnu.org (although findutils maintainers are also on this mailing list). 
---

There could be other instances where the sending process won't receive SIGPIPE: if the entire output is very small (less than 4096 bytes on linux, and at least 512 bytes on POSIX systems). For example, this 'seq' won't be terminated by a signal, as the entire output is just 21 bytes:

  $ seq 10 | wc -c
  21
  $ seq 10 | head -n1

But this 'seq' will be terminated by a signal:

  $ seq 10000 | wc -c
  48894
  $ seq 10000 | head -n1

---

GNU 'time' can be used to quickly see how a process terminated (with a signal, or a non-zero exit code). It will print a line such as "Command terminated by signal 13" (signal 13 is SIGPIPE on linux).

  $ \time -p -f "" seq 10000 | head -n1
  1
  Command terminated by signal 13

  $ \time -p -f "" seq 10 | head -n1
  1

And just a couple of days ago a new experimental feature was added to GNU time to allow finer printf-style output about signals and exit codes:
https://lists.gnu.org/archive/html/bug-time/2019-10/msg2.html

---

Lastly, a recent version of GNU 'env' (from coreutils version 8.31, released in March 2019) added new command-line options to ignore, block, and restore to default any signal, as a useful alternative to "trap ''", see:
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=95adadd9a420812ddd3f0fc6105f668922a97ae5
and the manual:
https://www.gnu.org/software/coreutils/env

---

Hope this helps,
 - assaf
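When GNU 'time' is not at hand, the exit status reported by the shell carries the same information: shells such as bash and dash report a process killed by a signal as 128 plus the signal number. The sketch below decodes that convention; describe_status is a hypothetical helper name, and the 128+N rule is the shell's convention, not something 'head' or 'find' define.

```shell
# Hypothetical helper: describe how a command ended, given the exit
# status the shell reported for it.
describe_status() {
    if [ "$1" -gt 128 ]; then
        # Death by signal is reported as 128 + signal number.
        sig=$(( $1 - 128 ))
        echo "terminated by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited with code $1"
    fi
}

describe_status 141
describe_status 0
```

In bash, the status of the left-hand side of a pipeline is available as the first element of the PIPESTATUS array, so after running something like "yes | head -n1" that value can be passed to describe_status to confirm that the writer was stopped by SIGPIPE (status 141 on Linux).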
bug#37702: Suggestion for 'df' utility
Hello Bernhard, On 2019-10-13 3:57 p.m., Bernhard Voelker wrote: On 2019-10-13 23:28, Paul Eggert wrote: In any sane system there would be only four lines of non-header output (for tmpfs etc, /, /home, and /media/eggert/B827-D456), but df is outputting 28 lines. What is so special about tmpfs so that you would like to see it? As an interesting use-case (though not common), I recently configured a raspberry PI device, and wanted to mount as many locations on tmpfs as possible, e.g. "/tmp" "/var/tmp", "/var/log" etc. In was very useful in those cases to be able to see separate tmpfs file system listed, with information about how big they are and how much space was used. Also in other systems where "/tmp" is a "tmpfs", users might want to see how much space is available. If we hide it by default, they can of course use "df /tmp" or "df --all" - it's not about removing this option, it is just about making users' life harder or easier, and making unexpected changes. I recently also encountered a change in a default behavior of a program which I've been using a very long time - and it is *very* frustrating to have something that worked "just fine" for so long being changed. Here on my openSUSE:Tumbleweed system, I see the following: $ df -T Filesystem Type 1K-blocks Used Available Use% Mounted on [...] /dev/loop0 ext2 31729 31729 0 100% /FULL_PARTITION_TMPDIR [...] (The /FULL_PARTITION_TMPDIR is used by a special coreutils test.) That's an interesting case, where I would think you'd want to see it, because you explicitly mounted it. I think I could well live with adding 'devtmpfs' and 'tmpfs' to the pseudo file systems in gnulib's "mountlist.c". I agree, but think this needs to be communicated very well, and in advance - perhaps announce this change ahead of time to the respective package maintainers of each distribution - just so they'll know it's coming (and also have a way to revert it if they don't like it). 
This seems to be a small change, and not satisfying the snap case. Possibly hiding "squashfs" of readonly-mounts could get rid of those snaps? regards, -assaf
bug#37702: Suggestion for 'df' utility
On 2019-10-13 3:28 p.m., Paul Eggert wrote: [..] I mean c'mon, here's the output of 'df' on the Ubuntu 18.04.3 LTS workstation I'm typing this particular message on. In any sane system there would be only four lines of non-header output (for tmpfs etc, /, /home, and /media/eggert/B827-D456), but df is outputting 28 lines. This is ridiculous. It is certainly inconvenient if that's not what you are looking for (and certainly most desktop users aren't). But I'm not sure if it's easy to find a set of criteria that would work well while having minimal unexpected side effects of hiding entries people in other systems do expect to see. Out of curiosity, can you share the output of the following commands on the same system? lsblk df -x tmpfs -x devtmpfs -x squashfs Thanks, - assaf
bug#37702: Suggestion for 'df' utility
Hi all,

On 2019-10-13 2:27 p.m., Paul Eggert wrote:
> On 10/13/19 2:41 AM, Pádraig Brady wrote:
>> I wonder could we key (also) on used==0||available==0.
> Yes, looking at the sample output I gave earlier, I'd say we could by
> default drop filesystems where usage is 1% or less. That would solve the
> problem for my workstation. This is roughly akin to the "used==0" test
> you're suggesting.

I would humbly suggest caution with such unexpected user-facing changes to the default output of 'df' - learning the lessons from changing the quotes in 'ls'. Countless users have been using 'df' in their own ways, and have gotten used to certain outputs.

This thread originated with a request to "clean up" the output on newer ubuntu machines which use "snap" packages as /dev/loopN . Let's not turn that into a drastic change that will affect many other existing systems - the users on other systems did not ask for any changes.

---

Specifically for "default drop filesystems where usage is 1% or less" - I can think of a few cases off the top of my head where this would be extremely confusing:

- I recently installed a 33TB raid file system. The usage on that system is at 1% and will stay like so for at least several days.

- Amazon cloud services (AWS) offers an NFS4 service (they call it "EFS") that has a reported size of 8 exabytes. There too usage could be at 1% for a long, long time.

---

For cases where I want to list only the "real" storage, I typically use an alias such as:

  alias dff='df -h -x tmpfs -x devtmpfs'

And it would be very easy and least disruptive to recommend to ubuntu users to add "-x squashfs" or another file system to ignore.

Perhaps we can come up with a recommended list of "lesser" file systems to ignore (or conditions such as read-only file systems) and add it as a new option, but please let's not make it the default.

My two cents,
 - assaf
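The alias above can also be written as a shell function with the ignored types kept in one list, which makes the "-x squashfs" suggestion a one-word change. This is a sketch only: the variable DFF_EXCLUDE, the helper dff_args, and the choice of types are assumptions for illustration, not a recommended default.

```shell
# File system types to hide; an assumption, adjust per system.
DFF_EXCLUDE="tmpfs devtmpfs squashfs"

# Turn the list into repeated "-x TYPE" arguments for df.
dff_args() {
    for t in $DFF_EXCLUDE; do
        printf '%s %s ' -x "$t"
    done
}

# Same idea as the alias, with the exclusions built from the list.
dff() {
    # Word splitting of $(dff_args) is intentional here.
    df -h $(dff_args) "$@"
}

dff_args; echo
```

Running "dff" with no arguments then behaves like the alias with squashfs added; note that df reports an error if every mounted file system ends up excluded.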
Re: md5sum and recursive traversal of dirs
( adding bug-time@ )

Hello,

On 2019-10-10 11:29 a.m., Сергей Кузнецов wrote:
> [...] By the way, I wrote two new small programs: xchg (or swap, which
> name is better?) And exst (exit status). [...] The second program
> launches the program indicated at startup and, after its completion,
> prints the output status or the caught signal.

Somewhat related: the GNU Time program can report both exit code and signal in the following way:

  $ env time -f "Exit code: %x\n" [SOME PROGRAM that dies with segfault]
  Command terminated by signal 11
  Exit code: 0

However, for a long time I have wanted to add a new output format specifier to GNU time that will indicate whether a program exited cleanly or with a signal (and which exit code or signal). Your message reminded me of that, and I hope to add something like that in the near future.

It could be something like:

  %T  1 if program terminated by a signal, empty otherwise
  %S  signal number if program terminated by a signal, empty otherwise
  %X  exit code if program terminated normally, or empty if terminated by a signal

And could be used like so:

  time -f "Signaled: %T (signal number: %S)\nExit code: %X\n" [PROGRAM]

Please send comments and suggestions to bug-t...@gnu.org .

regards,
 - assaf

P.S. Note that your shell likely has its own built-in 'time' function. To use GNU time run "env time" or "\time" .
Re: Is natural sort supported?
(please use "reply-all" or "reply-group" to keep the coreutils@ mailing list in the loop)

On 2019-10-08 1:09 a.m., Peng Yu wrote:
> Then, the option name causes misunderstand. -V is actually
> --debian-version.

Or simply "--version-sort" as it is now.

> The natural order is plain and simple, just as what is explained below,
> which can be implemented by a few lines of python code.

At the risk of arguing over semantics, I'll say again: there is no "one correct" natural order standard, and therefore it is not "plain and simple" because there is no just "one" such order. It can certainly be that there are some specific implementations of 'natural sort' that are simple.

> https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/
> So my question is whether natural order as in the above URL is supported?

No. And note that even the above blog writes: "... Don't let Ned's clever Python ten-liner fool you. Implementing a natural sort is more complex than it seems ... ".
Re: Is natural sort supported?
Hello,

On 2019-10-08 12:36 a.m., Peng Yu wrote:
> The following example shows that version sort is not natural sort. Is
> natural sort supported in by `sort`?

There is no such thing as "THE correct natural sort" order...

> $ printf '%s\n' 1G13 1.02 | LC_ALL=C sort -k 1,1V
> # The result order should have been reversed.

... therefore "should have" is simply an incorrect expectation. You might think it "should" be one way, and other implementations think it "should" be another way.

For more details, please see the attached HTML file. (This HTML file is a new chapter of the coreutils manual that will be included in the next release. The source texinfo is here: https://git.savannah.gnu.org/cgit/coreutils.git/tree/doc/sort-version.texi ).

regards,
 - assaf

1 Version sort ordering

1.1 Version sort overview

Version sort ordering (and similarly, natural sort ordering) is a method to sort items such as file names and lines of text in an order that feels more natural to people, when the text contains a mixture of letters and digits.

Standard sorting usually does not produce the order that one expects because comparisons are made on a character-by-character basis. Compare the sorting of the following items:

    Alphabetical sort:    Version Sort:

    a1                    a1
    a120                  a2
    a13                   a13
    a2                    a120

Version sort functionality in GNU coreutils is available in the ‘ls -v’, ‘ls --sort=version’, ‘sort -V’, ‘sort --version-sort’ commands.

1.1.1 Using version sort in GNU coreutils

Two GNU coreutils programs use version sort: ls and sort.

To list files in version sort order, use ls with the -v or --sort=version options:

    default sort:    version sort:

    $ ls -1          $ ls -1 -v
    a1               a1
    a100             a1.4
    a1.13            a1.13
    a1.4             a1.40
    a1.40            a2
    a2               a100

To sort text files in version sort order, use sort with the -V option:

    $ cat input
    b3
    b11
    b1
    b20

    alphabetical order:    version sort order:

    $ sort input           $ sort -V input
    b1                     b1
    b11                    b3
    b20                    b11
    b3                     b20

To sort a specific column in a file use -k/--key with the ‘V’ ordering option:

    $ cat input2
    1000 b3 apples
    2000 b11 oranges
    3000 b1 potatoes
    4000 b20 bananas

    $ sort -k2V,2 input2
    3000 b1 potatoes
    1000 b3 apples
    2000 b11 oranges
    4000 b20 bananas

1.1.2 Origin of version sort and differences from natural sort

In GNU coreutils, the name version sort was chosen because it is based on Debian GNU/Linux’s algorithm of sorting packages’ versions. Its goal is to answer the question “which package is newer, firefox-60.7.2 or firefox-60.12.3 ?”

In coreutils this algorithm was slightly modified to work on more general input such as textual strings and file names (see Differences from the official Debian Algorithm).

In other contexts, such as other programs and other programming languages, a similar sorting functionality is called natural sort.

1.1.3 Correct/Incorrect ordering and Expected/Unexpected results

Currently there is no standard for version/natural sort ordering. That is: there is no one correct way or universally agreed-upon way to order items. Each program and each programming language can decide its own ordering algorithm and call it ’natural sort’ (or other various names).

See Other version/natural sort implementations for many examples of differing sorting possibilities, each with its own rules and variations.

If you do suspect a bug in coreutils’ implementation of version-sort, see Reporting bugs or
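The "which package is newer" question from the manual excerpt can be tried directly from the shell. This is a small sketch, assuming GNU sort; newest_version is a name invented for the example.

```shell
# Hypothetical helper: print the highest of the given version strings
# according to 'sort -V' ordering.
newest_version() {
    printf '%s\n' "$@" | LC_ALL=C sort -V | tail -n 1
}

# The four items from the overview table, in version sort order.
printf '%s\n' a1 a120 a13 a2 | LC_ALL=C sort -V

newest_version firefox-60.7.2 firefox-60.12.3   # prints firefox-60.12.3
```

Whether this ordering is the "natural" one for a given data set is exactly the judgment call discussed above, so it is worth checking it against a sample of the real input.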
Re: The output from GNU Core Utilities dd is different in apline and ubuntu
Hello, On 2019-09-09 6:39 a.m., 薛帅 wrote: In Ubuntu 18.04.1 LTS, the `dd` command output three lines. [...] While in apline 3.9.0, the `dd` command output only two lines. Alpine linux does not use "coreutils" programs in the default installation. Most of the equivalent programs are from busybox. To see which implementation you are using, try: # which dd /bin/dd # ls -l /bin/dd lrwxrwxrwx 1 root root 12 Sep 9 13:22 /bin/dd -> /bin/busybox Also, coreutils' dd supports "--version": $ dd --version | head -n1 dd (coreutils) 8.30 while busybox's dd will show its version in the help/usage screen (which is shown when unsupported option "--version" is used): # dd --version 2>&1 | head -n1 BusyBox v1.29.3 (2019-01-24 07:45:07 UTC) multi-call binary. regards, - assaf
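The manual check above can be scripted. The following is a sketch that only looks at where the command resolves to; which_impl is a hypothetical helper name, and it assumes a 'readlink' that supports -f (both GNU coreutils and busybox provide it).

```shell
# Hypothetical helper: report whether a command resolves to busybox.
which_impl() {
    path=$(command -v "$1") || return 1
    # Follow symlinks such as /bin/dd -> /bin/busybox.
    real=$(readlink -f "$path")
    case "$real" in
        */busybox) echo "busybox" ;;
        *)         echo "standalone: $real" ;;
    esac
}

which_impl dd
```

This does not prove that a standalone binary comes from coreutils, so running the command with --version, as shown above, is still the more direct check.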
bug#37093: wc runs 100% cpu when in pipeline or tee >(wc)
tag 37093 notabug close 37093 stop Hello, On 2019-08-19 10:44 p.m., Edward Huff wrote: In the demo below, dd uses 0.665s to write 1GiB of zeros. sha256sum uses 4.285s to calculate the sha256 of 1GiB of zeros. wc uses 32.160s to count 1GiB of zeros. [...] baseline results: $ dd if=/dev/zero count=$((1024*1024)) bs=1024 | tee >(sha256sum>&2) | wc 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 32.5007 s, 33.0 MB/s 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - 0 0 1073741824 $ First, Try to avoid UTF8 locales (i.e., force a C/POSIX locale with LC_ALL=C) which makes 'wc' much faster. On my computer: With UTF8 locale: $ dd if=/dev/zero count=$((1024*1024)) bs=1024 \ | tee >(sha256sum>&2) | time --portability wc 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 46.5928 s, 23.0 MB/s 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - 0 0 1073741824 real 46.59 user 46.37 sys 0.19 With C locale: $ dd if=/dev/zero count=$((1024*1024)) bs=1024 \ | tee >(sha256sum>&2) | LC_ALL=C time --portability wc 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 8.60285 s, 125 MB/s 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - 0 0 1073741824 real 8.60 user 5.22 sys 0.26 Second, The "word counting" feature in 'wc' is the main cpu-hog. If you avoid that (i.e. counting only lines, or only characters), 'wc' is even faster (and it automatically ignores UTF8 issues): $ dd if=/dev/zero count=$((1024*1024)) bs=1024 \ | tee >(sha256sum>&2) \ | \time --portability wc -c 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.59429 s, 141 MB/s 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - 1073741824 real 7.59 user 0.10 sys 0.71 Notice that the "real time" wasn't changed much (from 8.6s to 7.59s), but the actual work performed by 'wc' (measured in "user time") is down drastically. 
Third, If you are comfortable with compiling Coreutils from source, you can build it using optimized hashing function from OpenSSL, like so: ./configure --with-openssl make Then, "sha256sum" will be faster (about 2x fast on my computer). If you don't want to re-compile it, consider using "openssl" directly to calculate the checksum, like so: dd if=/dev/zero count=1K bs=1M | tee >(openssl sha256>&2) | wc -c Fourth, To save few more microseconds, consider using dd with larger block size (bs=) and fewer blocks (count=), e.g.: $ time dd if=/dev/zero of=/dev/null count=1M bs=1K 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.865853 s, 1.2 GB/s real 0m0.868s user 0m0.288s sys 0m0.579s $ time dd if=/dev/zero of=/dev/null count=1K bs=1M 1024+0 records in 1024+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.0998688 s, 10.8 GB/s real 0m0.102s user 0m0.000s sys 0m0.102s This won't reduce the total time by much, but will result in fewer sys-calls, and less CPU kernel time (at least by a tiny bit). The effect is more noticeable when reading or writing to a physical disk. Lastly, If you use GNU time instead of the shell's built-in 'time' function, you can specify custom output format, and easily show the timing of each program in the pipeline. 
Example: $ FMT="\n=== CMD: %C ===\nreal %e\tuser %U\tsys %S\n" $ \time -f "$FMT" dd if=/dev/zero count=1M bs=1K \ | \time -f "$FMT" tee >(\time -f "$FMT" sha256sum>&2) \ | \time -f "$FMT" wc -c 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.77339 s, 138 MB/s === CMD: dd if=/dev/zero count=1048576 bs=1024 === real 7.77 user 0.36 sys 1.65 === CMD: tee /dev/fd/63 === real 7.77 user 0.10 sys 1.30 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - === CMD: sha256sum === real 7.77 user 7.47 sys 0.27 1073741824 === CMD: wc -c === real 7.77 user 0.05 sys 0.76 As such, I'm closing this as "not a bug", but discussion can continue by replying to this thread. regards, - assaf
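The first two suggestions (force the C locale, and count only bytes) combine into a one-line helper. A minimal sketch, using a 1 MiB input instead of 1 GiB to keep it quick; count_bytes is a name made up for this example.

```shell
# Hypothetical helper: count bytes on stdin.
# LC_ALL=C avoids multibyte decoding; -c avoids the word-counting path.
count_bytes() {
    LC_ALL=C wc -c | tr -d ' '
}

head -c 1048576 /dev/zero | count_bytes   # prints 1048576
```

For timing comparisons on real data, the same pipeline can be wrapped with GNU time as in the example above.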
bug#37058: Error message with local deployment of Galaxy-k8s
tag 37058 notabug close 37058 stop Hello, Two issues are mixed here. First: On 2019-08-16 2:17 p.m., Gao, Jianliang wrote: I followed https://github.com/phnmnl/phenomenal-h2020/wiki/QuickStart-Installation-for-Local-PhenoMeNal-Workflow with Older Galaxy chart to deploy local galaxy-k8s instance with minikube on Windows 10. The following message came from the logs of my pod. I can't connect to my local instance. [...] kubectl logs galaxy-k8s-tr6fc [ run_galaxy_config.sh ] -- Galaxy sqlite directory created since we are not using postgresql [ run_galaxy_config.sh ] -- Replaced galaxy ini for the user's injected one [...] dpkg-preconfigure: unable to re-open stdin: [WARNING]: It is unneccessary to use '{{' in loops, leave variables in loop expressions bare. [...] galaxy.tools.deps WARNING 2019-08-16 19:20:48,175 Path './database/dependencies' does not exist, ignoring galaxy.tools.deps WARNING 2019-08-16 19:20:48,175 Path './database/dependencies' is not directory, ignoring galaxy.tools.deps.installable WARNING 2019-08-16 19:20:48,190 Conda not installed and auto-installation disabled. galaxy.tools.deps.installable WARNING 2019-08-16 19:20:48,190 Conda not installed and auto-installation disabled. These are issues related your Galaxy setup. (for other readers: "Galaxy" in this context is a web-based framework for bioinformatics analysis, see https://galaxyproject.org/ and https://usegalaxy.org ). Such issues are best asked in their support forums: https://galaxyproject.org/support/ https://help.galaxyproject.org This includes problems in underlying layers, such as the 'dpkg' errors above that result from deploying Galaxy VMs or instances or kubernetes or containers etc. tail: unrecognized file system type 0x794c7630 for 'paster.log'. please report this to bug-coreutils@gnu.org. reverting to polling This warning indeed comes from coreutils program 'tail', however it is harmless in your situation. 
For more details, see here: https://www.gnu.org/software/coreutils/filesystems.html --- A cursory look at the error logs makes it seem like "bug-coreutils@gnu.org" is the place to ask General questions about "Galaxy" server (because it is the last thing mentioned), but that is not the case. We can only help with coreutils programs (e.g. 'tail'). Please contact the Galaxy team for galaxy-related issues. Hope this helps. regards, - assaf
Re: building old coreutils versions on new glibc systems
On 2019-08-13 11:45 p.m., Bernhard Voelker wrote: On 8/13/19 8:10 PM, Bernhard Voelker wrote: I'd only like to see following additional changes: - make the script callable from an arbitrary directory, i.e., make the file name of the patches relative to the script, and - mention to adjust MANPATH (because that also works with the common directory: 'man df-8.23'). WDYT? ... and we need to make sc_prohibit_tab_based_indentation and sc_long_lines happy again. (BTW: The latter would alternatively be fixed if the patches would be named *.diff instead of *.patch.) Thanks again for the review and improvements. Pushed here with your suggestions (+ very minor script changes): http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=b8609c7cf -assaf P.S. We can still add a new web page (echoing the README.older-versions) to the coreutils website, to ease finding this information via a search engine - WDYT ?
Re: [Implemented] [coreutils] Partial UTF-8 support for "cut -c"
Hello,

On Mon, Aug 12, 2019 at 09:19:54PM +0200, jaime.mosqu...@tutanota.com wrote:
> I have partially implemented the option "-c" ("--characters") for UTF-8
> non-ASCII characters[...]

First and foremost, thank you for taking the time and effort to develop new features and send them to the mailing list.

> This implementation has two, somewhat important shortcomings:
>
> * Other encodings are not implemented.
> [...] I decided to stick with just UTF-8.

At this point in time, this limitation is a show-stopper. A multibyte-aware implementation for GNU coreutils (for all programs, not just for 'cut') should support all native encodings.

Ostensibly, this should be implemented using the standard mbrtowc(3)/mbstowcs(3) family of functions - but in reality there is another complication - a good implementation should also support systems where 'wchar_t' is limited to 16 bits (instead of 32 bits), and therefore requires handling of unicode surrogate pairs.

You can read more about the problems (and past suggested solutions) here:
https://crashcourse.housegordon.org/coreutils-multibyte-support.html

(As a side note to other readers: if these are no longer show-stopper requirements, please chime in - this will make things much easier.)

> * Modifier characters are treated as individual characters [...]
> Decisively, many languages from Western Europe (Spanish,
> Portuguese...) might or might not work with this program, depending on
> which kind of accented letters are produced [...]

I see two related but separate issues here.

The first is generally called "unicode normalization", e.g. if the user sees the letter "A" with acute accent, is it encoded as one unicode character (U+00C1, "Latin Capital Letter A with Acute") or two unicode characters ("A" followed by U+0301 "Combining Acute Accent"). This issue is not a problem (in the sense that it's OK if cut treats "A" followed by U+0301 as separate characters) - because we will also include an additional program that can convert from one form to the other (called "unorm" in the URL mentioned above).

The second interesting issue is the (new?) modifiers such as the U+1F3FB "EMOJI MODIFIER FITZPATRICK" (http://unicode.org/reports/tr51/#Diversity https://codepoints.net/U+1F3FB) that affect other characters. Here I don't see an easy way to know if characters should be grouped, and they should probably be treated as separate characters in all cases.

> On the other hand, missing bytes in a multibyte UTF-8 characters are
> correctly handled [...]
> It is my hope that you should find this first approach to the problem
> sufficient for most uses, and incorporate it into the mainstream code.

I would say that your approach of dealing only with UTF-8 has some merits (i.e., as a "fast path" in parallel to the slower mbrtowc(3) path, and the faster unibyte path). I suspect that if we do go down that road, it'll be better to use gnulib's already implemented UTF-8 code (and also UTF-16/UTF-32) instead of adding ad-hoc functions.

> (Should my modifications be big enough to require it for copyright
> reasons, my name is "Jaime Mosquera", and I obviously agree to the
> terms of the GNU GPL.)

Thank you - that is indeed the gist (copyright assignment is needed from contributors), but the technicalities are slightly different. We ask that contributors fill in and send the following form:
https://git.savannah.gnu.org/cgit/gnulib.git/tree/doc/Copyright/request-assign.future
The 'why?' is explained here:
https://www.gnu.org/licenses/why-assign.en.html

regards,
 - assaf
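The normalization point can be seen at the byte level without any multibyte support at all. The sketch below writes both encodings of "A with acute" as explicit UTF-8 bytes (octal escapes) and counts them; byte_len is a hypothetical helper name used only here.

```shell
# Hypothetical helper: number of bytes in a string.
byte_len() {
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

precomposed=$(printf '\303\201')   # U+00C1 encoded as UTF-8
decomposed=$(printf 'A\314\201')   # "A" followed by U+0301, as UTF-8

byte_len "$precomposed"   # prints 2
byte_len "$decomposed"    # prints 3

# Byte-wise cut splits the decomposed form after the plain "A".
printf '%s\n' "$decomposed" | LC_ALL=C cut -b1
```

Both strings render as the same letter, yet they differ in length and content, which is why a separate normalization step is useful before character-oriented processing.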
Re: for the next gnulib update
On Mon, Aug 12, 2019 at 05:55:55PM +0200, Bernhard Voelker wrote: > On 8/12/19 5:50 AM, Assaf Gordon wrote: > > Updated patch (fixed typo in commit message). > > +1 thanks thanks, pushed here: https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=a3d070fa3269e89dfad49fde8ea30758afa36f4b
Re: for the next gnulib update
On Sun, Aug 11, 2019 at 09:33:47PM -0600, Assaf Gordon wrote: > Hello, > > On Sun, Aug 11, 2019 at 10:42:49AM +0200, Bruno Haible wrote: > > A couple of changes in gnulib on 2019-07-15 [1] need updates on the > > coreutils > > side, the next you update the gnulib used by coreutils. > > Thanks for the heads-up. > > Patch attached - I'll apply it tomorrow if there are no further comments. > Updated patch (fixed typo in commit message). >From fc120af40548e63a98644f9f075710259a00 Mon Sep 17 00:00:00 2001 From: Bruno Haible Date: Sun, 11 Aug 2019 21:29:00 -0600 Subject: [PATCH] build: adjust for recent gnulib pthread changes Discussed in https://lists.gnu.org/r/coreutils/2019-08/msg00030.html . * bootstrap.conf (gnulib_modules): Replace 'pthread' with pthread-* modules. * src/sort.c: Remove GNULIB_defined_pthread_functions conditional. --- bootstrap.conf | 5 - src/sort.c | 5 - 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/bootstrap.conf b/bootstrap.conf index 49261524a..018bc4eb3 100644 --- a/bootstrap.conf +++ b/bootstrap.conf @@ -196,7 +196,10 @@ gnulib_modules=" priv-set progname propername - pthread + pthread-cond + pthread-mutex + pthread-thread + pthread_sigmask putenv quote quotearg diff --git a/src/sort.c b/src/sort.c index d812aa999..360a1f140 100644 --- a/src/sort.c +++ b/src/sort.c @@ -82,11 +82,6 @@ struct rlimit { size_t rlim_cur; }; # endif #endif -#if GNULIB_defined_pthread_functions -# undef pthread_sigmask -# define pthread_sigmask(how, set, oset) sigprocmask (how, set, oset) -#endif - #if !defined OPEN_MAX && defined NR_OPEN # define OPEN_MAX NR_OPEN #endif -- 2.20.1
Re: for the next gnulib update
Hello, On Sun, Aug 11, 2019 at 10:42:49AM +0200, Bruno Haible wrote: > A couple of changes in gnulib on 2019-07-15 [1] need updates on the coreutils > side, the next you update the gnulib used by coreutils. Thanks for the heads-up. Patch attached - I'll apply it tomorrow if there are no further comments. -assaf >From fc120af40548e63a98644f9f075710259a00 Mon Sep 17 00:00:00 2001 From: Bruno Haible Date: Sun, 11 Aug 2019 21:29:00 -0600 Subject: [PATCH] build: adjust for recent gnulib pthread changes Discussed in https://lists.gnu.org/r/coreutils/2019-08/msg00030.html . * bootstrap.conf (gnulib_modules): Replace 'pthread' with pthread-X moduels. * src/sort.c: Remove GNULIB_defined_pthread_functions conditional. --- bootstrap.conf | 5 - src/sort.c | 5 - 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/bootstrap.conf b/bootstrap.conf index 49261524a..018bc4eb3 100644 --- a/bootstrap.conf +++ b/bootstrap.conf @@ -196,7 +196,10 @@ gnulib_modules=" priv-set progname propername - pthread + pthread-cond + pthread-mutex + pthread-thread + pthread_sigmask putenv quote quotearg diff --git a/src/sort.c b/src/sort.c index d812aa999..360a1f140 100644 --- a/src/sort.c +++ b/src/sort.c @@ -82,11 +82,6 @@ struct rlimit { size_t rlim_cur; }; # endif #endif -#if GNULIB_defined_pthread_functions -# undef pthread_sigmask -# define pthread_sigmask(how, set, oset) sigprocmask (how, set, oset) -#endif - #if !defined OPEN_MAX && defined NR_OPEN # define OPEN_MAX NR_OPEN #endif -- 2.20.1
Re: parse-datetime.y - Military Timezones are inverted from the correct sense
On 2019-08-10 9:17 p.m., Assaf Gordon wrote: On Sat, Aug 10, 2019 at 01:05:23PM -0700, Paul Eggert wrote: The attached patch-set includes this fix, and the updated NEWS wording. (I'll wait until gnulib is updated with the additional fix, then create a new coreutil patch with the latest gnulib.) Thanks here too; it all sounds good. Attached latest version (with updated gnulib, and Bernhard's syntax-check fix). I'll push tomorrow unless other issues pop up. -assaf Pushed here: https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=725c8d6bed902a181da867a5d38efd01f62d8c9a
Re: building old coreutils versions on new glibc systems
Hello, On Sat, Aug 10, 2019 at 03:19:57PM +0200, Bernhard Voelker wrote: > On 8/7/19 6:04 PM, Jim Meyering wrote: > > Since it is something that may contribute to binaries I build (with > > the handy related build target), it feels like it belongs in > > version-control > okay, fine. Both variants have advantages and disadvantages, so let's > go for the variant easier to maintain. > > I'll reply wrt/ the patch in the other email. Thanks for the improved script, the suggestion and the testing. Attached updated patch. changes are: - moved to ./script/build-older-versions - trimmed whitespace from patches - used your version of the script - added permissive license to the script - added a short blurb at the end of the script, showing PATHs. Comments welcomed, -assaf 0001-scripts-document-how-to-build-older-versions-on-newe.patch.gz Description: application/gunzip
Re: parse-datetime.y - Military Timezones are inverted from the correct sense
On Sat, Aug 10, 2019 at 01:05:23PM -0700, Paul Eggert wrote: > > The attached patch-set includes this fix, > > and the updated NEWS wording. > > (I'll wait until gnulib is updated with the additional fix, > > then create a new coreutil patch with the latest gnulib.) > > Thanks here too; it all sounds good. Attached latest version (with updated gnulib, and Bernhard's syntax-check fix). I'll push tomorrow unless other issues pop up. -assaf >From 961d668eea9c94beddd309d81f65c32a133a3260 Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Fri, 9 Aug 2019 19:51:42 -0600 Subject: [PATCH 1/3] gnulib: update to latest --- gnulib | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gnulib b/gnulib index c7d0b4506..8524167df 16 --- a/gnulib +++ b/gnulib @@ -1 +1 @@ -Subproject commit c7d0b4506574887be5835ae9ae892d365afbb98c +Subproject commit 8524167df6555c38079e9d041044dc59a9ddbeee -- 2.20.1 >From 3cbddd58fde11c911134bdfe79fc3f2579ba58e1 Mon Sep 17 00:00:00 2001 From: Bernhard Voelker Date: Mon, 22 Jul 2019 08:53:28 +0200 Subject: [PATCH 2/3] maint: add lib/argmatch.h to po/POTFILES.in * po/POTFILES.in (lib/argmatch.h): Add to avoid sc_po_check error: "maint.mk: you have changed the set of files with translatable \ diagnostics;" --- po/POTFILES.in | 1 + 1 file changed, 1 insertion(+) diff --git a/po/POTFILES.in b/po/POTFILES.in index 60c5124ac..4231f56c4 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -3,6 +3,7 @@ # These are nominally temporary... lib/argmatch.c +lib/argmatch.h lib/closein.c lib/closeout.c lib/copy-acl.c -- 2.20.1 >From 725c8d6bed902a181da867a5d38efd01f62d8c9a Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Fri, 9 Aug 2019 20:16:06 -0600 Subject: [PATCH 3/3] date: mention military timezone changes from gnulib Gnulib commits f1f10d47be8762e4ca17c8957a0520b08d28abfb and 0673d8ab42c9bb0cf618a21b537cdd8fb976fb73 negated the meaning of military timezones parsed in gnu date. 
See https://lists.gnu.org/r/bug-gnulib/2019-08/msg5.html and https://lists.gnu.org/r/coreutils/2019-08/msg00021.html * NEWS: Mention this user-visible change. * tests/misc/date.pl: Add tests for the new behavior. --- NEWS | 13 + tests/misc/date.pl | 10 ++ 2 files changed, 23 insertions(+) diff --git a/NEWS b/NEWS index 97c9d18bd..6719e504d 100644 --- a/NEWS +++ b/NEWS @@ -49,6 +49,19 @@ GNU coreutils NEWS-*- outline -*- coherency of file system attributes, useful on network file systems. +** Changes in behavior + + date now parses military time zones in accordance with common usage: +"A" to "M" are equivalent to UTC+1 to UTC+12 +"N" to "Y" are equivalent to UTC-1 to UTC-12 +"Z" is "zulu" time (UTC). + For example, 'date -d "09:00B" is now equivalent to 9am in UTC+2 time zone. + Previously, military time zones were parsed according to the obsolete + rfc822, with their value negated (e.g., "B" was equivalent to UTC-2). + [The old behavior was introduced in sh-utils 2.0.15 ca. 1999, predating + coreutils package.] + + * Noteworthy changes in release 8.31 (2019-03-10) [stable] ** Bug fixes diff --git a/tests/misc/date.pl b/tests/misc/date.pl index 9ba3d3983..92755b1f2 100755 --- a/tests/misc/date.pl +++ b/tests/misc/date.pl @@ -300,6 +300,16 @@ my @Tests = # https://bugs.gnu.org/34608 ['date-century-plus', '-d @0 +.%+4C.', {OUT => '.+019.'}], + + + # Military time zones, new behavior (since 8.32) + # https://lists.gnu.org/r/bug-gnulib/2019-08/msg5.html + ['mtz1', '-u -d "09:00B" +%T', {OUT => '07:00:00'}], + ['mtz2', '-u -d "09:00L" +%T', {OUT => '22:00:00'}], + ['mtz3', '-u -d "09:00N" +%T', {OUT => '10:00:00'}], + ['mtz4', '-u -d "09:00T" +%T', {OUT => '16:00:00'}], + ['mtz5', '-u -d "09:00X" +%T', {OUT => '20:00:00'}], + ['mtz6', '-u -d "09:00Z" +%T', {OUT => '09:00:00'}], ); # Repeat the cross-dst test, using Jan 1, 2005 and every interval from 1..364. -- 2.20.1
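For readers who want to check the expected values in the tests above, here is a small sketch in C of the letter-to-offset mapping described in the NEWS entry. It is an illustration only and is not how parse-datetime.y implements it (that code uses a conversion table plus the special 'T' case); the function names are made up. The letter "J" is not used as a military zone, which is why "A" to "M" covers only twelve offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative only: map a military time zone letter to its offset
   from UTC in whole hours.  Returns false for 'J' (unused) and for
   anything that is not an upper-case ASCII letter.  */
bool
military_zone_offset (char letter, int *hours)
{
  assert (hours != NULL);
  if ('A' <= letter && letter <= 'I')
    *hours = letter - 'A' + 1;          /* A..I -> UTC+1..UTC+9 */
  else if ('K' <= letter && letter <= 'M')
    *hours = letter - 'K' + 10;         /* K..M -> UTC+10..UTC+12 */
  else if ('N' <= letter && letter <= 'Y')
    *hours = -(letter - 'N' + 1);       /* N..Y -> UTC-1..UTC-12 */
  else if (letter == 'Z')
    *hours = 0;                         /* "zulu" time */
  else
    return false;
  return true;
}

/* Convert LOCAL_HOUR (0..23) in military zone LETTER to the hour in
   UTC, wrapping around midnight.  Returns -1 for an invalid zone.  */
int
military_to_utc_hour (int local_hour, char letter)
{
  int offset;
  if (!military_zone_offset (letter, &offset))
    return -1;
  return ((local_hour - offset) % 24 + 24) % 24;
}
```

For example, 09:00 in zone "B" (UTC+2) is 07:00 UTC, and 09:00 in zone "L" (UTC+11) is 22:00 UTC on the previous day, matching the 'mtz1' and 'mtz2' tests.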
Re: parse-datetime.y - Military Timezones are inverted from the correct sense
Hello, (adding bug-gnulib again :) ) Thank you both for the review and suggestions. On 2019-08-10 1:46 a.m., Paul Eggert wrote: > Assaf Gordon wrote: >> I suggest the attached patch for coreutils. > > OK, except I'd remove "in accordance with rfc5322" since RFC 5322 > recommends treating all these zones as if they were UTC. Also, "T" > continues to have its military meaning (i.e., between "S" and "U") if > it's used properly. Good point about 'T'. After adding an additional test for it, I realized the gnulib fix wasn't complete because it didn't negate the 'T' value from UTC+7 to UTC-7. Attached suggested follow-up patch for gnulib. On Sat, Aug 10, 2019 at 05:40:30PM +0200, Bernhard Voelker wrote: > On 8/10/19 4:26 AM, Assaf Gordon wrote: > > This results in a user-visible change for gnu date, > > I suggest the attached patch for coreutils. > > The gnulib update requires the attached to calm down sc_po_check. > You may squash that into your gnulib update commit (or leave it separate). Good catch. The attached patch-set includes this fix, and the updated NEWS wording. (I'll wait until gnulib is updated with the additional fix, then create a new coreutil patch with the latest gnulib.) regards, - assaf >From 9f464d51d8311f33340942c76e758454fa59042d Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Sat, 10 Aug 2019 13:17:49 -0600 Subject: [PATCH] parse-datetime: fix 'T' military timezone handling follow-up to the previous commit: the 'T' case is handled outside the conversion table (used as either military timezone UTC-7 or ISO8601 separator). Change it from "HOUR(7)" to "-HOUR(7)" to match other timezone letters. * lib/parse-datetime.y: Change 'T' value from UTC+7 yo UTC-7. * ChangeLog: Mention the change. 
--- ChangeLog| 8 lib/parse-datetime.y | 4 ++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/ChangeLog b/ChangeLog index 7616b5efd..7c25c53e5 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,11 @@ +2019-08-10 Assaf Gordon + + parse-datetime: fix 'T' military timezone handling + follow-up to the previous commit: the 'T' case is handled outside the + conversion table (used as either military timezone UTC-7 or ISO8601 + separator). Change it from "HOUR(7)" to "-HOUR(7)" to match other + timezone letters. + 2019-08-09 Paul Eggert parse-datetime: fix military timezone letters diff --git a/lib/parse-datetime.y b/lib/parse-datetime.y index d371b9cb1..218e3dc5b 100644 --- a/lib/parse-datetime.y +++ b/lib/parse-datetime.y @@ -754,14 +754,14 @@ zone: tZONE { pc->time_zone = $1; } | 'T' - { pc->time_zone = HOUR (7); } + { pc->time_zone = -HOUR (7); } | tZONE relunit_snumber { pc->time_zone = $1; if (! apply_relative_time (pc, $2, 1)) YYABORT; debug_print_relative_time (_("relative"), pc); } | 'T' relunit_snumber - { pc->time_zone = HOUR (7); + { pc->time_zone = -HOUR (7); if (! 
apply_relative_time (pc, $2, 1)) YYABORT; debug_print_relative_time (_("relative"), pc); } -- 2.20.1 >From 19f7eab06af234641a2927514c03570c07a311db Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Fri, 9 Aug 2019 19:51:42 -0600 Subject: [PATCH 1/3] gnulib: update to latest --- gnulib | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gnulib b/gnulib index c7d0b4506..f1f10d47b 16 --- a/gnulib +++ b/gnulib @@ -1 +1 @@ -Subproject commit c7d0b4506574887be5835ae9ae892d365afbb98c +Subproject commit f1f10d47be8762e4ca17c8957a0520b08d28abfb -- 2.20.1 >From 673a0360b9dd96cfe0df017febb40980843cbb84 Mon Sep 17 00:00:00 2001 From: Bernhard Voelker Date: Mon, 22 Jul 2019 08:53:28 +0200 Subject: [PATCH 2/3] maint: add lib/argmatch.h to po/POTFILES.in * po/POTFILES.in (lib/argmatch.h): Add to avoid sc_po_check error: "maint.mk: you have changed the set of files with translatable \ diagnostics;" --- po/POTFILES.in | 1 + 1 file changed, 1 insertion(+) diff --git a/po/POTFILES.in b/po/POTFILES.in index 60c5124ac..4231f56c4 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -3,6 +3,7 @@ # These are nominally temporary... lib/argmatch.c +lib/argmatch.h lib/closein.c lib/closeout.c lib/copy-acl.c -- 2.20.1 >From 0c552fac1991f49ef2db347adaed7bd82b935d70 Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Fri, 9 Aug 2019 20:16:06 -0600 Subject: [PATCH 3/3] date: mention military timezone changes from gnulib Gnulib commit f1f10d47be8762e4ca17c8957a0520b08d28abfb (based on https://lists.gnu.org/r/bug-gnulib/2019-08/msg5.html) negated the meaning of military timezones parsed in gnu date. * NEWS: Mention this user-visible change. * tests/misc/date.pl: Add tests for the new behavior. --- NEWS | 13 + t
Re: parse-datetime.y - Military Timezones are inverted from the correct sense
Hello, On Fri, Aug 09, 2019 at 02:01:35PM -0700, Paul Eggert wrote: > Since the RFC 822 error was fixed in 2001 when RFC 2822 came out, it is long > past time to fix parse-datetime.y accordingly, so I installed the attached > patch into Gnulib. This results in a user-visible change for gnu date, I suggest the attached patch for coreutils. -assaf >From 19f7eab06af234641a2927514c03570c07a311db Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Fri, 9 Aug 2019 19:51:42 -0600 Subject: [PATCH 1/2] gnulib: update to latest --- gnulib | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gnulib b/gnulib index c7d0b4506..f1f10d47b 16 --- a/gnulib +++ b/gnulib @@ -1 +1 @@ -Subproject commit c7d0b4506574887be5835ae9ae892d365afbb98c +Subproject commit f1f10d47be8762e4ca17c8957a0520b08d28abfb -- 2.20.1 >From 6eb1118f00a7018f08f69c7ace86cd92f89ca961 Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Fri, 9 Aug 2019 20:16:06 -0600 Subject: [PATCH 2/2] date: mention military timezone changes from gnulib Gnulib commit f1f10d47be8762e4ca17c8957a0520b08d28abfb (based on https://lists.gnu.org/r/bug-gnulib/2019-08/msg5.html) negated the meaning of military timezones parsed in gnu date. * NEWS: Mention this user-visible change. * tests/misc/date.pl: Add tests for the new behavior. --- NEWS | 16 tests/misc/date.pl | 9 + 2 files changed, 25 insertions(+) diff --git a/NEWS b/NEWS index 97c9d18bd..d4904d20b 100644 --- a/NEWS +++ b/NEWS @@ -49,6 +49,22 @@ GNU coreutils NEWS-*- outline -*- coherency of file system attributes, useful on network file systems. +** Changes in behavior + + date now parses military time zones in accordance with rfc5322: +"A" to "M" are equivalent to UTC+1 to UTC+12 +"N" to "S" are equivalent to UTC-1 to UTC-6 +"U" to "Y" are equivalent to UTC-8 to UTC-12 +"T" is parsed as a ISO-8601 format representation, +and should not be used for military time zones in gnu date. +"Z" is "zulu" time (UTC). 
+ For example, 'date -d "09:00B" is now equivalent to 9am in UTC+2 time zone. + Previously, military time zones were parsed according to the obsolete + rfc822, with their value negated (e.g., "B" was equivalent to UTC-2). + [The old behavior was introduced in sh-utils 2.0.15 ca. 1999, predating + coreutils package.] + + * Noteworthy changes in release 8.31 (2019-03-10) [stable] ** Bug fixes diff --git a/tests/misc/date.pl b/tests/misc/date.pl index 9ba3d3983..e11753347 100755 --- a/tests/misc/date.pl +++ b/tests/misc/date.pl @@ -300,6 +300,15 @@ my @Tests = # https://bugs.gnu.org/34608 ['date-century-plus', '-d @0 +.%+4C.', {OUT => '.+019.'}], + + + # Military time zones, new behavior (since 8.32) + # https://lists.gnu.org/r/bug-gnulib/2019-08/msg5.html + ['mtz1', '-u -d "09:00B" +%T', {OUT => '07:00:00'}], + ['mtz2', '-u -d "09:00L" +%T', {OUT => '22:00:00'}], + ['mtz3', '-u -d "09:00N" +%T', {OUT => '10:00:00'}], + ['mtz4', '-u -d "09:00X" +%T', {OUT => '20:00:00'}], + ['mtz5', '-u -d "09:00Z" +%T', {OUT => '09:00:00'}], ); # Repeat the cross-dst test, using Jan 1, 2005 and every interval from 1..364. -- 2.20.1
bug#36985: tail
close 36985 stop Hello, On 2019-08-09 12:55 a.m., Rob Hearne wrote: root@kafka-robh-vmdub-04:/kafka/bin# tail -f Control tail: unrecognized file system type 0x794c7630 for ‘Control’. please report this to bug-coreutils@gnu.org. reverting to polling This has been fixed in version 8.25 (released in 2016). For more details, see https://www.gnu.org/software/coreutils/filesystems.html -assaf
Re: building old coreutils versions on new glibc systems
Hello, On Tue, Aug 06, 2019 at 09:35:01PM +0200, Bernhard Voelker wrote: > On 8/2/19 9:05 AM, Jim Meyering wrote: > > Nice work. I've had to go through this process a few times over the > > years, and having these handy patch files checked in and maintained > > would make it easier to automate the process. > While this work is definitely worth keeping, I'm only 20:80 to add > something to the current (and future) version which belongs to older > versions. > > What about either uploading it to the FTP, or even better to add it > to the web pages' CVS? Adding it as a page to the website sounds good (it will also be easy for people to find using common search engines). I don't like the FTP idea so much - not very accessible unless you know exactly what you're looking for. Attached is a possible HTML page (and the patches in a subdirectory). Comments welcomed, - assaf coreutils-website-older-versions.tar.gz Description: application/tar-gz
bug#36901: Enhance directory and file moves where target already exists
Hello, On Fri, Aug 02, 2019 at 10:47:18PM -0700, L A Walsh wrote: > It's not a wish list that 'mv' doesn't work as documented. The "wishlist" refers to the topic: You are asking to add new functionality to 'mv'. That is a "wishlist" item. (answering out of order:) > > On 2019-08-02 9:56 p.m., L A Walsh wrote: > >> But you say posix wants it to perform as a rename? [...] > >> > >> So if I have: > >> mkdir A B > >> touch A/foo B/fee > >> So when I look at the system call on linux for rename: > >> oldpath can specify a directory. In this case, newpath must > >> either not > >> exist, or it must specify an empty directory. > >> (complying with POSIX_C_SOURCE >= 200809L) > >> > >> So move should give an error: Nope: > >> > >> mv A B > >>> tree B > >> B > >> ├── A > >> │ └── foo > >> └── fee > >> > >> 1 directory, 2 files > >> > >> So mv is violating POSIX - it didn't do the rename, but moved > >> A under B and neither dir had to be empty. > >> > >> Saying it has to follow POSIX when it doesn't appear to, seems > >> a bit contradictory? I previously quoted one small part of the entire "mv" POSIX specification (item #3, regarding using the 'rename(2)' function). It would be wise to read the entire specification before making claims about violating POSIX. Specifically, at the top of the page: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/mv.html SYNOPSIS mv [-if] source_file target_file mv [-if] source_file... target_dir DESCRIPTION [...] In the second synopsis form, mv shall move each file named by a source_file operand to a destination file in the existing directory named by the target_dir operand [...] This second form is assumed when the final operand names an existing directory. In this regard GNU 'mv' is compliant with POSIX. > > On 2019-08-02 9:56 p.m., L A Walsh wrote: > >> On 2019/08/02 19:47, Assaf Gordon wrote: > >>> Can new merging features be added to 'mv'? yes.
> >>> But it seems to me these would be better suited for 'higher level' > >>> programs (e.g. a GUI file manager). > >> --- > >> If the command was named 'ren', then I'd expect it to be dummer, > >> but 'mv'/move seem like it should be able to move files from > >> one dir into another. > >> > >> But you say posix wants it to perform as a rename? > >> I know, create a 're' command (or 'rn') for rename, and have > >> it do what 'mv' would do. Maybe posix would realize it would > >> be better to have re/rn behave like rename, and 'mv' to > >> behave it was moving something. The Austin Group (https://www.opengroup.org/austin/), which is in charge of developing and maintaining the POSIX standard, is the place to go when wanting to change things in POSIX (or add new things). You can write to them, suggest a modification, and if they change the standard, GNU coreutils will surely follow. As for renaming 'mv' or creating a new 'rn' command - part of POSIX is to codify existing behavior (that is, programs which were in common use *before* POSIX). It's not always logical, it's not always ideal, but that's what has been in use for many years. Based on mv's wiki page (https://en.wikipedia.org/wiki/Mv), 'mv' was first introduced in 1971, 47 years ago. With the hindsight of nearly 5 decades it's easy to point to faults in a program. If we were designing 'mv' today from scratch, I'm sure we would improve many of its aspects. But given that it is a long-standing program and its usage and quirks are well established, I'm inclined to say it is highly unlikely we will change mv's default behaviour or replace it with a different name. Adding new functionality (e.g. a new '--merge-directory' option) is possible, and concrete patches are always welcomed. However, given all the above, there is no guarantee that such a new option will be accepted. I still think that such specific features are better suited for more sophisticated programs (whether GUI or command line). regards, - assaf
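The two POSIX synopsis forms quoted earlier in this message can be sketched in C as follows. This is only an illustration of the rule "the second form is assumed when the final operand names an existing directory"; it is not GNU mv's actual code, and the function name mv_destination and its fixed-size scratch buffer are assumptions made for the example.

```c
#include <assert.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Illustrative only: compute where "mv SOURCE TARGET" would put SOURCE.
   If TARGET names an existing directory, the second synopsis form
   applies and the destination is TARGET/basename(SOURCE); otherwise
   the destination is TARGET itself.  Returns 0 on success, or -1 if
   the result does not fit in DEST.  */
int
mv_destination (char const *source, char const *target,
                char *dest, size_t dest_size)
{
  struct stat st;
  int n;

  assert (source != NULL && target != NULL && dest != NULL);

  if (stat (target, &st) == 0 && S_ISDIR (st.st_mode))
    {
      /* basename(3) may modify its argument, so work on a copy.  */
      char copy[4096];
      if (strlen (source) >= sizeof copy)
        return -1;
      strcpy (copy, source);
      n = snprintf (dest, dest_size, "%s/%s", target, basename (copy));
    }
  else
    n = snprintf (dest, dest_size, "%s", target);

  return (0 <= n && (size_t) n < dest_size) ? 0 : -1;
}
```

Applied to the example from the bug report, 'mv A B' with an existing directory 'B' yields the destination 'B/A', which is why 'A' ends up under 'B' instead of replacing it.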
bug#36901: Enhance directory and file moves where target already exists
severity 36901 wishlist retitle 36901 mv: merge directories where target already exists stop Hello, (for context: this is a new topic, diverged at https://bugs.gnu.org/36831#38 ) For completeness, quoting your second message ( from https://bugs.gnu.org/36831#50 ): On 2019-08-02 9:56 p.m., L A Walsh wrote: > > On 2019/08/02 19:47, Assaf Gordon wrote: >> Can new merging features be added to 'mv'? yes. >> But it seems to me these would be better suited for 'higher level' >> programs (e.g. a GUI file manager). > --- > But neither the person who posted the original bug on this > nor I are using a GUI, we are running 'mv' GUI, we use the cmd line on > linux, so that wouldn't > be of any use. > > If the command was named 'ren', then I'd expect it to be dummer, > but 'mv'/move seem like it should be able to move files from > one dir into another. > > But you say posix wants it to perform as a rename? > I know, create a 're' command (or 'rn') for rename, and have > it do what 'mv' would do. Maybe posix would realize it would > be better to have re/rn behave like rename, and 'mv' to > behave it was moving something. > > So if I have: > mkdir A B > touch A/foo B/fee > > So when I look at the system call on linux for rename: > oldpath can specify a directory. In this case, newpath must > either not > exist, or it must specify an empty directory. > (complying with POSIX_C_SOURCE >= 200809L) > > So move should give an error: Nope: > > mv A B >> tree B > B > ├── A > │ └── foo > └── fee > > 1 directory, 2 files > > So mv is violating POSIX - it didn't do the rename, but moved > A under B and neither dir had to be empty. > > Saying it has to follow POSIX when it doesn't appear to, seems > a bit contradictory? >
bug#36831: Enhance directory move. (was Re: bug#36831: enhance 'directory not empty' message)
Hello, On 2019-08-02 9:56 p.m., L A Walsh wrote: On 2019/08/02 19:47, Assaf Gordon wrote: Can new merging features be added to 'mv'? yes. But it seems to me these would be better suited for 'higher level' programs (e.g. a GUI file manager). --- But neither the person who posted the original bug on this nor I are using a GUI, we are running 'mv' GUI, we use the cmd line on linux, so that wouldn't be of any use. The original post was about the error *message*, asking to make it clearer. That is the topic of this thread (and the previous patch) - so let's leave them at that. I see you started a new thread ( https://bugs.gnu.org/36901 ), so I'll reply there.
bug#36831: Enhance directory move. (was Re: bug#36831: enhance 'directory not empty' message)
Hello, On Fri, Aug 02, 2019 at 02:41:31AM -0700, L A Walsh wrote: > On 2019/07/28 23:28, Assaf Gordon wrote: > > > > > > $ mkdir A B B/A > > $ touch A/bar B/A/foo > > $ mv A B > > mv: cannot move 'A' to 'B/A': Directory not empty > > > > And the reason (as you've found out) is that the target directory 'B/A' > > is not empty (has the 'foo' file in it). > > Had this been allowed, moving 'A' to 'B/A' would result in the 'foo' > > file disappearing. > > > --- > Why must foo disappear? > > Microsoft Windows handles this situation by telling the user that > the target directory already exists and giving the option to *MERGE* > the directories. > > If you attempt to move a file into a directory that already contains > a file by the same name, it pops up another notice asking [...] Certainly, GUI programs (and more 'feature-rich' programs than 'mv') offer many "merging" options. I'm sure Midnight-Commander, KDE/Dolphin, XFCE/Thunar, Gnome/Nautilus and many other free software GUI file managers have some "merging" capabilities. But 'mv' is more basic and does not have this capability. Partly that is because it adheres to the POSIX standard, which mandates: "3. The mv utility shall perform actions equivalent to the rename() function [...]" https://pubs.opengroup.org/onlinepubs/9699919799/utilities/mv.html Some rsync options (--remove-source-files) can mimic 'mv' with merging, but then they are more like "copy+delete" than actual "rename/move". Can new merging features be added to 'mv'? Yes. But it seems to me these would be better suited for 'higher level' programs (e.g. a GUI file manager). regards, - assaf
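The behaviour of the underlying rename() call can be reproduced directly. The following C sketch recreates the 'mkdir A B B/A; touch A/bar B/A/foo' setup in a scratch directory and then calls rename() on 'A' and 'B/A'. The function name and the use of /tmp as a writable scratch location are assumptions for the example; POSIX allows either ENOTEMPTY or EEXIST for this failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create an empty file, like touch(1).  Returns 0 on success.  */
static int
touch (char const *path)
{
  FILE *f = fopen (path, "w");
  return f ? fclose (f) : -1;
}

/* Illustrative only: set up A/bar and B/A/foo in a temporary directory
   and try rename ("A", "B/A").  Returns the errno value from rename(),
   0 if rename() unexpectedly succeeded, or -1 if the setup failed.  */
int
rename_onto_nonempty_dir (void)
{
  char root[] = "/tmp/mv-demo-XXXXXX";
  char a[64], b[64], ba[64], bar[64], foo[64];
  int err;

  /* The longest path below must fit in the fixed-size buffers.  */
  assert (sizeof root + sizeof "/B/A/foo" <= sizeof foo);

  if (!mkdtemp (root))
    return -1;
  snprintf (a, sizeof a, "%s/A", root);
  snprintf (b, sizeof b, "%s/B", root);
  snprintf (ba, sizeof ba, "%s/B/A", root);
  snprintf (bar, sizeof bar, "%s/A/bar", root);
  snprintf (foo, sizeof foo, "%s/B/A/foo", root);

  if (mkdir (a, 0700) != 0 || mkdir (b, 0700) != 0 || mkdir (ba, 0700) != 0
      || touch (bar) != 0 || touch (foo) != 0)
    err = -1;
  else
    err = rename (a, ba) == 0 ? 0 : errno;

  /* Best-effort cleanup; errors are deliberately ignored.  */
  unlink (bar);
  unlink (foo);
  rmdir (ba);
  rmdir (a);
  rmdir (b);
  rmdir (root);
  return err;
}
```

Passing the returned value to strerror(3) gives the "Directory not empty" text seen in the mv diagnostic on systems that report ENOTEMPTY.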
Re: building old coreutils versions on new glibc systems
Hello, On Fri, Aug 02, 2019 at 12:05:53AM -0700, Jim Meyering wrote: > On Thu, Aug 1, 2019 at 7:48 PM Assaf Gordon wrote: > > The attached patches enable building old tarballs on modern systems > > (tested on Debian 10 with GLIBC 2.28-10, gcc 8.3.0-6). > > > > Nice work. I've had to go through this process a few times over the > years, and having these handy patch files checked in and maintained > would make it easier to automate the process. I'm on the fence as to > whether it's worth checking them in, given how few of us end up > building all old versions like that. Selfishly, I want it. Now that I > write this, I conclude it's worth the small cost. No need to > distribute those files, of course, and anything that makes a > maintainer's job easier (for such a small cost) is worthwhile. Thanks. Attached is a patch to add these (+ README + build script) to a new 'contrib' directory. NOTE: I had to disable the 'commit-msg' hook (for the 'contrib' prefix) and the 'pre-commit' hook (because the patch files contain spaces at end of lines, and lines longer than 80 characters). Not sure if this is valid, or will cause trouble later on. An alternative might be to gzip the patches before committing? Ideas welcomed. -assaf 0001-contrib-document-how-to-build-older-versions-on-newe.patch.gz Description: application/gunzip
Re: seq: fix bug of printing extra line
On Fri, Aug 02, 2019 at 01:08:49PM +0100, Pádraig Brady wrote: > On 02/08/19 03:28, Assaf Gordon wrote: > > > > Prompted by the recent 'seq' thread, I spotted a bug in seq. > > Fix attached. > > Nice one. thanks! > Thanks, pushed here: https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=07f811a3c02d3d6dc1943030afccdfdcf7ac1e5e
Re: Add new command
Hello, On Fri, Aug 02, 2019 at 07:12:06PM +0430, Saeed Dehqan wrote: > How do I add a command called rn? In general, please follow the instructions in the README-hacking and HACKING files to prepare a patch for a new command. https://git.savannah.gnu.org/cgit/coreutils.git/tree/README-hacking https://git.savannah.gnu.org/cgit/coreutils.git/tree/HACKING See past examples of such patches here: https://git.savannah.gnu.org/cgit/coreutils.git/log/?qt=grep&q=new+program (Note all the files they modify.) Since this will be a large contribution (i.e., more than 10 lines of code), a copyright assignment will be required. Please see here: https://www.gnu.org/licenses/why-assign.en.html Then fill out and send this form: https://git.savannah.gnu.org/cgit/gnulib.git/tree/doc/Copyright/request-assign.future > This is an advanced command to rename large-scale files and accelerations. > This command supported Regexes and Counters. Before re-inventing the wheel, it is worth checking what other programs exist for such functionality. The basic "rename" program has existed for decades and allows regex renames. Many other programs provide more advanced options (and even a GUI), like: https://www.ostechnix.com/how-to-rename-multiple-files-at-once-in-linux/ https://packages.debian.org/search?keywords=rename A command that replicates existing functionality is less likely to be accepted in GNU coreutils. regards, - assaf
bug#36831: enhance 'directory not empty' message
On Thu, Aug 01, 2019 at 03:58:51PM -0700, Paul Eggert wrote: > Thanks, that's better, but we're still missing some opportunities for > improvement. > > > mv: cannot move 'A' to 'B/A': Target directory not empty > > This should be "Destination" not "Target". [...] > You meant "mv" not "rm". [...] > > +static char* > Space before "*". [...] > > +strerror_target (int e) > Change name to "strerror_dest" [...] > This function should return NULL instead of aborting when the errno value is > inapplicable. That way, its callers need not hardcode which errno values it > handles. Thanks for the review and suggestions - attached an updated patch. > Come to think of it, the same improvement should be made to ln, cp, install > and shred. Basically, to any program that uses 'rename' or 'link' or similar > syscalls, and which reports an error if the syscall fails. OK, I will work on that next. -assaf >From 8dc6158a6fde668e55312b5fb69384f438b7e55a Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Mon, 29 Jul 2019 00:23:20 -0600 Subject: [PATCH] mv: improve error messages when destination directory is at fault Suggested by Alex Mantel in https://bugs.gnu.org/36831 . $ mkdir A B B/A $ touch A/bar B/A/foo Before: $ mv A B mv: cannot move 'A' to 'B/A': Directory not empty After: $ mv A B mv: cannot move 'A' to 'B/A': Destination directory not empty The following errors are handled: EDQUOT, EEXIST, ENOTEMPTY, EISDIR, ENOSPC, ETXTBSY. * src/copy.c (copy_internal): Print custom messages for errors that explicitly fault the destination directory. (strerror_dest): New function, return custom, translatable error messages for errors relating to 'destination' component. * tests/mv/dir2dir.sh: Adjust expected error message. * NEWS: Mention change. 
--- NEWS| 6 + src/copy.c | 53 ++--- tests/mv/dir2dir.sh | 8 --- 3 files changed, 61 insertions(+), 6 deletions(-) diff --git a/NEWS b/NEWS index fd0543351..3d80665ae 100644 --- a/NEWS +++ b/NEWS @@ -44,6 +44,12 @@ GNU coreutils NEWS-*- outline -*- stat(1) also supports a new --cached= option to control cache coherency of file system attributes, useful on network file systems. +** Improvements + + mv now prints clearer error messages when a failure relates to the + destination directory (e.g., "Destination directory is not empty" instead + of "Directory not empty"). + * Noteworthy changes in release 8.31 (2019-03-10) [stable] diff --git a/src/copy.c b/src/copy.c index 65cf65895..602c8307b 100644 --- a/src/copy.c +++ b/src/copy.c @@ -1867,6 +1867,44 @@ source_is_dst_backup (char const *srcbase, struct stat const *src_st, return dst_back_status == 0 && SAME_INODE (*src_st, dst_back_sb); } +/* Return custom error messages replacing the default libc's + messages. These messages explicity fault the destination component + in the error. + + Return NULL if E (errno value) is not handled (and by implication + should use the system's default text for the error message). */ +static char * +strerror_dest (int e) +{ + /* TRANSLATORS: These strings should mimick libc's standard + error messages (from strerror(3)), but explicitly mention + the fault is with the destination directory. */ + switch (errno) +{ +case EDQUOT: + return _("Disk quota exceeded on destination device"); +case EEXIST: +case ENOTEMPTY: + return _("Destination directory not empty"); +case EISDIR: + return _("Tried to overwrite a directory with a file"); +case ENOSPC: + return _("No space left on destination device"); +case ETXTBSY: + /* NOTE: The error is "Text file busy" - but "text" in that context + refers to "text segment" of an executable file (as opposed to + "data segment" and "BSS segment"). 
+ + This error message is meant for users, and 'text file' can be easily + confused with an actual text file (i.e., one containing only ASCII + characters. Thus, say 'executable' instead of 'text'.*/ + return _("Destination executable file is busy"); +default: + return NULL; +} +} + + /* Copy the file SRC_NAME to the file DST_NAME. The files may be of any type. NEW_DST should be true if the file DST_NAME cannot exist because its parent directory was just created; NEW_DST should @@ -2477,9 +2515,18 @@ copy_internal (char const *src_name, char const *dst_name, If the permissions on the directory containing the source or destination file are made too restrictive, the rename will fail. Etc. */ - error (0, rena
building old coreutils versions on new glibc systems
Hello,

While trying to find the first version with the 'seq' bug (my previous
email), I realized it has become quite hard to build old coreutils
versions on newer glibc systems. In particular:

1. At some point 'gets' was removed from glibc, but old sources still
   refer to it.
2. Older gnulib used internal glibc symbols (libio.h), and the detection
   method changed (_IO_ftrylockfile vs _IO_EOF_SEEN). See:
   https://git.sv.gnu.org/cgit/gnulib.git/commit/?id=74d9d6a2
3. Old coreutils defined 'futimens', 'tee' and 'eaccess' functions, which
   conflict with later glibc functions of the same name.

In short, it's not trivial to download a tarball from
https://ftp.gnu.org/gnu/coreutils/ and build it on modern systems
(and it seems even more complicated to build from git).

The attached patches enable building old tarballs on modern systems
(tested on Debian 10 with GLIBC 2.28-10, gcc 8.3.0-6).

The sequence should be:

  wget https://ftp.gnu.org/gnu/coreutils/coreutils-5.97.tar.gz
  tar -xf coreutils-5.97.tar.gz
  cd coreutils-5.97
  patch -p1 < ../coreutils-5.97-on-glibc-2.28.patch
  ./configure
  make

  Coreutils Versions   Patch file
  5.0                  coreutils-5.0-on-glibc-2.28.patch
  5.97 to 6.9          coreutils-5.97-on-glibc-2.28.patch
  6.10                 coreutils-6.10-on-glibc-2.28.patch
  6.11                 coreutils-6.11-on-glibc-2.28.patch
  6.12                 coreutils-6.12-on-glibc-2.28.patch
  7.2 to 8.3           coreutils-7.2-on-glibc-2.28.patch
  8.4 to 8.12          coreutils-8.4-on-glibc-2.28.patch
  8.13 to 8.16         coreutils-8.13-on-glibc-2.28.patch
  8.17                 coreutils-8.17-on-glibc-2.28.patch
  8.18 to 8.23         coreutils-8.18-on-glibc-2.28.patch
  8.24 to 8.29         coreutils-8.24-on-glibc-2.28.patch
  8.30 and newer       [builds without patching]

Hope this helps someone.
regards, - assaf diff -r -U3 coreutils-5.0/src/Makefile.in coreutils-5.0-patched/src/Makefile.in --- coreutils-5.0/src/Makefile.in 2003-04-02 07:46:19.0 -0700 +++ coreutils-5.0-patched/src/Makefile.in 2019-08-01 19:38:07.440997426 -0600 @@ -209,7 +209,7 @@ printf_LDADD = $(LDADD) @POW_LIB@ @LIBICONV@ # If necessary, add -lm to resolve use of floor, rint, modf. -seq_LDADD = $(LDADD) @SEQ_LIBM@ +seq_LDADD = $(LDADD) @SEQ_LIBM@ -lm # If necessary, add -lm to resolve the `pow' reference in lib/strtod.c # or for the fesetround reference in programs using nanosec.c. diff -r -U3 coreutils-5.0/src/tee.c coreutils-5.0-patched/src/tee.c --- coreutils-5.0/src/tee.c 2002-12-15 07:21:45.0 -0700 +++ coreutils-5.0-patched/src/tee.c 2019-08-01 19:34:32.374301325 -0600 @@ -32,7 +32,7 @@ #define AUTHORS N_ ("Mike Parker, Richard M. Stallman, and David MacKenzie") -static int tee (int nfiles, const char **files); +static int tee_FOO (int nfiles, const char **files); /* If nonzero, append to output files rather than truncating them. */ static int append; @@ -146,7 +146,7 @@ /* Do *not* warn if tee is given no file arguments. POSIX requires that it work when given no arguments. */ - errs = tee (argc - optind, (const char **) [optind]); + errs = tee_FOO (argc - optind, (const char **) [optind]); if (close (STDIN_FILENO) != 0) error (EXIT_FAILURE, errno, _("standard input")); @@ -158,7 +158,7 @@ Return 0 if successful, 1 if any errors occur. */ static int -tee (int nfiles, const char **files) +tee_FOO (int nfiles, const char **files) { FILE **descriptors; char buffer[BUFSIZ]; diff -r -U3 coreutils-5.0/src/test.c coreutils-5.0-patched/src/test.c --- coreutils-5.0/src/test.c2003-02-10 02:19:09.0 -0700 +++ coreutils-5.0-patched/src/test.c2019-08-01 19:35:52.871307966 -0600 @@ -139,7 +139,7 @@ /* Do the same thing access(2) does, but use the effective uid and gid. 
*/ static int -eaccess (char const *file, int mode) +eaccess_FOO (char const *file, int mode) { static int have_ids; static uid_t uid, euid; @@ -635,17 +635,17 @@ case 'r': /* file is readable? */ unary_advance (); - value = -1 != eaccess (argv[pos - 1], R_OK); + value = -1 != eaccess_FOO (argv[pos - 1], R_OK); return (TRUE == value); case 'w': /* File is writable? */ unary_advance (); - value = -1 != eaccess (argv[pos - 1], W_OK); + value = -1 != eaccess_FOO (argv[pos - 1], W_OK); return (TRUE == value); case 'x': /* File is executable? */ unary_advance (); - value = -1 != eaccess (argv[pos - 1], X_OK); + value = -1 != eaccess_FOO (argv[pos - 1], X_OK); return (TRUE == value); case 'O': /* File is owned by you? */ diff -r -U3 coreutils-6.4/lib/utimens.c coreutils-6.4-patched/lib/utimens.c --- coreutils-6.4/lib/utimens.c 2006-09-14 03:53:59.0 -0600 +++
seq: fix bug of printing extra line
Hello, Prompted by the recent 'seq' thread, I spotted a bug in seq. Fix attached. I think it does not introduce any regressions, but review and comments are very welcomed. -assaf >From 52505fe73fb00a30435009895d03fa3bba1297a4 Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Thu, 1 Aug 2019 17:01:21 -0600 Subject: [PATCH] seq: fix superfluous output line Under certain circumstances seq prints an extra line when the output format has custom format with characters following the printed numbers: $ seq -f "%g " 100 100 1e+06 1e+06 This is due to the "print_extra_number" logic using strings to determine whether a 'extra number' is needed, but only one string was trimmed when using a custom printf format. Prompted by https://lists.gnu.org/r/coreutils/2019-08/msg1.html * NEWS: Mention fix. * src/seq.c (print_numbers): Trim the 'x0_str' string before comparing it to the previous 'x_str' string. * tests/misc/seq-extra-number.sh: Add this scenario. * tests/local.mk (all_tests): Add new test. --- NEWS | 4 +++ src/seq.c | 4 ++- tests/local.mk | 1 + tests/misc/seq-extra-number.sh | 47 ++ 4 files changed, 55 insertions(+), 1 deletion(-) create mode 100755 tests/misc/seq-extra-number.sh diff --git a/NEWS b/NEWS index fd0543351..97c9d18bd 100644 --- a/NEWS +++ b/NEWS @@ -34,6 +34,10 @@ GNU coreutils NEWS-*- outline -*- for --numeric, --hex, or default alphabetic suffixes respectively. [bug introduced in coreutils-8.24] + seq no longer prints an extra line under certain circumstances (such as + 'seq -f "%g " 100 100'). 
+ [bug introduced in coreutils-6.10] + ** New Features od --skip-bytes now can use lseek even if the input is not a regular diff --git a/src/seq.c b/src/seq.c index b5913368a..8efe929e1 100644 --- a/src/seq.c +++ b/src/seq.c @@ -340,8 +340,10 @@ print_numbers (char const *fmt, struct layout layout, && x_val == last) { char *x0_str = NULL; - if (asprintf (_str, fmt, x0) < 0) + int x0_strlen = asprintf (_str, fmt, x0); + if (x0_strlen < 0) xalloc_die (); + x0_str[x0_strlen - layout.suffix_len] = '\0'; print_extra_number = !STREQ (x0_str, x_str); free (x0_str); } diff --git a/tests/local.mk b/tests/local.mk index e88d99f24..3e347cd96 100644 --- a/tests/local.mk +++ b/tests/local.mk @@ -245,6 +245,7 @@ all_tests = \ tests/misc/test.pl \ tests/misc/seq.pl\ tests/misc/seq-epipe.sh \ + tests/misc/seq-extra-number.sh \ tests/misc/seq-io-errors.sh \ tests/misc/seq-locale.sh \ tests/misc/seq-long-double.sh\ diff --git a/tests/misc/seq-extra-number.sh b/tests/misc/seq-extra-number.sh new file mode 100755 index 0..4295e1791 --- /dev/null +++ b/tests/misc/seq-extra-number.sh @@ -0,0 +1,47 @@ +#!/bin/sh +# Test the "print_extra_number" logic seq.c:print_numbers() + +# Copyright (C) 2019 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <https://www.gnu.org/licenses/>. + +. 
"${srcdir=.}/tests/init.sh"; path_prepend_ ./src +print_ver_ seq + +## +## Test 1: the documented reason for the logic +## +cat<<'EOF'>exp1 || framework_failure_ +0.00 +0.01 +0.02 +0.03 +EOF + +seq 0 0.01 0.03 > out1 || fail=1 +compare exp1 out1 || fail=1 + + +## +## Test 2: before 8.32, this resulted in TWO lines +## (print_extra_number was erroneously set to true) +## The '=' is there instead of a space to ease visual inspection, +cat<<'EOF'>exp2 || framework_failure_ +1e+06= +EOF + +seq -f "%g=" 100 100 > out2 || fail=1 +compare exp2 out2 || fail=1 + +Exit $fail -- 2.20.1
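A quick way to see whether an installed seq still shows the behaviour described in the commit message is sketched below. It is not part of the patch: the function name is made up for illustration, and the value 1000000 is chosen only because "%g" prints it as 1e+06.

```shell
# Sketch (not from the patch): check an installed seq for the extra line.
# "%g" prints 1000000 as 1e+06; the '=' suffix is the kind of trailing
# text that the commit message says confused print_extra_number.
seq_extra_line_check() {
  out=$(seq -f '%g=' 1000000 1000000)
  lines=$(printf '%s\n' "$out" | wc -l)
  if [ "$lines" -eq 1 ]; then
    echo "ok: single line ($out)"
  else
    echo "affected: $lines lines printed"
  fi
}

seq_extra_line_check
```

A single output line is what the new test (Test 2 above) expects; two lines would indicate a seq without the fix.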
bug#36831: enhance 'directory not empty' message
Hello, On Wed, Jul 31, 2019 at 08:03:45PM -0700, Paul Eggert wrote: > Assaf Gordon wrote: > > An explicit error explicitly saying "cannot move", and mention the source > > and > > destination, and also "blames" the target directory seems the most > > user-friendly and least ambiguous. > > Sure, but that handles only the ENOTEMPTY/EEXIST case. How would you handle > the EDQUOT, EISDIR, and ENOSPC cases? Will you invent a separate diagnostic > for each case, or just treat them as in my proposed patch? I assume the > latter, but either way I'd like to see a patch that handles these properly > too. Also, please handle ETXTBUSY while you're at it (sorry, I missed that > one). > > > For the second and third cases, > > "No space" and "Quota exceeded" seem to me to always relate to the > > destination, and I don't think users get confused about those > > (other opinions of course welcomed). > > What's obvious to experts like us is not always obvious to users. If users > get confused by the current diagnostic for ENOTEMPTY/EEXIST, I don't see why > they wouldn't also get confused for ETXTBUSY etc. > > > Your patch also added "EISDIR", for which rename(2) says: > > "newpath is an existing directory, but oldpath is not a directory." > > > > But I don't think this error can happen with gnu mv. > > It can, as a result of a race condition if some other process is mutating > the file system while 'mv' is running. Admittedly unlikely, but we might as > well improve this errno value while we're improving the others. All good points. Please see attached updated version. It does add explicit error string for each error code, but I hope the implementation is reasonable and easy to maintain and translate. -assaf >From 8ee71b24d74d7cfe81f151de430d38935cf04675 Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Mon, 29 Jul 2019 00:23:20 -0600 Subject: [PATCH] mv: improve error messages when target directory is at fault Suggested by Alex Mantel in https://bugs.gnu.org/36831 . 
$ mkdir A B B/A $ touch A/bar B/A/foo Before: $ mv A B mv: cannot move 'A' to 'B/A': Directory not empty After: $ mv A B mv: cannot move 'A' to 'B/A': Target directory not empty The following errors are handled: EDQUOT, EEXIST, ENOTEMPTY, EISDIR, ENOSPC, ETXTBSY. * src/copy.c (copy_internal): Print custom messages for errors that explicitly fault the target directory. (strerror_target): New function, return custom and translatable error messages. * tests/mv/dir2dir.sh: Adjust expected error message. * NEWS: Mention change. --- NEWS| 6 + src/copy.c | 56 ++--- tests/mv/dir2dir.sh | 6 ++--- 3 files changed, 62 insertions(+), 6 deletions(-) diff --git a/NEWS b/NEWS index fd0543351..4ec4d0df0 100644 --- a/NEWS +++ b/NEWS @@ -44,6 +44,12 @@ GNU coreutils NEWS-*- outline -*- stat(1) also supports a new --cached= option to control cache coherency of file system attributes, useful on network file systems. +** Improvements + + rm now prints clearer error messages when a failure relates to the + target directory (e.g., "Target directory is not empty" instead of + "Directory not empty"). + * Noteworthy changes in release 8.31 (2019-03-10) [stable] diff --git a/src/copy.c b/src/copy.c index 65cf65895..9cf02ad9c 100644 --- a/src/copy.c +++ b/src/copy.c @@ -1867,6 +1867,38 @@ source_is_dst_backup (char const *srcbase, struct stat const *src_st, return dst_back_status == 0 && SAME_INODE (*src_st, dst_back_sb); } +static char* +strerror_target (int e) +{ + /* TRANSLATORS: These strings should mimick libc's standard + error messages (from strerror(3)), but explicitly mention + the fault is with the target directory. 
*/ + switch (errno) +{ +case EDQUOT: + return _("Disk quota exceeded on target device"); +case EEXIST: +case ENOTEMPTY: + return _("Target directory not empty"); +case EISDIR: + return _("Tried to overwrite a directory with a file"); +case ENOSPC: + return _("No space left on target device"); +case ETXTBSY: + /* NOTE: The error is "Text file busy" - but "text" in that context + refers to "text segment" of an executable file (as opposed to + "data segment" and "BSS segment"). + + This error message is meant for users, and 'text file' can be easily + confused with an actual text file (i.e., one containing only ASCII + characters. Thus, say 'executable' instead of 'text'.*/ + return _("Target executable file is busy"); +default: + assert (0); +} +} + + /* Co
Re: date: new options to parse input date with strptime(3)
Hello,

Thank you for the review.
(replying to both emails together)

On Wed, Jul 31, 2019 at 04:27:20PM +0100, Stephane Chazelas wrote:
> 2019-07-31 14:59:42 +0100, Pádraig Brady:
> > On 26/07/19 08:29, Assaf Gordon wrote:
> [...]
> > > The first patch adds '--date-format=FORMAT', where FORMAT is
> > > strptime(3) format.
> >
> > I like this, and think it's useful functionality.
> > It's equivalent to -f in date(1) on FreeBSD,
> > so we should probably support that short option
> [...]
>
> Note that busybox date has -D for that.

In GNU date(1), -f is already assigned to "--file" (batch processing).
I added the "-D" short option.

> [...] you can use the standard getdate() DATEMSK variable [...]

Based on past coreutils policies, I think new environment variables
won't be accepted in any program...

>> The second patch adds '--arith-format=FORMAT', where FORMAT is
>> limited
>> to years/months/days/hours/minutes/seconds (%Y/%m/%d/%H/%M/%S).
>
> The idea here is to support more generic numeric deltas.
> I'm not sure of the interface though. Perhaps --delta-format
> would be clearer.

I changed it to "--date-delta-format" (to match -D/--date-format).

Note that there's a difference between this and FreeBSD's -v:
"--date-delta-format" takes the values from the same date string (-d),
so it also works with "--file" (batch processing).

> Or perhaps we should just support the
> FreeBSD -v option to apply the adjustments, which seems more direct
> and would further improve compat.

I like the FreeBSD -v method, and implemented it as well (in two patches,
to ease review). The commit messages and tests provide many examples.

---

There could be many adjustments to these features, but I hope that
once the bulk of the code exists, adapting it will be easy.

One option, for example, is to do away with "--date-delta-format",
and accept the "-v" syntax in the "-d" string, so it will work both
from the command line and from a file:

    date -D "%F" -d "2019-10-31" -v "+2y -100h"
    printf "2019-10-31 +2y -100h" | date -D "%F" -f -

---

The attached patches are:

  tests: add 'date -r/--reference=FILE' test
  tests: add 'date -f/--file' (batch processing) test
  date: add -D/--date-format=FORMAT option
  date: add --date-delta-format=FORMAT option
  date: add -v/--adjust-date=STRING option
  date: expand -v=STR syntax to match FreeBSD

---

Comments and suggestions welcomed,
 - assaf

date-strp-2019-08-01.patch.gz
Description: application/gunzip
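For comparison, a similar adjustment can be approximated with a released GNU date, which has neither -D nor -v, by putting relative items in the -d string. This is only a sketch under that assumption: it relies on GNU date's own heuristics rather than an explicit strptime(3) format, and the variable names are illustrative.

```shell
# Sketch using only released GNU date features (no -D / -v options).
# -u keeps the arithmetic in UTC, so DST transitions cannot shift it.

# The 100th day of 2019: start from January 1st and add 99 days.
day100=$(date -u -d '2019-01-01 +99 days' +%F)
echo "$day100"

# A fixed date adjusted by +2 years and -100 hours.
adjusted=$(date -u -d '2019-10-31 +2 years -100 hours' '+%F %T')
echo "$adjusted"
```

The difference from the proposed options is that the input layout cannot be stated explicitly here; the string must already be in a form that the default parser understands.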
Re: How to convert a md5sum back to a timestamp?
Hello,

On 2019-08-01 12:50 a.m., Stephane Chazelas wrote:
> 2019-07-31 22:36:18 -0500, Peng Yu:
> > Suppose that I know a md5sum that is derived one of the timestamps
> > computed below. Is there a way to quickly derive what the original
> > timestamp is?
> >
> > I could make a database of all the timestamps and their md5sums. But
> > as the total number of entries increases, this solution will not be
> > scalable as the database can be big. Is it there any better solution
> > to this problem?
> >
> > for i in {1..2563200}; do date -d "-$i minutes" +%Y%m%d_%I%M%p; done
> [...]
>
> seq -f '-%g minutes' 2563200 | date -f - +%Y%m%d_%I%M%p
>
> would be an improvement as it would only run one date
> invocation, but you'd still need to run one md5sum for each of
> those lines. coreutils md5sum in itself is not slow, but forking
> a process and loading a command and linking its libraries is,
> that's not a bug in coreutils itself.

"datamash" will calculate md5 on multiple lines in one invocation:

  $ seq -f '-%g minutes' 2563200 \
      | date -f - +%Y%m%d_%I%M%p \
      | datamash md5 1

or to see the time AND the md5 sum, add "--full":

  $ seq -f '-%g minutes' 2563200 \
      | date -f - +%Y%m%d_%I%M%p \
      | datamash --full md5 1

Three notes:

1. I would recommend using the "-%7.0f minutes" format in "seq" instead
   of "%g", as the latter results in scientific notation for large
   values:

     $ seq -f '-%7g minutes' 2563200 | tail -n1
     -2.5632e+06 minutes

     $ seq -f '-%7.0f minutes' 2563200 | tail -n1
     -2563200 minutes

2. Using "-N minutes" as the date string is relative to the current
   time. Are you sure that's the value you want? You'll get different
   values every time you run it...
   To be more reproducible, consider starting with a known date, e.g.:

     $ date -u -d "2019-08-01 01:53:22Z +55 minutes" +%Y%m%d_%I%M%p
     20190801_0248AM

   or

     $ seq -f "2019-08-01 01:53:22Z +%7.0f minutes" 2563200 \
         | date -u -f - +%Y%m%d_%I%M%p | head
     20190801_0154AM

3. "datamash md5" does not include the newline in the md5 calculation;
   be careful about this when comparing hashing results. E.g.:

     $ echo 20190731_0848PM | md5sum
     deb75bda7f8e95d321897d181cbe2556  -

     $ printf "%s\n" 20190731_0848PM | md5sum
     deb75bda7f8e95d321897d181cbe2556  -

     $ printf "%s" 20190731_0848PM | md5sum
     d0bf332197593b7c3f6d7757f7d5754a  -

     $ printf "%s" 20190731_0848PM | datamash md5 1
     d0bf332197593b7c3f6d7757f7d5754a

---

For reference, on my old desktop it takes:

  $ time seq -f "2019-08-01 01:53:22Z +%7.0f minutes" 2563200 \
      | date -u -f - +%Y%m%d_%I%M%p \
      | datamash --full md5 1 | wc -l -c
  2563200 125596800

  real    0m14.185s
  user    0m17.739s
  sys     0m0.527s

And it results in ~125MB of data, which is reasonable for an ad-hoc
reverse lookup table for MD5 values.

If your key space gets larger, you should look into
https://en.wikipedia.org/wiki/Rainbow_table .

Hope this helps,
 - assaf
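The reverse lookup table mentioned above can be sketched on a small scale without datamash. The following is an illustration only: it assumes GNU seq, GNU date and md5sum, builds just 120 entries, forks one md5sum per line (which is exactly the slow part discussed above), and the `table` file and `lookup` function are names made up for this example.

```shell
# Sketch: a tiny md5 -> timestamp reverse lookup table (120 entries).
# Assumes GNU seq/date and md5sum; LC_ALL=C keeps %p as AM/PM.
table=$(mktemp)
seq -f '2019-08-01 01:53:22Z +%.0f minutes' 120 \
  | LC_ALL=C date -u -f - +%Y%m%d_%I%M%p \
  | while read -r ts; do
      # Hash without the trailing newline, like "datamash md5" does.
      sum=$(printf '%s' "$ts" | md5sum | cut -d' ' -f1)
      printf '%s\t%s\n' "$sum" "$ts"
    done > "$table"

# lookup HASH: print the timestamp whose md5 equals HASH, if any.
# The ("" concatenation) forces a string comparison in awk, so hashes
# that happen to look like numbers are not compared numerically.
lookup() {
  awk -F'\t' -v h="$1" '($1 "") == (h "") { print $2 }' "$table"
}

lookup "$(printf '%s' 20190801_0248AM | md5sum | cut -d' ' -f1)"
```

For the full 2563200 entries, the datamash pipeline shown earlier is the practical way to fill the table; the lookup side stays the same.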
bug#36831: enhance 'directory not empty' message
Hello Paul,

On Mon, Jul 29, 2019 at 06:50:46PM -0500, Paul Eggert wrote:
> On 7/29/19 1:28 AM, Assaf Gordon wrote:
> > +  if (rename_errno == ENOTEMPTY || rename_errno == EEXIST)
> > +{
> > +  error (0, 0, _("cannot move %s to %s: Target directory not
> > empty"),
> > + quoteaf_n (0, src_name), quoteaf_n (1, dst_name));
>
> Although this is an improvement, it is not general enough, as other errno
> values are relevant only for the destination. Better would be to have a
> special case for errno values that matter only for the destination, and use
> the existing code for errno values where we don't know whether the problem
> is the source or the destination. Something like the attached, say.

> +case EDQUOT: case EEXIST: case EISDIR: case ENOSPC: case
> ENOTEMPTY:
> +  error (0, rename_errno, "%s", quotearg_colon (dst_name));
> +  break;
> +

Thanks for the review.

At the risk of bikeshedding, I'd like to argue for the prior method.
While it is not general enough, I think it provides a clearer error
message.

For example, with the more general implementation the errors would be:

  $ mv A B
  mv: B/A: Directory not empty

  $ mv A B
  mv: B/A: No space left on device

  $ mv A B
  mv: B/A: Quota exceeded

In the first case, I think this error is potentially more confusing than
before: while it doesn't mention the source directory, it also doesn't
say "cannot move", so it is only implied that it is an error (an
inexperienced user might dismiss this as a warning).
Also, there could be a source directory named very similarly to the
destination directory, and from a quick glance it would not be easy to
understand what happened.

An error that explicitly says "cannot move", mentions the source and
destination, and also "blames" the target directory seems the most
user-friendly and least ambiguous.

---

For the second and third cases,
"No space" and "Quota exceeded" seem to me to always relate to the
destination, and I don't think users get confused about those
(other opinions of course welcomed).

---

Your patch also added "EISDIR", for which rename(2) says:
   "newpath is an existing directory, but oldpath is not a directory."

But I don't think this error can happen with GNU mv.
If we try to move a file onto a directory, we get:

  $ mkdir C C/D ; touch D
  $ mv D C
  mv: cannot overwrite directory 'C/D' with non-directory

This case is specifically handled in copy.c line 2131, before calling
rename(2) (and it is also an example of a custom error message
instead of a stock libc message).

---

Happy to hear your opinion,
 - assaf
bug#36831: enhance 'directory not empty' message
Hello,

On Sun, Jul 28, 2019 at 08:58:59PM +0200, Alex Mantel wrote:
[...]
> Ah, the target directory does exist! Hmm... But i'd like the message to be
> like:
>
> $ mv thing/ ../things
> mv: cannot move 'thing' to '../things/things': Targetdirectory not empty
>
>                                                ^ this little thing here,
>                                                  it explains everyting.
>
> Change text from 'Directory not empty' to 'Targetdirectory not empty'.

Thanks for the report.

To clarify, the scenario is:

  $ mkdir A B B/A
  $ touch A/bar B/A/foo
  $ mv A B
  mv: cannot move 'A' to 'B/A': Directory not empty

And the reason (as you've found out) is that the target directory 'B/A'
is not empty (it has the 'foo' file in it).
Had this been allowed, moving 'A' to 'B/A' would result in the 'foo' file
disappearing.

---

How is a user expected to know this error is about the target
directory?

There is a bit of a trade-off here between user-friendliness (especially
for non-technical users) and more technical knowledge.

If we go one step 'lower' to the programming interface, almost all
sources mention this is about the 'target' directory not being empty:

POSIX says:
https://pubs.opengroup.org/onlinepubs/009695399/functions/rename.html

  [EEXIST] or [ENOTEMPTY]
  The link named by new is a directory that is not an empty directory.

Linux's rename(2) manual page says:

  ENOTEMPTY or EEXIST
  newpath is a nonempty directory, that is, contains entries
  other than "." and "..".

FreeBSD's rename(2) manual page says:

  [ENOTEMPTY]  The to argument is a directory and is not empty.

AIX's rename(2) manual page says:

  ENOTEMPTY  The ToPath parameter specifies an existing directory that is
             not empty.

So there is some merit in claiming this helpful piece of information is
lost when the error message is reported to the user.
--- In GNU coreutils this error message originates from 'copy.c' line 2480: https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/copy.c#n2480 error (0, rename_errno, _("cannot move %s to %s"), quoteaf_n (0, src_name), quoteaf_n (1, dst_name)); And herein lies the (technical) problem: The actual message "Directory not empty" is not in the source code - it is a system error message that corresponds to the value of 'rename_errno' variable (ENOTEMPTY/EEXIST). It originates from GLibc (or another libc). So there is no trivial way to change the error message in coreutils. Attached a patch to add special handling for this error. --- What do others think? If this is a desired improvement, I'll finish the patch with news/tests/etc. regards, - assaf >From 430b30104234db719bf15e6fc681a62312c7124f Mon Sep 17 00:00:00 2001 From: Assaf Gordon Date: Mon, 29 Jul 2019 00:23:20 -0600 Subject: [PATCH] mv: improve ENOTEMPTY/EEXIST error message Suggested by Alex Mantel in https://bugs.gnu.org/36831 . $ mkdir A B B/A $ touch A/bar B/A/foo Before: $ mv A B mv: cannot move 'A' to 'B/A': Directory not empty After: $ mv A B mv: cannot move 'A' to 'B/A': Target directory not empty * src/copy.c (copy_internal): Add special handling for ENOTEMPTY/EEXIST. TODO: NEWS, tests. --- src/copy.c | 8 1 file changed, 8 insertions(+) diff --git a/src/copy.c b/src/copy.c index 65cf65895..a5af570bf 100644 --- a/src/copy.c +++ b/src/copy.c @@ -2450,6 +2450,14 @@ copy_internal (char const *src_name, char const *dst_name, return true; } + if (rename_errno == ENOTEMPTY || rename_errno == EEXIST) +{ + error (0, 0, _("cannot move %s to %s: Target directory not empty"), + quoteaf_n (0, src_name), quoteaf_n (1, dst_name)); + forget_created (src_sb.st_ino, src_sb.st_dev); + return false; +} + /* WARNING: there probably exist systems for which an inter-device rename fails with a value of errno not handled here. If/as those are reported, add them to the condition below. -- 2.11.0
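The scenario from the commit message can be reproduced in a scratch directory with the sketch below. It assumes GNU mv; the exact wording of the message depends on the installed coreutils version (and on whether a patch like the one above is applied), so the script only reports it and does not compare it. The `work` directory and `mv.err` file are names chosen for this illustration.

```shell
# Sketch: reproduce the 'Directory not empty' scenario in a scratch dir.
work=$(mktemp -d)
mkdir "$work/A" "$work/B" "$work/B/A"
touch "$work/A/bar" "$work/B/A/foo"

# rename(2) refuses to replace the non-empty B/A, so mv is expected
# to fail and leave both directory trees untouched.
if mv "$work/A" "$work/B" 2>"$work/mv.err"; then
  echo "unexpected: mv succeeded"
else
  echo "mv failed with: $(cat "$work/mv.err")"
fi

ls "$work/A" "$work/B/A"
```

Both 'bar' and 'foo' should still be listed afterwards, which is the point of the diagnostic: nothing was moved because the destination directory is not empty.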
date: new options to parse input date with strptime(3)
Hello,

Some time ago there was a discussion relating to the difficulties of
using GNU date's parsing. There was a mention of how using strptime(3)
makes parsing explicit and easy.

I like that idea, and decided to try my hand at adding such options.
Attached is a proof of concept.

The first patch adds '--date-format=FORMAT', where FORMAT is
strptime(3) format.
The second patch adds '--arith-format=FORMAT', where FORMAT is limited
to years/months/days/hours/minutes/seconds (%Y/%m/%d/%H/%M/%S).

Examples:

  # Specific date
  $ ./src/date --date-format '%d %b %Y' --date '17 Feb 1979' +%F
  1979-02-17

  # The 100th day of 2019
  $ ./src/date --date-format '%Y %j' --date '2019 100' +%F
  2019-04-10

  # Tuesday of the 10th week in 2018
  $ ./src/date --date-format '%Y %W %A' --date '2018 10 Tue' +%F
  2018-03-06

  # 2019-07-26 18:49:59, +49 hours, -10 minutes, -30 seconds:
  $ date --date-format '%Y%m%d %H%M%S' \
         --arith-format '%H %M %S' \
         --date '20190726 184959 49 -10 -30' \
         '+%F %T'
  2019-07-28 19:39:29

The test file (date-strp.pl) contains more usage examples.

This is just a proof of concept, and of course many things can be improved
and changed (assuming this feature is desired).

Comments and suggestions very welcome,
 - assaf

>From 82c8b42de7bf9c69432ff175838f01f10008a512 Mon Sep 17 00:00:00 2001
From: Assaf Gordon
Date: Thu, 25 Jul 2019 02:35:46 -0600
Subject: [PATCH 1/2] date: add --date-format=FORMAT option

Parse -d=STRING dates using strptime(3) instead of gnulib's
parse_datetime.c heuristics.

Example: print the 100th day of 2019:

  $ date --date-format '%Y %j' --date '2019 100' +%F
  2019-04-10

TODO: coreutils.texi, NEWS, usage

* src/date.c (long_options): Add --date-format/STRP_FORMAT option.
(parse_datetime_flags): Replace with ...
(debug): ... new variable.
(strp_format): New variable to hold the user-specified FORMAT string.
(parse_datetime_string): New function, wrapper for
parse_datetime2/strptime.
(batch_convert, main): Call parse_datetime_string instead of
parse_datetime2.
(main): Handle STRP_FORMAT option. * tests/misc/date-strp.pl: New tests. * tests/local.mk (TESTS): Add date-strp.pl --- src/date.c | 78 ++--- tests/local.mk | 1 + tests/misc/date-strp.pl | 151 3 files changed, 221 insertions(+), 9 deletions(-) create mode 100644 tests/misc/date-strp.pl diff --git a/src/date.c b/src/date.c index d97d0ae52..4879474e3 100644 --- a/src/date.c +++ b/src/date.c @@ -80,7 +80,8 @@ static char const rfc_email_format[] = "%a, %d %b %Y %H:%M:%S %z"; enum { RFC_3339_OPTION = CHAR_MAX + 1, - DEBUG_DATE_PARSING + DEBUG_DATE_PARSING, + STRP_FORMAT }; static char const short_options[] = "d:f:I::r:Rs:u"; @@ -97,6 +98,7 @@ static struct option const long_options[] = {"rfc-2822", no_argument, NULL, 'R'}, {"rfc-3339", required_argument, NULL, RFC_3339_OPTION}, {"set", required_argument, NULL, 's'}, + {"date-format", required_argument, NULL, STRP_FORMAT}, {"uct", no_argument, NULL, 'u'}, {"utc", no_argument, NULL, 'u'}, {"universal", no_argument, NULL, 'u'}, @@ -105,8 +107,11 @@ static struct option const long_options[] = {NULL, 0, NULL, 0} }; -/* flags for parse_datetime2 */ -static unsigned int parse_datetime_flags; +static bool debug ; + +/* the strp format string specified by the user */ +static char* strp_format; + #if LOCALTIME_CACHE # define TZSET tzset () @@ -142,6 +147,9 @@ Display the current time in the given FORMAT, or set the system date.\n\ -d, --date=STRING display time described by STRING, not 'now'\n\ "), stdout); fputs (_("\ + --date-format=FORMAT parse -d,-f values according to FORMAT\n\ +"), stdout); + fputs (_("\ --debugannotate the parsed date,\n\ and warn about questionable usage to stderr\n\ "), stdout); @@ -281,6 +289,57 @@ Show the local time for 9AM next Friday on the west coast of the US\n\ exit (status); } +/* A wrapper calling either gnulib's parse_datetime2() or strptime(3), + depending on whether the user specified --date-format=FORMAT argument. 
*/ +static bool +parse_datetime_string (struct timespec *result, char const *datestr, + timezone_t tzdefault, char const *tzstring) +{ + if (strp_format) +{ + struct tm t; + time_t s = time (NULL); + localtime_rz (tzdefault, , ); + char *endp = strptime (datestr, strp_format, ); + if (!endp) +{ + if (debug) +error (0, 0, _("date string %s does not match format '%s'"), + quotearg (datestr), + strp_format); + return false; +} + + if (*endp) +{ + if (debug) +error (EXIT_FAIL
Re: doc: add "version sort" chapter
On 2019-07-22 11:56 p.m., Bernhard Voelker wrote:
> On 7/15/19 9:32 PM, Assaf Gordon wrote:
> > [...] pushed [...]
>
> 'make check' fails for sc-avoid-builtin, and I propose some other fixes
> in the attached as well.
> WDYT?

These all look good, thanks for the improvements!
Re: doc: add "version sort" chapter
On 10/07/19 19:57, Assaf Gordon wrote:
> I would like to suggest adding a new chapter to the manual,
> detailing the nitty-gritties of "version sort" in coreutils.

> Attached the updated version, including improvements Bernhard sent off-list.
> Comments welcomed,

With no further comments, pushed here:
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=3264d4ca0d4fd5477d2232c3e097422efdd669ec

-assaf
bug#36674: Sort Suggestion
tag 36674 notabug close 36674 stop Hello, On Mon, Jul 15, 2019 at 11:42:01AM -0700, Marshall Lake wrote: > Even though this isn't a bug, I was asked to send the following to this > email address. (General suggestions and discussions are better suited for coreut...@gnu.org mailing list, that way the system won't open a new bug item.) > > Re: SORT Command from GNU coreutils 8.25 > > A suggestion for an additional option to the SORT command is to ignore > non-alphanumeric characters. > > As an example, in attempting to sort an index ... > > Abbott, William259 > > sorts before: > > Abbot, William 099 > > If non-alphanumeric characters were ignored then the same two records > would sort as: > > Abbot, William 099 > Abbott, William259 > > There's actually something else at play here: In your case, sort does ignore non-alphanumeric characters, but it ALSO ignores white space. That happens because your locale is set to some language (for example, en_US.UTF8). Using such locale makes sort ignore all non-alphanumeric chareacters, whitespace, and upper/lower cases. In essense, you are compaing "AbbottWilliam" (two 't's) to 'AbbotWilliam' (one 't') - and then the second 't' is compared to a 'w', and is determined to come first. If you force a POSIX/C locate, then all characters are considered, and the result will be as you requested. Observe the following: $ printf "%s\n" AbbottWilliam AbbotWilliam | LC_ALL=en_CA.utf8 sort AbbottWilliam AbbotWilliam $ printf "%s\n" "Abbott William" "Abbot William" | LC_ALL=en_CA.utf8 sort Abbott William Abbot William $ printf "%s\n" "Abbott William" "Abbot William" | LC_ALL=C sort Abbot William Abbott William $ printf "%s\n" "Abbott, William" "Abbot, William" | LC_ALL=C sort Abbot, William Abbott, William Note that 'sort' already has an option for dictionary style sorting: -d, --dictionary-order: consider only blanks and alphanumeric characters. 
However, locale rules take precedence over it, so effectively it only works in "C" locale: $ printf "%s\n" "Ab,,b,,ott William" "Abbot William" | LC_ALL=C sort Ab,,b,,ott William Abbot William $ printf "%s\n" "Ab,,b,,ott William" "Abbot William" | LC_ALL=C sort -d Abbot William Ab,,b,,ott William You can read past discussion about the confusion resulting from locale sorting rules here: https://debbugs.gnu.org/11621 https://debbugs.gnu.org/12783 As such, I'm closing this as "not a bug", but discussion can continue by replying to this thread. -assaf
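The C-locale workaround from this message can be wrapped in a small shell function. This is only a sketch: the name sort_dictionary is made up for illustration, while "LC_ALL=C" and "sort -d" are the pieces taken from the message itself.

```shell
# Hypothetical helper (name invented for this example): sort stdin in
# dictionary order, forcing the C locale so that locale collation rules
# do not override -d.
sort_dictionary() {
    LC_ALL=C sort -d
}

# Same input as in the message; with -d the commas are ignored,
# so "Abbot William" comes first.
printf '%s\n' "Ab,,b,,ott William" "Abbot William" | sort_dictionary
```

Setting LC_ALL only for the one command keeps the rest of the script in the user's normal locale.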
bug#36671: tail: unrecognized file system type 0x794c7630 for ‘/var/log/messages’. please report this to bug-coreutils@gnu.org. reverting to polling
tag 36671 notabug close 36671 stop Hello, On Mon, Jul 15, 2019 at 06:22:47PM +0200, John Koppolu wrote: > tail: unrecognized file system type 0x794c7630 for ‘/var/log/messages’. > please report this to bug-coreutils@gnu.org. reverting to polling You've previously reported this 4 days ago, please see the reply there: https://bugs.gnu.org/36600#8 -assaf
Re: doc: add "version sort" chapter
Hello, On Thu, Jul 11, 2019 at 03:36:19PM +0100, Pádraig Brady wrote: > On 10/07/19 19:57, Assaf Gordon wrote: > > > > I would like to suggest adding a new chapter to the manual, > > detailing the nitty-gritties of "version sort" in coreutils. > > > A few adjustments attached. > Thanks. Attached the updated version, including improvements Bernhard sent off-list. Comments welcomed, - assaf 0001-doc-add-version-sort-ordering-chapter.patch.gz Description: application/gunzip
bug#36600: unrecognized file system type 0x794c7630 for ‘/var/log/messages’. please report this to bug-coreutils@gnu.org. reverting to polling
tag 36600 notabug close 36600 stop Hello, On Thu, Jul 11, 2019 at 05:53:16PM +0200, John Koppolu wrote: > unrecognized file system type 0x794c7630 for ‘/var/log/messages’. please > report this to bug-coreutils@gnu.org. reverting to polling > Support for this file system (overlayfs, commonly used with Docker containers) was added in version 8.25. Consider upgrading Coreutils if possible. See https://www.gnu.org/software/coreutils/filesystems.html for more details. regards, - assaf
Re: How to print sizes of both files and directories in a directory?
Hello, On 2019-07-01 12:10 a.m., Peng Yu wrote: `du -h --max-depth=1` only print directory sizes. Is there a way to print the sizes of both directories and files in a directory? du -h --max-depth=1 --all as mentioned in the --help screen: -d, --max-depth=N print the total for a directory (or file, with --all) only if it is N or fewer levels below the command line argument; regards, - assaf
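A quick way to see the effect of --all is to try it on a scratch directory. The sketch below assumes GNU du; the function name list_sizes is invented for the example.

```shell
# Hypothetical wrapper (name invented here): print sizes of the files and
# directories directly inside the given directory (default: current one).
list_sizes() {
    du -h --max-depth=1 --all -- "${1:-.}"
}

# Example: a directory holding one file and one sub-directory.
demo=$(mktemp -d)
mkdir "$demo/sub"
echo hello > "$demo/file.txt"
list_sizes "$demo"
```

Without --all, only "$demo/sub" and "$demo" itself would be listed.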
Re: How to sort and count efficiently?
On 2019-06-30 11:10 a.m., Peng Yu wrote: The problem with this kind of awk program is that everything will be loaded to memory. Well, those are the two main options: store in memory or resort to disk I/O. Each has its own pros and cons. But bare `sort` use external files to save memory. Not exactly - The goal is not to "save" memory - Sort resorts to external files to be able to complete the sort even when it runs out of the (allotted) memory (which can be controlled with the "-S" parameter). I'm not familiar with a program which implements hashing backed by file-storage, but perhaps such a program exists. When the hash in awk is too large, accessing it can become very slow (maybe due to potential cache miss or slow down of hash as a function of hash size). Nothing is "free", and using a hash incurs its own costs. If you're using the simplified awk hashing program, try to use other AWK implementations than GNU awk (e.g. I have had some performance gains from switching to "mawk", the default awk in Debian). $ printf "%s\n" a c b b b b b b c \ | mawk 'a[$1]++ {} END { for (i in a) { print i, a[i] } }' a 1 b 6 c 2 Or, if your input is exceedingly large, perhaps consider pre-processing it and splitting the input into smaller files - each one will have fewer strings and hashing them will consume less memory. The following example splits the input file into 27 files, based on the first letter of the string (and an "other" file for non-letters): mawk '{ l = tolower(substr($0,1,1)) ; if (l>="a" && l<="z") { print $0 > l } else { print $0 > "other" } }' INPUT This is an O(N) operation that doesn't consume any memory (just lots of disk I/O) - and the resulting files will be much smaller - they can then be hashed with less memory. Of course this can be extended to split into smaller-grained files. -assaf
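The split-then-hash idea can be sketched end to end as follows. This is an illustration under assumptions, not a tuned implementation: the function name count_by_bucket and the temporary-directory layout are invented here, and plain "awk" is used so the sketch runs with any implementation (mawk included).

```shell
# Hypothetical helper (name invented here): count how often each line
# occurs in INPUT, hashing one first-letter bucket at a time.
count_by_bucket() {
    input=$1
    tmpdir=$(mktemp -d) || return 1

    # Pass 1: split the input into per-letter files plus an "other" file.
    LC_ALL=C awk -v dir="$tmpdir" '{
        l = tolower(substr($0, 1, 1))
        if (l < "a" || l > "z") l = "other"
        out = dir "/" l
        print $0 > out
    }' "$input"

    # Pass 2: hash each (much smaller) bucket separately.
    for f in "$tmpdir"/*; do
        [ -f "$f" ] || continue
        awk '{ a[$0]++ } END { for (i in a) print i, a[i] }' "$f"
    done

    rm -rf "$tmpdir"
}
```

Only one bucket's strings are held in the awk array at any time, at the cost of writing the input to disk once.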
Re: How to sort and count efficiently?
Correcting myself: On Sun, Jun 30, 2019 at 10:08:46AM -0600, Assaf Gordon wrote: > On Sun, Jun 30, 2019 at 07:34:19AM -0500, Peng Yu wrote: > > > > I have a long list of string (each string is in a line). I need to > > count the number of appearance for each string. > > > > [...] Does anybody know any better way > > to make the sort and count run more efficiently? > > > > Or using gnu awk: use 'asorti' instead of 'asort', with the two-parameter variant: $ printf "%s\n" a c b b b b b b c \ | awk 'a[$1]++ {} END { n = asorti(a,b) ; for (i = 1; i <= n; i++) { print b[i], a[b[i]] } }' a 1 b 6 c 2 For more details see: https://www.gnu.org/software/gawk/manual/html_node/Array-Sorting-Functions.html#Array-Sorting-Functions -assaf
Re: How to sort and count efficiently?
Hello, On Sun, Jun 30, 2019 at 07:34:19AM -0500, Peng Yu wrote: > Hi, > > I have a long list of string (each string is in a line). I need to > count the number of appearance for each string. > > I currently use `sort` to sort the list and then use another program > to do the count. The second program doing the count needs only a small > amount of the memory as the input is sorted. > > [...] Does anybody know any better way > to make the sort and count run more efficiently? > Using awk: awk 'a[$1]++ {} END { for (i in a) { print i, a[i] } }' INPUT \ | sort -k1,1 Or using gnu awk: awk 'a[$1]++ {} END { n = asort(a) ; for (i = 1; i <= n; i++) { print i, a[i] } }' regards, - assaf
bug#35939: version sort is incorrect with hyphen-minus
Hello Paul, On Wed, Jun 26, 2019 at 12:57:14PM -0700, Paul Eggert wrote: > GNU sort uses the same algorithm as glibc strverscmp, I think that both sort and ls use 'filevercmp' - a simplified version that does not support locales (and doesn't fail). The change (from 'strverscmp') was made in: commit e505736f8211a608b00dfe75fb186a5211e1a183 Author: Kamil Dudka Date: Fri Oct 3 11:03:40 2008 +0200 ls and sort: use filevercmp instead of strverscmp https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=e505736f8211a608b00dfe75fb186a5211e1a183 > Has the Debian version-comparison algorithm changed since 1997? If so, could > you give details about the changes to the Debian algorithm? I don't think the algorithm changed in Debian, and also in gnulib there are only a handful of relevant commits, all 10 years old: 9121662f1 2008-10-03 filevercmp: new module 0443c2f39 2009-03-05 filevercmp: Move hidden files up in ordering. 1721cf06d 2009-03-24 filevercmp: handle simple~ and numbered.~3~ backup suffixes 4fd008794 2009-04-09 filevercmp: fix regression cc96df30d 2009-04-09 filevercmp: correct today's change I think (also based on Ian's confirmation) that this discrepancy was there from the beginning. I now notice that there's an additional difference: coreutils/gnulib has special handling for extensions, hidden files and backup files. As Ian wrote, a documentation improvement is probably the best fix. I'll try to come up with a suggested change. -assaf P.S. For completeness, here are a few other threads with details/explanations about 'version-sort': https://bugs.gnu.org/18168 https://bugs.gnu.org/22275 https://bugs.gnu.org/22455 https://bugs.gnu.org/33786
bug#35939: version sort is incorrect with hyphen-minus
(Adding Ian Jackson for dpkg/debian-version details) Hello, On Tue, May 28, 2019 at 02:53:39AM +0200, Vincent Lefevre wrote: > With GNU coreutils 8.30 under Debian/unstable, I get: > > $ LC_ALL=C ls > ab-cd abb abe > $ LC_ALL=C ls -v > abb abe ab-cd > > The hyphen-minus character should still be regarded as being less > than the letters (there are no digits, so both are expected to be > equivalent). The GNU coreutils manual says: > [...] Thanks for the report and the clear details. To summarize, "ls -v" and "sort -V" (coreutils' version sort) behave differently from other implementations with regard to the minus character: $ printf "%s\n" abb ab-cd | sort -V abb ab-cd $ v1="abb" $ v2="ab-cd" $ dpkg --compare-versions "$v1" lt "$v2" && printf "$v1\n$v2\n" || printf "$v2\n$v1\n" ab-cd abb If I understand correctly, the reason is that in Debian's version comparison algorithm [1], the minus character has a special meaning: it separates the "upstream version" part from the "debian revision" part. In Debian's implementation [2], a version string is first split into three parts (epoch, upstream version, debian revision) using ":" for epoch delimiter and "-" for revision delimiter. Only then the three parts are compared, separately [3]. [1] https://www.debian.org/doc/debian-policy/ch-controlfields.html#version [2] https://git.dpkg.org/cgit/dpkg/dpkg.git/tree/lib/dpkg/parsehelp.c#n191 [3] https://git.dpkg.org/cgit/dpkg/dpkg.git/tree/lib/dpkg/version.c#n140 On the other hand, coreutils' implementation (from gnulib [4]) does not break the version string into three parts - it treats the entire string as a single "upstream version" part. The rules for sorting the "upstream version" string say: "... 
The lexical comparison is a comparison of ASCII values modified so that all the letters sort earlier than all the non-letters and so that a tilde sorts before anything" (from [1]) [4] https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/filevercmp.c Therefore, dpkg first separates "ab" from "cd", then compares "ab" to "abb" - and 'ab' comes first; coreutils compares "ab-cd" to "abb" (or technically, just "ab-" to "abb"), and because "letters sort earlier than all non-letters", "abb" comes first. I hope this helps explain the differences (I also hope this explanation is correct, and I invite others to chime in). regards, - assaf
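To make the splitting step concrete, here is a rough shell sketch of how a Debian-style version string breaks into its three parts before any comparison happens. It is not dpkg's code: the function name split_version and the "|" output format are invented for illustration, and the sketch assumes the epoch ends at the first ":" and the revision starts after the last "-".

```shell
# Hypothetical helper (name and output format invented here): split a
# version string into epoch, upstream version and revision.
split_version() {
    v=$1
    case $v in
        *:*) epoch=${v%%:*}; v=${v#*:} ;;          # text before the first ":"
        *)   epoch=0 ;;                            # no epoch given
    esac
    case $v in
        *-*) upstream=${v%-*}; revision=${v##*-} ;; # split at the last "-"
        *)   upstream=$v; revision= ;;              # no revision part
    esac
    printf '%s|%s|%s\n' "$epoch" "$upstream" "$revision"
}

split_version ab-cd   # prints 0|ab|cd
split_version abb     # prints 0|abb|
```

Seen this way, dpkg ends up comparing the upstream parts "ab" and "abb", whereas coreutils compares the unsplit strings "ab-cd" and "abb".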
bug#35654: We've found a vulnerability of gnu chown, please check it and request a cve id for us.
tag 35654 close 35654 stop Hello, On Thu, May 09, 2019 at 11:53:11PM +0800, st0n3 ss wrote: > Hello! we have found a vulnerability of command chown, please check it.If > it is a vulnerability. please request a cve id for use, thank you!chown -h > bypass Given Paul's and Bob's detailed answers, I'm closing this as "not a bug". Discussion can continue by replying to this thread. regards, - assaf
bug#36130: split bug
tag 36130 notabug close 36130 stop Hello, On Mon, Jun 10, 2019 at 04:50:20PM -0600, Assaf Gordon wrote: > On 2019-06-10 12:28 p.m., Heather Wick wrote: > > Verbose: This seems to have made the same number of files this time; not > > sure why the other 3-4 times I ran it it did not. They appear to be the > > same size, with paired last reads > [...] > > Glad to hear it worked. > > Could it be that in previous times the queued job ran out of disk space? > > That would be my first guess, as such things are common in shared > grid/cluster environments, particularly if your job runs in a temporary > and limited storage location (e.g. "/tmp/job-"). With no further comments, I'm closing this ticket. If more issues arise (or this was not an adequate solution) we can always re-open this ticket. regards, -assaf
bug#35632: date Parse of '13:00 + 2 hours' Broken.
tag 35632 notabug close 35632 stop Hello, (sorry for the delayed reply) On Wed, May 08, 2019 at 12:57:10PM +0100, Ralph Corderoy wrote: > > Using date from coreutils 8.31-1 on Arch Linux. > This surprised me. > > $ TZ=UTC0 /bin/date -d '1pm + 2 hours' > Wed 8 May 15:00:00 UTC 2019 > $ TZ=UTC0 /bin/date -d '13:00 + 2 hours' > Wed 8 May 12:00:00 UTC 2019 > > The documentation doesn't suggest `1pm' and `13:00' are treated > differently. `--debug' helps. > > $ TZ=UTC0 /bin/date --debug -d '1pm + 2 hours' > date: parsed time part: 01:00:00pm > date: parsed relative part: +2 hour(s) > ... > $ TZ=UTC0 /bin/date --debug -d '13:00 + 2 hours' > date: parsed time part: 13:00:00 UTC+02 > date: parsed relative part: +1 hour(s) > date: input timezone: parsed date/time string (+02) > ... > > It looks like parsing is broken in the second case. Thank you for providing detailed output with "--debug"; it makes things easier to troubleshoot. When encountering a time string (HH:MM or HH:MM:SS) followed by a plus sign and a number, date's parser *always* treats it as a timezone (giving timezones higher priority than time adjustments). > The result I wanted can also be obtained my omitting the `+'. > > $ TZ=UTC0 /bin/date -d '1pm 2 hours' > Wed 8 May 15:00:00 UTC 2019 > $ TZ=UTC0 /bin/date -d '13:00 2 hours' > Wed 8 May 15:00:00 UTC 2019 And this is indeed one possible solution. Other similar issues are detailed here: https://lists.gnu.org/archive/html/bug-coreutils/2018-10/msg00126.html As such, I'm closing this ticket, but discussion can continue by replying to this thread. regards, - assaf
bug#36383: date command processes timezone differently when doing math
tag 36383 notabug close 36383 stop Hello, On Tue, Jun 25, 2019 at 04:10:07PM -0700, Brian Woods wrote: > When doing a math operation to a date command it appear to process the > timezone differently. [...] > > #echo $datNow > 2019-06-25 15:21:34 > > #date -d "$datNow + 1 minute" "+%Y-%m-%d %H:%M:%S" --debug > date: parsed date part: (Y-M-D) 2019-06-25 > date: parsed time part: 15:21:34 UTC+01 > date: parsed relative part: +1 minutes > date: input timezone: parsed date/time string (+01) Thank you for providing detailed examples with "--debug", makes things much easier to troubleshoot. The issue is that a time string (HH:MM:SS) followed by a plus sign and a number is *always* taken to be a time zone. Using a value other than 1 will show it more clearly: $ date -d "$datNow + 8 minutes" "+%Y-%m-%d %H:%M:%S" --debug date: parsed date part: (Y-M-D) 2019-06-25 date: parsed time part: 15:21:34 UTC+08 date: parsed relative part: +1 minutes date: input timezone: parsed date/time string (+08) The "+8" part is treated as timezone, and the remaining text ("minutes") is taken as a one-minute time adjustment. One solution is to just remove the plus sign: $ date -d "$datNow 8 minutes" "+%Y-%m-%d %H:%M:%S" --debug date: parsed date part: (Y-M-D) 2019-06-25 date: parsed time part: 15:21:34 date: parsed relative part: +8 minutes date: input timezone: system default [...] 2019-06-25 15:29:34 Another is to specify the time zone: $ date -d "$datNow +00:00 +8 minutes" "+%Y-%m-%d %H:%M:%S" --debug date: parsed date part: (Y-M-D) 2019-06-25 date: parsed time part: 15:21:34 UTC+00 date: parsed relative part: +8 minutes date: input timezone: parsed date/time string (+00) [...] 2019-06-25 09:29:34 More examples of adjusting time strings are here (your example is similar to case #1): https://lists.gnu.org/archive/html/bug-coreutils/2018-10/msg00126.html As such, I'm closing this ticket but discussion can continue by replying to this thread. regards, - assaf
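The second workaround (stating the time zone explicitly before the adjustment) lends itself to a small wrapper. This is a sketch assuming GNU date and assuming the timestamp is meant as UTC; the function name add_minutes is invented for the example.

```shell
# Hypothetical helper (name invented here): add N minutes to a UTC
# timestamp. The explicit "+00:00" is consumed as the time zone, so the
# number that follows it can only be parsed as a relative adjustment.
add_minutes() {
    # $1: "YYYY-MM-DD HH:MM:SS" (taken as UTC), $2: minutes, e.g. +8
    TZ=UTC0 date -d "$1 +00:00 $2 minutes" "+%Y-%m-%d %H:%M:%S"
}

add_minutes "2019-06-25 15:21:34" +8   # prints 2019-06-25 15:29:34
```

Setting TZ=UTC0 for the call makes the output zone match the input zone, so the result does not depend on the machine's local time zone.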
Re: About cc and dd
Hello, On 2019-06-23 9:06 a.m., altear wrote: If you don't mind, can you tell me how cp and dd works, just in summary, or maybe can tell me from which line thats code work? A nice code overview is available here: http://www.maizure.org/projects/decoded-gnu-coreutils/cp.html http://www.maizure.org/projects/decoded-gnu-coreutils/dd.html And code exploration (using OpenGrok) here: https://opengrok.housegordon.com/source//xref/coreutils/src/dd.c https://opengrok.housegordon.com/source//xref/coreutils/src/cp.c -assaf
Re: [musl] Re: date-debug test failure with musl
Hello, On 2019-05-16 11:52 a.m., Niklas Hambüchen wrote: will you submit your patch for inclusion, given that it works well? pushed here: https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=0251229bfd9617e8a35cf9dd7d338d63fff74a0c -assaf
bug#36130: split bug
Hello, On 2019-06-10 12:28 p.m., Heather Wick wrote: Thank you so much for your response. Here are the results of the tests you sent: Verbose: This seems to have made the same number of files this time; not sure why the other 3-4 times I ran it it did not. They appear to be the same size, with paired last reads [...] Glad to hear it worked. Could it be that in previous times the queued job ran out of disk space? That would be my first guess, as such things are common in shared grid/cluster environments, particularly if your job runs in a temporary and limited storage location (e.g. "/tmp/job-"). I would suspect that the exit-code you are seeing is the exit code of the entire job (that is - of the shell script that is being qsub'd), and not necessarily that of 'split' (then again, this might not be correct if you explicitly checked the exit code of 'split'). Given that your grid environment already has configuration issues (the bash and "module" related errors), I would not be surprised if the exit code is not reliable. I would strongly encourage to always look into the STDERR file of the job to verify no other errors occurred. Or, perhaps write shell scripts more defensively, like so: [...] zcat MH1_R1.fastq.gz | split -l 4000 - DHT_R1_ \ && echo split MH1_R1 OK \ || echo split MH1_R1 FAILED [...] Then checking the STDOUT for positive confirmation each program succeeded. Or perhaps: # define a shell function "die" to print an error and terminate die() { base=$(basename "$0") echo "$base: error: $*" >&2 exit 1 } zcat MH1_R1.fastq.gz | split -l 4000 - DHT_R1_ \ || die "split MH1_R1 failed" And then run at least one job that will fail on purpose, and ensure you see the error message in the STDERR log, and you get a non-zero exit code (and then ensure you use 'die' on every command). It is sometimes recommended to use "set -e" for "easy" error handling in shell scripts- but I would recommend against it. 
Many reasons detailed here: https://mywiki.wooledge.org/BashFAQ/105 It might be more frustrating to add such extra checks on every program, but from my humble experience, grid environments bring on so many more intermittent and transient problems that it is definitely worth it. STDERR: The only thing in the stderr file is an odd duck of: -sh: module: line 1: syntax error: unexpected end of file -sh: error importing function definition for `BASH_FUNC_module' Python 3.6.8 :: Anaconda, Inc. /bin/sh: module: line 1: syntax error: unexpected end of file /bin/sh: error importing function definition for `BASH_FUNC_module' but this prints for every job I run with this particular flavor of conda/bash and doesn't seem to affect anything else (as far as I know) These errors are specific to your grid/cluster environment, and the best place to ask is the I.T or bioinformatics department in your institute (whoever is in charge of the cluster). Broadly speaking, "module" is a mechanism that eases the use of various software packages. It is usually set up by your IT administrators. A typical use-case is to have different versions of programs in non-standard locations, e.g. samtools version 1.6 in /opt/it/programs/samtools-1.6 and samtools version 1.9 in /opt/bioinfo/tools/new/samtools/ and then cluster users (e.g. you) just need to add: "module load samtools-1.8" and have the command "samtools" just work without knowing the gritty details of where the program is. It seems that in your case, something relating to the "module" setup is broken. More information here: https://en.wikipedia.org/wiki/Environment_Modules_(software) All jobs finished well below allotted memory and with exit status 0, even when split didn't make the right number of output files. > > Do you know any reason why the behavior would be inconsistent? The "allotted memory" is a non-issue for this "split" command; it will always use very little memory regardless of how big the input files are. 
As for "exit status 0" - I can't be sure, but I suspect the exit status you see is that of the entire job (i.e. the shell script), and perhaps it does not represent the exit code of the "split" program. If you have the STDERR files of the jobs which failed, it's worth checking them for any additional error messages. Pairing check: unfortunately my server's version of bash doesn't support paste in this way, I've run into this issue before but I forget what the workaround is. I can't run this command interactively because my server times out (these files are > 3 billion lines each, so it takes a long time to zcat them) Ah yes, the construct: program <(other program) is a "bash" feature that is not available in simple shell scripts (interactive use vs non-interactive and other things). One work-around is to run (from inside your script): bash -c "paste <(zcat MH1_R1.fastq.gz) <(zcat MH1_R2.fastq.gz)" \ | awk 'NR%4!=1
bug#36130: split bug
Hello, On Fri, Jun 07, 2019 at 09:48:44PM -0400, Heather Wick wrote: > Yes, sorry, I should have specified that I already checked that the > original fastq files are indeed paired and sorted with the same number of > lines and same starting/ending IDs, narrowing down the issue to a problem > with split. It could be a problem with "split", but we'll need to dig a bit deeper to be able to pinpoint the exact issue. Could you please try the following commands and post the results? zcat MH1_R1.fastq.gz \ | split --verbose -l 4000 - DHT_R1_ > DHT_R1.log ; echo DHT_R1 exit code: $? zcat MH1_R2.fastq.gz \ | split --verbose -l 4000 - DHT_R2_ > DHT_R2.log ; echo DHT_R2 exit code: $? wc -l DHT_R1.log DHT_R2.log Two more questions: 1. can you post the result of "split --version" ? 2. You mentioned "jobs" - if you are running these as submitted jobs on a cluster (e.g. with "qsub"), can you double-check the STDERR log files to ensure no errors were encountered ? If we still can't pinpoint the issue, the next steps would be to check the DHT_R{1,2}.log files, and then try to compare the content of the split files. I assume the input files are indeed correctly paired, but just to check, if you could try the following command, it should not print anything to the screen (indicating all sequence IDs are paired): paste <(zcat MH1_R1.fastq.gz) <(zcat MH1_R2.fastq.gz) \ | awk 'NR%4!=1 { next } $1!=$3 { print "Error in line " NR ":" $1 " vs " $3 }' regards, - assaf
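For shells without the <(...) construct, the same pairing check can be sketched on two plain (already uncompressed) files. Everything specific here is an assumption made for the example: the function name check_pairs, the requirement that both inputs are regular 4-line-per-record FASTQ files, and the rule that two reads are "paired" when the first whitespace-separated word of their header lines is identical.

```shell
# Hypothetical helper (name invented here): compare the read IDs of two
# FASTQ files record by record. Prints nothing when every pair matches;
# returns non-zero if any pair differs.
check_pairs() {
    paste "$1" "$2" | awk -F '\t' '
        NR % 4 != 1 { next }                    # only header lines carry IDs
        { split($1, a, " "); split($2, b, " ") } # first word of each header
        a[1] != b[1] {
            print "Error in line " NR ": " a[1] " vs " b[1]
            bad = 1
        }
        END { exit bad + 0 }'
}
```

Splitting on the tab that paste inserts avoids depending on how many words each header line contains.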
Re: Error with clock?
Hello, On Fri, Jun 07, 2019 at 04:15:33PM +, h.lansel wrote: > I am using Debian Stretch with XFCE. I was customizing my clock, and > encountered a few errors. I don't know what package in relation to > this software cause this and I don't even know how to do bug report, > so I am writing to you, instead to bug report e-mail address. As you suspected, this is not the right mailing list for such bugs. The problem you describe might be related to Debian, or to the XFCE project (or to another project, if the clock applet you are using is not a built-in XFCE applet). A good place to start is likely the Debian user mailing list: https://lists.debian.org/debian-user/ Or perhaps submitting a Debian bug (if you are sure it is a bug): https://www.debian.org/Bugs/Reporting Alternatively, if Debian people indicate it is a problem in XFCE, there is the XFCE bugzilla website: https://bugzilla.xfce.org/ This mailing list relates to GNU coreutils - a collection of command-line programs which are not typically used directly in XFCE. regards, - assaf
bug#36130: split bug
Hello, On Fri, Jun 07, 2019 at 02:23:15PM -0400, Heather Wick wrote: > I am using split to split up some large, paired fastq files [...]: > > zcat MH1_R1.fastq.gz | split - -l 4000 DHT_R1_ > zcat MH1_R2.fastq.gz | split - -l 4000 DHT_R2_ > > This creates 96 chunks for the R1 and 95 chunks for R2, even though the > orignal fastq files have the same number of reads. > > Do you have any suggestions for how to proceed? Perhaps zcatting and piping > the files is not the best way to call split? To help diagnose the issue better, please run the following commands and tell us what the results are: 1. number of lines in each file: zcat MH1_R1.fastq.gz | wc -l zcat MH1_R2.fastq.gz | wc -l 2. The first two sequence IDs: zcat MH1_R1.fastq.gz | head -n8 | grep ^@ zcat MH1_R2.fastq.gz | head -n8 | grep ^@ 3. Last two sequence IDs: zcat MH1_R1.fastq.gz | tail -n8 | grep ^@ zcat MH1_R2.fastq.gz | tail -n8 | grep ^@ These will just verify the FASTQ files are indeed paired with no surprises. The files should have the same number of lines, and matching sequence IDs in the first and last lines. regards, - assaf
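Once the line counts are known, the number of chunks "split -l" should produce is simple arithmetic (line count divided by chunk size, rounded up), which gives something concrete to compare against. The helper below is a sketch; its name, expected_chunks, and the sample numbers in the comments are invented for illustration.

```shell
# Hypothetical helper (name invented here): number of output files that
# "split -l SIZE" produces for an input of LINES lines, i.e. the ceiling
# of LINES / SIZE.
expected_chunks() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

expected_chunks 380000 4000   # prints 95
expected_chunks 380001 4000   # prints 96
```

If both inputs really have the same number of lines, both runs must give the same chunk count, so a 96 versus 95 mismatch suggests that one of the runs saw a different amount of input or did not finish writing its output.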
Re: question about parallelism in cp command
> -Original Message- > From: Olga Kornievskaia [mailto:a...@umich.edu] > > Is there something philosophically incorrect in making a “cp” > multi-threaded and allow for parallel copies when “cp -r” is done? If > it’s something that’s possible, are there any plans in making a > multi-threaded cp? On Thu, Jun 06, 2019 at 02:17:40PM -0400, Olga Kornievskaia wrote: > The use case I'm consider are network file systems. So perhaps a > default can be a single threaded system for the local filesystems but > add an option to cp for the -r case that would enable network file > system to copy files in parallel. In an interesting coincidence, see recent post by Paul Kolano here: https://lists.gnu.org/archive/html/coreutils/2019-06/msg00011.html (Note that his suggestions have not been reviewed yet, so this is neither endorsement nor criticism of his code.) regards, - assaf
Re: patches for multi-threaded cp and md5sum (along with other features)
Hello Paul, On Mon, Jun 03, 2019 at 09:29:20PM +, Paul Kolano (ARC-TN)[InuTeq, LLC] wrote: > Many years ago, I developed a set of patches to add a number of > features to cp and md5sum [...] > https://pkolano.github.io/projects/mutil.html Thanks for sharing, looks very impressive. Because the changes are massive, before we can start looking into their details and merits we'll need copyright assignment from the copyright holder of the code (you or NASA). For details please see here: https://www.gnu.org/licenses/why-assign.en.html To start the process, please fill and submit the following form: https://git.sv.gnu.org/cgit/gnulib.git/tree/doc/Copyright/request-assign.future --- Additionally, A cursory look at the patches [1] reveals several added terms in accordance of GPLv3 section 7 (e.g. Indemnifying NASA and the U.S. government). This is of course absolutely fine and valid for a GPL project, but I'm not sure if the FSF will agree to add additional terms to GNU coreutils (I'm not saying they won't, I simply don't know). Perhaps other maintainers can chime in, and if not, it is probably wise to ask licens...@gnu.org before we can consider these patches for inclusion. [1] https://github.com/pkolano/mutil/blob/master/patch/coreutils-8.22.patch --- regards, - assaf
Re: How to calculate date relative to another date?
Hello, On Wed, May 22, 2019 at 10:41:52AM -0400, Michael Stone wrote: > In general my advice is to just avoid the date parsing entirely, it will > never, ever do what you predict. I'm sorry to hear that is your experience with date(1) parsing. My different advice is to use "date --debug" to first troubleshoot what is being parsed, then search the mailing list archives for many common solutions, and lastly, write to coreutils@gnu.org with questions. > If you find something that happens to work, > just copy and paste it and never change it. It would be nice if there were a > new, simple and predictable grammer option in date(1) (abandon the natural > language guessing) but nobody has ever wanted to do the work. :) The grammar is predictable (though perhaps not trivial) for the simple reason that it is based on a fixed set of rules defined in a GNU Bison ".y" file: https://git.sv.gnu.org/cgit/gnulib.git/tree/lib/parse-datetime.y . There are no "natural language guessing" algorithms. Instead, and perhaps that's the confusing part, there are many attempts by the parser to match date strings to known meanings. For example, NNNN/NN/NN is parsed as YYYY/MM/DD. NN/NN/NNNN is parsed as MM/DD/YYYY (the north american way). NNNNNN is parsed as YYMMDD (with YY being 19YY or 20YY with 69 as the cutoff). Then similar patterns are matched for time, timezone, and date/time adjustments. The different formats and patterns are explained here: https://www.gnu.org/software/coreutils/manual/html_node/Date-input-formats.html#Date-input-formats > You might try "2018-05-01 59 months ago", but I'd suggest using a python > module or somesuch with a more regular grammar if you want something > maintainable in the long term. I would argue that "long term" and "maintainable" is exactly what GNU date(1) parsing is. You'd be hard-pressed to find programs with longer-term support than gnu date(1), including python modules. 
The confusing and possibly frustrating part happens when trying to mix different parsing "parts" like date and time and timezone and relative time calculations. The "--debug" option should be the first tool to use. The most common issues are: Crossing daylight-saving-time (getting unexpected "tomorrow" results): https://lists.gnu.org/archive/html/bug-coreutils/2019-04/msg3.html https://lists.gnu.org/archive/html/bug-coreutils/2016-04/msg00046.html https://debbugs.gnu.org/cgi/bugreport.cgi?bug=30795 Mixing time and time-zones: https://lists.gnu.org/archive/html/bug-coreutils/2018-10/msg00126.html Months-related adjustments: https://lists.gnu.org/archive/html/bug-coreutils/2018-10/msg00357.html General adjustments, and order of operations: https://lists.gnu.org/archive/html/bug-coreutils/2018-02/msg5.html Leap years and such: https://lists.gnu.org/archive/html/bug-coreutils/2017-03/msg00047.html Inner-working of date adjustments: https://lists.gnu.org/archive/html/bug-coreutils/2017-03/msg00044.html Hope this helps, -assaf
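As a small illustration of the fixed patterns mentioned in this message, the sketch below normalizes a few inputs to ISO format. It assumes GNU date; the function name to_iso is invented for the example, and TZ=UTC0 is set only to keep the results independent of the local time zone.

```shell
# Hypothetical helper (name invented here): print the date that GNU date
# parses from the given string, as YYYY-MM-DD.
to_iso() {
    TZ=UTC0 date -d "$1" +%F
}

to_iso "05/01/2018"                 # MM/DD/YYYY form: prints 2018-05-01
to_iso "2018-05-01 59 months ago"   # relative adjustment: prints 2013-06-01
```

When a string gives an unexpected result, running the same input through "date --debug -d ..." shows which part was matched as date, time, zone or adjustment.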