coreutils: bug in date --iso-8601={seconds,ns}?
Dear GNU maintainer/team/email-dude,

in the process of writing an Atom-feed-generator in bash, I discovered what MIGHT be a bug (or a documentation misinterpretation on my part) in GNU date's --iso-8601 switch when invoked with ns or seconds as a parameter.

RFC3339 and some documents regarding ISO 8601 that I could quickly find on the web seem to suggest that the time-offset component of the output should match the regex /\d\d:\d\d/; date's output, however, matches /\d\d\d\d/.

As I said, I don't know whether the actual ISO 8601 standard considers that fair game, because I don't know where I could take a look at it. Still, it might not hurt to investigate if you aren't completely sure about the correct behaviour either...

Anyway, thanks for taking the time and making coreutils available to us mere mortals in the first place - happy hacking! :-)

-- 
with best regards:
- Johannes Truschnigg ( [EMAIL PROTECTED] )
www: http://johannes.truschnigg.info/
phone: +43 650 2 17
jabber: [EMAIL PROTECTED]
Please do not bother me with HTML-eMail or attachments. Thank you.

___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: coreutils: bug in date --iso-8601={seconds,ns}?
On Fri, 22 Aug 2008, Johannes Truschnigg wrote:
> in the process of writing an Atom-feed-generator in bash, I discovered
> what MIGHT be a bug/documentation misinterpretation in GNU date's
> --iso-8601 switch when invoked with ns or seconds as a parameter.

--iso-8601 has been deprecated since coreutils 5.90.

> RFC3339 and some documents regarding ISO 8601 I could find on the web
> quickly seem to suggest that the time-offset component of the output
> should match the regex /\d\d:\d\d/; date, however, matches /\d\d\d\d/.

I think --rfc-3339=seconds will do what you want.

Cheers,
Phil
Re: sort --ignore-case option changes underscore sort position
Thanks for the quick and very clear explanation, Bob! I saw the --ignore-case option definition, but its implications weren't immediately apparent to me. It was especially confusing because I was comparing against the output of a different tool, which folds to lowercase when doing comparisons, and couldn't understand why there was a difference. Also, the underscore character is particularly affected because of its heavy use in filenames and program identifiers.

Maybe the documentation could be enhanced, something along the lines of:

  The sort position of characters that have no case, such as punctuation,
  can change if they collate between the uppercase and lowercase letters.
  For example, in the C locale, the underscore character sorts between
  the uppercase and lowercase characters, causing the strings m and _ to
  sort differently with and without the --ignore-case option.

On Fri, Aug 22, 2008 at 1:27 AM, Bob Proulx [EMAIL PROTECTED] wrote:
> ...
> `-f'
> `--ignore-case'
>      Fold lowercase characters into the equivalent uppercase
>      characters when comparing so that, for example, `b' and `B'
>      sort as equal.  The `LC_CTYPE' locale determines character types.
>
> Therefore your test case:
>
>   { echo a_; echo ax; } | sort --ignore-case
>
> Is really the same as:
>
>   $ { echo a_; echo ax; } | sort
>   a_
>   ax
>
>   $ { echo A_; echo AX; } | sort
>   AX
>   A_
>
>   $ { echo A_; echo AX; } | sort --ignore-case
>   AX
>   A_
>
> When using upper case you can see that it is equivalent to using the
> --ignore-case option.  Perhaps this should have been more accurately
> called --convert-to-upper-case-before-sorting.
>
> The surprising part might be realizing that underscore collates between
> the upper and lower case letters when using the C/POSIX standard sort
> ordering.  That is the standard legacy behavior.  It does this along
> with [ \ ] ^ _ ` which all occur between Z and a in the US-ASCII code
> table.
Bug in wc
Dear maintainers,

There is a bug in the implementation of the -L parameter in wc.
It is triggered by http://www.ime.usp.br/~am/122/eps/gapqm2.gz

Check this out:

  $ zcat gapqm2.gz |wc -l -c -L
  1 6297954 6353180

That is, the single line is longer than the whole file.

This was pointed out by William A. M. Gnann [EMAIL PROTECTED]

Have fun!
-- 
Arnaldo Mandel
Departamento de Ciência da Computação - Computer Science Department
Universidade de São Paulo, Bra[sz]il
[EMAIL PROTECTED]
Talvez você seja um Bright http://the-brights.net
Maybe you are a Bright.
Re: Bug in wc
Arnaldo Mandel [EMAIL PROTECTED] wrote:
> Dear maintainers,
> There is a bug in the implementation of the -L parameter in wc.
> It is triggered by http://www.ime.usp.br/~am/122/eps/gapqm2.gz
> Check this out:
> $ zcat gapqm2.gz |wc -l -c -L
> 1 6297954 6353180
> That is, the single line is longer than the whole file.
> This was pointed out by William A. M. Gnann [EMAIL PROTECTED]

Thanks for reporting it and for giving credit.
FYI, here's a smaller reproducer:

  $ printf '\t'|wc -L
  8
Re: `count-one-bits' - LGPLv2+
Ben Pfaff [EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] (Ludovic Courtès) writes:
>> Would you be OK to relicense `count-one-bits' under LGPLv2+ for use in
>> Guile 1.9 (aka. the development branch)?
>
> I don't know who gets to make these decisions, but as the module's
> maintainer I'm fine with that.

For small modules with few or no dependencies, relaxing the license to
LGPLv2+ shouldn't be a problem.  Go for it.
Bug in wc (cont.)
My earlier bug report lacked a possibly relevant piece of info: the bug showed up with versions 6.10 and 5.97 of wc, on Linux 2.6.24 and 2.6.11, i686 and x86_64, with LC_ALL=C.

am
-- 
Arnaldo Mandel
Departamento de Ciência da Computação - Computer Science Department
Universidade de São Paulo, Bra[sz]il
[EMAIL PROTECTED]
Talvez você seja um Bright http://the-brights.net
Maybe you are a Bright.
RFC: wc --max-line-length vs. TABs [Re: Bug in wc]
Jim Meyering [EMAIL PROTECTED] wrote:
> Arnaldo Mandel [EMAIL PROTECTED] wrote:
>> Dear maintainers,
>> There is a bug in the implementation of the -L parameter in wc.
>> It is triggered by http://www.ime.usp.br/~am/122/eps/gapqm2.gz
>> Check this out:
>> $ zcat gapqm2.gz |wc -l -c -L
>> 1 6297954 6353180
>> That is, the single line is longer than the whole file.
>> This was pointed out by William A. M. Gnann [EMAIL PROTECTED]
>
> Thanks for reporting it and for giving credit.
> FYI, here's a smaller reproducer:
>
>   $ printf '\t'|wc -L
>   8

This behavior is not specified, and is currently untested.
(It's a GNU invention, from Bruno Haible in textutils-1.22d, which was back in 1997.)

  http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=ab5ff1597f5d734b711fbd95389cefcc8203d51c

I.e., the following change to make --max-line-length (-L) never count a TAB as more than one byte does not induce any test failure.

I'm tempted to make the change, but it seems too drastic, after 11 years.
Do any of you rely on the current TAB-counting behavior of GNU wc?

Bruno, what do you think?

diff --git a/src/wc.c b/src/wc.c
index 0bb1929..d44cf96 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -363,7 +363,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
                   linepos = 0;
                   goto mb_word_separator;
                 case '\t':
-                  linepos += 8 - (linepos % 8);
+                  linepos++;
                   goto mb_word_separator;
                 case ' ':
                   linepos++;
@@ -437,7 +437,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
                   linepos = 0;
                   goto word_separator;
                 case '\t':
-                  linepos += 8 - (linepos % 8);
+                  linepos++;
                   goto word_separator;
                 case ' ':
                   linepos++;
Re: RFC: wc --max-line-length vs. TABs [Re: Bug in wc]
Jim Meyering wrote:
> I'm tempted to make the change, but it seems too drastic, after 11 years.
> Do any of you rely on the current TAB-counting behavior of GNU wc?

Hi,

It looks like TAB characters aren't alone in being counted by printed width rather than as one character each:

  $ echo '好' | wc -L
  2

Does it make sense to change the behavior for TAB, but not for wide characters?

Bo

diff --git a/src/wc.c b/src/wc.c
index 0bb1929..b3f1ab2 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -378,7 +378,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
                   {
                     int width = wcwidth (wide_char);
                     if (width > 0)
-                      linepos += width;
+                      linepos ++;
                     if (iswspace (wide_char))
                       goto mb_word_separator;
                     in_word = true;
Re: RFC: wc --max-line-length vs. TABs [Re: Bug in wc]
Bo Borgerson wrote (on Aug 22, 2008):
> Does it make sense to change the behavior for TAB, but not for wide
> characters?

Relying on an undocumented tab length seems bad.  However, on chars I suggest you just apply the bug-feature operator: document that line length is measured in chars, and explain that a char is a locale-dependent concept.

Just my 2 cents.
am
Re: RFC: wc --max-line-length vs. TABs [Re: Bug in wc]
Hi Jim,

> This behavior is not specified, and is currently untested.
> (it's a GNU invention, from Bruno Haible in textutils-1.22d,
> which was back in 1997)

The intention of this option is and was to measure the maximum number of screen columns used by a file. For many purposes, people are encouraged to create/send/commit files with at most 80 screen columns; others ask for at most 79 screen columns, and GNU texinfo files should use at most 74 columns. The option '-L' was intended as a fast check for this metric.

The original mail, sent to bug-gnu-utils on 1997-10-31, had this explanation:

  While GNU wc returns the vertical extent of a piece of text - i.e. the
  number of lines - it does not yet return the horizontal extent of a
  piece of text - i.e. the number of columns. This is a useful
  functionality, if you want to know
    - whether a text will fit on the paper when sent to the printer,
    - whether an email exceeds the recommended 72 character limit,
    - (in combination with nm) how long the identifiers were that made
      `ranlib' dump core,
    - etc.

I propose a clarification in the documentation (see below).

> I'm tempted to make the change, but it seems too drastic, after 11 years.
> Do any of you rely on the current TAB-counting behavior of GNU wc?
> Bruno, what do you think?

If you change the option to count every tab as 1, or every character as 1 regardless of its screen width, the option -L is no longer usable for its main purpose.

Bruno


2008-08-22  Bruno Haible  [EMAIL PROTECTED]

	* doc/coreutils.texi (wc invocation): Explain what the option -L
	measures.

--- coreutils.texi.bak	2008-08-22 23:55:47.0 +0200
+++ coreutils.texi	2008-08-22 23:59:03.0 +0200
@@ -3137,7 +3137,9 @@
 With the @option{--max-line-length} option, @command{wc} prints the length
 of the longest line per file, and if there is more than one file it
-prints the maximum (not the sum) of those lengths.
+prints the maximum (not the sum) of those lengths.  The line lengths here
+are measured in screen columns, according to the current locale and
+assuming tab positions in every 8th column.
 
 The program accepts the following options.  Also see @ref{Common options}.