coreutils: bug in date --iso-8601={seconds,ns}?

2008-08-22 Thread Johannes Truschnigg
Dear GNU maintainer/team/email-dude,

in the process of writing an Atom-feed-generator in bash, I discovered what
MIGHT be a bug/documentation misinterpretation in GNU date's --iso-8601 
switch when invoked with ns or seconds as a parameter.

RFC3339 and some documents regarding ISO 8601 I could find on the web quickly 
seem to suggest that the time-offset component of the output should match the 
regex /\d\d:\d\d/; date, however, matches /\d\d\d\d/.
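
To make the two offset shapes concrete, here is a minimal sketch in Python (chosen only for illustration; it does not call GNU date). The sample timestamp and the helper name `add_colon` are assumptions made up for this example. It prints one timestamp with a `+0200` style offset and once with the `+02:00` style that RFC 3339 requires, then converts the first form into the second.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical sample instant: 2008-08-22 12:00:00 at UTC+2.
stamp = datetime(2008, 8, 22, 12, 0, 0, tzinfo=timezone(timedelta(hours=2)))

# %z renders the offset without a colon (the /\d\d\d\d/ shape from the report).
no_colon = stamp.strftime("%Y-%m-%dT%H:%M:%S%z")

# isoformat() renders the offset with a colon (the /\d\d:\d\d/ shape).
with_colon = stamp.isoformat()


def add_colon(timestamp: str) -> str:
    """Insert a colon into a trailing +hhmm or -hhmm offset (illustrative helper)."""
    return re.sub(r"([+-]\d\d)(\d\d)$", r"\1:\2", timestamp)


print(no_colon)             # 2008-08-22T12:00:00+0200
print(with_colon)           # 2008-08-22T12:00:00+02:00
print(add_colon(no_colon))  # 2008-08-22T12:00:00+02:00
```

A post-processing step like `add_colon` is only a workaround for output that lacks the colon; it leaves timestamps that already contain one untouched.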

As said, I don't know if the actual ISO 8601 standard specifies that as fair 
game, as I don't know where I could take a look at it - but it might not harm 
investigating if you aren't completely sure about the correct behaviour 
either...

Anyway, thanks for taking the time and making coreutils available to us mere 
mortals in the first place - happy hacking! :-)

-- 
with best regards:
- Johannes Truschnigg ( [EMAIL PROTECTED] )

www: http://johannes.truschnigg.info/
phone: +43 650 2 17
jabber: [EMAIL PROTECTED]

Please do not bother me with HTML-eMail or attachments. Thank you.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: coreutils: bug in date --iso-8601={seconds,ns}?

2008-08-22 Thread Philip Rowlands

On Fri, 22 Aug 2008, Johannes Truschnigg wrote:

> in the process of writing an Atom-feed-generator in bash, I discovered
> what MIGHT be a bug/documentation misinterpretation in GNU date's
> --iso-8601 switch when invoked with ns or seconds as a parameter.

--iso-8601 is deprecated since coreutils 5.90.

> RFC3339 and some documents regarding ISO 8601 I could find on the web quickly
> seem to suggest that the time-offset component of the output should match the
> regex /\d\d:\d\d/; date, however, matches /\d\d\d\d/.


I think --rfc-3339=seconds will do what you want.


Cheers,
Phil




Re: sort --ignore-case option changes underscore sort position

2008-08-22 Thread John Wiersba
Thanks for the quick and very clear explanation, Bob!  I saw the
--ignore-case option definition, but the implications of it weren't
immediately apparent to me.  It was especially confusing because I was
comparing with the output of a different tool which folds to lowercase when
doing comparisons and couldn't understand why there was a difference.  Also,
the underscore character is particularly affected due to its heavy use in
filenames and program identifiers.

Maybe the documentation could be enhanced, something along the lines of:

The sort order of non-case-sensitive characters, such as punctuation, will
be affected if their sort order is different relative to lowercase and
uppercase characters.  For example, in the C locale, the underscore
character sorts in between uppercase characters and lowercase characters,
causing the strings m and _ to sort differently with and without the
--ignore-case option.

On Fri, Aug 22, 2008 at 1:27 AM, Bob Proulx [EMAIL PROTECTED] wrote:

> ...
>   `-f'
>   `--ignore-case'
>        Fold lowercase characters into the equivalent uppercase characters
>        when comparing so that, for example, `b' and `B' sort as equal.
>        The `LC_CTYPE' locale determines character types.
>
> Therefore your test case:
>
>   { echo a_; echo ax; } | sort --ignore-case
>
> Is really the same as:
>
>   $ { echo a_; echo ax; } | sort
>   a_
>   ax
>
>   $ { echo A_; echo AX; } | sort
>   AX
>   A_
>
>   $ { echo A_; echo AX; } | sort --ignore-case
>   AX
>   A_
>
> When using upper case you can see that it is equivalent to using the
> --ignore-case option.  Perhaps this should have been more accurately
> called --convert-to-upper-case-before-sorting.
>
> The surprising part might be realizing that underscore collates
> between the upper and lower case letters when using the C/POSIX
> standard sort ordering.  That is the standard legacy behavior.  It
> does this along with [ \ ] ^ _ ` which all occur between Z and a in
> the US-ASCII code table.
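
The effect described above can be reproduced outside of sort. The following is a rough Python sketch, not GNU sort's implementation: it assumes plain code-point comparison as a stand-in for the C locale and uses `str.upper` / `str.lower` as stand-ins for the two folding directions. The variable names are made up for the example.

```python
names = ["a_", "ax"]

# Plain code-point order, as in the C locale: '_' (95) sorts before 'x' (120).
plain = sorted(names)

# Folding to uppercase before comparing, as the quoted documentation describes
# for --ignore-case: 'X' (88) now sorts before '_' (95).
folded_upper = sorted(names, key=str.upper)

# Folding to lowercase instead, as some other tools do, keeps '_' first.
folded_lower = sorted(names, key=str.lower)

print(plain)         # ['a_', 'ax']
print(folded_upper)  # ['ax', 'a_']
print(folded_lower)  # ['a_', 'ax']
```

The same flip applies to the other characters that sit between Z and a in US-ASCII, since the direction of folding decides on which side of the letters they end up.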


Bug in wc

2008-08-22 Thread Arnaldo Mandel
Dear maintainers,

There is a bug in the implementation of the -L parameter in wc.
It is triggered by 

http://www.ime.usp.br/~am/122/eps/gapqm2.gz

Check this out:

$ zcat gapqm2.gz |wc -l -c -L
  1 6297954 6353180

That is, the single line is longer than the whole file.

This was pointed out by 

  William A. M. Gnann [EMAIL PROTECTED]

Have fun!

-- 
Arnaldo Mandel
Departamento de Ciência da Computação - Computer Science Department
Universidade de São Paulo, Bra[sz]il  
[EMAIL PROTECTED]
Talvez você seja um Bright http://the-brights.net Maybe you are a Bright.




Re: Bug in wc

2008-08-22 Thread Jim Meyering
Arnaldo Mandel [EMAIL PROTECTED] wrote:
> Dear maintainers,
>
> There is a bug in the implementation of the -L parameter in wc.
> It is triggered by
>
> http://www.ime.usp.br/~am/122/eps/gapqm2.gz
>
> Check this out:
>
> $ zcat gapqm2.gz |wc -l -c -L
>   1 6297954 6353180
>
> That is, the single line is longer than the whole file.
>
> This was pointed out by
>
>   William A. M. Gnann [EMAIL PROTECTED]

Thanks for reporting it and for giving credit.
FYI, here's a smaller reproducer:

  $ printf '\t'|wc -L
  8




Re: `count-one-bits' - LGPLv2+

2008-08-22 Thread Jim Meyering
Ben Pfaff [EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] (Ludovic Courtès) writes:
>
>> Would you be OK to relicense `count-one-bits' under LGPLv2+ for use in
>> Guile 1.9 (aka. the development branch)?
>
> I don't know who gets to make these decisions, but as the
> module's maintainer I'm fine with that.

For small modules with few or no dependencies,
relaxing the license to LGPLv2+ shouldn't be a problem.
Go for it.




Bug in wc (cont.)

2008-08-22 Thread Arnaldo Mandel
My earlier bug report lacked a possibly relevant piece of info:

The bug showed up with versions 6.10 and 5.97 of wc, on Linux 2.6.24
and 2.6.11, i686 and x86_64, LC_ALL=C.

am

-- 
Arnaldo Mandel
Departamento de Ciência da Computação - Computer Science Department
Universidade de São Paulo, Bra[sz]il  
[EMAIL PROTECTED]
Talvez você seja um Bright http://the-brights.net Maybe you are a Bright.




RFC: wc --max-line-length vs. TABs [Re: Bug in wc]

2008-08-22 Thread Jim Meyering
Jim Meyering [EMAIL PROTECTED] wrote:
> Arnaldo Mandel [EMAIL PROTECTED] wrote:
>> Dear maintainers,
>>
>> There is a bug in the implementation of the -L parameter in wc.
>> It is triggered by
>>
>> http://www.ime.usp.br/~am/122/eps/gapqm2.gz
>>
>> Check this out:
>>
>> $ zcat gapqm2.gz |wc -l -c -L
>>   1 6297954 6353180
>>
>> That is, the single line is longer than the whole file.
>>
>> This was pointed out by
>>
>>   William A. M. Gnann [EMAIL PROTECTED]
>
> Thanks for reporting it and for giving credit.
> FYI, here's a smaller reproducer:
>
>   $ printf '\t'|wc -L
>   8

This behavior is not specified, and is currently untested.
(it's a GNU invention, from Bruno Haible in textutils-1.22d,
which was back in 1997)

http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=ab5ff1597f5d734b711fbd95389cefcc8203d51c

I.e., the following change to make --max-line-length (-L)
never count a TAB as more than one byte does not induce
any test failure.

I'm tempted to make the change, but it seems too drastic, after 11 years.
Do any of you rely on the current TAB-counting behavior of GNU wc?

Bruno, what do you think?


diff --git a/src/wc.c b/src/wc.c
index 0bb1929..d44cf96 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -363,7 +363,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
                     linepos = 0;
                     goto mb_word_separator;
                   case '\t':
-                    linepos += 8 - (linepos % 8);
+                    linepos++;
                     goto mb_word_separator;
                   case ' ':
                     linepos++;
@@ -437,7 +437,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
               linepos = 0;
               goto word_separator;
             case '\t':
-              linepos += 8 - (linepos % 8);
+              linepos++;
               goto word_separator;
             case ' ':
               linepos++;
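
For readers who want to see why a line can come out "longer" than the file, here is a simplified Python model of the two behaviours in the patch. It is a sketch under assumptions, not a port of wc.c: the function name `max_line_length` and the `expand_tabs` switch are invented for illustration, only newline ends a line, and every byte other than TAB and newline is counted as one column.

```python
TAB_STOP = 8  # the constant hard-coded in the quoted wc.c lines


def max_line_length(data: bytes, expand_tabs: bool = True) -> int:
    """Longest line in columns, using a simplified model of the -L loop."""
    longest = 0
    linepos = 0
    for byte in data:
        if byte == 0x0A:  # newline: record the line and start a new one
            longest = max(longest, linepos)
            linepos = 0
        elif byte == 0x09 and expand_tabs:
            # Current behaviour: advance to the next multiple of 8.
            linepos += TAB_STOP - (linepos % TAB_STOP)
        else:
            # Proposed behaviour for TAB; also this sketch's rule for other bytes.
            linepos += 1
    return max(longest, linepos)


print(max_line_length(b"\t"))                     # 8
print(max_line_length(b"\t", expand_tabs=False))  # 1
print(max_line_length(b"ab\tc\n"))                # 9
print(max_line_length(b"\t\t"))                   # 16, from a 2-byte input
```

With tab expansion each TAB can contribute up to 8 columns, so an input containing many TABs can report a maximum line length larger than its byte count.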




Re: RFC: wc --max-line-length vs. TABs [Re: Bug in wc]

2008-08-22 Thread Bo Borgerson
Jim Meyering wrote:
 
> I'm tempted to make the change, but it seems too drastic, after 11 years.
> Do any of you rely on the current TAB-counting behavior of GNU wc?
 

Hi,

It looks like TAB characters aren't alone in being counted by printed
width rather than count:

$ echo '好' | wc -L
2

Does it make sense to change the behavior for TAB, but not for wide
characters?

Bo
diff --git a/src/wc.c b/src/wc.c
index 0bb1929..b3f1ab2 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -378,7 +378,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
 		{
 		  int width = wcwidth (wide_char);
 		  if (width > 0)
-			linepos += width;
+			linepos ++;
 		  if (iswspace (wide_char))
 			goto mb_word_separator;
 		  in_word = true;
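
The three ways of measuring that one character can be compared side by side. This is an illustrative Python sketch only: Python's standard library has no `wcwidth`, so the helper `display_width` (a name made up here) approximates it with `unicodedata.east_asian_width`, counting wide and fullwidth characters as 2 columns and everything else as 1. Combining and control characters are not handled.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate screen columns: 2 for East Asian wide/fullwidth, else 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


line = "好"
print(len(line.encode("utf-8")))  # 3 (bytes in UTF-8)
print(len(line))                  # 1 (characters)
print(display_width(line))        # 2 (screen columns)
```

The patch above would move the count for this character from the third number to the second.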


Re: RFC: wc --max-line-length vs. TABs [Re: Bug in wc]

2008-08-22 Thread Arnaldo Mandel
Bo Borgerson wrote (on Aug 22, 2008):
  
> Does it make sense to change the behavior for TAB, but not for wide
> characters?

Relying on an undocumented tab length seems bad.  However, on chars I
suggest you just apply the bug-feature operator: document that line
length is in chars, and explain that chars is a locale-dependent
concept.

Just my 2 cents.

am





Re: RFC: wc --max-line-length vs. TABs [Re: Bug in wc]

2008-08-22 Thread Bruno Haible
Hi Jim,

> This behavior is not specified, and is currently untested.
> (it's a GNU invention, from Bruno Haible in textutils-1.22d,
> which was back in 1997)

The intention of this option is and was to measure the maximum number of
screen columns used by a file. For many purposes, people are encouraged
to create/send/commit files with at most 80 screen columns. Or at most 79
screen columns for others. Or at most 74 columns for GNU texinfo files.
The option '-L' was intended as a fast check for this metric.

The original mail, sent to bug-gnu-utils on 1997-10-31, had this explanation:

  While GNU wc returns the vertical extent of a piece of text - i.e. the
   number of lines - it does not yet return the horizontal extent of a piece
   of text - i.e. the number of columns. This is a useful functionality, if
   you want to know

 - whether a text will fit on the paper when sent to the printer,
 - whether an email exceeds the recommended 72 character limit,
 - (in combination with nm) how long the identifiers were that made
   `ranlib' dump core,
 - etc.
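
The kind of check listed above might look like the following in Python. This is a sketch under assumptions rather than a replacement for `wc -L`: the function name `lines_over` and its parameters are invented for the example, tab stops are taken to be every 8 columns via `str.expandtabs`, and wide characters are counted as a single column.

```python
def lines_over(text: str, limit: int = 80, tab_size: int = 8):
    """Return (line_number, width) for each line wider than `limit` columns."""
    offenders = []
    for number, line in enumerate(text.splitlines(), start=1):
        # expandtabs pads each TAB up to the next multiple of tab_size.
        width = len(line.expandtabs(tab_size))
        if width > limit:
            offenders.append((number, width))
    return offenders


# Nine TABs reach column 72; nine more characters make the line 81 columns wide.
sample = "short line\n" + "\t" * 9 + "x" * 9 + "\n"
print(lines_over(sample))  # [(2, 81)]
```

Counting each TAB as one column would report this second line as 18 wide and let it pass an 80-column check.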

I propose a clarification in the documentation (see below).

> I'm tempted to make the change, but it seems too drastic, after 11 years.
> Do any of you rely on the current TAB-counting behavior of GNU wc?
>
> Bruno, what do you think?

If you change the option to count every tab as 1, or every character as 1
regardless of its screen width, the option -L is not usable for its main
purpose any more.

Bruno


2008-08-22  Bruno Haible  [EMAIL PROTECTED]

* doc/coreutils.texi (wc invocation): Explain what the option -L
measures.

--- coreutils.texi.bak  2008-08-22 23:55:47.0 +0200
+++ coreutils.texi  2008-08-22 23:59:03.0 +0200
@@ -3137,7 +3137,9 @@
 
 With the @option{--max-line-length} option, @command{wc} prints the length
 of the longest line per file, and if there is more than one file it
-prints the maximum (not the sum) of those lengths.
+prints the maximum (not the sum) of those lengths.  The line lengths here
+are measured in screen columns, according to the current locale and
+assuming tab positions in every 8th column.
 
 The program accepts the following options.  Also see @ref{Common options}.
 


