bug#69369: wc -w ignores breaking space over UCHAR_MAX

2024-02-25 Thread Pádraig Brady
On 24/02/2024 20:44, Aearil via GNU coreutils Bug Reports wrote: Hi, wc -w doesn't seem to recognize whitespace characters with a codepoint over UCHAR_MAX (255) as word separators. For example, using the character EM SPACE U+2003: $ printf "foo\u2003bar" | ./wc -w 1 I should get a

bug#69369: wc -w ignores breaking space over UCHAR_MAX

2024-02-24 Thread Aearil via GNU coreutils Bug Reports
Hi, wc -w doesn't seem to recognize whitespace characters with a codepoint over UCHAR_MAX (255) as word separators. For example, using the character EM SPACE U+2003: $ printf "foo\u2003bar" | ./wc -w 1 I should get a word count of 2, but instead the space is ignored while coun
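The behaviour in this report can be illustrated outside of wc. The following is a minimal Python sketch, not wc's implementation: it contrasts a byte-oriented split that only knows ASCII white space with a split that decodes UTF-8 first and uses Unicode white space. Both helper names are hypothetical.

```python
# Hypothetical helpers for illustration only; this is not how wc is implemented.

def count_words_ascii(data: bytes) -> int:
    """Count words, treating only ASCII white space as a separator."""
    return len(data.split())


def count_words_unicode(data: bytes) -> int:
    """Count words after decoding UTF-8, splitting on Unicode white space."""
    return len(data.decode("utf-8").split())


# EM SPACE (U+2003) sits between the two words, as in the report.
sample = "foo\u2003bar".encode("utf-8")
ascii_count = count_words_ascii(sample)
unicode_count = count_words_unicode(sample)
print(ascii_count, unicode_count)
```

The byte-oriented count sees one word because none of the three UTF-8 bytes of U+2003 is an ASCII white space byte; the decoded count sees two.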

bug#64058: [PATCH] wc: Fix crashes due to incomplete AVX2 enumeration

2023-06-15 Thread Dave Hansen
uggest is adding some text that a user might actually see if they hit this issue, like: 'wc -l' no longer crashes with "Illegal instruction" messages on x86 Linux kernels that disable XSAVE YMM. [bug introduced in coreutils-9.0]

bug#64058: [PATCH] wc: Fix crashes due to incomplete AVX2 enumeration

2023-06-14 Thread Pádraig Brady
EWS b/NEWS index 3350f9871..535850549 100644 --- a/NEWS +++ b/NEWS @@ -29,7 +29,8 @@ GNU coreutils NEWS-*- outline -*- 'pr --length=1 --double-space' no longer enters an infinite loop. [This bug was present in "the beginning".] - 'wc -l' no l

bug#64058: [PATCH] wc: Fix crashes due to incomplete AVX2 enumeration

2023-06-14 Thread Paul Eggert
ert Date: Wed, 14 Jun 2023 14:18:42 -0700 Subject: [PATCH 2/3] cksum,wc: don’t include * src/cksum.c [!CRCTAB && USE_PCLMUL_CRC32]: * src/wc.c [USE_AVX2_WC_LINECOUNT]: Do

bug#64058: [PATCH] wc: Fix crashes due to incomplete AVX2 enumeration

2023-06-14 Thread Axel Beckert
Control: tag 1037264 + patch Hi Pádraig, On Wed, Jun 14, 2023 at 11:46:58AM +0100, Pádraig Brady wrote: > On 14/06/2023 05:14, Paul Eggert wrote: > > Thanks for the bug report. I installed the attached patch into coreutils > > on Savannah. It builds on your idea with several other changes: > >

bug#64058: [PATCH] wc: Fix crashes due to incomplete AVX2 enumeration

2023-06-14 Thread Pádraig Brady
On 14/06/2023 05:14, Paul Eggert wrote: Thanks for the bug report. I installed the attached patch into coreutils on Savannah. It builds on your idea with several other changes: * There's a similar issue with cksum.c and pclmul. * configure.ac can be simplified, since it seems there's no point

bug#64058: [PATCH] wc: Fix crashes due to incomplete AVX2 enumeration

2023-06-13 Thread Paul Eggert
91a74d361461494dd546467e83bc36c24185d6e7 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 13 Jun 2023 21:10:24 -0700 Subject: [PATCH] wc: port to kernels that disable XSAVE YMM Problem reported by Dave Hansen <https://bugs.gnu.org/64058>. Apply similar change to cksum and pclmul, too. * NEWS: Mention

bug#64058: [PATCH] wc: Fix crashes due to incomplete AVX2 enumeration

2023-06-13 Thread Dave Hansen
The AVX2 enumeration for 'wc -l' is incomplete which may cause wc to crash. The Intel SDM documents the whole AVX2 enumeration sequence in its "Detection of Intel AVX2" section. There are three pieces: 1. Ensuring the CPU supports AVX2 instructions 2. Ensuring the OS has enable

bug#61300: wc -c doesn't advance stdin position when it's a regular file

2023-02-06 Thread Paul Eggert
On 2/6/23 11:38, Pádraig Brady wrote: Note also if you really want to read, you can always `cat | wc -c` rather than just `wc -c` Even that's not guaranteed, as 'cat' is not required to use the 'read' system call if it can determine that the standard input contains only NULs without calling

bug#61300: wc -c doesn't advance stdin position when it's a regular file

2023-02-06 Thread Pádraig Brady
On 06/02/2023 06:27, Stephane Chazelas wrote: On 2023-02-05 20:59, Paul Eggert wrote: On 2023-02-05 11:59, Pádraig Brady wrote: [...] Let's leave that as-is, please. If 'wc' can output the correct value without reading its input, POSIX does not require 'wc' to do the read, and it seems

bug#61300: wc -c doesn't advance stdin position when it's a regular file

2023-02-05 Thread Stephane Chazelas
On 2023-02-05 20:59, Paul Eggert wrote: On 2023-02-05 11:59, Pádraig Brady wrote: [...] Let's leave that as-is, please. If 'wc' can output the correct value without reading its input, POSIX does not require 'wc' to do the read, and it seems perverse to modify 'wc' to go to the effort to refuse

bug#61300: wc -c doesn't advance stdin position when it's a regular file

2023-02-05 Thread Paul Eggert
On 2023-02-05 11:59, Pádraig Brady wrote: Hopefully the attached addresses this. Thanks for fixing that. Note it doesn't add the constraint on the input being readable, which I'll think a bit more about. Let's leave that as-is, please. If 'wc' can output the correct value without reading

bug#61300: wc -c doesn't advance stdin position when it's a regular file

2023-02-05 Thread Pádraig Brady
On 05/02/2023 18:27, Stephane Chazelas wrote: "wc -c" without filename arguments is meant to read stdin til EOF and report the number of bytes it has read. When stdin is on a regular file, GNU wc has that optimisation whereby it skips the reading, does a pos = lseek(0,0,SEEK_CUR) t
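The shortcut described in this thread can be sketched as follows. This is a rough Python illustration, assuming the count is derived from the file size minus the current offset; the function name `bytes_remaining` is hypothetical and the code is not taken from wc.

```python
import os
import tempfile


def bytes_remaining(fd: int) -> int:
    """Return (file size - current offset) without reading any data.

    Assumption: this mirrors the lseek-based shortcut discussed in the thread.
    """
    size = os.fstat(fd).st_size
    pos = os.lseek(fd, 0, os.SEEK_CUR)  # query the position, do not move it
    return max(size - pos, 0)


with tempfile.TemporaryFile() as f:
    f.write(b"hello world\n")  # 12 bytes
    f.flush()
    fd = f.fileno()
    os.lseek(fd, 6, os.SEEK_SET)  # pretend an earlier reader consumed 6 bytes
    remaining = bytes_remaining(fd)
    offset_after = os.lseek(fd, 0, os.SEEK_CUR)

print(remaining, offset_after)
```

The offset is unchanged after the call, which is the side effect the report is about: a later reader sharing the descriptor would still see the uncounted bytes.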

bug#61300: wc -c doesn't advance stdin position when it's a regular file

2023-02-05 Thread Stephane Chazelas
"wc -c" without filename arguments is meant to read stdin til EOF and report the number of bytes it has read. When stdin is on a regular file, GNU wc has that optimisation whereby it skips the reading, does a pos = lseek(0,0,SEEK_CUR) to find out its current position within the fil

bug#47702: wc man page: first you are talking about bytes, then you are talking about characters

2021-04-11 Thread Pádraig Brady
On 11/04/2021 02:42, 積丹尼 Dan Jacobson wrote: Man wc says Print newline, word, and byte counts for each FILE, and a total line if more than one FILE is specified. A word is a non-zero-length sequence of characters delimited by white space. first you are talking about

bug#47702: wc man page: first you are talking about bytes, then you are talking about characters

2021-04-10 Thread 積丹尼 Dan Jacobson
Man wc says Print newline, word, and byte counts for each FILE, and a total line if more than one FILE is specified. A word is a non-zero-length sequence of characters delimited by white space. first you are talking about bytes, then you are talking about characters. So

bug#46346: wc --human-readable or --verbose

2021-02-06 Thread Chris Elvidge
On 06/02/2021 01:38 pm, 積丹尼 Dan Jacobson wrote: wc needs a --verbose option. Else one is forced to do: $ file=e.html; echo $file:; for i in bytes chars lines words; do echo -en $i:\\t; wc --$i < $file; done e.html: bytes: 31655 chars: 29141 lines: 643 words: 1275 I mean sometimes

bug#46346: wc --human-readable or --verbose

2021-02-06 Thread Pádraig Brady
On 06/02/2021 13:38, 積丹尼 Dan Jacobson wrote: wc needs a --verbose option. Else one is forced to do: $ file=e.html; echo $file:; for i in bytes chars lines words; do echo -en $i:\\t; wc --$i < $file; done e.html: bytes: 31655 chars: 29141 lines: 643 words: 1275 I mean sometimes we w

bug#46346: wc --human-readable or --verbose

2021-02-06 Thread 積丹尼 Dan Jacobson
wc needs a --verbose option. Else one is forced to do: $ file=e.html; echo $file:; for i in bytes chars lines words; do echo -en $i:\\t; wc --$i < $file; done e.html: bytes: 31655 chars: 29141 lines: 643 words: 1275 I mean sometimes we want to send the output to a real person, and curren

bug#37093: wc runs 100% cpu when in pipeline or tee >(wc)

2019-08-20 Thread Assaf Gordon
tag 37093 notabug close 37093 stop Hello, On 2019-08-19 10:44 p.m., Edward Huff wrote: In the demo below, dd uses 0.665s to write 1GiB of zeros. sha256sum uses 4.285s to calculate the sha256 of 1GiB of zeros. wc uses 32.160s to count 1GiB of zeros. [...] baseline results: $ dd if=/dev/zero

bug#37093: wc runs 100% cpu when in pipeline or tee >(wc)

2019-08-20 Thread Bernhard Voelker
On 8/20/19 6:44 AM, Edward Huff wrote: > In the demo below, dd uses 0.665s to write 1GiB of zeros. > sha256sum uses 4.285s to calculate the sha256 of 1GiB of zeros. > wc uses 32.160s to count 1GiB of zeros. > > Linux localhost 5.2.8-200.fc30.x86_64 #1 SMP Sat Aug 10 13:21:39 UT

bug#37093: wc runs 100% cpu when in pipeline or tee >(wc)

2019-08-19 Thread Edward Huff
In the demo below, dd uses 0.665s to write 1GiB of zeros. sha256sum uses 4.285s to calculate the sha256 of 1GiB of zeros. wc uses 32.160s to count 1GiB of zeros. Linux localhost 5.2.8-200.fc30.x86_64 #1 SMP Sat Aug 10 13:21:39 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux coreutils-8.31-2.fc30.x86_64

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-03-09 Thread Pádraig Brady
On 09/03/19 05:52, Bruno Haible wrote: > Hi Pádraig, > >>>> In regard to options for enabling various behaviors for wc(1), >>>> I'm thinking we might keep the strict POSIX isspace() behavior >>>> with LC_CTYPE=C and/or POSIXLY_CORRECT=1, and use iswnbs

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-03-09 Thread Bruno Haible
Hi Pádraig, > >> In regard to options for enabling various behaviors for wc(1), > >> I'm thinking we might keep the strict POSIX isspace() behavior > >> with LC_CTYPE=C and/or POSIXLY_CORRECT=1, and use iswnbspace() > >> by default Since you plan to ad

bug#34664: Mention wc default action on man page

2019-02-28 Thread 積丹尼 Dan Jacobson
Well > Print newline, word, and byte counts might sound like > Play movies, audio files, and even ... Sort of like a list of capabilities. > "PB" == Pádraig Brady writes: PB> I'm confused. It says 'bytes', and means it. PB> One has to specify -m to select characters. But OK, fine.

bug#34664: Mention wc default action on man page

2019-02-27 Thread Pádraig Brady
tag 34664 notabug close 34664 stop On 26/02/19 02:47, 積丹尼 Dan Jacobson wrote: > INFO says >By default, ‘wc’ prints three counts: the newline, words, and byte > and wc --help even more so. > > Alas man wc doesn't say if what we are looking at is bytes or > characters,

bug#34664: Mention wc default action on man page

2019-02-26 Thread 積丹尼 Dan Jacobson
INFO says By default, ‘wc’ prints three counts: the newline, words, and byte and wc --help even more so. Alas man wc doesn't say if what we are looking at is bytes or characters, so kindly mention the default on the man page!

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-25 Thread Pádraig Brady
On 24/02/19 19:55, Pádraig Brady wrote: > On 24/02/19 17:07, Pádraig Brady wrote: >> So non break space is generally considered a word delimiter, >> though there are complications you detail from unicode. >> >> In regard to options for enabling various behaviors for wc(1

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-24 Thread Pádraig Brady
On 24/02/19 17:07, Pádraig Brady wrote: > So non break space is generally considered a word delimiter, > though there are complications you detail from unicode. > > In regard to options for enabling various behaviors for wc(1), > I'm thinking we might keep the strict POSIX iss

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-24 Thread Pádraig Brady
On 24/02/19 05:58, Bruno Haible wrote: > [Ccing bug-libunistring, because this is about Unicode handling in GNU. The > original thread is in <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=34524>.] > >>> The man page for wc states: "A word is a... sequence of chara

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-24 Thread Paul Eggert
Bruno Haible wrote: I would find it best to introduce an option '--unicode' to 'wc', that would produce Unicode compliant results, at the cost of - not following POSIX to the letter, It'd make sense to have an option. How about a more-general option --words, that would let the user define

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-24 Thread Bruno Haible
[Ccing bug-libunistring, because this is about Unicode handling in GNU. The original thread is in <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=34524>.] > > The man page for wc states: "A word is a... sequence of characters > > delimited by white space." > >

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-23 Thread Pádraig Brady
On 18/02/19 00:12, vampyre...@gmail.com wrote: > $ wc --version > wc (GNU coreutils) 8.29 > Packaged by Gentoo (8.29-r1 (p1.0)) > > The man page for wc states: "A word is a... sequence of characters delimited > by white space." > > But its concept of white

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-22 Thread Bob Proulx
vampyre...@gmail.com wrote: > The man page for wc states: "A word is a... sequence of characters delimited > by white space." > > But its concept of white space only seems to include ASCII white > space. U+00A0 NO-BREAK SPACE, for instance, is not recognized.

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-18 Thread vampyrebat
$ wc --version wc (GNU coreutils) 8.29 Packaged by Gentoo (8.29-r1 (p1.0)) The man page for wc states: "A word is a... sequence of characters delimited by white space." But its concept of white space only seems to include ASCII white space. U+00A0 NO-BREAK SPACE, fo

bug#23441: mention wc defaults more on man page

2018-10-27 Thread Assaf Gordon
close 23441 stop (triaging old bugs) On 2016-05-03 9:14 p.m., 積丹尼 Dan Jacobson wrote: On the man page mention if the default if no arguments are given is wc --bytes --words --lines It seems your message was lost and not answered to in 2 years. Sorry about that. The first line of 'man wc

bug#20120: wc output padding differs when "-" is in the file list

2018-10-22 Thread Assaf Gordon
tags 20120 wontfix close 20120 stop (triaging old bugs) On 19/03/15 04:38 AM, Pádraig Brady wrote: On 18/03/15 17:54, Bernhard Voelker wrote: On 03/16/2015 06:42 AM, Eric Mrak wrote: It seems that whenever STDIN is involved the results padding reverts to the BSD-style 7/8 padding. Thanks

bug#28468: Bug in wc -l found

2017-09-15 Thread Assaf Gordon
tag 28468 notabug stop Hello Rob, On 2017-09-15 03:03 AM, Weidner, Robert (I/EE-31, extern) wrote: > seems I found a bug in wc, have a look: [[ the attach screen shot shows: $ wc -l monitore-serNr_all-run2.txt 16 while the attached file appears to have 17 lines. ]] This is not a

bug#28468: Bug in wc -l found

2017-09-15 Thread Ruediger Meier
On Friday 15 September 2017, Weidner, Robert (I/EE-31, extern) wrote: > Dear GNU Team, > > seems I found a bug in wc, have a look: > > [cid:image001.png@01D32E12.3F5A7C20] > > Despite of it, I really want to say a BIG Thank you for great > tool-set, especially tree, whic

bug#28468: Bug in wc -l found

2017-09-15 Thread Weidner, Robert (I/EE-31, extern)
Dear GNU Team, seems I found a bug in wc, have a look: [cid:image001.png@01D32E12.3F5A7C20] Despite of it, I really want to say a BIG Thank you for great tool-set, especially tree, which I use for 20 years now! THX Rob Mit freundlichen Gruessen Robert Weidner FAS Architektur / zFAS

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-12-20 Thread Bernhard Voelker
On 12/20/2016 02:12 PM, Pádraig Brady wrote: > Right! > > While st_size would have been incorrect for subsequent > files since v7.1, it was only used since v8.24. > > Fixed with: > http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=94d2c68 Thanks! Have a nice day, Berny

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-12-20 Thread Pádraig Brady
On 20/12/16 01:50, Bernhard Voelker wrote: > On 12/19/2016 08:00 PM, Pádraig Brady wrote: >> + [bug introduced in coreutils-7.1] > > FWIW I think that the bug was not introduced in v7.0-96-gc2e56e0: > I had a working 8.23 on a system here, so I took the time to search deeper. > I found the

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-12-19 Thread Bernhard Voelker
On 12/19/2016 08:00 PM, Pádraig Brady wrote: > + [bug introduced in coreutils-7.1] FWIW I think that the bug was not introduced in v7.0-96-gc2e56e0: I had a working 8.23 on a system here, so I took the time to search deeper. I found the reason to be the wrong value of the 'hi_pos' parameter

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-12-19 Thread William R. Fraser
Looks good :) On Mon, Dec 19, 2016 at 11:00 AM, Pádraig Brady <p...@draigbrady.com> wrote: > On 21/03/16 15:16, Pádraig Brady wrote: > > On 21/03/16 00:59, William R. Fraser wrote: > >> When wc gets its list of files by reading from stdin, using the argument > >

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-12-19 Thread Pádraig Brady
On 21/03/16 15:16, Pádraig Brady wrote: > On 21/03/16 00:59, William R. Fraser wrote: >> When wc gets its list of files by reading from stdin, using the argument >> '--from-files0=-', it reuses the same fstatus struct for each file. >> >> The problem is that the 'wc'

bug#24532: GNU wc --lines doesn't report last line when that doesn't end on a new-line.

2016-09-24 Thread Paul Eggert
Carlo Wood wrote: You can argue that this is a feature, but I consider it a bug for all practical purposes. POSIX requires that wc -l must just count newlines, so it is indeed a feature. If wc -l also counted incomplete lines at the end of a file, this would result in counterintuitive
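The rule stated in this reply, that wc -l counts newline characters rather than lines of text, can be shown with a short Python sketch. The function name is hypothetical and the code is only an illustration of the counting rule.

```python
def count_newlines(data: bytes) -> int:
    """Count newline bytes, which is what 'wc -l' is described as doing."""
    return data.count(b"\n")


complete = b"one\ntwo\n"   # last line ends with a newline
incomplete = b"one\ntwo"   # last line has no trailing newline

complete_count = count_newlines(complete)
incomplete_count = count_newlines(incomplete)
print(complete_count, incomplete_count)
```

Under this rule the second input yields 1, even though an editor would display two lines.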

bug#24532: GNU wc --lines doesn't report last line when that doesn't end on a new-line.

2016-09-24 Thread Carlo Wood
You can argue that this is a feature, but I consider it a bug for all practical purposes. A text file might be REQUIRED to end on a EOL sequence (ie, '\n' for linux), in which case wc --lines works, but consider for a moment a (otherwise) text file where the last line does not end on a new-line

bug#23441: mention wc defaults more on man page

2016-05-03 Thread 積丹尼 Dan Jacobson
On the man page mention if the default if no arguments are given is wc --bytes --words --lines

bug#23190: wc - Different output

2016-04-02 Thread Assaf Gordon
tags 23190 notabug close 23190 thanks Hello Seva, On 04/01/2016 06:02 PM, Seva Adari wrote: I am not sure if this a bug or expected behavior! Here is different output from each run variation of wc invocation: wc -l test.txt Output: 20 awk

bug#23190: wc - Different output

2016-04-02 Thread Seva Adari
Hello, I am not sure if this a bug or expected behavior! Here is different output from each run variation of wc invocation: wc -l test.txt Output: 20 awk '{print $0}' /tmp/test.txt | wc -l Output: 21 cut /tmp/test.txt -f1 | wc -l

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-03-21 Thread Jim Meyering
On Sun, Mar 20, 2016 at 5:59 PM, William R. Fraser <wfra...@codewise.org> wrote: > When wc gets its list of files by reading from stdin, using the argument > '--from-files0=-', it reuses the same fstatus struct for each file. > > The problem is that the 'wc' function checks t

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-03-21 Thread Bernhard Voelker
On 03/21/2016 04:16 PM, Pádraig Brady wrote: On 21/03/16 00:59, William R. Fraser wrote: When wc gets its list of files by reading from stdin, using the argument '--from-files0=-', it reuses the same fstatus struct for each file. The problem is that the 'wc' function checks the 'failed' member

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-03-21 Thread Pádraig Brady
On 21/03/16 00:59, William R. Fraser wrote: When wc gets its list of files by reading from stdin, using the argument '--from-files0=-', it reuses the same fstatus struct for each file. The problem is that the 'wc' function checks the 'failed' member of this struct and if it is <=0, it sk

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-03-20 Thread William R. Fraser
When wc gets its list of files by reading from stdin, using the argument '--from-files0=-', it reuses the same fstatus struct for each file. The problem is that the 'wc' function checks the 'failed' member of this struct and if it is <=0, it skips doing fstat on the file. The main l

bug#21636: a question of the “wc” program

2015-10-06 Thread JameDam
I have a file which is named “-l”, and I use the wc to count the file, undesirably wc requested the standard input rather than my file "-l", although I use it through the command wc -lw -l it didn't work as my will, however, linux itself do not limit this kind of name style, but w

bug#21636: a question of the “wc” program

2015-10-06 Thread Stephane Chazelas
2015-10-06 10:01:15 -0600, Eric Blake: > tag 21636 notabug > thanks > > On 10/06/2015 04:49 AM, JameDam wrote: > > I have a file which is named *“-l”*, and I use the wc to count the file, > > undesirably wc requested the standard input rather than my file *"-

bug#21636: a question of the “wc” program

2015-10-06 Thread Eric Blake
tag 21636 notabug thanks On 10/06/2015 04:49 AM, JameDam wrote: > I have a file which is named *“-l”*, and I use the wc to count the file, > undesirably wc requested the standard input rather than my file *"-l"* > ,although > I use it through the command > */wc -lw -l/

bug#20954: wc - linux

2015-07-05 Thread Bob Proulx
tele wrote: Maybe we did not understand. I don't want change old definitions but create new option for wc or echo, because this above examples not make logic sense, What would such an option do? ( and it I want fix, however with sed is also fixed ) Your original message asked if echo | wc

bug#20954: wc - linux

2015-07-03 Thread tele
tag 20954 + notabug close 20954 thanks Maybe we did not understand. I don't want change old definitions but create new option for wc or echo, because this above examples not make logic sense, ( and it I want fix, however with sed is also fixed ) however now I understand that they work

bug#20954: wc - linux

2015-07-02 Thread Stephane Chazelas
2015-07-01 19:41:00 -0600, Bob Proulx: [...] $ a= ; echo $s | wc -l 1 [...] No. Should be 1. You have forgotten about the newline at the end of the command. The echo will terminate with a newline. [...] Leaving a variable unquoted will also cause the shell to apply the split+glob

bug#20954: wc - linux

2015-07-02 Thread tele
tag 20954 + notabug close 20954 thanks tele wrote: Hi! Hi! From terminal: $ a= ; echo $s | wc -l 1 Do you mean $a instead of $s? Either way is the same though assuming $s is empty too. - Yes, my mistake :-) Should be 0 , yes ? No. Should be 1. You have forgotten about

bug#20954: wc - linux

2015-07-02 Thread Bob Proulx
the same. but wc -l can count only from new line, so if something exist inside first line wc -l can not count. :-( wc -l counts newlines. That is the task that it was constructed to do. That is exactly what it does. No more and no less. What is a text line? A text line by definition ends

bug#20954: wc - linux

2015-07-01 Thread Bob Proulx
tag 20954 + notabug close 20954 thanks tele wrote: Hi! Hi! :-) From terminal: $ a= ; echo $s | wc -l 1 Do you mean $a instead of $s? Either way is the same though assuming $s is empty too. Should be 0 , yes ? No. Should be 1. You have forgotten about the newline at the end

bug#20954: wc - linux

2015-07-01 Thread tele
Hi! From terminal: $ a= ; echo $s | wc -l 1 Should be 0 , yes ?

bug#20751: wc -m doesn't count UTF-8 characters properly

2015-06-07 Thread Valdis Vītoliņš
Thanks for clarification! I tested it with Bash script: chars=$(wc -m mylog|cut -d ' ' -f1) lines=$(wc -l mylog|cut -d ' ' -f1) let chars=$chars - $lines echo $chars and got the same number as given by vim :%s/.//gn (Which was place from what I got confused.) Hopefully this bug description
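The Bash script in this message subtracts the line count from the character count. A minimal Python sketch of the same arithmetic, using a hypothetical helper name and a made-up sample string, might look like this:

```python
def chars_excluding_newlines(text: str) -> int:
    """Character count minus newline count, as in the shell script above."""
    return len(text) - text.count("\n")


# Illustrative sample (an assumption, not the reporter's file):
# two lines containing three non-ASCII or ASCII characters in total.
sample = "\u0101b\n\u010d\n"
result = chars_excluding_newlines(sample)
print(result)
```

Here `len` counts code points, so multi-byte UTF-8 characters count once each, which is the distinction between -m and -c that the thread turns on.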

bug#20751: wc -m doesn't count UTF-8 characters properly

2015-06-07 Thread Stephane Chazelas
encoding can't count as valid characters. So printf '\300' | wc -m should return 0 as 1100 alone is not a valid character so we can't use your algorithm without first verifying the validity of the input. Then the UTF-8 encoding of the UTF16 surrogate pairs (0xD800 to 0xDFFF) should probably
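The point about validating input before counting can be sketched in Python. This is an illustration under the assumption that invalid byte sequences contribute zero characters; the helper name is hypothetical and the approach (decoding with invalid bytes dropped) is not claimed to be what wc does.

```python
def count_valid_chars(data: bytes) -> int:
    """Count characters in UTF-8 input, skipping bytes that do not decode."""
    return len(data.decode("utf-8", errors="ignore"))


lone_lead_byte = count_valid_chars(b"\xc0")           # the '\300' example above
two_byte_char = count_valid_chars("\u00e9".encode())  # one valid 2-byte character
print(lone_lead_byte, two_byte_char)
```

Simply counting bytes by their leading bit patterns would count the lone 0xC0 byte as a character, which is why validation has to come first.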

bug#20751: wc -m doesn't count UTF-8 characters properly

2015-06-06 Thread Valdis Vītoliņš
://en.wikipedia.org/wiki/Byte_order_mark then count bytes with bits 0xxx and 11xx. You mailed submit@debbugs without specifying a Package:, so your bug report ended up on the help-debbugs list. I have reassigned it to coreutils. (Please note there is no wc package.) (My mailer is messing up

bug#20751: wc -m doesn't count UTF-8 characters properly

2015-06-06 Thread Pádraig Brady
tag 20751 notabug close 20751 stop On 06/06/15 19:49, Valdis Vītoliņš wrote: Version: wc (GNU coreutils) 8.21 When 'wc -m' is invoked, it should print character count, but it counts incorrectly UTF-8 encoded characters. Attached files have 3, 4 an 6 bytes in them, but all have only two UTF-8

bug#20751: wc -m doesn't count UTF-8 characters properly

2015-06-06 Thread Glenn Morris
You mailed submit@debbugs without specifying a Package:, so your bug report ended up on the help-debbugs list. I have reassigned it to coreutils. (Please note there is no wc package.) (My mailer is messing up the UTF-8 characters in your report. Interested parties can see the original at http

bug#20120: wc output padding differs when - is in the file list

2015-03-19 Thread Pádraig Brady
formatting also happened in other cases, e.g. for other non-regular files ... $ wc /etc/hosts /dev/null 41 1241355 /etc/hosts 0 0 0 /dev/null 41 1241355 total ... or where stat() returns a wrong value like for /proc files ... $ wc /proc

bug#20120: wc output padding differs when - is in the file list

2015-03-19 Thread Bernhard Voelker
-regular files ... $ wc /etc/hosts /dev/null 41 1241355 /etc/hosts 0 0 0 /dev/null 41 1241355 total ... or where stat() returns a wrong value like for /proc files ... $ wc /proc/cpuinfo x 52 256 1276 /proc/cpuinfo 1 0 1 x 53 256 1277 total

bug#20120: wc output padding differs when - is in the file list

2015-03-16 Thread Eric Mrak
padding. System: Arch Linux (package: core/coreutils 8.23-1) === $ wc --version wc (GNU coreutils) 8.23 Copyright (C) 2014 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html . This is free software: you are free to change

bug#19969: problem: wc -c doesn't read actual # of bytes in file

2015-03-02 Thread Pádraig Brady
On 02/03/15 21:29, Linda Walsh wrote: Jim Meyering wrote: As root: # cd /proc # find -H [^0-9]* -name self -prune -o -name thread-self -prune -o -type f ! -name kmsg ! -name kcore ! -name kpagecount ! -name kpageflags -print0|wc -c --files0-from=- |sort -n Thanks for the report

bug#19969: problem: wc -c doesn't read actual # of bytes in file

2015-03-02 Thread Linda Walsh
Jim Meyering wrote: As root: # cd /proc # find -H [^0-9]* -name self -prune -o -name thread-self -prune -o -type f ! -name kmsg ! -name kcore ! -name kpagecount ! -name kpageflags -print0|wc -c --files0-from=- |sort -n Thanks for the report. However, with wc from coreutils-8.23 and a 3.10

bug#19969: problem: wc -c doesn't read actual # of bytes in file

2015-03-02 Thread Jim Meyering
On Mon, Mar 2, 2015 at 1:29 PM, Linda Walsh coreut...@tlinx.org wrote: Jim Meyering wrote: As root: # cd /proc # find -H [^0-9]* -name self -prune -o -name thread-self -prune -o -type f ! -name kmsg ! -name kcore ! -name kpagecount ! -name kpageflags -print0|wc -c --files0-from=- |sort

bug#19969: problem: wc -c doesn't read actual # of bytes in file

2015-03-02 Thread Eric Blake
On 02/28/2015 01:59 AM, Linda Walsh wrote: (coreutils-8.21-7.7.7) wc -c(bytes) doesn't seem to reliably read the number of bytes in a file. I was wanting to find out what the largest data-source files in '/proc' and '/sys' (didn't get around to trying /sys, since all the files under

bug#19969: problem: wc -c doesn't read actual # of bytes in file

2015-03-02 Thread Mike Frysinger
On 02 Mar 2015 06:57, Eric Blake wrote: On 02/28/2015 01:59 AM, Linda Walsh wrote: (coreutils-8.21-7.7.7) wc -c(bytes) doesn't seem to reliably read the number of bytes in a file. I was wanting to find out what the largest data-source files in '/proc' and '/sys' (didn't get around

bug#19969: problem: wc -c doesn't read actual # of bytes in file

2015-03-02 Thread Jim Meyering
On Sat, Feb 28, 2015 at 12:59 AM, Linda Walsh coreut...@tlinx.org wrote: (coreutils-8.21-7.7.7) wc -c(bytes) doesn't seem to reliably read the number of bytes in a file. I was wanting to find out what the largest data-source files in '/proc' and '/sys' (didn't get around to trying /sys

bug#19969: problem: wc -c doesn't read actual # of bytes in file

2015-03-01 Thread Linda Walsh
Bernhard Voelker wrote: On 02/28/2015 09:59 AM, Linda Walsh wrote: (coreutils-8.21-7.7.7) wc -c(bytes) doesn't seem to reliably read the number of bytes in a file. I was wanting to find out what the largest data-source files in '/proc' and '/sys' (didn't get around to trying /sys, since all

bug#19969: problem: wc -c doesn't read actual # of bytes in file

2015-03-01 Thread Bernhard Voelker
On 02/28/2015 09:59 AM, Linda Walsh wrote: (coreutils-8.21-7.7.7) wc -c(bytes) doesn't seem to reliably read the number of bytes in a file. I was wanting to find out what the largest data-source files in '/proc' and '/sys' (didn't get around to trying /sys, since all the files under

bug#19969: problem: wc -c doesn't read actual # of bytes in file

2015-02-28 Thread Linda Walsh
(coreutils-8.21-7.7.7) wc -c(bytes) doesn't seem to reliably read the number of bytes in a file. I was wanting to find out what the largest data-source files in '/proc' and '/sys' (didn't get around to trying /sys, since all the files under /proc/sys return 0 bytes. Note -- wc -l doesn't

bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-08 Thread Pádraig Brady
On 10/08/2014 02:12 AM, Jim Meyering wrote: On Tue, Oct 7, 2014 at 5:36 PM, Pádraig Brady p...@draigbrady.com wrote: On 10/08/2014 12:51 AM, Paul Eggert wrote: Paul Eggert wrote: The attached patch still needs a changelog entry and test cases. I wrote those up and pushed the attached patch;

bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-07 Thread Paul Eggert
). The attached patch still needs a changelog entry and test cases. The basic idea is to not trust st_size when it's <= ST_BLKSIZE. This fixes bugs in 'head', 'od', 'split', 'tac', 'tail', and 'wc' when applied to input files in proc or sysfs file systems. Here's an example bug fixed by this patch

bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-07 Thread Paul Eggert
...@cs.ucla.edu Date: Tue, 7 Oct 2014 16:46:08 -0700 Subject: [PATCH] wc: don't miscount /sys and similar file systems Fix similar problems in head, od, split, tac, and tail. Reported by George Shuklin in: http://bugs.gnu.org/18621 * NEWS: Document this. * src/head.c (elseek): Move up. (elide_tail_bytes_pipe

bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-07 Thread Pádraig Brady
On 10/08/2014 12:51 AM, Paul Eggert wrote: Paul Eggert wrote: The attached patch still needs a changelog entry and test cases. I wrote those up and pushed the attached patch; this should fix the bug so I'm closing the bug report. I was just going through the patch as it happens, and I

bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-07 Thread Jim Meyering
On Tue, Oct 7, 2014 at 5:36 PM, Pádraig Brady p...@draigbrady.com wrote: On 10/08/2014 12:51 AM, Paul Eggert wrote: Paul Eggert wrote: The attached patch still needs a changelog entry and test cases. I wrote those up and pushed the attached patch; this should fix the bug so I'm closing the

bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-03 Thread George Shuklin
There are many sysfs (Linux) attributes that are reported as '4k files' but contain just a few bytes. wc file and wc -c show different sizes. Example: $ cat /sys/kernel/vmcoreinfo 1b74c00 1024 $ hexdump -Cv /sys/kernel/vmcoreinfo 31 62 37 34 63 30 30 20 31 30 32 34 0a

bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-03 Thread Pádraig Brady
On 10/03/2014 03:47 PM, George Shuklin wrote: There are many sysfs (Linux) attributes that are reported as '4k files' but contain just a few bytes. wc file and wc -c show different sizes. Example: $ cat /sys/kernel/vmcoreinfo 1b74c00 1024 $ hexdump -Cv /sys/kernel/vmcoreinfo

bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-03 Thread Jim Meyering
@@ -235,6 +235,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus) fstatus->failed = fstat (fd, &fstatus->st); if (! fstatus->failed && S_ISREG (fstatus->st.st_mode) + && fstatus->st.st_blocks && (current_pos = lseek (fd, 0, SEEK_CUR)) != -1 && (end_pos

bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-03 Thread Paul Eggert
On 10/03/2014 11:26 AM, Jim Meyering wrote: That looks like a fine fix. Unfortunately that fix would make 'wc -c' way slower for a file that consists entirely of a big hole. How about if we change usable_st_size to return false for these proc files, with a heuristic as tight as we can

bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-03 Thread Pádraig Brady
On 10/03/2014 07:47 PM, Paul Eggert wrote: On 10/03/2014 11:26 AM, Jim Meyering wrote: That looks like a fine fix. Unfortunately that fix would make 'wc -c' way slower for a file that consists entirely of a big hole. True, which you could avoid by deferring to read() for empty files

bug#18585: Bad count in WC command?

2014-09-29 Thread Schapovalov, Sebastian
Hello, I was testing one parameter in my .sh and realised that the wc command returns a different value than I expected. I am sending you a couple of examples. I have to validate a cellular telephone number (in Argentina), so it has to be a 10-char parameter. sarasa() -- seba echo 1029384756 | wc

bug#18585: Bad count in WC command?

2014-09-29 Thread Eric Blake
tag 18585 notabug thanks On 09/29/2014 07:25 AM, Schapovalov, Sebastian wrote: Hello, I was testing one parameter in my .sh and realised that the wc command returns a different value than I expected. I am sending you a couple of examples. I have to validate a cellular telephone number

bug#18585: Bad count in WC command?

2014-09-29 Thread Eric Blake
On 09/29/2014 02:47 PM, Eric Blake wrote: tag 18585 notabug thanks On 09/29/2014 07:25 AM, Schapovalov, Sebastian wrote: Hello, I was testing one parameter in my .sh and realised that the wc command returns a different value than I expected. I am sending you a couple of examples. I have

bug#16561: Bug report for 'head' (and 'wc' et. al.)

2014-01-26 Thread Pádraig Brady
) __ Caracas, Sunday 26th, 2014 Ref: Bug report for 'head' (and 'wc' et. al.) Dear friends: Please find attached the text file 'head-tst.txt'. As you can easily see, the following command fails and does not print anything, even though the file has: 6 lines, 49 words and 250 chars

bug#13897: Linux command - wc

2013-03-07 Thread Francisco José Tena
Hello, Using the wc command in Linux, we're getting unexpected behaviour. The next command is returning 5 chars: $ echo TEST | wc -m The same behaviour is returned using the following command: $ echo TEST | wc -c Putting the text in a file doesn't change the result. If we type, $ echo TEST | wc

bug#13897: Linux command - wc

2013-03-07 Thread Paul Eggert
On 03/07/13 03:38, Francisco José Tena wrote: The next command is returning 5 chars: $ echo TEST | wc -m That's correct, since the 'echo' is outputting 5 characters: T, E, S, T, and newline.
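
The newline accounting above can be reproduced directly. This is a small sketch, assuming a POSIX shell; both helper names are hypothetical and do not come from the thread. The only difference between the two functions is whether a trailing newline is written before wc counts.

```shell
# Hypothetical helpers for illustration; the names are not from the thread.

# Count characters including the newline that echo appends.
chars_with_newline() {
    echo "$1" | wc -m | tr -d '[:space:]'
}

# Count characters without a trailing newline (printf '%s' adds none).
chars_without_newline() {
    printf '%s' "$1" | wc -m | tr -d '[:space:]'
}

chars_with_newline TEST; echo       # prints 5
chars_without_newline TEST; echo    # prints 4
```

For a plain length check on a shell variable, the `${#var}` expansion gives the character count without a pipeline at all.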

bug#13897: Linux command - wc

2013-03-07 Thread Bob Proulx
Paul Eggert wrote: On 03/07/13 03:38, Francisco José Tena wrote: The next command is returning 5 chars: $ echo TEST | wc -m That's correct, since the 'echo' is outputting 5 characters: T, E, S, T, and newline. See also the output from od. $ echo TEST | od -tx1 -c 000 54 45

bug#9449: Bug report on 'wc' : characters count adds one character

2011-09-06 Thread Laurent TARRISSE
Hi, Documentation on 'wc' says: -- wc -m, --chars print the character counts -- But here follows the output I get: echo toto | wc --chars 5 echo five | wc --chars 5 echo four | wc --chars 5
