Re: wc(1): add -L flag to write length of longest line

2022-10-01 Sebastian Benoit
Theo de Raadt (dera...@openbsd.org) on 2022.09.30 11:11:42 -0600:
> I'm sure there are other people who have other desirable features which I
> haven't listed. For instance, could wc.c be the scaffold to use for the
> long-desired web browser to be included in OpenBSD?

Oh, it's clearly incomplete until it can send its result by mail.



Re: wc(1): add -L flag to write length of longest line

2022-10-01 Marc Espie
On Fri, Sep 30, 2022 at 02:22:34AM +0200, Joerg Sonnenberger wrote:
> On Thu, Sep 29, 2022 at 08:39:16PM +1000, Jonathan Gray wrote:
> > wc counts items in files.  Finding the longest item indeed sounds
> > like a task better suited to awk.
> 
> Finding outliers, means and counting are all parts of the same basic
> class of operations. A good implementation of all of them requires
> constant space and a fixed time per input character. An implementation
> in awk will generally not have that property.

Why? Is awk really that bad at managing memory?

Anyway, we have perl in base, and it will definitely fit the bill.

Heck, I can do it in shell with constant space and a fixed time per input
character.

Real world computing 101: complexity is not all that matters, the constant
in O(1) is often what will actually kill you.

Real world computing 102: any utility left unfettered for long enough will
grow a lisp interpreter and go into a mud-fight with emacs.



Re: wc(1): add -L flag to write length of longest line

2022-09-30 Theo de Raadt
Todd C. Miller  wrote:

> On Thu, 29 Sep 2022 23:30:54 -0400, Daniel Dickman wrote:
> 
> > > On Sep 29, 2022, at 8:24 PM, Joerg Sonnenberger  wrote:
> > > 
> > > On Thu, Sep 29, 2022 at 08:39:16PM +1000, Jonathan Gray wrote:
> > >> wc counts items in files.  Finding the longest item indeed sounds
> > >> like a task better suited to awk.
> >
> > Doesn’t gnu wc show that tabs have length 8 rather than length 1?
> 
> Yes.
> 
> > Do the other wc implementations differ?
> 
> FreeBSD and NetBSD "wc -L" counts it as a single character.

How about if I want a feature that finds the shortest line?
Will that be -S?

And what about a feature to count dangling whitespace at the end of
lines?  -W looks available for use. Actually there are many flag
characters available, because wc hasn't jumped the shark yet, as ls did.

Imagine the eventual synopsis, it kind of rolls off the tongue

NAME
wc - word, line, and byte or character, or longest or shortest
 line, or dangling whitespace count

I'm looking forward to wc being able to edit files, and for further
extensibility because awk is slow, it can include a lisp interpreter.
Or maybe it should compile and run rust programs?  Safer that way.

I'm sure there are other people who have other desirable features which I
haven't listed. For instance, could wc.c be the scaffold to use for the
long-desired web browser to be included in OpenBSD?



Re: wc(1): add -L flag to write length of longest line

2022-09-30 Todd C. Miller
On Thu, 29 Sep 2022 23:30:54 -0400, Daniel Dickman wrote:

> > On Sep 29, 2022, at 8:24 PM, Joerg Sonnenberger  wrote:
> > 
> > On Thu, Sep 29, 2022 at 08:39:16PM +1000, Jonathan Gray wrote:
> >> wc counts items in files.  Finding the longest item indeed sounds
> >> like a task better suited to awk.
>
> Doesn’t gnu wc show that tabs have length 8 rather than length 1?

Yes.

> Do the other wc implementations differ?

FreeBSD and NetBSD "wc -L" counts it as a single character.

 - todd
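The difference Daniel asks about comes down to two conventions for a tab: count it as one byte (what Todd reports for FreeBSD and NetBSD) or advance to the next tab stop (GNU, whose -L reports a display width). The following is a minimal, hypothetical C sketch of both; `longest_line` is not a function from any wc implementation, input is treated as single-byte text, and the tab-expanding mode models only 8-column tab stops, not the other display-width rules GNU applies.

```c
#include <stddef.h>

/*
 * Hypothetical helper: length of the longest line in buf.
 * expand_tabs == 0: every byte, including '\t', counts as 1.
 * expand_tabs != 0: a tab advances to the next multiple of 8
 * columns (tabs only; other display-width rules are ignored).
 * A final line without a trailing newline is still measured.
 */
size_t
longest_line(const char *buf, size_t len, int expand_tabs)
{
	size_t col = 0, longest = 0;

	for (size_t i = 0; i < len; i++) {
		if (buf[i] == '\n') {
			if (col > longest)
				longest = col;
			col = 0;
		} else if (buf[i] == '\t' && expand_tabs) {
			col += 8 - (col % 8);	/* jump to next tab stop */
		} else {
			col++;
		}
	}
	if (col > longest)		/* unterminated last line */
		longest = col;
	return longest;
}
```

For the input "a\tb\n", the byte-counting mode returns 3 and the tab-expanding mode returns 9.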



Re: wc(1): add -L flag to write length of longest line

2022-09-29 Daniel Dickman



> On Sep 29, 2022, at 8:24 PM, Joerg Sonnenberger  wrote:
> 
> On Thu, Sep 29, 2022 at 08:39:16PM +1000, Jonathan Gray wrote:
>> wc counts items in files.  Finding the longest item indeed sounds
>> like a task better suited to awk.

Doesn’t gnu wc show that tabs have length 8 rather than length 1?

Do the other wc implementations differ?

> 
> Finding outliers, means and counting are all parts of the same basic
> class of operations. A good implementation of all of them requires
> constant space and a fixed time per input character. An implementation
> in awk will generally not have that property.

Did you run any benchmarks to check this? I’m not doubting you but just 
wondering if there’s a speed difference that matters in practice.




> 
> Joerg
> 



Re: wc(1): add -L flag to write length of longest line

2022-09-29 Joerg Sonnenberger
On Thu, Sep 29, 2022 at 08:39:16PM +1000, Jonathan Gray wrote:
> wc counts items in files.  Finding the longest item indeed sounds
> like a task better suited to awk.

Finding outliers, means and counting are all parts of the same basic
class of operations. A good implementation of all of them requires
constant space and a fixed time per input character. An implementation
in awk will generally not have that property.
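Joerg's requirement of constant space and fixed work per input character can be met with a small streaming state machine. The only state carried between read(2) chunks is the length of the current line and the maximum seen so far. This is a hypothetical sketch (the `struct longest` type and the `longest_*` functions are not from wc.c); it counts bytes and uses the same per-byte newline test as wc's line-counting loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical streaming state: O(1) space regardless of input size. */
struct longest {
	uint64_t cur;	/* bytes seen so far on the current line */
	uint64_t max;	/* longest completed line so far */
};

/* Feed one chunk, e.g. the buffer filled by a single read(2) call. */
void
longest_feed(struct longest *st, const char *buf, size_t len)
{
	for (const char *p = buf; p < buf + len; p++) {
		if (*p == '\n') {
			if (st->cur > st->max)
				st->max = st->cur;
			st->cur = 0;
		} else
			st->cur++;
	}
}

/* Call at EOF; also counts a final line lacking a trailing newline. */
uint64_t
longest_finish(const struct longest *st)
{
	return st->cur > st->max ? st->cur : st->max;
}
```

Because a line can straddle two chunks, `cur` must survive between calls: feeding "hello\nwor" and then "ld!\n" yields 6. Whether an unterminated last line should count is a design choice; this sketch counts it.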

Joerg



Re: wc(1): add -L flag to write length of longest line

2022-09-29 Jonathan Gray
On Thu, Sep 29, 2022 at 08:57:04AM +, Job Snijders wrote:
> Hi all,
> 
> I often find myself piping data through ... | awk '{print length}' | ...
> I figured there should be a more direct way that requires less typing.
> Perhaps other developers have a similar itch? 
> 
> The FreeBSD, NetBSD, Dragonfly, and GNU variants of the wc(1) utility
> have a similar -L feature.

That isn't an argument for merit or good taste.  Choice of flag, sure.

wc counts items in files.  Finding the longest item indeed sounds
like a task better suited to awk.

> 
> Kind regards,
> 
> Job
> 
> Index: wc.1
> ===
> RCS file: /cvs/src/usr.bin/wc/wc.1,v
> retrieving revision 1.27
> diff -u -p -r1.27 wc.1
> --- wc.1  24 Oct 2016 13:46:58 -  1.27
> +++ wc.1  21 Sep 2022 15:47:29 -
> @@ -41,7 +41,7 @@
>  .Sh SYNOPSIS
>  .Nm wc
>  .Op Fl c | m
> -.Op Fl hlw
> +.Op Fl hLlw
>  .Op Ar
>  .Sh DESCRIPTION
>  The
> @@ -68,6 +68,14 @@ is written to the standard output.
>  Use unit suffixes: Byte, Kilobyte, Megabyte, Gigabyte, Terabyte,
>  Petabyte, and Exabyte in order to reduce the number of digits to four or fewer
>  using powers of 2 for sizes (K=1024, M=1048576, etc.).
> +.It Fl L
> +Write the length of the longest line to the standard output.
> +Length is the number of bytes counted, or the number of characters if the
> +.Fl m
> +flag is specified.
> +If more than one input file is specified,
> +the length of the longest line of all files is reported as the value of
> +.Qq total .
>  .It Fl l
>  The number of lines in each input file
>  is written to the standard output.
> @@ -128,9 +136,9 @@ utility is compliant with the
>  .St -p1003.1-2008
>  specification.
>  .Pp
> -The flag
> -.Op Fl h
> -is an extension to that specification.
> +The flags
> +.Op Fl Lh
> +are extensions to that specification.
>  .Sh HISTORY
>  A
>  .Nm
> Index: wc.c
> ===
> RCS file: /cvs/src/usr.bin/wc/wc.c,v
> retrieving revision 1.30
> diff -u -p -r1.30 wc.c
> --- wc.c  2 Sep 2022 15:21:40 -   1.30
> +++ wc.c  21 Sep 2022 15:47:29 -
> @@ -44,12 +44,12 @@
>  
>  #define  _MAXBSIZE (64 * 1024)
>  
> -int64_t  tlinect, twordct, tcharct;
> -int  doline, doword, dochar, humanchar, multibyte;
> +int64_t  tlinect, twordct, tcharct, tlongest;
> +int  doline, doword, dochar, dolongest, humanchar, multibyte;
>  int  rval;
>  extern char *__progname;
>  
> -static void print_counts(int64_t, int64_t, int64_t, const char *);
> +static void print_counts(int64_t, int64_t, int64_t, int64_t, const char *);
>  static void format_and_print(int64_t);
>  static void cnt(const char *);
>  
> @@ -63,8 +63,11 @@ main(int argc, char *argv[])
>   if (pledge("stdio rpath", NULL) == -1)
>   err(1, "pledge");
>  
> - while ((ch = getopt(argc, argv, "lwchm")) != -1)
> + while ((ch = getopt(argc, argv, "Llwchm")) != -1)
>   switch(ch) {
> + case 'L':
> + dolongest = 1;
> + break;
>   case 'l':
>   doline = 1;
>   break;
> @@ -84,7 +87,7 @@ main(int argc, char *argv[])
>   case '?':
>   default:
>   fprintf(stderr,
> - "usage: %s [-c | -m] [-hlw] [file ...]\n",
> + "usage: %s [-c | -m] [-hLlw] [file ...]\n",
>   __progname);
>   return 1;
>   }
> @@ -96,7 +99,7 @@ main(int argc, char *argv[])
>* if you don't get any arguments, you have to turn them
>* all on.
>*/
> - if (!doline && !doword && !dochar)
> + if (!doline && !doword && !dochar && !dolongest)
>   doline = doword = dochar = 1;
>  
>   if (!*argv) {
> @@ -109,7 +112,8 @@ main(int argc, char *argv[])
>   } while(*++argv);
>  
>   if (dototal)
> - print_counts(tlinect, twordct, tcharct, "total");
> + print_counts(tlinect, twordct, tcharct, tlongest,
> + "total");
>   }
>  
>   return rval;
> @@ -127,11 +131,11 @@ cnt(const char *path)
>   wchar_t wc;
>   short gotsp;
>   ssize_t len;
> - int64_t linect, wordct, charct;
> + uint64_t linect, wordct, charct, longct, tmpll;
>   struct stat sbuf;
>   int fd;
>  
> - linect = wordct = charct = 0;
> + linect = wordct = charct = longct = tmpll = 0;
>   stream = NULL;
>   if (path != NULL) {
>   file = path;
> @@ -180,12 +184,19 @@ cnt(const char *path)
>* faster to get lines than to get words, since
>* the word count requires some logic.
>*/
> - else if (doline) {
> + else if (doline || dolongest) {
>   while ((len = read(fd, buf, _MAXBSIZE)) 

wc(1): add -L flag to write length of longest line

2022-09-29 Job Snijders
Hi all,

I often find myself piping data through ... | awk '{print length}' | ...
I figured there should be a more direct way that requires less typing.
Perhaps other developers have a similar itch? 

The FreeBSD, NetBSD, Dragonfly, and GNU variants of the wc(1) utility
have a similar -L feature.

Kind regards,

Job
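The manual text in the patch below defines two rules. Length is counted in bytes, or in characters with -m. With several files, the "total" line reports the longest line across all of them, which is a maximum rather than the sum used for the other counters. This is a minimal, hypothetical sketch of both rules, assuming valid UTF-8 input: it approximates a character count by skipping UTF-8 continuation bytes (10xxxxxx), whereas a real implementation would decode through the locale's multibyte functions. The names `longest_in` and `longest_total` are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: longest line in buf, in bytes, or in characters when
 * count_chars is set.  Characters are approximated by not counting
 * UTF-8 continuation bytes (10xxxxxx); assumes valid UTF-8.
 */
uint64_t
longest_in(const unsigned char *buf, size_t len, int count_chars)
{
	uint64_t cur = 0, max = 0;

	for (size_t i = 0; i < len; i++) {
		if (buf[i] == '\n') {
			if (cur > max)
				max = cur;
			cur = 0;
		} else if (!count_chars || (buf[i] & 0xC0) != 0x80)
			cur++;
	}
	return cur > max ? cur : max;
}

/* Per the proposed man page, "total" is the maximum over all files. */
uint64_t
longest_total(const uint64_t *per_file, size_t nfiles)
{
	uint64_t t = 0;

	for (size_t i = 0; i < nfiles; i++)
		if (per_file[i] > t)
			t = per_file[i];
	return t;
}
```

For "héllo" encoded in UTF-8, the byte length is 6 and the character length is 5. For three files whose longest lines are 3, 7, and 5, the total would be 7, not 15.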

Index: wc.1
===
RCS file: /cvs/src/usr.bin/wc/wc.1,v
retrieving revision 1.27
diff -u -p -r1.27 wc.1
--- wc.1	24 Oct 2016 13:46:58 -	1.27
+++ wc.1	21 Sep 2022 15:47:29 -
@@ -41,7 +41,7 @@
 .Sh SYNOPSIS
 .Nm wc
 .Op Fl c | m
-.Op Fl hlw
+.Op Fl hLlw
 .Op Ar
 .Sh DESCRIPTION
 The
@@ -68,6 +68,14 @@ is written to the standard output.
 Use unit suffixes: Byte, Kilobyte, Megabyte, Gigabyte, Terabyte,
 Petabyte, and Exabyte in order to reduce the number of digits to four or fewer
 using powers of 2 for sizes (K=1024, M=1048576, etc.).
+.It Fl L
+Write the length of the longest line to the standard output.
+Length is the number of bytes counted, or the number of characters if the
+.Fl m
+flag is specified.
+If more than one input file is specified,
+the length of the longest line of all files is reported as the value of
+.Qq total .
 .It Fl l
 The number of lines in each input file
 is written to the standard output.
@@ -128,9 +136,9 @@ utility is compliant with the
 .St -p1003.1-2008
 specification.
 .Pp
-The flag
-.Op Fl h
-is an extension to that specification.
+The flags
+.Op Fl Lh
+are extensions to that specification.
 .Sh HISTORY
 A
 .Nm
Index: wc.c
===
RCS file: /cvs/src/usr.bin/wc/wc.c,v
retrieving revision 1.30
diff -u -p -r1.30 wc.c
--- wc.c	2 Sep 2022 15:21:40 -	1.30
+++ wc.c	21 Sep 2022 15:47:29 -
@@ -44,12 +44,12 @@
 
 #define	_MAXBSIZE (64 * 1024)
 
-int64_t	tlinect, twordct, tcharct;
-int	doline, doword, dochar, humanchar, multibyte;
+int64_t	tlinect, twordct, tcharct, tlongest;
+int	doline, doword, dochar, dolongest, humanchar, multibyte;
 int	rval;
 extern char *__progname;
 
-static void print_counts(int64_t, int64_t, int64_t, const char *);
+static void print_counts(int64_t, int64_t, int64_t, int64_t, const char *);
 static void format_and_print(int64_t);
 static void cnt(const char *);
 
@@ -63,8 +63,11 @@ main(int argc, char *argv[])
 	if (pledge("stdio rpath", NULL) == -1)
 		err(1, "pledge");
 
-	while ((ch = getopt(argc, argv, "lwchm")) != -1)
+	while ((ch = getopt(argc, argv, "Llwchm")) != -1)
 		switch(ch) {
+		case 'L':
+			dolongest = 1;
+			break;
 		case 'l':
 			doline = 1;
 			break;
@@ -84,7 +87,7 @@ main(int argc, char *argv[])
 		case '?':
 		default:
 			fprintf(stderr,
-			    "usage: %s [-c | -m] [-hlw] [file ...]\n",
+			    "usage: %s [-c | -m] [-hLlw] [file ...]\n",
 			    __progname);
 			return 1;
 		}
@@ -96,7 +99,7 @@ main(int argc, char *argv[])
 	 * if you don't get any arguments, you have to turn them
 	 * all on.
 	 */
-	if (!doline && !doword && !dochar)
+	if (!doline && !doword && !dochar && !dolongest)
 		doline = doword = dochar = 1;
 
 	if (!*argv) {
@@ -109,7 +112,8 @@ main(int argc, char *argv[])
 		} while(*++argv);
 
 		if (dototal)
-			print_counts(tlinect, twordct, tcharct, "total");
+			print_counts(tlinect, twordct, tcharct, tlongest,
+			    "total");
 	}
 
 	return rval;
@@ -127,11 +131,11 @@ cnt(const char *path)
 	wchar_t wc;
 	short gotsp;
 	ssize_t len;
-	int64_t linect, wordct, charct;
+	uint64_t linect, wordct, charct, longct, tmpll;
 	struct stat sbuf;
 	int fd;
 
-	linect = wordct = charct = 0;
+	linect = wordct = charct = longct = tmpll = 0;
 	stream = NULL;
 	if (path != NULL) {
 		file = path;
@@ -180,12 +184,19 @@ cnt(const char *path)
 	 * faster to get lines than to get words, since
 	 * the word count requires some logic.
 	 */
-	else if (doline) {
+	else if (doline || dolongest) {
 		while ((len = read(fd, buf, _MAXBSIZE)) > 0) {
 			charct += len;
-			for (C = buf; len--; ++C)
-				if (*C == '\n')
+			for (C = buf; len--; ++C) {
+				if (*C == '\n') {
+					if (tmpll > longct)
+