bug#44704: uniq: replace repeated lines with a message about how many repeated lines

2020-11-17 Thread Brian J. Murrell
On Tue, 2020-11-17 at 14:10 -0800, Paul Eggert wrote:
> On 11/17/20 5:32 AM, Brian J. Murrell wrote:
>  > [previous line repeated 4 times]
> 
> uniq -c already does something like that, though it outputs "5"
> instead of "4". 

Right.  I had considered that.  Something like:

$ cat /tmp/in | uniq -c | while read c line; do
> echo $line
> if [ $c -gt 1 ]; then
> echo "Last line repeated $((c-1)) times"
> fi
> done

But that eats leading whitespace on $line.
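
One way around the whitespace loss, as a rough sketch only: read each record with IFS emptied so the shell does no field splitting, then peel the count off with parameter expansion. The function name repeat_notes is made up for illustration, and the sketch assumes uniq -c writes space padding, the count, one space, then the line (GNU coreutils does; other implementations may differ).

```shell
# Hypothetical helper (not part of coreutils): reads text on stdin, collapses
# adjacent duplicates via "uniq -c", and notes how many extra copies were dropped.
# Assumption: each "uniq -c" record is <space padding><count><one space><line>.
repeat_notes() {
    uniq -c | while IFS= read -r rec; do
        rec=${rec#"${rec%%[! ]*}"}   # strip the padding in front of the count
        count=${rec%% *}             # count: everything before the first space
        line=${rec#* }               # line: everything after that single space
        printf '%s\n' "$line"
        if [ "$count" -gt 1 ]; then
            printf '[previous line repeated %d times]\n' "$((count - 1))"
        fi
    done
}

# Demo: the indented first line keeps its two leading spaces.
printf '%s\n' '  indented' 'dup' 'dup' 'dup' 'last' | repeat_notes
```

Using printf rather than an unquoted echo also keeps runs of inner whitespace intact.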

> Not sure it's worth gussying up 'uniq' to provide exactly the
> functionality requested, as output reformatting is easy enough to do
> yourself using awk or Python or whatever.

Right.  But if I were going to pull out such a big hammer, I'd just,
again, eliminate uniq and do everything in awk or Python or whatever.

Anyway, it was just a suggestion.  Doesn't seem like it will go much of
anywhere.  That's fine.  If it really itched me enough, I guess I'd
just submit a patch.

Cheers,
b.





bug#44704: uniq: replace repeated lines with a message about how many repeated lines

2020-11-17 Thread Paul Eggert

On 11/17/20 5:32 AM, Brian J. Murrell wrote:
> [previous line repeated 4 times]

uniq -c already does something like that, though it outputs "5" instead of "4". 
Not sure it's worth gussying up 'uniq' to provide exactly the functionality 
requested, as output reformatting is easy enough to do yourself using awk or 
Python or whatever.






bug#44704: uniq: replace repeated lines with a message about how many repeated lines

2020-11-17 Thread Brian J. Murrell
On Tue, 2020-11-17 at 08:05 -0700, Assaf Gordon wrote:
> 
> Hello,

Hi,

> uniq supports the "--group" option, which adds a blank line after each
> group of identical lines - this can be used down-stream to process
> groups in any way you want.

But there is no way to have it remove the repeated lines also, correct?

By down-stream process, I feel like you are leaving it up to the
down-stream to remove the duplicate lines as well as add the "repeated %s
times" messages.  Is that correct?

If so, uniq really adds no value.  The down-stream might as well just
do the adjacent line comparison also in such a case.

> And with counting:
> 
> $ cat in | uniq --group=append \
>   | awk 'BEGIN { c = 0 } ;
>          $0=="" { print "Group has " c " lines" ; c=0 ; next } ;
>          1 { print ; c++ }'
>   first line
>   Group has 1 lines
>   second line
>   Group has 1 lines
>   repeated line
>   repeated line
>   repeated line
>   repeated line
>   repeated line
>   Group has 5 lines
>   third line
>   Group has 1 lines

This still doesn't really achieve the original stated goal as the
repeated lines are not being replaced by your "Group has %d lines".

I think once you add the repeated line suppression, you will see that
adding a simple adjacent line comparison and not using uniq at all is
only slightly more work in the down-stream (which is now the main
program).
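
To make that concrete, here is a minimal sketch of the whole job done in awk with no uniq involved. It only illustrates the adjacent line comparison described above; the names collapse_repeats and report are invented for the example, and the message text follows the wording from the original request.

```shell
# Hypothetical helper: collapses runs of identical adjacent lines on stdin and
# replaces the dropped copies with a single note. No uniq is used at all.
collapse_repeats() {
    awk '
        # Emit the pending note for the run that just ended, if any.
        function report() {
            if (n > 0) printf "[previous line repeated %d times]\n", n
            n = 0
        }
        # Same as the previous line: count it and suppress it.
        NR > 1 && $0 == prev { n++; next }
        # A new line: close out the previous run, then print and remember it.
        { report(); print; prev = $0 }
        END { report() }
    '
}

printf '%s\n' 'first line' 'repeated line' 'repeated line' 'repeated line' \
    'third line' | collapse_repeats
```

Because awk compares and prints $0 whole, leading whitespace survives untouched.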

Cheers,
b.





bug#44704: uniq: replace repeated lines with a message about how many repeated lines

2020-11-17 Thread Assaf Gordon

tag 44704 notabug
severity 44704 wishlist
stop

Hello,

On 2020-11-17 6:32 a.m., Brian J. Murrell wrote:

> It would be a useful enhancement to uniq to replace all lines
> considered non-uniq (i.e. those that would be removed from the output)
> with a message about how many times the previous line was repeated.
>
> I.e.
>
> $ cat <
[...]

uniq supports the "--group" option, which adds a blank line after each
group of identical lines - this can be used down-stream to process
groups in any way you want.

Example:
  $ cat <<EOF > in
  first line
  second line
  repeated line
  repeated line
  repeated line
  repeated line
  repeated line
  third line
  EOF

  $ cat in | uniq --group=append
  first line

  second line

  repeated line
  repeated line
  repeated line
  repeated line
  repeated line

  third line


  $ cat in | uniq --group=append \
    | awk '$0=="" { print "do something after group" ; next } ;
           1 { print }'
  first line
  do something after group
  second line
  do something after group
  repeated line
  repeated line
  repeated line
  repeated line
  repeated line
  do something after group
  third line
  do something after group

And with counting:

  $ cat in | uniq --group=append \
    | awk 'BEGIN { c = 0 } ;
           $0=="" { print "Group has " c " lines" ; c=0 ; next } ;
           1 { print ; c++ }'
  first line
  Group has 1 lines
  second line
  Group has 1 lines
  repeated line
  repeated line
  repeated line
  repeated line
  repeated line
  Group has 5 lines
  third line
  Group has 1 lines
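
The same pipeline can be pushed one step further so that only the first line of each group is printed and the rest are replaced by a note. This is a sketch, not a tested recipe: the function name group_notes is invented, "--group" is specific to GNU uniq, and a blank line in the input cannot be told apart from a group separator, so the sketch assumes the input has no empty lines.

```shell
# Hypothetical helper: relies on GNU "uniq --group=append", which writes an
# empty line after every group of identical lines.
# Assumption: the input itself contains no empty lines.
group_notes() {
    uniq --group=append | awk '
        # Empty line = end of group: report the suppressed copies, reset.
        $0 == "" {
            if (c > 1) printf "[previous line repeated %d times]\n", c - 1
            c = 0
            next
        }
        # Print only the first line of each group, but count every line.
        c++ == 0 { print }
    '
}

printf '%s\n' 'first line' 'second line' 'repeated line' 'repeated line' \
    'repeated line' 'repeated line' 'repeated line' 'third line' | group_notes
```

With the sample input from this message, that leaves one "repeated line" followed by "[previous line repeated 4 times]".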


Hope this helps.
More information about "uniq --group=X" is here:

https://www.gnu.org/software/coreutils/manual/html_node/uniq-invocation.html

I'm marking this as "notabug/wishlist", but will likely close soon as
"wontfix" unless we come up with a convincing argument why "--group"
is not sufficient for your use case.

Regardless of the status, discussion can continue by replying to this 
thread.


regards,
 - assaf






bug#44704: uniq: replace repeated lines with a message about how many repeated lines

2020-11-17 Thread Brian J. Murrell
It would be a useful enhancement to uniq to replace all lines
considered non-uniq (i.e. those that would be removed from the output)
with a message about how many times the previous line was repeated.

I.e.

$ cat <
