bug#44704: uniq: replace repeated lines with a message about how many repeated lines
On 11/18/20 12:25 PM, Chris Elvidge wrote:
> You could write your own function to do it. E.g.
>
> unique() {
>     [ "$1" ] || { echo "Needs a readable file to test" && return 1; }
>     [ -r "$1" ] || { echo "Needs a readable file to test" && return 1; }
>     R=""; N=0
>     while IFS=$'\n' read L; do
>         [ "$L" = "$R" ] && { ((N++)); continue; }
>         [ "$N" -gt 0 ] && { echo "[Previous line repeated $N times]"; N=0; }
>         R="$L"
>         echo "$L"
>     done <$1
> }

Nice. The UNIX toolbox is diverse. ;-)

I'd use:

    awk '
      function p(n) {
        if (n > 1) {
          printf("[previous line repeated %d times]\n", n-1);
        }
      }
      {
        if (line != $0) {
          p(n);
          n = 0;
        }
        line = $0;
        if (n == 0)
          print
        n++;
      }
      END { p(n); }
    '

Have a nice day,
Berny
bug#44704: uniq: replace repeated lines with a message about how many repeated lines
On 17/11/2020 01:32 pm, Brian J. Murrell wrote:
> It would be a useful enhancement to uniq to replace all lines
> considered non-uniq (i.e. those that would be removed from the output)
> with a message about how many times the previous line was repeated.
>
> I.e.
>
> $ cat <

You could write your own function to do it. E.g.

    unique() {
        [ "$1" ] || { echo "Needs a readable file to test" && return 1; }
        [ -r "$1" ] || { echo "Needs a readable file to test" && return 1; }
        R=""; N=0
        while IFS=$'\n' read L; do
            [ "$L" = "$R" ] && { ((N++)); continue; }
            [ "$N" -gt 0 ] && { echo "[Previous line repeated $N times]"; N=0; }
            R="$L"
            echo "$L"
        done <$1
    }

--
Chris Elvidge
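The function above never reports a run of repeats that ends the file, treats an empty first line as a repeat of the initial R="", and lets read strip leading whitespace and interpret backslashes. A minimal sketch of one way to tighten it follows. The name unique_counted and the variable names are made up for illustration; this is not code from the thread.

```shell
# Hypothetical variant of unique(); unique_counted is an illustrative name.
unique_counted() {
    [ -r "$1" ] || { echo "Needs a readable file to test" >&2; return 1; }
    local prev="" count=0 first=1 line
    # IFS= and -r keep leading whitespace and backslashes intact; the
    # "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$first" -eq 0 ] && [ "$line" = "$prev" ]; then
            count=$((count + 1))
            continue
        fi
        if [ "$count" -gt 0 ]; then
            echo "[Previous line repeated $count times]"
        fi
        count=0 first=0 prev=$line
        printf '%s\n' "$line"
    done < "$1"
    # Flush the count for a run of repeats that ends the file.
    if [ "$count" -gt 0 ]; then
        echo "[Previous line repeated $count times]"
    fi
    return 0
}
```

The first flag is only there so that a file starting with an empty line prints that line instead of counting it as a repeat.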
bug#44704: uniq: replace repeated lines with a message about how many repeated lines
On Tue, 2020-11-17 at 14:10 -0800, Paul Eggert wrote:
> On 11/17/20 5:32 AM, Brian J. Murrell wrote:
> > [previous line repeated 4 times]
>
> uniq -c already does something like that, though it outputs "5"
> instead of "4".

Right. I had considered that. Something like:

    $ cat /tmp/in | uniq -c | while read c line; do
    >     echo $line
    >     if [ $c -gt 1 ]; then
    >         echo "Last line repeated $((c-1)) times"
    >     fi
    > done

But that eats leading whitespace on $line.

> Not sure it's worth gussying up 'uniq' to provide exactly the
> functionality requested, as output reformatting is easy enough to do
> yourself using awk or Python or whatever.

Right. But if I were going to pull out such a big hammer, I'd just
again, eliminate uniq and do everything in awk or Python or whatever.

Anyway, it was just a suggestion. Doesn't seem like it will go much of
anywhere. That's fine. If it really itched me enough, I guess I'd just
submit a patch.

Cheers,
b.
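The whitespace loss in the loop above comes from letting read split the record into two fields. A sketch of one workaround, assuming uniq -c prints the count, padded with leading spaces, followed by exactly one space and then the original line (which is what GNU uniq does): read the whole record and split it by hand. The function name count_repeats is hypothetical.

```shell
# Hypothetical helper: reads stdin, writes the collapsed lines to stdout.
count_repeats() {
    uniq -c | while IFS= read -r rec; do
        # Drop the space padding in front of the count.
        rec=${rec#"${rec%%[! ]*}"}
        count=${rec%% *}    # digits before the first space
        line=${rec#* }      # everything after that single space
        printf '%s\n' "$line"
        if [ "$count" -gt 1 ]; then
            echo "Last line repeated $((count - 1)) times"
        fi
    done
}
```

Only the one separator space after the count is removed, so any indentation that belongs to the line itself survives.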
bug#44704: uniq: replace repeated lines with a message about how many repeated lines
On 11/17/20 5:32 AM, Brian J. Murrell wrote:
> [previous line repeated 4 times]

uniq -c already does something like that, though it outputs "5" instead
of "4".

Not sure it's worth gussying up 'uniq' to provide exactly the
functionality requested, as output reformatting is easy enough to do
yourself using awk or Python or whatever.
bug#44704: uniq: replace repeated lines with a message about how many repeated lines
On Tue, 2020-11-17 at 08:05 -0700, Assaf Gordon wrote:
>
> Hello,

Hi,

> uniq supports the "--group" option, which adds a blank line after
> each group of identical lines - this can be used down-stream to
> process groups in any way you want.

But there is no way to have it remove the repeated lines also, correct?

By down-stream process, I feel like you are leaving it up to the
down-stream to remove the duplicate lines as well as add the "repeated
%s times" messages. Is that correct?

If so, uniq really adds no value. The down-stream might as well just do
the adjacent line comparison also in such a case.

> And with counting:
>
> $ cat in | uniq --group=append \
>     | awk 'BEGIN { c = 0 } ;
>            $0=="" { print "Group has " c " lines" ; c=0 ; next } ;
>            1 { print ; c++ }'
> first line
> Group has 1 lines
> second line
> Group has 1 lines
> repeated line
> repeated line
> repeated line
> repeated line
> repeated line
> Group has 5 lines
> third line
> Group has 1 lines

This still doesn't really achieve the original stated goal as the
repeated lines are not being replaced by your "Group has %d lines".

I think once you add the repeated line suppression, you will see that
adding a simple adjacent line comparison and just not using uniq at all
is only slightly incrementally more in the down-stream (which is now
the main).

Cheers,
b.
bug#44704: uniq: replace repeated lines with a message about how many repeated lines
tag 44704 notabug
severity 44704 wishlist
stop

Hello,

On 2020-11-17 6:32 a.m., Brian J. Murrell wrote:
> It would be a useful enhancement to uniq to replace all lines
> considered non-uniq (i.e. those that would be removed from the output)
> with a message about how many times the previous line was repeated.
>
> I.e.
>
> $ cat < [...]

uniq supports the "--group" option, which adds a blank line after each
group of identical lines - this can be used down-stream to process
groups in any way you want.

Example:

    $ cat <<EOF > in
    first line
    second line
    repeated line
    repeated line
    repeated line
    repeated line
    repeated line
    third line
    EOF

    $ cat in | uniq --group=append
    first line

    second line

    repeated line
    repeated line
    repeated line
    repeated line
    repeated line

    third line

    $ cat in | uniq --group=append \
        | awk '$0=="" { print "do something after group" ; next } ;
               1 { print }'
    first line
    do something after group
    second line
    do something after group
    repeated line
    repeated line
    repeated line
    repeated line
    repeated line
    do something after group
    third line
    do something after group

And with counting:

    $ cat in | uniq --group=append \
        | awk 'BEGIN { c = 0 } ;
               $0=="" { print "Group has " c " lines" ; c=0 ; next } ;
               1 { print ; c++ }'
    first line
    Group has 1 lines
    second line
    Group has 1 lines
    repeated line
    repeated line
    repeated line
    repeated line
    repeated line
    Group has 5 lines
    third line
    Group has 1 lines

Hope this helps.

More information about "uniq --group=X" is here:
https://www.gnu.org/software/coreutils/manual/html_node/uniq-invocation.html

I'm marking this as "notabug/wishlist", but will likely close soon as
"wontfix" unless we come up with convincing argument why "--group" is
not sufficient for your use case.

Regardless of the status, discussion can continue by replying to this
thread.

regards,
 - assaf
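For comparison with the counting example above, here is a sketch of how the same --group=append pipeline could also suppress the repeated lines, printing only the first line of each group plus a count. It assumes a uniq that supports --group (GNU coreutils), and the function name collapse_groups is made up for illustration.

```shell
# Hypothetical helper: reads stdin, prints each group's first line and,
# for groups longer than one line, how many repeats were dropped.
collapse_groups() {
    uniq --group=append | awk '
        # An empty line ends a group: report the repeats, reset the count.
        $0 == "" {
            if (c > 1) printf "[previous line repeated %d times]\n", c - 1
            c = 0
            next
        }
        # Print only the first line of each group.
        { if (c == 0) print; c++ }'
}
```

One limitation of this approach: empty lines in the input look the same as the group separators, so input that contains empty lines is not handled correctly by this sketch.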
bug#44704: uniq: replace repeated lines with a message about how many repeated lines
It would be a useful enhancement to uniq to replace all lines
considered non-uniq (i.e. those that would be removed from the output)
with a message about how many times the previous line was repeated.

I.e.

$ cat <