Following up, I noticed a pattern in the outputs of | sort | uniq -u versus | sort -u:

The three files I evaluated were 26.1GB, 12GB, and 2.0GB, respectively:
1. The original file, the result of grepping about 10GB of nMap output files, with many duplicates;
2. The | sort -u file; and
3. The | sort | uniq -u file, the smallest of the three.
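A minimal sketch on a toy input may help show where the size difference comes from. It assumes GNU coreutils, and the file name sample.txt and its addresses (taken from the 2001:db8::/32 documentation range) are hypothetical. As documented, sort -u keeps one copy of every distinct line, whereas uniq -u prints only the lines that are not repeated, so any address that occurred two or more times is dropped entirely. uniq -d prints the complement, one copy of each repeated line.

```shell
#!/bin/sh
# Hypothetical toy input: ::1 and ::2 appear twice, ::3 appears once.
export LC_ALL=C   # fixed collation so the output order is predictable
printf '%s\n' 2001:db8::1 2001:db8::2 2001:db8::1 2001:db8::3 2001:db8::2 > sample.txt

sort -u sample.txt           # one copy of each distinct line: 2001:db8::1, ::2, ::3
sort sample.txt | uniq -u    # only lines occurring exactly once: 2001:db8::3
sort sample.txt | uniq -d    # one copy of each repeated line: 2001:db8::1, ::2

# Every distinct line is either unique or repeated, so the counts should add up.
# $(( ... )) strips any padding that some wc implementations add.
distinct=$(( $(sort -u sample.txt | wc -l) ))
once=$(( $(sort sample.txt | uniq -u | wc -l) ))
repeated=$(( $(sort sample.txt | uniq -d | wc -l) ))
echo "distinct=$distinct once=$once repeated=$repeated"   # prints distinct=3 once=1 repeated=2
```

If the pipelines behave as documented, the same relation (distinct lines = uniq -u lines + uniq -d lines) should also hold for the real files.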

I applied comm (with no options):
comm IPv6-uniq.lns01.v6.018.net.il.txt IPv6-uniqB.lns01.v6.018.net.il.txt > IPv6-commAll.lns01.v6.018.net.il.txt

An excerpt from this command's output is attached. It has no column 2 (lines unique to the second, smaller file), and the entries in column 3 (lines common to both files), the sparser of the two columns that do appear, look no different from the entries above and below them.
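The comm result can be reproduced on a toy input. This is a sketch assuming GNU coreutils; the file names sample.txt, all.txt, and once.txt are hypothetical stand-ins for the real files. comm needs both inputs sorted with the same collation. Because every line printed by uniq -u also appears in the sort -u output, column 2 should be empty and column 3 should hold exactly the uniq -u lines.

```shell
#!/bin/sh
export LC_ALL=C   # sort and comm must agree on collation
printf '%s\n' 2001:db8::1 2001:db8::2 2001:db8::1 2001:db8::3 2001:db8::2 > sample.txt

sort -u sample.txt > all.txt          # every distinct line
sort sample.txt | uniq -u > once.txt  # lines that occur exactly once

# No options: column 1 = only in all.txt (no indent), column 2 = only in
# once.txt (one tab), column 3 = in both files (two tabs).
comm all.txt once.txt

# Suppress columns 1 and 3 to list lines found only in once.txt (expected: empty).
only_in_once=$(comm -13 all.txt once.txt)
# Suppress columns 1 and 2 to list the common lines (expected: same as once.txt).
common=$(comm -12 all.txt once.txt)
```

Here column 1 holds the addresses that were repeated in the input, which is why they look like ordinary entries next to the column-3 lines.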

Not to contradict man uniq's description of uniq -u, but I'm suspicious. I'll be using sort -u
from now on.
