# from Rich Shepard on Tuesday 30 October 2012:
> I have a large data file that contains duplicate rows. 'uniq' finds
>those rows that match character-by-character, but not those who match
>only on the first three fields (separated by '|').
Hi Rich,
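perl -e 'while(<>) {
  # key = first three "|"-separated fields (limit 4 leaves the rest unsplit)
  my $k = join "|", (split /\|/, $_, 4)[0..2];
  # print a line only the first time its key is seen
  print unless $seen{$k}++
}'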
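(untested) That should print only the first line seen for each $k, where
$k is the first three fields joined with '|'; later lines with the same
key are dropped. Give it the file name after the closing quote, or pipe
the data in on stdin.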
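A quick way to try it out, as a rough sketch only: the file names
sample.txt and deduped.txt and the four data rows below are made up for
illustration and are not from your data. Rows 1 and 3 share the first
three fields, so only row 1 should survive.

```shell
# Hypothetical sample data: rows 1 and 3 match on the first three fields.
cat > sample.txt <<'EOF'
a|b|c|first
a|b|d|second
a|b|c|third
x|y|z|fourth
EOF

# Same one-liner, reading the file named on the command line.
perl -e 'while(<>) {
  my $k = join "|", (split /\|/, $_, 4)[0..2];
  print unless $seen{$k}++
}' sample.txt > deduped.txt

# Expect three lines: first, second, fourth (third is dropped).
cat deduped.txt
```

Input order is preserved, so the file does not need to be sorted first,
but the %seen hash holds one entry per distinct key in memory.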
--Eric
--
---------------------------------------------------
http://scratchcomputing.com
---------------------------------------------------