Re: [PLUG] Finding partial duplicate rows with uniq

Eric Wilhelm Tue, 30 Oct 2012 18:10:22 -0700

# from Rich Shepard on Tuesday 30 October 2012:
>   I have a large data file that contains duplicate rows. 'uniq' finds
>those rows that match character-by-character, but not those that match
>only on the first three fields (separated by '|').


Hi Rich,

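  perl -e 'while(<>) {
    # key: the first three "|"-separated fields, joined back together
    my $k = join "|", (split /\|/, $_, 4)[0..2];
    # print a row only the first time its key is seen
    print unless $seen{$k}++
  }'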

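(untested) That should print only the first row seen for each $k, where
$k is the first three '|'-separated fields. Since the loop reads from <>,
give it the data file as an argument or pipe the data in on stdin.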
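
For comparison, here is a rough awk sketch of the same "first row per
key" idea. It is a swapped-in alternative, not the Perl above. The
function name dedup3 and the sample rows are made up for illustration.

```shell
# dedup3 (hypothetical name): print the first row seen for each distinct
# combination of the first three '|'-separated fields.
dedup3() {
  # seen[key]++ is 0 (false) the first time a key appears, so the
  # negated pattern is true only for that first row, which awk prints.
  awk -F'|' '!seen[$1 FS $2 FS $3]++' "$@"
}

# Made-up sample rows: the second differs from the first only in field 4.
printf '%s\n' 'a|b|c|1' 'a|b|c|2' 'a|b|d|3' | dedup3
# prints:
#   a|b|c|1
#   a|b|d|3
```

Like the Perl version, this keeps input order and needs no sorting, at
the cost of holding one key per distinct row in memory.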

--Eric
-- 
---------------------------------------------------
    http://scratchcomputing.com
---------------------------------------------------
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
