Re: [PLUG] Finding partial duplicate rows with uniq

Scott Bigelow Tue, 30 Oct 2012 17:17:18 -0700

uniq also has the "-w" flag, which instructs it to only compare the first N
characters in a line:


       -w, --check-chars=N
              compare no more than N characters in lines


Although if the first three fields vary in length from row to row, a fixed
character count won't line up with them, and it probably won't work as well
as Dale's solution.
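
A rough sketch of the difference, not tested against Rich's real file. The
sample rows and the function names below are made up for illustration, and
-w is a GNU uniq option, so it may not exist on other systems. The last
function is a third option that neither Dale nor I mentioned above: sort's
own -t, -k and -u flags, which key on the fields directly and keep the
whole row.

```shell
#!/bin/sh
# Hypothetical sample rows: location ID|date|chemical|concentration.
sample() {
cat <<'EOF'
A01|2012-10-01|arsenic|0.010
A01|2012-10-01|arsenic|0.012
A01|2012-10-02|arsenic|0.011
B2|2012-10-01|lead|0.500
B2|2012-10-01|lead|0.700
EOF
}

# uniq -w N (GNU only): compare just the first N characters of each line.
# "A01|2012-10-01|arsenic" is 22 characters wide, so N=22 catches the A01
# duplicates. "B2|2012-10-01|lead" is only 18 wide, so the 22-character
# window spills into the concentration field and the B2 rows look different.
dedupe_fixed_width() {
    LC_ALL=C sort | uniq -w 22
}

# Dale's suggestion: keep only the first three fields, then sort and uniq.
# The concentration column is discarded.
dedupe_keys_only() {
    cut -d'|' -f1-3 | LC_ALL=C sort | uniq
}

# Alternative: let sort compare only fields 1 through 3 and emit one row
# per distinct key, keeping the rest of the row.
dedupe_by_fields() {
    LC_ALL=C sort -t'|' -k1,3 -u
}
```

Piping the sample through each one, e.g. "sample | dedupe_fixed_width",
shows the fixed-width version leaving both B2 rows in place, while the two
field-based versions reduce the file to one line per location, date, and
chemical.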

On Tue, Oct 30, 2012 at 5:10 PM, Dale Snell <[email protected]> wrote:

> On Tue, 30 Oct 2012 16:17:08 -0700 (PDT)
> Rich Shepard <[email protected]> wrote:
>
> >    I have a large data file that contains duplicate rows. 'uniq'
> > finds those rows that match character-by-character, but not those that
> > match only on the first three fields (separated by '|'). There are
> > rows with the same location ID, date, and chemical that have
> > different concentrations listed, and I need to cull the duplicated
> > records based on the first three fields after the file's been sorted
> > on those fields.
> >
> >    The uniq man page doesn't show me how to do this; the information
> > may well be there and I'm not seeing it properly.
> >
> >    Recommendations appreciated.
> >
> > Rich
>
> Rich,
>
> From the above, may I take it that any data other than the first three
> fields is irrelevant?  If so, use cut(1) to write those fields,
> line-by-line, to a scratch file.  Then sort(1) said file, and use
> uniq(1) to delete the duplicate lines.
>
> Just off the top of my head,
>
> --Dale
>
> --
> Keyboard failure.  Press F1 to continue.
> _______________________________________________
> PLUG mailing list
> [email protected]
> http://lists.pdxlinux.org/mailman/listinfo/plug
>
