uniq also has the "-w" flag, which tells it to compare only the first N
characters of each line:

    -w, --check-chars=N
           compare no more than N characters in lines

However, if your fields vary much in length from row to row, no single N
will cover exactly the first three fields, so it probably won't work as
well as Dale's solution.
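
Here is a minimal sketch of the -w approach. The sample rows are made up
for illustration (location ID | date | chemical | concentration), and it
assumes GNU uniq (-w is not in POSIX) and that the first three fields
are the same width on every row:

```shell
# Hypothetical sample data; the key "A01|2012-10-01|As" is
# 3 + 1 + 10 + 1 + 2 = 17 characters wide on every row.
data='B02|2012-10-02|As|0.3
A01|2012-10-01|As|0.7
A01|2012-10-01|Pb|1.2
A01|2012-10-01|As|0.5
B02|2012-10-02|As|0.3'

# Sort on the first three '|'-separated fields so rows with the same key
# are adjacent, then let uniq compare only the first 17 characters.
# uniq keeps the first row of each run of matching keys.
result=$(printf '%s\n' "$data" | LC_ALL=C sort -t'|' -k1,3 | uniq -w 17)
printf '%s\n' "$result"
```

Which concentration survives depends on the sort order within each key,
so check that the first row of each group is the one you want to keep.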
On Tue, Oct 30, 2012 at 5:10 PM, Dale Snell <[email protected]> wrote:
> On Tue, 30 Oct 2012 16:17:08 -0700 (PDT)
> Rich Shepard <[email protected]> wrote:
>
> > I have a large data file that contains duplicate rows. 'uniq'
> > finds those rows that match character-by-character, but not those that
> > match only on the first three fields (separated by '|'). There are
> > rows with the same location ID, date, and chemical that have
> > different concentrations listed, and I need to cull the duplicated
> > records based on the first three fields after the file's been sorted
> > on those fields.
> >
> > The uniq man page doesn't show me how to do this; the information
> > may well be there and I'm not seeing it properly.
> >
> > Recommendations appreciated.
> >
> > Rich
>
> Rich,
>
> From the above, may I take it that any data other than the first three
> fields is irrelevant? If so, use cut(1) to write those fields,
> line-by-line, to a scratch file. Then sort(1) said file, and use
> uniq(1) to delete the duplicate lines.
>
> Just off the top of my head,
>
> --Dale
>
> --
> Keyboard failure. Press F1 to continue.
> _______________________________________________
> PLUG mailing list
> [email protected]
> http://lists.pdxlinux.org/mailman/listinfo/plug
>
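
For reference, Dale's cut/sort/uniq suggestion quoted above might look
roughly like this. The sample rows are invented, with fields of varying
width. The last command is an alternative that is not from the thread:
sort's own -u flag with a key restricted to fields 1-3, which keeps one
full row per key instead of only the key fields.

```shell
# Hypothetical sample data: location ID | date | chemical | concentration.
data='SW-1|2012-10-01|Arsenic|0.5
SW-1|2012-10-01|Arsenic|0.7
SW-1|2012-10-01|Lead|1.2
NE-10|2012-10-02|Arsenic|0.3'

# Dale's approach: keep only the first three fields, sort, drop duplicates.
# The concentration column is discarded.
keys=$(printf '%s\n' "$data" | cut -d'|' -f1-3 | LC_ALL=C sort | uniq)
printf '%s\n' "$keys"

# Alternative (an assumption, not Dale's method): sort on fields 1-3 and
# let -u keep a single full row for each distinct key.
rows=$(printf '%s\n' "$data" | LC_ALL=C sort -t'|' -k1,3 -u)
printf '%s\n' "$rows"
```

Neither version needs the fields to be a fixed width, since both split on
the '|' delimiter rather than counting characters.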