Re: [PLUG] Removing Duplicate Rows from SQL Dump

Fred James Tue, 16 Aug 2011 07:12:56 -0700

Rich Shepard wrote:
> On Tue, 16 Aug 2011, Rich Shepard wrote:
>
>   
>>   This will work for all completely duplicated lines. I'll need to see how
>> many remain that vary in one or more columns ('fields') such as the
>> parameter, lab_id number, or qa_qc.
>>     
>
>    I had manually cleaned up a bunch of lines so the source file had 12,119
> lines. After running uniq the output file has 8,605 lines, about 1/3 fewer.
>
>    The need to remove almost duplicates/triplicates, based on the same values
> in three columns regardless of the rest of the contents, remains.
>
> Rich
>   
Rich Shepard
If this is a one-off (once and done), one might consider a spreadsheet 
sorted on the three columns in question ... a little rough around the 
edges, on the down-and-dirty side, but it should work.
If this is going to be ongoing, one might consider (G)AWK ... a glance 
at the "content" supplied suggests that the values might be delimited by 
either a tab or a space, and that there may not be any spaces within a 
field?  Otherwise one might have to treat the data as fixed-length 
records ... is that possible?
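For what it is worth, here is a minimal sketch of the AWK idea, assuming the
fields really are separated by tabs or spaces and contain no embedded spaces.
The function name dedup_by_keys and the column numbers are placeholders of
mine, not anything taken from your data; substitute the positions of the three
columns you care about. It keeps the first row seen for each combination of
the three key columns and drops later ones, whatever the rest of the row says.

```shell
# Hypothetical helper: keep only the first row for each distinct combination
# of three key columns. Arguments are the 1-based column numbers (assumed
# values; they default to columns 1, 2 and 3).
dedup_by_keys() {
    awk -v a="${1:-1}" -v b="${2:-2}" -v c="${3:-3}" '
        # Default field splitting: runs of spaces or tabs separate fields.
        # A row is printed only the first time its key triple appears.
        !seen[$a, $b, $c]++
    '
}

# Example call (file names are placeholders):
#   dedup_by_keys 2 5 7 < dump.txt > deduped.txt
```

Unlike uniq, this does not need sorted input and it preserves the original
row order, but which of the near-duplicates survives is simply whichever
comes first in the file.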
Hope this helps
Regards
Fred James



_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
