Re: Quick question about sort and large files

Charles Galpin Tue, 14 Nov 2000 06:47:59 -0800
I'm not sure if sorting first actually buys you anything. But it should
work fine. Keep 3 shells open, 1 to run the command from, one running top,
and the other ready to kill it :).
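
If the sort really isn't needed, a one-pass filter is one way to go. A
minimal sketch, assuming one record per line with the date in field 1 and
the email address in field 8 (as in the sample quoted below); dedupe_votes
and the file names are made up for illustration:

```sh
# Keep only the first entry per (date, email) pair, in one pass, no sort.
# Assumption: '|' separates fields, field 1 is the date, field 8 the email.
dedupe_votes() {
    # seen[key]++ is 0 (false) the first time a key shows up, so the
    # negation prints the line only on its first occurrence.
    awk -F'|' '!seen[$1 FS $8]++' "$@"
}

# Usage (hypothetical file names):
#   dedupe_votes votes.txt > votes.dedup.txt
```

This keeps the first line seen for each date+email pair and leaves the
original order alone. It holds one key per distinct pair in memory, and it
compares the fields as plain text, so 7/17/0 and 07/17/0 count as
different days unless the dates are normalized first.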

I'd be more concerned with how you are collecting this data - i.e. how
easy is it for the same person to provide different info etc. Usually it's
best to prevent the duplicates up front.

hth
charles

On Tue, 14 Nov 2000, Gary Nielson wrote:

> I think I have this right, but before I run it on a 53-meg text file, I
> want to make sure. On a test of about a dozen lines, it seems to work
> just fine.
> 
> I need to sort a file that is 456,193 lines long. People were allowed
> to enter a contest, but not more than once a day. Their names and the
> date were logged to this file. Some people voted more than once a day,
> on many days. Others, just once a day. I am getting rid of duplicates
> for the same day, discarding the second and more votes of each person
> per day.
> 
> The file looks like so:
> 
> 7/17/0|First Last|xxxx Vinca
> Cr|Charlotte|NC|28213|NA|[EMAIL PROTECTED]|name4
> 07/17/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|name1
> 07/17/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|name1
> 07/17/0|dd|dd|dd|NC|dd|dd|[EMAIL PROTECTED]|name5
> 07/17/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|name2
> 07/17/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|name2
> 07/17/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|steve_park
> 07/17/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|steve_park
> 07/18/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|steve_park
> 007/18/0|M Settle|0000 Audubon
> Dr.|Foley|AL|36535|555-555-5000|[EMAIL PROTECTED]|name6
> 07/18/0|d|d|d|NC|d|d|[EMAIL PROTECTED]
> 07/19/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|name2
> 07/20/0|N Sipe|900 Way Cr|Charlotte|NC|28213|NA|[EMAIL PROTECTED]|name3
> 
> To strip out duplicate entries by someone on the same day, I am sorting
> this file first by the email address, and then by the first field (yeah,
> I know it should have been a date field, but the script apparently
> wasn't y2k-compatible and it really doesn't matter anyway). I am doing
> the following:
> 
> sort -t \| -k 8 -k 1 test.txt | uniq
> 
> It seems to work, so I am presuming it will work. Have I overlooked
> anything? But will I run into problems given the huge size of this
> file, as I said, 53-megs, sorting names from top to bottom each time? If
> it is a problem, any advice appreciated. Is there another approach I
> should be taking? Please email reply as well.
> 
> Gary
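
A note on the command above: -k 8 with no end field makes the key run from
field 8 to the end of the line, and uniq only drops adjacent lines that are
identical in full, so two entries from the same address on the same day
that differ in any other field would both survive. A hedged sketch of a
keyed variant, assuming the email is field 8 and the date is field 1;
sort_dedupe_votes is a made-up name:

```sh
# Sort by email (field 8 only), then by date (field 1 only), and keep one
# line per distinct (email, date) key.
# Assumption: '|' separates fields and every record sits on a single line.
sort_dedupe_votes() {
    # LC_ALL=C makes sort compare raw bytes instead of locale collation.
    LC_ALL=C sort -t '|' -k 8,8 -k 1,1 -u "$@"
}

# Usage (hypothetical file names):
#   sort_dedupe_votes test.txt > test.dedup.txt
```

With -u, sort keeps one line from each set of lines whose keys compare
equal; which one is not something to rely on. GNU sort spills to temporary
files when the input outgrows its buffer, so 53 megs should not be a
problem in itself; -T names the directory to use if /tmp is short on space.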




_______________________________________________
Redhat-list mailing list
[EMAIL PROTECTED]
https://listman.redhat.com/mailman/listinfo/redhat-list