do it outside basic, using grep:

$ grep -F -f pattern-file csv-file > remove-file

the pattern file holds the pieces to match, one per line.  the catch:
what if you're excluding something that's not unique?  "smith" would also
exclude "smithers", "smithy", "psmith" (one for the wodehouse fans :-), etc.

i do this with some huge syslog files and fairly big pattern files, and
it's pretty darn quick.
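
A rough Python illustration of the "smith" problem and one possible mitigation. Plain fixed-string matching (what grep -F does) hits any row that contains the piece anywhere, while requiring word boundaries does not. GNU grep's -w option asks for whole-word matches in a similar spirit, and -v inverts the match so the output is the rows to keep rather than the rows to remove. The function names and sample rows below are made up for illustration.

```python
import re


def substring_hits(line, patterns):
    """Patterns found anywhere in the line, like plain fixed-string matching."""
    return [p for p in patterns if p in line]


def whole_word_hits(line, patterns):
    """Patterns found only as whole words (regex word boundaries on both sides)."""
    return [p for p in patterns
            if re.search(r"\b" + re.escape(p) + r"\b", line)]


# Hypothetical sample data: one pattern, three CSV-ish rows.
patterns = ["smith"]
rows = [
    "smith,john,12345",
    "smithers,waylon,12345",
    "psmith,rupert,12345",
]
for row in rows:
    print(row, substring_hits(row, patterns), whole_word_hits(row, patterns))
```

Only the first row survives the whole-word test. Word boundaries still can't tell a surname from a street named Smith, though; comparing the parsed last-name field exactly is the stricter option.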

ian

On Tue, 2004-01-27 at 10:33, George Gallen wrote:
> I can't set up any indexes to speed this up. Basically I'm scanning a
> CSV file for names to remove and setting the flag KICK=1 to remove
> each one (creating a new CSV file at the same time).
> 
> Keep in mind the ".." are people's last names, or zip codes, or part
> of their address; I changed them to ".." to protect the unwanting...
> 
> Right now, I do a series of CASEs...
> Now, it's not a major problem, as I'm only checking for 20 or so
> names, but as more and more people request to be removed (and we
> don't have access to the creation of the list), this could get quite
> slow over 50 or 60 thousand lines of checking.
> 
> LIN is one line of the CSV file; the INDEX calls check for a last
> name and a zip code, and sometimes part of the address line.
> 
> Any ideas?
> 
> Remember, we can't change the source of the file: it will always be
> a CSV, read line by line.
> 
>    KICK=0
>    BEGIN CASE
>       CASE -1
>          KICK=1
>          BEGIN CASE
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE -1
>                KICK=0
>          END CASE
>    END CASE
> 
> George Gallen
> Senior Programmer/Analyst
> Accounting/Data Division
> [EMAIL PROTECTED]
> ph:856.848.1000 Ext 220
> 
> SLACK Incorporated - An innovative information, education and management company
> http://www.slackinc.com
> 
> _______________________________________________
> u2-users mailing list
> [EMAIL PROTECTED]
> http://www.oliver.com/mailman/listinfo/u2-users
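
Going back to the CASE block quoted above: each CASE is an AND of INDEX(...)#0 tests, and the cases together act as an OR. Below is a minimal Python sketch of the same rule driven by data instead of code, so each new removal request becomes one more line in a file rather than one more CASE. It assumes a hypothetical rule format (one request per line, with pieces such as last name and zip separated by tabs). load_rules, should_kick, filter_rows and the sample rows are illustrative names and data, not anything from the original post.

```python
def load_rules(lines):
    """Parse hypothetical rule lines: tab-separated pieces that must ALL appear."""
    rules = []
    for raw in lines:
        pieces = tuple(p for p in raw.rstrip("\r\n").split("\t") if p)
        if pieces:  # skip blank lines
            rules.append(pieces)
    return rules


def should_kick(line, rules):
    """True (KICK=1) if every piece of any one rule occurs in the line,
    the same test as a CASE of ANDed INDEX(LIN,piece,1)#0 checks."""
    return any(all(piece in line for piece in rule) for rule in rules)


def filter_rows(rows, rules):
    """Split rows into (kept, removed); 'kept' is what the new CSV would hold."""
    kept, removed = [], []
    for row in rows:
        (removed if should_kick(row, rules) else kept).append(row)
    return kept, removed


# Hypothetical rule file contents and CSV rows.
rules = load_rules(["jones\t12345\n", "brown\t54321\tmain st\n"])
rows = [
    "jones,ann,12 oak ave,12345",
    "jones,bob,4 elm rd,90210",
    "brown,cy,9 main st,54321",
]
kept, removed = filter_rows(rows, rules)
print("kept:", kept)
print("removed:", removed)
```

This still tests every rule against every row, so it mainly helps maintainability rather than speed. It also uses plain substring tests, so it inherits the "smith" caveat from the start of this reply.
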
-- 
Ian McGowan <[EMAIL PROTECTED]>
