Hi George,

We do some processing of 4-5MB files in D3 and UniVerse. We found one of the
quicker ways to process these files was to read about 2K of data at a time.
You then need to identify the last complete line, work with everything before
it, and keep the remainder for the next processing chunk.
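Something along these lines is what I have in mind - a rough, untested sketch,
where CSVDIR/INPUT.CSV, CHUNK.SIZE and the per-line processing are just
placeholders, and where you would want to check how READBLK handles the final
short block on your release:

   OPENSEQ "CSVDIR", "INPUT.CSV" TO SEQ.FILE ELSE STOP "Cannot open file"
   CHUNK.SIZE = 2048
   LEFTOVER = ""
   LOOP
      READBLK CHUNK FROM SEQ.FILE, CHUNK.SIZE ELSE EXIT
      BUFFER = LEFTOVER:CHUNK
      * Work only with complete lines; keep the tail for the next chunk
      NL.COUNT = COUNT(BUFFER, CHAR(10))
      IF NL.COUNT = 0 THEN
         LEFTOVER = BUFFER
         CONTINUE
      END
      LAST.NL = INDEX(BUFFER, CHAR(10), NL.COUNT)
      WORK = BUFFER[1, LAST.NL - 1]
      LEFTOVER = BUFFER[LAST.NL + 1, LEN(BUFFER)]
      CONVERT CHAR(10):CHAR(13) TO @AM IN WORK   ;* LF -> attribute mark, drop CR
      NO.LINES = DCOUNT(WORK, @AM)
      FOR I = 1 TO NO.LINES
         LIN = WORK<I>
         * ... apply the KICK tests to LIN here ...
      NEXT I
   REPEAT
   * Anything still in LEFTOVER at this point is a final, unterminated line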

As far as finding matches goes, you have one, two or three pieces of data to
match on, so start with just one; only if you score a match do you look
further. This reduces your processing to quickly finding anything that might
match, rather than having to test every criterion on every line. Processing in
2K chunks also means you can INDEX the whole chunk for "SMITH" and move on
quickly if there are none, rather than checking each line for "SMITH" + "3333"
and "SMITH" + "MENERE ST".
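For example, something like this (untested, using the sample values above and
the WORK/NO.LINES variables from the chunked-read sketch):

   * If "SMITH" appears nowhere in this 2K buffer, no line in it can match,
   * so the per-line tests can be skipped and the buffer passed straight through
   IF INDEX(WORK, "SMITH", 1) THEN
      FOR I = 1 TO NO.LINES
         LIN = WORK<I>
         KICK = 0
         * Cheap surname test first; only a hit pays for the detail tests
         IF INDEX(LIN, "SMITH", 1) THEN
            IF INDEX(LIN, "3333", 1) OR INDEX(LIN, "MENERE ST", 1) THEN KICK = 1
         END
         * Lines left with KICK = 0 go to the new CSV; KICK = 1 lines are dropped
      NEXT I
   END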

I would avoid relying on INDEX on a line-by-line basis. I would also look at
what information you usually get and consider building a file of records where
the item ID is the key search string. Where you have more than one opt-out
against the same key, you can then use either multivalues or attributes to
hold the other search criteria.
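A rough sketch of what I mean, assuming an opt-out file I have called
KICK.LIST (a made-up name) keyed by last name, with zip codes in attribute 1
and street fragments in attribute 2, multivalued in step, and assuming purely
for illustration that the last name sits in the second CSV field:

   * Once, before the main loop
   OPEN "KICK.LIST" TO F.KICK ELSE STOP "Cannot open KICK.LIST"

   * Then, for each CSV line
   KICK = 0
   LAST.NAME = TRIM(UPCASE(FIELD(LIN, ",", 2)))   ;* assumed column position
   READ KICK.REC FROM F.KICK, LAST.NAME THEN
      * The surname is on the opt-out list; confirm the other criteria
      NO.VALS = DCOUNT(KICK.REC<1>, @VM)
      FOR V = 1 TO NO.VALS
         ZIP = KICK.REC<1,V>
         ADDR = KICK.REC<2,V>
         * Every stored, non-empty criterion for this entry must appear in LIN
         HIT = 1
         IF ZIP # "" AND NOT(INDEX(LIN, ZIP, 1)) THEN HIT = 0
         IF ADDR # "" AND NOT(INDEX(LIN, ADDR, 1)) THEN HIT = 0
         IF HIT THEN
            KICK = 1
            EXIT
         END
      NEXT V
   END

Adding a new opt-out then becomes a data change to KICK.LIST rather than
another CASE line in the program.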

It sounds a little complicated, but it breaks the job into smaller pieces to
be solved and will, I believe, require less processing in the long run.

Good luck

T.

----- Original Message ----- 
From: "George Gallen" <[EMAIL PROTECTED]>
To: "'Ardent List'" <[EMAIL PROTECTED]>
Sent: Wednesday, January 28, 2004 5:33 AM
Subject: looking for faster Ideas...


> I can't set up any indexes to speed this up. Basically I'm scanning a CSV
> file for names to remove, and setting the flag KICK=1 to remove a line
> (creating a new CSV file at the same time).
>
> Keep in mind the ".." are people's last names, or zip codes, or parts of
> their addresses; I changed them to ".." to protect the unwanting...
>
> Right now, I do a series of CASE's ...
> It's not a major problem at the moment as I'm only checking for 20 or so
> names, but as more and more people request to be removed (and we don't have
> access to the creation of the list), this could get quite slow over 50 or 60
> thousand lines of checking.
>
> LIN is one line of the CSV file; the INDEX is checking for a last name and a
> zip code, and sometimes part of the address line.
>
> Any Ideas?
>
> Remember, we can't change the source of the file; it will always be a CSV,
> read line by line.
>
>    KICK=0
>    BEGIN CASE
>       CASE -1
>          KICK=1
>          BEGIN CASE
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>             CASE -1
>                KICK=0
>          END CASE
>    END CASE
>
> George Gallen
> Senior Programmer/Analyst
> Accounting/Data Division
> [EMAIL PROTECTED]
> ph:856.848.1000 Ext 220
>
> SLACK Incorporated - An innovative information, education and management
> company
> http://www.slackinc.com
>


_______________________________________________
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users
