RE: looking for faster Ideas...

George Gallen Tue, 27 Jan 2004 15:34:40 -0800

I thought of that, but if I remember correctly, Soundex keeps the first letter and encodes only the next few consonants, so most of a long name is ignored.

The main problem is that I can't isolate a last name from the source; it comes in as a full name. If I use the full name as the consumer gave it to us, there is a chance it won't be in exactly the same format as in the file from the rental: one might be missing the middle initial, one might have a married hyphenated name, and one could have a shortened or different first name (e.g. Betty instead of Elizabeth, or Jack instead of John, etc.).
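
For reference, here is a minimal sketch of classic American Soundex, written in Python purely for illustration. It keeps the first letter and encodes the consonants that follow as up to three digits, so the code is always four characters long. This is the textbook algorithm, not the U2 SOUNDEX function, which may differ in detail; the function name `soundex` and the sample names are only for the example.

```python
# Digit assigned to each consonant group in classic American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code for name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")    # code of the previous letter
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = CODES.get(ch, "")        # vowels and Y have no code
        if code and code != prev:
            result += code              # a run of equal codes counts once
        prev = code
    return (result + "000")[:4]         # pad or truncate to four characters


for sample in ("Johnson", "Jonsson", "Elizabeth", "Betty"):
    print(sample, soundex(sample))
```

Under these rules Johnson and Jonsson both come out as J525, while Elizabeth (E421) and Betty (B300) do not match, so Soundex helps with spelling variants but not with nicknames.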

Since my original was a list of if/thens, it looks like I'm not going to gain much speed any other way with straight programming (that is, no temp files or files to bounce off).

George

-----Original Message-----
From: Jeff Schasny [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 27, 2004 5:12 PM
To: U2 Users Discussion List
Subject: RE: looking for faster Ideas...

I suppose you could soundex the whole thing.

-----Original Message-----
From: Geoffrey Mitchell [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 27, 2004 2:59 PM
To: U2 Users Discussion List
Subject: RE: looking for faster Ideas...

We do something like this, using a "match code" composed of fragments of data concatenated together. I think we use a delimiter, but you wouldn't need to.

So, if you want to match Johnson in ZIP code 12345 on Maple Street, you might have a matchcode of "JOHNSON*12345*MAPLE": you would extract the relevant fields, build the matchcode, and check it against a list or file. Actually, we use an I-type dictionary to generate the matchcode, and have an index built on it. For small datasets this may be *slower* than your case statement, but I would think that it would be easier to maintain, and for large datasets it should be quicker, since the time to construct the matchcode and do a read, selectindex, or whatever would be roughly constant. Of course, if you have a Jonsson that gets spelled Johnson, you're going to have problems no matter how you approach it.
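
A rough sketch of the matchcode idea, in Python only for illustration (on U2 it would be an I-type plus an index, as described above). It assumes the name, ZIP code and street have already been extracted from the record. The helper names `matchcode` and `should_kick`, and the second removal entry, are made up for the example.

```python
def matchcode(last_name, zip_code, street):
    """Join the normalized fragments with '*', e.g. JOHNSON*12345*MAPLE."""
    parts = (last_name, zip_code, street)
    return "*".join(p.strip().upper() for p in parts)


# Hypothetical removal list, stored as matchcodes for hashed lookup.
removals = {
    matchcode("Johnson", "12345", "Maple"),
    matchcode("Smith", "54321", "Oak"),
}


def should_kick(last_name, zip_code, street):
    """True when the record's matchcode is on the removal list."""
    return matchcode(last_name, zip_code, street) in removals


print(matchcode("Johnson", "12345", "Maple"))
print(should_kick(" johnson ", "12345", "maple"))
print(should_kick("Jonsson", "12345", "Maple"))
```

The set lookup costs about the same however long the removal list gets, which is the point of the approach. The last call shows the Jonsson/Johnson caveat: an exact matchcode does not catch spelling variants.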

On Tue, 2004-01-27 at 13:05, George Gallen wrote:
I can't just check for names; it has to be a name with a specific zip code,
and if the name is fairly common, we also add in part of the address to
make sure no one else is weeded out that shouldn't be.

I suppose I could keep two or three arrays, do a specific lookup in each,
saving the position, and if all three positions are identical (assuming all
three arrays have the name, address, zip in the same order) then that would
be a match....Thanks

George

>-----Original Message-----
>From: Jeff Schasny [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, January 27, 2004 1:51 PM
>To: U2 Users Discussion List
>Subject: RE: looking for faster Ideas...
>
>
>How about keeping a list of excluded names as a record in a file (or as a
>flat file in a directory with each name/item/whatever on a line) and reading
>it into the program as a dynamic array, then doing a locate on the string in
>question? Something like this:
>
>
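>* ALIST holds the excluded names; INLIST holds the strings to check
>READ ALIST FROM AFILE, SOME.ID ELSE STOP
>X = 0
>LOOP
>   X += 1
>   ASTRING = INLIST<X>
>UNTIL ASTRING = ''
>   LOCATE ASTRING IN ALIST SETTING POS THEN
>      NULL  ;* found: do other stuff
>   END ELSE
>      NULL  ;* not found: don't
>   END
>REPEAT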
>
>Of course, if you really want speed, then sort the list and use a BY clause
>in the LOCATE.
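
The sorted-list idea, sketched in Python for illustration. Here `bisect_left` does a binary search over the sorted names; it is a stand-in for looking up a sorted list, not a description of how LOCATE with a BY clause works internally. The function names and sample names are hypothetical.

```python
from bisect import bisect_left


def build_sorted(names):
    """Normalize the excluded names and sort them once, up front."""
    return sorted(n.strip().upper() for n in names)


def is_excluded(sorted_names, candidate):
    """Binary-search the sorted list for an exact, case-insensitive match."""
    key = candidate.strip().upper()
    pos = bisect_left(sorted_names, key)
    return pos < len(sorted_names) and sorted_names[pos] == key


excluded = build_sorted(["Smith", "Johnson", "Adams"])
print(is_excluded(excluded, "johnson"))
print(is_excluded(excluded, "Jonsson"))
```

The sort is paid for once when the list is loaded; each lookup after that inspects only a handful of entries even when the list grows into the thousands.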
>
>-----Original Message-----
>From: George Gallen [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, January 27, 2004 11:33 AM
>To: 'Ardent List'
>Subject: looking for faster Ideas...
>
>
>I can't set up any indexes to speed this up. Basically I'm scanning a CSV file
>for names to remove, and setting the flag KICK=1 to remove a line (creating a
>new CSV file at the same time).
>
>Keep in mind the ".." are people's last names, or zip codes, or part of
>their address; I changed them to ".." to protect the unwanting...
>
>Right now, I do a series of CASEs ...
>Now, it's not a major problem as I'm only checking for 20 or so names, but
>as more and more people request to be removed (and we don't have access to
>the creation of the list), this could get quite slow over 50 or 60 thousand
>lines of checking.
>
>LIN is one line of the CSV file; the INDEX is checking for a last name & a
>zip code and sometimes part of the address line.
>
>Any Ideas?
>
>Remember, we can't change the source of the file; it will always be a CSV,
>being read line by line.
>
>
>   KICK=0
>   BEGIN CASE
>      CASE -1
>         KICK=1
>         BEGIN CASE
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>            CASE -1
>               KICK=0
>         END CASE
>   END CASE
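
One way to keep this maintainable as the removal list grows is to turn the CASE chain into data: each rule is a group of fragments that must all appear in the line. A minimal sketch in Python, for illustration only. The rule values below are made-up placeholders standing in for the ".." strings, and the comparison is upper-cased here, which is an assumption (INDEX itself is case-sensitive).

```python
# Each rule is a tuple of fragments (name, zip, optional address part)
# that must ALL occur in the line. Values are hypothetical placeholders.
REMOVAL_RULES = [
    ("SMITH", "12345"),
    ("JONES", "54321", "MAPLE"),
]


def should_kick(line, rules=REMOVAL_RULES):
    """Equivalent of KICK=1: some rule has every fragment in the line."""
    upper = line.upper()
    return any(all(fragment in upper for fragment in rule) for rule in rules)


def filter_lines(lines, rules=REMOVAL_RULES):
    """Return the CSV lines that survive, in their original order."""
    return [line for line in lines if not should_kick(line, rules)]


sample = [
    '"John Smith","1 Main St","12345"',
    '"Ann Jones","9 Oak Ave","54321"',
    '"Ann Jones","9 Maple Dr","54321"',
]
print(filter_lines(sample))
```

This is still a substring scan per rule, so it is no faster per line than the CASE chain, but the rules can be loaded from a record or flat file instead of being edited into the program.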
>
>George Gallen
>Senior Programmer/Analyst
>Accounting/Data Division
>[EMAIL PROTECTED]
>ph:856.848.1000 Ext 220
>
>SLACK Incorporated - An innovative information, education and
>management
>company
>http://www.slackinc.com
>
>_______________________________________________
>u2-users mailing list
>[EMAIL PROTECTED]
>http://www.oliver.com/mailman/listinfo/u2-users
-- 
Geoffrey Mitchell                                           314-684-1062
Programmer/Analyst                                  [EMAIL PROTECTED]
Knights Direct