http://aspell.sourceforge.net/metaphone/metaphone.basic
soundex is pathetic - nowadays, metaphone is much better. if you're
feeling perl'ish,
http://www.foo.be/docs/tpj/issues/vol5_3/tpj0503-0009.html
has an interesting discussion of using several approximate methods for
identifying records by name. it even discusses the betty/elizabeth,
jack/john problem... it looks slow, so you would probably have to cache
the results.

c'mon, there must be *something* unique in the file they send! :-)

On Tue, 2004-01-27 at 14:32, George Gallen wrote:
> I thought of that, but soundex only works on the first three letters,
> if I remember correctly. Or it only encodes the first three letters,
> then the remaining are unchanged.
>
> The main problem is I can't isolate a last name from the source, it
> comes in as a full name, and if I use the full name as given to us by
> the consumer, there is a chance it won't be in the same exact format
> as in the file from the rental: it might be missing the middle
> initial, one may have a married hyphenated name, one could be a
> shortened or different first name (i.e. betty instead of elizabeth,
> or jack instead of john... etc.).
>
> Since my original was a list of if/thens, it looks like I'm not going
> to be able to gain much in speed any other way with straight
> programming (that is, no temp files, or files to bounce off).
>
> George
>
> -----Original Message-----
> From: Jeff Schasny [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, January 27, 2004 5:12 PM
> To: U2 Users Discussion List
> Subject: RE: looking for faster Ideas...
>
> I suppose you could soundex the whole thing
>
> -----Original Message-----
> From: Geoffrey Mitchell [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, January 27, 2004 2:59 PM
> To: U2 Users Discussion List
> Subject: RE: looking for faster Ideas...
>
> We do something like this, using a "match code" composed of fragments
> of data concatenated together. I think we use a delimiter, but you
> wouldn't need to.
>
> So, if you want to match Johnson in zipcode 12345 on Maple street, you
> might have a matchcode of "JOHNSON*12345*MAPLE", so you would extract
> the relevant fields, build the matchcode and check it against a list
> or file. Actually, we use an I-type dictionary to generate the
> matchcode, and have an index built on it. For small datasets this may
> be *slower* than your case statement, but I would think that it would
> be easier to maintain, and for large datasets it should be quicker
> since the time to construct the matchcode and do a read, selectindex,
> or whatever would be constant. Of course, if you have a Jonsson that
> gets spelled Johnson, you're going to have problems no matter how you
> approach it.
>
> On Tue, 2004-01-27 at 13:05, George Gallen wrote:
>
> I can't just check for names, it has to be a name with a specific zip
> code, and if the name is fairly common, we also add in part of the
> address to make sure no one else is weeded out that shouldn't be.
>
> I suppose I could keep two or three arrays, do a specific lookup in
> each saving the position, and if all three positions are identical
> (assuming all three arrays have the name, address, zip in the same
> order) then that would be a match.... Thanks
>
> George
>
> > -----Original Message-----
> > From: Jeff Schasny [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, January 27, 2004 1:51 PM
> > To: U2 Users Discussion List
> > Subject: RE: looking for faster Ideas...
> >
> > how about keeping a list of excluded names as a record in a file (or
> > as a flat file in a directory with each name/item/whatever on a
> > line) and reading it into the program as a dynamic array, then doing
> > a locate on the string in question.
> > Something like this:
> >
> > READ ALIST FROM AFILE,SOME.ID ELSE STOP
> > X = 0
> > LOOP
> >    X += 1
> >    ASTRING = INLIST<X>
> > UNTIL ASTRING = ''
> >    LOCATE ASTRING IN ALIST SETTING POS THEN
> >       DO
> >       OTHER
> >       STUFF
> >    END ELSE
> >       DONT
> >    END
> > REPEAT
> >
> > Of course if you really want speed then sort the list and use a "BY"
> > clause in the locate
> >
> > -----Original Message-----
> > From: George Gallen [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, January 27, 2004 11:33 AM
> > To: 'Ardent List'
> > Subject: looking for faster Ideas...
> >
> > I can't set up any indexes to speed this up. Basically I'm scanning
> > a CSV file for names to remove and set the flag of KICK=1 to remove
> > it (creating a new CSV file at the same time).
> >
> > Keep in mind the ".." are people's last names, or zip codes, or part
> > of their address, changed them to ".." to protect the unwanting...
> >
> > Right now, I do a series of CASE's ...
> > Now, it's not a major problem as I'm only checking for 20 or so
> > names, but as more and more people request to be removed (and we
> > don't have access to the creation of the list), this could get quite
> > slow over 50 or 60 thousand lines of checking.
> >
> > LIN is one line of the CSV file, the INDEX is checking for a last
> > name & a zip code and sometimes part of the address line.
> >
> > Any Ideas?
> >
> > Remember, we can't change the source of the file, it will always be
> > a CSV, being read line by line
> >
> >    KICK=0
> >    BEGIN CASE
> >       CASE -1
> >          KICK=1
> >          BEGIN CASE
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> >             CASE -1
> >                KICK=0
> >          END CASE
> >    END CASE
> >
> > George Gallen
> > Senior Programmer/Analyst
> > Accounting/Data Division
> > [EMAIL PROTECTED]
> > ph:856.848.1000 Ext 220
> >
> > SLACK Incorporated - An innovative information, education and
> > management company
> > http://www.slackinc.com
> >
> > _______________________________________________
> > u2-users mailing list
> > [EMAIL PROTECTED]
> > http://www.oliver.com/mailman/listinfo/u2-users
> >
> > _______________________________________________
> > u2-users mailing list
> > [EMAIL PROTECTED]
> > http://www.oliver.com/mailman/listinfo/u2-users
>
--
Ian McGowan <[EMAIL PROTECTED]>

_______________________________________________
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users
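
For anyone finding this thread later, here is a rough sketch of the table-driven idea discussed above (an exclusion list looked up by key instead of a growing CASE block). It is written in Python rather than UniBasic purely so it can be run anywhere, and it is not anyone's actual code from the thread. The names EXCLUSIONS, build_index, should_kick and filter_lines are made up for illustration, the sample entries are invented, and it assumes a zip code shows up in each line as a standalone 5-digit run. Like the original INDEX() checks, it only tests whether the fragments occur somewhere in the line, so it does not need to isolate a last name.

```python
import re
from collections import defaultdict

# Hypothetical exclusion entries: (last name, zip code, optional address fragment).
# An empty address fragment means "name and zip are enough".
EXCLUSIONS = [
    ("JOHNSON", "12345", "MAPLE"),
    ("SMITH", "54321", ""),
]

# Assumption: zip codes appear as standalone 5-digit runs in the CSV line.
ZIP_RE = re.compile(r"\b\d{5}\b")


def build_index(exclusions):
    """Group exclusion entries by zip code so each line checks only a few."""
    index = defaultdict(list)
    for name, zip_code, addr in exclusions:
        index[zip_code].append((name.upper(), addr.upper()))
    return index


def should_kick(line, index):
    """True when the line holds a zip, a name and (if given) an address
    fragment that all belong to the same exclusion entry."""
    upper = line.upper()
    for zip_code in set(ZIP_RE.findall(upper)):
        # .get() avoids creating empty entries for unknown zip codes.
        for name, addr in index.get(zip_code, ()):
            if name in upper and addr in upper:
                return True
    return False


def filter_lines(lines, index):
    """Return only the lines that should be kept (the KICK=0 case)."""
    return [ln for ln in lines if not should_kick(ln, index)]
```

The point of keying on the zip code is that the work per line depends on how many exclusions share that zip, not on the total number of exclusions, so the list can grow without every line being tested against every entry. It does nothing for the betty/elizabeth or Jonsson/Johnson problem; that would still need something like the metaphone approach mentioned above.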