As others have said, there's no really good automatic solution, and people will have to
make the final decisions. But tools like SOUNDEX often help. Here's a technique I've had
some success with: write your own BASIC program that builds a string of name and address
data for each record (use whatever fields you have available), then call these routines
to tell you how closely two such strings match. In my experience, when complete addresses
are used, an 80% match means you have a likely match and a 90% match is a very good bet.
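The workflow described above can be sketched in Python for anyone prototyping outside UniData. `build_match_string` and `rate_match` are hypothetical helper names. Upper-casing the fields is an assumption of this sketch (the posted routines compare case-sensitively). The "needs review" tier for anything under 80% is also an assumption, since the post only gives the 80% and 90% rules of thumb. The percentage itself would come from the character-matching routines.

```python
def build_match_string(*fields: str) -> str:
    """Hypothetical helper: join name/address fields into one comparison string."""
    # Upper-casing is an assumption of this sketch; empty fields are skipped.
    return " ".join(f.strip().upper() for f in fields if f and f.strip())


def rate_match(percent: int) -> str:
    """Map a match percentage onto the rule of thumb (complete addresses)."""
    if percent >= 90:
        return "very good bet"
    if percent >= 80:
        return "likely match"
    # Assumed fallback: below 80%, a person makes the call.
    return "needs review"


if __name__ == "__main__":
    print(build_match_string("Acme Tool Co ", "", "Dallas", "tx"))
    print(rate_match(85))
```

For example, `build_match_string("Acme Tool Co ", "", "Dallas", "tx")` gives `"ACME TOOL CO DALLAS TX"`, and `rate_match(85)` gives `"likely match"`.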
Hope this helps.
*!********************************************************************
*! Compare.Strings
*!
*! 09/22/03
*!********************************************************************
*! Brief description of routine...
*! This routine compares two multivalued strings and returns "Y" if
*! the required percentage of matching characters is met.
*!********************************************************************
*! Parameters
*! MATCHED.YN - Returns Y if the minimum PERCENTAGE.REQUIRED is met, else N.
*!
*! PERCENTAGE.REQUIRED - Minimum percentage to be considered a match
*! STRING.TO.CHECK - This is the string that must match what's in the
*! required string.
*! REQUIRED.STRING - All lines of this string must have a match.
*!********************************************************************
SUBROUTINE COMPARE.STRINGS(MATCHED.YN,PERCENTAGE.REQUIRED,STRING.TO.CHECK,REQUIRED.STRING)
   MATCHED.YN = 'Y'
*! Use OCONV to remove all non-alphanumeric characters.
   STRING.ONE = OCONV(REQUIRED.STRING,'MCB')
   STRING.TWO = OCONV(STRING.TO.CHECK,'MCB')
*! Count the shorter string's characters against the longer string.
   IF LEN(STRING.ONE) < LEN(STRING.TWO) THEN
      CALL MATCH.CHARACTERS(MATCHED,STRING.ONE,STRING.TWO,PERCENTAGE.REQUIRED)
   END ELSE
      CALL MATCH.CHARACTERS(MATCHED,STRING.TWO,STRING.ONE,PERCENTAGE.REQUIRED)
   END
   IF NOT(MATCHED) THEN
      MATCHED.YN = 'N'
   END
   RETURN

SUBROUTINE MATCH.CHARACTERS(SUCCESSFUL, STRING1, STRING2, PERCENT.REQUIRED)
*!************************************************************************
*!
*! August, 2001
*!
*! Determines how many of the characters in STRING1 exist in STRING2.
*!
*! Parameters:
*! SUCCESSFUL output Set to @TRUE if match is successful.
*! STRING1 input The characters to look for.
*! STRING2 input The string searched for those characters.
*! PERCENT.REQUIRED input The percent of matching characters that
*! are necessary for the match to be
*! considered successful.
*!
*!************************************************************************
   SUCCESSFUL=@FALSE
   IF NUM(PERCENT.REQUIRED) AND PERCENT.REQUIRED GE 1 AND PERCENT.REQUIRED LE 100 THEN
      STRING1=TRIM(STRING1)
      STRING1.LEN=LEN(STRING1)
      IF STRING1.LEN THEN
         WORK.STRING=STRING2
         NUMBER.OF.HITS=0
         FOR CHARACTER.PTR=1 TO STRING1.LEN
            CHARACTER=STRING1[CHARACTER.PTR,1]
            MATCHING.CHARACTER.LOCATION=INDEX(WORK.STRING,CHARACTER,1)
            IF MATCHING.CHARACTER.LOCATION THEN
               NUMBER.OF.HITS+=1
               *! Remove the matched character so it cannot be counted twice.
               WORK.STRING[MATCHING.CHARACTER.LOCATION,1]=''
            END
         NEXT CHARACTER.PTR
         PERCENT.OF.MATCH=INT((NUMBER.OF.HITS/STRING1.LEN)*100)
         IF PERCENT.OF.MATCH GE PERCENT.REQUIRED THEN
            SUCCESSFUL=@TRUE
         END
      END
   END
   RETURN
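For experimenting outside UniData, here is a minimal Python transcription of the two routines above. It is a sketch, not a drop-in replacement: `compare_strings` and `match_characters` are illustrative names, `str.isalnum()` approximates `OCONV(...,'MCB')` (equivalent for plain ASCII data), `" ".join(s.split())` stands in for `TRIM()`, and the percentage uses integer arithmetic rather than BASIC's decimal math, so rounding edge cases may differ. Results are returned instead of being passed back through arguments.

```python
def match_characters(string1: str, string2: str, percent_required) -> bool:
    """Sketch of MATCH.CHARACTERS: do enough of string1's characters occur in string2?"""
    # Mirror NUM(...) and the 1..100 range check: anything else never matches.
    try:
        required = float(percent_required)
    except (TypeError, ValueError):
        return False
    if not 1 <= required <= 100:
        return False
    string1 = " ".join(string1.split())  # approximates BASIC TRIM()
    if not string1:
        return False
    work = string2
    hits = 0
    for ch in string1:
        pos = work.find(ch)
        if pos != -1:
            hits += 1
            # Remove the matched character so it cannot be counted twice.
            work = work[:pos] + work[pos + 1:]
    # Integer arithmetic truncates like INT() and avoids float rounding surprises.
    percent = hits * 100 // len(string1)
    return percent >= required


def compare_strings(percentage_required, string_to_check: str, required_string: str) -> str:
    """Sketch of COMPARE.STRINGS: returns "Y" or "N"."""
    # Approximates OCONV(...,'MCB'): keep letters and digits only, case unchanged.
    one = "".join(ch for ch in required_string if ch.isalnum())
    two = "".join(ch for ch in string_to_check if ch.isalnum())
    # Count the shorter string's characters against the longer one.
    if len(one) < len(two):
        matched = match_characters(one, two, percentage_required)
    else:
        matched = match_characters(two, one, percentage_required)
    return "Y" if matched else "N"
```

Two consequences of the design are worth knowing. Because only the shorter string's characters are counted, a short name fully contained in a longer one scores 100%: `compare_strings(80, "ACME TOOL CO", "ACME TOOL COMPANY")` returns `"Y"`. And because characters are matched without regard to order, `"CBA"` and `"A-B-C"` also score 100%.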
----- Original Message -----
From: "Steve Kunzman" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Subject: [U2][UD] merging data - fuzzy keys
Date: Tue, 2 Nov 2004 16:25:03 -0600
>
> I am working on a project to populate industry information (SIC, NAICS,
> etc.) into my customer database from a different database (excel). The
> only thing somewhat in common is the customer name, city, and state. By
> using these common data elements, I have limited success with SOUNDEX.
> Is there some other tool/utility I could use to help me merge this data?
>
> Unidata 6.0
>
> HP9000 RP5470/ HPUX 11.11
>
> Thanks in advance. Steve
> -------
> u2-users mailing list
> [EMAIL PROTECTED]
> To unsubscribe please visit http://listserver.u2ug.org/
>