As others have said, there's no really good solution, and you will need people to make the final 
decisions. But tools like Soundex often help. Here's an approach I've had some success with: 
write your own BASIC program that builds strings of names and addresses (use whatever fields 
you have available) and calls these routines to tell you how closely they match. In my 
experience, when complete addresses are used, a score of 80% or higher indicates a likely 
match, and 90% or higher is a very good bet.

Hope this helps.

*!********************************************************************
*!                       Compare.Strings
*!                         
*!                         09/22/03
*!********************************************************************
*! Brief description of routine...
*!  This routine compares two multivalued strings and returns "Y" in
*! MATCHED.YN if the required percentage of characters match.
*!********************************************************************
*! Parameters
*! MATCHED.YN - Returns Y if PERCENTAGE.REQUIRED is met, otherwise N.
*!
*! PERCENTAGE.REQUIRED - Minimum percentage to be considered a match
*! STRING.TO.CHECK - This is the string that must match what's in the
*!                   required string.
*! REQUIRED.STRING - All lines of this string must have a match.
*!********************************************************************
SUBROUTINE COMPARE.STRINGS(MATCHED.YN,PERCENTAGE.REQUIRED,STRING.TO.CHECK,REQUIRED.STRING)


MATCHED.YN = 'Y'

*! Use OCONV to remove all non-alphanumeric characters.
STRING.ONE = OCONV(REQUIRED.STRING,'MCB')
STRING.TWO = OCONV(STRING.TO.CHECK,'MCB')

IF LEN(STRING.ONE) < LEN(STRING.TWO) THEN
  CALL MATCH.CHARACTERS(MATCHED,STRING.ONE,STRING.TWO,PERCENTAGE.REQUIRED)
END ELSE
  CALL MATCH.CHARACTERS(MATCHED,STRING.TWO,STRING.ONE,PERCENTAGE.REQUIRED)
END

IF NOT(MATCHED) THEN
  MATCHED.YN = 'N'
END
RETURN


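*! MATCH.CHARACTERS is a separate external subroutine: save, compile and
*! catalog it as its own program record.
SUBROUTINE MATCH.CHARACTERS(SUCCESSFUL, STRING1, STRING2, PERCENT.REQUIRED)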
*!************************************************************************
*!
*! August, 2001
*!
*! Determines how many of the characters in STRING1 exist in STRING2.
*!
*! Parameters:
*!    SUCCESSFUL        output   Set to @TRUE if match is successful.
*!    STRING1           input
*!    STRING2           input
*!    PERCENT.REQUIRED  input    The percent of matching characters that
*!                               are necessary for the match to be
*!                               considered successful.
*!
*!************************************************************************

SUCCESSFUL=@FALSE

IF NUM(PERCENT.REQUIRED) AND PERCENT.REQUIRED GE 1 AND PERCENT.REQUIRED LE 100 THEN
   STRING1=TRIM(STRING1)
   STRING1.LEN=LEN(STRING1)
   IF STRING1.LEN THEN
      WORK.STRING=STRING2
      NUMBER.OF.HITS=0
      FOR CHARACTER.PTR=1 TO STRING1.LEN
         CHARACTER=STRING1[CHARACTER.PTR,1]
         MATCHING.CHARACTER.LOCATION=INDEX(WORK.STRING,CHARACTER,1)
         IF MATCHING.CHARACTER.LOCATION THEN
            NUMBER.OF.HITS+=1
            WORK.STRING[MATCHING.CHARACTER.LOCATION,1]=''
         END
      NEXT CHARACTER.PTR
      PERCENT.OF.MATCH=INT((NUMBER.OF.HITS/STRING1.LEN)*100)
      IF PERCENT.OF.MATCH GE PERCENT.REQUIRED THEN
         SUCCESSFUL=@TRUE
      END
   END
END

RETURN
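The driver program described at the top of this message could look something like the sketch below. It assumes both subroutines above are compiled and cataloged; the CUST.* and XL.* variables are hypothetical placeholders for fields you would read from your customer file and from the data imported from Excel. Both strings are upper-cased first because INDEX, used in MATCH.CHARACTERS, is case sensitive.

```
*! Hypothetical driver for COMPARE.STRINGS. The CUST.* and XL.* values
*! are placeholders for fields read from the customer file and from
*! the data imported from Excel.
CUST.NAME = 'Acme Widget Co.'
CUST.CITY = 'Omaha'
CUST.STATE = 'NE'
XL.NAME = 'ACME WIDGET COMPANY'
XL.CITY = 'OMAHA'
XL.STATE = 'NE'

*! Build one multivalued string per record: name, city, state.
*! Upper-case both sides, because INDEX in MATCH.CHARACTERS is case sensitive.
CUST.STRING = CUST.NAME : @VM : CUST.CITY : @VM : CUST.STATE
CUST.STRING = OCONV(CUST.STRING,'MCU')
XL.STRING = XL.NAME : @VM : XL.CITY : @VM : XL.STATE
XL.STRING = OCONV(XL.STRING,'MCU')

*! 80 is the "likely match" threshold from experience, not a fixed rule.
MATCHED.YN = ''
CALL COMPARE.STRINGS(MATCHED.YN,80,XL.STRING,CUST.STRING)
IF MATCHED.YN = 'Y' THEN
   PRINT 'Likely match, review: ':XL.NAME
END ELSE
   PRINT 'No match: ':XL.NAME
END
END
```

In a real run you would wrap this in loops over both files. Comparing every customer against every spreadsheet row grows with the product of the two file sizes, so it may be worth comparing only rows whose state already matches, and sending anything between 80% and 90% to a person for review.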



----- Original Message -----
From: "Steve Kunzman" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Subject: [U2][UD] merging data - fuzzy keys
Date: Tue, 2 Nov 2004 16:25:03 -0600

> 
> I am working on a project to populate industry information (SIC, NAICS,
> etc.) into my customer database from a different database (excel).  The
> only thing somewhat in common is the customer name, city, and state.  By
> using these common data elements, I have limited success with SOUNDEX.
> Is there some other tool/utility I could use to help me merge this data?
> 
> 
> 
> Unidata 6.0
> 
> HP9000 RP5470/ HPUX 11.11
> 
> 
> 
> Thanks in advance. Steve
> -------
> u2-users mailing list
> [EMAIL PROTECTED]
> To unsubscribe please visit http://listserver.u2ug.org/
> 
