On 11 October 2010 05:57, Michael Neale <[email protected]> wrote:
> great - I guess if it shifts away from "fixed" soundex - probably should
> try and find out who is using it to ensure there are no surprises. I can't
> imagine it is widely used.

Neither do I - otherwise you would have seen some complaints.
-W

> On Mon, Oct 11, 2010 at 2:43 PM, Wolfgang Laun <[email protected]> wrote:
>
>> On 10 October 2010 23:41, Michael Neale <[email protected]> wrote:
>> > I think you should clean room implement it (or reuse some old code of yours
>> > if it is safe to do so). From what I have seen of the algorithm - it isn't
>> > huge - and it would make sense to have it re-implemented. As an alternative
>> > - consider taking a look at the MVEL soundex code and rewriting that - and
>> > we will see if we can make it upstream.
>>
>> I just re-implemented this according to the algorithm I found in
>> http://en.wikipedia.org/wiki/Soundex
>> I've also consulted a CPAN module, to learn what was intended by the
>> MVEL implementation, but it's undecidable (possibly due to omissions or
>> bugs).
>>
>> > I would say it is just slightly
>> > neglected - it's not well known that it lives there. Using the MVEL one was
>> > just opportunistic for drools.
>> > I didn't know that it could return null, that is bad. I guess if it is null
>> > - that would mean that you just do a literal case insensitive compare?
>>
>> A correct implementation never returns null. An empty word might, but for
>> our purpose "" would be preferable.
>>
>> > Also - AFAIK - soundex is only for English right?
>> Certainly.
>>
>> > Is there an equivalent for other languages?
>> Soundex is coarse even for English. I've found the atrocious example that
>> the Soundex for "Britney Spears" is the same as for
>> "bewährten Superzicke" (~ "proven super-b*").
>> NYSIIS <http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System>
>> is supposed to be better.
>>
>> For German, there is an equivalent: "Kölner Phonetik". It might
>> make sense to provide this for an operator "soundex[de]". (All of
>> /M[ae][iy]e?r/ sound alike in German, and all exist as proper names.)
>>
>> I have also found one link to an implementation adapted for French.
>>
>> Soundex is aimed at the pronunciation of proper names. There might be some
>> leeway for that even in a language like Hungarian, which is pronounced
>> exactly as written.
>>
>> I think Drools should drop the MVEL version and go for a flexible approach,
>> possibly even something better than Soundex/NARA for English. I'll research
>> this some more, and report back before I commit anything ;-)
>>
>> -W
>>
>> > If so, perhaps having it in the drools codebase makes sense
>> > and opens the way for people to plug in their own soundex.
>> > On Mon, Oct 11, 2010 at 2:54 AM, Wolfgang Laun <[email protected]>
>> > wrote:
>> >>
>> >> The implementation of "soundslike" is broken in more than one respect.
>> >> The conversion of a word to a Soundex string is provided by
>> >> org.mvel2.util.Soundex.
>> >> (.) There are words where Soundex.soundex returns null, so that the
>> >> calling code, in Drools, crashes with an NPE.
>> >> (.) The algorithm implemented in Soundex is erroneous. I'm not sure which
>> >> Soundex algorithm it is supposed to implement, but it just doesn't meet the
>> >> basic requirements.
>> >>
>> >> I have implemented, correctly, the version for the National Archives and
>> >> Records Administration (NARA) rule set for the official implementation of
>> >> Soundex used by the U.S. Government.
>> >>
>> >> Do we wait for MVEL to correct this bug, or do we just replace it with a
>> >> correct implementation?
>> >>
>> >> Regards
>> >> Wolfgang
>>
>> _______________________________________________
>> rules-dev mailing list
>> [email protected]
>> https://lists.jboss.org/mailman/listinfo/rules-dev
>
> --
> Michael D Neale
> home: www.michaelneale.net
> blog: michaelneale.blogspot.com
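
For reference, the Soundex rules discussed in this thread (the variant described on the Wikipedia page and attributed to NARA) can be sketched roughly as below. This is only a minimal illustration, not the code that was committed to Drools or MVEL: the class name SoundexSketch, the methods encode and soundsLike, the choice to skip every character outside A-Z, and the choice to return "" for null are all assumptions made for the example. The one property taken from the thread is that the encoder never returns null and yields "" for an empty word.

```java
final class SoundexSketch {

    // One code per letter A..Z. '0' marks A, E, I, O, U, Y: they are dropped,
    // but they separate equal codes. '-' marks H and W: they are dropped and
    // do NOT separate equal codes. Digits 1..6 are the consonant groups.
    private static final String CODES = "0123012-02245501262301-202";

    private SoundexSketch() {
    }

    // Returns a letter followed by three digits, or "" if the word contains
    // no ASCII letter. Never returns null (null input is treated like "",
    // which is an assumption of this sketch).
    static String encode(String word) {
        if (word == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(4);
        char last = '0'; // code of the previous letter that was not H or W
        for (int i = 0; i < word.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(word.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // assumption: ignore digits, punctuation, non-ASCII letters
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept verbatim
                last = code;   // and its code takes part in duplicate removal
            } else if (code != '-') {
                if (code != '0' && code != last) {
                    out.append(code);
                }
                last = code; // a vowel resets 'last', so a repeat after it is coded again
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    // Hypothetical helper: two words "sound alike" if their codes are equal.
    // Note that two words without any ASCII letter both encode to "".
    static boolean soundsLike(String a, String b) {
        return encode(a).equals(encode(b));
    }

    public static void main(String[] args) {
        System.out.println(encode("Robert"));   // R163
        System.out.println(encode("Ashcraft")); // A261
        System.out.println(encode("Tymczak"));  // T522
        System.out.println(encode("Pfister"));  // P236
        System.out.println(soundsLike("Robert", "Rupert")); // true
    }
}
```

A language-specific variant such as the "soundex[de]" idea would need a different code table and different adjacency rules, so it would more likely be a separate encoder behind a common interface than a parameter of this method.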
