RE: Spell checking street names

2008-01-31 Thread Max Metral
gic, it's obviously not the best metric. Is there an appropriate edit distance metric that takes phonetics into account?

-Original Message-
From: Karl Wettin [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 31, 2008 6:12 AM
To: java-user@lucene.apache.org
Subject: Re: Spell checking s

Re: Spell checking street names

2008-01-31 Thread Karl Wettin
On 30 Jan 2008, at 17:34, Max Metral wrote:

Part of the reason is if we look at some common mistakes. For Commonwealth:
Communwealth
Comonwealth
Common wealth

If they are common mistakes, you can pick them up using reinforcement learning.
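For reference, here is a minimal sketch of plain Levenshtein edit distance applied to the three misspellings of Commonwealth quoted above. It is not code from the thread or from Lucene; the class and method names are made up for illustration. It only shows that an unweighted edit distance scores all three variants the same, which is one reason to ask for a metric that also considers phonetics.

```java
// Hypothetical helper, not part of Lucene: classic Levenshtein distance
// computed with two rolling rows of the dynamic-programming table.
class EditDistanceSketch {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting i characters from a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the arrays for the next row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String target = "Commonwealth";
        for (String typo : new String[] {"Communwealth", "Comonwealth", "Common wealth"}) {
            System.out.println(typo + " -> " + levenshtein(target, typo));
        }
    }
}
```

Each of the three variants is a single edit away from the target (one substitution, one deletion, one inserted space), so this metric alone cannot rank them.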

Re: Spell checking street names

2008-01-31 Thread eks dev
ache.org
Sent: Thursday, 31 January, 2008 6:02:28 AM
Subject: Re: Spell checking street names

Hmmm, "untokenized n-gram spell checker"... does that really make sense? lucene as 2-gram: lu uc ce en ne. but all as a single token? No, I don't think that

Re: Spell checking street names

2008-01-30 Thread Otis Gospodnetic
Hmmm, "untokenized n-gram spell checker"... does that really make sense? "lucene" as 2-grams: lu uc ce en ne. But all as a single token? No, I don't think that will work with the Lucene spellchecker.

As for a non-tokenizing Analyzer: KeywordAnalyzer.

Otis
--
Sematext -- http://sematext.com/
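To make the 2-gram example above concrete, here is a minimal standalone sketch of splitting a word into character n-grams. It uses only the Java standard library and is not the Lucene spellchecker's own code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits a word into
// overlapping character n-grams of a fixed size.
class NGramSketch {
    static List<String> ngrams(String word, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> grams = new ArrayList<>();
        // Slide a window of width n across the word, one character at a time.
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("lucene", 2)); // prints [lu, uc, ce, en, ne]
    }
}
```

Each gram here is a separate element, which is the sense in which an n-gram index is made of many small tokens rather than one untokenized string.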