Re: umlauts / diacritic expansion

2019-04-17 Thread Michael Sokolov
Right, ASCIIFoldingFilter seems to map Ü [LATIN CAPITAL LETTER U WITH DIAERESIS] to "U", not "UE".
On Wed, Apr 17, 2019 at 12:26 AM Ralf Heyde wrote:
> Ah sorry, Asciifolding for umlauts will result in ue/ae - ß/ss etc
>
> You could allow a distance of 1 or 2 given you use levenshtein
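To make the difference in the message above concrete, here is a minimal plain-Java sketch contrasting folding (Ü becomes "U") with expansion (Ü becomes "UE"). It is not Lucene code: `fold` is a stand-in built on `java.text.Normalizer` that only strips combining marks (so, unlike a full ASCII folding, it leaves ß alone), and `expand` and the class name `UmlautFolding` are hypothetical names made up for this illustration. Characters are written as Unicode escapes so the source does not depend on file encoding.

```java
import java.text.Normalizer;

class UmlautFolding {
    // Folding stand-in: decompose to NFD, then drop the combining marks.
    // Assumption: this only approximates what an ASCII folding filter does.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Expansion: replace each umlaut (and sharp s) with its two-letter form.
    // Upper-case letters map to "AE"/"OE"/"UE" as in the message above;
    // a real analysis chain would typically lowercase first.
    static String expand(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\u00e4': out.append("ae"); break; // a-umlaut
                case '\u00f6': out.append("oe"); break; // o-umlaut
                case '\u00fc': out.append("ue"); break; // u-umlaut
                case '\u00c4': out.append("AE"); break; // A-umlaut
                case '\u00d6': out.append("OE"); break; // O-umlaut
                case '\u00dc': out.append("UE"); break; // U-umlaut
                case '\u00df': out.append("ss"); break; // sharp s
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String word = "f\u00fcr"; // the German word "fuer" written with u-umlaut
        System.out.println(fold(word));   // folded form
        System.out.println(expand(word)); // expanded form
    }
}
```

With folding alone, an index built from "für" would contain "fur" and a query for "fuer" would not match it, which is the gap the thread is about.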

Re: umlauts / diacritic expansion

2019-04-17 Thread Michael Sokolov
Thanks - GermanNormalizer seems as if it addresses this problem, yes.
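As a rough illustration of why a German-specific normalizer helps here, the sketch below collapses the diacritic form, the expanded form, and the stripped form of a word to one key. I am assuming "GermanNormalizer" refers to Lucene's `GermanNormalizationFilter`; the rules below are loosely modeled on my reading of its documentation (umlauts lose their diacritic, "ae"/"oe" lose the "e", "ue" loses the "e" unless it follows a vowel or "q", ß becomes "ss"). This is not the Lucene implementation, and `GermanKey` and `normalize` are hypothetical names. Input is assumed to be lowercased already.

```java
class GermanKey {
    static boolean isVowelOrQ(char c) {
        return "aeiouq".indexOf(c) >= 0;
    }

    // Simplified normalization sketch; assumes lowercase input.
    static String normalize(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            char prev = i > 0 ? s.charAt(i - 1) : '\0';
            char next = i + 1 < s.length() ? s.charAt(i + 1) : '\0';
            if (c == '\u00df') {                       // sharp s
                out.append("ss");
            } else if (c == '\u00e4') {                // a-umlaut
                out.append('a');
            } else if (c == '\u00f6') {                // o-umlaut
                out.append('o');
            } else if (c == '\u00fc') {                // u-umlaut
                out.append('u');
            } else if ((c == 'a' || c == 'o') && next == 'e') {
                out.append(c);                         // "ae" / "oe": drop the e
                i++;
            } else if (c == 'u' && next == 'e' && !isVowelOrQ(prev)) {
                out.append('u');                       // "ue": drop the e
                i++;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // All three spellings are expected to end up as the same key.
        System.out.println(normalize("f\u00fcr"));
        System.out.println(normalize("fuer"));
        System.out.println(normalize("fur"));
    }
}
```

The "not after a vowel or q" exception keeps words such as "feuer" and "quelle" unchanged, so they do not collide with unrelated terms.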

Re: umlauts / diacritic expansion

2019-04-16 Thread Ralf Heyde
Ah sorry, Asciifolding for umlauts will result in ue/ae - ß/ss etc.
You could allow a distance of 1 or 2, given that you use Levenshtein distance - this might be close to what you need.
Sent from my iPhone
> On 16.04.2019 at 20:08, Michael Sokolov wrote:
>
> I'm learning how to
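For reference, the edit-distance suggestion above can be checked with a small plain-Java Levenshtein function. This is a textbook dynamic-programming sketch, not Lucene's fuzzy matching; `EditDistance` and `levenshtein` are names chosen for this example.

```java
class EditDistance {
    // Classic two-row Levenshtein distance (insert, delete, substitute = 1 each).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String withUmlaut = "f\u00fcr"; // "fuer" written with u-umlaut
        System.out.println(levenshtein(withUmlaut, "fuer")); // 2
        System.out.println(levenshtein(withUmlaut, "fur"));  // 1
        System.out.println(levenshtein("fuer", "feuer"));    // 1
    }
}
```

The diacritic and expanded spellings are two edits apart, so a distance of 2 would be needed to match them. The last line shows the cost of that: "fuer" is also only one edit away from the unrelated word "feuer", so fuzzy matching of this kind is likely to over-match on short words.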

RE: umlauts / diacritic expansion

2019-04-16 Thread Markus Jelsma
> Sent: Tuesday 16th April 2019 20:28
> To: java-user@lucene.apache.org
> Subject: Re: umlauts / diacritic expansion
>
> Hey,
>
> Take a look at Asciifoldingfilter - this one is quite generic.
>
> Does this answer your question?
>
> Cheers Ralf

Re: umlauts / diacritic expansion

2019-04-16 Thread Ralf Heyde
Hey,
Take a look at Asciifoldingfilter - this one is quite generic.
Does this answer your question?
Cheers Ralf
Sent from my iPhone
> On 16.04.2019 at 20:08, Michael Sokolov wrote:
>
> I'm learning how to index/search German today and understanding that
> vowels with umlauts are

umlauts / diacritic expansion

2019-04-16 Thread Michael Sokolov
I'm learning how to index/search German today, and I understand that vowels with umlauts are conventionally expanded into two ASCII characters, e.g. "für" -> "fuer". People may search for the expanded form "fuer", but they might also search with the diacritic, and finally they might lazily