Re: Umlauts et al.

Brooks, Travis C. Tue, 16 Nov 2010 20:29:56 +0100

On Nov 16, 2010, at 1:56 PM, Tibor Simko wrote:
> 
>> Would it not make sense to have ue <-> u <-> ü as well.  I know we've
>> talked about this before, but can someone remind me what our options
>> are here?
> 
> This requires putting a kind of synonym expansion around the
> get_words_from_foo() family of functions in the indexer so that one
> index term could generate several.  This is both useful to have and
> straightforward to implement.  However, we should muse some more about
> how far we would like to go here.  E.g. the direction `ue' -> `u', `ü'
> should not be automatic, since it would not play nicely for words like
> `cruel'.



The example that came up here was Bruning, Bruening, Brüning.

The point was made that Google gives the same results for _all three_.  This
turns out not to be true at all, but it was still recognized as a "good idea".
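
For what it's worth, the one-directional expansion Tibor describes could
look roughly like the following.  This is only a sketch in Python:
UMLAUT_VARIANTS, expand_umlauts and with_umlaut_expansion are made-up names,
not anything that exists in the indexer, and I am assuming the
get_words_from_foo() functions take a text and return a list of words.  It
only expands outwards from the umlaut form, so `cruel' is left alone.

```python
# Hypothetical sketch; none of these names come from the indexer itself.
# Each umlaut maps to (digraph spelling, base-letter spelling).
UMLAUT_VARIANTS = {
    "ä": ("ae", "a"),
    "ö": ("oe", "o"),
    "ü": ("ue", "u"),
    "Ä": ("Ae", "A"),
    "Ö": ("Oe", "O"),
    "Ü": ("Ue", "U"),
}


def expand_umlauts(word):
    """Return the word followed by its digraph and base-letter spellings.

    Expansion runs only from the umlaut outwards ('ü' -> 'ue', 'u'),
    never from 'ue' back to 'ü', so 'cruel' stays a single term.
    """
    digraph = "".join(UMLAUT_VARIANTS.get(ch, (ch, ch))[0] for ch in word)
    plain = "".join(UMLAUT_VARIANTS.get(ch, (ch, ch))[1] for ch in word)
    variants = [word]
    for candidate in (digraph, plain):
        if candidate not in variants:
            variants.append(candidate)
    return variants


def with_umlaut_expansion(get_words):
    """Wrap a word extractor so that one index term can generate several."""
    def wrapper(text):
        terms = []
        for word in get_words(text):
            for variant in expand_umlauts(word):
                if variant not in terms:
                    terms.append(variant)
        return terms
    return wrapper


# Stand-in for one of the get_words_from_foo() functions (assumed signature).
get_words_demo = with_umlaut_expansion(str.split)
print(get_words_demo("Brüning cruel"))
```

With this, indexing `Brüning' would also store `Bruening' and `Bruning',
while `cruel' yields just itself.  A record that only contains `Bruening'
would still not be indexed under `Brüning'; as far as I can see that is the
direction we agreed should not be automatic.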

> 
> Alternatively, we can try to be more fancy and attempt some
> language-specific analysis and treatment, so depending on the language
> of the document and/or of the field used, we would do various stuff to
> the text.

I would not be averse to doing something like this, but in our case, authorID 
will fix this for real.  

> 
> I think the former should be probably sufficient.  WDYT?


Yep.

> 
> Best regards
> -- 
> Tibor Simko

Travis C. Brooks
Manager of Information Systems & INSPIRE
SLAC National Accelerator Laboratory Library
http://inspirebeta.net
