On Nov 16, 2010, at 1:56 PM, Tibor Simko wrote:

>
>> Would it not make sense to have ue <-> u <-> ü as well. I know we've
>> talked about this before, but can someone remind me what our options
>> are here?
>
> This requires to put a kind of synonym expansion around
> get_words_from_foo() family of functions in the indexer so that one
> index term could generate several. This is both useful to have and
> straightforward to implement. However, we should muse some more about
> how far we would like to go here. E.g. the direction `ue' -> `u', `ü'
> should not be automatic, since it would not play nicely for words like
> `cruel'.

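For concreteness, here is a rough sketch of the one-directional expansion described in the quoted paragraph. It is not taken from the indexer: get_words_from_phrase() is a naive stand-in for the get_words_from_foo() family, and UMLAUT_VARIANTS, expand_term() and get_expanded_words() are hypothetical names made up for illustration. Only the umlaut is expanded, so a plain `ue' is never rewritten.

```python
import re
import unicodedata

# Hypothetical table: umlaut -> (bare vowel, vowel + e). The mapping runs
# in one direction only, so a plain `ue' (as in `cruel') is left alone.
UMLAUT_VARIANTS = {
    "ä": ("a", "ae"),
    "ö": ("o", "oe"),
    "ü": ("u", "ue"),
}


def get_words_from_phrase(phrase):
    """Naive stand-in for the indexer's get_words_from_foo() functions."""
    # Normalize to NFC so that `u' + combining diaeresis becomes one `ü'.
    phrase = unicodedata.normalize("NFC", phrase)
    return re.findall(r"\w+", phrase.lower())


def expand_term(word):
    """Return the word followed by its ASCII variants, without duplicates."""
    terms = [word]
    for style in (0, 1):  # 0: drop the umlaut, 1: replace it by vowel + e
        variant = "".join(
            UMLAUT_VARIANTS.get(ch, (ch, ch))[style] for ch in word
        )
        if variant not in terms:
            terms.append(variant)
    return terms


def get_expanded_words(phrase):
    """Wrap the tokenizer so that one word may yield several index terms."""
    words = []
    for word in get_words_from_phrase(phrase):
        words.extend(expand_term(word))
    return words
```

With this, a record containing `Brüning' would be indexed under brüning, bruning and bruening, while `cruel' stays a single term. A record that only contains `Bruening' would still not be found by a search for `Brüning' unless the query side is expanded the same way.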
The example that came up here was
Bruning, Bruening, Brüning

The point was made that Google gives the same results for _all three_. This turns out not to be true at all...but it was recognized as a "good idea"

>
> Alternatively, we can try to be more fancy and attempt some
> language-specific analysis and treatment, so depending on the language
> of the document and/or of the field used, we would do various stuff to
> the text.

I would not be averse to doing something like this, but in our case, authorID will fix this for real.

>
> I think the former should be probably sufficient. WDYT?

Yep.

>
> Best regards
> --
> Tibor Simko

Travis C. Brooks
Manager of Information Systems & INSPIRE
SLAC National Accelerator Laboratory Library
http://inspirebeta.net
