Public bug reported:

Binary package hint: tracker

System > Preferences > Indexing preferences > General tab > Stemming offers
a short list of languages, but you have to pick exactly one.  This is not a
realistic scenario in many locales.  For example, I routinely handle
stuff in three languages (English; my home language, Swedish; and the
majority language of the country where I live, Finnish), and so does
everyone in my family, including, soon enough, my daughter, who just
started school.

Not only is the absence of stemming for, say, Finnish problematic, but
being forced to choose English or Swedish stemming for Finnish documents
is likely to produce a large number of false-positive stems.  Searches
for Finnish words then return what seem like completely haphazard
matches in many cases, enough to make search useless in at least some
scenarios.

What happens if you change this setting later?  Does the indexer throw
away or redo all the stemming it has done so far?

What happens if your primary locale preference indicates a language
that is not on the list?  Would that be a workaround for disabling
stemming?

I do realize that coming up with a good fix for this is hard.  At a
minimum, it should be possible to index without any stemming.  Further
out, in wishlist territory, it would be nice if at some point the indexer
could try to establish the language of each document (ignoring for now
the can of worms that is multilingual documents; don't let any
philologists hear about this) and use an appropriate stemmer only if the
language can be established with reasonable certainty.  (Debian has a
package "mguesser" for stand-alone language identification, which is
also available as a library that ships with the mnogosearch search
engine.  Search for TextCat for some more suggestions.  Or ask me again
and be prepared for a veritable flood of bookmarks on the topic.)
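
To make the wishlist idea a little more concrete, here is a rough Python
sketch of TextCat-style identification (character n-gram profiles
compared with the Cavnar & Trenkle "out-of-place" rank distance), plus a
confidence cutoff that refuses to guess when two languages score too
close together.  This is not tracker code and not the mguesser or TextCat
API.  The function names, the profile size of 300 and the 10% margin are
arbitrary assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List, Optional


def ngram_profile(text: str, n_max: int = 3, top: int = 300) -> List[str]:
    """Rank the `top` most frequent character n-grams (lengths 1..n_max)."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # pad with '_' to mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return [gram for gram, _ in counts.most_common(top)]


def out_of_place(doc: List[str], lang: List[str]) -> int:
    """Sum of rank differences; n-grams missing from `lang` get a max penalty."""
    ranks = {gram: r for r, gram in enumerate(lang)}
    penalty = len(lang)
    return sum(abs(r - ranks[g]) if g in ranks else penalty
               for r, g in enumerate(doc))


def guess_language(text: str, profiles: Dict[str, List[str]],
                   min_margin: float = 0.1) -> Optional[str]:
    """Return the closest language, or None when the winner is not clear.

    min_margin is a hypothetical cutoff: the runner-up must be at least
    this fraction farther away than the best match.
    """
    doc = ngram_profile(text)
    if not doc or not profiles:
        return None  # nothing to go on
    scores = sorted((out_of_place(doc, prof), lang)
                    for lang, prof in profiles.items())
    if len(scores) == 1:
        return scores[0][1]
    (best, lang), (second, _) = scores[0], scores[1]
    if second == 0 or (second - best) / second < min_margin:
        return None  # too close to call: index this document unstemmed
    return lang
```

An indexer built along these lines would train one profile per supported
language from sample text, call guess_language() per document, pick the
matching stemmer only for a non-None result, and otherwise store the
words unstemmed.  Very short documents give tiny profiles, so a real
implementation would probably want a minimum length before trusting any
guess.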

** Affects: tracker (Ubuntu)
     Importance: Undecided
         Status: New

-- 
stemming language setting problematic
https://bugs.launchpad.net/bugs/157183
You received this bug notification because you are a member of Ubuntu
Bugs, which is the bug contact for Ubuntu.

-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs