https://bugzilla.wikimedia.org/show_bug.cgi?id=58701

--- Comment #4 from François Martin <frois...@gmail.com> ---
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > I know this is right for English, but maybe/probably not other languages.
> > 
> > This is right for French: apostrophes in this language basically mark the
> > elision of a vowel and act as a space.
> 
> The new search has a special filter to handle French elision. Here it is:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-elision-tokenfilter.html
> I'll crack open the code and see what it does when I start work on this bug.

This new filter seems great. (Your link doesn’t list “d’” among the articles it
strips; that will be worth checking when you hack on the code.)
I’ve run some search tests on frwikisource, and it appears that:

- the apostrophes “'” and “’” are indeed interchangeable in the new
Elasticsearch-based search: matches using the apostrophe typed in the search box
are ranked first, but matches using the other one are returned as well (e.g. the
search “l'art d'avoir raison stratagème” first returns a redirect page, but also
every occurrence of “L’Art d’avoir toujours raison”). I don’t think this is due
to the elision token filter, though: the search “Morestal lorsqu'il” returns the
same results as “Morestal lorsqu’il”, even though “lorsqu” is not in this filter;

- despite this filter, apostrophes after French stop words don’t seem to split
words either: the search “avoir toujours raison” doesn’t return “L’Art d’avoir
toujours raison”, and the query “art d’avoir toujours raison” returns it, but
“Art” is not shown in bold in the search result.
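To make the expected behavior concrete, here is a minimal Python sketch of what
an analysis chain with apostrophe normalization plus elision stripping would do
to these queries. It is not the Elasticsearch or CirrusSearch implementation:
the `tokenize` and `matches` helpers and the ELIDED_ARTICLES list are
hypothetical, chosen only for illustration.

```python
import re

# Hypothetical list of elided French articles (an assumption for this sketch,
# not the configuration actually used by the search backend).
ELIDED_ARTICLES = {
    "l", "d", "m", "t", "n", "s", "j", "c",
    "qu", "lorsqu", "puisqu", "jusqu", "quoiqu",
}


def tokenize(text: str) -> list[str]:
    """Lowercase, map the typographic apostrophe to ASCII, split into words,
    and drop an elided article in front of the first apostrophe."""
    text = text.lower().replace("\u2019", "'")
    tokens = []
    for word in re.findall(r"[\w']+", text):
        word = word.strip("'")          # ignore stray leading/trailing quotes
        if not word:
            continue
        prefix, sep, rest = word.partition("'")
        if sep and prefix in ELIDED_ARTICLES and rest:
            word = rest                 # "d'avoir" -> "avoir"
        tokens.append(word)
    return tokens


def matches(query: str, document: str) -> bool:
    """True if every query token appears among the document's tokens."""
    return set(tokenize(query)) <= set(tokenize(document))


if __name__ == "__main__":
    title = "L’Art d’avoir toujours raison"
    print(tokenize(title))
    print(matches("avoir toujours raison", title))
```

Under these assumptions, “avoir toujours raison” would match “L’Art d’avoir
toujours raison”, and “lorsqu'il” and “lorsqu’il” would produce the same tokens,
which is the behavior the second test above did not show.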

_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l