Or don't use stopwords. I haven't used stopwords for, oh, a dozen years or so.
Removing stopwords was a hack developed for 16-bit computers and 40 megabyte disks. We don't need to do that any more.

wunder

On Mar 13, 2013, at 8:28 AM, Ahmet Arslan wrote:

> I would merge stop_en.txt and stop_fr.txt. Use the same set of stop words
> for all fields that you search on.
>
> You might find this useful:
> http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/
>
> --- On Wed, 3/13/13, Burgmans, Tom <tom.burgm...@wolterskluwer.com> wrote:
>
>> From: Burgmans, Tom <tom.burgm...@wolterskluwer.com>
>> Subject: strange edismax parsing when searching in multiple fields (#TB)
>> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>> Date: Wednesday, March 13, 2013, 5:22 PM
>>
>> Hi group,
>>
>> Background:
>> I have a collection containing English and French documents.
>> I made sure to index the English content in field "body"
>> (fieldType=text_en) and the French content in field
>> "body_fr" (fieldType=text_fr).
>>
>> The user could be either English or French, so the goal is to
>> execute the queries against both fields simultaneously
>> without knowing the query language upfront. The query is
>> analyzed differently for each field. For both fields a
>> stopFilter is configured, each with its own list of stopwords
>> (different per language).
>>
>> The issue:
>> When I search for 'a result' (without single quotes) in
>> field "body" and "body_fr" at the same time, then "a" is
>> considered a stopword in English and removed for field
>> "body", but not in French, so both terms are still searched
>> inside "body_fr". What happens is that the query is parsed
>> (edismax) into this construction:
>>
>> ((body_fr:a)~1.0 (body:result | body_fr:result)~1.0)
>>
>> This query returns only French documents, although there are
>> many English documents in the index that contain the term
>> 'result' as well. How can that happen? I think it is related
>> to the way my query is parsed: there seems to be an
>> AND-relationship between (body_fr:a) and (body:result |
>> body_fr:result). There is no English document that has
>> (body_fr:a), so that's why they don't show up. For me a much
>> more logical parsed query would be:
>>
>> ((body:result)~1.0 | (body_fr:a body_fr:result)~1.0)
>>
>> How should I interpret this? Is it a bug in edismax? Is it
>> intended and if yes: why?
>>
>> Thanks for any hint,
>> Tom

--
Walter Underwood
wun...@wunderwood.org
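
For anyone who wants to see the effect from Tom's message in isolation, here is a rough Python sketch. It is a toy model, not Solr's actual code: the stop lists, the sample documents, and the helper names (analyze, index, build_clauses, matches, search) are all made up for illustration. It assumes the parser builds one clause per query term, that each clause is a disjunction over the fields whose analyzer kept the term, and that every clause is required, which is the AND-relationship Tom observed.

```python
# Toy model of per-field stopword handling in a dismax-style parser.
# Everything here is hypothetical; it only mimics the behaviour described
# in the thread and does not call or reproduce Solr.

def analyze(term, stopwords):
    """Lowercase a term and drop it (return None) if it is a stopword."""
    term = term.lower()
    return None if term in stopwords else term

def index(text, field, stopwords):
    """Index a text into one field, applying that field's stop list."""
    kept = (analyze(word, stopwords) for word in text.split())
    return {field: {t for t in kept if t is not None}}

def build_clauses(query, field_stopwords):
    """One clause per query term: the (field, term) pairs that survived."""
    clauses = []
    for raw in query.split():
        alternatives = []
        for field, stopwords in field_stopwords.items():
            term = analyze(raw, stopwords)
            if term is not None:
                alternatives.append((field, term))
        if alternatives:  # a term removed in every field yields no clause
            clauses.append(alternatives)
    return clauses

def matches(doc, clauses):
    """Assumption: all clauses are required; any alternative satisfies one."""
    return all(
        any(term in doc.get(field, set()) for field, term in clause)
        for clause in clauses
    )

def search(query, field_stopwords):
    """Return (clauses, english_doc_matches, french_doc_matches)."""
    english = index("a good result", "body", field_stopwords["body"])
    french = index("il y a un result", "body_fr", field_stopwords["body_fr"])
    clauses = build_clauses(query, field_stopwords)
    return clauses, matches(english, clauses), matches(french, clauses)

# Assumed stop lists: "a" is a stopword in English only.
SEPARATE = {"body": {"a", "the"}, "body_fr": {"le", "la"}}
# Ahmet's suggestion: the same merged list for every searched field.
MERGED = {"body": {"a", "the", "le", "la"}, "body_fr": {"a", "the", "le", "la"}}
# The suggestion at the top of this reply: no stopwords at all.
NO_STOPWORDS = {"body": set(), "body_fr": set()}

for name, config in [("separate", SEPARATE), ("merged", MERGED),
                     ("none", NO_STOPWORDS)]:
    clauses, english_hit, french_hit = search("a result", config)
    print(name, clauses, "english:", english_hit, "french:", french_hit)
```

With the separate lists, the first clause has only the body_fr alternative, so the English document cannot satisfy it. With a merged list, or with no stop lists, both fields produce the same clauses and both documents match.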