I would merge stop_en.txt and stop_fr.txt, and use the same set of stop words
for all fields that you search on.
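
A rough sketch of the merge, in case it helps. It assumes both files are plain
word lists (one stop word per line, "#" for comment lines); if yours use a
different format, the parsing needs adjusting. The output name stop_all.txt and
the function names are made up for illustration.

```python
from pathlib import Path


def load_stopwords(path):
    """Read one stop word per line, skipping blank lines and '#' comments."""
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.add(line)
    return words


def merge_stopwords(paths, out_path):
    """Write the sorted union of all word lists to out_path and return it."""
    merged = set()
    for path in paths:
        merged |= load_stopwords(path)
    Path(out_path).write_text("\n".join(sorted(merged)) + "\n", encoding="utf-8")
    return merged


# Example call (hypothetical output file name):
# merge_stopwords(["stop_en.txt", "stop_fr.txt"], "stop_all.txt")
```

Both text_en and text_fr would then point their stop filter at the merged
file, and a reindex is needed after changing the index-time analysis.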

You might find this useful:
http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/
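
As I understand the gotcha, it can be reproduced with a toy model. This is not
Solr code, only a sketch of the idea: each query term becomes one clause that
lists the fields where the term survives analysis, and every clause must match.
The "every clause is required" part is an assumption (for example mm=100% or
q.op=AND); your message does not say which settings you use. The stop word
lists and documents below are illustrative only.

```python
def analyze(term, stopwords):
    """Return the lowercased term, or None if the stop filter removes it."""
    term = term.lower()
    return None if term in stopwords else term


def build_clauses(query, fields, stopwords_by_field):
    """One clause per query term: the (field, term) pairs that survive analysis."""
    clauses = []
    for raw in query.split():
        alternatives = []
        for field in fields:
            term = analyze(raw, stopwords_by_field[field])
            if term is not None:
                alternatives.append((field, term))
        if alternatives:  # a term removed in every field leaves no clause
            clauses.append(alternatives)
    return clauses


def matches(doc, clauses):
    """Assumption: all clauses are required; any field inside a clause may match."""
    return all(
        any(term in doc.get(field, set()) for field, term in clause)
        for clause in clauses
    )


FIELDS = ["body", "body_fr"]
PER_LANGUAGE = {"body": {"a", "the", "of"}, "body_fr": {"le", "la", "de"}}
MERGED = PER_LANGUAGE["body"] | PER_LANGUAGE["body_fr"]
SHARED = {"body": MERGED, "body_fr": MERGED}

english_doc = {"body": {"result"}}
french_doc = {"body_fr": {"a", "result"}}

separate = build_clauses("a result", FIELDS, PER_LANGUAGE)
shared = build_clauses("a result", FIELDS, SHARED)

print(separate)                        # clause for "a" lists body_fr only
print(matches(english_doc, separate))  # English doc cannot satisfy body_fr:a
print(matches(english_doc, shared))    # with one shared list the clause is gone
```

With separate lists the first clause can only be satisfied through body_fr,
which is the shape of the parsed query you posted. With a shared list, "a" is
dropped for both fields and only the "result" clause remains.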

--- On Wed, 3/13/13, Burgmans, Tom <tom.burgm...@wolterskluwer.com> wrote:

> From: Burgmans, Tom <tom.burgm...@wolterskluwer.com>
> Subject: strange edismax parsing when searching in multiple fields (#TB)
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Date: Wednesday, March 13, 2013, 5:22 PM
> Hi group,
> 
> Background:
> I have a collection containing English and French documents.
> I made sure to index the English content in field "body"
> (fieldType=text_en) and the French content in field
> "body_fr" (fieldType=text_fr).
> 
> The user could be either English or French so the goal is to
> execute the queries against both fields simultaneously
> without knowing the query language upfront. The query is
> analyzed differently for each field. For both fields a
> stopFilter is configured, each with its own list of stopwords
> (different per language).
> 
> The issue:
> When I search for 'a result' (without single quotes) in
> field "body" and "body_fr" at the same time, then "a" is
> considered a stopword in English and removed for field
> "body", but not in French so both terms are still searched
> inside "body_fr". What happens is that the query is parsed
> (edismax) into this construction:
> 
> ((body_fr:a)~1.0 (body:result | body_fr:result)~1.0)
> 
> This query returns only French documents, although there are
> many English documents in the index that contain the term
> 'result' as well. How can that happen? I think it is related
> to the way my query is parsed: there seems to be an
> AND-relationship between (body_fr:a) and (body:result |
> body_fr:result). There is no English document that has
> (body_fr:a), so that's why they don't show up. To me a much
> more logical parsed query would be:
> 
> ((body:result)~1.0 | (body_fr:a body_fr:result)~1.0)
> 
> How should I interpret this? Is it a bug in edismax? Is it
> intended and if yes: why?
> 
> Thanks for any hint,
> Tom
> 
