Hi Tom,

I don't use stop word removal either. For highlighting, I use the hl.q parameter, fed with only the "meaningful words":
http://wiki.apache.org/solr/HighlightingParameters#hl.q
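A minimal sketch of that idea, in case it helps: search with the full user query, but build hl.q from the non-stopword terms only. The stopword set and the build_params helper are hypothetical names for illustration; the field names "body" and "body_fr" come from your mail. How hl.q is parsed depends on the Solr version and configuration, so the field-qualified form below is an assumption, not the only way to do it.

```python
from urllib.parse import urlencode

# Hypothetical stopword set; in practice this could be loaded from
# stop_en.txt / stop_fr.txt.
STOPWORDS = {"a", "an", "the", "of", "le", "la", "les", "de"}


def build_params(user_query, fields=("body", "body_fr")):
    """Build Solr request parameters: full query for searching,
    stopword-free query for highlighting."""
    meaningful = [w for w in user_query.split() if w.lower() not in STOPWORDS]
    params = {
        "q": user_query,            # stopwords are kept for searching
        "defType": "edismax",
        "qf": " ".join(fields),
        "hl": "true",
        "hl.fl": ",".join(fields),
    }
    if meaningful:
        # Field-qualified terms, so hl.q does not rely on a default field.
        params["hl.q"] = " ".join(
            f"{field}:{word}" for field in fields for word in meaningful
        )
    return params


params = build_params("a result")
query_string = urlencode(params)  # append to the select handler URL
```

If the query consists of stopwords only, hl.q is simply left out and highlighting falls back to q.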
--- On Wed, 3/13/13, Burgmans, Tom <tom.burgm...@wolterskluwer.com> wrote:

> From: Burgmans, Tom <tom.burgm...@wolterskluwer.com>
> Subject: RE: [SPAM] Re: strange edismax parsing when searching in multiple fields (#TB)
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Date: Wednesday, March 13, 2013, 5:55 PM
>
> The main reason of using stopwords is to speed up query performance, since we see that a huge part is consumed by highlighting stopwords. Also when reading the full highlighted document, we think that it makes a document better readable when only meaningful words are highlighted.
>
> For searching in fact I like to keep stopwords...
>
> -----Original Message-----
> From: Walter Underwood [mailto:wun...@wunderwood.org]
> Sent: Wednesday 13 March 2013 04:43
> To: solr-user@lucene.apache.org
> Subject: [SPAM] Re: strange edismax parsing when searching in multiple fields (#TB)
> Importance: Low
>
> Or don't use stopwords. I haven't used stopwords for, oh, a dozen years or so.
>
> Removing stopwords was a hack developed for 16-bit computers and 40 megabyte disks. We don't need to do that any more.
>
> wunder
>
> On Mar 13, 2013, at 8:28 AM, Ahmet Arslan wrote:
>
> > I would merge stop_en.txt and stop_fr.txt. Use same set of stop words for all fields that you search on.
> >
> > You might find this useful :
> > http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/
> >
> > --- On Wed, 3/13/13, Burgmans, Tom <tom.burgm...@wolterskluwer.com> wrote:
> >
> >> From: Burgmans, Tom <tom.burgm...@wolterskluwer.com>
> >> Subject: strange edismax parsing when searching in multiple fields (#TB)
> >> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> >> Date: Wednesday, March 13, 2013, 5:22 PM
> >>
> >> Hi group,
> >>
> >> Background:
> >> I have a collection containing English and French documents.
> >> I made sure to index the English content in field "body" (fieldType=text_en) and the French content in field "body_fr" (fieldType=text_fr).
> >>
> >> The user could be either English or French so the goal is to execute the queries against both fields simultaneously without knowing the query language upfront. The query is analyzed differently for each field. For both fields a stopFilter is configured, each with its own list of stopwords (different per language).
> >>
> >> The issue:
> >> When I search for 'a result' (without single quotes) in field "body" and "body_fr" at the same time, then "a" is considered a stopword in English and removed for field "body", but not in French so both terms are still searched inside "body_fr". What happens is that the query is parsed (edismax) into this construction:
> >>
> >> ((body_fr:a)~1.0 (body:result | body_fr:result)~1.0)
> >>
> >> This query returns only French documents, although there are many English documents in the index that contain the term 'result' as well. How can that happen? I think it is related to the way my query is parsed: there seems to be an AND-relationship between (body_fr:a) and (body:result | body_fr:result). There is no English document that has (body_fr:a), so that's why they don't show up. For me a much more logical parsed query would be:
> >>
> >> ((body:result)~1.0 | (body_fr:a body_fr:result)~1.0)
> >>
> >> How should I interpret this? Is it a bug in edismax? Is it intended and if yes: why?
> >>
> >> Thanks for any hint,
> >> Tom
> >>
> >> This email and any attachments may contain confidential or privileged information
> >> and is intended for the addressee only.
> >> If you are not the intended recipient, please immediately notify us by email or telephone and delete the original email and attachments without using, disseminating or reproducing its contents to anyone other than the intended recipient. Wolters Kluwer shall not be liable for the incorrect or incomplete transmission of this email or any attachments, nor for unauthorized use by its employees.
> >>
> >> Wolters Kluwer nv has its registered address in Alphen aan den Rijn, The Netherlands, and is registered with the Trade Registry of the Dutch Chamber of Commerce under number 33202517.
> >>
>
> --
> Walter Underwood
> wun...@wunderwood.org
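One possible reading of the parsed query quoted in this thread, as a toy model rather than Solr code: edismax builds one clause per query term, and each clause spans only the fields where that term survived analysis. The sketch assumes every clause is effectively required (for example mm=100% or q.op=AND; the thread does not state the actual setting). The stopword lists, document contents and helper names are hypothetical.

```python
# Toy model of per-field stopword removal in a dismax-style parser.
# Assumption: every top-level clause must match (mm=100% behaviour).

PER_FIELD = {"body": {"a", "the"}, "body_fr": {"le", "la"}}  # hypothetical lists
MERGED = {f: {"a", "the", "le", "la"} for f in PER_FIELD}    # merged-list variant


def parse(query, fields, stopwords):
    """One clause per term; a clause lists the (field, term) pairs
    that survive that field's stop filter."""
    clauses = []
    for term in query.lower().split():
        alternatives = [(f, term) for f in fields if term not in stopwords[f]]
        if alternatives:  # a term removed in every field yields no clause
            clauses.append(alternatives)
    return clauses


def matches(doc, clauses):
    """All clauses required; within a clause any one field may match."""
    return all(
        any(term in doc.get(field, set()) for field, term in clause)
        for clause in clauses
    )


# Indexed terms per document (stopwords already removed at index time).
docs = {
    "en-1": {"body": {"good", "result"}},
    "fr-1": {"body_fr": {"il", "a", "un", "result"}},
}
fields = ["body", "body_fr"]

per_field_hits = [d for d, doc in docs.items()
                  if matches(doc, parse("a result", fields, PER_FIELD))]
merged_hits = [d for d, doc in docs.items()
               if matches(doc, parse("a result", fields, MERGED))]
```

With per-field lists, "a" survives only in body_fr, so the clause (body_fr:a) can never match an English document and only "fr-1" is returned. With one merged list the clause for "a" disappears entirely and both documents match, which is the effect of using the same stop words for all searched fields.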