Thanks a lot, Ahmet. I’ve just read up on the query fields (qf) parameter and it sounds good. Since the field contents are currently all identical, I can’t really test it yet.
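For reference, this is roughly how such a request could be put together. It is only a minimal sketch: the host, port and core name ("windex") are taken from the request URL in the original post quoted below, the field names are the three fields defined there, and the helper name build_edismax_url is made up for illustration. It only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Assumption: host, port and core name are taken from the request URL
# in the original post; adjust them for your own installation.
SOLR_SELECT = "http://localhost:8983/solr/windex/select"


def build_edismax_url(user_query, fields):
    """Build a select URL that searches `fields` via the edismax qf parameter.

    `build_edismax_url` is a hypothetical helper, not part of any Solr client.
    """
    params = {
        "q": user_query,
        "defType": "edismax",    # use the extended dismax query parser
        "qf": " ".join(fields),  # query fields, separated by spaces
        "wt": "json",
        "indent": "true",
    }
    return SOLR_SELECT + "?" + urlencode(params)


# Standard search: only the field with the plain tokenizer.
standard_url = build_edismax_url("Sprache", ["original"])

# Expanded search: the two copyField targets.
expanded_url = build_edismax_url("Sprachen", ["stopwords_removed", "expanded"])

print(standard_url)
print(expanded_url)
```

The point of the sketch is that the choice of search type moves into qf (which fields the query terms are matched against), whereas fq would only filter the result set with a separate query.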
Cheers,
Martin

> On 25.03.2015 at 21:27, Ahmet Arslan <iori...@yahoo.com.INVALID> wrote:
>
> Hi Martin,
>
> fq means filter query. Maybe you want to use the qf (query fields) parameter
> of edismax?
>
>
> On Wednesday, March 25, 2015 9:23 PM, Martin Wunderlich <martin...@gmx.net>
> wrote:
> Hi all,
>
> I am wondering what the process is for applying Tokenizers and Filters (as
> defined in the FieldType definition) to field contents that result from
> CopyFields. To be more specific, in my Solr instance, I would like to support
> query expansion by two means: removing stop words and adding inflected word
> forms as synonyms.
>
> To use a specific example, let's say I have the following sentence to be
> indexed (from a Wittgenstein manuscript):
>
> "Was zum Wesen der Welt gehört, kann die Sprache nicht ausdrücken."
>
> This sentence will be indexed in a field called "original" that is defined as
> follows:
>
> <field name="original" type="text_original" indexed="true" stored="true"
>        required="true"/>
>
> <fieldType name="text_windex_original" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>   </analyzer>
> </fieldType>
>
> Then, in order to create fields for the two types of query expansion, I have
> set up specific fields for this:
>
> - one field where stopwords are removed both on the indexed content and the
> query. So, if the user is searching for a phrase like "der Sprache", Solr
> should still find the segment above, because the determiners ("der" and
> "die") are removed prior to indexing and prior to querying, respectively.
> This field is defined as follows:
>
> <field name="stopwords_removed" type="text_stopwords_removed" indexed="true"
>        stored="true" required="true"/>
>
> <fieldType name="text_stopwords_removed" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords_de.txt" format="snowball"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords_de.txt" format="snowball"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> - a second field where synonyms are added to the query so that more segments
> will be found. For instance, if the user is searching for the plural form
> "Sprachen", Solr should return the segment above, due to this entry in the
> synonyms file: "Sprache,Sprach,Sprachen".
> This field is defined as follows:
>
> <field name="expanded" type="text_multiplied" indexed="true" stored="true"
>        required="true"/>
>
> <fieldType name="text_expanded" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords_de.txt" format="snowball"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords_de.txt" format="snowball"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms_de.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> Finally, to avoid having to specify three fields with identical content in
> the import documents, I am defining the two fields for query expansion as
> copyFields:
>
> <copyField source="original" dest="stopwords_removed"/>
> <copyField source="original" dest="expanded"/>
>
> Now, my expectation would be as follows:
> - during import, two temporary fields are created by copying content from the
> original field
> - these two temporary fields are then pre-processed as per the definitions
> above
> - the pre-processed version of the text is added to the index
> - then, the user can search for "Sprache", "sprache", "Sprachen" or "der
> Sprache" and will always get the segment above as a matching result.
>
> However, what actually happens is that I get matches only for "Sprache" and
> "sprache".
>
> The other thing that strikes me as odd is that when I restrict the search to
> one of the fields only using the "fq" parameter, I get no results.
> For instance:
> http://localhost:8983/solr/windex/select?q=Sprache&fq=original&wt=json&indent=true
>
> will return no matches. I would have expected that, using the fq parameter,
> the user can specify what type of search (s)he would like to carry out: a
> standard search (field original) or an expanded search (one of the other two
> fields).
>
> For debugging, I have checked the analysis and the results seem OK (posted
> below).
> Apologies for the long post, but I am really a bit stuck here (even after
> doing a lot of reading and googling). It is probably something simple that I
> am missing.
> Thanks a lot in advance for any help.
>
> Cheers,
>
> Martin
>
>
> ST:  Was zum Wesen der Welt gehört kann die Sprache nicht ausdrücken
> SF:  Was zum Wesen Welt gehört kann die Sprache nicht ausdrücken
> LCF: was zum wesen welt gehört kann die sprache nicht ausdrücken