Re: Re: Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym
I don't think you can synonym-ize both the multi-token phrase and each individual token in the multi-token phrase at the same time. But anyone else feel free to chime in!

Best,
Audrey Lorberfeld

On 3/16/20, 12:40 PM, "atin janki" wrote:

    I aim to achieve an expansion like -

    Synonym(soap powder) + Synonym(soap) + Synonym(powder)

    which is not happening because of the way synonym expansion is currently done.

    At the moment, using Synonym Graph Filter with StandardTokenizer and sow=false expands as -

    Synonym(soap powder)

    because "soap powder" is a multi-word synonym present in the synonym file.

    Using sow=true in the above setting will give -

    Synonym(soap) + Synonym(powder)

    Best Regards,
    Atin Janki
Re: Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym
I aim to achieve an expansion like -

Synonym(soap powder) + Synonym(soap) + Synonym(powder)

which is not happening because of the way synonym expansion is currently done.

At the moment, using Synonym Graph Filter with StandardTokenizer and sow=false expands as -

Synonym(soap powder)

because "soap powder" is a multi-word synonym present in the synonym file.

Using sow=true in the above setting will give -

Synonym(soap) + Synonym(powder)

Best Regards,
Atin Janki

On Mon, Mar 16, 2020 at 5:27 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote:

> To confirm, you want a synonym like "soap powder" to map onto synonyms
> like "hand soap," "hygiene products," etc? As in, more of a cognitive
> synonym mapping where you feed synonyms that only apply to the multi-token
> phrase as a whole?
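The three expansions described in this message can be illustrated with a small toy model. This is only a sketch and not how Solr implements synonym expansion: the function names are made up for illustration, the multi-word entry is taken from the synonym file quoted in the thread, and the single-word entries for "soap" and "powder" are hypothetical placeholders.

```python
# Toy model of the expansions discussed in the thread; NOT Solr's implementation.
SYNONYMS = {
    # From the synonym file quoted in the thread.
    "soap powder": ["built-soap powder", "washing powder"],
    # Hypothetical single-word entries, for illustration only.
    "soap": ["detergent"],
    "powder": ["pulverization"],
}


def expand(term):
    """Return the term followed by its synonyms, if any."""
    return [term] + SYNONYMS.get(term, [])


def expand_sow_true(query):
    """Model of sow=true: split on whitespace, expand each token on its own."""
    return [alt for token in query.split() for alt in expand(token)]


def expand_sow_false(query):
    """Model of sow=false: the whole string is matched as a multi-word synonym.

    Simplification: fall back to per-token expansion when no phrase entry exists.
    """
    if query in SYNONYMS:
        return expand(query)
    return expand_sow_true(query)


def expand_desired(query):
    """The expansion asked for: phrase synonyms plus per-token synonyms."""
    combined = expand_sow_false(query) + expand_sow_true(query)
    return list(dict.fromkeys(combined))  # de-duplicate, keep order


print(expand_sow_false("soap powder"))
print(expand_sow_true("soap powder"))
print(expand_desired("soap powder"))
```

In this model the desired result is simply the union of the two existing behaviours, which is what neither sow setting produces on its own.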
Re: Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym
To confirm, you want a synonym like "soap powder" to map onto synonyms like "hand soap," "hygiene products," etc? As in, more of a cognitive synonym mapping where you feed synonyms that only apply to the multi-token phrase as a whole?

On 3/16/20, 12:17 PM, "atin janki" wrote:

    Using sow=true does split the query on whitespace, but then it will not
    look for synonyms of "soap powder" anymore; rather, it expands separate
    synonyms for "soap" and "powder".

    Best Regards,
    Atin Janki
Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym
Using sow=true does split the query on whitespace, but then it will not look for synonyms of "soap powder" anymore; rather, it expands separate synonyms for "soap" and "powder".

Best Regards,
Atin Janki

On Mon, Mar 16, 2020 at 4:59 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote:

> Have you set sow=true in your search handler? I know that we have it set
> to false (sow = split on whitespace) because we WANT multi-token synonyms
> retained as multiple tokens.
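For comparing the two behaviours side by side, sow can also be passed as a request parameter rather than set in the handler defaults. The following is a minimal sketch that only builds the two request URLs; the host, collection name and default field are assumptions, not details taken from this thread.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace host and collection with the real ones.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def build_query_url(user_query, sow, field="text"):
    """Build a select URL with an explicit sow (split on whitespace) setting.

    `field` is a hypothetical default search field passed as df.
    """
    params = {
        "q": user_query,
        "df": field,
        "sow": "true" if sow else "false",
    }
    return SOLR_SELECT + "?" + urlencode(params)


# One URL per setting, so the two result lists can be compared.
print(build_query_url("soap powder", sow=True))
print(build_query_url("soap powder", sow=False))
```

Running both requests against the same index shows which documents are matched only through the per-token expansion and which only through the multi-word synonym.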
Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym
Have you set sow=true in your search handler? I know that we have it set to false (sow = split on whitespace) because we WANT multi-token synonyms retained as multiple tokens.

On 3/16/20, 10:49 AM, "atin janki" wrote:

    Hello everyone,

    I am using Solr 8.3.

    After I included Synonym Graph Filter in my managed-schema file, I
    have noticed that if the query string contains a multi-word synonym,
    it treats that multi-word synonym as a single term and does not
    split it, which suppresses the default search behaviour.

    I am using StandardTokenizer.

    Below is a snippet from managed-schema file -

    > [XML element names stripped by the archive; only these attribute fragments survive, in order]
    > positionIncrementGap="100" multiValued="true">
    > words="stopwords.txt" ignoreCase="true"/>
    > words="stopwords.txt" ignoreCase="true"/>
    > expand="true" ignoreCase="true" synonyms="synonyms.txt"/>

    Here "*soap powder*" is the search *query*, which is also a multi-word
    synonym in the synonym file as -

    > s(104254535,1,'soap powder',n,1,1).
    > s(104254535,2,'built-soap powder',n,1,0).
    > s(104254535,3,'washing powder',n,1,0).

    I am sharing some screenshots for understanding the problem -

    *without* Synonym Graph Filter => 2 docs returned (screenshot at the URL below) -

    https://urldefense.proofpoint.com/v2/url?u=https-3A__ibb.co_zQXx7mV=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=20lvJFDIjFQqyiTdHseNNeSlDRT2YSznQPoQnxGJQfM=QUaaR69psn7pqa3DtaC7MrTMFstQrQHgeuY0qeQTc0k=

    *with* Synonym Graph Filter => 2 docs expected, only 1 returned (screenshot at the URL below) -

    https://urldefense.proofpoint.com/v2/url?u=https-3A__ibb.co_tp04Rzw=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=20lvJFDIjFQqyiTdHseNNeSlDRT2YSznQPoQnxGJQfM=pLPVuD71W1IhokvFuu4F672lX8Nk07b0X9pCVETRjks=

    Has anyone experienced this before? If yes, is there any workaround?
    Or is it an expected behaviour?

    Regards,
    Atin Janki
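The synonym entries quoted in the original post appear to be in WordNet prolog form, where each `s(...)` fact carries a synset id followed by a word number and the word itself. As a rough sketch of how those three lines group into one synonym set, the helper below (a hypothetical name, not part of Solr) collects words by synset id; it is only an illustration and makes no claim about how the filter in this schema actually reads the file.

```python
import re
from collections import defaultdict

# s(synset_id, w_num, 'word', ss_type, sense_number, tag_count).
S_FACT = re.compile(r"^s\((\d+),(\d+),'(.*)',([a-z]),(\d+),(\d+)\)\.$")


def group_by_synset(lines):
    """Group words by synset id; lines that do not match are skipped."""
    groups = defaultdict(list)
    for line in lines:
        match = S_FACT.match(line.strip())
        if match:
            synset_id, word = match.group(1), match.group(3)
            groups[synset_id].append(word)
    return dict(groups)


# The three entries quoted in the original post.
entries = [
    "s(104254535,1,'soap powder',n,1,1).",
    "s(104254535,2,'built-soap powder',n,1,0).",
    "s(104254535,3,'washing powder',n,1,0).",
]
print(group_by_synset(entries))
```

All three words share the synset id 104254535, so they form a single group of mutually equivalent multi-word terms.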