[jira] [Updated] (SOLR-10423) ShingleFilter causes overly restrictive queries to be produced
[ https://issues.apache.org/jira/browse/SOLR-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe updated SOLR-10423:
--
    Fix Version/s: 6.6
                   master (7.0)

> ShingleFilter causes overly restrictive queries to be produced
> --
>
>                 Key: SOLR-10423
>                 URL: https://issues.apache.org/jira/browse/SOLR-10423
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: query parsers
>    Affects Versions: 6.5
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>             Fix For: master (7.0), 6.6, 6.5.1
>
>         Attachments: SOLR-10423.patch
>
>
> When {{sow=false}} and {{ShingleFilter}} is included in the query analyzer,
> {{QueryBuilder}} produces queries that inappropriately require sequential
> terms. E.g. the query "A B C" produces {{(+A_B +B_C) A_B_C}} when the query
> analyzer includes {{ maxShingleSize="3" outputUnigrams="false" tokenSeparator="_"/>}}.
> Aman Deep Singh reported this problem on the solr-user list. From
> [http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201703.mbox/%3ccanegtx9bwbpwqc-cxieac7qsas7x2tgzovomy5ztiagco1p...@mail.gmail.com%3e]:
> {quote}
> I was trying to use the shingle filter but it was not creating the query as
> desirable.
> my schema is
> {noformat}
> positionIncrementGap="100">
>
>
> maxShingleSize="4"/>
>
>
>
>
> {noformat}
> my solr query is
> {noformat}
> http://localhost:8983/solr/productCollection/select?
> defType=edismax
> =true
> =one%20plus%20one%20four
> =nameShingle
> =false
> =xml
> {noformat}
> and it was creating the parsed query as
> {noformat}
>
> (+(DisjunctionMaxQuery(((+nameShingle:one plus +nameShingle:plus one
> +nameShingle:one four))) DisjunctionMaxQuery(((+nameShingle:one plus
> +nameShingle:plus one four))) DisjunctionMaxQuery(((+nameShingle:one plus one
> +nameShingle:one four))) DisjunctionMaxQuery((nameShingle:one plus one
> four)))~1)/no_coord
>
>
> *++nameShingle:one plus +nameShingle:plus one +nameShingle:one four))
> ((+nameShingle:one plus +nameShingle:plus one four)) ((+nameShingle:one
> plus one +nameShingle:one four)) (nameShingle:one plus one four))~1)*
>
> {noformat}
> So ideally token creations is perfect but in the query it is using boolean +
> operator which is causing the problem as if i have a document with name as
> "one plus one" ,according to the shingles it has to matched as its token will
> be ("one plus","one plus one","plus one") .
> I have tried using the q.op and played around the mm also but nothing is
> giving me the correct response.
> Any idea how i can fetch that document even if the document is missing any
> token.
> My expected response will be getting the document "one plus one" even the
> user query has any additional term like "one plus one two" and so on.
> {quote}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
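The behaviour reported in the issue description can be reproduced with a small model. The Python sketch below is a simplified illustration only, not Lucene's {{QueryBuilder}}: the helper names `shingles` and `paths` are hypothetical, and the chaining rule (a shingle of k words starting at position p is followed by one starting at p + k - 1) is an assumption inferred from the parsed query quoted in the report. It shows why a document named "one plus one" is not matched by the query "one plus one four" when every shingle on a path is required.

```python
def shingles(tokens, min_size=2, max_size=4):
    """Return every word n-gram of min_size..max_size words (no unigrams)."""
    out = set()
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.add(" ".join(tokens[start:start + size]))
    return out


def paths(tokens, min_size=2, max_size=4, start=0):
    """Enumerate shingle chains that run from `start` to the last token.

    Assumed chaining rule: a shingle of k words at position p is followed
    by a shingle at position p + k - 1; a chain ends when a shingle
    reaches the final token. Requires min_size >= 2.
    """
    result = []
    for size in range(min_size, max_size + 1):
        end = start + size
        if end > len(tokens):
            break
        shingle = " ".join(tokens[start:end])
        if end == len(tokens):
            result.append([shingle])
        else:
            for rest in paths(tokens, min_size, max_size, end - 1):
                result.append([shingle] + rest)
    return result


# Shingles indexed for the document from the report.
doc = shingles("one plus one".split())

# Chains produced for the user query; each chain models one "+a +b ..." group.
query_paths = paths("one plus one four".split())

# Reported behaviour: every shingle in a chain is required (MUST).
strict_match = any(all(s in doc for s in path) for path in query_paths)

# Behaviour the reporter expected: any single shingle may match (SHOULD).
relaxed_match = any(s in doc for path in query_paths for s in path)

print(query_paths)
print("strict:", strict_match, "relaxed:", relaxed_match)
```

Under these assumptions the query yields four chains, each of which contains a shingle ending in "four", so the strict form cannot match a document that lacks that word, while the relaxed form matches on "one plus", "plus one" or "one plus one".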
[jira] [Updated] (SOLR-10423) ShingleFilter causes overly restrictive queries to be produced
[ https://issues.apache.org/jira/browse/SOLR-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe updated SOLR-10423:
--
    Attachment: SOLR-10423.patch

Patch with suggested fix and tests: on Solr {{}} allows functional queries over ShingleFilter'd fields.

Running tests and precommit now.

I'd like to include this in Solr 6.5.1.

> ShingleFilter causes overly restrictive queries to be produced
> --
>
>                 Key: SOLR-10423
>                 URL: https://issues.apache.org/jira/browse/SOLR-10423
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: query parsers
>    Affects Versions: 6.5
>            Reporter: Steve Rowe
>         Attachments: SOLR-10423.patch
[jira] [Updated] (SOLR-10423) ShingleFilter causes overly restrictive queries to be produced
[ https://issues.apache.org/jira/browse/SOLR-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe updated SOLR-10423:
--
    Affects Version/s: 6.5

> ShingleFilter causes overly restrictive queries to be produced
> --
>
>                 Key: SOLR-10423
>                 URL: https://issues.apache.org/jira/browse/SOLR-10423
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: query parsers
>    Affects Versions: 6.5
>            Reporter: Steve Rowe
[jira] [Updated] (SOLR-10423) ShingleFilter causes overly restrictive queries to be produced
[ https://issues.apache.org/jira/browse/SOLR-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe updated SOLR-10423:
--
    Component/s: query parsers

> ShingleFilter causes overly restrictive queries to be produced
> --
>
>                 Key: SOLR-10423
>                 URL: https://issues.apache.org/jira/browse/SOLR-10423
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: query parsers
>    Affects Versions: 6.5
>            Reporter: Steve Rowe