[jira] [Updated] (SOLR-10423) ShingleFilter causes overly restrictive queries to be produced

2017-04-05 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-10423:
--
Fix Version/s: 6.6
   master (7.0)

> ShingleFilter causes overly restrictive queries to be produced
> --
>
> Key: SOLR-10423
> URL: https://issues.apache.org/jira/browse/SOLR-10423
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 6.5
>Reporter: Steve Rowe
>Assignee: Steve Rowe
> Fix For: master (7.0), 6.6, 6.5.1
>
> Attachments: SOLR-10423.patch
>
>
> When {{sow=false}} and {{ShingleFilter}} is included in the query analyzer, 
> {{QueryBuilder}} produces queries that inappropriately require sequential 
> terms.  E.g. the query "A B C" produces {{(+A_B +B_C) A_B_C}} when the query 
> analyzer includes {{ maxShingleSize="3" outputUnigrams="false" tokenSeparator="_"/>}}.
> Aman Deep Singh reported this problem on the solr-user list. From 
> [http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201703.mbox/%3ccanegtx9bwbpwqc-cxieac7qsas7x2tgzovomy5ztiagco1p...@mail.gmail.com%3e]:
> {quote}
> I was trying to use the shingle filter but it was not creating the query as
> desirable.
> my schema is
> {noformat}
>  positionIncrementGap="100">
>   
> 
>  maxShingleSize="4"/>
> 
>   
> 
> 
> {noformat}
> my solr query is
> {noformat}
> http://localhost:8983/solr/productCollection/select?
>  defType=edismax
> =true
> =one%20plus%20one%20four
> =nameShingle
> =false
> =xml
> {noformat}
> and it was creating the parsed query as
> {noformat}
> 
> (+(DisjunctionMaxQuery(((+nameShingle:one plus +nameShingle:plus one
> +nameShingle:one four))) DisjunctionMaxQuery(((+nameShingle:one plus
> +nameShingle:plus one four))) DisjunctionMaxQuery(((+nameShingle:one plus one 
> +nameShingle:one four))) DisjunctionMaxQuery((nameShingle:one plus one 
> four)))~1)/no_coord
> 
> 
> *++nameShingle:one plus +nameShingle:plus one +nameShingle:one four))
> ((+nameShingle:one plus +nameShingle:plus one four)) ((+nameShingle:one
> plus one +nameShingle:one four)) (nameShingle:one plus one four))~1)*
> 
> {noformat}
> So ideally token creations is perfect but in the query it is using boolean + 
> operator which is causing the problem as if i have a document with name as 
> "one plus one" ,according to the shingles it has to matched as its token will 
> be  ("one plus","one plus one","plus one") .
> I have tried using the q.op and played around the mm also but nothing is
> giving me the correct response.
> Any idea how i can fetch that document even if the document is missing any
> token.
> My expected response will be getting the document "one plus one" even the 
> user query has any additional term like "one plus one two" and so on.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10423) ShingleFilter causes overly restrictive queries to be produced

2017-04-05 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-10423:
--
Attachment: SOLR-10423.patch

Patch with suggested fix and tests: on Solr {{}} allows functional queries over ShingleFilter'd 
fields.

Running tests and precommit now.  I'd like to include this in Solr 6.5.1.

> ShingleFilter causes overly restrictive queries to be produced
> --
>
> Key: SOLR-10423
> URL: https://issues.apache.org/jira/browse/SOLR-10423
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 6.5
>Reporter: Steve Rowe
> Attachments: SOLR-10423.patch
>
>
> When {{sow=false}} and {{ShingleFilter}} is included in the query analyzer, 
> {{QueryBuilder}} produces queries that inappropriately require sequential 
> terms.  E.g. the query "A B C" produces {{(+A_B +B_C) A_B_C}} when the query 
> analyzer includes {{ maxShingleSize="3" outputUnigrams="false" tokenSeparator="_"/>}}.
> Aman Deep Singh reported this problem on the solr-user list. From 
> [http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201703.mbox/%3ccanegtx9bwbpwqc-cxieac7qsas7x2tgzovomy5ztiagco1p...@mail.gmail.com%3e]:
> {quote}
> I was trying to use the shingle filter but it was not creating the query as
> desirable.
> my schema is
> {noformat}
>  positionIncrementGap="100">
>   
> 
>  maxShingleSize="4"/>
> 
>   
> 
> 
> {noformat}
> my solr query is
> {noformat}
> http://localhost:8983/solr/productCollection/select?
>  defType=edismax
> =true
> =one%20plus%20one%20four
> =nameShingle
> =false
> =xml
> {noformat}
> and it was creating the parsed query as
> {noformat}
> 
> (+(DisjunctionMaxQuery(((+nameShingle:one plus +nameShingle:plus one
> +nameShingle:one four))) DisjunctionMaxQuery(((+nameShingle:one plus
> +nameShingle:plus one four))) DisjunctionMaxQuery(((+nameShingle:one plus one 
> +nameShingle:one four))) DisjunctionMaxQuery((nameShingle:one plus one 
> four)))~1)/no_coord
> 
> 
> *++nameShingle:one plus +nameShingle:plus one +nameShingle:one four))
> ((+nameShingle:one plus +nameShingle:plus one four)) ((+nameShingle:one
> plus one +nameShingle:one four)) (nameShingle:one plus one four))~1)*
> 
> {noformat}
> So ideally token creations is perfect but in the query it is using boolean + 
> operator which is causing the problem as if i have a document with name as 
> "one plus one" ,according to the shingles it has to matched as its token will 
> be  ("one plus","one plus one","plus one") .
> I have tried using the q.op and played around the mm also but nothing is
> giving me the correct response.
> Any idea how i can fetch that document even if the document is missing any
> token.
> My expected response will be getting the document "one plus one" even the 
> user query has any additional term like "one plus one two" and so on.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10423) ShingleFilter causes overly restrictive queries to be produced

2017-04-04 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-10423:
--
Affects Version/s: 6.5

> ShingleFilter causes overly restrictive queries to be produced
> --
>
> Key: SOLR-10423
> URL: https://issues.apache.org/jira/browse/SOLR-10423
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 6.5
>Reporter: Steve Rowe
>
> When {{sow=false}} and {{ShingleFilter}} is included in the query analyzer, 
> {{QueryBuilder}} produces queries that inappropriately require sequential 
> terms.  E.g. the query "A B C" produces {{(+A_B +B_C) A_B_C}} when the query 
> analyzer includes {{ maxShingleSize="3" outputUnigrams="false" tokenSeparator="_"/>}}.
> Aman Deep Singh reported this problem on the solr-user list. From 
> [http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201703.mbox/%3ccanegtx9bwbpwqc-cxieac7qsas7x2tgzovomy5ztiagco1p...@mail.gmail.com%3e]:
> {quote}
> I was trying to use the shingle filter but it was not creating the query as
> desirable.
> my schema is
> {noformat}
>  positionIncrementGap="100">
>   
> 
>  maxShingleSize="4"/>
> 
>   
> 
> 
> {noformat}
> my solr query is
> {noformat}
> http://localhost:8983/solr/productCollection/select?
>  defType=edismax
> =true
> =one%20plus%20one%20four
> =nameShingle
> =false
> =xml
> {noformat}
> and it was creating the parsed query as
> {noformat}
> 
> (+(DisjunctionMaxQuery(((+nameShingle:one plus +nameShingle:plus one
> +nameShingle:one four))) DisjunctionMaxQuery(((+nameShingle:one plus
> +nameShingle:plus one four))) DisjunctionMaxQuery(((+nameShingle:one plus one 
> +nameShingle:one four))) DisjunctionMaxQuery((nameShingle:one plus one 
> four)))~1)/no_coord
> 
> 
> *++nameShingle:one plus +nameShingle:plus one +nameShingle:one four))
> ((+nameShingle:one plus +nameShingle:plus one four)) ((+nameShingle:one
> plus one +nameShingle:one four)) (nameShingle:one plus one four))~1)*
> 
> {noformat}
> So ideally token creations is perfect but in the query it is using boolean + 
> operator which is causing the problem as if i have a document with name as 
> "one plus one" ,according to the shingles it has to matched as its token will 
> be  ("one plus","one plus one","plus one") .
> I have tried using the q.op and played around the mm also but nothing is
> giving me the correct response.
> Any idea how i can fetch that document even if the document is missing any
> token.
> My expected response will be getting the document "one plus one" even the 
> user query has any additional term like "one plus one two" and so on.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10423) ShingleFilter causes overly restrictive queries to be produced

2017-04-04 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-10423:
--
Component/s: query parsers

> ShingleFilter causes overly restrictive queries to be produced
> --
>
> Key: SOLR-10423
> URL: https://issues.apache.org/jira/browse/SOLR-10423
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 6.5
>Reporter: Steve Rowe
>
> When {{sow=false}} and {{ShingleFilter}} is included in the query analyzer, 
> {{QueryBuilder}} produces queries that inappropriately require sequential 
> terms.  E.g. the query "A B C" produces {{(+A_B +B_C) A_B_C}} when the query 
> analyzer includes {{ maxShingleSize="3" outputUnigrams="false" tokenSeparator="_"/>}}.
> Aman Deep Singh reported this problem on the solr-user list. From 
> [http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201703.mbox/%3ccanegtx9bwbpwqc-cxieac7qsas7x2tgzovomy5ztiagco1p...@mail.gmail.com%3e]:
> {quote}
> I was trying to use the shingle filter but it was not creating the query as
> desirable.
> my schema is
> {noformat}
>  positionIncrementGap="100">
>   
> 
>  maxShingleSize="4"/>
> 
>   
> 
> 
> {noformat}
> my solr query is
> {noformat}
> http://localhost:8983/solr/productCollection/select?
>  defType=edismax
> =true
> =one%20plus%20one%20four
> =nameShingle
> =false
> =xml
> {noformat}
> and it was creating the parsed query as
> {noformat}
> 
> (+(DisjunctionMaxQuery(((+nameShingle:one plus +nameShingle:plus one
> +nameShingle:one four))) DisjunctionMaxQuery(((+nameShingle:one plus
> +nameShingle:plus one four))) DisjunctionMaxQuery(((+nameShingle:one plus one 
> +nameShingle:one four))) DisjunctionMaxQuery((nameShingle:one plus one 
> four)))~1)/no_coord
> 
> 
> *++nameShingle:one plus +nameShingle:plus one +nameShingle:one four))
> ((+nameShingle:one plus +nameShingle:plus one four)) ((+nameShingle:one
> plus one +nameShingle:one four)) (nameShingle:one plus one four))~1)*
> 
> {noformat}
> So ideally token creations is perfect but in the query it is using boolean + 
> operator which is causing the problem as if i have a document with name as 
> "one plus one" ,according to the shingles it has to matched as its token will 
> be  ("one plus","one plus one","plus one") .
> I have tried using the q.op and played around the mm also but nothing is
> giving me the correct response.
> Any idea how i can fetch that document even if the document is missing any
> token.
> My expected response will be getting the document "one plus one" even the 
> user query has any additional term like "one plus one two" and so on.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org